Oriol Morales (oriol.morales01@campus.upf.edu) and Lucía Llorens (llucia_llorens01@campus.upf.edu)
Facultat de Ciències de la Salut i de la Vida
Universitat Pompeu Fabra
Summary
Results
Transcription corepressor activity
Transcription factor activity
mRNA transcription from RNA polymerase II promoter
Regulation of cell redox homeostasis
Regulation of progression through cell cycle
Regulation of transcription, DNA-dependent
Response to DNA damage stimulus
FUS is a protein component of nuclear ribonucleoprotein complexes. Because it has an RNA-binding motif, it may be involved in mRNA metabolism. It acts like an mRNA chaperone in order to maintain genomic stability. It was first discovered in myxoid liposarcoma and was named TLS (translocated in liposarcoma).
Here we present the biological function of FUS as annotated in the Gene Ontology database.
Nucleus
Protein binding
RNA binding
In the hybrid gene, the RNA-binding motif of FUS is replaced by the DNA-binding and leucine zipper dimerization domain of CHOP. In this situation, FUS is able to bind DNA and activate the expression of some oncogenic genes, as the FUS-ERG oncoprotein does. Furthermore, it is thought that the hybrid protein functions as an abnormal transcription factor, causing deregulation of CHOP target genes and attenuation of some functions that are critical for differentiation and growth control. Because of that, myxoid liposarcoma cells are unable to stop DNA synthesis when there is DNA damage. In addition, as described before, CHOP is involved in adipose cell differentiation, and the presence of aberrant transcripts can alter the molecular pathways that control differentiation, driving the development of myxoid liposarcoma.
The FUS gene (fusion protein involved in translocation t(12;16) in malignant liposarcoma) is located on the short arm (p) of human chromosome 16, band 11.2 (16p11.2), spanning positions 31,098,973 to 31,110,397, as shown in the following image extracted from the UCSC Genome Browser.
It has three transcripts, which encode the different isoforms of the FUS protein; they are shown in the next image, extracted from the Ensembl Genome Browser. The Ensembl Gene Report for FUS is available under the Ensembl ID ENSG00000089280.
FUS protein, transcript variant 1
FUS protein, transcript variant 2
FUS protein, transcript variant 3
The first transcript (Ensembl ID: ENST00000254108) encodes the longest version of the protein (526 residues) and has 15 exons in total. The second transcript (Ensembl ID: ENST00000354711) is produced by alternative splicing and has 14 exons in total(*), lacking the seventh exon of the first transcript. As a result, the second transcript is 35 nucleotides shorter. Because 35 is not a multiple of three, the reading frame shifts during translation and a new stop codon (TGA) appears in its seventh exon (the eighth exon of the first transcript), which yields a shorter isoform of the FUS protein (only 263 residues). Finally, we found no information in the NCBI database about the third transcript, but according to the Ensembl Genome Browser it has 14 exons and encodes the shortest isoform of the FUS protein, with only 151 residues. However, this last transcript is not involved in the formation of the hybrid protein TLS/CHOP, so we have not studied it.
(*) The Ensembl Genome Browser lists the second transcript with only 13 exons. However, the first codon of its translated region was not an ATG (it was a TTT codon, which encodes a phenylalanine residue), so we considered that the annotation was wrong or that some data was missing. In addition, the first exon of the Ensembl transcript coincided with the coding part of the first exon plus the whole second exon of the second transcript found in the NCBI database (we ran a ClustalW alignment to check this). We therefore concluded that the second transcript lacks only the seventh exon of the first transcript and retains the original first and second exons, including the ATG start codon.
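The frameshift reasoning above comes down to simple arithmetic: skipping a stretch of coding sequence whose length is not a multiple of three changes the reading frame of everything downstream. A minimal Python sketch of that check follows; the function names `causes_frameshift` and `frame_offset` are illustrative and do not come from the original analysis.

```python
def causes_frameshift(skipped_length: int) -> bool:
    """Return True if removing `skipped_length` coding nucleotides
    changes the reading frame of the downstream codons."""
    return skipped_length % 3 != 0


def frame_offset(skipped_length: int) -> int:
    """Number of positions (0, 1 or 2) by which downstream codons are shifted."""
    return skipped_length % 3


# The text describes the second transcript as missing 35 nucleotides.
print(causes_frameshift(35))  # True
print(frame_offset(35))       # 2
```

A skipped segment of 36 nucleotides, by contrast, would remove 12 codons and leave the downstream frame intact.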
Below we show the protein sequences encoded by both transcripts. The second is clearly shorter than the first. The amino acids that are identical in both are marked in purple.
FUS Transcript 1 Protein Sequence:
MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSY
GQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQ
SSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPR
DQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIK
VSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPD
GPGGGPGGSHMGGNYGDDRRGGRGGYDRGGYRGRGGDRGGFRGGRGGGDRGGFGPGKMDSRGEHRQDRRERPY
FUS Transcript 2 Protein Sequence:
MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSY
GQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQ
SSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGPSGPRITS
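The point at which the two isoforms stop being identical can be found by comparing them residue by residue from the N-terminus. The sketch below does this in Python on two short fragments copied from the sequences above (the region around the divergence); the helper name `common_prefix_length` is an illustrative choice, and the same function could be applied to the full sequences.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Count how many leading characters two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Fragments around the point of divergence, copied from the sequences above.
iso1_fragment = "YEPRGRGGGRGGRGGMGGSDRGGFNKFGGPR"
iso2_fragment = "YEPRGRGGGRGGRGGMGPSGPRITS"

n = common_prefix_length(iso1_fragment, iso2_fragment)
print(iso1_fragment[:n])  # YEPRGRGGGRGGRGGMG  (shared part)
print(iso2_fragment[n:])  # PSGPRITS           (C-terminal residues unique to isoform 2)
```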
The following table shows the sequences of the first and second FUS transcripts, extracted from the NCBI database, in the first and second column respectively. We present the processed mRNA sequences (without introns), using black and blue to distinguish consecutive exons. The ATG start codon (in the first exon of both transcripts) and the stop codon are marked in bold: TAA for the first transcript, in its last exon, and TGA for the second transcript, in the exon that corresponds to the eighth exon of the first transcript. The seventh exon of the first transcript mRNA, which is missing from the second, is also marked in bold.
CGGGGCTGCTCAGTCCTCCAGGCGTCGGTACTCAGCGGTGTTGGAACTTCGTTGCTTGCT TGCCTGTGCGCGCGTGCGCGGACATGGCCTCAAACGATTATACCCAACAAGCAACCCAAA GCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCT ACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGCTATGGCCAGA GCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCC AGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGC AGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTT ACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGC AGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTC AGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAG GTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCG GTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACC GTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACA ACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTG GCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGAT CACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGG GTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGA CAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGC TGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACT GGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGG CAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGG GCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTG GAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCT GTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAG ATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTC GTGGTGGCAGAGGAGGCTATGATCGAGGCGGCTACCGGGGCCGCGGCGGGGACCGTGGAG GCTTCCGAGGGGGCCGGGGTGGTGGGGACAGAGGTGGCTTTGGCCCTGGCAAGATGGATT CCAGGGGTGAGCACAGACAGGATCGCAGGGAGAGGCCGTATTAATTAGCCTGGCTCCCCA GGTTCTGGAACAGCTTTTTGTCCTGTACCCAGTGTTACCCTCGTTATTTTGTAACCTTCC AATTCCTGATCACCCAAGGGTTTTTTTGTGTCGGACTATGTAATTGTAACTATACCTCTG GTTCCCATTAAAAGTGACCATTTTAGTTAAATTTTGTTCCTCTTCCCCCTTTTCACTTTC CTGGAAGATCGATGTCCCGATCAGGAAGGTAGAGAGTTTTCCTGTTCAGATTACCCTGCC 
CAGCAGGAACTGGAATACAGTGTTCGGGGAGAAGGCCAAATGATATCCTTGAGAGCAGAG ATTAAACTTTTCTGTCATGGGGAAAAAAAA
In conclusion, alternative splicing of the FUS transcripts can shift the reading frame during translation, which affects the final translated protein. In our case, the hybrid protein TLS/CHOP retains the first part of the amino acid sequence of the FUS protein, so it incorporates the first exons of the FUS mRNA, which are the same in both transcripts, as shown above. For this reason, we think that the hybrid transcript could also undergo alternative splicing, like FUS, and give rise to two different isoforms.
The DDIT3 gene (DNA-damage inducible transcript 3) encodes the CHOP protein and is located on the long arm (q) of human chromosome 12, band 13.3 (12q13.3). It spans positions 56,196,640 to 56,200,567, as shown in the next image extracted from the UCSC Genome Browser.
According to the Ensembl Genome Browser, the DDIT3 gene has a single transcript (Ensembl ID: ENST00000346473), which is translated into the 169 amino acid residues of the final protein, whose sequence is shown below. As the transcript image shows, the DDIT3 gene is transcribed from the reverse strand (note the direction of transcription indicated by the black arrow). The Ensembl Gene Report for DDIT3 is available under the Ensembl ID ENSG00000175197.
DDIT3 transcript
CHOP Protein Sequence:
MAAESLPFSFGTLSSWELEAWYEDLQEVLSSDENGGTYVSPPGNEEEESKIFTTLDPASLAWLTEEEPEP
AEVTSTSQSPHSPDSSQSSLAQEEEEEDQGRTRKRKQSGHSPARAGKQRMKEKEQENERKVAQLAEENE
RLKQEIERLTREVEATRRALIDRMVNLHQA
If we analyze the processed DDIT3 mRNA sequence shown below, we find that it has 4 exons in total, distinguished by black and blue colours. Only the last two are coding, because the ATG start codon lies at the start of the third exon and the TGA stop codon lies in the middle of the fourth. Both codons are marked in red bold.
DDIT3 mRNA Sequence:
GAGGTCAGAGACTTAAGTCTAAGGCACTGAGCGTATCATGTTAAAGATGAGCGGGTGGCA
GCGACAGAGCCAAAATCAGAGCTGGAACCTGAGGAGAGAGTGTTCAAGAAGGAAGTGTAT
CTTCATACATCACCACACCTGAAAGCAGATGTGCTTTTCCAGACTGATCCAACTGCAGAG
ATGGCAGCTGAGTCATTGCCTTTCTCCTTCGGGACACTGTCCAGCTGGGAGCTGGAAGCC
TGGTATGAGGACCTGCAAGAGGTCCTGTCTTCAGATGAAAATGGGGGTACCTATGTTTCA
CCTCCTGGAAATGAAGAGGAAGAATCAAAAATCTTCACCACTCTTGACCCTGCTTCTCTG
GCTTGGCTGACTGAGGAGGAGCCAGAACCAGCAGAGGTCACAAGCACCTCCCAGAGCCCT
CACTCTCCAGATTCCAGTCAGAGCTCCCTGGCTCAGGAGGAAGAGGAGGAAGACCAAGGG
AGAACCAGGAAACGGAAACAGAGTGGTCATTCCCCAGCCCGGGCTGGAAAGCAGCGCATG
AAGGAGAAAGAACAGGAGAATGAAAGGAAAGTGGCACAGCTAGCTGAAGAGAATGAACGG
CTCAAGCAGGAAATCGAGCGCCTGACCAGGGAAGTAGAGGCGACTCGCCGAGCTCTGATT
GACCGAATGGTGAATCTGCACCAAGCATGAACAATTGGGAGCATCAGTCCCCCACTTGGG
CCACACTACCCACCTTTCCCAGAAGTGGCTACTGACTACCCTCTCACTAGTGCCAATGAT
GTGACCCTCAATCCCACATACGCAGGGGGAAGGCTTGGAGTAGACAAAAGGAAAGGTCTC
AGCTTGTATATAGAGATTGTACATTTATTTATTACTGTCCCTATCTATTAAAGTGACTTT
CTATGAGCC
In conclusion, the transcript has two non-coding exons (the first and second) and two coding exons (the third and fourth). Because only one transcript is listed in the databases, the gene encodes a single isoform.
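The coding region of a processed mRNA can be delimited by reading codons from the start codon until the first in-frame stop codon. The following Python sketch illustrates the idea on a toy sequence (hypothetical, not the DDIT3 mRNA); `find_orf` is an illustrative name. Because the DDIT3 mRNA above contains ATG triplets upstream of the annotated start codon, the sketch takes the start position as an input instead of assuming that the first ATG is the start.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(mrna: str, start: int):
    """Read codons from `start` (index of the A of an ATG) up to the first
    in-frame stop codon. Returns (start, end), where `end` is exclusive and
    includes the stop codon, or None if no complete ORF is found."""
    mrna = mrna.upper()
    if mrna[start:start + 3] != "ATG":
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


# Toy mRNA (hypothetical): 5' UTR + ORF + 3' UTR
toy = "GGCACC" + "ATGGCAGCTGAGTGA" + "ACAATT"
orf = find_orf(toy, toy.find("ATG"))
print(orf)                          # (6, 21)
print((orf[1] - orf[0]) // 3 - 1)   # 4 codons translated into amino acids
```

Applied to a real mRNA, the number of translated codons returned this way can be compared with the protein length reported by the database.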
The TLS/CHOP gene is transcribed and translated into a hybrid protein made of the first part of the FUS protein and the complete CHOP protein. We have analyzed its mRNA in more detail in order to find out which exons are conserved from the original genes and which amino acids derive from those exons.
First of all, we analyze the protein sequence of TLS/CHOP, which is shown below. We have coloured the regions that coincide with the original proteins: the amino acids that TLS/CHOP shares with the FUS protein are marked in purple, and those it shares with the CHOP protein in orange. While the whole CHOP protein is conserved in the hybrid protein, only the first half of FUS is present in it, up to the region encoded by the seventh exon.
TLS/CHOP Protein Sequence
MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQS
TPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQ
QSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQ
QDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGVFKKEVYLHTSPHL
KADVLFQTDPTAEMAAESLPFSFGTLSSWELEAWYEDLQEVLSSDENGGTYVSPPGNEEEESKIFTTLDP
ASLAWLTEEEPEPAEVTSTSQSPHSPDSSQSSLAQEEEEEDQGRTRKRKQSGHSPARAGKQRMKEKEQEN
ERKVAQLAEENERLKQEIERLTREVEATRRALIDRMVNLHQA
As the protein sequence above shows, several amino acids in the middle of the sequence do not seem to derive from either of the two original proteins. To explain them, we need to analyze the mRNA sequence of TLS/CHOP, extracted from the NCBI database, which is shown below. The two columns contain the same sequence but highlight different things. The first column shows the exons of the sequence, marked alternately in blue and black to tell them apart. The second column shows the conservation of the sequence with respect to the FUS mRNA sequence (marked in purple) and the DDIT3 mRNA sequence (marked in orange). In both columns, the ATG start codon and the TGA stop codon used in translation are marked in red bold.
When we analyzed the protein sequence (see above), we found that some residues were present in neither the FUS protein nor the CHOP protein. However, when we compare the mRNA sequence of the hybrid gene with the mRNAs of the original genes, the whole sequence aligns. This is because these residues are encoded by the second exon of the DDIT3 mRNA, which is untranslated in the original gene.
The right column of the table shows which region of the TLS/CHOP mRNA sequence derives from the FUS mRNA sequence and which from the DDIT3 mRNA sequence. Of the 1682 nucleotides of the whole sequence, the region between the first nucleotide and the 877th derives from the first part of the FUS mRNA sequence, whereas the rest (from the 877th nucleotide to the last one) derives from the DDIT3 mRNA sequence. The left column shows that the TLS/CHOP mRNA sequence contains 10 exons in total, all of which contribute to the final protein, because the ATG start codon is in the first exon and the TGA stop codon is in the last one.
If we relate the exon numbering to the information in the right column, we see that the end of the seventh exon of the TLS/CHOP mRNA sequence coincides exactly with the end of the conserved part of the FUS mRNA sequence, and that the beginning of the eighth exon of the TLS/CHOP mRNA sequence coincides with the beginning of the conserved part of the DDIT3 mRNA sequence. We can draw two conclusions from these coincidences:
The next table lists the different exons of TLS/CHOP, the original gene each one derives from, the positions they occupy in both genes and the residues of the final protein they encode.
ATGCTCAGTCCTCCAGGCGTCGGTGCTCAGCGGTGTTGGAACTTCGTTGCTTGCTTGC
CTGTGCGCGCGTGCGCGGACATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAG
CTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCC
TACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGCTATGGCC
AGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAAC
TCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCT
TACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCT
CGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAG
CTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGC
TATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTG
GAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGG
TGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGT
GGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCG
GCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGG
AGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAA
TTTGGTGTGTTCAAGAAGGAAGTGTATCTTCATACATCA
CCACACCTGAAAGCAGATGTGCTTTTCCAGACTGATCCAACTGCAGAGATGGCAGCTG
AGTCATTGCCTTTCTCCTTCGGGACACTGTCCAGCTGGGAGCTGGAAGCCTGGTATGA
GGACCTGCAAGAGGTCCTGTCTTCAGATGAAAATGGGGGTACCTATGTTTCACCTCCT
GGAAATGAAGAGGAAGAATCAAAAATCTTCACCACTCTTGACCCTGCTTCTCTGGCTT
GGCTGACTGAGGAGGAGCCAGAACCAGCAGAGGTCACAAGCACCTCCCAGAGCCCTCA
CTCTCCAGATTCCAGTCAGAGCTCCCTGGCTCAGGAGGAAGAGGAGGAAGACCAAGGG
AGAACCAGGAAACGGAAACAGAGTGGTCATTCCCCAGCCCGGGCTGGAAAGCAGCGCA
TGAAGGAGAAAGAACAGGAGAATGAAAGGAAAGTGGCACAGCTAGCTGAAGAGAATGA
ACGGCTCAAGCAGGAAATCGAGCGCCTGACCAGGGAAGTAGAGGCGACTCGCCGAGCT
CTGATTGACCGAATGGTGAATCTGCACCAAGCATGAACAATTGGGAGCATCAGTCCCC
CACTTGGGCCACACTACCCACCTTTCCCAGAAGTGGCTACTGACTACCCTCTCACTAG
TGCCAATGATGTGACCCTCAATCCCACATACGCAGGGGGAAGGCTTGGAGTAGACAAA
AGGAAAGGTCTCAGCTTGTATATAGAGATTGTACATTTATTTATTACTGTCCCTATCT
ATTAAAGTGACTTTCTATG
ATGCTCAGTCCTCCAGGCGTCGGTGCTCAGCGGTGTTGGAACTTCGTTGCTTGCTTGC
CTGTGCGCGCGTGCGCGGACATGGCCTCAAACGATTATACCCAACAAGCAACCCA
AAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAG
CCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGCTATG
GCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTC
AACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCG
TCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCA
CCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGG
GAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA
AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTG
GTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAG
TGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC
GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCG
GCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCG
TGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAAT
AAATTTGGTGTGTTCAAGAAGGAAGTGTATCTTCATACATCACCACACCTGAAAGCA
GATGTGCTTTTCCAGACTGATCCAACTGCAGAGATGGCAGCTGAGTCATTGCCTTTCT
CCTTCGGGACACTGTCCAGCTGGGAGCTGGAAGCCTGGTATGAGGACCTGCAAGAGGT
CCTGTCTTCAGATGAAAATGGGGGTACCTATGTTTCACCTCCTGGAAATGAAGAGGAA
GAATCAAAAATCTTCACCACTCTTGACCCTGCTTCTCTGGCTTGGCTGACTGAGGAGG
AGCCAGAACCAGCAGAGGTCACAAGCACCTCCCAGAGCCCTCACTCTCCAGATTCCAG
TCAGAGCTCCCTGGCTCAGGAGGAAGAGGAGGAAGACCAAGGGAGAACCAGGAAACGG
AAACAGAGTGGTCATTCCCCAGCCCGGGCTGGAAAGCAGCGCATGAAGGAGAAAGAAC
AGGAGAATGAAAGGAAAGTGGCACAGCTAGCTGAAGAGAATGAACGGCTCAAGCAGGA
AATCGAGCGCCTGACCAGGGAAGTAGAGGCGACTCGCCGAGCTCTGATTGACCGAATG
GTGAATCTGCACCAAGCATGAACAATTGGGAGCATCAGTCCCCCACTTGGGCCACA
CTACCCACCTTTCCCAGAAGTGGCTACTGACTACCCTCTCACTAGTGCCAATGATGTG
ACCCTCAATCCCACATACGCAGGGGGAAGGCTTGGAGTAGACAAAAGGAAAGGTCTCA
GCTTGTATATAGAGATTGTACATTTATTTATTACTGTCCCTATCTATTAAAGTGACTT
TCTATG
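The assignment of each part of the hybrid mRNA to FUS or DDIT3 can be illustrated by comparing the sequences around the junction: the hybrid shares a prefix with FUS and a suffix with DDIT3. The Python sketch below uses three short fragments copied from the FUS, DDIT3 and TLS/CHOP mRNA sequences shown in this page; the helper names are illustrative, and this is only a sketch of the comparison, not the alignment procedure used in the original analysis.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Count how many leading characters two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_length(a: str, b: str) -> int:
    """Count how many trailing characters two sequences share."""
    return common_prefix_length(a[::-1], b[::-1])


# Fragments around the junction, copied from the mRNA sequences in this page.
fus = "GGCTTCAATAAATTTGGTGGCCCTCGGGAC"
ddit3 = "GAGGAGAGAGTGTTCAAGAAGGAAGTG"
hybrid = "GGCTTCAATAAATTTGGTGTGTTCAAGAAGGAAGTG"

from_fus = common_prefix_length(hybrid, fus)        # nucleotides explained by FUS
from_ddit3 = common_suffix_length(hybrid, ddit3)    # nucleotides explained by DDIT3
overlap = from_fus + from_ddit3 - len(hybrid)       # nucleotides explained by both
print(from_fus, from_ddit3, overlap)  # 19 18 1
```

In these fragments one nucleotide at the junction matches both parent sequences, which appears consistent with position 877 being attributed to both regions in the text.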
GTTGCTTGCTTGCCTGTGCGCGCGTGCGCGGACATGGCCTCAAACG
AGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTC
CACGGACACTTCAGGCTATGGCCAGAGCAGCTATTCTTCTTATGGCC
AGAGCCAGAACA
SQSTDTSGYGQSSYSSYGQSQNT
GGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTC
CTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGG
GAAG
SQSSYGQQSSYPGYGQQPAPSSTSGS
GAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGG
ACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCA
GTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAG
GQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGG
TGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGC
TATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCG
GCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGA
ACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGG
GYGQQDRGGRGRGGSGGGGGGGGGGYNRSSG
GYEPRGRGGGRGGRGGMG
CCTTTCTCCTTTGGGACACTGTCCAGCTGGGAGCTGGAAGCCTGGTATGA
GGACCTGCAAGAGGTCCTGTCTTCAGATGAAAATGGGGGTACCTATGTTT
CACCTCCTGGAAATGAAGAG
LQEVLSSDENGGTYVSPPGNEE
ACTGAGGAGGAGCCAGAACCAGCAGAGGTCACAAGCACCTCCCAGAGCCC
TCACTCTCCAGATTCCAGTCAGAGCTCCCTGGCTCAGGAGGAAGAGGAGG
AAGACCAAGGGAGAACCAGGAAACGGAAACAGAGTGGTCATTCCCCAGCC
CGGGCTGGAAAGCAGCGCATGAAGGAGAAAGAACAGGAGAATGAAAGGAA
AGTGGCACAGCTAGCTGAAGAGAATGAACGGCTCAAGCAGGAAATCGAGC
GCCTGACCAGGGAAGTAGAGGCGACTCGCCGAGCTCTGATTGACCGAATG
GTGAATCTGCACCAAGCATGAACAATTGGGAGCATCAGTCCCCCACTTGG
GCCACACTACCCACCTTTCCCAGAAGTGGCTACTGACTACCCTCTCACTA
GTGCCAATGATGTGACCCTCAATCCCACATACGCAGGGGGAAGGCTTGGA
GTAGACAAAAGGAAAGGTCTCAGCTTGTATATAGAGATTGTACATTTATT
TATTACTGTCCCTATCTATTAAAGTGACTTTCTATG
PHSPDSSQSSLAQEEEEEDQGRTRKRKQSG
HSPARAGKQRMKEKEQENERKVAQLAEENE
RLKQEIERLTREVEATRRALIDRMVNLHQA
In this table, the ATG start codon and the TGA stop codon in the first and last exon sequences are marked in blue bold, and the amino acid residues that are shared between adjacent exons are marked in red bold. The ATG codon in purple bold in the ninth exon corresponds to the start codon of the DDIT3 gene.
Several things can be seen from the information in the table:
Below we present the TLS/CHOP protein sequence again. This time, the newly encoded amino acid sequence (derived from the originally untranslated exons of the DDIT3 gene) is marked in blue, and the valine encoded by the new codon GTG generated at the translocation point is marked in red:
TLS/CHOP Protein Sequence
MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQS
TPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQ
QSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQ
QDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGVFKKEVYLHTSPHL
KADVLFQTDPTAEMAAESLPFSFGTLSSWELEAWYEDLQEVLSSDENGGTYVSPPGNEEEESKIFTTLDP
ASLAWLTEEEPEPAEVTSTSQSPHSPDSSQSSLAQEEEEEDQGRTRKRKQSGHSPARAGKQRMKEKEQEN
ERKVAQLAEENERLKQEIERLTREVEATRRALIDRMVNLHQA
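The effect of the translocation on the encoded residues can be checked by translating the codons around the junction. The Python sketch below builds the standard genetic code table and translates two in-frame fragments copied from the FUS and TLS/CHOP mRNA sequences in this page; `translate` and the variable names are illustrative.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence codon by codon ('*' marks a stop)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# In-frame fragments around the junction, copied from the mRNA sequences.
fus_junction = "TTCAATAAATTTGGTGGCCCTCGG"
hybrid_junction = "TTCAATAAATTTGGTGTGTTCAAGAAG"

print(translate(fus_junction))     # FNKFGGPR
print(translate(hybrid_junction))  # FNKFGVFKK
print(CODON_TABLE["GTG"])          # V
```

The hybrid fragment reproduces the FNKFGVFKK stretch of the TLS/CHOP protein, with the valine at the junction read from the codon GTG.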
Homo sapiens vs. species | Ensembl ID | % peptide homology
| ENSBTAG00000031544 |
| ENSCAFG00000000232 |
| ENSCPOG00000012871 |
| ENSDARG00000059836 |
| ENSDNOG00000017504 |
| ENSETEG00000002521 |
| ENSEEUG00000003634 |
| ENSFCAG00000002084 |
| ENSGACG00000006480 |
| ENSMMUG00000011286 |
| ENSMUSG00000025408 |
| ENSORLG00000005361 |
| ENSPTRG00000022781 |
| ENSRNOG00000006789 |
| SINFRUG00000133607 |
| GSTENG00013575001 |
| ENSTBEG00000000521 |
| ENSXETG00000006245 |
The promoter sequence, extracted from the UCSC Genome Browser, is shown below. The TSS (transcription start site) nucleotide is marked in purple bold, the first exon in blue and the first intron in red. The promoter region is written in lowercase letters:
tttgcagttacaagacctggattcgaatcacgactcctcttagctgccctgtaatcaggcacaattacttgggtctctgagtctcactttccttatctag
aaaacggaggtatctttacttccttcgtaagactgatgacaaggaaattatctgtgcattttgaaaccacttaagccttgtacacgttttatttctggga
tcgccctggtagggcttcagaaaaataaaaaggaggtccctgagaaaaggctgggtaccgtacatctgaggtcaaccctctctggtcccaaggatggcct
gggctgttccgccccgtggctccccaggggcaaagccatgaggatccgggtgagagcccagtgctggacgagcccggggcccaggggtcccggccgaaat
ccctgctgtctttcaggtcaaacgtcataatccccgaaccccagaaaggccgaaaggcaaggcaaccctgaaagacgacgaagtcaacctcagggcgcag
gagagggagggccagtgtgctgccgacgagggaggctggagccgcggggacgaggcgccccatacagcggcaagagggtggagggcaggagctcgccatc
ctgggtgaaagcggggcccagcgaaggggcccggccacaggaatctcggttccaccccgctactcccggctgtgactccagtttcgtccccagccgccgg
gaccgccccctcgccccgcccccagcgggcactcaggccgtaccactgtgccttcatgggggtggagatagatcgtgggctagtcctgccgaggagagag
gggttcttcctcaaaaaatatgattatgtatagtattcgcatgattctagttaacttgtttcccttctgcctgctcggaccctctacctgccctacgaag
ggggcggagtgcgttcctgcctccccctgctcttccgcgtttggtgcgcgcctgcgcggtgcgtaggcggcggagcgtacttaagcttcgacgcaggagg
CGGGGCTGCTCAGTCCTCCAGGCGTCGGTACTCAGCGGTGTTGGAACTTCGTTG
CTTGCTTGCCTGTGCGCGCGTGCGCGGACATGGCCTCAAACGgtag
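Since the sequence above uses letter case to distinguish regions (lowercase promoter, uppercase first exon, lowercase start of the first intron), the regions can be separated programmatically. The Python sketch below does so on a shortened toy sequence; `split_by_case` is an illustrative name, and the sketch assumes exactly this lowercase/uppercase/lowercase layout.

```python
def split_by_case(seq: str):
    """Split a sequence laid out as lowercase promoter, UPPERCASE exon,
    lowercase intron into its three parts."""
    upper_positions = [i for i, c in enumerate(seq) if c.isupper()]
    first, last = upper_positions[0], upper_positions[-1]
    return seq[:first], seq[first:last + 1], seq[last + 1:]


# Toy sequence (shortened): promoter + first exon + start of the intron
toy = "ttgcaggagg" + "CGGGGCTGCT" + "gtag"
promoter, exon, intron = split_by_case(toy)
print(len(promoter))  # 10 -> 0-based index of the first exon nucleotide (the TSS)
print(exon)           # CGGGGCTGCT
print(intron)         # gtag
```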
Now we have to compare both sets of results to determine which transcription factors are good candidates to bind the FUS promoter sequence. For this comparison, the binding regions in both results should approximately match, and the p-value and the RE query value should be consistent with each other. For example, the NF-AT1 transcription factor, which has the lowest p-value (0.21), also has a low RE query value (0.03), which means a low probability that the match arises at random. In addition, the binding region in the promoter sequence is approximately the same in the Perl program and PROMO results. Based on these data, we can state that the NF-AT1 transcription factor will probably bind the TLS/CHOP and FUS promoter sequences.
Two other transcription factors, AhR and YY1, could also be considered candidates to bind the promoter sequence. The AhR transcription factor has low p and RE query values and would bind a similar region in both results; the problem is that this binding region lies inside the first exon of the transcribed mRNA, so it is not a good candidate. The YY1 transcription factor, on the other hand, matches the binding region in both results but has a very high RE query value, which means that it has a high probability of matching the FUS or TLS/CHOP sequences at random.
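The positional part of this comparison (same factor, overlapping binding region in both tools, and a site located upstream of the first exon) can be expressed as a small filter. The Python sketch below is only an illustration: the factor names `TF_A`, `TF_B`, `TF_C` and all coordinates are hypothetical placeholders, not the values obtained with the Perl program or PROMO, and the p-value and RE query criteria are not modelled.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    factor: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two half-open intervals share at least one position."""
    return a.start < b.end and b.start < a.end


def consistent_hits(tool_a, tool_b, tss: int):
    """Factors predicted by both tools at overlapping positions that lie
    entirely upstream of the transcription start site."""
    kept = set()
    for a in tool_a:
        for b in tool_b:
            if a.factor == b.factor and overlaps(a, b) and max(a.end, b.end) <= tss:
                kept.add(a.factor)
    return sorted(kept)


# Hypothetical predictions from two tools on a promoter whose TSS is at index 1000.
tool_a_hits = [Hit("TF_A", 120, 130), Hit("TF_B", 1010, 1020), Hit("TF_C", 400, 410)]
tool_b_hits = [Hit("TF_A", 122, 131), Hit("TF_B", 1012, 1021), Hit("TF_C", 700, 709)]
print(consistent_hits(tool_a_hits, tool_b_hits, tss=1000))  # ['TF_A']
```

Here `TF_B` is discarded because its site falls inside the first exon, and `TF_C` because the two tools place it in different regions.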
To conclude this section, we present the TLS/CHOP (FUS) promoter sequence again, with the binding region of the NF-AT1 transcription factor marked in purple. The binding regions of the other two transcription factors, AhR and YY1, are marked in orange:
tttgcagttacaagacctggattcgaatcacgactcctcttagctgccctgtaatcaggcacaattacttgggtctctgagtctcactttccttatctag
aaaacggaggtatctttacttccttcgtaagactgatgacaaggaaattatctgtgcattttgaaaccacttaagccttgtacacgttttatttctggga
tcgccctggtagggcttcagaaaaataaaaaggaggtccctgagaaaaggctgggtaccgtacatctgaggtcaaccctctctggtcccaaggatggcct
gggctgttccgccccgtggctccccaggggcaaagccatgaggatccgggtgagagcccagtgctggacgagcccggggcccaggggtcccggccgaaat
ccctgctgtctttcaggtcaaacgtcataatccccgaaccccagaaaggccgaaaggcaaggcaaccctgaaagacgacgaagtcaacctcagggcgcag
gagagggagggccagtgtgctgccgacgagggaggctggagccgcggggacgaggcgccccatacagcggcaagagggtggagggcaggagctcgccatc
ctgggtgaaagcggggcccagcgaaggggcccggccacaggaatctcggttccaccccgctactcccggctgtgactccagtttcgtccccagccgccgg
gaccgccccctcgccccgcccccagcgggcactcaggccgtaccactgtgccttcatgggggtggagatagatcgtgggctagtcctgccgaggagagag
gggttcttcctcaaaaaatatgattatgtatagtattcgcatgattctagttaacttgtttcccttctgcctgctcggaccctctacctgccctacgaag
ggggcggagtgcgttcctgcctccccctgctcttccgcgtttggtgcgcgcctgcgcggtgcgtaggcggcggagcgtacttaagcttcgacgcaggagg
CGGGGCTGCTCAGTCCTCCAGGCGTCGGTACTCAGCGGTGTTGGAACTTCGTTG
CTTGCTTGCCTGTGCGCGCGTGCGCGGACATGGCCTCAAACGgtag
Methods
After that, we wrote a Perl program that reads a sequence (we used it with our promoter sequence) together with different transcription factor (TF) matrices and finds which of those transcription factors bind to the sequence. The program consists of four parts:
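The original Perl program and its four parts are not reproduced on this page. As an independent illustration of the general idea only (scoring every window of a sequence against a transcription factor matrix), a minimal Python sketch might look as follows. The count matrix, the pseudocount, the uniform background and the threshold are all hypothetical choices made for this example, and the function names are ours; none of them are taken from the authors' program.

```python
import math

# Hypothetical 4-position count matrix (not a real TF matrix).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts into log2 odds scores against a uniform background."""
    width = len(counts["A"])
    matrix = []
    for pos in range(width):
        total = sum(counts[b][pos] for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2((counts[b][pos] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, window, score) for every window scoring at least `threshold`."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


pwm = log_odds_matrix(COUNTS)
hits = scan("ttgcAGCTtacaagct", pwm, threshold=4.0)
print([(pos, window) for pos, window, _ in hits])  # [(4, 'AGCT'), (12, 'AGCT')]
```

With this toy matrix only exact matches to the consensus AGCT pass the threshold; lowering the threshold would also report windows with one mismatch.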
Discussion
References