COMPUTATIONAL GENOMIC STUDY OF TLS-CHOP



Oriol Morales (oriol.morales01@campus.upf.edu) and Lucía Llorens (llucia_llorens01@campus.upf.edu)


Facultat de Ciències de la Salut i de la Vida
Universitat Pompeu Fabra




Summary

Results

Methods

Discussion

References








Summary


This project describes the exonic structure, homology, function and expression of TLS-CHOP. This human hybrid gene is found in myxoid liposarcomas and results from a characteristic chromosomal translocation, t(12;16)(q13;p11). TLS-CHOP is formed by the FUS and DDIT3 genes, so to study it we present the characteristics of each original gene separately. Both genes are expressed in several tissues, including fat tissue. When the translocation occurs, this tissue expresses an aberrant protein (TLS-CHOP) that alters the normal molecular pathways and leads to the disease.
As the following sections show, TLS-CHOP carries the upstream sequence of FUS, so the study of its expression, and of the transcription factors able to bind its promoter and regulate its transcription, focuses on this gene. We also present a Perl program that we wrote and used for the promoter study.





Results




As we said in the section on the characterization of the genomic structure, the hybrid gene TLS/CHOP begins with the genomic sequence of the FUS gene. To study the promoter region of TLS/CHOP we therefore have to analyze which transcription factors are likely to bind the promoter region of FUS.

The promoter sequence, extracted from the UCSC Genome Browser Database, is shown below. The TSS (Transcription Start Site) nucleotide is marked in bold purple, the first exon in blue and the first intron in red. The promoter region is written in lowercase letters:

FUS promoter region

tttgcagttacaagacctggattcgaatcacgactcctcttagctgccctgtaatcaggcacaattacttgggtctctgagtctcactttccttatctag
aaaacggaggtatctttacttccttcgtaagactgatgacaaggaaattatctgtgcattttgaaaccacttaagccttgtacacgttttatttctggga
tcgccctggtagggcttcagaaaaataaaaaggaggtccctgagaaaaggctgggtaccgtacatctgaggtcaaccctctctggtcccaaggatggcct
gggctgttccgccccgtggctccccaggggcaaagccatgaggatccgggtgagagcccagtgctggacgagcccggggcccaggggtcccggccgaaat
ccctgctgtctttcaggtcaaacgtcataatccccgaaccccagaaaggccgaaaggcaaggcaaccctgaaagacgacgaagtcaacctcagggcgcag
gagagggagggccagtgtgctgccgacgagggaggctggagccgcggggacgaggcgccccatacagcggcaagagggtggagggcaggagctcgccatc
ctgggtgaaagcggggcccagcgaaggggcccggccacaggaatctcggttccaccccgctactcccggctgtgactccagtttcgtccccagccgccgg
gaccgccccctcgccccgcccccagcgggcactcaggccgtaccactgtgccttcatgggggtggagatagatcgtgggctagtcctgccgaggagagag
gggttcttcctcaaaaaatatgattatgtatagtattcgcatgattctagttaacttgtttcccttctgcctgctcggaccctctacctgccctacgaag
ggggcggagtgcgttcctgcctccccctgctcttccgcgtttggtgcgcgcctgcgcggtgcgtaggcggcggagcgtacttaagcttcgacgcaggagg
CGGGGCTGCTCAGTCCTCCAGGCGTCGGTACTCAGCGGTGTTGGAACTTCGTTG
CTTGCTTGCCTGTGCGCGCGTGCGCGGACATGGCCTCAAACG
gtag


Once we had isolated the promoter sequence, we ran it through a Perl program, written following the instructions on the web, to find out which transcription factors were able to bind the sequence. This program is described in the Methods section. The table below presents its results. For each transcription factor we give the region it most probably binds, its score and its p value, which we interpret as the probability of the factor matching this region by chance. A transcription factor with a positive score and a low p value (that is, a low probability of matching that position by chance) can be considered a good candidate to bind the promoter sequence of the FUS gene and therefore that of the TLS/CHOP gene.

Transcription factor  Start  End   Binding region sequence  Score      p value
AP-1 [T00029]         480    486   gaagtca                  2.651      0.24
AR [T00040]           1044   1050  GAACTTC                  2.747      0.43
c-Myc [T00140]        183    188   cacgtt                   -996.022   0.66
NF-AT1 [T00550]       143    149   ggaaatt                  3.201      0.21
NF-kappaB [T00590]    856    864   ttgtttccc                -996.421   0.96
SRF [T00764]          751    759   ccttcatgg                -996.355   0.43
YY1 [T00915]          294    299   atggcc                   2.678      0.58
RXR-alpha [T01345]    162    167   tgaaac                   2.496      0.64
HIF-1 [T01609]        1067   1075  TGCGCGCGT                -996.58    0.99
AhR [T01795]          1072   1078  GCGTGCG                  3.011      0.29
PU.1 [T02068]         638    644   caggaat                  2.664      0.32
HNF-4 [T02758]        1042   1049  TGGAACTT                 -995.559   0.42
NRSF [T06124]         307    315   ttccgcccc                -1994.202  0.91
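
The selection criterion described above (positive score, low p value) can be applied to the table mechanically. The following is a minimal Python sketch, not the authors' Perl program; the function name `rank_candidates` and the cutoff `max_p=0.30` are assumptions chosen only for illustration, since the text does not state a numeric threshold.

```python
# Rows transcribed from the results table: (factor, score, p value).
results = [
    ("AP-1", 2.651, 0.24),
    ("AR", 2.747, 0.43),
    ("c-Myc", -996.022, 0.66),
    ("NF-AT1", 3.201, 0.21),
    ("NF-kappaB", -996.421, 0.96),
    ("SRF", -996.355, 0.43),
    ("YY1", 2.678, 0.58),
    ("RXR-alpha", 2.496, 0.64),
    ("HIF-1", -996.58, 0.99),
    ("AhR", 3.011, 0.29),
    ("PU.1", 2.664, 0.32),
    ("HNF-4", -995.559, 0.42),
    ("NRSF", -1994.202, 0.91),
]


def rank_candidates(rows, max_p=0.30):
    """Keep factors with a positive score and p <= max_p, lowest p first.

    max_p is a hypothetical cutoff; the report gives no explicit threshold.
    """
    kept = [row for row in rows if row[1] > 0 and row[2] <= max_p]
    return sorted(kept, key=lambda row: row[2])


candidates = rank_candidates(results)
for name, score, p in candidates:
    print(f"{name}\tscore={score}\tp={p}")
```

With this assumed cutoff the sketch keeps NF-AT1, AP-1 and AhR, in that order of increasing p value.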

As stated above, transcription factors with a positive score and a low p value are good candidates to bind our sequence. In this case AP-1, NF-AT1 and AhR are good options. However, we also searched for these transcription factors with a second method, the PROMO Database, whose results are tabulated below. Again we analyze the binding region of the same transcription factors used in our Perl program, together with the PROMO counterpart of the p value, the RE query column. RE query is the random expectation: the expected number of occurrences of the match in a random sequence with the same length and the same nucleotide proportions as ours.

Transcription factor  Start  End   Binding region sequence  RE query
AP-1 [T00029]         672    680   tgactccag                0.07
AR [T00040]           1079   1087  GGACATGGC                0.15
NF-AT1 [T00550]       142    150   ggaaattat                0.03
NF-kappaB [T00590]    314    324   cgtggctcccc              0.03
YY1 [T00915]          293    296   atgg                     4.011
RXR-alpha [T01345]    271    277   tcaaccc                  0.07
AhR [T01795]          1067   1076  GCGCGCGTGC               0.03

Two points about the PROMO results should be considered first:

  • The transcription factors c-Myc, SRF, HIF-1, PU.1, HNF-4 and NRSF do not appear in the PROMO results, so PROMO does not predict them to bind the FUS promoter region.
  • The p values given by the Perl program and the RE query values given by PROMO differ considerably in magnitude, because the two programs use different types of matrix to analyze each transcription factor.

Now we compare both sets of results to determine which transcription factors are good candidates to bind the FUS promoter sequence. For a factor to qualify, its binding regions in the two results should approximately coincide, and its p value and RE query should be similarly low. For example, NF-AT1, which has the lowest p value (0.21), also has a low RE query value (0.03), which indicates a low probability of random binding. In addition, its binding region in the promoter sequence is approximately the same in the Perl program and PROMO results. From these data we conclude that NF-AT1 will probably bind the TLS/CHOP and FUS promoter sequences.
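
The region-matching part of this comparison can be sketched in Python as an interval-overlap check. This is only an illustration under stated assumptions: the coordinates are transcribed from the two tables and treated as 1-based inclusive positions on the same sequence, and the helper names `overlap_length` and `matching_factors` are hypothetical.

```python
# 1-based inclusive (start, end) binding regions transcribed from the tables.
perl_sites = {
    "AP-1": (480, 486), "AR": (1044, 1050), "NF-AT1": (143, 149),
    "NF-kappaB": (856, 864), "YY1": (294, 299), "RXR-alpha": (162, 167),
    "AhR": (1072, 1078),
}
promo_sites = {
    "AP-1": (672, 680), "AR": (1079, 1087), "NF-AT1": (142, 150),
    "NF-kappaB": (314, 324), "YY1": (293, 296), "RXR-alpha": (271, 277),
    "AhR": (1067, 1076),
}


def overlap_length(a, b):
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def matching_factors(first, second):
    """Map each factor present in both results to its overlap, if any."""
    matches = {}
    for tf in sorted(set(first) & set(second)):
        shared = overlap_length(first[tf], second[tf])
        if shared > 0:
            matches[tf] = shared
    return matches


matches = matching_factors(perl_sites, promo_sites)
print(matches)
```

With the coordinates above, only NF-AT1, AhR and YY1 have overlapping regions in the two results; the remaining shared factors are predicted at different positions by the two methods.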

Two other transcription factors, AhR and YY1, could also be considered candidates to bind the promoter sequence. AhR has low p and RE query values and binds a similar region in both results, but this region lies inside the first exon of the transcribed mRNA, so it is not a good option. YY1, on the other hand, matches the same binding region in both results but has a very high RE query value (4.011), which means that it is very likely to match the FUS or TLS/CHOP sequence by chance.
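
Whether a predicted site falls in the promoter or in the transcribed region follows directly from its coordinates. The Python sketch below assumes that positions are 1-based on the extracted sequence and that the TSS is nucleotide 1001 (1000 upstream nucleotides precede it, as described in the Methods section); the function names are hypothetical.

```python
UPSTREAM_LENGTH = 1000  # assumed: nucleotides before the TSS in the sequence


def relative_to_tss(position, upstream_length=UPSTREAM_LENGTH):
    """Convert a 1-based sequence position to a TSS-relative coordinate.

    The TSS (position upstream_length + 1) maps to +1 and the nucleotide
    just before it maps to -1; there is no coordinate 0.
    """
    offset = position - upstream_length
    return offset if offset > 0 else offset - 1


def site_location(start, end, upstream_length=UPSTREAM_LENGTH):
    """Classify an inclusive site as upstream, downstream or spanning the TSS."""
    if end <= upstream_length:
        return "upstream of TSS"
    if start > upstream_length:
        return "downstream of TSS"
    return "spans TSS"


for name, (start, end) in {"NF-AT1": (143, 149), "YY1": (294, 299),
                           "AhR": (1072, 1078)}.items():
    print(name, relative_to_tss(start), relative_to_tss(end),
          site_location(start, end))
```

Under these assumptions the AhR site reported by the Perl program lies at +72 to +78, downstream of the TSS, while the NF-AT1 and YY1 sites lie in the upstream region.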

To conclude this section we show the TLS/CHOP (FUS) promoter sequence again, with the binding region of NF-AT1 marked in purple and the binding regions of the other two transcription factors, AhR and YY1, marked in orange:

FUS promoter region

tttgcagttacaagacctggattcgaatcacgactcctcttagctgccctgtaatcaggcacaattacttgggtctctgagtctcactttccttatctag
aaaacggaggtatctttacttccttcgtaagactgatgacaaggaaattatctgtgcattttgaaaccacttaagccttgtacacgttttatttctggga
tcgccctggtagggcttcagaaaaataaaaaggaggtccctgagaaaaggctgggtaccgtacatctgaggtcaaccctctctggtcccaaggatggcct
gggctgttccgccccgtggctccccaggggcaaagccatgaggatccgggtgagagcccagtgctggacgagcccggggcccaggggtcccggccgaaat
ccctgctgtctttcaggtcaaacgtcataatccccgaaccccagaaaggccgaaaggcaaggcaaccctgaaagacgacgaagtcaacctcagggcgcag
gagagggagggccagtgtgctgccgacgagggaggctggagccgcggggacgaggcgccccatacagcggcaagagggtggagggcaggagctcgccatc
ctgggtgaaagcggggcccagcgaaggggcccggccacaggaatctcggttccaccccgctactcccggctgtgactccagtttcgtccccagccgccgg
gaccgccccctcgccccgcccccagcgggcactcaggccgtaccactgtgccttcatgggggtggagatagatcgtgggctagtcctgccgaggagagag
gggttcttcctcaaaaaatatgattatgtatagtattcgcatgattctagttaacttgtttcccttctgcctgctcggaccctctacctgccctacgaag
ggggcggagtgcgttcctgcctccccctgctcttccgcgtttggtgcgcgcctgcgcggtgcgtaggcggcggagcgtacttaagcttcgacgcaggagg
CGGGGCTGCTCAGTCCTCCAGGCGTCGGTACTCAGCGGTGTTGGAACTTCGTTG
CTTGCTTGCCTGTGCGCGCGTGCGCGGACATGGCCTCAAACG
gtag







Methods




As we said in the summary, we studied the hybrid protein using the information available for each original gene separately, because we found less information about TLS/CHOP than about its two parent genes.

Information on the function of the three proteins (the hybrid and the two originals) was taken from the literature sources listed in the References section.

All the information used to characterise the genomic structure of the hybrid gene involved in myxoid liposarcoma (TLS-CHOP) was obtained from the UCSC Genome Browser, NCBI and Ensembl. We also used the BLAST tools Blastn and Blastp to compare the hybrid gene with the FUS and DDIT3 genes, and the ClustalW software for a more detailed study of the amino acid conservation between the proteins. For the ClustalW alignment of the TLS/CHOP protein with the FUS protein we used the protein sequence of the first FUS transcript, because it conserves the original exons and is not affected by alternative splicing.

To study gene conservation and find out whether FUS and CHOP are highly conserved among species, we searched for information in Ensembl, NCBI and UCSC. Ensembl provides a lot of information on orthologous genes for DDIT3 but not for FUS, which is why we looked for more information in the other genome browsers. The genes without an Ensembl ID are therefore the homologous genes that we found in NCBI and UCSC.

To analyze the expression of TLS-CHOP in different tissues we consulted the expression of FUS in UCSC. We also looked up the expression of DDIT3 to compare it with that of FUS. In this genome browser the Gene Sorter gives access to tables showing expression, which is represented by a colour gradient. To make the tables clearer we increased the brightness of the colours.

For the promoter characterization, we first had to find the 5' region upstream of the TSS (Transcription Start Site) of the FUS gene using the information provided by the UCSC Genome Browser Database. We retrieved the DNA sequence of the 1000 nucleotides upstream of the TSS of the FUS gene and the 100 nucleotides downstream of it.

We then wrote a Perl program that reads a sequence (we used it with our promoter sequence) and, with the help of matrices for different transcription factors (TFs), finds which of those factors bind the sequence. The program has four parts:

  • The first part reads the matrix for each TF and stores the matrices as a hash of vectors.
  • The second part transforms this hash into a weight matrix, taking into account the nucleotide proportions of the input sequence.
  • The third part calculates the best score of the matrix over every possible binding position of the transcription factor along the sequence.
  • The final part calculates the p value for each transcription factor, that is, the probability of finding that TF at that position by chance.
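
The four parts listed above can be illustrated with a short sketch. It is written in Python rather than Perl and is not the authors' program: the function names, the pseudocount used for zero counts, the shuffling-based definition of the p value and the toy count matrix are all assumptions made for illustration.

```python
import math
import random

BASES = "ACGT"


def background(sequence):
    """Nucleotide proportions of the input (assumes only A/C/G/T, all present)."""
    seq = sequence.upper()
    return {b: seq.count(b) / len(seq) for b in BASES}


def weight_matrix(counts, bg, pseudocount=0.5):
    """Turn per-position count columns into log2-odds weights.

    The pseudocount is an assumption that avoids taking log(0).
    """
    weights = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        weights.append({
            b: math.log2(((column[b] + pseudocount) / total) / bg[b])
            for b in BASES
        })
    return weights


def best_hit(sequence, weights):
    """Return (score, 1-based start) of the highest-scoring window."""
    seq = sequence.upper()
    width = len(weights)
    best = (-math.inf, 0)
    for i in range(len(seq) - width + 1):
        score = sum(weights[j][seq[i + j]] for j in range(width))
        if score > best[0]:
            best = (score, i + 1)
    return best


def empirical_p_value(sequence, weights, observed, trials=200, seed=0):
    """Fraction of shuffled sequences whose best score reaches the observed one.

    Shuffling keeps length and nucleotide proportions; this is one assumed
    way to estimate a match arising by chance.
    """
    rng = random.Random(seed)
    letters = list(sequence.upper())
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if best_hit("".join(letters), weights)[0] >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Toy data: a hypothetical count matrix with consensus GGAAATT, not a real
# TF matrix, scanned against a short made-up sequence.
toy_counts = [{b: (8 if b == c else 0) for b in BASES} for c in "GGAAATT"]
toy_sequence = "acgtacgtggaaattacgtacgt"

weights = weight_matrix(toy_counts, background(toy_sequence))
score, start = best_hit(toy_sequence, weights)
p = empirical_p_value(toy_sequence, weights, score)
print(f"best score {score:.3f} at position {start}, p = {p:.3f}")
```

Shuffling the input preserves both its length and its nucleotide proportions, which keeps the null model comparable to the background used for the weights.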

The source code of the program is presented on a separate page, and so are its results.

In addition, we used the PROMO software. In the SelectSpecies section we chose only human factors and sites in order to reduce the number of TFs obtained. Even so, we obtained too many TFs, so we decided to lower the dissimilarity rate to 10 and to pay special attention only to the transcription factors analyzed with the Perl program.






Discussion




We begin this section by explaining what we think could happen with the hybrid protein TLS/CHOP. On the one hand, the FUS part does not conserve the RNA-binding region of the original protein, so it cannot stabilize cellular mRNA when it is transcribed. As we said before, this happens because this domain is replaced by the CHOP DNA-binding and dimerization domains. As a result, the FUS part of the hybrid protein would acquire a new function: binding to the promoters of other genes through the CHOP DNA-binding domain and driving the expression of oncogenic proteins that can help myxoid liposarcoma develop. On the other hand, although the whole CHOP protein is conserved in TLS/CHOP, the fact that it is fused to the FUS protein and that its normally untranslated second exon is translated would prevent the protein from folding correctly. This would be why it becomes a non-functional transcription factor, unable to bind its normal target genes, which include the adipocyte growth and differentiation genes. This second fact also contributes to the development of the disease.

We think that FUS was discovered recently because, as we said in the first section, it was named TLS when it was found to be involved in myxoid liposarcoma. Further evidence is that more orthologous genes are known for DDIT3 than for FUS: the Ensembl database lists many orthologs for CHOP but none for FUS. We would affirm that both genes are highly conserved among mammals (with the exception of DDIT3 in the cat, at only 51%), which suggests that both play an important role in the organism. DDIT3 in particular has an important function because it not only controls the differentiation of a specific tissue but also acts as a tumour suppressor gene.

As the tables in the expression characterization section show, FUS is highly expressed in thymus, thyroid, CD4+ T cells and testis. Its expression is above normal, though less intense, in blood and lung. It also has low expression in many different organs such as kidney, liver, heart, pancreas, skin, stomach and brain. The results for DDIT3 show that it is also expressed in a wide variety of tissues. DDIT3 is highly expressed in testis, thymus, lymph node, stomach, lung, trachea and bone marrow, and has low expression in ovary, heart, liver, kidney, blood and pancreas. We think this could be related to the fact that these organs contain little fat tissue. As CHOP is involved in the control of cell division and FUS stabilizes mRNA, it is natural that they are so widely expressed.

We used different strategies to find the TFs that bind the TLS-CHOP promoter. The results are not very reliable, because the two methods do not agree exactly for all the TFs. Only one transcription factor roughly matches in both methods and is likely to bind the promoter: NF-AT1. In addition, in the literature we found studies demonstrating that three TFs can bind the TLS-CHOP promoter: AP-2, GFC and Sp-1. None of them is NF-AT1. Our Perl program did not analyze AP-2 but the similarly named AP-1, which belongs to a different transcription factor family; with a p value of 0.24, AP-1 could also be considered a candidate.






References



  1. Crozat, A., Aman, P., Mandahl, N., Ron, D. Fusion of CHOP to a novel RNA-binding protein in human myxoid liposarcoma. Nature 363(6430):640-4. 17 Jun 1993

  2. Thelin-Jarnum, S., Lassen, C., Panagopoulos, I., Mandahl, N., Aman, P. Identification of genes differentially expressed in TLS-CHOP carrying myxoid liposarcomas. Int J Cancer 83(1):30-3. 24 Sept 1999

  3. Rabbitts, T.H., Forster, A., Larson, R., Nathan, P. Fusion of the dominant negative transcription regulator CHOP with a novel gene FUS by translocation t(12;16) in malignant liposarcoma. Nat Genet 4(2):175-80. Jun 1993

  4. Atlas of Genetics and Cytogenetics in Oncology and Hematology

  5. Online Book: Cancer Medicine. 6th ed. Kufe, Donald W.; Pollock, Raphael E.; Weichselbaum, Ralph R.; Bast, Robert C., Jr.; Gansler, Ted S.; Holland, James F.; Frei III, Emil, editors. Hamilton (Canada): BC Decker Inc; c2003

  6. OMIM data base: *137070 FUSION, DERIVES FROM 12-16 TRANSLOCATION, MALIGNANT LIPOSARCOMA;FUS

  7. Ensembl Genome Browser Database

  8. UCSC Genome Browser Database

  9. NCBI Database

  10. ClustalW

  11. PROMO Software


