Chimeric MOZ-ASXH2 fusion protein

Manuel Gomez (manuel.gomez01@upf.edu), Sonia Laguna (sonia.laguna01@upf.edu) and Samanta Yubero (samanta.yubero01@upf.edu)

Faculty of Health and Life Science

Pompeu Fabra University

4th course of Biology

Index

Abstract
Results
Methods and procedures
Discussion
References

In this project we discuss protein number 2, a fusion protein encoded by the genes Moz and Asxh2. These two genes lie on two separate chromosomes and encode two different proteins. However, Moz is sometimes translocated and fused to Asxh2, which may alter histone acetylation and result in the development of therapy-related myelodysplastic syndrome (t-MDS). This syndrome is an adverse effect of successful acute myeloblastic leukemia treatment, and its symptoms develop 17 months after the treatment. Spectral karyotyping analysis detected a chromosomal translocation, t(2;8)(p23.3;p11.2), and fluorescence in situ hybridization analysis suggested a rearrangement in the monocytic leukemia zinc finger (MOZ) gene. The protein encoded by this gene has histone acetyltransferase (HAT) activity, which plays fundamental roles in regulating chromatin structure and remodeling, transcription and other nuclear processes in leukemogenesis.

Results

The results are presented in separate parts: genomic structure characterisation, homology study, gene expression, gene promoter region study and functionality.

a) Genomic structure characterisation

Moz gene

The MOZ gene is also known as MYST3 (MYST histone acetyltransferase 3). It is located on chromosome 8, at p11. It has only one transcript.
Transcript 1: It has 17 exons. The first exon, half of the second and half of the last one do not code for protein. Therefore only 16 exons are coding and are translated into protein, and 13 of them are used to make our fusion protein.

Figure 1. In this picture, Moz exons are coloured. Yellow means non-coding region and green means coding region.

Moz localisation. Here you can find the position of Moz on the chromosome.

Figure 2. In UCSC we found out how many transcripts each gene had and compared this with the Ensembl information. It confirmed that Moz is a gene with only one transcript, located on chromosome 8. In UCSC we also got more information about homology, gene expression and conserved domains, which we explain later. In this graphic we can see that at the beginning of the gene sequence there is significant conservation among species such as mouse, rabbit, rat, dog, armadillo and others.

Asxh2 gene

The ASXH2 gene belongs to the group of Polycomb proteins and is also known as ASXL2 (additional sex combs like 2). It is located on chromosome 2, from p23.3 to p24.1. This gene has two transcripts:

Transcript 1: This transcript has 13 exons; the first to the seventh and half of the eighth are non-coding. When it is translated it forms a protein from 6 exons, all of which are used to make the fusion protein.
Transcript 2: This transcript has 12 exons; half of the first and the last are non-coding. When it is translated it forms a protein from 12 exons, all of which are used to make the fusion protein.

Figure 3. In this picture, Asxh2 exons are coloured. Yellow means non-coding region and red means coding region.

Asxh2 localisation. Here you can find the position of Asxh2 on the chromosome.

Figure 4. UCSC shows that in Ensembl this gene had 4 transcripts but in Ensembl we found only 2. UCSC confirmed that the Asxh2 gene had 2 transcripts. It is located on chromosome 2. In this graphic we can see that there are 3 regions highly conserved among the same species as for Moz: one at the beginning and two in the middle of the gene sequence.

Relation between the two transcripts:

We have seen that the two transcripts have the same coordinates for all exons except numbers 3 and 13.
We also checked that exon 12 of transcript 2 is made of the whole of exon 12 of transcript 1, plus the intron between exons 12 and 13 of that transcript, plus exon 13.
There is a codon deletion in intron 2-3 of transcript 1. To compensate for this change of coordinates, there is a codon insertion in exon 3, after which the coordinates of the two transcripts become equal again.

Moz - Asxh2 fusion

Figure 5. This fusion protein is expressed in bone marrow in a pediatric case of therapy-related myelodysplastic syndrome (t-MDS). The Moz gene on chromosome 8p11.2 was found to be rearranged to the 2p23 locus. Exons 2-14 of the Moz gene are fused to exons 2-13 of Asxh2 due to t(2;8)(p23.3;p11.2).

The Moz gene has only one transcript, so it has only one isoform. The Asxh2 gene has two alternative transcripts with different numbers of exons, so we can say that there is alternative splicing. The two isoforms maintain the same open reading frame because the two transcripts are composed of the same codons. The differences between the isoforms affect both coding and non-coding exons.

Figure 6. In this picture we can see that the open reading frame is the same in the two transcripts.

The fusion protein only considers transcript 2 of the Asxh2 gene because transcript 1 is a part of transcript 2.

Figure 7. This sequence is our protein. Red corresponds to the Moz gene; green and blue correspond to transcript 2 of the Asxh2 gene, and the sequence in blue to transcript 1.

b) Homology study

Homology tables

We studied all the ortholog species of each gene through the Ensembl and NCBI databases and found that the Moz and Asxh2 genes have orthologs in many common species. All of them are shown in the tables below.

Ensembl database

Relation (Moz)	Ortholog species	Moz gene	Pictures	Ortholog species	Asxh2 gene	Relation (Asxh2)
Best Reciprocal Hit	Pan troglodytes	99%	[picture]	Pan troglodytes	99%	Best Reciprocal Hit
One to one	Bos taurus	89%	[picture]	Bos taurus	86%	One to one
One to one	Rattus norvegicus	88%	[picture]	Rattus norvegicus	74%	One to one
One to one	Mus musculus	88%	[picture]	Mus musculus	77%	One to one
One to one	Echinops telfairi	87%	[picture]	Echinops telfairi	71%	One to one
One to one (apparent)	Canis familiaris	83%	[picture]	Canis familiaris	77%	One to one
One to one (apparent)	Macaca mulatta	83%	[picture]	Macaca mulatta	96%	One to one
One to one	Gallus gallus	83%	[picture]	Gallus gallus	59%	One to one
One to one	Tupaia belangeri	82%	[picture]	Tupaia belangeri	82%	One to one
One to one	Monodelphis domestica	79%	[picture]	Monodelphis domestica	66%	One to one
One to one	Loxodonta africana	77%	[picture]	Loxodonta africana	62%	One to one
One to one	Erinaceus europaeus	75%	[picture]	Erinaceus europaeus	61%	One to one
One to one	Dasypus novemcinctus	68%	[picture]	Dasypus novemcinctus	81%	One to one
One to one	Cavia porcellus	65%	[picture]	Cavia porcellus	70%	One to one
One to one	Ornithorhynchus anatinus	65%	[picture]	Ornithorhynchus anatinus	60%	One to one
One to one	Felis catus	58%	[picture]	Felis catus	78%	One to one
One to one	Danio rerio	58%	[picture]	Danio rerio	35%	One to one
One to one	Tetraodon nigroviridis	58%	[picture]	Tetraodon nigroviridis	19%	One to one
One to one	Takifugu rubripes	56%	[picture]	Takifugu rubripes	32%	One to one
One to one	Gasterosteus aculeatus	54%	[picture]	Gasterosteus aculeatus	34%	One to one
One to one	Oryctolagus cuniculus	32%	[picture]	Oryctolagus cuniculus	66%	One to one
One to one	Aedes aegypti	9%	[picture]	Aedes aegypti	12%	One to one
One to one	Xenopus tropicalis	59%	[picture]	-	-	-
One to one	Oryzias latipes	36%	[picture]	-	-	-
One to one	Drosophila melanogaster	25%	[picture]	-	-	-
One to many	Anopheles gambiae	8%	[picture]	-	-	-
One to many	Ciona savignyi	7%	[picture]	-	-	-
One to many	Ciona intestinalis	7%	[picture]	-	-	-
One to many	Saccharomyces cerevisiae	6%	[picture]	-	-	-

In the table, the first and the last columns give the relation between the two species compared. We found that Pan troglodytes is the best ortholog for the Moz and Asxh2 genes of Homo sapiens and that the relation is reciprocal: it is the Best Reciprocal Hit.

One to one means that one gene of Homo sapiens is similar to one gene in another species.
One to many means that one gene of Homo sapiens is similar to many genes of another species, possibly because there have been duplications.
Many to many means that many genes of Homo sapiens are similar to many genes of another species.
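
As an illustration of these three categories, the relation type can be derived from a list of gene pairs. This is only a minimal sketch: the function name and the gene identifiers in the usage note are hypothetical, it is not how Ensembl computes its labels, and as a simplifying assumption the reverse case (many human genes to one gene of the other species) is also reported as "one to many".

```python
from collections import defaultdict


def classify_orthology(pairs):
    """Label each (human_gene, other_gene) pair with its relation type."""
    human_to_other = defaultdict(set)
    other_to_human = defaultdict(set)
    for human_gene, other_gene in pairs:
        human_to_other[human_gene].add(other_gene)
        other_to_human[other_gene].add(human_gene)

    labels = {}
    for human_gene, other_gene in pairs:
        # Does the human gene have several partners, and vice versa?
        many_other = len(human_to_other[human_gene]) > 1
        many_human = len(other_to_human[other_gene]) > 1
        if many_other and many_human:
            labels[(human_gene, other_gene)] = "many to many"
        elif many_other or many_human:
            # Assumption: both asymmetric cases share one label.
            labels[(human_gene, other_gene)] = "one to many"
        else:
            labels[(human_gene, other_gene)] = "one to one"
    return labels
```

For example, with the made-up pairs `[("MOZ", "moz_a"), ("MOZ", "moz_b"), ("ASXH2", "asxh2_x")]`, both MOZ pairs are labelled "one to many" and the ASXH2 pair "one to one".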

The third and sixth columns give the % homology, from highest to lowest.

Here we have the ortholog species shared by the two genes.

Ortholog species	Moz gene	Asxh2 gene	Pictures
Pan troglodytes	99%	99%	[picture]
Bos taurus	89%	86%	[picture]
Rattus norvegicus	88%	74%	[picture]
Mus musculus	88%	77%	[picture]
Echinops telfairi	87%	71%	[picture]
Canis familiaris	83%	77%	[picture]
Macaca mulatta	83%	96%	[picture]
Gallus gallus	83%	59%	[picture]
Tupaia belangeri	82%	82%	[picture]
Monodelphis domestica	79%	66%	[picture]
Loxodonta africana	77%	62%	[picture]
Erinaceus europaeus	75%	61%	[picture]
Dasypus novemcinctus	68%	81%	[picture]
Cavia porcellus	65%	70%	[picture]
Ornithorhynchus anatinus	65%	60%	[picture]
Felis catus	58%	78%	[picture]
Danio rerio	58%	35%	[picture]
Tetraodon nigroviridis	58%	19%	[picture]
Takifugu rubripes	56%	32%	[picture]
Gasterosteus aculeatus	54%	34%	[picture]
Oryctolagus cuniculus	32%	66%	[picture]
Aedes aegypti	9%	12%	[picture]

Ncbi database

The NCBI database results for the Moz gene are in the table below.

Figure 8. In this picture we see the homology between the Moz gene of Homo sapiens and that of other species.

d: the number of nucleotide substitutions per site, corrected for multiple substitutions using the method of Jukes and Cantor.
dN/dS: the ratio of the rate of nonsynonymous substitutions (dN) to the rate of synonymous substitutions (dS), calculated using the method of Nei and Gojobori. A high value of this metric indicates adaptive selection, whereas a low value indicates purifying selection.
dNR/dNC: the ratio of radical nonsynonymous substitutions (dNR) to conservative nonsynonymous substitutions (dNC), calculated using the method of Hughes. This metric is analogous to dN/dS, but it has the advantage of being useful for studying the evolution of sequences that diverged in the distant past.
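
To make the first metric concrete: under the Jukes and Cantor model, the observed proportion p of differing sites between two aligned nucleotide sequences is corrected to d = -(3/4) ln(1 - (4/3) p). The sketch below is only illustrative; the function names are our own, the example sequences in the usage note are invented, and it is not the code NCBI uses.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites in two aligned sequences (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) from the observed proportion p."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the corrected distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For the toy pair `"ACGT"` and `"ACGA"`, `p_distance` gives 0.25 and `jukes_cantor(0.25)` is slightly larger than 0.25, because the correction accounts for multiple substitutions at the same site.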

NCBI has fewer ortholog species than Ensembl, but all of them are very similar in their % homology. However, A. gambiae shows different results.

NCBI database results for the Asxh2 gene.

Figure 9. In this picture we see the homology between the Asxh2 gene of Homo sapiens and that of other species.

NCBI has fewer ortholog species than Ensembl, but all of them are very similar in their % homology. However, R. norvegicus and M. musculus show different results.

Gene evolution

To study the evolution of the genes we built a tree for each gene. There are differences from the Ensembl homology tables because in the trees all sequences are compared with each other, whereas in Ensembl only our sequence was compared to the rest.

It is important to say that there are species in the Ensembl tables that are not in the phylogenetic trees. This is because some species were so distant from the rest that the MEGA program could not include them in the alignment.

Phylogenetic tree of Moz gene

Figure 10. The numbers on the tree indicate how reliable each grouping of species is. This tree is not very reliable because some of the numbers are too low.

The longer the branch, the more distant the species is from Homo sapiens. Thus this phylogenetic tree shows that the Moz gene in Homo sapiens is very similar to the Moz gene in Pan troglodytes and less similar to that in B. taurus and M. mulatta. The most distant species is Aedes.

Phylogenetic tree of Asxh2 gene

Figure 11. This phylogenetic tree shows that the Asxh2 gene in Homo sapiens is very similar to the Asxh2 gene in Pan troglodytes and less similar to that in M. mulatta. The most distant species are Takifugu and Tetraodon.

c) Gene expression

Moz and Asxh2 have been studied separately because the fusion protein is only expressed when the myelodysplastic syndrome occurs. Red means high expression, green low expression and black normal expression; no colour means that the gene is not expressed.

The expression of Moz has been studied in normal tissues, but Asxh2 is not expressed in normal tissues, so we had to study this gene in another database (GNF Atlas 2).

Moz gene

High expression	Low expression	Normal expression
Brain frontal cortex, Brain temporal cortex	Thyroid, Thymus, Lymph node, Adrenal, Embryonic Stem Cells, Ovary, Lung, Heart, Kidney, Buffycoat	Testes, Small bowel duodenum

Figure 12. This table shows Moz gene expression in normal tissues. The information is in the next link (Page in UCSC)

Asxh2 gene

High expression	Low expression	Normal expression
PB-CD4+ Tcells, Thymus, Bone marrow, Pancreatic islets	Heart, Adipocyte, Fetal brain, Ovary, Liver	Amygdala, Whole brain, Skin, Testis, Lung, Kidney

Figure 13. This table shows Asxh2 gene expression in tissues. The information is in the next link (Page in UCSC)

d) Gene promoter region

Figure 14. As we can see in the table, the first four columns correspond to the results of our program and the fifth column to the results of the PROMO web page.

To select the transcription factors (TFs), we first chose the factors with a low p-value and a high score. The score measures how well the TF binds to the promoter sequence. The p-value is the probability of finding the maximum score in random sequences, so a low value shows that such a match between the TF and the promoter sequence is unlikely to arise by chance, and it is then reasonable to think that this TF binds to the promoter region. We have 5 TFs for the Moz gene (NF-AT1, c-Myc, YY1, AhR and PU.1) and 2 for the Asxl2 gene (NF-kappaB and SRF) with a good score.
After this, we compared these results with the TFs proposed by PROMO.

The difference between RE query and RE equally is that RE query uses a random sequence with the same nucleotide proportions as our sequence, whereas RE equally uses a random sequence in which every nucleotide has the same proportion (25%). Therefore we think that RE query suits our sequence better because it has the same nucleotide proportions.
Finally we selected the two factors in bold red as the probable transcription factors of the Moz gene, and the factor in bold red for the Asxl2 gene.

The score, position and p-value of each factor were calculated with the transcription factor prediction program.

To see the code, follow this link: Program source code
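
As a rough illustration of how a score, a position and an empirical p-value of this kind can be obtained, here is a minimal sketch. It is not the program linked above: the count matrix, the sequences and all function names are hypothetical, the score is a simple log-odds sum over a position weight matrix, and the p-value is estimated by scoring random sequences. The `background` argument mimics the two PROMO-style options, equal nucleotide proportions (`UNIFORM`) or the composition of the query (`composition(sequence)`). Sequences are assumed to contain only the upper-case letters A, C, G and T.

```python
import math
import random

BASES = "ACGT"
UNIFORM = {base: 0.25 for base in BASES}


def log_odds_matrix(counts, background=UNIFORM, pseudocount=1.0):
    """Convert per-position base counts into log-odds scores against a background."""
    matrix = []
    for column in counts:
        total = sum(column[base] for base in BASES) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in BASES
        })
    return matrix


def best_site(sequence, matrix):
    """Return (maximum score, 0-based start position) over all windows."""
    width = len(matrix)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score > best[0]:
            best = (score, start)
    return best


def composition(sequence):
    """Nucleotide proportions of the query sequence."""
    return {base: sequence.count(base) / len(sequence) for base in BASES}


def empirical_p_value(sequence, matrix, background, trials=1000, seed=0):
    """Fraction of random sequences whose best score reaches the observed one."""
    rng = random.Random(seed)
    observed, _ = best_site(sequence, matrix)
    weights = [background[base] for base in BASES]
    hits = 0
    for _ in range(trials):
        rand_seq = "".join(rng.choices(BASES, weights=weights, k=len(sequence)))
        if best_site(rand_seq, matrix)[0] >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

A low value returned by `empirical_p_value` means that random sequences rarely reach the observed maximum score, which is the reasoning used above to keep a factor.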

e) Study of gene functionality

First of all we study the functionality of each gene separately, and then we see when and for what purpose the fusion protein is expressed.

1. Moz gene

1. Structural information

Figure 15. This picture shows the Moz gene domains

The product of this gene is found in the nucleus. The gene encodes a protein of 2004 amino acids, 225 kDa. Its domains are:

NEMM domain (N-term region of MOZ) including a H15 (linker H1 and H5 like) nuclear localization domain. Linker histone H1 is an essential component of chromatin structure. H1 links nucleosomes into higher order structures. Histone H5 performs the same function as histone H1, and replaces H1 in certain cells.
Two PHD (plant homeodomain) fingers, also known as LAP (leukemia associated protein). The PHD finger is a C4HC3 zinc-finger-like motif found in nuclear proteins thought to be involved in epigenetics and chromatin-mediated transcriptional regulation. The PHD finger binds two zinc ions using the so-called 'cross-brace' motif and is thus structurally related to the RING finger and the FYVE finger. It is not yet known if PHD fingers have a common molecular function. Some reports suggest that the PHD finger acts as a ubiquitin ligase, or in protein binding or zinc ion binding.
Essential part of the histone acetyltransferase domain (HAT MOZ-SAS). HATs play fundamental roles in regulating chromatin remodeling, transcription, and other nuclear processes. The domain contains a C2HC Zn finger, which is another variant of the RING finger. Moz is a monocytic leukemia Zn finger protein and it was reported to be homologous to acetyltransferases.
An acidic (Glu-Asp) domain, where the breakpoints of inv(8) and t(8;22) are located (position 1118), and a Ser-(Pro-Glu)-Met rich domain, where the t(8;16) breakpoint is located (position 1547).
Our fusion protein results from a Moz translocation to chromosome 2, and its breakpoint is not shown in the picture. We know that the chimeric Moz-Asxh2 fusion protein contains Moz exons up to number 14, as we said at the beginning. Therefore we looked for the position where exon 14 ends, which we think could be the breakpoint. It is at amino acid 1013, which is near the other known translocation breakpoints.
The following link shows the Moz protein sequence, with the different exons in alternating colours (blue and black). The breakpoint is thought to be at amino acid 1013, which is the first residue of the fifteenth exon and is shown in yellow. Moz breakpoint

2. Moz domain logos

In InterPro, through Ensembl, we found the Moz domains. Then, in Pfam, we made the logos of the 4 most important domains of this gene. The logos show the most conserved amino acid at every position: the taller the letter, the more conserved it has been during evolution.

Here you can find the logo of MOZ-SAS domain. MOZ-SAS domain
Here you can find the logo of PHD domain. PHD domain
Here you can find the logo of zinc finger domain. Zinc finger domain
Here you can find the logo of histone domain. Histone domain

3. Moz protein conservation

Here we show the conservation of the Moz protein among human proteins.

On the EMBL page we found a database called STRING, which gave us some more information about the protein.

The next link shows the conservation of the Moz protein with respect to other human proteins. This protein is very similar to another of the same family, Myst4. Other human proteins share only some regions. Human proteins

This gene presents some SNPs in its sequence, which means that single nucleotides vary between individuals. In the table below we can see that there are more synonymous changes than non-synonymous ones. This means that, despite the SNPs, the amino acids often remain the same, probably because they are important for the function of the protein.
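
The distinction between synonymous and non-synonymous changes can be checked codon by codon with the standard genetic code. The following is a minimal sketch; the function name and the example codons are ours and are not taken from the Ensembl SNP table.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, new_base):
    """Classify a single-base change at a codon position (0, 1 or 2).

    Returns (label, reference amino acid, altered amino acid); "*" is a stop codon.
    """
    codon = codon.upper().replace("U", "T")
    new_base = new_base.upper().replace("U", "T")
    if len(codon) != 3 or position not in (0, 1, 2):
        raise ValueError("expected a 3-letter codon and a position of 0, 1 or 2")
    mutated = codon[:position] + new_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return label, ref_aa, alt_aa
```

For instance, changing the third base of GAA to G still encodes glutamate (synonymous), whereas changing its first base to A gives lysine (non-synonymous).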

4. Functionality

In Gene Ontology there are 3 functions described:

Acetyltransferase activity: catalysis of the transfer of an acetyl group to an acceptor molecule.
Histone acetyltransferase activity: catalysis of the reaction acetyl-CoA + histone = CoA + acetyl-histone. See the reaction
Histone acetylation is one major mechanism by which chromatin structure and function are regulated. Aberrant acetylation has been linked to the development of various human diseases. Through acetylating histone and nonhistone proteins, histone acetyltransferases (HATs) play fundamental roles in regulating chromatin remodeling, transcription, and other nuclear processes.
Transcription factor binding: interacts selectively with a transcription factor, any protein required to initiate or regulate transcription.

We have seen that this gene is involved in some important biological processes such as DNA packaging, histone acetylation, myeloid cell differentiation, and negative and positive regulation of transcription. It has been seen that when Moz is involved in DNA packaging it can be associated with Asxh1, a protein from the same family as Asxh2.

Moz has both transcription activation and transcription repression domains. The N-terminus is involved in transcriptional activation while the C-terminus is involved in transcriptional repression. This histone acetyltransferase may act as a transcriptional coactivator for Runx1 and Runx2.

The Runx family is characterized by a highly conserved region of 128 amino acids, termed the Runt domain. The Runt domain is responsible for DNA binding and heterodimerization with CBFB (PEBP2b), which increases its DNA-binding affinity and also stabilizes Runx proteins against proteolytic degradation. The C-terminal portion is rich in proline, serine and threonine (PST region) and contains functional domains acting to regulate transcription.
Runx1 is a transcription factor (activator) of the Runx family that regulates various hematopoietic-specific genes: it binds to a large number of promoters and enhancers. Runx1 (runt-related transcription factor 1) is also known as AML1 (acute myeloid leukemia 1) and is located at 21q22.3.
Runx2 is an osteoblast-specific transcription factor that plays a central role in osteoblast differentiation, chondrocyte maturation, bone formation and remodeling. It is a key target of mechanical signals that affect bone biology and is located at 6p21.
Both are expressed in the nucleus.

2. Asxh2 gene

1. Structural information

Asxh2 is related to the Asx Polycomb group proteins, implicated in embryogenesis and carcinogenesis through transcriptional regulation of target genes. ASXH1 is one of the human homologs of Drosophila Asx. A bioinformatics search for ASXL1-related genes within the human genome identified the ASXL2 gene.
Human ASXL2 (1435 aa) showed 79.4% total-amino-acid identity with mouse Asxl2 (1370 aa), and 29.8% total-amino-acid identity with human ASXL1.

In PubMed we found that proteins from this family have three domains that are conserved between human ASXL2 and ASXL1. However, this information does not correspond to what is given in protein databases such as InterPro, Pfam or UniProt.

ASXN domain (codon 1-86 of ASXL2)
ASXM domain (codon 269-380 of ASXL2)
PHD domain (codon 1400-1431 of ASXL2)

In Pfam we found that this gene has many low complexity domains and one coiled coil. Low complexity domains are blue in the picture and the coiled coil is green.

Figure 16. Asxh2 gene domains.

2. Asxh2 domain logos

As we said before, we did not find any important domain in the Asxh2 gene. For this reason, we could not make any logo.

3. Asxh2 gene conservation

Here we show the conservation of the Asxh2 protein among human proteins.

On the EMBL page we found a database called STRING, which gave us some more information about the protein.

The next link shows the conservation of the Asxh2 protein with respect to other human proteins. This protein is very similar to another of the same family, Asxh1. Other human proteins share only some regions. Human proteins

This gene presents some SNPs in its sequence, which means that single nucleotides vary between individuals. In the table below we can see that there are more non-synonymous changes than synonymous ones. This means that more of the changes alter an amino acid, so the affected positions are probably not important for the function of the protein.

4. Functionality

Polycomb group and trithorax group proteins are implicated in embryogenesis and carcinogenesis due to transcriptional regulation of target genes through histone modification and chromatin remodeling. Based on functional conservation and human chromosomal localization, the ASXL2 and ASXL1 genes were predicted to be cancer-associated genes. It is also important to say that Drosophila Asx mutations exhibit anterior and posterior transformations, and that Drosophila Asx has been shown to be one of the ETP (Enhancers of trithorax and Polycomb) genes, with dual functions in transcriptional activation and silencing.

3. MOZ-ASXH2 fusion protein

1. Structural information

In UCSC we found that the chimeric Moz-Asxh2 fusion protein has four domains, but in Pfam we found only 2 and in InterPro 6 different domains. To clarify this, we looked for pictures showing the Moz-Asxh2 fusion protein, and we think that it has two PHD domains and the HAT MOZ-SAS domain from the Moz gene, fused to some low complexity domains of the Asxh2 gene.

Znf_PHD domain. It is the structural unit of the PHD domain.
MOZ_SAS domain. It is the structural unit of the histone acetyltransferase.

Figure 17. Chimeric Moz-Asxh2 fusion protein. We can see the two Moz domains (PHD and Moz-SAS) and the low complexity Asxh2 domains.

2. Functionality

This fusion protein is expressed in bone marrow in a pediatric case of therapy-related myelodysplastic syndrome (t-MDS). The MOZ gene on chromosome 8p11.2 was found to be rearranged to the 2p23 locus, and exons 2-13 of the ASXL2 gene are fused to exons 2-14 of the MYST3 gene due to t(2;8)(p23.3;p11.2). By genomic cloning of the breakpoint, a novel MOZ-ASXH2 fusion was identified. Because MOZ also modulates transcription by regulating local histone acetylation, MOZ-ASXH2 might cause the development of MDS by inducing an aberrant local chromatin structure and abnormal gene expression. It has one subunit that is a component of the NuA4 histone acetyltransferase complex, which is involved in epigenetic transcriptional activation of selected genes, principally by acetylation of nucleosomal histones H4, H3, H2B and H2A. Acetylation of histone H4 is essential for DNA double-strand break repair through homologous recombination.

Myelodysplastic syndrome (MDS) is a pre-cancerous disorder in which there is impaired production of the normal components of blood (white cells, red cells and platelets).

MDS is frequently associated with fatigue, shortness of breath, infection and serious bleeding and patients are at risk of developing frank acute leukemia. Some individuals with MDS survive for years with little treatment required; others have complications relating to their low blood counts from the time of diagnosis, particularly if their white cell or platelet count is very low. One of the best predictors of outcome in MDS is the chromosomal content of the bone marrow cells and this requires a bone marrow sample to be taken. This test is helpful in assessing whether a patient with MDS is likely to develop acute leukemia within weeks, months or years, a development, which is usually associated with more symptoms for the patient.

Figure 18. Two large plasma cells; one with multiple vacuoles, two of which are over the nucleus. Marrow 100X

Treatment

Transfusions: Red cell and platelet transfusions are helpful in relieving the symptoms of MDS.
Chemotherapy: Intravenous chemotherapy is effective in producing a remission (correction of low blood counts and disappearance of abnormal cells from the bone marrow), in 70% of patients with frank acute leukemia. The same therapy is much less effective in MDS with less than 50% of patients achieving a remission. In addition, in patients who enter a remission, the MDS always recurs, often very quickly after completion of treatment. Furthermore, neither duration of survival nor quality of life is improved in patients responding to chemotherapy.
Bone Marrow Transplantation (BMT): High-dose chemotherapy, with or without radiation followed by a related or unrelated donor Bone Marrow Transplant is the only curative therapy available for patients with MDS. In Vancouver, 30% of MDS patients have been cured by this procedure although the results have not been as good for those patients with certain chromosomal abnormalities in their bone marrow cells.
Non-myeloablative - mini-transplant: This new procedure allows stem cell transplants to be performed more safely in patients 56-65 years old. Conditioning chemotherapy is low-dose and is designed only to suppress the patient's immune system enough to accept the donor's cells. The risk of a mini-transplant is still significant and it is not clear whether this procedure can cure patients with MDS.
5-Azacytidine: This is a chemotherapy drug, which has been shown to prolong survival and to improve quality of life in MDS patients. This drug is given by injection for seven days each month. It can decrease the likelihood of infection or the need for a transfusion.
Thalidomide: can reduce the red cell transfusion needs in 30% of patients with early MDS. It is expensive and causes drowsiness and constipation as common side effects.

Methods and procedures

First of all we ran a BLAST search in the NCBI database with our protein number two. From it we learned that our protein was a fusion protein and the names of the genes: Moz and Asxh2.

Then we studied the gene structure, first of both genes separately and later of the fused genes. Using Ensembl we found all the information about transcripts, localisation and length of the genes, and a short introduction to gene function. We found out how many transcripts each gene had and how many exons each transcript had. In the exon information we found which exons were coding and which non-coding, and which of them encoded our protein. We also checked that joining the two parts of the translated sequence of each gene gave the same protein sequence as protein number 2. Therefore, the two genes were fused without insertion or deletion. All the information from Ensembl was compared with the UCSC genome browser to make sure it was right.
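
The check that the fusion protein is exactly the N-terminal part of one protein joined to the C-terminal part of the other, with no insertion or deletion, can be sketched as below. The function name and the short sequences in the usage note are hypothetical stand-ins for the real Moz, Asxh2 and fusion protein sequences.

```python
def find_fusion_junctions(fusion, n_partner, c_partner):
    """Return every split index k such that fusion[:k] occurs in n_partner
    and fusion[k:] occurs in c_partner.

    An empty list means the fusion cannot be explained as a clean join,
    for example because residues were inserted or deleted at the junction.
    """
    junctions = []
    for k in range(1, len(fusion)):
        if fusion[:k] in n_partner and fusion[k:] in c_partner:
            junctions.append(k)
    return junctions
```

With the toy sequences `n_partner = "MKTAYIAKQR"`, `c_partner = "GSHMLEDPVA"` and `fusion = "MKTAYILEDPVA"`, the function returns `[6]`: the first six residues come from the first protein and the rest from the second. More than one index can be returned when the residues at the junction are shared by both proteins.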

In the transcript information we found that one gene had two different transcripts, which had the same sequence but different coding exons. From this we knew that there was alternative splicing and hence different isoforms. There we also found information about the open reading frame.

We studied homology in Ensembl, where we found the ortholog species of each gene. After that we made a table with the orthologs of each gene separately and another one with the shared orthologs. In this table we also explained the relation between species (one to one, one to many or many to many). All this information was compared with the NCBI database to make sure it was right, and then we made another table with the results obtained.

To study the evolution of the genes we built a tree for each gene. To do that, we got the sequences of every transcript of all orthologs found in Ensembl. We then aligned these sequences with ClustalW and used Bioedit to convert the alignment to FASTA format. Finally, in the MEGA program we converted it to MEGA format and obtained the trees by neighbour joining.
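
The format conversion step could also be scripted. The sketch below is not the procedure we followed (we used Bioedit); it assumes a standard ClustalW alignment layout, that is, a header line starting with "CLUSTAL", blocks of "name sequence" lines, and conservation lines that start with a space. The function name is our own.

```python
def clustal_to_fasta(clustal_text, width=60):
    """Convert a ClustalW-style alignment (as text) into FASTA text."""
    sequences = {}  # name -> list of aligned chunks, in order of appearance
    for line in clustal_text.splitlines():
        # Skip blank lines, the header and the conservation ("*:.") lines.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        sequences.setdefault(name, []).append(chunk)

    lines = []
    for name, chunks in sequences.items():
        aligned = "".join(chunks)
        lines.append(">" + name)
        # Wrap the aligned sequence (gaps kept) at a fixed line width.
        for start in range(0, len(aligned), width):
            lines.append(aligned[start:start + width])
    return "\n".join(lines) + "\n"
```

Gap characters are kept, so the FASTA output is still an alignment and can be passed on to a tree-building program.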

In the Gene Sorter in UCSC we studied gene expression. We modified the brightness of the colours to see better in which tissues the gene was expressed. The expression of Moz has been studied in normal tissues, but Asxh2 is not expressed in normal tissues, so we had to study this gene in another database (GNF Atlas 2).

To characterise the gene promoter region we first got the promoter sequence, upstream and downstream, of each gene in UCSC. We pasted these sequences together into PROMO with a dissimilarity factor below 10%. There we obtained all the transcription factors predicted to bind to the promoter sequences of our genes. After that we wrote a computer program following the instructions given.

Through some web pages we found the domains of each gene. We explained the functionality of each domain and we built some logos with InterPro. On the EMBL page we found a database called STRING, which gave us some extra information about the whole protein of each gene. In Ensembl we found the SNPs of each gene and we put this information into a table. In Gene Ontology we found the functions of each gene, and in PubMed and on web pages there was extra information about interactions and about the disease myelodysplastic syndrome (MDS).

Discussion

In this project we have discussed protein number 2, a fusion protein encoded by the genes Moz and Asxh2. There is not much information about this protein and, above all, about the Asxh2 gene. Our discussion is based on the only two scientific articles we found about these genes.

Moz gene

MOZ gene is located at chromosome 8, it only has one transcript, so it has only one isoform. This transcript has 17 exons which 16 exons codify and translate a protein. 13 exons are used to make our fusion protein.
This gene presents some SNPs in its sequence. There are more synonymous changes than non-synonymous. This means that, although SNPs, aminoacids can be the same and probably they are important for the protein functionality. Moz protein has five domains and it's very similar to Myst4.

We found that Pan troglodites is the best ortholog for Moz gene in Homo sapiens and the relation is reciprocal. NCBI has less ortolog species than Ensembl but all of them are very similar in their % homology. However, A.gambiae shows different results.In the phylogenetic tree Moz gene in Homo sapiens is very similar to Moz gene in Pan troglodites , B. taurus , M. mulatta and furthest similar to Aedes.

The expression of Moz has been studied in normal tissues, and we found high expression in the brain frontal cortex and the brain temporal cortex. The lowest expression was in thyroid, thymus, lymph node, adrenal, embryonic stem cells, ovary, lung, heart, kidney and buffy coat.

The Moz gene has three described functions: acetyltransferase activity, histone acetyltransferase activity and transcription factor binding. We have seen that this gene is involved in some important biological processes, such as DNA packaging, histone acetylation, myeloid cell differentiation, and negative and positive regulation of transcription.

For the Moz gene, the transcription factors (TFs) selected were NF-AT1 and AhR, because they fit the information we have.

Asxh2 gene

The Asxh2 gene is located on chromosome 2. It has two transcripts: one has 13 exons and the other has 12. The translated protein is encoded by 12 exons, all of which are used to make the fusion protein. For the fusion protein we only consider transcript 2 of the Asxh2 gene, because transcript 1 is a part of transcript 2.

The Asxh2 gene has two alternative transcripts with different numbers of exons. Alternative splicing produces two isoforms. Proteins from this family have three domains.
Pan troglodytes is the best ortholog for the Asxh2 gene in Homo sapiens. The Asxh2 gene in Homo sapiens is also very similar to the Asxh2 gene in M. mulatta. The most distant species are Takifugu and Tetraodon.
The expression of Asxh2 was studied in the GNF Atlas 2 database. We found high expression in PB-CD4+ T cells, thymus, bone marrow and pancreatic islets. The lowest expression was in heart, adipocyte, fetal brain, ovary and liver.

Polycomb group and trithorax group proteins are implicated in embryogenesis and carcinogenesis due to transcriptional regulation of target genes through histone modification and chromatin remodeling.

For the Asxh2 gene only one TF was selected, YY1, despite its high p-value.

Moz-Asxh2 fusion protein

The following link shows the Moz protein sequence, with the different exons shown in different colours (blue and black). The breakpoint is thought to be at amino acid 1013, which is the first residue of the fifteenth exon and is shown in yellow. Moz Breakpoint

We think that the chimeric protein has two PHD domains and the HAT MOZ-SAS domain from the Moz gene, fused to some low-complexity domains of the Asxh2 gene. We arrived at this conclusion after finding some pictures of this fusion protein and because MOZ modulates transcription by regulating local histone acetylation. Acetylation neutralises the positive charge of the lysine residues of histones, which weakens their attraction to the negatively charged DNA. As a result, DNA is packed less tightly and the transcription of other genes becomes possible. We think that the translocation of the Moz gene to chromosome 2, combined with this activity, may allow the transcription of genes that are normally repressed and that are involved in leukemogenesis. Therefore, the MOZ-ASXH2 fusion gene might cause the development of MDS by inducing an aberrant local chromatin structure and abnormal gene expression. One sign of this abnormal expression is the presence of our protein.
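How a chimeric sequence is assembled at a breakpoint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_fusion is hypothetical, the toy sequences are invented, and the text above does not say whether residue 1013 of Moz is retained in the fusion or where the Asxh2 part begins, so both positions are left as parameters.

```python
def build_fusion(n_term_protein, c_term_protein, last_n_residue, first_c_residue):
    """Join the N-terminal part of one protein to the C-terminal part of another.

    last_n_residue  -- 1-based position of the last residue kept from the
                       N-terminal partner (Moz in this project)
    first_c_residue -- 1-based position of the first residue kept from the
                       C-terminal partner (Asxh2 in this project)
    """
    if not 1 <= last_n_residue <= len(n_term_protein):
        raise ValueError("last_n_residue is outside the N-terminal protein")
    if not 1 <= first_c_residue <= len(c_term_protein):
        raise ValueError("first_c_residue is outside the C-terminal protein")
    return n_term_protein[:last_n_residue] + c_term_protein[first_c_residue - 1:]


# Toy sequences only: keep residues 1-5 of the first and 4-8 of the second.
print(build_fusion("MAAAAKKK", "GGGPPPPP", 5, 4))
```

With the real sequences, the same call would take the Moz protein as the first argument and the Asxh2 protein as the second, with positions chosen according to the breakpoint annotation.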

Figure 19. One plasma cell containing a fine network of fibers that may be outlining stored immunoglobulin packets.

References

Web pages

Databases & Tools

NCBI
Ensembl
UCSC
Pfam
ClustalW
Gene Ontology
PROMO
MEGA 3.1
BioEdit

Papers


Orthology type	Ortholog species	Moz gene	Ortholog species	Asxh2 gene	Orthology type
Best reciprocal hit	Pan troglodytes	99%	Pan troglodytes	99%	Best reciprocal hit
One to one	Bos taurus	89%	Bos taurus	86%	One to one
One to one	Rattus norvegicus	88%	Rattus norvegicus	74%	One to one
One to one	Mus musculus	88%	Mus musculus	77%	One to one
One to one	Echinops telfairi	87%	Echinops telfairi	71%	One to one
One to one (apparent)	Canis familiaris	83%	Canis familiaris	77%	One to one
One to one (apparent)	Macaca mulatta	83%	Macaca mulatta	96%	One to one
One to one	Gallus gallus	83%	Gallus gallus	59%	One to one
One to one	Tupaia belangeri	82%	Tupaia belangeri	82%	One to one
One to one	Monodelphis domestica	79%	Monodelphis domestica	66%	One to one
One to one	Loxodonta africana	77%	Loxodonta africana	62%	One to one
One to one	Erinaceus europaeus	75%	Erinaceus europaeus	61%	One to one
One to one	Dasypus novemcinctus	68%	Dasypus novemcinctus	81%	One to one
One to one	Cavia porcellus	65%	Cavia porcellus	70%	One to one
One to one	Ornithorhynchus anatinus	65%	Ornithorhynchus anatinus	60%	One to one
One to one	Felis catus	58%	Felis catus	78%	One to one
One to one	Danio rerio	58%	Danio rerio	35%	One to one
One to one	Tetraodon nigroviridis	58%	Tetraodon nigroviridis	19%	One to one
One to one	Takifugu rubripes	56%	Takifugu rubripes	32%	One to one
One to one	Gasterosteus aculeatus	54%	Gasterosteus aculeatus	34%	One to one
One to one	Oryctolagus cuniculus	32%	Oryctolagus cuniculus	66%	One to one
One to one	Aedes aegypti	9%	Aedes aegypti	12%	One to one
One to one	Xenopus tropicalis	59%
One to one	Oryzias latipes	36%
One to one	Drosophila melanogaster	25%
One to many	Anopheles gambiae	8%
One to many	Ciona savignyi	7%
One to many	Ciona intestinalis	7%
One to many	Saccharomyces cerevisiae	6%