Index of contents


Abstract
Introduction
Aims
Methodology:
    1. Obtaining sequences
    2. Phylogenetic analysis
    3. Genomic information: conserved domains, genetic structure, chromosomal location.
Results:
    1. Phylogenetic distribution
    2. Human genes that code for aminoacyl-tRNA synthetases:
            2.1. The E-P-tRNA synthetase: a Bifunctional enzyme
            2.2. WHEP-TRS domain signature
            2.3. Exonic structure
            2.4. Nucleotide variability in humans (SNPs)
Discussion
Bibliography
 
 
 



 


ABSTRACT



The codon/amino acid relationships of the genetic code are established by the aminoacylation reactions of tRNA synthetases. Because of its universality, the appearance of the modern genetic code is thought to predate the separation of prokaryotic and eukaryotic organisms in the universal phylogenetic tree. We present here a phylogenetic analysis that shows an unusual picture for tyrosyl and tryptophanyl-tRNA synthetases. In particular, the eukaryotic tyrosyl and tryptophanyl-tRNA synthetases are more related to each other than to their respective prokaryotic counterparts. In contrast, each of the other eukaryotic synthetases is more related to its prokaryotic counterpart than to any eukaryotic synthetase specific for a different amino acid. Our results raise the possibility that present day tyrosyl and tryptophanyl-tRNA synthetases appeared after the separation of nucleated cells from eubacteria.

This protein family has been evolving for a long time, and for this reason its members have accumulated many modifications, as all members of the family show. After the divergence from the Drosophila melanogaster lineage, many gene duplications produced several copies of individual genes. Some of these copies have been lost, but others have been conserved.
 
 

 


INTRODUCTION




Aminoacyl-tRNA synthetases are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and one mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.

A few years ago it was found that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has been shown to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases and seem to share the same tertiary structure based on a Rossmann fold.

Aminoacyl-tRNA synthetases carry genetic signatures that were established before the common ancestor of present-day life. For that reason, their analysis reflects evolution in general: their molecular phylogeny agrees with the accepted phylogeny of all organisms (Fig. 1, universal tree of life based on the analysis of leucyl-tRNA synthetase).
 
 

Fig. 1 Unrooted tree built with Neighbor-Joining for LeuRS.
Numbers (in %) indicate the bootstrap support of each branch for 1000 replicates.



Every cell has 20 aminoacyl-tRNA synthetases, one for each amino acid. Aminoacyl-tRNA synthetases have been classified into two classes according to their common structural domains and their sequence homologies. They all have in common a structural domain called WHEP, whose main function is binding ATP molecules. The classification of the aminoacyl-tRNA synthetases is shown below:
 
 

Class I                                   Class II
1a: Leucine, Isoleucine, Valine,          2a: Serine, Threonine, Alanine,
    Arginine, Cysteine, Methionine            Glycine, Proline, Histidine
1b: Glutamic acid, Glutamine, Lysine-I    2b: Aspartate, Asparagine, Lysine-II
1c: Tyrosine, Tryptophan                  2c: Phenylalanine

The 10 class I aminoacyl-tRNA synthetases have a Rossmann fold, which is characterized by the conserved motif 'HIGH' (formed by the tetrapeptide His-Ile-Gly-His and located in the N-terminal region of the protein) and by the motif 'KMSKS'. The other 10 synthetases, those of class II, are characterized by three degenerate sequence motifs.
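The two class I signatures can be located in a sequence with a simple scan. The following is a minimal sketch, assuming exact matches only; real synthetases often carry degenerate variants of both motifs, and the sequence used here is a made-up toy peptide, not a real enzyme.

```python
import re

# Class I signature motifs named in the text. Exact matching is only a
# first approximation, because real sequences show degenerate variants.
CLASS_I_MOTIFS = ("HIGH", "KMSKS")


def find_motifs(sequence, motifs=CLASS_I_MOTIFS):
    """Return {motif: [0-based start positions]} for exact matches."""
    hits = {}
    for motif in motifs:
        # The lookahead makes overlapping occurrences visible as well.
        pattern = "(?=" + re.escape(motif) + ")"
        hits[motif] = [m.start() for m in re.finditer(pattern, sequence)]
    return hits


# Hypothetical toy sequence, invented for illustration.
toy = "MTAPHIGHLYSAAAGGKMSKSLGN"
print(find_motifs(toy))
```

A sequence in which both lists come back empty is not necessarily a class II enzyme; it may simply carry a variant motif that an exact search misses.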

Because of their essential role in the establishment of the genetic code, aminoacyl-tRNA synthetases are considered to be among the proteins that appeared earliest in evolution. Owing to the ancient origin of these enzymes, phylogenetic analyses of aminoacyl-tRNA synthetase sequences show that they group according to their amino acid specificity and not according to the position of their organisms in the universal phylogenetic tree. This means that aminoacyl-tRNA synthetases appeared and evolved before the "tree of life" split into the three domains recognised today (Archaea, Bacteria and Eukarya).

This work presents an analysis of the phylogenetic relationships between eukaryotic and prokaryotic tryptophanyl- and tyrosyl-tRNA synthetases. They show a phylogenetic pattern completely different from the evolutionary relationships shown by the other aminoacyl-tRNA synthetases.
 
 
 
 



AIMS



* To get to know and characterize an ancient protein family that has been evolving for millions of years. Its members seem to be among the oldest proteins known on Earth.

* To study their evolutionary relationships through phylogenetic analysis of the family members.

* To compare the phylogeny of tryptophanyl- and tyrosyl-tRNA synthetases (two enzymes that share a high degree of structural similarity, suggesting a surprisingly recent common ancestor) with that of the rest of the aminoacyl-tRNA synthetases.

* To use computer tools to analyse the genes that code for these proteins: conserved domains, exon structure, etc.
 
 
 
 
 



METHODOLOGY


1.-Obtaining sequences
 

Sequences used in this study

All sequences used were available in a document that lists the Swissprot sequence entries for the twenty types of aminoacyl-tRNA synthetases. It also indicates the sequence class (I or II) to which each type belongs and whether or not a 3D structure has been solved for at least one member of a specific type of aa-tRNA synthetase.

BLASTP was used to search for sequences similar to the members of the family under study. However, it detects few members, because aminoacyl-tRNA synthetases have diverged over millions of years of evolution. For this reason, all sequences were obtained from Swissprot.
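Sequences taken from Swissprot are usually handled as FASTA text. The following is a minimal sketch of a parser, assuming UniProt-style headers of the form ">sp|ACCESSION|ENTRY_NAME description"; the entry name EXAMPLE_ENTRY and the residues in the example are placeholders invented for illustration, and only the accession P54577 comes from this report.

```python
def parse_fasta(text):
    """Parse FASTA text into {key: sequence}.

    For UniProt-style headers (">sp|ACCESSION|ENTRY_NAME ...") the key is
    the accession; for any other header it is the first word after ">".
    """
    records = {}
    key = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            header = words[0] if words else ""
            fields = header.split("|")
            key = fields[1] if len(fields) >= 3 else header
            records[key] = ""
        elif key is not None:
            # Sequence lines of one record are concatenated.
            records[key] += line
    return records


# Hypothetical records: the residues are toy data, not real sequences.
example = """>sp|P54577|EXAMPLE_ENTRY toy record
MGDAP
HIGH
>other
KMSKS
"""
print(parse_fasta(example))
```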
 
 
 

2.-Phylogenetic analysis

2.1.- Sequence alignment

Sequences were aligned with ClustalW.

For the construction of the second tree, the conserved KMSKS motifs were aligned. ClustalW does not give a good alignment of these conserved motifs, so we used Jalview to align this pattern manually.
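The manual step amounts to anchoring every sequence on its KMSKS motif and comparing a fixed window around it. The following is a minimal sketch of that idea, assuming an exact 'KMSKS' match and a hypothetical flank width; it does not reproduce what Jalview does, and the sequences are toy data.

```python
def motif_window(sequence, motif="KMSKS", flank=5, gap="-"):
    """Return the motif plus `flank` residues on each side, gap-padded.

    Returns None when the motif is absent, so that degenerate variants
    can be handled by hand.
    """
    start = sequence.find(motif)
    if start < 0:
        return None
    end = start + len(motif)
    left = sequence[max(0, start - flank):start].rjust(flank, gap)
    right = sequence[end:end + flank].ljust(flank, gap)
    return left + motif + right


# Hypothetical toy sequences; names and residues are illustrative only.
toy_sequences = {
    "YRS_toy": "AAKMSKSLGNTVDP",
    "WRS_toy": "MTSGRPKMSKSD",
}
for name, seq in toy_sequences.items():
    print(name, motif_window(seq, flank=3))
```

Because every window has the same length (two flanks plus the motif), the output columns line up and can be passed on as a gap-padded alignment.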
 

2.2.- Tree construction

Phylogenetic trees were constructed using the maximum parsimony method from Phylip 3.5. The unrooted graphical representation of the trees was made with Treeview, and the branch bootstrap values were obtained with Seqboot, from a bootstrap analysis of 100 replicates.
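The bootstrap step resamples the columns of the alignment with replacement, so that every pseudo-replicate has the same length as the original alignment. The following is a minimal sketch of that resampling, assuming a toy two-sequence alignment and a fixed random seed; it illustrates the principle and is not the Seqboot implementation.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate).

    `alignment` maps a sequence name to its aligned string; all strings
    must have the same length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_cols = lengths.pop()
    # The same column indices are used for every sequence, so columns
    # stay intact and only their sampling frequency changes.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


# Hypothetical toy alignment, invented for illustration.
toy = {"YRS_toy": "KMSKSLGN", "WRS_toy": "KMSASQGN"}
rng = random.Random(0)
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), "replicates of length", len(replicates[0]["YRS_toy"]))
```

A tree is then built from each replicate, and the share of replicates that recover a given branch becomes its bootstrap value.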
 
 

3.-Genomic information: conserved domains, gene structure, chromosomal location

The study of the conserved domains and protein families of the aminoacyl-tRNA synthetases was done by taking the sequences from Swissprot and using Prosite and Interpro.

Ensembl was used to study the chromosomal location of the aminoacyl-tRNA synthetase genes.
 
 
 



RESULTS


1. Phylogenetic distribution

The first analysis was carried out on multiple sequence alignments across the entire sequences of several class I aminoacyl-tRNA synthetases, including eukaryotic and prokaryotic WRS and YRS.

The resulting tree (Fig. 2) clusters the WRS and YRS according to their prokaryotic or eukaryotic origin; these synthetases do not follow the amino acid specificity groups. In contrast, the other synthetases analysed, MRS and RRS, cluster on the basis of their amino acid specificities.
 
 

Fig. 2 Most parsimonious unrooted tree built from a bootstrap analysis of the WRS-YRS alignment (100 data sets), using the complete sequences. Numbers (% in grey) at the branches correspond to percentage bootstrap frequencies for each particular branch.
Full species names are Escherichia coli and Homo sapiens. Synthetase abbreviations: YRS, tyrosyl-tRNA synthetase; WRS, tryptophanyl-tRNA synthetase; RRS, arginyl-tRNA synthetase; MRS, methionyl-tRNA synthetase.
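The percentages at the branches count how often a branch is recovered among the bootstrap trees. The following is a minimal sketch of that count, assuming each replicate tree has already been reduced to a set of splits and that the same side of each split is recorded consistently across replicates; the taxon labels and the four replicate trees are hypothetical.

```python
from collections import Counter


def bootstrap_support(replicate_splits):
    """Percentage of replicates in which each split (bipartition) occurs.

    `replicate_splits` holds one entry per bootstrap tree; each entry is
    a set of frozensets, and each frozenset lists the taxa on one side
    of an internal branch.
    """
    counts = Counter(split for splits in replicate_splits for split in splits)
    n = len(replicate_splits)
    return {split: 100.0 * c / n for split, c in counts.items()}


# Hypothetical splits from four replicate trees (labels are illustrative).
euk = frozenset({"YRS_human", "WRS_human"})
pro = frozenset({"YRS_ecoli", "WRS_ecoli"})
alt = frozenset({"YRS_human", "YRS_ecoli"})
trees = [{euk, pro}, {euk, pro}, {euk}, {alt}]
support = bootstrap_support(trees)
print(support[euk], support[pro], support[alt])
```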





The second analysis was carried out on the alignment of the conserved region around the 'KMSKS' motif (a characteristic region of class I aminoacyl-tRNA synthetases). All the sequences analysed are WRS and YRS. This analysis includes synthetases taken from mitochondrial and archaeal genomes.

The resulting tree shows a clear distribution of synthetases according to their eukaryotic or prokaryotic nature. These results support the hypothesis that present-day WRS and YRS appeared after the separation of nucleated cells from bacteria.
Consistent with the endosymbiotic origin of mitochondria, mitochondrial WRS and YRS group together with their respective prokaryotic counterparts.

In contrast, archaeal synthetases cluster with eukaryotic synthetases, in agreement with the hypothesis that the eukaryotic nucleus is the result of an ancestral endosymbiotic relationship with an archaeon.
 
 

  YRS and WRS: Archaea and Eukarya.
  YRS and WRS: Bacteria and mitochondria.

Fig. 3 Most parsimonious unrooted tree built from a bootstrap analysis of the WRS-YRS alignment (100 data sets), using the KMSKS motifs (conserved residues). Numbers (% in grey) at the branches correspond to percentage bootstrap frequencies for each particular branch.
Full species names are Neurospora crassa, Podospora anserina, Bacillus caldotenax, Archaeoglobus fulgidus, Bacillus stearothermophilus, Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae, Bos taurus, Homo sapiens, Oryctolagus cuniculus and Mus musculus. Synthetase abbreviations: YRS, tyrosyl-tRNA synthetase; WRS, tryptophanyl-tRNA synthetase; mit, mitochondrial enzyme.
 

2. Studying human genes that code for aminoacyl-tRNA synthetases

The next topic we would like to introduce is an analysis of all the human genes that code for this protein family. The human genome contains the 20 aminoacyl-tRNA synthetases on different chromosomes. The number of genes that code for each aminoacyl-tRNA synthetase is very variable: some synthetases have one gene copy, others have 2 or 4 copies, and some have 6 copies. The positions of the genes on their chromosomes range from 1.4 Mb (CRS) to 228.1 Mb (E-PRS). The number of exons per gene is also very variable, from one (ARS) to 34 (IRS). Curiously, the gene coding for prolyl-tRNA synthetase is the same as one of the genes that code for glutamyl-tRNA synthetase.
 
 

HUMAN GENES THAT CODE FOR AMINOACYL-tRNA SYNTHETASES

Columns: human aminoacyl-tRNA synthetase; Swissprot accession number; chromosomal location in bp (position in Mb); chromosome; segment location (band); exon number; OMIM accession number. Additional gene copies of a synthetase are listed on the lines below its first copy. The band and the OMIM number are given once per synthetase, on its first line. A dot (.) marks an empty cell.

Synthetase     Swissprot  Position in bp (Mb)             Chr  Band            Exons  OMIM

Class I AARS
Arginine       P54136     171123473-171156331 (171.1 Mb)  5    5pter-q11       16     107820
Cysteine       P49589     1379373-1435861 (1.4 Mb)        11   11p15.5         19     123859
Glutamic acid  O14563     23398371-23426105 (23.4 Mb)     16   1q41-q42        4      138295
               P07814     228075312-228140152 (228.1 Mb)  1                    29
Glutamine      P47897     .                               3    3p21.3-p21.1    .      603727
Isoleucine     P41252     83419834-83503176 (83.4 Mb)     9    9q21            34     600709
Leucine        Q9NSE1     148045217-148114181 (148.0 Mb)  5    .               32     .
Methionine     P56192     68218714-68316167 (68.2 Mb)     12   .               19     .
                          68354941-68357494 (68.4 Mb)     12                   4
Tyrosine       P54577     28055842-28063535 (28.1 Mb)     1    .               6      .
Tryptophan     P23381     57923973-57928097 (57.9 Mb)     11   14q32.31        3      191050
                          86169124-86210654 (86.2 Mb)     14                   11
Valine         P26640     37195691-37213908 (37.2 Mb)     6    6p21.3          30     604137
                          36334346-36346588 (36.3 Mb)     6                    29

Class II AARS
Alanine        P49588     132886654-132886950 (132.9 Mb)  4    16q22           1      601065
                          71478163-71508567 (71.5 Mb)     16                   20
                          75130215-75132228 (75.1 Mb)     16                   5
                          75140359-75142388 (75.1 Mb)     16                   4
                          75149404-75152498 (75.1 Mb)     16                   3
                          71693865-71739976 (71.7 Mb)     16                   2
Aspartic acid  P14868     126286089-126367641 (126.3 Mb)  2    .               16     .
Asparagine     O43776     57695945-57717075 (57.7 Mb)     18   18q21.2-q21.3   14     108410
                          57507702-57508757 (57.5 Mb)     18                   2
Glycine        P41250     31057984-31097046 (31.1 Mb)     7    7p15            17     600287
                          55930988-55931263 (55.9 Mb)     17                   1
Histidine      P12081     144019327-144027179 (144.0 Mb)  5    5q31.3          13     142810
                          144001946-144019718 (144.0 Mb)  5                    13
                          143855521-143873309 (143.9 Mb)  5                    13
                          143872918-143880764 (143.9 Mb)  5                    13
Lysine         Q15046     82272716-82292743 (82.3 Mb)     16   16q22.2-q22.3   14     601421
                          69047553-69047897 (69.0 Mb)     16                   1
Phenylalanine  O95363     5345271-5407872 (5.3 Mb)        6    .               3      .
Proline        P07814     228075312-228140152 (228.1 Mb)  1    1q41-q42        29     138295
Serine         P49591     5345271-5407872 (5.3 Mb)        1    .               3      .
Threonine      P26639     26987477-27009873 (27.0 Mb)     5    5p13-cen        19     187790
                          99436321-99523695 (99.4 Mb)     15                   19

Looking at this table, we can see that the glutamyl-tRNA synthetase gene is the same as the prolyl-tRNA synthetase gene. This gene is located on chromosome 1 and codes for a bifunctional enzyme. An analysis of the domains of this protein is given below.
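The Mb figures in the table are positions along the chromosome, while the length of a gene follows from the difference between its end and start coordinates. The following is a minimal sketch of that arithmetic for three entries of the table, assuming that both coordinates are inclusive (an assumption, since the table does not say so).

```python
# (start, end) coordinates in bp, copied from the table above.
GENES = {
    "CRS (chromosome 11)": (1379373, 1435861),
    "E-PRS (chromosome 1)": (228075312, 228140152),
    "IRS (chromosome 9)": (83419834, 83503176),
}


def span_kb(start, end):
    """Length of the interval in kb, counting both end points (assumed)."""
    return (end - start + 1) / 1000


for name, (start, end) in GENES.items():
    print(f"{name}: starts at {start / 1e6:.1f} Mb, "
          f"spans {span_kb(start, end):.1f} kb")
```

Under this assumption the three genes span roughly 56.5 kb, 64.8 kb and 83.3 kb, far less than their positions in Mb might suggest.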
 
 

The E-P-tRNA synthetase: a Bifunctional enzyme

Aminoacyl-tRNA synthetases are a class of enzymes that charge tRNAs with their cognate amino acids. The genome sequences of certain organisms do not contain recognizable prolyl tRNA synthetases, which are essential for messenger RNA-encoded protein synthesis. However, they contain an enzyme able to provide two aminoacyl tRNA synthetases. The Glutamyl-Prolyl-tRNA synthetase is a bifunctional enzyme that aminoacylates its cognate tRNAs with glutamate or proline.

In humans, glutamyl-tRNA synthetase (GluRS) and prolyl-tRNA synthetase (ProRS) activities are contained within a single polypeptide chain, even though these enzymes belong to different classes and are thought to have evolved along independent evolutionary pathways. Glutamyl-prolyl-tRNA synthetase is made up of 1,440 amino acids encoded by 29 exons. The exons encoding the glutamyl-specific and prolyl-specific parts of the enzyme are clustered at opposite ends of the gene, separated by a long intervening DNA section with a number of exons which encode functions that may be involved in the organization of the mammalian multienzyme synthetase complex.
 

WHEP-TRS domain signature

A conserved domain of 46 amino acids, called WHEP-TRS, has been shown [2] to exist in a number of higher eukaryote aminoacyl-tRNA synthetases. This domain is present one to six times in several enzymes. There are three copies in the mammalian multifunctional aminoacyl-tRNA synthetase, in a region that separates the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain, and six copies in the intercatalytic region of the Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is found at the N-terminal extremity of the mammalian tryptophanyl-tRNA synthetase and histidyl-tRNA synthetase, and of the mammalian, insect, nematode and plant glycyl-tRNA synthetases. This domain could contain a central alpha-helical region and may play a role in the association of tRNA synthetases into multienzyme complexes.

The emergence of a multifunctional synthetase by a gene fusion event seems to be a specific, but general attribute of all higher eukaryotic cells. This type of structural organization, in relation to the occurrence of multisynthetase complexes, could be a mechanism to integrate several catalytic domains within the same particle.

The consensus pattern based on the first 29 positions of the WHEP domain is described below:
 

Description of the WHEP conserved domain
Consensus pattern [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K
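The consensus pattern is written in PROSITE syntax and can be turned into a regular expression to scan protein sequences. The following is a minimal sketch, assuming only the part of the syntax that this pattern uses (single residues, bracketed residue sets, x for any residue and repeat counts in parentheses); the test peptide is built by hand to satisfy the pattern and is not a real sequence.

```python
import re

WHEP_PATTERN = ("[QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-"
                "[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K")


def prosite_to_regex(pattern):
    """Translate the PROSITE subset used here into a Python regex string.

    Handled: single residues, [..] residue sets, x (any residue) and
    (n) repeat counts. Other PROSITE syntax is rejected.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, count = m.groups()
        residue = "." if residue == "x" else residue
        parts.append(residue + (f"{{{count}}}" if count else ""))
    return "".join(parts)


WHEP_REGEX = prosite_to_regex(WHEP_PATTERN)

# Hypothetical 29-residue peptide, written element by element so that it
# satisfies the pattern; alanine fills every "x" position.
toy = ("QGDALK" "AA" "K" "AA" "K" "A" "AAAA" "L" "D"
       "AA" "I" "AA" "L" "AAA" "K")
print(WHEP_REGEX)
print(bool(re.fullmatch(WHEP_REGEX, toy)))
```

With `re.finditer(WHEP_REGEX, sequence)` the same expression would report every candidate WHEP repeat in a longer sequence, which is one way to count the one to six copies mentioned above.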

 

Exon structure

Analysis of the exon structure of the aminoacyl-tRNA synthetase genes shows a high degree of difference among them, which is evidence of their distant common ancestry. The number of exons covers a wide range, from 1 to 34.
 


 

Human aminoacyl-tRNA synthetases

Class I          Class II
Arginine         Alanine
Cysteine         Aspartic acid
Glutamic acid    Asparagine
Glutamine        Glycine
Isoleucine       Histidine
Leucine          Lysine
Methionine       Phenylalanine
Tyrosine         Proline
Tryptophan       Serine
Valine           Threonine


 

Nucleotide variability in humans (SNPs):

After searching OMIM and the SNP database at NCBI for diseases or mutation rates associated with aminoacyl-tRNA synthetases, we found no disease directly involving these proteins. This may be surprising at first. However, aminoacyl-tRNA synthetases appeared at the origin of life, and their role in the establishment of the genetic code is so important that they are present in every organism (from prokaryotes and archaea to humans). Mutations that impair these enzymes are therefore probably incompatible with life or, at least, with "life" as we understand it.
 
 



DISCUSSION



Phylogenetic analysis

Our results do not explain why WRS and YRS are distinguished from all other synthetases.
A possible explanation would be that, after the appearance of the ancestor eukaryotic cell, either one of the genes encoding the primitive YRS or WRS was lost in the eukaryotic branch. This could have been achieved by the replacement of the lost gene by a duplicated allele of the other gene. This theory would require a further explanation on the process by which a functional and essential enzyme is replaced by the duplication of another, functionally distinct, enzyme.

Alternatively, a single ancestral enzyme of YRS and WRS may have been able to interact with both amino acids and attach them selectively to their respective cognate tRNAs. This ancestor could have remained functional and, after the separation of prokaryotes from eukaryotes, have duplicated independently in both branches. The caveat of this theory is that it requires an improbable double duplication and divergence event.

Both scenarios, however, suggest a late existence of a highly dynamic genetic expression machinery which, at the time of the eukaryote-prokaryote divergence, was still capable of undergoing changes in its essential components.
 
 

Studying human genes that code for aminoacyl-tRNA synthetases

From the study of the chromosomal location and exon structure of the human aminoacyl-tRNA synthetase genes, some conclusions can be drawn. The most evident one is the high diversity among the members of this family, which is due to the long time since they appeared and since the different members diverged. Secondly, in those cases where more than one gene codes for one of these enzymes, the number of copies is even (2, 4 or 6). This could support an origin of these copies by successive duplications during evolution. For histidyl-tRNA synthetase there are four copies of the same gene, all of them on chromosome 5. These duplications could have appeared less than 990 million years ago, after the separation between Drosophila and primates, because the Drosophila genome contains just one copy.
 
 
 



BIBLIOGRAPHY




1. Bacardit M., Coll M., Gabernet N. Hostes vingueren i a sorgir ens empenyeren! Origen endosimbiòtic dels mitocondris i arbre sense arrel de tots els organismes vius, a partir de les aminoacil t-RNA sintetases. Pràctiques d’evolució 2001.

2. Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M. 1991. A component of the multisynthetase complex is a multifunctional aminoacyl-tRNA synthetase. EMBO J  Dec;10(13):4267-77.

3. Eriani G., Delarue M., Poch O., Gangloff J., and Moras D. 1990. Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347:203-206.

4. Ribas de Pouplana, Ll., Frugier M., Quinn C., and Schimmel P. 1996. Evidence that two present-day components needed for the genetic code appeared after nucleated cells separated from eubacteria. Proc. Natl. Acad. Sci. USA 93:166-170.

5. Woese, C., Olsen G.J., Ibba M., Söll D. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiology and Molecular Biology Reviews 64: 202-236.
 
 
 
 
 



M.Mercè Bacardit i Reguant, Montse Coll Lladó, Núria Gabernet i Díaz
Bioinformàtica-2002

For any comments or suggestions, send an email to:
merce.bacardit01@campus.upf.edu
montserrat.coll02@campus.upf.edu
nuria.gabernet01@campus.upf.edu