The codon/amino acid relationships of the genetic code are established by the aminoacylation reactions of tRNA synthetases. Because of its universality, the appearance of the modern genetic code is thought to predate the separation of prokaryotic and eukaryotic organisms in the universal phylogenetic tree. We present here a phylogenetic analysis that shows an unusual picture for tyrosyl and tryptophanyl-tRNA synthetases. In particular, the eukaryotic tyrosyl and tryptophanyl-tRNA synthetases are more related to each other than to their respective prokaryotic counterparts. In contrast, each of the other eukaryotic synthetases is more related to its prokaryotic counterpart than to any eukaryotic synthetase specific for a different amino acid. Our results raise the possibility that present day tyrosyl and tryptophanyl-tRNA synthetases appeared after the separation of nucleated cells from eubacteria.
This protein family has been evolving for a long time. For this
reason it has accumulated many modifications, as all members
of the family show. After the separation of the Drosophila melanogaster
lineage, many gene duplications produced several copies of individual genes.
Some of these copies have been lost, but others have been conserved.
Aminoacyl-tRNA synthetases are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has been shown to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases and seem to share the same tertiary structure based on a Rossmann fold.
Aminoacyl-tRNA synthetases are molecules that carry genetic signatures
established before the earliest ancestral life forms diverged. For that reason
their analysis reflects evolution in general: their molecular phylogeny
agrees with the accepted phylogeny of all organisms (Fig. 1, universal tree
of life based on the analysis of leucyl-tRNA synthetase).
Fig. 1 Unrooted tree built by Neighbor-Joining for leuRS.
Numbers (in %) indicate the bootstrap support of each branch over 1000 replicates.
Every cell has 20 aminoacyl-tRNA synthetases, one for each amino acid.
Aminoacyl-tRNA synthetases have been classified into two classes according
to their shared structural domains and their sequence homologies. A number
of eukaryotic synthetases also share a structural domain called WHEP, which
is described in the WHEP-TRS section of this report. The aminoacyl-tRNA
synthetase classification is shown below:
Class I | Class II |
---|---|
Isoleucine, Valine, Arginine, Cysteine, Methionine | Threonine, Alanine, Glycine, Proline, Histidine |
Glutamine, Lysine-I | Asparagine, Lysine-II |
Tryptophan | |
The ten class I aminoacyl-tRNA synthetases have a Rossmann fold, which
is characterized by the conserved motif 'HIGH'
(the tetrapeptide His-Ile-Gly-His, located in the
N-terminal region of the protein) and the motif 'KMSKS'. The other ten class
II aminoacyl-tRNA synthetases are characterized by three degenerate sequence
motifs.
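The two class I signature motifs described above can be located in a protein sequence with a simple scan. The following is a minimal sketch, assuming exact matches to 'HIGH' and 'KMSKS'; real synthetases often carry variants of these motifs, so an exact-match scan will miss some of them. The function names and the toy sequence are hypothetical and are not taken from any database.

```python
import re

# Exact class I signature motifs as named in the text.
CLASS_I_MOTIFS = {"HIGH": "HIGH", "KMSKS": "KMSKS"}


def find_class1_motifs(sequence):
    """Return 1-based start positions of each motif in a protein sequence."""
    seq = sequence.upper()
    hits = {}
    for name, motif in CLASS_I_MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={motif})", seq)]
    return hits


def looks_like_class1(sequence):
    """True if HIGH is present and lies N-terminal to a KMSKS motif."""
    hits = find_class1_motifs(sequence)
    if not hits["HIGH"] or not hits["KMSKS"]:
        return False
    return min(hits["HIGH"]) < min(hits["KMSKS"])


# Hypothetical toy sequence, for illustration only.
toy = "MAAPHIGHLYSAAAAGGKMSKSLGN"
print(find_class1_motifs(toy))   # {'HIGH': [5], 'KMSKS': [18]}
print(looks_like_class1(toy))    # True
```

A scan like this is only a first filter; assigning a sequence to a class would still rely on curated signatures such as those in Prosite.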
Because of their essential role in the establishment of the genetic code, aminoacyl-tRNA synthetases can be considered to be among the proteins that appeared earliest in evolution. Owing to the ancestral origin of these enzymes, phylogenetic analyses of aminoacyl-tRNA synthetase sequences show that the enzymes group according to their amino acid specificity and not according to the position of their source organisms in the universal phylogenetic tree. This means that aminoacyl-tRNA synthetases appeared and evolved before the "tree of life" divided into the three currently recognised domains (Archaea, Bacteria and Eukarya).
This work presents an analysis of the phylogenetic relationships between
eukaryotic and prokaryotic tryptophanyl- and tyrosyl-tRNA synthetases. They show
a phylogenetic pattern completely different from the evolutionary relationships
shown by the other aminoacyl-tRNA synthetases.
* To characterize an ancient protein family that has been evolving for millions of years and seems to be one of the oldest protein families known on Earth.
* To study its evolutionary relationships through phylogenetic analysis of its members.
* To compare the phylogeny of tryptophanyl- and tyrosyl-tRNA synthetases (two enzymes that share a high degree of structural similarity, suggesting a surprisingly recent common ancestor) with that of the rest of the aminoacyl-tRNA synthetases.
* To use computer tools to analyse the genes that encode these proteins:
conserved domains, exon structure, etc.
1.-Obtaining sequences
Sequences used in this study
All sequences used were available in a document that lists the Swissprot sequence entries for the twenty types of aminoacyl-transfer RNA synthetases. It also indicates to which sequence class (I or II) each type belongs and whether or not a 3D structure has been solved for at least one member of a specific type of aa-tRNA synthetase.
BLASTP was used
to search for sequences similar to the members of the family under study, but it
detects few members because of the millions of years over which aminoacyl-tRNA
synthetases have diverged. For this reason all sequences were obtained from Swissprot.
2.-Phylogenetic analysis
2.1.- Sequence alignment
Sequences were aligned with Clustalw.
For the construction of the second tree, the conserved KMSKS motifs were
aligned. Clustalw does not give a good alignment of these conserved
motifs, so we used Jalview
to align this pattern
manually.
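The idea behind aligning around a conserved motif can be illustrated with a small sketch. This is not what Clustalw or Jalview do internally, and it is not the authors' procedure; it simply anchors each sequence on the first exact occurrence of 'KMSKS' and cuts equal-length flanks so that the motif columns line up. The function name, the `flank` parameter and the toy sequences are hypothetical, and sequences carrying a variant of the motif would be skipped by this exact-match search.

```python
def motif_anchored_block(sequences, motif="KMSKS", flank=5, gap="-"):
    """Cut a fixed-width window around the first occurrence of `motif`.

    `sequences` maps a name to an unaligned protein sequence. Flanks that
    run past either end of a sequence are padded with gap characters, so
    every returned string has length 2 * flank + len(motif).
    """
    block = {}
    for name, seq in sequences.items():
        pos = seq.find(motif)
        if pos < 0:
            # No exact motif found: leave the sequence for manual inspection.
            continue
        left = seq[max(0, pos - flank):pos].rjust(flank, gap)
        right_start = pos + len(motif)
        right = seq[right_start:right_start + flank].ljust(flank, gap)
        block[name] = left + motif + right
    return block


# Hypothetical toy sequences, for illustration only.
toy = {
    "seqA": "AAKMSKSGGGGGGG",
    "seqB": "CCCCCCCCKMSKSG",
    "seqC": "MMMMMMMM",
}
for name, row in motif_anchored_block(toy, flank=3).items():
    print(name, row)
```

The resulting block has the motif in the same columns for every sequence, which is the property the manual alignment step is meant to guarantee.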
2.2.- Tree construction
Phylogenetic trees were constructed using the maximum parsimony method
from Phylip 3.5. The graphical representation of the unrooted trees was made
using Treeview,
and the branch bootstrap values were obtained with Seqboot, from a bootstrap
analysis of 100 replicates.
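The bootstrap step can be pictured as resampling the columns of the alignment with replacement, building one tree per resampled alignment, and counting how often each branch reappears. The sketch below covers only the resampling part. It is not Seqboot and does not call Phylip; the function name, the seed and the toy alignment are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, n_replicates=100, seed=0):
    """Resample alignment columns with replacement.

    `alignment` maps a sequence name to an aligned sequence; all sequences
    must have the same length. Returns a list of resampled alignments, each
    with the same names and the same number of columns as the input.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # The same column indices are applied to every sequence, so each
        # resampled column is still a column of the original alignment.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        replicates.append(
            {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}
        )
    return replicates


# Hypothetical toy alignment, for illustration only.
toy = {"WRS_x": "KMSKSLGN", "YRS_x": "KMSKSSGN", "MRS_x": "KMSKSRGN"}
replicates = bootstrap_alignment(toy, n_replicates=100)
print(len(replicates), "replicates of", len(toy["WRS_x"]), "columns each")
```

Each replicate would then be passed to the tree-building program, and the percentage shown on a branch is the share of replicate trees that contain it.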
3.-Genomic information: conserved domains, gene structure, chromosomal location
The study of the conserved domains and protein families of the aminoacyl-tRNA synthetases was done by taking the sequences from Swissprot and using Prosite and Interpro.
Ensembl was
used to study the chromosomal location of the aminoacyl-tRNA synthetase genes.
1. Phylogenetic distribution
The first analysis was carried out on multiple sequence alignments across the entire sequences of several class I aminoacyl-tRNA synthetases, including eukaryotic and prokaryotic WRS and YRS.
The resulting tree (fig.x) clusters the WRS and
YRS according to their prokaryotic or eukaryotic origin. These synthetases
do not follow the amino acid specificity groups. In contrast, the other
synthetases analyzed, MRS and RRS, cluster on the basis
of their amino acid specificities.
Fig. 2 Most parsimonious unrooted tree built from a bootstrap
analysis of the WRS-YRS alignment (100 data sets), using complete sequence.
Numbers (% in grey) at the branches correspond to percentage bootstrap
frequencies for each particular branch.
Full species names are Escherichia coli and Homo sapiens.
Synthetase abbreviations: YRS, tyrosyl-tRNA synthetase; WRS, tryptophanyl-tRNA
synthetase; RRS, arginyl-tRNA synthetase; MRS, methionyl-tRNA synthetase.
The second analysis was carried out on the alignment of the conserved region around the 'KMSKS' motif (a characteristic region of class I aminoacyl-tRNA synthetases). All the sequences analyzed are WRS and YRS. This analysis includes synthetases taken from mitochondrial and archaeal genomes.
The resulting tree shows a clear distribution of synthetases according
to their eukaryotic or prokaryotic nature. These results support the hypothesis
that present day WRS and YRS appeared after the separation of nucleated
cells from bacteria.
Consistent with the endosymbiotic origin of mitochondria, mitochondrial
WRS and YRS group together with their respective prokaryotic counterparts.
In contrast, archaeal synthetases cluster with eukaryotic synthetases, in agreement
with the hypothesis that the eukaryotic nucleus is the result of an ancestral
endosymbiotic relationship with an archaeon.
Fig. 3 Most parsimonious unrooted tree built from a bootstrap analysis of the WRS-YRS alignment (100 data sets), using KMSKS motifs (conserved residues).
Numbers (% in grey) at the branches correspond to percentage bootstrap frequencies for each particular branch.
Full species names are Neurospora crassa, Podospora anserina, Bacillus caldotenax, Archaeoglobus fulgidus, Bacillus stearothermophilus, Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae, Bos taurus, Homo sapiens, Oryctolagus cuniculus and Mus musculus.
Synthetase abbreviations: YRS, tyrosyl-tRNA
synthetase; WRS, tryptophanyl-tRNA synthetase; mit, mitochondrial enzyme.
2. Studying human genes that code for aminoacyl-tRNA synthetases
The next topic is an analysis of all the human
genes that code for this protein family. The human genome contains 20 aminoacyl-tRNA
synthetases, located on different chromosomes. The number of genes coding for each
aminoacyl-tRNA synthetase is very variable: some aminoacyl-tRNA synthetases
have a single gene copy, while others have 2, 4 or even 6 copies.
The positions of the genes along their chromosomes range from 1.4 Mb (CRS) to
228.1 Mb (E-PRS). The number of exons per gene is also very variable, from one (ARS) to 34 (IRS).
It is also notable that the gene coding for prolyl-tRNA synthetase is
the same as one of the genes that code for glutamyl-tRNA synthetase.
HUMAN GENES THAT CODE FOR AMINOACYL-tRNA SYNTHETASES
Human aminoacyl-tRNA synthetases | | | | | |
---|---|---|---|---|---|
Amino acid | Swissprot (accession number) | Chromosomal location | Segment location | Exon number | OMIM (accession number) |
Class I AARS | | | | | |
Arginine | P54136 | | 5pter-q11 | 16 | 107820 |
Cysteine | P49589 | chromosome 11 | 11p15.5 | 19 | 123859 |
Glutamic acid | O14563, P07814 | 228075312 - 228140152 bp (228.1 Mb) on chromosome 1 | 1q41-q42 | 4; 29 | 138295 |
Glutamine | P47897 | | 3p21.3-p21.1 | . | 603727 |
Isoleucine | P41252 | | 9q21 | 34 | 600709 |
Leucine | Q9NSE1 | | . | 32 | . |
Methionine | P56192 | 68354941 - 68357494 bp (68.4 Mb) on chromosome 12 | . | 19; 4 | . |
Tyrosine | P54577 | | . | 6 | . |
Tryptophan | P23381 | 86169124 - 86210654 bp (86.2 Mb) on chromosome 14 | 14q32.31 | 3; 11 | 191050 |
Valine | P26640 | 36334346 - 36346588 bp (36.3 Mb) on chromosome 6 | 6p21.3 | 30; 29 | 604137 |
Class II AARS | | | | | |
Alanine | P49588 | 71478163 - 71508567 bp (71.5 Mb) on chromosome 16; 75130215 - 75132228 bp (75.1 Mb) on chromosome 16; 75140359 - 75142388 bp (75.1 Mb) on chromosome 16; 75149404 - 75152498 bp (75.1 Mb) on chromosome 16; 71693865 - 71739976 bp (71.7 Mb) on chromosome 16 | 16q22 | 1; 20 5 4 3 2 | 601065 |
Aspartic acid | P14868 | | . | 16 | . |
Asparagine | O43776 | 57507702 - 57508757 bp (57.5 Mb) on chromosome 18 | 18q21.2-q21.3 | 14; 2 | 108410 |
Glycine | P41250 | 55930988 - 55931263 bp (55.9 Mb) on chromosome 17 | 7p15 | 17; 1 | 600287 |
Histidine | P12081 | 144001946 - 144019718 bp (144.0 Mb) on chromosome 5; 143855521 - 143873309 bp (143.9 Mb) on chromosome 5; 143872918 - 143880764 bp (143.9 Mb) on chromosome 5 | 5q31.3 | 13; 13 13 13 | 142810 |
Lysine | Q15046 | 69047553 - 69047897 bp (69.0 Mb) on chromosome 16 | 16q22.2-q22.3 | 14; 1 | 601421 |
Phenylalanine | O95363 | chromosome 6 | . | 3 | . |
Proline | P07814 | | 1q41-q42 | 29 | 138295 |
Serine | P49591 | chromosome 1 | . | 3 | . |
Threonine | P26639 | 99436321 - 99523695 bp (99.4 Mb) on chromosome 15 | 5p13-cen | 19; 19 | 187790 |
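The chromosomal location entries in the table give a start and an end coordinate in base pairs, followed by the position in Mb. The span covered by each entry can be derived from the two coordinates. The following is a minimal sketch, assuming the coordinates are inclusive and written in the "start - end bp (... Mb) on chromosome N" form used in the table; the function name and the returned field names are hypothetical.

```python
import re

# Matches entries such as "228075312 - 228140152 bp (228.1 Mb) on chromosome 1".
# Both a plain hyphen and an en dash are accepted between the coordinates.
LOCATION = re.compile(r"(\d+)\s*[-\u2013]\s*(\d+)\s*bp.*?chromosome\s+(\w+)")


def parse_locations(cell):
    """Parse every coordinate entry found in one table cell."""
    entries = []
    for match in LOCATION.finditer(cell):
        start, end = int(match.group(1)), int(match.group(2))
        entries.append(
            {
                "chromosome": match.group(3),
                "start": start,
                "end": end,
                # Inclusive coordinates are an assumption of this sketch.
                "span_bp": end - start + 1,
                "position_mb": round(start / 1e6, 1),
            }
        )
    return entries


# Entry for the glutamyl-prolyl-tRNA synthetase gene, copied from the table.
for entry in parse_locations("228075312 - 228140152 bp (228.1 Mb) on chromosome 1"):
    print(entry["chromosome"], entry["position_mb"], "Mb, span", entry["span_bp"], "bp")
```

Under these assumptions the glutamyl-prolyl entry spans 64841 bp and starts near position 228.1 Mb on chromosome 1.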
Looking at this table, we can see that the glutamyl-tRNA synthetase gene
is the same as the prolyl-tRNA synthetase gene. This gene is located on chromosome
1 and codes for a bifunctional enzyme. An analysis of its protein domains
is given below.
The E-P-tRNA synthetase: a Bifunctional enzyme
Aminoacyl-tRNA synthetases are a class of enzymes that charge tRNAs with their cognate amino acids. The genome sequences of certain organisms do not contain recognizable prolyl-tRNA synthetases, which are essential for messenger RNA-encoded protein synthesis. However, they contain an enzyme able to provide two aminoacyl-tRNA synthetase activities. The glutamyl-prolyl-tRNA synthetase is a bifunctional enzyme that aminoacylates its cognate tRNAs with glutamate or proline.
In humans, glutamyl-tRNA synthetase (GluRS) and prolyl-tRNA synthetase
(ProRS) activities are contained within a single polypeptide chain, even
though these enzymes belong to different classes and are thought to have
evolved along independent evolutionary pathways. Glutamyl-prolyl-tRNA synthetase
is made up of 1,440 amino acids encoded by 29 exons. The exons encoding
the glutamyl-specific and prolyl-specific parts of the enzyme are clustered
at opposite ends of the gene, separated by a long intervening DNA section
with a number of exons which encode functions that may be involved in the
organization of the mammalian multienzyme synthetase complex.
WHEP-TRS domain signature
A conserved domain of 46 amino acids, called WHEP-TRS, has been shown [2] to exist in a number of higher eukaryote aminoacyl-transfer RNA synthetases. This domain is present one to six times in these enzymes. There are three copies in the mammalian multifunctional aminoacyl-tRNA synthetase, in a region that separates the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain, and six copies in the intercatalytic region of the Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is found at the N-terminal extremity of the mammalian tryptophanyl-tRNA synthetase and histidyl-tRNA synthetase, and of the mammalian, insect, nematode and plant glycyl-tRNA synthetases. This domain could contain a central alpha-helical region and may play a role in the association of tRNA synthetases into multienzyme complexes.
The emergence of a multifunctional synthetase by a gene fusion event seems to be a specific, but general attribute of all higher eukaryotic cells. This type of structural organization, in relation to the occurrence of multisynthetase complexes, could be a mechanism to integrate several catalytic domains within the same particle.
The consensus pattern based on the first 29 positions of the WHEP-domain
is described below:
Consensus pattern | [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K |
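A consensus pattern written in this notation can be turned into a regular expression and used to scan protein sequences. The following is a minimal sketch that handles only the elements appearing in the pattern above (single residues, bracketed alternatives, `x`, and a fixed repeat count); other elements of the notation, such as excluded residues in braces or variable-length repeats, are deliberately not supported. The function name and the synthetic test sequence are hypothetical.

```python
import re

WHEP_PATTERN = (
    "[QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-"
    "[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K"
)

# One pattern element: x, a single residue or a bracketed set, with an
# optional fixed repeat count in parentheses.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?")


def pattern_to_regex(pattern):
    """Translate a simple consensus pattern into a regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = match.groups()
        token = "." if residue == "x" else residue
        if count is not None:
            token += "{" + count + "}"
        parts.append(token)
    return "".join(parts)


whep_regex = re.compile(pattern_to_regex(WHEP_PATTERN))

# Synthetic 29-residue sequence built to satisfy the pattern; not a real protein.
synthetic = "QGDALK" + "AA" + "K" + "AA" + "KA" + "AAAA" + "LD" + "AA" + "I" + "AA" + "L" + "AAA" + "K"
print(whep_regex.pattern)
print(bool(whep_regex.fullmatch(synthetic)))
```

Scanning a full-length sequence would use `whep_regex.finditer(sequence)` to report every region that matches the first 29 positions of the domain.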
Exon structure
Analysis of the exon structure of the aminoacyl-tRNA synthetase genes shows
a high degree of difference between them. This is evidence of their distant
common ancestor. The number of exons covers a wide range, from 1 to 34 exons.
Human aminoacyl-tRNA synthetases | |
---|---|
Class I | Class II |
Arginine | Alanine |
Cysteine | Aspartic acid |
Glutamic acid | Asparagine |
Glutamine | Glycine |
Isoleucine | Histidine |
Leucine | Lysine |
Methionine | Phenylalanine |
Tyrosine | Proline |
Tryptophan | Serine |
Valine | Threonine |
Nucleotide variability in humans (SNPs):
After searching OMIM and the SNP database at NCBI for diseases or mutation
rates associated with aminoacyl-tRNA synthetases, we found no disease directly
involving these proteins. This may be surprising at first. However, it is important
to realise that aminoacyl-tRNA synthetases appeared at the origin of life,
and that their role in the establishment of the genetic code is so important
that they are present in every organism (from prokaryotes and archaea to
humans). It is therefore likely that mutations disrupting these enzymes are
incompatible with life or, at least, with "life" as we understand it.
Phylogenetic analysis
Our results do not explain why WRS and YRS are
distinguished from all other synthetases.
A possible explanation would be that, after the
appearance of the ancestor eukaryotic cell, either one of the genes encoding
the primitive YRS or WRS was lost in the eukaryotic branch. This could
have been achieved by the replacement of the lost gene by a duplicated
allele of the other gene. This theory would require a further explanation
on the process by which a functional and essential enzyme is replaced by
the duplication of another, functionally distinct, enzyme.
Alternatively, a single ancestral enzyme of YRS and WRS may have been able to interact with both amino acids and attach them selectively to their respective cognate tRNAs. This ancestor could have remained functional and, after the separation of prokaryotes from eukaryotes, have duplicated independently in both branches. The caveat of this theory is that it requires an improbable double duplication and divergence event.
Both scenarios, however, suggest a late existence
of a highly dynamic genetic expression machinery which, at the time of
the eukaryote-prokaryote divergence, was still capable of undergoing changes
in its essential components.
Studying human genes that code for aminoacyl-tRNA synthetases
From the study of the chromosomal location and
exon structure of human aminoacyl-tRNA synthetase genes, some conclusions
can be drawn. The most evident one is the high diversity among the
members of this family, which is due to the long time since they appeared
and since the different members diverged. Secondly, in those cases
where more than one gene codes for one of these enzymes,
the number of copies is even (2, 4, 6). This could support an origin
of these copies by successive duplications during evolution. For histidyl-tRNA
synthetase there are four copies of the same gene, all of them on chromosome
5. These duplications could have appeared about 990 million years ago, after the
separation between Drosophila and Primates, because in the Drosophila
genome there is just one copy.
1. Bacardit M., Coll M., Gabernet N. Hostes vingueren i a sorgir ens empenyeren! Origen endosimbiòtic dels mitocondris i arbre sense arrel de tots els organismes vius, a partir de les aminoacil t-RNA sintetases. Pràctiques d’evolució 2001.
2. Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M. 1991. A component of the multisynthetase complex is a multifunctional aminoacyl-tRNA synthetase. EMBO J Dec;10(13):4267-77.
3. Eriani G., Delarue M., Poch O., Gangloff J., and Moras D. 1990. Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347:203-206.
4. Ribas de Pouplana, Ll., Frugier M., Quinn C., and Schimmel P. 1996. Evidence that two present-day components needed for the genetic code appeared after nucleated cells separated from eubacteria. Proc. Natl. Acad. Sci. USA 93:166-170.
5. Woese C., Olsen G.J., Ibba M., Söll D. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiology and Molecular Biology Reviews 64:202-236.
For any comments or suggestions, send an email to:
merce.bacardit01@campus.upf.edu
montserrat.coll02@campus.upf.edu
nuria.gabernet01@campus.upf.edu