RESULTS

The aim of this study is to find introns orthologous to some known U12 introns of Homo sapiens and to study their evolution, looking at phenomena such as intron conservation, possible subtype switching, U12-to-U2 conversion and intron loss. The first step towards this goal was to search for the orthologous genes in the chosen species (see Materials and Methods). Most of them were found, except for the orthologs of NHE6 in A. mellifera and C. elegans and the orthologs of KIFAP3 in A. thaliana and S. cerevisiae, as shown in Table 1.

The genes orthologous to the human ones were scanned in 12 other species (Table 1) to find introns homologous to the human U12 introns. The complete results for each intron, together with an explanation of the files that contain them, are available separately; a summary is shown in Tables 2-7. The results differ between introns: some introns were found in more species than others, and the features shared by homologous introns also vary from intron to intron. The description that follows first presents the general results, then considers each gene in turn, and finally presents the results of the analysis of paralogous genes.

No orthologous introns were found in C. elegans or S. cerevisiae, consistent with previous studies [3]. None were found in insects either, although the flanking regions of KIFAP3 intron 16-17 aligned with the orthologous genes of A. gambiae and D. melanogaster with a coverage of 97.5%, that is, covering almost the whole query sequence. All the orthologous introns found were U12-type, and no subtype switching or U12-to-U2 conversion was found in the present study. Most of the orthologous introns have the same terminal nucleotides as the human intron. One exception is the mouse intron homologous to ERCC5 intron 13-14, which is AT-AC, whereas the human (and the chicken) intron has the less common AT-AT pair (Table 6). The alignments of the donor, acceptor and branch-site sequences show that they are quite similar, especially between closely related species. The position of the branch point is sometimes identical between species, but it usually varies slightly (Tables 2-7). It lies between -10 and -20, which is characteristic of U12 introns, as explained in the introduction; the only exception is the second intron of NHE6 in rat, at -21.
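As a quick consistency check of the -10 to -20 window, the sketch below transcribes the branch-point positions from Tables 2-7 into a Python dictionary, flags every position outside that window and reports the spread of positions for each intron. The dictionary layout and the names `POSITIONS`, `outside_window` and `spread` are illustrative assumptions, not part of the original analysis pipeline.

```python
# Branch-point positions transcribed from Tables 2-7. Negative values count
# nucleotides upstream of the acceptor site, as in the tables.
POSITIONS = {
    "NHE6 7-8": {"Homo": -10, "Mus": -10, "Rattus": -10, "Gallus": -12,
                 "Xenopus": -12, "Fugu": -14, "Tetraodon": -14},
    "NHE6 10-11": {"Homo": -18, "Mus": -18, "Rattus": -21, "Gallus": -16,
                   "Xenopus": -17, "Fugu": -18, "Tetraodon": -18},
    "NHE6 14-15": {"Homo": -10, "Mus": -15, "Rattus": -15, "Gallus": -13,
                   "Xenopus": -10, "Fugu": -17, "Tetraodon": -17},
    "ERCC5 1-2": {"Homo": -13, "Mus": -14, "Rattus": -14, "Gallus": -13,
                  "Xenopus": -13, "Fugu": -15, "Tetraodon": -15},
    "ERCC5 13-14": {"Homo": -11, "Mus": -11, "Gallus": -11},
    "KIFAP3 16-17": {"Homo": -15, "Mus": -15, "Rattus": -15,
                     "Gallus": -15, "Tetraodon": -15},
}

# Window described in the text as characteristic of U12 introns.
U12_WINDOW = (-20, -10)


def outside_window(positions, window=U12_WINDOW):
    """Return (intron, species, position) for each branch point outside the window."""
    low, high = window
    return [
        (intron, species, pos)
        for intron, by_species in positions.items()
        for species, pos in by_species.items()
        if not low <= pos <= high
    ]


def spread(positions):
    """Return, for each intron, the distance between its two most extreme branch points."""
    return {intron: max(by_species.values()) - min(by_species.values())
            for intron, by_species in positions.items()}


if __name__ == "__main__":
    print(outside_window(POSITIONS))
    print(spread(POSITIONS))
```

On these transcribed values, `outside_window` returns only the rat NHE6 intron 10-11 (-21), and `spread` is largest for NHE6 intron 14-15 (7 positions), in line with the observation that NHE6 shows the greatest variation in branch-point distance.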

The alignments of the donor, acceptor and branch-site sequences show which nucleotides of these signals are most likely to change. At the donor site, the least conserved nucleotides are the last nucleotides of the exon and the intronic nucleotides from position +9 onwards. At the acceptor site, the most conserved nucleotides are the last two of the intron and the first two of the exon. The branch site is well conserved over the four nucleotides preceding the 'A' taken as the branch point and over the nucleotide immediately downstream of it. Interestingly, the nucleotide immediately upstream of this 'A' is a 'T' in mouse, rat and chicken in ERCC5 intron 1-2, whereas according to the logo in the introduction this position can be an 'A' or a 'G'. The third nucleotide from the end of the branch-site sequences shown in Tables 2-10 is the one taken as the branch point. It is an 'A' in most known U12 introns, but three exceptions were found in the present study: the chicken intron homologous to NHE6 intron 14-15 ('C'), the Tetraodon intron homologous to KIFAP3 intron 16-17 ('G') and the mouse NHE7 intron homologous to NHE6 intron 10-11 ('T').
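The convention used in the tables (branch point = third nucleotide from the end of the branch-site sequence) is simple enough to apply mechanically. The sketch below applies it to a small subset of rows transcribed from Tables 4, 5, 7 and 9, reporting the branch-point nucleotide, the nucleotide immediately upstream of it, and whether the branch point is an 'A'. The helper names and the choice of rows are illustrative assumptions.

```python
# Convention from the table captions: the branch point is the third nucleotide
# counted from the end of the reported branch-site sequence.
def branch_point(site: str) -> str:
    """Return the nucleotide taken as the branch point."""
    if len(site) < 4:
        raise ValueError(f"branch-site sequence too short: {site!r}")
    return site[-3]


def upstream_of_branch_point(site: str) -> str:
    """Return the nucleotide immediately upstream of the branch point."""
    if len(site) < 4:
        raise ValueError(f"branch-site sequence too short: {site!r}")
    return site[-4]


# A subset of rows transcribed from Tables 4, 5, 7 and 9.
ROWS = [
    ("Homo", "ERCC5 1-2", "TTTTCCATTAACA"),
    ("Mus", "ERCC5 1-2", "GTTTTCCTTTACT"),
    ("Rattus", "ERCC5 1-2", "GTTTTCCTTTACC"),
    ("Gallus", "ERCC5 1-2", "TCTCCTCTTACC"),
    ("Gallus", "NHE6 14-15", "TTCTTCCTTACCT"),
    ("Tetraodon", "KIFAP3 16-17", "TCTTTCCTTAGTG"),
    ("Mus", "NHE7 10-11", "TGGCTCCTTATCA"),
]

if __name__ == "__main__":
    for species, intron, site in ROWS:
        bp = branch_point(site)
        note = "" if bp == "A" else "  <- branch point is not an A"
        print(f"{species:<10} {intron:<13} upstream={upstream_of_branch_point(site)} "
              f"branch point={bp}{note}")
```

For ERCC5 intron 1-2 this reports an upstream 'A' in human and a 'T' in mouse, rat and chicken, and it flags the chicken NHE6, Tetraodon KIFAP3 and mouse NHE7 rows as the non-'A' cases described above.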

For NHE6, the results are highly similar for its three introns (Tables 2-4), which were found in all the vertebrates analysed (mouse, rat, chicken, frog and both pufferfishes). The most variable feature is the distance between the branch point and the acceptor site, which for this gene shows the largest variation observed in the present study. Interestingly, geneid does not predict a donor site for T. nigroviridis in intron 7-8, although this sequence is very similar to the donor site of F. rubripes and to the other donor sites (Figure 2; see also the NHE6 intron 7-8 results). The difference lies at position +6, where it has a 'T', whereas this position is usually a 'C' according to current knowledge of U12 introns (as can be seen in the logo presented in the introduction). This point is discussed later.

musnhe6     TTTA|ATATCCTTT
ratnhe6     TTTA|ATATCCTTT
galnhe6     TTTA|ATATCCTTT
xennhe6     ACTG|ATATCCTTC
fugunhe6    CATC|ATATCCTTT
tetranhe6   TATC|ATATCTTTT

Figure 2. Alignment of donor sites for intron 7-8 of NHE6.
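The position +6 difference can be located by comparing each donor site of Figure 2 with the column-by-column majority of the six sequences. The sketch below does this in Python. It is only a sequence comparison under that assumption: the majority consensus is computed from these six sites, not from the U12 logo, and it does not reproduce geneid's scoring. The helper names are hypothetical.

```python
from collections import Counter

# Donor sites from Figure 2 (exon|intron), keyed by genus.
DONORS = {
    "Mus": "TTTA|ATATCCTTT",
    "Rattus": "TTTA|ATATCCTTT",
    "Gallus": "TTTA|ATATCCTTT",
    "Xenopus": "ACTG|ATATCCTTC",
    "Fugu": "CATC|ATATCCTTT",
    "Tetraodon": "TATC|ATATCTTTT",
}


def intron_part(site: str) -> str:
    """Return the intronic side of an 'exon|intron' donor site."""
    return site.split("|", 1)[1]


def column_consensus(seqs) -> str:
    """Majority nucleotide at each column of equal-length sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def deviations(site: str, consensus: str):
    """List (intron position, consensus nt, observed nt) wherever the site differs.

    Positions are numbered +1, +2, ... from the first intronic nucleotide.
    """
    return [
        (pos, expected, observed)
        for pos, (expected, observed) in enumerate(zip(consensus, intron_part(site)), start=1)
        if expected != observed
    ]


if __name__ == "__main__":
    consensus = column_consensus(intron_part(s) for s in DONORS.values())
    print("consensus:", consensus)
    for genus, site in DONORS.items():
        print(genus, deviations(site, consensus))
```

The intronic consensus of these six sites is ATATCCTTT; the only Tetraodon deviation is a 'T' for 'C' at +6, while the Xenopus site differs only at +9, a position already noted above as poorly conserved.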

The results for ERCC5 (Tables 5-6) are less similar between its two U12 introns. For the first one, orthologous introns were found in all the vertebrates searched, as with NHE6. The distance between the branch point and the acceptor site is 13 in H. sapiens and ranges from 13 to 15 in the other species, being identical in closely related species: 14 in mouse and rat, and 15 in both pufferfishes. This intron is the first of the gene, and all the orthologous introns except the murine one start at position 89. The second U12 intron has AT-AT termini in human. Orthologous introns were found only in mouse and chicken, the former with AT-AC terminal nucleotides and the latter with AT-AT, like the human intron. The branch point is at the same position (-11) in all three introns.

For KIFAP3 (Table 7), no homologous introns were found for the intron between exons 7 and 8, which in human is a U12-type intron with AT-AC terminal nucleotides. Exonerate did find alignments, but with the query on the negative strand, so exonerate was run again reporting the three best alignments instead of only the best one. Two alignments were found for three species (mouse, rat and frog), the second one with the query on the positive strand, but the putative introns were very long, spanning almost the whole gene, and geneid found only a donor site, in the X. tropicalis sequence (data not shown). In this alignment, the region upstream of the intron aligns in the correct frame, but the downstream region does not.

The second U12-type intron of this gene also gave interesting results (see the KIFAP3 intron 16-17 results). Homologous introns were found in mouse, rat, chicken and Tetraodon, all with the branch point at the same position. The chicken intron found with exonerate starts with TA and has no donor-site score. The alignment of the donor sites suggests that the chicken donor site may be shifted by one nucleotide (Figure 3). The shifted donor site is predicted by geneid as a U12-type donor site with a score of 0.32, higher than the U2-type score (-2.04), although still low. The exonerate alignment also shows that the chicken sequence has a stop codon in the exonic sequence upstream of the intron; possible explanations are discussed in the next section.

              Alignment found     Possible true donor site
muskifap3     ACAC|GTATCCTTT      ACAC|GTATCCTTT
ratkifap3     ACAC|GTATCCTTT      ACAC|GTATCCTTT
galkifap3     CTCG|TATTTTTTA      CTC|GTATTTTTT
tetrakifap3   ACCC|GTATCCTGA      ACCC|GTATCCTGA

Figure 3. Alignment of donor sites for intron 16-17 of KIFAP3. Comparison between the alignment found and the alignment with the possible true donor site for the chicken sequence.
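The one-nucleotide shift in the chicken sequence can be illustrated by sliding the exon|intron boundary along the Figure 3 sequence and counting identities with the mouse donor site at each candidate boundary. This is a deliberately simple identity count chosen for illustration, not the position weight matrix scoring that geneid uses, so it does not reproduce the 0.32 and -2.04 scores; the names below are hypothetical.

```python
# Sequences from Figure 3. The chicken sequence is the galkifap3 row with the
# '|' removed; the reference is the intronic part of the mouse donor site.
CHICKEN = "CTCGTATTTTTTA"
MOUSE_DONOR = "GTATCCTTT"
ORIGINAL_BOUNDARY = 4  # exonerate placed the boundary after "CTCG"


def identity(a: str, b: str) -> int:
    """Number of identical nucleotides in a position-by-position comparison."""
    return sum(x == y for x, y in zip(a, b))


def best_boundary(seq: str, reference: str):
    """Try every exon|intron boundary that leaves len(reference) intronic
    nucleotides; return (boundary index, identity) with the highest identity."""
    n = len(reference)
    candidates = [(b, identity(seq[b:b + n], reference))
                  for b in range(len(seq) - n + 1)]
    return max(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    end = ORIGINAL_BOUNDARY + len(MOUSE_DONOR)
    original = identity(CHICKEN[ORIGINAL_BOUNDARY:end], MOUSE_DONOR)
    b, score = best_boundary(CHICKEN, MOUSE_DONOR)
    print(f"original  {CHICKEN[:ORIGINAL_BOUNDARY]}|{CHICKEN[ORIGINAL_BOUNDARY:]}  identity={original}")
    print(f"best      {CHICKEN[:b]}|{CHICKEN[b:]}  identity={score}")
```

The boundary found by exonerate (CTCG|TATT...) shares 3 of 9 nucleotides with the mouse donor, whereas moving it one nucleotide upstream (CTC|GTAT...) raises this to 7 of 9, which matches the shift shown in Figure 3.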

For this intron, as noted above, alignments with high coverage (97.5%) were also found with the orthologous genes of A. gambiae and D. melanogaster.

The search for homologous introns in selected paralogous genes (see Materials and Methods) was less successful: only one of the genes scanned, NHE7, was found to contain homologous introns. NHE7 has three U12 introns homologous to those of NHE6 in all three species considered in this analysis: human, mouse and chicken (Tables 8-10). All of them have the same terminal nucleotides as the corresponding U12 introns of NHE6.
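The paralog comparison in Tables 8-10 can be checked in the same way. The sketch below transcribes the NHE6/NHE7 pairs, tests whether the terminal dinucleotides agree in every pair and computes the shift in branch-point position between paralogs (NHE7 minus NHE6). The tuple layout and function names are illustrative assumptions.

```python
# Rows transcribed from Tables 8-10:
# (species, intron, NHE6 termini, NHE6 position, NHE7 termini, NHE7 position)
PAIRS = [
    ("Homo", "7-8", "AT-AC", -10, "AT-AC", -13),
    ("Mus", "7-8", "AT-AC", -10, "AT-AC", -13),
    ("Gallus", "7-8", "AT-AC", -12, "AT-AC", -14),
    ("Homo", "10-11", "GT-AG", -18, "GT-AG", -20),
    ("Mus", "10-11", "GT-AG", -18, "GT-AG", -15),
    ("Gallus", "10-11", "GT-AG", -16, "GT-AG", -18),
    ("Homo", "14-15", "GT-AG", -10, "GT-AG", -15),
    ("Mus", "14-15", "GT-AG", -15, "GT-AG", -15),
    ("Gallus", "14-15", "GT-AG", -13, "GT-AG", -11),
]


def termini_conserved(pairs) -> bool:
    """True if every NHE7 intron has the same termini as its NHE6 counterpart."""
    return all(t6 == t7 for _, _, t6, _, t7, _ in pairs)


def position_shifts(pairs):
    """NHE7 minus NHE6 branch-point position for each (species, intron)."""
    return {(sp, intron): p7 - p6 for sp, intron, _, p6, _, p7 in pairs}


if __name__ == "__main__":
    print("termini conserved:", termini_conserved(PAIRS))
    for key, shift in position_shifts(PAIRS).items():
        print(key, shift)
```

On these rows the termini agree in all nine pairs, while the branch-point shift between paralogs ranges from -5 (human, intron 14-15) to +3 (mouse, intron 10-11).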



Orthologous genes

Species                   NHE6     ERCC5    KIFAP3
Homo sapiens              yes (3)  yes (2)  yes (2)
Mus musculus              yes (3)  yes (2)  yes (1)
Rattus norvegicus         yes (3)  yes (1)  yes (1)
Gallus gallus             yes (3)  yes (2)  yes (1)
Xenopus tropicalis        yes (3)  yes (1)  yes (0)
Fugu rubripes             yes (3)  yes (1)  yes (0)
Tetraodon nigroviridis    yes (3)  yes (1)  yes (1)
Anopheles gambiae         yes (0)  yes (0)  yes (0)
Drosophila melanogaster   yes (0)  yes (0)  yes (0)
Apis mellifera            no       yes (0)  yes (0)
Caenorhabditis elegans    no       yes (0)  yes (0)
Saccharomyces cerevisiae  yes (0)  yes (0)  no
Arabidopsis thaliana      yes (0)  yes (0)  no

Table 1. Orthologous genes found in each species. The number of homologous introns found in each gene is given in parentheses.

Species Gene Intron Termini Branch site Position
Homo NHE6 7-8 AT-AC ACTAGCCTTAACA -10
Mus NHE6 7-8 AT-AC AGCAAGCTTAACA -10
Rattus NHE6 7-8 AT-AC AGCAAGCTTAACA -10
Gallus NHE6 7-8 AT-AC GTTGTCCTTAACA -12
Xenopus NHE6 7-8 AT-AC TTAATTCTTGACC -12
Fugu NHE6 7-8 AT-AC GTTTTGTTTAACC -14
Tetraodon NHE6 7-8 AT-AC GCGTTCTTTAACC -14

Table 2. Summary of the results for the intron between exons 7 and 8 of NHE6: the species in which the intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence).

Species Gene Intron Termini Branch site Position
Homo NHE6 10-11 GT-AG ATGATCCTTAACC -18
Mus NHE6 10-11 GT-AG ATGATCCTTAACC -18
Rattus NHE6 10-11 GT-AG ATGATCCTTAACC -21
Gallus NHE6 10-11 GT-AG AGTTTCCTTAACA -16
Xenopus NHE6 10-11 GT-AG TGTCTCTTTAACA -17
Fugu NHE6 10-11 GT-AG GGTCTCCTTAACC -18
Tetraodon NHE6 10-11 GT-AG TCTCTCCTTAACC -18

Table 3. Summary of the results for the intron between exons 10 and 11 of NHE6: the species in which the intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence).

Species Gene Intron Termini Branch site Position
Homo NHE6 14-15 GT-AG CTCTTCCTTAACC -10
Mus NHE6 14-15 GT-AG CCCTTCCTTAACC -15
Rattus NHE6 14-15 GT-AG CCCTTCCTTAACC -15
Gallus NHE6 14-15 GT-AG TTCTTCCTTACCT -13
Xenopus NHE6 14-15 GT-AG ATATTTCTTAACT -10
Fugu NHE6 14-15 GT-AG TCTGTCCTTGACT -17
Tetraodon NHE6 14-15 GT-AG TCTGTCCTTGACT -17

Table 4. Summary of the results for the intron between exons 14 and 15 of NHE6: the species in which the intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence).

Species Gene Intron Termini Branch site Position
Homo ERCC5 1-2 GT-AG TTTTCCATTAACA -13
Mus ERCC5 1-2 GT-AG GTTTTCCTTTACT -14
Rattus ERCC5 1-2 GT-AG GTTTTCCTTTACC -14
Gallus ERCC5 1-2 GT-AG TCTCCTCTTACC -13
Xenopus ERCC5 1-2 GT-AG TTTACCTTTAACT -13
Fugu ERCC5 1-2 GT-AG ATCAACCTTAACC -15
Tetraodon ERCC5 1-2 GT-AG ATCAACCCTAACC -15

Table 5. Summary of the results for the intron between exons 1 and 2 of ERCC5: the species in which the intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence).

Species Gene Intron Termini Branch site Position
Homo ERCC5 13-14 AT-AT ATAAGTCTTAACT -11
Mus ERCC5 13-14 AT-AC ATAAACCTTAACT -11
Gallus ERCC5 13-14 AT-AT ATATTCCTTAACT -11

Table 6. Summary of the results for the intron between exons 13 and 14 of ERCC5: the species in which the intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence).

Species Gene Intron Termini Branch site Position
Homo KIFAP3 16-17 GT-AG TGTTTCTTTAACC -15
Mus KIFAP3 16-17 GT-AG TGCATCTTTAACC -15
Rattus KIFAP3 16-17 GA-AT TGCATCCTTAACC -15
Gallus KIFAP3 16-17 TA-AG TTTATCCTTAACT -15
Tetraodon KIFAP3 16-17 GT-AG TCTTTCCTTAGTG -15

Table 7. Summary of the results for the intron between exons 16 and 17 of KIFAP3: the species in which the intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence).

See complete data. Orthologous genes.

Paralogous genes

Species Gene Intron Termini Branch site Position
Homo NHE6 7-8 AT-AC ACTAGCCTTAACA -10
Homo NHE7 7-8 AT-AC AGGGTCCTTGACA -13
Mus NHE6 7-8 AT-AC AGCAAGCTTAACA -10
Mus NHE7 7-8 AT-AC GCTGTCCTTAACC -13
Gallus NHE6 7-8 AT-AC GTTGTCCTTAACA -12
Gallus NHE7 7-8 AT-AC TGATCCCTTAACA -14

Table 8. Summary of the results for paralogs of NHE6, for the intron homologous to the NHE6 intron between exons 7 and 8: the species, the gene in which the homologous intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence). Each intron is compared with the corresponding NHE6 features.

Species Gene Intron Termini Branch site Position
Homo NHE6 10-11 GT-AG ATGATCCTTAACC -18
Homo NHE7 10-11 GT-AG TGGTCCCTTAACC -20
Mus NHE6 10-11 GT-AG ATGATCCTTAACC -18
Mus NHE7 10-11 GT-AG TGGCTCCTTATCA -15
Gallus NHE6 10-11 GT-AG AGTTTCCTTAACA -16
Gallus NHE7 10-11 GT-AG GAAATCCTTAACC -18

Table 9. Summary of the results for paralogs of NHE6, for the intron homologous to the NHE6 intron between exons 10 and 11: the species, the gene in which the homologous intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence). Each intron is compared with the corresponding NHE6 features.

Species Gene Intron Termini Branch site Position
Homo NHE6 14-15 GT-AG CTCTTCCTTAACC -10
Homo NHE7 14-15 GT-AG TTTTTCCTTAATT -15
Mus NHE6 14-15 GT-AG CCCTTCCTTAACC -15
Mus NHE7 14-15 GT-AG TTTTTCCTTAATT -15
Gallus NHE6 14-15 GT-AG TTCTTCCTTACCT -13
Gallus NHE7 14-15 GT-AG GTTTTCCTTAACT -11

Table 10. Summary of the results for paralogs of NHE6, for the intron homologous to the NHE6 intron between exons 14 and 15: the species, the gene in which the homologous intron was found, the intron termini, the branch-site sequence and the position of the branch point (taken as the third nucleotide from the end of the branch-site sequence). Each intron is compared with the corresponding NHE6 features.

See complete data. Paralogous genes.