a) Selenoproteins:
We used the human selenoproteins and the selenoprotein homologs (with cysteine instead of selenocysteine) as queries. For each query, we first used tblastn to find genomic regions encoding similar sequences. All the blast hits on the same scaffold were merged, so that gene prediction with exonerate and GeneWise was performed between a given query and one subsequence per scaffold, spanning the regions with blast hits. Whenever a prediction was obtained, we aligned it to the original query with T-Coffee. Because hits can fall on several scaffolds, a given query often yielded more than one prediction. We assume that one of them most likely corresponds to the actual protein, whereas the others might correspond to other proteins with similarity to the query, to pseudogenes or to duplications.
To select the prediction that most likely corresponds to the protein, we manually inspected the T-Coffee alignments, the exonerate and GeneWise predictions, the SECIS predictions and the results of the blast for the human SECIS elements. Ideally, the prediction most likely representing the actual protein exhibits high homology to the human query (including the alignment of the cysteine/selenocysteine in the query with a TGA codon or a cysteine), starts with methionine and contains neither frameshifts nor in-frame stop codons (besides those corresponding to selenocysteine). If the protein is predicted to have selenocysteine, it should have a predicted SECIS element relatively close to the end of the predicted sequence, overlapping with the region exhibiting similarity to the human SECIS element (if the human orthologue also contains selenocysteine). Finally, Selenoprofiles should also predict the same protein.
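The merging step can be illustrated with a minimal sketch. It assumes the tblastn hits have already been parsed into (scaffold, start, end) tuples; the function name, the `flank` padding and the toy coordinates are hypothetical and are not taken from the original pipeline.

```python
def merge_hits_by_scaffold(hits, flank=0):
    """Collapse all hits on a scaffold into one (start, end) window.

    hits:  iterable of (scaffold, start, end), 1-based inclusive;
           start may exceed end for reverse-strand hits.
    flank: hypothetical padding added on each side (an assumption,
           the text does not state whether any padding was used).
    """
    windows = {}
    for scaffold, start, end in hits:
        lo, hi = min(start, end), max(start, end)
        if scaffold in windows:
            cur_lo, cur_hi = windows[scaffold]
            windows[scaffold] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            windows[scaffold] = (lo, hi)
    # Pad each window, never going below position 1.
    return {s: (max(1, lo - flank), hi + flank)
            for s, (lo, hi) in windows.items()}


# Toy coordinates for illustration only.
toy_hits = [("scafA", 100, 200), ("scafA", 900, 700), ("scafB", 50, 80)]
merged = merge_hits_by_scaffold(toy_hits)
```

Each resulting window would then be the subsequence passed to exonerate and GeneWise together with the query.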
If a protein with these characteristics was found, other regions predicted to encode similar proteins and not overlapping with the regions predicted for other selenoproteins were considered to be pseudogenes (when containing frameshifts and in-frame stop codons not corresponding to selenocysteine).
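The selection criteria can be expressed as a rough first-pass filter. This is only a sketch: the actual selection was done by manual inspection, and the encoding of the predicted protein as a string with `U` for selenocysteine and `*` for a stop codon, as well as the category labels, are assumptions made for the example.

```python
def classify_prediction(protein, has_frameshift=False):
    """Rough triage of a predicted protein sequence.

    protein:        predicted sequence; 'U' marks a TGA read as Sec,
                    '*' marks a stop codon (assumed encoding).
    has_frameshift: whether the gene predictor reported a frameshift.
    """
    body = protein.rstrip("*")  # trailing stop(s) are expected, ignore them
    has_internal_stop = "*" in body
    if has_frameshift or has_internal_stop:
        return "possible pseudogene"
    if not body.startswith("M"):
        return "incomplete (no start methionine)"
    return "candidate gene"


# Toy sequences for illustration only.
print(classify_prediction("MACUGK*"))   # candidate gene
print(classify_prediction("MAC*GK*"))   # possible pseudogene
```

A filter like this only flags candidates; homology to the query, the SECIS evidence and the Selenoprofiles result still have to be weighed by hand.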
We also searched for non-mammalian selenoproteins: Fep15, SelL, SelJ and Dsba. As expected, no significant blast hits were found so no further analysis was done.
b) Machinery:
Selenoprotein synthesis requires a specific machinery (a set of proteins and the tRNA for Sec). If our predictions are correct and the Bolivian squirrel monkey genome does contain the genes for the selenoproteins we predict, we would also expect it to contain the genes encoding the machinery. Indeed, our results suggest we identified all the genes encoding the machinery proteins. The analysis was done using the same criteria as for the selenoproteins except for the SECIS analysis, which is not necessary in non-selenoproteins (SPS2, which is a selenoprotein, was analysed as such).
In the following table we show, for each query, whether the S. boliviensis orthologue contains Sec, the tblastn output, the exonerate and GeneWise prediction for the protein, the T-Coffee alignment for the best prediction (if they differ), the SECIS element prediction by SECISearch, the blast output for the human SECIS element, and the Selenoprofiles prediction.
We also checked the presence of the genes encoding the machinery proteins (see above for SPS1 and SPS2):
Protein | Sec | BLAST | Exonerate | Genewise | T-coffee | Location |
---|---|---|---|---|---|---|
Glutathione peroxidase 1 (GPx1)
According to our predictions, GPx1 is located between positions 8,939,854 and 8,940,562 on scaffold gi|357434173|gb|JH378190.1|, on the forward strand.
A protein with high homology to the C-terminus of the human protein is predicted, and exonerate and genewise give the same prediction. However, neither program was able to give a prediction for the first 50 amino acids of the protein, including the selenocysteine/cysteine residue. In fact, this genomic region has not been properly sequenced and appears as a run of Ns in the sequence. Nevertheless, a SECIS element is predicted between positions 8,940,598 and 8,940,689, which is less than 50 nucleotides after the end of the last predicted exon. The best blast hit for the human GPx1 SECIS element also falls in this region. Furthermore, Selenoprofiles predicted the same protein and SECIS element. These data strongly suggest that this region codes for GPx1. The presence of a SECIS element suggests it contains selenocysteine, as in humans. However, only proper sequencing of the region can confirm this, because we found cysteine homologues of other selenoproteins that seem to conserve the SECIS element.
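The distance between the end of the coding sequence and the SECIS element can be checked with a small helper. This is a minimal sketch; the function name and the strand encoding (`+` or `-`) are assumptions, while the coordinates in the example are the ones reported on this page.

```python
def secis_distance(cds_start, cds_end, secis_a, secis_b, strand):
    """Gap (in nucleotides) between the CDS end and the nearest SECIS edge.

    Coordinates are scaffold positions in any order. A negative value
    means the SECIS element is not downstream of the coding sequence.
    """
    cds_lo, cds_hi = sorted((cds_start, cds_end))
    sec_lo, sec_hi = sorted((secis_a, secis_b))
    if strand == "+":
        return sec_lo - cds_hi
    if strand == "-":
        return cds_lo - sec_hi
    raise ValueError("strand must be '+' or '-'")


# GPx1, forward strand: 36 nucleotides, i.e. less than 50.
print(secis_distance(8_939_854, 8_940_562, 8_940_598, 8_940_689, "+"))
# GPx2, reverse strand: the SECIS lies at lower scaffold coordinates.
print(secis_distance(22_959_043, 22_963_268, 22_958_847, 22_958_753, "-"))
```

On the reverse strand "downstream" means lower scaffold coordinates, which is why the two strands are handled separately.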
We could also predict proteins with homology to GPx1 from other genomic regions. We found homology with the regions that we predict to code for the other members of the GPx family, which is not surprising because they have sequence similarity (see the human GPx family alignment). We also found homology with the region that is predicted to encode DI1.
Finally, we found three additional regions with homology, which might be GPx1 pseudogenes due to the presence of frameshifts and in-frame stop codons. We also predicted sequences from these regions with similarity to other GPx family members. However, we hypothesize that they correspond to GPx1 pseudogenes because the blast for the SECIS element of human GPx1 gave significant hits less than 100 nucleotides after the predicted coding sequences.
Glutathione peroxidase 2 (GPx2)
We predict that GPx2 is located on scaffold gi|357434248|gb|JH378115.1, between positions 22,959,043 and 22,963,268, on the reverse strand. The gene is predicted to have 2 exons.
Both exonerate and genewise predict a protein with a very high homology (identical except for 2 amino acids, across all the protein sequence) to the human GPx2. This protein is predicted to contain selenocysteine. Consistent with this, a SECIS element is predicted between positions 22,958,847 and 22,958,753, the region where we obtained the only blast hit for the human GPx2 SECIS element. Selenoprofiles gives the same prediction.
As for the other GPx family members, we could also predict proteins with homology to GPx2 from regions that encode other GPx family members, the regions we hypothesise might be GPx1 pseudogenes and the region which is predicted to encode DI1.
Glutathione peroxidase 3 (GPx3)
We predict that GPx3 is located between positions 9,811,252 and 9,818,706 on scaffold gi|357434197|gb|JH378166.1, on the forward strand. The gene is predicted to have 5 exons.
Both exonerate and genewise predict a protein with a very high homology to the human GPx3. The Saimiri boliviensis protein seems to be 6 amino acids shorter than the human orthologue due to a premature stop codon (TAG). Interestingly, the exonerate prediction extends past this stop codon and shows that the sequence afterwards is still conserved. The protein is predicted to contain selenocysteine. Consistent with this, a SECIS element is predicted about 600 nucleotides downstream of the end of the coding sequence, between positions 9,819,345 and 9,819,439, in the same region where we obtained the only blast hit for the human GPx3 SECIS element. However, Selenoprofiles predicted GPx6 in this region. This is not surprising given the similarity between GPx3 and GPx6 (see the alignment between human GPx3 and GPx6). Nevertheless, we think that it is GPx3 for two reasons: the homology is lower when the protein is predicted from this region with GPx6 as query than with GPx3 as query (see the T-Coffee alignment between the protein predicted by genewise using GPx6 as query and human GPx6), and this region contains the only sequence highly similar to the human GPx3 SECIS element.
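One way to put a number on such a comparison is the percent identity of each pairwise alignment. The sketch below assumes two aligned strings of equal length with `-` as the gap character and counts identity over ungapped columns only, which is one of several possible conventions. It is not the measure used on this page, where the alignments were compared by inspection, and the sequences are toy examples.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment for illustration only: 3 of 4 ungapped columns match.
print(percent_identity("MK-LV", "MKALI"))  # 75.0
```

Computing this value for the prediction aligned to each query (here GPx3 and GPx6) would make the "lower homology" argument explicit.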
As for the other GPx family members, we could also predict proteins with homology to GPx3 from regions that encode other GPx family members, the regions we hypothesise might be GPx1 pseudogenes and the region which is predicted to encode DI1.
Glutathione peroxidase 4 (GPx4)
We predict GPx4 is located between positions 333,681 and 336,268 on scaffold gi|357434043|gb|JH378320.1, on the forward strand. The gene is predicted to have 7 exons.
A protein with high homology is predicted both with exonerate and genewise. It is predicted to contain selenocysteine. This is further supported by a predicted SECIS element between positions 336,304 and 336,400, the same region which is homologous to human GPx4 SECIS element. However, although Selenoprofiles also predicts a protein of the GPx family in this genomic region, it does not specify which one.
As for the other GPx family members, we could also predict proteins with homology to GPx4 from regions that encode other GPx family members, the regions we hypothesise might be GPx1 pseudogenes and the region which is predicted to encode DI1.
Glutathione peroxidase 5 (GPx5)
Our predictions suggest GPx5 is located between positions 105,238 and 113,445 on scaffold gi|357434034|gb|JH378329.1, on the reverse strand. It is predicted to have 5 exons.
Both exonerate and genewise predict a protein with high homology to human GPx5. The protein is predicted to be a cysteine-containing homologue, like in humans. Consistent with this, no SECIS element is predicted close to the end of the coding sequence. Selenoprofiles also predicts a cysteine homologue of GPx5 in this region, although it does predict a SECIS element.
Because the subsequence used to predict the protein also includes the region we predict encodes GPx6, exonerate also predicted this protein. However, the GPx6 gene is on the forward strand. Interestingly, human GPx5 and GPx6 have a similar genomic organisation, lying in opposite orientations and partly overlapping (see Ensembl).
As for the other GPx family members, we could also predict proteins with homology to GPx5 from regions that encode other GPx family members, the regions we hypothesise might be GPx1 pseudogenes and the region which is predicted to encode DI1.
Glutathione peroxidase 6 (GPx6)
Our predictions suggest GPx6 is located between positions 125,233 and 135,769 on scaffold gi|357434034|gb|JH378329.1, on the forward strand. It is predicted to have 5 exons.
Both genewise and exonerate predict a protein with homology to human GPx6 in this region. However, it is predicted to contain cysteine instead of selenocysteine, in contrast to human GPx6 and similar to rodents (Lobanov et al, 2009). A SECIS element can still be predicted between positions 136,368 and 136,461, where we also obtained the blast hit for the human GPx6 SECIS element. This is not incompatible with a cysteine homologue: the mutation to cysteine might have occurred first, with the SECIS element still conserved. Consistent with this, Selenoprofiles also predicts GPx6 in this region (although it does not predict the complete protein) and the same SECIS element.
As for the other GPx family members, we could also predict proteins with homology to GPx6 from regions that encode other GPx family members, the regions we hypothesise might be GPx1 pseudogenes and the region which is predicted to encode DI1.
Glutathione peroxidase 7 (GPx7)
Our predictions suggest GPx7 is located between positions 165,028 and 165,608 on scaffold gi|357434150|gb|JH378213.1, on the forward strand. It is predicted to have 2 exons.
Both exonerate and genewise predict a protein with high homology to the human GPx7 but lacking the first 45 amino acids. We checked the genomic sequence and found a stop codon shortly before the start of the prediction, which suggests a shorter homologue. As in humans, it is predicted to contain cysteine. Consistent with this, no suitable SECIS element is predicted downstream of the gene (there is a prediction, but it starts 35,341 nucleotides after the end of the coding sequence, which we think is too far). In agreement, Selenoprofiles also predicts a cysteine homologue in the same region (although it neither predicts the complete protein nor specifies which GPx) and no SECIS element.
As for the other GPx family members, we could also predict proteins with homology to GPx7 from regions that encode other GPx family members, the regions we hypothesise might be GPx1 pseudogenes and the region which is predicted to encode DI1.
Glutathione peroxidase 8 (GPx8)
Our predictions suggest GPx8 is located between positions 17,407,600 and 17,411,425 on scaffold gi|357434243|gb|JH378120.1, on the reverse strand. It is predicted to have 3 exons.
Both exonerate and genewise predict a protein with very high homology to the human GPx8. As in humans, it is predicted to contain cysteine. Consistent with this, no SECIS element is predicted close to the end of the coding sequence. However, although Selenoprofiles also predicts a cysteine homologue in this region, it does predict a SECIS element between positions 17,404,914 and 17,405,031.
As for the other GPx family members, we could also predict proteins with homology to GPx8 from regions that encode other GPx family members, the regions we hypothesise might be GPx1 pseudogenes and the region which is predicted to encode DI1.
According to our predictions, TR1 is located between positions 10,527,965 and 10,561,501, on scaffold gi|357434224|gb|JH378139.1, and on the forward strand. This gene is predicted to have 13 exons.
A protein with very high homology to the human query is predicted. The selenocysteine in the query is found one residue before the end of the sequence. As a result, the exonerate prediction extends until the end, thus predicting a selenocysteine in the protein, whereas genewise stops at the TGA codon. Also, T-Coffee does not align the last residue properly. Consistent with the presence of selenocysteine, a SECIS element is predicted between positions 10,561,721 and 10,561,821, which is roughly 200 nucleotides after the end of the last predicted exon. The best blast hit for the human TR1 SECIS element also falls in this region. Furthermore, Selenoprofiles predicted TR1 in this region. However, it used a slightly different query (also human TR1) and therefore predicted a slightly different protein, with 4 extra exons at the beginning of the sequence. This suggests the presence of various isoforms of the protein (see the alignment between the prediction from exonerate and the one from Selenoprofiles).
We could also predict proteins with homology to TR1 from other genomic regions. We found homology with the regions that we predict to code for the other members of the TR family, which is not surprising because they have sequence similarity (see the human TR family alignment).
We predict that TR2 is located on scaffold gi|357434094|gb|JH378269.1, between positions 2,205,122 and 2,255,989, on the reverse strand. This gene is predicted to have 16 exons.
A protein with very high homology to the human query, except for the beginning, is predicted. Neither exonerate nor genewise was able to give a prediction for the first part of the protein, including the starting methionine. On inspection, a stop codon is present before the prediction, which might explain why this part is missing. Selenoprofiles predicts TR2 in this region, but with an extra exon at the beginning of the protein. However, the intron between exons 1 and 2 is almost 124,000 bp, which is unusually long, making the existence of this first exon doubtful and therefore suggesting that the protein in Saimiri boliviensis is shorter. As for TR1, only exonerate could give a prediction for the last two amino acids, including the selenocysteine. Furthermore, a SECIS element is predicted between positions 2,203,628 and 2,203,714, which is roughly 1,400 nucleotides from the coding sequence. The best blast hit for the human TR2 SECIS element also falls in this region, and Selenoprofiles predicted the same SECIS element. These data strongly suggest that this region codes for TR2.
We could also predict proteins with homology to TR2 from other genomic regions. We found homology with the regions that we predict to code for the other members of the TR family, which is not surprising because they have sequence similarity.
We predict that TR3 is located on scaffold gi|357434057|gb|JH378306.1, between positions 1,052,775 and 1,089,168, on the forward strand. This gene is predicted to have 15 exons.
A protein with very high homology to the human query is predicted, but neither exonerate nor genewise was able to give a prediction aligning with the first part of the query. When checking, we see that a stop codon is present shortly before the prediction, so that is probably the reason why we are not able to get this missing part, suggesting the protein in Saimiri boliviensis is shorter. As for TR1 and TR2, only exonerate could give a prediction until the end of the protein, including the selenocysteine.
When we tried to find a SECIS element, we encountered some problems. First of all, SECISearch did not give any SECIS prediction. When we analyzed Selenoprofiles, it gave a SECIS element prediction, between positions 1,089,367 and 1,089,461. This SECIS element could be a good one because it is located fairly close to the end of the coding sequence. However, the blast did not give any hit for human TR3 SECIS element. So in this case, it is not clear if a SECIS element is present or not.
Finally, we would like to point out that TR1, TR2 and TR3 belong to the same family. Consistent with this, we found almost the same hits for each of them. This is not surprising because they share a high degree of homology. We assigned each region to a family member based on the homology with the human queries and on the Selenoprofiles results.
Iodothyronine deiodinase 1 (DI1)
According to our predictions, DI1 is located on scaffold gi|357434255|gb|JH378108.1 between positions 31,632,165 and 31,651,051, on the reverse strand. It is predicted to have 4 exons.
A protein with high homology to the human query is predicted. It is also predicted to have a selenocysteine. No SECIS element has been predicted with SECISearch, although a significant blast hit for the human DI1 SECIS element was found between positions 31,631,129 and 31,631,219. This is in agreement with Selenoprofiles, which has predicted DI1 to be on the same genomic region and has predicted a SECIS element between positions 31,631,128 and 31,631,220.
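Whether two SECIS-related predictions coincide can be checked by computing the overlap of their intervals. This is a minimal sketch with a hypothetical function name; the coordinates in the example are the ones reported on this page for DI1 and SelI.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    a_lo, a_hi = sorted((a_start, a_end))
    b_lo, b_hi = sorted((b_start, b_end))
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


# DI1: blast hit for the human SECIS vs. the Selenoprofiles SECIS.
# The blast hit (91 positions) lies entirely inside the Selenoprofiles element.
print(overlap_length(31_631_129, 31_631_219, 31_631_128, 31_631_220))  # 91

# SelI: SECISearch and Selenoprofiles predictions do not overlap at all.
print(overlap_length(4_131_740, 4_131_840, 4_194_118, 4_194_216))  # 0
```

Sorting each pair first makes the helper indifferent to whether coordinates are given in forward or reverse order.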
We also found homology between DI1 and the regions coding for the other two members of the family (DI2 and DI3), consistent with the fact that all DI family members show sequence similarity (see the human DI family alignment). Interestingly, we also found homology between the GPxs and the region coding for DI1.
Iodothyronine deiodinase 2 (DI2)
According to our predictions, DI2 is located on scaffold gi|357434248|gb|JH378115.1 between positions 38,189,391 and 38,198,096, on the reverse strand. It is predicted to have 2 exons.
Exonerate and genewise give the same prediction. The predicted protein has a high homology with human DI2 and is predicted to have a selenocysteine in the same position as its human homologue. A SECIS element has been predicted between positions 38,184,511 and 38,184,610, coinciding with the blast hit for the human DI2 SECIS element. Selenoprofiles also predicted DI2 in the same genomic region, but it found no SECIS element. Therefore, this sequence seems to be a good candidate for the orthologue of the human DI2 selenoprotein.
As for DI1, we also found homology between DI2 and the regions coding for the other two members of the family (DI1 and DI3).
Iodothyronine deiodinase 3 (DI3)
According to our predictions, DI3 is located between positions 642,962 and 643,796, on scaffold gi|357434110|gb|JH378253.1 on the reverse strand. It is predicted to have 1 exon.
A protein with high homology is predicted, and exonerate and genewise give the same prediction. This protein has a nearly perfect homology with human DI3. It is predicted to have a selenocysteine, and a SECIS element has been predicted between positions 642,284 and 642,378 by both SECISearch and Selenoprofiles. This SECIS element also coincides with the blast hit for the human DI3 SECIS element. Selenoprofiles also predicts a selenocysteine-containing DI3 in this region. All these data make this sequence a good candidate to be the DI3 orthologue in Saimiri boliviensis.
Similar to DI1 and DI2, we also found homology between DI3 and the regions coding for the other two members of the family (DI1 and DI2).
Selenophosphate synthetase 1 (SPS1)
We predict that SPS1 is located on scaffold gi|357434180|gb|JH378183.1, between positions 6,916,614 and 6,917,789, on the forward strand. The gene is predicted to only have 1 exon.
Both exonerate and genewise predict a protein with a very high homology, identical to the human SPS1 except for 14 amino acids across all the protein sequence. Both predictions start with a methionine. As in humans, this protein is predicted to contain threonine, so a SECIS element is not required. These results agree with those obtained with Selenoprofiles, which also predicts a protein of the SPS family in this region.
We found additional regions with homology to SPS1. Firstly, we found a highly homologous region on scaffold gi|357434222|gb|JH378141.1. In this case, the threonine is not conserved. We could hypothesize that this region is a duplication, but in fact the query is mapping to the wrong gene, because this position is the candidate for SPS2. This is not surprising, because the two proteins show high similarity (see the human SPS1 and SPS2 alignment).
The rest of the regions with homology are candidate pseudogenes. See the SPS2 discussion.
Selenophosphate synthetase 2 (SPS2)
We predict that SPS2 is located on scaffold gi|357434222|gb|JH378141.1, between positions 10,046,722 and 10,048,062, on the forward strand. This gene is predicted to only have 1 exon, as the other member of the same family (SPS1).
Both exonerate and genewise predict a protein with a very high homology to the human query, identical to the human SPS2 except for 19 amino acids across all the protein sequence. Both predictions start with a methionine. This protein is predicted to contain selenocysteine. Consistent with this, a SECIS element is predicted between positions 10,048,612 and 10,048,709, and all the results we checked to find SECIS elements agree on it. These data strongly suggest that this region codes for SPS2.
As in the case of SPS1, we found many additional predictions with homology. The homologous predictions that appear for SPS2 are the same ones that appeared in the SPS1 analysis. On this basis alone, we cannot conclude whether these additional regions with homology are duplications of SPS1 or SPS2, because they show similar homology to both queries. However, considering the blast we ran for the SECIS element of human SPS2, we propose the existence of several SPS2 pseudogenes because of the presence of nearby regions with similarity to this SECIS element.
On scaffold gi|357434203|gb|JH378160.1, we predict a pseudogene between positions 5,901,983 and 5,902,817, and a SECIS element between positions 5,903,369 and 5,903,463, on the forward strand.
On scaffold gi|357434251|gb|JH378112.1, we predict another pseudogene between positions 26,373,618 and 26,374,941, and a SECIS element between positions 26,372,966 and 26,373,060, on the reverse strand.
Finally, on scaffold gi|357434208|gb|JH378155.1, we predict the last pseudogene between positions 1,346,959 and 1,347,513, and a SECIS element between positions 1,346,059 and 1,345,962, also on the reverse strand.
These results let us hypothesize that the hits shared between SPS1 and SPS2 are in fact pseudogenes of SPS2, which still conserve SECIS element sequences.
According to our predictions, Sel15 is located between positions 2,833,186 and 2,872,486, on scaffold gi|357434081|gb|JH378282.1, on the forward strand. The gene is predicted to have 4 exons.
A protein with high homology to the human Sel15 is predicted, and exonerate and genewise give the same prediction. However, it is slightly different from the protein predicted by Selenoprofiles, which aligns the sequence with the Sel15 from Bos taurus. The Selenoprofiles prediction starts with methionine whereas the exonerate and genewise predictions do not. Therefore, the beginning of the protein seems to be better predicted by Selenoprofiles, and the protein in Saimiri boliviensis might actually be shorter than in humans. Furthermore, Selenoprofiles predicts an additional exon, which suggests the presence of two isoforms of the protein (see the alignment between the human Sel15, the predicted Sel15 for Saimiri boliviensis and the one from Selenoprofiles). In both cases, it is predicted to have a selenocysteine. A SECIS element has been predicted between positions 2,873,132 and 2,873,234 by both Selenoprofiles and SECISearch. This suggests it is the Sel15 orthologue in Saimiri boliviensis.
According to our predictions, SelH is located between positions 13,716,419 and 13,716,781 on scaffold gi|357434243|gb|JH378120.1, on the reverse strand.
A protein with high homology to the human protein is predicted, and exonerate and genewise give the same prediction. This protein is predicted to have selenocysteine. Neither SECISearch nor Selenoprofiles was able to predict a SECIS element. However, after blasting the human SelH SECIS element, a highly homologous region was found near the end of the coding sequence, between positions 13,716,299 and 13,716,381. A last point we would like to highlight is the presence of the typical redox box Cys-xx-Sec.
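The Cys-xx-Sec redox box can be located in a predicted protein with a short regular expression. This is a minimal sketch that assumes selenocysteine is written as `U` in the protein string; the function name and the toy sequence are hypothetical and do not come from the predicted SelH.

```python
import re

# Cys, any two residues, Sec. The lookahead allows overlapping matches.
REDOX_BOX = re.compile(r"(?=(C..U))")


def find_redox_boxes(protein):
    """Return (1-based position, motif) for every Cys-xx-Sec occurrence."""
    return [(m.start() + 1, m.group(1))
            for m in REDOX_BOX.finditer(protein.upper())]


# Toy sequence for illustration only.
print(find_redox_boxes("MGACGSUKL"))  # [(4, 'CGSU')]
```

For a cysteine homologue the same search could be run with `C..C` instead, since the Sec position is then occupied by a cysteine.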
We found three predictions, from scaffolds gi|357434081|gb|JH378282.1, gi|357434257|gb|JH378106.1 and gi|357434258|gb|JH378105.1, which have a high homology with the SelH sequence. All three conserve the selenocysteine codon. They could be pseudogenes, due to the presence of in-frame stop codons and frameshifts. We could also find regions with homology to the human SelH SECIS element near the end of these predicted sequences.
According to our predictions, SelI is located between positions 4,195,499 and 4,229,733 on scaffold gi|357434111|gb|JH378252.1, on the reverse strand. The gene is predicted to have 10 exons.
A protein with high homology to the human protein is predicted and exonerate and genewise give the same prediction. This protein is predicted to have a selenocysteine in the same position as the human homologue. A SECIS element has been predicted with SECISearch between positions 4,131,740 and 4,131,840, although Selenoprofiles predicted a SECIS element between positions 4,194,118 and 4,194,216.
In our predictions, we found another region with high homology to SelI on scaffold gi|357434202|gb|JH378161.1. This sequence is a shorter copy of SelI, which does not include the region with the selenocysteine and although it does not contain in-frame stop codons or frameshifts we cannot be sure about its functionality.
SelI | Sec | BLAST | Exonerate | Genewise | SECIS BLAST | Selenoprofiles | Location |
---|---|---|---|---|---|---|---|
According to our predictions, SelK is located on scaffold gi|357434173|gb|JH378190.1 between positions 4,443,015 and 4,448,763 (as predicted from exonerate). According to exonerate, the gene has 4 exons.
Exonerate has predicted one more exon than genewise and has also been able to predict over the TGA aligning with the selenocysteine in the query. A SECIS element is predicted, with both SECISearch and Selenoprofiles between positions 4,449,135 and 4,449,239.
In our predictions we found regions located on other scaffolds with high homology to SelK. However, the predictions from these regions contain frameshifts or stop codons, so they might be SelK pseudogenes. Some of them also conserve a sequence with homology to the human SelK SECIS element.
According to our predictions, SelM is located on scaffold gi|357434157|gb|JH378206.1 between positions 638,279 and 639,555, on the forward strand. The gene is predicted to contain 4 exons.
Exonerate predicted an extra exon, which contains a TGA aligning with the selenocysteine in the query. Therefore, the protein is predicted to have selenocysteine. However, neither program was able to predict the first part of the protein. We checked the genomic sequence and found a stop codon shortly before the beginning of the prediction, which suggests the protein in Saimiri boliviensis is shorter. A SECIS element has been predicted between positions 781,186 and 781,291, which we think is too far from the end of the coding sequence. Selenoprofiles predicted another SECIS element, between positions 639,580 and 639,680, very close to the end of the protein, which seems a more plausible SECIS element. A last point we would like to highlight is the presence of the typical redox box, formed by Cys-xx-Sec.
Another sequence homologous to SelM has been found on scaffold gi|357434258|gb|JH378105.1. This sequence has a high homology with the protein, but it has frameshifts and is shorter, so it could be a SelM pseudogene.
SelM | Sec | BLAST | Exonerate | Genewise | SECIS BLAST | Selenoprofiles | Location |
---|---|---|---|---|---|---|---|
We predict that SelN is located on scaffold gi|357434190|gb|JH378173.1, between positions 941,427 and 955,973, on the reverse strand. The gene is predicted to have 12 exons.
Exonerate had several predictions for this genomic region, and we chose the one most in agreement with genewise. Both predict a protein with a very high homology to the human query. However, neither exonerate nor genewise was able to give a prediction for the first part of the protein, including the methionine. On inspection, there is a stop codon just before the first predicted amino acid, suggesting a shorter protein. This protein is predicted to contain two selenocysteines. Therefore, a SECIS element is required. SECISearch predicted a SECIS element between positions 940,281 and 940,366, and this result is similar to those reported by Selenoprofiles and by the blast for the SECIS element of human SelN. Selenoprofiles gives a different prediction for the beginning of the protein, but it includes an 82,000 bp intron, which makes it quite unlikely.
The blast for the SECIS element of human SelN found another highly homologous region on scaffold gi|357434227|gb|JH378136.1, but we did not find any good prediction from this scaffold.
We would like to highlight that the blast returned a high number of hits for this protein. On inspection, many of them covered only a very small part of the sequence, always the same one. We blasted this fragment against the Homo sapiens genome with NCBI BLAST and found that it contains a conserved domain present in other proteins. This domain consists of 4 amino acids, GVQW, and is often found nested inside longer domains. Its function is unknown, but it has been proposed to be a binding domain.
According to our predictions, SelO is located between positions 6,463,684 and 6,496,017 on scaffold gi|357434206|gb|JH378157.1, on the forward strand. The gene is predicted to contain at least 9 exons.
Exonerate and genewise differ in their predictions. A fragment of the sequence we predict to encode SelO has not been sequenced, which makes the prediction of the beginning of the protein unreliable. Exonerate predicts a segment upstream of the gap, but its similarity to the query is low. Most likely, the missing part lies in the unsequenced fragment. Exonerate predicts selenocysteine in the protein, whereas genewise is unable to extend the prediction over the TGA codon, which is found very close to the end of the protein. Consistent with the presence of selenocysteine, a SECIS element was predicted between positions 6,496,802 and 6,496,894. Selenoprofiles also predicted SelO in this genomic region and the same SECIS element.
According to our predictions, SelP is located on scaffold gi|357434243|gb|JH378120.1 between positions 25,005,656 and 25,012,980. The gene is predicted to have 4 exons.
SelP is unique because it has multiple selenocysteine residues. This is also the case for the Saimiri boliviensis orthologue. The predicted protein shows high homology with human SelP. Due to the presence of the TGA codons encoding selenocysteine, the genewise prediction ends earlier, whereas exonerate is able to predict a longer protein with more selenocysteines. It is interesting to point out that in Saimiri boliviensis some selenocysteines align with cysteines in the human query. Also, an arginine in the query aligns with a TGA codon. This could either be a stop codon, which would produce a shorter protein, or encode a selenocysteine. In fact, the mutation of a TGA into an arginine codon (CGA) has been described in other cases (Mariotti et al, 2012). SelP is known to have two SECIS elements. In agreement, two SECIS elements are predicted at positions 25,013,223-25,013,321 and 25,013,645-25,013,733. Selenoprofiles predicted a SelP in this region and a SECIS element at positions 25,013,222-25,013,322, which coincides with one of our predictions.
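The TGA/CGA observation rests on the two codons differing by a single nucleotide, and the same holds for TGA and the cysteine codons TGT and TGC. A minimal sketch of that comparison, with a hypothetical helper name:

```python
def codon_differences(codon_a: str, codon_b: str):
    """Return the 0-based positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("both codons must be exactly three nucleotides long")
    pairs = zip(codon_a.upper(), codon_b.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]

print(codon_differences("TGA", "CGA"))  # selenocysteine/stop vs arginine
print(codon_differences("TGA", "TGT"))  # selenocysteine/stop vs cysteine
```

A single differing position means one point substitution is enough to convert one codon into the other.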
Methionine-R-sulfoxide reductase 1 (SelR1)
According to our predictions, SelR1 is located on scaffold gi|357434229|gb|JH378134.1 between positions 16,036,669 and 16,042,198, on the forward strand.
This protein shows high homology with human SelR1 and is predicted to have a selenocysteine. A SECIS element is predicted between positions 16,042,663 and 16,042,765. Selenoprofiles predicted the same protein and the same SECIS element. Given all these predictions, we believe this is a good candidate to be the orthologue of human SelR1.
We could also predict proteins with homology to SelR1 from other genomic regions. We found homology with the regions that we predict to encode the other members of the SelR family, which is not surprising given their sequence similarity (see the human SelR family alignment).
Among these predictions, one from scaffold gi|357434082|gb|JH378281.1 shows high homology with the protein but contains some frameshifts. It also has a selenocysteine at the same position as the protein. This sequence might therefore be a SelR1 pseudogene.
Methionine-R-sulfoxide reductase 2 (SelR2)
According to our predictions, SelR2 is located on scaffold gi|357434219|gb|JH378144.1 between positions 14,803,179 and 14,816,100, on the forward strand.
Exonerate predicts a gene with 4 exons, whereas genewise predicts one more exon at the beginning. However, neither program was able to predict any sequence aligning with the beginning of the query. We checked the genomic sequence and found a stop codon near the beginning of the prediction, which suggests that the protein in Saimiri boliviensis is shorter. Nevertheless, the predicted sequence shows high homology with human SelR2. Like its human homologue, it is predicted to have a cysteine rather than a selenocysteine. In agreement with that, no SECIS element has been predicted in this region. Selenoprofiles also predicts a SelR protein in this region, but it does not specify which one.
Methionine-R-sulfoxide reductase 3 (SelR3)
According to our predictions, SelR3 is located on scaffold gi|357434240|gb|JH378123.1 between positions 8,193,634 and 8,369,350, on the forward strand.
A protein with high homology to the human protein is predicted and exonerate and genewise give the same prediction. This protein is predicted to have a cysteine, like its homologue. In agreement with this, no SECIS element has been predicted.
According to our predictions, SelS is located between positions 3,161,632 and 3,155,821 on scaffold gi|357434129|gb|JH378234.1, on the forward strand. The gene is predicted to have 6 exons.
A protein with high homology with the human SelS is predicted. Only exonerate was able to extend the prediction until the end of the query, thus predicting a selenocysteine, which is the penultimate residue. A SECIS element is predicted between positions 3,155,393 and 3,155,495. Selenoprofiles predicted the same protein and SECIS element.
According to our predictions, SelT is located on scaffold gi|357434246|gb|JH378117.1 between positions 24,617,380 and 24,635,796, on the reverse strand.
A protein with high homology to the human SelT is predicted, except for the C-terminus, which is missing. Neither exonerate nor genewise was able to give a prediction for this part. We checked the genomic sequence and found a stop codon shortly after the end of the exonerate prediction. Although genewise predicts a short last exon not predicted by exonerate, this is very unlikely to be real because the preceding intron is predicted to be almost 150,000 bp long. Therefore, either the Saimiri boliviensis homologue has lost the C-terminus, or this part of the sequence is less conserved and was therefore not predicted. Nevertheless, both programs agree that the protein is predicted to have a selenocysteine. Neither SECISearch nor Selenoprofiles predicted any SECIS element, but we found a region highly homologous to the SECIS element of the human SelT between positions 24,614,592 and 24,614,667.
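The intron-length argument used above can be made explicit by deriving intron lengths from exon coordinates and flagging unusually long ones. The sketch below is illustrative only: the exon coordinates are invented, the function names are hypothetical, and the 50,000 bp cutoff is an arbitrary assumption rather than a value taken from this analysis.

```python
def intron_lengths(exons):
    """Given (start, end) exon coordinates (1-based, inclusive),
    return the length of each intron between consecutive exons."""
    exons = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]

def suspicious_introns(exons, max_len=50_000):
    """Return the intron lengths that exceed max_len (assumed cutoff)."""
    return [length for length in intron_lengths(exons) if length > max_len]

# Invented coordinates: the last exon sits 150,000 bp away from the rest.
toy_exons = [(100, 200), (301, 400), (150_401, 150_450)]
print(intron_lengths(toy_exons))
print(suspicious_introns(toy_exons))
```

A flagged intron does not prove that the exon is spurious, but it marks predictions that deserve manual inspection.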
We predict that SelU1 is located on scaffold gi|357434097|gb|JH378266.1, between positions 190,982 and 201,943, on the forward strand. The gene is predicted to have 5 exons.
Both exonerate and genewise predict a protein with very high homology to the human query. Specifically, it differs from the human SelU1 in only 9 amino acids over the whole protein sequence. This protein is predicted to contain cysteine and, consistent with this, no SECIS element is predicted. Furthermore, Selenoprofiles also predicts a cysteine homologue in this region. Finally, we would like to highlight the presence of the typical redox box, formed by Cys-xx-Cys.
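The Cys-xx-Cys redox box can be located in a predicted protein with a short regular expression. This is a minimal sketch: the function name `find_cxxc` and the example sequences are hypothetical, and the input is assumed to be an ungapped one-letter protein sequence.

```python
import re

# Lookahead so that overlapping motifs (e.g. CxxCxxC) are all reported.
CXXC = re.compile(r"(?=(C..C))")

def find_cxxc(protein: str):
    """Return (0-based position, matched residues) for each Cys-xx-Cys motif."""
    return [(m.start(), m.group(1)) for m in CXXC.finditer(protein.upper())]

print(find_cxxc("MACGSCLL"))   # one motif
print(find_cxxc("MCAACAAC"))   # two overlapping motifs
```

In a selenoprotein the same search could be run with the selenocysteine symbol `U` allowed at either cysteine position, for example with the pattern `[CU]..[CU]`.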
We could also predict proteins with homology to SelU1 from other genomic regions. We found homology with the regions that we predict to encode the other members of the SelU family, which is not surprising given their sequence similarity (see the human SelU family alignment).
It is important to emphasize that another region of high homology was predicted on scaffold gi|357434233|gb|JH378130.1, between positions 9,566,559 and 9,567,220. However, it contains frameshifts, which change the reading frame and therefore result in a completely different translation from the original. In this case, we hypothesize that it is a pseudogene.
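How a frameshift disrupts a coding sequence can be shown by listing in-frame stop codons before and after deleting a single base. The sketch assumes a coding-strand sequence given without its terminal stop codon. The function name and the toy sequences are hypothetical, and `sec_codons` stands for the codon indices where TGA is interpreted as selenocysteine.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_stops(cds: str, sec_codons=()):
    """Return (codon index, codon) for each in-frame stop codon,
    skipping codon indices listed in sec_codons (TGA read as Sec)."""
    cds = cds.upper()
    stops = []
    for codon_index in range(len(cds) // 3):
        codon = cds[3 * codon_index:3 * codon_index + 3]
        if codon in STOP_CODONS and codon_index not in sec_codons:
            stops.append((codon_index, codon))
    return stops

intact = "ATGGCCTTAGGT"      # ATG GCC TTA GGT
shifted = "ATGCCTTAGGT"      # same sequence with one G deleted: ATG CCT TAG ...
print(in_frame_stops(intact))
print(in_frame_stops(shifted))
print(in_frame_stops("ATGGCTTGAGGT", sec_codons={2}))
```

The single-base deletion changes every downstream codon and exposes a premature TAG, which is the kind of evidence that leads to a pseudogene call.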
According to our predictions, SelU2 is located on scaffold gi|357434107|gb|JH378256.1, between positions 2,329,004 and 2,329,663, on the reverse strand. The gene is predicted to have 1 exon.
Both exonerate and genewise predict a protein with very high homology to the human SelU2. However, neither program was able to give a prediction for the first 6 amino acids of the protein, including the methionine. In fact, this region has not been properly sequenced. This protein, like the other members of the family, contains a cysteine residue and, consistent with this, a SECIS element is not needed. Like the other members of the family, SelU2 also contains the Cys-xx-Cys motif.
We could also predict proteins with homology to SelU2 from other genomic regions. For instance, we found a homologous region on scaffold gi|357434120|gb|JH378243.1, between positions 2,821,615 and 2,822,240, which might be a SelU2 pseudogene due to the presence of frameshifts.
Finally, we found an additional region with homology on scaffold gi|357434160|gb|JH378203.1, between positions 2,192,915 and 2,200,766. However, this region is only predicted for the last part of the protein, and it does not conserve the cysteine residue. It might be a duplication of the last part of the protein.
We predict that SelU3 is located on scaffold gi|357434142|gb|JH378221.1. Predictions from exonerate and genewise differ in a small region. For exonerate, the protein has 5 exons and it is located between positions 5,312,424 and 5,316,582. Meanwhile, genewise predicts a gene with 4 exons, located between positions 5,312,851 and 5,316,582. As we can see, the last position is the same, but exonerate starts its prediction a little bit earlier. The gene predicted in both cases is on the reverse strand.
Neither exonerate nor genewise was able to give a prediction for the first part of the query sequence, which contains the methionine. When examining the sequence, we see that this part is not sequenced. The part that was predicted shows good homology with the human query, although not as good as that of the other two members of the family, which was almost perfect. Like SelU1 and SelU2, SelU3 contains a cysteine residue. In agreement with this, a SECIS element is not required. SelU3 also contains the Cys-xx-Cys box, like SelU1 and SelU2.
We could also predict proteins with homology to SelU3 from other genomic regions. We found homology with the region that we predict to encode SelU2, which is not surprising given their sequence similarity.
We predict SelV on scaffold gi|357434082|gb|JH378281.1, between positions 1,020,235 and 1,023,480, on the reverse strand. The gene is predicted to have 5 exons, although part of the region that codes for the protein has not been sequenced, and this may have interfered with the gene prediction.
The fact that the genomic region is not fully sequenced is probably the reason why the exonerate and genewise predictions differ. Both predictions show homology to the human query, but the genewise prediction was slightly better. Selenoprofiles also aligned human SelV in this region, although it labelled the prediction as SelW. According to this prediction, the protein contains selenocysteine. Consistent with this, a SECIS element is predicted, although the two methods disagree on its position. First, SECISearch predicts a SECIS element between positions 1,016,637 and 1,016,733, about 3,500 nucleotides away from the coding sequence. This region includes the blast hit for the human SelV SECIS element, which was found between 1,016,664 and 1,016,712. Second, Selenoprofiles predicts a SECIS element between positions 1,018,898 and 1,019,006, which is nearer to the coding sequence than the first SECIS prediction. These data strongly suggest that a SECIS element exists, but we cannot infer its correct position.
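The distances quoted above follow directly from the coordinates once the strand is taken into account. The sketch below is an illustration with a hypothetical function name. It assumes that the reported gene coordinates delimit the predicted coding sequence, that the smaller coordinate is always given first, and that "distance" means the coordinate difference between the 3' end of the coding sequence and the nearest end of the SECIS element.

```python
def secis_distance(cds_start, cds_end, secis_start, secis_end, strand):
    """Coordinate difference between the 3' end of the coding sequence and
    the nearest end of a SECIS candidate. Negative values mean the candidate
    is not downstream of the coding sequence on that strand."""
    if strand == "+":
        return secis_start - cds_end
    if strand == "-":
        # On the reverse strand the 3' end is the smaller coordinate.
        return cds_start - secis_end
    raise ValueError("strand must be '+' or '-'")

# SelV, reverse strand, coordinates as reported in the text.
selv = (1_020_235, 1_023_480)
print(secis_distance(*selv, 1_016_637, 1_016_733, "-"))  # SECISearch
print(secis_distance(*selv, 1_018_898, 1_019_006, "-"))  # Selenoprofiles
```

With these assumptions the SECISearch candidate lies 3,502 positions from the coding sequence and the Selenoprofiles candidate 1,229.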
We predict SelW1 on scaffold gi|357434131|gb|JH378232.1. Predictions from exonerate and genewise differ. For exonerate, the protein has 4 exons and it is located between positions 3,025,481 and 3,027,931. On the other hand, genewise predicts only 3 exons, locating the protein between positions 3,027,457 and 3,027,931. As we can see, the last position is the same, but exonerate starts its prediction a little bit earlier, including the part of the protein with the selenocysteine in the query and the first methionine. The gene predicted in both cases is on the forward strand.
The human homologue contains selenocysteine. However, this aligns with a leucine in the exonerate prediction, which is unlikely to reflect reality. Looking at the Selenoprofiles output, we see that Selenoprofiles predicts a SelW gene in the same region but with a different distribution of introns and exons. This results in a TGA that aligns with a selenocysteine in the query. We were surprised that the query sequence in Selenoprofiles has two selenocysteines, whereas the Canis lupus familiaris sequence found in the Protein database contains a cysteine instead of the first selenocysteine. We therefore did not take this discrepancy into account. Consistent with the presence of a selenocysteine, a SECIS element is predicted by Selenoprofiles between positions 3,028,730 and 3,028,839. We found a blast hit for the human SelW1 SECIS element between positions 3,031,002 and 3,031,061. The SECIS element predicted by Selenoprofiles is nearer to the end of the coding sequence than the blast hit. However, we cannot confirm that it is the correct one, because SECIS element prediction is not always reliable.
We found a region with high homology to the SelW1 query corresponding with the region coding for SelW2. This is not surprising considering the similarity between the sequences of both proteins (see the human SelW1 and SelW2 alignment).
Finally, we found two additional regions with homology, on scaffolds gi|357434165|gb|JH378198.1 and gi|357434224|gb|JH378139.1, which might be SelW1 pseudogenes due to the presence of in-frame stop codons and frameshifts. For the first scaffold, we predict a pseudogene between positions 6,458,628 and 6,458,795. Moreover, a SECIS element is also conserved, between positions 6,458,828 and 6,458,887. Concerning the second one, a pseudogene is predicted between positions 4,491,761 and 4,492,053, but here we do not have any conserved SECIS element sequence.
We predict that SelW2 is located on scaffold gi|357434092|gb|JH378271.1, between positions 3,225,523 and 3,226,514, on the forward strand. The gene is predicted to have 4 exons.
Both exonerate and genewise predict a protein with very high homology to the human query. Specifically, it differs from the human SelW2 in only 7 amino acids over the whole protein sequence. This protein is predicted to contain cysteine, like human SelW2. Consistent with this, no SECIS element is predicted, since none is required.
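Counting how many residues differ between the prediction and the query can be done column by column on the pairwise alignment. The sketch below is a simplified illustration: the function name and the toy alignment are hypothetical, and both inputs are assumed to be rows of the same alignment, with `-` marking gaps.

```python
def compare_aligned(query: str, target: str, gap: str = "-"):
    """Return (columns compared, mismatches) for two aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(query) != len(target):
        raise ValueError("aligned sequences must have the same length")
    compared = mismatches = 0
    for q, t in zip(query.upper(), target.upper()):
        if q == gap or t == gap:
            continue
        compared += 1
        if q != t:
            mismatches += 1
    return compared, mismatches

# Toy alignment (invented sequences): one substitution, one gapped column.
print(compare_aligned("MKC-LV", "MRCALV"))
```

Dividing the mismatches by the number of compared columns gives the fraction of differing residues, which makes statements such as "identical except for 7 amino acids" easy to reproduce.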
Methionine sulfoxide reductase A (MsrA)
According to our predictions, MsrA is located between positions 8,712,109 and 9,090,030, on scaffold gi|357434168|gb|JH378195.1, on the forward strand. The gene is predicted to have 6 exons.
A protein with high homology to the human query is predicted, and exonerate and genewise give the same prediction. This protein has nearly perfect homology with human MsrA. It is predicted to have a cysteine, in the same position as its homologue. Accordingly, no SECIS element has been predicted by either SECISearch or Selenoprofiles.
Selenophosphate synthetase 1 (SPS1)
Selenophosphate synthetase 2 (SPS2)
See above
Phosphoseryl-tRNA(Sec) selenium transferase (SecS)
We predict this protein is located between positions 11,315,398 and 11,350,588 on scaffold gi|357434235|gb|JH378128.1, on the reverse strand. It is predicted to have 11 exons. Both exonerate and genewise predict a protein with a high similarity to the human query.
We predict that this protein is located between positions 10,071,557 and 10,078,560 on scaffold gi|357434253|gb|JH378110.1, on the reverse strand. It is predicted to have 6 exons. Both exonerate and genewise predict a protein with high similarity to the human query although the prediction of the start of the last exon differs slightly between the two programs.
tRNA-selenocysteine 1-associated protein 1 (SECp43)
We predict this protein is located between positions 114,116 and 143,010 on scaffold gi|357434040|gb|JH378323.1, on the reverse strand. It is predicted to have 9 exons. Both exonerate and genewise predict a protein identical to the human query.
SECIS binding protein 2 (SBP2)
We predict this protein is located on scaffold gi|357434160|gb|JH378203.1, on the forward strand.
Genewise predicted a gene with 17 exons, located between positions 4,774,003 and 4,815,059. Exonerate gave two main predictions from this subsequence. The first one predicts a protein that spans the C-terminus of the query. When we checked the sequence, we saw a stop codon just before the start of the prediction and, before the stop codon, a stretch that has not been sequenced. The second prediction covers the first part of the protein and stops shortly before a stop codon. Joining the two predictions thus gives a prediction in which the first and the last parts are sequenced, with an important gap in the middle. Genewise included the unsequenced part in an intron and gave a full prediction. Therefore, although it seems that the protein is actually found in the genome, we cannot be sure about its sequence.
Selenocysteine-specific elongation factor (eEFSec)
We predict that this protein is located on scaffold gi|357434241|gb|JH378122.1, between positions 552,627 and 819,908, on the forward strand. The gene is predicted to have 7 exons.
A protein with very high homology is predicted, and exonerate and genewise give the same prediction. However, neither prediction was able to cover the first part, containing the methionine.