We predict that Saimiri boliviensis has all the selenoproteins and selenoprotein homologues annotated in humans, with high homology in most cases, as expected from the phylogenetic proximity of the two species. This amounts to a total of 36 proteins. In every case except GPx6, human and S. boliviensis share the same residue (selenocysteine or cysteine) at the corresponding position; GPx6 contains cysteine in S. boliviensis instead of selenocysteine. For GPx1, the residue cannot be predicted because the genomic region has not been fully sequenced. We also predict that S. boliviensis has the proteins required for selenoprotein synthesis.
The aim of this project was to search for selenoproteins in the S. boliviensis genome. To find them, we used homology-based gene prediction, complemented by SECIS element prediction. The phylogenetic proximity of S. boliviensis and Homo sapiens allowed us to use the human proteins, which are already annotated, as queries to search for homologous proteins in the S. boliviensis genome. In most cases, our results were in agreement with Selenoprofiles.
The predicted proteins fall into four groups, according to whether they contain selenocysteine, cysteine or another residue at the corresponding position:
The first group includes GPx2, GPx3, GPx4, DI1, DI2, DI3, SPS2, Sel15, SelH, SelI, SelK, SelM, SelN, SelO, SelR1, SelS, SelT, SelV, TR1, TR2, TR3 and SelW1. All of them have a selenocysteine residue, like their human homologues. We could not determine whether GPx1, which in humans has selenocysteine, is also a selenoprotein in S. boliviensis, because the genomic region encoding this protein is not properly sequenced.
The second group consists of GPx5, GPx7, GPx8, MsrA, SelR2, SelR3, SelU1, SelU2, SelU3 and SelW2, which have a cysteine residue, also like their human homologues. This group additionally includes GPx6, the one protein in which we found a cysteine residue in place of the selenocysteine present in the human query.
Thirdly, SelP, which is distinctive in containing multiple selenocysteines, has additional selenocysteines compared to its human homologue.
Finally, SPS1 contains threonine at the corresponding position, like its human homologue.
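The grouping above depends on which residue of the predicted protein aligns with the selenocysteine (U) of the human query. The following is a minimal sketch of that check, assuming a gapped pairwise alignment is already available as two equal-length strings. The function names and the toy sequences are hypothetical and are not the pipeline used in this project.

```python
from typing import List, Tuple


def residues_at_sec_positions(query_aln: str, target_aln: str) -> List[Tuple[int, str]]:
    """Return (query position, target residue) for every U in the aligned query.

    Both arguments are rows of the same pairwise alignment, with '-' for gaps.
    Positions are 1-based and count only non-gap query residues.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    hits = []
    query_pos = 0
    for q, t in zip(query_aln, target_aln):
        if q != "-":
            query_pos += 1
        if q == "U":
            hits.append((query_pos, t))
    return hits


def classify_residue(residue: str) -> str:
    """Map the aligned target residue to one of the groups used in the text."""
    if residue == "U":
        return "selenocysteine"
    if residue == "C":
        return "cysteine homologue"
    if residue == "-":
        return "not covered by the prediction"
    return "other homologue"


# Toy alignment (hypothetical sequences, for illustration only).
query = "MCUAL-K"
target = "MCCALGK"
for position, residue in residues_at_sec_positions(query, target):
    print(position, residue, classify_residue(residue))
```

A gap at the selenocysteine column is reported separately, since it corresponds to cases such as GPx1 where the region is missing from the assembly and no residue can be assigned.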
We also analyzed the presence or absence of SECIS elements. Genes encoding proteins that contain selenocysteine should have a SECIS element downstream of the coding sequence, whereas those encoding cysteine-containing or other homologues should not. Most of our predictions agreed with this expectation, but we encountered two kinds of exceptions. In some cases a protein was predicted to contain selenocysteine but no SECIS element was predicted. SECISearch may have failed here, because SECIS prediction is challenging; to complement the analysis, we also used BLAST to search for regions homologous to the human SECIS elements. Conversely, a SECIS element was predicted for some proteins without selenocysteine. This could be explained by the protein having lost the selenocysteine while the SECIS element is still conserved in the sequence.
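The four possible combinations of predicted residue and SECIS prediction described above can be summarized with a small helper. This is only an illustrative sketch: the function name and the wording of the labels are assumptions, and the inputs are taken to be the residue aligned to the query selenocysteine and a flag saying whether any SECIS element was predicted downstream.

```python
def secis_consistency(residue: str, secis_found: bool) -> str:
    """Label a prediction by comparing its residue with the SECIS result.

    residue: one-letter code at the selenocysteine position ('U' means Sec).
    secis_found: True if a SECIS element was predicted downstream of the gene.
    """
    has_sec = residue == "U"
    if has_sec and secis_found:
        return "consistent: selenocysteine with SECIS"
    if has_sec and not secis_found:
        return "selenocysteine without predicted SECIS (prediction may have missed it)"
    if not has_sec and secis_found:
        return "SECIS without selenocysteine (element may be retained after Sec loss)"
    return "consistent: no selenocysteine and no SECIS"


# Example inputs (hypothetical, not results from this project).
print(secis_consistency("U", True))
print(secis_consistency("C", True))
```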
We also encountered several difficulties during the project:
First of all, some of our candidate selenoprotein genes were missing parts of the protein. On inspection, we saw that the corresponding regions were not properly sequenced in the S. boliviensis genome. For instance, GPx1 is missing its first 50 amino acids, including the selenocysteine or cysteine residue.
Some of our predicted selenoproteins were also shorter than their human homologues, most probably because of stop codons. Such a stop codon might be genuine and generate a shorter protein, but it could also lie within an intron that is preceded by an exon not conserved with the human sequence and therefore not predicted.
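To illustrate how stop codons inside a predicted coding sequence could be located, the sketch below scans a sequence codon by codon using the standard genetic code. It is a simplified example under stated assumptions: the input is taken to be an already spliced, in-frame coding sequence, the function names are hypothetical, and TGA is only reported, since deciding whether it encodes selenocysteine requires the homology and SECIS evidence discussed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def in_frame_stops(cds: str):
    """Return (1-based codon index, codon) for every in-frame stop codon."""
    cds = cds.upper()
    stops = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if CODON_TABLE.get(codon) == "*":
            stops.append((i // 3 + 1, codon))
    return stops


def premature_stops(cds: str):
    """Return in-frame stops that are not the final codon of the sequence."""
    last_codon = len(cds) // 3
    return [(idx, codon) for idx, codon in in_frame_stops(cds) if idx != last_codon]


# Toy sequence (hypothetical): ATG TGA CCC TAA
example = "ATGTGACCCTAA"
for idx, codon in premature_stops(example):
    note = "possible selenocysteine" if codon == "TGA" else "premature stop"
    print(idx, codon, note)
```

An internal TAA or TAG points to a truncated protein or a mispredicted exon boundary, whereas an internal TGA is ambiguous and has to be interpreted against the human alignment.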
Furthermore, we hypothesize that some of our findings could be pseudogenes of GPx1, SPS2, SelH, SelI, SelK, SelM, SelR1, SelU1, SelU2 and SelW1. They have high homology with the human sequence, but they show frameshifts or stop codons that rule them out as the functional gene. Some of them have nevertheless conserved the selenocysteine of the homologous gene and even have SECIS elements. Because the genome has not been fully sequenced, there might be other pseudogenes, or even functional homologues, that were not identified.
Concerning the scope of the project, it should be taken into account that this homology-based approach can only identify proteins that have already been annotated in other species. S. boliviensis may contain additional selenoproteins belonging to families that have not been discovered yet. Further research could confirm our results and extend the characterization of the predicted proteins. One line of work would be to compare the predicted proteins with orthologs in species other than human. It would also be interesting to analyze further the SECIS elements predicted in proteins without selenocysteine. In addition, the characterization of the synthesis machinery could be improved by searching for the tRNA[Ser]Sec. Nevertheless, we think our findings are significant and contribute to extending the current knowledge in the selenoprotein field.