Conclusions

Selenocysteine (Sec), the 21st amino acid, incorporates the essential micronutrient selenium into its structure. Proteins that contain Sec are called selenoproteins, and they are critical for several biological functions, such as protection against oxidative damage. Despite their importance, the precise function of many of them is not yet known, as discussed earlier in this project.

A substantial problem we have had to deal with is that the annotated selenoproteomes of the different species are imprecise and ambiguous. Sec is encoded by the UGA codon, which is classically recognised as a stop codon. Consequently, bioinformatic programmes can fail to recognise selenoproteins.
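The effect of reading UGA as a stop codon can be illustrated with a minimal Python sketch. It is not the pipeline used in this project: the function `translate`, the flag `recode_tga` and the example sequence are hypothetical names and data chosen for illustration, and the sketch assumes a DNA coding sequence in frame 0 and the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, recode_tga=False):
    """Translate a DNA coding sequence read in frame 0.

    With recode_tga=False, TGA (UGA in the mRNA) terminates translation,
    as a standard translator would do. With recode_tga=True, every
    in-frame TGA is written as 'U' (Sec). This is a simplification:
    a genuine terminal TGA would be recoded as well.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        aa = CODON_TABLE.get(codon, "X")  # 'X' for codons containing N, etc.
        if codon == "TGA" and recode_tga:
            aa = "U"
        if aa == "*":
            break  # stop codon: end of the protein
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy sequence: Met, Ala, TGA, Gly, stop (TAA)
toy_cds = "ATGGCTTGAGGTTAA"
standard = translate(toy_cds)
recoded = translate(toy_cds, recode_tga=True)
```

In this toy case the standard translation is truncated just before the Sec position, whereas the recoded translation continues to the real stop codon. This is the kind of truncation that can make a selenoprotein go unrecognised.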

Using a homology-based approach, we performed an exhaustive genome-wide prediction of all the selenoproteins, Cys homologues and machinery proteins involved in the biosynthesis of selenoproteins (MPBS) that are found in either human or squirrel (see methods). We found a total of 21 selenoproteins, 9 Cys homologues and 7 MPBS. However, we could not confidently predict 3 selenoproteins because the predictions lacked the N-terminal region as well as the Sec residue. We also could not predict human SelenoV, which is expected because this protein is not found in the selenoproteome of the squirrel, which is phylogenetically closer to our species. Regarding the MPBS, we found all of them except eEFSec.

Our study was limited by two important factors. First, half of the squirrel selenoproteins found in SelenoDB 2.0 did not start with a Met residue, which raised concerns about the quality of their annotation. For this reason, we used human queries, which are better annotated, for the prediction of most proteins. This made it harder to find good hits in the genome, as the human proteins are phylogenetically more distant. The second limitation was related to possible sequencing problems in the genome: in some regions we found multiple N characters instead of the corresponding nucleotide (A, C, T or G), which probably explains why parts of some predicted proteins were missing. In addition, in some cases we found insertions/deletions in the predicted sequence that caused frameshifts in the alignment with the annotated selenoproteins of the model organisms. These might also be sequencing problems, or they might reflect divergence of the Spermophilus dauricus proteins carrying those indels from the human ones.
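The two checks behind these limitations are simple to automate. The following Python sketch shows one possible way to do so; the helper names `starts_with_met` and `n_runs`, the `min_length` parameter and the example sequences are hypothetical and were not part of the analysis described in this project.

```python
import re


def starts_with_met(protein):
    """Return True if the protein sequence begins with methionine (M)."""
    return protein.upper().startswith("M")


def n_runs(sequence, min_length=1):
    """Return (start, end) positions of runs of N in a nucleotide sequence.

    Positions are 0-based and end is exclusive. Runs shorter than
    min_length are ignored.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"N+", sequence.upper())
        if m.end() - m.start() >= min_length
    ]


# Hypothetical toy data, for illustration only
query_ok = starts_with_met("MAUG")
query_suspect = starts_with_met("AUGK")
gaps = n_runs("ACGTNNNNACGNNT", min_length=3)
```

Queries that fail the first check can be flagged as possibly incomplete annotations, and overlapping a predicted gene's coordinates with the reported N runs gives a quick indication of whether a missing region coincides with unresolved sequence.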