Selenoproteins in Nannopterum harrisi

Selenoproteins have been conserved through evolution, probably because of their unique physicochemical properties, which makes them a convenient target for phylogenetic analysis. However, identifying these proteins in sequence databases is challenging, mainly because selenocysteine is encoded by UGA codons, which sequence annotation tools normally interpret as stop signals.
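
To illustrate why UGA codons cause trouble for annotation, the following minimal sketch translates a coding sequence in two ways: a conventional translation that stops at the first stop codon, and a Sec-aware translation that reads in-frame TGA as selenocysteine (U). The function name translate and the toy sequence are assumptions made for this illustration and are not part of the project pipeline. Reading every in-frame TGA as Sec is a deliberate simplification; in practice a candidate TGA needs supporting evidence such as a SECIS element and homology to a known selenoprotein.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds, sec_aware=False):
    """Translate a coding sequence in frame 0.

    With sec_aware=False, translation stops at the first stop codon,
    as a conventional annotator would do.
    With sec_aware=True, TGA is read as selenocysteine (U) and only
    TAA/TAG terminate translation (a simplifying assumption).
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if sec_aware and codon == "TGA":
            protein.append("U")
            continue
        aa = CODON_TABLE.get(codon, "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy_cds = "ATGGCTTGAGGTTAA"  # toy example, not a real gene
print(translate(toy_cds))                  # truncated at TGA
print(translate(toy_cds, sec_aware=True))  # TGA read as Sec
```

With the toy sequence, the conventional translation ends before the TGA codon, whereas the Sec-aware translation continues through it until the TAA stop.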

In this project, we characterized the selenoproteins in the proteome of Nannopterum harrisi by applying gene prediction and phylogenetic reconstruction methods, supplemented with analyses of SECIS elements and of the presence of the selenocysteine residue. In addition, we used Seblastian selenoprotein predictions to check the reliability of our predicted proteins.

Overall, we correctly predicted 37 selenoproteins, including Sel15, GPx2, GPx3, GPx4, GPx7, GPx8, DIO1, DIO2, DIO3, MsrA, SELENOH, SELENOI, SELENOK, SELENOM, SELENON, SELENOO(1), SELENOO(2), SELENOP(1), SELENOP(2), MSRB1, MSRB2, MSRB3, SELENOS, SELENOT, SELENOU(1), SELENOU(2), TXNRD1, TXNRD2 and TXNRD3, as well as the following machinery proteins: eEFsec, PSTK, SBP2, SECS, SEPHS, SECp43(1) and SECp43(2).

This result is consistent with what has been reported in the literature, as we identified almost all of the selenoproteins described in birds (26); one that we failed to identify, for example, is SelW.

Jiang L, Ni J, Liu Q. Evolution of selenoproteins in the metazoan. BMC Genomics. 2012 Sep 3;13:446.


Some selenoproteins were lost across vertebrates after the colonization of the terrestrial environment. In other cases only the selenocysteine residue was lost, even though it is the defining feature of selenoproteins, through the conversion of Sec to Cys. In our project, we found several proteins, such as GPx7, GPx8, MsrA, SELENOI, SELENOO(1), MSRB3 and SELENOU(2), that had lost the selenocysteine residue. In some of these cases, the loss of selenocysteine had already been reported in the literature. The machinery proteins must also be taken into account: they do not contain a selenocysteine residue, but they play an important role in the synthesis and expression of selenoproteins.
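
As an illustration of how a Sec-to-Cys conversion can be recognized, the sketch below takes a pairwise alignment between a reference selenoprotein (with Sec written as U) and a predicted protein, and reports which residue of the prediction is aligned to each Sec position. The function classify_sec_positions and the toy alignment are hypothetical and only show the idea; they are not the procedure used in this project.

```python
def classify_sec_positions(ref_aln, query_aln):
    """Classify the query residue aligned to each Sec (U) of the reference.

    Both arguments are rows of the same pairwise alignment, so they must
    have equal length; gaps are written as '-'.
    Returns a list of (alignment_column, query_residue, label) tuples.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")

    results = []
    for col, (ref_res, query_res) in enumerate(zip(ref_aln.upper(), query_aln.upper())):
        if ref_res != "U":
            continue  # only reference Sec positions are of interest
        if query_res == "U":
            label = "Sec conserved"
        elif query_res == "C":
            label = "Cys homolog"
        elif query_res == "-":
            label = "not aligned"
        else:
            label = "other residue"
        results.append((col, query_res, label))
    return results


# Toy alignment (not real sequences): the reference Sec aligns to a Cys.
print(classify_sec_positions("MAUGK-L", "MACGKTL"))
```

A prediction labelled as a Cys homolog corresponds to the situation described above, in which the protein family is retained but the selenocysteine residue has been replaced by cysteine.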

Because selenocysteine is encoded by UGA, which normally acts as a stop codon, its insertion into the peptide chain requires a specific mRNA structure called the SECIS element, which is recognized by the selenoprotein synthesis machinery and allows the codon to be read as selenocysteine instead of as a stop. Because SECIS elements are crucial for the insertion of selenocysteine residues, these structures are conserved among species and are usually located in the 3’UTR in eukaryotes. Most of our predicted selenoproteins contained a SECIS element in the 3’UTR.

Another point worth mentioning is the presence of duplications: we found duplicated copies of proteins such as SEPHS, SELENOI and SECp43.

Our prediction has some limitations. First, we found that the G. gallus and T. guttata selenoproteomes are poorly annotated in SelenoDB 2.0. In most cases, Met was not found as the first amino acid of the predicted protein. The only proteins that we could predict with an initial Met were DIO1, DIO2, SELENOK, SELENOU(1), TXNRD2 and SEPHS(1). The lack of an initial Met was especially evident when comparing our predictions with the Seblastian predictions, which were more complete and informative than ours.
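
The check for an initial Met can be automated with a few lines of code. The sketch below is a hypothetical helper, missing_initial_met, that takes predicted protein sequences keyed by name and returns those that do not start with methionine. The sequences shown are toy placeholders, not the predictions obtained in this project.

```python
def missing_initial_met(proteins):
    """Return the names of proteins whose sequence does not start with Met (M)."""
    return sorted(
        name for name, seq in proteins.items()
        if not seq.upper().startswith("M")
    )


# Toy sequences for illustration only.
toy_predictions = {
    "protein_a": "MKLUGA",  # starts with Met
    "protein_b": "GAVLUC",  # truncated at the N-terminus
    "protein_c": "",        # empty prediction, also reported
}
print(missing_initial_met(toy_predictions))
```

A prediction flagged by this check is likely to be truncated at the N-terminus, which may point to an incomplete reference annotation or to a missed first exon.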

Another limitation is the absence of SECIS elements in some predicted proteins. Although these elements could have been lost during evolution, their absence could also reflect methodological problems, for example extracting a region with fastasubseq that was not large enough to contain them.
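
One way to reduce the risk of cutting off the 3’UTR is to extend the extracted region on both sides of the alignment hit, so that the downstream region is included regardless of the strand. The sketch below computes such a window as a start position and a length, clamped to the scaffold boundaries. The function extended_window, the 0-based coordinates and the default flank of 50,000 bases are assumptions chosen for illustration, not values used in this project.

```python
def extended_window(hit_start, hit_end, scaffold_len, flank=50_000):
    """Return (start, length) of a region covering the hit plus flanks.

    Coordinates are 0-based. The hit may be given in either order, so
    hits on the reverse strand are handled as well. The window is
    clamped so that it never leaves the scaffold.
    """
    low, high = sorted((hit_start, hit_end))
    start = max(0, low - flank)
    end = min(scaffold_len, high + flank)
    return start, end - start


# Toy coordinates: the downstream flank is clamped at the scaffold end.
print(extended_window(120_000, 125_000, scaffold_len=150_000))
```

The resulting start and length could then be passed to whichever subsequence extraction tool is used. A symmetric flank is a simple choice; a larger downstream flank may be preferable when the strand of the hit is known.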

All in all, this project successfully produced a set of predicted selenoproteins for Nannopterum harrisi, which were not available until now.