Selenoprotein annotation in Ceratotherium simum




Abstract

Selenoproteins are a group of proteins that contain selenocysteine (Sec), a rare amino acid inserted co-translationally into the protein chain. Sec is encoded by UGA, which normally acts as a stop codon. In selenoproteins, UGA is recoded to Sec in the presence of specific features on the gene transcript, most notably the SECIS (Sec insertion sequence) element. Because of this dual role of the UGA codon, selenoprotein prediction and annotation are difficult tasks, and even known selenoproteins are often misannotated in genome databases.
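To illustrate why the dual role of UGA complicates annotation, the following minimal sketch translates the same coding sequence twice: once with the standard genetic code, where TGA terminates translation, and once reading every TGA as Sec (`U`). The function name `translate` and the blanket recoding of all TGA codons are illustrative assumptions; in real transcripts only UGA codons supported by a SECIS element are recoded, and this sketch does not model that.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds, recode_tga=False):
    """Translate a DNA (or RNA) coding sequence into a protein string.

    If recode_tga is True, every TGA codon is read as selenocysteine ('U')
    instead of a stop. This is a simplification: real recoding depends on
    a SECIS element, which is not modelled here.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if recode_tga and codon == "TGA":
            protein.append("U")  # Sec read-through instead of termination
            continue
        aa = CODON_TABLE.get(codon, "X")  # 'X' for ambiguous codons
        if aa == "*":
            break  # a stop codon ends translation
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    cds = "ATGGCTTGAGGTTAA"
    print(translate(cds))                   # standard code: truncated at TGA
    print(translate(cds, recode_tga=True))  # TGA read as Sec
```

With the standard code the toy sequence yields a truncated product (`MA`), whereas reading TGA as Sec yields `MAUG`. A gene predictor that treats every UGA as a stop would report the shorter product, which is one way selenoproteins end up misannotated.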

We performed a homology-based in silico search of the Ceratotherium simum genome, using the known selenoproteins of Homo sapiens, Mus musculus and Equus caballus as references. We analyzed the data with four programs (T-Coffee, GeneWise, Seblastian and Selenoprofiles), together with Exonerate for gene prediction. After comparing the outputs of these tools, we identified 29 selenoprotein genes, 8 Cys-homologs and 5 proteins involved in selenoprotein synthesis.
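The distinction between selenoprotein genes and Cys-homologs can be pictured as a check on what the target genome carries at the positions where the reference protein has Sec. The sketch below assumes a query protein alignment (with `U` at Sec positions and `-` for gaps) paired column by column with the aligned target codons (`---` for gaps). The function `classify_homolog` and its decision rule are hypothetical simplifications for illustration, not the exact criteria used by Seblastian, Selenoprofiles or the pipeline described here.

```python
CYS_CODONS = {"TGT", "TGC"}


def classify_homolog(query_aln, target_codons):
    """Classify a target gene by the codons aligned to the query's Sec residues.

    query_aln: aligned query protein, 'U' marks Sec, '-' marks gaps.
    target_codons: one target codon per alignment column ('---' for gaps).
    Hypothetical rule: any TGA at a Sec column -> selenoprotein; only Cys
    codons at Sec columns -> Cys-homolog; anything else -> unresolved.
    """
    if len(query_aln) != len(target_codons):
        raise ValueError("alignment and codon list must have equal length")
    sec_cols = [i for i, aa in enumerate(query_aln) if aa == "U"]
    if not sec_cols:
        return "no Sec in query"
    codons = {target_codons[i].upper() for i in sec_cols}
    if "TGA" in codons:
        return "selenoprotein"   # in-frame UGA where the query has Sec
    if codons <= CYS_CODONS:
        return "Cys-homolog"     # Sec replaced by Cys in the target
    return "unresolved"          # gap or other residue at the Sec site


if __name__ == "__main__":
    query = "MAUG"
    print(classify_homolog(query, ["ATG", "GCT", "TGA", "GGT"]))
    print(classify_homolog(query, ["ATG", "GCT", "TGT", "GGT"]))
    print(classify_homolog(query, ["ATG", "GCT", "---", "GGT"]))
```

A real analysis would also need to consider frameshifts, SECIS prediction and multiple Sec residues (as in SelP), which this sketch ignores.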

Results of special interest include, among others: the evolutionary origin of GPx5, described as an inverted tandem duplication; the presence of SelO2 in Ceratotherium simum, although it is absent in humans; the prediction of SelR2, which proved challenging for our prediction pipeline; the association between SelV and SelW, linked to their evolutionary origin; and the two forms of SBP2 described here. This work aims to contribute to the study of selenoproteome evolution in mammals.