Main ideas
The aim of this study was to identify the selenoproteins, and the machinery required for their synthesis, in the genome of Callopanchax toddi. To do so, we aligned forty-seven proteins previously annotated in Danio rerio (zebrafish) against the C. toddi genome. We chose this species as a reference because of its phylogenetic proximity to C. toddi and because its genome is the best characterized.
In this study, a protein has to meet certain requirements to be considered a selenoprotein. First, it has to contain a selenocysteine residue in its sequence. Second, it has to have a SECIS element on the same strand as the gene, located at a minimum distance of 51-111 nucleotides from the 3' end of the gene (3'-UTR) and not extremely far from it. Following these criteria, the results obtained were:
SELENOPROTEINS
- Present in both zebrafish and C. toddi: DIO1, DIO2, DIO3a, DIO3b, GPx1a, GPx1b, GPx2, GPx3a, GPx3b, GPx4a, GPx4b, MSRB1a, MSRB1b, Sel15, SelenoH, SelenoI, SelenoK, SelenoL, SelenoM, SelenoN, SelenoO1, SelenoP, SelenoS, SelenoT1, SelenoU1a, SelenoW and TXNRD2.
- Present only in zebrafish: SelenoE, SelenoJ1, SelenoO2, SelenoT1b, SelenoT2 and TXNRD3.
- Duplicated in C. toddi: SelenoO1.
CYSTEINE HOMOLOGUES
- Present in both zebrafish and C. toddi: GPx7, GPx8, MSRB2, MSRB3, SelenoU2, SelenoU3 and MsrA.
- Duplicated in C. toddi: MSRB3.
SELENOPROTEIN MACHINERY
- Present in both zebrafish and C. toddi: SEPHS1, SEPHS2 (selenoprotein), PSTK, SBP2, SECp43, SecS and eEFsec.
In total, 28 selenoproteins were found in Callopanchax toddi, one of which (SEPHS2) is part of the selenoprotein machinery. In addition, 13 cysteine homologues were located in C. toddi, 6 of which are part of the selenoprotein machinery. Some other possible selenoproteins could not be studied because no reference sequence was available.
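The totals above can be checked directly against the lists. The following is a minimal sketch that only re-counts the gene names given in this summary; the variable and function names are ours, and we assume, as the text does, that the six machinery proteins without selenocysteine are counted as cysteine homologues and that duplicated genes are counted once.

```python
# Gene names copied from the lists in this summary.
SELENOPROTEINS = [
    "DIO1", "DIO2", "DIO3a", "DIO3b", "GPx1a", "GPx1b", "GPx2", "GPx3a",
    "GPx3b", "GPx4a", "GPx4b", "MSRB1a", "MSRB1b", "Sel15", "SelenoH",
    "SelenoI", "SelenoK", "SelenoL", "SelenoM", "SelenoN", "SelenoO1",
    "SelenoP", "SelenoS", "SelenoT1", "SelenoU1a", "SelenoW", "TXNRD2",
]
CYS_HOMOLOGUES = [
    "GPx7", "GPx8", "MSRB2", "MSRB3", "SelenoU2", "SelenoU3", "MsrA",
]
# Machinery proteins; True marks the one that is itself a selenoprotein.
MACHINERY = {
    "SEPHS1": False, "SEPHS2": True, "PSTK": False, "SBP2": False,
    "SECp43": False, "SecS": False, "eEFsec": False,
}


def tally():
    """Return the totals of selenoproteins and cysteine homologues."""
    machinery_sec = [g for g, is_sec in MACHINERY.items() if is_sec]
    machinery_cys = [g for g, is_sec in MACHINERY.items() if not is_sec]
    return {
        "selenoproteins": len(SELENOPROTEINS) + len(machinery_sec),
        "cysteine_homologues": len(CYS_HOMOLOGUES) + len(machinery_cys),
    }


print(tally())
```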
It is important to take into account that the number of selenoproteins found in zebrafish and C. toddi is almost double the number found in Homo sapiens because of the teleost-specific whole-genome duplication. However, the number is not exactly double, because gene losses tend to occur after whole-genome duplications. Some of the duplicated selenoproteins are: DIO3, GPx1, GPx3, GPx4, MSRB1, SelenoO and SelenoT.
We also have to remember that many of the predicted selenoproteins did not start with a methionine. The same was true of some of the zebrafish proteins. This could mean that the beginning of the gene is missing, so a prediction may be only a fragment of the whole protein. Using Seblastian partially corrected this problem, because almost all of the proteins it predicted started with a methionine, even though they were based on reference species other than zebrafish.
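The check for a missing start can be sketched as follows. This is only an illustration, not the procedure we used: the function names and the example sequences are hypothetical, and the sketch assumes each predicted protein is available as a one-letter amino acid string.

```python
def starts_with_methionine(protein_seq):
    """True if the predicted protein begins with methionine (M)."""
    return protein_seq.strip().upper().startswith("M")


def flag_possible_fragments(predictions):
    """Return the names of predictions that may lack their 5' end.

    `predictions` maps a protein name to its predicted sequence.
    A missing initial methionine only suggests a fragment; it does
    not prove that the prediction is incomplete.
    """
    return sorted(
        name for name, seq in predictions.items()
        if not starts_with_methionine(seq)
    )


# Toy sequences for illustration only (not real predictions).
example = {
    "proteinA": "MKTLLUAGC",
    "proteinB": "GTLLUAGCK",
}
print(flag_possible_fragments(example))
```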
Another issue we encountered is that some proteins containing a selenocysteine residue were not classified as selenoproteins because the predicted SECIS element was too close to the end of the gene. Nevertheless, the SECIS elements were predicted from species other than zebrafish, so their location could vary.
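The classification criteria used in this study, including the rejection of SECIS elements that lie too close to the gene, can be summarised as a small decision function. This is a sketch under stated assumptions: the names are hypothetical, the minimum distance takes the lower bound (51) of the 51-111 nucleotide range, and the maximum distance is an arbitrary placeholder, since the criteria only require the SECIS not to be "extremely far away".

```python
from dataclasses import dataclass
from typing import Optional

MIN_SECIS_DISTANCE = 51      # assumption: lower bound of the 51-111 nt range
MAX_SECIS_DISTANCE = 10_000  # hypothetical cut-off, not given in the study


@dataclass
class Prediction:
    protein_seq: str               # one-letter code, 'U' = selenocysteine
    gene_strand: str               # '+' or '-'
    secis_strand: Optional[str]    # None if no SECIS was predicted
    secis_distance: Optional[int]  # nt from the 3' end of the gene to the SECIS


def classify(p):
    """Apply the two criteria: a Sec residue and a well-placed SECIS."""
    if "U" not in p.protein_seq.upper():
        return "no selenocysteine"
    if p.secis_strand is None or p.secis_distance is None:
        return "no SECIS"
    if p.secis_strand != p.gene_strand:
        return "SECIS on opposite strand"
    if p.secis_distance < MIN_SECIS_DISTANCE:
        return "SECIS too close"
    if p.secis_distance > MAX_SECIS_DISTANCE:
        return "SECIS too far"
    return "selenoprotein"


# Toy example: a Sec-containing protein whose SECIS is only 20 nt away.
print(classify(Prediction("MKLUAG", "+", "+", 20)))
```

With a rule of this kind, relaxing the minimum distance would be one way to re-examine the borderline cases described above.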
To sum up, various selenoproteins have been identified in the genome of Callopanchax toddi, but further research is needed. One way to improve accuracy would be to align the genome of C. toddi against proteins from organisms that are phylogenetically closer to it.