Following the discussion of the results, we present the conclusions of this study.
First, it is worth recalling the main aim of this project: to identify the selenoproteins, and the machinery required for their synthesis, that are present in the genome of Helostoma temminkii. Fifty-three proteins from Danio rerio were used as queries and aligned against the Helostoma temminkii genome. Danio rerio was chosen because of its phylogenetic proximity to Helostoma temminkii and because it has the most accurate genome annotation among fishes. For some uncertain results, proteins from Oryzias latipes (medaka) were also used, since this species has been shown to be more closely related to Helostoma temminkii than Danio rerio is.
In this study, a protein was considered a selenoprotein only if it met three criteria. First, a selenocysteine had to be present in the sequence encoded by an exon of the gene. Second, a SECIS structure had to be predicted in the 3’-UTR. Third, this SECIS had to be predicted on the same strand as the gene, not on the complementary one, and within a short distance of it. A classification of the obtained results is shown below:
Present in both Helostoma temminkii and Danio rerio: DIO1, DIO2, DIO3a, DIO3b, GPx1a, GPx1b, GPx2, GPx3a, GPx3b, GPx4a, GPx4b, MSRB1a, MSRB1b, Sel15, SELENOE, SELENOH, SELENOI, SELENOJ1, SELENOK, SELENOM, SELENON, SELENOO1, SELENOO2, SELENOP1, SELENOP2, SELENOS, SELENOT1, SELENOT1b, SELENOT2, SELENOU1a.2, SELENOW3, SEPHS2, TXNRD2 and TXNRD3.
Present in Helostoma temminkii but not in Danio rerio: SELENOJ1.2, SELENOU1a.1 and SELENOU1a.3.
Present in Danio rerio but not in Helostoma temminkii’s genome: SELENOL, SELENOW1 and SELENOW2.
Cys-containing homologs: GPx7, GPx8, MsrA1, MsrA2, MSRB2, MSRB3, SELENOU2, SELENOU3 and SEPHS.
Cys-containing homologs belonging to the selenoprotein machinery: eEFsec, PSTK, SBP2-1, SBP2-2, SECp43-1, SECp43-2 and SecS.
To sum up, 34 selenoproteins were present in both organisms, 3 selenoproteins seem to have been lost in Helostoma temminkii, and 3 new selenoproteins appeared in the Helostoma temminkii genome, presumably through duplication events in the SELENOJ1 and SELENOU1a genes. In total, 37 selenoproteins were found in Helostoma temminkii, together with 16 Cys-containing homologs, 7 of which are part of the selenoprotein machinery.
It is important to recall the duplication event explained in the introduction. Its consequences can be observed in the difference in size between the selenoproteomes of Helostoma temminkii and, for instance, Homo sapiens. The human genome contains only 25 selenoproteins, whereas approximately 10 more are found in Helostoma temminkii. The number is probably not double because deletion events are favoured after large duplication events. In addition, other duplication events could have occurred during the divergence of these species.
Some of the genes that appear duplicated in Helostoma temminkii relative to the human genome are DIO3, GPx3, GPx4, MsrA, MSRB1, SBP2, SECp43, SELENOJ1, SELENOO, SELENOU1, SELENOT and SEPHS.
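The totals quoted in this summary can be rechecked directly from the gene lists. The following is a minimal sketch in Python. The variable names are hypothetical, and the grouping of the last two lists as Cys-containing homologs and machinery proteins is an assumption inferred from the counts given in the summary.

```python
# Gene names copied from the classification lists in this section.
# Variable names are illustrative and not part of the original analysis.
shared = (
    "DIO1 DIO2 DIO3a DIO3b GPx1a GPx1b GPx2 GPx3a GPx3b GPx4a GPx4b "
    "MSRB1a MSRB1b Sel15 SELENOE SELENOH SELENOI SELENOJ1 SELENOK SELENOM "
    "SELENON SELENOO1 SELENOO2 SELENOP1 SELENOP2 SELENOS SELENOT1 SELENOT1b "
    "SELENOT2 SELENOU1a.2 SELENOW3 SEPHS2 TXNRD2 TXNRD3"
).split()
helostoma_only = "SELENOJ1.2 SELENOU1a.1 SELENOU1a.3".split()
danio_only = "SELENOL SELENOW1 SELENOW2".split()
# Assumed grouping: the two unlabeled lists, read as Cys-containing homologs.
cys_homologs = "GPx7 GPx8 MsrA1 MsrA2 MSRB2 MSRB3 SELENOU2 SELENOU3 SEPHS".split()
machinery = "eEFsec PSTK SBP2-1 SBP2-2 SECp43-1 SECp43-2 SecS".split()

# Selenoproteins found in Helostoma temminkii: shared plus Helostoma-specific.
selenoproteins_in_helostoma = len(shared) + len(helostoma_only)
# Cys-containing homologs, including the machinery proteins.
cys_total = len(cys_homologs) + len(machinery)
# All proteins predicted in Helostoma temminkii.
total_predicted = selenoproteins_in_helostoma + cys_total

print("shared selenoproteins:", len(shared))
print("selenoproteins in Helostoma temminkii:", selenoproteins_in_helostoma)
print("Cys-containing homologs:", cys_total)
print("total predicted proteins:", total_predicted)
```

Keeping the lists as data makes it easy to update the totals if a gene is reclassified.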
Overall, a total of 53 proteins were predicted in Helostoma temminkii. However, even though the zebrafish genome was thought to be carefully and precisely annotated in the database, some of the protein queries did not start with methionine (Met). This suggests that some of our predictions may contain only a fragment of the protein rather than the whole protein, whereas others could start earlier or later than the real protein. These results were corroborated by the Seblastian protein predictions, in which almost all the proteins start with Met. In some cases, the query does start with Met but our predicted protein starts some amino acids later. This could be due to errors in the prediction, but it is not conclusive, since Exonerate and Genewise sometimes fail to capture the first amino acids of the protein.
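The Met check described above can be expressed as a small helper. This is only an illustrative sketch: the function name, the status strings and the toy sequences are hypothetical and do not come from the analysis itself.

```python
def n_terminus_status(query: str, predicted: str) -> str:
    """Compare the first residue of a query protein and of its prediction.

    Both arguments are one-letter amino acid sequences (hypothetical inputs).
    """
    query_has_met = query.strip().upper().startswith("M")
    predicted_has_met = predicted.strip().upper().startswith("M")
    if not query_has_met:
        # The annotated query itself may be incomplete at its N-terminus.
        return "query may be a fragment"
    if not predicted_has_met:
        # Exonerate or Genewise may have missed the first amino acids.
        return "prediction may miss the N-terminus"
    return "both start with Met"


# Toy sequences, invented for illustration only.
print(n_terminus_status("MSTKLLA", "MSTKLLA"))
print(n_terminus_status("MSTKLLA", "TKLLA"))
print(n_terminus_status("STKLLA", "STKLLA"))
```

A check of this kind only flags candidates for manual inspection. It cannot tell whether a missing Met comes from the annotation or from the alignment program.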
As mentioned previously, SECIS elements are secondary structures of the mRNA, and Seblastian and SECISearch3 use algorithms that predict all such structures in the DNA sequence corresponding to the gene being tested.
One of the issues encountered lies in the differences between genes. For instance, no SECIS was predicted for SEPHS2, although we had classified it as a selenoprotein because of the presence of a selenocysteine residue in its sequence and because SEPHS2 is a selenoprotein in both medaka and zebrafish. This could be explained by the fact that Seblastian relies on databases containing genomes from different species, against which it compares the sequence of the gene that encodes our protein. In some cases, this could lead to a SECIS element not being predicted although it might still be present. Nevertheless, in most of these cases, SECISearch3 did eventually predict SECIS elements.
On the other hand, multiple SECIS elements were predicted for proteins that we classified not as selenoproteins but as Cys-containing homologs (GPx7, MsrA1, SBP2-1, SBP2-2 and SELENOU3). Some of these elements were not located in the 3’-UTR of the gene, were very far from that end, or were located on the opposite strand. All these SECIS structures were predicted only by SECISearch3, which could reflect a high sensitivity of the algorithm. It is important to highlight that, with these data, we can neither evaluate nor conclude whether the presence or absence of a SECIS is truly due to prediction limitations or to evolution.
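A strict reading of the three selenoprotein criteria stated earlier (selenocysteine in an exon, a SECIS in the 3’-UTR, same strand and short distance) can be sketched as a simple filter over SECIS candidates. The class names, field names and the max_distance parameter are hypothetical. The text does not give a numerical distance threshold, so it is left as a required argument, and the value used in the example is arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    strand: str            # "+" or "-"
    stop_pos: int          # genomic coordinate of the stop codon
    has_sec_in_exon: bool  # True if a selenocysteine lies within an exon


@dataclass
class Secis:
    strand: str  # "+" or "-"
    pos: int     # genomic coordinate of the predicted SECIS element


def meets_criteria(gene: Gene, secis_hits: List[Secis], max_distance: int) -> bool:
    """Return True if the gene passes all three criteria (strict reading)."""
    if not gene.has_sec_in_exon:
        return False
    for hit in secis_hits:
        if hit.strand != gene.strand:
            continue  # SECIS on the complementary strand is rejected
        # Distance downstream of the stop codon, in the gene's orientation.
        if gene.strand == "+":
            offset = hit.pos - gene.stop_pos
        else:
            offset = gene.stop_pos - hit.pos
        if 0 < offset <= max_distance:
            return True  # SECIS in the 3' region and close enough
    return False


# Invented coordinates; 2000 is an arbitrary illustrative threshold.
gene = Gene(strand="+", stop_pos=1000, has_sec_in_exon=True)
print(meets_criteria(gene, [Secis("+", 1500)], max_distance=2000))
print(meets_criteria(gene, [Secis("-", 1500)], max_distance=2000))
```

Under this strict reading, a gene such as SEPHS2, for which no SECIS was predicted, would be rejected. This is why borderline cases in this study were also judged against homologs in other species rather than by the filter alone.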
Another limitation was that, in some cases, the choice of scaffold was not obvious. This could be due to the way alignment programs work: they introduce gaps or mismatches in different parts of the sequence in an attempt to obtain the maximum score. Different programs probably penalize these mismatches or gaps to different degrees and, consequently, obtain different scores. For this reason, in order to obtain the best alignment, Genewise and Exonerate were run for all the proteins, and the differences were analyzed to determine which prediction was more accurate for each protein. When the predictions from both programs were not good enough, other species closer to Helostoma temminkii were taken as reference in order to obtain stronger alignments, even though no other fish species is as well annotated as Danio rerio.
To improve the accuracy of the results, other databases and organisms could be used for alignment against the genome of Helostoma temminkii.
In summary, multiple selenoproteins have been predicted in the genome of Helostoma temminkii. This information represents a small contribution to the knowledge of selenoproteins in bony fishes and could be useful for further research in the field.