When analyzing the outputs obtained for the different proteins, we encountered two recurrent problems that we consider important to highlight and discuss in more depth.
The first challenge is that some of our protein predictions do not start with a methionine (Met) residue. To overcome this problem, we compared our experimental genome not only against chicken proteins but also against the human selenoproteins, Cys-containing homologs and machinery proteins found in SelenoDB 1.0. However, even in the predictions made with the human proteins as reference, some predicted proteins still do not start with Met. Because of this limitation, in some cases we have been unable to obtain a high-quality protein prediction. In these cases, we have taken all the acquired data into account to reach a conclusion: when our outputs suggest the presence of a protein, we conclude that the final prediction is not a whole protein but only part of one (e.g. MsrA).
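As a minimal illustration of the Met-start criterion discussed above, the Python sketch below flags a predicted protein as possibly N-terminally truncated when its first residue is not Met. The function name and the toy sequences are hypothetical; this only restates the criterion and is not the exact procedure we ran.

```python
def is_possibly_partial(protein: str) -> bool:
    """Flag a predicted protein that may be N-terminally truncated.

    Assumed heuristic: a complete protein is expected to begin with Met (M).
    Alignment gap characters ('-') are removed before checking.
    """
    residues = protein.replace("-", "").upper()
    return not residues.startswith("M")


# Hypothetical toy sequences, not taken from our predictions.
toy_predictions = {"toy_complete": "MSQLKVE", "toy_partial": "--kvlseirw"}
for name, seq in toy_predictions.items():
    status = "possibly partial" if is_possibly_partial(seq) else "starts with Met"
    print(f"{name}: {status}")
```

A missing Met only marks a prediction for closer inspection; as argued above, it does not by itself rule out the presence of the protein.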
Comparing against both references has a direct consequence: for some predicted proteins, we have alignments against both the human and the chicken homologs. When both outputs are compared, the highest-quality alignment is the one obtained against the chicken protein, which has an evolutionary explanation: since our bird is evolutionarily closer to chicken than to human, a better alignment against the Gallus gallus sequence than against the Homo sapiens sequence is expected.
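One simple way to make "higher-quality alignment" concrete when comparing the chicken and human alignments is percent identity over the aligned columns. The sketch below assumes two equal-length rows of a pairwise alignment with '-' as the gap character. It is not the consistency-based score that T-Coffee itself reports, and excluding gap columns is only one of several identity definitions in use.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: identity is computed over columns where neither row has a
    gap ('-'); other definitions (e.g. dividing by the full alignment
    length) give different numbers.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    columns = [(a.upper(), b.upper()) for a, b in zip(row_a, row_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Hypothetical toy rows: 4 identical residues out of 5 ungapped columns.
print(percent_identity("MKV-LA", "MKIGLA"))
```

With real data, the value would be computed for the seedeater-chicken and seedeater-human alignments of the same region and the two numbers compared.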
The second problem worth mentioning is that in some cases (e.g. eEFsec, PSTK, SBP, SECp43 and DIO1), we found Xs (which in these outputs stand for Sec residues) in the sequence predicted from our experimental genome that were not present in the reference sequences. We deduced that this is probably due to an error in the program, since some of these proteins are not even Cys-containing homologs, and the Xs align with query residues that are neither Sec nor Cys, which indicates that a Sec is not expected at that position. Moreover, Seblastian does not predict a selenoprotein or a SECIS element in these cases, which strengthens this deduction, since the SECIS element is essential for encoding a selenoprotein.
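The reasoning above (an X is only plausible as a Sec if it faces a Sec or a Cys in the reference) can be written as a small check over a pairwise alignment. This is a sketch under stated assumptions: both rows come from the same alignment, '-' marks gaps, and the query encodes Sec with U, the standard one-letter code for selenocysteine. The function name and toy rows are hypothetical.

```python
from typing import List, Tuple

SEC_OR_CYS = {"U", "C"}  # U is the one-letter code for selenocysteine (Sec)


def suspicious_x_columns(query_row: str, subject_row: str) -> List[Tuple[int, str]]:
    """Return (column, query residue) for each X in the subject row that is
    not aligned to a Sec (U) or Cys (C) in the query.

    Assumptions: both rows come from the same pairwise alignment, use '-'
    for gaps, and the query marks Sec as U. An X opposite a gap is also
    reported, since no query residue supports it.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have the same length")
    return [
        (col, q)
        for col, (q, s) in enumerate(zip(query_row, subject_row))
        if s.upper() == "X" and q.upper() not in SEC_OR_CYS
    ]


# Hypothetical toy alignment: the first X faces a Sec, the second a Gly.
print(suspicious_x_columns("MKUALGW", "MKXALXW"))
```

Only the second X is reported, which corresponds to the situation described above for eEFsec and PSTK.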
Finally, it is important to note that when we only found hits for a protein using the chicken query, this is not because the protein does not exist in human, but because the database from which we obtained the human proteins (SelenoDB 1.0) is much more conservative and only includes predictions that have been manually curated; it prioritizes quality over quantity.
eEFsec is a translation factor necessary for the incorporation of Sec into proteins, meaning that it is part of the selenoprotein machinery [6]; it is found in both the human and chicken references. This is one of the proteins in which we found an X with no selenoprotein or SECIS element predicted by Seblastian. Since no Sec is present in the reference sequences, we conclude that this X does not represent a Sec and is probably due to an error of the program [see previous considerations].
Although the chicken-seedeater T-Coffee output does not begin with Met, the alignments of both query sequences (human and chicken) with the sequence from our experimental genome are notably accurate, which allows us to conclude that eEFsec is conserved in S. hypoxantha.
GPx1 is a selenoprotein found in the human genome but not in the chicken's. As the T-Coffee output shows, in S. hypoxantha a Sec was found in the same position as in human, meaning that a selenoprotein was predicted. The alignment between both sequences was also fairly accurate, and Seblastian predicted a selenoprotein in this region, together with a SECIS element in the 3'-UTR region of the same strand. On the other hand, the predicted sequence does not begin with Met. Nevertheless, taking all of the above into account, we conclude that even though this selenoprotein is absent from the chicken genome, it is conserved in S. hypoxantha.
GPx3 is a selenoprotein found in both human and chicken. We selected two hits between the chicken sequence and the Sporophila genome based on percentage of identity and e-value, which meant analyzing two different scaffolds (NDFG01000277.1 and NDFG01000386.1). After analyzing the data, we found that the Sec was only preserved in scaffold NDFG01000386.1, meaning that the GPx3 protein of our experimental genome is located in this region. Moreover, the T-Coffee alignment was much better for this scaffold, and Seblastian only predicted a selenoprotein and a SECIS element in it. The SECIS element is located in the 3'-UTR region of the same strand as the predicted sequence. When comparing the human sequence with the seedeater's, only one hit was found, in scaffold NDFG01000386.1, which confirmed our hypothesis. This alignment was also fairly accurate, though less so than the one with chicken; nevertheless, the differences were minor.
Although the alignment with the chicken protein does not begin with Met, the one with the human protein does. All in all, we conclude that GPx3 is conserved and located in this region of the S. hypoxantha genome.
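The selection of GPx3 hits by percentage of identity and e-value can be sketched as follows, assuming tBLASTn was run with BLAST+ tabular output (-outfmt 6), whose twelve default columns are listed in FIELDS. The e-value cutoff and the ranking order (lowest e-value first, then highest identity) are assumptions for illustration, not the thresholds we applied, and the example rows are made up.

```python
from typing import Dict, Iterable, List, NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    sstart: int
    send: int
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated -outfmt 6 lines, skipping blanks and comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hits.append(Hit(cols["qseqid"], cols["sseqid"], float(cols["pident"]),
                        int(cols["sstart"]), int(cols["send"]),
                        float(cols["evalue"])))
    return hits


def best_hit_per_scaffold(hits: Iterable[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep the best hit on each scaffold (lowest e-value, then highest
    identity), drop hits above the hypothetical e-value cutoff, and rank
    the remaining hits the same way."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.sseqid)
        if current is None or (hit.evalue, -hit.pident) < (current.evalue, -current.pident):
            best[hit.sseqid] = hit
    return sorted(best.values(), key=lambda h: (h.evalue, -h.pident))


# Made-up rows for illustration only (not our results).
rows = [
    "query\tscaffold_A\t40.0\t120\t60\t2\t1\t120\t500\t860\t1e-20\t90.0",
    "query\tscaffold_A\t35.0\t50\t30\t1\t1\t50\t100\t250\t1e-6\t40.0",
    "query\tscaffold_B\t80.0\t120\t20\t0\t1\t120\t900\t540\t1e-40\t200.0",
    "query\tscaffold_C\t30.0\t40\t28\t0\t1\t40\t10\t130\t0.5\t20.0",
]
for hit in best_hit_per_scaffold(parse_outfmt6(rows)):
    print(hit.sseqid, hit.pident, hit.evalue)
```

In tBLASTn output, sstart greater than send (as in the scaffold_B row) indicates a hit on the reverse strand of the subject.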
GPx4 is a selenoprotein found in humans (and other mammals) but not in chicken. We found one hit between the human protein sequence and the Sporophila genome. As the T-Coffee output shows, the Sec of the human sequence is preserved in the same position in the seedeater's sequence, meaning that the selenoprotein is conserved. The alignment is also considerably accurate. Likewise, Seblastian predicted a selenoprotein in this same region and a SECIS element in the 3'-UTR region of the same strand. However, the sequence does not begin with Met, which is a limitation. Nonetheless, taking all of the above into account, we conclude that GPx4 is present in S. hypoxantha, though not in Gallus gallus.
GPx7 is a Cys-containing homolog found in human and chicken. When comparing both sequences with the S. hypoxantha genome, the T-Coffee output shows a nearly perfect alignment with a conserved Cys residue. The alignment with chicken is considerably better than the one with human, as chicken is much more closely related to S. hypoxantha. Seblastian did not predict a selenoprotein, although it did predict a SECIS element. Taken together, we conclude that the predicted protein is not a selenoprotein but a Cys-containing homolog that is well preserved between species.
GPx8 is a Cys-containing homolog found in both human and chicken. We selected two hits between the human protein sequence and our experimental genome, which meant analyzing two different scaffolds (NDFG01000392.1 and NDFG01000454.1). The alignments of the protein with the selected sequences of both scaffolds were very similar and the Cys was preserved in both, so we could not determine which region of our genome corresponds to the query protein. However, when searching the seedeater's genome with the chicken protein sequence, we found only one hit, in scaffold NDFG01000392.1, indicating that GPx8 of Sporophila is located in this region. The alignment confirms that it is the same protein (GPx8). Seblastian did not predict any selenoprotein or SECIS element in this region, which supports our prediction. In the alignment with chicken, neither of the protein sequences begins with Met, whereas both do in the alignment with human.
The hit found between the NDFG01000454.1 scaffold and the human protein is most likely due to the homology between GPx8 and GPx7, as it is the same hit that we analyzed and concluded to be the region where GPx7 is located. This is plausible since, according to the literature, GPx7 and GPx8 evolved from a GPx4-like selenoprotein ancestor.[5]
All three selenoproteins in this family were found to be conserved in S. hypoxantha, as the Sec residues were conserved in the same position and the alignments were good (better with chicken than with human). However, it is important to note that for both DIO2/DI2 alignments and for the alignment with chicken DIO3, Seblastian did not predict a selenoprotein. Analyzing all the available data, we conclude that these proteins are present in the seedeater's genome, given that the T-Coffee alignments are reasonable and that a SECIS element was found for each gene in the 3'-UTR region of the same strand (reverse for DIO1/DI1 and DIO2/DI2, forward for DIO3/DI3).
On the other hand, in the annotation of the seedeater's DIO1/DI1, an X appears at a position where no Sec is expected, probably due to a program error, as previously explained.
It is also worth noting that the alignment with DIO3 (chicken) does not start with Met, probably due to a poor annotation of the gene in SelenoDB. In spite of this, we still consider that there is a selenoprotein, though it probably starts at a different site.
Our conclusions are consistent with the existing literature, which reports that all vertebrates have these three proteins of the iodothyronine deiodinase family in their genome.[5]
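The criterion used throughout this discussion, that the SECIS element must lie on the same strand as the gene and downstream of it (in the 3'-UTR), can be expressed as a coordinate check. This sketch assumes 1-based scaffold coordinates with start <= end for every feature and a strand field of '+' or '-'; the Feature class and the optional max_dist cutoff are hypothetical and do not reproduce Seblastian's own output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    start: int   # 1-based, start <= end, forward-strand coordinates
    end: int
    strand: str  # '+' (forward) or '-' (reverse)


def secis_in_3utr(gene: Feature, secis: Feature,
                  max_dist: Optional[int] = None) -> bool:
    """True if the SECIS is on the gene's strand and downstream of its end.

    On the forward strand, downstream means larger coordinates; on the
    reverse strand, it means smaller coordinates. max_dist is an optional,
    hypothetical limit on the gene-to-SECIS distance in nucleotides.
    """
    if gene.strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if secis.strand != gene.strand:
        return False
    if gene.strand == "+":
        dist = secis.start - gene.end
    else:
        dist = gene.start - secis.end
    if dist <= 0:
        return False
    return max_dist is None or dist <= max_dist


# Hypothetical coordinates: a reverse-strand gene, as for DIO1/DI1 and DIO2/DI2.
gene = Feature(1000, 2500, "-")
print(secis_in_3utr(gene, Feature(400, 460, "-")))    # downstream on '-'
print(secis_in_3utr(gene, Feature(3000, 3060, "-")))  # upstream on '-'
```

The reverse-strand case is the one most easily mishandled, since "downstream" there means lower scaffold coordinates.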
Although we cannot find the Cys of this Cys-containing homolog in the alignment with chicken, this is because the chicken protein extracted from SelenoDB is incomplete, as we can see here:
As the T-Coffee alignment is fairly good against the human protein and much better against the available fragment of the chicken protein, we conclude that MsrA is conserved in the seedeater. The incompleteness of the chicken protein also explains why our protein prediction does not start with Met [see previous considerations].
Hits for PSTK were only searched for with the chicken protein. The T-Coffee output shows an alignment with the tawny-bellied seedeater, and 10 of the 15 Cys residues are conserved in the seedeater's sequence. In this protein, we also found a spurious X in the Sporophila sequence that was not present in the reference sequence, at a position that does not correspond to a Cys [see previous considerations].
Taking into consideration our data, and the fact that this protein is essential for selenoprotein synthesis and highly conserved across archaea and eukaryotes, we conclude that it is most likely conserved in S. hypoxantha.
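A count such as "10 of the 15 Cys conserved" can be read directly from the aligned rows. The sketch below assumes two equal-length rows of one pairwise alignment (reference first, seedeater second); the function name and the toy rows are hypothetical.

```python
from typing import Tuple


def cys_conservation(query_row: str, subject_row: str) -> Tuple[int, int]:
    """Count query Cys columns that also carry Cys in the subject row.

    Returns (conserved, total), where total is the number of Cys (C) in the
    query row. Assumes two equal-length rows of one pairwise alignment.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have the same length")
    total = conserved = 0
    for q, s in zip(query_row.upper(), subject_row.upper()):
        if q == "C":
            total += 1
            if s == "C":
                conserved += 1
    return conserved, total


# Hypothetical toy alignment: 2 of the 3 query Cys are conserved.
print(cys_conservation("MC-CAC", "MCASAC"))
```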
The SBP family is required for efficient recoding of UGA as Sec, meaning that it is part of the selenoprotein machinery. SBP2 is found in humans, and SBP2(1), SBP2(2) and SBP2(3) are found in chicken. Two hits were obtained in the S. hypoxantha genome for human SBP2 (scaffolds NDFG01000251.1 and NDFG01000168.1). The first scaffold did not align well with the reference sequence; the second one aligned slightly better, but the alignment was still not of high quality. This same scaffold had a hit with chicken SBP2(1), whose alignment was much better than the one obtained against the human homolog. However, in the chicken alignment some spurious Xs appeared in the seedeater's sequence that were absent from the human alignment, which could be due to an error in the program [see previous considerations]. Moreover, since both queries aligned to the same region (scaffold NDFG01000168.1), a genuine Sec could not appear in the chicken-seedeater output while being absent from the human-seedeater output. Likewise, Seblastian did not predict any selenoprotein or SECIS element in this region. Finally, the query sequences start with Met, but the seedeater's sequence does not.
As for SBP2(2), we found one hit in the seedeater's genome, in scaffold NDFG01000251.1. Nonetheless, the alignment was not good enough and spurious Xs appeared as well, as in the previous case.
SBP2(3) was not selected for further analysis since its hit was not good enough, which suggests that it may not exist in S. hypoxantha.
Taken together, we conclude that S. hypoxantha has an SBP2(1) homolog, which would correspond to human SBP2. However, we cannot reach the same conclusion for SBP2(2).
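Whether two queries hit "the same region" (as for human SBP2 and chicken SBP2(1) on scaffold NDFG01000168.1) can be decided from the tBLASTn subject coordinates. In this sketch a hit is reduced to a hypothetical (scaffold, sstart, send) tuple; two hits count as the same region when they share scaffold and strand and their intervals overlap by at least min_overlap nucleotides, a threshold chosen here only for illustration. The coordinates in the example are made up.

```python
from typing import Tuple

# A tBLASTn hit reduced to (scaffold, sstart, send); sstart > send means reverse strand.
HitCoords = Tuple[str, int, int]


def to_interval(sstart: int, send: int) -> Tuple[int, int, str]:
    """Normalize tBLASTn subject coordinates to (low, high, strand)."""
    strand = "+" if sstart <= send else "-"
    return min(sstart, send), max(sstart, send), strand


def same_region(a: HitCoords, b: HitCoords, min_overlap: int = 1) -> bool:
    """True if two hits lie on the same scaffold and strand and their
    subject intervals overlap by at least min_overlap nucleotides."""
    if a[0] != b[0]:
        return False
    a_lo, a_hi, a_strand = to_interval(a[1], a[2])
    b_lo, b_hi, b_strand = to_interval(b[1], b[2])
    if a_strand != b_strand:
        return False
    overlap = min(a_hi, b_hi) - max(a_lo, b_lo) + 1
    return overlap >= min_overlap


# Hypothetical coordinates for a human-query hit and a chicken-query hit.
human_hit = ("scaffold_X", 1200, 2900)
chicken_hit = ("scaffold_X", 1150, 3000)
print(same_region(human_hit, chicken_hit))
```

If both hits pass this check, the two alignments describe the same genomic sequence, which is why an X present in only one of them points to an artifact rather than a real Sec.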
For both proteins in this family, we only obtained hits with the chicken queries, as they do not appear in the human SelenoDB 1.0 database.
SecS(1) had hits in two different scaffolds of our genome. Comparing the data from both, the T-Coffee alignment obtained with scaffold NDFG01000445.1 was much better than the one with scaffold NDFG01000010.1. Moreover, all 5 Cys residues found in the seedeater's sequence align with a Cys of the chicken sequence.
SecS(2), on the other hand, had only one hit, perfectly aligned with the chicken protein, including all 11 Cys residues of the reference sequence. It is important to note that an X (putative Sec) was found in the S. hypoxantha protein sequence, which is probably due to an error of the program [see previous considerations].
In this family, we found two hits in the Sporophila genome, in two different scaffolds: NDFG01000530.1 and NDFG01000166.1. Both hits align with chicken SEPHS and human SPS1. Based on our data, we conclude that these proteins have a homolog in Sporophila in scaffold NDFG01000530.1, for several reasons: 1) the T-Coffee alignments are much better and start with Met; 2) although Seblastian does not predict a selenoprotein in either scaffold, it does predict a SECIS element for the hit in the NDFG01000166.1 region; and 3) SEPHS is defined in SelenoDB as a Thr-containing homolog, and this threonine is conserved only in the NDFG01000166.1 scaffold.
SPS2 is a selenoprotein found in the human genome but not in the chicken's. We found one hit in the previously mentioned scaffold NDFG01000166.1, but the Sec is not conserved in the Sporophila sequence, the alignment is not good enough, and Seblastian predicts neither a selenoprotein nor a SECIS element. Based on these data, we conclude that SPS2 is not present in S. hypoxantha.
Sel15 is a selenoprotein found in both human and chicken. As the T-Coffee output shows, a Sec was found in S. hypoxantha in the same position as in both the human and the chicken sequences, meaning that a selenoprotein was predicted in the seedeater's genome. However, Seblastian did not predict a selenoprotein in this region, although it did predict a SECIS element in the 3'-UTR region of the same strand. Although the sequence does not begin with Met in either of the two alignments, taking all the data into account, we conclude that there is a selenoprotein in this region. Thus, Sel15 is conserved in S. hypoxantha.
Two things are worth noting here. First, this protein is found in both reference genomes, but in humans it is a selenoprotein, whereas in chicken (SELENOH) it is classified as a Cys-containing homolog that does not contain a Sec. Second, we found a Sec in S. hypoxantha that aligns with the Sec of the human sequence, and T-Coffee shows a very good alignment starting with a Met residue. Thus, we conclude that a homolog of SelH is found in S. hypoxantha as a selenoprotein.
Two different scaffolds had hits with chicken SELENOI, but NDFG01000016.1 gave the best T-Coffee alignment and was the only one of the two with the Sec conserved in the same position as in the reference. Moreover, Seblastian predicted a selenoprotein and a SECIS element in the 3'-UTR region. The same two scaffolds had hits with the human protein (SelI), and again the best hit was the one in scaffold NDFG01000016.1.
Our findings are consistent with the literature, which reports that this protein is expected to be found in all vertebrates.[7]
There are three SELENOKs in the chicken genome, whereas only one is found in the human genome. The tBLASTn searches with chicken SELENOK(2) and SELENOK(3) returned no hits, and the hit found with human SelK was not selected for further analysis. SELENOK(1), however, appears to be conserved: a Sec is found in the seedeater's sequence, even though it does not align perfectly with the Sec of the chicken sequence, and the T-Coffee output shows a very good alignment starting with a Met residue. Although Seblastian does not predict a selenoprotein, it does predict a SECIS element in the 3'-UTR region. With all the data, we consider that this selenoprotein is conserved in S. hypoxantha.
A hit was found in scaffold NDFG01000027.1 of the seedeater's genome that aligned fairly well with the human SelM selenoprotein. Moreover, a Sec was found in the seedeater protein sequence in exactly the same position as the query's Sec. Although the first amino acid of the predicted protein is not a Met and Seblastian did not predict a selenoprotein, it did predict a SECIS element in the 3'-UTR region of the same strand as the Sec-containing predicted region (forward strand). Taking all this into account, we consider that this selenoprotein is found in S. hypoxantha.
Although the human T-Coffee output shows a notably poor alignment, we found hits for Selenoprotein N with both the chicken (SELENON) and human (SelN) proteins. Considering both T-Coffee alignments, our data point to the presence of this selenoprotein in the Sporophila genome, since the Sec of the chicken sequence is conserved in the same position in the seedeater's sequence. Furthermore, Seblastian predicts a selenoprotein and a SECIS element in the 3'-UTR region for this gene.
This protein has been described in the literature as one of the 31 selenoproteins found in ancestral vertebrates.[5] Taking this information and our data into account, we conclude that this protein is conserved in S. hypoxantha.
Hits were found in our experimental genome for two of the four proteins of the chicken Selenoprotein O family, and two hits were found for human SelO. Of the two scaffolds with hits against the human protein, NDFG01000399.1 contained the highest-quality hit: the Sec was conserved in almost the same position, and Seblastian predicted a selenoprotein and a SECIS element in the right region. This same scaffold, in the same genomic region, also had a hit with chicken SELENOO(3), and the Sec of SELENOO(3) was found in the same position in the seedeater's sequence. However, two additional Xs were found in the seedeater's sequence [see previous considerations].
There was also a hit with chicken SELENOO(2), a machinery protein for selenoprotein formation. T-Coffee shows a good alignment (though there is no Met at the beginning of the sequence) and, as expected, Seblastian does not predict a selenoprotein in this region. However, it does predict a SECIS element.
All in all, we conclude that two proteins of the Selenoprotein O family are present in the genome of S. hypoxantha: one homologous to both SELENOO(3) and SelO, and the other homologous to SELENOO(2).
For this family, two chicken proteins (SELENOP(1) and SELENOP(2)) and the single human protein in SelenoDB 1.0 (SelP) gave hits. SelP had hits in two different scaffolds, the same ones that aligned with SELENOP(1) and SELENOP(2). Of the T-Coffee alignments obtained with the human protein, the one with scaffold NDFG01000392.1 is better, and almost all the Sec residues of the human sequence are found in the seedeater's sequence. Furthermore, the two proteins that aligned in this scaffold start with a Met residue, which allows us to draw a more robust conclusion regarding the presence of this protein in S. hypoxantha.
All in all, we conclude that two homologs of this family are found in S. hypoxantha.
Neither MSRB1 (chicken) nor SelR1 (human), both selenoproteins, was found in the genome of S. hypoxantha, as the tBLASTn searches returned no hits. As for the other members of the family, which are Cys-containing homologs, potential homologs were found in the tawny-bellied seedeater. For SelR2, the T-Coffee alignment was well conserved and the Cys was preserved. Seblastian did not predict a selenoprotein or a SECIS element. Thus, we conclude that this protein is not a selenoprotein but a Cys-containing homolog present in the genome of S. hypoxantha.
Furthermore, hits were found for the MSRB3/SelR3 protein with both the chicken and human queries, and the T-Coffee outputs showed good alignments. For both reference proteins, Seblastian did not predict a selenoprotein, although it did predict a SECIS element. Therefore, taking all the data together, we conclude that a homolog of MSRB3/SelR3 is found in S. hypoxantha.
We searched the genome of S. hypoxantha with the human and chicken Selenoprotein S sequences (SelS and SELENOS, respectively). The protein appears to be conserved in the seedeater's genome: the T-Coffee output shows that the Sporophila protein aligns with both the chicken and the human proteins and that the Sec is preserved. Furthermore, Seblastian predicts a selenoprotein and a SECIS element in the 3'-UTR region of the gene. Thus, we conclude that a homolog of this selenoprotein is present in S. hypoxantha.
Hits were found with both the chicken SELENOT and the human SelT protein sequences, in similar positions of the same subject scaffold. For both hits, T-Coffee produced a considerably accurate protein alignment, and a Sec was found in the S. hypoxantha protein sequence in exactly the same position as in the query. Even though Seblastian did not predict a selenoprotein, it did predict a SECIS element for both in the 3'-UTR region of the same strand as the predicted gene (forward strand). Taking all this into account, we predict a Selenoprotein T in S. hypoxantha, although neither of the predicted regions starts with a Met.
Hits were found with both the SELENOU (chicken) and SelU1 (human) sequences using tBLASTn, in approximately the same region. To be precise, the chicken annotation begins three amino acids before the human one, although the human sequence starts with Met and is more accurately annotated. The T-Coffee alignments with the proteins of both species were substantially good, and a Sec appeared in the seedeater sequence in both alignments: in the alignment with chicken, the Sec is in the same position as the Sec of the chicken sequence, which means that it is conserved between these two species; in the alignment with the human Cys-containing homolog, a Cys of the human sequence aligns with the Sec of the Sporophila sequence. Seblastian also predicted a selenoprotein in this region and a SECIS element in the 3'-UTR region of the reverse strand, the same strand as the predicted gene. Although the chicken alignment does not start with Met, the human alignment does. All in all, we can say that S. hypoxantha contains a homolog of SELENOU.
Furthermore, a hit with the human SelU2 sequence was found in another scaffold of S. hypoxantha. T-Coffee produced a fairly accurate protein alignment, in which the seedeater sequence contains a Cys in the same position as the query. Seblastian predicted neither a selenoprotein nor a SECIS element for this region. It is important to mention that, although the first amino acid of the human SelU2 protein is a Met, the first one of the predicted S. hypoxantha protein is not. Despite this, considering all of the above, we consider that S. hypoxantha has a homolog of SelU2.
The only protein of the Selenoprotein W family for which we found a hit is the human Cys-containing homolog SelW2. T-Coffee produced a good protein alignment, in which a Cys is found in the S. hypoxantha sequence in the same position as in the query sequence. Seblastian predicted neither a selenoprotein nor a SECIS element for this region. Even though the seedeater protein sequence does not start with a Met, we conclude that S. hypoxantha has a homolog of this protein.
Two hits were found for the chicken TXNRD1 selenoprotein (in scaffolds NDFG01000277.1 and NDFG01000920.1) and two for the human TR1 selenoprotein (in scaffolds NDFG01000920.1 and NDFG01000057.1). For the predicted regions in scaffold NDFG01000920.1, the alignments with both queries (chicken and human) were fairly accurate, and a Sec was found almost in the same position as the query Sec in both cases. Moreover, Seblastian predicted a selenoprotein and a SECIS element in this region, the SECIS element lying in the 3'-UTR region of the same strand as the predicted Sec-containing sequence (forward strand). The alignment of scaffold NDFG01000277.1 with the chicken TXNRD1 protein was considerably accurate, and a Sec was also detected in the tawny-bellied seedeater's sequence almost in the same position as the query Sec. Likewise, Seblastian predicted a selenoprotein and a SECIS element in this region, the latter in the 3'-UTR region of the same strand. In contrast, the T-Coffee alignment of the NDFG01000057.1 region with the human TR1 protein did not show homology as evident as its alignment with the chicken TXNRD2 selenoprotein. Since the TXNRD2 hit was in the same region of scaffold NDFG01000057.1, we predicted that this Sec-containing region is homologous to TXNRD2. In addition, Seblastian predicted a selenoprotein and a SECIS element in the 3'-UTR region of the forward strand, the same strand as the predicted gene location in S. hypoxantha.
A hit for the human TR3 selenoprotein was found in the same region of scaffold NDFG01000277.1 that was predicted for TXNRD1. Since the T-Coffee protein alignment for TR3 was less accurate than the one for TXNRD1, we consider this predicted region homologous to the chicken TXNRD1 selenoprotein. The reasonably good alignment with TR3, together with the selenoprotein and SECIS element predicted by Seblastian, is explained by the fact that both proteins belong to the same family.
Altogether, we predicted three proteins of this family in S. hypoxantha: two homologs of thioredoxin reductase 1 (one in scaffold NDFG01000920.1 and the other in scaffold NDFG01000277.1), which suggests that this gene has undergone a duplication, and one homolog of thioredoxin reductase 2 (in scaffold NDFG01000057.1). Moreover, in most of the predicted sequences (those from scaffolds NDFG01000920.1 and NDFG01000057.1), a spurious X was found [see previous considerations].
Two hits were found when the S. hypoxantha genome was compared with the chicken SECp43 sequence: one in scaffold NDFG01000063.1 and the other in scaffold NDFG01000604.1. The latter did not show a very good T-Coffee alignment. Conversely, the former showed a fairly accurate protein alignment, and no Sec was found. This is consistent with the literature [27], which describes it as a selenoprotein machinery gene. However, it is important to mention that the first position of the sequence is not a Met [see previous considerations]. In conclusion, S. hypoxantha has a homolog of SECp43.
After an exhaustive analysis of all the data obtained, we have concluded that Sporophila hypoxantha has homologs of the following selenoproteins:
As for Cys-containing homologs and machinery proteins, we found the following:
It is worth noting that GPx1, GPx4, SelM, SelH, SelW2, SelU2 and SelR2 are found in human but not in chicken. Moreover, SelH is a selenoprotein in humans and S. hypoxantha but a Cys-containing homolog in chicken. Finally, we found evidence of a duplication of thioredoxin reductase 1, since we predicted two different regions of the S. hypoxantha genome homologous to this protein.
All in all, we have identified and annotated the selenoproteins and other related genes of S. hypoxantha by comparing its genome with the previously described sequences of these proteins in Gallus gallus and Homo sapiens. Since this project relied exclusively on bioinformatic methods, we suggest that molecular analyses be performed to confirm our results.