Discussion
The criteria for deciding whether a protein in Spermophilus dauricus is a selenoprotein, a cysteine-containing homologue or neither were:
- - SELENOPROTEIN: a TGA codon is aligned with either a selenocysteine or a cysteine in the query, and the sequence after the TGA codon aligns perfectly with the query.
- - CYSTEINE-CONTAINING HOMOLOGUE: a cysteine in Spermophilus dauricus is aligned with a selenocysteine or a cysteine in the query, and the sequence after the Cys residue aligns perfectly with the query.
- - OTHERS: some proteins may have lost their selenocysteine, so that an amino acid other than cysteine occupies its position. Such a protein is neither a selenoprotein nor a cysteine-containing homologue.
- - UGA codon within the ORF.
- - SECIS elements located in the 3’UTR.
- - Exons upstream and downstream of the UGA codon.
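The three categories above can be written as a small decision rule. The following is only a minimal sketch of that rule, not the procedure we actually ran: the function name `classify_prediction`, its arguments and the "inconclusive" label for a poor downstream alignment are assumptions introduced for illustration.

```python
def classify_prediction(query_residue: str, target_codon: str, downstream_ok: bool) -> str:
    """Classify a predicted protein at the position aligned with Sec/Cys in the query.

    query_residue: 'U' (Sec) or 'C' (Cys) at the aligned position of the query.
    target_codon:  genomic codon aligned with that residue (DNA or RNA letters).
    downstream_ok: True if the sequence after this position aligns well with the query.
    """
    if query_residue not in {"U", "C"}:
        raise ValueError("query residue must be 'U' (Sec) or 'C' (Cys)")
    codon = target_codon.upper().replace("U", "T")  # accept UGA as well as TGA
    if not downstream_ok:
        # Assumption: the criteria above do not cover this case explicitly.
        return "inconclusive"
    if codon == "TGA":
        return "selenoprotein"
    if codon in {"TGT", "TGC"}:  # the two cysteine codons
        return "cysteine-containing homologue"
    return "other"
```

For instance, a TGA codon aligned with the Sec of the query and followed by a good alignment is classified as a selenoprotein, whereas a Thr codon at the same position falls into the "other" category.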
During our research, we could not obtain perfect alignments for all the selenoproteins, so not all the selenoproteins found are fully annotated. Some of them lack the N-terminal region and consequently do not start with a Met residue. Nevertheless, we obtained a good alignment with the query for the rest of the sequence. This could be explained by:
- - Poor annotation of the query, particularly if provided by SelenoDB 2.0.
- - Defective sequencing of the Spermophilus dauricus genome.
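The two checks mentioned above (does the predicted protein start with Met, and does it contain Sec) are easy to automate. The following is a minimal sketch under the assumption that the predicted protein is available as a one-letter string in which Sec is written as "U" and a trailing "*" marks the stop; the function name `describe_prediction` is hypothetical.

```python
def describe_prediction(protein: str) -> dict:
    """Report whether a predicted protein starts with Met and where its Sec residues are."""
    seq = protein.strip().upper().rstrip("*")  # drop a trailing stop symbol, if any
    return {
        "starts_with_met": seq.startswith("M"),
        # 1-based positions of selenocysteine ('U') in the prediction
        "sec_positions": [i + 1 for i, aa in enumerate(seq) if aa == "U"],
        "length": len(seq),
    }
```

A prediction that lacks the N-terminal region would be reported with `starts_with_met` set to False, which is the situation described for several proteins below.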
GPx family:
Glutathione peroxidases (GPx) are a family of enzymes with peroxidase activity that catalyse the reduction of hydrogen peroxide using monomeric glutathione as a cofactor. Their main biological role is to protect the organism from oxidative damage (15). In most mammals, GPx1-4 and GPx6 are selenoproteins and GPx5, GPx7 and GPx8 are cysteine homologues. According to SelenoDB 2.0, the squirrel genome has 5 selenoproteins and 4 cysteine homologues in this family: GPx2, GPx3, GPx4 and GPx6 are selenoproteins, GPx5, GPx7 and GPx8 are cysteine homologues, and there are also 2 unclassified GPx proteins (1 selenoprotein and 1 Cys homologue).
In contrast, 8 human GPx subfamilies are annotated (the selenoproteins GPx1-GPx4 and GPx6, and the Cys homologues GPx5, GPx7 and GPx8). This difference from the squirrel annotation led us to run the analysis with all the human proteins as well, in order to make the relevant associations.
Because of the high homology among the family members, almost every query also yields predictions for the other members, which complicates the analysis. We therefore discuss their Spermophilus dauricus relatives individually, making the relevant associations.
Below, we show the proteins predicted in the Spermophilus dauricus genome:
GPx1 (GPx unclassified in squirrel):
GPx1 is the most abundant isoform. It is found in the cytoplasm of nearly all mammalian tissues, and its preferred substrate is hydrogen peroxide (16).
This protein was not well annotated for squirrel in SelenoDB 2.0, for two main reasons:
- - The protein is not properly classified, as its subfamily is not specified.
- - The protein sequence is truncated at a final, unusual “X” position that was misread as a STOP codon. This results in an unusually short protein sequence.
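To illustrate why a selenoprotein annotation can end up truncated, the sketch below translates a coding sequence twice: once reading TGA as a stop codon and once recoding it as selenocysteine ("U"). It is a toy example under the standard genetic code, not the annotation pipeline of SelenoDB; the function name `translate` and the flag `recode_tga` are assumptions for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str, recode_tga: bool = False) -> str:
    """Translate a coding sequence; optionally read in-frame TGA as Sec ('U')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon == "TGA" and recode_tga:
            protein.append("U")  # selenocysteine instead of stop
            continue
        aa = CODON_TABLE.get(codon, "X")  # 'X' for codons with ambiguous bases
        if aa == "*":
            break  # stop codon ends the protein
        protein.append(aa)
    return "".join(protein)


toy_cds = "ATGGGCTGAGGTTAA"
print(translate(toy_cds))                   # MG   (TGA read as stop)
print(translate(toy_cds, recode_tga=True))  # MGUG (TGA read as Sec)
```

With the default settings the toy protein stops at the in-frame TGA, which gives the kind of unusually short sequence described above.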
The following table shows the GPx1 exon distribution:
exon  exon start  exon end  exon length
1     614365      614721    356
GPx2:
This gene encodes a selenium-dependent glutathione peroxidase that is one of two isoenzymes responsible for the majority of the glutathione-dependent hydrogen peroxide-reducing activity in the epithelium of the gastrointestinal tract (17).
In this case, the prediction was also performed using the Homo sapiens GPx2 query. Exonerate, Genewise and Seblastian all predicted the same protein, which aligned perfectly with the chosen query (SPP0000005). In addition, a SECIS element was found in the 3’UTR. A selenocysteine residue is present in our predicted sequence and, therefore, we conclude that Spermophilus dauricus has the GPx2 selenoprotein. The sequence is located on the forward strand.
The following table shows the GPx2 exon distribution:
exon  exon start  exon end  exon length
1     614440      614718    278
GPx3:
This isozyme is secreted, and is abundantly found in plasma. Downregulation of expression of this gene by promoter hypermethylation has been observed in a wide spectrum of human malignancies, including thyroid cancer, hepatocellular carcinoma and chronic myeloid leukemia. This isozyme is also a selenoprotein, containing the amino acid selenocysteine at its active site (18).
The prediction was performed using the Homo sapiens GPx3 query, as the Ictidomys tridecemlineatus query did not start with Met. Both Exonerate and Genewise predicted the same protein, which aligned well with the chosen query (SPP0000006). A SECIS element was also found in the 3’UTR. In this case, though, Seblastian did not predict any protein. Despite this, we consider the Exonerate and Genewise predictions accurate, so we conclude that Spermophilus dauricus has the GPx3 selenoprotein on the reverse strand. The following table shows the GPx3 exon distribution:
exon  exon start  exon end  exon length
1     379534      379624    90
2     374843      374992    149
3     373378      373495    117
4     372918      373017    99
5     372292      372495    203
GPx4:
This glutathione peroxidase isoform is highly expressed in sperm (19). Query SPP0000007 from Homo sapiens was examined, in this case, with Exonerate, Genewise and Seblastian, all of which gave the same protein prediction. Although the prediction contains an initial gap, there is a selenocysteine residue in the alignment. We also found a SECIS element in the 3’UTR. Taking all these facts into account, we conclude that Spermophilus dauricus has the GPx4 selenoprotein on the forward strand. The following table shows the GPx4 exon distribution:
exon  exon start  exon end  exon length
1     645122      645219    97
2     645301      645445    144
3     645534      645685    151
4     645761      645785    24
5     645864      645926    62
GPx5 (cysteine-containing homologue):
This protein is specifically expressed in the epididymis in the mammalian male reproductive tract, and is androgen-regulated. Unlike mRNAs for other characterized glutathione peroxidases, this mRNA does not contain a selenocysteine (UGA) codon. Thus, the encoded protein is selenium-independent, and has been proposed to play a role in protecting the membranes of spermatozoa from the damaging effects of lipid peroxidation and/or preventing premature acrosome reaction (20).
As with the previous isoforms, the squirrel query SPP00001183 did not show good homology, so we used the human query SPP0000010, although its alignment has an initial gap that includes the homologous Cys residue. The alignments obtained with both Exonerate and Genewise show poor homology. This indicates an incorrect alignment of this isoform and, therefore, we cannot conclude that this protein is present in the Spermophilus dauricus genome. Furthermore, Seblastian did not return any results. This is the detailed exon prediction:
exon  exon start  exon end  exon length
1     5830798     5831351   553
GPx6:
Expression of this gene is restricted to embryos and adult olfactory epithelium. (21)
The prediction of this isoform was performed using the Homo sapiens query (SPP0000009). Exonerate, Genewise and Seblastian all predicted the same protein, which aligned perfectly with the chosen query. A SECIS element was also found in the 3’UTR. Therefore, we conclude that Spermophilus dauricus has the GPx6 selenoprotein. The sequence is located on the reverse strand. The following table shows the GPx6 exon distribution:
exon  exon start  exon end  exon length
1     636251      636337    86
2     633291      633444    153
3     630960      631077    117
4     630288      630387    99
5     628787      628987    200
GPx7 and GPx8:
Unlike the mRNAs for other characterized glutathione peroxidases, these mRNAs do not contain a selenocysteine codon. Thus, the encoded proteins are selenium-independent, like GPx5 (22,23).
In these cases, the predictions were performed using the queries SPP0000010 and SPP0000011 from Homo sapiens. Neither the SECIS search nor Seblastian returned satisfactory results. The Exonerate and Genewise predictions showed an initial gap for GPx7 and perfect homology for GPx8. In both cases a Cys residue occupies the homologous position, which is consistent with these proteins being Cys-containing homologues. We therefore conclude that both protein sequences (GPx7 and GPx8, respectively) are present in the Spermophilus dauricus genome, on the forward strand. The following tables show the GPx7 and GPx8 exon distributions:
exon  exon start  exon end  exon length
1     272935      274275    1340

exon  exon start  exon end  exon length
1     621955      622158    203
2     621125      621386    261
3     618180      618340    160
Iodothyronine deiodinase family:
Iodothyronine deiodinases are a subfamily of deiodinase enzymes important in the activation and deactivation of thyroid hormones.
This protein family appears to be well conserved in vertebrates, including mammals. There are three different deiodinase isotypes in mammals (DIO1-3). All of them are transmembrane proteins with a thioredoxin domain, and they are involved in regulating the activation and the inactivation of thyroid hormone by reductive deiodination.
Iodothyronine deiodinase 1 (DIO1):
It activates thyroid hormone by converting the prohormone thyroxine (T4) by outer ring deiodination (ORD) to bioactive 3,3-prime,5-triiodothyronine (T3). It also degrades both hormones by inner ring deiodination (24).
The prediction was performed using the Ictidomys tridecemlineatus DIO1 query. Exonerate, Genewise and Seblastian all predicted the same protein, which aligned perfectly with the chosen query. A SECIS element was also found in the 3’UTR. Therefore, we conclude that Spermophilus dauricus has the DIO1 selenoprotein. The sequence is located on the reverse strand.
The following table shows the DIO1 exon distribution:
exon  exon start  exon end  exon length
1     1867632     1867968   336
2     1859780     1859923   143
3     1854544     1854743   199
4     1851665     1851730   65
Iodothyronine deiodinase 2 (DIO2):
This isoform also activates the thyroid hormone. It is highly expressed in the thyroid, and may contribute significantly to the relative increase in thyroidal T3 production in patients with Graves disease and thyroid adenomas. (25)
The prediction was performed using the Ictidomys tridecemlineatus DIO2 query. Both Exonerate and Genewise predicted the same protein, which aligned perfectly with the chosen query. Seblastian did not find any selenoprotein in the contig sequence. This could be because DIO2 is not in its database, a hypothesis supported by the fact that in previous years DIO2 could not be found using Seblastian either. A SECIS element was found in the 3’UTR. Therefore, we conclude that Spermophilus dauricus has the DIO2 selenoprotein. The sequence is located on the forward strand.
The following table shows the DIO2 exon distribution:
exon  exon start  exon end  exon length
1     1873698     1873919   221
2     1882655     1883227   572
Iodothyronine deiodinase 3 (DIO3):
Unlike the previous isoforms, DIO3 catalyzes the inactivation of thyroid hormone by inner ring deiodination of the prohormone thyroxine (T4) and the bioactive hormone 3,3-prime,5-triiodothyronine (T3) to inactive metabolites, 3,3-prime,5-prime-triiodothyronine (RT3) and 3,3-prime-diiodothyronine (T2), respectively. This enzyme is highly expressed in the pregnant uterus, placenta, fetal and neonatal tissues, suggesting that it plays an essential role in the regulation of thyroid hormone inactivation during embryological development. (26)
Since the squirrel query did not start with Met, the prediction was performed using the Homo sapiens DIO3 query. Exonerate, Genewise and Seblastian all predicted the same protein, which aligned perfectly with the chosen query. A SECIS element was also found in the 3’UTR. Therefore, we conclude that Spermophilus dauricus has the DIO3 selenoprotein encoded in its genome. The sequence is located on the reverse strand.
The following table shows the DIO3 exon distribution:
exon  exon start  exon end  exon length
1     550733      551560    827
Sel family:
Selenoprotein 15 (Sel15):
This selenoprotein is encoded by the 15 kDa selenoprotein gene. Previous studies suggest that it is involved in regulating protein folding and redox processes. However, its general functions remain unclear (17).
Sel15 was predicted by both Exonerate and Genewise, using as reference the query SPP00001196 of Ictidomys tridecemlineatus, as it showed higher homology values than the Homo sapiens query. The high identity and low e-value indicated a good alignment between the two sequences. Selenoprotein 15 has a selenocysteine in its sequence, and a SECIS element was found in the 3’UTR. Although Seblastian could not predict a selenoprotein, we concluded that this protein sequence exists in the Spermophilus dauricus genome. The sequence is located on the reverse strand.
exon  exon start  exon end  exon length
1     505445      505519    74
2     502968      503135    167
3     481920      481983    63
4     473137      473186    49
5     468805      468933    128
Selenoprotein H (SelH):
SelH is located in the nucleus, where it binds to sequences that contain heat shock and stress response elements. This fact suggests that this protein has an antioxidant role.
A complete alignment was found between the squirrel query SPP00001200 and the genome of Spermophilus dauricus. The sequence contains a selenocysteine residue. However, neither a SECIS element nor a selenoprotein was predicted. As there is no supporting evidence from Seblastian, it is not possible to conclude that Spermophilus dauricus has SelH in its genome. The sequence is located on the reverse strand.
exon  exon start  exon end  exon length
1     272241      272359    118
2     272001      272146    145
3     271770      271867    97
Selenoprotein I (SelI):
This protein is involved in the formation and maintenance of vesicular membranes, regulation of lipid metabolism, and protein folding.
The squirrel query SPP00001201 aligned perfectly with a sequence in the Spermophilus dauricus genome. However, both the Exonerate and Genewise outputs showed a gap at the beginning of the sequence. Seblastian gave positive results for both the SECIS element and the selenoprotein prediction, and a selenocysteine was found in the sequence. We conclude that Selenoprotein I is present in the genome of our species. The sequence is located on the reverse strand.
exon  exon start  exon end  exon length
1     741059      741130    71
2     740388      740496    108
3     738027      738101    74
4     732270      732532    262
5     731088      731196    108
6     724475      724523    48
7     722626      722806    180
8     721021      721203    182
9     718582      718677    95
Selenoprotein K (SelK):
Selenoprotein K is a transmembrane protein located in the endoplasmic reticulum. It participates in the degradation of misfolded glycosylated proteins (29).
Apart from a small gap at the beginning of the alignment, a perfect alignment was found between the squirrel query SPP00001199 and the Exonerate prediction. However, the Genewise results show a loss of the selenocysteine residue, which can be explained as an error of this program. The sequence in our species is located on the forward strand. A selenocysteine was identified in the sequence, a SECIS element was found in the 3’UTR and a selenoprotein was predicted by Seblastian. We concluded that Selenoprotein K is present in the Spermophilus dauricus genome.
exon  exon start  exon end  exon length
1     8181        8363      182
Selenoprotein M (SelM):
This protein is found in the nucleus of cells, especially in brain areas, but its functions remain unclear.
As in previous analyses, the alignment between the squirrel query (SPP00001202) and the genome sequence is perfect, apart from a gap at the beginning. Because of this gap, the first part of the query is not aligned with the Spermophilus dauricus sequence, which is relevant because the Sec residue of the query falls within the gap. For this reason, there is a Sec residue in the query but not in the predicted Spermophilus dauricus sequence. A SECIS element was found in the 3’UTR, but Seblastian did not predict a selenoprotein.
In this case, the gap did not allow us to conclude that Selenoprotein M is present in the Spermophilus dauricus genome.
exon  exon start  exon end  exon length
1     293540      293577    37
2     293913      293991    78
3     294070      294225    155
Selenoprotein N (SelN):
No studies of this protein have been done yet and therefore its functions remain unknown.
Query SPP00001203 of squirrel did not start with Met. Despite that, it was used as the reference because it had higher homology values than the human one. Selenoprotein N was predicted in Spermophilus dauricus, specifically on the forward strand. This conclusion is based on the good alignment between the two sequences with both Exonerate and Genewise, the presence of the Sec residue, the SECIS element found and the selenoprotein predicted by Seblastian.
exon  exon start  exon end  exon length
1     382473      382596    123
2     385659      385796    137
3     394586      394739    153
4     396291      396372    81
5     396520      396708    188
6     397025      397130    105
7     398161      398273    112
8     398358      398459    101
9     400075      400236    161
Selenoprotein O (SelO):
No information related to this protein could be found in prior literature, as its functions remain unknown.
In this case, an alignment could only be found using the squirrel query SPP00001204 as reference, and not with the human selenoproteome. Moreover, this alignment was not significant, as there was homology only at the beginning of the sequences, and the Sec residue of the query is not included in that region. Therefore, no Sec residue was found in the genome of Spermophilus dauricus. In addition, no SECIS elements were found and Seblastian did not predict a selenoprotein.
In conclusion, with this method it was not possible to identify Selenoprotein O in the Spermophilus dauricus selenoproteome.
Selenoprotein P (SelP):
Selenoprotein P is a secreted glycoprotein. As it contains most of the selenium in plasma, its function is related to the extracellular antioxidant defense properties of selenium and to the transport of this element (33).
It was not possible to identify this protein in the genome of our species using squirrel as reference, as an important part of the query was missing. However, the selenoprotein was predicted when the human selenoproteome was used, specifically the query SPP00000020. We conclude that SelP is present in the Spermophilus dauricus genome because a perfect alignment, including the Sec residues, was found between the two sequences. This protein contains 10 Sec residues in Homo sapiens, and the same is found in Spermophilus dauricus. In addition, a SECIS element was found in KZ296009.1. The sequence is located on the forward strand.
exon  exon start  exon end  exon length
1     179151      179353    202
2     177974      178186    212
3     175815      175932    117
4     172909      173505    596
Selenoprotein S (SelS):
This protein is involved in the degradation process of misfolded endoplasmic reticulum (ER) luminal proteins. It participates in the transfer of misfolded proteins from the ER to the cytosol, where they are destroyed by the proteasome in a ubiquitin-dependent manner. It may regulate cytokine production, and thus play a key role in the control of the inflammatory response. (37)
The prediction of SelS was performed using the Homo sapiens query (SPP0000024). Both Exonerate and Genewise predicted the same protein, which aligned well with the chosen query. However, the residue aligned with Sec is a Thr, i.e. Sec appears to have been replaced by Thr. A SECIS element was found in the 3’UTR. Therefore, we conclude that Spermophilus dauricus may have the SelS selenoprotein, although the protein is perhaps mutated and not functional in this species. The sequence is located on the forward strand.
The following table shows the SelS exon distribution:
exon  exon start  exon end  exon length
1     2902410     2902440   30
2     2901674     2901808   134
3     2899578     2899684   106
4     2894732     2894821   89
5     2894453     2894531   78
6     2892087     2892163   76
Selenoprotein T (SelT):
Although its functions remain unknown, it is related to oxidoreductase activity and to the protection of dopaminergic neurons against oxidative stress and cell death (10).
The annotated squirrel SelT protein does not start with Met, whereas the human one does. For this reason, we took the query SPP00000025 from the human selenoproteome to predict this selenoprotein in our species. The alignment between them is almost perfect, and the predicted sequence shows the Sec residue with both Exonerate and Genewise. A SECIS element was also found. Taking all these facts into account, we concluded that Selenoprotein T is present.
exon  exon start  exon end  exon length
1     9854        10441     587
Selenoprotein V (SelV):
A query for this selenoprotein was found in the human selenoproteome, but not in those of squirrel or mouse. Furthermore, alignments between the human query and the genome of Spermophilus dauricus gave no significant results. These reasons led us to conclude that this protein does not exist in the Spermophilus dauricus selenoproteome, consistent with the squirrel and mouse genomes.
SelV is known to be expressed in testes. However, the functions of this selenoprotein remain unknown, so it has not been possible to determine the effects of the loss of this protein in the rodent species mentioned above.
SelU family:
The SelU family is widely present in eukaryotes as a selenoprotein, but also as a Cys-containing homologue. This is the case in mammals, where the Sec residue is replaced with Cys (38).
Selenoproteins U1, U2 and U3 were found using the human selenoproteome as reference. The human queries aligned very well with sequences of our species, except for selenoprotein U2, where the alignment is perfect but there is a gap at the beginning. No SECIS elements or Seblastian predictions were found for any of the 3 proteins.
Selenoprotein U1 (SelU1):
The human query SPP00000026 allowed us to find protein U1 by homology in the Spermophilus dauricus genome.
exon  exon start  exon end  exon length
1     1802028     1802714   686
Selenoprotein U2 (SelU2):
The human query SPP00000027 allowed us to find protein U2 by homology in the Spermophilus dauricus genome.
exon  exon start  exon end  exon length
1     1802028     1802714   686
Selenoprotein U3 (SelU3):
The human query SPP00000028 allowed us to find protein U3 by homology in the Spermophilus dauricus genome.
exon  exon start  exon end  exon length
1     272935      274275    1340
Methionine sulfoxide reductase B family:
This zinc-containing family of proteins shows structural differences, but its members have complementary functions, as they are involved in the enzymatic conversion of methionine sulfoxide to methionine. MSRB1 is the most abundant MSRB in mammals (34,35,36).
The proteins named MSRB in Ictidomys tridecemlineatus are called Selenoproteins R in humans.
Regarding the results obtained with both Exonerate and Genewise, all of them show a gap at the beginning of the alignment, resulting in proteins that do not start with Met. These unexpected results may be due to errors in our use of the software or to errors in the original queries.
Selenoprotein MSRB1 (SelR1):
This selenoprotein was predicted in Spermophilus dauricus using the human selenoproteome as reference, as we did not find any significant homology when squirrel was used. The human query SPP00000021, which corresponds to selenoprotein R1, aligned almost perfectly (apart from the gap at the beginning) with a sequence located on the forward strand of the Spermophilus dauricus genome. The prediction was supported by a SECIS element found in the 3’UTR.
exon  exon start  exon end  exon length
1     13578       13599     21
2     14935       15083     148
3     15392       15526     134
Selenoprotein MSRB2 (SelR2):
MSRB2 in rodents and its homologue in humans (Selenoprotein R2) are not selenoproteins as they are Cys-containing homologous proteins.
As in the previous case, the annotated squirrel MSRB2 protein does not start with Met, whereas the human one does. For this reason, the human protein was used as reference to identify homologies in the Spermophilus genome (query: SPP00000022). Although the alignment between the human query and Spermophilus is not as good as in previous cases, we considered it significant enough to predict this protein in the genome of our species, on the forward strand. No SECIS elements were found.
exon  exon start  exon end  exon length
1     2004773     2004859   86
2     2000142     2000218   76
3     1991238     1991385   147
4     1990101     1990202   101
Selenoprotein MSRB3 (SelR3):
In this case, we have been able to use the MSRB3 of squirrel (query SPP00001210) as a reference to predict the protein in our species because the query starts with Met. Apart from a GAP at the beginning, the alignment between the two sequences is perfect. We concluded that this Cys-containing homologous protein could be predicted in Spermophilus dauricus in the forward strand. A SECIS element was found in the 3’ UTR.
exon  exon start  exon end  exon length
1     5048        5253      205
2     3490        3679      189
WTH selenoproteins family:
The SelWTH family possesses a thioredoxin-like fold, suggesting a redox function as a glutathione (GSH)-dependent antioxidant. This protein is highly expressed in skeletal muscle, heart and brain. An important paralog of this gene is SELENOV (http://www.genecards.org/cgi-bin/carddisp.pl?gene=SELENOV).
According to SelenoDB 2.0, the squirrel genome has two unclassified SelW proteins. One of them does not start with Met, which may mean that it is badly annotated. No good hits were obtained with the other one. For human, on the other hand, there are two annotated SelW proteins in SelenoDB 1.0: SelW1, which is a selenoprotein, and SelW2, which is a Cys-containing homologue. Therefore, all SelW predictions were performed using the Homo sapiens queries.
Selenoprotein W1 (SelW1):
Both Exonerate and Genewise predicted the same protein on the forward strand, even though it lacks the Sec residue and the N-terminus. A SECIS element was found in the 3’UTR, but Seblastian failed to predict a selenoprotein in our contig of interest. Therefore, we cannot conclude that Spermophilus dauricus has SelW1. It would be necessary to elucidate whether the gap at the N-terminus is due to a sequencing error, which would mean that the selenoprotein might exist, or to an evolutionary process in which Spermophilus dauricus lost that part of the sequence.
exon  exon start  exon end  exon length
1     272935      274275    1340
Selenoprotein W2 (SelW2):
Both Exonerate and Genewise predicted the same protein on the reverse strand using the human query SPP00000031. It aligned very well with the human Cys-homologue query. No SECIS elements were found and Seblastian did not predict any selenoprotein. Therefore, we conclude that the Cys-homologue SelW2 is present in the Spermophilus dauricus genome.
The following table shows the SelW2 exon distribution:
exon  exon start  exon end  exon length
1     1270485     1270573   88
2     1270275     1270372   97
3     1269758     1269834   76
4     1269595     1269675   80
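The exon tables in this section can be sanity-checked with a short script. The sketch below recomputes the exon lengths as end minus start, which is how the values in the tables are obtained, and guesses the strand from the order of the exon coordinates. The strand rule is an assumption made for illustration (exon 1 is taken to be the 5'-most exon and coordinates are taken to be on the forward genomic axis), and the function name `exon_summary` is hypothetical.

```python
def exon_summary(exons):
    """Return exon lengths and a strand guess for a list of (start, end) pairs.

    exons must be given in exon order (exon 1 first).
    """
    lengths = [end - start for start, end in exons]
    starts = [start for start, _ in exons]
    pairs = list(zip(starts, starts[1:]))
    if not pairs:
        strand = "undetermined"  # a single exon carries no order information
    elif all(a < b for a, b in pairs):
        strand = "forward"       # coordinates increase along the transcript
    elif all(a > b for a, b in pairs):
        strand = "reverse"       # coordinates decrease along the transcript
    else:
        strand = "mixed"
    return lengths, strand


# SelW2 exons as listed in the table above.
selw2 = [(1270485, 1270573), (1270275, 1270372), (1269758, 1269834), (1269595, 1269675)]
print(exon_summary(selw2))  # ([88, 97, 76, 80], 'reverse')
```

For SelW2 the recomputed lengths match the table, and the decreasing coordinates agree with the reverse strand reported by Exonerate and Genewise.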
Thioredoxin reductase family:
This selenoprotein family is involved in protecting mitochondria from reactive oxygen species, so it is important in the defense against oxidative stress and redox homeostasis.
Thioredoxin reductase 1 (TR1):
TR1 could be predicted using the query SPP00000034 of the human selenoproteome, because the alignment obtained was significant. Both Exonerate and Genewise identified a sequence in the genome of Spermophilus dauricus that shows significant homology with the human query and contains a Sec residue. A SECIS element was also found. We concluded that Spermophilus dauricus contains TR1 in its selenoproteome.
exon  exon start  exon end  exon length
1     1396229     1396315   86
2     1394477     1394549   72
3     1392878     1392997   119
4     1387988     1388130   142
5     1387496     1387611   115
6     1385401     1385626   225
7     1376344     1376436   92
8     1375716     1375792   76
9     1374942     1375098   156
10    1369784     1369891   107
11    1366936     1367031   95
12    1362878     1363012   134
13    1350912     1350977   65
Thioredoxin reductase 2 (TR2):
As in the previous case, a very good alignment was obtained between the human query SPP00000035 and a sequence located in the genome of our species. The Sec residue appears in the Exonerate analysis but not in the Genewise one, due to a small gap at the end of the sequence. However, the fact that a SECIS element was found allowed us to conclude that Spermophilus dauricus contains TR2 in its selenoproteome.
exon  exon start  exon end  exon length
1     2341097     2341172   75
2     2334159     2334215   56
3     2333550     2333694   144
4     2332804     2332878   74
5     2330902     2330980   78
6     2330221     2330283   62
7     2325588     2325658   70
8     2314661     2314680   19
9     2313796     2313887   91
10    2313148     2313322   174
11    2299303     2299439   136
12    2295802     2295897   95
13    2295358     2295450   92
14    2293350     2293421   71
15    2293058     2293155   97
16    2289215     2289341   126
Thioredoxin reductase 3 (TR3):
The squirrel selenoproteome was used to predict TR3 in Spermophilus dauricus, as large gaps obtained with the human query made the human-Spermophilus comparison impossible. A sequence with a Sec was found. However, neither Exonerate nor Genewise showed a significant alignment. A SECIS element was found, but with a low score, and Seblastian did not predict a selenoprotein. With this information, it was not possible to conclude that TR3 is present in the genome of Spermophilus dauricus.
exon  exon start  exon end  exon length
1     1391156     1391843   687
Machinery genes:
SECIS binding protein 2 (SBP2):
SBP2 is an essential protein involved in selenoprotein synthesis. After the recognition of the SECIS element, SBP2 can interact with the ribosome and the translation elongation factor, facilitating the translational process (5).
In this case, the prediction was performed using the Homo sapiens query (SPP0000037). Both Exonerate and Genewise predicted the same protein, with the same gap at the beginning of the alignment with the chosen query. As this is a machinery protein, no Sec amino acid was found in the predicted sequence. As expected, no SECIS element was found in the 3’UTR and Seblastian made no prediction. Therefore, we conclude that Spermophilus dauricus may have the SBP2 protein. The following table shows the SBP2 exon distribution:
exon  exon start  exon end  exon length
1     219984      220129    145
2     216375      216624    249
3     215687      215834    147
4     214088      214296    208
5     212118      212223    105
6     210287      210492    205
7     207642      207764    122
8     206082      206171    89
9     204697      204829    132
10    195819      195985    166
11    195050      195179    129
12    193073      193229    156
13    192096      192316    220
14    185975      186129    154
15    185416      185608    192
16    184822      184922    100
Phosphoseryl-tRNA kinase (PSTK):
This enzyme participates in selenoprotein biosynthesis and regulation. It specifically catalyzes the formation of phosphoseryl-tRNA(Ser)Sec (47).
In this case, the prediction was performed using the squirrel query (SPP00001222). Both Exonerate and Genewise predicted the same protein, with the same gap at the beginning of the alignment with the chosen query. As this is a machinery protein, no Sec amino acid was found in the predicted sequence. As expected, no SECIS element was found in the 3’UTR and Seblastian made no prediction. Therefore, we conclude that Spermophilus dauricus may have the PSTK protein. The following table shows the PSTK exon distribution:
exon  exon start  exon end  exon length
1     48692       48983     291
2     48256       48454     198
3     46603       46681     78
4     46000       46095     95
Eukaryotic elongation factor (eEFSec):
This protein is an essential factor for selenoprotein synthesis, as it is involved in the elongation process during translation (50).
In this case, the prediction was performed using the squirrel query (SPP00001221). Neither Exonerate nor Genewise produced a good alignment between the two sequences in KZ294859.1. As this is a machinery protein, no Sec amino acid was found in the predicted sequence. Surprisingly, a SECIS element was found in the 3’UTR, although, as expected, Seblastian made no prediction. Therefore, we conclude that Spermophilus dauricus does not have the eEFSec protein. The following table shows the eEFSec exon distribution:
exon  exon start  exon end  exon length
1     518883      519422    539

Methionine sulfoxide reductase A (MsrA):
This protein catalyses the enzymatic reduction of methionine sulfoxide to methionine. It is also involved in the repair of oxidatively damaged proteins, restoring their biological activity (46).
The MsrA prediction was performed using the Homo sapiens query (SPP0000012). Exonerate predicted an alignment with a gap at the beginning and another one at the end. Genewise predicted the same protein, but without the gap at the end of the alignment. As this is a machinery protein, no Sec residue was found in the predicted sequence. However, a cysteine homologue was found when the human query was used for the prediction. A SECIS element was found in the 3’UTR, although, as expected, Seblastian gave no prediction. Therefore, we conclude that Spermophilus dauricus has the MsrA protein in its genome. The following table shows the MsrA exon distribution:
exon  exon start  exon end  exon length
1     2650453     2650522   69
2     2702778     2702897   119
3     2784355     2784459   104
4     2807500     2807606   106
5     2903897     2904058   161

Selenophosphate synthetase 1 (SEPHS1):
The SEPHS1 prediction was performed using the squirrel query (SPP00001193). Both Exonerate and Genewise predicted an alignment with a high level of homology. The two programs predicted the same protein, but the position that would hold a Sec residue is occupied by a Thr. As this is a machinery protein, Seblastian gave no prediction. A SECIS element was found in the 3’UTR. Therefore, we conclude that Spermophilus dauricus has the SEPHS1 protein in its genome. The following table shows the SEPHS1 exon distribution:
exon  exon start  exon end  exon length
1     147565      147757    192
2     142260      142363    103
3     140026      140133    107
4     138773      138924    151
5     135141      135231    90
6     132124      132648    524

Selenophosphate synthetase 2 (SEPHS2):
The prediction was performed using the human SEPHS2 query SPP00000064. Exonerate, Genewise and Seblastian all predicted the same protein, which aligned perfectly with the chosen query except for a gap at the beginning of the sequence. A SECIS element was also found in the 3’UTR. Therefore, we conclude that Spermophilus dauricus has the SEPHS2 selenoprotein, which is a machinery protein. The sequence is located on the reverse strand. The following table shows the SEPHS2 exon distribution:
exon  exon start  exon end  exon length
1     147623      147799    176

tRNA Sec 1 associated protein 1 (SECp43):
SECp43 is part of the machinery responsible for selenoprotein formation and regulation. Specifically, this protein is involved in the control of selenoprotein expression (45).
The prediction of SECp43 was performed using the squirrel query (SPP00001223). Both Exonerate and Genewise predicted the same protein, which aligned well with the chosen query. As this is a machinery protein, no Sec residue was found in the predicted sequence. As expected, no SECIS element was found in the 3’UTR and Seblastian gave no prediction. Therefore, we conclude that Spermophilus dauricus may have the SECp43 protein. The following table shows the SECp43 exon distribution:
exon  exon start  exon end  exon length
1     4491504     4491601   97
2     4496805     4496904   99
3     4497526     4497578   52
4     4500206     4500337   131
5     4501748     4501867   119
6     4507956     4508118   162
7     4508534     4508567   33
8     4510919     4511025   106

Selenocysteine synthase (SecS):
This enzyme catalyses one step of the three-step process by which selenocysteinyl-tRNA is made. Specifically, it participates in the conversion of O-phosphoseryl-tRNA(Sec) to selenocysteinyl-tRNA(Sec) (48,49).
Following the same approach as for PSTK, but using the human query SPP00000065, we obtained an alignment with both Exonerate and Genewise. The results for this protein are very similar to those found for PSTK.
exon  exon start  exon end  exon length
1     440028      440182    154
2     438000      438118    118
3     437288      437446    158
4     436303      436456    153
5     431144      431246    102
6     422182      422311    129
7     421957      422048    91
8     410324      410417    93
9     408688      408778    90
10    407009      407312    303
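
The exon tables in this section can be sanity-checked with a few lines of code. The sketch below is only an illustration, not part of the original pipeline: it takes the SecS rows as listed above, sorts them by exon number, and checks that each reported length equals end minus start. The function names `check_exons` and `guess_strand` are hypothetical, and the strand guess is a simple heuristic (start coordinates that decrease with exon number suggest the reverse strand), stated here as an assumption rather than a result of the analysis.

```python
# Rows copied from the SecS exon table: (exon number, start, end, reported length).
SECS_EXONS = [
    (1, 440028, 440182, 154),
    (10, 407009, 407312, 303),
    (2, 438000, 438118, 118),
    (3, 437288, 437446, 158),
    (4, 436303, 436456, 153),
    (5, 431144, 431246, 102),
    (6, 422182, 422311, 129),
    (7, 421957, 422048, 91),
    (8, 410324, 410417, 93),
    (9, 408688, 408778, 90),
]


def check_exons(rows):
    """Sort exons by number and list rows whose length differs from end - start."""
    ordered = sorted(rows, key=lambda r: r[0])
    mismatches = [r for r in ordered if r[2] - r[1] != r[3]]
    return ordered, mismatches


def guess_strand(ordered):
    """Heuristic (assumption): decreasing starts suggest '-', increasing suggest '+'."""
    starts = [r[1] for r in ordered]
    pairs = list(zip(starts, starts[1:]))
    if pairs and all(a > b for a, b in pairs):
        return "-"
    if pairs and all(a < b for a, b in pairs):
        return "+"
    return "?"  # single exon or mixed order: no guess


if __name__ == "__main__":
    ordered, mismatches = check_exons(SECS_EXONS)
    print("rows with inconsistent length:", mismatches)
    print("strand guess:", guess_strand(ordered))
```

The same two functions can be applied to any of the other exon tables by replacing the row list; a single-exon table such as the one for eEFSec yields no strand guess.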