DISCUSSION
We characterized the selenoproteins of Thunnus orientalis (tuna) by analyzing their homology with Danio rerio (zebrafish) and Homo sapiens (human) proteins. Because zebrafish is phylogenetically closer to tuna than human is, we based our analysis mainly on zebrafish rather than on human proteins. Zebrafish has the largest selenoproteome found in bony fishes, with a maximum of 48 selenoproteins (extracted from the SelenoDB and UniProt databases).
The largest and best-studied selenoprotein families are the glutathione peroxidases (GPx), iodothyronine deiodinases (DI) and thioredoxin reductases (TR), with 5, 3 and 3 Sec-containing genes in the human genome, respectively. However, the function of half of the mammalian selenoproteins is still unknown [6]. Selenoproteins with partially characterized biological functions (e.g. SelH, SelI, SelM, SelN, SelS, SelT and SelW) or unknown functions (e.g. SelK and SelO) have been less well studied. Nevertheless, most of them are likely redox enzymes, as their active sites contain Sec, which thus far has always been associated with redox functions [14].
After the split from fishes, several selenoproteins were lost in the vertebrate lineages that colonized the terrestrial environment. This is consistent with the idea that mammals reduced their use of Sec compared with fishes. Twenty-one selenoproteins are found in all vertebrates, whereas the others are found only in certain lineages. For example, selenoproteins SelL, SelJ and Fep15 are today found only in fishes [6].
Several selenoprotein genes are duplicated in all bony fish genomes investigated so far. This reflects a dynamic process in which new selenoprotein genes were generated by duplication, while others were lost or replaced their Sec with Cys during evolution. These duplications generated selenoproteins GPx1b, GPx3b, GPx4b, DI3b, SelT2, SelU1b and SelU1c. Additionally, some gene duplications are observed only in specific lineages. In zebrafish, there are additional copies of SelO, SelT1 and SelW2, named SelO2, SelT1b and SelW2b, respectively [6].
Below, we describe the most notable findings in our results on the tuna selenoproteome in light of the literature consulted, and we provide our interpretation of the detected matches, as well as other relevant features. However, for a large group of selenoproteins, mainly those named Sel, we could not compare our results with published information, as most of them have not been fully studied.
Selenoproteins in Thunnus orientalis
Iodothyronine deiodinases (DI)
Iodothyronine deiodinases (DI) regulate the activation and inactivation of thyroid hormones. All DI selenoproteins show intrafamily homology. Three DI enzymes are known in mammals, all of which contain Sec: DI1, DI2 and DI3. The DI3 protein is duplicated in all bony fishes, resulting in DI3a and DI3b [6]. Although the latter proteins are only found in bony fishes, they were found to be poorly annotated in zebrafish. For this reason, we decided to search for them using the human DI3 selenoprotein as the BLAST query. All detected mammalian DI3 genes are intronless [6]. In bony fishes, the only one that has introns is DI3a, as shown in the corresponding exonerate file of our results.
Moreover, an interesting feature of the mammalian DI2 gene is that its mRNA has a second in-frame UGA codon [6]. However, we did not find this feature in tuna in our results.
The characteristics of the proteins from this family found in our study correspond with those described for all DI proteins in bony fishes, with the exception of DI3a, for which no SECIS element was observed. This absence may be caused by the high fragmentation of our genome, which made it difficult to find these specific sequences. To mitigate this problem, we systematically checked three contiguous contigs beyond the end of each protein in order to find its SECIS element(s). Nevertheless, for this selenoprotein the SECIS element could not be found, perhaps because it is located in a more distant contig.
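The contig walk described above can be sketched as follows. This is only an illustration: it assumes that consecutive accession numbers correspond to neighbouring contigs in the assembly, which the text does not state, and the function name `downstream_contigs` is hypothetical. The direction of the walk would also depend on the strand of the hit, which this sketch ignores.

```python
import re

def downstream_contigs(accession, n=3):
    """Return the n accession IDs that numerically follow `accession`.

    Assumption (not confirmed by the text): consecutive accession
    numbers are neighbouring contigs in the assembly.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)\.(\d+)", accession)
    if match is None:
        raise ValueError(f"unrecognised accession: {accession}")
    prefix, digits, version = match.groups()
    width = len(digits)  # preserve the zero padding of the numeric part
    start = int(digits)
    return [f"{prefix}{start + i:0{width}d}.{version}" for i in range(1, n + 1)]

# Contigs that would be scanned for a SECIS element after a hit on this contig
print(downstream_contigs("BADN01109579.1"))
```

Each returned contig would then be passed to whichever SECIS prediction tool is in use.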
Our predictions were confirmed by the Selenoprofiles output, in which every protein was matched to the same contigs as in our program, including all Sec residues found in their sequences.
Glutathione peroxidase (GPx) is the general name of an enzyme family with peroxidase activity whose main biological role is to protect the organism from oxidative damage. GPx is the largest selenoprotein family in vertebrates. Mammals have eight GPx homologues, five of which are selenoproteins (GPx1-4 and GPx6). The rest of GPx currently described have evolved from the ancestors just mentioned [6] [15].
It appears that the Cys-containing GPx7 and GPx8 evolved from a GPx4-like selenoprotein ancestor, and that this happened prior to the separation of mammals and fishes [6]. The high similarity between GPx7 and GPx8 was also confirmed in our results by comparing the best-e-value hits in the blast files of these proteins. The best hit for GPx7 corresponded to the second best hit for GPx8, and vice versa. Moreover, in the blast file we also observed a significant, although less consistent, similarity between these proteins and GPx4, which supports the statement above.
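The cross-check between best and second-best hits can be expressed as a short script. The sketch below assumes BLAST tabular output (the standard 12-column `-outfmt 6` layout, with the subject in column 2 and the e-value in column 11); the contig names and e-values are invented for illustration and are not taken from our results.

```python
from collections import defaultdict

def ranked_subjects(blast_tab):
    """Map each query to its distinct subject contigs, best e-value first."""
    best = defaultdict(dict)
    for line in blast_tab.strip().splitlines():
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        previous = best[query].get(subject)
        if previous is None or evalue < previous:
            best[query][subject] = evalue  # keep the lowest e-value per contig
    return {q: sorted(s, key=s.get) for q, s in best.items()}

def cross_matched(ranking, a, b):
    """True if the best contig of a is the second-best contig of b, and vice versa."""
    ra, rb = ranking.get(a, []), ranking.get(b, [])
    if len(ra) < 2 or len(rb) < 2:
        return False
    return ra[0] == rb[1] and rb[0] == ra[1]

def as_tab(query, subject, evalue):
    # Fill the columns not used here with placeholder values
    return "\t".join([query, subject] + ["0"] * 8 + [str(evalue), "0"])

# Hypothetical hits, for illustration only
rows = [
    ("GPx7", "contig_A", 1e-40),
    ("GPx7", "contig_B", 1e-15),
    ("GPx8", "contig_B", 1e-38),
    ("GPx8", "contig_A", 1e-12),
]
ranking = ranked_subjects("\n".join(as_tab(*r) for r in rows))
print(cross_matched(ranking, "GPx7", "GPx8"))
```

Note that the best hit is the one with the lowest e-value, which is why the contigs are sorted in ascending order.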
In bony fishes, three GPx duplications are described, generating GPx1b, GPx3b and GPx4b, probably as a result of the duplications commonly observed in fish genomes. GPx5 and GPx6 are the most recently evolved GPxs, and appear to be the result of a tandem duplication of GPx3 at the root of placental mammals [6]. To confirm the absence of these two selenoproteins, we searched the tuna genome with human GPx5 and GPx6 as queries. Although we found some hits, all of them corresponded to other homologous GPxs, which reflects the high intrafamily similarity of these selenoproteins.
We could not detect any SECIS element in GPx3, GPx7 or GPx8. In the case of GPx3, we attributed this result to the high fragmentation of the tuna genome assembly, which made it difficult to follow the continuity of the sequence in a specific region. In the other two GPxs, we observed that the protein sequences did not contain any Sec, indicating that the SECIS element might have been lost during evolution.
When analyzing the Selenoprofiles results, we noticed that the program predicted every selenoprotein found in the previous analyses, except for a fragment of GPx8. Nevertheless, this fragment had already been predicted by our program, as shown in its t-coffee file.
Fep15 (fish 15kDa selenoprotein) is a recently discovered selenoprotein which is found to be distantly related to members of the 15 kDa selenoprotein (Sep15) family. It is absent in mammals and can be detected only in fishes, being present in these organisms only in the selenoprotein form [16]. However, it has recently been found in other species such as cartilaginous fishes and frogs. This fact implies that Fep15 formed part of the ancestral vertebrate selenoproteome and was lost prior to the split of the reptiles [6]. Moreover, it appears that Fep15 evolved by duplication of SelM (a selenoprotein that shares 31% sequence identity with Sep15) in animals, most likely in fish, followed by mutations that resulted in the loss of Cys in the region upstream of the Sec [6][16].
In our results, we were able to observe this relationship between Fep15 and SelM by raising the e-value threshold in our program to 1. Both the Fep15 and the SelM blast files show the homology between these proteins: the second best hit for Fep15 corresponded to SelM, and the second best hit for SelM corresponded to Fep15. We can also observe the loss of Cys in the Fep15 exonerate file compared with the SelM exonerate file, in which Sec is present in contig BADN01035957.1.
Using our program we were able to predict this protein in the tuna genome and, although its sequence was incomplete, we could observe the Sec and its SECIS element. This result was also confirmed by the Selenoprofiles analysis.
Methionine sulfoxide reductases (Msr) are thiol-dependent enzymes that catalyze the conversion of methionine sulfoxide to methionine. Three Msr families are known so far (MsrA, MsrB and fRMsr) [17]. The use of Sec in MsrA and MsrB is rare from an evolutionary perspective, as the vast majority of these enzymes are Cys-containing proteins. However, MsrA exists in a selenoprotein form in some lower organisms, such as green algae and bacteria, in which it uses a catalytic Sec in place of Cys under certain environmental conditions [17].
When looking at our results, we observed that our predicted MsrA proteins did not contain any Sec residue, which is consistent with the literature consulted. In line with this lack of Sec, we did not find any SECIS elements in these genes. Moreover, the Selenoprofiles results confirmed that Cys replaces Sec in both proteins.
When analyzing the Sel15 protein, we observed that it was associated with two different contigs, as indicated in the results table. The first contig showed some problems in the gene structure prediction with exonerate and genewise. We suspected that this was due to the small size of this particular contig, which did not allow the programs to successfully predict the exons of the protein. This did not happen with the second contig associated with this protein.
To solve this problem, we obtained the predicted protein directly from the blast file, assuming that the whole sequence was a single exon, and we ran t-coffee manually. By analyzing this file together with the t-coffee file obtained from the other contig, we could predict the whole protein, including the Sec residue. In the blast file we observed hits from other contigs, but because of their low quality we could not extract any results from them.
SelI is one of the most recently discovered selenoproteins. It contains a CDP-alcohol phosphatidyltransferase domain, which is typically found in the CHPT1 and CEPT1 proteins, making these proteins similar to SelI. The most prominent difference between SelI and its homologs is a C-terminal extension in SelI, which contains the Sec residue, as shown in our query obtained from the human genome. The function of this extension is still unknown [6].
We observed two contigs in the blast file. The first one was associated with the SelI protein, and we hypothesized that the other could be related to CEPT1. To test this hypothesis, we searched the tuna genome with CEPT1 as the query. The results showed that the contig with the second best e-value in the SelI search corresponded to the CEPT1 protein.
Previous studies indicate that SelJ shows significant similarity to the jellyfish J1-crystallins and that, together with these, it constitutes a distinct subfamily within the large family of ADP-ribosylation enzymes. Nevertheless, recent studies suggest that this protein could have a structural role rather than a functional one. As with the majority of eukaryotic selenoproteins, the function of SelJ is not fully clear [18].
SelJ has a very restricted phylogenetic distribution and, in contrast to all known eukaryotic selenoproteins, does not exist in mammalian, bird or amphibian genomes, not even as a Cys-homologue; it appears to be restricted to fishes either in Sec or Cys form, with Cys homologues only found in cnidarians. SelJ has 9 exons, with a single Sec residue lying in exon 7, and it has a type I SECIS element [18].
Our results showed that, in tuna, SelJ has the exact same structure previously predicted, showing the same number of exons, the same Sec localization, as well as the presence of the SECIS element.
SelK is a small selenoprotein with no assigned biochemical or biological function. Only one published study has linked SelK to a possible function in cellular redox homeostasis. SelK is the most widespread selenoprotein, being present in nearly all eukaryotes that use Sec, although it has been described as replaced by a Cys-containing homolog in nematodes and several other organisms. Thus, SelK appears to be an ancient selenoprotein that evolved in the early ancestor of eukaryotes [19].
Among the selenoproteins that have pseudogenes, SelK is the one with the most. So far, these pseudogenes have only been described in mammals, with rodents having the highest number [6].
In agreement with this information, our results showed that fishes (or at least tuna) do not present any of these pseudogenes: only one hit matched this protein, and describing pseudogenes would require additional hits showing homology with it.
We assumed that the contig did not include the Sec because of its small size. Nevertheless, we could infer the presence of Sec, as we were able to predict the corresponding SECIS element in the 3’UTR region of the gene.
SelL proteins are widely distributed in bony fishes, and are also present in cartilaginous and jawless fishes, tunicates, crustaceans, mollusks and bacteria. Previous studies identified this selenoprotein family, whose members have two highly conserved Sec residues, as we could observe in the zebrafish SelL query. Sixteen eukaryotic SelL sequences were detected (11 fish, 2 ascidian, 2 crustacean and 1 mollusk), together with 2 prokaryotic sequences from unknown marine microorganisms. The function of SelL remains unknown, and it is also unclear why all the rare selenoproteins described so far, such as Fep15 and SelJ, occur in aquatic organisms [20].
Our results showed that SelL is indeed present in the tuna genome. Although our contigs were not long enough to include the Sec residues, we were able to predict two SECIS elements in the 3’UTR region of the gene, which suggests that tuna conserves both Sec residues even though we could not observe them.
SelP is the protein responsible for Se transport, which is important for maintaining normal brain function. According to previous research, SelP has a varying number of Sec residues and is unique in that it contains two SECIS elements. These two SECIS elements are separated by an average of 334 nucleotides and are always located in the same exon in the 3’UTR in the vertebrates studied so far [6].
Our results showed 13 Sec residues in tuna SelP; in the zebrafish and human genes we found 17 and 10 Sec residues, respectively. These results confirm the reported variation in the number of Sec residues among species. Moreover, our results confirm the presence of the two SECIS elements in the gene sequence. Interestingly, SelP1b has only one Sec residue, but the reason for this difference is still unknown.
SelR is a Sec-containing selenoprotein in mammals, whereas non-mammalian eukaryotes (animals, plants and yeast) and prokaryotic organisms contain Cys in place of Sec [21]. In contrast with this information, and by searching through some databases, we could find the sequence corresponding to SelR1b gene in zebrafish containing a Sec residue; while SelR1a contained Cys instead of Sec, as expected.
Our results showed that both SelR1a and SelR1b in tuna were Sec-containing selenoproteins. Both of them also contained SECIS elements, which supports the presence of this amino acid in the protein. These results were confirmed by the Selenoprofiles analysis. However, given the information above, we could not conclude whether this Sec/Cys discrepancy between tuna and zebrafish was due to a loss of Sec in the zebrafish selenoproteome or to a mutation in the tuna genome that regained the Sec residue. Nevertheless, we favor the first explanation as the most evolutionarily plausible.
When looking at the exonerate file of the tuna SelR1a protein, we observed that the alignment within the contig was divided into two different regions. According to the start and end positions of these two parts, we concluded that the two predicted sequences were adjacent within the contig. Moreover, when searching for Sec in these two regions, we found that this amino acid was located at the same position in both sequences. In the second sequence (which spans positions 4 to 113 of the reference zebrafish protein sequence), we observed that the seventh position (Ser) matched a Stop codon (TAA) in tuna. We concluded that the first sequence corresponded to the SelR1a protein in the tuna genome, while the second was a copy located in the 5’ region of SelR1a that might be a pseudogene, as it contains the Stop codon described above, which truncates translation of the protein. Moreover, as the alignment with the original query appears to be good, this event may have occurred recently in evolution.
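The distinction drawn here, between an in-frame TGA that may encode Sec and a TAA or TAG that truncates the protein, can be illustrated with a small check on a predicted coding sequence. This is a minimal sketch with hypothetical function names and an invented sequence; a TGA can of course also be a genuine stop, so the "possible Sec" label still needs support from a SECIS element and from the alignment.

```python
STOPS = {"TAA", "TAG", "TGA"}

def in_frame_stops(cds):
    """Return (codon_number, codon) for every stop codon before the last codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # codon_number is 1-based; the final codon is the expected terminator
    return [(i + 1, c) for i, c in enumerate(codons[:-1]) if c in STOPS]

def classify(cds):
    """Split in-frame stops into possible Sec codons and premature stops."""
    hits = in_frame_stops(cds)
    return {
        "possible_sec": [h for h in hits if h[1] == "TGA"],
        "premature_stop": [h for h in hits if h[1] != "TGA"],
    }

# Invented sequence: ATG GCT TGA GGA TAA CCC TAG
print(classify("ATGGCTTGAGGATAACCCTAG"))
```

A sequence with an entry under `premature_stop`, like the TAA reported above, would be a pseudogene candidate.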
For the SelR2 and SelR3 proteins, we did not detect Sec in either zebrafish or tuna, as in both species it is replaced by Cys. This result was also confirmed by Selenoprofiles. Moreover, we did not find any SECIS elements beyond these protein sequences, which is consistent with the absence of this amino acid.
We observed a similar case in SelU1c, a selenoprotein widespread among vertebrates such as fishes and birds, but absent from mammals other than platypus. In bony fishes, this gene underwent a duplication that generated selenoprotein SelU1b [22]. Among all SelU1 proteins, only SelU1c was well annotated in zebrafish in the databases consulted (SelenoDB, Ensembl and UniProt). Therefore, SelU1a and SelU1b were searched for in the tuna genome using the human SelU1 protein as the query. In the exonerate file, the alignment shows that right after the first intron, where the zebrafish sequence has a Cys, tuna has a Stop codon.
With these results, we conclude that our study has been able to predict two pseudogenes. These provide relevant insights into the unique features of the tuna selenoproteome.
To find and analyze selenoprotein SelT in the tuna genome, we compared it with the related zebrafish proteins. In the zebrafish selenoproteome, we found three sequences corresponding to SelT: SelT1a, SelT1b and SelT2. However, SelT1b was not found in the tuna genome. When analyzing SelT1a, we observed that the zebrafish protein did not contain the Sec residue, while our predicted protein did, as well as its SECIS elements, which supported the prediction. These results were also confirmed by Selenoprofiles.
SelW2 protein is found in many bony fishes. There are two copies of this gene described in zebrafish, SelW2a and SelW2b, and a third one described in other bony fishes, SelW2c. Moreover, the organisms that show this third copy seem to have lost SelW1 [6].
From our results, however, we could not fully characterize these proteins in the tuna genome, probably because the genome is highly fragmented. We were only able to find a hit corresponding to SelW2a when comparing the tuna genome with the zebrafish proteins. This result agreed with the one obtained from Selenoprofiles.
Thioredoxin reductases (TR) are proteins that control the redox state of thioredoxins, key proteins involved in the redox regulation of cellular processes. Mammals have three TR isoenzymes: cytosolic TR1, mitochondrial TR3 and TGR. Previous studies have revealed various protein variants for each mammalian TR. Only TR1 and TR3 have been detected in fish genomes [6]. In our analysis of the tuna genome we were able to find three isoenzymes of this family: TR1, TR2 and TR3.
Machinery
SPS1 and SPS2
Selenophosphate synthase 2 (SPS2) is the protein that generates the Se donor compound necessary for Sec biosynthesis, and it is itself a selenoprotein. In non-mammalian vertebrates only the isoform SPS2a is present, whereas in mammals there is another copy of the gene, named SPS2b. Both of these proteins have strong SECIS elements [6]. In our results we found two different hits, which we named SPS2 and SPS2~. Based on this information, we hypothesize that SPS2 and SPS2~ correspond to the SPS2a and SPS2b proteins.
Gene duplications
For some selenoprotein genes, the number of hits exceeded the number of contigs assigned to the protein. The assigned contigs are distinguished by their e-value, which is better for the contigs matching the protein itself than for the extra ones. Moreover, the t-coffee files obtained from the alignment of these proteins show that the best-e-value contigs together cover nearly the whole protein in a fragmented way. Hence, assuming that the tuna contigs are well assembled and therefore contain no more than one copy of each region, the extra hits cannot be assigned to the same gene despite their similarity to it. These results suggest that the extra copies of some selenoproteins might be duplications that occurred recently in evolution.
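The reasoning above can be sketched as a simple procedure: take the hits in order of e-value, keep those that tile the query without substantially overlapping an already accepted contig, and flag the rest as candidate extra copies. The code below is an illustration rather than the procedure used in our program; the overlap tolerance `max_overlap`, the function name and the example hits are all hypothetical.

```python
def split_hits(hits, max_overlap=10):
    """Separate contigs that tile the query from extra, overlapping hits.

    hits: list of (contig, evalue, q_start, q_end), with inclusive
    coordinates on the query protein.
    max_overlap: hypothetical tolerance, in residues, for overlap between
    two contigs that are still treated as fragments of the same gene.
    """
    primary, extra = [], []
    for hit in sorted(hits, key=lambda h: h[1]):  # best (lowest) e-value first
        _, _, start, end = hit
        clashes = any(
            min(end, p_end) - max(start, p_start) + 1 > max_overlap
            for _, _, p_start, p_end in primary
        )
        (extra if clashes else primary).append(hit)
    return primary, extra

# Invented hits, for illustration only
hits = [
    ("contig_1", 1e-30, 1, 60),
    ("contig_2", 1e-25, 55, 120),
    ("contig_3", 1e-8, 20, 110),
]
primary, extra = split_hits(hits)
print([h[0] for h in primary], [h[0] for h in extra])
```

Hits that end up in `extra` cover a region of the query already explained by a better contig, which is the pattern interpreted here as a possible duplication or, alternatively, as an assembly artefact.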
This situation was detected in the tuna genome for the proteins SelH, SelJ, SelL, SelO and TR, whose extra copies we named SelH~, SelJ~, SelL~, SelO~ and TR~. As an example, we describe the case of selenoprotein SelH, which was found to have 3 hits. Of these, the contigs with the best e-values are BADN01109579.1 and BADN01109580.1, which is why we associate SelH with both of them, while the third hit (contig BADN01109583.1) can be predicted to be a recent duplication of a highly mutated SelH protein in tuna. The same applies to the rest of the proteins mentioned, except for TR. In the TR family we observed an interesting phenomenon: we found 3 variants belonging to the protein TR1 (TR1~A, TR1~B and TR1~C). This could be explained in the same way, by duplications in the tuna genome. Alternatively, these new proteins may be an artefact of working with a very fragmented genome, and therefore a false result.