
In search of selenoproteins in the world's smallest mammal

-Craseonycteris thonglongyai-


DISCUSSION


To characterize the selenoproteome of Craseonycteris thonglongyai and the machinery needed to synthesize it, we studied its homology with that of Homo sapiens. Although Mus musculus was also a good candidate reference, we selected the human genome because of its close phylogenetic relationship with our species and its thoroughly annotated genome. Because selenoproteins are widely conserved in all kingdoms of life, we would expect a high similarity. Nevertheless, the two species are relatively distant, so some disparity could also be observed.

We analyse and discuss each deduced protein individually. The SECIS elements predicted with the Seblastian tool are also discussed individually for each selenoprotein.

Finally, a figure is attached for each deduced protein to schematize it. The predicted exons are represented as rectangles, and any Sec residue found is marked.

SELENOPROTEIN DISCUSSION

Thioredoxin Reductase (TR)

    Thioredoxin Reductase 1 (TR1):

For this protein, the scaffold PVKE010016090.1 was accepted to deduce the protein because it matched our criteria. 11 hits were obtained, that is, 11 fragment alignments. The deduced gene was found in the forward strand and had 13 exons (shown in the image below).

The score obtained from the T-coffee output was 1000 and the type II coverage was 96%, meaning that the alignment was of high quality. Moreover, the protein starts with a methionine. However, the C-terminus of the predicted protein is missing: it lacks the last 12 amino acids. In the human protein, the selenocysteine is found in this fragment; therefore, our protein neither aligns a selenocysteine with the reference one nor contains a selenocysteine at any other location.

Regarding the SECIS prediction, Seblastian predicted no SECIS elements. From these results it can be affirmed that TR1 is found in Craseonycteris thonglongyai. However, it cannot be said to be a selenoprotein, since neither a selenocysteine nor a SECIS element was predicted. Nevertheless, the exonerate output shows that the last predicted exon lies almost at the end of the scaffold, with only 4138 bp left until its end. Assuming that these remaining base pairs are part of an intron, the C-terminus of the protein could be located in another scaffold together with the SECIS element, and we may not have detected it because it is a very short sequence. Therefore, it cannot be fully confirmed that TR1 is not a selenoprotein in Craseonycteris thonglongyai.

TR1


    Thioredoxin Reductase 2 (TR2):

For TR2, scaffold PVKE010000238.1 was chosen among all the scaffolds because of its good identity and e-value. The deduced gene was found in the reverse strand and several exons were deduced. However, when they were analyzed individually, we realised that some of them overlapped, covering the same region of the scaffold. Because of this, only the first 17 exons were considered, since the rest were alternative versions of these exons (the resulting gene prediction is shown in the image below).
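The selection of a single set of exons can be sketched in code. The following is only a minimal illustration, not the procedure we actually ran: it assumes the exons are available as (start, end) pairs in the order reported by exonerate, and it keeps an exon only if it does not overlap one already kept. The function name and the coordinates are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) on the scaffold, inclusive, start <= end


def keep_first_nonoverlapping(exons: List[Exon]) -> List[Exon]:
    """Walk the exons in input order and skip any exon that overlaps
    one that has already been kept (assumed to be an alternative exon)."""
    kept: List[Exon] = []
    for start, end in exons:
        overlaps = any(start <= k_end and k_start <= end for k_start, k_end in kept)
        if not overlaps:
            kept.append((start, end))
    return kept


# Hypothetical coordinates, for illustration only.
example = [(100, 200), (300, 400), (150, 220), (500, 600), (390, 450)]
filtered = keep_first_nonoverlapping(example)
print(filtered)
```

With these made-up coordinates, the third and fifth exons overlap earlier ones and are dropped. A greedy filter like this depends on the input order, so the result should still be checked by eye against the exonerate output.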

The score obtained in T-coffee was high and a type II coverage of 90% was obtained. Although the deduced protein does not start with a methionine, the C-terminus is fully conserved and a selenocysteine aligns with the one found in the reference protein.

Regarding the Seblastian results, one SECIS element was predicted at coordinates 44240-44319, in the negative strand. These results are consistent, as the SECIS element must be found in the 3' UTR. Moreover, Seblastian predicted a selenoprotein. Its predicted sequence contains more information than the one obtained with our program: the alignment is better in the starting region of the protein, where the starting methionine is now found. From these results we conclude that TR2 is a conserved member of the Thioredoxin Reductase family in Craseonycteris thonglongyai.

TR2


    Thioredoxin Reductase 3 (TR3):

In this case, three different scaffolds were considered to deduce the protein in Craseonycteris thonglongyai, since all of them had significant alignments according to our criteria. Scaffolds PVKE010034544.1 and PVKE010031651.1 aligned with the middle of the protein, whereas PVKE010019712.1 aligned with the C-terminus of the query. Because these hits overlapped by only a few amino acids, we hypothesized that TR3 is split across these three scaffolds in Craseonycteris thonglongyai. In scaffold PVKE010034544.1 a fragment of the gene was found in the negative strand with 7 exons, in PVKE010031651.1 it was found in the negative strand with 3 exons, and in PVKE010019712.1 it was located in the positive strand with 4 exons (the resulting gene is shown in the image below). When the three gene deductions were analyzed in depth, our hypothesis proved coherent: the exons were very close to the end of the scaffold (less than 5,000 bp), indicating that the remaining base pairs could be part of an intron divided between two scaffolds.
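The check that a prediction ends close to a scaffold edge can be written down as a short sketch. This is an illustration under assumptions, not part of our pipeline: the helper names are hypothetical, the 5,000 bp threshold is taken from the observation above, and the coordinates are invented so that the right-hand gap equals the 4138 bp reported for TR1.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) on the scaffold, 1-based inclusive


def edge_gaps(exons: List[Exon], scaffold_length: int) -> Tuple[int, int]:
    """Return the number of bases between the predicted exons and
    the left and right ends of the scaffold."""
    first_start = min(start for start, _ in exons)
    last_end = max(end for _, end in exons)
    return first_start - 1, scaffold_length - last_end


def may_continue_in_other_scaffold(
    exons: List[Exon], scaffold_length: int, max_gap: int = 5000
) -> bool:
    """True if the prediction lies within max_gap bases of either scaffold
    end, so the rest of the gene could plausibly sit in another scaffold."""
    left_gap, right_gap = edge_gaps(exons, scaffold_length)
    return left_gap < max_gap or right_gap < max_gap


# Hypothetical exon coordinates and scaffold length.
exons = [(20000, 20150), (95700, 95862)]
scaffold_length = 100000
gaps = edge_gaps(exons, scaffold_length)
near_edge = may_continue_in_other_scaffold(exons, scaffold_length)
print(gaps, near_edge)
```

A positive result only says that a split gene is possible; it does not show that the missing part exists in another scaffold.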

When the three T-coffee outputs were analyzed, the scores were very high in all cases. However, when coverage was studied for each scaffold separately, it was very low (38% for PVKE010034544.1, 15% for PVKE010031651.1 and 14% for PVKE010019712.1). Nevertheless, when a multiple alignment is performed with the three scaffolds, the type II coverage rises to 67%. The remaining gap arises because the first 249 amino acids of the reference protein did not align with any predicted protein; therefore, there is no methionine at the beginning of the deduced protein. A selenocysteine is conserved in the C-terminus of the protein, in the same location as in the Homo sapiens protein.
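The way partial alignments add up to a combined coverage can be illustrated with a small sketch. It assumes coverage is simply the fraction of query residues covered by at least one fragment, which may differ from how type II coverage was computed in our program. The function name, the query length and the intervals are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on the query protein, 1-based inclusive


def union_coverage(intervals: List[Interval], query_length: int) -> float:
    """Fraction of query positions covered by at least one interval.
    Overlapping positions are counted once."""
    covered = 0
    current_end = 0  # last query position already counted
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip positions counted before
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / query_length


# Hypothetical fragments from three scaffolds on a 100-residue query;
# the first 33 residues are not covered by any fragment.
fragments = [(34, 71), (70, 85), (84, 100)]
combined = union_coverage(fragments, 100)
print(round(combined, 2))
```

Counting overlapping residues once matters when fragments from different scaffolds share a few amino acids, as happens here.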

Finally, Seblastian predicted a SECIS element when scaffold PVKE010019712.1 was used, since it encodes the C-terminus of the protein. Moreover, it predicted a selenoprotein. Therefore, considering all these arguments, we accept that the TR3 selenoprotein is found in Craseonycteris thonglongyai and that it is split across three different scaffolds.

TR3


Finally, since some scaffolds aligned with all three proteins of the family, a phylogenetic tree was generated as a second way to validate that the scaffolds selected according to our identity and e-value criteria were the most adequate for each protein deduction. The phylogenetic tree below shows the evolutionary relationships between the reference proteins and the aligned scaffolds.

Phylogenetic tree


In the phylogenetic tree, TR2 is clearly closest to the scaffold selected to predict it in our organism, PVKE010000238.1. The same happens with TR3 and scaffold PVKE010031651.1. For TR1 the results are not as clear, although the selected scaffold, PVKE010016090.1, is relatively close to TR1 in the tree. The other two scaffolds selected to deduce TR3, which we accepted to be split across three scaffolds in Craseonycteris thonglongyai, are not as close to TR3 as PVKE010031651.1 is. However, in this case the multiple alignment was also considered. It was performed with all four scaffolds that matched our identity and e-value criteria. Because PVKE010034544.1 and PVKE010019712.1 showed a high coverage when combined with PVKE010031651.1 and overlapped by only a few amino acids, all three were used to deduce TR3. PVKE010038000.1 was not selected because it introduced many gaps in the multiple alignment and was quite separated from the alignment seen with the other scaffolds. Moreover, its identity and e-value were not as good as those of the other three.

SelW

    SelW1:

The scaffold PVKE010004680.1 was the only region among the TBLASTN results for the query protein that matched our criteria. From it, Exonerate predicted 9 exons in the negative strand.

From T-coffee we obtained an alignment with a score of 1000, that is, of high quality. The predicted protein starts with a methionine and contains a Sec that matches the human Sec.

Three SECIS elements were obtained; however, only one of them gives consistent results. The first one is found at positions 12544-12619 in the negative strand, which makes sense given the coordinates of the exons (see the figure below). The second one has coordinates 53187-53260 in the negative strand, which would mean that the SECIS element is not in the 3' UTR region. Finally, the third one is located at positions 4348-4423 in the positive strand, which does not make sense because our prediction is in the negative strand.
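The reasoning used to accept or reject each SECIS candidate can be summarized in a short sketch. It is a simplification under assumptions: it only checks that the candidate is on the same strand as the gene and downstream of the last coding exon, it sets no maximum distance, and the exon coordinates are hypothetical (only the three SECIS coordinates come from our results). The function name is also hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (low, high) scaffold coordinates, regardless of strand


def secis_is_consistent(
    gene_strand: str,
    exons: List[Exon],
    secis_low: int,
    secis_high: int,
    secis_strand: str,
) -> bool:
    """A SECIS candidate is accepted only if it is on the gene's strand
    and lies downstream of the coding exons (the 3' side of the gene)."""
    if secis_strand != gene_strand:
        return False
    gene_low = min(low for low, _ in exons)
    gene_high = max(high for _, high in exons)
    if gene_strand == "+":
        return secis_low > gene_high  # 3' side is towards higher coordinates
    return secis_high < gene_low  # on "-", 3' side is towards lower coordinates


# Hypothetical exon span for a gene in the negative strand.
exons = [(20000, 20100), (45000, 45100)]
candidates = [(12544, 12619, "-"), (53187, 53260, "-"), (4348, 4423, "+")]
verdicts = [secis_is_consistent("-", exons, lo, hi, s) for lo, hi, s in candidates]
print(verdicts)
```

For a gene in the negative strand, the 3' UTR lies at lower scaffold coordinates than the exons, which is why the first candidate passes and the second does not.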

From this we conclude that SelW1 is a highly conserved selenoprotein.

selw1


    SelW2:

The scaffold selected to predict the protein was PVKE010000318.1, the only one that met the required identity and e-value. Exonerate predicted four exons in the negative strand, whose coordinates are shown in the figure below.

The alignment obtained from T-coffee is excellent and very precise. The predicted protein also starts with a methionine. In this case, no Sec is found in either of the two species.

No SECIS element was predicted by Seblastian. Therefore, we understand from these results that the selenocysteine has been lost in both species, which means that SelW2 is not a selenoprotein.

selw2


SelV

The only scaffold that appeared for SelV was PVKE010006012.1; therefore, it was the one chosen. However, the T-coffee results gave a score of 347, which is very low. We analyzed the alignment and saw practically no homology, as Homo sapiens did not seem to have that protein sequence. In addition, these results gave 245 Sec for Craseonycteris thonglongyai and only one for Homo sapiens, and none of the selenocysteines from the bumblebee bat matched the human one. The Exonerate result also showed 376 exons in the positive strand.

We therefore inspected the exonerate output and saw that most of the exons were alternative exons. To solve this, we created a new GFF file with just one set of the predicted exons and ran the program manually from that point up to the T-coffee step, obtaining much better results: the score was now 993, and only one Sec was found, which matched the human Sec.

Seblastian predicted one SECIS element, also in the positive strand, which is consistent with the previous results. Therefore, we conclude that this selenoprotein is found in both species and that it has many alternative exons.

selV


SelU

    SelU1:

According to our criteria, scaffold PVKE010001609.1 aligned with the query. It was therefore used to predict the gene, which was located in the reverse strand and composed of 5 exons (shown in the image below).

The T-coffee alignment showed a high identity, with a type II coverage of 97%. The N- and C-termini were highly conserved in the predicted protein, which starts with a methionine. No selenocysteine was found in either sequence.

Seblastian could not predict any selenoprotein in our sequence. A SECIS element was found in the same strand as the predicted gene and in the 3' UTR, but it was not considered for further interpretation of the results because of its large distance from our prediction and its lack of apparent relation with the predicted protein, since no Sec was found. Therefore, according to these results, we can say that this protein is conserved in Craseonycteris thonglongyai and that, as in Homo sapiens, the Sec appears to have been lost, given that a SECIS element can still be found.

selU1


    SelU2:

From the alignments obtained with TBLASTN, only one scaffold was accepted with our criteria: PVKE010002044.1. The predicted gene was located in the forward strand and was composed of 6 exons (shown in the image below).

T-coffee showed a good alignment between the predicted protein and the human SelU2 sequence, with a high type II coverage (96%). The C-terminus was highly conserved, but the first 2 amino acids of the N-terminus were missing, meaning that our protein did not start with a methionine. No Sec was found in the predicted protein, just as in the Homo sapiens protein.

Seblastian did not find a SECIS element or a Sec in our predicted protein. Therefore, we can conclude that this protein is conserved in Craseonycteris thonglongyai and that, as in Homo sapiens, it is not a selenoprotein.

selU2


    SelU3:

TBLASTN showed only one scaffold that matched our criteria: PVKE010028091.1. The predicted gene was composed of 7 exons in the forward strand (shown in the image below).

Using T-coffee, the alignment between the predicted protein and the reference protein was analyzed. The type II coverage was 96%, which represents a very good alignment. The predicted protein started and ended with the same amino acid sequences as the human SelU3, so we considered the protein complete. However, no Sec residues were found in either the predicted protein or the human query.

After running Seblastian, no SECIS elements or selenoproteins were predicted, which agrees with the observations above. Therefore, we accept our results and can say that SelU3 is conserved in Craseonycteris thonglongyai and that it has lost its Sec, meaning that it is not a selenoprotein.

selU3


SelT

According to our criteria, two alignments in the same scaffold (PVKE010000592.1) were found for SelT. The predicted gene was found in the reverse strand and was composed of five exons (shown in the image below).

The T-Coffee results showed a good score, and the predicted protein has the same length as the human reference protein. It therefore has a methionine at the N-terminus, and the C-terminus was also highly conserved. Moreover, a selenocysteine is found in the same position in both proteins.

Finally, using Seblastian, a SECIS element was found, also in the reverse strand and in the 3' UTR. Seblastian also predicted a selenocysteine in our sequence. Therefore, given all these arguments, we accept our prediction and can confirm that SelT is present in Craseonycteris thonglongyai.

selT


SelS

For this protein, one scaffold was found to be aligned. From the selected scaffold, PVKE010010535.1, a gene was deduced in the forward strand with six exons (shown in the image below).

The score obtained with T-coffee was 993, and the predicted SelS aligned almost perfectly with its human homolog. We calculated a coverage of approximately 97% for this protein. The deduced protein starts with a methionine, and a Sec was found to be conserved, as in the query protein.

Regarding the SECIS elements, we found only one, located in the 3' UTR region; its coordinates are also shown in the scheme below. Therefore, according to these results, we can say that SelS is a selenoprotein found in Craseonycteris thonglongyai.

selS


SelR

    SelR1:

For this protein, one scaffold was found to be aligned. From the selected scaffold, PVKE010006209.1, a gene was deduced in the reverse strand with 4 exons (shown in the image below).

The score obtained with T-coffee was 1000, and the predicted SelR1 aligned perfectly with its homolog in the human proteome. The calculated coverage was 95%. Both the human and the Craseonycteris thonglongyai proteins start with a methionine residue and contain a Sec.

Regarding the SECIS elements, we found only one, located in the 3' UTR region; its coordinates are also shown in the scheme below. Therefore, we can conclude that the SelR1 selenoprotein is conserved in Craseonycteris thonglongyai.

selR1


    SelR2:

For this protein, scaffold PVKE010012725.1 was selected. The deduced gene was found in the reverse strand and consisted of 4 exons (shown in the image below).

The score obtained with T-coffee was 997, and the predicted protein aligned almost perfectly with its human homolog. The coverage calculated for this protein was 86%. The deduced protein does not start with a methionine residue and, as in Homo sapiens, no Sec is found.

Regarding the SECIS elements, Seblastian did not find any for SelR2. This matches the prediction made with our program; therefore, we can say that SelR2 is conserved in Craseonycteris thonglongyai and that it is not a selenoprotein.

selR2


    SelR3:

We found a total of 6 scaffolds, but only 2 of them met our selection criteria. After analysing them, we accepted that this protein is split across the two selected scaffolds, PVKE010008082.1 and PVKE010015543.1. From the first, a gene was deduced in the reverse strand with five exons, and from the second, a gene in the forward strand with four exons.

Both had a reasonably good score in their respective T-coffee alignments (998 and 995). Using a multiple sequence alignment (MSA), we were able to confirm the initial hypothesis and accept that our predicted protein is split across two different scaffolds. We calculated a final coverage of 95%. The deduced protein does not start with a methionine residue, possibly because the exonerate prediction is not fully accurate.

The query protein did not have a selenocysteine in its sequence. We predicted a Sec residue in the PVKE010008082.1 scaffold; however, it was located in the region where both scaffold predictions overlap, in which the quality of the alignment with the query is poor. Moreover, in the query protein the cysteine resulting from the Sec-to-Cys substitution is located at the end of the sequence, and we found the same residue in our predicted protein. Finally, regarding the SECIS elements, Seblastian did not find any for the deduced protein.

For all these reasons, we consider the Sec predicted in scaffold PVKE010008082.1 unreliable. More studies should be performed to obtain a predicted protein of higher quality and to determine whether it is a selenoprotein.

selR3


SelP

For this protein, 1 scaffold aligned according to our criteria: PVKE010010787.1. The deduced gene was located in the reverse strand and consisted of 4 exons (shown in the image below).

The score obtained with T-coffee was 977, indicating that the alignment was very good. The calculated coverage was 88%. Moreover, the deduced protein starts with a methionine residue. However, when compared with the query, the C-terminus of the protein is not as well conserved as in the rest of the deduced proteins. Regarding Sec, we found 15 selenocysteines in our organism, whereas only 10 were observed in Homo sapiens. The last four Sec did not align with the query protein, but this was due to an insertion in the predicted protein assigned by T-Coffee.
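Counting which Sec residues align between the two proteins can be sketched as follows. This is only an illustration under assumptions: both sequences are taken as rows of a pairwise alignment of equal length, with "U" for Sec and "-" for gaps, and the toy sequences and the function name are hypothetical rather than taken from our T-Coffee output.

```python
from typing import Dict


def sec_alignment_summary(query_aln: str, predicted_aln: str) -> Dict[str, int]:
    """Classify every alignment column that contains a Sec ("U"):
    aligned in both rows, present only in the query, or only in the prediction."""
    if len(query_aln) != len(predicted_aln):
        raise ValueError("aligned sequences must have the same length")
    summary = {"aligned": 0, "query_only": 0, "predicted_only": 0}
    for q, p in zip(query_aln, predicted_aln):
        if q == "U" and p == "U":
            summary["aligned"] += 1
        elif q == "U":
            summary["query_only"] += 1
        elif p == "U":
            summary["predicted_only"] += 1
    return summary


# Toy alignment: the prediction carries two extra Sec inside insertions.
query_row = "MAUG--SUK-"
predicted_row = "MAUGUASUKU"
summary = sec_alignment_summary(query_row, predicted_row)
print(summary)
```

Sec residues that fall in columns where the query has a gap are reported as found only in the prediction, which is the situation described above for the insertion in the predicted SelP.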

We found in the literature (Mariotti M et al, 2008) that SelP has a varying number of Sec residues among mammals and vertebrates, and some of the Sec observed in our predicted protein are also found in organisms such as the cow or the dog.

Regarding the SECIS elements, Seblastian found 2 different SECIS elements located in the 3' UTR region of the negative strand, close to each other and to the last exon of the protein. We accepted both of them because SelP has two SECIS elements in mammals (Mariotti M et al, 2008), separated by an average of 334 nucleotides. Hence, from these results we can accept that SelP is found in Craseonycteris thonglongyai and that it is a selenoprotein.

selP


SelO

For this protein, only 1 scaffold was found to be aligned: PVKE010014341.1. The deduced gene was found in the forward strand and had 9 exons (shown in the image below).

The score obtained with T-coffee was 991 and the coverage of this protein was 93%. The deduced protein starts with a methionine residue. We also see one selenocysteine residue aligned with the query.

Regarding the SECIS elements, only one was found for this protein, in the positive strand and in the 3' UTR. Therefore, according to these results, we can accept that Craseonycteris thonglongyai has the selenoprotein SelO.

selO


SelN

The human protein SelN aligned with 1 scaffold of the Craseonycteris thonglongyai genome: PVKE010010817.1. The resulting gene was located in the forward strand and had 13 exons when alternative exons were not taken into account.

The score of the alignment obtained with T-coffee was 988 and the calculated coverage was 85%. The predicted protein starts with a methionine, but the C-terminus lacks the last 56 amino acids. Moreover, one of the two selenocysteines found in the query is conserved in our organism.

Finally, Seblastian predicted one SECIS element, located in the same strand as our sequence and in the 3' UTR region. Therefore, according to these results, we accept that SelN is found in Craseonycteris thonglongyai and that it is a selenoprotein.

selN


SelM

The human protein SelM aligned with 1 scaffold of the Craseonycteris thonglongyai genome: PVKE010015265.1. The deduced gene was located in the reverse strand and had 5 exons (shown in the figure below).

The score of the alignment obtained with T-coffee was 970, and the deduced protein is well aligned with the query (coverage 90%). The N-terminus of the deduced protein lacks the first 3 amino acids, so it does not start with a methionine. Moreover, the selenocysteine found in the query is conserved in our species.

Regarding the SECIS elements, we found only one, located in the 3' UTR region. Although Seblastian did not predict any selenoprotein, according to the obtained results we accept that SelM is a selenoprotein conserved in Craseonycteris thonglongyai.

selM


SelK

PVKE010000173.1 was the scaffold with the best TBLASTN results according to our criteria. Exonerate predicted 5 exons, located in the reverse strand (shown in the image below). However, one of these exons was only 1 nucleotide long and we discarded it from the analysis.

The predicted protein had a Sec at the C-terminus of the sequence, which aligned with the Sec in the query sequence. The T-Coffee score was 100 and the alignment was very good: only three mutations were observed, and the type II coverage was 100%.

Seblastian predicted a SECIS element in the 3' UTR. The server also predicted SelK, with the same number of exons in the same positions as in our prediction. Therefore, according to these arguments, we accept that the selenoprotein SelK is conserved in Craseonycteris thonglongyai.

selK


SelI

When TBLASTN was performed, two scaffolds aligned with two consecutive regions of the human protein: PVKE010011700.1 (S1) covers up to position 233, and PVKE010039195.1 (S2) covers from position 228 to the end of the query sequence. Together, they cover the whole protein sequence with little overlap, and we hypothesized that the gene sequence is fragmented across the two scaffolds. Both sequences were located in the forward strand (shown in the image below). The number and positions of the exons predicted by exonerate coincided in both scaffolds with the number and positions of the different TBLASTN hits.

When we aligned the scaffolds independently with T-Coffee, they showed a score of 993 for S1 and 997 for S2. S1 aligned with the first half of SelI, up to the amino acid at position 232, and S2 from position 224 to 397.

A multiple alignment was performed with Clustal Omega, using a multi-FASTA file with the query sequence and both predicted amino acid sequences, to observe more clearly how the two sequences complete the query protein. It confirmed the previous observations.

The predicted protein covered the beginning of the query and started with Met. A Sec was predicted in the scaffold that covered the C-terminus of the protein, and it aligned with the Sec in the query sequence. Coverages were higher than 90%. Running Seblastian on the fastasubseq file from scaffold PVKE010039195.1 predicted a SECIS element as well as a selenoprotein. To get a better prediction of the selenoprotein, the fastasubseq files from both scaffolds were combined and run on the server. The same SECIS element was predicted and the predicted selenoprotein had better quality, showing a more accurate alignment.

Therefore, according to these arguments, we accept that the selenoprotein SelI is conserved in Craseonycteris thonglongyai.

selI


SelH

The human SelH sequence matched the scaffold PVKE010003995.1. Exonerate then deduced a gene consisting of three exons, all in the reverse strand (shown in the image below).

T-coffee showed that the alignment between the predicted and reference proteins had high identity, with a type II coverage of 94%. The predicted protein started with a methionine residue and had a highly conserved C-terminus. A selenocysteine was also found in the same place as in the human SelH protein.

Finally, Seblastian predicted two SECIS elements. However, only the one in the same strand as the predicted protein was accepted. Moreover, it predicted a selenoprotein. Therefore, according to these results, we accept that SelH is a selenoprotein conserved in Craseonycteris thonglongyai.

selH


Sep15

From the TBLASTN output we obtained one scaffold that met our criteria (PVKE010007457.1). The deduced gene was found in the forward strand and, using exonerate, two exons were found (shown in the image below).

T-Coffee indicated a high score and a good alignment. The predicted protein starts and ends with the same amino acids as the reference protein. The coverage was 91%. Two selenocysteines were found in the predicted protein: one in the same position as in the human protein and a second in the C-terminus of the predicted protein.

Using Seblastian, one SECIS element was found in the 3' UTR of the forward strand. It also predicted a selenocysteine (U) in the forward-strand sequence, in the same position as in the reference protein. However, the second U that we detected in the predicted protein in the T-Coffee output was not found. Considering these facts, we conclude that our analysis confirms the presence of the Sep15 selenoprotein in Craseonycteris thonglongyai.

sep15


MsrA

In this case, we ran all the scaffolds obtained in TBLASTN to check whether the alignments with the reference sequence were good enough. To ease the analysis, a Clustal Omega alignment was prepared. By comparing the alignments, we considered the 3 scaffolds that best matched the human protein: PVKE010012352.1, PVKE010003299.1 and PVKE010096847.1.

After analysing the data with Exonerate, the considered gene had 5 exons: 1 from PVKE010012352.1, in the forward strand, and 3 from PVKE010003299.1 and 1 from PVKE010096847.1, both in the reverse strand.

In the T-coffee results, the type II coverage was considerably low when the scaffolds were analyzed individually (20% for PVKE010012352.1, 35% for PVKE010003299.1 and 23% for PVKE010096847.1). It was higher (78%) in the multiple alignment made with the three scaffolds. Nevertheless, a stretch of 100 amino acids did not align with any scaffold found. In the SelenoDB database we saw that the cysteine resulting from the Sec-to-Cys substitution lies in this gap. Therefore, our prediction has a very low accuracy. Moreover, in scaffold PVKE010003299.1 one Sec was found that did not align with the query.

Finally, Seblastian did not predict any SECIS element. Thus, according to these results, our prediction is not conclusive, and it should be analyzed further to determine whether MsrA is a selenoprotein in Craseonycteris thonglongyai.

msra


Glutathione Peroxidases (GPx)

    GPx1:

According to our criteria, one prediction for GPx1 was found. The predicted gene was found in scaffold PVKE010063236.1, in the reverse strand, and is composed of two exons (shown in the image below).

The T-Coffee results show a good identity, and the predicted protein has almost the same length as the human reference protein (type II coverage of 95%). However, the predicted protein lacks the first six amino acids and therefore does not have a methionine at the N-terminus. This could be because exonerate did not predict the gene properly. A selenocysteine was found in the same position in both proteins.

Finally, two SECIS elements were found, one in the forward strand and one in the reverse strand. Because the deduced gene is in the reverse strand, we chose the SECIS element found in the reverse strand, which is also located in the 3' UTR. Seblastian also predicted a selenocysteine in our sequence, again in the reverse strand. Therefore, given all these arguments, we accept our prediction and can confirm that Craseonycteris thonglongyai has GPx1.

gpx1


    GPx2:

For the protein GPx2, two alignments in the scaffold PVKE010009682.1 were found using our criteria. The predicted gene was found in the forward strand and has two exons (shown in the image below).

The T-coffee results show a good identity score. Moreover, the whole deduced protein aligns with the reference one (type II coverage of 99%). Accordingly, a methionine is found at the N-terminus and a selenocysteine is observed in the same place as in the human protein.

Seblastian predicted a SECIS element in the forward strand and in the 3' UTR of the predicted gene. Moreover, the program predicted a selenocysteine in the predicted sequence. Thus, taking all results together, it can be said that Craseonycteris thonglongyai has GPx2.

gpx2


    GPx3:

Two scaffolds aligned with the protein sequence, and PVKE010005934.1 was chosen for study because it had smaller e-values and higher identity. The aligned sequence was located in the reverse strand and had 5 exons (shown in the image below).

The T-Coffee score was 997, showing a strong alignment between both sequences. The N- and C-termini are highly conserved, and our predicted protein starts with Met (type II coverage of 97%). A selenocysteine was predicted, aligning with the selenocysteine in the human protein.

Seblastian predicted 1 SECIS element in the 3' UTR of the reverse strand, but it did not predict any selenoprotein.

Nevertheless, according to these arguments, we accept that the selenoprotein GPx3 is conserved in Craseonycteris thonglongyai.

gpx3


    GPx4:

PVKE01004804.1 was chosen to be analysed because of its small e-values (between e-31 and e-41), even though two of its four hits showed an identity of only 50% and 66% (the other identities were higher than 85%). The aligned sequence was located in the reverse strand and it had 7 exons (shown in the image below).

T-Coffee showed a score of 1000, with coverages higher than 90%. A selenocysteine was predicted at position 73, aligning with the same amino acid in the query protein, and its UGA codon was located at 43658-43656. The predicted protein started with a methionine.

Seblastian predicted two SECIS elements, but we took as valid the one located at positions 42824-42754, as it had the best grade and was closer to the predicted protein. No selenoprotein was predicted by Seblastian.

Therefore, keeping in mind all the given results, we accept that selenoprotein GPx4 is conserved in Craseonycteris thonglongyai.

gpx4


    GPx5:

When the program was asked to find alignments meeting our criteria, just one hit showed an identity greater than 90%. However, it was in the same scaffold as the one chosen for GPx6, and it gave better TBLASTN results with GPx6 as the query. Because of this, we relaxed the threshold to identities higher than 65% to search for alternative scaffolds that could contain the protein of interest, and we manually chose the scaffold PVKE010005899.1. The predicted protein had 5 exons and the nucleotide sequence was located in the forward strand (shown in the image below).

The T-Coffee alignment had a score of 999 and the alignment was good (coverage type II of 92%), even though the N-terminus of the protein was not predicted and the protein did not start with methionine. Finally, no selenocysteine was predicted in the protein, and the query protein did not have this amino acid either, as it has been replaced by a Cys.

Seblastian did not find any SECIS elements. Therefore, according to the given results, we accept that this protein, as in Homo sapiens, has undergone a selenocysteine-to-cysteine substitution.

gpx5


    GPx6:

PVKE010018851.1 was chosen to be analysed. The aligned sequence was located in the reverse strand and 5 exons were predicted (shown in the image below).

The T-Coffee score was 1000. Although the deduced protein started with a methionine, neither the N- nor the C-terminus was well conserved: the beginning showed some changes in the amino acid sequence, and 4 amino acids were missing at the end of the predicted protein compared with the query sequence. The coverages were between 79% and 91%.

Seblastian predicted a SECIS element in the reverse strand, located in the 3’ UTR of the sequence between 19771 and 19846, as well as a selenoprotein. When the selenoprotein predicted by Seblastian and the one predicted by our program were compared with the human protein, the former lacked the first part of the sequence (27 amino acids).

Therefore, according to the given results, we accept that the selenoprotein GPx6 is conserved in Craseonycteris thonglongyai.

gpx6


    GPx7:

The scaffold chosen for analysis was PVKE010001538.1, as its hits had very high identities (>93.3%) and very low e-values. The gene was located in the forward strand and consisted of 3 exons (shown in the image below).

The T-Coffee score was 995, and the alignment had very high coverages (coverage type II of 98%), showing a strong match with the query protein. However, the predicted protein did not start with a methionine. Neither the query nor the predicted sequence contained Sec, and Seblastian did not predict SECIS elements.

Therefore, according to the given results, we accept that GPx7 is present in Craseonycteris thonglongyai but, as in Homo sapiens, it is not a selenoprotein.

gpx7


    GPx8:

Three hits were found in the scaffold PVKE010023323.1 with good identities (>82%) and e-values. The alignment was performed in the forward strand. By exonerate, three exons were found (shown in the image below).

T-Coffee results show a high coverage type II of 97%, and the deduced protein aligns along its whole length with the reference protein; it therefore starts with a methionine. We did not find selenocysteines in either Homo sapiens or Craseonycteris thonglongyai. Consistent with this, no SECIS elements were predicted by Seblastian.

Therefore, according to the given results, we can accept that GPx8 is found in Craseonycteris thonglongyai, but as in Homo sapiens it is not a selenoprotein.

gpx8


In order to confirm that the scaffolds we manually chose were the most suitable for predicting the GPx proteins, we studied the phylogenetic relationship between the query proteins and all the scaffolds that matched them (identity > 60% and e-value < 0.001), as shown in the figure below.

gpx


In all cases, the predicted protein closest to each human GPx query was the sequence that we had chosen on the basis of its identity and e-value in the TBLASTN results.
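The candidate set used for the tree can be sketched as a simple threshold filter. The example below assumes the TBLASTN hits are already available as (scaffold, percent identity, e-value) records; the record layout, the function name `select_hits` and the example values are hypothetical, and only the two thresholds (identity > 60% and e-value < 0.001) come from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    scaffold: str    # scaffold identifier (invented names below)
    identity: float  # percent identity of the alignment
    evalue: float    # BLAST e-value


def select_hits(hits, min_identity=60.0, max_evalue=1e-3):
    """Return the scaffolds with at least one hit passing both thresholds.

    Order of first appearance is preserved and duplicates are removed.
    """
    selected = []
    for hit in hits:
        if hit.identity > min_identity and hit.evalue < max_evalue:
            if hit.scaffold not in selected:
                selected.append(hit.scaffold)
    return selected


# Invented example values, not taken from the real TBLASTN output.
example_hits = [
    Hit("scaffold_A", 91.2, 1e-40),
    Hit("scaffold_A", 66.0, 1e-31),
    Hit("scaffold_B", 58.0, 1e-10),  # fails the identity threshold
    Hit("scaffold_C", 72.5, 0.5),    # fails the e-value threshold
]
print(select_hits(example_hits))
```

Lowering `min_identity` reproduces the more permissive search described for GPx5, at the cost of admitting more paralogous scaffolds.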

Because there were more scaffolds than query proteins, GPx1 and GPx8 each had more than one predicted protein close to them. As a last step to confirm that the predicted proteins we selected were the best matches, all the predicted proteins were analysed one by one against each query protein.

The multiple alignment and the phylogenetic analysis (figure shown below) of GPx1 and all the predicted proteins from the scaffolds PVKE010063236.1, PVKE010008100.1 and PVKE010005383.1 revealed that the first one had the best alignment. The individual T-Coffee results confirmed this observation and agreed with our previous selection: the sequence in PVKE010063236.1 (the one that we chose) gave the best prediction.

gpxa


In the case of GPx8, the results of the phylogenetic tree (shown below) were not as clear as those for GPx1. However, the multiple alignment, as well as the individual results from T-Coffee, clearly showed that the predicted protein from the scaffold PVKE010023323.1 had the best alignment; this is the same sequence as in our first results.

gpxb


To sum up, we conclude that the GPx family has been conserved in Craseonycteris thonglongyai.

Iodothyronine Deiodinase (DI)

    DI1:

For protein DI1, significant alignments meeting our criteria were found in the scaffolds PVKE010001208.1 and PVKE010040570.1. Specifically, two hits covering the N-terminus of the query were found in PVKE010001208.1 and two hits covering the C-terminus were found in PVKE010040570.1. Since the hits overlapped by only four amino acids, we hypothesized that in Craseonycteris thonglongyai the gene is split across these two scaffolds. Therefore, both scaffolds were considered; the resulting gene was located in the reverse strand and consisted of 4 exons, 2 predicted from each scaffold (shown in the image below). The gene prediction supported this hypothesis, since the exons lay very near the end of each scaffold (less than 5,000 bp), indicating that the remaining base pairs could belong to an intron divided between the two scaffolds.

In both T-Coffee alignments, the identity is good. However, the coverage type II is low when only one scaffold is considered (58% for PVKE010001208.1 and 35% for PVKE010040570.1). When the multiple alignment built from the two scaffolds is considered instead, the coverage type II is 93%. Moreover, the predicted protein starts with a methionine and the selenocysteine is conserved at the same position as in the human protein.
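To illustrate why two partial fragments can add up to a high combined coverage, the sketch below merges the query intervals aligned by each fragment before dividing by the query length. It assumes that coverage is simply the percentage of query residues that are aligned, which may not match the exact "type II" definition given in Materials and methods; the interval coordinates and the query length of 200 are invented for illustration and are not the real DI1 values.

```python
def merged_length(intervals):
    """Number of positions covered by 1-based inclusive intervals,
    counting overlapping positions only once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # A new, non-overlapping block starts here: close the previous one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent block: extend the current one.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def coverage(intervals, query_length):
    """Percentage of the query covered by the given aligned intervals."""
    return 100.0 * merged_length(intervals) / query_length


# Hypothetical fragments on a 200-residue query, overlapping by 4 residues.
n_terminal = (1, 116)
c_terminal = (113, 186)
print(coverage([n_terminal], 200))              # N-terminal fragment alone
print(coverage([c_terminal], 200))              # C-terminal fragment alone
print(coverage([n_terminal, c_terminal], 200))  # both fragments together
```

Merging before counting matters because the overlapping residues would otherwise be counted twice when the per-scaffold coverages are added.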

Finally, Seblastian deduced a SECIS element in the scaffold that encodes the C-terminus of the protein. Although Seblastian does not predict a selenoprotein, according to the given results we conclude that the DI1 selenoprotein is present in Craseonycteris thonglongyai, and we accept that its gene is split across two different scaffolds.

di1


    DI2:

The gene found in the PVKE010012833.1 scaffold was selected since its alignment with the human query was significant according to our criteria. The gene is found in the forward strand and it has two exons (shown in the image below).

T-Coffee output shows a very good identity: the whole predicted protein aligns with the reference one, giving a coverage type II of 98%. Accordingly, the predicted protein has a methionine at its N-terminus. Furthermore, a selenocysteine is observed in the predicted protein at the same position as in the human protein.

In this sequence a SECIS element was predicted in the forward strand, the same strand where the predicted gene was located. Moreover, the SECIS is found in the 3’ UTR of the deduced gene. Although Seblastian did not predict any selenoprotein, keeping in mind all the given arguments, we accept our prediction and conclude that Craseonycteris thonglongyai has the DI2 selenoprotein.

di2


    DI3:

TBLASTN output showed one significant alignment in scaffold PVKE010002073.1 according to our criteria. Therefore, we accepted this gene, which was found in the forward strand and was composed of one exon (shown in the image below).

In T-coffee results we see a really good identity since the predicted protein is totally aligned with the reference one and the coverage type II is of 98%. The predicted protein has a methionine in the N-terminus and two amino acids are inserted in the middle of the sequence. A selenocysteine is found both in the predicted protein and in the reference one.

However, no SECIS elements were found in this sequence, and consequently Seblastian did not deduce a selenoprotein either. According to the given reasons we confirm that the DI3 protein is present in Craseonycteris thonglongyai but, since no SECIS element was deduced, we cannot guarantee that it is a selenoprotein. Nevertheless, the exonerate output shows that the predicted exon is almost at the end of the scaffold: the exon ends at position 87626 and the scaffold ends at 88200, only 574 bp later. Since some SECIS elements are found as far as 5000 bp from the end of the gene, the SECIS element may not have been predicted because it lies in another scaffold. Therefore, we think that DI3 cannot be discarded as a selenoprotein in Craseonycteris thonglongyai either.
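The argument about the scaffold end can be checked with a short calculation. The sketch below uses the two coordinates quoted for DI3 (exon end at 87626, scaffold end at 88200) and the 5000 bp figure mentioned in the text as an assumed upper bound for the gene-to-SECIS distance; the function names are hypothetical.

```python
def bases_after_gene(gene_end, scaffold_end):
    """Bases left between the last gene position and the scaffold end
    (forward strand, 1-based coordinates)."""
    return scaffold_end - gene_end


def secis_may_be_off_scaffold(gene_end, scaffold_end, max_secis_distance=5000):
    """True if the scaffold ends before the assumed maximum SECIS distance,
    so that a SECIS element could lie beyond the assembled sequence."""
    return bases_after_gene(gene_end, scaffold_end) < max_secis_distance


remaining = bases_after_gene(87626, 88200)
print(remaining, secis_may_be_off_scaffold(87626, 88200))
```

With these coordinates only a few hundred bases remain downstream of the exon, far fewer than the assumed 5000 bp window, which is why the absence of a predicted SECIS is not conclusive.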

di3


MACHINERY PROTEINS DISCUSSION

Selenophosphate synthetase (SPS)

    SPS1:

For this protein, only the PVKE010001866.1 scaffold was obtained with good identity and e-value. The predicted gene was found in the forward strand and eight exons were obtained by using Exonerate as shown in the figure below.

Using T-Coffee we obtained a high-quality alignment. The predicted protein starts with methionine and no Sec was found in the predicted protein, just as in Homo sapiens.

Seblastian predicted no SECIS elements. We conclude that SPS1 is present in Craseonycteris thonglongyai and, as in Homo sapiens, is part of the selenoprotein machinery.

SPS1


    SPS2:

PVKE010007810.1 was the only scaffold that aligned with good identity and e-value with the query sequence. Only one exon in the reverse strand is found, as shown in the figure.

The score obtained from T-Coffee was 998, so the alignment is of high quality. The predicted protein starts with methionine. One Sec is found in our species as well as in Homo sapiens, meaning that this may be a well-conserved selenoprotein.

One SECIS element was found in the negative strand using Seblastian. According to our prediction, SPS2 appears to be conserved in Craseonycteris thonglongyai.

SPS2


eEFSec

In this case we decided to study three TBLASTN predictions, since alignments with similar identities were found in three different scaffolds. Moreover, these scaffolds aligned with different parts of the reference protein, so we hypothesized that the gene could be split across scaffolds in Craseonycteris thonglongyai. Because of this, scaffolds PVKE010007345.1, PVKE010075947.1 and PVKE010012895.1 were run through our program. The deduced gene was found in the reverse strand in scaffolds PVKE010007345.1 and PVKE010075947.1, whereas in scaffold PVKE010012895.1 it was found in the forward strand. The gene deduced from scaffold PVKE010007345.1 had 4 exons, whereas those deduced from scaffolds PVKE010075947.1 and PVKE010012895.1 had 2 exons each (shown in the image below).

T-Coffee results showed high identity values in all three cases. For PVKE010012895.1, the predicted protein aligned completely with the first 107 amino acids of the human protein and had a methionine at the N-terminus. For PVKE010075947.1, the alignment length was 73 amino acids and it followed the region aligned with PVKE010012895.1. Finally, for PVKE010007345.1, the alignment length was about 310 amino acids and it followed the region aligned with PVKE010075947.1, although in this case after a gap of 87 amino acids. Therefore, the three genes deduced from the three scaffolds together cover most of the human protein (coverage type II of 78%). As expected, no selenocysteine was found in either the human protein or the deduced one, because eEFSec is part of the machinery.

Finally, Seblastian found SECIS elements only in the sequence deduced from the PVKE010012895.1 scaffold: one in the forward strand and another in the reverse strand. Since the part of the gene deduced from PVKE010012895.1 was in the forward strand, the SECIS element found in the reverse strand was immediately discarded. The other SECIS element was also discarded because it was not in the 3’ UTR of the sequence. As expected under our hypothesis, Seblastian did not predict any selenoprotein. Thus, according to the given arguments, we accept that eEFSec is conserved in Craseonycteris thonglongyai, where it would also act as part of the selenoprotein biosynthesis machinery, and we conclude that its gene is split across three different scaffolds.
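The two rules applied here to SECIS candidates (same strand as the gene, and located in the 3’ UTR) can be sketched as a small filter. This is only an illustration under simplifying assumptions: "in the 3’ UTR" is approximated as "downstream of the last coding position", no maximum distance is enforced, and the `Feature` class, the function name and all coordinates are hypothetical rather than taken from the Seblastian output.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    strand: str  # "+" for forward, "-" for reverse
    start: int   # lowest coordinate on the scaffold
    end: int     # highest coordinate on the scaffold


def is_candidate_secis(secis, gene):
    """Keep a SECIS only if it is on the gene's strand and downstream of it."""
    if secis.strand != gene.strand:
        return False
    if gene.strand == "+":
        # Forward strand: downstream means higher coordinates.
        return secis.start > gene.end
    # Reverse strand: downstream means lower coordinates.
    return secis.end < gene.start


# Invented coordinates for a forward-strand gene and three SECIS predictions.
gene = Feature("+", 1000, 5000)
secis_list = [
    Feature("-", 5200, 5270),  # wrong strand: discarded
    Feature("+", 3000, 3070),  # inside the gene, not in the 3' UTR: discarded
    Feature("+", 5400, 5470),  # same strand and downstream: kept
]
kept = [s for s in secis_list if is_candidate_secis(s, gene)]
print(kept)
```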

eEFSec


SBP2

Four scaffolds were considered because they matched our initial criteria: PVKE010008993.1, PVKE010000153.1, PVKE010037555.1 and PVKE010000525.1. After running them through our algorithm, it was hard to tell from the individual T-Coffee results which would be useful. Therefore, a multiple alignment was performed with the proteins predicted from these scaffolds and the reference protein.

The result showed two scaffolds that aligned well with the human SBP2 sequence: PVKE010037555.1 for the N-terminus, and PVKE010008993.1 for the C-terminus. Since the sequences predicted from these scaffolds overlap by only seven amino acids, we consider that the gene is split across the two scaffolds.

The considered gene was formed by 17 exons: 10 exons from PVKE010037555.1, in the reverse strand, and 7 exons from PVKE010008993.1, in the forward strand. Nevertheless, we consider only 16 because one exon of PVKE010037555.1 consisted of a single nucleotide.

These two scaffolds had considerably good scores in T-Coffee results. When calculating the coverage we saw that even though type II was low when considering separate scaffolds (43% for PVKE010037555.1 and 40% for PVKE010008993.1), taking the two scaffolds into account, the value amounted to 83%.

No SECIS elements were found using the Seblastian tool, nor any selenocysteine in the sequence. Therefore, we consider that this protein is conserved in Craseonycteris thonglongyai as part of the selenoprotein biosynthesis machinery.

sbp2


SecP43

We found a total of 30 scaffolds but only 3 of them met our selection criteria. After analysing them, we found that the gene for this machinery protein was split across the three selected scaffolds, PVKE010014577.1, PVKE010053199.1 and PVKE010020672.1. All of them were located in the forward strand. The numbers of exons obtained for each scaffold were two, seven and two, respectively. All their coordinates are shown in the scheme below.

All of them had a reasonably good score in their respective T-Coffee alignments. With the multiple sequence alignment we were able to confirm the initial hypothesis and to accept that our predicted protein was split across three different scaffolds. The coverage calculated for the whole protein was 99%.

We found no selenocysteine residues in any of the three scaffolds, nor any SECIS elements. Therefore, we concluded that SecP43 is conserved in Craseonycteris thonglongyai and is part of the selenoprotein biosynthesis machinery.

secp43


PSTK

We found a total of 2 scaffolds but only 1 of them met our selection criteria, so we selected scaffold PVKE010012691.1. The gene was located in the forward strand and has 6 exons, whose coordinates are shown in the figure below.

The score of the alignment obtained with T-Coffee was 994 and the predicted protein is highly conserved between both species. The coverage calculated for the whole protein was 87%. A selenocysteine was found in the predicted protein but not in the Homo sapiens one. However, this selenocysteine lay in a region of incomplete homology with the human sequence, which is consistent with the identity and e-value of the hit corresponding to the C-terminus. These two values did not meet our criteria, so we cannot accept a predicted protein with an incongruent Sec. Moreover, we did not find any SECIS element for this protein using Seblastian. Therefore, we conclude that this protein is conserved overall in Craseonycteris thonglongyai, except for the C-terminal end, which has undergone some modifications.

PSTK


SecS

We found a total of 2 scaffolds and both met our selection criteria. After analysing them we hypothesised that the gene for this machinery protein was split across the two selected scaffolds: PVKE010024052.1 and PVKE010021348.1. Both of them were located in the forward strand. The numbers of exons obtained for each scaffold were four and nine, respectively. All their coordinates are shown in the scheme below.

Both had a reasonably good score in their respective T-Coffee alignments (996 and 999). With the multiple sequence alignment we were able to confirm the initial hypothesis and to accept that our predicted protein was split across two different scaffolds. The coverage calculated for the whole protein was 99%.

The query protein did not have selenocysteine in its sequence. However, we predicted a Sec when the PVKE010024052.1 scaffold was used. Nevertheless, since the predicted Sec lies in the region of overlap between the two scaffolds, we consider that this prediction is not accurate. Moreover, although SECIS elements were found when Seblastian was used, they were in the opposite strand from the gene and were therefore not accepted.

Taking all the given arguments into account, we consider that the Sec prediction must not be accepted because of its location and the lack of a valid SECIS prediction. However, since the rest of the protein is well preserved, we conclude that SecS is a machinery protein whose function is conserved in Craseonycteris thonglongyai.

secs


ADDITIONAL INFORMATION

UGA-to-SECIS distances

The distances between the predicted SECIS elements and the UGA codons that encode Sec were studied to better understand the Sec insertion process, and to analyse whether the selenoproteome of Craseonycteris thonglongyai follows the features described in other studies.

The minimum distance observed was 84 nucleotides in SelO and the maximum was 4871 nucleotides in SelV, with high variability among the predicted proteins (results are shown in the figure below). The average distance was 1192 nucleotides.

uga
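As an illustration of how such a distance can be computed, the sketch below uses the coordinates reported for GPx4 in this discussion (UGA at 43658-43656 and SECIS at 42824-42754, both in the reverse strand). The convention is an assumption: the distance is measured from the first base of the UGA codon to the first base of the SECIS element in transcript orientation, so the value may differ by a few nucleotides from the one plotted if another convention was used. The function names are hypothetical.

```python
from statistics import mean


def uga_to_secis_distance(uga_first_base, secis_first_base):
    """Distance in nucleotides between the first base of the UGA codon and
    the first base of the SECIS element, both in transcript orientation.
    The absolute difference makes it valid for either strand."""
    return abs(secis_first_base - uga_first_base)


def summarise(distances):
    """Return (minimum, maximum, mean) for a dict of protein -> distance."""
    values = list(distances.values())
    return min(values), max(values), mean(values)


# GPx4, reverse strand: coordinates decrease along the transcript.
gpx4_distance = uga_to_secis_distance(43658, 42824)
print(gpx4_distance)
```

Passing a dictionary with one distance per selenoprotein to `summarise` would give the minimum, maximum and mean in a single call.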