Bioinformatics project UPF

Glandirana rugosa

Abstract

Selenoproteins are proteins that contain selenocysteine (Sec, U), known as the 21st amino acid, which plays an important role in redox homeostasis. Sec is encoded by the UGA codon, which normally signals translation termination. For this codon to be read as selenocysteine, a SECIS element is required: a cis-acting stem-loop located in the 3′-UTR of selenoprotein mRNAs, downstream of the UGA codon.


This study aims to annotate the selenoproteome of Glandirana rugosa, a frog endemic to Japan, together with the machinery required for selenoprotein synthesis. To achieve that, a homology-based approach was used, comparing the genome of Glandirana rugosa with the Xenopus tropicalis selenoproteins found in SelenoDB 2.0. A Bash program was written to automate the analysis, combining bioinformatic tools such as tblastn, exonerate and T-Coffee. Additionally, Seblastian and GeneWise were used to identify SECIS elements and to check the results.

As a result, a global alignment of the Xenopus selenoproteins and the predicted G. rugosa proteins was obtained. From these alignments, we conclude that the proteins containing selenocysteine are SecS80, PSTK, Sel15, SelT, SelI, GPx72, GPx73, TR02, GPx69 and DI62. The proteins that do not contain selenocysteine are Fep15, TR04, SelK, GPx67, GPx68, GPx70, GPx71, TR01, TR03, DI63, DI64, MsrA74, MsrA75, MsrA76, SelR91, SelR92, SelR93, SelI, SelS, SelM, SelN, SelO_86, SelO_87, SelO_88, SelP_89, SelP_90, SelW00, SelW99, SelU96, SelU97, SelU98, eEFsec05, eEFsec06, SBP2_77, SBP2_78 and SecS79. Finally, there is a group of proteins that were not found in Glandirana rugosa: TR04, FrnE and SelK.

Introduction

Selenoproteins

Selenium (Se) is an essential trace element for humans, plants, and microorganisms. Inorganic selenium is present in nature in four oxidation states, and these forms are converted by biological systems into more bioavailable organic forms, mainly the two seleno-amino acids selenocysteine and selenomethionine (1). Interest in selenium has grown considerably over the last decades due to the association of selenium deficiency with an increased risk of several human diseases, including cancers, cardiovascular disorders and infectious diseases (2).



The discovery of the genetically encoded 21st amino acid, selenocysteine (Sec or U), was a fascinating
breakthrough in molecular biology. Sec is a structural and functional analogue of cysteine that has selenium
(Se) instead of sulphur (S) in its side chain.


Selenoproteins are proteins that incorporate selenocysteine. Its insertion relies on a non-canonical
translational event: the recoding of a UGA codon, which is normally used as a stop codon
in other cellular mRNAs. Whether UGA is read as a stop codon or as a selenocysteine codon depends on the presence
of a selenocysteine insertion sequence (SECIS element) (3). Selenoproteins have important roles in
skeletal muscle regeneration, cell maintenance, oxidative and calcium homeostasis, thyroid hormone metabolism,
and immune responses (4).


Selenoprotein biosynthesis

The biosynthesis of selenoproteins is unique, since the incorporation of Sec occurs co-translationally by the ribosome and not post-translationally.
Sec introduction requires special trans-acting protein factors, Sec-tRNA[Ser]Sec and a cis-acting Sec insertion sequence (SECIS) element.


Sec-tRNA[Ser]Sec

Sec has its own tRNA, tRNA[Ser]Sec, which is the longest tRNA sequenced and the central component of selenoprotein biosynthesis. The gene for tRNA[Ser]Sec, Trsp, is found in all three evolutionary lines of descent: eukarya, archaea, and eubacteria. Its transcription is regulated by three upstream regions: a TATA box motif, a proximal sequence element and a distal sequence element.


Once tRNA[Ser]Sec is transcribed, it is aminoacylated with serine (Ser) by the enzyme
seryl-tRNA synthetase (SerS). Then, Ser-tRNA[Ser]Sec is phosphorylated by
phosphoseryl-tRNA kinase (PSTK), giving rise to PSer-tRNA[Ser]Sec. The last step is
the formation of Sec-tRNA[Ser]Sec by Sec synthase (SecS), which incorporates the active
form of Se. Selenophosphate, made by selenophosphate synthetase 2 (SPS2), is used as the donor of active Se.
In addition to its role in Sec biosynthesis, SPS2 has recently been implicated in Cys biosynthesis
(Figure 1).



Figure 1. Synthesis of Sec-tRNA[Ser]Sec (5).


SECIS element

SECIS elements are cis-acting stem-loops located in the 3′-UTR of selenoprotein mRNAs, downstream of the UGA that encodes Sec. When a ribosome encounters the UGA codon, which normally signals translation termination, the Sec machinery interacts with the canonical translation machinery to augment the coding potential of UGA codons and prevent premature termination. SECIS elements are the factors that dictate recoding of UGA as Sec: in response to the SECIS element in a selenoprotein mRNA, Sec-tRNA[Ser]Sec, whose anticodon is complementary to UGA, translates UGA as Sec.


At least two trans-acting factors are required for efficient recoding of UGA as Sec in eukaryotes: 
SECIS binding protein 2 (SBP2) and Sec-specific translation elongation factor (eEFSec).

SBP2 is stably associated with ribosomes and contains a distinct L7Ae RNA-binding domain that is known to 
bind SECIS elements with high affinity and specificity. Aside from binding to ribosomes and SECIS elements, 
SBP2 also interacts with eEFSec, which recruits Sec-tRNA[Ser]Sec and facilitates incorporation 
of Sec into the nascent, growing polypeptide. Ribosomal protein L30 has been predicted to constitute 
a part of the basal Sec insertion machinery. Nucleolin and eIF4a3 serve as regulatory proteins that 
modulate synthesis of selenoproteins and may contribute to the hierarchy of selenoprotein expression 
(Figure 2).



Figure 2. Mechanism of Sec insertion (5).


Evolution and phylogeny

Proteins containing Sec are present in all three evolutionary lines of descent: eukarya, archaea, and eubacteria, and they were also observed in viruses.
However, selenoproteins were completely lost in fungi, higher plants, and some animal species, including beetles, silkworms, and several other insects. Interestingly, more than half of the identified selenoprotein families are present in both single-cell eukaryotes and vertebrates, indicating that they have an ancient origin (5).


Recent studies also showed that aquatic organisms generally have larger selenoproteomes than 
terrestrial organisms, and that mammalian selenoproteomes show a trend toward reduced use of 
selenoproteins. Various groups of terrestrial organisms reduced their utilization of Se by 
replacing selenoproteins with Cys-homologues or completely losing some selenoproteins. 

Because our study focuses on Glandirana rugosa, which belongs to the frog lineage,
it is important to highlight the replacement of Sec by Cys in certain selenoproteins of this lineage,
specifically SelU1, SelPb and Fep15 (Figure 3).



Figure 3. Evolution of the vertebrate selenoproteome (6).
The ancestral vertebrate selenoproteome is indicated in red. The ancestral selenoproteins found uniquely in vertebrates are underlined. The creation of a new selenoprotein (here always by duplication of an existing one) is indicated by its name in green. Loss is indicated in grey. Replacement of Sec with Cys is indicated in blue (apart from SelW2c in pufferfish, which is with arginine). Events of conversion of Cys to Sec were not found. On the right, the number of selenoproteins predicted in each species is shown.


Selenoprotein families

Sec is the residue chiefly responsible for the physiological functions of selenoproteins. It is found in the active site of the enzyme and performs catalytic redox reactions. Selenoproteins are classified by function as follows:


Glutathione peroxidases (GPxs)

GPxs are well known to be the major components of antioxidant defence. Moreover, they are involved in hydrogen peroxide (H2O2) signalling, detoxification of hydroperoxides, and maintaining cellular redox homeostasis.


In humans, there are now five Sec-containing GPxs: the ubiquitous cytosolic GPx (GPx1), 
the gastrointestinal-specific GPx (GPx2), the plasma GPx (GPx3), the ubiquitous 
phospholipid hydroperoxide GPx (GPx4), and the olfactory epithelium- and embryonic 
tissue-specific GPx (GPx6). The Cys-containing GPx homologs also prevail in bacteria, 
protozoa, fungi, and terrestrial plants.

GPx1–3 catalyze the reduction of hydrogen peroxide and organic hydroperoxides, whereas GPx4 
can directly reduce phospholipid and cholesterol hydroperoxides.

The catalytically active Sec is normally located at the N-terminal end of the helix. The 
conserved catalytic triad of all these GPxs contains Sec, Gln, and Trp.


Thioredoxin reductases (TRs)

TR is the only enzyme able to reduce oxidized Thioredoxin (Trx). In addition, the Trx system participates in many cellular signalling pathways by controlling the activity of transcription factors containing critical cysteines in their DNA-binding domains, such as NF-κB, AP-1, p53, and the glucocorticoid receptor.


TRs in mammalian cells are members of the pyridine nucleotide-disulphide oxidoreductase 
family, and three TRs have been identified in mammals: TR1 in the cytosol/nucleus, 
TR2 in mitochondria and thioredoxin glutathione reductase in the testis, with 
the latter also possessing glutathione and glutaredoxin reductase activity.


Iodothyronine deiodinases (DIs)

Three DIs, each with a characteristic tissue and subcellular localization, have been identified. They are involved in the regulation of thyroid hormone activity by reductive deiodination.


DIO1 is found primarily in the liver, kidney, and thyroid; DIO2 is in the brain, 
pituitary, thyroid, skeletal muscle, and brown adipose tissue; and DIO3 is found in the 
cerebral cortex and skin and is expressed at a very high level in the placenta and pregnant uterus.

DIO1 and DIO2 catalyse the deiodination of T4, the major thyroid hormone secreted by the thyroid
gland, into the active hormone T3; DIO3 converts T4 into reverse T3 and T3 into 3,3′-diiodothyronine.
DIO1 and DIO2 can also convert reverse T3 into 3,3′-diiodothyronine. DIO1 and DIO3 are localized at
the plasma membrane, whereas DIO2 resides in the ER membrane.


Methionine-R-sulfoxide reductase 1 (Msr)

MsrB1 (methionine-R-sulfoxide reductase B1) was initially identified as selenoprotein R (SelR) and selenoprotein X (SelX) by searching for putative SECIS element structures in EST databases. Later, this protein was found to function as a stereospecific methionine-R-sulfoxide reductase, which catalyses repair of the R enantiomer of oxidized methionine residues in proteins. Based on its functional similarity to methionine-S-sulfoxide reductase A (MsrA), which catalyses reduction of the other isomer, this selenoprotein was renamed MsrB1.


MsrB1 and MsrA are structurally different and have no sequence similarity, but they have complementary 
functions.


Selenophosphate synthetase 2 (SPS2)

SPS2 catalyses the synthesis of the active Se donor selenophosphate that is necessary for Sec biosynthesis.


All vertebrates possess Sec-containing SPS2, whereas in lower eukaryotes the active-site Sec residue 
in SPS2 is replaced with Cys. Because SPS2 is a selenoprotein in vertebrates, it was proposed to serve 
an autoregulatory role in selenoprotein synthesis.


Selenoprotein I (SelI)

SelI is found only in vertebrates. It is a transmembrane protein containing a highly conserved CDP-alcohol phosphatidyltransferase domain, which is present in choline (CHPT1) and choline/ethanolamine (CEPT1) phosphotransferases. CHPT1 and CEPT1 catalyse the last step in de novo synthesis of the two major phospholipids through the transfer of phosphocholine and phosphoethanolamine groups to diacylglycerol from CDP-choline and CDP-ethanolamine, respectively.


Selenoprotein K (SelK) and Selenoprotein S (SelS)

SelK and SelS have different sequences, but they are studied together because they share a similar topology: a single transmembrane domain in the NH2-terminal sequence, a glycine-rich (G-rich) segment, and a characteristic location of the Sec residue at the COOH-terminal end of the protein. Both are found in the ER membrane. Their exact function is unknown, but both have been suggested to take part in ER-associated degradation (ERAD) of misfolded proteins.


Selenoprotein M (SelM)

SelM is an ER-resident thiol-disulphide oxidoreductase that is highly expressed in the brain and bestows neuroprotective properties, preventing oxidative damage induced by H2O2. In humans, it seems to be a protective factor in Alzheimer's disease.


Selenoprotein N (SelN)

SelN is an ER-resident transmembrane glycoprotein that is highly expressed during embryonic development and in adult tissues, including skeletal muscle. It plays a role in the regeneration of skeletal muscle tissue after injury or stress. Mutations in the human SelN gene (also known as SEPN1) are associated with a group of early-onset muscle disorders known as SEPN1-related myopathies.


Selenoproteins O (SelO)

SelO is found in mammalian and yeast mitochondria, where it engages in a redox interaction with an unknown protein through its CXXU motif. Redox regulation of protein function in mitochondria may involve kinase functions.


Selenoprotein P (SelP)

SelP is a selenoprotein with multiple Sec residues per protein subunit, 10 Sec residues in all, and is present in human plasma. The main role of SelP is the transport and delivery of selenium to the tissues. An additional role may be to serve as a heavy metal chelator or antioxidant.


Rdx family

The members of this protein family possess a thioredoxin-like fold and are characterized by the presence of a conserved Cys-x-x-Sec motif, and a conserved stretch of amino acids in the COOH-terminal portion of the protein with the tGxFEI(V) consensus sequence.



Selenoprotein W (SelW)

SelW is a small 9-kDa selenoprotein localized in the cytosol and is expressed at high levels in muscles and brain. It belongs to the stress-related group of selenoproteins as its expression is highly regulated by the availability of Se in the diet.



Selenoprotein T (SelT)

SelT is predominantly localized to the ER and Golgi and is ubiquitously expressed both during embryonic development and in adult tissues. Knockdown of SelT in mouse fibroblasts leads to decreased expression of extracellular matrix genes involved in cell structure organization and alters cell adhesion properties. In addition, the loss of SelT resulted in the upregulation of SelW.



Selenoprotein H (SelH)

SelH has a unique subcellular localization pattern and was found to localize specifically to the nucleoli. Expression of SelH is relatively low in adult mouse tissues, but is elevated during embryonic development. Similar to SelW, SelH is sensitive to dietary Se intake.


It was found that SelH specifically binds to sequences containing heat shock and stress response 
elements. Moreover, SelH possesses glutathione peroxidase activity and has been implicated in 
the regulation of transcription of a group of genes that are involved in de novo 
glutathione synthesis and phase II detoxification enzymes.



Selenoprotein V (SelV)

SelV is one of the least characterized selenoproteins. It recently evolved, most likely by duplication from SelW, and is found only in placental mammals. SelV expression is detected only in testes, and thus may be involved in male reproduction, but its specific function is not known.


Selenoproteins U (SelU)

SelU was first found in fish and has also been reported in birds and unicellular eukaryotes. In mammals such as humans and mice, all SelU proteins exist in the Cys form. The function of SelU remains unclear.


15-kDa Selenoprotein (Sel15)

Sel15, known in mammals as Sep15, is a thioredoxin-like fold ER-resident protein, as is Selenoprotein M (SelM). Sel15 is involved in the regulation of the calnexin cycle, which is essential for the folding of some glycoproteins in the ER, and its expression is induced by misfolded proteins in the ER. A related protein, Fep15 (15-kDa selenoprotein-like protein), is present as a Cys-homologue in frogs.


Known Selenoproteins Machinery

PSTK (Phosphoseryl tRNA Kinase)

This protein phosphorylates Ser-tRNA[Ser]Sec to give PSer-tRNA[Ser]Sec, the intermediate from which Sec-tRNA[Ser]Sec is then made. The function and homology of this protein are conserved across archaea and eukaryotes that synthesize selenoproteins.



eEFSec (Sec-specific translation elongation factor)

eEFSec recruits Sec-tRNA[Ser]Sec and facilitates the incorporation of Sec into the nascent protein.



SBP2 (SECIS binding protein 2)

SBP2 promotes Sec incorporation by associating with SECIS elements and recruiting the eEFSec-selenocysteyl-tRNA[Ser]Sec complex to the ribosome.




SecS (Sec Synthase)

This enzyme incorporates the active form of Se into phosphorylated Ser-tRNA[Ser]Sec (PSer-tRNA[Ser]Sec) to create the final Sec-tRNA[Ser]Sec.

Glandirana rugosa

The Japanese rugose frog, which has the scientific name Glandirana rugosa (Temminck & Schlegel, 1838), is an endemic species of Japan, belonging to the family Ranidae.

Taxonomy
Kingdom Animalia
Phylum Chordata
Class Amphibia
Order Anura
Family Ranidae
Genus Glandirana
Species G. rugosa
Geographic distribution

The species is distributed in Japan (northern, central and southern Honshu, as well as the Shikoku, Kyushu, Sado, Oki, Goto, Yakushima, Tsushima and Tanegashima Islands), north and southwest Korea and northeast China (Liaoning, Jilin and Heilongjiang Provinces).
It was also introduced to Hawaii at the end of the 19th century (10).


Description

The dorsal colouration of Glandirana rugosa is predominantly muddy brown, with many short ridges protruding from its back. The ventral colouration is pale yellow or greyish yellow. The size of males varies from 30 to 47 mm and females from 45 to 60 mm.
Sexual maturity is reached between 1 and 2 years after metamorphosis in males and 2 or 3 years later in females, with variations of at most one year after metamorphosis. Males can live for up to four years and females five years after metamorphosis (11).


Newborn tadpoles measure about 8 mm in total length and reach 38–80 mm before metamorphosis (12).

It feeds mainly on a wide variety of insects, but it is also known to consume arachnids, crustaceans, chilopods, 
diplopods, molluscs, oligochaetes and occasionally small frogs (13).


Habitat

In Japan, Glandirana rugosa is distributed mainly in lowland areas, with a wide habitat preference. It can be found in rice paddies, aqueducts, water reservoirs, ponds (natural and artificial) and along small streams of rapid flow.


In Hawaii, they are found in both lentic and lotic habitats. They occur mostly at low elevations, mainly
in ponds, and at mid-elevations in clear streams. Above all, they prefer slow-flowing streams or nearby ponds.

Hibernation takes place underwater in adult frogs, while tadpoles hibernate in the mud of the water 
bodies (14). 


Habitat invasion

In Hokkaido, the northernmost prefecture of Japan, the species was accidentally introduced through carp aquaculture in 1985. It has been reported to prey on the terrestrial insects of Hokkaido, where it could pose a threat to food webs and compete with native frog species.



Outside Japan, it has become established in Hawaii, where it was introduced in the 1890s
as pest control for introduced insects. There it could have an impact on Newcomb's snail
(Erinna newcombi), a threatened species endemic to the island of Kauai (15).


Sex determination

Sex determination in frogs is complicated. In some species, males have two distinct sex chromosomes (an XX-XY system, as in mammals), while in other species females have two distinct sex chromosomes (a ZZ-ZW system, as in birds). The system has switched between XX-XY and ZZ-ZW tens of times throughout the evolution of frogs.


The Japanese rugose frog (Glandirana rugosa) is an evolutionary witness to the
remarkable complexity of sex determination in frogs. Some populations have an
XX-XY sex chromosome system, and others have a ZZ-ZW system. In central Japan, there are
adjacent populations of rugose frogs with different sex chromosome systems.
Recently, a hybrid zone was discovered where these two populations meet. It contains a ZZ-ZW system but also a hybrid sex chromosome system (Neo-ZW). In this hybrid population, the Z chromosome is derived both from the Z chromosome of the ancient ZZ-ZW population and from the Y chromosome of the XX-XY population, whereas the W chromosome is, surprisingly, derived from the X chromosome of the XX-XY population (16).


Comments

Glandirana rugosa and Glandirana emeljanovi, which is found on the East Asian mainland, have sometimes been considered a single species.
The two species are distinguished from others by their rough and uneven skin.

Methods

The aim of our project was to find the selenoproteins, cysteine-containing homologues and selenium machinery proteins of Glandirana rugosa. Its genome was obtained from the following directory:

/mnt/NFS_UPF/soft/genomes/2021/Glandirana_rugosa/

The file that contains the genome is named genome.fa.


Program

Firstly, we chose Xenopus tropicalis as our reference species, due to its close phylogenetic relationship with our organism.


A total of 46 Xenopus tropicalis selenoproteins were downloaded in FASTA format
from SelenoDB 2.0 and used as
queries against the G. rugosa genome. In each protein, we also changed
the amino acid U, which represents selenocysteine, to an X, so that it would not cause errors in the
following programs.
The command line that we used to achieve that was:

$sed 's/U/X/g' "$p" > "$p.tmp" && mv "$p.tmp" "$p"

Then, each modified sequence was manually saved under the protein's name as given in the Xenopus protein annotation.
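
Note that the substitution above also rewrites any U that appears in the FASTA header lines. A minimal sketch of a variant that leaves headers untouched is given here; the function name mask_sec is hypothetical and not part of the script described in this report, and it assumes that header lines start with '>'.

```shell
#!/usr/bin/env bash
# mask_sec (hypothetical helper): replace selenocysteine (U) with X on
# sequence lines only. FASTA headers (lines starting with '>') pass
# through unchanged. Reads the given files, or stdin if none are given.
mask_sec() {
    sed '/^>/!s/U/X/g' "$@"
}

# Possible use on one query file, writing through a temporary file so
# that the input is not truncated before it is read:
#   mask_sec "$p" > "$p.tmp" && mv "$p.tmp" "$p"
```

Writing to a temporary file first avoids redirecting the output onto the file that sed is still reading.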


In order to characterize the selenoproteome of Glandirana rugosa and its machinery, a
Bash script was written in the Emacs text editor. It contains the
following steps:

BLAST database

A BLAST database of the G. rugosa genome was made from the genome.fa file using the following command:

$makeblastdb -in genome.fa -dbtype nucl -out gr2.fa

TBlastn

The script then ran TBLASTN, which compares each protein query against the translated genome of interest and reports all the scaffolds with significant hits. All hits with an E-value greater than 0.01 were discarded.

$tblastn -query $f -db $gr2in -outfmt 6 -evalue 0.01 -out $blastfile
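
The E-value column of the tabular output can also be used to rank the scaffolds found for a query. The following sketch assumes the standard 12-column layout of -outfmt 6 (scaffold identifier in column 2, E-value in column 11) and prints the lowest E-value per scaffold, best scaffold first. The helper name best_scaffolds is hypothetical and is not part of the script described in this report.

```shell
#!/usr/bin/env bash
# best_scaffolds (hypothetical helper): read BLAST tabular output
# (-outfmt 6) from the given files or stdin and print
# "scaffold<TAB>lowest E-value", sorted from best to worst.
best_scaffolds() {
    awk -F'\t' '
        # column 2 = subject (scaffold) id, column 11 = E-value
        !($2 in best) || $11 + 0 < best[$2] + 0 { best[$2] = $11 }
        END { for (s in best) printf "%s\t%s\n", s, best[s] }
    ' "$@" | sort -t "$(printf '\t')" -k2,2g   # -g understands values like 3e-50
}

# Possible use: best_scaffolds "$blastfile" | head -n 1
```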

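Furthermore, for each scaffold we identified the start and the end of the region covered by its hits and calculated the length of the region to extract, extended by a margin ($hit_offset) on each side.

cut -f2 "$blastfile" | sort -u | while read -r scaffold; do
    # columns 9-10 of the tabular output are the hit coordinates on the scaffold
    start=$(grep -wF "$scaffold" "$blastfile" | cut -f9-10 | tr '\t' '\n' | sort -n | head -1)
    end=$(grep -wF "$scaffold" "$blastfile" | cut -f9-10 | tr '\t' '\n' | sort -n | tail -1)
    hit=$(echo "$scaffold" | cut -f 2 -d ' ')
    begin=$((start - hit_offset))
    length=$((end - begin + hit_offset))
    # (the steps described below use $hit, $begin and $length inside this loop)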
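
The window computation above can also be sketched as a self-contained function. This is an illustration under stated assumptions rather than the script that was run: hit_window is a hypothetical name, the coordinates are used exactly as reported by tblastn, the start is clamped at 0, and the end is not clamped because the scaffold length is not known at this point.

```shell
#!/usr/bin/env bash
# hit_window (hypothetical helper): for one scaffold, print "begin length"
# of the region spanning all of its hits, padded by <offset> bases per side.
# Usage: hit_window <scaffold> <offset> [blast_tabular_file]
hit_window() {
    local scaffold=$1 offset=$2
    shift 2
    awk -F'\t' -v s="$scaffold" -v off="$offset" '
        $2 == s {
            # columns 9 and 10 are subject start/end; they are reversed
            # for hits on the minus strand, so order them first
            lo = ($9 + 0 < $10 + 0) ? $9 + 0 : $10 + 0
            hi = ($9 + 0 < $10 + 0) ? $10 + 0 : $9 + 0
            if (min == "" || lo < min) min = lo
            if (max == "" || hi > max) max = hi
        }
        END {
            if (min == "") exit 1        # scaffold not present in the input
            begin = min - off
            if (begin < 0) begin = 0     # do not run past the scaffold start
            print begin, max + off - begin
        }
    ' "$@"
}

# Possible use: read -r begin length < <(hit_window "$hit" "$hit_offset" "$blastfile")
```

Taking the minimum and maximum over both coordinate columns keeps the window correct for hits on either strand.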

Genome.index

To extract the genomic sequence of the region found, we first created an index of the G. rugosa genome using:

$fastaindex ./genome.fa gr2.index

Fastafetch

Afterwards, with the following command, we extracted all the selected scaffolds from the G. rugosa genome, and the output was saved in a name_scaffold.fa file.

$fastafetch $genin $indexgr2in "$hit" > $fastfechfile

Fastasubseq

Once the scaffold had been extracted and the start position and length defined, we obtained the region of interest (contig) with the fastasubseq program using the following command line:

$fastasubseq $fastfechfile $begin $length > $fastagenomfile

Exonerate

From the protein query and the genomic regions extracted above, we generated an annotation of the genes using exonerate. In other words, we obtained a GFF file containing the exons of the predicted gene. Our command line was:

$exonerate -m p2g --showtargetgff -q $f -t $fastagenomfile | egrep -w exon > $exoneratefile

FastaseqfromGFF

FastaseqfromGFF was used to generate a FASTA file with the cDNA of the exons of the predicted protein from the .gff file provided by exonerate.

$fastaseqfromGFF.pl $fastagenomfile $exoneratefile > "${fastaseqdir}/fastaseq_${hit}.fa"

Fastatranslate

In this step, we translated the cDNA nucleotide sequence into a protein sequence. For each cDNA, several proteins can be generated from the different reading frames; we used the first option of the program (-F 1). This was done with the following command line:

$fastatranslate "${fastaseqdir}/fastaseq_${hit}.fa" -F 1 > "${fastaseqdir}/fastaseq_${hit}.aa.fa"

Changing * to X

In the protein sequences obtained in the previous step, the Sec residue (U) is represented by a *, because the UGA codon is translated as a stop. In order to compare them later with the query sequences from Xenopus tropicalis, each * was changed to an X.

$sed 's/\*/X/g' "${fastaseqdir}/fastaseq_${hit}.aa.fa" > "${fastaseqdir}/fastaseq_${hit}.aa.x.fa"

T-Coffee

Finally, we used T-Coffee to perform a global alignment between the predicted protein sequence and the query protein from Xenopus tropicalis. The command line we used was:

$t_coffee $f "${fastaseqdir}/fastaseq_${hit}.aa.x.fa" > "${t_coffeedir}/${queryname}_${hit}.tc.fa" "${t_coffeedir}/${queryname}_${hit}.tc.aln" "${t_coffeedir}/${queryname}_${hit}.tc.html"

GeneWise

To check the results obtained by Exonerate, we also ran GeneWise, which, from the protein query and the extracted genomic region, generates an annotation of the gene that gives rise to this protein.

$genewise -pep -pretty -cdna -gff $f $fastagenomfile > "${genwisedir}/${queryname}_${hit}.gw.fa"

SECIS prediction

Furthermore, we used the SECISearch3 and Seblastian web server to predict SECIS elements in the 3’-UTR and then to search upstream for selenoprotein coding sequences. As Seblastian does not recognize ambiguity-code symbols other than N, we substituted them with N.

$sed 's/\*/N/g' $fastagenomfile > "${seblastdir}/${hit}.fa"
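
The command above replaces only the * symbol. A minimal sketch that masks every character other than A, C, G, T or N on sequence lines, which is what the text describes, could look as follows. The function name mask_ambiguous is hypothetical and not part of the script described in this report; the sketch assumes plain FASTA input with Unix line endings.

```shell
#!/usr/bin/env bash
# mask_ambiguous (hypothetical helper): on sequence lines, replace any
# character that is not A, C, G, T or N (upper or lower case) with N.
# Header lines (starting with '>') are left unchanged.
mask_ambiguous() {
    sed '/^>/!s/[^ACGTNacgtn]/N/g' "$@"
}

# Possible use:
#   mask_ambiguous "$fastagenomfile" > "${seblastdir}/${hit}.fa"
```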

Once the ambiguous bases had been substituted with N, we manually selected the proteins and submitted them to the SECISearch3 website to predict the SECIS elements and the selenoprotein coding sequences.


Results

The following table shows, for each query, the reference species (Xenopus tropicalis), the Sec/Cys-homologue, its Tblastn output, the scaffold or scaffolds selected, the gene prediction (fastasubseq), the exonerate prediction for the protein (.gff), the protein prediction, the sequence alignment (T-coffee) for the prediction, the SECIS-element predictions and the Seblastian prediction for the selenoprotein.


The abbreviations used in the Sec/Cys-homologue column are defined as follows:

  • Sec: Sec is found in both the Xenopus tropicalis and Glandirana rugosa proteins
  • Sec gain: Sec is found in the G. rugosa protein but not in the Xenopus protein
  • Cys: Cysteine-containing homologue
  • -: There is a gap where a Sec should be
  • /: Selenoprotein machinery protein that has no Sec
  • X: The protein is lost in G. rugosa

  
Protein   Query               Sec/Cys-homologue  Scaffold

Glutathione peroxidases (GPxs)
GPx67     Xenopus tropicalis  -                  BLSH010357311.1
GPx68     Xenopus tropicalis  X                  BLSH010554733.1
GPx69     Xenopus tropicalis  Sec gain           BLSH010357311.1
GPx70     Xenopus tropicalis  -                  BLSH010357311.1
GPx71     Xenopus tropicalis  X                  BLSH010085861.1
GPx72     Xenopus tropicalis  Sec                BLSH010357311.1
GPx72     Xenopus tropicalis  Sec                BLSH010554733.1
GPx73     Xenopus tropicalis  Sec                BLSH010357311.1

Thioredoxin reductases (TRs)
TR01      Xenopus tropicalis  X                  BLSH010481670.1
TR02      Xenopus tropicalis  Sec                BLSH010481670.1
TR03      Xenopus tropicalis  X                  BLSH010481670.1
TR04      Xenopus tropicalis  X                  -

Iodothyronine deiodinase (DI)
DI62      Xenopus tropicalis  -                  BLSH010015975.1
DI62      Xenopus tropicalis  -                  BLSH010535000.1
DI63      Xenopus tropicalis  X                  BLSH010226958.1
DI64      Xenopus tropicalis  X                  BLSH010172101.1

Methionine-S-sulfoxide reductase 1 (Msr)
MsrA74    Xenopus tropicalis  X                  BLSH010095144.1
MsrA75    Xenopus tropicalis  X                  BLSH010095144.1
MsrA76    Xenopus tropicalis  X                  BLSH010095144.1

Selenoproteins R (Sel R)
SelR91    Xenopus tropicalis  X                  BLSH010417220.1
SelR92    Xenopus tropicalis  X                  BLSH010417220.1
SelR93    Xenopus tropicalis  X                  BLSH010331723.1

Selenoprotein I (SelI)
SelI      Xenopus tropicalis  X                  BLSH010070464.1
SelI      Xenopus tropicalis  X                  BLSH010528819.1

Selenoprotein K (SelK) and Selenoprotein S (SelS)
SelK      Xenopus tropicalis  X                  -
SelS      Xenopus tropicalis  X                  BLSH010486305.1

Selenoprotein M (SelM)
SelM      Xenopus tropicalis  X                  BLSH010143868.1

Selenoprotein N (SelN)
SelN      Xenopus tropicalis  X                  BLSH010384004.1

Selenoproteins O (SelO)
SelO_86   Xenopus tropicalis  -                  BLSH010113357.1
SelO_87   Xenopus tropicalis  X                  BLSH010391438.1
SelO_88   Xenopus tropicalis  X                  BLSH010113357.1

Selenoproteins P (SelP)
SelP_89   Xenopus tropicalis  X                  BLSH010100113.1
SelP_90   Xenopus tropicalis  X                  BLSH010213157.1

Rdx family
SelW00    Xenopus tropicalis  X                  BLSH010113654.1
SelW99    Xenopus tropicalis  X                  BLSH010260171.1
SelT      Xenopus tropicalis  Sec                BLSH010368923.1
SelT      Xenopus tropicalis  Sec gain           BLSH010283353.1

Selenoproteins U (SelU)
SelU96    Xenopus tropicalis  X                  BLSH010317410.1
SelU97    Xenopus tropicalis  X                  BLSH010509110.1
SelU98    Xenopus tropicalis  -                  BLSH010465673.1

15-kDa Selenoprotein (Sel15)
Sel15     Xenopus tropicalis  Sec                BLSH010355584.1
Fep15     Xenopus tropicalis  X                  BLSH010059689.1

Known Selenoproteins Machinery
PSTK      Xenopus tropicalis  Sec                BLSH010239871.1
eEFSec05  Xenopus tropicalis  X                  BLSH010363128.1
eEFSec06  Xenopus tropicalis  X                  BLSH010494947.1
SBP2_77   Xenopus tropicalis  /                  BLSH010247575.1
SBP2_78   Xenopus tropicalis  X                  BLSH010275778.1
SecS79    Xenopus tropicalis  X                  BLSH010085618.1
SecS80    Xenopus tropicalis  Sec                BLSH010085618.1

Discussion

The proteins obtained from SelenoDB 2.0 were grouped by family, but several families contained more than one protein. We named each of these with the family name followed by the last two digits of its Xenopus tropicalis query name. When more than one scaffold yielded a predicted protein with exons, we chose the scaffold with the lowest E-value, since it is statistically the most reliable.
Because of the problems mentioned above with the annotation of the Xenopus tropicalis proteins, we discuss our proteins using our own annotation. This makes it harder to contrast our proteins with the literature, but we can still study the differences between families and between each predicted protein and its Xenopus tropicalis counterpart.
For each protein we chose the SECIS element with the best grade (A being the best) among those located 3’ of the exons and on the same strand of the scaffold.
In this study we searched for selenoproteins in Glandirana rugosa through their homology with Xenopus tropicalis (frog) proteins. We focused the analysis on Xenopus tropicalis because of its phylogenetic proximity to Glandirana rugosa.
All the results obtained for each predicted protein were carefully analysed and discussed, paying special attention to the T-Coffee output and the SECIS element predictions.
The criteria for deciding whether a protein detected in the Glandirana rugosa genome was a selenoprotein, a cysteine-containing homologue or neither were the following:

  • Selenoprotein: at least one selenocysteine (Sec, U) was detected in the T-Coffee output or a SECIS element was located at the 3'UTR region of the gene.
  • Cysteine-containing homologue: a cysteine (Cys, C) in the Glandirana rugosa prediction was aligned with a Sec in the Xenopus tropicalis protein.
  • Others: some proteins could have lost their Sec or replaced it with an amino acid other than Cys. Hence, these proteins are considered neither selenoproteins nor Cys-containing homologues.
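The three criteria above can be written as a small decision function. This is only a sketch: it assumes the two aligned sequences (query from Xenopus tropicalis, target from Glandirana rugosa) are given as equal-length strings with "-" for gaps and "U" for Sec, and that the presence of a usable SECIS element is supplied as a boolean. The function name and the toy sequences are hypothetical.

```python
def classify(query_aln, target_aln, secis_in_3utr):
    """Apply the classification criteria to one pairwise alignment.

    query_aln:  aligned Xenopus tropicalis sequence ("U" = Sec, "-" = gap).
    target_aln: aligned Glandirana rugosa prediction, same length.
    secis_in_3utr: True if a SECIS element was located in the 3'UTR.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    # Criterion 1: a Sec in the prediction or a SECIS element in the 3'UTR.
    if "U" in target_aln or secis_in_3utr:
        return "selenoprotein"
    # Criterion 2: a Cys in the prediction aligned with a Sec in the query.
    for q, t in zip(query_aln, target_aln):
        if q == "U" and t == "C":
            return "cysteine-containing homologue"
    # Criterion 3: everything else.
    return "other"


# Toy sequences for illustration only.
print(classify("MKUAG", "MKUAG", False))
print(classify("MKUAG", "MKCAG", False))
print(classify("MKUAG", "MK-AG", False))
```

Because the first criterion accepts a SECIS element on its own, a spurious SECIS prediction is enough to label a hit as a selenoprotein; this is why SECIS-only cases are examined individually in the sections that follow.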

There is one protein, FrnE, a disulfide isomerase, for which we have no information about its phylogeny or family in Xenopus tropicalis and which gave no tblastn match against the Glandirana rugosa genome, so we discarded it.
Even though the function of about half of the mammalian selenoproteins is still unknown, many of them have been shown to play a role in redox regulation. The largest and best-studied selenoprotein families are the glutathione peroxidase (GPx), thioredoxin reductase (TR) and iodothyronine deiodinase (Dio) families, with 5, 3, and 3 Sec-containing genes in the human genome, respectively. Selenoproteins have also been shown to be involved in cancer prevention, modulation of the ageing process, male reproduction, and immune response. However, selenoproteins such as SelH, SelI, SelM, SelN, SelS, SelT, SelW and Sel15 have only been partially characterized, and others, such as SelK and SelO, have unknown functions and have been less well studied.
After the split of amphibians, some selenoproteins, such as SelU1, Fep15 and SelPb, underwent a Sec-to-Cys switch. In addition, the frog is the only species that presents both SelW2 and Rdx12 selenoproteins.
The smallest selenoproteomes were predicted in frogs and in some mammals, with only 24 selenoprotein genes. Nevertheless, 21 selenoproteins were found in all vertebrates: GPx1-4, TR1, TR3, Dio1, Dio2, Dio3, SelH, SelI, SelK, SelM, SelN, SelO, SelP, MsrB1, SelS, SelT1, SelW1 and Sep15. This shows the high conservation of selenoproteins throughout vertebrate evolution. In contrast, some selenoproteins can only be found in certain lineages, because some were lost during the evolution of a particular lineage and others replaced their Sec residue with a Cys residue.
From this point, we describe and discuss the results of our analysis, relating them to the literature on Xenopus tropicalis selenoproteins.


Glutathione peroxidases (GPxs)

The glutathione peroxidase (GPx) family comprises multiple isozymes that catalyze the reduction of H2O2 or organic hydroperoxides to water or the corresponding alcohols, using reduced glutathione (GSH) as an electron donor (H2O2 + 2GSH → GS-SG + 2H2O). Aerobic reactions lead to the accumulation of reactive oxygen species that can be toxic to cells. In this context, aerobic organisms have developed several non-enzymatic and enzymatic systems to neutralize these compounds.
After analysing our results, we could predict three GPx selenoproteins in our organism. Below we report, for each protein, whether a SECIS element was found, the Seblastian prediction and whether Sec residues are present.


GPx67

The GPx67 protein location is in the scaffold BLSH010357311.1 between positions 282868 and 283005 in the forward strand. There was no predicted exon.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis GPx67 protein found in SelenoDB. To find the gene structure, we analysed the exonerate file.
A grade B SECIS element was found in the 3'UTR region. Seblastian predicted no selenoprotein coding sequences, and we did not observe any selenocysteine.


GPx68

The GPx68 protein location is in the scaffold BLSH010554733.1 between positions 367617 and 367925 in the reverse strand. The gene contains 1 exon.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis GPx68 protein found in SelenoDB. To find the gene structure, we analysed the exonerate file.
The SECIS element predicted in the 3'UTR region could not be assigned to this protein, because its coordinates did not lie beyond the last exon. Seblastian predicted no selenoprotein coding sequences. In the T-Coffee alignment, we observed the loss of the Xenopus tropicalis selenocysteine.


GPx69

The GPx69 protein location is in the scaffold BLSH010357311.1 between positions 282859 and 408144 in the forward strand. The gene contains 1 exon.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis GPx69 protein found in SelenoDB. To find the gene structure, we analysed the exonerate file.
A grade B SECIS element was found in the 3'UTR region, but Seblastian gave no result. Nevertheless, the T-Coffee alignment showed a Sec gain.


GPx70

The GPx70 protein location is in the scaffold BLSH010357311.1 between positions 284272 and 284421 in the forward strand. The gene contains 1 exon.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis GPx70 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
A grade B SECIS element was found in the 3'UTR region. However, Seblastian gave no result and the T-Coffee alignment showed no selenocysteine.


GPx71

The GPx71 protein location is in the scaffold BLSH010085861.1 between positions 1079646 and 1079795 in the reverse strand. The gene does not contain exons.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis GPx71 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
Finally, neither the SECIS element nor Seblastian nor t-coffee results were obtained.


GPx72

The GPx72 protein is located in scaffold BLSH010357311.1 between positions 282850 and 283005 in the forward strand, where one exon was predicted. A second hit lies in scaffold BLSH010554733.1 between positions 367623 and 374034 in the reverse strand, where two exons were predicted.
These proteins were predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis GPx72 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
In both scaffolds, BLSH010357311.1 and BLSH010554733.1, a grade B SECIS element was predicted in the 3'UTR region.

In both cases, Seblastian predicted no selenoprotein coding sequences. Nevertheless, both scaffolds contain a Sec-homologous selenocysteine.


GPx73

The GPx73 protein is located in scaffold BLSH010357311.1 between positions 282844 and 283005 in the forward strand. The gene contains 1 exon.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis GPx73 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
A grade B SECIS element was found in the 3'UTR region, but no Seblastian result was obtained. Nevertheless, the T-Coffee alignment showed 2 selenocysteine positions: one is a Sec-homologous selenocysteine and the other is a selenocysteine lost in G. rugosa.


Thioredoxin reductases (TRs)

The thioredoxin reductase (TR) family is composed of flavoproteins that function as homodimers and are actively involved in the redox regulation of cellular processes through their capacity to control the redox status of thioredoxins. TR is the only enzyme known to catalyze the reduction of thioredoxin (Trx) and, hence, is a central component of the thioredoxin system. This system is present in all living cells and has an evolutionary history tied to metabolism and redox signaling.
After analysing our results, we could predict one TR selenoprotein in our organism. As detailed below, two of our four proteins (TR01 and TR04) have no Sec residue, usable SECIS element or Seblastian prediction. The other two (TR02 and TR03) have a SECIS element and a positive Seblastian prediction, but only TR02 retains the Sec residue; TR01 and TR03 have lost it.



TR01

The TR01 protein is located in scaffold BLSH010481670.1 between positions 1521727 and 1595605 in the forward strand, where 12 exons were predicted. This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis TR01 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file. No SECIS element could be used for our strand, because the coordinates of the predicted SECIS elements did not lie beyond the last exon. Seblastian did not detect any selenocysteine, and the T-Coffee alignment showed the loss of the X. tropicalis selenocysteine.


TR02

The TR02 protein location is in the scaffold BLSH010481670.1 between positions 1521727 and 1595605 in the reverse strand. The gene contains 14 exons. This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis TR02 protein found in SelenoDB. To find the gene structure we analyzed the exonerate file. A SECIS element of grade A was predicted in the 3'UTR region. Finally, the Seblastian was positive and we found a sec-homologous selenocysteine in t-coffee.


TR03

The TR03 protein is located in scaffold BLSH010481670.1 between positions 1521730 and 1558217 in the reverse strand. The gene contains 5 exons. This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis TR03 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file. A grade A SECIS element was found in the 3'UTR region. Moreover, the Seblastian result indicated the presence of selenoprotein coding sequences. In the T-Coffee alignment, we found the loss of an X. tropicalis selenocysteine.


TR04

We attempted to predict this protein by blasting the Glandirana rugosa genome against the Xenopus tropicalis TR04 protein found in SelenoDB. However, we could not obtain a tblastn result, so we assumed that this protein is not found in Glandirana rugosa.


Iodothyronine deiodinase (DI)

The iodothyronine deiodinase (DI) family consists of enzymes that catalyze the removal of iodine atoms from various thyroid hormones (THs) in the thyroid gland and extrathyroidal tissues. They are responsible for both the activation and inactivation of these compounds, and are thus important regulators of TH action.
After analysing our results, we could predict one DI selenoprotein in our organism. As detailed below, two of our four hits (DI63 and DI64) show no Sec residue, SECIS element or Seblastian prediction. The other two, both obtained with DI62, have a SECIS element but no Seblastian prediction, and both show the loss of a Xenopus tropicalis Sec.



DI62

The DI62 protein is located in scaffold BLSH010015975.1 between positions 251100 and 251343 in the reverse strand, where 1 exon was predicted. A second hit lies in scaffold BLSH010535000.1 between positions 614022 and 614355 in the reverse strand, which also contains 1 exon.
These proteins were predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis DI62 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
A grade B SECIS element was predicted in the 3'UTR region of BLSH010015975.1. In this case, we observed the loss of the Xenopus tropicalis selenocysteine.

A grade B SECIS element was also predicted in the 3'UTR region of BLSH010535000.1. The T-Coffee alignment showed 2 selenocysteine positions, one of which is a Xenopus tropicalis selenocysteine that is lost in Glandirana rugosa.

However, neither of them has a positive Seblastian result.


DI63

The DI63 protein location is in the scaffold BLSH010226958.1 between positions 2016293 and 2093027 in the forward strand. The gene does not contain exons.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis DI63 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
Moreover, neither the SECIS element nor Seblastian nor t-coffee results were obtained.


DI64

The DI64 protein location is in the scaffold BLSH010172101.1 between positions 13299 and 35919 in the forward strand. The gene does not contain exons.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis DI64 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
Moreover, neither the SECIS element nor Seblastian nor t-coffee results were obtained.


Methionine-S-sulfoxide reductase (MsrA)

MsrA exists as a selenoprotein in some lower organisms, such as bacteria or green algae, which use a catalytic Sec residue instead of a Cys residue. Msrs are also absent in many hyperthermophilic organisms, because at higher temperatures methionine sulfoxide reduction may not require catalysis. Apart from that, Msrs have been shown to protect cellular proteins from oxidative stress, and through this function they may regulate lifespan in several model organisms.
Our results showed that the MsrA proteins we predicted had no Sec residue, which agrees with the literature.
After analysing our results, we could not predict an MsrA selenoprotein in our organism. As detailed below, none of our 3 proteins has a Sec residue or a Seblastian prediction, although a grade A SECIS element was predicted for MsrA74 and MsrA76. The absence of a Sec-containing MsrA is consistent with the literature mentioned above.


MsrA74

The MsrA74 protein location is in the scaffold BLSH010095144.1 between positions 656655 and 690927 in the forward strand. The gene contains 4 exons.

This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis MsrA74 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
A grade A SECIS element was found in the 3'UTR region. Seblastian predicted no selenoprotein coding sequences. Finally, the T-Coffee alignment showed no selenocysteine.


MsrA75

The MsrA75 protein location is in the scaffold BLSH010095144.1 between positions 656655 and 690933 in the forward strand. The gene contains 3 exons.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis MsrA75 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
No SECIS element could be used for our strand, because the coordinates of the predicted SECIS elements did not lie beyond the last exon. Seblastian predicted no selenoprotein coding sequences. Finally, the T-Coffee alignment showed no selenocysteine.


MsrA76

The MsrA76 protein location is in the scaffold BLSH010095144.1 between positions 656676 and 690933 in the forward strand.


This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis MsrA76 protein found in SelenoDB. To find the gene structure we analyzed the exonerate file.
A grade A SECIS element was found in the 3'UTR region. However, we did not obtain a Seblastian result. Finally, the T-Coffee alignment showed no selenocysteine.


Selenoprotein R (SelR)

MsrB1 (SelR) is responsible for the reduction of methionine-R-sulfoxide residues in proteins and is a major MsrB in the cytosol and nucleus of mammalian cells.
After analysing our results, we could not predict a SelR selenoprotein in our organism. As detailed below, we did not find a Sec residue, a usable SECIS element or a Seblastian prediction in any of our 3 scaffolds. This would mean the loss, in our organism, of one of the most conserved families of selenoproteins in vertebrates.


SelR_91

The SelR_91 protein location is in the scaffold BLSH010417220.1 between positions 198407 and 198496 in the forward strand. The gene contains 1 exon.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis SelR_91 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
No SECIS element could be used for our strand. Seblastian predicted no selenoprotein coding sequences.
Finally, the T-Coffee alignment showed no selenocysteine.


SelR_92

The SelR_92 protein location is in the scaffold BLSH010417220.1 between positions 196901 and 198505 in the forward strand. The gene contains 2 exons.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis SelR_92 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
A grade B SECIS element was predicted in the 3'UTR region of BLSH010331723.1, but on the reverse strand. Since our prediction lies on the forward strand, this SECIS element could not be used. Seblastian predicted no selenoprotein coding sequences. Finally, the T-Coffee alignment showed no selenocysteine.


SelR_93

The SelR_93 protein is located in scaffold BLSH010331723.1 between positions 79399 and 185591 in the reverse strand. The gene contains 2 exons. This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis SelR_93 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
A grade B SECIS element was predicted in the 3'UTR region of BLSH010331723.1, but on the forward strand. Since our prediction lies on the reverse strand, this SECIS element could not be used. Seblastian predicted no selenoprotein coding sequences. Finally, the T-Coffee alignment showed no selenocysteine.


Selenoprotein I (SelI)

This protein is one of the more recently discovered selenoproteins. It is characterized by a highly conserved CDP-alcohol phosphatidyltransferase domain that is commonly found in choline phosphotransferases (CHPT1) and choline/ethanolamine phosphotransferases (CEPT1). Moreover, no Cys forms with homology to the SelI C-terminal extension have been found.
After analysing our results, we could predict one possible SelI selenoprotein in our organism. As detailed below, we did not find a Sec residue or a Seblastian prediction in either of our scaffolds, but we did find a SECIS element for one of them. We assume that this SECIS prediction is a false positive, so we discard the possible selenoprotein. This would mean the loss, in our organism, of one of the most conserved families of selenoproteins in vertebrates.


The SelI protein is located in two scaffolds: scaffold BLSH010070464.1 between positions 1113034 and 1148690 in the reverse strand, and scaffold BLSH010528819.1 between positions 554328 and 595850 in the forward strand. The two scaffolds contain 7 exons.
These proteins were predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis SelI protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
A grade A SECIS element was predicted in the 3'UTR region of scaffold BLSH010070464.1.


Several SECIS elements were predicted in scaffold BLSH010528819.1. However, none of the SECIS elements on the positive strand lay 3' of the protein, because their coordinates did not lie beyond the last exon.

In both cases, there were no positive Seblastian results and the T-Coffee alignments did not show any selenocysteine.


Selenoprotein K and Selenoprotein S (SelK and SelS)

SelK and SelS have different sequences, but they are studied together because they share a similar topology: a single transmembrane domain in the NH2-terminal sequence, a glycine-rich (G-rich) segment and a characteristic location of the Sec residue in the COOH-terminal end of the protein. They are found in the ER membrane. Their exact function is unknown, but both have been suggested to take part in the ER-associated degradation (ERAD) of misfolded proteins.
After analysing our results, we could not predict a SelK or SelS selenoprotein in our organism. As detailed below, we did not find a Sec residue, SECIS element or Seblastian prediction for either of the two proteins. This would mean the loss, in our organism, of two of the most conserved families of selenoproteins in vertebrates.



SelK
We attempted to predict this protein by blasting the Glandirana rugosa genome against the Xenopus tropicalis SelK protein found in SelenoDB. However, we could not obtain a tblastn result, so we assumed that this protein is not found in Glandirana rugosa.

SelS
The SelS protein location is in the scaffold BLSH010486305.1 between positions 201803 and 201970 in the forward strand. The gene did not contain any exons. This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis SelS protein found in SelenoDB. To find the gene structure we analyzed the exonerate file. Finally, neither the SECIS element nor Seblastian nor t-coffee results were obtained.

Selenoprotein M (SelM)

SelM is an ER-resident thiol-disulfide oxidoreductase that is highly expressed in the brain and has neuroprotective properties, preventing oxidative damage induced by H2O2. In humans, it seems to be a protective factor in Alzheimer's disease.
After analysing our results, we could not predict a SelM selenoprotein in our organism. As detailed below, we did not find a Sec residue, SECIS element or Seblastian prediction for our protein. This would mean the loss, in our organism, of one of the most conserved families of selenoproteins in vertebrates.



SelM

The SelM protein is located in scaffold BLSH010143868.1 between positions 27817 and 31294 in the reverse strand. The gene does not contain exons.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis SelM protein found in SelenoDB. To find the gene structure we analyzed the exonerate file.
Finally, neither the SECIS element nor Seblastian nor t-coffee results were obtained.


Selenoprotein N (SelN)

The selenoprotein N (SelN) is an ancestral selenoprotein found in all the vertebrates. This eukaryotic selenoprotein is located basically in the ER membrane. It has a high expression in fetal and growing muscular tissue, skeletal muscle, heart, lung and placenta.
After analysing our results, we could not predict a SelN selenoprotein in our organism. As detailed below, we did not find a Sec residue, a usable SECIS element or a Seblastian prediction for our protein. This would mean the loss, in our organism, of one of the most conserved families of selenoproteins in vertebrates.


SelN

The SelN protein location is in the scaffold BLSH010384004.1 between positions 1492669 and 1494354 in the forward strand. The gene contains 1 exon.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis SelN protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
Several grade B SECIS elements were predicted in the scaffold. However, none of the SECIS elements on the positive strand lay 3' of the protein, because their coordinates did not lie beyond the last exon.
Finally, the T-Coffee alignment showed no selenocysteine.


Selenoprotein O (SelO)

The SelO family is a family of ancestral selenoproteins found in all vertebrates. These isozymes are localized to mitochondria and expressed in different tissues. Their expression is affected by selenium deficiency, suggesting that SelO has a high priority for selenium supply. Although most eukaryotes and bacteria have a single copy of SelO, many metazoans have duplicated it.
Additionally, this phenomenon is also observed in specific lineages of bony fish such as zebrafish.
After analysing our results, we could predict one possible SelO selenoprotein in our organism. As detailed below, we did not find a Sec residue, SECIS element or Seblastian prediction in two of our 3 proteins. The third has a SECIS element, but no Seblastian prediction and no Sec residue. This suggests that the SECIS prediction may be wrong and, therefore, that we do not have any selenoprotein in this family.


SelO_86

The SelO_86 protein location is in the scaffold BLSH010113357.1 between positions 425632 and 458181 in the forward strand. The gene contains 1 exon.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis SelO_86 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
A grade A SECIS element was predicted in the 3'UTR region. However, Seblastian predicted no selenoprotein coding sequences, and we did not obtain T-Coffee results.


SelO_87

The SelO_87 protein location is in the scaffold BLSH010391438.1 between positions 365955 and 397742 in the forward strand. The gene does not contain exons.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis SelO_87 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
Finally, neither the SECIS element nor Seblastian nor t-coffee results were obtained.


SelO_88

The SelO_88 protein location is in the scaffold BLSH010113357.1 between positions 425710 and 425883 in the forward strand. The gene does not contain exons.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis SelO_88 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
Finally, neither the SECIS element nor Seblastian nor t-coffee results were obtained.


Selenoprotein P (SelP)

SelP is a selenoprotein with multiple Sec residues per protein subunit, 10 Sec residues in all, and is present in human plasma. The main role of SelP is the transport and delivery of selenium to the tissues. An additional role may be to serve as a heavy metal chelator or antioxidant.
After analysing our results, we could not predict a SelP selenoprotein in our organism. As detailed below, we did not find a Sec residue, SECIS element or Seblastian prediction. This would mean the loss of one of the most conserved families of selenoproteins in our organism.


SelP_89
The SelP_89 protein location is in the scaffold BLSH010100113.1 between positions 3045 and 5427 in the forward strand. The gene does not contain exons. This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis SelP_89 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file. Finally, neither the SECIS element nor Seblastian nor t-coffee results were obtained.

SelP_90

The SelP_90 protein location is in the scaffold BLSH010213157.1 between positions 19583 and 20122 in the forward strand. The gene does not contain exons.
This protein was predicted by blasting the Glandirana rugosa genome against the Xenopus tropicalis SelP_90 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
Finally, we could not search for the SECIS element because there was no comparison between the protein and the indexed genome. We did not obtain T-Coffee results either.


Rdx family

Selenoprotein W (SelW)

SelW is a small 9-kDa selenoprotein localized in the cytosol and expressed at high levels in muscle and brain. It belongs to the stress-related group of selenoproteins, as its expression is highly regulated by the availability of Se in the diet. Several SelW homologues have been observed across non-mammalian vertebrates. Phylogenetic analysis revealed a distinct group of proteins, SelW2. SelW2 has been described as a selenoprotein in bony fishes, but also in frog and in elephant shark, which suggests that it was part of the ancestral vertebrate selenoproteome.
After analysing our results, we could not predict a SelW selenoprotein in our organism. As detailed below, we did not find a Sec residue, SECIS element or Seblastian prediction. This would mean the loss of one of the most conserved families of selenoproteins in our organism.


SelW00

The SelW00 protein location is in the scaffold BLSH010113654.1 between positions 875118 and 875198 in the reverse strand. The gene does not contain exons.
This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis SelW00 protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
Finally, we could not search for the SECIS element because there was no comparison between the protein and the indexed genome. We did not obtain T-Coffee results either.


SelW99
The SelW99 protein location is in the scaffold BLSH010260171.1 between positions 426828 and 430790 in the forward strand. The gene does not contain exons. This protein was predicted blasting Glandirana rugosa genome against the Xenopus tropicalis SelW99 protein found in SelenoDB. To find the gene structure we analyzed the exonerate file. Finally, neither the SECIS element nor Seblastian nor t-coffee results were obtained.

Selenoprotein T (SelT)

SelT is predominantly localized to the ER and Golgi and is ubiquitously expressed both during embryonic development and in adult tissues. Knockdown of SelT in mouse fibroblasts leads to decreased expression of extracellular matrix genes involved in cell structure organization and alters cell adhesion properties. In addition, the loss of SelT resulted in the upregulation of SelW. This protein is also important in neuroendocrine processes, as it contributes to the homeostasis of the intracellular calcium and secretion of hormones.
After analysing our results, we could predict two different SelT selenoproteins in our organism. As detailed below, we found Sec residues in both of our scaffolds. Together with the SECIS elements and the Seblastian prediction, this indicates that SelT is a conserved selenoprotein in our organism. Furthermore, we can postulate that the gene has undergone a duplication in our organism, because we find two different scaffolds matching the same initial Xenopus tropicalis protein.


The SelT protein location is in two scaffolds, first in the scaffold BLSH010368923.1 between positions 61138 and 61662, and second BLSH010283353.1 between positions 692580 and 692314, in the forward and reverse strand, respectively. The two scaffolds contained 1 exon.
These proteins were predicted blasting Glandirana rugosa genome against the Xenopus tropicalis SelT protein found in SelenoDB. To find the gene structure, we analyzed the exonerate file.
In the case of BLSH010368923.1, a SECIS element of grade A was predicted in the 3'UTR region. In addition, the Seblastian result concurred with the SECIS prediction. Finally, in the t-coffee result we observed a sec-homologous cysteine between Glandirana rugosa and Xenopus tropicalis.


In the case of BLSH010283353.1, a grade B SECIS element was predicted in the 3'UTR region. However, Seblastian predicted no selenoprotein coding sequences.
Finally, the T-Coffee alignment showed 4 selenocysteines; one of them was a Sec-homologous selenocysteine and another was a cysteine converted into a selenocysteine.


Selenoprotein U (SelU)

The selenoprotein U family (SelU family) is composed of three members (SelU1, SelU2 and SelU3). SelU1 is the only member of the family that is part of the ancestral selenoproteome. A previous phylogenetic analysis of the Sec- and Cys-containing forms of the SelU family suggested that all Sec-containing SelU sequences belong to the SelU1 group. Specifically, in mammals there are three Cys-containing SelU proteins (SelU1-3), while in some fishes there are three Sec-containing SelU proteins.
After analysing our results, we could not predict a SelU selenoprotein in our organism. As detailed below, we did not find a Sec residue, even though SECIS elements of different grades were predicted.


SelU96

SelU96 was located on scaffold BLSH010317410.1, between positions 95126 and 107060 on the forward strand. No exons were predicted for this gene.
This protein was predicted by running tblastn with the Xenopus tropicalis SelU96 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
No SECIS, Seblastian or t-coffee results were obtained.


SelU97

SelU97 was located on scaffold BLSH010509110.1, between positions 1433442 and 1438867 on the forward strand. The gene contains 3 exons.
This protein was predicted by running tblastn with the Xenopus tropicalis SelU97 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
Several SECIS elements of grades B and C were predicted. However, none of those on the forward strand was located 3' of the coding sequence, because their coordinates were not greater than those of the last exon.
Finally, the t-coffee alignment showed no selenocysteine.
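
The coordinate criterion we applied to the SECIS predictions can be written as a small Bash check. This is a sketch under assumptions: the function name secis_is_downstream is hypothetical, gene and SECIS coordinates must be in the same coordinate system, and the SECIS positions in the example are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: is a predicted SECIS located 3' of the gene?
# $1 = strand (+ or -), $2 = gene start, $3 = gene end (start < end),
# $4 = SECIS start, $5 = SECIS end (start < end).
# Returns success (0) if the SECIS lies downstream of the last exon.
secis_is_downstream() {
    local strand=$1 gene_start=$2 gene_end=$3 secis_start=$4 secis_end=$5
    if [[ $strand == "+" ]]; then
        # forward strand: the SECIS must begin after the last exon ends
        (( secis_start > gene_end ))
    else
        # reverse strand: the 3' side has the smaller coordinates
        (( secis_end < gene_start ))
    fi
}

# Gene coordinates of SelU97 from the text; SECIS coordinates are invented
if secis_is_downstream + 1433442 1438867 1436000 1436070; then
    echo "SECIS is 3' of the gene"
else
    echo "SECIS is not 3' of the gene"
fi
```

On the reverse strand the test is inverted, which is why for Fep15 we required the SECIS coordinates to be smaller than those of the exon.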


SelU98

SelU98 was located on scaffold BLSH010465673.1, between positions 168103 and 228546 on the reverse strand. The gene contains 5 exons.
This protein was predicted by running tblastn with the Xenopus tropicalis SelU98 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
A grade A SECIS element was predicted in the 3'UTR region. However, Seblastian did not report any selenoprotein coding sequence.
Finally, the t-coffee alignment showed no selenocysteine.


15-kDa Selenoprotein (Sel15)

Sel15

Selenoprotein 15 (Sel15) is an ancestral selenoprotein found in all vertebrates. It was identified in humans in 1998 by Gladyshev et al., but its specific function remains unknown. Sel15 levels have been shown to be especially responsive to selenium addition.
Our analysis predicted the Sel15 selenoprotein in the Glandirana rugosa genome. The predicted protein contains a Sec residue, and a SECIS element was found in the 3'UTR region of the gene, in agreement with the Seblastian output. Because Sel15 is one of the common vertebrate selenoproteins and was already well annotated in other databases, it was predicted clearly using Xenopus tropicalis as the protein source.


Sel15 was located on scaffold BLSH010355584.1, between positions 1428416 and 1458525 on the forward strand. The gene contains 3 exons.
This protein was predicted by running tblastn with the Xenopus tropicalis Sel15 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
A grade A SECIS element was predicted in the 3'UTR region, and Seblastian also returned a result.
Finally, the t-coffee alignment also showed a selenocysteine.
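
The exon count and gene span reported for each protein were read from the exonerate file. A minimal Bash sketch of that step is shown here; the function name gene_summary is hypothetical, and it assumes a tab-separated GFF file with the feature type in column 3, start and end in columns 4 and 5, and strand in column 7.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise the exon lines of a GFF file ($1).
# Prints "<number of exons> <smallest start> <largest end> <strand>",
# or "0" if the file has no exon lines.
gene_summary() {
    awk -F'\t' '$3 == "exon" {
        s = $4 + 0; e = $5 + 0             # force numeric comparison
        if (n == 0 || s < min) min = s
        if (n == 0 || e > max) max = e
        if (n == 0) strand = $7            # strand of the first exon line
        n++
    }
    END { if (n) { print n, min, max, strand } else { print 0 } }' "$1"
}
```

It would be called as gene_summary followed by the path of one exonerate GFF file. The coordinates it prints are relative to the extracted genomic window, not to the whole scaffold.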


Fep15

Fep15 is a 15 kDa selenoprotein-like protein present in frog.
Fep15 was located on scaffold BLSH010059689.1, between positions 155503 and 182936 on the reverse strand. The gene contains 1 exon.
This protein was predicted by running tblastn with the Xenopus tropicalis Fep15 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
A grade B SECIS element was predicted. However, it was not located 3' of the coding sequence: on the reverse strand its coordinates should be smaller than those of the exon, and they were not.
Finally, the t-coffee alignment showed no selenocysteine.

Therefore, it is not a selenoprotein.


Known Selenoproteins Machinery

PSTK (Phosphoseryl tRNA Kinase)

This protein phosphorylates Ser-tRNA[Ser]Sec to produce phosphoseryl-tRNA[Ser]Sec, the intermediate from which Sec-tRNA[Ser]Sec is synthesized. For years the identity of this kinase remained unknown; however, in silico analyses of archaeal and eukaryotic genomes, searching for novel kinase-like genes present within genomes containing the Sec incorporation genes, revealed a candidate phosphoseryl-tRNA[Ser]Sec kinase gene. Phosphoseryl-tRNA[Ser]Sec kinase was subsequently cloned and characterized as a protein that phosphorylates the seryl moiety of seryl-tRNA[Ser]Sec in the presence of ATP and Mg2+. Moreover, the function and homology of this protein are conserved across archaea and eukaryotes that synthesize selenoproteins, which suggests that it plays an important role in selenoprotein biosynthesis and/or regulation.
Our analysis predicted the PSTK machinery protein in the Glandirana rugosa genome. The predicted protein contains a Sec residue, but no plausible SECIS elements were found in the 3'UTR region of the gene, in agreement with the negative Seblastian output.

PSTK was located on scaffold BLSH010239871.1, between positions 271099 and 1458525 on the forward strand. The gene contains 5 exons.
This protein was predicted by running tblastn with the Xenopus tropicalis PSTK protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
We found 9 SECIS elements of grades B and C. However, none of those on the forward strand was located 3' of the coding sequence, because their coordinates were not greater than those of the last exon.
In the t-coffee alignment we observed 4 selenocysteines, one of which corresponded to a cysteine converted into selenocysteine.

Therefore, we consider it a selenoprotein, even though neither the SECIS prediction nor Seblastian gave a positive result.

eEFsec (Sec-specific translation elongation factor)

eEFSec recruits Sec-tRNA[Ser]Sec and inserts the Sec amino acid into the protein. Sec incorporation requires a selenocysteyl-tRNA[Ser]Sec-specific elongation factor, eEFSec in eukaryotes and SELB (or EFSec) in prokaryotes, which suppresses UGA codons that are upstream of Sec insertion sequence (SECIS) elements bound by SECIS-binding protein 2 (SBP2).
Our analysis predicted the eEFsec machinery protein in the Glandirana rugosa genome. The predicted proteins do not contain any Sec residue, although SECIS elements were predicted for both genes. The negative Seblastian output for both proteins suggests that these SECIS predictions may be false positives.




eEFsec05

eEFsec05 was located on scaffold BLSH010363128.1, between positions 553174 and 553299 on the forward strand. The gene contains 1 exon.
This protein was predicted by running tblastn with the Xenopus tropicalis eEFsec05 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
Five SECIS elements were predicted, but none of those on the forward strand was located 3' of the coding sequence, because their coordinates were not greater than those of the exon.
In this case, the t-coffee alignment showed an alanine at the position of the Xenopus tropicalis selenocysteine, that is, the selenocysteine has been lost.


eEFsec06

eEFsec06 was located on scaffold BLSH010494947.1, between positions 768085 and 939193 on the forward strand. The gene contains 5 exons.
This protein was predicted by running tblastn with the Xenopus tropicalis eEFsec06 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
Eleven SECIS elements were predicted, 5 of them on the forward strand. Nevertheless, none of those on the forward strand was located 3' of the coding sequence, because their coordinates were not greater than those of the last exon.
The t-coffee alignment showed no selenocysteine.
Therefore, it is not a selenoprotein.


SecS (Sec synthase)

Selenocysteine synthase (SecS) is the enzyme that incorporates the active form of selenium into the phosphorylated Ser-tRNA[Ser]Sec to create the final Sec-tRNA[Ser]Sec.
Mutations in SecS genes are associated with the development of autosomal-recessive progressive cerebellocerebral atrophy, and these phenotypes could be partially reproduced in the corresponding KO animal models.
Our analysis predicted two SecS machinery proteins in the Glandirana rugosa genome. One of them contained a Sec residue and a SECIS element in the 3'UTR region, which did not agree with the negative Seblastian output. Both predictions map to the same scaffold and the same positions. We deduce that the gene is duplicated in Xenopus tropicalis but present as a single copy in our organism, so one of the copies may have been deleted.



SecS79

SecS79 was located on scaffold BLSH010085618.1, between positions 73039 and 73184 on the forward strand, where one exon was predicted.
This protein was predicted by running tblastn with the Xenopus tropicalis SecS79 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
A grade A SECIS element was found in the 3'UTR region. However, Seblastian did not report any selenoprotein coding sequence.
In the t-coffee alignment, the protein predicted on BLSH010085618.1 did not contain a selenocysteine.

SecS79 does not contain a selenocysteine, but it still has a SECIS element. This suggests that SecS79 is no longer a selenoprotein.

SecS80

SecS80 was located on scaffold BLSH010085618.1, between positions 73039 and 73184 on the forward strand. One exon was predicted in this case.
This protein was predicted by running tblastn with the Xenopus tropicalis SecS80 protein from SelenoDB against the Glandirana rugosa genome. The gene structure was taken from the exonerate output.
A grade B SECIS element was found in the 3'UTR region. However, Seblastian did not report any selenoprotein coding sequence.
In the t-coffee alignment, the protein predicted on BLSH010085618.1 contained a selenocysteine that was not homologous to any selenocysteine in the Xenopus tropicalis protein.
In conclusion, this protein contains a selenocysteine and a SECIS element, so we think that SecS80 is a selenoprotein.

Conclusion

Selenoproteins have an important function in protecting against oxidation. They are characterized by a selenocysteine residue (Sec) in their sequence, the 21st amino acid, which contains a selenium atom. However, the selenoproteome differs between species and is not easily identified, because the Sec residue is encoded by UGA, normally a stop codon, which leads to a low recognition rate by some bioinformatic tools.
For this reason, the objective of this project was to identify the selenoproteins, and the machinery required for their synthesis, present in the Glandirana rugosa genome. To do so, the selenoproteins of Xenopus tropicalis, the closest relative of our organism, were used for a homology-based prediction.
After analysing the alignment of each query protein with its predicted protein, we concluded that in Glandirana rugosa there are:

  • Selenoproteins: SecS80, PSTK, Sel15, SelT, SelI, GPx72, GPx73, TR02, GPx69, DI62.
  • No selenoproteins: Fep15, TR04, SelK, GPx67, GPx68, GPx70, GPx71, TR01, TR03, TR04, DI63, DI64, MsrA74, MsrA75, MsrA76, SelR91, SelR92, SelR93, SelI, SelK, SelK, SelS, SelM, SelN, SelO_86, SelO_87, SelO_88, SelP_89, SelP_90, SelW00, SelW99, SelU96, SelU97, SelU98, Fep15, eEFsec05, eEFsec06, SBP2_77, SBP2_78, SecS79.
  • Proteins that do not exist in Glandirana rugosa: TR04, FrnE, SelK
Many of the predicted proteins did not start with methionine, which means that the beginning of the protein was not predicted correctly. This is probably because Glandirana rugosa, although it belongs to the class Amphibia like Xenopus, is an endemic frog of Japan and may not share many characteristics with it. Another possibility is that the Xenopus proteins were not well annotated in the database; in some cases it may be better to base the prediction on human selenoproteins, which are better annotated.
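
Predictions that do not start with methionine can be listed with a few lines of Bash. This is a sketch only: the function name no_start_met is hypothetical, and it assumes a protein FASTA file in which the record name is the first word of each header line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the FASTA records in file $1 whose sequence
# does not start with methionine (M). Sequences may span several lines.
no_start_met() {
    awk '/^>/ { name = substr($1, 2); first = 1; next }
         first && length($0) {
             # first non-empty sequence line of the current record
             if (substr($0, 1, 1) != "M") print name
             first = 0
         }' "$1"
}
```

Running it on the translated predictions gives the names of the proteins whose N-terminus should be checked by hand.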
During our analysis we encountered several limitations. First, the Xenopus tropicalis proteins were not correctly annotated, so we could not relate our results more closely to other species and to the literature. Furthermore, when automating the process we had several problems selecting the best hit among the contigs/scaffolds of the Glandirana rugosa genome for each protein. Moreover, we had to convert the coordinates of our exons manually from relative to absolute positions. This could be improved in future projects.
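
The manual conversion from relative to absolute coordinates could be automated along these lines. The sketch assumes that the exon coordinates are 1-based positions inside the extracted window and that the window offset is the number of bases skipped before it (the start value given to fastasubseq); the function name gff_to_absolute is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: shift GFF start/end (columns 4 and 5) by the offset
# of the extracted window. $1 = offset; GFF on stdin, shifted GFF on stdout.
gff_to_absolute() {
    awk -F'\t' -v OFS='\t' -v off="$1" '
        /^#/ || NF < 5 { print; next }     # keep comments and short lines as-is
        { $4 += off; $5 += off; print }'
}

# Toy exon line; the offset and coordinates are invented for illustration
printf 'genomic\texonerate\texon\t9288\t9400\t.\t+\t.\t.\n' | gff_to_absolute 1378416
```

With an offset of 1378416, an exon at relative positions 9288 to 9400 is reported at 1387704 to 1387816 on the scaffold.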
In addition, the Glandirana rugosa genome is severely fragmented. In many cases we could not predict the proteins correctly because parts of them were missing. As a consequence, several proteins were split across different scaffolds, and there are insertions and deletions that can cause frameshifts.
Another issue was the interpretation of the results. We had to set our own exclusion criteria for choosing the best and most significant tblastn hits. We did not take into account the identity of each hit, but we did consider its e-value and whether or not exons were predicted on each scaffold.
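
The e-value criterion can be made explicit with a short Bash sketch. It assumes the tab-separated tblastn output produced with -outfmt 6 (subject in column 2, e-value in column 11) and GNU sort for the -g option; the function name best_hit_per_scaffold is hypothetical and this is not the exact procedure we followed by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each subject scaffold (column 2) keep the hit with
# the lowest e-value (column 11) from a tblastn -outfmt 6 file ($1), then
# list the scaffolds from most to least significant.
best_hit_per_scaffold() {
    awk -F'\t' '{
        e = $11 + 0                        # numeric value, also for 1e-50 style
        if (!($2 in best) || e < best[$2]) { best[$2] = e; line[$2] = $0 }
    }
    END { for (s in line) print line[s] }' "$1" | sort -t $'\t' -k11,11g
}
```

Percentage identity (column 3) could be added as a second filter, which is something we did not do in this project.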
A further limitation was related to possible sequencing problems in the genome: in some regions we found runs of N instead of the corresponding nucleotide (A, C, T or G), which is probably why parts of some predicted proteins were missing.
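
How much of an extracted region consists of unknown bases can be checked before running the prediction. A minimal Bash sketch, assuming a nucleotide FASTA file as input; the function name n_content is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count unknown bases (N or n) in a nucleotide FASTA ($1).
# Prints "<number of N> <total number of bases>".
n_content() {
    awk '!/^>/ {
             total += length($0)           # count before gsub modifies the line
             n += gsub(/[Nn]/, "")         # gsub returns the number of N removed
         }
         END { print n + 0, total + 0 }' "$1"
}
```

A high proportion of N in a window would be a warning that missing parts of the predicted protein come from the assembly and not from the biology.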
Despite these obstacles, in this project we identified and annotated the selenoproteins, cysteine homologues and machinery proteins of the Glandirana rugosa genome. This information could be useful for studies of selenoproteins in other species and for the study of selenoproteins in general. Finally, we believe that our results should be validated with molecular studies.

Annexe

gr2in="/mnt/NFS_UPF/soft/genomes/2021/Glandirana_rugosa/genome.fa"
genin="/mnt/NFS_UPF/soft/genomes/2021/Glandirana_rugosa/genome.fa"
indexgr2in="/mnt/NFS_UPF/soft/genomes/2021/Glandirana_rugosa/genome.index"
mkdir -p results
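# we are in $HOME
# Mask selenocysteine (U) as X in the query sequences; FASTA header lines are left untouched.
# sed -i edits in place (redirecting a file onto itself would empty it).
for p in ./Proteins.fa/*.fa ; do
    sed -i '/^>/!s/U/X/g' "$p"
done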
for f in ./Proteins.fa/*.fa ; do
    # f -> ./Proteins.fa/query.fa
    # ${string##substring} deletes the longest match of substring from the front of string.
    queryname=${f##*/}
    # ${string%%substring} deletes the longest match of substring from the back of string.
    queryname=${queryname%%".fa"}
    # queryname -> query

    tblastdir="./tblast"
    scaffdir="./results/$queryname/scaffolds"
    exodir="./results/$queryname/exonerate"
    fastaseqdir="./results/$queryname/fastaseq"
    t_coffeedir="./results/$queryname/coffee"
    genwisedir="./results/$queryname/genwise"
    seblastdir="./results/$queryname/seblastian"
    # -p creates parent directories and does not fail if a directory already exists
    mkdir -p "$tblastdir" "$scaffdir" "$exodir" "$fastaseqdir" \
             "$t_coffeedir" "$genwisedir" "$seblastdir"

    blastfile="$tblastdir/$queryname.blast"
    tblastn -query "$f" -db "$gr2in" -outfmt 6 -evalue 0.01 -out "$blastfile"
    # Columns of -outfmt 6:
    #  1 query acc.ver    2 subject acc.ver    3 % identity    4 alignment length
    #  5 mismatches       6 gap opens          7 q. start      8 q. end
    #  9 s. start        10 s. end            11 evalue       12 bit score
    # Example line:
    # SPP00002862_2.0 BLSH010355584.1:subseq(1420000,35000) 47.059 17 7 1 244 260 9288 9244 7.4 18

    hit_offset=50000    # flanking sequence kept upstream of the first hit
    window=100000       # size of the genomic region extracted around the hits
    cut -f2 "$blastfile" | sort -u | while read -r scaffold; do
        # smallest and largest subject coordinate over all hits on this scaffold
        start=$(awk -F'\t' -v s="$scaffold" '$2 == s { print $9; print $10 }' "$blastfile" | sort -n | head -1)
        end=$(awk -F'\t' -v s="$scaffold" '$2 == s { print $9; print $10 }' "$blastfile" | sort -n | tail -1)
        hit=$(echo "$scaffold" | cut -f 2 -d ' ')

        if [[ $start -gt $hit_offset ]]; then
            begin=$((start - hit_offset))
        else
            begin=0
        fi
        fastfechfile="${scaffdir}/${hit}.fa"
        fastagenomfile="${scaffdir}/genomic_${hit}.fa"
        exoneratefile="${exodir}/${hit}.exonerate.gff"
        fastafetch "$genin" "$indexgr2in" "$hit" > "$fastfechfile"
        # fastasubseq takes a start and a length; truncate the window at the end of the scaffold
        scafflen=$(fastalength "$fastfechfile" | cut -f 1 -d ' ')
        length=$window
        if [[ $((begin + length)) -gt $scafflen ]]; then
            length=$((scafflen - begin))
        fi
        fastasubseq "$fastfechfile" "$begin" "$length" > "$fastagenomfile"
        # quote the sed expressions so that the shell does not expand the asterisk
        sed 's/\*/N/g' "$fastagenomfile" > "${seblastdir}/${hit}.fa"
        exonerate -m p2g --showtargetgff -q "$f" -t "$fastagenomfile" | grep -w exon > "$exoneratefile"
        fastaseqfromGFF.pl "$fastagenomfile" "$exoneratefile" > "${fastaseqdir}/fastaseq_${hit}.fa"
        fastatranslate "${fastaseqdir}/fastaseq_${hit}.fa" -F 1 > "${fastaseqdir}/fastaseq_${hit}.aa.fa"
        sed 's/\*/X/g' "${fastaseqdir}/fastaseq_${hit}.aa.fa" > "${fastaseqdir}/fastaseq_${hit}.aa.x.fa"
        t_coffee "$f" "${fastaseqdir}/fastaseq_${hit}.aa.x.fa" > "${t_coffeedir}/${queryname}_${hit}.tc.fa"
        # remove the extra files t_coffee leaves in the working directory
        rm -f ./*.html ./*.aln
        genewise -pep -pretty -cdna -gff "$f" "$fastagenomfile" > "${genwisedir}/${queryname}_${hit}.gw.fa"
    done
done
exit

References

  1. Mangiapane E, Pessione A, Pessione E. Selenium and selenoproteins: an overview on different biological systems. Curr Protein Pept Sci. 2014 Sep 30;15(6):598–607.
  2. Vindry C, Ohlmann T, Chavatte L. Translation regulation of mammalian selenoproteins. Biochim Biophys Acta - Gen Subj. 2018 Nov 1;1862(11):2480–92.
  3. Clark DP, Pazdernik NJ. Protein Synthesis. Mol Biol. 2013 Jan 1;e250–5.
  4. Bellinger FP, Raman AV, Reeves MA, Berry MJ. Regulation and function of selenoproteins in human disease. Biochem J. 2009 Aug 15;422(1):11.
  5. Labunskyy VM, Hatfield DL, Gladyshev VN. Selenoproteins: Molecular Pathways and Physiological Roles. Physiol Rev. 2014 Jul 1;94(3):739.
  6. Mariotti M, Ridge PG, Zhang Y, Lobanov AV, Pringle TH, Guigo R, et al. Composition and Evolution of the Vertebrate and Mammalian Selenoproteomes. PLoS One. 2012 Mar 30;7(3):e33066.
  7. Lu J, Holmgren A. Selenoproteins. J Biol Chem. 2009 Jan 9;284(2):723–7.
  8. Jiang L, Ni J, Liu Q. Evolution of selenoproteins in the metazoan. BMC Genomics. 2012 Sep 3;13(1):1–15.
  9. Han SJ, Lee BC, Yim SH, Gladyshev VN, Lee SR. Characterization of Mammalian Selenoprotein O: A Redox-Active Mitochondrial Protein. PLoS One. 2014 Apr 21;9(4):e95518.
  10. AmphibiaWeb - Glandirana rugosa [Internet]. [cited 2021 Nov 29]. Available from: https://amphibiaweb.org/cgi/amphib_query?query_src=aw_lists_genera_&where-genus=Glandirana&where-species=rugosa
  11. Khonsue W, Matsui M, Hirai T, Misawa Y. Age Determination of Wrinkled Frog, Rana rugosa with Special Reference to High Variation in Postmetamorphic Body Size (Amphibia: Ranidae). Zoolog Sci. 2001 May 1;18(4):605–12.
  12. Takase M. Differences in genetic backgrounds affecting gonadal differentiation between two local populations of the Japanese wrinkled frog (Rana rugosa). Anat Embryol. 1998;198(2):141–8.
  13. Hirai T, Matsui M. Myrmecophagy in a Ranid Frog Rana rugosa: Specialization or Weak Avoidance to Ant Eating. Zoolog Sci. 2000;17(4):459–66.
  14. Jameson DL, Okada Y. Fauna Japonica/Anura (Amphibia). Copeia. 1968 Jun 5;1968(2):425.
  15. Rugosa rugosa (wrinkled frog) [Internet]. [cited 2021 Nov 30]. Available from: https://www.cabi.org/isc/datasheet/121512#tosummaryOfInvasiveness
  16. Ogata M, Lambert M, Ezaz T, Miura I. Reconstruction of female heterogamety from admixture of XX-XY and ZZ-ZW sex-chromosome systems within a frog species. Mol Ecol. 2018 Oct 1;27(20):4078–89.

About us

We are fourth-year Human Biology students at Pompeu Fabra University. This is our project on the selenoproteins of Glandirana rugosa for the Bioinformatics course.

Yaiza Barrera

Human Biology student

Yichao Hong

Human Biology student

Laia Pareras

Human Biology student

Carla Sandi

Human Biology student
