In search of selenoproteins in the world's smallest mammal
-Craseonycteris thonglongyai-
INTRODUCTION
SELENOPROTEINS
Selenium is an essential trace element in many organisms, with roles in development, immune function and other cellular processes; it has also been reported to delay the progression to AIDS in HIV-positive patients. Deficiency of this element is associated with heart disease, cancer, male infertility and other disorders (Rayman M et al, 2000).
The amino acid selenocysteine (Sec or U), known as the “21st amino acid”, is one of the main forms of Se in the cell, and it is encoded by the UGA codon. Proteins that contain Sec in their structure are known as selenoproteins (Vindry C et al, 2018). Sec is a cysteine analogue that has selenium (Se) instead of sulfur (S) in its side chain, and it can be found in the three domains of life (Archaea, Bacteria and Eukarya), although not in every species (Mariotti M et al, 2008). In species that use it, Sec is inserted during translation into the nascent polypeptide in response to a UGA codon, which normally signals the end of translation. This recoding requires a cis-acting Sec insertion sequence (SECIS) element (Labunskyy V et al, 2014).
Regarding their functions, selenoproteins have been described as taking part in the breakdown of organic compounds to obtain energy, and also in antioxidant protection, immune responses and thyroid hormone metabolism.
There are two key RNA components for the translational recoding of the UGA codon as selenocysteine. The first is the SECIS element, present in the 3’-UTR (untranslated region) of all eukaryotic selenoprotein mRNAs, which is sufficient to recode UGA as Sec. Its function is to promote assembly of the messenger ribonucleoprotein particle (mRNP) and so allow selenocysteine insertion (Vindry C et al, 2018).
The second essential component is the Sec tRNA (tRNA[Ser]Sec), which has an anticodon complementary to the UGA codon. Sec is unusual in that it is synthesized on its own tRNA: tRNA[Ser]Sec is first aminoacylated with serine (Ser) by seryl-tRNA synthetase (SerS) to generate seryl-tRNA[Ser]Sec, which is then converted to Sec-tRNA[Ser]Sec. The pathway of Sec biosynthesis was established through genomic, structural and molecular comparisons together with functional studies (Labunskyy V et al, 2014).
The main elements involved in selenoprotein biosynthesis are:
        -Sec tRNA (tRNA[Ser]Sec):
As noted above, this tRNA is a key molecule in selenoprotein biosynthesis. Its sequence has features that distinguish it from other tRNAs. Sec tRNA is longer (90 nucleotides) than other tRNAs (approximately 75 nucleotides), has fewer modified bases and has a long variable arm. It also has a long acceptor stem and a long D-stem of up to 6 base pairs, instead of the 3-4 base pairs of other tRNAs (Labunskyy V et al, 2014).
Sec tRNA is encoded in humans by the gene TRU-TCA1-1, which has upstream regulatory elements and a downstream Pol III termination signal. The gene is located on chromosome 19q13.2 to 19q13.3, and its expression is modulated by three upstream regions: a TATA box motif at nucleotide -30, a proximal sequence element at -70 and a distal sequence element at -200 (Vindry C et al, 2018).
There are two isoforms of the Sec tRNA, which differ in the modification at position 34: 5-methoxycarbonylmethyluridine (mcm5U) and 5-methoxycarbonylmethyl-2’-O-methyluridine (mcm5Um) (Labunskyy V et al, 2014). These modifications in the anticodon loop can also affect the interaction with the ribosome. Housekeeping selenoproteins such as TR1 and TR3 are synthesized with the mcm5U isoform, while stress-related selenoproteins (GPx1, GPx3, MsrB1, SelT and SelW) are synthesized with mcm5Um. However, other selenoproteins (GPx4 and selenoprotein P) can be synthesized with both isoforms (Labunskyy V et al, 2014).
        -Seryl-tRNA synthetase:
The first step in Sec biosynthesis is the aminoacylation of tRNA[Ser]Sec with serine, which provides the backbone of the selenocysteine structure. This indicates that tRNA[Ser]Sec carries identity elements for Ser but not for Sec; they are found in the discriminator base and the long variable arm, both essential for aminoacylation.
The conversion of seryl-tRNA[Ser]Sec to selenocysteyl-tRNA[Ser]Sec is catalyzed by the enzyme Sec synthase (SecS), which incorporates selenium from selenophosphate, the active form of selenium, into the amino acid and generates Sec-tRNA[Ser]Sec (Labunskyy V et al, 2014).
In eukaryotes, there is an intermediate step in this pathway: the synthesis of phosphoseryl-tRNA[Ser]Sec by phosphoseryl-tRNA kinase (PSTK), which selectively phosphorylates seryl-tRNA[Ser]Sec. PSTK is highly conserved in evolution, suggesting that it plays an important role in selenoprotein biosynthesis and regulation (Araiso Y et al, 2009). It has two independent domains connected by a linker: an N-terminal catalytic domain and a C-terminal domain (Chiba S et al, 2010). SecS, in turn, was found to belong to a family of PLP-dependent enzymes.
When the ribosome encounters the UGA codon (which normally signals translation termination), the Sec machinery interacts with the canonical translation machinery and prevents premature termination by increasing the UGA coding potential. The SECIS element dictates the recoding of UGA as Sec instead of as a stop signal. When this occurs, Sec-tRNA[Ser]Sec, whose anticodon is complementary to UGA, delivers Sec to the growing chain (Labunskyy V et al, 2014).
In eukaryotes, SECIS binding protein 2 (SBP2) and the Sec-specific translation elongation factor (eEFSec) are also required. SBP2 is associated with ribosomes, binds the SECIS element specifically and interacts with eEFSec, which recruits Sec-tRNA[Ser]Sec and allows the incorporation of Sec into the growing peptide (Labunskyy V et al, 2014).
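The recoding rule described above can be illustrated with a small Python sketch. It is a toy model and not part of the original analysis: the function name `translate` and the flag `has_secis` are assumptions made for illustration, and the model simply recodes every in-frame UGA when a SECIS element is flagged, ignoring SBP2, eEFSec and codon context.

```python
# Standard genetic code, built from the canonical UCAG ordering of bases.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str, has_secis: bool) -> str:
    """Translate an mRNA read in frame from its first base.

    Hypothetical simplification: if has_secis is True, every in-frame
    UGA is read as selenocysteine (U); otherwise UGA terminates
    translation like UAA and UAG.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and has_secis:
            protein.append("U")  # UGA recoded as Sec
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":  # stop codon: release the chain
            break
        protein.append(residue)
    return "".join(protein)


# Made-up example message: AUG UGC UGA GGC UAA
example = "AUGUGCUGAGGCUAA"
print(translate(example, has_secis=True))   # MCUG
print(translate(example, has_secis=False))  # MC
```

Without the SECIS flag the chain is truncated at the UGA, which is how a selenoprotein gene looks to an annotation pipeline that is unaware of recoding.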
       -SECIS elements:
SECIS elements are cis-acting stem-loop RNA structures in the 3’-UTR of all eukaryotic selenoprotein mRNAs (Vindry C et al, 2018). A SECIS element is formed by two helices separated by an internal loop (the SECIS core) and an apical loop. The SECIS core is needed to interact with SBP2 and is the main functional element of the SECIS. The core is formed by four non-Watson-Crick interacting base pairs, including two tandem G·A/A·G base pairs (Labunskyy V et al, 2014).
In some SECIS elements, the apical loop has an additional ministem, which allows SECIS elements to be classified into two types: those without the ministem are type 1, and those with the apical ministem are type 2. The apical loop contains another element necessary for Sec incorporation: the AAR motif. In some cases, such as selenoproteins M and O, the motif is formed by CC instead of AA, suggesting that this loop may not be involved in protein binding, but in ribosome interaction (Labunskyy V et al, 2014).
       -SBP2:
SECIS binding protein 2 associates with ribosomes and binds SECIS elements with high affinity and specificity through its binding domains. SBP2 also interacts with eEFSec to recruit Sec-tRNA[Ser]Sec and helps to incorporate Sec into the new polypeptide (Labunskyy V et al, 2014).
SBP2 knockdown in cells leads to decreased expression of selenoproteins, while its overexpression enhances Sec incorporation. Mutations in this protein lead to defects in the synthesis of type 2 deiodinase (DI2) and, by extension, of GPx1 and SelP (Labunskyy V et al, 2014).
       -eEFSec:
eEFSec is a Sec-specific eukaryotic elongation factor that recruits Sec-tRNA[Ser]Sec and, in conjunction with SBP2, inserts Sec into new protein chains in response to UGA codons. It is very similar to the elongation factor eEF1A, which incorporates the other 20 amino acids, and it likewise has GTPase activity. However, in contrast to eEF1A, eEFSec is highly specific for aminoacylated tRNA[Ser]Sec and does not bind other aminoacylated tRNAs. The SBP2-eEFSec complex forms in the presence of a SECIS element, and can do so even in the absence of GTP or tRNA[Ser]Sec (Labunskyy V et al, 2014).
       -Ribosomal Protein L30:
L30 is a component of the large ribosomal subunit in eukaryotes that binds to the SECIS element, but its function in this process remains unknown.
There are three other proteins that take part in the process. Selenophosphate synthetase 2 (SPS2) is a selenoprotein that possibly serves as a regulator of selenoprotein synthesis and has been implicated in cysteine biosynthesis. Sec lyase is an enzyme that catalyzes the degradation of Sec to L-alanine and Se (Labunskyy V et al, 2014). Finally, tRNA Sec 1 associated protein 1, or SecP43, is an RNA-binding protein involved in the synthesis of selenoproteins. It forms a complex with tRNA[Ser]Sec, has two binding sites and plays an important role in selenocysteine metabolism (Oudouhou F et al, 2017).
SPS2 is a mammalian homologue of the selenophosphate synthetase (SelD) of eubacteria, and SPS1 is a eukaryotic homologue of this same protein. These enzymes have been associated with generating the selenium donor for selenocysteine biosynthesis. Studies of these proteins have shown that SPS2 is essential for generating the selenium donor in mammals, whereas SPS1 probably has a more specialized, non-essential role in selenoprotein metabolism (Xu X et al, 2007).
SPS2 catalyzes the synthesis of selenophosphate, the active Se donor required for Sec biosynthesis. SPS2 is itself Sec-containing in all vertebrates. Studies have shown that the SPS2 gene is duplicated in mammals and that, in placental mammals, the original multiexon gene (SPS2a) was replaced by an intronless gene (SPS2b) (Labunskyy V et al, 2014).
The process described above is represented graphically below:
Figure 2. Sec-tRNA biosynthesis pathway in eukaryotes. Serine (Ser) is attached to tRNA[Ser]Sec by SerS (seryl-tRNA synthetase). Once Ser-tRNA[Ser]Sec is produced, PSTK (phosphoseryl-tRNA kinase) phosphorylates the seryl moiety, producing PSer-tRNA[Ser]Sec. SecS (selenocysteine synthase) then incorporates selenophosphate (H2SePO3-) generated by SPS2. The final product is Sec-tRNA[Ser]Sec. Modified from Labunskyy V et al, 2014.
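The order of the steps in this pathway can also be written down as a minimal Python sketch. It only encodes the sequence of reactions given in the figure legend; the names `Step`, `PATHWAY` and `run_pathway` are hypothetical and chosen for illustration, and no kinetics or regulation are modelled.

```python
from typing import NamedTuple


class Step(NamedTuple):
    factors: tuple   # proteins required for the step
    substrate: str   # tRNA species consumed
    product: str     # tRNA species produced


# Steps as listed in the figure legend (eukaryotic pathway).
PATHWAY = [
    Step(("SerS",), "tRNA[Ser]Sec", "Ser-tRNA[Ser]Sec"),
    Step(("PSTK",), "Ser-tRNA[Ser]Sec", "PSer-tRNA[Ser]Sec"),
    # SecS needs the selenophosphate generated by SPS2.
    Step(("SecS", "SPS2"), "PSer-tRNA[Ser]Sec", "Sec-tRNA[Ser]Sec"),
]


def run_pathway(start: str, available: set) -> str:
    """Return the tRNA species reached from `start` with the given factors."""
    state = start
    for step in PATHWAY:
        if step.substrate != state:
            break
        if not all(factor in available for factor in step.factors):
            break  # pathway stalls at the first missing factor
        state = step.product
    return state


full = {"SerS", "PSTK", "SecS", "SPS2"}
print(run_pathway("tRNA[Ser]Sec", full))             # Sec-tRNA[Ser]Sec
print(run_pathway("tRNA[Ser]Sec", full - {"SPS2"}))  # PSer-tRNA[Ser]Sec
```

Removing a factor from the set shows at which intermediate the pathway would stall in this simplified picture.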
        SELENOPROTEIN FAMILIES
As noted above, all selenoproteins contain Sec in their sequences. Sec is generally found in the active site of the protein, so a mutation of this residue leads to enzyme inactivation. Below we give a brief description of each of these selenoproteins (Labunskyy V et al, 2014).
       Thioredoxin reductase (TR) family
Mammalian thioredoxin reductases (TR) are characterized by containing selenium in their structure and by catalyzing the NADPH-dependent reduction of thioredoxin, as well as of many other compounds. The availability of selenium in the medium is a fundamental determinant of the enzyme’s activity in vivo and in vitro. These proteins are also presumed to play fundamental roles in protection against oxidant injury, in cell growth and in transformation (Mustacich et al, 2000).
-Thioredoxin Reductase 1:
TR1 is localized in the cytosol and nucleus. In mammals there are at least six isoforms, formed by alternative splicing. It is involved in several processes, such as antioxidant defense and homeostasis, apoptosis and the regulation of transcription factors. Sec is found in the active site of the enzyme (Labunskyy V et al, 2014).
-Thioredoxin Reductase 2:
TR2, or TGR (thioredoxin/glutathione reductase), has an additional glutaredoxin domain, but its physiological role remains unknown. It is found at high levels in the testis after puberty (Labunskyy V et al, 2014).
-Thioredoxin Reductase 3:
TR3 is localized in mitochondria, where it is involved in the reduction of mitochondrial thioredoxin and glutaredoxin 2 (Labunskyy V et al, 2014).
       SelWTH family
-Selenoprotein W:
Selenoprotein W is expressed in skeletal muscle, heart and brain. It is a member of the SelWTH family, whose members contain a thioredoxin-like fold and a conserved CxxU motif (C stands for cysteine and U for Sec), suggesting a redox function. Studies in mouse reveal that this protein is involved in differentiation and muscle growth, as well as in the protection of neurons from oxidative stress during their development (SELENOW Homo sapiens - Gene - NCBI, 2019).
SelW is one of the most abundant Sec-containing proteins in mammals, and shares the conserved CxxU motif and thioredoxin-like fold with SelT, SelH and SelV. It is located mainly in the cytosol of brain and muscle cells and belongs to the stress-related selenoproteins (Labunskyy V et al, 2014).
Selenoprotein W is itself a family as there are two different members of this group: SelW1 and SelW2.
-Selenoprotein V:
Selenoprotein V is expressed in the testis. It also belongs to the SelWTH family and therefore shares the thioredoxin-like fold and CxxU motif described for SelW. SelV is found only in placental mammals, and not in all placental species. Because it is only expressed in the testis, it is thought to be involved in male reproduction (Labunskyy V et al, 2014).
-Selenoprotein T:
Selenoprotein T is found in the endoplasmic reticulum, and belongs to the SelWTH family just like SelV and SelW (Database G. SELENOT Gene - GeneCards, 2019). Studies in mice reveal a role for this gene against oxidative stress in Parkinson’s disease, and in the control of glucose homeostasis in pancreatic β-cells. It is present during embryonic development and in adult tissues. Recent studies show that it has a role in calcium regulation, neuroendocrine function and cellular structure organization (Labunskyy V et al, 2014).
-Selenoprotein H:
Selenoprotein H contains a Sec residue and also a conserved nuclear targeting motif. Accordingly, SelH has a distinctive subcellular localization pattern and is found specifically in the nucleoli. In mouse studies, it has been found to be especially abundant during embryonic development. Its expression is sensitive to dietary Se intake (Labunskyy V et al, 2014).
In addition to the CxxU motif, SelH has a nuclear targeting RKRK motif in its sequence, which is consistent with its location in the nucleus. This protein has glutathione peroxidase activity and regulates the expression of detoxification enzymes (Labunskyy V et al, 2014).
       Selenoprotein U:
In higher mammals, selenoprotein U is found in its Cys-containing form. Sec-containing SelU was first found in fish, and also occurs in birds and unicellular eukaryotes. Three subfamilies have been reported in humans (presented below). They have a Prx-like2 structure, which means that they belong to the thioredoxin-like superfamily (Jiang L et al, 2012).
       Selenoprotein R:
Selenoprotein R, also known as MsrB1, is a zinc-containing mammalian selenoprotein. It was first designated selenoprotein R (SelR), but it was later found to act as a methionine-R-sulfoxide reductase, a function analogous to that of MsrA, so it was finally named MsrB1, even though its structure is not similar to that of MsrA. MsrB1 is found primarily in the nucleus and cytosol, with high activity in kidney and liver. Its main function is the repair of oxidized cellular proteins, e.g. transcription factors or cytoskeletal components. Moreover, there are two additional MsrB homologs in mammals: MsrB2 in mitochondria and MsrB3 in the ER (Labunskyy V et al, 2014).
       Selenoprotein P:
Selenoprotein P is abundantly secreted and accounts for around 50% of the total Se in plasma. Its unique characteristic is the presence of many Sec residues in its structure. This protein is synthesized mainly in the liver, even though its mRNA has been detected in many tissues. Recent studies suggest that its function is related to Se supply to peripheral tissues, especially brain and testis (Labunskyy V et al, 2014).
       Selenoprotein O:
SelO is one of the least studied selenoproteins: no structural or biochemical information has been reported, even though it was discovered more than a decade ago. This protein contains a single Sec residue at the end of the protein. A kinase domain and a mitochondrial targeting peptide have been found in its sequence, but its function remains unknown. Multiple homologs have been detected in other species, but these contain a Cys instead of the Sec residue (Labunskyy V et al, 2014).
       Selenoprotein N:
Selenoprotein N was the first selenoprotein to be identified with bioinformatic tools. It is a transmembrane glycoprotein found in the ER, at high levels during embryonic development and at lower levels in adult tissues. Studies suggest that it has an important role in skeletal muscle regeneration and in the maintenance of satellite cells, and that it also acts as a cofactor of the ryanodine receptor, which is involved in the regulation of intracellular calcium flux. Mutations of this gene in humans are associated with a group of early-onset muscle disorders (Labunskyy V et al, 2014).
       15-kDa Selenoprotein (Sep15) and Selenoprotein M (SelM)
Sep15 and SelM are thioredoxin-like selenoproteins found in the ER that form a distinct family. SelM is a distant homolog of Sep15, with which it shares approximately 30% sequence identity. SelM is expressed mainly in brain cells, while Sep15 is expressed in prostate, testis and kidney.
Sep15 and SelM may be involved in the reduction or modification of disulfide bonds during protein folding in the ER. In addition, SelM function could be related to neuroprotection against oxidative damage. Sep15 is found in all vertebrates (Labunskyy V et al, 2014).
       Selenoproteins K and S
These two selenoproteins are grouped into a SelK/SelS family based on their topology, because both have a transmembrane domain in their NH2-terminal sequence. The family also includes homologs in several species of different kingdoms, which makes it the most widespread eukaryotic selenoprotein family.
SelK and SelS are found in the ER membrane, and their proposed functions include ER-associated degradation of misfolded proteins and a role in the immune system by mediating anti-inflammatory effects. Selenoprotein S has been suggested to play an important role in the control of the inflammatory response by regulating cytokine production (Seiderer J et al, 2007). The precise function of Selenoprotein K, however, is still unknown (Labunskyy V et al, 2014).
       Selenoprotein I
Selenoprotein I is found only in vertebrates, as it evolved recently. Its function is related to the de novo synthesis of phospholipids (Labunskyy V et al, 2014).
       Methionine-S-sulfoxide reductase A (MsrA)
Methionine-S-sulfoxide reductase A catalyzes the reduction of methionine-S-sulfoxide and is therefore a stereospecific reductase.
       Glutathione peroxidases (GPx)
Glutathione peroxidases are widespread in all three domains of life. In mammals, five of the eight paralogs contain a Sec residue in the active site (GPx1, GPx2, GPx3, GPx4 and GPx6), and the other three (GPx5, GPx7 and GPx8) have the Sec replaced by Cys. In some mammals, GPx6 also contains a Cys and is therefore not a selenoprotein. These enzymes have a wide range of physiological roles: they take part in peroxide signaling, in hydroperoxide detoxification and in maintaining cellular redox homeostasis (Labunskyy V et al, 2014).
-Glutathione peroxidase 1 (GPx1):
GPx1 is a widespread enzyme with a homotetrameric structure, found in all mammalian cell types. It was the first selenoprotein identified, and is the most abundant in mammals. This intracellular antioxidant enzyme catalyzes the reduction of hydrogen peroxide to water using glutathione (GSH), which is oxidized in the reaction, and thereby protects the cell against oxidative stress.
-Glutathione peroxidase 2 (GPx2):
GPx2 is found in the epithelium of the gastrointestinal tract. It is specific for hydrogen peroxide. Its gene expression is controlled by Nrf2 (an antioxidant response transcription factor).
-Glutathione peroxidase 3 (GPx3):
GPx3 is the major GPx form in plasma. It is specific for hydrogen peroxide.
-Glutathione peroxidase 4 (GPx4):
GPx4 is similar to GPx1 but has a monomeric structure. It is involved in the reduction of membrane-associated phospholipid hydroperoxides. It belongs to the housekeeping selenoproteins.
-Glutathione peroxidase 6 (GPx6):
GPx6 is found only in the olfactory epithelium, and during embryonic development. It is specific for hydrogen peroxide.
       Iodothyronine deiodinases (DI)
This family of selenoenzymes comprises tissue-specific regulators of intracellular thyroid hormone availability and signaling (Valverde-R C et al, 2014). Deiodinases are membrane selenoproteins that have a thioredoxin fold. In mammals, this family is composed of three paralogs: DI1, DI2 and DI3, all of which play an important role in maintaining thyroid hormone levels and activity. DI1 and DI3 are found in the plasma membrane, whereas DI2 is found in the endoplasmic reticulum (ER) (Labunskyy V et al, 2014).
The majority of thyroid hormone is produced and secreted as T4, the inactive form. It can be converted into the active form T3 in a reaction catalyzed by DI1 and DI2, and both T3 and T4 can be inactivated by DI3, which converts T4 into rT3 and T3 into T2. Deiodinases thus have an important role in the regulation of thyroid hormone: DI1 is the main regulator of thyroid hormone levels in blood, while DI2 and DI3 are involved in intracellular roles. DI2 has an important role in skeletal muscle cells during development and muscle regeneration, and also in brown adipose tissue during adaptive thermogenesis (Labunskyy V et al, 2014).
Craseonycteris thonglongyai
This study aims to identify the selenoproteome, and the machinery needed to synthesize it, in a recently sequenced organism: Craseonycteris thonglongyai. To give an overview of this animal, we summarize its main characteristics here.
Craseonycteris thonglongyai, commonly known as the bumblebee bat or hog-nosed bat, is found in the Oriental Region. Specifically, it occurs in western Thailand and in southeastern Myanmar, near the border with Thailand (Humphery S et al, 1990).
-Habitat:
They roost in limestone caves in tropical rainforest biomes, high up near the roof for warmth. Even though they are not social, a cave may be shared by about 100 individuals. Bamboo forests serve as their foraging habitat, where they feed on insects of the upper forest canopy (Hill J et al, 1981).
-Physical Description:
Bumblebee bats take their common name from their size, which is about that of a large bumblebee. They are considered to be among the smallest mammals in the world. Adults weigh between 1.7 and 2.0 grams and measure 29 to 33 millimeters in length. Their small eyes are usually hidden by fur. Their mouths contain 28 teeth with relatively large incisors; the lower incisors in particular are long and narrow. Their upper body can be one of two colors (brownish red or gray). The underside is paler, but the wings and the membrane between the legs (uropatagium) are darker (Hill J et al, 1981; Nowak R et al, 1999).
Distinctive characters, described by Nowak R et al (1999):
Their skulls are small, with a large, inflated, spherical braincase and no lambdoidal crests, postorbital processes or supraoccipital ridges. In both sexes, a sagittal crest (a bony ridge that runs along the top midline of the skull) is visible. The zygomatic arch (the arch in the cheek) is slim but complete. They lack a tail, even though they have caudal vertebrae.
Pig-like nose with large nostrils separated by a large septum.
Large ears (9-10.2 mm) with a tragus about half the size of the ears.
Wide wings that allow them to hover.
Thumbs with claws.
Females have 2 sets of nipples (chest and pubic area). The nipples in the pubic area are thought to be vestigial or not fully developed.
Males have a large gland at the base of their throats.
-Reproduction:
Little is known about the reproduction of these mammals; however, they are reported to breed once a year, from late April to May, and to have one offspring per year (Hayssen V et al, 1993). While inside the cave, the young stay attached to the mother, and they are left alone when she goes out to forage (Kurta A et al, 1987).
-Lifespan:
Their lifespan is thought to be around 5 to 10 years; however, this estimate is based on related bats, as their real lifespan is unknown (Ward A et al, 2004).
-Behavior:
Thanks to their flight adaptations, they are able to sustain long powered flights. Their activity is concentrated around sunrise and nightfall, and they are more active at dawn (30 minutes) than at dusk (18 minutes). Bumblebee bats fly within a home range of 1 kilometer from the cave, but they do not defend territories (Burton M et al, 2002).
They navigate by echolocation, emitting high-intensity calls of constant frequency followed by a shallow downward sweep (Humphery S et al, 1990).
Insects and spiders are the bats’ main source of nutrition. They catch their prey in flight, which makes them aerial feeders. Their impact on prey populations is probably not substantial, given their small numbers and size (Hill J et al, 1981; Humphery S et al, 1990).
-Conservation status:
Hog-nosed bats are listed as vulnerable on the IUCN Red List and as endangered on the U.S. Endangered Species list. The IUCN reports that the current population of 5100 individuals is decreasing because of human activities in caves, such as habitat alteration by limestone extraction. In addition, the habitats in which they feed are being deforested, which decreases prey availability (Ward A et al, 2004).
AIM AND EXPECTED RESULTS
Selenoproteins contain Sec in their sequences, which is encoded by a UGA codon. Because UGA is normally read as a stop codon, selenoproteins are commonly misannotated in sequence databases. Therefore, the main aim of this project is to identify the selenoproteome and the machinery needed for its biosynthesis in Craseonycteris thonglongyai. To achieve this, bioinformatic approaches will be used. First, these proteins will be predicted by homology analysis, using the human proteome as a reference and a program designed by us. After this, SECIS elements will be predicted using the Seblastian tool.
Craseonycteris thonglongyai is a placental mammal. For this reason, and assuming that selenoproteins are well conserved within this group, we expect its selenoproteome to be similar to the human one.
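The misannotation problem mentioned in the aims can be illustrated with a short Python sketch. It is not the program used in this project: the function `classify_sec_positions`, its labels and the example sequences are hypothetical. It assumes that a pairwise alignment between a human selenoprotein and a predicted target protein is already available as two equal-length strings, with `-` for gaps and `*` for an in-frame stop codon in the target.

```python
def classify_sec_positions(human_aln: str, target_aln: str) -> list:
    """Classify what the target has at each human Sec (U) column.

    Both arguments are aligned protein strings of equal length.
    A '*' in the target opposite a human U is treated as a candidate
    Sec, since an in-frame UGA is usually translated as a stop.
    """
    if len(human_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")

    calls = []
    for column, (h, t) in enumerate(zip(human_aln, target_aln)):
        if h != "U":
            continue  # only columns where human has Sec are of interest
        if t in ("U", "*"):
            label = "sec_candidate"   # conserved Sec / in-frame UGA
        elif t == "C":
            label = "cys_homolog"     # Sec replaced by cysteine
        elif t == "-":
            label = "gap"             # region missing in the target
        else:
            label = "other_residue"
        calls.append((column, label))
    return calls


# Made-up aligned fragments, for illustration only.
print(classify_sec_positions("MCUG-K", "MC*GAK"))  # [(2, 'sec_candidate')]
print(classify_sec_positions("AUA", "ACA"))        # [(1, 'cys_homolog')]
```

A candidate flagged in this way would still need a predicted SECIS element downstream of the gene before it could be considered a selenoprotein.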