Introduction

Selenoproteins

Selenoproteins are proteins that contain selenocysteine, an amino acid analogous to cysteine in which selenium (an essential dietary micronutrient) takes the place of sulphur. Selenoproteins require a specific synthesis machinery that is conserved across the species in which they have been described. The incorporation of selenocysteine is specified by the UGA codon, which normally acts as a stop signal in the genetic code. This alternative decoding depends on an mRNA structure, the SECIS element, which is found in the 3'UTR. SECIS elements are defined by a characteristic sequence and by a specific secondary structure formed by base pairing, which in eukaryotes includes non-canonical pairs (U-G). Because most gene prediction programs interpret UGA as a stop codon, they are unable to identify selenoproteins, so other bioinformatic methods are needed to detect them.

Selenocysteine synthesis

Selenocysteine synthesis comprises the following steps:

Aminoacylation of the tRNA with serine by a seryl-tRNA synthetase.
Phosphorylation of the seryl-tRNA by PSTK (phosphoseryl-tRNA kinase), which yields O-phosphoseryl-tRNA.
Dephosphorylation of the O-phosphoseryl-tRNA, which produces the acceptor molecule for selenophosphate (SeP).
SeP must previously have been synthesized by SPS2; SeP is then responsible for the last step, which yields selenocysteine.

SECIS element

As explained above, SECIS elements are stem-loop structures located in the 3'UTR that are required to read through the UGA stop codon and insert a selenocysteine at that position. The SECIS element lies downstream of the UGA codon, sometimes several kb away. Its secondary structure is fairly well conserved and contains characteristic consensus sequences, highlighted in red in the accompanying image.
Function of Selenoproteins

The existence of such a complex and specific machinery suggests that the function of these proteins is not trivial. Indeed, selenoproteins are involved in catabolic processes and in the catalysis of redox reactions. In the latter they have an antioxidant effect, preventing the damage that leads to several alterations. Each selenoprotein protects against different free radicals (hydrogen peroxide, phospholipid hydroperoxides, etc.).

Plasmodium yoelii yoelii

Plasmodium yoelii yoelii is a eukaryotic organism (kingdom Alveolata, phylum Apicomplexa, class Aconoidasida, order Haemosporida) that has been isolated from the blood of shiny thicket rats (Thamnomys rutilans) from the Central African Republic, Brazil and western Nigeria. Three P. yoelii subspecies can be distinguished: P. yoelii yoelii, P. yoelii killicki and P. yoelii nigeriensis.

These organisms are parasites that cause malaria in rodents. For this reason, Plasmodium yoelii yoelii (among others) is widely used as a model in functional analyses aimed at identifying new drugs and targets to treat and prevent the disease. It is grown in the laboratory in rats and mice, and it has been shown to prefer immature erythrocytes and reticulocytes.

Malaria is an infectious disease caused by parasites of the genus Plasmodium and transmitted by a vector, the female Anopheles mosquito. Every year it causes about 515 million cases and kills between 1 and 3 million people, most of them children in sub-Saharan Africa. Malaria is therefore a major public health problem.

Within the genus Plasmodium only four species infect humans: Plasmodium falciparum, Plasmodium vivax, Plasmodium ovale and Plasmodium malariae. Other species, such as Plasmodium yoelii, infect other organisms.

Malaria's cycle

The life cycle of the malaria parasite involves two hosts: the female Anopheles mosquito and the human.

When a female Anopheles mosquito infected with Plasmodium takes a human blood meal, it inoculates sporozoites (1). The sporozoites infect hepatocytes (2) and mature into schizonts (3), which rupture the cells and release merozoites (4). (In P. vivax and P. ovale there is a latent stage, the hypnozoite, that can persist in the liver and cause relapses by being released into the bloodstream weeks or even years later.) After this initial replication in the liver (exo-erythrocytic cycle [A]), the parasites multiply asexually in the erythrocytes (erythrocytic cycle [B]). The merozoites infect erythrocytes (5). The trophozoites mature into schizonts, which rupture the cells and release merozoites (6). Some parasites differentiate into sexual erythrocytic stages (7). The blood stages are responsible for the classical signs of malaria.

The male gametocytes (microgamonts) and the female gametocytes (macrogamonts) are ingested by a female Anopheles mosquito during a blood meal (8). The multiplication of the parasites inside the mosquito is known as the sporogonic cycle [C]. In the mosquito's stomach the microgametes penetrate the macrogametes, producing zygotes (9). The zygotes become motile and elongated (ookinetes) (10) and invade the mosquito's intestinal wall, where they develop into oocysts (11). The oocysts grow and release sporozoites (12), which migrate to the mosquito's salivary glands. The inoculation of sporozoites into a new human host perpetuates the malaria cycle (1).

Although some vaccines against malaria are currently under development, the basic treatment relies on prophylactic drugs, and resistance to these drugs is detected more and more often. That is why the Plasmodium species that infect humans are being studied in depth alongside species that infect laboratory animals, such as Plasmodium yoelii. The latter make it possible to carry out the research needed to reach a satisfactory treatment for this disease.

The Institute for Genomic Research (TIGR), in cooperation with the Naval Medical Research Center, has completed a research programme to sequence the complete genomes of Plasmodium falciparum, Plasmodium vivax, P. yoelii, P. berghei and P. chabaudi, which provides deep insight into the evolutionary history of these species. The specific data from the P. yoelii yoelii genome used in this work can be found in the "Materials & Methods" section.
Phylogenetic tree image: phylogenetic relationships among 17 Plasmodium species inferred from the gene encoding cytochrome b. The tree was estimated using the neighbour-joining (NJ) method. Image extracted from http://www.pnas.org/cgi/content/full/95/14/8124?ck=nck

Materials & Methods

Retrieving the Plasmodium yoelii yoelii genome and the selenoproteins of other organisms

First of all, the Plasmodium yoelii yoelii genome (strain 17XNL) was provided by the project supervisors. It is also available from plasmodb.org and NCBI.

Plasmodium yoelii yoelii genome characteristics

The whole-genome shotgun (WGS) sequencing project provides an annotation that was generated automatically; it is preliminary and organized in contigs. It is therefore subject to revision, since further annotation work is needed to complete it. The contigs longer than 2 kb add up to 20 Mb in total and cover 87% of the whole genomic sequence. The nuclear genome of Plasmodium yoelii yoelii uses the standard genetic code (translation table 1), whereas the mitochondrial genome uses translation table 4.

On the other hand, the sequences of selenoproteins and of the proteins of their synthesis machinery from other organisms have been obtained from different sources:

SelenoDB database: sequences from Homo sapiens, Pan troglodytes, Mus musculus, Tetraodon nigroviridis, Drosophila melanogaster, Anopheles gambiae, Caenorhabditis elegans and Saccharomyces cerevisiae. A multi-FASTA file containing all the proteins has been generated for each organism.
NCBI database: Plasmodium falciparum and Plasmodium yoelii sequences, as well as Drosophila melanogaster protein sequences. The accession numbers of these proteins were taken from the scientific articles cited in the references.

Using BLAST to search for similarity

The sequences of the selenoproteins and of the synthesis machinery proteins from the different organisms have been compared with the Plasmodium yoelii yoelii genome. TBLASTN has been used for this purpose because it compares protein amino acid sequences (query) with the nucleotide sequences of the Plasmodium yoelii yoelii genome (subject), translating the nucleotide sequence in each of the six possible reading frames.

The TBLASTN alignments have been run from the command line. Beforehand, the Plasmodium yoelii yoelii genome was formatted as a BLAST database with:

formatdb -i genome.fa -p F

-p F indicates that the input is a nucleotide sequence (not a protein).

Once BLAST is installed, the following command is used:

blastall -p tblastn -d database -i query.fa

where database is the formatted Plasmodium yoelii yoelii genome and query.fa contains the amino acid sequence of the query protein.

The most relevant TBLASTN parameters chosen to perform the alignments are:

BLOSUM62 matrix = substitution matrix derived from local alignments of conserved protein blocks. The number refers to the percentage identity (62%) above which sequences were clustered when the matrix was built. The similarity percentage found is not very high because of the large evolutionary distance between Plasmodium yoelii yoelii and the organisms with which it is compared.
e-value = 10.0. This threshold is very permissive, considering that ideally it should be 0.1. Again, it should be noted that the alignments involve proteins from organisms that are evolutionarily distant from Plasmodium yoelii yoelii. Nevertheless, hits with e-values above 1 have not been analyzed.
-m 9 format = tabular output with comment lines, which summarizes each alignment in the following fields: query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value and bit score.
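
The tabular output described above lends itself to automatic filtering. The following Python sketch is only an illustration under stated assumptions: it assumes the twelve tab-separated columns listed above, with comment lines starting with "#", and keeps hits with an e-value of at most 1, as in this protocol. The names Hit, parse_tabular and strand, as well as the example rows, are hypothetical and were not part of the original analysis.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    length: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_tabular(text: str, max_evalue: float = 1.0) -> List[Hit]:
    """Parse tab-separated BLAST hits, skipping '#' comment lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) < 12:
            continue  # not a complete 12-column record
        # Columns 5 and 6 (mismatches, gap openings) are not stored here.
        hit = Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                  float(f[10]), float(f[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    # Most significant hits (lowest e-value) first.
    return sorted(hits, key=lambda h: h.evalue)


def strand(hit: Hit) -> str:
    """A hit whose subject start exceeds its end lies on the reverse strand."""
    return "-" if hit.s_start > hit.s_end else "+"


# Hypothetical rows, invented for illustration only (not real results).
EXAMPLE = "\n".join([
    "# Fields: Query id, Subject id, % identity, alignment length, ...",
    "\t".join(["protA", "contig1", "45.00", "100", "50", "5",
               "1", "100", "900", "601", "2e-20", "95.0"]),
    "\t".join(["protA", "contig2", "30.00", "40", "28", "0",
               "10", "49", "120", "239", "0.5", "30.0"]),
    "\t".join(["protA", "contig3", "25.00", "20", "15", "0",
               "1", "20", "10", "69", "4.2", "22.0"]),
])

if __name__ == "__main__":
    for h in parse_tabular(EXAMPLE):
        print(h.subject, h.evalue, strand(h))
```

In this sketch the third example row is discarded because its e-value exceeds 1, and the first is reported on the reverse strand because its subject start is greater than its subject end.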

TBLASTN is used to identify the most significant matches. In practice, this step narrows the genome down to the regions that may contain selenoprotein genes, genes of the selenoprotein synthesis machinery, or proteins homologous to selenoproteins. For selenoproteins, the aim is to identify alignments in which the selenocysteine of the query is aligned with a stop codon in the genome. For homologous proteins, in contrast, the alignments of interest are those in which that position is aligned with a cysteine. For the machinery proteins, the analysis depends on whether they are selenoproteins or not: in the first case they are analyzed as selenoproteins, and in the second as homologous proteins, without looking at any specific amino acid.
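
The criterion described above (selenocysteine aligned with a stop codon, or with a cysteine in a homologue) can be checked programmatically on a pairwise alignment. The sketch below is a minimal illustration, assuming the two aligned strings have equal length, use "-" for gaps and "*" for a translated stop codon, and that selenocysteine is written "U" (some sources show it as "X"). The function names and the toy alignment are hypothetical.

```python
from typing import List, Sequence, Tuple


def sec_columns(query_aln: str, subject_aln: str,
                sec_symbols: Sequence[str] = ("U",)) -> List[Tuple[int, str]]:
    """Return (query position, subject residue) for every Sec in the query.

    Positions are 1-based and count only non-gap query residues.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")
    found = []
    q_pos = 0
    for q, s in zip(query_aln, subject_aln):
        if q == "-":
            continue  # gap in the query: no query residue consumed
        q_pos += 1
        if q in sec_symbols:
            found.append((q_pos, s))
    return found


def classify(subject_residue: str) -> str:
    """Interpret what the genome encodes opposite the query Sec."""
    if subject_residue == "*":
        return "candidate selenoprotein (Sec aligned with a stop codon)"
    if subject_residue == "C":
        return "cysteine homologue"
    return "inconclusive"


if __name__ == "__main__":
    # Toy alignment invented for illustration.
    for pos, residue in sec_columns("MKU-GA", "MK*LGA"):
        print(pos, residue, classify(residue))
```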

Obtaining the genomic sequence of interest

To simplify the subsequent analysis, the aligned region has been cut out of its contig. This has been done as follows:

perl FastaToTbl.pl < genome.fa | awk '$1=="subject_identity"' | perl TblToFasta.pl > output.fa

fastasubseq -f (fasta file) -s (start) -l (length) > output.fa

The first command uses two Perl programs to isolate the contig of interest in a new FASTA file. The second command cuts the specific nucleotide sequence out of the contig.
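
The two steps above (isolating a contig and cutting a region out of it) can also be sketched in Python. This is an illustrative sketch, not the scripts actually used: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive as reported by BLAST (which is not necessarily the convention of fastasubseq), and a start greater than the end is taken to mean that the region lies on the reverse strand.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(text: str) -> Dict[str, str]:
    """Read (multi-)FASTA text into {identifier: sequence}."""
    records: Dict[str, list] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the header up to the first whitespace.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_region(seq: str, start: int, end: int) -> str:
    """Cut positions start..end (1-based, inclusive) out of seq.

    If start > end, the reverse complement of the region is returned.
    """
    lo, hi = min(start, end), max(start, end)
    if lo < 1 or hi > len(seq):
        raise ValueError("region lies outside the sequence")
    region = seq[lo - 1:hi]
    return reverse_complement(region) if start > end else region


if __name__ == "__main__":
    # Toy contig invented for illustration.
    contigs = read_fasta(">toy_contig demo\nATGCCC\nTAA\n")
    print(extract_region(contigs["toy_contig"], 1, 3))
    print(extract_region(contigs["toy_contig"], 9, 7))
```

With this convention, a region running from nucleotide 2816 to nucleotide 3477 has a length of 3477 - 2816 + 1 = 662 nucleotides.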

Searching similarity and structure: Exonerate and GeneWise

The next step is to predict the gene structure. Exonerate and GeneWise are programs that predict exon structure while taking the presence of introns into account. Both programs realign the two sequences; since the nucleotide sequence has been restricted to the region of interest, the alignment conditions are improved. Exonerate has been used as follows:

export PATH=$PATH:/disc8/bin/exonerate/bin

exonerate -m p2g -q query.fa -t target.fa --showtargetgff

query.fa contains the amino acid sequence of a protein from another organism, and target.fa the previously selected genomic region of Plasmodium yoelii yoelii. The GFF output summarizes the most important information about the gene in a table. GeneWise has been run from its web page at www.ebi.ac.uk with the Advanced GeneWise option, selecting the following: gene structure, translation, cDNA, GFF output, global alignment and organism worm.

Searching for SECIS elements: SECISearch

When the gene encodes a selenoprotein, the SECISearch program has been used to look for a potential SECIS element in the 3'UTR. The search has to cover a region of at least 3000 bp downstream of the coding region, because the SECIS elements reported so far can lie at some distance from it.
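
The coordinates of the downstream region to submit to SECISearch can be worked out with a small helper. The following Python sketch is only an illustration: the function name, the 1-based inclusive coordinate convention (with start greater than end meaning the reverse strand) and the contig length used in the example are assumptions, and the window size is left as an explicit argument.

```python
from typing import Optional, Tuple


def downstream_window(cds_start: int, cds_end: int,
                      contig_length: int,
                      window: int) -> Optional[Tuple[int, int]]:
    """Coordinates of up to `window` bases immediately after the coding region.

    Coordinates are 1-based and inclusive. A gene with cds_start > cds_end is
    on the reverse strand, so "downstream" means lower contig coordinates and
    the returned pair is also given as (higher, lower). The window is clipped
    at the contig boundaries; None is returned if nothing lies downstream.
    """
    if cds_start <= cds_end:  # forward strand
        first = cds_end + 1
        if first > contig_length:
            return None
        return (first, min(cds_end + window, contig_length))
    first = cds_end - 1  # reverse strand
    if first < 1:
        return None
    return (first, max(cds_end - window, 1))


if __name__ == "__main__":
    # The contig length of 5000 is a hypothetical value for illustration.
    print(downstream_window(47, 1066, 5000, window=2000))    # (1067, 3066)
    print(downstream_window(1845, 1352, 5000, window=2000))  # (1351, 1)
```

In the second call the window is clipped because fewer than 2000 bases remain between the end of the gene and the start of the contig.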

Translating genes: Expasy

The exonic sequences predicted by Exonerate or GeneWise have been translated in order to obtain the predicted protein sequence of each gene. This is done with the Translate application of the Expasy server, which translates the sequence in the six possible reading frames (three on each strand), from which the best one is chosen.
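
The six-frame translation described above can be reproduced with a short script. The sketch below is an illustration, not the Expasy implementation: it uses the standard genetic code (translation table 1), writes stop codons as "*", and offers an optional flag (an assumption made here for convenience) that writes every TGA codon as "U" to make candidate selenocysteine positions visible. All function names are hypothetical.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code (translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str, sec_at_tga: bool = False) -> str:
    """Translate from the first base; trailing incomplete codons are dropped."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if sec_at_tga and codon == "TGA":
            protein.append("U")  # candidate selenocysteine
        else:
            protein.append(CODON_TABLE.get(codon, "X"))  # X = ambiguous codon
    return "".join(protein)


def six_frames(seq: str, sec_at_tga: bool = False) -> Dict[str, str]:
    """Translations of the three forward and three reverse reading frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:], sec_at_tga)
        frames["-%d" % (offset + 1)] = translate(rc[offset:], sec_at_tga)
    return frames


if __name__ == "__main__":
    # Toy sequence invented for illustration.
    for name, protein in sorted(six_frames("ATGTGATAA").items()):
        print(name, protein)
```

Which frame is "best" still has to be decided by inspection, for example by looking for the frame that starts with methionine and matches the query protein.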

Aligning sequences: ClustalW

The Plasmodium yoelii yoelii protein sequences and the protein sequences of the organisms used in each case have been aligned with ClustalW to show their similarities and differences. This last step reveals the degree of conservation between the sequences.

The web sources used are:
National Center for Biotechnology Information
Plasmo database
Seleno database
GeneWise
SECISearch
ExPaSy Proteomics Server
ClustalW
Pfam Home Page

Note
This is a general protocol; that is, the procedures have been described at a global level. However, each gene posed a different challenge, and in some cases the problems encountered between the first alignment and the final protein could not be solved by following the general protocol alone.
For this reason, each gene has its own section in which the specific procedure is explained.

Results

Plasmodium yoelii yoelii Selenoproteins

Sel 1

Gene obtention process
First, the amino acid sequence of Plasmodium falciparum selenoprotein 1 is obtained and aligned against the genome with TBLASTN. The alignment results are not as good as expected because the aligned region is short. However, since the best alignment has an e-value of 0.001, the analysis is continued, as this alignment is quite likely to be significant.

Once the alignment has been analyzed, the corresponding sequence of the Plasmodium yoelii genome (a multi-FASTA file) is isolated; it is contig MALPY02298.
GeneWise is used to align this contig with the P. falciparum protein sequence and to predict the exons of the gene. The GeneWise results show an alignment that starts at the very beginning of the protein and has a gap at the position of the Sel1 selenocysteine. To check whether the gap corresponds to a selenocysteine, the nucleotides of the P. yoelii yoelii genome are translated by hand, and this is indeed the case.

Exonerate is also used, with the same aim as GeneWise, to corroborate its results. The Exonerate results are also satisfactory; moreover, in this case it is possible to check directly that the penultimate amino acid is a selenocysteine. Both applications predict a gene consisting of two exons.

The results indicate that the protein analyzed is a genuine selenoprotein, so the next step is to determine whether a SECIS element is present in the 3'UTR region of Sel1. For this purpose SECISearch is used: a sequence of approximately 2000 bases taken immediately after the stop codon is submitted to obtain the SECISearch results.

Finally, the nucleotide sequence of Sel1 is shown. With all these results we can state that Plasmodium yoelii yoelii has selenoprotein 1 in its genome.

Gene information

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY02298|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: forward sequence from nucleotide 2816 to nucleotide 3477.

GFF results:

Amino acid sequence

Results of the alignment between both proteins

Sketch of Sel1 and its SECIS element:

Sel 3

Gene Obtention Process
First, the Plasmodium falciparum selenoprotein 3 sequence was obtained from the PlasmoDB database. The protein was aligned against the Plasmodium yoelii yoelii genome using TBLASTN. The alignment results show an interesting hit with a score of 202 and an e-value of 2e-53, located in contig MALPY01723. Its most remarkable feature is that the Plasmodium falciparum selenocysteine (shown as an "X") is aligned with an asterisk in the Plasmodium yoelii yoelii genome, which corresponds to a stop codon. Furthermore, the alignment starts at the beginning of the P. falciparum protein and ends at the penultimate amino acid, which indicates that it is a good alignment and that Sel3 is probably present in Plasmodium yoelii yoelii as well.

The next step is to select and extract the region of the contig where the alignment was found. This region, together with the P. falciparum protein, is used to run GeneWise. The GeneWise results indicate that Sel3 has a single exon, spanning nucleotides 47-1066 of the contig.
Next, once the Sel3 nucleotide sequence has been located, a SECIS element has to be found in the 3'UTR region of Sel3. To do so, SECISearch has been used to look for SECIS motifs in the region between the last codon of the exon (the last coding codon) and the following 2000 nucleotides, since the SECIS element can lie some distance away from the exon. The SECISearch results are very satisfactory: they show a SECIS element within these 2000 bases after the coding region, with a score of 34.9, which compared with the threshold value (15) appears to be a good prediction.

Therefore, with all these data it can be stated that Plasmodium yoelii yoelii has selenoprotein 3 in its genome.
Gene Information

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY01723|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: forward sequence from nucleotide 47 to nucleotide 1066.

GFF Results

Amino acid sequence

Result of the alignment between both proteins

Sketch of Sel 3 and its SECIS element:

Sel 4

Gene obtention process

First, the Plasmodium falciparum selenoprotein 4 is downloaded from the PlasmoDB database. This protein is aligned against the Plasmodium yoelii yoelii genome using TBLASTN. The alignment results show a single hit, which corresponds to contig MALPY02554, with a score of 50 and an e-value of 3e-07; however, the alignment only covers amino acids 36 to 130 of P. falciparum Sel4.

The next step is to select the contig that contains the alignment and to cut out the corresponding nucleotide sequence, extended at both ends to make sure that the whole region where the gene may lie is covered. This sequence is aligned with Plasmodium falciparum Sel4 using GeneWise. The GeneWise results are not satisfactory: the protein is aligned incorrectly, and the region surrounding the selenocysteine is not conserved, as would be expected given that it lies in the active centre of the protein.

As an alternative to GeneWise, the sequences are aligned with Exonerate to check whether this program gives a better alignment. Indeed, the Exonerate results are much better than those obtained with GeneWise and, given the short genetic distance between the two species, they suggest that Plasmodium yoelii yoelii very probably has selenoprotein 4. However, Exonerate only aligns from amino acid 16 to amino acid 131 (the protein has 135 amino acids in total). Since the selenocysteine of P. falciparum Sel4 is located at the penultimate position, it is not included in the alignment, and it is not possible to check whether it aligns with a Plasmodium yoelii yoelii stop codon. This problem may have two causes. On the one hand, because the selenocysteine is at the penultimate position of the P. falciparum protein, it is very hard for the programs to overcome the penalty for finding a stop codon inside a protein, so the alignment ends because of the low score. If the selenocysteine were located in the middle of the protein, the downstream amino acid sequence would probably be conserved and the stop codon penalty would be overcome. On the other hand, the P. yoelii yoelii selenoprotein may have more than one intron.

In order to find the possible stop codon in the P. yoelii yoelii sequence, the sequence obtained with Exonerate is translated into protein using the Expasy server. As noted above, this nucleotide sequence does not correspond to the whole protein (only amino acids 16 to 131). The next step is to add the first 15 and the last 4 amino acids to it, obtaining a mixed protein.
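
The construction of the mixed protein can be expressed as a simple string operation. The sketch below is hypothetical: it assumes that the missing first 15 and last 4 residues are taken from the full-length reference protein of the related species, that positions are 1-based and inclusive, and it uses placeholder sequences rather than real ones.

```python
def build_mixed_protein(reference: str, fragment: str,
                        first_aligned: int, last_aligned: int) -> str:
    """Complete a translated fragment with the ends of a reference protein.

    `fragment` is assumed to cover reference residues
    first_aligned..last_aligned (1-based, inclusive); the residues before
    and after that range are copied from `reference`.
    """
    if not 1 <= first_aligned <= last_aligned <= len(reference):
        raise ValueError("aligned range lies outside the reference protein")
    n_terminal = reference[:first_aligned - 1]
    c_terminal = reference[last_aligned:]
    return n_terminal + fragment + c_terminal


if __name__ == "__main__":
    # Placeholder sequences: a 135-residue reference and a fragment
    # covering residues 16 to 131, mirroring the numbers in the text.
    reference = "A" * 135
    fragment = "G" * 116
    mixed = build_mixed_protein(reference, fragment, 16, 131)
    print(len(mixed), mixed[:15], mixed[-4:])
```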

The P. yoelii yoelii genome is realigned with this mixed protein using Exonerate. The Exonerate results show an almost perfect alignment, but once again they do not show the possible alignment between the selenocysteine and the stop codon, so it cannot be stated definitively that Plasmodium yoelii yoelii has a stop codon at that position. This result points instead to an intron that prevents the alignment from reaching the end.

Next, the genomic region obtained from Exonerate is checked by hand. The nucleotide sequence is extended at the beginning (the alignment starts at amino acid 16) and also at the end (220 more nucleotides), because the Plasmodium falciparum Sel4 intron is known to be 180 nucleotides long. This sequence is compared once more with P. falciparum Sel4 using GeneWise and Exonerate. Neither the GeneWise nor the Exonerate results are satisfactory, because the alignment again ends at amino acid 130.

As an alternative analysis, a region of 590 nucleotides of the reverse sequence is selected and translated using the Expasy server. Among the results, 5'3' frame 1 contains the very beginning of the protein: it starts with methionine, and this is confirmed by a ClustalW alignment between frame 1 and the Plasmodium yoelii yoelii Sel4 selenoprotein reported in the scientific papers consulted. That protein appears in a figure, so it was transcribed directly into FASTA format. These results suggest that an intron is probably present, because the final part does not match. Moreover, the ClustalW alignment of frame 3 with the same Sel4 shows a region, downstream of the one matched by frame 1, in which 9 amino acids align perfectly. It is important to highlight that at the penultimate of these nine positions a U is aligned with a stop. From all this it can be concluded that the gene encodes a selenoprotein and that it has an intron that changes the reading frame of the second exon.

Since neither the GeneWise nor the Exonerate results are satisfactory, the intron sequence is searched for by hand. The intron obtained is consistent with the protein sequence, but the standard donor and acceptor sites could not be located.

Finally, a SECIS element is sought in the 3'UTR region of Sel4. As the gene lies on the reverse strand of the contig, the reverse complement of the sequence is computed before submitting it to SECISearch. The SECISearch results are satisfactory: the score is 22.87, which, compared with the recommended threshold (15), indicates a good prediction.

Finally, the nucleotide sequence of the gene is shown together with its structure and the SECIS element.

With all these results it is possible to state that Plasmodium yoelii yoelii has selenoprotein 4 in its genome.

Gene information

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY02554|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: reverse sequence from nucleotide 1845 to nucleotide 1352.

GFF results: cannot be produced because neither GeneWise nor Exonerate predicts the gene correctly.

Amino acid sequence

Results of the alignment between both proteins

Sketch of Sel 4 and its SECIS element

Plasmodium yoelii yoelii Selenoprotein Synthesis Machinery

Eukaryotic elongation factor (EFSec)

Gene obtention process:
The first step is to look for EFsec sequences in other organisms in order to run a homology search against the P. yoelii yoelii genome. The Homo sapiens EFsec sequence is downloaded from the SelenoDB database. The sequence of the Plasmodium falciparum EFsec protein is obtained from the NCBI database.

Next, each of the two protein sequences is aligned against the P. yoelii yoelii genome. Both alignments give good results; however, as expected from the evolutionary proximity, the results are better for P. falciparum. Specifically, the alignment with H. sapiens falls in contig MALPY00118 of P. yoelii, with an e-value of 2e-33 and a score of 140. The best alignment with P. falciparum falls in the same contig, with an e-value of 3e-49 and a score of 193. Because the best results are obtained with P. falciparum, the rest of the analysis is based on this alignment.

The next step is to select the region of the contig that contains the aligned nucleotides (10190 to 9489) and to extend it at both ends. The region selected runs from nucleotide 8000 to nucleotide 12000 of the contig (the range within which the previous alignments are found). Exonerate is then used to find the gene structure by aligning this P. yoelii yoelii subsequence with the P. falciparum EFsec protein. The Exonerate results for EFsec show five successive alignments that could be exons of a single gene, but they are not reported as such.

The same operation is performed with GeneWise; since the gene is on the reverse strand, the reverse complement of the original sequence has to be used. The GeneWise results for EFsec indicate that the gene consists of a single exon, and they also provide the protein and DNA sequences.

Finally, the nucleotide sequence of EFsec is located in contig MALPY00118; the gene starts at nucleotide 8034 and ends at nucleotide 10190.

Gene information:

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY00118|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: reverse sequence from nucleotide 10190 to nucleotide 8034.

GFF results

Amino acid sequence:

Result of the alignment between both proteins

Sketch of EFSec:

Selenophosphate Synthetase (SPS)

Gene obtention process
In the case of SPS, the initial search for information showed that SPS was already annotated as a putative P. yoelii protein. The next step was to locate this protein in the P. yoelii yoelii genome. The TBLASTN alignment between the putative SPS protein and the genome was perfect, with an e-value of 0.0 and a score of 1473. The protein is thus annotated in contig MALPY01772, on the reverse complementary strand, starting at nucleotide 5337 and ending at nucleotide 2353.

GeneWise is then run with the putative SPS protein and the P. yoelii genome. The results show that the gene consists of two exons, and from them the final diagram of SPS is obtained.

This protein is known to correspond to SPS. However, it remains unknown whether it is SPS1 or SPS2. For this reason, a multiple alignment with the Anopheles gambiae SPS proteins is performed.

This alignment shows the great similarity between SPS1 and SPS2 of Anopheles gambiae, and that the U of SPS2 coincides in the alignment with the R of SPS1. For this reason, it is difficult to decide whether the protein analyzed is SPS1 or SPS2. SPS2 commonly contains selenocysteine, and that is not the case here, which would suggest that the putative protein is SPS1. Nevertheless, according to current data, SPS2 is essential for selenoprotein synthesis and its function cannot be replaced by SPS1. So, since selenoproteins have been found in the genome, this protein should be SPS2.

On the other hand, given the similarity between SPS1 and SPS2 in eukaryotes (in contrast to the single SPS found in prokaryotes), it is believed that at some point in evolution the gene was duplicated, giving rise to SPS1 and SPS2. Although the organism analyzed is a eukaryote, the protozoan branch was one of the first to diverge, so the duplication may have taken place in the eukaryotic branch after Plasmodium had diverged. This could explain why it is not possible to identify two SPS proteins.
Gene Information

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY01772|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: reverse sequence from nucleotide 5337 to nucleotide 2353.

GFF Results

Amino acid sequence

Sketch of SPS

SECIS binding protein (SBP)

Gene Obtention Process
To find out whether the SECIS binding protein (SBP), part of the selenoprotein synthesis machinery, is present in the Plasmodium yoelii yoelii genome, the TBLASTN alignments between the Plasmodium yoelii yoelii genome and the Homo sapiens protein taken from the SelenoDB database were analyzed first.
These TBLASTN alignments gave two results, although the analysis focused only on the first, which was the more remarkable. It is an alignment with contig MALPY00816, with a score of 44 and an e-value of 4e-04. That is why the analysis continues with this contig.

Although the e-value is good, the protein is not aligned with the contig over its whole length: only amino acids 631 to 740 are aligned. The contig of interest was then selected and used to realign the sequences with Exonerate.

The Exonerate results give more information than the TBLASTN results, since a larger part of the query protein is aligned. However, a problem arises that cannot be solved. The alignment begins at amino acid 106 of Homo sapiens SBP2, which is aligned with nucleotide 66 of the Plasmodium yoelii yoelii contig. This means that the sequence at the start of the contig is not long enough to encode the first 106 or so amino acids of SBP2. Therefore, because the genome has not been fully annotated yet and is still divided into contigs, the SBP2 gene is truncated. Until the genome is annotated, the nucleotide sequence of this gene cannot be characterized.

Gene Information

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY00816|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

tRNASec

Gen obtention process

The sequence of Plasmodium yoelii yoelii tRNASec has been obtained from the web page plasmodb.org . This tRNA is necessary and indispensable for the selenoproteins synthesis.
The nucleotide sequence is as follows:
GCACCGATGAGTTAGCATGGTTGCTAAAGATGACTTCAAATCATTTGGTGTAGGTCACTGCACAGAGGTTCGATTCCTCCTTCGGTGCG
Because the tRNA sequence is available from the start, a BLASTN alignment (nucleotide against nucleotide in this case) is performed to check whether the PlasmoDB result is correct.

The BLASTN results meet expectations: the best alignment is located in contig MALPY00618, with an e-value of 8e-45 and a score of 176. Moreover, all 89 query bases (the tRNASec nucleotide sequence) are aligned.
Gene information

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY00618|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: forward sequence, from nucleotide 2292 to nucleotide 2380.

Amino acid sequence: not relevant in this case, because the tRNA is not translated into protein.
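As a quick consistency check, the length of the tRNASec sequence can be compared with the span of the reported coordinates. The short Python sketch below uses the sequence and positions given above; the helper name `span_length` is an illustrative assumption.

```python
# tRNASec sequence as reported above (from PlasmoDB).
TRNA_SEC = "GCACCGATGAGTTAGCATGGTTGCTAAAGATGACTTCAAATCATTTGGTGTAGGTCACTGCACAGAGGTTCGATTCCTCCTTCGGTGCG"


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval, in either orientation."""
    return abs(end - start) + 1


# The query has 89 bases and the hit covers positions 2292 to 2380 of the contig.
print(len(TRNA_SEC))            # prints 89
print(span_length(2292, 2380))  # prints 89
```

Both values agree, which is what a full-length, gap-free alignment of the query would give.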

Sketch of tRNASec

Secp43

Gene obtention process
To search for the machinery protein Secp43 in the Plasmodium yoelii yoelii genome, the first step is to obtain the Secp43 protein of Drosophila melanogaster and align it against the genome with TBLASTN. The Drosophila melanogaster Secp43 protein is used because no Secp43 is available in the selenodb.org database.

The TBLASTN alignments give positive results. Two contigs show good hits, and both deserve a closer look: the alignment with contig MALPY01249 has an e-value of 7e-10 and a score of 61, and the alignment with contig MALPY01712 has an e-value of 9e-10 and a score of 61.

Note that in both contigs the alignment is on the reverse sequence. Since two different contigs align with similar significance, it is possible that Secp43 is duplicated in the genome. To test this hypothesis, the two contigs are analyzed separately from here on.

MALPY01249 contig analysis

The alignment between Drosophila melanogaster Secp43 and the genome starts at position 4843, ends at position 4310 and has an e-value of 7e-10. The next step is to cut out the region of interest and realign it using GeneWise and Exonerate. The GeneWise results show a single exon that aligns with exactly the same region found in the TBLASTN analysis. The Exonerate results do not show any gene prediction.
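Cutting out a region that lies on the reverse sequence, such as positions 4843 to 4310 above, amounts to slicing the contig and taking the reverse complement. The following Python sketch shows the idea on a toy contig; the function names and the toy sequence are illustrative assumptions and do not reproduce the tools actually used in this work.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(contig: str, start: int, end: int) -> str:
    """Extract a 1-based inclusive region from a contig.

    If start > end, the region is taken from the reverse strand,
    as in TBLASTN hits reported with descending subject coordinates.
    """
    if start <= end:
        return contig[start - 1:end]
    return reverse_complement(contig[end - 1:start])


# Toy contig (hypothetical sequence, for illustration only).
toy_contig = "ATGCCGTAAG"
print(extract_region(toy_contig, 2, 5))  # prints TGCC (forward strand)
print(extract_region(toy_contig, 5, 2))  # prints GGCA (reverse strand)
print(reverse_complement("GAT"))         # prints ATC
```

With the real contig loaded as a string, `extract_region(contig, 4843, 4310)` would return the reverse-strand region covered by the TBLASTN hit.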

Next, the gene is predicted by hand. In the GeneWise results the alignment starts at amino acid 8 of Drosophila melanogaster Secp43, so the initial methionine has to be searched for. The sequence that would correspond to the first seven amino acids of Drosophila melanogaster, which GeneWise did not predict, is indicated in yellow. As can be observed, neither the first triplet nor any other triplet in this area encodes a methionine, so this area may contain a splice acceptor site.

To define the splice acceptor site, the domains of Secp43 are analyzed with Pfam (a resource for predicting protein domains). The result for Drosophila melanogaster Secp43 shows two RNA recognition motifs.

When the same domain analysis is repeated on the hand-predicted Secp43, the results for Plasmodium yoelii yoelii Secp43 are obtained. There are again two RNA recognition motifs, but the first one is incomplete and only its final part can be observed. This is due to the predicted splice site. From the Pfam alignment it can be concluded that the first amino acid affected by splicing is D (aspartic acid), encoded by GAT on the reverse sequence, that is, paired with CTA on the forward strand (ATC when the forward strand is read 5' to 3'). The C of the CTA triplet takes part in the splice acceptor site, so after splicing the coding region will end with this C (the CTA triplet is formed when the two coding regions are joined). Note that the 24 amino acids at the start of the first domain cannot be predicted, because there is no information about where the intron begins.

To conclude, the ATG codon that initiates translation of the Plasmodium yoelii yoelii Secp43 protein cannot be predicted because of the splice site, but the stop codon is known. The gene must have more than one exon, because one splice site is known. Additional exons could exist, but this is unlikely, since only the region encoding the first 24 amino acids is missing.

MALPY01712 contig analysis

The alignment between Drosophila melanogaster Secp43 and the genome starts at position 3620, ends at position 3126 and has an e-value of 9e-10. The next step is to cut out the region of interest and realign it using GeneWise and Exonerate. GeneWise, run on the reverse sequence of the region of interest, shows one exon that aligns with the same region found in the TBLASTN analysis. Exonerate, run on the same region, does not predict any gene.

In the GeneWise results the alignment starts at amino acid 8 of Drosophila melanogaster Secp43, so the methionine start codon is searched for by hand and the gene prediction is completed manually.

In this case the initial methionine can be predicted: it is found nine amino acids before the coding region predicted by GeneWise. Consequently, the Plasmodium yoelii yoelii Secp43 encoded in contig MALPY01712 has two more amino acids in this first region than its Drosophila melanogaster counterpart. Even though the start codon is found, the domains of Secp43 are analyzed with Pfam using the hand-built sequence. The Pfam results show four RNA recognition motifs. However, the third one is interrupted and only its first part is shown, probably because of a tiny intron.

Gene information

Secp43 is duplicated in Plasmodium yoelii yoelii

Secp43 in MALPY01249 contig:

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY01249|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: only the last part of the protein is known; it runs from nucleotide 4755 to nucleotide 3308.

Amino acid sequence

Sketch of Secp43 in this contig

Secp43 in MALPY01712 contig:

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY01712|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: reverse sequence, from nucleotide 3647 to nucleotide 1146.

Amino acid sequence

Sketch of Secp43 in this contig.

Alignment between the two predicted Secp43

SLA/LP

Gene obtention process
To find out whether the machinery protein SLA/LP is present in the Plasmodium yoelii yoelii genome, the first step is to obtain the Drosophila melanogaster SLA/LP and align it against the Plasmodium yoelii yoelii genome. The comparison is made with Drosophila melanogaster because no SLA/LP protein from any other organism is available.

The TBLASTN results are satisfactory, particularly for contig MALPY01707, which deserves a more detailed analysis. This hit has an e-value of 2e-66 and a score of 232. Note that the alignment is on the reverse sequence.

In this hit, part of the Drosophila melanogaster SLA/LP aligns with the P. yoelii genome from nucleotide 8571 to nucleotide 7430 of the contig. GeneWise, run on this subsequence, shows two exons. Exonerate gives a similar result, also with two exons.

From this point on, the prediction is made by hand. The start codon cannot be determined, because no ATG codon is found within the 60 amino acid positions upstream of the coding region predicted by Exonerate. This area probably contains a splice acceptor site, so the beginning of the SLA/LP protein lies further upstream in the genome (because the genome is not annotated and is only available as contigs, the beginning of the protein cannot be found).

A possible stop codon was predicted by following the reading frame of the second exon. However, the sequence after the second exon is not conserved at all with Drosophila melanogaster SLA/LP, so it is not certain that this is the real stop codon. There is probably another splice site and another exon downstream.

Gene information

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY01707|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: reverse sequence, from nucleotide 8571 to nucleotide 6818.

Amino acid sequence

Alignment between the predicted protein and Drosophila's protein

Sketch of SLA/LP in this contig

Homologous Proteins to Plasmodium yoelii yoelii Selenoproteins

Thioredoxin reductase (TR)

Gene obtention process
To determine whether the selenoprotein thioredoxin reductase is present in the Plasmodium yoelii yoelii genome, we first analyzed the TBLASTN alignments carried out with all the thioredoxin reductases available in the SelenoDB database.
The TBLASTN alignments gave satisfactory results. The alignments with the highest scores and lowest e-values were selected and are shown in the following table:

The different thioredoxin reductases always align with contig MALPY00659 of the Plasmodium yoelii yoelii genome (specifically, with its reverse sequence). Furthermore, the "subject start" and "subject end" positions are approximately the same in every case, so we can conclude that Plasmodium yoelii yoelii has only one thioredoxin reductase.

The alignment with the highest score and the lowest e-value corresponds to Homo sapiens thioredoxin reductase 2, so this protein is used in the further analysis.

In this alignment between Plasmodium yoelii yoelii and Homo sapiens, despite the high score and low e-value, the query sequence is not fully aligned with the P. yoelii genome: the alignment starts at amino acid 39 and ends at amino acid 515 (the human protein has 534 amino acids). Because the selenocysteine of human TR2 is in the penultimate position and is therefore not included in the alignment, the contig region containing the aligned nucleotides (13333-11887) was selected and extended. The resulting region covers nucleotides 11000-14000 and is used as input for GeneWise to realign the sequences (although the main purpose of this application is to reveal the exonic structure of the gene).

The GeneWise results give more information, since the query protein is aligned from its beginning and the final selenocysteine is included. However, because the last part of the alignment is not very good, we checked whether P. yoelii yoelii has a cysteine or a selenocysteine at the position of the selenocysteine in the human protein. To do so, the region of interest of the contig was translated into amino acids with the Expasy web page, after choosing the appropriate reading frame.

We then examined the amino acids aligned by GeneWise and predicted the complete protein sequence. The predicted amino acid sequence shows that the protein has a cysteine at this position, which means that the Plasmodium yoelii yoelii TR is not a selenoprotein but a cysteine-containing homolog of one. Furthermore, the Plasmodium yoelii yoelii TR has five more amino acids at its end than the human TR.
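The check described above, looking at which residue of the predicted protein sits opposite the selenocysteine of the query, can be sketched in Python once the two sequences are available as gapped alignment strings. The function name and the toy alignment below are hypothetical and only illustrate the idea; they are not the real TR sequences.

```python
def residues_opposite_sec(aligned_query: str, aligned_target: str) -> list:
    """For each selenocysteine (U) in the query, return (query_position, target_residue).

    Both arguments are rows of the same pairwise alignment, with '-' for gaps.
    Query positions are 1-based and count only non-gap query residues.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    found = []
    query_pos = 0
    for q, t in zip(aligned_query, aligned_target):
        if q == "-":
            continue  # gap in the query: no query residue at this column
        query_pos += 1
        if q == "U":
            found.append((query_pos, t))
    return found


# Toy alignment (hypothetical sequences): the target has C opposite the query U
# and carries two extra residues at its end.
query_row = "MKT-GCUG--"
target_row = "MRTAGCCGKK"
print(residues_opposite_sec(query_row, target_row))  # prints [(6, 'C')]
```

A result of `'C'` corresponds to a cysteine homolog, `'U'` would correspond to a selenoprotein, and `'-'` would mean the position is not covered by the alignment.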

To check that the predicted Plasmodium yoelii yoelii protein is correct, TBLASTN was run from the NCBI web page, aligning the predicted protein against all the available mRNAs from the genus Plasmodium. The results are very interesting: the best hit belongs to the Plasmodium yoelii yoelii TR, which means that this protein has been characterized since 21 April 2006. However, the mRNA is not aligned in full, which shows that the real TR is longer than the prediction. The initial methionine is not the predicted one, but the final glycine is indeed the predicted one.

With the real protein found at NCBI and our Plasmodium yoelii yoelii contig MALPY00659, we ran the Exonerate program to find the exact nucleotide region and to confirm the exonic structure of the gene, which is already annotated. The Exonerate results give the exact position of the TR nucleotide sequence in Plasmodium yoelii yoelii.

Gene information

Localization: Plasmodium_yoelii_yoelii_str._17XNL|MALPY00659|2002-09-10|ds-DNA|Plasmodium_yoelii_TIGR

Nucleotide sequence within the contig: reverse sequence, from nucleotide 13744 to nucleotide 11831.

GFF Results

Amino acid sequence

Results of the alignment between both proteins

Sketch of TR

Note:
While performing the alignments, the second-best hit was always located in contig MALPY00317, with a low e-value and a high score, so this case was also analyzed. Homo sapiens thioredoxin reductase 1 was aligned with the isolated Plasmodium yoelii yoelii contig using GeneWise. The GeneWise result shows an alignment that starts at the first amino acid but does not reach the last one.

The next step is to extend by hand the nucleotide sequence that should correspond to the protein. With this predicted protein, a TBLASTN search is performed at NCBI against all the mRNAs of the genus Plasmodium. The result shows a perfect alignment between the predicted protein and the annotated glutathione reductase of P. yoelii yoelii. Because this protein is not part of the selenoproteome, the analysis is not continued.

New SECIS elements

To find new SECIS elements, the Plasmodium yoelii yoelii genome was split into four parts, and each part was submitted to the SECISearch program. The results obtained are the following:

Fraction 1: SECISearch gives no results for this fragment.

Fraction 2: the following SECISearch results are obtained, with the corresponding picture. The results are quite good, so a new selenoprotein may be present.

To find out whether the SECIS element really corresponds to a new selenoprotein, its sequence is aligned with BLAST against all the Plasmodium genomes available in the BLAST database, in order to check whether the element is conserved in other Plasmodium species. The alignment results show that the SECIS element is conserved in two of the Plasmodium sequences analyzed, so it may indeed belong to a new selenoprotein.

The next step is to locate the SECIS element in the genome and then to find the TGA codon that would encode the selenocysteine. The SECIS element turns out to lie at the very beginning of a contig in the multi-FASTA file. This makes it impossible to find the selenoprotein, because there is not enough upstream nucleotide sequence to encode it. Moreover, the candidate TGA codons are too close to the SECIS element to be the ones that encode selenocysteine. Once again, the fact that the genome is divided into contigs and is not annotated hinders the work and, in this case, makes it impossible to predict the gene.

Fraction 3: the following SECISearch results are obtained for this fragment, with the corresponding picture.

As for the previous fraction, the sequence of the new SECIS element is aligned with BLAST against all the Plasmodium genomes in the BLAST database. The results show no conservation in any of the Plasmodium species analyzed, so it is very unlikely that this region belongs to a new selenoprotein.

Fraction 4: the following SECISearch results are obtained for this fragment, with the corresponding picture.

Once again, the sequence of the new SECIS element is aligned with BLAST against all the Plasmodium genomes. The results are the same as for fraction 3: there is no conservation in any other Plasmodium species, so it is very unlikely that this region belongs to a new selenoprotein.
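The fragmentation of the genome into four parts, mentioned at the start of this section, could be done in several ways. The Python sketch below shows one possibility, assuming that the multi-FASTA file is split on contig boundaries into four groups of roughly equal size; the function names are illustrative and the actual procedure used in this work may have been different.

```python
def read_fasta(text: str) -> list:
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_records(records: list, n_parts: int = 4) -> list:
    """Split records into n_parts contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(records[start:start + size])
        start += size
    return parts


# Toy multi-FASTA with five hypothetical contigs.
toy_fasta = ">c1\nAC\nGT\n>c2\nGG\n>c3\nTT\n>c4\nAA\n>c5\nCC\n"
fractions = split_records(read_fasta(toy_fasta), 4)
print([len(part) for part in fractions])  # prints [2, 1, 1, 1]
```

Splitting on contig boundaries keeps every contig intact, so a SECIS element is never cut in two by the fragmentation itself.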

Proteins not found in P. yoelii yoelii genome

All the machinery proteins required to synthesize selenoproteins were downloaded from the SelenoDB database and used to run TBLASTN against the contigs of the Plasmodium yoelii yoelii genome. The proteins that gave a positive result have been presented above. Alignments were ruled out when their e-value was too high and their score too low.
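The filtering criterion can be illustrated with a small Python sketch. It assumes that the hits are saved in the 12-column BLAST tabular format (`-outfmt 6` in BLAST+), where the e-value and bit score are the last two columns. The cut-off values, function names and sample rows are hypothetical: no explicit thresholds are stated in this work, and the defaults below were only chosen to be compatible with the hits accepted and rejected in the text.

```python
def parse_tabular_hits(text: str) -> list:
    """Parse BLAST tabular output (12 default columns) into a list of dicts."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        })
    return hits


def keep_hit(hit: dict, max_evalue: float = 1e-3, min_bitscore: float = 40.0) -> bool:
    """Keep a hit only if its e-value is low enough and its score high enough.

    The default thresholds are assumptions made for this example.
    """
    return hit["evalue"] <= max_evalue and hit["bitscore"] >= min_bitscore


# Two made-up rows: one significant hit and one hit to be discarded.
SAMPLE = "\n".join([
    "\t".join(["queryA", "contigA", "35.0", "110", "60", "5",
               "631", "740", "1200", "1529", "4e-04", "44.0"]),
    "\t".join(["queryB", "contigB", "28.0", "36", "20", "2",
               "10", "45", "300", "407", "4.7", "25.0"]),
])

kept = [hit["subject"] for hit in parse_tabular_hits(SAMPLE) if keep_hit(hit)]
print(kept)  # prints ['contigA']
```

Requiring both conditions at once is a design choice of this sketch; a hit could equally be filtered on the e-value alone.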

Sel 2

One notable Plasmodium falciparum selenoprotein that was not found in Plasmodium yoelii yoelii is Sel 2. We first obtained the amino acid sequence of Selenoprotein 2 from Plasmodium falciparum and, as before, began by aligning the protein against the Plasmodium yoelii yoelii genome with TBLASTN. The results of this alignment are not significant, as the lowest e-value is 4.7. Furthermore, of the 229 amino acids of Plasmodium falciparum Sel2, only 36 align with the Plasmodium yoelii genome.

Although these initial results suggest that the protein is not present in the Plasmodium yoelii genome, we used SECISearch to look for SECIS elements in this sequence. Again, the results are negative: not a single SECIS element was found with any of the options of the program (canonical, non-canonical, ATGA or GTGA).

Given all these negative results and the data currently available, we consider that Selenoprotein 2 is not present in Plasmodium yoelii yoelii.

List of the proteins that showed non-satisfactory results:

- DI (iodothyronine deiodinase)

- MsrA (methionine sulfoxide reductase A)

- GPx (glutathione peroxidase)

Other non-satisfactory results:

Alignment with Homo sapiens selenoproteome

Alignment with Pan troglodytes selenoproteome

Alignment with Mus musculus selenoproteome

Alignment with Tetraodon nigroviridis selenoproteome

Alignment with Drosophila melanogaster selenoproteome

Alignment with Anopheles gambiae selenoproteome

Alignment with Caenorhabditis elegans selenoproteome

Alignment with Saccharomyces cerevisiae selenoproteome

As expected, given the great evolutionary distance between these organisms and Plasmodium yoelii yoelii, many of these alignments are poor.

Conclusions

In this work the Plasmodium yoelii yoelii genome was analyzed in order to identify the selenoproteome of the organism. This was done by searching for homologs of known selenoproteins and by searching for SECIS elements in order to find new selenoproteins. The machinery required for selenoprotein synthesis and other homologous proteins were also searched for.

The genome analyzed in this work is a not fully sequenced Plasmodium yoelii yoelii genome, available as a set of contigs (multi-FASTA format). This added difficulty to the analysis, as contigs are the first step in shotgun sequencing and are therefore genome fragments without any established order. As a result, a protein may be encoded across two different contigs, which makes its annotation impossible. Some fragments may also not have been sequenced at all, so the genes located in those regions cannot be found. This was a frequent problem during this work: in many cases the sequences of interest were located right at the beginning or end of a contig, and we could not continue the analysis.

Three of the four Plasmodium falciparum selenoproteins were found in the genome of Plasmodium yoelii yoelii. This shows the great similarity between Plasmodium genomes and the evolutionary closeness of the species of this genus. Furthermore, it also strengthens the hypothesis that selenoproteins have an important function. It should be highlighted that the lifestyle of this species, which spends a large part of its existence in the bloodstream, requires a system to cope with the large amount of free radicals found in the blood as a result of oxidative metabolism.

Once the selenoproteins had been found in the genome of our organism, we expected to find the components required for their synthesis as well. Thanks to the homology alignments with known proteins involved in selenocysteine synthesis, we could find the following components of the selenoprotein synthesis machinery in the Plasmodium yoelii genome: EFSec, SPS, SBP, tRNASec, Secp43 and SLA/LP.

Regarding the search for new selenoproteins, three new potential SECIS elements were found, only one of which is conserved in other Plasmodium species. We therefore ruled out the other two, considering that their lack of conservation in other Plasmodium species makes it unlikely that they belong to real selenoproteins. The conserved potential SECIS element was analyzed without success, as it is located right at the beginning of the contig, which makes it impossible to go upstream in the sequence to find the suspected selenoprotein.

The analysis of the Plasmodium selenoproteome is of special interest because it is a possible target for new drugs against malaria. Selenoproteins are known to play a role in redox metabolism, so an organism that has selenoproteins is more resistant to oxidative stress. A new strategy could therefore be to interfere with this metabolism to kill the parasite: because Plasmodium has unique selenoproteins, these proteins could be targeted without the danger of damaging the host organism. It is also important to note that two of the selenoproteins are located in the apicoplast, an organelle that is not present in humans and that could also represent a therapeutic target.
