SELOPROTEIN I | SELOPROTEIN N | SELOPROTEIN O |
Selenoprotein I
Mapping:
We obtain the transcript report of this selenoprotein from the ensembl database.
The genomic location is on chromosome 2p23.3, between the 26480633 and 26530402bp.
The gene contains 10 exons that are transcribed into 8093bp on mRNA and the protein result is composed by 397 residues. The selenocysteine is located in the residue 387. SELi mRNA was detected in a variety of tissues and cell types.
All the information about the mapping is resumed in this gff.
Finding SNPs:
We can extract from the transcript report all the SNPs located in the sequence, where they are and which kind of nucleotide change makes a polymorphism.
We have found so many SNPs but we are only interested in those which affect the SecIS element, the SNPs that are in the selenoprotein or close to it.
We obtained a complete list of all the SNPs that mostly are contained in introns.
There are two SNPs affecting the protein: a T or G in the residue 102, and a T or C in the residue 289, but both are synonymous changes so the aminoacid maintains conserved. As the change produced is synonymous there is no alteration in the peptide sequence of the selenoprotein and therefore we will not study it.
Residue | SNP ID | SNP type | Alleles | Ambiguity code | Alternate residues |
102 | rs6748996 | Synonymous | T/G | K | - |
299 | rs934280 | Synonymous | T/C | Y | - |
Analyzing SecIS elements structure:
With the 3’UTR sequence obtained thanks to the possibility of extracting data in fasta files, we can use the SECISarch program.
We past the sequence and we obtain the SecIS structure predicted in our sequence. The structure of the SecIS is correct and it has not any strange form or nucleotide pattern.
The program also give us the SecIS sequence, which will be used in the next studies.
Analyzing SecIS elements:
In this step we made use of blastn. As we have explained previously we blast our SecIS sequence with the est_human database.
Analyzing SecIS elements with SNPs:
Now we know the SNP, but we want to analyze how this SNP affects the SecIS structure.
Relation between SecIS sequence and diseases:
The following table sumarises all the information mentioned above.
Figure 9: SecIS structure from Sel I.
The results show the alignment of our SecIS sequence with different database est sequences. The blastn output shows the identifier, the identity percent between the query sequence (our sequence pasted) and subject sequence (est sequences from the database), and the align of both sequences.
This output of the blastn was run in the fndsnp.pl that we have done, and we obtain the following output.
The program selects the aligns that contain an identity percent between 95 and 99%. So we have now a selection of the subject sequences that came from the est_human database.
We can see there is a sequence that contains only one SNP from the output file. Moreover the program produces a list of all SNPs that have been found in all sequences, and it counts how many times each change has occurred.
So we can see which the most habitual SNPs are. In this case, we have only found one SNP at the
position 18 consisting in a T or a gap, counting from the begining of the SecIS sequence.
So we prepared the Obtain_SecIS_sequence program to obtain our query sequence (SecIS sequence of the Sel I) with the SNP on it.
We will find this sequence in the output file.
This output is introduced into the SECISearch program to be able to obtain the SecIS Structures.
The result shows us that the Secis has a good structure and that the SNP does not affect the basic structure or the base-pairing patterns. Only a mutation in this motif: ATGA_GA could affect in an important way the SecIS structure.
ID Est | Origin | SNPs | SecIS Alteration |
gi|11984132|gb|BF698724.1| | 602126262F1 NIH_MGC_56 Homo sapiens cDNA clone | 18,T/- | NO |
No relation has been found between the Selenoprotein I and some kind
of disease in current databases.
As we have seen above, the position where the SNP is located is in the nucleotide 18 counting from the begining
of the SecIS sequence. The SNP produces a T or a gap in this position, as the table shows. This est sequence comes from a
primitive neuroectoderm, and the SecIS element is not altered.
These facts lead us think there is no relationship between selenoprotein I and any
disease.
Selenoproteina N
Mapping:
We have found in the ensembl database two transcripts of this protein.
So we have one trasncrpit report for the first transcript and another one for the second trascript.
The genomic location is on chromosome 1p35-36, between the 25810868 and 25828856bp.
The two transcripts that we have found are two different isoforms and differ in the fact that the transcript 1 contains one more exon (number 3) than the transcript 2, where it has been spliced out.
Moreover a selenocysteine is located in this exon.
So the transcript one contains one more exon and has 2 selenocysteines while the transcript 2 contains one exon less and only one selenocysteine.
Ensembl database contains some errors related to the isoform 2. We find out that the isoform 2 has one selenocystein at the same position as the isoform 1.
In the residue 428 and 429 of the isoform 2, we find a Histidine and a Cysteine instead of a Selenocysteine and a Glycine.
This finding is supported by some pubmed articles (1,2,3), where we have found that isoform 1 has two selenocysteines.
Moreover the first one (located in the exon 3, spliced out in the isoform 2) can code for a codon stop resulting a shorter peptid.
SEPN is present at a high level in several human fetal tissues and at a lower level in adult tissues, including skeletal muscle. Both transcripts were detected in skeletal muscle, brain, lung, and placenta,
but isoform 2 was always predominant.
All the information about the mapping of the transcripts is resumed in this gff .
Finding SNPs:
We can extract from the transcript report all the SNPs located in the sequence, where they are and which kind of nucleotide change makes a polymorphism. We have found so many SNPs but we are only interested in those which affect the SecIS element, the SNPs that are in the selenoprotein or close to it. We obtained a complete list of all the SNPs from isoform 1 and isoform 2.
Residue | SNP ID | SNP type | Alleles | Ambiguity code | Alternate residues |
106 | 1132855 | Non-synonymous | A/G | R | A, T |
142 | 7349185 | Non-synonymous | G/A | R | C, Y |
391 | 760597 | Synonymous | T/C | Y | - |
443 | 12023720 | Synonymous | G/A | R | - |
502 | 2294228 | Non-synonymous | C/A | M | N, K |
Residue | SNP ID | SNP type | Alleles | Ambiguity code | Alternate residues |
108 | 7349185 | Non-synonymous | G/A | R | C, Y |
357 | 760597 | Synonymous | T/C | Y | - |
409 | 12023720 | Synonymous | G/A | R | - |
468 | 2294228 | Non-synonymous | C/A | M | N, K |
None of the SNPs found are in the SecIS structure, close the selenocysteine,
or affecting this aminoacid, therefore we are not interested on them.
Based on fact that two isoforms come from the same gene but by alternative splicings, both isoforms
should contain the same aminoacid sequence but one of them containing one more exon.
As we have said before there is a difference between two isoforms in the aminoacid sequence that
they share.
Surprisingly, ensembl does not find any SNP in the codons of the two aminoacids that differ from one
isoform to the other one. This means that ensembl database contains errors or unespicified facts.
Analyzing SecIS elements structure:
With the 3’UTR sequence obtained thanks to the possibility of extracting data in fasta files, we can use the SECISearch program. We past the sequence and we obtain the SecIS structure predicted in our sequence. The program shows the different SecIS structure elements, they are correct and the base-pairing patterns seem to be also correct.
Analyzing SecIS elements:
The output of the blastn was run in the findsnp.pl program that we have prepared, and we obtain the output.
Analyzing SecIS elements with SNPs:
To evaluate the different SNPs that we have met, we use Obtain_SecIS_sequence program to obtain our query sequence (SecIS sequence of the Sel N) with one polymorphism in it. With this program we obtain a list of all the sequences each one with a SNP in an output.
Figure 11: SecIS structure from Sel N.
As we have made with the selenoprotein I, we obtain a list with the different sequences with the SNPs and also a number of the different SNPs met.
We found 10 sequences with SNPs from 99 Blast hits on the Query Sequence; we counted 16 different SNPs, but only appearing once.
With this output we go to the SECISearch and we obtain the different SecIS structures each one for each sequence introduced.
The SNPs seems not affect the structure of the SecIS element, all seems to be correct and no alteration will appear.
SNP | Frequence | Affects Structure? | SNP | Frequence | Affects Structure? |
12,C/G | 1 | NO | 52,T/A | 1 | NO |
14,G/T | 1 | NO | 57,C/A | 1 | NO |
14,-/C | 1 | NO | 61,-/C | 1 | YES?? |
16,-/T | 1 | NO | 70,-/T/C/A/G | 1 | NO |
17,-/G | 1 | NO | 72,T/C | 1 | NO |
21,G/A | 1 | NO | 73,-/G | 1 | NO |
25,C/- | 1 | NO | 75,-/A | 1 | NO |
43,A/G | 1 | NO | 82,A/C | 1 | NO |
Only the SecIS from the est with the id: gi|18797332|gb|bm556239.1|,
can present some kind of problems because of the SNP located in the residue 61 from the begining
of the secis sequence (61-c: 1 87), next to the important motif described above. This nucleotide
does not match with anything, therefore this fact can produce an alteration of secis function and
the selenoprotein in general.
Figure 12: Secis structure from Sel N adding a C at position 61.
With the Obtain_SecIS_sequence program2 that we have prepared,
we can obtain each subject sequence with all the SNPs that it contains. We have sequences with only one SNP but we have others with more than one,
and more than one possibility of nucleotide change due to the iupac ambiguity code where each letter can correspond to more than one nucleotide.
So with this program we obtain the sequences chosen from the blast output with all the SNPs that contain.
Now with this output file of our third program we use again the SECISearch program to analyze the SecIS structures of these sequences.
ID Est Origin SNPs Secis Alteration
gi|15164992|emb|AL601486.1| DKFZp313N1642_r1 313 (synonym: hlcc2) Homo sapiens cDNA clone DKFZp313N1642 5' 70,-/A/C/G/T
NO
gi|18797332|gb|BM556239.1| AGENCOURT_6544460 NIH_MGC_88 Homo sapiens cDNA clone IMAGE:5550208 5' 61, -/C YES?
gi|16771388|gb|bm042121.1| 603615746F1 NIH_MGC_112 Homo sapiens cDNA clone IMAGE:5420674 5' 12, C/G; 43, A/G NO
gi|10160532|gb|be746540.1| 601580105F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3928800 5' 57, C/A; 73, -/G NO
gi|14294011|gb|bg913535.1| NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4943421 5' 14,G/T;16,-/T;17,-/G NO
gi|845754|gb|r71722.1| yj85g05.r1 Soares breast 2NbHBst Homo sapiens cDNA clone IMAGE:155576 5' 25,C/- NO
gi|16523466|gb|bm009112.1| 603629475F1 NIH_MGC_41 Homo sapiens cDNA clone IMAGE:5434747 5' 75,-/A NO
gi|22694159|gb|bu180175.1| GENCOURT_8107027 NIH_MGC_112 Homo sapiens cDNA clone IMAGE:6267459 5' 21,G/A; 52,T/A NO
gi|19362988|gb|bm912609.1| GENCOURT_6612214 NIH_MGC_41 Homo sapiens cDNA clone IMAGE:5474167 5' 14,-/C; 82A/C NO
gi|10204277|gb|be783079.1| 601470665F1 NIH_MGC_67 Homo sapiens cDNA clone IMAGE:3873688 5' 72,T/C NO
The result shows us different SecIS structures each one to each sequence. All of them seem to be normal SecIS, and none has alterations
in the important motif (ATGA_GA ) for the SecIS structure. Moreover the score of all the SecIS structures calculated by the SECISearch program are very solid.
Relation between SecIS sequence and diseases:
All the est sequences found in the blast search hits
come from RNA of no altered cells, so the sequence that we have mentioned above
that could implicate an alteration in the secis structure has no relevance.
The association between selenium deficiency and muscular dystrophy in
livestock suggests a role for selenium in the pathophysiology of striated muscles.
Selenoprotein N has been associated to muscular dystrophy, causing congenital
muscular dystrophy with spinal rigidity and restrictive respiratory syndrome by mutations
in SEPN1.
Selenoprotein N is high expressed in cultured myoblasts, and is also downregulated
in differentiating myotubes, suggesting a role for SEPN1 in early development and in cell
proliferation or regeneration.
It is one the first descriptions of the relation of a selenoprotein implicated in a human
disease.
According to OMIM, there are many different allelic variants associated to the SEPN
and to the spine muscular dystrophy. Two of the allelic variants are related to changes
in a selenocysteine.
One for a T-to-G transversion at nucleotide 1384 in exon 10 of the SEPN1 gene, resulting
in a sec462-to-gly (U462G) substitution. The second is a G-to-A transition at nucleotide 1385
of the SEPN1 gene, in this case the mutation changes the selenocysteine (sec) codon (TGA) into
a stop codon (TAA), preventing selenocysteine incorporation and leading to a shorter protein.
Selenoproteina O
Mapping:
The genomic location is on chromosome 22q13.33, between the 48941865 and 48958501bp.
The gene contains 9 exons that are transcribed into 2314bp on mRNA and the protein result is composed by 669 residues. The selenocysteine is located in the residue 667.
SELO mRNA was detected in a variety of tissues and cell type.
All the information about the mapping is resumed in the gff.
Finding SNPs:
We can extract from the transcript report all the SNPs located in the sequence, where they are and which kind of nucleotide change makes a polymorphism. We have found so many SNPs but we are only interested in those which affect the SecIS element, the SNPs that are in the selenoprotein or close to it. We obtained a complete list of all the SNPs.
Residue | SNP ID | SNP type | Alleles | Ambiguity code | Alternate residues |
3 | 5771225 | Non-synonymous | T/C | Y | A, V |
21 | 5771226 | Synonymous | T/C | Y | - |
57 | 6010201 | Synonymous | C/G | S | - |
105 | 3747942 | Synonymous | T/C | Y | - |
167 | 2272846 | Non-synonymous | C/A | M | T,N |
216 | 5771231 | Non-synonymous | G/T | K | S, A |
630 | 2272852 | Non-synonymous | G/A | R | K,E |
638 | 17013238 | Non-synonymous | G/A | R | K, E |
640 | 9617118 | Synonymous | G/A | R | - |
658 | 1052827 | Synonymous | C/T | Y | - |
669 | 1052832 | Non-synonymous | C/T | Y | S,L |
None of the SNPs found are in the SecIS structure, or close the selenocysteine, so they are not relevant for our project.
Analyzing SecIS elements structure:
With this selenoprotein we have had some problems in obtaining the structure of the SecIS element. SECISearch program used with the other selenoproteins can not be used now in this case because the program does not find any secis structure. Our tutor gave us a new program that detects the secis sequence in the 3'UTR sequence of the selenoprotein with a different pattern, evaluates the thermodynamic energy and shows at the end the secis structure. RNAFold and patscan program are also needed to run this program. We obtained the SecIS structure predicted in our sequence. The program shows the SecIS structure for our sequence and it seems to be correct and nothing strange neither in the structure nor in the base-pairing pattern.
Figure 13: Secis structure from Sel O.
Analyzing SecIS elements:
The output of the blastn was run in the findsnp.pl program that we have done, and we obtain the output. As we have made with the other two selenoproteins we obtain a list with the different sequences with the SNPs and also a number of the different SNPs met.
Analyzing SecIS elements with SNPs:
Thanks to Obtain_SecIS_sequence program,
we can prepare a list of all the sequences only with one SNP in an output.
Going to the SECISearch we obtain the different SecIS structures each one for each sequence introduced that contains one SNP.
We found 14 sequences with SNPs from 59 Blast hits on the Query Sequence; we counted 22 different SNPs, some appearing once and
the other more than one time. If some SNPS appear more than once in tissues we can expect
there is a relationship between these two facts.
SNP | Frequence | Afects Structure | SNP | Frequence | Afects Structure |
4,C/A | 2 | NO | 72,C/T | 5 | NO |
5,C/T | 1 | NO | 73,-/G | 1 | NO |
39,-/T | 1 | YES | 87,-/T | 1 | NO |
39,G/A | 3 | YES | 86,G/A | 3 | NO |
45,T/A | 1 | NO | 89,G/A | 1 | NO |
53,G/C | 1 | YES | 90,T/A | 1 | NO |
56C/- | 1 | NO | 94,G/A | 1 | NO |
66,G/T | 3 | NO | 95,C/_ | 1 | NO |
68,C/T | 3 | NO | 98,G/C/T/A | 1 | NO |
70-/C | 1 | NO | 100,T/A | 1 | NO |
71,T/A | 1 | YES |
We have not been able to obtain the different images from each SecIS structure due to
an informatic problem, but we have obtained the SecIS sequence and we are able to know if the
SecIS structure can occur or not. We have analyzed 24 different sequences, but we only have
results for 20 of them, so the SecIS element is only viable in 20 of them. These 4 sequences that
seem not to have a SecIS element can have some alterations in their sequence affecting the
matches necessaries to form the shape of the SecIS element. For instance, these alterations can
be in the relevant motif (ATGA_GA) or we can find an unpaired nucleotide.
The third program (Obtain_SecIS_sequence2 program), now applied to Sel O,
makes an output file.
This file sequences with more than one SNP.
Pasting the sequences in the SECISearch program we can analyze the SecIS structures of these sequences.
The result shows us the different secis structures obtained.
We can see there are less secis structures in the result obtained than sequences introduced into the SECISearch program due
to the fact that there should be repeated sequences and also secis structures with no
thermodinamic stability that are not shown in the results of this program.
The SecIS structures are correct and nothing has happened in the structure so they are correct and viable.
ID Est | Origin | SNPs | Secis Alteration |
gi|46569321|emb|bx368938.2| | BX368938 Homo sapiens HELA CELLS COT 25-NORMALIZED cDNA clone CS0DK012YM07 5' | 71,T/A;95,C/A;100,T/A | NO |
gi|10333084|gb|be884308.1| | 601505777F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3907314 5' | 56,C/-; 95,C/- | NO |
gi|6141147|gb|aw137014.1| | UI-H-BI1-acu-d-08-0-UI.s1 NCI_CGAP_Sub3 Homo sapiens cDNA clone IMAGE:2715686 3' | 45,T/A; 53,G/C; 94,G/A | NO |
gi|46554667|emb|bx354872.2| | BX354872 Homo sapiens NEUROBLASTOMA COT 25-NORMALIZED cDNA clone CS0DC026YN23 3' | 4,C/A; 89,G/A; 90,T/A | YES?? |
gi|3694022|gb|ai160642.1| | qc89a02.x1 Soares_pregnant_uterus_NbHPU Homo sapiens cDNA clone IMAGE:1721354 3' | 72,C/T; 86,G/A; | NO |
gi|46304542|emb|bx354948.2| | BX354948 Homo sapiens NEUROBLASTOMA COT 25-NORMALIZED cDNA clone CS0DC027YH03 3' | 4,C/A | NO |
gi|3213814|gb|ai004304.1| | ou56e04.x1 NCI_CGAP_Br2 Homo sapiens cDNA clone IMAGE:1631838 3' | 39,G/A;66,C/T;68,C/T;72,C/T;86,G/A | NO |
gi|20135238|gb|bq102254.1| | ij19a02.x1 Melton Normalized Human Islet 4 N4-HIS 1 cDNAclone IMAGE:6135051 3' | 66,C/T;68,C/T;72,C/T | NO |
gi|46307094|emb|bx349459.2| | X349459 Homo sapiens NEUROBLASTOMA COT 25-NORMALIZED cDNA clone CS0DC027YM11 3' | 70,-/C;73,-/G | YES?? |
gi|5056776|gb|ai735252.1| | at08c08.x1 Barstead aorta HPLRB6 Homo sapiens cDNA cloneIMAGE:2354510 3' similar to SW:YDIU_ECOLI P77649 HYPOTHETICAL 54.4 KD PROTEIN IN AROH-NLPC INTERGENIC | 39,G/A;72C/T | NO |
gi|14168296|gb|bg820709.1| | 602780639F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4931528 5' | 39,-/T; 87,-/T | NO |
gi|6144075|gb|aw139357.1| | UI-H-BI1-ada-g-12-0-UI.s1 NCI_CGAP_Sub3 Homo sapiens cDNA clone IMAGE:2716246 3' | 5,C/T | NO |
gi|5673006|gb|ai934136.1| | wn97e01.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2453784 3' | 39,G/A;66,C/T;68,C/T;72,C/T;86,G/A | NO |
gi|46304039|emb|bx363833.2| | BX363833 Homo sapiens B CELLS (RAMOS CELL LINE) COT 25-NORMALIZED Homo sapiens cDNA clone CS0DL006YE03 3-PRIME | 98,G/A/C/T | NO |
All of these sequences can form a SecIS element.
There are two sequences (gi|46304542|emb|bx354948.2| and gi|46304039|emb|bx363833.2|) that
not have their corresponding image. We know that the SecIS is possible but we do not know
if there is an aberrant conformation.
The four sequences that contain only one SNP mentioned above, appear know with its images.
Therefore we think that only one SNP can alter the SecIS structure, but when we have more
than one SNP, the structure is viable due to the fact that the different changes make up for each
other.
gi|46554667|emb|bx354872.2| 4c 89a 90t | gi|46307094|emb|bx349459.2| 70- 73g |
Figure 14: Secis structure from Sel O.
In this table we show two different cases of SecIS structures from the selenoprotein O.
The first one, corresponding to the identifier
gi|46554667|emb|bx354872.2| 4c 89a 90t, shows a strange SecIS structure. We do not know if it is due to
the program, that shows the real structure that will adopt the sequence in the cell, or that
this conformation is one of the multiple shapes that the sequences can adopt.
The second one corresponds to the identifier gi|46307094|emb|bx349459.2|70- 73g. Now the
SecIS structure seems to be correct but if we observe accurately the different matches, we
can see there is one nucleotide ("G" near the important motif of the SecIS structure)
that does not match with any nucleotide.
We can expect that the SecIS structure can not join SBP2, so the selenocysteine will not be
introduced into the peptid sequence.
Due to these facts, these two cases could be involved in neuroblastoma or some different diseases.
Relation between SecIS sequence and diseases:
As we have mentioned, some of the ests obtained in the blast hits are related to neuroblastoma. This fact points to a possible relation between SNPs in the selenoprotein O and this kind of disease. We wanted to know much about this possible relation so we have been looking for information in many databases, but we have not found any article or review explaining the possible relation between these two facts.