Iron Response Elements
METHODS, PROCEDURE AND RESULTS
RUN ON BIGGER SET
After obtaining the different patterns for each protein, we wanted to run them on a bigger
set of sequences, taken from the same database from which we had obtained the initial sequences (training and test
sequences). The aim was to check whether the patterns we had predicted were too constrained.
First, we downloaded the UTR database and decompressed it. It was organized by kind of organism and by
5' or 3' location. From all the organisms, we took only the human, invertebrate, mammalian, vertebrate and rodent sequences.
We selected sequences at random using the AC (accession number) and converted them to FASTA
format, which was required for running the patterns.
We created two files:
- 5'db file: contained 200 5'UTR sequences from each of the organism subgroups listed above (1000 sequences)
- 3'db file: contained 200 3'UTR sequences from each of the organism subgroups listed above (1000 sequences)
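A minimal sketch of how such a random selection and FASTA conversion could look in Python. It is not the script used in this work. It assumes an EMBL-like flat file in which each entry has an AC line, an SQ line followed by the sequence lines, and a closing // line; the function names parse_entries, sample_entries and to_fasta are hypothetical.

```python
import random


def parse_entries(flatfile_text):
    """Return (accession, sequence) pairs from an EMBL-like flat file.

    Assumed layout: 'AC' line, 'SQ' header, sequence lines, '//' terminator.
    """
    entries = []
    accession, seq_parts, in_sequence = None, [], False
    for line in flatfile_text.splitlines():
        if line.startswith("//"):  # end of the current entry
            if accession is not None:
                entries.append((accession, "".join(seq_parts)))
            accession, seq_parts, in_sequence = None, [], False
        elif in_sequence:
            # Keep letters only: drops spacing and trailing position counters.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()).upper())
        elif line.startswith("AC") and accession is None:
            # Use the first accession number listed for the entry.
            accession = line[2:].split(";")[0].strip()
        elif line.startswith("SQ"):
            in_sequence = True
    return entries


def sample_entries(entries, n, seed=None):
    """Pick up to n entries at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(entries, min(n, len(entries)))


def to_fasta(entries, width=60):
    """Format (accession, sequence) pairs as FASTA text."""
    lines = []
    for accession, seq in entries:
        lines.append(">" + accession)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Calling sample_entries with n=200 on each of the five organism subgroups and concatenating the FASTA output would give one 1000-sequence file per UTR type.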
These two files simulated a small UTR database, on which we ran the transferrin pattern (3'db) and the ferritin pattern (5'db). No sequences
were found. This corroborated that our patterns were very strict, so we decided
to create two new files containing a greater number of sequences:
- 5'db_2: all 5'UTR sequences of the original database that we had downloaded previously.
- 3'db_2: all 3'UTR sequences of the original database that we had downloaded previously.
As with the limited databases, we ran the transferrin pattern on 3'db_2 and the ferritin pattern on 5'db_2.
Although some sequences were matched this time, none of them corresponded to the original ones from which the patterns were
built (training set). We then realized that the AC (accession number) was not the same as the ID (identification number), so we
had to modify the two files (3'db_2 and 5'db_2), replacing the AC with the ID.
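The relabelling step can be illustrated with a short Python sketch. It assumes that the ID and AC of each entry can be read from the ID and AC lines of an EMBL-like flat file, and that each FASTA header starts with the accession number. The function names build_ac_to_id and relabel_fasta are hypothetical and do not come from the original scripts.

```python
def build_ac_to_id(flatfile_text):
    """Map every accession number (AC) to the entry identifier (ID)."""
    mapping = {}
    current_id = None
    for line in flatfile_text.splitlines():
        if line.startswith("ID"):
            parts = line[2:].split()
            current_id = parts[0] if parts else None
        elif line.startswith("AC") and current_id is not None:
            # An AC line may list several accessions separated by ';'.
            for accession in line[2:].split(";"):
                accession = accession.strip()
                if accession:
                    mapping[accession] = current_id
        elif line.startswith("//"):  # end of entry
            current_id = None
    return mapping


def relabel_fasta(fasta_text, mapping):
    """Replace the accession in each FASTA header with the matching ID.

    Headers whose accession is not in the mapping are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts and parts[0] in mapping:
                line = ">" + mapping[parts[0]]
        out.append(line)
    return "\n".join(out) + "\n"
```

Leaving unknown headers untouched makes it easy to spot entries for which no ID was found.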
Finally, we ran the ferritin and transferrin patterns again on the bigger,
modified files. The results were the following:
- We recovered all the initial sequences for transferrin and ferritin (test and training).
- A new group of sequences arose: 11 ferritin sequences and 10 transferrin sequences
(in fact, these were 10 matches from just two different transferrins, each with 5 IREs).
All of them were checked against the EMBL database, and all were sequences with confirmed IREs.
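Separating the recovered initial sequences from the new ones amounts to a set comparison on sequence IDs. A minimal Python sketch, assuming the matched IDs and the training/test IDs are available as plain lists; the function name compare_hits and the example IDs are hypothetical:

```python
def compare_hits(hit_ids, known_ids):
    """Split pattern hits into recovered, missed and novel sequence IDs."""
    hits, known = set(hit_ids), set(known_ids)
    return {
        "recovered": sorted(hits & known),  # initial sequences found again
        "missed": sorted(known - hits),     # initial sequences not matched
        "novel": sorted(hits - known),      # new candidates to verify
    }


# Hypothetical IDs, for illustration only.
report = compare_hits(["seqA", "seqB", "seqX"], ["seqA", "seqB"])
```

An empty "missed" list corresponds to recovering all the initial sequences, and the "novel" list holds the sequences that still need to be checked against an external database.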
To conclude this analysis on a bigger set, we wanted to be sure that the positive results (sequences obtained with the ferritin and transferrin patterns) were truly positive. To do this, we looked up the universal IRE pattern in the database and ran it on all the new sequences. The results confirmed that the newly found sequences really contain IREs.
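As an illustration of this kind of check, the Python sketch below scans a sequence for a simplified IRE-like motif. It is not the universal IRE pattern used in this work. It assumes only the commonly described core of the element: an unpaired C, a 5-nucleotide stem strand, a CAGUGN loop, and a second 5-nucleotide strand that must pair with the first (Watson-Crick or G-U pairs). The function name find_ire_like and the max_mismatches parameter are hypothetical.

```python
import re

# Allowed base pairs on a DNA-style alphabet (U is converted to T below).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}

# Bulged C, 5-nt stem strand, CAGTGN loop, 5-nt stem strand.
# The lookahead lets overlapping candidates be reported.
CANDIDATE = re.compile(r"(?=(C([ACGT]{5})CAGTG[ACGT]([ACGT]{5})))")


def find_ire_like(seq, max_mismatches=0):
    """Return (start, matched_text) for each IRE-like hairpin candidate."""
    seq = seq.upper().replace("U", "T")
    hits = []
    for match in CANDIDATE.finditer(seq):
        left, right = match.group(2), match.group(3)
        # In a hairpin, left[i] faces right[4 - i].
        mismatches = sum(
            (a, b) not in PAIRS for a, b in zip(left, reversed(right))
        )
        if mismatches <= max_mismatches:
            hits.append((match.start(1), match.group(1)))
    return hits
```

A real IRE descriptor also constrains the lower stem and the allowed loop variants, so a hit from this sketch is only a candidate and not a confirmed IRE.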