Iron Response Elements

METHODS, PROCEDURE AND RESULTS

 

RUN ON A BIGGER SET

After obtaining a pattern for each protein, we wanted to run the patterns on a bigger set of sequences. For this we used the same database from which we had obtained the initial sequences (training and test sets). The aim was to check whether our patterns were too restrictive, i.e. whether they only recognized the sequences they had been built from.

First, we downloaded the UTR database and decompressed it. It was organized by organism group and by UTR location (5' or 3'). Of all the groups, we only took the human, invertebrate, mammalian, vertebrate and rodent sequences.

We selected sequences at random using their AC (accession number) and, since the pattern search required it, converted them to FASTA format.
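The conversion step can be sketched as follows. The exact layout of the downloaded files is not described here, so the parser assumes an EMBL-style flat file (an `AC` line, an `SQ` line followed by sequence lines, and `//` closing each entry); `parse_embl_like` and `to_fasta` are hypothetical helper names, and this is a sketch rather than the procedure we actually ran.

```python
import re


def parse_embl_like(text):
    """Yield (accession, sequence) pairs from EMBL-style flat-file text.

    Assumption: each entry has an 'AC' line, an 'SQ' line followed by
    sequence lines, and ends with '//'. This layout is assumed, not
    confirmed for the UTR database.
    """
    ac, seq_lines, in_seq = None, [], False
    for line in text.splitlines():
        if line.startswith("//"):
            # End of entry: emit it and reset the parser state
            if ac is not None:
                yield ac, "".join(seq_lines).upper()
            ac, seq_lines, in_seq = None, [], False
        elif line.startswith("AC   ") and ac is None:
            # Keep only the primary (first) accession number on the line
            ac = line[5:].split(";")[0].strip()
        elif line.startswith("SQ"):
            in_seq = True
        elif in_seq:
            # Sequence lines hold bases in blocks plus a position count;
            # keep the letters only
            seq_lines.append(re.sub(r"[^A-Za-z]", "", line))


def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text, wrapping lines."""
    out = []
    for ident, seq in records:
        out.append(">" + ident)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Only the primary accession is kept as the FASTA identifier, which matches selecting sequences by AC.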

We created two files:

- 5'db file: 200 5' UTR sequences from each of the five organism groups listed above (1000 sequences in total).
- 3'db file: 200 3' UTR sequences from each of the five organism groups listed above (1000 sequences in total).
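Drawing 200 random sequences from each organism group can be sketched like this. The function `build_sample_db` and its input layout (a dict from group name to `{accession: sequence}`) are hypothetical choices for illustration, not necessarily how the files were built.

```python
import random


def build_sample_db(groups, per_group=200, seed=None):
    """Draw per_group sequences at random from each organism group.

    groups maps a group name (e.g. 'human', 'rodent') to a dict of
    {accession: sequence}; this layout is an assumption for the sketch.
    Returns a list of (accession, sequence) pairs.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    sample = []
    for name in sorted(groups):
        entries = groups[name]
        # random.sample raises ValueError if a group has fewer entries
        chosen = rng.sample(sorted(entries), per_group)
        sample.extend((ac, entries[ac]) for ac in chosen)
    return sample
```

With five groups and the default `per_group=200`, the result holds 1000 sequences, as in each of the two files.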

These two files simulated a small UTR database, on which we ran the transferrin pattern against 3'db and the ferritin pattern against 5'db. No sequences were found. This supported our suspicion that the patterns were very strict, so we decided it would be better to create two new files containing a much larger number of sequences:

- 5'db_2: all the 5' UTR sequences of the original database we had downloaded.
- 3'db_2: all the 3' UTR sequences of the original database we had downloaded.
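Running a pattern over one of these FASTA files amounts to reporting every record that contains a match. The pattern syntax and search program used in this work are not given here, so as an assumption the minimal sketch below writes the pattern as a Python regular expression; `read_fasta` and `scan` are hypothetical helpers.

```python
import re


def read_fasta(text):
    """Parse FASTA text into (identifier, sequence) pairs.

    The identifier is the first word of the header line.
    """
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def scan(fasta_text, pattern):
    """Return {identifier: [match start positions]} for matching records.

    Assumption: the pattern is a Python regex. finditer reports
    non-overlapping matches only.
    """
    regex = re.compile(pattern)
    hits = {}
    for ident, seq in read_fasta(fasta_text):
        starts = [m.start() for m in regex.finditer(seq)]
        if starts:
            hits[ident] = starts
    return hits
```

For example, `scan(text, r"CAGTG[ACT]")` lists the records containing CAGTG followed by A, C or T; this is a toy pattern, not the ferritin or transferrin pattern.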

As with the smaller files, we ran the transferrin pattern against 3'db_2 and the ferritin pattern against 5'db_2. This time some sequences matched, but none of them seemed to correspond to the original sequences from which the patterns had been built (training set). We then realized that the AC (accession number) is not the same as the ID (identifier), so we modified the two files (3'db_2 and 5'db_2), replacing each AC with the corresponding ID.
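Replacing accession numbers with identifiers in the FASTA headers can be sketched as below, assuming a table mapping each AC to its ID has already been collected from the database entries. `relabel_headers` is a hypothetical name used only for illustration.

```python
def relabel_headers(fasta_text, ac_to_id):
    """Replace the accession (AC) at the start of each FASTA header with its ID.

    ac_to_id is an assumed dict {AC: ID}; headers without a known
    mapping are left unchanged, and any description text is kept.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split(maxsplit=1)
            if fields and fields[0] in ac_to_id:
                fields[0] = ac_to_id[fields[0]]
            line = ">" + " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

Leaving unmapped headers untouched makes it easy to spot entries whose ID could not be found.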

Finally, we ran the ferritin and transferrin patterns again on the larger, modified files, with the following results:

- We recovered all the initial transferrin and ferritin sequences (training and test).
- A new group of sequences appeared: 11 ferritin sequences and 10 transferrin sequences (the latter were in fact just two different transferrin sequences with 5 IREs each). We checked all of them in the EMBL database, and all of them were sequences with confirmed IREs.
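Separating recovered sequences from new ones is a set comparison between the hit identifiers and the training and test identifiers. A minimal sketch, assuming the hits are stored as a dict from sequence ID to a list of match positions; `compare_hits` is a hypothetical name. Sites are counted separately from sequences because one entry can carry several IREs, as happened with the transferrin hits.

```python
def compare_hits(hits, training, test):
    """Split pattern hits into recovered known sequences and new candidates.

    hits: assumed dict {sequence ID: [match positions]};
    training, test: collections of known sequence IDs.
    """
    known = set(training) | set(test)
    found = set(hits)
    new = found - known
    return {
        "recovered": found & known,   # known sequences the pattern found
        "missed": known - found,      # known sequences it failed to find
        "new": new,                   # candidates to check, e.g. in EMBL
        # One sequence can carry several sites, so sites may outnumber sequences
        "new_sites": sum(len(hits[i]) for i in new),
    }
```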

To conclude this analysis on the bigger set, we wanted to make sure that the positive results (sequences obtained with the ferritin and transferrin patterns) were true positives. To do this, we looked up the universal IRE pattern in the database and ran it on all the new sequences. The results confirmed that the newly found sequences really contain IREs.
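The universal IRE pattern we used is not reproduced in this section. As an illustration only, the sketch below checks a simplified version of the commonly described IRE core: a six-nucleotide apical loop CAGUGN, a five-base-pair upper stem, and an unpaired C five nucleotides 5' of the loop. The lower stem and any mismatches are ignored, and `find_ire_cores` is a hypothetical name; this is not the database pattern.

```python
import re

# Watson-Crick pairs plus G-U wobble, in the DNA alphabet (U written as T)
PAIRS = {("G", "C"), ("C", "G"), ("A", "T"), ("T", "A"), ("G", "T"), ("T", "G")}

# Simplified IRE core (an assumption, not the database pattern):
# unpaired C, 5-nt upper stem, CAGTGN loop, 5-nt upper stem.
# The lookahead lets overlapping candidates be found.
IRE_CORE = re.compile(r"(?=(C([ACGT]{5})CAGTG[ACGT]([ACGT]{5})))")


def find_ire_cores(seq):
    """Return start positions (of the unpaired C) of simplified IRE cores.

    A candidate is accepted when the 5 nt before the loop pair, in
    antiparallel orientation, with the 5 nt after it.
    """
    starts = []
    for m in IRE_CORE.finditer(seq.upper().replace("U", "T")):
        left, right = m.group(2), m.group(3)
        # left[k] must pair with right[4 - k] (antiparallel stem)
        if all((left[k], right[4 - k]) in PAIRS for k in range(5)):
            starts.append(m.start())
    return starts
```

Requiring the stem to pair, rather than only matching the loop motif, is what makes this a structural check and not just a sequence search.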