Iron Response Elements
METHODS, PROCEDURE AND RESULTS
BUILDING THE PATTERNS
Once we had collected all the sequences, we started constructing the secondary-structure patterns. The procedure was as follows:
- We grouped all training sequences from the same protein together;
- In each sequence we searched for the hexaloop that, according to the literature, is conserved among all IRE structures (CAGWUN). In almost all entries it was located in the middle of the sequence.
- We searched for base complementarity between the two sides of the hexaloop. The allowed base pairs were the classic ones (A-T and C-G) plus G-T, which is not a classic base pair (it is the DNA counterpart of the G-U wobble pair) but is present in many secondary structures. The patterns were built on the assumption that every sequence should contain, 5 bp from the hexaloop, either a C-bulge or a loop-bulge. In the ferritin sequences we found a loop-bulge, while in transferrin, eALAS and mt-aconitase a C-bulge was present. The IRP pattern could not be built because, although the sequences contained the hexaloop, we did not find the expected base pairing around it.
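The hexaloop search and the base-pairing check in the procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the exact procedure we followed: sequences are assumed to be DNA strings (T instead of U, so the U in CAGWUN is read as T), the stem is assumed to be read outward from the hexaloop until the first pair that is not A-T, C-G or G-T, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes as regex character classes (DNA alphabet, U read as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Allowed stem pairs: classic A-T and C-G plus the G-T wobble pair.
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C"), ("G", "T"), ("T", "G")}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as CAGWUN into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_hexaloops(seq: str, motif: str = "CAGWUN") -> list[int]:
    """Return the start position of every hexaloop match in seq."""
    # A lookahead lets overlapping matches be reported too.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def stem_length(seq: str, loop_start: int, loop_len: int = 6) -> int:
    """Count consecutive allowed base pairs closing the loop, reading outward."""
    seq = seq.upper()
    left, right = loop_start - 1, loop_start + loop_len
    n = 0
    while left >= 0 and right < len(seq) and (seq[left], seq[right]) in PAIRS:
        n += 1
        left -= 1
        right += 1
    return n
```

For a made-up hairpin such as `GGATCCAGTTAGATCC`, `find_hexaloops` reports the loop at position 5 and `stem_length(seq, 5)` counts the five pairs that close it.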
The base pairing obtained can be seen at the following links:
- Training set (ferritin and transferrin plus high homology proteins, 15 sequences).
- Aconitase (4 sequences). The original set had 10 sequences, but it was a mixture of 5' and 3' UTR sequences. None of the 3' sequences contained the terminal hexaloop, so they were excluded from building the pattern. Some of the 5' entries did not fit the IRE pattern because they lacked both the base pairing around the loop and the middle C-bulge; they were excluded as well.
- Ferroportin (5 sequences).
After this we built the consensus pattern for each group of proteins. It was built by looking for nucleotide conservation across the sequences at each position of the hexaloop, of the five nucleotides upstream of the hexaloop, of the middle loop and of all the other base-paired nucleotides on both sides of the hexaloop. Note that the ferritin from the yellow fever mosquito did not fit the general ferritin structure (it contains a C-bulge instead of a loop-bulge), so it was not used to build the patterns. The linear patterns obtained are the following:
Protein       Segment 1   Middle loop   Segment 2   Hexaloop
Ferritin      Y           TAC           WDMVD       CAGTAH
Transferrin   WKTAT       C             RGDRR       CAGWAH
Aconitase     YAT         C             TTTAT       CAGWAH
Ferroportin   AACTT       C             RGCTA       CAGTGW
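The position-by-position conservation analysis can be expressed as an IUPAC consensus, where each column is summarized by the ambiguity code covering every base observed there. The sketch below assumes, beyond what is stated above, that the sequences are already aligned without gaps; the function name and the three input strings are hypothetical examples, not our data.

```python
# Map each set of observed bases to its IUPAC ambiguity code (DNA alphabet).
CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AT"): "W", frozenset("CG"): "S",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def iupac_consensus(aligned: list[str]) -> str:
    """Return the IUPAC consensus of equal-length, gap-free aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*...) walks the alignment column by column.
    columns = zip(*(s.upper() for s in aligned))
    return "".join(CODES[frozenset(column)] for column in columns)
```

For example, the columns of `CAGTAC`, `CAGAAT` and `CAGTAA` collapse to `CAGWAH`, the same shape as the hexaloop consensus in the table.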
However, to be recognized by PatScan, a program that searches a sequence for a specified motif, the patterns had to be written in its own syntax, which describes the secondary structure. We rewrote the patterns following the PatScan rules (http://www-unix.mcs.anl.gov/compbio/PatScan/HTML/patscan.html).
Click here to get the new patterns
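PatScan describes such a hairpin with complementary pattern units; its syntax is not reproduced here (see the PatScan documentation linked above). The Python sketch below only emulates the idea behind a secondary pattern, under simplifying assumptions that are ours: the middle loop is unpaired on the 5' side only, every other 5' position must pair with the 3' arm with no mismatches, and the allowed pairs are A-T, C-G and G-T. The function `find_ire` is hypothetical.

```python
import re

# IUPAC nucleotide codes as regex character classes (DNA alphabet, U read as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Allowed stem pairs: A-T, C-G and the G-T wobble.
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C"), ("G", "T"), ("T", "G")}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regex; each code matches exactly one base."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_ire(seq: str, seg1: str, bulge: str, seg2: str, loop: str) -> list[int]:
    """Start positions where the 5' half (seg1 + bulge + seg2 + loop) matches
    and the bases after the loop pair back with seg2 and then seg1."""
    seq = seq.upper()
    five_half = (
        "(" + iupac_to_regex(seg1) + ")" + iupac_to_regex(bulge)
        + "(" + iupac_to_regex(seg2) + ")" + iupac_to_regex(loop)
    )
    hits = []
    # Lookahead so that overlapping candidates are all examined.
    for m in re.finditer("(?=" + five_half + ")", seq):
        five_arm = m.group(1) + m.group(2)  # paired 5' bases, read 5'->3'
        end = m.start() + len(seg1) + len(bulge) + len(seg2) + len(loop)
        three_arm = seq[end:end + len(five_arm)]
        # The first base after the loop pairs with the last base of seg2, etc.
        if len(three_arm) == len(five_arm) and all(
            (a, b) in PAIRS for a, b in zip(reversed(five_arm), three_arm)
        ):
            hits.append(m.start())
    return hits
```

With the transferrin row of the table, `find_ire("ATTATCAGAGACAGTACTCTCTATAAT", "WKTAT", "C", "RGDRR", "CAGWAH")` returns `[0]` for this made-up hairpin; changing the first base after the loop to G breaks the closing pair and the result becomes `[]`.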
At this point we were able to run all the patterns on all our sequences. Beforehand, however, we collected a total of 40 sequences (20 from the 5' region and 20 from the 3' region) from mRNAs that do not contain IREs. By running the patterns not only on the sequences of interest (those containing IREs) but also on these, we could determine how strict the patterns were.
Click the following link to see the sequences
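The strictness of a pattern can be summarized by two fractions: how many IRE-containing sequences it recognizes (sensitivity) and how many non-IRE sequences it wrongly recognizes (false-positive rate). The sketch below is a minimal illustration; the class name is hypothetical, and the usage numbers assume that the 4 transferrin test sequences are the whole transferrin test set and that the 40 non-IRE sequences form the negative set.

```python
from dataclasses import dataclass


@dataclass
class PatternScore:
    """Hit counts for one pattern on a positive (IRE) and a negative (non-IRE) set."""
    pos_hits: int
    pos_total: int
    neg_hits: int
    neg_total: int

    @property
    def sensitivity(self) -> float:
        """Fraction of IRE-containing sequences the pattern recognized."""
        return self.pos_hits / self.pos_total

    @property
    def false_positive_rate(self) -> float:
        """Fraction of non-IRE sequences the pattern (wrongly) recognized."""
        return self.neg_hits / self.neg_total
```

For instance, `PatternScore(4, 4, 0, 40)` gives a sensitivity of 1.0 and a false-positive rate of 0.0.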
We ran the ferritin secondary-structure pattern on the training set, the test set and the files containing the 5' and 3' non-IRE mRNAs (click here to get the Perl commands you can use to do it). The pattern recognized 7 sequences in the training set (6 ferritins and succinate dehydrogenase), 7 sequences in the test set (all of them ferritins; the fruit fly ferritin was not recognized) and no sequences in either the 5' or the 3' non-IRE set.
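Running a pattern over a sequence file amounts to reading each FASTA record and keeping the identifiers whose sequence matches. The Perl commands linked above are not reproduced here; the Python sketch below is an equivalent illustration that uses a plain regular expression as a stand-in for the full PatScan secondary pattern. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def matching_ids(lines: Iterable[str], pattern: str) -> list[str]:
    """Identifiers of the records whose sequence contains a match for the regex."""
    regex = re.compile(pattern)
    return [name for name, seq in read_fasta(lines) if regex.search(seq)]
```

An open file handle can be passed directly as `lines`, for example `matching_ids(open("non_ire_5utr.fasta"), "CAG[AT]A[ACT]")`, where the file name is hypothetical and the regex is the CAGWAH hexaloop from the table written out base by base.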
The transferrin pattern (click here to get the commands) recognized the 5 sequences in the training set, the 4 sequences in the test set and, as before, no sequences from the non-IRE mRNAs.
When we ran the aconitase pattern on the original mitochondrial aconitase file (the one containing the 10 sequences), 5 sequences were recognized (click here to get the commands). These included all the sequences used to build the pattern. No sequences at all were found among the non-IRE mRNAs.
When run on the ferroportin sequences (click here to get the commands), the ferroportin pattern returned 4 sequences; the one from Danio rerio was missed.
As can be seen, all our patterns are very strict: they discriminate very specifically between the different IRE isoforms. In short, each pattern recognizes only the mRNAs from the protein family it was designed for.