Prediction of genes containing IRE sequences.
by Francesc Xavier Guix Ràfols & Eva Lambea Martínez



INTRODUCTION

The aim of our project is to identify the mouse genes that contain IRE motifs in their structure.

A general introduction to the project is available on a separate page.


MATERIALS

Our cDNAs were obtained from the FANTOM database. It contains 60,770 full-length sequenced RIKEN cDNA clones in FASTA format, which we used to make exon predictions with the Geneid program.

We also wrote a Perl program called Predictor.pl. This program allowed us to make a prediction on each cDNA sequence independently.

We were provided with an IRE evidence file called IRESpatrolaxeforward.IDs, which was used to extract the genes containing IRE motifs.
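The extraction of the IRE-containing sequences can be illustrated with a short Python sketch. It is not the procedure we actually ran (that was done with Unix commands), and it rests on two assumptions that the text above does not confirm: that the IDs file lists one sequence identifier per line, and that each identifier matches the first word of a FASTA header. The function names `load_ids` and `extract_by_id` are hypothetical.

```python
def load_ids(ids_path):
    """Return the set of identifiers in the IDs file.

    Assumption: one identifier per line; only the first word is used.
    """
    with open(ids_path) as handle:
        return {line.split()[0] for line in handle if line.strip()}


def extract_by_id(fasta_path, ids, out_path):
    """Copy to out_path only the FASTA records whose ID is in `ids`.

    Returns the number of records written.
    """
    kept = 0
    keep = False
    with open(fasta_path) as src, open(out_path, "w") as out:
        for line in src:
            if line.startswith(">"):
                # Assumption: the record ID is the first word of the header.
                fields = line[1:].split()
                keep = bool(fields) and fields[0] in ids
                kept += keep
            if keep:
                out.write(line)
    return kept
```

Called as `extract_by_id("fantom.fa", load_ids("IRESpatrolaxeforward.IDs"), "IRESfantom.fa")`, where `fantom.fa` is a placeholder name for the source file, this would write the matching records and return how many were kept.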

Data mining was performed with Unix commands in a shell terminal.


METHODS & PROCEDURES

The first step was to write a program, Predictor1.0.pl, in Perl to perform the following tasks:

  1. Reading the source file, which contains the 60,770 cDNAs in FASTA format, one sequence at a time.
  2. Writing the current sequence to an intermediate file called mid.fa, so that the file holds exactly one sequence whenever it is read.
  3. Running Geneid to predict an exonic structure for the sequence in mid.fa.
  4. Appending the result of the prediction to an output file whose name is specified by the user before running Predictor.pl. At the end, this output file contains all the predictions that Geneid has made.
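The four tasks above can be sketched as follows. This is a minimal Python illustration of the loop, not the original Perl code of the program. The predictor command is passed in as a list, so nothing about Geneid's command line is hard-coded; the example invocation in the docstring, including the `-P` parameter-file option and the file name `mouse.param`, is an assumption. The names `read_fasta` and `run_predictor` are hypothetical.

```python
import subprocess
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file, one record at a time."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None and line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def run_predictor(fasta_path, out_path, command, mid_path="mid.fa"):
    """Run `command` on each sequence separately and collect every output.

    `command` is the predictor invocation without the input file, e.g.
    ["geneid", "-P", "mouse.param"] (assumed, not taken from the text).
    Returns the number of sequences processed.
    """
    count = 0
    with open(out_path, "w") as out:
        for header, sequence in read_fasta(fasta_path):
            # Overwrite the intermediate file so it holds exactly one sequence.
            Path(mid_path).write_text(f">{header}\n{sequence}\n")
            result = subprocess.run(
                command + [str(mid_path)],
                capture_output=True, text=True, check=True,
            )
            # Add this prediction to the user-specified output file.
            out.write(result.stdout)
            count += 1
    return count
```

Passing the command as a parameter keeps the loop independent of the predictor, so it can be tried with any harmless command before running it on the full set of cDNAs.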



RESULTS

We obtained a file called IRESfantom.fa containing 594 cDNA sequences with possible IRE structures.

The next step consists of using a GFF file with the predicted IREs as external evidence in the predictions made by Geneid on these 594 cDNAs. This was carried out by Selma Serra & Mateu Lichtenstein.


DISCUSSION

At this point, we would like to comment on the results of our project in a general way (the specific results are shown on the general conclusions page). It is important to emphasize that predictions were not obtained for every cDNA sequence and that, contrary to what the article states, not all the cDNAs were entered into the FANTOM database in reverse orientation.

Part of our work consisted of preparing a program (Predictor V2.0) that makes predictions of genes containing IREs (through Geneid) and, at the same time, selects the GFF file containing the evidence that corresponds to the cDNA being read (there is one such file for each cDNA sequence with a possible IRE structure, 594 in total). The program worked as intended and allowed us to obtain the general results shown in the conclusions.
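The selection of the evidence file for each cDNA can be sketched as below. This is a hypothetical Python illustration, not the code of Predictor V2.0. It assumes that each evidence file is named after the sequence identifier (`<ID>.gff`), which the text does not state, and that Geneid takes its parameter file with `-P` and external annotations with `-R`; both options should be checked against the Geneid documentation before use. The function names are invented for the example.

```python
from pathlib import Path


def evidence_file(header, evidence_dir):
    """Return the GFF evidence path for a FASTA header, or None if absent.

    Assumption: one file per cDNA, named <sequence ID>.gff.
    """
    seq_id = header.lstrip(">").split()[0]
    candidate = Path(evidence_dir) / f"{seq_id}.gff"
    return candidate if candidate.is_file() else None


def geneid_command(param_file, fasta_file, evidence=None):
    """Build a Geneid command line, adding the evidence file when given."""
    command = ["geneid", "-P", str(param_file)]
    if evidence is not None:
        # Assumption: -R is Geneid's option for external annotations.
        command += ["-R", str(evidence)]
    return command + [str(fasta_file)]
```

For each sequence read, the command would then be built with `geneid_command(params, "mid.fa", evidence_file(header, evidence_dir))`, so a sequence without an evidence file is still predicted, only without external evidence.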


REFERENCES