As we have said, the aim of this project is to find novel U12 introns in the human genome. Starting from the Geneid_v1.2_u12 and SGP predictions of U12-intron-flanking exons, we extracted the sequences of these flanking exons. We then concatenated the two 50-bp exon sequences flanking each predicted intron and ran BLASTN and BLASTX searches to test our predictions and identify which of the predicted U12 introns are real.
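The concatenation step can be sketched as follows. This is a minimal illustration, not the project's actual script: the helper names `junction_sequence` and `to_fasta` are hypothetical, and it assumes both exon sequences are already in transcript (5'→3') orientation, so minus-strand exons must be reverse-complemented first. Exons shorter than 50 bp raise an error here; the original pipeline may have handled them differently.

```python
# Hypothetical helpers; assumes exon sequences are already in transcript
# (5'->3') orientation, i.e. minus-strand exons were reverse-complemented.

def junction_sequence(upstream_exon: str, downstream_exon: str, flank: int = 50) -> str:
    """Join the last `flank` bp of the upstream exon to the first `flank` bp
    of the downstream exon, leaving out the intron between them."""
    if len(upstream_exon) < flank or len(downstream_exon) < flank:
        raise ValueError(f"both exons must be at least {flank} bp long")
    return (upstream_exon[-flank:] + downstream_exon[:flank]).upper()


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy exons, not real data.
    pair = junction_sequence("a" * 80, "c" * 120)
    print(to_fasta("pair1", pair), end="")
```

Writing one such 100-bp record per predicted intron gives a query file of the same kind as exonpairsP50.fa.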
       We ran both kinds of BLAST search on two sets of introns. The first set (exonpairsC50.fa) consisted of real U12 introns already analysed by our supervisor, which we used as a positive control. The second set (exonpairsP50.fa) contained our predicted U12 introns, which still had to be confirmed or rejected as real U12 introns. The BLASTN and BLASTX analyses therefore produced four result files, one for each combination of program and input set; one of them, for example, holds the BLASTN results of the predicted U12 introns against dbEST.
       Next, we classified these results into categories (using woundedknee.pl) according to the ways a BLAST hit can match the exon pair, after filtering out the hits that were not significant.
       To separate significant from non-significant hits, we required an e-value below 1, fewer than four gaps and a percentage identity above 97%.
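The significance filter can be sketched in a few lines. The original filtering was done inside woundedknee.pl, whose code is not shown here, so this is only an illustration under stated assumptions: it assumes the hits were exported as tab-separated BLAST+ output with the columns `qseqid sseqid pident gaps evalue qstart qend` (for example via `-outfmt "6 qseqid sseqid pident gaps evalue qstart qend"`), and the names `Hit`, `parse_tabular` and `is_significant` are hypothetical. Note that for BLASTX the identity is measured at the amino-acid level.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str     # exon-pair identifier
    subject: str   # identifier of the database sequence that was hit
    pident: float  # percentage identity
    gaps: int      # total number of gaps in the alignment
    evalue: float
    qstart: int    # alignment start on the 100-bp query
    qend: int      # alignment end on the query (may be < qstart for BLASTX minus frames)


def parse_tabular(line: str) -> Hit:
    """Parse one line of tabular output in the assumed column order."""
    q, s, pident, gaps, evalue, qstart, qend = line.rstrip("\n").split("\t")
    return Hit(q, s, float(pident), int(gaps), float(evalue), int(qstart), int(qend))


def is_significant(hit: Hit, max_evalue: float = 1.0, max_gaps: int = 4,
                   min_pident: float = 97.0) -> bool:
    """Thresholds from the text: e-value < 1, fewer than 4 gaps, identity > 97%."""
    return hit.evalue < max_evalue and hit.gaps < max_gaps and hit.pident > min_pident


if __name__ == "__main__":
    rows = [
        "pair1\tEST_A\t98.80\t0\t2e-40\t3\t97",  # toy values: passes all thresholds
        "pair1\tEST_B\t95.00\t1\t1e-10\t1\t60",  # toy values: identity too low
    ]
    kept = [h for h in map(parse_tabular, rows) if is_significant(h)]
    print([h.subject for h in kept])
```

Keeping the thresholds as keyword arguments makes it easy to rerun the filter with other cut-offs, for instance to check whether different parameters explain discrepancies with the positive control.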
       The table below shows the categories into which the real and predicted U12 introns fell after the BLASTN and BLASTX searches; each category links to the list of predicted U12 introns it contains.
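The core of the classification can be sketched as a function of where the significant hits fall on the 100-bp query, whose first 50 bp come from the upstream exon and last 50 bp from the downstream exon. This is a hypothetical reconstruction, not the logic of woundedknee.pl: only the 'alignment across junction' category is taken from the text, the other category names are illustrative, and the `min_overhang` requirement (how many bases a hit must cover on each side of the junction) is an assumed parameter.

```python
from typing import Iterable, Tuple


def classify_pair(hits: Iterable[Tuple[int, int]], junction: int = 50,
                  min_overhang: int = 10) -> str:
    """Assign one exon pair to a category from its significant hits.

    hits: 1-based (qstart, qend) query coordinates of the significant hits.
    Positions 1..junction belong to the upstream exon and positions
    junction+1..2*junction to the downstream exon.
    """
    on_upstream = on_downstream = False
    for qstart, qend in hits:
        lo, hi = min(qstart, qend), max(qstart, qend)  # BLASTX minus frames run backwards
        left = min(hi, junction) - lo + 1        # bases of the hit on the upstream exon
        right = hi - max(lo, junction + 1) + 1   # bases of the hit on the downstream exon
        if left >= min_overhang and right >= min_overhang:
            return "alignment across junction"
        on_upstream = on_upstream or left > 0
        on_downstream = on_downstream or right > 0
    if on_upstream and on_downstream:
        return "both exons, no junction alignment"
    if on_upstream:
        return "upstream exon only"
    if on_downstream:
        return "downstream exon only"
    return "no significant hit"


if __name__ == "__main__":
    print(classify_pair([(10, 90)]))           # spans the exon-exon junction
    print(classify_pair([(5, 40), (60, 100)])) # each exon hit separately
```

A hit that covers both exons is the kind of evidence expected from a spliced transcript, which is why that category is the most interesting one.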
       As the table shows, the positive control files ('C' files) contain categories other than 'alignment across junction', which was the only category we expected for real U12 introns. This may be because our supervisor and we used different parameters to filter the BLAST hits.
       The graph below compares the BLASTN and BLASTX results by showing how the predicted U12 introns ('P' file) are distributed among the categories.
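The numbers behind such a comparison can be tallied with a short sketch. It assumes, hypothetically, that the classification of each program has been loaded into a dictionary mapping each exon-pair ID to its category; the function names are illustrative. The second function also lists the pairs that both programs place in the same category, which shows whether BLASTN and BLASTX agree on individual introns and not only on totals.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def compare_programs(blastn: Dict[str, str],
                     blastx: Dict[str, str]) -> List[Tuple[str, int, int]]:
    """Return (category, BLASTN count, BLASTX count) rows sorted by category name."""
    n_counts, x_counts = Counter(blastn.values()), Counter(blastx.values())
    categories = sorted(set(n_counts) | set(x_counts))
    # Counter returns 0 for categories a program never produced.
    return [(c, n_counts[c], x_counts[c]) for c in categories]


def shared_in_category(blastn: Dict[str, str], blastx: Dict[str, str],
                       category: str = "alignment across junction") -> Set[str]:
    """IDs of exon pairs that both programs placed in `category`."""
    return {pid for pid, cat in blastn.items()
            if cat == category and blastx.get(pid) == category}


if __name__ == "__main__":
    # Toy assignments, not project results.
    bn = {"p1": "alignment across junction", "p2": "no significant hit"}
    bx = {"p1": "alignment across junction", "p2": "alignment across junction"}
    for row in compare_programs(bn, bx):
        print(row)
    print(shared_in_category(bn, bx))
```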
       As explained above, the predicted U12 introns in the 'alignment across junction' category are strong candidates for novel U12 introns. To confirm this, the position of each candidate should correspond to an annotated intron in a resource such as the UCSC Genome Browser. The UCSC Genome Browser provides a rapid and reliable display of any requested portion of a genome at any scale, together with dozens of aligned annotation tracks (known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, chromosomal bands, mouse homologies, and more). Half of the annotation tracks are computed at UCSC from publicly available sequence data; the remaining tracks are provided by collaborators worldwide. Users can also add their own custom tracks to the browser for educational or research purposes.
       We took some of the predicted U12 introns ('P' file) in the 'alignment across junction' category (from both BLASTN and BLASTX) and checked whether their positions correspond to intron positions in the UCSC Genome Browser. Here we present some of the novel U12 introns that we found:
       The start and end positions of the annotated intron (thin grey line) match those of our predicted U12 intron. This means that the predicted intron is present in the database, and since its flanking exons were predicted by Geneid_v1.2_u12 and SGP as U12-intron-flanking exons, we can conclude that it is a U12 intron. The browser view also shows where the intron lies on the chromosome.
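The visual check in the browser can also be automated, as sketched below. The sketch assumes, hypothetically, that the gene annotation was exported as BED12 (for example from a gene track in the UCSC Table Browser) and that the predicted exon coordinates are 1-based and inclusive, as in GFF output. BED files use 0-based, half-open coordinates, while the browser display uses 1-based ones, so the code converts the annotated introns to 1-based before comparing. The function names are illustrative.

```python
from typing import Set, Tuple

Intron = Tuple[str, int, int, str]  # (chrom, start, end, strand), 1-based, inclusive


def introns_from_bed12(line: str) -> Set[Intron]:
    """Derive the introns of one BED12 transcript (the gaps between its blocks)."""
    f = line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = f[0], int(f[1]), f[5]
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    introns = set()
    for i in range(len(starts) - 1):
        gap_start = chrom_start + starts[i] + sizes[i]  # 0-based first intron base
        gap_end = chrom_start + starts[i + 1]           # 0-based, exclusive
        introns.add((chrom, gap_start + 1, gap_end, strand))
    return introns


def predicted_intron(chrom: str, left_exon_end: int, right_exon_start: int,
                     strand: str) -> Intron:
    """Intron between two flanking exons given in genomic (left-to-right) order."""
    return (chrom, left_exon_end + 1, right_exon_start - 1, strand)


if __name__ == "__main__":
    # Toy transcript with exons 1001-1100, 1401-1600 and 1701-2000 (1-based).
    bed = "chr1\t1000\t2000\ttx1\t0\t+\t1000\t2000\t0\t3\t100,200,300,\t0,400,700,"
    annotated = introns_from_bed12(bed)
    print(predicted_intron("chr1", 1100, 1401, "+") in annotated)
```

Using exact coordinate matches is deliberately strict; a tolerance of a few bases could be added if the predicted splice sites are expected to be slightly off.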
       Further analysis with the UCSC Genome Browser is needed to locate and confirm all the predicted U12 introns in the 'alignment across junction' category. It would also be interesting to check whether BLASTN and BLASTX identify the same U12 introns, and to classify the U12 introns into the ATAC and GTAG groups.
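The ATAC/GTAG classification could start from the terminal dinucleotides of each confirmed intron, as in the sketch below. The function names are hypothetical, and it assumes the intron sequence has been extracted from the genome on the plus strand, so minus-strand introns are reverse-complemented first. The dinucleotide check only assigns the subtype: most U2 introns are also GT-AG, so it is meaningful only for introns already identified as U12.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def u12_subtype(intron_seq: str, strand: str = "+") -> str:
    """Classify a U12 intron by its first and last two bases in transcript orientation."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    s = intron_seq.upper() if strand == "+" else reverse_complement(intron_seq)
    if len(s) < 4:
        raise ValueError("intron sequence too short")
    ends = (s[:2], s[-2:])
    if ends == ("AT", "AC"):
        return "ATAC"
    if ends == ("GT", "AG"):
        return "GTAG"
    return "other"


if __name__ == "__main__":
    print(u12_subtype("ATATCCTTTTAAC"))  # toy sequence
    print(u12_subtype("GTGGAT", "-"))    # toy minus-strand sequence
```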