Because the objective of this project was to find novel U12 introns, our conclusions should state which of the predicted U12 introns are real. However, since we do not know the accuracy of our last script (woundedknee.pl), we cannot conclude with certainty that the predictions placed in the 'Alignment across junction' group are novel U12 introns. Furthermore, real U12 introns may have been assigned to other groups and so been miscategorized. This possibility is supported by the 'C' files: when we analyze them, some predictions are classified into groups other than 'Alignment across junction'. Because the 'C' files contain real U12 introns confirmed by our supervisor, all of them should fall into the 'Alignment across junction' group. The discrepancy may arise because our supervisor used different parameters to classify the predictions.
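       One way to put a number on this uncertainty would be to treat the 'C' files as a positive control and measure what fraction of them lands in the expected group. The following is only a minimal sketch in Python, not part of woundedknee.pl; the function name `control_recall` and the second category label used in the example are assumptions made for illustration.

```python
from collections import Counter

def control_recall(categories, expected="Alignment across junction"):
    """Return (fraction of control introns in the expected group, counts per group).

    `categories` is a list with one group label per known U12 intron,
    as assigned by the classification script. The labels other than
    `expected` are hypothetical here.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no control predictions given")
    return counts[expected] / total, counts

# Example with made-up labels: 3 of 4 control introns in the expected group.
recall, counts = control_recall(
    ["Alignment across junction"] * 3 + ["Other group (hypothetical)"]
)
print(f"fraction in expected group: {recall:.2f}")
```

       A fraction well below 1 on the control set would suggest that some real U12 introns among the new predictions are also being placed in other groups.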
       What we have obtained in this project is a set of novel U12 introns, such as the four shown in the results. As explained there, the UCSC Genome Browser was used to verify and locate them. Further analysis of the predicted U12 introns is needed to find all U12 introns.
       Another interesting extension would be to classify the novel U12 introns into subtypes: atac vs gtag. Moreover, to find which predictions in the 'Alignment across junction' group are the best candidates to be real U12 introns, we should compare the group derived from BLASTN with the one derived from BLASTX. A prediction that appears in this group for both BLAST variants is more likely to be a real U12 intron than one that appears for only one of them. However, such candidates would still have to be verified with the UCSC Genome Browser.
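       Both ideas can be sketched in a few lines. The example below is written in Python rather than Perl and is only an illustration under stated assumptions: the function names, the intron identifiers and the input format (intron sequences given 5' to 3' on the coding strand, predictions given as identifier lists) are hypothetical and do not come from the project scripts.

```python
def splice_subtype(intron_seq):
    """Classify an intron by its terminal dinucleotides.

    Assumes `intron_seq` is the full intron, 5' to 3', on the coding strand.
    Returns "AT-AC", "GT-AG" or "other".
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        return "other"
    donor, acceptor = seq[:2], seq[-2:]
    if donor == "AT" and acceptor == "AC":
        return "AT-AC"
    if donor == "GT" and acceptor == "AG":
        return "GT-AG"
    return "other"

def shared_predictions(blastn_ids, blastx_ids):
    """Return the identifiers found in both the BLASTN and the BLASTX group."""
    return sorted(set(blastn_ids) & set(blastx_ids))

# Hypothetical identifiers of predictions in 'Alignment across junction'.
blastn_group = ["intron_1", "intron_2", "intron_3"]
blastx_group = ["intron_3", "intron_1", "intron_9"]
print(shared_predictions(blastn_group, blastx_group))
print(splice_subtype("ATATCCTTTTAC"))
```

       Predictions returned by `shared_predictions` would be the first candidates to inspect in the UCSC Genome Browser.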
       We can also draw some general conclusions from the project. Although Geneid_v1.2_u12 and SGP are very sensitive in detecting U12 splice sites and in assembling exons with U12 donor or acceptor sites into valid gene models, their specificity is low and some of their predictions are false positives. Their output therefore needs further processing to find real U12 introns. After doing this project, we can confirm that Perl programming is very useful, and sometimes essential, for this kind of work. In this project Perl was used to find predicted U12 introns from the predicted exons flanking them, to obtain the sequences of the flanking exons, and to analyze BLAST results and organize the predictions into different groups. Other bioinformatics tools, such as BLAST, were also vital to accomplishing our initial purpose.