GENE ANALYSIS
In the GenBank database (http://www.ncbi.nlm.nih.gov/Genebank) we found the
genomic sequences for our hemagglutinin-esterase protein sequences, except for
three that we could not find: Influenza C, Breda virus and Murine hepatitis
virus (the shortest one). Our viruses are RNA viruses, but in this database the
sequences are stored as cDNA, probably because the entire database has to be in
the same format; otherwise the sequences could not be compared. We aligned the
sequences with ClustalW.
Usually, in eukaryotic coding regions, C and G nucleotides are the more
frequent ones, but when we analyzed the nucleotide content of our sequences,
the AT/GC ratio was higher than expected. Because this result was surprising,
we decided to analyze it in more detail: we studied the AT/GC ratio for other
proteins in Coronaviridae and Influenza, and for proteins from many different,
randomly chosen viruses.
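The AT/GC ratio discussed above can be computed directly from the sequence records. The following is only a minimal sketch, not the procedure used for this report: the function names `at_gc_ratio` and `parse_fasta` are hypothetical, the input is assumed to be plain FASTA text, and the two records in the example are made-up toy sequences, not real viral genes.

```python
from collections import Counter


def at_gc_ratio(sequence: str) -> float:
    """Return (A + T) / (G + C) for one nucleotide sequence."""
    # Treat U as T so RNA-style and cDNA-style records give the same answer.
    counts = Counter(sequence.upper().replace("U", "T"))
    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]
    if gc == 0:
        raise ValueError("sequence contains no G or C")
    return at / gc


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence} (assumed input format)."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


if __name__ == "__main__":
    # Toy records for illustration only.
    toy = ">toy_1\nATGAAATTTGC\n>toy_2\nAUGGCC\n"
    for header, seq in parse_fasta(toy).items():
        print(header, round(at_gc_ratio(seq), 2))
```

A ratio above 1 means the sequence has more A and T than G and C; a ratio below 1 means the opposite.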
We always found the same result: the viral sequences are rich in A and T. The
reasons for this are not known, but we propose some hypotheses:
It may be related to the self-complementarity of the strand. If the strand is
strongly self-complementary, a low G and C content may be an advantage, because
the G-C interaction is stronger than the A-T one. If the sequence is rich in A
and T, it is easier to separate the paired regions for translation.
G and C are the most mutable nucleotides. Viruses evolve very fast, so they
mutate more, and their GC content may have been reduced in favor of A and T.
Finally, we analyzed the conserved motifs in our cDNA sequences with the
GENIO/logo server. We did not find any significant conservation in our
sequences, except for the initial ATG. This may be because we worked with viral
RNA sequences, which evolve very quickly: RNA polymerase makes more errors than
DNA polymerase.
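To illustrate what "conservation" means in a sequence logo, the sketch below computes the information content of each alignment column in bits (2 minus the Shannon entropy of the column). This is a simplified stand-in and not the GENIO/logo implementation: it applies no small-sample correction and ignores gap characters. The function name `column_information` and the toy alignment are assumptions made for the example.

```python
import math
from collections import Counter


def column_information(aligned: list[str]) -> list[float]:
    """Information content (bits) per column of a nucleotide alignment.

    A fully conserved column scores 2.0 bits; a column with all four
    nucleotides equally frequent scores 0.0. Gaps ('-') are ignored.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")

    scores = []
    for col in range(length):
        residues = [seq[col].upper() for seq in aligned if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # column of gaps only carries no information
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(residues).values()
        )
        scores.append(2.0 - entropy)
    return scores


if __name__ == "__main__":
    # Toy alignment: only the initial ATG is conserved.
    toy_alignment = ["ATGAAC", "ATGCTA", "ATGT-G"]
    for position, bits in enumerate(column_information(toy_alignment), start=1):
        print(position, round(bits, 3))
```

In the toy alignment, only the first three columns (the ATG) reach the maximum of 2 bits, which mirrors the kind of result described above.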