PROTEIN SEQUENCE ANALYSIS
Using the hemagglutinin-esterase sequence of Bovine coronavirus (PP33468), we ran a BLAST search against the Swiss-Prot database to find similar sequences. Most of the sequences we found had an e-value of 0.0. Some of them are the same protein with a few differences (point mutations), and others are hemagglutinin-esterases from other viruses, such as influenza or murine hepatitis virus. At e-values of approximately 5.5, proteins unrelated to hemagglutinin-esterase begin to appear, such as phenylalanyl-tRNA synthetase.
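The cut-off between related and unrelated hits can be made explicit with a small filter. The following is only a minimal sketch: the `Hit` type, the `filter_hits` function, the 1e-5 threshold, and the example records are all assumptions made for illustration, and the e-values simply mirror the ranges mentioned above rather than real search output.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One database hit (hypothetical container, not a BLAST data type)."""
    description: str
    evalue: float


def filter_hits(hits, max_evalue=1e-5):
    """Keep hits whose e-value is at or below max_evalue, best hits first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


# Illustrative records only; these are not actual search results.
example_hits = [
    Hit("hemagglutinin-esterase, Bovine coronavirus", 0.0),
    Hit("hemagglutinin-esterase, murine hepatitis virus", 0.0),
    Hit("phenylalanyl-tRNA synthetase", 5.5),
]

for hit in filter_hits(example_hits):
    print(hit.description, hit.evalue)
```

With this assumed threshold, the hit at e-value 5.5 is discarded while the two hemagglutinin-esterase entries are kept.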
We selected hemagglutinin-esterase sequences from various organisms, choosing the hits with the lowest e-values, and we completed this selection with more viruses from a viral sequence database.
Afterwards, we ran ClustalW to align these sequences.
In the alignment, we can see that the influenza sequence is the longest and that it contains some regions that appear only in this protein, such as (184)-NCNNSFLK-(193), which corresponds to an N-glycosylation signal.
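As a minimal sketch of how such a signal can be located, the fragment above can be scanned with the PROSITE N-glycosylation pattern (PS00001, N-{P}-[ST]-{P}) written as a regular expression. The function name `find_n_glycosylation_sites` and the `offset` parameter are assumptions made for this example, and positions are counted within the fragment, not within the full protein.

```python
import re

# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}.
# The lookahead lets overlapping sites be reported as well.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(sequence, offset=1):
    """Return (position, matched residues) pairs; positions start at offset."""
    return [(m.start() + offset, m.group(1)) for m in N_GLYC.finditer(sequence)]


fragment = "NCNNSFLK"  # influenza fragment quoted in the text
print(find_n_glycosylation_sites(fragment))  # [(3, 'NNSF')]
```

The single match starts at the third residue of the fragment (N-N-S-F), which is consistent with the region being flagged as an N-glycosylation signal.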
The C-terminal end contains other glycosylation signals and other features, such as phosphorylation sites.
Berne virus and one of the murine hepatitis virus sequences are the shortest, but all of the sequences align over practically their whole length.
When we analyzed the protein domains using different databases, we found differences between the results:
In Pfam, we found only one functional domain. It corresponds to hemagglutinin-esterase and lies near the N-terminal end, from approximately amino acid position 20-30 to position 460 in the Coronaviridae family; it is longer in influenza.
In the ProDom database, we found multiple domains that group together. They occupy the same amino acid positions as the hemagglutinin-esterase domain in Pfam. The majority of these domains are glycosylation signals, and some of them are specific to a single sequence, such as the influenza or Bovine coronavirus domains.
After analyzing the information obtained from the databases, we hypothesize that our protein contains approximately 420-430 amino acids, although it is longer in influenza. At its N-terminal end it has a signal peptide, whose function may be to indicate that this protein must be expressed in the membrane. Very close to this signal peptide we found the functional domain and a transmembrane region.
We can also see many N-glycosylation signals, both inside and outside the functional domain. These glycosylations are responsible for the high molecular weight of our glycoprotein (approximately 47709 Da).
When we ran Blocks on our sequences, we did not find any motifs. However, when we searched the Blocks database for the conserved domain of this protein family, we found 8 different motifs. We studied them with Prosite and found that the second one did not have any known function; the others are casein kinase and tyrosine kinase phosphorylation sites and N-glycosylation or myristoylation sites. This supports the information obtained from the domain analysis, where we found many N-glycosylation signals. These motifs are not completely conserved, but they match the regular expressions that characterize these signals.
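To illustrate what matching against these regular expressions means, here is a minimal sketch that translates simple PROSITE-style patterns into Python regular expressions. The helper `prosite_to_regex` is hypothetical and handles only residues, `x`, `[..]`, `{..}` and repetition counts; it does not cover anchors or other PROSITE syntax. The three patterns shown are the published PROSITE entries for N-glycosylation, casein kinase II phosphorylation and N-myristoylation, and are given as examples rather than as the exact motifs found in our search.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        # "[..]" classes and single residues are already valid regex syntax.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(core)
    return "".join(parts)


patterns = {
    "N-glycosylation": "N-{P}-[ST]-{P}",
    "Casein kinase II phosphorylation": "[ST]-x(2)-[DE]",
    "N-myristoylation": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
}

for name, pattern in patterns.items():
    print(name, prosite_to_regex(pattern))
```

Because these patterns are short and permissive, a match only shows that a site is compatible with the signal, not that the modification actually takes place.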
Moreover, the functions that we obtained for these motifs are very common, so it is possible that they are not real. When we analyzed the amino acid structure of influenza, we found that some of these motifs are conserved, which supports the idea that they are real, but we cannot say the same in every case.
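The concern that such common motifs may match by chance can be made roughly quantitative. The sketch below estimates how many matches a short pattern would produce in a random sequence. It assumes, as a simplification, that all 20 amino acids are equally frequent and independent, and it uses a length of 425 residues only because that lies within the 420-430 range estimated above. The function names are hypothetical.

```python
def chance_probability(allowed_counts, alphabet_size=20):
    """Probability that a random window matches a pattern.

    allowed_counts lists, for each pattern position, how many of the
    20 amino acids are accepted there (uniform frequencies assumed).
    """
    p = 1.0
    for n in allowed_counts:
        p *= n / alphabet_size
    return p


def expected_matches(allowed_counts, length):
    """Expected number of matching windows in a random sequence of given length."""
    windows = max(length - len(allowed_counts) + 1, 0)
    return windows * chance_probability(allowed_counts)


n_glyc = [1, 19, 2, 19]   # N-{P}-[ST]-{P}
ck2 = [2, 20, 20, 2]      # [ST]-x(2)-[DE]

print(round(expected_matches(n_glyc, 425), 2))  # about 1.9
print(round(expected_matches(ck2, 425), 2))     # about 4.2
```

Under these assumptions, a protein of this size would be expected to contain a few such sites purely by chance, which is why conservation across sequences is a more convincing argument than a single pattern match.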