PROTEIN SEQUENCE ANALYSIS

Using the hemagglutinin-esterase sequence of Bovine coronavirus (PP33468) as the query, we ran a BLAST search against the Swiss-Prot database to find similar sequences. Most of the hits had an e-value of 0.0. Some of them are the same protein with a few differences (point mutations); others are hemagglutinin-esterases from other viruses, such as influenza or murine hepatitis virus. At e-values of approximately 5.5, proteins unrelated to hemagglutinin-esterase, such as phenylalanyl-tRNA synthetase, begin to appear.
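The separation of related and unrelated hits by e-value can be sketched in a few lines of Python. This is only an illustration: it assumes the hits were exported in the standard 12-column tabular BLAST format (subject ID in column 2, e-value in column 11), which this report does not state, and the function names, the example rows and the cutoff of 1.0 are all hypothetical.

```python
def parse_tabular_hits(text):
    """Parse 12-column tabular BLAST output into (subject_id, evalue) pairs."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Column 2 is the subject ID, column 11 is the e-value.
        hits.append((fields[1], float(fields[10])))
    return hits


def split_by_evalue(hits, cutoff):
    """Split hits into those at or below the cutoff and those above it."""
    kept = [hit for hit in hits if hit[1] <= cutoff]
    dropped = [hit for hit in hits if hit[1] > cutoff]
    return kept, dropped


# Made-up rows for illustration only; they are NOT results from this study.
EXAMPLE = "\n".join([
    "query\thit_A\t98.0\t424\t8\t0\t1\t424\t1\t424\t0.0\t850",
    "query\thit_B\t30.0\t400\t250\t10\t5\t410\t3\t400\t2e-45\t170",
    "query\thit_C\t25.0\t60\t40\t2\t100\t160\t20\t80\t5.5\t30",
])

if __name__ == "__main__":
    # The cutoff of 1.0 is an arbitrary assumption chosen for the example.
    kept, dropped = split_by_evalue(parse_tabular_hits(EXAMPLE), cutoff=1.0)
    print("kept:", kept)
    print("dropped:", dropped)
```

A cutoff well below the e-value at which unrelated proteins start to appear keeps only the hits that are likely to be true hemagglutinin-esterases.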

We selected hemagglutinin-esterase sequences from different organisms, choosing the hits with the lowest e-values, and completed the selection with further viral sequences from a virus sequence database. We then ran ClustalW to align these sequences.

In the alignment, the influenza sequence is the longest, and it contains some domains that appear only in this protein:

(184)-NCNNSFLK-(193), which corresponds to an N-glycosylation signal.

The C-terminal end contains other glycosylation signals and other domains, such as phosphorylation sites.

The Berne virus sequence and one of the murine hepatitis virus sequences are the shortest, but all of the sequences align almost completely.

When we analyzed the protein domains using different databases, we found differences between the results:

In Pfam, we found only one functional domain, corresponding to hemagglutinin-esterase. It lies near the N-terminal end, spanning approximately amino acid positions 20-30 to 460 in the Coronaviridae family, and it is longer in influenza.

In the ProDom database, we found multiple domains that group together. They occupy the same amino acid positions as the hemagglutinin-esterase domain in Pfam. Most of these domains are glycosylation signals, and some are specific to a single sequence, such as the influenza or Bovine coronavirus domains.

After analyzing the information obtained from the databases, we hypothesize that our protein contains approximately 420-430 amino acids, although it is longer in influenza. At its N-terminal end it has a signal peptide, whose function may be to indicate that this protein must be expressed in the membrane. Very near this signal peptide we found the functional domain and a transmembrane region.

We also observe many N-glycosylation signals, both inside and outside the functional domain. These glycosylations are responsible for the high molecular weight of our glycoprotein (approximately 47709 Da).
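To judge how much of a glycoprotein's weight comes from the polypeptide chain alone, the mass of the unmodified chain can be estimated from the sequence. The following is a minimal sketch, not the method used in this report: it sums standard average residue masses and adds one water molecule for the free termini. The function name is hypothetical, and glycans and other modifications are deliberately ignored.

```python
# Average masses (in Da) of amino acid residues, i.e. amino acid minus water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water molecule accounts for the free N- and C-termini


def polypeptide_mass(sequence):
    """Return the average mass (Da) of an unmodified polypeptide chain."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError("unsupported residue codes: " + ", ".join(sorted(unknown)))
    if not sequence:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    # Fragment quoted in the alignment section, used here only as sample input.
    print(round(polypeptide_mass("NCNNSFLK"), 2))
```

Applying such a function to the full sequence and comparing the result with the reported weight would show which part of the mass is attributable to the amino acid chain and which part must come from attached glycans.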

When we ran Blocks on our sequences, we did not find any motifs. However, when we searched the Blocks database for the conserved domain of this protein family, we found 8 different motifs. We studied them with Prosite and found that the second one has no assigned function; the others are casein kinase and tyrosine kinase phosphorylation sites and N-glycosylation or myristoylation sites. This supports the information obtained from the domain analysis, where we found many N-glycosylation signals. These motifs are not completely conserved, but they correspond to the regular expressions that characterize these signals.

Moreover, the functions we obtained for these motifs are very common, so it is possible that these sites are not real. When we analyzed the amino acid structure of influenza, we found that some of these motifs are conserved, which supports that they are real, but we cannot say the same in all cases.
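The link between these motifs and the regular expressions that characterize them can be illustrated with a short Python sketch. It converts a PROSITE-style pattern into a Python regular expression and lists every match in a sequence. Two patterns are assumed here from the PROSITE database: N-{P}-[ST]-{P} for N-glycosylation and [ST]-x(2)-[DE] for casein kinase II phosphorylation. The function names are hypothetical, and the converter handles only the simple pattern syntax needed for these examples.

```python
import re

# One PROSITE element: x, a residue, [allowed] or {forbidden}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern such as 'N-{P}-[ST]-{P}' into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported PROSITE element: " + element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} means any residue except P
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_sites(sequence, pattern):
    """Return (1-based position, matched text) for every match, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    fragment = "NCNNSFLK"  # influenza fragment quoted in the alignment section
    print(find_sites(fragment, "N-{P}-[ST]-{P}"))  # N-glycosylation
    print(find_sites(fragment, "[ST]-x(2)-[DE]"))  # casein kinase II phosphorylation
```

Because these patterns are only three or four residues long, they are expected to match many sequences by chance, which is why conservation across the aligned sequences is needed before a predicted site can be considered real.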