Proteins have a modular organisation. They are made up of different regions with specific functions, called protein (or functional) domains. Each domain is characterized by one or more sequence motifs, which are related to the function carried out by the domain. Preservation of that function is what prevents the motif from gradually disappearing through the accumulation of mutations during evolution. The same domain may be found in many different proteins from the same organism and in many different organisms. For instance, the RNA-binding domain is one of the most abundant domains in all eukaryotes.
We can represent a conserved domain as a multiple alignment. From the multiple alignment we can build a description of the sequence motif using a consensus sequence, a position-specific scoring matrix (PSSM, or weight matrix) or a hidden Markov model (this will be studied in the structural biology course).
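As an illustration of how a consensus sequence and a simple per-position frequency table might be derived from a multiple alignment, the following Python sketch uses the nine aligned motif instances from the CLUSTAL W alignment shown in this section. The function names (`column_counts`, `consensus`, `frequency_matrix`) are hypothetical, and the sketch assumes an ungapped alignment in which every row has the same length.

```python
from collections import Counter

# Aligned motif instances copied from the CLUSTAL W alignment in this section.
ALIGNMENT = [
    "YIHRDLAARNCLV",  # ABL_CALVI
    "YIHRDLAARNCLV",  # ABL_DROME
    "FIHRDLAARNCLV",  # ABL_FSVHY
    "FIHRDLAARNCLV",  # ABL2_HUMAN
    "FIHRDLAARNCLV",  # ABL1_MOUSE
    "FIHRDLAARNCLV",  # ABL1_HUMAN
    "FIHRDLAARNCLV",  # ABL1_CAEEL
    "FVHRDLACRNCLV",  # 7LES_DROME
    "FVHRDLACRNCLV",  # 7LES_DROVI
]


def column_counts(alignment):
    """Count how often each residue occurs in every alignment column."""
    return [Counter(column) for column in zip(*alignment)]


def consensus(alignment):
    """Return the most frequent residue of each column as one string."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(alignment))


def frequency_matrix(alignment):
    """Return one dict per column mapping residue -> relative frequency."""
    n_sequences = len(alignment)
    return [
        {residue: count / n_sequences for residue, count in counts.items()}
        for counts in column_counts(alignment)
    ]


print(consensus(ALIGNMENT))  # FIHRDLAARNCLV
```

The consensus keeps only the majority residue of each column, so the Y/F, I/V and A/C variation is lost. The frequency table retains that variation, which is the starting point for a weight matrix.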
A number of databases store information on known protein domains (PROSITE, Pfam, InterPro, SMART, ...). In InterPro (Mulder et al., 2005) we can access different domain databases.
Using a formal representation of the domain (for example, a position-specific scoring matrix) we can search sequence databases for other molecules that contain the same domain. These searches are usually very sensitive and allow us to detect remote homologies.
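The idea of searching with a position-specific scoring matrix can be sketched as follows. This is a minimal illustration and not the procedure used by any particular database: it assumes a uniform background frequency for the 20 amino acids, a pseudocount of 1 per residue, and an ungapped motif. The helper names (`build_pssm`, `scan`, `best_hit`) and the query sequence are made up for the example; the aligned instances are those of the CLUSTAL W alignment in this section.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Aligned motif instances copied from the CLUSTAL W alignment in this section.
ALIGNMENT = [
    "YIHRDLAARNCLV", "YIHRDLAARNCLV", "FIHRDLAARNCLV",
    "FIHRDLAARNCLV", "FIHRDLAARNCLV", "FIHRDLAARNCLV",
    "FIHRDLAARNCLV", "FVHRDLACRNCLV", "FVHRDLACRNCLV",
]


def build_pssm(alignment, pseudocount=1.0):
    """Build log-odds scores per column (assumption: uniform background)."""
    n_sequences = len(alignment)
    background = 1.0 / len(AMINO_ACIDS)
    total = n_sequences + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        scores = {}
        for aa in AMINO_ACIDS:
            frequency = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(frequency / background)
        pssm.append(scores)
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard ones get the column's lowest score.
        score = sum(
            column.get(aa, min(column.values()))
            for aa, column in zip(window, pssm)
        )
        hits.append((start, score))
    return hits


def best_hit(sequence, pssm):
    """Return the (start, score) pair with the highest score."""
    return max(scan(sequence, pssm), key=lambda hit: hit[1])


# Hypothetical query: the motif consensus flanked by arbitrary residues.
QUERY = "MKT" + "FIHRDLAARNCLV" + "GEN"
start, score = best_hit(QUERY, build_pssm(ALIGNMENT))
print(start)  # 3 (0-based start of the motif in QUERY)
```

In a real search, a score threshold would be chosen so that only windows scoring well above what is expected by chance are reported.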
CLUSTAL W (1.82) multiple sequence alignment
ABL_CALVI/28-40       YIHRDLAARNCLV 13
ABL_DROME/505-517     YIHRDLAARNCLV 13
ABL_FSVHY/308-320     FIHRDLAARNCLV 13
ABL2_HUMAN/405-417    FIHRDLAARNCLV 13
ABL1_MOUSE/359-371    FIHRDLAARNCLV 13
ABL1_HUMAN/359-371    FIHRDLAARNCLV 13
ABL1_CAEEL/428-440    FIHRDLAARNCLV 13
7LES_DROME/2339-2351  FVHRDLACRNCLV 13
7LES_DROVI/2351-2363  FVHRDLACRNCLV 13
                      ::*****.*****
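The row of symbols under a CLUSTAL W alignment summarises the conservation of each column: `*` for identical residues, `:` for a conservative substitution and `.` for a semi-conservative one. The sketch below shows one way such a line could be recomputed. The strong and weak residue groups are written here as they are usually listed in the ClustalW documentation and should be treated as an assumption to verify against the version in use; the function names are hypothetical.

```python
# Residue groups assumed from the ClustalW documentation (verify before reuse).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

# Aligned motif instances copied from the alignment above.
ALIGNMENT = [
    "YIHRDLAARNCLV", "YIHRDLAARNCLV", "FIHRDLAARNCLV",
    "FIHRDLAARNCLV", "FIHRDLAARNCLV", "FIHRDLAARNCLV",
    "FIHRDLAARNCLV", "FVHRDLACRNCLV", "FVHRDLACRNCLV",
]


def conservation_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "  # columns containing gaps are left unmarked
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weak group
    return " "


def conservation_line(alignment):
    """Build the conservation line for a list of equal-length rows."""
    return "".join(conservation_symbol(col) for col in zip(*alignment))


print(conservation_line(ALIGNMENT))  # ::*****.*****
```

For this alignment the Y/F and I/V columns fall within strong groups, and the A/C column falls within a weak group, which reproduces the line printed under the alignment.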
>fragment_seq1
ACGTGTATCAGAGCTCATCAGAGGGTAAAGTTCACAAAAGACCACACTGTCAGACAGAAAGAGGAAGTAT
CTCCAGAGGCAGTTGGTGTCACCAGCCAGCGACCAGTGTTTTGTCCTTTTCATAAAAAGGAGCAGCTGAA
GCTGTACTGTGAGACATGTGACAAACTGACATGTCGAGACTGTCAGTTGTTAGAACATAAAGAGCATAGA
TACCAATTTATAGAAGAAGCTTTTCAGAATCAGAAAGTGATCATAGATACACTAATCACCAAACTGATGG
AAAAAACAAAATACATAAAATTCACAGGAAATCAGATCCAAAACAGAATTATTGAAGTAAATCAAAATCA
AAAGCAGGTGGAACAGGATATTAAAGTTGCTATATTTACACTGATGGTAGAAATAAATAAAAAAGGAAAA
GCTCTACTGCATCAGTTAGAGAGCCTTGCAAAGGACCATCGCATGAAACTTATGCAACAACAACAGGAAG