Practical 18: Sequence similarity searches (II)
Bioinformatics 2004/2005

PSI-BLAST

PSI-BLAST (Position-Specific Iterated BLAST) is a tool that builds a position-specific scoring matrix (PSSM) from a multiple alignment of the top-scoring BLAST hits to a given query sequence. This scoring matrix acts as a profile that captures the key positions of conserved amino acids within a motif. When a profile is used to search a database, it can often detect subtle relationships between proteins that are distant structural or functional homologues. Such relationships are often missed by a standard BLAST search with a single query sequence.

For a simple example of what a consensus sequence, or profile, looks like, consider the EF-hand calcium-binding loop of the calmodulin family. A multiple alignment of this region for three calmodulin proteins is as follows:

    Position #       1  2  3  4  5  6  7  8  9 10 11 12

    CALM_HUM      ...D  K  D  G  D  G  T  I  T  T  K  E
    CALM_SCHPO    ...D  R  D  Q  D  G  N  I  T  S  N  E
    CALM_YEAST    ...D  K  D  N  N  G  S  I  S  S  S  E

The profile of this region can be represented as follows:

    Loop position #  1  2  3  4  5    6  7  8  9 10 11 12
    Profile          D  x  D  x  D/N  G  x  I  x  x  x  E

Here x marks positions where the amino acid type varies; such positions are not heavily weighted in the alignment.

Exercise




The rules for deriving this simple profile are:

  1. any position with 90% amino acid identity or greater is considered conserved in the profile, and thus a higher score would be given when the conserved amino acid is found at that position in the sequence.
  2. any position that always contains one of only two types of amino acids would be up-weighted to give a higher score whenever either of those two amino acids appears at that position.
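The two rules above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_profile` and the `identity_cutoff` parameter are made up for this example, and the three loop sequences are copied from the alignment above. Note that when the rules are applied literally to just these three sequences, rule 2 also flags positions 2, 9 and 10 (each contains only two residue types), whereas the profile shown above leaves them as x. Treat the output as a demonstration of the rules rather than a reproduction of that profile.

```python
from collections import Counter

# The three EF-hand loop sequences from the alignment above.
ALIGNMENT = {
    "CALM_HUM":   "DKDGDGTITTKE",
    "CALM_SCHPO": "DRDQDGNITSNE",
    "CALM_YEAST": "DKDNNGSISSSE",
}


def simple_profile(sequences, identity_cutoff=0.9):
    """Apply the two rules column by column; return one symbol per position."""
    profile = []
    for column in zip(*sequences):
        counts = Counter(column)
        # Rank residues by decreasing count, then alphabetically (stable order).
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        top_residue, top_count = ranked[0]
        if top_count / len(column) >= identity_cutoff:
            # Rule 1: at least 90% identity, so the position is conserved.
            profile.append(top_residue)
        elif len(ranked) == 2:
            # Rule 2: only two residue types ever occur at this position.
            profile.append("/".join(residue for residue, _ in ranked))
        else:
            # Anything else is treated as a variable position.
            profile.append("x")
    return profile


profile = simple_profile(list(ALIGNMENT.values()))
print(" ".join(profile))
```

With a larger alignment of family members, the same function would let you see how the profile changes as more sequences are added.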

PSI-BLAST employs more sophisticated rules to create a profile than this example. The profile guides the sequence similarity search and increases its sensitivity. The first round of PSI-BLAST is just like a normal BLAST search: it finds sequence homologues of the query. PSI-BLAST then builds a multiple alignment of the significant hits and derives from it a position-specific scoring matrix, which records which positions evolution has conserved and which positions it has allowed to vary. In the second round, or "iteration", the database is searched again using this matrix instead of the query sequence alone. The sequences found in each round are added to the profile, allowing PSI-BLAST to detect more distant homologues with each iteration.

One of the known weaknesses of PSI-BLAST is that its ability to detect distant relationships between proteins is critically dependent on the choice of the query sequence. For this reason, a recommended strategy is to query PSI-BLAST with individual functional domains. PSI-BLAST will then find other proteins that share this domain, even if they are not homologous over their full length.
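To make the idea of a position-specific scoring matrix more concrete, here is a minimal Python sketch that turns the EF-hand loop alignment into log-odds scores and uses them to score two candidate segments. It is not PSI-BLAST's actual procedure, which weights sequences and uses more elaborate pseudocounts. The uniform background frequency, the pseudocount of 1, and the names `build_pssm` and `score` are all assumptions made for this illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies

# EF-hand loop sequences from the calmodulin alignment earlier in this practical.
ALIGNED = ["DKDGDGTITTKE", "DRDQDGNITSNE", "DKDNNGSISSSE"]


def build_pssm(sequences, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    pssm = []
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen residues from getting a score of -infinity.
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / BACKGROUND)
        pssm.append(scores)
    return pssm


def score(pssm, segment):
    """Sum the position-specific scores of an ungapped segment."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


pssm = build_pssm(ALIGNED)
family_member = "DKDGDGTITTKE"   # taken from the alignment
mutated = "AKAGAGTATTKA"         # hypothetical: conserved D, I and E replaced by A
print(round(score(pssm, family_member), 2))
print(round(score(pssm, mutated), 2))
```

The same residue receives a different score depending on the column it falls in, which is what distinguishes a PSSM from a general substitution matrix.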

[Figure: psi_blast]

Three categories of homology are commonly studied in biological molecules: sequence homology, structural homology, and functional homology. Sequence homology is the easiest to identify, and is therefore the primary target of many bioinformatics methods. Sequence homology has direct implications for the relatedness of proteins and their potential pathways of derivation. However, to help understand how a protein is implicated in a certain disease state, or how to design a pharmaceutical that interacts with a given protein, functional and/or structural information is necessary. Functional homologues are relatively easy to define: they are any two proteins, or protein domains, that perform similar functions. Structural homologues contain similar "folds", which are localized regions of a molecule that comprise a structural feature such as a "beta barrel" or "four helical bundle" motif. The fold can encompass the entire protein, or just one domain of the protein. When considering sequence, functional, or structural homology, it is important to understand that one type of homology between proteins does not always imply another. Nevertheless, it is a reasonable assumption that proteins related through evolution are likely to share some degree of all three types of homology. PSI-BLAST was engineered to identify distant relationships between sequences that are too subtle to discover with a regular BLAST search.

For more information, see the BLAST/PSI-BLAST Tutorial at NCBI



Case study: the Kua-UEV protein

1. Identification of homologues of Kua-UEV using PSI-BLAST



2. Alignment of Kua-UEV, Kua, UEV and protein gi:57104280.

We have seen that the matches to the query protein fall into two regions. We take the proteins Kua (AAF08702) and UEV variant 1 (NP_071887) and align them to the query:


3. Gene organisation of Kua and UEV in Homo sapiens






4. Identification of homologues of Kua and UEV separately


References for the Kua-UEV protein and the Kua homologue in Myxococcus xanthus (CarF):

Genome Res. 2000 Nov;10(11):1743-56.

Fusion of the human gene for the polyubiquitination coeffector UEV1 with Kua, a newly identified gene.

Thomson TM, Lozano JJ, Loukili N, Carrio R, Serras F, Cormand B, Valeri M, Diaz VM, Abril J, Burset M, Merino J, Macaya A, Corominas M, Guigo R.

Institut de Biologia Molecular, Consejo Superior de Investigaciones Cientificas, Barcelona, Spain. tthomson@hg.vhebron.es

UEV proteins are enzymatically inactive variants of the E2 ubiquitin-conjugating enzymes that regulate noncanonical elongation of ubiquitin chains. In Saccharomyces cerevisiae, UEV is part of the RAD6-mediated error-free DNA repair pathway. In mammalian cells, UEV proteins can modulate c-FOS transcription and the G2-M transition of the cell cycle. Here we show that the UEV genes from phylogenetically distant organisms present a remarkable conservation in their exon-intron structure. We also show that the human UEV1 gene is fused with the previously unknown gene Kua. In Caenorhabditis elegans and Drosophila melanogaster, Kua and UEV are in separated loci, and are expressed as independent transcripts and proteins. In humans, Kua and UEV1 are adjacent genes, expressed either as separate transcripts encoding independent Kua and UEV1 proteins, or as a hybrid Kua-UEV transcript, encoding a two-domain protein. Kua proteins represent a novel class of conserved proteins with juxtamembrane histidine-rich motifs. Experiments with epitope-tagged proteins show that UEV1A is a nuclear protein, whereas both Kua and Kua-UEV localize to cytoplasmic structures, indicating that the Kua domain determines the cytoplasmic localization of Kua-UEV. Therefore, the addition of a Kua domain to UEV in the fused Kua-UEV protein confers new biological properties to this regulator of variant polyubiquitination.



Mol Microbiol. 2003 Jan;47(2):561-71.

A novel regulatory gene for light-induced carotenoid synthesis in the bacterium Myxococcus xanthus.

Fontes M, Galbis-Martinez L, Murillo FJ.

Departamento de Genetica y Microbiologia, Facultad de Biologia, Universidad de Murcia, Spain.

Myxococcus xanthus cells respond to blue light by producing carotenoids. Light triggers a network of regulatory actions that lead to the transcriptional activation of the carotenoid genes. By screening the colour phenotype of a collection of Tn5-lac insertion mutants, we have isolated a new mutant devoid of carotenoid synthesis. We map the transposon insertion, which co-segregates with the mutant phenotype, to a previously unknown gene designated here as carF. An in frame deletion within carF causes the same phenotype as the Tn5-lac insertion. The carF deletion prevents the activation of the normally light-inducible genes, without affecting the expression of any of the regulatory genes known to be expressed in a light-independent manner. Until now, the switch that sets off the regulatory cascade had been identified with light-driven inactivation of protein CarR, an antisigma factor. The exact mechanism of this inactivation has remained elusive. We show by epistatic analysis that the carF gene product participates in the light-dependent inactivation of CarR. The predicted CarF amino acid sequence reveals no known prokaryotic homologues. On the other hand, CarF is remarkably similar to Kua, a family of proteins of unknown function that is widely distributed among eukaryotes.



Page maintained by Mar Albà and Eduardo Eyras, February 2006