Introduction
Position Weight Matrix (PWM)
A PWM is a motif descriptor that attempts to capture the intrinsic variability characteristic of sequence patterns.
A PWM is usually derived from a set of aligned, functionally related sequences.
The matrix shows how many times a given nucleotide has been observed at a given position. Because these values are absolute frequencies, we normalize the PWM: to get the relative frequencies, we divide the value in each position of the matrix by the number of sequences used to build the matrix.
Position Weight Matrices: TRANSFAC.
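The normalization described above can be sketched in a few lines. This is only an illustration in Python (Promfinder itself is written in Perl); the function name normalize_pwm, the list-of-dicts layout and the counts are assumptions made up for the example, not a real TRANSFAC matrix.

```python
# Illustrative sketch only: Promfinder itself is written in Perl.
# The counts below are made-up example values, not a TRANSFAC matrix.

def normalize_pwm(counts, n_sequences):
    """Divide every count by the number of aligned sequences."""
    return [
        {base: count / n_sequences for base, count in column.items()}
        for column in counts
    ]

# One dict per matrix position; each column sums to the 10 aligned sequences.
counts = [
    {"A": 1, "C": 1, "G": 0, "T": 8},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]

pwm = normalize_pwm(counts, n_sequences=10)
for position, column in enumerate(pwm, start=1):
    print(position, column)
```

After normalization, the values in each column add up to 1, so matrices built from different numbers of sequences become comparable.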
Sequence in FASTA format
We use sequences in a predetermined format, the FASTA format. A FASTA record begins with a single-line description of the sequence, followed by the lines of sequence data. The description line is distinguished from the sequence data by a greater-than symbol (">") in the first column.
To obtain sequences in FASTA format: NCBI.
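To make the format concrete, here is a minimal Python sketch of a FASTA reader. It is not Promfinder's own (Perl) reading routine; the function name parse_fasta and the two example records are hypothetical.

```python
# Illustrative sketch only; the records below are invented for the example.

def parse_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of FASTA lines."""
    description, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:].strip(), []
        elif description is not None:
            # Sequence data may span several lines; store it in upper case.
            chunks.append(line.upper())
    if description is not None:
        yield description, "".join(chunks)

example = """>seq1 hypothetical example record
ACGTATAAAAGG
CCGG
>seq2 second hypothetical record
ggcgggtata
"""

records = list(parse_fasta(example.splitlines()))
for description, sequence in records:
    print(description, len(sequence))
```

The same function also accepts an open file handle, since iterating over a file yields its lines.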
Objective
This program was created to predict possible gene promoter regions along a DNA sequence using the Position Weight Matrix method.
Material
Position Weight Matrix: TATA box and/or GC box and/or others (TRANSFAC)
Problem sequences in FASTA format
Operating System: Linux (UNIX)
Programming Language: Perl
The Program: PROMFINDER
To run Promfinder you need:
-options
file_sequence.fa
file_matrix.txt
The program is structured in three sections:
1. Initializing the program:
Declare the options of the program. We also declare all the variables we will use.
2. Processing the sequence:
The program reads the sequence or sequences and executes each routine for each
sequence.
The first step is to capture the identifier (">.......") of each sequence so that we can identify
them in the results.
The next step is to build an array with the sequence; the sequence is then ready to be scanned by the PWM.
2.1) PWM Processing:
This section is repeated for each sequence as many times as there are PWMs
in the matrix file.
a) Open the matrix file.
b) Matrix normalization.
The result of this operation is the relative frequency of each nucleotide in the different positions
of the matrix.
c) For each matrix we calculate the consensus sequence and its score.
2.2) Evaluating candidates:
Each candidate has a length specified by the number of positions of the PWM.
The program will only show those candidates which surpass the threshold chosen by the user or
assigned by default.
a) Estimating the score of the candidates.
b) If the score is high enough, when the execution of Promfinder ends, the program will show
the initial position, the final position, the score and the sequence of each candidate.
c) Close file_matrix if there are no more matrices. If there are more matrices, go back to point b)
of section 2.1.
d) Close file_sequence if there are no more sequences. If there are more, go back to point b) of
section 2.
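The consensus and candidate-evaluation steps can be sketched as follows, in Python rather than Promfinder's Perl. The source does not give the scoring formula, so this sketch assumes one: a candidate's score is the sum of the relative frequencies of its nucleotides divided by the same sum for the consensus, which keeps scores between 0 and 1 and comparable with the 0.8 default threshold. The function names, the matrix values and the test sequence are all hypothetical.

```python
# Illustrative sketch only. ASSUMPTION: score = (sum of relative frequencies
# of the candidate's bases) / (the same sum for the consensus sequence).
# Promfinder's real scoring formula is not stated in the source.

def consensus(pwm):
    """Return the most frequent base per position and the summed frequencies."""
    bases = [max(column, key=column.get) for column in pwm]
    best = sum(column[base] for column, base in zip(pwm, bases))
    return "".join(bases), best

def score(candidate, pwm, best):
    """Score one candidate window against the PWM (assumed formula, see above)."""
    total = sum(column.get(base, 0.0) for column, base in zip(pwm, candidate))
    return total / best

def scan(sequence, pwm, threshold=0.8):
    """Slide a PWM-length window along the sequence and keep high-scoring hits."""
    width = len(pwm)
    _, best = consensus(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        candidate = sequence[start:start + width]
        value = score(candidate, pwm, best)
        if value > threshold:  # only candidates that surpass the threshold
            # Report 1-based, inclusive initial and final positions.
            hits.append((start + 1, start + width, round(value, 3), candidate))
    return hits

# Made-up relative frequencies for a 4-position, TATA-like matrix.
pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.0, "T": 0.8},
    {"A": 0.9, "C": 0.0, "G": 0.0, "T": 0.1},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
]

print(consensus(pwm)[0])
hits = scan("GGTATACCAATA", pwm)
for initial, final, value, candidate in hits:
    print(initial, final, value, candidate)
```

Each hit carries the four values the program reports for a candidate: initial position, final position, score and sequence.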
3. Graphical representation and end of Promfinder:
While Promfinder runs, it stores all the sequences, the candidates (with their initial and final
positions) and the matrix names in order to represent the results graphically. The graphical
representation makes it possible to visualize the exact location of the candidates in the sequence.
Finally, close Promfinder.
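The source does not say how Promfinder draws its output. As a text-only stand-in for the idea of locating candidates along the sequence, the Python sketch below prints the sequence with one marker line per candidate. The function name render_track, the matrix name TATA_box and the positions are hypothetical.

```python
# Illustrative sketch only: a plain-text stand-in, not Promfinder's graphics.

def render_track(sequence, hits):
    """Return the sequence plus one marker line per hit, aligned by position."""
    lines = [sequence]
    for name, start, end in hits:
        # start and end are 1-based and inclusive, as in the reported results.
        marker = " " * (start - 1) + "^" * (end - start + 1)
        lines.append(f"{marker} {name} ({start}-{end})")
    return "\n".join(lines)

track = render_track("GGTATACCAATA", [("TATA_box", 3, 6), ("TATA_box", 9, 12)])
print(track)
```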
Options:
-v: Information concerning the program execution, indicating each step at the shell by means of
the subroutine "sub print_mess".
-m: Information about the matrix: the number of matrices contained in the file, the consensus
sequences and their scores.
-s: Information about the sequence. Promfinder shows the sequence name (the FASTA id), its number
of nucleotides (length) and its G and C content (as an absolute value and as a percentage).
-t x.x: Specifies a threshold. If the user does not use this option, the program assigns a
default threshold value (0.8).
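As a rough illustration of how these options fit together, the Python sketch below mirrors the four flags with argparse and computes the values reported by -s. It is not Promfinder's Perl option handling: the parser, the program name promfinder_sketch, the order of the two file arguments and the function sequence_info are assumptions.

```python
# Illustrative sketch only: a hypothetical argparse mirror of the flags above.
import argparse

def build_parser():
    """Build a parser with the -v, -m, -s and -t options and two input files."""
    parser = argparse.ArgumentParser(prog="promfinder_sketch")
    parser.add_argument("-v", action="store_true", help="report each execution step")
    parser.add_argument("-m", action="store_true", help="report matrix information")
    parser.add_argument("-s", action="store_true", help="report sequence information")
    parser.add_argument("-t", type=float, default=0.8, metavar="x.x",
                        help="score threshold (default: 0.8)")
    parser.add_argument("file_sequence")  # FASTA file (argument order assumed)
    parser.add_argument("file_matrix")    # PWM file (argument order assumed)
    return parser

def sequence_info(name, sequence):
    """Return length and G+C content (absolute and percent) for one sequence."""
    gc = sum(sequence.upper().count(base) for base in "GC")
    percent = 100.0 * gc / len(sequence) if sequence else 0.0
    return {"name": name, "length": len(sequence), "gc": gc, "gc_percent": percent}

args = build_parser().parse_args(
    ["-s", "-t", "0.9", "file_sequence.fa", "file_matrix.txt"]
)
print(args.t, args.file_sequence, args.file_matrix)
if args.s:
    print(sequence_info("seq1", "GGTATACCAATA"))
```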
Download The Promfinder