The results for the second prediction
of Geneid and Genscan lead to the same kind of conclusion.
Blastp indicates the presence of regions homologous to the polymerase domain
of various reverse transcriptases. The RepeatMasker results show that
this region is full of LINE-1 (L1) elements. L1 family sequences are known
to have characteristic features such as an A-rich stretch
at the 3' end, a truncated 5' end, significantly long
open reading frames (ORFs), and the presence of L1 family transcripts in
various types of cells. However, due to base substitutions or truncation,
most elements appear incapable of producing mRNA that can be translated
(1). Furthermore, L1 family sequences contain an ORF with significant
homology to several RNA-dependent DNA polymerases (reverse transcriptases,
RT) of viral origin (2), which may allow their own dispersion. In conclusion,
we think that these predictions correspond only to LINE-1 elements, with no
possibility of translation.
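The argument about L1 copies rests on their long ORFs being disrupted by base substitutions or truncation. As a rough illustration only, the following sketch scans the three forward reading frames of a DNA string for the longest ATG-to-stop ORF. The function name `longest_orf` and the toy sequences are hypothetical and are not taken from the BAC analysed here; the reverse strand is ignored for brevity, so this is not a substitute for the programmes used in this work.

```python
# Minimal sketch (hypothetical helper, toy data): longest ORF on the forward strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF in the three
    forward frames, or None if there is none. `end` is exclusive and
    includes the stop codon; coordinates are 0-based."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # close this ORF and look for the next ATG
    return best


# Toy example: a single base substitution (CCC -> TAG) introduces a
# premature stop codon and shortens the ORF.
intact = "ATGAAACCCGGGTTTTAA"
mutated = "ATGAAATAGGGGTTTTAA"
print(longest_orf(intact))   # (0, 18)
print(longest_orf(mutated))  # (0, 9)
```

In the toy example the substitution halves the ORF, which is the kind of disruption that would leave a predicted L1-derived gene without a translatable product.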
According to the results obtained
for the third gene, the predictions made by the gene prediction
programmes are not correct. Although Geneid and Genscan predict a gene that
matches the end of the sequence of the FLJ22419 protein, they fail to
predict its beginning. Not only do they miss the exon at around nucleotide
4,000 inside the gene, but they also place the start of the gene around
nucleotide 64,000, which is not correct. There is enough evidence
to think that the gene is composed of eight exons, although only five of
them are encoded in our BAC, RP11-758L3. Through further research,
we have established the position of the other three exons, and we have
evidence that the first two are encoded in BAC RP11-598P2. Intron
length varies greatly within the gene:
while the last introns are quite short, the first ones are longer and can
easily reach 60,000 nucleotides. It is important to highlight that
we have not found any BAC containing the sequence of the third exon, although
we know its exact position in the chromosome. After all the research
carried out, we suggest that the FLJ22419 protein could exist
as a product of this gene. |
The fourth gene predicted by Geneid
and the fifth one predicted by Genscan are not real genes. The results
obtained with the different programmes indicate that they are repetitive
regions that encode non-functional envelope proteins. These proteins belong
to the HERV-H human endogenous retrovirus family, whose members are commonly
found integrated in the genome (3). The three envelope genes that are large
in humans (HERV-H/env62, HERV-H/env60, and HERV-H/env59) are usually
prematurely stopped in the
majority of primates, that is, they are not translated (4). In this case,
the absence of ESTs supports this hypothesis. To sum up, our results are
consistent with the absence of a strong selective pressure for the conservation
of a functional envelope gene of possible benefit for the host.
After analysing the results we have
obtained, we suggest that gene number six, predicted by Genscan, could
be a pseudogene. The Blastp and InterPro results indicate that
the protein product of this gene has a homeobox domain that appears to
be the same as the homeobox domain of the VENTX2 protein. VENTX2
is a haemopoietic progenitor homeobox protein expressed in bone marrow,
and its gene is located at chromosome 10q26.3 (5). Using Blastx to align
the two sequences, we saw that part of the VENTX2 mRNA aligns with a
200 aa segment of the predicted protein (428 aa). However, the rest of the
protein has no similarity to other known proteins. These results support
the hypothesis that a piece of VENTX2 is duplicated in this region.
Furthermore, the RepeatMasker analysis shows a high concentration of
repetitive elements in this area. For these reasons, we think this gene
could have arisen through duplication or pseudogene formation (retroposition)
and subsequently been inactivated by mutation and by mobile elements such as transposons. |