Introduction and Objectives


Cancer is one of the most frequent pathologies in our society, and many biomedical studies focus on its diagnosis and treatment. Technological and biomolecular advances offer many tools for diagnosis. For example, cDNA microarray technology makes it possible to study differential expression patterns in tumours. With this information, we can identify which genes are involved in a pathology and, in turn, discover potential targets for therapeutic treatment.

In this project, we will study differential gene expression during bladder cancer progression using microarray databases, and we will also learn to use several important computational tools. Our objectives focus on identifying:

Moreover, we will try to draw conclusions about potential targets for diagnosis and treatment.

We based our study on these two scientific articles:








Materials and Methods


Databases. We obtained the database of the article “Identifying distinct classes of bladder carcinoma using microarrays” from the Gene Expression Omnibus (GEO) on the NCBI website.

The bladder tumour stage classification dataset is a single-channel database. Because we only wanted to study cancer progression, we analyzed only the Ta, T1 and T2+ tumour samples.

Selection of significant expression values. In this step, we performed the statistical analysis with an R script, available at the following web page: nin.crg.es/cgi-bin/pMargeWeb.cgi.

First, we identified the samples corresponding to each stage and created three files: Ta-T1, T1-T2 and Ta-T2. For each comparison we obtained p-values; a lower p-value indicates stronger statistical evidence that a gene is differentially expressed. We selected the genes with a p-value lower than 0.001 (p < 0.001) using the following spreadsheet filter:

Column selection > Data > Filter > AutoFilter > Custom AutoFilter: “less than 0.001”.
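The same selection can also be scripted instead of done through the spreadsheet menus. The following is a minimal awk sketch, not the procedure we actually used; the file layout (tab-delimited, one header line, p-value in column 3) and the function name filter_significant are hypothetical assumptions, not details of the pMarge output.

```shell
#!/bin/sh
# Minimal sketch of the p < 0.001 filter done from the command line.
# Hypothetical assumptions: tab-delimited input, one header line,
# p-values in column 3 unless another column number is given.
filter_significant() {
  # $1 = input file, $2 = p-value column (default 3)
  awk -F'\t' -v col="${2:-3}" '
    NR == 1 { print; next }                     # always keep the header line
    $col ~ /^[0-9.eE+-]+$/ && $col + 0 < 0.001  # numeric p-values below 0.001 only
  ' "$1"
}
```

For example, `filter_significant Ta-T1.txt > Ta-T1_p0001.txt` (hypothetical file names) keeps the header and the significant rows; non-numeric entries such as "NA" are discarded rather than treated as zero.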

After this process, we obtained 138 differentially expressed genes in the Ta-T1 progression, 9 in T1-T2 and 103 in Ta-T2. These genes were used in all subsequent analyses.

Retrieving gene symbols. We obtained gene symbols using the EASE program. These symbols were essential for the later analysis of the results.

EASE is available on the DAVID website (Database for Annotation, Visualization and Integrated Discovery). We entered the gene identifiers in “Input Genes” and clicked “Annotate Genes”. Besides the gene symbols, we also obtained information about gene function and chromosomal location.

Clustering. Clustering is one of the most common methods for analyzing data obtained with biochips. It displays the information as a cluster tree, which is very useful because the results can be inspected graphically. There is more than one clustering method; here we show the classification and describe the different types:

In our project, we used hierarchical clustering. We first went to GEPAS (Gene Expression Pattern Analysis Suite) and followed this menu path:

Tools > Preprocessing DNA array data files

We selected “Log-transform base 2” (it is important to replace negative values with 10, because the logarithm of a negative number is undefined) and “Standardize patterns”. This process normalizes our values, which is essential for obtaining reliable clustering results.
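To make the two preprocessing options concrete, here is a minimal awk sketch of what they do to each gene's row. It is not GEPAS code: the matrix layout, the replacement of every value ≤ 0 (not only negative ones) by 10, the use of the sample standard deviation and the function name log2_standardize are our assumptions.

```shell
#!/bin/sh
# Minimal sketch of "Log-transform base 2" followed by "Standardize patterns".
# Hypothetical assumptions: tab-delimited matrix, one header line,
# gene identifiers in column 1 and one sample per remaining column.
log2_standardize() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print; next }                     # keep the header unchanged
    {
      n = NF - 1; sum = 0
      for (i = 2; i <= NF; i++) {
        v = ($i + 0 <= 0) ? 10 : $i + 0         # values <= 0 replaced by 10 (assumption)
        x[i] = log(v) / log(2)                  # log base 2
        sum += x[i]
      }
      mean = sum / n; ss = 0
      for (i = 2; i <= NF; i++) ss += (x[i] - mean) ^ 2
      sd = (n > 1) ? sqrt(ss / (n - 1)) : 0     # sample standard deviation of the row
      printf "%s", $1
      for (i = 2; i <= NF; i++)                 # centre and scale: mean 0, SD 1 per gene
        printf "%s%.4f", OFS, (sd > 1e-12 ? (x[i] - mean) / sd : 0)
      printf "\n"
    }' "$1"
}
```

Standardizing each gene separately puts all profiles on the same scale, so the later clustering groups genes by the shape of their expression pattern rather than by their absolute intensity. Rows with constant values are written as all zeros.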

Afterwards, we sent the preprocessed data to the clustering module and selected UPGMA with correlation distance.
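For reference, these two choices can be written as follows. This assumes the common definition of correlation distance as one minus the Pearson correlation between two gene profiles $\mathbf{x}$ and $\mathbf{y}$ measured over $n$ samples; GEPAS may use a variant, so treat these as the standard textbook forms rather than the tool's exact implementation.

```latex
r(\mathbf{x},\mathbf{y}) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
  {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}},
\qquad
d(\mathbf{x},\mathbf{y}) = 1 - r(\mathbf{x},\mathbf{y})

% UPGMA (average linkage): distance between clusters A and B
d(A,B) = \frac{1}{|A|\,|B|}\sum_{a \in A}\sum_{b \in B} d(a,b)

% after merging A and B, distance of the new cluster to any other cluster C
d(A \cup B,\, C) = \frac{|A|\,d(A,C) + |B|\,d(B,C)}{|A| + |B|}
```

At each step UPGMA joins the two clusters with the smallest average pairwise distance, so genes whose profiles rise and fall together across samples (high $r$) end up close together in the tree.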

Gene function. To find information about the function of the genes and the biological processes in which they are involved, we used two sources:

GOCharts and FatiGO provided information about gene function, and we also used the NCBI website to find more specific information about individual genes.

Representative mRNA accession. To compare and analyze the results, we needed the same gene nomenclature in all datasets. For this, we used the SOURCE program.

Comparison of results.

In the previous step, we obtained files containing two columns: one with the gene symbols and the other with the corresponding representative mRNA accession for each gene. To compare our results with those of the article, we needed files containing just one of these columns, so we extracted the second column with the following Unix shell commands (Sanchez_NM.txt contains the data from the Sanchez-Carbayo publication):

cut -f2 Sanchez_NM.txt
cut -f2 TaT1_NM.txt
cut -f2 T1T2_NM.txt
cut -f2 TaT2_NM.txt

Then we sorted every gene list with sort and removed repeated genes with uniq to avoid errors, saving each result to a new file:

cut -f2 Sanchez_NM.txt | sort | uniq > Sanchez.txt
cut -f2 TaT1_NM.txt | sort | uniq > TaT1NM.txt
cut -f2 T1T2_NM.txt | sort | uniq > T1T2NM.txt
cut -f2 TaT2_NM.txt | sort | uniq > TaT2NM.txt

Finally, we had to identify the genes common to both studies, that is, to determine whether we had obtained the same representative genes as Sanchez-Carbayo or no common genes at all. Because each list was already free of repeats, any gene reported with a count of 2 by the following commands appears in both lists:

cat Sanchez.txt TaT1NM.txt | sort | uniq -c
cat Sanchez.txt T1T2NM.txt | sort | uniq -c
cat Sanchez.txt TaT2NM.txt | sort | uniq -c
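An alternative that prints only the shared identifiers, instead of counts that must be scanned by eye, is the standard comm utility. This is a minimal sketch; the function name common_genes is hypothetical, and the lists are re-sorted with a fixed collation (LC_ALL=C) because comm requires both inputs to be sorted in the same order.

```shell
#!/bin/sh
# Minimal sketch: print the identifiers present in both lists.
# common_genes is a hypothetical helper name, not part of any tool.
common_genes() {
  # $1, $2 = files with one identifier per line (e.g. Sanchez.txt, TaT1NM.txt)
  a=$(mktemp) && b=$(mktemp) || return 1
  LC_ALL=C sort -u "$1" > "$a"    # sorted, duplicates removed
  LC_ALL=C sort -u "$2" > "$b"
  LC_ALL=C comm -12 "$a" "$b"     # -1 and -2 hide lines unique to either file
  rm -f "$a" "$b"
}
```

Running `common_genes Sanchez.txt TaT1NM.txt` prints the same genes that appear with a count of 2 in the `uniq -c` output; an empty result means the two lists share no genes.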






Results and Discussion

We now discuss the results reported in the two articles and those we obtained.

Gene Discovery in Bladder Cancer Progression using cDNA Microarrays.
According to the Sanchez-Carbayo article, the significantly overexpressed genes are related to:

The authors also observe marked repression of genes involved in:

Other pathways also show differential expression patterns: checkpoint regulation (for example Cdc16, which is necessary for mitotic spindle formation) and apoptosis.

Identifying distinct classes of bladder carcinoma using microarrays.

Comparing gene expression across the different stages, the authors conclude that:


Our results.

Here we show the clusters obtained from the genes with a p-value lower than 0.001:


Cluster: Ta-T1 progression

Cluster: T1-T2 progression

Cluster: Ta-T2 progression

As described in Materials and Methods, we used shell commands to look for genes that were differentially expressed both in the Sanchez-Carbayo article and in the database of the other article we analyzed.

Unfortunately, we did not find any genes in common. We therefore focused on analyzing the genes that were most significant according to the clusters. Below we show the selected genes, their chromosomal location and their functions.

Genes differentially expressed during Ta-T1 progression:

Overexpressed genes
Gene symbol | Chromosome | Function
ETV6 | 12 | Member of the ETS family of transcription factors.
NPM1 | 5 | RNA-binding nucleolar phosphoprotein.
DDX3X | X | DEAD box protein family. Involved in translation initiation, mitochondrial and nuclear splicing, and ribosome complex formation.
RORC | 1 | Immune system: lymphoid embryogenesis.
TNR | 1 | Embryonic development.


Repressed genes
Gene symbol | Chromosome | Function
DSC2 | 18 | Cadherin family member, important in cell-cell junctions.
VTN | 17 | Involved in cell adhesion.
NR4A3 | 9 | Transcription activator.
MEST | 7 | Development.



Genes differentially expressed during Ta-T2 progression:

Overexpressed genes
Gene symbol | Chromosome | Function
LAMA2 | 6 | Extracellular protein that regulates adhesion, migration and cell organization during embryonic development.
VTN | 17 | Cell adhesion.
MEST | 7 | Development.
APOA1 | 11 | Lipid metabolism. Enhances cholesterol efflux from tissues to the liver.


Repressed genes
Gene symbol | Chromosome | Function
CCR6 | 6 | Immune system: chemokine receptor in T cells and dendritic cells.
AIF1 | 6 | Immune system: induced by cytokines and interferon; involved in anti-inflammatory responses.
TAF10 | 11 | Transcription initiation.
JAG1 | 20 | Development. Ligand of the Notch1 receptor.




As these tables show, the significant genes (p < 0.001) have functions similar to those of the representative genes in both articles. They include genes typically related to tumour processes, for instance genes involved in cell adhesion and genes associated with reduced immune system efficacy. We also observe changes in lipid metabolism and deregulation of transcription, which can lead to defects in cell cycle regulation.

Below we show the FatiGO and GOCharts results. They confirm the observations above and also highlight the importance of programmed cell death and oncogenesis in cancer. We conclude that our functional results for the differentially expressed genes agree with those of both articles.


FatiGO Ta-T1

FatiGO T1-T2

FatiGO Ta-T2

GOCharts Ta-T1

GOCharts T1-T2

GOCharts Ta-T2




Conclusions




This project has been very interesting for us because it showed us the importance of computational tools in biology, especially for analyzing experimental results.

cDNA microarray technology makes it possible to identify molecular markers. These markers have enormous clinical potential for diagnosis and therapy. Moreover, changes in gene expression profiles during cancer progression reveal which cellular processes are involved in the disease.

Finally, we conclude that scientific progress is the key to discovering more efficient and specific alternatives to current treatments.

