Using Internet Databases to Collect Information on Bladder Cancer
Riham Soliman
Research assistant
Bioinformatics group
Objectives of talk
1. Outline the importance of web-based public databases in the medical field
2. Necessity of having a biomedical research portal containing information collected from experiments on Egyptian samples
3. Outline our research goals
4. Explain our research and its benefits to the Egyptian community
Introduction
The basics
Cytoplasm
Nucleus
DNA double-helix
From: iGenetics CD-ROM (Animation Chapter 1: Genetics: An Introduction)
Molecular genetics
Nucleotides are molecules constituting the DNA double-helix
All our traits are encoded in DNA
Genes are specific sequences of nucleotides that characterize our traits passed on from parents
Modified from: iGenetics CD-ROM (Animation Chapter 2: DNA as Genetic Material: The Hershey-Chase Experiment)
[Figure: DNA double helix with complementary base pairs (C-G, A-T); 3 billion nucleotides]
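The base-pairing rule on this slide (C pairs with G, A pairs with T) can be illustrated with a minimal sketch. The function names are illustrative only and are not part of the original talk.

```python
# Watson-Crick pairing: C pairs with G, A pairs with T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("CGAT"))          # GCTA
    print(reverse_complement("CGAT"))  # ATCG
```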
Gene expression
How is DNA transformed into functional output for the cell, and consequently organism, survival?
Central dogma: DNA → (transcription) → RNA → (translation) → protein
Gene expression analysis can be performed at the RNA level (transcriptome) or at the protein level (proteome)
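As a rough illustration of the two steps of the central dogma, the sketch below transcribes a coding-strand DNA sequence into mRNA and translates it with the standard genetic code. It is a teaching sketch under simplifying assumptions (reading starts at position 0, no splicing, translation stops at the first stop codon); the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids (one-letter codes) for the 64 codons
# enumerated in T, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA -> mRNA: same sequence with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna)             # AUGGCCUAA
    print(translate(mrna))  # MA
```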
Genetic mutations
Changes in the genetic sequence
Required for genetic diversity among individuals
Disease-causing mutations: deletions, insertions, duplications
http://www.genome.gov//Pages/Hyperion/DIR/VIP/Glossary/Illustration/mutation.cfm
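The three mutation types named on this slide can be pictured as simple string operations on a sequence. This is only a toy sketch; the sequence and positions are made up for illustration.

```python
def delete(seq: str, start: int, length: int) -> str:
    """Deletion: remove `length` bases beginning at index `start`."""
    return seq[:start] + seq[start + length:]


def insert(seq: str, pos: int, fragment: str) -> str:
    """Insertion: add `fragment` before index `pos`."""
    return seq[:pos] + fragment + seq[pos:]


def duplicate(seq: str, start: int, length: int) -> str:
    """Duplication: repeat a segment immediately after itself."""
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]


if __name__ == "__main__":
    original = "ATGCCGTA"  # made-up example sequence
    print(delete(original, 3, 2))     # ATGGTA
    print(insert(original, 3, "TT"))  # ATGTTCCGTA
    print(duplicate(original, 3, 2))  # ATGCCCCGTA
```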
What is cancer?
Normally, cells grow and divide until the organism has completed development
Some cells retain the ability to grow and divide long after development has ended: carcinogenesis
Uncontrolled cell division arises
The cell prioritizes making more copies of itself over undergoing properly regulated division
Cancer-causing mutations
Tumor suppressor genes (TSG): mutations might cause under-expression of TSG
Proto-oncogenes: mutations cause them to become over-expressed and oncogenic (cancer-causing)
Carcinogenesis is a multi-step process: a single mutation is not enough; accumulation of more than one mutation is necessary
Mutagenesis: multi-step
http://www.cancervic.org.au/about-cancer/what_is_cancer
Bioinformatics: a history
An interdisciplinary field combining medicine, biology, computer science and mathematics; serves the biological and medical community; based on computational power
Dates back to the 1960s, building on the discovery of the DNA double helix and of genes, which contain the information guiding the building of all cellular components
Human Genome Project: completed in 2003; sequencing of the entire human genome
Today: the challenge of amalgamating large amounts of data from biomedical research (genetic research, molecular research)
Databases and information stored within them
Why are databases necessary?
Data provided is tailored to scientist’s requirement
Offers a variety of information on genes, RNA, proteins, diagrams, images, etc.
Databases foster collaborations between scientists: improved research, data sharing, interoperability
Ease-of-access to stored data
Considers the fact that molecular scientists might not be computer proficient
Information provided on databases
Literature: NCBI (National Center for Biotechnology Information)
General databases: Google search, Google Scholar
Academic databases: EBSCOhost
Sequence data
Protein: sequence, 3D structure
Level of expression: different experimental conditions comparable to the physiological environment; time-course experimentation
Protein-protein and protein-DNA interactions
KEGG: Kyoto Encyclopedia of Genes and Genomes
Cytoplasm
Nucleus
Nuclear membrane
KEGG: bladder cancer
MAPK pathway from KEGG
WikiPathways
MAPK pathway on WikiPathways: downloaded using GenMAPP
GenMAPP is an open-source bioinformatics application to visualize metabolic pathways
BioCarta: MAPK pathway
Data extraction from NCBI
National Center for Biotechnology Information.
Run and maintained by collaborative efforts of computer scientists, molecular biologists, biochemists, research physicians and structural biologists.
Provides information on diseases, genes, gene sequences, gene transcripts, proteins, protein interactions, function, additional resources.
Types of services offered by NCBI
PubMed: literature search service of the National Library of Medicine. Access to over 16 million citations linked to participating online journals. Fast, efficient, easy to use.
BLAST (Basic Local Alignment Search Tool): the most famous tool on NCBI. Used for pair-wise sequence comparison, identification of novel sequences and/or determining their properties.
Entrez: one of the most popular search engines in NCBI. The search query can be the name of a gene or protein (if different), or the accession number for the gene, RNA or protein. Produces a plethora of relevant information.
OMIM (Online Mendelian Inheritance in Man): used mostly by physicians and medical investigators interested in genetic disorders
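To give a feel for what pair-wise local sequence comparison means, the sketch below computes a Smith-Waterman local alignment score for two short sequences. This is not how BLAST is implemented (BLAST uses faster heuristics); it only illustrates the underlying idea of local alignment. The scoring values are assumptions chosen for the example.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            score[i][j] = max(
                0,                      # start a new local alignment
                diagonal,               # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
            best = max(best, score[i][j])
    return best


if __name__ == "__main__":
    print(local_alignment_score("ACGT", "ACGT"))         # 8
    print(local_alignment_score("TTACGTTT", "GGACGGG"))  # 6
```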
Cancer-specific databases
caBIG
Is an information network connecting the cancer research community
Cancer Biomedical Informatics Grid
Provided by the National Cancer Institute (NCI) in the USA
Integrative cancer research extending from bench to bedside and back again
Accelerate discovery of new detection, diagnostic and treatment techniques to improve outcome
Shares information on clinical research, imaging, pathology and molecular biology
caBIG services and resources
Domain workspaces constitute areas of interest to the cancer-researching and medical community
1. Integrative cancer research (ICR) workspaces
2. Clinical trial management systems
3. In vivo imaging workspace
4. Tissue banks and pathology tools workspace
caBIG Tools
1. Bioconductor: established open-source collection of software packages for high-throughput genome analysis
2. caArray: open-source, web and programmatically accessible array data management system
3. caIMAGE: database of cancer images
4. caMATCH: system that identifies patients who are potentially eligible for clinical trials
Profiling of bladder cancer data from public databases
Objectives of research
1. Collecting information on genes involved in bladder cancer.
2. Assembling an interaction network for these genes.
3. Identifying biomarkers
4. Collecting expression level data, e.g., microarray data.
5. Automatic management, processing, visualization of this data.
[Bar chart: bladder cancer incidence rate per 100,000 population (axis 0 to 40) for males and females, by world region, with Egypt shown separately]
Figure 1.3: Age-standardised (World) incidence rates for bladder cancer, by sex, world regions, 2002 estimates
Source: http://info.cancerresearchuk.org/cancerstats/types/bladder/incidence/
Bladder cancer stages
From: OXFORD,G.A.R.Y. and THEODORESCU,D.A.N. Review Article: The Role of Ras Superfamily Proteins in Bladder Cancer Progression, The Journal of Urology, 170: 1987-1993, 2003.
From: http://cornellurology.com/bladder/gi/types.shtml
Carcinoma in situ
Transitional cell carcinoma
Metastatic transitional cell carcinoma
Squamous cell carcinoma
Invasive (high grade)
Superficial (low grade)
Bladder cancer types
Aetiology of bladder cancer in Egypt
Cigarette smoking (3-7 fold risk) (Samanic et al. 2006)
Aromatic amines Occupational hazard
Schistosomiasis (Michaud, 2007): bathing in infested waters; working in fields; SCC was more common than TCC during times of high schistosomiasis prevalence
Genes involved in bladder cancer
To identify genes involved in bladder carcinogenesis and progression, internet research was performed to gather information about these genes.
Sources
Publicly available databases e.g.
NCBI: www.ncbi.nlm.nih.gov/
KEGG: http://www.genome.jp/
BioGRID: http://www.thebiogrid.org/
GeneOntology: http://amigo.geneontology.org/
Ensembl: www.ensembl.org/
Literature search using PubMed (NCBI) and Google.
Data collection
Genes were collected using Boolean queries, e.g., “Bladder cancer, name of gene”.
We identified 261 genes related to bladder cancer
Data was summarized in a list containing gene information and interacting genes:
Gene name, NCBI accession number, URLs
Chromosome locus
Protein-protein interactions
Function in normal cell
Function in bladder cancer cell
Diagnostic/prognostic potential or use
Literature
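A minimal sketch of how such a collection step could be organized in code. It builds a Boolean PubMed search URL for the NCBI E-utilities `esearch` endpoint and defines a record holding the fields listed on this slide. The exact query form (`"bladder cancer" AND <gene>`), the class and the function names are assumptions for illustration; the talk does not say the searches were scripted.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (PubMed is selected with db=pubmed).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def boolean_query(disease: str, gene: str) -> str:
    """Assumed query form: the quoted disease phrase AND the gene name."""
    return f'"{disease}" AND {gene}'


def pubmed_search_url(disease: str, gene: str, retmax: int = 20) -> str:
    """Build (but do not send) a PubMed search request URL."""
    params = {"db": "pubmed", "term": boolean_query(disease, gene),
              "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


@dataclass
class GeneRecord:
    """One row of the gene list; field names mirror the slide (hypothetical)."""
    name: str
    accession: str = ""
    urls: list = field(default_factory=list)
    chromosome_locus: str = ""
    interactions: list = field(default_factory=list)
    function_normal_cell: str = ""
    function_cancer_cell: str = ""
    diagnostic_prognostic_use: str = ""
    literature: list = field(default_factory=list)


if __name__ == "__main__":
    print(pubmed_search_url("bladder cancer", "KRT16"))
    print(GeneRecord(name="KRT16"))
```

Keeping one record per gene makes it straightforward to add interaction partners later and to export the list for network assembly.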
Data annotation
Biomarker identification
The main aim in cancer research is to predict tumor behavior: early diagnosis; prevention of delayed treatment
We need to distinguish harmless early lesions from those that will progress into cancer.
Depends on good tests and tools.
Current diagnosis of bladder cancer: cystoscopy.
Research community is developing good biomarkers for this purpose. Biomarkers are molecules that could be targeted in therapy.
Marker                         Sensitivity %  Specificity %  Method of detection                      Manufacturer
NMP22                          47-87          58-91          Enzyme immunoanalysis                    Matritech
BTA STAT                       57-82          61-82          Antigen-antibody colorimetric            Bard Diagnostics
BTA TRAK                       55-80          38-98          Enzyme immunoanalysis                    Bard Diagnostics
FDP                            41-93          77-94          Antigen-antibody colorimetric            Intracel Corp
Telomerase                     53-91          46-99          Polymerase chain reaction                Oncor
Immunocyt                      86-95          79-90          Immunofluorescence immunoassay/cytology  Diagnocure
Quanticyt                      45-59          70-93          Morphometry                              Gentian Scientific Software
UBC                            59-79          84-96          Enzyme immunoanalysis                    IDL
CYFRA 21-1                     74-99          57-78          Electrochemiluminescence assay           Roche Diagnostics
BLCA4                          85-96          85-100         Enzyme immunoanalysis                    Eichrom Technologies
Hyaluronic acid/hyaluronidase  82-92          83-96          Enzyme immunoanalysis
Biomarkers in use
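For readers unfamiliar with the two columns in the biomarker table, sensitivity is the fraction of true cancer cases a test detects, and specificity is the fraction of cancer-free cases it correctly clears. The sketch below shows the arithmetic; the patient counts are hypothetical and are not taken from any of the studies behind the table.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of diseased patients that the test flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of disease-free patients that the test reports as negative."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts: 50 patients with cancer, 100 without.
    print(f"{sensitivity(40, 10):.0%}")  # 80%
    print(f"{specificity(90, 10):.0%}")  # 90%
```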
NMP22
CD44
Hyaluronic acid
Hyaluronidase
NMP22GPSM2
Markers inserted onto KEGG’s bladder cancer network
Microarray technology
Measuring gene expression
cDNA microarray: custom-made
Oligonucleotide: ready-made
Gene expression analysis: Transcriptomics
Microarray technology: the study of mRNA levels in cells (the transcriptome)
Looks at the abundance of the transcript for thousands of genes (high throughput)
Revolutionized by the company Affymetrix
http://en.wikipedia.org/wiki/DNA_microarray
Affymetrix array
From : http://www.fastol.com/~renkwitz/microarray_chips.htm
Differential expression
Up regulation
Down regulation
Control
Cancer
Output of microarray
Raw image is usually a 16-bit TIFF file.
Microarray image processor converts color intensities into raw quantitative data (probe-level data)
No immediate observations can be made concerning gene expression from raw data
Statistical analysis applications are used to interrogate the data for information on gene expression patterns
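One common first look at differential expression is the log2 fold change between cancer and control intensities for a gene. The sketch below is a simplified illustration: it assumes the intensities are already normalized, and the two-fold cut-off (|log2 fold change| of at least 1) is an assumed threshold, not one stated in the talk. Real analyses also apply statistical tests across replicates.

```python
import math
from statistics import mean


def log2_fold_change(control: list, cancer: list) -> float:
    """log2 of the ratio of mean cancer intensity to mean control intensity."""
    return math.log2(mean(cancer) / mean(control))


def classify(lfc: float, threshold: float = 1.0) -> str:
    """Label a gene as up-regulated, down-regulated or unchanged."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical normalized probe intensities for one gene.
    lfc = log2_fold_change(control=[100, 100], cancer=[400, 400])
    print(lfc, classify(lfc))  # 2.0 up
```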
Raw data storage
Modes of data storage
As files
•Data is stored directly on the institution’s or lab’s computer
•Does not require special software
•Difficult to track and query the data if larger experiments are performed
In local databases
•Commercial or academic
•Allows local storage of data
•Good tracking and management of experimental data and integration with public microarray (MA) databases
•Requires purchase, installation and maintenance of complex software
Public and commercial microarray databases
PUBLIC
GEO (Gene Expression Omnibus), NCBI
ArrayExpress (EBI-EMBL)
caBIG
SMD (Stanford Microarray Database)
Yale Microarray Database
RED (Rice Expression Database)
Oncomine
COMMERCIAL
Oncomine, Array Informatics, Limas, GeNet (Russian website)
OTHER
CleanEx (SIB)
GenMAPP
Our bladder cancer microarray data collection
Queried “Bladder cancer” using all public databases identified
Collected 14 data sets on bladder cancer from ArrayExpress, GEO and Oncomine
Based on literature, there are unpublished data sets
Gender
Disease state
Disease staging
Precomputational analyses
Some databases provide information from preliminary analysis on data.
Make data exploration much easier and quicker for the user.
Oncomine
ONCOMINE performs pre-computations on data to make data exploration much easier and quicker
Oncomine is made up of 3 layers
• Data input
• Data analysis
• Data visualization
Single-experiment analysis Multiple-experiment analysis
[Box plot legend: largest value, upper quartile, median, lower quartile, smallest value, outliers]
Single and multiple experiment analyses
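The quantities shown on a box plot can be computed as in the sketch below. The 1.5 × IQR rule used here to flag outliers is a common convention and an assumption on our part; the whisker and outlier definitions actually used by Oncomine may differ. The expression values are made up.

```python
from statistics import quantiles


def box_plot_summary(values: list) -> dict:
    """Quartiles, whisker ends and outliers using the 1.5 x IQR convention."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "smallest": inside[0],    # lowest value that is not an outlier
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "largest": inside[-1],    # highest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Made-up expression values for one gene across nine samples.
    print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```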
SIB (Swiss Institute of Bioinformatics)
Research groups based in different European countries.
The main goal is to provide a bioinformatics platform conglomerating as well as analyzing different data sets
CleanEx microarray database: data is analyzed within their portal for easier access and interpretation
CleanEx
•Provided through the Swiss Institute of Bioinformatics (SIB)
•Service similar to ONCOMINE but gathers data sets only from GEO
•Does not allow profile visualization
Collecting information on bladder cancer in Egypt specifically
Published article on bladder cancer in Egypt
Ewis et al. (2007) studied bilharzia-associated SCC (squamous cell carcinoma)
Analysis performed using microarrays on 17 patients diagnosed at the Egyptian National Cancer Institute.
RESULT: showed a change in expression (differential expression) in 82 genes: 38 genes up-regulated, 44 genes down-regulated
Our own data analysis on Ewis et al. data
1. Annotated the information gathered on each of the 82 genes
2. Compared the expression pattern of each gene with other data sets from free public databases
3. Identified 7 genes from the Ewis study whose direction of expression was opposite to that in all other data sets collected
4. Identified 3 genes from the Ewis study whose expression agreed with other studies from the databases
5. Gathered more detailed information on the 7 genes:
Where do they lie in our KEGG pathway network?
How vital are they to cell function?
Do the Ewis data make sense (based on the known function)?
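The comparison in steps 2 to 4 can be sketched as follows: for each gene, the direction of regulation in a reference study is checked against every other data set that reports the gene. This is an illustrative reconstruction, not the procedure actually used; gene names and directions below are placeholders, not results from Ewis et al. or any database.

```python
def compare_direction(reference: dict, others: list) -> tuple:
    """Split reference genes into those opposed to all other data sets
    and those agreeing with all other data sets.

    `reference` maps gene -> "up" or "down"; `others` is a list of such maps.
    Genes absent from every other data set are skipped.
    """
    opposite, concordant = [], []
    for gene, ref_direction in reference.items():
        seen = [dataset[gene] for dataset in others if gene in dataset]
        if not seen:
            continue
        if all(direction != ref_direction for direction in seen):
            opposite.append(gene)
        elif all(direction == ref_direction for direction in seen):
            concordant.append(gene)
    return opposite, concordant


if __name__ == "__main__":
    # Placeholder data only.
    reference = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up", "GENE_D": "up"}
    others = [
        {"GENE_A": "down", "GENE_B": "down", "GENE_C": "up"},
        {"GENE_A": "down", "GENE_B": "up", "GENE_C": "up"},
    ]
    print(compare_direction(reference, others))  # (['GENE_A'], ['GENE_C'])
```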
Discrepancies found in results: Keratin 16
KRT16
KRT7
TGFBR
TGFβ
SMAD2/3 SMAD4
ACVR1B
JNK
KEGG BC pathway with all significant markers for research
Not much data provided on the remaining proteins
WE NEED TO UNDERSTAND THEIR FUNCTION
Modified from the KEGG database
CONCLUSION
Follow up of Ewis et al. study
PROS
Offers good preliminary information on bilharzia-associated bladder cancer in the Egyptian population
CONS
Several mistakes detected in annotation
Pooled samples
Only SCC studied
Does not explain the present discrepancies in the results, e.g. Keratin 16
FOLLOW-UP STUDY IS NECESSARY TO UNDERSTAND DISCREPANCIES AND GENETIC DIFFERENCES BETWEEN WESTERN AND EGYPTIAN PATIENTS
Problems with data collection
1. Information in databases is expanding as more research is carried out.
2. No single public database has a complete representation of all molecules, so looking through several databases is time-consuming.
3. There is no bladder cancer-specific database.
4. Automated methods are needed to update the data.
Long-term objectives of our study
1. Determine the genetic and molecular profile of Egyptian bladder cancer patients, based on histology and on bilharzial status
2. Identify biomarkers to use as drug targets in a clinical setting
3. Improve treatment modalities, tailored to the Egyptian profile
Thank you