diseases


Upload: adolph

Post on 13-Jan-2016


TRANSCRIPT

Page 1: Diseases

[Figure: repeated Diseases, Anatomy, Genes and Physiology panels converging, via Medical Informatics and Bioinformatics, on "Novel relationships & Deeper insights"]

Page 2: Diseases

Integrative Genomics for Understanding Disease Process

Anil Jegga

Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center (CCHMC)

Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio

Page 3: Diseases

Acknowledgement

• Jing Chen
• Mrunal Deshmukh
• Sivakumar Gowrisankar
• Chandra Gudivada
• Arvind Muthukrishnan
• Bruce J Aronow

Page 4: Diseases

Two Separate Worlds….. With Some Data Exchange…

Medical Informatics: Patient Records, Disease Database, PubMed, Clinical Trials, OMIM Clinical Synopsis

Disease Database → Name → Synonyms → Related/Similar Diseases → Subtypes → Etiology → Predisposing Causes → Pathogenesis → Molecular Basis → Population Genetics → Clinical findings → System(s) involved → Lesions → Diagnosis → Prognosis → Treatment → Clinical Trials……

Bioinformatics: Genome, Transcriptome, Proteome, Interactome, Metabolome, Physiome, Regulome, Variome, Pathome, Pharmacogenome

Disease World

354 “omes” so far……… and there is “UNKNOME” too: genes with no known function

http://omics.org/index.php/Alphabetically_ordered_list_of_omics (as of October 15, 2006)

Page 5: Diseases

Motivation

To correlate diseases with the anatomical parts affected, the genes/proteins involved, and the underlying physiological processes (interactions, pathways, processes). In other words, to bring the disciplines of Medical Informatics (MI) and Bioinformatics (BI) together as Biomedical Informatics (BMI) in support of personalized or “tailor-made” medicine.

How can multiple types of genome-scale data be integrated across experiments and phenotypes in order to find genes associated with diseases?

Page 6: Diseases

Model Organism Databases: Common Issues

• Heterogeneous Data Sets - Data Integration
  – From Genotype to Phenotype
  – Experimental and Consensus Views

• Incorporation of Large Datasets
  – Whole genome annotation pipelines
  – Large scale mutagenesis/variation projects (dbSNP)

• Computational vs. Literature-based Data Collection and Evaluation (MedLine)

• Data Mining
  – extraction of new knowledge
  – testable hypotheses (Hypothesis Generation)

Page 7: Diseases

Support Complex Queries

• Get me all genes involved in brain development that are expressed in the Central Nervous System.

• Get me all genes involved in brain development in human and mouse that also show iron ion binding activity.

• For this set of genes, what aspects of function and/or cellular localization do they share?

• For this set of genes, what mutations are reported to cause pathological conditions?
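
The queries above can be sketched against a small in-memory annotation table. This is only an illustration: the gene symbols, annotations and the `query` and `shared_functions` helpers are hypothetical and not taken from any real database.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    symbol: str
    species: str
    processes: set = field(default_factory=set)  # biological process terms
    functions: set = field(default_factory=set)  # molecular function terms
    tissues: set = field(default_factory=set)    # tissues with reported expression


# Hypothetical records for illustration only; not real annotations.
GENES = [
    GeneRecord("GENE_A", "human", {"brain development"}, {"iron ion binding"},
               {"central nervous system"}),
    GeneRecord("GENE_A", "mouse", {"brain development"}, {"iron ion binding"},
               {"liver"}),
    GeneRecord("GENE_B", "human", {"brain development"}, {"DNA binding"},
               {"central nervous system"}),
    GeneRecord("GENE_C", "human", {"cell cycle"}, {"iron ion binding"},
               {"central nervous system"}),
]


def query(records, process=None, function=None, tissue=None, species=None):
    """Return the records that satisfy every criterion that is not None."""
    hits = []
    for rec in records:
        if process is not None and process not in rec.processes:
            continue
        if function is not None and function not in rec.functions:
            continue
        if tissue is not None and tissue not in rec.tissues:
            continue
        if species is not None and rec.species not in species:
            continue
        hits.append(rec)
    return hits


def shared_functions(records):
    """Molecular function terms common to all records in the set."""
    sets = [rec.functions for rec in records]
    return set.intersection(*sets) if sets else set()


# Query 1: brain development genes expressed in the central nervous system.
cns_brain = query(GENES, process="brain development",
                  tissue="central nervous system")

# Query 2: brain development genes in human and mouse with iron ion binding.
iron_brain = query(GENES, process="brain development",
                   function="iron ion binding", species={"human", "mouse"})
```

The point of the sketch is that such questions become simple filters and set intersections once disease, anatomy and gene annotations share one vocabulary; the hard part is getting the data into that shape.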

Page 8: Diseases

Bioinformatic Data: 1978 to present

• DNA sequence
• Gene expression
• Protein expression
• Protein structure
• Genome mapping
• SNPs & Mutations
• Metabolic networks
• Regulatory networks
• Trait mapping
• Gene function analysis
• Scientific literature
• and others………..

Page 9: Diseases

Human Genome Project – Data Deluge

NCBI Human Genome Statistics (as of October 18, 2006)

Database name       Records
Nucleotide          11,512,792
Protein             313,099
Structure           8,490
Genome Sequences    51
Popset              20,801
SNP                 12,702,095
3D Domains          31,862
Domains             25
GEO Datasets        2,969
GEO Expressions     9,783,946
UniGene             86,804
UniSTS              322,092
PubMed Central      3,140
HomoloGene          20,123
Taxonomy            1

No. of Human Gene Records currently in NCBI: 31,507 (excluding pseudogenes, mitochondrial genes and obsolete records). Includes ~460 microRNAs.

Page 10: Diseases

The Gene Expression Data Deluge

Till 2000: 413 papers on microarray!

Year    PubMed Articles
2001    834
2002    1557
2003    2421
2004    3508
2005    4400
2006    4083+

Problems Deluge!

Allison DB, Cui X, Page GP, Sabripour M. 2006. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 7(1): 55-65.

Page 11: Diseases

Information Deluge…..

• 3 scientific journals in 1750

• Now: >120,000 scientific journals!

• >500,000 medical articles/year

• >4,000,000 scientific articles/year

• >16 million abstracts in PubMed derived from >32,500 journals

• >4.5 billion distinct web pages indexed by Google! Google search for integrative genomics: ~930,000 hits; for the quoted phrase “integrative genomics”: ~112,000 hits

A researcher would have to scan 130 different journals and read 27 papers per day to follow a single disease, such as breast cancer (Baasiri et al., 1999 Oncogene 18: 7958-7965).

Page 12: Diseases

Data-driven Problems…..

Gene Nomenclature

Gene names: Accelerin, Antiquitin, Bang Senseless, Bride of Sevenless, Christmas Factor, Cockeye, Crack, Draculin, Dickie’s small eye, Fidgetin, Gleeful, Knobhead, Lunatic Fringe, Mortalin, Orphanin, Profilactin, Sonic Hedgehog

Disease names: Mobius Syndrome with Poland’s Anomaly, Werner’s syndrome, Down’s syndrome, Angelman’s syndrome, Creutzfeldt-Jakob disease

What’s in a name! Rose is a rose is a rose is a rose!

1. Generally, the names refer to some feature of the mutant phenotype

2. Dickie’s small eye (Theiler et al., 1978, Anat Embryol (Berl), 155: 81-86) is now Pax6

3. Gleeful: "This gene encodes a C2H2 zinc finger transcription factor with high sequence similarity to vertebrate Gli proteins, so we have named the gene gleeful (Gfl)." (Furlong et al., 2001, Science 293: 1632)

The problems

• How to name or describe proteins, genes, drugs, diseases and conditions consistently and coherently?

• How to ascribe and name a function, process or location consistently?

• How to describe interactions, partners, reactions and complexes?

Some Solutions

• Develop/Use controlled or restricted vocabularies (IUPAC-like naming conventions, HGNC, MGI, UMLS, etc.)

• Create/Use thesauruses, central repositories or synonym lists (MeSH, UMLS, etc.)

• Work towards synoptic reporting and structured abstracting

Page 13: Diseases

Some more ambiguous examples……..

• The yeast homologue of the human gene PMS1, which codes for a DNA repair protein, is called PMS2; whereas yeast PMS1 corresponds to human PMS2!

• Even more confusing, 4,257 abbreviated names were used to refer to more than one gene. Top of the list was MT1, used to describe at least 11 members of a cluster of genes encoding small proteins that bind to metal ions (Nature: 411: 631-632).

and there are some weird ones too……..

• AR*E: aryl sulfatase E in all species

• f**K: fuculokinase gene in bacteria

Page 14: Diseases

Rose is a rose is a rose is a rose….. Not Really!

Image Sources: Somewhere from the internet…

What is a cell?

• any small compartment

• (biology) the basic structural and functional unit of all organisms; they may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals

• a device that delivers an electric current as a result of chemical reaction

• a small unit serving as part of or as the nucleus of a larger political movement

• cellular telephone: a hand-held mobile radiotelephone for use in an area divided into small sections, each with its own short-range transmitter/receiver

• small room in which a monk or nun lives

• a room where a prisoner is kept

Page 15: Diseases

Foundation Model Explorer

Semantic Groups, Types and Concepts:

• Semantic Group Biology – Semantic Type Cell

• Semantic Groups Object OR Devices – Semantic Types Manufactured Device or Electrical Device or Communication Device

• Semantic Group Organization – Semantic Type Political Group

Page 16: Diseases

The REAL Problems

Hepatocellular Carcinoma: CTNNB1, MET, TP53

CTNNB1 allelic variants:

1. COLORECTAL CANCER [3-BP DEL, SER45DEL]
2. COLORECTAL CANCER [SER33TYR]
3. PILOMATRICOMA, SOMATIC [SER33TYR]
4. HEPATOBLASTOMA, SOMATIC [THR41ALA]
5. DESMOID TUMOR, SOMATIC [THR41ALA]
6. PILOMATRICOMA, SOMATIC [ASP32GLY]
7. OVARIAN CARCINOMA, ENDOMETRIOID TYPE, SOMATIC [SER37CYS]
8. HEPATOCELLULAR CARCINOMA SOMATIC [SER45PHE]
9. HEPATOCELLULAR CARCINOMA SOMATIC [SER45PRO]
10. MEDULLOBLASTOMA, SOMATIC [SER33PHE]

TP53* allelic variants:

1. HEPATOCELLULAR CARCINOMA SOMATIC [ARG249SER]

Environmental Effects

* Aflatoxin B1, a mycotoxin, induces a very specific G-to-T mutation at codon 249 in the tumor suppressor gene p53.

Many disease states are complex because of many genes (alleles & ethnicity, gene families, etc.), environmental effects (life style, exposure, etc.) and their interactions.

Page 17: Diseases

The REAL Problems

HEPATOCELLULAR CARCINOMA

LIVER:
• Hepatocellular carcinoma
• Micronodular cirrhosis
• Subacute progressive viral hepatitis

NEOPLASIA:
• Primary liver cancer

CTNNB1 pathways:

1. ALK in cardiac myocytes
2. Cell to Cell Adhesion Signaling
3. Inactivation of Gsk3 by AKT causes accumulation of b-catenin in Alveolar Macrophages
4. Multi-step Regulation of Transcription by Pitx2
5. Presenilin action in Notch and Wnt signaling
6. Trefoil Factors Initiate Mucosal Healing
7. WNT Signaling Pathway

MET pathways:

1. CBL mediated ligand-induced downregulation of EGF receptors
2. Signaling of Hepatocyte Growth Factor Receptor

TP53 pathways:

1. Estrogen-responsive protein Efp controls cell cycle and breast tumors growth
2. ATM Signaling Pathway
3. BTG family proteins and cell cycle regulation
4. Cell Cycle
5. RB Tumor Suppressor/Checkpoint Signaling in response to DNA damage
6. Regulation of transcriptional activity by PML
7. Regulation of cell cycle progression by Plk3
8. Hypoxia and p53 in the Cardiovascular system
9. p53 Signaling Pathway
10. Apoptotic Signaling in Response to DNA Damage
11. Role of BRCA1, BRCA2 and ATR in Cancer Susceptibility

….Many More…..

Page 18: Diseases

Integrative Genomics - what is it?

Another buzzword or a meaningful concept useful for biomedical research?

Acquisition, Integration, Curation, and Analysis of biological data

Integrative Genomics: the study of complex interactions between genes, organism and environment, the triple helix of biology. Gene <–> Organism <-> Environment

It is definitely beyond the buzzword stage - Universities now have programs named 'Integrated Genomics.'

Hypothesis

Information is not knowledge - Albert Einstein

Page 19: Diseases

Methods for Integration

1. Link-driven federations
   • Explicit links between databanks.

2. Warehousing
   • Data is downloaded, filtered, integrated and stored in a warehouse. Answers to queries are taken from the warehouse.

3. Others….. Semantic Web, etc………

Page 20: Diseases

Link-driven Federations

1. Creates explicit links between databanks

2. Query: get interesting results and use web links to reach related data in other databanks

Examples: NCBI-Entrez, SRS
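
A minimal sketch of the link-following idea, assuming each record carries explicit cross-references to records in other databanks. The databank names, accessions and the `follow_links` helper are hypothetical; this is not how Entrez or SRS are implemented.

```python
from collections import deque

# Toy databanks keyed by accession. Every identifier here is hypothetical.
DATABANKS = {
    "gene": {"G1": {"name": "example gene",
                    "links": [("protein", "P1"), ("literature", "L1")]}},
    "protein": {"P1": {"name": "example protein",
                       "links": [("structure", "S1")]}},
    "literature": {"L1": {"name": "example article", "links": []}},
    "structure": {"S1": {"name": "example structure", "links": []}},
}


def follow_links(databanks, start_db, start_acc, max_hops=2):
    """Breadth-first walk over explicit cross-references.

    Returns (databank, accession, hops) for each record reachable from the
    start record within max_hops links, in the order it was first reached.
    """
    seen = {(start_db, start_acc)}
    queue = deque([(start_db, start_acc, 0)])
    reached = []
    while queue:
        db, acc, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for target in databanks[db][acc]["links"]:
            if target not in seen:
                seen.add(target)
                reached.append((target[0], target[1], hops + 1))
                queue.append((target[0], target[1], hops + 1))
    return reached


one_hop = follow_links(DATABANKS, "gene", "G1", max_hops=1)
two_hops = follow_links(DATABANKS, "gene", "G1", max_hops=2)
```

The user still has to know which link to follow and what the target record means, which is why the terminology problem remains unsolved in this model.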

Page 21: Diseases

http://www.ncbi.nlm.nih.gov/Database/datamodel/

Page 22: Diseases

http://www.ncbi.nlm.nih.gov/Database/datamodel/

Page 23: Diseases

http://www.ncbi.nlm.nih.gov/Database/datamodel/

Page 24: Diseases

http://www.ncbi.nlm.nih.gov/Database/datamodel/

Page 25: Diseases

http://www.ncbi.nlm.nih.gov/Database/datamodel/

Page 26: Diseases

Querying Entrez-Gene

Page 27: Diseases
Page 28: Diseases

No. of Records

Database name    Query = p53    Query = TP53 (HGNC)    Query = p53 OR TP53
PubMed           37,962         1928                   38,512
PMC              9647           373                    9738
Book             710            332                    744
Nucleotide       7062           1603                   8442
Protein          3882           314                    3970
Genome           12             0                      12
OMIM             317            79                     744
SNP              14,277         1513                   14,779
Gene             1058           258                    1115
Homologene       723            31                     735
GEO Profiles     68,000         10,539                 70,718
Cancer Chr       292            129                    421
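
One way to read the table above: if the OR query returns the union of the two result sets, then the number of records found by both names follows from inclusion-exclusion, overlap = n(p53) + n(TP53) - n(p53 OR TP53). The sketch below applies that assumption to the counts as transcribed and flags rows where the three numbers cannot all be right.

```python
# (p53, TP53, p53 OR TP53) record counts, copied from the table above.
COUNTS = {
    "PubMed": (37962, 1928, 38512),
    "PMC": (9647, 373, 9738),
    "Book": (710, 332, 744),
    "Nucleotide": (7062, 1603, 8442),
    "Protein": (3882, 314, 3970),
    "Genome": (12, 0, 12),
    "OMIM": (317, 79, 744),
    "SNP": (14277, 1513, 14779),
    "Gene": (1058, 258, 1115),
    "Homologene": (723, 31, 735),
    "GEO Profiles": (68000, 10539, 70718),
    "Cancer Chr": (292, 129, 421),
}


def overlap(p53, tp53, either):
    """Records matched by both terms, assuming `either` is the set union."""
    return p53 + tp53 - either


overlaps = {db: overlap(*row) for db, row in COUNTS.items()}

# A negative overlap means the union is larger than the two sets combined,
# so at least one of the three transcribed numbers must be off.
inconsistent = [db for db, n in overlaps.items() if n < 0]

for db, n in overlaps.items():
    flag = "  <-- inconsistent counts" if n < 0 else ""
    print(f"{db:13s} found under both names: {n}{flag}")
```

Under this reading most records are reachable by only one of the two names, which is the terminology problem in miniature. The OMIM row comes out negative, suggesting a transcription error somewhere in that row.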

Page 29: Diseases

Link-driven Federations

1. Advantages
   • complex queries
   • fast

2. Disadvantages
   • require good knowledge
   • syntax based
   • terminology problem not solved

Page 30: Diseases

Data Warehousing

Data is downloaded, filtered, integrated and stored in a warehouse. Answers to queries are taken from the warehouse.

Advantages

1. Good for very specific, task-based queries and studies.

2. Since it is custom-built and usually expert-curated, relatively less error-prone.

Disadvantages

1. Can quickly become outdated; needs constant updates.

2. Limited functionality, e.g., one disease-based or one system-based.
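
The download, filter, integrate, store and query steps can be sketched with an in-memory SQLite database. The two "source extracts", the table layout and the function names are assumptions made for illustration; a real warehouse would load curated dumps from the source databanks.

```python
import sqlite3

# Hypothetical "downloaded" extracts from two sources; placeholder values.
GENE_SOURCE = [("GENE_A", "human"), ("GENE_B", "human"), ("GENE_X", "yeast")]
DISEASE_SOURCE = [("GENE_A", "disease 1"), ("GENE_B", "disease 1"),
                  ("GENE_B", "disease 2"), ("GENE_X", "disease 1")]


def build_warehouse(species="human"):
    """Filter the extracts to one species and store them in one local schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, species TEXT)")
    conn.execute("CREATE TABLE gene_disease (symbol TEXT, disease TEXT)")

    # Filter step: keep only genes from the species of interest.
    kept = [row for row in GENE_SOURCE if row[1] == species]
    kept_symbols = {symbol for symbol, _ in kept}

    # Integrate step: load both sources, dropping links to filtered-out genes.
    conn.executemany("INSERT INTO gene VALUES (?, ?)", kept)
    conn.executemany(
        "INSERT INTO gene_disease VALUES (?, ?)",
        [row for row in DISEASE_SOURCE if row[0] in kept_symbols],
    )
    conn.commit()
    return conn


def genes_for_disease(conn, disease):
    """Answer a query from the warehouse only, never from the live sources."""
    rows = conn.execute(
        "SELECT g.symbol FROM gene g "
        "JOIN gene_disease d ON g.symbol = d.symbol "
        "WHERE d.disease = ? ORDER BY g.symbol",
        (disease,),
    )
    return [row[0] for row in rows]


warehouse = build_warehouse()
disease_1_genes = genes_for_disease(warehouse, "disease 1")
```

Because answers come from the local copy, the warehouse is fast and consistent but only as current as its last load, which is the "quickly outdated" disadvantage listed above.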

Page 31: Diseases

No Integrative Genomics is Complete without Ontologies

• Gene Ontology (GO)

• Unified Medical Language System (UMLS)

Gene World Biomedical World

Page 32: Diseases

The 3 Gene Ontologies

• Molecular Function = elemental activity/task
  – the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
  – What a product ‘does’, precise activity

• Biological Process = biological goal or objective
  – broad biological goals, such as DNA repair or purine metabolism, that are accomplished by ordered assemblies of molecular functions
  – Biological objective, accomplished via one or more ordered assemblies of functions

• Cellular Component = location or complex
  – subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
  – ‘is located in’ (‘is a subcomponent of’)

http://www.geneontology.org

Page 33: Diseases

Example: Gene Product = hammer

Function (what)                   Process (why)
Drive a nail into wood            Carpentry
Drive a stake into soil           Gardening
Smash a bug                       Pest Control
A performer’s juggling object     Entertainment

http://www.geneontology.org

Page 34: Diseases

GO term associations: Evidence Codes

• ISS: Inferred from sequence or structural similarity
• IDA: Inferred from direct assay
• IPI: Inferred from physical interaction
• TAS: Traceable author statement
• IMP: Inferred from mutant phenotype
• IGI: Inferred from genetic interaction
• IEP: Inferred from expression pattern
• ND: no data available

http://www.geneontology.org
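
Evidence codes are typically used to filter annotations by how well supported they are. The sketch below keeps only annotations with experimental evidence; the annotation tuples are hypothetical, and the choice of which codes count as "experimental" is this sketch's working assumption.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "ISS": "Inferred from sequence or structural similarity",
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "TAS": "Traceable author statement",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "IEP": "Inferred from expression pattern",
    "ND": "No data available",
}

# Assumption for this sketch: codes backed by an experiment on the product.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}

# Hypothetical (gene product, GO term, evidence code) annotations.
ANNOTATIONS = [
    ("PRODUCT_1", "DNA repair", "IDA"),
    ("PRODUCT_2", "DNA repair", "ISS"),
    ("PRODUCT_3", "purine metabolism", "IMP"),
    ("PRODUCT_4", "nucleus", "ND"),
]


def with_evidence(annotations, allowed):
    """Keep annotations whose evidence code is in the allowed set."""
    unknown = {code for _, _, code in annotations} - EVIDENCE_CODES.keys()
    if unknown:
        raise ValueError(f"unrecognised evidence codes: {sorted(unknown)}")
    return [a for a in annotations if a[2] in allowed]


experimental_only = with_evidence(ANNOTATIONS, EXPERIMENTAL)
```

Filtering this way trades coverage for confidence: similarity-based (ISS) annotations are far more numerous but only as good as the sequence comparison behind them.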

Page 35: Diseases

What can researchers do with GO?

• Access gene product functional information

• Find how much of a proteome is involved in a process/function/component in the cell

• Map GO terms and incorporate manual annotations into own databases

• Provide a link between biological knowledge and
  – gene expression profiles
  – proteomics data

And how?

• Getting the GO and GO_Association Files

• Data Mining
  – My Favorite Gene
  – By GO
  – By Sequence

• Analysis of Data
  – Clustering by function/process

• Other Tools
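
As a rough sketch of "how much of a proteome is involved in a process", the code below counts distinct gene products per GO term in tab-delimited association lines. It assumes the GO annotation file (GAF) column order, with the object ID in column 2, the GO ID in column 5 and the aspect (P, F or C) in column 9, and that comment lines start with "!". The sample lines and product IDs are made up, and the denominator is the set of annotated products, not a complete proteome.

```python
from collections import defaultdict

# Hypothetical association lines in GAF-like column order (tab-delimited).
SAMPLE_LINES = [
    "!gaf-version: 2.0",
    "DB\tP1\tSYM1\t\tGO:0006281\tREF:1\tIDA\t\tP",
    "DB\tP2\tSYM2\t\tGO:0006281\tREF:2\tISS\t\tP",
    "DB\tP3\tSYM3\t\tGO:0005634\tREF:3\tIDA\t\tC",
    "DB\tP4\tSYM4\t\tGO:0005634\tREF:4\tTAS\t\tC",
]


def fraction_by_term(lines, aspect="P"):
    """Fraction of annotated products attached to each GO term of one aspect."""
    products_by_term = defaultdict(set)
    all_products = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        product_id, go_id, term_aspect = cols[1], cols[4], cols[8]
        all_products.add(product_id)
        if term_aspect == aspect:
            products_by_term[go_id].add(product_id)
    total = len(all_products)
    return {term: len(ids) / total for term, ids in products_by_term.items()}


process_fractions = fraction_by_term(SAMPLE_LINES, aspect="P")
component_fractions = fraction_by_term(SAMPLE_LINES, aspect="C")
```

A fuller version would also propagate annotations up the ontology so that a product annotated to a specific term counts towards its parent terms.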

Page 36: Diseases

http://www.geneontology.org/

Page 37: Diseases

Open biomedical ontologies

http://obo.sourceforge.net/

Page 38: Diseases

Unified Medical Language System Knowledge Server– UMLSKS

http://umlsks.nlm.nih.gov/kss/

• The UMLS Metathesaurus contains information about biomedical concepts and terms from many controlled vocabularies and classifications used in patient records, administrative health data, bibliographic and full-text databases, and expert systems.

• The Semantic Network, through its semantic types, provides a consistent categorization of all concepts represented in the UMLS Metathesaurus. The links between the semantic types provide the structure for the Network and represent important relationships in the biomedical domain.

• The SPECIALIST Lexicon is an English language lexicon with many biomedical terms, containing syntactic, morphological, and orthographic information for each term or word.

Page 39: Diseases

Unified Medical Language System Metathesaurus

• Over 1 million biomedical concepts

• About 5 million concept names from more than 100 controlled vocabularies and classifications (some in multiple languages) used in patient records, administrative health data, bibliographic and full-text databases and expert systems.

• The Metathesaurus is organized by concept or meaning. Alternate names for the same concept (synonyms, lexical variants, and translations) are linked together.

• Each Metathesaurus concept has attributes that help to define its meaning, e.g., the semantic type(s) or categories to which it belongs, its position in the hierarchical contexts from various source vocabularies, and, for many concepts, a definition.

• Customizable: Users can exclude vocabularies that are not relevant for specific purposes or not licensed for use in their institutions. MetamorphoSys, the multi-platform Java install and customization program distributed with the UMLS resources, helps users to generate pre-defined or custom subsets of the Metathesaurus.

• Uses:
  – linking between different clinical or biomedical vocabularies
  – information retrieval from databases with human assigned subject index terms and from free-text information sources
  – linking patient records to related information in bibliographic, full-text, or factual databases
  – natural language processing and automated indexing research

Page 40: Diseases

UMLSKS – Semantic Network

• Complexity reduced by grouping concepts according to the semantic types that have been assigned to them.

• There are currently 15 semantic groups that provide a partition of the UMLS Metathesaurus for 99.5% of the concepts.

ACTI|Activities & Behaviors|T053|Behavior

ANAT|Anatomy|T024|Tissue

CHEM|Chemicals & Drugs|T195|Antibiotic

CONC|Concepts & Ideas|T170|Intellectual Product

DEVI|Devices|T074|Medical Device

DISO|Disorders|T047|Disease or Syndrome

GENE|Genes & Molecular Sequences|T085|Molecular Sequence

GEOG|Geographic Areas|T083|Geographic Area

LIVB|Living Beings|T005|Virus

OBJC|Objects|T073|Manufactured Object

OCCU|Occupations|T091|Biomedical Occupation or Discipline

ORGA|Organizations|T093|Health Care Related Organization

PHEN|Phenomena|T038|Biologic Function

PHYS|Physiology|T040|Organism Function

PROC|Procedures|T061|Therapeutic or Preventive Procedure

Semantic Groups (15) → Semantic Types (135) → Concepts (millions)
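
The pipe-delimited lines above (group abbreviation, group name, semantic type ID, semantic type name) are easy to load into a lookup from semantic type to semantic group. A minimal sketch using a few of the lines shown on the slide; the `parse_semantic_groups` helper is illustrative and not part of any UMLS tool.

```python
# A few of the lines shown on the slide: group|group name|type ID|type name
SEMANTIC_GROUP_LINES = [
    "ACTI|Activities & Behaviors|T053|Behavior",
    "ANAT|Anatomy|T024|Tissue",
    "CHEM|Chemicals & Drugs|T195|Antibiotic",
    "DISO|Disorders|T047|Disease or Syndrome",
    "GENE|Genes & Molecular Sequences|T085|Molecular Sequence",
]


def parse_semantic_groups(lines):
    """Map each semantic type ID to (group abbreviation, group name, type name)."""
    type_to_group = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        abbrev, group_name, type_id, type_name = line.split("|")
        type_to_group[type_id] = (abbrev, group_name, type_name)
    return type_to_group


TYPE_TO_GROUP = parse_semantic_groups(SEMANTIC_GROUP_LINES)


def group_of(type_id):
    """Return the group abbreviation for a semantic type, or None if unknown."""
    entry = TYPE_TO_GROUP.get(type_id)
    return entry[0] if entry else None
```

With such a table, a concept tagged "Disease or Syndrome" (T047) can be routed to the Disorders group without inspecting its name, which is how grouping reduces the complexity of millions of concepts.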

Page 41: Diseases

UMLSKS – Semantic Navigator

Page 42: Diseases

• The number of patients with AD in any community depends on the proportion of older people in the group. Traditionally, the developed countries had large proportions of elderly people, and so they had very many cases of Alzheimer’s disease in the community at one time.

• 4.5 million AD patients in the United States today.

• Expected to increase to 11 to 16 million by 2050.

• In 2000, health care costs for AD patients in the United States totaled approximately $31.9 billion, which is expected to reach $49.3 billion by 2010 (http://www.alz.org)

• World-wide: ~18 million (projected to nearly double by 2025 to 34 million).

• Demographic transition - Developing countries:

• Increased life expectancy (current life expectancy in India is >60 years).

• 1991 India Census: 70 million people were over 60 years.

• 2001 India Census: 77 million, or 7.6% of the population.

• By 2025, we will have 177 million elderly people.

• Currently, more than 50% of people with Alzheimer’s disease live in developing countries and by 2025, this will be over 70%.

Alzheimer’s Disease – Alarming Statistics

Source: WHO & NIA

Page 43: Diseases

• The goal of applying computational data-mining approaches is to extract useful information from large amounts of data by employing mathematical methods that should be as automated as possible.

• Computational data-mining approaches are particularly appropriate in areas with much data but few explanations, such as gerontology. If researchers can find patterns in the data and derive information from them, that information may enhance our knowledge of aging.

Alzheimer’s Disease – Why Computational Approaches?

• The complexity and broad range of cellular and biochemical events make researchers believe that there must be a sophisticated network of AD signal transduction, gene regulation, and protein-protein interaction events.

• Therefore, deciphering AD-related molecular network “circuitry” can help researchers understand AD better, model its details, and propose treatment ideas.

Page 44: Diseases

A simplistic picture

[Figure: Alzheimer Disease linked to anatomy and to genes]

Anatomy: Astrocytes, Basal Nucleus of Meynert, Cerebrum, Cerebral Cortex, Brain and Nervous System, Brain Microglia, Hippocampus, Frontal Lobe, Neurons, Temporal Lobe

Genes: A2M, APOE, ALOX12, ABCA1, ABCA2, NEF3, PARK2, STH, APP, NME1

Page 45: Diseases


Page 46: Diseases

Many Diseases – Many Genes

[Figure: the Alzheimer Disease diagram extended with further diseases and genes]

Diseases: Alzheimer Disease, Parkinson Disease, Schizophrenia

Anatomy: Astrocytes, Basal Nucleus of Meynert, Cerebrum, Cerebral Cortex, Brain and Nervous System, Brain Microglia, Hippocampus, Frontal Lobe, Neurons, Temporal Lobe

Genes/loci: A2M, APOE, ALOX12, ABCA1, ABCA2, STH, APP, NME1, NEF3, PARK2, PARK3, PARK7, PARP, SCZD2, SCZD3, SCZD8

Page 47: Diseases

Genes: Functions & Pathways

[Figure: Alzheimer Disease linked to anatomy (Astrocytes, Basal Nucleus of Meynert, Cerebrum, Cerebral Cortex, Brain and Nervous System, Brain Microglia, Hippocampus, Frontal Lobe, Neurons, Temporal Lobe) and genes (A2M, APOE, ALOX12, ABCA1, ABCA2, NEF3, PARK2, STH, APP, NME1), with functions and pathways attached]

Functions/Processes

→ enzyme binding
→ extracellular space
→ interleukin-1 binding
→ interleukin-8 binding
→ intracellular protein transport
→ protein carrier activity
→ protein homooligomerization
→ serine-type endopeptidase inhibitor activity
→ tumor necrosis factor binding
→ wide-spectrum protease inhibitor activity

Pathways

• Alzheimer's disease (Kegg)
• Neurodegenerative Disorders (Kegg)
• Deregulation of CDK5 in Alzheimers Disease (BioCarta)
• Generation of amyloid b-peptide by PS1 (BioCarta)
• Platelet Amyloid Precursor Protein Pathway (BioCarta)
• Hemostasis (Reactome)

Page 48: Diseases

Protein Interactions

[Figure: Alzheimer Disease linked to anatomy (Astrocytes, Basal Nucleus of Meynert, Cerebrum, Cerebral Cortex, Brain and Nervous System, Brain Microglia, Hippocampus, Frontal Lobe, Neurons, Temporal Lobe) and genes (A2M, APOE, ALOX12, ABCA1, ABCA2, NEF3, PARK2, STH, APP, NME1), with interacting proteins attached]

Interacting proteins: C1QBP, KNG1, KLKB1, CNTF, NS5A, TGFB2, APPBP1

Page 49: Diseases

Understanding the genetic network of human Alzheimer’s disease: two general phases

1. Identifying the genetic players involved

2. Systematically perturbing, in model organisms, individual players and/or pathways suspected of being involved in neurodegenerative diseases (e.g. knock-outs)

Computational Approaches

• Data-mining (Data marts): Comparative Genomics, Interactome, Comparative Phenomics, Regulomics (TFBSs, motif/pattern search)

• Text-mining: Literature mining (hypothesis-generator)

• Mathematical Modeling: Disease process modeling

Experimental Approaches

• Genetic Manipulations

• Gene Expression Studies

• Animal Models

• Cellular Studies (to investigate specific cellular processes)

Page 50: Diseases

Alzheimer Disease Related Genes

Proteomics Genomics

Gene Expression

Model Organisms &

Genetic Manipulations

Comparative Genomics

Differentially expressed genes

Cellular Studies

Transcriptome

Models of human neurodegenerative diseases

Post-Transcriptional Regulation - MicroRNAs

Transcriptional Regulation

Text-mining: Knowledge Discovery

Clustering Algorithms

Page 51: Diseases

NCBI Entrez Gene Query:

(alzheimer[Disease/Phenotype] OR alzheimer[All fields]) AND "homo sapiens"[Organism]

143 Genes

A2M, ABCA1, ABCA2, ABCB1, ABL1, ACE, AD5, AD6, AD7, AD8, AD9, ADAM10, AGER, AHSG, APBA1, APBB2, APH1A, APOC1, APOD, APOE, APOM, APP, ASAHL, ATF2, BACE1, BACE2, BAX, BCHE, BCL2, BCL2L2, BLMH, CBS, CD40, CDC2, CDK5, CDK5R1, CDK5R2, CHAT, CHRNA4, CHRNA7, CLU, COL18A1, COL25A1, COX10, CRH, CTCF, CTNNA3, CTSB, CTSD, CXCR3, CYP46A1, DHCR24, DLST, DSCR1, E2F1, EEF2, EEF2K, EIF2AK2, EIF4E, EIF4EBP1, ENO1, ERBB4, ESR1, FALZ, FAS, FASLG, FRAP1, FYN, GABBR1, GAL, GAPDH, GFAP, GRIA1, GRIA2, GRIA3, GRIN2A, GRIN2B, GSK3B, HADH2, HPCAL1, HTR2A, IDE, IFNG, IGF2R, IL1B, ITM2B, KCNC4, KLK10, KLK7, LAMA1, LAMC1, LOC644264, LRP8, MAP2K1, MAPT, MEOX2, MME, MPO, MRE11A, MSI1, MTRR, NACA, NCAM1, NCSTN, NDRG2, NES, NGFR, NME1, NME2, NOS3, NRG1, OLR1, P18SRP, PARK7, PAXIP1, PCSK1, PCSK2, PCSK9, PIN1, PLAU, PON1, PRDX1, PRDX2, PRDX3, PRNP, PSEN1, PSEN2, RPS3A, RABGAP1L, RTN4, SERPINA3, SFRS12, SLC1A2, SLC6A3, SLC6A4, SNCB, SORL1, TFAM, TGFB1, TNF, TUBB3, UBQLN1, VSNL1

Mining Interactome
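
For readers who want to rerun the Entrez Gene query programmatically, the sketch below only builds a request URL for NCBI's E-utilities `esearch` endpoint with the `db`, `term` and `retmax` parameters; it does not send the request. The `build_esearch_url` helper and the `retmax` value are choices made for this sketch, and a query run today would not be expected to return the same 143 genes as in 2006.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The query string exactly as shown on the slide.
AD_QUERY = ('(alzheimer[Disease/Phenotype] OR alzheimer[All fields]) '
            'AND "homo sapiens"[Organism]')


def build_esearch_url(term, db="gene", retmax=500):
    """Build (but do not send) an esearch request URL for the given query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = build_esearch_url(AD_QUERY)

# Round-trip check: the encoded URL decodes back to the original query.
decoded = parse_qs(urlparse(url).query)
print(url)
```

Keeping the query text in one constant makes it easy to record exactly which search produced a gene list, which matters when counts drift as the database is updated.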

Page 52: Diseases
Page 53: Diseases
Page 54: Diseases

Pathways (top 10)

Molecular & Cellular Functions (top 10)

Physiological System Development & Function (top 10)

Y-axis represents significance - probability that the genes within the dataset file are involved in a particular high level function (Ingenuity Analysis)

Page 55: Diseases

http://depts.washington.edu/l2l/

NCBI Entrez Gene Query:

(alzheimer[Disease/Phenotype] OR alzheimer[All fields]) AND "homo sapiens"[Organism]

143 AD-associated genes

Mining about 800 gene expression datasets

Page 56: Diseases

Text-mining MedLine Abstracts

• Data Source: GeneRIF (Gene Reference Into Function): manually entered/curated sentences.

• GeneRIF: “Abstract of Abstracts”

• NLP: MetaMap and GATE (General Architecture for Text Engineering)

• Keywords: MeSH and UMLS concepts for Alzheimer’s disease (AD, Alzheimer’s dementia, Alzheimer disease, etc.)

299 unique genes associated with Alzheimer’s disease
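
The keyword step can be illustrated with a deliberately simplified stand-in: plain regular expressions over GeneRIF-style sentences instead of MetaMap or GATE. The first three records reuse sentences quoted elsewhere in this deck; the last record is hypothetical and is there only to show a non-match.

```python
import re

# Simplified stand-in for concept matching: name variants plus the abbreviation.
AD_NAME = re.compile(r"\balzheimer(?:['’]s)?\b", re.IGNORECASE)
AD_ABBREV = re.compile(r"\bAD\b")  # case-sensitive to avoid matching "ad"

# (Entrez Gene ID, symbol, GeneRIF sentence)
GENERIFS = [
    (2, "A2M", "Genetic association of alpha2-macroglobulin polymorphisms "
               "with Alzheimer's disease"),
    (5243, "ABCB1", "Deposition of Alzheimer beta amyloid is inversely "
                    "correlated with expression of this protein in the brains "
                    "of elderly non-demented humans."),
    (9546, "APBA3", "Associated with etiological mechanism of Alzheimer's "
                    "disease."),
    # Hypothetical record, included only as a negative example.
    (0, "GENE_X", "Required for liver regeneration after injury."),
]


def mentions_ad(sentence):
    """True if the sentence names Alzheimer's disease or uses the AD abbreviation."""
    return bool(AD_NAME.search(sentence) or AD_ABBREV.search(sentence))


def ad_associated_symbols(generifs):
    """Unique gene symbols with at least one sentence mentioning AD."""
    return sorted({symbol for _, symbol, text in generifs if mentions_ad(text)})


ad_genes = ad_associated_symbols(GENERIFS)
```

A bare abbreviation match is the weak point of this approach, since "AD" can stand for other things; mapping text to UMLS concepts is what tools such as MetaMap add on top of string matching.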

Page 57: Diseases

GATACA – Gene Association To Anatomy & Clinical Abnormality

Page 58: Diseases

299 genes associated with Alzheimer's Disease (based on text-mining Medline abstracts)

Entrez Gene ID | Gene Symbol | Sentence | PubMed ID

2 | A2M | Genetic association of alpha2-macroglobulin polymorphisms with Alzheimer's disease | 12221172

5243 | ABCB1 | Deposition of Alzheimer beta amyloid is inversely correlated with expression of this protein in the brains of elderly non-demented humans. | 12360104

153 | ADRB1 | Single-nucleotide polymorphisms (SNPs) in the beta1-adrenergic receptor (ADRB1) allelic frequencies were analyzed in Alzheimer's disease. The combination of G protein beta3 subunit and ADRB1 polymorphisms produces AD susceptibility. | 15212839

239 | ALOX12 | 12/15-lipoxygenase is increased in Alzheimer's disease and has a possible role in brain oxidative stress | 15111312

246 | ALOX15 | 12/15-lipoxygenase is increased in Alzheimer's disease and has a possible role in brain oxidative stress | 15111312

9546 | APBA3 | Associated with etiological mechanism of Alzheimer's disease. | 11831025

Page 59: Diseases
Page 60: Diseases

299 genes associated with Alzheimer’s disease: comparison with genes differentially expressed in Alzheimer’s and ageing frontal cortex

List                     PMID         Total probes   Expected    Actual   Binomial prob.
alzheimers_disease_dn    14769913*    1222           11.08886    49       2.83E-17
alzheimers_disease_up    14769913*    1665           15.1088     53       1.82E-14
ageing_brain_up          15190254**   252            2.286737    19       3.67E-12
ageing_brain_dn          15190254**   145            1.315781    13       1.07E-09

Descriptions:

• alzheimers_disease_dn: Downregulated in correlation with overt Alzheimer's Disease, in the CA1 region of the hippocampus
• alzheimers_disease_up: Upregulated in correlation with overt Alzheimer's Disease, in the CA1 region of the hippocampus
• ageing_brain_up: Age-upregulated in the human frontal cortex
• ageing_brain_dn: Age-downregulated in the human frontal cortex

* Blalock EM, Geddes JW, Chen KC, Porter NM, Markesbery WR, Landfield PW. 2004. Incipient Alzheimer's disease: microarray correlation analyses reveal major transcriptional and tumor suppressor responses. Proc Natl Acad Sci U S A. 101 (7): 2173-2178.

** Lu T, Pan Y, Kao SY, Li C, Kohane I, Chan J, Yankner BA. 2004. Gene regulation and DNA damage in the ageing human brain. Nature 429(6994): 883-891.

http://depts.washington.edu/l2l/
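
The "expected" column above is consistent with a single per-probe hit rate applied to each list size, and the last column reads as a binomial tail probability. The sketch below recomputes both under those assumptions: the hit rate is inferred from the first row, and the probability is taken to be P(X >= actual). L2L's exact procedure is not described in this deck, so the numbers are not guaranteed to match the table digit for digit.

```python
import math

# (list name, total probes, expected, actual) as given in the table above.
ROWS = [
    ("alzheimers_disease_dn", 1222, 11.08886, 49),
    ("alzheimers_disease_up", 1665, 15.1088, 53),
    ("ageing_brain_up", 252, 2.286737, 19),
    ("ageing_brain_dn", 145, 1.315781, 13),
]

# Assumption: one background hit rate, inferred from the first row.
HIT_RATE = ROWS[0][2] / ROWS[0][1]


def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


for name, n, expected, actual in ROWS:
    recomputed = n * HIT_RATE
    tail = binom_upper_tail(n, actual, HIT_RATE)
    print(f"{name:22s} expected={recomputed:9.5f} "
          f"(table {expected}) P(X>={actual})={tail:.2e}")
```

The takeaway is the size of the gap: with roughly 11 overlapping genes expected by chance, observing 49 is far out in the tail, which is why the literature-mined list is considered enriched for these expression signatures.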

Page 61: Diseases

[Figure: expression comparison across Human CNS, Human non-CNS, Mouse CNS and Mouse non-CNS]

A 940 gene ortholog pairs over-expressed in both human and mouse CNS

B 206 gene ortholog pairs over-expressed in human, not mouse CNS

C 266 gene ortholog pairs over-expressed in mouse, not human CNS

Kong and Jegga, unpublished

CNS-overexpressed genes in adult human and/or mouse

Page 62: Diseases

[Venn diagram of three gene sets; region counts shown: 220, 28, 2130, 308, 865, 581]

• 940 human-mouse orthologous genes overexpressed in CNS
• 1222 genes downregulated in Alzheimer’s
• 299 genes associated with Alzheimer’s disease (literature mining)

Genes listed: APP, ARPP-19, CAMK2A, CDK5, CDK5R1, CHGA, CKB, GLUL, GNAS, GRIA3, KNS2, MAP2K1, MAPK1, MAPK8IP1, PCSK1, PRDX2, RGS4, SNCA, UCHL1, VSNL1, YWHAZ

How many of these are involved in CNS development or function? (From GO)
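
Comparing three gene lists like this comes down to set operations. The sketch below computes the seven Venn regions for three sets; the gene symbols and memberships are entirely hypothetical and do not reproduce the counts on the slide.

```python
# Hypothetical gene sets standing in for the three lists on the slide.
cns_orthologs = {"GENE_1", "GENE_2", "GENE_3", "GENE_4"}
down_in_ad = {"GENE_2", "GENE_3", "GENE_5"}
literature = {"GENE_3", "GENE_4", "GENE_5", "GENE_6"}


def venn3(a, b, c):
    """Return the seven disjoint regions of a three-set Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "all_three": a & b & c,
    }


regions = venn3(cns_orthologs, down_in_ad, literature)
region_sizes = {name: len(genes) for name, genes in regions.items()}

# Genes supported by all three lines of evidence are the natural first
# candidates for follow-up, e.g. a GO lookup for CNS development or function.
candidates = sorted(regions["all_three"])
```

Because the regions are disjoint, their sizes sum to the size of the union, which is a quick sanity check when transcribing counts from a diagram.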

Page 63: Diseases

Sequence Context

List of Transcription Factor Binding Sites

http://concise-scanner.cchmc.org

To identify putative gene targets of transcription factors

Page 64: Diseases

Human Mouse

GenomeTrafac Coordinates

Genome Assembly Coordinates

Conserved binding sites between human and mouse

Page 65: Diseases

Trachea & bronchial epithelial cells

Prostate

Gnf Expression Atlas - Human

• PDEF is an ETS transcription factor expressed in prostate epithelial cells.

• Nkx3.1 interacts with SPDEF or Prostate derived Ets factor.

GenomeTrafac Tracks

Page 66: Diseases

http://polydoms.cchmc.org

Page 67: Diseases

Goals – Summary………

• Enable discovery of novel disease-gene relationships
• Facilitate discovery of disease-pathway relationships
• Enable discovery of novel pathways and targets and associate them with disease processes
• Help researchers generate testable hypotheses
• Support efforts to prioritize research
• Facilitate meta-analyses

Page 68: Diseases

New/Future Directions…….

Computational

• Semantic Web (SW): “A vision for the next generation web in which data from multiple sources described with rich semantics are integrated to enable processing by humans as well as software agents” (SW Life Sciences)

• Semantic Web Languages
  – RDF (Resource Description Framework)
  – RDFS (RDF Schema)
  – OWL (Web Ontology Language)
  – SPARQL (Semantic Web query language)

• Prioritization and ranking of entities in novel gene networks; inferencing

Biological/Genomics

• Gene regulation by microRNAs (miRNAs):
  – ~22-nucleotide non-coding RNAs that primarily act post-transcriptionally by suppressing mRNAs
  – At least 1% of the transcripts in the genome code for miRNAs
  – miRNAs have at least 20-30% of the coding genes as their targets
  – miRNAs are implicated in various cellular processes, such as cell fate determination, cell death, and tumorigenesis (Bartel 2004).
  – E.g.: CREB-regulated miRNA regulates neuronal morphogenesis (Vo et al 2005)
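
The Semantic Web idea of describing data as subject-predicate-object statements and querying them by pattern can be sketched in plain Python, without an RDF library. The `ex:` identifiers, the predicate names and the `match` helper are all hypothetical; the two "regulates" statements paraphrase the CREB and miRNA example above.

```python
# Statements as (subject, predicate, object) triples; identifiers are made up.
TRIPLES = [
    ("ex:CREB", "ex:regulates", "ex:miRNA_1"),
    ("ex:miRNA_1", "ex:regulates", "ex:neuronal_morphogenesis"),
    ("ex:miRNA_1", "ex:type", "ex:microRNA"),
    ("ex:CREB", "ex:type", "ex:transcription_factor"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def regulated_via(triples, regulator):
    """Two-step inference: what does a regulator affect through its targets?"""
    indirect = []
    for _, _, target in match(triples, subject=regulator,
                              predicate="ex:regulates"):
        for _, _, downstream in match(triples, subject=target,
                                      predicate="ex:regulates"):
            indirect.append((target, downstream))
    return indirect


direct_targets = match(TRIPLES, subject="ex:CREB", predicate="ex:regulates")
indirect_effects = regulated_via(TRIPLES, "ex:CREB")
```

In RDF the same pattern matching is expressed declaratively in SPARQL, and chaining statements from different sources is what lets integrated data yield relationships no single databank states.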

Page 69: Diseases

Take-home messages

• Networks and integration of databases are keys to success in Bioinformatics.

• Combining computation and data integration into a single cohesive whole will increase the efficiency of research effort:
  – by reducing the serendipity and hit-and-miss nature of empirical research, and
  – by providing valuable clues to biomedical researchers on their choice of experiments, given limitations of funds, manpower and time.

• Researchers/users have to know what resources are available, how to access them, what their limitations are, and how to use them.

Page 70: Diseases

Integrative Genomics - Biomedical Informatics: the Ultimate Goal…….

Medical Informatics: PubMed, Patient Records, Disease Database, Clinical Trials

Disease Database → Name → Synonyms → Related/Similar Diseases → Subtypes → Etiology → Predisposing Causes → Pathogenesis → Molecular Basis → Population Genetics → Clinical findings → System(s) involved → Lesions → Diagnosis → Prognosis → Treatment → Clinical Trials……

Bioinformatics: Genome, Transcriptome, Proteome, Interactome, Metabolome, Physiome, Regulome, Variome, Pathome, Pharmacogenome

Disease World, OMIM

► Personalized Medicine
► Decision Support System
► Outcome Predictor
► Course Predictor
► Diagnostic Test Selector
► Clinical Trials Design
► Hypothesis Generator…..

Page 71: Diseases

http://sbw.kgi.edu/

Thank You!

“To him who devotes his life to science, nothing can give more happiness than increasing the number of discoveries, but his cup of joy is full when the results of his studies immediately find practical applications”

— Louis Pasteur

http://anil.cchmc.org (under presentations)