The Ontrez project at NCBO

Nigam Shah [email protected]


Post on 22-Jan-2016

Page 1: The  Ontrez  project at NCBO

The Ontrez project at NCBO

Nigam Shah [email protected]

Page 2: The  Ontrez  project at NCBO

Public data repositories

• Around 1100 databases in the NAR’s 2008 database issue.

• High-throughput gene expression data in repositories such as GEO, SMD, ArrayExpress

• Clinical trial repositories such as caBIG, TrialBank, ClinicalTrials.gov

• Guideline repositories such as www.guideline.gov

• Image repositories such as BIRN

• Observational studies such as Framingham, NHANES, AMCIS.

Page 3: The  Ontrez  project at NCBO

Database annotation

• Ontology-based annotation is not as widespread as desired

• Most annotation is still free text

• Possible reasons:

1. Lack of a one-stop shop for bio-ontologies

2. Lack of tools to annotate experimental data
   – Manual: Phenote
   – Automatic: ?

3. Lack of a sustainable mechanism to create ontology-based annotations

Page 4: The  Ontrez  project at NCBO

Different kinds of annotations

Diagram text:

• Chronic Bladder Overdistension

• Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.

• ELMO1 expression is altered by mechanical stimuli

• Other experiments

• ELMO1 associated_with actin cytoskeleton organization and biogenesis

Diagram labels: low level result, summary result, annotation, metadata

Page 5: The  Ontrez  project at NCBO

Annotations as assertions

• Annotation = an assertion declaring a relationship between a biomedical entity and a type in an ontology.

• e.g. p53 <associated_with> cell death

• Annotations tell us what the biologists believe to be true (in particular or in general)

• Most annotations are based on particular observations and are generalized during interpretation by a biologist/curator.

• Semantics of annotations are not always declared a priori (e.g. associated_with, involves)
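The definition above (entity, relationship, ontology type) can be sketched as a small record. This is only an illustration: the class name `Annotation` and its field names are assumptions made for this sketch, not part of Ontrez.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one assertion: entity <relation> ontology term."""

    entity: str         # the biomedical entity, e.g. a gene or protein
    relation: str       # the relationship, whose semantics may be undeclared
    ontology_term: str  # the type in an ontology

    def as_triple(self) -> str:
        # Render the assertion in the notation used on the slide.
        return f"{self.entity} <{self.relation}> {self.ontology_term}"


# The example from the slide.
example = Annotation("p53", "associated_with", "cell death")
print(example.as_triple())
```

Keeping the relation as a plain string mirrors the point that relations such as associated_with or involves often carry no declared semantics.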

Page 6: The  Ontrez  project at NCBO

Annotations as ‘Meta-data’

• Metadata: The text description accompanying a dataset in a database.

• Metadata-annotations should be machine processed (and indexed using ontologies) because
   – The volume is orders of magnitude more than the summary results
   – These annotations are not stating any biological fact
      – Hence they don’t need a curator to create them
   – These annotations are to be used to LOCATE datasets accurately as soon as they are available in a public repository
      – We cannot afford to have a curation bottleneck

Page 7: The  Ontrez  project at NCBO

High level goal

• Process the metadata annotations to automatically tag the ‘elements’ in public repositories with as many ontology terms as possible.

• For example, in the case of the GEO dataset 906:
   – Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.

• Gets tagged with:
   – Expression, Expression of bladder, bladder, smooth, bladder muscle, muscle, smooth muscle, cells, mechanical, mechanical stimulation, stimulation, Chronic, results, bladder overdistension, associated, associated with, with, loss, genes, altered
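The tagging step above can be approximated by a dictionary lookup over word n-grams. The sketch below is a minimal illustration, not the Ontrez implementation: the function `tag_text`, the `max_words` parameter, and the toy term list are all assumptions. It finds only exact, contiguous matches, so it would not produce a tag such as "bladder muscle", which a real concept recognizer evidently handles.

```python
import re


def tag_text(text, terms, max_words=4):
    """Return every term that occurs in text as a contiguous word sequence.

    Overlapping matches are all kept, so "smooth muscle" and "muscle"
    can both be reported for the same passage.
    """
    lexicon = {term.lower(): term for term in terms}
    words = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    for start in range(len(words)):
        for n in range(1, max_words + 1):
            if start + n > len(words):
                break
            candidate = " ".join(words[start:start + n])
            if candidate in lexicon and lexicon[candidate] not in found:
                found.append(lexicon[candidate])
    return found


# Toy stand-in for an ontology term list (hypothetical, for illustration).
TOY_TERMS = ["bladder", "smooth muscle", "muscle", "mechanical stimulation", "genes"]

description = (
    "Expression profiling of cultured bladder smooth muscle cells "
    "subjected to repetitive mechanical stimulation for 4 hours."
)
tags = tag_text(description, TOY_TERMS)
print(tags)
```

Terms are reported in order of first appearance; "genes" is not returned here because it does not occur in the shortened description.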

Page 8: The  Ontrez  project at NCBO

Tagging [annotating] with ontology terms

Page 9: The  Ontrez  project at NCBO


Page 10: The  Ontrez  project at NCBO

Querying the annotation index

Page 11: The  Ontrez  project at NCBO


Page 12: The  Ontrez  project at NCBO


Page 13: The  Ontrez  project at NCBO


Page 14: The  Ontrez  project at NCBO


Page 15: The  Ontrez  project at NCBO

WHAT NEW SCIENCE DO WE ENABLE?

Page 16: The  Ontrez  project at NCBO

New Science enabled

• Nature Biotechnology study on image features and gene expression

• Correlation between protein and gene expression for cancer classification

• Correlating gene expression and drug effect information for predicting drug efficacy

• Training and testing image processing algorithms

Page 17: The  Ontrez  project at NCBO

Decoding global gene expression programs in liver cancer by noninvasive imaging

Eran Segal, Claude B Sirlin, Clara Ooi, Adam S Adler, Jeremy Gollub, Xin Chen, Bryan K Chan, George R Matcuk, Christopher T Barry, Howard Y Chang & Michael D Kuo

Nature Biotechnology 25, 675-680 (2007). Published online: 21 May 2007

Page 18: The  Ontrez  project at NCBO

Correlation of protein and gene expression for the stratification of breast cancer patients


Page 19: The  Ontrez  project at NCBO

There are 20 other diseases for which this is possible!

Disease                       GEO samples  TMAD samples
Acute myeloid leukemia        366          3
Malignant melanoma            47           43
B-cell lymphoma               133          27
Prostate cancer               47           15
Renal carcinoma               34           185
Carcinoma squamous            105          175
Multiple myeloma              225          169
Clear cell carcinoma          34           63
Renal cell carcinoma          34           9
Breast carcinoma              3            1277
Hepatocellular carcinoma      80           163
Carcinoma lung                91           66
Cutaneous malignant melanoma  38           41
T-cell lymphoma               29           31
Lymphoblastic lymphoma        29           30
Uterine fibroid               10           19
Medulloblastoma               46           9
Clear cell sarcoma            35           8
Leiomyosarcoma                24           5
Mesothelioma                  54           5
Kaposi's sarcoma              4            3
Cardiomyopathy                14           2
Dilated cardiomyopathy        14           2

Page 20: The  Ontrez  project at NCBO


Page 21: The  Ontrez  project at NCBO

TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms.


Page 22: The  Ontrez  project at NCBO

Current status of the prototype

Resource                 Number of elements  Resource file size (Kb)  Number of direct annotations  Number of closure annotations  Total number of 'useful' annotations
PubMed                   10164               13461                    187686                        681973                         857459
ArrayExpress             2751                2880                     143134                        484758                         619133
ClinicalTrials.gov       43918               8379                     1206939                       6792430                        5217115
Gene Expression Omnibus  546                 163                      16494                         100984                         116234
ARRS GoldMiner           1155                494                      53082                         290935                         340915
TOTAL                    58534               25377                    1607335                       8351080                        7150856
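The table distinguishes direct annotations from closure annotations without defining the latter. One plausible reading, assumed here and not confirmed by the slides, is that a closure annotation is one inherited from an ancestor of a directly matched term in the ontology's is_a hierarchy. Under that assumption the expansion might look like the sketch below; the function names and the toy hierarchy are hypothetical.

```python
def ancestors(term, parents):
    """Collect all ancestors of term by walking parent links transitively."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure_annotations(direct, parents):
    """Map each element to the terms it inherits beyond its direct terms."""
    closure = {}
    for element, terms in direct.items():
        inherited = set()
        for term in terms:
            inherited |= ancestors(term, parents)
        closure[element] = inherited - set(terms)
    return closure


# Toy is_a hierarchy (child -> parents), invented for illustration only.
TOY_IS_A = {
    "metastatic melanoma": ["melanoma"],
    "melanoma": ["neoplasm"],
}

direct = {"element-1": {"metastatic melanoma"}}
expanded = closure_annotations(direct, TOY_IS_A)
print(expanded)
```

An expansion of this kind would explain why closure annotations outnumber direct ones in every row: each direct match contributes one annotation per ancestor.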

Page 23: The  Ontrez  project at NCBO

Ontrez: Target resources

Column headings: Papers, Datasets, Guidelines, Clinical Trials, Treatments, Drugs, Phenotype, Animal models, Alleles and Genotype, mRNA expression, Protein expression, GWAS, RCT reports, Trial description, text, images, Genes, Variations

Metastatic Melanoma    3330  7  76
Invasive Melanoma      237   1  1
Melanoma in situ       314   1  2
Spindle Cell Melanoma  47    0  0

Page 24: The  Ontrez  project at NCBO

Where can we go?

• Become a service for ‘annotating’ biomedical text.
– People send us text, we send back recognized concepts (maybe even relationships)
– Given a set of concepts we provide a similarity metric between them
– Both these services can be plugged into a variety of community and collaborative annotation tools

• Become ‘the one stop shop’ for finding items across a wide variety of resources …
– Integrate on the ‘disease’ dimension. Gene cards exist, disease cards don’t
– Focus on approx. 15 resources in the next year.
– PDB and PLoS are interested
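The slide does not say which similarity metric between concepts is intended. As one simple possibility, the sketch below scores two terms by the Jaccard overlap of their ancestor sets in an is_a hierarchy. The metric choice, the function names, and the toy hierarchy are all assumptions made for illustration.

```python
def with_ancestors(term, parents):
    """Return the term together with all of its transitive ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def similarity(term_a, term_b, parents):
    """Jaccard overlap of ancestor sets: 1.0 for identical terms, 0.0 if disjoint."""
    set_a = with_ancestors(term_a, parents)
    set_b = with_ancestors(term_b, parents)
    return len(set_a & set_b) / len(set_a | set_b)


# Toy is_a hierarchy (child -> parents), invented for illustration only.
TOY_IS_A = {
    "metastatic melanoma": ["melanoma"],
    "invasive melanoma": ["melanoma"],
    "melanoma": ["neoplasm"],
}

score = similarity("metastatic melanoma", "invasive melanoma", TOY_IS_A)
print(score)
```

The two sibling terms share "melanoma" and "neoplasm" out of four distinct terms in total, giving a score of 0.5. Path-based or information-content measures would be equally reasonable choices.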

Page 25: The  Ontrez  project at NCBO

Research questions - 1

                                   Genes/Proteins  Diseases  Drugs  body parts  developmental stages  Pathways  processes  genetic markers
SNOMEDCT                           ..              X         ..     ..          ..                    ..        ..         ..
RxNORM                             ..              ..        X      ..          ..                    ..        ..         ..
INOH                               ..              ..        ..     ..          ..                    X         ..         ..
NCIT                               ..              X         ..     ..          ..                    ..        ..         ..
Gene Ontology (BP)                 ..              ..        ..     ..          ..                    ..        X          ..
FMA                                ..              ..        ..     X           ..                    ..        ..         ..
Cell type Ontology                 ..              ..        ..     ..          ..                    ..        ..         ..
Mammalian Phenotype                ..              ..        X      ..          ..                    ..        ..         ..
Mouse anatomy and development      ..              ..        ..     X           X                     ..        ..         ..
Zebrafish anatomy and development  ..              ..        ..     X           X                     ..        ..         ..

Page 26: The  Ontrez  project at NCBO

Research questions - 2

                                   Genes/Proteins  Diseases  Drugs  body parts  developmental stages  Pathways  processes  genetic markers
GATE                               ..              ..        ..     ..          ..                    ..        ..         ..
UMLS-Query                         ..              ..        ..     ..          ..                    ..        ..         ..
mgrep                              ..              ..        ..     ..          ..                    ..        ..         ..
MetaMAP                            ..              ..        ..     ..          ..                    ..        ..         ..
UPenn (conditional random fields)  ..              ..        ..     ..          ..                    ..        ..         ..
Language Modeling methods          ..              ..        ..     ..          ..                    ..        ..         ..

Page 27: The  Ontrez  project at NCBO

Credits and collaborations

• Clement Jonquet

• Nipun Bhatia

• Manhong Dai

• Fan Meng

• Brian Athey

• Mark Musen