Removing roadblocks: leveraging ontologies for data aggregation and computation. NCBO Seminar series, March 6th, 2013. Melissa Haendel, on behalf of very many team members

Upload: mhaendel

Post on 24-May-2015


DESCRIPTION

Part of the NCBO seminar series http://www.bioontology.org/webinar-series

TRANSCRIPT

Page 1: NCBO haendel talk 2013

Removing roadblocks: leveraging ontologies for data aggregation and computation

NCBO Seminar series, March 6th, 2013

Melissa Haendel, on behalf of very many team members

Page 2: NCBO haendel talk 2013

Topics for today

• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile – integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries

Page 3: NCBO haendel talk 2013

The Research Symbiosis

[Figure: research activities arranged around "The Web": Consult Databases, Share Resources/Data, Publish papers, Contribute to Databases, Get funding, Do Experiments]

Page 4: NCBO haendel talk 2013

We’ve all been here before:

Ontologies can help us do better.

OMIM Query                  # of records
"large bone"                1032
"enlarged bone"             207
"big bones"                 22
"huge bones"                4
"massive bones"             39
"hyperplastic bones"        12
"hyperplastic bone"         44
"bone hyperplasia"          173
"increased bone growth"     836
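The query variants listed above all aim at roughly the same phenotype. As a minimal sketch of how an ontology term carrying synonyms could collapse those phrasings into a single query, consider the following. The term identifier, the synonym list, and the record texts are hypothetical illustrations, not content from OMIM or from any real ontology.

```python
import re

# Hypothetical ontology term: one identifier, one label, several synonyms.
TERM = {
    "id": "EX:0000001",  # made-up identifier for illustration only
    "label": "increased bone size",
    "synonyms": ["large bone", "enlarged bone", "big bones", "bone hyperplasia"],
}


def normalize(text):
    """Lowercase, drop punctuation, and crudely strip a trailing 's' from longer words."""
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def matches_term(record_text, term):
    """Return True if the label or any synonym of the term occurs in the record text."""
    haystack = normalize(record_text)
    phrases = [term["label"]] + term["synonyms"]
    return any(normalize(phrase) in haystack for phrase in phrases)


# Hypothetical free-text records.
records = [
    "Patient presents with big bones and short stature.",
    "Enlarged bone noted in the femur.",
    "Normal skeletal survey.",
    "Bone hyperplasia of the skull.",
]

hits = [r for r in records if matches_term(r, TERM)]
for hit in hits:
    print(TERM["id"], "<-", hit)
```

One query against the term identifier then retrieves records that use any of the registered phrasings, instead of requiring a separate string search per variant.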

Page 5: NCBO haendel talk 2013

Why not just map to ontology terms?

Class A                              Class B                      Mapped?  Useful?
FMA: extensor retinaculum of wrist   MouseAnatomy: retina         Yes      No
Vivo: legal decision                 Cognitive Atlas: decision    Yes      No
PlantOntology: Pith                  MouseAnatomy: medulla        Yes      No
TaxRank: domain                      NCI: protein domain          Yes      No
ZfishAnat: hypophysis                MouseAnatomy: pituitary      No       Yes
FMA: tibia                           FlyAnatomy: tibia            Yes      No
FMA: colon                           GAZ: Colón, Panama           Yes      No
Quality: male                        Chebi: maleate 2(-)          Yes      No

Mapping requires manual work to perform and maintain; string matching for mapping can lead to spurious results; semantics of mappings and provenance are not always clear
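To illustrate how string matching produces spurious mappings, here is a minimal sketch of a deliberately crude lexical matcher applied to a few of the label pairs from the slide. The matching rule (any shared word after lowercasing and accent stripping) is an assumption chosen for illustration; it is not the method used by any particular mapping tool.

```python
import re
import unicodedata


def tokens(label):
    """Lowercase a label, strip accents, and split it into a set of word tokens."""
    decomposed = unicodedata.normalize("NFKD", label)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return set(re.findall(r"[a-z]+", unaccented.lower()))


def naive_match(label_a, label_b):
    """Declare a mapping whenever the two labels share at least one token."""
    return bool(tokens(label_a) & tokens(label_b))


# Label pairs taken from the slide: (source A, label A, source B, label B).
pairs = [
    ("FMA", "colon", "GAZ", "Colón, Panama"),
    ("FMA", "tibia", "FlyAnatomy", "tibia"),
    ("TaxRank", "domain", "NCI", "protein domain"),
    ("ZfishAnat", "hypophysis", "MouseAnatomy", "pituitary"),
]

for source_a, label_a, source_b, label_b in pairs:
    verdict = "mapped" if naive_match(label_a, label_b) else "not mapped"
    print(f"{source_a}: {label_a}  ~  {source_b}: {label_b}  ->  {verdict}")
```

The first three pairs are reported as mapped even though the classes mean different things, while hypophysis and pituitary, which name the same structure, are missed because they share no word.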

Page 6: NCBO haendel talk 2013

Topics for today

• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile – integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries

Page 7: NCBO haendel talk 2013

CTSAconnect: A Linked Open Data approach to represent clinical and research expertise, activities, and resources

CTSA 10-001: 100928SB23
PROJECT #: 00921-0001

Page 8: NCBO haendel talk 2013

Research generates many resources that are rarely shared or published:

About eagle-i: inventories “invisible” resources

Ontology-system for collecting and querying research resources

eagle-i.net

Page 9: NCBO haendel talk 2013

About VIVO

• Primarily focused on people, activities, and outcomes typically associated with research networking
• Eager to represent more diverse components of expertise across domains, e.g., exhibits, performances, specifics about research
• Had worked with core facilities at Cornell to represent labs, equipment, and services
• Started collaborating with eagle-i to go further with research resources

Page 10: NCBO haendel talk 2013

At the intersection of VIVO and eagle-i

Page 11: NCBO haendel talk 2013

www.ctsaconnect.org

CTSAconnect: Reveal Connections. Realize Potential.

And then was born the “CTSAconnect” project

Ok, so it is perhaps not a very informative name for an effort to consolidate researcher, research activities, and research resource representation, but what else are we going to call it?

ARG! The Agents, Resources, and Grants ontology

Page 12: NCBO haendel talk 2013

ISF Content and modularization

[Figure: three sources feed the ISF]
• eagle-i: Research resources
• VIVO: Person profiling
• ShareCenter: Discussions, requests, share documents

ISF modules shown: Contact, Organizations, Affiliations, Services, Events, Clinical Expertise, Reagents, Organisms, Credentials

Page 13: NCBO haendel talk 2013

ISF Modularization

Constraints
• Different ontology modeling principles
• Active ongoing development of eagle-i and VIVO applications
• Investments in existing RDF datasets and the need for stable targets

Benefits
• Flexibility in what modules to populate at a given site
• Extensibility as needs and feedback influence future evolution

Page 14: NCBO haendel talk 2013

Protégé refactoring plugin

The annotation view shows changes that are approved or pending approval. The module view shows pending axiom changes per module, can save the changes with a log comment, and can generate the spreadsheet summary.

Page 15: NCBO haendel talk 2013

ISF Merging

Page 16: NCBO haendel talk 2013

Relating ICD9 to MeSH in support of clinical expertise

Page 17: NCBO haendel talk 2013

Clinical expertise data visualization

Page 18: NCBO haendel talk 2013

Building translational teams

We want to assemble teams of scientists to examine, for example, specific drugs released for repurposing

Hard to identify and connect complementary basic and clinical expertise across disciplines

Page 19: NCBO haendel talk 2013

Bringing together clinical expertise and basic science expertise

Representation of a clinician's expertise extracted from ICD-9 codes

Basic researcher with similar expertise based on MeSH terms

Resources: a resource related to autoimmune disease

Page 20: NCBO haendel talk 2013

Relating researchers across disciplines

Page 21: NCBO haendel talk 2013

Topics for today

• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile – integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries

Page 22: NCBO haendel talk 2013

OHSU’s Biolibrary and Search Engine

Data aggregated from two repositories:
– Department of Pathology repository (600K)
– Knight Cancer Institute repository (16K)

A web-based search engine over de-identified data

Our group is applying semantic informatics to improve
– Data format and quality
– Data integration across the two repositories
– Search capabilities

Funded by Medical Research Foundation of Oregon

Page 23: NCBO haendel talk 2013

Opportunities for improving the Biolibrary data

Limited anatomical data
– Cancer registry table has 300+ anatomical entities
– Pathology table only 86
– 99% of pathology reports (600K) have no anatomical codes
– No anatomical relationships
– Coded sites are not as specific as descriptions in the pathology reports

Page 24: NCBO haendel talk 2013

Current Search Interface

Two separate search interfaces

Multiple forms

Page 25: NCBO haendel talk 2013

Biolibrary Text Search

Syntactic free text search

Page 26: NCBO haendel talk 2013

Coded Syntactic Search

Search through anatomy and histology lists

Page 27: NCBO haendel talk 2013

Extracting ontology concepts

Pathology reports were the main focus
– Main source of data in the current system
– Contain richer information

NLP tools were used to identify concepts

Existing ontology resources were used to add semantics

Page 28: NCBO haendel talk 2013

Developing a Biospecimen ontology

Classes, Types, Vocabulary
• Phenotypes (PATO)
• Information Ontology (IAO)
• Anatomy (FMA, Uberon)
• Medicine (OGMS)
• HPO, SNOMED, NCI Thesaurus, ICDO/ICD9, GO, CHEBI, Cell

Data, Instances
• Pathology Catalog
• Pathology Inventory
• Pathology Report
• Instance #123, Instance #456

Relations shown: Instantiates, Classifies as, Uses

Page 29: NCBO haendel talk 2013

Structured data vs. pathology report (about 7K cases)

Available structured data from one case:

However, the pathology report also includes:
• Low grade pancreatic intraepithelial neoplasia
• Extensive perineural invasion
• Acute and chronic cholecystitis
• Bile duct tissue with chronic inflammation
• Chronic pancreatitis
• Acute gastric serositis

Page 30: NCBO haendel talk 2013

Adding Logical Relationships

• About 400 anatomical entities were mapped to the Foundational Model of Anatomy
• An additional 300 to SNOMED
• Used the is_a and part_of relations
• Re-represented this in a semantic and computable format
• Allows for semantic queries
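A minimal sketch of what a semantic query over is_a and part_of relations might look like: a search for a broad anatomical site is expanded to every more specific site beneath it before specimens are matched. The edges and specimen records below are hypothetical and are not taken from FMA, SNOMED, or the Biolibrary; treating is_a and part_of as one combined hierarchy is a simplification made for brevity.

```python
from collections import defaultdict

# Hypothetical edges (child, relation, parent) for illustration only.
EDGES = [
    ("head of pancreas", "part_of", "pancreas"),
    ("pancreas", "part_of", "digestive system"),
    ("common bile duct", "is_a", "bile duct"),
    ("bile duct", "part_of", "biliary system"),
    ("biliary system", "part_of", "digestive system"),
]

# Hypothetical specimen records: specimen ID -> coded anatomical site.
SPECIMENS = {
    "S1": "head of pancreas",
    "S2": "common bile duct",
    "S3": "femur",
}


def build_children(edges):
    """Index the edges so each parent maps to its direct children."""
    children = defaultdict(set)
    for child, _relation, parent in edges:
        children[parent].add(child)
    return children


def expand(term, children):
    """Return the term plus everything below it via is_a or part_of."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


CHILDREN = build_children(EDGES)


def find_specimens(query_term):
    """Return IDs of specimens whose site is the query term or lies beneath it."""
    wanted = expand(query_term, CHILDREN)
    return sorted(sid for sid, site in SPECIMENS.items() if site in wanted)


print(find_specimens("digestive system"))
print(find_specimens("pancreas"))
```

A purely syntactic search for "digestive system" would match neither specimen; the expanded query finds both because the relationships, not the strings, connect them.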

Page 31: NCBO haendel talk 2013

Considerations

• Concept mapping helps with document retrieval
• Does not necessarily imply a fact
  – Negation
  – Differential diagnosis
  – Past case history
• Researchers will likely need aggregated facts from multiple sources to support real research queries
• Information extraction options are being explored as part of this work
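To make the point about negation and past history concrete, here is a minimal sketch of a trigger-phrase check, loosely in the spirit of NegEx-style rules. The trigger lists, the five-word window, and the example sentences are assumptions chosen for illustration; this is not the information extraction approach used in the project, and it does not handle differential diagnoses.

```python
# Hypothetical trigger phrases; a real system would need a far richer list.
NEGATION_TRIGGERS = ("no evidence of", "negative for", "without", "no")
HISTORY_TRIGGERS = ("history of", "status post")


def classify_mention(sentence, concept):
    """Label a concept mention as 'absent', 'negated', 'historical', or 'asserted'."""
    text = sentence.lower()
    position = text.find(concept.lower())
    if position == -1:
        return "absent"
    # Inspect only the few words immediately before the concept.
    window = " ".join(text[:position].split()[-5:])
    padded = f" {window} "
    if any(f" {trigger} " in padded for trigger in NEGATION_TRIGGERS):
        return "negated"
    if any(f" {trigger} " in padded for trigger in HISTORY_TRIGGERS):
        return "historical"
    return "asserted"


examples = [
    ("No evidence of perineural invasion.", "perineural invasion"),
    ("Extensive perineural invasion is present.", "perineural invasion"),
    ("History of chronic pancreatitis.", "chronic pancreatitis"),
]

for sentence, concept in examples:
    print(f"{classify_mention(sentence, concept):10s} {sentence}")
```

All three sentences would be retrieved by a concept search, but only the second one states that the finding is present in the current case.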

Page 32: NCBO haendel talk 2013

Topics for today

• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile – integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries

Page 33: NCBO haendel talk 2013

We want to understand gene function across taxa

[Figure: taxa shown: Vertebrata, Ascidians, Arthropoda, Annelida, Mollusca, Echinodermata; structures labeled: tetrapod limbs, ampullae, tube feet, parapodia]

Page 34: NCBO haendel talk 2013

Databasing phenotypes is hard

• Free text descriptions
• Clinical note
• Models
• Atlases
• Images
• Controlled terms
• Multiple file formats
• Measurements
• …

Sequence data: ATTCGGATTACCGTATTA… (genes, regulatory elements, …)

Page 35: NCBO haendel talk 2013

Databases proliferate

Sequence data: ATTCGGATTACCGTATTA… (genes, regulatory elements, …)

Page 36: NCBO haendel talk 2013

Ontologies as a tool for unification

[Figure labels]
Data: Disease-Phenotype databases, Expression data, Gene function data
Ontologies: Disease phenotype ontology, Cell and tissue ontology, GO
Links: annotations, ontologies

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1). doi:10.1038/75556

Page 37: NCBO haendel talk 2013

Yet problems remain

• Incomplete data
• Not connected
• Missing & incorrect annotations
• Multiple overlapping ontologies
• Annotations miss the important biology

Page 38: NCBO haendel talk 2013

Ontologies built for one species will not work for others

http://fme.biostr.washington.edu:8080/FME/index.html

http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html

Page 39: NCBO haendel talk 2013

Uberon: a multi-species anatomy ontology

• Contents:
  – Over 8,000 classes (terms)
  – Multiple relationships, including subclass, part-of and develops-from
• Scope: metazoa (animals)
  – Current focus is chordates
  – Federated approach for other taxa
• Uberon classes are generic / species neutral
  – ‘mammary gland’: you can use this class for any mammal!
  – ‘lung’: you can use this class for any vertebrate (that has lungs)

Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 http://genomebiology.com/2012/13/1/R5

Page 40: NCBO haendel talk 2013

Bridging anatomy ontologies

[Figure: Uberon at the center, bridging ZFA, MA, FMA, EHDAA2, EMAPA, SNOMED, NCIt, GO, and CL]

Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5.

Page 41: NCBO haendel talk 2013

Uberon enables queries across granularity

[Figure: UBERON classes (cerebellum, cerebellar vermis, cerebellum posterior lobe) linked to the corresponding MA (mouse) classes (cerebellum, cerebellar vermis, cerebellum posterior lobe) and FMA (human) classes (cerebellum, vermis of cerebellum, posterior lobe of cerebellum), together with CL: Purkinje cell and its GO/NIF subcellular parts (axon, dendrite); edges are labeled "i" and "p"]
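A minimal sketch of the bridging idea: annotations made against species-specific anatomy classes are translated to a species-neutral class, and the query then walks part-of links at that neutral level. The string identifiers, the bridge table, the single part-of link, and the gene annotations are all hypothetical and simplified; they do not reproduce real Uberon, MA, or FMA identifiers or axioms.

```python
# Hypothetical bridge: species-specific class -> species-neutral class.
BRIDGE = {
    "MA:cerebellum": "UBERON:cerebellum",
    "FMA:cerebellum": "UBERON:cerebellum",
    "MA:cerebellar vermis": "UBERON:cerebellar vermis",
    "FMA:vermis of cerebellum": "UBERON:cerebellar vermis",
}

# Hypothetical part-of links among the species-neutral classes (child -> parent).
PART_OF = {"UBERON:cerebellar vermis": "UBERON:cerebellum"}

# Hypothetical annotations: (annotated entity, species-specific anatomy class).
ANNOTATIONS = [
    ("mouse gene A", "MA:cerebellar vermis"),
    ("human gene B", "FMA:vermis of cerebellum"),
    ("human gene C", "FMA:cerebellum"),
    ("mouse gene D", "MA:retina"),
]


def within(neutral_class, target):
    """Return True if neutral_class is the target or a transitive part of it."""
    current = neutral_class
    while current is not None:
        if current == target:
            return True
        current = PART_OF.get(current)
    return False


def query(target):
    """Return annotated entities from any species located in the target structure."""
    results = []
    for entity, species_class in ANNOTATIONS:
        neutral = BRIDGE.get(species_class)
        if neutral is not None and within(neutral, target):
            results.append(entity)
    return results


print(query("UBERON:cerebellum"))
print(query("UBERON:cerebellar vermis"))
```

The same query answers at two levels of granularity and across two species without the mouse and human ontologies being mapped directly to each other.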

Page 42: NCBO haendel talk 2013

Niknejad, A., Comte, A., Parmentier, G., Roux, J., Frederic, B., & Robinson-Rechavi, M. (2011). vHOG, a multi-species vertebrate ontology of homologous organs groups. Ecology, 1-5.

http://bgee.unil.ch

Page 43: NCBO haendel talk 2013

Evo-devo applications

Dahdul, et al. 2010. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS ONE 5(5):e10708. doi:10.1371/journal.pone.0010708

Page 44: NCBO haendel talk 2013

The Monarch Initiative
The model systems research network

We are under construction

Goals are to:
• Aggregate model systems genotype and phenotype information
• Integrate with network, genomic, and functional data
• Leverage ontologies for phenotype similarity matching
• Build knowledge exploration tools for end users
• Build services for other applications

Funded by NIH # 1R24OD011883-01
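As a minimal sketch of what ontology-based phenotype similarity matching can mean, the following compares two phenotype profiles by the overlap of their ancestor sets in a hierarchy (a Jaccard score). The hierarchy and profiles are hypothetical, and this simple score is an illustrative stand-in; it is not presented as the algorithm used by OWLSim or by Monarch.

```python
# Hypothetical phenotype hierarchy (child -> parents); not real ontology content.
PARENTS = {
    "small eye": ["abnormal eye size"],
    "large eye": ["abnormal eye size"],
    "abnormal eye size": ["abnormal eye morphology"],
    "abnormal eye morphology": ["abnormal phenotype"],
    "short tail": ["abnormal tail morphology"],
    "abnormal tail morphology": ["abnormal phenotype"],
}


def closure(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, []))
    return seen


def jaccard(profile_a, profile_b):
    """Score two phenotype profiles by shared ancestors over all ancestors."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


human_disease = {"small eye"}   # hypothetical disease profile
mouse_model_1 = {"large eye"}   # hypothetical model profile
mouse_model_2 = {"short tail"}  # hypothetical model profile

print(round(jaccard(human_disease, mouse_model_1), 3))
print(round(jaccard(human_disease, mouse_model_2), 3))
```

Although none of the three profiles share an exact term, the first model scores higher because its phenotype shares more of the hierarchy with the disease profile.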

Page 45: NCBO haendel talk 2013

Can we search by phenotype alone?

Washington NL, Haendel MA, Mungall CJ, Ashburner M, et al. (2009) Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol 7(11): e1000247. doi:10.1371/journal.pbio.1000247 http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000247

Page 46: NCBO haendel talk 2013

Integrating phenotypes using ontologies

Page 47: NCBO haendel talk 2013

But... different organisms record genotypes differently

Phenotypes can be attached to full or partial genotypes, alleles, or variants

Page 48: NCBO haendel talk 2013

Pulling it together

[Figure labels]
• Model systems phenotype and genotype data
• NIF DISCO: Extensible Web resource DISCOvery, registration and interoperation framework
• Data ingest
• Ontology annotation
• OWLSIM
• ONTOQUEST
• MONARCH tools and services
• Enabling phenotype-based knowledge discovery tools

Page 49: NCBO haendel talk 2013
Page 50: NCBO haendel talk 2013

These integration projects…well, integrate

CTSAconnect: Reveal Connections. Realize Potential.

OHSU Biolibrary

[Figure labels: people, research resources, clinical encounters, phenotypes, biospecimens, genes, variations]

Page 51: NCBO haendel talk 2013

Conclusions

Ontologies have provided us the capability to integrate a variety of biomedical data, at different levels of granularity, from different applications, and across domains

Describing biology works best with multiple connected ontologies

We need smart data, not just big data

We need better tools to integrate multiple ontologies

We need better tools to make use of smarter data structures (e.g. reasoning costs)

Page 52: NCBO haendel talk 2013

Monarch Initiative

CTSAconnect

Biospecimen Ontology

OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Chris Kelleher, Shahim Essaid

Cornell University: Dean Krafft, Jon Corson-Rikert, Brian Lowe

University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack

OHSU: Melissa Haendel, Shahim Essaid, Carlo Torniai

OHSU: Melissa Haendel, Carlo Torniai, Shahim Essaid, Nicole Vasilevsky, Scott Hoffman, Matt Brush

LBNL: Chris Mungall, Suzi Lewis, Nicole Washington

UCSD/NIF: Maryann Martone, Anita Bandrowski, Jeff Grethe, Amarnath Gupta

Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos

Harvard University: Daniela Bourges, Sophia Cheng

University at Buffalo: Barry Smith, Dagobert Soergel

Zaloni: Will Corbett, Ranjit Das, Ben Sharma

University of Pittsburgh: Harry Hochheiser, Chuck Borromeo