ncbo haendel talk 2013
Part of the NCBO seminar series http://www.bioontology.org/webinar-series
Removing roadblocks: leveraging ontologies for data aggregation and computation
NCBO Seminar series, March 6th, 2013
Melissa Haendel, on behalf of very many team members
Topics for today
• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
– A more complete research profile: integrating research resources and person information
– Improving queries across multiple biospecimen repositories
– Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
The Research Symbiosis
[Diagram: Get funding, Do experiments, Consult databases, Contribute to databases, Share resources/data, Publish papers, The Web]
We’ve all been here before:
Ontologies can help us do better.
OMIM query                 # of records
"large bone"               1032
"enlarged bone"            207
"big bones"                22
"huge bones"               4
"massive bones"            39
"hyperplastic bones"       12
"hyperplastic bone"        44
"bone hyperplasia"         173
"increased bone growth"    836
Why not just map to ontology terms?

Class A | Class B | Mapped? | Useful?
FMA: extensor retinaculum of wrist | MouseAnatomy: retina | Yes | No
Vivo: legal decision | Cognitive Atlas: decision | Yes | No
PlantOntology: Pith | MouseAnatomy: medulla | Yes | No
TaxRank: domain | NCI: protein domain | Yes | No
ZfishAnat: hypophysis | MouseAnatomy: pituitary | No | Yes
FMA: tibia | FlyAnatomy: tibia | Yes | No
FMA: colon | GAZ: Colón, Panama | Yes | No
Quality: male | Chebi: maleate 2(-) | Yes | No
Mapping requires manual work to perform and maintain; string matching can produce spurious mappings (and miss real ones, such as hypophysis and pituitary); and the semantics and provenance of mappings are not always clear.
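To make the string-matching problem concrete, here is a minimal sketch of purely lexical matching over a few labels from the table above. The vocabularies, the word-overlap rule, and the function names are assumptions for illustration only; no real mapping tool is being reproduced.

```python
import unicodedata

# Toy vocabularies built from labels in the mapping table (ontology IDs omitted).
SOURCE_TERMS = {
    "FMA": ["tibia", "colon"],
    "ZfishAnat": ["hypophysis"],
}
TARGET_TERMS = {
    "FlyAnatomy": ["tibia"],
    "GAZ": ["Colón, Panama"],
    "MouseAnatomy": ["pituitary"],
}


def normalize(label: str) -> str:
    """Lower-case a label and strip accents, so 'Colón' compares equal to 'colon'."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def words(label: str) -> set:
    """Split a normalized label into a set of words, ignoring commas."""
    return set(normalize(label).replace(",", " ").split())


def naive_matches(source, target):
    """Propose a mapping whenever two labels share at least one word (purely lexical)."""
    pairs = []
    for s_onto, s_labels in source.items():
        for s_label in s_labels:
            for t_onto, t_labels in target.items():
                for t_label in t_labels:
                    if words(s_label) & words(t_label):
                        pairs.append((f"{s_onto}: {s_label}", f"{t_onto}: {t_label}"))
    return pairs


if __name__ == "__main__":
    for pair in naive_matches(SOURCE_TERMS, TARGET_TERMS):
        print(pair)
```

Both proposed pairs (human tibia with fly tibia, colon with Colón, Panama) are spurious, while the genuinely equivalent hypophysis and pituitary are never proposed, because their labels share no words.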
CTSAconnect: A Linked Open Data approach to represent clinical and research expertise, activities, and resources
CTSA 10-001: 100928SB23
PROJECT #: 00921-0001
Research generates many resources that are rarely shared or published.
About eagle-i: an inventory of “invisible” resources
An ontology-based system for collecting and querying research resources
eagle-i.net
About VIVO
• Primarily focused on people, activities, and outcomes typically associated with research networking
• Eager to represent more diverse components of expertise across domains, e.g., exhibits, performances, and specifics about research
• Had worked with core facilities at Cornell to represent labs, equipment, and services
• Started collaborating with eagle-i to go further with research resources
At the intersection of VIVO and eagle-i
www.ctsaconnect.org
CTSAconnect: Reveal Connections. Realize Potential.
And so the “CTSAconnect” project was born
Ok, so it is perhaps not a very informative name for an effort to consolidate researcher, research activities, and research resource representation, but what else are we going to call it?
ARG! The Agents, Resources, and Grants ontology
ISF content and modularization
eagle-i: research resources
VIVO: person profiling
ShareCenter: discussions, requests, shared documents
ISF modules: Contact, Organizations, Affiliations, Services, Events, Clinical Expertise, Reagents, Organisms, Credentials
ISF Modularization
Constraints
• Different ontology modeling principles
• Active ongoing development of eagle-i and VIVO applications
• Investments in existing RDF datasets and the need for stable targets
Benefits
• Flexibility in which modules to populate at a given site
• Extensibility as needs and feedback influence future evolution
Protégé refactoring plugin
The annotation view shows what is approved or pending approval. The module view shows pending axiom changes per module, can save the changes with a log comment, and can generate the spreadsheet summary.
ISF Merging
Relating ICD9 to MeSH in support of clinical expertise
Clinical expertise data visualization
Building translational teams
We want to assemble teams of scientists to examine, for example, specific drugs released for repurposing
Hard to identify and connect complementary basic and clinical expertise across disciplines
Bringing together clinical expertise and basic science expertise
[Figure: a clinician’s expertise, extracted from ICD-9 codes; a basic researcher with similar expertise, based on MeSH terms; resources, e.g., a resource related to autoimmune disease]
Relating researchers across disciplines
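One way to picture the ICD-9 to MeSH bridge is a minimal sketch that translates a clinician's codes into MeSH labels and ranks researchers by term overlap. The crosswalk entries, the researcher profiles, and the choice of Jaccard overlap as the score are all illustrative assumptions, not the CTSAconnect method. (714.0 and 710.0 are the ICD-9-CM codes for rheumatoid arthritis and systemic lupus erythematosus; their MeSH targets here are hand-picked.)

```python
# Hypothetical crosswalk from ICD-9-CM codes to MeSH descriptor labels.
# A real system would use a curated mapping resource, not a hand-made dict.
ICD9_TO_MESH = {
    "714.0": {"Arthritis, Rheumatoid", "Autoimmune Diseases"},
    "710.0": {"Lupus Erythematosus, Systemic", "Autoimmune Diseases"},
}

# Hypothetical researcher profiles: MeSH labels drawn from each person's publications.
RESEARCHERS = {
    "researcher_A": {"Arthritis, Rheumatoid", "T-Lymphocytes"},
    "researcher_B": {"Neoplasms"},
}


def clinician_profile(icd9_codes):
    """Translate a clinician's ICD-9 codes into a set of MeSH labels."""
    profile = set()
    for code in icd9_codes:
        profile |= ICD9_TO_MESH.get(code, set())
    return profile


def jaccard(a, b):
    """Shared terms divided by all terms; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_researchers(icd9_codes, researchers):
    """Rank researchers by MeSH overlap with the clinician's translated profile."""
    profile = clinician_profile(icd9_codes)
    scores = [(name, jaccard(profile, terms)) for name, terms in researchers.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank_researchers(["714.0"], RESEARCHERS))
```

The point of routing through MeSH is that clinical codes and publication indexing never share identifiers directly; a common vocabulary gives them something to overlap on.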
OHSU’s Biolibrary and Search Engine
• Data aggregated from two repositories:
– Department of Pathology repository (600K)
– Knight Cancer Institute repository (16K)
• A web-based search engine over de-identified data
• Our group is applying semantic informatics to improve:
– Data format and quality
– Data integration across the two repositories
– Search capabilities
Funded by the Medical Research Foundation of Oregon
Opportunities for improving the Biolibrary data
• Limited anatomical data
– Cancer registry table has 300+ anatomical entities
– Pathology table has only 86
– 99% of pathology reports (600K) have no anatomical codes
– No anatomical relationships
– Coded sites are not as specific as descriptions in the pathology reports
Current Search Interface
• Two separate search interfaces
• Multiple forms
Biolibrary Text Search: syntactic free-text search
Coded Syntactic Search: search through anatomy and histology lists
Extracting ontology concepts
• Pathology reports were the main focus
– Main source of data in the current system
– Contain richer information
• NLP tools were used to identify concepts
• Existing ontology resources were used to add semantics
Developing a Biospecimen ontology
[Diagram. Reused ontologies: Phenotypes (PATO), Information Ontology (IAO), HPO, SNOMED, NCI Thesaurus, ICDO/ICD9, GO, CHEBI, Cell, Anatomy (FMA, Uberon), Medicine (OGMS). Levels: classes, types, vocabulary; data, instances. Components: Pathology Catalog; Pathology Inventory; Pathology Report; Instance #123; Instance #456. Relations: instantiates, classifies as, uses.]
Structured data vs. pathology report (about 7K cases)
Available structured data from one case: (image not reproduced)
However, the pathology report also includes:
• Low grade pancreatic intraepithelial neoplasia
• Extensive perineural invasion
• Acute and chronic cholecystitis
• Bile duct tissue with chronic inflammation
• Chronic pancreatitis
• Acute gastric serositis
Adding Logical Relationships
• About 400 anatomical entities were mapped to the Foundational Model of Anatomy
• An additional 300 were mapped to SNOMED
• Used the is_a and part_of relations
• Re-represented this in a semantic and computable format
• Allows for semantic queries
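A minimal sketch of what "semantic query" buys here, assuming a tiny hand-made anatomy graph and specimen table (labels and specimen IDs are hypothetical, not actual FMA or SNOMED content): a query for an anatomical term follows is_a and part_of links downward and returns specimens coded to any more specific site.

```python
from collections import deque

# Hypothetical anatomy edges (child, relation, parent); labels stand in for FMA/SNOMED IDs.
EDGES = [
    ("head of pancreas", "part_of", "pancreas"),
    ("pancreatic duct", "part_of", "pancreas"),
    ("main pancreatic duct", "is_a", "pancreatic duct"),
    ("common bile duct", "part_of", "biliary tree"),
]

# Hypothetical specimens, each coded to one anatomical site.
SPECIMENS = {
    "S1": "head of pancreas",
    "S2": "main pancreatic duct",
    "S3": "common bile duct",
    "S4": "pancreas",
}


def descendants(term, edges):
    """Return term plus every term reachable downward through is_a or part_of edges."""
    children = {}
    for child, relation, parent in edges:
        if relation in ("is_a", "part_of"):
            children.setdefault(parent, []).append(child)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def query_specimens(term):
    """Specimens coded to the term itself or to any of its parts or subtypes."""
    targets = descendants(term, EDGES)
    return sorted(sid for sid, site in SPECIMENS.items() if site in targets)


if __name__ == "__main__":
    print(query_specimens("pancreas"))
```

A plain string search for "pancreas" would miss S2 ("main pancreatic duct"); the graph traversal finds it because main pancreatic duct is_a pancreatic duct, which is part_of pancreas.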
Considerations
• Concept mapping helps with document retrieval
• A mapped concept does not necessarily imply a fact, because of:
– Negation
– Differential diagnosis
– Past case history
• Researchers will likely need aggregated facts from multiple sources to support real research queries
• Information extraction options are being explored as part of this work
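To illustrate why a concept hit is not a fact, here is a minimal negation check in the spirit of the NegEx approach: look for a negation trigger shortly before the concept mention. The trigger list and the five-token window are assumptions chosen for this sketch; a real NegEx-style system uses a much larger curated trigger list and scope rules, and this code does not handle differential diagnosis or past history at all.

```python
import re

# A tiny, hypothetical subset of negation triggers.
NEGATION_TRIGGERS = ["no evidence of", "negative for", "without", "no"]


def is_negated(sentence, concept, window=5):
    """True if any mention of concept is preceded by a trigger within `window` tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = re.findall(r"[a-z]+", concept.lower())
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = " ".join(tokens[max(0, i - window):i])
            for trigger in NEGATION_TRIGGERS:
                # Word boundaries keep "no" from matching inside words like "node".
                if re.search(rf"\b{re.escape(trigger)}\b", preceding):
                    return True
    return False


if __name__ == "__main__":
    print(is_negated("No evidence of perineural invasion.", "perineural invasion"))
    print(is_negated("Extensive perineural invasion.", "perineural invasion"))
```

Both sentences would map to the same concept, but only the second asserts that the finding is present.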
[Figure: taxa (Vertebrata, Ascidians, Arthropoda, Annelida, Mollusca, Echinodermata) with example structures (tetrapod limbs, ampullae, tube feet, parapodia)]
We want to understand gene function across taxa
Databasing phenotypes is hard
• Free text descriptions
• Clinical notes
• Models
• Atlases
• Images
• Controlled terms
• Multiple file formats
• Measurements
• …
[Figure: sequence data, e.g., ATTCGGATTACCGTATTA…, genes, regulatory elements, …]
Databases proliferate
Ontologies as a tool for unification
[Diagram: disease-phenotype databases, expression data, and gene function data connected through annotations to ontologies, including a disease phenotype ontology, a cell and tissue ontology, and GO]
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1). doi:10.1038/75556
Yet problems remain
• Incomplete data
• Ontologies that are not connected
• Missing and incorrect annotations
• Multiple overlapping ontologies
• Annotations miss the important biology
Ontologies built for one species will not work for others
http://fme.biostr.washington.edu:8080/FME/index.html
http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html
Uberon: a multi-species anatomy ontology
• Contents:
– Over 8,000 classes (terms)
– Multiple relationships, including subclass, part-of and develops-from
• Scope: metazoa (animals)
– Current focus is chordates
– Federated approach for other taxa
• Uberon classes are generic / species neutral
– ‘mammary gland’: you can use this class for any mammal!
– ‘lung’: you can use this class for any vertebrate (that has lungs)
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 http://genomebiology.com/2012/13/1/R5
Bridging anatomy ontologies
[Diagram: Uberon bridging the species-specific anatomy ontologies ZFA, MA, FMA, EHDAA2, and EMAPA]
CJ Mungall, C Torniai, GV Gkoutos, SE Lewis, MA Haendel.Uberon, an integrative multi-species anatomy ontology. Genome biology 13 (1), R5
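The bridging idea can be sketched as a lookup from species-specific terms to a shared, species-neutral class, after which annotations from different species can be pooled. This reuses the hypophysis/pituitary pair from the mapping table earlier; the bridge entries, annotation records, and function name are hypothetical, and real bridging uses class IDs and ontology axioms rather than label lookups.

```python
# Hypothetical bridge: (species ontology, species-specific label) -> generic Uberon class label.
BRIDGE = {
    ("ZFA", "hypophysis"): "pituitary gland",
    ("MA", "pituitary"): "pituitary gland",
    ("FMA", "cerebellum"): "cerebellum",
    ("MA", "cerebellum"): "cerebellum",
}

# Hypothetical annotations: (source ontology, anatomy label, gene or record id).
ANNOTATIONS = [
    ("ZFA", "hypophysis", "zebrafish_gene_1"),
    ("MA", "pituitary", "mouse_gene_1"),
    ("FMA", "cerebellum", "human_record_1"),
    ("ZFA", "pectoral fin", "zebrafish_gene_2"),  # deliberately absent from BRIDGE
]


def group_by_uberon(annotations, bridge):
    """Pool annotations under shared Uberon classes; set aside those with no bridge entry."""
    grouped = {}
    unmapped = []
    for onto, label, item in annotations:
        uberon_class = bridge.get((onto, label))
        if uberon_class is None:
            unmapped.append((onto, label, item))
        else:
            grouped.setdefault(uberon_class, []).append((onto, item))
    return grouped, unmapped


if __name__ == "__main__":
    grouped, unmapped = group_by_uberon(ANNOTATIONS, BRIDGE)
    print(grouped)
    print(unmapped)
```

Zebrafish "hypophysis" and mouse "pituitary" end up under the same class even though their labels share nothing, which is exactly the case lexical matching missed.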
[Figure: a cerebellum example spanning UBERON, SNOMED, NCIt, GO, and CL. Terms shown include cerebellum, cerebellar vermis, and posterior lobe of cerebellum (MA: mouse; FMA: human), CL: Purkinje cell, and axon and dendrite (GO/NIF: subcellular), connected by edges labelled p and i.]
Uberon enables queries across granularity
Niknejad, A., Comte, A., Parmentier, G., Roux, J., Frederic, B., & Robinson-Rechavi, M. (2011). vHOG, a multi-species vertebrate ontology of homologous organs groups. Ecology, 1-5.
http://bgee.unil.ch
Evo-devo applications
Dahdul, et al. 2010. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS ONE 5(5):e10708. doi:10.1371/journal.pone.0010708
The Monarch Initiative: The model systems research network
We are under construction
Goals are to:
• Aggregate model systems genotype and phenotype information
• Integrate with network, genomic, and functional data
• Leverage ontologies for phenotype similarity matching
• Build knowledge exploration tools for end users
• Build services for other applications
Funded by NIH # 1R24OD011883-01
Can we search by phenotype alone?
Washington NL, Haendel MA, Mungall CJ, Ashburner M, et al. (2009) Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol 7(11): e1000247. doi:10.1371/journal.pbio.1000247 http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000247
Integrating phenotypes using ontologies
But different organisms record genotypes differently
Phenotypes can be attached to full or partial genotypes, alleles, or variants
Model systems phenotype and genotype data
Pulling it together
NIF DISCO: Extensible Web resource DISCOvery, registration and interoperation framework
Data ingest; ontology annotation
OWLSIM: enabling phenotype-based knowledge discovery tools
ONTOQUEST
MONARCH tools and services
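The phenotype similarity matching goal can be sketched with one simple ontology-aware score: the Jaccard overlap of the inferred superclass sets of two phenotype profiles. This is not OWLSim's implementation. The hierarchy and profiles are hypothetical, and production tools typically also weight terms by information content.

```python
# Hypothetical is_a hierarchy of phenotype terms (child -> list of parents).
PARENTS = {
    "skeletal abnormality": ["phenotypic abnormality"],
    "abnormal bone size": ["skeletal abnormality"],
    "enlarged bone": ["abnormal bone size"],
    "short bone": ["abnormal bone size"],
    "neurological abnormality": ["phenotypic abnormality"],
    "seizures": ["neurological abnormality"],
}


def ancestors(term):
    """Return the term plus every superclass reachable through is_a links."""
    result = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def closure(profile):
    """Union of the ancestor sets of every term in a phenotype profile."""
    terms = set()
    for term in profile:
        terms |= ancestors(term)
    return terms


def sim_j(profile_a, profile_b):
    """Jaccard similarity of the two profiles' inferred superclass sets."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    disease = {"enlarged bone"}
    models = {"model_A": {"short bone"}, "model_B": {"seizures"}}
    for name, profile in models.items():
        print(name, round(sim_j(disease, profile), 3))
```

Neither model shares an exact term with the disease profile, yet model_A scores higher because its phenotype sits under the same "abnormal bone size" branch. Exact-term matching would score both at zero.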
These integration projects…well, integrate
[Diagram: CTSAconnect (“Reveal Connections. Realize Potential.”) and the OHSU Biolibrary, connecting people, research resources, clinical encounters, phenotypes, biospecimens, genes, and variations]
Conclusions
• Ontologies have provided us the capability to integrate a variety of biomedical data, at different levels of granularity, from different applications, and across domains
• Describing biology works best with multiple connected ontologies
• We need smart data, not just big data
• We need better tools to integrate multiple ontologies
• We need better tools to make use of smarter data structures (e.g., reasoning costs)
Monarch Initiative
CTSAconnect
Biospecimen Ontology
OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Chris Kelleher, Shahim Essaid
Cornell University: Dean Krafft, Jon Corson-Rikert, Brian Lowe
University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack
OHSU: Melissa Haendel, Shahim Essaid, Carlo Torniai
OHSU: Melissa Haendel, Carlo Torniai, Shahim Essaid, Nicole Vasilevsky, Scott Hoffman, Matt Brush
LBNL: Chris Mungall, Suzi Lewis, Nicole Washington
UCSD/NIF: Maryann Martone, Anita Bandrowski, Jeff Grethe, Amarnath Gupta
Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos
Harvard University: Daniela Bourges, Sophia Cheng
University at Buffalo: Barry Smith, Dagobert Soergel
Zaloni: Will Corbett, Ranjit Das, Ben Sharma
University of Pittsburgh: Harry Hochheiser, Chuck Borromeo