Copyright © Daniel Rubin Stanford University [email protected] 1
The National Center for Biomedical Ontology
Daniel Rubin, MD, MS, Stanford Medical Informatics
Stanford – Berkeley – Mayo – Victoria – Buffalo – UCSF – Oregon – Cambridge
http://www.bioontology.org
Copyright © Daniel L. Rubin 2006
The biomedical data explosion
● Explosion in online biomedical data
  – Genomics (genetic sequences, SNPs)
  – Gene expression microarrays
  – Proteomics (mass spectrometry, protein arrays)
  – Tissue arrays, IHC
● Need for people & machines to make sense of massive data sets
Biomedical researchers use ontologies
● Controlled vocabulary for science
● Representation of biomedical knowledge, shared by humans and computers
● Terms for annotating experimental data
● Knowledge source for biomedical applications
  – Decision support
  – Natural language processing
  – Data integration
Ontologies are popping up everywhere
Ontology development is fragmented
● Many different groups/consortia create ontologies, and their efforts are uncoordinated
● Many different ontologies, with overlapping content and variable quality
● Ontologies are not interoperable
● Data integration efforts are laborious
● Barriers to accessing and effectively using numerous existing ontologies
● Consortium of informaticians, biologists, clinicians, and ontologists, funded by the NIH Roadmap
● Ontology research and services
  – Ontology access, alignment, and management
  – Ontology-based annotation of large data sets
  – Enhance quality of ontology development
  – Collaboration with diverse biomedical projects
● Stanford: Tools for ontology search, alignment, versioning, and peer review
● Lawrence Berkeley Labs: Tools to use ontologies for data annotation
● Mayo Clinic: Tools for access to large controlled terminologies
● Univ. of Victoria: Tools for ontology visualization
● Univ. at Buffalo: Dissemination of best practices for ontology engineering
● Univ. of Cambridge, Univ. of Oregon, UCSF: Driving biomedical projects
Figure: National Center for Biomedical Ontology overview. Labels: Biomedical theory; Experimental data; Biomedical ontologies (Open Biomedical Ontologies, OBO); Annotations on experimental data (Open Biomedical Data, OBD); BioPortal; Capture and index experimental results; Relate experimental data to results from other sources; Information integration; Visualization and analysis; example resources IFF, LIGAND, ENZYME, GENE.
Core technologies for BioPortal
● Protégé:
  – Ontology visualization
  – Ontology alignment and version diff
● LexGrid:
  – Defines a common information model for terminology content
  – Access to controlled terminologies in many different formats
  – Ontology content indexing and search
● Tiered Web app/services architecture
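Giving access to terminologies "in many different formats" amounts to loading each source format into one common term model. The sketch below covers a single format, the OBO flat file, under stated assumptions: the `Term` record and its fields are a hypothetical simplification, not LexGrid's actual information model, and `parse_obo_terms` is an illustrative name rather than a LexGrid API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """Format-neutral term record (hypothetical fields; not LexGrid's actual model)."""
    term_id: str
    name: str = ""
    definition: str = ""
    synonyms: list[str] = field(default_factory=list)
    parents: list[str] = field(default_factory=list)


def _quoted(value: str) -> str:
    """Text between the first pair of double quotes ('"x" EXACT []' -> 'x').

    Simplification: escaped quotes inside the string are not handled.
    """
    start = value.find('"')
    end = value.find('"', start + 1)
    return value[start + 1:end] if start != -1 and end != -1 else value


def parse_obo_terms(text: str) -> dict[str, Term]:
    """Load the [Term] stanzas of an OBO flat file into Term records keyed by ID."""
    terms: dict[str, Term] = {}
    current: Optional[Term] = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are loaded.
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current = Term(term_id=value)
            terms[value] = current
        elif current is None:
            continue  # tag seen before the stanza's id line
        elif tag == "name":
            current.name = value
        elif tag == "def":
            current.definition = _quoted(value)
        elif tag == "synonym":
            current.synonyms.append(_quoted(value))
        elif tag == "is_a":
            # Drop the trailing "! parent name" comment.
            current.parents.append(value.split("!", 1)[0].strip())
    return terms
```

A loader for another format would produce the same `Term` records, which is what lets downstream indexing and search ignore where a terminology came from.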
BioPortal architecture: unifying ontologies and annotations
Biomedical ontology challenges
● Find ontologies or terms of interest
● Visualize and navigate ontologies and annotated data
● Support distributed, collaborative ontology development
● Enable community-based evaluation of ontology quality
● Use ontologies to annotate data, and use annotations to make discoveries
Biomedical ontology challenges: Find ontologies or terms of interest
LexGrid: Ontology indexing and search
● Terminological services
  – Search for ontology terms
  – Use homophone, exact, and partial search
  – Map free text to ontologies
● Ontology indexes and services
  – Lucene index on ontology terms, definitions, and synonyms
  – Global identifiers for terms
  – Ontology version information
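The three search modes can be illustrated with a toy in-memory index keyed by global term identifiers. This is a minimal sketch under two assumptions: American Soundex stands in for "homophone" (sounds-like) search, and partial search is plain substring matching. LexGrid itself used a Lucene index; `soundex` and `TermIndex` here are hypothetical names.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels, h, w and y have no code.
_SOUNDEX = {c: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                    "4": "l", "5": "mn", "6": "r"}.items()
            for c in letters}


def soundex(word: str) -> str:
    """American Soundex, standing in here for 'homophone' (sounds-like) matching."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters that share a digit
            prev = digit
    return (code + "000")[:4]


class TermIndex:
    """Toy index over term names and synonyms (hypothetical; not LexGrid's Lucene index)."""

    def __init__(self) -> None:
        self.labels: dict[str, list[str]] = defaultdict(list)   # term ID -> labels
        self.by_sound: dict[str, set[str]] = defaultdict(set)   # Soundex code -> term IDs

    def add(self, term_id: str, name: str, synonyms: tuple[str, ...] = ()) -> None:
        for label in (name, *synonyms):
            self.labels[term_id].append(label.lower())
            for word in label.split():
                self.by_sound[soundex(word)].add(term_id)

    def exact(self, query: str) -> set[str]:
        q = query.lower()
        return {t for t, labels in self.labels.items() if q in labels}

    def partial(self, query: str) -> set[str]:
        q = query.lower()
        return {t for t, labels in self.labels.items() if any(q in lab for lab in labels)}

    def homophone(self, query: str) -> set[str]:
        # Every query word must sound like some word among the term's labels.
        hits = [self.by_sound.get(soundex(w), set()) for w in query.split()]
        return set.intersection(*hits) if hits else set()
```

Indexing synonyms alongside names means a query such as "tumour" can reach a term whose preferred name is entirely different.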
LexGrid
Dynamic auto-completion of closest matching term
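One common way to implement auto-completion is a binary search into a sorted list of labels, followed by a scan that stops as soon as a label no longer shares the typed prefix. This is a minimal sketch, assuming "closest matching" means "shares the typed prefix"; the slide does not say which matching rule LexGrid used, and `Autocompleter` is a hypothetical name.

```python
import bisect


class Autocompleter:
    """Prefix auto-completion over a sorted list of lower-cased term labels (a sketch)."""

    def __init__(self, labels: list[str]) -> None:
        self.labels = sorted({label.lower() for label in labels})

    def complete(self, prefix: str, limit: int = 5) -> list[str]:
        p = prefix.lower()
        start = bisect.bisect_left(self.labels, p)  # first label >= prefix
        matches: list[str] = []
        for label in self.labels[start:]:
            if not label.startswith(p):
                break  # sorted order: once a label stops matching, none after it will
            matches.append(label)
            if len(matches) == limit:
                break
        return matches
```

Each keystroke costs one binary search plus at most `limit` comparisons, which keeps completion responsive even for large terminologies.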
Mapping databases and text to ontologies
Prostate ductal adenocarcinoma
Work by Nigam Shah
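A simple baseline for mapping free text to ontology terms is greedy longest-match lookup against a dictionary of term labels. The sketch below assumes a pre-normalized lexicon (lower-case labels, single spaces). The `HYP:` identifiers are hypothetical placeholders, not real ontology IDs, and this is not presented as the method used in the work shown on the slide.

```python
import re

# Hypothetical lexicon: normalized label -> placeholder term ID.
LEXICON = {
    "prostate": "HYP:0001",
    "ductal adenocarcinoma": "HYP:0002",
    "adenocarcinoma": "HYP:0003",
}


def annotate(text: str, lexicon: dict[str, str], max_len: int = 5) -> list[tuple[str, str]]:
    """Greedy longest-match tagging of free text against a label -> term ID lexicon."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    hits: list[tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so "ductal adenocarcinoma" beats "adenocarcinoma".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                hits.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return hits


if __name__ == "__main__":
    print(annotate("Prostate ductal adenocarcinoma, Gleason 7", LEXICON))
    # [('prostate', 'HYP:0001'), ('ductal adenocarcinoma', 'HYP:0002')]
```

Preferring the longest match keeps a specific term from being split into a more general one plus a stray modifier.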
Biomedical ontology challenges: Visualize and navigate ontologies and annotated data
Degree of Interest modeling
● Creates a user profile to identify relevant information
● Built by monitoring the user's activities (e.g., navigation actions, editing, and annotations)
● Permits model-based highlighting or filtering of “interesting” entities in the ontology
● Based on Degree of Interest Trees (Stuart Card) and Mylar (Mik Kersten)
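A toy model shows how monitored activity can drive both highlighting and filtering: interactions raise an entity's score, and inactivity lowers it. This is a minimal sketch in the spirit of Mylar, not Mylar's or DIaMOND's actual algorithm. The event weights, the linear decay, and the threshold are all assumptions, and `DegreeOfInterest` is a hypothetical name.

```python
from collections import defaultdict


class DegreeOfInterest:
    """Toy degree-of-interest (DOI) model in the spirit of Mylar.

    Each interaction adds a weight to the entity's score, and the score decays
    by `decay` for every later event in which the entity was not touched.
    The event weights and the linear decay are assumptions for illustration.
    """

    WEIGHTS = {"navigate": 1.0, "edit": 2.0, "annotate": 2.0}

    def __init__(self, decay: float = 0.1) -> None:
        self.decay = decay
        self.events = 0
        self.raw: dict[str, float] = defaultdict(float)  # summed interaction weights
        self.last_seen: dict[str, int] = {}               # event number of latest touch

    def record(self, entity: str, kind: str) -> None:
        """Log one user action (navigation, edit, or annotation) on an entity."""
        self.events += 1
        self.raw[entity] += self.WEIGHTS[kind]
        self.last_seen[entity] = self.events

    def interest(self, entity: str) -> float:
        if entity not in self.last_seen:
            return 0.0
        return self.raw[entity] - self.decay * (self.events - self.last_seen[entity])

    def highlight(self, entities: list[str], threshold: float = 1.0) -> dict[str, bool]:
        """DOI highlighting: show every entity, flagging the interesting ones."""
        return {e: self.interest(e) >= threshold for e in entities}

    def filtered(self, entities: list[str], threshold: float = 1.0) -> list[str]:
        """DOI filtering: hide entities whose interest is below the threshold."""
        return [e for e in entities if self.interest(e) >= threshold]
```

Highlighting and filtering read the same scores and differ only in whether uninteresting classes are dimmed or removed from view.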
DIaMOND
● Degree of Interest Modeling for Ontology Navigation and Development
● Integrates the Mylar degree of interest (DOI) model for Eclipse with Protégé
● Uses the DOI to provide adaptive visualizations of the ontology
Work by Tricia d’Entremont
Figure panels: Without DOI; DOI highlighting; DOI filtering
Pictorial-guided ontology navigation
● Users are often interested in the subset of an ontology pertinent to:
  – Biological scale (organ/tissue/cell/molecule)
  – Image regions (locations, components)
● Strategy: browse ontology views driven by the biological scale of the image
● Accomplished by annotating multi-scale images using ontologies to describe their contents
● Also enables image retrieval driven by ontology
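Once images carry ontology annotations and a biological scale, the two uses on this slide reduce to simple queries: select the ontology view for an image's scale, and retrieve the images annotated with a given term. The mini-ontology below, with its `HYP:` identifiers, is hypothetical placeholder data, and `view_for_image` and `images_for_term` are illustrative names.

```python
from dataclasses import dataclass, field

# Hypothetical mini-ontology: term ID -> (label, biological scale). Placeholder IDs.
TERMS = {
    "HYP:prostate": ("prostate", "organ"),
    "HYP:epithelium": ("glandular epithelium", "tissue"),
    "HYP:luminal_cell": ("luminal epithelial cell", "cell"),
    "HYP:basal_cell": ("basal epithelial cell", "cell"),
    "HYP:psa": ("prostate-specific antigen", "molecule"),
}


@dataclass
class Image:
    name: str
    scale: str                                          # organ, tissue, cell, or molecule
    annotations: set[str] = field(default_factory=set)  # term IDs describing image regions


def view_for_image(image: Image, terms: dict[str, tuple[str, str]]) -> list[str]:
    """Ontology view driven by the image: only terms at the image's biological scale."""
    return sorted(t for t, (_label, scale) in terms.items() if scale == image.scale)


def images_for_term(term_id: str, images: list[Image]) -> list[str]:
    """Ontology-driven image retrieval: names of images annotated with the term."""
    return [img.name for img in images if term_id in img.annotations]
```

Switching to an image at a different scale swaps the ontology view, which is the navigation-by-scale behavior the slide describes.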
Pictorial-guided ontology navigation
Work by Nigam Shah
Navigating by different image scales
Biomedical ontology challenges: Visualize and navigate ontologies and annotated data
Visualizing ontology-annotated clinical trial data
Different clinical trials vary in treatments, inclusion criteria, methodology, etc.
Work by Maleh Hernandez
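When each trial is annotated with ontology terms along shared dimensions (intervention, population, design, and so on), the differences worth visualizing are the dimensions on which trials disagree. This is a minimal sketch under that assumption. The dimension names and any trial data passed to `trial_differences` (a hypothetical name) are illustrative, not taken from the Nevirapine trials.

```python
def trial_differences(trials: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """For each annotation dimension, return the per-trial terms where trials disagree.

    `trials` maps a trial name to {dimension: ontology term ID}; a trial that lacks a
    dimension is reported as "(not annotated)".
    """
    dimensions = sorted({dim for ann in trials.values() for dim in ann})
    differences: dict[str, dict[str, str]] = {}
    for dim in dimensions:
        values = {name: ann.get(dim, "(not annotated)") for name, ann in trials.items()}
        if len(set(values.values())) > 1:  # at least two trials disagree
            differences[dim] = values
    return differences
```

Comparing ontology term IDs rather than free-text descriptions means two trials that describe the same population in different words still count as matching.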
Visualizing differences in Nevirapine trials
Biomedical ontology challenges: Support distributed, collaborative ontology development
Challenges for community ontology development
● Need to communicate ways to improve & evolve ontologies
  – Missing attributes (e.g., definitions)
  – Class too broad; should be split or deleted
  – Class should be moved or renamed
● Current approach: email lists, face-to-face (F2F) meetings
  – Ontology feedback is disconnected from the ontology
  – Cannot determine which parts of ontologies are stable, contentious, or evolving
Example email communications from fugo-discuss
● I'd like to propose a few relationships between some higher level classes:
  – study executes study_design
  – study_design has_factor owl:Thing
● What is the definition of biomaterial?
● Should biomaterial be a subclass of FuGO_54 study_object?
Ontology “marginal notes”
● Structured annotations on ontologies and their contents
● Capture community feedback on ontologies
● Localized to parts of ontology to which they apply
● Make explicit the types of ontology evolutionary changes
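A structured note can be modeled as a record that points at one ontology class and carries an explicit change type. Counting the open notes per class then gives a rough signal of which parts of an ontology are contentious. This is a minimal sketch: the change types mirror the examples listed under community ontology development, while `COMMENT`, the field names, and the `min_open` threshold are assumptions. `MarginalNote` and `contentious` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    """Kinds of evolutionary change a note can propose (COMMENT is an assumed extra)."""
    ADD_DEFINITION = "add definition"
    SPLIT = "split class"
    DELETE = "delete class"
    MOVE = "move class"
    RENAME = "rename class"
    COMMENT = "comment"


@dataclass
class MarginalNote:
    target: str          # ontology class the note is attached to
    change: ChangeType
    author: str
    text: str
    resolved: bool = False


def contentious(notes: list[MarginalNote], min_open: int = 2) -> list[str]:
    """Classes with at least `min_open` unresolved notes, as candidates for 'contentious'."""
    counts = Counter(n.target for n in notes if not n.resolved)
    return sorted(t for t, c in counts.items() if c >= min_open)
```

Attaching each note to a specific class, rather than to an email thread, keeps the feedback localized to the part of the ontology it concerns.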
Ontology marginal notes
Work by Ravi Tiruvury and Kaustubh Supekar
Biomedical ontology challenges: Enable community-based evaluation of ontology quality
Ontology metadata and peer review
● Variable ontology quality; no venue for community rating of ontologies
● Building a peer review platform for ontologies based on a “Web of Trust” model
● Providing tools to enable community to evaluate and improve ontology quality
Peer review of ontologies
Metadata Ontology
Web-based tools to enter ontology metadata, post reviews, and rate reviewers
Work by Kaustubh Supekar
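Rating reviewers lets the platform weight each review by how much the community trusts its author. The sketch below is a one-step simplification of a Web of Trust: a reviewer's trust is the mean score (0 to 1) other users gave that reviewer's reviews, and an ontology's score is the trust-weighted mean of its ratings. A full Web of Trust would also propagate trust transitively. The rating scales, the default trust of 0.5, and the function names are assumptions.

```python
def reviewer_trust(review_ratings: dict[str, list[float]], default: float = 0.5) -> dict[str, float]:
    """Trust in each reviewer = mean score (0..1) that other users gave their reviews."""
    return {r: (sum(s) / len(s) if s else default) for r, s in review_ratings.items()}


def ontology_score(reviews: list[tuple[str, float]], trust: dict[str, float],
                   default: float = 0.5) -> float:
    """Trust-weighted mean of ontology ratings; `reviews` holds (reviewer, rating) pairs."""
    total_weight = sum(trust.get(r, default) for r, _ in reviews)
    if total_weight == 0:
        return 0.0
    return sum(trust.get(r, default) * rating for r, rating in reviews) / total_weight
```

A well-regarded reviewer's rating moves the score more than one from a reviewer whose reviews others found unhelpful. Unknown reviewers fall back to the neutral default weight.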
Biomedical ontology challenges: Use ontologies to annotate data, and use annotations to make discoveries
Preliminary results
● Two unrelated biomedical knowledge sources (ZFIN, OMIM)
● Each annotated using ontologies to describe phenotypes
● Searching for similar phenotype annotations to discover disease genes
● Example: holoprosencephaly genes
Figure: human SHH-/+ and SHH-/- compared with zebrafish shh-/+ and shh-/- phenotypes
The SHH gene was known to be associated with human holoprosencephaly. Are any other genes associated?
Encoding disease phenotypes
Phenotype (clinical sign) = entity + quality
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly
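The entity + quality encoding maps naturally onto a pair type, and a syndrome onto a set of such pairs. This is a minimal sketch that reproduces the slide's example. It assumes a syndrome "matches" only when all of its defining phenotypes are observed, which is a deliberate simplification of clinical reasoning. `Phenotype` and `matches_syndrome` are hypothetical names.

```python
from typing import NamedTuple


class Phenotype(NamedTuple):
    """A clinical sign encoded as entity + quality (EQ)."""
    entity: str
    quality: str


# The slide's example: Syndrome = P1 + P2 + P3.
HOLOPROSENCEPHALY = frozenset({
    Phenotype("eye", "hypoteloric"),        # P1
    Phenotype("midface", "hypoplastic"),    # P2
    Phenotype("kidney", "hypertrophied"),   # P3
})


def matches_syndrome(observed: set[Phenotype], syndrome: frozenset[Phenotype]) -> bool:
    """Assumed rule: every defining phenotype of the syndrome must be observed."""
    return syndrome <= observed
```

Because each phenotype is a structured pair rather than free text, annotations from different sources can be compared term by term.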
Finding human disease genes
Figure: Human holoprosencephaly; Zebrafish shh; Zebrafish oep; Similar phenotypes; Gene homology?
1. Search ontology-annotated data for genes with similar phenotypes
2. Could the human orthologs of those genes cause the disease?
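Step 1 can be sketched as a similarity ranking of genes by phenotype overlap with the disease. The sketch rests on several assumptions. Phenotypes are (entity, quality) pairs from a vocabulary shared across species. Jaccard overlap stands in for whatever similarity measure the actual work used. Any gene identifiers passed in are hypothetical, not ZFIN records. Step 2, looking up human orthologs of the top-ranked genes, is not shown.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two phenotype sets: |A ∩ B| / |A ∪ B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_genes(disease: frozenset, gene_phenotypes: dict[str, frozenset],
               min_score: float = 0.3) -> list[tuple[str, float]]:
    """Genes whose annotated phenotypes resemble the disease, best match first."""
    scores = {gene: jaccard(disease, phenos) for gene, phenos in gene_phenotypes.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # score desc, then name
    return [(gene, s) for gene, s in ranked if s >= min_score]
```

The ranked genes are only candidates. Whether their human orthologs are involved in the disease is the open question in step 2.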
Acknowledgements
● National Center for Biomedical Ontology
  – Executive Team: Mark Musen, Suzanna Lewis, Daniel Rubin, Sima Misra
  – cBiO staff: Natasha Noy, Tim Redmond, Lynn Murphy, Archana Verbakam, Chris Mungall, John Day-Richter, Mark Gibson, ShengQiang Shu, Nicole Washington, Harold Solbrig, Deepak Sharma, James Buntrock, Tom Johnson, Chris Callendar
  – Collaborators: Michael Ashburner, Monte Westerfield, Ida Sim, Chris Chute, Barry Smith, Peggy Storey, Richard Olshen, Werner Ceusters, Deborah McGuinness
  – Students & post-docs: Kaustubh Supekar, Nigam Shah, Fabian Neuhaus, Tricia d'Entremont, Maria-Elena Hernandez, Sean Falconer, Ravi Tiruvury
● Funded through NIH Roadmap for Medical Research grant U54 HG004028
  – Program officer: Peter Good (NIGMS)
  – Lead Science Officer: Carol Bean (NCRR)
Contact information
Center: [email protected]
Thank you.