The OBO Foundry
Barry Smith
1
History of Ontology as Computational Artifact
1970s: AI (based on FOL: McCarthy, Hayes)
1980s: KR, Knowledge Interchange Formats (Gruber, Hobbs ...)
1999: GO, OBO format (Ashburner, ...)
2000s: Semantic Web (based on OWL; Horrocks, Hendler, 1000 lite ontologies)
2009: Reconciliation of OBO with OWL; but still 2 methodologies: OBO Foundry; NCBO Bioportal
2
Ontology and the Semantic Web
• html demonstrated the power of the Web to allow sharing of information
• can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information based on XML, RDF and above all OWL (Web Ontology Language)?
• can we use RDF and OWL to break down silos, and create useful integration of on-line data and information?
3
people tried, but the more they were successful, the more they failed
OWL breaks down data silos via controlled vocabularies for the description of data dictionaries
Unfortunately the very success of this approach led to the creation of multiple, new, semantic silos – because multiple ontologies are being created in ad hoc ways
4
reasons for this effect
• Semantic Web (original) idea: if a million ‘lite ontologies bloom’, then somehow intelligence will be created
• let’s all build new ones (shrink-wrapped software mentality: you will not get paid for reusing existing ontologies)
• requirements-driven software development promotes forking and reduces potential for secondary uses
5
Ontology success stories, and some reasons for failure
A fragment of the “Linked Open Data” in the biomedical domain
6
What you get with ‘mappings’
HPO: all phenotypes (excess hair loss, duck feet ...)
NCIT: all organisms
allose (a form of sugar)
Acute Lymphoblastic Leukemia (A.L.L.)
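The effect on this slide can be sketched with a naive lexical matcher. This is only an illustration under stated assumptions: the labels ("All", "ALL", "A.L.L.") and the source names `ChemX` and `DiseaseX` are hypothetical placeholders, since the slide does not say which labels or which ontologies supplied the sugar and leukemia entries, and real mapping tools are more elaborate.

```python
from collections import defaultdict

# Hypothetical (source, label, intended meaning) triples echoing the slide.
# "ChemX" and "DiseaseX" are placeholder names, not real ontologies.
TERMS = [
    ("HPO", "All", "all phenotypes"),
    ("NCIT", "All", "all organisms"),
    ("ChemX", "ALL", "allose (a form of sugar)"),
    ("DiseaseX", "A.L.L.", "Acute Lymphoblastic Leukemia"),
]


def normalize(label: str) -> str:
    """Lower-case the label and drop punctuation, as a naive matcher might."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def lexical_mappings(terms):
    """Group terms whose normalized labels are identical."""
    groups = defaultdict(list)
    for source, label, meaning in terms:
        groups[normalize(label)].append((source, meaning))
    # Keep only labels shared by more than one term: these become 'mappings'.
    return {key: members for key, members in groups.items() if len(members) > 1}


mappings = lexical_mappings(TERMS)
for key, members in mappings.items():
    for source, meaning in members:
        print(f"{key!r} -> {source}: {meaning}")
```

All four entries collapse onto the single key `all`, although they denote four unrelated things.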
10
Mappings are hard
They are fragile, and expensive to maintain
Need new authorities to maintain (one for each pair of mapped ontologies), yielding new risk of forking – who will police the mappings?
The goal should be to minimize the need for mappings, by avoiding redundancy in the first place
Invest resources in disjoint ontology modules which work well together – reduce need for mappings to minimum possible
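A rough way to see the maintenance burden: if every pair of overlapping ontologies needs its own mapping, the number of mappings grows quadratically. The sketch below is illustrative only; the function names are made up here, and it assumes the worst case in which every pair overlaps.

```python
from itertools import combinations


def mapping_pairs(ontologies):
    """List every unordered pair of ontologies that would need a mapping."""
    return list(combinations(sorted(ontologies), 2))


def mapping_count(n: int) -> int:
    """Number of pairwise mappings among n mutually overlapping ontologies."""
    return n * (n - 1) // 2


pairs = mapping_pairs(["GO", "HPO", "NCIT", "ChEBI"])
print(len(pairs), "mappings for 4 ontologies")
print(mapping_count(1000), "mappings for 1000 ontologies")
```

With disjoint modules, one per domain, the count of required mappings drops toward zero, which is the point of the slide.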
11
Why should you care?
• you need to create systems for data mining and text processing which will yield useful digitally coded output
• if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted
• serious investment in annotation will be defeated from the start
• relevant data will not be found, because it will be lost in multiple semantic cemeteries
12
How to do it right?
• how to create an incremental, evolutionary process, where what is good survives, and what is bad fails
• where the number of ontologies needing to be linked is small
• where links are stable
• create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested
13
Reasons why GO has been successful
It is a system for prospective standardization built with coherent top level but with content contributed and monitored by domain specialists
Based on community consensus
Updated every night
Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value
Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though GO community still recommending caution)
14
GO has learned the lessons of successful cooperation
• Clear documentation
• The terms chosen are already familiar
• Fully open source (allows thorough testing in manifold combinations with other ontologies)
• Subjected to considerable third-party critique
• Tracker for user input with rapid turnaround and help desk
15
GO has been amazingly successful in overcoming the data balkanization problem
but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …
16
OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)
[Figure: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns)]
Columns: CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
17
[Figure: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns)]
Columns: CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Environment Ontology: environments are here
18
Population-level ontologies
[Figure: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns)]
Columns: CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS: Family, Community, Deme, Population | Population Phenotype | Population Process
ORGAN AND ORGANISM: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
19
Ontology success stories, and some reasons for failure
20
http://obofoundry.org
[Figure: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns)]
Columns: CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS: Family, Community, Deme, Population | Population Phenotype | Population Process
ORGAN AND ORGANISM: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
21
Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology and agree in advance to collaborate with developers of ontologies in adjacent domains.
http://obofoundry.org
The OBO Foundry: a step-by-step, evidence-based approach to expand the GO
22
OBO Foundry Principles
Common governance (coordinating editors)
Common training
Common architecture to overcome Tim Berners Lee-ism:
• simple shared top level ontology
• shared Relation Ontology: www.obofoundry.org/ro
23
Open Biomedical Ontologies Foundry
Seeks to create high quality, validated terminology modules across all of the life sciences which will be
• one ontology for each domain, so no need for mappings
• close to language use of experts
• evidence-based
• incorporate a strategy for motivating potential developers and users
• revisable as science advances
24
Principles
http://obofoundry.org/wiki/index.php/OBO_FoundryPrinciples
25
Pistoia Alliance
Open standards for data and technology interfaces in the life science research industry
consortium of major pharmaceutical and life science companies
can we address the data silo problems created by multiplicity of proprietary terminologies by declaring terminology ‘pre-competitive’?
require shared use of something like OBO Foundry ontologies in presentation of information?
26
27
Virtual Physiological Human
28
Only with a prospective standard like that of the OBO Foundry could something like the VPH work
designed to guarantee interoperability of ontologies from the very start (and to keep out weeds)
initial set of 10 criteria tested in the annotation of
scientific literature
model organism databases
life science experimental results
29
OBO Foundry coverage
[Figure: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns)]
Columns: CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
30
ORTHOGONALITY
modularity ensures
• annotations can be additive
• division of labor amongst domain experts
• high value of training in any given module
• lessons learned in one module can benefit work on other modules
• incentivization of those responsible for individual modules
31
Benefits of coordination
• Can more easily reuse what is made by others
• Can more easily inspect and criticize what is made by others
• Leads to innovations (e.g. MIREOT strategy for importing terms into ontologies)
32
8 Foundry members (2010)
CHEBI: Chemical Entities of Biological Interest
GO: Gene Ontology
PATO: Phenotypic Quality Ontology
PRO: Protein Ontology
XAO: Xenopus Anatomy Ontology
ZFA: Zebrafish Anatomy Ontology
33
Current Foundry members in yellow
[Figure: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns)]
Columns: CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO); also shown in this row: XAO, ZFA
CELL AND CELLULAR COMPONENT: Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO)
MOLECULE: Molecule (SO, RnaO) | Molecular Function (GO) | Molecular Process (GO); also shown in this row: ChEBI, PRO
34
[Figure: Foundry ontologies by granularity]
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); CARO; FMA; Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO); XAO; ZFA
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: SO; RnaO; Molecular Function (GO); Molecular Process (GO); ChEBI; PRO
Prospective Foundry ontologies (in green):
Foundational Model of Anatomy Ontology (FMA)
Cell Ontology (CL)
Sequence Ontology (SO)
RNA Ontology (RnaO)
35
OBO Foundry Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Ontology of General Medical Science (OGMS)
domain level: Anatomy Ontology (FMA*, CARO); Environment Ontology (EnvO); Infectious Disease Ontology (IDO*); Biological Process Ontology (GO*); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Phenotypic Quality Ontology (PaTO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Molecular Function (GO*); Protein Ontology (PRO*)
36
Problem cases
Common Anatomy Reference Ontology
Disease Ontology
Function Ontologies
Cellular Component Function
Cellular Function
Organ Function
Artifact Function (pumping, transporting ...)
Environment Ontology
Species Ontology (NCBI Taxonomy)
37
IDO (Infectious Disease Ontology) Core
Follows GO strategy of providing a canonical ontology of what is involved in every infectious disease – host, pathogen, vector, virulence, vaccine, transmission – accompanied by IDO Extensions for specific diseases, pathogens and vectors
Provides common terminology resources and tested common guidelines for a vast array of different disease communities
38
IDO (Infectious Disease Ontology) Consortium
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Cleveland Clinic – Infective Endocarditis
• University of Michigan – Brucellosis
• Duke University, University at Buffalo – HIV
39
Ontology for General Medical Science
http://code.google.com/p/ogms/
(OBO) http://purl.obolibrary.org/obo/ogms.obo
(OWL) http://purl.obolibrary.org/obo/ogms.owl
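For readers unfamiliar with the OBO flat-file format named above, here is a minimal sketch of reading `[Term]` stanzas. The sample text and its `EX:` identifiers are hypothetical and are not taken from ogms.obo; the parser is a simplification written for illustration (it ignores escapes, qualifiers and non-Term stanzas), not a full implementation of the format.

```python
# Hypothetical sample in OBO flat-file style; the EX: ids are invented.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: disease

[Term]
id: EX:0000002
name: infectious disease
is_a: EX:0000001 ! disease
"""


def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None  # dict for the stanza being read, or None outside [Term]
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A stanza header: start a new term, or skip other stanza types.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or not line or ":" not in line:
            continue  # header lines, blank lines, or lines outside a term
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop the trailing '!' comment
        current.setdefault(tag.strip(), []).append(value)
    return terms


terms = parse_obo_terms(SAMPLE)
for term in terms:
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```

The `is_a` values are what link a term to its parent, so even this small reader is enough to walk a hierarchy within one file.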
40
OGMS-based initiatives
Vital Signs Ontology (VSO) (Welch Allyn)
EHR / Demographics Ontology
Infectious Disease Ontology
Mental Health Ontology
Emotion Ontology
41
Ontology for General Medical Science
Jobst Landgrebe (then Co-Chair of the HL7 Vocabulary Group):
“the best ontology effort in the whole biomedical domain by far”
42
EXPERIMENTAL ARTIFACTS Ontology for Biomedical Investigations (OBI)
CLINICAL MEDICINE Ontology of General Medical Science (OGMS)
INFORMATION ARTIFACTS Information Artifact Ontology (IAO)
How to keep clear about the distinction
• processes of observation,
• results of such processes (measurement data)
• the entities observed
43