Introduction to Biomedical Ontology
Barry Smith, University at Buffalo
Posted on 19-Dec-2015
On June 22, 1799, in Paris, everything changed
International System of Units
Multiple kinds of data in multiple kinds of silos
Lab / pathology data
EHR data
Clinical trial data
Patient histories
Medical imaging
Microarray data
Model organism data
Flow cytometry
Mass spec
Genotype / SNP data
How to find data?
How to find other people’s data?
How to reason with data when you find it?
How to work out what data does not yet exist?
How to solve the problem of making the data we find queryable and reusable by others?
Part of the solution must involve: standardized terminologies and coding schemes
But there are multiple kinds of standardization for biomedical data, and they do not work well together:
Terminologies (SNOMED, UMLS)
CDEs (Clinical research)
Information Exchange Standards (HL7 RIM)
LIMS (LOINC)
MGED standards for microarray data, etc.
top-down grid frameworks (caBIG)
most successful, thus far: UMLS (Unified Medical Language System)
a collection of separate terminologies built by trained experts
massively useful for information retrieval and information integration
UMLS Metathesaurus: a system of post hoc mappings between overlapping source vocabularies developed according to different and sometimes conflicting standards
for UMLS:
local usage respected
regimentation frowned upon
cross-framework consistency not important
no concern to establish consistency with basic science
different grades of formal rigor, different degrees of completeness, different update policies, capricious policies for empirical testing
A good solution to the silo problem must be:
• modular
• incremental
• bottom-up
• evidence-based
• revisable
• and must incorporate a strategy for motivating potential developers and users
ontologies = standardized labels designed for use in annotations
to make the data cognitively accessible to human beings
and algorithmically accessible to computers
ontologies = high-quality controlled structured vocabularies for the annotation (description) of data
Ramirez et al., "Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology", Syst. Biol. 56(2):283–294, 2007
what cellular component?
what molecular function?
what biological process?
ontologies used in curation of literature
Ontologies
help integrate complex representations of reality
help human beings find things in complex representations of reality
help computers reason with complex representations of reality
The Gene Ontology
Ontologies facilitate grouping of annotations
annotations to brain: 20; to hindbrain: 15; to rhombomere: 10
query brain without ontology: 20; query brain with ontology: 45 (20 + 15 + 10)
but they succeed in this only if there is one consensus ontology for each domain
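The grouping effect can be sketched as a query that counts annotations to a term together with annotations to every term below it in the hierarchy. This is a minimal Python sketch. It assumes that hindbrain is part of brain and rhombomere is part of hindbrain, and it treats the slide's figures as direct annotation counts. The names `PARENT`, `DIRECT_ANNOTATIONS`, `ancestors` and `count_annotations` are hypothetical.

```python
# Hypothetical fragment of an anatomy ontology: each term maps to its parent
# (via part_of), so rhombomere -> hindbrain -> brain.
PARENT = {
    "hindbrain": "brain",
    "rhombomere": "hindbrain",
}

# Number of annotations made directly to each term (figures from the slide).
DIRECT_ANNOTATIONS = {
    "brain": 20,
    "hindbrain": 15,
    "rhombomere": 10,
}


def ancestors(term: str) -> list[str]:
    """Return the chain of terms above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def count_annotations(query: str, use_ontology: bool) -> int:
    """Count annotations matching `query`.

    Without the ontology, only exact matches count. With it, an annotation
    to any term below `query` in the hierarchy also counts.
    """
    total = 0
    for term, n in DIRECT_ANNOTATIONS.items():
        if term == query or (use_ontology and query in ancestors(term)):
            total += n
    return total


print(count_annotations("brain", use_ontology=False))  # 20
print(count_annotations("brain", use_ontology=True))   # 45
```

In this sketch, querying hindbrain with the ontology would return 25 (15 + 10). The roll-up is only as good as the single hierarchy it walks, which is the point of the "one consensus ontology for each domain" condition.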
People are extending the GO methodology to other domains of biology and of clinical and translational medicine?
The standard engineering methodology
• It is easier to write useful software if one works with a simplified model
• (“…we can’t know what reality is like in any case; we only have our concepts…”)
• This looks like a useful model to me
• (One week goes by:) This other thing looks like a useful model to him
• Data in Pittsburgh does not interoperate with data in Vancouver
• Science is siloed
an analogue of the UMLS problem
proliferation of tiny ontologies by different groups with urgent annotation needs
the solution
establish common rules governing best practices for creating ontologies in coordinated fashion, with an evidence-based pathway to incremental improvement
a shared portal for (so far) 58 ontologies (low regimentation)
http://obo.sourceforge.net NCBO BioPortal
First step (2001)
OBO builds on the principles successfully implemented by the GO
recognizing that ontologies need to be developed in tandem
The methodology of cross-products
compound terms in ontologies are to be defined as cross-products of simpler terms. E.g., elevated blood glucose is a cross-product of PATO: increased concentration with FMA: blood and ChEBI: glucose.
= factoring out of ontologies into discipline-specific modules (orthogonality)
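The cross-product idea can be sketched as a data structure. A compound term is not a new primitive but a combination of terms drawn from separate ontologies. This minimal Python sketch makes several assumptions. Terms are identified by an ontology prefix plus a label, and the numeric identifiers real ontologies use are omitted. The relation names `inheres_in` and `towards` are assumptions chosen to show which simpler term plays which part. `Term`, `CompoundTerm` and `terms_using` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    ontology: str  # source ontology, e.g. "PATO", "FMA", "ChEBI"
    label: str


@dataclass(frozen=True)
class CompoundTerm:
    label: str
    genus: Term                                 # the quality being specialized
    differentia: tuple[tuple[str, Term], ...]   # (relation, term) pairs

    def ontologies(self) -> set[str]:
        """All ontologies whose terms this definition draws on."""
        return {self.genus.ontology} | {t.ontology for _, t in self.differentia}


increased_concentration = Term("PATO", "increased concentration")
blood = Term("FMA", "blood")
glucose = Term("ChEBI", "glucose")

# elevated blood glucose = PATO: increased concentration x FMA: blood x ChEBI: glucose
elevated_blood_glucose = CompoundTerm(
    label="elevated blood glucose",
    genus=increased_concentration,
    differentia=(("inheres_in", blood), ("towards", glucose)),
)


def terms_using(simple: Term, compounds: list[CompoundTerm]) -> list[str]:
    """Labels of compound terms whose definition refers to `simple`.

    These references bind the modules together: if the FMA revises `blood`,
    every compound term returned here is affected.
    """
    return [
        c.label
        for c in compounds
        if c.genus == simple or any(t == simple for _, t in c.differentia)
    ]


print(sorted(elevated_blood_glucose.ontologies()))  # ['ChEBI', 'FMA', 'PATO']
print(terms_using(blood, [elevated_blood_glucose]))  # ['elevated blood glucose']
```

Each simple term stays in its home ontology, and the compound term only points at it. This is the factoring into discipline-specific modules that the slide calls orthogonality.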
The methodology of cross-products
enforcing use of common relations in linking terms drawn from Foundry ontologies serves
• to ensure that the ontologies are maintained and revised in tandem
• to bind terms in different ontologies together through logically defined relations, creating a network
The OBO Foundry http://obofoundry.org/
Third step (2006)
[Figure: Building out from the original GO. Ontologies arranged by granularity (organ and organism; cell and cellular component; molecule) against relation to time (continuant, divided into independent and dependent; occurrent). Organ and organism: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO), Organ Function (FMP, CPRO), Phenotypic Quality (PATO), Biological Process (GO). Cell and cellular component: Cell (CL), Cellular Component (FMA, GO), Cellular Function (GO). Molecule: Molecule (ChEBI, SO, RnaO, PrO), Molecular Function (GO), Molecular Process (GO).]
[Figure: initial OBO Foundry coverage. Ontologies arranged by granularity against relation to time (continuant: independent, dependent; occurrent). Organ and organism: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO), Organ Function (FMP, CPRO), Phenotypic Quality (PATO), Organism-Level Process (GO). Cell and cellular component: Cell (CL), Cellular Component (FMA, GO), Cellular Function (GO), Cellular Process (GO). Molecule: Molecule (ChEBI, SO, RnaO, PrO), Molecular Function (GO), Molecular Process (GO).]
CRITERIA
openness
common formal language
collaborative development
evidence-based maintenance
identifiers
versioning
textual and formal definitions
CRITERIA
Orthogonality = modularity
• one ontology for each domain
• no need for mappings (which are in any case too expensive, too fragile, and too difficult to keep up-to-date as mapped ontologies change)
• everyone knows where to look to find out how to annotate each kind of data
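The orthogonality criterion can be pictured as a registry that holds exactly one ontology per domain. The question "where do I annotate this kind of data?" then has a single answer, and no cross-ontology mappings are needed. This is a minimal Python sketch. The `OntologyRegistry` class, its methods and the domain names are hypothetical, and the domain assignments are only illustrative.

```python
class OntologyRegistry:
    """Hypothetical registry enforcing one ontology per domain."""

    def __init__(self) -> None:
        self._by_domain: dict[str, str] = {}

    def register(self, domain: str, ontology: str) -> None:
        existing = self._by_domain.get(domain)
        if existing is not None and existing != ontology:
            # A second ontology for the same domain would reintroduce the
            # need for mappings, which orthogonality is meant to avoid.
            raise ValueError(f"domain {domain!r} is already covered by {existing}")
        self._by_domain[domain] = ontology

    def where_to_annotate(self, domain: str) -> str:
        """Everyone gets the same answer for a given kind of data."""
        return self._by_domain[domain]


registry = OntologyRegistry()
registry.register("cell type", "CL")
registry.register("phenotypic quality", "PATO")
registry.register("biological process", "GO")

print(registry.where_to_annotate("cell type"))  # CL

try:
    registry.register("cell type", "MyCellOntology")  # hypothetical rival
except ValueError as err:
    print(err)  # domain 'cell type' is already covered by CL
```

Rejecting the rival at registration time is the design choice that matters here. The alternative is to accept both ontologies and map between them later, which is the costly and fragile route the slide warns against.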
COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the Basic Formal Ontology (BFO)
CRITERIA
OBO Foundry
provides guidelines (traffic laws) to new groups of ontology developers in ways which can counteract current dispersion of effort
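[Figure: Basic Formal Ontology. Top division into continuant and occurrent, with the three GO branches placed beneath it: biological processes under occurrent; cellular component under independent continuant; molecular function under dependent continuant.]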
BFO: The Very Top
continuant
    independent continuant
    dependent continuant
        quality, function, role, disposition
occurrent
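The top-level division can be sketched as a small is_a tree, with the three GO branches attached where the Basic Formal Ontology slide places them. This is a minimal Python sketch. The dictionary encoding and the `is_a` and `top_category` helpers are assumptions made for illustration. They are not the form in which BFO itself is released.

```python
# Parent of each class at the very top of BFO; None marks a root.
BFO_TOP = {
    "continuant": None,
    "occurrent": None,
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "quality": "dependent continuant",
    "function": "dependent continuant",
    "role": "dependent continuant",
    "disposition": "dependent continuant",
}

# The three GO branches, placed as on the Basic Formal Ontology slide.
GO_BRANCHES = {
    "biological process": "occurrent",
    "cellular component": "independent continuant",
    "molecular function": "dependent continuant",
}

PARENT = {**BFO_TOP, **GO_BRANCHES}


def is_a(child: str, ancestor: str) -> bool:
    """True if `ancestor` is `child` itself or lies above it in the tree."""
    node = child
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT[node]
    return False


def top_category(term: str) -> str:
    """Return the root (continuant or occurrent) above `term`."""
    while PARENT[term] is not None:
        term = PARENT[term]
    return term


print(is_a("role", "continuant"))                # True
print(is_a("biological process", "continuant"))  # False
print(top_category("molecular function"))        # continuant
```

Every term in this sketch has a single path up to one of the two roots. As a result, nothing in it can be classified as both a continuant and an occurrent.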
function
- of liver: to store glycogen
- of birth canal: to enable transport
- of eye: to see
- of mitochondrion: to produce ATP
not optional; a reflection of the physical makeup of the bearer
role
optional: exists because the bearer is in some special natural, social, or institutional set of circumstances in which it does not have to be
- bearers can have more than one role: a person as student and as staff member
- roles often form systems of mutual dependence: husband / wife; first in queue / last in queue; doctor / patient; host / pathogen
- of some chemical compound: to serve as analyte in an experiment
- of a dose of penicillin in this human child: to treat a disease
- of this bacterium in a primary host: to cause infection
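The contrast between function and role can be sketched as a data type. A function is fixed when the bearer is created, because it reflects the bearer's physical makeup. Roles, by contrast, can be gained and lost, and a bearer can hold several at once. This is a minimal Python sketch. The `Bearer` class and its `take_on` and `give_up` methods are hypothetical, and they model only the optional / not-optional distinction drawn on these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Bearer:
    name: str
    # Functions are not optional: they are stored as an immutable frozenset,
    # and the class offers no method to add or remove them.
    functions: frozenset[str] = frozenset()
    # Roles depend on circumstances the bearer need not be in, so they can
    # be added and removed, and several can be held at once.
    roles: set[str] = field(default_factory=set)

    def take_on(self, role: str) -> None:
        self.roles.add(role)

    def give_up(self, role: str) -> None:
        self.roles.discard(role)


liver = Bearer("liver", functions=frozenset({"to store glycogen"}))

person = Bearer("person")
person.take_on("student")
person.take_on("staff member")  # one bearer, more than one role
print(sorted(person.roles))     # ['staff member', 'student']

person.give_up("student")       # losing a role leaves the bearer intact
print(sorted(person.roles))     # ['staff member']
```

In this sketch, a role is simply a label that can come and go. Capturing systems of mutual dependence such as doctor / patient would need an extra link between two bearers, which is left out here.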
A good solution to the silo problem must be:
• modular
• incremental
• bottom-up
• evidence-based
• revisable
• and must incorporate a strategy for motivating potential developers and users
Because the ontologies in the Foundry are built as orthogonal modules which form an incrementally evolving network:
• scientists are motivated to commit to developing ontologies, because in their own work they will need ontologies that fit into this network
• users are motivated by the assurance that the ontologies they turn to are maintained by experts
More benefits of orthogonality
• helps those new to ontology to find what they need
• helps them find models of good practice
• ensures mutual consistency of ontologies (trivially)
• and thereby ensures additivity of annotations
More benefits of orthogonality
• it rules out the sorts of simplification and partiality which may be acceptable under more pluralistic regimes
• thereby brings an obligation on the part of ontology developers to commit to scientific accuracy and domain-completeness