TRANSCRIPT
Biomedical Ontologies: The State of the Art
Barry Smith and Werner Ceusters
MIE, Sarajevo, August 30
1
Part 1 (Barry Smith): Ontologies are Representations of What is General in Reality
Part 2 (Werner Ceusters): Referent Tracking: Pinning Ontologies to Instances in Reality
2
Uses of ‘ontology’ in PubMed abstracts
3
By far the most successful: GO (Gene Ontology)
4
You’re interested in which genes control heart muscle development
17,536 results
5
[Figure: selected gene tree (pearson), gene list "all genes (14010)"; axis labels: attacked, time, control. Branch clusters are labeled:
• Puparial adhesion, molting cycle, hemocyanin
• Defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes
• Immune response, Toll regulated genes
• Amino acid catabolism, lipid metabolism
• Peptidase activity, protein catabolism, immune response]
Microarray data shows changed expression of thousands of genes.
How will you spot the patterns?
6
You’re interested in which of your hospital’s patient data is relevant to understanding how genes control heart muscle development
7
• Lab / pathology data
• EHR data
• Clinical trial data
• Family history data
• Medical imaging
• Microarray data
• Model organism data
• Flow cytometry
• Mass spec
• Genotype / SNP data
How will you spot the patterns?
How will you find the data you need?
8
One strategy for bringing order into this huge conglomeration of data is through the use of Common Data Elements
• Discipline-specific (cancer, NIAID, …)
• Do not solve the problems of balkanization (data siloes)
• Do not evolve gracefully as knowledge advances
• Support data cumulation, but do not readily support data integration and computation
9
An ontology is not a terminology
Existing term lists and CDEs
• built to serve specific data-processing needs
• in ad hoc ways
Ontologies
• designed from the start to ensure integratability and reusability of data
• by incorporating a common logical structure
10
How does the Gene Ontology work?
with thanks to Jane Lomax, Gene Ontology Consortium
11
GO provides a controlled system of representations for use in annotating data
• multi-species, multi-disciplinary, open source
• contributing to the cumulativity of scientific results obtained by distinct research communities
• compare use of kilograms, meters, seconds … in formulating experimental results
12
13
Definitions
14
Gene products involved in cardiac muscle development in humans
15
GO provides answers to three types of questions
for each gene product
• in what parts of the cell has it been identified?
• exercising what types of molecular functions?
• with what types of biological processes?
when is a particular gene product involved
• in the course of normal development?
• in the process leading to abnormality?
with what functions is the gene product associated in other biological processes?
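The three per-gene-product questions line up with GO's three branches (cellular components, molecular functions, biological processes). A minimal sketch of how annotations might be grouped to answer them follows. The gene products "GP-X" and "GP-Y" and the record layout are made up for illustration and are not taken from any GO database.

```python
from collections import defaultdict

# Hypothetical annotation records: (gene product, GO branch, term ID, term name).
# "GP-X" and "GP-Y" are invented gene products used only for this sketch.
ANNOTATIONS = [
    ("GP-X", "cellular_component", "GO:0005634", "nucleus"),
    ("GP-X", "molecular_function", "GO:0003677", "DNA binding"),
    ("GP-X", "biological_process", "GO:0019233", "sensory perception of pain"),
    ("GP-Y", "biological_process", "GO:0048265", "response to pain"),
]


def annotations_by_branch(gene_product):
    """Group one gene product's annotations by GO branch."""
    grouped = defaultdict(list)
    for product, branch, term_id, name in ANNOTATIONS:
        if product == gene_product:
            grouped[branch].append((term_id, name))
    return dict(grouped)


def products_for_term(term_id):
    """List the gene products annotated directly to a given term."""
    return sorted({p for p, _, t, _ in ANNOTATIONS if t == term_id})


if __name__ == "__main__":
    for branch, terms in annotations_by_branch("GP-X").items():
        print(branch, terms)
    print(products_for_term("GO:0048265"))
```

Grouping by branch answers "where, doing what, as part of what" for one gene product; the reverse lookup answers "which gene products are involved in this process".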
16
Some pain-related terms in GO
GO:0048265 response to pain
GO:0019233 sensory perception of pain
GO:0048266 behavioral response to pain
GO:0019234 sensory perception of fast pain
GO:0019235 sensory perception of slow pain
GO:0051930 regulation of sensory perception of pain
GO:0050967 detection of electrical stimulus during sensory perception of pain
GO:0050968 detection of chemical stimulus involved in sensory perception of pain
GO:0050966 detection of mechanical stimulus involved in sensory perception of pain
17
GO:0050968 detection of chemical stimulus involved in sensory perception of pain
18
GO provides a tool for algorithmic reasoning
19
Hierarchical view representing relations between represented types
20
GO allows a new kind of biological research, based on analysis and comparison of the massive quantities of annotations linking GO terms to gene products
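One simple form of algorithmic reasoning over such a hierarchy is to follow parent links upward, so that a gene product annotated to a specific term is also found when querying a more general one. The sketch below uses a few of the pain-related term IDs listed earlier. The parent links and the gene products ("GP-A" and so on) are assumptions made for illustration, not an extract of the actual GO graph.

```python
# Illustrative fragment only: the parent links are assumed for this sketch.
PARENTS = {
    "GO:0019234": [("is_a", "GO:0019233")],     # fast pain -> sensory perception of pain
    "GO:0019235": [("is_a", "GO:0019233")],     # slow pain -> sensory perception of pain
    "GO:0050968": [("part_of", "GO:0019233")],  # detection of chemical stimulus ...
    "GO:0048266": [("is_a", "GO:0048265")],     # behavioral response -> response to pain
    "GO:0019233": [],
    "GO:0048265": [],
}


def ancestors(term, relations=("is_a", "part_of")):
    """Return every term reachable from `term` via the given relations."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in PARENTS.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def products_annotated_under(term, annotations):
    """Gene products annotated to `term` itself or to any term beneath it."""
    return sorted({p for p, t in annotations if t == term or term in ancestors(t)})


if __name__ == "__main__":
    # Hypothetical (gene product, term) annotations.
    annotations = [("GP-A", "GO:0019234"), ("GP-B", "GO:0050968"), ("GP-C", "GO:0048266")]
    print(products_annotated_under("GO:0019233", annotations))
```

Restricting `relations` to `("is_a",)` gives a stricter query that ignores part_of links.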
21
One standard method
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers
using functional information captured by GO for given gene product types
identified 189 as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
22
Uses of GO in studies of:
• Biomedical discovery acceleration, with applications to craniofacial development. PMID: 19325874
• Persistent changes in spinal cord gene expression after recovery from inflammatory hyperalgesia: a preliminary study on pain memory. PMID: 18366630
• Spinal cord transcriptional profile analysis reveals protein trafficking and RNA processing as prominent processes regulated by tactile allodynia. PMID: 17069981
• Immune system involvement in abdominal aortic aneurisms. PMID: 17634102
23
$100 mill. invested in literature curation using GO
over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO
ontologies provide the basis for capturing biological theories in computable form
24
GO is amazingly successful in overcoming problems of balkanization
but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of diseases, symptoms, …
25
Extending the GO methodology to other domains of biology and medicine
26
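The Open Biomedical Ontologies (OBO) Foundry
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
Rows (GRANULARITY): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE
ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)
MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)
27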
Ontology / Scope / URL / Custodians
Cell Ontology (CL)
• Scope: cell types from prokaryotes to mammals
• URL: obo.sourceforge.net/cgi-bin/detail.cgi?cell
• Custodians: Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI)
• Scope: molecular entities
• URL: ebi.ac.uk/chebi
• Custodians: Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO)
• Scope: anatomical structures in human and model organisms
• URL: (under development)
• Custodians: Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA)
• Scope: structure of the human body
• URL: fma.biostr.washington.edu
• Custodians: JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO)
• Scope: design, protocol, data instrumentation, and analysis
• URL: fugo.sf.net
• Custodians: FuGO Working Group
Gene Ontology (GO)
• Scope: cellular components, molecular functions, biological processes
• URL: www.geneontology.org
• Custodians: Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO)
• Scope: qualities of anatomical structures
• URL: obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value
• Custodians: Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO)
• Scope: protein types and modifications
• URL: (under development)
• Custodians: Protein Ontology Consortium
Relation Ontology (RO)
• Scope: relations
• URL: obo.sf.net/relationship
• Custodians: Barry Smith, Chris Mungall
RNA Ontology (RnaO)
• Scope: three-dimensional RNA structures
• URL: (under development)
• Custodians: RNA Ontology Consortium
Sequence Ontology (SO)
• Scope: properties and features of nucleic sequences
• URL: song.sf.net
• Custodians: Karen Eilbeck
28
OBO Foundry
recognized by NIH as framework to address mandates for re-usability of data collected through Federally funded research
see NIH PAR-07-425: Data Ontologies for Biomedical Research (R01)
29
Initial Candidate Members
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– ChEBI Chemical Ontology
– PATO Phenotype (Quality) Ontology
– FMA Foundational Model of Anatomy
– ChEBI Chemical Entities of Biological Interest
– CARO Common Anatomy Reference Ontology
– PRO Protein Ontology
The OBO Foundry
30
Under development
– Disease Ontology
– Infectious Disease Ontology
– Mammalian Phenotype Ontology
– Plant Trait Ontology
– Environment Ontology
– Ontology for Biomedical Investigations
– Behavior Ontology
– RNA Ontology
– RO Relation Ontology
The OBO Foundry
31
The Open Biomedical Ontologies (OBO) Foundry
[Slide repeats the OBO Foundry coverage grid: granularity (organ and organism; cell and cellular component; molecule) against relation to time (independent continuant, dependent continuant, occurrent)]
32
OBO Foundry is organized in terms of Basic Formal Ontology
Each Foundry ontology can be seen as an extension of a single upper level ontology (BFO)
33
Basic Formal Ontology (BFO)
• Continuant
  – Independent Continuant
  – Dependent Continuant
• Occurrent (Process, Event)
http://ifomis.uni-saarland.de/bfo/
34
Fundamental Dichotomy
• Continuants preserve their identity through change
vs.
• Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
– exist only in their phases
– have all their parts of necessity
35
Ontology and Referent Tracking
[Figure: two levels, types above and instances below]
types:
• Continuant
  – Independent Continuant (thing)
  – Dependent Continuant (quality)
• Occurrent (process, event)
instances: shown as dots beneath each type
36
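rationale of OBO Foundry coverage (homesteading principle)
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
Rows (GRANULARITY): ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE
ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cellular Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)
• Occurrent: Cellular Process (GO)
MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)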
37
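Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
Rows (GRANULARITY): COMPLEX OF ORGANISMS; ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE
Side label spanning the grid: ENVIRONMENT
COMPLEX OF ORGANISMS
• Independent continuant: Family, Community, Deme, Population
• Dependent continuant: Population Phenotype
• Occurrent: Population Process
ORGAN AND ORGANISM
• Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
• Independent continuant: Cell (CL); Cell Component (FMA, GO)
• Dependent continuant: Cellular Function (GO)
MOLECULE
• Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• Dependent continuant: Molecular Function (GO)
• Occurrent: Molecular Process (GO)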
38
The Gene Ontology (GO)
• Continuant
  – Independent Continuant: cell component
  – Dependent Continuant: molecular function
• Occurrent: biological process
Kumar A., Smith B, Borgelt C. Dependence relationships between Gene Ontology terms based on TIGR gene product annotations. CompuTerm 2004, 31-38.
Bada M, Hunter L. Enrichment of OBO Ontologies. J Biomed Inform. 2006 Jul 26
39
Users of BFO
GO / OBO Foundry
NCI BiomedGT
SNOMED CT
ACGT Clinical Genomics Trials on Cancer – Master Ontology / Formbuilder (Case Report Forms for Cancer Clinical Trials)
Ontology for Risks Against Patient Safety (RAPS) (EU)
40
Users of BFO
MediCognos / Microsoft Healthvault
Cleveland Clinic Semantic Database in Cardiothoracic Surgery
Major Histocompatibility Complex (MHC) Ontology (NIAID)
Neuroscience Information Framework Standard (NIFSTD)
41
IDO Infectious Disease Ontology
• MITRE, Mount Sinai, UTSouthwestern – Influenza
• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Case Western Reserve – Infective Endocarditis
• University of Michigan – Brucellosis
42
Users of BFO
Interdisciplinary Prostate Ontology (IPO)
Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research
Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials
Ontology for General Medical Science
43
depends_on
• Continuant
  – Independent Continuant (thing)
  – Dependent Continuant (quality)
• Occurrent (process, event)
quality depends_on bearer
44
Specifically dependent continuants
• the quality of whiteness of this cheese
• your role as lecturer
• the disposition of this patient to experience diarrhea
45
depends_on
• Continuant
  – Independent Continuant (thing)
  – Dependent Continuant (quality)
• Occurrent (process)
temperature depends_on bearer
46
Realizable dependent continuants
plan
function
role
disposition
capability
tendency
continuants
47
Their realizations
execution
expression
exercise
realization
application
course
occurrents
48
• Continuant
  – Independent Continuant
  – Dependent Continuant
    – Non-realizable Dependent Continuant (quality)
    – Realizable Dependent Continuant (function, role, disposition)
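The continuant hierarchy and the pairing of realizables with their realizations can be sketched as a small class tree. This is only an illustration under stated assumptions: the root class `Entity`, the class names, and the pairing of the two slide lists item by item in the order shown are choices made here, not part of any official BFO release.

```python
class Entity:
    """Assumed common root; introduced only for this sketch."""


class Continuant(Entity):
    """Preserves its identity through change."""


class Occurrent(Entity):
    """Has temporal parts; unfolds in successive phases."""


class IndependentContinuant(Continuant):
    pass


class DependentContinuant(Continuant):
    pass


class Quality(DependentContinuant):
    """Non-realizable dependent continuant."""


class RealizableDependentContinuant(DependentContinuant):
    pass


class Function(RealizableDependentContinuant):
    pass


class Role(RealizableDependentContinuant):
    pass


class Disposition(RealizableDependentContinuant):
    pass


class Process(Occurrent):
    pass


# Realizable (a continuant) -> name of its realization (an occurrent),
# pairing the two slide lists item by item (an assumption of this sketch).
REALIZED_AS = {
    "plan": "execution",
    "function": "expression",
    "role": "exercise",
    "disposition": "realization",
    "capability": "application",
    "tendency": "course",
}


def is_realizable(cls):
    """True if the class falls under RealizableDependentContinuant."""
    return issubclass(cls, RealizableDependentContinuant)


if __name__ == "__main__":
    print(is_realizable(Disposition), is_realizable(Quality))
    print(REALIZED_AS["disposition"])
```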
49
realization depends_on disposition
• Continuant
  – Independent Continuant (bearer)
  – Dependent Continuant (disposition)
• Occurrent (process of realization)
50
Dependence
a is dependent on b =def. a is necessarily such that if b ceases to exist then a ceases to exist
51
Specifically Dependent Continuants
Specifically Dependent Continuant
• Quality, Pattern
• Realizable Dependent Continuant
if any bearer ceases to exist, then the quality or function ceases to exist
• the color of my skin
• the function of my heart to pump blood
• my weight
52
Generically Dependent Continuants
Generically Dependent Continuant
• Information Object
• Gene Sequence
if one bearer ceases to exist, then the entity can survive, because there are other bearers (copyability)
• the pdf file on my laptop
• the DNA (sequence) in this chromosome
53
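The difference between the two kinds of dependence can be made concrete with a toy model: a specifically dependent entity is tied to one bearer, while a generically dependent entity survives as long as at least one bearer remains. The class names and the second bearer ("backup server") are assumptions introduced for this sketch.

```python
class Bearer:
    """An independent continuant that can cease to exist."""

    def __init__(self, name):
        self.name = name
        self.exists = True


class SpecificallyDependent:
    """Exists only while its one particular bearer exists."""

    def __init__(self, name, bearer):
        self.name = name
        self.bearer = bearer

    @property
    def exists(self):
        return self.bearer.exists


class GenericallyDependent:
    """Exists while at least one of its bearers exists (copyability)."""

    def __init__(self, name, bearers):
        self.name = name
        self.bearers = list(bearers)

    @property
    def exists(self):
        return any(bearer.exists for bearer in self.bearers)


if __name__ == "__main__":
    skin = Bearer("my skin")
    color = SpecificallyDependent("the color of my skin", skin)

    laptop = Bearer("my laptop")
    server = Bearer("backup server")  # hypothetical second bearer
    pdf = GenericallyDependent("the pdf file", [laptop, server])

    laptop.exists = False
    print(pdf.exists)    # the copy on the server remains
    skin.exists = False
    print(color.exists)  # the quality goes with its bearer
```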
Four distinct classificatory tasks
1. of people (patients, carriers, …)
2. of diseases (cases, instances, problems, …)
3. of courses of disease (symptoms, treatments…)
4. of representations (records, observations, data, diagnoses…)
ICD confuses 1. and 2.
HL7 and most standard terminologies confuse 2. and 4.
54
Four distinct BFO categories
1. person (patient, carrier, …) – independent continuant
2. disease (case, instance, problem, …) – specifically dependent continuant
3. course of disease (symptom, treatment, …) – occurrent
4. representation (record, datum, diagnosis, …) – generically dependent continuant
55
Four distinct BFO categories
1. people (patients, carriers, …) – independent continuants
2. diseases (cases, instances, problems, …) – dispositions
3. courses of disease (symptoms, treatments, …) – realizations of dispositions
4. representations (records, data, diagnoses, …) – generically dependent continuants
56
Big Picture (with thanks to Richard Scheuermann)
57
A disease is a disposition rooted in a physical disorder in the organism and realized in pathological processes.
• etiological process produces disorder
• disorder bears disposition
• disposition realized_in pathological process
• pathological process produces abnormal bodily features
• abnormal bodily features recognized_as signs & symptoms
• signs & symptoms used_in interpretive process
• interpretive process produces diagnosis
58
Elucidation of Primitive Terms
• ‘bodily feature’ - an abbreviation for a physical component, a bodily quality, or a bodily process.
• disposition - an attribute describing the propensity to initiate certain specific sorts of processes when certain conditions are satisfied.
• clinically abnormal - some bodily feature that (1) is not part of the life plan for an organism of the relevant type (unlike aging or pregnancy), (2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and (3) is such that the elevated risk exceeds a certain threshold level.*
*Compare: baldness
59
Definitions - Foundational Terms
Disorder =def. – A causally linked combination of physical components that is clinically abnormal.
Pathological Process =def. – A bodily process that is a manifestation of a disorder and is clinically abnormal.
Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.
60
Dispositions and Predispositions
All diseases are dispositions; not all dispositions are diseases.
A predisposition is a disposition.
Predisposition to Disease of Type X =def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing the disease X.
HNPCC is caused by a disorder (mutation) in a DNA mismatch repair gene that disposes to the acquisition of additional mutations from defective DNA repair processes, and thus is a predisposition to the development of colon cancer.
61
Definitions - Clinical Evaluation Terms
Sign =def. – A bodily feature of a patient that is observed in a physical examination and is deemed by the clinician to be of clinical significance. (Objectively observable features)
Symptom =def. – An experienced bodily feature of a patient that is observed by and observable only by the patient and is of the type that can be hypothesized by a patient to be a realization of a disease. (A restricted family of phenomena including pain, nausea, anger, drowsiness, which are of their nature experienced in the first person)
Symptoms are subjective. But this does not mean that there is no objective fact of the matter as to whether a given symptom exists.
62
Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
  produces
Disorder - necrotic liver
  bears
Disposition (disease) - cirrhosis
  realized_in
Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
  produces
Abnormal bodily features
  recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out cirrhosis
  suggests
Laboratory tests
  produces
Test results - elevated liver enzymes in serum
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
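The chain from etiological process to diagnosis can be written down as subject, relation, object triples and walked step by step. The sketch below fills the generic chain with the cirrhosis values from the slide. The function names, the dictionary layout, and the omission of the hypothesis and laboratory-test loop are simplifications assumed here.

```python
# Generic chain of the disease model as (subject, relation, object) triples.
CHAIN = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition (disease)"),
    ("disposition (disease)", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]

# Cirrhosis values taken from the slide; the lab-test loop is left out here.
CIRRHOSIS = {
    "etiological process": "phenobarbital-induced hepatic cell death",
    "disorder": "necrotic liver",
    "disposition (disease)": "cirrhosis",
    "pathological process": "abnormal tissue repair with cell proliferation and fibrosis",
    "signs & symptoms": "fatigue, anorexia (symptoms); jaundice, splenomegaly (signs)",
    "diagnosis": "patient X has a disorder that bears the disease cirrhosis",
}


def trace(start, chain=CHAIN):
    """Follow the chain from `start` and return the triples visited in order."""
    next_link = {subject: (relation, obj) for subject, relation, obj in chain}
    steps = []
    current = start
    while current in next_link:
        relation, obj = next_link[current]
        steps.append((current, relation, obj))
        current = obj
    return steps


def describe(case, start="etiological process"):
    """Render each step, substituting case-specific values where available."""
    return [
        f"{case.get(subject, subject)} --{relation}--> {case.get(obj, obj)}"
        for subject, relation, obj in trace(start)
    ]


if __name__ == "__main__":
    for line in describe(CIRRHOSIS):
        print(line)
```

Starting the trace at "disorder" rather than "etiological process" gives the part of the chain that a clinician reasons backward over when moving from a diagnosis to its physical basis.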
63
Influenza - infectious
Etiological process - infection of airway epithelial cells with influenza virus
  produces
Disorder - viable cells with influenza virus
  bears
Disposition (disease) - flu
  realized_in
Pathological process - acute inflammation
  produces
Abnormal bodily features
  recognized_as
Symptoms - weakness, dizziness
Signs - fever
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out influenza
  suggests
Laboratory tests
  produces
Test results - elevated serum antibody titers
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease flu
But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).
64
Huntington’s Disease - genetic
Etiological process - inheritance of >39 CAG repeats in the HTT gene
  produces
Disorder - chromosome 4 with abnormal mHTT
  bears
Disposition (disease) - Huntington’s disease
  realized_in
Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
  produces
Abnormal bodily features
  recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and swallowing
Symptoms & Signs
  used_in
Interpretive process
  produces
Hypothesis - rule out Huntington’s
  suggests
Laboratory tests
  produces
Test results - molecular detection of the HTT gene with >39 CAG repeats
  used_in
Interpretive process
  produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease
65
HNPCC - genetic pre-disposition
Etiological process - inheritance of a mutant mismatch repair gene
  produces
Disorder - chromosome 3 with abnormal hMLH1
  bears
Disposition (disease) - Lynch syndrome
  realized_in
Pathological process - abnormal repair of DNA mismatches
  produces
Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)
  bears
Disposition (disease) - non-polyposis colon cancer
  realized_in
Symptoms (including pain)
66
Definition: Etiology
Etiological Process =def. – A process in an organism that leads to a subsequent disorder.
Example: toxic chemical exposure resulting in a mutation in the genomic DNA of a cell; infection of a human with a pathogenic virus; inheritance of two defective copies of a metabolic gene
The etiological process creates the physical basis of that disposition to pathological processes which is the disease.
67
Definitions - Diagnosis
Clinical Picture =def. – A representation of a clinical phenotype that is inferred from the combination of laboratory, image and clinical findings about a given patient.
Diagnosis =def. – A representation of a conclusion of an interpretive process that has as input a clinical picture of a given patient and as output an assertion to the effect that the patient has a disease of such and such a type.
68
Definitions - Qualities
Manifestation of a Disease =def. – A bodily feature of a patient that is (a) a deviation from clinical normality that exists in virtue of the realization of a disease and (b) is observable.
Observability includes being observable through elicitation of a response or through the use of special instruments.
Preclinical Manifestation of a Disease =def. – A manifestation of a disease that exists prior to its becoming detectable in a clinical history taking or physical examination.
Clinical Manifestation of a Disease =def. – A manifestation of a disease that is detectable in a clinical history taking or physical examination.
Phenotype =def. – A (combination of) bodily feature(s) of an organism determined by the interaction of its genetic make-up and environment.
Clinical Phenotype =def. – A clinically abnormal phenotype.
69