Page 1: The OBO Foundry

The OBO Foundry

Barry Smith

Center of Excellence in Bioinformatics & Life Sciences, University at Buffalo

IFOMIS, Saarland University

http://ontology.buffalo.edu/smith

Standards and Ontology

Page 2: The OBO Foundry

we are accumulating huge amounts of sequence data, image data, pharma data, ...

how do we know what data we have?

how do I know what data you have?

how do we know what data we don’t have?

how do we make different sorts of data combinable, as we need to do in large domains such as neurodevelopment, immunology, cancer ...?

Page 3: The OBO Foundry

genomic medicine, molecular medicine, translational medicine, personalized medicine ... need

methods for data integration to enable reasoning across data at multiple granularities

to identify biomedically relevant relations on the side of the entities themselves

Page 4: The OBO Foundry

Page 5: The OBO Foundry

we need semantic annotation of data

where in the body?

what kind of disease process?

= we need ontologies

Page 6: The OBO Foundry

how to create broad-coverage semantic annotation systems for biomedicine?

Semantic Web, Moby, wikis, etc.

let a million flowers (and weeds) bloom

to create integration, rely on (automatically generated?) post hoc mappings

Page 7: The OBO Foundry

most successful, thus far: UMLS

built by trained experts

massively useful for information retrieval and information integration

UMLS Metathesaurus: a system of post hoc mappings between separately built source vocabularies

Page 8: The OBO Foundry

Page 9: The OBO Foundry

UMLS-based mappings fall short of creating interoperability

because local usage is respected

regimentation frowned upon, no concern for cross-framework consistency

UMLS terminologies have different grades of formal rigor, different degrees of completeness, different update policies

Page 10: The OBO Foundry

with UMLS-based annotations

we can know what data we have (via term searches), but it is noisy

we can map between data at single granularities (via ‘synonyms’), but synonymy information is noisy

how do we know what data we don’t have?

how do we reason with data (as at the molecular level), when there is no common logical backbone?

Page 11: The OBO Foundry

for science

to develop high quality annotation resources in a collaborative, community effort?

create an evolutionary path towards improvement of terminologies, of the sort we find elsewhere in science

find ways to reward early adopters of the results

what is to be done?

Page 12: The OBO Foundry

for science

science works out from a consensus core, and strives to isolate and resolve inconsistencies as it extends at the fringes

we need to create a consensus core

start with what for human beings are trivialities (low hanging fruit) and work out from there

for science, consistency is a sine qua non

Page 13: The OBO Foundry

FMA

[Figure: Foundational Model of Anatomy. Diagram linking anatomical terms by is_a and part_of relations. Terms shown: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura.]

Page 14: The OBO Foundry

for science

include ontologies corresponding to the basic biomedical sciences in the core

clinical medicine relies on anatomy and molecular biology to provide integration across medical specialisms

Page 15: The OBO Foundry

for science

where do we find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development, histology in different model organisms?

but we need more

Page 16: The OBO Foundry

Page 17: The OBO Foundry

what makes GO so wildly successful?

Page 18: The OBO Foundry

The methodology of annotations

science basis of the GO: trained experts curating peer-reviewed literature

different model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with gene products in a coordinated way

Page 19: The OBO Foundry

A set of standardized textual descriptions of

cellular locations

molecular functions

biological processes

used to annotate the entities represented in the major biochemical databases

thereby creating integration across these databases and making them available to semantic search
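As a rough illustration of how shared annotation terms can integrate separate databases, the following Python sketch indexes records from two databases by the ontology terms used to annotate them. The database names, record identifiers and `EX:` term identifiers are all invented for the example; they are not real GO identifiers and do not come from any actual resource.

```python
from collections import defaultdict

# Hypothetical annotations: (database, record id, ontology term id).
# The "EX:" term ids are placeholders, not real GO identifiers.
ANNOTATIONS = [
    ("db_a", "A001", "EX:0000001"),
    ("db_a", "A002", "EX:0000002"),
    ("db_b", "B917", "EX:0000001"),
    ("db_b", "B918", "EX:0000003"),
]


def index_by_term(annotations):
    """Group (database, record) pairs under the term that annotates them."""
    index = defaultdict(set)
    for database, record, term in annotations:
        index[term].add((database, record))
    return index


def search(index, term):
    """Return every record annotated with `term`, across all databases."""
    return sorted(index.get(term, set()))


index = index_by_term(ANNOTATIONS)
hits = search(index, "EX:0000001")
print(hits)  # [('db_a', 'A001'), ('db_b', 'B917')]
```

Because both databases use the same term identifier, a single lookup returns records from each of them; no mapping between local vocabularies is needed.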

Page 20: The OBO Foundry

what cellular component?

what molecular function?

what biological process?

Page 21: The OBO Foundry

This process leads to improvements and extensions of the ontology

which in turn leads to better annotations

a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself

RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form

Page 22: The OBO Foundry

Five bangs for your GO buck

science base

cross-species database integration

cross-granularity database integration

through links to the things which are of biomedical relevance

semantic searchability links people to software

Page 23: The OBO Foundry

but now

need to improve the quality of GO to support more rigorous logic-based reasoning across the data annotated in its terms

need to extend the GO by engaging ever broader community support for the addition of new terms and for the correction of errors

Page 24: The OBO Foundry

but also

need to extend the methodology to other domains, including clinical domains

need for

disease ontology

immunology ontology

symptom (phenotype) ontology

clinical trial ontology ...

Page 25: The OBO Foundry

the problem

existing clinical vocabularies are of variable quality and low mutual consistency

need for prospective standards to ensure mutual consistency and high quality of clinical counterparts of GO

need to ensure consistency of the new clinical ontologies with the basic biomedical sciences

if we do not start now, the problem will only get worse

Page 26: The OBO Foundry

the solution

establish common rules governing best practices for creating ontologies and for using these in annotations

apply these rules to create a complete suite of orthogonal interoperable biomedical reference ontologies

this solution is already being implemented

Page 27: The OBO Foundry

First step (2003)

a shared portal for (so far) 58 ontologies (low regimentation)

http://obo.sourceforge.net NCBO BioPortal

Page 28: The OBO Foundry

Page 29: The OBO Foundry

Second step (2004)

reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
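The osteoblast entry on this slide is a set of `tag: value` lines, so it can be read mechanically. The sketch below is a deliberately simplified reader written for this one example, not a complete parser for the OBO format (it ignores comments, trailing modifiers and cross-reference lists); the function name `parse_stanza` is made up for illustration.

```python
# The term entry from the slide, one "tag: value" pair per line.
STANZA = '''\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict mapping each tag to a list of values."""
    term = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ": " only, so ids such as CL:0000062 stay intact.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value.strip())
    return term


term = parse_stanza(STANZA)
# Split each relationship value into (relation name, target id).
relations = [tuple(v.split(" ", 1)) for v in term.get("relationship", [])]

print(term["id"][0], term["name"][0])
print(relations)
```

Tags that can repeat, such as `relationship`, are kept as lists so that both `develops_from` targets survive parsing.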

Page 30: The OBO Foundry

Third step (2006)

The OBO Foundry

http://obofoundry.org/

Page 31: The OBO Foundry

The OBO Foundry

a family of interoperable gold standard biomedical reference ontologies to serve the annotation of, inter alia,

scientific literature

model organism databases

clinical trial data

Page 32: The OBO Foundry

A prospective standard

designed to guarantee interoperability of ontologies from the very start (contrast to: post hoc mapping)

established March 2006

12 initial candidate OBO ontologies – focused primarily on basic science domains

several being constructed ab initio

by influential consortia who have the authority to impose their use on large parts of the relevant communities.

Page 33: The OBO Foundry

undergoing rigorous reform:

GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology

new:

CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology

Page 34: The OBO Foundry

Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

Page 35: The OBO Foundry

Rows: GRANULARITY. Columns: RELATION TO TIME.

GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

Annotations plus ontologies yield an ever-growing computer-interpretable map of biological reality.

Page 36: The OBO Foundry

Building out from the original GO

Rows: GRANULARITY. Columns: RELATION TO TIME.

GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

Page 37: The OBO Foundry

Under consideration:

Disease Ontology (DO)

Biomedical Image and Image Process Ontology (BiiO)

Upper Biomedical Ontology (OBO UBO)

Ontology of Biomedical Investigations (OBI)

Clinical Trial Ontology (CTO)

Page 38: The OBO Foundry

OBO Foundry = a subset of OBO ontologies, whose developers have agreed in advance to accept a common set of principles reflecting best practice in ontology development designed to ensure

tight connection to the biomedical basic sciences

compatibility

interoperability, common relations

formal robustness

support for logic-based reasoning

Page 39: The OBO Foundry

CRITERIA

The ontology is OPEN and available to be used by all.

The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.

The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.

Page 40: The OBO Foundry

CRITERIA

UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.

ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.

Page 41: The OBO Foundry

for science

if we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts

orthogonality of ontologies implies additivity of annotations
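One way to picture this claim: if each ontology covers its own domain, then annotations from a second ontology can simply be added to those from the first. The sketch below is only an illustration under that assumption. The ontology prefixes `EXA` and `EXP`, the domain labels, the record ids and both helper functions are hypothetical.

```python
# Hypothetical ontologies, each declaring the single domain it covers.
ONTOLOGY_DOMAIN = {
    "EXA": "anatomy",
    "EXP": "process",
}


def is_orthogonal(ontology_domain):
    """True if no domain is claimed by more than one ontology."""
    domains = list(ontology_domain.values())
    return len(domains) == len(set(domains))


def add_annotations(first, second):
    """Combine two annotation maps (record id -> set of term ids) by union."""
    merged = {record: set(terms) for record, terms in first.items()}
    for record, terms in second.items():
        merged.setdefault(record, set()).update(terms)
    return merged


anatomy_annotations = {"rec1": {"EXA:0000010"}, "rec2": {"EXA:0000020"}}
process_annotations = {"rec1": {"EXP:0000007"}}

orthogonal = is_orthogonal(ONTOLOGY_DOMAIN)
combined = add_annotations(anatomy_annotations, process_annotations)
print(orthogonal, sorted(combined["rec1"]))
```

With two ontologies competing for the same domain, `is_orthogonal` returns `False`, which is the situation in which added annotations could say conflicting things about the same aspect of a record.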

Page 42: The OBO Foundry

CRITERIA

IDENTIFIERS: The ontology possesses a unique identifier space within OBO.

VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.

The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.
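The IDENTIFIERS criterion lends itself to a mechanical check. The following sketch assumes that identifiers have the shape `PREFIX:digits`, as in the `CL:0000062` example shown earlier in the talk, and reports any prefix used by more than one ontology. The ontology names, the `EXA` prefix and the function names are invented for illustration; this is not an official Foundry validation tool.

```python
import re

# Assumed identifier shape: PREFIX:digits, as in the CL:0000062 example.
ID_PATTERN = re.compile(r"^([A-Za-z]+):(\d+)$")


def id_prefix(term_id):
    """Return the prefix of a term id, or raise ValueError if it is malformed."""
    match = ID_PATTERN.match(term_id)
    if match is None:
        raise ValueError(f"malformed identifier: {term_id!r}")
    return match.group(1)


def find_shared_prefixes(ontologies):
    """Map each prefix used by more than one ontology to the ontologies using it.

    `ontologies` maps an ontology name to an iterable of its term ids.
    """
    owners = {}
    for name, term_ids in ontologies.items():
        for term_id in term_ids:
            owners.setdefault(id_prefix(term_id), set()).add(name)
    return {p: sorted(names) for p, names in owners.items() if len(names) > 1}


# "toy_anatomy" and "toy_clash" are hypothetical ontologies sharing a prefix.
ONTOLOGIES = {
    "cell": ["CL:0000062", "CL:0000055"],
    "toy_anatomy": ["EXA:0000001"],
    "toy_clash": ["EXA:0000002"],
}

clashes = find_shared_prefixes(ONTOLOGIES)
print(clashes)  # {'EXA': ['toy_anatomy', 'toy_clash']}
```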

Page 43: The OBO Foundry

CRITERIA

CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.

DOCUMENTATION: The ontology is well-documented.

USERS: The ontology has a plurality of independent users.

Page 44: The OBO Foundry

CRITERIA

COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*

* Smith et al., Genome Biology 2005, 6:R46

Page 45: The OBO Foundry

OBO Relation Ontology

Foundational: is_a, part_of

Spatial: located_in, contained_in, adjacent_to

Temporal: transformation_of, derives_from, preceded_by

Participation: has_participant, has_agent
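Unambiguously defined relations are what make automated reasoning possible. As a small sketch, taking `part_of` to be transitive (as the Relation Ontology paper defines it), the code below derives the links that follow from a set of asserted ones. The term names `term_a`, `term_b` and `term_c` are placeholders, not entries from any ontology.

```python
def transitive_closure(edges):
    """Return all (x, y) pairs reachable by following one or more edges.

    `edges` is a set of (child, parent) pairs for ONE transitive relation.
    """
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a -> b and b -> d together imply a -> d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical asserted part_of links between placeholder terms.
PART_OF = {("term_a", "term_b"), ("term_b", "term_c")}

inferred = transitive_closure(PART_OF) - PART_OF
print(inferred)  # {('term_a', 'term_c')}
```

The same function could be applied to `is_a`, which is also transitive; relations that are not transitive, such as `adjacent_to`, should not be passed through it.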

Page 46: The OBO Foundry

IT WILL GET HARDER

Further criteria will be added over time in light of lessons learned in order to bring about a gradual improvement in the quality of Foundry ontologies

ALL FOUNDRY ONTOLOGIES WILL BE SUBJECT TO CONSTANT UPDATE IN LIGHT OF SCIENTIFIC ADVANCE

Page 47: The OBO Foundry

IT WILL GET HARDER

But not everyone needs to join

The Foundry is not seeking to serve as a check on flexibility or creativity

ALL FOUNDRY ONTOLOGIES WILL ENCOURAGE COMMUNITY CRITICISM, CORRECTION AND EXTENSION WITH NEW TERMS

Page 48: The OBO Foundry

GOALS

to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development

CREDIT for high quality ontology development work

KUDOS for early adopters of high quality ontologies / terminologies e.g. in reporting clinical trial results

Page 49: The OBO Foundry

GOALS

to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation of new annotation schemas by each clinical research group

REUSABILITY: if data-schemas are formulated using a single well-integrated framework ontology system in widespread use, then the data will to that degree become more widely accessible and usable

Page 50: The OBO Foundry

GOALS

to serve as BENCHMARK FOR IMPROVEMENTS in discipline-focused terminology resources

once a system of interoperable reference ontologies is there, it will make sense to calibrate existing terminologies in its terms in order to achieve more robust alignment and greater domain coverage

exploit the avenue of EVIDENCE-BASED MEDICINE (NIH CLINICAL RESEARCH NETWORKS) to foster their use by clinicians

Page 51: The OBO Foundry

the vision is spreading

June 2006: establishment of MICheck:

reflects growing need for prescriptive checklists specifying the key information to include when reporting experimental results (concerning methods, data, analyses and results).

Page 52: The OBO Foundry

MICheck Foundry

MICheck: ‘a common resource for minimum information checklists’ analogous to OBO / NCBO BioPortal

MICheck Foundry: will create ‘a suite of self-consistent, clearly bounded, orthogonal, integrable checklist modules’ *

* Taylor CF, et al. Nature Biotech, in press

Page 53: The OBO Foundry

MICheck/Foundry communities

Transcriptomics (MIAME Working Group)

Proteomics (Proteomics Standards Initiative)

Metabolomics (Metabolomics Standards Initiative)

Genomics and Metagenomics (Genomic Standards Consortium)

In Situ Hybridization and Immunohistochemistry (MISFISHIE Working Group)

Phylogenetics (Phylogenetics Community)

RNA Interference (RNAi Community)

Toxicogenomics (Toxicogenomics WG)

Environmental Genomics (Environmental Genomics WG)

Nutrigenomics (Nutrigenomics WG)

Flow Cytometry (Flow Cytometry Community)

Page 54: The OBO Foundry

Fourth Step (the future)

how to replicate the successes of the GO in clinical medicine?

choose two or three representative disease domains

work out reasoning challenges for those domains

work with specialists to create ontologies interoperable with OBO Foundry basic science ontologies to address these reasoning challenges

work with leaders of professional associations and of clinical trial initiatives to foster the collection of clinical data annotated in their terms

Page 55: The OBO Foundry

OBO Foundry coverage (canonical ontologies)

Rows: GRANULARITY. Columns: RELATION TO TIME.

GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

Page 56: The OBO Foundry

INDEPENDENT CONTINUANTS

organism

system

organ

organ part

tissue

cell

acellular anatomical structure

biological molecule

genome

DEPENDENT CONTINUANTS

physiology (functions)

pathology

acute stage

progressive stage

resolution stage

Page 57: The OBO Foundry

Draft Ontology for Acute Respiratory Distress Syndrome

Page 58: The OBO Foundry

Draft Ontology for Muscular Sclerosis

what data do we have?

what data do the others have?

what data do we not have?

Page 59: The OBO Foundry

Draft Ontology for Muscular Sclerosis

to apprehend what is unknown requires a complete demarcation of the relevant space of alternatives
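A minimal sketch of this idea: once the space of alternatives is listed in full, the data we do not have is whatever remains after subtracting the alternatives that existing data already covers. The three stage names are taken from the earlier slide on pathology; the record ids and the function name `missing_alternatives` are hypothetical.

```python
# The complete space of alternatives, here the three stages named earlier.
ALL_STAGES = ["acute stage", "progressive stage", "resolution stage"]

# Hypothetical data records, each annotated with one stage.
RECORD_STAGE = {
    "rec1": "acute stage",
    "rec2": "resolution stage",
    "rec3": "acute stage",
}


def missing_alternatives(all_terms, annotations):
    """Return the terms in `all_terms` that no record is annotated with."""
    covered = set(annotations.values())
    return [term for term in all_terms if term not in covered]


gaps = missing_alternatives(ALL_STAGES, RECORD_STAGE)
print(gaps)  # ['progressive stage']
```

Without the full list in `ALL_STAGES` there would be nothing to subtract from, which is why a term search over existing annotations alone cannot reveal what is absent.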

Page 60: The OBO Foundry

Goal: to advance ontology as science

http://ncor.us

National Center for Ontological Research

Page 61: The OBO Foundry
