The Future of Biomedical Informatics
Barry Smith
University at Buffalo
http://ontology.buffalo.edu/smith
1. Biomedical Informatics Needs Data
2. The Problem of Local Coding Schemes
3. NIH Policies for Data Reusability and the Growth of Clinical Research Consortia
4. Is SNOMED the Solution?
5. The Gene Ontology
6. The OBO Foundry
7. The National Center for Biomedical Ontology
8. Ontology in Buffalo
1. Biomedical Informatics Needs Data
• Four sides of the equation of translational medicine
• Biological data + clinical data
• Access + usability
Problems of gaining access to clinical data
1. privacy, security, liability
2. incentives (value of data ...)
3. costs (training ...)
Making data (re-)usable through standards
• Standards provide
  – common structure and terminology
  – single data source for review (less redundant data)
• Standards allow
  – use of common tools and techniques
  – common training
  – single validation of data
Problems with standards
• Not all standards are of equal quality
• Once a bad standard is set in stone you are creating problems for your children and for your children’s children
• Standards, especially bad standards, have costs
2. The Problem of Local Coding Schemes
Multiple kinds of data in multiple kinds of silos
Lab / pathology data
Clinical trial data, including regulatory data
Electronic Health Record data
Patient histories (free text)
Medical imaging
Microarray data
Protein chip data
Flow cytometry
Mass spectrometry data
Genotype / SNP data
Mouse data, fly data, chicken data ...
How to find your data?
How to find other people’s data?
How to reason with data when you find it?
How to work out what data you do not have?
How to understand the significance of your own data from 3 years ago?
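A minimal sketch of why local coding schemes keep data in silos, and how a shared vocabulary lets records be pooled. Everything here is hypothetical: the site names, patient identifiers, local codes and shared term labels are invented for illustration and do not come from any real system or standard.

```python
from collections import defaultdict

# Hypothetical records from two sites, each using its own local codes.
SITE_A = [("pt-001", "GLU_HI"), ("pt-002", "BP_HI")]
SITE_B = [("x17", "hyperglyc"), ("x18", "hyperglyc")]

# Hypothetical mappings from each local scheme to one shared term label.
TO_SHARED = {
    "site_a": {"GLU_HI": "hyperglycemia", "BP_HI": "hypertension"},
    "site_b": {"hyperglyc": "hyperglycemia"},
}


def pool(records_by_site):
    """Group (site, patient id) pairs under the shared term they map to."""
    pooled = defaultdict(list)
    for site, records in records_by_site.items():
        for patient_id, local_code in records:
            shared = TO_SHARED[site].get(local_code)
            if shared is None:
                continue  # unmapped local code: the record stays in its silo
            pooled[shared].append((site, patient_id))
    return dict(pooled)


pooled = pool({"site_a": SITE_A, "site_b": SITE_B})
print(pooled["hyperglycemia"])
```

Without the shared labels, nothing in the data says that GLU_HI at one site and hyperglyc at the other describe the same thing; with them, a single lookup answers "who else has data on this?".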
3. NIH Policies for Data Reusability and the Growth of Clinical Research Consortia
Sharing Research Data: Investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why this is not possible (http://grants.nih.gov/grants/policy/data_sharing).
Program Announcement (PA) Number: PAR-07-425
Title: Data Ontologies for Biomedical Research (R01)
NIH Blueprint for Neuroscience Research (http://neuroscienceblueprint.nih.gov/)
National Cancer Institute (NCI) (http://www.cancer.gov)
National Center for Research Resources (NCRR) (http://www.ncrr.nih.gov/)
National Eye Institute (NEI) (http://www.nei.nih.gov/)
National Heart Lung and Blood Institute (NHLBI) (http://http.nhlbi.nih.gov)
National Human Genome Research Institute (NHGRI) (http://www.genome.gov)
National Institute on Alcohol Abuse and Alcoholism (NIAAA) (http://www.niaaa.nih.gov/)
National Institute of Biomedical Imaging and Bioengineering (NIBIB) (http://www.nibib.nih.gov/)
National Institute of Child Health and Human Development (NICHD) (http://www.nichd.nih.gov/)
National Institute on Drug Abuse (NIDA) (http://www.nida.nih.gov/)
National Institute of Environmental Health Sciences (NIEHS) (http://www.niehs.nih.gov/)
National Institute of General Medical Sciences (NIGMS) (http://www.nigms.nih.gov/)
National Institute of Mental Health (NIMH) (http://www.nimh.nih.gov/)
National Institute of Neurological Disorders and Stroke (NINDS) (http://www.ninds.nih.gov/)
National Institute of Nursing Research (NINR) (http://www.ninr.nih.gov)
Release/Posted Date: August 3, 2007
Letters of Intent Receipt Date(s): December 18, 2007, August 18, 2008, December 22, 2009, and August 21, 2009 for the four separate receipt dates.
Purpose. Optimal use of informatics tools and resources [data sets] depends upon explicit understandings of concepts related to the data upon which they compute. This is typically accomplished by a tool or resource adopting a formal controlled vocabulary and ontology ... that describes objects and the relationships between those objects in a formal way.
... this FOA solicits Research Project Grant (R01) applications from institutions/organizations that propose to develop an ontology that will make it possible for software to understand how two or more existing data sets relate to each other.
Currently, there is no convenient way to map the knowledge that is contained in one data set to that in another data set, primarily because of differences in language and structure. ... in some areas there are emerging standards. Examples include:
• the Unified Medical Language System (UMLS),
• the Gene Ontology, http://www.geneontology.org/,
• the work supported by the caBIG project (https://cabig.nci.nih.gov/workspaces/VCDE/),
• ontologies listed at the Open Biomedical Ontology web site (http://obo.sourceforge.net/).
This FOA will support limited awards, each of which focuses on integrating information between two (or a few very closely related) data sets in a single subject domain. The hope is that the developed vocabularies and ontologies will serve as nucleation points for other researchers in the area to build upon by adopting and extending the vocabularies and ontologies developed under this FOA.
Applicants are expected to identify and adopt emerging standards (such as those listed above) whenever possible. Applicants are also strongly encouraged to federate their data under appropriate infrastructures when possible. One potential infrastructure is provided by the Biomedical Informatics Research Network (http://www.nbirn.net). The caBIG infrastructure (http://cabig.cancer.gov) is another well established infrastructure that researchers should consider.
NIH anticipates that once important data sets in a topical area have been unified that others in that area will adopt the emerging standard.
The nucleation points should be able to interact with each other, e.g. through the use of tools that are made freely available to the research community, such as those created by the National Center for Biomedical Ontology (NCBO) (http://bioontology.org/) or by caBIG.
Another determinant of ontology acceptance is the degree to which the ontology conforms to best practices governing ontology design and construction.
Criteria have been developed, and are undergoing empirical validation, by the Vocabulary and Common Data Element Work Group of caBIG. Other criteria have been specified by the OBO Foundry (http://obofoundry.org/).
In this FOA, the applicant should specify the criteria with which the ontology will conform and the reasons that those criteria are relevant to the data sets being integrated by the proposed ontology.
Growth of Clinical and Translational Research Consortia
Examples:
• PharmGKB
• caBIG
• BIRN – Biomedical Informatics Research Network
  – BIRN Ontology Task Force
4. Is SNOMED the Solution?
medical records
SNOMED codes
The Systematized Nomenclature of Medicine
• built by College of American Pathologists
• now maintained by International Health Terminology Standards Development Organisation
• access via Virginia Tech SNOMED CT® Browser http://snomed.vetmed.vt.edu/
• (semi-) Open Source
SNOMED often includes non-perspicuous terms
FullySpecifiedName:
Coordination observable (observable entity)
FullySpecifiedName:
Coordination (observable entity)
and more:
Self-control behavior: aggression (observable entity)
Physical activity target light exercise (finding)
is a type of physical activity finding (finding)
odd bunchings
European is a ethnic group
Other European in New Zealand (ethnic group) is a ethnic group
Mixed ethnic census group is a ethnic group
Flathead is a ethnic group
• Poor modular development
• No clear strategy for improvement
• Difficult to use for coding
• A tax on world health information technology?
SNOMED embraces only some of the multiple kinds of siloed data
Lab / pathology data
Electronic Health Record data
Patient histories
Clinical trial data, including regulatory data
Medical imaging
Microarray data
Protein chip data
Flow cytometry
Mass spectrometry data
Genotype / SNP data
Mouse data, fly data, chicken data ...
5. The Gene Ontology
Open Source
Cross-Species
Impressive annotation resource
Impressive policies for maintenance
[Slide: block of raw sequence letters beginning MKVSDRRKFEKANFDEFESALNNKNDLVHC ...]
sequence of X chromosome in baker’s yeast
How to do Biology across the Genome?
[Slide: a longer block of the same kind of raw sequence data]
what cellular component?
what molecular function?
what biological process?
A strategy for translational medicine
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers.
Using functional information captured by GO for given gene product types, they identified 189 as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
GO widely used
http://ontologist.com
Benefits of GO
1. links people to data
2. links data together
• across species (human, mouse, yeast, fly ...)
• across granularities (molecule, cell, organ, organism, population)
3. links medicine to biological science
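The "links data together across species" benefit can be illustrated with a small sketch: if gene products from different organisms are annotated with the same GO term identifiers, a plain index over those identifiers is enough to find what the species have in common. The gene names below are hypothetical placeholders, and the annotations are invented for illustration; only the two term identifiers (GO:0006281, DNA repair, and GO:0007049, cell cycle) are real GO terms, used here simply as example keys.

```python
from collections import defaultdict

# Illustrative annotations: (species, gene product, GO term id).
# Gene names are hypothetical; no real annotation is being asserted.
ANNOTATIONS = [
    ("human", "gene_h1", "GO:0006281"),
    ("mouse", "gene_m1", "GO:0006281"),
    ("yeast", "gene_y1", "GO:0006281"),
    ("fly", "gene_f1", "GO:0007049"),
]


def index_by_term(annotations):
    """Build term id -> species -> set of annotated gene products."""
    index = defaultdict(lambda: defaultdict(set))
    for species, gene, term in annotations:
        index[term][species].add(gene)
    return index


def species_sharing(index, term):
    """Return the sorted species that have any gene product annotated to term."""
    return sorted(index.get(term, {}))


def shared_terms(index, species_a, species_b):
    """Return the sorted term ids annotated in both species."""
    return sorted(
        term
        for term, by_species in index.items()
        if species_a in by_species and species_b in by_species
    )


index = index_by_term(ANNOTATIONS)
print(species_sharing(index, "GO:0006281"))
print(shared_terms(index, "human", "mouse"))
```

The design point is that the join key is the shared term identifier, not anything specific to one organism's database.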
6. The OBO Foundry
a shared portal for (so far) 58 ontologies (low regimentation)
http://obo.sourceforge.net NCBO BioPortal
2003
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
(Columns give the RELATION TO TIME; rows give the GRANULARITY.)
Building out from the original GO
OBO Foundry Coordinators
Lewis (Berkeley)
Ashburner (Cambridge)
Smith (Buffalo)
Mungall (Berkeley)
The goal
all biological (biomedical) research data should cumulate to form a single, algorithmically processible, whole
http://obofoundry.org
FOUNDRY CRITERIA
• The ontology is open and available to be used by all.
• The ontology is in, or can be instantiated in, a common formal language.
• The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
CRITERIA
UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
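One way to picture the orthogonality commitment is as a check that no two ontologies define the same term. The sketch below is an assumption-laden illustration, not a Foundry tool: the ontology names and labels are hypothetical, and matching on lower-cased labels is a deliberately crude stand-in for real term comparison.

```python
from collections import defaultdict


def overlapping_labels(ontologies):
    """Return labels that appear in more than one ontology.

    `ontologies` maps an ontology name to a list of term labels.
    Labels are compared after trimming whitespace and lower-casing.
    """
    owners = defaultdict(set)
    for name, labels in ontologies.items():
        for label in labels:
            owners[label.strip().lower()].add(name)
    return {
        label: sorted(names)
        for label, names in owners.items()
        if len(names) > 1
    }


# Hypothetical ontologies with one duplicated label.
example = {
    "anatomy_a": ["Heart", "Liver"],
    "anatomy_b": ["heart ", "Kidney"],
}
conflicts = overlapping_labels(example)
print(conflicts)
```

A non-empty result marks the places where developers would need to agree on a single custodian for the term.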
Consequences
OBO Foundry is serving as a benchmark for improvements in discipline-focused terminology resources,
yielding calibration of existing terminologies and data resources and alignment of different views
Mature OBO Foundry ontologies (now undergoing reform)
Cell Ontology (CL)
Chemical Entities of Biological Interest (ChEBI)
Foundational Model of Anatomy (FMA)
Gene Ontology (GO)
Phenotypic Quality Ontology (PaTO)
Relation Ontology (RO)
Sequence Ontology (SO)
Ontologies being built to satisfy Foundry principles ab initio
Common Anatomy Reference Ontology (CARO)
Ontology for Biomedical Investigations (OBI)
Protein Ontology (PRO)
RNA Ontology (RnaO)
Subcellular Anatomy Ontology (SAO)
Ontologies in planning phase
Biobank/Biorepository Ontology (BrO, part of OBI)
Environment Ontology (EnvO)
Immunology Ontology (ImmunO)
Infectious Disease Ontology (IDO)
7. The National Center for Biomedical Ontology
NCBO
National Center for Biomedical Ontology (NIH Roadmap Center)
• Stanford Medical Informatics
• University of California, San Francisco Medical Center
• Berkeley Drosophila Genome Project
• Cambridge University Department of Genetics
• The Mayo Clinic
• University at Buffalo Department of Philosophy
8. Ontology in Buffalo
Ontology Research Group in CoE
Werner Ceusters
Louis Goldberg
Barry Smith
Robert Arp
Thomas Bittner
Maureen Donnelly
David Koepsell
Ron Rudnicki
Shahid Manzoor
Ontologies in Buffalo
Common Anatomy Reference Ontology (CARO)
Environment Ontology (EnvO)
Foundational Model of Anatomy (FMA)
Infectious Disease Ontology (IDO)
MS Ontology
Protein Ontology (PRO)
Relation Ontology (RO)
Ontologies planned
ICF Ontology
Food Ontology
Allergy Ontology
Vaccine Ontology
Ontology for Community-Based Medicine
Psychiatry Ontology