Judith Blake Biomedical Ontologies and their role in functional genomics Judith A. Blake, Ph.D. The Jackson Laboratory Functional Genomics – February 2012 Func Genomics2012

Upload: maurice-warren

Post on 11-Jan-2016



Biomedical Ontologies and their role in functional genomics

Judith A. Blake, Ph.D.
The Jackson Laboratory

Functional Genomics – February 2012

Bioinformatics: What is that?

Bioinformatics is:
• the use of computers (and persistent data structures) in pursuit of biological research
• an emerging new discipline, with its own goals, research program, and practitioners
• the fundamental tool for 21st century biology
• all of the above.

Robert J. Robbins

Topics:

• We need to coordinate the representation of information
– from genetic and genomic studies,
– as might be reported in the biomedical literature, and
– from the output of high-throughput experiments
• This is done by designing databases (e.g., MGI) and bio-ontologies (e.g., GO) to support comprehensive data integration
• Such resources enable comparative analysis between different organisms and biological systems
• The objective is to help us gain new knowledge about biological systems, particularly about the genetic components of human diseases

Managing Biological Information is Nothing New

Bird Collections at the Smithsonian Natural History Museum

[Photo: Roxy Laybourne and others, photo by Chip Clark]


The trouble with facts is that there are so many of them.

Samuel McChord Crothers, The Gentle Reader (1903)


The data integration problem

• Vast wealth of data residing in different databases
– Meaning of those records must be reconciled for data to be automatically integrated

[Figure: a science database and a medical database]


Accession File


TCTCTCCCCCGCCCCCCAGGCTCCCCCGGTCGCTCTCCTCCGGCGGTCGCCCGCGCTCGGTGGATGTGGC

TGGCAGCTGCCGCCCCCTCCCTCGCTCGCCGCCTGCTCTTCCTCGGCCCTCCGCCTCCTCCCCTCCTCCT

TCTCGTCTTCAGCCGCTCCTCTCGCCGCCGCCTCCACAGCCTGGGCCTCGCCGCGATGCCGGAGAAGAGG

CCCTTCGAGCGGCTGCCTGCCGATGTCTCCCCCATCAACTACAGCCTTTGCCTCAAGCCCGACTTGCTGG

ACTTCACCTTCGAGGGCAAGCTGGAGGCCGCCGCCCAGGTGAGGCAGGCGACTAATCAGATTGTGATGAA

TTGTGCTGATATTGATATTATTACAGCTTCATATGCACCAGAAGGAGATGAAGAAATACATGCTACAGGA

TTTAACTATCAGAATGAAGATGAAAAAGTCACCTTGTCTTTCCCTAGTACTCTGCAAACAGGTACGGGAA

CCTTAAAGATAGATTTTGTTGGAGAGCTGAATGACAAAATGAAAGGTTTCTATAGAAGTAAATATACTAC

CCCTTCTGGAGAGGTGCGCTATGCTGCTGTAACACAGTTTGAGGCTACTGATGCCCGAAGGGCTTTTCCT

TGCTGGGATGAGCCTGCTATCAAAGCAACTTTTGATATCTCATTGGTTGTTCCTAAAGACAGAGTAGCTT

TATCAAACATGAATGTAATTGACCGGAAACCATACCCTGATGATGAAAATTTAGTGGAAGTGAAGTTTGC

CCGCACACCTGTTATGTCTACATATCTGGTGGCATTTGTTGTGGGTGAATATGACTTTGTAGAAACAAGG

TCAAAAGATGGTGTGTGTGTCCGTGTTTACACTCCTGTTGGCAAAGCAGAGCAAGGAAAATTTGCGTTAG

AGGTTGCTGCTAAAACCTTGCCTTTTTATAAGGACTACTTCAATGTTCCTTATCCTCTACCTAAAATTGA

TCTCATTGCTATTGCAGACTTTGCAGCTGGTGCCATGGAGAACTGGGGCCTTGTTACTTATAGGGAGACT

GCATTGCTTATTGATCCAAAAAATTCCTGTTCTTCATCCCGCCAGTGGGTTGCTCTGGTTGTGGGACATG

AACTCGCCCATCAATGGTTTGGAAATCTTGTTACTATGGAATGGTGGACTCATCTTTGGTTAAATGAAGG

TTTTGCATCCTGGATTGAATATCTGTGTGTAGACCACTGCTTCCCAGAGTATGATATTTGGACTCAGTTT

GTTTCTGCTGATTACACCCGTGCCCAGGAGCTTGACGCCTTAGATAACAGCCATCCTATTGAAGTCAGTG

TGGGCCATCCATCTGAGGTTGATGAGATATTTGATGCTATATCATATAGCAAAGGTGCATCTGTCATCCG

AATGCTGCATGACTACATTGGGGATAAGGACTTTAAGAAAGGAATGAACATGTATTTAACCAAGTTCCAA

CAAAAGAATGCTGCCACAGAGGATCTCTGGGAAAGTTTAGAAAATGCTAGTGGTAAACCTATAGCAGCTG

From the birth of the field of genetics until a decade ago, it was generally assumed that the parental origin of a gene could have no effect on its function. In the vast majority of studies carried out during the last 90 years, this paradigm has appeared to hold true. However, with increasingly sophisticated genetic and embryological investigations in the mouse, important exceptions to this rule have been uncovered over the last decade. First, the results of nuclear transplantation experiments carried out with single-cell fertilized embryos have demonstrated an absolute requirement for both a maternally-derived and a paternally-derived pronucleus to allow full-term development (McGrath and Solter, 1983). Second, in animals that receive both homologs of certain chromosomes or subchromosomal regions from one parent and not the other (through the mating of translocation heterozygotes as described in Section 5.2.3), dramatic effects on development can be observed including enhanced or retarded growth and outright lethality (Cattanach and Kirk, 1985). Third, either of two deletions that cover a small region of mouse chromosome 17 can be transmitted normally from a father to his offspring, but these same deletions cause prenatal lethality when they are maternally transmitted (Johnson, 1974; Winking and Silver, 1984). Fourth, similar parent-of-origin effects have been observed on the phenotypes expressed by animals that carry a targeted knock-out allele at the Igf2 locus (DeChiara et al., 1991). Finally, molecular techniques have been used to directly demonstrate the expression of transcripts from one parental allele and not the other at the Igf2r locus (Barlow et al., 1991) and the H19 locus (Bartolomei et al., 1991).
The accumulated data indicate that a subset of mouse genes (on the order of 0.2%) will function differently in normal embryos depending on whether they have been inherited through the male or the female gamete, such that one allele will be expressed and the other will be silent. Genomic imprinting is the term that has been coined to describe this situation in which the phenotype expressed by a gene varies depending on its parental origin (Sapienza, 1989). Further experiments have demonstrated that, in general, the "imprint" is erased and regenerated during gametogenesis so that the function of an imprintable gene is fully determined by the sex of its progenitor alone, and not by earlier ancestors.

Crash Blossoms and other semantic ambiguities

translating what we say into what we mean: data, words and knowledge

Crash Blossoms

“Violinist Linked to JAL Crash Blossoms”

“MacArthur Flies Back to Front”

“Squad Helps Dog Bite Victim”

“Red Tape Holds Up New Bridge.”


The English Language is hard to learn, even for computers.

“Jessica Hahn Pooped After Long Day Testifying”

Focus: creating the data structures and mining the biomedical literature to provide knowledge representations, with the objective of using logical reasoning applications and predictive approaches to ‘interrogate’ very large data sets, generating new hypotheses for further experimental investigation


What is an ontology?

A biological ontology is:

A formal representation of some portion of biological reality

• what kinds of things exist?
• what are the relationships between these things?

[Figure: the terms eye, ommatidium, sense organ, and eye disc, linked by is_a, part_of, and develops_from relations]


Why do we need ontologies?

Connections are not made explicit by default

• Computers are not intelligent
• We need to spell out interconnectedness of entities
– Specificity: Bone mineralization vs ossification
– Granularity: Osteocyte vs bone
– Spatial: Gill membrane and branchiostegal ray
– Perspective: Anatomy vs physiology
– Causally related entities
  • pathways
  • development
– Evolutionary: Homology and descent

Ontologies: the key to data integration

• Ontologies provide:
– rigorous, shared computable definitions for terms
– classifications and connections that can be used for database search and inference

Biomedical Ontologies

Annotation of genes and proteins using ontologies is key to data integration

Ontologies are human- and machine-readable classifications of biological knowledge.

Ontologies have:
• Terms
• Term definitions
• Relationships among terms

Good ontology design is required for data integration

• Not any old ontology will do
– Data integration served poorly by poor ontologies
• How do we know good ontologies?
– Types and classifications should be constructed according to science and should reflect nature
– Ontology constructed along lines of ontology best practices
  • http://www.obofoundry.org
  • Formal definitions and relations
  • Based on distinction between types and instances
  • Distinction between types and their labels

The Gene Ontology

• Mid-size
– ~33,700 terms in all 3 ontologies
– ~2n,nnn links (is_a, part_of, regulates)
• Each term represents a type
– Terms also have alternate labels (synonyms)
  • These do not represent distinct types
  • Humans use different labels to refer to the same biological pattern
– E.g.: endoplasmic reticulum vs ER

Ontology is not nomenclature

• A type can have many labels
– Preferred label (term)
– Synonyms, aliases
• Types are not labels
– Types are the underlying pattern
  • Identified by a formal definition
– Labels are important for doing science
  • But life existed for billions of years quite happily prior to the invention of names and labels
– Good ontology separates the underlying patterns in nature from the labels used to describe them
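The separation of types from labels can be sketched in a few lines of Python. This is only an illustration, not how GO itself is stored: the `TERMS` dictionary, `build_label_index`, and `resolve` are hypothetical names, and the example records reuse GO identifiers and synonyms quoted elsewhere in this deck.

```python
# Hypothetical in-memory term store. Each type is keyed by a stable ID;
# the preferred label and the synonyms are just names attached to it.
TERMS = {
    "GO:0006099": {
        "label": "tricarboxylic acid cycle",
        "synonyms": ["Krebs cycle", "citric acid cycle"],
    },
    "GO:0005739": {"label": "mitochondrion", "synonyms": []},
}


def build_label_index(terms):
    """Map every label and synonym (case-insensitive) to its term ID."""
    index = {}
    for term_id, record in terms.items():
        for name in [record["label"], *record["synonyms"]]:
            index[name.lower()] = term_id
    return index


LABEL_INDEX = build_label_index(TERMS)


def resolve(name):
    """Return the term ID for a label or synonym, or None if unknown."""
    return LABEL_INDEX.get(name.strip().lower())
```

Under this sketch, `resolve("Krebs cycle")` and `resolve("citric acid cycle")` return the same identifier, because both labels point at one underlying type.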

Ontologies and annotation

• Ontologies are of little practical use without annotation
– GO has ~6 million annotations linking genes and gene products to GO terms
– Mostly (but not all) from model organism databases (MODs) & human
– Same terms are shared across species
• All annotation statements have provenance
– Source/publication
– Evidence & evidence codes

Use of GO annotations

• Database search
• Database integration
• Automating further annotation
• Data mining and data analysis
– Microarray analysis:
  1. Extract cluster of co-expressed genes
  2. Analyze annotations for enrichment of certain terms
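One common way to test a gene cluster for term enrichment is a one-sided hypergeometric test. The slide does not say which statistic is meant, so the sketch below is an assumption; `hypergeom_enrichment_p` and its parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, cluster, hits):
    """Probability of seeing at least `hits` annotated genes in a cluster.

    population: total number of genes considered
    annotated:  genes in the population that carry the term
    cluster:    size of the co-expressed cluster
    hits:       cluster genes that carry the term
    """
    max_hits = min(annotated, cluster)
    total = comb(population, cluster)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total
```

A small p-value suggests the term is over-represented in the cluster. In practice many terms are tested at once, so a multiple-testing correction would normally follow.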


What is a Database?

• an organized body of related information

• In computing, a database can be defined as a structured collection of records or data that is stored in a computer so that a program can consult it to answer queries. The records retrieved in answer to queries become information that can be used to make decisions.


Mouse Genome Informatics (MGI) Database

• Comprehensive information resource about the laboratory mouse

• Provides consensus representation of the mouse genome

• International scientific community resource
• Integrated data acquisition and query capabilities

MGI Database is a Relational Database: Information is stored in tables that have relationships to each other. This facilitates query and retrieval of subsets of data.
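A minimal sketch of the relational idea, using Python's built-in `sqlite3` module. The two-table schema, the table and column names, and the rows are invented for illustration and are not the MGI schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (
            gene_id TEXT PRIMARY KEY,
            symbol  TEXT NOT NULL
        );
        CREATE TABLE go_annotation (
            gene_id  TEXT NOT NULL REFERENCES gene(gene_id),
            term     TEXT NOT NULL,
            evidence TEXT NOT NULL
        );
        """
    )
    # Made-up rows, for illustration only.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [("DEMO:1", "GeneA"), ("DEMO:2", "GeneB")],
    )
    conn.executemany(
        "INSERT INTO go_annotation VALUES (?, ?, ?)",
        [
            ("DEMO:1", "DNA binding", "IDA"),
            ("DEMO:1", "protein binding", "TAS"),
            ("DEMO:2", "protein binding", "IEA"),
        ],
    )
    return conn


def genes_with_term(conn, term):
    """Join the two tables to retrieve symbols of genes annotated to `term`."""
    rows = conn.execute(
        """
        SELECT g.symbol
        FROM gene AS g
        JOIN go_annotation AS a ON a.gene_id = g.gene_id
        WHERE a.term = ?
        ORDER BY g.symbol
        """,
        (term,),
    )
    return [symbol for (symbol,) in rows]
```

The join on `gene_id` is what lets a query pull a subset of data that spans both tables.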

Database Resource: Mouse Genome Informatics (MGI)

MGI’s primary mission is to facilitate the use of mouse as a model for human biology by providing integrated access to data on the genetics, genomics, and biology of the laboratory mouse.

Information content spans from sequence to phenotype/disease:
• sequence
• variants & polymorphisms
• gene function
• genome location
• mouse/human orthologs & maps
• strain genealogy
• expression
• tumors

[Figure: Hermansky-Pudlak syndrome, mouse model & human phenotype]

MGI integrates genetic, genomic and phenotypic data

Integrate:
• Gather data from multiple sources
• Factor out common objects
• Assemble integrated objects

• Within MGI
– Genes
– Sequence
– Expression
– Literature
– Alleles
– Phenotypes
• Between MGI and others
– Via shared sequence annotations: UniProt, EntrezGene, Ensembl
– Via shared semantic representations: Drosophila, Arabidopsis, etc.

Annotation Pipeline

Literature & Loads

• Data Acquisition
• Object Identity
• Standardizations
• Data Associations
• Integration with other bioinformatics resources

[Figure callouts: New Gene, Strain or Sequence?; Controlled Vocabularies; Evidence & Citation; Co-curation of shared objects and concepts]

Automated (mostly) Data Integration (Loads)

[Figure: data loads feeding the MGI db. Load categories: Associations, Clones, Non-mouse, Gene models and coordinates, Sequences, Vocabularies, Annotation, SNP db. Sources shown: RPCI, MGC, GenBank, RefSeq, UniProt, DFCIseq, DoTSseq, NIAseq, NCBI, VEGA, Ensembl, dbSNP, EG chimp, EG dog, EG rat, EG human, EG mouse, DFCI, DoTS, NIA, Unigene, TreeFam, Gene traps, microRNAs, UniSTS, HCOP, Homologene, GO, MP, Anatomy, Interpro, OMIM, PIRSF]

Manual (mostly) annotation of the biomedical literature

> 12,000 / year

Data acquisition is constant

Load Program | Summary of Data Loaded
Mouse EntrezGene | EntrezGene IDs for mouse markers. Plus marker-to-sequence associations from EntrezGene not already in MGD
Human/Rat EntrezGene | Nomenclature, map position and other data regarding human and rat genes. OMIM associations for human.
GenBank Seq | Mouse sequence records from GenBank
RefSeq Seq | Mouse sequence records from RefSeq
UniProt/TrEMBL Seq | Mouse sequence records from UniProt and TrEMBL
TIGR/DoTS/NIA Seq | Mouse consensus sequence records from TIGR/DoTS/NIA clusters
TIGR/DoTS/NIA Association | Associations between TIGR/DoTS/NIA cluster sequences and markers.
Ensembl Gene Model | Ensembl gene model sequences, coordinates, & associations between these & markers
NCBI Gene Model | NCBI gene model sequences, coordinates, & associations between these & markers
UniProt Association | UniProt/TrEMBL IDs and additional GenBank IDs for mouse markers. Plus GO and InterPro annotations
UniGene Association | UniGene cluster IDs for mouse markers.
EST cDNA Clone | Mouse IMAGE, NIA, MGC, Riken, cDNAs and EST sequence associations
MGC Association | MGC IDs and associations between MGC full length sequences and MGC cDNAs
RPCI Clone | RPCI 23/24 BAC clones and sequence associations
GO Vocabulary | Updated Gene Ontology (GO) vocabularies from the central GO site.
OMIM Vocabulary | Updated OMIM disease terms
MP Vocabulary | Updated MP vocabulary (from OBO-Edit)
Anatomy | Updated adult mouse anatomy ontology (from OBO-Edit)
Mapping panel | JAX, EUCIB, Copeland-Jenkins and many others
PIRSF | Mouse PIR superfamily terms and associations to markers
SNPs | Mouse SNPs from dbSNP and associations between SNPs & markers.

Who is the authority?

Mouse data for which MGI serves as the authoritative source.

Data type | Working relationship
Gene Symbol/Name | MGD makes primary assignment; coordination with HGNC, RGNC
Allele Symbol/Name | MGD makes primary assignment
Strain Designations | MGD makes primary assignment
Gene-to-nucleotide sequence association | Co-curation with NCBI
Gene-to-protein sequence association | Co-curation with UniProt
Gene Ontology (GO) annotations | MGD provides primary data set
Mammalian Phenotype Ontology | MGD develops and applies vocabulary
Gene homology data between mouse & other species | MGD curated orthology relationships
Genotype-to-phenotype data | MGD provides primary curation
Mouse model-to-human disease (OMIM) | MGD provides primary curation

Snapshot of MGI data content

MGI data statistics, March 2010

Genes (including unmapped mutants) | 36,290
Genes w/ nucleotide sequence | 29,110
Genes w/ protein sequence | 26,108
Genes annotated to GO (comprehensive) | 25,644
Mouse/human orthologs | 17,841
Mouse/rat orthologs | 16,767
Targeted alleles | 24,770
Mutant alleles in mice | 23,866
Genes w/ phenotypic alleles | 12,350
Genes w/ targeted alleles | 10,340
Human diseases w/ one or more mouse model | 999
QTL | 4,404
References | 150,341
Mouse refSNPs | 10,089,692


Having the data, we want to ask complex questions


Curators use controlled terms from structured vocabularies (ontologies) to annotate complex biological systems described in the literature

The knowledge is in the details

Keyword lists support data integration

Keyword lists standardize descriptions and enable comprehensive data retrieval

• Gene Nomenclature
• Gene/Marker Type
• Allele Type
• Assay Type
– Expression
– Mapping
• Molecular Mutation
• Inheritance Mode
• Tissue Types
• Cell Types
• Cell Lines
• Units
– Cytogenetic
– Molecular
• ES Cell Line
• Strain Nomenclature

But, keyword lists are not enough

• Sheer number of terms too much to remember and sort
– Need standardized, stable, carefully defined terms
– Need to describe different levels of detail
– So…defined terms need to be related in a hierarchy
• With structured vocabularies/hierarchies
– Parent/child relationships exist between terms
– Increased depth -> Increased resolution
– Can annotate data at appropriate level
– May query at appropriate level
• All model organism databases and genome annotation systems have the same issues

[Figure: process terms Organogenesis, Blood vessel development, Angiogenesis, Vasculogenesis]
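Annotating and querying "at the appropriate level" can be sketched as follows. The parent/child links between the four process terms are an assumed reading of the slide's figure, and the gene names and annotations are hypothetical.

```python
# Assumed parent -> children links for the process terms named on this slide.
CHILDREN = {
    "organogenesis": ["blood vessel development"],
    "blood vessel development": ["angiogenesis", "vasculogenesis"],
}

# Hypothetical annotations: each gene is annotated at the level the evidence supports.
ANNOTATIONS = {
    "geneA": "angiogenesis",
    "geneB": "vasculogenesis",
    "geneC": "blood vessel development",
    "geneD": "organogenesis",
}


def descendants(term, children=CHILDREN):
    """Return `term` plus every more specific term below it."""
    found = {term}
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def genes_for(term, annotations=ANNOTATIONS):
    """Genes annotated to `term` or to any of its descendants."""
    wanted = descendants(term)
    return sorted(gene for gene, t in annotations.items() if t in wanted)
```

A query on the broad term "blood vessel development" also returns the genes annotated to angiogenesis and vasculogenesis. A flat keyword list cannot do this without listing every specific keyword by hand.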

And so, we started the Gene Ontology (GO)

www.geneontology.org

• Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to share knowledge.

• Seeks to achieve a mutual understanding of the definition and meaning of any word used; thus we are able to support cross-database queries.

• Members agree to contribute gene product annotations and associated sequences to the GO database, thus facilitating data analysis and semantic interoperability.

What is Ontology?

• Dictionary: A branch of metaphysics concerned with the nature and relations of being.

• Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.

[Figure labels: 1606, 1700s]

A biological ontology is:

• A (machine and human) interpretable representation of some aspect of biological reality
– what kinds of things exist?
– what are the relationships between these things?

[Figure: the terms eye, sclera, sense organ, and optic placode, linked by is_a, part_of, and develops_from relations. Image source: http://www.macula.org/anatomy/eyeframe.html]


Gene Ontology: widely adopted

AgBase


GO represents selected molecular domains

• Molecular Function = elemental activity/task
– the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
• Biological Process = biological goal or objective
– broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
• Cellular Component = location or complex
– subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
• Sequence Ontology = genome features
– regions, attributes, variants; examples include exon, CpG island, and transgenic insertion
• Cell Ontology = cell types
– Examples include photoreceptor cell and pillar cell

GO reflects biological knowledge for computers

Biological Process
GO term: tricarboxylic acid cycle
Synonym: Krebs cycle
Synonym: citric acid cycle
GO id: GO:0006099

Cellular Component
GO term: mitochondrion
GO id: GO:0005739
Definition: A semiautonomous, self replicating organelle that occurs in varying numbers, shapes, and sizes in the cytoplasm of virtually all eukaryotic cells. It is notably the site of tissue respiration.

Molecular Function
GO term: Malate dehydrogenase
GO id: GO:0030060
(S)-malate + NAD(+) = oxaloacetate + NADH

[Figure: structural formulas for the malate dehydrogenase reaction, with NAD+ converted to NADH + H+]

Terms are defined graphically relative to other terms

Ontology Structure

Ontologies can be represented as graphs, where the nodes are connected by edges
• Nodes = terms in the ontology
• Edges = relationships between the concepts

[Figure: three nodes connected by edges]

Ontological relations

• Types are related
• Network of terms forms a graph
– Terms (nodes)
– The edge type (relation) is important
• Two common relations:
– is_a
– part_of
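Why the edge type matters can be shown with a small typed graph. The triples below are an assumed reading of the anatomy examples used in this deck, and `build_graph` and `ancestors` are hypothetical helper names.

```python
from collections import defaultdict

# (subject, relation, object) triples; an assumed reading of the deck's examples.
EDGES = [
    ("eye", "is_a", "sense organ"),
    ("sclera", "part_of", "eye"),
    ("eyeball", "is_a", "cavitated organ"),
    ("cavitated organ", "is_a", "organ"),
]


def build_graph(edges):
    """Index outgoing (relation, object) pairs by subject term."""
    graph = defaultdict(list)
    for subject, relation, obj in edges:
        graph[subject].append((relation, obj))
    return graph


def ancestors(graph, term, relations=("is_a",)):
    """Collect terms reachable from `term` by following only the given relations."""
    seen = set()
    stack = [term]
    while stack:
        for relation, parent in graph.get(stack.pop(), []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Following only is_a from "sclera" finds nothing, because a sclera is not a kind of eye. Allowing part_of as well reaches "eye" and then "sense organ". The same graph gives different answers depending on which relations the query is allowed to follow.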

Types (represented in the ontology): eyeball is_a cavitated organ is_a organ

Instances (NOT represented in the ontology): linked to their types by instance_of

Formal definition of is_a

• is_a holds between types
• X is_a Y holds if and only if:
– Given any thing that instantiates X at some time, that thing also instantiates Y at the same time
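The definition above can be written as a single formula. The predicate name inst(x, X, t), read as "x instantiates type X at time t", is notation assumed here for illustration and does not appear on the slide.

```latex
X \;\mathrm{is\_a}\; Y
\iff
\forall x \,\forall t \;
\bigl( \mathrm{inst}(x, X, t) \rightarrow \mathrm{inst}(x, Y, t) \bigr)
```

Under this reading is_a is transitive: if X is_a Y and Y is_a Z, then anything that instantiates X at time t also instantiates Z at time t, so X is_a Z.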

GO terms are used for functional annotations

Legend: I denotes an ‘is-a’ relationship; P denotes a ‘part-of’ relationship

[I] Brain development [GO:0007420] (141 genes, 207 annotations)


Annotations are assertions

• There is evidence that this gene product can be best classified using this term

• The source of the evidence and other information is included

• There is agreement on the meaning of the term


Annotating Gene Products using GO

P05147 GO:0047519 IDA PMID:2976880

• Gene Product: P05147
• GO Term: GO:0047519
• Evidence: IDA
• Reference: PMID:2976880
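The four-field record on this slide maps naturally onto a small data structure. The sketch below assumes the whitespace-separated layout shown here; the `Annotation` class and `parse_annotation` function are hypothetical, and production annotation files carry more columns than these four.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # e.g. a protein accession
    go_term: str       # GO identifier
    evidence: str      # evidence code
    reference: str     # source publication


def parse_annotation(line):
    """Split a whitespace-separated record into its four fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return Annotation(*fields)
```

Keeping the evidence code and reference on every record is what preserves provenance: each assertion can be traced back to its source.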

Evidence codes describe the basis of the annotation

• IDA: Inferred from direct assay
• IPI: Inferred from physical interaction
• IMP: Inferred from mutant phenotype
• IGI: Inferred from genetic interaction
• IEP: Inferred from expression pattern
• IEA: Inferred from electronic annotation
• ISS: Inferred from sequence or structural similarity
• TAS: Traceable author statement
• NAS: Non-traceable author statement
• IC: Inferred by curator
• RCA: Reviewed Computational Analysis
• ND: no data available

[Figure labels: Direct Experiment in organism; NO Direct Experiment, inferred from evidence]
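Evidence codes make it possible to filter annotations by how they were derived. The sketch below treats IDA, IPI, IMP, IGI, and IEP as the codes backed by a direct experiment. That grouping is an assumption about how the slide's two categories divide, and the function names and sample records are hypothetical.

```python
EVIDENCE_CODES = {
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "IEP": "Inferred from expression pattern",
    "IEA": "Inferred from electronic annotation",
    "ISS": "Inferred from sequence or structural similarity",
    "TAS": "Traceable author statement",
    "NAS": "Non-traceable author statement",
    "IC": "Inferred by curator",
    "RCA": "Reviewed Computational Analysis",
    "ND": "no data available",
}

# Assumed grouping: codes that rest on a direct experiment.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}


def is_experimental(code):
    """True if `code` is a known evidence code backed by a direct experiment."""
    if code not in EVIDENCE_CODES:
        raise KeyError(f"unknown evidence code: {code}")
    return code in EXPERIMENTAL


def experimental_only(annotations):
    """Keep (gene, term, code) records whose code is experimental."""
    return [record for record in annotations if is_experimental(record[2])]
```

An analysis that should rest only on experimental support could call `experimental_only` on its annotation set before testing for enrichment.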

Vocabularies in MGI: GO Example

[Figure: schema linking three parts. Vocabulary: DAGs and Terms, each with a Definition and Synonyms (e.g. GO:54321); example terms are Transcription factor, DNA binding, Protein binding, and Ligand binding or carrier. Annotations: J:65378 TAS, J:62648 IDA, J:60000 IEA. Genes: Name and Synonyms; labels shown are Ahr, Edr2, and MGI:105043]

GO @ MGI, March 2010

Function | 34,315 genes | 75,933 annotations
Cellular Component | 34,517 genes | 65,513 annotations
Biological Process | 34,063 genes | 87,565 annotations

Total Genes: 35,147
Total Annot.: 145,895
Total Papers: 8,985

[Figure labels: Acetyl-CoA, CoA-SH, Citrate synthase, TCA Cycle]


Now we can query across all annotations based on shared biological activity.


Biomedical Ontologies in MGI

• GO: (function, process, cellular location)
• SO: (sequence features)
• PRO: (specific proteins by species/strain)
• MP: (phenotypes)
• Traits / Behavior /
• Anatomies / Homologies (morphology)
• DO: (diseases, not phenotypes; definitions not diagnoses)
• CL: (cells and their lineages)
• OBO Foundry (standards and status)

BioOntologies (GO) enable science

• Ontologies as terminology / classifications
• Ontologies enable data aggregation
• Ontologies used for data mining
• Ontologies used for statistical analysis
