gene ontology for the newbies suparna mundodi, phd the arabidopsis information resources, stanford,...


Upload: ernest-simpson

Post on 31-Dec-2015


Page 1: GENE ONTOLOGY FOR THE NEWBIES Suparna Mundodi, PhD The Arabidopsis Information Resources, Stanford, CA

GENE ONTOLOGY FOR THE NEWBIES

Suparna Mundodi, PhD

The Arabidopsis Information Resources, Stanford, CA

Page 2:

The Gene Ontologies

A Common Language for Annotation of Genes from Yeast, Flies and Mice

…and Plants and Worms

…and Humans

…and anything else!

Page 3:

Outline of Topics

Introduction to the Gene Ontologies (GO)

Annotations to GO terms

GO Tools

Applications of GO

Page 4:

Gene Ontology

- Gene annotation system

- Controlled vocabulary that can be applied to all organisms

- Used to describe gene products

Page 5:

What’s in a name?

What is a cell?

Page 6:

Cell

Page 7:

Cell

Page 8:

Cell

Page 9:

Cell

Page 10:

Cell

Image from http://microscopy.fsu.edu

Page 11:

Bud initiation?

Page 12:

= bud initiation sensu Metazoa

= bud initiation sensu Saccharomyces

= bud initiation sensu Viridiplantae

Page 13:

What’s in a name?

The same name can be used to describe different concepts

Page 14:

What’s in a name?

Page 15:

What’s in a name?

- Glucose synthesis
- Glucose biosynthesis
- Glucose formation
- Glucose anabolism
- Gluconeogenesis

All refer to the process of making glucose from simpler components
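One way to picture how a controlled vocabulary deals with this is to map every synonym to a single term identifier. The sketch below is illustrative only: the `SYNONYMS` table and the `resolve` helper are hypothetical names, and GO:0006094 is the gluconeogenesis ID shown later in this talk.

```python
from typing import Optional

# One identifier for the concept, whatever name a researcher uses for it.
GLUCONEOGENESIS = "GO:0006094"

# Hypothetical synonym table built from the names listed on the slide.
SYNONYMS = {
    "glucose synthesis": GLUCONEOGENESIS,
    "glucose biosynthesis": GLUCONEOGENESIS,
    "glucose formation": GLUCONEOGENESIS,
    "glucose anabolism": GLUCONEOGENESIS,
    "gluconeogenesis": GLUCONEOGENESIS,
}


def resolve(name: str) -> Optional[str]:
    """Return the term ID for a name, ignoring case and extra spaces."""
    normalized = " ".join(name.lower().split())
    return SYNONYMS.get(normalized)


print(resolve("Glucose  Anabolism"))
print(resolve("glycolysis"))
```

Names that are not in the table come back as `None`, so an unknown label is never silently matched to a term.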

Page 16:

What’s in a name?

The same name can be used to describe different concepts

A concept can be described using different names

Comparison is difficult – in particular across species or across databases

Page 17:

What is the Gene Ontology?

A (part of the) solution:

- A controlled vocabulary that can be applied to all organisms

- Used to describe gene products - proteins and RNA - in any organism

Page 18:

How does GO work?

What information might we want to capture about a gene product?

What does the gene product do?

Why does it perform these activities?

Where does it act?

Page 19:

The 3 Gene Ontologies

Molecular Function = elemental activity/task
the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

Biological Process = biological goal or objective
broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

Cellular Component = location or complex
subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Page 20:

Example: Gene Product = hammer

Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown’s juggling object    Entertainment

Page 21:

Ontology Structure

Ontologies can be represented as graphs, where the nodes are connected by edges

Nodes = concepts in the ontology
Edges = relationships between the concepts

[Figure: labeled nodes joined by an edge]

Page 22:

Ontology Structure

The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)

Terms can have more than one parent and zero, one or more children

Terms are linked by two relationships:
- is-a
- part-of

Page 23:

Directed Acyclic Graphs (DAG)

[Figure: DAG with is-a and part-of edges linking the terms protein complex, organelle, mitochondrion, fatty acid beta-oxidation multienzyme complex, [other protein complexes] and [other organelles]]
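A minimal sketch of such a graph in Python, assuming one plausible reading of the figure: the complex is-a protein complex and part-of the mitochondrion, and the mitochondrion is-a organelle. The edge list and the helper names `parents_of` and `is_acyclic` are illustrative and not part of any GO tool.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Each edge is (child, relationship, parent). The assignments below are an
# assumed reading of the slide's figure, not an export from the GO database.
EDGES = [
    ("mitochondrion", "is-a", "organelle"),
    ("fatty acid beta-oxidation multienzyme complex", "is-a", "protein complex"),
    ("fatty acid beta-oxidation multienzyme complex", "part-of", "mitochondrion"),
]


def parents_of(term, edges=EDGES):
    """Return {parent: relationship} for one term; a term may have several."""
    return {parent: rel for child, rel, parent in edges if child == term}


def is_acyclic(edges):
    """True if following child -> parent edges can never loop back."""
    graph = {}
    for child, _rel, parent in edges:
        graph.setdefault(child, set()).add(parent)
    try:
        # Consuming static_order() raises CycleError if the graph has a cycle.
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


print(parents_of("fatty acid beta-oxidation multienzyme complex"))
print(is_acyclic(EDGES))
```

The complex has two parents reached through two different relationships, which is what separates a DAG from a simple tree.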

Page 24:

Parent-Child Relationships

The cell component term Nucleus has 5 children

Nucleus
- Nucleoplasm
- Nuclear envelope
- Chromosome
- Perinuclear space
- Nucleolus

A child is a subset of a parent’s elements

Page 25:

True Path Rule

The path from a child term all the way up to its top-level parent(s) must always be true

cell
    cytoplasm
chromosome
    nuclear chromosome
    cytoplasmic chromosome
    mitochondrial chromosome
nucleus
    nuclear chromosome

[Figure legend: is-a, part-of]
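The rule can be sketched as a walk up the graph: annotating a gene product to a term implies every ancestor of that term, so every ancestor has to be a true statement too. The `PARENTS` table below is an assumed, simplified set of edges based on the terms on this slide, and `ancestors` and `implied_terms` are hypothetical helper names.

```python
# Assumed child -> parents table, loosely following the slide's terms.
PARENTS = {
    "nuclear chromosome": ["chromosome", "nucleus"],  # is-a chromosome, part-of nucleus
    "mitochondrial chromosome": ["chromosome"],       # is-a chromosome
    "chromosome": ["cell"],                           # assumed part-of cell
    "nucleus": ["cell"],                              # assumed part-of cell
    "cytoplasm": ["cell"],                            # assumed part-of cell
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """All terms reachable by walking parent edges upward from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def implied_terms(term, parents=PARENTS):
    """Under the true path rule, an annotation to `term` implies these terms."""
    return {term} | ancestors(term, parents)


print(sorted(implied_terms("nuclear chromosome")))
```

If any implied term would be false for some gene product in some organism, the arrangement of terms breaks the rule and needs to be revised.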

Page 26:

What’s in a GO term?

term: gluconeogenesis

id: GO:0006094

definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
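As a rough illustration, the three fields of a term fit in a small record type. The `GOTerm` class is a hypothetical sketch, not an official GO data model; it only checks that the ID has the usual "GO:" prefix followed by seven digits.

```python
import re
from dataclasses import dataclass

# "GO:" followed by exactly seven digits, as in GO:0006094.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record holding the three fields shown on the slide."""
    id: str
    name: str
    definition: str

    def __post_init__(self):
        if not GO_ID_PATTERN.match(self.id):
            raise ValueError(f"not a GO identifier: {self.id!r}")


gluconeogenesis = GOTerm(
    id="GO:0006094",
    name="gluconeogenesis",
    definition=(
        "The formation of glucose from noncarbohydrate precursors, "
        "such as pyruvate, amino acids and glycerol."
    ),
)

print(gluconeogenesis.id, gluconeogenesis.name)
```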

Page 27:

Mitochondrial P450

Annotation of gene products with GO terms

Page 28:

Cellular component: mitochondrial inner membrane GO:0005743

Biological process: electron transport GO:0006118

Molecular function: monooxygenase activity GO:0004497

substrate + O2 = CO2 + H2O product
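A small sketch of how these three annotations for the mitochondrial P450 might be held as data, one per ontology. The `Annotation` tuple and the `by_aspect` helper are hypothetical; the term names and IDs are the ones on this slide.

```python
from collections import namedtuple

# Hypothetical record: one row per gene product / GO term pairing.
Annotation = namedtuple("Annotation", ["gene_product", "aspect", "term_name", "go_id"])

P450_ANNOTATIONS = [
    Annotation("Mitochondrial P450", "cellular component",
               "mitochondrial inner membrane", "GO:0005743"),
    Annotation("Mitochondrial P450", "biological process",
               "electron transport", "GO:0006118"),
    Annotation("Mitochondrial P450", "molecular function",
               "monooxygenase activity", "GO:0004497"),
]


def by_aspect(annotations):
    """Group GO IDs by ontology: where it acts, why, and what it does."""
    grouped = {}
    for ann in annotations:
        grouped.setdefault(ann.aspect, []).append(ann.go_id)
    return grouped


for aspect, go_ids in by_aspect(P450_ANNOTATIONS).items():
    print(aspect, go_ids)
```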

Page 29:

Other gene products annotated to monooxygenase activity (GO:0004497)

- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)

Page 30:

Two types of GO Annotations:

Electronic Annotation

Manual Annotation

All annotations must:

• be attributed to a source

• indicate what evidence was found to support the GO term-gene/protein association

Page 31:

IEA Inferred from Electronic Annotation

ISS Inferred from Sequence Similarity

IEP Inferred from Expression Pattern

IMP Inferred from Mutant Phenotype

IGI Inferred from Genetic Interaction

IPI Inferred from Physical Interaction

IDA Inferred from Direct Assay

RCA Inferred from Reviewed Computational Analysis

TAS Traceable Author Statement

NAS Non-traceable Author Statement

IC Inferred by Curator

ND No biological Data available
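The codes above, together with the two requirements from the previous slide (a source and supporting evidence), can be sketched as a simple check. `check_annotation` and `is_electronic` are hypothetical helpers, the reference string in the usage line is a placeholder, and treating IEA as the only electronic code is an assumption of this sketch.

```python
# Evidence codes exactly as listed on the slide.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def check_annotation(go_id, evidence_code, source):
    """Return a list of problems; an empty list means both requirements hold."""
    problems = []
    if not source:
        problems.append("missing source")
    if evidence_code not in EVIDENCE_CODES:
        problems.append(f"unknown evidence code: {evidence_code}")
    return problems


def is_electronic(evidence_code):
    """Assumption: IEA marks electronic annotation, all other codes manual."""
    return evidence_code == "IEA"


# "PMID:0000000" is a placeholder reference, not a real citation.
print(check_annotation("GO:0004497", "IDA", "PMID:0000000"))
```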

Page 32:

Ensuring Stability in a Dynamic Ontology

• Terms become obsolete when they are removed or redefined

• GO IDs are never deleted

• For each obsolete term, a comment is added to explain why the term is now obsolete

[Figure: the Biological Process, Molecular Function and Cellular Component ontologies, alongside Obsolete Biological Process, Obsolete Molecular Function and Obsolete Cellular Component]
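A minimal sketch of that policy, assuming a plain dictionary as the term store: a term is flagged and commented rather than removed, so its ID can still be looked up. The ID `GO:9999999` is made up for illustration, and `make_obsolete` is a hypothetical helper.

```python
# Hypothetical term table; "GO:9999999" is a made-up ID, not a real GO term.
terms = {
    "GO:9999999": {"name": "example term", "obsolete": False, "comment": ""},
}


def make_obsolete(terms, go_id, reason):
    """Flag a term as obsolete in place; its ID is never deleted."""
    if not reason:
        raise ValueError("an obsolete term needs a comment explaining why")
    term = terms[go_id]
    term["obsolete"] = True
    term["comment"] = reason
    return term


make_obsolete(terms, "GO:9999999", "Redefined; meaning changed.")
print(terms["GO:9999999"])
```

Old annotations that still point at the ID keep resolving, and the comment tells the reader why the term should no longer be used.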

Page 33:

Why modify the GO?

GO reflects current knowledge of biology

Adding new organisms can make existing term arrangements incorrect

Not everything was perfect from the outset

Page 34:

What can scientists do with GO?

• Access gene product functional information

• Find how much of a proteome is involved in a process/function/component in the cell

• Map GO terms and incorporate manual annotations into own databases

• Provide a link between biological knowledge and …

• gene expression profiles

• proteomics data
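The "how much of a proteome" question can be sketched as a simple count over annotation pairs. The gene names and proteome size below are toy values invented for illustration, and `fraction_annotated` is a hypothetical helper that counts direct annotations only, without propagating them to parent terms.

```python
def fraction_annotated(annotations, proteome_size, go_id):
    """Share of the proteome with at least one direct annotation to go_id."""
    if proteome_size <= 0:
        raise ValueError("proteome_size must be positive")
    # A set, so a gene product annotated twice to the term counts once.
    hits = {gene for gene, term in annotations if term == go_id}
    return len(hits) / proteome_size


# Toy (gene product, GO ID) pairs; the gene names are hypothetical.
ANNOTATIONS = [
    ("geneA", "GO:0006094"),
    ("geneB", "GO:0006094"),
    ("geneB", "GO:0004497"),
    ("geneA", "GO:0006094"),  # duplicate row
]

print(fraction_annotated(ANNOTATIONS, proteome_size=4, go_id="GO:0006094"))
```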

Page 35:

Whole genome analysis (J. D. Munkvold et al., 2004)

Microarray analysis

Page 36:

http://www.geneontology.org/GO.tools

Page 37:

Beyond GO – Open Biomedical Ontologies

• Orthogonal to existing ontologies to facilitate combinatorial approaches
  - Share unique identifier space
  - Include definitions

• Anatomies
• Cell Types
• Sequence Attributes
• Temporal Attributes
• Phenotypes
• Diseases
• More….

http://obo.sourceforge.net