gene ontology for the newbies suparna mundodi, phd the arabidopsis information resources, stanford,...
TRANSCRIPT
Gene Ontology for the Newbies
Suparna Mundodi, PhD
The Arabidopsis Information Resources, Stanford, CA
A Common Language for Annotation of Genes from
Yeast, Flies and Mice
The Gene Ontologies
…and Plants and Worms
…and Humans
…and anything else!
Outline of Topics
Introduction to the Gene Ontologies (GO)
Annotations to GO terms
GO Tools
Applications of GO
Gene Ontology
- Gene annotation system
- Controlled vocabulary that can be applied to all organisms
- Used to describe gene products
What’s in a name?
What is a cell?
[Images, each labeled "Cell"]
Image from http://microscopy.fsu.edu
Bud initiation?
= bud initiation
sensu Metazoa
= bud initiation
sensu Saccharomyces
= bud initiation
sensu Viridiplantae
What’s in a name?
The same name can be used to describe different concepts
What’s in a name?
Glucose synthesis
Glucose biosynthesis
Glucose formation
Glucose anabolism
Gluconeogenesis
All refer to the process of making glucose from simpler components
What’s in a name?
The same name can be used to describe different concepts
A concept can be described using different names
Comparison is difficult – in particular across species or across databases
What is the Gene Ontology?
A (part of the) solution:
- A controlled vocabulary that can be applied to all organisms
- Used to describe gene products - proteins and RNA - in any organism
How does GO work?
What information might we want to capture about a gene product?
What does the gene product do?
Why does it perform these activities?
Where does it act?
The 3 Gene Ontologies
Molecular Function = elemental activity/task
the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
Biological Process = biological goal or objective
broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
Cellular Component = location or complex
subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
Example: Gene Product = hammer
Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown’s juggling object    Entertainment
Ontology Structure
Ontologies can be represented as graphs, where the nodes are connected by edges
Nodes = concepts in the ontology
Edges = relationships between the concepts
[Diagram labels: node, node, edge]
Ontology Structure
The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)
Terms can have more than one parent and zero, one or more children
Terms are linked by two relationships: is-a and part-of
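A minimal sketch of this structure in Python. The term names come from the cellular component examples used in this talk, but the specific edges and the helper names (`EDGES`, `parents`, `children`, `is_acyclic`) are assumptions made for illustration, not the actual GO data model.

```python
from collections import defaultdict

# (child, relationship, parent) triples. The edges are illustrative assumptions.
EDGES = [
    ("nuclear chromosome", "is-a", "chromosome"),
    ("nuclear chromosome", "part-of", "nucleus"),
    ("mitochondrial chromosome", "is-a", "chromosome"),
    ("nucleus", "part-of", "cell"),
    ("chromosome", "part-of", "cell"),
]

def parents(term):
    """Return the (relationship, parent) pairs directly above a term."""
    return [(rel, p) for c, rel, p in EDGES if c == term]

def children(term):
    """Return the (relationship, child) pairs directly below a term."""
    return [(rel, c) for c, rel, p in EDGES if p == term]

def is_acyclic(edges):
    """Depth-first search: in a DAG no term can reach itself."""
    graph = defaultdict(list)
    for child, _, parent in edges:
        graph[child].append(parent)
    state = {}  # term -> "visiting" or "done"

    def visit(term):
        if state.get(term) == "done":
            return True
        if state.get(term) == "visiting":
            return False  # walked back into a term still being explored
        state[term] = "visiting"
        ok = all(visit(p) for p in graph[term])
        state[term] = "done"
        return ok

    return all(visit(t) for t in list(graph))
```

Here "nuclear chromosome" has two parents reached through different relationships, which is what separates a DAG from a simple tree.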
Directed Acyclic Graphs (DAG)
[Figure: protein complex, organelle, mitochondrion, fatty acid beta-oxidation multienzyme complex, other protein complexes, and other organelles, linked by is-a and part-of edges]
Parent-Child Relationships
A child is a subset of a parent’s elements
The cell component term Nucleus has 5 children:
Nucleoplasm, Nuclear envelope, Chromosome, Perinuclear space, Nucleolus
True Path Rule
The path from a child term all the way up to its top-level parent(s) must always be true
[Figure: cell, cytoplasm, nucleus, chromosome, nuclear chromosome, cytoplasmic chromosome, and mitochondrial chromosome, linked by is-a and part-of edges]
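One practical consequence of the true path rule is that a gene product annotated to a term is implicitly annotated to every ancestor of that term. A rough sketch follows; the parent links are assumptions based on the terms named in the figure, and `PARENTS` and `ancestors` are hypothetical names.

```python
# term -> direct parents (via is-a or part-of). Links are illustrative assumptions.
PARENTS = {
    "nuclear chromosome": ["chromosome", "nucleus"],
    "mitochondrial chromosome": ["chromosome"],
    "chromosome": ["cell"],
    "nucleus": ["cell"],
    "cell": [],
}

def ancestors(term):
    """Collect every term reachable by walking up from `term`."""
    found = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, []))
    return found

# An annotation to "nuclear chromosome" must also hold for each of these terms.
implied = ancestors("nuclear chromosome")
print(sorted(implied))
```

If any term in `implied` would be false for some gene product annotated to the child, the arrangement of terms violates the rule and needs to be revised.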
What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
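As a rough illustration, a term record can be modelled as a small data structure. The field names and the `resolve` helper below are assumptions for this sketch; the synonyms are the alternative names for making glucose listed earlier in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class GOTerm:
    go_id: str
    name: str
    definition: str
    synonyms: list = field(default_factory=list)

gluconeogenesis = GOTerm(
    go_id="GO:0006094",
    name="gluconeogenesis",
    definition=("The formation of glucose from noncarbohydrate precursors, "
                "such as pyruvate, amino acids and glycerol."),
    synonyms=["glucose synthesis", "glucose biosynthesis",
              "glucose formation", "glucose anabolism"],
)

def resolve(label, terms):
    """Map any known name or synonym to its single GO ID, else None."""
    wanted = label.strip().lower()
    for term in terms:
        if wanted == term.name or wanted in term.synonyms:
            return term.go_id
    return None
```

With this, `resolve("Glucose Biosynthesis", [gluconeogenesis])` and `resolve("gluconeogenesis", [gluconeogenesis])` both lead to the same identifier, so different names no longer prevent comparison.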
Annotation of gene products with GO terms
Mitochondrial P450
Cellular component: mitochondrial inner membrane GO:0005743
Biological process: electron transport GO:0006118
Molecular function: monooxygenase activity GO:0004497
substrate + O2 = CO2 + H2O product
Other gene products annotated to monooxygenase activity (GO:0004497)
- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)
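A small sketch of how such annotations might be stored and queried. The pairs below are some of the gene products and GO IDs from this slide; the list-of-pairs layout and the `index_by_term` helper are assumptions for illustration, not a real annotation file format.

```python
from collections import defaultdict

# (gene product, GO ID) pairs taken from the slide.
ANNOTATIONS = [
    ("Mitochondrial P450", "GO:0005743"),  # mitochondrial inner membrane
    ("Mitochondrial P450", "GO:0006118"),  # electron transport
    ("Mitochondrial P450", "GO:0004497"),  # monooxygenase activity
    ("monooxygenase, DBH-like 1", "GO:0004497"),
    ("prostaglandin I2 (prostacyclin) synthase", "GO:0004497"),
    ("flavin-containing monooxygenase", "GO:0004497"),
]

def index_by_term(annotations):
    """Group gene products under each GO ID they are annotated to."""
    by_term = defaultdict(list)
    for product, go_id in annotations:
        by_term[go_id].append(product)
    return by_term

by_term = index_by_term(ANNOTATIONS)
for product in by_term["GO:0004497"]:
    print(product)
```

Because the same GO ID is shared across species, one lookup returns mouse and yeast gene products side by side.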
Two types of GO Annotations:
Electronic Annotation
Manual Annotation
All annotations must:
• be attributed to a source
• indicate what evidence was found to support the GO term-gene/protein association
IEA Inferred from Electronic Annotation
ISS Inferred from Sequence Similarity
IEP Inferred from Expression Pattern
IMP Inferred from Mutant Phenotype
IGI Inferred from Genetic Interaction
IPI Inferred from Physical Interaction
IDA Inferred from Direct Assay
RCA Inferred from Reviewed Computational Analysis
TAS Traceable Author Statement
NAS Non-traceable Author Statement
IC Inferred by Curator
ND No biological Data available
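The two requirements above (a source and an evidence code) can be sketched as a simple record check. The `Annotation` fields, the `validate` and `is_electronic` helpers, and the choice to treat IEA as the only electronic code are assumptions made for this example.

```python
from typing import NamedTuple

EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}

class Annotation(NamedTuple):
    gene_product: str
    go_id: str
    evidence: str  # one of the codes in EVIDENCE_CODES
    source: str    # where the annotation came from

def validate(annotation):
    """Reject annotations that lack a known evidence code or a source."""
    if annotation.evidence not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {annotation.evidence}")
    if not annotation.source:
        raise ValueError("annotation must be attributed to a source")
    return annotation

def is_electronic(annotation):
    """Assumption for this sketch: only IEA marks an electronic annotation."""
    return annotation.evidence == "IEA"
```

A filter such as `[a for a in annotations if not is_electronic(a)]` would then keep only the manually assigned annotations.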
Ensuring Stability in a Dynamic Ontology
• Terms become obsolete when they are removed or redefined
• GO IDs are never deleted
• For each term, a comment is added to explain why the term is now obsolete
Obsolete Cellular Component, Obsolete Molecular Function, Obsolete Biological Process
Biological Process, Molecular Function, Cellular Component
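A minimal sketch of this policy, assuming a plain dictionary as the term store. The ID "GO:9999999" is a made-up placeholder and `make_obsolete` is a hypothetical helper, not part of any GO tool.

```python
# Hypothetical term store; "GO:9999999" is a made-up placeholder ID.
terms = {
    "GO:9999999": {"name": "example term", "obsolete": False, "comment": ""},
}

def make_obsolete(terms, go_id, reason):
    """Flag a term as obsolete and record why; the ID itself is kept."""
    if not reason:
        raise ValueError("an obsolete term needs an explanatory comment")
    term = terms[go_id]
    term["obsolete"] = True
    term["comment"] = reason
    return term

make_obsolete(terms, "GO:9999999", "Redefined; see replacement term.")
```

The entry stays in the store after the call, so older annotations that cite the ID can still be looked up and explained.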
Why modify the GO
GO reflects current knowledge of biology
Adding new organisms can make existing term arrangements incorrect
Not everything was perfect from the outset
What can scientists do with GO?
• Access gene product functional information
• Find how much of a proteome is involved in a process/function/component in the cell
• Map GO terms and incorporate manual annotations into own databases
• Provide a link between biological knowledge and …
  • gene expression profiles
  • proteomics data
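The "how much of a proteome" question reduces to a simple count. The sketch below uses toy data: the gene names are hypothetical, and `fraction_annotated` is an illustrative helper rather than an existing GO tool.

```python
def fraction_annotated(proteome, annotations, go_ids):
    """Share of gene products in `proteome` annotated to any ID in `go_ids`."""
    if not proteome:
        return 0.0
    hits = {product for product, go_id in annotations
            if go_id in go_ids and product in proteome}
    return len(hits) / len(proteome)

# Toy data: four hypothetical gene products, two annotated to gluconeogenesis.
proteome = {"geneA", "geneB", "geneC", "geneD"}
annotations = [
    ("geneA", "GO:0006094"),
    ("geneA", "GO:0004497"),
    ("geneB", "GO:0006094"),
    ("geneC", "GO:0005743"),
]
share = fraction_annotated(proteome, annotations, {"GO:0006094"})
```

In practice `go_ids` would usually hold the term of interest plus all of its descendants, so that annotations to more specific child terms are counted as well.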
Whole genome analysis (J. D. Munkvold et al., 2004)
Microarray analysis
http://www.geneontology.org/GO.tools
Beyond GO: Open Biomedical Ontologies
• Orthogonal to existing ontologies to facilitate combinatorial approaches
- Share unique identifier space
- Include definitions
• Anatomies
• Cell Types
• Sequence Attributes
• Temporal Attributes
• Phenotypes
• Diseases
• More….
http://obo.sourceforge.net