Getting Started: a user’s guide to the GO
GO Workshop, 3-6 August 2010

Page 1: Getting Started: a user’s guide to the GO

Getting Started: a user’s guide to the GO

GO Workshop

3-6 August 2010

Page 2: Getting Started: a user’s guide to the GO

1. Provides structural annotation for agriculturally important genomes
2. Provides functional annotation (GO)
3. Provides tools for functional modeling
4. Provides bioinformatics & modeling support for research community

Avian Gene Nomenclature

Page 3: Getting Started: a user’s guide to the GO

• Introduction to GO
• Anatomy of a GO term: a GO annotation example
• GO evidence codes
• Making annotations: literature biocuration & computational analysis
• ND vs no GO
• Using the GO
• GO tools
• Functional modeling considerations

Page 4: Getting Started: a user’s guide to the GO

Gene Ontology (GO)

• Not about genes!
  Gene products: genes, transcripts, ncRNA, proteins
• The GO describes gene product function
• Not a single ontology:
  • Biological Process (BP or P)
  • Molecular Function (MF or F)
  • Cellular Component (CC or C)

Page 5: Getting Started: a user’s guide to the GO

What is the Gene Ontology?

• assigns functions to gene products at different levels, depending on how much is known about a gene product
• is used for a diverse range of species
• is structured to be queried at different levels, e.g.:
  • find all the chicken gene products in the genome that are involved in signal transduction
  • zoom in on all the receptor tyrosine kinases
• human-readable GO function has a digital tag to allow computational analysis of large datasets

COMPUTATIONALLY AMENABLE ENCYCLOPEDIA OF GENE FUNCTIONS AND THEIR RELATIONSHIPS

“a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”
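The idea of querying at different levels can be sketched with a toy hierarchy. This is only an illustration: the three-term hierarchy is a simplified stand-in for the real ontology graph, and the gene product names (`geneA`, `geneB`, `geneC`) are hypothetical.

```python
# Toy child -> parents map; a simplified stand-in for the real GO graph.
PARENTS = {
    "receptor tyrosine kinase signaling": ["cell surface receptor signaling"],
    "cell surface receptor signaling": ["signal transduction"],
    "signal transduction": [],
}

# Hypothetical gene products, each annotated to its most specific known term.
ANNOTATIONS = {
    "geneA": "receptor tyrosine kinase signaling",
    "geneB": "cell surface receptor signaling",
    "geneC": "signal transduction",
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def products_for(term):
    """Gene products annotated to the term or to any more specific term."""
    return sorted(g for g, t in ANNOTATIONS.items() if term in ancestors(t))


broad = products_for("signal transduction")                  # all three products
narrow = products_for("receptor tyrosine kinase signaling")  # only geneA
print(broad)
print(narrow)
```

Annotating to the most specific term loses nothing, because a query on a broader term still picks the gene product up through its ancestors.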

Page 6: Getting Started: a user’s guide to the GO

Ontologies

• digital identifier (computers)
• description (humans)
• relationships between terms

As of ontology version 1.1348 (27/07/2010):

32,091 terms, 99.3% defined

* 19169 biological process
* 2745 cellular component
* 8736 molecular function

1441 obsolete terms (not included in figures above)
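A quick arithmetic check on these figures (a reading of the numbers, not something the slide states): the three per-ontology counts sum to 30,650, and adding the 1,441 obsolete terms gives exactly 32,091. The headline total therefore appears to include obsolete terms, while the per-ontology counts do not.

```python
# Counts copied from the slide (ontology version 1.1348).
counts = {
    "biological process": 19169,
    "cellular component": 2745,
    "molecular function": 8736,
}
obsolete = 1441

current_terms = sum(counts.values())
total_with_obsolete = current_terms + obsolete

print(current_terms)        # 30650
print(total_with_obsolete)  # 32091
```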

Page 7: Getting Started: a user’s guide to the GO

GO annotation example

NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa

Biological Process (BP or P)
GO:0006633 fatty acid biosynthetic process TAS
GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS
GO:0008610 lipid biosynthetic process IEA

Cellular Component (CC or C)
GO:0005759 mitochondrial matrix IDA
GO:0005747 mitochondrial respiratory chain complex I IDA
GO:0005739 mitochondrion IEA

Molecular Function (MF or F)
GO:0005504 fatty acid binding IDA
GO:0008137 NADH dehydrogenase (ubiquinone) activity TAS
GO:0016491 oxidoreductase activity TAS
GO:0000036 acyl carrier activity IEA

Page 8: Getting Started: a user’s guide to the GO

GO annotation example

NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa

Labeled parts of each annotation:
• aspect or ontology
• GO:ID (unique)
• GO term name
• GO evidence code
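The four labeled parts map naturally onto a small record type. The sketch below is illustrative: the class name `GOAnnotation` and its field names are hypothetical, and the values are three of the NDUFAB1 annotations from the example slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    gene_product: str   # accession of the annotated gene product
    aspect: str         # "P", "F" or "C"
    go_id: str          # unique GO identifier
    term_name: str      # human-readable term name
    evidence_code: str  # e.g. IDA, TAS, IEA


ndufab1 = [
    GOAnnotation("P52505", "P", "GO:0006633", "fatty acid biosynthetic process", "TAS"),
    GOAnnotation("P52505", "C", "GO:0005759", "mitochondrial matrix", "IDA"),
    GOAnnotation("P52505", "F", "GO:0000036", "acyl carrier activity", "IEA"),
]

# Group the GO IDs by aspect, as the example slide does visually.
by_aspect = {}
for ann in ndufab1:
    by_aspect.setdefault(ann.aspect, []).append(ann.go_id)
print(by_aspect)
```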

Page 9: Getting Started: a user’s guide to the GO

GO EVIDENCE CODES

Direct Evidence Codes
• IDA - inferred from direct assay
• IEP - inferred from expression pattern
• IGI - inferred from genetic interaction
• IMP - inferred from mutant phenotype
• IPI - inferred from physical interaction

Indirect Evidence Codes
inferred from literature:
• IGC - inferred from genomic context
• TAS - traceable author statement
• NAS - non-traceable author statement
• IC - inferred by curator
inferred by sequence analysis:
• RCA - inferred from reviewed computational analysis
• IS* - inferred from sequence*
• IEA - inferred from electronic annotation

Other
• NR - not recorded (historical)
• ND - no biological data available

* IS* covers:
• ISS - inferred from sequence or structural similarity
• ISA - inferred from sequence alignment
• ISO - inferred from sequence orthology
• ISM - inferred from sequence model

Guide to GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
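When filtering annotations it can help to encode this grouping as a lookup table. The sketch below follows the grouping on the slide; the group labels and the function name `evidence_group` are choices made for this example, not part of any GO tool.

```python
# Evidence codes grouped as on the slide; group labels are illustrative.
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "literature": {"IGC", "TAS", "NAS", "IC"},
    "sequence analysis": {"RCA", "ISS", "ISA", "ISO", "ISM", "IEA"},
    "other": {"NR", "ND"},
}


def evidence_group(code):
    """Return the slide's group for an evidence code (case-insensitive)."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code.upper() in codes:
            return group
    raise ValueError(f"unknown evidence code: {code}")


# (GO ID, evidence code) pairs taken from the NDUFAB1 example.
annotations = [("GO:0006633", "TAS"), ("GO:0008610", "IEA"), ("GO:0005759", "IDA")]

groups = [evidence_group(code) for _, code in annotations]
non_iea = [go_id for go_id, code in annotations if code != "IEA"]
print(groups)
print(non_iea)
```

Dropping IEA, as in the last line, is one common filtering choice. Whether it is appropriate depends on how much manually curated GO exists for the species.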

Page 10: Getting Started: a user’s guide to the GO

GO Mapping Example

NDUFAB1

Biocuration of literature
• detailed function
• “depth”
• slower (manual)

Page 11: Getting Started: a user’s guide to the GO

Biocuration of Literature: detailed gene function

Find a paper about the protein.

P05147

PMID: 2976880

Page 12: Getting Started: a user’s guide to the GO

Read the paper to get experimental evidence of function

Use the most specific term possible

Experiment assayed kinase activity: use IDA evidence code

Page 13: Getting Started: a user’s guide to the GO

GO Mapping Example

NDUFAB1

Biocuration of literature
• detailed function
• “depth”
• slower (manual)

Sequence analysis
• rapid (computational)
• “breadth” of coverage
• less detailed

Page 14: Getting Started: a user’s guide to the GO

Unknown Function vs No GO

ND – no data
• Biocurators have tried to add GO but there is no functional data available
• Previously: “process_unknown”, “function_unknown”, “component_unknown”
• Now: “biological process”, “molecular function”, “cellular component”

No annotations (including no “ND”)
• biocurators have not annotated
• this is important for your dataset: what % has GO?
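The question "what % has GO?" can be answered by separating three cases: gene products with functional GO, those with only ND, and those with no annotation at all. The sketch below uses a hypothetical four-entry dataset; the names and codes are invented for illustration.

```python
from collections import Counter

# Hypothetical dataset: gene product -> evidence codes of its annotations.
dataset = {
    "geneA": ["IDA", "IEA"],  # has functional GO
    "geneB": ["ND"],          # curated, but no functional data available
    "geneC": [],              # never annotated
    "geneD": ["TAS"],         # has functional GO
}


def classify(codes):
    """Classify one gene product by its evidence codes."""
    if not codes:
        return "not annotated"
    if all(code == "ND" for code in codes):
        return "ND only"
    return "has GO"


summary = Counter(classify(codes) for codes in dataset.values())
percent_with_go = 100 * summary["has GO"] / len(dataset)
print(summary)
print(percent_with_go)
```

Keeping "ND only" apart from "not annotated" preserves the distinction the slide draws: the first means curators looked and found nothing, the second means nobody has looked yet.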

Page 15: Getting Started: a user’s guide to the GO

Using the GO

Page 16: Getting Started: a user’s guide to the GO

Using the GO

• Decide on GO analysis tool
• How much GO is available for your species?
• Getting GO for your data set
• Adding GO for your data

Page 17: Getting Started: a user’s guide to the GO

http://www.geneontology.org/

Page 18: Getting Started: a user’s guide to the GO
Page 19: Getting Started: a user’s guide to the GO

However…
• many of these tools do not support non-model organisms
• the tools have different computing requirements
• it may be difficult to determine how up-to-date the GO annotations are

Need to evaluate tools for your system.

Page 20: Getting Started: a user’s guide to the GO

Evaluating GO tools

Some criteria for evaluating GO tools:

1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products has no GO?
6. Does it report both over/under represented GO groups and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?
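Criterion 6 asks how a tool evaluates over-representation. Tools differ, and the slides do not name a test; one common choice, assumed here purely for illustration, is a one-sided hypergeometric test. The function name and the numbers below are hypothetical.

```python
from math import comb  # requires Python 3.8+


def hypergeom_over_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k gene products with a GO term
    in a list of n, when K of the N background gene products carry it."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy numbers: background of 10 gene products, 3 carry the term,
# and all 3 members of the list of interest carry it.
p_over = hypergeom_over_p(k=3, n=3, K=3, N=10)
print(p_over)
```

Under-representation would use the other tail, P(X <= k), and a real analysis tests many GO terms at once, so some multiple-testing correction is normally applied. It is worth checking which test and which correction a given tool uses.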

Page 21: Getting Started: a user’s guide to the GO

Some useful expression analysis tools:

• Database for Annotation, Visualization and Integrated Discovery (DAVID)
  http://david.abcc.ncifcrf.gov/

• AgriGO -- GO Analysis Toolkit and Database for Agricultural Community
  http://bioinfo.cau.edu.cn/agriGO/
  • used to be EasyGO
  • chicken, cow, pig, mouse, cereals, dicots
  • includes Plant Ontology (PO) analysis

• Onto-Express
  http://vortex.cs.wayne.edu/projects.htm#Onto-Express
  • can provide your own gene association file

• Funcassociate 2.0: The Gene Set Functionator
  http://llama.med.harvard.edu/funcassociate/
  • can provide your own gene association file
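A gene association file is a tab-delimited text file with one annotation per line. The sketch below assumes the GAF 2.0 column order (17 columns) and reads a single line into a dictionary. The helper name `parse_gaf_line` is hypothetical, and the reference, date and assigned-by values in the example line are placeholders, not real data.

```python
# Column names follow the GAF 2.0 column order (an assumption of this sketch).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def parse_gaf_line(line):
    """Return a dict for one annotation line, or None for '!' comment lines."""
    if line.startswith("!"):
        return None
    fields = line.rstrip("\n").split("\t")
    # Pad missing trailing columns so every key is present.
    fields += [""] * (len(GAF_COLUMNS) - len(fields))
    return dict(zip(GAF_COLUMNS, fields))


# Illustrative line; PMID, date and assigned_by are placeholder values.
example = "\t".join([
    "UniProtKB", "P52505", "NDUFAB1", "", "GO:0005759",
    "PMID:0000000", "IDA", "", "C",
    "", "", "protein", "taxon:9913", "20100727", "EXAMPLE", "", "",
])

record = parse_gaf_line(example)
print(record["go_id"], record["evidence_code"], record["aspect"])
```

Before supplying your own file to a tool, check which version of the format it expects.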

Page 22: Getting Started: a user’s guide to the GO

Functional Modeling Considerations

Should I add my own GO?
• use GOProfiler to see how much GO is available for your species
• use GORetriever to find existing GO for your dataset
• does the analysis tool allow me to add my own GO?

Should I do GO analysis and pathway analysis and network analysis?
• different functional modeling methods show different aspects of your data (complementary)
• is this type of data available for your species (or a close ortholog)?

What tools should I use?
• which tools have data for your species of interest?
• what type of accessions are accepted?
• availability (commercial and freely available)

Page 23: Getting Started: a user’s guide to the GO

Overview of Functional Modeling Strategy

[Flowchart; labels recovered from the figure]

Inputs: Protein/Gene identifiers; Microarray IDs

AgBase tools (yellow boxes): ArrayIDer, GORetriever, GOanna, GOSlimViewer (summarizes GO function), GOModeler (hypothesis testing)

Intermediate results: GO annotations; Genes/Proteins with no GO annotations

Analyses: Pathways and network analysis; GO Enrichment analysis

Non-AgBase resources (green/purple boxes): Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID, EasyGO/AgriGO, Onto-Express, Onto-Express-to-go (OE2GO)

Page 24: Getting Started: a user’s guide to the GO

For more information about GO

GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml

gene association file information: http://www.geneontology.org/GO.format.annotation.shtml

tools that use the GO: http://www.geneontology.org/GO.tools.shtml

GO Consortium wiki: http://wiki.geneontology.org/index.php/Main_Page

All websites are listed on the AgBase workshop website.