Gene Ontology Network Enrichment Analysis

Dmitry Grapov, PhD

Uploaded by uc-davis on 01-Jul-2015


DESCRIPTION

From the UC Davis Proteomics 2014 Summer Workshop (www.proteomics.ucdavis.edu), by Dmitry Grapov, PhD

TRANSCRIPT

Page 1: Gene Ontology Network Enrichment Analysis

Dmitry Grapov, PhD

Gene Ontology Network Enrichment Analysis

Page 2

Download all material for the tutorial from https://sourceforge.net/projects/teachingdemos/files/ (choose ‘2014 UC Davis Proteomics Workshop’), or use the full URL:

https://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/Summer%202014%20Proteomics%20Workshop.zip/download

Page 3

[Figure legend: decrease, increase]

Use functional analysis to determine whether the changed variables are enriched (overrepresented compared with random chance) in a biological pathway, domain, or ontological category.

Page 4

Enrichment or overrepresentation analysis

[Figure panels: biochemical pathway, biochemical ontology]

Page 5

Major Tasks

Using the proteins listed in the Excel workbook ‘proteomic data for analysis.xlsx’ (worksheet ‘protein IDs’):

1. Conduct Gene Ontology (GO) enrichment analysis using DAVID Bioinformatics Resources http://david.abcc.ncifcrf.gov/home.jsp

2. Investigate enriched terms using Quick GO http://www.ebi.ac.uk/QuickGO/

3. Summarize and visualize the results using REVIGO http://revigo.irb.hr/

4. Create and modify a GO network using Cytoscape http://www.cytoscape.org/

Page 6

Protein IDs

Common protein identifier: UniProt/SwissProt accession (the default in Scaffold), http://www.uniprot.org/

Use BioMart (http://www.biomart.org/) to translate to other database IDs, e.g. gene symbols
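A minimal R sketch of the ID translation step, assuming a two-column mapping table has already been exported from BioMart (UniProt/SwissProt accession to gene symbol). The table below is typed in by hand for illustration, and `NOT_AN_ID` is a deliberately unmappable placeholder; `match()` does the lookup and leaves `NA` where no symbol exists, which makes unrecognized IDs easy to spot before the DAVID upload.

```r
# Hypothetical mapping table of the kind a BioMart query could export
# (UniProt/SwissProt accession -> gene symbol), typed in for illustration.
mapping <- data.frame(
  uniprot = c("P68871", "P02768", "P01308"),
  symbol  = c("HBB", "ALB", "INS"),
  stringsAsFactors = FALSE
)

# Protein IDs as they might come from the 'protein IDs' worksheet;
# "NOT_AN_ID" is a deliberately unmappable placeholder.
protein_ids <- c("P02768", "P68871", "NOT_AN_ID")

# match() gives the mapping row for each ID, or NA when it is absent.
idx <- match(protein_ids, mapping$uniprot)
translated <- data.frame(
  uniprot = protein_ids,
  symbol  = mapping$symbol[idx],
  stringsAsFactors = FALSE
)
print(translated)

# IDs without a symbol should be checked before uploading to DAVID.
unmapped <- protein_ids[is.na(idx)]
print(unmapped)
```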

Page 7

DAVID Bioinformatics Resources http://david.abcc.ncifcrf.gov/home.jsp

Page 8

DAVID Bioinformatics Resources

1. Upload list

2. Choose ID type

3. Select list type

4. Submit
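DAVID accepts a pasted or uploaded list with one identifier per line. A small R sketch for preparing that file from the protein IDs, assuming they are already in a character vector (the accessions below are illustrative, not the workshop data): trim stray spaces, drop blank cells and duplicates, then write the list out.

```r
# Illustrative accessions standing in for the 'protein IDs' worksheet,
# including a duplicate, stray whitespace and an empty cell.
protein_ids <- c("P02768", "P68871", "P02768", " P01308 ", "")

# Clean up: trim whitespace, drop empty entries and duplicates.
ids <- trimws(protein_ids)
ids <- unique(ids[ids != ""])

# Write one ID per line to a plain-text file for the DAVID upload box.
out_file <- file.path(tempdir(), "david_upload.txt")
writeLines(ids, out_file)

cat(readLines(out_file), sep = "\n")
```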

Page 9

DAVID Bioinformatics Resources

Organism

Make sure all IDs were recognized

List of biochemical databases tested for enrichment

Page 10

DAVID Bioinformatics Resources

List of biochemical databases tested for enrichment

1. Choose GO

Page 11

DAVID Bioinformatics Resources

http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E3

Page 12

DAVID Bioinformatics Resources

List of biochemical databases tested for enrichment

1. Overview BP: Biological process

2. Select

Page 13

DAVID Bioinformatics Resources

http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E3

Page 14

DAVID Bioinformatics Resources

1. Review the most enriched term

Page 15

Quick GO http://www.ebi.ac.uk/QuickGO/

1. View the children of this term (more specific terms lower in the hierarchy)

Page 16

DAVID Bioinformatics Resources / Quick GO

1. Can you identify any enriched children of this term in our DAVID output?

2. Download results
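Question 1 amounts to a set intersection. A sketch in R, assuming the children of the term have been copied from Quick GO into a parent/child table and the enriched GO IDs have been pulled from the DAVID output; all IDs here are placeholders rather than real GO terms.

```r
# Hypothetical parent -> child relations, as might be copied from Quick GO
# (placeholder IDs, not real GO terms).
children <- data.frame(
  parent = c("GO:parent", "GO:parent", "GO:parent"),
  child  = c("GO:childA", "GO:childB", "GO:childC"),
  stringsAsFactors = FALSE
)

# Hypothetical GO IDs reported as enriched in the DAVID output.
enriched <- c("GO:parent", "GO:childB", "GO:other")

# Children of the parent term that also appear among the enriched terms.
kids <- children$child[children$parent == "GO:parent"]
enriched_children <- intersect(kids, enriched)
print(enriched_children)
```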

Page 17

Overview and Format Results in Excel

1. Save results

2. Open in MS Excel
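If you prefer R to Excel for this step, a sketch assuming the DAVID chart was saved as tab-delimited text. The few rows below are hypothetical and carry only some of DAVID's columns (`Category`, `Term`, `Count`, `PValue`, `Genes`); `check.names = FALSE` keeps the original column headers. For a real file, pass its path to `read.delim()` instead of `text =`.

```r
# Toy stand-in for a DAVID Functional Annotation Chart saved as
# tab-delimited text (subset of columns; all values hypothetical).
chart_txt <- paste(
  "Category\tTerm\tCount\tPValue\tGenes",
  "GOTERM_BP_FAT\tGO:toyA~term A\t5\t0.0004\tP02768, P68871",
  "GOTERM_MF_FAT\tGO:toyB~term B\t3\t0.0100\tP01308",
  "GOTERM_BP_FAT\tGO:toyC~term C\t8\t0.00002\tP68871, P01308",
  sep = "\n"
)
chart <- read.delim(text = chart_txt, check.names = FALSE,
                    stringsAsFactors = FALSE)

# Keep Biological Process terms and sort by p-value (most enriched first).
bp <- chart[grepl("^GOTERM_BP", chart$Category), ]
bp <- bp[order(bp$PValue), ]
print(bp[, c("Term", "Count", "PValue")])
```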

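Page 18

Overview Results

Modified Fisher’s Exact Test p-value (DAVID’s EASE score)

Optionally, check in R (13 of 47 list proteins are in the term, versus 690 of 13528 background genes):

x <- data.frame(user = c(13, 47), genome = c(690, 13528)) # row 1: in term, row 2: total
fisher.test(x) # p-value = 5.41e-06
(13/47) / (690/13528) # fold enrichment, about 5.42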
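The call above applies an ordinary two-sided Fisher's exact test to counts and totals. DAVID's help pages describe the EASE score as a one-sided Fisher's exact test in which one protein is removed from the list's count for the term, which penalizes terms supported by very few proteins. The sketch below builds that 2x2 table (in term / not in term by list / background) from the same numbers as the fold-enrichment line; the layout follows our reading of the DAVID documentation and is worth checking against it.

```r
# Counts from the fold-enrichment line: 13 of 47 list proteins in the term,
# 690 of 13528 background genes in the term.
k <- 13; n <- 47; K <- 690; N <- 13528

# Rows: in term / not in term; columns: user list / background.
tab_fisher <- matrix(c(k, n - k, K, N - K), nrow = 2,
                     dimnames = list(c("in term", "not in term"),
                                     c("list", "background")))

# EASE-style table: same counts, with one hit removed from the list.
tab_ease <- tab_fisher
tab_ease["in term", "list"] <- k - 1

# One-sided tests for overrepresentation.
p_fisher <- fisher.test(tab_fisher, alternative = "greater")$p.value
p_ease   <- fisher.test(tab_ease, alternative = "greater")$p.value
print(c(fisher = p_fisher, ease = p_ease))
```

Removing one hit always makes the EASE value larger (more conservative) than the plain one-sided Fisher p-value for the same term.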

Page 19

Alternative to Fisher’s Exact Test: Hypergeometric Test

How to calculate the statistic that determines enrichment:

hit.num <- 51 # number of significantly changed variables in the pathway
set.num <- 1455 # number of variables in the pathway
full <- 3358 # all possible variables in the organism
q.size <- 72 # number of significantly changed variables
phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE) # P(X >= hit.num)

enrichment p-value = 1.717553e-06
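A one-sided Fisher's exact test on the corresponding 2x2 table (in pathway / not in pathway by changed / unchanged) gives the same upper-tail probability as `phyper()`, so the two are the same calculation seen from different angles. A short R check using the slide's numbers:

```r
hit.num <- 51; set.num <- 1455; full <- 3358; q.size <- 72

# Rows: in pathway / not in pathway; columns: changed / unchanged.
tab <- matrix(c(hit.num,                              # changed, in pathway
                q.size - hit.num,                     # changed, not in pathway
                set.num - hit.num,                    # unchanged, in pathway
                full - set.num - (q.size - hit.num)), # unchanged, not in pathway
              nrow = 2)

# Hypergeometric upper tail, as on the slide.
p_hyper <- phyper(hit.num - 1, set.num, full - set.num, q.size,
                  lower.tail = FALSE)

# One-sided Fisher's exact test on the same counts.
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(p_hyper, p_fisher)
```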

Page 20

Visualization Options

Challenges:

• Removal of redundant information

• Visualizing term relationships (term-term, term-protein)

Page 21

Use REVIGO to filter redundant terms

http://revigo.irb.hr/

Prepare input (term, p-value)

1. Upload to REVIGO

2. Run

Supek F, Bošnjak M, Škunca N, Šmuc T. "REVIGO summarizes and visualizes long lists of Gene Ontology terms." PLoS ONE 2011. doi:10.1371/journal.pone.0021800
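REVIGO's input is a list of GO IDs, optionally followed by a value such as a p-value on the same line. A sketch of building that list from DAVID output in R, assuming the `Term` column uses DAVID's `GO ID~term name` format; the IDs below are placeholders, and real input would carry the seven-digit GO IDs.

```r
# Hypothetical DAVID 'Term' entries ("GO ID~term name") and their p-values.
terms <- c("GO:toyC~term C", "GO:toyA~term A")
pvals <- c(2e-05, 4e-04)

# Keep only the GO ID before the "~".
go_ids <- sub("~.*$", "", terms)

# One "GO ID <tab> p-value" pair per line, ready to paste into REVIGO.
revigo_input <- paste(go_ids, format(pvals, scientific = TRUE), sep = "\t")
writeLines(revigo_input)
```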

Page 22

REVIGO: overview scatterplot

Positions are based on term similarity (multidimensional scaling, MDS)
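To illustrate the idea behind the scatterplot, the sketch below embeds a hypothetical term-term similarity matrix in two dimensions with classical MDS (`cmdscale()` in R), using 1 minus similarity as the distance. REVIGO computes its own semantic similarities and layout, so this shows the principle rather than its exact procedure.

```r
# Hypothetical pairwise similarities (0 to 1) between four GO terms.
terms <- c("term A", "term B", "term C", "term D")
sim <- matrix(c(1.0, 0.8, 0.2, 0.1,
                0.8, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.7,
                0.1, 0.2, 0.7, 1.0),
              nrow = 4, dimnames = list(terms, terms))

# Similarity -> distance, then classical MDS into two dimensions.
coords <- cmdscale(as.dist(1 - sim), k = 2)
print(round(coords, 3))
```

Similar terms (A with B, C with D) end up close together in the plot coordinates.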

Page 23

REVIGO: overview table

Cluster leaders are prioritized by enrichment p-value
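A simplified, greedy sketch of redundancy reduction in R: terms are visited from most to least significant, and a term becomes a cluster leader only if it is not too similar to a leader already chosen. The similarity values and the 0.5 cut-off are assumptions, and REVIGO's actual selection rules are more involved than this.

```r
# Hypothetical terms, enrichment p-values and pairwise similarities.
terms <- c("term A", "term B", "term C", "term D")
pval  <- c(1e-4, 1e-6, 1e-3, 1e-5)
sim <- matrix(c(1.0, 0.8, 0.2, 0.1,
                0.8, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.7,
                0.1, 0.2, 0.7, 1.0),
              nrow = 4, dimnames = list(terms, terms))

threshold <- 0.5  # assumed similarity cut-off for "redundant"
leaders <- character(0)

# Most significant first; keep a term only if it is dissimilar to all leaders.
for (term in terms[order(pval)]) {
  if (length(leaders) == 0 || all(sim[term, leaders] < threshold)) {
    leaders <- c(leaders, term)
  }
}
print(leaders)
```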

Page 24

REVIGO: network

• Edges: 3% of the strongest GO term pairwise similarities

• Node size: generality of term (small = specific)

• Node color: p-value

Download network
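The edge rule above can be sketched in R as "keep the top 3% of unique term pairs by similarity". The similarity matrix here is random toy data (seeded for reproducibility), not REVIGO output.

```r
# Toy symmetric similarity matrix for 10 terms (random values, seeded).
set.seed(1)
n <- 10
labels <- paste0("term", 1:n)
sim <- matrix(0, n, n, dimnames = list(labels, labels))
sim[upper.tri(sim)] <- runif(n * (n - 1) / 2)
sim <- sim + t(sim)

# Every unique term pair with its similarity.
pairs <- which(upper.tri(sim), arr.ind = TRUE)
edges <- data.frame(
  source     = rownames(sim)[pairs[, "row"]],
  target     = colnames(sim)[pairs[, "col"]],
  similarity = sim[pairs],
  stringsAsFactors = FALSE
)

# Keep the strongest 3% of pairs (at least one edge).
n_keep <- max(1, ceiling(0.03 * nrow(edges)))
edges <- edges[order(edges$similarity, decreasing = TRUE), ][seq_len(n_keep), ]
print(edges)
```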

Page 25

Cytoscape

Import the REVIGO network into Cytoscape

1. Open Cytoscape

[Screenshots: steps 2, 3 and 4]

Page 26

Cytoscape: set layout and defaults

1. Set layout

3. Set network defaults

[Screenshots: steps 2, 4 and 5]

Page 27

Cytoscape: map data to network properties

1. Set edge width and color

2. Set node labels, size and color
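Cytoscape does these mappings interactively in its style settings; the R sketch below only illustrates what a continuous mapping computes, turning enrichment p-values into node sizes and colors via -log10(p). The size range (20 to 80) and the white-to-red ramp are assumptions, and the resulting columns could be imported as node attributes if desired.

```r
# Hypothetical node table: GO terms and their enrichment p-values.
nodes <- data.frame(term   = c("term A", "term B", "term C"),
                    pvalue = c(1e-2, 1e-4, 1e-6),
                    stringsAsFactors = FALSE)

# Smaller p-value -> larger score; rescale the score to 0..1.
score  <- -log10(nodes$pvalue)
scaled <- (score - min(score)) / (max(score) - min(score))

# Linear size mapping onto an assumed range of node sizes.
size_range <- c(20, 80)
nodes$size <- size_range[1] + scaled * diff(size_range)

# Color mapping along a white-to-red ramp (hex color strings).
ramp <- colorRamp(c("white", "red"))
nodes$color <- rgb(ramp(scaled), maxColorValue = 255)
print(nodes)
```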

Page 28

Cytoscape: overview network components

Download edge information

[Screenshots: steps 1 and 2]

3. View in Excel

Download node information

[Screenshots: steps 1 and 2]

3. View in Excel
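Once an edge table has been exported, simple summaries are easy to compute outside Excel. For example, a sketch of node degree (number of connections per term) in R, assuming an edge table with `source` and `target` columns; the rows below are hypothetical.

```r
# Hypothetical edge table, e.g. exported from Cytoscape and read into R.
edges <- data.frame(source = c("term A", "term A", "term B", "term C"),
                    target = c("term B", "term C", "term C", "term D"),
                    stringsAsFactors = FALSE)

# Undirected degree: count how often each node appears at either end.
degree <- table(c(edges$source, edges$target))
print(sort(degree, decreasing = TRUE))
```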

Page 29

Bonus: Modify edge and node attributes to show term-to-protein connections

See the files ‘test edge.xlsx’ and ‘test node.xlsx’ for examples of upload formats

See detailed instructions at http://www.slideshare.net/dgrapov/demonstration-of-network-mapping
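Term-to-protein edges can be built directly from the DAVID `Genes` column, which lists the proteins behind each term as comma-separated IDs. A sketch in R with hypothetical rows; the `source`/`target`/`interaction` column names are our choice for a Cytoscape table import, not a format taken from the workshop files.

```r
# Hypothetical DAVID rows: a term and its comma-separated protein IDs.
chart <- data.frame(Term  = c("GO:toyA~term A", "GO:toyC~term C"),
                    Genes = c("P02768, P68871", "P68871, P01308"),
                    stringsAsFactors = FALSE)

# Split each protein list, then emit one edge per (term, protein) pair.
genes <- strsplit(chart$Genes, ",\\s*")
term_protein <- data.frame(
  source      = rep(chart$Term, lengths(genes)),
  target      = unlist(genes),
  interaction = "term-protein",
  stringsAsFactors = FALSE
)

# Tab-delimited edge table that could be imported into Cytoscape.
out_file <- file.path(tempdir(), "term_protein_edges.txt")
write.table(term_protein, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE)
print(term_protein)
```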

Page 30

See more statistical and multivariate analysis examples at http://imdevsoftware.wordpress.com/tutorials/

Questions?

This research was supported in part by NIH 1 U24 DK097154