
1

Annotation for Gene Expression Analysis with Reactome.db Package

Utah State University – Spring 2012

STAT 6570: Statistical Bioinformatics

Cody Tramp


2

References

Ligtenberg W. 2011. Reactome.db: How to use the reactome.db package.

www.reactome.org


3

Reactome.db Overview

“Open source, open access, manually curated, and peer-reviewed pathway database” (www.reactome.org)

Reactome.db is an R interface that allows queries to the SQL database containing pathway information

Contains functions for converting between GO IDs, Entrez Gene IDs, Reactome IDs, and Reactome pathway names


4

Getting Help on Specific Reactome.db Functions

# Load the reactome.db package
library(reactome.db)

# Check for main manual pages
?reactome.db   # This won't get the actual manual

# List all reactome.db objects
ls("package:reactome.db")
#  [1] "reactome"              "reactome_dbconn"       "reactome_dbfile"
#  [4] "reactome_dbInfo"       "reactome_dbschema"     "reactomeEXTID2PATHID"
#  [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS"     "reactomePATHID2EXTID"
# [10] "reactomePATHID2NAME"   "reactomePATHNAME2ID"   "reactomeREACTOMEID2GO"

# Look up the manual for a specific object
?reactome_dbInfo   # Still not very useful: poor documentation


5

How IDs and names are stored in Reactome.db

reactome.db links to a SQL database

Functions are interfaces to the database

SQL databases are relational databases (think of Excel spreadsheets, but better)

Data is stored as key:value pairs

Key     Value
15869   Homo sapiens: Metabolism of nucleotides
68616   Homo sapiens: Assembly of the ORC complex at the origin of replication
68689   Homo sapiens: CDC6 association with the ORC:origin complex
68827   Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
68867   Homo sapiens: Assembly of the pre-replicative complex


6

Reactome.db Function Uses (NOTE: all return a key:value list)

Converting Between Entrez and Reactome

reactomeEXTID2PATHID = Entrez ID to Reactome.db ID

reactomePATHID2EXTID = Reactome.db ID to Entrez ID

> xx <- toTable(reactomeEXTID2PATHID)
> head(xx)
  reactome_id gene_id
1      168253   10898
2      168254   10898
3      168253    8106
4      168254    8106
5      168253    5610
6      168254    5610

Use toTable() instead of the as.list() shown in the manuals; toTable() returns a data frame with one row per key:value pair
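To show what toTable() saves you, here is a minimal base-R sketch that runs without reactome.db. It flattens a key:value list, assumed to look like what as.list(reactomeEXTID2PATHID) returns (names = Entrez IDs, values = the Reactome IDs each gene maps to), into one row per pair. The list `ext2path` is a hand-built stand-in that reuses the pairs from the head(xx) output above.

```r
# Hand-built stand-in (assumption) for as.list(reactomeEXTID2PATHID):
# names are Entrez gene IDs, values are the Reactome IDs each gene maps to.
# Pairs are copied from the head(xx) output on this slide.
ext2path <- list("10898" = c("168253", "168254"),
                 "8106"  = c("168253", "168254"),
                 "5610"  = c("168253", "168254"))

# Flatten the key:value list into one row per pair (the shape toTable() gives)
pairs <- data.frame(
  reactome_id = unlist(ext2path, use.names = FALSE),
  gene_id     = rep(names(ext2path), lengths(ext2path)),
  stringsAsFactors = FALSE
)
print(pairs)
```

With the package loaded, toTable(reactomeEXTID2PATHID) should return this shape directly, which is why it is the more convenient call for filtering and merging.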


7

Reactome.db Function Uses (NOTE: all return a key:value list)

Converting Between GO IDs and Reactome IDs

reactomeREACTOMEID2GO = Reactome.db ID to GO IDs

reactomeGO2REACTOMEID = GO ID to Reactome.db ID

> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003


8

Reactome.db Function Uses (NOTE: all return a key:value list)

Retrieving Pathway Names from Reactome IDs

reactomePATHNAME2ID = pathway name to Reactome.db ID

reactomePATHID2NAME = Reactome.db ID to pathway name

> xx <- toTable(reactomePATHID2NAME)
> head(xx)
  reactome_id path_name
1       15869 Homo sapiens: Metabolism of nucleotides
2       68616 Homo sapiens: Assembly of the ORC complex at the origin of replication
3       68689 Homo sapiens: CDC6 association with the ORC:origin complex
4       68827 Homo sapiens: CDT1 association with the CDC6:ORC:origin complex
5       68867 Homo sapiens: Assembly of the pre-replicative complex
6       68874 Homo sapiens: M/G1 Transition
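Once a map is in table form, a named vector gives fast key-to-value lookups in both directions. A minimal sketch, assuming the toTable() columns are named reactome_id and path_name as in the output above; `path_tab` is a hand-typed copy of those six rows, and `id2name`/`name2id` are illustrative names, not reactome.db objects.

```r
# Hand-typed copy of the head(xx) output above (stand-in for toTable(reactomePATHID2NAME))
path_tab <- data.frame(
  reactome_id = c("15869", "68616", "68689", "68827", "68867", "68874"),
  path_name   = c("Homo sapiens: Metabolism of nucleotides",
                  "Homo sapiens: Assembly of the ORC complex at the origin of replication",
                  "Homo sapiens: CDC6 association with the ORC:origin complex",
                  "Homo sapiens: CDT1 association with the CDC6:ORC:origin complex",
                  "Homo sapiens: Assembly of the pre-replicative complex",
                  "Homo sapiens: M/G1 Transition"),
  stringsAsFactors = FALSE
)

# ID -> name (like reactomePATHID2NAME): names of the vector are the keys
id2name <- setNames(path_tab$path_name, path_tab$reactome_id)

# name -> ID (like reactomePATHNAME2ID)
name2id <- setNames(path_tab$reactome_id, path_tab$path_name)

# Several IDs at once; an ID that is not in the table comes back as NA
id2name[c("68616", "68874", "00000")]
name2id[["Homo sapiens: M/G1 Transition"]]
```

If a name occurred more than once in the table, `[[` would return only the first matching ID, so the table itself is safer when keys may repeat.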


9

Reactome.db Function Uses (NOTE: all return a key:value list)

reactomeMAPCOUNTS = number of mapped keys in each mapping object (not very useful except for error checking)

> xx <- as.list(reactomeMAPCOUNTS)
> xx
$reactomeEXTID2PATHID
[1] 28363

$reactomeGO2REACTOMEID
[1] 3217

$reactomePATHID2EXTID
[1] 8320

$reactomePATHID2NAME
[1] 13778

$reactomePATHNAME2ID
[1] 13876

$reactomeREACTOMEID2GO
[1] 47575
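The map counts above are assumed to follow the convention documented for other AnnotationDbi-based packages (e.g. org.Hs.egMAPCOUNTS): the number of distinct keys with at least one mapping, not the number of rows. That would explain why reactomeEXTID2PATHID (28363) and reactomePATHID2EXTID (8320) differ even though they describe the same gene-pathway pairs. A minimal sketch on the six toy pairs from the reactomeEXTID2PATHID output:

```r
# Toy pairs copied from head(toTable(reactomeEXTID2PATHID)) on the Entrez slide
pairs <- data.frame(
  reactome_id = rep(c("168253", "168254"), 3),
  gene_id     = rep(c("10898", "8106", "5610"), each = 2),
  stringsAsFactors = FALSE
)

n_rows      <- nrow(pairs)                         # gene-pathway pairs
n_gene_keys <- length(unique(pairs$gene_id))       # keys of EXTID2PATHID (genes)
n_path_keys <- length(unique(pairs$reactome_id))   # keys of PATHID2EXTID (pathways)

c(rows = n_rows, EXTID2PATHID = n_gene_keys, PATHID2EXTID = n_path_keys)
```

With the package loaded, AnnotationDbi's count.mappedkeys() on a map object could be compared against the matching reactomeMAPCOUNTS entry to check this assumption.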


10

Ex: Find apoptosis induction-related IDs (compare to Notes 6.1 slide 10)

# Get a data.frame summarizing all reactome.db pathways whose names include a certain string
library(reactome.db)
xx <- toTable(reactomePATHNAME2ID)
all.pathways <- xx$path_name   # name of each reactome.db pathway
t <- grep('apoptosis', all.pathways)   # indices where the name includes the term
# use agrep() for approximate term searching

reactome.Term <- unlist(all.pathways[t])
reactome.IDs <- unlist(xx$reactome_id[t])

reactome.frame <- data.frame(reactome.ID=reactome.IDs, reactome.Term=reactome.Term)

rownames(reactome.frame) <- 1:length(reactome.IDs)
reactome.frame   # 13 terms
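grep() is case-sensitive by default, so the search above skips any pathway name that spells the term "Apoptosis". A minimal sketch of a case-insensitive variant that also builds the result frame in one subsetting step; the three rows in `xx` are hypothetical stand-ins for toTable(reactomePATHNAME2ID), and the IDs are made up.

```r
# Hypothetical stand-in for toTable(reactomePATHNAME2ID); IDs are made up
xx <- data.frame(
  path_name   = c("Homo sapiens: Apoptosis",
                  "Homo sapiens: Regulation of apoptosis",
                  "Homo sapiens: Metabolism of nucleotides"),
  reactome_id = c("id_A", "id_B", "id_C"),
  stringsAsFactors = FALSE
)

# Case-sensitive (as in the slide): only the lower-case spelling matches
hits_exact <- grepl("apoptosis", xx$path_name)

# Case-insensitive: both spellings match
hits_any <- grepl("apoptosis", xx$path_name, ignore.case = TRUE)

# Build the ID/term frame directly; data.frame() already numbers rows 1..n
reactome.frame <- data.frame(reactome.ID   = xx$reactome_id[hits_any],
                             reactome.Term = xx$path_name[hits_any],
                             stringsAsFactors = FALSE)
reactome.frame
```

Because data.frame() assigns row names 1..n by default, the separate rownames() step is not needed here.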


11

Ex: Find apoptosis induction-related IDs (compare to Notes 6.1 slide 10)


12

Ex. Pathway Term Search Function

## Define function to search for pathways with a given keyword
## agrep.bool indicates whether to use agrep (TRUE) or grep (FALSE)
searchPathways2REACTOMEID <- function(term, agrep.bool) {
  xx <- toTable(reactomePATHNAME2ID)
  all.pathways <- xx$path_name   # name of each reactome.db pathway
  # get indices where the term is found
  if (agrep.bool) {
    t <- agrep(term, all.pathways)
  } else {
    t <- grep(term, all.pathways)
  }
  unlist(xx$reactome_id[t])
}

apop.IDs <- searchPathways2REACTOMEID("apoptosis", FALSE)
length(apop.IDs)   # 13 pathways matched

apop.IDs <- searchPathways2REACTOMEID("apoptosis", TRUE)
length(apop.IDs)   # 85 pathways matched
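agrep() accepts approximate matches: with its default max.distance = 0.1, a 9-letter pattern such as "apoptosis" tolerates one insertion, deletion, or substitution (0.1 × 9 rounded up). Like grep(), it is case-sensitive by default, so "Apoptosis" counts as a one-substitution match. A minimal sketch on three hypothetical pathway names (not reactome.db output) shows how the match count can grow:

```r
# Hypothetical pathway names, not taken from reactome.db
toy_names <- c("Homo sapiens: Apoptosis",
               "Homo sapiens: Regulation of apoptosis",
               "Homo sapiens: Metabolism of nucleotides")

grep("apoptosis", toy_names)    # exact, case-sensitive: index 2 only
agrep("apoptosis", toy_names)   # approximate (1 edit allowed): indices 1 and 2
```

Approximate matching can also pick up unrelated names, so a larger agrep() result is worth inspecting by eye before using it downstream.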


13

Getting GO Terms from a Single Reactome ID

## Get list of GO terms for one Reactome ID
xx <- toTable(reactomeGO2REACTOMEID)
t <- xx$reactome_id == "15869"
GOTerms <- xx$go_id[t]

> GOTerms
 [1] "GO:0055086" "GO:0006139" "GO:0044281"
 [4] "GO:0034641" "GO:0044238" "GO:0008152"
 [7] "GO:0006807" "GO:0044237" "GO:0008150"
[10] "GO:0009987"

> xx <- toTable(reactomeGO2REACTOMEID)
> head(xx)
  reactome_id      go_id
1      168276 GO:0019054
2      168276 GO:0019048
3      168276 GO:0044068
4      168276 GO:0022415
5      168276 GO:0051701
6      168276 GO:0044003


14

Getting GO Terms from a List of Reactome IDs

## Define function to get all GO terms for all Reactome IDs in a list
getGOTerms <- function(list_reactome) {
  listGO <- list()
  xx <- toTable(reactomeGO2REACTOMEID)
  for (i in seq_along(list_reactome)) {   # seq_along() is safe for an empty input
    t <- xx$reactome_id == list_reactome[i]
    temp_list <- xx$go_id[t]
    listGO <- c(listGO, temp_list)
  }
  unlist(listGO)
}

GOTerms.all <- getGOTerms(apop.IDs)   # apop.IDs from slide 10 (the 13 grep() matches)
length(GOTerms.all)   # 136 GO terms from 13 apop.IDs

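This should have yielded 169 terms (Notes 4.1 slide 10), so reactome.db might not be complete. Note also that GOTerms.all can list the same GO term more than once if it maps to several of the Reactome IDs.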
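The loop in getGOTerms() can be replaced by a single %in% test. A minimal sketch, assuming toTable() columns named reactome_id and go_id as shown above; `getGOTermsVec` is a hypothetical name, and the toy table reuses a few rows from the slides plus one hypothetical row (marked) so that a GO term is shared by two IDs.

```r
# Toy stand-in for toTable(reactomeGO2REACTOMEID): rows taken from the slides,
# plus one hypothetical row (marked) so that one GO term maps to two IDs
xx <- data.frame(
  reactome_id = c(rep("168276", 6), rep("15869", 3), "168276"),
  go_id = c("GO:0019054", "GO:0019048", "GO:0044068",
            "GO:0022415", "GO:0051701", "GO:0044003",
            "GO:0055086", "GO:0006139", "GO:0008150",
            "GO:0008150"),   # hypothetical extra row for 168276
  stringsAsFactors = FALSE
)

# Hypothetical vectorized helper: keep every row whose Reactome ID is in the list.
# The table is passed in, so it is built only once.
getGOTermsVec <- function(list_reactome, map_table) {
  map_table$go_id[map_table$reactome_id %in% list_reactome]
}

ids <- c("15869", "168276")
all_terms <- getGOTermsVec(ids, xx)   # one entry per matching row, repeats kept
length(all_terms)
length(unique(all_terms))             # distinct GO terms only
```

With the package loaded, the call would be getGOTermsVec(apop.IDs, toTable(reactomeGO2REACTOMEID)). Unlike the loop, the result follows the table's row order rather than the order of the input IDs; length(unique(...)) counts each GO term once.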


15

Reactome.org Online Tools


16

Pathway Viewer on reactome.org

http://www.reactome.org/userguide/Usersguide.html#Introduction


17

Pathway Viewer on reactome.org Details Panel


18

Pathway Viewer on reactome.org

http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142


19

Reactome Pathway Symbols

Upregulation and participating proteins

Inhibition

http://www.reactome.org/entitylevelview/PathwayBrowser.html#DB=gk_current&FOCUS_SPECIES_ID=48887&FOCUS_PATHWAY_ID=71387&ID=76213&VID=3422142


20

Reactome Database Assignment Method

Genes seem to be assigned to pathways in a similar manner to the GO database

If a gene is up-regulated, it is included

Genes that are down-regulated in a condition are NOT mapped to the condition/pathway

Haven't received an official response from reactome.org, but general browsing suggests this is the case


21

Pathway Analysis Tool

http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage


22

Pathway Analysis Tool

http://www.reactome.org/ReactomeGWT/entrypoint.html#PathwayAnalysisDataUploadPage


23

Expression Set Data Analysis


24

Expression Set Data Analysis


25

Summary

Reactome.db provides an interface to the SQL database containing IDs

Functions for converting between ID types

No functionality for gene testing through R

Online tools include pathway maps and ID lookup tables

Some limited expression testing (with unknown statistical methods)


26

Questions?