Motif Enrichment Analysis in Co-Expressed Gene Sets and High-Throughput Sequence Sets

Wyeth Wasserman, Jan. 18, 2012
www.cmmt.ubc.ca
opossum.cisreg.ca/oPOSSUM3



Page 1: Motif Enrichment Analysis in Co-Expressed Gene Sets and High-Throughput Sequence Sets

www.cmmt.ubc.ca

MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS

Wyeth Wasserman
Jan. 18, 2012

opossum.cisreg.ca/oPOSSUM3

Page 2

Welcome

• If you encounter any technical difficulties during the webinar
  – Type a report using the chat option
• Slide presentation ~20 min
• Questions will be compiled as they are submitted and answered during the final Q&A/discussion period
• During the discussion session, audience members will be able to speak

Page 3

Webinar Format

• Introduction
• Walk-Through
• Summary
• Q&A

Page 4

INTRODUCTION

Page 5

Overview

• Given co-expressed gene sets, what are the key mediators of co-expression?
  – Focus on transcription factors (TFs)
• Web-based software system for motif enrichment analysis
  – Co-expressed genes or sequences
  – Multiple sets of analysis methods
  – Available for human, mouse, fly, worm, yeast

Page 6

Motif Enrichment Analysis

[Figure: bar chart of the proportion of genes containing each TFBS (TFBS1, TFBS2, TFBS3) in the Background and Target gene sets, y-axis from 0 to 1; p-values shown in that order: p=0.04, p=0.55, p=0.66]

Finds over-represented transcription factor binding sites (TFBSs) in co-expressed gene sets

Page 7

What do we need?

• Region selection
  – Where to look for enriched binding sites
  – Use conservation filter to restrict search space
• TFBS profiles to search for
  – Need a pool of validated profiles
• Scoring metrics for enrichment
  – How to measure motif over-representation

Page 8

Conserved Region Selection

[Figure: phastCons score plotted against genomic position across a gene, with a threshold line; the conserved regions are labelled CR1, CR2, CR3 and CR4]
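The threshold idea in the figure can be sketched as a scan for contiguous runs of positions whose conservation score meets a cutoff. This is only an illustration: the function name `conserved_regions`, the `min_length` parameter, and the toy scores are assumptions, not oPOSSUM's actual region-selection code or real phastCons data.

```python
from typing import List, Tuple


def conserved_regions(scores: List[float], threshold: float,
                      min_length: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where every score >= threshold.

    Hypothetical helper: runs shorter than min_length are discarded.
    """
    regions = []
    start = None  # index where the current above-threshold run began
    for pos, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = pos
        elif start is not None:
            # The run just ended at pos (exclusive).
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy per-base conservation scores (made up for illustration).
toy_scores = [0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.95, 0.9, 0.3, 0.85]
regions = conserved_regions(toy_scores, threshold=0.6, min_length=2)
print(regions)  # [(1, 4), (6, 8)]
```

Raising the threshold or the minimum length shrinks the search space, which is the trade-off the conservation filter controls.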

Page 9

TFBS Profiles

• JASPAR 2010: Portales-Casamar et al. Nucleic Acids Research 2009.
• Expanded collection of TFBS profiles
  – 130 vertebrate profiles
  – 105 insect profiles
  – 5 nematode profiles
  – 177 yeast profiles
  – PBM (104), PBM_HOMEO (176), PBM_BHLH (19)
• Standardized 2-level TF classification (class, family)

Page 10

Scoring Metrics

• Z scores
  – Based on the number of occurrences of the TFBS relative to background
  – Normalized for sequence length
  – Simple binomial distribution model
• Fisher scores
  – Fisher exact probability test
    • Fisher score = -log(Fisher p-value)
  – Based on the number of genes containing the TFBS relative to background
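As a rough illustration of the Z-score bullets, the sketch below treats each nucleotide of the target sequences as a trial, takes the per-nucleotide hit rate from the background, and compares the observed target count with its binomial expectation. The exact formula oPOSSUM uses is not given in the slides, so the function `binomial_z_score`, its parameters, and the toy counts are all assumptions.

```python
import math


def binomial_z_score(target_hits: int, target_length: int,
                     background_hits: int, background_length: int) -> float:
    """Z-score of the target hit count under a simple binomial model.

    Assumption: the background per-nucleotide hit rate is the success
    probability, and the target length is the number of trials, which
    normalizes for sequence length.
    """
    rate = background_hits / background_length
    expected = target_length * rate
    std_dev = math.sqrt(target_length * rate * (1.0 - rate))
    return (target_hits - expected) / std_dev


# Toy numbers (not taken from the slides): the background has 1 hit per
# 1000 nt, so 10 hits are expected in 10,000 nt of target sequence.
z_enriched = binomial_z_score(30, 10_000, 1_000, 1_000_000)
z_neutral = binomial_z_score(10, 10_000, 1_000, 1_000_000)
print(round(z_enriched, 2), round(z_neutral, 2))
```

A target set with the same hit rate as the background scores about zero, and large positive values indicate over-representation.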

Page 11

Additional Metric for Sequence-Based Analysis

• KS scores
  – Kolmogorov-Smirnov test
  – Compares the empirical distribution of the distances of the binding sites from the maximum point of confidence (MPC) to the background
  – Expect real binding sites to be centered around the MPC

[Figure: Foreground and Background site distributions, with the MPC marked]

KS score = -log(KS test p-value)
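A minimal sketch of the KS score, assuming a two-sample test between foreground and background site-to-MPC distances and the standard asymptotic Kolmogorov series for the p-value. The slides do not say which variant of the test or which log base oPOSSUM uses; the natural log, the helper names, and the toy distances are assumptions.

```python
import math
import sys
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs (two-sample KS statistic D)."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov series (approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series is numerically 1 here and converges slowly
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_score(foreground, background):
    """KS score = -log(p-value); natural log is an assumption."""
    d = ks_statistic(foreground, background)
    p = ks_p_value(d, len(foreground), len(background))
    return -math.log(max(p, sys.float_info.min))  # guard against log(0)


# Toy distances (in nt) of predicted sites from the MPC; made-up values.
foreground_dist = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]           # centered on the MPC
background_dist = [5, 12, 20, 27, 33, 41, 48, 56, 63, 70]  # spread out
print(ks_statistic(foreground_dist, background_dist))
print(ks_score(foreground_dist, background_dist))
```

Sites that cluster near the MPC give a large D against a spread-out background, hence a small p-value and a high KS score.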

Page 12

Analysis Methods

Page 13

WALK-THROUGH

Page 14

http://opossum.cisreg.ca/oPOSSUM3

Page 15

Human SSA (Single Site Analysis) - Input

Page 16

Page 17

Page 18

Human SSA - Results

Page 19

TF: HNF1A
JASPAR ID: MA0046.1
Class: Helix-Turn-Helix
Family: Homeo
Tax Group: Vertebrates
IC: 15.548
GC Content: 0.259
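The IC (information content) and GC content fields summarize a profile's count matrix. The sketch below shows one common way to compute both, without pseudocounts or small-sample correction. The matrix is a made-up three-position example, not the HNF1A profile, and the function names are hypothetical, so the numbers are not expected to match the values listed above.

```python
import math


def information_content(pfm):
    """Total information content in bits, summed over matrix columns.

    pfm maps each base to its list of per-position counts. A uniform
    background (2 bits maximum per position) is assumed.
    """
    width = len(pfm["A"])
    total_ic = 0.0
    for i in range(width):
        column = [pfm[base][i] for base in "ACGT"]
        n = sum(column)
        ic = 2.0
        for count in column:
            if count > 0:
                freq = count / n
                ic += freq * math.log2(freq)  # adds a non-positive term
        total_ic += ic
    return total_ic


def gc_content(pfm):
    """Fraction of all counts in the matrix that are C or G."""
    gc = sum(pfm["C"]) + sum(pfm["G"])
    total = sum(sum(pfm[base]) for base in "ACGT")
    return gc / total


# Hypothetical 3-position matrix: fully conserved, uniform, half A / half C.
toy_pfm = {
    "A": [20, 5, 10],
    "C": [0, 5, 10],
    "G": [0, 5, 0],
    "T": [0, 5, 0],
}
print(information_content(toy_pfm))  # 2 + 0 + 1 bits
print(gc_content(toy_pfm))
```

A fully conserved position contributes 2 bits and a uniform one contributes 0, so a higher total IC means a more specific profile.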

Page 20

Target Gene Hits: 19
Target Gene Non-Hits: 36
Background Gene Hits: 1113
Background Gene Non-Hits: 3887
Target TFBS Hits: 41
Target TFBS Nucleotide Rate: 0.0269
Background TFBS Hits: 2127
Background TFBS Nucleotide Rate: 0.009
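The four gene counts above form the 2x2 table that a Fisher exact test works on. The sketch below computes a one-sided (over-representation) p-value from the hypergeometric distribution using only the standard library, then takes the negative natural log. The sidedness, the log base, and the treatment of target and background as disjoint rows are assumptions, since the slides state only "Fisher score = -log(Fisher p-value)". The result can be compared with the Fisher score reported for this profile.

```python
import math


def fisher_exact_greater(target_hits, target_non_hits,
                         background_hits, background_non_hits):
    """One-sided Fisher exact p-value: P(at least target_hits hits in target).

    Assumption: target and background genes are treated as the two
    disjoint rows of a 2x2 table with fixed margins.
    """
    n_target = target_hits + target_non_hits
    total_hits = target_hits + background_hits
    total = n_target + background_hits + background_non_hits
    total_non_hits = total - total_hits
    denominator = math.comb(total, n_target)
    upper = min(n_target, total_hits)
    # Sum the hypergeometric probabilities of all tables at least as extreme.
    numerator = sum(
        math.comb(total_hits, k) * math.comb(total_non_hits, n_target - k)
        for k in range(target_hits, upper + 1)
    )
    return numerator / denominator


# Gene counts from the results table above.
p_value = fisher_exact_greater(19, 36, 1113, 3887)
fisher_score = -math.log(p_value)  # natural log is an assumption
print(p_value, fisher_score)
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the binomial coefficients; only the final ratio is converted to a float.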

Page 21

Z-score: 15.134
Fisher score: 3.646

Page 22

Page 23

oPOSSUM methods

Page 24

Page 25

Human aCSA (anchored Combination Site Analysis) - Input

Page 26

Human aCSA - Input

Page 27

Human aCSA - Input

Page 28

Human aCSA - Results

Page 29

Page 30

Page 31

TFBS Cluster Analysis

[Figure labels: TFBS Profile, Cluster]

Page 32

TFBS Cluster Analysis (TCA)

[Figure: TFBSs located in the conserved regions (CR1 to CR4) of a gene are merged into TFBS cluster hits]

Overrepresentation analysis based on merged TFBS cluster hits
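The merge step in the figure can be sketched as interval merging: overlapping hits from profiles in the same TFBS cluster collapse into one cluster hit, so that closely related motifs are not counted several times at the same site. The data layout, the function name `merge_cluster_hits`, and the rule that any overlap triggers a merge are assumptions for illustration, not oPOSSUM's documented behavior.

```python
from typing import Dict, List, Tuple

# (cluster id, start, end) with inclusive coordinates; hypothetical layout.
Hit = Tuple[str, int, int]


def merge_cluster_hits(hits: List[Hit]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping hits within each TFBS cluster."""
    by_cluster: Dict[str, List[Tuple[int, int]]] = {}
    for cluster, start, end in hits:
        by_cluster.setdefault(cluster, []).append((start, end))

    merged: Dict[str, List[Tuple[int, int]]] = {}
    for cluster, intervals in by_cluster.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlaps the previous merged hit: extend it.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[cluster] = out
    return merged


# Toy hits: three overlapping C1 sites, one separate C1 site, one C2 site.
toy_hits = [
    ("C1", 100, 110),
    ("C1", 105, 118),
    ("C2", 108, 120),
    ("C1", 200, 212),
    ("C1", 118, 125),
]
cluster_hits = merge_cluster_hits(toy_hits)
print(cluster_hits)
```

The merged hit counts per cluster would then feed the same enrichment scores used for individual profiles.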

Page 33

Human TCA – TFBS cluster selection

Page 34

Human TCA - Results

Page 35

TFCluster Info Page

Page 36

Page 37

Seq SSA - Input

Page 38

Seq SSA - Input

Page 39

Page 40

Page 41

Page 42

Page 43

Page 44

Page 45

Seq SSA - Results

Page 46

KS score

Page 47

Page 48

Seq TCA - Input

Page 49

SUMMARY

Page 50

oPOSSUM-3

• Web-based system for motif enrichment analysis in co-expressed gene sets and sequences from high-throughput experiments
• Important functionalities
  – Gene-based vs. sequence-based
  – Single site vs. anchored combination site
  – Individual vs. clusters of TFBS profiles
  – Human, mouse, fly, worm and yeast

Page 51

Development Team

Version 1
• Ho Sui, SJ
• Mortimer, JR
• Arenillas, DJ
• Brumm, J
• Walsh, CJ
• Kennedy, BP
• Wasserman, WW

CSA
• Huang, S
• Fulton, DL
• Arenillas, DJ
• Perco, P
• Ho Sui, SJ
• Mortimer, JR
• Wasserman, WW

Version 2
• Ho Sui, SJ
• Fulton, DL
• Arenillas, DJ
• Kwon, AT
• Wasserman, WW

Version 3
• Kwon, AT
• Arenillas, DJ
• Worsley Hunt, R
• Wasserman, WW

Page 52

QUESTIONS & ANSWERS

Please take a moment to type questions/comments into the chat box. The questions will be answered shortly.