Motif Enrichment Analysis in Co-Expressed Gene Sets and High-Throughput Sequence Sets

Post on 22-Feb-2016

DESCRIPTION

Motif Enrichment Analysis in Co-Expressed Gene Sets and High-Throughput Sequence Sets. Wyeth Wasserman, Jan. 18, 2012. opossum.cisreg.ca/oPOSSUM3.

TRANSCRIPT

www.cmmt.ubc.ca

MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS

Wyeth Wasserman

Jan. 18, 2012

opossum.cisreg.ca/oPOSSUM3

Welcome

• If you encounter any technical difficulties during the webinar
  – Type a report using the chat option
• Slide presentation ~20 min
• Compile questions as they are submitted and answer them during the final Q&A/discussion period
• During the discussion session, we’ll allow audience speaking


Webinar Format

• Introduction
• Walk-Through
• Summary
• Q&A


INTRODUCTION


Overview

• Given co-expressed gene sets, what are the key mediators of co-expression?
  – Focus on TFs
• Web-based software system for motif enrichment analysis
  – Co-expressed genes or sequences
  – Multiple sets of analysis methods
  – Available for human, mouse, fly, worm, yeast


Motif Enrichment Analysis


[Figure: bar chart of the proportion of genes containing each TFBS in the background and target sets. TFBS1: p=0.04; TFBS2: p=0.55; TFBS3: p=0.66]

Finds over-represented TFBS in co-expressed gene sets

What do we need?

• Region selection
  – Where to look for enriched binding sites
  – Use conservation filter to restrict search space
• TFBS profiles to search for
  – Need a pool of validated profiles
• Scoring metrics for enrichment
  – How to measure motif over-representation


Conserved Region Selection

[Figure: phastCons score along genomic position around a gene, with a score threshold and conserved regions CR1, CR2, CR3 and CR4 marked]
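The thresholding idea in this figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `scores` is a hypothetical list of per-base phastCons scores, and `threshold` and `min_length` are made-up parameters rather than oPOSSUM's actual settings.

```python
from typing import List, Tuple


def conserved_regions(scores: List[float], threshold: float,
                      min_length: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where every score >= threshold."""
    regions = []
    start = None  # start index of the run currently being extended, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run just ended; keep it only if it is long enough.
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Hypothetical per-base scores, for illustration only.
scores = [0.1, 0.9, 0.95, 0.8, 0.2, 0.1, 0.7, 0.9, 0.3, 0.85]
print(conserved_regions(scores, threshold=0.7, min_length=2))
```

Restricting the motif search to the returned ranges is what shrinks the search space; the minimum-length filter is an extra assumption added here to drop isolated high-scoring bases.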


TFBS Profiles

• JASPAR 2010: Portales-Casamar et al. Nucleic Acids Research 2009.
• Expanded collection of TFBS profiles
  – 130 vertebrate profiles
  – 105 insect profiles
  – 5 nematode profiles
  – 177 yeast profiles
  – PBM (104), PBM_HOMEO (176), PBM_BHLH (19)
• Standardized 2-level TF classification (class, family)


Scoring Metrics

• Z scores
  – Based on the number of occurrences of the TFBS relative to background
  – Normalized for sequence length
  – Simple binomial distribution model
• Fisher scores
  – Fisher exact probability test
  – Fisher score = -log(Fisher p-value)
  – Based on the number of genes containing the TFBS relative to background
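As a rough sketch of the Z score idea, the snippet below uses the normal approximation to a binomial model: the background rate, scaled by the target sequence length, gives the expected number of occurrences. The function name, its parameters and the example counts are all hypothetical, and the tool's own formula may differ in detail (for example, it may count nucleotides rather than sites or apply a continuity correction).

```python
import math


def z_score(target_hits: int, target_length: int,
            background_hits: int, background_length: int) -> float:
    """Z score of the target hit count under a simple binomial model."""
    rate = background_hits / background_length    # background rate per nucleotide
    expected = target_length * rate               # binomial mean, n * p
    std_dev = math.sqrt(target_length * rate * (1.0 - rate))  # sqrt(n * p * (1 - p))
    return (target_hits - expected) / std_dev


# Made-up counts, for illustration only.
z = z_score(target_hits=30, target_length=10_000,
            background_hits=1_000, background_length=1_000_000)
print(round(z, 2))
```

Dividing by the sequence lengths is what makes the score comparable between a short target set and a much larger background.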


Additional Metric for Sequence-Based Analysis

• KS scores
  – Kolmogorov-Smirnov test
  – Compares the empirical distribution of the distances of the binding sites from the maximum point of confidence (MPC) to the background
  – Expect real binding sites to be centered around the MPC
  – KS score = -log(KS test p-value)

[Figure: distributions of binding-site distances from the MPC for the foreground and the background]
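A minimal sketch of a KS score follows, assuming a two-sample, two-sided Kolmogorov-Smirnov test with the standard asymptotic p-value and a natural logarithm. The slides do not state the exact variant or the log base, so both are assumptions, and the distance lists are invented for illustration.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the gap can only peak at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_p_value(d: float, n_a: int, n_b: int, terms: int = 100) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = d * math.sqrt(n_a * n_b / (n_a + n_b))
    if lam < 0.3:
        return 1.0  # the series converges slowly here and the p-value is ~1
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_score(foreground: Sequence[float], background: Sequence[float]) -> float:
    """KS score = -log(p-value); natural log is an assumption."""
    d = ks_statistic(foreground, background)
    p = ks_p_value(d, len(foreground), len(background))
    return -math.log(max(p, 1e-300))  # guard against log(0)


# Hypothetical distances (in bp) of predicted sites from the MPC.
foreground = [-12, -8, -5, -3, -1, 0, 2, 4, 6, 9, 11, 15]
background = list(range(-100, 101, 10))
print(ks_score(foreground, background))
```

Foreground sites that pile up near the MPC while background sites are spread out give a large gap between the two cumulative distributions, hence a small p-value and a high score.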

Analysis Methods


WALK-THROUGH


http://opossum.cisreg.ca/oPOSSUM3

Human SSA - Input


Human SSA - Results


TF: HNF1A
JASPAR ID: MA0046.1
Class: Helix-Turn-Helix
Family: Homeo
Tax Group: Vertebrates
IC: 15.548
GC Content: 0.259


Target Gene Hits: 19
Target Gene Non-Hits: 36
Background Gene Hits: 1113
Background Gene Non-Hits: 3887
Target TFBS Hits: 41
Target TFBS Nucleotide Rate: 0.0269
Background TFBS Hits: 2127
Background TFBS Nucleotide Rate: 0.009


Z-score: 15.134
Fisher score: 3.646
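The gene counts shown for HNF1A can be plugged into a one-sided Fisher exact test as a sanity check. The sketch below makes two assumptions that the slides do not confirm: the target and background gene sets are treated as disjoint, and the score uses the natural logarithm. Under those assumptions the result should come out close to the Fisher score reported above.

```python
import math


def fisher_one_sided_p(target_hits: int, target_non_hits: int,
                       background_hits: int, background_non_hits: int) -> float:
    """P(at least `target_hits` hits in the target set) under the hypergeometric null."""
    n_target = target_hits + target_non_hits
    n_hits = target_hits + background_hits  # genes with a hit, both sets pooled
    n_total = n_target + background_hits + background_non_hits
    denominator = math.comb(n_total, n_target)
    upper = min(n_target, n_hits)
    # Sum the probabilities of tables at least as extreme as the observed one.
    numerator = sum(
        math.comb(n_hits, k) * math.comb(n_total - n_hits, n_target - k)
        for k in range(target_hits, upper + 1)
    )
    return numerator / denominator


# Gene counts taken from the HNF1A result page.
p_value = fisher_one_sided_p(19, 36, 1113, 3887)
fisher_score = -math.log(p_value)  # natural log is an assumption
print(p_value, fisher_score)
```

Exact integer arithmetic via `math.comb` avoids the overflow issues that factorials in floating point would cause for a background of several thousand genes.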


oPOSSUM methods


Human aCSA - Input


Human aCSA - Results


TFBS Cluster Analysis


TFBS Cluster Analysis (TCA)

[Figure: TFBSs predicted for the profiles of a TFBS profile cluster in conserved regions CR1 to CR4 of a gene are merged into TFBS cluster hits; over-representation analysis is based on the merged TFBS cluster hits]
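The merge step can be illustrated with a standard interval merge. This is a sketch only: it assumes that hits from profiles in the same cluster are merged whenever their coordinates overlap, which the slide implies but does not spell out, and the coordinates are invented.

```python
from typing import List, Tuple


def merge_hits(hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping (start, end) intervals; both ends are inclusive."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous merged hit, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical site coordinates predicted by profiles in one cluster.
hits = [(100, 112), (108, 120), (300, 310), (305, 318), (500, 512)]
print(merge_hits(hits))
```

Merging first means that several similar profiles matching the same stretch of sequence count as one cluster hit instead of inflating the hit count.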


Human TCA – TFBS cluster selection


Human TCA - Results


TFCluster Info Page


Seq SSA - Input


Seq SSA - Results


KS score


Seq TCA - Input


SUMMARY


oPOSSUM-3

• Web-based system for motif enrichment analysis in co-expressed gene sets and sequences from high-throughput experiments
• Important functionalities
  – Gene-based vs. sequence-based
  – Single site vs. anchored combination site
  – Individual vs. clusters of TFBS profiles
  – Human, mouse, fly, worm and yeast


Development Team


Version 1
• Ho Sui, SJ
• Mortimer, JR
• Arenillas, DJ
• Brumm, J
• Walsh, CJ
• Kennedy, BP
• Wasserman, WW

CSA
• Huang, S
• Fulton, DL
• Arenillas, DJ
• Perco, P
• Ho Sui, SJ
• Mortimer, JR
• Wasserman, WW

Version 2
• Ho Sui, SJ
• Fulton, DL
• Arenillas, DJ
• Kwon, AT
• Wasserman, WW

Version 3
• Kwon, AT
• Arenillas, DJ
• Worsley Hunt, R
• Wasserman, WW

QUESTIONS & ANSWERS

Please take a moment to type questions/comments into the chat box. The questions will be answered shortly.
