P. J. Munson, National Institutes of Health, Nov. 2001 Page 1 A "Consistency" Test for Determining the Significance of Gene Expression Changes on Replicate Samples and Two Convenient Variance- stabilizing Transformations Peter J. Munson, Ph.D. Mathematical and Statistical Computing Laboratory DCB, CIT, NIH [email protected]

A "Consistency" Test for Determining the Significance of Gene Expression

Changes on Replicate Samples

and

Two Convenient Variance-stabilizing Transformations

Peter J. Munson, Ph.D.

Mathematical and Statistical Computing Laboratory

DCB, CIT, NIH

[email protected]

Introduction

• Math. Stat. Comp. Lab. at NIH

• Run Affy LIMS database
  – Started Dec 2000, stores >700 chips
  – Serves 3 core facilities at NIH

• Study 1
  – 2 treatments, 5 time points, 6 subjects, 60 U95A chips, PBMC cells

• Study 2
  – 3 treatments, 5 time points, 5 subjects, 75 Hu6800 chips, human cells in culture

• Study 3
  – 4 doses, 2 time points, 20 subjects, 20 RG U34A chips, blood cells

Outline

• Development of Consistency Test

• Variance-stabilizing transforms
  – Generalized Logarithm, GLog
  – Adaptive transform for Average Diff, TAD

• Normalization
  – Normal quantile + adaptive transform

• Application

• Probe-pair data visualization:
  – Parallel Axis Coordinate Display

Comparing Two Cell Lines

Data from Carlisle, et al., Mol. Carcinogen., 2000

• Don't subtract background

• Ignore background-level points

• Calibrate on median intensity of each cell type

• Over 3-fold change = outside dashed lines

• Are these expression level changes significant? Real?

Duplicate Experiments and "Consistency" Plot

Identifies Real Changes in Expression

[Figure: consistency plot of duplicate experiments; labeled genes: Vimentin, Keratin 5]

Replication Permits Calculation of Significance (P-values)

4 false positives out of 5760 spots:

P ≈ 4/5760 = 0.0007

Consistency Plot

• Compare duplicate experiments, Log Ratio scale

• Set Cutoffs for Over-, Under-expression

• Calculate number detected, D

• Assume Independence, calculate expected number, E, above both, below both cutoffs

• Estimate false positive rate, E/D

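[Figure: consistency plot of log ratios, L21b**exp45 (vertical axis) vs. L12b**exp44 (horizontal axis), both axes from -1 to 1, divided into nine regions by the over- and under-expression cutoffs]

Observed spot counts in each region of the plot, with counts expected under independence in parentheses:

            Left          Center           Right         Total
Top         0 (0.3)       22 (45.2)        24 (0.6)      46
Middle      11 (26.1)     4074 (4036.6)    28 (50.4)     4113
Bottom      16 (0.6)      74 (88.4)        0 (1.1)       90
Total       27            4170             52            4249

Above both cutoffs (top right): D=24, E=0.6, E/D=3%

Below both cutoffs (bottom left): D=16, E=0.6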
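The E and E/D values on this slide can be reproduced from the marginal totals under the independence assumption. The sketch below reads the extracted figure numbers as follows, which is an interpretation rather than something the slide states outright: 4249 spots in total, 46 and 52 spots above the upper cutoff in the two experiments, and 90 and 27 spots below the lower cutoff. The function name `expected_count` is illustrative.

```python
def expected_count(total_a, total_b, n):
    """Expected number of spots beyond the cutoff in both experiments,
    assuming the two experiments are independent."""
    return total_a * total_b / n

n_spots = 4249                 # total spots (assumed reading of the figure)
above_exp45, above_exp44 = 46, 52   # marginal counts above the upper cutoff
below_exp45, below_exp44 = 90, 27   # marginal counts below the lower cutoff

d_up, d_dn = 24, 16            # detected above both / below both cutoffs

e_up = expected_count(above_exp45, above_exp44, n_spots)
e_dn = expected_count(below_exp45, below_exp44, n_spots)

print(f"E(up) = {e_up:.2f}, E/D = {e_up / d_up:.1%}")
print(f"E(dn) = {e_dn:.2f}, E/D = {e_dn / d_dn:.1%}")
```

Both expected counts round to the 0.6 shown on the slide. The unrounded ratio for the over-expressed group comes out slightly below the slide's 3%, which appears to have been computed from the rounded E.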

[Figure: two consistency plots, L21**exp64 vs. L12**exp63 and L21**exp45 vs. L12**exp44, all axes from -1 to 1]

p53 +/+ cells 6 hrs, replicate reciprocal experiment

Consistency Test on Relative Expression

DEFINE:

$x(g, i)$ = relative expression value for gene $g$ ($= 1, \ldots, n$) in experiment $i$ ($= 1, \ldots, m$)

$F_i(x)$ = empirical cdf of $x(\cdot, i)$ across genes (spots)

$c = \min_j x(g, j)$, across experiments

THEN, assuming that $\{ x(g, i),\ g = 1, \ldots, n \}$ are an independent sample from distribution $F_i$, the probability that $x(g, i)$ is consistently large is:

$p_{up}(g) = \Pr(X_i \ge c \text{ for all } i) = \prod_i \left(1 - F_i(c)\right)$

Consistency Test on Relative Expression - 2

DEFINE:

$x(g, i)$ = relative expression value for gene $g$ ($= 1, \ldots, n$) in experiment $i$ ($= 1, \ldots, m$)

$p_{up}(g) = \prod_i \left(1 - F_i(\min_j x(g, j))\right)$

$p_{dn}(g) = \prod_i F_i(\max_j x(g, j))$

THEN

Expected number of false positives: $E(g) = n \cdot p(g)$
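The definitions above translate fairly directly into code. The following is a minimal sketch, not the author's implementation: the function names are illustrative, and the handling of ties is an assumption. Here the "up" factor is taken as the fraction of genes with a value at or above the cutoff (matching Pr(X_i ≥ c)), and the "down" factor as the fraction at or below it.

```python
from bisect import bisect_left, bisect_right

def consistency_pvalues(x):
    """x[g][i] is the relative expression of gene g in experiment i.
    Returns (p_up, p_dn), one value per gene."""
    n = len(x)
    m = len(x[0])
    # Sorted values of each experiment, used as its empirical distribution.
    columns = [sorted(row[i] for row in x) for i in range(m)]
    p_up, p_dn = [], []
    for row in x:
        c_min, c_max = min(row), max(row)
        up = dn = 1.0
        for col in columns:
            # Fraction of genes with value >= c_min in this experiment.
            up *= (n - bisect_left(col, c_min)) / n
            # Fraction of genes with value <= c_max in this experiment.
            dn *= bisect_right(col, c_max) / n
        p_up.append(up)
        p_dn.append(dn)
    return p_up, p_dn

def expected_false_positives(p, n):
    """E(g) = n * p(g): expected number of genes at least this extreme by chance."""
    return n * p

# Toy data: 4 genes, 2 experiments (made-up numbers for illustration only).
data = [[4.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.5, 1.0]]
p_up, p_dn = consistency_pvalues(data)
e_up = [expected_false_positives(p, len(data)) for p in p_up]
```

In the toy data the last gene is high in one experiment and low in the other, so its cutoffs are the overall extremes and both of its p-values equal 1: it is never called consistent.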

Assumptions of Consistency Test

• Independence between experiments

• “Exchangeability” of genes

• Homogeneity of variance across genes (i.e. across expression intensity)

Does NOT require:

• Identical distribution in separate experiments

But variance homogeneity is violated for Affy Avg. Diff. data

Variance Stabilizing Transformations

• Logarithm

• Box-Cox, power

• Generalized Logarithm, GLog

• Adaptive, TAD

Model Variance as Function of Mean AD

Model Variance as Function of Mean AD

$\mathrm{Var}(y) = a_0$

$\mathrm{Var}(y) = a_0 + a_1 y$

$\mathrm{Var}(y) = a_0 + a_1 y + a_2 y^2$

$\mathrm{Var}(y) = a_2 y^2$ => use logarithms

What about:

$\mathrm{Var}(y) = a_0 + a_2 y^2$

Generalized Log Transform (G-Log)

$\mathrm{Var}(y) = a_0 + a_2 y^2 = a_0 \left(1 + (y/c)^2\right)$, where $c = \sqrt{a_0 / a_2}$

$c$ = (s.d. at $y = 0$) / CV, e.g. $c = 10 / 0.1 = 100$

$\mathrm{GLog}(y; c) = \mathrm{sign}(y) \cdot \ln\left\{ |y/c| + \sqrt{1 + y^2/c^2} \right\}$
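As a quick check of the formula, here is a small sketch of GLog in Python. The function name `glog` is illustrative. The expression is algebraically the inverse hyperbolic sine of y/c, so the sketch compares it against `math.asinh`, and against ln(2y/c) for large y, where GLog behaves like an ordinary logarithm.

```python
import math

def glog(y, c):
    """GLog(y; c) = sign(y) * ln(|y/c| + sqrt(1 + y^2/c^2))."""
    r = y / c
    return math.copysign(1.0, y) * math.log(abs(r) + math.sqrt(1.0 + r * r))

c = 10 / 0.1  # s.d. at y = 0 divided by CV, the slide's example (c = 100)

for y in (-1000.0, -100.0, 0.0, 100.0, 1000.0, 1e6):
    print(y, glog(y, c), math.asinh(y / c))
```

Near zero the transform is approximately linear (y/c), which is what lets it handle the zero and negative Avg. Diff. values that a plain logarithm cannot.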

Quantile Normalization for AD (before)

Quantile Normalization for AD (after)

Normal Quantile Transform after GLog(AD) (it's almost linear)
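The slides do not spell out how the normal quantile transform is computed. One common rank-based version, shown here as an assumption rather than as the method used in the talk, replaces each value by the standard normal quantile of its rank: the r-th smallest of n values maps to Φ⁻¹((r − 0.5)/n). The function name and the toy input are illustrative.

```python
from statistics import NormalDist

def normal_quantile_transform(values):
    """Map each value to the standard normal quantile of its rank.
    Ties are broken by input order (a simplification)."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, k in enumerate(order, start=1):
        out[k] = std_normal.inv_cdf((rank - 0.5) / n)
    return out

# Toy Avg. Diff. values for one chip (made-up numbers).
ad = [10.0, 1000.0, 50.0]
nq = normal_quantile_transform(ad)
print(nq)
```

Because only ranks are used, every chip ends up with the same distribution of transformed values, which is the sense in which this step normalizes across chips.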

Adaptive Transform of AD (TAD) - 1

Model variance (over many replicates) vs. mean AD

Plot: Log(SD), or the Wilson-Hilferty transform SD^(2/3), vs. mean of NQ(AD)

Fit smooth function, g, which predicts SD

Adaptive Transform of AD (TAD) - 2

$T(X) = \int_{-\infty}^{X} \frac{1}{g(u)} \, du$
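The integral of 1/g can be approximated numerically once a smooth SD function g has been fitted. The sketch below makes two assumptions that are not in the slides: it integrates from a finite reference point `x0` instead of minus infinity (this only shifts T by a constant, so differences in T are unaffected), and it uses the quadratic variance model from the G-Log slide as a stand-in for a fitted g. With that g, T reduces to a scaled GLog, which gives a convenient check.

```python
import math

def adaptive_transform(x, g, x0=0.0, steps=10_000):
    """Trapezoid-rule approximation of T(x) = integral from x0 to x of du / g(u)."""
    if x == x0:
        return 0.0
    h = (x - x0) / steps
    total = 0.5 * (1.0 / g(x0) + 1.0 / g(x))
    for k in range(1, steps):
        total += 1.0 / g(x0 + k * h)
    return total * h

# Stand-in SD model: SD(u) = sqrt(a0 + a2 * u^2), with s.d. 10 at u = 0 and CV 0.1.
a0, a2 = 10.0 ** 2, 0.1 ** 2

def sd_model(u):
    return math.sqrt(a0 + a2 * u * u)

c = math.sqrt(a0 / a2)                      # 100, as in the G-Log example
t_numeric = adaptive_transform(1000.0, sd_model)
t_closed = math.asinh(1000.0 / c) / math.sqrt(a2)
print(t_numeric, t_closed)
```

Dividing by the predicted SD is what makes the transformed scale have roughly unit variance everywhere, so a Delta TAD of 1 corresponds to about one standard deviation regardless of intensity.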

Adaptive Transform of AD (TAD)

Consistency Test p-values

[Figure: histograms of consistency test p-values (horizontal axis 0 to 1, vertical axis count) for Time 2 vs. Time 0 and Time 1 vs. Time 0, in the Treatment and Sham groups]

Results of Study 1 (5 time points, 2 treatments, 6 subjects)

Table 1. Number of genes detected by consistency test with expected false positives set to 1.0

Group       Any time   1-0   2-0   3-0   4-0
Treated     385        13    340   22    19
Controls    83         21    23    26    24
Both        2          0     1     2     1

Table 3. Number of genes detected by Maximum TAD greater than 1

Group       Any time   1-0   2-0   3-0   4-0
Treated     275        5     264   4     5
Controls    6          1     2     4     4
Both        1          0     0     0     1

Probe Pair Data, Delta TAD = 2

Parallel Axis Coordinate Display

Probe Pair Data, Delta TAD = 0.5

Probe Pair Data, Delta TAD = -1.5

Probe Pair Data, Delta TAD = -0.5

Acknowledgements

Lynn Young, MSCL
Vinay Prabhu, MSCL
Jennifer Barb, MSCL
Howard Shindel, MSCL
Andrew Schwartz, CIT
Steve Bailey, CIT

Robert Danner, CC
Anthony Suffredini, CC
Peter Eichacker, CC
James Shelhamer, CC
Eric Gerstenberger, CC

Sayed Daoud, NCI
Yves Pommier, NCI
John Weinstein, NCI

David Krizman, NCI
Alex Carlisle, NCI

David Rocke, UC Davis