an evaluation of statistical methods for dna methylation ... · an evaluation of statistical...

28
An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational Science Institute School of Medicine and Dentistry University of Rochester Email: Dongmei Li @urmc .rochester .edu Li, D; Xie, Z.; Le Pape, M.; Dye, T. ”An evaluation of statistical methods for DNA methylation microarray data analysis”. BMC Bioinformatics. 2015; 16(217). November 16, 2015 Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 1 / 28

Upload: others

Post on 18-Jun-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

An Evaluation of Statistical Methods for DNAMethylation Microarray Data Analysis

Dongmei Li, PhD

Clinical and Translational Science InstituteSchool of Medicine and Dentistry

University of RochesterEmail: Dongmei [email protected] .edu

Li, D; Xie, Z.; Le Pape, M.; Dye, T. ”An evaluation of statistical methods for DNAmethylation microarray data analysis”. BMC Bioinformatics. 2015; 16(217).

November 16, 2015

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 1 / 28

Page 2: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Outline

1 Introduction

2 Methylation array analysis methods

3 Simulation Studies

4 Real data example

5 Discussion

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 2 / 28

Page 3: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Introduction

DNA methylation

Figure: DNAmolecule that is methylated at the two center cytosines. Source:http : //en.wikipedia.org/wiki/DNAmethylation

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 3 / 28

Page 4: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Introduction

DNA methylation and disease

Figure: Diagram for DNA methylation and cancer. Source: 2005NaturePublishing GroupRobertson, K. DNA methylation and human disease.NatureReviews Genetics6,598.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 4 / 28

Page 5: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Introduction

Epigenome-wide association studies (EWAS)

Illumina Methylation Assay

Three platforms for DNA methylation assay

GoldenGate (1, 563 methylation site per sample)

Infinium Human Methylation27 (> 27, 000 methylation sites persample)

Infinium HumanMethylation450 BeadChip (> 485, 000 methylationsites per sample)

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 5 / 28

Page 6: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Introduction

Work flow of Illumina Assay

Figure: Source: http : //en.wikipedia.org/wiki/IlluminaMethylationAssay

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 6 / 28

Page 7: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Methylation array downstream analysis

Locus-by-Locus analyses are commonly used for EWAS

Average β value denote the level or percentage of methylation for alocus

M value, or log ratio of percentage of methylation, is also commonlyused to measure methylation

Relationship between the β-value and the M-value

M = log2β

1− β

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 7 / 28

Page 8: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Methods implemented in Bioconductor/R

Wilcoxon rank sum test (methyAnalysis)

t-test (methyAnalysis, CpGAssoc, RnBeads, and IMA package)

Kolmogorov-Smirnov Tests (Used in some papers: Price et al.Epigenetics & Chromatin 2013, 6:4)

Permutation test (CpGAssoc package)

Empirical Bayes method (RnBeads, IMA and minfi package)

Bump hunting method (minfi package)

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 8 / 28

Page 9: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Motivation for evaluation methylation data analysismethods

Finding the most appropriate one to use for a specific data set ischallenging

Different methods have different assumptions to get validate results

Multiple methods could provide inconsistent results for the same dataset

Exploring power and stability differences across different methods forthe same data set

Proving advice for investigators choosing appropriate method for theirmethylation data

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 9 / 28

Page 10: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Definition of power and stability

number not rejected number rejected

true null hypotheses U V m0

non-true null hypotheses T S m1

total m − R R m

Table: Possible outcomes from m hypotheses tests

Power = E (S

m1|m > m0),

Stability = Var(R) = Var(S + V ) = Var(S) + Var(V ) + 2Cov(S ,V ).

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 10 / 28

Page 11: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Wilcoxon rank sum test

H0 : Median1 = Median2

z =R − µRσR

where

µR =n1(n1 + n2 + 1)

2

σR =

√n1n2(n1 + n2 + 1)

12R = sum of ranks for smaller sample size (n1)

n1 = smalle of sample sizes

n2 = larger of sample sizes

n1 ≥ 10 and n2 ≥ 10

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 11 / 28

Page 12: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

t-test

H0 : µ1 = µ2

t =y1 − y2

sy1−y2

where

sy1−y2 =

√s2

1

n1+

s22

n2

df =(s2i1n1

+s2i2n2

)2

(s2i1n1

)2

n1−1 +(s2i2n2

)2

n2−1

.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 12 / 28

Page 13: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Kolmogorov-Smirnov Tests

H0 : F1,n1(y) = F2,n2(y)

Dn1,n2 = supy|F1,n1(y)− F2,n2(y)|

The null hypothesis is rejected at level α if

Dn1,n2 > c(α)

√n1 + n2

n1n2

where c(α) = 1.36 for α = 0.05

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 13 / 28

Page 14: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Permutation test

H0 : F1,n1(y) = F2,n2(y)

Compute the test statistic for the observed data set

Permute the original data in a way that matches the null hypothesis

Calculate the critical value of a level α test based on the upper αpercentile of the reference distribution

Obtain the raw p-value from the reference distribution.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 14 / 28

Page 15: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Empirical Bayes method

H0 : β∗gj = 0

The moderated t-statistic, based on a hybrid classical/Bayes approach, isdefined by:

tgj =β∗gj

sg√vgj

The posterior mean of σ2g given s2

g is

s2g = E (σ2

g |s2g ) =

d0s20 + dg s

2g

d0 + dg

The prior estimator s20 and d0 degrees of freedom is estimated from the

data by equating empirical to expected values for the first two moments oflogs2

g

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 15 / 28

Page 16: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Methylation array analysis methods

Bump hunting method

H0 : β∗(tj) = 0

Fit a linear model between methylation and disease type, covariates, andpotential confounding variables

Yij = µ(tj) + β∗(tj)Xi +

p∑k=1

γk(tj)Zi ,k +

q∑l=1

al ,jWi ,l + εi ,j

where i is ith subject and j is j−th genomic locus

1 Estimate β(tj) for each tj2 Use these to estimate the smooth function β(t)

3 Use this to estimate the regions Rn, n = 1, . . . ,N for which β(t) 6= 0for all t ∈ Rn

4 Use permutation tests to assign statistical uncertainty to eachestimated region

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 16 / 28

Page 17: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Simulation Studies

Simulation Set up

Methylation data are generated from mixed beta distributions tomimic real methylation data

Proportion of methylated loci are 0.01, 0.05, 0.10, 0.25, 0.50, 0.75,0.90 to cover all possible scenarios

Mean β value differences were set to be between 0.1 and 0.4 withsteps equaling 1/(m −m0) such as 1/10, 1/50, 1/100, 1/250, 1/500,1/750, 1/900

1000 loci and 1000 independent simulations

Sample sizes are 3, 6, 12, and 24 in each group with two-groupcomparisons

Both β values and M values are compared

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 17 / 28

Page 18: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Simulation Studies

Methylation Results for sample size 3 in each group (Left:β values and Right: M values)

0.0 0.2 0.4 0.6 0.8

0.00

00.

010

0.02

0

π1

FD

R

0.0 0.2 0.4 0.6 0.8

0.0

0.4

0.8

π1

Pow

er

0.0 0.2 0.4 0.6 0.8

020

060

0

π1

Mea

n of

tota

l dis

cove

ries

0.0 0.2 0.4 0.6 0.8

010

2030

40

π1

SD

of t

otal

dis

cove

ries

0.0 0.2 0.4 0.6 0.8

0.00

00.

015

0.03

0

π1

FD

R

0.0 0.2 0.4 0.6 0.8

0.0

0.4

0.8

π1

Pow

er

0.0 0.2 0.4 0.6 0.8

020

060

0

π1

Mea

n of

tota

l dis

cove

ries

0.0 0.2 0.4 0.6 0.8

010

2030

40

π1

SD

of t

otal

dis

cove

ries

Figure: Blue: rank test; Red: t-test; Green: KS test; Black: permutation test; Orange:Empirical Bayes; Yellow: Bump Hunting BH; Purple: Bump Hunting q-value.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 18 / 28

Page 19: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Simulation Studies

Methylation Results for sample size 6 in each group (Left:β values and Right: M values)

0.0 0.2 0.4 0.6 0.8

0.00

0.02

0.04

π1

FD

R

0.0 0.2 0.4 0.6 0.8

0.0

0.4

0.8

π1

Pow

er

0.0 0.2 0.4 0.6 0.8

020

060

0

π1

Mea

n of

tota

l dis

cove

ries

0.0 0.2 0.4 0.6 0.8

05

1020

π1

SD

of t

otal

dis

cove

ries

0.0 0.2 0.4 0.6 0.8

0.00

0.02

0.04

π1

FD

R

0.0 0.2 0.4 0.6 0.8

0.0

0.4

0.8

π1

Pow

er

0.0 0.2 0.4 0.6 0.8

020

060

0

π1

Mea

n of

tota

l dis

cove

ries

0.0 0.2 0.4 0.6 0.8

05

1015

20

π1

SD

of t

otal

dis

cove

ries

Figure: Blue: rank test; Red: t-test; Green: KS test; Black: permutation test; Orange:Empirical Bayes; Yellow: Bump Hunting BH; Purple: Bump Hunting q-value.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 19 / 28

Page 20: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Simulation Studies

Methylation Results for sample size 12 in each group (Left:β values and Right: M values)

0.0 0.2 0.4 0.6 0.8

0.00

0.02

0.04

π1

FD

R

0.0 0.2 0.4 0.6 0.8

0.0

0.4

0.8

π1

Pow

er

0.0 0.2 0.4 0.6 0.8

020

060

0

π1

Mea

n of

tota

l dis

cove

ries

0.0 0.2 0.4 0.6 0.8

05

1015

20

π1

SD

of t

otal

dis

cove

ries

0.0 0.2 0.4 0.6 0.8

0.00

0.02

0.04

π1

FD

R

0.0 0.2 0.4 0.6 0.8

0.0

0.4

0.8

π1

Pow

er

0.0 0.2 0.4 0.6 0.8

020

060

0

π1

Mea

n of

tota

l dis

cove

ries

0.0 0.2 0.4 0.6 0.8

05

1015

π1

SD

of t

otal

dis

cove

ries

Figure: Blue: rank test; Red: t-test; Green: KS test; Black: permutation test; Orange:Empirical Bayes; Yellow: Bump Hunting BH; Purple: Bump Hunting q-value.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 20 / 28

Page 21: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Simulation Studies

Methylation Results for sample size 24 in each group (Left:β values and Right: M values)

0.0 0.2 0.4 0.6 0.8

0.00

0.02

0.04

π1

FD

R

0.0 0.2 0.4 0.6 0.8

0.0

0.4

0.8

π1

Pow

er

0.0 0.2 0.4 0.6 0.8

020

060

0

π1

Mea

n of

tota

l dis

cove

ries

0.0 0.2 0.4 0.6 0.8

05

1015

20

π1

SD

of t

otal

dis

cove

ries

0.0 0.2 0.4 0.6 0.8

0.00

0.02

0.04

π1

FD

R

0.0 0.2 0.4 0.6 0.8

0.0

0.4

0.8

π1

Pow

er

0.0 0.2 0.4 0.6 0.8

020

060

0

π1

Mea

n of

tota

l dis

cove

ries

0.0 0.2 0.4 0.6 0.8

05

1015

π1

SD

of t

otal

dis

cove

ries

Figure: Blue: rank test; Red: t-test; Green: KS test; Black: permutation test; Orange:Empirical Bayes; Yellow: Bump Hunting BH; Purple: Bump Hunting q-value.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 21 / 28

Page 22: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Real data example

Real data example

Genome wide DNA methylation profiling of United Kingdom OvarianCancer Population Study (UKOPS) with GEO accession numberGSE19711

Illumina Infinium 27k Human DNA methylation Beadchip v1.2 with27578 CpGs in whole blood sample from 3, 6, or 12 cases and 3, 6, or12 controls

Total number of rejections at 10 significance levels were recordedusing raw p-values

Both β values and M values are compared

Raws p-values are used for comparisons and no rejections for BumpHunting method using Storey’s q-value adjustment

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 22 / 28

Page 23: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Real data example

Apparent test power comparisons for n = 3 in each group

0.02 0.04 0.06 0.08 0.10

050

010

0015

0020

0025

00

FDR level

Tota

l dis

cove

ries

usin

g be

ta v

alue

RanktKSPermutationE−BayesBH−BHBH−q

0.02 0.04 0.06 0.08 0.10

050

010

0015

0020

0025

00

FDR level

Tota

l dis

cove

ries

usin

g M

val

ue

RanktKSPermutationE−BayesBH−BHBH−q

Figure: Blue: rank test; Red: t-test; Green: KS test; Black: permutation test; Orange:Empirical Bayes; Yellow: Bump Hunting BH.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 23 / 28

Page 24: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Real data example

Apparent test power comparisons for n = 6 in each group

0.02 0.04 0.06 0.08 0.10

010

0020

0030

0040

00

FDR level

Tota

l dis

cove

ries

usin

g be

ta v

alue

RanktKSPermutationE−BayesBH−BHBH−q

0.02 0.04 0.06 0.08 0.10

010

0020

0030

0040

00

FDR level

Tota

l dis

cove

ries

usin

g M

val

ue

RanktKSPermutationE−BayesBH−BHBH−q

Figure: Blue: rank test; Red: t-test; Green: KS test; Black: permutation test; Orange:Empirical Bayes; Yellow: Bump Hunting BH.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 24 / 28

Page 25: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Real data example

Apparent test power comparisons for n = 12 in each group

0.02 0.04 0.06 0.08 0.10

1000

2000

3000

4000

5000

6000

7000

FDR level

Tota

l dis

cove

ries

usin

g be

ta v

alue

RanktKSPermutationE−BayesBH−BHBH−q

0.02 0.04 0.06 0.08 0.10

1000

2000

3000

4000

5000

6000

7000

FDR level

Tota

l dis

cove

ries

usin

g M

val

ue

RanktKSPermutationE−BayesBH−BHBH−q

Figure: Blue: rank test; Red: t-test; Green: KS test; Black: permutation test; Orange:Empirical Bayes; Yellow: Bump Hunting BH.

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 25 / 28

Page 26: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Discussion

Discussion

No significant differences were detected in terms of FDR control,power, and stability between β values and M values

For small sample size, both empirical Bayes method and bumphunting method showed good FDR control and much larger powerthan all other methods compared

For medium to large sample size, all methods compared have goodFDR control except the bump hunting method with large proportionof differentially methylated loci

For medium to large sample size, all methods compared have almostequivalent power except permutation test with very low proportion ofdifferentially methylated loci

For all sample sizes, bump hunting method has lowest stability interms of variance of total discoveries

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 26 / 28

Page 27: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Discussion

Conclusion

Either β values or M values are good to use for methylation dataanalysis

Empirical Bayes method is recommended for methylation studies withsmall sample size

For medium to large sample size, all methods except the bumphunting method are good for differentially methylation data analysis

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 27 / 28

Page 28: An Evaluation of Statistical Methods for DNA Methylation ... · An Evaluation of Statistical Methods for DNA Methylation Microarray Data Analysis Dongmei Li, PhD Clinical and Translational

Discussion

Acknowledgement

Clinical and Translational Science Institute

Dongmei Li, PhD (URMC) Biostatistics-2015 conference November 16, 2015 28 / 28