Multiple Comparisons
Measures of LD
Jess Paulus, ScD, January 29, 2013




Today’s topics

1. Multiple comparisons
2. Measures of linkage disequilibrium

• D′ and r²
• r² and power


Multiple testing & significance thresholds

• Concern about multiple testing: standard thresholds (p < 0.05) will lead to a large number of “significant” results, the vast majority of which are false positives
• Various statistical approaches exist for handling this


Possible Errors in Statistical Inference

                                   Unobserved Truth in the Population
Observed in the sample             Ha: SNP prevents DM                  H0: No association
Reject H0 (SNP prevents DM)        True positive (1 − β)                False positive, Type I error (α)
Fail to reject H0 (no assoc.)      False negative, Type II error (β)    True negative (1 − α)


Probability of Errors

α, also known as the “level of significance”: the probability of a Type I error, i.e., rejecting the null hypothesis when it is in fact true (a false positive); typically 5%

p value: the probability of obtaining a result as extreme as, or more extreme than, the one found in your study by chance alone (that is, if the null hypothesis were true)
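To make the p-value definition concrete, here is a minimal Python sketch built on a hypothetical example that is not from the slides: a coin assumed fair under the null hypothesis is flipped 10 times and lands heads 8 times. The one-sided p-value is the probability, under the null alone, of 8 or more heads. The function name `binomial_upper_tail_p` is illustrative.

```python
from math import comb


def binomial_upper_tail_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p_null).

    This is the probability of a result as extreme as, or more extreme than,
    the observed one if the null hypothesis (success probability p_null) is true.
    """
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical data: 8 heads in 10 flips of a coin assumed fair under H0.
    p = binomial_upper_tail_p(8, 10)
    print(f"one-sided p = {p:.4f}")  # prints 0.0547 (= 56/1024)
```

Here p ≈ 0.0547, so at α = 0.05 this result would not be declared significant.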


Type I Error (α) in Genetic and Molecular Research

If no SNP is truly associated, a genome-wide association scan of 500,000 SNPs is expected to yield (500,000 × α):

• about 25,000 false positives by chance alone using α = 0.05

• about 5,000 false positives by chance alone using α = 0.01

• about 500 false positives by chance alone using α = 0.001
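The figures above are m × α, the expected number of Type I errors when all m null hypotheses are true; by linearity of expectation this holds even when SNPs are correlated. A minimal Python sketch (the function name is illustrative):

```python
def expected_false_positives(m: int, alpha: float) -> float:
    """Expected number of Type I errors among m tests when every null hypothesis is true.

    Each test rejects a true null with probability alpha, so by linearity of
    expectation the expected total is m * alpha (correlation between tests
    changes the spread of this count, not its mean).
    """
    return m * alpha


if __name__ == "__main__":
    m_snps = 500_000  # SNPs in the genome-wide scan, as on the slide
    for alpha in (0.05, 0.01, 0.001):
        expected = expected_false_positives(m_snps, alpha)
        print(f"alpha = {alpha}: about {expected:,.0f} false positives")
```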


Multiple Comparisons Problem

• The multiple comparisons (or "multiple testing") problem occurs when one considers a set, or family, of statistical inferences simultaneously
• Type I errors are then more likely to occur
• Several statistical techniques have been developed to adjust for multiple comparisons, for example the Bonferroni adjustment
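Why Type I errors become more likely can be shown with the family-wise error rate, the probability of at least one false positive in the whole family of tests. The sketch below assumes, for illustration, that the m tests are independent and every null hypothesis is true, which gives 1 − (1 − α)^m; with correlated SNPs the exact value differs. The function name is illustrative.

```python
def family_wise_error_rate(m: int, alpha: float) -> float:
    """P(at least one Type I error) among m independent tests of true nulls.

    Each test avoids a false positive with probability (1 - alpha); under
    independence all m avoid one with probability (1 - alpha) ** m.
    """
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 10, 100, 500_000):
        print(f"m = {m:>7}: P(>= 1 false positive) = {family_wise_error_rate(m, 0.05):.4f}")
    # Testing each SNP at alpha / m instead keeps the family-wise rate at or below alpha.
    m = 500_000
    print(f"Bonferroni-adjusted: {family_wise_error_rate(m, 0.05 / m):.4f}")
```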


Adjusting alpha

• Standard Bonferroni correction: test each SNP at the α* = α/m₁ level, where m₁ = number of markers tested
• Assuming m₁ = 500,000, the Bonferroni-corrected threshold is α* = 0.05/500,000 = 1 × 10⁻⁷
• The correction is conservative when the tests are correlated
• Permutation or simulation procedures may increase power by accounting for the correlation between tests
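A minimal Python sketch of both ideas, under stated assumptions. `bonferroni_significant` applies the α/m₁ rule. `max_stat_permutation_threshold` shows one common permutation approach (a single-step max-statistic threshold) with a deliberately simple, hypothetical test statistic: the absolute difference in mean allele count (0/1/2) between cases and controls. Because each permutation relabels cases and controls once for all SNPs, correlated SNPs keep their correlation, so adding a perfectly correlated copy of a SNP leaves the threshold unchanged, whereas Bonferroni would halve α. All names and the statistic are illustrative, not a specific package's API.

```python
import random
from typing import List, Sequence


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of tests with p <= alpha / m1, where m1 = number of tests."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p <= threshold]


def _mean_difference(snp: Sequence[int], labels: Sequence[int]) -> float:
    """Hypothetical statistic: |mean allele count in cases - mean in controls|."""
    cases = [g for g, y in zip(snp, labels) if y == 1]
    controls = [g for g, y in zip(snp, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_stat_permutation_threshold(
    genotypes: Sequence[Sequence[int]],
    labels: Sequence[int],
    n_perm: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Approximate (1 - alpha) quantile of the largest per-SNP statistic under
    random relabeling of cases (1) and controls (0).

    A SNP whose observed statistic exceeds this value is declared significant,
    approximately controlling the family-wise error rate at alpha.
    """
    rng = random.Random(seed)
    permuted = list(labels)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(permuted)  # one relabeling shared by all SNPs preserves their correlation
        maxima.append(max(_mean_difference(snp, permuted) for snp in genotypes))
    maxima.sort()
    return maxima[min(int((1 - alpha) * n_perm), n_perm - 1)]


if __name__ == "__main__":
    print(bonferroni_significant([1e-8, 0.01, 0.03, 0.2]))  # threshold 0.05 / 4 = 0.0125

    labels = [1] * 10 + [0] * 10
    snp = [2, 1, 2, 1, 1, 2, 0, 1, 2, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # hypothetical genotypes
    one = max_stat_permutation_threshold([snp], labels)
    two = max_stat_permutation_threshold([snp, list(snp)], labels)  # perfectly correlated copy
    print(one, two)  # identical thresholds
```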


Measures of LD



Haplotype definition

• Haplotype: an ordered sequence of alleles at a subset of loci along a chromosome
• Moving from examining single genetic markers to sets of markers


Measures of linkage disequilibrium

Basic data: table of haplotype frequencies

Observed haplotypes on 16 chromosomes (A/a at one locus, G/g at the other):
AG, ag, AG, ag, Ag, AG, ag, AG, AG, ag, AG, Ag, ag, AG, ag, AG

        A        a
G       8        0        50%
g       2        6        50%
        62.5%    37.5%
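The table can be built mechanically from the list of haplotypes. A minimal Python sketch, writing each haplotype as a two-character string with the A-locus allele first and the G-locus allele second (an encoding chosen for illustration):

```python
from collections import Counter

# The 16 haplotypes from the slide: first character = A/a locus, second = G/g locus.
haplotypes = ["AG", "ag", "AG", "ag", "Ag", "AG", "ag", "AG",
              "AG", "ag", "AG", "Ag", "ag", "AG", "ag", "AG"]

counts = Counter(haplotypes)  # haplotypes never observed (here "aG") count as 0
n = len(haplotypes)

p_A = (counts["AG"] + counts["Ag"]) / n  # frequency of allele A
p_G = (counts["AG"] + counts["aG"]) / n  # frequency of allele G

if __name__ == "__main__":
    print(f"{'':4}{'A':>8}{'a':>8}")
    for g_allele in "Gg":
        row = [counts["A" + g_allele], counts["a" + g_allele]]
        print(f"{g_allele:4}{row[0]:>8}{row[1]:>8}{sum(row) / n:>8.1%}")
    print(f"{'':4}{p_A:>8.1%}{1 - p_A:>8.1%}")
```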


D′ and r² are the most common measures

• Both measure the association (correlation) between alleles at two loci
• D prime: ranges from 0 [no LD] to 1 [complete LD]
• R squared: also ranges from 0 to 1; it is the square of the correlation between alleles on the same chromosome
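One way to see that r² is the squared correlation between alleles on the same chromosome: code each allele as an indicator (1 for A or G, 0 for a or g; an illustrative choice) and compute the ordinary Pearson correlation across chromosomes. A minimal Python sketch using the 16 haplotypes listed earlier in this deck:

```python
from typing import Sequence


def pearson_r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov**2 / (var_x * var_y)


haplotypes = ["AG", "ag", "AG", "ag", "Ag", "AG", "ag", "AG",
              "AG", "ag", "AG", "Ag", "ag", "AG", "ag", "AG"]
a_indicator = [1 if h[0] == "A" else 0 for h in haplotypes]  # allele at the A/a locus
g_indicator = [1 if h[1] == "G" else 0 for h in haplotypes]  # allele at the G/g locus

if __name__ == "__main__":
    print(round(pearson_r_squared(a_indicator, g_indicator), 4))  # 0.6
```

The result matches the r² = 0.6 obtained from the count formula for this sample.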


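D

• The deviation of the observed frequency of a haplotype from its expected frequency (the product of its allele frequencies, as under independence) is a quantity called the linkage disequilibrium (D)
• If two alleles are in LD, D ≠ 0; complete dependency between loci corresponds to |D′| = 1 (D itself can never equal 1, since |D| is at most 0.25)
• Linkage equilibrium means D = 0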
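The definition D = p(AG) − p(A)·p(G) can be sketched in a few lines of Python. The dictionary keyed by haplotype strings is an illustrative assumption; the first example uses the frequencies of the 16-haplotype sample.

```python
from typing import Dict


def d_coefficient(freqs: Dict[str, float]) -> float:
    """D = p(AG) - p(A) * p(G) for haplotype frequencies keyed 'AG', 'Ag', 'aG', 'ag'."""
    p_AG = freqs["AG"]
    p_A = freqs["AG"] + freqs["Ag"]  # allele A appears on AG and Ag haplotypes
    p_G = freqs["AG"] + freqs["aG"]  # allele G appears on AG and aG haplotypes
    return p_AG - p_A * p_G


if __name__ == "__main__":
    example = {"AG": 8 / 16, "Ag": 2 / 16, "aG": 0 / 16, "ag": 6 / 16}  # the 16-haplotype sample
    print(d_coefficient(example))  # 0.1875: observed 0.5 vs expected 0.625 * 0.5
    independent = {"AG": 0.25, "Ag": 0.25, "aG": 0.25, "ag": 0.25}
    print(d_coefficient(independent))  # 0.0: linkage equilibrium
    only_two = {"AG": 0.5, "Ag": 0.0, "aG": 0.0, "ag": 0.5}
    print(d_coefficient(only_two))  # 0.25: the largest possible |D|
```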


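2 × 2 table of haplotype counts (first index: G = 1, g = 0; second index: A = 1, a = 0):

        A        a
G       n11      n10      n1·
g       n01      n00      n0·
        n·1      n·0

Measure: formula (reference)
• D′ (as written, for D > 0): $D' = \dfrac{n_{11}n_{00} - n_{10}n_{01}}{\min(n_{1\cdot}n_{\cdot 0},\ n_{0\cdot}n_{\cdot 1})}$ (Lewontin 1964)
• Δ² = r²: $r^2 = \dfrac{(n_{11}n_{00} - n_{10}n_{01})^2}{n_{1\cdot}n_{0\cdot}n_{\cdot 1}n_{\cdot 0}}$ (Hill and Weir 1994)
• δ*: $\delta^* = (n_{11}n_{00} - n_{10}n_{01}) / (\ldots)$ (Levin 1953)
• Odds ratio: $\mathrm{OR} = \dfrac{n_{11}n_{00}}{n_{10}n_{01}}$ (Edwards 1963)
• Q: $Q = \dfrac{n_{11}n_{00} - n_{10}n_{01}}{n_{11}n_{00} + n_{10}n_{01}}$ (Yule 1900)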
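The count formulas translate directly into code. A minimal Python sketch; the function and key names are illustrative. Two assumptions go beyond the table: D′ also handles D < 0 using Lewontin's other bound, min(n1·n·1, n0·n·0), and the odds ratio is reported as infinity when n10·n01 = 0. δ* is left out of this sketch.

```python
import math
from typing import Dict


def ld_measures(n11: int, n10: int, n01: int, n00: int) -> Dict[str, float]:
    """Pairwise LD measures from a 2 x 2 table of haplotype counts.

    Indexing follows the table: first index G (1) / g (0), second index A (1) / a (0).
    """
    n1_, n0_ = n11 + n10, n01 + n00  # row totals n1., n0. (G, g)
    n_1, n_0 = n11 + n01, n10 + n00  # column totals n.1, n.0 (A, a)
    if 0 in (n1_, n0_, n_1, n_0):
        raise ValueError("both loci must be polymorphic in the sample")

    cross = n11 * n00 - n10 * n01  # equals n**2 * D
    if cross >= 0:
        d_max = min(n1_ * n_0, n0_ * n_1)  # bound used in the table (D > 0)
    else:
        d_max = min(n1_ * n_1, n0_ * n_0)  # assumed extension for D < 0

    concordant, discordant = n11 * n00, n10 * n01
    return {
        "D_prime": cross / d_max,
        "r2": cross**2 / (n1_ * n0_ * n_1 * n_0),
        "Q": cross / (concordant + discordant),
        "OR": concordant / discordant if discordant else math.inf,
    }


if __name__ == "__main__":
    # Counts from the 16-haplotype example: AG = 8, aG = 0, Ag = 2, ag = 6.
    print(ld_measures(n11=8, n10=0, n01=2, n00=6))
```

For the example counts this gives D′ = 1, r² = 0.6, Q = 1 and an infinite odds ratio (no aG haplotypes), matching the hand calculation for this sample.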


Same 16 chromosomes: AG = 8, aG = 0, Ag = 2, ag = 6 (G: 50%, g: 50%; A: 62.5%, a: 37.5%), so n11 = 8, n10 = 0, n01 = 2, n00 = 6, with n1· = 8, n0· = 8, n·1 = 10, n·0 = 6

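$D' = \dfrac{n_{11}n_{00} - n_{10}n_{01}}{\min(n_{1\cdot}n_{\cdot 0},\ n_{0\cdot}n_{\cdot 1})} = \dfrac{8 \times 6 - 0 \times 2}{\min(8 \times 6,\ 8 \times 10)} = \dfrac{48}{48} = 1$

$r^2 = \dfrac{(n_{11}n_{00} - n_{10}n_{01})^2}{n_{1\cdot}n_{0\cdot}n_{\cdot 1}n_{\cdot 0}} = \dfrac{(8 \times 6 - 0 \times 2)^2}{8 \times 8 \times 10 \times 6} = \dfrac{2304}{3840} = 0.6$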


r² and power

• r² is directly related to study power
• A low r² means that a large sample size is required to detect the LD between the markers
• r²·N is the “effective sample size”: if a marker M and a causal gene G are in LD, then a study with N cases and controls that measures M (but not G) will have approximately the same power to detect an association as a study with r²·N cases and controls that directly measured G


r² and power

Example:
• N = 1000 (500 cases and 500 controls), r² = 0.4
• Had you genotyped the causal gene directly, you would need a total of only r² × N = 0.4 × 1000 = 400 (200 cases and 200 controls) for the same power
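The r²·N rule of thumb can be written as a pair of one-line helpers, assuming the effective-sample-size approximation stated above holds; the function names are illustrative. The second helper runs the relationship backwards: to match a direct study of size N, a marker study needs roughly N / r².

```python
def effective_sample_size(n_marker_study: float, r2: float) -> float:
    """Size of a direct (causal-variant) study with about the same power: r2 * N."""
    return r2 * n_marker_study


def marker_study_size(n_direct_study: float, r2: float) -> float:
    """Total N needed when genotyping a marker in LD (r2) to match a direct study of size N."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct_study / r2


if __name__ == "__main__":
    n, r2 = 1000, 0.4  # 500 cases + 500 controls, as in the example
    print(effective_sample_size(n, r2))  # about 400 (200 cases + 200 controls)
    print(marker_study_size(400, r2))    # about 1000
```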
