Multiple Comparisons / Measures of LD
Jess Paulus, ScD, January 29, 2013
Today’s topics
1. Multiple comparisons
2. Measures of linkage disequilibrium
• D’ and r²
• r² and power
Multiple testing & significance thresholds
Concern about multiple testing
• Standard thresholds (p < 0.05) will lead to a large number of “significant” results, the vast majority of which are false positives
• Various statistical approaches exist for handling this
Possible Errors in Statistical Inference
Observed in the sample (rows) vs. unobserved truth in the population (columns):
 | Truth: Ha: SNP prevents DM | Truth: H0: No association
Reject H0 (SNP prevents DM) | True positive (1 – β) | False positive: Type I error (α)
Fail to reject H0 (no assoc.) | False negative: Type II error (β) | True negative (1 – α)
Probability of Errors
• α, also known as the “level of significance”: the probability of a Type I error, i.e., rejecting the null hypothesis when it is in fact true (a false positive); typically 5%
• p value: the probability, assuming the null hypothesis is true (i.e., by chance alone), of obtaining a result as extreme as, or more extreme than, the one found in your study
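The meaning of α can be checked by simulation. Below is a minimal sketch. It assumes a test whose statistic is standard normal under H0, a hypothetical stand-in for a single-SNP association test, and the helper names are illustrative. It draws statistics with H0 true and counts how often p < α. That rejection rate, which is the Type I error rate, should come out close to α.

```python
import math
import random

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a statistic that is N(0, 1) under H0: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def null_rejection_rate(alpha: float = 0.05, n_tests: int = 100_000, seed: int = 1) -> float:
    """Fraction of tests with p < alpha when H0 is true for every test (hypothetical helper)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        z = rng.gauss(0.0, 1.0)  # H0 true: the statistic is pure noise
        if two_sided_p(z) < alpha:
            rejections += 1  # every rejection here is a Type I error
    return rejections / n_tests

print(null_rejection_rate(0.05))  # close to 0.05
```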
Type I Error (α) in Genetic and Molecular Research
If none of the 500,000 SNPs in a genome-wide association scan is truly associated, the scan is expected to yield:
25,000 false positives by chance alone using α = 0.05
5,000 false positives by chance alone using α = 0.01
500 false positives by chance alone using α = 0.001
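The counts above are the expected number of false positives, m × α. The sketch below computes them and adds one simulated scan as a cross-check. It assumes every SNP is null, so each p-value is Uniform(0, 1). The helper name is hypothetical.

```python
import random

M_SNPS = 500_000  # number of SNPs tested, as in the example above

# Expected number of false positives when every SNP is null: m * alpha
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: about {M_SNPS * alpha:,.0f} false positives expected")

def simulated_false_positives(m: int, alpha: float, seed: int = 2) -> int:
    """Count null p-values below alpha (hypothetical helper).

    Under H0 a valid p-value is Uniform(0, 1), so p-values are drawn directly.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(m))

print(simulated_false_positives(M_SNPS, 0.05))  # one random draw near 25,000
```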
Multiple Comparisons Problem
The multiple comparisons (or “multiple testing”) problem occurs when one considers a set, or family, of statistical inferences simultaneously
• Type I errors become more likely: the chance of at least one false positive in the family grows with the number of tests
• Several statistical techniques have been developed to adjust for multiple comparisons, e.g., the Bonferroni adjustment
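How fast Type I errors accumulate can be shown with the family-wise error rate (FWER), the probability of at least one false positive. The sketch assumes independent tests with every null hypothesis true, which gives FWER = 1 − (1 − α)^m. Correlated tests, such as SNPs in LD, do not satisfy this exactly. It also shows that testing each hypothesis at α/m (Bonferroni) keeps the FWER at or below α.

```python
def fwer_independent(alpha_per_test: float, m: int) -> float:
    """P(at least one false positive) among m independent tests, all with H0 true."""
    return 1.0 - (1.0 - alpha_per_test) ** m

for m in (1, 20, 100, 500_000):
    unadjusted = fwer_independent(0.05, m)
    bonferroni = fwer_independent(0.05 / m, m)  # Bonferroni: test each at alpha / m
    print(f"m = {m:>7}: unadjusted FWER = {unadjusted:.3f}, Bonferroni FWER = {bonferroni:.4f}")
```

With 20 independent tests, the unadjusted chance of at least one false positive is already above 60%.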
Adjusting alpha
Standard Bonferroni correction
• Test each SNP at the level α* = α/m₁, where m₁ = number of markers tested
• Assuming m₁ = 500,000, the Bonferroni-corrected threshold is α* = 0.05/500,000 = 1 × 10⁻⁷
• Conservative when the tests are correlated (e.g., SNPs in LD)
• Permutation or simulation procedures may increase power by accounting for the correlation between tests
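The sketch below contrasts the Bonferroni threshold with a permutation threshold. It uses one common permutation procedure, the max-statistic method, which the slide does not necessarily intend. Several parts are assumptions: the toy genotypes are simulated as 20 independent blocks of 10 SNPs in strong LD, the per-SNP statistic is an Armitage-style trend statistic computed as n·r², and all function names are hypothetical. Shuffling case/control labels removes any genotype–phenotype association while keeping the LD between SNPs. The permutation threshold therefore reflects the smaller effective number of tests and should be lower (less conservative) than Bonferroni.

```python
import numpy as np
from statistics import NormalDist

def trend_statistics(genotypes: np.ndarray, status: np.ndarray) -> np.ndarray:
    """Per-SNP trend statistic n * r^2 (r = correlation of allele count with case status)."""
    g = genotypes - genotypes.mean(axis=0)   # centre each SNP column
    y = status - status.mean()               # centre the 0/1 phenotype
    r = (g * y[:, None]).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return len(status) * r ** 2

def bonferroni_threshold(alpha: float, m: int) -> float:
    """Chi-square (1 df) critical value at the per-test level alpha / m."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return z * z

def permutation_threshold(genotypes, status, alpha=0.05, n_perm=500, seed=0):
    """Max-statistic permutation threshold controlling the family-wise error rate."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(status)  # breaks genotype-phenotype link, keeps LD
        max_stats[b] = trend_statistics(genotypes, shuffled).max()
    return float(np.quantile(max_stats, 1 - alpha))

# Toy data (hypothetical): 20 independent blocks, each copied into 10 SNPs in strong LD
rng = np.random.default_rng(42)
n, n_blocks, per_block = 500, 20, 10
base = rng.binomial(2, 0.3, size=(n, n_blocks))
genotypes = np.repeat(base, per_block, axis=1).astype(float)
noise = rng.random(genotypes.shape) < 0.05           # re-draw about 5% of genotypes
genotypes[noise] = rng.binomial(2, 0.3, size=noise.sum())
status = np.repeat([1.0, 0.0], n // 2)               # 250 cases, 250 controls

m = genotypes.shape[1]
print("Bonferroni threshold:", round(bonferroni_threshold(0.05, m), 2))
print("Permutation threshold:", round(permutation_threshold(genotypes, status), 2))
```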
Measures of LD
Haplotype definition
Haplotype: an ordered sequence of alleles at a subset of loci along a chromosome
Moving from examining single genetic markers to sets of markers
Measures of linkage disequilibrium
Basic data: table of haplotype frequencies
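Sample of 16 two-locus haplotypes: AG, ag, AG, ag, Ag, AG, ag, AG, AG, ag, AG, Ag, ag, AG, ag, AG
Haplotype counts (rows: G/g allele; columns: A/a allele; margins: allele frequencies):
 | A | a | Total
G | 8 | 0 | 50%
g | 2 | 6 | 50%
Total | 62.5% | 37.5% |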
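The 2×2 table can be built by tallying the 16 haplotypes. Below is a minimal sketch that assumes each haplotype is a two-letter string (A/a allele first, then G/g). It reproduces the cell counts and the allele-frequency margins.

```python
from collections import Counter

# The 16 two-locus haplotypes (first letter: A/a allele, second letter: G/g allele)
haplotypes = ["AG", "ag", "AG", "ag", "Ag", "AG", "ag", "AG",
              "AG", "ag", "AG", "Ag", "ag", "AG", "ag", "AG"]

counts = Counter(haplotypes)          # unobserved haplotypes (here "aG") count as 0
n = len(haplotypes)

# 2x2 table laid out as on the slide: rows G/g, columns A/a
table = [[counts["AG"], counts["aG"]],
         [counts["Ag"], counts["ag"]]]

freq_A = (counts["AG"] + counts["Ag"]) / n   # column margin for A
freq_G = (counts["AG"] + counts["aG"]) / n   # row margin for G

for label, row in zip("Gg", table):
    print(label, row, f"{sum(row) / n:.1%}")
print(f"A: {freq_A:.1%}  a: {1 - freq_A:.1%}")
```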
D’ and r² are the most common measures
Both measure the association between alleles at two loci
• D’ (D prime), usually reported as |D’|, ranges from 0 [no LD] to 1 [complete LD]
• r² (r squared) also ranges from 0 to 1; it is the squared correlation between alleles at the two loci on the same chromosome
D
The deviation of the observed frequency of a haplotype from its expected frequency under independence (the product of the two allele frequencies) is called the linkage disequilibrium, D: $D = p_{AG} - p_A\,p_G$
• If two alleles are in LD, then D ≠ 0
• If |D’| = 1, there is complete dependency between the loci
• Linkage equilibrium means D = 0
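Here D is computed for the 16-haplotype example using the counts from the table (AG = 8, aG = 0, Ag = 2, ag = 6). This is a minimal sketch. It also shows that the frequency form p_AG − p_A·p_G equals the count form (n11·n00 − n10·n01)/n², which is the numerator that appears in the D’ and r² formulas.

```python
# Haplotype counts from the 16-haplotype example
n = 16
n_AG, n_aG, n_Ag, n_ag = 8, 0, 2, 6

p_AG = n_AG / n                    # observed haplotype frequency
p_A = (n_AG + n_Ag) / n            # allele frequency of A
p_G = (n_AG + n_aG) / n            # allele frequency of G

D = p_AG - p_A * p_G               # observed minus expected under independence
# Same quantity from the counts: (n11*n00 - n10*n01) / n^2
D_counts = (n_AG * n_ag - n_aG * n_Ag) / n ** 2

print(p_AG, p_A * p_G, D, D_counts)
```

Here D ≠ 0, so A and G are in LD.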
 | A | a | Total
G | n11 | n10 | n1·
g | n01 | n00 | n0·
Total | n·1 | n·0 | n
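Measure | Formula | Ref.
D’ | $D' = \dfrac{n_{11}n_{00} - n_{10}n_{01}}{\min(n_{1\cdot}n_{\cdot 0},\; n_{0\cdot}n_{\cdot 1})}$ (when D > 0) | Lewontin (1964)
Δ² = r² | $r^2 = \dfrac{(n_{11}n_{00} - n_{10}n_{01})^2}{n_{1\cdot}\,n_{0\cdot}\,n_{\cdot 1}\,n_{\cdot 0}}$ | Hill and Weir (1994)
δ* | … | Levin (1953)
… | … | Edwards (1963)
Q | $Q = \dfrac{n_{11}n_{00} - n_{10}n_{01}}{n_{11}n_{00} + n_{10}n_{01}}$ | Yule (1900)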
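The D’, r² and Yule’s Q formulas can be coded directly from the four cell counts. Below is a minimal sketch; `ld_measures` is a hypothetical helper name. For D < 0 it uses the standard Lewontin denominator min(n1·n·1, n0·n·0), so D’ keeps its sign. It requires both alleles to be observed at each locus, which keeps every denominator positive.

```python
def ld_measures(n11: int, n10: int, n01: int, n00: int) -> dict:
    """D', r^2 and Yule's Q from a 2x2 haplotype count table.

    Layout as in the table above: n11 = (G, A), n10 = (G, a), n01 = (g, A), n00 = (g, a).
    """
    row1, row0 = n11 + n10, n01 + n00      # n1., n0.  (G and g totals)
    col1, col0 = n11 + n01, n10 + n00      # n.1, n.0  (A and a totals)
    if 0 in (row1, row0, col1, col0):
        raise ValueError("both alleles must be observed at each locus")
    cross = n11 * n00 - n10 * n01          # equals n^2 * D

    # Lewontin's D': the maximum possible |D| depends on the sign of D
    if cross >= 0:
        d_max = min(row1 * col0, row0 * col1)
    else:
        d_max = min(row1 * col1, row0 * col0)
    d_prime = cross / d_max

    r2 = cross ** 2 / (row1 * row0 * col1 * col0)
    q = cross / (n11 * n00 + n10 * n01)    # Yule's Q
    return {"D'": d_prime, "r2": r2, "Q": q}

print(ld_measures(8, 0, 2, 6))  # 16-haplotype example: D' = 1, r2 = 0.6, Q = 1
```

D’ reaches 1 because one haplotype (aG) is never observed. r² stays below 1 because the allele frequencies at the two loci differ (62.5% vs. 50%).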
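Worked example with the 16 haplotypes (n11 = 8, n10 = 0, n01 = 2, n00 = 6; n1· = n0· = 8, n·1 = 10, n·0 = 6):
$D' = \dfrac{n_{11}n_{00} - n_{10}n_{01}}{\min(n_{1\cdot}n_{\cdot 0},\; n_{0\cdot}n_{\cdot 1})} = \dfrac{8 \times 6 - 0 \times 2}{8 \times 6} = 1$
$r^2 = \dfrac{(n_{11}n_{00} - n_{10}n_{01})^2}{n_{1\cdot}\,n_{0\cdot}\,n_{\cdot 1}\,n_{\cdot 0}} = \dfrac{(8 \times 6 - 0 \times 2)^2}{10 \times 6 \times 8 \times 8} = \dfrac{2304}{3840} = 0.6$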
r² and power
• r² is directly related to study power
• A low r² means that a larger sample size is required to detect an association through the marker
• r²·N is the “effective sample size”: if a marker M and a causal gene G are in LD, then a study with N cases and controls that measures M (but not G) will have approximately the same power to detect an association as a study with r²·N cases and controls that measured G directly
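The effective-sample-size rule follows from a standard approximation: a 1-df association test has a noncentrality parameter that grows linearly with N, and testing a marker instead of the causal variant multiplies it by r². Below is a minimal sketch under that assumption. The per-subject noncentrality `C` is a hypothetical value and the function names are illustrative. Given a noncentrality λ, the power of a two-sided 1-df test is Φ(√λ − z) + Φ(−√λ − z), where z is the critical value.

```python
from statistics import NormalDist

def power_from_ncp(ncp: float, alpha: float = 0.05) -> float:
    """Power of a two-sided 1-df test whose chi-square noncentrality is ncp."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5                         # mean of the z-statistic under Ha
    phi = NormalDist().cdf
    return phi(shift - z_crit) + phi(-shift - z_crit)

def power_direct(n: float, ncp_per_subject: float, alpha: float = 0.05) -> float:
    """Study genotyping the causal variant G: noncentrality grows linearly with n."""
    return power_from_ncp(n * ncp_per_subject, alpha)

def power_via_marker(n: float, r2: float, ncp_per_subject: float, alpha: float = 0.05) -> float:
    """Study genotyping only marker M: noncentrality shrunk by r^2 (assumed approximation)."""
    return power_from_ncp(r2 * n * ncp_per_subject, alpha)

C = 0.02  # hypothetical per-subject noncentrality for the causal variant
print(power_via_marker(1000, 0.4, C), power_direct(400, C))  # equal, since r^2 * N = 400
```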
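r² and power: example
• N = 1000 (500 cases and 500 controls) genotyped at the marker, with r² = 0.4 between the marker and the causal gene
• Had you genotyped the causal gene directly, you would need only a total N = 0.4 × 1000 = 400 (200 cases and 200 controls) for the same power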
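The example is a one-line calculation. The sketch below also inverts it, giving the marker-study size N/r² that matches a direct study of size N. Both helper names are hypothetical, and the results are only as accurate as the r²·N approximation.

```python
def effective_sample_size(n_marker: float, r2: float) -> float:
    """Size of a direct study of the causal variant with the same power (r^2 * N)."""
    return r2 * n_marker

def marker_sample_size(n_causal: float, r2: float) -> float:
    """Size needed at the marker to match a direct study of size n_causal (N / r^2)."""
    return n_causal / r2

n_total, r2 = 1000, 0.4
print(effective_sample_size(n_total, r2))        # total N for the direct study
print(effective_sample_size(n_total // 2, r2))   # cases (or controls) alone
print(round(marker_sample_size(400, r2)))        # back to the marker study size
```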