Analysis of pooled DNA - a decision-theoretic model
Inke R. König and Andreas Ziegler
Institute of Medical Biometry and Statistics
Medical University of Lübeck
BAMM 2001, Heidelberg
Overview
- Association analyses using cases and independent controls
- Association analyses using pooled DNA of cases and independent controls
- Problem setting: association studies for asthma affection status
- Possible solution: a decision-theoretic approach
Association analyses in case-control designs - Principle
- Measure of deviation from independent transmission of a locus and a disease
[Figure: alleles M and m in the disease and no-disease groups]
Association = a specific allele at the genetic locus is more frequent in the group of affected individuals than in the group of unaffected individuals
Association analyses in case-control designs - Principle
- Disease-causing mutation D occurred in a specific haplotype MD
- Haplotype significantly more frequent in cases than expected: P(MD) - P(M)P(D) ≠ 0
- Deviation from expected independence = linkage disequilibrium
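The deviation P(MD) - P(M)P(D) can be computed directly. The following is a minimal sketch; the function name and the example frequencies are illustrative assumptions, not values from the talk.

```python
def ld_coefficient(p_md, p_m, p_d):
    """Linkage disequilibrium coefficient: P(MD) - P(M) * P(D)."""
    return p_md - p_m * p_d


# Hypothetical frequencies, chosen only for illustration.
independent = ld_coefficient(p_md=0.10, p_m=0.5, p_d=0.2)  # haplotype at its expected frequency
associated = ld_coefficient(p_md=0.15, p_m=0.5, p_d=0.2)   # haplotype more frequent than expected
```

A value of zero corresponds to independence of marker allele and mutation; any nonzero value indicates linkage disequilibrium.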
Association analyses in case-control designs - Advantage and disadvantage
Advantage:
- Easier ascertainment of samples
- Data of controls can be reused
- Greater statistical power
=> Greater cost effectiveness
Disadvantage:
- Possible population stratification
Association analyses in case-control designs - Challenges
- Small effects likely in complex diseases
- Possibility of locus heterogeneity
- Possibility of gene-gene and gene-environment interactions
- Low significance thresholds required
=> Large sample sizes necessary
=> Cost-effective study designs required
DNA pooling - Idea
- Equal amounts of DNA from individual samples are pooled before genotyping
- Creation of one pool for cases and one for controls
DNA pooling - Idea
- Comparison of allele frequency estimates in the two pools
- Requires high accuracy and reproducibility, as differences between cases and controls might be small
- Reduction of cost and amount of DNA
DNA pooling - Expected reduction
- Assume genotyping of 1,000 cases and 1,000 controls at 10,000 SNPs = 2·10^7 genotypings
- Using 384-well plates for PCR = about 52,090 assay plates necessary
- DNA pooling and replicating each PCR 10 times = 2·10^5 genotypings = 521 assay plates
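The arithmetic behind these figures can be checked with a short sketch. It assumes every well of a 384-well plate holds one genotyping reaction; the function names are illustrative. Under that assumption the exact count for individual genotyping is 52,084 plates, so the slide's figure of 52,090 appears to be rounded.

```python
import math


def genotypings(n_samples, n_snps, replicates=1):
    """Total number of genotyping reactions."""
    return n_samples * n_snps * replicates


def plates_needed(n_reactions, wells_per_plate=384):
    """Number of assay plates, assuming every well is used (an assumption)."""
    return math.ceil(n_reactions / wells_per_plate)


# Individual genotyping: 1,000 cases + 1,000 controls at 10,000 SNPs.
individual = genotypings(2000, 10_000)
# Pooled genotyping: 2 pools (cases, controls), each PCR replicated 10 times.
pooled = genotypings(2, 10_000, replicates=10)

print(individual, plates_needed(individual))
print(pooled, plates_needed(pooled))
```

Pooling with 10 replicates reduces the number of reactions by a factor of 100 in this setting.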
DNA pooling - Procedure
- Microsatellite markers or SNPs
- Case-control and family samples
- Estimation of allele frequencies
  - by peak heights (e.g., Barcellos et al. 1997 Am J Hum Genet 61:734-47)
  - or from allele image patterns (e.g., Daniels et al. 1998 Am J Hum Genet 62:1189-97)
DNA pooling - Errors
- Possible sources of error:
  - low reproducibility of PCR results
  - marker-specific unequal amplification of alleles
  - stutter bands in microsatellite markers
  - residual differences
DNA pooling - Reproducibility
- Different results in multiple PCRs of the same pool
- Recommendation to
  - genotype the pool 5, 10, or 16 times
  - construct triplicate pools and duplicate PCR
=> estimate allele frequency as mean or median of obtained values
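Combining replicate measurements is a one-liner with the standard library. In this sketch the replicate values are made up for illustration; one deliberately deviating value shows why the median may be preferred when a single PCR fails.

```python
from statistics import mean, median

# Hypothetical allele frequency estimates from 5 replicate PCRs of one pool.
replicates = [0.61, 0.59, 0.60, 0.75, 0.60]

freq_mean = mean(replicates)      # pulled upwards by the outlying replicate
freq_median = median(replicates)  # robust against the single outlier
```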
DNA pooling - Unequal allele amplification
- Differential amplification of alleles
- Recommended corrections:
  - genotype heterozygotes individually
  - determine ratio k of peak heights, expected to equal 1
  - correct frequency f of allele A to f = A/(A+kB), with A and B as observed frequencies
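The correction f = A/(A+kB) can be sketched as follows. The sketch assumes that k is the A-to-B peak height ratio measured in individually genotyped heterozygotes, which is consistent with the formula but not spelled out on the slide; the peak heights are hypothetical.

```python
def corrected_frequency(a, b, k=1.0):
    """Frequency of allele A corrected for unequal amplification.

    a, b: observed signals (peak heights or frequencies) of alleles A and B
    k:    A-to-B peak height ratio in heterozygotes (assumed definition);
          k = 1 means both alleles amplify equally well.
    """
    return a / (a + k * b)


# Hypothetical pool: raw signal suggests 60% A.
uncorrected = corrected_frequency(60, 40)        # 0.6, no correction
corrected = corrected_frequency(60, 40, k=1.5)   # 0.5, A amplifies 1.5 times as strongly
```

With k = 1.5, an apparent frequency of 0.6 shrinks to 0.5, which illustrates how differential amplification alone could mimic an allele frequency difference.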
DNA pooling - Procedure
- Performed in 2 stages:
  - initial screen for association using pooled DNA
  - individual genotyping of markers with positive results
DNA pooling - Follow up with individual genotyping
- Corrections not sufficiently accurate for complex diseases
=> too many false positives and false negatives
- Possible population stratification
DNA pooling - Follow up with individual genotyping
- Tests on allele frequencies appropriate only under Hardy-Weinberg equilibrium (Sasieni 1997 Biometrics 53:1253-61)
- No information about:
  - haplotypes
  - imprinting
  - specific genetic models
  - ...
Problem setting - Previous results on Asthma
- Genome scans for linkage of asthma affection status point to chromosome 6p in:
  - Caucasian families (CSGA 1997 Nat Genet 15:389-392; Xu et al. 2001 Am J Hum Genet 68:1437-1446)
  - German and Swedish families (Wjst et al. 1999 Genomics 58:1-8)
  - Japanese families (Yokouchi et al. 2000 Genomics 66:152-160)
Problem setting - Plan of association studies
- Conduct case-control study to analyze association in the identified region
- Use of at least 1,400 SNPs spaced less than 40 kb apart
- Use of pooled DNA as initial screen
- Which SNPs are worth following up with individual genotyping?
Problem setting - Previous criteria for follow up
- Mostly estimation of allele frequencies only (Hoogendoorn et al. 2000 Hum Genet 107:488-93)
- Modification of test statistic ...
  - without follow up (Le Hellard et al. 2001)
  - or without accounting for errors due to pooling (Risch & Teng 1998 Genome Res 8:1273-88)
Problem setting - Previous criteria for follow up
- Pooled analysis as initial screen, criterion ...
  - "more liberal" than significance criterion (Daniels et al. 1998 Am J Hum Genet 62:1189-97)
  - or not explained (Barcellos et al. 1997 Am J Hum Genet 61:734-47)
Possible solution - Decision model
- Aim to determine the criterion that is optimal for the given situation
- Modeling as decision-theoretic problem in a decision chart
Possible solution - Decision tree
- State of nature
- Validity of H0 or H1 depends on a priori probability
[Decision chart]
H0 or H1 is valid
-> Analysis of pooled DNA (+ / ? / -)
-> Decision for individual genotyping (yes / no)
-> Analysis of individual DNA (+ / -)
-> Outcome
Possible solution - Decision tree
- Result of pooled analysis is positive, inconclusive, or negative
- Result depends on power (sample size, effect size)
- Result depends on error of DNA pooling (size, distribution)
Possible solution - Decision tree
- Action
- Decision for or against individual genotyping at each marker
- Criterion to be established
Possible solution - Decision tree
- Result of individual genotyping is positive or negative
- Result depends on power
- Result depends on criterion for previous decision
Possible solution - Decision tree
- Possible outcomes to be evaluated
- Evaluation determines previous decisions
Possible solution - Decision tree
[Figure: full decision tree. P = analysis of pooled DNA, I = decision for individual genotyping]
H1 is valid -> P: Test + / Test ? / Test -
  -> I yes -> Test +: correct positive
           -> Test -: false negative
  -> I no  -> false negative
H0 is valid -> P: Test + / Test ? / Test -
  -> I yes -> Test +: false positive
           -> Test -: correct negative
  -> I no  -> correct negative
Possible solution - Implementation
- Aim: calculate loss (utility) under H0 and H1
- Specify:
  - number of cases and controls
  - allele frequency in cases and controls under H0 and H1
  - size and distribution of measurement error
  - loss function for: false positive, false negative, correct positive, correct negative
Possible solution - Implementation
- For a given number of replications, the program:
  - generates individual genotypes for cases and controls under H0 and H1
  - generates pooled genotypes by adding measurement error
  - calculates the test statistic for pooled genotypes and compares it with each criterion
  - if the criterion is met, calculates the test statistic for individual genotypes
  - calculates expected loss function values
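The simulation steps listed above can be sketched as follows. This is not the authors' program but a minimal illustration under several assumptions: each individual contributes two independent alleles (Hardy-Weinberg equilibrium), the pooling error is normal with standard deviation `error_sd` and the pooled estimate is clipped to [0, 1], both stages use the 1-df chi-square test on allele frequencies, and the loss is 0-1. All names and default values are hypothetical.

```python
import math
import random


def chi2_pvalue(freq_cases, freq_controls, n_alleles):
    """P-value of the 1-df chi-square test for two equally sized allele samples."""
    p_bar = (freq_cases + freq_controls) / 2
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 1.0  # no variation in either group, hence no evidence of a difference
    chi2 = n_alleles * (freq_cases - freq_controls) ** 2 / (2 * p_bar * (1 - p_bar))
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df


def sample_freq(p, n_alleles, rng):
    """Observed allele frequency among n_alleles alleles drawn with probability p."""
    return sum(rng.random() < p for _ in range(n_alleles)) / n_alleles


def add_pooling_error(freq, error_sd, rng):
    """Pooled estimate = true sample frequency + normal error, clipped to [0, 1]."""
    return min(1.0, max(0.0, freq + rng.gauss(0.0, error_sd)))


def expected_loss(p_cases, p_controls, h1_true, criterion, n_individuals=93,
                  error_sd=0.05, alpha=0.05, replications=2000, seed=1):
    """Average 0-1 loss of the two-stage procedure for one follow-up criterion."""
    rng = random.Random(seed)
    n_alleles = 2 * n_individuals  # assumption: two independent alleles per person
    total = 0
    for _ in range(replications):
        f_cases = sample_freq(p_cases, n_alleles, rng)
        f_controls = sample_freq(p_controls, n_alleles, rng)
        pool_cases = add_pooling_error(f_cases, error_sd, rng)
        pool_controls = add_pooling_error(f_controls, error_sd, rng)
        # Stage 1: follow up if the pooled p-value does not exceed the criterion.
        follow_up = chi2_pvalue(pool_cases, pool_controls, n_alleles) <= criterion
        # Stage 2: a positive result requires a significant individual test.
        positive = follow_up and chi2_pvalue(f_cases, f_controls, n_alleles) <= alpha
        total += int(positive != h1_true)  # 0-1 loss: 1 for a wrong final decision
    return total / replications


if __name__ == "__main__":
    for criterion in (0.0, 0.05, 0.2, 0.5, 1.0):
        loss_h0 = expected_loss(0.5, 0.5, False, criterion, replications=500)
        loss_h1 = expected_loss(0.7, 0.5, True, criterion, replications=500)
        print(criterion, loss_h0, loss_h1)
```

A criterion of 0 means no marker is followed up, so the loss is 0 under H0 and 1 under H1; a criterion of 1 means every marker is genotyped individually, so the loss reduces to the error rates of the individual test.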
Possible solution - Example
- p(cases under H0) = p(controls) = 0.5
- p(cases under H1) = 0.7, error ~ N(0, 0.05)
- 93 cases and controls for α = 0.05 and (1-β) = 0.8
- 0-1 loss function:

                  State of nature
                  H0      H1
  Decision  H0    0       1
            H1    1       0
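The sample size of 93 can be reproduced with the classical normal-approximation formula for comparing two proportions (two-sided test, no continuity correction). Whether the authors used this exact formula is an assumption; the function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Sample size per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


n = n_per_group(0.5, 0.7)  # allele frequencies under H1: controls 0.5, cases 0.7
```

Under these assumptions the formula yields 93 per group, matching the example.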
Possible solution - Example
Individual genotyping if the p-value of the χ² statistic does not exceed the significance level for pooled data analysis
[Figure: genotyping rate (y-axis, 0 to 1) against significance level for analysis of pooled data (x-axis, 0 to 1); curves under H0 and under H1]
Possible solution - Example
Individual genotyping if the p-value of the χ² statistic does not exceed the significance level for pooled data analysis
[Figure: expected loss (y-axis, 0 to 0.5) against significance level for analysis of pooled data (x-axis, 0 to 1); curves under H0 and under H1]
Possible solution - Decision rule
- Bayes risk principle:
  - selects decision rule with minimal average loss
  - requires assumptions about a priori probabilities
=> for each SNP, probabilities of H0 and H1 have to be determined
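A minimal sketch of the Bayes risk principle: the average loss of each candidate criterion is the prior-weighted mean of its expected losses under H0 and H1. The loss values below are made up for illustration and are not results from the talk; the function name is hypothetical.

```python
def bayes_rule(criteria, loss_h0, loss_h1, prior_h1):
    """Return (criterion, Bayes risk) with minimal prior-weighted expected loss."""
    risks = [(1 - prior_h1) * l0 + prior_h1 * l1
             for l0, l1 in zip(loss_h0, loss_h1)]
    best = min(range(len(criteria)), key=risks.__getitem__)
    return criteria[best], risks[best]


# Hypothetical expected losses for four follow-up criteria.
CRITERIA = [0.0, 0.05, 0.5, 1.0]
LOSS_H0 = [0.0, 0.01, 0.04, 0.05]
LOSS_H1 = [1.0, 0.40, 0.10, 0.05]

rare = bayes_rule(CRITERIA, LOSS_H0, LOSS_H1, prior_h1=0.01)    # association unlikely a priori
likely = bayes_rule(CRITERIA, LOSS_H0, LOSS_H1, prior_h1=0.5)   # association as likely as not
```

With these numbers a small prior probability of H1 favors never following up, whereas a prior of 0.5 favors always following up, which illustrates why the a priori probabilities have to be determined for each SNP.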
Possible solution - Decision rule
- Minimax principle:
  - selects decision rule with minimal maximum loss
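The minimax principle needs no prior: for each criterion take the larger of the expected losses under H0 and H1, then choose the criterion for which this worst case is smallest. The loss values in this sketch are hypothetical, as is the function name.

```python
def minimax_rule(criteria, loss_h0, loss_h1):
    """Return (criterion, worst-case loss) with minimal maximum expected loss."""
    worst = [max(l0, l1) for l0, l1 in zip(loss_h0, loss_h1)]
    best = min(range(len(criteria)), key=worst.__getitem__)
    return criteria[best], worst[best]


# Hypothetical expected losses for four follow-up criteria.
MM_CRITERIA = [0.0, 0.05, 0.5, 1.0]
MM_LOSS_H0 = [0.0, 0.01, 0.04, 0.05]
MM_LOSS_H1 = [1.0, 0.40, 0.10, 0.05]

choice = minimax_rule(MM_CRITERIA, MM_LOSS_H0, MM_LOSS_H1)
```

Because the loss under H1 dominates for most criteria in this example, the minimax choice is the most liberal criterion.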
Possible solution - Example
Individual genotyping if the p-value of the χ² statistic does not exceed the significance level for pooled data analysis
[Figure: expected loss (y-axis, 0 to 0.5) against significance level for analysis of pooled data (x-axis, 0 to 1); curves under H0 and under H1]
Possible solution - Decision rule
- Minimax principle:
  - selects decision rule with minimal maximum loss
  - in the given setting with 0-1 loss function and higher type II error rate: maximization of power
=> a different loss function may be more reasonable
- Possible definition of loss function with respect to accuracy, cost, and publicity
Possible solution - Decision tree
[Figure: decision tree with outcomes rated by accuracy, result, and cost. P = analysis of pooled DNA, I = decision for individual genotyping]
H1 is valid -> P: Test + / Test ? / Test -
  -> I yes -> Test +: correct, positive, expensive
           -> Test -: false, negative, expensive
  -> I no  -> false, negative, cheap
H0 is valid -> P: Test + / Test ? / Test -
  -> I yes -> Test +: false, positive, expensive
           -> Test -: correct, negative, expensive
  -> I no  -> correct, negative, cheap
Possible solution - Construction of loss functions
- Assuming that individual genotyping is necessary for a positive result, 6 outcomes are possible:

          Positive            Negative
  Cost    Correct   False     Correct   False
  High    x         x         x         x
  Low     n/a       n/a       x         x
Possible solution - Construction of loss functions
- Outcome with highest loss = l1 = false, negative, expensive
=> L(l1) = 1
- Outcome with lowest loss = l2 = correct, positive, expensive
=> L(l2) = 0
- Determine loss of l3 = false, negative, cheap = L(l3)
Possible solution - Construction of loss functions
- "Betting" situation:
  - assume l3 (false, negative, cheap) is given
  - you are asked to "pay" it in order to play the gamble: l3 in exchange for l1 (false, negative, expensive) with probability α and l2 (correct, positive, expensive) with probability (1-α)
  - At which α do you gamble?
Possible solution - Construction of loss functions
- Assume you are willing to pay it to have 10% probability for l2 (correct, positive, expensive) and 90% probability for l1 (false, negative, expensive)
=> α = 0.9
=> L(l3) = 0.9
- Same procedure for other outcomes
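The betting construction amounts to setting the loss of an intermediate outcome equal to the expected loss of the gamble at the point of indifference. A minimal sketch, with a hypothetical function name and the anchor losses L(l1) = 1 and L(l2) = 0 as defaults:

```python
def elicited_loss(alpha, loss_worst=1.0, loss_best=0.0):
    """Loss of an outcome judged equivalent to a gamble that yields the worst
    outcome with probability alpha and the best with probability 1 - alpha."""
    return alpha * loss_worst + (1 - alpha) * loss_best


# Indifference at 90% probability for the worst outcome, as in the example.
loss_l3 = elicited_loss(0.9)
```

With the anchors fixed at 1 and 0, the elicited loss simply equals the indifference probability α.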
Possible solution- Remaining questions
- How large is the measurement error that is due to the pooling? What is its distribution?
- How can reasonable loss functions be determined?
- Which decision rule should be employed? If necessary, how can we determine a priori probabilities?