
Analysis of pooled DNA - a decision-theoretic model

Inke R. König and Andreas Ziegler
Institute of Medical Biometry and Statistics

Medical University of Lübeck

BAMM 2001, Heidelberg

Overview

! Association analyses using cases and independent controls

! Association analyses using pooled DNA of cases and independent controls

! Problem setting: association studies for asthma affection status

! Possible solution: a decision-theoretic approach

Association analyses in case-control designs - Principle

! Measure of deviation from independent transmission of a locus and a disease

[Figure: alleles m and M in the disease and no-disease groups]

Association = specific allele at the genetic locus is more frequent in the group of affecteds than in the group of non-affecteds


! Disease-causing mutation D occurred in a specific haplotype MD

! Haplotype significantly more frequent in cases than expected: P(MD) - P(M)P(D) ≠ 0

! Deviation from expected independence = linkage disequilibrium
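The linkage disequilibrium measure on this slide can be estimated directly from haplotype counts. A minimal Python sketch, assuming phased haplotypes are available as two-letter strings (marker allele M or m, followed by disease allele D or d); the function name `ld_coefficient` and the sample counts are hypothetical:

```python
from collections import Counter


def ld_coefficient(haplotypes):
    """Estimate D = P(MD) - P(M)P(D) from a list of phased haplotypes.

    Each haplotype is a two-character string: marker allele (M or m)
    followed by disease allele (D or d), e.g. "MD" or "md".
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    p_md = counts["MD"] / n                  # haplotype frequency P(MD)
    p_m = (counts["MD"] + counts["Md"]) / n  # marker allele frequency P(M)
    p_d = (counts["MD"] + counts["mD"]) / n  # disease allele frequency P(D)
    return p_md - p_m * p_d


# Hypothetical sample: MD and md over-represented relative to independence
sample = ["MD"] * 40 + ["Md"] * 10 + ["mD"] * 10 + ["md"] * 40
print(ld_coefficient(sample))  # approximately 0.15
```

Under independence every haplotype frequency equals the product of its allele frequencies, so D is 0; any non-zero value indicates linkage disequilibrium.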

Association analyses in case-control designs - Advantages and disadvantages

Advantages:
! Easier ascertainment of samples
! Data of controls can be reused
! Greater statistical power
→ Greater cost effectiveness

Disadvantage:
! Possible population stratification

Association analyses in case-control designs - Challenges

! Small effects likely in complex diseases
! Possibility of locus heterogeneity
! Possibility of gene-gene and gene-environment interactions
! Low significance thresholds required
→ Large sample sizes necessary
→ Cost-effective study designs required

DNA pooling - Idea

! Equal amounts of DNA from individual samples are pooled before genotyping

! Creation of one pool for cases and one pool for controls



! Comparison of allele frequency estimates in the two pools

! Requires high accuracy and reproducibility, as differences between cases and controls might be small

! Reduction of cost and of the amount of DNA needed

DNA pooling - Expected reduction

! Assume genotyping 1,000 cases and 1,000 controls at 10,000 SNPs = 2⋅10^7 genotypings

! Using 384-well plates for PCR: about 52,084 assay plates necessary

! DNA pooling and replicating each PCR 10 times: 2⋅10^5 genotypings = 521 assay plates
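The arithmetic behind these numbers, as a small Python sketch; it assumes one genotyping per well, rounds up to whole plates, and ignores any wells reserved for controls or standards:

```python
import math

n_cases, n_controls, n_snps = 1_000, 1_000, 10_000
wells_per_plate = 384
replicates = 10  # each pooled PCR replicated 10 times

# Individual genotyping: every person at every SNP
individual = (n_cases + n_controls) * n_snps
# Pooled genotyping: 2 pools (cases, controls) at every SNP, 10 replicates each
pooled = 2 * n_snps * replicates

plates_individual = math.ceil(individual / wells_per_plate)
plates_pooled = math.ceil(pooled / wells_per_plate)
print(individual, plates_individual)  # 20000000 52084
print(pooled, plates_pooled)          # 200000 521
```

Under these assumptions the pooled design needs roughly one hundredth of the plates of the individual design.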

DNA pooling - Procedure

! Microsatellite markers or SNPs
! Case-control and family samples
! Estimation of allele frequencies
# by peak heights (e.g., Barcellos et al. 1997 Am J Hum Genet 61:734-47)
# or from allele image patterns (e.g., Daniels et al. 1998 Am J Hum Genet 62:1189-97)

DNA pooling - Errors

! Possible sources of error:
# low reproducibility of PCR results
# marker-specific unequal amplification of alleles
# stutter bands in microsatellite markers
# residual differences

DNA pooling - Reproducibility

! Different results in multiple PCRs of the same pool

! Recommendations:
# genotype each pool 5, 10, or 16 times
# construct triplicate pools and duplicate PCRs
→ estimate allele frequency as mean or median of the obtained values
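A sketch of the recommended summary across replicate measurements, assuming the replicate allele-frequency estimates for one SNP and one pool are already available; the six values below are hypothetical (triplicate pools × duplicate PCRs):

```python
import statistics

# Hypothetical allele-frequency estimates for one SNP in one pool:
# 3 pools x 2 PCRs = 6 replicate values
replicate_estimates = [0.52, 0.49, 0.55, 0.50, 0.51, 0.47]

mean_estimate = statistics.mean(replicate_estimates)
median_estimate = statistics.median(replicate_estimates)
# Standard deviation across replicates as a simple measure of reproducibility
spread = statistics.stdev(replicate_estimates)
print(f"mean={mean_estimate:.3f} median={median_estimate:.3f} sd={spread:.3f}")
```

The median is less sensitive than the mean to a single aberrant PCR result.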

DNA pooling - Unequal allele amplification

! Differential amplification of alleles
! Recommended corrections:
# genotype heterozygotes individually
# determine ratio k of the peak heights of alleles A and B, expected to equal 1
# correct frequency f of allele A to f = A/(A + kB), with A and B as observed frequencies
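A sketch of the correction, assuming k is estimated as the mean ratio of A to B peak heights over the individually genotyped heterozygotes (how individual ratios are combined is an assumption here); the function names and peak heights are hypothetical:

```python
import statistics


def amplification_ratio(het_peaks):
    """k = mean ratio of peak heights (allele A / allele B) in heterozygotes.

    Averaging with the mean is an assumption; a median would also be possible.
    """
    return statistics.mean(a / b for a, b in het_peaks)


def corrected_frequency(a_pool, b_pool, k):
    """Corrected frequency of allele A in the pool: f = A / (A + k*B)."""
    return a_pool / (a_pool + k * b_pool)


# Hypothetical heterozygote peak heights (A, B): A amplifies about twice as well
hets = [(1.9, 1.0), (2.1, 1.0), (2.0, 1.0)]
k = amplification_ratio(hets)
print(k, corrected_frequency(0.6, 0.4, k))  # k about 2, f about 0.429
```

With k = 1 (no differential amplification) the observed frequency is returned unchanged.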

DNA pooling - Procedure

! Performed in 2 stages:
# initial screen for association using pooled DNA
# individual genotyping of markers with positive results

DNA pooling - Follow up with individual genotyping

! Corrections not sufficiently accurate for complex diseases
→ too many false positives and false negatives
! Possible population stratification


! Tests on allele frequencies appropriate only under Hardy-Weinberg equilibrium (Sasieni 1997 Biometrics 53:1253-61)

! No information about:
# haplotypes
# imprinting
# specific genetic models
# ...
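Individual genotypes, unlike pooled data, allow a check of the Hardy-Weinberg assumption. One standard option is the Pearson χ² test with 1 degree of freedom, sketched here in Python for a biallelic marker; the function name and genotype counts are hypothetical:

```python
import math


def hwe_chi2(n_MM, n_Mm, n_mm):
    """Pearson chi-square (1 df) for deviation from Hardy-Weinberg equilibrium
    at a biallelic marker, from individual genotype counts."""
    n = n_MM + n_Mm + n_mm
    p = (2 * n_MM + n_Mm) / (2 * n)  # frequency of allele M
    expected = [n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2]
    observed = [n_MM, n_Mm, n_mm]
    # Skip cells with zero expectation (monomorphic marker)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value


print(hwe_chi2(30, 40, 30))  # too few heterozygotes: chi2 = 4.0, p about 0.046
```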

Problem setting - Previous results on asthma

! Genome scans for linkage of asthma affection status point to chromosome 6p in:
# Caucasian families (CSGA 1997 Nat Genet 15:389-392; Xu et al. 2001 Am J Hum Genet 68:1437-1446)
# German and Swedish families (Wjst et al. 1999 Genomics 58:1-8)
# Japanese families (Yokouchi et al. 2000 Genomics 66:152-160)

Problem setting - Plan of association studies

! Conduct a case-control study to analyze association in the identified region

! Use of at least 1,400 SNPs with an inter-SNP distance of less than 40 kb

! Use of pooled DNA as initial screen
! Which SNPs are worth following up with individual genotyping?

Problem setting - Previous criteria for follow up

! Mostly estimation of allele frequencies only (Hoogendoorn et al. 2000 Hum Genet 107:488-93)

! Modification of test statistic ...
# without follow up (Le Hellard et al. 2001)
# or without accounting for errors due to pooling (Risch & Teng 1998 Genome Res 8:1273-88)


! Pooled analysis as initial screen, criterion ...
# “more liberal” than significance criterion (Daniels et al. 1998 Am J Hum Genet 62:1189-97)
# or not explained (Barcellos et al. 1997 Am J Hum Genet 61:734-47)

Possible solution - Decision model

! Aim: determine the criterion that is optimal for a given situation

! Modeling as a decision-theoretic problem in a decision chart

Possible solution - Decision tree

! State of nature
! Validity of H0 or H1 depends on a priori probability

! Levels of the decision tree, from root to leaves:
# H0 or H1 is valid
# Analysis of pooled DNA: + / ? / -
# Decision for individual genotyping: yes / no
# Analysis of individual DNA: + / -
# Outcome


! Result of pooled analysis is positive, inconclusive, or negative

! Result depends on power (sample size, effect size)

! Result depends on error of DNA pooling (size, distribution)







! Action: decision for or against individual genotyping at each marker

! Criterion to be established







! Result of individual genotyping is positive or negative

! Result depends on power

! Result depends on criterion for previous decision







! Possible outcomes to be evaluated

! Evaluation determines previous decisions





[Figure: full decision tree; P = analysis of pooled DNA, I = decision for individual genotyping]

Each pooled result (Test +, Test ?, Test -) can lead to either decision; the leaves are:
# H1 → I yes → Test + → correct positive
# H1 → I yes → Test - → false negative
# H1 → I no → false negative
# H0 → I yes → Test + → false positive
# H0 → I yes → Test - → correct negative
# H0 → I no → correct negative
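The leaves of the tree follow from two facts: a positive result requires individual genotyping, and the pooled result (+, ?, -) influences only the decision, not the outcome label. A minimal Python sketch of that mapping; `classify` is a hypothetical helper, not part of the authors' program:

```python
def classify(h1_true, genotyped, individual_positive=False):
    """Leaf of the decision tree: state of nature -> decision -> individual test."""
    # A positive overall result is only possible after individual genotyping
    positive = genotyped and individual_positive
    if h1_true:
        return "correct positive" if positive else "false negative"
    return "false positive" if positive else "correct negative"


# Enumerate the six leaves (without genotyping, no individual test is run)
for state in (True, False):
    for genotyped, result in ((True, True), (True, False), (False, False)):
        print("H1" if state else "H0", "I yes" if genotyped else "I no",
              ("Test +" if result else "Test -") if genotyped else "",
              classify(state, genotyped, result))
```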

Possible solution - Implementation

! Aim: calculate loss (utility) under H0 and H1
! Specify:
# number of cases and controls
# allele frequency in cases and controls under H0 and H1
# size and distribution of measurement error
# loss function for: false positive, false negative, correct positive, correct negative


! For a given number of replications:
# generates individual genotypes for cases and controls under H0 and H1
# generates pooled genotypes by adding measurement error
# calculates the test statistic for the pooled genotypes and compares it with each criterion
# if the criterion is met, calculates the test statistic for the individual genotypes
# calculates expected loss function values
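The simulation steps can be sketched in Python as follows. This is not the authors' program; it assumes Hardy-Weinberg equilibrium, an allele-based Pearson χ² test at both stages, and measurement error added as N(0, error_sd) to each pool's allele frequency estimate (reading 0.05 as a standard deviation). All names are hypothetical:

```python
import math
import random

ZERO_ONE_LOSS = {"correct positive": 0.0, "false negative": 1.0,
                 "false positive": 1.0, "correct negative": 0.0}


def chi2_pvalue(a, b, c, d):
    """P-value of the Pearson chi-square test (1 df) for the 2x2 allele table
    [[a, b], [c, d]] (rows: cases, controls; columns: allele M, allele m)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom <= 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)


def allele_count(n_people, p, rng):
    """Count of allele M among n_people individuals (two alleles each, HWE assumed)."""
    return sum(rng.random() < p for _ in range(2 * n_people))


def pooled_count(count, n_alleles, error_sd, rng):
    """Pooled estimate of the M-allele count: sample frequency plus N(0, error_sd)."""
    f = count / n_alleles + rng.gauss(0.0, error_sd)
    return min(1.0, max(0.0, f)) * n_alleles


def simulate(n, p_controls, p_cases, error_sd, criteria,
             alpha_ind=0.05, loss=ZERO_ONE_LOSS, n_rep=1000, seed=1):
    """For one state of nature (H0 if p_cases == p_controls, else H1), return
    {criterion: (genotyping rate, expected loss)} over n_rep replications."""
    rng = random.Random(seed)
    h1 = p_cases != p_controls
    n_alleles = 2 * n
    genotyped = dict.fromkeys(criteria, 0)
    total_loss = dict.fromkeys(criteria, 0.0)
    for _ in range(n_rep):
        m_cases = allele_count(n, p_cases, rng)
        m_controls = allele_count(n, p_controls, rng)
        a = pooled_count(m_cases, n_alleles, error_sd, rng)
        c = pooled_count(m_controls, n_alleles, error_sd, rng)
        p_pool = chi2_pvalue(a, n_alleles - a, c, n_alleles - c)
        p_ind = chi2_pvalue(m_cases, n_alleles - m_cases,
                            m_controls, n_alleles - m_controls)
        for crit in criteria:
            positive = False
            if p_pool <= crit:  # criterion met: genotype individually
                genotyped[crit] += 1
                positive = p_ind <= alpha_ind
            if h1:
                outcome = "correct positive" if positive else "false negative"
            else:
                outcome = "false positive" if positive else "correct negative"
            total_loss[crit] += loss[outcome]
    return {crit: (genotyped[crit] / n_rep, total_loss[crit] / n_rep)
            for crit in criteria}


criteria = [0.001, 0.01, 0.05, 0.2, 0.5, 1.0]
under_h0 = simulate(93, 0.5, 0.5, 0.05, criteria)
under_h1 = simulate(93, 0.5, 0.7, 0.05, criteria)
for crit in criteria:
    print(crit, under_h0[crit], under_h1[crit])
```

Each criterion is a significance level for the pooled χ² test; a criterion of 1.0 means every SNP is genotyped individually. With the 0-1 loss, the expected loss under H0 is the false-positive rate and under H1 the false-negative rate.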

Possible solution - Example

! p(cases | H0) = p(controls) = 0.5
! p(cases | H1) = 0.7, error ~ N(0, 0.05)
! 93 cases and controls for α = 0.05 and (1 - β) = 0.8
! 0-1 loss function:

                   State of nature
                   H0      H1
Decision    H0     0       1
            H1     1       0
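The sample size of 93 is consistent with the usual normal-approximation formula for comparing two proportions (two-sided test, no continuity correction). A sketch under that assumption; the function name is hypothetical, and the formula returns observations per group without settling whether these are counted as individuals or alleles:

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided comparison
    of two proportions (no continuity correction)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_b = z.inv_cdf(power)          # quantile corresponding to power 1 - beta
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)


print(n_per_group(0.5, 0.7))  # 93
```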


Individual genotyping if the p-value of the χ² statistic does not exceed the significance level for the pooled data analysis

[Figure: genotyping rate (0 to 1) under H0 and under H1 versus significance level for analysis of pooled data (0 to 1)]

[Figure: expected loss (0 to 0.5) under H0 and under H1 versus significance level for analysis of pooled data (0 to 1)]

Possible solution - Decision rule

! Bayes risk principle:
# selects decision rule with minimal average loss
# requires assumptions about a priori probabilities
→ for each SNP, probabilities of H0 and H1 have to be determined


! Minimax principle:
# selects decision rule with minimal maximum loss
# in the given setting (0-1 loss function, type II error rate higher than type I error rate): minimax amounts to maximization of power
→ a different loss function may be more reasonable
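Both principles can be applied to a table of expected losses per criterion. A sketch with hypothetical, purely illustrative numbers (not results of the study); `bayes_rule` and `minimax_rule` are hypothetical helpers, and `prior_h1` is an assumed a priori probability of H1 for a SNP:

```python
def bayes_rule(risks, prior_h1):
    """Criterion with minimal average loss, weighting H0 by 1 - prior_h1 and H1 by prior_h1."""
    return min(risks, key=lambda c: (1 - prior_h1) * risks[c][0] + prior_h1 * risks[c][1])


def minimax_rule(risks):
    """Criterion with minimal maximum expected loss over H0 and H1."""
    return min(risks, key=lambda c: max(risks[c]))


# Hypothetical expected losses (under H0, under H1) per pooled-analysis criterion
risks = {0.01: (0.01, 0.40), 0.05: (0.02, 0.25), 0.2: (0.04, 0.10), 1.0: (0.05, 0.02)}
print(minimax_rule(risks))      # 1.0
print(bayes_rule(risks, 0.01))  # 0.01
```

With these numbers the minimax rule genotypes every SNP (criterion 1.0), i.e. it maximizes power, while the Bayes rule with a small prior probability of H1 picks the strictest criterion.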

! Possible definition of loss function with respect to accuracy, cost, and publicity


[Figure: decision tree with outcomes labeled by accuracy, result, and cost]
# H1 → I yes → correct, positive, expensive or false, negative, expensive
# H0 → I yes → false, positive, expensive or correct, negative, expensive
# H1 → I no → false, negative, cheap
# H0 → I no → correct, negative, cheap

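Possible solution - Construction of loss functions

! Assuming that individual genotyping is necessary for a positive result, 6 outcomes are possible:
# high cost (individual genotyping): correct positive, false positive, correct negative, false negative
# low cost (no individual genotyping): correct negative, false negative; a positive result is not possible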


! Outcome with highest loss: l1 = false, negative, expensive
→ L(l1) = 1
! Outcome with lowest loss: l2 = correct, positive, expensive
→ L(l2) = 0
! Determine the loss L(l3) of l3 = false, negative, cheap


! “Betting” situation:
# assume l3 is given
# you are asked to “pay” it in order to play the gamble: l3 (false, negative, cheap) for l1 (false, negative, expensive) with probability α and l2 (correct, positive, expensive) with probability (1 - α)
# At which α do you gamble?


! Assume you are willing to pay it to have a 10% probability of l2 (correct, positive, expensive) and a 90% probability of l1 (false, negative, expensive)
→ α = 0.9 → L(l3) = 0.9
! Same procedure for the other outcomes

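The betting argument follows the standard-gamble idea: at the point of indifference, the sure outcome l3 receives the expected loss of the gamble, α·L(l1) + (1 - α)·L(l2), which equals α when L(l1) = 1 and L(l2) = 0. A minimal sketch; `gamble_loss` is a hypothetical helper:

```python
def gamble_loss(alpha, loss_worst=1.0, loss_best=0.0):
    """Expected loss of the gamble: worst outcome l1 with probability alpha,
    best outcome l2 with probability 1 - alpha."""
    return alpha * loss_worst + (1 - alpha) * loss_best


# Indifference point from the example: 90% l1, 10% l2
alpha_indifferent = 0.9
L_l3 = gamble_loss(alpha_indifferent)
print(L_l3)  # 0.9
```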
Possible solution - Remaining questions

! How large is the measurement error due to pooling? What is its distribution?

! How can reasonable loss functions be determined?

! Which decision rule should be employed? If necessary, how can we determine a priori probabilities?
