Analysis of pooled DNA - a decision-theoretic model Inke R. König and Andreas Ziegler Institute of Medical Biometry and Statistics Medical University of Lübeck [email protected] BAMM 2001, Heidelberg


Analysis of pooled DNA - a decision-theoretic model

Inke R. König and Andreas Ziegler
Institute of Medical Biometry and Statistics
Medical University of Lübeck
[email protected]

BAMM 2001, Heidelberg


Overview

- Association analyses using cases and independent controls
- Association analyses using pooled DNA of cases and independent controls
- Problem setting: association studies for asthma affection status
- Possible solution: a decision-theoretic approach


Association analyses in case-control designs - Principle

- Measure of deviation from independent transmission of a locus and a disease

[Figure: marker alleles M and m in the disease and no-disease groups]

Association = a specific allele at the genetic locus is more frequent in the group of affected individuals than in the group of unaffected individuals


Association analyses in case-control designs - Principle

- Disease-causing mutation D occurred in a specific haplotype MD
- Haplotype significantly more frequent in cases than expected: P(MD) - P(M)P(D) ≠ 0
- Deviation from expected independence = linkage disequilibrium
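The deviation P(MD) - P(M)P(D) can be computed directly from haplotype and allele frequencies. The following is a minimal sketch; the function name and the frequencies are hypothetical and chosen only for illustration.

```python
def linkage_disequilibrium(p_md: float, p_m: float, p_d: float) -> float:
    """Return P(MD) - P(M) * P(D); a value of 0 means independence."""
    return p_md - p_m * p_d


# Hypothetical frequencies, not taken from any study.
d_independent = linkage_disequilibrium(p_md=0.06, p_m=0.3, p_d=0.2)
d_associated = linkage_disequilibrium(p_md=0.15, p_m=0.3, p_d=0.2)
```

With these values the first haplotype occurs exactly as often as expected under independence, while the second is more frequent than expected.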


Association analyses in case-control designs - Advantage and disadvantage

Advantage:
- Easier ascertainment of samples
- Data of controls can be reused
- Greater statistical power
→ Greater cost effectiveness

Disadvantage:
- Possible population stratification


Association analyses in case-control designs - Challenges

- Small effects likely in complex diseases
- Possibility of locus heterogeneity
- Possibility of gene-gene and gene-environment interactions
- Low significance thresholds required
→ Large sample sizes necessary
→ Cost-effective study designs required


DNA pooling - Idea

- Equal amounts of DNA from individual samples are pooled before genotyping
- Creation of one pool for cases and one for controls



DNA pooling - Idea

- Comparison of allele frequency estimates in the two pools
- Requires high accuracy and reproducibility, as differences between cases and controls might be small
- Reduction of cost and amount of DNA


DNA pooling - Expected reduction

- Assume 1,000 cases and 1,000 controls are genotyped at 10,000 SNPs = 2⋅10^7 genotypings
- Using 384-well plates for PCR ≈ 52,084 assay plates necessary
- DNA pooling and replicating each PCR 10 times = 2⋅10^5 genotypings = 521 assay plates
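The arithmetic behind this comparison can be checked in a few lines. This sketch assumes one well per genotyping and rounds up to whole plates; the variable names are illustrative.

```python
import math

cases, controls, snps = 1_000, 1_000, 10_000
wells_per_plate = 384
pcr_replicates = 10  # replicate PCRs per pool and SNP

# Individual genotyping: every person at every SNP.
individual_genotypings = (cases + controls) * snps
# Pooling: one case pool and one control pool, each replicated per SNP.
pooled_genotypings = 2 * snps * pcr_replicates

individual_plates = math.ceil(individual_genotypings / wells_per_plate)
pooled_plates = math.ceil(pooled_genotypings / wells_per_plate)
```

Under these assumptions pooling needs roughly one hundredth of the genotypings and plates, which agrees with the figures above up to rounding.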


DNA pooling - Procedure

- Microsatellite markers or SNPs
- Case-control and family samples
- Estimation of allele frequencies
  - by peak heights (e.g., Barcellos et al. 1997 Am J Hum Genet 61:734-47)
  - or from allele image patterns (e.g., Daniels et al. 1998 Am J Hum Genet 62:1189-97)


DNA pooling - Errors

- Possible sources of error:
  - low reproducibility of PCR results
  - marker-specific unequal amplification of alleles
  - stutter bands in microsatellite markers
  - residual differences


DNA pooling - Reproducibility

- Different results in multiple PCRs of the same pool
- Recommendation to
  - genotype each pool 5, 10, or 16 times
  - construct triplicate pools and duplicate PCRs
  → estimate allele frequency as mean or median of obtained values


DNA pooling - Unequal allele amplification

- Differential amplification of alleles
- Recommended corrections:
  - genotype heterozygotes individually
  - determine ratio k of peak heights, expected to equal 1
  - correct frequency f of allele A to f = A/(A+kB), with A and B as observed frequencies
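The correction f = A/(A+kB) is a one-line computation. This is a minimal sketch with a hypothetical function name and made-up input values; k is assumed to have been estimated beforehand from individually genotyped heterozygotes.

```python
def corrected_frequency(a: float, b: float, k: float) -> float:
    """Corrected frequency of allele A: f = A / (A + k * B).

    a, b: observed frequencies (for example, peak heights) of alleles A and B
    k:    peak-height ratio measured in heterozygotes; k = 1 means no correction
    """
    return a / (a + k * b)


# Made-up observations for illustration only.
uncorrected = corrected_frequency(0.6, 0.4, k=1.0)
corrected = corrected_frequency(0.6, 0.4, k=1.5)
```

With k = 1 the observed frequency is returned unchanged; with k = 1.5 the same observation is corrected downward.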


DNA pooling - Procedure

- Performed in 2 stages:
  - initial screen for association using pooled DNA
  - individual genotyping of markers with positive results


DNA pooling - Follow up with individual genotyping

- Corrections not sufficiently accurate for complex diseases
  → too many false positives and false negatives
- Possible population stratification


DNA pooling - Follow up with individual genotyping

- Tests on allele frequencies appropriate only under Hardy-Weinberg equilibrium (Sasieni 1997 Biometrics 53:1253-61)
- No information about:
  - haplotypes
  - imprinting
  - specific genetic models
  - ...


Problem setting - Previous results on Asthma

- Genome scans for linkage of asthma affection status point to chromosome 6p in:
  - Caucasian families (CSGA 1997 Nat Genet 15:389-392; Xu et al. 2001 Am J Hum Genet 68:1437-1446)
  - German and Swedish families (Wjst et al. 1999 Genomics 58:1-8)
  - Japanese families (Yokouchi et al. 2000 Genomics 66:152-160)


Problem setting - Plan of association studies

- Conduct a case-control study to analyze association in the identified region
- Use of at least 1,400 SNPs spaced less than 40 kb apart
- Use of pooled DNA as initial screen
- Which SNPs should be followed up with individual genotyping?


Problem setting - Previous criteria for follow up

- Mostly estimation of allele frequencies only (Hoogendoorn et al. 2000 Hum Genet 107:488-93)
- Modification of test statistic ...
  - without follow up (Le Hellard et al. 2001)
  - or without accounting for errors due to pooling (Risch & Teng 1998 Genome Res 8:1273-88)


Problem setting - Previous criteria for follow up

- Pooled analysis as initial screen, criterion ...
  - "more liberal" than significance criterion (Daniels et al. 1998 Am J Hum Genet 62:1189-97)
  - or not explained (Barcellos et al. 1997 Am J Hum Genet 61:734-47)


Possible solution - Decision model

- Aim: determine the criterion that is optimal for a given situation
- Modeling as a decision-theoretic problem in a decision chart


Possible solution - Decision tree

- State of nature
- Validity of H0 or H1 depends on a priori probability

[Diagram: H0 or H1 is valid → Analysis of pooled DNA (+ / ? / -) → Decision for individual genotyping (yes / no) → Analysis of individual DNA (+ / -) → Outcome]


Possible solution - Decision tree

- Result of pooled analysis is positive, inconclusive, or negative
- Result depends on power (sample size, effect size)
- Result depends on error of DNA pooling (size, distribution)


Possible solution - Decision tree

- Action
- Decision for or against individual genotyping at each marker
- Criterion to be established


Possible solution - Decision tree

- Result of individual genotyping is positive or negative
- Result depends on power
- Result depends on criterion for previous decision


Possible solution - Decision tree

- Possible outcomes to be evaluated
- Evaluation determines previous decisions


Possible solution - Decision tree

[Figure: full decision tree. For each state of nature (H1 or H0), the pooled analysis (P) yields Test +, Test ?, or Test -; each result is followed by the decision for (I yes) or against (I no) individual genotyping; individual genotyping yields Test + or Test -.
Outcomes under H1: I yes and Test + = correct positive; I yes and Test - = false negative; I no = false negative.
Outcomes under H0: I yes and Test + = false positive; I yes and Test - = correct negative; I no = correct negative.]


Possible solution - Implementation

- Aim: calculate loss (utility) under H0 and H1
- Specify:
  - number of cases and controls
  - allele frequency in cases and controls under H0 and H1
  - size and distribution of measurement error
  - loss function for: false positive, false negative, correct positive, correct negative


Possible solution - Implementation

- For a given number of replications, the program:
  - generates individual genotypes for cases and controls under H0 and H1
  - generates pooled genotypes by adding measurement error
  - calculates the test statistic for pooled genotypes and compares it with each criterion
  - if the criterion is met, calculates the test statistic for individual genotypes
  - calculates expected loss function values
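The steps above can be sketched as a small Monte Carlo simulation. This is not the authors' program but a minimal illustration under stated assumptions: each person contributes two independent alleles, the pooling error is additive and normal with standard deviation `error_sd`, the χ² test for the pooled data treats the pooled frequencies as if they came from allele counts, and the final loss is the 0-1 loss. All function and parameter names are hypothetical.

```python
import math
import random


def chi2_p_value(f_cases, f_controls, n_alleles):
    """p-value of the 1-df chi-square test for equal allele frequencies."""
    f_bar = (f_cases + f_controls) / 2
    variance = f_bar * (1 - f_bar) * 2 / n_alleles
    if variance <= 0:  # both frequencies 0 or both 1: no evidence of a difference
        return 1.0
    chi2 = (f_cases - f_controls) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def sample_frequency(rng, p, n_alleles):
    """Allele frequency in a random sample of n_alleles alleles."""
    return sum(rng.random() < p for _ in range(n_alleles)) / n_alleles


def clip(x):
    """Keep a frequency estimate inside [0, 1]."""
    return min(1.0, max(0.0, x))


def simulate(p_cases, p_controls, n_per_group, error_sd, criteria,
             alpha=0.05, replications=1000, seed=1):
    """Return {criterion: (genotyping rate, rate of positive final results)}."""
    rng = random.Random(seed)
    n_alleles = 2 * n_per_group  # assumption: two independent alleles per person
    followed = dict.fromkeys(criteria, 0)
    positive = dict.fromkeys(criteria, 0)
    for _ in range(replications):
        f_cases = sample_frequency(rng, p_cases, n_alleles)
        f_controls = sample_frequency(rng, p_controls, n_alleles)
        # Pooled estimate = sample frequency + measurement error.
        pool_cases = clip(f_cases + rng.gauss(0, error_sd))
        pool_controls = clip(f_controls + rng.gauss(0, error_sd))
        p_pooled = chi2_p_value(pool_cases, pool_controls, n_alleles)
        p_individual = chi2_p_value(f_cases, f_controls, n_alleles)
        for c in criteria:
            if p_pooled <= c:  # criterion met: follow up individually
                followed[c] += 1
                if p_individual <= alpha:
                    positive[c] += 1
    return {c: (followed[c] / replications, positive[c] / replications)
            for c in criteria}


criteria = [0.01, 0.05, 0.2, 0.5, 1.0]
under_h0 = simulate(0.5, 0.5, 93, 0.05, criteria)
under_h1 = simulate(0.7, 0.5, 93, 0.05, criteria)
# 0-1 loss: a positive result is an error under H0, a negative one under H1.
expected_loss_h0 = {c: under_h0[c][1] for c in criteria}
expected_loss_h1 = {c: 1 - under_h1[c][1] for c in criteria}
```

A criterion of 1.0 corresponds to genotyping every marker individually, so it serves as the reference against which stricter criteria trade genotyping cost for additional false negatives.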


Possible solution - Example

- p (cases | H0) = p (controls) = 0.5
- p (cases | H1) = 0.7, error ~ N(0, 0.05)
- 93 cases and controls for α = 0.05 and (1-β) = 0.8
- 0-1 loss function:

                  State of nature
                  H0    H1
  Decision  H0    0     1
            H1    1     0
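The sample size of 93 is consistent with the usual normal-approximation formula for comparing two proportions with a two-sided test. The sketch below is one standard formula that reproduces the number, not necessarily the calculation the authors used; the function name is hypothetical, and whether the unit is individuals or alleles is left open here.

```python
import math
from statistics import NormalDist


def n_per_group(p0: float, p1: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Sample size per group for a two-sided test of p0 against p1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    # Variance pooled under H0 for the critical value, unpooled under H1 for power.
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1)))
    return math.ceil(numerator ** 2 / (p1 - p0) ** 2)


n = n_per_group(0.5, 0.7)  # 93 under these assumptions
```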


Possible solution - Example

Individual genotyping is performed if the p-value of the χ² statistic does not exceed the significance level for the pooled data analysis

[Figure: genotyping rate (y-axis, 0 to 1) against the significance level for analysis of pooled data (x-axis, 0 to 1), with one curve under H0 and one under H1]


Possible solution - Example

Individual genotyping is performed if the p-value of the χ² statistic does not exceed the significance level for the pooled data analysis

[Figure: expected loss (y-axis, 0 to 0.5) against the significance level for analysis of pooled data (x-axis, 0 to 1), with one curve under H0 and one under H1]


Possible solution - Decision rule

- Bayes risk principle:
  - selects decision rule with minimal average loss
  - requires assumptions about a priori probabilities
  → for each SNP, probabilities of H0 and H1 have to be determined


Possible solution - Decision rule

- Minimax principle:
  - selects decision rule with minimal maximum loss
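The two principles differ only in how the expected losses under H0 and H1 are combined. The sketch below applies both to a table of expected losses per candidate criterion; the table values, the priors, and the function names are hypothetical and serve only to show the mechanics.

```python
# Hypothetical expected losses per candidate criterion: (under H0, under H1).
risks = {
    0.01: (0.01, 0.40),
    0.05: (0.03, 0.25),
    0.20: (0.05, 0.21),
    1.00: (0.05, 0.20),
}


def bayes_rule(risks, prior_h1):
    """Criterion with minimal average loss for a given prior probability of H1."""
    def average_loss(c):
        loss_h0, loss_h1 = risks[c]
        return (1 - prior_h1) * loss_h0 + prior_h1 * loss_h1
    return min(risks, key=average_loss)


def minimax_rule(risks):
    """Criterion whose worst-case expected loss is smallest."""
    return min(risks, key=lambda c: max(risks[c]))


strict_choice = bayes_rule(risks, prior_h1=0.01)   # association judged unlikely
liberal_choice = bayes_rule(risks, prior_h1=0.5)   # association judged plausible
minimax_choice = minimax_rule(risks)
```

In this made-up table a small prior probability of H1 favors the strictest criterion, whereas the minimax rule, driven by the larger loss under H1, favors the most liberal one.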


Possible solution - Example

Individual genotyping is performed if the p-value of the χ² statistic does not exceed the significance level for the pooled data analysis

[Figure: expected loss (y-axis, 0 to 0.5) against the significance level for analysis of pooled data (x-axis, 0 to 1), with one curve under H0 and one under H1]


Possible solution - Decision rule

- Minimax principle:
  - selects decision rule with minimal maximum loss
  - in the given setting with 0-1 loss function and higher type II error rate: maximization of power
  → a different loss function may be more reasonable
- Possible definition of loss function with respect to accuracy, cost, and publicity


Possible solution - Decision tree

[Figure: full decision tree with cost added to each outcome. For each state of nature (H1 or H0), the pooled analysis (P) yields Test +, Test ?, or Test -; each result is followed by the decision for (I yes) or against (I no) individual genotyping; individual genotyping yields Test + or Test -.
Outcomes under H1: I yes and Test + = correct, positive, expensive; I yes and Test - = false, negative, expensive; I no = false, negative, cheap.
Outcomes under H0: I yes and Test + = false, positive, expensive; I yes and Test - = correct, negative, expensive; I no = correct, negative, cheap.]


Possible solution - Construction of loss functions

- Assuming that individual genotyping is necessary for a positive result, 6 outcomes are possible:

  Cost    Positive            Negative
          Correct   False     Correct   False
  High    x         x         x         x
  Low     -         -         x         x

  (x = possible outcome, - = not possible)


Possible solution - Construction of loss functions

- Outcome with highest loss = l1 = (false, negative, expensive)
  → L(l1) = 1
- Outcome with lowest loss = l2 = (correct, positive, expensive)
  → L(l2) = 0
- Determine loss of l3 = (false, negative, cheap) = L(l3)


Possible solution - Construction of loss functions

- "Betting" situation:
  - assume l3 is given
  - you are asked to "pay" it in order to play the gamble: (false, negative, cheap) in exchange for (false, negative, expensive) with probability α and (correct, positive, expensive) with probability (1-α)
  - At which α do you gamble?


Possible solution - Construction of loss functions

- Assume you are willing to pay it to have 10% probability for (correct, positive, expensive) and 90% probability for (false, negative, expensive)
  → α = 0.9 → L(l3) = 0.9
- Same procedure for other outcomes
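The elicited loss is the expected loss of the gamble at the point of indifference, L(l3) = α·L(l1) + (1-α)·L(l2). A minimal sketch with a hypothetical function name, using the anchor values L(l1) = 1 and L(l2) = 0 from the construction above:

```python
def elicited_loss(alpha: float, worst_loss: float = 1.0, best_loss: float = 0.0) -> float:
    """Loss of a sure outcome judged equivalent to a gamble that yields the
    worst outcome with probability alpha and the best with 1 - alpha."""
    return alpha * worst_loss + (1 - alpha) * best_loss


loss_l3 = elicited_loss(0.9)
```

Because the anchors are fixed at 1 and 0, the elicited loss of any intermediate outcome equals its indifference probability α.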


Possible solution - Remaining questions

- How large is the measurement error that is due to the pooling? What is its distribution?
- How can reasonable loss functions be determined?
- Which decision rule should be employed? If necessary, how can we determine a priori probabilities?