ANALYSIS OF SELECTIVE DNA POOLING DATA IN FOX

Joanna Szyda, Magdalena Zatoń-Dobrowolska, Heliodor Wierzbicki, Anna Rząsa

MAIN OBJECTIVES:

• ASSESS POLYMORPHISM OF MICROSATELLITES
• IDENTIFY MARKER-TRAIT ASSOCIATIONS

METHODOLOGICAL OBJECTIVES:

• TOOLS FOR THE ANALYSIS OF SPARSE DATA

SELECTIVE (INDIVIDUAL) GENOTYPING

MATERIAL METHODS RESULTS CONCLUSIONS

[Figure: two groups labelled qq and QQ]

• MORE POWER
• STANDARD (LINEAR) MODELS NOT VALID

SELECTIVE DNA POOLING

[Figure: alleles at markers M1-M4, with a QTL, carried by the individuals in the qq and QQ pools]

pool   M1            M2            M3            M4
qq     m1 M1 m1 m1   m2 m2 m2 m2   m3 m3 m3 m3   M4 M4 m4 m4
QQ     M1 m1 M1 M1   M2 M2 M2 M2   M3 M3 M3 M3   m4 m4 M4 M4

SELECTIVE DNA POOLING

• CHEAP: ~18%-60% more efficient (Barratt et al. 2002)
• MORE POWERFUL: ~10%-70% fewer individuals
• HIGH TECHNICAL ERROR:
  DNA pool formation (DNA quantification)
  DNA amplification (differential amplification, shadow bands)
• POOLING POPULATIONS: no relationship information, testing for association
• POOLING HALF-SIBS: partial relationship information, testing for linkage

ANIMALS

POLAR FOX (Alopex lagopus)

NORWEGIAN TYPE “LARGE”

FINNISH TYPE “SMALL”

6377

MARKERS

MARKER       DOG GENOME (chromosome)   FOX GENOME   HET. (heterozygosity)
REN112I02    01                        ?            0.76
C02.342      02                        ?            0.77
C03.629      03                        ?            0.76
FH2732       04                        ?            0.86
C05.771      05                        ?            0.81
FH2734       06                        ?            0.82
C08.410      08                        ?            0.79
G06401       09                        ?            0.64
REN153O12    12                        ?            0.76
REN227M12    13                        ?            0.74
FH2763       14                        ?            0.70
REN275L19    16                        ?            0.82
FH3047       17                        ?            0.77
REN100J13    20                        ?            0.83
REN128E21    22                        ?            0.70
LEI002       27                        ?            0.70
REN248F14    30                        ?            0.70
REN43H24     31                        ?            0.66
REN106I07    36                        ?            0.78
REN67C18     37                        ?            0.83

MARKERS

MARKER SELECTION CRITERIA:

• POLYMORPHISM
  number of alleles
  allele lengths
• AMPLIFICATION PROPERTIES
  temperature

?

MARKER ALLELE FREQUENCY IN POOLS

[Figures, three panels across two slides: allele FREQUENCY (y-axis, 0.0 to 1.0) by MARKERS (x-axis, M1 to M6)]

• LOW POLYMORPHISM WITHIN EACH POOL
• “POOL-SPECIFIC” ALLELES
• POOR CORRESPONDENCE BETWEEN REPLICATES

BINOMIAL DISTRIBUTION

        allele
pool    147   149   155   157   161
1         0     0    14    54     0
2         0     0     0    53     5
3        16    66     0     0     0
4         0     0    62    10     0

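• BINOMIAL DISTRIBUTION

Odds Ratio, Logistic Regression

\[ N_{ij} \sim \mathrm{Bin}(N_i, \pi_{ij}), \qquad P(N_{ij}) = \binom{N_i}{N_{ij}}\, \pi_{ij}^{N_{ij}}\, (1 - \pi_{ij})^{N_i - N_{ij}} \]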

        allele
pool    147   149   155   157   161
1       n11   n12
2       n21   n22
3       n31   n32
4       n41   n42
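The binomial model above can be tried out on the example table. The following is a minimal sketch, assuming the table entries are treated as allele counts per pool and that a single allele is contrasted with all others; the function names are hypothetical and not taken from the slides.

```python
from math import comb, log

# Counts copied from the example table (rows: pools 1-4, columns: alleles).
ALLELES = [147, 149, 155, 157, 161]
COUNTS = {
    1: [0, 0, 14, 54, 0],
    2: [0, 0, 0, 53, 5],
    3: [16, 66, 0, 0, 0],
    4: [0, 0, 62, 10, 0],
}

def allele_frequency(pool, allele):
    """Maximum-likelihood estimate pi_ij = N_ij / N_i for one allele in one pool."""
    row = COUNTS[pool]
    return row[ALLELES.index(allele)] / sum(row)

def binomial_log_likelihood(n_ij, n_i, pi):
    """log P(N_ij = n_ij) under Bin(n_i, pi); handles the degenerate pi = 0 or 1."""
    if pi in (0.0, 1.0):
        return 0.0 if n_ij == n_i * pi else float("-inf")
    return log(comb(n_i, n_ij)) + n_ij * log(pi) + (n_i - n_ij) * log(1 - pi)

# Estimated allele frequencies per pool; most cells are exactly 0 (sparse data).
for pool in COUNTS:
    print(pool, [round(allele_frequency(pool, a), 3) for a in ALLELES])
```

Printing the estimates shows how few alleles are shared between pools.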

ODDS RATIO

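\[ \ln(\mathrm{OR}) = \ln \frac{N(A_1)\, N(\bar{A}_2)}{N(\bar{A}_1)\, N(A_2)} \]

• distribution (under no association): \( \ln(\mathrm{OR}) / \hat{\sigma}_{\ln \mathrm{OR}} \sim N(0,1) \)
• variance: \( \hat{\sigma}^2_{\ln \mathrm{OR}} = \frac{1}{N(A_1)} + \frac{1}{N(\bar{A}_1)} + \frac{1}{N(A_2)} + \frac{1}{N(\bar{A}_2)} \)
• confidence intervals: \( \ln(\mathrm{OR}) \pm z\, \hat{\sigma}_{\ln \mathrm{OR}} \)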
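As an illustration of the three quantities above, here is a short sketch that computes ln(OR), its standard error, a confidence interval and a two-sided p-value from four counts. It assumes A denotes one allele, "not A" the remaining alleles, and 1 and 2 index the two pools being compared; the function name and argument names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def log_odds_ratio(n_a1, n_not_a1, n_a2, n_not_a2, alpha=0.01):
    """Return ln(OR), its standard error, a (1 - alpha) CI and a two-sided p-value.

    All four counts must be positive; zero cells need a sparse-data correction.
    """
    ln_or = log((n_a1 * n_not_a2) / (n_not_a1 * n_a2))
    se = sqrt(1 / n_a1 + 1 / n_not_a1 + 1 / n_a2 + 1 / n_not_a2)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    p_value = 2 * (1 - NormalDist().cdf(abs(ln_or) / se))
    return ln_or, se, (ln_or - z * se, ln_or + z * se), p_value

# Pools 1 and 4 of the example table, allele 155 (A) against allele 157 (not A).
ln_or, se, ci, p = log_odds_ratio(14, 54, 62, 10)
print(round(ln_or, 3), round(se, 3), tuple(round(x, 3) for x in ci), p)
```

The default `alpha=0.01` mirrors the 0.01 intervals shown on the CI slide; any other level can be passed in.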

ODDS RATIO IN SPARSE DATA

\[ \ln(\mathrm{OR}) = \ln \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}} \]

        allele
pool    147   149   155   157   161
1         0     0    14    54     0
2         0     0     0    53     5
3        16    66     0     0     0
4         0     0    62    10     0

SPARSE DATA PROBLEM

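\[ \ln(\mathrm{OR}) = \ln \frac{(n_{11} + c)(n_{22} + c)}{(n_{12} + c)(n_{21} + c)} \]

• c = 0: standard
• c = 0.5: Haldane (1956)
• \( c_{ij} = 2\, n_{i\cdot}\, n_{\cdot j} / n^2 \): Bishop et al. (1975)
• Agresti (1999):
  c = 0.5 not valid for ln(OR) > 4
  c_ij not valid for ln(OR) > 8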
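The corrections listed above can be compared on a 2x2 sub-table that contains a zero. This is a minimal sketch, assuming the cell-specific constant of Bishop et al. is added to the cell with the same indices; the function name and the `correction` keywords are hypothetical.

```python
from math import log

def log_or_corrected(table, correction="haldane"):
    """ln(OR) of a 2x2 table [[n11, n12], [n21, n22]] with a constant added to each cell."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    rows = (n11 + n12, n21 + n22)   # n_i.
    cols = (n11 + n21, n12 + n22)   # n_.j
    if correction == "none":        # c = 0, the standard estimator
        c = [[0.0, 0.0], [0.0, 0.0]]
    elif correction == "haldane":   # c = 0.5 in every cell
        c = [[0.5, 0.5], [0.5, 0.5]]
    elif correction == "bishop":    # c_ij = 2 * n_i. * n_.j / n^2
        c = [[2 * rows[i] * cols[j] / n**2 for j in range(2)] for i in range(2)]
    else:
        raise ValueError(f"unknown correction: {correction}")
    num = (n11 + c[0][0]) * (n22 + c[1][1])
    den = (n12 + c[0][1]) * (n21 + c[1][0])
    return log(num / den)

# Pools 1 and 2 of the example table, alleles 155 and 157: one cell is zero.
sparse = [[14, 54], [0, 53]]
for name in ("haldane", "bishop"):
    print(name, round(log_or_corrected(sparse, name), 3))
```

With `correction="none"` the same table raises `ZeroDivisionError`, which is the sparse data problem the slide refers to.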

ODDS RATIO: P-values

[Figure: P (y-axis, 0.00 to 1.00) by MARKERS (x-axis, M1 to M6), for DISCORDANT and CONCORDANT pools]

ODDS RATIO - CI

[Figure, two panels: ln(OR) (y-axis, -4 to 8) by MARKERS (x-axis, M1 to M6); 0.01 CI FOR “CONCORDANT” POOLS and 0.01 CI FOR “DISCORDANT” POOLS]

ODDS RATIO - REMARKS

• many 2x2 comparisons (theoretically) possible: from 18 (M4) to 60 (M1, M6)
• significance pattern often inconsistent between alleles (sparse data)
• difficult to summarize ORs with a single value

marker        result
C03.629       association
C05.771       association
C08.410       ?
REN227M12     no association
REN275L19     ? (sparse data)
LEI002        ? (sparse data)
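The slides do not say how the 18 and 60 comparisons are counted. One reading that reproduces both numbers, offered here only as an assumption, is that each 2x2 table pairs two of four pools with two of the marker's alleles, with 3 alleles at M4 and 5 alleles at M1 and M6 (allele numbers inferred, not stated). The function name is hypothetical.

```python
from math import comb

def n_two_by_two_tables(n_pools, n_alleles):
    """Number of 2x2 sub-tables obtained by choosing 2 pools and 2 alleles."""
    return comb(n_pools, 2) * comb(n_alleles, 2)

print(n_two_by_two_tables(4, 3))  # 18
print(n_two_by_two_tables(4, 5))  # 60
```

Under this reading the number of tests grows quadratically with the number of alleles, which is one reason a single summary per marker is hard to obtain from pairwise odds ratios.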

FURTHER WORK

• use all table cells
• account for sparseness in testing
• multivariate logistic models

MULTINOMIAL DISTRIBUTION

        allele
pool    147   149   155   157   161
1         0     0    14    54     0
2         0     0     0    53     5
3        16    66     0     0     0
4         0     0    62    10     0

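• MULTINOMIAL DISTRIBUTION

Multivariate Logistic Regression

\[ (N_{i1}, \dots, N_{in}) \sim \mathrm{Mu}(N_i, \pi_{i1}, \dots, \pi_{in}) \]

\[ P(N_{i1}, \dots, N_{in}) = \frac{N_i!}{N_{i1}! \cdots N_{in}!}\; \pi_{i1}^{N_{i1}} \cdots \pi_{i(n-1)}^{N_{i(n-1)}} \Bigl(1 - \sum_{j=1}^{n-1} \pi_{ij}\Bigr)^{N_i - \sum_{j=1}^{n-1} N_{ij}} \]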

        allele
pool    147   149   155   157   161
1       n11   n12
2       n21   n22
3       n31   n32
4       n41   n42

        allele
pool    147   149   155   157   161
1       n11   n12   n13   n14   n15
2       n21   n22   n23   n24   n25
3       n31   n32   n33   n34   n35
4       n41   n42   n43   n44   n45
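Using all five cells of a pool at once, the multinomial log-likelihood can be evaluated as in the sketch below. It assumes the table entries are counts and uses the observed proportions as the maximum-likelihood estimates; the function name is hypothetical, and the uniform alternative is only a reference point, not a model from the slides.

```python
from math import lgamma, log

def multinomial_log_likelihood(counts, probs):
    """log P(N_i1..N_in = counts) under Mu(N_i, probs); zero counts contribute nothing."""
    n_i = sum(counts)
    # log of the multinomial coefficient N_i! / (N_i1! ... N_in!)
    ll = lgamma(n_i + 1) - sum(lgamma(c + 1) for c in counts)
    for c, p in zip(counts, probs):
        if c > 0:
            if p == 0.0:
                return float("-inf")  # observed allele with zero probability
            ll += c * log(p)
    return ll

pool_1 = [0, 0, 14, 54, 0]                       # pool 1 of the example table
pi_hat = [c / sum(pool_1) for c in pool_1]        # observed proportions (MLE)
uniform = [1 / len(pool_1)] * len(pool_1)         # hypothetical reference model

ll_hat = multinomial_log_likelihood(pool_1, pi_hat)
ll_uniform = multinomial_log_likelihood(pool_1, uniform)
print(round(ll_hat, 3), round(ll_uniform, 3))
```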

MODEL

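• GENERAL LOGISTIC MODEL

\[ \ln \frac{\pi_{ij}}{\pi_{in}} = \ln \frac{\pi_{ij}}{1 - \pi_{i1} - \dots - \pi_{i(n-1)}} = X\beta + e \]

• CONSIDERED MODELS FOR ALLELE FREQUENCIES

\[ \pi_{ij} = \frac{\exp(X\beta)}{1 + \exp(X\beta)} \]

1:  \( \beta^T = [\mu] \)
4:  \( \beta^T = [\alpha_1 \dots \alpha_n] \)
8:  \( \beta^T = [\alpha_{F1} \dots \alpha_{Fn}\ \alpha_{N1} \dots \alpha_{Nn}] \)
16: \( \beta^T = [\alpha_{11} \dots \alpha_{1n} \dots \alpha_{41} \dots \alpha_{4n}] \)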
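The link between allele frequencies and the logits of the general logistic model can be sketched as follows. This assumes the last allele is the reference category and, for more than two alleles, the standard multinomial-logit inverse in which the denominator sums over all non-reference alleles. The function names and the example frequencies are hypothetical.

```python
from math import exp, log

def baseline_logits(freqs):
    """ln(pi_ij / pi_in) for j = 1..n-1, taking the last allele as the reference."""
    ref = freqs[-1]
    return [log(p / ref) for p in freqs[:-1]]

def frequencies_from_logits(logits):
    """Inverse map: pi_ij = exp(eta_j) / (1 + sum_k exp(eta_k)); the reference gets 1 / (...)."""
    denom = 1.0 + sum(exp(eta) for eta in logits)
    return [exp(eta) / denom for eta in logits] + [1.0 / denom]

freqs = [0.10, 0.20, 0.30, 0.25, 0.15]   # hypothetical, all strictly positive
eta = baseline_logits(freqs)
back = frequencies_from_logits(eta)
print([round(x, 3) for x in eta])
print([round(x, 3) for x in back])
```

The logits are undefined when a frequency is zero, so sparse pools like those in the example table cannot be transformed cell by cell without some adjustment.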

TEST STATISTIC

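• MODEL SELECTION

\( \pi_{ij} \): observed frequencies (DATA)
\( \hat{\pi}_{ij} \): estimated frequencies (MODEL)

• POWER DIVERGENCE FAMILY, Cressie, Read (1984)

\[ D_\lambda = f(\pi_{ij}, \hat{\pi}_{ij}, \lambda) = \frac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{n.\mathrm{cells}} \pi_{ij} \left[ \left( \frac{\pi_{ij}}{\hat{\pi}_{ij}} \right)^{\lambda} - 1 \right] \sim \chi^2 \]

Pearson’s X2:

\[ D_{\lambda = 1} = \sum_{i=1}^{n.\mathrm{cells}} \frac{(\pi_{ij} - \hat{\pi}_{ij})^2}{\hat{\pi}_{ij}} \]

Likelihood Ratio Test:

\[ D_{\lambda \to 0} = 2 \sum_{i=1}^{n.\mathrm{cells}} \pi_{ij} \ln \frac{\pi_{ij}}{\hat{\pi}_{ij}} \]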
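The two named members of the power divergence family can be checked numerically with the sketch below. It is written for observed and expected counts with equal totals (an assumption made so that the statistic is on the usual chi-square scale), and the expected counts in the example are a hypothetical equal-frequency model. The function name is hypothetical.

```python
from math import log

def power_divergence(observed, expected, lam):
    """Cressie-Read statistic; lam = 1 gives Pearson's X2, lam = 0 the likelihood ratio.

    Expected counts must be positive; lam = -1 is not handled in this form.
    """
    if lam == 0:
        # Limit of the family as lam -> 0; cells with a zero count contribute 0.
        return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    total = sum(o * ((o / e) ** lam - 1) for o, e in zip(observed, expected))
    return 2 * total / (lam * (lam + 1))

observed = [14, 54]          # pool 1 of the example table, alleles 155 and 157
expected = [34.0, 34.0]      # hypothetical model: equal frequencies

pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(power_divergence(observed, expected, 1), 3), round(pearson, 3))
print(round(power_divergence(observed, expected, 0), 3))
```

The first line prints the same value twice, confirming that `lam = 1` reduces to Pearson's statistic when the totals agree.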

TEST STATISTIC

• NORMALISATION

\[ D_\lambda \sim \chi^2 \]

SPARSE DATA! INCREASING CELLS ASYMPTOTICS!

?

\[ T = \frac{D_\lambda - \mu_D}{\sigma_D} \sim N(0,1) \]

TEST STATISTIC

• ANALYTICAL
  Osius, Rojek (1989): D(λ=1)
  Farrington (1996): D(λ=1) + ...
  Copas (1989): a*D(λ=1)
• EMPIRICAL: Bootstrap, Jackknife
• EVALUATION OF REAL DATA
• NORMAL PROPERTIES: simulation

D ?

D ?
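For the empirical route, one possible sketch is a parametric bootstrap: simulate tables from the fitted cell probabilities, compute the statistic on each, and use the simulated mean and standard deviation to normalise the observed value. This is an illustration under stated assumptions, not the procedure used in the talk; it takes Pearson's statistic (λ = 1), requires strictly positive cell probabilities, and all names and settings are hypothetical.

```python
import random
from statistics import mean, stdev

def pearson(observed, expected):
    """Pearson's X2 for matching lists of observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def bootstrap_normalised_statistic(observed, probs, n_boot=2000, seed=1):
    """T = (D - mean_D) / sd_D, with mean and sd of D estimated by parametric bootstrap."""
    rng = random.Random(seed)
    n = sum(observed)
    expected = [n * p for p in probs]
    d_obs = pearson(observed, expected)
    cells = range(len(probs))
    d_boot = []
    for _ in range(n_boot):
        # Draw n observations from the model and tabulate them per cell.
        draws = rng.choices(cells, weights=probs, k=n)
        sample = [draws.count(j) for j in cells]
        d_boot.append(pearson(sample, expected))
    return (d_obs - mean(d_boot)) / stdev(d_boot)

# Pool 1 of the example table against a hypothetical equal-frequency model.
t_far = bootstrap_normalised_statistic([14, 54], [0.5, 0.5])
t_near = bootstrap_normalised_statistic([34, 34], [0.5, 0.5])
print(round(t_far, 2), round(t_near, 2))
```

A fixed seed keeps the example reproducible; in practice the number of bootstrap samples and the choice of λ would need to be evaluated, which is the open question the slide raises.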

LITERATURE

• Agresti, A. (1990) Categorical data analysis. New York, Chichester, Brisbane, Toronto, Singapore: John Wiley & Sons.
• Agresti, A. (1999) On logit confidence intervals for the odds ratio with small samples. Biometrics 55:597-602.
• Barratt, B.J., Payne, F., Rance, H.E., Nutland, S., Todd, J.A., Clayton, D.G. (2002) Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Annals of Human Genetics 66:393-405.
• Bishop, Y.M.M., Fienberg, S.E., Holland, P. (1975) Discrete multivariate analysis. Cambridge, Massachusetts: MIT Press.
• Copas, J.B. (1989) Unweighted sum of squares test for proportions. Applied Statistics 38:71-80.
• Cressie, N.A.C., Read, T.R.C. (1984) Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society Ser. B 46:440-464.
• Farrington, C.P. (1996) On assessing goodness of fit of generalized linear models to sparse data. Journal of the Royal Statistical Society Ser. B 58:349-360.
• Haldane, J.B.S. (1956) The estimation and significance of the logarithm of a ratio of frequencies. Annals of Human Genetics 20:309-311.
• Osius, G., Rojek, D. (1989) Normal goodness-of-fit tests for parametric multinomial models with large degrees of freedom. Fachbereich Mathematik/Informatik, Universitaet Bremen. Mathematik Arbeitspapiere 36: