ANALYSIS OF SELECTIVE DNA POOLING DATA
IN FOX
Joanna Szyda, Magdalena Zatoń-Dobrowolska,
Heliodor Wierzbicki, Anna Rząsa
MAIN OBJECTIVES:
• ASSESS POLYMORPHISM OF MICROSATELLITES
• IDENTIFY MARKER-TRAIT ASSOCIATIONS
METHODOLOGICAL OBJECTIVES:
• TOOLS FOR THE ANALYSIS OF SPARSE DATA
SELECTIVE (INDIVIDUAL) GENOTYPING
MATERIAL METHODS RESULTS CONCLUSIONS
[Figure: qq and QQ groups]
• MORE POWER
• STANDARD (LINEAR) MODELS NOT VALID
SELECTIVE DNA POOLING
[Figure: qq and QQ pools; haplotypes with QTL alleles and marker loci M1 to M4]
SELECTIVE DNA POOLING
• CHEAP: ~18%-60% more efficient (Barratt et al. 2002)
• MORE POWERFUL: ~10%-70% fewer individuals required
• HIGH TECHNICAL ERROR:
DNA pool formation (DNA quantification)
DNA amplification (differential amplification, shadow bands)
• POOLING POPULATIONS: no relationship information; testing for association
• POOLING HALF-SIBS: partial relationship information; testing for linkage
ANIMALS
POLAR FOX (Alopex lagopus)
NORWEGIAN TYPE “LARGE”
FINNISH TYPE “SMALL”
6377
MARKERS
MARKER      DOG GENOME  FOX GENOME  HET.
REN112I02   01          ?           0.76
C02.342     02          ?           0.77
C03.629     03          ?           0.76
FH2732      04          ?           0.86
C05.771     05          ?           0.81
FH2734      06          ?           0.82
C08.410     08          ?           0.79
G06401      09          ?           0.64
REN153O12   12          ?           0.76
REN227M12   13          ?           0.74
FH2763      14          ?           0.70
REN275L19   16          ?           0.82
FH3047      17          ?           0.77
REN100J13   20          ?           0.83
REN128E21   22          ?           0.70
LEI002      27          ?           0.70
REN248F14   30          ?           0.70
REN43H24    31          ?           0.66
REN106I07   36          ?           0.78
REN67C18    37          ?           0.83
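The HET. column is presumably the expected heterozygosity of each marker, the usual index of microsatellite polymorphism. As a minimal sketch, assuming the standard definition H = 1 - Σ p_k² (the slides do not say how HET. was obtained), H can be computed from allele counts. The counts below are the column totals of the four-pool example table on the BINOMIAL DISTRIBUTION slide; the helper name is hypothetical.

```python
def expected_heterozygosity(counts):
    """Expected heterozygosity H = 1 - sum(p_k^2) from a list of allele counts.

    Assumption: HET. is read as expected heterozygosity (gene diversity);
    expected_heterozygosity is a hypothetical helper, not from the slides.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one allele copy is required")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Column totals of the example pool table for alleles 147, 149, 155, 157, 161.
pooled_counts = [16, 66, 76, 117, 5]
print(f"H = {expected_heterozygosity(pooled_counts):.3f}")  # H = 0.693
```

Pooling the four DNA pools in this way treats them as one sample, so the value describes the marker rather than any single pool.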
MARKERS
MARKER SELECTION CRITERIA:
• POLYMORPHISM
number of alleles
allele lengths
• AMPLIFICATION PROPERTIES
temperature
?
[Figure: allele frequency (0.0 to 1.0) by marker, M1 to M6]
MARKER ALLELE FREQUENCY IN POOLS
[Figure: allele frequency (0.0 to 1.0) by marker, M1 to M6, two panels]
MARKER ALLELE FREQUENCY IN POOLS
• LOW POLYMORPHISM WITHIN EACH POOL
• “POOL-SPECIFIC” ALLELES
• POOR CORRESPONDENCE BETWEEN REPLICATES
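The first two remarks can be checked directly on allele-count data. A small sketch, using the counts of the four-pool example table (BINOMIAL DISTRIBUTION slide) and hypothetical helper names, counts how many alleles are observed in each pool and lists the alleles seen in one pool only. It does not address agreement between replicates, for which the slides give no data.

```python
ALLELES = [147, 149, 155, 157, 161]
COUNTS = {  # pool -> allele counts, from the example pool table
    1: [0, 0, 14, 54, 0],
    2: [0, 0, 0, 53, 5],
    3: [16, 66, 0, 0, 0],
    4: [0, 0, 62, 10, 0],
}


def alleles_per_pool(counts):
    """Number of distinct alleles observed (count > 0) in each pool."""
    return {pool: sum(1 for n in row if n > 0) for pool, row in counts.items()}


def pool_specific_alleles(counts, alleles):
    """Map each allele seen in exactly one pool to that pool."""
    specific = {}
    for k, allele in enumerate(alleles):
        pools = [pool for pool, row in counts.items() if row[k] > 0]
        if len(pools) == 1:
            specific[allele] = pools[0]
    return specific


print(alleles_per_pool(COUNTS))                 # {1: 2, 2: 2, 3: 2, 4: 2}
print(pool_specific_alleles(COUNTS, ALLELES))   # {147: 3, 149: 3, 161: 2}
```

Each pool shows only two of the five alleles, and three alleles occur in a single pool.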
BINOMIAL DISTRIBUTION
            allele
pool   147   149   155   157   161
1        0     0    14    54     0
2        0     0     0    53     5
3       16    66     0     0     0
4        0     0    62    10     0
• BINOMIAL DISTRIBUTION
Odds Ratio, Logistic Regression
$N_{ij} \sim Bi\left(N_i,\ \pi_{ij}\right), \quad \hat{\pi}_{ij} = N_{ij}/N_i$
            allele
pool   147   149   155   157   161
1      n11   n12
2      n21   n22
3      n31   n32
4      n41   n42
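Under the binomial model each pool is treated as a sample of N_i allele copies, so π̂_ij = N_ij/N_i with the usual binomial standard error √(π̂_ij(1 - π̂_ij)/N_i). A minimal sketch, assuming N_i is the row total of the pool in the example table (binomial_estimate is a hypothetical name):

```python
import math


def binomial_estimate(n_allele, n_pool):
    """Return (pi_hat, SE) for n_allele copies of one allele among n_pool copies.

    Assumes N_ij ~ Bi(N_i, pi_ij); binomial_estimate is a hypothetical helper.
    """
    if n_pool <= 0:
        raise ValueError("the pool must contain at least one allele copy")
    pi_hat = n_allele / n_pool
    se = math.sqrt(pi_hat * (1.0 - pi_hat) / n_pool)
    return pi_hat, se


# Pool 1 of the example table: 54 of its 68 counted copies are allele 157.
pi_hat, se = binomial_estimate(54, 68)
print(f"pi_hat = {pi_hat:.3f}, SE = {se:.3f}")
# Allele 147 is absent from pool 1, so the estimate and its SE are both zero:
# the sparse-data problem that reappears in the odds-ratio analysis.
print(binomial_estimate(0, 68))  # (0.0, 0.0)
```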
ODDS RATIO
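$\ln(OR) = \ln \dfrac{N_1(A)/N_1(\bar{A})}{N_2(A)/N_2(\bar{A})}$
• distribution: $z = \ln(\widehat{OR}) / \sqrt{\mathrm{Var}[\ln(\widehat{OR})]} \sim N(0,1)$ under no association (OR = 1)
• variance: $\mathrm{Var}[\ln(\widehat{OR})] = \dfrac{1}{N_1(A)} + \dfrac{1}{N_1(\bar{A})} + \dfrac{1}{N_2(A)} + \dfrac{1}{N_2(\bar{A})}$
• confidence intervals: $\ln(\widehat{OR}) \pm z_{1-\alpha/2}\sqrt{\mathrm{Var}[\ln(\widehat{OR})]}$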
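The odds-ratio test and interval above can be sketched as follows, assuming the standard logit (Woolf) variance shown on the slide and normal quantiles from Python's statistics.NormalDist; α = 0.01 mirrors the 0.01 confidence intervals in the results. The example compares pools 1 and 4 for alleles 155 (A) and 157 (Ā) of the example table; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def log_odds_ratio_ci(n11, n12, n21, n22, alpha=0.01):
    """ln(OR), SE, z, two-sided p-value and (1 - alpha) CI for a 2x2 table.

    Rows: pools 1 and 2; columns: allele A and allele not-A, so that
    n11 = N1(A), n12 = N1(not A), n21 = N2(A), n22 = N2(not A).
    Every cell must be positive; zero cells need a sparse-data correction.
    """
    if min(n11, n12, n21, n22) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)  # sqrt of sum of 1/n
    z = log_or / se
    p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided normal p-value
    q = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576 for alpha = 0.01
    return log_or, se, z, p_value, (log_or - q * se, log_or + q * se)


# Pools 1 and 4, alleles 155 (A) and 157 (not-A).
log_or, se, z, p, ci = log_odds_ratio_ci(14, 54, 62, 10)
print(f"ln(OR) = {log_or:.2f}, SE = {se:.2f}, z = {z:.2f}, p = {p:.2g}")
print(f"99% CI for ln(OR): ({ci[0]:.2f}, {ci[1]:.2f})")
```

The function refuses tables with an empty cell instead of returning an infinite ln(OR), which keeps the sparse-data case explicit.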
ODDS RATIO IN SPARSE DATA
$\ln(OR) = \ln \dfrac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$
SPARSE DATA PROBLEM
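$\ln(OR) = \ln \dfrac{(n_{11}+c)(n_{22}+c)}{(n_{12}+c)(n_{21}+c)}$
• c = 0: standard
• c = 0.5: Haldane (1956)
• $c_{ij} = 2\,n_{i\cdot}\,n_{\cdot j}/n^2$ (cell-specific): Bishop et al. (1975)
• Agresti (1999): c = 0.5 not valid for ln(OR) > 4; $c_{ij}$ not valid for ln(OR) > 8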
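A sketch of the three corrections listed above, applied to pools 1 and 2 for alleles 155 and 157 of the example table, where pool 2 carries no copy of allele 155. The helper names and method labels are hypothetical; the Bishop pseudo-counts follow the slide's formula c_ij = 2 n_i. n_.j / n², which adds two observations in total, spread in proportion to the independence fit.

```python
import math


def pseudo_counts(table, method):
    """2x2 matrix of constants added to the cells before taking ln(OR)."""
    (n11, n12), (n21, n22) = table
    if method == "standard":      # c = 0
        return [[0.0, 0.0], [0.0, 0.0]]
    if method == "haldane":       # c = 0.5
        return [[0.5, 0.5], [0.5, 0.5]]
    if method == "bishop":        # c_ij = 2 n_i. n_.j / n^2
        n = n11 + n12 + n21 + n22
        if n == 0:
            raise ValueError("empty table")
        rows = (n11 + n12, n21 + n22)
        cols = (n11 + n21, n12 + n22)
        return [[2 * rows[i] * cols[j] / n**2 for j in range(2)] for i in range(2)]
    raise ValueError(f"unknown method: {method}")


def corrected_log_or(table, method="haldane"):
    """ln[(n11 + c11)(n22 + c22) / ((n12 + c12)(n21 + c21))]; +/-inf if a product is 0."""
    c = pseudo_counts(table, method)
    (n11, n12), (n21, n22) = table
    num = (n11 + c[0][0]) * (n22 + c[1][1])
    den = (n12 + c[0][1]) * (n21 + c[1][0])
    if num == 0 and den == 0:
        return math.nan
    if den == 0:
        return math.inf
    if num == 0:
        return -math.inf
    return math.log(num / den)


# Pools 1 and 2, alleles 155 and 157: allele 155 is absent from pool 2.
sparse = [[14, 54], [0, 53]]
for method in ("standard", "haldane", "bishop"):
    print(f"{method:>8}: ln(OR) = {corrected_log_or(sparse, method):.2f}")
```

Without a correction the estimate is infinite; both pseudo-count methods return finite values, but, per Agresti (1999), these become unreliable for large true log odds ratios.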
[Figure: P-values (0.00 to 1.00) by marker, M1 to M6, for discordant and concordant pools]
ODDS RATIO: P-values
[Figure: ln(OR) (-4 to 8) by marker, M1 to M6, two panels]
ODDS RATIO - CI
0.01 CI FOR “CONCORDANT” POOLS
0.01 CI FOR “DISCORDANT” POOLS
ODDS RATIO - REMARKS
• many 2x2 comparisons (theoretically) possible: 18 (M4) to 60 (M1, M6)
• significance pattern often inconsistent between alleles (sparse data)
• difficult to summarize ORs with a single value
MARKER      RESULT
C03.629     association
C05.771     association
C08.410     ?
REN227M12   no association
REN275L19   ? (sparse data)
LEI002      ? (sparse data)
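The counts of possible 2x2 comparisons quoted above are consistent with comparing every pair of the four pools for every pair of alleles, i.e. C(4,2)·C(k,2) tables for a marker with k alleles: 18 for k = 3 and 60 for k = 5. That reading is an assumption, since the slides do not spell it out. The sketch below (hypothetical helper names) enumerates these tables for the example marker and counts how many contain an empty cell.

```python
from itertools import combinations
from math import comb

COUNTS = {  # example pool table: pool -> counts for alleles 147, 149, 155, 157, 161
    1: [0, 0, 14, 54, 0],
    2: [0, 0, 0, 53, 5],
    3: [16, 66, 0, 0, 0],
    4: [0, 0, 62, 10, 0],
}


def n_comparisons(n_pools, n_alleles):
    """Number of pool-pair x allele-pair 2x2 tables."""
    return comb(n_pools, 2) * comb(n_alleles, 2)


def pairwise_tables(counts):
    """Yield ((pool_a, pool_b), (allele index x, allele index y), 2x2 table)."""
    pools = sorted(counts)
    n_alleles = len(counts[pools[0]])
    for pa, pb in combinations(pools, 2):
        for x, y in combinations(range(n_alleles), 2):
            table = [[counts[pa][x], counts[pa][y]],
                     [counts[pb][x], counts[pb][y]]]
            yield (pa, pb), (x, y), table


tables = list(pairwise_tables(COUNTS))
n_sparse = sum(1 for _, _, t in tables if 0 in t[0] or 0 in t[1])
print(n_comparisons(4, 3), n_comparisons(4, 5), len(tables))  # 18 60 60
print(f"{n_sparse} of {len(tables)} tables contain at least one zero cell")
```

For the example marker, only one of the 60 tables (pools 1 and 4, alleles 155 and 157) is free of zeros, which illustrates why the significance pattern is hard to summarize.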
FURTHER WORK
• use all table cells
• account for sparseness in testing
• multivariate logistic models
MULTINOMIAL DISTRIBUTION
• MULTINOMIAL DISTRIBUTION
Multivariate Logistic Regression
$(N_{i1}, \ldots, N_{in}) \sim Mu\left(N_i;\ \pi_{i1}, \ldots, \pi_{in}\right), \quad \hat{\pi}_{ij} = N_{ij}/N_i, \quad N_i = \sum_{j=1}^{n} N_{ij}$
            allele
pool   147   149   155   157   161
1      n11   n12   n13   n14   n15
2      n21   n22   n23   n24   n25
3      n31   n32   n33   n34   n35
4      n41   n42   n43   n44   n45
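Under the multinomial model the whole row of allele counts of a pool enters at once, instead of one 2x2 table at a time. A minimal sketch of the multinomial log-probability (standard library only; the function name is hypothetical), evaluated for pool 1 of the example table under its own frequencies π̂_1j = N_1j/N_1 and under frequencies shared by all four pools (column totals divided by 280, an assumed null model):

```python
import math


def multinomial_logpmf(counts, probs):
    """log P(N_i1, ..., N_in) for (N_i1, ..., N_in) ~ Mu(N_i; pi_i1, ..., pi_in)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n_total = sum(counts)
    logp = math.lgamma(n_total + 1)  # log N_i!
    for k, p in zip(counts, probs):
        if k == 0:
            continue  # empty cell: 0 * log(p) - log(0!) = 0, even when p == 0
        if p == 0:
            return -math.inf  # an observed allele with zero probability
        logp += k * math.log(p) - math.lgamma(k + 1)
    return logp


pool1 = [0, 0, 14, 54, 0]
own = [k / sum(pool1) for k in pool1]               # pool-specific frequencies
shared = [k / 280 for k in (16, 66, 76, 117, 5)]    # frequencies shared by all pools
print(multinomial_logpmf(pool1, own), multinomial_logpmf(pool1, shared))
```

The pool-specific frequencies are the maximum-likelihood estimates for that row, so they always give the larger log-probability; twice the difference summed over pools is the likelihood-ratio statistic used later.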
MODEL
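• GENERAL LOGISTIC MODEL
$\ln\left(\pi_{ij}/\pi_{in}\right) = \mathbf{X}\boldsymbol{\beta}_j, \quad j = 1, \ldots, n-1$
• CONSIDERED MODELS FOR ALLELE FREQUENCIES
$\pi_{ij} = \dfrac{\exp(\mathbf{X}\boldsymbol{\beta}_j)}{1 + \sum_{k=1}^{n-1} \exp(\mathbf{X}\boldsymbol{\beta}_k)}$
1    $T$
4    $T_1, \ldots, T_n$
8    $T_{F1}, \ldots, T_{Fn}, T_{N1}, \ldots, T_{Nn}$
16   $T_{11}, \ldots, T_{1n}, \ldots, T_{41}, \ldots, T_{4n}$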
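The link between the logit scale and the allele frequencies can be sketched as below, assuming allele n is the baseline category and treating the linear predictors η_j = Xβ_j as given. The values are hypothetical and not fitted to the fox data. The sketch only converts between scales; fitting the 1-, 4-, 8- and 16-term models would additionally require maximizing the multinomial likelihood.

```python
import math


def baseline_logit_probs(eta):
    """pi_1..pi_n from eta_1..eta_{n-1}, with ln(pi_j / pi_n) = eta_j (allele n = baseline)."""
    shift = max([0.0, *eta])  # subtract the largest exponent to avoid overflow
    weights = [math.exp(e - shift) for e in eta] + [math.exp(-shift)]
    total = sum(weights)
    return [w / total for w in weights]


def logits_vs_baseline(pi):
    """Inverse map: ln(pi_j / pi_n) for j = 1..n-1."""
    return [math.log(p / pi[-1]) for p in pi[:-1]]


eta = [0.5, -1.0, 2.0, 0.0]   # hypothetical X*beta_j for 4 non-baseline alleles
pi = baseline_logit_probs(eta)
print([round(p, 3) for p in pi])
print([round(v, 6) for v in logits_vs_baseline(pi)])  # [0.5, -1.0, 2.0, 0.0]
```

Shifting all exponents by the same constant leaves every ratio π_j/π_n unchanged, so the stabilized form returns exactly the same probabilities as the formula on the slide.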
TEST STATISTIC
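• MODEL SELECTION
observed frequencies $\pi_{ij}$ (DATA) vs. estimated frequencies $\hat{\pi}_{ij}$ (MODEL)
• POWER DIVERGENCE FAMILY: Cressie, Read (1984)
$D(\lambda) = \dfrac{2}{\lambda(\lambda+1)} \sum_{\text{cells}} \pi_{ij}\left[\left(\dfrac{\pi_{ij}}{\hat{\pi}_{ij}}\right)^{\lambda} - 1\right] \sim \chi^2$
$\lambda = 1$, Pearson’s X²: $D(1) = \sum_{\text{cells}} \dfrac{(\pi_{ij} - \hat{\pi}_{ij})^2}{\hat{\pi}_{ij}}$
$\lambda = 0$, Likelihood Ratio Test: $D(0) = 2 \sum_{\text{cells}} \pi_{ij} \ln \dfrac{\pi_{ij}}{\hat{\pi}_{ij}}$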
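A sketch of the power-divergence family on the count scale (O = N_i π_ij, E = N_i π̂_ij). This scale weights each pool's frequency-scale term by its size N_i and is the one on which the χ² reference applies. The fitted model is an assumption chosen for illustration: all pools share the pooled allele frequencies (expected counts N_i N_.j / N). λ = 1 reproduces Pearson's X², λ → 0 gives the likelihood-ratio statistic, and λ = 2/3 is the value recommended by Cressie and Read (1984). Helper names are hypothetical.

```python
import math

TABLE = [  # example pool table: rows = pools 1-4, columns = alleles 147..161
    [0, 0, 14, 54, 0],
    [0, 0, 0, 53, 5],
    [16, 66, 0, 0, 0],
    [0, 0, 62, 10, 0],
]


def common_frequency_fit(table):
    """Expected counts if all pools share the pooled allele frequencies."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


def power_divergence(observed, expected, lam):
    """Cressie-Read statistic on counts: 2/(lam(lam+1)) * sum O[(O/E)^lam - 1]."""
    pairs = [(o, e) for row_o, row_e in zip(observed, expected)
             for o, e in zip(row_o, row_e)]
    if lam == 0:  # limit lam -> 0: likelihood-ratio G^2, with 0 * ln(0) = 0
        return 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if lam == -1:
        raise ValueError("lam = -1 is a separate limit and is not handled here")
    # For lam < 0 an empty observed cell makes (O/E)^lam undefined (ZeroDivisionError).
    s = sum(o * ((o / e) ** lam - 1) for o, e in pairs)
    return 2 * s / (lam * (lam + 1))


fit = common_frequency_fit(TABLE)
x2 = power_divergence(TABLE, fit, 1)       # Pearson's X^2
g2 = power_divergence(TABLE, fit, 0)       # likelihood-ratio statistic
cr = power_divergence(TABLE, fit, 2 / 3)   # Cressie-Read's recommended lambda
print(f"X2 = {x2:.1f}, G2 = {g2:.1f}, CR(2/3) = {cr:.1f}")
```

With this many empty cells the χ² reference distribution is doubtful, which is the problem the normalisation slides address.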
TEST STATISTIC
• NORMALISATION
$D \sim \chi^2$ ? SPARSE DATA!
INCREASING-CELLS ASYMPTOTICS!
$T_D = \dfrac{D - \mu_D}{\sigma_D} \sim N(0,1)$
TEST STATISTIC
• ANALYTICAL
Osius, Rojek (1989): D(λ=1)
Farrington (1996): D(λ=1)+
Copas (1989): a·D(λ=1)
• EMPIRICAL: Bootstrap, Jackknife
• EVALUATION OF REAL DATA
• NORMAL PROPERTIES: simulation
$\mu_D$ ?
$\sigma_D$ ?
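One way to obtain μ_D and σ_D empirically is a parametric bootstrap. The sketch assumes the shared-frequency model as the null hypothesis, uses Pearson's X² (D with λ = 1) as the statistic, keeps the pool sizes N_i fixed, and draws 999 replicates. All of these choices and the helper names are illustrative, not taken from the slides. It does not implement the analytical Osius-Rojek, Farrington or Copas moments.

```python
import random
import statistics

TABLE = [  # example pool table: rows = pools 1-4, columns = alleles 147..161
    [0, 0, 14, 54, 0],
    [0, 0, 0, 53, 5],
    [16, 66, 0, 0, 0],
    [0, 0, 62, 10, 0],
]


def pearson_x2(table):
    """Pearson's X^2 (D with lambda = 1) against the shared-frequency fit."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    x2 = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / total
            if e > 0:  # alleles absent from the whole table contribute nothing
                x2 += (table[i][j] - e) ** 2 / e
    return x2


def simulate_table(row_totals, freqs, rng):
    """Draw each pool's allele counts from Mu(N_i; shared frequencies)."""
    alleles = range(len(freqs))
    table = []
    for n_i in row_totals:
        draws = rng.choices(alleles, weights=freqs, k=n_i)
        table.append([draws.count(j) for j in alleles])
    return table


def bootstrap_moments(table, n_boot=999, seed=1):
    """Parametric-bootstrap estimates of mu_D and sigma_D under shared frequencies."""
    rng = random.Random(seed)
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    freqs = [c / sum(rows) for c in cols]
    sims = [pearson_x2(simulate_table(rows, freqs, rng)) for _ in range(n_boot)]
    return statistics.fmean(sims), statistics.stdev(sims)


d_obs = pearson_x2(TABLE)
mu_d, sigma_d = bootstrap_moments(TABLE)
t_d = (d_obs - mu_d) / sigma_d   # normalised statistic T_D
print(f"D = {d_obs:.1f}, mu_D = {mu_d:.1f}, sigma_D = {sigma_d:.1f}, T_D = {t_d:.1f}")
```

A jackknife or the analytical corrections listed above would replace bootstrap_moments while leaving the normalisation step T_D = (D - μ_D)/σ_D unchanged.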
LITERATURE
• Agresti, A. (1990) Categorical data analysis. New York, Chichester, Brisbane, Toronto, Singapore: John Wiley & Sons.
• Agresti, A. (1999) On logit confidence intervals for the odds ratio with small samples. Biometrics 55:597-602.
• Barratt, B.J., Payne, F., Rance, H.E., Nutland, S., Todd, J.A., Clayton, D.G. (2002) Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Annals of Human Genetics 66:393-405.
• Bishop, Y.M.M., Fienberg, S.E., Holland, P. (1975) Discrete multivariate analysis. Cambridge, Massachusetts: MIT Press.
• Copas, J.B. (1989) Unweighted Sum of Squares Test for Proportions. Applied Statistics 38:71-80.
• Cressie, N.A.C., Read, T.R.C. (1984) Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society Ser. B 46:440-464.
• Farrington, C.P. (1996) On assessing goodness of fit of generalized linear models to sparse data. Journal of the Royal Statistical Society Ser. B 58:349-360.
• Haldane, J.B.S. (1956) The estimation and significance of the logarithm of a ratio of frequencies. Annals of Human Genetics 20:309-311.
• Osius, G., Rojek, D. (1989) Normal goodness-of-fit tests for parametric multinomial models with large degrees of freedom. Fachbereich Mathematik/Informatik, Universitaet Bremen. Mathematik Arbeitspapiere 36: