Sampling Strategies to Control Misclassification Bias in Longitudinal Udder Health Studies
Denis Haine¹, Ian Dohoo², Daniel Scholl³, Henrik Stryhn², Simon Dufour¹
SVEPM, March 30, 2017
¹ Faculté de médecine vétérinaire, Université de Montréal
² Atlantic Veterinary College, University of Prince Edward Island
³ College of Agriculture & Biological Sciences, South Dakota State University
Cohort Studies: Baseline and Follow-up
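[Figure: at baseline (t0), quarters are classified by test result (Test −, Test +) against true status (no disease, disease) into TN, FN, FP and TP; misclassification at baseline leads to selection bias.]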
[Figure, continued: at follow-up (t1), misclassification bias (e.g. FN) makes the observed incidence differ from the true incidence.]
Based on Pekkanen et al. (2006), J. Clin. Epidemiol. 59, 281-289
Sensitivity and Specificity
• Improve diagnostic accuracy by combining tests:
  • 2 tests
    • in parallel (⊕ on at least 1 of 2 tests): ↗ Se; ↘ Sp
    • in series (⊕ on both tests): ↘ Se; ↗ Sp
  • 3 tests
• Analytical solution by modelling
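The parallel/series trade-off above can be sketched with the textbook formulas for combining test results. This is a minimal illustration, assuming the repeated tests are conditionally independent given true infection status. Duplicate or triplicate milk samples from the same quarter are probably correlated, so the gains computed this way are likely optimistic. The function names are hypothetical, and the "2 out of 3" rule assumes three identical tests.

```python
from math import comb


def parallel(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Positive if at least one of two tests is positive."""
    se = 1 - (1 - se1) * (1 - se2)   # a case is missed only if both tests miss it
    sp = sp1 * sp2                   # a non-case is negative only if both tests are negative
    return se, sp


def series(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Positive only if both tests are positive."""
    se = se1 * se2                   # a case must be detected by both tests
    sp = 1 - (1 - sp1) * (1 - sp2)   # a false positive needs both tests to be wrong
    return se, sp


def k_out_of_n(se: float, sp: float, k: int = 2, n: int = 3) -> tuple[float, float]:
    """Positive if at least k of n identical, conditionally independent tests are positive."""
    se_k = sum(comb(n, i) * se**i * (1 - se)**(n - i) for i in range(k, n + 1))
    # a non-case stays negative when fewer than k of the n tests are false positives
    sp_k = sum(comb(n, i) * (1 - sp)**i * sp**(n - i) for i in range(k))
    return se_k, sp_k


# Single-sample values close to the CNS scenario (Se ~0.60, Sp 0.95)
for label, (se, sp) in {
    "parallel": parallel(0.60, 0.95, 0.60, 0.95),
    "series": series(0.60, 0.95, 0.60, 0.95),
    "2 out of 3": k_out_of_n(0.60, 0.95),
}.items():
    print(f"{label:10s} Se = {se:.3f}  Sp = {sp:.3f}")
```

Parallel interpretation raises Se at the cost of Sp, and series interpretation does the reverse. Under independence, "2 out of 3" raises both slightly.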
Objectives
• Estimate the impact of selection and misclassification biases on:
  • incidence
  • association
• Evaluate the effect of the number of samples
Material & Methods
• Simulation of 100 cohorts
  • with 2 samplings 1 month apart (S1 and S2)
  • of 30 cows/herd, from 100 herds
• For these 2 scenarios:

           | S. aureus                   | CNS
Prevalence | < 5%                        | 10–30%
Incidence  | 1 NIMI/100 quarters-month   | ∼30 NIMI/100 quarters-month
Se¹        | ∼90%                        | ∼60%
Sp¹        | > 99% (100 CFU/ml)          | 95% (200 CFU/ml)

NIMI: new intramammary infection.
¹ Zadoks et al., 2001; Dohoo et al., 2011; Dufour et al., 2012a; Dufour et al., 2012b.
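Why a test with Se ∼60% and Sp 95% distorts the observed incidence can be shown with the standard relation between true and apparent proportions. This sketch assumes a perfectly selected cohort and a single test at follow-up, and treats the monthly incidence as a proportion of quarters at risk. The Rogan-Gladen back-correction is a standard estimator added here for illustration. The slides do not say it was used.

```python
def apparent_proportion(true_p: float, se: float, sp: float) -> float:
    """Expected proportion testing positive: detected cases plus false positives."""
    return se * true_p + (1 - sp) * (1 - true_p)


def rogan_gladen(apparent_p: float, se: float, sp: float) -> float:
    """Back-correct an apparent proportion (requires se + sp > 1)."""
    return (apparent_p + sp - 1) / (se + sp - 1)


# CNS-like scenario: ~30 NIMI/100 quarters-month, Se ~0.60, Sp 0.95
observed = apparent_proportion(0.30, 0.60, 0.95)
corrected = rogan_gladen(observed, 0.60, 0.95)
print(f"apparent incidence: {100 * observed:.1f} per 100 quarters-month")
print(f"Rogan-Gladen back-correction: {100 * corrected:.1f} per 100 quarters-month")
```

With these values, the missed cases (40% of 30) outweigh the false positives (5% of 70), so the observed incidence falls below the true one.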
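Bias decomposition (S = true status, S′ = observed test result):

Baseline | Follow-up | Bias
S1       | S2        | None (true incidence)
S′1      | S′2       | Total bias
S′1      | S2        | Selection bias
S1       | S′2       | Misclassification bias

• With Se and Sp as Beta distributions.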
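The four baseline/follow-up combinations can be mimicked with a small simulation. This is a simplified sketch, not the study's simulation. It assumes no herd or cow clustering, fixed Se and Sp instead of Beta draws, one sample per sampling, and infections present at S1 that persist to S2. The defaults mimic the CNS scenario: 12,000 quarters (100 herds × 30 cows × 4 quarters), a hypothetical baseline prevalence of 20%, and 30% monthly incidence.

```python
import random


def simulate_scenarios(n_quarters: int = 12_000, prev: float = 0.20,
                       inc: float = 0.30, se: float = 0.60, sp: float = 0.95,
                       seed: int = 1) -> dict[str, float]:
    """Incidence estimated from true (S) or observed (S') status at S1 and S2."""
    rng = random.Random(seed)

    def observe(infected: bool) -> bool:
        # Observed test result S' given true status S (non-differential error)
        return rng.random() < (se if infected else 1 - sp)

    quarters = []
    for _ in range(n_quarters):
        s1 = rng.random() < prev            # true status at baseline
        s2 = s1 or rng.random() < inc       # baseline infections persist; others may become new IMI
        quarters.append((s1, s2, observe(s1), observe(s2)))

    def incidence(base: int, follow: int) -> float:
        # Cohort = quarters negative at baseline; cases = positive at follow-up
        at_risk = [q for q in quarters if not q[base]]
        return sum(q[follow] for q in at_risk) / len(at_risk)

    # Tuple indices: 0 = S1, 1 = S2, 2 = S'1, 3 = S'2
    return {
        "true incidence": incidence(0, 1),
        "total bias": incidence(2, 3),
        "selection bias only": incidence(2, 1),
        "misclassification bias only": incidence(0, 3),
    }


results = simulate_scenarios()
for name, value in results.items():
    print(f"{name:30s} {100 * value:5.1f} per 100 quarters")
```

In this toy setting, false negatives at S1 enter the cohort and are counted as new cases at S2, so the selection-only estimate is too high. Misclassification at S2 alone pulls the estimate down.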
Sampling at S1 and S2: changes in Se and Sp by sampling strategy and interpretation²

Sampling   | Interpretation | S. aureus Se | S. aureus Sp | CNS Se | CNS Sp
Duplicate  | Parallel       | +0.10        | 0            | +0.15  | -0.05
Duplicate  | Series         | -0.10        | 0            | -0.25  | +0.05
Triplicate | 2 out of 3     | 0            | 0            | 0      | +0.10

² Dohoo et al., 2011.
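The slides state that Se and Sp are modelled as Beta distributions but do not give the parameters. One common way to derive Beta parameters from a mean and a standard deviation is the method of moments, sketched below. The SD of 0.05 is a hypothetical value chosen for illustration.

```python
import random


def beta_from_mean_sd(mean: float, sd: float) -> tuple[float, float]:
    """Method-of-moments Beta(alpha, beta) parameters for a given mean and SD."""
    var = sd ** 2
    if not (0 < mean < 1 and 0 < var < mean * (1 - mean)):
        raise ValueError("no Beta distribution has this mean and SD")
    k = mean * (1 - mean) / var - 1      # k = alpha + beta
    return mean * k, (1 - mean) * k


# Hypothetical: CNS single-sample Se centred on 0.60 with SD 0.05
alpha, beta = beta_from_mean_sd(0.60, 0.05)
rng = random.Random(2017)
draws = [rng.betavariate(alpha, beta) for _ in range(10_000)]
print(f"Beta({alpha:.1f}, {beta:.1f}); mean of draws = {sum(draws) / len(draws):.3f}")
```

Each simulated cohort can then draw its own Se and Sp, so uncertainty in test accuracy propagates into the bias estimates.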
• Poisson and logistic regressions
  • multi-level (quarter–cow–herd)
• Markov chain Monte Carlo (MCMC) with Stan³
  • called via R
• Cloud computing
³ Carpenter et al., 2017.
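The Stan models are not shown in the slides. The sketch below only illustrates the quarter–cow–herd structure such a multi-level logistic model assumes, by simulating binary quarter outcomes from random herd and cow intercepts. The intercept `beta0` and the standard deviations `sd_herd` and `sd_cow` are hypothetical values. A Poisson version would draw counts instead of 0/1 outcomes.

```python
import math
import random


def simulate_three_level(n_herds: int = 100, cows_per_herd: int = 30,
                         quarters_per_cow: int = 4, beta0: float = -1.0,
                         sd_herd: float = 0.5, sd_cow: float = 1.0,
                         seed: int = 42) -> list[tuple[int, int, int, int]]:
    """Binary quarter outcomes from a herd/cow random-intercept logistic model."""
    rng = random.Random(seed)
    rows = []
    for herd in range(n_herds):
        u_herd = rng.gauss(0.0, sd_herd)                # herd random intercept
        for cow in range(cows_per_herd):
            u_cow = rng.gauss(0.0, sd_cow)              # cow random intercept (within herd)
            p = 1.0 / (1.0 + math.exp(-(beta0 + u_herd + u_cow)))  # inverse logit
            for quarter in range(quarters_per_cow):
                # quarters of one cow share u_herd and u_cow, so their outcomes are correlated
                rows.append((herd, cow, quarter, int(rng.random() < p)))
    return rows


data = simulate_three_level()
print(len(data), "quarters; proportion positive:", sum(r[3] for r in data) / len(data))
```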
S. aureus: Bias assessment
[Figure: density of incidence (cases per 100 quarters) for the true incidence, total bias, selection bias only, and misclassification bias only.]
S. aureus: Bias control by duplicate sampling
[Figure: density of incidence (cases per 100 quarters), true incidence vs. duplicate samples with single S1 and parallel S2; parallel S1 and single S2; parallel interpretation on S1 and S2.]
S. aureus: Bias control by duplicate sampling
[Figure: density of incidence (cases per 100 quarters), true incidence vs. duplicate samples with single S1 and series S2; series S1 and single S2; series S1 and parallel S2; series interpretation on S1 and S2; parallel S1 and series S2.]
S. aureus: Bias control by triplicate sampling
[Figure: density of incidence (cases per 100 quarters), true incidence vs. triplicate samples at S1 and S2.]
CNS: Bias assessment
[Figure: density of incidence (cases per 100 quarters) for the true incidence, total bias, selection bias only, and misclassification bias only.]
S. aureus: Bias assessment
[Figure: density of the odds ratio for the true association, total bias, selection bias only, and misclassification bias only.]
CNS: Bias assessment
[Figure: density of the odds ratio for the true association, total bias, selection bias only, and misclassification bias only.]
CNS: Bias control by duplicate sampling
[Figure: density of the odds ratio, true association vs. duplicate samples with single S1 and parallel S2; parallel S1 and single S2; parallel interpretation on S1 and S2.]
CNS: Bias control by duplicate sampling
[Figure: density of the odds ratio, true association vs. duplicate samples with single S1 and series S2; series S1 and single S2; series S1 and parallel S2; series interpretation on S1 and S2; parallel S1 and series S2.]
CNS: Bias control by triplicate sampling
[Figure: density of the odds ratio, true association vs. triplicate samples at S1 and S2.]
Prevalence & incidence | Se        | Sp        | Result
Low                    | Excellent | Excellent | Nothing!
High                   | Fair      | Excellent | Bias!

• Misclassification bias (non-differential):
  • Bias towards the null
  • Importance of Sp
Bias as a function of sensitivity and specificity (cohort study)
[Figure: apparent relative risk (1.0 to 2.0) vs. specificity of test (0.50 to 1.00), one curve per test sensitivity (0.5, 0.7, 0.9). Risk in population A = 0.10; risk in population B = 0.05; true relative risk = 2.0.]
Copeland et al. (1977), Am. J. Epidemiol. 105(5), 488-495
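The pull towards the null and the weight of Sp follow directly from outcome misclassification that is identical in both groups. The sketch below recomputes apparent relative risks for the parameters stated on the Copeland et al. figure (risks 0.10 and 0.05, true relative risk 2.0). It is an illustration of the formula, not a reproduction of their figure.

```python
def apparent_risk(true_risk: float, se: float, sp: float) -> float:
    """Risk of testing positive: true cases detected plus false positives."""
    return se * true_risk + (1 - sp) * (1 - true_risk)


def apparent_rr(risk_a: float, risk_b: float, se: float, sp: float) -> float:
    """Relative risk when the outcome is misclassified the same way in both groups."""
    return apparent_risk(risk_a, se, sp) / apparent_risk(risk_b, se, sp)


# Parameters from the Copeland et al. (1977) figure: risks 0.10 and 0.05, true RR = 2.0
for se in (0.5, 0.7, 0.9):
    row = [f"{apparent_rr(0.10, 0.05, se, sp):.2f}" for sp in (0.80, 0.90, 0.96, 1.00)]
    print(f"Se = {se}: apparent RR at Sp = 0.80, 0.90, 0.96, 1.00 ->", ", ".join(row))
```

With Sp = 1.00 the apparent relative risk equals 2.0 whatever the Se, because Se cancels in the ratio. Any loss of Sp adds false positives to both groups and pulls the ratio towards 1.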
Apparent relative risk as a function of prevalence
[Figure: apparent relative risk (1 to 10) vs. prevalence of exposure (0.0 to 1.0) for Se = 0.80/Sp = 0.99, Se = 0.85/Sp = 0.95, Se = 0.90/Sp = 0.90, Se = 0.95/Sp = 0.85 and Se = 0.99/Sp = 0.80; true relative risk = 10.]
Flegal et al. (1986), Am. J. Epidemiol. 123(4), 736-751
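The Flegal et al. curves concern misclassified exposure rather than outcome. The sketch below computes the apparent relative risk when exposure is classified non-differentially with a given Se and Sp, using the Se/Sp pairs and true relative risk of 10 stated on the figure. The baseline risk `risk_unexposed` is a hypothetical value; it cancels out of the ratio.

```python
def apparent_rr_exposure(prev_exposure: float, true_rr: float, se: float, sp: float,
                         risk_unexposed: float = 0.01) -> float:
    """Apparent RR when exposure (not outcome) is misclassified non-differentially."""
    r1, r0 = true_rr * risk_unexposed, risk_unexposed
    # Classified as exposed: detected exposed + unexposed false positives
    exp_true, exp_false = se * prev_exposure, (1 - sp) * (1 - prev_exposure)
    # Classified as unexposed: missed exposed + correctly classified unexposed
    unexp_false, unexp_true = (1 - se) * prev_exposure, sp * (1 - prev_exposure)
    risk_classified_exposed = (exp_true * r1 + exp_false * r0) / (exp_true + exp_false)
    risk_classified_unexposed = (unexp_false * r1 + unexp_true * r0) / (unexp_false + unexp_true)
    return risk_classified_exposed / risk_classified_unexposed


pairs = [(0.80, 0.99), (0.85, 0.95), (0.90, 0.90), (0.95, 0.85), (0.99, 0.80)]
for se, sp in pairs:
    values = [apparent_rr_exposure(p / 10, 10.0, se, sp) for p in range(1, 10)]
    print(f"Se={se:.2f}, Sp={sp:.2f}:", " ".join(f"{v:.1f}" for v in values))
```

With the same Se and Sp, the amount of bias changes with exposure prevalence, which is why one test can behave very differently across populations.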
• Improve Se at baseline (⊖ test: rule out disease)
• Improve Sp at follow-up (⊕ test: rule in disease)
• Incorporate Se/Sp in modelling (Bayesian)⁴
⁴ McInturff et al., 2004.
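A very reduced illustration of incorporating Se and Sp in a Bayesian analysis: a grid posterior for a true proportion, given k test-positive quarters out of n. It assumes a uniform prior and treats Se and Sp as known constants (strictly between 0 and 1). This is not the model of McInturff et al., who also place priors on test accuracy. The counts (215 of 1000) are hypothetical, chosen to match a CNS-like scenario.

```python
import math


def posterior_true_proportion(k: int, n: int, se: float, sp: float,
                              grid_size: int = 2001) -> tuple[list[float], list[float]]:
    """Grid posterior of the true proportion p given k positives out of n tests.

    Uniform prior on p; binomial likelihood on the apparent proportion
    se * p + (1 - sp) * (1 - p). Se and Sp are treated as known.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_lik = []
    for p in grid:
        ap = se * p + (1 - sp) * (1 - p)               # apparent proportion
        log_lik.append(k * math.log(ap) + (n - k) * math.log(1 - ap))
    top = max(log_lik)                                  # subtract max to avoid underflow
    weights = [math.exp(ll - top) for ll in log_lik]
    total = sum(weights)
    return grid, [w / total for w in weights]


def quantile(grid: list[float], post: list[float], q: float) -> float:
    """Smallest grid value whose cumulative posterior mass reaches q."""
    cum = 0.0
    for p, w in zip(grid, post):
        cum += w
        if cum >= q:
            return p
    return grid[-1]


grid, post = posterior_true_proportion(k=215, n=1000, se=0.60, sp=0.95)
post_mean = sum(p * w for p, w in zip(grid, post))
low, high = quantile(grid, post, 0.025), quantile(grid, post, 0.975)
print(f"naive: {215 / 1000:.3f}  posterior mean: {post_mean:.3f}  95% CrI: {low:.3f}-{high:.3f}")
```

The naive proportion (0.215) ignores test error. The posterior centres near 0.30 because the likelihood is written for the apparent, not the true, proportion.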
Conclusion
• Increasing the number of samples can prevent biases in some scenarios but not in others
• Evaluate biases with the misclass R package:
https://github.com/dhaine/misclass
# Install (requires devtools) and load the misclass package from GitHub
devtools::install_github("dhaine/misclass")
library(misclass)
library(pbapply)  # progress bars for *apply functions

set.seed(123)
# Simulate 100 cohorts (100 herds, 30 cows/herd, S. aureus scenario)
sim_list <- replicate(n = 100,
                      expr = make_data(100, 30, "saureus"),
                      simplify = FALSE)

# Estimate incidence on each simulated cohort with Stan (MCMC settings below)
check_incidence(sim_list,
                iter = 500,
                warmup = 100,
                chains = 4,
                cores = 4,
                seed = 123,
                nsimul = 100)
Thank you!
[email protected], @denishaine
https://github.com/dhaine/misclass
https://github.com/dhaine/plotBias: bias plots shown in the Discussion (and more)
https://cran.r-project.org/package=episensr: R package for quantitative bias analysis
Images: Unsplash, Dairy Farmers of Canada