Probability Sampling Approach to Editing
Source: q2008.istat.it/sessions/presentation/28/s2802ilves.pdf
Maiki Ilves (1), Prof. Thomas Laitila (2)
(1) Department of Statistics, Orebro University, Sweden
(2) Department of Statistics, Orebro University and Statistics Sweden
Introduction
The role of editing:
1. To assess the quality of data
2. To improve the survey by identifying error sources
3. To correct errors
Probability Sampling Approach to Editing – p.1/16
Different ways of editing
Traditional micro-editing
Automated editing
Selective editing
Macro-editing
Selective editing - 1
Purpose: prioritize suspicious responses according to their influence on the survey estimates and edit only the most influential responses.
Three stages:
1. Identify suspicious responses: editing rules
2. Prioritize: score function, i.e. a function of the measured value and the expected amended value. Local score, global score.
3. Determine the cut-off point: in a simulation study based on a fully edited dataset
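The three stages can be sketched roughly as follows. This is an illustration only: the absolute-difference score, the function names and the toy data are assumptions, not the scoring used by the authors.

```python
from typing import List, Sequence


def local_scores(observed: Sequence[float], expected: Sequence[float]) -> List[float]:
    """Score each response by the absolute gap between the measured value
    and the expected amended value (assumed form of the score function)."""
    return [abs(x - m) for x, m in zip(observed, expected)]


def select_for_editing(scores: Sequence[float], share: float) -> List[int]:
    """Return the indices of the top `share` fraction of responses,
    ordered from highest to lowest score (the cut-off point)."""
    n_edit = round(share * len(scores))
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:n_edit]


# Hypothetical toy data: five responses, each with expected value 5.
observed = [5.0, 40.0, 6.0, 1.0, 5.5]
expected = [5.0] * len(observed)

scores = local_scores(observed, expected)
to_edit = select_for_editing(scores, share=0.4)  # pursue the top 40%
```

Only the responses listed in `to_edit` would be followed up; the rest enter estimation as observed.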
Selective editing - 2
Evaluation: relative pseudo-bias
$$\left| \frac{\hat{\theta}_q - \hat{\theta}_{100}}{se(\hat{\theta}_{100})} \right|$$
where $q$ is the percentage of suspicious responses pursued.
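A minimal sketch of this evaluation measure. The function name and the numbers in the example are hypothetical; `theta_100` stands for the estimate from the fully edited data.

```python
def relative_pseudo_bias(theta_q: float, theta_100: float, se_theta_100: float) -> float:
    """Absolute difference between the estimate after editing q% of the
    suspicious responses and the fully edited estimate, in units of the
    standard error of the fully edited estimate."""
    if se_theta_100 <= 0:
        raise ValueError("standard error must be positive")
    return abs(theta_q - theta_100) / se_theta_100


# Hypothetical numbers: the estimate is 1030 after partial editing,
# 1000 after full editing, with a standard error of 20.
rpb = relative_pseudo_bias(1030.0, 1000.0, 20.0)
```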
Selective editing - 3
Advantages
+ Reduced costs
+ Reduced response burden
+ Gain in timeliness
Disadvantages
- How to take into account the effect of editing in the estimation stage?
- The influence of edited data when used in different statistical analyses is not known.
- So far used only on quantitative variables.
Estimating measurement bias
Literature: Madow (1965), Lessler and Kalsbeek (1992), Rao and Sitter (1997)
Bias estimation through double sampling or two-phase sampling. For all subsampled units the true values are recorded, and the difference between true values and observed values is used for bias estimation.
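Probability sampling approach
Our idea: combine selective editing with bias estimation and derive an unbiased estimator and its variance for this approach.
[Diagram: population $U$ divided into $U_1$ and $U_2$; first-phase sample $s_a$ divided into $s_{a1}$ and $s_{a2}$; second-phase sample $s_2$ drawn from $s_{a2}$.]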
Unbiased estimator for edited data
Notation:
$z_k$, $k \in U_1$: true value
$x_k$, $k \in U_2$: observed value
$y_k = I_k^{edit} z_k + (1 - I_k^{edit}) x_k$, $k \in U$: observed value after selective editing
We want to estimate $t_z = \sum_U z_k$.
The HT-estimator $\hat{t}_y = \sum_{s_a} y_k / \pi_{ak}$ is biased.
An estimator of the bias is
$$\hat{B}(\hat{t}_y) = \sum_{s_2} \frac{e_k}{\pi_{ak}\,\pi_{k|s_{a2}}}, \quad e_k = x_k - z_k.$$
The bias-corrected estimator is $\hat{t}_z = \hat{t}_y - \hat{B}(\hat{t}_y)$.
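The estimators on this slide can be sketched as below. The function names and the toy numbers are hypothetical; the arithmetic follows the slide's formulas: the Horvitz-Thompson total over the first-phase sample, the bias estimate over the second-phase sample, and their difference.

```python
from typing import Sequence


def ht_total(y_sa: Sequence[float], pi_a: Sequence[float]) -> float:
    """Horvitz-Thompson estimator: sum of y_k / pi_ak over the first-phase sample."""
    return sum(y / p for y, p in zip(y_sa, pi_a))


def bias_estimate(
    x_s2: Sequence[float],
    z_s2: Sequence[float],
    pi_a_s2: Sequence[float],
    pi_cond_s2: Sequence[float],
) -> float:
    """Sum of e_k / (pi_ak * pi_{k|sa2}) over the second-phase sample,
    with e_k = x_k - z_k (observed minus true value)."""
    return sum(
        (x - z) / (pa * pc)
        for x, z, pa, pc in zip(x_s2, z_s2, pi_a_s2, pi_cond_s2)
    )


# Toy first-phase sample of four units, each with inclusion probability 0.5.
# Units 0 and 2 were edited (y = z); units 1 and 3 were not (y = x).
y_sa = [4.0, 6.0, 5.0, 7.0]
pi_a = [0.5, 0.5, 0.5, 0.5]

# Both unedited units happen to fall into the second-phase sample,
# each with conditional inclusion probability 0.5; true values are 5 and 5.
x_s2, z_s2 = [6.0, 7.0], [5.0, 5.0]

t_y = ht_total(y_sa, pi_a)
b_hat = bias_estimate(x_s2, z_s2, [0.5, 0.5], [0.5, 0.5])
t_z = t_y - b_hat  # bias-corrected estimate of the total
```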
Precision of the estimators
$$MSE(\hat{t}_y) = V(\hat{t}_y) + B^2(\hat{t}_y).$$
$$MSE(\hat{t}_z) = V(\hat{t}_y) + V(\hat{B}(\hat{t}_y)) - 2C(\hat{t}_y, \hat{B}(\hat{t}_y))$$
where
$$V(\hat{t}_y) = \sum\sum_{U} \Delta_{akl} \frac{y_k}{\pi_{ak}} \frac{y_l}{\pi_{al}}, \quad (1)$$
$$V(\hat{B}(\hat{t}_y)) = \sum\sum_{U_2} \Delta_{akl} \frac{e_k}{\pi_{ak}} \frac{e_l}{\pi_{al}} + E_a\left[ \sum\sum_{U_2} \Delta_{kl|s_{a2}} I_{ak} I_{al} \frac{e_k}{\pi_{ak}\pi_{k|s_{a2}}} \frac{e_l}{\pi_{al}\pi_{l|s_{a2}}} \right], \quad (2)$$
$$C(\hat{t}_y, \hat{B}(\hat{t}_y)) = \sum_{U} \sum_{U_2} \Delta_{akl} \frac{y_k}{\pi_{ak}} \frac{e_l}{\pi_{al}}.$$
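The second identity can be read as the variance of a difference. A short sketch of the step, assuming (as the title of the estimator slide states) that the bias-corrected estimator is unbiased, so that its MSE equals its variance; hats mark estimators:

$$
MSE(\hat{t}_z) = V(\hat{t}_z) = V\big(\hat{t}_y - \hat{B}(\hat{t}_y)\big) = V(\hat{t}_y) + V(\hat{B}(\hat{t}_y)) - 2C(\hat{t}_y, \hat{B}(\hat{t}_y)).
$$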
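One example
One specific two-phase design is considered.
First-phase sampling design: SI of size $n_a$; second-phase sampling design: Poisson with inclusion probability $\pi_{k|s_{a2}}$.
Then,
$$\hat{V}(\hat{t}_y) = C S^2_{y s_a},$$
$$\hat{V}(\hat{B}(\hat{t}_y)) = C \left[ S^2_{e s_2} + \frac{1}{N - n_a} \sum_{s_2} (1 - \pi_{k|s_{a2}}) \check{e}_k^2 \right],$$
$$\hat{C}(\hat{t}_y, \hat{B}(\hat{t}_y)) = \frac{C}{n_a} \left[ \sum_{s_2} x_k \check{e}_k - \frac{1}{n_a - 1} \sum_{s_a} y_k \sum_{s_2} \check{e}_k \right],$$
where $C = \frac{(1 - f_a) N^2}{n_a}$, $\check{e}_k = e_k / \pi_{k|s_{a2}}$ and
$$S^2_{e s_2} = \frac{1}{n_a - 1} \left( \sum_{s_2} \check{e}_k^2 - \frac{1}{n_a} \Big( \sum_{s_2} \check{e}_k \Big)^2 \right).$$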
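A rough sketch of how the three quantities for this design could be computed. Several things here are assumptions rather than statements from the slides: $f_a = n_a/N$, $S^2_{y s_a}$ taken as the ordinary sample variance of $y$ over $s_a$, a constant second-phase inclusion probability `pi_cond`, the symbol e-check for $e_k/\pi_{k|s_{a2}}$, and all function names and toy data.

```python
from typing import List, Sequence


def design_constant(N: int, n_a: int) -> float:
    """C = (1 - f_a) * N^2 / n_a, with f_a = n_a / N (assumed sampling fraction)."""
    f_a = n_a / N
    return (1.0 - f_a) * N**2 / n_a


def expanded_errors(e_s2: Sequence[float], pi_cond: float) -> List[float]:
    """e-check_k = e_k / pi_{k|sa2} for the units in the second-phase sample."""
    return [e / pi_cond for e in e_s2]


def var_ty(y_sa: Sequence[float], N: int) -> float:
    """C * S^2_{y sa}; S^2 is assumed to be the sample variance of y over s_a."""
    n_a = len(y_sa)
    mean = sum(y_sa) / n_a
    s2_y = sum((y - mean) ** 2 for y in y_sa) / (n_a - 1)
    return design_constant(N, n_a) * s2_y


def var_bias(e_s2: Sequence[float], pi_cond: float, N: int, n_a: int) -> float:
    """C * [S^2_{e s2} + sum over s2 of (1 - pi) * e-check^2 / (N - n_a)]."""
    e_chk = expanded_errors(e_s2, pi_cond)
    s2_e = (sum(v * v for v in e_chk) - sum(e_chk) ** 2 / n_a) / (n_a - 1)
    poisson_term = sum((1.0 - pi_cond) * v * v for v in e_chk) / (N - n_a)
    return design_constant(N, n_a) * (s2_e + poisson_term)


def cov_ty_bias(
    y_sa: Sequence[float],
    x_s2: Sequence[float],
    e_s2: Sequence[float],
    pi_cond: float,
    N: int,
) -> float:
    """(C / n_a) * [sum_s2 x * e-check - sum_sa y * sum_s2 e-check / (n_a - 1)]."""
    n_a = len(y_sa)
    e_chk = expanded_errors(e_s2, pi_cond)
    cross = sum(x * v for x, v in zip(x_s2, e_chk))
    product = sum(y_sa) * sum(e_chk) / (n_a - 1)
    return design_constant(N, n_a) / n_a * (cross - product)


# Hypothetical toy data: N = 8, first-phase sample of 4, second-phase sample of 2.
N = 8
y_sa = [4.0, 6.0, 5.0, 7.0]
x_s2, e_s2 = [6.0, 7.0], [1.0, 2.0]

v_y = var_ty(y_sa, N)
v_b = var_bias(e_s2, pi_cond=0.5, N=N, n_a=len(y_sa))
c_yb = cov_ty_bias(y_sa, x_s2, e_s2, pi_cond=0.5, N=N)
mse_tz = v_y + v_b - 2.0 * c_yb  # estimated MSE of the bias-corrected total
```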
Simulation study: purpose
To compare survey estimates under two editing approaches:
Approach 1: editing procedure where selective editing is applied;
Approach 2: editing procedure where, in addition to selective editing, bias correction is carried out.
Simulation study: setup
Population size: $N = 10000$
Sample size: $n_a = 1000$
True values: $z \sim Po(5)$
Observed values:
$$x = \begin{cases} z, & 0 \le u < 0.3 \\ Po(5), & 0.3 \le u < 0.6 \\ Po(2), & 0.6 \le u < 0.8 \\ Po(10), & 0.8 \le u \le 1 \end{cases}$$
where $u \sim U(0, 1)$.
Second-phase inclusion probability: $\pi_{k|s_{a2}}$ constant
Score function: $s_k(x) = x_k - \mu_z$
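The population in this setup could be generated along the following lines. This is a sketch using only the Python standard library; the Poisson sampler (Knuth's multiplication method), the seed and the function names are choices made here, not details from the slides. Under this error model the expected observed value is 0.3·5 + 0.3·5 + 0.2·2 + 0.2·10 = 5.4, against a true mean of 5.

```python
import math
import random
from typing import List, Tuple


def poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def observed_value(z: int, rng: random.Random) -> int:
    """Apply the error model: keep z with probability 0.3, otherwise
    replace it by an independent Po(5), Po(2) or Po(10) draw."""
    u = rng.random()
    if u < 0.3:
        return z
    if u < 0.6:
        return poisson(5.0, rng)
    if u < 0.8:
        return poisson(2.0, rng)
    return poisson(10.0, rng)


def simulate_population(N: int, seed: int) -> Tuple[List[int], List[int]]:
    """Return (true values z, observed values x) for a population of size N."""
    rng = random.Random(seed)
    z = [poisson(5.0, rng) for _ in range(N)]
    x = [observed_value(zk, rng) for zk in z]
    return z, x


z, x = simulate_population(10000, seed=1)
mean_z = sum(z) / len(z)
mean_x = sum(x) / len(x)
```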
Simulation study: results
Table 1: Total estimate and its variance and MSE under two approaches (1000 repetitions).

                    Approach 1      Approach 2
Edited              20%             12% + 8%
$\hat{t}$           46 448          50 067
$B(\hat{t})$        -3 572 (8%)     -47 (0%)
Var                 495 508         5 213 649
$\widehat{Var}$     495 830         5 143 426
MSE                 13 254 435      5 213 649
Emp. MSE            13 356 847      5 289 592
Emp. Root MSE       3 655           2 300
Simulation study: results 2
Instead of the study variable $z$, the correlated variable $w$ is used in the score function.
Table 2: Size of bias and precision of the estimated total under two approaches ($Corr(w, z) = 0.70$, $s_k(w) = w_k - \mu_w$, 1000 repetitions).

App.   Edited       $B(\hat{t}_y)$   SE      95% CI          Root MSE
1      16%          8%               1006    54388 ± 1972    4516
2      9% + 7%      0%               3738    50010 ± 7326    3738
2      5% + 11%     0.2%             2910    50076 ± 5704    2910
2      0% + 16%     0.2%             2376    50074 ± 4657    2376
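The interval half-widths in Table 2 appear consistent with a normal-approximation interval, estimate ± 1.96·SE. That reading is an inference from the reported numbers, not something stated on the slide; a quick arithmetic check:

```python
# (SE, reported 95% CI half-width) for the four rows of Table 2.
rows = [(1006, 1972), (3738, 7326), (2910, 5704), (2376, 4657)]

# Half-widths implied by a normal-approximation interval (assumed z = 1.96).
half_widths = [round(1.96 * se) for se, _ in rows]
matches = [hw == reported for hw, (_, reported) in zip(half_widths, rows)]
```

All four reported half-widths agree with 1.96·SE after rounding to the nearest integer.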
Simulation study: conclusions
Precision: the two-phase approach generally gives a smaller MSE, except when the bias or the subsample is small.
Inference: the two-phase approach makes it possible to describe the effect of editing on the estimates, which is not possible with selective editing.
Implementation: the same number of responses is pursued, but the approaches differ in timeliness and estimation procedure.
Improvement: consider different estimators and ways to draw the second-phase sample more efficiently.
Thank you!