Source: q2008.istat.it/sessions/presentation/28/S2802Ilves.pdf

Probability Sampling Approach to Editing

Maiki Ilves¹, Prof. Thomas Laitila²

¹ Department of Statistics, Örebro University, Sweden
² Department of Statistics, Örebro University and Statistics Sweden

Introduction

The role of editing:

1. To assess the quality of data

2. To improve the survey by identifying error sources

3. To correct errors

Different ways of editing

Traditional micro-editing

Automated editing

Selective editing

Macro-editing

Selective editing - 1

Purpose: prioritize suspicious responses according to their influence on the survey estimates, and edit only the most influential responses.

Three stages:

1. Identify suspicious responses, using editing rules.

2. Prioritize them with a score function, i.e. a function of the measured value and the expected amended value. Scores may be local or global.

3. Determine the cut-off point, in a simulation study based on a fully edited dataset.
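As a rough sketch of stages 2 and 3, the snippet below ranks responses by a local score and keeps a fixed share of them for manual editing. The score form (weight times the absolute difference between the measured and the anticipated value) and the names `local_scores`, `select_for_editing` and `share` are assumptions made for illustration; the slides do not fix a particular score function at this point.

```python
def local_scores(observed, anticipated, weights):
    """Hypothetical local score: weight * |measured - anticipated value|."""
    return [w * abs(x - a) for x, a, w in zip(observed, anticipated, weights)]


def select_for_editing(scores, share):
    """Return indices of the top `share` fraction of responses by score."""
    n_edit = round(share * len(scores))
    # sort indices by descending score; ties keep their original order
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:n_edit]


# Made-up toy data: the second response deviates most from its anticipated value.
scores = local_scores([10, 50, 12, 11], [10, 10, 10, 10], [1, 1, 1, 1])
to_edit = select_for_editing(scores, share=0.5)
```

In practice the cut-off (`share` here) would be tuned on a fully edited dataset, as stage 3 describes.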

Selective editing - 2

Evaluation: relative pseudo-bias

$$\frac{\left|\hat{\theta}_q - \hat{\theta}_{100}\right|}{\mathrm{se}(\hat{\theta}_{100})}$$

where q is the percentage of suspicious responses pursued.
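The relative pseudo-bias is a one-line computation. A minimal sketch, with the hypothetical function name `relative_pseudo_bias` and argument names chosen here for illustration:

```python
def relative_pseudo_bias(theta_q, theta_100, se_100):
    """|theta_q - theta_100| / se(theta_100).

    theta_q   : estimate when q percent of the suspicious responses are pursued
    theta_100 : estimate when all suspicious responses are pursued
    se_100    : standard error of theta_100
    """
    if se_100 <= 0:
        raise ValueError("se_100 must be positive")
    return abs(theta_q - theta_100) / se_100
```

A value well below 1 indicates that the editing left undone shifts the estimate by less than one standard error.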

Selective editing - 3

Advantages

+ Reduced costs
+ Reduced response burden
+ Gain in timeliness

Disadvantages

- How should the effect of editing be taken into account at the estimation stage?
- The influence of edited data on different statistical analyses is not known.
- So far used only on quantitative variables.

Estimating measurement bias

Literature: Madow (1965), Lessler and Kalsbeek (1992), Rao and Sitter (1997)

Bias is estimated through double sampling (two-phase sampling). For all subsampled units the true values are recorded, and the differences between true and observed values are used to estimate the bias.

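Probability sampling approach

Our idea: combine selective editing with bias estimation, and derive an unbiased estimator and its variance for this approach.

[Diagram: population U divided into U1 and U2; first-phase sample sa divided into sa1 and sa2; second-phase sample s2 drawn within sa2.]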

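Unbiased estimator for edited data

Notation:

- $z_k$, $k \in U_1$: true value
- $x_k$, $k \in U_2$: observed value
- $y_k = I_k^{\mathrm{edit}} z_k + (1 - I_k^{\mathrm{edit}}) x_k$, $k \in U$: observed value after selective editing

We want to estimate $t_z = \sum_U z_k$.

The Horvitz-Thompson (HT) estimator $\hat{t}_y = \sum_{s_a} y_k / \pi_{ak}$ is biased.

The estimator of the bias is

$$\hat{B}(\hat{t}_y) = \sum_{s_2} \frac{e_k}{\pi_{ak}\,\pi_{k|s_{a2}}}, \qquad e_k = x_k - z_k.$$

The bias-corrected estimator is $\hat{t}_z = \hat{t}_y - \hat{B}(\hat{t}_y)$.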
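The three estimators above translate directly into code. This is a minimal sketch, not the authors' implementation: the function names, the argument layout (parallel lists per sample) and the toy numbers are all assumptions for illustration.

```python
def ht_total(y_sa, pi_a):
    """HT total over the first-phase sample s_a: sum of y_k / pi_ak."""
    return sum(y / p for y, p in zip(y_sa, pi_a))


def bias_estimate(x_s2, z_s2, pi_a_s2, pi_cond_s2):
    """Sum over s_2 of e_k / (pi_ak * pi_k|sa2), with e_k = x_k - z_k."""
    return sum(
        (x - z) / (pa * pc)
        for x, z, pa, pc in zip(x_s2, z_s2, pi_a_s2, pi_cond_s2)
    )


def bias_corrected_total(y_sa, pi_a, x_s2, z_s2, pi_a_s2, pi_cond_s2):
    """t_z-hat = t_y-hat minus the estimated bias."""
    return ht_total(y_sa, pi_a) - bias_estimate(x_s2, z_s2, pi_a_s2, pi_cond_s2)


# Made-up toy data: four first-phase units, each with pi_ak = 0.5; one
# unedited unit is re-measured in the second phase with pi_k|sa2 = 0.5.
y_sa, pi_a = [4, 6, 5, 7], [0.5, 0.5, 0.5, 0.5]
t_y = ht_total(y_sa, pi_a)
t_z = bias_corrected_total(y_sa, pi_a, [7], [5], [0.5], [0.5])
```

Only units in the second-phase sample need true values, which is what keeps the number of pursued responses comparable to plain selective editing.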

Precision of the estimators

$$\mathrm{MSE}(\hat{t}_y) = V(\hat{t}_y) + B^2(\hat{t}_y).$$

$$\mathrm{MSE}(\hat{t}_z) = V(\hat{t}_y) + V(\hat{B}(\hat{t}_y)) - 2\,C(\hat{t}_y, \hat{B}(\hat{t}_y))$$

where

$$V(\hat{t}_y) = \sum\sum_{U} \Delta_{akl}\,\frac{y_k}{\pi_{ak}}\,\frac{y_l}{\pi_{al}}, \tag{1}$$

$$V(\hat{B}(\hat{t}_y)) = \sum\sum_{U_2} \Delta_{akl}\,\frac{e_k}{\pi_{ak}}\,\frac{e_l}{\pi_{al}} + E_a\left[\sum\sum_{U_2} \Delta_{kl|s_{a2}}\, I_{ak} I_{al}\,\frac{e_k}{\pi_{ak}\pi_{k|s_{a2}}}\,\frac{e_l}{\pi_{al}\pi_{l|s_{a2}}}\right], \tag{2}$$

$$C(\hat{t}_y, \hat{B}(\hat{t}_y)) = \sum_{U}\sum_{U_2} \Delta_{akl}\,\frac{y_k}{\pi_{ak}}\,\frac{e_l}{\pi_{al}}.$$
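The MSE of the bias-corrected estimator follows in one step from its unbiasedness. The short derivation below is added as a reading aid and is not taken from the slides:

```latex
\begin{align*}
\mathrm{MSE}(\hat{t}_z)
  &= V(\hat{t}_z)
     && \text{since } \hat{t}_z \text{ is unbiased} \\
  &= V\bigl(\hat{t}_y - \hat{B}(\hat{t}_y)\bigr) \\
  &= V(\hat{t}_y) + V\bigl(\hat{B}(\hat{t}_y)\bigr)
     - 2\,C\bigl(\hat{t}_y, \hat{B}(\hat{t}_y)\bigr).
\end{align*}
```

Bias correction therefore trades the squared-bias term for the extra variance of the bias estimator, less twice the covariance.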

One example

One specific two-phase design is considered.

First-phase sampling design: SI (simple random sampling without replacement) of size $n_a$; second-phase sampling design: Poisson sampling with inclusion probability $\pi_{k|s_{a2}}$.

Then,

$$\hat{V}(\hat{t}_y) = C\,S^2_{y s_a},$$

$$\hat{V}(\hat{B}(\hat{t}_y)) = C\left[S^2_{\check{e} s_2} + \frac{1}{N - n_a}\sum_{s_2}\left(1 - \pi_{k|s_{a2}}\right)\check{e}_k^2\right],$$

$$\hat{C}(\hat{t}_y, \hat{B}(\hat{t}_y)) = \frac{C}{n_a}\left[\sum_{s_2} x_k \check{e}_k - \frac{1}{n_a - 1}\sum_{s_a} y_k \sum_{s_2}\check{e}_k\right],$$

where $C = (1 - f_a)N^2 / n_a$, $\check{e}_k = e_k / \pi_{k|s_{a2}}$ and

$$S^2_{\check{e} s_2} = \frac{1}{n_a - 1}\left(\sum_{s_2}\check{e}_k^2 - \frac{1}{n_a}\Bigl(\sum_{s_2}\check{e}_k\Bigr)^2\right).$$
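The formulas for this design can be evaluated with a few sums. The sketch below is one reading of them, under stated assumptions: the expanded error is e_k divided by the second-phase inclusion probability, f_a is taken to be the first-phase sampling fraction n_a / N, and the variance of y in s_a is the usual sample variance with divisor n_a - 1. The function name and return layout are hypothetical.

```python
def example_design_terms(y_sa, x_s2, z_s2, pi_cond_s2, N):
    """Variance terms for SI first phase + Poisson second phase (a sketch)."""
    n_a = len(y_sa)
    f_a = n_a / N                      # assumed: first-phase sampling fraction
    C = (1 - f_a) * N ** 2 / n_a

    # assumed: S^2_{y s_a} is the sample variance of y over s_a
    mean_y = sum(y_sa) / n_a
    s2_y = sum((y - mean_y) ** 2 for y in y_sa) / (n_a - 1)

    # expanded errors e_k / pi_k|sa2 for the second-phase sample
    e_chk = [(x - z) / p for x, z, p in zip(x_s2, z_s2, pi_cond_s2)]
    sum_e = sum(e_chk)
    s2_e = (sum(e * e for e in e_chk) - sum_e ** 2 / n_a) / (n_a - 1)

    v_ty = C * s2_y
    v_bias = C * (
        s2_e
        + sum((1 - p) * e * e for p, e in zip(pi_cond_s2, e_chk)) / (N - n_a)
    )
    cov = (C / n_a) * (
        sum(x * e for x, e in zip(x_s2, e_chk)) - sum(y_sa) * sum_e / (n_a - 1)
    )
    return {
        "v_ty": v_ty,
        "v_bias": v_bias,
        "cov": cov,
        "mse_tz": v_ty + v_bias - 2 * cov,
    }


# Made-up toy data, for illustration only.
terms = example_design_terms([4, 6, 5, 7], [7], [5], [0.5], N=10)
```

The last entry combines the three terms as in the MSE expression for the bias-corrected estimator.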

Simulation study: purpose

To compare survey estimates under two editing approaches:

Approach 1 - an editing procedure in which selective editing is applied;

Approach 2 - an editing procedure in which bias correction is carried out in addition to selective editing.

Simulation study: setup

Population size: N = 10000
Sample size: $n_a$ = 1000
True values: $z \sim Po(5)$

Observed values:

$$x = \begin{cases} z, & 0 \le u < 0.3 \\ Po(5), & 0.3 \le u < 0.6 \\ Po(2), & 0.6 \le u < 0.8 \\ Po(10), & 0.8 \le u \le 1 \end{cases}$$

where $u \sim U(0, 1)$.

Second-phase inclusion probability: $\pi_{k|s_{a2}}$ constant
Score function: $s_k(x) = x_k - \mu_z$
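The population in this setup can be generated with the standard library alone. The sketch below is illustrative: the helper `poisson` (Knuth's multiplication method), the function name `simulate_population` and the seed are assumptions, not details from the slides.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_population(N=10_000, seed=1):
    """Return (z, x): true values and error-contaminated observed values."""
    rng = random.Random(seed)
    z = [poisson(rng, 5) for _ in range(N)]
    x = []
    for zk in z:
        u = rng.random()
        if u < 0.3:
            x.append(zk)                # reported correctly
        elif u < 0.6:
            x.append(poisson(rng, 5))   # replaced by an independent Po(5)
        elif u < 0.8:
            x.append(poisson(rng, 2))   # replaced by a Po(2) draw
        else:
            x.append(poisson(rng, 10))  # replaced by a Po(10) draw
    return z, x
```

Under this mixture the expected observed value is 0.3·5 + 0.3·5 + 0.2·2 + 0.2·10 = 5.4, against a true mean of 5, so unedited data overstate the total by roughly 8%.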

Simulation study: results

Table 1: Total estimate, its variance and MSE under the two approaches (1000 repetitions).

| | Approach 1 | Approach 2 |
|---|---|---|
| Edited | 20% | 12% + 8% |
| $\hat{t}$ | 46 448 | 50 067 |
| $B(\hat{t})$ | -3 572 (8%) | -47 (0%) |
| Var | 495 508 | 5 213 649 |
| $\widehat{\mathrm{Var}}$ | 495 830 | 5 143 426 |
| MSE | 13 254 435 | 5 213 649 |
| Emp. MSE | 13 356 847 | 5 289 592 |
| Emp. Root MSE | 3 655 | 2 300 |
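The entries of Table 1 can be cross-checked against the decomposition MSE = Var + B². The snippet below is only an arithmetic check on the reported figures; the helper name `mse` is chosen here for illustration.

```python
import math


def mse(variance, bias):
    """Mean squared error as variance plus squared bias."""
    return variance + bias ** 2


# Approach 1: reported Var = 495 508 and bias = -3 572
approach_1 = mse(495_508, -3_572)

# Empirical root MSE, from the reported empirical MSE values
root_1 = math.sqrt(13_356_847)
root_2 = math.sqrt(5_289_592)
```

For Approach 1 this gives 13 254 692, within a few hundred of the tabulated 13 254 435; the small gap is presumably due to rounding of the reported bias. For Approach 2 the tabulated MSE equals the variance, as expected for an unbiased estimator.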

Simulation study: results 2

Instead of the study variable z, the correlated variable w is used in the score function.

Table 2: Size of bias and precision of the estimated total under the two approaches (Corr(w, z) = 0.70, $s_k(w) = w_k - \mu_w$, 1000 repetitions).

| App. | Edited | $B(\hat{t}_y)$ | SE | 95% CI | Root MSE |
|---|---|---|---|---|---|
| 1 | 16% | 8% | 1006 | 54388 ± 1972 | 4516 |
| 2 | 9% + 7% | 0% | 3738 | 50010 ± 7326 | 3738 |
| 2 | 5% + 11% | 0.2% | 2910 | 50076 ± 5704 | 2910 |
| 2 | 0% + 16% | 0.2% | 2376 | 50074 ± 4657 | 2376 |
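The confidence-interval half-widths in Table 2 are consistent with a normal approximation, half-width = 1.96 × SE. The quantile 1.96 is an assumption made here, as the slides do not state how the intervals were built; the check below uses only the reported standard errors.

```python
Z_95 = 1.96  # assumed: standard normal quantile for a 95% interval

# Reported standard errors, keyed by the "Edited" column of Table 2
reported_se = {"16%": 1006, "9% + 7%": 3738, "5% + 11%": 2910, "0% + 16%": 2376}

half_widths = {label: round(Z_95 * se) for label, se in reported_se.items()}
```

The rounded products reproduce the tabulated half-widths 1972, 7326, 5704 and 4657. For the rows with a two-part edited share, the root MSE equals the SE, which matches a bias close to zero.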

Simulation study: conclusions

Precision: the two-phase approach generally gives a smaller MSE, except when the bias or the subsample is small.

Inference: the two-phase approach makes it possible to describe the effect of editing on the estimates, which is not possible with selective editing alone.

Implementation: the same number of responses is pursued; the differences lie in timeliness and in the estimation procedure.

Improvement: consider different estimators and ways of drawing the second-phase sample more efficiently.

Thank you!
