chapter 7: cluster sampling design 2: two-stage...
TRANSCRIPT
Chapter 7: Cluster sampling design 2:Two-stage sampling
Jae-Kwang Kim
Iowa State University
Fall, 2014
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 1 / 26
Introduction
1 Introduction
2 Estimation
3 Examples
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 2 / 26
Introduction
Two-stage sampling
Setup:1 Stage 1: Draw AI ⊂ UI via pI (·)2 Stage 2: For every i ∈ AI , draw Ai ⊂ Ui via pi (· | AI )
Sample of elements: A = ∪i∈AIAi
Some simplifying assumptions1 Invariance of the second-stage design pi (· | AI ) = pi (·) for every i ∈ UI
and for every AI such that i ∈ AI
2 Independence of the second-stage design
P (∪i∈AIAi | AI ) =
∏i∈AI
Pr (Ai | AI )
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 3 / 26
Introduction
Remark: A non-invariant design is two-phase sampling design.1 Phase 1: Select a sample and observe xi2 Phase 2: Based on the observed value of xi , the second-phase sampling
design is determined. The second-phase sample is selected by thesecond-phase sampling design.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 4 / 26
Introduction
Notation: Sample size
nI : Number of PSU’s in the sample.mi : Number of sampled elements in Ai .∑
i∈AImi = |A|: The number of sampled elements.
Notation: Inclusion probability
Cluster inclusion probability: πIi and πIij (same as in the single-stagecluster sampling)Conditional inclusion probability:
πk|i = Pr [k ∈ Ai | i ∈ AI ]
πkl|i = Pr [k , l ∈ Ai | i ∈ AI ]
∆kl|i = πkl|i − πk|iπl|i .
In general, πk|i is a random variable (in the sense that it is a functionof AI ). Under invariance, it is fixed.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 5 / 26
Introduction
Element inclusion probability
First order inclusion probability
πik = Pr {(ik) ∈ A} = Pr (k ∈ Ai | i ∈ AI ) Pr (i ∈ AI ) = πk|iπIi .
Second order inclusion probability
πik,jl =
πIiπk|i if i = j and k = lπIiπkl |i if i = j and k 6= lπIijπk|iπl |j if i 6= j
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 6 / 26
Estimation
1 Introduction
2 Estimation
3 Examples
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 7 / 26
Estimation
HT estimation
HT estimation for Y =∑
i∈UIYi =
∑i∈UI
∑k∈Ui
yik :
YHT =∑i∈AI
Yi
πIi=∑i∈AI
∑k∈Ai
yikπk|iπIi
Properties of YHT1 Unbiased2 Variance
V(
YHT
)= VPSU + VSSU
where
VPSU =∑i∈UI
∑j∈UI
∆IijYi
πIi
Yj
πIj
VSSU =∑i∈UI
Vi
πIi, Vi = V
(Yi | Ai
)=∑k∈Ui
∑l∈Ui
∆kl|iyikπk|i
yilπl|i
.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 8 / 26
Estimation
Remark
1 If AI = UI , then the design is a stratified sampling. Note that πIi = 1,
πIij = 1, and ∆Iij = 0 for all i , j . Thus, V(
YHT
)=∑
i∈UIVi/1.
2 If Ai = Ui for every i ∈ AI , then the design is single-stage cluster
sampling and V(
YHT
)= VPSU .
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 9 / 26
Estimation
Variance estimation
Variance estimation
V(
YHT
)= VPSU + VSSU
=∑i∈AI
∑j∈AI
∆Iij
πIij
Yi
πIi
Yj
πIj+∑i∈AI
Vi
πIi,
where
VPSU =∑i∈AI
∑j∈AI
∆Iij
πIij
Yi
πIi
Yj
πIj−∑i∈AI
1
πIi
(1
πIi− 1
)Vi
VSSU =∑i∈AI
Vi
π2Ii
and Vi satisfies E(
Vi | AI
)= V
(Yi | AI
).
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 10 / 26
Estimation
Variance estimation (Cont’d)
Here, we used the fact
E(
Yi Yj | AI
)=
{YiYj if i 6= jVi + Y 2
i if i = j .
by the independence of the second-stage sampling across the clusters.
Often,∑
i∈AI
ViπIi
is ignored. (if nI/NI.
= 0).
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 11 / 26
Estimation
Justification
Let
V ∗ =∑i∈AI
∑j∈AI
∆Iij
πIij
Yi
πIi
Yj
πIj.
Then, we have
E{V ∗} = E1
∑i∈AI
∑j∈AI
∆Iij
πIij
1
πIiπIjE2(Yi Yj)
= E
∑i∈AI
∑j∈AI
∆Iij
πIij
Yi
πIi
Yj
πIj
+ E
∑i∈AI
∆Iii
πIii
Vi
π2Ii
=
∑i∈UI
∑j∈UI
∆IijYi
πIi
Yj
πIj+∑i∈UI
πIi − π2Iiπ2Ii
Vi = V (YHT )−∑i∈UI
Vi .
Biased downward. The bias is of order O(NI ).Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 12 / 26
Estimation
Example 1
Two-stage sampling design1 Stage One: Simple random sampling of clusters of size nI from NI
clusters.2 Stage Two: Simple random sampling of size mi from Mi elements in
the sampled cluster i
HT estimator of Y = N−1∑NI
i=1
∑Mij=1 yij , where N =
∑NIi=1 Mi is
assumed to be known:
ˆYHT =NI
nIN
∑i∈AI
Yi =1
nI M
∑i∈AI
Mi
mi
∑j∈Ai
yij
where M = N−1I
∑NIi=1 Mi .
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 13 / 26
Estimation
Example 1 (Cont’d)
Variance
V {YHT} = V
1
nI M
∑i∈AI
Yi
+ E
1
n2I M2
∑i∈AI
M2i
mi
(1− mi
Mi
)1
Mi − 1
∑k∈Ui
(yik − Yi )2
=
1
nI
(1− nI
NI
)S2q1 +
1
nINI M2
NI∑i=1
M2i
mi
(1− mi
Mi
)S22i
where S2q1 = (NI − 1)−1
∑NIi=1(qi − q1)2 with qi = Yi/M,
q1 = N−1I
∑NIi=1 qi , and S2
2i = (Mi − 1)−1∑
k∈Ui(yik − Yi )
2.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 14 / 26
Estimation
Example 1 (Cont’d)
If the sampling rate for the second stage sampling is constant suchthat mi/Mi = f2, then we can write
V {YHT} =1
nI(1− f1)S2
q1 +1
nI m(1− f2)
1
N
NI∑i=1
MiS22i
=1
nI(1− f1)B2 +
1
nI m(1− f2)W 2
where f1 = nI/NI and m = N−1I
∑NIi=1 mi .
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 15 / 26
Estimation
Example 1 (Cont’d)
Minimizing
V (YHT ) =1
nIB2 +
1
nI mW 2
subject toC = c0 + c1nI + c2nI m
lead to
mopt =
√c1c2× W 2
B2
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 16 / 26
Estimation
Example 1 (Cont’d)
If Mi are equal (Mi = M), then the following results are satisfied
We have B2 = S2b/M and W 2 = SSW ·M/(M − 1) · 1/(NIM) = S2
w .Thus, for sufficienty large M, we have S2 = B2 + W 2.
The homogeneity measure defined by
δ =B2
B2 + W 2
is equal to the intracluster correlation coefficient ρ.
Ignoring f1 term,
V { ˆYHT} =1
nI m{1 + (m − 1)δ}S2
y
= VSRS( ˆYHT ) {1 + (m − 1)δ}
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 17 / 26
Estimation
Example 1 (Cont’d)
Even when Mi are unequal, we can still express
V { ˆYHT} =1
nI m{1 + (m − 1)δ} kS2
= VSRS( ˆYHT )k {1 + (m − 1)δ}
where k = (B2 + W 2)/S2. Thus,
deff = k {1 + (m − 1)δ} .
If Mi = M, then k = 1. Otherwise, it is greater than one.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 18 / 26
Examples
1 Introduction
2 Estimation
3 Examples
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 19 / 26
Examples
Example 2 : Special case of Example 1
Sampling design1 Stage 1: Select SRS of clusters of size nI from a population of NI
clusters.2 Stage 2: Select SRS of elements of size m from a cluster of Mi = M
elements at each cluster.
HT estimator of Y =∑NI
i=1
∑Mj=1 yij/(NIM):
ˆY =1
nIm
∑i∈AI
∑j∈Ai
yij
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 20 / 26
Examples
Example 2 (Cont’d)
Variance
V(
ˆY)
=
(1− nI
NI
)S2b
nIM+(
1− m
M
) S2w
nIm
Or,
V(
ˆY)
=
(1− nI
NI
)S21
nI+(
1− m
M
) S22
nIm
where
S21 =
1
NI − 1
NI∑i=1
(Yi − Y
)2=
S2b
M
and
S22 = S2
w =1
NI (M − 1)
NI∑i=1
M∑j=1
(yij − Yi
)2.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 21 / 26
Examples
Example 2 (Cont’d)
Variance components:
S2b = S2 {1 + (M − 1) ρ}
S2w = S2 (1− ρ)
Ignoring f1 = nI/NI ,
V(
ˆY).
=S2
nIm{1 + (m − 1) ρ}
Thus, Design effect = 1 + (m − 1) ρ.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 22 / 26
Examples
Example 2 (Cont’d)
Variance estimation
V(
ˆY)
=
(1− nI
NI
)s21nI
+nI
NI
(1− m
M
) s22nIm
where
s21 = (nI − 1)−1∑i∈AI
(yi − ˆY
)2s22 = n−1I (m − 1)−1
∑i∈AI
∑j∈Ai
(yij − yi )2
and yi =∑
j∈Aiyij/m.
If nI/NI.
= 0, then V ( ˆY ) = s21/nI .
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 23 / 26
Examples
Example 3: Two-stage PPS sampling
Sampling design1 Stage One: PPS sampling of nI clusters with MOS = Mi
2 Stage Two: SRS sampling of m elements in each selected clusters.
Estimation of mean
ˆYPPS =1
nI
nI∑k=1
zk
where zk = ti/Mi if cluster i is selected in the k-th PPS sampling and
ti =Mi
m
∑j∈Ai
yij .
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 24 / 26
Examples
Example 3 (Cont’d)
We can expressˆYPPS =
1
nIm
∑i∈AI
∑j∈Ai
yij .
Self-weighting design: equal weights
Variance estimation
V(
ˆYPPS
)=
1
nIs2z
where
s2z =1
nI − 1
nI∑k=1
(zk − zn)2
and zk = ti/Mi = ˆYi if cluster i is selected in the k-th PPS sampling.
Very popular in practice.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 25 / 26
Examples
Example 3 (Cont’d)
Very popular in practice for the following reasons:1 pi ∝ Mi : efficient2 Point estimation is easy (self-weighting)3 Interviewer workloads are equal4 Simple variance estimation.5 Works for multi-stage sampling design.
Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 26 / 26