chapter 7: cluster sampling design 2: two-stage...

Chapter 7: Cluster sampling design 2:Two-stage sampling

Jae-Kwang Kim

Iowa State University

Fall, 2014

Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 1 / 26

Introduction

1 Introduction

2 Estimation

3 Examples


Introduction

Two-stage sampling

Setup:1 Stage 1: Draw AI ⊂ UI via pI (·)2 Stage 2: For every i ∈ AI , draw Ai ⊂ Ui via pi (· | AI )

Sample of elements: A = ∪i∈AIAi

Some simplifying assumptions1 Invariance of the second-stage design pi (· | AI ) = pi (·) for every i ∈ UI

and for every AI such that i ∈ AI

2 Independence of the second-stage design

P (∪i∈AIAi | AI ) =

∏i∈AI

Pr (Ai | AI )


Introduction

Remark: A non-invariant design is two-phase sampling design.1 Phase 1: Select a sample and observe xi2 Phase 2: Based on the observed value of xi , the second-phase sampling

design is determined. The second-phase sample is selected by thesecond-phase sampling design.


Introduction

Notation: Sample size

nI : Number of PSU’s in the sample.mi : Number of sampled elements in Ai .∑

i∈AImi = |A|: The number of sampled elements.

Notation: Inclusion probability

Cluster inclusion probability: πIi and πIij (same as in the single-stagecluster sampling)Conditional inclusion probability:

πk|i = Pr [k ∈ Ai | i ∈ AI ]

πkl|i = Pr [k , l ∈ Ai | i ∈ AI ]

∆kl|i = πkl|i − πk|iπl|i .

In general, πk|i is a random variable (in the sense that it is a functionof AI ). Under invariance, it is fixed.


Introduction

Element inclusion probability

First order inclusion probability

πik = Pr {(ik) ∈ A} = Pr (k ∈ Ai | i ∈ AI ) Pr (i ∈ AI ) = πk|iπIi .

Second order inclusion probability

πik,jl =

πIiπk|i if i = j and k = lπIiπkl |i if i = j and k 6= lπIijπk|iπl |j if i 6= j


Estimation

1 Introduction

2 Estimation

3 Examples


Estimation

HT estimation

HT estimation for Y =∑

i∈UIYi =

∑i∈UI

∑k∈Ui

yik :

YHT =∑i∈AI

Yi

πIi=∑i∈AI

∑k∈Ai

yikπk|iπIi

Properties of YHT1 Unbiased2 Variance

V(

YHT

)= VPSU + VSSU

where

VPSU =∑i∈UI

∑j∈UI

∆IijYi

πIi

Yj

πIj

VSSU =∑i∈UI

Vi

πIi, Vi = V

(Yi | Ai

)=∑k∈Ui

∑l∈Ui

∆kl|iyikπk|i

yilπl|i

.


Estimation

Remark

1 If AI = UI , then the design is a stratified sampling. Note that πIi = 1,

πIij = 1, and ∆Iij = 0 for all i , j . Thus, V(

YHT

)=∑

i∈UIVi/1.

2 If Ai = Ui for every i ∈ AI , then the design is single-stage cluster

sampling and V(

YHT

)= VPSU .


Estimation

Variance estimation

Variance estimation

V(

YHT

)= VPSU + VSSU

=∑i∈AI

∑j∈AI

∆Iij

πIij

Yi

πIi

Yj

πIj+∑i∈AI

Vi

πIi,

where

VPSU =∑i∈AI

∑j∈AI

∆Iij

πIij

Yi

πIi

Yj

πIj−∑i∈AI

1

πIi

(1

πIi− 1

)Vi

VSSU =∑i∈AI

Vi

π2Ii

and Vi satisfies E(

Vi | AI

)= V

(Yi | AI

).


Estimation

Variance estimation (Cont’d)

Here, we used the fact

E(

Yi Yj | AI

)=

{YiYj if i 6= jVi + Y 2

i if i = j .

by the independence of the second-stage sampling across the clusters.

Often,∑

i∈AI

ViπIi

is ignored. (if nI/NI.

= 0).


Estimation

Justification

Let

V ∗ =∑i∈AI

∑j∈AI

∆Iij

πIij

Yi

πIi

Yj

πIj.

Then, we have

E{V ∗} = E1

∑i∈AI

∑j∈AI

∆Iij

πIij

1

πIiπIjE2(Yi Yj)

= E

∑i∈AI

∑j∈AI

∆Iij

πIij

Yi

πIi

Yj

πIj

+ E

∑i∈AI

∆Iii

πIii

Vi

π2Ii

=

∑i∈UI

∑j∈UI

∆IijYi

πIi

Yj

πIj+∑i∈UI

πIi − π2Iiπ2Ii

Vi = V (YHT )−∑i∈UI

Vi .

Biased downward. The bias is of order O(NI ).Kim (ISU) Ch. 7: Cluster sampling design 2 Fall, 2014 12 / 26

Estimation

Example 1

Two-stage sampling design1 Stage One: Simple random sampling of clusters of size nI from NI

clusters.2 Stage Two: Simple random sampling of size mi from Mi elements in

the sampled cluster i

HT estimator of Y = N−1∑NI

i=1

∑Mij=1 yij , where N =

∑NIi=1 Mi is

assumed to be known:

ˆYHT =NI

nIN

∑i∈AI

Yi =1

nI M

∑i∈AI

Mi

mi

∑j∈Ai

yij

where M = N−1I

∑NIi=1 Mi .


Estimation

Example 1 (Cont’d)

Variance

V {YHT} = V

1

nI M

∑i∈AI

Yi

+ E

1

n2I M2

∑i∈AI

M2i

mi

(1− mi

Mi

)1

Mi − 1

∑k∈Ui

(yik − Yi )2

=

1

nI

(1− nI

NI

)S2q1 +

1

nINI M2

NI∑i=1

M2i

mi

(1− mi

Mi

)S22i

where S2q1 = (NI − 1)−1

∑NIi=1(qi − q1)2 with qi = Yi/M,

q1 = N−1I

∑NIi=1 qi , and S2

2i = (Mi − 1)−1∑

k∈Ui(yik − Yi )

2.


Estimation


If the sampling rate for the second stage sampling is constant suchthat mi/Mi = f2, then we can write

V {YHT} =1

nI(1− f1)S2

q1 +1

nI m(1− f2)

1

N

NI∑i=1

MiS22i

=1

nI(1− f1)B2 +

1

nI m(1− f2)W 2

where f1 = nI/NI and m = N−1I

∑NIi=1 mi .


Estimation


Minimizing

V (YHT ) =1

nIB2 +

1

nI mW 2

subject toC = c0 + c1nI + c2nI m

lead to

mopt =

√c1c2× W 2

B2


Estimation


If Mi are equal (Mi = M), then the following results are satisfied

We have B2 = S2b/M and W 2 = SSW ·M/(M − 1) · 1/(NIM) = S2

w .Thus, for sufficienty large M, we have S2 = B2 + W 2.

The homogeneity measure defined by

δ =B2

B2 + W 2

is equal to the intracluster correlation coefficient ρ.

Ignoring f1 term,

V { ˆYHT} =1

nI m{1 + (m − 1)δ}S2

y

= VSRS( ˆYHT ) {1 + (m − 1)δ}


Estimation


Even when Mi are unequal, we can still express

V { ˆYHT} =1

nI m{1 + (m − 1)δ} kS2

= VSRS( ˆYHT )k {1 + (m − 1)δ}

where k = (B2 + W 2)/S2. Thus,

deff = k {1 + (m − 1)δ} .

If Mi = M, then k = 1. Otherwise, it is greater than one.


Examples

1 Introduction

2 Estimation

3 Examples


Examples

Example 2 : Special case of Example 1

Sampling design1 Stage 1: Select SRS of clusters of size nI from a population of NI

clusters.2 Stage 2: Select SRS of elements of size m from a cluster of Mi = M

elements at each cluster.

HT estimator of Y =∑NI

i=1

∑Mj=1 yij/(NIM):

ˆY =1

nIm

∑i∈AI

∑j∈Ai

yij


Examples


Variance

V(

ˆY)

=

(1− nI

NI

)S2b

nIM+(

1− m

M

) S2w

nIm

Or,

V(

ˆY)

=

(1− nI

NI

)S21

nI+(

1− m

M

) S22

nIm

where

S21 =

1

NI − 1

NI∑i=1

(Yi − Y

)2=

S2b

M

and

S22 = S2

w =1

NI (M − 1)

NI∑i=1

M∑j=1

(yij − Yi

)2.


Examples


Variance components:

S2b = S2 {1 + (M − 1) ρ}

S2w = S2 (1− ρ)

Ignoring f1 = nI/NI ,

V(

ˆY).

=S2

nIm{1 + (m − 1) ρ}

Thus, Design effect = 1 + (m − 1) ρ.


Examples


Variance estimation

V(

ˆY)

=

(1− nI

NI

)s21nI

+nI

NI

(1− m

M

) s22nIm

where

s21 = (nI − 1)−1∑i∈AI

(yi − ˆY

)2s22 = n−1I (m − 1)−1

∑i∈AI

∑j∈Ai

(yij − yi )2

and yi =∑

j∈Aiyij/m.

If nI/NI.

= 0, then V ( ˆY ) = s21/nI .


Examples

Example 3: Two-stage PPS sampling

Sampling design1 Stage One: PPS sampling of nI clusters with MOS = Mi

2 Stage Two: SRS sampling of m elements in each selected clusters.

Estimation of mean

ˆYPPS =1

nI

nI∑k=1

zk

where zk = ti/Mi if cluster i is selected in the k-th PPS sampling and

ti =Mi

m

∑j∈Ai

yij .


Examples


We can expressˆYPPS =

1

nIm

∑i∈AI

∑j∈Ai

yij .

Self-weighting design: equal weights

Variance estimation

V(

ˆYPPS

)=

1

nIs2z

where

s2z =1

nI − 1

nI∑k=1

(zk − zn)2

and zk = ti/Mi = ˆYi if cluster i is selected in the k-th PPS sampling.

Very popular in practice.


Examples


Very popular in practice for the following reasons:1 pi ∝ Mi : efficient2 Point estimation is easy (self-weighting)3 Interviewer workloads are equal4 Simple variance estimation.5 Works for multi-stage sampling design.


chapter 7: cluster sampling design 2: two-stage...

Documents