1
Ch 5: Cluster sampling with equal probabilities
DEFN: A cluster is a group of observation units (or “elements”)
Population | Obs unit | Cluster
U.S. residents | person | household
Lincoln households | household | city block, or postal route
UNL employees | employee | department
Maple trees in Vermont | tree | 1 km x 1 km plot
2
Cluster sample
DEFN: A cluster sample is a probability sample in which a sampling unit is a cluster
Frame | SU | OU
List of phone numbers | phone number | person
List of blocks | block | household
List of UNL departments | department | faculty member
List of plots | plot | tree
3
Cluster sample – 2
1-stage cluster sampling
Divide the population (of K elements) into N clusters (of size Mi for cluster i)
Cluster = group of elements
An element belongs to 1 and only 1 cluster
Sampling unit
Cluster = group of elements = PSU = primary sampling unit
We’ll start by assuming an SRS of clusters (equal prob)
Can use any design to select clusters (STS, PPS); we’ll work with other designs in Ch 6
Data collection
Collect information on ALL elements in the cluster
4
1-stage CS vs. STS
Stratified sample (STS): take an SRS from every stratum
A block of cells is a stratum
SU is an element (or OU)
Sample from every stratum
1-stage cluster sample (CS): take an SRS of clusters; observe all elements within the clusters in the sample
A block of cells is a cluster
SU is a cluster
Don’t sample from every cluster
Sample of 40 elements
5
Cluster vs. stratified sampling
Cluster sample
Divide K elements into N clusters
Cluster or PSU i has Mi elements, so $K = \sum_{i=1}^{N} M_i$
Take a sample of n clusters
Stratified sampling
N elements divided into H strata
An element belongs to 1 and only 1 stratum
Take a sample of n elements, consisting of nh elements from stratum h for each of the H strata
6
Cluster sample – 3
2-stage cluster sampling (later)
Process
Select PSUs (stage 1)
Select elements within each sampled PSU (stage 2)
First stage sampling unit is a … PSU = primary sampling unit = cluster
Second stage sampling unit is a … SSU = secondary sampling unit = element = OU
Only collect data on the SSUs that were sampled from the cluster
7
1-stage vs. 2-stage cluster sampling
Stage 1 of 2-stage cluster sample (select PSUs)
1-stage cluster sample (stop here): sample all SSUs in sampled PSUs
OR
Stage 2 of 2-stage cluster sample (select SSUs w/in PSUs): take an SRS of mi SSUs in sampled PSU i
8
Why use cluster sampling?
May not have a list of OUs for a frame, but a list of clusters may be available
List of Lincoln phone numbers (= group of residents) is available, but a list of Lincoln residents is not available
List of all NE primary and secondary schools (= group of students) is available, but a list of all students in NE schools is not available
May be cheaper to conduct the study if OUs are clustered
Occurs when cost of data collection increases with distance between elements
Household surveys using in-person interviews (household = cluster of people)
Field data collection (plot = cluster of plants, or animals)
9
Defining clusters due to frame limitations A cluster (or PSU) is a group of
elements corresponding to a record (row) in the frame
Example Population = employees in
McDonald’s franchises Element = employee Frame = list of McDonald’s stores PSU = store = cluster of employees
10
Defining clusters to reduce travel costs A cluster (or PSU) is a group of
nearby elements Example
Population = all farms Element = farm Frame = list of sections (1 mi x 1 mi
areas) in rural area PSU = section = cluster of farms
11
Cluster samples usually lead to less precise estimates
Elements within clusters tend to be correlated due to exposure to similar conditions
Members of a household
Employees in a business
Plants or soil within a field plot
We are getting less information than if we selected the same number of unrelated elements
Example: select sample of city blocks (clusters of households)
Ask each household: Should city upgrade storm sewer system?
PSU (city block) 1: no storm sewer, so households will tend to say yes
PSU (city block) 2: new development, so households will tend to say no
12
Defining clusters for improved precision
Define clusters for which within-cluster variation is high (rarely possible)
Make each cluster as heterogeneous as possible
Like making each cluster a mini-population that reflects variation in population
Minimizes the amount of correlation among elements in the cluster
Opposite of the approach to stratification
Large variation among strata, homogeneous within strata
Define clusters that are relatively small
Extreme case is cluster = element
Decreases the number of correlated observations in the sample
13
Example for single-stage cluster sampling w/ equal prob (CSE1)
Dorm has N = 100 suites (clusters)
Each suite has Mi = 4 students (4 elements in cluster i, i = 1, 2, … , N)
Note that there are $K = \sum_{i=1}^{N} M_i = 100(4) = 400$ students in the population
Take SRS of n = 5 suites (clusters)
Ask each student living in each of the 5 suites: How many nights per week do you eat dinner in the dining hall?
Will get observations from a sample of 20 students = 5 suites x 4 students/suite
14
Dorm example – 2
Student | Suite 6 | Suite 21 | Suite 28 | Suite 54 | Suite 89
1 | 5 | 3 | 6 | 5 | 1
2 | 5 | 2 | 4 | 4 | 4
3 | 4 | 4 | 4 | 6 | 3
4 | 6 | 5 | 5 | 6 | 2
Total | 20 | 14 | 19 | 21 | 10
15
Dorm example – 3
SRS of n = 5 dorm rooms
Data on each cluster (all students in dorm room)
ti = total number of dining hall dinners for dorm room i
t2 = 14 dining hall dinners for 4 students in dorm room 2
Estimated total number of dining hall nights for the dorm students
HT estimator of total = pop size x sample mean (of cluster totals)
$\hat{t}_{unb} = \frac{N}{n} \sum_{i=1}^{n} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680$ dining hall dinners
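The estimate of the total can be checked with a few lines of Python. This is a minimal sketch using the five cluster totals from the dorm table; the function name `unbiased_total` is illustrative, not something from the slides.

```python
# Cluster totals t_i for the n = 5 sampled suites (from the dorm table)
cluster_totals = [20, 14, 19, 21, 10]
N = 100  # number of suites (clusters) in the population


def unbiased_total(totals, n_clusters_pop):
    """Unbiased estimate of the population total under an SRS of clusters:
    population number of clusters times the sample mean of the cluster totals."""
    n = len(totals)
    return n_clusters_pop * sum(totals) / n


t_unb = unbiased_total(cluster_totals, N)
print(t_unb)  # 1680.0
```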
16
Notation
Indices
i = index for PSU i
j = index for SSU j in PSU i
Number of PSUs (clusters) in the population: N clusters
Number of SSUs (elements) in a PSU (cluster): Mi elements
Number of SSUs (elements) in the population: $K = \sum_{i=1}^{N} M_i$ elements
In Chapters 1-4, this was designated as N
17
Notation – 2
[Figure: population of N = 12 PSUs, labeled i = 1, …, 12, each containing its SSUs; SSUs j = 1 and j = 7 are marked in PSU i = 9]
N = 12 PSUs
K = 20 + 12 + … + 9 + 16 = 150 SSUs
M1 = 20 SSUs
M2 = 12 SSUs
M11 = 9 SSUs
M12 = 16 SSUs
18
Notation – 3
Response variable for SSU j in PSU i: yij
e.g., age of j-th resident in household i
e.g., whether or not dorm resident j in room i owns a computer
19
Cluster-level population parameters (for cluster i)
Cluster size = Mi elements
Cluster population total: $t_i = \sum_{j=1}^{M_i} y_{ij}$
Note that we observe the cluster population total (or mean or variance) for each sample cluster in 1-stage cluster sampling
We will estimate cluster parameters in 2-stage cluster sampling
20
Cluster-level population parameters (for cluster i) – 2
Cluster population mean: $\bar{y}_{iU} = \frac{1}{M_i} \sum_{j=1}^{M_i} y_{ij} = \frac{t_i}{M_i}$
Within-cluster variance: $S_i^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_{iU})^2$
21
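[Figure: population of PSUs (boxes) and a 1-stage cluster sample, annotated with cluster-level parameters]
Cluster 1: $M_1 = 20$, $t_1 = 99$, $\bar{y}_{1U} = 4.95$, $S_1^2 = 7.00$
Cluster 2: $M_2 = 12$, $t_2 = 46$, $\bar{y}_{2U} = 3.83$, $S_2^2 = 6.88$
Cluster 6: $M_6 = 9$, $t_6 = 30$, $\bar{y}_{6U} = 3.33$, $S_6^2 = 9.00$
Cluster 11: $M_{11} = 9$, $t_{11} = 39$, $\bar{y}_{11U} = 4.33$, $S_{11}^2 = 7.75$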
22
Cluster-level population parameters (for cluster i) – 3
For 1-stage cluster samples
Have a complete enumeration of the cluster elements
Cluster population parameters are known
For 2-stage cluster samples
Observe data on a sample of elements in a cluster
Estimate cluster population parameters
23
Population parameters
Same parameters as in previous chapters, rewritten in notation for cluster sampling
Population size: $K = \sum_{i=1}^{N} M_i$ elements
(** K was referred to as N in previous chapters)
Population total (sum of all cluster totals): $t = \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} = \sum_{i=1}^{N} t_i$
24
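Population Parameters-2
Population mean (of K elements): $\bar{y}_U = \frac{1}{K} \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij}$
Population variance (among K elements): $S^2 = \frac{1}{K - 1} \sum_{i=1}^{N} \sum_{j=1}^{M_i} (y_{ij} - \bar{y}_U)^2$
Variance among N cluster totals: $S_t^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( t_i - \frac{t}{N} \right)^2$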
25
Data from cluster samples
Work with element and cluster-level data
Element data set will have columns for
Cluster id
Element id within cluster
Variable (y)
Will also summarize this data set to generate cluster parameters (1-stage) or estimates of cluster parameters (2-stage)
Cluster id
Cluster total (or estimate)
Cluster mean (or estimate)
Cluster variance (or estimate)
26
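1-stage cluster sample
Element data
i | j | yij
1 | 1 | y11
1 | 2 | y12
1 | 3 | y13
1 | 4 | y14
2 | 1 | y21
2 | 2 | y22
2 | 3 | y23
3 | 1 | y31
…
Cluster summary
i | ti | $\bar{y}_{iU}$ | $S_i^2$
1 | t1 | $\bar{y}_{1U}$ | $S_1^2$
2 | t2 | $\bar{y}_{2U}$ | $S_2^2$
3 | t3 | $\bar{y}_{3U}$ | $S_3^2$
…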
27
Estimation for CSE1
Chapter reading
Section 5.2.1 covers equal sized clusters (Mi constant, read)
We’ll start with 5.2.3 (unequal sized clusters, Mi varies)
Section 5.2.2 covers theory
Two types of estimators
Unbiased – HT estimator
Ratio estimation
Equal probability sample of clusters – assume SRS of clusters
28
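CSE1 unbiased estimation under SRS – total t
Estimator for population total using data collected from a 1-stage cluster sample (SRS of clusters)
$\hat{t}_{unb} = \frac{N}{n} \sum_{i=1}^{n} t_i$
Estimator of variance of $\hat{t}_{unb}$
$\hat{V}(\hat{t}_{unb}) = N^2 \left( 1 - \frac{n}{N} \right) \frac{s_t^2}{n}$, where $s_t^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( t_i - \frac{\hat{t}_{unb}}{N} \right)^2$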
29
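Dorm example – 4
Estimated population total
$\hat{t}_{unb} = \frac{N}{n} \sum_{i=1}^{n} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680$ dining hall dinners
Estimated variance
$s_t^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( t_i - \frac{\hat{t}_{unb}}{N} \right)^2 = \frac{1}{5 - 1} \left[ (20 - 16.8)^2 + \ldots + (10 - 16.8)^2 \right] = 21.7$
$\hat{V}(\hat{t}_{unb}) = N^2 \left( 1 - \frac{n}{N} \right) \frac{s_t^2}{n} = 100^2 \left( 1 - \frac{5}{100} \right) \frac{21.7}{5} = 41,230$
$SE(\hat{t}_{unb}) = \sqrt{41,230} \approx 203.05$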
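The variance calculation for the dorm example can be traced step by step in code. The sketch below assumes only the five cluster totals and N = 100 from the slides; the variable names are illustrative. It reproduces $s_t^2 = 21.7$ and a variance estimate of 41,230, with a standard error of about 203.

```python
import math

cluster_totals = [20, 14, 19, 21, 10]  # t_i for the 5 sampled suites
N = 100                                # suites in the population
n = len(cluster_totals)

t_unb = N * sum(cluster_totals) / n    # estimated population total
mean_total = t_unb / N                 # same as the sample mean of the t_i

# s_t^2: sample variance of the cluster totals (divisor n - 1)
s_t2 = sum((t - mean_total) ** 2 for t in cluster_totals) / (n - 1)

# Variance estimate with the finite population correction (1 - n/N)
var_t_unb = N ** 2 * (1 - n / N) * s_t2 / n
se_t_unb = math.sqrt(var_t_unb)

print(round(s_t2, 1), round(var_t_unb), round(se_t_unb, 2))
```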
30
CSE1 inclusion probability for an element
Two events: A and B
Pr{A and B both occur} = Pr{A occurs} x Pr{B occurs given A occurs}
In our setting
A = sample cluster i
B = sample element j (in cluster i)
Inclusion probability for element j in cluster i
$\pi_{ij}$ = Pr{including element j and cluster i in sample} = Pr{including cluster i in sample} x Pr{including element j given cluster i has been included in sample}
31
CSE1 inclusion probability for an element – 2
Need two pieces
Pr{including cluster i in sample} = n / N
Pr{including element j given cluster i has been included in sample} = 1
Inclusion probability
$\pi_{ij}$ = Pr{including element j and cluster i in sample} = Pr{including cluster i in sample} x Pr{including element j given cluster i has been included in sample} = (n / N) x 1 = n / N
32
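CSE1 weight for an element
Weight for element j in cluster i
Inverse element inclusion probability: $w_{ij} = 1 / \pi_{ij} = N / n$
Estimator using weights
$\hat{t}_{unb} = \sum_{i=1}^{n} \sum_{j=1}^{M_i} w_{ij} y_{ij} = \frac{N}{n} \sum_{i=1}^{n} \sum_{j=1}^{M_i} y_{ij} = \frac{N}{n} \sum_{i=1}^{n} t_i$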
33
Dorm example – 5
Inclusion probability for student j in dorm room i
N = 100 dorm rooms
n = 5 sample dorm rooms
Take all 4 students in dorm room
$\pi_{ij}$ = n / N = 5/100 = 1/20 = 0.05
Weight for student j in dorm room i
$w_{ij}$ = N / n = 20 students
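The weighted form of the estimator can be applied directly to element-level data. The sketch below uses the 20 student responses from the dorm table; the dictionary layout and names are assumptions made for illustration. Each sampled student carries a weight of 20, and the weighted sum reproduces the estimated total of 1680.

```python
# Element-level data: suite id -> dinners per week for each of its 4 students
suites = {
    6: [5, 5, 4, 6],
    21: [3, 2, 4, 5],
    28: [6, 4, 4, 5],
    54: [5, 4, 6, 6],
    89: [1, 4, 3, 2],
}
N = 100            # suites in the population
n = len(suites)    # sampled suites

pi_ij = n / N      # inclusion probability: every student in a sampled suite is observed
w_ij = N / n       # sampling weight = inverse inclusion probability

# Weighted sum over all sampled elements
t_unb = sum(w_ij * y for responses in suites.values() for y in responses)
print(w_ij, t_unb)  # 20.0 1680.0
```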
34
CSE1 unbiased estimation under SRS – mean
Unbiased estimator for population mean $\bar{y}_U$
For SRS, estimator for total divided by number of population elements (OUs)
Units are y-units per element
$\hat{\bar{y}}_{unb} = \frac{1}{K} \hat{t}_{unb}$
$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{1}{K^2} \hat{V}(\hat{t}_{unb})$
35
Dorm example – 6
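$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K} = \frac{1680}{100(4)} = 4.20$ dining hall dinners per student per week
$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2} = \frac{41,230}{400^2} = 0.257688$
$SE(\hat{\bar{y}}_{unb}) = 0.51$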
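Rescaling from the total to the mean is a one-line computation. A minimal sketch, taking the estimated total (1680) and its estimated variance (41,230) from the dorm example as given inputs:

```python
import math

K = 100 * 4          # population size in elements (100 suites x 4 students)
t_unb = 1680.0       # estimated total from the dorm example
var_t_unb = 41230.0  # estimated variance of the total from the dorm example

y_unb = t_unb / K                 # dinners per student per week
var_y_unb = var_t_unb / K ** 2    # variance scales by 1 / K^2
se_y_unb = math.sqrt(var_y_unb)

print(y_unb, var_y_unb, round(se_y_unb, 2))
```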
36
Unbiased estimation – proportion p What is y ?
37
Ratio estimation
Usually ti (cluster total) is correlated with Mi (cluster size)
As Mi (# SSUs/elements in cluster i) increases, value for ti (total of yij for cluster i) increases
Positive correlation between Mi and ti
No intercept
Perfect conditions for SRS ratio estimator
Notation of Ch 3 | Notation of Ch 5
yi (variable of interest) | ti (cluster total)
xi (auxiliary info) | Mi (cluster size)
38
Ratio estimation for CSE1
Estimator for population mean
$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} M_i}$
Units are y-units per element
39
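Ratio estimation for CSE1 – 2
Estimator for variance of ratio estimator of population mean
$\hat{V}(\hat{\bar{y}}_r) = \left( 1 - \frac{n}{N} \right) \frac{1}{n \bar{M}_U^2} \cdot \frac{\sum_{i=1}^{n} (t_i - \hat{\bar{y}}_r M_i)^2}{n - 1} = \left( 1 - \frac{n}{N} \right) \frac{1}{n \bar{M}_U^2} \cdot \frac{\sum_{i=1}^{n} M_i^2 (\bar{y}_{iU} - \hat{\bar{y}}_r)^2}{n - 1}$
$\bar{M}_U$ is average cluster size for population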
40
Ratio estimation for CSE1 – 3
Average cluster size
$\bar{M}_U = \frac{1}{N} \sum_{i=1}^{N} M_i = \frac{K}{N}$
If unknown, can estimate with sample mean of cluster sizes
$\bar{M}_S = \frac{1}{n} \sum_{i=1}^{n} M_i$
41
Dorm example – 7
Estimated population mean
$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} M_i}$
Average cluster size
$\bar{M}_U = \frac{1}{N} \sum_{i=1}^{N} M_i = \frac{K}{N}$
42
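Dorm example – 8
Estimated variance
$\hat{V}(\hat{\bar{y}}_r) = \left( 1 - \frac{n}{N} \right) \frac{1}{n \bar{M}_U^2} \cdot \frac{\sum_{i=1}^{n} (t_i - \hat{\bar{y}}_r M_i)^2}{n - 1} = \left( 1 - \frac{n}{N} \right) \frac{1}{n \bar{M}_U^2} \cdot \frac{\sum_{i=1}^{n} M_i^2 (\bar{y}_{iU} - \hat{\bar{y}}_r)^2}{n - 1}$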
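The slides leave the dorm ratio calculation to be filled in. The following sketch works it through under the assumption that the 1-stage dorm data (five suite totals, all suites of size 4) are used; variable names are illustrative. Because all clusters have the same size here, the ratio estimate and its variance coincide with the unbiased results (4.2 and about 0.2577).

```python
import math

cluster_totals = [20, 14, 19, 21, 10]  # t_i for the sampled suites
cluster_sizes = [4, 4, 4, 4, 4]        # M_i for the sampled suites
N = 100                                # suites in the population
n = len(cluster_totals)

# Ratio estimator of the population mean: sum of totals over sum of sizes
y_r = sum(cluster_totals) / sum(cluster_sizes)

# Average cluster size; here the sample mean equals K / N = 4
M_bar = sum(cluster_sizes) / n

# Residuals t_i - y_r * M_i drive the variance of the ratio estimator
resid_ss = sum((t - y_r * M) ** 2 for t, M in zip(cluster_totals, cluster_sizes))
var_y_r = (1 - n / N) * resid_ss / (n - 1) / (n * M_bar ** 2)
se_y_r = math.sqrt(var_y_r)

print(y_r, var_y_r, se_y_r)
```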
43
Ratio estimation for CSE1 – 4 Estimator for population total
$\hat{t}_r = K \hat{\bar{y}}_r$
$\hat{V}(\hat{t}_r) = K^2 \hat{V}(\hat{\bar{y}}_r)$
44
Dorm example – 9 Estimated population total
Estimated variance
$\hat{t}_r = K \hat{\bar{y}}_r$
$\hat{V}(\hat{t}_r) = K^2 \hat{V}(\hat{\bar{y}}_r)$
45
CSE1: impact of cluster size
If cluster sizes Mi are variable across clusters, generally estimate population parameter with less precision
If ti is related to Mi, then get large variation among cluster totals if Mi is variable
Variance of population parameter estimator (unbiased or ratio) is a function of variation among cluster totals
46
2-stage equal probability cluster sampling (CSE2) CSE2 has 2 stages of sampling
Stage 1. Select SRS of n PSUs from population of N PSUs
Stage 2. Select SRS of mi SSUs from Mi elements in PSU i sampled in stage 1
47
2-stage cluster sampling
Stage 1 of 2-stage cluster sample (select PSUs)
Stage 2 of 2-stage cluster sample (select SSUs w/in PSUs): take an SRS of mi SSUs in sampled PSU i
(In a 1-stage cluster sample: sample all SSUs in sampled PSUs)
48
Motivation for 2-stage cluster samples
Recall motivations for cluster sampling in general Only have access to a frame that lists
clusters Reduce data collection costs by going
to groups of nearby elements (cluster defined by proximity)
49
Motivation for 2-stage cluster samples – 2
Likely that elements in cluster will be correlated
May be inefficient to observe all elements in a sample PSU
Extra effort required to fully enumerate a PSU does not generate that much extra information
May be better to spend resources to sample many PSUs and a small number of SSUs per PSU
Possible opposing force: study costs associated with going to many clusters
50
CSE2 unbiased estimation for population total t
Have a sample of elements from a cluster
We no longer know the value of cluster parameter, ti
Estimate ti using data observed for mi SSUs
$\hat{t}_i = M_i \bar{y}_i = \frac{M_i}{m_i} \sum_{j=1}^{m_i} y_{ij}$
51
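CSE2 unbiased estimation for population total – 2
Approach is to plug estimated cluster totals into CSE1 formula
CSE1: $\hat{t}_{unb} = \frac{N}{n} \sum_{i=1}^{n} t_i = \frac{N}{n} \sum_{i=1}^{n} M_i \bar{y}_{iU}$
CSE2: $\hat{t}_{unb} = \frac{N}{n} \sum_{i=1}^{n} \hat{t}_i = \frac{N}{n} \sum_{i=1}^{n} M_i \bar{y}_i$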
52
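CSE2 unbiased estimation for population total – 3
The variance of $\hat{t}_{unb}$ has 2 components associated with the 2 sampling stages
1. Variation among PSUs
2. Variation among SSUs within PSUs
$\hat{V}(\hat{t}_{unb}) = N^2 \left( 1 - \frac{n}{N} \right) \frac{s_t^2}{n} + \frac{N}{n} \sum_{i=1}^{n} \left( 1 - \frac{m_i}{M_i} \right) M_i^2 \frac{s_i^2}{m_i}$
First term: among PSU; second term: within PSU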
53
CSE2 unbiased estimation for population total – 4
In CSE1, we observe all elements in a cluster
We know ti
Have variance component 1, but no component 2
In CSE2, we sample a subset of elements in a cluster
We estimate ti with $\hat{t}_i$
Component 2 is a function of the estimated variance for $\hat{t}_i$: $\left( 1 - \frac{m_i}{M_i} \right) M_i^2 \frac{s_i^2}{m_i}$
54
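CSE2 unbiased estimation for population total – 5
Estimated variance among cluster totals
$s_t^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \hat{t}_i - \frac{\hat{t}_{unb}}{N} \right)^2$
Estimated variance among elements in a cluster
$s_i^2 = \frac{1}{m_i - 1} \sum_{j=1}^{m_i} (y_{ij} - \bar{y}_i)^2$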
55
CSE2 unbiased estimation for population total – 6
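$\hat{V}(\hat{t}_{unb}) = N^2 \left( 1 - \frac{n}{N} \right) \frac{s_t^2}{n} + \frac{N}{n} \sum_{i=1}^{n} \left( 1 - \frac{m_i}{M_i} \right) M_i^2 \frac{s_i^2}{m_i}$
$s_t^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \hat{t}_i - \frac{\hat{t}_{unb}}{N} \right)^2$
$s_i^2 = \frac{1}{m_i - 1} \sum_{j=1}^{m_i} (y_{ij} - \bar{y}_i)^2$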
56
Dorm example – 10
Stage 2: select 2 students in each room
Student | Rm 6 | Rm 21 | Rm 28 | Rm 54 | Rm 89
1 | 5 | 3 | 6 | 5 | 1
2 | 5 | 2 | 4 | 4 | 4
3 | 4 | 4 | 4 | 6 | 3
4 | 6 | 5 | 5 | 6 | 2
Total | ? | ? | ? | ? | ?
57
Dorm example – 11
Stage 1
Cluster =
N =
n =
SRS
Stage 2
Element =
Mi =
mi =
SRS
$\hat{t}_i$ =
58
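Dorm example – 12
Student (j) | Rm 6 (i=1) | Rm 21 (i=2) | Rm 28 (i=3) | Rm 54 (i=4) | Rm 89 (i=5)
1 | 5 | 3 | 4 | 5 | 4
2 | 6 | 2 | 5 | 4 | 2
Rows to compute for each room:
$\bar{y}_i$
$\hat{t}_i = M_i \bar{y}_i$
$s_i^2 = \frac{1}{m_i - 1} \sum_{j=1}^{m_i} (y_{ij} - \bar{y}_i)^2$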
59
Dorm example – 13
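$\hat{t}_{unb} = \frac{N}{n} \sum_{i=1}^{n} \hat{t}_i$
$s_t^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \hat{t}_i - \frac{\hat{t}_{unb}}{N} \right)^2$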
60
Dorm example – 14
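$\hat{V}(\hat{t}_{unb}) = N^2 \left( 1 - \frac{n}{N} \right) \frac{s_t^2}{n} + \frac{N}{n} \sum_{i=1}^{n} \left( 1 - \frac{m_i}{M_i} \right) M_i^2 \frac{s_i^2}{m_i}$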
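The two-stage dorm calculation is left blank on the slides. A minimal sketch of how it could be carried out follows, assuming the stage 2 sample of 2 students per room shown in the dorm table (values 5, 6; 3, 2; 4, 5; 5, 4; 4, 2) and M_i = 4 for every room. Variable names are illustrative. Under these inputs the estimated total is 1600, with an among-PSU component of 45,600 and a within-PSU component of 320.

```python
import math

# Stage 2 sample: room id -> responses of the m_i = 2 sampled students
sample = {6: [5, 6], 21: [3, 2], 28: [4, 5], 54: [5, 4], 89: [4, 2]}
N = 100          # rooms (PSUs) in the population
M = 4            # students per room (M_i, constant in this example)
n = len(sample)  # sampled rooms

t_hat = {}       # estimated cluster totals, t_hat_i = M_i * ybar_i
within_sum = 0.0 # running sum of (1 - m_i/M_i) * M_i^2 * s_i^2 / m_i
for room, ys in sample.items():
    m = len(ys)
    y_bar = sum(ys) / m
    s2 = sum((y - y_bar) ** 2 for y in ys) / (m - 1)
    t_hat[room] = M * y_bar
    within_sum += (1 - m / M) * M ** 2 * s2 / m

t_unb = N / n * sum(t_hat.values())
s_t2 = sum((t - t_unb / N) ** 2 for t in t_hat.values()) / (n - 1)

var_among = N ** 2 * (1 - n / N) * s_t2 / n   # component 1: among PSUs
var_within = N / n * within_sum               # component 2: within PSUs
var_t_unb = var_among + var_within
se_t_unb = math.sqrt(var_t_unb)

print(t_unb, var_among, var_within, round(se_t_unb, 2))
```

The within-PSU component is small relative to the among-PSU component here, which is typical when elements within a cluster are similar.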
61
CSE2 unbiased estimation for population mean
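Estimator of population mean $\bar{y}_U$
$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K}$
$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2}$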
62
Dorm example – 15
$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K}$
$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2}$
63
CSE2 inclusion probability for an element
Two events: A and B
Pr{A and B both occur} = Pr{A occurs} x Pr{B occurs | A occurs}
“|” denotes “given” (a condition)
In our setting
A = sample cluster i
B = sample element j
Inclusion probability symbols
$\pi_{ij}$ = Pr{including element j and cluster i in sample}
$\pi_i$ = Pr{including cluster i in sample}
$\pi_{j|i}$ = Pr{including element j | cluster i has been included in sample}
64
CSE2 inclusion probability for an element – 2
Need two pieces
$\pi_i$ = Pr{including cluster i in sample} = n / N
$\pi_{j|i}$ = Pr{including element j | cluster i has been included in sample} = mi / Mi
Inclusion probability for element j in cluster i
$\pi_{ij} = \pi_i \pi_{j|i} = \frac{n}{N} \cdot \frac{m_i}{M_i}$
65
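CSE2 weight for an element
Sampling weight for element j in cluster i
$w_{ij} = \frac{1}{\pi_{ij}} = \frac{N}{n} \cdot \frac{M_i}{m_i}$
Estimator for population total
$\hat{t}_{unb} = \sum_{i=1}^{n} \sum_{j=1}^{m_i} w_{ij} y_{ij} = \sum_{i=1}^{n} \sum_{j=1}^{m_i} \frac{N}{n} \frac{M_i}{m_i} y_{ij} = \frac{N}{n} \sum_{i=1}^{n} \frac{M_i}{m_i} \sum_{j=1}^{m_i} y_{ij} = \frac{N}{n} \sum_{i=1}^{n} M_i \bar{y}_i = \frac{N}{n} \sum_{i=1}^{n} \hat{t}_i$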
66
What does equal probability mean in Ch 5?
Clusters (PSUs) sampled using SRS
Equal inclusion probability for stage 1 PSUs (clusters): $\pi_i = \frac{n}{N}$
$\pi_i$ is same for all i
67
What does equal probability mean in Ch 5? – 2
Elements (SSUs) in a given PSU are sampled using SRS
All elements (j) in a sample PSU (i) are selected with equal probability: $\pi_{j|i} = \frac{m_i}{M_i}$
This is a conditional probability (given PSU i)
For a given PSU i, $\pi_{j|i}$ is the same for all elements j
68
What does equal probability mean in Ch 5? – 3
Note that
Equal probability at stage 1 ($\pi_i$)
plus
Equal probability at stage 2 given PSU i ($\pi_{j|i}$)
does NOT imply equal inclusion probability for an element
In fact, element-level (unconditional) inclusion probability is not necessarily constant: $\pi_{ij} = \frac{n}{N} \cdot \frac{m_i}{M_i}$
Depends on cluster size Mi and sample size mi for the cluster to which the element belongs
69
CSE2 ratio estimation for population mean
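Estimator of population mean $\bar{y}_U$
$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} \hat{t}_i}{\sum_{i=1}^{n} M_i} = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{\sum_{i=1}^{n} M_i}$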
70
CSE2 ratio estimation for population mean – 2
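$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{\bar{M}_U^2} \left[ \left( 1 - \frac{n}{N} \right) \frac{s_r^2}{n} + \frac{1}{nN} \sum_{i=1}^{n} \left( 1 - \frac{m_i}{M_i} \right) M_i^2 \frac{s_i^2}{m_i} \right]$
$s_r^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (M_i \bar{y}_i - M_i \hat{\bar{y}}_r)^2 = \frac{1}{n - 1} \sum_{i=1}^{n} M_i^2 (\bar{y}_i - \hat{\bar{y}}_r)^2$
$\bar{M}_U$ can be estimated by the sample mean of $M_i$, or $\bar{M}_S = \frac{1}{n} \sum_{i=1}^{n} M_i$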
71
Dorm example – 16
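Student (j) | Rm 6 (i=1) | Rm 21 (i=2) | Rm 28 (i=3) | Rm 54 (i=4) | Rm 89 (i=5)
1 | 5 | 3 | 4 | 5 | 4
2 | 6 | 2 | 5 | 4 | 2
$\bar{y}_i$ | 5.5 | 2.5 | 4.5 | 4.5 | 3.0
$\hat{t}_i = M_i \bar{y}_i$ | 22 | 10 | 18 | 18 | 12
$s_i^2$ | 0.5 | 0.5 | 0.5 | 0.5 | 2.0
where $s_i^2 = \frac{1}{m_i - 1} \sum_{j=1}^{m_i} (y_{ij} - \bar{y}_i)^2$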
72
Dorm example – 16
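$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} \hat{t}_i}{\sum_{i=1}^{n} M_i}$
$s_r^2 = \frac{1}{n - 1} \sum_{i=1}^{n} M_i^2 (\bar{y}_i - \hat{\bar{y}}_r)^2$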
73
Dorm example – 17
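$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{\bar{M}^2} \left[ \left( 1 - \frac{n}{N} \right) \frac{s_r^2}{n} + \frac{1}{nN} \sum_{i=1}^{n} \left( 1 - \frac{m_i}{M_i} \right) M_i^2 \frac{s_i^2}{m_i} \right]$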
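The dorm ratio calculation for the two-stage sample is left blank on the slides. The sketch below fills it in from the per-room summaries in the Dorm example table (means 5.5, 2.5, 4.5, 4.5, 3.0 and variances 0.5, 0.5, 0.5, 0.5, 2.0), assuming M_i = 4 and m_i = 2 for every room; names are illustrative. Under these inputs the ratio estimate is 4.0 dinners per student with an estimated variance of 0.287.

```python
import math

M = [4, 4, 4, 4, 4]                  # cluster sizes M_i
m = [2, 2, 2, 2, 2]                  # stage 2 sample sizes m_i
y_bar = [5.5, 2.5, 4.5, 4.5, 3.0]    # sample means per room
s2 = [0.5, 0.5, 0.5, 0.5, 2.0]       # within-room sample variances s_i^2
N = 100                              # rooms in the population
n = len(M)

t_hat = [Mi * yi for Mi, yi in zip(M, y_bar)]   # estimated cluster totals
y_r = sum(t_hat) / sum(M)                       # ratio estimate of the mean
M_bar = sum(M) / n                              # average cluster size (sample mean)

# Among-PSU piece: s_r^2 = sum of M_i^2 (ybar_i - y_r)^2 over (n - 1)
s_r2 = sum((Mi * yi - Mi * y_r) ** 2 for Mi, yi in zip(M, y_bar)) / (n - 1)

# Within-PSU piece: sum of (1 - m_i/M_i) * M_i^2 * s_i^2 / m_i
within_sum = sum((1 - mi / Mi) * Mi ** 2 * si / mi
                 for Mi, mi, si in zip(M, m, s2))

var_y_r = ((1 - n / N) * s_r2 / n + within_sum / (n * N)) / M_bar ** 2
se_y_r = math.sqrt(var_y_r)

print(y_r, var_y_r, se_y_r)
```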
74
CSE2 ratio estimation for population total t
$\hat{t}_r = K \hat{\bar{y}}_r$
$\hat{V}(\hat{t}_r) = K^2 \hat{V}(\hat{\bar{y}}_r)$
75
Dorm example – 18
$\hat{t}_r = K \hat{\bar{y}}_r$
$\hat{V}(\hat{t}_r) = K^2 \hat{V}(\hat{\bar{y}}_r)$
76
Coots egg example
Target pop = American coot eggs in Minnedosa, Manitoba
PSU / cluster = clutch (nest)
SSU / element = egg w/in clutch
Stage 1
SRS of n = 184 clutches
N = ??? clutches, but probably pretty large
Stage 2
SRS of mi = 2 from Mi eggs in a clutch
Do not know K = ??? eggs in population, also large
Can count Mi = # eggs in sampled clutch i
Measurement
yij = volume of egg j from clutch i
77
Coots egg example – 2
Scatter plot of volumes $y_{ij}$ vs. i (clutch id)
Double dot pattern: high correlation among eggs WITHIN a clutch
Quite a bit of clutch to clutch variation
Implies
May not have very high precision unless we sample a large number of clutches
Certainly lower precision than if we obtained an SRS of $\sum_{i=1}^{n} m_i = 368$ eggs
[Figure: scatter plot of $y_{ij}$ (egg volume) vs. i (clutch id)]
Could use a side-by-side plot for data with larger cluster sizes – PROC UNIVARIATE w/ BY CLUSTER and PLOTS option
78
Coots egg example – 3
Plot
Rank the mean egg volume for clutch i, $\bar{y}_i$
Plot yij vs. rank for clutch i
Draw a line between yi1 and yi2 to show how close the 2 egg volumes in a clutch are
Observations
Same results as Fig 5.3, but more clear
Small within-cluster variation
Large between-cluster variation
Also see 1 clutch with large WITHIN clutch variation: check data (i = 88)
[Figure: $y_{ij}$ vs. i sorted by $\bar{y}_i$]
79
Coots egg example – 4
Plot si vs. $\bar{y}_i$ for clutch i
Since volumes are always positive, might expect si to increase as $\bar{y}_i$ gets larger
If $\bar{y}_i$ is very small, yi1 and yi2 are likely to be very small and close, giving a small si
See this to moderate degree
Clutch 88 has large si, as noted in previous plot
[Figure: $s_i$ vs. $\bar{y}_i$]
80
Coots egg example – 5
Estimation goal
Estimate $\bar{y}_U$, population mean volume per coot egg in Minnedosa, Manitoba
What estimator?
Unbiased estimation
Don’t know N = total number of clutches or K = total number of eggs in Minnedosa, Manitoba
Ratio estimation
Only requires knowledge of Mi, number of eggs in selected clutch i, in addition to data collected
May want to plot $\hat{t}_i$ versus Mi
81
Coots egg example – 6
Clutch | $M_i$ | $\bar{y}_i$ | $s_i^2$ | $\hat{t}_i$ | $\left( 1 - \frac{2}{M_i} \right) M_i^2 \frac{s_i^2}{2}$ | $(\hat{t}_i - M_i \hat{\bar{y}}_r)^2$
1 | 13 | 3.86 | 0.0094 | 50.23594 | 0.671901 | 318.9232
2 | 13 | 4.19 | 0.0009 | 54.52438 | 0.065615 | 490.4832
3 | 6 | 0.92 | 0.0005 | 5.49750 | 0.005777 | 89.22633
4 | 11 | 3.00 | 0.0008 | 32.98168 | 0.039354 | 31.19576
5 | 10 | 2.50 | 0.0002 | 24.95708 | 0.006298 | 0.002631
6 | 13 | 3.98 | 0.0003 | 51.79537 | 0.023622 | 377.053
7 | 9 | 1.93 | 0.0051 | 17.34362 | 0.159441 | 25.72099
8 | 11 | 2.96 | 0.0051 | 32.57679 | 0.253589 | 26.83682
9 | 12 | 3.46 | 0.0001 | 41.52695 | 0.006396 | 135.4898
10 | 11 | 2.96 | 0.0224 | 32.57679 | 1.108664 | 26.83682
… | … | … | … | … | … | …
180 | 9 | 1.95 | 0.0001 | 17.51918 | 0.002391 | 23.97106
181 | 12 | 3.45 | 0.0017 | 41.43934 | 0.102339 | 133.4579
182 | 13 | 4.22 | 0.00003 | 54.85854 | 0.002625 | 505.3962
183 | 13 | 4.41 | 0.0088 | 57.39262 | 0.630563 | 625.7549
184 | 12 | 3.48 | 0.000006 | 41.81168 | 0.000400 | 142.1994
sum | 1757 | | | 4375.947 | 42.17445 | 11,439.58
var 149.565814
$\hat{\bar{y}}_r$ = 2.490579
82
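Coots egg example – 7
$\hat{\bar{y}}_r = \frac{\sum_{i \in S} \hat{t}_i}{\sum_{i \in S} M_i} = \frac{4375.947}{1757} = 2.49$
$s_r^2 = \frac{1}{n - 1} \sum_{i \in S} (\hat{t}_i - M_i \hat{\bar{y}}_r)^2 = \frac{11,439.58}{183} = 62.511$
Don’t know $\bar{M}_U$; use $\bar{M}_S = \frac{1757}{184} = 9.549$
$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{9.549^2} \left[ \left( 1 - \frac{184}{N} \right) \frac{62.511}{184} + \frac{1}{N} \cdot \frac{42.17}{184} \right]$
Don’t know N, but assumed large, so FPC ≈ 1
2nd term is very small, so approximate SE ignores 2nd term
$SE(\hat{\bar{y}}_r) \approx \frac{1}{9.549} \sqrt{\frac{62.511}{184}} = 0.061$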
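The coots calculation can be reproduced from the column sums in the summary table. The sketch below takes those sums as given inputs; the function `var_y_r` and the trial value of N used to exercise it are hypothetical, since the slides state that N is unknown but large. With the finite population correction set to 1 and the second term dropped, the standard error comes out at about 0.061.

```python
import math

n = 184                   # sampled clutches
sum_M = 1757              # sum of M_i over sampled clutches
sum_t_hat = 4375.947      # sum of estimated clutch totals
sum_within = 42.17445     # sum of (1 - 2/M_i) * M_i^2 * s_i^2 / 2
sum_sq_resid = 11439.58   # sum of (t_hat_i - M_i * y_r)^2

y_r = sum_t_hat / sum_M            # ratio estimate of mean egg volume
s_r2 = sum_sq_resid / (n - 1)
M_bar_S = sum_M / n                # sample mean clutch size, stands in for M_bar_U

# Approximate SE: FPC taken as 1 and the within-clutch term ignored
se_approx = math.sqrt(s_r2 / n) / M_bar_S


def var_y_r(N):
    """Full two-term variance estimate for a hypothetical number of clutches N."""
    return ((1 - n / N) * s_r2 / n + sum_within / (n * N)) / M_bar_S ** 2


print(round(y_r, 2), round(s_r2, 3), round(M_bar_S, 3), round(se_approx, 3))
```

Evaluating `var_y_r` at a very large N gives essentially the same standard error as the approximation, which is the point made on the slide about the second term.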
83
Coots egg example – 8 What is first-stage PSU inclusion
probability?
What is conditional SSU inclusion probability at second stage?
What is unconditional SSU inclusion probability?
84
CSE2: Unbiased vs. ratio estimation
Unbiased estimator can have poor precision if
Cluster sizes (Mi) are unequal
ti (cluster total) is roughly proportional to Mi (cluster size)
Biased (ratio) estimator can be precise if ti is roughly proportional to Mi
This happens frequently in populations where cluster sizes (Mi) vary
85
CSE2: Self-weighting design
Stage 1: Select n PSUs from N PSUs in pop using SRS
Inclusion probability for PSU i: $\pi_i = \frac{n}{N}$
Stage 2: Choose mi proportional to Mi so that mi/Mi is constant, use SRS to select sample
Inclusion probability for SSU j given PSU i: $\pi_{j|i} = \frac{m_i}{M_i} = c$
Unconditional inclusion probability for SSU j in cluster i is constant for all elements: $\pi_{ij} = \frac{n}{N} c$
Inclusion probability may vary in practice because it may not be possible for mi/Mi to be equal to c for all clusters
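A small numeric illustration may help contrast a self-weighting design with one that takes the same number of SSUs from every PSU. All numbers in this sketch (N, n, the cluster sizes, and the fraction c) are hypothetical and chosen so that c times each cluster size is a whole number.

```python
from fractions import Fraction

N, n = 100, 5                        # hypothetical numbers of PSUs (population, sample)
cluster_sizes = [8, 12, 20, 4, 16]   # hypothetical M_i for the n sampled PSUs
c = Fraction(1, 4)                   # hypothetical constant stage 2 sampling fraction


def inclusion_probs(sizes, subsample_sizes):
    """Unconditional pi_ij = (n / N) * (m_i / M_i) for each sampled PSU."""
    return [Fraction(n, N) * Fraction(m_i, M_i)
            for M_i, m_i in zip(sizes, subsample_sizes)]


# Self-weighting: m_i proportional to M_i, so m_i / M_i = c in every PSU
m_prop = [int(c * M_i) for M_i in cluster_sizes]
pi_prop = inclusion_probs(cluster_sizes, m_prop)

# Not self-weighting: the same m_i = 2 in every PSU
m_fixed = [2] * n
pi_fixed = inclusion_probs(cluster_sizes, m_fixed)

print(set(pi_prop))           # a single value: every element has the same probability
print(sorted(set(pi_fixed)))  # several values: probability depends on cluster size
```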
86
Self-weighting designs in general Why are self-weighting samples
appealing?
Are dorm student or coot egg samples self-weighting 2-stage cluster samples?
What other (non-cluster) self-weighting designs have we discussed?
87
Self-weighting designs in general – 2
What is the caveat for variance estimation in self-weighting samples?
No break on variance of estimator – must use proper formula for design
Why are self-weighting samples appealing?
Simple mean estimator
Homogeneous weights tend to make estimates more precise
88
Return to systematic sampling (SYS)
Have a frame, or list of N elements
Determine sampling interval, k
k is the next integer after N/n
Select first element in the list
Choose a random number, R, between 1 & k
R-th element is the first element to be included in the sample
Select every k-th element after the R-th element
Sample includes element R, element R + k, element R + 2k, … , element R + (n-1)k
89
SYS example
Telephone survey of members in an organization about organization’s website use
N = 500 members
Have resources to do n = 75 calls
N / n = 500/75 = 6.67, so k = 7
Random number table entry: 52994
Rule: if pick 1, 2, …, 7, assign as R; otherwise discard #
Select R = 5
Take element 5, then element 5+7 = 12, then element 12+7 = 19, 26, 33, 40, 47, …
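The selection rule can be sketched in a few lines. The function name `systematic_sample` is illustrative, and rounding N/n up with `math.ceil` is an assumption about what "next integer after N/n" means when N/n is not a whole number. The random start is drawn with the standard library rather than a random number table.

```python
import math
import random


def systematic_sample(N, n, R=None, rng=random):
    """Return (k, R, labels) for a systematic sample from a list of N elements.

    k is N/n rounded up; R is a random start between 1 and k unless supplied.
    """
    k = math.ceil(N / n)
    if R is None:
        R = rng.randint(1, k)
    labels = list(range(R, N + 1, k))  # R, R + k, R + 2k, ... while <= N
    return k, R, labels


# Reproduce the slide example: N = 500, n = 75, random start R = 5
k, R, labels = systematic_sample(500, 75, R=5)
print(k, labels[:7], len(labels))  # 7 [5, 12, 19, 26, 33, 40, 47] 71
```

With k = 7 and R = 5 the sketch returns 71 labels rather than 75, because rounding k up shortens the realized sample.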
90
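SYS – 2
Arrange population in rows of length k = 7
R: 1 | 2 | 3 | 4 | 5 | 6 | 7 | i
1 | 2 | 3 | 4 | 5 | 6 | 7 | 1
8 | 9 | 10 | 11 | 12 | 13 | 14 | 2
15 | 16 | 17 | 18 | 19 | 20 | 21 | 3
22 | 23 | 24 | 25 | 26 | 27 | 28 | 4
… | | | | | | | …
491 | 492 | 493 | 494 | 495 | 496 | 497 | 71
498 | 499 | 500 | | | | | 72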
91
Relationship between SYS and cluster sampling
Design relationships
Element = ?
Cluster = ?
Sampling unit(s) = ?
Cluster sampling design = ?
Relationship between frame ordering and expected precision of an estimate from a cluster sample?
Periodic, where cycle of pattern is coincident with sampling interval k
Ordered by X, which is correlated with response variable Y
Random
92
SYS – 3
Suppose X [age of member] is correlated with Y [use of org website]
Sort list by X before selecting sample
k: 1 | 2 | 3 | 4 | 5 | 6 | 7 | X | i
1 | 2 | 3 | 4 | 5 | 6 | 7 | young | 1
8 | 9 | 10 | 11 | 12 | 13 | 14 | | 2
15 | 16 | 17 | 18 | 19 | 20 | 21 | | 3
22 | 23 | 24 | 25 | 26 | 27 | 28 | | 4
… | | | | | | | mid | …
491 | 492 | 493 | 494 | 495 | 496 | 497 | | 71
498 | 499 | 500 | | | | | old | 72