sample design: use of sampling and replicated weights · estimating sampling variance •simple...
TRANSCRIPT
Statistical and operational complexities of the studies I Sample design: Use of sampling and replicated weights
Andrés Sandoval-Hernández – IEA DPC
Workshop on using PISA, PIAAC, TIMSS & PIRLS, TALIS datasets
Ispra, Italy- June 24-27, 2014
Note: These slides were prepared as part of the IEA training portfolio with the collaboration of IEA staff and resource persons.
Table of Contents
• Introduction
• The sampling design
• The assessment design
2
Some of the Challenges...
• Domain is very broad
• Limited testing time (physical & psychological)
• Challenges:
• Need to administer the items in a “sensible“ design
• Need to summarize performance on the items
• Need to account for unreliability of estimates
3
What’s Common…?
• Complex Sample Design
• Probabilistic, stratified, multistage sample designs
• Need to take sample design into account when computing estimates
• Complex Assessment Design
• Multiple matrix sample designs where nobody takes all items, and not all items are given to all test participants.
• Need to take measurement uncertainty into account when computing estimates
4
How Does Sampling Help?
• Impossible to test everyone on everything • Too many people
• Too many items
• Too expensive
• Not necessary to test everyone on everything • Blood sample
• Soup sample
• Some people are tested on some things
• Results should be seen in the context of the student and item sample design
5
Table of Contents
• Introduction
• The sampling design
• The assessment design
6
An Example…
7
An Example…
8
General Sample Design
• Populations of Interest
• TIMSS: 8th and 4th graders
• PIRLS: 4th graders
• PISA: 15 year old students
• PIAAC: adults
• TALIS: teachers (ISCED level 1, 2 and 3)
9
General Sample Design - Students
• Basic student sampling design is referred to as a two-stage stratified cluster sample design • 1st Stage: Selection of schools (PIRLS & TIMSS, PISA, TALIS)
• 2nd Stage (PIRLS & TIMSS): Selection of classes within schools
• 2nd Stage (PISA): Selection of students within schools
• 2nd Stage (TALIS): Selection of teachers within schools
• Consider the alternatives...
10
General Sample Design - PIAAC
• Multistage stratified cluster sampling design • Multistage
• There are different stages/levels of selection
• For example: Municipalities–Block–Household-Person
• Stratified • Selection takes place across different segments of the population
• Achieved by systematic selection across sorted list, or targeted selection within different groups
• Cluster • Multiple individuals are selected from within segments of the
population
• Segments could be municipalities, blocks, etc.
11
PIAAC Sampling Design Sampling Frames
• Three broad types
• Population registers • Administrative lists of residents are maintained at either national or
regional level
• Master samples • Lists of dwelling units or primary sampling units are maintained at
national level for official surveys
• Area frames • A frame of geographic clusters formed by combining adjacent
geographic areas, respecting their population sizes, and taking into consideration travel distances for interviewers
12
PIAAC Sampling Design Sampling Frames
• Required to cover at least 95 percent of the target population
• Limited exclusion (non-coverage) of groups in the target population
• Hard-to-reach groups such as the populations of remote and isolated regions
13
PIAAC Sampling Design Sample Sizes
• Minimum sample size required depended on two variables:
• Number of cognitive domains assessed
• Number of languages in which the assessment was administered
Participating countries had the choice of assessing all three domains or assessing literacy and numeracy only
• Minimum sample size for one language required:
• 5,000 completed cases if all three domains were assessed
• 4,500 if only literacy and numeracy were assessed
14
PIAAC Sampling Design Sample Sizes
• To fully report results in more than one language, the required sample size is either 4,500 or 5,000 cases per reporting language
• When not reporting results separately by language, the required sample size is at least 5,000 completed cases collected in the principal language
• A completed case is defined as an interview in which the respondent provided answers to key background questions, including age, gender, highest level of schooling and employment status, and completed the ‘core’ cognitive instrument
15
Why Do It This Way?
• Among many reasons…
• Availability of information
• Cost reduction
• Ensure representation of target population groups
• Achieve desired precision levels for target groups
• Redundancy
16
What are the Consequences?
• We DO NOT have a “simple random selection/sample” (SRS) from the population
• Think of selecting clusters, and then persons within clusters • Persons within a cluster are likely to be more similar to each other
than to persons in other clusters
• This matters when we compute sampling errors
17
Selecting Individuals vs. Clusters
18
Random vs. Systematic Selection
19
Oversampling
20
Exclusions from the Population
21
Sampling Weights
• Sampling weights are an inverse of the probability of the selection for a person
• They take into account characteristics of the sample and selection procedure • Stratification or disproportional sampling of subgroups
• Adjustments for nonresponse
• Selection probability of each person is known
• Poststratification to external control totals
• Sampling weights must ALWAYS be used to get correct population estimates
22
Computing Means
Weighted Mean xwgt x
wgt_ ( )
*
Unweighted Mean xx
N_ ( )
23
Estimating Sampling Variance
• Simple random sample (SRS) of 4,500 students from all students in the population covers the population diversity better than a sample of 100 schools with 45 students sampled in each school
• A two-stage design has more uncertainty associated with its estimates than a SRS of the same size
• The increase in uncertainty in a two-stage design is directly related to the differences between and within the school
24
Estimating Sampling Variance
• Consider the extreme where all schools are different but within each school, all students are identical
• Sample 100 schools and 45 students per school, the effective sample size is really only 100 as opposed to 4,500
• This example is an extreme case but shows that in general, with such designs, the “effective sample size” will be decreased from the actual sample size
25
Estimating Sampling Variance
• Sampling automatically results in some uncertainty (called “error” or “variance”)
• Which factors can influence the magnitude of variance in a sample? • How we sample
• Sample size
• Variability within the population
• Think of a SRS and how these factors influence the variance of an estimate
26
Using Replicate Weights
• How to properly estimate sampling variance from a complex design? • Replicate samples
• Brief explanation: • Delete different sub-samples from the full sample to form G
replicate samples
• Adjust weights of the remaining units to account for the deleted units – new weights are called replicate weights
• Produce an estimate using the full sample weight and an estimate from each set of replicate weights
• Calculate the variance of these estimates
27
Using Replicate Weights
• The variation of the full sample estimate replicates provide a measure of the variance of the full sample estimate
• Advantages of replication:
• Convenient to use
• Effects of non-response and other adjustments can be reflected in replicate weights
• Estimates can be computed for subpopulations
• Applicable to most statistics
28
Estimating Sampling Variance
• Jackknife Repeated Replication (JK2) used in TIMSS, PIRLS
• Balanced Repeated Replication (BRR) used in PISA, TALIS
• These procedures make use of replicate weights
• In PISA, replicate weights are stored in the database
• In TIMSS, PIRLS and ICCS replicate weights are computed “on the fly”
29
How the JK2 works ...
Strata Cluster R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
1 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
2 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
3 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
4 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
5 1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
6 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
7 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0
8 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0
9 1.0 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0
10 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0
11 1.0 1.0 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0
12 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0
13 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0
14 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0
15 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0
16 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 1.0 1.0
17 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 1.0
18 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0
19 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
20 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0
1
2
9
10
3
4
5
6
7
8
30
How the BRR Fay‘s Variant works...
31
Strata Cluster R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12
1 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
2 1.0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
3 1.0 1.5 0.5 1.5 0.5 1.5 1.5 1.5 0.5 0.5 0.5 1.5 0.5
4 1.0 0.5 1.5 0.5 1.5 0.5 0.5 0.5 1.5 1.5 1.5 0.5 1.5
5 1.0 1.5 0.5 0.5 1.5 0.5 1.5 1.5 1.5 0.5 0.5 0.5 1.5
6 1.0 0.5 1.5 1.5 0.5 1.5 0.5 0.5 0.5 1.5 1.5 1.5 0.5
7 1.0 1.5 1.5 0.5 0.5 1.5 0.5 1.5 1.5 1.5 0.5 0.5 0.5
8 1.0 0.5 0.5 1.5 1.5 0.5 1.5 0.5 0.5 0.5 1.5 1.5 1.5
9 1.0 1.5 0.5 1.5 0.5 0.5 1.5 0.5 1.5 1.5 1.5 0.5 0.5
10 1.0 0.5 1.5 0.5 1.5 1.5 0.5 1.5 0.5 0.5 0.5 1.5 1.5
11 1.0 1.5 0.5 0.5 1.5 0.5 0.5 1.5 0.5 1.5 1.5 1.5 0.5
12 1.0 0.5 1.5 1.5 0.5 1.5 1.5 0.5 1.5 0.5 0.5 0.5 1.5
13 1.0 1.5 0.5 0.5 0.5 1.5 0.5 0.5 1.5 0.5 1.5 1.5 1.5
14 1.0 0.5 1.5 1.5 1.5 0.5 1.5 1.5 0.5 1.5 0.5 0.5 0.5
15 1.0 1.5 1.5 0.5 0.5 0.5 1.5 0.5 0.5 1.5 0.5 1.5 1.5
16 1.0 0.5 0.5 1.5 1.5 1.5 0.5 1.5 1.5 0.5 1.5 0.5 0.5
17 1.0 1.5 1.5 1.5 0.5 0.5 0.5 1.5 0.5 0.5 1.5 0.5 1.5
18 1.0 0.5 0.5 0.5 1.5 1.5 1.5 0.5 1.5 1.5 0.5 1.5 0.5
19 1.0 1.5 1.5 1.5 1.5 0.5 0.5 0.5 1.5 0.5 0.5 1.5 0.5
20 1.0 0.5 0.5 0.5 0.5 1.5 1.5 1.5 0.5 1.5 1.5 0.5 1.5
21 1.0 1.5 0.5 1.5 1.5 1.5 0.5 0.5 0.5 1.5 0.5 0.5 1.5
22 1.0 0.5 1.5 0.5 0.5 0.5 1.5 1.5 1.5 0.5 1.5 1.5 0.5
23 1.0 1.5 1.5 0.5 1.5 1.5 1.5 0.5 0.5 0.5 1.5 0.5 0.5
24 1.0 0.5 0.5 1.5 0.5 0.5 0.5 1.5 1.5 1.5 0.5 1.5 1.5
1
2
3
4
5
6
7
8
9
10
11
12
Estimating Sampling Variance
• Think of the following:
• If I take clusters out, recalculate, and results DO change… • What can I say about the rest of the clusters not sampled?
• What can I say about other samples I could have drawn?
• If I take clusters out, recalculate, and results DO NOT change… • What can I say about the rest of the clusters not sampled?
• What can I say about other samples I could have drawn?
32
Calculating Sampling Variance
• When…
• Using JK2:
• Using BRR (w/Fay):
2
0
1
*R
r
r
Var f
1.0f
2
1
* 1 f
R FayFac
33
Sampling Summary
• Always use sampling weights
• Always take design into account when computing sampling variance
34
Table of Contents
• Introduction
• The sampling design
• The assessment design
35
TIMSS, PIRLS, PIAAC & PISA Assessment Design*
• Rotated block assessment design: blocks of items are rotated among several booklets
• No individual answered all items
• In TIMSS and PIRLS students encounter booklets with both mathematics and science items
• In PISA and PIACC individuals encounter booklets with items from one or more domains
• Each booklet contained both trend items and new items
* TALIS does not include assessment
36
PISA Assessment Design
2000 2003* 2006 2009 2012*
Reading MAJOR Minor Minor MAJOR Minor
Mathematics Minor MAJOR Minor Minor MAJOR
Science Minor Minor MAJOR Minor Minor
37 * Problem solving
PIAAC Assessment Design
Source: http://www.oecd.org/site/piaac/ 38
Thank you for your attention!
Any questions?
39