sample design: use of sampling and replicated weights · estimating sampling variance •simple...

Statistical and operational complexities of the studies I

Sample design: Use of sampling and replicated weights

Andrés Sandoval-Hernández – IEA DPC

Workshop on using PISA, PIAAC, TIMSS & PIRLS, TALIS datasets

Ispra, Italy, June 24-27, 2014

Note: These slides were prepared as part of the IEA training portfolio with the collaboration of IEA staff and resource persons.

Table of Contents

• Introduction

• The sampling design

• The assessment design


Some of the Challenges...

• Domain is very broad

• Limited testing time (physical & psychological)

• Challenges:

• Need to administer the items in a “sensible” design

• Need to summarize performance on the items

• Need to account for unreliability of estimates


What’s Common…?

• Complex Sample Design

• Probabilistic, stratified, multistage sample designs

• Need to take sample design into account when computing estimates

• Complex Assessment Design

• Multiple-matrix sampling designs: no participant takes all items, and each item is given to only a subset of the test participants

• Need to take measurement uncertainty into account when computing estimates


How Does Sampling Help?

• Impossible to test everyone on everything:

• Too many people

• Too many items

• Too expensive

• Not necessary to test everyone on everything:

• Blood sample

• Soup sample

• Some people are tested on some things

• Results should be seen in the context of the student and item sample design


Table of Contents

• Introduction

• The sampling design

• The assessment design

An Example…


General Sample Design

• Populations of Interest

• TIMSS: 8th and 4th graders

• PIRLS: 4th graders

• PISA: 15-year-old students

• PIAAC: adults

• TALIS: teachers (ISCED level 1, 2 and 3)


General Sample Design - Students

• The basic student sampling design is referred to as a two-stage stratified cluster sample design

• 1st Stage: Selection of schools (PIRLS & TIMSS, PISA, TALIS)

• 2nd Stage (PIRLS & TIMSS): Selection of classes within schools

• 2nd Stage (PISA): Selection of students within schools

• 2nd Stage (TALIS): Selection of teachers within schools

• Consider the alternatives...


General Sample Design - PIAAC

• Multistage stratified cluster sampling design

• Multistage:

• There are different stages/levels of selection

• For example: Municipality – Block – Household – Person

• Stratified:

• Selection takes place across different segments of the population

• Achieved by systematic selection across a sorted list, or targeted selection within different groups

• Cluster:

• Multiple individuals are selected from within segments of the population

• Segments could be municipalities, blocks, etc.
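Systematic selection across a sorted list can be sketched as follows. This is a minimal illustration, assuming an equal-probability draw with a random start; the function name `systematic_sample` and the toy frame of municipalities are hypothetical and not taken from the PIAAC documentation. Operational designs often select units with probability proportional to size instead.

```python
import random

def systematic_sample(sorted_frame, n, rng):
    """Equal-probability systematic sample of n units from a sorted list."""
    N = len(sorted_frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    interval = N / n                  # sampling interval
    start = rng.random() * interval   # random start inside the first interval
    picks = []
    for k in range(n):
        # Step through the list one interval at a time; clamp guards float rounding.
        idx = min(int(start + k * interval), N - 1)
        picks.append(sorted_frame[idx])
    return picks

# Hypothetical frame: 20 municipalities sorted by region (regions act as implicit strata).
frame = [(region, f"municipality_{region}{i}") for region in "ABCD" for i in range(5)]
sample = systematic_sample(frame, 4, random.Random(1))
print(sample)
```

Because the list is sorted by region and the interval equals the region size, the draw takes exactly one unit from each region, which is how sorting spreads the sample across segments of the population.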


PIAAC Sampling Design: Sampling Frames

• Three broad types:

• Population registers: administrative lists of residents maintained at either national or regional level

• Master samples: lists of dwelling units or primary sampling units maintained at national level for official surveys

• Area frames: a frame of geographic clusters formed by combining adjacent geographic areas, respecting their population sizes and taking into consideration travel distances for interviewers

PIAAC Sampling Design: Sampling Frames

• Frames were required to cover at least 95 percent of the target population

• Limited exclusion (non-coverage) of groups in the target population was allowed

• For example, hard-to-reach groups such as the populations of remote and isolated regions

PIAAC Sampling Design: Sample Sizes

• The minimum required sample size depended on two variables:

• Number of cognitive domains assessed

• Number of languages in which the assessment was administered

• Participating countries had the choice of assessing all three domains or assessing literacy and numeracy only

• Minimum sample size required for one language:

• 5,000 completed cases if all three domains were assessed

• 4,500 if only literacy and numeracy were assessed

PIAAC Sampling Design: Sample Sizes

• To fully report results in more than one language, the required sample size is either 4,500 or 5,000 cases per reporting language

• When not reporting results separately by language, the required sample size is at least 5,000 completed cases collected in the principal language

• A completed case is defined as an interview in which the respondent provided answers to key background questions, including age, gender, highest level of schooling and employment status, and completed the ‘core’ cognitive instrument


Why Do It This Way?

• Among many reasons…

• Availability of information

• Cost reduction

• Ensure representation of target population groups

• Achieve desired precision levels for target groups

• Redundancy


What are the Consequences?

• We DO NOT have a “simple random selection/sample” (SRS) from the population

• Think of selecting clusters, and then persons within clusters

• Persons within a cluster are likely to be more similar to each other than to persons in other clusters

• This matters when we compute sampling errors


Selecting Individuals vs. Clusters


Random vs. Systematic Selection


Oversampling


Exclusions from the Population


Sampling Weights

• A sampling weight is the inverse of a person's probability of selection

• Weights take into account characteristics of the sample and the selection procedure:

• Stratification or disproportional sampling of subgroups

• Adjustments for nonresponse

• Selection probability of each person is known

• Poststratification to external control totals

• Sampling weights must ALWAYS be used to get correct population estimates
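As a rough illustration of "inverse of the selection probability", the sketch below builds a base weight for a two-stage design and applies a simple nonresponse adjustment. The function names and all numbers are hypothetical; the actual weighting procedures of each study involve more steps than shown here.

```python
def base_weight(p_school, p_within):
    """Inverse of the overall selection probability in a two-stage design."""
    return 1.0 / (p_school * p_within)

def nonresponse_adjusted(weight, n_sampled, n_responding):
    """Spread the weight of nonrespondents over the respondents of the same group."""
    return weight * n_sampled / n_responding

# Hypothetical case: school drawn with probability 0.10, 25 of its 100 students sampled.
w = base_weight(0.10, 25 / 100)   # each sampled student stands for about 40 students
# Only 20 of the 25 sampled students took part, so each respondent stands for about 50.
w_adj = nonresponse_adjusted(w, n_sampled=25, n_responding=20)
print(w, w_adj)
```

Summing the final weights over all respondents gives an estimate of the size of the target population, which is a useful sanity check on any weighted analysis.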


Computing Means

Weighted Mean: $\bar{x}_{wgt} = \dfrac{\sum_{i} wgt_i \, x_i}{\sum_{i} wgt_i}$

Unweighted Mean: $\bar{x} = \dfrac{\sum_{i} x_i}{N}$
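The difference between the two means can be seen in a small sketch. The scores and weights below are made up for illustration, and the function names are not from any particular package.

```python
def unweighted_mean(x):
    """Sum of the values divided by the number of cases."""
    return sum(x) / len(x)

def weighted_mean(x, wgt):
    """Sum of weight * value divided by the sum of the weights."""
    if len(x) != len(wgt):
        raise ValueError("x and wgt must have the same length")
    return sum(w * xi for w, xi in zip(wgt, x)) / sum(wgt)

# Hypothetical scores; the third person represents many more people than the others.
scores = [400.0, 500.0, 600.0]
weights = [10.0, 10.0, 80.0]

print(unweighted_mean(scores))         # treats every sampled case alike
print(weighted_mean(scores, weights))  # pulled towards the heavily weighted case
```

When all weights are equal, the two functions return the same value, which is why the distinction only matters for designs with unequal selection probabilities.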


Estimating Sampling Variance

• A simple random sample (SRS) of 4,500 students drawn from all students in the population covers the population's diversity better than a sample of 100 schools with 45 students sampled in each school

• A two-stage design has more uncertainty associated with its estimates than an SRS of the same size

• The increase in uncertainty in a two-stage design is directly related to how large the differences between schools are relative to the differences within schools



• Consider the extreme where all schools are different but within each school, all students are identical

• If we sample 100 schools and 45 students per school, the effective sample size is really only 100, as opposed to 4,500

• This example is an extreme case but shows that in general, with such designs, the “effective sample size” will be decreased from the actual sample size
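One common way to quantify this is Kish's design-effect approximation, which is not stated on the slides and is used here as an assumption: with equal cluster sizes m and an intraclass correlation rho, the design effect is 1 + (m - 1) * rho, and the effective sample size is the actual sample size divided by it. The sketch below reproduces the extreme case above (rho = 1).

```python
def design_effect(cluster_size, icc):
    """Kish approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Actual sample size divided by the design effect."""
    n = n_clusters * cluster_size
    return n / design_effect(cluster_size, icc)

# Extreme case: students within a school are identical (rho = 1).
print(effective_sample_size(100, 45, 1.0))   # behaves like 100 independent cases
# Opposite extreme: schools do not differ at all (rho = 0).
print(effective_sample_size(100, 45, 0.0))   # behaves like an SRS of 4,500
# A hypothetical intermediate value of rho.
print(effective_sample_size(100, 45, 0.2))
```

Real designs fall between the two extremes, so the effective sample size lies somewhere between the number of clusters and the number of persons.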



• Sampling automatically results in some uncertainty (called “error” or “variance”)

• Which factors can influence the magnitude of variance in a sample?

• How we sample

• Sample size

• Variability within the population

• Think of an SRS and how these factors influence the variance of an estimate
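For an SRS of n units drawn without replacement from a population of N units, the textbook variance of the sample mean makes two of these factors explicit. The notation is an assumption of this note rather than something shown on the slides: S² denotes the population variance of the variable of interest.

```latex
\operatorname{Var}(\bar{x}) = \left(1 - \frac{n}{N}\right)\frac{S^{2}}{n}
```

A larger sample size n shrinks the variance, while a more variable population (larger S²) inflates it. The third factor, how we sample, does not appear because the formula only holds for an SRS; clustered designs need a different estimator.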


Using Replicate Weights

• How to properly estimate sampling variance from a complex design?

• Replicate samples

• Brief explanation:

• Delete different sub-samples from the full sample to form G replicate samples

• Adjust the weights of the remaining units to account for the deleted units; the new weights are called replicate weights

• Produce an estimate using the full sample weight and an estimate from each set of replicate weights

• Calculate the variance of these estimates
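The steps above can be sketched on a toy data set. Everything here is hypothetical: four persons, two replicate weight sets written out by hand, and a weighted mean as the statistic. The scaling applied to the squared deviations depends on the replication method, so the sketch stops at the deviations themselves.

```python
def weighted_mean(x, w):
    """Weighted mean of x with weights w."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def replicate_estimates(x, full_weight, replicate_weights, stat=weighted_mean):
    """Return the full-sample estimate and one estimate per replicate weight set."""
    full = stat(x, full_weight)
    reps = [stat(x, rw) for rw in replicate_weights]
    return full, reps

# Toy data: persons 0 and 1 form one pair of clusters, persons 2 and 3 another.
x = [480.0, 520.0, 450.0, 550.0]
full_w = [10.0, 10.0, 10.0, 10.0]
rep_w = [
    [0.0, 20.0, 10.0, 10.0],   # replicate 1: drop person 0, double person 1
    [10.0, 10.0, 20.0, 0.0],   # replicate 2: drop person 3, double person 2
]

full, reps = replicate_estimates(x, full_w, rep_w)
squared_deviations = [(r - full) ** 2 for r in reps]
print(full, reps, squared_deviations)
```

Passing a different function as `stat` (for example a weighted percentage) reuses the same replicate weights, which is one reason replication is convenient for many kinds of statistics.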


Using Replicate Weights

• The variation of the replicate estimates around the full sample estimate provides a measure of the variance of the full sample estimate

• Advantages of replication:

• Convenient to use

• Effects of non-response and other adjustments can be reflected in replicate weights

• Estimates can be computed for subpopulations

• Applicable to most statistics



• Jackknife Repeated Replication (JK2) used in TIMSS, PIRLS

• Balanced Repeated Replication (BRR) used in PISA, TALIS

• These procedures make use of replicate weights

• In PISA, replicate weights are stored in the database

• In TIMSS, PIRLS and ICCS replicate weights are computed “on the fly”


How the JK2 works ...

Strata  Cluster  R0   R1   R2   R3   R4   R5   R6   R7   R8   R9   R10
1       1        1.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
1       2        1.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
2       3        1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
2       4        1.0  1.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
3       5        1.0  1.0  1.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
3       6        1.0  1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
4       7        1.0  1.0  1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0
4       8        1.0  1.0  1.0  1.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0
5       9        1.0  1.0  1.0  1.0  1.0  2.0  1.0  1.0  1.0  1.0  1.0
5       10       1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0
6       11       1.0  1.0  1.0  1.0  1.0  1.0  2.0  1.0  1.0  1.0  1.0
6       12       1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  1.0  1.0
7       13       1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0  1.0  1.0  1.0
7       14       1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  1.0
8       15       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0
8       16       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0  1.0  1.0
9       17       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0  1.0
9       18       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0
10      19       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0
10      20       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0
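A table of this shape can be generated with a few lines of code. The sketch below is an illustration under stated assumptions, not the procedure used by any study's software: it assumes every stratum (sampling zone) holds exactly two clusters, and the function name `jk2_multipliers` is hypothetical. In each replicate, one randomly chosen cluster of one stratum gets multiplier 0 and its partner gets 2; all other clusters keep 1.

```python
import random

def jk2_multipliers(zones, rng):
    """Rows of [R0, R1, ..., RG] weight multipliers, one row per cluster.

    zones[i] is the stratum (zone) of cluster i; each zone must hold two clusters.
    """
    zone_ids = sorted(set(zones))
    members = {z: [i for i, zi in enumerate(zones) if zi == z] for z in zone_ids}
    for z, idx in members.items():
        if len(idx) != 2:
            raise ValueError(f"zone {z} does not contain exactly two clusters")
    rows = [[1.0] * (len(zone_ids) + 1) for _ in zones]   # R0 and defaults are 1.0
    for g, z in enumerate(zone_ids, start=1):
        dropped = rng.choice(members[z])                  # cluster deleted in replicate g
        for i in members[z]:
            rows[i][g] = 0.0 if i == dropped else 2.0     # partner is doubled
    return rows

# 20 clusters paired into 10 strata, as in the table above.
zones = [z for z in range(1, 11) for _ in range(2)]
table = jk2_multipliers(zones, random.Random(2014))
for cluster, row in enumerate(table, start=1):
    print(zones[cluster - 1], cluster, row)
```

Multiplying the full sample weight by column Rg gives the g-th replicate weight. Which cluster of a pair is dropped depends on the random draw, so the 0.0/2.0 pattern will generally differ from the table above.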

How the BRR (Fay’s Variant) works...

Strata  Cluster  R0   R1   R2   R3   R4   R5   R6   R7   R8   R9   R10  R11  R12
1       1        1.0  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5
1       2        1.0  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
2       3        1.0  1.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5
2       4        1.0  0.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5
3       5        1.0  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5
3       6        1.0  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5
4       7        1.0  1.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5
4       8        1.0  0.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5
5       9        1.0  1.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5
5       10       1.0  0.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5
6       11       1.0  1.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5
6       12       1.0  0.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5
7       13       1.0  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5
7       14       1.0  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5
8       15       1.0  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5
8       16       1.0  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5
9       17       1.0  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5
9       18       1.0  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5
10      19       1.0  1.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5
10      20       1.0  0.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5
11      21       1.0  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5
11      22       1.0  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5
12      23       1.0  1.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5
12      24       1.0  0.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5
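The 1.5/0.5 pattern follows from a matrix of +1/-1 signs, one row per stratum and one column per replicate: where the sign is +1 the first cluster of the pair gets 2 - k and the second gets k, and the reverse where it is -1, with Fay factor k = 0.5. The sketch below illustrates this on a small case. It builds the sign matrix with the Sylvester construction, which only yields orders that are powers of two, so it uses 4 strata rather than the 12 of the table; the function names are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of +1/-1 entries; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]
    while len(H) < order:
        # Doubling step: [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def fay_multipliers(H, k=0.5):
    """Two rows of [R0, R1, ...] per stratum row of H (one row per cluster)."""
    rows = []
    for signs in H:
        rows.append([1.0] + [2.0 - k if s > 0 else k for s in signs])
        rows.append([1.0] + [k if s > 0 else 2.0 - k for s in signs])
    return rows

H = sylvester_hadamard(4)
table = fay_multipliers(H)          # 8 clusters in 4 strata, 4 replicates
for cluster, row in enumerate(table, start=1):
    print((cluster + 1) // 2, cluster, row)
```

Unlike JK2, every replicate perturbs every stratum, and no cluster is ever dropped completely because the smaller multiplier is k rather than 0.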


• Think of the following:

• If I take clusters out, recalculate, and results DO change…

• What can I say about the rest of the clusters not sampled?

• What can I say about other samples I could have drawn?

• If I take clusters out, recalculate, and results DO NOT change…

• What can I say about the rest of the clusters not sampled?

• What can I say about other samples I could have drawn?

Calculating Sampling Variance

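$Var(\hat{\theta}) = f \cdot \sum_{r=1}^{R} \left(\hat{\theta}_r - \hat{\theta}_0\right)^2$

where $\hat{\theta}_0$ is the full sample estimate, $\hat{\theta}_r$ is the estimate from replicate $r$, and $R$ is the number of replicates.

• When…

• Using JK2: $f = 1.0$

• Using BRR (w/Fay): $f = \dfrac{1}{R \cdot (1 - FayFac)^2}$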
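The calculation can be written as one small function that switches the factor f by method. This is a sketch with made-up replicate estimates; the function name, the method labels and the default Fay factor of 0.5 (the value implied by the 1.5/0.5 multipliers in the BRR table) are assumptions of this example.

```python
import math

def sampling_variance(full_est, rep_ests, method, fay_factor=0.5):
    """f times the sum of squared deviations of replicate estimates from the full estimate."""
    sum_sq = sum((r - full_est) ** 2 for r in rep_ests)
    if method == "JK2":
        f = 1.0
    elif method == "BRR-Fay":
        f = 1.0 / (len(rep_ests) * (1.0 - fay_factor) ** 2)
    else:
        raise ValueError("method must be 'JK2' or 'BRR-Fay'")
    return f * sum_sq

# Hypothetical full sample estimate and five replicate estimates.
full = 500.0
reps = [503.0, 498.0, 501.0, 496.0, 502.0]

var_jk2 = sampling_variance(full, reps, "JK2")
var_brr = sampling_variance(full, reps, "BRR-Fay")
print(var_jk2, math.sqrt(var_jk2))   # variance and standard error under JK2
print(var_brr, math.sqrt(var_brr))   # variance and standard error under BRR with Fay
```

The same replicate estimates give different variances under the two factors, so the method must match the way the replicate weights in the data set were actually constructed.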


Sampling Summary

• Always use sampling weights

• Always take design into account when computing sampling variance


Table of Contents

• Introduction

• The sampling design

• The assessment design

TIMSS, PIRLS, PIAAC & PISA Assessment Design*

• Rotated block assessment design: blocks of items are rotated among several booklets

• No individual answered all items

• In TIMSS and PIRLS students encounter booklets with both mathematics and science items

• In PISA and PIAAC individuals encounter booklets with items from one or more domains

• Each booklet contained both trend items and new items

* TALIS does not include assessment
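A rotated block design can be illustrated with a simple cyclic scheme. This is a generic sketch, not the booklet map of any of the studies: the number of blocks, the number of blocks per booklet and the function name `rotated_booklets` are all hypothetical.

```python
from collections import Counter

def rotated_booklets(blocks, blocks_per_booklet):
    """Booklet b holds blocks b, b+1, ... (cyclically), so blocks rotate across booklets."""
    n = len(blocks)
    if not 0 < blocks_per_booklet < n:
        raise ValueError("each booklet must hold fewer blocks than exist in total")
    return [
        [blocks[(start + j) % n] for j in range(blocks_per_booklet)]
        for start in range(n)
    ]

blocks = [f"B{i}" for i in range(1, 8)]      # seven hypothetical item blocks
booklets = rotated_booklets(blocks, 2)       # seven booklets of two blocks each
counts = Counter(block for booklet in booklets for block in booklet)

for number, booklet in enumerate(booklets, start=1):
    print(number, booklet)
print(counts)
```

In this scheme no booklet contains all blocks, every block appears in the same number of booklets, and each block appears once in each position, which balances position effects across the item pool.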


PISA Assessment Design

Domain       2000   2003*  2006   2009   2012*
Reading      MAJOR  Minor  Minor  MAJOR  Minor
Mathematics  Minor  MAJOR  Minor  Minor  MAJOR
Science      Minor  Minor  MAJOR  Minor  Minor

* Problem solving was also assessed

PIAAC Assessment Design

Source: http://www.oecd.org/site/piaac/

Thank you for your attention!

Any questions?

