module b-4: processing ict survey data training course on the production of statistics on the...

Module B-4: Processing ICT survey data. Training Course on the Production of Statistics on the Information Economy. UNCTAD Manual, Chapter 7.

Upload: clark-gross

Post on 14-Dec-2015

Module B-4: Processing ICT survey data

TRAINING COURSE ON THE PRODUCTION OF STATISTICS ON THE INFORMATION ECONOMY

UNCTAD Manual, Chapter 7

Objectives

After completing this module you will know how to carry out:

• Data processing
• Data weighting (grossing-up)
• Data editing
• Data analysis

Contents of this module

4. Data processing and analysis

4.1 Data editing

4.2 Data weighting

4.3 Estimating ICT indicators

Data editing

Editing! What is editing?

Statistical information provided by businesses can contain errors such as:

• Wrong or missing data
• Incorrect classifications
• Inconsistent or illogical responses

Solutions to minimize such errors:

• Ex ante: optimize the effectiveness of data capture instruments and collection procedures.
• Ex post: apply robust data editing techniques.

B4.1. Data editing, page 82

Phases of data processing

Raw data → Quality controls during data collection and entry → Data editing → Clean data file

Data editing covers:

• Treatment of internal errors and inconsistencies
• Estimation of missing data
• Outlier analysis
• Re-weighting procedures
• Editing of aggregates

The diagram distinguishes micro-editing (input) from macro-editing (output).

B4.1. Data editing

Internal inconsistencies and errors

Validity control of an individual data item requires:

1. Defining a valid set of responses (in general, gender should be 0 or 1, age should not be 110 years, etc.; in ICT surveys, use of the Internet by a business should be 0 or 1).

2. Checking questions against valid responses:

- Define rules based on relationships between questions (see Box 15 of the Manual: some logical tests).

3. Performing arithmetic checks during data entry or in batch mode (totals, subtotals, frequencies).

B4.3. Estimating ICT indicators, page 82
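The three kinds of check listed on this slide can be sketched in a few lines of Python. This is only an illustration: the field names (`uses_computers`, `employees_total`, and so on) and the specific rules are hypothetical and are not taken from Box 15 of the Manual.

```python
# Hypothetical questionnaire items and their valid response sets (step 1).
VALID_RESPONSES = {
    "uses_computers": {0, 1},
    "uses_internet": {0, 1},
}


def check_record(record):
    """Return a list of problems found in one questionnaire record."""
    problems = []

    # Step 2: check each item against its valid set of responses.
    for item, valid in VALID_RESPONSES.items():
        if record.get(item) not in valid:
            problems.append(f"{item}: value {record.get(item)!r} is not a valid response")

    total = record["employees_total"]
    online = record["employees_using_internet"]

    # Step 2 (rules between questions); both rules are assumed for illustration.
    if online > total:
        problems.append("more employees using the Internet than employees in total")
    if record.get("uses_internet") == 0 and online > 0:
        problems.append("employees use the Internet but the business reports no Internet use")

    # Step 3: arithmetic check (subtotals must add up to the total).
    if record["employees_male"] + record["employees_female"] != total:
        problems.append("male + female employees do not add up to the total")

    return problems


good = {"uses_computers": 1, "uses_internet": 1, "employees_total": 10,
        "employees_male": 6, "employees_female": 4, "employees_using_internet": 5}
bad = {"uses_computers": 1, "uses_internet": 2, "employees_total": 10,
       "employees_male": 6, "employees_female": 5, "employees_using_internet": 12}

for name, rec in (("good", good), ("bad", bad)):
    print(name, check_record(rec))
```

Returning a list of messages rather than stopping at the first failure lets the same function be used both interactively at data entry and in batch mode.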

Treatment of missing data

Final non-response (missing data) should be treated to avoid biased estimates.

Unit non-response treatment: corrective weighting.

• Sample-based methods (the original weights are modified using sample information)

• Population-based methods (the weights are modified using population information; the classical post-stratification procedure)

B4.3. Estimating ICT indicators, page 84
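A minimal sketch of the two families of corrective weighting, under stated assumptions. The sample-based version assumes a simple weighting-class adjustment in which the design weight is inflated by the inverse response rate of the class; the population-based version assumes post-stratification to known population counts. Function and variable names are illustrative, not from the Manual.

```python
def adjust_for_nonresponse(design_weight, n_sampled, n_responded):
    """Sample-based correction: inflate the design weight by the inverse
    response rate observed in the weighting class (assumed method)."""
    if n_responded == 0:
        raise ValueError("no respondents in this class; collapse classes first")
    return design_weight * n_sampled / n_responded


def poststratify(weights, groups, population_counts):
    """Population-based correction: rescale the weights so that they add up
    to the known population count of each post-stratum."""
    weighted_totals = {}
    for w, g in zip(weights, groups):
        weighted_totals[g] = weighted_totals.get(g, 0.0) + w
    return [w * population_counts[g] / weighted_totals[g]
            for w, g in zip(weights, groups)]


# A stratum with 200 businesses, 20 sampled, 16 responding:
# the design weight 200/20 = 10 becomes 12.5, and 16 * 12.5 = 200.
print(adjust_for_nonresponse(10.0, 20, 16))

# Four respondents with weight 10 each, post-stratified to counts 30 and 50.
print(poststratify([10, 10, 10, 10], ["a", "a", "b", "b"], {"a": 30, "b": 50}))
```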

Treatment of missing data (cont.)

Final non-response (missing data) should be treated in order to avoid biased estimates.

Item non-response treatment: imputation.

• Deterministic imputation (the value follows from a rule applied to other responses)
• Hot deck imputation (the value is borrowed from a similar respondent in the current survey)
• Cold deck imputation (the value comes from other information: models, econometrics…)
• Mean or modal value imputation (the mean or mode of the observed responses)
• Historical imputation (the value is taken from earlier periods, which requires long series)

B4.3. Estimating ICT indicators

Page 151, Annex 5
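Two of the imputation methods listed on this slide can be sketched as follows. The hot deck variant shown is a sequential hot deck within imputation classes, which is one common form and is an assumption here; the record layout (`size`, `turnover`) is hypothetical.

```python
from statistics import mean


def mean_impute(values):
    """Mean value imputation: replace each missing value (None) with the
    mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def sequential_hot_deck(records, item, class_key):
    """Sequential hot deck: a missing item takes the value of the most recent
    respondent (donor) in the same class. A record with no earlier donor in
    its class is left missing."""
    last_donor = {}
    result = []
    for rec in records:
        rec = dict(rec)  # work on a copy, leave the input untouched
        cls = rec[class_key]
        if rec[item] is None:
            if cls in last_donor:
                rec[item] = last_donor[cls]
        else:
            last_donor[cls] = rec[item]
        result.append(rec)
    return result


records = [
    {"size": "small", "turnover": 100},
    {"size": "small", "turnover": None},   # receives 100 from the donor above
    {"size": "large", "turnover": None},   # no donor yet, stays missing
    {"size": "large", "turnover": 900},
]
print(mean_impute([1, None, 3]))
print(sequential_hot_deck(records, "turnover", "size"))
```

Because the sequential hot deck depends on file order, the file is usually sorted by a variable related to the item before imputing.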

Misclassified units

Two cases of misclassification:

1. Non-eligible unit erroneously included

• This will reduce the effective sample size unless a reserve list is prepared.

2. Eligible unit included in the wrong stratum or omitted from the frame altogether

• The technical solution consists of recalculating sample weights (see Box 17).

B4.3. Estimating ICT indicators

Page 86

Some simple weighting methods

The sample average in stratum h is defined as

$$\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$$

where $n_h$ is the sample size in stratum h.

The estimate of the total for stratum h is obtained by multiplying the stratum average by the total number of businesses in the stratum ($N_h$):

$$Y'_h = N_h \cdot \bar{y}_h$$

B4.2. Data weighting

Some simple weighting methods (cont.)

The estimate for the total in the population is just

$$Y' = Y'_1 + Y'_2 + \dots + Y'_L = N_1 \cdot \bar{y}_1 + N_2 \cdot \bar{y}_2 + \dots + N_L \cdot \bar{y}_L$$

or

$$Y' = N_1 \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} y_{1i} + N_2 \cdot \frac{1}{n_2}\sum_{i=1}^{n_2} y_{2i} + \dots + N_L \cdot \frac{1}{n_L}\sum_{i=1}^{n_L} y_{Li} = \sum_{h=1}^{L} \frac{N_h}{n_h}\sum_{i=1}^{n_h} y_{hi}$$

See Boxes 18 and 19, page 89.

B4.3. Estimating ICT indicators
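As a small worked sketch of the grossing-up formula, each stratum total is the stratum size times the sample average, and the population total is the sum over strata. The data and the function names are made up for illustration.

```python
def stratum_total(N_h, sample_values):
    """Estimated total for one stratum: N_h times the sample average."""
    n_h = len(sample_values)
    return N_h * sum(sample_values) / n_h


def population_total(strata):
    """Estimated population total: sum of the stratum totals.
    `strata` is a list of (N_h, sample values) pairs."""
    return sum(stratum_total(N_h, ys) for N_h, ys in strata)


# Two strata: 100 businesses (3 sampled) and 50 businesses (2 sampled).
strata = [(100, [2, 4, 6]), (50, [10, 20])]
print(population_total(strata))  # 100 * 4 + 50 * 15 = 1150.0
```

Equivalently, every sampled unit in stratum h carries the weight N_h / n_h, and the total is the weighted sum of the sample values.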

Estimating proportions and ratios

ICT indicators are mainly proportions and ratios.

• A proportion
• A ratio: $p = X' / Y'$

Four types of estimates are very common:

1. Simple random sampling of a non-stratified population
2. Stratified random sampling
• With one or several strata exhaustively investigated
3. Ratio estimates with simple random sampling
4. Ratio estimates with stratified random sampling

B4.3. Estimating ICT indicators

CASE 1: Simple random sampling of a non-stratified population

The indicator can be expressed as the sample proportion:

$$\hat{p} = \frac{\sum_{i=1}^{n} w_i a_i}{\sum_{i=1}^{n} w_i} = \frac{\sum_{i=1}^{n} (N/n)\, a_i}{\sum_{i=1}^{n} (N/n)} = \frac{(N/n)\sum_{i=1}^{n} a_i}{N} = \frac{\sum_{i=1}^{n} a_i}{n}$$

where every unit has the same weight $w_i = N/n$.

The standard error (SE) of the sample proportion is estimated by an expression that is valid with a sampling fraction of 10% or less.

B4.3. Estimating ICT indicators
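A minimal sketch of Case 1, assuming 0/1 responses a_i. The slide's SE expression did not survive in this transcript, so the code uses the common textbook estimate sqrt(p(1 - p) / (n - 1)), which ignores the finite population correction (reasonable for small sampling fractions). Treat that formula as an assumption rather than as the Manual's expression.

```python
import math


def srs_proportion(a):
    """Sample proportion of 0/1 responses under simple random sampling."""
    return sum(a) / len(a)


def weighted_proportion(a, N):
    """Same estimate written with explicit weights w_i = N / n."""
    n = len(a)
    w = [N / n] * n
    return sum(wi * ai for wi, ai in zip(w, a)) / sum(w)


def srs_proportion_se(a):
    """Assumed SE estimate: sqrt(p(1 - p) / (n - 1)), no finite population
    correction."""
    n = len(a)
    p = srs_proportion(a)
    return math.sqrt(p * (1 - p) / (n - 1))


# Ten hypothetical businesses; 1 = uses the Internet, 0 = does not.
answers = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
print(srs_proportion(answers), srs_proportion_se(answers))
```

With equal weights the weighted and unweighted proportions coincide, which is the simplification shown in the chain of equalities on the slide.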

CASE 2: Stratified random sampling

An unbiased estimate of p is:

$$\hat{p} = \frac{1}{N}\sum_{h=1}^{L} \frac{N_h}{n_h}\sum_{i=1}^{n_h} a_{hi} = \sum_{h=1}^{L} \frac{N_h}{N}\,\hat{p}_h$$

Where,
L: the number of strata
$N_h$: the population size in stratum h (h = 1, 2, ... L)
$n_h$: the sample size in stratum h (h = 1, 2, ... L)

The estimate of the SE of $\hat{p}$ is:

$$SE(\hat{p}) = \sqrt{\sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^{2} V(\hat{p}_h)}$$

See Annex 4 of the Manual for more details.

B4.3. Estimating ICT indicators
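A short sketch of the stratified estimate: stratum proportions are combined with weights N_h / N, and the variance is the sum of squared stratum weights times the stratum variances. The slide does not spell out V(p_h); the code assumes the simple within-stratum estimate p_h(1 - p_h) / (n_h - 1) without finite population correction. The data are invented.

```python
import math


def stratified_proportion(strata):
    """Return (p_hat, SE) for a stratified sample.
    `strata` is a list of (N_h, list of 0/1 responses) pairs."""
    N = sum(N_h for N_h, _ in strata)
    p_hat = 0.0
    variance = 0.0
    for N_h, a in strata:
        n_h = len(a)
        p_h = sum(a) / n_h
        # Assumed within-stratum variance estimate (not given on the slide).
        v_h = p_h * (1 - p_h) / (n_h - 1)
        p_hat += (N_h / N) * p_h
        variance += (N_h / N) ** 2 * v_h
    return p_hat, math.sqrt(variance)


# Two strata with 300 and 100 businesses, four sampled in each.
strata = [(300, [1, 0, 1, 1]), (100, [0, 0, 1, 0])]
p_hat, se = stratified_proportion(strata)
print(p_hat, se)  # p_hat = 0.75 * 0.75 + 0.25 * 0.25 = 0.625
```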

CASE 3: Ratio estimates with simple random sampling

The indicator to estimate is:

$$p = \frac{Y}{X} = \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i}$$

The natural estimate of ratio p is:

$$\hat{r} = \frac{\hat{Y}}{\hat{X}} = \frac{(N/n)\sum_{i=1}^{n} y_i}{(N/n)\sum_{i=1}^{n} x_i} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$

Finally, one approximation of the SE is:

$$SE(\hat{r}) = \frac{1}{\bar{x}}\sqrt{\frac{N-n}{N\,n}\cdot\frac{\sum_{i=1}^{n}\left(y_i - \hat{r}\,x_i\right)^{2}}{n-1}}$$

where $\bar{x}$ is the sample average of the n X observations,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

This is a reference outside the scope of our course.

B4.3. Estimating ICT indicators
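The ratio estimate and its approximate SE can be sketched directly from the definitions on this slide: r is the ratio of the sample sums, and the SE combines the factor (N - n) / (N n), the residuals y_i - r x_i, and the sample mean of x. The data and the function name are hypothetical.

```python
import math


def ratio_estimate(y, x, N):
    """Return (r_hat, SE) for a ratio under simple random sampling.
    y and x are the paired sample observations, N the population size."""
    n = len(y)
    r_hat = sum(y) / sum(x)
    x_bar = sum(x) / n
    # Sum of squared residuals around the fitted ratio line y = r_hat * x.
    ss_resid = sum((yi - r_hat * xi) ** 2 for yi, xi in zip(y, x))
    se = (1 / x_bar) * math.sqrt((N - n) / (N * n) * ss_resid / (n - 1))
    return r_hat, se


# y = employees using the Internet, x = total employees, for 3 of 30 businesses.
r_hat, se = ratio_estimate([1, 2, 4], [2, 2, 4], N=30)
print(r_hat, se)

# When y is exactly proportional to x the residuals vanish and the SE is 0.
print(ratio_estimate([2, 4, 6], [1, 2, 3], N=30))
```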