Module B-4: Processing ICT survey data
TRAINING COURSE ON THE PRODUCTION OF STATISTICS ON THE INFORMATION ECONOMY
UNCTAD Manual, Chapter 7
Module B4: Processing ICT survey data
UNCTAD
Objectives
After completing this module you will know how to carry out:
• Data processing
• Data weighting (grossing-up)
• Data editing
• Data analysis
Contents of this module
4. Data processing and analysis
4.1 Data editing
4.2 Data weighting
4.3 Estimating ICT indicators
Data editing
Statistical information provided by businesses can contain errors such as:
• Wrong or missing data
• Incorrect classifications
• Inconsistent or illogical responses
Solutions to minimize such errors:
Ex ante: optimize the effectiveness of
• data capture instruments
• collection procedures
Ex post: apply robust data editing techniques
Editing! What is editing?
B4.1. Data editing Page 82
Phases of data processing
[Diagram: Raw data; Quality controls during data collection and entry; Data editing; Clean data file]
Data editing steps shown in the diagram:
• Treatment of internal errors and inconsistencies
• Estimation of missing data
• Outlier analysis
• Re-weighting procedures
• Editing of aggregates
Levels of editing: Micro-editing (input) and Macro-editing (output)
B4.1. Data editing
Internal inconsistencies and errors
Validity control of an individual data item requires:
1. Defining a valid set of responses (in general, gender should be 0 or 1, age should not be 110 years, etc.; in ICT surveys, use of the Internet by a business should be 0 or 1)
2. Checking questions against valid responses
- Defining rules based on relationships between questions (see Box 15 of the Manual: some logical tests)
3. Running arithmetic checks during data entry or in batch mode (totals, subtotals, frequencies)
B4.3. Estimating ICT indicators Page 82
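The three kinds of check listed on this slide can be sketched in a few lines of Python. This is only an illustration: the field names (`uses_internet`, `has_website`, `employees_male`, `employees_female`, `employees_total`) and the logical rule are hypothetical and are not taken from the Manual or from Box 15.

```python
def check_record(rec):
    """Return a list of edit failures for one questionnaire (a dict).

    Field names and rules are illustrative assumptions, not the Manual's.
    """
    errors = []

    # 1. Valid set of responses: yes/no items must be coded 0 or 1.
    for field in ("uses_internet", "has_website"):
        if rec.get(field) not in (0, 1):
            errors.append(f"{field}: value outside valid set {{0, 1}}")

    # 2. Logical rule between questions (assumed rule for this sketch):
    #    a business reporting no Internet use should not report a website.
    if rec.get("uses_internet") == 0 and rec.get("has_website") == 1:
        errors.append("has_website=1 but uses_internet=0")

    # 3. Arithmetic check: subtotals must add up to the reported total.
    subtotal = rec.get("employees_male", 0) + rec.get("employees_female", 0)
    if subtotal != rec.get("employees_total"):
        errors.append("employee subtotals do not add up to total")

    return errors


record = {
    "uses_internet": 0,
    "has_website": 1,
    "employees_male": 3,
    "employees_female": 2,
    "employees_total": 6,
}
for problem in check_record(record):
    print(problem)
```

Returning a list of failures rather than stopping at the first one lets the same function be used interactively at data entry or in batch mode over a whole file.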
Treatment of missing data
Final non-response (missing data) should be treated to avoid biased estimates.
Unit non-response treatment: corrective weighting.
• Sample-based methods (the original weights are modified using sample information)
• Population-based methods (the weights are modified using population information; the classical post-stratification procedure)
B4.3. Estimating ICT indicators Page 84
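As an illustration of a sample-based correction, the sketch below inflates each stratum's design weight by the inverse of its response rate. This is one common approach (a weighting-class adjustment) and is not necessarily the procedure the Manual prescribes; it assumes that respondents and non-respondents within a stratum are similar. The function name and the stratum labels are hypothetical.

```python
def nonresponse_adjusted_weights(frame_counts, sample_counts, respondent_counts):
    """Adjust design weights for unit non-response, stratum by stratum.

    frame_counts[h]      = N_h, businesses in the frame for stratum h
    sample_counts[h]     = n_h, businesses selected in stratum h
    respondent_counts[h] = r_h, businesses that actually responded
    """
    weights = {}
    for h, N_h in frame_counts.items():
        n_h = sample_counts[h]
        r_h = respondent_counts[h]
        if r_h == 0:
            raise ValueError(f"stratum {h!r} has no respondents")
        design_weight = N_h / n_h   # original sampling weight
        adjustment = n_h / r_h      # inverse of the response rate
        weights[h] = design_weight * adjustment
    return weights


weights = nonresponse_adjusted_weights(
    frame_counts={"small": 1000, "large": 100},
    sample_counts={"small": 50, "large": 20},
    respondent_counts={"small": 40, "large": 20},
)
print(weights)
```

In a stratum with full response the adjustment factor is 1, so the design weight is left unchanged.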
Treatment of missing data (cont.)
Final non-response (missing data) should be treated in order to avoid biased estimates.
Item non-response treatment: imputation.
• Deterministic imputation (a law).
• Hot deck imputation (let's do it now).
• Cold deck imputation (using other information, models, econometrics…).
• Mean or modal value imputation (it is clear).
• Historical imputation (long series).
B4.3. Estimating ICT indicators
Page 151, Annex 5
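Mean and modal value imputation are the simplest of the methods listed and can be sketched as follows. The example assumes missing items are coded as `None` and that imputation is done within a group of similar businesses; the variable names are hypothetical.

```python
from statistics import mean, mode


def impute_mean(values):
    """Replace missing entries (None) with the mean of the reported values."""
    reported = [v for v in values if v is not None]
    fill = mean(reported)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace missing entries (None) with the most frequent reported value."""
    reported = [v for v in values if v is not None]
    fill = mode(reported)
    return [fill if v is None else v for v in values]


# Quantitative item: mean imputation.
turnover = [10, None, 30]
print(impute_mean(turnover))

# Yes/no item (0 or 1): modal value imputation.
uses_internet = [1, 1, None, 0]
print(impute_mode(uses_internet))
```

Mean imputation suits quantitative items, while the modal value is the natural choice for categorical items such as 0/1 indicators.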
Misclassified units
Two cases of misclassification:
Non-eligible unit erroneously included
• This will reduce the effective sample size unless a reserve list is prepared
Eligible unit included in the wrong stratum or omitted from the frame altogether
• The technical solution consists of recalculating sample weights (see Box 17)
B4.3. Estimating ICT indicators
Page 86
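Some simple weighting methods
The sample average in stratum h is defined as
$$\bar{y}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi}$$
where $n_h$ is the number of sampled businesses in stratum h and $y_{hi}$ is the value observed for business i in that stratum.
The estimate of the total for stratum h is obtained by multiplying the stratum average by the total number of businesses in the stratum ($N_h$):
$$Y'_h = N_h \, \bar{y}_h$$
B4.2. Data weighting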
Some simple weighting methods (cont.)
The estimate of the total in the population is just
$$Y' = Y'_1 + Y'_2 + \dots + Y'_L = N_1 \bar{y}_1 + N_2 \bar{y}_2 + \dots + N_L \bar{y}_L$$
or
$$Y' = N_1 \frac{1}{n_1} \sum_{i=1}^{n_1} y_{1i} + N_2 \frac{1}{n_2} \sum_{i=1}^{n_2} y_{2i} + \dots + N_L \frac{1}{n_L} \sum_{i=1}^{n_L} y_{Li} = \sum_{h=1}^{L} \frac{N_h}{n_h} \sum_{i=1}^{n_h} y_{hi}$$
See Boxes 18 and 19, page 89
B4.3. Estimating ICT indicators
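The grossing-up formula can be checked with a short sketch. The data layout (a dictionary mapping each stratum to its frame count and its list of sample values) and the stratum names are assumptions made for this example.

```python
def estimate_total(strata):
    """Estimate a population total from a stratified sample.

    strata maps a stratum label to (N_h, sample_values), where N_h is the
    number of businesses in the stratum and sample_values are the y_hi.
    """
    total = 0.0
    for N_h, sample in strata.values():
        y_bar_h = sum(sample) / len(sample)  # sample average in stratum h
        total += N_h * y_bar_h               # stratum total Y'_h = N_h * y_bar_h
    return total


# y = 1 if the business uses the Internet, 0 otherwise (made-up values).
strata = {
    "small": (100, [1, 0, 1, 1]),
    "large": (10, [1, 1]),
}
print(estimate_total(strata))
```

Each sampled business in stratum h effectively carries the weight N_h / n_h, which is why the same total can also be written as a weighted sum over all sampled units.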
Estimating proportions and ratios
ICT indicators are mainly proportions and ratios.
A proportion or a ratio:
$$p = \frac{X'}{Y'}$$
where $X'$ and $Y'$ are estimated totals.
Four types of estimate are very common:
• Simple random sampling of a non-stratified population
• Stratified random sampling
  • With one or several strata exhaustively investigated
• Ratio estimates with simple random sampling
• Ratio estimates with stratified random sampling
B4.3. Estimating ICT indicators
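CASE 1: Simple random sampling of a non-stratified population
The indicator can be expressed as the sample proportion:
$$\hat{p} = \frac{\sum_{i=1}^{n} w_i a_i}{\sum_{i=1}^{n} w_i} = \frac{\sum_{i=1}^{n} (N/n)\, a_i}{\sum_{i=1}^{n} (N/n)} = \frac{(N/n) \sum_{i=1}^{n} a_i}{N} = \frac{\sum_{i=1}^{n} a_i}{n}$$
where $a_i$ is 1 if business i has the characteristic of interest and 0 otherwise, and $w_i = N/n$ is its sampling weight.
The standard error (SE) of the sample proportion is estimated by:
[SE formula missing from this transcript]
SE expression valid with a sampling fraction of 10% or less
B4.3. Estimating ICT indicators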
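A minimal sketch of Case 1 follows. The proportion is the plain sample mean of the 0/1 indicator. For the standard error the sketch uses a common textbook estimator, sqrt(p(1 - p) / (n - 1)), which ignores the finite population correction; this is an assumption, and the expression on the original slide or in the Manual may differ in detail.

```python
import math


def srs_proportion(indicators):
    """Estimate a proportion and its SE under simple random sampling.

    indicators: list of 0/1 values a_i, one per responding business.
    The SE formula is a textbook approximation (assumed here), suitable
    only when the sampling fraction n/N is small.
    """
    n = len(indicators)
    p_hat = sum(indicators) / n
    se = math.sqrt(p_hat * (1 - p_hat) / (n - 1))
    return p_hat, se


# 30 of 100 sampled businesses report the characteristic (made-up data).
sample = [1] * 30 + [0] * 70
p_hat, se = srs_proportion(sample)
print(p_hat, se)
```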
CASE 2: Stratified random sampling
An unbiased estimate of p is:
$$\hat{p} = \frac{1}{N} \sum_{h=1}^{L} \frac{N_h}{n_h} \sum_{i=1}^{n_h} a_{hi} = \sum_{h=1}^{L} \frac{N_h}{N} \hat{p}_h$$
Where,
L: the number of strata
$N_h$: the population in stratum h (h = 1, 2, ..., L)
$n_h$: the sample size in stratum h (h = 1, 2, ..., L)
The estimate of the SE of $\hat{p}$ is:
$$SE(\hat{p}) = \sqrt{\sum_{h=1}^{L} \left( \frac{N_h}{N} \right)^2 V(\hat{p}_h)}$$
See Annex 4 of the Manual for more details
B4.3. Estimating ICT indicators
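The stratified estimate and its SE can be sketched as below. The slide does not spell out the within-stratum variance V(p_h); the sketch assumes the textbook form p_h(1 - p_h) / (n_h - 1) without a finite population correction, so Annex 4 of the Manual should be consulted for the exact expression. The data layout and stratum names are hypothetical.

```python
import math


def stratified_proportion(strata):
    """Estimate a proportion and its SE from a stratified random sample.

    strata maps a stratum label to (N_h, indicators), where indicators
    are the 0/1 values a_hi observed in that stratum.
    """
    N = sum(N_h for N_h, _ in strata.values())
    p_hat = 0.0
    variance = 0.0
    for N_h, indicators in strata.values():
        n_h = len(indicators)
        p_h = sum(indicators) / n_h
        # Assumed within-stratum variance estimator (no fpc).
        v_h = p_h * (1 - p_h) / (n_h - 1)
        share = N_h / N                # stratum share of the population
        p_hat += share * p_h
        variance += share ** 2 * v_h
    return p_hat, math.sqrt(variance)


strata = {
    "small": (800, [1, 1, 0, 0, 0]),
    "large": (200, [1, 1, 1, 1, 0]),
}
p_hat, se = stratified_proportion(strata)
print(p_hat, se)
```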
CASE 3: Ratio estimates with simple random sampling
The indicator to estimate is:
$$p = \frac{Y}{X} = \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i}$$
The natural estimate of the ratio p is:
$$\hat{r} = \frac{\hat{Y}}{\hat{X}} = \frac{(N/n) \sum_{i=1}^{n} y_i}{(N/n) \sum_{i=1}^{n} x_i} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$
Finally, one approximation of the SE is:
$$SE(\hat{r}) = \frac{1}{\bar{x}} \sqrt{\frac{N - n}{N n} \cdot \frac{\sum_{i=1}^{n} (y_i - \hat{r} x_i)^2}{n - 1}}$$
where $\bar{x}$ is the sample average of the n observations of X,
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
This is a reference outside the scope of our course
B4.3. Estimating ICT indicators
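For readers who want to try the ratio estimate, the formulas on this slide translate into the sketch below. The function name and the sample values are hypothetical; y and x are the two variables observed on the same n sampled businesses, and N is the population size.

```python
import math


def ratio_estimate(y, x, N):
    """Estimate the ratio Y/X and its approximate SE under simple random sampling."""
    n = len(y)
    r_hat = sum(y) / sum(x)
    x_bar = sum(x) / n
    # Sum of squared residuals of y around the fitted ratio line r_hat * x.
    resid_ss = sum((yi - r_hat * xi) ** 2 for yi, xi in zip(y, x))
    se = (1 / x_bar) * math.sqrt((N - n) / (N * n) * resid_ss / (n - 1))
    return r_hat, se


# Made-up data: y could be employees using the Internet, x total employees.
y = [1, 2, 3]
x = [2, 2, 2]
r_hat, se = ratio_estimate(y, x, N=30)
print(r_hat, se)
```

When y is exactly proportional to x in the sample, every residual is zero and the estimated SE is zero as well.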