Monica Consalvi – Giuseppe Garofalo – Caterina Viviano
Italian National Statistical Institute
Session: Quality indicators and quality measurement of Statistical Registers
10 July 2008
Quality in statistics: the BR case
Business Register vs Statistical Survey
Business Registers (BRs) are statistical products with their own specific features:
• Extensive use of administrative data
• Heterogeneity and variability of inputs
• Heterogeneity of users
• Relevance of technological aspects
• Output specificity (dissemination of microdata)
• Continuous data updating
Business Register vs Statistical Survey – Quality specificities
• Extensive use of administrative data: compared with statistical surveys, the problem of quality arises in a different context and can be addressed only ex post, because the data are known but the way they were generated is not.
• Heterogeneity and variability of inputs: quality indicators are needed for specific subsets of units and for different variables.
• Relevance of technological aspects: huge amounts of data, complex procedures for data integration and for applying methodologies, and changes over time in the applied rules (e.g. changes in classifications or in the contents of administrative sources).
• Output specificity: because microdata are disseminated, the assumption that “errors cancel each other out on average” no longer holds. In the BR, errors add up instead of offsetting each other (e.g. over-coverage and under-coverage).
• Heterogeneity of users:
  - The BR’s reference universe and updating period differ depending on whether it is used for short-term statistics (STS) or for structural business statistics (SBS).
  - If value added is estimated with reference to the BR universe, the quality of large units (e.g. their activity code and size) is fundamental.
  - If business demography indicators take the BR as reference, the quality of smaller units is very important.
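To make the output-specificity point concrete, here is a toy Python illustration (the unit identifiers are made up) of how over-coverage and under-coverage can offset each other in a total while still counting as separate errors for a microdata user:

```python
# Hypothetical unit identifiers, chosen only to illustrate the arithmetic.
true_units = {"A", "B", "C", "D", "E", "F"}   # units that really belong to the population
br_units = {"A", "B", "E", "F", "X", "Y"}     # units recorded in the (toy) BR

under_coverage = true_units - br_units  # units missing from the BR
over_coverage = br_units - true_units   # units wrongly included in the BR

# An aggregate user only sees the difference in totals: the errors offset.
net_error = len(br_units) - len(true_units)
# A microdata user meets every wrong or missing unit: the errors add up.
unit_errors = len(under_coverage) + len(over_coverage)

print("net error in the total:", net_error)
print("unit-level coverage errors:", unit_errors)
```

The total number of units is exactly right (net error 0), yet four units are wrong, which is the situation a user of disseminated microdata faces.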
• Continuous data updating: actual changes must be distinguished from spurious ones. Changes may reflect:
  - the structural development of the economy: demographic events, changes in size, changes in economic activity;
  - the revision process of the register: the BR may acquire data referring to a previous period, actual changes may be recorded at a later time, and births/deaths or changes in characteristics may be recorded late in the administrative registers.
The BR quality indicators
The system of quality indicators refers to three dimensions:
1. The phases of the BR’s updating process
2. A framework of components of the quality
3. The factors used to build the indicators
The phases of the BR’s updating process
The BR is the result of a conceptual and physical integration of several administrative and statistical input sources
1) Quality of the INPUT (input sources)
2) Quality of the process (matching, merging, editing, updating)
3) Quality of the OUTPUT
A framework of quality components
The components most frequently used to monitor BR quality are:
- Coverage in terms of both units and variables
- Timeliness in terms of delay in updating
- Completeness
- Accuracy
The factors used to build the indicators
The quality of the ASIA register
A methodological process for assessing variables coming from administrative sources
Five factors define a BR quality indicator: time, scope, subpopulation, variable and criterion.
The most important factor is the criterion: a method for evaluating, unit by unit, the correctness of the values of the variables of interest. The criteria are:
• Compliance
• Internal consistency
• Temporal consistency
• Metadata
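One way to make the five factors operational is to treat each indicator as a record keyed by them. The sketch below is a hypothetical data structure; the class and field names are our own and are not part of the BR system:

```python
from dataclasses import dataclass
from enum import Enum


class Criterion(Enum):
    """The four criteria listed above."""
    COMPLIANCE = "compliance"
    INTERNAL_CONSISTENCY = "internal consistency"
    TEMPORAL_CONSISTENCY = "temporal consistency"
    METADATA = "metadata"


@dataclass(frozen=True)
class IndicatorDefinition:
    """One BR quality indicator, identified by its five defining factors."""
    time: int             # reference period t, e.g. 2005
    scope: str            # what is assessed, e.g. an input source or the BR output
    subpopulation: str    # subset of units the indicator is computed on
    variable: str         # BR variable whose values are evaluated
    criterion: Criterion  # how each unit's value is judged correct or not

    def label(self) -> str:
        return (f"{self.variable} | {self.criterion.value} | "
                f"{self.subpopulation} | {self.scope} | {self.time}")


# Illustrative example: an internal-consistency check on the activity code
nace_check = IndicatorDefinition(
    time=2005,
    scope="Tax Authority input",
    subpopulation="all records",
    variable="NACE",
    criterion=Criterion.INTERNAL_CONSISTENCY,
)
print(nace_check.label())
```

Keeping the definition immutable (`frozen=True`) means the same five factors always identify the same indicator across reference periods.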
Criteria (1)
1. Compliance: the value of a BR unit can be considered correct if it is sufficiently “close” to the reference value taken from external sources. Compliance determines whether or not the BR agrees with an external source; when the true value is not known, compliance comes close to reliability.
2. Internal consistency: a value is deemed “correct” if it is coherent with the other variables of the same unit.
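A minimal sketch of the first two criteria, applied unit by unit. The 10% relative tolerance for compliance and the rule that employees cannot exceed persons employed for internal consistency are illustrative assumptions, not the thresholds or edit rules actually used for the BR; the unit data are invented:

```python
def is_compliant(br_value: float, reference_value: float, rel_tol: float = 0.10) -> bool:
    """Compliance: the BR value is taken as correct if it lies within rel_tol
    (relative) of the value reported by an external source."""
    if reference_value == 0:
        return br_value == 0
    return abs(br_value - reference_value) / abs(reference_value) <= rel_tol


def is_internally_consistent(unit: dict) -> bool:
    """Internal consistency, with one illustrative rule: a unit cannot have more
    employees than persons employed (employees plus self-employed)."""
    return 0 <= unit["employees"] <= unit["persons_employed"]


def pct_correct(flags: list[bool]) -> float:
    """Indicator value: percentage of units judged correct by a criterion."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


# Hypothetical BR units, with an external (survey) value for comparison
units = [
    {"employees": 3, "persons_employed": 5, "survey_employees": 3},
    {"employees": 12, "persons_employed": 10, "survey_employees": 12},
    {"employees": 40, "persons_employed": 41, "survey_employees": 50},
]
compliance = pct_correct([is_compliant(u["employees"], u["survey_employees"]) for u in units])
consistency = pct_correct([is_internally_consistent(u) for u in units])
print(round(compliance, 1), round(consistency, 1))
```

Compliance needs an external witness value for every unit, whereas internal consistency uses only the unit's own record.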
Criteria (2)
3. Temporal consistency: quality is assessed by comparing the values of a variable in two different periods. Large changes over short time lags are considered impossible or less plausible.
4. Quality without a ‘witness’ (use of metadata): a set of information stored in the BR is used to measure quality without a reference value or any other element of comparison, for example the BR management variables or the metadata system: validity date, estimation methodology, origin of the data, data validation process.
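Temporal consistency can be sketched as a plausibility check on the values of a unit in two consecutive periods. The thresholds (a change of more than 100% and of more than 10 units) and the sample values are hypothetical:

```python
def implausible_change(prev: float, curr: float,
                       max_rel: float = 1.0, min_abs: float = 10.0) -> bool:
    """Temporal consistency: flag a change between two periods as implausible
    when it is both large in absolute terms (> min_abs) and large relative
    to the previous value (> max_rel, i.e. more than 100% by default)."""
    diff = abs(curr - prev)
    base = max(abs(prev), 1.0)  # avoids division by zero when prev == 0
    return diff > min_abs and diff / base > max_rel


def temporal_consistency_indicator(pairs: list[tuple[float, float]]) -> float:
    """Percentage of units whose (t-1, t) values pass the plausibility check."""
    if not pairs:
        return 0.0
    ok = sum(not implausible_change(p, c) for p, c in pairs)
    return 100.0 * ok / len(pairs)


# Hypothetical numbers of employees in t-1 and t for three units
pairs = [(20, 200), (20, 25), (100, 120)]
print(round(temporal_consistency_indicator(pairs), 1))
```

Requiring both an absolute and a relative threshold keeps very small units, where a few employees already mean a large percentage change, from being flagged.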
Phase: Input / Component: timeliness / Factor: temporal consistency
Source: Social Security
Indicator: percentage of records with declared employees, by month
[Figure: percentage of records with declared employees by month, Supply_2004 vs Supply_2005; values shown: 71% and 57%]
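The indicator on this slide is a simple share computed for each month. The sketch assumes, hypothetically, that each record in a supply is represented by the set of months for which employees are declared; the toy data are not taken from the Social Security source:

```python
from collections import Counter

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def pct_declared_by_month(records: list[set[str]]) -> dict[str, float]:
    """For each month, the percentage of records that declare employees.

    Each record is represented (hypothetically) as the set of months for
    which the source reports declared employees."""
    if not records:
        return {m: 0.0 for m in MONTHS}
    counts = Counter(month for rec in records for month in rec)
    return {m: 100.0 * counts[m] / len(records) for m in MONTHS}


# Toy supply in which later months are less complete
supply = [{"JAN", "FEB", "MAR"}, {"JAN", "FEB"}, {"JAN"}, set()]
by_month = pct_declared_by_month(supply)
print(by_month["JAN"], by_month["FEB"], by_month["MAR"], by_month["APR"])
```

Comparing the resulting monthly profiles of two supplies is one way to read the timeliness of each delivery.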
Phase: Input / Component: coverage / Factor: temporal consistency
Source: Chamber of Commerce
Indicator: loss of information in dates of cessation, $I(t) = 1 - N(t+1)/N(t)$ (in %)

| Cessation date \ Supply year (BR reference year) | 2001 (2000) | 2002 (2001) | 2003 (2002) | 2004 (2003) | 2005 (2004) | 2006 (2005) | I(t) % |
|---|---|---|---|---|---|---|---|
| 2000 | 332,878 | 19 | 374,341 | 100 | 19 | 20 | |
| 2001 | 194,634 | 350,462 | 384,199 | 178 | 31 | 19 | -9.6 |
| 2002 | 14 | 30,055 | 408,291 | 419,144 | 36 | 19 | -2.7 |
| 2003 | - | - | 129,661 | 358,822 | 369,815 | 35 | -3.1 |
| 2004 | - | - | 4 | 130,247 | 357,907 | 380,778 | -6.4 |
| 2005 | - | - | - | 28 | 79,721 | 384,272 | |
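The I(t) column can be reproduced from the table. Our reading, which is an assumption, is that N(t) is the number of records with a given cessation date in the first supply after that year and N(t+1) the number in the following supply; with this reading the sketch below returns the published values for 2001 to 2004. Negative values then mean that the later supply contains more cessations with that date, which is consistent with cessations being recorded late.

```python
# For each cessation year: (N(t), N(t+1)), read from the two supplies
# that follow the cessation year in the table above.
counts = {
    2001: (350_462, 384_199),  # supplies 2002 and 2003
    2002: (408_291, 419_144),  # supplies 2003 and 2004
    2003: (358_822, 369_815),  # supplies 2004 and 2005
    2004: (357_907, 380_778),  # supplies 2005 and 2006
}


def loss_indicator(n_t: int, n_t1: int) -> float:
    """I(t) = 1 - N(t+1)/N(t), expressed as a percentage."""
    return (1 - n_t1 / n_t) * 100


results = {year: round(loss_indicator(n_t, n_t1), 1) for year, (n_t, n_t1) in counts.items()}
print(results)  # {2001: -9.6, 2002: -2.7, 2003: -3.1, 2004: -6.4}
```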
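Phase: Process / Component: accuracy / Factor: metadata
Indicator: edit and imputation of variables

| Variable | Indicator | I(t=2005) | I(t=2005) % |
|---|---|---|---|
| NACE | No. of edits | 202,333 | 1.85 |
| NACE | No. of imputations | 87,628 | 43.31 |
| NACE | No. of edits without imputation | 114,705 | 56.69 |
| Employees | No. of edits | 74,312 | 0.68 |
| Employees | No. of imputations | 72,768 | 97.92 |
| Employees | No. of edits without imputation | 1,544 | 2.08 |

The percentages for imputations and for edits without imputation are shares of the number of edits.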
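The split between imputed and non-imputed edits is straightforward arithmetic; the sketch below (the function name is hypothetical) reproduces the figures in the table from the numbers of edits and imputations:

```python
def edit_imputation_breakdown(n_edits: int, n_imputed: int) -> dict[str, float]:
    """Split edited records into imputed and not imputed, with shares (%) of edits."""
    not_imputed = n_edits - n_imputed
    return {
        "edits_without_imputation": not_imputed,
        "pct_imputed": 100.0 * n_imputed / n_edits,
        "pct_not_imputed": 100.0 * not_imputed / n_edits,
    }


# Numbers of edits and imputations for 2005, from the table above
inputs = {"NACE": (202_333, 87_628), "Employees": (74_312, 72_768)}
breakdowns = {var: edit_imputation_breakdown(e, i) for var, (e, i) in inputs.items()}
for var, b in breakdowns.items():
    print(var, b["edits_without_imputation"],
          round(b["pct_imputed"], 2), round(b["pct_not_imputed"], 2))
```

The contrast between the two variables is visible at once: almost every edit on employees leads to an imputation, while for NACE most edits do not.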
Phase: Process / Component: accuracy / Factor: internal consistency
Source: Tax Authority
Indicator: out-of-date classification

| Indicator | I(t=2005) | I(t=2005) % | Var I(t) - I(t-1) |
|---|---|---|---|
| No. of records with an out-of-date classification that are not decoded into NACE Rev. 1.1 | 725,697 | 9.53 % | 0.84 % |
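Counting records whose classification cannot be decoded amounts to looking each code up in the current classification and in a correspondence table for obsolete codes. Both code lists below are illustrative stand-ins (the "OLD-" codes are invented), not the actual Tax Authority codes or the official correspondence:

```python
from typing import Optional

# Illustrative only: a few NACE Rev. 1.1 codes and a hypothetical
# correspondence from obsolete codes to NACE Rev. 1.1.
NACE_REV11 = {"15.11", "52.11", "74.14"}
OLD_TO_NACE = {"OLD-001": "15.11", "OLD-002": "52.11"}


def decode(code: str) -> Optional[str]:
    """Return the NACE Rev. 1.1 code for a record, or None if it cannot be decoded."""
    if code in NACE_REV11:
        return code
    return OLD_TO_NACE.get(code)


def out_of_date_indicator(codes: list[str]) -> tuple[int, float]:
    """Number and percentage of records whose activity code is not decoded."""
    if not codes:
        return 0, 0.0
    undecoded = sum(decode(c) is None for c in codes)
    return undecoded, 100.0 * undecoded / len(codes)


print(out_of_date_indicator(["15.11", "OLD-001", "OLD-999", "74.14", "OLD-777"]))
```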
Phase: Output / Component: accuracy / Factor: compliance
Source: SME sample survey
Indicator: differences in address and activity status
[Figure: differences in address and in activity status, 1999 to 2005 (vertical axis 0.0 to 10.0)]
Phase: Output / Component: accuracy / Factor: temporal consistency
Indicator: coherence in activity status

| Time series (t-2)(t-1)(t), 1 = active | Population | N | Error e_k |
|---|---|---|---|
| 001 | Entries | 442,352 | 0 |
| 000 | Out, never active | 2,275,196 | 0 |
| 111 | Active | 3,597,559 | 0 |
| 110 | Exits | 313,413 | 0 |
| 100 | Exits in t-1 and not active | 225,868 | 0 |
| 011 | Entries in t-1 and active | 365,097 | 0 |
| 010 | Dis-activations | 54,848 | 1.5 |
| 101 | Reactivations | 52,567 | 1.5 |

$$I_j = 100 - \frac{\sum_k x_{kj}\, e_k}{\sum_k x_{kj}} \times 100$$

$I_{2005} = 97.8$
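The index can be checked against the table. This sketch applies the formula to the published counts and error weights and returns the reported value of 97.8:

```python
# Activity-status patterns over (t-2, t-1, t): (number of units x_k, error weight e_k)
patterns = {
    "001": (442_352, 0.0),
    "000": (2_275_196, 0.0),
    "111": (3_597_559, 0.0),
    "110": (313_413, 0.0),
    "100": (225_868, 0.0),
    "011": (365_097, 0.0),
    "010": (54_848, 1.5),
    "101": (52_567, 1.5),
}


def coherence_index(patterns: dict[str, tuple[int, float]]) -> float:
    """I = 100 - (sum_k x_k * e_k / sum_k x_k) * 100."""
    weighted_errors = sum(x * e for x, e in patterns.values())
    total = sum(x for x, _ in patterns.values())
    return 100 - weighted_errors / total * 100


print(round(coherence_index(patterns), 1))  # 97.8
```

Only the two patterns that switch status back and forth within three periods (010 and 101) carry a non-zero weight, so they alone lower the index.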
The BR’s Quality Declaration (QD)
The QD is a complex system of quality indicators.
The QD is based on the concept of transparency: it supplies all the meaningful and useful tools for measuring the different quality components at each stage of the process. It consists of rich documentation made up of a set of direct and indirect indicators, with a time dimension for data, sources and variables.
The QD contains:
- metadata
- a set of easily interpretable indicators
| Phase of the process | Components (C) | Factors (F) |
|---|---|---|
| Input | timeliness, coverage, completeness | temporal consistency, internal consistency |
| Process | coverage, accuracy | temporal consistency, internal consistency, metadata |
| Output | timeliness, coverage, completeness, accuracy | compliance, internal consistency, metadata |
37 indicators have been identified, cross-classified by phase (INPUT, PROCESS, OUTPUT), criterion (compliance, internal consistency, temporal consistency, metadata) and component (timeliness, coverage, completeness, accuracy):
• INPUT: 13 indicators (internal consistency 7; temporal consistency 2 + 3 + 1)
• PROCESS: 8 indicators (internal consistency 2 + 2; metadata 1 + 3)
• OUTPUT: 16 indicators (compliance 6; internal consistency 2 + 4 + 1; metadata 2 + 1)
1. Quality of input – Component 1.1 Completeness
1.1.1) Address, s = CCIAA (Chambers of Commerce): number of records (% weight) with missing information

| Indicator | Computation | I(2005) | VI = I(2005) - I(2004) |
|---|---|---|---|
| Records with missing address (CCIAA) | % weight (absolute number of records) | 0.49 (37,408) | -0.03 |

2. Quality of process – Component 2.1 Coverage
1) Number of records, s = CCIAA, not matched with the MEF (Ministry of Economy and Finance) database

| Indicator | Computation | I(2005) | VI = I(2005) - I(2004) |
|---|---|---|---|
| Not matched records (CCIAA) | % weight (absolute number of records) | 5.25 (338,304) | 0.03 |

3. Quality of output – Component 3.2 Timeliness
3.2) Lag, in days, between the dissemination time of the BR and the reference year of the data

| Indicator | Computation | I(2005) | VI = I(2005) - I(2004) |
|---|---|---|---|
| Timeliness of BR dissemination | Days of delay between the dissemination time and the reference year of the data | 492 | +24 |
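The three QD indicators above reduce to a percentage weight, a year-on-year variation VI = I(t) - I(t-1) and a lag in days. The sketch below shows these computations; the function names are hypothetical, and measuring the lag from 31 December of the reference year is an assumption, since the slide does not say which date is used:

```python
from datetime import date


def pct_weight(n_flagged: int, n_total: int) -> float:
    """Indicator as % weight: flagged records over all records of the source."""
    return 100.0 * n_flagged / n_total


def variation(i_t: float, i_prev: float) -> float:
    """VI = I(t) - I(t-1), in the same unit as the indicator."""
    return i_t - i_prev


def dissemination_lag_days(reference_year: int, dissemination: date) -> int:
    """Days between the end of the reference year (assumed to be 31 December)
    and the dissemination date."""
    return (dissemination - date(reference_year, 12, 31)).days


# A BR for reference year 2005 disseminated one year after its end
print(dissemination_lag_days(2005, date(2006, 12, 31)))  # 365
```

Reporting VI next to I(t) lets users see whether a given quality problem is shrinking or growing between releases.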
The QD was disseminated to internal users for the first time in 2007.
Problems not yet solved:
1. Dissemination of a different version for external users, containing only metadata and output quality indicators.
2. The need to obtain a synthetic view of the proposed indicators using “compound indicators”.
3. Internal users were involved in the discussion of the QD, but their suggestions have not yet been analysed in depth.