Commission européenne, 2920 Luxembourg, LUXEMBOURG - Tel. +352 43011 Office: BECH A3/122 - Tel. direct line +352 4301-35285 - Fax +352 4301-31092 http://epp.eurostat.ec.europa.eu adam.wronski@ec.europa.eu
EUROPEAN COMMISSION EUROSTAT Directorate B: Corporate statistical and IT services Unit B-3: IT and standards for data and metadata exchange
SISAI - STATISTICAL INFORMATION SYSTEMS ARCHITECTURE AND INTEGRATION
WORKING GROUP
3rd MEETING
13-14 MAY 2013
BECH BUILDING
ROOM AMPÈRE LUXEMBOURG
ITEM 2.7
WORKING DOCUMENT – Pending further analysis and improvements
European Commission – Eurostat/B1, Eurostat/E1, Eurostat/E6
Based on deliverable 2.4 Contract No. 40107.2011.001-2011.567
‘VIP on data validation general approach’
2.4 - Exhaustive and detailed typology
of validation rules – v 0.1304
April 2013
Project: ESS.VIP.BUS Common data validation policy
Document: 2.4 - Exhaustive and detailed typology of validation rules
(3rd_main)
Version: 0.1304
April 2013 Page ii
Document Service Data
Type of Document Deliverable
Reference: 2-4 EXHAUSTIVE AND DETAILED TYPOLOGY OF
VALIDATION RULES
Version: 0.1304 Status: Draft
Created by: Angel SIMÓN Date: 23.04.2013
Distribution: European Commission – Eurostat/B1, Eurostat/E1, Eurostat/E6
For Internal Use Only
Reviewed by: Angel SIMÓN
Approved by:
Remark: Pending further analysis and improvements
Document Change Record
Version Date Change
0.1304 23.04.2013 Initial release based on deliverable from contractor AGILIS
Contact Information
EUROSTAT
Ángel SIMÓN
Unit E-6: Transport statistics
BECH B4/334
Tel.: +352 4301 36285
Email: Angel.SIMON@ec.europa.eu
Table of contents
1 Introduction
2 Validation rules
2.1 File structure
2.1.1 Filename check
2.1.2 File type check
2.1.3 Allowed character checks
2.1.4 Format check
2.2 Checks within and between datasets
2.2.1 Type Check
2.2.2 Length Check
2.2.3 Presence Check
2.2.4 Allowed character checks
2.2.5 Uniqueness Check
2.2.6 Referential integrity
2.2.7 Code List Check
2.2.8 Consistency checks
2.2.9 Cardinality checks
2.2.10 Mirror checks
2.2.11 Range Check
2.2.12 Control Check
2.2.13 Conditional Checks
2.2.14 Time series checks
2.2.15 Revised data integrity Check
2.2.16 Model – based Consistency Check
1 Introduction
The aim of this document is to present, as exhaustively as possible, all the validation rules currently
applied to the data received by Eurostat. This document is an evolution of the document ‘Typology of
data validation rules and imputation methods’ completed in task 1 of the project (deliverable 1.4).
2 Validation rules
The different validation rules can be split into two categories:
a. Checks on the file structure: This validation category involves consistency and reasonability
tests applied by the data manager prior to integration into the Database System. Consistency
tests verify that file naming conventions, data formats, field names, and file structure are
consistent with project conventions. Discrepancies are reported to the measurement
investigator for remediation.
b. Intra-dataset and inter-dataset checks: This category of data validation takes place after data
have been assembled in the database [1] and is the first step in data
analysis. Validation tests in this category involve the testing of measurement assumptions,
comparisons of measurements, and internal consistency tests.
When a validation check fails, it raises one of two types of error [2]:
Fatal: the data is rejected;
Warning: the record can be accepted, with some corrections or explanations from the data
provider.
The presentation of the rules is structured as follows: a short description of each rule type is followed
by examples of its application in several domains. The examples are drawn from the inventory of
validation rules (deliverable 1.5 – 1.6 of the project).
2.1 File structure
2.1.1 Filename check
Checks that the filename follows the naming conventions predefined for the domain, for
example CENSUS_2011_LU_SEX. This validation also implicitly checks that the filename length
is consistent with the conventions agreed for each domain; e.g. Windows imposes a maximum
length of 260 characters for path plus filename. The table below presents the filename
check validation rules applied in each domain:
[1] Intra-dataset checks take place before the data is assembled in the database as well. Some basic
checks (consistency, integrity) can be performed on the incoming file.
[2] Eurostat unit B3 proposes three levels of error with progressively increasing impact on the quality of
input data: warning, error and fatal error. Further developments will take this into account depending
on the needs. Moreover, there are ideas about error weights to be added up over a whole file and the
report to contain these kinds of 'measurements' or 'indicators'.
2.1.1.1 Road freight transport statistics
Type of error: Fatal
Table | File naming convention | File name / File name length (characters)
A2 | Country Code (2 characters) + Year (2 digits) + Quarter Code (2 characters) + ’ROAD’ + Table Name | LU09Q3ROADA2.dat / 16
A1 | Country Code (2 characters) + Year (2 digits) + Quarter Code (2 characters) + ’ROAD’ + Table Name | IT07Q2ROADA1.dat / 16
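As an illustration only, the road freight naming convention could be tested with a regular expression. The pattern below is an assumption derived from the two example file names (upper-case country code, quarter codes Q1 to Q4, table names A1 to A3, extension `.dat`); the function name `check_filename` is hypothetical.

```python
import re

# Assumed pattern: country code (2 letters) + year (2 digits) + quarter code
# (Q1-Q4) + 'ROAD' + table name (A1-A3 assumed) + '.dat' extension.
ROAD_FILENAME = re.compile(r"[A-Z]{2}\d{2}Q[1-4]ROADA[1-3]\.dat")
EXPECTED_LENGTH = 16  # file name length given in the table


def check_filename(name: str) -> bool:
    """Return True when the name has the expected length and matches the pattern."""
    return len(name) == EXPECTED_LENGTH and ROAD_FILENAME.fullmatch(name) is not None
```

Since the rule is of type Fatal, a file for which `check_filename` returns False would be rejected before any further check is run.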
2.1.2 File type check
Checks the type of the data file received. This validation is important because both sender
and receiver rely on the compatibility and integrity of the data file, e.g. a system may require input
data in csv format.
2.1.2.1 Road freight transport statistics
Type of error: Fatal
Table | File | File Format
A1 | All data files referring to table A1 | A DAT format, which is a generic "data" file, or a ZIP format, which is used for data file compression
A2 | All data files referring to table A2 | A DAT format, which is a generic "data" file, or a ZIP format, which is used for data file compression
2.1.3 Allowed character checks
Checks that only the expected characters are used as field or record separators. For
example, a csv file may allow only a comma as field separator. The tables below present the
allowed character check validation rules applied in each domain:
2.1.3.1 Farm structure survey statistics
Type of error: Fatal
Table | Field | Valid character checks
Any table | Any data field | A plus sign ‘+’ used as field separator
Any table | Any data record | A plus sign ‘+’ is used as record separator followed by a line feed character
2.1.3.2 Rail transport statistics
Type of error: Fatal
Table | Field | Valid character checks
A1 – A9 | Any data field | A semicolon ‘;’ used as field separator
A1 – A9 | Any data record | A semicolon ‘;’ used as field separator
Any table | Any data record | A plus sign ‘+’ is used as record separator followed by a line feed character
2.1.4 Format check
Checks that the data is in a specified format (template), e.g., each record must contain ten fields.
Below we present in a table the fields for which format check validation rules apply for the distinct
domains:
2.1.4.1 Rail transport statistics
Type of error: Fatal
Table | Field | Valid character checks
A1 – A9 | Any data record | Each record must include 18 fields
A1 – A9 | Any data file corresponding to tables A1 – A9 | Each file must include the correct names for the fields and in the specified order
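The rail transport rules above (semicolon as field separator, 18 fields per record) can be sketched as follows. This is a minimal illustration that assumes one record per line and no quoted separators inside fields; the function names are hypothetical.

```python
EXPECTED_FIELDS = 18   # from the rail transport format rule
FIELD_SEPARATOR = ";"  # from the rail transport separator rule


def check_record_format(line: str) -> bool:
    """Return True when the record splits into the expected number of fields."""
    return len(line.rstrip("\n").split(FIELD_SEPARATOR)) == EXPECTED_FIELDS


def malformed_records(lines):
    """Return the 1-based numbers of the records with a wrong field count."""
    return [n for n, line in enumerate(lines, start=1)
            if not check_record_format(line)]
```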
2.2 Checks within and between datasets
2.2.1 Type Check
A type check ensures that the correct type of data is entered into a field. If the data type is set
to number, only numbers such as 10, 12 or 14 can be entered, and text
such as ‘ten’ or ‘twelve’ is rejected. The table below presents the fields to which type check validation
rules are applied in each domain:
2.2.1.1 Road freight transport statistics
Type of error: Fatal
Table Field Valid Type
A1 Rcount Text
A1 A1 Text
A1 Year Text
A1 Quarter Text
A1 QuestN Text
A1 A1.1 Text
A1 A1.3 Number
A1 A1.6 Text
A1 A1.8.1 Number
A1 A1.8.2 Number
A1 A1.9 Number
A1 Stratum Text
A1 A2link Text
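A type check on a few of the A1 fields listed above might look like the sketch below. It assumes that each record arrives as a dictionary of strings and that a value counts as a number when Python's `float()` accepts it; both choices, and the function names, are illustrative assumptions.

```python
# Expected types for a subset of the A1 fields, taken from the table above.
FIELD_TYPES = {"Rcount": "Text", "A1.3": "Number",
               "A1.8.1": "Number", "A1.8.2": "Number"}


def is_number(value: str) -> bool:
    """Assumption: anything float() can parse is treated as a number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def type_errors(record: dict) -> list:
    """Return the names of the Number fields that do not hold a number."""
    return [field for field, expected in FIELD_TYPES.items()
            if expected == "Number" and not is_number(record.get(field, ""))]
```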
2.2.2 Length Check
Some data items always have the same number of characters. For
example, if alpha-2 codes are adopted for countries, a length check can ensure that
exactly 2 characters are entered into the field. This type of validation cannot check that the 2 characters
are correct, but it can ensure that 1 or 4 characters are not entered. A length check can also
allow the number of characters to vary within a certain range. The table below presents the fields to which
length check validation rules are applied in each domain:
2.2.2.1 Road freight transport statistics
Type of error: Fatal
Table Field Valid Length (in characters/digits)
A1 Rcount 2
A1 A1 2
A1 Year 4
A1 Quarter 2
A1 QuestN 9
A1 A1.1 1
A1 A1.3 2
A1 A1.6 5
A1 A1.8.1 5
A1 A1.8.2 4
A1 A1.9 8
A1 Stratum 7
A1 A2link 5
2.2.3 Presence Check
Checks that important data are actually present and have not been left out, e.g. for road freight
transport data files the survey year is mandatory. The check does not ensure that each field is filled
in correctly. The table below presents the fields to which presence check validation rules
are applied in each domain:
2.2.3.1 Road freight transport statistics
Type of error: Fatal
Table Field Mandatory Presence
A1 Year Yes
A1 Quarter Yes
A1 QuestN Yes
A1 A1.9 Yes
A1 Stratum Yes
A1 A1.3 Yes
2.2.4 Allowed character checks
Checks that only the expected characters are present in a field. For example, a numeric field
may allow only the digits 0-9, the decimal point and perhaps a minus sign or commas. The tables below
present the fields to which allowed character check validation rules are applied in each
domain:
2.2.4.1 Road freight transport statistics
Type of error: Fatal
Table | Field | Valid character checks
A1 | A1.9 | A comma as decimal separator instead of full stop
2.2.4.2 Farm structure survey statistics
Type of error: Fatal
Table | Field | Valid character checks
Any table | Any numerical field | A full stop as decimal separator instead of comma
Any table | Any data field | The character ‘:’ is used for non available data
2.2.5 Uniqueness Check
Uniqueness checks are integrity rules that check that each value in specific fields is unique.
They can be applied to a combination of several fields (e.g. Country, Year, Type of transport). This type of
validation detects duplicate values in certain combinations of fields, which may have been created by
mistake during the data import process. The table below presents the fields to which uniqueness check
validation rules are applied in each domain:
2.2.5.1 Road freight transport statistics
Type of error: Fatal
Table Table key (fields combination) Unique
A1 Rcount + Year + Quarter + QuestN Yes
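A minimal sketch of this uniqueness rule, assuming the A1 records are available as dictionaries keyed by field name (the function name `duplicate_keys` is hypothetical):

```python
from collections import Counter

# Table key of A1, from the table above.
KEY_FIELDS = ("Rcount", "Year", "Quarter", "QuestN")


def duplicate_keys(records):
    """Return every key combination that occurs in more than one record."""
    counts = Counter(tuple(r[f] for f in KEY_FIELDS) for r in records)
    return [key for key, n in counts.items() if n > 1]
```

An empty result means the key is unique; any returned combination would raise a Fatal error under the rule above.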
2.2.6 Referential integrity
Referential integrity is a data quality concept. Data quality is a common concern of information systems;
the first line of defense for data quality is a series of human controls. Once data have been input into the
database, computer-based controls are used to eliminate problems that reduce data quality. Referential integrity is
a computer-based control that ensures that relationships between tables remain consistent. When one
table has a foreign key to another table, referential integrity requires that a record cannot be added
to the table that contains the foreign key unless there is a corresponding record in the linked
table. A violation would be, e.g., a Journey record in the Journey file with no corresponding key in the Vehicle file.
The tables below present the fields to which referential integrity validation rules are applied in each domain:
2.2.6.1 Road freight transport statistics
Type of error: Fatal
Table1 | Table2 | Foreign key
A1 | A2 | Rcount + Year + Quarter + QuestN
A2 | A3 | Rcount + Year + Quarter + QuestN + JourN
2.2.6.2 Inland Waterways transport statistics
Type of error: Fatal
Table1 | Table2 | Foreign key
A1 | B1 | Reporting Country + Year + Type of Transport
A1 | C1 | Reporting Country + Year + Type of Transport
B1 | D1 | Reporting Country + Year + Type of Transport
C1 | D2 | Reporting Country + Year
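The road freight A1/A2 rule above can be sketched as a search for orphan records. The sketch assumes that A2 (journeys) is the table holding the foreign key and A1 (vehicles) the linked table, as in the Journey/Vehicle example; records are assumed to be dictionaries and the function name is hypothetical.

```python
# Foreign key linking A2 records to A1 records, from the table above.
KEY = ("Rcount", "Year", "Quarter", "QuestN")


def orphan_records(child_records, parent_records, key=KEY):
    """Return the child records whose key has no match among the parents."""
    parent_keys = {tuple(p[f] for f in key) for p in parent_records}
    return [c for c in child_records
            if tuple(c[f] for f in key) not in parent_keys]
```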
2.2.7 Code List Check
A code list check (table look-up check) takes the entered data item and compares it to a list of valid
entries stored in a database table. The tables below present the fields to which code list check validation
rules are applied in each domain:
2.2.7.1 Road freight transport statistics
Type of error: Fatal
Table | Field | Valid List of entries
A1 | Rcount | All country codes in table COUNTRY where Year equals the reference Year
A1 | A1 | A1
A1 | Quarter | Q1, Q2, Q3, Q4
A1 | A1.6 | All NACE codes where Year equals the reference Year
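As a small illustration, a code list check for the Quarter field could be written as below. In practice the valid codes would be read from a reference table; here they are hard-coded, and the function name is hypothetical.

```python
# Valid entries for the Quarter field, from the table above.
VALID_QUARTERS = {"Q1", "Q2", "Q3", "Q4"}


def invalid_codes(records, field, valid_codes):
    """Return the distinct values of `field` that are not in the code list."""
    return sorted({r[field] for r in records if r[field] not in valid_codes})
```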
2.2.7.2 Maritime transport statistics
Type of error: Warning
Dataset Field Valid List of entries
A1 Reporting port All available codes for survey year
A1 National partner port
A1 Non sea partner countries
F1 Size of vessel
F1 Type of vessel
B1 Type of cargo
2.2.7.3 External Trade statistics
Type of error: Warning
Dataset | Field | Valid List of entries
INTRASTAT | Commodity Code against partner country | All available codes for survey year
INTRASTAT | Country of origin | Valid ISO country codes
INTRASTAT | Region of origin/Destination | All acceptable codes
INTRASTAT | Nature of transaction | All acceptable codes
2.2.8 Consistency checks
Checks that the data in related fields correspond. For example, if the file naming convention includes the
country code, as in ‘LU09Q3ROADA2.dat’, then the reporting country code indicated in the dataset
should be Country = ‘LU’. This validation applies not only to categorical fields but also to numerical
fields, e.g. V13310<1.18*V16130. The tables below present the fields to which consistency check
validation rules are applied in each domain:
2.2.8.1 Road freight transport statistics
Type of error: Fatal
Table | Field1 | Field2 | Consistency Rule
A1 | Year | Filename Year | Field1=Field2
A1 | Quarter | Filename Quarter | Field1=Field2
A1 | A1.8.1 | A2.6 | Field1=SUM (A2.6) for the A2 linked records
A1 | A1.8.2 | A2.6 | Field1>=SUM (A2.6) for the A2 linked records
2.2.8.2 Structural Business Statistics
Type of error: Fatal or Warning
Table Variable Consistency Rule
Series 1A V12150/V12120 0.85<V12150/V12120<1.15
Series 1A V13110/V12120 0.85<V13110/V12120<1.15
Series 1A V12110/V16110 0.85< V12110/V16110<1.18
Series 1A V12150/V16110 0.82< V12150/V16110<1.22
Series 1A V13310/V16130 0.85< V13310/V16130<1.18
Series 1A V13310/V12120 0.85< V13310/V12120<1.15
Series 1A V13320/V13310 0.85< V13320/V13310<1.15
Series 2A V12150/V12120 0.85< V12150/V12120<1.15
Series 2A V13110/V12120 0.85< V13110/V12120<1.15
Series 2A V12110/V16110 0.85< V12110/V16110<1.18
Series 2A V12150/V16110 0.85< V12150/V16110<1.22
Series 2A V13310/V16130 0.85< V13310/V16130<1.18
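Ratio rules of this kind can be evaluated generically. The sketch below copies two of the Series 1A rules from the table; treating a zero denominator as a failure is an assumption, as are the record layout (a dictionary of numeric values) and the function name.

```python
# (numerator, denominator, low, high), copied from two Series 1A rows above.
RATIO_RULES = [
    ("V12150", "V12120", 0.85, 1.15),
    ("V13310", "V16130", 0.85, 1.18),
]


def failed_ratio_rules(record):
    """Return the ratios that fall outside their bounds (strict inequalities)."""
    failed = []
    for num, den, low, high in RATIO_RULES:
        # Assumption: an undefined ratio (zero denominator) is reported as failed.
        if record[den] == 0 or not (low < record[num] / record[den] < high):
            failed.append(f"{num}/{den}")
    return failed
```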
2.2.8.3 Rail transport statistics
Type of error: Warning
Table Variable Consistency Rule
C1, E2 C1-11 – E2-12 0.05<= C1-11 – E2-12 <=0.2
C3, E2 C3-12 – E2-09 0.05<= C3-12 – E2-09 <=0.2
Type of error: Fatal
Table Variable Consistency Rule
C1, E2 C1-11 – E2-12 C1-11 – E2-12>0.2
C3, E2 C3-12 – E2-09 C3-12 – E2-09 >0.2
2.2.8.4 Farm structure survey
Type of error: Fatal or Warning
Table Variable Consistency Rule
Table2 A_3_2_4_1 – C_2 A_3_2_4_1 – C_2<=0
Table2 A_3_2_4_2 – C_4 A_3_2_4_2 – C_4<=0
Table2 A_3_2_4_4 – C_5 A_3_2_4_4 – C_5<=0
Table2 B_6_2_1 – A_3_1 B_6_2_1 – A_3_1<=0
Table2 B_5_2 – B_5_2_1 B_5_2 – B_5_2_1>=0
2.2.9 Cardinality checks
Checks that a record has a valid number of related records. For example, in an imaginary Census with
household data and personal data, if according to the household record three persons live in
a household, there must be three associated records in the personal data for this
household (Cardinality = 3). The table below presents the fields to which cardinality check
validation rules are applied in each domain:
2.2.9.1 Road freight transport statistics
Type of error: Fatal
Table | Field1 | Field2 | Cardinality Check
A1 | A1.8.1 | A2link | If Field1=0 then Field2=0
A1 | A1.8.1 | A2link | If Field1<>0 then Field2<>0
A1 | A1.8.2 | A2link | If Field1=0 then Field2=0
A2 | A2.1 | A3link | If Field1=4 then Field2=0
A2 | A2.1 | A3link | If Field1<>4 then Field2<>0
A2 | A2.1 | A2.2 | If Field1=4 then Field2=0
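The imaginary Census example from the description of this check can be sketched as follows. The field names `household_id` and `n_persons` and the function name are hypothetical and only serve the illustration.

```python
from collections import Counter


def cardinality_errors(households, persons):
    """Return ids of households whose declared size differs from the
    number of person records linked to them."""
    found = Counter(p["household_id"] for p in persons)
    return [h["household_id"] for h in households
            if found[h["household_id"]] != h["n_persons"]]
```

A household declaring three persons but linked to only two person records would be returned by this function.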
2.2.10 Mirror checks
These quality checks compare the declarations of two partners for consistency.
Mirror validation may entail, according to the data category under consideration, the
reconciliation of stocks and transactions data, or the analysis of differences with partner data or preshipment
inspection data. The tables below present the fields to which mirror check validation rules are applied
in each domain:
2.2.10.1 Road freight transport statistics
Type of error: Warning
Table | Field | Mirror field1 | Mirror field2 | Mirror Check
A2 | A2.2 (Weight of goods) | A2.3 (Place of loading (for a laden journey): either country code or full region code with country) | A2.4 (Place of unloading (for a laden journey): either country code or full region code with country) | A2.2[A2.3] = A2.2[A2.4] + Round_Err
A2 | A2.2 (Weight of goods) | A2.8 (Place of loading of the goods road motor vehicle on another means of transport) | A2.9 (Place of unloading of the goods road motor vehicle on another means of transport) | A2.2[A2.8] = A2.2[A2.9] + Round_Err
A3 | A3.2 (Weight of goods) | A3.5 (Place of loading (for a laden journey): either country code or full region code with country) | A3.6 (Place of unloading (for a laden journey): either country code or full region code with country) | A3.2[A3.5] = A3.2[A3.6] + Round_Err
2.2.10.2 Rail transport statistics
Type of error: Warning
Table | Field | Mirror field1 | Mirror field2 | Mirror Check
A1 | A1.6 (Weight of goods) | A1.5 (Place of loading (for a laden journey): outward international transport, A1.5=3) | A1.5 (Place of unloading (for a laden journey): inward international transport, A1.5=4) | A1.6[A1.5=3] = A1.6[A1.5=4] + Round_Err
2.2.10.3 Air transport statistics
Type of error: Warning
Table | Field | Mirror field1 | Mirror field2 | Mirror Check
A1 | Passengers | Total Passengers on board at Departure (Reporting country) | Total Passengers on board at Arrival (Partner country) | Passengers [Mirror field1] = Passengers [Mirror field2] + Deviation
B1 | Passengers | Total Passengers carried at Departure (Reporting country) | Total Passengers carried at Arrival (Partner country) | Passengers [Mirror field1] = Passengers [Mirror field2] + Deviation
A1 | Freight and mail | Total tons of freight and mail on board at Departure (Reporting country) | Total tons of freight and mail on board at Arrival (Partner country) | Freight and mail [Mirror field1] = Freight and mail [Mirror field2] + Deviation
B1 | Freight and mail | Total tons of freight and mail carried at Departure (Reporting country) | Total tons of freight and mail carried at Arrival (Partner country) | Freight and mail [Mirror field1] = Freight and mail [Mirror field2] + Deviation
2.2.10.4 Inland waterways transport statistics
Type of error: Warning
Table | Field | Mirror field1 | Mirror field2 | Mirror Check
A1 | Weight of goods | Place of loading: either country code or full region code with country | Place of unloading: either country code or full region code with country | Weight of goods [Mirror field1] = Weight of goods [Mirror field2] + Round_Err
2.2.10.5 Maritime transport statistics
Type of error: Warning
Table | Field | Mirror field1 | Mirror field2 | Mirror Check
D1 | Passengers | Total Passengers embarked at Departure (Reporting country) | Total Passengers disembarked at Arrival (Partner country) | Passengers [Mirror field1] = Passengers [Mirror field2] + Deviation
A1 | Freight | Total tons of freight on board at Departure (Reporting country) | Total tons of freight on board at Arrival (Partner country) | Freight [Mirror field1] = Freight [Mirror field2] + Deviation
2.2.10.6 External Trade
Type of error: Warning
The classic example of mirror checks comes from INTRASTAT.
Member States (MSs) report monthly the arrivals of goods from the other MSs and the dispatches of goods to other
MSs. Therefore, for each combination of [Reference month, Type of goods, Dispatching MS, Receiving
MS] there are two data items: one for the dispatch, declared by the dispatching country as reporting
country, and one for the arrival, declared by the receiving country as reporting country. In principle the
statistical values and quantities in the two items must be equal.
This is not always the case, due to different reporting thresholds in different MSs and different
recording dates of shipments, which may straddle two months. The principle nevertheless remains and is used in
actual validation.
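The INTRASTAT principle can be sketched as a comparison of the two declarations for each flow. The sketch assumes each side is a dictionary mapping (reference month, type of goods, dispatching MS, receiving MS) to a value; the 5 % relative tolerance and the function name are hypothetical and stand in for whatever deviation is accepted in practice.

```python
def mirror_discrepancies(dispatches, arrivals, tolerance=0.05):
    """Return the flows whose dispatch and arrival declarations disagree.

    Both arguments map (month, goods, dispatching MS, receiving MS) to a value.
    The relative tolerance is an illustrative assumption.
    """
    flagged = []
    for key in sorted(dispatches.keys() | arrivals.keys()):
        d, a = dispatches.get(key), arrivals.get(key)
        if d is None or a is None:
            flagged.append(key)  # only one partner declared this flow
        elif abs(d - a) > tolerance * max(abs(d), abs(a)):
            flagged.append(key)  # declarations differ by more than the tolerance
    return flagged
```

Because mirror checks are of type Warning, the flagged flows would be referred back to the data providers rather than rejected.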
2.2.11 Range Check
Checks that the data lie within a specified range of values, e.g. the month of a person's date of birth
should lie between 1 and 12. This validation can also check data against one limit only, upper or lower, e.g.
data should not be greater than 2 (<=2). The table below presents the fields to which range check
validation rules are applied in each domain:
2.2.11.1 Road freight transport statistics
Type of error: Fatal or Warning
Table Field Valid Range (Minimum, Maximum)
A1 A1.3 (0,30)
A1 A1.9 (0,99999.9999)
A2 A1.4 (5,700)
A2 A1.5 (3,400)
A1 Year >1998
A1 A1.8.1 <10000
A2 A1.5 =< 1200
A2 A1.5 =< 0.7 * A1.4
2.2.12 Control Check
This is a total computed on one or more numeric fields that appear in every record. Such a field is called
the control totals key figure field. The total is a meaningful one, e.g. the total payment for a number
of customers. Control totals are used to verify the integrity of the contents of the data. The tables below
present the fields to which control check validation rules are applied in each domain:
2.2.12.1 Farm structure survey
Type of error: Warning
Table | Key fields | Control Check
Table2 | C_5_1, C_5_2, C_5_3 | C_5 = C_5_1 + C_5_2 + C_5_3 + Round_Err
Table2 | C_4_1, C_4_2, C_4_99 | C_4 = C_4_1 + C_4_2 + C_4_99 + Round_Err
Table2 | C_3_2_1, C_3_2_99 | C_3_2 = C_3_2_1 + C_3_2_99 + Round_Err
Table2 | C_2_1, C_2_2, C_2_3, C_2_4, C_2_5, C_2_6, C_2_99 | C_2 = C_2_1 + C_2_2 + C_2_3 + C_2_4 + C_2_5 + C_2_6 + C_2_99 + Round_Err
2.2.12.2 Agricultural statistics
Type of error: Warning
Table | Key fields | Control Check
Data | Bullocks, Bulls, Cows, Heifers, Calves | Bovines = Bullocks + Bulls + Cows + Heifers + Calves + Round_Err
Data | Bullocks, Bulls, Cows, Heifers | Adult cattle = Bullocks + Bulls + Cows + Heifers + Round_Err
Data | Sheep, Goats | Sheep = Sheep + Goats + Round_Err
Data | Slaughtering, Exports of live animals, Imports of live animals | Gross Indigenous production = Slaughtering + Exports of live animals - Imports of live animals + Round_Err
2.2.12.3 Migration statistics
Type of error: Warning
Table | Key fields | Control Check
IMM1CTZ | Sex (Total, Male, Female) | Total = Male + Female
IMM7CTB | Citizenship (TOTAL, NATIONALS, NON-NATIONALS, UNK_GR) | TOTAL = NATIONALS + NON-NATIONALS + UNK_GR
IMM6CTZ | Country of birth (TOTAL, NATIVE-BORN, FOREIGN-BORN, UNK_GR) | TOTAL = NATIVE-BORN + FOREIGN-BORN + UNK_GR
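A control total such as Total = Male + Female can be verified with a tolerance playing the role of Round_Err. The default tolerance of 0.5 and the function name below are hypothetical; the accepted rounding error is not specified in the rules above.

```python
def control_total_ok(total, parts, round_err=0.5):
    """Return True when the total equals the sum of its parts, up to round_err.

    round_err stands in for Round_Err; its default value is an assumption.
    """
    return abs(total - sum(parts)) <= round_err
```

For the migration rules, where no rounding error is listed, the check could be called with `round_err=0` on integer counts.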
2.2.13 Conditional Checks
Conditional checks perform different checks depending on whether a pre-specified condition evaluates
to true or false. The table below presents the conditional check validation rules
applied in each domain:
2.2.13.1 Road freight transport statistics
Type of error: Fatal or Warning
Table Condition Type Condition Conditional Check
A2 Format check A1.2 like ‘1XX’ A1.5 <= 0.7*A1.4
A2 Format check A1.2 like ‘2XX’ A1.5 <= 0.8*A1.4
A2 Format check A1.2 like ‘3XX’ A1.5 <= 0.85*A1.4
A2 Limit check A2.1=1 A2.2<=A1.5
A2 Limit check A2.1=2 A2.2<=A1.5
A2 Limit check A2.1=3 A2.2<=A1.5
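The three format-conditioned rules above can be sketched as a lookup on the leading character of A1.2. Reading ‘1XX’ as "any code starting with 1" is an assumption, as are the function name and the choice to apply no check when no condition matches.

```python
# Maximum ratio A1.5 / A1.4 keyed by the leading character of A1.2,
# assuming '1XX' means "any code that starts with 1", and so on.
MAX_RATIO = {"1": 0.7, "2": 0.8, "3": 0.85}


def conditional_check(a1_2: str, a1_4: float, a1_5: float) -> bool:
    """Apply the limit that corresponds to the A1.2 pattern, if any."""
    ratio = MAX_RATIO.get(a1_2[:1])
    if ratio is None:
        return True  # no condition matched, so no check applies
    return a1_5 <= ratio * a1_4
```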
2.2.14 Time series checks
Time series checks are implemented in order to detect suspicious evolution of the data over time.
They can be associated with outlier detection. A second type of time series check takes into account the
seasonality of the data. The tables below present the time series validation rules applied in each domain:
2.2.14.1 Maritime transport statistics
Type of error: Warning
Table | Field | Indicator | Valid Range (Minimum, Maximum)
A1 | Gross weight of goods | A1(t) / A1(t-1) | (Low limit, High Limit)
C1 | Number of TEU's and number of units for cargo type 5, 6 and X | C1(t) / C1(t-1) | (Low limit, High Limit)
D1 | Passengers excluding cruise passengers | D1(t) / D1(t-1) | (Low limit, High Limit)
F2 | Gross tonnage and number of vessels | F2(t) / F2(t-1) | (Low limit, High Limit)
2.2.14.2 Inland Waterways transport statistics
Type of error: Warning
Table | Field | Indicator | Valid Range (Minimum, Maximum)
A1 | Tonnes by type of transport | A1(t) / A1(t-1) | (Low limit, High Limit)
A1 | Tonnes by type of transport and type of goods | A1(t) / A1(t-1) | (Low limit, High Limit)
A2 | Tonnes by type of transport | A2(t) / A2(t-1) | (Low limit, High Limit)
B1 | Tonnes by type of vessel | B1(t) / B1(t-1) | (Low limit, High Limit)
B1 | Tonnes by nationality of vessel | B1(t) / B1(t-1) | (Low limit, High Limit)
B2 | Movements of vessels by type of transport and loading status | B2(t) / B2(t-1) | (Low limit, High Limit)
2.2.14.3 Structural Business Statistics
Type of error: Fatal
Table Variable Valid Range (Minimum, Maximum)
Series 1A V11110(t)/V11110(t-1) (0.82,1.22)
Series 1A V12110(t)/V12110(t-1) (0.82,1.22)
Series 1A V12120(t)/V12120(t-1) (0.82,1.22)
Series 1A V12150(t)/V12150(t-1) (0.77,1.30)
Series 1A V13110(t)/V13110(t-1) (0.82,1.22)
2.2.14.4 Air transport statistics
Type of error: Fatal or Warning
Table | Condition Type | Condition | Conditional Check
A1 | Range check | 10000 =< Passengers <100000 | Passengers(t)/Passengers(t-1) =>0.6
A1 | Range check | 10000 =< Passengers <100000 | Passengers(t)/Passengers(t-1) <=1.4
A1 | Range check | 50 ton =< Freight transport <1500 ton | Freight transport(t)/Freight transport(t-1) <=2
A1 | Range check | 100 =< Flights <1200 | Flights(t)/Flights(t-1) <=1.7
A1 | Range check | 100 =< Flights <1200 | Flights(t)/Flights(t-1) =>0.3
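The period-on-period ratio used in the tables above can be sketched as follows. The default limits are borrowed from the (0.82, 1.22) range of the Structural Business Statistics table; treating the limits as inclusive and a zero previous value as suspicious are assumptions, and the function names are hypothetical.

```python
def growth_ratio_ok(current, previous, low, high):
    """Return True when current / previous lies within [low, high]."""
    if previous == 0:
        return False  # assumption: an undefined ratio is flagged
    return low <= current / previous <= high


def suspicious_periods(series, low=0.82, high=1.22):
    """Return the indices t for which series[t] / series[t-1] leaves the range."""
    return [t for t in range(1, len(series))
            if not growth_ratio_ok(series[t], series[t - 1], low, high)]
```

This simple form ignores seasonality; a seasonal variant would compare each period with the same period of the previous year instead of the preceding period.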
2.2.14.5 External Trade
Type of error: Warning
Table | Variable | Valid Range (Minimum, Maximum) [3] | Model specifies Valid Range
INTRASTAT | Statistical Value | (Low limit, Higher Limit) | MAD
INTRASTAT | Invoice Value | (Low limit, Higher Limit) | MAD
INTRASTAT | Quantity (supplementary unit) | (Low limit, Higher Limit) | MAD
INTRASTAT | Total value declared | (Low limit, Higher Limit) | MAD
2.2.14.6 National Accounts
Type of error: Warning
Table | Variable | Valid Range
V101.EE.B1GM.CLV00MF.QNW | B1GM (t)/B1GM (t-1) | Average Growth Rate2
V102.EE.P3.CLV00MF.QSW | P3 (t)/P3 (t-1) | Average Growth Rate2
V102.EE.P5.CLV00MF.QSW | P5 (t)/P5 (t-1) | Average Growth Rate2
[3] The lower and higher limits of the valid range are defined by the limits specified in the MAD routine for
detection of outliers.
2.2.15 Revised data integrity Check
The revised data integrity check applies to revised datasets. This validation compares revised data with initial
data and, if necessary [4], investigates the sources of significant discrepancies. The levels of acceptable
discrepancies are either ad hoc or model-specified. The table below presents the revised data integrity validation
rules applied in each domain:
2.2.15.1 National Accounts
Type of error: Warning
Table1 (Revised data) | Table2 (Initial data) | Condition
V101.EE.B1GM.CLV00MF.QSW | V.EE.B1GM.CLV00MF.QSW | (ValueT1 – ValueT2) <= 0.005*ValueT2
V101.EE.B1GM.CLV05MF.QSW | V.EE.B1GM.CLV05MF.QSW | (ValueT1 – ValueT2) <= 0.005*ValueT2
V101.EE.B1GM.KPM95F.QSW | V.EE.B1GM.KPM95F.QSW | (ValueT1 – ValueT2) <= 0.005*ValueT2
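The condition in the table can be sketched as a comparison of revised and initial values period by period. The sketch assumes that the difference is meant in absolute value, which the table does not state, and that both series are dictionaries keyed by reference period; the function name is hypothetical.

```python
def flagged_revisions(revised, initial, threshold=0.005):
    """Return the periods whose revision exceeds threshold * initial value.

    Assumption: the difference is taken in absolute value, so upward and
    downward revisions are treated alike.
    """
    return [key for key in sorted(revised.keys() & initial.keys())
            if abs(revised[key] - initial[key]) > threshold * abs(initial[key])]
```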
2.2.16 Model – based Consistency Check
These rules compare quantitative data with limits derived from other data of the same reference
period, e.g. limits set at a number of standard deviations around the data mean, or limits derived
from a regression model that connects two variables.
Note: models are also used to derive limits from historical data, against which current data are compared.
These rules are listed under the type “Time series checks”, presented in section 2.2.14. For the time
being we have not found any specific example among the rules of deliverable 1.5 – 1.6.
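Since no example exists in the inventory, the following is purely illustrative: a sketch of limits set at k standard deviations around the mean of the data of one reference period. The value of k and the function name are hypothetical.

```python
from statistics import mean, stdev


def outside_sigma_limits(values, k=3.0):
    """Return the values lying more than k sample standard deviations
    from the mean of the same set of values (needs at least two values)."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]
```

Note that an extreme value inflates both the mean and the standard deviation it is tested against, so robust measures may be preferable for small datasets.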
[4] For some datasets, the revision process is a normal one so the detection of revisions is a
'processing' step and not a pure validation step. However, in order to validate data, the revised figures
should be detected and tested against some thresholds.