Eurostat
ESS.VIP Validation
Objectives, scope & concepts
Angel Simón Delgado, EUROSTAT
Email: [email protected]
ESS.VIP VALIDATION
• VIP VALIDATION – First Phase
• VIP VALIDATION – Deliverables in First Phase
• ESS.VIP VALIDATION – Definition of the project
• Validation Service (EDIT)
ESS VIP-V First Phase
Overall goal: To develop validation solutions to be used by different production chains (horizontal integration), within the ESS (vertical integration)
Bottom-up approach:
• Extensive consultation of all possible stakeholders
• Participative management
• Business-driven approach
• From pilot experience to general principles
Scope, objectives and outputs (ESS VIP-V First Phase)
• Documentation / Standardisation:
a) Template and guidelines for process description
b) Template and guidelines for a standard documentation of the validation process
• Methodological analysis of data validation:
a) Typology of validation rules
b) Standard definition of validation levels
c) Standard formalised "syntax" (understandable by business users) to express validation rules
• Distribution of responsibilities in the production chain:
a) Guidelines to be used by the WG for the attribution of responsibility in the whole production chain (MSs and Eurostat)
b) Guidelines based on efficiency principles (validation corrections: "the sooner, the better")
c) Preparing for IS/IT solutions and architecture
Scope, objectives and outputs (ESS VIP-V First Phase)
• Towards IT/IS solutions and architecture:
a) Users' requirements for new software that allows business users to input validation rules and the corresponding error messages in a shared Central Repository of Validation Rules. The new software should be able to generate the rules in the validation syntax developed by the project.
b) Validation architecture defining the elements and their relationships in an integrated validation system ("common platform") to be used by:
   • Internal users, in an appropriate IS architecture to facilitate horizontal integration of IT/IS systems
   • All stakeholders in the production chain
Contribution to the VISION
• More efficient production chain with clear attribution of responsibilities
• Standard definitions, guidelines and validation syntax
• Development of a common validation procedure
• Common solutions to be shared within the ESS
ESS.VIP Validation First Phase - Deliverables
Documentation:
• 1.1 Inventory of documents
• 1.2 Analysis of inventory
• 1.5 Inventory of validation rules
• 1.6 Inventory of error messages
Methodology:
• 1.4 Validation typologies
• 2.4 Analysis of validation typologies
• 2.5 Levels of validation
Examples:
• 1.3 Validation & statistical processes
• 2.4 Analysis of validation typologies
• 3.1 Validation rules by typology
• 3.3 Error messages
Solutions:
• 3.2 Validation syntax
• 4.1 Functional specifications for GUI
Templates & Guidelines:
• 2.1 Documentation of validation process
• 2.2 Documentation of statistical process
• 3.3 Error messages
• 3.4 Selection of validation rules
• 3.5 Improvement actions
• 3.6 Attribution of responsibilities
Deliverables: Validation levels
(ordered by increasing validation complexity; the scope of the data involved widens with each level)
• Level 0: Format & file structure (same file)
• Level 1: Cells, records, file (same file)
• Level 2: Revisions and time series (between files, same dataset)
• Level 2: Between correlated datasets (between datasets, from the same source)
• Level 3: Mirror checks (from different sources, within a domain)
• Level 4: Consistency checks (between domains, within an organisation)
• Level 5: Consistency checks (between different organisations)
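The validation levels can be read as a lookup: the narrowest scope shared by all the data a check touches determines its level. The following is a minimal Python sketch of that reading. The function name and its boolean parameters are hypothetical and are not part of the deliverable.

```python
def validation_level(same_file, same_dataset, same_source,
                     same_domain, same_organisation):
    """Return the validation level implied by the scope of a check.

    Levels 0 and 1 both concern a single file. This sketch returns 1 for
    that scope and leaves format/file-structure checks (level 0) to the
    caller.
    """
    if same_file:
        return 1  # cells, records, file
    if same_dataset:
        return 2  # between files: revisions and time series
    if same_source:
        return 2  # between correlated datasets
    if same_domain:
        return 3  # from different sources: mirror checks
    if same_organisation:
        return 4  # between domains: consistency checks
    return 5      # between different organisations: consistency checks


# A check comparing data from different sources within one domain.
print(validation_level(False, False, False, True, True))
```

The early returns mirror the nesting of the scopes, so the first scope that holds decides the level.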
Deliverables: Typology of validation rules
File structure:
• Filename
• File type
• Delimiters
• Format
1 file checks:
• Type
• Length
• Presence
• Allowed character
• Uniqueness
• Range
1n file checks:
• Consistency
• Control
• Conditional
>1 file checks:
• Referential integrity
• Code list
• Cardinality
• Mirror
• Time series
• Revised data integrity
• Model-based consistency
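To make the single-file rule types concrete, here is a minimal Python sketch of the six "1 file checks" (type, length, presence, allowed character, uniqueness, range). All function names, field names and sample records are hypothetical illustrations, not taken from the project deliverables.

```python
import re


def check_presence(record, field):
    """Presence: the field exists and is not empty."""
    return record.get(field) not in (None, "")


def check_type(value, expected_type):
    """Type: the value is an instance of the expected Python type."""
    return isinstance(value, expected_type)


def check_length(value, max_length):
    """Length: the textual form of the value is not too long."""
    return len(str(value)) <= max_length


def check_allowed_characters(value, pattern):
    """Allowed character: the whole value matches a regular expression."""
    return re.fullmatch(pattern, str(value)) is not None


def check_range(value, low, high):
    """Range: the value lies between two bounds (inclusive)."""
    return low <= value <= high


def check_uniqueness(records, field):
    """Uniqueness: no two records share the same value for the field."""
    values = [r[field] for r in records]
    return len(values) == len(set(values))


records = [
    {"country": "ES", "month": 3},
    {"country": "FR", "month": 13},  # month is out of range
]

for r in records:
    print(r["country"], check_range(r["month"], 1, 12))
```

Each check returns a plain boolean so that results can later be combined or counted in a validation report.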
Deliverables: Guidelines for the attribution of responsibility of validation activities in the whole production chain
Step 1: Data preparation by the NSIs
Step 2: Transmission of data and validation report
Step 3: Loading data to production database
Step 4: Additional processing & dissemination
Validation Controls – Different actors – Different responsibilities
• Guidelines for the allocation of responsibility for the implementation of validation rules within the ESS, based on an agreement between Eurostat and the NSIs, with periodic performance revisions from both sides
• Proposal for a generic business architecture of data validation:
Deliverables: Standard template for error/warning messages
Standard templates for error/warning messages and for validation report

Validation report structure:
Header
• Time stamp
• User ID
• Data checked (dataset name…)
Body
• Rules applied
• Total failures
• No. errors
• No. warnings
• Total records
• Records failed
• Sum of weights
• Maximum admissible error weight
• Rate of acceptance
• Maximum possible amount of error
• Rate of performance
Footer
• Error/warning messages

Error/warning message structure:
• Rule ID
• Severity
• Rule type ID
• Message text
• Action
• Failing data
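The two templates map naturally onto record types. The sketch below is one possible Python representation, under stated assumptions: the class and attribute names are hypothetical, the severity vocabulary ("error"/"warning") is assumed, and "total failures" is taken to be errors plus warnings. The weight-based items (sum of weights, rates) are left out because the slides do not define how they are computed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ValidationMessage:
    """One error/warning message, field for field as in the template."""
    rule_id: str
    severity: str        # assumed vocabulary: "error" or "warning"
    rule_type_id: str
    message_text: str
    action: str
    failing_data: str


@dataclass
class ValidationReport:
    # Header
    time_stamp: datetime
    user_id: str
    data_checked: str
    # Body (count-based items only)
    rules_applied: int
    total_records: int
    records_failed: int
    # Footer
    messages: List[ValidationMessage] = field(default_factory=list)

    @property
    def no_errors(self) -> int:
        return sum(m.severity == "error" for m in self.messages)

    @property
    def no_warnings(self) -> int:
        return sum(m.severity == "warning" for m in self.messages)

    @property
    def total_failures(self) -> int:
        # Assumption: total failures = errors + warnings.
        return self.no_errors + self.no_warnings


report = ValidationReport(
    time_stamp=datetime.now(timezone.utc),
    user_id="demo_user",
    data_checked="demo_dataset",
    rules_applied=2,
    total_records=100,
    records_failed=1,
    messages=[
        ValidationMessage("R001", "error", "RANGE",
                          "Month must be between 1 and 12",
                          "Correct the value and resend", "month=13"),
    ],
)
print(report.total_failures, report.no_errors, report.no_warnings)
```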
Deliverables: VALS - Validation syntax
Standard syntax for validation language
• To define a meta-language for the domain of statistical data validation to express, document and communicate validation rules
• Trade-off between human-understandable and computer-parseable language
• Implementation through Graphic User Interface to support business users to input and maintain validation rules and rule-sets

Types and examples:
• Type check: validate ( type(A1.Rcount)='TEXT')
• Range & math: validate( Table2.C_5 between (Table2.C_5_1 + Table2.C_5_2 + Table2.C_5_3 - tolerance) and (Table2.C_5_1 + Table2.C_5_2 + Table2.C_5_3 + tolerance))
• Code list check: validate ( match_codelist (A1.Quarter, CL_QRT) )
• Range check: validate ( H.HB050 between 1 and 12 )
• Rules & Metarules: validate ( age_range_rule.result = true and country_codes_rule.result = true)
• …
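The example rules can be paraphrased in a general-purpose language to show what each one tests. The following Python sketch is only an analogue: it is not a VALS interpreter, the helper names are hypothetical apart from match_codelist (borrowed from the example), and the code list contents, bounds and sample values are invented for illustration.

```python
CL_QRT = {"Q1", "Q2", "Q3", "Q4"}  # hypothetical content of the code list


def between(value, low, high):
    """Inclusive range test, the analogue of 'between ... and ...'."""
    return low <= value <= high


def match_codelist(value, codelist):
    """Code list check: the value must be one of the allowed codes."""
    return value in codelist


def sum_within_tolerance(total, parts, tolerance):
    """Range & math: the total equals the sum of its parts, give or take."""
    expected = sum(parts)
    return between(total, expected - tolerance, expected + tolerance)


# Type check: validate ( type(A1.Rcount)='TEXT')
type_ok = isinstance("12", str)

# Range check: validate ( H.HB050 between 1 and 12 )
range_ok = between(7, 1, 12)

# Code list check: validate ( match_codelist (A1.Quarter, CL_QRT) )
codelist_ok = match_codelist("Q3", CL_QRT)

# Range & math: Table2.C_5 against C_5_1 + C_5_2 + C_5_3 with a tolerance
math_ok = sum_within_tolerance(100.2, [40.0, 35.0, 25.0], tolerance=0.5)

# Metarule: combine the results of two named rules (bounds are invented)
results = {
    "age_range_rule": between(34, 0, 120),
    "country_codes_rule": match_codelist("ES", {"ES", "FR", "DE"}),
}
metarule_ok = results["age_range_rule"] and results["country_codes_rule"]

print(type_ok, range_ok, codelist_ok, math_ok, metarule_ok)
```

Keeping every rule result as a named boolean is what makes metarules straightforward: they are ordinary boolean expressions over earlier results.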
Proposed approach
ESS.VIP VALIDATION - Package 1: IMPLEMENTATION
Goals
• Implementation of the methodological developments of VIP-V Phase I in the statistical domains/WGs
• Maintenance and refinement of standards developed
• User requirements for further developments
• Evaluation, monitoring and reporting
Proposed approach
ESS.VIP VALIDATION - Package 2: MICRO DATA
Goals
• Vertical integration of the micro data validation within the ESS production processes, taking into account the results of the first phase of ESS VIP-V
• Extension of the functional specifications to apply to micro data validation
• Integrated solutions for micro data validation
Proposed approach
ESS.VIP VALIDATION - Package 3: SOLUTIONS
Goals
• Adaptation of existing validation tools to the functional specifications issued from ESS VIP-V Phase I
• Deployment of validation solutions to MSs
• Distribution of validation rules in agreed language
• Building blocks in an adequate web services architecture
• Provision of web services validation solutions to be used by Member States before transmission to Eurostat
Proposed approach
ESS.VIP VALIDATION - Package 4: EXTENSIONS AND GENERAL COORDINATION
Goals
• Overall coordination of the project
• Coherence of validation approaches within ESS
• Implementation of the meta-language within ESS
• Analysis of links with other VIP & ESS VIP projects
• Good practices identification
• More sophisticated validation solutions:
   • Longitudinal validation
   • Mirror checks validation
   • ESS wide shared final warehouses
Elements in a validation system: ESS production chain
[Diagram. Elements shown: Member States (statistical production and validation) for statistical domains 1 … n; Single Entry Point (web interface); validation services (Member States, Eurostat); Eurostat validation service; data; validation rules repository; data definition registry; errors; validation report; metrics.]
Validation architecture elements – Global overview
[Diagram. Elements shown: validation rulesets maintenance; repository of rulesets and their metadata; validation service; syntax analyser; syntax specifications; data; data structure; validation report; SEP (Single Entry Point).]
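How a few of these elements might interact can be sketched in a few lines of Python: a repository of rulesets, a validation service that applies a ruleset to incoming data, and a resulting validation report. This is an illustrative assumption about the data flow, not the project's architecture. The repository is a plain in-memory dictionary, rules are Python predicates rather than VALS expressions, and the syntax analyser and Single Entry Point are omitted. All names are hypothetical.

```python
# Hypothetical in-memory stand-in for the repository of rulesets.
# Each rule is (rule ID, severity, predicate applied to one record).
RULESET_REPOSITORY = {
    "demo_dataset": [
        ("R001", "error", lambda rec: rec.get("country") in {"ES", "FR"}),
        ("R002", "warning", lambda rec: 1 <= rec.get("month", 0) <= 12),
    ]
}


def validation_service(dataset_name, records, repository=RULESET_REPOSITORY):
    """Apply the ruleset registered for a dataset and build a small report."""
    rules = repository[dataset_name]
    failures = []
    for index, record in enumerate(records):
        for rule_id, severity, predicate in rules:
            if not predicate(record):
                failures.append(
                    {"rule_id": rule_id, "severity": severity, "record": index}
                )
    return {
        "data_checked": dataset_name,
        "rules_applied": len(rules),
        "total_records": len(records),
        "records_failed": len({f["record"] for f in failures}),
        "messages": failures,
    }


report = validation_service(
    "demo_dataset",
    [{"country": "ES", "month": 3}, {"country": "XX", "month": 13}],
)
print(report["records_failed"], len(report["messages"]))
```

Keeping the rules in a repository separate from the service is what would let the same ruleset be applied by Member States before transmission and by Eurostat on reception.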
Current actions and next steps
Assessment of Eurostat domains in the field of validation
Set of standard documentation for domain managers for harmonisation of communication with data providers
Task Force to:
• Identify best practices in the ESS
• Advise on implementation in the ESS
• Optimise the validation process
Current actions and next steps
Functional specifications for Validation services (EDIT) to be adapted in line with the findings of the project
Tools development:
• System to create/maintain a central repository of validation rules
• Development and/or adaptation of IT tools (EDIT, eDAMIS)
• Validation quality metrics
Thank you
More information:
Email: [email protected]
(only from EUROSTAT): http://www.cc.cec/wikis/display/ESTATmethodology/ESS.VIP+VALIDATION