Metadata-driven application for data processing – from local toward global solution
Rudi Seljak, Statistical Office of the Republic of Slovenia
Summary of presentation
• Introduction
• Current generic application – main characteristics
• Development of global solution
• Changes in the statistical process
• Conclusions
Introduction
• Statistical data processing:
– Demanding, time-consuming and very expensive task
– Constant pressure for budget cuts
• Rationalisation of the statistical process:
– Taking advantage of rapid IT development
– Movement from domain-oriented to process-oriented production
– Stove-pipe IT solutions replaced by general applications
• Statistical Office of the Republic of Slovenia (SURS)
– SURS began systematic development of generic solutions 6 years ago
– Prototype solutions were developed for several parts of the process
– These solutions have already been used for several large surveys (e.g. the 2010 Agriculture Census and the 2011 Population Census)
– The prototype generic solutions are now being upgraded to a more global solution
Generalised solutions – main characteristics
• Small, generic solutions for small parts of the statistical process, called building blocks:
– Enable easy and flexible linking of the inputs and outputs of individual components into the whole statistical process
– Can be plugged into different databases in different environments (e.g. ORACLE, SAS), provided the input database meets a few basic conditions
– Designed as fully metadata-driven (MDD) systems: one program code → the parameters for executing the processing for a concrete survey are provided through special metadata tables
– The process metadata can be provided in different environments (SAS, MS Access, ORACLE) → the metadata must follow strict structural rules (tables and variables)
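The metadata-driven principle above (one program code, survey-specific parameters supplied through metadata tables) can be illustrated with a minimal Python sketch of a generic aggregation step. It is not the SURS SAS implementation; the metadata layout (`survey`, `group_by`, `sum_vars`), the survey codes and the data are hypothetical assumptions chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical process metadata for an aggregation step: one row per survey.
# The layout (survey, group_by, sum_vars) is an illustrative assumption.
AGGREGATION_METADATA = [
    {"survey": "AGRI", "group_by": ["region"], "sum_vars": ["area", "cattle"]},
    {"survey": "POP", "group_by": ["region", "sex"], "sum_vars": ["persons"]},
]


def aggregate(records, survey, metadata=AGGREGATION_METADATA):
    """Generic aggregation: the same code serves every survey; only the
    metadata row selected for `survey` decides what is grouped and summed."""
    params = next(m for m in metadata if m["survey"] == survey)
    totals = defaultdict(lambda: {v: 0 for v in params["sum_vars"]})
    for rec in records:
        key = tuple(rec[g] for g in params["group_by"])
        for v in params["sum_vars"]:
            totals[key][v] += rec[v]
    return dict(totals)


farms = [
    {"region": "East", "area": 10, "cattle": 4},
    {"region": "East", "area": 5, "cattle": 1},
    {"region": "West", "area": 7, "cattle": 0},
]
print(aggregate(farms, "AGRI"))
# {('East',): {'area': 15, 'cattle': 5}, ('West',): {'area': 7, 'cattle': 0}}
```

In this sketch, supporting another survey means adding a metadata row rather than writing new code.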
Building blocks – functioning
[Diagram elements: different microdata databases; building block (general SAS program); ad-hoc programs; different databases of process metadata]
Linking building blocks into the process
[Diagram elements: microdata; building blocks 1, 2, …, n; ad-hoc programs; transformed data]
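The linking of building blocks into a process can be sketched as a chain of steps in which the transformed data produced by one step become the input of the next. In this Python sketch, which assumes records are plain dictionaries, `run_process`, `make_filter_block` and `adhoc_add_density` are hypothetical names; the filter's parameters stand in for values that would come from process metadata.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Step = Callable[[List[Record]], List[Record]]


def run_process(microdata: List[Record], steps: List[Step]) -> List[Record]:
    """Chain the steps: the transformed data output by one step are the input
    of the next, whether the step is a generic building block or ad-hoc code."""
    data = microdata
    for step in steps:
        data = step(data)
    return data


def make_filter_block(variable: str, minimum: float) -> Step:
    """Hypothetical generic building block; `variable` and `minimum` play the
    role of parameters read from process metadata."""
    def block(data: List[Record]) -> List[Record]:
        return [r for r in data if r[variable] >= minimum]
    return block


def adhoc_add_density(data: List[Record]) -> List[Record]:
    """Hypothetical ad-hoc program written for one survey only."""
    return [{**r, "density": r["cattle"] / r["area"]} for r in data]


microdata = [{"area": 10.0, "cattle": 5.0}, {"area": 0.0, "cattle": 0.0}]
result = run_process(microdata, [make_filter_block("area", 1.0), adhoc_add_density])
print(result)  # [{'area': 10.0, 'cattle': 5.0, 'density': 0.5}]
```

Because every step shares the same signature, generic blocks and ad-hoc programs can be reordered or swapped without changing the driver.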
Process metadata
• The system is, to a very large extent, based on process metadata:
– Processing rules which enable the general program to be adjusted for different surveys
• The process metadata are currently entered directly into an MS Access database:
– High probability of syntax errors
– Users must be thoroughly instructed in order to fill in the metadata correctly
Table    Variable  Condition    Corr_rule     Step
TABLE1   X         X/Y >1000    Round(X/100)  1
TABLE1   Z         Z NE X       X             2
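Because hand-entered metadata are prone to syntax errors, a management application could check each rule before it is stored. The following Python sketch parses the Condition and Corr_rule expressions of rows like those in the table above. It assumes the expressions use Python-like arithmetic plus SAS comparison mnemonics (NE, EQ, GT, LT, GE, LE) and that `Round` is the only permitted function; the variable list of TABLE1 and all function names are hypothetical.

```python
import ast
import re

# SAS comparison mnemonics translated to Python operators (assumed subset).
SAS_OPERATORS = {"NE": "!=", "EQ": "==", "GT": ">", "LT": "<", "GE": ">=", "LE": "<="}
ALLOWED_FUNCTIONS = {"Round"}  # hypothetical whitelist of callable functions
TABLE_VARIABLES = {"TABLE1": {"X", "Y", "Z"}}  # hypothetical variable list


def to_python(expr):
    """Replace whole-word SAS mnemonics such as NE with Python operators."""
    return re.sub(r"\b(NE|EQ|GT|LT|GE|LE)\b", lambda m: SAS_OPERATORS[m.group(1)], expr)


def check_expression(expr, variables):
    """Return the problems found in one Condition or Corr_rule expression."""
    try:
        tree = ast.parse(to_python(expr), mode="eval")
    except SyntaxError as exc:
        return [f"syntax error in {expr!r}: {exc.msg}"]
    problems, called = [], set()
    for node in ast.walk(tree):  # a parent node is visited before its children
        if isinstance(node, ast.Call):
            called.add(id(node.func))  # the function name is not a variable
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCTIONS):
                problems.append(f"unknown function in {expr!r}")
        elif isinstance(node, ast.Name) and id(node) not in called and node.id not in variables:
            problems.append(f"unknown variable {node.id!r} in {expr!r}")
    return problems


def check_row(row):
    """Check one metadata row: (Table, Variable, Condition, Corr_rule, Step)."""
    table, variable, condition, corr_rule, step = row
    variables = TABLE_VARIABLES.get(table, set())
    problems = [] if variable in variables else [f"unknown target variable {variable!r}"]
    if not isinstance(step, int) or step < 1:
        problems.append(f"invalid step {step!r}")
    for expr in (condition, corr_rule):
        problems += check_expression(expr, variables)
    return problems


rows = [
    ("TABLE1", "X", "X/Y >1000", "Round(X/100)", 1),
    ("TABLE1", "Z", "Z NE X", "X", 2),
    ("TABLE1", "X", "X/Y >", "Round(X/100)", 3),  # deliberately broken condition
]
for row in rows:
    print(row[4], check_row(row) or "OK")
```

Rejecting such rows at entry time addresses the syntax-error risk before any survey data are processed.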
Building blocks
• The basic tools of the whole system are the building blocks, each of which covers a particular processing phase.
• The building blocks are SAS macros that operate on the basis of the process metadata.
• So far, building blocks have been created for the following phases:
– Data validation (logical controls)
– Deterministic corrections
– Data imputation
– Standard error estimation
– Aggregation
– Tabulation
– Calculation of quality indicators
– Disclosure control (testing phase)
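A deterministic-corrections building block could apply rules such as those in the process-metadata table in ascending Step order: wherever Condition holds, Variable is set to the value of Corr_rule. This Python sketch is a hypothetical illustration, not the SAS macro used at SURS. It assumes NE means "not equal" as in SAS, approximates SAS ROUND by rounding half away from zero, uses `eval()` on trusted metadata only to keep the example short, and does not guard against Y = 0 in the first condition.

```python
import math
import re

# SAS comparison mnemonics translated to Python operators (assumed subset).
SAS_OPERATORS = {"NE": "!=", "EQ": "==", "GT": ">", "LT": "<", "GE": ">=", "LE": "<="}


def sas_round(x, unit=1):
    """Round half away from zero to a multiple of `unit` (approximates SAS ROUND)."""
    return math.copysign(math.floor(abs(x) / unit + 0.5) * unit, x)


def to_python(expr):
    """Replace whole-word SAS mnemonics such as NE with Python operators."""
    return re.sub(r"\b(NE|EQ|GT|LT|GE|LE)\b", lambda m: SAS_OPERATORS[m.group(1)], expr)


def apply_corrections(records, rules, table):
    """Apply the rules of `table` to each record in ascending Step order:
    if Condition is true, Variable is set to the value of Corr_rule."""
    env = {"__builtins__": {}, "Round": sas_round}  # only Round is callable
    ordered = sorted((r for r in rules if r["Table"] == table), key=lambda r: r["Step"])
    corrected = []
    for record in records:
        rec = dict(record)  # work on a copy; the input records stay unchanged
        for rule in ordered:
            if eval(to_python(rule["Condition"]), env, rec):
                rec[rule["Variable"]] = eval(to_python(rule["Corr_rule"]), env, rec)
        corrected.append(rec)
    return corrected


# The two rules from the process-metadata table.
RULES = [
    {"Table": "TABLE1", "Variable": "X", "Condition": "X/Y >1000", "Corr_rule": "Round(X/100)", "Step": 1},
    {"Table": "TABLE1", "Variable": "Z", "Condition": "Z NE X", "Corr_rule": "X", "Step": 2},
]

data = [{"X": 123456, "Y": 100, "Z": 7}, {"X": 50, "Y": 10, "Z": 50}]
print(apply_corrections(data, RULES, "TABLE1"))
# [{'X': 1235.0, 'Y': 100, 'Z': 1235.0}, {'X': 50, 'Y': 10, 'Z': 50}]
```

Step order matters here: rule 2 compares Z with the already corrected X, which is why the first record ends with Z = 1235.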
Building a global solution
• The developed system is a very open and flexible tool.
• However, a certain degree of re-integration is needed to increase its functionality:
– To move the process metadata into the ORACLE environment
– To create a single, unique database in which the process metadata for all surveys are stored and maintained
– To develop graphical interfaces for user-friendly management of process metadata
– To link the system with the metadata repository
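A single database of process metadata for all surveys can be sketched as one table keyed by survey, so that each survey's rules are retrieved by a query instead of being kept in a separate MS Access file. In this Python sketch an in-memory SQLite database stands in for ORACLE; the table and column names, the survey codes and the primary key (one rule per survey, table and step) are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for the central ORACLE database (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: the primary key allows one rule per survey, table and step.
conn.execute(
    """CREATE TABLE process_metadata (
           survey     TEXT    NOT NULL,
           table_name TEXT    NOT NULL,
           var_name   TEXT    NOT NULL,
           cond_expr  TEXT    NOT NULL,
           corr_rule  TEXT    NOT NULL,
           step       INTEGER NOT NULL,
           PRIMARY KEY (survey, table_name, step)
       )"""
)

# Rules for two hypothetical surveys, deliberately inserted out of step order.
conn.executemany(
    "INSERT INTO process_metadata VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SURVEY_A", "TABLE1", "Z", "Z NE X", "X", 2),
        ("SURVEY_A", "TABLE1", "X", "X/Y >1000", "Round(X/100)", 1),
        ("SURVEY_B", "TABLE1", "X", "X LT 0", "0", 1),
    ],
)


def rules_for(survey, table_name):
    """Fetch one survey's rules for one table, in execution (step) order."""
    return conn.execute(
        "SELECT var_name, cond_expr, corr_rule, step FROM process_metadata "
        "WHERE survey = ? AND table_name = ? ORDER BY step",
        (survey, table_name),
    ).fetchall()


print(rules_for("SURVEY_A", "TABLE1"))
# [('X', 'X/Y >1000', 'Round(X/100)', 1), ('Z', 'Z NE X', 'X', 2)]
```

With the primary key in place, a second row for SURVEY_A, TABLE1 and step 1 would be rejected by the database with `sqlite3.IntegrityError`, so this consistency rule no longer depends on user discipline.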
The new system
[Diagram elements: different microdata databases; general SAS program; ad-hoc programs; database of processing metadata; metadata repository; application for metadata management; data on tables and variables]
Application for metadata management – Deterministic corrections [figure]
Application for metadata management – Execution of the particular process step [figure]
New application and statistical process
• The generic MDD application changes how data processing is implemented at a general level:
– An essentially different distribution of work between subject-matter specialists, general methodologists and IT experts
– Change in the role of subject-matter statisticians → changed expectations of their skills and capabilities
– The work organisation of the IT Department and the General Methodology Department will have to be changed from domain oriented to process oriented.
– A different approach will be needed from IT and methodology experts:
• Experts capable of thinking and operating at a much more general level
• A survey is just one realisation of the general statistical process.
Conclusions
• SURS developments in recent years: flexible, metadata driven generic solutions for different phases of data processing.
• The very open system will be replaced by a more integrated and centralised one
• Main goal: transition from stove-pipe production to more integrated processing systems
• Two main challenges:
– To build generic IT solutions that “cover” the wide diversity of statistical surveys
– To change the strongly “domain-oriented state of mind” among employees
Thank you for your attention