

Doc. Eurostat/ITDG/October 2009/1.1

IT Directors Group

21 and 22 October 2009

BECH Building, 5, rue Alphonse Weicker, Luxembourg-Kirchberg

Room QUETELET

9.30 a.m. - 5.30 p.m.

9.00 a.m. - 1.00 p.m.

Integrated Statistical Information System (ISIS) in Croatia

Item 1.1 of the agenda


Integrated Statistical Information System (ISIS) in Croatia

1. INTRODUCTION

The idea to develop an automated statistical survey processing system on the client/server platform resulted from the operational circumstances in the Central Bureau of Statistics (CBS):

- the IT sector is strictly centralized, i.e. the IT sector processes all statistical surveys according to the descriptions laid out by statisticians;

- more than half of the statistical surveys (60 out of 100) are still processed on the mainframe;

- the majority of surveys have similar processing stages (data entry, validation, correction, tabulation, dissemination); therefore the majority of the corresponding data processing jobs have a similar structure, which could be incorporated in a generalized solution. Such a solution was developed in CBS for data processing on the mainframe in the 1980s and is still in use.

2. STATISTICAL INFORMATION SYSTEM AND THE STATISTICAL BUSINESS PROCESS

The majority of CBS's statistical surveys are processed in the following steps:

1. Planning all activities
2. Survey design and description
3. Data capture and file transfer
4. Validity checking against preset rules and producing an error list to be presented to the statistician in charge
5. On-line correction
6. Tabulation, i.e. producing statistics for the statistician's supervision
7. Publishing, i.e. producing first releases and other statistics
8. Archiving
9. Monitoring

It should be pointed out that the steps mentioned above refer to the data processing stage of any particular statistical survey or, in other words, to the statistical survey as seen from the perspective of the IT sector, which is responsible for providing adequate IT solutions for all statistical activities within the organization. Mapped to the Generic Statistical Business Process Model (Level 1) [1], the steps mentioned above fit the model as follows:

Generic Statistical Business Process Model    CBS's Survey Business Process

Need, Design, Build                           1. Planning all activities
                                              2. Survey design & description
Collect                                       3. Data capture & file transfer
Process                                       4. Validity checking
                                              5. On-line correction
                                              6. Tabulation
Analyze
Disseminate                                   7. Publishing
Archive                                       8. Archive
Evaluate
                                              9. Monitoring

[1] UNECE Secretariat: Generic Statistical Business Process Model, Version 3.1; December 2008; Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS)

Since the CBS’s business processes regarding data processing follow the general pattern as laid out by the joint work of international statistical organizations, it is reassuring that an information system built upon those processes will be in line with statistical work in most countries and therefore the know-how could be shared.

3. STATISTICAL METADATA SYSTEM

The first step towards an integrated survey processing solution was to establish a metadata system containing information concerning every stage of the statistical survey’s business process. There were three principal reasons to implement the metadata maintenance system in CBS prior to development of ISIS:

- to standardize definitions across all statistical activities;

- to provide users with a tool to describe the statistical survey using standardized definitions, thus providing input parameters for statistical survey data processing;

- to present statistics on the internet along with their context in order to make statistics understandable and available to users of all types, i.e. to extend the use of statistics beyond the usual statistical publications.

The CBS's metadata system (CROMETA) contains Reference Model™ concepts [2], extended and customized for CBS needs, along with the specifics needed to run PC-Axis as the main dissemination tool. Although the model is very complex and rather demanding to comprehend, it proved to be well conceived from the beginning, or rather from the moment we fixed the 'big picture'. The metadatabase is now stable, with high tolerance for the occasional changes that accompany the development of specific solutions for particular stages of the statistical life cycle.

The central metadata repository is presently rather empty since the ISIS system is still in test phase; it contains basic metadata on just a few statistical surveys that were selected as pilots. We are well aware of the problems which may arise among statisticians with the obligations to enter or transfer all the 'knowledge' of all statistical activities.

[2] MetaNet project within Eurostat (2000-2003)


4. INTEGRATED STATISTICAL INFORMATION SYSTEM (ISIS)

The aim of ISIS is to enhance all aspects of statistical production, i.e. data capture, data validation and cleaning processes, data warehousing and presentation of statistics to the public. In practice, one general software solution was developed, consisting of modules for particular statistical business processes. The metadata repository, containing descriptions of all data, activities and processes in CBS, provides the foundation of such a system.

The basis and precondition for the automation is well structured metadata, entered and maintained by the owners of statistics. The metadata repository must contain all the necessary information to be used as parameters for a general 'program' that produces specific operating procedures for particular surveys. Therefore it could be stated that centrally stored metadata could more or less automatically 'drive' the statistical production system.
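The notion of metadata acting as parameters for a general 'program' can be illustrated with a minimal sketch. This is not the ISIS implementation: the metadata layout (`SURVEY_METADATA`), the step names and the function `build_procedure` are hypothetical, chosen only to show how one generic routine could turn a survey description into a survey-specific sequence of operations.

```python
# Hypothetical metadata for one survey version; in ISIS this information
# would come from the central metadata repository, not from a literal.
SURVEY_METADATA = {
    "survey": "PILOT-01",
    "input_format": "delimited",
    "variables": ["unit_id", "region", "turnover"],
    "validation_rules": ["turnover >= 0"],
    "tables": [{"rows": "region", "measure": "turnover"}],
}


def build_procedure(metadata):
    """Turn a survey description into an ordered list of processing steps.

    The function is generic: nothing in it is specific to one survey.
    Everything survey-specific is read from the metadata argument.
    """
    steps = [("import", metadata["input_format"], tuple(metadata["variables"]))]
    for rule in metadata["validation_rules"]:
        steps.append(("validate", rule))
    for table in metadata["tables"]:
        steps.append(("tabulate", table["rows"], table["measure"]))
    return steps


procedure = build_procedure(SURVEY_METADATA)
for step in procedure:
    print(step)
```

Adding a survey in this scheme means adding a metadata description, not writing a new program, which is the property the text attributes to a metadata-driven system.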

4.1. ISIS Architecture

The architecture of the CBS ISIS solution follows common techniques for modern system development, presenting a multi-tiered, scalable solution customized for a multi-user environment. It consists of a data layer at the bottom and a business layer above it that manages the business logic of the system. On top of the business layer is the presentation layer, which deals with user interaction and provides several application interfaces for metadata maintenance, survey processing and macrodata dissemination.
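The layering described above can be sketched in a few lines. The class names and the single business rule below are assumptions made for illustration; they show only the direction of dependencies (presentation calls business, business calls data), not the actual ISIS components.

```python
class DataLayer:
    """Bottom tier: stores and retrieves records (here, an in-memory dict)."""

    def __init__(self):
        self._surveys = {}

    def save(self, survey_id, description):
        self._surveys[survey_id] = description

    def load(self, survey_id):
        return self._surveys[survey_id]


class BusinessLayer:
    """Middle tier: applies business rules before touching the data layer."""

    def __init__(self, data_layer):
        self._data = data_layer

    def register_survey(self, survey_id, description):
        # Hypothetical rule: a survey description must carry a title.
        if not description.get("title"):
            raise ValueError("a survey description needs a title")
        self._data.save(survey_id, description)

    def survey_title(self, survey_id):
        return self._data.load(survey_id)["title"]


class PresentationLayer:
    """Top tier: formats results for the user; contains no business rules."""

    def __init__(self, business_layer):
        self._business = business_layer

    def show_survey(self, survey_id):
        return f"Survey {survey_id}: {self._business.survey_title(survey_id)}"


data = DataLayer()
business = BusinessLayer(data)
ui = PresentationLayer(business)
business.register_survey("PILOT-01", {"title": "Pilot survey"})
print(ui.show_survey("PILOT-01"))
```

Because each tier talks only to the one directly below it, several user interfaces can share the same business layer, as the text describes for metadata maintenance, survey processing and dissemination.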


4.2. ISIS Tools

Several customized tools developed in CBS support different aspects of ISIS: the metadata maintenance tool CROMETA, the survey processing tool Survey Processor and the data tabulation/aggregation tool Warehouse Browser. These tools let users process a survey in an automated fashion, guiding them from survey design to output results containing aggregated data and ready-to-use tables.

The following picture shows user entry points to the different ISIS components, presented in the form of a flow diagram of the applications and their sequence in the survey processing.


CROMETA Metadata Manager:

Define your survey using the metadata maintenance tool. Copy metadata from an existing version or start from scratch.

Survey Processor:

Process the study in an automated fashion – taking advantage of modern techniques and computer-aided functions.

Warehouse Browser:

Create cubes or ready-to-use tables and customize reports in various formats for printing or on-line publishing.


4.2.1. CROMETA Metadata Manager

The main tool for metadata administration and maintenance provides a Windows-like interface for adding, browsing, editing and generally maintaining metadata. Practically any operation on metadata can be performed here, provided the user has the appropriate privileges for the metadata concerned. The background of CROMETA is very complex, though, and the tool is intended to be used by metadata experts.

The following picture gives an example of the interface to the central metadata repository:

The CROMETA Metadata Manager tool has some outstanding functionalities:

Multi-language support - CROMETA Metadata Manager supports an indefinite number of languages. Also the interface itself supports several languages; hence it could be customized for any language.

Versions management - all metadata objects may exist in numerous versions. Each version is in exactly one state, and only one version of each object can be current/authorized.

History management - every time a change is made to metadata, it is logged with the modifying user and date. Through history management, metadata may be studied and followed over time, including all changes, explanations, involved users and its life cycle.

Authorization - in order to secure the metadata, a comprehensive model for access and authorization is applied while using CROMETA Metadata Manager.

General functions - all metadata, regardless of type or usage, can be added, edited or deleted using exactly the same methods in the maintenance tool.

Add new version based on an existing version of a metadata object - to minimize the work describing similar things, CROMETA Metadata Manager supports "Add new based on", which means that all properties and connections of an existing version of an object are copied into a new object that could be further edited.

General properties managed in a general way - when working with any metadata object, the general properties will always be available in the same way. In practice this means that footnotes, keywords, documents, etc. could be connected to all metadata objects in the same flexible way.

Subscription to metadata - a user can subscribe to any object, specifying how frequently he or she would like to be notified of changes.

Locking and unlocking of metadata - in order to ensure that the same metadata are not edited simultaneously by two different users, CROMETA Metadata Manager has built-in functionality for exclusive locking and unlocking of metadata.
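Two of the functionalities listed above, version states and exclusive locking, lend themselves to a short sketch. The class below is hypothetical and is not the CROMETA data model: the state names ("draft", "current", "archived") and the decision to archive a previously current version are assumptions made only to illustrate the rules "one state per version", "at most one current version per object" and "one editor at a time".

```python
class MetadataObject:
    """Hypothetical metadata object with versions and an exclusive edit lock."""

    STATES = {"draft", "current", "archived"}  # assumed state names

    def __init__(self, name):
        self.name = name
        self.versions = {}      # version number -> state (exactly one each)
        self.locked_by = None   # user currently holding the lock, if any

    def add_version(self, number, state="draft"):
        if state not in self.STATES:
            raise ValueError(f"unknown state: {state}")
        self.versions[number] = state
        if state == "current":
            self._demote_others(number)

    def authorize(self, number):
        """Make one version current; a previously current one is archived."""
        if number not in self.versions:
            raise KeyError(number)
        self.versions[number] = "current"
        self._demote_others(number)

    def _demote_others(self, keep):
        for number, state in self.versions.items():
            if number != keep and state == "current":
                self.versions[number] = "archived"

    def lock(self, user):
        """Grant the lock only if nobody else holds it."""
        if self.locked_by not in (None, user):
            return False
        self.locked_by = user
        return True

    def unlock(self, user):
        if self.locked_by == user:
            self.locked_by = None


variable = MetadataObject("turnover")
variable.add_version(1, "current")
variable.add_version(2)
variable.authorize(2)
print(variable.versions)      # {1: 'archived', 2: 'current'}
print(variable.lock("ana"))   # True
print(variable.lock("ivan"))  # False
```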

4.2.2. Survey Processor

Survey Processor is a tool for survey data processing containing modules for the steps described in chapter 2. The Monitoring module is started by default if the user has the access rights and an authorization level for survey processing. A user can access only those modules that have been granted to him or her in the metadata repository.


A module is enabled when its corresponding process is started, but there is a predefined scheme for enabling modules, depending on the specifics of the survey version. For example, if questionnaires are used in the survey, they must first be linked to the population by matching appropriate variables. Thus, the Data import process cannot be started before the links are defined, and the corresponding module is not available until all conditions are met. For obvious reasons, the Data cleaning process cannot be started before data import has been successfully completed.
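The enabling scheme described above amounts to checking prerequisites before a module is offered to the user. The following is a minimal sketch under assumptions: the module names, the exact dependency table and the functions `prerequisites` and `enabled_modules` are hypothetical, and only the two constraints stated in the text (questionnaire links before data import, data import before data cleaning) are taken from the source.

```python
def prerequisites(uses_questionnaires):
    """Return, per module, the set of processes that must be completed first.

    The table is an illustrative assumption, not the actual ISIS scheme.
    """
    rules = {
        "population_definition": set(),
        "survey_design": set(),
        "data_import": {"population_definition", "survey_design"},
        "data_cleaning": {"data_import"},
        "tabulation": {"data_cleaning"},
    }
    if uses_questionnaires:
        # Questionnaires must be linked to the population before import.
        rules["questionnaire_linking"] = {"population_definition", "survey_design"}
        rules["data_import"] = rules["data_import"] | {"questionnaire_linking"}
    return rules


def enabled_modules(completed, uses_questionnaires=True):
    """List the modules that may be started, given the completed processes."""
    rules = prerequisites(uses_questionnaires)
    done = set(completed)
    return sorted(
        module
        for module, required in rules.items()
        if module not in done and required <= done
    )


print(enabled_modules([]))
print(enabled_modules(["population_definition", "survey_design"]))
print(enabled_modules(["population_definition", "survey_design"],
                      uses_questionnaires=False))
```

In the second call only the linking module is offered, because data import still waits for the links; in the third call, for a survey version without questionnaires, data import becomes available directly.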

Modules were developed to cover particular phases in the statistical survey processing. The following picture shows modules of the Survey Processor developed so far. Along with the modules, their main functions are also presented. The function titled in blue letters indicates some sort of metadata transaction in that particular module.


In the future more modules should be included, e.g. automatic coding, manual coding, publishing, etc.

The Monitoring module is intended for creating and managing the main processes for the survey. Some basic statistics about processing are also presented in this module, e.g. number of respondents, response rate, error statistics, data correction statistics, etc. It should be stressed that relevant process management information is stored in the central metadata repository.

The Population definition module is intended to create the survey population from a frame that can be selected from statistical and/or administrative registers and populations from previous study versions.

The Survey design module offers many features, from questionnaire design and definition to the description of other sources, and from the description of variables and data sources to the definition of validation rules and value domains.

The Data import module is used for importing data in several data formats (delimited, fixed-width, etc.). During the import, data is checked and cleaned in several phases. All metadata about the validation is stored in the metadata repository, while errors found in the data are logged in the Processing database.

The Validation module enables standard and specific validation rules to be re-applied. This can be done at any moment after the first import of data or after cleaning of erroneous data. All metadata about the validation is also stored in the metadata repository, while any errors found are logged in the Processing database.
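The import and validation steps described above can be sketched as follows. This is an illustration under assumptions, not the Survey Processor itself: the variable names, the region codes, the two rules in `RULES` and the in-memory `error_log` (standing in for the Processing database) are all hypothetical. Only the delimited format is shown.

```python
import csv
import io

# Hypothetical validation rules, standing in for rules read from metadata.
RULES = [
    ("turnover_not_negative", lambda rec: rec["turnover"] >= 0),
    ("region_in_domain", lambda rec: rec["region"] in {"HR01", "HR02"}),
]


def import_delimited(text, delimiter=";"):
    """Parse delimited text into records, converting turnover to a number."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {"unit_id": row["unit_id"], "region": row["region"],
         "turnover": float(row["turnover"])}
        for row in reader
    ]


def validate(records, rules=RULES):
    """Apply every rule to every record and return the list of errors found."""
    errors = []
    for rec in records:
        for name, check in rules:
            if not check(rec):
                errors.append({"unit_id": rec["unit_id"], "rule": name})
    return errors


raw = "unit_id;region;turnover\nA1;HR01;120.5\nA2;HR09;-3\n"
records = import_delimited(raw)
error_log = validate(records)
print(error_log)
```

Because `validate` takes the records as an argument, it can be called again after corrections have been made, which mirrors the re-application of rules that the Validation module provides.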

The Data cleaning module is intended primarily for corrections of erroneous data. All data errors are clearly presented with many graphic features. Every correction is logged in the Processing database with information about the user, old and new values, etc. This module is the only one that does not produce updates in the metadata system.
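The logging of corrections described above can be sketched as a small audit trail. The field names and the list `correction_log` (standing in for a table in the Processing database) are assumptions; the source states only that the user and the old and new values are recorded.

```python
from datetime import datetime, timezone

correction_log = []  # stands in for a table in the Processing database


def correct(record, variable, new_value, user):
    """Change one value in a record and log who changed what, and when."""
    old_value = record[variable]
    record[variable] = new_value
    correction_log.append({
        "unit_id": record["unit_id"],
        "variable": variable,
        "old_value": old_value,
        "new_value": new_value,
        "user": user,
        "corrected_at": datetime.now(timezone.utc).isoformat(),
    })


record = {"unit_id": "A2", "turnover": -3.0}
correct(record, "turnover", 3.0, user="statistician_1")
print(correction_log[0]["old_value"], "->", correction_log[0]["new_value"])
```

Keeping the old value alongside the new one means a correction can be reviewed or reversed later without consulting the original input file.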

The Tabulation module triggers the Warehouse Browser application.


4.2.3. Warehouse Browser

The Warehouse Browser application can be accessed either directly or from the Survey Processor.

Again, users can access only those survey version registers and/or cubes for which they are authorized. The variables are clearly presented and divided into groups (qualitative, quantitative, time and weight variables); they can be easily filtered and presented in an appropriate way in result tables by using value domains. All information about them is retrieved from the metadata system, and new variables can be derived and stored in the repository. The results can be customized by users (different style templates, column widths, page settings, etc.) and saved in several data formats (MS Excel workbooks, PDF, HTML, px-files for PC-Axis, etc.).
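How the variable groups interact in a result table can be shown with a small aggregation sketch. It is not the Warehouse Browser: the data, the value domain `REGION_DOMAIN` and the function `tabulate` are hypothetical. A qualitative variable defines the rows, a quantitative variable is summed, a weight variable scales each record, and the value domain supplies the labels shown in the table.

```python
from collections import defaultdict

# Hypothetical value domain mapping codes to labels for presentation.
REGION_DOMAIN = {"HR01": "Region one", "HR02": "Region two"}

microdata = [
    {"region": "HR01", "turnover": 100.0, "weight": 2.0},
    {"region": "HR01", "turnover": 50.0, "weight": 1.0},
    {"region": "HR02", "turnover": 80.0, "weight": 1.5},
]


def tabulate(records, qualitative, quantitative, weight, domain):
    """Sum a weighted quantitative variable per category of a qualitative one."""
    totals = defaultdict(float)
    for rec in records:
        totals[domain[rec[qualitative]]] += rec[quantitative] * rec[weight]
    return dict(totals)


table = tabulate(microdata, "region", "turnover", "weight", REGION_DOMAIN)
for label, total in table.items():
    print(f"{label}: {total}")
```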

5. ORGANIZATIONAL ISSUES AND LESSONS LEARNED

The work on the project was carried out in CBS with support of the Swedish Statistical Office (SCB), financed by the Swedish International Development Cooperation Agency (Sida). Activities started in 2002, and the detailed plan for CROMETA was laid out in May 2004. In between there was some swaying in search of the right direction, given the complexity of the goals and the complexity of the existing production system which had to be taken into consideration. The ISIS development started in September 2006. By the end of 2007 the SCB support expired and the finalization is still under way in CBS. Pilot surveys were applied, the system was amended upon the feedback from the pilots and the broader testing is about to begin.

Participants in those activities were mainly IT people and some statisticians. It should be noted that CBS's IT people are more than well informed about statistical activities, methodologies and processes since they process or manage more than 100 statistical surveys on various platforms on a daily basis. The fact that the IT sector contained both knowledge about statistical methodologies and IT development made it natural that the system should be developed in-house. That saved a lot of money since it was performed during regular working hours. On the other hand, the development duration was significantly longer because the development team could not be appointed to this job exclusively. 

Outsourcing would have taken a year or so for the software development itself, but prior to that it would have taken another year of work with statisticians to sort out, explain and document all the requirements. Given the complexity of the CROMETA concepts, it seems unlikely, if not impossible, to outsource the development and receive a comprehensive solution to be used and maintained by statisticians.

There is also the crucial problem of 'brain drain' throughout CBS and particularly in the IT sector. Some fifteen people involved in CROMETA and ISIS development left the IT sector in the last three years, that is, almost half of the total number of developers and 80% of the developers who started the project. They were replaced, of course, but plenty of know-how was lost nevertheless. This greatly affected the activities laid out in the plan, since there was a constant shift of duties and responsibilities along with a constantly growing amount of regular production work.

The complete ISIS information system in general, and the CROMETA system in particular, will force considerable changes upon the overall culture of CBS. By implementing ISIS we expect some changes of roles: more work should be completed where and when the need originates (in subject-matter departments), instead of following the much more time-consuming procedure of central data processing. How content or discontent the majority of statisticians will be when they start using the ISIS tools is yet to be learned. In any case, we expect resistance from a number of subject-matter experts, especially those who cherish the legacy of past times when it was usual practice to order a tailor-made data processing system from the IT department.

The most important lesson learned from the project is that serious development requires a development team appointed to this and only this project. This applies to IT developers as well as statisticians. Of course we knew that even before we started the project but we could not afford to have experts unavailable to regular production for a longer period of time. For this reason the development lasted much longer than planned.

The most painful lesson learned was that no project is interesting and challenging enough to keep young and well-educated IT experts from leaving for better-paid jobs. IT experts in government bodies are paid two or three times less than in the private sector, and it is therefore nearly impossible to keep them without applying some measures for special rewards.

6. CONCLUSION

The absolute target of CBS was to develop and manage a completely metadata-driven automated processing system. This objective has now been met and the system is ready for production, which will undoubtedly provide further feedback and generate new ideas. By fulfilling this target CBS has opened the possibility of a full transfer from the mainframe to the client/server environment, i.e. of moving production work closer to the subject-matter experts. This should lead to the abandonment of the mainframe platform, saving the organization a large amount of maintenance money. No less an achievement should be the possibility of redirecting a considerable amount of IT manpower into further development, taking into account that no software development will be needed for specific surveys as regards general survey business processes. The solution developed upon the ISIS architecture should provide for that.


