Workshop ESS NET ON MICRO DATA LINKING AND DATA WAREHOUSING IN STATISTICAL PRODUCTION 22 & 23 SEPTEMBER 2011 “Mapping the GSBPM on a SDW architecture” Antonio Laureti Palma IT - Structural Business Statistics Unit National Institute of Statistics – Italy

Overview

The aim of this study is to define and contextualize a statistical data warehouse, in order to provide a framework that assists the development and definition of “data warehousing and data linking”.

The data warehousing architecture presented here can be considered the IT conclusion of the activities of the first year of the ESSnet, while the proposed modelling approach indicates the roadmap for the future IT representation of this context. It will be described by:

Data Warehousing as a Single Coherent Statistical production System

Statistical Data Warehousing: an Architecture schema

Modeling the Business Domain - Designer’s view of the GSBPM on DWA schema

Modeling the Data/Metadata Domain

Conclusion

The Data Warehouse

IT definition: in computing, a data warehouse is a database used for reporting.

…the concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse" (from Wikipedia).

...as Bill Inmon says - “the data warehouse is at the center of the corporate information factory, which provides a logical framework for decision support environments and business management capabilities”.

...in essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to delivering business intelligence.

Data Warehousing for Enterprise

[Figure: the enterprise production line (Marketing, Production, Sales, Resources, Distribution) feeds the Data Warehouse through ETL flows; the Data Warehouse serves the DSS (Decision Support System) and the MIS (Management Information System).]

The centrality of the DW in an enterprise is obtained through an IT infrastructure that is transversal to all the operational systems.

The data from the operational systems are Extracted, Transformed and Loaded (ETL) into the DW, where they become available to the DSS and the MIS.
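The ETL flow described above can be sketched in a few lines of Python. This is only an illustration and not part of the original presentation: the source names, the record fields (`unit_id`, `turnover`) and the functions `extract`, `transform`, `load` and `run_etl` are all hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical operational systems, each holding raw records with string values.
OPERATIONAL_SYSTEMS: Dict[str, List[dict]] = {
    "sales": [{"unit_id": "A1", "turnover": "1200"}],
    "production": [{"unit_id": "A1", "turnover": " 300 "}],
}


def extract(system: str) -> Iterable[dict]:
    """Read the raw records of one operational system."""
    return OPERATIONAL_SYSTEMS[system]


def transform(record: dict, system: str) -> dict:
    """Bring a raw record to the warehouse format (typed value, source tag)."""
    return {
        "unit_id": record["unit_id"],
        "turnover": float(record["turnover"].strip()),
        "source": system,
    }


def load(warehouse: List[dict], records: Iterable[dict]) -> None:
    """Append the transformed records to the warehouse store."""
    warehouse.extend(records)


def run_etl() -> List[dict]:
    """Run one ETL pass over every operational system."""
    warehouse: List[dict] = []
    for system in OPERATIONAL_SYSTEMS:
        load(warehouse, (transform(r, system) for r in extract(system)))
    return warehouse
```

A DSS or MIS would then read only from the list returned by `run_etl()`, never from the operational systems directly.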

Data Warehousing for Statistics

In an NSI, if the DW is mainly used to improve production efficiency, as in an enterprise, it is transversal to the statistical production line:

[Figure: the Data Warehouse placed across the statistical production line. Labels: Surveys, Admin Data, Resources, Regulations, Elaboration, Output, ETL flows into the Data Warehouse, DSS (Decision Support System), MIS (Management Information System).]

Data Warehousing for Statistics

In an NSI, if the DW is used both for “improving the production efficiency” (DSS-MIS) and for “creating the statistical product” (SD), then the DW is part of the production line.

…in this case, the DW can be considered a single logical repository, the center of the information factory, of all the information generated by the NSI:

[Figure: the Data Warehouse placed inside the statistical production line. Labels: Surveys, Admin Data, Resources, Regulations, ETL flows into the Data Warehouse, DSS, MIS, SD (Statistical Dissemination).]

From the survey, two issues arise:

Single coherent system (questions 6 to 13)

- 15 countries declare that they do not have a single coherent system, although 11 of them are planning to change this; the situation will probably change substantially in the next five years.
- Current output requirements are not integrated into the data systems of 10 countries, and the situation will probably change for half of them.
- Those who have a single coherent system do not want to change it; metadata and data input, as well as admin data, are fully integrated in the data system.

Motivation to start DW (question 14)

- The main motivations are linked to the ways of (re)using data, the improvement of efficiency and process integration in business statistics production.
- Additional motivations are integrating the project into the organization's processing model, reducing the burden (cost and time) on survey respondents, and increasing consistency and quality.

Disadvantages of a stove-pipe-like production

In a stove-pipe production system, every single production line corresponds to a specific domain of statistics, together with the corresponding production system. For each domain, the whole production process, from survey design to dissemination, takes place independently of the other domains, and each domain has its own data suppliers and user groups:

[Figure: parallel production lines, one per domain. Labels: Structural Business Statistics (SBS), Short Term business Statistics (STS), Information Society (IS), Science Technology Innovation (STI), I/O, ….; survey data, administrative data, data integration, Business Register, elaboration, statistical output.]

Data Warehousing as a Single Coherent System

In an NSI, a single coherent Data Warehousing System (DWSys) aims to improve production efficiency and to create the statistical products in a fully integrated way.

From this point of view, the DWSys becomes the “effective” Information System of the full statistical production line. The term DWSys should therefore refer to the interaction between People, Business Processes, Data and Technology.

The Statistical Data Warehouse (SDW) can then be seen as a central statistical data store, regardless of the data’s source, for managing all available data of interest, improving the NSI’s ability to: (re)use data to create new data/new outputs; perform reporting; execute analysis; produce the necessary information.

DWSys Architectural description

A DWSys Architecture (DWA) for statistics is a rigorous description of the structure of the NSI production, which comprises the DWSys components (business entities or sub-processes), the externally visible properties of those components, and the relationships (e.g. the behavior) between them.

The DWA should be a framework for an NSI that defines how to organize the DWSys. It should:

- provide the mechanisms for communicating information about the relationships that are important in the architecture;
- provide the discipline to gather and organize the data and to construct the views in a way that helps ensure integrity, accuracy and completeness;
- support the application of methods and the use of tools.

Layers of the enterprise architecture

In the creation of an enterprise architecture, it is common to recognize four types of architecture, each corresponding to a particular architectural domain.

DWA – Business Domain

To provide a DWA that is as detailed as possible in the context of statistics production, we can articulate the business domain in four functional layers: data source layer, integration layer, interpretation and data analysis layer, and access layer.

Each layer has its own data domain structure:

- operational data, for data warehousing;
- metadata, the data describing the SDW, usually used to manage, describe and monitor the information systems.

DWA layered business architecture

[Figure: the four functional layers (Source, Integration, Interpretation & Data Analysis, Access) together with Meta Data Management. Labels: Surveys 1 … n, Admin Data 1 … n, Resources, Regulations, Staging Area, Business Register, Primary Data, Data Mart (three instances), DSS, MIS, Statistical Dissemination.]

DWA - functional Layer

Source Database Layer:

This level is responsible for storing, physically or virtually, the data from internal (surveys) or external (archives) sources for statistical purposes.

Typical data sources, in the context of business statistics, are specific surveys (such as STS, ICT, CIS and SBS) and external archives (such as those of the Customs Agency, the Revenue Agency, the Chambers of Commerce and the National Social Security Institute).

DWA - functional Layer

Integration layer:

It is used for all the integration and reconciliation activities on the data sources. This layer contains the set of applications that perform the main ETL, which manages:

- inconsistent coding for the same object; consistency is obtained through the coding defined by the data warehouse;
- adjustment of different units of measurement and of inconsistent formats;
- alignment of inconsistent labels (the same object named differently); usually the data are identified according to the definition contained in the metadata of the system;
- incomplete or incorrect data; in this case the operation may require human intervention to resolve issues that cannot be predicted a priori;
- data linking, in which different sources enable the creation of extended, or new, units of analysis.
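Two of the tasks listed above, harmonisation of codes and units and data linking, can be sketched as follows. This is a minimal illustration under stated assumptions: the code tables `CANONICAL_CODES` and `TO_EUR`, the field names and the exact-match linking on `unit_id` are all hypothetical, and real data linking usually needs more than an exact key match.

```python
from typing import Dict, List

# Hypothetical warehouse-level metadata: canonical codes and unit factors.
CANONICAL_CODES = {"IT": "ITA", "Italy": "ITA", "ITA": "ITA"}
TO_EUR = {"EUR": 1.0, "kEUR": 1000.0}


def harmonise(record: dict) -> dict:
    """Recode the country and convert the amount to a single unit (EUR)."""
    return {
        "unit_id": record["unit_id"],
        "country": CANONICAL_CODES[record["country"]],
        "turnover_eur": record["amount"] * TO_EUR[record["unit"]],
    }


def link(survey: List[dict], admin: List[dict]) -> List[dict]:
    """Extend survey records with admin variables sharing the same unit_id."""
    admin_by_id: Dict[str, dict] = {r["unit_id"]: r for r in admin}
    linked = []
    for s in survey:
        a = admin_by_id.get(s["unit_id"])
        if a is not None:  # unmatched units would need manual review
            linked.append({**s, "employees": a["employees"]})
    return linked


survey = [harmonise({"unit_id": "A1", "country": "Italy", "amount": 2.5, "unit": "kEUR"})]
admin = [{"unit_id": "A1", "employees": 12}, {"unit_id": "B2", "employees": 3}]
linked_units = link(survey, admin)
```

Keeping the code tables as data rather than as program logic reflects the point above that the data are identified according to definitions held in the metadata of the system.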

DWA - functional Layer

Interpretation and data analysis layer:

The basic functions performed at this level are advanced analysis and interpretation of data elaborations, both based on statistical algorithms. Here “statistical expert users” operate to produce information of strategic value, working with data at the maximum granularity. Only a small number of users are allowed to access the data, to avoid degrading server performance.

This is a strategy of “process of information delivery”, in which the demand for new statistical information does not involve the construction of new statistical production lines, but rather the creation of new data marts. The results of these activities are either unplanned aggregate data for the access layer or software rules developed for the next iteration; they are delivered through data marts, which are subsets of the DW, usually oriented to a specific business line or team.
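The idea of a data mart as a subset of the DW can be sketched as follows. This is only an illustration: the function `build_data_mart`, the field names and the sample rows are hypothetical and not taken from the presentation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_data_mart(warehouse: List[dict], domain: str) -> Dict[Tuple[str, int], float]:
    """Aggregate the micro data of one domain by (activity, year)."""
    mart: Dict[Tuple[str, int], float] = defaultdict(float)
    for row in warehouse:
        if row["domain"] == domain:
            mart[(row["activity"], row["year"])] += row["turnover"]
    return dict(mart)


# Hypothetical micro data held in the warehouse.
warehouse = [
    {"domain": "SBS", "activity": "C", "year": 2010, "turnover": 100.0},
    {"domain": "SBS", "activity": "C", "year": 2010, "turnover": 50.0},
    {"domain": "STS", "activity": "G", "year": 2010, "turnover": 70.0},
]
sbs_mart = build_data_mart(warehouse, "SBS")
```

A new information request is then served by calling `build_data_mart` with different arguments, without building a new production line.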

DWA - functional Layer

Access Layer:

It is the layer for the final presentation of the information sought, addressed to a wide range of users who are not necessarily experts in business statistics or in IT tools. The tools used at this layer are:

- Specialized Business Intelligence tools: in this extensive category, in terms of solutions on the market, we find tools to build queries and navigational tools (OLAP viewers), including Web browsers.

- Graphics and publishing tools: Business Intelligence tools can generate graphs and tables for their users; this solution essentially reduces the work to a couple of steps, avoiding inefficiency.

- Office Automation tools: this is a reassuring solution for users who come to the data warehouse context for the first time, as they are not forced to learn new, complex instruments. The problem is that this solution, while adequate with regard to productivity and efficiency, is very restrictive in the use of the data warehouse, since these instruments have significant architectural and functional limitations.

DWA – Modeling the Business Domain

The designer's view of the business is also known as the analytical view, and there are various standards for modeling this view. One of the most commonly used modeling standards is the Generic Statistical Business Process Model (GSBPM).

The GSBPM definition by UNECE (vers. 4) is: “The original intention was for the GSBPM to provide a basis for statistical organizations to agree on standard terminology to aid their discussions on developing statistical metadata systems and processes. The GSBPM should therefore be seen as a flexible tool to describe and define the set of business processes needed to produce official statistics”.

So, in order to define a general and comprehensive architecture for statistical production, it may be useful to identify and locate the different phases of a generic statistical production process on the different functional layers of the DWA.

Generic Statistical Business Process Model

DWA - Mapping the GSBPM on DWA

The placement of the sub-processes on an SDW architecture is represented graphically in the next slides, with the SDW functional layers on the horizontal axis and the nine GSBPM phases on the vertical axis. Each element inside the graph is a sub-process; we consider GSBPM phases 4 to 7.

This is only an example of the modelling process. Each case must be validated and discussed in its own operational context; this is just a basis for setting up and starting the modelling work of the next two years of the ESSnet.

In this context, each sub-process must be regarded from a methodological, planning, technological or operational point of view. Blank sub-processes are related to methodological, or planning, metadata definitions, while brown sub-processes are related to operational, or technological, functions for data elaboration.
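The grid described above (layers on one axis, GSBPM phases on the other, sub-processes as elements) can be represented with a small data structure. This is a sketch under loud assumptions: the class names are hypothetical, and the three placements in `GRID` are illustrative only, not read from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Layer(Enum):
    SOURCE = "Source"
    INTEGRATION = "Integration"
    INTERPRETATION = "Interpretation and analysis"
    ACCESS = "Access"


@dataclass(frozen=True)
class SubProcess:
    code: str          # GSBPM sub-process number, e.g. "5.1"
    name: str
    layer: Layer       # the placements used below are illustrative only
    operational: bool  # True: data elaboration; False: metadata definition

    @property
    def phase(self) -> int:
        """GSBPM phase number, taken from the part before the dot."""
        return int(self.code.split(".")[0])


# Illustrative placements, NOT taken from the presentation.
GRID = [
    SubProcess("4.3", "run collection", Layer.SOURCE, True),
    SubProcess("5.1", "integrate data", Layer.INTEGRATION, True),
    SubProcess("7.1", "update output systems", Layer.ACCESS, True),
]


def by_layer(grid: List[SubProcess]) -> Dict[Layer, List[str]]:
    """Group sub-process codes by the layer they are allocated to."""
    result: Dict[Layer, List[str]] = {layer: [] for layer in Layer}
    for sp in grid:
        result[sp.layer].append(sp.code)
    return result
```

The `operational` flag stands in for the colour coding of the slides: blank for metadata definitions, brown for data elaboration.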

Designer's view - Mapping the GSBPM on DWA

Sub-processes of the GSBPM allocated on the functional layers of the DWA.

[Figure: the sub-processes of GSBPM phases 6 and 7 placed on a grid whose columns are the DWA functional layers: Source Layer, Integration Layer, Interpretation and analysis Layer, Access Layer.]

Phase 6, Analyze:
- 6.1 prepare draft output
- 6.2 validate outputs
- 6.3 scrutinize and explain
- 6.4 apply disclosure control
- 6.5 finalize outputs

Phase 7, Disseminate:
- 7.1 update output systems
- 7.2 produce dissemination
- 7.3 manage release of dissemination products
- 7.4 promote dissemination
- 7.5 manage user support

Designer's view - Mapping the GSBPM on DWA

Sub-processes of the GSBPM allocated on the functional layers of the DWA.

[Figure: the sub-processes of GSBPM phases 4 and 5 placed on a grid whose columns are the DWA functional layers: Source Layer, Integration Layer, Interpretation and analysis Layer, Access Layer.]

Phase 4, Collect:
- 4.1 select sample
- 4.2 set up collection
- 4.3 run collection
- 4.4 finalize collection

Phase 5, Process:
- 5.1 integrate data
- 5.2 classify & code
- 5.3 review, validate & edit
- 5.4 impute
- 5.5 derive new variables and statistical units
- 5.6 calculate weights
- 5.7 calculate aggregate
- 5.8 finalize data files

Designer's view – Modeling the Data Domain

[Figure: graphic scheme of the layered architecture with a focus on “statistical data”.]

SDA – Modeling the Meta Data Domain

Our purpose is to refer to an IT infrastructure for the SDW, so we should consider only structured metadata, articulated as:

- Structural Metadata (SM): they are used for the description, identification and retrieval of statistical and quality information. Moreover, they can link the different components of the SDW.

- Process Metadata (PM): they are used to store information on data usage and on the maintenance of process administration, as well as the information needed for the automatic execution of work flows or management systems.

Both of them can be Active, when they enable operational use, manual or automated, by one or more processes, or Passive in all other uses.
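The two-way classification above (Structural or Process, Active or Passive) can be sketched as a small catalogue. This is a minimal illustration: the class names, the `used_by` field and the three catalogue entries are hypothetical, and "active" is approximated here as "used operationally by at least one process".

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Kind(Enum):
    STRUCTURAL = "SM"
    PROCESS = "PM"


@dataclass
class MetadataItem:
    name: str
    kind: Kind
    used_by: List[str]  # processes that make operational use of this item

    @property
    def active(self) -> bool:
        """Active when at least one process uses the item operationally."""
        return len(self.used_by) > 0


# Hypothetical catalogue entries, for illustration only.
catalogue = [
    MetadataItem("activity_classification", Kind.STRUCTURAL, ["classify_and_code"]),
    MetadataItem("variable_definitions", Kind.STRUCTURAL, []),
    MetadataItem("workflow_schedule", Kind.PROCESS, ["run_collection"]),
]
active_names = [m.name for m in catalogue if m.active]
```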

Designer's view - Modeling the Meta Data Domain

[Figure: graphic scheme of the layered architecture with a focus on “meta data”.]

Conclusion

We have contextualized statistical production in a Data Warehousing Architecture.

In doing so, we have introduced a general Enterprise Architecture vision for an SDW production system.

We have shown how the GSBPM representation can be used to model the business domain of the SDW layered architecture, giving a complete operational view for deploying statistical production cases.

Finally, we have shown the corresponding four-level data domain of the architecture for a Statistical Data Warehouse.