
in partnership with

Title: Overview of and recommendations on the use of metadata models.

WP: 1 - Metadata

Deliverable: 1.3

Version: 2.0 (final version)

Date: 2 September 2013

Authors: Jos Dressen, Michel Lindelauf, Harry Goossens

NSI: Statistics Netherlands (CBS)

ESS - NET


ESSnet on Data Warehousing / Harry Goossens, Jos Dressen, Michel Lindelauf (CBS)

ON MICRO DATA LINKING AND DATA WAREHOUSING IN PRODUCTION OF BUSINESS STATISTICS

INDEX

1. Introduction

2. The use of metadata models and standards

2.1 International models and standards

2.2 Relevance

2.3 Subset mapping

3. Conclusions and recommendations

4. Best practice cases

Appendix 1

Overview and recommendations on metadata models, version 1.0 – final / 2 September 2013


1. Introduction

In the statistical data warehouse (S-DWH), metadata satisfies two essential needs:

a. to guide statisticians in processing and controlling the statistical production

b. to inform end users by giving them insight into the exact meaning of statistical data

To meet these two essential functions, the statistical metadata must be:

- correct and reliable (the metadata must give a correct picture of the statistical data)

- consistent and coherent (the metadata driving the statistical processes and the reporting metadata presented to end users must be compatible with each other)

- standardised and coordinated (the data of different statistics are described and documented in the same standardised way).

Therefore, sound management of metadata is essential. To realise this, the use of a metadata model is a key element in structuring and standardising the statistical metadata within an NSI in a generic way.

In the Metadata Framework¹ (deliverable 1.1), the roles, purposes, definitions, etc. of metadata in the statistical data warehouse are defined in generic terms. The framework defines a metadata model as follows:

[Def 3.6.1] A metadata model is a special case of a data model: an abstract documentation of the structure of metadata used by business processes.

In the context of the S-DWH, a metadata model is a standardised representation used to define all necessary metadata elements of statistical information systems, based upon and using one or more standards/norms. In such implementations, standards act as checklists for controlling the completeness and correctness of all metadata elements as described by the model.
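To make the idea of a standard acting as a checklist more concrete, here is a minimal Python sketch. It assumes a hypothetical checklist of required attributes (`REQUIRED_ATTRIBUTES`) and a hypothetical `VariableMetadata` record; neither is taken from ISO 11179 or any other standard. The check only reports which required attributes are missing or empty for a given metadata element.

```python
from dataclasses import dataclass, field

# Hypothetical checklist derived from a standard: every described variable
# must carry these attributes. The names are illustrative only.
REQUIRED_ATTRIBUTES = {"name", "definition", "unit_type", "value_domain"}


@dataclass
class VariableMetadata:
    """One metadata element as described by a (hypothetical) metadata model."""
    identifier: str
    attributes: dict = field(default_factory=dict)


def missing_attributes(element: VariableMetadata, checklist: set) -> set:
    """Return the checklist attributes that are absent or empty for the element."""
    present = {key for key, value in element.attributes.items() if value not in (None, "")}
    return checklist - present


turnover = VariableMetadata(
    identifier="TURNOVER",
    attributes={
        "name": "Turnover",
        "definition": "Total invoiced sales during the reference period",
        "unit_type": "Enterprise",
        "value_domain": "",  # not yet documented, so the check flags it
    },
)

print(sorted(missing_attributes(turnover, REQUIRED_ATTRIBUTES)))  # ['value_domain']
```

In practice the checklist would be derived from whichever standard the NSI adopts; the point is only that completeness becomes something that can be checked mechanically.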

In this document we focus on the use of potential metadata models and standards that provide a framework for capturing, maintaining and understanding metadata when describing statistical data. Most of the models and standards described in chapter 2 were pre-selected by the project team, mainly through studies on the internet and hints from outside the project team. There are of course many more models and standards worldwide, but we first wanted to compare the pre-selected ones to get a first view of the opportunities, as well as of the similarity of models and standards for the S-DWH. The first conclusions and recommendations about the use of models and standards in relation to the S-DWH are contained in chapter 3. The final chapter includes the best practice cases of Estonia, the UK, Italy, Ireland, Sweden and the Netherlands. Appendix 1 contains a comprehensive and thorough overview of metamodels and standards.

¹ document.doc


2. The use of metadata models and standards

In 2011 this ESSnet sent out a questionnaire for the stocktaking of best practices in ESS member states, which also included some questions about metadata. One of the issues mentioned as most important to focus on when developing a DWH was metadata to drive the process. Remarkably, almost all NSIs state on the one hand that metadata are “important” or “extremely important” for DWH systems, while on the other hand 19 of the 24 NSIs admit that metadata are currently implemented in only a few systems. Further inquiries revealed that one reason for this apparent contradiction is that current metadata models are considered complex and cumbersome to deal with. Hence, one challenge for the ESSnet might be to provide recommendations on metadata models that are relatively easy to manage and that can drive DWH systems.

In the context of the S-DWH, at least two types of metadata models can be distinguished:

- a conceptual model, which usually gives a high-level overview of how the metadata is organised, managed, maintained etc.;

- a physical model, which describes the details of the metadata objects and attributes, including the relations between the metadata objects.

Put more simply, a conceptual metadata model is a description of the overall metadata process(es), whereas a physical model is a structured description of the metadata elements. Alongside the term (metadata) model, the term standard also needs to be considered, as the two are often used together or even interchangeably. The following general definition of a model is commonly accepted:

‘A model is a simplified description of an analogue part of the reality.’

The term norm is often used as a synonym for standard. The following general definition of standard/norm is commonly accepted:

‘A standard or norm is a document with recognized agreements, specifications or criteria about a product, service or method.’

Comparing the two terms, a standard/norm generally defines WHAT is to be done, whereas a model describes HOW to do it.

For example:

the standard/norm ISO 11179 is an international standard defining the representation of metadata in a metadata registry, without a physical representation, whereas the Nordic Metamodel provides a basis for organising and managing metadata, as it describes the metadata systems that are being used in NSIs.

From a metadata perspective, the ultimate goal is to use one single model for statistical metadata, covering the total life cycle of statistical production. But given the great variety of statistical production processes (e.g. surveys, micro data analysis or aggregated output), each with its own requirements for handling metadata, agreeing on one single model is very difficult and not very likely. The biggest risk is duplication of metadata, which should of course be avoided. This can best be achieved by using standards for describing and handling statistical metadata.


2.1 International models and standards

Chapter 1.3 of the Metadata Framework briefly describes the models and standards considered most relevant for an S-DWH. Part B of the Common Metadata Framework of the Metis group gives a more complete overview of concepts, standards and models: http://www1.unece.org/stat/platform/display/metis/The+Common+Metadata+Framework

The most important standards in relationship to the use of metadata models are:

ISO/IEC 11179-3²

ISO/IEC 11179 is a well-established international standard for representing metadata in a metadata registry. It has two main purposes: the definition and the exchange of concepts. It thus describes semantics and concepts, but does not handle the physical representation of the data. It aims to be a standard for metadata-driven exchange of data in heterogeneous environments, based on exact definitions of data. Of particular relevance is Part 3: Registry metamodel and basic attributes. The primary purpose of Part 3 is to specify the structure of a metadata registry and also the basic attributes required to describe metadata items, which may be used in situations where a complete metadata registry is not appropriate.
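The core registry idea of ISO/IEC 11179 can be sketched in a few lines: a data element concept pairs an object class with a property, and a data element pairs that concept with a value domain. The Python below is a heavily simplified illustration, not the standard's metamodel; the class names borrow 11179 terminology, while the attributes, the `MetadataRegistry` class and the example identifiers are our own assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """ISO/IEC 11179 idea: an object class combined with one of its properties."""
    object_class: str
    property_name: str


@dataclass(frozen=True)
class ValueDomain:
    """The set of permissible values, or a datatype when values are not enumerated."""
    name: str
    datatype: str
    permissible_values: tuple = ()


@dataclass(frozen=True)
class DataElement:
    """A data element couples a data element concept with a value domain."""
    identifier: str
    concept: DataElementConcept
    domain: ValueDomain


class MetadataRegistry:
    """Toy registry keyed by identifier. A real 11179 registry also records
    registration status, stewardship, versions and other administrative metadata."""

    def __init__(self) -> None:
        self._items: dict[str, DataElement] = {}

    def register(self, element: DataElement) -> None:
        if element.identifier in self._items:
            raise ValueError(f"{element.identifier} is already registered")
        self._items[element.identifier] = element

    def elements_for(self, object_class: str) -> list[str]:
        """Identifiers of all data elements describing the given object class."""
        return sorted(identifier for identifier, element in self._items.items()
                      if element.concept.object_class == object_class)


registry = MetadataRegistry()
registry.register(DataElement(
    "DE001",
    DataElementConcept("Enterprise", "Size class by number of employees"),
    ValueDomain("Size class", "string", ("0-9", "10-49", "50+")),
))
registry.register(DataElement(
    "DE002",
    DataElementConcept("Enterprise", "Turnover"),
    ValueDomain("Amount in EUR", "decimal"),
))
print(registry.elements_for("Enterprise"))  # ['DE001', 'DE002']
```

Keeping concept and value domain separate is what lets two data elements share one definition while differing in representation, which is the reuse that 11179 is after.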

Neuchâtel Model - Classifications and Variables

The main purpose of this model is to provide a common language and a common perception of the structure of classifications and the links between them. The original model was extended with variables and related concepts. The discussion includes concepts like object types, statistical unit types, statistical characteristics, value domains, populations etc. The two models together claim to provide a more comprehensive description of the structure of statistical information embodied in data items.

Intended use: For setting up metadata models and frameworks inside statistical offices several models are used as a source or starting point. The Neuchâtel model is one of those models.

References - Classifications: http://www1.unece.org/stat/platform/download/attachments/14319930/Part+I+Neuchatel_version+2_1.pdf?version=1

References - Variables: http://www1.unece.org/stat/platform/download/attachments/14319930/Neuchatel+Model+V1.pdf?version=1
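To illustrate the kind of structure the Neuchâtel model describes (classification versions made up of items arranged in levels, and correspondence tables linking items of different versions), here is a small Python sketch. The class names borrow Neuchâtel terminology, but the structure is a simplified assumption. The NACE Rev. 2 codes are used only as familiar examples, and the old codes `X1` and `X2` in the correspondence table are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassificationItem:
    code: str
    title: str
    level: int                    # 1 = most aggregated level
    parent: Optional[str] = None  # code of the item one level up, if any


class ClassificationVersion:
    """A classification version as a set of items arranged in levels."""

    def __init__(self, name: str, items: list[ClassificationItem]) -> None:
        self.name = name
        self.items = {item.code: item for item in items}

    def path(self, code: str) -> list[str]:
        """Codes from the top level down to the given item."""
        chain = []
        current: Optional[ClassificationItem] = self.items[code]
        while current is not None:
            chain.append(current.code)
            current = self.items[current.parent] if current.parent else None
        return list(reversed(chain))


def translate(codes: list[str], correspondence: dict[str, set[str]]) -> set[str]:
    """Map codes through a correspondence table; links may be many-to-many."""
    result: set[str] = set()
    for code in codes:
        result |= correspondence.get(code, set())
    return result


nace_fragment = ClassificationVersion("NACE Rev. 2 (fragment)", [
    ClassificationItem("C", "Manufacturing", 1),
    ClassificationItem("10", "Manufacture of food products", 2, "C"),
    ClassificationItem("10.1", "Processing and preserving of meat and production of meat products", 3, "10"),
    ClassificationItem("10.2", "Processing and preserving of fish, crustaceans and molluscs", 3, "10"),
])
# Hypothetical correspondence from an older, made-up version to the fragment above.
old_to_new = {"X1": {"10.1"}, "X2": {"10.1", "10.2"}}

print(nace_fragment.path("10.1"))                   # ['C', '10', '10.1']
print(sorted(translate(["X1", "X2"], old_to_new)))  # ['10.1', '10.2']
```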

Corporate Metadata Repository Model (CMR)

This statistical metadata model integrates a developmental version of edition 2 of ISO/IEC 11179 and a business data model derivable from the Generic Statistical Business Process Model. It includes the constructs necessary for a registry. Forms of this model are in use at the US Census Bureau and at Statistics Canada.

Intended use: The model is a framework for managing all the statistical metadata of a statistical office. It accounts for survey, census, administrative, and derived data; and it accounts for the entire survey life-cycle.

References: http://www.unece.org/stats/documents/1998/02/metis/11.e.pdf

² Homepage for ISO/IEC 11179 Information Technology – Metadata registries: http://metadata-stds.org/11179/#A3


For an overview paper on the subject, see Gillman, D. W., "Corporate Metadata Repository (CMR) Model", invited paper, University of Edinburgh, Proceedings of the First MetaNet Conference, Voorburg, Netherlands, 2001.

Relationships to other standards: ISO/IEC 11179 and Generic Statistical Business Process Model.

Nordic Metamodel, version 2.2

The Nordic Metamodel was developed by Statistics Sweden and has become increasingly linked with their popular "PC-Axis" suite of dissemination software. It provides a basis for organising and managing metadata for data cubes in a relational database environment.

Intended Use: The Nordic Metamodel is used to describe the metadata system behind several implementations of PC-Axis in national and international statistical organizations, particularly those using MS SQL Server as a platform.

Maintenance organization: Statistics Sweden (with input from the PC-Axis Reference Group)

References: PC AXIS SQL metadata base
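As a rough illustration of what "metadata for data cubes in a relational database environment" means, the sketch below uses Python's built-in `sqlite3` module to store which variables span a cube and which values each variable takes. Because the cube's dimensions are recorded this way, they can be described from the metadata alone. The schema is hypothetical and heavily simplified; it does not reproduce the actual tables of the Nordic Metamodel or the PC-Axis SQL metadata base.

```python
import sqlite3

# Hypothetical and heavily simplified schema; the real Nordic Metamodel / PC-Axis
# SQL metadata base has its own, much richer table structure.
SCHEMA = """
CREATE TABLE main_table (table_id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE variable (variable_id TEXT PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE var_value (
    variable_id TEXT REFERENCES variable (variable_id),
    code TEXT,
    label TEXT,
    PRIMARY KEY (variable_id, code)
);
-- Which variables span which cube, and in which presentation order.
CREATE TABLE table_variable (
    table_id TEXT REFERENCES main_table (table_id),
    variable_id TEXT REFERENCES variable (variable_id),
    position INTEGER NOT NULL,
    PRIMARY KEY (table_id, variable_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO main_table VALUES (?, ?)", ("ENT01", "Enterprises by region and year"))
conn.executemany("INSERT INTO variable VALUES (?, ?)", [("REGION", "Region"), ("YEAR", "Year")])
conn.executemany("INSERT INTO var_value VALUES (?, ?, ?)", [
    ("REGION", "N", "North"), ("REGION", "S", "South"),
    ("YEAR", "2012", "2012"), ("YEAR", "2013", "2013"),
])
conn.executemany("INSERT INTO table_variable VALUES (?, ?, ?)",
                 [("ENT01", "REGION", 1), ("ENT01", "YEAR", 2)])

# Describe the cube's dimensions from metadata alone: label and number of values.
rows = conn.execute("""
    SELECT v.label, COUNT(vv.code)
    FROM table_variable AS tv
    JOIN variable AS v ON v.variable_id = tv.variable_id
    JOIN var_value AS vv ON vv.variable_id = v.variable_id
    WHERE tv.table_id = ?
    GROUP BY tv.position, v.label
    ORDER BY tv.position
""", ("ENT01",)).fetchall()
print(rows)  # [('Region', 2), ('Year', 2)]
```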

Common Warehouse Metamodel (CWM)

A specification of the metadata needed to support the exchange of data between tools.

Intended use: As a means of recording the metadata needed to achieve data exchange between tools.

Maintenance organization: OMG - Object Management Group

ISO Standard Number: ISO/IEC 19504

References: See OMG web site (http://www.omg.org), and specifically http://www.omg.org/technology/documents/formal/cwm_mip.htm

SDMX

Statistical Data and Metadata eXchange (SDMX) was initiated by seven international organisations to foster standards for the exchange of statistical information. SDMX focuses on macro data, although the model also supports micro data. It is an adopted standard for delivering and sharing data between NSIs and Eurostat. Sharing the results of the latest Population Census is perhaps the most advanced example so far. Recently, SDMX has increasingly evolved into a framework with several sub-frameworks for specific uses:

- ESMS (Euro SDMX Metadata Structure)
- SDMX-IM (SDMX Information Model)
- ESQRS (ESS Standard for Quality Reports Structure)
- MCV (Metadata Common Vocabulary)
- MSD (Metadata Structure Definition)

References: See SDMX web site (http://sdmx.org), and specifically http://sdmx.org/?page_id=10 for standards
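In SDMX, a data structure definition (DSD) fixes the dimensions of a dataset and their order. A series is identified by its dimension values in that order, written dot-separated in SDMX RESTful queries. The Python sketch below builds such a key. The dimension list `DSD_DIMENSIONS` and the observation values are hypothetical; real DSDs are published by the maintaining agency.

```python
# Hypothetical dimension list for a business statistics dataflow; real DSDs
# are published by the maintaining agency (e.g. Eurostat) in SDMX-ML.
DSD_DIMENSIONS = ["FREQ", "GEO", "NACE_R2", "INDICATOR"]


def series_key(observation: dict, dimensions: list[str]) -> str:
    """Build a dot-separated series key with dimension values in DSD order."""
    missing = [d for d in dimensions if d not in observation]
    if missing:
        raise KeyError(f"missing dimension values: {missing}")
    return ".".join(str(observation[d]) for d in dimensions)


obs = {"GEO": "NL", "FREQ": "A", "INDICATOR": "TURNOVER", "NACE_R2": "C"}
print(series_key(obs, DSD_DIMENSIONS))  # A.NL.C.TURNOVER
```

The dictionary order of `obs` does not matter. Only the DSD determines the key, which is exactly why sender and receiver must share the same structural metadata.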

DDI

The Data Documentation Initiative (DDI) has its roots in the data archive environment, but with its latest development, DDI 3 or DDI Lifecycle, it has become an increasingly interesting option for NSIs. DDI is an effort to create an international standard for describing data from the social, behavioural, and economic sciences. It is based on XML. DDI is supported by a non-profit international organisation, the DDI Alliance.

References: http://www.ddialliance.org
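Since DDI is XML-based, a variable description is ultimately an XML fragment. The Python sketch below uses the standard `xml.etree.ElementTree` module to produce such a fragment. The element names (`var`, `labl`, `catgry`, `catValu`) are borrowed from DDI Codebook, but the output has no namespace and no enclosing document, so it is not schema-valid DDI. The variable itself is a made-up example.

```python
import xml.etree.ElementTree as ET


def describe_variable(name: str, label: str, categories: dict[str, str]) -> ET.Element:
    """Build a DDI-Codebook-like <var> element with a label and coded categories."""
    var = ET.Element("var", {"name": name})
    ET.SubElement(var, "labl").text = label
    for code, category_label in categories.items():
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = code
        ET.SubElement(catgry, "labl").text = category_label
    return var


element = describe_variable(
    "LEGAL_FORM",
    "Legal form of the enterprise",
    {"1": "Sole proprietorship", "2": "Private limited company"},
)
print(ET.tostring(element, encoding="unicode"))
```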

GSIM

The Generic Statistical Information Model (GSIM) is a reference framework of information objects, which enables generic descriptions of data and metadata definition, management, and use throughout the statistical production process. As a common reference framework for information objects, the GSIM will facilitate the modernisation of statistical production by improving communication at different levels:

- Between the different roles in statistical production (statisticians, methodologists and information technology experts);

- Between the statistical subject matter domains;

- Between statistical organisations at the national and international levels.

The GSIM is designed to be complementary to other international standards, particularly the Generic Statistical Business Process Model (GSBPM). It should not be seen in isolation, and should be used in combination with other standards.

References: website http://www1.unece.org/stat/platform/display/metis/Generic+Statistical+Information+Model+(GSIM); GSIM Version 0.3: http://www1.unece.org/stat/platform/download/attachments/65373325/GSIM+v0_3.doc?version=1

MMX metadata framework

The MMX metadata framework is not an international standard; it is a specific adaptation of several standards by a commercial company. The MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the metadata framework is more abstract in nature than metadata models in general. The MMX framework is used by Statistics Estonia, so it is worth considering from the point of view of practical experience.

Appendix 1 gives a more comprehensive and thorough overview of models and standards.


2.2 Relevance

As not all the models/standards we listed are relevant in the context of the S-DWH, we selected those that are relevant and need to be studied in more depth. For this we used the following four selection criteria:

- Topicality: What is the date of the last change/last reference on the internet? Are there (still) new developments of the model/standard?

- Support: Is there an organisation in charge of maintaining the standard/model?

- Usage: How extensive is the usage of the model? Are there many or few users?

- Usability: Is the model/standard difficult or easy to use? Do we think it is usable in the S-DWH?

We made a first selection of relevance by scoring each model on the four criteria:

Model / standard  | Topicality | Support | Usage | Usability | Advice
ISO/IEC 11179-3   | +          | +/-     | +/-   | +/-       | relevant
Neuchâtel Model   | +/-        | +/-     | +     | +/-       | relevant
CMR               | -          | -       | +/-   | -         | not relevant
Nordic Metamodel  | +          | +       | +     | +         | relevant
CWM               | -          | -       | +/-   | -         | not relevant
SDMX              | +          | +       | +     | ?         | relevant
 * SDMX-IM        | +/-        | +/-     | +/-   | +/-       | relevant
 * EPMS           | ?          | ?       | ?     | ?         | unclear
 * ESMS           | +          | ?       | ?     | ?         | relevant
 * ESQRS          | +          | ?       | ?     | ?         | relevant
 * MCV            | ?          | ?       | -     | ?         | not relevant
 * MSD            | ?          | ?       | ?     | ?         | unclear
DDI               | +          | +/-     | +/-   | ?         | relevant
GSBPM             | +          | +/-     | +/-   | ?         | relevant
GSIM              | +          | +/-     | +/-   | ?         | relevant
MMX               | +/-        | +/-     | +/-   | +/-       | relevant

Legend: + good, +/- moderate, - not good, ? more research needed

Figure 1: preselection of models and standards


In this first selection we applied the following rules:

- A model/standard is rated relevant if it has at least one ‘+’.
- A model/standard is rated not relevant if it has at least one ‘-’.
- A model/standard is rated unclear if it has mainly ‘?’.
- If a model/standard has mainly ‘+/-’, we considered the overall context:
  - SDMX-IM is rated relevant as it is a subset of SDMX;
  - MMX is rated relevant as it is a key element in the new S-DWH of Statistics Estonia, and we want to consider it in the general discussion.
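The selection rules can be applied mechanically to the Figure 1 scores. The Python sketch below does this. Two readings are our own assumptions because the text does not spell them out: the rules are checked in the order listed (so a model with both ‘+’ and ‘-’ would count as relevant, a case that does not occur in Figure 1), and "mainly" means more than half of the four scores. Models falling through to the last rule are the ones that need a judgement on the overall context.

```python
from collections import Counter

# Figure 1 scores in the order topicality, support, usage, usability.
FIGURE_1 = {
    "ISO/IEC 11179-3": ["+", "+/-", "+/-", "+/-"],
    "Neuchâtel Model": ["+/-", "+/-", "+", "+/-"],
    "CMR": ["-", "-", "+/-", "-"],
    "Nordic Metamodel": ["+", "+", "+", "+"],
    "CWM": ["-", "-", "+/-", "-"],
    "SDMX": ["+", "+", "+", "?"],
    "SDMX-IM": ["+/-", "+/-", "+/-", "+/-"],
    "EPMS": ["?", "?", "?", "?"],
    "ESMS": ["+", "?", "?", "?"],
    "ESQRS": ["+", "?", "?", "?"],
    "MCV": ["?", "?", "-", "?"],
    "MSD": ["?", "?", "?", "?"],
    "DDI": ["+", "+/-", "+/-", "?"],
    "GSBPM": ["+", "+/-", "+/-", "?"],
    "GSIM": ["+", "+/-", "+/-", "?"],
    "MMX": ["+/-", "+/-", "+/-", "+/-"],
}


def rate(scores: list[str]) -> str:
    """Apply the first-selection rules in the order they are listed in the text."""
    counts = Counter(scores)
    if counts["+"] >= 1:
        return "relevant"
    if counts["-"] >= 1:
        return "not relevant"
    if counts["?"] > len(scores) / 2:  # 'mainly ?' read as: more than half
        return "unclear"
    return "decide in context"  # mainly '+/-': judged on the overall context


needs_context = [name for name, scores in FIGURE_1.items() if rate(scores) == "decide in context"]
print(needs_context)  # ['SDMX-IM', 'MMX']
```

Once the two context cases are resolved as relevant, as argued in the text, the computed ratings agree with the Advice column of Figure 1.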

2.3 Subset mapping

For the (possible) use in the S-DWH, the relevant models/standards must first be mapped onto the metadata subsets from the framework. The goal is to indicate, for each subset, which model/standard should be considered and is useful. The GSBPM is not included in this mapping, as WP 3 has already mapped the GSBPM onto the S-DWH (deliverable 3.1).

Model / standard  | Statistical | Process | Quality | Technical | Authorisation
ISO/IEC 11179-3   | no          | yes     | no      | no        | ?
Neuchâtel Model   | yes         | no      | no      | no        | ?
Nordic Metamodel  | yes         | no      | no      | no        | ?
SDMX              | yes         | ?       | no      | yes       | ?
 * SDMX-IM        | no          | no      | no      | yes       | ?
 * ESMS           | yes         | yes     | ?       | no        | ?
 * ESQRS          | no          | no      | yes     | no        | ?
DDI               | yes         | no      | no      | yes       | ?
GSIM              | yes         | yes     | no      | ?         | ?
MMX               | yes         | yes     | no      | yes       | yes

Figure 2: mapping potential models and standards on the metadata subsets
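Reading Figure 2 column by column gives, for each metadata subset, the candidate models and standards. The short Python sketch below encodes the figure as a dictionary and lists the "yes" entries per subset. It only restates the figure; it adds no judgement of its own. The counts it prints make the imbalance discussed in the conclusions visible: seven candidates for the statistical subset against a single one each for quality and authorisation.

```python
SUBSETS = ["Statistical", "Process", "Quality", "Technical", "Authorisation"]

# Figure 2, one row per model/standard: "yes", "no" or "?" per subset.
FIGURE_2 = {
    "ISO/IEC 11179-3":  ["no",  "yes", "no",  "no",  "?"],
    "Neuchâtel Model":  ["yes", "no",  "no",  "no",  "?"],
    "Nordic Metamodel": ["yes", "no",  "no",  "no",  "?"],
    "SDMX":             ["yes", "?",   "no",  "yes", "?"],
    "SDMX-IM":          ["no",  "no",  "no",  "yes", "?"],
    "ESMS":             ["yes", "yes", "?",   "no",  "?"],
    "ESQRS":            ["no",  "no",  "yes", "no",  "?"],
    "DDI":              ["yes", "no",  "no",  "yes", "?"],
    "GSIM":             ["yes", "yes", "no",  "?",   "?"],
    "MMX":              ["yes", "yes", "no",  "yes", "yes"],
}


def candidates(subset: str) -> list[str]:
    """Models/standards marked 'yes' for the given metadata subset."""
    column = SUBSETS.index(subset)
    return [model for model, marks in FIGURE_2.items() if marks[column] == "yes"]


for subset in SUBSETS:
    print(f"{subset}: {len(candidates(subset))} -> {candidates(subset)}")
```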


3. Conclusions and recommendations

Based on the results of the subset mapping, we conclude that most of the models and standards apply to statistical metadata and a few also apply to process metadata. The technical, quality and authorisation subsets have fewer matches. For the S-DWH this is good news, because its main focus is on describing the exact meaning of files and the variables they contain, and therefore mainly on statistical metadata. However, the subset mapping also shows that no single model fits 100% for describing statistical metadata.

In the opinion of the project team, the two models below are the most suitable starting points for describing the statistical subset within an S-DWH:

- Nordic Metamodel

- Neuchâtel model³

Models are less suitable, or not suitable at all, for describing process metadata; a standard generally works better for this subset. Here too, however, no single standard fits 100% for describing process metadata. In the opinion of the project team, ISO 11179 can provide some guidelines for describing this subset in a uniform and standardised way. The final choice of a model or standard depends on the needs and on the financial and technical possibilities within the NSI. To establish a uniform policy and governance within the NSI, the project team recommends the following guidelines:

1. Do not strive for 100% perfection but keep everything as simple as you can;

2. Determine the subset(s) of metadata to describe and for what purpose;

3. Select per subset a model or standard that covers most of the needs determined in step 2;

4. Use this model or standard as a starting point to define your final solution. It is very important that the selected model or standard applies to most of the attributes in the subset to be described. However, use only a single model or standard for each subset to be described within the S-DWH;

5. Only make adjustments to a model or standard when it is really necessary.

6. When adjustments to the starting model or standard are necessary, you must document these adjustments per subset;

7. Publish the final model or standard and make sure that the whole NSI knows about it and will use it the same way;

8. Make sure that there is a change management board where users can report errors and shortcomings, and let the board decide whether and how the model or standard should be adjusted;

9. Always document the adjustments approved by the board, and make sure all users are aware of them in time and act in accordance with them.
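Steps 2 to 6 of the guideline above can be read as a small selection procedure: determine the required attributes of a subset, pick the single model that covers most of them, and record every attribute that has to be added as a documented adjustment. The Python sketch below illustrates that reading. The attribute names and the candidates "Model A" and "Model B" are hypothetical, and breaking ties alphabetically is our own assumption, as the guideline does not address ties.

```python
def select_model(required: set[str], models: dict[str, set[str]]) -> tuple[str, set[str]]:
    """Steps 3 and 4: pick ONE model per subset, the one covering most required attributes.

    Returns the chosen model and the required attributes it does not cover,
    i.e. the adjustments that step 6 says must be documented.
    Ties are broken alphabetically (our assumption; the guideline is silent).
    """
    best = max(sorted(models), key=lambda name: len(required & models[name]))
    return best, required - models[best]


# Hypothetical needs for the statistical subset (step 2) and hypothetical candidates.
required = {"definition", "unit_type", "value_domain", "classification", "reference_period"}
candidate_models = {
    "Model A": {"definition", "value_domain", "classification"},
    "Model B": {"definition", "unit_type", "value_domain", "classification"},
}

model, gaps = select_model(required, candidate_models)

# Steps 6 and 9: record every adjustment per subset so it can be published.
adjustment_log = [
    {"subset": "Statistical", "model": model, "added_attribute": attribute}
    for attribute in sorted(gaps)
]
print(model, adjustment_log)
```

Keeping the log as data rather than free text makes it straightforward for the change management board (step 8) to review and publish adjustments per subset.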

³ This model is no longer maintained, but it can still provide the necessary information on how to describe statistical metadata.


4. Best practice cases

Based on the information from the stocktaking, uniform best practice descriptions have been made for specific NSIs, with special focus on the use, role and function of metadata in the S-DWH. These case studies provide good insight into the NSIs with the most developed metadata systems. Further research will be performed to obtain more specific information on the (possible) use of a metadata model. In this chapter we give an overview of the best practice (BP) cases, focusing on the use of metadata models. Some complete BP case descriptions are added in the Annex.

NSI Statistics Estonia

Metadata Model(s) : YES

- MMX MOF 3 metadata model for the central metadata repository (iMeta)
- Neuchâtel model for statistical activity codes/descriptions and variable classifiers (in code lists), for statistical reference metadata and statistical structural metadata
- XDTL metamodel for process metadata
- Relational Database Metamodel for technical metadata

Metadata System : YES

iMeta is the central metadata repository, based on the MMX MOF 3 metadata model, which makes it possible to manage several different metamodels. Statistics Estonia manages both reference metadata and structural metadata (including process metadata, technical metadata, user roles and privileges, etc.).

The S-DWH (a conformed collection of datasets) consists of data processed and prepared for analysis. In the data warehouse, variables (columns) are linked with variable descriptions in iMeta. Data sets in the data warehouse are versioned. Data sets are mutually linked through common dimensions, and facts are unique across data sets (to avoid duplicating data in different data sets).
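The two rules in the Estonian description can be expressed as simple consistency checks: every column must be linked to a variable description, and each fact must live in only one data set. Data sets may nevertheless share conformed dimensions. The Python sketch below shows such checks. `Dataset`, `check_warehouse` and the data set names are hypothetical, and the set `described` merely stands in for the variable descriptions held in a repository such as iMeta; it is not iMeta's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    version: int
    dimensions: frozenset  # conformed dimension columns shared with other data sets
    facts: frozenset       # measure columns; each fact should live in one data set only


def check_warehouse(datasets: list[Dataset], described_variables: set[str]) -> list[str]:
    """Report columns without a variable description and facts stored twice."""
    problems = []
    fact_owner: dict[str, str] = {}
    for ds in datasets:
        for column in sorted(ds.dimensions | ds.facts):
            if column not in described_variables:
                problems.append(f"{ds.name} v{ds.version}: column {column} has no variable description")
        for fact in sorted(ds.facts):
            owner = fact_owner.setdefault(fact, ds.name)  # versions of one data set share a name
            if owner != ds.name:
                problems.append(f"fact {fact} occurs in both {owner} and {ds.name}")
    return problems


warehouse = [
    Dataset("DS_A", 3, frozenset({"YEAR", "NACE"}), frozenset({"TURNOVER"})),
    Dataset("DS_B", 1, frozenset({"YEAR", "NACE"}), frozenset({"TURNOVER", "EMPLOYEES"})),
]
described = {"YEAR", "NACE", "TURNOVER"}

for problem in check_warehouse(warehouse, described):
    print(problem)
```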

NSI Office for National Statistics (ONS) - UK

Metadata Model(s) : NO

Within ONS there is no standardised/centralised metadata model in use. A metadata model based on the Neuchâtel model was developed several years ago, but it was not put into operation.

At the moment a prototype S-DWH (covering multi-mode data collection and subsequent statistical phases) is being developed, and the metadata in it is closely integrated with the data objects. There is a need to standardise the metadata and to use a metadata model.

Metadata System : NO

ONS has no central metadata system/repository, and there is no standard approach to or coordination of metadata management yet. There is a Standards and Guidance repository (a Lotus Notes database) which is used to document metadata about the statistical processes used across the different statistical domains, but it does not follow a specific structure or template.

There are various specific systems / repositories which have been developed for separate statistical processes and development projects independently. Currently, there is no coordinated approach.


NSI Statistics Netherlands (CBS)

Metadata Model(s) : YES

CBS (methodology) has developed a dedicated CBS metadata model specifically for describing the so-called steady states, inspired by, among others, the Swedish model and the Neuchâtel model.

It is a generic OBJECT model, dedicated to describing statistical datasets in a conceptual, non-technical and uniform way. It treats micro data and macro data differently. It focuses on the description of structured reference (business) metadata.

Technical metadata are kept in separate, standard XML files and are not part of the metadata model. Process and quality metadata are part of the metadata model but are not yet standardised; they are recorded as free-format text or in separate documentation.

Metadata System : YES

The Data Service Centre (DSC) is the implementation of the CBS metadata model. The DSC concept is passive and (meta)data oriented (the steady states concept). The DSC is the central ‘data vault’ and metadata repository, linking:

▪ conceptual, descriptive metadata
▪ technical metadata, in separate, standard XML files
▪ statistical data, as standardised flat ASCII files (‘steady states’)
▪ all other documentation (Word, PDF, Excel etc.)

Basic concepts:

▪ Storage of DATA (steady states) after each processing step, WITH METADATA (no data without metadata!)
▪ Strict distinction between the statistical data that are actually processed and the metadata that describe the definitions, the quality and the process activities
▪ Steady states are explicitly designed for re-use of statistical data.
▪ The metadata are generally accessible and are standardised as much as possible.
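The steady-state principles above (store data after each processing step, never without metadata, and keep data and metadata strictly apart) can be sketched as a small store that refuses data without metadata and keeps every earlier state for re-use. The Python below is a hypothetical illustration, not the DSC implementation. `DataServiceCentreSketch` and `SteadyState` are invented names, and the checksum is our own addition to tie the metadata to one exact version of the data. As in the DSC description, the data are assumed to be flat ASCII.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SteadyState:
    step: str
    data: str        # flat ASCII content of the data file
    metadata: dict   # describing (conceptual) metadata, kept apart from the data
    checksum: str    # ties the metadata to exactly this version of the data


class DataServiceCentreSketch:
    """Hypothetical store: every processing step leaves a steady state behind."""

    def __init__(self) -> None:
        self._states: list[SteadyState] = []

    def store(self, step: str, data: str, metadata: dict) -> SteadyState:
        if not metadata:
            raise ValueError("no data without metadata")
        checksum = hashlib.sha256(data.encode("ascii")).hexdigest()
        state = SteadyState(step, data, dict(metadata), checksum)
        self._states.append(state)  # earlier steady states stay available for re-use
        return state

    def history(self) -> list[str]:
        return [state.step for state in self._states]


dsc = DataServiceCentreSketch()
dsc.store("raw", "1001;250\n1002;75\n",
          {"variables": ["ENT_ID", "TURNOVER"], "description": "Survey responses as received"})
dsc.store("edited", "1001;250\n1002;80\n",
          {"variables": ["ENT_ID", "TURNOVER"], "description": "Responses after editing"})
print(dsc.history())  # ['raw', 'edited']

try:
    dsc.store("aggregated", "330\n", {})
except ValueError as error:
    print(error)  # no data without metadata
```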


NSI Italian National institute for Statistics (Istat) - IT

Metadata Model(s) : NO

Within Istat there is no standardised/centralised metadata model in use. A conceptual layer model, called Osi (Objects-Information Frames), has been implemented for survey/source metadata, to specify terminologies and information frames. The role of metadata is mainly descriptive.

Metadata System : NO

Istat has no central metadata system/repository. There is no standard approach and coordination for the metadata management yet.

A centralised quality information system for metadata management, called Sidi, manages metadata concerning the production processes of surveys. Sidi (Surveys Information System) has been designed as a tool for monitoring the quality of Istat surveys from both a quantitative and a qualitative viewpoint. It therefore also allows standard quality indicators for surveys to be calculated and disseminated.

A centralised data management system, called Armida (ARchivio MIcro DAti – Micro data source), manages not only data but also metadata concerning surveys. Armida has been developed as a tool allowing end users of survey data to access them at micro level instead of macro level. Access is assisted with regard to confidentiality.

NSI Central Statistical Office (CSO) - Ireland

Metadata Model(s) : NO

Metadata System : NO

The CSO currently provides a significant amount of metadata with the data it disseminates. However, as no formal metadata standards or management systems have been adopted, the CSO runs the risk of disseminating poor quality metadata.

The role of metadata: mainly descriptive. So far metadata capturing is still minimal. Both S-DWH implementations (ADC and DMS) have their own metadata layer. In the DMS, metadata must be entered as datasets/tables are created. No specific model is used. Some research into DDI is now being done, as DDI appears to be being revitalised.

In 2009 the CSO defined a Metadata Strategy, outlining a vision for metadata: the standardisation of metadata management across the CSO organisation.

The vision takes into account the needs of both the compilers and users of CSO statistics. The vision sees providers of metadata taking responsibility for its maintenance and updating, to specified standards, using simple web based applications. Users of metadata will have easy access to all available and relevant metadata and that metadata will be maintained to a defined standard.

So far the strategy has not been implemented due to a lack of resources/capacity. Implementation is planned to start in 2013.

The table below gives a short overview of the current availability of metadata at the CSO.


Metadata | Is this metadata available now? | Related metadata standard/model | Is CSO metadata meeting this standard?
1. A catalogue of releases & publications | No, not in a systematic and easily accessible format | Dublin Core | CSO does not have a central catalogue of releases and publications with Dublin Core metadata. However, all releases and publications are available on the CSO website.
2. Databank | Yes | Nordic model | Yes, in place as part of CSO output databases
3. Variable definitions | No, not in a systematic and easily accessible format | Neuchâtel, ISO 11179 | No, no capturing of variable definitions to standard
4. Classifications | Yes | CARS | Yes
5. Quality reports | Work in progress | Office standard, SDDS/DQAF | Yes, work in progress; however, we will need to investigate how to link the Office standard to the SDDS/DQAF IMF requirements
6. Questionnaires | Yes | Dublin Core | No
7. Survey methodology & system processing notes | Ad-hoc availability & standard across the Office | To be decided | No


NSI Statistics Sweden (SCB)

Metadata Model(s) : YES

The Swedish Statistics Act prescribes that all official statistics must be documented in a Description of the Statistics (DoS). This is a document in two parts: general information (contents, responsible authority, purpose, use) and a quality declaration (accuracy, timeliness, coherence, availability). The current version of the model has been in use since 1999.

SCBDOK, a metadata model for documenting a survey round, its result (the final observation register) and the production methods used (frame and sampling procedures, data collection, estimation, data processing system), was developed at SCB and has been in use for about 20 years.

The documentation of objects, populations, variables and value domains (including classifications) is handled by a sub-model of SCBDOK called MetaPlus. This model was developed at SCB in 2004-06 to replace an older model. It is based on ISO 11179, with some additions to support local needs, and includes a complete implementation of the Neuchâtel model for classifications.

The output databases are supported by a metadata model, MacroMeta, which was developed in-house in 1994-96. The current version is also known as the Nordic Metamodel.

Metadata System : YES

SCB does not have one single coherent metadata system, but several loosely connected ones.

DoS and SCBDOK have been implemented as documentation templates based on Microsoft Word with user instructions. Both documents are published on SCB’s web site.

The implementation of MetaPlus forms SCB’s repository for micro metadata. Currently, it is primarily a documentation system where users can search and reuse definitions and descriptions, but its design allows for it to be the basis of a metadata layer in a future data warehouse.

The implementation of MacroMeta (the Nordic Metamodel) is an integral part of SCB's Internet-based dissemination of aggregated data or statistics (Sweden's Statistical Databases, SSD).

Appendix 1: Overview of models and standards for statistical metadata

Fields per entry: id, short name, full name, short description, type, version, introduced, valid, maintenance.

m01 ISO 11179 (ISO/IEC 11179-3)

ISO/IEC 11179 is a well-established international standard for representing metadata in a metadata registry. It has two main purposes: the definition and the exchange of concepts. It therefore describes semantics and concepts, but does not handle the physical representation of the data. It aims to be a standard for metadata-driven exchange of data in heterogeneous environments, based on exact definitions of data. Several NSIs have based their current metadata systems on this standard. Most of these systems were developed in-house, but at least one commercial product claims to support the standard (OneMeta MDR). Of particular relevance is Part 3: Registry metamodel and basic attributes. The primary purpose of Part 3 is to specify the structure of a metadata registry and the basic attributes required to describe metadata items; these attributes may also be used in situations where a complete metadata registry is not appropriate. http://metadata-stds.org11179/#A3

Type: standard | Version: 3 | Introduced: ? | Valid: yes | Maintenance: ?
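ISO/IEC 11179 separates the meaning of a data element from its representation. The following is a minimal Python sketch of that split, assuming the usual 11179 decomposition (data element = data element concept + value domain; data element concept = object class + property). The class names follow those terms, but the `Registry` class, the identifiers `DE-001`/`DE-002` and the example values are hypothetical; the standard defines many more attributes and administrative details, which are left out here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectClass:
    """A set of things of interest, e.g. 'Enterprise'."""
    name: str


@dataclass(frozen=True)
class Property:
    """A characteristic shared by all members of an object class."""
    name: str


@dataclass(frozen=True)
class DataElementConcept:
    """Object class + property: the meaning, independent of representation."""
    object_class: ObjectClass
    prop: Property


@dataclass(frozen=True)
class ValueDomain:
    """Permissible values; an empty `codes` tuple means non-enumerated."""
    name: str
    datatype: type
    codes: tuple = ()

    def accepts(self, value) -> bool:
        if not isinstance(value, self.datatype):
            return False
        return not self.codes or value in self.codes


@dataclass(frozen=True)
class DataElement:
    """Concept + value domain: the unit that is registered and exchanged."""
    concept: DataElementConcept
    domain: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.concept.object_class.name} {self.concept.prop.name}"


@dataclass
class Registry:
    """Hypothetical minimal registry: identifier -> data element."""
    items: dict = field(default_factory=dict)

    def register(self, identifier: str, element: DataElement) -> None:
        if identifier in self.items:
            raise KeyError(f"{identifier} is already registered")
        self.items[identifier] = element

    def lookup(self, identifier: str) -> DataElement:
        return self.items[identifier]


# Two data elements share one object class but differ in property and domain.
enterprise = ObjectClass("Enterprise")
employees = DataElement(
    DataElementConcept(enterprise, Property("number of employees")),
    ValueDomain("integer count", int),
)
size_class = DataElement(
    DataElementConcept(enterprise, Property("size class")),
    ValueDomain("size class codes", str, ("S", "M", "L")),
)
registry = Registry()
registry.register("DE-001", employees)
registry.register("DE-002", size_class)
```

Because concept and representation are separate objects, the same object class can be reused across many data elements, which is what makes a registry useful for harmonising definitions.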

m02 Neuchâtel Model (Neuchâtel Terminology Model Classification)

In 2004 the Neuchâtel Group issued version 2.1 of the Neuchâtel Terminology Model: Classification data object types and their attributes. The main purpose of the work was to arrive at a common language and a common perception of the structure of classifications and the links between them. In 2006 the model was extended with variables and related concepts, covering object types, statistical unit types, statistical characteristics, value domains, populations etc. Together, the two models claim to provide a more comprehensive description of the structure of the statistical information embodied in data items. Intended use: statistical offices use several models as a source or starting point for setting up their metadata models and frameworks, and the Neuchâtel model is one of them. http://www1.unece.org/stat/platform/download/attachments/14319930/Part+I+Neuchatel_version+2_1.pdf?version=1 http://www1.unece.org/stat/platform/download/attachments/14319930/Neuchatel+Model+V1.pdf?version=1

Type: model | Version: 2.1 | Introduced: 2004 | Valid: no | Maintenance: Neuchâtel Group (no longer exists)
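The classification part of the Neuchâtel model describes a classification version as a set of named levels with hierarchically linked items. A minimal sketch of those relationships, assuming a strict tree (every item below the top level has exactly one parent, one level up). The class names are loosely based on Neuchâtel terms; the codes and abbreviated titles merely imitate NACE Rev. 2 for illustration, and families, variants and correspondence tables from the full model are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassificationItem:
    code: str
    title: str
    level: int                     # 1 = most aggregated level
    parent: Optional[str] = None   # code of the item one level up


class ClassificationVersion:
    """One version of a classification: named levels plus a tree of items."""

    def __init__(self, name: str, levels: list):
        self.name = name
        self.levels = levels
        self.items = {}

    def add(self, item: ClassificationItem) -> None:
        if not 1 <= item.level <= len(self.levels):
            raise ValueError(f"unknown level {item.level}")
        if item.level > 1:
            parent = self.items.get(item.parent)
            # every lower-level item must hang under an item exactly one level up
            if parent is None or parent.level != item.level - 1:
                raise ValueError(f"invalid parent for {item.code}")
        self.items[item.code] = item

    def path(self, code: str) -> list:
        """Codes from the top level down to `code`."""
        chain = []
        item = self.items[code]
        while item is not None:
            chain.append(item.code)
            item = self.items.get(item.parent) if item.parent else None
        return chain[::-1]

    def aggregate_to(self, code: str, level: int) -> str:
        """Roll a detailed code up to its ancestor at `level`."""
        if level > self.items[code].level:
            raise ValueError("target level is more detailed than the code")
        return self.path(code)[level - 1]


activity = ClassificationVersion(
    "Illustrative activity classification", ["Section", "Division", "Group", "Class"]
)
activity.add(ClassificationItem("A", "Agriculture, forestry and fishing", 1))
activity.add(ClassificationItem("01", "Crop and animal production", 2, "A"))
activity.add(ClassificationItem("01.1", "Growing of non-perennial crops", 3, "01"))
activity.add(ClassificationItem("01.11", "Growing of cereals", 4, "01.1"))
```

Rolling detailed codes up to a higher level (for example `activity.aggregate_to("01.11", 2)`) is the kind of operation a shared classification structure makes uniform across statistics.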

m03 CMR (Corporate Metadata Repository Model)

This is a statistical metadata model that integrates a developmental version of edition 2 of ISO/IEC 11179 and a business data model derivable from the Generic Statistical Business Process Model. It includes the constructs necessary for a registry. Forms of this model are in use at the US Census Bureau and at Statistics Canada. Intended use: the model is a framework for managing all the statistical metadata of a statistical office. It accounts for survey, census, administrative and derived data, and for the entire survey life cycle. Relationships to other standards: ISO/IEC 11179 and the Generic Statistical Business Process Model. See also Gillman, D. W., "Corporate Metadata Repository (CMR) Model", invited paper, University of Edinburgh, Proceedings of the First MetaNet Conference, Voorburg, Netherlands, 2001. http://www.unece.org/stats/documents/1998/02/metis/11.e.pdf

Type: model | Version: ? | Introduced: 1998 | Valid: no | Maintenance: no longer maintained

m04 Nordic Metamodel

The Nordic Metamodel was developed by Statistics Sweden and has become increasingly linked with its popular "PC-Axis" suite of dissemination software. It provides a basis for organizing and managing metadata for data cubes in a relational database environment. Intended use: the Nordic Metamodel is used to describe the metadata system behind several implementations of PC-Axis in national and international statistical organizations, particularly those using MS SQL Server as their platform. http://www.scb.se/Pages/List____314010.aspx

Type: model | Version: 2.2 | Introduced: June 2008 | Valid: yes | Maintenance: Statistics Sweden & PC-Axis group
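The central idea of a metamodel for data cubes is that the structure of each table (its dimensions, their values and its measures) is itself stored as rows in relational tables, so that dissemination software can present any table generically. The sketch below illustrates that idea with Python's built-in `sqlite3`. The table and column names are hypothetical and far simpler than the actual Nordic Metamodel, and the cube `POP01` is invented.

```python
import sqlite3
from math import prod

# Hypothetical, heavily simplified metadata tables; NOT the actual Nordic
# Metamodel schema. They only describe a cube; the figures live elsewhere.
SCHEMA = """
CREATE TABLE cube_meta    (cube_id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE dimension    (cube_id TEXT, var_name TEXT, position INTEGER,
                           PRIMARY KEY (cube_id, var_name));
CREATE TABLE dim_value    (cube_id TEXT, var_name TEXT, code TEXT, label TEXT,
                           PRIMARY KEY (cube_id, var_name, code));
CREATE TABLE measure_meta (cube_id TEXT, measure TEXT, unit TEXT,
                           PRIMARY KEY (cube_id, measure));
"""


def load_example(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cube_meta VALUES ('POP01', 'Population by region and sex')")
    conn.executemany("INSERT INTO dimension VALUES (?, ?, ?)",
                     [("POP01", "region", 1), ("POP01", "sex", 2), ("POP01", "year", 3)])
    conn.executemany("INSERT INTO dim_value VALUES (?, ?, ?, ?)", [
        ("POP01", "region", "N", "North"), ("POP01", "region", "S", "South"),
        ("POP01", "sex", "1", "Men"), ("POP01", "sex", "2", "Women"),
        ("POP01", "year", "2012", "2012"),
    ])
    conn.execute("INSERT INTO measure_meta VALUES ('POP01', 'population', 'persons')")


def describe(conn: sqlite3.Connection, cube_id: str) -> dict:
    """Read the structure of one cube purely from the metadata tables."""
    (title,) = conn.execute(
        "SELECT title FROM cube_meta WHERE cube_id = ?", (cube_id,)).fetchone()
    dims = []
    for (name,) in conn.execute(
            "SELECT var_name FROM dimension WHERE cube_id = ? ORDER BY position",
            (cube_id,)).fetchall():
        codes = [code for (code,) in conn.execute(
            "SELECT code FROM dim_value WHERE cube_id = ? AND var_name = ? ORDER BY code",
            (cube_id, name))]
        dims.append((name, codes))
    measures = conn.execute(
        "SELECT measure, unit FROM measure_meta WHERE cube_id = ?", (cube_id,)).fetchall()
    return {"title": title, "dimensions": dims, "measures": measures}


def n_cells(summary: dict) -> int:
    """Number of cells in the cube: the product of the dimension sizes."""
    return prod(len(codes) for _, codes in summary["dimensions"])


conn = sqlite3.connect(":memory:")
load_example(conn)
summary = describe(conn, "POP01")
```

Adding a new cube then means inserting rows into the metadata tables rather than writing new presentation code.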

m05 CWM (Common Warehouse Metamodel, ISO/IEC 19504)

A specification of the metadata needed to achieve data exchange between tools. Intended use: as a means of recording the metadata needed to achieve data exchange between tools. CWM is designed by the OMG to work in conjunction with several of its other standards: Meta-Object Facility (MOF), Unified Modeling Language (UML), XML Metadata Interchange (XMI) and Ontology Definition Metamodel (ODM). It is also well designed to fit into the processing cascade for a survey in a statistical office. http://www.omg.org http://www.omg.org/technology/documents/formal/cwm_mip.htm

Type: model | Version: ? | Introduced: March 2004 | Valid: yes | Maintenance: Object Management Group (OMG)

m06 SDMX (Statistical Data and Metadata eXchange, ISO TS 17369)

Statistical Data and Metadata eXchange (SDMX) was initiated by seven international organisations to foster standards for the exchange of statistical information. SDMX focuses on macro data, even though the model supports micro data. It is an adopted standard for delivering and sharing data between NSIs and Eurostat; sharing the results of the latest Population Census is perhaps the most advanced example so far. Several software products commonly used by NSIs support SDMX. The SDMX initiative sets technical standards and content-oriented guidelines to facilitate the exchange of statistical data and metadata. The SDMX Technical Specification has several sections:

1. The SDMX Information Model: the information model on which the syntax-specific implementations described in the other sections are based.
2. SDMX-EDI: the EDIFACT format for the exchange of SDMX-structured data and metadata.
3. SDMX-ML: the XML format for the exchange of SDMX-structured data and metadata.
4. The SDMX Registry Specification: provides for a central registry of information about available data and reference metadata, and for a repository containing structural metadata and provisioning information.
5. The SDMX Implementer's Guide: a guide to help those who wish to use the SDMX specifications, including reference material for the use of the SDMX Information Model.
6. Web Services Guidelines: a guide for those who wish to implement SDMX using web-services technologies.

The aim of SDMX is to create and maintain technical and statistical standards and guidelines to be used and implemented by organisations dealing with statistical data and metadata. Together with modern IT technologies, these standards and guidelines should improve efficiency by preventing duplication of effort. The SDMX standards support a data-sharing process based on central registry services. Registry services provide visibility into the data and metadata existing within the community, and support the access and use of these data and metadata by providing a set of triggers for automated processing. The data or metadata themselves are not stored in a central registry; these services merely provide a set of metadata about the data (and additional metadata) in a known location, so that users or applications can easily locate and obtain whatever data and metadata are registered. The use of standards for all data, metadata and the registry services themselves permits a high level of automation within a data-sharing community. http://sdmx.org/

Type: standard | Version: 2.1 | Introduced: ? | Valid: yes | Maintenance: SDMX is maintained by a group of seven sponsors known as the SDMX consortium: the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat, the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations and the World Bank. The SDMX Secretariat (in which all sponsoring organisations are represented) is in charge of carrying the work forward.
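The registry principle described for SDMX (the registry holds metadata about where data live, not the data, and can trigger automated processing) can be sketched in a few lines. This is a hypothetical in-memory model, not the SDMX Registry interface: the names `Registration`, `Registry`, `STS_PRODUCTION`, the providers and the example.org URLs are invented, and the "triggers" are modelled as simple subscription callbacks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Registration:
    provider: str   # agency publishing the data
    dataflow: str   # the dataflow the data belong to
    location: str   # where the data can be retrieved; the registry never copies it


@dataclass
class Registry:
    """Hypothetical in-memory registry holding metadata about data, not data."""
    dataflows: set = field(default_factory=set)
    registrations: list = field(default_factory=list)
    subscribers: dict = field(default_factory=dict)   # dataflow -> callbacks

    def define_dataflow(self, dataflow: str) -> None:
        self.dataflows.add(dataflow)

    def subscribe(self, dataflow: str, callback) -> None:
        self.subscribers.setdefault(dataflow, []).append(callback)

    def register(self, reg: Registration) -> None:
        if reg.dataflow not in self.dataflows:
            raise ValueError(f"unknown dataflow {reg.dataflow}")
        self.registrations.append(reg)
        # notify interested parties: a trigger for automated processing
        for notify in self.subscribers.get(reg.dataflow, []):
            notify(reg)

    def find(self, dataflow: str) -> list:
        """Locations where data for `dataflow` can be obtained."""
        return [r.location for r in self.registrations if r.dataflow == dataflow]


registry = Registry()
registry.define_dataflow("STS_PRODUCTION")
received = []                                   # stands in for an automated consumer
registry.subscribe("STS_PRODUCTION", received.append)
registry.register(Registration("NSI_A", "STS_PRODUCTION", "https://nsi-a.example.org/sts"))
registry.register(Registration("NSI_B", "STS_PRODUCTION", "https://nsi-b.example.org/sts"))
```

A consumer asks the registry where data for a dataflow can be found and then fetches it from the providers directly, which keeps the central component small.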

m07 DDI (Data Documentation Initiative)

The Data Documentation Initiative (DDI) has its roots in the data archive environment, but with its latest development, DDI 3 or DDI Lifecycle, it has become an increasingly interesting option for NSIs. DDI is an effort to create an international standard for the technical documentation of data from the social, behavioural and economic sciences. It is based on XML and is supported by a non-profit international organisation, the DDI Alliance. Several tools that support DDI are available, both commercially and as free software. The current version supports the description of the full life cycle of a dataset or data collection (see also the Generic Statistical Business Process Model). Intended use: DDI is commonly used as a standard for documenting and describing data for archiving and reuse. It is also suitable for:

• documenting ongoing research projects;
• documenting secondary uses of data;
• creating concept/question/variable libraries;
• generating multiple delivery formats for data dissemination or discovery.

Type: standard | Version: 3 | Introduced: ? | Valid: yes | Maintenance: DDI Alliance
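One of the DDI uses listed above, concept/question/variable libraries, relies on reusable items that reference each other by identifier in XML. The sketch below uses Python's standard `xml.etree.ElementTree` to show that pattern: two variables reuse one question, which in turn points to a concept. The element names (`Library`, `Concept`, `Question`, `Variable`) and reference attributes are hypothetical simplifications; real DDI 3 instances use namespaced, schema-defined elements and richer identification and versioning that are not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_library(concepts: dict, questions: dict, variables: dict) -> ET.Element:
    """Serialise concepts, questions and variables as one linked XML library.

    Element and attribute names are hypothetical, not the DDI 3 schema.
    """
    root = ET.Element("Library")
    for cid, label in concepts.items():
        ET.SubElement(root, "Concept", id=cid).text = label
    for qid, (text, concept_id) in questions.items():
        ET.SubElement(root, "Question", id=qid, conceptRef=concept_id).text = text
    for vid, (name, question_id) in variables.items():
        ET.SubElement(root, "Variable", id=vid, name=name, questionRef=question_id)
    return root


def resolve(root: ET.Element, variable_id: str) -> tuple:
    """Follow references from a variable to its question and concept."""
    var = root.find(f"Variable[@id='{variable_id}']")
    question_ref = var.get("questionRef")
    question = root.find(f"Question[@id='{question_ref}']")
    concept_ref = question.get("conceptRef")
    concept = root.find(f"Concept[@id='{concept_ref}']")
    return var.get("name"), question.text, concept.text


library = build_library(
    concepts={"C1": "Employment status"},
    questions={"Q1": ("What was your main activity last week?", "C1")},
    variables={"V1": ("EMPSTAT", "Q1"), "V2": ("EMPSTAT_2013", "Q1")},
)
xml_text = ET.tostring(library, encoding="unicode")   # what would be exchanged
parsed = ET.fromstring(xml_text)                      # what a receiver reads back
```

Documenting the question once and referencing it from every variable that uses it is what turns a set of codebooks into a reusable library.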

m08 ESMS (Euro SDMX Metadata Structure)

ESMS metadata files are used to describe the statistics released by Eurostat. They aim to document methodologies, quality and the statistical production process in general. ESMS uses 21 high-level concepts with a limited breakdown into sub-items, strictly derived from the list of cross-domain concepts in the SDMX Content-Oriented Guidelines (2009).

Type: model | Version: 3 | Introduced: ? | Valid: yes | Maintenance: Eurostat?
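Because ESMS fixes a short list of high-level concepts, a metadata file can be checked mechanically for completeness before release. The sketch below is a hypothetical check, assuming a file is represented as a mapping from concept name to free text. Only four concept names are used, as an illustrative subset assumed to be in the ESMS list; the full list of 21 concepts and their sub-items is not reproduced.

```python
# Illustrative subset only: ESMS defines 21 high-level concepts; these four
# names are assumed examples and the full list is not reproduced here.
REQUIRED_CONCEPTS = (
    "Contact",
    "Statistical presentation",
    "Accuracy and reliability",
    "Timeliness and punctuality",
)


def check_metadata_file(entries: dict, required=REQUIRED_CONCEPTS) -> dict:
    """Report required concepts that are absent or blank, and unexpected ones."""
    missing = [c for c in required if not entries.get(c, "").strip()]
    unexpected = sorted(set(entries) - set(required))
    return {"missing": missing, "unexpected": unexpected}


# Hypothetical metadata file: one concept is blank, one is missing,
# and one entry does not belong to the structure at all.
report = check_metadata_file({
    "Contact": "Business statistics unit",
    "Statistical presentation": "Monthly turnover index for retail trade.",
    "Accuracy and reliability": "   ",
    "Remarks": "Free text that does not belong to any concept.",
})
```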

m09 MSD

Out of scope, not investigated.

m10 SDMX-IM (SDMX Information Model)

The SDMX-IM is used to describe the basic data and metadata structures used in all of the SDMX data formats. There is a primary division between time series and cross-sectional data and the metadata which describes the structure of that data. The Information Model concerns itself with statistical data and its structural metadata, and that is what is described here. Both structural metadata and data have some additional metadata in common, related to their management and administration.

Type: model | Version: ? | Introduced: ? | Valid: yes | Maintenance: ?
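The structural metadata in the SDMX-IM can be pictured through a data structure definition: an ordered list of dimensions, each restricted to a code list, which together identify a series. A minimal sketch, assuming the common SDMX convention of joining dimension codes with dots to form a series key. The structure `STS_EXAMPLE`, its codes and the observation values are hypothetical, and attributes, measures and the cross-sectional view are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    id: str
    codelist: frozenset   # allowed codes for this dimension


@dataclass(frozen=True)
class DataStructure:
    """Hypothetical structure definition: the dimension order fixes the key."""
    id: str
    dimensions: tuple

    def series_key(self, **codes: str) -> str:
        expected = [d.id for d in self.dimensions]
        if sorted(codes) != sorted(expected):
            raise ValueError(f"expected dimensions {expected}")
        for dim in self.dimensions:
            if codes[dim.id] not in dim.codelist:
                raise ValueError(f"{codes[dim.id]!r} is not a valid {dim.id} code")
        return ".".join(codes[d] for d in expected)

    def parse_key(self, key: str) -> dict:
        values = key.split(".")
        if len(values) != len(self.dimensions):
            raise ValueError(f"key {key!r} does not match {self.id}")
        return {d.id: v for d, v in zip(self.dimensions, values)}


dsd = DataStructure("STS_EXAMPLE", (
    Dimension("FREQ", frozenset({"M", "Q"})),
    Dimension("REF_AREA", frozenset({"NL", "SE", "IE"})),
    Dimension("INDICATOR", frozenset({"PROD", "TURN"})),
))

# A time-series view of a dataset: series key -> {time period: observation}
key = dsd.series_key(FREQ="M", REF_AREA="NL", INDICATOR="PROD")
dataset = {key: {"2013-01": 101.2, "2013-02": 99.8}}
```

Since sender and receiver share the structure definition, a key such as `M.NL.PROD` is all that has to travel with each series.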

m11 ESQRS

ESQRS is part of the ESS and focuses on the producer side of statistics. Its main goal is to monitor the quality of the statistics produced. http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.40/2010/wp.14.e.ppt

Type: ? | Version: ? | Introduced: ? | Valid: yes | Maintenance: Eurostat/EU?

m12 EPMS (Euro Process Metadata Structure)

Out of scope, not investigated. Maintenance: Eurostat/EU?

m13 GSIM (Generic Statistical Information Model)

GSIM is linked to the GSBPM and its main purpose is to provide a common reference model for statistical information. It describes the data and metadata flows within the statistical business process.

Type: model | Version: ? | Introduced: ? | Valid: yes | Maintenance: Australian Bureau of Statistics (ABS)

m14 MMX (MMX Framework)

MMX is a framework for storing metadata models, used at Statistics Estonia. The MMX metadata framework is a lightweight implementation of the OMG Meta Object Facility (MOF), built on relational database technology. It is based on three general concepts: Metamodel, Access Layer and Generic Transformation. The MMX Metamodel provides a storage mechanism for various knowledge models. The data model underlying the framework is more abstract than metadata models in general: it consists of only a few abstract entities, most notably OBJECT, RELATION, EXPRESSION and PROPERTY. The remaining entities and relationships are 'hidden' behind these root objects and can be derived (inherited) by typifying them. Most of the structure that would normally be exposed in an ER diagram is therefore actually stored as data (meta-metadata). The MMX Metamodel can thus be seen as a general-purpose storage mechanism for different knowledge models, e.g. Frame systems and Description Logic (RDF). Several metadata models are predefined in the MMX Metamodel, e.g.:

- ontology (based on Frame System, Declaration Logic etc.);
- classification (based on ISO/IEC: Metadata Registries/Neuchâtel Terminology Model);
- relational database (based on Eclipse SQL Model);
- role-based access control model (based on NIST RBAC).

http://www.mmxframework.org/post/2008/09/29/MMX-Knowledge-Model.aspx

Type: model | Version: ? | Introduced: ? | Valid: yes | Maintenance: Statistics Estonia & commercial party
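The MMX idea of keeping the model structure as data can be illustrated with a few generic collections standing in for the OBJECT, RELATION and PROPERTY entities. This is a hypothetical in-memory sketch, not the MMX implementation: EXPRESSION is omitted, and the names `Store`, `Type` and `contains` are invented. Adding a new kind of metadata (here a small classification) only adds rows; no new table or class is needed.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Store:
    """Generic storage: object types are themselves rows, not tables."""
    objects: dict = field(default_factory=dict)      # id -> (type_id, name)
    relations: list = field(default_factory=list)    # (relation_type_id, from_id, to_id)
    properties: dict = field(default_factory=dict)   # (object_id, key) -> value
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1), repr=False)

    def new_object(self, type_id, name: str) -> int:
        oid = next(self._ids)
        self.objects[oid] = (type_id, name)
        return oid

    def relate(self, rel_type: int, source: int, target: int) -> None:
        self.relations.append((rel_type, source, target))

    def set_property(self, oid: int, key: str, value) -> None:
        self.properties[(oid, key)] = value

    def instances_of(self, type_id: int) -> list:
        return sorted(name for t, name in self.objects.values() if t == type_id)

    def targets(self, rel_type: int, source: int) -> list:
        return [self.objects[t][1] for r, s, t in self.relations
                if r == rel_type and s == source]


store = Store()
# Bootstrap: a meta-type "Type" that is an instance of itself.
TYPE = store.new_object(None, "Type")
store.objects[TYPE] = (TYPE, "Type")
# A classification model defined purely as data (no schema change needed).
CLASSIFICATION = store.new_object(TYPE, "Classification")
ITEM = store.new_object(TYPE, "ClassificationItem")
CONTAINS = store.new_object(TYPE, "contains")

scheme = store.new_object(CLASSIFICATION, "Illustrative classification")
a = store.new_object(ITEM, "A")
a01 = store.new_object(ITEM, "01")
store.relate(CONTAINS, scheme, a)
store.relate(CONTAINS, scheme, a01)
store.set_property(a, "title", "Agriculture, forestry and fishing")
```

The trade-off is the one the MMX description implies: maximum flexibility in the stored models, at the cost of queries that must interpret type information at run time.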

m15 GSBPM (Generic Statistical Business Process Model)

GSBPM provides a framework to describe the statistical production process in terms of standard components. One of the original aims of the model is to standardise process terminology, thereby making it easier to compare and benchmark processes within and between organisations, primarily NSIs and international organisations.

Type: standard | Version: ? | Introduced: ? | Valid: ? | Maintenance: ?
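Benchmarking with the GSBPM works by mapping each organisation's local process steps onto the common phases and then comparing the results. A minimal sketch, assuming the nine phases of GSBPM version 4.0 (later versions differ); the two organisations and their step names are hypothetical.

```python
# Phase names as assumed from GSBPM version 4.0 (2009); later versions differ.
GSBPM_V4_PHASES = {
    1: "Specify Needs", 2: "Design", 3: "Build", 4: "Collect", 5: "Process",
    6: "Analyse", 7: "Disseminate", 8: "Archive", 9: "Evaluate",
}


def profile(mapping: dict) -> dict:
    """Group an organisation's local step names under GSBPM phases."""
    grouped = {name: [] for name in GSBPM_V4_PHASES.values()}
    for step, phase in mapping.items():
        grouped[GSBPM_V4_PHASES[phase]].append(step)
    return grouped


def compare(org_a: dict, org_b: dict) -> tuple:
    """Phases covered by the mapped steps of one organisation but not the other."""
    pa, pb = profile(org_a), profile(org_b)
    only_a = [phase for phase in pa if pa[phase] and not pb[phase]]
    only_b = [phase for phase in pb if pb[phase] and not pa[phase]]
    return only_a, only_b


# Hypothetical local process steps, each mapped to a GSBPM phase number.
org_a = {"questionnaire design": 2, "web collection": 4, "editing": 5,
         "imputation": 5, "publish tables": 7}
org_b = {"register extraction": 4, "micro editing": 5, "macro validation": 6,
         "release": 7, "archive files": 8}
only_a, only_b = compare(org_a, org_b)
```

Once both organisations use the same phase names, differences such as a missing explicit "Analyse" step become visible without first reconciling local terminology.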

m16 MCV (Metadata Common Vocabulary)

MCV is not itself a standard, but provides definitions of common metadata concepts, in particular in the domain of statistical metadata. It is part of the SDMX Content-Oriented Guidelines.

Type: model | Version: ? | Introduced: ? | Valid: ? | Maintenance: SDMX consortium
