metadata projects and tasks at statistics finland metis 2010 saija ylönen [email protected]

Metadata projects and tasks at Statistics Finland

METIS 2010

Saija Ylönen, [email protected]

Organizational chart

11/03/2010 Saija Ylönen

Co-operating parties of the metadata tasks: organizational units

IT Management: situated in the Secretariat of the Director General; co-ordinates the general information architecture, of which metadata tasks form one element

Classification and Metadata Services: situated in the IT and Statistical Methods department; operational unit; active role in developing metadata

Dissemination Services: situated in the IT and Statistical Methods department; develops the metadata connected with dissemination


Metadata Co-ordination Group

Originally a co-operation group for persons working with metadata issues in the support function departments of SF

The objective at present is to intensify the co-operation between the statistics departments and the parties responsible for general metadata work

Comprised of members working on metadata and permanent members from all statistics departments

Goal is to widen knowledge about metadata and metadata systems and to give an opportunity to the statistics departments to discuss their metadata needs with metadata specialists


CoSSI Steering Group and CoSSI model

Foundation for the metadata system

Modular, xml-based model for describing statistical tables, classifications, concepts, variables, general information on statistical documents, quality, etc.

Expandable

CoSSI Steering Group is in charge of mastering and developing the model according to user needs in a manner that will not expose its main structure to risk


Definition of metadata

1) Statistical metadata: variable and data descriptions; classifications; concepts

2) Statistical data quality: quality reports; statistical method descriptions

3) Metadata of statistical documents or products: producers; publication information; field or subject area


Definition of metadata II

4) Process metadata

a) technical metadata: guide the workflow of data production, make it possible to follow data production, and document the working process

b) conceptual process metadata: technical information on the data and variables used in producing data, e.g. minimum or maximum values, various calculation rules or use of certain classification values
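
As a rough illustration of how conceptual process metadata such as minimum and maximum values or permitted classification values might drive a check during data production, here is a minimal Python sketch. The names `VariableRule` and `check_value` are hypothetical and are not taken from the CoSSI model or from any Statistics Finland system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableRule:
    """Hypothetical conceptual process metadata for one variable."""
    minimum: Optional[float] = None          # smallest accepted value, if any
    maximum: Optional[float] = None          # largest accepted value, if any
    allowed_codes: frozenset = frozenset()   # accepted classification values, if any


def check_value(value, rule):
    """Return a list of rule violations for a single value (empty if none)."""
    problems = []
    if rule.allowed_codes:
        # Classified variables are checked against the allowed codes only.
        if value not in rule.allowed_codes:
            problems.append(f"{value!r} is not an allowed classification value")
        return problems
    if rule.minimum is not None and value < rule.minimum:
        problems.append(f"{value} is below the minimum {rule.minimum}")
    if rule.maximum is not None and value > rule.maximum:
        problems.append(f"{value} is above the maximum {rule.maximum}")
    return problems
```

A production step could call `check_value` for each incoming value and log the returned messages, so the rules live in the metadata rather than in each program.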


Metadata systems at Statistics Finland


Metadata systems: present situation

We are in a transitional phase from relational databases to an xml-based environment

Relational databases: classifications, concepts and definitions, archiving database

Xml database eXist: publications, classifications, concepts, data descriptions


Relational databases

Built in the 1990s

Used in statistics production but not in all statistical processes or all statistics

Classifications in the relational databases are used in SAS and Superstar

Archiving database is in use in the archiving process

Classifications and concepts are generated from the relational databases to the web pages


XML database

At the moment, the xml database is used mostly in the creation of publications with an Arbortext word processor

Classifications and concepts are copied to the xml database from the relational databases and are ready to use

Tools for utilising metadata objects from the xml database are being constructed

The first metadata tool linked to the xml database is the variable editor


Variable editor

For creating and maintaining the descriptions of statistical data and variables

At the testing phase

Implementation begins in 2010

Descriptions are saved as xml documents conforming to the CoSSI model in the eXist/xml database


Content and functions of the variable editor

Data descriptions are comprised of a general description of the data, a list of variables and information about an individual variable

General data description includes descriptive information on the entire data document

Variable list interleaf allows management of the list of variables in the data description and selection of the variable whose description needs editing.



Variable list interleaf

Variable metadata


Field name               Description
short name               Short identifying name of variable
long name                Name of variable in natural language
concept definition       Basic conceptual description of variable
operational definition   Verbal description of the formation of the variable
deduction rule           E.g. programming instructions, mathematical formula, etc.
classification ID        Identifier of classification. Refers to a classification in the classification database.
unit of measure          Measurement unit of variable
variable modified        Date of creation or modification of variable (yyyy-mm-dd)
start of validity        Start date of validity of variable (yyyy-mm-dd)
end of validity          End date of validity of variable (yyyy-mm-dd)
status                   Stage of editing of variable: draft, ready, validated
variable group           Name of group to which variable belongs. Makes working with long variable lists easier.
work comment             Free text field. Contains information only for the use of the maintainer of a description.
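
To make the field list more concrete, the following Python sketch models a subset of the fields as a record, checks the status values and the yyyy-mm-dd date format, and writes the record out as XML. It is only an illustration: the class name, attribute names and XML element names are assumptions made for this example and do not reproduce the actual CoSSI schema.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date

STATUSES = ("draft", "ready", "validated")  # editing stages listed in the table


@dataclass
class VariableDescription:
    short_name: str
    long_name: str
    status: str = "draft"
    unit_of_measure: str = ""
    classification_id: str = ""
    start_of_validity: str = ""   # yyyy-mm-dd, empty if not set
    end_of_validity: str = ""     # yyyy-mm-dd, empty if not set

    def validate(self):
        """Return a list of problems found in the description (empty if none)."""
        problems = []
        if self.status not in STATUSES:
            problems.append(f"unknown status: {self.status}")
        for name in ("start_of_validity", "end_of_validity"):
            value = getattr(self, name)
            if not value:
                continue
            # Require the yyyy-mm-dd shape first, then a real calendar date.
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
                problems.append(f"{name} is not in yyyy-mm-dd form: {value}")
                continue
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"{name} is not a valid date: {value}")
        return problems

    def to_xml(self):
        """Serialise non-empty fields to XML; element names are illustrative only."""
        root = ET.Element("variable")
        for name, value in vars(self).items():
            if value:
                ET.SubElement(root, name).text = value
        return ET.tostring(root, encoding="unicode")
```

For example, `VariableDescription("age", "Age of person", unit_of_measure="years").to_xml()` yields a small `<variable>` element that could then be stored in an xml database.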

Results from the variable editor project

In addition to the actual variable editor application, the project also created preconditions for:

the development of a consistent information architecture

the construction of production applications in which metadata need not be separately produced or manually added to data when publishing or archiving statistics

an information service where excessive time need not be spent on searching for metadata, or on reproducing metadata for special compilation assignments

a system from which tabulation applications can retrieve table column and row headings in multiple languages for all statistics using the same methods
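
The idea of retrieving table headings in several languages by the same method for all statistics can be sketched as a simple lookup with a fallback language. The label store below is invented example data, and the function names and the choice of Finnish as the fallback are assumptions for illustration only.

```python
# Invented example data: classification code -> {language code -> heading}.
LABELS = {
    "M": {"fi": "Miehet", "sv": "Män", "en": "Males"},
    "F": {"fi": "Naiset", "sv": "Kvinnor", "en": "Females"},
}


def heading(code, language, labels=LABELS, fallback="fi"):
    """Return the heading for a code in the requested language.

    Falls back to the fallback language, then to the bare code.
    """
    translations = labels.get(code, {})
    return translations.get(language) or translations.get(fallback) or code


def headings(codes, language, labels=LABELS):
    """Return headings for a list of codes, in the same order."""
    return [heading(code, language, labels) for code in codes]
```

A tabulation application could call `headings(["M", "F"], "sv")` to label the rows of a table in Swedish without storing any labels of its own.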

Experiences gained during the variable editor project

Various questions concerning standardisation had to be addressed in the project although they were not originally in the project's scope – they had to be resolved and they took a lot of time

Because the variable editor project was the first leg in the revision of the metadata system it was subjected to a diversity of expectations

Project was a good test run for the CoSSI model – the data content of the model proved to be exhaustive


The planning and building of a classification editor

Reasons for renewing the classification system:

the present way of maintaining classifications has been viewed as inflexible by statistics

renunciation of the Sybase relational databases

ICT strategy: in the next few years the agency will introduce a common statistical metadata system based on the CoSSI model

Classification editor project 2010: 1) definition stage, 2) construction stage


Goals of the classification editor project

Analyse the service needs required from a centralised classification system

Create maintenance tools for classifications in connection with the CoSSI/eXist metadata store so that the basic maintenance needs of classifications of individual statistics are met in a user-oriented manner which also allows further development of the classification system

Produce the solutions with which the interoperability of the Sybase classification database and the eXist metadatabase can be ensured

Compile user instructions for the editor

Pilot test the editor


Benefits of the new classification system

A classification system which serves well will encourage centralised and structured maintenance of classifications

The documentation of classifications will improve, making them easy to find for use in-house and for the provision of information service

The new classification system will support smooth movement between data descriptions, variable descriptions and maintenance of classifications and thus improve the efficiency of the maintenance and use of classifications in statistics


General benefits of the common classification system

A centralised classification system eases the workload needed to maintain classifications because classifications are only maintained in one place

Reduces the possibility of errors because classifications are documented in the system consistently so that they are accessible to everybody and easy to find

Improves the efficiency of time use because working hours need not be spent on looking for classifications and trying to find their background information

Makes the classifications used in different statistics visible to everybody and thus creates possibilities for their harmonisation


In conclusion: Why do some statistics departments still have their own metadata systems instead of using the centralised system?

Centralised metadata work progresses too slowly from the perspective of individual statistics – We should rethink our construction and implementation strategy

Common attitude still regards the process of an individual set of statistics as unique, and therefore incapable of exploiting systems that are meant for all statistics – We have to get quick results to prove the benefits of the system

Commitment by the Management and their support for the work are crucial – We have to convince them


THANK YOU FOR YOUR ATTENTION!

