Metadata projects and tasks at Statistics Finland, METIS 2010, Saija Ylönen [email protected]
Organizational chart
11/03/2010 Saija Ylönen
Co-operating parties of the metadata tasks: organizational units
IT Management
  situated in the Secretariat of the Director General
  co-ordinates the general information architecture, of which metadata tasks form one element
Classification and Metadata Services
  situated in the IT and Statistical Methods department
  operational unit
  active role in the development of metadata
Dissemination Services
  situated in the IT and Statistical Methods department
  develops the metadata connected with dissemination
Metadata Co-ordination Group
Originally a co-operation group for persons working with metadata issues in the support function departments of SF
The objective at present is to intensify the co-operation between the statistics departments and the parties responsible for general metadata work
Comprises members working on metadata and permanent members from all statistics departments
Goal is to widen knowledge about metadata and metadata systems and to give an opportunity to the statistics departments to discuss their metadata needs with metadata specialists
CoSSI Steering Group and CoSSI model
Foundation for the metadata system
Modular, xml-based model for describing statistical tables, classifications, concepts, variables, general information on statistical documents, quality, etc.
Expandable
The CoSSI Steering Group is in charge of managing and developing the model according to user needs in a manner that will not expose its main structure to risk
Definition of metadata
1) Statistical metadata
  variable and data descriptions
  classifications, concepts
2) Statistical data quality
  quality reports
  statistical method descriptions
3) Metadata of statistical documents or products
  producers
  publication information
  field or subject area
Definition of metadata II
4) Process metadata
  a) technical metadata
    guides the workflow of data production, makes it possible to follow data production and documents the working process
  b) conceptual process metadata
    technical information on the data and variables that are used in producing data, e.g. minimum or maximum values, various calculation rules or use of certain classification values
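As an illustration of how conceptual process metadata such as minimum and maximum values or permitted classification values could drive a check during data production, here is a minimal Python sketch. The class `VariableRule`, the function `check_value` and the example variables "age" and "sex" are hypothetical and are not part of the CoSSI model or of any Statistics Finland system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableRule:
    """Hypothetical conceptual process metadata for one variable."""
    name: str
    minimum: Optional[float] = None            # smallest accepted value, if any
    maximum: Optional[float] = None            # largest accepted value, if any
    allowed_codes: Optional[frozenset] = None  # permitted classification values, if any


def check_value(rule, value):
    """Return a list of rule violations for one value (empty list if valid)."""
    problems = []
    if rule.allowed_codes is not None and value not in rule.allowed_codes:
        problems.append(f"{rule.name}: {value!r} is not a permitted classification value")
    if rule.minimum is not None and value < rule.minimum:
        problems.append(f"{rule.name}: {value!r} is below the minimum {rule.minimum}")
    if rule.maximum is not None and value > rule.maximum:
        problems.append(f"{rule.name}: {value!r} is above the maximum {rule.maximum}")
    return problems


# Illustrative rules only; the limits and codes are assumptions.
age_rule = VariableRule(name="age", minimum=0, maximum=120)
sex_rule = VariableRule(name="sex", allowed_codes=frozenset({"1", "2"}))

age_problems = check_value(age_rule, 130)  # one violation: above the maximum
sex_problems = check_value(sex_rule, "1")  # no violations
```

Keeping the rules as data rather than as code inside each production program is what would allow the same metadata to document the process and to steer it.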
Metadata systems at Statistics Finland
Metadata systems: present situation
We are in a transitional phase from relational databases to an xml-based environment
Relational databases: classifications, concepts and definitions, archiving database
Xml database eXist: publications, classifications, concepts, data descriptions
Relational databases
Built in the 1990s
Used in statistics production but not in all statistical processes or all statistics
Classifications in the relational databases are used in SAS and Superstar
The archiving database is in use in the archiving process
Classifications and concepts are generated from the relational databases to the web pages
XML database
At the moment, the xml database is used mostly in the creation of publications with an Arbortext word processor
Classifications and concepts are copied to the xml database from the relational databases and are ready to use
Tools for utilising metadata objects from the xml database are being constructed
The first metadata tool linked to the xml database is the variable editor
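The copying of classifications from a relational database into xml documents, mentioned above, might look roughly like the following Python sketch. It is only an illustration: an in-memory SQLite database stands in for the relational database, the table `classification_item`, its columns and the xml element names are hypothetical, and the step of storing the result in eXist is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET


def classification_to_xml(conn, classification_id):
    """Build an xml element from the rows of one classification."""
    rows = conn.execute(
        "SELECT code, label FROM classification_item "
        "WHERE classification_id = ? ORDER BY code",
        (classification_id,),
    ).fetchall()
    root = ET.Element("classification", id=classification_id)
    for code, label in rows:
        item = ET.SubElement(root, "item", code=code)
        item.text = label
    return root


# Illustrative data only; the real classification database has its own schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classification_item (classification_id TEXT, code TEXT, label TEXT)"
)
conn.executemany(
    "INSERT INTO classification_item VALUES (?, ?, ?)",
    [("sex", "1", "Male"), ("sex", "2", "Female"), ("region", "01", "Uusimaa")],
)

sex_xml = ET.tostring(classification_to_xml(conn, "sex"), encoding="unicode")
```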
Variable editor
For creating and maintaining the descriptions of statistical data and variables
At the testing phase
Implementation begins in 2010
Descriptions are saved as xml documents conforming to the CoSSI model in the eXist/xml database
Content and functions of the variable editor
Data descriptions consist of a general description of the data, a list of variables and information about individual variables
General data description includes descriptive information on the entire data document
Variable list interleaf allows management of the list of variables in the data description and selection of the variable whose description needs editing.
Variable list interleaf
Variable metadata
Field name              Description
short name              Short identifying name of variable
long name               Name of variable in natural language
concept definition      Basic conceptual description of variable
operational definition  Verbal description of the formation of the variable
deduction rule          E.g. programming instructions, mathematical formula, etc.
classification ID       Identifier of classification. Refers to a classification in the classification database.
unit of measure         Measurement unit of variable
variable modified       Date of creation or modification of variable (yyyy-mm-dd)
start of validity       Start date of validity of variable (yyyy-mm-dd)
end of validity         End date of validity of variable (yyyy-mm-dd)
status                  Stage of editing of variable: draft, ready, validated
variable group          Name of group to which variable belongs. Makes working with long variable lists easier.
work comment            Free text field. Contains information only for the use of the maintainer of a description.
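To make the field list above concrete, the following Python sketch models one variable description and serialises it to xml. It is a sketch under assumptions, not the variable editor's implementation: the class `VariableDescription`, the element names and the example variable are hypothetical, and the output does not follow the actual CoSSI schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional

# Editing stages listed in the field table.
ALLOWED_STATUSES = ("draft", "ready", "validated")


@dataclass
class VariableDescription:
    """One variable description with the fields from the table."""
    short_name: str
    long_name: str
    concept_definition: str = ""
    operational_definition: str = ""
    deduction_rule: str = ""
    classification_id: str = ""
    unit_of_measure: str = ""
    variable_modified: Optional[date] = None
    start_of_validity: Optional[date] = None
    end_of_validity: Optional[date] = None
    status: str = "draft"
    variable_group: str = ""
    work_comment: str = ""

    def __post_init__(self):
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    def to_xml(self):
        """Return an xml element; empty fields are skipped, dates use yyyy-mm-dd."""
        root = ET.Element("variable")
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == "":
                continue
            child = ET.SubElement(root, f.name)
            child.text = value.isoformat() if isinstance(value, date) else str(value)
        return root


# Illustrative variable only; names and values are assumptions.
income = VariableDescription(
    short_name="inc",
    long_name="Disposable income",
    unit_of_measure="EUR",
    start_of_validity=date(2010, 1, 1),
)
xml_text = ET.tostring(income.to_xml(), encoding="unicode")
```

Validating the status when the object is created keeps a description from being saved with a stage other than draft, ready or validated.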
Results from the variable editor project
In addition to the actual variable editor application, the project also created preconditions for:
the development of a consistent information architecture
the construction of production applications in which metadata need not be separately produced or manually added to data when publishing or archiving statistics
an information service where excessive time need not be spent on searching for metadata, or on actual reproduction of metadata for special compilation assignments
a system from which table column and row headings can be retrieved in tabulation applications in multiple languages for all statistics using the same methods
Experiences gained during the variable editor project
Various questions concerning standardisation had to be addressed in the project although they were not originally within the project's scope; they had to be resolved and they took a lot of time
Because the variable editor project was the first leg in the revision of the metadata system it was subjected to a diversity of expectations
Project was a good test run for the CoSSI model – the data content of the model proved to be exhaustive
The planning and building of a classification editor
Reasons for renewing the classification system:
  the present way of maintaining classifications has been viewed as inflexible by statistics
  discontinuation of the Sybase relational databases
  ICT strategy: in the next few years the agency will introduce a common statistical metadata system based on the CoSSI model
Classification editor project 2010
  1) definition stage
  2) construction stage
Goals of the classification editor project
Analyse the service needs required from a centralised classification system
Create maintenance tools for classifications in connection with the CoSSI/eXist metadata store so that the basic maintenance needs of classifications of individual statistics are met in a user-oriented manner which also allows further development of the classification system
Produce the solutions with which the interoperability of the Sybase classification database and the eXist metadatabase can be ensured
Compile user instructions for the editor
Pilot test the editor
Benefits of the new classification system
A classification system which serves well will encourage centralised and structured maintenance of classifications
The documentation of classifications will improve, making them easy to find for use in-house and for the provision of information service
The new classification system will support smooth movement between data descriptions, variable descriptions and maintenance of classifications and thus improve the efficiency of the maintenance and use of classifications in statistics
General benefits of the common classification system
A centralised classification system eases the workload needed to maintain classifications because classifications are only maintained in one place
Reduces the possibility of errors because classifications are documented in the system consistently so that they are accessible to everybody and easy to find
Improves the efficiency of time use because working hours need not be spent on looking for classifications and trying to find their background information
Makes the classifications used in different statistics visible to everybody and thus creates possibilities for their harmonisation
In conclusion: Why do some statistics departments still have their own metadata systems instead of using the centralized system?
Centralised metadata work progresses too slowly from the perspective of individual statistics – We should rethink our construction and implementation strategy
Common attitude still regards the process of an individual set of statistics as unique, and therefore incapable of exploiting systems that are meant for all statistics – We have to get quick results to prove the benefits of the system
Commitment by the Management and their support for the work are crucial – We have to convince them
THANK YOU FOR YOUR ATTENTION!