Metadata Models in Survey Computing: Some Results of MetaNet – WG 2 (METIS 2004, Geneva; W. Grossmann, University of Vienna)


Metadata Models in Survey Computing

Some Results of MetaNet – WG 2

METIS 2004, Geneva

W. Grossmann, University of Vienna

10.02.2004 METIS 2004 Geneva 2

Contents

MetaNet
Requirements for Models
Key Features of the Model
Implications for Terminology

METANET 1

A network of excellence funded by EUROSTAT, 2000/01 – 2003, with five work groups:
WG 1: Methodology and Tools
WG 2: Harmonisation of Metadata – Structure and Definitions
WG 3: Best Practice for Migration
WG 4: Adoption Issues
WG 5: Terminology (ad hoc)

METANET 2

Within WG 2 there were two different approaches:
Terminology Model (cf. WP 12)
Unified Metadata Architecture for Statistics (UMAS model)

METANET 3

Intention of the UMAS model

Statistics deals with different kinds of data, e.g. surveys, registers, classifications, …

These data show a dynamic defined by statistical processing activities.

Goal: define a model which supports not only the description of the data but also the description of this statistical dynamic.

Requirement Analysis 1
Method

The requirement analysis is based on:
Examination of a number of activities in survey processing
Documentation of these activities inside statistical systems, in particular the proposals of Banca d’Italia, DDI, OECD, SCB-DOK, SDDS and Statistics Netherlands (Input-Throughput-Output model)
General methods for documentation, e.g. Dublin Core, facet classifications, ISO standards

Requirement Analysis 2
Example A: Sampling

Terminology view: Sampling is the process of selecting a number of cases from all the cases in a particular group or universe.

Operational view:
Input: sampling frame
Output: sample

Requirement Analysis 3
Example A: Sampling

Details of the operational view:
What is an appropriate definition of the sampling frame, given the problem (e.g. coverage)?
What additional information should be available for the sampling frame (e.g. auxiliary variables)?
How can we obtain an appropriate representation of the desired sampling frame (e.g. by merging existing frames or selecting from existing frames)?
Who is responsible for the frame in the future?

Requirement Analysis 4
Example A: Sampling

Which sampling technique is appropriate for our problem? (Note that there are relations between the structure of the sampling frame and the possible sampling techniques.)
In which form is the output (i.e. the sample) represented in the system?
Who is responsible for the sampling procedure?
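The relation between frame structure and sampling technique can be illustrated with stratified sampling, which is only possible if every frame record carries the stratification variable. A minimal sketch, assuming frame records are Python dicts and a fixed allocation of sample sizes per stratum; `draw_stratified` is a hypothetical helper, not a function from the source.

```python
import random
from collections import defaultdict


def draw_stratified(frame, stratum_var, allocation, seed=None):
    """Stratified simple random sampling: draw allocation[h] units from stratum h.

    Raises ValueError if the frame does not support this technique, i.e. if
    some records lack the stratification variable.
    """
    if any(stratum_var not in rec for rec in frame):
        raise ValueError(f"frame lacks '{stratum_var}' for some records; "
                         "stratified sampling is not possible")
    strata = defaultdict(list)
    for rec in frame:
        strata[rec[stratum_var]].append(rec)
    rng = random.Random(seed)
    sample = []
    for h, n_h in allocation.items():
        # Independent simple random sample without replacement within each stratum.
        sample.extend(rng.sample(strata[h], n_h))
    return sample


frame = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]
sample = draw_stratified(frame, "region", {"north": 6, "south": 4}, seed=1)
```

A frame without the stratum variable would restrict the choice to techniques such as simple random sampling, which is the kind of dependency the slide points to.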

Requirement Analysis 5
Example B: Editing

Terminology view: Editing is the process of detecting and adjusting individual errors in data records resulting from data collection and capture…

Operational view:
Input: a variable together with a set of admissible values for the variable within a specific context
Output: a summary statement about the quality of the variable, or a listing of errors for each case
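This input/output pair can be sketched as a single function. It is a minimal illustration, assuming records are Python dicts with a `case_id` key and treating a missing value as an error; `edit_variable` is a hypothetical name, and the admissible values are a simple code list for sex.

```python
def edit_variable(records, var, admissible):
    """Check one variable against its set of admissible values.

    Returns a listing of errors per case and a one-line summary statement.
    A missing value (key absent) is treated as an error in this sketch.
    """
    errors = [(rec["case_id"], rec.get(var))
              for rec in records if rec.get(var) not in admissible]
    summary = f"{var}: {len(errors)} of {len(records)} cases outside the admissible values"
    return errors, summary


records = [
    {"case_id": 1, "sex": "f"},
    {"case_id": 2, "sex": "m"},
    {"case_id": 3, "sex": "x"},
    {"case_id": 4},  # value missing
]
errors, summary = edit_variable(records, "sex", {"f", "m"})
print(errors)   # [(3, 'x'), (4, None)]
print(summary)  # sex: 2 of 4 cases outside the admissible values
```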

Requirement Analysis 6
Example B: Editing

Details of the operational view: the context may be defined in various ways.

By subject matter considerations, e.g. there is only one person in a household who can claim to be head of household

By more technical considerations, e.g. using 1000 € as the measurement unit for annual income

By purely technical conventions, e.g. “f” for female and “m” for male

Requirement Analysis 7
Example B: Editing

The context defines rules for the admissible values of the variable:
within one data set,
within one infological model (e.g. person-household),
within a time series.

Rules may be formulated:
as strong constraints, i.e. logical conditions on combinations of values,
as soft constraints, i.e. statistical conditions on combinations of values.

Rules have to be processed in algorithmic form and maintained by an administrative procedure.
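The distinction between strong and soft constraints can be sketched in code. This is a minimal illustration under assumptions: person records are dicts, the strong constraint is the head-of-household rule (at most one head per household), and the soft constraint is a modified z-score based on the median and the median absolute deviation (MAD), one common outlier rule among many and not one prescribed by UMAS. Function names are hypothetical.

```python
import statistics
from collections import Counter


def hard_check_heads(persons):
    """Strong constraint (logical): at most one head of household per household.

    Returns the households that violate the rule."""
    heads = Counter(p["household"] for p in persons if p["is_head"])
    return sorted(h for h, count in heads.items() if count > 1)


def soft_check_outliers(values, k=3.5):
    """Soft constraint (statistical): flag cases far from the median.

    Uses the modified z-score 0.6745 * |x - median| / MAD; flagged cases are
    only suspicious and need review, they are not rejected automatically."""
    med = statistics.median(values.values())
    mad = statistics.median(abs(v - med) for v in values.values())
    if mad == 0:
        return []
    return sorted(case for case, v in values.items()
                  if 0.6745 * abs(v - med) / mad > k)


persons = [
    {"household": "H1", "person": 1, "is_head": True},
    {"household": "H1", "person": 2, "is_head": False},
    {"household": "H2", "person": 3, "is_head": True},
    {"household": "H2", "person": 4, "is_head": True},
]
income = {1: 32, 2: 41, 3: 38, 4: 35, 5: 29, 6: 950}  # annual income in 1000 €

print(hard_check_heads(persons))     # ['H2']
print(soft_check_outliers(income))   # [6]
```

Whether a flagged case is corrected remains a decision for the administrative procedure that maintains the rules.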

Requirement Analysis 8
Example C: Weighting

Terminology view: Weight is the importance of an object in relation to a set of objects to which it belongs; …

Operational view:
Input: a statistical dataset together with appropriate information
Output: the statistical dataset augmented by the weight information
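One simple instance of a dataset augmented by weight information is the design (base) weight of a stratified simple random sample, w = N_h / n_h for every sampled unit in stratum h. A minimal sketch, assuming the stratum population sizes N_h are known and records are Python dicts; `add_base_weights` is a hypothetical name.

```python
from collections import Counter


def add_base_weights(sample, stratum_var, population_sizes):
    """Return copies of the sample records augmented with the base weight N_h / n_h."""
    n_h = Counter(rec[stratum_var] for rec in sample)  # sample size per stratum
    return [
        {**rec, "weight": population_sizes[rec[stratum_var]] / n_h[rec[stratum_var]]}
        for rec in sample
    ]


sample = [
    {"id": 1, "region": "north"}, {"id": 2, "region": "north"},
    {"id": 3, "region": "south"}, {"id": 4, "region": "south"}, {"id": 5, "region": "south"},
]
weighted = add_base_weights(sample, "region", {"north": 600, "south": 300})
print([r["weight"] for r in weighted])  # [300.0, 300.0, 100.0, 100.0, 100.0]
```

The weights add up to the population size (900), and the input records are left unchanged, so both the original and the augmented dataset remain available.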

Requirement Analysis 9
Example C: Weighting

Details of the operational view:
Which subject matter problem should be solved by weighting (e.g. representation of strata, post-stratification, …)?
Which procedure should be used for weighting (e.g. base weights, calibration weights, …)?
In which form are the data and the additional information about the population available (e.g. population data as a summary table or as a register with auxiliary variables)?
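When the population information is available as a summary table, post-stratification is one way to use it: weights are rescaled so that within each post-stratum g they add up to the known total N_g. A minimal sketch, assuming each record already carries a starting weight and the summary table is a dict of totals; `post_stratify` is a hypothetical name.

```python
from collections import defaultdict


def post_stratify(records, post_var, population_totals, weight_var="weight"):
    """Rescale weights so that they sum to N_g within each post-stratum g."""
    weight_sums = defaultdict(float)
    for rec in records:
        weight_sums[rec[post_var]] += rec[weight_var]
    return [
        {**rec, weight_var: rec[weight_var] * population_totals[rec[post_var]]
                             / weight_sums[rec[post_var]]}
        for rec in records
    ]


records = [
    {"id": 1, "sex": "f", "weight": 100.0},
    {"id": 2, "sex": "f", "weight": 100.0},
    {"id": 3, "sex": "f", "weight": 100.0},
    {"id": 4, "sex": "m", "weight": 100.0},
]
# Summary table with known population totals per post-stratum.
adjusted = post_stratify(records, "sex", {"f": 240, "m": 160})
print([r["weight"] for r in adjusted])  # [80.0, 80.0, 80.0, 160.0]
```

With a register containing auxiliary variables instead of a summary table, the totals would first have to be computed from the register.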

Requirement Analysis 10
Example C: Weighting

How can we access and combine the different data?

Who is responsible for the different datasets?

How is the output represented (e.g. as weights for the dataset, as weights for the sampling procedure, as a summary table)?

Are we interested in reusing the procedure for new data sets (e.g. the same weighting procedure within a series)?

Requirement Analysis 11
Example D: Analytical Units

Terminology view: Analytical units represent real or artificially constructed units for which statistics are compiled.

Operational view:
Input: two or more statistical units
Output: a new statistical unit
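A typical case is deriving households from person records. A minimal sketch, assuming each person record carries a `household_id` and an income in 1000 €; the derived unit and its attributes (`size`, `income`) are illustrative, not taken from the source.

```python
from collections import defaultdict


def build_households(persons):
    """Derive a new statistical unit (household) from two or more person units."""
    households = defaultdict(lambda: {"size": 0, "income": 0.0})
    for p in persons:
        hh = households[p["household_id"]]
        hh["size"] += 1               # number of persons in the household
        hh["income"] += p["income"]   # household income as the sum over persons
    return {hh_id: dict(attrs) for hh_id, attrs in households.items()}


persons = [
    {"person_id": 1, "household_id": "H1", "income": 30.0},
    {"person_id": 2, "household_id": "H1", "income": 12.5},
    {"person_id": 3, "household_id": "H2", "income": 41.0},
]
households = build_households(persons)
print(households)  # {'H1': {'size': 2, 'income': 42.5}, 'H2': {'size': 1, 'income': 41.0}}
```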

Requirement Analysis 12
Example D: Analytical Units

Details of the operational view:
What is the conceptual definition of the statistical units?
How are the conceptual definitions captured by operational characteristics (e.g. auxiliary variables)?
How can we access and manipulate the operational characteristics in order to produce the new analytical unit?
How is the new analytical unit embedded into an existing administrative framework?

Requirement Analysis 13
Summary

The examples show that:
Models should be based on terminology but are more than terminology.
Models have to consider different types of “statistical” objects.
For these objects we have to know the concepts represented as data, together with the relations between the concepts.
We have to know the statistical meaning of the objects, together with their statistical relations.

Requirement Analysis 14
Summary

We have to take into account the specific format of realisation of the objects as physical datasets

We must include statements about responsibility, access rights and other administrative details

We need a flexible coupling mechanism for the objects according to processing needs

We have to develop a description formalism for statistical processing

We have to take into account information requirements of external users

Key Features of the Model 1

To meet the different aspects of the requirement analysis, a model with four facets was designed, resembling the facet classifications used by librarians and archivists.

Key Features of the Model 2

“Structure Facet”

The objects of interest, the so-called “statistical categories”: statistical unit, statistical population, statistical variables and statistical values, together with a number of related objects such as classifications, statistical datasets,

Key Features of the Model 3

“statistical domains” for coupling objects according to processing needs (basically a system of catalogues for the other objects)

Each instance of the structure has a twofold representation inside a system:
as data (“Category-Instance data”),
as description (“Category-Instance model”, i.e. metadata).
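The twofold representation can be sketched as one object that holds both parts. This is an illustrative data structure, not the UMAS specification: the class names and the `admissible_values` field are assumptions, chosen to show how the description (metadata) can be used to check the data.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryInstanceModel:
    """Description (metadata) of one instance of a statistical category."""
    category: str                               # e.g. "statistical variable"
    name: str
    definition: str
    admissible_values: frozenset = frozenset()  # empty means "no restriction"


@dataclass
class CategoryInstance:
    """One instance with its twofold representation: description plus data."""
    model: CategoryInstanceModel                # "Category-Instance model" (metadata)
    data: list = field(default_factory=list)    # "Category-Instance data"

    def conforms(self) -> bool:
        """Check the data against the admissible values recorded in the model."""
        allowed = self.model.admissible_values
        return not allowed or all(v in allowed for v in self.data)


sex = CategoryInstance(
    model=CategoryInstanceModel("statistical variable", "sex",
                                "Sex of the person", frozenset({"f", "m"})),
    data=["f", "m", "f"],
)
print(sex.conforms())  # True
```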

Key Features of the Model 4

“View Facet” describes the instances from four points of view:

“Conceptual point of view”: the subject matter definition

“Statistical point of view”: the statistical properties of the instances necessary for processing

“Data management point of view”: all information necessary for machine-supported storage and manipulation

“Administrative point of view”: management and bookkeeping of the structures
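One way to make the four views concrete is a record with one slot per view, plus a check for views not yet described. A minimal sketch; the field names and the example entries (column name, data type) are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ViewFacet:
    """Description of one instance from the four points of view."""
    conceptual: dict        # subject matter definition
    statistical: dict       # statistical properties needed for processing
    data_management: dict   # information for storage and manipulation
    administrative: dict    # management and bookkeeping

    def missing_views(self) -> list[str]:
        """Names of the views that have not been described yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


income = ViewFacet(
    conceptual={"definition": "Annual income of the person"},
    statistical={"scale": "ratio", "unit": "1000 €"},
    data_management={"column": "INC_ANN", "type": "decimal(9,1)"},
    administrative={},  # responsibility and access rights not yet recorded
)
print(income.missing_views())  # ['administrative']
```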

Key Features of the Model 5

“Stage Facet” describes processing at the data level as well as at the metadata level.

Stages: DEFINITION and DESIGN; PRODUCTION PROCESSING; DISSEMINATION and EXCHANGE

Metadata level: { CM1, CM2, … } and { TM(CM1, CM2, …) }

Data level: { CID1, CID2, … } and { TD(CID1, CID2, …) }
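The idea that a processing step acts on both levels can be sketched as a transformation with a metadata part TM and a data part TD that are always applied together. Assumptions: one input instead of several (CM1, CM2, …), a model reduced to a name and a unit, and hypothetical class names; the example converts income to 1000 €.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Model:
    """Metadata-level description (stands in for a CM)."""
    name: str
    unit: str


@dataclass(frozen=True)
class Transformation:
    """A processing step with a metadata part (TM) and a data part (TD)."""
    tm: Callable[[Model], Model]
    td: Callable[[list[float]], list[float]]

    def apply(self, model: Model, data: list[float]) -> tuple[Model, list[float]]:
        # Apply both parts together so that description and data stay consistent.
        return self.tm(model), self.td(data)


to_thousands = Transformation(
    tm=lambda m: replace(m, unit="1000 €"),
    td=lambda xs: [x / 1000 for x in xs],
)
model, data = to_thousands.apply(Model("annual income", "€"), [32000.0, 41500.0])
print(model.unit, data)  # 1000 € [32.0, 41.5]
```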

Key Features of the Model 6

“Production blueprint”: keeps the information on how the instance is set up inside the system, according to the four views of the view facet

“Processing blueprint”: describes the processing activities for the instances, according to the four views of the view facet

Key Features of the Model 7

“Function facet”: all aspects of communication and usage of meta-information by humans, inside the system as well as in connection with dissemination and exchange

Who is involved in communication?
What information is communicated?
How is the information communicated?

Implications for Terminology 1

Statistical information systems use terminology from different sources: statistics, computer science, economics, social sciences, …

We can at best collect terminology and bring it into an order according to some model.

This ordering defines a “statistical ontology”, which sits above terminology and has to rely on only a few commonly agreed terms.

Implications for Terminology 2

Activities of METANET WG 4 showed that there is rather broad agreement among statisticians about the main terms for important structures:
Statistical Unit
Statistical Variable
Statistical Values

Implications for Terminology 3

Based on such agreement, we can assign each terminology item a specification according to the ontology.

UMAS proposes the following classification:
What is the source of the terminology (e.g. statistics, general standards, application area, …)?
To which structure does the term apply?
For which view is the term used?
In which processing stage is the term used?
Which function aspects does the term cover?
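The five classification questions translate naturally into a record with one field per facet, which then supports retrieval of terms by facet. A minimal sketch; the three glossary entries and their classifications are illustrative assumptions, not results of MetaNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    """A terminology item specified along the UMAS classification questions."""
    term: str
    source: str      # e.g. "statistics", "general standards", "application area"
    structure: str   # structure to which the term applies
    view: str        # conceptual, statistical, data management or administrative
    stage: str       # processing stage in which the term is used
    function: str    # function aspect covered by the term


glossary = [
    TermEntry("sampling frame", "statistics", "statistical population",
              "statistical", "definition and design", "internal communication"),
    TermEntry("edit rule", "statistics", "statistical variable",
              "statistical", "production processing", "internal communication"),
    TermEntry("access right", "general standards", "statistical dataset",
              "administrative", "dissemination and exchange", "exchange"),
]


def select(entries, **criteria):
    """Return the terms whose facet values match all given criteria."""
    return [e.term for e in entries
            if all(getattr(e, k) == v for k, v in criteria.items())]


print(select(glossary, view="statistical", stage="production processing"))  # ['edit rule']
```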

Implications for Terminology 4

MCV (SDMX) proposes the following classification:
Administration (close relation to the administrative view)
Concepts, Definitions, Standards (close relation to the conceptual view and structure)
Data Collection, manipulating/accounting conventions (close relation to the stage facet)
Quality and performance metadata (close relation to the function facet in connection with dissemination and exchange)

Summary

Statistical processing activities define metadata requirements that are usually not considered in traditional data modelling.

The processing activities require a model which supports flexible coupling of entities according to processing needs.

Besides terminology, we also need a specification of the terminology in the context of a statistical ontology.

Thank you!
