Technical Overview of SDMX and DDI : Describing Microdata

Arofan Gregory, Metadata Technology

Outline

• Background

• Capabilities of SDMX for Describing Microdata and Related Information
– Intended use
– The nature of microdata

• Capabilities of DDI for Describing Microdata and Related Information

• Comparison

• Criteria for Choosing a Standard

Background

• There has been much discussion of how SDMX and DDI relate
– UN/ECE SDMX-DDI Dialogue: a discussion involving users and members from both standards bodies
– METIS and other conferences
– HLG and GSIM

• In order to understand how a standard should be chosen, we need to understand the implications of our choices

Background (continued)

• First, we must understand the capabilities of each standard, and whether it supports what we are trying to do

• We must consider the implications for IT infrastructure and tools used within the organization

• We must understand the cost of adopting each standard in terms of staff and organizational capabilities

Points of Discussion

• For time-series and aggregate data reported to international organizations, SDMX is seen as the best standard to use
– But it is possible to describe aggregates in DDI

• For describing questionnaires, DDI is the preferred standard in most cases
– But it is possible to describe questionnaires using SDMX

• For describing microdata sets, there is no simple choice: both standards are useful for certain microdata sets

Comparison

• In order to compare the standards for certain purposes, we will look at the functionalities they were designed for, and then consider the implications

SDMX Capabilities

• SDMX is able to describe many types of data
– Time series and cross-sectional aggregates
– “Reference” metadata in a very configurable way (e.g., quality frameworks and methodological metadata)
– Information about managing data exchange between counterparties

• Data description is highly dimensional
– All data sets are seen as having a dimensional structure for addressing each observation within the data set
– Microdata can be modelled in a dimensionalized way, as well as aggregate data

• SDMX is designed to support specific types of microdata
– Financial transactional registers
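To illustrate what a "dimensional structure for addressing each observation" means, here is a minimal Python sketch. The dimension names (SEX, AGE, REGION), the codes, and the values are invented for illustration and are not taken from any SDMX structure definition. The dot-separated string only mimics the general shape of a dimensional key.

```python
# Hypothetical dimensional data set: every observation is addressed by one
# value per dimension. Names, codes, and figures are illustrative only.
DIMENSIONS = ("SEX", "AGE", "REGION")

observations = {
    ("F", "15-24", "NORTH"): 41.2,
    ("F", "25-54", "NORTH"): 78.9,
    ("M", "15-24", "NORTH"): 45.0,
    ("M", "25-54", "SOUTH"): 83.4,
}


def make_key(**dimension_values):
    """Build a full key, requiring a value for every dimension."""
    missing = [d for d in DIMENSIONS if d not in dimension_values]
    if missing:
        raise KeyError(f"missing dimensions: {missing}")
    return tuple(dimension_values[d] for d in DIMENSIONS)


def key_string(key):
    """Render the key as a dot-separated string."""
    return ".".join(key)


key = make_key(SEX="F", AGE="25-54", REGION="NORTH")
value = observations[key]
```

In this sketch an observation cannot be located unless every dimension is given a value, which is the property the slide describes.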

SDMX Capabilities (Continued)

• SDMX Reference metadata does not provide an explicit modelling of the metadata it can describe
– You define the needed concepts
– Concepts are arranged into a flat or hierarchical structure
– Concepts are given suitable representations

• But nothing in the SDMX specifications provides the model
– This is provided by the using organization, and can be a standard (e.g., Eurostat’s quality frameworks)
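The configuration steps on this slide can be sketched as a small data structure. This is a minimal Python illustration, not the SDMX information model: the `Concept` class, the concept identifiers, and the representation labels are all hypothetical, standing in for whatever a using organization would define.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """A user-defined concept with a representation and optional children."""
    concept_id: str
    representation: str  # illustrative labels such as "text" or "coded"
    children: List["Concept"] = field(default_factory=list)


# The hierarchy is the organization's choice; nothing here comes from the standard.
quality_report = Concept("QUALITY", "none (grouping only)", [
    Concept("ACCURACY", "text", [
        Concept("SAMPLING_ERROR", "text"),
        Concept("NON_SAMPLING_ERROR", "text"),
    ]),
    Concept("TIMELINESS", "coded"),
])


def flatten(concept, path=()):
    """Yield (dotted path, representation) for every concept in the hierarchy."""
    current = path + (concept.concept_id,)
    yield ".".join(current), concept.representation
    for child in concept.children:
        yield from flatten(child, current)


paths = list(flatten(quality_report))
```

The same three steps appear in the code: concepts are defined, arranged into a hierarchy, and each is given a representation. The meaning of the hierarchy lives entirely in the organization's configuration.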

SDMX Capabilities (Continued)

• Questionnaires can be described as SDMX Reference metadata structures
– The now-finished ESSnet project proved this to be the case
– But it was a very complicated use of this SDMX feature set

• Methodological metadata can be expressed as SDMX Reference metadata
– This works quite well, but is not necessarily “standard”

The Nature of Microdata

• When we consider aggregate data, there are clear dimensions, sufficient to differentiate every observation in a data set
– E.g., percentage of employment expressed as Sex by Age by Region

• Microdata can also be described dimensionally
– Any classificatory variable can act as a dimension
– But each record also has a case identifier
– The variables often hold different types of measures

• Unlike aggregate data, there are very few necessary dimensions for identifying an observation
– All you need is the case identifier and the variable

The Nature of Microdata (Continued)

• Microdata can also be described in a different way
– As a rectangular table where variables are columns, and cases are rows
– This is a very common way to describe the structure of microdata
– Many tools use this approach (SAS, SPSS, Stata, etc.)
– This is a much more “relational” approach than a dimensionalized one (as seen in OLAP data warehouses, for example)
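The two views of microdata can be compared with a small Python sketch. The records and variable names are invented. The point is only that the rectangular table and the (case identifier, variable) addressing carry the same information.

```python
# Rectangular (unit-record) view: one row per case, one column per variable.
columns = ["case_id", "sex", "age", "income"]
rows = [
    [1, "F", 34, 52000],
    [2, "M", 51, 47000],
    [3, "F", 27, 39000],
]


def to_dimensional(columns, rows, id_column="case_id"):
    """Re-express the table so each value is addressed by (case id, variable)."""
    id_index = columns.index(id_column)
    cells = {}
    for row in rows:
        case_id = row[id_index]
        for name, value in zip(columns, row):
            if name != id_column:
                cells[(case_id, name)] = value
    return cells


cells = to_dimensional(columns, rows)
# Two "dimensions" are enough to find any value: the case and the variable.
age_of_case_2 = cells[(2, "age")]
```

Note that the values under one key scheme are of mixed types (codes, ages, amounts), which reflects the earlier observation that microdata variables often hold different types of measures.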

The Capabilities of DDI

• DDI comes from the data archive community, which has a strong focus on the microdata deposited by social science researchers
– It has excellent capabilities for describing microdata sets using the unit-record (row-column) paradigm
– Also good capabilities for describing various phases of the data lifecycle: data collection, archiving, data processing, tabulation, methodology with explicit models

DDI Capabilities (Continued)

• Very detailed description of questionnaires
– Also an explicit model

• DDI provides a description of the aggregation process, including the structural metadata for dimensionalized data sets (“Ncubes”)
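As an illustration of what an aggregation process connects, the sketch below tabulates a few invented unit records into a small dimensional table. It is plain Python, not DDI markup. The variable names and the choice of a simple count as the measure are assumptions made for the example.

```python
from collections import Counter

# Invented unit records (microdata): one dict per case.
records = [
    {"case_id": 1, "sex": "F", "region": "NORTH"},
    {"case_id": 2, "sex": "M", "region": "NORTH"},
    {"case_id": 3, "sex": "F", "region": "SOUTH"},
    {"case_id": 4, "sex": "F", "region": "NORTH"},
]


def tabulate(records, dimensions):
    """Count cases for each combination of the chosen classificatory variables."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)


cube = tabulate(records, ("sex", "region"))
```

Each cell of the result is keyed by (sex, region), and the case identifier no longer appears: the classificatory variables of the microdata have become the dimensions of the aggregate.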

Comparison

• Because of the use cases which SDMX and DDI were designed to support, they have specific strengths
– SDMX for exchange, reporting, and dissemination of aggregate data
– DDI for describing the data collection and resulting microdata, along with the processes applied to it

• But both standards can be used to support some common use cases
– Questionnaires
– Microdata description
– Dimensionalized data

Comparison (Continued)

• For microdata description specifically, there are some significant differences between the standards' capabilities
– SDMX has the data in an XML format, which can be problematic for large data sets
– DDI describes an ASCII data file (or other external format)
– DDI can describe data files with different linked record types
– SDMX cannot do this
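To make "different linked record types" concrete, here is a minimal Python sketch of a hierarchical ASCII file in which household records and person records share one file. The fixed-width layout, the record-type codes `H` and `P`, and the field names are invented for this example. In practice the layout would come from the metadata describing the file.

```python
import io

# Hypothetical layout: column 1 = record type, columns 2-4 = household id,
# remaining columns depend on the record type.
ASCII_FILE = io.StringIO(
    "H001NORTH\n"
    "P00101F34\n"
    "P00102M36\n"
    "H002SOUTH\n"
    "P00201F61\n"
)


def read_linked_records(handle):
    """Split one physical file into household and person tables linked by hh_id."""
    households, persons = {}, []
    for line in handle:
        line = line.rstrip("\n")
        record_type, hh_id = line[0], line[1:4]
        if record_type == "H":
            households[hh_id] = {"region": line[4:]}
        elif record_type == "P":
            persons.append({
                "hh_id": hh_id,
                "person_no": line[4:6],
                "sex": line[6],
                "age": int(line[7:9]),
            })
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return households, persons


households, persons = read_linked_records(ASCII_FILE)
# Follow the link from a person record to its household record.
region_of_first_person = households[persons[0]["hh_id"]]["region"]
```

A single rectangular table cannot hold this file directly, because the two record types have different columns. The description has to state both layouts and the key that links them.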

Comparison (Continued)

• For describing questionnaires, and other types of related metadata (methodology, etc.), there are also major differences
– SDMX relies on the Reference Metadata mechanism for these metadata, which has no specified model (it is configured by users)
– DDI has an explicit model in the standard itself
– These facts can be strengths or weaknesses

Criteria for Choosing a Standard

• Does the standard support needed functionality?
– E.g., SDMX can describe questionnaires, but if you need detailed flow logic, DDI is much better

• How good is tools support for the needed functions?
– E.g., for graphical data display, SDMX has good tools; DDI does not

• Is there a high cost in terms of the learning curve?
– Maintaining competencies among staff can be costly and difficult
– Using a familiar standard may be the best choice
– SDMX and DDI both require a significant learning investment for developers

Conclusions

• SDMX and DDI were designed to support different uses, and have different strengths as a result

• In most cases, SDMX is better for dimensionalized data sets, exchange, and dissemination

• DDI is generally better for working with microdata and its collection and processing

• But: The choice of a suitable standard can only be made by taking into consideration a larger number of factors – it is not a simple black-and-white choice
