Post on 31-Dec-2015
Technical Overview of SDMX and DDI : Describing Microdata
Arofan Gregory
Metadata Technology
Outline
• Background
• Capabilities of SDMX for Describing Microdata and Related Information
– Intended use
– The nature of microdata
• Capabilities of DDI for Describing Microdata and Related Information
• Comparison
• Criteria for Choosing a Standard
Background
• There has been much discussion of how SDMX and DDI relate
– UN/ECE SDMX-DDI Dialogue: a discussion involving users and members from both standards bodies
– METIS and other conferences
– HLG and GSIM
• In order to understand how a standard should be chosen, we need to understand the implications of our choices
Background (continued)
• First, we must understand the capabilities of each standard, and whether it supports what we are trying to do
• We must consider the implications for IT infrastructure and tools used within the organization
• We must understand the cost of adopting each standard in terms of staff and organizational capabilities
Points of Discussion
• For time-series and aggregate data reported to international organizations, SDMX is seen as the best standard to use
– But it is possible to describe aggregates in DDI
• For describing questionnaires, DDI is the preferred standard in most cases
– But it is possible to describe questionnaires using SDMX
• For describing microdata sets, there is no simple choice: both standards are useful for certain microdata sets
Comparison
• In order to compare the standards for certain purposes, we will look at the functionalities they were designed for, and then consider the implications
SDMX Capabilities
• SDMX is able to describe many types of data
– Time series and cross-sectional aggregates
– “Reference” metadata in a very configurable way (e.g., quality frameworks and methodological metadata)
– Information about managing data exchange between counterparties
• Data description is highly dimensional
– All data sets are seen as having a dimensional structure for addressing each observation within the data set
– Microdata can be modelled in a dimensionalized way, as well as aggregate data
• SDMX is designed to support specific types of microdata
– Financial transactional registers
SDMX Capabilities (Continued)
• SDMX Reference metadata does not provide an explicit modelling of the metadata it can describe
– You define the needed concepts
– Concepts are arranged into a flat or hierarchical structure
– Concepts are given suitable representations
• But nothing in the SDMX specifications provides the model
– This is provided by the using organization, and can be a standard (e.g., Eurostat’s quality frameworks)
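The three steps above (define concepts, arrange them, give them representations) can be sketched in a few lines of Python. This is only an illustration of the idea, not SDMX syntax or an SDMX library: the `Concept` class, the `flatten` helper, and the concept identifiers are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Concept:
    """A user-defined metadata concept (hypothetical class, not an SDMX API)."""
    concept_id: str
    representation: str  # e.g. "text", "date", or the name of a code list
    children: List["Concept"] = field(default_factory=list)


def flatten(concept: Concept, prefix: str = "") -> List[Tuple[str, str]]:
    """Return (dotted path, representation) pairs for a concept and its descendants."""
    path = f"{prefix}.{concept.concept_id}" if prefix else concept.concept_id
    rows = [(path, concept.representation)]
    for child in concept.children:
        rows.extend(flatten(child, path))
    return rows


# The hierarchy is supplied by the using organization; these concept names
# are invented for illustration only.
quality = Concept("QUALITY", "text", [
    Concept("ACCURACY", "text", [Concept("SAMPLING_ERROR", "text")]),
    Concept("TIMELINESS", "date"),
])

report_structure = flatten(quality)
for path, representation in report_structure:
    print(path, representation)
```

Nothing in the sketch knows what "quality" means: the structure is whatever the organization configures, which is the point made on the slide.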
SDMX Capabilities (Continued)
• Questionnaires can be described as SDMX Reference metadata structures
– The now-finished ESSnet project proved this to be the case
– But it was a very complicated use of this SDMX feature set
• Methodological metadata can be expressed as SDMX Reference metadata
– This works quite well, but is not necessarily “standard”
The Nature of Microdata
• When we consider aggregate data, there are clear dimensions, sufficient to differentiate every observation in a data set
– E.g., Percentage of Employment expressed as Sex by Age by Region
• Microdata can also be described dimensionally
– Any classificatory variable can act as a dimension
– But each record also has a case identifier
– The variables often hold different types of measures
• Unlike aggregate data, there are very few necessary dimensions for identifying an observation
– All you need is the case identifier and the variable
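The difference in addressing can be shown with two small Python dictionaries. This is a minimal sketch under stated assumptions: the dimension values, case identifiers, and figures are made up for illustration, and `lookup` is a hypothetical helper.

```python
# Aggregate data: an observation is addressed by one value per dimension
# (here Sex by Age by Region). All figures are invented.
employment_pct = {
    ("F", "15-24", "North"): 41.2,
    ("M", "15-24", "North"): 44.8,
    ("F", "25-54", "North"): 78.5,
}

# Microdata viewed dimensionally: the case identifier plus the variable
# name is enough to address any single value. Note that the variables
# hold different types of measures (a code, an integer, an amount).
microdata = {
    (1, "sex"): "F", (1, "age"): 23, (1, "income"): 18500.0,
    (2, "sex"): "M", (2, "age"): 47, (2, "income"): 32000.0,
}


def lookup(observations, *key):
    """Return the observation addressed by the given key values."""
    return observations[key]


print(lookup(employment_pct, "F", "15-24", "North"))
print(lookup(microdata, 1, "age"))
```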
The Nature of Microdata (Continued)
• Microdata can also be described in a different way
– As a rectangular table where variables are columns, and cases are rows
– This is a very common way to describe the structure of microdata
– Many tools use this approach (SAS, SPSS, Stata, etc.)
– This is a much more “relational” approach than a dimensionalized one (as seen in OLAP data warehouses, for example)
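The rectangular and dimensionalized views carry the same information, which a short conversion in both directions makes concrete. The sketch below uses only the Python standard library; the column names, the sample rows, and the two function names are assumptions made for the example.

```python
import csv
import io

# A rectangular (unit-record) file: one row per case, one column per variable.
RECTANGULAR = """case_id,sex,age,income
1,F,23,18500
2,M,47,32000
"""


def to_dimensional(csv_text, id_column="case_id"):
    """Turn each cell into an observation keyed by (case identifier, variable)."""
    observations = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        case = row[id_column]
        for variable, value in row.items():
            if variable != id_column:
                observations[(case, variable)] = value
    return observations


def to_rectangular(observations):
    """Rebuild one record per case from (case, variable) observations."""
    table = {}
    for (case, variable), value in observations.items():
        table.setdefault(case, {})[variable] = value
    return table


observations = to_dimensional(RECTANGULAR)
records = to_rectangular(observations)
print(len(observations), "observations,", len(records), "cases")
```

Values stay as strings here because the CSV reader does not infer types; a real description of the file would also record each variable's data type.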
The Capabilities of DDI
• DDI comes from the data archive community, which has a strong focus on the microdata deposited by social science researchers
– It has excellent capabilities for describing microdata sets using the unit-record (row-column) paradigm
– Also good capabilities for describing various phases of the data lifecycle: data collection, archiving, data processing, tabulation, and methodology, with explicit models
DDI Capabilities (Continued)
• Very detailed description of questionnaires
– Also an explicit model
• DDI provides a description of the aggregation process, including the structural metadata for dimensionalized data sets (“Ncubes”)
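As a rough picture of what such an aggregation does, the following Python sketch counts unit records into cells addressed by the chosen dimensions. It does not use DDI syntax or any DDI tooling; the records, the variable names, and the `aggregate` function are hypothetical.

```python
from collections import Counter

# Unit records (microdata); all values are invented for illustration.
unit_records = [
    {"case_id": 1, "sex": "F", "region": "North"},
    {"case_id": 2, "sex": "M", "region": "North"},
    {"case_id": 3, "sex": "F", "region": "South"},
    {"case_id": 4, "sex": "F", "region": "North"},
]


def aggregate(records, dimensions):
    """Count cases in each cell of the cube defined by the given dimensions."""
    return Counter(tuple(record[d] for d in dimensions) for record in records)


cube = aggregate(unit_records, ("sex", "region"))
for cell, count in sorted(cube.items()):
    print(cell, count)
```

The classificatory variables of the microdata become the dimensions of the cube, and the case identifier disappears in the result.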
Comparison
• Because of the use cases which SDMX and DDI were designed to support, they have specific strengths
– SDMX for exchange, reporting, and dissemination of aggregate data
– DDI for describing the data collection and resulting microdata, along with the processes applied to it
• But both standards can be used to support some common use cases
– Questionnaires
– Microdata description
– Dimensionalized data
Comparison (Continued)
• For microdata description specifically, there are some significant differences between the standards’ capabilities
– SDMX has the data in an XML format, which can be problematic for large data sets
– DDI describes an ASCII data file (or other external format)
– DDI can describe data files with different linked record types
– SDMX cannot do this
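To make "different linked record types" concrete, the sketch below reads a small fixed-width ASCII file that mixes household and person records linked by a household identifier. The file layout, the field positions, and all names are assumptions invented for the example; it illustrates the kind of file meant here, not how DDI itself encodes the description.

```python
# Hypothetical fixed-width file: "H" lines are households, "P" lines are
# persons; columns 1-3 of both hold the household identifier that links them.
ASCII_FILE = """\
H001North
P00101F23
P00102M47
H002South
P00201F35
"""

# Field name -> (start, end) character positions for each record type.
LAYOUTS = {
    "H": {"household_id": (1, 4), "region": (4, 10)},
    "P": {"household_id": (1, 4), "person_no": (4, 6), "sex": (6, 7), "age": (7, 9)},
}


def parse(text):
    """Split the file into household records (by id) and person records."""
    households, persons = {}, []
    for line in text.splitlines():
        if not line:
            continue
        record_type = line[0]
        layout = LAYOUTS[record_type]
        record = {name: line[a:b].strip() for name, (a, b) in layout.items()}
        if record_type == "H":
            households[record["household_id"]] = record
        else:
            persons.append(record)
    return households, persons


households, persons = parse(ASCII_FILE)

# Follow the link from each person to the region of their household.
linked = [(p["household_id"], p["person_no"], households[p["household_id"]]["region"])
          for p in persons]
print(linked)
```

The data stays in its external file; only the layout and the link between record types need to be described.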
Comparison (Continued)
• For describing questionnaires, and other types of related metadata (methodology, etc.), there are also major differences
– SDMX relies on the Reference Metadata mechanism for these metadata, which has no specified model (it is configured by users)
– DDI has an explicit model in the standard itself
– These facts can be strengths or weaknesses
Criteria for Choosing a Standard
• Does the standard support needed functionality?
– E.g., SDMX can describe questionnaires, but if you need detailed flow logic, DDI is much better
• How good is tools support for the needed functions?
– E.g., for graphical data display, SDMX has good tools; DDI does not
• Is there a high cost in terms of the learning curve?
– Maintaining competencies among staff can be costly and difficult
– Using a familiar standard may be the best choice
– SDMX and DDI both require a significant learning investment for developers
Conclusions
• SDMX and DDI were designed to support different uses, and have different strengths as a result
• In most cases, SDMX is better for dimensionalized data sets, exchange, and dissemination
• DDI is generally better for working with microdata and its collection and processing
• But: The choice of a suitable standard can only be made by taking into consideration a larger number of factors – it is not a simple black-and-white choice