sdmx -...
SDMX Statistical Data and Metadata eXchange
Statistics agencies/providers want to:
• Avoid sending the same data to multiple agencies
• Avoid sending data packages at all
• Avoid adopting new formats/standards for only a few specific data flows
• Make datasets user-friendly and comparable
International organisations/receivers want to:
• Avoid the time and errors involved in processing different file formats from providers
• Avoid the technical and manual headache of processing large datasets
• Have comparable data
• Avoid the time delays caused by manual file processing
• Avoid validation round trips with member agencies and avoid creating proprietary validation rules
Exchange Problems & Opportunities
Everybody would like:
• Automatic validation of the exchange before processing
• Automated workflows for exchange of statistics
• Lower costs, higher quality, and more guidelines for exchange and implementation
• To document datasets' structural metadata and reference metadata
• To store the documentation, have a standard way of querying it, and make it discoverable
• To benefit from a large community offering free tools and sharing expertise around a standard
Exchange Problems & Opportunities
• Statistical Data and Metadata eXchange
• Released in 2002: “SDMX is an initiative to foster standards for the exchange of statistical information.”
• Sponsor organisations:
  – BIS, ECB, EUROSTAT, IMF, OECD, UN, World Bank
• SDMX.org web site
What is SDMX?
• A set of technical standards:
  – Information Model
  – Web service standard, e.g. for creating data queries
  – Registry standards, so that data catalogs can be queried and data can be discovered
• Guidelines for:
  – Coding
  – Best practices when using the standard
  – Technical implementation
• SDMX governance:
  – SDMX Sponsors form the steering group
  – Technical Working Group, Statistical Working Group
• Existing, reusable tools
• The main exchange format is XML (SDMX-ML) based on standard schemas
What is SDMX?
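To show what the standard schemas add, here is a minimal sketch that reads a message shaped like the SDMX-ML Compact format, where each Series element carries the series key as dimension attributes and each Obs element carries TIME_PERIOD and OBS_VALUE. The namespace URI, dimension names and values are hypothetical placeholders, not real OECD output; a production reader would take the list of dimensions from the DSD rather than trusting whatever attributes appear.

```python
import xml.etree.ElementTree as ET

# Hypothetical Compact-style fragment: namespace, dimension names and
# values are placeholders for illustration only.
SAMPLE = """\
<DataSet xmlns="urn:example:compact">
  <Series LOCATION="AUS" SUBJECT="PRINTO01" MEASURE="GY" FREQUENCY="A">
    <Obs TIME_PERIOD="2015" OBS_VALUE="1.5"/>
    <Obs TIME_PERIOD="2016" OBS_VALUE="2.0"/>
  </Series>
</DataSet>
"""

NS = {"c": "urn:example:compact"}


def flatten(xml_text):
    """Yield one dict per observation, with its series key merged in."""
    root = ET.fromstring(xml_text)
    for series in root.findall("c:Series", NS):
        # Dimension values on the Series element apply to every Obs inside it.
        series_key = dict(series.attrib)
        for obs in series.findall("c:Obs", NS):
            row = dict(series_key)
            row["TIME_PERIOD"] = obs.get("TIME_PERIOD")
            row["OBS_VALUE"] = float(obs.get("OBS_VALUE"))
            yield row


rows = list(flatten(SAMPLE))
```

Each observation ends up fully qualified by its series key, which is the context that bare XML tags would not provide on their own.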
Designed to improve machine-to-machine meta/data exchange
Saves resources:
• Reuse of exchange systems across domains and agencies
• Reuse of statistical metadata and methodology
Improves quality:
• Promotes standard classifications -> reduces mapping and transformation errors
• Automated exchange -> reduces manual intervention errors
• Validation is a first-class part of SDMX
Improves timeliness:
• Automated workflows, fewer “wait states”
• Reduces delays from manual intervention
E.g. instead of copy/paste and click, repeated many times, automation allows unattended workflow execution
Why use SDMX?: The Business Case
For exchange, why not use…?
Standard: issues
• Simple CSV: not structured, hard to validate, no metadata
• Excel: metadata tied to presentation, proprietary format, licensing, hard to process and automate
• FAME, SAS, STATA files: proprietary format, licensing
• GESMES: no information model, proprietary format, few tools or international support
• XBRL, DDI: not focused on modelling the exchange
• XML (only): no context to tags; SDMX adds context to XML
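In contrast to simple CSV, SDMX also defines a CSV layout (SDMX-CSV) in which each row names its dataflow and the columns are the dimension IDs of the DSD, so a receiver can look up the structure and validate the file. The sketch below writes observations in that style with Python's csv module. It assumes the version 1 layout (a leading DATAFLOW column holding AGENCY:ID(VERSION), then the dimensions, TIME_PERIOD and OBS_VALUE); the dataflow version, the dimension IDs and the observation value are illustrative placeholders, and attributes are left out for brevity.

```python
import csv
import io

# Hypothetical dataflow reference and dimension IDs (assumptions for
# illustration, not taken from a published DSD).
DATAFLOW = "OECD:KEI(1.0)"
DIMENSIONS = ["SUBJECT", "LOCATION", "MEASURE", "FREQUENCY"]


def to_sdmx_csv(observations):
    """Write observations as SDMX-CSV-style text, one row per observation.

    Every row carries the dataflow reference, and the column names match
    the DSD's dimension IDs, so the receiver knows which structure to
    validate against.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["DATAFLOW", *DIMENSIONS, "TIME_PERIOD", "OBS_VALUE"])
    for obs in observations:
        writer.writerow(
            [DATAFLOW, *(obs[d] for d in DIMENSIONS), obs["TIME_PERIOD"], obs["OBS_VALUE"]]
        )
    return out.getvalue()


# Placeholder observation, not real KEI data.
text = to_sdmx_csv([
    {"SUBJECT": "PRINTO01", "LOCATION": "AUS", "MEASURE": "GY",
     "FREQUENCY": "A", "TIME_PERIOD": "2015", "OBS_VALUE": 1.5},
])
```

The file stays as easy to open as any CSV, but it is no longer anonymous: the header and the DATAFLOW column tie it to a structure definition.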
The SDMX Information Model
• Information Model examples:
  – Excel: objects are sheets, cells, rows; used by formulae, VBA
  – Relational database: objects are database, table, column; used by SQL, interfaces
  – OECD metadata: 42 categories; used by OECD.Stat, Metastore
• SDMX IM is designed for statistical data and metadata exchange, and for cataloguing that metadata
• SDMX IM was designed for aggregated data, but can also be used for microdata
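By analogy with the Excel and database examples, the sketch below models a tiny slice of the SDMX Information Model in Python: a data structure definition (DSD) has an ordered list of dimensions, each coded from a codelist. It is a deliberate simplification that assumes only coded dimensions; the real IM also covers concepts, attributes, measures, dataflows and versioning, and the class, codelist and dimension names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codelist:
    id: str
    codes: frozenset  # the allowed code values


@dataclass(frozen=True)
class Dimension:
    id: str
    codelist: Codelist  # each dimension takes its values from one codelist


@dataclass
class DataStructureDefinition:
    id: str
    dimensions: list  # order matters: it fixes each code's position in a key

    def series_key(self, **codes):
        """Return a dot-separated series key, rejecting missing or unknown codes."""
        parts = []
        for dim in self.dimensions:
            if dim.id not in codes:
                raise ValueError(f"missing value for dimension {dim.id}")
            code = codes[dim.id]
            if code not in dim.codelist.codes:
                raise ValueError(f"{code!r} is not in codelist {dim.codelist.id}")
            parts.append(code)
        return ".".join(parts)


# Hypothetical structure: IDs and codes are illustrative only.
CL_AREA = Codelist("CL_AREA", frozenset({"AUS", "AUT", "BEL"}))
CL_FREQ = Codelist("CL_FREQ", frozenset({"A", "Q", "M"}))
dsd = DataStructureDefinition(
    "DSD_EXAMPLE",
    [Dimension("LOCATION", CL_AREA), Dimension("FREQUENCY", CL_FREQ)],
)
key = dsd.series_key(FREQUENCY="Q", LOCATION="AUT")
```

Because the codelist check runs when the key is built, an unknown code such as "XXX" is rejected before any data is exchanged, which is the kind of up-front validation the business case describes.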
The SDMX Information Model (High Level)
Clickable SDMX: https://statswiki.unece.org/display/ClickSDMX
• SDMX Registry
  – Structural metadata catalog
  – Data discovery
  – Demo of the SDMX Global Registry
• SDMX Converter: converts between formats (Excel, GESMES, CSV, etc.)
• SDMX Reference Infrastructure
  – SDMX export and mapping for an existing database
  – Used by many agencies for reporting
• Plug-ins/libraries for econometrics tools: R, Stata, SAS, etc.
• Java and .NET software libraries (SDMX Source)
• Full tools list
• The OECD.Stat data warehouse platform is partly SDMX-based now and will be fully SDMX-based within the next two years
  – Used by Istat, ABS, Tunisia and ILO; being considered by others
  – Has an active community
SDMX Main Tools
Mapping
Query a web service (OECD KEI dataset):
http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/KEI/PS+PR+PRINTO01+SL+SLRTTO01+SLRTCR03+OD+ODCNPI03+CI+LO+LOLITOAA+LORSGPRT+LI+LF.AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+USA.GP+GY+ST.A+Q+M/all?startTime=2015&endTime=2017&format=compact_v2
Querying Data
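The KEI query follows a regular pattern: the dataset ID, then a key with one position per dimension (codes within a dimension joined with "+", meaning OR, and dimensions separated by "."), then further options. The sketch below rebuilds that exact URL. It assumes the dimension order SUBJECT.LOCATION.MEASURE.FREQUENCY as read from the example, and that the trailing "all" is the agency segment of this OECD endpoint; build_data_query is a hypothetical helper, not part of any SDMX library, and the OECD.Stat endpoint may no longer be live.

```python
from urllib.parse import urlencode

# Base URL copied from the OECD example; treat it as illustrative.
BASE_URL = "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData"


def build_data_query(dataset, key_parts, agency="all", params=None):
    """Build an SDMX-style REST data query URL (hypothetical helper).

    key_parts holds one list of codes per dimension, in DSD order.
    Codes in one dimension are joined with '+' (OR); dimensions with '.'.
    'agency' is assumed to be the path segment after the key.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{BASE_URL}/{dataset}/{key}/{agency}"
    if params:
        url += "?" + urlencode(params)  # keeps the dict's insertion order
    return url


url = build_data_query(
    "KEI",
    [
        ["PS", "PR", "PRINTO01", "SL", "SLRTTO01", "SLRTCR03", "OD", "ODCNPI03",
         "CI", "LO", "LOLITOAA", "LORSGPRT", "LI", "LF"],  # subjects
        ["AUS", "AUT", "BEL", "CAN", "CHL", "CZE", "DNK", "EST", "FIN", "USA"],  # countries
        ["GP", "GY", "ST"],  # measures
        ["A", "Q", "M"],  # frequencies
    ],
    params={"startTime": 2015, "endTime": 2017, "format": "compact_v2"},
)
```

Building the key from lists makes it easy to add a country or a frequency without hand-editing a long URL, and the same pattern applies to any dataset whose DSD defines the dimension order.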
• Checklist for SDMX Design
• SDMX Glossary
• Guidelines
  – Versioning
  – Data vintage representation
  – Confidentiality/embargo representation
  – Non-calendar time ranges
  – Cross-domain code lists (Observation Status, Seasonal Adjustment, etc.)
  – How to model a statistical domain/reporting framework
  – Creating Data Structure Definitions
  – Reference metadata concepts
  – A global framework/set of concepts for exchanging reference metadata
SDMX Guidelines
The Generic Statistical Business Process Model (GSBPM)
Relationship to other standards
• Since 2011, SDMX has brought together the technical and statistical worlds in several domains to work on “Global DSDs”
• Global DSDs improve on today's heterogeneous reporting methods
• They improve many aspects of data exchange, including:
  – Better timeliness, by allowing data to be queried rather than sent many times
  – Avoiding the burden of maintaining many different reporting systems and exchange agreements
  – Saving money by reusing IT systems, standards, and methodology
• Find them in the Global Registry
SDMX Global Data Structure Definitions (DSDs)
Status of Global DSDs
Domain/reporting framework: publication date
IN PRODUCTION
• National Accounts (including Gov. Finance Statistics): 2013 Q3
• Balance of Payments: 2014 Q1
• Foreign Direct Investment: 2014 Q1
IN PROGRESS
• International Merchandise Trade Statistics: 2017 Q4
• Price statistics: 2017 Q4
• Labour statistics: 2017 Q4
• Education: 2017 Q4
• Sustainable Development Goals: 2018 Q3
• R&D Statistics: to be decided
• Environmental-Economic Accounts: to be decided
• Energy Statistics: envisaged
• SDMX is a set of technical, content and methodological standards
• Many free tools, with more coming online
• Not just a file format (though it includes that)
• Goal of SDMX: help organise statistical metadata to make the exchange of information easier and more efficient
• Has a very active community and new features such as SDMX-CSV
• Saves resources and improves the quality and timeliness of data exchange
• For transparency:
  – SDMX helps to manage, catalog and surface metadata through registries (such as the Global Registry)
  – Standard exchange mechanisms and structures help with comparability and linking between datasets
  – The SDMX initiative is aligned with other standards through the HLG and international organisation communities
In Conclusion
Thank you David Barraclough – OECD
http://sdmx.org SDMX LinkedIn Group