SDMX Basics
David Barraclough OECD SDMX Coordinator
Overview
• What is SDMX?
• Why SDMX?
• SDMX at OECD
• How to start with SDMX?
• Some SDMX concepts
  – How is data exchanged
  – The main tools
  – Content-oriented guidelines
• Future of SDMX
What is SDMX (not)?
Not simply a technical format!
What is SDMX?
• Statistical Data and Metadata eXchange
• Released in 2002
“SDMX is an initiative to foster standards for the exchange of statistical information.”
• Sponsor organisations:
  – BIS, ECB, EUROSTAT, IMF, OECD, UN, World Bank
What is SDMX?
• Format: XML and EDI (rebranded GESMES)
• SDMX Information model
• Web service standards: APIs
• SDMX Registry standards
• Content-oriented guidelines
Why SDMX? The Business Case
• Reusable, open-source (free) tools save money and time
• Standard codes and naming help improve reuse and save time
  – Reuse of categories
  – Savings from less mapping and data processing
  – Shopping list of concepts when defining structures
• Strongly-typed structures help improve validation and processing
  – Heavy-lifting processing of data messages can be automated
  – Text format is human-readable
  – Easier to create new tools around the agreed format
Why SDMX? The Business Case
• Standard technical architecture promotes more timely, better quality data
  – Timely because less manual conversion is needed
  – Quality because automated processing means less human error
• SDMX Information model
  – Provides a common terminology
  – Makes tool development much easier
  – Information model described later
What’s in it for Data Reporters?
• SDMX Registry helps structure metadata
• SDMX Tools exist and are free
• One dissemination channel instead of packaging data for multiple consumers
• Can easily disseminate SDMX from existing data warehouse with SDMX-RI
• Lots of SDMX methodology available and growing
Why not use…?      Issues
CSV                Not structured, hard to validate; no metadata
Excel              Metadata tied to presentation; proprietary format; licensing; hard to process and automate
FAME, SAS, STATA   Proprietary format; licensing
GESMES             No information model; proprietary format; few tools or international support
XML                No context to tags; SDMX adds context to XML
XBRL, DDI          Not focused on aggregated data exchange
The SDMX XML Data file format:
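As a rough illustration of how a program might read such a file, the sketch below parses a simplified, namespace-free fragment shaped like a structure-specific series. The fragment, the dimension names (FREQ, REF_AREA, SECTOR) and the codes are assumptions made for the example; a real SDMX-ML message also carries a header and XML namespaces, which are left out here.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: series-level attributes hold the
# dimension values, each Obs holds the time period, value and status flag.
SAMPLE = """
<DataSet>
  <Series FREQ="A" REF_AREA="FR" SECTOR="S1">
    <Obs TIME_PERIOD="2010" OBS_VALUE="1.5" OBS_STATUS="A"/>
    <Obs TIME_PERIOD="2011" OBS_VALUE="1.7" OBS_STATUS="E"/>
  </Series>
</DataSet>
"""


def read_observations(xml_text):
    """Return one flat dict per observation, merging series and Obs attributes."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for series in root.iter("Series"):
        for obs in series.iter("Obs"):
            row = dict(series.attrib)   # dimension values shared by the series
            row.update(obs.attrib)      # period, value and observation-level flags
            rows.append(row)
    return rows


rows = read_observations(SAMPLE)
for row in rows:
    print(row)
```

Because every value sits under a named tag or attribute, the reader does not have to guess column positions as it would with a bare CSV file.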
“Global DSDs”
Domains at various stages of implementation:
• National Accounts
• Balance of Payments
• Foreign Direct Investment (FDI)
In draft:
• Harmonized Trade (IMTS)
• R&D
• Education
Many other “Shared” DSDs.
SDMX at OECD
Harmonized Trade data
• Synchronised from UN database every night
• Only the “delta” is synched into our database. This is required because the trade database is huge
• SDMX standards support querying the delta for a given date
Harmonized Trade
SDMX at OECD
• OECD.Stat SDMX web service is used for:
  – Data resellers receive data in standard format, easy to process
  – Incremental updates are possible by slicing data
  – Querying autonomously. Standard API is easy to use in programs
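A minimal sketch of how a client program might build such queries, assuming the path and parameter conventions of the SDMX RESTful API (a `data/{flow}/{key}` path, dot-separated keys with empty positions as wildcards, and `startPeriod` / `updatedAfter` query parameters). The base URL and the flow name `TRADE` are hypothetical, and the actual OECD.Stat endpoint may use a different layout.

```python
from urllib.parse import urlencode

# Hypothetical entry point; replace with the address of the service being queried.
BASE_URL = "https://example.org/sdmx/rest"


def data_query_url(flow, key="all", start_period=None, updated_after=None):
    """Build a data query URL for one dataflow.

    key           -- dot-separated dimension values; an empty position is a wildcard
    start_period  -- only return observations from this period onwards
    updated_after -- only return data changed since this date (the "delta")
    """
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if updated_after:
        params["updatedAfter"] = updated_after
    url = f"{BASE_URL}/data/{flow}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url


# Full dataset, then a slice (annual data for France) limited to recent changes.
print(data_query_url("TRADE"))
print(data_query_url("TRADE", key="A.FR.", updated_after="2014-01-01"))
```

Slicing by key and by update date is what keeps nightly synchronisation of a very large database practical: the client asks only for what changed.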
SDMX at OECD
<Demo of OECD.Stat web service>
How to start with SDMX?
Data Structure Definition, Data set, Structure specific data set, Structure specific time series data set, Generic time series data set, Generic data set, Data flow, Data flow definition, Category, Category map, Category scheme, Category scheme map, Code, Code map, Codelist, Codelist map, Hierarchy, Hierarchical code, Hierarchical codelist, Hybrid code map, Hybrid codelist map, Concept, Concept map, Concept scheme, Concept scheme map, Metadata structure definition, Metadata set, Metadata flow, Metadata flow definition, Metadata concept, Metadata concept scheme,
Reporting category, Reporting category map, Structure map, Structure set, Structure usage, Constraints, Annotation, Representation, Identifiable artefact ref, Maintainable artefact ref, Structure ref, International string, Localised string, Agency, Agency scheme, Contact, Provision agreement, Data and metadata provisioning, Data provider, Data provider scheme, Data provider ref, Data consumer, Data consumer scheme, Organisation map, Organisation unit, Organisation unit scheme, Organisation scheme map, Metadata target, Attribute descriptor, Data attribute, Metadata report,
Report structure, Metadata attribute, Measure descriptor, Primary measure, Component map, Transition, Enumerated attribute value, XHTML attribute value, Text attribute value, Other non enumerated attribute value, Target data key, Target object key, Level, Coding format, Source code, Source hierarchical code, Source codelist, Source hierarchical codelist, Hierarchical code reference, Target code, Target codelist, Target hierarchical code, Target hierarchical codelist, Dimension descriptor, Dimension, Time dimension, Measure dimension, Group dimension descriptor, Data set target, Target data set, Report period target, Target report period,
Dimension description values target, Identifiable object target, Target identifiable object, Constraint content target, Reporting taxonomy, Reporting taxonomy map, Series key, Group key, Reporting year start day, Attachment constraint, No specified relationship, Primary measure relationship, Group relationship, Dimension relationship, Measure key value, Coded key value, Uncoded key value, Time key value, Time dimension value, Component value, Observation, Uncoded observation, Coded observation, Uncoded attribute value, Coded attribute value, Scheme map, To text format, To value type, Data key set, Data key,
Metadata key set, Metadata key, Constraint role, Content constraint, Cube region, Metadata target region, Constraint role type, Reference period, Release calendar, Member selection, Member value, Range period, Start period, End period, Before period, After period, Registration, Process, Process step, Process artefact, Simple datasource, Rest datasource, Web service datasource, Computation, Transition, Transformation, Transformation scheme, Operator scheme, Reference node, Constant node, Operator, Operator node, Parameter
How to start with SDMX?
• Not much needed, but at least:
  – Understand the business case: the value in doing the project
  – SDMX.org Learning and working groups:
    • Understand the basic SDMX terms, but don’t try to understand the whole of the standard…
SDMX Information Model
• What is an Information Model? Examples:

Information Model     Objects                   Used by
Excel                 Sheets, Cells, Rows       Formulae, VBA
Relational database   Database, Table, Column   SQL, Interface
OECD metadata         42 categories             OECD.Stat, Metastore

• SDMX IM designed for statistical data and metadata exchange
• SDMX IM focused on aggregated data, but can be used for microdata
• Benefits of having an information model:
  – Common vocabulary (Code list, Concept, Dataset)
  – IM objects are fit-for-purpose
    • Clearly defined relationships between objects and their usage
  – SDMX formats and tools are built around the IM
    • Interoperable tools
    • IM is highly structured, easier to use a part of it rather than implementing full SDMX standard
SDMX Information Model
Basic SDMX Artefacts
• DSD: Data Structure Definition
  – Defines a cube/dataset for a domain such as National Accounts
  – States dimensions, their members, and attributes
  – Understand the difference between a dimension and an attribute
• Concept
  – Either a dimension or attribute, e.g.
    • Dimensions: Age, Location, Sector, Time
    • Attributes: Observation status, Unit multiplier
• Code list
  – Dimension or attribute members
  – Each code list item has a code and description
• Concept Scheme
  – List of all concepts for domain before splitting them into DSDs
Basic SDMX Artefacts
Example with National Accounts
Concept Scheme: National Accounts
DSD: NA Main
• Concept: Frequency (Code List: Frequency CL)
• Concept: Reference area (Code List: Area CL)
• Concept: Sector (Code List: Sector CL)
• Concept: Observation Status (Code List: Observation Status CL)
How is SDMX data exchanged?
Web Services used for automation
• Web Service: a web site without a user interface
• Instead of user interface there is an API (Application Programming Interface)
• Used for machine-to-machine processing
• SDMX has a standard API
  – Means that same software can use API from many locations
• Push mode
  – Data provider sends data files to each collector
  – Each collector gets the data
• Pull mode
  – Data provider publishes the data once
  – Each collector gets the data from the provider
• Data hub
  – Data published to a central location (the hub)
  – Consumers get notification when data is published
• Pull mode offers more efficient dissemination and collection of data, enables client-driven slicing, and increases timeliness of data
• SDMX uses web services to support Pull mode
How is SDMX data exchanged?
Content-oriented Guidelines
• Common code lists:
  – Country
  – Observation Status
  – Currency
  – Etc.
• Rules in coding
• Guidelines for SDMX projects and creating new DSDs, etc.
• Benefits:
  – Promote best practices in artefact creation, governance
  – Alignment between domains
  – Speed up SDMX projects. Provide shopping list of existing code lists
• Help SDMX projects with recommendations
SDMX Project Steps
Map data flows between organisations
  • Data formats
  • Reporting forms or tables
  • Mailbox/web
List domain concepts for entire domain
  • Becomes Concept Scheme
Define code lists
  • Codify all items using SDMX guidelines
  • Hierarchy can come later
Concepts: dimension or attribute
  • Dimension uniquely identifies data
  • Attribute adds info to data, e.g. flags
Create DSDs from concepts. Use data flows
  • Each dimension grouping is a DSD
  • Ways to avoid many DSDs
Pilot DSDs
  • 1st pilot: Reporters provide feedback on DSD structures
  • 2nd pilot: Reporters send data, Consumers process it
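The dimension/attribute decision in the steps above can be made concrete with a small sketch: dimension values together form the key that identifies an observation, while attributes only describe it. The concept names and sample records are assumptions for illustration.

```python
# Hypothetical split of concepts for one dataset.
DIMENSIONS = ("FREQ", "REF_AREA", "TIME_PERIOD")  # together they identify an observation
ATTRIBUTES = ("OBS_STATUS",)                       # extra information such as flags


def load(records):
    """Index records by their dimension values and reject duplicate keys."""
    store = {}
    for rec in records:
        key = tuple(rec[d] for d in DIMENSIONS)
        if key in store:
            raise ValueError(f"duplicate key {key}")
        entry = {"value": rec["OBS_VALUE"]}
        entry.update({a: rec.get(a) for a in ATTRIBUTES})
        store[key] = entry
    return store


records = [
    {"FREQ": "A", "REF_AREA": "FR", "TIME_PERIOD": "2010", "OBS_VALUE": 1.5, "OBS_STATUS": "A"},
    {"FREQ": "A", "REF_AREA": "FR", "TIME_PERIOD": "2011", "OBS_VALUE": 1.7, "OBS_STATUS": "E"},
]

store = load(records)
print(store[("A", "FR", "2011")])
```

If two records differ only in a concept that was classed as an attribute, they collide on the same key, which is a sign that the concept should have been a dimension.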
SDMX Main Tools
• SDMX Registry
  Directory of the structural metadata
• SDMX Converter
  Converts between formats (Excel, GESMES, CSV, etc.)
• SDMX Reference Infrastructure
  SDMX export and mapping for existing database
Mapping
SDMX Tools
<Demo of Global Registry>
Future of SDMX
• SDMX Validation Language
  – Automate basic level of data validation e.g. a+b=c
  – Transform data
• More standard code lists
  – E.g. Seasonal adjustment
• Better, more reusable tools, e.g. Mapping
  – Plug-and-play modules to transform, validate messages
• More guidelines and harmonised structures
  – Such as Global DSDs
• Use SDMX for reference metadata exchange
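The "a+b=c" kind of rule can be illustrated in plain Python. This is only a sketch of the idea of an automated consistency check; it does not show the syntax of the validation language itself, and the field names and tolerance are assumptions.

```python
import math


def check_sum_rule(row, parts, total, tolerance=1e-9):
    """Return True when the values named in `parts` add up to the value named in `total`."""
    return math.isclose(sum(row[p] for p in parts), row[total], abs_tol=tolerance)


rows = [
    {"a": 1.0, "b": 2.0, "c": 3.0},   # consistent
    {"a": 1.0, "b": 2.0, "c": 3.5},   # breaks the rule
]

# Collect the positions of the rows that fail the rule a + b = c.
failures = [i for i, row in enumerate(rows)
            if not check_sum_rule(row, ["a", "b"], "c")]
print(failures)  # [1]
```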
Thank you
Any questions?
David Barraclough OECD SDMX Coordinator