SDMX Basics

David Barraclough, OECD SDMX Coordinator


Overview
• What is SDMX?
• Why SDMX?
• SDMX at OECD
• How to start with SDMX?
• Some SDMX concepts
  – How is data exchanged
  – The main tools
  – Content-oriented guidelines
• Future of SDMX

What is SDMX (not)?

Not simply a technical format!

What is SDMX?

• Statistical Data and Metadata eXchange
• Released in 2002: “SDMX is an initiative to foster standards for the exchange of statistical information.”
• Sponsor organisations:
  – BIS, ECB, Eurostat, IMF, OECD, UN, World Bank

What is SDMX?

• Formats: XML (SDMX-ML) and EDI (SDMX-EDI, a rebranded GESMES)
• SDMX Information Model
• Web service standards: APIs
• SDMX Registry standards
• Content-oriented guidelines

Why SDMX? The Business Case

• Reusable, open-source (free) tools save money and time
• Standard codes and naming improve reuse and save time
  – Reuse of categories
  – Less mapping and data processing
  – A shopping list of concepts when defining structures
• Strongly typed structures improve validation and processing
  – Heavy-lifting processing of data messages can be automated
  – The text format is human-readable
  – New tools are easier to build around the agreed format

Why SDMX? The Business Case

• A standard technical architecture promotes more timely, better-quality data
  – More timely, because less manual conversion is needed
  – Better quality, because automated processing means less human error
• SDMX Information Model
  – Provides a common terminology
  – Makes tool development much easier
  – The information model is described later

What’s in it for Data Reporters?

• The SDMX Registry helps structure metadata
• SDMX tools exist and are free
• One dissemination channel instead of packaging data for multiple consumers
• SDMX can easily be disseminated from an existing data warehouse with the SDMX-RI (SDMX Reference Infrastructure)
• Lots of SDMX methodology is available, and it is growing

Why not use…? (format: issues)

• CSV: not structured, hard to validate; no metadata
• Excel: metadata tied to presentation; proprietary format; licensing; hard to process and automate
• FAME, SAS, STATA: proprietary formats; licensing
• GESMES: no information model; proprietary format; few tools and little international support
• XML: no context to tags (SDMX adds context to XML)
• XBRL, DDI: not focused on aggregated data exchange

The SDMX XML Data file format:
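SDMX-ML gives each XML element a defined meaning taken from the information model. As a rough illustration, the sketch below embeds a heavily simplified SDMX-ML 2.1 generic data message (the mandatory header is omitted, and the dimension IDs and values are hypothetical) and reads it with Python's standard library. Every observation sits inside a series whose key lists its dimension values, which is the "context" plain XML lacks.

```python
import xml.etree.ElementTree as ET

# Namespaces used by SDMX-ML 2.1 generic data messages.
NS = {
    "message": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
    "generic": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic",
}

# Simplified sample: no <message:Header>, hypothetical dimension values.
SAMPLE = """<message:GenericData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic">
  <message:DataSet>
    <generic:Series>
      <generic:SeriesKey>
        <generic:Value id="FREQ" value="A"/>
        <generic:Value id="REF_AREA" value="FR"/>
      </generic:SeriesKey>
      <generic:Obs>
        <generic:ObsDimension value="2014"/>
        <generic:ObsValue value="101.5"/>
      </generic:Obs>
      <generic:Obs>
        <generic:ObsDimension value="2015"/>
        <generic:ObsValue value="103.2"/>
      </generic:Obs>
    </generic:Series>
  </message:DataSet>
</message:GenericData>"""


def read_series(xml_text):
    """Return a list of (series key dict, [(period, value), ...]) pairs."""
    root = ET.fromstring(xml_text)
    result = []
    for series in root.iter("{%s}Series" % NS["generic"]):
        # The series key: one id/value pair per dimension.
        key = {
            v.get("id"): v.get("value")
            for v in series.findall("generic:SeriesKey/generic:Value", NS)
        }
        # Each observation: the time period and the observed value.
        obs = [
            (o.find("generic:ObsDimension", NS).get("value"),
             float(o.find("generic:ObsValue", NS).get("value")))
            for o in series.findall("generic:Obs", NS)
        ]
        result.append((key, obs))
    return result


for key, obs in read_series(SAMPLE):
    print(key, obs)
```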

“Global DSDs”
Domains at various stages of implementation:
• National Accounts
• Balance of Payments
• Foreign Direct Investment (FDI)
In draft:
• Harmonized Trade (IMTS)
• R&D
• Education

Many other “shared” DSDs.

SDMX at OECD: Harmonized Trade data
• Synchronised from the UN database every night
• Only the “delta” is synchronised into our database; this is required because the trade database is huge
• SDMX standards support querying the delta for a given date
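The SDMX 2.1 RESTful API lets a client request only what changed, through the updatedAfter query parameter. A minimal sketch, assuming a hypothetical service at https://sdmx.example.org/rest and a hypothetical dataflow DF_TRADE, that builds such a delta query:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def delta_query_url(base_url, flow_ref, key="all", since=None):
    """Build an SDMX 2.1 RESTful data query URL.

    If `since` is given, the updatedAfter parameter asks the service
    only for data changed after that moment (the "delta").
    """
    url = "%s/data/%s/%s" % (base_url.rstrip("/"), flow_ref, key)
    if since is not None:
        # urlencode percent-encodes the ':' and '+' of the ISO timestamp.
        url += "?" + urlencode({"updatedAfter": since.isoformat()})
    return url


last_sync = datetime(2016, 1, 17, tzinfo=timezone.utc)  # hypothetical time of last run
print(delta_query_url("https://sdmx.example.org/rest", "DF_TRADE", since=last_sync))
```

In a nightly job, `since` would be the time of the last successful synchronisation, so each run fetches only the new delta.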


SDMX at OECD
• The OECD.Stat SDMX web service is used for:
  – Data resellers, who receive data in a standard format that is easy to process
  – Incremental updates, which are possible by slicing data
  – Autonomous querying: the standard API is easy to use in programs
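Slicing is expressed through the series key in an SDMX REST query: dimension values are given in DSD order and separated by dots, several values for one dimension are joined with +, and an empty position acts as a wildcard. The helper below is a hypothetical sketch of building such a key; the dimension IDs are assumptions.

```python
def build_key(dimension_order, selection):
    """Build an SDMX REST series key such as 'A.FR+DE..'.

    dimension_order: dimension IDs in the order the DSD defines them.
    selection: dimension ID -> list of wanted codes; dimensions that are
    absent (or given an empty list) are left as wildcards.
    """
    unknown = set(selection) - set(dimension_order)
    if unknown:
        raise ValueError("not dimensions of this DSD: %s" % sorted(unknown))
    return ".".join("+".join(selection.get(dim, [])) for dim in dimension_order)


order = ["FREQ", "REF_AREA", "SECTOR", "TRANSACTION"]  # hypothetical DSD order
print(build_key(order, {"FREQ": ["A"], "REF_AREA": ["FR", "DE"]}))
```

The resulting key asks for annual data for two countries, all sectors and all transactions, so a client downloads only the slice it needs.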

SDMX at OECD

<Demo of OECD.Stat web service>

How to start with SDMX?

Data Structure Definition, Data set, Structure specific data set, Structure specific time series data set, Generic time series data set, Generic data set, Data flow, Data flow definition, Category, Category map, Category scheme, Category scheme map, Code, Code map, Codelist, Codelist map, Hierarchy, Hierarchical code, Hierarchical codelist, Hybrid code map, Hybrid codelist map, Concept, Concept map, Concept scheme, Concept scheme map, Metadata structure definition, Metadata set, Metadata flow, Metadata flow definition, Metadata concept, Metadata concept scheme, Reporting category, Reporting category map, Structure map, Structure set, Structure usage, Constraints, Annotation, Representation, Identifiable artefact ref, Maintainable artefact ref, Structure ref, International string, Localised string, Agency, Agency scheme, Contact, Provision agreement, Data and metadata provisioning, Data provider, Data provider scheme, Data provider ref, Data consumer, Data consumer scheme, Organisation map, Organisation unit, Organisation unit scheme, Organisation scheme map, Metadata target, Attribute descriptor, Data attribute, Metadata report, Report structure, Metadata attribute, Measure descriptor, Primary measure, Component map, Transition, Enumerated attribute value, XHTML attribute value, Text attribute value, Other non enumerated attribute value, Target data key, Target object key, Level, Coding format, Source code, Source hierarchical code, Source codelist, Source hierarchical codelist, Hierarchical code reference, Target code, Target codelist, Target hierarchical code, Target hierarchical codelist, Dimension descriptor, Dimension, Time dimension, Measure dimension, Group dimension descriptor, Data set target, Target data set, Report period target, Target report period, Dimension description values target, Identifiable object target, Target identifiable object, Constraint content target, Reporting taxonomy, Reporting taxonomy map, Series key, Group key, Reporting year start day, Attachment constraint, No specified relationship, Primary measure relationship, Group relationship, Dimension relationship, Measure key value, Coded key value, Uncoded key value, Time key value, Time dimension value, Component value, Observation, Uncoded observation, Coded observation, Uncoded attribute value, Coded attribute value, Scheme map, To text format, To value type, Data key set, Data key, Metadata key set, Metadata key, Constraint role, Content constraint, Cube region, Metadata target region, Constraint role type, Reference period, Release calendar, Member selection, Member value, Range period, Start period, End period, Before period, After period, Registration, Process, Process step, Process artefact, Simple datasource, Rest datasource, Web service datasource, Computation, Transition, Transformation, Transformation scheme, Operator scheme, Reference node, Constant node, Operator, Operator node, Parameter

How to start with SDMX?

• Not much is needed, but at least:
  – Understand the business case: the value in doing the project
  – SDMX.org learning and working groups
• Understand the basic SDMX terms, but don't try to understand the whole standard…

SDMX Information Model

• What is an Information Model? Examples:

  Information model     Objects                   Used by
  Excel                 Sheets, Cells, Rows       Formulae, VBA
  Relational database   Database, Table, Column   SQL, Interface
  OECD metadata         42 categories             OECD.Stat, Metastore

• The SDMX IM is designed for statistical data and metadata exchange
• The SDMX IM is focused on aggregated data, but can be used for microdata

SDMX Information Model

• Benefits of having an information model:
  – Common vocabulary (code list, concept, dataset)
  – IM objects are fit for purpose
    • Clearly defined relationships between objects and their usage
  – SDMX formats and tools are built around the IM
    • Interoperable tools
    • The IM is highly structured, so it is easier to use a part of it than to implement the full SDMX standard

Basic SDMX Artefacts
• DSD: Data Structure Definition
  – Defines a cube/dataset for a domain such as National Accounts
  – States the dimensions, their members, and the attributes
  – Understand the difference between a dimension and an attribute
• Concept
  – Either a dimension or an attribute, e.g.
    • Dimensions: Age, Location, Sector, Time
    • Attributes: Observation status, Unit multiplier
• Code list
  – Dimension or attribute members
  – Each code list item has a code and a description
• Concept Scheme
  – List of all concepts for a domain before splitting them into DSDs
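The dimension/attribute distinction fits in a few lines of code. In this sketch (the component IDs, their order and the records are hypothetical), the dimensions other than time form the series key that identifies the data, while attributes such as OBS_STATUS travel with the observation without changing its identity.

```python
# Hypothetical DSD components, in the order the DSD defines them.
DIMENSIONS = ["FREQ", "REF_AREA", "SECTOR", "TIME_PERIOD"]
ATTRIBUTES = ["OBS_STATUS", "UNIT_MULT"]


def split_record(record):
    """Split a flat record into (series key, period, value, attributes).

    Dimensions other than TIME_PERIOD form the series key, which identifies
    the series; attributes only describe the observation and are not in the key.
    """
    missing = [d for d in DIMENSIONS if d not in record]
    if missing:
        raise ValueError("missing dimensions: %s" % missing)
    key = ".".join(record[d] for d in DIMENSIONS if d != "TIME_PERIOD")
    attributes = {a: record[a] for a in ATTRIBUTES if a in record}
    return key, record["TIME_PERIOD"], float(record["OBS_VALUE"]), attributes


provisional = {"FREQ": "A", "REF_AREA": "FR", "SECTOR": "S1",
               "TIME_PERIOD": "2015", "OBS_VALUE": "2181.1", "OBS_STATUS": "P"}
revised = dict(provisional, OBS_VALUE="2183.0", OBS_STATUS="A")

# Same dimension values, so same series and period; only value and attribute differ.
print(split_record(provisional))
print(split_record(revised))
```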

Basic SDMX Artefacts: Example with National Accounts

Concept Scheme: National Accounts
DSD: NA Main

  Concept               Code list
  Frequency             Frequency CL
  Reference area        Area CL
  Sector                Sector CL
  Observation Status    Observation Status CL
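A minimal sketch of how a DSD like NA Main supports validation: each concept points to the code list that holds its allowed values, and an incoming record is checked against them. The code list names follow the National Accounts example; the concept IDs and the codes are illustrative placeholders, and for simplicity every component is treated as mandatory, which a real DSD need not do.

```python
# Code lists from the example; the codes themselves are illustrative.
CODE_LISTS = {
    "Frequency CL": {"A": "Annual", "Q": "Quarterly"},
    "Area CL": {"FR": "France", "DE": "Germany"},
    "Sector CL": {"S1": "Total economy", "S13": "General government"},
    "Observation Status CL": {"A": "Normal value", "E": "Estimated value",
                              "P": "Provisional value"},
}

# DSD "NA Main": concept ID (hypothetical) -> code list giving its allowed values.
NA_MAIN = {
    "FREQ": "Frequency CL",
    "REF_AREA": "Area CL",
    "SECTOR": "Sector CL",
    "OBS_STATUS": "Observation Status CL",
}


def validate(record, dsd=NA_MAIN, code_lists=CODE_LISTS):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for concept, cl_name in dsd.items():
        code = record.get(concept)
        if code is None:
            errors.append("%s is missing" % concept)
        elif code not in code_lists[cl_name]:
            errors.append("%s=%r is not in %s" % (concept, code, cl_name))
    return errors


print(validate({"FREQ": "A", "REF_AREA": "FR", "SECTOR": "S1", "OBS_STATUS": "E"}))
print(validate({"FREQ": "M", "REF_AREA": "FR", "SECTOR": "S1"}))
```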

How is SDMX data exchanged?

Web services are used for automation
• Web service: a web site without a user interface
• Instead of a user interface there is an API (Application Programming Interface)
• Used for machine-to-machine processing
• SDMX has a standard API
  – This means the same software can use the API at many locations
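Because the API is standard, one client function can target any compliant provider by changing only the base URL. A sketch using Python's urllib, assuming hypothetical base URLs and a hypothetical dataflow ID; the Accept value is the SDMX-ML 2.1 generic data media type from the SDMX RESTful specification (which formats a given service supports should be checked). The requests are only prepared here, not sent.

```python
import urllib.request

# SDMX-ML 2.1 generic data media type, as named in the SDMX RESTful specification.
GENERIC_DATA_XML = "application/vnd.sdmx.genericdata+xml;version=2.1"


def data_request(base_url, flow_ref, key="all"):
    """Prepare (but do not send) an SDMX 2.1 REST data request."""
    url = "%s/data/%s/%s" % (base_url.rstrip("/"), flow_ref, key)
    return urllib.request.Request(url, headers={"Accept": GENERIC_DATA_XML})


# The same client code against two hypothetical SDMX endpoints.
for base in ("https://sdmx.provider-a.example/rest",
             "https://stats.provider-b.example/sdmx"):
    req = data_request(base, "DF_EXAMPLE", "A.FR")
    print(req.full_url, req.get_header("Accept"))
    # urllib.request.urlopen(req) would send it and return the XML response.
```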

How is SDMX data exchanged?

• Push mode
  – The data provider sends data files to each collector
  – Each collector gets the data
• Pull mode
  – The data provider publishes the data once
  – Each collector gets the data from the provider
• Data hub
  – Data is published to a central location (the hub)
  – Consumers get a notification when data is published
• Pull mode offers more efficient dissemination and collection of data, enables client-driven slicing, and increases the timeliness of data
• SDMX uses web services to support pull mode
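The hub pattern can be sketched as a toy, in-memory class: the provider publishes once, subscribed consumers are notified, and each of them then pulls the data. This illustrates the exchange pattern only; it is not an SDMX API, and all class, flow and consumer names are hypothetical.

```python
from collections import defaultdict


class DataHub:
    """Toy in-memory hub: providers publish once, consumers are notified and pull."""

    def __init__(self):
        self._datasets = {}                    # dataflow ID -> latest published data
        self._subscribers = defaultdict(list)  # dataflow ID -> notification callbacks

    def subscribe(self, flow, callback):
        self._subscribers[flow].append(callback)

    def publish(self, flow, data):
        # The provider publishes once, however many consumers there are.
        self._datasets[flow] = data
        for notify in self._subscribers[flow]:
            notify(flow)

    def pull(self, flow):
        # Each consumer fetches the data itself (pull mode).
        return self._datasets[flow]


hub = DataHub()
received = {}


def make_consumer(name):
    def on_notify(flow):
        received[name] = hub.pull(flow)
    return on_notify


hub.subscribe("NA_MAIN", make_consumer("collector_1"))
hub.subscribe("NA_MAIN", make_consumer("collector_2"))
hub.publish("NA_MAIN", {"A.FR.S1": {"2015": 2181.1}})
print(received)
```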

Content-oriented Guidelines
• Common code lists:
  – Country
  – Observation status
  – Currency
  – Etc.
• Rules in coding
• Guidelines for SDMX projects, creating new DSDs, etc.
• Benefits:
  – Promote best practices in artefact creation and governance
  – Alignment between domains
  – Speed up SDMX projects by providing a shopping list of existing code lists
• Help SDMX projects with recommendations
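Adopting a common code list usually starts with mapping local codes onto it. A sketch, assuming a hypothetical national flag scheme and an illustrative subset of the common observation status codes; flags with no mapping are reported rather than guessed.

```python
# Illustrative subset of the common observation status code list.
OBS_STATUS = {"A": "Normal value", "E": "Estimated value", "P": "Provisional value"}

# Hypothetical national flags mapped onto the common codes.
LOCAL_TO_COMMON = {"": "A", "est": "E", "prov": "P"}


def harmonise_flags(local_flags):
    """Translate local flags; collect unmapped ones instead of guessing."""
    mapped, unmapped = [], []
    for flag in local_flags:
        code = LOCAL_TO_COMMON.get(flag)
        if code in OBS_STATUS:
            mapped.append(code)
        else:
            unmapped.append(flag)
    return mapped, unmapped


print(harmonise_flags(["", "est", "prov", "rev"]))
```

The unmapped list ("rev" in this example) is what a project would bring to the code list owners, instead of silently inventing a code.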

SDMX Project Steps

• Map data flows between organisations
  – Data formats
  – Reporting forms or tables
  – Mailbox/web
• List domain concepts for the entire domain
  – These become the Concept Scheme
• Define code lists
  – Codify all items using SDMX guidelines
  – Hierarchy can come later
• Decide whether each concept is a dimension or an attribute
  – A dimension uniquely identifies data
  – An attribute adds information to data, e.g. flags
• Create DSDs from the concepts, using the data flows
  – Each dimension grouping is a DSD
  – Ways to avoid many DSDs
• Pilot the DSDs
  – 1st pilot: reporters provide feedback on DSD structures
  – 2nd pilot: reporters send data, consumers process it
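The step "each dimension grouping is a DSD" can be sketched as grouping data flows by the set of dimensions they need: flows with identical sets are candidates for a shared DSD. The flows and dimension IDs below are hypothetical.

```python
from collections import defaultdict

# Hypothetical data flows and the dimensions each one needs.
DATA_FLOWS = {
    "GDP": ("FREQ", "REF_AREA", "SECTOR", "TIME_PERIOD"),
    "GOV_DEBT": ("FREQ", "REF_AREA", "SECTOR", "TIME_PERIOD"),
    "POPULATION": ("FREQ", "REF_AREA", "AGE", "TIME_PERIOD"),
}


def group_into_dsds(flows):
    """Group data flows that share the same set of dimensions (one group per DSD)."""
    groups = defaultdict(list)
    for flow, dims in flows.items():
        groups[frozenset(dims)].append(flow)
    return [sorted(names) for names in groups.values()]


print(group_into_dsds(DATA_FLOWS))
```

Comparing sets ignores dimension order; a real DSD still fixes one order for its series key.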

SDMX Main Tools

• SDMX Registry: directory of the structural metadata
• SDMX Converter: converts between formats (Excel, GESMES, CSV, etc.)
• SDMX Reference Infrastructure (SDMX-RI): SDMX export and mapping for an existing database
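At its core, converting a flat table into SDMX means grouping rows into series identified by their dimension values. The sketch below does only that step, with a hypothetical CSV extract and assumed dimension IDs; it does not use or reproduce the SDMX Converter itself.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat CSV extract: one row per observation.
CSV_TEXT = """FREQ,REF_AREA,TIME_PERIOD,OBS_VALUE
A,FR,2014,101.5
A,FR,2015,103.2
A,DE,2015,99.8
"""

SERIES_DIMENSIONS = ["FREQ", "REF_AREA"]  # assumed DSD order, time excluded


def csv_to_series(text):
    """Group flat rows into time series keyed by their dimension values."""
    series = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        key = ".".join(row[d] for d in SERIES_DIMENSIONS)
        series[key][row["TIME_PERIOD"]] = float(row["OBS_VALUE"])
    return dict(series)


print(csv_to_series(CSV_TEXT))
```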

SDMX Tools

<Demo of Global Registry>

Future of SDMX

• SDMX Validation Language
  – Automate a basic level of data validation, e.g. a + b = c
  – Transform data
• More standard code lists
  – E.g. seasonal adjustment
• Better, more reusable tools, e.g. mapping
  – Plug-and-play modules to transform and validate messages
• More guidelines and harmonised structures
  – Such as Global DSDs
• Use SDMX for reference metadata exchange
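A rule such as a + b = c is easy to state. The plain-Python check below illustrates what such a rule verifies across periods; it is not the syntax of the SDMX validation language, and the series names, tolerance and values are hypothetical.

```python
# Hypothetical series: the rule to check is a + b = c for every period.
DATA = {
    "a": {"2014": 10.0, "2015": 12.0},
    "b": {"2014": 5.0, "2015": 6.0},
    "c": {"2014": 15.0, "2015": 19.0},
}


def check_sum(data, parts, total, tolerance=0.05):
    """Return (period, sum_of_parts, reported_total) for each failing period.

    A period where one of the parts is missing is reported with a sum of None.
    """
    failures = []
    for period, expected in data[total].items():
        if any(period not in data[p] for p in parts):
            failures.append((period, None, expected))
            continue
        actual = sum(data[p][period] for p in parts)
        if abs(actual - expected) > tolerance:
            failures.append((period, actual, expected))
    return failures


print(check_sum(DATA, ["a", "b"], "c"))
```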

Thank you

Any questions?

David Barraclough OECD SDMX Coordinator