USE OF GEOSPATIAL AND WEB DATA FOR OECD STATISTICS CCSA SPECIAL SESSION ON SHOWCASING BIG DATA 1 OCTOBER 2015 Paul Schreyer Deputy-Director, Statistics Directorate, OECD

Page 1: Use of geospatial and web data for OECD statistics

USE OF GEOSPATIAL AND WEB DATA FOR OECD STATISTICS

CCSA SPECIAL SESSION ON SHOWCASING BIG DATA 1 OCTOBER 2015

Paul Schreyer Deputy-Director, Statistics Directorate, OECD

Page 2

OECD APPROACH

Page 3

• OECD:
  – Facilitator of discussion on new data sources for NSOs
  – OECD’s own use of new data sources

• From Big Data to Smart Data
  – Not every new data source is big
  – Not every big data source is new

Page 4

Business value analysis: why are we working on this?

• More granularity or coverage of existing data (e.g. spatial disaggregation)
• New output (e.g. measuring trust, inequalities)
• Greater timeliness – nowcasting
• Increased impact – analysis supporting OECD mission, possibility to link areas
• Increased responsiveness – capacity to address new topics quickly, respond to what-if questions

Presenter
Presentation Notes
Increased quality of output – e.g. more insightful policy analysis, better targeted recommendations: last week in London the Chief Economist and the SG launched the Finance and Inclusive Growth work with slides built on individual data; GOV used railway data to improve their definition of population linked to urban areas; STD is using Google Trends in their measure of subjective well-being.
New output – e.g. the production of new lines of work: social connectivity can be researched using LinkedIn (GOV); road connectivity and border effects (ECO); geographical welfare / inequality between cities.
Increased timeliness – e.g. a shorter time lag in publishing analysis and underlying data: nowcasting inflation and prices (STD, DCD); detecting perturbation events from Google Trends and SWIFT data (ECO).
Increased impact – e.g. analysis and underlying data are increasingly fit for purpose. Big data allows us to dive into the core of the OECD mission: helping governments foster prosperity and fight poverty, taking into account environmental and social development implications. STD measuring subjective well-being; DEV building indicators for civil tensions and political governance; DCD measuring social development; GOV air quality; assessing China’s skills gap and inequalities in education.
Increased responsiveness – e.g. increased ability to adapt flexibly to changing and/or newly arising user demands: ECO structural policy database, Product Market Regulation; TAD policy model on trade restrictions.
Page 5

Business process analysis: Necessary capabilities

– Capacity to identify, evaluate and access new data sources
– Command of methodology
– Proven quality and metadata frameworks
– Suitable IT infrastructures
– Established legal and ethical frameworks
– Skills and training capacity

Page 6

4 types of new sources and examples of use cases

Types of sources: web crawling and web scraping; content analysis; mobility studies; sensor and geospatial data

Examples of use cases:
* Online real estate prices (OECD GOV)
* Measuring trade restrictiveness by scraping and analysing trade laws (OECD TAD)
* African Economic Outlook (AEO): Civil tensions and political governance indicators (OECD DEV)
* Big Data Measures of Human Well-Being – Evidence from US Google Index (OECD STD)
* Measure transport reliability from geolocalisation logs (ITF)
* Air quality and land cover data (OECD GOV)
* Enriching the metropolitan database using geo-spatial data (OECD GOV)
* PIAAC log file data (OECD EDU)

Presenter
Presentation Notes
120 participants
Page 7

EXAMPLE 1: ENVIRONMENTAL INDICATORS
Using geospatial data (satellite data)

Page 8

Average population exposure to air pollution (PM2.5)

Key messages that the indicator should communicate
– Where air pollution is above recommended levels
– Where improvements in air quality have happened
– Linking air pollution to health

Presenter
Presentation Notes
First environmental indicator
Page 9

Source: Raster (satellite observations)

Satellite observations
• Raster: van Donkelaar et al. (2014)
• Resolution: ~10 km2
• Years: 1998-2012

Advantages

Ground-based stations
• Direct measures
• Offer regular levels of air pollution over time
• More pollutants are available

Satellite observations
• Global coverage
• Consistent method to compute air pollution in cities, regions and countries
• Consistent time-series data, spanning more than a decade

Disadvantages

Ground-based stations
• Low coverage in developing countries
• Uneven coverage within and across countries
• PM2.5 concentration rarely monitored
• Site selection, measurement techniques, and reporting methods differ across regions and countries

Satellite observations
• Modelled data
• Satellite observations are less precise for bright surfaces (snow or desert)
• Current data are on a multi-year average, evaluation of short-term events often unavailable

Presenter
Presentation Notes
Data was made available only in 2014 – OECD estimates June 2014
Data is open, not commercial
Data comes processed in terms of annual averages of pollution
Page 10

Basic methodology

1. The satellite-based values of air pollution are multiplied by the population living in the area (using a 1km2 resolution grid)

2. The exposure to air pollution in a region is given by the sum of the population weighted values of PM2.5 in the 1km2 grid cells falling within the boundaries of the region

3. Finally, dividing this aggregated value by the total population in the region, we obtain the average exposure to PM2.5 concentration in a region
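
The three steps of the basic methodology can be sketched in a few lines of Python. This is an illustrative sketch only, not the OECD implementation: the function name `average_exposure`, the toy 2x2 grids and the boolean region mask are assumptions made for the example, and it presumes that the PM2.5 values have already been resampled onto the same grid as the population counts.

```python
def average_exposure(pm25, population, in_region):
    """Population-weighted average PM2.5 for one region.

    pm25, population and in_region are same-shaped grids (lists of rows).
    in_region marks the grid cells that fall within the region boundary.
    """
    total_weighted = 0.0
    total_population = 0.0
    for pm_row, pop_row, mask_row in zip(pm25, population, in_region):
        for pm, pop, inside in zip(pm_row, pop_row, mask_row):
            if inside:
                # Step 1: concentration multiplied by population in the cell.
                # Step 2: summed over the cells inside the region.
                total_weighted += pm * pop
                total_population += pop
    if total_population == 0:
        raise ValueError("region has no population")
    # Step 3: divide by the total population of the region.
    return total_weighted / total_population


# Hypothetical toy data: a 2x2 grid, with three cells inside the region.
pm25 = [[10.0, 20.0], [30.0, 40.0]]
population = [[100, 300], [0, 600]]
in_region = [[True, True], [True, False]]

exposure = average_exposure(pm25, population, in_region)
print(exposure)
```

Cells with no residents contribute nothing to the result, which is what distinguishes population exposure from a simple area average of concentrations.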

Page 11

• 68% of the urban population in OECD countries (376 million people) are exposed to pollution above the WHO’s recommended levels.

• OECD estimates show wide variation in PM2.5 exposure levels across cities within countries, the largest in Mexico, Italy, Japan and Korea

Levels and trends in OECD cities

[Figure: chart by country (no. of cities) showing metropolitan minimum, country average and metropolitan maximum; city labels and axis ticks not reproduced]

Source: Brezzi and Sanchez-Serra (2014)

Presenter
Presentation Notes
The conclusion is not that satellite data should replace ground monitoring everywhere. Rather, the two are complementary: satellite data fill gaps, add granularity and cross-check measurement.
Page 12

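Other example: raster sources used for land cover

Europe
• Raster name: Corine land cover
• Resolution: 25 metres
• Years: 2000-06
• Classification of urban land: 44 land urban classes

USA
• Raster name: National land cover dataset (NLCD)
• Resolution: 30 metres
• Years: 2001-06
• Classification of urban land: 21 land cover classes

Japan
• Raster name: Japan National Land Service Information data
• Resolution: 100 metres
• Years: 1997-2006
• Classification of urban land: 11 land cover classes

World
• Raster name: MODIS 500 Map of Global Urban Extent
• Resolution: 500 metres
• Years: 2008
• Classification of urban land: 17 land cover classes, Water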

Presenter
Presentation Notes
Several data sources have been used to construct this indicator. More detailed rasters (satellite data) were available for the EU, US and Japan; other countries were computed using a less detailed data source (MODIS), which covers the entire world.
Page 14

EXAMPLE 2: TRADE POLICY ANALYSIS
Using qualitative data from government websites

Page 15

Basic idea

Traditionally:
• Policy questionnaires to countries
• ‘Manual’ screening of government websites

New:
• Machine-based monitoring of government websites
• Automatic check for changes or addition of rules and regulations

Test case: qualitative information for the OECD’s trade restrictiveness information and index
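
One simple way to automate the check for changed or added rules is to store a fingerprint of each monitored page and compare it on the next visit. The sketch below is an illustration under assumptions, not the tool used in the OECD test case: the function names, the example URLs and the choice of SHA-256 over whitespace-normalised text are all hypothetical, and page fetching is left out so the example runs offline.

```python
import hashlib


def fingerprint(text):
    """Hash of the page text, ignoring differences in whitespace."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: dict mapping URL -> fingerprint from the last run.
    current:  dict mapping URL -> page text fetched in this run.
    """
    report = {"added": [], "changed": [], "unchanged": []}
    for url, text in current.items():
        if url not in previous:
            report["added"].append(url)
        elif previous[url] != fingerprint(text):
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    return report


# Hypothetical stored state and a hypothetical new crawl.
stored = {
    "https://example.org/law-1": fingerprint("Article 1. Imports require a licence."),
    "https://example.org/law-2": fingerprint("Article 2. Tariffs apply."),
}
fetched = {
    "https://example.org/law-1": "Article 1.  Imports require a licence.",
    "https://example.org/law-2": "Article 2. Tariffs apply to all goods.",
    "https://example.org/law-3": "Article 3. New rule.",
}

report = detect_changes(stored, fetched)
print(report)
```

A fingerprint only says that a page changed, not where; pages flagged as changed or added would then go to a text comparison step.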

Page 16

How?

Text comparison - Initial discovery
• Run a text comparison between the original document and the new updated document
• Detect and flag specific paragraphs changed or updated inside long documents

Text comparison - Advanced discovery
• Changes in rules and regulations can also happen through new pages
• Use ‘big data’ techniques to compare in-house structured information to the universe of laws and regulations in a given country
• Work on text definitions similar to the original ones to help identify potentially relevant documents
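
The initial discovery step, flagging the paragraphs that changed inside a long document, can be illustrated with Python's standard `difflib` module. This is a minimal sketch under assumptions: paragraphs are taken to be separated by blank lines, the legal texts are invented, and the helper names `split_paragraphs` and `flag_changed_paragraphs` are hypothetical.

```python
import difflib


def split_paragraphs(text):
    """Split a document into paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def flag_changed_paragraphs(original, updated):
    """Return (operation, old_paragraphs, new_paragraphs) for each difference."""
    old = split_paragraphs(original)
    new = split_paragraphs(updated)
    # Compare the two documents paragraph by paragraph.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    flags = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            flags.append((tag, old[i1:i2], new[j1:j2]))
    return flags


# Invented example texts.
original = (
    "Article 1. Foreign equity is capped at 49%.\n\n"
    "Article 2. Licences are issued annually.\n\n"
    "Article 3. Appeals go to the ministry."
)
updated = (
    "Article 1. Foreign equity is capped at 25%.\n\n"
    "Article 2. Licences are issued annually.\n\n"
    "Article 3. Appeals go to the ministry.\n\n"
    "Article 4. Data must be stored locally."
)

flags = flag_changed_paragraphs(original, updated)
for operation, old_paragraphs, new_paragraphs in flags:
    print(operation, old_paragraphs, new_paragraphs)
```

Comparing whole paragraphs keeps the output short enough for an analyst to review; a word-level diff could then be run only on the flagged paragraphs.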

Page 17

IT Tools

Web-crawling: scripts to systematically scan governmental websites where regulations can be found (federal, provincial, regional, etc.).

Web-scraping: scripts to extract the relevant information in documents, possibly based on articles and paragraphs (text analysis).

Document conversion: most laws and regulations are published as PDF, and some in other formats; all need to be converted to text documents before text analysis can run.

Text comparison: tools and dictionaries to compare the text of updated documents with the original text and to calculate similarity coefficients with other documents, in a variety of languages, with the option to also use proximity of similar words.
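
As an illustration of what a similarity coefficient between documents can look like, the sketch below computes cosine similarity on word counts and ranks candidate documents against a reference definition. The slides do not say which coefficient was used, so cosine similarity is an assumption here, as are the function names and the example texts; dictionaries and word proximity are not covered.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a, counts_b = Counter(tokens(text_a)), Counter(tokens(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(reference, candidates):
    """Sort candidate documents by similarity to the reference, highest first."""
    scored = [(cosine_similarity(reference, text), name)
              for name, text in candidates.items()]
    return sorted(scored, reverse=True)


# Invented reference definition and candidate documents.
reference = "foreign equity restrictions in telecommunications services"
candidates = {
    "decree_a": "Decree on foreign equity restrictions in telecommunications",
    "decree_b": "Regulation on fishing quotas in coastal waters",
}

ranking = rank_candidates(reference, candidates)
for score, name in ranking:
    print(name, round(score, 3))
```

Documents scoring above a chosen threshold would be passed on for manual review; the threshold itself would need to be calibrated on known cases.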

Page 18

Promising results on French legal texts (Legifrance)

Web scraping / Text analysis

Page 19

Summary

• Significant potential
• Use cases and pilots provide really important reality checks
• Smart data and multiple sources, not necessarily big data
• Initiatives have sprung up in many parts of OECD
• Need to be accompanied by overall strategy being developed at OECD

Page 20

Thank you!