Applications & Implications of Big Data for Official Statistics - Emmanuel Letouzé
Applications &
Implications of
Big Data for
Official Statistics
Emmanuel Letouzé
Director & co-Founder
Data-Pop Alliance
DfID, London, February 26, 2015
1. The Emergence of Big Data &
The Statistical Tragedy
Framing and surfacing of the issue
2. Big Data and Official Statistics:
Substitute, Complement, or “It’s complicated”?
3. The Case of the SDGs
A story of fish and fishermen
1. The Emergence of Big Data
vs. The Statistical Tragedy
Framing and surfacing of the issue
Hal Varian’s nowcasting, GDP and light emissions paper…
Line shows returns for “Big Data” on Google Trends between 2007 and 2014; 100=maximum value
“We are at
the beginning
of what I call
The Industrial
Revolution of Data.”
Joe Hellerstein, November 19, 2008
Context: the Big Data rush
* Source: Oxfam International, citing Credit Suisse, Jan. 2014
“Data is the new oil”
Google Flu Trends: rise and fall
Hope or Hype?
2. Big Data & Official Statistics
It’s complicated. Or complex.
What is Big Data? 2010-12: the 3 Vs of big data
i. Exhaust
ii. Web
iii. Sensing
Crumbs
Capacities
Communities
What is Big Data? Now: the 3 Cs of Big Data
Movement of an individual in Rwanda over 4 years using CDRs (Source J. Blumenstock, 2010)
The new data ecosystem
1. Early warning
2. Real-time awareness
3. Real-time feedback
Source: Letouzé, 2012
“What can it be used for?”—Taxonomy of applications (1)
1. Descriptive: e.g. maps, clouds
2. Predictive: forecasting, inference
3. Prescriptive: causal inference
Source: Letouzé, Vinck and Meier, 2013
“What can it be used for?”—Taxonomy of applications (2)
National Statistical Institutes carry out surveys
Telefonica team used their data to ‘predict’ SELs from cell phone usage
Predict the present (SELs for non-surveyed regions) and monitor the future (track changes over time)
Survey from “a major city in Latin America”
Source: “Prediction of Socio-Economic Levels Using Cell-Phone Records” (Telefonica research, 2011)
‘Predicting’ socioeconomic levels?
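The workflow on this slide (learn from surveyed regions, then 'predict' SELs for non-surveyed ones) can be sketched in a few lines. This is only an illustrative sketch, not the method of the Telefonica paper: it swaps in a simple nearest-centroid classifier, and the feature names, region names, and all numbers are made up for the example.

```python
from collections import defaultdict
from math import dist


def fit_centroids(features, labels):
    """Average the feature vectors of each socio-economic level (SEL)."""
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Assign the SEL whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical surveyed regions: (weekly calls, monthly top-up) per region,
# with the SEL that a survey would have measured there.
surveyed_features = [(10, 2), (12, 3), (40, 15), (45, 18), (80, 40), (90, 45)]
surveyed_sels = ["low", "low", "middle", "middle", "high", "high"]

centroids = fit_centroids(surveyed_features, surveyed_sels)

# Hypothetical non-surveyed regions: only phone-usage features are known.
non_surveyed = {"region_x": (11, 2), "region_y": (50, 20), "region_z": (85, 42)}
predicted = {name: predict(centroids, x) for name, x in non_surveyed.items()}
print(predicted)
```

Re-running the same prediction on later phone records would give the "monitor the future" part, under the strong assumption that the relationship between usage and SEL stays stable over time.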
Promoting a people-centered Big Data Revolution
Counting people?
Sample bias correction
Then:
blending of hypothesis-based vs. supervised machine learning methods to model bias
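One simple hypothesis-based correction is post-stratification: group-level estimates from the biased sample are reweighted by each group's known share of the population. The sketch below is an assumption-laden illustration, not the approach behind the slide; the groups, shares, and values are invented.

```python
def poststratify(sample_means, sample_shares, population_shares):
    """Return (naive, corrected) estimates of a population mean.

    naive weights each group by its share of the biased sample;
    corrected weights each group by its share of the true population.
    """
    naive = sum(sample_means[g] * sample_shares[g] for g in sample_means)
    corrected = sum(sample_means[g] * population_shares[g] for g in sample_means)
    return naive, corrected


# Hypothetical numbers: phone users over-represent urban residents.
sample_means = {"urban": 0.8, "rural": 0.4}        # some indicator per group
sample_shares = {"urban": 0.75, "rural": 0.25}     # composition of phone data
population_shares = {"urban": 0.5, "rural": 0.5}   # e.g. from a census

naive, corrected = poststratify(sample_means, sample_shares, population_shares)
print(round(naive, 3), round(corrected, 3))
```

The correction is only as good as the population shares and the assumption that phone users in a group resemble non-users in the same group, which is where a learned bias model could come in.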
Source: Letouzé, 2014, based on primary and secondary sources
What & how much do we know?
….and does it matter?
Poverty prevalence 1990-2030: Fragile States vs. Non-Fragile States
Are official statistics ever more than shadows in
the cave? If so what are they good for?
“Official statistics assumes a key role in ensuring democracy and fostering social progress… [should] provide society with knowledge of itself”
Enrico Giovannini, former President of Istat, Co-Chair of the IEAG on the Data Revolution
“Knowledge is power; statistics is democracy”
Former President of Statistics Finland
(2) Official
Statistics as
systems—not
reducible to
producing (1)
(1) Official
Statistics as
data—entirely
defined as
product of (2)*
* According to Fundamental Principles of Official Statistics
What is/are official statistics?
Official
Statistics
(1) Ensure that
societies benefit from
“knowledge of itself”
(according to some
political and
technical standards)
(2) Ensure that
societies benefit
from the presence
of a deliberative
public space
What is/are the purpose(s)
of official statistics?
(2) Ensure that societies benefit from the
presence of a deliberative public
space
It’s complex—and it’s political
(1) Ensure that societies benefit from “knowledge of itself” (according to some
political and technical standards)
Official
Statistics
Big Data
Source: Letouzé, 2014, based on primary and secondary sources
(How) can data reduce poverty?
Poverty prevalence 1990-2030: Fragile States vs. Non-Fragile States
…by that Gary King means: it’s about the analytics
Jonathan Glennie, The Guardian, Oct 3, 2013
3. The case of the SDGs
A story of fish and fishermen
| SDGs adopted by the OWG | Big data examples | What is monitored | How is monitored | Country(ies) | Year | Advantages of using big data |
|---|---|---|---|---|---|---|
| 1. Poverty eradication | Satellite data to estimate poverty [i] | Poverty | Satellite images, night-lights | Global map | 2009 | International comparable data, which can be updated more frequently |
| | Estimating poverty maps with cell-phone records [ii] | Poverty | Cell phone records | Cote d’Ivoire | 2013-4 | |
| | Internet-based data to estimate consumer price index and poverty rates [iii] | Price indexes | Online prices at retailers websites | Argentina | 2013 | Cheaper data available at higher frequencies |
| | Cell-phone records to predict socio-economic levels [iv] | Socio-economic levels | Cell phone records | City in Latin America | 2011 | Data available more regularly and cheaper than official data; informal economy better reflected |
| 2. End hunger, achieve food security and improved nutrition, and promote sustainable agriculture | Mining Indonesian Tweets to understand food price crises [v] | Food price crises | Tweets | Indonesia | 2014 | |
| | Uses indicators derived from mobile phone data as a proxy for food security indicators [vi] | Food security | Cell phone data and airtime credit purchases | A country in Central Africa | 2014 | |
| | Use of remote-sensing data for drought assessment and monitoring | Drought | Remote sensing | Afghanistan, India, Pakistan [vii] | 2004 | |
| | | | | China [viii] | 2008 | |
| 3. Health | Internet-based data to identify influenza breakouts [ix] | Influenza | Google search queries | US | 2009 | Real-time data; captures disease cases not officially recorded; data available earlier than official data |
| | Data from online searches to monitor influenza epidemics [x] | Influenza | Online searches data | China | 2013 | |
| | Detecting influenza epidemics using twitter [xi] | Influenza | Twitter | Japan | 2011 | |
| | Monitoring influenza outbreaks using twitter [xii] | Influenza | Twitter | US | 2013 | |
| | Systems to monitor the activity of influenza-like-illness with the aid of volunteers via the internet [xiii, xiv] | Influenza | Voluntary reporting through the internet | Belgium, Italy, Netherlands, Portugal, United Kingdom, United States | ongoing | |
| | Cell-phone data to model malaria spread [xv] | Malaria | Cell-phone data | Kenya | 2012 | |
| | Using social and news media to… | Cholera | Social and news… | Haiti | 2012 | |
SDG monitoring & Big Data
SDG achievement & Big Data