Eurostat Web activity evidence to increase timeliness of official statistics IAOS 2014 8 – 10 October


Eurostat

Web activity evidence to increase timeliness of official statistics

IAOS 2014, 8 – 10 October


My definition of big data

• Data deluge
  • Larger, faster, more (a.k.a. Volume, Velocity, Variety)
• Everything is data
  • Text, sound, images, video
• Analytics
  • Predictive analytics
  • Ex: Google Translate, voice recognition, suggestion systems, health applications
• The new data product par excellence
  • Official statistics: chances of getting a new job
• An emergent market


ESS Big Data action plan

• Scheveningen memorandum
• Action plan adopted by European Statistical System Committee
• Strategy
  • Pilots, three time horizons roadmap, review as needed
• Areas
  • Policy, Communication, Big data sources, Applications / pilots, Methods, Quality, IT infrastructure, Skills, Experience sharing, Legislation, Governance
• http://www.cros-portal.eu/content/ess-big-data-action-plan-and-roadmap-10


Past experiences

• 2005: Association between web activity and unemployment identified

• 2006: Google Trends
• 2008: Google Flu Trends (GFT)
• 2009: GFT underestimated official figures
  • 1st revision of GFT model
• 2013: GFT overestimated flu peak values
  • 2nd revision of GFT model

• 2014: Backlash against big data


Data Source: Google Trends (www.google.com/trends).


Weekly influenza-like illness (ILI) surveillance and Google Flu Trends (GFT) search query estimates, June 2003–March 2013

Olson DR, Konty KJ, Paladini M, Viboud C, et al. (2013) Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales. PLoS Comput Biol 9(10)

License: Creative Commons CC0 public domain dedication


Source: Financial Times Magazine (2014).


Lessons from GFT

• Premature release of a statistical product can harm its reputation
• Avoid big data hubris
• Frequent changes to Google's search algorithms affect the validity of models
• We need transparency and replicability
  • GFT search terms are unknown
  • Google Trends (GT) is based on a sample whose sampling methodology is unknown


Other sources of web activity

• Wikipedia page views
  • Flu
• Twitter
  • International and internal migration flows
• Possibly others
  • Visits to particular websites


How to introduce web activity data in official flash estimates?

• Launch a larger-scale, balanced study

• Negative results are not normally published

• Purpose: guide decisions on investment


How to introduce web activity data in official flash estimates?

• Diversification and assessment of the web activity data sources
  • NSIs lack control of the source
    • Black box
    • Inability to guarantee that there was no manipulation
    • Breaks in series
    • Lack of continuity
  • Diversify the sources
  • Revision of prediction models
  • Accreditation and certification


How to introduce web activity data in official flash estimates?

• Integration of web activity data with traditional official statistics sources
  • Official statistics should not simply reproduce what others can do, but should instead make use of their specific comparative advantages
  • We are the original producers of the data and know their details
  • Use more detail than what is published
  • Traditional methods (surveys)
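
One way to picture this integration is a flash estimate that starts from the official series' own history and adds a web activity index as an extra predictor. The following is only a minimal sketch under stated assumptions: the data are synthetic, the AR(1) baseline and the function names (`fit_ols`, `out_of_sample_rmse`) are illustrative choices, and nothing here is taken from the slides or from any Eurostat method.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def out_of_sample_rmse(y, web, n_test, use_web):
    """Rolling one-step-ahead RMSE over the last n_test periods.

    Baseline model:  y[t] ~ 1 + y[t-1]
    Augmented model: y[t] ~ 1 + y[t-1] + web[t]
    web[t] is assumed to be available before the official figure y[t].
    """
    errors = []
    for t in range(len(y) - n_test, len(y)):
        # Training targets are y[1..t-1]; predictors are aligned to them.
        cols = [np.ones(t - 1), y[:t - 1]]
        if use_web:
            cols.append(web[1:t])
        coef = fit_ols(np.column_stack(cols), y[1:t])
        x_new = [1.0, y[t - 1]] + ([web[t]] if use_web else [])
        errors.append(y[t] - np.dot(coef, x_new))
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic data (an assumption for illustration only): the target depends
# on its own past and on the contemporaneous web index.
rng = np.random.default_rng(0)
n = 120
web = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * web[t] + rng.normal(scale=0.3)

rmse_baseline = out_of_sample_rmse(y, web, n_test=24, use_web=False)
rmse_web = out_of_sample_rmse(y, web, n_test=24, use_web=True)
print(f"baseline RMSE: {rmse_baseline:.3f}, with web index: {rmse_web:.3f}")
```

Judging the web index by its out-of-sample gain over a baseline built from the official series alone keeps the comparison honest: the web data has to add something beyond what the producer already knows.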


How to introduce web activity data in official flash estimates?

• Research on relation between web activity and the phenomena being predicted

• Remember lesson from GFT

• Do not confuse web activity with the phenomenon itself
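
A first, purely exploratory step in such research could be to check at which lag a web activity series correlates most strongly with the target series. The sketch below assumes synthetic data and a hypothetical helper `lagged_correlations`; it is an illustration, not a method described in the slides.

```python
import numpy as np

def lagged_correlations(web, target, max_lag):
    """Pearson correlation between web[t - lag] and target[t], for lag = 0..max_lag."""
    web = np.asarray(web, dtype=float)
    target = np.asarray(target, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        x = web[: len(web) - lag]   # web activity, shifted back by `lag` periods
        y = target[lag:]            # target observed `lag` periods later
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result

# Synthetic data (an assumption for illustration only): the target follows
# the web index with a delay of two periods, plus noise.
rng = np.random.default_rng(1)
web = rng.normal(size=200)
target = np.empty(200)
target[:2] = rng.normal(size=2)
target[2:] = web[:-2] + rng.normal(scale=0.5, size=198)

corrs = lagged_correlations(web, target, max_lag=4)
best_lag = max(corrs, key=corrs.get)
print(corrs, "strongest at lag", best_lag)
```

A high correlation at some lag only says that the two series moved together in the sample. It does not show that the relation is stable, which is the point of the GFT lesson above.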


How to introduce web activity data in official flash estimates?

• Joint effort on the development of appropriate prediction models

• Learn from each other

• Transparency

• International comparability