Post on 31-Mar-2015

An Investigation into Using Google Trends as an Administrative Data Source in ONS

Daniel Ayoubkhani

Time Series Analysis Branch

Survey Methodology and Statistical Computing Division

Office for National Statistics, UK

Overview

1. Introduction to Google Trends

2. Using Google Trends Data

An Investigation Conducted by ONS:

3. Data

4. Methods

5. Results

6. Conclusions and Considerations

1. Introduction to Google Trends

• Google provide weekly data on changes in search query share (rather than volume)
  • need to convert to levels and aggregate to months/quarters

• Data are available:
  • back to the start of January 2004
  • for individual search queries, 25 top-level categories and hundreds of lower-level categories
  • for free, to anyone with a Gmail account, from www.google.com/trends
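
The conversion from weekly changes to monthly levels could be sketched as follows. This is only an illustration under stated assumptions: each weekly value is taken to be a percentage change from the previous week, each week is assigned to the calendar month in which it starts, and the function names `changes_to_levels` and `monthly_means` are hypothetical. If the published changes are instead relative to a fixed base week, the chaining step would be replaced by a direct rescaling.

```python
from collections import defaultdict
from datetime import date


def changes_to_levels(pct_changes, base=100.0):
    """Chain week-on-week percentage changes into an index that starts at `base`.

    Assumption: each entry is the % change relative to the previous week.
    The result has one more element than `pct_changes` (the base week).
    """
    levels = [base]
    for change in pct_changes:
        levels.append(levels[-1] * (1.0 + change / 100.0))
    return levels


def monthly_means(week_starts, levels):
    """Average weekly levels by the calendar month in which each week starts.

    Assumption: a week belongs wholly to the month of its start date.
    """
    buckets = defaultdict(list)
    for start, level in zip(week_starts, levels):
        buckets[(start.year, start.month)].append(level)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Illustrative (made-up) input: three weeks, two week-on-week changes.
weeks = [date(2004, 1, 4), date(2004, 1, 11), date(2004, 2, 1)]
levels = changes_to_levels([10.0, -50.0])
monthly = monthly_means(weeks, levels)
```

Quarterly aggregation would follow the same pattern with a `(year, quarter)` key.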

1. Introduction to Google Trends

Source: Google Trends

Example – Google searches for “statistics”

1. Introduction to Google Trends

Example – Search query to top level classification:

“statistics”

Demographics

Social Sciences

Reference

Poverty & Hunger

Social Issues & Advocacy

People & Society

2. Using Google Trends Data

• Choi, H and Varian, H (2009), Predicting the Present with Google Trends:
  • paper pioneered the use of Google Trends data as a nowcasting tool for economic variables
  • fitted log-linear models to US retail, automotive and home sales
  • predictive performance of the models improved when Google Trends terms were included

• Many studies using Google Trends data to predict economic variables have been published since then

2. Using Google Trends Data

• Potential uses of Google Trends (GT) data identified by ONS:
  1. Quality assurance of outputs
  2. Nowcasting of outputs
  3. Replacement of existing data sources

• Focus of this investigation: quality assurance of the UK Retail Sales Index (RSI)

2. Using Google Trends Data

Aims of this investigation:
• Fit benchmark models that are representative of current ONS practice
• Fit alternative models that include appropriate GT terms as predictors
• Compare models using empirical measures
• Draw conclusions to inform ONS strategy

3. Data - Retail Sales Index

• All Retail Sales
• Non-Specialised Food Stores
• Non-Specialised Non-Food Stores
• Textiles, Clothing and Footwear
• Furniture and Lighting
• Home Appliances
• Hardware, Paints and Glass
• Audio and Video Equipment and Recordings
• Books, Newspapers and Stationery
• Computers and Telecommunications
• Non-Store Retailing

3. Data - Retail Sales Index

Source: ONS

3. Data - Google Trends

• All extracted GT time series:
  • represent weekly UK search activity
  • start in January 2004
  • end in July 2011

• Each RSI series matched with:
  • at least one GT search category
  • top five search queries within each category

3. Data - Google Trends

RSI Series: Furniture and Lighting

Google Trends category           Top 5 Google Trends queries
Lighting                         lighting, light, lights, lamp, lamps
Home and Garden                  furniture, ikea, garden, b&q, homebase
Homemaking and Interior Decor    blinds, curtains, curtains curtains curtains, bedroom
Home Furnishings                 furniture, ikea, beds, lighting, table table

4. Methods - Benchmark Models

RegARIMA (linear regression + ARIMA noise)
• Regression terms capture deterministic effects:
  • inconsistent survey periods due to 4-4-5 design
  • moving holidays (e.g. Easter)
  • additive outliers and level shifts

• ARMA terms capture autocorrelation in the regression residuals

• Non-stationarity handled via log transformation and differencing

• Models automatically identified and estimated using X-12-ARIMA
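
For reference, a seasonal regARIMA model of the kind described above is usually written in the following general form. The notation is an assumption made for this note rather than taken from the slides: y_t is the RSI series, x_{it} are the regression variables (calendar, holiday and outlier terms), B is the backshift operator, s is the seasonal period (12 for monthly data), and d and D are the orders of regular and seasonal differencing.

```latex
\phi(B)\,\Phi(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}
\Bigl(\log y_t - \sum_{i} \beta_i x_{it}\Bigr)
= \theta(B)\,\Theta(B^{s})\,\varepsilon_t ,
\qquad \varepsilon_t \sim \text{i.i.d. } N(0, \sigma^{2})
```

Here \phi and \Phi are the non-seasonal and seasonal autoregressive polynomials, and \theta and \Theta the corresponding moving-average polynomials.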

4. Methods - Alternative Models

Benchmark models extended with (log-transformed, differenced) GT variables:
• “Forced” static relationships estimated for all series
• Lagged relationships identified from cross-correlation plots of pre-whitened series:
  1. fit ARIMA models to all RSI and GT series
  2. correlate each RSI residual series with each of its corresponding GT residual series (i.e. remove trend and seasonality and correlate the shocks)
• Relationships identified at more than one lag modelled both individually and together
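
The correlation step can be sketched as below. This is a minimal illustration, not the procedure used in the investigation: it assumes the two residual series have already been obtained by fitting ARIMA models (step 1), it flags lags using the common approximate bound of 2/sqrt(n), and the function names `cross_correlation` and `significant_lags` are hypothetical.

```python
import math


def cross_correlation(gt_resid, rsi_resid, max_lag):
    """Sample cross-correlation between rsi_resid[t] and gt_resid[t - k], k = 0..max_lag.

    Both inputs are assumed to be pre-whitened (ARIMA residual) series
    of equal length.
    """
    n = len(gt_resid)
    if n != len(rsi_resid):
        raise ValueError("series must have the same length")
    mean_gt = sum(gt_resid) / n
    mean_rsi = sum(rsi_resid) / n
    scale_gt = math.sqrt(sum((v - mean_gt) ** 2 for v in gt_resid))
    scale_rsi = math.sqrt(sum((v - mean_rsi) ** 2 for v in rsi_resid))
    ccf = {}
    for k in range(max_lag + 1):
        cov = sum(
            (gt_resid[t - k] - mean_gt) * (rsi_resid[t] - mean_rsi)
            for t in range(k, n)
        )
        ccf[k] = cov / (scale_gt * scale_rsi)
    return ccf


def significant_lags(ccf, n_obs):
    """Lags whose correlation exceeds the approximate 2/sqrt(n) bound."""
    bound = 2.0 / math.sqrt(n_obs)
    return [k for k, r in ccf.items() if abs(r) > bound]


# Made-up residuals in which the RSI shock follows the GT shock by one period.
gt = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0, 0.0, 3.0, -3.0, 0.0]
rsi = [0.0] + gt[:-1]
ccf = cross_correlation(gt, rsi, max_lag=3)
lags = significant_lags(ccf, len(gt))
```

Each lag flagged in this way would then correspond to a candidate lagged GT regressor in the alternative model.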

4. Methods - Alternative Models

Example – Furniture and Lighting vs “garden”:

5. Results - Initial Analysis

Component of the RSI                   % alt. models with AICC    % GT terms significant
(no. alternative models fitted)        lower than benchmark       at 5% level
All Retail Sales (8)                   0.0                        37.5
Non-Specialised Food Stores (6)        0.0                        0.0
Non-Specialised Non-Food Stores (6)    0.0                        83.3
Textiles, Clothing & Footwear (23)     30.4                       36.0
Furniture & Lighting (31)              90.3                       78.8
Home Appliances (7)                    14.3                       0.0
Hardware, Paints & Glass (6)           50.0                       100.0
Audio & Video Equipment (44)           43.2                       51.0
Books, Newspapers & Stationery (6)     16.7                       100.0
Computers & Telecommunications (31)    9.7                        15.2
Non-Store Retailing (7)                42.9                       42.9

5. Results - Initial Analysis

• Furniture and Lighting – top three models in terms of AICC:

GT term in model              Lag(s)    GT category                    AICC
lighting                      0         Home Furnishings               412.47
curtains curtains curtains    0 & 1     Homemaking & Interior Decor    414.76
lights                        0         Lighting                       415.63
Benchmark                                                              432.29
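
As a reminder of how the comparison criterion works, the sketch below computes the small-sample corrected AIC from a log-likelihood and ranks models by it (lower is better). The function names `aicc` and `rank_by_aicc` are hypothetical, and the log-likelihood, parameter count and sample size in the first call are made up for illustration. The ranking uses the AICC values reported in the table above.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion.

    AICC = -2 log L + 2p + 2p(p + 1) / (n - p - 1),
    where p is the number of estimated parameters and n the number
    of observations used in the likelihood.
    """
    if n_obs - n_params - 1 <= 0:
        raise ValueError("too few observations for this many parameters")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)


def rank_by_aicc(scores):
    """Return (model, AICC) pairs sorted from best (lowest) to worst."""
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up inputs, purely to show the calculation.
example = aicc(log_likelihood=-200.0, n_params=3, n_obs=80)

# AICC values as reported for Furniture and Lighting.
reported = {
    "lighting": 412.47,
    "curtains curtains curtains": 414.76,
    "lights": 415.63,
    "Benchmark": 432.29,
}
ranking = rank_by_aicc(reported)
```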

5. Results - More Recent Analysis

• Focused on GT search categories due to the transient nature of popular search queries

• Compared models using out-of-sample, one-step-ahead predictions:
  • relies on having a sufficient number of observations for initial fitting
  • 24 periods: May 2010 to April 2012
  • only calculated for models with significant GT terms
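
The evaluation scheme above can be sketched as a rolling-origin loop scored by mean absolute percentage error (MAPE). This is an illustration only: in the investigation the model refitted at each origin would be a regARIMA model, whereas here a naive "last value" forecaster stands in for it, and the names `one_step_ahead`, `mape` and `naive_forecast` are hypothetical.

```python
def mape(actuals, predictions):
    """Mean absolute percentage error, in percent."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actuals, predictions))
    return 100.0 * total / len(actuals)


def one_step_ahead(series, forecast_next, n_holdout):
    """Predict each of the last `n_holdout` points from the data before it.

    `forecast_next` takes the history up to the forecast origin and
    returns a prediction for the next period.
    """
    start = len(series) - n_holdout
    if start < 1:
        raise ValueError("not enough observations for initial fitting")
    predictions = [forecast_next(series[:origin]) for origin in range(start, len(series))]
    return series[start:], predictions


def naive_forecast(history):
    """Stand-in forecaster (assumption): repeat the last observed value."""
    return history[-1]


# Made-up series with the last two periods held out.
actuals, predictions = one_step_ahead([100.0, 110.0, 121.0, 110.0], naive_forecast, 2)
score = mape(actuals, predictions)
```

Running the same loop for the benchmark and for each alternative model, over the same holdout periods, gives directly comparable MAPE values.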

5. Results - More Recent Analysis

Component of the RSI               MAPE of            MAPE of best         No. alt. models with MAPE
                                   benchmark model    alternative model    lower than benchmark
All Retail Sales                   -                  -                    -
Non-Specialised Food Stores        -                  -                    -
Non-Specialised Non-Food Stores    2.01               1.87                 1/1
Clothing & Footwear                2.70               1.80                 1/2
Furniture & Lighting               3.78               2.89                 7/7
Home Appliances                    5.20               4.30                 4/4
Hardware, Paints & Glass           4.90               4.07                 4/4
Audio & Video Equipment            4.03               3.46                 3/9
Books, Newspapers & Stationery     3.71               3.55                 1/3
Computers & Telecoms               7.76               6.21                 5/8
Non-Store Retailing                3.25               3.24                 1/1

5. Results - More Recent Analysis

Furniture and Lighting:

GT search category                         Lag(s)    MAPE
[Lamps & Lighting] + [Rugs & Carpets]      0, 0      2.89
Home Furnishings                           0         2.90
Lamps & Lighting                           0         2.97
Rugs & Carpets                             0         3.19
Sofas & Chairs                             0         3.29
Homemaking & Interior Decor                0         3.56
Clocks                                     0         3.65
Benchmark                                            3.78

6. Conclusions and Considerations

• Promising results for some RSI components...
  • Furniture and Lighting
  • Hardware, Paints and Glass
  • Audio and Video Equipment and Recordings

• ...but less so for others
  • All Retail Sales
  • Non-Specialised Food Stores
  • Non-Specialised Non-Food Stores

• Additional information is only useful when the RSI series is not dominated by trend and seasonality

6. Conclusions and Considerations

1. GT variable selection
• millions of potential explanatory variables
• need for automation
• Google Correlate
• popularity of search queries is transitory:

Home Improvement - top 5 search queries

August 2011    August 2012
b&q            doors
homebase       paint
b q            flooring
b and q        tiles
diy            homebase

6. Conclusions and Considerations

2. Changes to GT categorisation taxonomy
• happened in December 2011:
  • new categories created
  • infrequent categories deleted
  • changes to taxonomic parents
  • became possible to have more than one parent

3. GT data only available from 2004 onwards
• most ONS economic series start much earlier

6. Conclusions and Considerations

4. Some factors affect the response variable but not the GT predictor (or vice-versa), even if the model performs well overall

• e.g. heavy snowfall prevents customers travelling to shops, but internet sales unlikely to be adversely affected

5. Wider applicability to outputs
• key economic outputs, e.g. Index of Services
• other possibilities, e.g. migration?

6. Future cost and accessibility of GT data?

Questions?