WP6: Early Estimates
Boro Nikic
Co-ordination meeting ESSnet Big Data
Brussels, November 2016
WP6 Early estimates
Goal:
• The aim of this pilot is to investigate how a combination of (early available)
multiple Big Data sources and existing official statistical data can be used in
order to create existing or new early estimates for statistics.
• A maximum of two pilots will be carried out, focusing on quick wins.
Partners: SI, FI, NL, PL
Start date: 1.2.2016
End date: 28.2.2017
Deliverables (SGA-1 only):
Deliverable | Month
List of potential big data sources together with the business cases for the aim of early estimates (general) | 13
Recommendations about IT tools for collection of data for purposes of Consumer Confidence Index and NowCasts of Turnover Indices | 13
Recommendations about methodology for processing the data for purposes of Consumer Confidence Index and NowCasts of Turnover Indices | 13
State of affairs
Since the Tallinn meeting:
• WP6 & WP7 meeting, Warsaw: June 2016
• Interim feasibility report: July 2016
• Business case for the aim of early estimates (SGA2): July-September 2016
• First results of nowcasting of turnover indices: September-October 2016
Proposed pilot (1)
Title of the pilot: Early estimates of economic indicators
Main economic indicators:
• Gross domestic product (GDP)
• Consumer price index (CPI)
• Retail sales
• Balance of payments
• Economic sentiment indicators
• New leading economic indicators
Proposed pilot (2)
Aim of the pilot:
• Investigate multiple Big Data and other existing sources for early estimates of at least one of the main economic indicators (partly in SGA1).
• Create and test a methodology for producing early estimates of at least one of the main economic indicators.
• Define and test quality measures that assess the quality of the sources, the statistical production and the statistical results.
Multinational dimension: Many of the sources are available in most countries, so it is possible to test them and produce results for more than one country. Even if a country does not have access to any Big Data source, it can still test methods and processes on administrative and other existing sources.
Sources (1)
SURS survey / administrative sources (monthly) | Dissemination | Availability of majority of data
Business tendencies | t-5 | t-5
Short term statistics (industry, construction, services, trade) | t+30-60 | t+20-30
Foreign trade | t+40 | t+20-30
Building permits | t+20 (2017) | t+5
Demography of enterprises (SBR) | t+20-25 | t+20-25
VAT data (FURS) | t+45 | t+20 (submission deadline)
Wages | |
…
Sources (2)
Big Data | Availability (SURS)
Job vacancy ads from job portals | Yes
Traffic loops | ?
Social media data (Twitter, Facebook, …) | ?
Data from supermarket chains | Yes
Transaction data from banks | ?
Sources (ongoing work)
Before the end of SGA1:
1. All countries involved in SGA1 (and SGA2) will prepare a list of possible sources that could be used for early estimates of economic indicators.
2. For each data source, the list will also state when the majority of the data becomes available.
Nowcasting turnover indices
• One of the pilots started in WP6
• Statistics Finland, SURS
• Interesting methodological suggestions for estimating early economic indicators → SURS decided to start testing with this idea
• Modeling isn't new, but it is very often used in connection with big data sources.
• Modeling is very useful for estimating early economic indicators from many different data sources.
Model (1)
• Input 1: time series of interest (aggregate data)
time | TSI
2008M01 | 109.64
2008M02 | 113.51
2008M03 | 116.23
… | …
2015M12 | 95.78
Model (2)
• Input 2: time series of data that might help explain what's going on (microdata)
time | P001 | P002 | … | P973
2008M01 | 3526 | 214 | … | 66519
2008M02 | 4252 | 332 | … | 36012
2008M03 | 4111 | 411 | … | 52447
… | … | … | … | …
2015M12 | 5241 | 412 | … | 71025
Model (3)
• Model: 2 stages:
1. Principal component analysis (PCA)
- dimensionality reduction
- choose the first few principal components
2. Linear regression
- Y (dependent variable): time series of interest, e.g. turnover index
- X1, …, Xn (predictors): e.g. the chosen principal components
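The two stages can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the code used in the pilot (the slides mention RStudio). The helper name `nowcast_last_period`, the standardisation of the microdata, the choice to fit the regression without the last period, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def nowcast_last_period(y, X, n_components):
    """Estimate the last value of y from microdata X (hypothetical helper).

    y: shape (T,), series of interest; X: shape (T, P), one column per unit.
    """
    # Stage 1: PCA on standardised microdata (standardising is an assumption).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # principal component scores
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()

    # Stage 2: OLS of y on the scores, fitted without the last period
    # (assumption: the last period is treated as not yet published).
    A = np.column_stack([np.ones(len(y)), scores])
    beta, *_ = np.linalg.lstsq(A[:-1], y[:-1], rcond=None)
    fitted = A @ beta
    mae = np.abs(y[:-1] - fitted[:-1]).mean()          # in-sample mean absolute error
    return fitted[-1], explained, mae

# Synthetic stand-in data (not the SURS data): 96 months, 50 units.
rng = np.random.default_rng(0)
T, P = 96, 50
factor = rng.normal(size=T)
loadings = rng.uniform(0.5, 1.5, size=P)
X = 1000 + 100 * np.outer(factor, loadings) + 10 * rng.normal(size=(T, P))
y = 100 + 5 * factor + 0.1 * rng.normal(size=T)

estimate, explained, mae = nowcast_last_period(y, X, n_components=3)
print(f"estimate={estimate:.2f} actual={y[-1]:.2f} "
      f"explained={explained:.1%} MAE={mae:.2f}")
```

The principal components are computed on all periods, including the last one, because the microdata for the latest period are what make the nowcast possible; only the series of interest is held back.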
Model (4)
• Output:
– An estimate for the last point in time of the series of interest, e.g. 2015M12
– Others, e.g.:
• Percentage of variability of the data explained by the chosen principal components
• Percentage of variability of the time series of interest explained by the chosen linear regression model
• Mean absolute error of the chosen linear regression model
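For reference, these three measures are usually written as below. The slides do not give formulas, so the notation is an assumption: \lambda_1 \ge \dots \ge \lambda_p are the eigenvalues of the microdata covariance (or correlation) matrix, k is the number of chosen principal components, y_t is the series of interest, \hat{y}_t is the regression fit and \bar{y} is the mean of y_t over the T fitted periods.

```latex
% Share of the microdata variability explained by the first k components
\[
  \mathrm{EV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}
\]
% Share of the variability of the series of interest explained by the regression
\[
  R^2 = 1 - \frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}
\]
% Mean absolute error of the regression
\[
  \mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \lvert y_t - \hat{y}_t \rvert
\]
```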
Model (5)
• Many possibilities for improvement of the models:
– Length of time series
– Data editing (e.g. imputations)
– Choice of principal components
– Additional predictors in linear regression
• Many issues:
– Availability of the data
– Software: RStudio …
First results of testing (1)
Example:
- Time series of interest: real turnover index in industry
- Time series of data that might help: real turnover of 973 industrial enterprises
- Data: from 2008M01 to 2015M12 (8 years)
- Principal component analysis:
  - 33 chosen principal components explain 80.2% of the variability of the enterprise data
- Linear regression:
  - 97.5% of the variability of the real turnover index in industry is explained
  - Maximum absolute error: 4.94
  - Mean absolute error: 1.04
  - Standard deviation of error: 1.32
- The last period is 2015M12:
  - Original value: 95.78
  - Estimate: 97.18
  - Error: -1.40
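The reported error for 2015M12 matches the convention original minus estimate. A short check of that arithmetic, with the relative error added as an extra figure that is not on the slide:

```python
original = 95.78   # published index value for 2015M12 (from the slide)
estimate = 97.18   # model estimate for 2015M12 (from the slide)

error = original - estimate                    # sign convention: original - estimate
relative_error_pct = 100 * error / original    # extra figure, not on the slide

print(f"error = {error:.2f}")                            # error = -1.40
print(f"relative error = {relative_error_pct:.2f} %")    # relative error = -1.46 %
```

The error of -1.40 index points is larger in magnitude than the mean absolute error of 1.04, but well inside the maximum absolute error of 4.94.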
IT tools involved in nowcasting of turnover indices
[Diagram: stages of statistical production: Data preparation → Modeling → Results]
Methods (ongoing work)
Before the end of SGA1 we plan to:
1. Test at least one alternative method for nowcasting of economic
indicators
2. Include data from multiple sources (construction, services,...)
3. Test forecasting based on available data
4. Prepare an inventory of nowcasting methods
Early estimates (ongoing work)
Before the end of SGA1:
1. Inventory of current practices in other countries/institutions
2. Prepare a list of possible "new leading economic indicators"