Weighing Individual Observations for Time Series Forecasting
Victor Hoornweg & Philip Hans Franses
Erasmus University Rotterdam, Tinbergen Institute, Econometric Institute
Rotterdam, July 1, 2014
Introduction
Issue:
• How to deal with structural breaks or outliers?
Weigh individual observations
• Model: $y_t = \mu + \eta_t$
• DGP: $y_t = 3 - 2 \cdot \mathbf{1}_{t>80} + 2 \cdot \mathbf{1}_{t>120} + \varepsilon_t$,
– where $t = 1, 2, \ldots, 170$ and $\varepsilon_t \sim N(0, 1)$
Figure 1. Simulated series
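A minimal sketch of how the simulated series could be generated, following the DGP stated on the slide. The function name `simulate_series` and the seed are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def simulate_series(T=170, seed=0):
    """Simulate y_t = 3 - 2*1{t>80} + 2*1{t>120} + eps_t, eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)      # seed is an arbitrary choice
    t = np.arange(1, T + 1)                # t = 1, 2, ..., T
    level = 3.0 - 2.0 * (t > 80) + 2.0 * (t > 120)
    y = level + rng.standard_normal(T)
    return t, level, y

t, level, y = simulate_series()

# Model y_t = mu + eta_t estimated at T = 120 with equal weights ...
mu_equal = y[:120].mean()
# ... and with only the observations after the first break (t = 81..120).
mu_recent = y[80:120].mean()
```

With equal weights the estimate of the mean mixes both regimes, while the second estimate uses only the most recent regime. This is the contrast that motivates weighing individual observations.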
Figure 2. Individual weights assigned to observations across time
Introduction
Issue:
• How to deal with structural breaks or outliers?
Proposed solution:
• Assign robust weights to observations based on pseudo out-of-sample forecasts (posf):
– $y_{w,t} = w_t y_t$
– $X_{w,t} = w_t X_t$
– $\sum_{t=1}^{T} w_t = 1$
• Use discrete, exponential, and/or equal weights ($w_t = \frac{1}{T} \ \forall t$)
• Exponential posf
Relevance:
• Interpretation: which period in the past is akin to the present period
• Forecasting accuracy: focus on relevant data
• Robust: shrink towards equal weights with penalty for unequal weights
• Easy to apply to many types of datasets (high/low-frequency, many/few variables) and models
Introduction
Overview:
• Literature
• Innovations
• Simulations
– Forecasting accuracy
– Influence of statistical decisions on forecasts
• Practical application
• Discussion
Literature on weighing observations
Select optimal starting point (Pesaran & Timmermann 2007):
• Compute posf for different starting points
– Select best starting point
– Take a weighted combination of starting points
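The starting-point idea can be sketched for the simple mean model as follows. This is an illustration only: the function names, the window length `v`, the `min_obs` limit, and the inverse-MSFE combination weights are assumptions made for the example, not the exact procedure of Pesaran & Timmermann (2007).

```python
import numpy as np

def posf_msfe(y, start, v):
    """MSFE of one-step-ahead mean forecasts for the last v observations.
    Each forecast for y[p] uses only y[start:p], i.e. data before p."""
    T = len(y)
    errors = [y[p] - y[start:p].mean() for p in range(T - v, T)]
    return float(np.mean(np.square(errors)))

def starting_point_scores(y, v=20, min_obs=10):
    """Pseudo out-of-sample MSFE for every admissible starting point."""
    T = len(y)
    return {s: posf_msfe(y, s, v) for s in range(0, T - v - min_obs + 1)}

def best_start_forecast(y, scores):
    """Forecast from the single starting point with the lowest MSFE."""
    best = min(scores, key=scores.get)
    return best, float(y[best:].mean())

def combined_forecast(y, scores):
    """Combine forecasts over starting points with weights proportional
    to 1 / MSFE (an assumed weighting scheme, for illustration)."""
    starts = list(scores)
    inv = 1.0 / np.array([scores[s] for s in starts])
    w = inv / inv.sum()
    forecasts = np.array([y[s:].mean() for s in starts])
    return float(w @ forecasts)
```

Selecting one starting point reacts sharply to a break, while the combination spreads the risk of picking the wrong one.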
Exponential smoothing (Holt 1957, Brown 1959):
– Basic model: $\hat{y}_{T+1} = \sum_{i=1}^{T} w_i(\gamma) y_i$
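To make the weights $w_i(\gamma)$ concrete, the sketch below uses simple exponential smoothing with the level initialized at the first observation. The initialization and the function names are assumptions; the slides do not specify them. Under this setup the recursion and the explicit weighted sum give the same forecast.

```python
import numpy as np

def ses_forecast(y, gamma):
    """Simple exponential smoothing forecast of y_{T+1} by recursion.
    Assumption: the initial level equals the first observation."""
    level = y[0]
    for obs in y[1:]:
        level = gamma * obs + (1 - gamma) * level
    return float(level)

def ses_weights(T, gamma):
    """Explicit weights implied by the recursion above:
    w_i = gamma * (1 - gamma)^(T - i) for i = 2..T,
    w_1 = (1 - gamma)^(T - 1) for the initial level."""
    i = np.arange(1, T + 1)
    w = gamma * (1 - gamma) ** (T - i)
    w[0] = (1 - gamma) ** (T - 1)
    return w
```

The weights decay geometrically into the past and sum to one, so a larger $\gamma$ puts more weight on recent observations.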
Discrete and exponential weights (Pesaran, Pick & Pranovich -PPP- 2013):
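• $\hat{\beta}_T(\mathbf{w}) = \left( \sum_{t=1}^{T} w_t \mathbf{x}_t \mathbf{x}_t' \right)^{-1} \sum_{t=1}^{T} w_t \mathbf{x}_t y_t$, $\sum_{t=1}^{T} w_t = 1$, $h = 1$
• Choose weights so that the pMSFE of $\hat{y}_{T+1} = \hat{\beta}_T' \mathbf{x}_{T+1}$ is minimized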
– Discrete breaks: analytic expression of optimal weights for multiple IVs
• Determine breakpoints by considering all possible combinations between two breakpoints with certain limits for $b_1$ and $b_2$
– Continuous breaks: exponential smoothing
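The weighted estimator on this slide can be sketched directly from its formula. The function name `weighted_beta` and the noise-free toy data are assumptions chosen for illustration; the analytic optimal weights from PPP are not reproduced here.

```python
import numpy as np

def weighted_beta(X, y, w):
    """beta_hat_T(w) = (sum_t w_t x_t x_t')^{-1} sum_t w_t x_t y_t."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                       # enforce sum_t w_t = 1
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)

# Toy example (noise-free, for illustration): mean shifts from 3 to 1 at t = 81.
T = 120
X = np.ones((T, 1))
y = np.r_[np.full(80, 3.0), np.full(40, 1.0)]

w_equal = np.full(T, 1.0 / T)                          # equal weights
w_discrete = np.r_[np.zeros(80), np.full(40, 1 / 40)]  # drop pre-break data

mu_equal = weighted_beta(X, y, w_equal)[0]
mu_discrete = weighted_beta(X, y, w_discrete)[0]
```

Equal weights reproduce ordinary least squares, while the discrete weights estimate the mean from the post-break observations only.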
Innovations
Example
• DGP: $y_t = 3 - 2 \cdot \mathbf{1}_{t>80} + 2 \cdot \mathbf{1}_{t>120} + \varepsilon_t$, where $t = 1, 2, \ldots, 170$ and $\varepsilon_t \sim N(0, 1)$
• Model: $y_t = \mu + \eta_t$
– Computation time: 2.35 sec
Figure 3. Individual weights assigned to observations across time
Innovations
• Exponentially weighted posf
• Steps:
1. Determine breakpoints
2. Assign discrete weights to observations
3. Shrink discrete weights towards equal or exponential weights
• Use penalty for deviating from equally weighted observations
Figure 4. Individual weights assigned to observations at T=120
1. Determine breakpoints
Known methods to identify breakpoints or outliers
• CUSUM(SQ)
• Chow break test
• Quandt-Andrews Sup F test
• Studentized residuals / dfbetas / dffits
– $y = X\beta + D_j\gamma + \varepsilon$,
where $D_j$ is an $(n \times 1)$ indicator vector with $D_{jj} = 1$
Motivation for new method:
• Determine multiple breakpoints
• Applicable to various statistical models
Determine breakpoints
Figure 5. Finding breakpoints at T=120
Model: $y_t = \mu + \eta_t$
DGP: $y_t = 3 - 2 \cdot \mathbf{1}_{t>80} + 2 \cdot \mathbf{1}_{t>120} + \varepsilon_t$,
where $t = 1, 2, \ldots, 170$ and $\varepsilon_t \sim N(0, 1)$
1. Determine breakpoints
$S(t) = \frac{1}{WIN} \sum_{p=T-WIN+1}^{T} \left| \hat{y}_p^{\neg t} - \hat{y}_p^{\neg t-1} \right|$
• Largest values of 𝑆 are breakpoints
– Contiguous high values of 𝑆 form a ‘breakperiod’
– Quick way to find many candidate breakpoints in real-time
• Combination of test for outlier identification (‘leave-one-out’) and analyzing influence of configurations on posf (Hoornweg & Franses, 2013)
1. Determine breakpoints
Alternative
• Distribute breaks equally over the treatment sample.
• Adjust each breakpoint and select the adjustment that leads to the biggest increase in the forecasting accuracy of the posf. Continue until no further improvement is made (an adaptation of the Patient Rule Induction Method, PRIM, algorithm).
• Computation time: 7.61 seconds instead of 2.35.
Figure 6. Adjusting equally distributed breakpoints at T=120
2. Discrete weights
1. Determine pMSFE of each period.
– Periods with too few observations receive an average weight.
2. Consider all possible combinations of leaving out periods.
– Periods left in receive equal weights or inverse pMSFE weights
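$w_t^{i} = \frac{\left( \frac{1}{v} \sum_{\tau=T-v+1}^{T} e_{\tau,i}^{2} \right)^{-1}}{\sum_{j=1}^{N} \left( \frac{1}{v} \sum_{\tau=T-v+1}^{T} e_{\tau,j}^{2} \right)^{-1}}$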
3. Select discrete weights with highest accuracy of posf
Figure 6. Assigning weights to periods at T=120
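The inverse pMSFE weights and the enumeration of period combinations can be sketched as below. The layout of `errors` (one column of posf errors per period) and the function names are assumptions made for this illustration.

```python
import numpy as np
from itertools import combinations

def inverse_pmsfe_weights(errors):
    """errors has shape (v, N): posf errors e_{tau,i} for the last v
    forecast dates (rows) and N periods (columns).
    Returns N weights proportional to 1 / pMSFE_i that sum to one."""
    pmsfe = np.mean(np.square(errors), axis=0)   # (1/v) * sum_tau e^2
    inv = 1.0 / pmsfe
    return inv / inv.sum()

def period_subsets(n_periods):
    """All non-empty combinations of periods to keep; the periods that
    are not in a combination are the ones left out."""
    for k in range(1, n_periods + 1):
        yield from combinations(range(n_periods), k)

# Two periods whose posf errors differ by a factor of two.
errors = np.array([[1.0, 2.0],
                   [1.0, 2.0]])
w = inverse_pmsfe_weights(errors)
```

With a maximum of four periods there are only 15 non-empty combinations, so an exhaustive search over them stays cheap.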
3. Shrink
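• $w_t^{EXP} = \frac{-\log(1 - t/T)}{T-1}$, for $t = 1, 2, \ldots, T-1$, and $w_T^{EXP} = \frac{\log(T)}{T-1}$ (PPP, pp. 144)
• $w_t^{EQUAL} = \frac{1}{T}$
• $w_t^{shrink}(\varphi) = (1 - \varphi) w_t^{discrete} + \varphi w_t^{\{EXP, EQUAL\}}$, $\varphi \in \{0, 0.1, 0.2, \ldots, 1\}$
• $RMSFE(\lambda) = MSFE_W + \lambda \cdot \frac{\sum_{i=1}^{T} \left| w_i - \frac{1}{T} \right|}{\sum_{j=1}^{T} \left| w_j^{min} - \frac{1}{T} \right|} \cdot MSFE_{EQUAL}$.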
Figure 7. Shrinking to exponential weights at T=120
Figure 8. Shrinking to equal weights at T=120
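The shrinkage step can be sketched as follows. The function names are hypothetical, and the use of absolute deviations from $1/T$ in the penalty is an assumption about the formula on the slide. As written, the exponential weights sum to one only approximately for finite $T$, so this sketch leaves any renormalization to the caller.

```python
import numpy as np

def exp_weights(T):
    """w_t = -log(1 - t/T) / (T - 1) for t = 1..T-1, w_T = log(T) / (T - 1)."""
    t = np.arange(1, T)
    w = np.empty(T)
    w[:-1] = -np.log(1 - t / T) / (T - 1)
    w[-1] = np.log(T) / (T - 1)
    return w

def equal_weights(T):
    return np.full(T, 1.0 / T)

def shrink(w_discrete, w_target, phi):
    """Convex combination of discrete weights and a shrinkage target."""
    return (1 - phi) * w_discrete + phi * w_target

def penalized_msfe(msfe_w, w, w_min, msfe_equal, lam=0.5):
    """MSFE_W plus a penalty that grows with the distance of w from equal
    weights, scaled by the distance of w_min from equal weights.
    Assumption: distances are sums of absolute deviations from 1/T."""
    T = len(w)
    dist = np.abs(w - 1.0 / T).sum()
    dist_min = np.abs(w_min - 1.0 / T).sum()
    return msfe_w + lam * dist / dist_min * msfe_equal

# Candidate weights on the grid phi = 0, 0.1, ..., 1 for T = 120,
# shrinking discrete weights that drop the first 80 observations.
T = 120
w_discrete = np.r_[np.zeros(80), np.full(40, 1 / 40)]
candidates = [shrink(w_discrete, equal_weights(T), phi)
              for phi in np.linspace(0.0, 1.0, 11)]
```

At $\varphi = 1$ the weights are equal and the penalty is zero, which is what makes equal weights the default when the evidence for a break is weak.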
HF-weight
• Exponentially weighted posf
• Steps:
1. Determine breakpoints ($S(t)$)
2. Assign discrete weights to observations
3. Shrink discrete weights towards equal or exponential weights
• Ad hoc decisions:
– posf:
• Number of posf: 20
• Exponentially weighted
– Maximum number of periods: 4
– minOBS = 20
• Minimum number of observations for a period to get an individual weight
• Minimum number of observations in the treatment sample
– $\lambda = 0.5$: penalty for deviating from equally weighted observations
Figure 9. Individual weights assigned to observations at T=120
Simulation study
• $y_t = X_t \beta_t + \varepsilon_t$, $\varepsilon_t \sim N(0, 1)$, $v_t \sim N(0, 1)$, number of simulations = 1000
• Score: % better (-) or worse (+) MSFE in comparison to $MSFE(\hat{y}_{EW})$

DGP                            Mean1   Mean2   Random walk                 Regressor
$X_t$                          1       1       $X_{t-1} + 0.5 \cdot v_t$   $\sim N(0, 1)$
$\beta_{1 \le t \le 70}$       3       3       1                           3
$\beta_{71 \le t \le 120}$     4       5       1                           4
$\beta_{121 \le t \le 170}$    3       3       1                           3
Model
HF-Weight                      -9      -36     -224                        -35
Exponential                    -11     -43     -252                        -40
Discrete                       -14     -53     -274                        -48
Best SP                        -14     -51     -277                        -48
Average SP                     -12     -40     -230                        -39
Table 2. Percentage change in MSFE in comparison to MSFE(Equal)
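The four simulation designs in the table can be generated with a short sketch like the one below. The function name, the seed, and the starting value $X_0 = 0$ for the random walk are assumptions; the slides do not state them.

```python
import numpy as np

def simulate_dgp(name, T=170, seed=0):
    """Simulate y_t = X_t * beta_t + eps_t for one of the four designs.
    beta_t changes at t = 71 and changes back at t = 121."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    middle = (t >= 71) & (t <= 120)
    if name == "Mean1":
        X, beta = np.ones(T), np.where(middle, 4.0, 3.0)
    elif name == "Mean2":
        X, beta = np.ones(T), np.where(middle, 5.0, 3.0)
    elif name == "Random walk":
        # X_t = X_{t-1} + 0.5 * v_t, with X_0 = 0 assumed
        X, beta = np.cumsum(0.5 * rng.standard_normal(T)), np.ones(T)
    elif name == "Regressor":
        X, beta = rng.standard_normal(T), np.where(middle, 4.0, 3.0)
    else:
        raise ValueError(f"unknown DGP: {name}")
    y = X * beta + rng.standard_normal(T)
    return X, beta, y

X, beta, y = simulate_dgp("Mean2")
```

Repeating the call with different seeds gives the replications over which forecast errors would be averaged.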
Influence of ad hoc decisions: Mean1
Figure 10. Heatmap ad hoc accuracy Mean1 (50x)
• Reference model:
– Break -1, break no adjust, posf exp, win 20, minobs 20, RMSFE(0.5), max periods 4
Simulation study
Figure 11. Heatmap dynamic accuracy Mean1 (1000x)
Practical application
• Croushore (2011): SPF forecasts of PGDP might be biased. NGDP forecasts are not biased. Real-time data used.
• Model: $y_t^{adj} = \alpha + \beta y_t^{meanSPF} + \varepsilon_t$

               PGDP                          NGDP
               h=1    h=2    h=3    h=4      h=1    h=2    h=3    h=4
HF-Weight      -8     -30    -28    -20      -0     1      -2     0
Exponential    -17    -36    -42    -30      14     12     0      15
Discrete       -18    -48    -46    -38      12     9      0      12
Best SP        -18    -35    -39    -33      13     11     -1     12
Average SP     -17    -35    -40    -32      13     11     -1     1
Table 2. Percentage change in MSFE in comparison to MSFE(Equal)
Influence of ad hoc decisions: PGDP
Figure 12. Heatmap ad hoc accuracy PGDP
• Reference model:
• Break -1, break no adjust, posf exp, win 20, minobs 20, RMSFE(0.5), max periods 4
Influence of ad hoc decisions: NGDP
Figure 13. Heatmap ad hoc accuracy NGDP
• Reference model:
• Break -1, break no adjust, posf exp, win 20, minobs 20, RMSFE(0.5), max periods 4
Conclusion
Innovations and results
1. Use exponentially weighted posf to assign weights to treatment sample
2. Use quick way to find relevant breakpoints
3. Use automated algorithm to combine three types of weights:
– Equal weights
• When there are no breaks
• When there is much uncertainty about individual weights
– Discrete weights
• When there are clear breakpoints and there is enough data after a breakpoint
– Exponential weights
• First observations after a breakpoint with an unprecedented DGP
• Continuous break process
4. Add penalty term for deviating from equal weights
5. Applicable to various data sets and models to achieve better forecasting accuracy
Further research
• More focus on flexible and robust weights for posf and treatment sample
• Less focus on exact timing of breaks