Download - Weighing Individual Observations for Time Series Forecasting · 2017. 4. 7. · Time Series Forecasting ... • Assign robust weights to observations based on pseudo out-of-sample

Weighing Individual Observations for Time Series Forecasting

Victor Hoornweg & Philip Hans Franses

Erasmus University Rotterdam, Tinbergen Institute, Econometric Institute

Rotterdam, July 1, 2014


Introduction

Issue:

• How to deal with structural breaks or outliers?

Weigh individual observations

• Model: $y_t = \mu + \eta_t$

• DGP: $y_t = 3 - 2 \cdot \mathbf{1}_{t>80} + 2 \cdot \mathbf{1}_{t>120} + \varepsilon_t$,

– where $t = 1, 2, \ldots, 170$ and $\varepsilon_t \sim N(0, 1)$


Figure 1. Simulated series
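A minimal sketch of how the simulated series could be generated from the DGP stated on this slide. The function name `simulate_series` and the seed are illustrative assumptions; only the level shifts at t > 80 and t > 120 and the N(0, 1) noise come from the slide.

```python
import numpy as np

def simulate_series(T=170, seed=0):
    """Simulate y_t = 3 - 2*1[t>80] + 2*1[t>120] + eps_t with eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)      # seed is an arbitrary choice
    t = np.arange(1, T + 1)                # t = 1, 2, ..., T
    level = 3.0 - 2.0 * (t > 80) + 2.0 * (t > 120)
    y = level + rng.standard_normal(T)     # add standard normal noise
    return t, level, y

t, level, y = simulate_series()
```

The mean is 3 up to t = 80, drops to 1 for t = 81, ..., 120, and returns to 3 afterwards, so the most recent regime resembles the first one.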


Figure 2. Individual weights assigned to observations across time

Introduction

Issue:

• How to deal with structural breaks or outliers?

Proposed solution:

• Assign robust weights to observations based on pseudo out-of-sample forecasts (posf):

– $y_{w,t} = w_t y_t$

– $X_{w,t} = w_t X_t$

– $\sum_{t=1}^{T} w_t = 1$

• Use discrete, exponential, and/or equal weights ($w_t = \frac{1}{T} \;\forall t$)

• Exponential posf

Relevance:

• Interpretation: which period in the past is akin to the present period

• Forecasting accuracy: focus on relevant data

• Robust: shrink towards equal weights with penalty for unequal weights

• Easy to apply to many types of datasets (high/low-frequency, many/few variables) and models


Introduction

Overview:

• Literature

• Innovations

• Simulations

– Forecasting accuracy

– Influence of statistical decisions on forecasts

• Practical application

• Discussion


Literature on weighing observations

Select optimal starting point (Pesaran & Timmermann 2007):

• Compute posf for different starting points

– Select best starting point

– Take a weighted combination of starting points
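The starting-point idea can be sketched for the simple mean model as follows. This is an illustration under stated assumptions, not the procedure of Pesaran & Timmermann: the function names, the one-step-ahead horizon, the evaluation window `win`, and the minimum estimation size `min_obs` are all hypothetical choices.

```python
import numpy as np

def posf_msfe(y, start, win):
    """MSFE of one-step-ahead pseudo out-of-sample forecasts (posf) of the
    mean model, estimated on y[start], ..., y[p-1] for the last `win` periods p."""
    T = len(y)
    errors = []
    for p in range(T - win, T):
        forecast = y[start:p].mean()       # estimate mu without looking at y[p]
        errors.append(y[p] - forecast)
    return float(np.mean(np.square(errors)))

def best_starting_point(y, win=20, min_obs=20):
    """Evaluate every admissible starting point and return the one with the
    lowest posf MSFE, together with all scores."""
    T = len(y)
    candidates = range(0, T - win - min_obs + 1)
    scores = {s: posf_msfe(y, s, win) for s in candidates}
    best = min(scores, key=scores.get)     # earliest index among ties
    return best, scores

# Noise-free example: the level shifts from 3 to 1 after 80 observations.
y_demo = np.concatenate([np.full(80, 3.0), np.full(40, 1.0)])
best, scores = best_starting_point(y_demo, win=20, min_obs=10)
```

In the noise-free example the selected starting point is the first post-break observation, since any earlier start contaminates the estimated mean.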

Exponential smoothing (Holt 1957, Brown 1959):

– Basic model: $\hat{y}_{T+1} = \sum_{i=1}^{T} w_i(\gamma)\, y_i$
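As a hedged illustration of the basic model, the sketch below uses geometrically declining weights, $w_i(\gamma) \propto (1-\gamma)^{T-i}$, normalised to sum to one. The slide does not spell out the functional form of $w_i(\gamma)$, so this particular form and the function names are assumptions.

```python
import numpy as np

def exp_smoothing_weights(T, gamma):
    """Weights proportional to (1 - gamma)**(T - i) for i = 1..T,
    normalised so that they sum to one (assumed functional form)."""
    i = np.arange(1, T + 1)
    raw = (1.0 - gamma) ** (T - i)         # most recent observation gets the largest weight
    return raw / raw.sum()

def forecast_next(y, gamma):
    """Forecast y_{T+1} as the weighted sum of past observations."""
    y = np.asarray(y, dtype=float)
    w = exp_smoothing_weights(len(y), gamma)
    return float(w @ y)
```

A larger `gamma` concentrates the weight on recent observations; as `gamma` approaches zero the weights approach equal weighting.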

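Discrete and exponential weights (Pesaran, Pick & Pranovich (PPP) 2013):

• $\hat{\boldsymbol{\beta}}_T(\mathbf{w}) = \left( \sum_{t=1}^{T} w_t \mathbf{x}_t \mathbf{x}_t' \right)^{-1} \sum_{t=1}^{T} w_t \mathbf{x}_t y_t$, with $\sum_{t=1}^{T} w_t = 1$ and $h = 1$

• Choose weights so that the pMSFE of $\hat{y}_{T+1} = \hat{\boldsymbol{\beta}}_T' \mathbf{x}_{T+1}$ is minimized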

– Discrete breaks: analytic expression of optimal weights for multiple IVs

• Determine breakpoints by considering all possible combinations between two breakpoints with certain limits for 𝑏1 and 𝑏2

– Continuous breaks: exponential smoothing
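The weighted estimator on this slide can be computed directly from its definition. The sketch below assumes the rows of `X` are the vectors $\mathbf{x}_t'$; the function names are illustrative, and the optimisation over weights itself is not shown.

```python
import numpy as np

def weighted_beta(X, y, w):
    """beta_hat(w) = (sum_t w_t x_t x_t')^{-1} sum_t w_t x_t y_t."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                        # enforce sum_t w_t = 1
    A = X.T @ (w[:, None] * X)             # sum_t w_t x_t x_t'
    b = X.T @ (w * y)                      # sum_t w_t x_t y_t
    return np.linalg.solve(A, b)

def forecast_one_step(X, y, w, x_next):
    """One-step-ahead forecast y_hat_{T+1} = beta_hat(w)' x_{T+1}."""
    return float(weighted_beta(X, y, w) @ np.asarray(x_next, dtype=float))
```

Equal weights reproduce ordinary least squares on the full sample, while discrete weights that are zero before a breakpoint reproduce least squares on the post-break sample.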


Innovations

Example

• DGP: $y_t = 3 - 2 \cdot \mathbf{1}_{t>80} + 2 \cdot \mathbf{1}_{t>120} + \varepsilon_t$, where $t = 1, 2, \ldots, 170$ and $\varepsilon_t \sim N(0, 1)$

• Model: $y_t = \mu + \eta_t$

– Computation time: 2.35 sec


Figure 3. Individual weights assigned to observations across time

Innovations

• Exponentially weighted posf

• Steps:

1. Determine breakpoints

2. Assign discrete weights to observations

3. Shrink discrete weights towards equal or exponential weights

• Use penalty for deviating from equally weighted observations

Figure 4. Individual weights assigned to observations at T=120



Known methods to identify breakpoints or outliers

• CUSUM(SQ)

• Chow break test

• Quandt-Andrews Sup F test

• Studentized residuals / DFBETAS / DFFITS

– $y = X\beta + D_j\gamma + \varepsilon$, where $D_j$ is an $(n \times 1)$ indicator vector with $D_{jj} = 1$
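The indicator-variable regression can be illustrated with a short sketch. Adding $D_j$ removes observation $j$ from the fit of $\beta$, so the estimate of $\gamma$ equals the leave-one-out prediction error for that observation. The function name `dummy_regression` is an illustrative assumption, and `j` is a zero-based index.

```python
import numpy as np

def dummy_regression(X, y, j):
    """Fit y = X beta + D_j gamma + eps, where D_j is 1 at row j and 0 elsewhere.
    Returns (beta_hat, gamma_hat)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    D = np.zeros(len(y))
    D[j] = 1.0                             # indicator for observation j
    Z = np.column_stack([X, D])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[:-1], coef[-1]
```

A large estimate of $\gamma$ relative to its standard error flags observation $j$ as a potential outlier; the standard error computation is omitted here.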

Motivation for new method:

• Determine multiple breakpoints

• Applicable to various statistical models


Determine breakpoints

Figure 5. Finding breakpoints at T=120

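Model: $y_t = \mu + \eta_t$

DGP: $y_t = 3 - 2 \cdot \mathbf{1}_{t>80} + 2 \cdot \mathbf{1}_{t>120} + \varepsilon_t$, where $t = 1, 2, \ldots, 170$ and $\varepsilon_t \sim N(0, 1)$

$S(t) = \frac{1}{WIN} \sum_{p=T-WIN+1}^{T} \left( \hat{y}_p^{\neg t} - \hat{y}_p^{\neg t-1} \right)$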

• Largest values of 𝑆 are breakpoints

– Contiguous high values of 𝑆 form a ‘breakperiod’

– Quick way to find many candidate breakpoints in real-time

• Combination of test for outlier identification (‘leave-one-out’) and analyzing influence of configurations on posf (Hoornweg & Franses, 2013)



Alternative

• Equally distribute breaks over treatment sample.

• Adjust each breakpoint and select the adjustment that leads to the biggest increase in forecasting accuracy of the posf. Continue until no improvement is made (an adaptation of the Patient Rule Induction Method (PRIM) algorithm).

• Computation time: 7.61 seconds instead of 2.35.

Figure 6. Adjusting equally distributed breakpoints at T=120


2. Discrete weights

1. Determine pMSFE of each period.

– Periods with too few observations receive an average weight.

2. Consider all possible combinations of leaving out periods.

– Periods left in receive equal weights or inverse pMSFE weights

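$w_{t,i} = \frac{\left( \frac{1}{v} \sum_{\tau=T-v+1}^{T} e_{\tau,i}^2 \right)^{-1}}{\sum_{j=1}^{N} \left( \frac{1}{v} \sum_{\tau=T-v+1}^{T} e_{\tau,j}^2 \right)^{-1}}$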

3. Select discrete weights with highest accuracy of posf

Figure 6. Assigning weights to periods at T=120
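The inverse pMSFE weights can be sketched as follows. The layout of the `errors` array (one column per candidate $i$, one row per posf period $\tau$) and the function name are assumptions made for illustration.

```python
import numpy as np

def inverse_pmsfe_weights(errors):
    """errors: array of shape (v, N); column i holds the last v posf errors
    e_{tau,i} of candidate i. Returns N weights that sum to one."""
    errors = np.asarray(errors, dtype=float)
    pmsfe = np.mean(errors ** 2, axis=0)   # (1/v) * sum_tau e_{tau,i}^2
    inverse = 1.0 / pmsfe                  # assumes every pMSFE is strictly positive
    return inverse / inverse.sum()

# Two candidates with pMSFE 1 and 4 receive weights 0.8 and 0.2.
weights = inverse_pmsfe_weights([[1.0, 2.0], [1.0, -2.0]])
```

Candidates with smaller pseudo out-of-sample errors receive proportionally larger weights.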


3. Shrink

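• $w_t^{EXP} = \frac{-\log(1 - t/T)}{T-1}$, for $t = 1, 2, \ldots, T-1$, and $w_T^{EXP} = \frac{\log(T)}{T-1}$ (PPP, p. 144)

• $w_t^{EQUAL} = \frac{1}{T}$

• $w_t^{shrink}(\varphi) = (1 - \varphi)\, w_t^{discrete} + \varphi\, w_t^{\{EXP, EQUAL\}}$, $\varphi \in \{0, 0.1, 0.2, \ldots, 1\}$

• $RMSFE(\lambda) = MSFE_W + \lambda \cdot \frac{\sum_{i=1}^{T} \left| w_i - \frac{1}{T} \right|}{\sum_{j=1}^{T} \left| w_j^{min} - \frac{1}{T} \right|} \cdot MSFE_{EQUAL}$.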

Figure 7. Shrinking to exponential weights at T=120

Figure 8. Shrinking to equal weights at T=120
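The shrinkage step can be sketched in a few lines. Two points are assumptions rather than statements from the slide: the distance terms in the penalty are read as absolute deviations from $1/T$, and `w_min` is passed in as given because the slide does not define it further. The exponential weights follow the formula as stated; they do not sum exactly to one for finite $T$ and are left un-normalised here.

```python
import numpy as np

def exp_weights(T):
    """w_t = -log(1 - t/T) / (T - 1) for t = 1..T-1, and w_T = log(T) / (T - 1)."""
    t = np.arange(1, T)
    w = np.empty(T)
    w[:-1] = -np.log(1.0 - t / T) / (T - 1)
    w[-1] = np.log(T) / (T - 1)
    return w

def equal_weights(T):
    return np.full(T, 1.0 / T)

def shrink(w_discrete, w_target, phi):
    """Convex combination: (1 - phi) * discrete weights + phi * target weights."""
    return (1.0 - phi) * np.asarray(w_discrete) + phi * np.asarray(w_target)

def penalised_msfe(msfe_w, msfe_equal, w, w_min, lam=0.5):
    """MSFE_W plus a penalty that grows with the distance of w from equal weights,
    scaled by the distance of w_min and by MSFE_EQUAL (assumed reading)."""
    w = np.asarray(w, dtype=float)
    w_min = np.asarray(w_min, dtype=float)
    T = len(w)
    ratio = np.abs(w - 1.0 / T).sum() / np.abs(w_min - 1.0 / T).sum()
    return msfe_w + lam * ratio * msfe_equal
```

With equal weights the penalty vanishes, so unequal weights are selected only when their gain in posf accuracy outweighs the penalty.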


HF-weight


• Steps:

1. Determine breakpoints ($S(t)$)



• Ad hoc decisions:

– posf:

• #: 20

• exponential

– Maximum # of periods: 4

– minOBS = 20

• minimum # obs for periods to get an individual weight

• minimum # obs in treatment sample.

– 𝜆 = 0.5: Penalty for deviating from equally weighted observations



Simulation study

• $y_t = X_t \beta_t + \varepsilon_t$, $\varepsilon_t \sim N(0, 1)$, $v_t \sim N(0, 1)$, number of simulations = 1000

• Score: % better (-) or worse (+) MSFE in comparison to $MSFE(\hat{y}^{EW})$


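| DGP | Mean1 | Mean2 | Random walk | Regressor |
|---|---|---|---|---|
| $X_t$ | 1 | 1 | $X_{t-1} + 0.5 \cdot v_t$ | $\sim N(0, 1)$ |
| $\beta_{1 \le t \le 70}$ | 3 | 3 | 1 | 3 |
| $\beta_{71 \le t \le 120}$ | 4 | 5 | 1 | 4 |
| $\beta_{121 \le t \le 170}$ | 3 | 3 | 1 | 3 |
| Model | | | | |
| HF-Weight | -9 | -36 | -224 | -35 |
| Exponential | -11 | -43 | -252 | -40 |
| Discrete | -14 | -53 | -274 | -48 |
| Best SP | -14 | -51 | -277 | -48 |
| Average SP | -12 | -40 | -230 | -39 |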

Table 2. Percentage change in MSFE in comparison to MSFE(Equal)
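The four data-generating processes in the table above could be simulated along the following lines. The function name, the seed, and the starting value $X_0 = 0$ for the random walk are assumptions; the coefficient paths and regressor definitions are taken from the table.

```python
import numpy as np

# Coefficient values for 1 <= t <= 70, 71 <= t <= 120, and 121 <= t <= 170.
BETAS = {
    "Mean1": (3.0, 4.0, 3.0),
    "Mean2": (3.0, 5.0, 3.0),
    "Random walk": (1.0, 1.0, 1.0),
    "Regressor": (3.0, 4.0, 3.0),
}

def simulate_dgp(name, T=170, seed=0):
    """Simulate y_t = X_t * beta_t + eps_t with eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    b1, b2, b3 = BETAS[name]
    beta = np.where(t <= 70, b1, np.where(t <= 120, b2, b3))
    if name in ("Mean1", "Mean2"):
        X = np.ones(T)                                 # intercept-only model
    elif name == "Random walk":
        X = np.cumsum(0.5 * rng.standard_normal(T))    # X_t = X_{t-1} + 0.5 v_t, X_0 = 0 assumed
    else:
        X = rng.standard_normal(T)                     # X_t ~ N(0, 1)
    y = X * beta + rng.standard_normal(T)
    return X, beta, y
```

Repeating the simulation with different seeds and comparing each method's MSFE with that of equal weights would give scores of the kind reported in the table.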

Influence of ad hoc decisions: Mean1

Figure 10. Heatmap ad hoc accuracy Mean1 (50x)

• Reference model:

– Break -1, break no adjust, posf exp, win 20, minobs 20, RMSFE(0.5), max periods 4

Simulation study


Figure 11. Heatmap dynamic accuracy Mean1 (1000x)

Practical application

• Croushore (2011): SPF forecasts of PGDP might be biased. NGDP forecasts are not biased. Real-time data used.

• Model: $y_t^{adj} = \alpha + \beta\, y_t^{meanSPF} + \varepsilon_t$


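| Model | PGDP h=1 | PGDP h=2 | PGDP h=3 | PGDP h=4 | NGDP h=1 | NGDP h=2 | NGDP h=3 | NGDP h=4 |
|---|---|---|---|---|---|---|---|---|
| HF-Weight | -8 | -30 | -28 | -20 | -0 | 1 | -2 | 0 |
| Exponential | -17 | -36 | -42 | -30 | 14 | 12 | 0 | 15 |
| Discrete | -18 | -48 | -46 | -38 | 12 | 9 | 0 | 12 |
| Best SP | -18 | -35 | -39 | -33 | 13 | 11 | -1 | 12 |
| Average SP | -17 | -35 | -40 | -32 | 13 | 11 | -1 | 1 |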

Table 2. Percentage change in MSFE in comparison to MSFE(Equal)

Influence of ad hoc decisions: PGDP

Figure 12. Heatmap ad hoc accuracy PGDP


• Break -1, break no adjust, posf exp, win 20, minobs 20, RMSFE(0.5), max periods 4

Influence of ad hoc decisions: NGDP

Figure 13. Heatmap ad hoc accuracy NGDP


• Break -1, break no adjust, posf exp, win 20, minobs 20, RMSFE(0.5), max periods 4

Conclusion

Innovations and results

1. Use exponentially weighted posf to assign weights to treatment sample

2. Use quick way to find relevant breakpoints

3. Use automated algorithm to combine three types of weights:

– Equal weights

• When there are no breaks

• When there is much uncertainty about individual weights

– Discrete weights

• When there are clear breakpoints and there is enough data after a breakpoint

– Exponential weights

• First observations after a breakpoint with unprecedented DGP

• Continuous break process

4. Add penalty term for deviating from equal weights

5. Applicable to various data sets and models to achieve better forecasting accuracy

Further research

• More focus on flexible and robust weights for posf and treatment sample

• Less focus on exact timing of breaks

