FCAS FINAL PROJECT REPORT
Forecasting Temperature Using Weather Time Series Data of US & Canadian Cities for Effective
Inventory Stock Plan at Retail Stores
Section B- Group 10
Rahul Sharma 61910137
Himanshu Jain 61910148
Rahul Garg 61910740
Artika . 61910321
Vividha Arabandi 61910249
Varghese Zachariah 61910374
Table of Contents
1. Executive Summary
1.1 Problem Summary
1.2 Description of data, its source, key characteristics, & chart(s)
1.3 High level description of the final forecasting method and performance on meaningful
performance metrics
1.4 Conclusions and recommendations
2. Technical Summary
2.1 Data preparation/issues
2.2 Forecasting methods
2.3 Performance compared to benchmarks
3. Appendix
1. Executive Summary
1.1| Problem Summary
Several cities in the continental US and Canada see temperatures well below freezing. This
has several consequences for local businesses, as mobility is restricted and transport costs
increase. For six cities that show extreme weather conditions (Chicago, Denver, Detroit,
Vancouver, Toronto, Montreal), we have collected weather data for the past five years and
intend to prepare a model that predicts the possible temperature range for the coming year.
This will help retailers such as Walmart, Kmart, Weston and Costco, along with various
mom-and-pop stores, plan their inventory of crucial SKUs. Inventory planning is critical
because the extreme weather these cities see leads to costly and difficult inventory
replenishment. Sales also decrease, as movement becomes more difficult and customers tend to
avoid venturing out. Together, these effects could lead to a potential loss of millions of
dollars in overstocking and understocking costs.
1.2| Description of data, its source, key characteristics, & chart(s)
The data series contains ~5 years of high temporal resolution (hourly) temperature
measurements, in Kelvin, for 3 US cities and 3 Canadian cities (Chicago, Denver, Detroit,
Vancouver, Toronto, Montreal). The data has been downloaded from kaggle.com (see footnote 1).
The data has the following key characteristics in terms of the time series components:
▪ Noise- This is the non-systematic component and is present in all the time series. It
shows the random variation that results from measurement errors or other causes that
are not accounted for.
▪ Level- This is a systematic component and a point estimate that shows the average
value of the series. This is also present in all kinds of time series.
▪ Trend- All six time series have a linear trend.
▪ Seasonality- All six time series show additive seasonality.
Charts showing the raw data vs time for all the 6 cities can be seen in Exhibit 1
Exhibit 1: Time-Series chart for Chicago, Denver, Detroit, Vancouver, Toronto & Montreal from Dec 1, 2012 to Nov 30, 2017
1 https://www.kaggle.com/selfishgene/historical-hourly-weather-data/home
1.3| High level description of the final forecasting method and
performance on meaningful performance metrics
The following methods were tried on the time series data for the six cities:
A) Holt Winter (Triple Exponential) with additive seasonality - This method of
smoothing was chosen because the time series had both a linear trend and seasonality. The
seasonal period was 365 days. We chose additive seasonality based on the raw data analysis.
Parameter            Value
Period               365
Alpha (Level)        Optimise [Refer Technical Summary]
Beta (Trend)         Optimise [Refer Technical Summary]
Gamma (seasonality)  Optimise [Refer Technical Summary]
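The additive Holt Winter recursions can be illustrated with a minimal sketch. This is not the tool used for the project: the function name, the initialisation scheme and the toy series with period 4 are all assumptions made for illustration, and the smoothing constants are fixed by hand rather than optimised.

```python
def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing.

    Returns (one-step-ahead fitted values, forecasts for `horizon` steps).
    The initialisation below is one simple choice (an assumption);
    forecasting tools differ on this step.
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons to initialise")
    first, second = y[:period], y[period:2 * period]
    level = sum(first) / period                       # average of season 1
    trend = (sum(second) - sum(first)) / period ** 2  # average per-step change
    season = [v - level for v in first]               # additive seasonal indices

    fitted = []
    for t, obs in enumerate(y):
        s = season[t % period]
        fitted.append(level + trend + s)              # forecast made before seeing obs
        prev_level = level
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (obs - level) + (1 - gamma) * s

    n = len(y)
    forecasts = [level + (h + 1) * trend + season[(n + h) % period]
                 for h in range(horizon)]
    return fitted, forecasts


# Toy series: a repeating pattern with no trend and period 4 (hypothetical data).
toy = [10.0, 20.0, 30.0, 40.0] * 5
toy_fitted, toy_forecast = holt_winters_additive(toy, 4, 0.5, 0.5, 0.5, 4)
print(toy_forecast)
```

For the daily series in this report the call would use period=365, with alpha, beta and gamma chosen to minimise the training error.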
B) Regression with Linear Trend – In some cities, the Holt Winter model's performance was
not satisfactory: the RMSE was quite high. We then tried linear regression for
forecasting.
Categorical variables: Season_Index_1 … Season_Index_12; Predictor variable: t
Output variable: City temperature
We also tried a quadratic trend; however, the trend portion is so flat that the coefficient
of t² was insignificant.
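A regression of this shape (intercept, linear trend t, and seasonal dummy variables) can be sketched in plain Python as follows. The helper names and the toy data are hypothetical. One season is dropped as the baseline to avoid collinearity with the intercept, which is an assumption about how the dummies were coded, and four seasons are used instead of twelve to keep the example short.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(t, season, n_seasons):
    """Intercept, trend t, and dummies for seasons 2..n_seasons (season 1 is the baseline)."""
    return [1.0, float(t)] + [1.0 if season == s else 0.0
                              for s in range(2, n_seasons + 1)]


def fit_trend_seasonal(y, seasons, n_seasons):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [design_row(t, s, n_seasons) for t, s in enumerate(seasons, start=1)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, t, season, n_seasons):
    return sum(c * x for c, x in zip(coef, design_row(t, season, n_seasons)))


# Toy data generated from a known model (hypothetical values, in Kelvin).
effects = {1: 0.0, 2: 5.0, 3: 10.0, 4: -3.0}
seasons = [1, 2, 3, 4] * 3
temps = [280.0 + 0.01 * t + effects[s] for t, s in enumerate(seasons, start=1)]
coef = fit_trend_seasonal(temps, seasons, 4)
print([round(c, 4) for c in coef])
```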
C) ARIMA – In the linear regression model, we saw that the forecasts were very staggered, i.e.
within a window of a few neighbouring days the forecasts had the same value. This was a strong
indication of autocorrelation. Lag analysis confirmed that there was autocorrelation at
lag-1. Therefore, we performed ARIMA and adjusted the linear regression forecasts by adding
the latest forecasted errors/residuals.
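The lag analysis and the residual adjustment can be sketched as below. This is a simplified stand-in, not the project's ARIMA output: it assumes a pure AR(1) model without intercept on the regression residuals, and the function names and numbers are hypothetical.

```python
def lag_autocorrelation(residuals, lag=1):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
              for t in range(lag, n))
    den = sum((e - mean) ** 2 for e in residuals)
    return num / den


def fit_ar1(residuals):
    """Least-squares slope of e[t] on e[t-1] (AR(1) without intercept)."""
    num = sum(cur * prev for cur, prev in zip(residuals[1:], residuals[:-1]))
    den = sum(prev * prev for prev in residuals[:-1])
    return num / den


def adjust_forecasts(regression_forecasts, residuals, phi):
    """Add the AR(1) forecast of the residual, phi**h * last residual, to each forecast."""
    last = residuals[-1]
    return [f + last * phi ** (h + 1)
            for h, f in enumerate(regression_forecasts)]


# Hypothetical training residuals and regression forecasts (Kelvin).
train_residuals = [8.0, 4.0, 2.0, 1.0]
phi = fit_ar1(train_residuals)
adjusted = adjust_forecasts([280.0, 281.0], train_residuals, phi)
print(phi, adjusted)
```

Because phi**h shrinks as the horizon h grows, an adjustment of this kind mainly affects the first few forecasts.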
Results:
City         Forecasting Method - Recommendation
Denver       Linear Regression
Chicago      Holt-Winter Smoothing
Detroit      Linear Regression
Montreal     Linear Regression
Toronto      Holt-Winter Smoothing
Vancouver    Holt-Winter Smoothing
1.4| Conclusions and recommendations
From the project, we have the following conclusions and recommendations:
• Forecasting over a short horizon is better: We learnt that forecasting over a shorter
horizon is better than forecasting over a longer horizon, as local market intelligence
can be incorporated.
• Need to check autocorrelation: We saw that the linear regression models can produce
‘binned’ forecasts, where the same forecast is given across a 2-3 day window.
This suggests that lag-1 or lag-2 autocorrelation may exist, so lag analysis should be done.
• Avoid overcomplication: It is advisable not to involve many variables in the forecast.
• Use control charts for model performance: Review and adjust the model after a
fixed period (every 6 months) or when the control chart indicates an error.
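The control chart recommendation can be sketched as a simple check of new forecast errors against limits derived from the training errors. The three-sigma limits, the function names and the numbers are assumptions for illustration; the report does not state which limits were intended.

```python
from statistics import mean, stdev


def control_limits(training_errors, k=3.0):
    """Lower and upper limits: mean of training errors -/+ k standard deviations."""
    centre = mean(training_errors)
    sigma = stdev(training_errors)
    return centre - k * sigma, centre + k * sigma


def out_of_control(new_errors, limits):
    """Indices of forecast errors that fall outside the control limits."""
    lower, upper = limits
    return [i for i, e in enumerate(new_errors) if e < lower or e > upper]


# Hypothetical forecast errors in Kelvin.
limits = control_limits([-1.0, 1.0, -1.0, 1.0])
flagged = out_of_control([0.5, -4.0, 2.0, 5.0], limits)
print(limits, flagged)
```

Any flagged index would be a prompt to review the model before the scheduled six-month review.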
2. Technical Summary
2.1| Data preparation/issues
Data has been taken from Kaggle
(https://www.kaggle.com/selfishgene/historical-hourly-weather-data#temperature.csv).
The data attributes are timestamp, city name and temperature (in Kelvin). The time series
has the following components:
Level | Noise | Trend – Linear (very flat) | Seasonality
The following steps were taken to prepare the data for use:
1. Missing Value Treatment – There were entries missing in the data we downloaded.
As the data has monthly seasonality, we imputed the missing values with the mode
within the month.
2. Daily aggregation from hourly – The data we downloaded from Kaggle had hourly
temperature values. Our business goal is to help retailers use the weather forecast
to plan their inventories better, and inventories are managed at the daily level, not
hourly, so temperature variations within the day are not relevant. Furthermore, we want
to concentrate on the effect of cold weather; hence, we used the minimum temperature of
the day when aggregating the temperature to a daily level.
3. For model preparation, columns ‘t’ & ‘t2’ were added, and categorical variables for the
Season Index were created for Linear Regression.
4. Data was partitioned - Training: 4 years (1461 records); Validation: 1 year (365
records)
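The preparation steps above can be sketched with the Python standard library. The function names and sample readings are hypothetical, and "mode within a month" is interpreted here as the mode of the observed values in the same calendar month of the same year, which is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mode


def impute_monthly_mode(readings):
    """Fill missing temperatures with the mode of the same calendar month.

    readings: list of (datetime, temperature in Kelvin or None).
    Assumes every month has at least one observed value.
    """
    by_month = defaultdict(list)
    for ts, temp in readings:
        if temp is not None:
            by_month[(ts.year, ts.month)].append(temp)
    fill = {key: mode(values) for key, values in by_month.items()}
    return [(ts, temp if temp is not None else fill[(ts.year, ts.month)])
            for ts, temp in readings]


def daily_minimum(readings):
    """Aggregate hourly readings to one minimum temperature per day."""
    days = {}
    for ts, temp in readings:
        day = ts.date()
        days[day] = min(temp, days.get(day, temp))
    return sorted(days.items())


def partition(daily, n_validation=365):
    """Hold out the last n_validation days for validation."""
    return daily[:-n_validation], daily[-n_validation:]


# Hypothetical hourly readings for two days.
hourly = [
    (datetime(2017, 1, 1, 0), 270.0),
    (datetime(2017, 1, 1, 1), None),   # missing reading
    (datetime(2017, 1, 1, 2), 268.0),
    (datetime(2017, 1, 1, 3), 270.0),
    (datetime(2017, 1, 2, 0), 265.0),
    (datetime(2017, 1, 2, 1), 266.0),
]
daily = daily_minimum(impute_monthly_mode(hourly))
train, valid = partition(daily, n_validation=1)
print(daily)
```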
2.2| Forecasting methods
Following are the details of the Forecasting methods for Denver. All other cities are detailed
in the Appendix.
Holt’s Winter Smoothing - Alpha(Level) – 0.5849 , Beta(Trend) – 0.0038 , Gamma(Season) – 0.00817
Error Measures
Measure   Training    Validation
SSE       17660.14    28687.15
MSE       12.08771    78.59492
MAPE      0.977481    2.777486
MAD       2.674989    7.751918
CFE       -7.96718    2540.677
MFE       -0.00545    6.960758
TSE       -2.9784     327.7481
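The error measures reported here can be computed as in the sketch below. The definitions are inferred from the reported figures rather than stated in the report: MSE equals SSE divided by the number of records, MFE equals CFE divided by the number of records, and TSE equals CFE divided by MAD (a tracking signal). The sign convention error = actual - forecast and the expression of MAPE in percent are assumptions.

```python
def error_measures(actual, forecast):
    """Summary error measures for a set of forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]  # assumed sign convention
    n = len(errors)
    sse = sum(e * e for e in errors)
    mad = sum(abs(e) for e in errors) / n
    cfe = sum(errors)
    return {
        "SSE": sse,                    # sum of squared errors
        "MSE": sse / n,                # mean squared error
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "MAD": mad,                    # mean absolute deviation
        "CFE": cfe,                    # cumulative forecast error
        "MFE": cfe / n,                # mean forecast error (bias)
        "TSE": cfe / mad,              # tracking signal
    }


# Hypothetical actuals and forecasts.
measures = error_measures([100.0, 200.0], [90.0, 220.0])
print(measures)
```

A CFE or TSE far from zero, as in the validation column, indicates that the forecasts are biased in one direction.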
[Figure: Training: Actual Vs Fitted. Legend: Denver; Fitted: Denver]
[Figure: Validation: Actual vs Forecast. Legend: Denver; Forecast - Holt Winter]
Linear Regression -
ARIMA – Autocorrelation Adjustment
2.3| Performance compared to benchmarks
We have used a naive forecast (lag-365 value) for benchmarking, i.e. 01-April-2017 was
forecasted using 01-April-2016.
In some cities where Holt Winter smoothing did not perform well, its RMSE was greater than
that of the naive model; for those cities we have recommended Linear Regression, which has
consistently performed better.
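The benchmark comparison can be sketched as follows: a seasonal naive forecast repeats the value observed one seasonal lag earlier, and each model's RMSE is compared with the naive RMSE. Function names and numbers are hypothetical, and a lag of 3 is used in the toy call instead of 365.

```python
from math import sqrt


def seasonal_naive(history, horizon, lag=365):
    """Forecast each future step with the value observed `lag` steps earlier."""
    if len(history) < lag:
        raise ValueError("history must cover at least one full lag")
    start = len(history) - lag
    # h % lag reuses the last observed season when horizon exceeds lag.
    return [history[start + (h % lag)] for h in range(horizon)]


def rmse(actual, forecast):
    """Root mean squared error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Toy example with a lag of 3 (hypothetical values).
naive = seasonal_naive([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], horizon=3, lag=3)
naive_rmse = rmse([4.0, 5.0, 9.0], naive)
print(naive, naive_rmse)
```

A model is worth recommending over the benchmark only if its validation RMSE is lower than the naive RMSE for the same period.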
[Figure: Training: Actual vs Fitted. Legend: Denver; Prediction: Denver]
Prediction Summary
Metric   Training    Validation
SSE      34266.26    7883.837
MSE      23.45398    21.59955
RMSE     4.842931    4.647532
MAD      3.714774    3.440248
R2       0.721353    0.709864
[Figure: Validation: Actual vs Forecast. Legend: Denver; Prediction: Denver - Regression]
3. Appendix
Chicago
Holt Winter Smoothing with additive seasonality
Alpha (Level) 0.881039
Beta (Trend) 0.001831
Gamma (seasonality) 0.082461
Error Measures
Measure   Training    Validation   Column1
SSE       17683.87    13263.69     115.1681
MSE       12.10395    36.33887     6.028173
MAPE      0.988919    1.684337     1.29782
MAD       2.740549    4.688776     2.165358
CFE       -3.5127     316.2392     17.78312
MFE       -0.0024     0.866409     0.930811
TSE       -1.28175    67.446       8.212551
Detroit
Linear Regression
Montreal
Linear Regression
[Figure: Residual]
Training: Prediction Summary
Metric Value
SSE 36252.1001
MSE 24.8132102
RMSE 4.981285998
MAD 3.921526299
R2 0.782243802
[Figure: Training: Actual vs Fitted. Legend: Detroit; Prediction: Detroit]
[Figure: Validation: Actual vs Forecast. Legend: Detroit; Prediction: Detroit]
Validation: Prediction Summary
Metric Value
SSE 10167.56
MSE 27.85632
RMSE 5.277908
MAD 4.194072
R2 0.670782
[Figure: Validation: Residual]
[Figure: Residual]
Training: Prediction Summary
Metric Value
SSE 42051.7681
MSE 28.7828666
RMSE 5.3649666
MAD 4.1825682
R2 0.79321145
Validation: Prediction Summary
Metric Value
SSE 11189.4193
MSE 30.6559432
RMSE 5.53678095
MAD 4.20776217
R2 0.73692502
Toronto
Holt Winter Smoothing with additive seasonality
Alpha (Level) 0.881039
Beta (Trend) 0.001831
Gamma (seasonality) 0.082461
[Figure: Training: Actual vs Fitted. Legend: Montreal; Prediction: Montreal]
[Figure: Training: Residual]
[Figure: Validation: Actual vs Forecasted. Legend: Montreal; Prediction: Montreal]
[Figure: Validation: Residual]
Error Measures
Measure   Training    Validation
SSE       16628.13    20712.86
MSE       11.38134    56.74755
MAPE      0.937095    2.291616
MAD       2.584219    6.360996
CFE       2.469481    -2026.17
MFE       0.00169     -5.55116
TSE       0.9556      -318.531
Vancouver
Holt Winter Smoothing with additive seasonality
Alpha (Level) 0.584918
Beta (Trend) 0.003815
Gamma (seasonality) 0.008179
[Figure: Residual]
Error Measures
Measure   Training    Validation
SSE       6362.835    16076.25
MSE       4.355123    44.04452
MAPE      0.529988    2.134632
MAD       1.478359    5.938589
CFE       3.152989    -1815.87
MFE       0.002158    -4.97499
TSE       2.132763    -305.775