
Source: course1.winona.edu/bdeppa/FIN 335/Handouts/Forecasters_Toolb…

3 - The Forecaster’s Toolbox (Sections 3.2 & 3.3)

Rob Hyndman with Deppa modifications (FIN 335)

January 26, 2018

Table of Contents

3.2 - Transformations and Adjustments
  Calendar Adjustments
    Example 1 - Monthly Milk Production per Cow
    Example 2 - Monthly Fastenal Sales and Number of Business Days
  Population Adjustments
    Example 3 - Flu and Influenza Deaths in Minnesota
  Mathematical Transformations
    Example 4 - Monthly U.S. Liquor Sales (1980-2007)
  Features of Power Transformations
  Bias Adjustments
    Example 4 - Monthly U.S. Liquor Sales (1980-2007) (cont’d)
    Example 5 - Price of Dozen Eggs in US (1900-1993), in constant dollars

3.3 - Residual Diagnostics
  Residuals
    Example 2 - Monthly Fastenal Sales and Number of Business Days (cont’d)
    Example 6 - Google Daily Stock Prices (02/25/13 - 12/06/13)
    Example 4 - Monthly U.S. Liquor Sales (1980-2007) (cont’d)
  Portmanteau Tests for Autocorrelation
    Example 6 - Google Daily Stock Prices (02/25/13 - 12/06/13)
    Example 7 - Dow Jones Industrial Average and Differencing (Assignment 1)

3.2 - Transformations and Adjustments

Adjusting the historical data can often lead to a simpler forecasting model. Here, we deal with four kinds of adjustments: calendar adjustments, population adjustments, inflation adjustments and mathematical transformations. The purpose of these adjustments and transformations is to simplify the patterns in the historical data by removing known sources of variation or by making the pattern more consistent across the whole data set. Simpler patterns usually lead to more accurate forecasts.


Calendar Adjustments

Some of the variation seen in seasonal data may be due to simple calendar effects. In such cases, it is usually much easier to remove the variation before fitting a forecasting model. The monthdays function will compute the number of days in each month or quarter.

For example, if you are studying the monthly milk production on a farm, there will be variation between the months simply because of the different numbers of days in each month, in addition to the seasonal variation across the year. As a second example, the total monthly sales for a company like Fastenal could be adjusted for the number of business days in the month. The number of business days in a given month will differ from year to year, even when considering the same month. We will examine calendar adjustments for both of these examples below.

Example 1 - Monthly Milk Production per Cow

These data come from monitoring the average monthly milk production per cow on a dairy farm over the last 14 years. This would be calculated by finding the total milk production in pounds for each cow and then finding the mean of those totals. This could vary as the number of cows per farm is likely to change during 14 years of study, but the bigger problem here is that not all months have the same number of days! For example, there are three more days of production in August compared to February in a non-leap year.

require(fpp2)

## Loading required package: fpp2

## Loading required package: ggplot2

## Loading required package: forecast

## Loading required package: fma

## Loading required package: expsmooth

milk

## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 589 561 640 656 727 697 640 599 568 577 553 582
## 2 600 566 653 673 742 716 660 617 583 587 565 598
## 3 628 618 688 705 770 736 678 639 604 611 594 634
## 4 658 622 709 722 782 756 702 653 615 621 602 635
## 5 677 635 736 755 811 798 735 697 661 667 645 688
## 6 713 667 762 784 837 817 767 722 681 687 660 698
## 7 717 696 775 796 858 826 783 740 701 706 677 711
## 8 734 690 785 805 871 845 801 764 725 723 690 734
## 9 750 707 807 824 886 859 819 783 740 747 711 751
## 10 804 756 860 878 942 913 869 834 790 800 763 800
## 11 826 799 890 900 961 935 894 855 809 810 766 805
## 12 821 773 883 898 957 924 881 837 784 791 760 802
## 13 828 778 889 902 969 947 908 867 815 812 773 813
## 14 834 782 892 903 966 937 896 858 817 827 797 843


autoplot(milk)+ggtitle("Monthly Milk Production Per Cow") + xlab("Year") + ylab("Pounds of Milk")

monthdays(milk)

## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 31 28 31 30 31 30 31 31 30 31 30 31
## 2 31 28 31 30 31 30 31 31 30 31 30 31
## 3 31 28 31 30 31 30 31 31 30 31 30 31
## 4 31 29 31 30 31 30 31 31 30 31 30 31
## 5 31 28 31 30 31 30 31 31 30 31 30 31
## 6 31 28 31 30 31 30 31 31 30 31 30 31
## 7 31 28 31 30 31 30 31 31 30 31 30 31
## 8 31 29 31 30 31 30 31 31 30 31 30 31
## 9 31 28 31 30 31 30 31 31 30 31 30 31
## 10 31 28 31 30 31 30 31 31 30 31 30 31
## 11 31 28 31 30 31 30 31 31 30 31 30 31
## 12 31 29 31 30 31 30 31 31 30 31 30 31
## 13 31 28 31 30 31 30 31 31 30 31 30 31
## 14 31 28 31 30 31 30 31 31 30 31 30 31

milk.adj = milk/monthdays(milk)
Milk = cbind(milk,milk.adj)
autoplot(Milk,facet=T) + xlab("Years") + ylab("Pounds of Milk")


Notice how much simpler the seasonal pattern is in the average daily production plot compared to the average monthly production plot. By looking at the average daily production instead of the average monthly production, we effectively remove the variation due to the different month lengths. Simpler patterns are usually easier to model and lead to more accurate forecasts.

A similar adjustment can be done for sales data when the number of trading days in each month varies. In this case, the sales per trading day can be modelled instead of the total sales for each month.
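Outside of R, the same adjustment is easy to sketch. Below is a minimal Python illustration of the idea behind milk/monthdays(milk), using only the standard library; the helper names month_lengths and per_day are invented for this sketch.

```python
import calendar

def month_lengths(year):
    """Days in each month of `year` (calendar.monthrange handles leap years)."""
    return [calendar.monthrange(year, m)[1] for m in range(1, 13)]

def per_day(monthly_totals, year):
    """Divide each monthly total by that month's length -- the analogue of
    dividing a monthly series by monthdays() in R."""
    return [total / d for total, d in zip(monthly_totals, month_lengths(year))]

# a series that is exactly 31 units per day flattens to a constant after adjustment,
# even though the raw monthly totals bounce around with the month lengths
flat = per_day([31 * d for d in month_lengths(2013)], 2013)
```

The point is the same one the milk plot makes: once the month-length variation is divided out, only the genuine seasonal pattern remains.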

Example 2 - Monthly Fastenal Sales and Number of Business Days

The number of business days in the month can affect the total monthly sales figures for any company, and Fastenal is no exception. In this example we will use the monthdays command to adjust the total monthly sales figures.

Fastenal = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Fastenal%20Sales%20(2004-2013).csv")
names(Fastenal)

## [1] "Time" "Month" "Month.Num"
## [4] "Year" "NumBDays" "AvSalesPD"
## [7] "Total.Sales" "Total.Fastner" "Total.Nonfastner"

TotSales = ts(Fastenal$Total.Sales,start=2004,frequency=12)
TotSales = TotSales/1000000


NumBusDays = monthdays(TotSales)
NumBusDays

## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2004 31 29 31 30 31 30 31 31 30 31 30 31
## 2005 31 28 31 30 31 30 31 31 30 31 30 31
## 2006 31 28 31 30 31 30 31 31 30 31 30 31
## 2007 31 28 31 30 31 30 31 31 30 31 30 31
## 2008 31 29 31 30 31 30 31 31 30 31 30 31
## 2009 31 28 31 30 31 30 31 31 30 31 30 31
## 2010 31 28 31 30 31 30 31 31 30 31 30 31
## 2011 31 28 31 30 31 30 31 31 30 31 30 31
## 2012 31 29 31 30 31 30 31 31 30 31 30 31
## 2013 31 28 31 30 31 30 31 31 30 31 30 31

AvgDaySales = TotSales/NumBusDays
FastData = cbind(TotSales,AvgDaySales)
autoplot(FastData,facet=T)+ggtitle("Total Monthly Sales and Average Sales Per Day") + xlab("Year") + ylab("Millions $")

The difference in the time series is noticeable, with average daily sales again being a bit smoother than the total sales which are impacted by the number of business days.

Note: The number of days in the month is not technically the number of business days (i.e. M-F). If that figure were known we could adjust by dividing by it instead. Finally, it is important to note that forecasting with the average daily sales series will require us to multiply by the number of business days in the months we wish to forecast for.
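To make that last point concrete, here is a small, hypothetical Python sketch (totals_from_daily is an invented helper, and the per-day figures and day counts are made up) of scaling per-day forecasts back up to monthly totals:

```python
def totals_from_daily(daily_forecasts, business_days):
    """Scale per-day forecasts back up: total = average-per-day * days in month."""
    return [f * d for f, d in zip(daily_forecasts, business_days)]

# two hypothetical future months: $2.5M/day over 21 days, $2.6M/day over 20 days
totals = totals_from_daily([2.5, 2.6], [21, 20])   # roughly [52.5, 52.0] in $M
```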


Population Adjustments

Any data that are affected by population changes can be adjusted to give per-capita data. That is, consider the data per person (or per thousand people, per 100,000 people or per million people) rather than the total. For example, if you are studying the number of hospital beds in a particular region over time, the results are much easier to interpret if you remove the effects of population changes by considering the number of beds per thousand people. Then you can see whether there have been real increases in the number of beds, or whether the increases are due entirely to population increases. It is possible for the total number of beds to increase, but the number of beds per thousand people to decrease. This occurs when the population is increasing faster than the number of hospital beds. For most data that are affected by population changes, it is best to use per-capita data rather than the totals.

As an example we will consider the number of weekly deaths in Minnesota due to all causes and the number of deaths due specifically to influenza or pneumonia. If we were looking at these numbers over a long span of time we would certainly need to consider the state population. Even if considering these figures over a shorter time frame, we may want to convert these death counts to rates (say per 100,000) so we could compare them fairly across states, or to those in Ontario, Canada, for example.

Example 3 - Flu and Influenza Deaths in Minnesota

MNflu = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Minnesota%20Flu%20Deaths.csv")
names(MNflu)

## [1] "Time" "Year" "Week" "PerFluInf" "AllDeaths"
## [6] "NumFluInf" "Population"

head(MNflu)

## Time Year Week PerFluInf AllDeaths NumFluInf Population
## 1 1 2010 40 6.70 776 52 5528630
## 2 2 2010 41 9.10 780 71 5528630
## 3 3 2010 42 8.55 772 66 5528630
## 4 4 2010 43 8.32 781 65 5528630
## 5 5 2010 44 5.84 805 47 5528630
## 6 6 2010 45 9.69 784 76 5528630

AllDeaths = ts(MNflu$AllDeaths,start=c(2010,40),frequency=52)
FluDeaths = ts(MNflu$NumFluInf,start=c(2010,40),frequency=52)
MNpop = MNflu$Population
Deaths = cbind(AllDeaths,FluDeaths)


autoplot(Deaths,facet=T) + ggtitle("MN Weekly Deaths (all causes and flu)") + xlab("Year")

To adjust these time series we can convert from the number of deaths to the number of deaths per 100,000 people. The MNpop variable above contains the state population for each year in the time series. Note the population figure is computed annually only, so we will assume it is the same for each week of the year. To convert raw counts to deaths per 100,000 individuals we use the formula:

Deaths per 100,000 = 100,000 x (# of Deaths/MN population)
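As a quick check of the arithmetic, the same formula in Python (per_100k is a name invented for this sketch):

```python
def per_100k(deaths, population):
    """Deaths per 100,000 = 100,000 * (# of deaths / population)."""
    return 100_000 * deaths / population

# the first week in the MNflu data: 52 flu/pneumonia deaths, population 5,528,630
rate = per_100k(52, 5_528_630)   # roughly 0.94 deaths per 100,000
```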

DeathRate = 100000*(AllDeaths/MNpop)
FluRate = 100000*(FluDeaths/MNpop)
DeathRates = cbind(DeathRate,FluRate)
autoplot(DeathRates,facet=T) + ggtitle("Deaths per 100,000 (all causes and flu)") + xlab("Year") + ylab("Deaths per 100,000")


We could also consider the percentage of deaths caused or attributed to flu & pneumonia.

PerFlu = 100*FluDeaths/AllDeaths
autoplot(PerFlu) + ggtitle("Percentage of Deaths due to Flu/Pneumonia") + xlab("Year") + ylab("% of Deaths")


ggseasonplot(PerFlu,polar=T)

We can see a few years where influenza represented a large percentage of deaths: during January outbreaks in 2013 and 2014, and in March 2012 (I think; the colors are a bit hard to discern).

Mathematical Transformations

If the data show variation that increases or decreases with the level of the series, then a transformation can be useful. For example, a logarithmic transformation is often useful. If we denote the original observations as y_1, …, y_T and the transformed observations as w_1, …, w_T, then w_t = log(y_t).

Logarithms are useful because they are interpretable: changes in a log value are relative (or percentage) changes on the original scale. So if log base 10 is used, then an increase of 1 on the log scale corresponds to a multiplication of 10 on the original scale. Another useful feature of log transformations is that they constrain the forecasts to stay positive on the original scale.
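The multiplicative reading of log changes can be verified directly. A small Python sketch (the numbers are illustrative) shows both the base-10 and the natural-log interpretation:

```python
import math

price = 250.0

# base-10 logs: an increase of 1 on the log10 scale multiplies the original by 10
back = 10 ** (math.log10(price) + 1)          # equals 10 * price

# natural logs: small changes in log(y) read as percentage changes on the
# original scale -- adding 0.05 to log(y) is roughly a 5% increase
pct_change = math.exp(math.log(price) + 0.05) / price - 1
```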

Sometimes other transformations are also used (although they are not as interpretable). For example, square roots and cube roots can be used. These are called power transformations because they can be written in the form w_t = y_t^p.

A useful family of transformations, that includes both logarithms and power transformations, is the family of “Box-Cox transformations”, which depend on the parameter λ.

w_t = log(y_t)          if λ = 0;
w_t = (y_t^λ − 1)/λ     otherwise.


The logarithm in a Box-Cox transformation is always a natural logarithm (i.e., to base e). If λ = 1 then w_t = y_t − 1, so the transformed series is the original time series shifted down 1 unit, with no change in shape or appearance. But for all other values of λ, the time series will change shape. The e-text has a nice example of a time series representing electricity demand. In the original scale the seasonal swings get larger as the demand increases. This increasing variation presents a problem when making forecasts, thus transforming the response using the Box-Cox family of transformations to stabilize the variability can produce better forecasts and forecast prediction intervals. In the e-text you can try different transformations by using the slider at the top of the plot of this time series.

There are algorithms for finding the “optimal” value for λ, but you may want to round the result to the nearest “common” transformation. Common values are λ = 0, which is the log transformation; λ = 1/m, which is the mth root (e.g. square root (m = 2), cube root (m = 3)); and λ = −1, which is the reciprocal. The reciprocal can be good if the time series is a rate of some kind. I would argue negative values of λ should generally be avoided. The log and root transformations are definitely the most common.

We can convert a Box-Cox transformed time series back to the original scale, i.e. back-transform, using the reverse Box-Cox transformation given by:

y_t = exp(w_t)             if λ = 0;
y_t = (λ·w_t + 1)^(1/λ)    otherwise.
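For concreteness, here is a minimal Python sketch of the forward and reverse transformations as defined above (boxcox and inv_boxcox are names invented for this sketch; the forecast package in R provides equivalents):

```python
import math

def boxcox(y, lam):
    """Forward Box-Cox: log(y) when lambda == 0, else (y**lam - 1)/lam."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def inv_boxcox(w, lam):
    """Reverse Box-Cox: exp(w) when lambda == 0, else (lam*w + 1)**(1/lam)."""
    return math.exp(w) if lam == 0 else (lam * w + 1) ** (1 / lam)

# round trip at the "optimal" lambda found for the liquor sales series below
lam = 0.1127641
w = boxcox(1491.0, lam)
y_back = inv_boxcox(w, lam)   # recovers 1491.0 (up to floating point)
```

Applying inv_boxcox after boxcox returns the original value, which is exactly what back-transforming a forecast relies on.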

The function BoxCox.lambda() will automatically find an “optimal” value for λ for a given time series. We consider an example below.

Example 4 - Monthly U.S. Liquor Sales (1980-2007)

Liquor = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/US%20Liquor%20Sales.csv")
names(Liquor)

## [1] "Time" "Month" "Year" "Liquor.Sales"

LiqSales = ts(Liquor$Liquor.Sales,start=1980,frequency=12)
autoplot(LiqSales) + ggtitle("Monthly US Liquor Sales") + xlab("Year") + ylab("Liquor Sales")


lambda = BoxCox.lambda(LiqSales)
lambda

## [1] 0.1127641

LiqTran.opt = BoxCox(LiqSales,lambda)
autoplot(LiqTran.opt) + ggtitle("Box-Cox Transformed Liquor Sales (lambda=.11)") + xlab("Year") + ylab("Transformed Liquor Sales")

logLS = log(LiqSales)
autoplot(logLS) + ggtitle("Log-transformed Liquor Sales (lambda=0)") + xlab("Year") + ylab("log(Liquor Sales)")


The log-transformed liquor sales series has much more constant variance and is easily back-transformed to the original scale by using exp(logLS), which is R code for e^logLS. We can make forecasts in the log scale and then convert them back to the original scale, which we will demonstrate below using the seasonal naive forecast method.

logforecast = snaive(logLS,h=12)
logforecast

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2008 7.307202 7.216227 7.398177 7.168068 7.446337
## Feb 2008 7.275865 7.184890 7.366840 7.136730 7.414999
## Mar 2008 7.404279 7.313304 7.495254 7.265145 7.543413
## Apr 2008 7.428333 7.337358 7.519308 7.289199 7.567467
## May 2008 7.467942 7.376967 7.558917 7.328808 7.607077
## Jun 2008 7.480992 7.390017 7.571967 7.341858 7.620126
## Jul 2008 7.545918 7.454943 7.636893 7.406784 7.685052
## Aug 2008 7.482119 7.391144 7.573094 7.342985 7.621253
## Sep 2008 7.463363 7.372388 7.554338 7.324229 7.602497
## Oct 2008 7.454720 7.363745 7.545695 7.315586 7.593854
## Nov 2008 7.478170 7.387195 7.569145 7.339036 7.617304
## Dec 2008 7.796058 7.705083 7.887033 7.656924 7.935192

plot(logforecast)


names(logforecast)

## [1] "method" "level" "x" "mean" "lower"
## [6] "upper" "model" "fitted" "residuals" "lambda"
## [11] "series"

The log time series that was used to forecast from is contained in logforecast$x, the forecasts are contained in logforecast$mean, and the lower & upper prediction intervals (80% and 95%) are contained in logforecast$lower and logforecast$upper respectively. We can extract these results, combine them, and then convert them back to the original scale of the liquor sales time series.
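The back-conversion itself is just exponentiation. A tiny Python sketch using the Jan 2008 numbers from the log-scale forecast table above (variable names are illustrative):

```python
import math

# Jan 2008 row of the log-scale forecast table above
point, lo80, hi80 = 7.307202, 7.216227, 7.398177

# exp() undoes the log transform; because exp is monotone increasing,
# transforming the interval endpoints gives a valid interval on the original scale
sales = math.exp(point)                     # about 1491, the original-scale forecast
interval = (math.exp(lo80), math.exp(hi80))
```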

results = cbind(OriginalSeries=logforecast$x,Forecast=logforecast$mean,Lower80=logforecast$lower[,1],Upper80=logforecast$upper[,1])
results.orig = exp(results)
autoplot(results.orig) + ggtitle("Monthly US Liquor Sales with Forecasts") + xlab("Year") + ylab("Liquor Sales")


As an alternative we can specify the Box-Cox transformation using the lambda= option inside the call to snaive() when constructing the forecast.

logforecast = snaive(LiqSales,lambda=0,h=36,level=95)
logforecast

## Point Forecast Lo 95 Hi 95
## Jan 2008 1491 1297.336 1713.574
## Feb 2008 1445 1257.311 1660.707
## Mar 2008 1643 1429.593 1888.264
## Apr 2008 1683 1464.397 1934.235
## May 2008 1751 1523.565 2012.386
## Jun 2008 1774 1543.577 2038.820
## Jul 2008 1893 1647.121 2175.584
## Aug 2008 1776 1545.318 2041.118
## Sep 2008 1743 1516.604 2003.192
## Oct 2008 1728 1503.552 1985.953
## Nov 2008 1769 1539.227 2033.073
## Dec 2008 2431 2115.240 2793.896
## Jan 2009 1491 1224.682 1815.231
## Feb 2009 1445 1186.899 1759.227
## Mar 2009 1643 1349.533 2000.284
## Apr 2009 1683 1382.388 2048.983
## May 2009 1751 1438.242 2131.770
ETC...

plot(logforecast)


Features of Power Transformations

• If some y_t ≤ 0, no power transformation is possible unless all observations are adjusted by adding a constant to all values.

• Choose simple values of λ as it makes explanations easier. Common choices for λ are λ = {1/2, 1/3, 1/4, 0, −1/2, −1}, with the log transformation (λ = 0) being the MOST common.

• The forecasting results are relatively insensitive to the value of λ.

• Most often no transformation is needed.

• Transformations sometimes make little difference to the forecasts but have a large effect on prediction intervals.


Bias Adjustments

One issue with using mathematical transformations such as Box-Cox transformations is that the back-transformed forecast will not be the mean of the forecast distribution. In fact, it will usually be the median of the forecast distribution (assuming that the distribution on the transformed space is symmetric). For many purposes, this is acceptable, but occasionally the mean forecast is required. For example, you may wish to add up sales forecasts from various regions to form a forecast for the whole country. But medians do not add up, whereas means do.

For a Box-Cox transformation, the back-transformed mean is given by:

y_t = exp(w_t)·[1 + σ_h²/2]                                      if λ = 0;
y_t = (λ·w_t + 1)^(1/λ)·[1 + σ_h²(1 − λ)/(2(λ·w_t + 1)²)]        otherwise,

where σ_h² is the h-step forecast variance. The larger the forecast variance, the bigger the difference between the mean and the median.
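For the log case (λ = 0), the adjustment is easy to sketch in Python. Here back_median and back_mean are invented names, and σ_h² is a hypothetical forecast variance chosen for illustration, not one estimated from the liquor data:

```python
import math

def back_median(w):
    """Simple back-transform of a log-scale forecast (lambda = 0): the median."""
    return math.exp(w)

def back_mean(w, sigma2_h):
    """Bias-adjusted back-transform for lambda = 0: exp(w) * (1 + sigma2_h / 2)."""
    return math.exp(w) * (1 + sigma2_h / 2)

w = 7.307202          # a log-scale point forecast
sigma2_h = 0.0049     # hypothetical h-step forecast variance
median = back_median(w)
mean = back_mean(w, sigma2_h)   # sits slightly above the median
```

With a small forecast variance the two back-transforms barely differ, which matches what we see in the plot below; as σ_h² grows with the horizon, the gap widens.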

Example 4 - Monthly U.S. Liquor Sales (1980-2007) (cont’d)

logforecast = snaive(LiqSales,lambda=0,h=36,level=95)
plot(logforecast)


logforecast.bias = snaive(LiqSales,lambda=0,h=36,level=95,biasadj=TRUE)
plot(logforecast.bias)

autoplot(LiqSales) + autolayer(logforecast,series="Simple Back-Transformation") + autolayer(logforecast.bias$mean,series="Bias-Adjusted") + guides(colour=guide_legend(title="Forecast"))

The differences between the back-transformed and bias-adjusted back-transformed forecasts and prediction intervals are pretty small here.


Example 5 - Price of Dozen Eggs in US (1900-1993), in constant dollars

eggs = eggs/100 # convert cents to dollars
autoplot(eggs) + ggtitle("Price of a Dozen Eggs (constant $)") + xlab("Year") + ylab("Price $")

lambda = BoxCox.lambda(eggs)
lambda

## [1] 0.3956183

results.log = rwf(eggs,drift=T,lambda=.333,h=25,level=80)
results.bias = rwf(eggs,drift=T,lambda=.333,h=25,level=80,biasadj=T)
autoplot(eggs) + autolayer(results.log,series="Simple Back-transform") + autolayer(results.bias$mean,series="Bias-Adjusted") + guides(colour=guide_legend(title="Forecast"))


The optimal λ returned by the Box-Cox method is λ = .396. Opting for the nearest “common” transformation, we will use λ = .333, i.e. the cube root transformation. Here the bias-adjusted back-transformed forecast differs quite a bit from the simple back-transformed forecast. In FPP the author shows the results from using a log transformation. The Drift Method was used for making the forecast.

3.3 - Residual Diagnostics

Each observation in a time series can be forecast using all previous observations. We call these “fitted values” and they are denoted by ŷ_{t|t−1}, meaning the forecast of y_t based upon the observations y_1, …, y_{t−1}. We use these so often that we sometimes drop part of the subscript and just write ŷ_t instead of ŷ_{t|t−1}. Fitted values always involve one-step forecasts.

Actually, fitted values are often not true forecasts because any parameters involved in the forecasting method are estimated using all available observations in the time series, including future observations. For example, if we use the average method, the fitted values are given by

ŷ_t = ȳ = ĉ

where ĉ = ȳ is the average computed over all available observations, including those at times after time t. Similarly, for the drift method, the drift parameter is estimated using all available observations. In this case, the fitted values are given by

ŷ_t = y_{t−1} + ĉ

where

ĉ = (y_T − y_1)/(T − 1).

In both cases, there is a parameter to be estimated from the data. The “hat” above the c reminds us that this is an estimate. When the estimate of c involves observations after time t , the fitted values are not true forecasts. On the other hand, naïve or seasonal naïve forecasts do not involve any parameters, and so fitted values are true forecasts in such cases.
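A small Python sketch (drift_fitted is an invented name, and the series is made up) makes the point concrete: the drift method's fitted values depend on ĉ, which is computed from the entire series, including the last observation, so they are not true forecasts.

```python
def drift_fitted(y):
    """Fitted values for the drift method: yhat_t = y_{t-1} + c_hat, where
    c_hat = (y_T - y_1) / (T - 1) is estimated from the WHOLE series --
    so these 'fitted values' are not true forecasts."""
    T = len(y)
    c_hat = (y[-1] - y[0]) / (T - 1)
    return [y[t - 1] + c_hat for t in range(1, T)]   # no fitted value at t = 1

y = [10.0, 12.0, 11.0, 15.0, 18.0]
fitted = drift_fitted(y)   # c_hat = (18 - 10)/4 = 2.0, so [12.0, 14.0, 13.0, 17.0]
```

Notice that even the first fitted value already uses the final observation y_T through ĉ.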


Residuals

The “residuals” in a time series model are what is left over after fitting a model. For many (but not all) time series models, the residuals are equal to the difference between the observations and the corresponding fitted values:

e_t = y_t − ŷ_t

Residuals are useful in checking whether a model has adequately captured the information in the data. A good forecasting method will yield residuals with the following properties:

1. The residuals are uncorrelated. If there are correlations between residuals, then there is information left in the residuals which should be used in computing forecasts. We can use the ACF of the residuals e_t to check whether or not this is the case.

2. The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts are biased.

Any forecasting method that does not satisfy these properties can be improved. However, that does not mean that forecasting methods that satisfy these properties cannot be improved. It is possible to have several different forecasting methods for the same data set, all of which satisfy these properties. Checking these properties is important in order to see whether a method is using all of the available information, but it is not a good way to select a forecasting method.

If either of these properties is not satisfied, then the forecasting method can be modified to give better forecasts. Adjusting for bias is easy: if the residuals have mean m, then simply add m to all forecasts and the bias problem is solved. Fixing the correlation problem is harder, and we will not address it until Chapter 9 of FPP2. [http://otexts.org/fpp2/dynamic.html]
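The bias fix really is one line of arithmetic. A minimal Python sketch (debias is an invented helper, and the numbers are made up):

```python
def debias(forecasts, residuals):
    """If the residuals e_t = y_t - yhat_t have mean m, the forecasts run low
    by m on average, so adding m to every forecast removes the bias."""
    m = sum(residuals) / len(residuals)
    return [f + m for f in forecasts]

residuals = [0.5, 0.7, 0.3, 0.9, 0.6]                # mean m = 0.6
adjusted = debias([100.0, 102.0, 104.0], residuals)  # each forecast shifted up by m
```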

In addition to these essential properties, it is useful (but not necessary) for the residuals to also have the following two properties.

3. The residuals have constant variance.

4. The residuals are normally distributed.

These two properties make the calculation of prediction intervals easier (see Section 3.5 [http://otexts.org/fpp2/prediction-intervals.html] for an example). However, a forecasting method that does not satisfy these last two properties cannot necessarily be improved. Sometimes applying a Box-Cox transformation may assist with these properties, but otherwise there is usually little that you can do to ensure that your residuals have constant variance and a normal distribution. Instead, an alternative approach to obtaining prediction intervals is necessary. Again, we will not address how to do this until later in the FPP2 book.

Example 2 - Monthly Fastenal Sales and Number of Business Days (cont’d)

As a first example we consider the average total sales per day for Fastenal (2004-2013). Given the semi-seasonal nature of this time series we will use a seasonal naive forecast and then examine the residuals from this fit.
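Before looking at the R output, the seasonal naive logic itself can be sketched in a few lines of Python (snaive_forecast is an invented name; R's snaive() handles the time-series bookkeeping for us):

```python
def snaive_forecast(y, m, h):
    """Seasonal naive: each forecast equals the observation from the same
    season in the last complete cycle of m periods."""
    last_cycle = y[-m:]
    return [last_cycle[i % m] for i in range(h)]

# two "years" of a quarterly series (m = 4), forecasting 6 periods ahead:
# the last cycle 5, 6, 7, 8 simply repeats
y = [1, 2, 3, 4, 5, 6, 7, 8]
fc = snaive_forecast(y, m=4, h=6)   # [5, 6, 7, 8, 5, 6]
```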


Fastenal = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Fastenal%20Sales%20(2004-2013).csv")
names(Fastenal)

## [1] "Time" "Month" "Month.Num"
## [4] "Year" "NumBDays" "AvSalesPD"
## [7] "Total.Sales" "Total.Fastner" "Total.Nonfastner"

TotSales = ts(Fastenal$Total.Sales,start=2004,frequency=12)
TotSales = TotSales/1000000
NumBusDays = monthdays(TotSales)
NumBusDays

## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2004 31 29 31 30 31 30 31 31 30 31 30 31
## 2005 31 28 31 30 31 30 31 31 30 31 30 31
## 2006 31 28 31 30 31 30 31 31 30 31 30 31
## 2007 31 28 31 30 31 30 31 31 30 31 30 31
## 2008 31 29 31 30 31 30 31 31 30 31 30 31
## 2009 31 28 31 30 31 30 31 31 30 31 30 31
## 2010 31 28 31 30 31 30 31 31 30 31 30 31
## 2011 31 28 31 30 31 30 31 31 30 31 30 31
## 2012 31 29 31 30 31 30 31 31 30 31 30 31
## 2013 31 28 31 30 31 30 31 31 30 31 30 31

AvgDaySales = TotSales/NumBusDays
autoplot(AvgDaySales) + ggtitle("Average Sales Per Day") + xlab("Year") + ylab("Millions $")

fc = snaive(AvgDaySales) # notice that we did not specify a forecast horizon (h)
res = residuals(fc)

autoplot(res) + ggtitle("Residuals from Seasonal Naive") + xlab("Year") + ylab("Residuals")

summary(res)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -1.5323  0.4249  0.7109  0.5469  0.8950  1.4072      12

gghistogram(res) + ggtitle("Residuals from Seasonal Naive Method") + xlab("Residuals")

## Warning: Removed 12 rows containing non-finite values (stat_bin).

ggAcf(res) + ggtitle("ACF for Residuals from Seasonal Naive Method")

qqnorm(res)

Clearly the seasonal naive forecast method does NOT fit this time series well. The financial crisis of 2008-2009 could not be forecast. The mean of the residuals is not zero, which is primarily due to the large negative residuals where the forecasted daily sales were MUCH higher than what was actually observed during this economic slump. The ACF shows that there is still substantial autocorrelation structure in the time series not addressed by our forecast. We will not try to address the problems with this fit at present, but we definitely know we have work to do.

Example 6 - Google Daily Stock Prices (02/25/13 - 12/06/13)

goog200 = window(goog,start=1,end=200)
autoplot(goog200) + ggtitle("Daily Closing Price of Google (02/25/13 - 12/06/13)") + xlab("Day") + ylab("Closing Price $")

fc = naive(goog200)
res = residuals(fc)

autoplot(res) + ggtitle("Residuals from Naive Forecast - Google Price") + xlab("Day") + ylab("Residual ($)")

gghistogram(res) + ggtitle("Histogram of Residuals - Naive Method") + xlab("Residuals ($)")

## Warning: Removed 1 rows containing non-finite values (stat_bin).

summary(res)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -8.7183 -2.4863 -0.1689  0.6967  3.1744 60.9138       1

ggAcf(res) + ggtitle("ACF of Residuals from Naive Method")

qqnorm(res)

These graphs show that the naïve method produces forecasts that appear to account for all available information. The mean of the residuals is very close to zero and there is no significant correlation in the residuals series. The time plot of the residuals shows that the variation of the residuals stays much the same across the historical data, apart from the one outlier, and therefore the residual variance can be treated as constant. This can also be seen on the histogram of the residuals. The histogram suggests that the residuals may not be normal - the right tail seems a little too long, even when we ignore the outlier. Consequently, forecasts from this method will probably be quite good, but prediction intervals that are computed assuming a normal distribution may be inaccurate.

Example 4 - Monthly U.S. Liquor Sales (1980 -2007) (cont’d)

fc = snaive(LiqSales)
res = residuals(fc)
autoplot(res) + ggtitle("Residuals from Seasonal Naive Forecast - NO transform") + xlab("Year") + ylab("Residual ($)")

gghistogram(res) + ggtitle("Histogram of Residuals - Seasonal Naive Method") + xlab("Residuals ($)")

## Warning: Removed 12 rows containing non-finite values (stat_bin).

summary(res)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -372.00   11.00   50.00   44.68   84.00  337.00      12

ggAcf(res) + ggtitle("ACF of Residuals from Seasonal Naive Method - NO transform")

qqnorm(res)

logfc = snaive(LiqSales,lambda=0)
res2 = residuals(logfc)
autoplot(res2) + ggtitle("Residuals from Seasonal Naive Forecast - log transform") + xlab("Year") + ylab("Residual ($)")

gghistogram(res2) + ggtitle("Histogram of Residual: Seasonal Naive Method - log transform") + xlab("Residuals ($)")

## Warning: Removed 12 rows containing non-finite values (stat_bin).

summary(res2)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -0.14614  0.00858  0.04819  0.04295  0.08043  0.19467       12

ggAcf(res2) + ggtitle("ACF of Residuals from Seasonal Naive Method - log transform")

qqnorm(res2)

The log transformation helped stabilize the variation in the residuals; however, the seasonal naive fit still fails to adequately model the time series, as the residuals still show structure indicative of trend. A seasonal method with some way of dealing with the trend in these data might be helpful. One such method is the Holt-Winters method, which we will discuss in Chapter 6 [http://otexts.org/fpp2/decomposition.html].

Portmanteau Tests for Autocorrelation

In addition to looking at the ACF plot, we can also do a more formal test for autocorrelation by considering a whole set of $r_k$ values as a group, rather than treating each one separately.

Recall that $r_k$ is the autocorrelation for lag $k$. When we look at the ACF plot to see whether each spike is within the required limits ($\pm {2\over{\sqrt{T}}}$), we are implicitly carrying out multiple hypothesis tests, each one with a small probability of giving a false positive ($\alpha = .05$). When enough of these tests are done, it is likely that at least one will give a false positive, and so we may conclude that the residuals have some remaining autocorrelation when in fact they do not.
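This multiple-testing problem is easy to quantify with a quick sketch (the numbers here are illustrative, not from the text):

```r
# If each of 20 lags is checked against the +/- 2/sqrt(T) limits at the
# 5% level, and the series really is white noise, the chance of seeing
# at least one spurious "significant" spike is already around 64%.
alpha  <- 0.05
n_lags <- 20
1 - (1 - alpha)^n_lags  # P(at least one false positive)
```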

In order to overcome this problem, we test whether the first $h$ autocorrelations are significantly different from what would be expected from a white noise process. A test for a group of autocorrelations is called a portmanteau test, from a French word describing a suitcase containing a number of items.

One such test is the Box-Pierce test, based on the following statistic

$$Q = T \sum_{k=1}^{h} r_k^2,$$

where $h$ is the maximum lag being considered and $T$ is the number of observations in the time series. If each $r_k$ is close to zero, then $Q$ will be small. If some $r_k$ values are large (positive or negative), then $Q$ will be large. The author suggests using $h=10$ for non-seasonal data and $h=2m$ for seasonal data, where $m$ is the period of seasonality. However, the test is not good when $h$ is large, so if these values are larger than $T/5$, then use $h=T/5$ instead.

A related (and more accurate) test is the Ljung-Box test, based on

$$Q^* = T(T+2) \sum_{k=1}^{h} \frac{r_k^2}{T-k}.$$

Again, large values of $Q^*$ suggest that the autocorrelations do not come from a white noise time series. How large is too large? If the autocorrelations did come from a white noise series, then both $Q$ and $Q^*$ would have a $\chi^2$ distribution with $(h-K)$ degrees of freedom, where $K$ is the number of parameters in the model. If they are calculated from raw data (rather than residuals from a model), then $K=0$.
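To make the formulas concrete, here is a small sketch that builds both statistics directly from the sample autocorrelations using only base R. It assumes `res` holds some residual series (for example, residuals from a naive fit); because it drops leading NA values before calling acf, its results should agree closely, though perhaps not exactly, with the output of Box.test.

```r
# Sketch: computing the Box-Pierce Q and Ljung-Box Q* statistics "by hand".
# Assumes `res` is a residual series, e.g. from residuals(naive(...)).
r  <- na.omit(res)                                # drop leading NA(s)
T  <- length(r)                                   # usable observations
h  <- 10                                          # max lag (non-seasonal data)
rk <- acf(r, lag.max = h, plot = FALSE)$acf[-1]   # r_1, ..., r_h
Q     <- T * sum(rk^2)                            # Box-Pierce statistic
Qstar <- T * (T + 2) * sum(rk^2 / (T - 1:h))      # Ljung-Box statistic
K <- 0                                            # naive method: no parameters
pchisq(Qstar, df = h - K, lower.tail = FALSE)     # p-value from chi-square
```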

Example 6 - Google Daily Stock Prices (02/25/13 - 12/06/13)

For the Google stock price example, the naïve model has no parameters, so $K=0$ in this case also. If we used the naive model with drift, then the slope of the line segment connecting $y_1$ and $y_T$ is a parameter, in which case $K=1$. Other models we examine later in the course will have $K>0$ as well.

autoplot(goog200) + ggtitle("Daily Closing Price of Google (02/25/13 - 12/06/13)") + xlab("Day") + ylab("Closing Price $")

fc = naive(goog200)
res = residuals(fc)
autoplot(res) + ggtitle("Residuals from Naive Forecast - Google Price") + xlab("Day") + ylab("Residual ($)")

ggAcf(res) + ggtitle("ACF of Residuals from Naive Method")

Box.test(res,lag=10,fitdf=0) # Here fitdf refers to the value of K

## 
##  Box-Pierce test
## 
## data:  res
## X-squared = 10.611, df = 10, p-value = 0.3886

Box.test(res,lag=10,fitdf=0,type="Lj")

## 
##  Box-Ljung test
## 
## data:  res
## X-squared = 11.031, df = 10, p-value = 0.3551

For both $Q$ and $Q^*$ the results are not significant ($p > 0.05$). Thus we have no evidence that the residuals are not from a white noise series.

All of these methods for checking residuals are conveniently packaged into the R function checkresiduals, which will produce a time plot, ACF plot, and histogram of the residuals (with an overlaid normal distribution for comparison), and perform a Ljung-Box test with the correct degrees of freedom:

checkresiduals(naive(goog200)) # notice the function is applied directly to the output of naive()

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 11.031, df = 10, p-value = 0.3551
## 
## Model df: 0.   Total lags used: 10

Example 7 - Dow Jones Industrial Average and Differencing (Assignment 1)

Recall that on Assignment 1 - Problem 5 you were asked to consider 292 consecutive trading days of the Dow Jones Index. In that problem you computed a new time series containing the daily changes (from the previous trading day) of this index.

checkresiduals(dj)

ggtsdisplay(dj)

Box.test(dj,lag=25,fitdf=0)

## 
##  Box-Pierce test
## 
## data:  dj
## X-squared = 3405.9, df = 25, p-value < 2.2e-16

Box.test(dj,lag=25,fitdf=0,type="Lj")

## 
##  Box-Ljung test
## 
## data:  dj
## X-squared = 3544.9, df = 25, p-value < 2.2e-16

ddj = diff(dj,1)
checkresiduals(ddj)

Box.test(ddj,lag=10,fitdf=0)

## 
##  Box-Pierce test
## 
## data:  ddj
## X-squared = 14.045, df = 10, p-value = 0.1709

Box.test(ddj,lag=10,fitdf=0,type="Lj")

## 
##  Box-Ljung test
## 
## data:  ddj
## X-squared = 14.461, df = 10, p-value = 0.153

checkresiduals(naive(dj))

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 14.461, df = 10, p-value = 0.153
## 
## Model df: 0.   Total lags used: 10

Even though the Dow Jones time series object dj does not contain residuals from a fit, we can still use the plotting features of the checkresiduals function to view plots of the time series. The function ggtsdisplay will also accomplish this, with the exception of the histogram, which is replaced by a PACF plot that we will discuss later in the course.

The time plot of the series dj and its ACF clearly show that there is significant autocorrelation in the time series. The portmanteau tests confirm the obvious here.

The differenced time series $y_t - y_{t-1}$, however, looks consistent with a white noise series. The portmanteau tests confirm this. Finally, we examine the residuals from a naive method fit to these data. The results look identical to those from the differenced time series $y_t - y_{t-1}$. Why?
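As a hint, here is a quick check (a sketch, assuming dj and the forecast package are loaded as above): for the naive method the fitted value at time $t$ is simply $y_{t-1}$, so each residual is $y_t - y_{t-1}$, which is exactly the differenced series.

```r
# Naive residuals ARE the first differences (apart from the leading NA),
# which is why checkresiduals(naive(dj)) and checkresiduals(ddj) agree.
res_naive <- residuals(naive(dj))
max(abs(na.omit(res_naive) - diff(dj, 1)))  # should be 0
```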
