
Using Statistical Data to Make Decisions

    For more information, contact:

    Tom Ilvento 213 Townsend Hall, Newark, DE 19717 302-831-6773 [email protected]

    Module 6: Introduction to Time Series Forecasting Titus Awokuse and Tom Ilvento,

    University of Delaware, College of Agriculture and Natural Resources, Food and Resource Economics

The last module examined multiple regression modeling techniques as a tool for analyzing financial data. Multiple regression is a commonly used technique for explaining the relationship between several variables of interest. Although financial analysts are interested in explaining the relationship between correlated variables, they also want to know the future trend of key variables. Business managers and policymakers regularly use forecasts of financial variables to help make important decisions about production, purchases, market conditions, and other choices about the best allocation of resources. How are these decisions made and what forecasting techniques are used? Are the forecasts accurate and reliable?

This module introduces some basic skills for analyzing and forecasting data over time. We will discuss several forecasting techniques and how they are used in generating forecasts. Furthermore, we will also examine important issues on how to evaluate and judge the accuracy of forecasts and discuss some of the common challenges to developing good forecasts.

    BASICS OF FORECASTING

Time series are any univariate or multivariate quantitative data collected over time, either by private or government agencies. Common uses of time series data include: 1) modeling the relationships between various time series; 2) forecasting the underlying behavior of the data; and 3) forecasting what effect changes in one variable may have on the future behavior of another variable. There are two major categories of forecasting approaches: Qualitative and Quantitative.

Qualitative Techniques: Qualitative techniques refer to a number of forecasting approaches based on subjective estimates from informed experts. Usually, no statistical data analysis is involved. Rather, estimates are based on a deliberative process of a group of experts, drawing on their past knowledge and experience.

Key Objectives

• Understand the basic components of forecasting, including qualitative and quantitative techniques
• Understand the three types of time series forecasts
• Understand the basic characteristics and terms of forecasts
• See an example of a forecast using trend analysis

In this Module We Will:

• Run a time series forecast with trend data using Excel
• Compare a linear and nonlinear trend analysis


Examples are the Delphi technique and scenario writing, where a panel of experts is asked a series of questions on future trends, the answers are recorded and shared back to the panel, and the process is repeated so that the panel builds a shared scenario. The key to these approaches is a recognition that forecasting is subjective, but if we involve knowledgeable people in a process we may get good insights into future scenarios. This approach is useful when good data are not available, or when we wish to gain general insights through the opinions of experts.

Quantitative Techniques: Quantitative techniques refer to forecasting based on the analysis of historical data using statistical principles and concepts. The quantitative forecasting approach is further sub-divided into two parts: causal techniques and time series techniques. Causal techniques are based on regression analysis that examines the relationship between the variable to be forecasted and other explanatory variables. In contrast, time series techniques usually use historical data for only the variable of interest to forecast its future values (see Table 1 below).

Table 1. Alternative Forecasting Approaches

Qualitative Techniques: Useful when historical data are scarce or non-existent. Specific techniques: Delphi Technique, Scenario Writing, Visionary Forecast, Historic Analogies.

Causal Techniques: Useful when historical data are available for both the dependent (forecast) and the independent variables. Specific techniques: Regression Models, Econometric Models, Leading Indicators, Correlation Methods.

Time Series Techniques: Useful when historical data exist for the forecast variable and the data exhibit a pattern. Specific techniques: Moving Average, Autoregression Models, Seasonal Regression Models, Exponential Smoothing, Trend Projection, Cointegration Models.

Forecast horizon: The forecast horizon is defined as the number of time periods between the current period and the date of a future forecast. For example, in the case of monthly data, if the current period is month T, then a forecast of sales for month T+3 has a forecast horizon of three steps. For quarterly data, a step is one quarter (three months), but for annual data, one step is one year (twelve months). The forecast changes with the forecast horizon. The choice of the best and most appropriate forecasting models and strategy usually depends on the forecasting horizon.

    Three Types of Time Series Forecasts

Point Forecast: a single number or a "best guess." It does not provide information on the level of uncertainty around the point estimate/forecast. For example, an economist may forecast a 10.5% growth in unemployment over the next six months.

Interval Forecast: relative to a point forecast, this is a range of forecasted values which is expected to include the actual observed value with some probability. For example, an economist may forecast growth in the unemployment rate to be in the interval 8.5% to 12.5%. An interval forecast is related to the concept of confidence intervals.

Density Forecast: this type of forecast provides information on the overall probability distribution of the future values of the time series of interest. For example, the density forecast of future unemployment rate growth might be normally distributed with a mean of 8.3% and a standard deviation of 1.5%. Relative to the point forecast, both the density and the interval forecasts provide more information, since we provide more than a single estimate and we provide a probability context for the estimate. However, despite the importance and more comprehensive information contained in density and interval forecasts, they are rarely used by businesses. Rather, the point forecast is the most commonly used type of forecast by business managers and policymakers.
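To make the connection between the three types concrete, here is a minimal sketch (in Python) that turns a normal density forecast, using the module's example values of mean 8.3% and standard deviation 1.5%, into a point forecast and an approximate 95% interval forecast. The 1.96 critical value and the resulting interval are illustrative assumptions, not figures from the module.

```python
# Density forecast from the text: unemployment growth ~ Normal(8.3%, 1.5%)
mean, sd = 8.3, 1.5

point_forecast = mean          # the point forecast is the single "best guess"

# An approximate 95% interval forecast uses the normal critical value 1.96.
z = 1.96
lower, upper = mean - z * sd, mean + z * sd

print(f"Point forecast:    {point_forecast:.1f}%")
print(f"Interval forecast: {lower:.1f}% to {upper:.1f}%")
```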

    CHARACTERISTICS OF TIME SERIES

Any given time series can be divided into four categories: trend, seasonal components, cyclical components, and random fluctuations.

Trend. The trend is a long-term, persistent downward or upward change in the time series value. It represents the general tendency of a variable over an extended time period. We usually observe a steady increase or decline in the values of a time series over a given time period. We can characterize an observed trend as linear or non-linear. For example, a data plot of U.S. overall retail sales data over the 1955-1996 time period exhibits an upward trend, which may be reflecting the increase in the purchases of consumer durables and non-durables over time (see Figure 1).


Figure 1. Time Series Plot of Monthly U.S. Retail Sales, 1955 to 1996 (monthly trend, in millions of dollars).

Figure 2. Graph of Seasonal Fluctuations in Housing Starts (U.S. monthly housing starts, 1990 to 2003, in thousands of starts).

The plot in Figure 1 shows a nonlinear trend: sales were increasing at an increasing rate. This type of relationship is a curvilinear trend best represented by a polynomial regression.

Seasonal Components. Seasonal components of a time series refer to a regular change in the data values of a time series that occurs at the same time every year. This is a very common characteristic of financial and other business related data. The seasonal repetition may be exact (deterministic seasonality) or approximate (stochastic seasonality). Sources of seasonality are technologies, preferences, and institutions that are linked to specific times of the year. It may be appropriate to remove seasonality before modeling macroeconomic time series (seasonally adjusted series), since the emphasis is usually on the nonseasonal fluctuations of macroeconomic series. However, the removal of seasonal components is not appropriate when forecasting business variables. It is important to account for all possible sources of variation in business time series.


Figure 3. Seasonally Adjusted Housing Starts (U.S. monthly housing starts adjusted for seasonality, 1990 to 2003, in thousands of starts).

For example, retail sales of many household and clothing products are very high during the fourth quarter of the year. This is a reflection of Christmas seasonal purchases. Also, as shown in Figure 2, housing starts exhibit seasonal patterns: most houses are started in the spring, while the winter period shows very low numbers of home construction due to the colder weather. Figure 3 shows the same data, seasonally adjusted by the Census Bureau. Notice that the regular pattern of sharp fluctuations has now disappeared from the adjusted figures.

Cyclical Components: refer to periodic increases and decreases that are observed over more than a one-year period. These types of variation are also known as business cycles. In contrast to seasonal components, they cover a longer time period and are not subject to a systematic pattern that is easily predictable. Cyclical variation can produce peak periods known as booms and trough periods known as recessions. Although economists may try, it is not easy to predict economic booms and recessions.

Random (Irregular) Components: refer to irregular variations in a time series that are not due to any of the other three components: trend, seasonality, and cyclical variation. This is also known as the residual or error component. This component is not predictable and is usually eliminated from the time series through data smoothing techniques.

A Stationary Time Series refers to a time series without trend, seasonal, or cyclical components; it contains only a random error component.

Analysis of time series assumes that the value of the variable, Yt, at time period t, is equal to the sum of the four components and is represented by:

Yt = Tt + St + Ct + Rt
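As a toy illustration of this additive decomposition, the sketch below constructs a short monthly series from made-up trend, seasonal, cyclical, and random pieces. None of the numbers come from the module; they are chosen only to show how the four components add up to the observed value.

```python
# Toy additive decomposition: Yt = Tt + St + Ct + Rt (all values are made up)
import math
import random

random.seed(1)
n_months = 48
series = []
for t in range(n_months):
    trend = 100 + 0.5 * t                              # Tt: steady upward drift
    seasonal = 10 * math.sin(2 * math.pi * t / 12)     # St: repeats every 12 months
    cyclical = 5 * math.sin(2 * math.pi * t / 40)      # Ct: a slower, multi-year swing
    random_part = random.gauss(0, 2)                   # Rt: irregular noise
    series.append(trend + seasonal + cyclical + random_part)

print([round(y, 1) for y in series[:12]])              # first year of the constructed series
```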


    TIME SERIES DATA MODIFICATION STRATEGIES

When dealing with data over time, there are several things we might do to adjust, smooth, or otherwise modify the data before we begin our analysis. Some of these strategies are relatively straightforward, while others involve a more elaborate model which requires decisions on our part. Within the regression format that we are emphasizing, some of these techniques can be built into the regression model as an alternative to modifying the data.

Adjusting For Inflation. Often time series data involve financial variables, such as income, expenditures, or revenues. Financial data over time are influenced by inflation. The longer the time series (in years), the more potential for inflation to be a factor in the data. With financial data over time, part of the trend may simply be a reflection of inflation. While there may be an upward trend in sales, part of the result might be a function of inflation and not real growth.

The dominant strategy to deal with inflation is to adjust the data by the Consumer Price Index (CPI), or a similar index that is geared toward a specific commodity or sector of the economy. For example, we might use a health care index when dealing with health expenditures because this sector of the economy has experienced higher inflation than general commodities. The CPI is an index based on a basket of market goods. The Bureau of Labor Statistics calculates these indices on an annual basis, often breaking them down by region of the country and commodity. There are many places to find the consumer price index. The following are two useful Internet sites which contain CPI indices as well as definitions and discussions about their use.

Bureau of Labor Statistics: http://www.bls.gov/cpi/home.htm#overview

Minneapolis Federal Reserve Bank: http://minneapolisfed.org/Research/data/us/calc/index.cfm

Using the CPI to adjust your data is relatively straightforward. The index is often based on 100 and is centered around a particular year. The other years are expressed as being above (>100) or below (<100) the base year. To restate a value in base-year dollars, multiply it by the ratio of the base-year CPI to the CPI for that value's year, as illustrated in Table 2.


Table 2. U.S. Flood Insurance Losses Adjusted by the CPI for 2002 Dollars, Partial Table, 1978 to 2002

Loss¹         Year   CPI     CPI Ratio   Adjusted Loss
$147,719      1978   65.2    2.76        $407,587
$483,281      1979   72.6    2.48        $1,197,552
$230,414      1980   82.4    2.18        $503,053
$127,118      1981   90.9    1.98        $251,579
$198,296      1982   96.5    1.86        $369,673
...
$519,416      1997   160.5   1.12        $582,199
$885,658      1998   163.0   1.10        $977,484
$754,150      1999   166.6   1.08        $814,355
$250,796      2000   172.2   1.04        $262,010
$1,268,446    2001   177.1   1.02        $1,288,500
$338,624      2002   179.9   1.00        $338,624

¹Loss data are expressed in $1,000s
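The adjustment shown in Table 2 is easy to reproduce in code. Here is a small sketch, assuming 2002 is the base year (CPI = 179.9): each nominal loss is multiplied by the ratio of the base-year CPI to that year's CPI. Only the first few rows of the table are used.

```python
# CPI adjustment as in Table 2: adjusted loss = nominal loss * (base CPI / year CPI)
base_cpi = 179.9                      # 2002, the base year in Table 2

losses = [                            # (year, nominal loss in $1,000s, CPI for that year)
    (1978, 147_719, 65.2),
    (1979, 483_281, 72.6),
    (1980, 230_414, 82.4),
]

for year, loss, cpi in losses:
    ratio = base_cpi / cpi            # the "CPI Ratio" column
    adjusted = loss * ratio           # loss restated in 2002 dollars
    print(f"{year}: ratio {ratio:.2f}, adjusted loss ${adjusted:,.0f}")
```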

Seasonal Data. Data that reflect a regular pattern tied to a time of year are referred to as seasonal data. Seasonal data will show up on a graph as regular hills and valleys in the data across the trend. The seasons may reflect quarters, months, or some other regularly recurring time period throughout the year. The best way to think of seasonal variations is that part of the pattern in the data reflects a seasonal component.

In most cases we expect the seasonal variations and are not terribly concerned with explaining them. However, we do want to account for them when we make our estimate of the trend in the data. Seasonal variations may mask or impede our ability to model a trend. We can account for seasonal data in one of two main ways. The first is to deseasonalize the data by adjusting the data for seasonal effects. This often is done for us when we use government data that have already been adjusted. The second method is to account for the seasons within our model. In regression based models this is done through the use of dummy variables, accounting for quarters (3 dummy variables) or months (11 dummy variables).

The deseasonal adjustment is done through a ratio-to-moving-average method. The exact computations are beyond the scope of this module. The computations, while involved and at times tedious, are not difficult to do with a spreadsheet program (a small code sketch after the steps below illustrates the idea). This approach involves the following basic steps.

1. Calculate moving averages based on the number of seasons (4 for quarters, 12 for months).

2. Calculate a centered moving average when dealing with an even number of seasons. The center is the average of two successive moving averages, centered around a particular season. For example, the average of quarters 1 to 4 and the average of quarters 2 to 5 would be added together and divided by two to give a centered moving average for quarter 3.

3. Calculate a ratio to moving average for a specific season's value by dividing the value by its centered moving average.

4. Calculate an average ratio to moving average for each of the seasons in the time series, referred to as a seasonal index.

5. Use this seasonal index to adjust each season in the time series.
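Here is a compact sketch of these steps for quarterly data (4 seasons). The series is made up for illustration and the function name is our own; a fuller treatment would also rescale the seasonal indexes so they average one before adjusting.

```python
# Ratio-to-moving-average deseasonalization for quarterly data (illustrative values)
def seasonal_indexes(y, seasons=4):
    n = len(y)
    # Step 1: moving averages spanning one full cycle of seasons
    ma = [sum(y[i:i + seasons]) / seasons for i in range(n - seasons + 1)]
    # Step 2: centered moving average (average of two successive moving averages)
    cma = {i + seasons // 2: (ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)}
    # Step 3: ratio of each observation to its centered moving average
    ratios = {}
    for i, c in cma.items():
        ratios.setdefault(i % seasons, []).append(y[i] / c)
    # Step 4: average the ratios for each season to get the seasonal index
    return {s: sum(r) / len(r) for s, r in sorted(ratios.items())}

quarterly_sales = [120, 150, 135, 180, 130, 162, 144, 195, 140, 171, 152, 210]
idx = seasonal_indexes(quarterly_sales)
# Step 5: deseasonalize by dividing each observation by its seasonal index
adjusted = [v / idx[i % 4] for i, v in enumerate(quarterly_sales)]
print({s: round(r, 3) for s, r in idx.items()})
print([round(a, 1) for a in adjusted])
```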

Data are often available from secondary sources already adjusted. The housing start data used in this module included data that were seasonally adjusted through a similar, but more sophisticated, method than the one listed above. Whenever possible, use data that have already been adjusted by the agency or source that created the data. Most likely they will have the best method, experience, and knowledge to adjust the data. Using seasonally adjusted data in a modeling technique, such as regression, allows us to make a better estimate of the trend in the data. However, when we want to make a future prediction or forecast with a model using deseasonalized data, we need to add back in the seasonal component. In essence, we have to readjust the forecast using the seasonal index to make an accurate forecast.

The regression approach to dealing with seasonal variations is to include the seasons in the model. This is done through the use of dummy variables. By including dummy variables that represent the seasons, we account for seasonal variation in the model. If the seasons are quarters (4 time periods), we include three dummy variables, with one quarter represented in the reference category. If the seasons are months, we include 11 dummy variables, with one month as the reference category. It does not matter which season is the reference category, but we must always have k-1 dummy variables in the model (where k equals the number of seasons).

Let's look at an example of the regression approach using the U.S. monthly housing starts from 1990 to 2003. The data show a strong seasonal effect, as might be expected. Housing starts are highest in the spring through summer and lowest in November through February. There is a strong upward trend in the data which reflects growth in housing starts over time. Figure 2 shows this upward trend with the seasonal fluctuations. The R2 for the trend in the data is .45 and the estimate of the slope coefficient is .3609 (data not shown). The seasonally adjusted data provide a much better fit, with an R2 of .78, and the estimate of the slope coefficient for the trend is .3589 (data not shown). We have to be careful in comparing R2 across these models because the dependent variable is not the same in both models, but clearly removing the seasonal component helped to improve the fit of the trend. The last model uses the original data, unadjusted for seasonal variations, but dummy variables representing the months are included in the model. Since there are 12 months, 11 dummy variables were included in the model, labeled M1 (January) through M11 (November). The reference month is December and is represented in the intercept.

The regression output from Excel is included in Table 3. The model shows significant improvement over the first model, and R2 increases from .45 to .87. Including the dummy variables for the months improved the fit of the model dramatically. The estimate for the slope coefficient for the trend is very similar to that estimated with the adjusted data (.3576). If we focus on the dummy variables, we can see that the coefficients for M1 (January) and M2 (February) are not significantly different from zero, indicating that housing starts for January and February are not significantly different from those for December, the reference category in the model. All the other dummy variable coefficients are significantly different from zero and follow an expected pattern: the coefficients are positive and get larger as we move toward the summer months.

Both the deseasonalized model and the regression model with dummy season variables fit the data quite well. It is also comforting that the estimate for the trend is very similar in both models. Either approach will provide a simple but good model to make forecasts. The regression approach seemed to be the best model; it has the added advantage that forecasts from the model directly include the seasonal component, unlike the deseasonalized model. The regression approach with dummy variables for season is relatively straightforward and can easily be modeled with Excel.


Table 3. Regression Output of Housing Start Data Including Seasonal Dummy Variables

Regression Statistics
Multiple R           0.934
R Square             0.872
Adjusted R Square    0.862
Standard Error       9.725
Observations         168

ANOVA
              df     SS           MS         F        Sig F
Regression    12     100092.725   8341.060   88.195   0.000
Residual      155    14659.199    94.575
Total         167    114751.924

              Coef      Std Error   t Stat    P-value
Intercept     67.6221   2.950       22.921    0.000
Trend         0.3576    0.016       23.056    0.000
M1            -5.5877   3.680       -1.519    0.131
M2            -1.6310   3.679       -0.443    0.658
M3            24.5399   3.678       6.671     0.000
M4            37.1823   3.678       10.110    0.000
M5            41.8604   3.677       11.383    0.000
M6            41.6957   3.677       11.340    0.000
M7            37.3452   3.677       10.158    0.000
M8            35.1662   3.676       9.566     0.000
M9            29.4943   3.676       8.023     0.000
M10           33.4652   3.676       9.104     0.000
M11           12.3076   3.676       3.348     0.001
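For readers working outside Excel, here is a hedged sketch of how a design matrix like the one behind Table 3 might be built and fit with ordinary least squares in numpy. The housing-start values are synthetic stand-ins generated for illustration (they are not the module's data), and December is kept as the reference month.

```python
# Trend + monthly dummy regression (December as reference), fit by least squares
import numpy as np

rng = np.random.default_rng(0)
n = 168                                          # 14 years of monthly data, as in the module
trend = np.arange(1, n + 1)
months = ((trend - 1) % 12) + 1                  # calendar month 1..12, starting in January

# Illustrative data only: upward trend, a summer peak, and noise
starts = 68 + 0.36 * trend + 30 * np.sin((months - 1) * np.pi / 6) + rng.normal(0, 5, n)

# Columns: intercept, trend, M1..M11 (1 if that month, else 0; December omitted)
X = np.column_stack(
    [np.ones(n), trend] + [(months == m).astype(float) for m in range(1, 12)]
)
coef, *_ = np.linalg.lstsq(X, starts, rcond=None)
labels = ["intercept", "trend"] + [f"M{m}" for m in range(1, 12)]
print(dict(zip(labels, np.round(coef, 2))))
```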

Data Smoothing. Data smoothing is a strategy to modify the data through a model to remove the random spikes and jerks in the data. We will look at two smoothing techniques, both of which are available in Excel: moving averages and exponential smoothing. Smoothing can be thought of as an alternative modeling approach to regression, or as an intermediate step to prepare the data for analysis in regression.

The Moving Averages approach replaces data with an average of past values. In essence it fits a relatively simple model to the data which uses an average to model the data points. The rationale behind this approach is to not allow a single data point to have too much influence on the trend, by tempering it with observations surrounding or prior to the value. The result should provide modified data that show the direction of the trend, but without the random noise that can hide or distort it. The data are replaced with an average of past values that move forward by a set span. For example, if we have annual data, we might replace each value with a three or five year average. We would calculate the averages based on successive data points that move forward over time, so that each three year average uses two old and one new data point to calculate the new average. Please note, some approaches to moving averages have the modified value surrounded by the observations used in the average (some observations before the value and some after). Excel exclusively uses past values to calculate the moving average, so that the first three observations are used to estimate an average for the third observation in a 3-period moving average.

With a three year average, we will lose two data points in our series because we can't calculate an average for the first or second year. The number of time periods for our moving average is a decision point which is influenced by the length of our series, the time units involved, and our experience with the volatility of the data. If I have annual data for 20 years, I might not feel comfortable with a 5-year moving average because I would lose too many data points (four). However, if the data were collected monthly over the 20 years, a five or six month moving average would not be a limitation. We want to pick a number that provides a reasonable number of time periods so that extreme values are tempered in the series, but we don't want to lose too much information by picking a time span for the average that is too large. The longer the span for the moving average, the less influence extreme values have in the calculation, and thus the smoother the data will be. However, too many observations in the calculation will distort the trend by smoothing all of it away. The decision point for the span (or interval in Excel) is part art and part science, and often requires an iterative process guided by experience.

Let's look at an example with Excel using the housing start data. The data are given in months, and the series, from 1990 to 2003, provides 168 time periods. We have enough data to have flexibility with a longer moving average. I will use a 6-month moving average. In Excel this is accomplished fairly easily using the following commands.

    Tools

    Data Analysis

    Moving Average


It is wise to insert a blank column in the worksheet where you want the results to go and to label that column in advance. Within the Moving Average menu, the options are relatively simple.

Input Range: The source of the original time series.

Labels: A check box indicating whether the first row contains a label.

Interval: The span of the moving average, given as an integer (such as a 3-, 4-, or 5-period moving average).

Output Range: Where you want the new data to go (you only need to specify the first cell of the column). Excel will only put the results in the current worksheet. Please note, if you used a label for the original data, Excel will not create a label for the moving average; therefore you should specify the first row of the data series, not the first label row.

Chart Output: Excel provides a scatter plot of the original data and the smoothed data.

Standard Errors: Excel will calculate standard errors of the estimates compared with the original data. Each calculated standard error is based on the interval specified for the moving average.

The following table is part of the output for the housing start data (see Table 4). You can see that with an interval of 6 (translated as a 6-month moving average for our data), the first five observations are undefined for the new series. The sixth value is simply the sum of the first six observations, divided by 6:

Value = (99.20 + 86.90 + 108.50 + 119.00 + 121.10 + 117.80)/6 = 108.75

The next value is calculated in a similar way:

Value = (86.90 + 108.50 + 119.00 + 121.10 + 117.80 + 111.20)/6 = 110.75
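The same trailing average can be written in a few lines of code. This sketch reproduces the Table 4 values below; the function name is our own, and, like Excel, it uses only the current value and the five before it, so the first five positions have no moving-average value.

```python
# Trailing 6-period moving average, matching Excel's Moving Average tool
def trailing_moving_average(values, span):
    out = [None] * len(values)
    for i in range(span - 1, len(values)):
        out[i] = sum(values[i - span + 1:i + 1]) / span
    return out

starts = [99.20, 86.90, 108.50, 119.00, 121.10, 117.80, 111.20, 102.80, 93.10]
for raw, smoothed in zip(starts, trailing_moving_average(starts, span=6)):
    print(raw, "->", "#N/A" if smoothed is None else round(smoothed, 2))
```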


Figure 4. Graph of a 6-Month Moving Average for U.S. Housing Starts, 1990 to 2003 (in thousands of starts).

Table 4. Partial Excel Output of a 6-Month Moving Average of the Housing Start Data

Housing Starts    6-Month Avg
99.20             #N/A
86.90             #N/A
108.50            #N/A
119.00            #N/A
121.10            #N/A
117.80            108.75
111.20            110.75
102.80            113.40
93.10             110.83

Figure 4 shows the 6-month moving average data for housing starts plotted over time. The data still show an upward trend with seasonal fluctuations, but clearly the revised data have removed some of the noise in the original time series. The moving average is a simple and easy way to adjust the data, even if only as a first step before further analysis with more sophisticated methods. Care must be taken in choosing the span of the average: too little will not help, but too much risks smoothing out the trend. Experience and an iterative approach usually guide most attempts at moving averages.


Exponential Smoothing. Another approach to smoothing time series data is exponential smoothing. It forecasts data based on a weighted average of past observations, but it places more weight on more recent observations to make its estimates. The model for exponential smoothing is more complicated than that of moving averages. The equations for exponential smoothing follow this format:

Ft+1 = Ft + α(yt - Ft)   or   Ft+1 = αyt + (1 - α)Ft

where:

Ft+1 = forecast value for period t+1
yt = actual value for period t
Ft = forecast value for period t
α = alpha, a smoothing constant where 0 ≤ α ≤ 1

From this equation we can see that the forecast for the next period will equal the forecast made for this period plus or minus an adjustment. We won't have to worry too much about the equations because Excel will make the calculations for us. However, we will have to specify the constant α. Alpha will be a value between zero and one, and it reflects how much weight is given to distant past values of y when making our forecast. A very low value of α (.1 to .3) means that more weight is given to past values, whereas a high value of α (.6 or higher) means that more weight is given to recent values and the forecast reacts more quickly to changes in the series. In this sense α is similar to the span in a moving average: low values of α are analogous to a longer span. You are required to choose alpha when forecasting with exponential smoothing. Excel uses a default value of .3.

In Excel, exponential smoothing is accomplished fairly easily using the following commands.

    Tools

    Data Analysis

    Exponential Smoothing

It is wise to insert a blank column in the worksheet where you want the results to go and to label that column in advance. Within the Exponential Smoothing menu, the options are relatively simple.


Figure 5. Example of Exponential Smoothing of the Housing Start Data (U.S. monthly housing starts, exponential smoothing with alpha = .3, 1990 to 2003, in thousands of starts).

Input Range: The source of the original time series.

Damping Factor: The level of (1 - alpha). The default is .3.

Labels: A check box indicating whether the first row contains a label.

Output Range: Where you want the new data to go (you only need to specify the first cell of the column). Excel will only put the results in the current worksheet. Please note, if you used a label for the original data, Excel will not create a label for the exponential smoothing output; therefore you should specify the first row of the data series, not the first label row, for the output.

Chart Output: Excel provides a scatter plot of the original data and the smoothed data.

Standard Errors: Excel will calculate standard errors of the estimates compared with the original data.

Figure 5 shows the exponentially smoothed data for housing starts plotted over time. The data still show an upward trend with seasonal fluctuations, but like the moving average example, the revised data have removed some of the noise in the original time series. Exponential smoothing is a somewhat more complicated approach and requires software to do it well. There are several models to choose from (not identified here), some of which can incorporate seasonal variability. Like moving averages, exponential smoothing may be a first step before further analysis with more sophisticated methods. Care must be taken in choosing the level of alpha for the model. Experience and an iterative approach usually guide most attempts at exponential smoothing.

STEPS TO MODELING AND FORECASTING TIME SERIES

Step 1: Determine Characteristics/Components of the Series

Some time series techniques require the elimination of all components (trend, seasonal, cyclical) except the random fluctuation in the data. Such techniques require modeling and forecasting with stationary time series. In contrast, other methods are only applicable to a time series with a trend component in addition to a random component. Hence, it is important to first identify the form of the time series in order to ascertain which components are present. All business data have a random component. Since the random component cannot be predicted, we need to remove it via averaging or data smoothing. The cyclical component usually requires the availability of long data sets with a minimum of two repetitions of the cycle. For example, a 10-year cycle requires at least 20 years of data. This data requirement often makes it unfeasible to account for the cyclical component in most business and industry forecasting analysis. Thus, business data are usually inspected for both trend and seasonal components.

How can we detect a trend component?

• Inspect the time series data plot
• Fit a trend line to the data with regression analysis and check the p-value for the time trend coefficient

How can we detect a seasonal component?

• Requires at least two years' worth of data at higher frequencies (monthly, quarterly)
• Inspect a folded annual time series data plot - each year superimposed on the others
• Check the Durbin-Watson regression diagnostic for serial correlation
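As a sketch of that last check: the Durbin-Watson statistic can be computed directly from trend-line residuals as d = Σ(et - et-1)² / Σ et². Values near 2 suggest little serial correlation, while values well below 2 suggest positive serial correlation, which is typical when a seasonal pattern remains in the residuals. The data below are synthetic, generated only to make the example runnable.

```python
# Durbin-Watson statistic computed from the residuals of a fitted trend line
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 49)
y = 50 + 0.8 * t + 12 * np.sin((t - 1) * np.pi / 6) + rng.normal(0, 2, t.size)

slope, intercept = np.polyfit(t, y, 1)                    # straight trend line
resid = y - (intercept + slope * t)

dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)     # Durbin-Watson statistic
print(round(dw, 2))                                       # well below 2 here: leftover seasonality
```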

    Step 2: Select Potential Forecasting Techniques

For business and financial time series, only trend and random components need to be considered. Figure 6 summarizes the potential choices of forecasting techniques for alternative forms of time series. For example, for a stationary time series (only a random component exists), the appropriate approaches are stationary forecasting methods such as moving averages, weighted moving averages, and exponential smoothing.


Figure 6. Potential Choices of Forecasting Techniques. The figure is a decision tree: starting from the time series data, first ask whether a trend component exists, and then whether a seasonal component exists. With neither a trend nor a seasonal component, use stationary forecasting methods (naive, moving average, weighted moving average, exponential smoothing). When a seasonal component exists, use seasonal forecasting methods (seasonal multiple regression, seasonal autoregression, time series decomposition). With a trend but no seasonal component, use trend forecasting methods (linear trend projection, non-linear trend projection, trend autoregression).

These methods usually produce less accurate forecasts if the time series is non-stationary. Time series methods that account for trend or seasonal components are best for non-stationary business and financial data. These methods include: seasonal multiple regression, trend and seasonal autoregression, and time series decomposition.


    Step 3: Evaluate Forecasts From Potential Techniques

After deciding which alternative methods are suitable for the available data, the next step is to evaluate how well each method performs in forecasting the time series. Measures such as R2 and the sign and magnitude of the regression coefficients will help provide a general assessment of our models. However, for forecasting, an examination of the error terms from the model is usually the best strategy for assessing performance.

First, each method is used to forecast the data series. Second, the forecast from each method is evaluated to see how well it fits relative to the actual historical data. Forecast fit is based on taking the difference between each individual forecast and the actual value. This exercise produces the forecast errors. Instead of examining individual forecast errors, it is preferable and much easier to evaluate a single measurement of overall forecast error for the entire data set under analysis. The error (et) of an individual forecast, the difference between the actual value and the forecast of that value, is given as:

et = Yt - Ft

where:

et = the error of the forecast
Yt = the actual value
Ft = the forecast value

There are several alternative methods for computing overall forecast error. Examples of forecast error measures include: mean absolute deviation (MAD), mean error (ME), mean square error (MSE), root mean square error (RMSE), mean percentage error (MPE), and mean absolute percentage error (MAPE). The best forecast model is the one with the smallest overall error measurement value. The choice of which error criteria are appropriate depends on the forecaster's business goals, knowledge of the data, and personal preferences. The next section presents the formulas and a brief description of five alternative overall measures of forecast error.


1) Mean Error (ME)

A quick way of computing forecast errors is the mean error (ME), which is a simple average of all the forecast errors for a time series. This involves summing all the individual forecast errors and dividing by the number of forecasts. The formula for calculating the mean error is:

ME = (Σ et) / N

An issue with this measure is that if forecasts are both over (positive errors) and below (negative errors) the actual values, ME will include some cancellation effects that may potentially misrepresent the actual magnitude of the forecast error.

2) Mean Absolute Deviation (MAD)

The mean absolute deviation (MAD) is the mean or average of the absolute values of the errors. The formula for calculating the mean absolute deviation is:

MAD = (Σ |et|) / N

Relative to the mean error (ME), the mean absolute deviation (MAD) is commonly used because, by taking the absolute values of the errors, it avoids the canceling effects of the positive and negative values. N denotes the number of forecasts.

3) Mean Square Error (MSE)

Another popular way of computing forecast errors is the mean square error (MSE), which is computed by squaring each error and then taking a simple average of all the squared forecast errors. This involves summing all the individual squared forecast errors and dividing by the number of forecasts. The formula for calculating the mean square error is:

MSE = (Σ et²) / N

The MSE is preferred by some because it also avoids the problem of the canceling effects of positive and negative values of forecast errors.

4) Mean Percentage Error (MPE)

Instead of evaluating errors in terms of absolute values, we sometimes compute forecast errors as a percentage of the actual values. The mean percentage error (MPE) is the average of the ratios of each error to the actual value being forecast, multiplied by 100. The formula for calculating the mean percentage error is:

MPE = (100 / N) Σ (et / Yt)

5) Mean Absolute Percentage Error (MAPE)

Similar to the mean percentage error (MPE), the mean absolute percentage error (MAPE) is the average of the absolute values of the percentage forecast errors. The formula for calculating the mean absolute percentage error is:

MAPE = (100 / N) Σ |et / Yt|

The MAPE is another measure that also circumvents the problem of the canceling effects of positive and negative values of forecast errors.
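All five measures are easy to compute once the forecast errors are in hand. The sketch below uses made-up actual and forecast values purely for illustration; it simply codes the formulas given above.

```python
# ME, MAD, MSE, MPE, and MAPE from actual values (Y) and forecasts (F)
def forecast_error_measures(actual, forecast):
    errors = [y - f for y, f in zip(actual, forecast)]   # et = Yt - Ft
    n = len(errors)
    me   = sum(errors) / n
    mad  = sum(abs(e) for e in errors) / n
    mse  = sum(e ** 2 for e in errors) / n
    mpe  = 100 / n * sum(e / y for e, y in zip(errors, actual))
    mape = 100 / n * sum(abs(e / y) for e, y in zip(errors, actual))
    return {"ME": me, "MAD": mad, "MSE": mse, "MPE": mpe, "MAPE": mape}

actual   = [102, 110, 121, 118, 130]     # illustrative values only
forecast = [100, 112, 118, 120, 126]
print({k: round(v, 2) for k, v in forecast_error_measures(actual, forecast).items()})
```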


Figure 7. U.S. Monthly Retail Sales, 1955 to 1996 (monthly trend, in millions of dollars).

    TWO EXAMPLES OF FORECASTING TECHNIQUES

    Forecasting Time Series with Trend

The first example will focus on modeling data with a trend. For this example we will look at monthly U.S. retail sales data from January 1955 to January 1996. The data are given in millions of dollars and are not seasonally adjusted or adjusted for inflation. We know there is a curvilinear relationship in these data, so we will be able to see how much better we can do in our estimates by fitting a second order polynomial to the data.

    Our strategy will be the following:

    1. Examine the scatter plot of the data

2. Decide on two alternative models, one linear and the other nonlinear

3. Split the sample into two parts. The first part will be designated as the estimation sample. It contains most of the data and will be used to estimate the two models (1955:1 to 1993:12). The second part of the data is called the validation sample and will be used to assess the ability of the models to forecast into the future.

4. After we determine which model is best, we will re-estimate the preferred model using all the data. This model will be used to make future forecasts.

The plot of the data shows an upward trend, but the trend appears to be increasing at an increasing rate (see Figure 7). A second order polynomial could provide a better fit to these data and will be used as an alternative model to the simple linear trend.


    Table 5. Regression of Monthly Retail Sales on Trend

Regression Statistics
Multiple R           0.942
R Square             0.888
Adjusted R Square    0.888
Standard Error       15853.125
Observations         468

ANOVA
              df     SS                 MS                 F         Sig F
Regression    1      929960937041.49    929960937041.49    3700.28   0.000
Residual      466    117115854291.71    251321575.73
Total         467    1047076791333.20

              Coef         Std Error   t Stat    P-value
Intercept     -15826.216   1467.974    -10.781   0.000
Trend         329.955      5.424       60.830    0.000

Linear Trend Regression. The first model is a linear regression of retail sales from January 1955 to December 1993. The Excel regression output is given in Table 5. The R2 for the model is fairly high, .89. The coefficient for trend is positive and significantly different from zero. The estimated regression equation is:

Yt = -15,826.216 + 329.955(Trend)

I used the residual option in Excel to calculate columns of predicted values and residuals for the data. From these I was able to calculate the Mean Absolute Deviation (MAD), Mean Percentage Error (MPE), and Mean Absolute Percentage Error (MAPE). The average values for each sample are given below.

                              MAD        MPE     MAPE
Average, Estimation Sample    13920.40   6.50    41.34
Average, Validation Sample    37556.53   20.71   20.71

These figures are not easy to interpret on their own, but they will make more sense once we compare them to the second model. However, if you look at the residuals you will notice that there are long strings of consecutive positive residuals followed by strings of negative residuals (data not shown). This pattern repeats itself several times. The pattern reflects that the relationship is nonlinear and the model systematically misses the curve of the data.


Table 6. Polynomial Regression of U.S. Monthly Sales on Trend and Trend Squared

Regression Statistics
Multiple R           0.998
R Square             0.997
Adjusted R Square    0.997
Standard Error       2725.375
Observations         468

ANOVA
              df     SS                  MS                 F          Sig F
Regression    2      1043622925925.31    521811462962.65    70252.40   0.000
Residual      465    3453865407.89       7427667.54
Total         467    1047076791333.20

              Coef        Std Error   t Stat     P-value
Intercept     19245.229   379.562     50.704     0.000
Trend         -117.765    3.738       -31.509    0.000
TrendSq       0.955       0.008       123.703    0.000

Polynomial Trend Regression. The alternative model is a polynomial or quadratic model of the form:

Yt = b0 + b1(Trend) + b2(Trend)²

This model is linear in the parameters, but will fit a curve to the data. The form of the curve depends upon the signs of the coefficients b1 and b2. If b2 is negative, the curve will show an increasing function at a decreasing rate, eventually turning down. If it is positive, the curve will increase at an increasing rate. The regression output of the polynomial equation is given in Table 6. The R2 for this model is much higher, .997, which indicates that adding the squared trend term to the model improved the fit. The coefficient for Trend² is positive and significant (p < .001). Once again I used the residual option in Excel to calculate MAD, MPE, and MAPE for the estimation sample and the validation sample. The following table contains the results of this analysis. Each of the summary error measures is smaller for the polynomial regression, indicating the second model fits the data better and will provide better forecasts.

                              MAD       MPE     MAPE
Average, Estimation Sample    2307.65   0.06    6.07
Average, Validation Sample    3303.41   -1.77   1.88
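For readers who want to reproduce this kind of estimation/validation comparison outside Excel, here is a hedged sketch. The retail sales series is simulated rather than the module's actual data, the cut-off point is illustrative, and only MAD on the hold-out sample is reported for brevity.

```python
# Linear vs. quadratic trend, fit on an estimation sample and judged on a validation sample
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 469)                                   # 468 months, as in the example
sales = 19000 - 118 * t + 0.95 * t ** 2 + rng.normal(0, 2500, t.size)  # curved, illustrative

cut = 400                                               # estimation sample: first 400 months
t_est, y_est = t[:cut], sales[:cut]
t_val, y_val = t[cut:], sales[cut:]

for degree, label in [(1, "linear trend"), (2, "quadratic trend")]:
    coeffs = np.polyfit(t_est, y_est, degree)           # fit on the estimation sample only
    fitted = np.polyval(coeffs, t_val)                  # forecast the validation months
    mad = np.mean(np.abs(y_val - fitted))
    print(f"{label}: validation MAD = {mad:,.0f}")
```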


    Forecasting Time Series with Seasonality

This second example focuses on how to estimate and forecast time series that exhibit seasonality. A relatively straightforward approach to modeling and forecasting seasonality is through the use of dummy variables in multiple regression to represent the seasons. Dummy or indicator variables were introduced earlier in the modules on simple and multiple regression. Dummy variables are used to represent qualitative variables in a regression. Recall that for any k categories, only k-1 dummy variables are needed to represent them in a regression. For example, we can represent a quarterly time series with three dummy variables and a monthly series with eleven dummy variables. The excluded quarter or month is known as the reference category. For example, the complete model for a monthly time series can be specified as follows:

Yt = b0 + b1*Time + b2*M1 + b3*M2 + b4*M3 + b5*M4 + b6*M5 + b7*M6 + b8*M7 + b9*M8 + b10*M9 + b11*M10 + b12*M11

where:

b0 is the intercept
b1 is the coefficient for the time trend component
b2, b3, ..., b12 are the coefficients that indicate how much each month differs from the reference month, month 12 (December)
M1, M2, ..., M11 are the dummy variables for the first 11 months (= 1 if the observation is from the specified month, otherwise = 0)
Yt is the monthly number of U.S. housing starts, in 1,000s

Using the U.S. housing starts data used earlier, we illustrate how to produce out-of-sample forecasts for data with a monthly seasonal component. First, we analyze the monthly time series plot of the data and identify a seasonal component in the time series. Then we create 11 monthly seasonal dummy variables. Although we will use December as the reference category, any of the months can be chosen as the reference month. Next, we divide the data set into two sub-samples. The first sub-sample will be designated as the "estimation sample" while the second sub-sample represents the "validation sample." Then, we estimate the seasonal dummy variable regression model with the estimation sample (1990:1 - 2002:12) and hold out some data as the validation sample (2003:1 - 2003:12) to validate the accuracy of the regression forecasting model.


Figure 8. Simple Linear Regression of U.S. Monthly Housing Starts on Trend, 1990 to 2003 (fitted line: y = 0.0119x - 298.14, R² = 0.4482; in thousands of starts).

The model without the seasonal dummies is shown in Figure 8. The R2 for this model is low, only .45.

A full model is then estimated for the data from January 1990 to December 2002 (the estimation sample). The Excel output is in Table 7. The full model dramatically improves R2 to .864. We can assume that most of the remaining variation is due to the cyclical component of the time series. However, this model is only designed to capture the seasonal (and trend) components. Note that all the seasonal dummies (except the JAN and FEB dummies) are statistically significant, as shown by their very low p-values. This implies that the seasonal regression model is a good model for forecasting this time series. The seasonal effects are relatively low in the winter months, but rise quickly in the spring when most home construction gets started. The seasonal effects seem to have peaked by the month of June. Including the dummy variables for month has improved the fit of the model.

The estimated regression coefficients are then used to generate the error measures for the estimation sample and the validation sample, as shown below. The figures for MAD, MPE, and MAPE are all reasonable for this model, for both the estimation sample and the validation sample. Given the partial nature of the model, which only accounts for seasonal and trend effects, the measures of forecast error look reasonable. In order to obtain a more reliable forecast with lower errors, we would need to account for the cyclical factors in the macroeconomic data for housing starts. Overall, the model fits the data well.

                              MAD     MPE     MAPE
Average, Estimation Sample    7.18    -0.81   6.56
Average, Validation Sample    10.16   3.36    6.61


    Table 7. Regression of Housing Starts on Trend and Season

Regression Statistics
Multiple R           0.934
R Square             0.872
Adjusted R Square    0.862
Standard Error       9.725
Observations         168

ANOVA
              df     SS          MS        F       Sig F
Regression    12     100092.72   8341.06   88.19   0.000
Residual      155    14659.20    94.58
Total         167    114751.92

              Coef     Std Error   t Stat    P-value
Intercept     67.622   2.950       22.921    0.000
Trend         0.358    0.016       23.056    0.000
M1            -5.588   3.680       -1.519    0.131
M2            -1.631   3.679       -0.443    0.658
M3            24.540   3.678       6.671     0.000
M4            37.182   3.678       10.110    0.000
M5            41.860   3.677       11.383    0.000
M6            41.696   3.677       11.340    0.000
M7            37.345   3.677       10.158    0.000
M8            35.166   3.676       9.566     0.000
M9            29.494   3.676       8.023     0.000
M10           33.465   3.676       9.104     0.000
M11           12.308   3.676       3.348     0.001

    CONCLUSION

This module introduced various methods available for developing forecasts from financial data. We defined some forecasting terms and also discussed some important issues for analyzing and forecasting data over time. In addition, we examined alternative ways to evaluate and judge the accuracy of forecasts and discussed some of the common challenges to developing good forecasts. Although mastering the techniques is very important, equally important is the forecaster's knowledge of the business problems and good familiarity with the data and its limitations. Finally, the quality of forecasts from any time series model is highly dependent on the quality and quantity of data (information) available when forecasts are made. This is another way of saying "Garbage in, garbage out."
