Forecasting - Georgia State University, MGS3100 (Module 4, PPT transcript)

  • Module 4. Forecasting

    MGS3100

  • Forecasting

  • Quantitative Forecasting

    Causal models predict the variable of interest from other variables, e.g.

    Year 2000 Sales = f(Price, Population, Advertising)

    Time series models predict the variable of interest from its own past values, e.g.

    Year 2000 Sales = f(Sales 1999, Sales 1998, Sales 1997)

    Forecasting based on data and models

  • Causal forecasting

    Regression

    Find a straight line that fits the data best.

    y = Intercept + slope * x (= b0 + b1x)

    Slope = change in y / change in x

    Best line!

    Intercept

    [Chart 3: scatter plot of Shoe Size (Y) vs. Age, raw data]

    Raw Data

    Example of Simple Regression - Does Shoe Size among teenagers depend on Age?

    (Can you predict the shoe size if you know the age?)

    Age   Shoe Size
    11    5
    12    6
    12    5
    13    7.5
    13    6
    13    8.5
    14    8
    15    10
    15    7
    17    8
    18    11
    18    8
    19    11


    Simple


    Age (X)   Shoe Size (Y)   Deviation from the Mean   Squared Deviation
    11        5               -2.7692307692              7.6686390533
    12        6               -1.7692307692              3.1301775148
    12        5               -2.7692307692              7.6686390533
    13        7.5             -0.2692307692              0.0724852071
    13        6               -1.7692307692              3.1301775148
    13        8.5              0.7307692308              0.5340236686
    14        8                0.2307692308              0.0532544379
    15        10               2.2307692308              4.9763313609
    15        7               -0.7692307692              0.5917159763
    17        8                0.2307692308              0.0532544379
    18        11               3.2307692308             10.4378698225
    18        8                0.2307692308              0.0532544379
    19        11               3.2307692308             10.4378698225

    Mean Shoe Size = 7.7692307692     Sum of Squared Deviations = 48.8076923077

    (The Sum of Squared Deviations appears in the ANOVA table as the SS Total.)

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.798497882
    R Square             0.6375988676    (R-Squared = SSR/SST = 31.119/48.807, from the ANOVA table below)
    Adjusted R Square    0.6046533101
    Standard Error       1.2680680711    (the Standard Error is the square root of the Mean Squared Error)
    Observations         13              (n, the number of observations)

    ANOVA   (the Mean Squares (MS) are computed by dividing each SS (Sum of Squares) by its degrees of freedom)

                        df    SS               MS               F                Significance F
    Regression           1    31.1197293447    31.1197293447    19.3531060365    0.0010645821
    Residual (Error)    11    17.687962963      1.607996633
    Total               12    48.8076923077

    The Significance F shows that the overall model is significant: there is only a 0.1% chance
    that the relationship is non-existent and that we falsely believe the model.
    k is the number of independent (predictor) variables, in this case just 1 (Age).

                 Coefficients     Standard Error   t Stat           P-value         Lower 95%        Upper 95%
    Intercept    -1.1759259259    2.0635438706     -0.5698574877    0.5802261973    -5.7177576593    3.3659058075
    Age           0.612037037     0.1391240994      4.3992165253    0.0010645821     0.305826804     0.91824727

    These coefficients are computed using a formula that guarantees that this is the best-fitting line.
    You do not have to know the formulas; they are available in every basic statistics book.

    RESIDUAL OUTPUT

    Observation   Predicted Shoe Size   Residuals        Squared Residuals
    1              5.55648148           -0.55648148      0.3096716392
    2              6.16851852           -0.16851852      0.0283984911
    3              6.16851852           -1.16851852      1.3654355281
    4              6.78055556            0.71944444      0.5176003086
    5              6.78055556           -0.78055556      0.6092669753
    6              6.78055556            1.71944444      2.9564891975
    7              7.39259259            0.60740741      0.3689437586
    8              8.00462963            1.99537037      3.981502915
    9              8.00462963           -1.00462963      1.0092806927
    10             9.22870370           -1.22870370      1.5097127915
    11             9.84074074            1.15925926      1.3438820302
    12             9.84074074           -1.84074074      3.3883264746
    13            10.45277778            0.54722222      0.2994521605

    Sum of Squared Residuals (Errors) = 17.687962963, shown in the ANOVA table.

    How and Why are the Sum of Squares shown in the ANOVA table calculated?

    The basic idea behind regression is to see if there is a relationship between X and Y, and if so, how well does X help predict Y.

    If there was no info on X (Age), but all we had was a sample of shoe sizes, our best estimate of shoe sizes would be the mean

    shoe size of about 7.7, shown at the top. However, this estimate would have a lot of error, since actual sizes deviate quite a bit from

    the mean. These deviations are computed and squared (to avoid + and - cancelling each other) and summed, to get a SST (Sum

    of Squares Total) value of 48.807.

    Now, when we consider the info provided by Age, we can estimate shoe size better than by simply using the mean size. We can now say

    that shoe size depends on Age according to the equation Y = 0.612 X - 1.1759. Our new estimates are better, but they are still

    not perfect. There are still errors (residuals), shown in the Excel output at the bottom. If we square each of the errors and add them,

    we get the SSE (Sum of Squared Errors) value of 17.687.

    This means that by using Age to do the regression, we reduced our error squares by 48.807-17.687, or by a value of 31.119,

    shown in the ANOVA table as SSR (Sum of Squares Regression). SSR is thus the reduction in SST brought about by the regression.

    In other words, the regression helped to explain away 31.119 out of the total of 48.807 of error. Thus, the proportion of variability in

    Y that is explained by the regression is 31.119/48.807 = 0.6376, which is the R-Squared value shown at the top.

    What are Degrees of Freedom?

    Once the SS are computed, the Mean Squares are computed by dividing by the degrees of freedom. Normally, a mean is simply

    the sum of n numbers divided by n. Here, however, when we find the mean, we must compensate for the fact that we are averaging

    errors, and even though there are n numbers, not all of them contribute to the error.

    For example, if there is only 1 data point, there is no chance (freedom) for any variation at all to occur. Hence, total degrees of

    freedom are always n-1. Thus, if there are 2 data points, there is one degree of freedom for variation to occur.

    Next, suppose there are 2 points of data. Even though they could be different values of Y, the process of using a variable X to do

    a regression means that we draw the best line through them. Now no matter what the points are, we can always draw a straight

    line perfectly through those points. Thus, there is no freedom for error to occur, since the variable X "used up" the single degree of

    freedom that Y had. In general, the number of independent variables used (K) is the number of degrees of freedom that are used up

    from the total available (n-1), leaving n-k-1 degrees available for error to occur. Thus the SS Error is divided by n-k-1 to find the mean

    squared error, instead of dividing by n.

    What is F-value? When is a model significant?

    F-value is the ratio of MSR/MSE = 31.119/1.6079. This shows the ratio of the average error that is explained by the regression to the average

    error that is still unexplained. Thus, the higher the F, the better the model, and the more confidence we have that the model that we

    derived from sample data actually applies to the whole population, and is not just an aberration found in the sample.

    In this case, the level of confidence is around 99.9%, reflected in the significance value of 0.00106 shown in the ANOVA table.

    That value was computed by looking at standardized tables that consider the F-value and your sample size to make that determination.
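The arithmetic above can be checked with a short pure-Python sketch (not part of the original slides); it recomputes the slope, intercept, SST, SSE, SSR, and R-Squared for the shoe-size data:

```python
# Shoe-size example: recompute the regression quantities from the slides.
ages  = [11, 12, 12, 13, 13, 13, 14, 15, 15, 17, 18, 18, 19]
sizes = [5, 6, 5, 7.5, 6, 8.5, 8, 10, 7, 8, 11, 8, 11]
n = len(ages)

mean_x = sum(ages) / n
mean_y = sum(sizes) / n                      # mean shoe size, about 7.769

# Least-squares slope and intercept (the same line Excel's SLOPE/INTERCEPT return).
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, sizes)) \
    / sum((x - mean_x) ** 2 for x in ages)   # about 0.612
a = mean_y - b * mean_x                      # about -1.176

sst = sum((y - mean_y) ** 2 for y in sizes)                      # about 48.808
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, sizes))   # about 17.688
ssr = sst - sse                                                  # about 31.120
r_squared = ssr / sst                                            # about 0.638

print(round(b, 4), round(a, 4), round(sst, 3), round(sse, 3), round(r_squared, 4))
```

The values match the SUMMARY OUTPUT above: SSR is exactly SST minus SSE, and R-Squared is their ratio.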

    [Chart: "Shoe Sizes of Teens" - Shoe Size (Y) vs. Age in Years]

    Multiple-initial

    Example of Multiple Regression: Can Shoe Size (Y) be predicted by

    the independent variables X1 through X4?

    Y           X1    X2       X3    X4
    Shoe Size   Age   Weight   Sex   IQ Score
    5           11     75      0     100
    6           12     85      1      80
    5           12     88      0      50
    7.5         13    135      1     120
    6           13     80      0     115
    8.5         13    180      0     106
    8           14    140      0      96
    10          15    200      0      88
    7           15    110      0      78
    8           17    120      0      65
    11          18    150      1     101
    8           18    125      0     105
    11          19    165      1     130

    Sex: Female = 0, Male = 1

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.9805339782
    R Square             0.9614468824
    Adjusted R Square    0.9421703236
    Standard Error       0.484985657
    Observations         13

    ANOVA
                  df    SS               MS               F                Significance F
    Regression     4    46.926003608     11.731500902     49.8764791596    0.0000107054
    Residual       8     1.8816886997     0.2352110875
    Total         12    48.8076923077

                 Coefficients    Standard Error   t Stat          P-value         Lower 95%        Upper 95%
    Intercept    -1.774342506    0.9297924371     -1.908321078    0.092770928     -3.9184490975    0.3697640855
    Age           0.3462606983   0.0623264498      5.5555979725   0.0005374345     0.2025355543    0.4899858422
    Weight        0.0303803487   0.0041802999      7.2675046288   0.0000865698     0.0207405537    0.0400201438
    Sex           0.7918907191   0.3226962124      2.4539820691   0.0396894828     0.0477514376    1.5360300006
    IQ Score      0.0039632419   0.0071471377      0.5545215565   0.5943806023    -0.0125180979    0.0204445817

    IQ is not significantly related to Shoe Size.

    Multiple-revised

    Multiple Regression with the insignificant variable (IQ) dropped

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.9797780489
    R Square             0.9599650251
    Adjusted R Square    0.9466200335
    Standard Error       0.4659535903
    Observations         13

    ANOVA
                  df    SS               MS               F                Significance F
    Regression     3    46.853677573     15.6178925243    71.9344794201    0.0000013077
    Residual       9     1.9540147347     0.2171127483
    Total         12    48.8076923077

                 Coefficients    Standard Error   t Stat          P-value         Lower 95%
    Intercept    -1.5104493972   0.767427851      -1.9681972648   0.0805754213    -3.2464931304
    Age           0.3474509911   0.0598450788      5.805840647    0.0002575833     0.2120719142
    Weight       0.0309662949   0.0038858282      7.9690334347   0.0000228311     0.0221759341
    Sex           0.8582170904   0.2879489222      2.980449046    0.0154383269     0.2068308771

    RESIDUAL OUTPUT

    Observation   Predicted Shoe Size   Residuals
    1              4.6339836235          0.3660163765
    2              6.1493146541         -0.1493146541
    3              5.3839964485         -0.3839964485
    4              8.0450803909         -0.5450803909
    5              5.4837170803          0.5162829197
    6              8.5803465716         -0.0803465716
    7              7.6891457662          0.3108542338
    8              9.8945744521          0.1054255479
    9              7.1076079099         -0.1076079099
    10             8.1121728412         -0.1121728412
    11            10.2468297701          0.7531702299
    12             8.6144553069         -0.6144553069
    13            11.0587751849         -0.0587751849
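As a sanity check on the revised model, the least-squares coefficients can be reproduced in pure Python by solving the normal equations (X'X) b = X'y. This is a sketch of the least-squares method, not how Excel's Regression tool is implemented internally:

```python
# Revised multiple regression (IQ dropped): solve the normal equations (X'X) b = X'y.
shoe   = [5, 6, 5, 7.5, 6, 8.5, 8, 10, 7, 8, 11, 8, 11]
age    = [11, 12, 12, 13, 13, 13, 14, 15, 15, 17, 18, 18, 19]
weight = [75, 85, 88, 135, 80, 180, 140, 200, 110, 120, 150, 125, 165]
sex    = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]

# Design matrix with an intercept column of ones.
X = [[1, a, w, s] for a, w, s in zip(age, weight, sex)]
y = shoe
p = len(X[0])

XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [M[r][k] - f * M[col][k] for k in range(n + 1)]
    return [M[i][n] / M[i][i] for i in range(n)]

coef = solve(XtX, Xty)              # [intercept, b_age, b_weight, b_sex]
print([round(c, 4) for c in coef])  # close to [-1.5104, 0.3475, 0.031, 0.8582]
```

The solution matches the Coefficients column of the revised SUMMARY OUTPUT above.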

  • Causal Forecasting Models

    Curve Fitting: Simple Linear Regression

    One Independent Variable (X) is used to predict one Dependent Variable (Y): Y = a + b X

    Given n observations (Xi, Yi), we can fit a line to the overall pattern of these data points. The Least Squares Method in statistics gives us the best a and b in the sense of minimizing Σ(Yi - a - bXi)².

    The regression formula is an optional learning objective.

  • Curve Fitting: Simple Linear Regression

    Find the regression line with Excel. Use the functions:

    a = INTERCEPT(Y range, X range)

    b = SLOPE(Y range, X range)

    Or use Solver, or Excel's Tools | Data Analysis | Regression.

    Curve Fitting: Multiple Regression

    Two or more independent variables are used to predict the dependent variable:

    Y = b0 + b1X1 + b2X2 + … + bpXp

    Use Excel's Tools | Data Analysis | Regression.

  • Time Series Forecasting Process

    Look at the data (Scatter Plot)

    Forecast using one or more techniques

    Evaluate the technique and pick the best one.

    Observations from the scatter plot, techniques to try, and ways to evaluate:

    - Data is reasonably stationary (no trend or seasonality): Heuristics - averaging methods (Naive, Moving Averages, Simple Exponential Smoothing). Evaluate with MAD, MAPE, Standard Error, BIAS.

    - Data shows a consistent trend: Regression - linear, or non-linear regressions (not covered in this course). Evaluate with MAD, MAPE, Standard Error, BIAS, R-Squared.

    - Data shows both a trend and a seasonal pattern: Classical decomposition - find the Seasonal Index, then use regression analysis to find the trend component. Evaluate with MAD, MAPE, Standard Error, BIAS, R-Squared.

  • Evaluation of Forecasting Model

    BIAS - The arithmetic mean of the errors

    n is the number of forecast errors

    Excel: =AVERAGE(error range)

    Mean Absolute Deviation - MAD

    No direct Excel function to calculate MAD

  • Evaluation of Forecasting Model

    Mean Square Error - MSE

    Excel: =SUMSQ(error range)/COUNT(error range)

    The Standard Error is the square root of MSE.

    Mean Absolute Percentage Error - MAPE

    R² applies only to curve-fitting models such as regression.

    In general, the lower the error measure (BIAS, MAD, MSE) or the higher the R², the better the forecasting model.
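The four error measures can be sketched in a few lines of Python (the demand and forecast numbers below are made up purely for illustration):

```python
# Forecast-evaluation measures from the slides, for one set of errors.
actual   = [100, 110, 120, 115, 130]
forecast = [105, 108, 118, 120, 125]
errors   = [a - f for a, f in zip(actual, forecast)]
n = len(errors)

bias = sum(errors) / n                         # Excel: =AVERAGE(error range)
mad  = sum(abs(e) for e in errors) / n         # no direct Excel function
mse  = sum(e ** 2 for e in errors) / n         # Excel: =SUMSQ(errors)/COUNT(errors)
std_error = mse ** 0.5                         # square root of MSE
mape = sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n * 100

print(bias, mad, mse, round(std_error, 3), round(mape, 2))
```

Note that BIAS can be near zero even when MAD is large, because positive and negative errors cancel; that is why the measures are used together.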

  • Stationary data forecasting

    Naïve

    I sold 10 units yesterday, so I think I will sell 10 units today.

    n-period moving average

    For the past n days, I sold 12 units on average. Therefore, I think I will sell 12 units today.

    Exponential smoothing

    At the beginning of yesterday, I predicted I would sell 10 units; at the end of yesterday, I found out I in fact sold 8 units. So, I will adjust yesterday's forecast of 10 by adding the adjusted error (α * error). This compensates for yesterday's over- (or under-) forecast.

  • Naïve Model

    The simplest time series forecasting model.

    Idea: what happened last time (last year, last month, yesterday) will happen again this time.

    Naïve Model:

    Algebraic: Ft = Yt-1

    Yt-1 : actual value in period t-1

    Ft : forecast for period t

    Spreadsheet: B3: = A2; Copy down

  • Moving Average Model

    Simple n-Period Moving Average

    Issues of the MA Model:

    The naïve model is a special case of MA with n = 1.

    The idea is to reduce random variation, i.e., to smooth the data.

    All previous n observations are treated equally (equal weights).

    Suitable for relatively stable time series with no trend or seasonal pattern.
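An n-period moving-average forecast is a one-liner; the naïve model falls out as the n = 1 case (demand numbers below are illustrative only):

```python
def moving_average_forecast(history, n):
    """Forecast the next period as the mean of the last n actual values."""
    if len(history) < n:
        raise ValueError("need at least n observations")
    return sum(history[-n:]) / n

demand = [130, 110, 90, 100, 105]
print(moving_average_forecast(demand, 1))  # naive model: last value, 105
print(moving_average_forecast(demand, 3))  # 3-period MA: (90 + 100 + 105) / 3
```

A larger n averages over more history, which smooths random variation but reacts to real changes more slowly, as the next slide notes.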

  • Smoothing Effect of MA Model

    Longer-period moving averages (larger n) react to actual changes more slowly

  • Moving Average Model

    Weighted n-Period Moving Average

    Typically the weights are decreasing: w1 > w2 > … > wn.

    The weights sum to 1: Σwi = 1.

    Flexible weights reflect the relative importance of each previous observation in forecasting.

    Optimal weights can be found via Solver.

  • Weighted MA: An Illustration

    Month       Weight   Data
    August      17%      130
    September   33%      110
    October     50%      90

    November forecast:

    FNov = (0.50)(90)+(0.33)(110)+(0.17)(130)

    = 103.4
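The November calculation above can be reproduced directly; the sketch lists values and weights oldest-first, matching the 17/33/50 split in the table:

```python
def weighted_ma(values, weights):
    """Weighted moving average; values and weights are listed oldest-first."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * v for w, v in zip(weights, values))

data    = [130, 110, 90]      # August, September, October
weights = [0.17, 0.33, 0.50]  # oldest to newest
print(weighted_ma(data, weights))  # 103.4, the November forecast
```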

  • Exponential Smoothing

    The concept is simple!

    Make a forecast - any forecast.

    Compare it to the actual.

    The next forecast is the previous forecast plus an adjustment; the adjustment is a fraction of the previous forecast error.

    Essentially, it is not really a forecast as a function of time; instead, it is a forecast as a function of the previous actual and forecasted values.

  • Simple Exponential Smoothing

    A special type of weighted moving average: it includes all past observations, and it uses a unique set of weights that weight recent observations much more heavily than very old observations.

  • Simple ES: The Model

    New forecast = weighted sum of last period's actual value and last period's forecast:

    Ft = α Yt-1 + (1 - α) Ft-1

    α: smoothing constant (0 ≤ α ≤ 1)    Ft: forecast for period t    Ft-1: last period's forecast    Yt-1: last period's actual value

  • Simple Exponential Smoothing

    Properties of Simple Exponential Smoothing:

    A widely used and successful model.

    Requires very little data.

    Larger α: more responsive forecast. Smaller α: smoother forecast (see Table 13.2).

    The best α can be found by Solver.

    Suitable for relatively stable time series.
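The smoothing recursion is two lines of code. The sketch below (made-up demand series, α values chosen only for contrast) shows how a larger α tracks the actuals more aggressively while a smaller α smooths them:

```python
def exp_smooth(actuals, alpha, first_forecast):
    """Simple exponential smoothing: F_t = alpha * Y_(t-1) + (1 - alpha) * F_(t-1).
    Returns the forecast for each period, including one period beyond the data."""
    forecasts = [first_forecast]
    for y in actuals:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

demand = [10, 8, 9, 12, 11]
print(exp_smooth(demand, 0.2, 10))  # smaller alpha: smoother
print(exp_smooth(demand, 0.8, 10))  # larger alpha: more responsive
```

In a spreadsheet the same search for the best α would be done with Solver, minimizing one of the error measures (MAD, MSE, etc.) over the α cell.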

  • Time Series Components

    Trend - persistent upward or downward pattern in a time series.

    Seasonal Variation - dependent on the time of year; each year shows the same pattern.

    Cyclical - up and down movement repeating over a long time frame; each year does not show the same pattern.

    Noise or random fluctuations - follow no specific pattern; short duration and non-repeating.

  • Time Series Components

    [Figure: Demand vs. Time panels showing trend with random movement, cycle, seasonal pattern, and trend with seasonal pattern]

  • Trend Model

    A curve-fitting method used for time series data (also called a time series regression model).

    Useful when the time series has a clear trend; cannot capture seasonal patterns.

    Linear Trend Model: Yt = a + bt, where t is the time index for each period, t = 1, 2, 3, …


  • Pattern-based forecasting - Trend

    Regression: recall the Independent Variable X, which is now a time variable, e.g., days, months, quarters, years, etc.

    Find a straight line that fits the data best.

    y = Intercept + slope * x (= b0 + b1x)

    Slope = change in y / change in x

    Best line!

    Intercept


  • Pattern-based forecasting Seasonal

    Once data turn out to be seasonal, deseasonalize the data. The methods we have learned (heuristic methods and regression) are not suitable for data with pronounced seasonal fluctuations.

    Make the forecast based on the deseasonalized data.

    Reseasonalize the forecast. A good forecast should mimic reality; therefore, the seasonality needs to be given back.

  • Pattern-based forecasting Seasonal

    Deseasonalize

    Forecast

    Reseasonalize

    Actual data

    Deseasonalized data

    Example (SI + Regression)

  • Pattern-based forecasting Seasonal

    Deseasonalization

    Deseasonalized data = Actual / SI

    Reseasonalization

    Reseasonalized forecast

    = deseasonalized forecast * SI

  • Seasonal Index

    What's an index? A ratio.

    SI = the ratio between actual and average demand.

    Suppose the SI for fall-quarter demand is 1.20. What does that mean? Demand in the fall runs 20% above the average quarter.

    Use it to forecast demand for next fall.

    So, where did the 1.20 come from?!

  • Calculating Seasonal Indices

    Quick and dirty method of calculating SI:

    For each year, calculate the average demand.

    Divide each demand by its yearly average. This creates a ratio, and hence a raw index.

    For each quarter, there will be as many raw indices as there are years.

    Average the raw indices for each of the quarters.

    The result will be four values, one SI per quarter.

  • Classical decomposition

    Start by calculating the seasonal indices.

    Then, deseasonalize the demand: divide the actual demand values by their SI values,

    y' = y / SI

    This results in transformed data (a new time series) with the seasonal effect removed.

    Forecast: use regression if the deseasonalized data is trending; use heuristic methods if the deseasonalized data is stationary.

    Reseasonalize with SI.
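The quick-and-dirty SI procedure plus the deseasonalize/reseasonalize steps can be sketched end to end; the two years of quarterly demand below are made up for illustration:

```python
# Classical decomposition sketch: seasonal indices, deseasonalize, reseasonalize.
years = [[120, 80, 100, 140],   # year 1: Q1..Q4 demand (hypothetical)
         [132, 88, 110, 154]]   # year 2 (hypothetical)

# 1. Raw index = demand / that year's average; SI = average of the raw indices per quarter.
raw = [[q / (sum(y) / 4) for q in y] for y in years]
si  = [sum(r[q] for r in raw) / len(raw) for q in range(4)]

# 2. Deseasonalize: actual / SI (seasonal effect removed).
deseason = [[y[q] / si[q] for q in range(4)] for y in years]

# 3. Forecast on the deseasonalized series (here a naive last-value stand-in for a
#    real trend or heuristic forecast), then reseasonalize by multiplying by SI.
next_q1_deseason = deseason[-1][0]
next_q1_forecast = next_q1_deseason * si[0]

print([round(s, 3) for s in si], round(next_q1_forecast, 1))
```

In practice step 3 would use regression (if the deseasonalized series trends) or an averaging method (if it is stationary), exactly as the slide prescribes.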

  • Causal or Time series?

    What are the differences?

    Which one should you use?

  • Can you

    describe the general forecasting process?

    compare and contrast trend, seasonality, and cyclicality?

    describe the forecasting method when data is stationary?

    describe the forecasting method when data shows trend?

    describe the forecasting method when data shows seasonality?

    Formulas

    BIAS = Σ(Actual - Forecast) / n = Σ Error / n

    MAD = Σ |Actual - Forecast| / n = Σ |Error| / n

    MSE = Σ (Actual - Forecast)² / n = Σ (Error)² / n

    MAPE = [ Σ ( |Actual - Forecast| / Actual ) / n ] * 100%

    Least-squares line:

    b = (Σ XiYi - Σ Xi Σ Yi / n) / (Σ Xi² - (Σ Xi)² / n)

    a = Σ Yi / n - b (Σ Xi / n)

    Simple n-period moving average (sum of actual values in previous n periods, divided by n):

    Ft = (Yt-1 + Yt-2 + … + Yt-n) / n

    Weighted n-period moving average:

    Ft = w1 Yt-1 + w2 Yt-2 + … + wn Yt-n

    Simple exponential smoothing (0 ≤ α ≤ 1):

    Ft = α Yt-1 + (1 - α) Ft-1 = α Yt-1 + α(1 - α) Yt-2 + α(1 - α)² Yt-3 + …