
FORECASTING WITH REGRESSION MODELS

TREND ANALYSIS

BUSINESS FORECASTING

Prof. Dr. Burç Ülengin

ITU MANAGEMENT ENGINEERING FACULTY

FALL 2011

OVERVIEW

• The bivariate regression model
• Data inspection
• Regression forecast process
• Forecasting with a simple linear trend
• Causal regression model
• Statistical evaluation of the regression model
• Examples...

The Bivariate Regression Model

The bivariate regression model is also known as the simple regression model.

It is a statistical tool that estimates the relationship between a dependent variable (Y) and a single independent variable (X).

The dependent variable is the variable we want to forecast.

The Bivariate Regression Model

General form: $Y = f(X)$

Y is the dependent variable; X is the independent variable.

Specific form, the linear regression model:

$Y = \beta_0 + \beta_1 X + \varepsilon$

where $\varepsilon$ is the random disturbance.

The Bivariate Regression Model

$Y = \beta_0 + \beta_1 X + \varepsilon$

• The regression model is in fact the equation of a line.
• $\beta_1$ is the slope coefficient that tells us the rate of change in Y per unit change in X.
• If $\beta_1 = 5$, a one-unit increase in X causes a 5-unit increase in Y.
• $\varepsilon$ is the random disturbance; because of it, Y can take different values for a given X.
• The objective is to estimate $\beta_0$ and $\beta_1$ in such a way that the fitted values are as close as possible to the observed values.

The Bivariate Regression Model: Geometrical Representation

[Figure: scatter of Y against X with two candidate regression lines, one a poor fit and one a good fit.]

The red line is closer to the data points than the blue one.

Best Fit Estimates

Population regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$  (population parameters $\beta_0$, $\beta_1$)

Sample regression model: $\hat{Y} = b_0 + b_1 X$  (sample estimates $b_0$, $b_1$)

Error term: $e = Y - \hat{Y}$

Ordinary Least Squares (OLS) estimation chooses $b_0$ and $b_1$ to minimize

$\min \sum e^2 = \sum (Y - \hat{Y})^2 = \sum (Y - b_0 - b_1 X)^2$

Best Fit Estimates-OLS

Ordinary Least Squares (OLS) estimate:

$\min \sum e^2 = \sum (Y - \hat{Y})^2 = \sum (Y - b_0 - b_1 X)^2$

$b_1 = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$

$b_0 = \bar{Y} - b_1 \bar{X}$
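As a quick illustration of these formulas, the sketch below computes $b_1$ and $b_0$ with plain NumPy on a small made-up sample; the numbers are hypothetical, not data from the course.

```python
import numpy as np

# Hypothetical sample: 6 observations of X and Y
X = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
Y = np.array([25.0, 30.0, 34.0, 41.0, 45.0, 55.0])

n = len(X)
x_bar, y_bar = X.mean(), Y.mean()

# b1 = (sum XY - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2)
b1 = (np.sum(X * Y) - n * x_bar * y_bar) / (np.sum(X**2) - n * x_bar**2)
# b0 = Ybar - b1*Xbar
b0 = y_bar - b1 * x_bar

Y_hat = b0 + b1 * X      # fitted values
e = Y - Y_hat            # residuals
sse = np.sum(e**2)       # the sum of squared errors that OLS minimizes

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSE = {sse:.4f}")
```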

Misleading Best Fits

[Figure: four scatter plots of Y against X with very different point patterns, each fitted line leaving the same $\sum e^2 = 100$, so identical "best fit" statistics can hide very different fits.]

THE CLASSICAL ASSUMPTIONS

1. The regression model is linear in the coefficients, correctly specified, and has an additive error term.
2. $E(\varepsilon) = 0$.
3. All explanatory variables are uncorrelated with the error term.
4. Errors corresponding to different observations are uncorrelated with each other.
5. The error term has a constant variance.
6. No explanatory variable is an exact linear function of any other explanatory variable(s).
7. The error term is normally distributed: $\varepsilon_i \sim \text{iid } N(0, \sigma^2)$.

Regression Forecasting Process

• Data consideration: plot each variable over time and draw scatter plots. Look at trend, seasonal fluctuation, and outliers.
• To forecast Y we need the forecasted value of X.
• Reserve a holdout period for evaluation and test the estimated equation in the holdout period.

$\hat{Y}_{T+1} = b_0 + b_1 X_{T+1}$

An Example: Retail Car Sales

The main explanatory variables:
• Income
• Price of a car
• Interest rates and credit usage
• General price level
• Population
• Car park (number of cars sold up to now) and replacement purchases
• Expectations about the future

For the simple, bivariate regression, income is chosen as the explanatory variable.

Bi-variate Regression Model

Population regression model: $RCS_t = \beta_0 + \beta_1 DPI_t + \varepsilon_t$

Our expectation is $\beta_1 > 0$.

We do not have all the available data at hand; the data set only covers the 1990s. We have to estimate the model over the sample period.

Sample regression model: $RCS_t = b_0 + b_1 DPI_t + e_t$

Retail Car Sales and Disposable Personal Income Figures

[Figure: quarterly retail car sales (RCS, thousands of cars) and disposable personal income (DPI, $), 1990–1998.]

OLS Estimate

Dependent Variable: RCS

Method: Least Squares

Sample: 1990:1 1998:4

Included observations: 36

Variable   Coefficient   Std. Error   t-Statistic   Prob.

C 541010.9 746347.9 0.724878 0.4735

DPI 62.39428 40.00793 1.559548 0.1281

R-squared 0.066759 Mean dependent var 1704222.

Adjusted R-squared 0.039311 S.D. dependent var 164399.9

S.E. of regression 161136.1 Akaike info criterion 26.87184

Sum squared resid 8.83E+11 Schwarz criterion 26.95981

Log likelihood -481.6931 F-statistic 2.432189

Durbin-Watson stat 1.596908 Prob(F-statistic) 0.128128

$\widehat{RCS} = b_0 + b_1 DPI$
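Output of this kind can be reproduced with any OLS routine. Below is a minimal sketch using Python's statsmodels on simulated stand-in data; the series are hypothetical, not the actual RCS and DPI figures.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical quarterly data standing in for the 1990:1-1998:4 sample
rng = np.random.default_rng(0)
dpi = np.linspace(17500, 20500, 36)                      # disposable personal income
rcs = 540000 + 62 * dpi + rng.normal(0, 150000, 36)      # retail car sales with noise

df = pd.DataFrame({"RCS": rcs, "DPI": dpi},
                  index=pd.period_range("1990Q1", periods=36, freq="Q"))

X = sm.add_constant(df["DPI"])          # adds the intercept column C
model = sm.OLS(df["RCS"], X).fit()      # least-squares estimation
print(model.summary())                  # coefficients, t-stats, R-squared, DW, etc.
```

The summary reports the same diagnostics shown in the table above: coefficients, standard errors, t-statistics, R-squared, the F-statistic, and the Durbin-Watson statistic.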

Basic Statistical Evaluation

$\beta_1$ is the slope coefficient that tells us the rate of change in Y per unit change in X.

When DPI increases by one dollar, the number of cars sold increases by about 62.

Hypothesis test for $\beta_1$:
H0: $\beta_1 = 0$   H1: $\beta_1 \neq 0$   The t test is used to test the validity of H0.
$t = b_1 / se(b_1)$
• If the t statistic > t table, reject H0; equivalently, if Pr < $\alpha$ (e.g. $\alpha$ = 0.05), reject H0.
• If the t statistic < t table, do not reject H0; equivalently, if Pr > $\alpha$, do not reject H0.
• t = 1.56 < t table, and Pr = 0.1281 > 0.05: do not reject H0.
• DPI has no statistically significant effect on RCS.

Basic Statistical Evaluation

$R^2$ is the coefficient of determination that tells us the fraction of the variation in Y explained by X, with $0 \le R^2 \le 1$.

• $R^2 = 0$ indicates no explanatory power of X (the equation).
• $R^2 = 1$ indicates perfect explanation of Y by X (the equation).
• $R^2 = 0.066$ indicates very weak explanatory power.

Hypothesis test for $R^2$:
H0: $R^2 = 0$   H1: $R^2 \neq 0$   The F test checks this hypothesis.
• If the F statistic > F table, reject H0; equivalently, if Pr < $\alpha$ (e.g. $\alpha$ = 0.05), reject H0.
• If the F statistic < F table, do not reject H0; equivalently, if Pr > $\alpha$, do not reject H0.
• F-statistic = 2.43 < F table, and Pr = 0.1281 > 0.05: do not reject H0.
• The estimated equation has no power to explain the RCS figures.

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values of the RCS–DPI regression, 1990–1998.]

The residuals show a clear seasonal pattern.

Model Improvement

When we look at the graphs of the series, RCS exhibits clear seasonal fluctuations, but DPI does not.

Remove the seasonality using a seasonal adjustment method.

Then use the seasonally adjusted RCS as the dependent variable.

Seasonal Adjustment

Sample: 1990:1 1998:4
Included observations: 36
Ratio to Moving Average
Original Series: RCS
Adjusted Series: RCSSA

Scaling Factors:
1  0.941503
2  1.119916
3  1.016419
4  0.933083
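A minimal sketch of a ratio-to-moving-average adjustment in pandas, assuming a quarterly PeriodIndex or DatetimeIndex; the exact smoothing and normalization details of EViews' routine may differ.

```python
import pandas as pd

def ratio_to_moving_average(y: pd.Series) -> pd.Series:
    """Multiplicative seasonal adjustment of a quarterly series (a sketch)."""
    # Centered 2x4 moving average of the series
    cma = y.rolling(4).mean().rolling(2).mean().shift(-2)
    ratios = y / cma                                     # ratio to moving average
    # Average the ratios by quarter to obtain seasonal scaling factors
    factors = ratios.groupby(y.index.quarter).mean()
    factors = factors / factors.mean()                   # normalize to average 1
    seasonal = pd.Series(y.index.quarter.map(factors).to_numpy(), index=y.index)
    return y / seasonal                                  # seasonally adjusted series

# Example (using the hypothetical df from the earlier sketch):
# df["RCSSA"] = ratio_to_moving_average(df["RCS"])
```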

Seasonally Adjusted RCS and RCS

[Figure: RCS and seasonally adjusted RCS (RCSSA), 1990–1998.]

OLS Estimate

Dependent Variable: RCSSA

Method: Least Squares

Sample: 1990:1 1998:4

Included observations: 36

Variable   Coefficient   Std. Error   t-Statistic   Prob.

C 481394.3 464812.8 1.035674 0.3077

DPI 65.36559 24.91626 2.623411 0.0129

R-squared 0.168344 Mean dependent var 1700000.

Adjusted R-squared 0.143883 S.D. dependent var 108458.4

S.E. of regression 100352.8 Akaike info criterion 25.92472

Sum squared resid 3.42E+11 Schwarz criterion 26.01270

Log likelihood -464.6450 F-statistic 6.882286

Durbin-Watson stat 0.693102 Prob(F-statistic) 0.012939

$\widehat{RCSSA} = b_0 + b_1 DPI$

Basic Statistical Evaluation

$\beta_1$ is the slope coefficient that tells us the rate of change in Y per unit change in X.

When DPI increases by one dollar, the number of cars sold increases by about 65.

Hypothesis test for $\beta_1$:
H0: $\beta_1 = 0$   H1: $\beta_1 \neq 0$   The t test is used to test the validity of H0.
$t = b_1 / se(b_1)$
• If the t statistic > t table, reject H0; equivalently, if Pr < $\alpha$ (e.g. $\alpha$ = 0.05), reject H0.
• If the t statistic < t table, do not reject H0; equivalently, if Pr > $\alpha$, do not reject H0.
• t = 2.62 > t table, and Pr = 0.012 < 0.05: reject H0.
• DPI has a statistically significant effect on RCS.

Basic Statistical Evaluation

$R^2$ is the coefficient of determination that tells us the fraction of the variation in Y explained by X, with $0 \le R^2 \le 1$.

• $R^2 = 0$ indicates no explanatory power of X (the equation).
• $R^2 = 1$ indicates perfect explanation of Y by X (the equation).
• $R^2 = 0.1683$ indicates very weak explanatory power.

Hypothesis test for $R^2$:
H0: $R^2 = 0$   H1: $R^2 \neq 0$   The F test checks this hypothesis.
• If the F statistic > F table, reject H0; equivalently, if Pr < $\alpha$ (e.g. $\alpha$ = 0.05), reject H0.
• If the F statistic < F table, do not reject H0; equivalently, if Pr > $\alpha$, do not reject H0.
• F-statistic = 6.88 > F table, and Pr = 0.012 < 0.05: reject H0.
• The estimated equation has some power to explain the RCS figures.

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values of the RCSSA–DPI regression, 1990–1998.]

There is no seasonality left, but the residuals still do not look like a random disturbance.

Omitted variable? Business cycle?

Trend Models

Simple Regression Model, Special Case: the Trend Model

$Y_t = b_0 + b_1 t + e_t$

• The independent variable is time: t = 1, 2, 3, ..., T-1, T.
• There is no need to forecast the independent variable.
• Using simple transformations, a variety of nonlinear trend equations can be estimated, so the estimated model can mimic the pattern of the data.
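Since the regressor is simply t = 1, 2, ..., T, estimating and forecasting a linear trend takes only a few lines. A sketch with statsmodels on a hypothetical series (the values are made up, chosen only to mimic the scale of the tuition example that follows):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical series with a linear trend plus noise
rng = np.random.default_rng(1)
T = 36
y = 115 + 3.8 * np.arange(T) + rng.normal(0, 6, T)

t = np.arange(T)                          # trend term, like @TREND in EViews
X = sm.add_constant(t)
fit = sm.OLS(y, X).fit()

# Forecast the next 4 periods by simply extending t
t_future = np.arange(T, T + 4)
y_fore = fit.params[0] + fit.params[1] * t_future
print(fit.params, y_fore)
```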

Suitable Data Pattern

[Figure: data-pattern grid indicating which combination of trend (none, additive, multiplicative) and seasonality (none, additive, multiplicative) the model suits.]

Chapter 3, Exercise 13: College Tuition Consumers' Price Index by Quarter

[Figure: FEE (college tuition consumers' price index), 1986–1995, with the holdout period marked.]

OLS Estimates

Dependent Variable: FEE

Method: Least Squares

Sample: 1986:1 1994:4

Included observations: 36

Variable   Coefficient   Std. Error   t-Statistic   Prob.

C 115.7312 1.982166 58.38624 0.0000

@TREND 3.837580 0.097399 39.40080 0.0000

R-squared 0.978568 Mean dependent var 182.8889

Adjusted R-squared 0.977938 S.D. dependent var 40.87177

S.E. of regression 6.070829 Akaike info criterion 6.498820

Sum squared resid 1253.069 Schwarz criterion 6.586793

Log likelihood -114.9788 F-statistic 1552.423

Durbin-Watson stat 0.284362 Prob(F-statistic) 0.000000

$\widehat{fee}_t = b_0 + b_1 t$

Basic Statistical Evaluation

$\beta_1$ is the slope coefficient that tells us the rate of change in Y per unit change in X.

Each quarter (one step of the trend), tuition increases by about 3.84 points.

Hypothesis test for $\beta_1$:
H0: $\beta_1 = 0$   H1: $\beta_1 \neq 0$   The t test is used to test the validity of H0.
$t = b_1 / se(b_1)$
• If the t statistic > t table, reject H0; equivalently, if Pr < $\alpha$ (e.g. $\alpha$ = 0.05), reject H0.
• If the t statistic < t table, do not reject H0; equivalently, if Pr > $\alpha$, do not reject H0.
• t = 39.4 > t table, and Pr = 0.0000 < 0.05: reject H0.

Basic Statistical Evaluation

$R^2$ is the coefficient of determination that tells us the fraction of the variation in Y explained by X, with $0 \le R^2 \le 1$.

• $R^2 = 0$ indicates no explanatory power of X (the equation).
• $R^2 = 1$ indicates perfect explanation of Y by X (the equation).
• $R^2 = 0.9785$ indicates very strong explanatory power.

Hypothesis test for $R^2$:
H0: $R^2 = 0$   H1: $R^2 \neq 0$   The F test checks this hypothesis.
• If the F statistic > F table, reject H0; equivalently, if Pr < $\alpha$ (e.g. $\alpha$ = 0.05), reject H0.
• If the F statistic < F table, do not reject H0; equivalently, if Pr > $\alpha$, do not reject H0.
• F-statistic = 1552 > F table, and Pr = 0.0000 < 0.05: reject H0.
• The estimated equation has explanatory power.

Graphical Evaluation of Fit

[Figure: actual FEE and forecast FEEF, 1986–1995, with the holdout period marked.]

           ACTUAL    FORECAST
1995 Q1    260.00    253.88
1995 Q2    259.00    257.72
1995 Q3    266.00    261.55
1995 Q4    274.00    265.39

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values of the linear trend model, 1986–1994.]

The residuals exhibit a clear pattern; they are not random.

Also, the seasonal fluctuations cannot be modelled.

The regression model is misspecified.

Model Improvement

• The data may exhibit an exponential trend.
• In this case, take the logarithm of the dependent variable.
• Estimate the trend by OLS.
• After the OLS estimation, forecast the holdout period.
• Take the exponential of the logarithmic forecast values in order to return to the original units.

Exponential trend: $Y_t = A e^{\beta_1 t}$, so the model estimated is $\ln(Y_t) = b_0 + b_1 t + e_t$.
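A sketch of this log-trend procedure on a hypothetical exponential series; the variable names and values are illustrative only (chosen to resemble the tuition series in scale).

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical exponential-trend series standing in for FEE
rng = np.random.default_rng(2)
T = 36
fee = 125 * np.exp(0.021 * np.arange(T)) * np.exp(rng.normal(0, 0.02, T))

t = np.arange(T)
X = sm.add_constant(t)
log_fit = sm.OLS(np.log(fee), X).fit()     # ln(FEE_t) = b0 + b1*t + e_t

# Forecast the holdout period in logs, then convert back to original units
t_hold = np.arange(T, T + 4)
lfee_fore = log_fit.params[0] + log_fit.params[1] * t_hold
fee_fore = np.exp(lfee_fore)               # exponential of the log forecasts
print(fee_fore)
```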

Suitable Data Pattern

[Figure: data-pattern grid indicating which combination of trend (none, additive, multiplicative) and seasonality (none, additive, multiplicative) the model suits.]

Original and Logarithmic Transformed Data

[Figure: FEE and LFEE = log(FEE), 1986–1995.]

LOG(FEE)    FEE
4.844187    127.000
4.844187    127.000
4.867534    130.000
4.912655    136.000
4.912655    136.000
4.919981    137.000
4.941642    140.000
4.976734    145.000
4.983607    146.000

OLS Estimate of the Logarithmic Trend Model

Dependent Variable: LFEE

Method: Least Squares

Sample: 1986:1 1994:4

Included observations: 36

Variable Coefficient Std. Error t-Statistic Prob.

C 4.816708 0.005806 829.5635 0.0000

@TREND 0.021034 0.000285 73.72277 0.0000

R-squared 0.993783 Mean dependent var 5.184797

Adjusted R-squared 0.993600 S.D. dependent var 0.222295

S.E. of regression 0.017783 Akaike info criterion -5.167178

Sum squared resid 0.010752 Schwarz criterion -5.079205

Log likelihood 95.00921 F-statistic 5435.047

Durbin-Watson stat 0.893477 Prob(F-statistic) 0.000000

$\widehat{\ln(fee)}_t = b_0 + b_1 t$

Forecast Calculations

obs      FEE        LFEEF      FEELF = exp(LFEEF)
1993:1   228.0000   5.405651   222.6610
1993:2   228.0000   5.426684   227.3940
1993:3   235.0000   5.447718   232.2276
1993:4   243.0000   5.468751   237.1639
1994:1   244.0000   5.489785   242.2052
1994:2   245.0000   5.510819   247.3536
1994:3   251.0000   5.531852   252.6114
1994:4   259.0000   5.552886   257.9810
1995:1   260.0000   5.573920   263.4648
1995:2   259.0000   5.594953   269.0651
1995:3   266.0000   5.615987   274.7845
1995:4   274.0000   5.637021   280.6254

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values of the logarithmic trend model, 1986–1994.]

The residuals exhibit a clear pattern; they are not random.

Also, the seasonal fluctuations cannot be modelled.

The regression model is misspecified.

Model Improvement

• In order to deal with seasonal variation, remove the seasonal pattern from the data.
• Fit the regression model to the seasonally adjusted data.
• Generate forecasts.
• Add the seasonal movements back to the forecasted values.

Suitable Data Pattern

[Figure: data-pattern grid indicating which combination of trend (none, additive, multiplicative) and seasonality (none, additive, multiplicative) the model suits.]

Multiplicative Seasonal Adjustment

Included observations: 40
Ratio to Moving Average
Original Series: FEE
Adjusted Series: FEESA

Scaling Factors:
1  1.002372
2  0.985197
3  0.996746
4  1.015929

Original and Seasonally Adjusted Data

[Figure: FEE and seasonally adjusted FEE (FEESA), 1986–1995.]

OLS Estimate of the Seasonally Adjusted Trend Model

Dependent Variable: FEESA
Method: Least Squares
Sample: 1986:1 1995:4
Included observations: 40

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          115.0387      1.727632     66.58749      0.0000
@TREND     3.897488      0.076240     51.12152      0.0000

R-squared            0.985668   Mean dependent var      191.0397
Adjusted R-squared   0.985291   S.D. dependent var      45.89346
S.E. of regression   5.566018   Akaike info criterion   6.319943
Sum squared resid    1177.261   Schwarz criterion       6.404387
Log likelihood      -124.3989   F-statistic             2613.410
Durbin-Watson stat   0.055041   Prob(F-statistic)       0.000000

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values of the seasonally adjusted trend model, 1986–1995.]

The residuals exhibit a clear pattern; they are not random.

There are no seasonal fluctuations.

The regression model is misspecified.

Model Improvement

• Take the logarithm in order to remove the existing nonlinearity.
• Apply additive seasonal adjustment to the logarithmic data.
• Apply OLS to the seasonally adjusted logarithmic data.
• Forecast the holdout period.
• Add the seasonal movements back to obtain seasonal forecasts.
• Take the exponential in order to return to original-unit seasonal forecasts.

Suitable Data Pattern

[Figure: data-pattern grid indicating which combination of trend (none, additive, multiplicative) and seasonality (none, additive, multiplicative) the model suits.]

Logarithmic Transformation and Additive Seasonal Adjustment

Sample: 1986:1 1995:4

Included observations: 40

Difference from Moving Average

Original Series: LFEE =log(FEE)

Adjusted Series: LFEESA

Scaling Factors:

1 0.002216

2 -0.014944

3 -0.003099

4 0.015828

Original and Logarithmic Additive Seasonally Adjusted Series

[Figure: FEE and LFEESA, 1986–1995.]

OLS Estimate of the Logarithmic Additive Seasonally Adjusted Data

Dependent Variable: LFEESA

Method: Least Squares

Sample: 1986:1 1995:4

Included observations: 40

Variable Coefficient Std. Error t-Statistic Prob.

C 4.822122 0.004761 1012.779 0.0000

@TREND 0.020618 0.000210 98.12760 0.0000

R-squared 0.996069 Mean dependent var 5.224171

Adjusted R-squared 0.995966 S.D. dependent var 0.241508

S.E. of regression 0.015340 Akaike info criterion -5.468039

Sum squared resid 0.008942 Schwarz criterion -5.383595

Log likelihood 111.3608 F-statistic 9629.026

Durbin-Watson stat 0.149558 Prob(F-statistic) 0.000000

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values of the logarithmic additive seasonally adjusted model, 1986–1995.]

The residuals exhibit a clear pattern; they are not random.

There are no seasonal fluctuations.

The regression model is misspecified.

Autoregressive Model

• In some cases the growth model may be more suitable for the data.
• If the data exhibit nonlinearity, the autoregressive model can be adjusted to model an exponential pattern.

Population model: $Y_t = \beta_0 + \beta_1 Y_{t-1} + \varepsilon_t$
Sample model: $Y_t = a_0 + a_1 Y_{t-1} + e_t$

Population model (log form): $\ln(Y_t) = \beta_0 + \beta_1 \ln(Y_{t-1}) + \varepsilon_t$
Sample model (log form): $\ln(Y_t) = a_0 + a_1 \ln(Y_{t-1}) + e_t$
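A minimal sketch of estimating the autoregressive model by regressing the series on its own first lag; the data are hypothetical, and the logarithmic version simply replaces y with ln y.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical upward-drifting series
rng = np.random.default_rng(3)
y = 120 + np.cumsum(rng.normal(3, 1, 40))

y_lag = y[:-1]                      # Y(t-1)
y_cur = y[1:]                       # Y(t); one observation is lost to the lag
X = sm.add_constant(y_lag)
ar_fit = sm.OLS(y_cur, X).fit()     # Y_t = a0 + a1*Y_{t-1} + e_t

# One-step-ahead forecast from the last observed value
y_next = ar_fit.params[0] + ar_fit.params[1] * y[-1]
print(ar_fit.params, y_next)
```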

OLS Estimate of Autoregressive Model

Dependent Variable: FEE

Method: Least Squares

Sample(adjusted): 1986:2 1995:4

Included observations: 39 after adjusting endpoints

Variable Coefficient Std. Error t-Statistic Prob.

C 0.739490 2.305654 0.320729 0.7502

FEE(-1) 1.016035 0.011884 85.49718 0.0000

R-squared 0.994964 Mean dependent var 192.7179

Adjusted R-squared 0.994828 S.D. dependent var 45.45787

S.E. of regression 3.269285 Akaike info criterion 5.256940

Sum squared resid 395.4643 Schwarz criterion 5.342251

Log likelihood -100.5103 F-statistic 7309.767

Durbin-Watson stat 1.888939 Prob(F-statistic) 0.000000

$\widehat{fee}_t = b_0 + b_1 fee_{t-1}$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values of the autoregressive model, 1987–1995.]

Clear seasonal pattern.

The model is misspecified.

Model Improvement

To remove the seasonal fluctuations:
• Seasonally adjust the data.
• Apply OLS to the autoregressive trend model.
• Forecast the seasonally adjusted data.
• Add the seasonal movements back to the forecasted values.

OLS Estimate of the Seasonally Adjusted Autoregressive Model

Dependent Variable: FEESA
Method: Least Squares
Sample (adjusted): 1986:2 1995:4
Included observations: 39 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           1.125315      0.811481     1.386743      0.1738
FEESA(-1)   1.013445      0.004181     242.4027      0.0000

R-squared            0.999371   Mean dependent var      192.6894
Adjusted R-squared   0.999354   S.D. dependent var      45.27587
S.E. of regression   1.151024   Akaike info criterion   3.169101
Sum squared resid    49.01968   Schwarz criterion       3.254412
Log likelihood      -59.79748   F-statistic             58759.08
Durbin-Watson stat   1.335932   Prob(F-statistic)       0.000000

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values of the seasonally adjusted autoregressive model, 1987–1995.]

There is no seasonal pattern in the residuals.

The model specification seems more correct than the previous estimates.

Seasonal Autoregressive Model

• If the data exhibit seasonal fluctuations, the growth model should be remodelled.
• If the data exhibit nonlinearity and seasonality together, the seasonal autoregressive model can be adjusted to model an exponential pattern.

Population model: $Y_t = \beta_0 + \beta_1 Y_{t-s} + \varepsilon_t$
Sample model: $Y_t = a_0 + a_1 Y_{t-s} + e_t$

Population model (log form): $\ln(Y_t) = \beta_0 + \beta_1 \ln(Y_{t-s}) + \varepsilon_t$
Sample model (log form): $\ln(Y_t) = a_0 + a_1 \ln(Y_{t-s}) + e_t$

New Product Forecasting: Growth Curve Fitting

• For new products, the main problem is typically a lack of historical data.
• Trend or seasonal patterns cannot be determined.
• Forecasters can use a number of models that generally fall into the category called diffusion models.
• These models are alternatively called S-curves, growth models, saturation models, or substitution curves.
• These models imitate the life cycle of products. Life cycles follow a common pattern:
  – A period of slow growth just after the introduction of the new product
  – A period of rapid growth
  – Slowing growth in a mature phase
  – Decline

New Product Forecasting: Growth Curve Fitting

• Growth models have their own lower and upper limits.
• A significant benefit of using diffusion models is to identify and predict the timing of the four phases of the life cycle.
• The transition from very slow initial growth to rapid growth is usually the result of solutions to technical difficulties and the market's acceptance of the new product or technology.
• There are upper limits, and a maturity phase occurs in which growth slows and finally ceases.

[Figure: illustrative S-shaped sales curve over time.]

GOMPERTZ CURVE

The Gompertz function is given as

$Y_t = L e^{-a e^{-b t}}$

where
L = upper limit of Y
e = natural number = 2.71828...
a and b = coefficients describing the curve

The Gompertz curve will range in value from zero to L as t varies from zero to infinity. The Gompertz curve is a way to summarize growth with a few parameters.

GOMPERTZ CURVE: An Example

HDTV: LCD and plasma TV sales figures

YEAR HDTV

2000 1200

2001 1500

2002 1770

2003 3350

2004 5500

2005 9700

2006 15000

[Figure: HDTV sales, 2000–2006, and the fitted Gompertz forecast HDTVF extended to 2050.]

GOMPERTZ CURVE: An Example

Dependent Variable: HDTV
Method: Least Squares
Sample (adjusted): 2000 2006
Included observations: 7 after adjustments
Convergence achieved after 61 iterations
HDTV = C(1)*EXP(-C(2)*EXP(-C(3)*@TREND))

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    332940        850837       0.391         0.716
C(2)    6.718         2.023        3.321         0.029
C(3)    0.128         0.087        1.477         0.214

R-squared            0.992      Mean dependent var      5431.429
Adjusted R-squared   0.988      S.D. dependent var      5178.199
S.E. of regression   559.922    Akaike info criterion   15.791
Sum squared resid    1254049    Schwarz criterion       15.76782
Log likelihood      -52.26849   Durbin-Watson stat      0.704723

LOGISTIC CURVE

The logistic function is given as

$Y_t = \dfrac{L}{1 + a e^{-b t}}$

where
L = upper limit of Y
e = natural number = 2.71828...
a and b = coefficients describing the curve

The logistic curve will range in value from zero to L as t varies from zero to infinity.

The logistic curve is symmetric about its point of inflection. The Gompertz curve is not necessarily symmetric.

LOGISTIC or GOMPERTZ CURVE?

The answer lies in whether, in a particular situation, it is easier to achieve the maximum value the closer you get to it, or whether it becomes more difficult to attain the maximum value the closer you get to it.
• Are there factors assisting the attainment of the maximum value once you get close to it, or
• Are there factors preventing the attainment of the maximum value once it is nearly attained?

If there is an offsetting factor such that growth is more difficult to maintain as the maximum is approached, then the Gompertz curve will be the better choice.

If there are no such offsetting factors hindering the attainment of the maximum value, the logistic curve will be the better choice.

[Figure: HDTV and the fitted forecast HDTVF, 2000–2050.]

LOGISTIC CURVE: An Example

HDTV: LCD and plasma TV sales figures

YEAR   HDTV
2000   1200
2001   1500
2002   1770
2003   3350
2004   5500
2005   9700
2006   15000

Dependent Variable: HDTV
Method: Least Squares
Sample (adjusted): 2000 2006
Included observations: 7 after adjustments
Convergence achieved after 1 iteration
HDTV = C(1)/(1+C(2)*EXP(-C(3)*@TREND))

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    149930.000    350258.500   0.428         0.691
C(2)    199.182       432.110      0.461         0.669
C(3)    0.517         0.073        7.048         0.002

R-squared            0.997      Mean dependent var      #######
Adjusted R-squared   0.995      S.D. dependent var      #######
S.E. of regression   370.451    Akaike info criterion   14.965
Sum squared resid    548936     Schwarz criterion       14.942
Log likelihood      -49.377     Durbin-Watson stat      1.632
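Both growth curves can be fitted by nonlinear least squares. Below is a sketch using scipy's curve_fit on the HDTV figures from the table above; the starting values are rough guesses informed by the estimates shown, and convergence is not guaranteed with only seven observations.

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.arange(7)                                   # 2000..2006 coded as t = 0..6
hdtv = np.array([1200, 1500, 1770, 3350, 5500, 9700, 15000], dtype=float)

def gompertz(t, L, a, b):
    return L * np.exp(-a * np.exp(-b * t))         # Y = L * exp(-a * exp(-b*t))

def logistic(t, L, a, b):
    return L / (1.0 + a * np.exp(-b * t))          # Y = L / (1 + a * exp(-b*t))

# Starting values roughly in line with the estimates reported above
g_par, _ = curve_fit(gompertz, t, hdtv, p0=[300000, 7, 0.1], maxfev=20000)
l_par, _ = curve_fit(logistic, t, hdtv, p0=[150000, 200, 0.5], maxfev=20000)

print("Gompertz L, a, b:", g_par)
print("Logistic  L, a, b:", l_par)
# Long-horizon forecasts, e.g. t = 50 corresponds to the year 2050
print("2050 forecasts:", gompertz(50, *g_par), logistic(50, *l_par))
```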

LOGISTIC versus GOMPERTZ CURVES

[Figure: the Gompertz forecast (HDTVF_G) and the logistic forecast (HDTVF_L), 2000–2050.]

FORECASTING WITH MULTIPLE REGRESSION MODELS

BUSINESS FORECASTING

CONTENT
• DEFINITION
• INDEPENDENT VARIABLE SELECTION
• FORECASTING WITH THE MULTIPLE REGRESSION MODEL
• STATISTICAL EVALUATION OF THE MODEL
• SERIAL CORRELATION
• SEASONALITY TREATMENT
• GENERAL AUTOREGRESSIVE MODEL
• ADVICE
• EXAMPLES...

MULTIPLE REGRESSION MODEL

THE DEPENDENT VARIABLE, Y, IS A FUNCTION OF MORE THAN ONE INDEPENDENT VARIABLE: X1, X2, ..., Xk.

General form: $Y = f(X_1, X_2, \ldots, X_k)$

Linear form, population regression: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \varepsilon$

Linear form, sample regression: $Y = b_0 + b_1 X_1 + b_2 X_2 + \ldots + b_k X_k + e$

SELECTING INDEPENDENT VARIABLES

• FIRST, DETERMINE THE DEPENDENT VARIABLE.
• SEARCH THE LITERATURE, USE COMMON SENSE, AND LIST THE MAIN POTENTIAL EXPLANATORY VARIABLES.
• IF TWO VARIABLES SHARE THE SAME INFORMATION, SUCH AS GDP AND GNP, SELECT THE MORE RELEVANT ONE.
• IF THE VARIATION OF A VARIABLE IS VERY SMALL, LOOK FOR A MORE VARIABLE ONE.
• SET THE EXPECTED SIGNS OF THE PARAMETERS TO BE ESTIMATED.

AN EXAMPLE: SELECTING INDEPENDENT VARIABLES

LIQUID PETROLEUM GAS (LPG) MARKET SIZE FORECAST

POTENTIAL EXPLANATORY VARIABLES:
• POPULATION
• GNP or GDP
• URBANIZATION RATIO
• PRICE

MODEL AND EXPECTED SIGNS:

$LPG_t = \beta_0 + \beta_1 POP_t + \beta_2 GDP_t + \beta_3 UR_t + \beta_4 PRICE_t + \varepsilon_t$

with $\beta_1 > 0$, $\beta_2 > 0$, $\beta_3 > 0$, $\beta_4 < 0$.

PARAMETER ESTIMATES: OLS ESTIMATION

Population regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \varepsilon$

Sample regression model: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \ldots + b_k X_k$

Error term: $e = Y - \hat{Y}$

Ordinary Least Squares (OLS) estimation:

$\min \sum e^2 = \sum (Y - \hat{Y})^2 = \sum \big(Y - (b_0 + b_1 X_1 + b_2 X_2 + \ldots + b_k X_k)\big)^2$

IT IS COMPLEX TO CALCULATE THE b's BY HAND; MATRIX ALGEBRA IS USED TO ESTIMATE THEM.
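The matrix-algebra solution is $b = (X'X)^{-1}X'y$; a small NumPy sketch with made-up data for two regressors:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x1 = rng.normal(10, 2, n)
x2 = rng.normal(5, 1, n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 1, n)   # hypothetical "true" model

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# b = (X'X)^{-1} X'y, computed here by solving the normal equations
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                                          # residuals
print("estimates:", b, " SSE:", e @ e)
```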

FORECASTING WITH THE MULTIPLE REGRESSION MODEL

$\ln(SALES_t) = 0.75 + 1.24\,\ln(GDP_t) - 0.90\,\ln(PRICE_t)$

• IF GDP INCREASES 1%, SALES INCREASE 1.24%.
• IF PRICE INCREASES 1%, SALES DECREASE 0.9%.

PERIOD   GDP    PRICE   SALES
100      1245   100     230
101      1300   103     ?

$\ln(SALES_{101}) = 0.75 + 1.24\,\ln(1300) - 0.90\,\ln(103) \approx 5.46$

$SALES_{101} \approx e^{5.46} \approx 235$
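A short sketch of the same forecast arithmetic, backing the intercept out of the period-100 row and then plugging in the period-101 values:

```python
import numpy as np

# Log-log model: ln(SALES) = b0 + 1.24*ln(GDP) - 0.90*ln(PRICE)
b_gdp, b_price = 1.24, -0.90

# Back out the intercept from the period-100 row (GDP=1245, PRICE=100, SALES=230)
b0 = np.log(230) - b_gdp * np.log(1245) - b_price * np.log(100)

# Forecast period 101 (GDP=1300, PRICE=103)
ln_sales = b0 + b_gdp * np.log(1300) + b_price * np.log(103)
sales_forecast = np.exp(ln_sales)
# roughly 0.75, 5.47 and 236 - matching the worked example above up to rounding
print(round(b0, 3), round(ln_sales, 3), round(sales_forecast, 1))
```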

EXAMPLE: LPG FORECAST

[Figure: TUPSATAY (LPG sales), 1968–1996.]

LOGARITHMIC TRANSFORMATION

[Figure: LSATA, the logarithm of LPG sales, 1968–1996.]

SCATTER DIAGRAM

[Figure: scatter diagrams of LSATA against LGNP and of LSATA against LP.]

UNEXPECTED RELATION

LSATA=f(LGNP)

Dependent Variable: LSATA
Method: Least Squares
Sample: 1968 1997
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -44.91150     3.097045     -14.50140     0.0000
LGNP        4.081938     0.220265      18.53195     0.0000

R-squared            0.924616   Mean dependent var      12.47858
Adjusted R-squared   0.921924   S.D. dependent var      0.736099
S.E. of regression   0.205681   Akaike info criterion  -0.260637
Sum squared resid    1.184535   Schwarz criterion      -0.167224
Log likelihood       5.909555   F-statistic             343.4333
Durbin-Watson stat   0.485414   Prob(F-statistic)       0.000000

$LSATA = \beta_0 + \beta_1 LGNP + \varepsilon$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1968–1996.]

NOT RANDOM

LSATA=f(LP)

Dependent Variable: LSATA
Method: Least Squares
Sample (adjusted): 1969 1997
Included observations: 29 after adjusting endpoints

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          11.70726      0.081886     142.9694      0.0000
LP         0.190128      0.015096     12.59492      0.0000

R-squared            0.854551   Mean dependent var      12.53724
Adjusted R-squared   0.849164   S.D. dependent var      0.674006
S.E. of regression   0.261768   Akaike info criterion   0.223756
Sum squared resid    1.850107   Schwarz criterion       0.318052
Log likelihood      -1.244459   F-statistic             158.6319
Durbin-Watson stat   0.187322   Prob(F-statistic)       0.000000

$LSATA = \beta_0 + \beta_1 LP + \varepsilon$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1969–1997.]

NOT RANDOM

LSATA=f(LGNP,LP)

Dependent Variable: LSATA
Method: Least Squares
Sample (adjusted): 1969 1997
Included observations: 29 after adjusting endpoints

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -30.808410    7.715902     -3.992846     0.0005
LGNP        3.066655     0.556533      5.510284     0.0000
LP          0.045318     0.028281      1.602436     0.1211

R-squared            0.932905   Mean dependent var      12.53724
Adjusted R-squared   0.927744   S.D. dependent var      0.674006
S.E. of regression   0.181176   Akaike info criterion  -0.480999
Sum squared resid    0.853443   Schwarz criterion      -0.339555
Log likelihood       9.974488   F-statistic             180.7558
Durbin-Watson stat   0.364799   Prob(F-statistic)       0.000000

$LSATA = \beta_0 + \beta_1 LGNP + \beta_2 LP + \varepsilon$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1969–1997.]

NOT RANDOM

WHAT IS MISSING?

• GNP AND PRICE ARE THE MOST IMPORTANT VARIABLES, BUT THE COEFFICIENT OF PRICE IS NOT SIGNIFICANT AND HAS AN UNEXPECTED SIGN.
• THE RESIDUAL DISTRIBUTION IS NOT RANDOM.

WHAT IS MISSING?
• WRONG FUNCTIONAL FORM / NONLINEAR MODEL?
• LACK OF DYNAMIC MODELLING?
• MISSING IMPORTANT VARIABLE?
  • POPULATION?

LSATA = f(LGNP, LP, LPOP)

Dependent Variable: LSATA
Method: Least Squares
Sample (adjusted): 1969 1997
Included observations: 29 after adjusting endpoints

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -50.913420    3.992134     -12.75343     0.0000
LGNP        0.755445     0.337894      2.235746     0.0345
LP         -0.131508     0.021528     -6.108568     0.0000
LPOP        4.955945     0.486887      10.17885     0.0000

R-squared            0.986958   Mean dependent var      12.53724
Adjusted R-squared   0.985393   S.D. dependent var      0.674006
S.E. of regression   0.081461   Akaike info criterion  -2.049934
Sum squared resid    0.165899   Schwarz criterion      -1.861342
Log likelihood       33.72405   F-statistic             630.6084
Durbin-Watson stat   0.398661   Prob(F-statistic)       0.000000

$LSATA = \beta_0 + \beta_1 LGNP + \beta_2 LP + \beta_3 LPOP + \varepsilon$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1969–1997.]

NOT RANDOM

WHAT IS MISSING?

• GNP, POPULATION, AND PRICE ARE THE MOST IMPORTANT VARIABLES.
  • THEY ARE SIGNIFICANT.
  • THEY HAVE THE EXPECTED SIGNS.
• THE RESIDUAL DISTRIBUTION IS NOT RANDOM.

WHAT IS MISSING?
• WRONG FUNCTIONAL FORM / NONLINEAR MODEL?
• LACK OF DYNAMIC MODELLING? YES.
• MISSING IMPORTANT VARIABLE? YES, URBANIZATION.

LSATA = f(LGNP, LP, LPOP, LSATA(-1))

Dependent Variable: LSATA
Method: Least Squares
Sample (adjusted): 1969 1997
Included observations: 29 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -16.185910    3.832897     -4.222893     0.0003
LGNP         0.523657     0.150971      3.468585     0.0020
LP          -0.033964     0.013483     -2.518934     0.0188
LPOP         1.279753     0.419566      3.050182     0.0055
LSATA(-1)    0.619986     0.060756      10.20446     0.0000

R-squared            0.997557   Mean dependent var      12.53724
Adjusted R-squared   0.997150   S.D. dependent var      0.674006
S.E. of regression   0.035983   Akaike info criterion  -3.655968
Sum squared resid    0.031074   Schwarz criterion      -3.420227
Log likelihood       58.01154   F-statistic             2450.048
Durbin-Watson stat   2.118752   Prob(F-statistic)       0.000000

$LSATA_t = \beta_0 + \beta_1 LGNP_t + \beta_2 LP_t + \beta_3 LPOP_t + \beta_4 LSATA_{t-1} + \varepsilon_t$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1969–1997.]

RANDOM

Basic Statistical Evaluation

$\beta_1$ is the slope coefficient that tells us the rate of change in Y per unit change in X.

When GNP increases by 1%, the volume of LPG sales increases by 0.52%.

Hypothesis test for $\beta_1$:
• H0: $\beta_1 = 0$
• H1: $\beta_1 \neq 0$
• The t test is used to test the validity of H0: $t = b_1 / se(b_1)$.
• If the t statistic > t table, reject H0; equivalently, if Pr < $\alpha$ (e.g. $\alpha$ = 0.05), reject H0.
• If the t statistic < t table, do not reject H0; equivalently, if Pr > $\alpha$, do not reject H0.
• t = 3.46 > t table, and Pr = 0.002 < 0.05: reject H0.
• GNP has a statistically significant effect on LPG sales.

Basic Statistical Evaluation

$R^2$ is the coefficient of determination that tells us the fraction of the variation in Y explained by X, with $0 \le R^2 \le 1$.

• $R^2 = 0$ indicates no explanatory power of X (the equation).
• $R^2 = 1$ indicates perfect explanation of Y by X (the equation).
• $R^2 = 0.9975$ indicates very strong explanatory power.

Hypothesis test for $R^2$:
• H0: $R^2 = 0$
• H1: $R^2 \neq 0$
• The F test checks this hypothesis.
• If the F statistic > F table, reject H0; equivalently, if Pr < $\alpha$ (e.g. $\alpha$ = 0.05), reject H0.
• If the F statistic < F table, do not reject H0; equivalently, if Pr > $\alpha$, do not reject H0.
• F-statistic = 2450 > F table, and Pr = 0.0000 < 0.05: reject H0.
• The estimated equation has power to explain the LPG sales figures.

SHORT AND LONG TERM IMPACTS

If we specify a dynamic model, we can estimate the short-term and long-term impacts of the independent variables on the dependent variable simultaneously.

$y_t = \beta_0 + \beta_1 x_t + \beta_2 y_{t-1} + \varepsilon_t$

In equilibrium conditions, $y_t = y_{t-1} = y$, so

$y = \beta_0 + \beta_1 x + \beta_2 y$

$y(1 - \beta_2) = \beta_0 + \beta_1 x$

$y = \dfrac{\beta_0}{1 - \beta_2} + \dfrac{\beta_1}{1 - \beta_2} x$

Short-term effect of x: $\beta_1$

Long-term effect of x: $\beta_1 / (1 - \beta_2)$

AN EXAMPLE: SHORT AND LONG TERM IMPACTS

           Short-Term Impact   Long-Term Impact
LGNP        0.523657            1.3778
LP         -0.033964           -0.0892
LPOP        1.279753            3.3657

• IF GNP INCREASES 1% AT TIME t, LPG SALES INCREASE 0.52% AT TIME t.
• IN THE LONG RUN (WITHIN 3-5 YEARS), LPG SALES INCREASE 1.38%.
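The long-term impacts in the table follow from dividing each short-run coefficient by (1 − coefficient on LSATA(-1)), using the estimates from the dynamic model above; a quick check in Python:

```python
# Long-run multiplier = short-run coefficient / (1 - coefficient on the lagged dependent variable)
coef_lag = 0.619986                      # LSATA(-1) from the estimated model
short_run = {"LGNP": 0.523657, "LP": -0.033964, "LPOP": 1.279753}

long_run = {name: b / (1 - coef_lag) for name, b in short_run.items()}
print(long_run)   # approximately {'LGNP': 1.378, 'LP': -0.089, 'LPOP': 3.368}
```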

SEASONALITY AND THE MULTIPLE REGRESSION MODEL

• SEASONAL DUMMY VARIABLES CAN BE USED TO MODEL SEASONAL PATTERNS.
• A DUMMY VARIABLE IS A BINARY VARIABLE THAT ONLY TAKES THE VALUES 0 AND 1.
• DUMMY VARIABLES ARE INDICATOR VARIABLES: IF THE DUMMY VARIABLE TAKES THE VALUE 1 IN A GIVEN PERIOD, IT MEANS THAT SOMETHING HAPPENS IN THAT PERIOD.

SEASONAL DUMMY VARIABLES
• The "something" can be a specific season: the dummy variable indicates that specific season.
• D1 is a dummy variable which indicates the first quarters:

1990Q1 1
1990Q2 0
1990Q3 0
1990Q4 0
1991Q1 1
1991Q2 0
1991Q3 0
1991Q4 0
1992Q1 1
1992Q2 0
1992Q3 0
1992Q4 0

FULL SEASONAL DUMMY VARIABLE REPRESENTATION (the quarter with all dummies equal to 0 is the base period):

DATE     D1  D2  D3
1990 Q1  1   0   0
1990 Q2  0   1   0
1990 Q3  0   0   1
1990 Q4  0   0   0
1991 Q1  1   0   0
1991 Q2  0   1   0
1991 Q3  0   0   1
1991 Q4  0   0   0
1992 Q1  1   0   0
1992 Q2  0   1   0
1992 Q3  0   0   1
1992 Q4  0   0   0
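A sketch of building such quarterly dummy variables with pandas; one quarter is dropped so that it serves as the base period.

```python
import pandas as pd

idx = pd.period_range("1990Q1", "1992Q4", freq="Q")
quarters = pd.Series(idx.quarter, index=idx)

# One dummy per quarter, then drop Q4 so it becomes the base period
dummies = pd.get_dummies(quarters, prefix="D").astype(int)
dummies = dummies.drop(columns="D_4")          # keep D_1, D_2, D_3
print(dummies.head(8))
```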

COLLEGE TUITION CONSUMERS' PRICE INDEX BY QUARTER

[Figure: FEE (college tuition consumers' price index), 1986–1995.]

COLLEGE TUITION CONSUMERS' PRICE INDEX BY QUARTER

QUARTERLY DATA: THEREFORE 3 DUMMY VARIABLES WILL BE SUFFICIENT TO CAPTURE THE SEASONAL PATTERN.

DATE     D1  D2  D3
1990 Q1  1   0   0
1990 Q2  0   1   0
1990 Q3  0   0   1
1990 Q4  0   0   0

SEASONAL PATTERN MODELLED

COLLEGE TUITION PRICE INDEX TREND ESTIMATION

Dependent Variable: LOG(FEE)

Method: Least Squares

Sample(adjusted): 1986:3 1995:4

Included observations: 38 after adjusting endpoints

Variable Coefficient Std. Error t-Statistic Prob.

C 4.832335 0.006948 695.4771 0.0000

@TREND 0.020780 0.000232 89.57105 0.0000

D1 -0.011259 0.007202 -1.563344 0.1275

D1(-1) -0.029526 0.007198 -4.101948 0.0003

D1(-2) -0.017082 0.007010 -2.436806 0.0204

R-squared 0.995921 Mean dependent var 5.244170

Adjusted R-squared 0.995427 S.D. dependent var 0.231661

S.E. of regression 0.015666 Akaike info criterion -5.352558

Sum squared resid 0.008099 Schwarz criterion -5.137087

Log likelihood 106.6986 F-statistic 2014.429

Durbin-Watson stat 0.161634 Prob(F-statistic) 0.000000

$LFEE_t = \beta_0 + \beta_1 t + \beta_2 D1_t + \beta_3 D1_{t-1} + \beta_4 D1_{t-2} + \varepsilon_t$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1987–1995.]

NOT RANDOM

COLLEGE TUITION PRICE INDEX: AUTOREGRESSIVE TREND ESTIMATION

Dependent Variable: LOG(FEE)
Method: Least Squares
Sample (adjusted): 1986:3 1995:4
Included observations: 38 after adjusting endpoints

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C              0.050887      0.022969     2.215524      0.0337
LOG(FEE(-1))   0.997510      0.004375     227.9958      0.0000
D1            -0.031634      0.002833     -11.16704     0.0000
D1(-1)        -0.035335      0.002833     -12.47301     0.0000
D1(-2)        -0.006775      0.002761     -2.454199     0.0196

R-squared            0.999368   Mean dependent var      5.244170
Adjusted R-squared   0.999292   S.D. dependent var      0.231661
S.E. of regression   0.006165   Akaike info criterion  -7.217678
Sum squared resid    0.001254   Schwarz criterion      -7.002206
Log likelihood       142.1359   F-statistic             13051.60
Durbin-Watson stat   1.605178   Prob(F-statistic)       0.000000

$LFEE_t = \beta_0 + \beta_1 LFEE_{t-1} + \beta_2 D1_t + \beta_3 D1_{t-1} + \beta_4 D1_{t-2} + \varepsilon_t$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1987–1995.]

RANDOM

SEASONAL PART OF THE MODEL: the D1 terms.
DYNAMIC PART OF THE MODEL: the lagged LFEE term.

COLLEGE TUITION PRICE INDEX: GENERALIZED AUTOREGRESSIVE TREND ESTIMATION

Dependent Variable: LFEE
Method: Least Squares
Sample (adjusted): 1987:1 1995:4
Included observations: 36 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           0.048752      0.024114     2.021760      0.0529
LFEE(-1)    1.126366      0.182970     6.156010      0.0000
LFEE(-2)    0.292152      0.256488     1.139051      0.2643
LFEE(-3)   -0.344963      0.253185     -1.362491     0.1839
LFEE(-4)   -0.076855      0.181751     -0.422857     0.6756
D1         -0.043879      0.005597     -7.840118     0.0000
D1(-1)     -0.048562      0.010241     -4.742040     0.0001
D1(-2)     -0.005369      0.009855     -0.544814     0.5902

R-squared            0.999502   Mean dependent var      5.263841
Adjusted R-squared   0.999377   S.D. dependent var      0.221681
S.E. of regression   0.005532   Akaike info criterion  -7.363447
Sum squared resid    0.000857   Schwarz criterion      -7.011554
Log likelihood       140.5420   F-statistic             8025.362
Durbin-Watson stat   1.892211   Prob(F-statistic)       0.000000

$LFEE_t = \beta_0 + \sum_{i=1}^{s} \beta_i LFEE_{t-i} + \gamma_1 D1_t + \gamma_2 D1_{t-1} + \gamma_3 D1_{t-2} + \varepsilon_t$

GAP SALES FORECAST

[Figure: GAP quarterly SALES and LSALES = log(SALES), 1985–1999.]

SIMPLE AUTOREGRESSIVE REGRESSION MODEL

Dependent Variable: LSALES
Method: Least Squares
Sample (adjusted): 1985:2 1999:4
Included observations: 59 after adjusting endpoints

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            0.613160      0.484163     1.266433      0.2105
LSALES(-1)   0.958714      0.036128     26.53623      0.0000

R-squared            0.925115   Mean dependent var      13.43549
Adjusted R-squared   0.923802   S.D. dependent var      0.848687
S.E. of regression   0.234272   Akaike info criterion  -0.031358
Sum squared resid    3.128350   Schwarz criterion       0.039067
Log likelihood       2.925062   F-statistic             704.1714
Durbin-Watson stat   2.159164   Prob(F-statistic)       0.000000

SEASONALITY IS NOT MODELLED

$LSALES_t = \beta_0 + \beta_1 LSALES_{t-1} + \varepsilon_t$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1986–1999.]

NOT RANDOM

AUTOREGRESSIVE REGRESSION MODEL WITH SEASONAL DUMMIES

Dependent Variable: LSALES
Method: Least Squares
Sample (adjusted): 1985:3 1999:4
Included observations: 58 after adjusting endpoints

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            0.299734      0.111564     2.686656      0.0096
LSALES(-1)   0.994473      0.008213     121.0873      0.0000
D1          -0.547251      0.018685     -29.28766     0.0000
D1(-1)      -0.175405      0.018732     -9.364126     0.0000
D1(-2)       0.033281      0.018458     1.803073      0.0771

R-squared            0.996547   Mean dependent var      13.46547
Adjusted R-squared   0.996287   S.D. dependent var      0.823972
S.E. of regression   0.050210   Akaike info criterion  -3.062940
Sum squared resid    0.133616   Schwarz criterion      -2.885316
Log likelihood       93.82526   F-statistic             3824.335
Durbin-Watson stat   1.828642   Prob(F-statistic)       0.000000

$LSALES_t = \beta_0 + \beta_1 LSALES_{t-1} + \beta_2 D1_t + \beta_3 D1_{t-1} + \beta_4 D1_{t-2} + \varepsilon_t$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1986–1999.]

RANDOM

ALTERNATIVE SEASONAL MODELLING

FOR NONSEASONAL DATA, THE AUTOREGRESSIVE MODEL CAN BE WRITTEN AS

$y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t$

IF THE LENGTH OF THE SEASONALITY IS s, THE SEASONAL AUTOREGRESSIVE MODEL CAN BE WRITTEN AS

$y_t = \beta_0 + \beta_1 y_{t-s} + \varepsilon_t$

SEASONAL LAGGED AUTOREGRESSIVE REGRESSION MODEL

Dependent Variable: LSALES
Method: Least Squares
Sample (adjusted): 1986:1 1999:4
Included observations: 56 after adjusting endpoints

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            0.329980      0.169485     1.946953      0.0567
LSALES(-4)   0.990877      0.012720     77.89949      0.0000

R-squared            0.991180   Mean dependent var      13.50893
Adjusted R-squared   0.991016   S.D. dependent var      0.804465
S.E. of regression   0.076248   Akaike info criterion  -2.274583
Sum squared resid    0.313945   Schwarz criterion      -2.202249
Log likelihood       65.68834   F-statistic             6068.330
Durbin-Watson stat   0.434696   Prob(F-statistic)       0.000000

$LSALES_t = \beta_0 + \beta_1 LSALES_{t-4} + \varepsilon_t$

Graphical Evaluation of Fit and Error Terms

[Figure: residual, actual, and fitted values, 1986–1999.]
