lecture 18

Demand Estimation, Demand Forecasting

Upload: rkdharmani5305

Post on 18-Nov-2014


TRANSCRIPT

Page 1: Lecture 18

Demand Estimation

DEMAND FORECASTING

Page 2: Lecture 18

OVERVIEW

Demand Curve Estimation
Identification Problem
Interview and Experimental Methods
Regression Analysis
Measuring Regression Model Significance
Measures of Individual Variable Significance
Demand/Sales/Revenue/Profit Forecasting Methods: Single Equation Regression Models, Simultaneous Equation Regression Models, Autoregressive Integrated Moving Average (ARIMA) Models, and Vector Autoregressive (VAR) Models

Page 3: Lecture 18

KEY CONCEPTS

simultaneous relation, identification problem, consumer interview, market experiments, regression analysis, deterministic relation, statistical relation, time series, cross section, scatter diagram, linear model, multiplicative model, simple regression model, multiple regression model, standard error of the estimate (SEE), correlation coefficient, coefficient of determination, degrees of freedom, corrected coefficient of determination, F statistic, t statistic, two-tail t tests, one-tail t tests

Page 4: Lecture 18

Demand Curve Estimation

Simple Linear Demand Curves
The best estimation method balances marginal costs and marginal benefits.
Simple linear relations are useful for demand estimation.

Using Simple Linear Demand Curves
Straight-line relations give useful approximations.

Page 5: Lecture 18

Identification Problem

Changing Nature of Demand Relations
Demand relations are dynamic.

Interplay of Supply and Demand
Economic conditions affect demand and supply.

Shifts in Demand and Supply
Curve shifts can be estimated.

Simultaneous Relations

Page 6: Lecture 18
Page 7: Lecture 18

Interview and Experimental Methods

Consumer Interviews
Interviews can solicit useful information when market data are scarce.
Interview opinions often differ from actual market transaction data.

Market Experiments
Controlled experiments can generate useful insight.
Experiments can become expensive.

Page 8: Lecture 18

Regression Analysis

What Is a Statistical Relation?
A statistical relation exists when averages are related. A deterministic relation is true by definition.

Specifying the Regression Model
Dependent variable: also called the explained variable, predictand, regressand, response, or endogenous variable.
Explanatory variable: also called the independent variable, predictor, regressor, stimulus or control variable, or exogenous variable.
Dependent variable Y is caused by X. X variables are determined independently of Y.

Least Squares Method
Minimize the sum of squared residuals.
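The least squares method can be sketched for a single regressor with the textbook closed-form formulas. This is a minimal plain-Python illustration, not the EViews routine used in the course; the helper name `ols_simple` and the price/quantity numbers are hypothetical.

```python
def ols_simple(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals
    sum((y_i - b0 - b1 * x_i) ** 2) for a one-regressor model."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-deviation and squared-deviation sums
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    b1 = sxy / sxx              # slope estimate
    b0 = y_bar - b1 * x_bar     # the fitted line passes through the means
    return b0, b1


# Hypothetical observations lying exactly on Q = 12 - 2P
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
quantities = [10.0, 8.0, 6.0, 4.0, 2.0]
b0, b1 = ols_simple(prices, quantities)
print(b0, b1)  # 12.0 -2.0
```

Because these points lie exactly on a line, every residual is zero; with real data the same formulas give the line with the smallest possible sum of squared residuals.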

Page 9: Lecture 18
Page 10: Lecture 18
Page 11: Lecture 18

Measuring Regression Model Significance

Standard Error of the Estimate (SEE) increases with scatter about the regression line.
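As a rough illustration, SEE can be computed as the square root of the sum of squared residuals divided by n - k, where k counts the estimated coefficients. The data, the fitted line, and the function name below are hypothetical; doubling every residual shows SEE growing with the scatter about the line.

```python
import math


def standard_error_of_estimate(y, y_hat, k):
    """SEE = sqrt(SSE / (n - k)), where SSE is the sum of squared residuals
    and k is the number of estimated coefficients (including the intercept)."""
    n = len(y)
    if n != len(y_hat) or n <= k:
        raise ValueError("need len(y) == len(y_hat) and more observations than coefficients")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - k))


# Hypothetical data; Y-hat = 12.15 - 2.05X is the least-squares line for these points
x = [1, 2, 3, 4, 5]
y = [10.2, 7.9, 6.1, 3.8, 2.0]
y_hat = [12.15 - 2.05 * xi for xi in x]
print(round(standard_error_of_estimate(y, y_hat, k=2), 4))  # 0.1581

# Doubling every residual (more scatter about the same line) doubles SEE
y_wide = [fi + 2 * (yi - fi) for yi, fi in zip(y, y_hat)]
print(round(standard_error_of_estimate(y_wide, y_hat, k=2), 4))  # 0.3162
```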

Page 12: Lecture 18

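Goodness of Fit, r and R²

r = ±1 means perfect (positive or negative) correlation; r = 0 means no linear correlation.

R² = 1 means perfect fit; R² = 0 means no linear relation.

Corrected Coefficient of Determination, R̄²
Adjusts R² downward for the degrees of freedom used up by the estimated coefficients; the adjustment matters most in small samples.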
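A minimal sketch of R² and the corrected (adjusted) R², assuming the common formulas R² = 1 - SSE/SST and R̄² = 1 - (1 - R²)(n - 1)/(n - k), with k counting the intercept. The data and function names are hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: the share of the variation in y explained by the fit."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Corrected R^2 = 1 - (1 - R^2) * (n - 1) / (n - k), k counting the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Hypothetical data fitted by the least-squares line Y-hat = 12.15 - 2.05X
x = [1, 2, 3, 4, 5]
y = [10.2, 7.9, 6.1, 3.8, 2.0]
y_hat = [12.15 - 2.05 * xi for xi in x]
r2 = r_squared(y, y_hat)
print(round(r2, 4), round(adjusted_r_squared(r2, n=5, k=2), 4))  # 0.9982 0.9976
```

With only five observations, the correction pulls the value down slightly; with many observations the two numbers converge.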

Page 13: Lecture 18
Page 14: Lecture 18

F statistic

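Tells whether R² is statistically significant, i.e., whether the independent variables jointly explain a significant share of the variation in Y.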
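The overall F statistic can be computed from R² as F = [R²/(k - 1)] / [(1 - R²)/(n - k)]. The sketch below assumes a hypothetical fit and a function name of our own; it does not compute the critical F value, which would still come from an F table with (k - 1, n - k) degrees of freedom.

```python
def f_statistic(r2, n, k):
    """Overall F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k)),
    with k coefficients (including the intercept) and n observations."""
    if not 0 <= r2 < 1 or n <= k or k < 2:
        raise ValueError("need 0 <= R^2 < 1, n > k and k >= 2")
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


# Hypothetical fit: SSE = 0.075 and SST = 42.1, so R^2 = 1 - 0.075/42.1; n = 5, k = 2
f = f_statistic(1 - 0.075 / 42.1, n=5, k=2)
print(round(f, 1))  # 1681.0
```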

Page 15: Lecture 18

Measures of Individual Variable Significance

t statistics
t statistics compare a sample characteristic (such as an estimated coefficient) to the standard deviation (standard error) of that characteristic.
A calculated t statistic greater than two in absolute value suggests a statistically significant effect of X on Y (roughly 95% confidence).
A calculated t statistic greater than three in absolute value suggests a highly significant effect of X on Y (roughly 99% confidence).

Two-Tail t Tests
Tests of effect.

One-Tail t Tests
Tests of magnitude or direction.
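To see where a coefficient's t statistic comes from, the sketch below fits a one-regressor line, divides the slope by its standard error, and applies the rule of thumb above. The data and function name are hypothetical; software such as EViews reports these numbers directly.

```python
import math


def slope_t_statistic(x, y):
    """Fit y = b0 + b1*x by least squares and return (b1, se_b1, t),
    where t = b1 / se_b1 tests H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))   # standard error of the estimate, k = 2
    se_b1 = see / math.sqrt(sxx)     # standard error of the slope
    return b1, se_b1, b1 / se_b1


x = [1, 2, 3, 4, 5]                  # hypothetical data
y = [10.2, 7.9, 6.1, 3.8, 2.0]
b1, se_b1, t = slope_t_statistic(x, y)
print(round(b1, 2), round(se_b1, 3), round(t, 1))  # -2.05 0.05 -41.0
print("|t| > 2:", abs(t) > 2)                       # rule of thumb
```

With only 3 degrees of freedom here, the exact two-tailed 5% critical value is 3.182, so the "more than two" rule of thumb is a large-sample approximation.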

Page 16: Lecture 18
Page 17: Lecture 18

Demand Estimation

What will happen to quantity demanded, total revenue, and profit if we increase prices?

What will happen to demand if consumer incomes increase or decrease due to an economic expansion or contraction?

What effect will a tuition increase have on Marquette’s revenue?
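The first question can be illustrated with an assumed linear demand curve. The curve Q = 12 - 2P and the constant unit cost of 1.0 are hypothetical, chosen only to show how quantity, revenue, and profit respond to price.

```python
def quantity_demanded(price, a=12.0, b=2.0):
    """Hypothetical linear demand Q = a - b*P, floored at zero."""
    return max(a - b * price, 0.0)


def outcomes(price, unit_cost=1.0):
    """Return (Q, total revenue, profit), assuming a constant unit cost."""
    q = quantity_demanded(price)
    return q, price * q, (price - unit_cost) * q


for price in (2.0, 3.0, 4.0):
    q, revenue, profit = outcomes(price)
    print(f"P={price:.2f}  Q={q:.2f}  TR={revenue:.2f}  profit={profit:.2f}")
```

Under these assumptions, raising price from 2 to 3 increases both revenue and profit, while raising it from 3 to 4 lowers revenue (16 vs. 18) and leaves profit unchanged at 12.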

Page 18: Lecture 18

Practical Example: Port Authority Transit Case

How will the fare price increase affect demand and overall revenues?

What other factors, besides fares, affect demand?

Page 19: Lecture 18

Demand Estimation Using Market Research Techniques

How do we estimate the demand function?
Econometric techniques (your project)
Non-econometric techniques

Look first at non-econometric approaches.

What are these?

Page 20: Lecture 18

Consumer Surveys: Just Ask Them

Question customers to estimate demand:
“How many bags of chips would you buy if the price was Rs. 2.29/bag?”
“How many cases of beer would you buy if the price of beer was Rs. 11.99/case?”

Compare different individuals’ responses.

Advantages: flexible; relatively inexpensive to conduct.

Disadvantages: many potential biases (strategic, information, hypothetical, interviewer).

Page 21: Lecture 18

Market Experiments

Firms vary prices and/or advertising and compare consumer behavior:
Over time (e.g., before and after a rebate offer)
Over space (e.g., compare Delhi and Haryana consumption when prices are varied between the two regions)

Potential Problems
Control of other factors is not guaranteed.
“Playing” with market prices may be risky.
Expensive.

Page 22: Lecture 18

Consumer Clinics and Focus Groups

Simulated market setting in which consumers are given income to spend on a variety of goods.

The experimenters control income, prices, advertising, packaging, etc.

Advantages: flexibility.

Disadvantages: selectivity bias; very expensive.

Page 23: Lecture 18

Econometrics

“Economic measurement”: a collection of statistical techniques available for testing economic theories by empirically measuring relationships among economic variables.

Quantify economic reality: bridge the gap between abstract theory and real-world human activity.

Page 24: Lecture 18

Practical Example

How does the state of Delhi set a budget?

What is the process?

Page 25: Lecture 18

The Econometric Modeling Process

1. Specification of the theoretical model
2. Identification of the variables
3. Collection of the data
4. Estimation of the parameters of the model and their interpretation
5. Development of forecasts (estimates) based on the model

Page 26: Lecture 18

Numbers Instead of Symbols!

Normal model of consumer demand: $Q = f(P, P_s, I_d)$
$Q$ = quantity demanded of the good, $P$ = price of the good, $P_s$ = price of a substitute good, $I_d$ = disposable income

Econometrics allows us to estimate the relationship between $Q$ and $P$, $P_s$, and $I_d$ based on past data for these variables.

Page 27: Lecture 18

$Q = 31.5 - 0.73P + 0.11P_s + 0.23I_d$

Instead of just expecting $Q$ to “increase” if there is an increase in $I_d$, we estimate that $Q$ will increase by 0.23 units per 1 dollar of increased disposable income.

0.23 is called an estimated regression coefficient.

The ability to estimate these coefficients is what makes econometrics useful.
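Plugging numbers into the estimated demand equation on this slide makes the coefficient interpretation concrete. The input values below are hypothetical, and the parameter `i_d` stands for disposable income.

```python
def predicted_quantity(p, p_s, i_d):
    """Estimated demand: Q = 31.5 - 0.73*P + 0.11*Ps + 0.23*(disposable income)."""
    return 31.5 - 0.73 * p + 0.11 * p_s + 0.23 * i_d


# Hypothetical values for own price, substitute price, and disposable income
base = predicted_quantity(p=10.0, p_s=8.0, i_d=100.0)
richer = predicted_quantity(p=10.0, p_s=8.0, i_d=101.0)
print(round(base, 2))           # 48.08
print(round(richer - base, 2))  # 0.23, the income coefficient
```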

Page 28: Lecture 18

Regression Analysis

One econometric approach
Most popular approach among economists, business analysts, and social scientists
Allows quantitative estimates of economic relationships that previously had been completely theoretical
Answers “what if” questions

Page 29: Lecture 18

Regression Analysis Continued

Regression analysis is a statistical technique that attempts to “explain” movements in one variable, the dependent variable, as a function of movements in a set of other variables, called the independent (or explanatory) variables, through the quantification of a single equation.

$Q = f(P, P_s, I_d)$
$Q$ = dependent variable
$P$, $P_s$, $I_d$ = independent variables
Deals with the frequent questions of cause and effect in business.

Page 30: Lecture 18

What Is Regression Really Doing?

Regression is the fitting of curves to data.

[Figure: curve fitted to data points, axes P and Q]

More later!

Page 31: Lecture 18

Gathering Data

Once the model is specified, we must collect data.

Time-series data
e.g., sales for my company over time.
What most of you will be using in your projects.

Cross-sectional data
e.g., sales of 10 companies in the food processing industry at one point in time.

Panel data/longitudinal data/pooled data
e.g., sales of 10 companies in the food processing industry at various points in time.

Page 32: Lecture 18

Garbage In, Garbage Out

Your empirical estimates will be only as reliable as your data.
Look at the two quotes from Stamp and Valavanis that follow.
You will want to take particular care in developing your databases.

Page 33: Lecture 18

Sir Josiah Stamp, “Some Economic Factors in Modern Life”

The government are very keen on amassing statistics. They collect them, add them, raise them to the n’th power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of those figures comes in the first instance from the village watchman, who just puts down what he damn well pleases.

Moral: Know where your data comes from!

Page 34: Lecture 18

Valavanis

“Econometric theory is like an exquisitely balanced French recipe, spelling out precisely with how many turns to mix the sauce, how many carats of spice to add, and for how many milliseconds to bake the mixture at exactly 474 degrees of temperature.”

Page 35: Lecture 18

Valavanis, continued

“But when the statistical cook turns to raw materials, he finds that hearts of cactus fruit are unavailable, so he substitutes chunks of cantaloupe; where the recipe calls for vermicelli he uses shredded wheat; and he substitutes green garment dye for curry, ping-pong balls for turtle’s eggs, and, for Chaligougnac vintage 1883, a can of turpentine.”

Moral: Be careful in your choice of proxy variables.

Page 36: Lecture 18

Economic Data

You are in the process of gathering economic data.
Some will come from your firm.
Some may come from trade publications.
Some will come from the government.
All series must be on the same time scale (monthly, quarterly, yearly, etc.).

Page 37: Lecture 18

Always Be Skeptical

Always approach your data with a critical eye. Remember the quotes.
Just because something appears in a table somewhere does not mean it is necessarily correct.
Watch for government data revisions.
Does your data pass the “smell test”?

Page 38: Lecture 18

How to Begin the Data Exercise

The first question you should ask yourself is: “If money were no object, what would be the perfect data for my demand model?”
From that basis, you can then start finding what actual data you can get your hands on.
There will be compromises that you have to make. These are called proxy variables!
Remember the Valavanis quote.

Page 39: Lecture 18

How to Choose a Good Proxy

Proxy variables should be variables whose movements closely mirror the desired variable for which you do not have a measure.

For example, tastes of consumers are difficult to measure.
You may use a time trend variable if you suspect these are changing over time.
You may include demographic characteristics of the population.

Page 40: Lecture 18

Dummy Variables

Binary variable: takes on a “1” or a “0”.
Example: trying to model salaries
1 if you have a college degree, 0 if you don’t
Example: model the effect of Harley-Davidson reunion years on demand
1 for reunion years, 0 otherwise
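A sketch of how such a dummy might be built and what its coefficient measures, assuming hypothetical reunion years and demand figures. With a single 0/1 regressor, the OLS slope equals the difference between average demand in reunion and non-reunion years, and the intercept equals the non-reunion average.

```python
# Hypothetical years and reunion years, for illustration only
years = [2000, 2001, 2002, 2003, 2004, 2005]
reunion_years = {2002, 2005}
reunion = [1 if yr in reunion_years else 0 for yr in years]
print(reunion)  # [0, 0, 1, 0, 0, 1]

# Hypothetical demand (units) in those years
demand = [100.0, 102.0, 130.0, 101.0, 99.0, 128.0]


def ols_simple(x, y):
    """One-regressor least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = ols_simple(reunion, demand)
print(round(b0, 2), round(b1, 2))  # 100.5 28.5
```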

Page 41: Lecture 18

Back to Regression Analysis

Theoretical model: $Y = \beta_0 + \beta_1 X + \varepsilon$
$Y$ is the dependent variable.
$X$ is the independent variable.
Linear equation (no powers greater than 1).
The $\beta$’s are coefficients: they determine the position of the straight line.
$\beta_0$ is the constant term: the value of $Y$ when $X$ is 0 (more on this later; no economic meaning, but required).
$\beta_1$ is the slope term: the amount $Y$ will change when $X$ increases by one unit. With more independent variables (coefficients $\beta_2, \dots, \beta_n$), each slope holds all other included independent variables constant (but not those omitted from the model!).

More about $\varepsilon$, the error term, later.

Page 42: Lecture 18

Graphical Representation of Regression Coefficients

[Figure: regression line $Y = \beta_0 + \beta_1 X$ plotted on X and Y axes; $\beta_0$ is the intercept and $\beta_1$ is the slope]

Page 43: Lecture 18

The Error Term

$Y = \beta_0 + \beta_1 X + \varepsilon$; $\varepsilon$ is purely theoretical.
A stochastic error term is needed because:
Minor influences on $Y$ are omitted from the equation (data not available).
It is impossible not to have some measurement error in one of the equation’s variables.
The true functional form may be different (not linear).
There is pure randomness of variation (remember human behavior!).
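A small simulation can make the role of the stochastic error term concrete: generate Y from assumed "true" values of β0 and β1 plus random noise, then estimate the line from the noisy data. The true parameters, the noise level, and the seed are hypothetical choices.

```python
import random


def ols_simple(x, y):
    """One-regressor least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


random.seed(42)                  # fixed seed so the run is repeatable
beta0, beta1 = 5.0, 2.0          # hypothetical "true" parameters (unknown in practice)
x = [float(i) for i in range(100)]
# Each Y gets a stochastic error term: epsilon ~ Normal(mean 0, sd 3)
y = [beta0 + beta1 * xi + random.gauss(0.0, 3.0) for xi in x]

b0_hat, b1_hat = ols_simple(x, y)
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
print(f"estimates: b0_hat={b0_hat:.3f}, b1_hat={b1_hat:.3f}")
print(f"mean residual: {sum(residuals) / len(residuals):.2e}")  # ~0 by construction
```

The estimates land near, but not exactly on, the assumed true values; with real data the true values are never observed.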

Page 44: Lecture 18

Example of Error

Trying to estimate demand for SUVs:
Demand may fall because of uncertainty about the economy (what data do we use for uncertainty?).
Other independent variables may be omitted.
The demand function may be nonlinear.
Demand for SUVs is determined by human behavior, so there is some purely random variation.
All of these end up in the error term.

Page 45: Lecture 18

The Estimated Regression Equation

Theoretical regression equation: $Y = \beta_0 + \beta_1 X + \varepsilon$

Estimated regression equation: $\hat{Y} = 103.40 + 6.38X$ (equivalently, $Y = 103.40 + 6.38X + e$)

Observed, real-world X and Y values are used to calculate the coefficient estimates 103.40 and 6.38.

The estimates are used to determine $\hat{Y}$ (“Y-hat”), the fitted value of Y.

“Plug in” X and get an estimate of Y.

Page 46: Lecture 18

Differences Between Theoretical and Estimated Regression Equations

$\beta_0$ and $\beta_1$ are replaced with the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ (103.40 and 6.38).

We can’t observe the true coefficients, so we make estimates: best guesses given the data for X and Y.
$\hat{Y}$ is the estimated value of Y, calculated from the regression equation (the line through the Y data).
Residual $e = Y - \hat{Y}$: the difference between Y (the data) and $\hat{Y}$ (Y estimated with the regression).
The theoretical model has an error term; the estimated model has a residual.
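Plugging X values into the estimated equation Ŷ = 103.40 + 6.38X gives fitted values and residuals. The (X, Y) observations below are hypothetical.

```python
def fitted_value(x, b0=103.40, b1=6.38):
    """Y-hat from the estimated equation Y-hat = 103.40 + 6.38X."""
    return b0 + b1 * x


# Hypothetical (X, Y) observations, for illustration only
observations = [(10.0, 170.0), (20.0, 225.0)]
for x, y in observations:
    y_hat = fitted_value(x)
    residual = y - y_hat          # e = Y - Y-hat
    print(f"X={x:.0f}  Y={y:.1f}  Y-hat={y_hat:.2f}  e={residual:.2f}")
```

A positive residual means the observation lies above the fitted line; a negative one means it lies below.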

Page 47: Lecture 18

A Simple Regression Example in EViews

Demand for Ford Taurus

Page 48: Lecture 18

Ordinary Least Squares Regression

OLS regression:
Most common
Easy to use
Estimates have useful characteristics

Page 49: Lecture 18

How Does Ordinary Least Squares Regression Work?

We attempt to find the curve that best fits the data among all possibilities.

While there are a number of ways of doing this, OLS minimizes the sum of the squared residuals.

Page 50: Lecture 18

Finding the Best-Fitting Line Using Ordinary Least Squares

$Y = \beta_0 + \beta_1 X + \varepsilon$

$Y = \hat{\beta}_0 + \hat{\beta}_1 X + e$

The “hat” denotes a sample estimate of the true value.

OLS minimizes $\sum e^2$, where $e = Y - \hat{Y}$.

So OLS minimizes $\sum (Y - \hat{Y})^2$.

[Figure: actual data points (the dependent variable Y) plotted against axes P and Q, with the means $\bar{P}$ and $\bar{Q}$ marked and the best possible straight line drawn through the data]

Page 51: Lecture 18

True vs. Estimated Regression Line

No one knows the parameters of the true regression line:
$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$ (theoretical)

We must come up with estimates:
$Y_t = \hat{\beta}_0 + \hat{\beta}_1 X_t + e_t$ (estimated), with fitted values $\hat{Y}_t = \hat{\beta}_0 + \hat{\beta}_1 X_t$

Page 52: Lecture 18

So How Does OLS Work?

OLS selects the estimates of $\beta_0$ and $\beta_1$ that minimize the sum of squared residuals.

That is, it minimizes the differences between $Y$ and $\hat{Y}$.

Statistical software does the complex math behind the scenes.

Page 53: Lecture 18

OLS Regression Coefficient Interpretation

Regression coefficients ($\beta$’s) indicate the change in the dependent variable associated with a one-unit increase in the independent variable in question, holding constant the other independent variables in the equation (but not those omitted from the equation).

A controlled economic experiment?

Page 54: Lecture 18

Another Example

The demand for beef: $B = \beta_0 + \beta_1 P + \beta_2 Y_d$

$B$ = per capita consumption of beef per year
$Y_d$ = per capita disposable income per year
$P$ = price of beef (cents/pound)

Estimate this using EViews.
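EViews does this estimation for you; as a rough illustration of what happens behind the scenes, the sketch below solves the OLS normal equations X'X b = X'y for two regressors in plain Python. The beef data are synthetic, generated from assumed coefficients with no error term, and income is assumed to be in thousands; real data would include noise, so the estimates would not match any "true" values exactly. The helper names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """OLS coefficients [b0, b1, ...] via the normal equations.
    `columns` holds one list per regressor; an intercept column is added."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)


# Synthetic data generated from B = 40 - 0.1P + 3.0Yd with no error term
price = [150.0, 160.0, 170.0, 180.0, 190.0, 200.0]   # cents/pound
income = [10.0, 10.5, 10.2, 11.0, 11.5, 11.2]        # thousands per capita (assumed)
beef = [40.0 - 0.1 * p + 3.0 * yd for p, yd in zip(price, income)]

b0, b1, b2 = ols([price, income], beef)
print(round(b0, 4), round(b1, 4), round(b2, 4))  # 40.0 -0.1 3.0
```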

Page 55: Lecture 18

Overall Fit of the Model

Need a way to evaluate a model:
Compare one model with another
Compare one functional form with another
Compare combinations of independent variables
Use the coefficient of determination, r²

Page 56: Lecture 18

r²: The Coefficient of Determination

Reported by EViews every time you run a regression
Between 0 and 1
The larger the better
Close to one shows an excellent fit
Near zero shows failure of the estimated regression to explain the variance in Y
Relative term
r² = .85 says that 85% of the variation in the dependent variable is explained by the independent variables

Page 57: Lecture 18

Graphical r²

[Figure: three scatter plots with fitted lines, for r² = 0, r² = .95, and r² = 1]

Page 58: Lecture 18

The Adjusted r²

Problem with r²: adding another independent variable never decreases r²,
even if it is a nonsensical variable.
Need to account for the decrease in “degrees of freedom.”
Degrees of freedom = data observations - coefficients estimated
Example: 100 years of data, 3 coefficients estimated (including the constant): DF = 97
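The claim that an extra regressor never lowers r² can be checked numerically. This sketch, with hypothetical data and an arbitrary "nonsense" series, fits the model with and without the extra variable and reports r² and adjusted r² for each; the helper names are ours, and the normal-equation solver is a plain-Python stand-in for what regression software does.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """Fit OLS with an intercept plus `columns`, then return 1 - SSE/SST."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(coefs, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2]       # hypothetical
nonsense = [5.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]    # arbitrary, unrelated numbers

n = len(y)
r2_one = r_squared([x], y)
r2_two = r_squared([x, nonsense], y)
print(f"r2: {r2_one:.5f} -> {r2_two:.5f}")
print(f"adjusted r2: {adjusted_r_squared(r2_one, n, 2):.5f} -> "
      f"{adjusted_r_squared(r2_two, n, 3):.5f}")
print("DF with 100 observations and 3 coefficients:", 100 - 3)
```

Plain r² can only rise (or stay the same) when the nonsense series is added; adjusted r² charges for the lost degree of freedom, so it rises only if the new variable improves the fit enough to pay for it.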

Page 59: Lecture 18

Adjusted r²

Ranges from slightly negative to 1
Accounts for degrees of freedom
Better estimate of fit
Don’t rely on any one statistic; common sense and theory are more important
Same interpretation as r²

Use adjusted r² from now on!

Page 60: Lecture 18

The Classical Linear Regression (CLR) Model

These are some basic assumptions which, when met, make the Ordinary Least Squares procedure the “Best Linear Unbiased Estimator” (aka BLUE).

When one or more of these assumptions is violated, it is sometimes necessary to make adjustments to our model.

Page 61: Lecture 18

Assumptions ($Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \varepsilon_t$)

Linearity in coefficients and error term
$\varepsilon$ has zero population mean
All independent variables are independent of $\varepsilon$
Error term observations are uncorrelated with each other (no serial correlation)
$\varepsilon$ has constant variance (no heteroskedasticity)
No independent variables are perfectly correlated (no perfect multicollinearity)

We will come back to some of these when we test our models.

Page 62: Lecture 18

1st Assumption: Linearity

We assume that the model is linear (additive) in the coefficients and in the error term, and that the specification is correct.
e.g., $Y_t = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ is linear in both, whereas a model in which a coefficient enters nonlinearly, such as $Y_t = \beta_0 + \beta_1 X_1^{\beta_2} + \varepsilon$, is not.
Some nonlinear models can be transformed into linear models.
e.g., $Y_t = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \varepsilon$
We showed this can be transformed using logs to:

$\ln Y_t = \ln \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2 + \ln \varepsilon$
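The log transformation can be checked on a simplified, one-regressor version of the multiplicative model, Y = β0 X^β1 (error term omitted, parameter values hypothetical): regressing ln Y on ln X recovers β1 as the slope and ln β0 as the intercept.

```python
import math


def ols_simple(x, y):
    """One-regressor least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


beta0, beta1 = 2.0, 0.5                  # hypothetical parameters
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [beta0 * xi ** beta1 for xi in x]    # Y = beta0 * X**beta1, no error term

ln_x = [math.log(v) for v in x]
ln_y = [math.log(v) for v in y]
a, b = ols_simple(ln_x, ln_y)            # a estimates ln(beta0), b estimates beta1
print(round(math.exp(a), 6), round(b, 6))  # 2.0 0.5
```

In the log form the slope is read directly as an elasticity: a 1% increase in X is associated with roughly a β1% change in Y.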

Page 63: Lecture 18

Hypothesis Testing

In statistics we cannot “prove” a theory is correct.

We can “reject” a hypothesis with a certain degree of confidence.

Page 64: Lecture 18

Common Hypothesis Test

$H_0: \beta = 0$ (null hypothesis)
$H_A: \beta \neq 0$ (alternative hypothesis)
Tests whether or not the coefficient is statistically significantly different from zero.

Does the variable affect demand?
Two-tailed test

Page 65: Lecture 18

Does Rejecting the Null Hypothesis Guarantee that the Theory is Correct?

NO! It is possible that we are committing what is known as a Type I error.
A Type I error is rejecting the null hypothesis when it is in fact correct.
Likewise, we may also commit a Type II error.
A Type II error is failing to reject the null hypothesis when the alternative hypothesis is correct.

Page 66: Lecture 18

Type I and Type II Error Example

Presumption of innocence until proven guilty:

$H_0$: The defendant is innocent
$H_A$: The defendant is guilty
Type I error: sending an innocent defendant to jail
Type II error: freeing a guilty defendant

Page 67: Lecture 18

The t-Test and the t-Statistic

We can use the t-test to do hypothesis testing on individual coefficients.

Given the linear regression model: $Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \varepsilon_t$

We can calculate the t-statistic for each estimated value of $\beta$ (i.e., $\hat\beta$, "beta-hat") and test hypotheses on that estimate. For $H_0: \beta_k = 0$, the t-statistic is the estimate divided by its standard error: $t_k = \hat\beta_k / SE(\hat\beta_k)$.
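For a simple regression (one X plus an intercept, so k = 2), the t-statistic for $H_0: \beta_1 = 0$ can be computed by hand. This is a sketch: the function name `slope_t_statistic` and the five-observation sample are hypothetical, and regression software such as EViews reports these values directly.

```python
import math


def slope_t_statistic(x, y):
    """OLS slope, its standard error, and t-statistic for H0: beta_1 = 0.

    Simple regression only (one X plus an intercept, so k = 2).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)            # error variance estimate, df = n - k
    se_b1 = math.sqrt(s2 / sxx)   # standard error of the slope
    return b1, se_b1, b1 / se_b1  # t = estimate / standard error


b1, se_b1, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(b1, se_b1, t)  # slope 0.6, SE about 0.283, t about 2.12
```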

Page 68: Lecture 18

Setting up the Null and Alternative Hypotheses

$H_0: \beta_1 = 0$ (i.e., $X_1$ is not important)

$H_A: \beta_1 \neq 0$ (i.e., $X_1$ is important, either positively or negatively)

Page 69: Lecture 18

Testing the Hypothesis

1. Set up the null and alternative hypotheses.
2. Run the regression and generate the t-score.
3. Look up the critical value of the t-statistic ($t_c$), given the degrees of freedom ($n - k$), in a two-tailed test using an X% level of significance (1%, 5%, 10%). Here $n$ = sample size and $k$ = number of estimated coefficients (including the intercept).
4. Reject the null ($\beta = 0$) if $|t_k| > t_c$.

t-statistic table on page 754 of Hirschey.

Interpretation of the level of significance: 5% means there is a 5% chance of rejecting the null hypothesis when it is actually true, i.e., of concluding that a coefficient is statistically significant when it is really zero (a Type I error). Rejecting at the 5% level corresponds to 95% confidence.
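The decision rule in step 4 can be sketched as below. The dictionary holds two-tailed 5% critical values for 1 to 10 degrees of freedom copied from a standard t table; for other significance levels or degrees of freedom, use the table in Hirschey. The name `reject_null` is hypothetical.

```python
# Two-tailed 5% critical values of t for df = 1..10 (standard t table).
# Only df 1..10 are included; other df raise a KeyError.
T_CRIT_5PCT_TWO_TAILED = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def reject_null(t_stat, n, k):
    """Return True if H0: beta = 0 is rejected at the 5% level (two-tailed)."""
    df = n - k  # sample size minus estimated coefficients (incl. intercept)
    t_crit = T_CRIT_5PCT_TWO_TAILED[df]
    return abs(t_stat) > t_crit


print(reject_null(2.12, n=5, k=2))   # False: 2.12 < 3.182 with df = 3
print(reject_null(-2.5, n=12, k=2))  # True: |-2.5| > 2.228 with df = 10
```

Taking the absolute value is what makes this a two-tailed test: a large negative t-score is just as much evidence against $\beta = 0$ as a large positive one.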

Page 70: Lecture 18

Example

Taurus example with t-stats

Page 71: Lecture 18

Limitations of the t-Test

1. Does not indicate theoretical validity.
2. Does not test the importance of the variable. The size of the coefficient does this.

Page 72: Lecture 18

F-Test and the F-Statistic

You can also test whether a group of coefficients is statistically significant.

Look at the F-test for all of the independent variable coefficients.

First set up the null and alternative hypotheses.

Page 73: Lecture 18

$H_0$ and $H_A$ for the F-Test

$H_0: \beta_1 = \beta_2 = \dots = \beta_{k-1} = 0$, i.e., all of the slope coefficients are simultaneously zero.

$H_A$: not $H_0$, i.e., at least one slope coefficient is nonzero.

Note: The test does not indicate which coefficient or coefficients are nonzero.

Page 74: Lecture 18

The Critical F

As with the t-statistic, you must compare the actual value of F with its critical value ($F_c$):

Actual value from EViews, or
$F_{k-1,\,n-k} = \frac{r^2/(k-1)}{(1-r^2)/(n-k)}$

$F_c$ must be looked up in a table, using the appropriate degrees of freedom for the numerator ($k-1$) and the denominator ($n-k$).

Tables on pages 751 (10%), 752 (5%) and 753 (1%) of Hirschey.

Page 75: Lecture 18

The F-Test

If $F \ge F_c$, then you reject $H_0$ (not all slope coefficients are equal to zero).

If $F < F_c$, then you fail to reject $H_0$.

Look at an EViews example.
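The F formula from the critical-F slide translates directly into code. This sketch uses hypothetical inputs (r² = 0.90, n = 23, k = 3); the critical value 3.49 is the 5% value for (2, 20) degrees of freedom from a standard F table, and `f_statistic` is a hypothetical name.

```python
def f_statistic(r2, n, k):
    """F statistic for H0: all slope coefficients are zero.

    k counts all estimated coefficients including the intercept, so the
    degrees of freedom are (k - 1) for the numerator and (n - k) for the
    denominator.
    """
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Hypothetical regression: r^2 = 0.90, n = 23 observations, k = 3 coefficients.
f = f_statistic(0.90, n=23, k=3)  # (0.90 / 2) / (0.10 / 20) = 90
f_crit = 3.49                     # 5% critical F with (2, 20) df, standard F table
print(round(f, 6), f >= f_crit)   # 90.0 True: reject H0
```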

Page 76: Lecture 18

Specification Errors

Suppose that you make a mistake in your choice of independent variables. There are 2 possibilities:
You omit an important variable.
You include an extraneous variable.

There are consequences in both cases.

Page 77: Lecture 18

Omitting an Important Variable

Suppose your true regression model is: $Q_t = \beta_0 + \beta_1 P_t + \beta_2 I_t + \varepsilon_t$

Suppose you specify the model as: $Q_t = \beta_0 + \beta_1 P_t + \varepsilon_t^*$

Thus, the error term of the misspecified model captures the influence of income, $I_t$:

$\varepsilon_t^* = \beta_2 I_t + \varepsilon_t$

Page 78: Lecture 18

Consequences

Prevents you from getting a coefficient for income.

Causes bias in the price estimate when price and income are correlated.

In that case, it also violates the classical assumption that the error term is not correlated with an explanatory (independent) variable.
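A tiny noise-free example shows how omitting income biases the price coefficient when price and income are correlated. The data and the "true" coefficients are hypothetical. For a regression of Q on P alone, the slope equals $\beta_1 + \beta_2\delta$, where $\delta$ is the slope from regressing I on P; this is the standard omitted-variable-bias result, given here as background rather than taken from the slides.

```python
def simple_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data in which price and income move together.
price = [1, 2, 3, 4, 5]
income = [2, 1, 4, 3, 6]
beta_0, beta_p, beta_i = 10.0, -2.0, 3.0  # "true" model, no error term
quantity = [beta_0 + beta_p * p + beta_i * i for p, i in zip(price, income)]

short_slope = simple_slope(price, quantity)  # income wrongly omitted
delta = simple_slope(price, income)          # how income co-moves with price
print(round(short_slope, 6))                 # 1.0, although the true price effect is -2.0
print(round(beta_p + beta_i * delta, 6))     # 1.0: true effect plus omitted-variable bias
```

Here the bias is large enough to flip the sign, so the misspecified model would suggest that chicken-style demand rises with price.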

Page 79: Lecture 18

Inclusion of an Irrelevant Variable

A variable that is included but does not belong in your model also has consequences.

Does NOT bias the other coefficients.
Lowers the t-scores of the other coefficients (so you might wrongly judge a relevant variable insignificant).
Will raise (or at least not lower) $r^2$, but will likely decrease the adjusted $r^2$ (which helps you identify the irrelevant variable).
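The effect on adjusted r² can be checked with its usual formula, $\bar r^2 = 1 - (1 - r^2)(n-1)/(n-k)$, with k counting the intercept as on the t-test slides. The r² values below are hypothetical: adding a fifth coefficient nudges r² from 0.900 to 0.901 but lowers the adjusted r².

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k), k incl. intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Hypothetical: an irrelevant extra regressor raises r^2 only slightly...
before = adjusted_r2(0.900, n=20, k=4)
after = adjusted_r2(0.901, n=20, k=5)
print(round(before, 3), round(after, 3))  # 0.881 0.875: ...so adjusted r^2 falls
```

The penalty comes from the $(n-1)/(n-k)$ factor, which grows with every added coefficient whether or not the variable belongs in the model.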

Page 80: Lecture 18

Example

Annual Consumption of Chicken
Y = consumption of chicken, PC = price of chicken, PB = price of beef, I = disposable income
$\hat{Y} = 31.5 - 0.73PC + 0.11PB + 0.23I$
PC t-stat = -9.12, PB t-stat = 2.50, I t-stat = 14.22
Adjusted $r^2$ = 0.986
Interpretation?

Page 81: Lecture 18

Example

Add the interest rate, R, to the equation (YD = disposable income, the I of the previous example):
$\hat{Y} = 30 - 0.73PC + 0.12PB + 0.22YD + 0.17R$
PC t-stat = -9.10, PB t-stat = 2.08, YD t-stat = 11.05, R t-stat = 0.82
Adjusted $r^2$ = 0.985
Adding R lowers the other t-stats and the adjusted $r^2$.
The t-stat on R suggests rejecting (dropping) it, and so does the adjusted $r^2$.

Page 82: Lecture 18

How Do You Decide Whether a Variable Should Be Included?

Trial and error: many EViews runs!
Start with THEORY!
Use your judgement here!
If theory does not provide a clear answer, then:
Look at the t-test.
Look at the adjusted $r^2$.
Look at whether other coefficients appear to be biased when you exclude the variable from the model.

Page 83: Lecture 18

Inclusion of Lagged Variables

Some independent variables influence demand with a lag.

For example, advertising may primarily influence demand in the following month, rather than the current month.

Thus, $Q_t = \beta_0 + \beta_1 P_t + \beta_2 I_t + \beta_3 A_{t-1} + \varepsilon_t$

When there is good reason to suspect a lag (i.e., when theory suggests a lagged relationship), you can investigate this option.
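In practice, a lagged regressor means pairing each period's demand with the previous period's advertising. The helper below is a hypothetical sketch with made-up numbers; in EViews itself a one-period lag is written with the lag operator, e.g. `A(-1)`.

```python
def with_lagged_advertising(q, p, inc, adv):
    """Build rows (Q_t, P_t, I_t, A_{t-1}) for t = 2..n.

    The first period is dropped because A_{t-1} is unknown there, so one
    observation (and one degree of freedom) is lost per lag.
    """
    return [(q[t], p[t], inc[t], adv[t - 1]) for t in range(1, len(q))]


rows = with_lagged_advertising(
    q=[50, 55, 61], p=[9, 9, 8], inc=[100, 102, 104], adv=[5, 7, 6]  # hypothetical
)
print(rows)  # [(55, 9, 102, 5), (61, 8, 104, 7)]
```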

Page 84: Lecture 18

EViews Lagged Variable

Is unemployment in previous time periods important to current demand for the Taurus?

Page 85: Lecture 18

Functional Form

Don't forget the constant term: it usually has no economic meaning, but it is required for the classical assumptions (it absorbs any nonzero mean of the error term, so the error term can be assumed to have mean zero).

Linear form
Double-log form
There are many others that we won't discuss.

Page 86: Lecture 18

Linear Form

$Y = \beta_0 + \beta_1 X + \varepsilon$
What we have looked at thus far
A constant slope is assumed: $\Delta Y / \Delta X = \beta_1$
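A constant slope does not mean a constant elasticity. The sketch below uses a hypothetical demand curve Q = 100 - 2P and the point elasticity (ΔQ/ΔP)(P/Q) to show the elasticity changing along the straight line.

```python
def point_elasticity(slope, x, y):
    """Elasticity (dY/dX) * (X / Y) at the point (X, Y)."""
    return slope * x / y


def linear_demand(price):
    return 100 - 2 * price  # hypothetical Q = 100 - 2P


for p in (10, 40):
    q = linear_demand(p)
    print(p, q, point_elasticity(-2, p, q))
# 10 80 -0.25
# 40 20 -4.0
```

The same slope of -2 gives inelastic demand at a low price and elastic demand at a high price, which is why the linear form is convenient for slopes but not for elasticities.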

Page 87: Lecture 18

Double-Log Form

Second most common.
The natural log of Y is the dependent variable and the natural logs of the X's are the independent variables:
$\ln Y = \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2 + \varepsilon$
Elasticities of the model are constant: $\text{Elasticity}_{Y,X_k} = \%\Delta Y / \%\Delta X_k = \beta_k$ = constant
Interpretation of coefficients: if $X_1$ increases by 1% while the other variable $X_2$ is held constant, Y will change by $\beta_1$%.
There can't be any negative or zero observations in your data set (the natural log is not defined for them).
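To see why the double-log form delivers constant elasticities, the sketch below fits ln Y on ln X for hypothetical data generated exactly from Q = 5P^(-1.5); the estimated slope recovers the elasticity of -1.5. The function name `log_log_fit` is hypothetical, and it refuses zero or negative values, mirroring the last point on the slide.

```python
import math


def log_log_fit(x, y):
    """Regress ln(y) on ln(x) with an intercept; return (b0, b1)."""
    if any(v <= 0 for v in list(x) + list(y)):
        raise ValueError("double-log form needs strictly positive data")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
          / sum((a - mx) ** 2 for a in lx))
    return my - b1 * mx, b1


# Hypothetical constant-elasticity demand Q = 5 * P ** -1.5 (no error term).
prices = [1, 2, 4, 8]
quantities = [5 * p ** -1.5 for p in prices]
b0, b1 = log_log_fit(prices, quantities)
print(round(b1, 6), round(math.exp(b0), 6))  # -1.5 5.0
```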

Page 88: Lecture 18

Violations of the Classical Model

Multicollinearity
Serial correlation
Others

Page 89: Lecture 18

Problem of Multicollinearity

Recall the CLR assumption that the independent variables are not perfectly correlated with each other.

Violating this assumption is called "perfect multicollinearity":
Easy to detect.
OLS cannot estimate the parameters in this situation (put the same independent variable in twice and EViews can't do it).

Next, look at the problem of imperfect multicollinearity.

Page 90: Lecture 18

Imperfect Multicollinearity

This occurs when two or more independent variables are highly, but not perfectly, correlated with each other!

If this is severe enough, it can influence the estimation of the $\beta$'s in the model.

Page 91: Lecture 18

How to Detect the Problem

There are some formal tests, but they are beyond the scope of this course.

Look for the tell-tale signs of the problem: a high adjusted $r^2$, a high F-statistic, and low t-scores on the suspected collinear variables.

EViews example with Taurus
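One simple informal check, not from the slides, is the pairwise correlation between suspect regressors. The income and population figures below are hypothetical, and `correlation` is a hand-written Pearson coefficient. A high correlation on its own does not prove the problem is severe, so it complements rather than replaces the tell-tale signs above.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient between two regressors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


income = [100, 110, 125, 130, 150]  # hypothetical
population = [10, 11, 12, 13, 15]   # hypothetical
print(round(correlation(income, population), 4))  # 0.9932: highly collinear
print(correlation(income, income))                # 1.0: perfect collinearity
```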

Page 92: Lecture 18

Remedies

Possibly do nothing! If t-scores are at or near significance levels, you may want to "live with it".

Drop one or more collinear variables. Let the remaining variable pick up the joint impact. This is OK if you have redundancies.

Page 93: Lecture 18

Remedies - continued

Form a new variable: e.g., if income and population are correlated, you could form per capita income: I/Pop.

Other solutions I can help with on your projects

Page 94: Lecture 18

The Problem of Serial Correlation

The fourth assumption of the CLR model is: "Observations of the error term are uncorrelated with each other."

When this is not satisfied, we have a problem known as serial correlation.

Page 95: Lecture 18

Examples of Serial Correlation

[Figure: two panels, "Positive Serial Correlation" and "Negative Serial Correlation", each with axes labeled Q and P]

Page 96: Lecture 18

Consequences of Serial Correlation

Pure serial correlation does not bias the estimates.

Serial correlation tends to distort t-scores.

Serial correlation results in a pattern of observations in which OLS gives a better fit to the data than would be obtained in the absence of the problem (t-scores are higher).

OLS ends up using the pattern in the error term to explain the dependent variable.

Page 97: Lecture 18

QUESTION: Why is this a problem?

This suggests that t-statistics are overestimated!

Type I error: you may falsely reject the null hypothesis when it is in fact true.

Neither F-statistics nor t-statistics can be trusted in the presence of serial correlation.

Page 98: Lecture 18

Detection: The Durbin-Watson d-Test

This is a test for first-order serial correlation, the most common type in economic models.

Note that there are other tests (Q-test, Breusch-Godfrey LM test), but we will not cover them here.

The d-statistic is derived from the regression residuals (e): $d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$

Page 99: Lecture 18

Theoretical Range of the d-Statistic

If there is perfect positive serial correlation, then d = 0.

If there is perfect negative serial correlation, then d approaches 4.

If there is no serial correlation, then d is approximately 2.

Check this statistic in EViews on your project. If it is near 2, there is no problem; if it is different from 2, then …
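The Durbin-Watson statistic, $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, can be computed directly from a list of residuals. This sketch checks the range described above on artificial residuals: constant residuals give 0, alternating signs push d toward 4, and simulated independent noise (an assumption: 2,000 standard normal draws) gives a value near 2.

```python
import random


def durbin_watson(e):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den


print(durbin_watson([1, 1, 1, 1]))     # 0.0: perfect positive correlation
print(durbin_watson([1, -1, 1, -1]))   # 3.0: alternating signs, tends to 4 as n grows
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(2000)]
print(round(durbin_watson(noise), 2))  # close to 2 for uncorrelated residuals
```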

Page 100: Lecture 18

Correction for Serial Correlation Using GLS

Adding an autoregressive term can correct the serial correlation problem. The details are outside the scope of this class.

Soviet defense spending model:

If your original regression model was:
LS SDH C USD SY SP
DW = 0.62: a problem

Simply add an AR(1) term to your command line:
LS SDH C USD SY SP AR(1)
DW = 1.97: problem solved

Page 101: Lecture 18

Summary Steps for Project

Think about the theoretical model: what independent variables make sense based on theory? (already doing this)
Collect data and examine it (already doing this)
Choose a functional form (likely linear)
Run regression models in EViews
Examine the adjusted $r^2$, t-stats and F-stat, and exclude or include variables based on these and on theory
Do you need lagged variables?
Look for evidence of (and correct for) multicollinearity or serial correlation

Page 102: Lecture 18

Summary Steps for Project

Interpret your results
Use the model to forecast demand (next topic)
I'll do a "sample project" next time using the Taurus data

Page 103: Lecture 18

Homework Continued

Interpret the adjusted $r^2$
Which is a better measure of overall fit? Why?
Is the F-stat significant? What does it mean?
Any evidence of serial correlation? How could this be corrected for?
Estimate the equation as a log-log model
Interpret the results
Is beef a normal good?
Is demand elastic or inelastic (for price and income)?

Page 104: Lecture 18

ORDINARY LEAST SQUARES ESTIMATORS (OLS)

The OLS estimation method estimates the values of the parameters by minimising the sum of the squared errors.

Let the theoretical econometric model be $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$

with the usual assumptions about $\varepsilon$. Here $\varepsilon$ is an unknown random (stochastic) disturbance term. However, statistical estimation requires regressing Y on X and estimating $\beta_0$, $\beta_1$ and the residuals, which stand in for particular values of $\varepsilon$. To distinguish them from the unknown disturbance $\varepsilon$, the sample residuals are termed errors and denoted by $e$. Thus, the empirical model is

$Y_t = \hat\beta_0 + \hat\beta_1 X_t + e_t$

$e_t = Y_t - (\hat\beta_0 + \hat\beta_1 X_t)$ = Actual value - Predicted value $= Y_t - \hat{Y}_t$

Page 105: Lecture 18

ORDINARY LEAST SQUARES ESTIMATORS (OLS)

The method of least squares minimises the sum of squared residuals (or errors), SSE, also called the Error Sum of Squares (ESS).

This means that we are minimising the sum of the squares of the vertical distances from the regression line. Alternatively, we could have minimised the sum of the absolute vertical distances, the sum of squares of the horizontal distances, or the sum of squares of the perpendicular distances (orthogonal estimators). OLS confines itself to minimising the sum of squares of the vertical distances,

i.e., it minimises

$\sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} (Y_t - \hat\beta_0 - \hat\beta_1 X_t)^2$
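The quantities on this slide can be sketched in a few lines: residuals are vertical distances (actual minus predicted), and ESS adds up their squares for any candidate line. The sample and the two candidate lines are hypothetical; the first happens to be the OLS line for these data, and the function names are illustrative.

```python
def residuals(x, y, b0, b1):
    """e_t = actual Y_t minus predicted b0 + b1 * X_t (vertical distances)."""
    return [yt - (b0 + b1 * xt) for xt, yt in zip(x, y)]


def ess(x, y, b0, b1):
    """Error (residual) sum of squares for the candidate line."""
    return sum(e * e for e in residuals(x, y, b0, b1))


x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]  # hypothetical sample
print(ess(x, y, 2.2, 0.6))  # about 2.4: the OLS line for these data
print(ess(x, y, 2.0, 0.7))  # about 2.55: another candidate line fits worse
```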

Page 106: Lecture 18

ORDINARY LEAST SQUARES ESTIMATORS (OLS)

SSE is to be minimised with respect to the parameters $\beta_0$ and $\beta_1$. We have to choose the values of $\beta_0$ and $\beta_1$ that give as close a fit to the data as is possible with this specification. Let these values be $\hat\beta_0$ and $\hat\beta_1$.

The standard model, more popularly called the ordinary least squares (OLS) model, and the classical linear regression model [OLS with the additional assumption $\varepsilon_t \sim N(0, \sigma^2)$] have 4 parameters: $\beta_0$ and $\beta_1$, the parameters of linear dependence, and $E(\varepsilon_t)$ and $\sigma^2$, the parameters of the probability distribution of $\varepsilon$. However, the assumption $E(\varepsilon_t) = 0$, i.e., a zero-mean disturbance, is not very restrictive. Let the expected value of $\varepsilon_t$ be non-zero, say $k$; then

Page 107: Lecture 18

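ORDINARY LEAST SQUARES ESTIMATORS (OLS)

$E(Y_t \mid X_t) = \beta_0 + \beta_1 X_t + E(\varepsilon_t) = \beta_0 + \beta_1 X_t + k = (\beta_0 + k) + \beta_1 X_t$

so a nonzero mean of the disturbance is simply absorbed into the intercept.

The least squares estimates satisfy the first-order conditions

$\frac{\partial ESS}{\partial \hat\beta_0} = 0$ and $\frac{\partial ESS}{\partial \hat\beta_1} = 0$, where $ESS = \sum_t [Y_t - \hat\beta_0 - \hat\beta_1 X_t]^2$

and $\hat\beta_0$, $\hat\beta_1$ are the estimated values of the parameters.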
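Solving the two first-order conditions gives the familiar closed form $\hat\beta_1 = \sum (X_t - \bar X)(Y_t - \bar Y) / \sum (X_t - \bar X)^2$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$. The sketch below, on a hypothetical sample with an illustrative `ols_fit` helper, checks that the residuals then satisfy both conditions ($\sum e_t = 0$ and $\sum X_t e_t = 0$) and that adding a constant k to every Y, as a nonzero error mean would, moves only the intercept.

```python
def ols_fit(x, y):
    """Solve the two first-order conditions for the simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1


x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]  # hypothetical sample
b0, b1 = ols_fit(x, y)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 6), round(b1, 6))  # 2.2 0.6
# Both first-order conditions hold: these sums are zero up to rounding error.
print(sum(e), sum(xi * ei for xi, ei in zip(x, e)))

# Hypothetical nonzero error mean k: shifting every Y by k shifts only the intercept.
k = 3.0
b0_k, b1_k = ols_fit(x, [yi + k for yi in y])
print(b0_k - b0, b1_k - b1)  # approximately 3.0 and 0.0
```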