
Page 1:

Basic Econometrics

Chapter 7: Multiple Regression Analysis: Estimation

Iris Wang

[email protected]

Page 2:

Motivation for multiple regression

• In Chapter 3 we learned how to use simple regression analysis to explain a dependent variable y as a function of a single explanatory variable x.

• The key assumption is Assumption 3: the error u has an expected value of zero given x: E(u|x) = 0.

• Main drawback of that framework: all other factors affecting y have to be uncorrelated with x.

Page 3:

• Multiple regression analysis is more suitable for causal (ceteris paribus) analysis.

• Reason: We can explicitly control for other factors that affect the dependent variable y.

• Example 1: Wage equation

• If we estimate the parameters of this model using OLS, what interpretation can we give to β1?

• Why might this approach yield a more reliable estimate of the causal effect of education than if we were using a simple regression with educ as the sole explanatory variable?

Page 4:

General model with two independent variables:

y = β0 + β1x1 + β2x2 + u

where

• β0 is the intercept

• β1 measures the change in y with respect to x1, holding other factors fixed

• β2 measures the change in y with respect to x2, holding other factors fixed

Page 5:

Key assumption for the model with two independent variables:

E(u|x1, x2) = 0

• Note the similarity to Assumption 3, introduced previously in Chap. 3.

• Interpretation: for any values of x1 and x2 in the population, the average unobservable (u) is equal to zero.

Page 6:

The model with k independent variables

The multiple regression model:

y = β0 + β1x1 + β2x2 + … + βkxk + u

where

β0 is the intercept

β1 is the parameter associated with x1 (measures the change in y with respect to x1, holding other factors fixed)

β2 is the parameter associated with x2 (measures the change in y with respect to x2, holding other factors fixed)

and so on…

Page 7:

The model with k independent variables (cont’d)

• β1, β2,…,βk are often referred to as slope parameters

• u is the disturbance term (error term). It contains factors other than x1, x2,…, xk that affect y.

Page 8:

Key assumption for the model with k independent variables:

E(u|x1, x2,…, xk) = 0

• Thus, all factors in the unobserved error term u are assumed uncorrelated with the explanatory variables.

Page 9:

Assumptions

Assumption 1: Linear in parameters:

y = β0 + β1x1 + β2x2 + … + u.

Assumption 8: No perfect collinearity: In the sample, none of the independent variables is constant and there are no exact linear relationships among the independent variables.

Assumption 3: Zero conditional mean ‐ the error u has an expected value of zero given any values of the independent variables:

E(u|x1, x2,…, xk) = 0  (p. 189)

Page 10:

• Assumption 1 is straightforward.

• Assumption 8 is new: No perfect collinearity. Key in practice: no exact linear dependence between independent variables.
– If there is linear dependence between variables, then we say there is perfect collinearity. In such a case we cannot estimate the parameters using OLS (see the sketch below).

– Examples:
• x2 = a*x1
• x3 = a1*x1 + a2*x2
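The following is a minimal numerical sketch (simulated data; the variable names are illustrative, not from the text) of why perfect collinearity defeats OLS: when x2 is an exact multiple of x1, the matrix X′X is singular and the normal equations have no unique solution.

```python
# Sketch: perfect collinearity (x2 = a*x1) makes X'X singular,
# so the OLS normal equations cannot be solved uniquely.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1                               # exact linear dependence, a = 2
X = np.column_stack([np.ones(50), x1, x2])  # intercept, x1, x2

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))           # 2, not 3: one column is redundant
# np.linalg.solve(XtX, X.T @ y) would raise LinAlgError: "Singular matrix"
```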

Page 11:

Under the assumptions, OLS is unbiased:

E(β̂j) = βj,  j = 0, 1,…, k

• You do not have to know how to prove that OLS is unbiased. But you need to know:
– The definition above and what it means

– The assumptions you need for unbiasedness

Page 12:

Zero conditional mean

It is the most important of the three assumptions and requires the error term u to be uncorrelated with all explanatory variables in the population model.

When Assumption 3 holds, we say that the explanatory variables are exogenous.

Page 13:

Variance of the OLS estimators

We now obtain the variance of the OLS estimators, so that we have a measure of the spread in their sampling distributions.

• Assumption 4: Homoskedasticity. The error u has the same variance given any value of the explanatory variables:

Var(u|x1, x2,…, xk) = σ²

This means that the variance in the error term, u, conditional on the explanatory variables, is the same for all values of the explanatory variables.

If this is not the case, there is heteroskedasticity and the variance formula below has to be adjusted.

Page 14:

Zero conditional mean

Assumption 3 may fail for the following reasons:

• Omitting an important explanatory variable that is correlated with any of the x1, x2,…, xk

• Mis‐specified functional relationship between the dependent and independent variables (e.g. omitted squared term; using level instead of log; or log instead of level…)

The first of these – omitted variables – is by far the biggest concern in empirical work.

Page 15:

OLS Estimates

• We focus first on the model with two independent variables. 

• We write the estimated OLS regression in a form similar to the simple regression case:

ŷ = β̂0 + β̂1x1 + β̂2x2

where "hats" on the parameters indicate that these are estimates of the true (unknown) population parameters, and the "hat" on y means predicted (instead of actual) y.

Page 16:

• How do we obtain the OLS estimates?

• As mentioned in Chap. 3, the method of ordinary least squares (OLS) chooses the estimates that minimize the sum of squared residuals.

• That is, given n observations on y and the variables x1,…, xk, the OLS estimates minimize:

Σi (yi − β̂0 − β̂1xi1 − … − β̂kxik)²  (sum over i = 1,…, n)

where i refers to the observation number, and the second index distinguishes different variables.

More details on p.193
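As a concrete illustration, here is a minimal sketch with simulated data (the "true" coefficients are assumed for the example): NumPy's lstsq solves exactly this least-squares minimization.

```python
# Sketch: OLS chooses the beta-hats that minimize the sum of squared residuals.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)  # assumed true betas

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, ssr, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)    # estimates close to (1.0, 2.0, -0.5)
print(ssr[0])      # the minimized sum of squared residuals
```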

Page 17:

OLS fitted values and residuals

• For observation i the fitted value is simply

ŷi = β̂0 + β̂1xi1 + β̂2xi2 + … + β̂kxik

• The residual for observation i is defined just as in the simple regression case:

ûi = yi − ŷi

Page 18:

Estimating standard errors of the OLS estimates

• The main practical usage of the variance formula is for computing standard errors of the OLS estimates.

• A technicality in this context is that the true parameter σ² is not observed. But it can be estimated as follows:

σ̂² = SSR / (n − k − 1)

where SSR = Σi ûi² is the sum of squared residuals.
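A minimal sketch of this estimator in code (simulated data; under homoskedasticity the variance formula is Var(β̂) = σ²(X′X)⁻¹, whose diagonal gives the squared standard errors):

```python
# Sketch: sigma2_hat = SSR / (n - k - 1), then se(beta_hat_j) from the
# diagonal of sigma2_hat * inv(X'X).
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)  # assumed true betas

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = (resid @ resid) / (n - k - 1)   # degrees of freedom: n - k - 1
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
print(se)                                    # standard errors of the estimates
```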

Page 19:

Degrees of freedom (df):

df = n − (k + 1) = (number of observations) − (number of estimated parameters)

Page 20:

Properties:

Page 21:

Goodness‐of‐fit: Same as for the simple regression model

• SST = Total Sum of Squares
• SSE = Explained Sum of Squares
• SSR = Residual Sum of Squares

SST = SSE + SSR, and R² = SSE/SST = 1 − SSR/SST.

Page 22:

Some points about the R‐squared

• The R‐squared is equal to the squared correlation between actual and fitted y.

• The R‐squared never decreases, and usually increases, when another independent variable is added to a regression.
– This is because the RSS can never increase when you add more regressors to the model (why?)

• Why is the R‐squared a poor tool for deciding whether a particular variable should be added to the model?

Page 23:

Adjusted R‐squared

• Adjusted R‐squared is an alternative measure of the goodness‐of‐fit, which penalizes the inclusion of additional variables.

• This is different from the traditional R‐squared. The traditional R‐squared can never decrease as you add more explanatory variables.

Page 24:

Comparing traditional and adjusted R‐squareds

Traditional R‐squared:

R² = SSE/SST = 1 − SSR/SST

Adjusted R‐squared:

R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]

Adjusted R‐squared penalizes inclusion of more x‐variables (since k increases). 

Adjusted R‐squared can even become negative – a very poor model fit!

Adding an irrelevant x‐variable reduces the adjusted R‐squared.

Adjusted R‐squared is occasionally used to choose between models. 
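A minimal sketch (simulated data, assumed model) contrasting the two measures when an irrelevant regressor is added:

```python
# Sketch: R2 never decreases when a regressor is added;
# adjusted R2 penalizes the extra parameter and can fall.
import numpy as np

def r2_and_adjusted(X, y):
    n, p = X.shape                                   # p = k + 1 parameters
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - ssr / sst, 1 - (ssr / (n - p)) / (sst / (n - 1))

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)              # x1 is the only real driver
X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, rng.normal(size=n)])  # add an irrelevant x
print(r2_and_adjusted(X_small, y))
print(r2_and_adjusted(X_big, y))   # R2 creeps up; adjusted R2 typically drops
```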

Page 25:

Examples

Page 26:

Examples (cont’d)

Page 27:

The natural logarithm function

• Using the natural logarithm in econometric modelling – some examples:

For example, see the Cobb‐Douglas production function on p. 207.

Page 28:

The exponential function

• The exponential function is closely related to the log function. For example,

log(y) = β0 + β1x  ⇔  y = exp(β0 + β1x)

In other words, if you’ve got log(y) specified as a linear function of x, then y is an exponential function of x.

Page 29:

Why express variables in log form?

• Often leads to less heteroskedasticity (recall the wage vs. log wage residuals).

• Estimates less sensitive to large outliers.
• Rule of thumb: when a variable is a positive dollar (or any other currency…) amount, the log is often taken.

• But: we cannot take the log of a negative number, or of zeros.
– If y is sometimes zero but never negative, we sometimes use log(y+1).
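A tiny sketch of the log(y+1) workaround (assumed data): NumPy's log1p computes log(1+y) and is well defined at zero.

```python
# Sketch: log fails at zero; log1p (= log(y + 1)) does not.
import numpy as np

y = np.array([0.0, 10.0, 2500.0])   # e.g. dollar amounts, some equal to zero
print(np.log1p(y))                  # log(y + 1): defined even where y == 0
```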

Page 30:

Page 31:

Quadratic functions

• Consider the following nonlinear function:

y = β0 + β1x + β2x² + u

Polynomial regression models: see, for example, p. 210.

Page 32:

Note: Nonlinear dependence is okay!

• This type of model can be estimated by OLS (nonlinear in x, but linear in the parameters):

y = β0 + β1·income + β2·income² + u

• But this type of model can’t be estimated by OLS:

y = β0 + β1·income_dollars + β2·income_thousandsdollars + u

Since income_dollars = 1,000 × income_thousandsdollars, i.e. there’s linear dependence.

Page 33:

The optional part

Page 34:

Example (WAGE1.dta)

Regression results (std errors omitted):

• exper has a diminishing effect on predicted wage

• When (as is the case here) the coefficient on x is positive and that on x² is negative, the quadratic has a parabolic shape.

• The turning point is achieved at:

x* = |β̂1 / (2β̂2)|

Page 35:

Negative returns to experience beyond 24.4 years?
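As a quick check on that number, a hedged sketch: the coefficient values below are assumed (the slide's regression output is not reproduced here), chosen to be consistent with the 24.4 quoted above, and plugged into the turning-point formula x* = |β̂1/(2β̂2)|.

```python
# Sketch: turning point of the quadratic experience profile.
b1 = 0.298     # assumed coefficient on exper
b2 = -0.0061   # assumed coefficient on exper squared

turning_point = abs(b1 / (2 * b2))
print(round(turning_point, 1))   # 24.4 years of experience
```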

Page 36:

• Alternatively, it could be that the coefficient on x is negative and that on x² is positive: a U‐shape.

• Or, the coefficients on x and x² could both be negative – draw the graph (no turning point).

• Or, the coefficients on x and x² could both be positive – draw the graph (no turning point).

• Would it be conceivable to include x, x² and x³ in a model?

Page 37:

Models with interaction terms

• Sometimes partial effects depend on the magnitude of another explanatory variable. Example:

y = β0 + β1x1 + β2x2 + β3x1x2 + u

• The partial effect of x1 is ∂y/∂x1 = β1 + β3x2. How do we interpret this partial effect?
• How do we interpret β2?
• Why is it of interest to test H0: β3 = 0, and how would you do it?
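A minimal sketch of the partial-effect calculation (the coefficient values are assumed purely for illustration):

```python
# Sketch: in y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + u,
# the partial effect of x1 is dy/dx1 = b1 + b3*x2.
def partial_effect_x1(b1: float, b3: float, x2: float) -> float:
    return b1 + b3 * x2

print(partial_effect_x1(b1=0.5, b3=0.2, x2=0.0))  # 0.5 when x2 = 0
print(partial_effect_x1(b1=0.5, b3=0.2, x2=3.0))  # 1.1 when x2 = 3
```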

Page 38:

Basic Econometrics

Chapter 8: Multiple Regression Analysis: Inference

Iris Wang

[email protected]

Page 39:

Testing hypotheses about the parameters

• In the previous lectures we have seen how the population model parameters can be estimated.

• Now we will turn to the problem of testing hypotheses about these parameters.
– For example, we may want to test the hypothesis that a certain parameter is equal to zero (or some other value).

– Alternatively, we may want to test the hypothesis that a group of parameters (e.g. all slope parameters) are equal to zero.

• Testing hypotheses is known as inference (because we infer conclusions about the true population parameters).

Page 40:

Testing hypotheses about a single population parameter: The t test

• This section covers a very important topic – testing hypotheses about a single parameter.

• Our starting point is the population model

y = β0 + β1x1 + β2x2 + … + βkxk + u

where we assume that the CLM assumptions hold.
• Our goal is to test hypotheses about a particular parameter βj.

• Remember: the βj are unknown parameters and we will never know them with certainty. But we can hypothesize about the value of βj and then use statistical inference to test our hypothesis.

Page 41:

Key result for testing hypotheses:

From Chapter 5 and under the CLM assumptions, we have the following:

(β̂j − βj) / se(β̂j) ~ t(n−k−1)

where k+1 is the number of unknown parameters in the population model (k slope parameters & the intercept).

In words, this says that the deviation between the estimated value and the true parameter value, divided by the standard error of the estimator, follows a t‐distribution with n−k−1 degrees of freedom.

Page 42:

The t distribution

• The pdf of the t distribution has a shape similar to the standard normal distribution*, except it’s more spread out and therefore has more area in the tails.

• As the degrees of freedom get large, the t distribution approaches the standard normal distribution.

* Normally distributed with mean zero and variance equal to 1.

Page 43:

Two‐sided t tests

• While of some interest, one‐sided tests are rarely used in econometrics.

• We consider tests of a null hypothesis like H0: βj = 0 against a two‐sided alternative like H1: βj ≠ 0.

• In words, H1 is that xj has a ceteris paribus effect on y, which could be either positive or negative.

• When the alternative is two‐sided, we are interested in the absolute value of the t‐statistic.

Rejection rule: reject H0 if |t| > c, where c is the relevant critical value.

Page 44:

Distribution of t under H0:

The rule is to reject if |t| > c.

If c=2.06 then I will reject a true null hypothesis 5% of the time.

Suppose I estimate beta at 0.75 and suppose the t‐value is 1.2.

Why do I not want to reject the null that beta is zero in such a case?
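A minimal sketch of where c comes from (df = 25 is an assumed value; it reproduces the 2.06 quoted above):

```python
# Sketch: two-sided 5% critical value from the t distribution.
from scipy.stats import t

c = t.ppf(0.975, df=25)   # 97.5th percentile -> 2.5% in each tail
print(round(c, 2))        # 2.06

# Here |t| = 1.2 < 2.06, so we cannot reject H0: beta = 0 at the 5% level.
```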

Page 45:

Example: Wage equation

log(wage) = β0 + β1educ + β2exper + β3tenure + u

• The null hypothesis H0: β2 = 0 means that, once education and tenure have been accounted for, years of experience has no effect on hourly wage.

• Is this an economically interesting hypothesis?

• Now let’s look at how we can carry out and interpret such a test.

Page 46:

Data wage1, SPSS output:

Coefficients (a)

Model           B      Std. Error   Beta       t     Sig.
1 (Constant)   .287      .104                2.745   .006
  educ         .092      .007       .479    12.519   .000
  exper        .004      .002       .104     2.367   .018
  tenure       .022      .003       .300     7.126   .000

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)

a. Dependent Variable: lwage

Page 47:

Testing the overall significance of the sample regression

• So far we have focused on tests of hypotheses referring to a single parameter βj.

• But we must often test hypotheses involving more than one of the population parameters.

• Consider the following wage model:

log(wage) = β0 + β1educ + β2exper + u

H0: β1 = β2 = 0

Page 48:

The overall test

• Note that the hypothesis above concerns two parameters, β1 and β2.

• We cannot simply use individual t tests to test this null hypothesis.

• We therefore use the F test instead (p. 240).

Page 49:

SPSS output:

ANOVA (b)

Model          Sum of Squares    df   Mean Square      F      Sig.
1 Regression        46.662        3      15.554      79.898   .000a
  Residual         101.425      521        .195
  Total            148.087      524

a. Predictors: (Constant), tenure, educ, exper
b. Dependent Variable: lwage

Page 50:

The F statistic for overall significance of a regression

• Consider the following model and null hypothesis:

y = β0 + β1x1 + β2x2 + … + βkxk + u

H0:    x1, x2,…, xk do not help to explain y

• The F statistic can be computed as

F = (R²/k) / [(1 − R²)/(n − k − 1)]

Page 51:

Model Summary

Model     R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .561a     .315           .311                .441219071

a. Predictors: (Constant), tenure, educ, exper

ANOVA (b)

Model          Sum of Squares    df   Mean Square      F      Sig.
1 Regression        46.662        3      15.554      79.898   .000a
  Residual         101.425      521        .195
  Total            148.087      524

a. Predictors: (Constant), tenure, educ, exper
b. Dependent Variable: lwage

F = (0.315/3) / [(1 − 0.315)/521] = 79.861

• This type of test determines the overall significance of the regression. 

• If we fail to reject the null hypothesis, our model has very little explanatory power
– it is not a significant improvement over a model with no explanatory variables!

• In such a case, we should probably look for other explanatory variables…
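A minimal sketch reproducing the overall F statistic from the R‐squared reported above (the small gap from SPSS's 79.898 comes from R² being rounded to .315):

```python
# Sketch: overall-significance F statistic from R-squared,
# F = (R2/k) / ((1 - R2)/(n - k - 1)).
from scipy.stats import f

r2, k, df_resid = 0.315, 3, 521
F = (r2 / k) / ((1 - r2) / df_resid)
print(round(F, 3))             # 79.861, matching the hand calculation above
print(f.sf(F, k, df_resid))    # p-value essentially zero -> reject H0
```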

Page 52:

Testing Multiple Restrictions: The F test

• Suppose we want to test multiple hypotheses about the parameters in our model.

• For example, suppose we want to test the null hypothesis that all of the slope coefficients are equal to zero, in a model of the following type:

y = β0 + β1x1 + β2x2 + … + βkxk + u

• How would you write down the null hypothesis?
• If we can’t reject the null hypothesis, what is the implication for our model?

Page 53:

Testing exclusion restrictions

• Goal: test whether a group of variables has no effect on the dependent variable.

• Consider the following model of (major league) baseball players’ salaries:

log(salary) = β0 + β1years + β2gamesyr + β3bavg + β4hrunsyr + β5rbisyr + u

(salary = total 1993 salary; years = years in the league; gamesyr = average games played per year; bavg = career batting average; hrunsyr = home runs per year; rbisyr = runs batted in per year)

”Exclusion restrictions”

State and explain the null hypothesis in words.

Page 54:

SPSS Results (MLB1.dta)

Coefficients (a)

Model            B      Std. Error   Beta       t     Sig.
1 (Constant)  11.204      .289               38.714   .000
  years         .069      .012       .225     5.651   .000
  gamesyr       .012      .003       .381     4.700   .000
  bavg          .001      .001       .031      .861   .390
  hrunsyr       .013      .016       .077      .836   .404
  rbisyr        .011      .007       .214     1.541   .124

a. Dependent Variable: lsalary

The test refers to the coefficients on bavg, hrunsyr and rbisyr.

So each of these coefficients is statistically insignificant.

Does that imply we should not reject H0?

Answer: No – since the null hypothesis refers to multiple restrictions.

Page 55:

• The key issue: how much does RSS increase when we impose the restrictions?

• We distinguish between:
– The unrestricted (new) model: no restrictions imposed

– The restricted (old) model: Some restrictions imposed

• In our case, the restricted model can be written as

log(salary) = β0 + β1years + β2gamesyr + u

Compared to the unrestricted model on the previous slide, we know the RSS must be higher for this restricted model (since the factors omitted now go into the residual u).

Page 56:

log(salary) = β0 + β1years + β2gamesyr + u

• Key question: Does RSS increase enough for it to be warranted to reject the null hypothesis?
– If RSS increases a lot when you exclude the last three explanatory variables => those variables have significant explanatory power (and should not be omitted).

– If RSS increases little when you exclude the last three explanatory variables => those variables have little explanatory power (and can be omitted).

Page 57:

The F statistic

F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]

where SSRr is the sum of squared residuals for the restricted model, SSRur is the SSR for the unrestricted model, and q is the number of restrictions imposed in moving from the unrestricted to the restricted model.

See p. 244

Equivalently, the F statistic can be written in terms of R‐squareds as

F = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − k − 1)]

Page 58:

The F statistic & the F distribution

• To use the F statistic we must know its sampling distribution under the null (this enables us to choose critical values & rejection rules).

• Under H0, F follows an F distribution with (q, n−k−1) degrees of freedom: F ~ F(q, n−k−1).

• The 10%, 5% and 1% critical values for the F distribution are given in Table D.3, p. 880.
• Rejection rule: reject H0 in favor of H1 at (say) the 5% significance level if F > c, where c is the 95th percentile of the F(q, n−k−1) distribution.

Page 59:

Page 60:

Unrestricted model:

ANOVA (b)

Model          Sum of Squares    df   Mean Square      F       Sig.
1 Regression       304.579        5      60.916      115.249   .000a
  Residual         182.881      346        .529
  Total            487.460      351

a. Predictors: (Constant), rbisyr, bavg, years, gamesyr, hrunsyr
b. Dependent Variable: lsalary

Restricted model:

ANOVA (b)

Model          Sum of Squares    df   Mean Square      F       Sig.
1 Regression       289.664        2     144.832      255.549   .000a
  Residual         197.795      349        .567
  Total            487.460      351

a. Predictors: (Constant), gamesyr, years
b. Dependent Variable: lsalary

F = [(197.795 − 182.881)/3] / (182.881/346) = 9.41

Implication???
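A minimal sketch reproducing this F statistic from the two ANOVA tables above:

```python
# Sketch: exclusion-restrictions F test,
# F = ((SSR_r - SSR_ur)/q) / (SSR_ur/(n - k - 1)).
ssr_r, ssr_ur = 197.795, 182.881   # restricted and unrestricted SSR (from above)
q, df_ur = 3, 346                  # 3 restrictions; n - k - 1 = 352 - 5 - 1

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
print(round(F, 2))                 # 9.41
```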

Page 61:

According to the following SPSS outputs, can you get the same result as I showed on the previous slide?

Model 1 Summary

Model     R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .790a     .625           .619                .72701947

a. Predictors: (Constant), rbisyr, bavg, years, gamesyr, hrunsyr

Model 2 Summary

Model     R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .771a     .594           .592                .75282740

a. Predictors: (Constant), gamesyr, years

Page 62:

Carrying out an F test is easy in SPSS…

ANOVA (b)

Model          Sum of Squares    df   Mean Square      F       Sig.
1 Regression       304.579        5      60.916      115.249   .000a
  Residual         182.881      346        .529
  Total            487.460      351

a. Predictors: (Constant), rbisyr, bavg, years, gamesyr, hrunsyr
b. Dependent Variable: lsalary

bavg, hrunsyr and rbisyr are jointly statistically significant in this model.

Coefficients (a)

Model            B      Std. Error   Beta       t     Sig.
1 (Constant)  11.204      .289               38.714   .000
  years         .069      .012       .225     5.651   .000
  gamesyr       .012      .003       .381     4.700   .000
  bavg          .001      .001       .031      .861   .390
  hrunsyr       .013      .016       .077      .836   .404
  rbisyr        .011      .007       .214     1.541   .124

a. Dependent Variable: lsalary

In view of this, how can we explain that the t statistics for these three variables are all insignificant (low t‐values)?

Multicollinearity: you will study it in Chap. 10.

Page 63:

Computing p‐values for F tests

• The p‐value is defined as

p‐value = P(F′ > F)

where F′ denotes an F‐distributed random variable with df = (q, n−k−1) and F is the actual value of the test statistic.

Interpretation of p: the probability of observing a value for F as large as we did, given that the null hypothesis is true.

For example, a p‐value of 0.016 implies that this probability is only 1.6% – hence we would reject the null hypothesis at the 5% level (but not at the 1% level, right?).
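A minimal sketch of the computation, reusing the F statistic from the baseball-salary example above; SciPy's survival function is exactly the upper-tail probability that defines the p-value:

```python
# Sketch: p-value of an F test = P(F' > F) for F' ~ F(q, n - k - 1).
from scipy.stats import f

F, q, df_resid = 9.41, 3, 346
p = f.sf(F, q, df_resid)   # survival function = 1 - cdf
print(p)                   # very small p-value -> reject H0 at any usual level
```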

Page 64:

Basic Econometrics

Chapter 9: Dummy Variable Regression Models

Iris Wang

[email protected]

Page 65:

Multiple Regression Analysis with Qualitative Information: Binary (or Dummy) Variables

• Previous chapters: The dependent and independent variables had quantitative meaning (e.g. hourly wage rate, years of education, etc.)

• We will now study methods for incorporating qualitative factors in regression analysis.

• Examples: Gender of individual (male or female); location (e.g. region); marital status (married or not); employment status (e.g. someone either has a job or not).

Page 66:

Describing qualitative information

• Qualitative factors often come in the form of binary information (e.g. female/male; employed or not…).

• Such information is captured by a binary variable, or a zero‐one variable.

• Binary variables are often called dummy variables.

• Suppose we know whether each person in the data is male or female. We want to use this information to construct a dummy variable, which perhaps we call female and which we set equal to one for all females and zero for all males. (Alternatively, we could define male to be equal to 1 for males and 0 for females.)
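A minimal sketch of this construction (the column names are illustrative, not taken from WAGE1.dta):

```python
# Sketch: building a zero-one dummy called `female` from a string column.
import pandas as pd

df = pd.DataFrame({"sex": ["male", "female", "female", "male"],
                   "wage": [7.10, 4.59, 5.20, 6.80]})
df["female"] = (df["sex"] == "female").astype(int)  # 1 = female, 0 = male
print(df)
```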

Page 67:

Example:

Page 68:

A single dummy independent variable

• Now consider adding a dummy variable to the regression. Suppose we add female to the wage model:

wage = β0 + δ0female + β1educ + u

• Interpretation of δ0: the difference in hourly wage between females and males, given the same amount of education (and the same error term u).

• Discrimination: if δ0 < 0, then for the same level of other factors, women earn less than men on average.

Page 69:

If we assume E(u|female, educ) = 0 (MLR 5), then:

δ0 = E(wage|female = 1, educ) − E(wage|female = 0, educ)

or simply

δ0 = E(wage|female, educ) − E(wage|male, educ)

• Important: The level of education is the same in both expectations, so the difference δ0 is due to gender only.

• Suppose we were to estimate the model with education excluded – how would that change the interpretation of the results?

Page 70:

Graphical illustration (δ0<0)

It’s like an intercept shift…

• Why are we not including a dummy variable for male?

• No need: two groups (males & females), so two intercepts are enough. Also note male = 1 − female, i.e. perfect collinearity.

• If female = 0 and male = 1, will you get the same results? Why or why not?

Page 71:

Base group (or benchmark group)

• In this specification we have chosen males to be the base group (or the benchmark group) – the group against which comparisons are made.

• We could alternatively write the model as

wage = α0 + γ0male + β1educ + u

(with females as the base group).

• How would the parameter estimates differ across these two specifications? Would the interpretation differ?

• Important! You need to know which group is the base group, otherwise the results are impossible to interpret.

Page 72:

Example: A simple test for wage discrimination

• Data: WAGE1.dta. Model:

wage = β0 + δ0female + β1educ + β2exper + β3tenure + u

• Summary statistics, by gender and overall:

Report

female            wage     educ    exper   tenure
male     Mean    7.0995   12.79   17.56    6.47
         N       274      274     274      274
female   Mean    4.5936   12.32   16.49    3.63
         N       251      251     251      251
Total    Mean    5.9014   12.57   17.05    5.11
         N       525      525     525      525

Note the big difference in average wage! 4.5936 − 7.0995 = −2.5059

There are also differences in the human capital variables – could it be that the wage difference across genders is due to differences in human capital?

Page 73:

How to do it in SPSS

Page 74:

OLS Results from SPSS

1) Interpret:
• i) all coefficients;
• ii) all t‐values;
• iii) all p‐values;
• iv) the R‐squared;
• v) the F‐test

2) Based on these results, would you argue there is evidence of gender wage discrimination?

Page 75:

Simple regression & how it relates to summary statistics

Coefficients (a)

Model            B      Std. Error   Beta       t     Sig.
1 (Constant)   7.099      .210               33.779   .000
  female      -2.506      .304      -.339    -8.244   .000

a. Dependent Variable: wage

Report (wage)

female   Mean     N
male     7.0995   274
female   4.5936   251
Total    5.9014   525

How do these results relate to each other?

Page 76:

Causality

• As with other independent variables, we should always ask if the estimated effect of the qualitative variable is causal.

• This depends on whether MLR.3 is satisfied or not.

• Example: Effect of a job training grant on hours of training per employee, controlling for sales and employment. The effect is estimated positive and is highly statistically significant.

– But perhaps firms receiving grants would have trained their workers more even in the absence of a grant…? That is, there might be omitted variable bias.

Page 77:

• Assumption 3: Zero conditional mean – the error ui has an expected value of zero given xi:

E(ui|xi) = 0

In deciding when the linear regression is going to produce unbiased estimators, Assumption 3 is crucial.

Page 78:

Interpreting coefficients on dummy explanatory variables when the dependent variable is log(y)

• Model:

log(wage) = β0 + δ0female + β1educ + u

• As you know, coefficients in models in which the dependent variable is in log form have a percentage interpretation.

• This is true for coefficients on dummy explanatory variables too.

• The exact percentage difference is of the form

100 · [exp(δ̂0) − 1]
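A minimal worked sketch, using the female coefficient (−.286) reported in the SPSS output on the next slide:

```python
# Sketch: exact vs. approximate percentage effect of a dummy in a log(y) model,
# exact % difference = 100 * (exp(delta_hat) - 1).
import math

delta_hat = -0.286
print(round(100 * delta_hat, 1))                  # -28.6: rough log approximation
print(round(100 * (math.exp(delta_hat) - 1), 1))  # -24.9: exact % difference
```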

Page 79:

9.4 Using dummy variables for multiple categories

• Add married (0 if not married, 1 if married) to the wage equation – now two dummies are included:

Coefficients (a)

Model            B      Std. Error   Beta       t     Sig.
1 (Constant)    .490      .101                4.837   .000
  female       -.286      .037      -.269    -7.650   .000
  educ          .084      .007       .437   12.015    .000
  exper         .003      .002       .080    1.860    .063
  tenure        .017      .003       .229    5.702    .000
  married       .126      .040       .115    3.140    .002

a. Dependent Variable: lwage

Page 80:

*9.5 Allowing for different slopes

Consider the model:

log(wage) = β0 + δ0female + β1educ + δ1(female × educ) + u

where δ0 allows a different intercept for females and δ1 a different slope coefficient (on educ) for females.