Chapter 12: Multiple Linear Regression – Doing it with more variables! More is better. (Chapter 12A)


Page 1:

Chapter 12: Multiple Linear Regression

Doing it with more variables!

More is better.

Chapter 12A

Page 2:

What are we doing?

Page 3:

12-1 Multiple Linear Regression Models

• Many applications of regression analysis involve situations in which there is more than one regressor variable.

• A regression model that contains more than one regressor variable is called a multiple regression model.

Page 4:

12-1.1 Introduction

• For example, suppose that the effective life of a cutting tool depends on the cutting speed and the tool angle. A possible multiple regression model could be

Y = β0 + β1x1 + β2x2 + ε

where

Y – tool life

x1 – cutting speed

x2 – tool angle

Page 5:

The Model

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

More than one regressor or predictor variable.

Linear in the unknown parameters – the β's.

β0 – the intercept, βi – partial regression coefficients, ε – errors.

Can handle nonlinear functions as predictors, e.g. X3 = Z².

Interactions can be present, e.g. X1X2 (see the sketch below).
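To make the "linear in the β's" point concrete, here is a minimal NumPy sketch (the data are made up purely for illustration) that builds a model matrix containing a squared term and an interaction and fits it by least squares:

```python
import numpy as np

# Made-up illustrative data: two measured predictors and a response.
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 10.0, size=30)
x2 = rng.uniform(0.0, 5.0, size=30)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.3 * x2**2 + 0.1 * x1 * x2 + rng.normal(0.0, 1.0, size=30)

# Columns of the model matrix: intercept, X1, X2, X3 = X2^2 (a nonlinear
# function used as a predictor), X4 = X1*X2 (an interaction).
# The model is still linear in the unknown betas.
X = np.column_stack([np.ones_like(x1), x1, x2, x2**2, x1 * x2])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # estimates of beta_0 ... beta_4
```

The nonlinear and interaction terms are just extra columns of numbers; the fitting step itself is unchanged.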

Page 6:

The Data

The data collection step in a regression analysis.

Page 7:

A Data Example

Team          Games Won  Passing Yds.  % Run Plays  Opp. Rushing Yds.
Oakland              13          2285         45.3               1903
Pittsburgh           10          2971         53.8               1457
Baltimore            11          2309         74.1               1848
Los Angeles          10          2528         65.4               1564
Dallas               11          2147         78.3               1821
Atlanta               4          1689         47.6               2577
Buffalo               2          2566         54.2               2476
Chicago               7          2363         48.0               1984

Example – Oakland games won:

13 = β0 + β1(2285) + β2(45.3) + β3(1903) + ε

Similar equation for every data point. More equations than β's (a small numerical sketch follows).
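As a rough sketch of what stacking those equations looks like in practice, the following NumPy snippet builds one row per team from the small table above and solves the resulting overdetermined system by least squares (only the eight teams listed here, so the estimates are illustrative rather than the textbook answer):

```python
import numpy as np

# The four columns from the table above (eight teams only).
games_won = np.array([13, 10, 11, 10, 11, 4, 2, 7], dtype=float)
passing   = np.array([2285, 2971, 2309, 2528, 2147, 1689, 2566, 2363], dtype=float)
pct_run   = np.array([45.3, 53.8, 74.1, 65.4, 78.3, 47.6, 54.2, 48.0])
opp_rush  = np.array([1903, 1457, 1848, 1564, 1821, 2577, 2476, 1984], dtype=float)

# One equation per team:
#   games_won_i = b0 + b1*passing_i + b2*pct_run_i + b3*opp_rush_i + error_i
# Eight equations, four betas: solve them in the least squares sense.
X = np.column_stack([np.ones(8), passing, pct_run, opp_rush])
beta_hat, *_ = np.linalg.lstsq(X, games_won, rcond=None)
print(beta_hat)
```

The matrix algebra on the following slides is exactly this calculation written out in general form.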

Page 8:

Least Squares Estimation of the Parameters

• The least squares function is given by

L = Σ(i=1 to n) εi² = Σ(i=1 to n) [yi − β0 − Σ(j=1 to k) βj xij]²

• The least squares estimates must satisfy

∂L/∂β0 = 0 and ∂L/∂βj = 0 for j = 1, 2, …, k, evaluated at β̂0, β̂1, …, β̂k.

Page 9:

The Least Squares Normal Equations

• Setting these partial derivatives to zero yields the normal equations; in matrix form they are X'Xβ̂ = X'y.

• The solutions to the normal equations are the least squares estimators of the regression coefficients.

Page 10:

The Matrix Approach

y = Xβ + ε

where

[y1]   [1  x11  x12  …  x1k] [β0]   [ε1]
[y2] = [1  x21  x22  …  x2k] [β1] + [ε2]
[ :]   [:   :    :        :] [ :]   [ :]
[yn]   [1  xn1  xn2  …  xnk] [βk]   [εn]

y – the vector of observed responses (the values the fitted model will predict)

X – our observations of the predictor variables (the n × p model matrix)

β – the vector of coefficients we must estimate

ε – unknown vector of error terms, possibly normally distributed

Page 11:

Solving those normal equations

y = Xβ
X'y = X'Xβ
(X'X)⁻¹X'y = (X'X)⁻¹(X'X)β = β

so that

β̂ = (X'X)⁻¹X'y

Here X' is the p × n transpose of X, so

       [1    1    …  1  ] [1  x11  …  x1k]
X'X =  [x11  x21  …  xn1] [1  x21  …  x2k]
       [ :    :       : ] [:    :       :]
       [x1k  x2k  …  xnk] [1  xn1  …  xnk]

Page 12:

Least-Squares in Matrix Form

L = Σ(i=1 to n) εi² = ε'ε = (y − Xβ)'(y − Xβ)
  = y'y − 2β'X'y + β'X'Xβ

∂L/∂β evaluated at β̂ equals 0 (the normal equations), which implies X'Xβ̂ = X'y.

Solving for β̂, we get β̂ = (X'X)⁻¹X'y, analogous to β̂1 = Sxy/Sxx in simple linear regression.
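A minimal numerical check of this algebra, with made-up data: solving the normal equations directly with np.linalg.solve and comparing against NumPy's built-in least squares routine.

```python
import numpy as np

# Made-up data just to exercise the algebra.
rng = np.random.default_rng(1)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # model matrix with intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Normal equations: (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same problem by a more numerically stable route.
beta_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_check))   # True
```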

Page 13:

More Matrix Approach

Page 14:

Example 12-2

Wire bonding is a method of making interconnections between a microchip and other electronics as part of semiconductor device fabrication.

Pages 15-19: Example 12-2 (continued)

Page 20:

Some Basic Terms and Concepts

Residuals are estimators of the error term in the regression model:

ei = yi − ŷi

We use an unbiased estimator of the variance of the error term:

σ̂² = Σ(i=1 to n) ei² / (n − p) = SSE / (n − p)

SSE is called the residual sum of squares and n − p is the residual degrees of freedom. 'Residual' – what remains after the regression explains all of the variability in the data it can.
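A short sketch of these definitions in NumPy (the function name is mine; X is the model matrix with p columns including the intercept, y the response vector):

```python
import numpy as np

def sigma2_hat(X, y):
    """Residual mean square SSE / (n - p): an unbiased estimate of the error variance."""
    n, p = X.shape                                    # p columns = k regressors + intercept
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat                              # residuals e_i = y_i - yhat_i
    sse = np.sum(e**2)                                # residual sum of squares
    return sse / (n - p)                              # n - p residual degrees of freedom
```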

Page 22:

Estimating σ²

An unbiased estimator of σ² is

σ̂² = SSE / (n − p)

Page 23:

Properties of the Least Squares Estimators

Note that in this treatment, the elements of X are not random variables. They are the observed values of the xij. We treat them as though they are constants, often coefficients of random variables like the εi.

E(β̂) = β

V(β̂) = σ²(X'X)⁻¹

•The first result says that the estimators are unbiased.

•The second result shows the covariance structure of the estimators – diagonal and off-diagonal elements

•It is important to remember that in a typical multiple regression model the estimates of the coefficients are not independent of one another.

Page 24:

Properties of the Least Squares Estimators

Unbiased estimators: E(β̂j) = βj, j = 0, 1, …, k

Covariance matrix: Cov(β̂) = σ²(X'X)⁻¹

Page 25:

Covariance Matrix of the Regression Coefficients

C = (X'X)⁻¹ is symmetric, so that Cij = Cji. For k = 2,

              [C00  C01  C02]
C = (X'X)⁻¹ = [C10  C11  C12]
              [C20  C21  C22]

V(β̂j) = σ²Cjj  and  cov(β̂i, β̂j) = σ²Cij for i ≠ j.

• In general, we do not know σ². We estimate it by the mean square error of the residuals, which gives estimated standard errors.

• The quality of our estimates of the regression coefficients is very much related to (X'X)⁻¹.

• The estimates of the coefficients are not independent of one another.
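A sketch of how the estimated covariance matrix and standard errors could be computed in NumPy (the function name is mine; X is assumed to include the intercept column):

```python
import numpy as np

def coef_standard_errors(X, y):
    """Least squares fit plus the estimated standard errors of the coefficients."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.sum((y - X @ beta_hat) ** 2) / (n - p)   # estimate of the error variance
    C = np.linalg.inv(X.T @ X)                           # C = (X'X)^-1, symmetric
    cov_beta = sigma2 * C                                # estimated covariance matrix of beta_hat
    se = np.sqrt(np.diag(cov_beta))                      # standard errors from the diagonal
    return beta_hat, se
```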

Page 26:

Test for Significance of Regression

The appropriate hypotheses are

H0: β1 = β2 = … = βk = 0
H1: βj ≠ 0 for at least one j

The test statistic is

F0 = MSR/MSE = (SSR/k) / [SSE/(n − p)]

Page 27:

ANOVA

•The basic idea is that the data (the yi values) have some variability – if they didn't, there would be nothing to explain.

•A successful model explains most of the variability, leaving little to be carried by the error term.
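A sketch of the overall F test behind this ANOVA idea, using the decomposition SST = SSR + SSE (assuming NumPy and SciPy are available; the function name is mine and X is assumed to include the intercept column):

```python
import numpy as np
from scipy import stats

def regression_f_test(X, y):
    """Overall significance test: H0: beta_1 = ... = beta_k = 0."""
    n, p = X.shape                                   # p = k + 1 including the intercept
    k = p - 1
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta_hat) ** 2)            # variability left unexplained
    sst = np.sum((y - y.mean()) ** 2)                # total variability in the y_i
    ssr = sst - sse                                  # variability explained by the regression
    f0 = (ssr / k) / (sse / (n - p))
    p_value = stats.f.sf(f0, k, n - p)               # upper-tail probability
    return f0, p_value
```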

Page 28:

R²

The coefficient of multiple determination:

R² = SSR/SST = 1 − SSE/SST

Page 29:

The Adjusted R²

• The adjusted R² statistic penalizes the analyst for adding terms to the model:

R²adj = 1 − [SSE / (n − p)] / [SST / (n − 1)]

• It can help guard against overfitting (including regressors that are not really useful).
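A small NumPy sketch of both quantities (function name mine; X assumed to include the intercept column):

```python
import numpy as np

def r2_and_adjusted_r2(X, y):
    """R^2 = 1 - SSE/SST and the adjusted version, which penalizes added terms."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - p)) / (sst / (n - 1))
    return r2, adj_r2
```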

Page 30:

Tests on Individual Regression Coefficients and Subsets of Coefficients

The hypotheses on an individual coefficient are

H0: βj = βj0
H1: βj ≠ βj0

The test statistic is

t0 = (β̂j − βj0) / √(σ̂²Cjj) = (β̂j − βj0) / se(β̂j)

• Reject H0 if |t0| > tα/2,n−p.

• This is called a partial or marginal test.
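A sketch of these partial t tests for the common special case βj0 = 0, assuming NumPy and SciPy (function name mine):

```python
import numpy as np
from scipy import stats

def coef_t_tests(X, y):
    """Partial (marginal) t tests of H0: beta_j = 0 for every coefficient."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.sum((y - X @ beta_hat) ** 2) / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t0 = beta_hat / se                                # (beta_hat_j - 0) / se(beta_hat_j)
    p_values = 2 * stats.t.sf(np.abs(t0), n - p)      # two-sided p-values
    return t0, p_values
```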

Page 31:

Linear Independence of the Predictors - some random thoughts

Instabilities in the regression coefficients will occur when the values of one predictor are 'nearly' a linear combination of the other predictors.

It would be incredibly unlikely that you would get an exact linear dependence. Coming close is bad enough.

What is the dimension of the space you are working in? It is n, where n is the number of data points in your sample. The response you are trying to match is an n-dimensional vector, and you are trying to match it with a set of k (k << n) predictors. The predictors had better be related to the response if this is going to be successful!

Page 32:

Interactions and Higher Order Terms – still thinking randomly

Including interaction terms (products of two predictors), higher order terms, or functions of predictors does not make the model nonlinear.

Suppose you believe that the following relation may apply:

Y = β0 + β1X1 + β22X2² + β23X2X3 + β4exp(X4) + ε

This is still a linear regression model – linear in the β's. After recording the values of X1 through X4, you simply calculate the values of the predictors and enter them into the columns of the worksheet for the regression software.

The model would become nonlinear if you were trying to estimate a parameter inside of the exponential function, e.g. a term like β4exp(γX4) in which γ must also be estimated.

Page 33:

The NFL Again – problem 12-15

Predictor variables:

Att – pass attempts
Comp – completed passes
Pct Comp – percent completed passes
Yds – yards gained passing
Yds per Att – yards gained per pass attempt
Pct TD – percent of attempts that are TDs
Long – longest pass completion
Int – number of interceptions
Pct Int – percentage of attempts that are interceptions

Response variable – quarterback rating

Page 34:

The NFL Again – problem 12-15

Player                  Att  Comp  Pct Comp  Yds   Yds/Att  TD  Pct TD  Long  Int  Pct Int  Rating
D.Culpepper, MIN        548  379   69.2      4717  8.61     39  7.1     82    11   2        110.9
D.McNabb, PHI           469  300   64        3875  8.26     31  6.6     80    8    1.7      104.7
B.Griese, TAM           336  233   69.3      2632  7.83     20  6       68    12   3.6      97.5
M.Bulger, STL           485  321   66.2      3964  8.17     21  4.3     56    14   2.9      93.7
B.Favre, GBP            540  346   64.1      4088  7.57     30  5.6     79    17   3.1      92.4
J.Delhomme, CAR         533  310   58.2      3886  7.29     29  5.4     63    15   2.8      87.3
K.Warner, NYG           277  174   62.8      2054  7.42     6   2.2     62    4    1.4      86.5
M.Hasselbeck, SEA       474  279   58.9      3382  7.14     22  4.6     60    15   3.2      83.1
A.Brooks, NOS           542  309   57        3810  7.03     21  3.9     57    16   3        79.5
T.Rattay, SFX           325  198   60.9      2169  6.67     10  3.1     65    10   3.1      78.1
M.Vick, ATL             321  181   56.4      2313  7.21     14  4.4     62    12   3.7      78.1
J.Harrington, DET       489  274   56        3047  6.23     19  3.9     62    12   2.5      77.5
V.Testaverde, DAL       495  297   60        3532  7.14     17  3.4     53    20   4        76.4
P.Ramsey, WAS           272  169   62.1      1665  6.12     10  3.7     51    11   4        74.8
J.McCown, ARI           408  233   57.1      2511  6.15     11  2.7     48    10   2.5      74.1
P.Manning, IND          497  336   67.6      4557  9.17     49  9.9     80    10   2        121.1
D.Brees, SDC            400  262   65.5      3159  7.9      27  6.8     79    7    1.8      104.8
B.Roethlisberger, PIT   295  196   66.4      2621  8.88     17  5.8     58    11   3.7      98.1
T.Green, KAN            556  369   66.4      4591  8.26     27  4.9     70    17   3.1      95.2
T.Brady, NEP            474  288   60.8      3692  7.79     28  5.9     50    14   3        92.6
C.Pennington, NYJ       370  242   65.4      2673  7.22     16  4.3     48    9    2.4      91
B.Volek, TEN            357  218   61.1      2486  6.96     18  5       48    10   2.8      87.1
J.Plummer, DEN          521  303   58.2      4089  7.85     27  5.2     85    20   3.8      84.5
D.Carr, HOU             466  285   61.2      3531  7.58     16  3.4     69    14   3        83.5
B.Leftwich, JAC         441  267   60.5      2941  6.67     15  3.4     65    10   2.3      82.2
C.Palmer, CIN           432  263   60.9      2897  6.71     18  4.2     76    18   4.2      77.3
J.Garcia, CLE           252  144   57.1      1731  6.87     10  4       99    9    3.6      76.7
D.Bledsoe, BUF          450  256   56.9      2932  6.52     20  4.4     69    16   3.6      76.6
K.Collins, OAK          513  289   56.3      3495  6.81     21  4.1     63    20   3.9      74.8
K.Boller, BAL           464  258   55.6      2559  5.52     13  2.8     57    11   2.4      70.9

Page 35:

The NFL Again – problem 12-15

Fit a multiple regression model using Pct Comp, Pct TD, and Pct Int.

Estimate σ².

Determine the standard errors of the regression coefficients.

Predict the rating when Pct Comp = 60%, Pct TD = 4%, and Pct Int = 3%.
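One way these four steps might be carried out in NumPy; the arrays below are transcribed from the table on page 34, so they (and the resulting numbers) should be checked against the text before being quoted:

```python
import numpy as np

# Columns taken from the table on page 34 (30 quarterbacks).
pct_comp = np.array([69.2, 64, 69.3, 66.2, 64.1, 58.2, 62.8, 58.9, 57, 60.9,
                     56.4, 56, 60, 62.1, 57.1, 67.6, 65.5, 66.4, 66.4, 60.8,
                     65.4, 61.1, 58.2, 61.2, 60.5, 60.9, 57.1, 56.9, 56.3, 55.6])
pct_td   = np.array([7.1, 6.6, 6, 4.3, 5.6, 5.4, 2.2, 4.6, 3.9, 3.1,
                     4.4, 3.9, 3.4, 3.7, 2.7, 9.9, 6.8, 5.8, 4.9, 5.9,
                     4.3, 5, 5.2, 3.4, 3.4, 4.2, 4, 4.4, 4.1, 2.8])
pct_int  = np.array([2, 1.7, 3.6, 2.9, 3.1, 2.8, 1.4, 3.2, 3, 3.1,
                     3.7, 2.5, 4, 4, 2.5, 2, 1.8, 3.7, 3.1, 3,
                     2.4, 2.8, 3.8, 3, 2.3, 4.2, 3.6, 3.6, 3.9, 2.4])
rating   = np.array([110.9, 104.7, 97.5, 93.7, 92.4, 87.3, 86.5, 83.1, 79.5, 78.1,
                     78.1, 77.5, 76.4, 74.8, 74.1, 121.1, 104.8, 98.1, 95.2, 92.6,
                     91, 87.1, 84.5, 83.5, 82.2, 77.3, 76.7, 76.6, 74.8, 70.9])

n = len(rating)
X = np.column_stack([np.ones(n), pct_comp, pct_td, pct_int])
p = X.shape[1]

# Fit: rating = b0 + b1*PctComp + b2*PctTD + b3*PctInt + error
beta_hat, *_ = np.linalg.lstsq(X, rating, rcond=None)

# Estimate sigma^2 and the standard errors of the coefficients
e = rating - X @ beta_hat
sigma2 = np.sum(e**2) / (n - p)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# Predicted rating at Pct Comp = 60, Pct TD = 4, Pct Int = 3
x0 = np.array([1.0, 60.0, 4.0, 3.0])
print(beta_hat, sigma2, se, x0 @ beta_hat)
```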

Page 36:

Now the solutions

Page 37:

More NFL – problem 12-31

Test the regression model for significance using α = .05. Find the p-value. Conduct a t-test on each regression coefficient.

These are very good problems to answer.

Page 38:

Again with the answers

Page 39:

Even more answers

Page 40:

Next Time

Confidence Intervals, again

Modeling and Model Adequacy

Also, Doing it with Computers

Computers are good.