Chapter 13: Multiple Regression

Upload: norah-oconnor

Post on 05-Jan-2016



Chapter 13

Multiple Regression


Chapter Outline

• Multiple Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Estimation and Prediction
• Categorical Independent Variables


Multiple Regression Model

The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . , xp and an error term \(\varepsilon\) is:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \]

where \(\beta_0, \beta_1, \beta_2, \ldots, \beta_p\) are the parameters, and \(\varepsilon\) is a random variable called the error term.


Multiple Regression Equation

The multiple regression equation describes how the mean value of y is related to x1, x2, . . . , xp:

\[ E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \]


Estimated Multiple Regression Equation

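A simple random sample is used to compute sample statistics \(b_0, b_1, b_2, \ldots, b_p\) that are used as the point estimates of the parameters \(\beta_0, \beta_1, \beta_2, \ldots, \beta_p\). The estimated multiple regression equation is:

\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p \]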


Estimation Process

1. Multiple regression model: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \)
   Multiple regression equation: \( E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \)
   The unknown parameters are \(\beta_0, \beta_1, \beta_2, \ldots, \beta_p\).

2. Sample data: observed values of x1, x2, . . . , xp and y.

3. Estimated multiple regression equation: \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p \)
   The sample statistics are \(b_0, b_1, b_2, \ldots, b_p\).

4. \(b_0, b_1, b_2, \ldots, b_p\) provide estimates of \(\beta_0, \beta_1, \beta_2, \ldots, \beta_p\).


Least Squares Method

Least Squares Criterion

\[ \min \sum (y_i - \hat{y}_i)^2 \]

Computation of Coefficient Values:

The formulas for the regression coefficients b0, b1, b2, . . . , bp involve the use of matrix algebra.

We will rely on computer software packages to perform the calculations.
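For reference, the matrix-algebra formulas mentioned here reduce to the standard normal equations. The notation is an assumption that the slides do not use: X is the n × (p + 1) matrix whose first column is all ones and whose remaining columns hold the observed values of x1, . . . , xp; y is the vector of the n observed responses; and b is the vector of coefficients. Provided the matrix XᵀX is invertible, the least squares solution is:

```latex
\[
  \mathbf{b} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},
  \qquad
  \mathbf{b} = (b_0, b_1, \ldots, b_p)^{\top}
\]
```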


Multiple Regression

Example: Employee Salary Survey

A local firm collected data for a sample of 20 employees. A suggestion was made that regression analysis could be used to determine whether salary was related to the years of experience and the score on the firm’s aptitude test.

The gender, years of experience, aptitude test score, and corresponding annual salary ($1000s) for the 20 sampled employees are shown on the next slide.


Multiple Regression

Example: Employee Salary Survey (data)

Gender  Years of Experience  Score  Salary ($1,000)
F        4                    78    24.0
M        7                   100    43.0
F        1                    86    23.7
M        5                    82    34.3
M        8                    86    35.8
M       10                    84    38.0
F        0                    75    22.2
F        1                    80    23.1
F        6                    83    30.0
M        6                    91    33.0
M        9                    88    38.0
F        2                    73    26.6
M       10                    75    36.2
F        5                    81    31.6
F        6                    74    29.0
M        8                    87    34.0
F        4                    79    30.1
M        6                    94    33.9
F        3                    70    28.2
F        3                    89    30.0


Multiple Regression Model

Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the aptitude test (x2) by the following regression model:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \]

where
y = annual salary ($1000)
x1 = years of experience
x2 = score on aptitude test


Solving for the Estimates of \(\beta_0, \beta_1, \beta_2\)

Input Data

x1   x2    y
 4   78   24
 7  100   43
 .    .    .
 .    .    .
 3   89   30

Least Squares: Computer Package for Solving Multiple Regression Problems

Output

b0 =
b1 =
b2 =
R2 =
etc.


Solving for the Estimates of \(\beta_0, \beta_1, \beta_2\)

Excel’s Regression Output – Parameter Estimates

                     Coefficients  Standard Error  t Stat  P-value
Intercept            3.1739        6.1561          0.5156  0.6128
Years of Experience  1.4039        0.1986          7.0702  0.0000
Score                0.2509        0.0774          3.2433  0.0048

Note: All numbers are rounded to four decimal places.
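The slides leave the computation to Excel. As a rough cross-check, the sketch below fits the same two-variable model to the 20 observations from the data slide by solving the normal equations (X'X)b = X'y with plain Gaussian elimination. The helper names (`solve_linear_system`, `least_squares`) are illustrative and not from the slides, and this is one simple way to do the arithmetic rather than how Excel does it.

```python
# Employee salary survey data from the slides: (years, score, salary in $1000s)
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve_linear_system(a, rhs):
    """Solve a * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, y):
    """Return [b0, b1, ..., bp] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


predictors = [(years, score) for years, score, _ in DATA]
salaries = [salary for _, _, salary in DATA]
b0, b1, b2 = least_squares(predictors, salaries)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The printed coefficients should agree with the Coefficients column of the Excel output to about four decimal places. For larger or poorly conditioned problems, a numerical library would be a safer choice than hand-rolled elimination.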


Estimated Regression Equation

SALARY = 3.174 + 1.404(YEARS) + 0.251(SCORE)

Note: Predicted salary will be in thousands of dollars.


Interpreting the Coefficients

In multiple regression analysis, we interpret each regression coefficient as follows: bi represents an estimate of the change in y corresponding to a 1-unit increase in xi when all other independent variables are held constant.


Interpreting the Coefficients

b1 = 1.404

Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on aptitude test is held constant).


Interpreting the Coefficients

b2 = 0.251

Salary is expected to increase by $251 for each additional point scored on the aptitude test (when the variable years of experience is held constant).
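The "held constant" reading of the coefficients can be checked numerically. The sketch below plugs values into the estimated equation SALARY = 3.174 + 1.404(YEARS) + 0.251(SCORE) and changes one input at a time. The starting point of 5 years and a score of 80 is a hypothetical employee chosen only for illustration.

```python
def predicted_salary(years, score):
    """Estimated regression equation from the slides (salary in $1000s)."""
    return 3.174 + 1.404 * years + 0.251 * score


# Hypothetical baseline employee, not taken from the slides.
base_years, base_score = 5, 80

# One more year of experience, aptitude score unchanged.
effect_of_year = (predicted_salary(base_years + 1, base_score)
                  - predicted_salary(base_years, base_score))

# One more aptitude point, years of experience unchanged.
effect_of_point = (predicted_salary(base_years, base_score + 1)
                   - predicted_salary(base_years, base_score))

print(f"Extra year:  {effect_of_year * 1000:,.0f} dollars")
print(f"Extra point: {effect_of_point * 1000:,.0f} dollars")
```

Because the model is linear, these differences equal b1 and b2 whatever baseline is chosen.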


Multiple Coefficient of Determination

• Relationship Among SST, SSR, SSE

SST = SSR + SSE

\[ \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 \]

where:
SST = total sum of squares (i.e., total variability of y)
SSR = sum of squares due to regression (i.e., the variability of y that is explained by regression)
SSE = sum of squares due to error (i.e., the variability of y that cannot be explained by regression)


Multiple Coefficient of Determination

Excel’s ANOVA Output

ANOVA
            df  SS           MS        F         Significance F
Regression   2  500.3285303  250.1643  42.76013  2.32774E-07
Residual    17  99.45696969  5.85041
Total       19  599.7855

SSR is the Regression sum of squares (500.3285303); SST is the Total sum of squares (599.7855).


Multiple Coefficient of Determination

\[ R^2 = SSR/SST = 500.3285/599.7855 = .83418 \]

The regression relationship is strong. About 83.4% of the variability in the salary of employees can be explained by the years of experience and the aptitude score.
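The calculation can be reproduced directly from the sums of squares in Excel's ANOVA output. The short sketch below also checks the identity SST = SSR + SSE. The variable names are illustrative.

```python
# Sums of squares copied from Excel's ANOVA output on the slides.
ssr = 500.3285303   # Regression row
sse = 99.45696969   # Residual row
sst = 599.7855      # Total row

# SST = SSR + SSE should hold up to the rounding of the printed values.
decomposition_gap = abs((ssr + sse) - sst)

# Multiple coefficient of determination.
r_squared = ssr / sst

print(f"SSR + SSE differs from SST by {decomposition_gap:.1e}")
print(f"R^2 = {r_squared:.5f}")
```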


Adjusted Multiple Coefficient of Determination

\[ R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

\[ R_a^2 = 1 - (1 - .834179)\,\frac{20 - 1}{20 - 2 - 1} = .814671 \]

Note: p is the number of slope coefficients.
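The adjustment is easy to wrap in a small function. This sketch applies the formula to the salary example (n = 20, p = 2, R² = .834179). The function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p slope coefficients."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


r2_adj = adjusted_r_squared(0.834179, n=20, p=2)
print(f"Adjusted R^2 = {r2_adj:.6f}")
```

Because (n - 1)/(n - p - 1) is greater than 1 whenever p is at least 1, the adjusted value is never larger than R² itself.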


Assumptions About the Error Term

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \]

1. The error \(\varepsilon\) is a random variable with mean of zero.

2. The variance of \(\varepsilon\), denoted by \(\sigma^2\), is the same for all values of the independent variables.

3. The values of \(\varepsilon\) are independent.

4. The error \(\varepsilon\) is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by \(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\).


Testing for Significance

In simple linear regression, the F and t tests provide the same conclusion.

In multiple regression, the F and t tests have different purposes.


Testing for Significance

The F test is used to test the overall significance of a regression model.

The t test is used to test individual significance, i.e., whether each of the individual independent variables is significant.


Testing for Significance: F Test

Hypotheses

\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \]

Ha: One or more of the parameters is not equal to zero.

Test Statistics

F = MSR/MSE

Rejection Rule

Reject H0 if p-value < \(\alpha\) or if F > \(F_\alpha\), where \(F_\alpha\) is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.


F Test: Employee Salary Survey

Hypotheses

\[ H_0: \beta_1 = \beta_2 = 0 \]

Ha: One or both of the parameters is not equal to zero.

Test Statistics

F = MSR/MSE = 250.16/5.85 = 42.76

Rejection Rule

For \(\alpha\) = .05 and d.f. = (2, 17); F.05 = 3.59

Reject H0 if p-value < .05 or F > 3.59


F Test: Employee Salary Survey

ANOVA
            df  SS           MS        F         Significance F
Regression   2  500.3285303  250.1643  42.76013  2.32774E-07
Residual    17  99.45696969  5.85041
Total       19  599.7855

The p-value is the Significance F entry (2.32774E-07).

Conclusion: p-value < .05, so we can reject H0. (Also, F = 42.76 > 3.59.)
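The F statistic follows from the ANOVA sums of squares and their degrees of freedom. The sketch below recomputes it and applies the critical-value form of the rejection rule, using the F.05 = 3.59 quoted on the slides rather than computing it, since the Python standard library has no F distribution.

```python
# ANOVA inputs from the slides.
ssr = 500.3285303   # sum of squares due to regression
sse = 99.45696969   # sum of squares due to error
n, p = 20, 2        # observations and independent variables

msr = ssr / p             # mean square due to regression, p d.f.
mse = sse / (n - p - 1)   # mean square due to error, n - p - 1 d.f.
f_stat = msr / mse

F_CRITICAL = 3.59  # F.05 with (2, 17) d.f., as given on the slides
reject_h0 = f_stat > F_CRITICAL

print(f"MSR = {msr:.4f}, MSE = {mse:.5f}, F = {f_stat:.5f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```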


Testing for Significance: t Test

Hypotheses

\[ H_0: \beta_i = 0 \]

\[ H_a: \beta_i \neq 0 \]

Test Statistics

\[ t = \frac{b_i}{s_{b_i}} \]

Rejection Rule

Reject H0 if p-value < \(\alpha\) or if \(t < -t_{\alpha/2}\) or \(t > t_{\alpha/2}\), where \(t_{\alpha/2}\) is based on a t distribution with n - p - 1 degrees of freedom.


t Test: Employee Salary Survey

Hypotheses

\[ H_0: \beta_i = 0 \]

\[ H_a: \beta_i \neq 0 \]

Test Statistics

\[ t = \frac{b_1}{s_{b_1}} = \frac{1.4039}{.1986} = 7.07 \]

\[ t = \frac{b_2}{s_{b_2}} = \frac{.25089}{.07735} = 3.24 \]

Rejection Rule

For \(\alpha\) = .05 and d.f. = 17, t.025 = 2.11

Reject H0 if p-value < .05, or if t < -2.11 or t > 2.11


t Test: Employee Salary Survey

                     Coefficients  Standard Error  t Stat  P-value
Intercept            3.1739        6.1561          0.5156  0.6128
Years of Experience  1.4039        0.1986          7.0702  0.0000
Score                0.2509        0.0774          3.2433  0.0048

Conclusions

Reject both \(H_0: \beta_1 = 0\) and \(H_0: \beta_2 = 0\).

Both independent variables are significant.
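The two t statistics can be recomputed from the coefficient and standard-error columns. The sketch below does so and applies the two-sided rule with the critical value 2.11 taken from the slides. Since the inputs are the rounded four-decimal values, the results differ slightly from Excel's t Stat column in the third decimal place.

```python
T_CRITICAL = 2.11  # t.025 with 17 d.f., as given on the slides

# (coefficient, standard error) pairs copied from the Excel output.
estimates = {
    "Years of Experience": (1.4039, 0.1986),
    "Score": (0.2509, 0.0774),
}

# t = b_i / s_b_i for each independent variable.
t_stats = {name: b / s_b for name, (b, s_b) in estimates.items()}

# Two-sided rule: reject H0 when t < -2.11 or t > 2.11.
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "reject H0" if significant[name] else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```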


Test for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated (say, |r| > .7), it becomes difficult to determine the separate effect of any particular independent variable on the dependent variable.
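As an illustration of the rule of thumb, the sketch below computes the sample correlation between the two independent variables in the salary example (years of experience and aptitude score) and compares it with the .7 threshold. The slides do not carry out this check themselves, and the helper `pearson_r` is an illustrative name.

```python
import math

# Independent variables from the employee salary data on the slides.
years = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
scores = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
          88, 73, 75, 81, 74, 87, 79, 94, 70, 89]


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(years, scores)
flagged = abs(r) > 0.7  # rule-of-thumb threshold from the slides

print(f"r(years, score) = {r:.3f}")
print("Possible multicollinearity" if flagged else "Below the .7 threshold")
```

With more than two independent variables, each pair would need to be checked in the same way.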


Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression.

We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of \(\hat{y}\) as the point estimate.
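The substitution step can be sketched as a small function that works for any number of independent variables. The coefficients are those of the estimated salary equation from the earlier slides. The input of 5 years of experience and an aptitude score of 80 is a hypothetical case, not one worked on the slides.

```python
def point_estimate(coefficients, x_values):
    """Return y-hat = b0 + b1*x1 + ... + bp*xp."""
    b0, *slopes = coefficients
    if len(slopes) != len(x_values):
        raise ValueError("expected one x value per slope coefficient")
    return b0 + sum(b * x for b, x in zip(slopes, x_values))


# b0, b1, b2 from SALARY = 3.174 + 1.404(YEARS) + 0.251(SCORE)
coefficients = [3.174, 1.404, 0.251]

# Hypothetical employee: 5 years of experience, aptitude score of 80.
y_hat = point_estimate(coefficients, [5, 80])
print(f"Point estimate of salary: {y_hat:.3f} (in $1000s)")
```

The same number serves as the point estimate of the mean salary for all such employees and as the prediction for one individual. Interval estimates around it would differ, which this sketch does not cover.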


Categorical Independent Variables

In many situations we must work with categorical independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, xi might represent gender where xi = 0 indicates male and xi = 1 indicates female.

In this case, xi is called a dummy or indicator variable.


Categorical Independent Variables

Example: Employee Salary Survey

As an extension of the problem involving the employee salary survey, suppose that management wants to find out if the annual salary is related to employees’ gender.

The years of experience, the score on the aptitude test, employees’ gender, and the annual salary ($000) for each of the 20 sampled employees are shown on the next slide.


Estimated Regression Equation

\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 \]

where:
\(\hat{y}\) = annual salary ($1000)
x1 = years of experience
x2 = score on aptitude test
x3 = 0 if an employee is female; 1 if an employee is male

x3 is a dummy variable.
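Before the regression can be run, the gender labels have to be turned into the 0/1 values of x3. A minimal sketch of that coding step follows, using the F/M labels in the order of the original data slide (first column of ten employees, then the second). The function name `encode_gender` is illustrative.

```python
# Gender labels for the 20 employees, in the order of the data slide.
GENDERS = ["F", "M", "F", "M", "M", "M", "F", "F", "F", "M",
           "M", "F", "M", "F", "F", "M", "F", "M", "F", "F"]


def encode_gender(label):
    """Return the dummy value x3: 0 for female ('F'), 1 for male ('M')."""
    coding = {"F": 0, "M": 1}
    if label not in coding:
        raise ValueError(f"unexpected gender label: {label!r}")
    return coding[label]


x3 = [encode_gender(label) for label in GENDERS]
print(x3)
print(f"{sum(x3)} of {len(x3)} employees are coded x3 = 1")
```

With this coding, b3 estimates the difference in expected salary between male and female employees who have the same experience and aptitude score.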


Categorical Independent Variables

Years of Experience  Score  Gender  Salary ($1,000)
 4                    78    0       24.0
 7                   100    1       43.0
 1                    86    0       23.7
 5                    82    1       34.3
 8                    86    1       35.8
10                    84    1       38.0
 0                    75    0       22.2
 1                    80    0       23.1
 6                    83    0       30.0
 6                    91    1       33.0
 9                    88    1       38.0
 2                    73    0       26.6
10                    75    1       36.2
 5                    81    0       31.6
 6                    74    0       29.0
 8                    87    1       34.0
 4                    79    0       30.1
 6                    94    1       33.9
 3                    70    0       28.2
 3                    89    0       30.0


Categorical Independent Variables

Excel’s Regression Statistics

Regression Statistics
Multiple R         0.92021524
R Square           0.84679609
Adjusted R Square  0.81807035
Standard Error     2.3964751
Observations       20


Categorical Independent Variables

Excel’s ANOVA Output

ANOVA
            df  SS           MS        F         Significance F
Regression   3  507.8960134  169.2987  29.47866  9.41675E-07
Residual    16  91.88948657  5.743093
Total       19  599.7855
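The entries in Excel's Regression Statistics block for the three-variable model can all be derived from this ANOVA table. The sketch below recomputes them, using the usual definitions of Multiple R as the square root of R Square and Standard Error as the square root of MSE. Those definitions are standard but are not spelled out on the slides.

```python
import math

# ANOVA inputs for the model with years, score, and gender.
ssr = 507.8960134
sse = 91.88948657
sst = 599.7855
n, p = 20, 3

r_square = ssr / sst
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - p - 1)

mse = sse / (n - p - 1)          # residual mean square, 16 d.f.
standard_error = math.sqrt(mse)  # standard error of the estimate
f_stat = (ssr / p) / mse

print(f"Multiple R        = {multiple_r:.8f}")
print(f"R Square          = {r_square:.8f}")
print(f"Adjusted R Square = {adjusted_r_square:.8f}")
print(f"Standard Error    = {standard_error:.7f}")
print(f"F                 = {f_stat:.5f}")
```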


Categorical Independent Variables

Excel’s Regression Equation Output

                     Coefficients  Standard Error  t Stat    P-value
Intercept            7.94484872    7.380797058     1.076422  0.297702
Years of Experience  1.14758173    0.29760152      3.856102  0.001397
Score                0.19693699    0.089903726     2.190532  0.04364
Gender               2.28042384    1.986610668     1.147897  0.267885

Not significant: Gender (P-value 0.267885 > .05)
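The "not significant" verdict for Gender can be reproduced from the table by comparing each P-value with α. The sketch below assumes α = .05, the level used in the earlier tests on the slides, and recomputes each t statistic as coefficient divided by standard error as a consistency check.

```python
ALPHA = 0.05  # significance level assumed from the earlier slides

# (coefficient, standard error, p-value) for each independent variable,
# copied from Excel's output for the three-variable model.
table = {
    "Years of Experience": (1.14758173, 0.29760152, 0.001397),
    "Score": (0.19693699, 0.089903726, 0.04364),
    "Gender": (2.28042384, 1.986610668, 0.267885),
}

t_stats = {}
significant = {}
for name, (coefficient, std_error, p_value) in table.items():
    t_stats[name] = coefficient / std_error   # t = b_i / s_b_i
    significant[name] = p_value < ALPHA       # p-value form of the rule

for name in table:
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t_stats[name]:.4f} -> {verdict}")
```

On this evidence, gender adds little once experience and aptitude score are in the model. The adjusted R Square also moves only from .8147 to .8181.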