10 - regression 1
TRANSCRIPT
-
7/29/2019 10 - Regression 1
1/58
Simple Linear Regression
Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation
for Estimation and Prediction
Residual Analysis: Validating Model Assumptions
Outliers and Influential Observations
Simple Linear Regression
Managerial decisions often are based on the relationship between two or more variables.
Regression analysis can be used to develop an equation showing how the variables are related.
The variable being predicted is called the dependent variable and is denoted by y.
The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.
Simple Linear Regression
Simple linear regression involves one independent variable and one dependent variable.
The relationship between the two variables is approximated by a straight line.
Regression analysis involving two or more independent variables is called multiple regression.
Simple Linear Regression Model
The equation that describes how y is related to x and an error term is called the regression model.
The simple linear regression model is:
  y = β0 + β1x + ε
where:
  β0 and β1 are called parameters of the model,
  ε is a random variable called the error term.
Simple Linear Regression Equation
The simple linear regression equation is:
  E(y) = β0 + β1x
The graph of the regression equation is a straight line.
β0 is the y intercept of the regression line.
β1 is the slope of the regression line.
E(y) is the expected value of y for a given x value.
Simple Linear Regression Equation
Positive Linear Relationship
[Figure: E(y) vs. x; regression line with positive slope β1 and y intercept β0]
Simple Linear Regression Equation
Negative Linear Relationship
[Figure: E(y) vs. x; regression line with negative slope β1 and y intercept β0]
Simple Linear Regression Equation
No Relationship
[Figure: E(y) vs. x; horizontal regression line with slope β1 = 0 and y intercept β0]
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
  ŷ = b0 + b1x
ŷ is the estimated value of y for a given x value.
b1 is the slope of the line.
b0 is the y intercept of the line.
The graph is called the estimated regression line.
Estimation Process
Regression Model:  y = β0 + β1x + ε
Regression Equation:  E(y) = β0 + β1x
Unknown Parameters:  β0, β1
Sample Data:  (x1, y1), ..., (xn, yn)
Sample Statistics:  b0, b1
Estimated Regression Equation:  ŷ = b0 + b1x
b0 and b1 provide estimates of β0 and β1.
Least Squares Method
Least Squares Criterion
  min Σ(yi - ŷi)²
where:
  yi = observed value of the dependent variable for the ith observation
  ŷi = estimated value of the dependent variable for the ith observation
Least Squares Method
Slope for the Estimated Regression Equation
  b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
where:
  xi = value of independent variable for ith observation
  yi = value of dependent variable for ith observation
  x̄ = mean value for independent variable
  ȳ = mean value for dependent variable
Least Squares Method
y-Intercept for the Estimated Regression Equation
  b0 = ȳ - b1x̄
Simple Linear Regression
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Simple Linear Regression
Example: Reed Auto Sales

Number of TV Ads (x)   Number of Cars Sold (y)
1                      14
3                      24
2                      18
1                      17
3                      27

Σx = 10   Σy = 100   x̄ = 2   ȳ = 20
Estimated Regression Equation
Slope for the Estimated Regression Equation
  b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 20/4 = 5
y-Intercept for the Estimated Regression Equation
  b0 = ȳ - b1x̄ = 20 - 5(2) = 10
Estimated Regression Equation
  ŷ = 10 + 5x
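The fitted coefficients can be checked directly from the data. The following is a minimal Python sketch (variable names are my own, not from the slides) applying the least squares formulas:

```python
# Reed Auto sample data from the slides:
# x = number of TV ads, y = number of cars sold
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

x_bar = sum(x) / n   # mean of x: 2.0
y_bar = sum(y) / n   # mean of y: 20.0

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx            # 20/4 = 5.0
b0 = y_bar - b1 * x_bar   # 20 - 5(2) = 10.0

print(b0, b1)  # 10.0 5.0, i.e. y_hat = 10 + 5x
```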
Scatter Diagram and Trend Line
[Figure: scatter diagram of Cars Sold (0 to 30) vs. TV Ads (0 to 4) with trend line y = 5x + 10]
Coefficient of Determination
Relationship Among SST, SSR, SSE
  SST = SSR + SSE
  Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²
where:
  SST = total sum of squares
  SSR = sum of squares due to regression
  SSE = sum of squares due to error
Coefficient of Determination
The coefficient of determination is:
  r² = SSR/SST
where:
  SSR = sum of squares due to regression
  SST = total sum of squares
Coefficient of Determination
  r² = SSR/SST = 100/114 = .8772
The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
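The sums of squares behind r² can be verified from the data and the fitted equation; a short Python sketch (names are illustrative):

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_hat = [10 + 5 * xi for xi in x]   # fitted values from y_hat = 10 + 5x
y_bar = sum(y) / len(y)             # 20.0

sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error sum of squares

r2 = ssr / sst
print(sst, ssr, sse, round(r2, 4))  # 114.0 100.0 14.0 0.8772
```

Note that SST = SSR + SSE (114 = 100 + 14), as the identity on the earlier slide requires.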
Sample Correlation Coefficient
  rxy = (sign of b1) √(Coefficient of Determination)
  rxy = (sign of b1) √r²
where:
  b1 = the slope of the estimated regression equation ŷ = b0 + b1x
Sample Correlation Coefficient
  rxy = (sign of b1) √r²
The sign of b1 in the equation ŷ = 10 + 5x is +.
  rxy = +√.8772 = +.9366
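In code the conversion from r² to rxy is a one-liner (a sketch; names are illustrative):

```python
import math

r2 = 100 / 114              # coefficient of determination
b1 = 5                      # slope of y_hat = 10 + 5x, so the sign is +
sign = 1 if b1 > 0 else -1
r_xy = sign * math.sqrt(r2)
print(round(r_xy, 4))       # 0.9366
```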
Assumptions About the Error Term ε
1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
Two tests are commonly used: t Test and F Test.
Both the t test and F test require an estimate of σ², the variance of ε in the regression model.
Testing for Significance
An Estimate of σ²
  SSE = Σ(yi - ŷi)² = Σ(yi - b0 - b1xi)²
The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.
  s² = MSE = SSE/(n - 2)
Testing for Significance
An Estimate of σ
To estimate σ we take the square root of s² = MSE:
  s = √MSE = √(SSE/(n - 2))
The resulting s is called the standard error of the estimate.
Testing for Significance: t Test
Hypotheses
  H0: β1 = 0
  Ha: β1 ≠ 0
Test Statistic
  t = b1 / s_b1
where
  s_b1 = s / √Σ(xi - x̄)²
Testing for Significance: t Test
Rejection Rule
  Reject H0 if p-value < α, or t < -t(α/2) or t > t(α/2)
where:
  t(α/2) is based on a t distribution with n - 2 degrees of freedom
Testing for Significance: t Test
1. Determine the hypotheses.  H0: β1 = 0,  Ha: β1 ≠ 0
2. Specify the level of significance.  α = .05
3. Select the test statistic.  t = b1 / s_b1
4. State the rejection rule.  Reject H0 if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)
Testing for Significance: t Test
5. Compute the value of the test statistic.
  t = b1 / s_b1 = 5 / 1.08 = 4.63
6. Determine whether to reject H0.
  t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.
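The test statistic can be reproduced from the sample quantities (a sketch; the critical value 3.182 is the tabulated t value with 3 degrees of freedom, as on the slide):

```python
import math

x = [1, 3, 2, 1, 3]
b1 = 5.0
s = math.sqrt(14 / 3)                      # standard error of the estimate
x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)   # 4.0
s_b1 = s / math.sqrt(sxx)                  # estimated std. dev. of b1
t = b1 / s_b1
print(round(s_b1, 2), round(t, 2))  # 1.08 4.63
# |t| = 4.63 > 3.182 (t with 3 df, alpha = .05), so reject H0
```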
Confidence Interval for β1
We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.
Confidence Interval for β1
The form of a confidence interval for β1 is:
  b1 ± t(α/2) s_b1
where b1 is the point estimator and t(α/2) s_b1 is the margin of error; t(α/2) is the t value providing an area of α/2 in the upper tail of a t distribution with n - 2 degrees of freedom.
Confidence Interval for β1
Rejection Rule
  Reject H0 if 0 is not included in the confidence interval for β1.
95% Confidence Interval for β1
  b1 ± t(α/2) s_b1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44
Conclusion
  0 is not included in the confidence interval. Reject H0.
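The interval arithmetic is a short sketch (3.182 is the tabulated t value with 3 degrees of freedom):

```python
b1, s_b1 = 5.0, 1.08
t_crit = 3.182                 # t_{.025} with 3 df, from a t table
margin = t_crit * s_b1
lo, hi = b1 - margin, b1 + margin
print(round(lo, 2), round(hi, 2))  # 1.56 8.44, an interval that excludes 0
```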
Testing for Significance: F Test
Hypotheses
  H0: β1 = 0
  Ha: β1 ≠ 0
Test Statistic
  F = MSR/MSE
Testing for Significance: F Test
Rejection Rule
  Reject H0 if p-value < α or F > F(α)
where:
  F(α) is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator
Testing for Significance: F Test
1. Determine the hypotheses.  H0: β1 = 0,  Ha: β1 ≠ 0
2. Specify the level of significance.  α = .05
3. Select the test statistic.  F = MSR/MSE
4. State the rejection rule.  Reject H0 if p-value < .05 or F > 10.13 (with 1 d.f. in the numerator and 3 d.f. in the denominator)
Testing for Significance: F Test
5. Compute the value of the test statistic.
  F = MSR/MSE = 100/4.667 = 21.43
6. Determine whether to reject H0.
  F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0.
The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
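The F statistic is a one-line ratio of the mean squares (sketch; the critical value 10.13 is from an F table):

```python
msr = 100 / 1   # MSR = SSR / number of independent variables
mse = 14 / 3    # MSE = SSE / (n - 2)
f = msr / mse
print(round(f, 2))  # 21.43
# 21.43 > 10.13 (F with 1 and 3 df, alpha = .05), so reject H0
```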
Some Cautions about the Interpretation of Significance Tests
Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
Using the Estimated Regression Equation for Estimation and Prediction
Confidence Interval Estimate of E(yp)
  ŷp ± t(α/2) s_ŷp
Prediction Interval Estimate of yp
  ŷp ± t(α/2) s_ind
where:
  the confidence coefficient is 1 - α and t(α/2) is based on a t distribution with n - 2 degrees of freedom
Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
  ŷ = 10 + 5(3) = 25 cars
Confidence Interval for E(yp)
Estimate of the Standard Deviation of ŷp
  s_ŷp = s √(1/n + (xp - x̄)² / Σ(xi - x̄)²)
  s_ŷp = 2.16025 √(1/5 + (3 - 2)² / [(1-2)² + (3-2)² + (2-2)² + (1-2)² + (3-2)²])
  s_ŷp = 2.16025 √(1/5 + 1/4) = 1.4491
Confidence Interval for E(yp)
The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
  ŷp ± t(α/2) s_ŷp = 25 ± 3.1824(1.4491) = 25 ± 4.61
  = 20.39 to 29.61 cars
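A sketch reproducing the standard deviation estimate and the interval (3.1824 is the tabulated t value with 3 degrees of freedom):

```python
import math

x = [1, 3, 2, 1, 3]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)   # 4.0
s = math.sqrt(14 / 3)                      # standard error of the estimate

xp = 3
y_p = 10 + 5 * xp                          # point estimate: 25
s_yp = s * math.sqrt(1 / n + (xp - x_bar) ** 2 / sxx)   # ~1.4491
margin = 3.1824 * s_yp
print(round(y_p - margin, 2), round(y_p + margin, 2))   # 20.39 29.61
```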
Prediction Interval for yp
Estimate of the Standard Deviation of an Individual Value of yp
  s_ind = s √(1 + 1/n + (xp - x̄)² / Σ(xi - x̄)²)
  s_ind = 2.16025 √(1 + 1/5 + 1/4)
  s_ind = 2.16025(1.20416) = 2.6013
Prediction Interval for yp
The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
  ŷp ± t(α/2) s_ind = 25 ± 3.1824(2.6013) = 25 ± 8.28
  = 16.72 to 33.28 cars
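The prediction interval differs from the confidence interval only in the extra 1 under the square root (sketch):

```python
import math

n, x_bar, sxx = 5, 2.0, 4.0   # from the Reed Auto data
s = math.sqrt(14 / 3)
xp = 3
y_p = 10 + 5 * xp
s_ind = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / sxx)   # ~2.6013
margin = 3.1824 * s_ind
print(round(y_p - margin, 2), round(y_p + margin, 2))  # 16.72 33.28
```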
Residual Analysis
If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
The residuals provide the best information about ε.
Much of the residual analysis is based on an examination of graphical plots.
Residual for Observation i
  yi - ŷi
Residual Plot Against x
If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.
Residual Plot Against x
[Figure: residuals (y - ŷ) plotted against x, forming a horizontal band around 0: Good Pattern]
Residual Plot Against x
[Figure: residuals (y - ŷ) plotted against x, with spread that changes with x: Nonconstant Variance]
Residual Plot Against x
[Figure: residuals (y - ŷ) plotted against x, showing a curved pattern: Model Form Not Adequate]
Residuals
Residual Plot Against x
Observation Predicted Cars Sold Residuals
1 15 -1
2 25 -1
3 20 -2
4 15 2
5 25 2
Residual Plot Against x
[Figure: TV Ads Residual Plot, residuals plotted against TV Ads (0 to 4), all values between -2 and 2]
Standardized Residuals
Standardized Residual for Observation i
  (yi - ŷi) / s(yi - ŷi)
where:
  s(yi - ŷi) = s √(1 - hi)
  hi = 1/n + (xi - x̄)² / Σ(xi - x̄)²
Standardized Residual Plot
The standardized residual plot can provide insight about the assumption that the error term ε has a normal distribution.
If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.
Standardized Residuals
Standardized Residual Plot

Observation   Predicted Y   Residuals   Standard Residuals
1             15            -1          -0.535
2             25            -1          -0.535
3             20            -2          -1.069
4             15             2           1.069
5             25             2           1.069
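The table's values can be reproduced in Python. Note (an assumption worth flagging, not stated on the slides): these numbers match the simpler convention residual / sqrt(SSE/(n - 1)) that Excel labels "Standard Residuals", rather than the s √(1 - hi) formula on the previous slide, which yields somewhat different values.

```python
import math

y = [14, 24, 18, 17, 27]
y_hat = [15, 25, 20, 15, 25]                        # predicted cars sold
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # [-1, -1, -2, 2, 2]

n = len(y)
sse = sum(r ** 2 for r in residuals)   # 14
# Excel-style "Standard Residuals": residual / sqrt(SSE / (n - 1))
scale = math.sqrt(sse / (n - 1))
std_res = [round(r / scale, 3) for r in residuals]
print(std_res)  # [-0.535, -0.535, -1.069, 1.069, 1.069]
```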
Standardized Residual Plot

RESIDUAL OUTPUT
Observation   Predicted Y   Residuals   Standard Residuals
1             15            -1          -0.534522
2             25            -1          -0.534522
3             20            -2          -1.069045
4             15             2           1.069045
5             25             2           1.069045

[Figure: standard residuals plotted against Cars Sold; all values fall between -1.5 and +1.5]
Standardized Residual Plot
All of the standardized residuals are between -1.5 and +1.5, indicating that there is no reason to question the assumption that ε has a normal distribution.
Outliers and Influential Observations
Detecting Outliers
An outlier is an observation that is unusual in comparison with the other data.
Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.
This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.
This rule's shortcoming can be circumvented by using studentized deleted residuals.
The |ith studentized deleted residual| will be larger than the |ith standardized residual|.