lesson07_new

47
Lesson 7- Lesson 7-1 Statistics for Management Lesson 6 The Linear Regression Model and Correlation

Upload: shengvn

Post on 31-Jan-2015

1.746 views

Category:

Education


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Lesson07_new

Lesson 7-Lesson 7-11

Statistics for Management

Lesson 6

The Linear Regression Model and Correlation

Page 2: Lesson07_new

Lesson. 7 - 2

Lesson Topics

1. Types of Regression Models

2. Determining the Simple Linear Regression Equation

3. Measures of Variation in Regression and Correlation

4. Estimation of Predicted Values

5. The Multiple Regression Model

Page 3: Lesson07_new

Lesson. 7 - 3

1. Purpose of Regression and Correlation Analysis

• Regression Analysis is Used Primarily for Prediction

A statistical model used to predict the values of a dependent or response variable based on values of at least one independent or explanatory variable

Correlation Analysis is Used to Measure Strength of the Association Between Numerical Variables

Page 4: Lesson07_new

Lesson. 7 - 4

The Scatter Diagram

0204060

0 20 40 60

X

Y

Plot of all (Xi , Yi) pairs

Page 5: Lesson07_new

Lesson. 7 - 5

Types of Regression Models

Positive Linear Relationship

Negative Linear Relationship

Relationship NOT Linear

No Relationship

Page 6: Lesson07_new

Lesson. 7 - 6

2. Simple Linear Regression Model

iii XY 10

Y intercept

Slope

• The Straight Line that Best Fit the Data

• Relationship Between Variables Is a Linear Function

Random Error

Dependent (Response) Variable

Independent (Explanatory) Variable

Page 7: Lesson07_new

Lesson. 7 - 7

i = Random Error

Y

X

Population Linear Regression Model

Observed Value

Observed Value

YX iX 0 1

Y Xi i i 0 1

Page 8: Lesson07_new

Lesson. 7 - 8

Sample Linear Regression Model

ii XbbY 10

Yi

= Predicted Value of Y for observation i

Xi = Value of X for observation i

b0 = Sample Y - intercept used as estimate ofthe population 0

b1 = Sample Slope used as estimate of the population 1

Page 9: Lesson07_new

Lesson. 7 - 9

Simple Linear Regression Equation: Example

You wish to examine the relationship between the square footage of produce stores and its annual sales. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best

Annual Store Square Sales

Feet ($000)

1 1,726 3,681

2 1,542 3,395

3 2,816 6,653

4 5,555 9,543

5 1,292 3,318

6 2,208 5,563

7 1,313 3,760

Page 10: Lesson07_new

Lesson. 7 - 10

Scatter Diagram Example

0

2000

4000

6000

8000

10000

12000

0 1000 2000 3000 4000 5000 6000

Square Feet

An

nu

al

Sa

les

($00

0)

Excel Output

Page 11: Lesson07_new

Lesson. 7 - 11

Equation for the Best Straight Line

i

ii

X..

XbbY

4871415163610

From SPSS Printout:

CoefficientsIntercept 1636.414726X Variable 1 1.486633657

Page 12: Lesson07_new

Lesson. 7 - 12

Graph of the Best Straight Line

0

2000

4000

6000

8000

10000

12000

0 1000 2000 3000 4000 5000 6000

Square Feet

An

nu

al

Sa

les

($00

0)

Y i = 1636.415 +1.487X i

Page 13: Lesson07_new

Lesson. 7 - 13

Interpreting the Results

Yi = 1636.415 +1.487Xi

The slope of 1.487 means for each increase of one unit in X, the Y is estimated to increase 1.487units.

For each increase of 1 square foot in the size of the store, the model predicts that the expected annual sales are estimated to increase by $1487.

Page 14: Lesson07_new

Lesson. 7 - 14

Standard Error of Estimate

2

n

SSESyx

21

2

n

)YY(n

iii

=

The standard deviation of the variation of observations around the regression line

Page 15: Lesson07_new

Lesson. 7 - 15

Inferences about the Slope: t Test

• t Test for a Population Slope Is a Linear Relationship Between X & Y ?

1

11

bS

bt

•Test Statistic:

n

ii

YXb

)XX(

SS

1

21

and df = n - 2

•Null and Alternative Hypotheses

H0: 1 = 0 (No Linear Relationship) H1: 1 0 (Linear Relationship)

Where

Page 16: Lesson07_new

Lesson. 7 - 16

Example: Produce Stores

Data for 7 Stores: Regression Model Obtained:

The slope of this model is 1.487.

Is there a linear relationship between the square footage of a store and its annual sales?

Annual

Store Square Sales Feet ($000)

1 1,726 3,681

2 1,542 3,395

3 2,816 6,653

4 5,555 9,543

5 1,292 3,318

6 2,208 5,563

7 1,313 3,760

Yi = 1636.415 +1.487Xi

Page 17: Lesson07_new

Lesson. 7 - 17

t Stat P-valueIntercept 3.6244333 0.0151488X Variable 1 9.009944 0.0002812

H0: 1 = 0

H1: 1 0

.05

df 7 - 2 = 7

Critical Value(s):

Test Statistic:

Decision:

Conclusion:

There is evidence of a relationship.t

0 2.5706-2.5706

.025

Reject Reject

.025

From Excel Printout

Reject H0

Inferences about the Slope: t Test Example

Page 18: Lesson07_new

Lesson. 7 - 18

Inferences about the Slope: Confidence Interval Example

Confidence Interval Estimate of the Slopeb1 tn-2 1bS

Excel Printout for Produce Stores

At 95% level of Confidence The confidence Interval for the slope is (1.062, 1.911). Does not include 0.

Conclusion: There is a significant linear relationship between annual sales and the size of the store.

Lower 95% Upper 95%Intercept 475.810926 2797.01853X Variable 11.06249037 1.91077694

Page 19: Lesson07_new

Lesson. 7 - 19

3. Measures of Variation:The Sum of Squares

SST = Total Sum of Squares

•measures the variation of the Yi values around their mean Y

SSR = Regression Sum of Squares

•explained variation attributable to the relationship between X and Y

SSE = Error Sum of Squares

•variation attributable to factors other than the relationship between X and Y

_

Page 20: Lesson07_new

Lesson. 7 - 20

Measures of Variation: The Sum of Squares

Xi

Y i = b0

+ b1X i

Y

X

Y

SST = (Yi - Y)2

SSE =(Yi - Yi )2

SSR = (Yi - Y)2

_

_

_

Page 21: Lesson07_new

Lesson. 7 - 21

df SSRegression 1 30380456.12Residual 5 1871199.595Total 6 32251655.71

Measures of VariationThe Sum of Squares: Example

SPSS Output for Produce Stores

SSR SSE SST

Page 22: Lesson07_new

Lesson. 7 - 22

The Coefficient of Determination

SSR regression sum of squares

SST total sum of squaresr2 = =

Measures the proportion of variation that is explained by the independent variable X in the regression model

Page 23: Lesson07_new

Lesson. 7 - 23

Coefficients of Determination (r2) and Correlation (r)

r2 = 1, r2 = 1,

r2 = .8, r2 = 0,Y

Yi = b0 + b1Xi

X

^

YYi = b0 + b1Xi

X

^Y

Yi = b0 + b1Xi

X

^

Y

Yi = b0 + b1Xi

X

^

r = +1 r = -1

r = +0.9 r = 0

Page 24: Lesson07_new

Lesson. 7 - 24

Correlation: Measuring the Strength of Association

• Answer ‘How Strong Is the Linear Relationship Between 2 Variables?’

• Coefficient of Correlation Used Population correlation coefficient denoted

(‘Rho’) Values range from -1 to +1 Measures degree of association

• Is the Square Root of the Coefficient of Determination

Page 25: Lesson07_new

Lesson. 7 - 25

Regression StatisticsMultiple R 0.9705572R Square 0.94198129Adjusted R Square 0.93037754Standard Error 611.751517Observations 7

Measures of Variation: Example

Excel Output for Produce Stores

r2 = .94 Syx94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage

Page 26: Lesson07_new

Lesson. 7 - 26

4. Estimation of Predicted Values

Confidence Interval Estimate for XY

The Mean of Y given a particular Xi

n

ii

iyxni

)XX(

)XX(

nStY

1

2

2

21

t value from table with df=n-2

Standard error of the estimate

Size of interval vary according to distance away from mean, X.

Page 27: Lesson07_new

Lesson. 7 - 27

Estimation of Predicted Values

Confidence Interval Estimate for Individual Response Yi at a Particular Xi

n

ii

iyxni

)XX(

)XX(

nStY

1

2

2

21

1

Addition of this 1 increased width of interval from that for the mean Y

Page 28: Lesson07_new

Lesson. 7 - 28

Interval Estimates for Different Values of X

X

Y

X

Confidence Interval for a individual Yi

A Given X

Confidence Interval for the mean of Y

Y i = b0 + b1X i

_

Page 29: Lesson07_new

Lesson. 7 - 29

Example: Produce Stores

Yi = 1636.415 +1.487Xi

Data for 7 Stores:

Regression Model Obtained:

Predict the annual sales for a store with

2000 square feet.

Annual Store Square Sales

Feet ($000)

1 1,726 3,681

2 1,542 3,395

3 2,816 6,653

4 5,555 9,543

5 1,292 3,318

6 2,208 5,563

7 1,313 3,760

Page 30: Lesson07_new

Lesson. 7 - 30

Estimation of Predicted Values: Example

Confidence Interval Estimate for Individual Y

Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet

n

ii

iyxni

)XX(

)XX(

nStY

1

2

2

21

Predicted Sales Yi = 1636.415 +1.487Xi = 4610.45 ($000)

X = 2350.29 SYX = 611.75 tn-2 = t5 = 2.5706

= 4610.45 980.97

Confidence interval for mean Y

Page 31: Lesson07_new

Lesson. 7 - 31

Estimation of Predicted Values: Example

Confidence Interval Estimate for XY

Find the 95% confidence interval for annual sales of one particular stores of 2,000 square feet

Predicted Sales Yi = 1636.415 +1.487Xi = 4610.45 ($000)

X = 2350.29 SYX = 611.75 tn-2 = t5 = 2.5706

= 4610.45 1853.45

Confidence interval for individual Y

n

ii

iyxni

)XX(

)XX(

nStY

1

2

2

21

1

Page 32: Lesson07_new

Lesson. 7 - 32

5. Multipe Regression model

• The Multiple Regression Model

• Coefficient of Determination

• Model Building

Page 33: Lesson07_new

Lesson. 7 - 33

The Multiple Regression Model

ipipiii XXXY 22110

Relationship between 1 dependent & 2 or more independent variables is a linear function

Population Y-intercept

Population slopes

Dependent (Response) variable for sample

Independent (Explanatory) variables for sample model

Random Error

ipipiii eXbXbXbbY 22110

Page 34: Lesson07_new

Lesson. 7 - 34

Sample Multiple Regression Model

X2

X1

Y

pipiii XbXbXbbY 22110

ipipiii eXbXbXbbY 22110

ei

Page 35: Lesson07_new

Lesson. 7 - 35

Oil (Gal) Temp Insulation275.30 40 3363.80 27 3164.30 40 1040.80 73 694.30 64 6

230.90 34 6366.70 9 6300.60 8 10237.80 23 10121.40 63 331.40 65 10

203.50 41 6441.10 21 3323.00 38 352.50 58 10

Multiple Regression Model: Example

(0F)Develop a model for estimating heating oil used for a single family home in the month of January based on average temperature and amount of insulation in inches.

Page 36: Lesson07_new

Lesson. 7 - 36

Sample Regression Model: Example

pipiii XbXbXbbY 22110Coefficients

Intercept 562.1510092X Variable 1 -5.436580588X Variable 2 -20.01232067

Excel Output

iii X.X..Y 21 012204375151562 For each degree increase in

temperature, the average amount of heating oil used is decreased by 5.437 gallons, holding insulation

constant.

For each increase in one inch of insulation, the use of heating oil is decreased by 20.012 gallons, holding temperature constant.

Page 37: Lesson07_new

Lesson. 7 - 37

Using The Model to Make Predictions

969278

601220304375151562

012204375151562 21

.

...

X.X..Y iii

Estimate the average amount of heating oil used for a home if the average temperature is 300 and the insulation is 6 inches.

The estimated heating oil used is 278.97 gallons

Page 38: Lesson07_new

Lesson. 7 - 38

Coefficient of Multiple Determination

Regression StatisticsMultiple R 0.982654757R Square 0.965610371Adjusted R Square 0.959878766Standard Error 26.01378323Observations 15

SPSS Output

SST

SSRr ,Y 2

12

Adjusted r2

•reflects the number of explanatory variables and sample size

• is smaller than r2

Page 39: Lesson07_new

Lesson. 7 - 39

Testing for Overall Significance

•Shows if there is a linear relationship between all of the X variables together and Y

•Use F test Statistic

•Hypotheses:

H0: 1 = 2 = … = p = 0 (No linear relationship)

H1: At least one i 0 ( At least one independent

variable affects Y)

Page 40: Lesson07_new

Lesson. 7 - 40

Test for Overall SignificanceExcel Output: Example

ANOVAdf SS MS F Significance F

Regression 2 228014.6 114007.3 168.4712028 1.65411E-09Residual 12 8120.603 676.7169Total 14 236135.2

p = 2, the number of explanatory variables n - 1

MRSMSE

p value

= F Test Statistic

Page 41: Lesson07_new

Lesson. 7 - 41

F0 3.89

H0: 1 = 2 = … = p = 0

H1: At least one I 0 = .05

df = 2 and 12

Critical Value(s):

Test Statistic:

Decision:

Conclusion:

Reject at = 0.05

There is evidence that At least one independentvariable affects Y

= 0.05

F

Test for Overall SignificanceExample Solution

168.47(Excel Output)

Page 42: Lesson07_new

Lesson. 7 - 42

Test for Significance:Individual Variables

•Shows if there is a linear relationship between the

variable Xi and Y

•Use t test Statistic

•Hypotheses:

H0: i = 0 (No linear relationship)

H1: i 0 (Linear relationship between Xi and Y)

Page 43: Lesson07_new

Lesson. 7 - 43

Coefficients Standard Error t StatIntercept 562.151009 21.09310433 26.65094X Variable 1 -5.4365806 0.336216167 -16.1699X Variable 2 -20.012321 2.342505227 -8.54313

t Test StatisticExcel Output: Example

t Test Statistic for X1 (Temperature)

t Test Statistic for X2 (Insulation)

Page 44: Lesson07_new

Lesson. 7 - 44

H0: 1 = 0

H1: 1 0

df = 12 Critical Value(s):

Test Statistic:

Decision:

Conclusion:

Reject H0 at = 0.05

There is evidence of a significant effect of temperature on oil

consumption.Z0 2.1788-2.1788

.025

Reject H0 Reject H0

.025

Does temperature have a significant effect on monthly consumption of heating oil? Test at = 0.05.

t Test : Example Solution

t Test Statistic = -16.1699

Page 45: Lesson07_new

Lesson. 7 - 45

Confidence Interval Estimate For The Slope

Provide the 95% confidence interval for the population slope 1 (the effect of temperature on oil consumption).

111 bpn Stb Coefficients Lower 95% Upper 95%

Intercept 562.151009 516.1930837 608.108935X Variable 1 -5.4365806 -6.169132673 -4.7040285X Variable 2 -20.012321 -25.11620102 -14.90844

-6.169 1 -4.704The average consumption of oil is reduced by between

4.7 gallons to 6.17 gallons per each increase of 10 F.

Page 46: Lesson07_new

Lesson. 7 - 46

Model Building

• Goal is to Develop Model with Fewest Explanatory Variables Easier to interpret Lower probability of collinearity

• Stepwise Regression Procedure Provide limited evaluation of alternative

models

• Best-Subset Approach

Page 47: Lesson07_new

Lesson. 7 - 47

Lesson Summary

• Described Types of Regression Models

• Determined the Simple Linear Regression Equation

• Provided Measures of Variation in Regression and Correlation

• Provided Estimation of Predicted Values

• Determined the Multiple Regression equation