chapter 11 correlation and simple linear regression statistics for business (econ) 1

Post on 05-Jan-2016


Chapter 11: Correlation and Simple Linear Regression

Statistics for Business (Econ)

1

2

Introduction

• In this chapter we employ regression analysis to examine the relationship among quantitative variables.

• The technique is used to predict the value of one variable (the dependent variable, y) based on the values of other variables (the independent variables x1, x2, …, xk).

3

Correlation is a statistical technique used to measure and describe the relationship between two variables. The correlation between two variables reflects the degree to which the variables are related. For example:

• A researcher interested in the relationship between nutrition and IQ could observe the dietary patterns of a group of children and then measure their IQ scores.

• A business analyst may wonder whether there is any relationship between profit margin and return on capital for a group of public companies.

4

A set of n = 6 pairs of scores (X and Y values) is shown in a table and in a scatterplot. The scatterplot allows you to see the relationship between X and Y.

5

Positive correlation

6

Negative correlation

7

Non-linear relationship

8

9

A strong positive relationship, approximately +0.90;

A relatively weak negative correlation, approximately -0.40

10

A perfect negative correlation, -1.00

No linear trend, 0.00.

11

A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.

12

A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
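The outlier effect can be reproduced numerically. The following is a minimal sketch, not taken from the slides: the five base points and the added point (15, 15) are hypothetical, and `pearson_r` is an illustrative helper that applies the usual Pearson formula (sample covariance divided by the product of the sample standard deviations).

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation: sample covariance / (s_x * s_y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (stdev(xs) * stdev(ys))


# Hypothetical data with almost no linear trend.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]

# Correlation without the extreme point.
r_without = pearson_r(xs, ys)

# Correlation after adding one extreme point far from the rest.
r_with = pearson_r(xs + [15], ys + [15])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

With these made-up numbers a single added point moves the correlation from weak to strong, which is why a scatterplot should be inspected before a correlation is interpreted.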

13

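Pearson correlation

The most common measure of correlation is the Pearson product-moment correlation (called Pearson's correlation for short).

$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$

$$r = \frac{s_{xy}}{s_x s_y}$$

Here $r$ is the correlation coefficient, $s_{xy}$ is the sample covariance, and $s_x$ and $s_y$ are the sample standard deviations of x and y.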
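As an illustration of the definition, the sketch below computes the sample covariance, the two sample standard deviations, and their ratio step by step. The data are hypothetical and chosen to lie exactly on a downward-sloping line, so the result should be a perfect negative correlation.

```python
from math import sqrt

# Hypothetical data lying exactly on the line y = 12 - 2x.
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance s_xy, with divisor n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations s_x and s_y, with divisor n - 1.
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation coefficient r = s_xy / (s_x * s_y).
r = s_xy / (s_x * s_y)
print(f"s_xy = {s_xy}, r = {r:.2f}")
```

Because every point sits on the line, r comes out at -1.00 up to floating-point rounding; any scatter around the line would pull it toward 0.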

14

The value $r^2$ is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable. A correlation of $r = 0.80$ (or $-0.80$), for example, means that $r^2 = 0.64$ (or 64%) of the variability in the Y scores can be predicted from the relationship with X.

25

Least squares fit

30

• The linear model

$$y = \beta_0 + \beta_1 x + \varepsilon$$

y = dependent variable
x = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line (= Rise/Run)
$\varepsilon$ = error variable

$\beta_0$ and $\beta_1$ are unknown and are therefore estimated from the data.

31

To calculate the estimates of the coefficients that minimize the sum of squared differences between the data points and the line, use the formulas:

$$b_1 = \frac{\operatorname{cov}(X,Y)}{s_x^2}$$

$$b_0 = \bar{y} - b_1\bar{x}$$

The regression equation that estimates the equation of the first-order linear model is:

$$\hat{y} = b_0 + b_1 x$$
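The two formulas translate directly into code. The sketch below is illustrative only: `fit_line` is a hypothetical helper name and the five data points are made up; it computes the slope as the sample covariance over the sample variance of x, then the intercept from the two means.

```python
def fit_line(xs, ys):
    """Least squares estimates: b1 = cov(X, Y) / s_x^2, b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data scattered around a line with slope close to 2.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

The `n - 1` divisors cancel in the ratio, so the same slope results from dividing the raw sums of cross-products and squares; they are kept here so the code mirrors the covariance and variance notation used on the slide.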

32

• Example 12.1 Relationship between odometer reading and a used car’s selling price.

– A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.

– A random sample of 100 cars is selected, and the data recorded.

– Find the regression line.

Car   Odometer (x)   Price (y)
1     37388          5318
2     44758          5061
3     45833          5008
4     30862          5795
5     31705          5784
6     34010          5359
.     .              .
.     .              .
.     .              .

x = odometer reading (independent variable); y = selling price (dependent variable).

33

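• Solution

– Solving by hand

• To calculate b0 and b1 we first need to calculate several statistics:

$$\bar{x} = 36{,}009.45; \qquad \bar{y} = 5{,}411.41$$

$$s_x^2 = \frac{\sum (x_i-\bar{x})^2}{n-1} = 43{,}528{,}688$$

$$\operatorname{cov}(X,Y) = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{n-1} = -1{,}356{,}256$$

where n = 100.

$$b_1 = \frac{\operatorname{cov}(X,Y)}{s_x^2} = \frac{-1{,}356{,}256}{43{,}528{,}688} = -.0312$$

$$b_0 = \bar{y} - b_1\bar{x} = 5{,}411.41 - (-.0312)(36{,}009.45) = 6{,}533$$

(The value 6,533 is obtained with the unrounded slope, -0.03116.)

$$\hat{y} = b_0 + b_1 x = 6{,}533 - .0312x$$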
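The hand calculation can be checked from the summary statistics alone. The sketch below uses only the four numbers quoted on this slide and takes the covariance as negative, consistent with the price falling as the odometer reading rises; the variable names are illustrative.

```python
# Summary statistics quoted on the slide (n = 100 cars).
x_bar = 36009.45       # mean odometer reading
y_bar = 5411.41        # mean selling price
var_x = 43528688       # sample variance of odometer readings, s_x^2
cov_xy = -1356256      # sample covariance; negative because price falls with mileage

# Slope and intercept from the least squares formulas.
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.5f}")
print(f"b0 = {b0:.1f}")
```

Carrying the unrounded slope into the intercept formula is what yields an intercept of about 6,533; substituting the rounded value -.0312 instead gives a figure roughly 1.5 higher.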

34

[Figure: scatterplot of Price (vertical axis, 4500 to 6000) against Odometer (horizontal axis, 19000 to 49000)]

– Using the computer (see file Xm17-01.xls)

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.806308
R Square             0.650132
Adjusted R Square    0.646562
Standard Error       151.5688
Observations         100

ANOVA
              df    SS         MS          F          Significance F
Regression    1     4183528    4183528     182.1056   4.4435E-24
Residual      98    2251362    22973.09
Total         99    6434890

              Coefficients   Standard Error   t Stat      P-value
Intercept     6533.383       84.51232         77.30687    1.22E-89
Odometer      -0.03116       0.002309         -13.4947    4.44E-24

$$\hat{y} = 6{,}533 - .0312x$$

Tools > Data analysis > Regression > [Shade the y range and the x range] > OK
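Several entries in the summary output are simple functions of the others, which gives a quick way to read the table. The sketch below recomputes them from the printed sums of squares, degrees of freedom, slope, and standard error; the variable names are illustrative, and small differences in the last digit come from the rounding of the printed inputs.

```python
from math import sqrt

# Values copied from the ANOVA and coefficient tables.
ss_regression = 4183528
ss_residual = 2251362
ss_total = 6434890
df_regression = 1
df_residual = 98
slope = -0.03116
slope_std_error = 0.002309

# R Square: share of the total variation explained by the regression.
r_square = ss_regression / ss_total

# Multiple R is the square root of R Square; with one predictor the
# correlation r carries the sign of the slope.
r = -sqrt(r_square)

# Mean squares, F statistic, and the standard error of the estimate.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
standard_error = sqrt(ms_residual)

# t statistic for the slope: coefficient divided by its standard error.
t_stat = slope / slope_std_error

print(f"R Square = {r_square:.6f}, r = {r:.6f}")
print(f"F = {f_stat:.4f}, Standard Error = {standard_error:.4f}, t = {t_stat:.3f}")
```

With a single independent variable, F equals the square of the slope's t statistic, which is why the Significance F and the Odometer P-value coincide.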

35

This is the slope of the line. For each additional mile on the odometer, the price decreases by an average of $0.0312.

[Figure: scatterplot of Price against Odometer with the fitted line ŷ = 6,533 - .0312x]

The intercept is b0 = 6533.

Do not interpret the intercept as the “price of cars that have not been driven”: the sample contains no data near an odometer reading of 0.
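The same caution applies to prediction: the fitted line is only trustworthy inside the range of odometer readings that were observed. The sketch below is a hypothetical helper, not part of the original example. It uses the rounded coefficients from the slide and assumes the observed range is roughly 19,000 to 49,000 miles, which is read off the plot axis rather than taken from the data file.

```python
B0 = 6533.0      # intercept from the fitted line (rounded)
B1 = -0.0312     # slope from the fitted line (rounded)

# Assumption: approximate range of odometer readings, read off the plot axis.
X_MIN, X_MAX = 19000, 49000


def predict_price(odometer):
    """Predicted price from the fitted line, refusing to extrapolate."""
    if not (X_MIN <= odometer <= X_MAX):
        raise ValueError(
            f"odometer reading {odometer} is outside the assumed data range"
        )
    return B0 + B1 * odometer


# A car with 40,000 miles lies inside the range, so the line applies.
price_40k = predict_price(40000)
print(f"predicted price at 40,000 miles: {price_40k:.0f}")

# A reading of 0 is rejected instead of returning the intercept as a price.
try:
    predict_price(0)
except ValueError as err:
    print("refused:", err)
```

Raising an error rather than returning 6533 for an undriven car makes the slide's warning about the intercept explicit in the code.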
