mvs 250: v. katch

38
1 MVS 250: V. Katch STATISTICS Chapter 5 Correlation/Regression

Upload: lane-cherry

Post on 31-Dec-2015

27 views

Category:

Documents


0 download

DESCRIPTION

S TATISTICS. Chapter 5 Correlation/Regression. MVS 250: V. Katch. Paired Data is there a relationship if so, what is the equation use the equation for prediction. Overview. Correlation. Correlation - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: MVS 250: V. Katch

1MVS 250: V. KatchMVS 250: V. Katch

STATISTICSChapter 5 Correlation/Regression

Page 2: MVS 250: V. Katch

2

Overview

Paired Data

is there a relationship

if so, what is the equation

use the equation for prediction

Page 3: MVS 250: V. Katch

3

Correlation

Page 4: MVS 250: V. Katch

4

Definition

Correlation

exists between two variables when one of them is related to the other in some way

Page 5: MVS 250: V. Katch

5

Assumptions

1. The sample of paired data (x,y) is a random sample.

2. The pairs of (x,y) data have a bivariate normal distribution.

Page 6: MVS 250: V. Katch

6

Definition

Scatterplot (or scatter diagram)

is a graph in which the paired (x,y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x,y) pair is plotted as a single point.

Page 7: MVS 250: V. Katch

7

Scatter Diagram of Paired Data

Page 8: MVS 250: V. Katch

8

Scatter Diagram of Paired Data

Page 9: MVS 250: V. Katch

9

Positive Linear Correlation

x x

yy y

x

Scatter Plots

(a) Positive (b) Strong positive

(c) Perfect positive

Page 10: MVS 250: V. Katch

10

Negative Linear Correlation

x x

yy y

x(d) Negative (e) Strong

negative(f) Perfect

negative

Scatter Plots

Page 11: MVS 250: V. Katch

11

No Linear Correlation

x x

yy

(g) No Correlation (h) Nonlinear Correlation

Scatter Plots

Page 12: MVS 250: V. Katch

12

xy/n - (x/n)(y/n)(SDx) (SDy)

r =

DefinitionLinear Correlation Coefficient r

measures strength of the linear relationship between paired x and y values in a sample

Where xy/n is the mean of the cross products; (x/n) is the mean of the x variable; (y/n) is the mean of the y variable; SDx is the standard deviation of the x variable and SDy is the standard deviation of the x variable

Page 13: MVS 250: V. Katch

13

Notation for the Linear Correlation Coefficient

n number of pairs of data presented

denotes the addition of the items indicated.

x/n denotes the mean of all x values.

y/n denotes the mean of all y values.

xy/n denotes the mean of the cross products [x times y, summed; divided by n]

r linear correlation coefficient for a sample

linear correlation coefficient for a population

Page 14: MVS 250: V. Katch

14

Round to three decimal placesUse calculator or computer if possible

Rounding the Linear Correlation Coefficient r

Page 15: MVS 250: V. Katch

15

Properties of the Linear Correlation Coefficient r

1. -1 r 1

2. Value of r does not change if all values of either variable are converted to a different scale.

3. The r is not affected by the choice of x and y. Interchange x and y and the value of r will not

      change.

4. r measures strength of a linear relationship.

Page 16: MVS 250: V. Katch

16

Interpreting the Linear Correlation Coefficient

If the absolute value of r exceeds the value in Sig. Table, conclude that there is a significant linear correlation.

Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.

Remember to use n-2

Page 17: MVS 250: V. Katch

17

Common Errors Involving Correlation

1. Causation: It is wrong to conclude that correlation implies causality.

2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.

3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.

Page 18: MVS 250: V. Katch

18

0

50

100

150

200

250

0 1 2 3 4 5 6 7 8

Dis

tan

ce(f

eet)

Time (seconds)

Common Errors Involving Correlation

Page 19: MVS 250: V. Katch

19

Correlation is Not Causation

A B

C

Page 20: MVS 250: V. Katch

20

Correlation Calculations

Rank Order Correlation - Rho

Pearson’s - r

Page 21: MVS 250: V. Katch

21

Rank Order Correlation

Hits Rank HR Rank D D2

1 10 3 8 2 4

2 9 4 7 2 4

3 8 5 6 2 4

4 7 1 10 -3 9

5 6 7 4 2 4

6 5 6 5 0 0

7 4 2 9 -5 25

8 3 10 1 2 4

9 2 9 2 0 0

10 1 8 3 2 4

Page 22: MVS 250: V. Katch

22

Rank Order Correlation, cont

Hits Rank HR Rank D D2

1 10 3 8 2 4

2 9 4 7 2 4

3 8 5 6 2 4

4 7 1 10 -3 9

5 6 7 4 2 4

6 5 6 5 0 0

7 4 2 9 -5 25

8 3 10 1 2 4

9 2 9 2 0 0

10 1 8 3 2 4

Rho = 1- [6 (∑D2) / N (N2-1)]

Rho = 1- [6(58)/10(102-1)]

Rho = 1- [348 / 10 (100 -1)]

Rho = 1- [348 / 990]

Rho = 1- 0.352

Rho = 0.648

(∑D2 = 58)N=10

Page 23: MVS 250: V. Katch

23

Pearson’s rHits HR xy

1 3 3

2 4 8

3 5 15

4 1 4

5 7 35

6 6 36

7 2 14

8 10 80

9 9 81

10 8 80

x/n=5.

5

x/n= 5.5

xy/n =32.86

xy/n - (x/n)(y/n)(SDx) (SDy)

r =

r = 32.86 - (5.5) (5.5)/(3.03) (3.03)

r = 35.86 - 30.25 / 9.09

r = 5.61 / 9.09

r = 0.6172

Page 24: MVS 250: V. Katch

24

Pearson’s r

Excel Demonstration

Page 25: MVS 250: V. Katch

25

0.27

2

1.41

3

2.19

3

2.83

6

2.19

4

1.81

2

0.85

1

3.05

5

Data from the Garbage Projectx Plastic (lb)

y Household

Is there a significant linear correlation?

Page 26: MVS 250: V. Katch

26

0.27

2

1.41

3

2.19

3

2.83

6

2.19

4

1.81

2

0.85

1

3.05

5

Data from the Garbage Projectx Plastic (lb)

y Household

Is there a significant linear correlation?

Plastic Household0.27 21.41 32.19 32.83 62.19 41.81 20.85 13.05 5

Page 27: MVS 250: V. Katch

27

0.27

2

1.41

3

2.19

3

2.83

6

2.19

4

1.81

2

0.85

1

3.05

5

Data from the Garbage Projectx Plastic (lb)

y Household

Is there a significant linear correlation?

Plastic Garbage v Household size

R2 = 0.7096

01234567

0 0.5 1 1.5 2 2.5 3 3.5

Plastic (lbs)

Household size

r = 0.842R2 = 0.71

Page 28: MVS 250: V. Katch

28

n = 8 = 0.05 H0: = 0

H1 : 0

Test statistic is r = 0.842

Critical values are r = - 0.707 and 0.707(Table R with n = 8 and = 0.05)

TABLE R Critical Values of the Pearson Correlation Coefficient r

456789

101112131415161718192025303540455060708090

100

n.999.959.917.875.834.798.765.735.708.684.661.641.623.606.590.575.561.505.463.430.402.378.361.330.305.286.269.256

.950

.878

.811

.754

.707

.666

.632

.602

.576

.553

.532

.514

.497

.482

.468

.456

.444

.396

.361

.335

.312

.294

.279

.254

.236

.220

.207

.196

= .05 = .01

Is there a significant linear correlation?

Page 29: MVS 250: V. Katch

29

0r = - 0.707 r = 0.707 1

Sample data:r = 0.842

- 1

0.842 > 0.707, That is the test statistic does fall within the critical region.

Therefore, we REJECT H0: = 0 (no correlation) and concludethere is a significant linear correlation between the weights ofdiscarded plastic and household size.

Is there a significant linear correlation?

Fail to reject = 0

Reject= 0

Reject= 0

Page 30: MVS 250: V. Katch

30

Method 1: Test Statistic is t(follows format of earlier chapters)

Page 31: MVS 250: V. Katch

31

Formal Hypothesis Test

To determine whether there is a significant linear correlation between two variables

Two methods

Both methods let H0: = (no significant linear correlation)

H1: (significant linear correlation)

Page 32: MVS 250: V. Katch

32

Test statistic: r

Critical values: Refer to Table R (no degrees of freedom)

Method 2: Test Statistic is r(uses fewer calculations)

Page 33: MVS 250: V. Katch

33

Test statistic: r

Critical values: Refer to Table A-6 (no degrees of freedom)

Method 2: Test Statistic is r(uses fewer calculations)

Fail to reject = 0

0r = - 0.811 r = 0.811 1

Sample data:r = 0.828

-1

Reject= 0

Reject= 0

Page 34: MVS 250: V. Katch

34

Method 1: Test Statistic is t(follows format of earlier chapters)

Test statistic:

1 - r 2

n - 2

r

Critical values:

use Table T with degrees of freedom = n - 2

t =

Page 35: MVS 250: V. Katch

35

Testing for a

Linear Correlation

If H0 is rejected conclude that thereis a significant linear correlation.

If you fail to reject H0, then there isnot sufficient evidence to conclude

that there is linear correlation.

If the absolute value of thetest statistic exceeds the

critical values, reject H0: = 0Otherwise fail to reject H0

The test statistic is

t =1 - r 2

n -2

r

Critical values of t are from Table A-3

with n -2 degrees of freedom

The test statistic is

Critical values of t are from Table A-6

r

Calculate r using

Formula 9-1

Select asignificance

level

Let H0: = 0 H1: 0

Start

METHOD 1 METHOD 2

Page 36: MVS 250: V. Katch

36

Why does the critical value of r increase as sample size decreases?

A correlation by chance is more likely.

Page 37: MVS 250: V. Katch

37

Coefficient of Determination(Effect Size)

r2

The part of variance of one variable that can be explained by the variance of a related variable.

Page 38: MVS 250: V. Katch

38

x = 3

•••Quadrant 3

Quadrant 2 Quadrant 1

Quadrant 4

x

y

y = 11(x, y)

x - x = 7- 3 = 4

y - y = 23 - 11 = 12

0 1 2 3 4 5 6 70

4

8

12

16

20

24

r = (x -x) (y -y)(n -1) Sx Sy

(x, y) centroid of sample points

•(7, 23)

Justification for r Formula