mvs 250: v. katch
DESCRIPTION
S TATISTICS. Chapter 5 Correlation/Regression. MVS 250: V. Katch. Paired Data is there a relationship if so, what is the equation use the equation for prediction. Overview. Correlation. Correlation - PowerPoint PPT PresentationTRANSCRIPT
1MVS 250: V. KatchMVS 250: V. Katch
STATISTICSChapter 5 Correlation/Regression
2
Overview
Paired Data
is there a relationship
if so, what is the equation
use the equation for prediction
3
Correlation
4
Definition
Correlation
exists between two variables when one of them is related to the other in some way
5
Assumptions
1. The sample of paired data (x,y) is a random sample.
2. The pairs of (x,y) data have a bivariate normal distribution.
6
Definition
Scatterplot (or scatter diagram)
is a graph in which the paired (x,y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x,y) pair is plotted as a single point.
7
Scatter Diagram of Paired Data
8
Scatter Diagram of Paired Data
9
Positive Linear Correlation
x x
yy y
x
Scatter Plots
(a) Positive (b) Strong positive
(c) Perfect positive
10
Negative Linear Correlation
x x
yy y
x(d) Negative (e) Strong
negative(f) Perfect
negative
Scatter Plots
11
No Linear Correlation
x x
yy
(g) No Correlation (h) Nonlinear Correlation
Scatter Plots
12
xy/n - (x/n)(y/n)(SDx) (SDy)
r =
DefinitionLinear Correlation Coefficient r
measures strength of the linear relationship between paired x and y values in a sample
Where xy/n is the mean of the cross products; (x/n) is the mean of the x variable; (y/n) is the mean of the y variable; SDx is the standard deviation of the x variable and SDy is the standard deviation of the x variable
13
Notation for the Linear Correlation Coefficient
n number of pairs of data presented
denotes the addition of the items indicated.
x/n denotes the mean of all x values.
y/n denotes the mean of all y values.
xy/n denotes the mean of the cross products [x times y, summed; divided by n]
r linear correlation coefficient for a sample
linear correlation coefficient for a population
14
Round to three decimal placesUse calculator or computer if possible
Rounding the Linear Correlation Coefficient r
15
Properties of the Linear Correlation Coefficient r
1. -1 r 1
2. Value of r does not change if all values of either variable are converted to a different scale.
3. The r is not affected by the choice of x and y. Interchange x and y and the value of r will not
change.
4. r measures strength of a linear relationship.
16
Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Sig. Table, conclude that there is a significant linear correlation.
Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.
Remember to use n-2
17
Common Errors Involving Correlation
1. Causation: It is wrong to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
18
0
50
100
150
200
250
0 1 2 3 4 5 6 7 8
Dis
tan
ce(f
eet)
Time (seconds)
Common Errors Involving Correlation
19
Correlation is Not Causation
A B
C
20
Correlation Calculations
Rank Order Correlation - Rho
Pearson’s - r
21
Rank Order Correlation
Hits Rank HR Rank D D2
1 10 3 8 2 4
2 9 4 7 2 4
3 8 5 6 2 4
4 7 1 10 -3 9
5 6 7 4 2 4
6 5 6 5 0 0
7 4 2 9 -5 25
8 3 10 1 2 4
9 2 9 2 0 0
10 1 8 3 2 4
22
Rank Order Correlation, cont
Hits Rank HR Rank D D2
1 10 3 8 2 4
2 9 4 7 2 4
3 8 5 6 2 4
4 7 1 10 -3 9
5 6 7 4 2 4
6 5 6 5 0 0
7 4 2 9 -5 25
8 3 10 1 2 4
9 2 9 2 0 0
10 1 8 3 2 4
Rho = 1- [6 (∑D2) / N (N2-1)]
Rho = 1- [6(58)/10(102-1)]
Rho = 1- [348 / 10 (100 -1)]
Rho = 1- [348 / 990]
Rho = 1- 0.352
Rho = 0.648
(∑D2 = 58)N=10
23
Pearson’s rHits HR xy
1 3 3
2 4 8
3 5 15
4 1 4
5 7 35
6 6 36
7 2 14
8 10 80
9 9 81
10 8 80
x/n=5.
5
x/n= 5.5
xy/n =32.86
xy/n - (x/n)(y/n)(SDx) (SDy)
r =
r = 32.86 - (5.5) (5.5)/(3.03) (3.03)
r = 35.86 - 30.25 / 9.09
r = 5.61 / 9.09
r = 0.6172
24
Pearson’s r
Excel Demonstration
25
0.27
2
1.41
3
2.19
3
2.83
6
2.19
4
1.81
2
0.85
1
3.05
5
Data from the Garbage Projectx Plastic (lb)
y Household
Is there a significant linear correlation?
26
0.27
2
1.41
3
2.19
3
2.83
6
2.19
4
1.81
2
0.85
1
3.05
5
Data from the Garbage Projectx Plastic (lb)
y Household
Is there a significant linear correlation?
Plastic Household0.27 21.41 32.19 32.83 62.19 41.81 20.85 13.05 5
27
0.27
2
1.41
3
2.19
3
2.83
6
2.19
4
1.81
2
0.85
1
3.05
5
Data from the Garbage Projectx Plastic (lb)
y Household
Is there a significant linear correlation?
Plastic Garbage v Household size
R2 = 0.7096
01234567
0 0.5 1 1.5 2 2.5 3 3.5
Plastic (lbs)
Household size
r = 0.842R2 = 0.71
28
n = 8 = 0.05 H0: = 0
H1 : 0
Test statistic is r = 0.842
Critical values are r = - 0.707 and 0.707(Table R with n = 8 and = 0.05)
TABLE R Critical Values of the Pearson Correlation Coefficient r
456789
101112131415161718192025303540455060708090
100
n.999.959.917.875.834.798.765.735.708.684.661.641.623.606.590.575.561.505.463.430.402.378.361.330.305.286.269.256
.950
.878
.811
.754
.707
.666
.632
.602
.576
.553
.532
.514
.497
.482
.468
.456
.444
.396
.361
.335
.312
.294
.279
.254
.236
.220
.207
.196
= .05 = .01
Is there a significant linear correlation?
29
0r = - 0.707 r = 0.707 1
Sample data:r = 0.842
- 1
0.842 > 0.707, That is the test statistic does fall within the critical region.
Therefore, we REJECT H0: = 0 (no correlation) and concludethere is a significant linear correlation between the weights ofdiscarded plastic and household size.
Is there a significant linear correlation?
Fail to reject = 0
Reject= 0
Reject= 0
30
Method 1: Test Statistic is t(follows format of earlier chapters)
31
Formal Hypothesis Test
To determine whether there is a significant linear correlation between two variables
Two methods
Both methods let H0: = (no significant linear correlation)
H1: (significant linear correlation)
32
Test statistic: r
Critical values: Refer to Table R (no degrees of freedom)
Method 2: Test Statistic is r(uses fewer calculations)
33
Test statistic: r
Critical values: Refer to Table A-6 (no degrees of freedom)
Method 2: Test Statistic is r(uses fewer calculations)
Fail to reject = 0
0r = - 0.811 r = 0.811 1
Sample data:r = 0.828
-1
Reject= 0
Reject= 0
34
Method 1: Test Statistic is t(follows format of earlier chapters)
Test statistic:
1 - r 2
n - 2
r
Critical values:
use Table T with degrees of freedom = n - 2
t =
35
Testing for a
Linear Correlation
If H0 is rejected conclude that thereis a significant linear correlation.
If you fail to reject H0, then there isnot sufficient evidence to conclude
that there is linear correlation.
If the absolute value of thetest statistic exceeds the
critical values, reject H0: = 0Otherwise fail to reject H0
The test statistic is
t =1 - r 2
n -2
r
Critical values of t are from Table A-3
with n -2 degrees of freedom
The test statistic is
Critical values of t are from Table A-6
r
Calculate r using
Formula 9-1
Select asignificance
level
Let H0: = 0 H1: 0
Start
METHOD 1 METHOD 2
36
Why does the critical value of r increase as sample size decreases?
A correlation by chance is more likely.
37
Coefficient of Determination(Effect Size)
r2
The part of variance of one variable that can be explained by the variance of a related variable.
38
x = 3
•••Quadrant 3
Quadrant 2 Quadrant 1
Quadrant 4
•
x
y
y = 11(x, y)
x - x = 7- 3 = 4
y - y = 23 - 11 = 12
0 1 2 3 4 5 6 70
4
8
12
16
20
24
r = (x -x) (y -y)(n -1) Sx Sy
(x, y) centroid of sample points
•(7, 23)
Justification for r Formula