Correlation They go together like salt and pepper… like oil and vinegar… like bread and butter… etc.

Upload: mavis-ryan

Post on 19-Jan-2016

Correlation

They go together like salt and pepper… like oil and vinegar… like bread and butter… etc.

Major Uses

• Correlational techniques are used for three major purposes:

– Degree of Association

– Prediction

– Reliability

Bivariate Distribution

• Bivariate distribution - a distribution in which two variables are presented simultaneously

Consider the following:

X   Y
4   5
6   9
7   8
8   4
4   6
5   6
6   7

Ordinarily, we might construct a graph for each set of data. However, we can place both on a “scatter diagram.”

Scatter Diagram

[Figure: scatter diagram of Y (vertical axis, 0 to 8) against X (horizontal axis, 0 to 8)]

X   Y
4   5
6   8
7   8
8   4
4   6
5   6
6   7

What Scatter Diagrams Can Tell Us

• A scatter diagram can tell us much about a bivariate distribution:

– presence of relationship

[Figure: two scatter diagrams, “No Relationship” and “Relationship”]

– direction of relationship

[Figure: two scatter diagrams, “Positive Relationship” and “Negative Relationship”]

There is a positive relationship between high school SAT scores and college GPA. Other examples?

There is a negative relationship between the number of missed classes and exam scores. Other examples?

– linear or non-linear

[Figure: two scatter diagrams, “Linear” and “Non-linear”]

– homoscedasticity/heteroscedasticity

[Figure: two scatter diagrams, “Homoscedastic” and “Heteroscedastic”]

– exceptions to relationship

[Figure: scatter diagram, “Perfect Relationship”]

Conceptualizing r

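[Figure: scatter diagram divided into quadrants I, II, III, and IV by the means of X and Y; the cross-products are (+) values in the upper-right and lower-left quadrants and (-) values in the upper-left and lower-right quadrants]

$$r_{xy} = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}}$$

where $\sum xy$ is the sum of the cross-products of the deviation scores $x = X - \bar{X}$ and $y = Y - \bar{Y}$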

Computational Formula

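$$r_{xy} = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}$$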
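As a rough check that the deviation-score form and the computational form give the same value, the sketch below evaluates both on the seven (X, Y) pairs listed on the Bivariate Distribution slide. The function names are illustrative and not part of the original slides.

```python
from math import sqrt

# (X, Y) pairs as listed on the Bivariate Distribution slide.
X = [4, 6, 7, 8, 4, 5, 6]
Y = [5, 9, 8, 4, 6, 6, 7]

def r_deviation(xs, ys):
    """r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def r_computational(xs, ys):
    """r from raw-score sums, following the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

print(r_deviation(X, Y), r_computational(X, Y))
```

The two printed values should agree up to floating-point rounding; the computational form simply avoids computing each deviation score by hand.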

Correlation and Causation

• “Correlation does not imply causation.”

• Consider the following:

There is a very high correlation (i.e., in the upper .90s) between the length of a person’s big toe and ability to spell!

• Several possibilities exist:

– changes in X cause changes in Y

– changes in Y cause changes in X

– a third (or other) variable affects X and Y
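The third possibility can be illustrated with a small simulation. The sketch below assumes a purely hypothetical model in which age drives both toe length and spelling ability (in arbitrary units) and neither variable affects the other; the numbers are invented for illustration only.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

random.seed(0)

# Hypothetical model: age (the third variable) drives both measures.
ages = [random.uniform(5, 15) for _ in range(500)]
toe_length = [age + random.gauss(0, 1) for age in ages]
spelling = [age + random.gauss(0, 1) for age in ages]

r_all = pearson_r(toe_length, spelling)

# Among children of nearly the same age, the association is much weaker.
same_age = [(t, s) for age, t, s in zip(ages, toe_length, spelling)
            if 9.5 <= age <= 10.5]
r_same_age = pearson_r([t for t, _ in same_age], [s for _, s in same_age])

print(f"all children: r = {r_all:.2f}; same-age children: r = {r_same_age:.2f}")
```

Under these assumptions the overall correlation is high even though toe length has no effect on spelling, which is why a high r alone cannot settle which of the three possibilities holds.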

Correlation and Causation

• How about this one?

Children exposed to violent TV are more aggressive than children exposed to non-violent TV

Factors Influencing the Size of “r”

• Linearity of regression

– the more closely scores follow a straight line, the higher the value of r

[Figure: two scatter diagrams, “High value r” and “Low value r”]

– r underestimates the true degree of association in a non-linear relationship
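A minimal sketch of this point, using made-up values: Y is perfectly determined by X in both cases below, but r only reflects the association when the relationship is a straight line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

X = [-3, -2, -1, 0, 1, 2, 3]
Y_line = [2 * x + 1 for x in X]   # perfectly linear in X
Y_curve = [x * x for x in X]      # perfectly determined by X, but U-shaped

r_line = pearson_r(X, Y_line)
r_curve = pearson_r(X, Y_curve)
print(f"linear: r = {r_line:.2f}; curved: r = {r_curve:.2f}")
```

For the U-shaped data the positive and negative cross-products cancel, so r comes out at zero even though Y can be predicted exactly from X.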

Factors Influencing the Size of “r”

• Restriction of Range (Truncated range)

– If the correlation coefficient is calculated on a portion of the data, r will usually be smaller than had all data been used

[Figure: two scatter diagrams, “Higher value r” and “Lower value r”]
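The effect of a truncated range can be sketched with simulated data. The model below (Y equal to X plus random noise) and the cutoff of X > 1 are assumptions chosen only for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

random.seed(1)

# Hypothetical population: Y depends linearly on X plus noise.
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [x + random.gauss(0, 1) for x in xs]
r_full = pearson_r(xs, ys)

# Keep only the upper portion of the X range (for example, admitted students).
kept = [(x, y) for x, y in zip(xs, ys) if x > 1]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"full range: r = {r_full:.2f}; restricted range: r = {r_restricted:.2f}")
```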

Factors Influencing the Size of “r”

• Discontinuous distribution

– If the correlation coefficient is calculated on portions of the data that are separated, r will usually be higher than had all data been used

[Figure: two scatter diagrams, “Lower value r” and “Higher value r”]
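The opposite distortion can be sketched the same way. The simulation below assumes a hypothetical population (Y equal to X plus noise) and keeps only the two extreme groups, dropping the middle of the X range.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

random.seed(3)

# Hypothetical population: Y depends linearly on X plus noise.
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [x + random.gauss(0, 1) for x in xs]
r_full = pearson_r(xs, ys)

# Keep only the separated extremes (very low and very high X scores).
extremes = [(x, y) for x, y in zip(xs, ys) if x < -1 or x > 1]
r_extremes = pearson_r([x for x, _ in extremes], [y for _, y in extremes])

print(f"all data: r = {r_full:.2f}; extreme groups only: r = {r_extremes:.2f}")
```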

Factors Influencing the Size of “r”

• The correlation coefficient will adequately reflect the degree of association for a homoscedastic distribution across the entire range of scores, but not for a heteroscedastic distribution

[Figure: “Homoscedastic” and “Heteroscedastic” scatter diagrams; on the heteroscedastic diagram, r overestimates the degree of association at one point of the range and underestimates it at another]

Factors Influencing the Size of “r”

• Pooled data

– small samples may be combined if their means and standard deviations are similar, otherwise “spurious correlations” may occur

[Figure: two scatter diagrams, “Lower value r” and “Higher value r”]
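A spurious correlation from pooling can be shown with two tiny, invented samples. Within each sample X and Y are unrelated, but the samples have very different means, so the pooled data look strongly correlated.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Two hypothetical samples with no relationship inside either one.
group_a = [(1, 1), (1, 2), (2, 1), (2, 2)]
group_b = [(11, 11), (11, 12), (12, 11), (12, 12)]

def r_of(pairs):
    return pearson_r([x for x, _ in pairs], [y for _, y in pairs])

r_a = r_of(group_a)
r_b = r_of(group_b)
r_pooled = r_of(group_a + group_b)

print(f"group A: r = {r_a:.2f}; group B: r = {r_b:.2f}; pooled: r = {r_pooled:.2f}")
```

The pooled r is driven entirely by the gap between the two group means, not by any relationship within either group.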

Factors Influencing the Size of “r”

• Sampling Variability

– With large samples (e.g., n > 100), r is not greatly affected by sampling variability

– With small samples, r will vary considerably from sample to sample, so one must take sample size into consideration when interpreting r.

• Each of the previous factors indicates the need to consider the conditions under which the correlation coefficient is calculated when interpreting r
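Sampling variability can be sketched by drawing many samples from one hypothetical population and comparing how much r moves around at two sample sizes. The population model and the sample sizes of 10 and 200 are assumptions chosen for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def sample_r(n, rng):
    """Draw one sample of size n from the hypothetical population and return its r."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 1) for x in xs]
    return pearson_r(xs, ys)

rng = random.Random(2)
small_samples = [sample_r(10, rng) for _ in range(200)]
large_samples = [sample_r(200, rng) for _ in range(200)]

spread_small = max(small_samples) - min(small_samples)
spread_large = max(large_samples) - min(large_samples)
print(f"range of r with n = 10: {spread_small:.2f}; with n = 200: {spread_large:.2f}")
```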

Interpreting Strength of Association

• The correlation coefficient is not the best way to interpret the strength of the association between X and Y

– scale is not linear and, therefore, r = .60 (for example) is not twice as strong a relationship as r = .30

• The coefficient of determination is a better index of strength

– coefficient of determination - the proportion of variability in Y scores that can be explained by changes in X scores

– computed as $r^2$
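Applying this to the two correlations mentioned above shows why r itself is a poor index of strength:

$$r = .30 \;\Rightarrow\; r^2 = .09 \qquad\qquad r = .60 \;\Rightarrow\; r^2 = .36$$

Doubling r from .30 to .60 quadruples the proportion of variability in Y explained by X, from 9% to 36%.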

Regression

Prediction

• If two variables are correlated, you can predict Y from X with better than chance probability

• Given r < 1, there will be predictive error - the difference between the actual Y score and the predicted score (Y’) for a given value of X

– For example,

predicted GPA = 3.40

actual GPA = 2.78

error = 2.78 - 3.40 = -.62

• Predictive error = Y - Y’

Reducing Predictive Error

• Obviously, we would want our predictions to be as accurate as possible (i.e., have little predictive error)

• When $\sum (Y - Y')^2$ is a minimum, we have met the least squares criterion for the “best fitting straight line” called the regression line

• The regression line can be thought of as a “running mean”

– the means are estimated (i.e., what would be expected given a large number of observations for a given X value)

[Figure: regression line with labeled values Y’ = 2.31 at X = 425, Y’ = 2.78 at X = 650, and Y = 2.57]

Which Line is Best?

Given the scatter plot below, where would we place the regression line?

The Regression Equation

Fortunately, there is a simple way to determine precisely where the regression line should be placed so that the least squares criterion is met:

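$$Y' = r\left(\frac{S_Y}{S_X}\right)X - \left[r\left(\frac{S_Y}{S_X}\right)\bar{X}\right] + \bar{Y}$$

where X is the X score for which Y’ is being predicted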

• The regression equation is really nothing more than the equation for a straight line:

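Y = aX + b

where,

a = slope

b = y-intercept

$$Y' = \underbrace{r\left(\frac{S_Y}{S_X}\right)}_{\text{slope}} X \; \underbrace{- \left[r\left(\frac{S_Y}{S_X}\right)\bar{X}\right] + \bar{Y}}_{y\text{-intercept}}$$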

• As such, we can use the regression equation to predict Y from X
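The regression equation and the least squares criterion can be sketched together. The SAT and GPA values below are invented for illustration, and the function names are assumptions rather than anything from the slides; standard deviations use n in the denominator, matching the slides.

```python
from math import sqrt

# Hypothetical SAT scores (X) and college GPAs (Y), invented for illustration.
sat = [425, 500, 550, 600, 650, 700]
gpa = [2.1, 2.6, 2.4, 3.0, 2.8, 3.4]

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with n in the denominator."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / (len(xs) * sd(xs) * sd(ys))

def regression_line(xs, ys):
    """Slope r(Sy/Sx) and y-intercept -[r(Sy/Sx) * mean X] + mean Y."""
    slope = pearson_r(xs, ys) * sd(ys) / sd(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

def sum_squared_error(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

slope, intercept = regression_line(sat, gpa)
best = sum_squared_error(sat, gpa, slope, intercept)
print(f"Y' = {slope:.4f}X + {intercept:.2f}, sum of squared errors = {best:.4f}")

# Nudging the slope or the intercept in either direction increases the error.
nudges = [(0.001, 0.0), (-0.001, 0.0), (0.0, 0.2), (0.0, -0.2)]
for d_slope, d_intercept in nudges:
    other = sum_squared_error(sat, gpa, slope + d_slope, intercept + d_intercept)
    print(f"slope {d_slope:+}, intercept {d_intercept:+}: {other:.4f}")
```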

An Example

Consider the following data:

Batting Avg (X)    HR (Y)
.219               8
.287               11
.306               12
.315               15

$n = 4$, $\sum X = 1.127$, $\sum X^2 = .323191$, $\sum Y = 46$, $\sum Y^2 = 554$, $\sum XY = 13.306$

$\bar{X} = .28175$, $\bar{Y} = 11.5$, $S_X = .037612332$, $S_Y = 2.5$, $r = .918581710$

$Y' = 61.06X - 5.71$

For $X_{AVG} = .271$: $Y'_{HR} = 10.84$
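The figures in this example can be re-derived with a few lines of code. The sketch below uses only the four data pairs from the slide; the variable names are illustrative, and standard deviations use n in the denominator, which is what reproduces the slide's values.

```python
from math import sqrt

# Batting averages (X) and home runs (Y) from the example.
avg = [.219, .287, .306, .315]
hr = [8, 11, 12, 15]

n = len(avg)
mean_x, mean_y = sum(avg) / n, sum(hr) / n
s_x = sqrt(sum((x - mean_x) ** 2 for x in avg) / n)
s_y = sqrt(sum((y - mean_y) ** 2 for y in hr) / n)
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(avg, hr)) / (n * s_x * s_y)

# Regression equation: slope r(Sy/Sx), intercept mean Y minus slope times mean X.
slope = r * s_y / s_x
intercept = mean_y - slope * mean_x

# Predicted home runs for a .271 batting average.
predicted_hr = slope * 0.271 + intercept

print(f"S_X = {s_x:.9f}, S_Y = {s_y}, r = {r:.9f}")
print(f"Y' = {slope:.2f}X + ({intercept:.2f})")
print(f"predicted HR at .271 = {predicted_hr:.2f}")
```

Run without intermediate rounding, the intercept comes out near -5.70 rather than -5.71; the predicted value for a .271 average still rounds to 10.84.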

Regression to the Mean

• Any time r < 1.00, the Y’ values will cluster more towards the overall mean of Y ($\bar{Y}$)

• The tendency for Y’ values to move closer to $\bar{Y}$ is called regression to the mean

• At the extreme case where r = 0, all our Y’ values will equal $\bar{Y}$
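One way to see why this happens is to rearrange the regression equation into standard-score form. This is a short derivation from the equation given earlier, with $z$ denoting a standard score:

$$Y' - \bar{Y} = r\left(\frac{S_Y}{S_X}\right)(X - \bar{X}) \quad\Longrightarrow\quad \frac{Y' - \bar{Y}}{S_Y} = r\left(\frac{X - \bar{X}}{S_X}\right) \quad\Longrightarrow\quad z_{Y'} = r\,z_X$$

So a person who scores 2 standard deviations above the mean on X is predicted to score only $2r$ standard deviations above the mean on Y; with r = .50, for example, that is 1 standard deviation. With r = 0, $z_{Y'} = 0$ for every X, which is the same as saying every Y’ equals $\bar{Y}$.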

Measuring Predictive Error

• Since a predicted value is only a “best estimate,” we would like to know how much is the predictive error overall

• One way to measure the predictive error is to calculate the amount of variability of the Y scores around the regression line

• Standard error of estimate (prediction):

$$S_{YX} = \sqrt{\frac{\sum (Y - Y')^2}{n}}$$

Standard Error of Estimate

• The standard error of estimate is like a standard deviation, but one where the deviations are measured from the regression line and not the mean

Standard error of estimate:

$$S_{YX} = \sqrt{\frac{\sum (Y - Y')^2}{n}}$$

vs.

Standard deviation:

$$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

Standard Error of Estimate

• An easier formula is as follows:

$$S_{YX} = S_Y\sqrt{1 - r^2}$$

• As r decreases, $S_{YX}$ increases

[Figure: two scatter diagrams, “High value r” and “Low value r”]
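As a check that the easier formula agrees with the definition, the sketch below computes the standard error of estimate both ways on the batting-average data from the earlier example. Variable names are illustrative.

```python
from math import sqrt

# Batting averages (X) and home runs (Y) from the earlier example.
avg = [.219, .287, .306, .315]
hr = [8, 11, 12, 15]

n = len(avg)
mean_x, mean_y = sum(avg) / n, sum(hr) / n
s_x = sqrt(sum((x - mean_x) ** 2 for x in avg) / n)
s_y = sqrt(sum((y - mean_y) ** 2 for y in hr) / n)
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(avg, hr)) / (n * s_x * s_y)

slope = r * s_y / s_x
intercept = mean_y - slope * mean_x
predicted = [slope * x + intercept for x in avg]

# Definition: variability of the Y scores around the regression line.
s_yx_direct = sqrt(sum((y - p) ** 2 for y, p in zip(hr, predicted)) / n)

# Easier formula: S_Y * sqrt(1 - r^2).
s_yx_shortcut = s_y * sqrt(1 - r ** 2)

print(f"direct: {s_yx_direct:.4f}; shortcut: {s_yx_shortcut:.4f}")
```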

Confidence in Predictions

• We can also establish limits, with a specified probability, within which an individual’s actual score is likely to fall

– For example, given:

$Y' = 2.78$, $S_{YX} = .45$

Upper limit: $1.96(.45) + 2.78 = 3.66$

Lower limit: $-1.96(.45) + 2.78 = 1.90$

[Figure: distribution of actual GPA around $Y'_{GPA} = 2.78$ at SAT = 650, with 95% of scores falling between $-1.96(S_{YX})$ and $+1.96(S_{YX})$ of the predicted value]

Confidence in Predictions

• Given an SAT = 650, we can be 95% confident the individual’s actual GPA will fall between 1.90 and 3.66

• For such “confidence intervals” to make sense:

– the relationship between X and Y must be linear

– the bivariate distribution must be homoscedastic

– Y values must be normally distributed about Y’

– n > 100
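The limits in the GPA example can be computed with a small helper. This is a minimal sketch that assumes the four conditions above hold; the function name and the default multiplier of 1.96 (for 95%) are illustrative choices.

```python
def prediction_limits(y_pred, s_yx, z=1.96):
    """Return (lower, upper) limits around a predicted score.

    Assumes linearity, homoscedasticity, and normally distributed
    Y values about Y'; z = 1.96 corresponds to 95%.
    """
    margin = z * s_yx
    return y_pred - margin, y_pred + margin

lower, upper = prediction_limits(2.78, 0.45)
print(f"{lower:.2f} to {upper:.2f}")  # prints "1.90 to 3.66"
```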

Ordinal and Nominal Measures of Association

Spearman r

• When you have two ordinal variables (e.g., ranks of candidates from two admissions counselors), you can determine the degree of association between the variables with Spearman r

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

where,

D = difference between rankings

n = number of pairs of ranks
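A minimal sketch of the formula, assuming two admissions counselors have each ranked the same five candidates with no ties. The rankings are hypothetical.

```python
def spearman_rs(ranks_a, ranks_b):
    """Spearman r from two sets of untied ranks: 1 - 6*sum(D^2) / (n(n^2 - 1))."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five candidates by two counselors.
counselor_a = [1, 2, 3, 4, 5]
counselor_b = [2, 1, 4, 3, 5]

rs = spearman_rs(counselor_a, counselor_b)
print(f"r_s = {rs:.2f}")  # prints "r_s = 0.80"
```

Here the differences are -1, 1, -1, 1, and 0, so the sum of squared differences is 4 and r_s = 1 - 24/120.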

Spearman r

• In case of ties, it is usual to assign to each tied observation the mean rank of the ranks the tied observations would have otherwise occupied

– For example, if you cannot decide whether applicant #8 or applicant #3 should be your 7th choice, then assign each a rank of 7.5 since they would have been your 7th and 8th choices had you been able to decide

• It is best to have the judges avoid ties, but if ties persist, it is better to calculate Pearson r on the ranks and interpret the value as Spearman r corrected for ties
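The tie-handling procedure can be sketched as follows: assign mean ranks to tied observations, then compute Pearson r on the resulting ranks. The helper names and the judges' scores are hypothetical; rank 1 is given to the highest score.

```python
from math import sqrt

def midranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical scores from two judges; judge A cannot separate the last two applicants.
judge_a = [10, 9, 8, 7, 6, 5, 1, 1]
judge_b = [9, 10, 7, 8, 5, 6, 2, 1]

ranks_a = midranks(judge_a)   # the tied applicants each receive rank 7.5
ranks_b = midranks(judge_b)
rs_corrected = pearson_r(ranks_a, ranks_b)
print(ranks_a)
print(f"Spearman r corrected for ties = {rs_corrected:.2f}")
```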

Phi ($\phi$)

• When you have two true dichotomous variables (e.g., gender and employment), you can use $\phi$

              M         F
Employed      75 (A)    25 (B)
Unemployed    40 (C)    60 (D)

n = 200

$$\phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} = .35$$
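The calculation for this table can be written out as a short function. The function name is illustrative; the cell labels follow the table above.

```python
from math import sqrt

def phi(a, b, c, d):
    """Phi coefficient for a 2x2 table with cells A, B (top row) and C, D (bottom row)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Employed: 75 (A), 25 (B); Unemployed: 40 (C), 60 (D).
print(f"phi = {phi(75, 25, 40, 60):.2f}")  # prints "phi = 0.35"
```

Here AD - BC = 4500 - 1000 = 3500, and the denominator is the square root of (100)(100)(115)(85).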

Reliability

• The third major use of correlation is determining reliability - how consistently does a measuring instrument measure over time

• The most common is test-retest reliability in which a test is given at one time and, following some period (e.g., a week, month, year, etc.), the test is given a second time

• Other types of reliability include

– split-half

– alternate forms

Multiple Correlation and Regression

• Thus far we have examined the relationship between two variables, X and Y

• Multiple correlation and multiple regression examine the relationship between several X variables and a single Y variable (more commonly called “predictor” variables and the “criterion” variable)

• R = multiple correlation coefficient

• $R^2$ = proportion of variability in Y scores that can be explained by the combined predictors $X_i$

• $Y' = a + b_1X_1 + b_2X_2$
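For two predictors, the coefficients and $R^2$ can be obtained by least squares. The sketch below is one possible approach, solving the two normal equations on deviation scores directly; the function name and the data (SAT score and high school GPA predicting college GPA) are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares a, b1, b2 for Y' = a + b1*X1 + b2*X2, plus R squared."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero when the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    sse = sum((yi - (a + b1 * u + b2 * v)) ** 2 for u, v, yi in zip(x1, x2, y))
    r_squared = 1 - sse / sum(d * d for d in dy)
    return a, b1, b2, r_squared

# Hypothetical data: SAT (X1) and high school GPA (X2) predicting college GPA (Y).
sat = [425, 500, 550, 600, 650, 700]
hs_gpa = [2.8, 2.5, 3.1, 3.0, 3.6, 3.4]
college_gpa = [2.1, 2.6, 2.4, 3.0, 2.8, 3.4]

a, b1, b2, r_squared = fit_two_predictors(sat, hs_gpa, college_gpa)
print(f"Y' = {a:.3f} + {b1:.5f}*X1 + {b2:.3f}*X2, R^2 = {r_squared:.3f}")
```

With more than two predictors the same idea applies, but the normal equations are usually solved with a linear algebra routine rather than by hand.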