
Page 1: G89.2228 Lecture 8b

• Correlation: quantifying linear association between random variables

• Example: Okazaki’s inferences from a survey

• Review of Covariance

• Covariance and correlation

• Correlation as parameter

• Correlation in data analysis

• Correlation when one or more variables is binary

Page 2: Correlation

• The correlation coefficient is the best known measure of association between two variables
» It measures linear association
» It ranges from –1 (perfect inverse association), through 0 (no linear association), to +1 (perfect direct association)

• The correlation coefficient is also related to an important parameter of the bivariate normal distribution

Page 3: Example: Okazaki’s Inferences from a Survey

• Does self-construal account for the relation of adverse functioning with Asian status?

• Survey of 348 students (simple random sample)

• Self-reported Interdependence correlated .53 with self-reported Fear of Negative Evaluation

• Illustrative plot (simulated) of r = .53

[Figure: Bivariate Normal With .53 Correlation; simulated scatterplot of X vs. Y, both axes spanning roughly –4 to 4]

Page 4: Review of Covariance as a Statistical Concept

• We discussed covariance as a bivariate moment

• E[(X − μ_X)(Y − μ_Y)] = Cov(X,Y) = σ_XY is called the population covariance.

• Covariance provides an index of linear dependence of two variables

• It is an expectation that depends on the joint bivariate density of X and Y, f(X,Y).
» f(X,Y) says how likely any pair of values of X and Y is
» When X and Y are binary, f(X,Y) represents joint probabilities
» Scatterplots give an impression of the joint density
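The expectation over a joint density can be sketched concretely. The Python snippet below (illustrative only; the joint probabilities are hypothetical, not from the lecture) computes the population covariance of two binary variables directly from a joint probability table f(X,Y):

```python
# Population covariance computed directly from a joint probability table
# f(X, Y). Hypothetical binary example: f[(x, y)] is P(X = x, Y = y).
f = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal means mu_X = E[X] and mu_Y = E[Y]
mu_x = sum(x * p for (x, y), p in f.items())
mu_y = sum(y * p for (x, y), p in f.items())

# Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)], the expectation over the joint density
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in f.items())
print(round(cov, 4))  # 0.15: X and Y tend to take the same value, so Cov > 0
```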

Page 5: Interpreting Covariance as an Index of Linear Association

• When X and Y tend to increase together, Cov(X,Y)>0

• When high levels of X go with low levels of Y, Cov(X,Y)<0

• When X and Y are independent, Cov(X,Y) = 0.

• Note that there are cases where Cov(X,Y) takes the value zero even though X and Y are related nonlinearly.
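A classic illustration of that last point (a hypothetical discrete sketch, not from the lecture): let X be symmetric about zero and Y = X², so Y is perfectly determined by X, yet the covariance is exactly zero.

```python
# A nonlinear relation with zero covariance: X symmetric about 0, Y = X^2.
# Hypothetical discrete illustration with equally likely X values.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]       # Y is a deterministic function of X

n = len(xs)
mean_x = sum(xs) / n            # 0 by symmetry
mean_y = sum(ys) / n            # 2
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
print(cov)  # 0.0, even though Y depends on X exactly (just not linearly)
```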

[Figure: scatterplot divided into quadrants at the means of X and Y, with quadrants labeled by the sign of the deviation product: (+,+) and (–,–) quadrants contribute positively to the covariance, (–,+) and (+,–) negatively]

Page 6: Correlation and Covariance

• Besides noticing its sign and whether it is zero, it is difficult to interpret the absolute magnitude of covariance

• Note that Cov(X,Y) is bounded by V(X) and V(Y):

|Cov(X,Y)| ≤ Max[V(X), V(Y)]

• Correlation, Corr(X,Y), is a rescaled version of covariance that is bounded by –1 and +1
» It is the covariance of two variables that have variances of 1

ρ_XY = σ_XY / (σ_X σ_Y),  equivalently  σ_XY = ρ_XY σ_X σ_Y
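The "covariance of variables with variance 1" idea can be checked numerically: standardize each variable to z-scores and take the covariance of the z-scores. The data below are hypothetical, and population (divide by n) moments are used throughout.

```python
# Correlation as the covariance of standardized (variance 1) variables.
# Hypothetical data; population (1/n) moments for simplicity.
import math

x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)

def standardize(v):
    m = sum(v) / n
    sd = math.sqrt(sum((vi - m) ** 2 for vi in v) / n)
    return [(vi - m) / sd for vi in v]

zx, zy = standardize(x), standardize(y)
# Covariance of z-scores (each with mean 0, variance 1) equals Corr(X, Y)
r = sum(a * b for a, b in zip(zx, zy)) / n
print(round(r, 3))  # 0.837
```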

Page 7: Estimating Covariance

• Since covariance is simply the expected average product of deviations from the means of X and Y, we estimate it using an average of products of deviations in the sample:

(1/n) Σ_{i=1}^{n} (X_i − μ_X)(Y_i − μ_Y)

• If μ_X and μ_Y are not known, we use:

s_XY = [1/(n − 1)] Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ)

as an unbiased estimator
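The unbiased estimator above can be sketched in a few lines of Python (illustrative; the data values are hypothetical): the sum of deviation products is divided by n − 1 because the two means are themselves estimated from the sample.

```python
# A minimal sketch of the unbiased sample covariance s_XY: divide the sum
# of deviation products by n - 1 because the means are estimated.
def sample_cov(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar)
               for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
print(round(sample_cov(x, y), 4))  # 3.6667
```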

Page 8: Product Moment Estimate of Correlation

• The population correlation is defined as:

ρ_XY = σ_XY / (σ_X σ_Y)

• The sample product moment correlation is obtained by inserting the sample estimates of the moments:

r_XY = s_XY / (s_X s_Y)

e.g., .69 = 43.38 / (11.36 × 5.49)
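Plugging the slide's sample moments into r = s_XY / (s_X s_Y) reproduces the worked example (the value is about .696, which the slide reports to two digits as .69):

```python
# Reproducing the slide's worked example by plugging the sample moments
# into r = s_XY / (s_X * s_Y).
s_xy = 43.38   # sample covariance
s_x = 11.36    # sample standard deviation of X
s_y = 5.49     # sample standard deviation of Y

r = s_xy / (s_x * s_y)
print(round(r, 3))  # 0.696, reported on the slide as .69
```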

[Figure: Correlation scatterplot of CESD at 2 weeks (x-axis, 0 to 50) vs. CESD at 6 weeks (y-axis, 0 to 20)]

Page 9: Correlation as a Parameter

• Bivariate distribution functions describe not only the marginal distributions of each variable, but also the pattern of association between variables.

• The bivariate normal distribution function is parameterized by the means, variances and an index of linear association (covariance or correlation).

• In such cases, we can think about the population correlation, ρ (rho), as a parameter to be estimated.

• The estimate is obtained from a survey of multivariate normal observations.

• Product moment correlation (r) provides a reasonable (but biased) estimate of ρ; the adjusted estimate r_adj is less biased.

Page 10: Correlation as a Summary of Data

• Pearson product moment (PPM) correlations (r) can be computed as summaries of linear association even when the population parameter is not of central interest.

• If one or more variables are binary, r may be affected by the marginal variance
» Only under special conditions will r take the value 1 or –1
» r is related to test statistics
» When both variables are binary, the PPM correlation is called phi (φ)
» When one variable is binary, the PPM is called a point biserial correlation:

r_pb = t / √(t² + N − 2)
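The link between r_pb and the two-sample t statistic can be sketched directly from the formula above (the t and N values here are hypothetical, chosen only to show the conversion):

```python
import math

# Point-biserial correlation recovered from a two-sample t statistic via
# r_pb = t / sqrt(t^2 + N - 2). The t and N values here are hypothetical.
def r_pb_from_t(t, n):
    return t / math.sqrt(t ** 2 + n - 2)

print(round(r_pb_from_t(4.0, 102), 3))  # 0.371
```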

Page 11: Other Kinds of Correlation for Categorical Data

• Biserial, tetrachoric and polychoric correlations are alternatives to r that estimate what the bivariate normal correlation might have been if the categories had been formed by cutting a truly normal continuum into “High”, “Low” and so on.

• These estimates are often unstable, but they can be useful if the sample is large.

[2×2 table with cell counts a, b in the top row and c, d in the bottom row]
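With the 2×2 cell counts a, b, c, d, the phi coefficient from the previous slide has a standard closed form (equal to the Pearson r computed on 0/1 codes). The formula and counts below are a sketch not shown on the slide; the cell counts are hypothetical.

```python
import math

# Phi from the 2x2 cell counts a, b (top row) and c, d (bottom row);
# it equals the Pearson r computed on 0/1 codes. Counts are hypothetical.
def phi(a, b, c, d):
    return (a * d - b * c) / math.sqrt(
        (a + b) * (c + d) * (a + c) * (b + d))

print(phi(40, 10, 10, 40))  # 0.6
```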

Page 12: Example: ZZ1 and ZZ2 Continuous, CZ1, CZ2 Discrete

Descriptive Statistics

          Mean     Std. Deviation     N
  ZZ1   13.9008        2.9166        500
  ZZ2   19.7594        2.9285        500
  CZ1     .9080         .2893        500
  CZ2     .3300         .4707        500

Correlations

1.000 .596** .571** .472**

. .000 .000 .000

8.507 5.092 .482 .648

500 500 500 500

.596** 1.000 .325** .779**

.000 . .000 .000

5.092 8.576 .275 1.074

500 500 500 500

.571** .325** 1.000 .209**

.000 .000 . .000

.482 .275 .0837 .0284

500 500 500 500

.472** .779** .209** 1.000

.000 .000 .000 .

.648 1.074 .0284 .222

500 500 500 500

Pearson Correlation

Sig. (2-tailed)

Covariance

N

Pearson Correlation

Sig. (2-tailed)

Covariance

N

Pearson Correlation

Sig. (2-tailed)

Covariance

N

Pearson Correlation

Sig. (2-tailed)

Covariance

N

ZZ1

ZZ2

CZ1

CZ2

ZZ1 ZZ2 CZ1 CZ2

Correlation is significant at the 0.01 level (2-tailed).**.