G89.2228 Lecture 8b
• Correlation: quantifying linear association between random variables
• Example: Okazaki’s inferences from a survey
• Review of Covariance
• Covariance and correlation
• Correlation as parameter
• Correlation in data analysis
• Correlation when one or more variables is binary
Correlation
• The correlation coefficient is the best-known measure of association between two variables
» It measures linear association
» It ranges from –1 (perfect inverse association), through 0 (no linear association), to +1 (perfect direct association)
• The correlation coefficient is also related to an important parameter of the bivariate normal distribution
Example: Okazaki’s Inferences from a survey
• Does self-construal account for relation of adverse functioning with Asian status?
• Survey of 348 students (simple random sample)
• Self-reported Interdependence was correlated .53 with self-reported Fear of Negative Evaluation
• Illustrative plot (simulated) of r = .53
[Figure: simulated bivariate normal scatterplot with correlation .53; X on the horizontal axis, Y on the vertical axis]
Review of Covariance as Statistical Concept
• We discussed covariance as a bivariate moment
• E[(X − μ_X)(Y − μ_Y)] = Cov(X,Y) = σ_XY is called the population covariance
• Covariance provides an index of the linear dependence of two variables
• It is an expectation that depends on the joint bivariate density of X and Y, f(X,Y)
» f(X,Y) says how likely any pair of values of X and Y is
» When X and Y are binary, f(X,Y) represents joint probabilities
» Scatterplots give an impression of the joint density
Interpreting covariance as index of linear association
• When X and Y tend to increase together, Cov(X,Y)>0
• When high levels of X go with low levels of Y, Cov(X,Y)<0
• When X and Y are independent, Cov(X,Y) = 0.
• Note that there are cases where Cov(X,Y) takes the value zero even though X and Y are related nonlinearly
[Figure: scatterplot divided into quadrants at the means of X and Y, showing the sign of the deviation products in each quadrant: (+,+), (−,−), (−,+), (+,−)]
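The last bullet can be made concrete with a small numerical sketch: if X is symmetric about zero and Y = X², then Y depends perfectly on X, yet the deviation products cancel and the covariance is exactly zero.

```python
# A nonlinear relation with zero covariance: X symmetric about 0, Y = X^2.
xs = [-2, -1, 0, 1, 2]           # symmetric values of X
ys = [x ** 2 for x in xs]        # Y is an exact (nonlinear) function of X

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Sample covariance: average product of deviations (n - 1 in the divisor).
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
print(cov)  # 0.0 despite the exact dependence
```

Positive and negative deviation products balance exactly, so a zero covariance must not be read as "no relationship", only as "no linear relationship".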
Correlation and Covariance
• Besides noticing its sign and whether it is zero, it is difficult to interpret the absolute magnitude of a covariance
• Note that Cov(X,Y) is bounded by V(X) and V(Y): |Cov(X,Y)| ≤ Max[V(X), V(Y)]
• Correlation, Corr(X,Y), is a rescaled version of covariance that is bounded by –1 and +1
» It is the covariance of two variables that have variances of 1
» Corr(X,Y) = ρ_XY = σ_XY / (σ_X σ_Y), so σ_XY = ρ_XY σ_X σ_Y
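The claim that correlation is "the covariance of two variables that have variances of 1" can be checked directly: standardize X and Y to unit variance, take the covariance, and compare with the usual ratio definition. A sketch on simulated data (the slope 0.5 and sample size are arbitrary choices for illustration):

```python
import random

random.seed(1)

def mean(a):
    return sum(a) / len(a)

def sd(a):
    """Sample standard deviation (n - 1 divisor)."""
    m = mean(a)
    return (sum((v - m) ** 2 for v in a) / (len(a) - 1)) ** 0.5

def cov(a, b):
    """Sample covariance (n - 1 divisor)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def standardize(a):
    m, s = mean(a), sd(a)
    return [(v - m) / s for v in a]

# Simulated linearly related pair.
x = [random.gauss(0, 1) for _ in range(5000)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

r = cov(x, y) / (sd(x) * sd(y))              # ratio definition
r_std = cov(standardize(x), standardize(y))  # covariance after standardizing
# The two agree up to floating point, and both lie in [-1, 1].
```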
Estimating covariance
• Since covariance is simply the expected product of deviations from the means of X and Y, we estimate it using an average of products of deviations in the sample:
(1/n) Σᵢ (Xᵢ − μ_X)(Yᵢ − μ_Y)
• If μ_X and μ_Y are not known, we use
s_XY = (1/(n − 1)) Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ)
as an unbiased estimator
Product moment estimate of correlation
• The population correlation is defined as ρ_XY = σ_XY / (σ_X σ_Y)
• The sample product moment correlation is obtained by inserting the sample estimates of the moments: r_XY = s_XY / (s_X s_Y)
e.g., .69 = 43.38 / (11.36 × 5.49)
[Figure: correlation scatterplot of CES-D at 6 weeks (vertical axis, 0–20) against CES-D at 2 weeks (horizontal axis, 0–50)]
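The slide's arithmetic for the worked example can be reproduced directly from the quoted sample moments:

```python
# Sample moments quoted on the slide: covariance and the two standard deviations.
s_xy, s_x, s_y = 43.38, 11.36, 5.49

# Product moment correlation: r = s_XY / (s_X * s_Y), approximately .69.
r = s_xy / (s_x * s_y)
print(r)
```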
Correlation as a parameter
• Bivariate distribution functions describe not only the marginal distributions of each variable, but also the pattern of association between variables.
• The bivariate normal distribution function is parameterized by the means, variances and an index of linear association (covariance or correlation).
• In such cases, we can think about the population correlation, ρ (rho), as a parameter to be estimated.
• The estimate is obtained from a survey of multivariate normal observations.
• The product moment correlation (r) provides a reasonable (but slightly biased) estimate of ρ; the adjusted estimate r_adj is less biased.
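The slide does not give a formula for the adjusted estimate, so as an assumption the sketch below uses one common approximately-unbiased adjustment (an Olkin–Pratt-style correction); the lecture's r_adj may be defined differently:

```python
def r_adjusted(r, n):
    """Approximately unbiased estimate of rho.

    ASSUMPTION: Olkin-Pratt-style approximation r * (1 + (1 - r^2) / (2(n - 3)));
    the slide's exact definition of r_adj is not shown.
    """
    return r * (1 + (1 - r ** 2) / (2 * (n - 3)))

# With the survey's r = .53 and n = 348, the adjustment is tiny.
print(r_adjusted(0.53, 348))
```

The correction pushes r slightly away from zero, and it shrinks toward nothing as n grows, which is why r itself is usually an acceptable estimate in large samples.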
Correlation as a summary of data
• Pearson product moment (PPM) correlations (r) can be computed as summaries of linear association even when population parameter is not of central interest.
• If one or more variables are binary, r may be affected by the marginal variance
» Only under special conditions will r take the value 1 or –1
» r is related to test statistics
» When both variables are binary, the PPM correlation is called phi (φ), with χ² = N φ²
» When one variable is binary, the PPM is called a point biserial correlation, r_pb = t / √(t² + N − 2)
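The relations to test statistics in the bullets above can be written as two small helper functions, assuming the formulas r_pb = t / √(t² + N − 2) and χ² = N φ² from the slide:

```python
import math

def r_pb_from_t(t, n):
    """Point biserial correlation recovered from a two-sample t statistic:
    r_pb = t / sqrt(t^2 + N - 2)."""
    return t / math.sqrt(t ** 2 + n - 2)

def phi_from_chi2(chi2, n):
    """Phi coefficient from a 2x2 chi-square statistic: chi^2 = N * phi^2,
    so phi = sqrt(chi^2 / N) (sign information is lost by squaring)."""
    return math.sqrt(chi2 / n)
```

Note that inverting χ² = N φ² recovers only the magnitude of φ; its sign must be read from the 2×2 table itself.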
Other kinds of Correlation for categorical data
• Biserial, tetrachoric and polychoric correlations are alternatives to r that estimate what the bivariate normal correlation might have been if the categories had been formed by cutting a truly normal continuum into “High”, “Low”, and so on
• These estimates are often unstable, but they can be useful if the sample is large
[Figure: 2×2 tables with cell frequencies a, b, c, d, illustrating data dichotomized from a continuous scale]
Example: ZZ1 and ZZ2 Continuous, CZ1, CZ2 Discrete
Descriptive Statistics

         Mean      Std. Deviation   N
ZZ1      13.9008   2.9166           500
ZZ2      19.7594   2.9285           500
CZ1      .9080     .2893            500
CZ2      .3300     .4707            500

Correlations

                            ZZ1      ZZ2      CZ1      CZ2
ZZ1  Pearson Correlation    1.000    .596**   .571**   .472**
     Sig. (2-tailed)        .        .000     .000     .000
     Covariance             8.507    5.092    .482     .648
     N                      500      500      500      500
ZZ2  Pearson Correlation    .596**   1.000    .325**   .779**
     Sig. (2-tailed)        .000     .        .000     .000
     Covariance             5.092    8.576    .275     1.074
     N                      500      500      500      500
CZ1  Pearson Correlation    .571**   .325**   1.000    .209**
     Sig. (2-tailed)        .000     .000     .        .000
     Covariance             .482     .275     .0837    .0284
     N                      500      500      500      500
CZ2  Pearson Correlation    .472**   .779**   .209**   1.000
     Sig. (2-tailed)        .000     .000     .000     .
     Covariance             .648     1.074    .0284    .222
     N                      500      500      500      500

**. Correlation is significant at the 0.01 level (2-tailed).
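The printed output is internally consistent: every Pearson correlation equals the printed covariance divided by the product of the two standard deviations, r = s_XY / (s_X s_Y). A quick cross-check using a few cells of the tables above:

```python
# Standard deviations and covariances copied from the SPSS output above.
sd = {"ZZ1": 2.9166, "ZZ2": 2.9285, "CZ1": 0.2893, "CZ2": 0.4707}
cov = {("ZZ1", "ZZ2"): 5.092, ("CZ1", "CZ2"): 0.0284, ("ZZ2", "CZ2"): 1.074}

# Each ratio should reproduce the printed correlation to three decimals.
for (a, b), c in cov.items():
    r = c / (sd[a] * sd[b])
    print(a, b, round(r, 3))
```

This also illustrates the slide on binary variables: the CZ1–CZ2 correlation (.209) comes from a small covariance (.0284) scaled by the small standard deviations of the two binary variables.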