major points
DESCRIPTION
Major Points. Scatterplots The correlation coefficient Correlations on ranks Factors affecting correlations Testing for significance Intercorrelation matrices Other kinds of correlations. The Problem. Are two variables related? - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/1.jpg)
Major Points• Scatterplots• The correlation coefficient
– Correlations on ranks
• Factors affecting correlations• Testing for significance• Intercorrelation matrices• Other kinds of correlations
![Page 2: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/2.jpg)
The Problem
Are two variables related? Does one increase as the other
increases? e. g. skills and income Does one decrease as the other
increases? e. g. health problems and nutrition
How can we get a graphical representation of the degree of relationship?
![Page 3: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/3.jpg)
Relation between father and son’s height: Pearson, (1896)
Reliability
1
![Page 4: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/4.jpg)
Another dataset:Heart Disease and Cigarettes
• Landwehr & Watkins report data on heart disease and cigarette smoking in 21 developed countries
• Data have been rounded for computational convenience. The results were not affected.
![Page 5: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/5.jpg)
Scatterplot of Heart Disease
• CHD Mortality goes on y axis• Cigarette consumption on x axis• What does each dot represent?• Best fitting line included for clarity
![Page 6: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/6.jpg)
Cigarette Consumption per Adult per Day
12108642
CH
D M
orta
lity
per
10,0
00
30
20
10
0
{X = 6, Y = 11}
2
![Page 7: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/7.jpg)
Cigarette Consumption per Adult per Day
12108642
CH
D M
orta
lity
per
10,0
00
30
20
10
0
{X = 6, Y = 11}
3
![Page 8: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/8.jpg)
What Does the Scatterplot Show?
• As smoking increases, so does coronary heart disease mortality.
• Relationship looks strong• Not all data points on line.
This gives us “residuals” or “errors of prediction”
![Page 9: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/9.jpg)
Example Scatterplots
xx
x
x
x
x
x
x
x
x
x
x
xx
xx
x
x
x
x
x
x
y
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
yHigh correlation Low correlation
4
![Page 10: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/10.jpg)
Scatter plots:
0
0.2
0.4
0.6
0.8
1
1.2
0 0.2 0.4 0.6 0.8 1
r = .00
5
![Page 11: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/11.jpg)
0
0.2
0.4
0.6
0.8
1
1.2
0 0.2 0.4 0.6 0.8 1
r =.15
![Page 12: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/12.jpg)
0
0.2
0.4
0.6
0.8
1
1.2
0 0.2 0.4 0.6 0.8 1 1.2
r = .40
6
![Page 13: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/13.jpg)
0
0.2
0.4
0.6
0.8
1
1.2
0 0.2 0.4 0.6 0.8 1 1.2
r = .81
![Page 14: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/14.jpg)
0
0.2
0.4
0.6
0.8
1
1.2
0 0.2 0.4 0.6 0.8 1 1.2
r = .99
7
![Page 15: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/15.jpg)
0
20
40
60
80
100
120
0 20 40 60 80 100 120
r = -.79
Guessing correlations: from Rice University
10
![Page 16: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/16.jpg)
Another way to visualize a correlation
Variance in A
Variance in b
Variance in A
Variance in b
Covariance Covariance
11
![Page 17: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/17.jpg)
What is a Correlation Coefficient
• A measure of degree of relationship.• Sign refers to direction.• Based on covariance
• Measure of degree to which large scores go with large scores, and small scores with small scores
• Pearson’s correlation coefficient is most often used
![Page 18: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/18.jpg)
The DataThe DataCigarette Consumption and Coronary Heart Disease Mortality for 21 Countries
Cig. 11 9 9 9 8 8 8 6 6 5 5CHD 26 21 24 21 19 13 19 11 23 15 13
Cig. 5 5 5 5 4 4 4 3 3 3CHD 4 18 12 3 11 15 6 13 4 14
Cig. = Cigarettes per adult per dayCHD = Cornary Heart Disease Mortality per 10,000 population
Surprisingly, the U.S. it the first country on the list--the country with the highest consumption and highest mortality.
![Page 19: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/19.jpg)
Cig. CHD11 26
9 219 249 218 198 138 196 116 235 155 135 45 185 125 34 114 154 63 133 43 14
Cigarette Consumption and Coronary Heart Disease Mortality for 21 countries
Cigarette Consumption: per adult per dayCoronary Heart Disease: Mortality per 10,000 population
![Page 20: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/20.jpg)
Covariance• The formula
• Index of degree to which both list of numbers covary
• When would covXY be large and positive?• When would covXY be large and negative?
1))((
NYYXX
CovXY
![Page 21: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/21.jpg)
Calculation
• CovXY = 11.13
• sX = 2.33
• sY = 6.69
71.59.15
13.11
)69.6)(33.2(
13.11cov
YX
XY
SDSDr
![Page 22: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/22.jpg)
Correlation Coefficient
• Symbolized by r
• Covariance ÷ (product of st. dev.)
YX
XY
SDSD
Covr
![Page 23: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/23.jpg)
Correlation in a random sample
Generated 6 sets of random numbers (100each)The correlation Matrix
var1 var2 var3 var4 var5 var6var1 1.00var2 -0.02 1.00var3 0.08 -0.03 1.00var4 -0.10 -0.15 0.02 1.00var5 0.03 0.01 -0.11 0.22 1.00var6 0.00 0.19 -0.03 0.13 -0.15 1.00
![Page 24: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/24.jpg)
Factors Affecting r• Range restrictions
• Outliers
• Nonlinearity e.g. anxiety and performance
• Heterogeneous subsamples Everyday examples
![Page 25: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/25.jpg)
The effect of outliers on correlations
Dataset: 20 cases selected from darts and pros
DARTS
806040200-20-40
Pro
s80
60
40
20
0
-20
-40
r = .80
![Page 26: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/26.jpg)
Dataset: one case altered to give more extreme values
DARTS
Pro
s
r = .58
806040200-20-40
80
60
40
20
0
-20
-40
12
![Page 27: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/27.jpg)
Summary of effect of outliers
•A few extreme values can have extreme effects
•Especially when sample size is sample
•You cannot randomly toss out data! You need to have a theoretical or statistical justification
![Page 28: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/28.jpg)
Restriction of range: Countries With Low Consumptions
Data With Restricted Range
Truncated at 5 Cigarettes Per Day
Cigarette Consumption per Adult per Day
5.55.04.54.03.53.02.5
CH
D M
ort
alit
y p
er
10
,00
0
20
18
16
14
12
10
8
6
4
2
![Page 29: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/29.jpg)
R between between grades in high school and grades in college.
Scatter plot for 250 students who vary on High School GPA
Scatter plot for students who have GPA equal to or greater than 3.5
![Page 30: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/30.jpg)
•no effect on Pearson's correlation coefficient.
•Example: r between height and weight is the same regardless of whether height is measured in inches, feet, centimeters or even miles.
•This is a very desirable property since choice of measurement scales that are linear transformations of each other is often arbitrary.
Effect of linear transformations of data
![Page 31: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/31.jpg)
An example: •Scores on the Scholastic Aptitude Test (SAT) range from 200-800.
•200 to 800 is an arbitrary range.
•You could subtract 100 points from each score and multiply each score by 3. Scores on the SAT would then range from 300-2100. Test would remain the same.
•r between SAT and some other variable (such as college grade point average) would not be affected by this linear transformation.
![Page 32: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/32.jpg)
Non linear relationships
Example: Anxiety and Performance
0
2
4
6
8
10
12
14
16
-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
r = .07
13
![Page 33: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/33.jpg)
The interpretation of a correlation coefficient
• Ranges from –1 to 1• No correlation in the data means you
will get a is 0 r or near it• Suffers from sampling error (like
everything else!). So you need to estimate true population correlation from the sample correlation.
![Page 34: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/34.jpg)
• Correlations in the sample differ from the correlations in the population by some amount (sampling error)
• Sometimes it is higher than population correlation, sometimes it is lower, rarely on the target.
• How do you know when to accept and when to reject correlation?
![Page 35: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/35.jpg)
Possible ways to decide
• Accept it if it fits your hypothesis, reject it otherwise!
• Toss a coin
• Democratically: Ask your officemates to vote.
![Page 36: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/36.jpg)
Fisherian Statistics: Null and Alternative Hypothesis
• Sampling error implies that sometimes the results we obtain will be due to chance (since not every sample will accurately resemble the population)
• The null hypothesis expresses the idea that an observed difference is due to chance.
• For example: There is no difference between the norms regarding the use of email and voice mail
![Page 37: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/37.jpg)
• The alternative hypothesis (the experimental hypothesis) is often the one that you formulate: there is a correlation between people’s perception of a website’s reliability and the probability of their buying something on the site
• Why bother to have a null hypothesis?– Can you reject the null hypothesis
The alternative hypothesis
![Page 38: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/38.jpg)
An Example
• Relationship between browsing and buying on an electronic commerce site
• Data gathered from server logs
• Hypothesis: Those who browse longer also tend to purchase
• Hypothesis can be framed in another way: There is no relationship between time spent browsing and likelihood of purchase (Null Hypothesis)
![Page 39: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/39.jpg)
Testing the significance of a r
• Population parameter = • Null hypothesis H0: = 0
What would a true null mean here? What would a false null mean here?
• Alternative hypothesis (H1)
![Page 40: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/40.jpg)
Tables of Significance
• Table in Appendix E.2
• For N - 2 = 19 df, rcrit = .433
• Our correlation > .433
• Reject H0 Correlation is significant. More cigarette consumption
associated with more CHD mortality.
![Page 41: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/41.jpg)
SPSS Printout
• SPSS Printout gives test of significance. Double asterisks with footnote
indicate p < .01.
![Page 42: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/42.jpg)
SPSS Printout
Correlations
1.000 .138
. .530
23 23
.138 1.000
.530 .
23 23
Pearson Correlation
Sig. (2-tailed)
N
Pearson Correlation
Sig. (2-tailed)
N
LIFEEXP
EXPEND
LIFEEXP EXPEND
![Page 43: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/43.jpg)
LIFEEXP
747372717069686766
EX
PE
ND
1600
1400
1200
1000
800
600
400
200
Sweden
United_S
France
NorwayCanadaDenmark
Australi NetherlaGermany SwitzerlFinland
ItalyBelgiumNew_ZealAustria Eng-WaleIrelandLuxembou
JapanSpain
IcelandPortugalGreece
SPSS printout for scatterplot
![Page 44: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/44.jpg)
OPTIM
RELINFL
RELINV
RELHOPE
A matrix of scatterplots
Correlation is significant at the 0.01 level (2-tailed).**.
1.000 .272** .167** .266**
.272** 1.000 .449** .419**
.167** .449** 1.000 .544**
.266** .419** .544** 1.000
OPTIM
RELINFL
RELINV
RELHOPE
OPTIM RELINFL RELINV RELHOPE
![Page 45: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/45.jpg)
A review of Scatterplots
next three slides• Infant mortality and number of physicians• Life expectance and health care
expenditures• Cancer rate and solar radiation
![Page 46: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/46.jpg)
Figure 9.1
Infant Mortaility and Number of Physicians
Physicians per 100,000 Population
201816141210
Infa
nt
Mo
rta
lity
10
8
6
4
2
0
-2
-4
-6
![Page 47: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/47.jpg)
Figure 9.2
Life Expectancy and Health Care Costs
Health Care Expenditures
1600140012001000800600400200
Life
Exp
ect
an
cy (
Ma
les)
74
73
72
71
70
69
68
67
66
![Page 48: Major Points](https://reader035.vdocument.in/reader035/viewer/2022062422/56813ac5550346895da2d633/html5/thumbnails/48.jpg)
Figure 9.3
Cancer Rate and Solar Radiation
Solar Radiation
600500400300200
Bre
ast
Ca
nce
r R
ate
34
32
30
28
26
24
22
20