Linear Correlations
Vignon S. Oussa
December 7, 2011
Vignon S. Oussa () Linear Correlations December 7, 2011 1 / 8
Content
Introduction
Measure of dispersion
Concept of linear correlation of variables
Computations in Mathematica
Misc
Questions
What does it mean when we say smoking causes lung cancer?
Does it mean you will get cancer if you smoke a single cigarette?
Some people smoke for several years and never develop lung cancer.
Statistical meaning: you are much more likely to get lung cancer if you smoke than if you don't smoke.
Studies have shown a clear correlation between smoking and lung cancer.
These studies are based on statistics.
Definition. A correlation exists between two variables when higher values of one variable consistently go with higher values of another, or when higher values of one variable consistently go with lower values of another.
Example. There is a correlation between the variables height and weight for people. Taller people tend to weigh more than shorter people.
Example. There is a correlation between the variables demand for apples and price of apples; demand tends to be lower when the price is higher.
Example. There is a correlation between practice time and skill among piano players; those who practice more tend to be more skilled.
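Let $x_1, \dots, x_N$ be the $N$ measurements observed in a population of $N$ individuals with mean $\mu$.
Population mean (the average): $$\mu = \frac{x_1 + \cdots + x_N}{N}$$
Population variance: $$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Population standard deviation: $$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$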
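Now let $x_1, \dots, x_n$ be $n$ observations in a sample from the original population, with mean $\bar{x}$.
Sample mean: $$\bar{x} = \frac{x_1 + \cdots + x_n}{n}$$
Sample variance: $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Sample standard deviation: $$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$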
Some Sample data
| Company | Revenue in 1998 |
|---|---|
| GM | 178.1 |
| Wal-Mart | 119.3 |
| IBM | 78.5 |
| Boeing | 45.8 |
Sample mean = 105.425
Sample variance = 3251.42
Sample standard deviation = 57.02
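The three summary values above can be checked with a short script. This is a minimal sketch in Python using only the standard `statistics` module (the slides mention Mathematica for computations; Python is an assumption made here for illustration). The variable names `revenue`, `sample_mean`, `sample_var`, and `sample_sd` are hypothetical.

```python
from statistics import mean, stdev, variance

# 1998 revenues from the sample-data slide (units as given on the slide).
revenue = {"GM": 178.1, "Wal-Mart": 119.3, "IBM": 78.5, "Boeing": 45.8}
values = list(revenue.values())

sample_mean = mean(values)     # (x_1 + ... + x_n) / n
sample_var = variance(values)  # sum of squared deviations divided by n - 1
sample_sd = stdev(values)      # square root of the sample variance

print(f"sample mean               = {sample_mean:.3f}")
print(f"sample variance           = {sample_var:.2f}")
print(f"sample standard deviation = {sample_sd:.2f}")
```

Note that `variance` and `stdev` divide by n - 1, matching the sample formulas; the population versions in the same module are `pvariance` and `pstdev`.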
Definition. Bivariate data are data in which two variables are measured on each individual.
We are interested in the fundamental question:
Problem. How can the value of one variable be used to predict the value of the other variable?
Scatter Diagrams and Correlation
Objectives
1. Draw and interpret scatter diagrams
2. Describe the properties of the linear correlation coefficient
3. Compute and interpret the linear correlation coefficient
4. Determine whether a linear relation exists between two variables
5. Explain the difference between correlation and causation
1. Draw and Interpret Scatter Diagrams
The response variable is the variable whose value can be explained by the value of the explanatory or predictor variable.
A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.
EXAMPLE Drawing and Interpreting a Scatter Diagram
The data for this example are based on a study of rock drilling. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, depth at which drilling begins is the explanatory variable, x, and time (in minutes) to drill five feet is the response variable, y. Draw a scatter diagram of the data.
Source: Penner, R., and Watts, D.G. “Mining Information.” The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.
[Figure: scatter diagram "Drilling Time versus Depth"; horizontal axis: Depth]
Various Types of Relations in a Scatter Diagram
[Figure: five scatter diagrams of Response versus Explanatory: (a) Linear, (b) Linear, (c) Nonlinear, (d) Nonlinear, (e) No relation]
2. Describe the Properties of the Linear Correlation Coefficient
The linear correlation coefficient, or Pearson product moment correlation coefficient, is a measure of the strength and direction of the linear relation between two quantitative variables. We use the Greek letter ρ (rho) to represent the population correlation coefficient and r to represent the sample correlation coefficient. We present only the formula for the sample correlation coefficient.
Sample Linear Correlation Coefficient
$$r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1} \qquad (1)$$
where $\bar{x}$ is the sample mean of the explanatory variable,
$s_x$ is the sample standard deviation of the explanatory variable,
$\bar{y}$ is the sample mean of the response variable,
$s_y$ is the sample standard deviation of the response variable, and
$n$ is the number of individuals in the sample.
Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always between -1 and 1, inclusive. That is, -1 ≤ r ≤ 1.
2. If r = +1, then a perfect positive linear relation exists between the two variables. See Figure 4(a).
3. If r = -1, then a perfect negative linear relation exists between the two variables. See Figure 4(d).
4. The closer r is to +1, the stronger is the evidence of positive association between the two variables. See Figures 4(b) and 4(c).
5. The closer r is to -1, the stronger is the evidence of negative association between the two variables. See Figures 4(e) and 4(f).
6. If r is close to 0, then little or no evidence exists of a linear relation between the two variables. Because the linear correlation coefficient is a measure of the strength of the linear relation, r close to 0 does not imply no relation, just no linear relation. See Figures 4(g) and 4(h).
7. The linear correlation coefficient is a unitless measure of association. So the unit of measure for x and y plays no role in the interpretation of r.
8. The correlation coefficient is not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient.
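Properties 6, 7, and 8 can be illustrated numerically. The following is a minimal sketch in Python, assuming small made-up data sets (they are not from the slides); the helper `pearson_r` is a hypothetical name and computes r as the sum of the products of z-scores divided by n - 1.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Property 6: a perfect but nonlinear relation (y = x^2) can have r = 0.
xs_sym = [-3, -2, -1, 0, 1, 2, 3]
r_parabola = pearson_r(xs_sym, [x * x for x in xs_sym])

# Property 7: changing the unit of x (feet to meters) leaves r unchanged.
feet = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r_feet = pearson_r(feet, ys)
r_meters = pearson_r([0.3048 * f for f in feet], ys)

# Property 8: one observation off the pattern can change r substantially.
r_line = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])            # points on a line
r_outlier = pearson_r([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 0])   # one outlier added

print(f"parabola: r = {r_parabola:.3f}")
print(f"feet: r = {r_feet:.6f}, meters: r = {r_meters:.6f}")
print(f"line: r = {r_line:.3f}, with outlier: r = {r_outlier:.3f}")
```

In this made-up example the single added point (6, 0) pulls r from 1 down to 1/7, about 0.143.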
[Figure 4: scatter diagrams illustrating (a) perfect positive linear relation, r = 1; (b) strong positive linear relation, r = 0.9; (c) moderate positive linear relation, r = 0.4; (d) perfect negative linear relation, r = -1; (e) strong negative linear relation, r = -0.9; (f) moderate negative linear relation, r = -0.4; (g) no linear relation, r close to 0; (h) no linear relation, r close to 0]
3. Compute and Interpret the Linear Correlation Coefficient
EXAMPLE Determining the Linear Correlation Coefficient
Determine the linear correlation coefficient of the drilling data.
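| Depth, x | Time, y | $\frac{x_i - \bar{x}}{s_x}$ | $\frac{y_i - \bar{y}}{s_y}$ | $\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ |
|---|---|---|---|---|
| 35 | 5.88 | -1.74712 | -1.41633 | 2.474501 |
| 50 | 5.99 | -1.45992 | -1.27544 | 1.862051 |
| 75 | 6.74 | -0.98126 | -0.31486 | 0.308958 |
| 95 | 6.1 | -0.59833 | -1.13456 | 0.678839 |
| 120 | 7.47 | -0.11967 | 0.620111 | -0.07421 |
| 130 | 6.93 | 0.0718 | -0.07151 | -0.00513 |
| 145 | 6.42 | 0.358998 | -0.72471 | -0.26017 |
| 155 | 7.97 | 0.550463 | 1.260501 | 0.693859 |
| 160 | 7.92 | 0.646196 | 1.196462 | 0.773149 |
| 175 | 7.62 | 0.933394 | 0.812228 | 0.758129 |
| 185 | 6.89 | 1.12486 | -0.12274 | -0.13807 |
| 190 | 7.9 | 1.220592 | 1.170846 | 1.429126 |
| $\bar{x}$ = 126.25 | $\bar{y}$ = 6.985833 | | | Sum = 8.501037 |

$s_x = 52.22874$, $s_y = 0.780774$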
$$r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1} = \frac{8.501037}{12 - 1} = 0.773$$
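The drilling computation can be reproduced step by step. The following is a minimal sketch in Python using the standard `statistics` module; the data are the depth and time values from the drilling example, while the variable names (`depth`, `time_min`, `z_x`, `z_y`) are hypothetical choices for illustration.

```python
from statistics import mean, stdev

# Drilling data: depth at which drilling begins (x) and
# time in minutes to drill five feet (y).
depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
time_min = [5.88, 5.99, 6.74, 6.1, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.9]

n = len(depth)
x_bar, y_bar = mean(depth), mean(time_min)
s_x, s_y = stdev(depth), stdev(time_min)  # sample standard deviations (n - 1)

# Standardize each observation, multiply the z-scores pairwise, and sum.
z_x = [(x - x_bar) / s_x for x in depth]
z_y = [(y - y_bar) / s_y for y in time_min]
sum_of_products = sum(zx * zy for zx, zy in zip(z_x, z_y))

r = sum_of_products / (n - 1)

print(f"x_bar = {x_bar}, s_x = {s_x:.5f}")
print(f"y_bar = {y_bar:.6f}, s_y = {s_y:.6f}")
print(f"sum of products = {sum_of_products:.6f}")
print(f"r = {r:.3f}")
```

The printed values should agree with the table entries up to rounding.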
4. Determine Whether a Linear Relation Exists between Two Variables
Does a Linear Relation Exist?
Since r = 0.773 is fairly close to +1, we can say there is a positive linear relation between time to drill five feet and depth at which drilling begins.
5. Explain the Difference between Correlation and Causation
According to data obtained from the Statistical Abstract of the United States, the correlation between the percentage of the female population with a bachelor’s degree and the percentage of births to unmarried mothers since 1990 is 0.940.
Does this mean that a higher percentage of females with bachelor’s degrees causes a higher percentage of births to unmarried mothers?
Certainly not! The correlation exists only because both percentages have been increasing since 1990. It is this relation that causes the high correlation. In general, time series data (data collected over time) will have high correlations because each variable is moving in a specific direction over time (both going up or down over time; one increasing, while the other is decreasing over time).
When data are observational, we cannot claim a causal relation exists between two variables. We can only claim causality when the data are collected through a designed experiment.
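The point about time series can be illustrated with made-up numbers. The following is a minimal sketch in Python, assuming two hypothetical series that each equal an upward trend plus a small fluctuation; the fluctuations are constructed to be uncorrelated with each other, so the only thing the series share is that both rise over time. None of these values come from the Statistical Abstract.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

years = range(10)  # ten hypothetical consecutive years

# Hypothetical fluctuations, chosen so that they are uncorrelated.
wiggle_a = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
wiggle_b = [0.2, 0.2, -0.2, -0.2, 0.2, 0.2, -0.2, -0.2, 0.2, 0.2]

# Each series is an upward trend plus its own fluctuation.
series_a = [20 + 1.0 * t + w for t, w in zip(years, wiggle_a)]
series_b = [25 + 0.8 * t + w for t, w in zip(years, wiggle_b)]

r_series = pearson_r(series_a, series_b)   # driven by the shared upward trend
r_wiggles = pearson_r(wiggle_a, wiggle_b)  # what remains without the trend

print(f"r between the two series:       {r_series:.3f}")
print(f"r between the two fluctuations: {r_wiggles:.3f}")
```

In this sketch the two series are highly correlated even though, apart from the trend, they have nothing in common.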
Another way that two variables can be related even though there is not a causal relation is through a lurking variable.
A lurking variable is related to both the explanatory and response variable.
For example, ice cream sales and crime rates have a very high correlation. Does this mean that local governments should shut down all ice cream shops? No! The lurking variable is temperature. As air temperatures rise, both ice cream sales and crime rates rise.