Chapter 3 – Examining Relationships
![Page 1: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/1.jpg)
Chapter 3 – Examining Relationships
![Page 2: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/2.jpg)
Scatterplots and Correlation - 3.1
![Page 3: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/3.jpg)
Scatterplots: Show the relationship between two variables.
Explanatory Variable: The variable on the x-axis. It influences the response.
Response Variable: The variable on the y-axis. It responds to the explanatory variable.
![Page 4: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/4.jpg)
Looking at Scatterplots:
• Direction: Positive: as x increases, y increases. Negative: as x increases, y decreases.
• Form: Is there a linear relationship between the two variables?
• Strength: Do the points follow a single stream that is tight to the line or is there considerable spread (or variability) around the line?
![Page 5: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/5.jpg)
Calculator Tip: Scatterplots
L1: Explanatory Variable
L2: Response Variable
Use statplot to graph
![Page 6: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/6.jpg)
Example #1: Suppose you were to collect data for each pair of variables below. Which variable is the explanatory and which is the response? Determine the likely direction and strength of the relationship.
1. T-shirts at a store: Price of each, Number Sold
x (explanatory): Price of shirt
y (response): # sold
D: negative
S: strong
(Sketch: price of shirt from $5 to $50 on the x-axis, # sold from 1 to 100 on the y-axis.)
![Page 7: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/7.jpg)
2. Drivers: Reaction Time, Blood Alcohol Level
x (explanatory): BAC
y (response): Reaction time
D: positive
S: strong
(Sketch: BAC from .01 to .5 on the x-axis, reaction time from 1 to 10 on the y-axis.)
![Page 8: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/8.jpg)
3. Cars: Age of Owner, Weight of the Car
Makes no sense! Neither variable explains the other, so there is no explanatory or response variable.
![Page 9: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/9.jpg)
Example #2: In a study of whether a relationship exists between a child's aptitude and the age at which he/she first speaks, researchers recorded the age (in months) of a child's first speech and the child's score on an aptitude test. The data for these 21 children follow:
Make a scatterplot and describe the relationship in the context of the problem.
![Page 10: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/10.jpg)
D: positive
F: curved
S: moderate
![Page 11: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/11.jpg)
Correlation (“r”):
Measures the direction and strength of the linear relationship between two variables.
Both variables must be quantitative.
![Page 12: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/12.jpg)
Attributes of the Correlation
1. The correlation coefficient is a unitless measurement, denoted with the letter r, and has values between -1 and 1.
2. When r = 1 all the data points form a perfect straight line relationship with a positive slope.
3. When r = -1 all the data points form a perfect straight line relationship with a negative slope.
![Page 13: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/13.jpg)
Attributes of the Correlation
4. Values of r close to 0 mean that the linear relationship is weak. There may be a general linear trend, but there is a lot of variability around that trend.
5. When r = 0 there is no linear relationship between the two variables. In other words, the best-fitting line has a slope of zero.
![Page 14: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/14.jpg)
Attributes of the Correlation
6. Outliers have a large influence on the correlation coefficient. The correlation is NOT resistant to outliers.
7. Correlation does not describe curved relationships! (ONLY LINEAR)
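Attributes 6 and 7 can be checked numerically. The sketch below is only an illustration: the data sets are made up, and the helper `correlation` is a hypothetical name that computes r from sums of deviations, which is algebraically the same as the z-score formula given later in this chapter.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson's r from sums of deviations (equivalent to averaging z-score products)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up data: y depends perfectly on x, but along a curve (y = x^2).
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)   # essentially 0: r misses the curve

# Made-up data: a perfect line, then the same line plus one outlier.
line_x = [1, 2, 3, 4, 5]
line_y = [2, 4, 6, 8, 10]
r_line = correlation(line_x, line_y)                  # essentially 1
r_outlier = correlation(line_x + [6], line_y + [0])   # about 0.14

print(r_curve, r_line, r_outlier)
```

A single added point pulls r from 1 down to roughly 0.14, and a perfect curved relationship gives r = 0, which is why a scatterplot should always be examined alongside r.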
![Page 15: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/15.jpg)
Types of Correlation:
(Six example scatterplots, labeled r = 0, r = -0.3, r = 0.5, r = -0.7, r = 0.9, and r = -0.99.)
![Page 16: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/16.jpg)
Example #3: What is wrong with the following statements?
1. There is a strong correlation between the gender of American workers and their income.
Gender is categorical, and correlation requires two quantitative variables.
![Page 17: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/17.jpg)
2. We found a high correlation (r = 1.09) between students’ rating of faculty teaching and ratings made by other faculty members.
r can’t be bigger than 1.
![Page 18: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/18.jpg)
3. We found a very weak correlation (r = -0.95) which suggests little relationship between income and hours spent at casinos.
r = -0.95 is a strong negative relationship, not a weak one.
![Page 19: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/19.jpg)
4. We found a very weak correlation (r = 0.01) which suggests little relationship between age and death rate.
Age and death rate should have a very strong relationship! An r near 0 only rules out a linear relationship.
![Page 20: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/20.jpg)
Guidelines: How strong is the linear relationship?
0 < r < 0.3 = weak positive; -0.3 < r < 0 = weak negative
0.4 < r < 0.7 = moderate positive; -0.7 < r < -0.4 = moderate negative
0.8 < r < 1 = strong positive; -1 < r < -0.8 = strong negative
![Page 21: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/21.jpg)
HOW TO CALCULATE THE CORRELATION COEFFICIENT
Remember how to calculate the z-score? We used this calculation to determine how many standard deviations an observation was from the mean.
RECALL:
$z\text{-score} = z = \dfrac{x - \text{mean}}{\text{standard deviation}}$
![Page 22: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/22.jpg)
In this case, we were only concerned with one variable.
Now, we are considering two variables and each must be standardized.
![Page 23: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/23.jpg)
Notation:
r = correlation
$\bar{x}$ = sample mean of the x's
$x_i$ = the i-th observation of the x's
$S_x$ = sample standard deviation of the x's
$\bar{y}$ = sample mean of the y's
$y_i$ = the i-th observation of the y's
$S_y$ = sample standard deviation of the y's
n = total number of observations
![Page 24: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/24.jpg)
FORMULA:
$r = \dfrac{1}{n-1}\sum\left(\dfrac{x_i - \bar{x}}{S_x}\right)\left(\dfrac{y_i - \bar{y}}{S_y}\right)$
![Page 25: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/25.jpg)
Calculator Tip: Correlation
L1: Explanatory Variable
L2: Response Variable
Stat-calc-LinReg(a+bx), L1, L2
(make sure your diagnostic is on!!!)
![Page 26: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/26.jpg)
Example #4:
Speed (x): 20 30 40
MPG (y): 25 35 45
Step #1: Find the following summary statistics:
n = 3
SPEED: $\bar{x}$ = 30, $s_x$ = 10
MPG: $\bar{y}$ = 35, $s_y$ = 10
![Page 27: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/27.jpg)
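Step #2: Calculate z-scores
SPEED: $Z(x_1) = \dfrac{20 - 30}{10} = -1$, $Z(x_2) = \dfrac{30 - 30}{10} = 0$, $Z(x_3) = \dfrac{40 - 30}{10} = 1$
MPG: $Z(y_1) = \dfrac{25 - 35}{10} = -1$, $Z(y_2) = \dfrac{35 - 35}{10} = 0$, $Z(y_3) = \dfrac{45 - 35}{10} = 1$
PRODUCT: $Z(x_1)Z(y_1) = 1$, $Z(x_2)Z(y_2) = 0$, $Z(x_3)Z(y_3) = 1$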
![Page 28: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/28.jpg)
Step #3: Calculate the Correlation
$r = \dfrac{1}{3 - 1}(1 + 0 + 1)$
$r = \dfrac{1}{2}(2)$
$r = 1$
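The three steps of Example #4 can be sketched in Python as a check on the hand calculation. The helper names (`mean`, `sample_sd`, `z_scores`) are illustrative choices, not part of the slides; the data are the Speed/MPG values from the example.

```python
from math import sqrt

speed = [20, 30, 40]   # x, from the example
mpg = [25, 35, 45]     # y, from the example

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    """Standardize each value: (value - mean) / standard deviation."""
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

# Step 1: summary statistics
n = len(speed)

# Step 2: z-scores and their products
z_x = z_scores(speed)
z_y = z_scores(mpg)
products = [zx * zy for zx, zy in zip(z_x, z_y)]

# Step 3: r is the sum of the products divided by n - 1
r = sum(products) / (n - 1)
print(z_x, z_y, products, r)
```

Because the three points lie exactly on a line with positive slope, r comes out as 1, matching attribute 2 of the correlation.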
![Page 29: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/29.jpg)
3.2 – Least-Squares Regression
![Page 30: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/30.jpg)
Regression line: straight line that describes the linear relationship between an explanatory variable and a response variable.
![Page 31: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/31.jpg)
LEAST SQUARES REGRESSION LINE:
• This is the best-fitting line to the data.
• The goal is to minimize the sum of the squared (vertical) distances of your observations (data) from your line.
• Again, we must square the distances (like the calculation of the variance) because some data points will be above the line (positive) and some will be below the line (negative), and they would cancel each other out. So to compensate, they are squared.
![Page 32: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/32.jpg)
We can use this line to predict a response, y, from a given explanatory variable, x.
![Page 33: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/33.jpg)
Remember graphing?
Slope-intercept formula for a line:
y = mx + b, where m = slope and b = y-intercept
Do you remember the SLOPE?
slope = rise / run = $\dfrac{\Delta y}{\Delta x}$
In statistics, we write it $\hat{y} = a + bx$
1. Slope: $b = r\dfrac{S_y}{S_x}$ (Calculate this first!)
2. Y-intercept: $a = \bar{y} - b\bar{x}$
![Page 34: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/34.jpg)
Example #1: Wildlife researchers monitor many wildlife populations by taking aerial photographs in order to estimate the weights of alligators. Here is the regression line of the weights of adult alligators (in pounds) and their lengths (in inches) based on the data collected from captured alligators.
Predicted Weight = -393 + 5.9(length)
1. What is the slope of the line? What does it mean?
b = 5.9
For each additional inch in length, the predicted weight increases by 5.9 pounds.
![Page 35: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/35.jpg)
2. What is the y-intercept of the line? What does it mean?
a = -393
If an alligator is 0 inches long, then its predicted weight is -393 lbs. This makes no sense!
![Page 36: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/36.jpg)
3. Describe the relationship between weight and length of alligators.
There is a positive linear relationship: as length increases, predicted weight increases.
![Page 37: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/37.jpg)
4. What is the predicted weight for an alligator 90 inches long?
$\hat{y}$ = -393 + 5.9(90)
= -393 + 531
= 138 lbs
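As a quick check on the four alligator questions, the regression line can be written as a small function. This is a sketch: the function name `predicted_weight` is made up for illustration, and the coefficients are the ones printed on the slide.

```python
def predicted_weight(length_inches):
    """Predicted weight in pounds from the slide's line: -393 + 5.9 * length."""
    return -393 + 5.9 * length_inches

at_90 = predicted_weight(90)                             # about 138 lb
per_inch = predicted_weight(91) - predicted_weight(90)   # about 5.9 lb: the slope
at_0 = predicted_weight(0)                               # -393 lb: the intercept, not a realistic weight

print(at_90, per_inch, at_0)
```

The difference between predictions one inch apart is the slope, and the prediction at length 0 is the intercept, which is why the intercept has no sensible interpretation here.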
![Page 38: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/38.jpg)
CALCULATION:
$\hat{y} = a + bx$
1. Slope: $b = r\dfrac{S_y}{S_x}$ (Calculate this first!)
2. Y-intercept: $a = \bar{y} - b\bar{x}$
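The two-step calculation can be sketched as a function of the summary statistics. The name `lsrl_from_summary` is hypothetical; the inputs below are the Speed/MPG summary statistics from the earlier correlation example (r = 1, both standard deviations 10, means 30 and 35).

```python
def lsrl_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    b = r * s_y / s_x        # slope first
    a = y_bar - b * x_bar    # then the y-intercept
    return a, b

a, b = lsrl_from_summary(r=1, s_x=10, s_y=10, x_bar=30, y_bar=35)
print(a, b)  # the line is y-hat = 5 + 1x

# Because a is defined as y_bar - b*x_bar, the line passes through (x_bar, y_bar).
prediction_at_mean = a + b * 30
```

The slope must be computed first because the intercept formula depends on it.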
![Page 39: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/39.jpg)
Facts about Least Squares Regression:
1. The distinction between explanatory and response variables is essential (which variable is used to predict which?).
2. It always passes through the point $(\bar{x}, \bar{y})$.
3. Correlation ‘r’ describes the direction and strength of the straight-line relationship, but doesn’t tell us any more about the slope than whether it is positive, negative, or zero.
![Page 40: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/40.jpg)
Extrapolation: Predicting outside the range of the x values
![Page 41: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/41.jpg)
Calculator Tip: LSRL
L1: Explanatory Variable
L2: Response Variable
Stat-calc-LinReg(a+bx), L1, L2, vars/y-vars/Function/ Y1
![Page 42: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/42.jpg)
Example #2: Is there a relationship between wine consumption (in liters) and yearly deaths from heart disease (deaths per 100,000)? Here are the summary statistics:
Mean wine consumption: 3,026
SD of wine consumption: 2,510
Mean deaths from heart disease: 191,053
SD of heart disease deaths: 68,396
Correlation coefficient between wine consumption and yearly deaths from heart disease = -.0843
a. Interpret the value of the correlation coefficient in the context of the problem.
The correlation is negative: as wine consumption increases, yearly deaths from heart disease tend to decrease.
![Page 43: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/43.jpg)
b. Calculate the least-squares regression line predicting death rate from wine consumption.
Slope: $b = r\dfrac{S_y}{S_x}$ = -0.0843(68,396/2,510) = -2.2971
Y-intercept: $a = \bar{y} - b\bar{x}$ = 191,053 - (-2.2971 × 3,026) = 198,004.0991
$\hat{y}$ = 198,004.0991 - 2.2971x
![Page 44: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/44.jpg)
c. Use your line to predict death rate for an average adult who consumes 4 liters of wine.
$\hat{y}$ = 198,004.0991 - 2.2971x
$\hat{y}$ = 198,004.0991 - 2.2971(4)
$\hat{y}$ = 197,994.9107
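The arithmetic in parts b and c can be reproduced in a few lines. This sketch uses the summary statistics exactly as printed on the slide (3,026; 2,510; 191,053; 68,396; r = -0.0843); the slides do not state whether those commas are thousands separators or decimal commas, so treating them as whole numbers is an assumption that happens to match the slide's results.

```python
# Summary statistics as printed on the slide (units as given there).
r = -0.0843
x_bar, s_x = 3026, 2510       # wine consumption
y_bar, s_y = 191053, 68396    # heart disease deaths

b = r * s_y / s_x             # slope, about -2.2971
a = y_bar - b * x_bar         # y-intercept, about 198004.0991
y_hat_4 = a + b * 4           # prediction at x = 4

print(round(b, 4), round(a, 4), round(y_hat_4, 4))
```

The prediction differs from the slide's 197,994.9107 only in the fourth decimal place, because the slide rounds the slope to -2.2971 before multiplying by 4.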
![Page 45: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/45.jpg)
Example #3: The following data describe the relationship between a tree trunk's diameter and its height. Make a scatterplot of the data and find the LSRL. Define any variables used in this equation. How strong an association is there?
Trunk Diameter (x): 8 9 7 6 13 7 11 12
Tree Height (y): 35 49 27 33 60 21 45 51
![Page 46: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/46.jpg)
$\hat{y}$ = -1.31467 + 4.54133x
Where x = trunk diameter and $\hat{y}$ = predicted tree height
Strong correlation, r = 0.88
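The calculator result for the tree data can be reproduced from the raw values. The sketch below uses a hypothetical helper, `least_squares_line`, that computes the slope as the ratio of summed deviation products to summed squared x deviations, which is algebraically the same as $b = r\,S_y/S_x$.

```python
from math import sqrt

diameter = [8, 9, 7, 6, 13, 7, 11, 12]      # x, trunk diameter
height = [35, 49, 27, 33, 60, 21, 45, 51]   # y, tree height

def least_squares_line(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx            # slope
    a = y_bar - b * x_bar      # y-intercept
    r = s_xy / sqrt(s_xx * s_yy)
    return a, b, r

a, b, r = least_squares_line(diameter, height)
print(round(a, 5), round(b, 5), round(r, 2))
```

The rounded values agree with the equation and correlation reported for this example.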
![Page 47: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/47.jpg)
Residual: How close is the data to the line?
residual = observed y - predicted y = $y - \hat{y}$
![Page 48: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/48.jpg)
residual
![Page 49: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/49.jpg)
Residual Plot: A plot that shows the residuals for all the data. If the line is a good fit, the residual plot has no pattern.
Calculator Tip: Residual Plot
Calculate the LSRL
L3: vars/ y-vars/ function/ Y1(L1)
L4: L2 - L3
Scatterplot: L1, L4
![Page 50: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/50.jpg)
Example of random residual plots
![Page 51: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/51.jpg)
Example of curved residual plots
Not a linear model.
![Page 52: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/52.jpg)
Example of fanning residual plots
Less accurate for larger x values.
![Page 53: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/53.jpg)
Standard Deviation of the residuals:
Used to measure the prediction error of the line
$s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$
Calculator Tip: SD of residuals
Find residuals/ in L5: L4²/ 2nd List/ math/ sum(L5)
![Page 54: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/54.jpg)
Example #4: The ages (in years) of seven men and their systolic blood pressures are given below:
Age (x): 16 25 39 45 49 64 70
Systolic BP (y): 100 120 140 160 165 185 200
Predicted Pressure ($\hat{y}$): 102.2 118.5 143.8 154.7 161.9 189 199.8
Regression Equation: $\hat{y} = 73.3589 + 1.8068x$
Residuals ($y - \hat{y}$): -2.27 1.47 -3.82 5.34 3.11 -3.99 .17
![Page 55: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/55.jpg)
Residual Plot:
No apparent pattern.
![Page 56: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/56.jpg)
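Standard deviation of the residuals:
Residuals: -2.27 1.47 -3.82 5.34 3.11 -3.99 .17
$s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$
$s = \sqrt{\dfrac{(-2.27)^2 + (1.47)^2 + (-3.82)^2 + (5.34)^2 + (3.11)^2 + (-3.99)^2 + (.17)^2}{7 - 2}}$
$s = \sqrt{\dfrac{76.03275905}{5}}$
$s = 3.899557899$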
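The residuals and their standard deviation for the blood pressure example can be recomputed from the raw data. This is a sketch that refits the line by least squares rather than using the rounded equation, so it carries unrounded residuals into the sum of squares.

```python
from math import sqrt

age = [16, 25, 39, 45, 49, 64, 70]          # x
bp = [100, 120, 140, 160, 165, 185, 200]    # y, systolic blood pressure

n = len(age)
x_bar = sum(age) / n
y_bar = sum(bp) / n

# Least-squares slope and intercept
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, bp)) / sum((x - x_bar) ** 2 for x in age)
a = y_bar - b * x_bar

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(age, bp)]

# Standard deviation of the residuals divides by n - 2
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print([round(e, 2) for e in residuals], round(s, 4))
```

Squaring the rounded residuals by hand gives a slightly different sum (about 76.04) than the 76.03275905 shown above, which comes from unrounded residuals; the difference does not affect s to two decimal places.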
![Page 57: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/57.jpg)
Assessing the Predictive Power of the Equation:
1. Coefficient of Determination: $r^2$ = the correlation coefficient, squared
2. It is the fraction (or percent) of the variation in the values of y that is explained by the least-squares regression of y on x.
3. The closer $r^2$ is to 1, the better the regression line describes the connection between x and y. In particular, predictions made with the equation will be more accurate.
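To see why $r^2$ is read as a fraction of variation explained, the sketch below compares two quantities for the tree data from the earlier LSRL example: the square of r, and one minus the ratio of leftover variation (sum of squared residuals) to total variation in y. The decomposition into `sse` and `sst` is a standard identity for least-squares lines, not something shown on the slides.

```python
diameter = [8, 9, 7, 6, 13, 7, 11, 12]      # x
height = [35, 49, 27, 33, 60, 21, 45, 51]   # y

n = len(diameter)
x_bar = sum(diameter) / n
y_bar = sum(height) / n
s_xx = sum((x - x_bar) ** 2 for x in diameter)
s_yy = sum((y - y_bar) ** 2 for y in height)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(diameter, height))

b = s_xy / s_xx
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(diameter, height))  # variation left unexplained
sst = s_yy                                                          # total variation in y

explained_fraction = 1 - sse / sst
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(round(explained_fraction, 4), round(r_squared, 4))
```

Both values come out near 0.785, so about 79% of the variation in tree height is explained by the regression on trunk diameter.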
![Page 58: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/58.jpg)
3.2 & 3.3 – Coefficient of Determination, Lurking Variables
![Page 59: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/59.jpg)
Coefficient of Determination: ($r^2$)
How much of the variation in the y values is explained by the x values
![Page 60: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/60.jpg)
Reading Computer Output:
Predictor     Coef        StDev    T    P
Constant      (y-int)
x-variable    (Slope)
S =      R-Sq = ($r^2$)      R-Sq(adj) =
![Page 61: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/61.jpg)
Example #1: The correlation between alcohol consumption and yearly deaths from heart disease was -0.843. What percent of the variation in the yearly deaths from heart disease can be explained by the regression of yearly deaths on alcohol consumption?
r = -0.843
$r^2$ = 0.710649
About 71% of the variation in yearly deaths from heart disease can be explained by the regression on alcohol consumption.
![Page 62: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/62.jpg)
Example #2: Is there a linear relationship between marijuana consumption and other drug usage? For this regression, the percent of variability in other drug usage explained by the regression of other drugs on marijuana use was 66.5%. What is the correlation coefficient?
$r^2$ = .665
r = $\sqrt{.665}$ = 0.815475
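Going between r and $r^2$ is a one-line calculation in each direction. In the sketch below, `r_from_r_squared` is a hypothetical helper; because squaring discards the sign, the sign of r has to come from the direction of the relationship, and a positive slope is assumed for the drug usage example.

```python
from math import copysign, sqrt

def r_from_r_squared(r_squared, slope):
    """Recover r from r^2; r takes the sign of the regression slope."""
    return copysign(sqrt(r_squared), slope)

# Example #1: from r to r^2
r_squared_alcohol = (-0.843) ** 2              # 0.710649, about 71%

# Example #2: from r^2 back to r (slope sign assumed positive)
r_drugs = r_from_r_squared(0.665, slope=1)     # about 0.815475

print(round(r_squared_alcohol, 6), round(r_drugs, 6))
```

Note that an $r^2$ of 0.665 alone would be equally consistent with r = -0.815475 if the relationship were negative.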
![Page 63: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/63.jpg)
Example #3: Fast Food Sandwiches: The mean serving size for fast food sandwiches is 7.557 ounces with a standard deviation of 2.008 ounces. The mean number of calories per sandwich is 446.9 with a standard deviation of 143. The correlation between serving size and calories is 0.849.
a. Calculate the LSRL.
Slope: $b = r\dfrac{S_y}{S_x}$ = 0.849(143/2.008) = 60.46165339
Y-intercept: $a = \bar{y} - b\bar{x}$ = 446.9 - (60.46165339 × 7.557) = -10.00871464
$\hat{y}$ = -10.0087 + 60.4617x
$\hat{y}$ is the predicted number of calories and x is the serving size.
![Page 64: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/64.jpg)
b. What percent of the variability in calories is explained by the least squares line with serving size?
$r^2 = 0.849^2 = 0.720801$
About 72% of the variability in calories is explained by the least-squares regression on serving size.
![Page 65: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/65.jpg)
c. Use this regression line to predict the average number of calories in a 35-ounce serving. Explain whether the least-squares line would be appropriate to use in this situation.
$\hat{y} = -10.0087 + 60.4617x$
$\hat{y} = -10.0087 + 60.4617(35)$
$\hat{y} = 2106.1508$
No. This is extrapolation: 35 ounces is too far away from typical serving sizes.
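One way to see how far outside the data a 35-ounce serving lies is to standardize it against the serving-size summary statistics. The sketch below recomputes the line from the stated summary statistics and then the z-score of x = 35; using a z-score to flag extrapolation is an illustrative choice here, since the slides do not give the actual range of the data.

```python
# Summary statistics from the fast food example
r = 0.849
x_bar, s_x = 7.557, 2.008     # serving size (ounces)
y_bar, s_y = 446.9, 143       # calories

b = r * s_y / s_x             # slope, about 60.4617
a = y_bar - b * x_bar         # y-intercept, about -10.0087

y_hat_35 = a + b * 35         # predicted calories for a 35-ounce serving
z_35 = (35 - x_bar) / s_x     # how many SDs above the mean serving size

print(round(y_hat_35, 2), round(z_35, 2))
```

A serving more than 13 standard deviations above the mean is almost certainly outside the range of the observed sandwiches, so the prediction of roughly 2,106 calories should not be trusted.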
![Page 66: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/66.jpg)
Example #3: Commercial airlines need to know the operating cost per hour of flight for each plane in their fleet. In a study of the relationship between operating cost per hour and number of passenger seats, investigators computed the regression of operating cost per hour on the number of passenger seats. The 12 sample aircraft used in the study included planes with as few as 126 passenger seats and planes with as many as 410 passenger seats. Operating cost per hour ranged between $3,600 and $7,800. Some computer output from a regression analysis of these data is shown below.
![Page 67: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/67.jpg)
![Page 68: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/68.jpg)
a. What is the equation of the least squares regression line that describes the relationship between operating cost per hour and number of passenger seats in the plane? Define any variables used in this equation.
$\hat{y} = 1136 + 14.673x$
$\hat{y}$ is the predicted operating cost per hour and x is the number of passenger seats.
![Page 69: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/69.jpg)
b. What is the value of the correlation coefficient for operating cost per hour and number of passenger seats in the plane? Interpret this correlation.
$r = \sqrt{0.57} = 0.75498$
There is a strong, positive linear relationship between the number of passenger seats and the operating cost per hour.
![Page 70: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/70.jpg)
c. Suppose that you want to describe the relationship between operating cost per hour and number of passenger seats in the planes only in the range of 250 to 350 seats. Does the line shown in the scatterplot still provide the best description of the relationship for data in this range? Why or why not?
No. Between 250 and 350 seats, the direction looks negative.
![Page 71: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/71.jpg)
Cautions in Making Predictions with Regression Lines:
1. If the correlation is not strong, predictions will not be accurate.
2. Extrapolation: Do not make predictions outside of the range for which you have data.
3. Correlation simply does not imply causation
• The correlation may be a coincidence
• Both correlated variables might be directly influenced by some common underlying cause
![Page 72: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/72.jpg)
Lurking Variables:
A lurking variable is a variable that is not among the explanatory or response variables, but influences the interpretation of the relationship.
Causation: X → Y
Common Response (Z = lurking variable): Z → X and Z → Y
![Page 73: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/73.jpg)
Example #4: There is a positive correlation between the number of deaths by drowning and the number of ice cream cones sold. Is this evidence that people are not heeding the old advice to wait 2 hours after eating before swimming and are paying the price for it?
No! Summer is the lurking variable.
![Page 74: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/74.jpg)
Example #5: Smoke Causes Coughs: A strong relationship is found between weekly sales of firewood and weekly sales of cough drops from September to March. Can we conclude that smoke from the fires causes coughs?
No! Winter is the lurking variable.
![Page 75: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/75.jpg)
Outlier: An observation that lies away from the other data points.
Influential Point: An observation that drastically changes the LSRL.
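The effect of an influential point can be sketched by fitting the LSRL with and without one extreme observation. The data below are made up for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b

# Made-up data: five points exactly on the line y = x.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
a_before, b_before = fit_line(xs, ys)

# Add one point far from the others in the x direction.
a_after, b_after = fit_line(xs + [20], ys + [0])

print(round(b_before, 4), round(b_after, 4))
```

One added point changes the slope from 1 to about -0.13, reversing the direction of the fitted line. Points that are extreme in the x direction tend to have this kind of pull on the LSRL.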
![Page 76: Chapter 3 – Examining Relationships](https://reader036.vdocument.in/reader036/viewer/2022062301/56814c50550346895db95fbb/html5/thumbnails/76.jpg)
Applet:
http://bcs.whfreeman.com/tps3e/pages/bcs-main.asp?v=category&s=00020&n=99000&i=99020.01&o=|00510|00520|00530|00010|00020|00030|00040|00050|00060|00070|00080|00110|00120|00300|0P000|01000|02000|03000|04000|05000|06000|07000|08000|09000|10000|11000|12000|13000|14000|15000|99000|