Chapter 7: Introduction to linear regression
OpenIntro Statistics, 3rd Edition
Slides developed by Mine Çetinkaya-Rundel of OpenIntro. The slides may be copied, edited, and/or shared via the CC BY-SA license. Some images may be included under fair use guidelines (educational purposes).
Line fitting, residuals, and correlation
Modeling numerical variables
In this unit we will learn to quantify the relationship between two numerical variables, as well as model a numerical response variable using a numerical or categorical explanatory variable.
Poverty vs. HS graduate rate

The scatterplot below shows the relationship between HS graduate rate in all 50 US states and DC and the % of residents who live below the poverty line (income below $23,050 for a family of 4 in 2012).

[Scatterplot: % HS grad (x-axis, roughly 80–90) vs. % in poverty (y-axis, roughly 6–18)]

Response variable? % in poverty
Explanatory variable? % HS grad
Relationship? Linear, negative, moderately strong
Quantifying the relationship

• Correlation describes the strength of the linear association between two variables.
• It takes values between −1 (perfect negative) and +1 (perfect positive).
• A value of 0 indicates no linear association.
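As a sketch of how this number is computed, the following implements the Pearson correlation formula directly; the data here are made up for illustration, not the poverty dataset:

```python
import math

def correlation(x, y):
    """Pearson correlation: sum((xi - x_bar)(yi - y_bar)) / ((n-1) * sx * sy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (sx * sy)

# A perfectly linear decreasing relationship gives a correlation of -1:
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))
```

A perfectly linear increasing relationship would give +1, and shuffled, unrelated values would give something near 0.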
Guessing the correlation

Which of the following is the best guess for the correlation between % in poverty and % HS grad?

(a) 0.6
(b) −0.75
(c) −0.1
(d) 0.02
(e) −1.5

[Scatterplot: % HS grad vs. % in poverty]
Guessing the correlation

Which of the following is the best guess for the correlation between % in poverty and % female householder (no husband present)?

(a) 0.1
(b) −0.6
(c) −0.4
(d) 0.9
(e) 0.5

[Scatterplot: % female householder, no husband present vs. % in poverty]
Assessing the correlation

Which of the following has the strongest correlation, i.e. a correlation coefficient closest to +1 or −1?

[Four scatterplots labeled (a)–(d)]
Answer: (b). Correlation measures linear association.
Fitting a line by least squares regression
Eyeballing the line

Which of the following appears to be the line that best fits the linear relationship between % in poverty and % HS grad? Choose one.

[Scatterplot with four candidate lines labeled (a)–(d)]
Answer: (a).
Residuals

Residuals are the leftovers from the model fit:

Data = Fit + Residual

[Scatterplot: % HS grad vs. % in poverty with the fitted line]
Residuals (cont.)

Residual
The residual is the difference between the observed (y_i) and predicted (ŷ_i) values of the response:

e_i = y_i − ŷ_i

[Scatterplot: % HS grad vs. % in poverty, with the residuals for DC (+5.44) and RI (−4.16) marked]

• The % living in poverty in DC is 5.44% more than predicted.
• The % living in poverty in RI is 4.16% less than predicted.
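These two residuals can be reproduced with a short sketch, using the fitted line given later in the chapter (predicted % in poverty = 64.68 − 0.62 × % HS grad) and the DC and RI values from the slides:

```python
def predict(hs_grad):
    # Fitted line from the slides: predicted % in poverty = 64.68 - 0.62 * (% HS grad)
    return 64.68 - 0.62 * hs_grad

def residual(observed, hs_grad):
    # e_i = y_i - y_hat_i: observed minus predicted
    return observed - predict(hs_grad)

# DC: observed 16.8% poverty at an 86% HS grad rate
print(round(residual(16.8, 86), 2))   # 5.44
# RI: observed 10.3% poverty at an 81% HS grad rate
print(round(residual(10.3, 81), 2))   # -4.16
```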
A measure for the best line

• We want a line that has small residuals:

1. Option 1: Minimize the sum of the magnitudes (absolute values) of the residuals:
   |e_1| + |e_2| + ⋯ + |e_n|
2. Option 2: Minimize the sum of squared residuals (least squares):
   e_1² + e_2² + ⋯ + e_n²

• Why least squares?

1. Most commonly used
2. Easier to compute by hand and using software
3. In many applications, a residual twice as large as another is usually more than twice as bad
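To make the two criteria concrete, here is a small sketch with made-up residuals. Note how squaring penalizes the single large residual much more heavily: doubling a residual doubles its contribution under Option 1, but quadruples it under Option 2.

```python
residuals = [1.0, -1.0, 4.0]   # hypothetical residuals; one is much larger than the others

sum_abs = sum(abs(e) for e in residuals)   # Option 1: 1 + 1 + 4
sum_sq = sum(e ** 2 for e in residuals)    # Option 2: 1 + 1 + 16

print(sum_abs)  # 6.0
print(sum_sq)   # 18.0
```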
The least squares line

ŷ = β0 + β1 x

where ŷ is the predicted value of the response, β0 is the intercept, β1 is the slope, and x is the explanatory variable.

Notation:

• Intercept:
  • Parameter: β0
  • Point estimate: b0
• Slope:
  • Parameter: β1
  • Point estimate: b1
Given...

[Scatterplot: % HS grad vs. % in poverty]

              % HS grad (x)   % in poverty (y)
mean          x̄ = 86.01       ȳ = 11.35
sd            s_x = 3.73      s_y = 3.1
correlation          R = −0.75
Slope

The slope of the regression can be calculated as

b1 = (s_y / s_x) × R

In context:

b1 = (3.1 / 3.73) × (−0.75) = −0.62

Interpretation: For each additional percentage point in HS graduate rate, we would expect the % living in poverty to be lower on average by 0.62 percentage points.
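The slope calculation above is a one-liner; this sketch just plugs in the summary statistics from the slides:

```python
s_y, s_x, R = 3.1, 3.73, -0.75   # summary statistics from the slides

b1 = (s_y / s_x) * R             # slope: b1 = (s_y / s_x) * R
print(round(b1, 2))              # -0.62
```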
Intercept

The intercept is where the regression line intersects the y-axis. The calculation of the intercept uses the fact that a regression line always passes through (x̄, ȳ):

b0 = ȳ − b1 x̄

[Scatterplot extended to x = 0, with the intercept marked]

b0 = 11.35 − (−0.62) × 86.01 = 64.68
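Likewise, the intercept follows from the means and the slope already computed:

```python
x_bar, y_bar = 86.01, 11.35   # means from the slides
b1 = -0.62                    # slope from the previous step

b0 = y_bar - b1 * x_bar       # the line passes through (x_bar, y_bar)
print(round(b0, 2))           # 64.68
```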
Which of the following is the correct interpretation of the intercept?

(a) For each % point increase in HS graduate rate, % living in poverty is expected to increase on average by 64.68%.
(b) For each % point decrease in HS graduate rate, % living in poverty is expected to increase on average by 64.68%.
(c) Having no HS graduates leads to 64.68% of residents living below the poverty line.
(d) States with no HS graduates are expected on average to have 64.68% of residents living below the poverty line.
(e) In states with no HS graduates, % living in poverty is expected to increase on average by 64.68%.
More on the intercept

Since there are no states in the dataset with no HS graduates, the intercept is of no interest, not very useful, and also not reliable, since the predicted value of the intercept is so far from the bulk of the data.

[Scatterplot extended to x = 0, with the intercept marked]
Regression line

predicted % in poverty = 64.68 − 0.62 × % HS grad

[Scatterplot: % HS grad vs. % in poverty with the regression line drawn]
Interpretation of slope and intercept

• Intercept: When x = 0, y is expected to equal the intercept.
• Slope: For each unit increase in x, y is expected to increase / decrease on average by the slope.

Note: These statements are not causal, unless the study is a randomized controlled experiment.
Prediction

• Using the linear model to predict the value of the response variable for a given value of the explanatory variable is called prediction; simply plug the value of x into the linear model equation.
• There will be some uncertainty associated with the predicted value.

[Scatterplot: % HS grad vs. % in poverty with the regression line]
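For example, plugging a hypothetical state with a 90% HS graduation rate into the fitted equation:

```python
def predict_poverty(hs_grad):
    # Plug x into the fitted equation from the slides:
    # predicted % in poverty = 64.68 - 0.62 * (% HS grad)
    return 64.68 - 0.62 * hs_grad

# A state with a 90% HS graduation rate (hypothetical input):
print(round(predict_poverty(90), 2))  # 8.88
```

The point prediction is about 8.88% in poverty; as noted above, the actual value for such a state would differ from this by some residual.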
Extrapolation

• Applying a model estimate to values outside of the realm of the original data is called extrapolation.
• Sometimes the intercept might be an extrapolation.

[Scatterplot extended to x = 0, with the intercept marked]
Examples of extrapolation

[Three example figures shown on slides 48–50]
Conditions for the least squares line

1. Linearity
2. Nearly normal residuals
3. Constant variability
Conditions: (1) Linearity

• The relationship between the explanatory and the response variable should be linear.
• Methods for fitting a model to non-linear relationships exist, but are beyond the scope of this class. If this topic is of interest, an Online Extra is available on openintro.org covering new techniques.
• Check using a scatterplot of the data, or a residuals plot.

[Panels: scatterplots of y vs. x, each paired with a plot of summary(g)$residuals vs. x]
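One useful numerical fact behind residuals plots: when a line is fit by least squares with an intercept, the residuals always sum to zero, which is why a good residuals plot scatters evenly around the horizontal axis. A sketch with made-up data:

```python
def least_squares_fit(x, y):
    # Closed-form simple linear regression:
    # b1 = cov(x, y) / var(x), b0 = y_bar - b1 * x_bar
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [80.0, 83.0, 85.0, 88.0, 90.0]   # hypothetical % HS grad values
y = [15.0, 13.5, 12.0, 10.0, 8.5]    # hypothetical % in poverty values

b0, b1 = least_squares_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# The least squares residuals sum to zero (up to floating-point error):
print(abs(sum(residuals)) < 1e-9)  # True
```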
![Page 57: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/57.jpg)
Anatomy of a residuals plot
[Scatterplot of % HS grad vs. % in poverty with the least squares line, and the corresponding residuals plot below it]

RI:
% HS grad = 81, % in poverty = 10.3
predicted % in poverty = 64.68 − 0.62 × 81 = 14.46
e = % in poverty − predicted % in poverty = 10.3 − 14.46 = −4.16

DC:
% HS grad = 86, % in poverty = 16.8
predicted % in poverty = 64.68 − 0.62 × 86 = 11.36
e = % in poverty − predicted % in poverty = 16.8 − 11.36 = 5.44
28
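The residual arithmetic on this slide can be sketched in code. This is a hypothetical Python sketch, not the deck's R code; only the fitted line (intercept 64.68, slope −0.62) comes from the slides, and the function names are ours:

```python
# Hypothetical sketch of the slide's residual calculation. The fitted line
# (intercept 64.68, slope -0.62) is taken from the slides; names are ours.

def predicted_poverty(hs_grad):
    """Predicted % in poverty from the least squares line."""
    return 64.68 - 0.62 * hs_grad

def residual(observed_poverty, hs_grad):
    """Residual e = observed - predicted."""
    return observed_poverty - predicted_poverty(hs_grad)

# Rhode Island: % HS grad = 81, observed % in poverty = 10.3
print(round(predicted_poverty(81), 2))   # 14.46
print(round(residual(10.3, 81), 2))      # -4.16

# DC: % HS grad = 86, observed % in poverty = 16.8
print(round(predicted_poverty(86), 2))   # 11.36
print(round(residual(16.8, 86), 2))      # 5.44
```

RI's negative residual means the model over-predicts its poverty rate; DC's positive residual means the model under-predicts.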
![Page 59: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/59.jpg)
Conditions: (2) Nearly normal residuals
• The residuals should be nearly normal.
• This condition may not be satisfied when there are unusualobservations that don’t follow the trend of the rest of the data.
• Check using a histogram or normal probability plot ofresiduals.
[Histogram of residuals and normal probability (Q-Q) plot of the residuals]
29
![Page 62: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/62.jpg)
Conditions: (3) Constant variability
[Scatterplot of % HS grad vs. % in poverty with the least squares line, and the corresponding residuals plot]
• The variability of points around the least squares line should be roughly constant.
• This implies that the variability of residuals around the 0 line should be roughly constant as well.
• Also called homoscedasticity.
• Check using a residuals plot.
30
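One rough way to eyeball this condition in code is to compare the spread of residuals over the lower and upper halves of x. A minimal sketch on made-up residuals (not the actual state data):

```python
# Made-up residuals, ordered by x; not the actual state-level residuals.
resid = [-1.2, 0.8, -0.5, 1.1, -0.9, 0.7, -1.0, 1.3]

half = len(resid) // 2
low, high = resid[:half], resid[half:]

def spread(rs):
    """Root-mean-square spread of residuals around the 0 line."""
    return (sum(r * r for r in rs) / len(rs)) ** 0.5

print(round(spread(low), 2), round(spread(high), 2))
# Similar spreads in the two halves are consistent with constant variability.
```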
![Page 66: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/66.jpg)
Checking conditions
What condition is this linearmodel obviously violating?
(a) Constant variability
(b) Linear relationship
(c) Normal residuals
(d) No extreme outliers

[Scatterplot of x vs. y and residuals plot, g$residuals vs. x]
31
![Page 68: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/68.jpg)
Checking conditions
What condition is this linearmodel obviously violating?
(a) Constant variability
(b) Linear relationship
(c) Normal residuals
(d) No extreme outliers

[Scatterplot of x vs. y and residuals plot, g$residuals vs. x]
32
![Page 70: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/70.jpg)
R²
• The strength of the fit of a linear model is most commonly evaluated using R².
• R² is calculated as the square of the correlation coefficient.
• It tells us what percent of the variability in the response variable is explained by the model.
• The remainder of the variability is explained by variables not included in the model or by inherent randomness in the data.
• For the model we’ve been working with, R² = (−0.62)² = 0.38.
33
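A minimal sketch of the definition above: compute the correlation coefficient from scratch and square it. The data here are made up for illustration, not the state-level data:

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from the definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2.1, 1.9, 1.2, 1.0, 0.4]   # decreasing trend, so R < 0
r = correlation(x, y)
print(round(r ** 2, 2))         # R^2 is positive even though R is negative
```

For the slides' model, R = −0.62 gives R² = (−0.62)² ≈ 0.38.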
![Page 75: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/75.jpg)
Interpretation of R²
Which of the below is the correct interpretation of R = −0.62, R² = 0.38?

(a) 38% of the variability in the % of HS graduates among the 51 states is explained by the model.
(b) 38% of the variability in the % of residents living in poverty among the 51 states is explained by the model.
(c) 38% of the time % HS graduates predict % living in poverty correctly.
(d) 62% of the variability in the % of residents living in poverty among the 51 states is explained by the model.
[Scatterplot of % HS grad vs. % in poverty]
34
![Page 77: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/77.jpg)
Poverty vs. region (east, west)
poverty = 11.17 + 0.38 × west
• Explanatory variable: region, reference level: east
• Intercept: The estimated average poverty percentage in eastern states is 11.17%.
• This is the value we get if we plug in 0 for the explanatory variable.
• Slope: The estimated average poverty percentage in western states is 0.38% higher than in eastern states.
• Then, the estimated average poverty percentage in western states is 11.17 + 0.38 = 11.55%.
• This is the value we get if we plug in 1 for the explanatory variable.
35
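The slide's indicator-variable arithmetic can be sketched as follows (a hypothetical Python sketch; the model numbers come from the slide):

```python
def predicted_poverty(west):
    """Model from the slide: west = 0 for eastern states (reference), 1 for western."""
    return 11.17 + 0.38 * west

print(predicted_poverty(0))            # 11.17: intercept = average for eastern states
print(round(predicted_poverty(1), 2))  # 11.55: intercept + slope = average for western states
```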
![Page 82: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/82.jpg)
Poverty vs. region (northeast, midwest, west, south)
Which region (northeast, midwest, west, or south) is the reference level?

                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         9.50        0.87    10.94      0.00
region4midwest      0.03        1.15     0.02      0.98
region4west         1.79        1.13     1.59      0.12
region4south        4.16        1.07     3.87      0.00
(a) northeast
(b) midwest
(c) west
(d) south
(e) cannot tell
36
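Assuming northeast is the reference level (the level absent from the coefficient rows), the estimated average poverty percentage for each region is the intercept plus that region's coefficient. A small Python sketch using the table's values:

```python
# Values copied from the coefficient table; northeast assumed as reference level.
intercept = 9.50
coef = {"northeast": 0.0, "midwest": 0.03, "west": 1.79, "south": 4.16}

# Estimated mean poverty percentage per region: intercept + coefficient.
means = {region: round(intercept + c, 2) for region, c in coef.items()}
print(means)
```

Under this assumption the reference level (northeast) also has the smallest estimated mean, 9.50%.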
![Page 84: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/84.jpg)
Poverty vs. region (northeast, midwest, west, south)
Which region (northeast, midwest, west, or south) has the lowest poverty percentage?

                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         9.50        0.87    10.94      0.00
region4midwest      0.03        1.15     0.02      0.98
region4west         1.79        1.13     1.59      0.12
region4south        4.16        1.07     3.87      0.00
(a) northeast
(b) midwest
(c) west
(d) south
(e) cannot tell
37
![Page 86: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/86.jpg)
Types of outliers in linear regression
![Page 87: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/87.jpg)
Types of outliers
How do outliers influence theleast squares line in this plot?
To answer this question think of where the regression line would be with and without the outlier(s). Without the outliers the regression line would be steeper and lie closer to the larger group of observations. With the outliers the line is pulled up and away from some of the observations in the larger group.
[Scatterplot with outliers and the corresponding residuals plot]
39
![Page 89: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/89.jpg)
Types of outliers
How do outliers influence the least squares line in this plot?

Without the outlier there is no evident relationship between x and y.

[Scatterplot with a single outlier and the corresponding residuals plot]
40
![Page 90: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/90.jpg)
Some terminology
• Outliers are points that lie away from the cloud of points.
• Outliers that lie horizontally away from the center of the cloud are called high leverage points.
• High leverage points that actually influence the slope of the regression line are called influential points.
• In order to determine if a point is influential, visualize the regression line with and without the point. Does the slope of the line change considerably? If so, then the point is influential. If not, then it’s not an influential point.
41
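The "with and without the point" check in the last bullet can be sketched on made-up data. This is a hypothetical illustration, with least squares computed from the usual textbook formulas rather than the deck's R code:

```python
def least_squares(xs, ys):
    """Least squares intercept and slope from the usual formulas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [1.1, 2.0, 2.9, 4.1, 5.0]        # roughly y = x
x_out, y_out = x + [20], y + [0.0]   # one point far right (high leverage) and off the trend

_, slope = least_squares(x, y)
_, slope_out = least_squares(x_out, y_out)
print(round(slope, 2), round(slope_out, 2))
# The slope changes considerably, so this high leverage point is influential.
```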
![Page 94: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/94.jpg)
Influential points
Data are available on the log of the surface temperature and the log of the light intensity of 47 stars in the star cluster CYG OB1.
[Scatterplot of log(temp) vs. log(light intensity), with regression lines fit with and without the outliers]
42
![Page 95: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/95.jpg)
Types of outliers
Which of the below best describes the outlier?
(a) influential
(b) high leverage
(c) none of the above
(d) there are no outliers
[Scatterplot with an outlier and the corresponding residuals plot]
43
![Page 98: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/98.jpg)
Types of outliers
Does this outlier influence the slope of the regression line?

Not much...

[Scatterplot with an outlier and the corresponding residuals plot]
44
![Page 99: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/99.jpg)
Recap
Which of following is true?
(a) Influential points always change the intercept of the regression line.
(b) Influential points always reduce R².
(c) It is much more likely for a low leverage point to be influential than a high leverage point.
(d) When the data set includes an influential point, the relationship between the explanatory variable and the response variable is always nonlinear.
(e) None of the above.
(e) None of the above.
45
![Page 101: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/101.jpg)
Recap (cont.)
[Two scatterplots with fitted lines: left, R = 0.08, R² = 0.0064; right, R = 0.79, R² = 0.6241]
46
![Page 102: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/102.jpg)
Inference for linear regression
![Page 103: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/103.jpg)
Nature or nurture?
In 1966 Cyril Burt published a paper called “The genetic determination of
differences in intelligence: A study of monozygotic twins reared apart?”
The data consist of IQ scores for [an assumed random sample of] 27
identical twins, one raised by foster parents, the other by the biological
parents.
[Scatterplot of biological IQ vs. foster IQ, R = 0.882]
48
![Page 104: Chapter 7: Introduction to linear regressionstevel/251/slides/os3_slides_07.pdfChapter 7: Introduction to linear regression OpenIntro Statistics, 3rd Edition Slides developed by Mine](https://reader034.vdocument.in/reader034/viewer/2022042712/5f8f5ba73624e747020f25f0/html5/thumbnails/104.jpg)
Which of the following is false?
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.20760    9.29990   0.990    0.332
bioIQ        0.90144    0.09633   9.358  1.2e-09

Residual standard error: 7.729 on 25 degrees of freedom
Multiple R-squared: 0.7779, Adjusted R-squared: 0.769
F-statistic: 87.56 on 1 and 25 DF, p-value: 1.204e-09

(a) An additional 10 points in the biological twin's IQ is associated with an additional 9 points in the foster twin's IQ, on average.
(b) Roughly 78% of the foster twins' IQs can be accurately predicted by the model.
(c) The linear model is fosterIQ = 9.2 + 0.9 × bioIQ.
(d) Foster twins with higher than average IQs tend to have biological twins with higher than average IQs as well.
Testing for the slope
Assuming that these 27 twins comprise a representative sample of all twins separated at birth, we would like to test if these data provide convincing evidence that the IQ of the biological twin is a significant predictor of the IQ of the foster twin. What are the appropriate hypotheses?

(a) H0 : b0 = 0; HA : b0 ≠ 0
(b) H0 : β0 = 0; HA : β0 ≠ 0
(c) H0 : b1 = 0; HA : b1 ≠ 0
(d) H0 : β1 = 0; HA : β1 ≠ 0
Testing for the slope (cont.)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.2076     9.2999    0.99   0.3316
bioIQ         0.9014     0.0963    9.36   0.0000

• We always use a t-test in inference for regression.
  Remember: Test statistic, T = (point estimate − null value) / SE
• Point estimate = b1 is the observed slope.
• SEb1 is the standard error associated with the slope.
• Degrees of freedom associated with the slope: df = n − 2, where n is the sample size.
  Remember: We lose 1 degree of freedom for each parameter we estimate, and in simple linear regression we estimate 2 parameters, β0 and β1.
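The recipe above can be sketched numerically. A minimal example in Python (illustrative only; the slides' output is from R, and the numbers below are copied from the regression table):

```python
# t statistic for the slope: (point estimate - null value) / SE,
# with df = n - 2 in simple linear regression.
def slope_t_statistic(b1, se_b1, null_value=0.0):
    """t statistic for testing H0: beta1 = null_value."""
    return (b1 - null_value) / se_b1

b1, se_b1, n = 0.9014, 0.0963, 27  # values from the regression table
T = slope_t_statistic(b1, se_b1)   # about 9.36, matching the output
df = n - 2                         # 25
```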
Testing for the slope (cont.)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.2076     9.2999    0.99   0.3316
bioIQ         0.9014     0.0963    9.36   0.0000

T = (0.9014 − 0) / 0.0963 = 9.36
df = 27 − 2 = 25
p-value = P(|T| > 9.36) < 0.01
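The tail probability P(|T| > 9.36) can be checked numerically. The sketch below integrates the Student's t density using only the Python standard library; in practice one would use a library routine instead (in R, the slides' own software, the analogue would be `2 * pt(9.36, 25, lower.tail = FALSE)`):

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_obs, df, upper=60.0, steps=50_000):
    """P(|T| > t_obs): trapezoidal integration of one tail, doubled."""
    a = abs(t_obs)
    h = (upper - a) / steps
    xs = [a + i * h for i in range(steps + 1)]
    ys = [t_pdf(x, df) for x in xs]
    tail = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return 2.0 * tail

p = two_tailed_p(9.36, 25)  # a tiny value, far below 0.01
```

The result agrees with the regression output's p-value of about 1.2e-09, well below the 0.01 bound quoted on the slide.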
% College graduate vs. % Hispanic in LA
What can you say about the relationship between % college graduate and % Hispanic in a sample of 100 zip code areas in LA?

[Two maps of LA zip code areas, each shaded on a 0.0–1.0 scale: one for Education: College graduate, one for Race/Ethnicity: Hispanic; freeways and no-data areas are marked.]
% College educated vs. % Hispanic in LA - another look
What can you say about the relationship between % college graduate and % Hispanic in a sample of 100 zip code areas in LA?

[Scatterplot of % college graduate vs. % Hispanic; both axes run 0% to 100%.]
% College educated vs. % Hispanic in LA - linear model
Which of the below is the best interpretation of the slope?

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7290     0.0308   23.68   0.0000
%Hispanic    -0.7527     0.0501  -15.01   0.0000

(a) A 1% increase in Hispanic residents in a zip code area in LA is associated with a 75% decrease in % of college grads.
(b) A 1% increase in Hispanic residents in a zip code area in LA is associated with a 0.75% decrease in % of college grads.
(c) An additional 1% of Hispanic residents decreases the % of college graduates in a zip code area in LA by 0.75%.
(d) In zip code areas with no Hispanic residents, % of college graduates is expected to be 75%.
% College educated vs. % Hispanic in LA - linear model
Do these data provide convincing evidence that there is a statistically significant relationship between % Hispanic and % college graduates in zip code areas in LA?

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7290     0.0308   23.68   0.0000
hispanic     -0.7527     0.0501  -15.01   0.0000

Yes, the p-value for % Hispanic is low, indicating that the data provide convincing evidence that the slope parameter is different from 0.

How reliable is this p-value if these zip code areas are not randomly selected?

Not very...
Confidence interval for the slope
Remember that a confidence interval is calculated as point estimate ± ME, and the degrees of freedom associated with the slope in a simple linear regression is n − 2. Which of the below is the correct 95% confidence interval for the slope parameter? Note that the model is based on observations from 27 twins.

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.2076     9.2999    0.99   0.3316
bioIQ         0.9014     0.0963    9.36   0.0000

(a) 9.2076 ± 1.65 × 9.2999
(b) 0.9014 ± 2.06 × 0.0963
(c) 0.9014 ± 1.96 × 0.0963
(d) 9.2076 ± 1.96 × 0.0963

n = 27, df = 27 − 2 = 25
95%: t*25 = 2.06
0.9014 ± 2.06 × 0.0963 = (0.7, 1.1)
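The interval worked out above can be reproduced in a few lines. A sketch in Python (illustrative; the critical value t*25 = 2.06 is taken from the slide rather than computed):

```python
# 95% CI for the slope: b1 ± t* × SE_b1, with t* from a t distribution
# with df = n - 2 = 25 degrees of freedom.
def slope_ci(b1, se_b1, t_star):
    """Return (lower, upper) confidence bounds for the slope."""
    me = t_star * se_b1  # margin of error
    return (b1 - me, b1 + me)

lo, hi = slope_ci(0.9014, 0.0963, 2.06)  # roughly (0.70, 1.10)
```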
Recap
• Inference for the slope for a single-predictor linear regression model:
  • Hypothesis test: T = (b1 − null value) / SEb1, with df = n − 2
  • Confidence interval: b1 ± t*df=n−2 × SEb1
• The null value is often 0 since we are usually checking for any relationship between the explanatory and the response variable.
• The regression output gives b1, SEb1, and the two-tailed p-value for the t-test for the slope where the null value is 0.
• We rarely do inference on the intercept, so we'll be focusing on the estimates and inference for the slope.
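The recap above can be assembled from raw data: slope estimate, standard error, t statistic, and confidence interval. A self-contained sketch in Python with made-up toy data (hypothetical values, not the twin-study data; the slides' own computations come from R's lm):

```python
import math

def slope_inference(x, y, t_star):
    """Least-squares slope b1, SE(b1), t statistic (null value 0), and CI."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                        # slope estimate
    b0 = ybar - b1 * xbar                 # intercept estimate
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))          # residual standard error, df = n - 2
    se_b1 = s / math.sqrt(sxx)            # standard error of the slope
    t = b1 / se_b1                        # test statistic for H0: beta1 = 0
    return b1, se_b1, t, (b1 - t_star * se_b1, b1 + t_star * se_b1)

# Toy data; t* = 3.18 is the 95% critical value for df = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
b1, se_b1, t, ci = slope_inference(x, y, t_star=3.18)
```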
Caution
• Always be aware of the type of data you're working with: random sample, non-random sample, or population.
• Statistical inference, and the resulting p-values, are meaningless when you already have population data.
• If you have a sample that is non-random (biased), inference on the results will be unreliable.
• The ultimate goal is to have independent observations.