Simple Linear Regression
Statistics 700
December 4-7, 2001
04/19/23 Simple Linear Regression 2
Example for Illustration
The human body takes in more oxygen when exercising than when it is at rest. To deliver oxygen to the muscles, the heart must beat faster. Heart rate is easy to measure, but measuring oxygen uptake requires elaborate equipment. If oxygen uptake (VO2) can be accurately predicted from heart rate (HR), the predicted values may replace actually measured values for various research purposes. Unfortunately, not all human bodies are the same, so no single prediction equation works for all people. Researchers can, however, measure both HR and VO2 for one person under varying sets of exercise conditions and calculate a regression equation for predicting that person’s oxygen uptake from heart rate.
Data From An Individual
• Goals in this illustration:
• Scatterplot: linear relationship or not?
• Obtain the best-fitting line using least-squares.
• Test whether the model is significant.
• Obtain a confidence interval for the regression coefficient.
• Obtain predictions.
HR     94     96     95     95     94     95     94     104    104    106
VO2    0.473  0.753  0.929  0.939  0.832  0.983  1.049  1.178  1.176  1.292

HR     108    110    113    113    118    115    121    127    131
VO2    1.403  1.499  1.529  1.599  1.749  1.746  1.897  2.040  2.231
The Scatterplot
[Figure: scatterplot of OxygenUptake (vertical axis, 0.4 to 2.4) against HeartRate (horizontal axis, 90 to 130).]
Simple Linear Regression Model
1. Conditional on X = x, the response variable Y has mean equal to $\mu(x) = \alpha + \beta x$.
2. $\alpha$ is the y-intercept, while $\beta$ is the slope of the regression line, which can be interpreted as the change in the mean value of Y per unit change in the independent variable.
3. For each X = x, the conditional distribution of Y is normal with mean $\mu(x)$ and variance $\sigma^2$.
4. $Y_1, Y_2, \ldots, Y_n$ are independent of each other.
Shorthand:
$$Y_i = \alpha + \beta x_i + \epsilon_i, \quad \text{with } \epsilon_i \text{ IID } N(0, \sigma^2)$$
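To make the model assumptions concrete, here is a minimal Python sketch that simulates responses from the shorthand form of the model. The function name `simulate_responses` and the parameter values used below are illustrative assumptions, not values taken from the slides.

```python
import random

def simulate_responses(xs, alpha, beta, sigma, seed=None):
    """Draw one Y per x from Y_i = alpha + beta*x_i + eps_i, eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # private generator so a given seed is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values chosen only for illustration.
xs = [90, 100, 110, 120, 130]
ys = simulate_responses(xs, alpha=-2.8, beta=0.039, sigma=0.12, seed=1)
for x, y in zip(xs, ys):
    print(f"x = {x:3d}  mean = {-2.8 + 0.039 * x:6.3f}  simulated Y = {y:6.3f}")
```

Each simulated Y scatters around its conditional mean with the same standard deviation at every x, and the draws are independent, which is what assumptions 3 and 4 state.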
Least-Squares (LS) Regression
One of the goals in regression analysis is to estimate the parameters $\alpha$, $\beta$, and $\sigma^2$ of the regression model. Denote by
$$\hat{Y} = a + bX$$
the estimate of the regression line, so that a estimates $\alpha$ and b estimates $\beta$. Then for the observed values of X, which are $x_1, x_2, \ldots, x_n$, we may obtain the predicted values of the response variable Y for each of these X-values. These are:
Predicted Values
$$\hat{Y}_i = a + b x_i, \quad i = 1, 2, \ldots, n.$$
A good estimate of the regression line should produce predicted values that are close to the actual observed values of the response variable. That is, the set of deviations
$$R_i = Y_i - \hat{Y}_i = Y_i - (a + b x_i), \quad i = 1, 2, \ldots, n,$$
should ideally be close (if not equal) to zero. These deviations between observed and predicted values are also called residuals.
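As a minimal sketch of these definitions, the Python snippet below computes predicted values and residuals for the heart-rate data from the example. The coefficients a = -2.80435 and b = 0.038652 are the rounded values reported in the Minitab and Excel results of this presentation. The function name `predict_and_residuals` is an illustrative choice.

```python
# Heart rate (X) and oxygen uptake (Y) for the individual in the example.
HR = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
VO2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

def predict_and_residuals(xs, ys, a, b):
    """Return (predicted, residuals) for the candidate line Y-hat = a + b*x."""
    predicted = [a + b * x for x in xs]
    residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]
    return predicted, residuals

predicted, residuals = predict_and_residuals(HR, VO2, a=-2.80435, b=0.038652)
for x, y, y_hat, r in zip(HR, VO2, predicted, residuals):
    print(f"{x:4d} {y:6.3f} {y_hat:8.4f} {r:8.4f}")
```

Trying other values of a and b shows how the residuals grow when the candidate line moves away from the data.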
Principle of Least-Squares (LS)
In least-squares regression, the best-fitting regression line is that which will make the sum of these squared deviations or residuals as small as possible. Thus, the regression coefficients a and b are chosen in order to minimize the quantity:
$$Q(a, b) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - b x_i)^2.$$
Using calculus, the values of a and b that will minimize this quantity are given by:
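Least-Squares Solution
Estimator of $\beta$:
$$b = \frac{\mathrm{SXY}}{\mathrm{SXX}}$$
Estimator of $\alpha$:
$$a = \bar{Y} - b\bar{X}$$
Prediction line:
$$\hat{Y} = a + bX$$
where
$$\mathrm{SXX} = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$
$$\mathrm{SYY} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2$$
$$\mathrm{SXY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}$$
Correlation:
$$r = \frac{\mathrm{SXY}}{\sqrt{(\mathrm{SXX})(\mathrm{SYY})}}$$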
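The closed-form solution (b = SXY/SXX, a = mean of Y minus b times mean of X) can be sketched in a few lines of Python using the data from the example. The function name `least_squares` and the dictionary keys are illustrative choices. This is one straightforward way to evaluate the formulas, not the only one.

```python
import math

HR = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
VO2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

def least_squares(xs, ys):
    """Compute SXX, SYY, SXY, the LS coefficients a and b, and the correlation r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope estimate
    a = y_bar - b * x_bar    # intercept estimate
    r = sxy / math.sqrt(sxx * syy)
    return {"SXX": sxx, "SYY": syy, "SXY": sxy, "a": a, "b": b, "r": r}

fit = least_squares(HR, VO2)
for name, value in fit.items():
    print(f"{name:>3} = {value:.6f}")
```

The centered sums are used here rather than the shortcut forms because they are less prone to rounding error when the X-values are large relative to their spread.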
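Estimating the Variance
$$\mathrm{SSE} = \sum_{i=1}^{n} R_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
$$\mathrm{SYY} = \mathrm{SSR} + \mathrm{SSE}$$
$$\mathrm{MSE} = \frac{\mathrm{SSE}}{n - 2} \quad \text{is an unbiased estimator of } \sigma^2$$
Coefficient of Determination:
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SYY}}$$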
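The sums of squares, MSE = SSE/(n-2), and the coefficient of determination can be checked numerically. The following Python sketch fits the line to the example data and then evaluates each quantity from its definition. Variable names are illustrative.

```python
HR = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
VO2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

n = len(HR)
x_bar = sum(HR) / n
y_bar = sum(VO2) / n
sxx = sum((x - x_bar) ** 2 for x in HR)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(HR, VO2))
b = sxy / sxx
a = y_bar - b * x_bar

fitted = [a + b * x for x in HR]
sse = sum((y - f) ** 2 for y, f in zip(VO2, fitted))   # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained variation
syy = sum((y - y_bar) ** 2 for y in VO2)               # total variation
mse = sse / (n - 2)                                    # estimate of sigma^2
r_squared = ssr / syy

print(f"SSE = {sse:.6f}, SSR = {ssr:.6f}, SYY = {syy:.6f}")
print(f"MSE = {mse:.5f}, R^2 = {r_squared:.4f}")
```

The decomposition SYY = SSR + SSE holds up to floating-point rounding, which serves as a quick check on the computation.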
Interpretations of Quantities
• SSE: measures the variation not explained by the predictor variable.
• SSR: measures the amount of variation explained by the predictor variable.
• SYY: total variation in the Y-values. This is partitioned into SSR and SSE.
• $R^2$ = (SSR)/(SYY): coefficient of determination; indicates the proportion of variation in the Y-values explained by the predictor variable.
• MSE = (SSE)/(n-2): the mean-squared error. This provides an unbiased estimate of the common variance $\sigma^2$.
Sampling Distributions of Estimators
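$$b \text{ is Normal}(\mu_b, \sigma_b^2), \quad \mu_b = \beta, \quad \sigma_b^2 = \frac{\sigma^2}{\mathrm{SXX}}$$
$$a \text{ is Normal}(\mu_a, \sigma_a^2), \quad \mu_a = \alpha, \quad \sigma_a^2 = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\mathrm{SXX}} \right)$$
To estimate the variance, $\sigma^2$ is replaced by the MSE.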
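Replacing the unknown variance by the MSE gives estimated standard errors for b and a. The Python sketch below evaluates them from the summary quantities reported for the example in this presentation (n = 19, SXX = 2518, MSE = 0.01451, sum of X = 2033). Variable names are illustrative.

```python
import math

# Summary quantities taken from the example's reported results.
n = 19
x_bar = 2033 / n     # mean heart rate
sxx = 2518.0
mse = 0.01451        # plug-in estimate of sigma^2

# Estimated standard errors: square roots of the variances with sigma^2 -> MSE.
se_b = math.sqrt(mse / sxx)
se_a = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))

print(f"SE(b) = {se_b:.6f}")
print(f"SE(a) = {se_a:.4f}")
```

These values can be compared with the StDev column of the Minitab coefficient table.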
Testing Hypothesis
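• To test the null hypothesis $H_0: \beta = 0$ versus $H_1: \beta \neq 0$, we use the t-statistic given by:
$$T_c = \frac{b - 0}{\sqrt{\mathrm{MSE}/\mathrm{SXX}}},$$
which follows a t-distribution with n-2 degrees of freedom under the null hypothesis. Thus, we reject $H_0$ if $|T_c| > t_{n-2;\alpha/2}$, where $\alpha$ in the subscript is the significance level of the test (not the intercept). Similarly, for testing $H_0: \alpha = 0$ (the intercept), we use:
$$T_c = \frac{a - 0}{\sqrt{\mathrm{MSE}\left( \frac{1}{n} + \frac{\bar{X}^2}{\mathrm{SXX}} \right)}}.$$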
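The two test statistics can be evaluated for the example with the short Python sketch below. It uses the summary values reported in this presentation and a critical value of 2.110 for a two-sided 5% test with 17 degrees of freedom, taken from a standard t-table as an assumption rather than computed. The confidence interval for the slope, b plus or minus the critical value times the standard error of b, is not written out on the slides and is included here as the usual consequence of the same sampling distribution.

```python
import math

# Summary quantities from the example's reported results.
n = 19
a, b = -2.80435, 0.038652
x_bar = 2033 / n
sxx = 2518.0
mse = 0.01451
T_CRIT = 2.110  # assumed table value of t_{17; 0.025}

se_b = math.sqrt(mse / sxx)
se_a = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))

t_slope = (b - 0) / se_b          # tests H0: beta = 0
t_intercept = (a - 0) / se_a      # tests H0: alpha = 0
reject_slope = abs(t_slope) > T_CRIT
reject_intercept = abs(t_intercept) > T_CRIT

# 95% confidence interval for the slope.
slope_ci = (b - T_CRIT * se_b, b + T_CRIT * se_b)

print(f"T for slope = {t_slope:.2f}, reject H0: {reject_slope}")
print(f"T for intercept = {t_intercept:.2f}, reject H0: {reject_intercept}")
print(f"95% CI for slope: ({slope_ci[0]:.5f}, {slope_ci[1]:.5f})")
```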
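Confidence Interval for the Mean and Prediction of the Value of Y for a New Unit
Estimate of mean and predicted value at $x_0$:
$$\hat{Y}(x_0) = a + b x_0$$
Variance:
$$\hat{\sigma}^2[\hat{Y}(x_0)] = \mathrm{MSE} \left( \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{\mathrm{SXX}} \right)$$
CI for $\mu(x_0)$:
$$(a + b x_0) \pm t_{n-2;\alpha/2} \sqrt{\mathrm{MSE} \left( \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{\mathrm{SXX}} \right)}$$
CI for $Y(x_0)$ (the prediction interval for a new unit):
$$(a + b x_0) \pm t_{n-2;\alpha/2} \sqrt{\mathrm{MSE} \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{\mathrm{SXX}} \right)}$$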
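A minimal Python sketch of both intervals follows, again using the summary values reported for the example. The heart rate x0 = 100 is a hypothetical input chosen for illustration, the critical value 2.110 is an assumed table value of the t-distribution with 17 degrees of freedom, and the function name `intervals_at` is illustrative.

```python
import math

# Summary quantities from the example's reported results.
n = 19
a, b = -2.80435, 0.038652
x_bar = 2033 / n
sxx = 2518.0
mse = 0.01451
T_CRIT = 2.110  # assumed table value of t_{17; 0.025}

def intervals_at(x0):
    """Return (estimate, CI for the mean, interval for a new Y) at heart rate x0."""
    y_hat = a + b * x0
    mean_factor = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = T_CRIT * math.sqrt(mse * mean_factor)          # mean response
    pi_half = T_CRIT * math.sqrt(mse * (1.0 + mean_factor))  # new observation
    return (y_hat,
            (y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))

y_hat, ci, pi = intervals_at(100.0)  # hypothetical heart rate of 100
print(f"estimate at x0 = 100: {y_hat:.4f}")
print(f"95% CI for mean:      ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"95% interval for Y:   ({pi[0]:.4f}, {pi[1]:.4f})")
```

The interval for a new unit is always wider than the interval for the mean because of the extra 1 inside the square root, and both widen as x0 moves away from the mean of X.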
Results of Regression Analysis (using Minitab)
Regression Analysis
The regression equation is
OxygenUptake = - 2.80 + 0.0387 HeartRate
Predictor   Coef       StDev      T        P
Constant    -2.8044    0.2583     -10.86   0.000
HeartRat    0.038652   0.002400   16.10    0.000

S = 0.1205 R-Sq = 93.8% R-Sq(adj) = 93.5%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression       1    3.7619   3.7619   259.27   0.000
Residual Error   17   0.2467   0.0145
Total            18   4.0085

Annotations: the P column of the coefficient table gives the P-value of each t-test. In the Analysis of Variance table, F = (MSR)/(MSE) and P is the P-value for the regression.
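The entries of the Analysis of Variance table can be rebuilt from SSR and SSE alone, as in this minimal Python sketch. It uses the rounded values printed in the output, so the results agree with Minitab only up to rounding. The adjusted R-squared formula and the identity F = T squared for the slope are standard facts for simple linear regression that the slides do not state explicitly.

```python
# Rounded values as printed in the Minitab output.
n = 19
ssr = 3.7619
sse = 0.2467
t_slope = 16.10

df_reg = 1
df_err = n - 2
sst = ssr + sse                 # total sum of squares

msr = ssr / df_reg
mse = sse / df_err
f_stat = msr / mse              # F = MSR / MSE
s = mse ** 0.5                  # "S" in the output: square root of MSE
r_sq = ssr / sst
r_sq_adj = 1.0 - mse / (sst / (n - 1))   # adjusted for degrees of freedom

print(f"MSE = {mse:.4f}, S = {s:.4f}")
print(f"F = {f_stat:.2f}, T^2 for slope = {t_slope ** 2:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```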
Fitted Line on the Scatterplot
[Figure: Regression Plot of OxygenUptake (vertical axis, 0.5 to 2.5) against HeartRate (horizontal axis, 90 to 130) with the fitted line Y = -2.80435 + 3.87E-02X; R-Sq = 93.8 %.]
Confidence Interval for Mean and Prediction Interval
[Figure: Regression Plot of OxygenUptake (vertical axis, 0.5 to 2.5) against HeartRate (horizontal axis, 90 to 130) showing the fitted regression line Y = -2.80435 + 3.87E-02X, the 95% CI band, and the 95% PI band; R-Sq = 93.8 %.]
Excel Implementation of Formulas

X = HeartRate, Y = OxygenUptake

X      Y        X^2      Y^2        XY         Predicted   Residual       ResSq
94     0.473    8836     0.223729   44.462     0.828944    -0.35594369    0.126696
96     0.753    9216     0.567009   72.288     0.906248    -0.1532479     0.023485
95     0.929    9025     0.863041   88.255     0.867596    0.061404206    0.00377
95     0.939    9025     0.881721   89.205     0.867596    0.071404206    0.005099
94     0.832    8836     0.692224   78.208     0.828944    0.00305631     9.34E-06
95     0.983    9025     0.966289   93.385     0.867596    0.115404206    0.013318
94     1.049    8836     1.100401   98.606     0.828944    0.22005631     0.048425
104    1.178    10816    1.387684   122.512    1.215465    -0.03746474    0.001404
104    1.176    10816    1.382976   122.304    1.215465    -0.03946474    0.001557
106    1.292    11236    1.669264   136.952    1.292769    -0.00076895    5.91E-07
108    1.403    11664    1.968409   151.524    1.370073    0.032926843    0.001084
110    1.499    12100    2.247001   164.89     1.447377    0.051622633    0.002665
113    1.529    12769    2.337841   172.777    1.563334    -0.03433368    0.001179
113    1.599    12769    2.556801   180.687    1.563334    0.035666318    0.001272
118    1.749    13924    3.059001   206.382    1.756594    -0.00759421    5.77E-05
115    1.746    13225    3.048516   200.79     1.640638    0.105362109    0.011101
121    1.897    14641    3.598609   229.537    1.872551    0.02444948     0.000598
127    2.04     16129    4.1616     259.08     2.104463    -0.06446315    0.004155
131    2.231    17161    4.977361   292.261    2.259072    -0.02807157    0.000788
Sums:
2033   25.297   220049   37.68948   2804.105               2.77556E-15    0.246664

SXX   2518        B     0.038652
SYY   4.008519    A     -2.80435
SXY   97.326
SSR   3.761855    SSE   0.246664
MSR   3.761855    MSE   0.01451
F     259.2659
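The spreadsheet works from column totals, so the same results can be reproduced with the shortcut forms of SXX, SYY, and SXY. The Python sketch below starts from the five column sums reported in the spreadsheet. It also uses the identity SSR = b times SXY, a standard algebraic consequence of the least-squares solution that is not stated on the slides. Variable names are illustrative.

```python
# Column totals from the spreadsheet.
n = 19
sum_x, sum_y = 2033.0, 25.297
sum_x2, sum_y2, sum_xy = 220049.0, 37.68948, 2804.105

x_bar = sum_x / n
y_bar = sum_y / n

# Shortcut forms: sum of squares minus n times the squared mean.
sxx = sum_x2 - n * x_bar ** 2
syy = sum_y2 - n * y_bar ** 2
sxy = sum_xy - n * x_bar * y_bar

b = sxy / sxx
a = y_bar - b * x_bar

ssr = b * sxy            # equals SXY^2 / SXX
sse = syy - ssr
mse = sse / (n - 2)
f_stat = ssr / mse       # MSR = SSR because the regression has 1 df

print(f"SXX = {sxx:.1f}, SYY = {syy:.6f}, SXY = {sxy:.3f}")
print(f"B = {b:.6f}, A = {a:.5f}")
print(f"SSR = {ssr:.6f}, SSE = {sse:.6f}, MSE = {mse:.5f}, F = {f_stat:.2f}")
```

Because the Y^2 total is rounded in the table, SYY and the quantities derived from it match the spreadsheet only to about five decimal places.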