simple linear regression

Simple Linear Regression Statistics 700 December 4-7, 2001

Upload: sheila-morrison

Post on 31-Dec-2015

Page 1: Simple Linear Regression

Simple Linear Regression

Statistics 700

December 4-7, 2001

Page 2: Simple Linear Regression

04/19/23 Simple Linear Regression 2

Example for Illustration

The human body takes in more oxygen when exercising than when it is at rest. To deliver oxygen to the muscles, the heart must beat faster. Heart rate is easy to measure, but measuring oxygen uptake requires elaborate equipment. If oxygen uptake (VO2) can be accurately predicted from heart rate (HR), the predicted values may replace actually measured values for various research purposes. Unfortunately, not all human bodies are the same, so no single prediction equation works for all people. Researchers can, however, measure both HR and VO2 for one person under varying sets of exercise conditions and calculate a regression equation for predicting that person’s oxygen uptake from heart rate.

Page 3: Simple Linear Regression


Data From An Individual

• Goals in this illustration:

• Scatterplot: linear relationship or not?

• Obtain the best-fitting line using least-squares.

• To test whether the model is significant or not.

• To obtain a confidence interval for the regression coefficient.

• To obtain predictions.

HR      94     96     95     95     94     95     94    104    104    106
VO2  0.473  0.753  0.929  0.939  0.832  0.983  1.049  1.178  1.176  1.292

HR     108    110    113    113    118    115    121    127    131
VO2  1.403  1.499  1.529  1.599  1.749  1.746  1.897  2.040  2.231

Page 4: Simple Linear Regression

The Scatterplot

[Figure: scatterplot of OxygenUptake (vertical axis, 0.4 to 2.4) against HeartRate (horizontal axis, 90 to 130).]

Page 5: Simple Linear Regression

Simple Linear Regression Model

1. Conditional on X = x, the response variable Y has mean equal to $\mu(x) = \alpha + \beta x$.

2. $\alpha$ is the y-intercept, while $\beta$ is the slope of the regression line, which could be interpreted as the change in the mean value of Y per unit change in the independent variable.

3. For each X = x, the conditional distribution of Y is normal with mean $\mu(x)$ and variance $\sigma^2$.

4. $Y_1, Y_2, \ldots, Y_n$ are independent of each other.

Shorthand:

$Y_i = \alpha + \beta x_i + \epsilon_i$ with $\epsilon_i$ IID $N(0, \sigma^2)$
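To make the shorthand concrete, here is a minimal Python sketch that draws responses from a model of this form. The function name `simulate_responses`, the seed, and the parameter values (alpha = -2.8, beta = 0.039, sigma = 0.12) are illustrative assumptions chosen to resemble the heart-rate example; they are not part of the original slides.

```python
import random

def simulate_responses(xs, alpha, beta, sigma, seed=0):
    """Draw Y_i = alpha + beta * x_i + eps_i, with eps_i IID N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical heart rates and illustrative parameter values.
xs = [95, 105, 115, 125]
ys = simulate_responses(xs, alpha=-2.8, beta=0.039, sigma=0.12)

for x, y in zip(xs, ys):
    mean_at_x = -2.8 + 0.039 * x  # conditional mean mu(x)
    print(f"x = {x}: mu(x) = {mean_at_x:.3f}, simulated Y = {y:.3f}")
```

Each simulated Y scatters around its conditional mean with the same standard deviation at every x, which is the constant-variance assumption in item 3.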

Page 6: Simple Linear Regression

Least-Squares (LS) Regression

One of the goals in regression analysis is to estimate the parameters $\alpha$, $\beta$, and $\sigma$ of the regression model. Denote by

$\hat{Y} = a + bX$

the estimate of the regression line, so that a estimates $\alpha$, and b estimates $\beta$. Then for the observed values of X, which are $x_1, x_2, \ldots, x_n$, we may obtain the predicted values of the response variable Y for each of these X-values. These are:

Page 7: Simple Linear Regression

Predicted Values

$\hat{Y}_i = a + b x_i, \quad i = 1, 2, \ldots, n.$

A good estimate of the regression line should produce predicted values that are close to the actual observed values of the response variable. That is, the set of deviations

$R_i = Y_i - \hat{Y}_i = Y_i - (a + b x_i), \quad i = 1, 2, \ldots, n,$

should ideally be close (if not equal) to zero. These deviations between observed and predicted values are also called residuals.
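As a minimal sketch of these two formulas, the Python snippet below computes predicted values and residuals for the first three observations of the example data. The coefficients a = -2.80435 and b = 0.038652 are the fitted values reported later in the slides; the function name `predict_and_residuals` is an illustrative choice, not part of the original material.

```python
def predict_and_residuals(xs, ys, a, b):
    """Return (predicted, residuals) for the line Y-hat = a + b * x."""
    predicted = [a + b * x for x in xs]
    residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]
    return predicted, residuals

# First three (HR, VO2) pairs from the example data.
hr = [94, 96, 95]
vo2 = [0.473, 0.753, 0.929]

# Coefficients as reported in the regression output of the slides.
a, b = -2.80435, 0.038652

predicted, residuals = predict_and_residuals(hr, vo2, a, b)
for x, y, y_hat, r in zip(hr, vo2, predicted, residuals):
    print(f"HR={x}  VO2={y:.3f}  predicted={y_hat:.4f}  residual={r:+.4f}")
```

A negative residual means the line overpredicts the observed oxygen uptake at that heart rate; a positive one means it underpredicts.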

Page 8: Simple Linear Regression

Principle of Least-Squares (LS)

In least-squares regression, the best-fitting regression line is that which will make the sum of these squared deviations or residuals as small as possible. Thus, the regression coefficients a and b are chosen in order to minimize the quantity:

$Q(a, b) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - b x_i)^2.$

Using calculus, the values of a and b that will minimize this quantity are given by:
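The calculus step can be sketched as follows. This is the standard derivation of the normal equations, written in the notation of the slides; $\bar{X}$ and $\bar{Y}$ denote the sample means of the $x_i$ and the $Y_i$. Setting both partial derivatives of Q to zero and solving gives the estimators.

```latex
\begin{align*}
\frac{\partial Q}{\partial a}
  &= -2 \sum_{i=1}^{n} (Y_i - a - b x_i) = 0
  &&\Longrightarrow\quad a = \bar{Y} - b\bar{X}, \\
\frac{\partial Q}{\partial b}
  &= -2 \sum_{i=1}^{n} x_i (Y_i - a - b x_i) = 0
  &&\Longrightarrow\quad \sum_{i=1}^{n} x_i Y_i - n a \bar{X} - b \sum_{i=1}^{n} x_i^2 = 0.
\end{align*}
% Substitute a = \bar{Y} - b\bar{X} into the second equation:
\begin{align*}
\sum_{i=1}^{n} x_i Y_i - n\bar{X}\bar{Y}
  = b \left( \sum_{i=1}^{n} x_i^2 - n\bar{X}^2 \right)
  \quad\Longrightarrow\quad
  b = \frac{\sum_{i=1}^{n} x_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}.
\end{align*}
```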

Page 9: Simple Linear Regression

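Least-Squares Solution

Estimator of $\beta$: $b = \dfrac{SXY}{SXX}$

Estimator of $\alpha$: $a = \bar{Y} - b\bar{X}$

Prediction Line: $\hat{Y} = a + bX$

where

$SXX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$

$SYY = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2$

$SXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}$

Correlation: $r = \dfrac{SXY}{\sqrt{(SXX)(SYY)}}$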
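A minimal Python sketch of these formulas, applied to the heart-rate data from the earlier slide, is shown below. The variable names are illustrative choices; the quantities can be compared with the SXX, SYY, SXY, A, and B values in the Excel slide at the end of the deck.

```python
import math

# Example data from the slides: heart rate (HR) and oxygen uptake (VO2).
hr = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
vo2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

n = len(hr)
x_bar = sum(hr) / n
y_bar = sum(vo2) / n

# Corrected sums of squares and cross-products.
sxx = sum((x - x_bar) ** 2 for x in hr)
syy = sum((y - y_bar) ** 2 for y in vo2)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hr, vo2))

b = sxy / sxx                    # estimator of the slope beta
a = y_bar - b * x_bar            # estimator of the intercept alpha
r = sxy / math.sqrt(sxx * syy)   # sample correlation

print(f"SXX = {sxx:.0f}, SYY = {syy:.6f}, SXY = {sxy:.3f}")
print(f"b = {b:.6f}, a = {a:.5f}, r = {r:.4f}")
```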

Page 10: Simple Linear Regression

Estimating the Variance

$SSE = \sum_{i=1}^{n} R_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$

$SYY = SSR + SSE$

$MSE = \dfrac{SSE}{n - 2}$

MSE is an unbiased estimator of $\sigma^2$.

Coefficient of Determination: $R^2 = \dfrac{SSR}{SYY}$

Page 11: Simple Linear Regression

Interpretations of Quantities

• SSE : measures the amount of variation not explained by the predictor variable.

• SSR : measures the amount of variation explained by the predictor variable.

• SYY: total variation in the Y-values. This is partitioned into SSR and SSE.

• $R^2$ = (SSR)/(SYY) : coefficient of determination; indicates the proportion of variation in the Y-values explained by the predictor variable.

• MSE = (SSE)/(n-2) : the mean-squared error. This provides an unbiased estimate of the common variance $\sigma^2$.
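The quantities in this list can be computed directly from their definitions. The Python sketch below does so for the heart-rate data; variable names are illustrative, and the results can be checked against the SSR, SSE, and MSE values reported in the later output slides.

```python
# Example data from the slides: heart rate (HR) and oxygen uptake (VO2).
hr = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
vo2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

n = len(hr)
x_bar = sum(hr) / n
y_bar = sum(vo2) / n
sxx = sum((x - x_bar) ** 2 for x in hr)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hr, vo2))
b = sxy / sxx
a = y_bar - b * x_bar

fitted = [a + b * x for x in hr]
sse = sum((y - f) ** 2 for y, f in zip(vo2, fitted))  # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
syy = sum((y - y_bar) ** 2 for y in vo2)              # total variation

mse = sse / (n - 2)       # unbiased estimate of sigma^2
r_squared = ssr / syy     # coefficient of determination

print(f"SSE = {sse:.6f}, SSR = {ssr:.6f}, SYY = {syy:.6f}")
print(f"MSE = {mse:.5f}, R^2 = {r_squared:.3f}")
```

The partition SYY = SSR + SSE holds up to floating-point rounding, which gives a quick sanity check on the fit.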

Page 12: Simple Linear Regression

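Sampling Distributions of Estimators

b is Normal($\mu_b$, $\sigma_b^2$), with

$\mu_b = \beta, \qquad \sigma_b^2 = \dfrac{\sigma^2}{SXX}$

a is Normal($\mu_a$, $\sigma_a^2$), with

$\mu_a = \alpha, \qquad \sigma_a^2 = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{X}^2}{SXX} \right)$

To estimate the variance, $\sigma^2$ is replaced by the MSE.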
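As a sketch of how these variances are estimated in practice, the Python snippet below replaces $\sigma^2$ by the MSE and takes square roots to get the standard errors of b and a for the heart-rate data. The names `se_b` and `se_a` are illustrative; the values can be compared with the StDev column of the Minitab output later in the slides.

```python
import math

# Example data from the slides: heart rate (HR) and oxygen uptake (VO2).
hr = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
vo2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

n = len(hr)
x_bar = sum(hr) / n
y_bar = sum(vo2) / n
sxx = sum((x - x_bar) ** 2 for x in hr)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hr, vo2))
b = sxy / sxx
a = y_bar - b * x_bar

# MSE estimates the common variance sigma^2.
mse = sum((y - (a + b * x)) ** 2 for x, y in zip(hr, vo2)) / (n - 2)

se_b = math.sqrt(mse / sxx)                         # estimated sd of b
se_a = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))  # estimated sd of a

print(f"se(b) = {se_b:.6f}, se(a) = {se_a:.4f}")
```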

Page 13: Simple Linear Regression

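Testing Hypothesis

• To test the null hypothesis $H_0: \beta = \beta_0$ versus $H_1: \beta \neq \beta_0$ we use the t-statistic given by:

$T_c = \dfrac{b - \beta_0}{\sqrt{MSE / SXX}}$

which follows a t-distribution with degrees-of-freedom equal to n-2 under the null hypothesis. Thus, we reject $H_0$ if $|T_c| > t_{n-2;\alpha/2}$, where the $\alpha$ in the subscript is the significance level of the test, not the intercept. Similarly, for testing $H_0: \alpha = \alpha_0$, we use:

$T_c = \dfrac{a - \alpha_0}{\sqrt{MSE \left( \dfrac{1}{n} + \dfrac{\bar{X}^2}{SXX} \right)}}.$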
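A minimal Python sketch of both tests for the heart-rate data follows, with null values $\beta_0 = 0$ and $\alpha_0 = 0$. The critical value 2.110 is the two-sided 5% point of the t-distribution with 17 degrees of freedom, taken from a standard t table and hard-coded here as an assumption so that the sketch needs only the standard library.

```python
import math

# Example data from the slides: heart rate (HR) and oxygen uptake (VO2).
hr = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
vo2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

n = len(hr)
x_bar = sum(hr) / n
y_bar = sum(vo2) / n
sxx = sum((x - x_bar) ** 2 for x in hr)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hr, vo2))
b = sxy / sxx
a = y_bar - b * x_bar
mse = sum((y - (a + b * x)) ** 2 for x, y in zip(hr, vo2)) / (n - 2)

beta_0 = 0.0    # null value for the slope
alpha_0 = 0.0   # null value for the intercept
T_CRIT = 2.110  # t_{17; 0.025}, from a standard t table (assumed, not computed)

t_slope = (b - beta_0) / math.sqrt(mse / sxx)
t_intercept = (a - alpha_0) / math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))

reject_slope = abs(t_slope) > T_CRIT
reject_intercept = abs(t_intercept) > T_CRIT

print(f"slope:     T = {t_slope:.2f}, reject H0: {reject_slope}")
print(f"intercept: T = {t_intercept:.2f}, reject H0: {reject_intercept}")
```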

Page 14: Simple Linear Regression

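Confidence Interval for Mean and Predicting the Value of Y of a New Unit

Estimate of Mean and Predicted Value at $x_0$: $\hat{Y}(x_0) = a + b x_0$

Variance: $\hat{\sigma}^2[\hat{Y}(x_0)] = MSE \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{X})^2}{SXX} \right)$

CI for $\mu(x_0)$: $(a + b x_0) \pm t_{n-2;\alpha/2} \sqrt{MSE \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{X})^2}{SXX} \right)}$

CI for $Y(x_0)$ (the prediction interval for a new unit): $(a + b x_0) \pm t_{n-2;\alpha/2} \sqrt{MSE \left( 1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{X})^2}{SXX} \right)}$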
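The two intervals can be sketched in Python as follows. The point x0 = 100 is a hypothetical heart rate chosen for illustration, and the critical value 2.110 (two-sided 5% point of the t-distribution with 17 degrees of freedom) is taken from a standard t table rather than computed; both are assumptions, not values from the slides.

```python
import math

# Example data from the slides: heart rate (HR) and oxygen uptake (VO2).
hr = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
vo2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

n = len(hr)
x_bar = sum(hr) / n
y_bar = sum(vo2) / n
sxx = sum((x - x_bar) ** 2 for x in hr)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hr, vo2))
b = sxy / sxx
a = y_bar - b * x_bar
mse = sum((y - (a + b * x)) ** 2 for x, y in zip(hr, vo2)) / (n - 2)

x0 = 100        # hypothetical heart rate at which to predict
T_CRIT = 2.110  # t_{17; 0.025}, from a standard t table (assumed)

y0 = a + b * x0  # estimate of the mean and predicted value at x0
leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
se_mean = math.sqrt(mse * leverage)        # for the mean mu(x0)
se_pred = math.sqrt(mse * (1 + leverage))  # for a new observation Y(x0)

ci = (y0 - T_CRIT * se_mean, y0 + T_CRIT * se_mean)
pi = (y0 - T_CRIT * se_pred, y0 + T_CRIT * se_pred)

print(f"estimate at x0 = {x0}: {y0:.4f}")
print(f"95% CI for the mean:   ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"95% PI for a new unit: ({pi[0]:.4f}, {pi[1]:.4f})")
```

The prediction interval is always wider than the confidence interval for the mean, because of the extra "1 +" term that accounts for the variability of a single new observation.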

Page 15: Simple Linear Regression

Results of Regression Analysis (using Minitab)

Regression Analysis

The regression equation is

OxygenUptake = - 2.80 + 0.0387 HeartRate

Predictor       Coef      StDev        T       P
Constant     -2.8044     0.2583   -10.86   0.000
HeartRat    0.038652   0.002400    16.10   0.000

S = 0.1205   R-Sq = 93.8%   R-Sq(adj) = 93.5%

Analysis of Variance

Source            DF       SS       MS        F       P
Regression         1   3.7619   3.7619   259.27   0.000
Residual Error    17   0.2467   0.0145
Total             18   4.0085

Slide annotations: the P column of the coefficient table gives the P-value for each t-test; the P entry in the Analysis of Variance table is the P-value for the regression; F = (MSR)/(MSE).
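Several entries of this output can be reproduced from the sums of squares alone. The Python sketch below starts from the SSR, SSE, and SYY values reported in the Excel slide at the end of the deck. The adjusted R-squared formula used here, 1 - (SSE/(n-2))/(SYY/(n-1)), is the standard definition and is not stated in the slides.

```python
import math

# Summary values as reported in the slides.
n = 19
ssr = 3.761855   # regression sum of squares (1 degree of freedom)
sse = 0.246664   # residual sum of squares (n - 2 degrees of freedom)
syy = 4.008519   # total sum of squares (n - 1 degrees of freedom)

msr = ssr / 1
mse = sse / (n - 2)

f_stat = msr / mse                 # F = (MSR)/(MSE)
s = math.sqrt(mse)                 # "S" in the Minitab output
r_sq = ssr / syy                   # "R-Sq"
r_sq_adj = 1 - mse / (syy / (n - 1))  # "R-Sq(adj)", standard definition

print(f"F = {f_stat:.2f}, S = {s:.4f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```

In simple linear regression the F statistic equals the square of the slope's t-statistic, which is why 16.10 squared is approximately 259.27.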

Page 16: Simple Linear Regression

Fitted Line on the Scatterplot

[Figure: Regression Plot of OxygenUptake (vertical axis, 0.5 to 2.5) against HeartRate (horizontal axis, 90 to 130) with the fitted line Y = -2.80435 + 3.87E-02X; R-Sq = 93.8 %.]

Page 17: Simple Linear Regression

Confidence Interval for Mean and Prediction Interval

[Figure: Regression Plot of OxygenUptake (vertical axis, 0.5 to 2.5) against HeartRate (horizontal axis, 90 to 130) showing the fitted regression line Y = -2.80435 + 3.87E-02X, the 95% CI band, and the 95% PI band; R-Sq = 93.8 %.]

Page 18: Simple Linear Regression

Excel Implementation of Formulas

        X             Y
HeartRate  OxygenUptake     X^2       Y^2        XY  Predicted      Residual      ResSq

       94         0.473    8836  0.223729    44.462   0.828944   -0.35594369   0.126696
       96         0.753    9216  0.567009    72.288   0.906248    -0.1532479   0.023485
       95         0.929    9025  0.863041    88.255   0.867596   0.061404206    0.00377
       95         0.939    9025  0.881721    89.205   0.867596   0.071404206   0.005099
       94         0.832    8836  0.692224    78.208   0.828944    0.00305631   9.34E-06
       95         0.983    9025  0.966289    93.385   0.867596   0.115404206   0.013318
       94         1.049    8836  1.100401    98.606   0.828944    0.22005631   0.048425
      104         1.178   10816  1.387684   122.512   1.215465   -0.03746474   0.001404
      104         1.176   10816  1.382976   122.304   1.215465   -0.03946474   0.001557
      106         1.292   11236  1.669264   136.952   1.292769   -0.00076895   5.91E-07
      108         1.403   11664  1.968409   151.524   1.370073   0.032926843   0.001084
      110         1.499   12100  2.247001    164.89   1.447377   0.051622633   0.002665
      113         1.529   12769  2.337841   172.777   1.563334   -0.03433368   0.001179
      113         1.599   12769  2.556801   180.687   1.563334   0.035666318   0.001272
      118         1.749   13924  3.059001   206.382   1.756594   -0.00759421   5.77E-05
      115         1.746   13225  3.048516    200.79   1.640638   0.105362109   0.011101
      121         1.897   14641  3.598609   229.537   1.872551    0.02444948   0.000598
      127          2.04   16129    4.1616    259.08   2.104463   -0.06446315   0.004155
      131         2.231   17161  4.977361   292.261   2.259072   -0.02807157   0.000788

Totals:
     2033        25.297  220049  37.68948  2804.105              2.77556E-15   0.246664

SXX  2518        B   0.038652
SYY  4.008519    A  -2.80435
SXY  97.326

SSR  3.761855    SSE  0.246664
MSR  3.761855    MSE  0.01451

F  259.2659
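The spreadsheet layout can be mirrored in Python as a sketch: one row per observation with the same columns (X, Y, X^2, Y^2, XY, Predicted, Residual, ResSq), followed by column totals and the shortcut formulas for SXX and SXY. The dictionary keys and variable names are illustrative choices, not taken from the original workbook.

```python
# Example data from the slides: heart rate (HR) and oxygen uptake (VO2).
hr = [94, 96, 95, 95, 94, 95, 94, 104, 104, 106,
      108, 110, 113, 113, 118, 115, 121, 127, 131]
vo2 = [0.473, 0.753, 0.929, 0.939, 0.832, 0.983, 1.049, 1.178, 1.176, 1.292,
       1.403, 1.499, 1.529, 1.599, 1.749, 1.746, 1.897, 2.040, 2.231]

n = len(hr)
sum_x, sum_y = sum(hr), sum(vo2)
sum_x2 = sum(x * x for x in hr)
sum_xy = sum(x * y for x, y in zip(hr, vo2))

# Shortcut forms: SXX = sum(X^2) - n*Xbar^2, SXY = sum(XY) - n*Xbar*Ybar.
sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
b = sxy / sxx
a = sum_y / n - b * sum_x / n

# One dictionary per spreadsheet row.
rows = []
for x, y in zip(hr, vo2):
    predicted = a + b * x
    residual = y - predicted
    rows.append({"X": x, "Y": y, "X^2": x * x, "Y^2": y * y, "XY": x * y,
                 "Predicted": predicted, "Residual": residual,
                 "ResSq": residual ** 2})

total_residual = sum(row["Residual"] for row in rows)
sse = sum(row["ResSq"] for row in rows)

for row in rows[:3]:  # show the first few rows only
    print("  ".join(f"{key}={value:.6g}" for key, value in row.items()))
print(f"sum X = {sum_x}, sum X^2 = {sum_x2}, sum XY = {sum_xy:.3f}")
print(f"sum of residuals = {total_residual:.3e}, SSE = {sse:.6f}")
```

The residuals sum to zero up to floating-point rounding, which matches the tiny value (2.77556E-15) in the spreadsheet's totals row.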