Inference for Linear Regression

November 6, 2019

Regression Example

Asking R for a summary of the regression model, we get the following:

Let’s pick this apart piece by piece.

Section 8.4 November 6, 2019 2 / 23

The first line shows the command used in R to run this regression model.

The Residuals item shows a quartile-based summary of our residuals.

The F-statistic and p-value give information about the model overall.

These are based on an F-distribution.

The null hypothesis is that all of the slope parameters in our model are 0 (the model gives us no useful information).

Since p-value $< 2.2 \times 10^{-16} < \alpha = 0.05$, we reject the null hypothesis: at least one of the slope parameters is nonzero (the model is useful).

Multiple R-squared is our squared correlation coefficient $R^2$.

This tells us how good our fit is: the proportion of the variability in y that the model explains.

Ignore the adjusted R-squared and residual standard error for now.
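As a reminder of what $R^2$ measures, it can be written in terms of sums of squares. The labels $SSR$, $SSE$, and $SST$ (regression, error, and total sums of squares) are notation assumed here rather than taken from the slides:

```latex
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad SST = SSR + SSE
```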

Finally, the Coefficients section gives us several pieces of information:

1. Estimate shows the estimated value of each parameter.

2. Std. Error gives the standard error for each parameter estimate.

3. The t values are the test statistics for each parameter estimate.

4. Finally, Pr(>|t|) gives the p-value for each parameter estimate.

The hypothesis test for each regression coefficient has hypotheses

$$H_0 : \beta_i = 0$$

$$H_A : \beta_i \neq 0$$

where $i = 0$ for the intercept and $i = 1$ for the slope.
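The test statistic behind each of these tests is not written out on the slide. A sketch, assuming the usual form for a simple linear regression with $n$ observations:

```latex
t = \frac{b_i - 0}{SE(b_i)} \sim t(n - 2) \quad \text{under } H_0
```

This is the ratio of the Estimate column to the Std. Error column, which is what R reports as the t value.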

p-value $< 2 \times 10^{-16}$ for $b_0$, so we can conclude that the intercept is nonzero.

p-value $< 2 \times 10^{-16}$ for $b_1$, so we conclude that the slope is also nonzero.

This means that the intercept and slope both provide useful information when predicting values of y = eruptions.

Confidence Intervals for a Coefficient

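We can construct confidence intervals from the same quantities used in the hypothesis tests. A $(1 - \alpha)100\%$ confidence interval for $\beta_i$ is

$$b_i \pm t_{\alpha/2}(df) \times SE(b_i)$$

where the df (the residual degrees of freedom, $n - 2$ for simple linear regression) and $SE(b_i)$ can be found in the regression output.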
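The interval above can be sketched in a few lines of Python. The function name `coef_ci` and the inputs in the usage lines are illustrative assumptions: the estimate, standard error, and critical value are placeholders standing in for numbers that would be read from the regression output and a t table, not values taken from the slides.

```python
def coef_ci(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
lower, upper = coef_ci(estimate=0.50, std_error=0.10, t_crit=2.0)
print(f"({lower:.2f}, {upper:.2f})")
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; in practice it would come from a t table or software using the residual degrees of freedom.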

Aside: ANOVA for Regression Models

ANOVA will also play a role in regression.

We can get the ANOVA table for a regression.

The ANOVA table in regression will look something like this:

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| faithful$waiting | 1 | 286.478 | 286.478 | 1162.1 | < 2.2e-16 |
| Residuals | 270 | 66.562 | 0.247 | | |
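The entries of this table are tied together by simple arithmetic, which a short Python sketch can check. The variable names are illustrative; the numbers are copied from the table. The last computed quantity relies on the fact that in a regression with a single predictor, the F statistic equals the square of the slope's t value.

```python
import math

# Values copied from the ANOVA table for the faithful data.
df_model, ss_model = 1, 286.478
df_resid, ss_resid = 270, 66.562

ms_model = ss_model / df_model                # Mean Sq for the predictor
mse = ss_resid / df_resid                     # Mean Sq for Residuals (MSE)
f_value = ms_model / mse                      # F value = (model Mean Sq) / MSE
r_squared = ss_model / (ss_model + ss_resid)  # share of total variation explained
n = df_model + df_resid + 1                   # observations = total df + 1
abs_t_slope = math.sqrt(f_value)              # |t| for the slope with one predictor

print(round(mse, 3), round(f_value, 1), round(r_squared, 4), n, round(abs_t_slope, 2))
```

The rounded MSE and F value reproduce the 0.247 and 1162.1 shown in the table.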

Example

Find 95% confidence intervals for β0 and β1.

Estimation and Prediction Using a Regression Line

We now know

how to examine whether a model is useful.

how to confirm that our regression assumptions are satisfied.

Given a useful regression line, we want to

estimate an average value of y for a given value of x.

predict a particular value of y for a given value of x.

We’ve already talked about using a regression line to make predictions.

$$\hat{y} = b_0 + b_1 x$$

Plug in x and we get a good estimate for the average value of y at that point.
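Plugging in a value of x is a one-line computation. A minimal Python sketch, where the function name `predict` and the coefficients are hypothetical and chosen only for illustration:

```python
def predict(b0, b1, x):
    """Point estimate of the average y at x from the fitted line b0 + b1 * x."""
    return b0 + b1 * x

# Hypothetical coefficients, not taken from the slides.
b0, b1 = 1.0, 0.5
y_hat = predict(b0, b1, x=4.0)
print(y_hat)  # 3.0
```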

Point estimates are useful, but we want to consider variability!

Recall: one of our regression assumptions is normally distributed errors.

This means that, for a given x, the values of y around the regression line should be approximately normal with mean $\beta_0 + \beta_1 x$ and standard deviation $\sigma$.

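The Variability of $\hat{y}$

Notice that $\hat{y}$ is an estimator.

The variability of an estimator is measured by its standard error.

When $x = x_0$, the standard deviation of $\hat{y}$ is well-approximated by

$$SE(\hat{y}) = \sqrt{MSE\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}\right)}$$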
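A minimal Python sketch of this formula, assuming $SS_x = \sum (x_i - \bar{x})^2$ (the slides do not define it) and using made-up inputs. The function name `se_mean_response` is illustrative.

```python
import math

def se_mean_response(mse, n, x0, x_bar, ss_x):
    """Standard error of y-hat at x0: sqrt(MSE * (1/n + (x0 - x_bar)^2 / SS_x))."""
    return math.sqrt(mse * (1.0 / n + (x0 - x_bar) ** 2 / ss_x))

# Hypothetical inputs: the standard error is smallest at x0 = x_bar
# and grows as x0 moves away from the mean of x.
at_mean = se_mean_response(mse=4.0, n=25, x0=10.0, x_bar=10.0, ss_x=100.0)
far_away = se_mean_response(mse=4.0, n=25, x0=15.0, x_bar=10.0, ss_x=100.0)
print(at_mean, far_away)
```

The comparison shows why estimates are most precise near the center of the observed x values.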

Since we are working with a normal distribution, estimation and testing can be based on the test statistic

$$t = \frac{\hat{y} - y_0}{SE(\hat{y})}$$

which follows a $t(n - 2)$ distribution.

Confidence Intervals for y

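A $(1 - \alpha)100\%$ confidence interval for the average value of y (measured by $\beta_0 + \beta_1 x$) when $x = x_0$ is

$$\hat{y} \pm t_{\alpha/2}(n - 2) \times SE(\hat{y})$$

or

$$\hat{y} \pm t_{\alpha/2}(n - 2) \times \sqrt{MSE\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}\right)}$$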

Prediction Intervals for y

So far, we’ve only considered average values of the outcome variable y.

What if we wanted to predict a particular value of y?

For a residual, $e = \varepsilon + \text{(error in estimating the line)}$.

We don’t know the true breakdown between these components.

...but we can use this concept to build a new standard error formula.

The standard error of $(y - \hat{y})$ is

$$SE(y - \hat{y}) = \sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}\right)}$$
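One way to see where the extra 1 comes from, sketched under the usual assumption that the error $\varepsilon$ of a new observation is independent of the fitted line. The symbol $\sigma^2$ is the error variance from the earlier slides, and replacing it with $MSE$ is the standard estimation step:

```latex
\mathrm{Var}(y - \hat{y}) = \mathrm{Var}(\varepsilon) + \mathrm{Var}(\hat{y})
= \sigma^2 + \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}\right)
\approx MSE\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}\right)
```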

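A $(1 - \alpha)100\%$ prediction interval for a specific value of y when $x = x_0$ is

$$\hat{y} \pm t_{\alpha/2}(n - 2) \times SE(y - \hat{y})$$

or

$$\hat{y} \pm t_{\alpha/2}(n - 2) \times \sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}\right)}$$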
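To compare the two kinds of interval side by side, here is a self-contained Python sketch that fits a line by least squares and then builds both intervals at a chosen $x_0$. Everything in it is an assumption made for illustration: the data are made up, the function names `fit_line` and `intervals` are hypothetical, $SS_x$ is taken to be $\sum (x_i - \bar{x})^2$, and the critical value is passed in by hand rather than computed.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit; returns b0, b1, MSE, x_bar, SS_x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)  # residual degrees of freedom are n - 2
    return b0, b1, mse, x_bar, ss_x

def intervals(xs, ys, x0, t_crit):
    """Confidence interval for the average y and prediction interval for one y at x0."""
    n = len(xs)
    b0, b1, mse, x_bar, ss_x = fit_line(xs, ys)
    y_hat = b0 + b1 * x0
    term = 1.0 / n + (x0 - x_bar) ** 2 / ss_x
    se_mean = math.sqrt(mse * term)          # SE(y-hat)
    se_pred = math.sqrt(mse * (1.0 + term))  # SE(y - y-hat)
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi

# Made-up data; t_crit is the 95% critical value for n - 2 = 3 degrees of
# freedom, about 3.182 from a t table.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, ci, pi = intervals(xs, ys, x0=3.0, t_crit=3.182)
print(y_hat, ci, pi)
```

Both intervals are centered at the same $\hat{y}$, but the prediction interval is always wider because of the extra 1 under the square root.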
