sta291 statistical methods lecture 27. inference for regression

STA291 Statistical Methods Lecture 27

Upload: hector-ross

Post on 26-Dec-2015

Page 1: STA291 Statistical Methods Lecture 27. Inference for Regression

STA291 Statistical Methods

Lecture 27

Inference for Regression

Does the cost of a movie depend on its length?

Now we want to know: how useful is this model?

The Population and the Sample

The movie budget sample is based on 120 observations, but observations vary from sample to sample. So we imagine a true line that summarizes the relationship between x and y for the entire population:

$$\mu_y = \beta_0 + \beta_1 x$$

where µy is the population mean of y at a given value of x.

We write µy instead of y because the regression line assumes that the means of the y values for each value of x fall exactly on the line.

For a given value of x, most, if not all, of the y values obtained from a particular sample will not lie on the line.

The sampled y values will be distributed about µy.

We can account for the difference between each y and µy by adding an error term, ε:

$$y = \beta_0 + \beta_1 x + \varepsilon$$
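To make the population model concrete, here is a small Python simulation sketch. It assumes hypothetical population values β0 = 10, β1 = 2 and Normal errors with standard deviation 3 (none of these come from the movie data). Each simulated sample yields a slightly different least-squares slope, which is exactly the sample-to-sample variation that inference has to account for.

```python
import random

def simulate_sample(beta0, beta1, sigma, xs, rng):
    """Draw one sample from y = beta0 + beta1*x + eps, with eps ~ Normal(0, sigma)."""
    return [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]

def ls_slope(xs, ys):
    """Least-squares slope b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

if __name__ == "__main__":
    rng = random.Random(1)   # fixed seed so the run is repeatable
    xs = list(range(1, 21))
    # Hypothetical population (illustration only): beta0 = 10, beta1 = 2, sigma = 3.
    slopes = [ls_slope(xs, simulate_sample(10, 2, 3, xs, rng)) for _ in range(5)]
    print([round(b, 3) for b in slopes])  # five estimates scattered around 2
```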

Regression Inference

Collect a sample and estimate the population β’s by finding a regression line (Chapter 6):

$$\hat{y} = b_0 + b_1 x$$

where b0 estimates β0 and b1 estimates β1.

The residuals e = y − ŷ are the sample-based versions of ε.

Account for the uncertainties in β0 and β1 by making confidence intervals, as we’ve done for means and proportions.
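A minimal Python sketch of the Chapter 6 fit, using the standard least-squares formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The function names `fit_line` and `residuals` are our own, not from any library.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar  # the least-squares line passes through (xbar, ybar)
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Residuals e = y - y_hat: the sample-based stand-ins for the errors epsilon."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
```

For example, `fit_line([1, 2, 3, 4], [2, 4, 5, 8])` gives b0 ≈ 0 and b1 = 1.9, and the residuals (about 0.1, 0.2, −0.7, 0.4) sum to zero, as least-squares residuals always do.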

Assumptions and Conditions

Check these in this order:
1. Linearity Assumption
2. Independence Assumption
3. Equal Variance Assumption
4. Normal Population Assumption

Summary of Assumptions and Conditions

1. Make a scatterplot of the data to check for linearity (Linearity Assumption).
2. Fit a regression and find the residuals, e, and the predicted values, ŷ.
3. Plot the residuals against time (if appropriate) and check for evidence of patterns (Independence Assumption).
4. Make a scatterplot of the residuals against x or against the predicted values. This plot should not exhibit a “fan” or “cone” shape (Equal Variance Assumption).
5. Make a histogram and a Normal probability plot of the residuals (Normal Population Assumption).
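Steps 4 and 5 are plots. As a sketch, assuming the line has already been fitted, the Python below computes the coordinates those plots would display; drawing them is left to whatever plotting tool is at hand. The plotting position (i − 0.5)/n is one common convention among several, so the quantiles may differ slightly from a given package's Normal probability plot.

```python
from statistics import NormalDist

def residual_vs_fitted(xs, ys, b0, b1):
    """(fitted, residual) pairs for the Equal Variance check; look for a fan or cone."""
    return [(b0 + b1 * x, y - (b0 + b1 * x)) for x, y in zip(xs, ys)]

def normal_plot_points(resids):
    """Pair each sorted residual with a standard Normal quantile.

    Uses the plotting position (i - 0.5) / n (one common convention).
    Nearly Normal residuals give points that fall close to a straight line.
    """
    n = len(resids)
    z = NormalDist()  # standard Normal: mean 0, standard deviation 1
    return [(z.inv_cdf((i - 0.5) / n), e)
            for i, e in enumerate(sorted(resids), start=1)]
```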

The Standard Error of the Slope

For a sample, we expect b1 to be close to, but not equal to, the model slope β1. Across similar samples, the standard error of the slope measures the variability of b1 about the true slope β1. Three aspects of the data affect it:

Spread around the line: se
Spread of the x values: sx
Sample size: n

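Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare se’s. (The plot with less scatter around the line, that is, the smaller se, gives the more consistent slope.)

The Standard Error of the Slope:

$$SE(b_1) = \frac{s_e}{s_x\sqrt{n-1}}$$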
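Plugging hypothetical numbers into this formula (se = 2, sx = 1, n = 26, chosen only for easy arithmetic) shows all three effects at once. In this sketch, halving se, doubling sx, or raising n to 101 each cuts SE(b1) in half.

```python
import math

def se_slope(s_e, s_x, n):
    """Standard error of the slope: SE(b1) = s_e / (s_x * sqrt(n - 1))."""
    return s_e / (s_x * math.sqrt(n - 1))

if __name__ == "__main__":
    # Hypothetical inputs, picked so sqrt(n - 1) is a whole number.
    print(se_slope(2.0, 1.0, 26))   # baseline: 2 / (1 * 5)          -> 0.4
    print(se_slope(1.0, 1.0, 26))   # less scatter around the line    -> 0.2
    print(se_slope(2.0, 2.0, 26))   # x values more spread out        -> 0.2
    print(se_slope(2.0, 1.0, 101))  # larger sample: 2 / (1 * 10)     -> 0.2
```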

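Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare sx’s. (The plot whose x values are more spread out, that is, the larger sx, gives the more consistent slope.)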

Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare n’s. (The plot with more points, that is, the larger n, gives the more consistent slope.)

A Test for the Regression Slope

When the conditions are met, the standardized estimated regression slope,

$$t = \frac{b_1 - \beta_1}{SE(b_1)},$$

follows a t-distribution with df = n − 2. We estimate SE(b1) with

$$SE(b_1) = \frac{s_e}{s_x\sqrt{n-1}},$$

where sx is the ordinary standard deviation of the x’s and

$$s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}}.$$
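The following Python sketch computes every quantity in these formulas from raw (x, y) data. The function name `slope_t_test` is our own. The hypothesized slope defaults to 0, which is the usual null value.

```python
import math

def slope_t_test(xs, ys, beta1_null=0.0):
    """Return (b1, s_e, SE(b1), t, df) for testing H0: beta1 = beta1_null.

    s_e    = sqrt(sum((y - y_hat)^2) / (n - 2))
    SE(b1) = s_e / (s_x * sqrt(n - 1)), where s_x is the ordinary (n - 1) SD of x
    t      = (b1 - beta1_null) / SE(b1), with df = n - 2
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    s_e = math.sqrt(sse / (n - 2))
    s_x = math.sqrt(sxx / (n - 1))
    se_b1 = s_e / (s_x * math.sqrt(n - 1))
    t = (b1 - beta1_null) / se_b1
    return b1, s_e, se_b1, t, n - 2
```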

The usual null hypothesis about the slope is that it’s equal to 0. Why? A slope of zero says that y doesn’t tend to change linearly when x changes. In other words, if the slope equals zero, there is no linear association between the two variables.

H0: β1 = 0. This would mean that x and y are not linearly related.
Ha: β1 ≠ 0. This would mean . . .
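Under H0: β1 = 0 the statistic is simply t = b1 / SE(b1), and software reports a two-sided P-value from the t-distribution with df = n − 2. The sketch below approximates that P-value with Simpson's rule applied to the t density, using only Python's standard library. This numerical approach is our own illustration, not how any particular package computes it; in practice a t table or a library routine would be used.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) for T ~ t(df) using Simpson's rule on [0, |t|].

    By symmetry, p = 1 - 2 * (area under the density from 0 to |t|).
    """
    a = abs(t)
    if a == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)  # clamp tiny negative rounding error
```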

CI for the Regression Slope

When the assumptions and conditions are met, we can find a confidence interval for β1 from

$$b_1 \pm t^*_{n-2} \times SE(b_1),$$

where the critical value t* depends on the confidence level and has df = n − 2.
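A minimal sketch of the interval and its link to the two-sided test. A 95% interval excludes 0 exactly when the two-sided test of H0: β1 = 0 rejects at α = 0.05 with the same t*, because both conditions reduce to |b1 / SE(b1)| > t*. The helper names are hypothetical, and t* still has to come from a t table or software.

```python
def slope_ci(b1, se_b1, t_star):
    """Confidence interval b1 +/- t* x SE(b1) for the population slope beta1.

    t_star is the critical value for the chosen confidence level, df = n - 2.
    """
    margin = t_star * se_b1  # margin of error
    return b1 - margin, b1 + margin

def excludes_zero(interval):
    """True when 0 lies outside the interval, i.e. the matching test rejects H0: beta1 = 0."""
    lower, upper = interval
    return lower > 0 or upper < 0
```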

16.4 A Test for the Regression Slope

Example: Soap

A soap manufacturer tested a standard bar of soap to see how long it would last. A test subject showered with the soap each day for 15 days and recorded the weight (in grams) remaining. The conditions were met, so a linear regression gave the following:

Dependent variable is: Weight
R squared = 99.5%     s = 2.949

Variable    Coefficient   SE(Coeff)   t-ratio   P-value
Intercept   123.141       1.382       89.1      <0.0001
Day         -5.57476      0.1068      -52.2     <0.0001

What is the standard deviation of the residuals?

What is the standard error of b1?

What are the hypotheses for the regression slope?

At α = 0.05, what is the conclusion?
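As a check on the output, the t-ratio for Day is the coefficient divided by its standard error, and it is compared with t* for df = 15 − 2 = 13. The sketch assumes the two-sided 5% critical value t* = 2.160 from a t table (the same value used for this example's confidence interval).

```python
# Numbers copied from the soap regression output; n = 15 days of measurements.
n = 15
b1 = -5.57476     # slope coefficient for Day
se_b1 = 0.1068    # SE(Coeff) for Day
t_star = 2.160    # two-sided 5% critical value for df = 13, from a t table

df = n - 2
t_ratio = (b1 - 0) / se_b1   # test statistic for H0: beta1 = 0
print(df, round(t_ratio, 1))  # 13 -52.2
print("reject H0" if abs(t_ratio) > t_star else "fail to reject H0")  # reject H0
```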

What is the standard deviation of the residuals? se = 2.949

What is the standard error of b1? SE(b1) = 0.1068

What are the hypotheses for the regression slope?

H0: β1 = 0
Ha: β1 ≠ 0

At α = 0.05, what is the conclusion? Since the P-value is small (<0.0001), reject the null hypothesis. There is strong evidence of a linear relationship between Weight and Day.

Find a 95% confidence interval for the slope.

Interpret the 95% confidence interval for the slope.

At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion?

Find a 95% confidence interval for the slope. With df = 15 − 2 = 13, t* = 2.160:

$$b_1 \pm t^* \times SE(b_1) = -5.57476 \pm (2.160)(0.1068) = (-5.805,\ -5.344)$$

Interpret the 95% confidence interval for the slope. We can be 95% confident that the weight of the soap decreases by between 5.344 and 5.805 grams per day.

At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion? Yes. The interval does not contain zero, so we reject the null hypothesis, just as the t-test did.
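The interval arithmetic, as a quick Python check using the numbers from the output and t* = 2.160:

```python
# Soap example: slope and its standard error from the regression output.
b1 = -5.57476
se_b1 = 0.1068
t_star = 2.160   # 95% critical value for df = 15 - 2 = 13

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f}) grams per day")
# prints: 95% CI for the slope: (-5.805, -5.344) grams per day

# The whole interval lies below 0, matching the t-test's rejection of H0: beta1 = 0.
print(upper < 0)  # True
```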

Don’t fit a linear regression to data that aren’t straight.

Watch out for changing spread.

Watch out for non-Normal errors. Check the histogram and the Normal probability plot.

Watch out for extrapolation. It is always dangerous to predict for x-values that lie far away from the center of the data.

Watch out for high-influence points and unusual observations.

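Watch out for one-tailed tests. Most software packages perform only two-tailed tests, so adjust your P-values accordingly. For example, if the estimated slope falls on the side the one-sided alternative predicts, the one-tailed P-value is half the two-tailed one.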
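Two of these cautions can be made concrete in code. The sketch below converts a two-sided P-value into a one-sided one. It also computes leverage, a standard measure that the slides do not mention, of how far each x lies from x̄ and therefore how much pull a point can exert on the fitted line. The function names are hypothetical.

```python
def one_sided_p(two_sided_p, estimate, direction):
    """Convert a two-sided P-value to a one-sided one.

    direction = +1 for Ha: beta1 > 0, or -1 for Ha: beta1 < 0.
    If the estimate lies on the side Ha predicts, halve the two-sided P-value;
    otherwise the one-sided P-value is 1 - p/2.
    """
    if estimate * direction > 0:
        return two_sided_p / 2
    return 1 - two_sided_p / 2

def leverages(xs):
    """Leverage h_i = 1/n + (x_i - xbar)^2 / sum((x - xbar)^2) in simple regression.

    Points far from xbar have high leverage and can pull the fitted line toward them.
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1 / n + (x - xbar) ** 2 / sxx for x in xs]
```

For `leverages([1, 2, 3, 4, 20])`, the point at x = 20 gets leverage 0.984, far above the others, flagging it as potentially influential.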

Looking back

- Know the assumptions and conditions for inference about regression coefficients and how to check them, in this order: LIEN (Linearity, Independence, Equal variance, Normal population).
- Know the components of the standard error of the slope coefficient.
- Test statistic
- CI interpretation