Unit 3: Inference for the regression model
Class 7… Class 8… Class 9… Class 10…
Unit 3 / Page 1
© Andrew Ho, Harvard Graduate School of Education


Page 1:

Unit 3: Inference for the regression model

Class 7… Class 8… Class 9… Class 10…

Unit 3 / Page 1

Page 2:

Where is Unit 3 in our 11-Unit Sequence?

Building a solid foundation:
– Unit 1: Introduction to simple linear regression
– Unit 2: Correlation and causality
– Unit 3: Inference for the regression model

Mastering the subtleties:
– Unit 4: Regression assumptions: Evaluating their tenability
– Unit 5: Transformations to achieve linearity

Adding additional predictors:
– Unit 6: The basics of multiple regression
– Unit 7: Statistical control in depth: Correlation and collinearity

Generalizing to other types of predictors and effects:
– Unit 8: Categorical predictors I: Dichotomies
– Unit 9: Categorical predictors II: Polychotomies
– Unit 10: Interaction and quadratic effects

Pulling it all together:
– Unit 11: Regression in practice. Common extensions.

Unit 3 / Page 2

Page 3:

In this unit, we’re going to learn about…

• Distinguishing between population models and fitted sample results
• How would regression statistics differ upon repeated random sampling from the population?
– The sampling distribution of a statistic shows the distribution of values a statistic is likely to take under repeated sampling of a given sample size from a population.
– The standard error of a statistic is the standard deviation of its sampling distribution.
– Estimating this sampling distribution with results from just a single sample
• The logic of statistical hypothesis testing: Specifying H₀ and its alternative
– The sampling distribution of a regression coefficient when H₀ is true
– The t-distribution and its role in regression analysis
– p-values—what they are and what they’re not
• Confidence intervals—what they are and what they’re not
– The relationship between hypothesis testing and confidence intervals
– Confidence intervals for regression parameters
– Confidence intervals for the mean of Y at a given value of X
– Prediction at the extremes vs. prediction in the middle: The dangers of extrapolation
– Prediction intervals for an individual value of Y at a given value of X
• How to avoid missing an effect when there is one: Statistical power.

Unit 3 / Page 3

Page 4:

Population Parameters vs. Sample Statistics

Unit 1 / Page 4© Andrew Ho, Harvard Graduate School of Education

The Population. An abstraction to which we generalize. All possible twins, separated at birth, that we could theoretically sample. Generally thought to be infinite.

[Scatterplot: IQ of twin raised in “foster home” by adoptive parents (60–140) vs. IQ of twin raised in “own home” by birth parents (60–140)]

The Sample. The data we have sampled, ideally in an unbiased, representative fashion, from the population. Of finite size, in this case, n.

A parameter is a fact about a population.

Parameters β₀ and β₁. Written in Greek. Rarely if ever known in practice.

A statistic is a fact about a sample.

Statistics β̂₀ and β̂₁. Written in Roman, or in Greek with hats designating parameter estimates. What we use for inference about the population.

Page 5:

Example: The National Longitudinal Study of Freshmen (NLSF)

RQ: What is the predictive relationship between the degree of school integration the student experienced in high school and his/her current perceived social closeness to minorities?

Contact Hypothesis: Integration makes people more comfortable and tolerant (Allport, 1954)

Conflict Hypothesis: Integration makes people more resentful and contentious

n = 3844

http://nlsf.princeton.edu/

Outcome variable: minclose. Index of perceived closeness to minorities (Blacks and Latinos). Level: Individual. Scale: 0 (very distant) to 30 (very close).

Predictor variable: percmin. Estimated percentage of nonwhite students in your last high school. Level: Individual. Scale: 0% to 100%.

The National Longitudinal Survey of Freshmen (NLSF) follows a cohort of first-time freshmen at selective colleges and universities through their college careers. Equal numbers of whites, blacks, Hispanics, and Asians were sampled at each of the 28 participating schools. Among other uses, the data were collected with the testing of several competing theories of minority underperformance in college in mind. (Fall 1999)

Unit 3 / Page 5

minclose = β₀ + β₁·percmin + ε

Page 6:

Univariate Analysis

[Histograms: perceived closeness to minorities (0–30); perceived high school percentage of minority students (0–100); vertical axes in percent]

An irregular, somewhat uniform predictor distribution is generally acceptable as long as the outcome is more normally distributed, which it is, albeit with a heavy left shoulder and heavy right tail.

Unit 3 / Page 6

Page 7:

[Histograms (discrete option): perceived high school percentage of minority students; perceived closeness to minorities]

Univariate Analysis

When data are discrete, rounded, or otherwise non-continuous, I recommend the discrete option for histograms. Similarly, for scatterplots…

[Scatterplot (discrete): perceived closeness to minorities (0–30) vs. perceived high school percentage of minority students (0–100)]

Unit 3 / Page 7

Page 8:

Bivariate Analysis

[Scatterplot: perceived closeness to minorities (0–30) vs. perceived high school percentage of minority students (0–100)]

Unit 3 / Page 8© Andrew Ho, Harvard Graduate School of Education

Page 9:

[Scatterplot with fitted regression line: perceived closeness to minorities (0–30) vs. perceived high school percentage of minority students (0–100)]

Bivariate Analysis

β̂₀ = 14.18, β̂₁ = -.026

So, how can we think about statistical inference,from this sample, about its population?

Unit 3 / Page 9© Andrew Ho, Harvard Graduate School of Education

m̂inclose = 14.18 − .026·percmin

Page 10:

Thinking Statistically, Part 1: The Sampling Distribution

1. When you enter regress minclose percmin, what do we make of our estimated regression slope, its standard error, and its p-value?

2. Imagine a population of bivariate minclose on percmin data (near-infinite). We’re going to ASSUME that THIS (n = 3844) is our population.

3. Visualize the population regression line. We know our parameters, β₀ and β₁, and we can calculate each residual, εᵢ. (Unlike real life!)

4. Picture a single sample drawn from this population, and let’s say we’re limited due to our pilot design to a sample of size n = 50.

5. Estimate the sample regression line, obtaining β̂₀ and β̂₁.

6. Now imagine 10, 100, 1000 other samples (n = 50) from this population (n = 3844), each with their own β̂₁!

7. Appreciate how the sample estimate of β₁ that you happened to get is one of many possible estimates you might have had from other samples, and…

8. The “sampling variability” of these estimates is lower for larger sample sizes.

Unit 3 / Page 10

Page 11:

Thought experiment: Consider the NLSF participants our population

[Scatterplot with population regression line: perceived closeness to minorities (0–30) vs. perceived high school percentage of minority students (0–100)]

β₀ = 14.18, β₁ = -.026

Unit 3 / Page 11© Andrew Ho, Harvard Graduate School of Education

m̂inclose = 14.18 − .026·percmin

Page 12:

…and imagine taking repeated draws with sample size n=50

[Scatterplot of one random sample (n = 50) with fitted line: perceived closeness to minorities (5–25) vs. perceived high school percentage of minority students (0–100)]

β̂₀ = 14.48, β̂₁ = -.046

To Excel!

Unit 3 / Page 12© Andrew Ho, Harvard Graduate School of Education

m̂inclose = 14.48 − .046·percmin

This is our first draw with sample size n = 50.

Page 13:

Estimated regression lines from 10, 25 and 250 random samples (n=50)

Example estimated slopes: 0.052, -0.092, -0.024

Q: What would the distribution of these estimated slopes look like?

[Scatterplot: 10, 25, and 250 estimated regression lines (n = 50) over the scatter of perceived closeness to minorities vs. perceived high school percentage of minority students]

Unit 3 / Page 13

m̂inclose = 14.18 − .026·percmin

The Population and its Parameters

The Sampling Perspective:
– Draw a sample of size n from the population
– Obtain sample statistics β̂₀ and β̂₁
– Repeat again and again to infinity!

Page 14:

The Sampling Distribution: Distribution of β̂₁ across (let’s start with 5000) random samples, in this case, of sample size 50.

[Histogram: estimated slopes for sample sizes of 50, ranging from -.1 to .1]

Mean of this distribution: -.026. Standard deviation: .027.

This is the sampling distribution for the estimated slope, β̂₁, when the sample size is 50.

The standard deviation of the sampling distribution is called the standard error, in this case, of the estimated slope.

What does the sampling distribution of the slope look like for smaller samples, say, n=10?

Unit 3 / Page 14

m̂inclose = 14.18 − .026·percmin

The Population and its Parameters

Page 15:

Standard Errors and Sample Size

• The standard deviation of the sampling distribution (i.e., the standard error) when n=50 was .027.

• When n = 10, the standard error of the slope is .065.
• The standard error is fundamental to statistical inference: the variability of a statistic under sampling. It is lower for larger sample sizes.

[Histograms: estimated slopes for sample sizes of 50 vs. estimated slopes for sample sizes of 10]

Unit 3 / Page 15

Page 16:

Thinking Statistically, Part 1: The Sampling Distribution

1. When you enter regress minclose percmin, what do we make of our estimated regression slope, its standard error, and its p-value?

2. Imagine a population of bivariate minclose on percmin data (near-infinite). We’re going to ASSUME that THIS (n = 3844) is our population.

3. Visualize the population regression line. We know our parameters, β₀ and β₁, and we can calculate each residual, εᵢ. (Unlike real life!)

4. Picture a single sample drawn from this population, and let’s say we’re limited due to our pilot design to a sample of size n = 50.

5. Estimate the sample regression line, obtaining β̂₀ and β̂₁.

6. Now imagine 10, 100, 1000 other samples (n = 50) from this population (n = 3844), each with their own β̂₁!

7. Appreciate how the sample estimate of β₁ that you happened to get is one of many possible estimates you might have had from other samples, and…

8. The “sampling variability” of these estimates is lower for larger sample sizes.

Unit 3 / Page 16

Page 17:

Thinking Statistically, Part 2: The Hypothesis Test

9. Now that we understand that statistics will vary around their true parameters under sampling, and that their variance depends on the sample size, we proceed with a tricky logical reversal.

10. We assume that the true slope parameter is 0! There is no relationship between percmin and minclose! This is the null hypothesis: H₀: β₁ = 0.

11. We assume the sampling distribution has a similar shape and standard deviation (standard error), but that it’s centered on 0.

12. Then, if our actual sample statistic, β̂₁, is far enough away from 0, we say that β̂₁ is so improbable that H₀ cannot be true.

13. How far away? The standard error gives us a benchmark: if β̂₁ is greater than 2ish standard errors away from 0, then we reject H₀.

14. Rejecting H₀ suggests that β₁ ≠ 0. This is our alternative hypothesis, H₁. We have a “statistically significant” predictive relationship, because we have determined H₀ to be false.
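A minimal sketch of the decision rule in steps 10–14 (Python rather than Stata; the numbers are this unit’s thought experiment, with the .027 standard error treated as known):

```python
def reject_null(beta1_hat, se, cutoff=2.0):
    """Reject H0: beta1 = 0 when the estimate sits more than `cutoff`
    standard errors away from 0 (the "2ish" benchmark in step 13)."""
    t = beta1_hat / se
    return abs(t) > cutoff

# Sample slope -.046 with standard error .027: |t| is about 1.7,
# inside the benchmark, so we fail to reject H0.
print(reject_null(-0.046, 0.027))   # False
# A slope of -.08 with the same standard error gives |t| near 3: reject H0.
print(reject_null(-0.080, 0.027))   # True
```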

Unit 3 / Page 17© Andrew Ho, Harvard Graduate School of Education

Page 18:

The sampling distribution of β̂₁ with a known parameter

[Histogram: estimated slopes for sample sizes of 50 (-.1 to .1)]

Mean of this distribution: -.026. Standard deviation: .027.

m̂inclose = 14.18 − .026·percmin

This is the sampling distribution for the estimated slope, β̂₁, when the sample size is 50.

We can see here that, when the true slope is -.026, and we take a sample of size 50, a sample slope beyond -.1 or .05 is really, really unlikely!

We could imagine a benchmark of plus or minus about 2 standard deviations (in this case, standard errors) beyond which we could say, this is quite unlikely (less than 5% of the time).

-.026 ± 2 × .027 = (-.08, .028)

-.1: Really, really unlikely!

.05: Really, really unlikely!

Quite unlikely

Let’s start by thinking about how unlikely a particular β̂₁ can be.

Unit 3 / Page 18

Page 19:

[Histogram: estimated slopes for sample sizes of 10 (-.4 to .4)]

The sampling distribution of β̂₁ with a known parameter

Mean of this distribution: -.026. Standard deviation: .065.

m̂inclose = 14.18 − .026·percmin

This is the sampling distribution for the estimated slope, β̂₁, when the sample size is 10.

We can see here that, when the true parameter is -.026, and we take a sample of size 10, a sample slope of -.1 or .05 is quite plausible.

We need a benchmark in terms of standard errors! Let’s stick with plus or minus about 2 standard errors, beyond which we could say, β̂₁ is quite unlikely (less than 5% of the time).

-.026 ± 2 × .065 = (-.156, .104)

-.1: Quite plausible. .05: Quite plausible.

Quite unlikely

Unit 3 / Page 19

Page 20:

What if we only have sample data? Keep imagining the population…

[Histogram: estimated slopes for sample sizes of 50 (-.1 to .1)]

Mean of this distribution: -.026. Standard deviation: .027.

m̂inclose = 14.18 − .026·percmin

This is the sampling distribution for the estimated slope, β̂₁, when the sample size is 50.

In this example, we *know* the sampling distribution of β̂₁, and we *know* it is centered on the population parameter of -.026.

In practice, we get one of these estimated slopes, say, -.046, but we have no idea what the population parameter is.

We could assume that β₁ is -.046 and then say, “and that’s why I got a slope of -.046!” But then we carry no burden of proof.


Unit 3 / Page 20

Page 21:

The Null Hypothesis, H₀

• We accept the burden of proof by assuming no finding, no relationship, no effect, no predictive utility, and thus no slope to the population regression line.

• And we use the likelihood of the data to convince us we’re wrong.
• In the case of slope, the null hypothesis is: H₀: β₁ = 0.
• That is, for a unit increment in X, we can’t say anything about Y in the population.
• In simple linear regression, this is equivalent to a population correlation of zero.
• Or, again equivalently, that X has accounted for no variance in Y: R² = 0 in the population.
• For statistical inference, then, we consider the sampling distribution when the null hypothesis is true (no finding): H₀: β₁ = 0.
• And, assuming H₀ is true, if we get a craaaaazy β̂₁, then we can reject H₀.

Unit 3 / Page 21

Page 22:

[Histograms: estimated slopes for sample sizes of 50; estimated slopes for n=50 under the null hypothesis]

The “shift”: The sampling distribution of β̂₁ under the null hypothesis


This is the sampling distribution for the estimated slope, β̂₁, when the sample size is 50 under H₀.

How do we know the shape of this distribution? How do we know the standard error? In this case, we cheated, because we had the population to sample from! It’s just this standard deviation, which we said was .027.

σβ̂₁ = .027

In the real world, we only have our sample of 50, and we have to *estimate* this standard deviation of the sampling distribution of the slope (the estimated standard error of the slope).

Unit 3 / Page 22© Andrew Ho, Harvard Graduate School of Education

Page 23:

If one single sample is all we have, how do we estimate se(β̂₁)?

[Scatterplot of the single sample (n = 50): perceived closeness to minorities (5–25) vs. perceived high school percentage of minority students (0–100)]

se(β̂₁), or σ̂β̂₁:

• The *estimated* standard deviation of the sampling distribution of the slope statistic.

• The estimated standard error of the slope.

• When we know our population, we can get our sampling distribution for smaller ns by sampling n observations over and over, calculating the slope each time, and looking at the standard deviation of the distribution of slopes. Without this…

σβ̂₁ = .027; σ̂β̂₁ = .024

se(β̂₁) = RMSE / √Σ(Xᵢ − X̄)²

A measure of vertical variation about the regression line. More vertical variation means more slope variation.

A measure of horizontal variation about the mean. More horizontal variation means *less* slope variation.
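The formula above can be sketched in code. This Python sketch uses simulated data, not the NLSF sample itself; the true slope and residual spread are set to echo the slides’ numbers.

```python
import math
import random

random.seed(1)

# Simulated sample of size 50 from a line resembling the slides' example.
n = 50
x = [random.uniform(0, 100) for _ in range(n)]
y = [14.18 - 0.026 * xi + random.gauss(0, 5.3) for xi in x]

# Fit the least-squares line.
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)            # horizontal variation
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# RMSE: root mean squared residual on n - 2 degrees of freedom
# (the vertical variation about the fitted line).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
rmse = math.sqrt(sse / (n - 2))

# The slide's formula: se(beta1_hat) = RMSE / sqrt(sum((x_i - xbar)^2)).
se_b1 = rmse / math.sqrt(sxx)
print(f"estimated slope: {b1:.3f}, estimated se: {se_b1:.3f}")
```

Note how the numerator is the vertical variation and the denominator the horizontal variation, exactly as the two annotations above describe.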

Unit 3 / Page 23© Andrew Ho, Harvard Graduate School of Education

Page 24:

[Histograms: estimated slopes for sample sizes of 50; estimated slopes for n=50 under the null hypothesis]

The shape of the sampling distribution of β̂₁ under the null hypothesis


So now we don’t even need the population to get our standard error of slope… we can estimate it!

And then we can say, if our sample slope, β̂₁, is greater than 2 standard errors from 0, then it’s quite unlikely that slope could have been drawn under the null… so the null is false, and we have a finding! We call the number of standard errors from 0 our t-statistic.

How unlikely is that slope? Well, the probability of being outside two standard deviations is less than 5%... if the distribution is normal. Is the distribution normal?

Unit 3 / Page 24© Andrew Ho, Harvard Graduate School of Education

Page 25:

The Student’s t-Distribution for the sampling distribution of β̂₁

William Sealy Gosset (aka, “Student”)

• If the regression model holds…

• And you have to estimate your standard errors from your sample:

• Then the sampling distribution of your slope is almost normal: t-distributed (a bit more leptokurtic, i.e., peaked with thicker tails), and approaches normal as sample size increases.

se(β̂₁) = RMSE / √Σ(Xᵢ − X̄)²

To Excel!
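One way to see the “thicker tails” claim directly is to evaluate the two densities. This Python sketch (in place of the Excel demonstration) computes the t density from its formula; the degrees-of-freedom values are illustrative:

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom.
    lgamma (log-gamma) avoids overflow for large df."""
    logc = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) \
        - 0.5 * math.log(df * math.pi)
    return math.exp(logc) * (1 + x * x / df) ** (-(df + 1) / 2)

# Three units out in the tail, t with few degrees of freedom carries far
# more probability than the normal, and the gap closes as df grows:
for df in (5, 50, 500):
    print(df, t_pdf(3, df) / normal_pdf(3))
```

With 5 degrees of freedom the tail density at 3 is several times the normal’s; by 500 degrees of freedom the ratio is close to 1, which is the “approaches normal as sample size increases” claim on the slide.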

Unit 3 / Page 25

Page 26:

• The t statistic is simply the number of standard errors that your sample slope, β̂₁, is away from 0 (your null hypothesis).

• A large (positive or negative) t means that your β̂₁ is that many standard errors away from your null hypothesis. At a certain magnitude, the probability of sampling β̂₁ under H₀ becomes very low.

• If you had to remember one benchmark, remember 2. If the magnitude of t is greater than 2 (and your sample size is greater than 60 or so), then we can reject the null hypothesis, and your predictive/associative relationship is statistically significant.

• If you can remember 3 benchmarks (for large samples):
– 1.65 (A low standard. The probability of such a |t| under H₀ is less than 10%)
– 1.96 (The standard. The probability of such a |t| under H₀ is less than 5%)
– 2.58 (A high standard. The probability of such a |t| under H₀ is less than 1%)
– The smaller the sample, the more you must raise these cutoffs.

t = β̂₁ / se(β̂₁)
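The three benchmarks can be checked against the normal approximation with nothing but the error function. This is a large-sample sketch; exact t cutoffs are slightly larger in small samples, as the last bullet warns.

```python
import math

def two_sided_normal_p(t):
    """Two-sided tail probability of the standard normal beyond |t|.
    For large samples the t-distribution is close to normal, so this
    approximates the p-value attached to a t-statistic."""
    phi = 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))  # normal CDF at |t|
    return 2 * (1 - phi)

for cutoff in (1.65, 1.96, 2.58):
    print(f"|t| = {cutoff}: p ≈ {two_sided_normal_p(cutoff):.3f}")
```

The printed values land at roughly .10, .05, and .01, matching the low, standard, and high benchmarks.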

Straight interpretation of the t-statistic


Unit 3 / Page 26© Andrew Ho, Harvard Graduate School of Education

Page 27:

[Histograms: estimated slopes for sample sizes of 50; estimated slopes for n=50 under the null hypothesis]

Guess and interpret the t!


Recall that we estimated a standard error of our slope:

And our actual sample slope, β̂₁, was -.046.

We know that the sampling distribution of β̂₁ follows the t-distribution (on 48 degrees of freedom). We assume it is centered on 0 under H₀, and we assume its standard error is .024.

se(β̂₁) = .024

How many standard errors away is β̂₁?

t = β̂₁ / se(β̂₁), assuming H₀

Unit 3 / Page 27© Andrew Ho, Harvard Graduate School of Education

Page 28:

The Regression Output: Std. Err., t, and your p-value.

• We enter regress minclose percmin into Stata for our single random sample of size 50.

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    3.70
       Model |  102.920763     1  102.920763           Prob > F      =  0.0603
    Residual |  1334.78424    48  27.8080049           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0522
       Total |    1437.705    49  29.3409184           Root MSE      =  5.2733

    minclose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     percmin |  -.0455884   .0236967    -1.92   0.060    -.0932337     .002057
       _cons |   14.47845   1.257742    11.51   0.000     11.94959    17.00731

This is our estimatedregression coefficient.

This is our estimatedstandard error.

This is how many standard errors our coefficient is from our null hypothesis.

This is the probability of sampling a slope that far away or farther under H₀.

If we want to reject the null hypothesis and report a finding, we want bigger coefficients, smaller standard errors, bigger t-statistics, and smaller p-values.
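The t and confidence-interval columns can be reproduced from the Coef. and Std. Err. columns alone. A sketch in Python; the t(48) critical value 2.0106 is taken as a known constant rather than computed.

```python
# Coef. and Std. Err. for percmin, from the Stata output above.
coef, se = -0.0455884, 0.0236967

# t column: how many standard errors the coefficient sits from 0.
t = coef / se
print(round(t, 2))   # -1.92, matching the t column

# 95% CI column: coefficient plus/minus the two-sided .05 cutoff
# for t on 48 degrees of freedom (about 2.0106).
t_crit = 2.0106
lo, hi = coef - t_crit * se, coef + t_crit * se
print(round(lo, 4), round(hi, 4))   # about -.0932 and .0021, matching the CI
```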

Unit 3 / Page 28

Page 29:

p-value cutoffs and interpretations

• p-values can be interpreted as the probability of sampling a slope of your magnitude (β̂₁) or greater, given a) that the null hypothesis is true and b) that the assumptions of the regression model hold.

• If your p-value is sufficiently small, we reject H₀ outright.
• The cutoffs for p-values are judgmental, but .05 is most common.
• Over many independent experiments with this criterion, we will reject H₀ when H₀ is true about 1 in 20 times (5%): these are false findings, false alarms, where we are reporting and interpreting a relationship that is not actually there.

• The .05 cutoff dates back to Sir R. A. Fisher (1925) and arose from the convenience that the associated cutoff for t was around ±2 (for sample sizes of around 60).

• Calling the results “significant” or “statistically significant” is permissible but also lazy and misleading. Of course, I use them all the time.

• To a technical audience, feel free to use the terms. The audience should know not to interpret “significance” substantively or overinterpret the term.

• To a nontechnical audience, try to avoid using the terms unless you can also describe the magnitude of the effect in a meaningful context.

• Statistical significance is necessary but not sufficient for communicating a result.


http://www.jerrydallal.com/LHSP/p05.htm


Unit 3 / Page 29

Page 30:

Visualizing p-values and t cutoffs

This is the sampling distribution of your slope under the null hypothesis, if the model holds.

|t| > 1.65↑ for p < .10

|t| > 1.96↑ for p < .05

|t| > 2.58↑ for p < .01

±1t, common under H0

If t is beyond the desired cutoff, the probability of such a sampled slope is so low that we reject the null hypothesis and accept the alternative.

Centered on H₀: β₁ = 0

Unit 3 / Page 30

Page 31:

The Unsatisfying Alternative Hypothesis

• The reasoning behind hypothesis testing can be quite counterintuitive.

• Our null hypothesis is simply H₀: β₁ = 0.
• If we sample a slope that seems highly unlikely under the null hypothesis, then we reject H₀ and accept the alternative hypothesis.

• Strictly speaking, the alternative is what the null is not: Ha, or H₁: β₁ ≠ 0.

• The null is no finding, no relationship, no effect, no predictive utility, and thus no slope to the population regression line.

• The alternative is some finding, some relationship, some effect, some predictive utility, and some slope to the population regression line.

• Technically, we only know that our population slope is nonzero.

H₀: β₁ = 0

Ha, or H₁: β₁ ≠ 0

Unit 3 / Page 31

Page 32:

Common and frustrating misconceptions about hypothesis testing

• Rejecting the null hypothesis does not mean accepting that the population slope is your sample slope, just that the population slope is nonzero.

• Rejecting the null hypothesis technically doesn’t even tell us whether the population slope is positive or negative!

• The p-value is not the probability that the null hypothesis is true, it is the probability that you sample a slope with that magnitude or greater, given that the null hypothesis is true.

• We don’t reject the null hypothesis because it has low probability; we reject the null hypothesis because the sampled slope had such low probability.
– If you obtain a sample slope with a large t-statistic, either that is a very unlikely t-statistic, or the null hypothesis is false.
• If you ever slip up with this, just say “Sorry, I’m in a Bayesian mood.”

Unit 3 / Page 32

Page 33:

Thinking Statistically, Part 2: The Hypothesis Test

9. Now that we understand that statistics will vary around their true parameters under sampling, and that their variance depends on the sample size, we proceed with a tricky logical reversal.

10. We assume that the true slope parameter is 0! There is no relationship between percmin and minclose! This is the null hypothesis: H₀: β₁ = 0.

11. We assume the sampling distribution has a similar shape and standard deviation (standard error), but that it’s centered on 0.

12. Then, if our actual sample statistic, β̂₁, is far enough away from 0, we say that β̂₁ is so improbable that H₀ cannot be true.

13. How far away? The standard error gives us a benchmark: if β̂₁ is greater than 2ish standard errors away from 0, then we reject H₀.

14. Rejecting H₀ suggests that β₁ ≠ 0. This is our alternative hypothesis, H₁. We have a “statistically significant” predictive relationship, because we have determined H₀ to be false.

Unit 3 / Page 33© Andrew Ho, Harvard Graduate School of Education

Page 34:

Confidence intervals for the slope parameter

• In practice, we express uncertainty around our sample statistic with “95% confidence intervals.” Also 90% and 99% CIs.
• 95% confidence intervals are simple to calculate: plus and minus about 2 standard errors from your sample statistic, β̂₁.

• Intuitively, these intervals provide a plausible range where the values of the true slope, β₁, may exist: “There is a 95% chance that the true slope is between -.093 and .002.”

• As compelling as this is, this statement is completely incorrect, and we cannot in good conscience let you write or say something that is completely incorrect.

• We can, however, let you say something that is partially incorrect!• “…a plausible range for the slope parameter, .” We will accept this,

grudgingly.

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    3.70
       Model |  102.920763     1  102.920763           Prob > F      =  0.0603
    Residual |  1334.78424    48  27.8080049           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0522
       Total |    1437.705    49  29.3409184           Root MSE      =  5.2733

------------------------------------------------------------------------------
    minclose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     percmin |  -.0455884   .0236967    -1.92   0.060    -.0932337     .002057
       _cons |   14.47845   1.257742    11.51   0.000     11.94959    17.00731
------------------------------------------------------------------------------

Unit 3 / Page 34© Andrew Ho, Harvard Graduate School of Education

Page 35:

For the record, the correct “frequentist” interpretation of CIs.
• Strictly speaking, the CI that is centered on your slope statistic, (−.093, .002), cannot be used to locate β₁ at all.
• Instead, one must imagine taking an infinite number of samples and calculating an infinite number of 95% CIs.
• On average, about 95% (19 of 20) of these CIs will encompass the true parameter, β₁.
• As for your particular CI, because β₁ exists and has a fixed (albeit unknown) value, it is either inside your CI or not. It is not possible for you to be 95% sure about it being inside your CI, in the same way that you can’t be 95% sure about H₀. You either reject H₀ or you do not, and β₁ is either in the CI or not.
• So how do you interpret (−.093, .002)? It’s one of an infinite number of 95% CIs calculated from an infinite number of samples of your sample size, and 95% of those CIs contain the true parameter, β₁.
• This is all rather awkward, but it is a signal of whether you’ve been well trained in statistics.
• We will adopt a “Bayesian” interpretation of CIs, though really doing so requires heavy statistical lifting; hence, “the lazy Bayesian interpretation”: it’s a plausible range for the slope parameter.

Unit 3 / Page 35 © Andrew Ho, Harvard Graduate School of Education

Page 36:

The Frequentist Model of Confidence Intervals

[Diagram: a sampling distribution of β̂₁ centered on the true slope β₁, with tick marks from −4 to +4 standard errors, and several sample 95% CIs drawn beneath it.]

For every 20 intervals we construct, we estimate that an average of 1 won’t cover the true value of β₁.

Unfortunately, when we compute our 95% CI, we don’t know whether it’s one of the lucky 95% that do cover the true value or the unfortunate 5% that don’t. Thus, in a strict frequentist framework, any single CI provides little interpretive value beyond the hypothesis test.

As before, imagine a sampling distribution centered on an unknown parameter, β₁ (though not necessarily zero).


Unit 3 / Page 36© Andrew Ho, Harvard Graduate School of Education

It is safer to be ambiguous than precisely incorrect: “a plausible range for the true (population) slope,” or “a depiction of variability we might expect under sampling,” or, “as a lazy Bayesian might say, it’s likely that the true slope is in this interval.” Just don’t say 95%. That’s too precise, and too incorrect.


Your CI is one of many CIs. Its edges are not fixed but random.
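The frequentist story above can also be checked by simulation. In this hedged Python sketch (the true slope, intercept, and noise SD are assumptions chosen to resemble the running example, not course data), about 19 of every 20 intervals cover the truth:

```python
# Illustrative sketch: coverage of the 95% CI procedure. We know the true
# slope because we chose it; each sample's CI either covers it or not, and
# the *procedure* covers it about 95% of the time.
import math
import random

random.seed(7)

TRUE_B0, TRUE_B1, SIGMA = 14.5, -0.05, 5.3   # assumed population values
n, reps, t_crit = 50, 2000, 2.0106           # t_crit for 48 df

x = [random.uniform(0, 100) for _ in range(n)]
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

covered = 0
for _ in range(reps):
    y = [TRUE_B0 + TRUE_B1 * xi + random.gauss(0, SIGMA) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2)) / math.sqrt(sxx)
    if b1 - t_crit * se_b1 <= TRUE_B1 <= b1 + t_crit * se_b1:
        covered += 1
print(covered / reps)  # about 0.95: roughly 19 of 20 CIs cover the true slope
```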

Page 37:

© Andrew Ho, Harvard Graduate School of Education

Calculating Confidence Intervals

We can re-center our sampling distribution on our sample statistic, and we add an interval of plus and minus a certain number of standard errors given by a “critical value,” t_crit:

t_crit ≈ 1.65 for a 90% CI
t_crit ≈ 1.96 for a 95% CI
t_crit ≈ 2.58 for a 99% CI

Note that this critical t value, tcrit, is NOT the same as your reported t statistic. Your t statistic is, as before, the number of standard errors your sample slope is away from the null hypothesis. The critical t value, tcrit, is the cutoff number of standard errors beyond which a reported t statistic leads us to reject the null hypothesis.

CI: β̂₁ ± t_crit · se(β̂₁)

Unit 3 / Page 37
Centered on β̂₁.

For a 95% CI and for moderate sample sizes (50) and above, tcrit is around 2ish.

Important note: Because this is not centered on our null hypothesis, technically, it is no longer a sampling distribution. It identifies a region wherein we might plausibly expect to find the population parameter. If we wanted a defensible “credible interval” or “posterior distribution” for our β₁, we’d have to get Bayesian.


Page 38:

© Andrew Ho, Harvard Graduate School of Education

Calculating Confidence Intervals

• The regress command automatically outputs a 95% confidence interval for our slope parameter.

• Note that it is centered on our estimate, -.046, and is plus or minus approximately two standard errors.

• It contains 0, so we cannot reject the null at α = .05: no finding; no relationship.

• If we were to calculate this ourselves, we could obtain t_crit with the following command in Stata: display invttail(n-2,.025), which returns 2.011 (about 2, as expected).

• This command for the inverse t-distribution gives us the t cutoff for a particular sample size and percentage. Here, we would enter 48 for n-2.

• We use .025 for a 95% CI, because .025 is the proportion in one tail of the distribution. We’d use .05 for a 90% CI and .005 for a 99% CI.

Unit 3 / Page 38

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    3.70
       Model |  102.920763     1  102.920763           Prob > F      =  0.0603
    Residual |  1334.78424    48  27.8080049           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0522
       Total |    1437.705    49  29.3409184           Root MSE      =  5.2733

------------------------------------------------------------------------------
    minclose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     percmin |  -.0455884   .0236967    -1.92   0.060    -.0932337     .002057
       _cons |   14.47845   1.257742    11.51   0.000     11.94959    17.00731
------------------------------------------------------------------------------
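As a quick arithmetic check of the Stata output above (Python is used here for illustration; t_crit = 2.010635 is the value Stata’s invttail(48, .025) reports, about 2.011), the reported 95% CI is just the coefficient plus and minus t_crit standard errors:

```python
# Reproduce the reported 95% CI for percmin from the coefficient and SE.
b1, se_b1 = -0.0455884, 0.0236967   # percmin coefficient and std. err.
t_crit = 2.010635                    # invttail(48, .025), about 2.011

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(lower, upper)  # close to the reported (-.0932337, .002057)
```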

Page 39:

[Scatterplot: perceived closeness to minorities (0–25) vs. perceived high school percentage of minority students (0–100), with the fitted regression line.]

Four standard deviations #1 – The RMSE, the conditional standard deviation

Unit 3 / Page 39© Andrew Ho, Harvard Graduate School of Education

The conditional standard deviation of individual points about your regression line. The regression model assumes this variance is equal over X: homoscedasticity.

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    3.70
       Model |  102.920763     1  102.920763           Prob > F      =  0.0603
    Residual |  1334.78424    48  27.8080049           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0522
       Total |    1437.705    49  29.3409184           Root MSE      =  5.2733

------------------------------------------------------------------------------
    minclose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     percmin |  -.0455884   .0236967    -1.92   0.060    -.0932337     .002057
       _cons |   14.47845   1.257742    11.51   0.000     11.94959    17.00731
------------------------------------------------------------------------------

Page 40:

Four standard deviations #2 – The standard error of slope:

Unit 3 / Page 40© Andrew Ho, Harvard Graduate School of Education

The estimated standard deviation of the sampling distribution of slopes, under the null hypothesis of no slope.

This negative slope (-.076) goes here

This positive slope (.067) goes here

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    3.70
       Model |  102.920763     1  102.920763           Prob > F      =  0.0603
    Residual |  1334.78424    48  27.8080049           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0522
       Total |    1437.705    49  29.3409184           Root MSE      =  5.2733

------------------------------------------------------------------------------
    minclose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     percmin |  -.0455884   .0236967    -1.92   0.060    -.0932337     .002057
       _cons |   14.47845   1.257742    11.51   0.000     11.94959    17.00731
------------------------------------------------------------------------------

Here, se(β̂₁) = .0237. The estimated slope plus and minus 2ish standard errors gives us the confidence interval for the slope.

Page 41:

Four standard deviations #3: The standard error of the predicted mean

Unit 3 / Page 41© Andrew Ho, Harvard Graduate School of Education

Under the null hypothesis, the estimated standard deviation of the sampling distribution of the predicted mean at a given X₀; that is, the spread of the regression lines where they intersect a given value of X.

[Diagram annotated at X = 20, X̄ = 79.44, and X = 100.]

The standard error of the predicted mean is smallest at the mean of X and grows with horizontal distance from the mean.

In Stata, we can use the predict command--after we run regress--to store all sorts of useful statistics, including the standard errors of predicted means at each value in the data: predict csemean, stdp stores these standard errors in a new variable, csemean.

Page 42:

Four standard deviations #4: The standard error of the individual forecast

Unit 3 / Page 42© Andrew Ho, Harvard Graduate School of Education

[Diagram annotated at X = 20, X̄ = 79.44, and X = 100.]

The standard error of the individual forecast also “bows” outward from the average, but the bowing is less pronounced than for the standard error of the predicted mean.

In Stata, we can store these conditional standard errors of new individual forecasts with the command, predict csefcast, stdf , which stores these standard errors in the variable, csefcast.

Under the null hypothesis, the estimated standard deviation of the sampling distribution of a new individual forecast at a given X₀. That is, the spread of the regression lines PLUS the spread of the individual datapoints around these lines where they intersect a given value of X.

Page 43:

© Andrew Ho, Harvard Graduate School of Education

Four Standard Deviations: Algebraic and Interpretive Contrasts

Standard Deviation | Algebra | Interpretation | Use

1. RMSE (the conditional SD; “standard error of estimate”)
   Algebra: RMSE = √[ Σ(Yᵢ − Ŷᵢ)² / (n − 2) ]
   Interpretation: An average vertical deviation of individual points from the best-fit line.
   Use: Description and evaluation of the performance of the prediction equation.

2. Standard error of the slope
   Algebra: se(β̂₁) = RMSE / √[ Σ(Xᵢ − X̄)² ]
   Interpretation: Variability of slope estimates under the null hypothesis.
   Use: Hypothesis testing and CI estimation.

3. Standard error of the predicted mean (of the regression line)
   Algebra: se(Ŷ|X₀) = RMSE · √[ 1/n + (X₀ − X̄)² / Σ(Xᵢ − X̄)² ]
   Interpretation: Variability of the regression line at a particular X₀, under the null hypothesis.
   Use: Prediction error for averages at a particular level of X. Graphical displays.

4. Standard error of the individual forecast (of a new prediction)
   Algebra: se(Ŷnew|X₀) = RMSE · √[ 1 + 1/n + (X₀ − X̄)² / Σ(Xᵢ − X̄)² ]
   Interpretation: Variability of individual Y values at a particular X₀, under the null hypothesis.
   Use: Prediction error for individuals at a particular level of X. Graphical displays.

Unit 3 / Page 43
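The four standard deviations can be sketched with made-up data (a hypothetical Python illustration; the course dataset is not reproduced here). Note the identity se_forecast² = RMSE² + se_mean² at any X₀, which is why prediction intervals can never be tighter than the RMSE:

```python
# Illustrative sketch with synthetic data: compute all four standard
# deviations from the formulas in the table and check how they relate.
import math
import random

random.seed(3)
n = 50
x = [random.uniform(0, 100) for _ in range(n)]
y = [14.5 - 0.05 * xi + random.gauss(0, 5.3) for xi in x]   # assumed model

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

rmse = math.sqrt(rss / (n - 2))          # 1. conditional SD (RMSE)
se_slope = rmse / math.sqrt(sxx)         # 2. standard error of the slope

def se_mean(x0):                         # 3. SE of the predicted mean
    return rmse * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)

def se_forecast(x0):                     # 4. SE of the individual forecast
    return rmse * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)

# The mean SE "bows" outward from xbar; the forecast SE is always larger.
assert se_mean(0) > se_mean(xbar)
assert math.isclose(se_forecast(20) ** 2, rmse ** 2 + se_mean(20) ** 2)
print(rmse, se_slope, se_mean(20), se_forecast(20))
```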

Page 44:

© Andrew Ho, Harvard Graduate School of Education

Confidence Intervals for the Predicted Mean

• Recall that the confidence interval for the slope was plus and minus 2ish standard errors from the estimated slope:

• Similarly, for the confidence interval for the predicted mean, we center on the prediction given a particular X₀, and the interval is plus and minus 2ish standard errors from there:

• In Stata, we can get the confidence intervals for the predicted mean for each X value we have in the data.

Unit 3 / Page 44

CI: β̂₁ ± t_crit · se(β̂₁)

CI: Ŷ|X₀ ± t_crit · se(Ŷ|X₀)

/* run the initial regression */
regress minclose percmin
/* store predicted values for each observation to a new variable, yhat */
predict yhat
/* store conditional standard errors for the conditional mean... */
predict csemean, stdp
/* generate lower bound of the 95% confidence interval for the predicted mean */
gen cilbmean = yhat-invttail(48,.025)*csemean
/* generate upper bound of the 95% confidence interval for the predicted mean */
gen ciubmean = yhat+invttail(48,.025)*csemean
/* list what you have and see if it makes sense to you */
list, clean

Page 45:

© Andrew Ho, Harvard Graduate School of Education

Confidence Intervals for the Predicted Mean

Unit 3 / Page 45

[Three views of the same scatterplot: perceived closeness to minorities (0–25) vs. perceived high school percentage of minority students (0–100), with the fitted line and the 95% confidence interval for the predicted mean.]

graph twoway (lfitci minclose percmin) (scatter minclose percmin), legend(off) ytitle("Perceived closeness to minorities")

It should feel odd to you that we’re centering on the sample regression line, not the null hypothesis. It is the same fuzzy reasoning behind centering the slope CI on β̂₁: identifying a plausible range wherein the true line might lie.

Page 46:

© Andrew Ho, Harvard Graduate School of Education

Prediction interval for the individual forecast
• We know that this interval will be larger. By taking into account the error due to the regression line *and* the variability in individuals, these standard errors are typically quite sizable, easily encompassing the vast majority of points in your sample.
• With some temporary notation for the estimated score of a new individual, Ŷnew:
  CI: Ŷnew ± t_crit · se(Ŷnew|X₀)

Unit 3 / Page 46

/* store conditional standard errors for the individual forecast... */
predict csefcast, stdf
/* generate lower bound of the 95% confidence interval for the predicted individual */
gen cilbfcast = yhat-invttail(48,.025)*csefcast
/* generate upper bound of the 95% confidence interval for the predicted individual */
gen ciubfcast = yhat+invttail(48,.025)*csefcast
/* list what you have and see if it makes sense to you */
list, clean

/* Overlay confidence and prediction intervals */
graph twoway (lfitci minclose percmin, stdf) (lfitci minclose percmin, acolor(gray)) (scatter minclose percmin, mcolor(black)), legend(off) ytitle("Perceived closeness to minorities")

Page 47:

© Andrew Ho, Harvard Graduate School of Education

Prediction interval for the individual forecast

Unit 3 / Page 47

[Four views of the same scatterplot: perceived closeness to minorities (0–25) vs. perceived high school percentage of minority students (0–100), with the fitted line and the wider 95% prediction interval for the individual forecast.]

graph twoway (lfitci minclose percmin, stdf) (scatter minclose percmin), legend(off) ytitle("Perceived closeness to minorities")

The interval is much wider, and much less “bowing” is evident due to the strong relative contribution of the MSE to the error estimate.

Page 48:

Three Confidence Intervals

Confidence Interval | Parameter | Lazy Bayesian | Frequentist

1. Confidence interval for the slope
   Parameter: the slope, β₁.
   Lazy Bayesian: A plausible range for the slope parameter.
   Frequentist: 95% of CIs like this one will contain the slope parameter.

2. Confidence interval for the predicted mean
   Parameter: the predicted mean given X₀.
   Lazy Bayesian: A plausible range for the population predicted mean (the “true” regression line) at a given X₀.
   Frequentist: 95% of CIs like this one will contain the population regression line.

3. Confidence interval for the individual forecast (prediction interval)
   Parameter: a “new” forecasted observation given X₀.
   Lazy Bayesian: A plausible range for a “new” individual value at a given X₀.
   Frequentist: 95% of CIs like this one will contain the “new” observation.

© Andrew Ho, Harvard Graduate School of Education Unit 3 / Page 48

[Histogram: estimated slopes for sample sizes of 50; horizontal axis −.1 to .1, vertical axis percent.]

Page 49:

© Andrew Ho, Harvard Graduate School of Education

As your sample size increases…

Standard Deviation | Algebra | As n increases… | So what?

1. RMSE (the conditional SD; “standard error of estimate”)
   Algebra: RMSE = √[ Σ(Yᵢ − Ŷᵢ)² / (n − 2) ]
   As n increases: RMSE will approach the population standard deviation of residuals.
   So what? Well, it’s good to know how wrong your predictions can be in the population.

2. Standard error of the slope
   Algebra: se(β̂₁) = RMSE / √[ Σ(Xᵢ − X̄)² ]
   As n increases: Sample slopes will vary less and less. The standard error of the slope goes to 0.
   So what? Always good to know the actual slope in the population.

3. Standard error of the predicted mean (of the regression line)
   Algebra: se(Ŷ|X₀) = RMSE · √[ 1/n + (X₀ − X̄)² / Σ(Xᵢ − X̄)² ]
   As n increases: The regression line will vary less and less under sampling. This error goes to 0.
   So what? Can make predicted-mean CIs very tight even when correlations are low.

4. Standard error of the individual forecast (of a new prediction)
   Algebra: se(Ŷnew|X₀) = RMSE · √[ 1 + 1/n + (X₀ − X̄)² / Σ(Xᵢ − X̄)² ]
   As n increases: Approaches the RMSE.
   So what? Can’t make prediction intervals tighter than 1.96 times the RMSE.

Unit 3 / Page 49

Page 50:

Use and abuse of confidence intervals

© Andrew Ho, Harvard Graduate School of Education Unit 3 / Page 50

http://www.sanger.ac.uk/genetics/CGP/docs/nature431525b_fs.pdf http://www.nature.com/news/2004/040929/full/news040927-9.html

Apart from the linear extrapolation, which is wholly atheoretical, how should we interpret the CIs? For the mean or for the individual forecast? And what about where they cross?

The “overlapping CI test” is conservative. If they do not overlap, then your difference is statistically significant. If they do overlap, the difference could still be statistically significant!
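A small worked example (hypothetical numbers, Python for illustration) of the point above: two 95% CIs can overlap while the difference between the estimates is still statistically significant, because the proper test uses the standard error of the difference.

```python
# Hypothetical estimates: CIs overlap, yet the difference is significant.
import math

m1, se1 = 0.0, 1.0
m2, se2 = 3.4, 1.0

ci1 = (m1 - 1.96 * se1, m1 + 1.96 * se1)   # (-1.96, 1.96)
ci2 = (m2 - 1.96 * se2, m2 + 1.96 * se2)   # (1.44, 5.36)
overlap = ci1[1] > ci2[0]                  # True: the intervals overlap

# The correct test divides the difference by the SE of the difference:
z = (m2 - m1) / math.sqrt(se1 ** 2 + se2 ** 2)   # 3.4 / sqrt(2), about 2.40
significant = z > 1.96                           # True: still significant
print(overlap, significant)
```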

Page 51:

© Andrew Ho, Harvard Graduate School of Education

Confidence Intervals and the Margin of Error

Unit 3 / Page 51

In polling, the confidence interval of a percentage can be estimated as a function of sample size alone. This figure shows that, in general, the best way to narrow the confidence interval (decrease the margin of error) is to increase your sample size. In electoral polling, we should be careful not to write off differences within the margin of error: not only is the overlap test imperfectly aligned to the relevant hypothesis test, but a lead indicates a probability of leading, and that is estimable, relevant, and interpretable.
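The polling approximation can be sketched as follows (an illustrative Python snippet; for a proportion near 50%, the 95% margin of error is approximately 1.96 · √(0.25/n) ≈ 0.98/√n):

```python
# 95% margin of error for a sample proportion as a function of n alone.
import math

def margin_of_error(n, p=0.5):
    """95% margin of error for a sample proportion (worst case at p = .5)."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000, 2500):
    print(n, round(margin_of_error(n), 3))
# Quadrupling the sample size only halves the margin of error.
```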

From fivethirtyeight.com, two months before the 2008 election: A more Bayesian approach.

Page 52:

© Andrew Ho, Harvard Graduate School of Education

Power Analysis: Type I and Type II Error

• Type I Error: The False Alarm.
  – The probability of a false finding (α).
  – Given that the null hypothesis is true (no finding), this is the probability that you reject the null hypothesis.
  – The cutoff is typically α = .05. We concede that, when the null hypothesis is true, we’ll report a false finding about 1 in 20 times.
• Type II Error: The Missed Opportunity.
  – The probability of an overlooked result (β, confusing, I know).
  – Given that the null hypothesis is false (there is a finding), this is the probability that you retain the null hypothesis.
  – The probability of rejecting a false null hypothesis (finding a true finding) is 1 − β and goes by the name “power.”
  – The typical power cutoff for funding agencies is .80.

Unit 3 / Page 52

Page 53:

[Figure: overlapping null and alternative sampling distributions of the slope; horizontal axis −.15 to .1, vertical axis percentage.]

Balancing Type I and Type II Error

© Andrew Ho, Harvard Graduate School of Education Unit 3 / Page 53

H_A: the alternative sampling distribution, knowing we have a finding (n = 50).

H₀: the null sampling distribution, assuming we don’t have a finding.

This is two standard errors below the null hypothesis, below which we have a finding.

This is two standard errors above the null hypothesis, above which we have a finding.

What is our power? How do we get more?

Balancing Type I and Type II Error on No-Fly Lists, Medical Diagnostics, Spam Filters, Fire Alarms

http://wise.cgu.edu/powermod/power_applet.asp

Page 54:

© Andrew Ho, Harvard Graduate School of Education

Interpreting the Null Finding?

Unit 3 / Page 54

A sweeter take on sugar (2/3/94), by Judy Foreman, Globe Staff

Contrary to the belief of many parents, sugar does not make children hyperactive, grouchy, restless, destructive or unable to learn and remember things. This conclusion, based on a study published today in the New England Journal of Medicine, may prove hard to swallow for parents who are convinced that sugar turns their otherwise angelic offspring into holy terrors.

Researchers … studied two groups of children: 25 pre-schoolers with no psychiatric disorders and 23 school-age children described by their parents as sensitive to sugar. … [The] team created three diets for the children and their families, with each diet lasting three weeks and with none of the family members knowing whether a given diet was high in sugar (sucrose), low in sugar but high in aspartame, or low in sugar and high in saccharin, used as a placebo.

For the supposedly sugar-sensitive children, there were no significant differences on any of 39 behavioral and cognitive variables… For the preschoolers, only four of 31 variables measured were different on the various diets, and these showed no consistent pattern. In fact, the researchers said, "the few differences associated with the ingestion of sucrose were more consistent with a slight calming effect than with hyperactivity."

Precisely because the researchers did the study with such care, the findings might be expected to put to rest the notion that sugar causes bad behavior in children.

Page 55:

© Andrew Ho, Harvard Graduate School of Education

Power Analysis and Sample Size: Hungry for

Unit 3 / Page 55

Source: Light, R. J., Singer, J. D., & Willett, J. B. (1990) By Design: Planning Research On Higher Education. Cambridge, MA: Harvard University Press, p . 197


Fact # 1:To get more power, you need a larger sample size

Anticipated “effect size” (in this case, correlation)

Statistical power | Small = 0.10 | Medium = 0.30 | Large = 0.50
.70               |          616 |           67  |          23
.80               |          783 |           85  |          28
.90               |        1,047 |          113  |          37

Fact # 2:If the effect you’re looking for is small, you’ll need a larger sample size; if it’s larger, you can get away with a smaller sample size

Fact # 3:If you fail to find an effect with a small sample size, lack of statistical power may be just as reasonable an explanation as the non-existence of an effect (if not more).

Fact # 4:However, if you DO reject the null hypothesis (if you do find an effect), a small sample size is not a concern (as long as the regression model assumptions are appropriate). The whole purpose of hypothesis testing is to identify a “surprising” slope… given a particular sample size.
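The table’s sample sizes can be approximated analytically. This Python sketch uses the Fisher z approximation (an assumption on my part; the table above comes from Light, Singer & Willett) and recovers power near .80 for a “medium” correlation of .30 with n = 85:

```python
# Approximate power of a two-sided test of rho = 0 via Fisher's z transform.
import math

def power_for_correlation(r, n, alpha_z=1.96):
    """Approximate power for detecting correlation r with sample size n."""
    z_effect = math.atanh(r) * math.sqrt(n - 3)   # mean of the z statistic
    # Standard normal CDF via the error function:
    phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    # Probability the statistic lands beyond the upper critical value
    # (the lower-tail rejection probability is negligible here):
    return 1 - phi(alpha_z - z_effect)

print(round(power_for_correlation(0.30, 85), 2))  # near .80, as in the table
```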

Page 56:

© Andrew Ho, Harvard Graduate School of Education

[Scatterplot: perceived closeness to minorities (0–25) vs. perceived high school percentage of minority students (0–100).]

Connecting Units 2 and 3: Correlation vs. Slope

Unit 3 / Page 56

We could evaluate whether the slope is significantly different from 0.
Couldn’t we also evaluate whether the correlation is significantly different from 0?

Page 57:

© Andrew Ho, Harvard Graduate School of Education

Connecting Units 2 and 3: The Correlation Test

• What makes for a “statistically significant correlation”?
• The sample statistic is r; the population parameter is the Greek “r”: ρ, “rho.”
  – The statistic could be written ρ̂, but we usually write it simply as r, a relative emphasis on the correlation as a descriptive vs. inferential statistic (though it is both).
• The null hypothesis is H₀: ρ = 0, and the alternative hypothesis is H_A: ρ ≠ 0.
• The test statistic. Under the null hypothesis, the test statistic is
  t = r√(n − 2) / √(1 − r²)
• As r and n go up, t goes up. Intuitive?
• The correlation, r = −.268, and n = 50, leave us with t = −1.92.
• Sound familiar?

Unit 3 / Page 57
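As a check (Python used for illustration; r ≈ −.268 is recovered here from the regression output’s R-squared of .0716, with the slope’s negative sign), the correlation test statistic reproduces the slope’s t of −1.92:

```python
# t = r * sqrt(n - 2) / sqrt(1 - r^2), with r recovered from R-squared.
import math

n = 50
r = -math.sqrt(0.0716)                       # about -.268
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 2))  # -1.92, matching the slope's t statistic
```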

Page 58:

© Andrew Ho, Harvard Graduate School of Education

Correlation vs. simple linear regression

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    3.70
       Model |  102.920763     1  102.920763           Prob > F      =  0.0603
    Residual |  1334.78424    48  27.8080049           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0522
       Total |    1437.705    49  29.3409184           Root MSE      =  5.2733

------------------------------------------------------------------------------
    minclose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     percmin |  -.0455884   .0236967    -1.92   0.060    -.0932337     .002057
       _cons |   14.47845   1.257742    11.51   0.000     11.94959    17.00731
------------------------------------------------------------------------------

This is how many standard errors our coefficient is from our null hypothesis.

This is the probability of sampling a slope that far away or farther from 0, if the null hypothesis were true.

Unit 3 / Page 58

This is the observed correlation.

This is the probability of sampling a correlation that far away or farther from 0, if the null hypothesis were true.

The bivariate regression test of a nonzero slope is equivalent to the test of a nonzero correlation.

Page 59:

The “significance” of correlation

• Necessary but not sufficient for reporting and interpretation.

© Andrew Ho, Harvard Graduate School of Education Unit 3 / Page 59

Sample Correlation | Sample Size Req’d for p < .05 (*)
0.9   |      4
0.8   |      6
0.7   |      8
0.6   |     11
0.5   |     15
0.4   |     24
0.3   |     43
0.2   |     96
0.1   |    384
0.05  |  1,537
0.01  | 38,415

• A significant correlation, coefficient, relationship, finding, or result is not necessarily meaningful, impactful, or even substantively interesting.

• We may be very certain that a correlation or coefficient is not zero, but that doesn’t mean it’s large.

• Strike the phrase “highly significant” from your vocabulary.
• Learn to think in terms of magnitudes, standard errors, and confidence intervals.

http://mashable.com/2012/02/21/facebook-profiles-job-performance/

Page 60:

© Andrew Ho, Harvard Graduate School of Education

[Scatterplot: perceived closeness to minorities (0–25) vs. perceived high school percentage of minority students (0–100).]

Connecting Units 2 and 3: The Naming/Reification Fallacy

Unit 3 / Page 60

Students who report 0% minorities in their high school have a predicted closeness to minorities that is 4.56 points higher than students who report 100% minorities in their high school. Does this help resolve the contact/conflict hypotheses? What is the difference between self-reported closeness and actual closeness? What is the difference between self-reported percentages of minorities and actual diversity?

Sometimes the problem is less a “lurking Z” than a fundamental misunderstanding of what X and Y were to begin with.

Page 61:

© Andrew Ho, Harvard Graduate School of Education

Take-home points from Unit 3.

Unit 3 / Page 61

• Statistical hypothesis testing is a useful framework when used appropriately.
  – The repeated sampling of coefficients in a sampling distribution is central to appreciating statistical results.
  – With a single sample, we assume no finding and ask our data to convince us to reject that null hypothesis.
• Confidence intervals can be a useful supplement to hypothesis testing.
  – A plausible region for the population parameter (ambiguous).
  – The strict Bayesian interpretation (“95% likely the parameter is here”) is incorrect.
  – But a lazy Bayesian approach can add interpretive value.
• Appreciate what increasing sample size does to sampling distributions, CIs, and power.
• Be careful about interpreting null effects.
  – It is very difficult to “prove” the null hypothesis; with a small sample, failure to find an effect may be due to an underpowered study.
• Do not worship at the altar of p-values and asterisks.
  – There’s nothing sacred about p < 0.05 or any other p-value.
  – Always examine the magnitude of the estimated effects and their associated standard errors to evaluate your findings and place them in context.