statistics: unlocking the power of data lock 5 stat 101 dr. kari lock morgan confidence intervals:...

56
Statistics: Unlocking the Power of Data STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap distribution (3.3) 95% CI using standard error (3.3) Percentile method (3.4)

Upload: tylor-curvey

Post on 31-Mar-2015

242 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

STAT 101Dr. Kari Lock Morgan

Confidence Intervals: Bootstrap Distribution

SECTIONS 3.3, 3.4• Bootstrap distribution (3.3)• 95% CI using standard error (3.3)• Percentile method (3.4)

Page 2: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Confidence Intervals

Population Sample

Sample

Sample

SampleSampleSample

. . .

Calculate statistic for each sample

Sampling Distribution

Standard Error (SE): standard deviation of sampling distribution

Margin of Error (ME)(95% CI: ME = 2×SE)

Confidence Interval

statistic ± ME

Page 3: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Reality One small problem…

… WE ONLY HAVE ONE SAMPLE!!!!

• How do we know how much sample statistics vary, if we only have one sample?!?

BOOTSTRAP!

Page 4: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Sample: 52/100 orange

Where might the “true” p be?

ONE Reese’s Pieces Sample

ˆ 0.52p

Page 5: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

• Imagine the “population” is many, many copies of the original sample

• (What do you have to assume?)

“Population”

Page 6: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Reese’s Pieces “Population”

Sample repeatedly from this “population”

Page 7: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

• To simulate a sampling distribution, we can just take repeated random samples from this “population” made up of many copies of the sample

• In practice, we can’t actually make infinite copies of the sample…

• … but we can do this by sampling with replacement from the sample we have (each unit can be selected more than once)

Sampling with Replacement

Page 8: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Suppose we have a random sample of 6 people:

Page 9: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Original Sample

A simulated “population” to sample from

Page 10: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Bootstrap Sample: Sample with replacement from the original sample, using the same sample size.

Original Sample

Bootstrap Sample

Page 11: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

• How would you take a bootstrap sample from your sample of Reese’s Pieces?

Reese’s Pieces

Page 12: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Reese’s Pieces “Population”

Page 13: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Your original sample has data values

18, 19, 19, 20, 21

Is the following a possible bootstrap sample?

18, 19, 20, 21, 22

a) Yesb) No

Bootstrap Sample

Page 14: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Your original sample has data values

18, 19, 19, 20, 21

Is the following a possible bootstrap sample?

18, 19, 20, 21

a) Yesb) No

Bootstrap Sample

Page 15: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Your original sample has data values

18, 19, 19, 20, 21

Is the following a possible bootstrap sample?

18, 18, 19, 20, 21

a) Yesb) No

Bootstrap Sample

Page 16: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Bootstrap

A bootstrap sample is a random sample taken with replacement from the original sample, of

the same size as the original sample

A bootstrap statistic is the statistic computed on a bootstrap sample

A bootstrap distribution is the distribution of many bootstrap statistics

Page 17: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Original Sample

BootstrapSample

BootstrapSample

BootstrapSample

.

.

.

Bootstrap Statistic

Sample Statistic

Bootstrap Statistic

Bootstrap Statistic

.

.

.

Bootstrap Distribution

Page 18: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

StatKeylock5stat.com/statkey/

Page 19: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples.

What is the sample size of each bootstrap sample?

(a) 50(b) 1000

Bootstrap Sample

Page 20: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples.

How many bootstrap statistics will you have?

(a) 50(b) 1000

Bootstrap Distribution

Page 21: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

“Pull yourself up by your bootstraps”

Why “bootstrap”?

• Lift yourself in the air simply by pulling up on the laces of your boots

• Metaphor for accomplishing an “impossible” task without any outside help

Page 22: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Sampling Distribution

Population

µ

BUT, in practice we don’t see the “tree” or all of the “seeds” – we only have ONE seed

Page 23: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Bootstrap Distribution

Bootstrap“Population”

What can we do with just one seed?

Grow a NEW tree!

𝑥

Estimate the distribution and variability (SE) of ’s from the bootstraps

µ

Page 24: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Bootstrap statistics are to the original sample statistic

as

the original sample statistic is to the population parameter

Golden Rule of Bootstrapping

Page 25: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Center

•The sampling distribution is centered around the population parameter

• The bootstrap distribution is centered around the

•Luckily, we don’t care about the center… we care about the variability!

a) population parameterb) sample statisticc) bootstrap statisticd) bootstrap parameter

Page 26: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Standard Error

•The variability of the bootstrap statistics is similar to the variability of the sample statistics

• The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!

Page 27: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Confidence Intervals

SampleBootstrap

Sample

. . .

Calculate statistic for each bootstrap sample

Bootstrap Distribution

Standard Error (SE): standard deviation of bootstrap distribution

Margin of Error (ME)(95% CI: ME = 2×SE)

Confidence Interval

statistic ± ME

Bootstrap Sample

Bootstrap Sample

Bootstrap SampleBootstrap

Sample

Page 28: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Based on this sample, give a 95% confidence interval for the true proportion of Reese’s Pieces that are orange.

a) (0.47, 0.57)b) (0.42, 0.62)c) (0.41, 0.51)d) (0.36, 0.56)e) I have no idea

Reese’s Pieces

0.52 ± 2 × 0.05

Page 29: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

What about Other Parameters?Estimate the standard error and/or a confidence interval for...• proportion ()• difference in means ()• difference in proportions ()• standard deviation ()• correlation ()• ... Generate samples with replacement

Calculate sample statisticRepeat...

Page 30: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

• We can use bootstrapping to assess the uncertainty surrounding ANY sample statistic!

• If we have sample data, we can use bootstrapping to create a 95% confidence interval for any parameter!

(well, almost…)

The Magic of Bootstrapping

Page 31: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Used MustangsWhat’s the average price of a used Mustang car?

Select a random sample of n = 25 Mustangs from a website (autotrader.com) and record the price (in $1,000’s) for each car.

Page 32: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Sample of Mustangs:

Our best estimate for the average price of used Mustangs is $15,980, but how accurate is that estimate?

Price0 5 10 15 20 25 30 35 40 45

MustangPrice Dot Plot

𝑛=25 𝑥=15.98 𝑠=11.11

BOOTSTRAP!

Page 33: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Original Sample 1. Bootstrap Sample

2. Calculate mean price of bootstrap sample

3. Repeat many times!

Page 34: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Used Mustangs

Standard Error

Page 35: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Used Mustangs95% CI:

$15,980

($11,624, $20,336)

We are 95% confident that the average price of a used Mustang on autotrader.com is between $11,624 and $20,336

Page 36: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

What percentage of Americans believe in global warming?

A survey on 2,251 randomly selected individuals conducted in October 2010 found that 1328 answered “Yes” to the question

“Is there solid evidence of global warming?”

Give and interpret a 95% CI for the proportion of Americans who believe there is solid evidence of global warming.

Global Warming

Source: “Wide Partisan Divide Over Global Warming”, Pew Research Center, 10/27/10. http://pewresearch.org/pubs/1780/poll-global-warming-scientists-energy-policies-offshore-drilling-tea-party

Page 37: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Global Warmingwww.lock5stat.com/statkey

We are 95% sure that the true percentage of all Americans that believe there is solid evidence of global warming is between 57% and 61%

0.59 2(0.01)= (0.57, 0.61)

Give and interpret a 95% CI for the proportion of Americans who believe there is solid evidence of global warming.

Page 38: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Does belief in global warming differ by political party?

“Is there solid evidence of global warming?”

The sample proportion answering “yes” was 79% among Democrats and 38% among Republicans.(exact numbers for each party not given, but assume n=1000 for each group)

Give a 95% CI for the difference in proportions.

Global Warming

Source: “Wide Partisan Divide Over Global Warming”, Pew Research Center, 10/27/10. http://pewresearch.org/pubs/1780/poll-global-warming-scientists-energy-policies-offshore-drilling-tea-party

Page 39: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Global Warming

95% CI for the difference in proportions:(a) (0.39, 0.43)(b) (0.37, 0.45)(c) (0.77, 0.81)(d) (0.75, 0.85)

Page 40: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Global WarmingBased on the data just analyzed, can you conclude with 95% certainty that the proportion of people believing in global warming differs by political party?

(a) Yes(b) No

Page 41: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Body TemperatureWhat is the average body temperature of humans?

www.lock5stat.com/statkey

98.26 2 0.105

98.26 0.21

98.05,98.47

We are 95% sure that the average body temperature for humans is between 98.05 and 98.47

Shoemaker, What's Normal: Temperature, Gender and Heartrate, Journal of Statistics Education, Vol. 4, No. 2 (1996)

98.6 ???

Page 42: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Other Levels of Confidence

•What if we want to be more than 95% confident?

• How might you produce a 99% confidence interval for the average body temperature?

Page 43: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Percentile Method•For a P% confidence interval, keep the middle P% of bootstrap statistics

•For a 99% confidence interval, keep the middle 99%, leaving 0.5% in each tail.

• The 99% confidence interval would be

(0.5th percentile, 99.5th percentile)

where the percentiles refer to the bootstrap distribution.

Page 44: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Bootstrap DistributionBest Guess at Sampling Distribution

Statistic

2 3 4 5 6 7 8

Best Guess at Sampling Distribution

Statistic

2 3 4 5 6 7 8

Observed Statistic

Best Guess at Sampling Distribution

Statistic

2 3 4 5 6 7 8

Observed Statistic

P%

Best Guess at Sampling Distribution

Statistic

2 3 4 5 6 7 8

Observed Statistic

P%P%P%

Upper BoundUpper Bound

Lower Bound

•For a P% confidence interval:

Page 45: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

www.lock5stat.com/statkey

Body Temperature

We are 99% sure that the average body temperature is between 98.00 and 98.58

Middle 99% of bootstrap statistics

Page 46: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Level of Confidence

Which is wider, a 90% confidence interval or a 95% confidence interval?

(a) 90% CI(b) 95% CI

Page 47: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Mercury and pH in Lakes

Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993)

• For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake?

𝑟=−0.575

Give a 90% CI for

Page 48: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

www.lock5stat.com/statkey

Mercury and pH in Lakes

We are 90% confident that the true correlation between average mercury level and pH of Florida lakes is between -0.702 and -0.433.

Page 49: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Bootstrap CIOption 1: Estimate the standard error of the statistic by computing the standard deviation of the bootstrap distribution, and then generate a 95% confidence interval by

Option 2: Generate a P% confidence interval as the range for the middle P% of bootstrap statistics

2statisti Sc E

Page 50: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Two Methods for 95%

:

98.26 2 0.105 98.05,98.47

statistic ± 2×SE

98.05,98.47

Percentile Method :

Page 51: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Two Methods

•For a symmetric, bell-shaped bootstrap distribution, using either the standard error method or the percentile method will given similar 95% confidence intervals

• If the bootstrap distribution is not bell-shaped or if a level of confidence other than 95% is desired, use the percentile method

Page 52: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Bootstrap Cautions

•These methods for creating a confidence interval only work if the bootstrap distribution is smooth and symmetric

•ALWAYS look at a plot of the bootstrap distribution!

•If the bootstrap distribution is highly skewed or looks “spiky” with gaps, you will need to go beyond intro stat to create a confidence interval

Page 53: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Bootstrap Cautions

Page 54: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

Number of Bootstrap Samples•When using bootstrapping, you may get a slightly different confidence interval each time. This is fine!

•The more bootstrap samples you use, the more precise your answer will be

•Increasing the number of bootstrap samples will not change the SE or interval (except for random fluctuation)

•For the purposes of this class, 1000 bootstrap samples is fine. In real life, you probably want to take 10,000 or even 100,000 bootstrap samples

Page 55: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

SummaryThe standard error of a statistic is the standard

deviation of the sample statistic, which can be estimated from a bootstrap distribution

Confidence intervals can be created using the standard error or the percentiles of a bootstrap distribution

Confidence intervals can be created this way for any parameter, as long as the bootstrap distribution is approximately symmetric and continuous

Page 56: Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap

Statistics: Unlocking the Power of Data Lock5

To DoRead Sections 3.3, 3.4

Do HW 3 (due Monday, 2/10)