statistical inference using scrambles and bootstraps

Statistical Inference Using Scrambles and Bootstraps Robin Lock Burry Professor of Statistics St. Lawrence University MAA Allegheny Mountain 2014 Section Spring Meeting Westminster College


Page 1: Statistical Inference Using Scrambles and Bootstraps

Statistical Inference Using Scrambles and Bootstraps

Robin Lock, Burry Professor of Statistics

St. Lawrence University

MAA Allegheny Mountain 2014 Section Spring Meeting

Westminster College

Page 2: Statistical Inference Using Scrambles and Bootstraps

The Lock5 Team

Dennis (Iowa State)

Kari (Harvard/Duke)

Eric (UNC/Duke/UMinn)

Robin & Patti (St. Lawrence)

Page 3: Statistical Inference Using Scrambles and Bootstraps

What is Statistical Inference?

Hypothesis Test: Is an effect observed in a sample true for a population or just due to random chance?

Confidence Interval: Based on the data in a sample, find a range of plausible values for a quantity in a population.

Page 4: Statistical Inference Using Scrambles and Bootstraps

Example #1: Beer & Mosquitoes

• Volunteers were randomly assigned to drink either a liter of beer or a liter of water.

• Mosquitoes were caught in nets as they approached each volunteer and counted.

         n     mean
Beer     25    23.60
Water    18    19.22

Does this provide convincing evidence that mosquitoes tend to be more attracted to beer drinkers, or could this difference be just due to random chance?

Hypothesis Test

Page 5: Statistical Inference Using Scrambles and Bootstraps

Example #2: Mustang Prices

• A student selected a random sample of n=25 Mustangs (cars) from an internet site and recorded the prices in $1,000’s.

         n     mean     std. dev.
Price    25    15.98    11.11

Find a range of plausible values where the mean price for all Mustangs at this website is likely to be.

Confidence Interval

[Dot plot: Price (in $1,000’s)]

Page 6: Statistical Inference Using Scrambles and Bootstraps

Two Approaches to Inference

Traditional:

• Assume some distribution (e.g. normal or t) to describe the behavior of sample statistics

• Estimate parameters for that distribution from sample statistics

• Calculate the desired quantities from the theoretical distribution

Simulation:

• Generate many samples (by computer) to show the behavior of sample statistics

• Calculate the desired quantities from the simulation distribution

Page 7: Statistical Inference Using Scrambles and Bootstraps

“New” Simulation Methods?

"Actually, the statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method."

-- Sir R. A. Fisher, 1936

Page 8: Statistical Inference Using Scrambles and Bootstraps

Example #1: Beer & Mosquitoes

µ = mean number of attracted mosquitoes

H0: μB = μW

Ha: μB > μW

Competing claims about the population means

Based on the sample data:

P-value: The proportion of samples, when H0 is true, that would give results as extreme as (or more extreme than) the original sample.

Is this a “significant” difference?

Page 9: Statistical Inference Using Scrambles and Bootstraps

Traditional Inference

1. Check conditions

2. Which formula?

$t = \dfrac{\bar{x}_B - \bar{x}_W}{\sqrt{\dfrac{s_B^2}{n_B} + \dfrac{s_W^2}{n_W}}}$

3. Calculate numbers and plug into formula

$t = \dfrac{23.6 - 19.22}{\sqrt{\dfrac{4.1^2}{25} + \dfrac{3.7^2}{18}}}$

4. Chug with calculator

$t = 3.66$

5. Which theoretical distribution?

6. df?

7. Find p-value

0.0005 < p-value < 0.001

8. Interpret a decision
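
For readers who want to check the "plug and chug" steps, here is a minimal Python sketch of the unpooled two-sample t statistic. It assumes the raw mosquito counts are the Beer and Water values listed on the Simulation Approach slides of this deck; the function name `two_sample_t` is only illustrative and is not part of the original talk.

```python
from math import sqrt
from statistics import mean, stdev

# Mosquito counts as listed on the "Simulation Approach" slides
beer = [27, 20, 21, 26, 27, 31, 24, 19, 23, 24, 28, 19, 24,
        29, 20, 17, 31, 20, 25, 28, 21, 27, 21, 18, 20]
water = [21, 22, 15, 12, 21, 16, 19, 15, 24, 19, 23, 13, 22,
         20, 24, 18, 20, 22]

def two_sample_t(a, b):
    """Unpooled two-sample t statistic for mean(a) - mean(b)."""
    # Standard error uses each group's own sample variance
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

t_stat = two_sample_t(beer, water)
print(f"t = {t_stat:.2f}")
```

The sketch stops at the statistic; choosing the reference distribution and degrees of freedom (steps 5 and 6) is left as on the slide.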

Page 10: Statistical Inference Using Scrambles and Bootstraps

Simulation Approach

Original Sample

Beer: 27 20 21 26 27 31 24 19 23 24 28 19 24 29 20 17 31 20 25 28 21 27 21 18 20

Water: 21 22 15 12 21 16 19 15 24 19 23 13 22 20 24 18 20 22

[Dot plot: Number of Mosquitoes]

$\bar{x}_B = 23.60$

$\bar{x}_W = 19.22$

$\bar{x}_B - \bar{x}_W = 4.38$

To simulate samples under H0 (no difference):

• Re-randomize the values into Beer & Water groups

• Compute $\bar{x}_B - \bar{x}_W$

Page 11: Statistical Inference Using Scrambles and Bootstraps

Simulation Approach

Beer: 27 20 21 26 27 31 24 19 23 24 28 19 24 29 20 17 31 20 25 28 21 27 21 18 20

Water: 21 22 15 12 21 16 19 15 24 19 23 13 22 20 24 18 20 22

[Dot plot: Number of Mosquitoes]

$\bar{x}_W = 19.22$

$\bar{x}_B - \bar{x}_W = 4.38$

To simulate samples under H0 (no difference):

27 19 21 24
20 24 18 19
21 29 20 23
26 20 21 13
27 27 22 22
31 31 15 20
24 20 12 24
19 25 21 18
23 28 16 20
24 21 19 22
28 27 15

Page 12: Statistical Inference Using Scrambles and Bootstraps

Simulation Approach

To simulate samples under H0 (no difference):

• Re-randomize the values into Beer & Water groups

• Compute $\bar{x}_B - \bar{x}_W$

[Figure: the pooled values re-randomized into Beer and Water columns, with a dot plot of Number of Mosquitoes]

$\bar{x}_B = 21.76$

$\bar{x}_W = 22.50$

$\bar{x}_B - \bar{x}_W = -0.84$

Repeat this process 1000’s of times to see how “unusual” the original difference of 4.38 is.
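
The re-randomization loop described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not StatKey's implementation: the function name `randomization_pvalue`, the 10,000 repetitions, and the fixed seed are assumptions chosen for the example. The counts are the Beer and Water values from the original sample.

```python
import random

# Mosquito counts from the original sample
beer = [27, 20, 21, 26, 27, 31, 24, 19, 23, 24, 28, 19, 24,
        29, 20, 17, 31, 20, 25, 28, 21, 27, 21, 18, 20]
water = [21, 22, 15, 12, 21, 16, 19, 15, 24, 19, 23, 13, 22,
         20, 24, 18, 20, 22]

def randomization_pvalue(group_a, group_b, reps=10_000, seed=0):
    """One-sided p-value for mean(a) - mean(b) by scrambling group labels."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(reps):
        # Under H0 the labels are arbitrary, so shuffle and re-split
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            extreme += 1
    return observed, extreme / reps

observed, p_value = randomization_pvalue(beer, water)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}")
```

Counting scrambled differences at least as large as the observed one matches the one-sided alternative Ha: μB > μW.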

Page 13: Statistical Inference Using Scrambles and Bootstraps

We need technology!

www.lock5stat.com/statkey

StatKey

Freely available web apps with no login required

Runs in (almost) any browser (incl. smartphones/tablets)

Google Chrome App available (no internet needed)

Standalone or supplement to existing technology

Page 14: Statistical Inference Using Scrambles and Bootstraps

p-value = proportion of samples, when H0 is true, that are as extreme as (or more extreme than) the original sample.

p-value

Page 15: Statistical Inference Using Scrambles and Bootstraps

Example #2: Mustang Prices

Start with a random sample of 25 prices (in $1,000’s)

[Dot plot: MustangPrice; axis: Price, 0 to 45]

$n = 25$, $\bar{x} = 15.98$, $s = 11.11$

Goal: Find an interval that is likely to contain the mean price for all Mustangs

Key concept: How much can we expect the sample means to vary just by random chance?

Page 16: Statistical Inference Using Scrambles and Bootstraps

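Traditional Inference: CI for a mean

1. Check conditions

2. Which formula?

$\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}$ OR $\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}$

3. Calculate summary stats

$n = 25$, $\bar{x} = 15.98$, $s = 11.11$

4. Find t*

95% CI

5. df?

df = 25 − 1 = 24

t* = 2.064

6. Plug and chug

$15.98 \pm 2.064 \cdot \dfrac{11.11}{\sqrt{25}}$

$15.98 \pm 4.59 = (11.39, 20.57)$

7. Interpret in context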
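
As a quick check of the arithmetic on this slide, the following Python sketch plugs the summary statistics into the t-interval formula. The function name `t_interval` is hypothetical, and the critical value t* = 2.064 is taken directly from the slide rather than computed.

```python
from math import sqrt

def t_interval(xbar, s, n, t_star):
    """Interval xbar +/- t* * s / sqrt(n) from summary statistics."""
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin

# Summary statistics and t* (df = 24, 95% confidence) as given on the slide
low, high = t_interval(xbar=15.98, s=11.11, n=25, t_star=2.064)
print(f"({low:.2f}, {high:.2f})")
```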

Page 17: Statistical Inference Using Scrambles and Bootstraps

Bootstrapping

To create a bootstrap distribution:

• Assume the “population” is many, many copies of the original sample.

• Simulate many samples from the population by sampling with replacement from the original sample.

“Let your data be your guide.”

Brad Efron, Stanford University

Page 18: Statistical Inference Using Scrambles and Bootstraps

Finding a Bootstrap Sample

Original Sample (n=6)

A simulated “population” to sample from

Bootstrap Sample (sample with replacement from the original sample)

Page 19: Statistical Inference Using Scrambles and Bootstraps

Original Sample: $\bar{x} = 15.98$

Bootstrap Sample: $\bar{x} = 17.51$

Repeat 1,000’s of times!

Page 20: Statistical Inference Using Scrambles and Bootstraps

Original Sample → Sample Statistic

Bootstrap Sample → Bootstrap Statistic

Bootstrap Sample → Bootstrap Statistic

●●●

Bootstrap Sample → Bootstrap Statistic

The bootstrap statistics together form the Bootstrap Distribution.

StatKey

Page 21: Statistical Inference Using Scrambles and Bootstraps

StatKey

Standard Error: $\dfrac{s}{\sqrt{n}} = \dfrac{11.114}{\sqrt{25}} = 2.2$

Page 22: Statistical Inference Using Scrambles and Bootstraps

A 95% Confidence Level

Keep 95% in middle

Chop 2.5% in each tail

Chop 2.5% in each tail

We are 95% sure that the mean price for Mustangs is between $11,800 and $20,190
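
The "keep 95% in the middle" recipe can be sketched in Python as follows. The 25 individual Mustang prices are not listed in this transcript, so the example reuses the Beer counts from Example #1 purely to have real data to resample. The function names, the 10,000 repetitions, and the fixed seed are illustrative assumptions, not StatKey's code.

```python
import random
from statistics import stdev

# Stand-in data: Beer counts from Example #1 (Mustang prices are not listed)
sample = [27, 20, 21, 26, 27, 31, 24, 19, 23, 24, 28, 19, 24,
          29, 20, 17, 31, 20, 25, 28, 21, 27, 21, 18, 20]

def bootstrap_means(data, reps=10_000, seed=0):
    """Means of `reps` samples drawn with replacement, each of size len(data)."""
    rng = random.Random(seed)
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(reps)]

def percentile_interval(stats, level=0.95):
    """Keep the middle `level` of the bootstrap statistics, chop both tails."""
    ordered = sorted(stats)
    tail = (1 - level) / 2
    lo_idx = int(tail * len(ordered))
    hi_idx = int((1 - tail) * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]

boot = bootstrap_means(sample)
se = stdev(boot)                      # bootstrap standard error of the mean
low, high = percentile_interval(boot)
print(f"SE = {se:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Swapping the statistic inside `bootstrap_means` (a median, a correlation, and so on) leaves the rest of the recipe unchanged, which is the point made on the next slide.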

Page 23: Statistical Inference Using Scrambles and Bootstraps

The same method is used for any statistic, including new statistics that are being defined in areas like genetics.

This is very powerful for practitioners! (and appreciated by students, especially visual learners)

Page 24: Statistical Inference Using Scrambles and Bootstraps

Why does the bootstrap work?

Page 25: Statistical Inference Using Scrambles and Bootstraps

Sampling Distribution

Population

µ

BUT, in practice we don’t see the “tree” or all of the “seeds” – we only have ONE seed

Page 26: Statistical Inference Using Scrambles and Bootstraps

Bootstrap Distribution

Bootstrap “Population”

What can we do with just one seed?

Grow a NEW tree!

$\bar{x}$

Estimate the distribution and variability (SE) of $\bar{x}$’s from the bootstraps

µ

Use the bootstrap errors that we CAN see to estimate the sampling errors that we CAN’T see.

Page 27: Statistical Inference Using Scrambles and Bootstraps

Golden Rule of Bootstraps

The bootstrap statistics are to the original statistic

as the original statistic is to the population parameter.

Page 28: Statistical Inference Using Scrambles and Bootstraps

Example #3: Malevolent Uniforms

Sample Correlation r = 0.43

Do football teams with more malevolent uniforms tend to get more penalty yards?

H0: ρ = 0

Ha: ρ > 0

Page 29: Statistical Inference Using Scrambles and Bootstraps

Simulation Approach

Find out how extreme this correlation would be, if there is no relationship between uniform malevolence and penalties.

i.e., What kinds of results (correlations) would we see, just by random chance?

Sample Correlation = 0.43

Page 30: Statistical Inference Using Scrambles and Bootstraps

Randomization by Scrambling

Original sample: MalevolentUniformsNFL

      NFLTeam       NFL_Ma...   ZPenYds
 1    LA Raiders    5.1          1.19
 2    Pittsburgh    5            0.48
 3    Cincinnati    4.97         0.27
 4    New Orl...    4.83         0.1
 5    Chicago       4.68         0.29
 6    Kansas ...    4.58        -0.19
 7    Washing...    4.4         -0.07
 8    St. Louis     4.27        -0.01
 9    NY Jets       4.12         0.01
10    LA Rams       4.1         -0.09
11    Cleveland     4.05         0.44
12    San Diego     4.05         0.27
13    Green Bay     4           -0.73
14    Philadel...   3.97        -0.49
15    Minnesota     3.9         -0.81
16    Atlanta       3.87         0.3
17    Indianap...   3.83        -0.19
18    San Fra...    3.83         0.09
19    Seattle       3.82         0.02
20    Denver        3.8          0.24
21    Tampa B...    3.77        -0.41
22    New Eng...    3.6         -0.18
23    Buffalo       3.53         0.63

Scrambled sample: Scrambled MalevolentUniformsNFL

      NFLTeam       NFL_Ma...   ZPenYds
 1    LA Raiders    5.1          0.44
 2    Pittsburgh    5           -0.81
 3    Cincinnati    4.97         0.38
 4    New Orl...    4.83         0.1
 5    Chicago       4.68         0.63
 6    Kansas ...    4.58         0.3
 7    Washing...    4.4         -0.41
 8    St. Louis     4.27        -1.6
 9    NY Jets       4.12        -0.07
10    LA Rams       4.1         -0.18
11    Cleveland     4.05         0.01
12    San Diego     4.05         1.19
13    Green Bay     4           -0.19
14    Philadel...   3.97         0.27
15    Minnesota     3.9         -0.01
16    Atlanta       3.87         0.02
17    Indianap...   3.83         0.23
18    San Fra...    3.83         0.04
19    Seattle       3.82        -0.09
20    Denver        3.8         -0.49
21    Tampa B...    3.77        -0.19
22    New Eng...    3.6         -0.73
23    Buffalo       3.53         0.09

StatKey

Repeat 1000’s of times
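
Scrambling for a correlation works the same way as for a difference in means: keep one column fixed, shuffle the other, and recompute the statistic. The Python sketch below is an illustration under stated assumptions. It uses only the 23 rows visible in the original-sample table of this transcript (the scrambled table contains values not in that list, so the full dataset appears to have more teams, and the correlation here need not equal the slide's r = 0.43 exactly). The function names, repetition count, and seed are hypothetical.

```python
import random
from math import sqrt

# Malevolence ratings and z-scored penalty yards, rows 1-23 of the table
malevolence = [5.1, 5, 4.97, 4.83, 4.68, 4.58, 4.4, 4.27, 4.12, 4.1, 4.05,
               4.05, 4, 3.97, 3.9, 3.87, 3.83, 3.83, 3.82, 3.8, 3.77, 3.6,
               3.53]
z_pen_yds = [1.19, 0.48, 0.27, 0.1, 0.29, -0.19, -0.07, -0.01, 0.01, -0.09,
             0.44, 0.27, -0.73, -0.49, -0.81, 0.3, -0.19, 0.09, 0.02, 0.24,
             -0.41, -0.18, 0.63]

def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def scramble_pvalue(x, y, reps=10_000, seed=0):
    """One-sided p-value for H0: rho = 0 vs Ha: rho > 0 by scrambling y."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(reps):
        # Scrambling y breaks any link between the two columns
        rng.shuffle(shuffled)
        if pearson_r(x, shuffled) >= observed:
            extreme += 1
    return observed, extreme / reps

r_obs, p_value = scramble_pvalue(malevolence, z_pen_yds)
print(f"r = {r_obs:.2f}, p-value = {p_value:.4f}")
```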

Page 31: Statistical Inference Using Scrambles and Bootstraps

P-value

Small p-value: strong evidence of a positive association between uniform malevolence and penalty yards.

Page 32: Statistical Inference Using Scrambles and Bootstraps

How does everything fit together?

• We use simulation methods to build understanding of the key statistical ideas.

• We then cover traditional normal and t-based procedures as “short-cut formulas”.

• Students continue to see all the standard methods but with a deeper understanding of the meaning.

Page 33: Statistical Inference Using Scrambles and Bootstraps

Intro Stat – Revise the Topics

• Descriptive Statistics – one and two samples

• Normal distributions

• Data production (samples/experiments)

• Sampling distributions (mean/proportion)

• Confidence intervals (means/proportions)

• Hypothesis tests (means/proportions)

• ANOVA for several means, Inference for regression, Chi-square tests

• Data production (samples/experiments)

• Bootstrap confidence intervals

• Randomization-based hypothesis tests

• Normal distributions

• Bootstrap confidence intervals

• Randomization-based hypothesis tests

• Descriptive Statistics – one and two samples

Page 34: Statistical Inference Using Scrambles and Bootstraps

Transitioning to Traditional Inference

Confidence Interval:

Hypothesis Test:

Page 35: Statistical Inference Using Scrambles and Bootstraps
Page 36: Statistical Inference Using Scrambles and Bootstraps

The Next Big Thing...

“... the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.”

-- Professor George Cobb, 2007

Page 37: Statistical Inference Using Scrambles and Bootstraps

Thanks for listening!

www.lock5stat.com