Models and Modeling in Introductory Statistics

Robin H. Lock, Burry Professor of Statistics

St. Lawrence University

2012 Joint Statistical Meetings, San Diego, August 2012

What is a Model?

A simplified abstraction that approximates important features of a more complicated system

Traditional Statistical Models

Population: $Y \sim N(\mu, \sigma)$

Often depends on non-trivial mathematical ideas.

Traditional Statistical Models

Relationship: $Y = \beta_0 + \beta_1 X + \varepsilon$

Predictor (X)

“Empirical” Statistical Models

A representative sample looks like a mini-version of the population.

Model a population with many copies of the sample.

Bootstrap: Sample with replacement from an original sample to study the behavior of a statistic.
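The resampling idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `bootstrap_statistics`, the default of 1000 resamples, and the sample values are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_statistics(sample, stat=statistics.mean, n_boot=1000, seed=0):
    """Compute `stat` on n_boot resamples drawn with replacement.

    Each resample has the same size as the original sample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Hypothetical values for illustration only (not data from the talk).
sample = [4.0, 7.5, 9.0, 12.0, 15.5, 21.0, 33.0, 45.0]

# The collection of resampled means is the bootstrap distribution.
boot_means = bootstrap_statistics(sample)
```

Passing a different `stat` (for example `statistics.median`) reuses the same resampling step for another statistic.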

“Empirical” Statistical Models

Hypothesis testing: Assess the behavior of a sample statistic, when the population meets a specific criterion.

Create a Null Model in order to sample from a population that satisfies H0

Randomization

Traditional vs. Empirical

Both types of model are important, BUT empirical models (bootstrap/randomization) are
• More accessible at early stages of a course
• More closely tied to underlying statistical concepts
• Less dependent on abstract mathematics

Example: Mustang Prices

Estimate the average price of used Mustangs and provide an interval to reflect the accuracy of the estimate.

Data: Sample prices for n=25 Mustangs

[Figure: dot plot of MustangPrice; horizontal axis: Price, 10 to 50]

$\bar{x} = 15.98$, $s = 11.11$

[Diagram: the Original Sample yields the Sample Statistic; each Bootstrap Sample drawn from the Original Sample yields a Bootstrap Statistic; the Bootstrap Statistics together form the Bootstrap Distribution]

Bootstrap Distribution: Mean Mustang Prices

Background?

What do students need to know about before doing a bootstrap interval?

• Random sampling
• Sample statistics (mean, std. dev., %-tile)
• Display a distribution (dotplot)
• Parameter vs. statistic

Traditional Sampling Distribution

Population

BUT, in practice we don’t see the “tree” or all of the “seeds” – we only have ONE seed

Bootstrap Distribution

Bootstrap “Population”

What can we do with just one seed?

Grow a NEW tree!

Estimate the distribution and variability (SE) of $\bar{x}$’s from the bootstraps

Round 2

Course Order
• Data production
• Data description (numeric/graphs)
• Interval estimates (bootstrap model)
• Randomization tests (null model)
• Traditional inference for means and proportions (normal/t model)
• Higher order inference (chi-square, ANOVA, linear regression model)

Traditional models need mathematics,

Empirical models need technology!

Some technology options:
• R (especially with Mosaic)
• Fathom/Tinkerplots
• StatCrunch
• JMP

StatKey: www.lock5stat.com

[StatKey screenshot callouts: Three Distributions; One to Many Samples; Built-in data; Enter new data; Interact with tails; Distribution Summary Stats]

Smiles and Leniency

Does smiling affect leniency in a college disciplinary hearing?

Null Model: Expression has no effect on leniency

LaFrance, M., and Hecht, M. A., “Why Smiles Generate Leniency,” Personality and Social Psychology Bulletin, 1995; 21:

Smiles and Leniency

Null Model: Expression has no effect on leniency

To generate samples under this null model:
• Randomly re-assign the smile/neutral labels to the 68 leniency scores (34 each).
• Compute the difference in mean leniency between the two groups, $\bar{x}_s - \bar{x}_n$.
• Repeat many times.
• See if the original difference, $\bar{x}_s - \bar{x}_n = 0.79$, is unusual in the randomization distribution.
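The steps above can be sketched in Python as a one-tailed randomization test. This is an illustrative sketch under stated assumptions: the function name `randomization_p_value`, the number of re-assignments, and the two short lists of scores are hypothetical and are not the 68 leniency scores from the study.

```python
import random
import statistics

def randomization_p_value(group_a, group_b, n_rand=5000, seed=0):
    """One-tailed randomization test for mean(group_a) - mean(group_b).

    Under the null model the group labels carry no information, so the
    pooled scores are shuffled and re-split into groups of the original
    sizes. The p-value is the share of shuffles whose difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_rand):
        rng.shuffle(pooled)  # randomly re-assign the labels
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_rand

# Hypothetical scores for illustration only (not the study's data).
smile = [5.0, 4.5, 6.0, 3.5, 5.5, 4.0]
neutral = [4.0, 3.5, 4.5, 3.0, 5.0, 2.5]

observed, p_value = randomization_p_value(smile, neutral)
```

A small `p_value` means the observed difference would be unusual if expression had no effect on leniency.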

StatKey

p-value = 0.023

Traditional t-test

$H_0: \mu_s = \mu_n$ vs. $H_a: \mu_s > \mu_n$

$$t = \frac{4.91 - 4.12}{\sqrt{\dfrac{1.52^2}{34} + \dfrac{1.68^2}{34}}} = \frac{0.79}{0.39} = 2.03$$
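The arithmetic behind this t statistic can be checked with a short Python sketch. It assumes the slide's numbers are group means of 4.91 and 4.12, standard deviations of 1.52 and 1.68, and 34 scores per group; the variable names are chosen here for illustration.

```python
import math

# Summary statistics as read from the slide (assumed interpretation).
mean_s, mean_n = 4.91, 4.12   # mean leniency: smile, neutral
sd_s, sd_n = 1.52, 1.68       # standard deviations
n_s = n_n = 34                # scores per group

# Standard error of the difference in means (unpooled).
se = math.sqrt(sd_s**2 / n_s + sd_n**2 / n_n)

# t statistic: observed difference divided by its standard error.
t = (mean_s - mean_n) / se

print(round(se, 2), round(t, 2))  # 0.39 2.03
```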

Round 3

Assessment? Construct a bootstrap distribution of sample means for the SPChange variable. The result should be relatively bell-shaped as in the graph below. Put a scale (show at least five values) on the horizontal axis of this graph to roughly indicate the scale that you see for the bootstrap means.

Estimate SE? Find CI from SE? Find CI from percentiles?
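The three questions above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the data values are hypothetical (they are not the SPChange variable), the function names are invented for the example, and the multiplier 2 in "statistic ± 2·SE" is assumed as the usual rough 95% rule.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=2000, seed=0):
    """Means of n_boot resamples drawn with replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_boot)]

def se_interval(sample, boot_stats, multiplier=2):
    """Sample mean +/- multiplier * SE, with SE taken as the standard
    deviation of the bootstrap statistics."""
    se = statistics.stdev(boot_stats)
    center = statistics.mean(sample)
    return center - multiplier * se, center + multiplier * se

def percentile_interval(boot_stats, level=0.95):
    """Middle `level` share of the sorted bootstrap statistics, using a
    simple index rule (other percentile conventions differ slightly)."""
    ordered = sorted(boot_stats)
    tail = (1 - level) / 2
    lo = ordered[round(tail * len(ordered))]
    hi = ordered[round((1 - tail) * len(ordered)) - 1]
    return lo, hi

# Hypothetical values for illustration only.
sample = [4.0, 7.5, 9.0, 12.0, 15.5, 21.0, 33.0, 45.0]
boot = bootstrap_means(sample)

se = statistics.stdev(boot)            # Estimate SE
ci_from_se = se_interval(sample, boot) # CI from SE
ci_from_pct = percentile_interval(boot)  # CI from percentiles
```

When the bootstrap distribution is roughly bell-shaped, the two intervals should be close to each other.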

Assessment? From 2009 AP Stat: Given summary stats, test skewness

Find and interpret a p-value

[Figure: dot plot of simulated ratios; horizontal axis: ratio, 0.94 to 1.06]

$\text{Ratio} = \dfrac{\bar{x}}{\text{median}}$

Given 100 such ratios for samples drawn from a symmetric distribution

Ratio=1.04 for the original sample
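One way to see where such a p-value comes from is to simulate the ratios directly. The Python sketch below is an illustration under assumptions, not the exam's setup: the normal distribution stands in for "a symmetric distribution", and the sample size, mean, and standard deviation are hypothetical, so the resulting proportion will not reproduce the dot plot from the slide.

```python
import random
import statistics

def simulated_ratios(n, mu, sigma, n_sims=100, seed=0):
    """Mean/median ratios for n_sims samples of size n drawn from a
    normal (hence symmetric) distribution."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_sims):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ratios.append(statistics.mean(sample) / statistics.median(sample))
    return ratios

# Hypothetical settings: sample size and parameters are assumptions.
ratios = simulated_ratios(n=10, mu=25.0, sigma=3.0)

observed_ratio = 1.04  # ratio for the original sample, from the slide

# Empirical p-value: share of simulated ratios at least as large as
# the observed one (large ratios suggest right skew).
p_value = sum(r >= observed_ratio for r in ratios) / len(ratios)
```

A small proportion would indicate that a ratio of 1.04 is unusual for samples from a symmetric population.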

Implementation Issues

• Good technology is critical

• Missed having “experienced” student support the first couple of semesters

Round 4

Why Did I Get Involved with Teaching Bootstrap/Randomization Models?

It’s all George’s fault...

"Introductory Statistics: A Saber Tooth Curriculum?"

Banquet address at the first (2005) USCOTS

George Cobb

Introduce inference with “empirical models” based on simulations from the sample data (bootstraps/randomizations), then approximate with models based on traditional distributions.

Models in Introductory Statistics
