Post on 26-Jul-2020

Introduction to Telling Stories with Data

Author: Nicholas G Reich

This material is part of the statsTeachR project

Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en_US

Communicating ideas with evidence

What is a narrative? [From the OED]

An account of a series of events, facts, etc., given in order and with the establishing of connections between them; a narration, a story, an account.

What is data? [From Google: literally, “what is data”]

How to tell a story using data

Telling stories with data requires

• detective work

• creativity, both scientific and artistic

• experimentation

• good data (which does not necessarily mean big data)

A common tool: regression

• The goal is to learn about the relationship between a covariate (predictor) of interest and an outcome of interest.

• Some models focus on prediction.

• Other models focus on description.

• Regression is an exercise in inferential statistics: we are drawing evidence and conclusions from data about “noisy” systems.
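
As a minimal sketch of what fitting a regression involves, the Python snippet below fits a least-squares line by hand. The data values and the helper name fit_line are invented for illustration and are not part of the original slides. The slope serves description (how the outcome changes with the predictor), and the fitted line serves prediction.

```python
# Hypothetical (x, y) pairs, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # covariate (predictor)
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # outcome


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(x, y)

# Description: the estimated change in y per one-unit change in x.
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")

# Prediction: the fitted value at a new covariate value.
x_new = 6.0
print(f"predicted y at x = {x_new}: {intercept + slope * x_new:.2f}")
```

The same fit is a one-liner in most statistical software; writing it out only makes the arithmetic visible.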

State-level SAT score data (1994-95)

[Figure: scatterplot of average total SAT score (y-axis, 900 to 1100) against estimated average public school teacher salary (x-axis, 25 to 50), one point per state, labeled with the state name.]

The SAT example

What is the outcome variable?

What is the covariate or predictor variable?

What other data might be part of this story?

State-level SAT score data (1994-95)

[Figure: scatterplot of the percentage of all eligible students taking the SAT (y-axis, 20 to 80) against estimated average public school teacher salary (x-axis, 25 to 50). A second version of the plot groups the states by % taking SAT: low, medium, high.]

State-level SAT score data (1994-95)

[Figure: average total SAT score (y-axis, 900 to 1100) against estimated average public school teacher salary (x-axis, 25 to 50), with states grouped by % taking SAT (low, medium, high). The plot appears in several versions: all states in a single panel, one panel per group (low, medium, high), and a single panel with each point labeled by state name.]

State-level SAT score data (1994-95)

What can we conclude from all of this? (BTW, this is an example of “Simpson’s paradox”.)
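
To make the paradox concrete, here is a small Python sketch. The numbers are invented for illustration (they are NOT the 1994-95 state data), and the helper name slope is an assumption of this sketch. They are chosen so that the trend within each group points one way while the trend in the pooled data points the other way.

```python
# Hypothetical (salary, score) pairs per participation group.
# Invented for illustration; these are NOT the 1994-95 state data.
groups = {
    "low":    [(25, 1040), (28, 1050), (31, 1060)],
    "medium": [(32, 950), (35, 960), (38, 970)],
    "high":   [(40, 880), (44, 890), (48, 900)],
}


def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Slope within each group, then slope ignoring the grouping.
within = {name: slope(pts) for name, pts in groups.items()}
pooled = slope([p for pts in groups.values() for p in pts])

for name, b in within.items():
    print(f"{name:>6} group slope: {b:+.2f}")
print(f"pooled slope: {pooled:+.2f}")
```

In this toy data every within-group slope is positive while the pooled slope is negative, because the grouping variable is related to both the predictor and the outcome.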

Beware of correlation!

Hat tip to www.tylervigen.com
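
One way a misleading correlation can arise is when two unrelated quantities both drift upward over time. The Python sketch below uses two invented series (hypothetical values, not taken from any real source) and a hand-written pearson helper. It compares the correlation of the raw levels with the correlation of the period-to-period changes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def changes(series):
    """Differences between consecutive values of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Two hypothetical yearly series, invented for illustration only.
a = [10, 12, 15, 16, 19, 22]
b = [100, 103, 104, 108, 109, 113]

r_levels = pearson(a, b)                     # both series trend upward
r_changes = pearson(changes(a), changes(b))  # shared trend removed

print(f"correlation of levels:  {r_levels:+.2f}")
print(f"correlation of changes: {r_changes:+.2f}")
```

In this toy example the levels are strongly positively correlated, yet the year-to-year changes are not, which suggests the shared trend is doing most of the work.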

Regression modeling

The process of using data to describe the relationship between outcomes and predictors is called modeling.

• Models are models, not reality.

• “All models are wrong, but some are useful.”

• Introduce structure to our model that balances realism with “goodness of fit”.
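
One common way to quantify “goodness of fit” is R², the share of the outcome's variation that a model accounts for. The slides do not name a specific measure, so treat R² here as an assumption. The Python sketch below compares a mean-only model with a straight-line model on invented data; the helper name r_squared is hypothetical.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def r_squared(y, fitted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Model 1: no structure at all, predict the mean for every observation.
flat = [mean_y] * len(y)

# Model 2: a least-squares straight line y = a + b*x.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
line = [a + b * xi for xi in x]

r2_flat = r_squared(y, flat)
r2_line = r_squared(y, line)
print(f"mean-only model R^2: {r2_flat:.3f}")
print(f"line model R^2:      {r2_line:.3f}")
```

A higher R² on its own does not make a model more realistic; adding structure can always improve the fit to the data in hand, which is why the two have to be balanced.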

Things to come

• Tools to help tell stories with data:

  - Software

  - Statistical methods

• Practice developing and conceiving models/stories.
