stat 155, section 2, last time


Upload: malana

Post on 14-Jan-2016

Page 1: Stat 155,  Section 2, Last Time

Stat 155, Section 2, Last Time

• Linear Regression

– Fit a line to data

• Least Squares Prediction

• Residual Diagnostic Plot

• Producing Data

• How to Sample?

– History of Presidential Election Polls

Page 2: Stat 155,  Section 2, Last Time

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 198-210, 218-225

Approximate Reading for Next Class:

Pages 231-240, 256-257

Page 3: Stat 155,  Section 2, Last Time

Common Problem

Adding lines to an Excel Plot

E.g. Textbook problem 2.17

• Plot Data

• Add line with “Add trendline”

• Add line: y = 35 + 0.5x

• Explicitly add least squares fit line
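For checking a least squares line outside of Excel, the fit can be computed directly from the data. This is a minimal sketch using the usual slope and intercept formulas; the function name `least_squares_line` and the data values are made up for illustration and are NOT the data from textbook problem 2.17.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data (assumption: not taken from the textbook)
xs = [10, 20, 30, 40, 50]
ys = [41, 44, 51, 54, 61]

a, b = least_squares_line(xs, ys)
print(f"least squares fit: y = {a:.2f} + {b:.2f}x")
```

The fitted line can then be compared by eye with a hand-chosen line such as y = 35 + 0.5x.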

Page 4: Stat 155,  Section 2, Last Time

Chapter 3: Producing Data

(how this is done is critical to conclusions)

Section 3.1: Statistical Settings

2 Main Types:

I. Observational Study

II. Designed Experiment

Page 5: Stat 155,  Section 2, Last Time

Producing Data

2 Main Types:

I. Observational Study

II. Experiment

(Make Changes, & Study Effect)

Apply “treatment” to individuals & measure “responses”

e.g. Clinical trials for drugs, agricultural trials

(safe? effective?) (max yield?)

Page 6: Stat 155,  Section 2, Last Time

Producing Data

2 Main Types:

I. Observational Study

II. Experiment

(common sense)

Caution: Thinking is required for each, both when you do statistics & when you need to understand somebody else’s results

Page 7: Stat 155,  Section 2, Last Time

Helpful Distinctions

(Critical Issue of “Good” vs. “Bad”)

I. Observational Studies:

A. Anecdotal Evidence

Idea: Study just a few cases

Problem: may not be representative

(or worse: only considered for this reason)

e.g. Cures for hiccups

Key Question: how were data chosen?

(early medicine: this gave crazy attempts at cures)

Page 8: Stat 155,  Section 2, Last Time

Helpful Distinctions

I. Observational Studies:

B. Sampling

Idea: Seek sample representative of population

Challenge: How to sample?

(turns out: not easy)

Page 9: Stat 155,  Section 2, Last Time

How to sample?

History of Presidential Election Polls

During campaigns, constantly hear in news “polls say …” How good are these? Why?

1936: Landon vs. Roosevelt

Literary Digest Poll: 43% for R

Result: 62% for R

What happened?

Sample size not big enough? 2.4 million

Biggest Poll ever done (before or since)

Page 10: Stat 155,  Section 2, Last Time

Bias in Sampling

Bias: Systematically favoring one outcome

(need to think carefully)

Selection Bias: Addresses from L. D. readers, phone books, club memberships

(representative of population?)

Non-Response Bias: Return-mail survey

(who had time?)

Page 11: Stat 155,  Section 2, Last Time

How to sample?

1936 Presidential Election (cont.)

Interesting Alternative Poll:

Gallup: 56% for R (sample size ~ 50,000)

Gallup’s prediction of the L. D. poll result: 44% for R (sample size ~ 3,000)

Predicted both the correct winner (actual result: 62% for R), and the L. D. error (43% for R)!

(what was better?)

Page 12: Stat 155,  Section 2, Last Time

Improved Sampling

Gallup’s Improvements:

(i) Personal Interviews

(attacks non-response bias)

(ii) Quota Sampling

(attacks selection bias)

Page 13: Stat 155,  Section 2, Last Time

Quota Sampling

Idea: make “sample like population”

So surveyor chooses people to give:

i. Right % male

ii. Right % “young”

iii. Right % “blue collar”

iv. …

This worked well, until …

Page 14: Stat 155,  Section 2, Last Time

How to sample?

1948        Dewey   Truman   sample size
Crossley    50%     45%
Gallup      50%     44%      50,000
Roper       53%     38%      15,000
Actual      45%     50%      -

Note: Embarrassing for polls; famous photo of Truman holding up the headline “Dewey Defeats Truman”

Page 15: Stat 155,  Section 2, Last Time

What went wrong?

Problem: Unintentional Bias

(surveyors understood bias, but still made choices)

Lesson: Human Choice cannot give a Representative Sample

Surprising Improvement: Random Sampling

Now called “scientific sampling”

Random = Scientific???

Page 16: Stat 155,  Section 2, Last Time

Random Sampling

Key Idea: “random error” is smaller than “unintentional bias”, for large enough sample sizes

How large?

Current sample sizes: ~1,000 - 3,000

Note: now << 50,000 used in 1948.

So surveys are much cheaper

(thus many more done now….)

Page 17: Stat 155,  Section 2, Last Time

Random Sampling

How Accurate?

• Can (& will) calculate using “probability”

• Justifies term “scientific sampling”

• 2nd improvement over quota sampling
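The probability calculation itself comes later in the course. As a preview, here is a minimal sketch that assumes the standard 95% margin-of-error approximation for a simple random sample, 1.96 * sqrt(p(1 - p) / n), with p = 0.5 as the worst case. The formula is a standard result and is not stated in these slides; the function name `margin_of_error` is made up for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Assumption: uses the usual normal approximation z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes mentioned in the slides
for n in (1000, 3000, 50000, 2400000):
    print(n, round(100 * margin_of_error(n), 2), "percentage points")
```

This measures random error only. A biased sample stays biased however large it is, which is the lesson of the 2.4 million responses in 1936.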

Page 18: Stat 155,  Section 2, Last Time

And now for something completely different

Recall distribution of majors of students in this course:

[Figure: bar chart “Stat 155, Section 2, Majors”. Vertical axis: Frequency (0 to 0.4). Categories: Business / Man., Biology, Public Policy / Health, Pharm / Nursing, Journalism / Comm., Env. Sci., Other, Undecided]

Page 19: Stat 155,  Section 2, Last Time

And now for something completely different

A man goes into a drugstore and asks the pharmacist if he can give him something for the hiccups. The pharmacist promptly reaches out and slaps the man's face.

"What did you do that for?" the man asks.

Page 20: Stat 155,  Section 2, Last Time

And now for something completely different

"Well, you don't have the hiccups anymore, do you?"

The man says, "No, but my wife out in the car still does!"

Page 21: Stat 155,  Section 2, Last Time

And now for something completely different

An elderly woman went into the doctor's office. When the doctor asked why she was there, she replied, "I'd like to have some birth control pills."

Taken aback, the doctor thought for a minute and then said, "Excuse me, Mrs. Smith, but you're 75 years old. What possible use could you have for birth control pills?"

The woman responded, "They help me sleep better."

Page 22: Stat 155,  Section 2, Last Time

And now for something completely different

The doctor thought some more and continued, "How in the world do birth control pills help you to sleep?"

The woman said, "I put them in my granddaughter's orange juice and I sleep better at night."

Page 24: Stat 155,  Section 2, Last Time

Random Sampling

What is random?

Simple Random Sampling:

Each member of population is equally likely to be in sample

Key Idea: Different from “just choose some”
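To see the difference from “just choose some”, here is a minimal sketch of a simple random sample drawn by computer, using Python's standard `random.sample`. The population list and the seed are made up for illustration.

```python
import random

# Hypothetical population of 100 labelled individuals (assumption for illustration)
population = [f"person_{i}" for i in range(1, 101)]

# A seeded generator, so the same sample is reproduced on each run
rng = random.Random(155)

# random.sample draws without replacement; every individual is equally likely to be chosen
sample = rng.sample(population, 10)
print(sample)
```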

Page 25: Stat 155,  Section 2, Last Time

Random Sampling

An old (but still fun?) experiment:

Choose a number among 1,2,3,4

Old typical results: about 70% choose “3”

(perhaps you have seen this before…)

Main lesson: human choice does not give “equally likely” (i.e. random sample)

Page 26: Stat 155,  Section 2, Last Time

Random Sampling

How to choose a random sample?

Old Approaches:

– Random Number Table

– Roll Dice

Modern Approach:

– Computer Generated

Page 27: Stat 155,  Section 2, Last Time

Random Sampling

EXCEL generation of random samples:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg16.xls

Goal 1: Generate Random Numbers

EXCEL approaches:

• RAND function

• Tools → Data Analysis → Random Number Generation

Page 28: Stat 155,  Section 2, Last Time

EXCEL Random Sampling

Goal 2: Randomly Reorder List

EXCEL approach:

• Highlight block with list & random num’s

• Sort whole thing on numbers

Goal 3: Random Sample from List

• Choose 1st subset from random re-order

• Works since each item is equally likely to be in each spot
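The same “attach random numbers, then sort” idea can be sketched outside of EXCEL. This is only an illustration of the approach; the function names `random_reorder` and `random_sample` and the list of names are made up.

```python
import random


def random_reorder(items, rng):
    """Pair each item with a random number, then sort the pairs on the numbers."""
    keyed = [(rng.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed]


def random_sample(items, k, rng):
    """Take the first k items of a random re-ordering."""
    return random_reorder(items, rng)[:k]


rng = random.Random(2007)  # arbitrary seed, so the result is reproducible
names = ["Ann", "Bob", "Cho", "Dee", "Eli", "Fay"]  # hypothetical list

print(random_reorder(names, rng))
print(random_sample(names, 3, rng))
```

Python's built-in `random.shuffle` and `random.sample` do the same jobs directly; the version above just mirrors the spreadsheet steps.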

Page 29: Stat 155,  Section 2, Last Time

EXCEL Details

RAND:

• Not available among “Statistical” functions

• But can find on “All” menu

• Note no (explicit) inputs

• Just put in desired cell

• Drag downwards for several random #s

• Caution: these change on each re-computation

• Thus not recommended for choosing a random sample

Page 30: Stat 155,  Section 2, Last Time

EXCEL Details

Tools → Data Analysis → Random Number Generation:

• Set: # Variables: 1

Distribution: Uniform (over [0,1])

• Generates Fixed List

(doesn’t change with re-computation)

(note entries are “just numbers”)

• Thus stable for later interpretation

• Recommended for random sample choice

Page 31: Stat 155,  Section 2, Last Time

EXCEL Details

Sorting Lists:

• Highlight Block with Both:

– Names to sort

– Random numbers

• Data → Sort → Choose Column

• Result is random re-ordering of List

Page 32: Stat 155,  Section 2, Last Time

Random Sampling HW

HW:

C7: For the letters A – L, use EXCEL to:

(a) Put in a random order.

(b) Choose a random sample of 6.

(Hints: for (a), want each equally likely,

for (b), reorder, and choose a subset)

Page 33: Stat 155,  Section 2, Last Time

Random Sampling HW

Interesting Question:

What is the % of Male Students at UNC?

(Your chance of date, or take 100% - to get your chance)

HW:

C8: Print Class Handout

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155HWC8.doc

Page 34: Stat 155,  Section 2, Last Time

Random Sampling HW

Notes on HW C8:

• 3 dumb ways to sample, 1 good one

• Goal is to learn about sampling, not “get right answer”

• Part 1, put symbol for yourself, Ms and Fs for others

• Put both count & % (% = 100 x count / 25)

• Part 2, “tally” is:

• Part 4, student phone directory available in Student Union?

Page 35: Stat 155,  Section 2, Last Time

Random Sampling HW

Notes on HW C8,

• Hints on Part 4:

– For each draw, first draw a “random page”

– Tools → Data Analysis → Random Number Generation → Uniform is one way to do this

– In “Uniform”, you need to set “Parameters” to 0 and “number of pages”.

– This gives a random decimal; to get an integer, round up, using CEILING

– In CEILING, set “significance” to 1.

Page 36: Stat 155,  Section 2, Last Time

Random Sampling HW

Notes on HW C8,

• Hints on Part 4 (cont.):

– Next Choose Random Column

– Next Choose Random Name

– Caution: Different numbers on each page.

– Challenge: still make equally likely

– Approach: choose larger number.

– Approach: when not there, just toss it out

– Approach: then do a “redraw”

– Also redraw if can’t tell gender

Page 37: Stat 155,  Section 2, Last Time

More On Surveys

More Common Sense:

How you ask the question makes a big difference

HW:

3.57, 3.59

Page 38: Stat 155,  Section 2, Last Time

And Now for Something Completely Different

Extreme Bicycling

Need a bicycle helmet there?

Page 43: Stat 155,  Section 2, Last Time

More about Sampling

The “simple random sample” (recall “each equally likely”) can be expensive

(e.g. nationwide political poll, collected by personal interview)

So there are many cheaper variations:

– Stratified Sampling

– Multi Stage Sampling

– See text

– And there are many others as well
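The slides leave the details of these variations to the text. As a rough sketch only, stratified sampling is usually described as: split the population into groups (“strata”), then take a simple random sample within each group. The strata, sizes and function name below are made up for illustration.

```python
import random


def stratified_sample(strata, sizes, rng):
    """Take a simple random sample of the requested size within each stratum."""
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


# Hypothetical strata (assumption for illustration)
strata = {
    "urban": [f"u{i}" for i in range(60)],
    "rural": [f"r{i}" for i in range(40)],
}
sizes = {"urban": 6, "rural": 4}

rng = random.Random(3)  # arbitrary seed
print(stratified_sample(strata, sizes, rng))
```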

Page 44: Stat 155,  Section 2, Last Time

Sampling for Experiments

II. Experiments

(Recall I was Observational Studies, now take similar look at II)

Terminology:

“treatments” are applied to “individuals”

i.e. to “subjects”

i.e. to “experimental units”

Page 45: Stat 155,  Section 2, Last Time

Sampling for Experiments

A “treatment” is: a combination of “levels” of explanatory variables (quantities), called “factors”.

E.g. Medicine, Agriculture, …

Page 46: Stat 155,  Section 2, Last Time

Sampling for Experiments

Agriculture Example:

Study how plant growth depends on:

fertilizer and water

So plants = “experimental units”, i.e. “subjects”

“Factors” are fertilizer and water,

Each plant gets some “level” of each.
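To make “treatment = combination of levels of factors” concrete, here is a minimal sketch that lists every treatment for the fertilizer and water example. The particular levels are invented for illustration.

```python
from itertools import product

# Factors from the agriculture example; the levels are hypothetical
factors = {
    "fertilizer": ["none", "low", "high"],
    "water": ["little", "lots"],
}

# Each treatment is one level of every factor
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)
```

With 3 fertilizer levels and 2 water levels there are 3 x 2 = 6 treatments.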

Page 47: Stat 155,  Section 2, Last Time

HW on Sampling Terminology

HW:

3.9

3.11

Page 48: Stat 155,  Section 2, Last Time

Design of Experiments

The “design” of an experiment is the assignment of levels and treatments to experimental units

(just as “choice of sample” was critical for sampling, this is too. There is a huge literature on this, including current research)

Page 49: Stat 155,  Section 2, Last Time

Design of Experiments

Key Design Issues:

1. Control

Idea: Eliminate “lurking variable” effects, by comparing treatments on groups of similar experimental units.

Page 50: Stat 155,  Section 2, Last Time

Controlled Experiments

Common Type: compare “treatment” with “placebo”, a “sham treatment” that controls for psychological effects

(think you are better, just because you are treated, so you are better…)

Called a “blind” experiment

Page 51: Stat 155,  Section 2, Last Time

Controlled Experiments

Further Refinement:

“Double Blind” experiment means neither the patient nor the doctor knows whether the treatment is real or not

Eliminates possible doctor bias

Page 52: Stat 155,  Section 2, Last Time

Design of Experiments

2. Randomization

Useful method for choosing groups above

(e.g. Treatment and Control)

Recall: Different from “just choose some”,

instead means “make each equally likely”

Page 53: Stat 155,  Section 2, Last Time

Design of Experiments

2. Randomization

Big Plus: Eliminates biases, i.e. effects of “lurking variables”

(same as random choice of samples, again pay price of added variability, but well worth it)
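Randomly splitting subjects into Treatment and Control groups can be sketched the same way as a random sample: shuffle, then cut the list in two. This is a minimal illustration; the function name `randomize` and the subject labels are made up.

```python
import random


def randomize(units, rng):
    """Shuffle the units, then split them into (treatment, control) halves."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical subjects
rng = random.Random(53)  # arbitrary seed

treatment, control = randomize(subjects, rng)
print("Treatment:", treatment)
print("Control:  ", control)
```

Nobody chooses who goes where, so no lurking variable can systematically favor one group.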

Page 54: Stat 155,  Section 2, Last Time

Design of Experiments

3. Replication

Idea: Reduce chance variation by applying same treatment to several (even many?) experimental units.

How many replications are needed?

(depends on context: tradeoff between cost and reduction of variation)

Will build tools to study (based on probability)
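Until those tools are available, a small simulation can hint at the effect of replication. This sketch assumes a hypothetical response that varies around 10 with standard deviation 2 (a normal model chosen only for illustration) and looks at how much the average response varies as the number of replications grows.

```python
import random
import statistics


def spread_of_mean(n_reps, rng, trials=2000):
    """Standard deviation of the average response over many simulated experiments."""
    means = [
        statistics.mean(rng.gauss(10, 2) for _ in range(n_reps))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


rng = random.Random(54)  # arbitrary seed
for n_reps in (1, 4, 25):
    print(n_reps, "replications: spread of average =", round(spread_of_mean(n_reps, rng), 2))
```

More replications give a steadier average, at the cost of more experimental units.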

Page 55: Stat 155,  Section 2, Last Time

Design of Experiments

Fancier Designs

(there are many, some in text)

• Blocks

• Matched Pairs

• Balanced Designs

Page 56: Stat 155,  Section 2, Last Time

Example of an Experiment

(to tie above ideas together)

Gastric Freezing:

Treatment for stomach ulcers

– Anesthetize patient

– Put balloon in stomach

– Fill with freezing coolant