Stat 512 – Day 5 Statistical significance with quantitative response variable

Post on 22-Dec-2015


Page 1: Stat 512 – Day 5 Statistical significance with quantitative response variable

Stat 512 – Day 5

Statistical significance with quantitative response variable

Page 2

Last Time – Summarizing Quantitative Variables

Graphical summaries: (parallel) dotplots, boxplots, stemplots, histograms
• Shape (skewed?, “even”?), center, spread, unusual observations
• Try several different graphs, scalings

Numerical summaries
• Center: median (five-number summary), mean
  Mean = average of all values (not “resistant”)
  Median = “typical” value
• Spread: interquartile range (IQR = Q3 - Q1), standard deviation

Page 3

Last Time – Summarizing Quantitative Variables (cont.)

Interquartile range
Width of middle 50% of data values
Length of box in boxplot

• 1978 IQR = 81-58 = 23 min

• 2003 IQR = 98-87 = 11 min

• Without outliers IQR = 98-87 = 11 min

Page 4

Last Time – Summarizing Quantitative Variables (cont.)

Interquartile range, IQR
Standard deviation
Want to compare the distance of the observations from the mean
• Deviation from mean: $y_i - \bar{y}$
• Absolute deviations
• Squared deviations

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$
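As a small worked sketch of the standard deviation formula, the Python below computes the deviations, absolute deviations, and squared deviations step by step. The five data values are hypothetical (they are not from the Old Faithful data), and the variable names are chosen only for illustration.

```python
import math
import statistics

# Hypothetical data values (minutes); not taken from the slides.
y = [58, 62, 70, 75, 85]
n = len(y)
y_bar = sum(y) / n

# Deviations from the mean always sum to zero, which is why we
# summarize their size with absolute or squared deviations instead.
deviations = [yi - y_bar for yi in y]
abs_devs = [abs(d) for d in deviations]
sq_devs = [d ** 2 for d in deviations]

# Sample variance divides the sum of squared deviations by n - 1.
variance = sum(sq_devs) / (n - 1)
s = math.sqrt(variance)

print("mean:", y_bar)
print("sum of deviations:", sum(deviations))
print("mean absolute deviation:", sum(abs_devs) / n)
print("variance s^2:", variance)
print("standard deviation s:", s)
print("statistics.stdev agrees:", math.isclose(s, statistics.stdev(y)))
```

The hand computation and the standard library's `statistics.stdev` should agree, since both use the n - 1 divisor.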

Page 5

Old Faithful

1978 SD = 13 minutes
2003 SD = 8.5 minutes
Without outliers SD = 6.9 minutes (SD is not resistant!)
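To illustrate what "not resistant" means, here is a minimal Python sketch that compares the IQR and the SD with and without two low outliers. The data are made up for illustration (they are not the Old Faithful measurements), and the quartiles use the `statistics.quantiles` default "exclusive" convention, which may differ slightly from other software.

```python
import statistics

def iqr(values):
    """Interquartile range Q3 - Q1 (default 'exclusive' quartile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical times in minutes; not taken from the slides.
typical = [86, 88, 89, 90, 91, 92, 93, 94, 95, 96, 98]
with_outliers = [56, 60] + typical  # add two unusually low values

iqr_without = iqr(typical)
iqr_with = iqr(with_outliers)
sd_without = statistics.stdev(typical)
sd_with = statistics.stdev(with_outliers)

print(f"IQR without outliers: {iqr_without:.2f}, with outliers: {iqr_with:.2f}")
print(f"SD  without outliers: {sd_without:.2f}, with outliers: {sd_with:.2f}")
```

In this made-up example the IQR barely moves when the outliers are added, while the SD changes substantially, which is the same pattern the Old Faithful summaries show.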

Page 6

Last Time – Summarizing Quantitative Variables (cont.)

Interquartile range, IQR
• Width of middle 50% of data values
• Length of box in boxplot

Standard deviation, s
• Want to compare the distance of the observations from the mean
• Loose interpretation as a typical deviation from the mean of the data values

Page 7

Example 3

What do we mean by variability?
• Most among classes A, B, C
• Least among classes A, B, C
• Most between C and D
• Most between D and E
• What about F?

Page 8

PP 4

Quartiles
Vocabulary
• “Normal”
• “In lower quartile”
Center vs. spread
• SF vs. Raleigh (59.5 and 57 degrees)
• Quality control, freezer, distance of home runs, test scores in a class, real estate prices, drumstick breakage strength, times of cross country team, medical team response time, commuting times, earthquake strengths, weight loss

Page 9

Notes on Using Minitab

• Worksheets vs. Projects
• Saving graph windows
• Stacked vs. unstacked data

Page 10

What’s Left?

Have learned how to perform descriptive statistics with a quantitative response variable

Found a difference in average rainfall amounts between seeded and unseeded clouds (442 acre-feet vs. 164.6 acre-feet).

Are you convinced that this reflects a true treatment effect from cloud seeding?

Page 11

Example 1: Sleep Deprivation and Visual Learning

“Visual discrimination learning requires post-training sleep,” Stickgold, R., James, L., & Hobson, J.A. Nature Neuroscience, 2:1237-1238, 2000.

Page 12

Example 1

[Figure: dotplots of improvement by sleep condition (deprived, unrestricted); improvement axis labeled from -16 to 40]

Sleep group     Sample size   Mean improvement   Median improvement
Deprived        11            3.90               4.50
Unrestricted    10            19.82              16.55

Page 13

Example Summary

These data come from a randomized, comparative experiment. The dotplots and descriptive statistics reveal that the sleep-deprived subjects tended to have lower improvements than those permitted unrestricted sleep.

But is this difference statistically significant?

Page 14

How to Decide?

Page 15

All possible random assignments

Page 16

Example Summary (cont.)

Randomization alone rarely produced differences in group means as extreme as in the actual study (the p-value is less than .01). Thus, we have fairly strong evidence that the learning improvements are genuinely lower for the sleep-deprived subjects. Moreover, because this was a randomized experiment, we can draw a causal conclusion: the sleep deprivation caused the lower improvements.
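The idea of examining all possible random assignments can be sketched in Python. The individual improvement scores are not listed in these slides; the values below are an assumption, chosen so that they reproduce the group sizes, means, and medians in the Example 1 table. The one-sided comparison (unrestricted minus deprived at least as large as observed) is also an assumption about how the p-value was defined.

```python
from itertools import combinations
from math import comb

# Improvement scores stored in tenths so all arithmetic is exact.
# ASSUMPTION: these values are not in the slides; they were chosen to
# match the table (n = 11 and 10, means 3.90 and 19.82, medians 4.50 and 16.55).
deprived = [-147, -107, -107, 22, 24, 45, 72, 96, 100, 213, 218]
unrestricted = [-70, 116, 121, 126, 145, 186, 252, 305, 345, 456]

pooled = deprived + unrestricted
n_unres = len(unrestricted)
observed_sum = sum(unrestricted)
observed_diff = (sum(unrestricted) / n_unres - sum(deprived) / len(deprived)) / 10

# With the pooled total fixed, the difference in means grows with the
# unrestricted group's sum, so "as extreme as observed" reduces to
# comparing sums. Each combination is one way to pick that group.
extreme = sum(
    1 for group in combinations(pooled, n_unres) if sum(group) >= observed_sum
)
total = comb(len(pooled), n_unres)
p_value = extreme / total

print(f"observed difference in means: {observed_diff:.2f}")
print(f"{extreme} of {total} assignments are at least as extreme")
print(f"exact one-sided p-value: {p_value:.4f}")
```

Enumerating every assignment is feasible here because there are only 21 subjects; with larger groups a random sample of assignments is usually simulated instead.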

Page 17

Example 2

Actual study: $\bar{x}_{unrestricted} - \bar{x}_{deprived} = 15.92$

Hypothetical data: $\bar{x}_{unrestricted} - \bar{x}_{deprived} = 15.92$

Page 18

Example 3: Lifetimes of Notables

Writers (n=20) | Stem | Scientists (n=20)
             9 |  2   |
             5 |  3   |
             3 |  4   | 8
             9 |  5   | 0389
      76622100 |  6   | 66
           751 |  7   | 0357789
          9530 |  8   | 679
             0 |  9   | 004

Leaf unit = 1 year

Page 19

Example Summary - descriptive

Graphical and numerical summaries reveal that scientists do tend to live longer than writers, and the difference in median lifetimes is 10 years (76 for scientists, 66 for writers). Both distributions are roughly symmetric, perhaps a bit skewed to the left. The lifetimes vary more for the writers in that they range from the 20s through 90 years, as opposed to scientists ranging from the 40s through the 90s, but on the other hand, the writers’ lifetimes have a strong concentration in the 60s. Neither group has obvious outliers.
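The descriptive claims can be checked with a short Python sketch. The lifetimes are read off the back-to-back stemplot in Example 3 (leaf unit = 1 year); the variable names are only illustrative.

```python
import statistics

# Lifetimes in years, read from the Example 3 stemplot.
writers = [29, 35, 43, 59, 60, 60, 61, 62, 62, 66,
           66, 67, 71, 75, 77, 80, 83, 85, 89, 90]
scientists = [48, 50, 53, 58, 59, 66, 66, 70, 73, 75,
              77, 77, 78, 79, 86, 87, 89, 90, 90, 94]

median_writers = statistics.median(writers)
median_scientists = statistics.median(scientists)

for name, data in [("writers", writers), ("scientists", scientists)]:
    print(f"{name}: n={len(data)}, min={min(data)}, "
          f"median={statistics.median(data)}, max={max(data)}")
print("difference in medians:", median_scientists - median_writers)
```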

Page 20

Example Summary - inferential

The simulation reveals that the approximate p-value for comparing the group means is about .07. This suggests that if there were no difference between the groups, it would be unlikely, but not terribly so, for such a large difference to occur by chance alone. However, we cannot attribute the longer lifetimes to the choice of occupation, because this observational study does not control for confounding variables. One explanation for the observed tendency is that scientists require more formal training in order to succeed than writers, so someone who dies young but famous is more likely to have achieved fame as a writer than as a scientist.
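A simulation of this kind can be sketched in Python as follows. The lifetimes come from the Example 3 stemplot; the number of repetitions, the random seed, and the one-sided comparison (scientists minus writers at least as large as observed) are assumptions, since the slides do not say how the simulation was set up. With 40 people there are far too many possible regroupings to list them all, so the sketch samples random regroupings instead.

```python
import random

# Lifetimes in years, read from the Example 3 stemplot.
writers = [29, 35, 43, 59, 60, 60, 61, 62, 62, 66,
           66, 67, 71, 75, 77, 80, 83, 85, 89, 90]
scientists = [48, 50, 53, 58, 59, 66, 66, 70, 73, 75,
              77, 77, 78, 79, 86, 87, 89, 90, 90, 94]

def mean_diff(group_a, group_b):
    """Mean of group_a minus mean of group_b."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

observed_diff = mean_diff(scientists, writers)

rng = random.Random(512)   # assumed seed, only for reproducibility
reps = 10_000              # assumed number of random regroupings
pooled = writers + scientists
n_sci = len(scientists)

count_extreme = 0
for _ in range(reps):
    rng.shuffle(pooled)    # randomly relabel who counts as a "scientist"
    if mean_diff(pooled[:n_sci], pooled[n_sci:]) >= observed_diff:
        count_extreme += 1

p_value = count_extreme / reps
print(f"observed difference in means: {observed_diff:.2f} years")
print(f"approximate one-sided p-value: {p_value:.3f}")
```

Because the p-value is estimated from random regroupings, it will vary a little from run to run when the seed changes.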

Page 21

Flexibility

Page 22

For Thursday

• PP 5: see HW handout for graph
• Reading
• Finally start talking about selecting the observational/experimental units in the first place
• HW 3
• Perhaps some time for groups to meet together and brainstorm?