stat 512 – day 5 statistical significance with quantitative response variable
Posted on 22-Dec-2015
Stat 512 – Day 5
Statistical significance with quantitative response variable
Last Time – Summarizing Quantitative Variables
Graphical summaries: (parallel) dotplots, boxplots, stemplots, histograms
• Look at shape (skewed? “even”?), center, spread, unusual observations
• Try several different graphs and scalings
Numerical summaries
• Center: median (five-number summary), mean
• Mean = average of all values (not “resistant”)
• Median = middle value of the sorted data, a “typical” value (resistant)
• Spread: interquartile range (IQR = Q3 - Q1), standard deviation
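To see why the mean is not “resistant” while the median is, here is a minimal Python sketch using the standard-library statistics module. The data values are hypothetical, invented only for illustration; they are not from the course data sets.

```python
import statistics

# Hypothetical values, invented for illustration only
values = [2, 3, 4, 5, 6]
# Same values, but the largest one replaced by an extreme outlier
with_outlier = [2, 3, 4, 5, 60]

def center_summary(data):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(data), statistics.median(data)

print(center_summary(values))        # (4, 4)
print(center_summary(with_outlier))  # (14.8, 4): the mean jumps, the median stays put
```

One extreme value drags the mean far from the bulk of the data, while the median only depends on the middle of the sorted list.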
Last Time – Summarizing Quantitative Variables (cont.)
Interquartile range
• Width of the middle 50% of data values
• Length of the box in a boxplot
• 1978: IQR = 81 - 58 = 23 min
• 2003: IQR = 98 - 87 = 11 min
• Without outliers: IQR = 98 - 87 = 11 min (unchanged: the IQR is resistant)
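A small Python sketch of the IQR and its resistance to outliers, again on hypothetical waiting times invented for illustration. Software packages use slightly different quartile rules; this sketch uses statistics.quantiles with method="exclusive" (the (n+1)p interpolation rule). We believe Minitab uses a similar rule, but treat that as an assumption, since hand or software answers may differ slightly.

```python
import statistics

def iqr(data):
    """Interquartile range Q3 - Q1.

    Quartiles come from statistics.quantiles with method="exclusive",
    i.e. the (n+1)p interpolation rule; other quartile rules can give
    slightly different values.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1

# Hypothetical waiting times in minutes, invented for illustration
times = [55, 60, 62, 65, 70, 72, 75, 80]
# Same data with the largest value replaced by an outlier
times_outlier = [55, 60, 62, 65, 70, 72, 75, 150]

print(iqr(times))          # 13.75
print(iqr(times_outlier))  # 13.75: the outlier does not move Q1 or Q3
```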
Last Time – Summarizing Quantitative Variables (cont.)
Interquartile range, IQR
Standard deviation
• Want to compare the distance of the observations from the mean
• Deviation from mean: $y_i - \bar{y}$
• Absolute deviations $|y_i - \bar{y}|$ or squared deviations $(y_i - \bar{y})^2$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
Old Faithful
• 1978: SD = 13 minutes
• 2003: SD = 8.5 minutes
• Without outliers: SD = 6.9 minutes (SD is not resistant!)
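The standard deviation formula can be computed step by step: deviations, squared deviations, divide by n - 1, take the square root. This Python sketch checks the hand formula against statistics.stdev and shows that one outlier inflates s. The data are hypothetical, invented for illustration.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation s = sqrt(sum((y_i - ybar)^2) / (n - 1))."""
    n = len(data)
    ybar = sum(data) / n
    deviations = [y - ybar for y in data]   # y_i - ybar
    squared = [d * d for d in deviations]   # (y_i - ybar)^2
    return math.sqrt(sum(squared) / (n - 1))

# Hypothetical data, invented for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
data_outlier = [2, 4, 4, 4, 5, 5, 7, 90]

# Raw deviations always sum to zero, which is why they are
# squared (or made absolute) before being combined.
ybar = sum(data) / len(data)
print(sum(y - ybar for y in data))   # 0.0

print(sample_sd(data))               # about 2.14
print(statistics.stdev(data))        # same value from the standard library
print(sample_sd(data_outlier))       # much larger: s is not resistant
```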
Last Time – Summarizing Quantitative Variables (cont.)
Interquartile range, IQR
• Width of the middle 50% of data values
• Length of the box in a boxplot
Standard deviation, s
• Compares the distances of the observations from the mean
• Loose interpretation: a typical deviation of the data values from the mean
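Example 3
What do we mean by variability?
• Which class has the most variability among classes A, B, C?
• Which has the least variability among classes A, B, C?
• Which has more variability: C or D?
• Which has more variability: D or E?
• What about F?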
PP 4
Quartiles Vocabulary
“Normal” “In lower quartile”
Center vs. spread
• SF vs. Raleigh: similar average temperatures (59.5 and 57 degrees), but the center alone says nothing about the spread
• Contexts where spread matters: quality control, freezer, distance of home runs, test scores in a class, real estate prices, drumstick breakage strength, times of cross country team, medical team response time, commuting times, earthquake strengths, weight loss
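Two data sets can share a center and still differ greatly in spread. The temperatures below are hypothetical, invented for illustration (they are not the SF or Raleigh data); the sketch reports the mean and the sample standard deviation of each.

```python
import statistics

# Hypothetical temperatures (degrees), invented for illustration only:
# both cities have the same center but very different spreads.
mild_city = [55, 57, 59, 61, 63]
variable_city = [40, 50, 59, 68, 78]

for name, temps in [("mild", mild_city), ("variable", variable_city)]:
    print(name, statistics.mean(temps), round(statistics.stdev(temps), 2))
# mild 59 3.16
# variable 59 14.87
```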
Notes on Using Minitab
• Worksheets vs. projects
• Saving graph windows
• Stacked vs. unstacked data
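In stacked form, all responses sit in one column and a second column records each value's group; in unstacked form, each group gets its own column. The Python sketch below converts between the two layouts using plain lists and a dict. The function names and data are hypothetical, chosen for illustration; this is not Minitab's own Stack/Unstack command.

```python
from collections import defaultdict

def unstack(values, groups):
    """Stacked -> unstacked: one list of values per group label."""
    columns = defaultdict(list)
    for value, group in zip(values, groups):
        columns[group].append(value)
    return dict(columns)

def stack(columns):
    """Unstacked -> stacked: parallel lists of values and group labels."""
    values, groups = [], []
    for group, column in columns.items():
        values.extend(column)
        groups.extend([group] * len(column))
    return values, groups

# Hypothetical stacked data: a response column plus a group column
values = [3.1, 4.7, 2.2, 5.0, 3.9]
groups = ["A", "B", "A", "B", "A"]

columns = unstack(values, groups)
print(columns)         # {'A': [3.1, 2.2, 3.9], 'B': [4.7, 5.0]}
print(stack(columns))  # values regrouped by label, with matching labels
```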
What’s Left?
Have learned how to perform descriptive statistics with a quantitative response variable
Found a difference in average rainfall amounts between seeded and unseeded clouds (442 vs. 164.6 acre-feet).
Are you convinced that this reflects a true treatment effect from cloud seeding?
Example 1: Sleep Deprivation and Visual Learning
“Visual discrimination learning requires post-training sleep,” Stickgold, R., James, L., & Hobson, J.A. Nature Neuroscience, 3:1237-1238, 2000.
Example 1
[Figure: dotplots of improvement by sleep condition (deprived, unrestricted)]
Sleep group    Sample size   Mean improvement   Median improvement
Deprived       11            3.90               4.50
Unrestricted   10            19.82              16.55
Example Summary
These data come from a randomized, comparative experiment. The dotplots and descriptive statistics reveal that the sleep-deprived subjects tended to have lower improvements than those permitted unrestricted sleep.
But is this difference statistically significant?
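How Do We Decide?
• Consider all possible random assignments of the 21 subjects to the two groups (11 deprived, 10 unrestricted), and see how often random assignment alone produces a difference in group means at least as large as the one actually observed.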
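The “all possible random assignments” idea can be made concrete in Python. The sketch counts the possible splits of 21 subjects into groups of 11 and 10, computes the observed difference in means from the summary table, and then enumerates every assignment for an exact one-sided p-value. The raw sleep data are not given here, so the enumeration runs on a tiny hypothetical data set (invented values); the one-sided direction (treatment mean larger) is also an assumption.

```python
import itertools
import math
import statistics

# Number of ways to split 21 subjects into groups of 11 (deprived)
# and 10 (unrestricted), as in the sleep study
print(math.comb(21, 11))       # 352716

# Observed difference in group means from the summary table
print(round(19.82 - 3.90, 2))  # 15.92

def exact_randomization_pvalue(treatment, control):
    """One-sided p-value: the fraction of all possible random assignments
    whose difference in means (treatment - control) is at least as large
    as the observed difference."""
    pooled = list(treatment) + list(control)
    k = len(treatment)
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    count = total = 0
    for idx in itertools.combinations(range(len(pooled)), k):
        chosen = set(idx)
        group1 = [pooled[i] for i in idx]
        group2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = statistics.fmean(group1) - statistics.fmean(group2)
        if diff >= observed - 1e-9:  # small tolerance for float rounding
            count += 1
        total += 1
    return count / total

# Tiny hypothetical example (6 subjects, 3 per group), invented data:
# only 1 of the 20 possible assignments is as extreme as the observed one.
print(exact_randomization_pvalue([8, 9, 10], [1, 2, 3]))  # 0.05
```

Full enumeration is feasible here (352,716 splits) but grows quickly with sample size, which is why simulating a large number of random assignments is the usual practical shortcut.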
Example Summary (cont.)
Randomization alone rarely produced differences in group means as extreme as the one in the actual study (the p-value is less than .01). Thus, we have fairly strong evidence that the learning improvements are genuinely lower for the sleep-deprived subjects. Moreover, because this was a randomized experiment, we can draw a causal conclusion: the sleep deprivation caused the lower improvements.
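Example 2
[Figure: actual study vs. hypothetical data]
• Actual study: $\bar{x}_{\text{unrestricted}} - \bar{x}_{\text{deprived}} = 15.92$
• Hypothetical data: $\bar{x}_{\text{unrestricted}} - \bar{x}_{\text{deprived}} = 15.92$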
Example 3: Lifetimes of Notables
Writers (n=20)         Scientists (n=20)
             9 | 2 |
             5 | 3 |
             3 | 4 | 8
             9 | 5 | 0389
      76622100 | 6 | 66
           751 | 7 | 0357789
          9530 | 8 | 679
             0 | 9 | 004
Leaf unit = 1 year
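The stemplot can be read back into data values, which lets us check the numerical summaries. This Python sketch assumes the usual back-to-back layout (writers' leaves to the left of the stem, scientists' to the right, leaf unit 1 year); the reconstructed lists reproduce the medians of 66 and 76 years quoted in the summary.

```python
import statistics

# Back-to-back stemplot rows: (writers' leaves, stem, scientists' leaves).
# Leaf unit = 1 year, so value = 10 * stem + leaf.
ROWS = [
    ("9", 2, ""),
    ("5", 3, ""),
    ("3", 4, "8"),
    ("9", 5, "0389"),
    ("76622100", 6, "66"),
    ("751", 7, "0357789"),
    ("9530", 8, "679"),
    ("0", 9, "004"),
]

def decode(rows):
    """Turn stemplot rows into two sorted lists of values."""
    writers, scientists = [], []
    for left, stem, right in rows:
        writers += [10 * stem + int(d) for d in left]
        scientists += [10 * stem + int(d) for d in right]
    return sorted(writers), sorted(scientists)

writers, scientists = decode(ROWS)
for name, ages in [("writers", writers), ("scientists", scientists)]:
    print(name, len(ages), statistics.median(ages), min(ages), max(ages))
# writers 20 66.0 29 90
# scientists 20 76.0 48 94
```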
Example Summary (descriptive)
Graphical and numerical summaries reveal that scientists do tend to live longer than writers: the difference in median lifetimes is 10 years (76 for scientists, 66 for writers). Both distributions are roughly symmetric, perhaps a bit skewed to the left. The lifetimes vary more for the writers, in that they range from the 20s through 90 years, as opposed to scientists ranging from the 40s through the 90s; on the other hand, the writers’ lifetimes have a strong concentration in the 60s. Neither group has obvious outliers.
Example Summary (inferential)
The simulation reveals that the approximate p-value for comparing the group means is about .07. This suggests that if there were no difference between the groups, a difference this large would be somewhat unlikely, but not terribly so, to occur by chance alone. However, we cannot attribute the longer lifetimes to the choice of occupation, because this observational study does not control for confounding variables. One explanation for the observed tendency is that scientists need more formal training than writers in order to succeed, so someone who dies young but famous is more likely to have achieved fame as a writer than as a scientist.
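A simulation like the one described can be sketched in Python by repeatedly shuffling the pooled lifetimes into two groups of 20. The data below are read off the stemplot; the one-sided direction (scientists longer) is our assumption about the comparison behind the “about .07,” and the seed and number of repetitions are arbitrary choices. Because this study is observational, the shuffling only models chance, not random assignment to occupations.

```python
import random
import statistics

# Lifetimes (years) read off the back-to-back stemplot
writers = [29, 35, 43, 59, 60, 60, 61, 62, 62, 66,
           66, 67, 71, 75, 77, 80, 83, 85, 89, 90]
scientists = [48, 50, 53, 58, 59, 66, 66, 70, 73, 75,
              77, 77, 78, 79, 86, 87, 89, 90, 90, 94]

def simulated_pvalue(group1, group2, reps=10_000, seed=512):
    """Approximate one-sided p-value: how often randomly re-splitting the
    pooled values gives mean(group1) - mean(group2) at least as large as
    the observed difference."""
    rng = random.Random(seed)
    observed = statistics.fmean(group1) - statistics.fmean(group2)
    pooled = group1 + group2          # new list; the inputs are not shuffled
    n1 = len(group1)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:])
        if diff >= observed - 1e-9:   # small tolerance for float rounding
            hits += 1
    return observed, hits / reps

observed, p = simulated_pvalue(scientists, writers)
print(observed)  # 7.25 (scientists minus writers, in years)
print(p)         # roughly .07; the exact value depends on the seed
```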
Flexibility
For Thursday
• PP 5 (see HW handout for graph)
• Reading: we finally start talking about selecting the observational/experimental units in the first place
• HW 3: perhaps some time for groups to meet together and brainstorm?