Exam 1
Median: 74
Quartiles: 68, 84
Interquartile range: 16
Mean: 74.9
Standard deviation: 12.5
z = -1: 62.4; z = +1: 87.4
Worst Question: Operational definition (33/84)
Introduction to Hypothesis Testing: The Binomial Test
9/30
ESP
• Your friend claims he can predict the future
• You flip a coin 5 times, and he’s right on 4
• Is your friend psychic?
Two Hypotheses
• Hypothesis
  – A theory about how the world works
  – Proposed as an explanation for data
  – Posed as a statement about population parameters
• Psychic
  – Some ability to predict the future
  – Not perfect, but better than chance
• Luck
  – Random chance
  – Right half the time, wrong half the time
• Hypothesis testing
  – A method that uses inferential statistics to decide which of two hypotheses the data support
Likelihood
• Likelihood
  – Probability distribution of a statistic, according to each hypothesis
  – If a result is likely according to a hypothesis, we say the data “support” or “are consistent with” the hypothesis
• Likelihood for f(correct)
  – Psychic: hard to say; how psychic?
  – Luck: can work out exactly; 50/50 chance each time
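[Diagram: Hypotheses (population) and Statistics (sample); Likelihood (probability) runs from hypotheses to statistics, and Support (inference) runs from statistics back to hypotheses]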
Likelihood According to Luck
All 32 possible sequences of 5 flips (Y = guess correct, N = guess incorrect), each equally likely under Luck:
YYYYY  YNYYY  NYYYY  NNYYY
YYYYN  YNYYN  NYYYN  NNYYN
YYYNY  YNYNY  NYYNY  NNYNY
YYYNN  YNYNN  NYYNN  NNYNN
YYNYY  YNNYY  NYNYY  NNNYY
YYNYN  YNNYN  NYNYN  NNNYN
YYNNY  YNNNY  NYNNY  NNNNY
YYNNN  YNNNN  NYNNN  NNNNN
f(Y) likelihood
5 1/32
4 5/32
3 10/32
2 10/32
1 5/32
0 1/32
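The likelihood table above comes from counting: under Luck every one of the 2^5 = 32 sequences is equally likely, so the likelihood of each f(Y) is the number of sequences with that many Ys, divided by 32. A minimal Python sketch of that count (the variable names are illustrative, not from the slides):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every possible run of 5 guesses: "Y" = correct, "N" = incorrect
sequences = ["".join(s) for s in product("YN", repeat=5)]

# Number of sequences with each value of f(Y)
counts = Counter(seq.count("Y") for seq in sequences)

# Under Luck all 32 sequences are equally likely, so likelihood = count / 32
likelihood = {k: Fraction(counts[k], len(sequences)) for k in range(6)}

for k in range(5, -1, -1):
    print(f"f(Y) = {k}: {counts[k]}/{len(sequences)}")
```

Enumerating is only practical for small n (20 flips already means 1,048,576 sequences), which is why the binomial distribution is used instead.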
Binomial Distribution
• Binary data
  – A set of two-choice outcomes, e.g. yes/no, right/wrong
• Binomial variable
  – A statistic for binary samples
  – Frequency of “yes” / “right” / etc.
• Binomial distribution
  – Probability distribution for a binomial variable
  – Gives probability for each possible value, from 0 to n
• A family of distributions
  – Like Normal (need to specify mean and SD)
  – n: number of observations (sample size)
  – q: probability correct each time
Binomial Distribution
[Figure: probability (y-axis) of each count (x-axis) for n = 5, q = .5; n = 20, q = .5; and n = 20, q = .25]
Formula (optional): $P(f(Y) = k) = \binom{n}{k} q^{k} (1 - q)^{n - k}$, where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
Binomial Test
• Hypothesis testing for binomial statistics
• Null hypothesis
  – Some fixed value for q, usually q = .5
  – Nothing interesting going on; blind chance (no ESP)
• Alternative hypothesis
  – q equals something else
  – One outcome more likely than expected by chance (ESP)
• Goal: Decide which hypothesis the data support
• Strategy
  – Find likelihood distribution for f(Y) according to null hypothesis
  – Compare actual result to this distribution
  – If actual result is too extreme, reject null hypothesis and accept alternative hypothesis
• “Innocent until proven guilty”
  – Believe null hypothesis unless there is compelling evidence to rule it out
  – Only accept ESP if luck can’t explain the data
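One way to make "compare actual result to this distribution" concrete is to ask how often Luck alone gives a result at least as extreme as the one observed. A minimal sketch, assuming the null hypothesis q = .5 and using exact fractions (the helper name `upper_tail` is hypothetical):

```python
from fractions import Fraction
from math import comb

def upper_tail(observed: int, n: int) -> Fraction:
    """P(f(Y) >= observed) under the null hypothesis q = .5, as an exact fraction."""
    favorable = sum(comb(n, k) for k in range(observed, n + 1))
    return Fraction(favorable, 2**n)

# The ESP example: the friend is right on 4 of 5 flips
p = upper_tail(4, 5)
print(p, float(p))  # 3/16 0.1875
```

Luck produces 4 or more correct out of 5 in 6 of the 32 sequences (about 19% of the time), so this result on its own is not very extreme.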
Testing for ESP
• Null hypothesis: Luck, q = .5
• Alternative hypothesis: ESP, q > .5
• Need to decide rules in advance
  – If too extreme, abandon luck and accept ESP
  – How unlikely must a result be before we give up on Luck?
• Where to draw cutoff?
  – Critical value: value our statistic must exceed to reject null hypothesis (Luck)
[Figure: Likelihood according to Luck: Binomial(n = 20, q = .5), probability of each count (frequency) from 0 to 20. Moving from the center outward, regions are labeled “Not at all unexpected”, “Luck still a viable explanation”, and “Nearly impossible by luck alone”]
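The labels on the Binomial(n = 20, q = .5) figure can be put in numbers by computing how likely Luck is to produce at least a given count. A short sketch with exact fractions; `p_at_least` is a hypothetical helper name:

```python
from fractions import Fraction
from math import comb

N = 20  # number of guesses, as in the Binomial(n = 20, q = .5) figure

def p_at_least(k: int, n: int = N) -> Fraction:
    """Exact P(f(Y) >= k) when every guess is a 50/50 chance (Luck)."""
    return Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2**n)

# e.g. P(f >= 15) is about .021, and P(f >= 20) is 1/1,048,576
for k in (10, 14, 15, 18, 20):
    print(f"P(f >= {k:2d}) = {float(p_at_least(k)):.6f}")
```

Ten or more correct happens more than half the time by luck, while 18 or more happens only about twice in 10,000 tries.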
Another Example: Treatment Evaluation
• Do people tend to get better with some treatment?
  – Less depression, higher WBC, better memory, etc.
  – Measure who improves and who worsens
  – Want more people better off than worse off
• Sign test
  – Ignore magnitude of change; just direction
  – Same logic as other binomial tests
• Count number of patients improved
• Compare to probabilities according to chance
Structure of a Binomial Test
• Binary data
  – Treatment: each patient is Better or Worse
  – ESP: each coin prediction is Correct or Incorrect
• Population parameter q
  – Treatment: probability each patient will improve
  – ESP: probability each guess will be correct
• Null hypothesis, usually q = .5
  – Treatment: no effect of treatment; better or worse equally likely
  – ESP: no ESP; correct and incorrect equally likely
• Alternative hypothesis, here q > .5
  – Treatment: effective treatment; more people improve
  – ESP: guesses right more often than chance
• Work out likelihood according to null hypothesis
  – Treatment: probability distribution for f(improve)
  – ESP: probability distribution for f(correct)
• Compare actual result to these probabilities
  – Treatment: if more improve than likely by chance, accept treatment is useful
  – ESP: if more correct than likely by chance, abandon luck and accept ESP
• Need to decide critical value
  – Treatment: how many patients must improve?
  – ESP: how many times correct?
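A sign test keeps only the direction of each patient's change, so the treatment case reduces to the same count as the ESP case. A minimal sketch, assuming change scores are coded so that positive means "better" and that patients with no change are dropped (the slides do not say how ties are handled); the function name and the data are hypothetical:

```python
from math import comb

def sign_test(changes: list[float]) -> tuple[int, int, float]:
    """One-sided sign test of q = .5 against q > .5.

    Only the direction of each change is used. Zero changes are dropped
    (an assumption, not from the slides).
    Returns (number improved, number of usable patients, P(f >= improved) under the null).
    """
    signs = [c for c in changes if c != 0]
    n = len(signs)
    improved = sum(1 for c in signs if c > 0)
    p = sum(comb(n, k) for k in range(improved, n + 1)) / 2**n
    return improved, n, p

# Hypothetical change scores (positive = better) for 12 patients
changes = [3.1, 0.4, -1.2, 2.0, 5.5, 0.0, 1.1, -0.3, 2.2, 0.9, 4.0, 1.7]
print(sign_test(changes))
```

Here 9 of 11 usable patients improved; the magnitudes (5.5 versus 0.4) play no role.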
Errors
• Whatever the critical value, there will be errors
  – All values 0 to n are possible under null hypothesis
  – Even 20/20 happens by chance once in 1,048,576 times
• Can only minimize how often errors occur
• Two kinds of errors:
• Type I error
  – Null hypothesis is true, but we reject it
  – Conclude a useless treatment is effective
• Type II error
  – Null hypothesis is false, but we don’t reject it
  – Don’t recognize when a treatment is effective
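Since every count is possible under the null hypothesis, any cutoff will sometimes pass a useless treatment. A small simulation makes this visible: it generates many bogus treatments (each patient improves with probability .5) and counts how often the count exceeds a cutoff. The cutoff of 14 with 20 patients is a hypothetical choice, and the seed only makes the run repeatable.

```python
import random
from fractions import Fraction
from math import comb

N_PATIENTS = 20
CUTOFF = 14            # hypothetical critical value: call a treatment effective if more than 14 improve
N_TREATMENTS = 50_000  # number of simulated bogus treatments

rng = random.Random(1)  # fixed seed so the run is reproducible

def improved_count() -> int:
    """Simulate one bogus treatment: each patient improves with probability .5."""
    return sum(rng.random() < 0.5 for _ in range(N_PATIENTS))

# A Type I error happens whenever a bogus treatment's count exceeds the cutoff
type1_errors = sum(improved_count() > CUTOFF for _ in range(N_TREATMENTS))
simulated_rate = type1_errors / N_TREATMENTS

# Exact rate from the null distribution: P(f > CUTOFF) when q = .5
tail = sum(comb(N_PATIENTS, k) for k in range(CUTOFF + 1, N_PATIENTS + 1))
exact_rate = float(Fraction(tail, 2**N_PATIENTS))
print(f"simulated Type I error rate: {simulated_rate:.4f}; exact: {exact_rate:.4f}")
```

The errors cannot be eliminated, but their long-run frequency matches the null distribution's tail probability closely.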
Critical Value and Error Rates
[Figure: probability of each count (frequency) from 0 to 20, in two panels. Null Hypothesis (Bogus Treatments): results beyond the critical value are Type I errors. Alternative Hypothesis (Effective Treatments), drawn with question marks because its exact shape is unknown: results short of the critical value are Type II errors]
• Increasing critical value reduces Type I error rate but increases Type II error rate (and vice versa)
• So, how do we decide critical value?
• Two principles
  – Type I errors are more important to avoid
  – Can’t figure out Type II error rate anyway (it depends on the true q, which is unknown)
• Strategy
  – Decide how many Type I errors are acceptable
  – Choose critical value accordingly
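The trade-off can be tabulated if we pretend to know the alternative. The sketch below assumes a hypothetical effective treatment with q = .75 (in practice the true q is unknown, which is exactly why the Type II rate can't be worked out), uses 20 patients, and treats a result as significant when the count exceeds the critical value.

```python
from math import comb

N = 20
Q_NULL = 0.5
Q_ALT = 0.75  # hypothetical: the true q of an effective treatment is unknown in practice

def p_greater(c: int, q: float, n: int = N) -> float:
    """P(f > c) for a Binomial(n, q) count."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(c + 1, n + 1))

print("critical  Type I  Type II (q = .75)")
for c in range(11, 18):
    type1 = p_greater(c, Q_NULL)     # bogus treatment, yet f exceeds c
    type2 = 1 - p_greater(c, Q_ALT)  # effective treatment, yet f does not exceed c
    print(f"{c:8d}  {type1:.4f}  {type2:.4f}")
```

Each step up in the critical value lowers the Type I column and raises the Type II column.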
Controlling Type I Error
• Type I error rate
  – Proportion of times, when null hypothesis is true, that we mistakenly reject it
  – Fraction of bogus treatments that we conclude are effective
• Type I error rate equals total probability beyond the critical value, according to null hypothesis
• Strategy
  – Decide what Type I error rate we want to allow
  – Pick critical value accordingly
• Alpha level (α)
  – Chosen Type I error rate
  – Usually .05 in Psychology
  – Determines critical value
[Figure: Null Hypothesis (Bogus Treatments), probability of each count (frequency) from 0 to 20; the area beyond the critical value is the Type I Error Rate]
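Choosing α and then reading off the critical value can be written as a short search: take the smallest critical value whose null tail probability beyond it is at most α. A sketch assuming 20 observations, a null of q = .5, and the slides' convention that the statistic must exceed the critical value; the helper names are hypothetical.

```python
from math import comb

def p_exceeds(c: int, n: int, q: float = 0.5) -> float:
    """Null probability that the count exceeds c: P(f > c) under Binomial(n, q)."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(c + 1, n + 1))

def critical_value(n: int, alpha: float = 0.05, q: float = 0.5) -> int:
    """Smallest c with P(f > c) <= alpha under the null hypothesis."""
    for c in range(n):
        if p_exceeds(c, n, q) <= alpha:
            return c
    return n  # P(f > n) = 0, so n always qualifies

for alpha in (0.10, 0.05, 0.01):
    c = critical_value(20, alpha)
    print(f"alpha = {alpha:.2f}: critical value {c}, actual Type I rate {p_exceeds(c, 20):.4f}")
```

Because f can only take whole-number values, the actual Type I rate usually lands below α rather than exactly on it: with n = 20 and α = .05 the critical value is 14, giving a rate of about .021.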
Summary of Hypothesis Testing
• Determine Null and Alternative Hypotheses
  – Competing possibilities about a population parameter
  – Null is always precise; usually means “no effect”
• Find probability distribution of test statistic according to null hypothesis
  – Likelihood of the statistic under that hypothesis
• Choose acceptable rate of Type I errors (α)
• Pick critical value of test statistic based on α
  – Under Null, probability of a result past critical value is at most α (with a whole-number statistic like f(Y), usually a bit less)
• Compare actual result to critical value
  – If more extreme, reject Null as unable to explain data
  – Otherwise, stick with Null because it’s an adequate explanation
[Figure: Null Hypothesis (Bogus Treatments), probability of each count (frequency) from 0 to 20; the region past the critical value is labeled “Reject Null” and its area is the Type I Error Rate (α); the rest is labeled “Keep Null”]
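The summary steps fit in one function for the one-sided binomial case. A minimal sketch, assuming H0: q = q0 against H1: q > q0, with a result counted as extreme only when it exceeds the critical value; `binomial_test` is a hypothetical name (SciPy has its own `scipy.stats.binomtest`, which reports a p-value rather than a critical value).

```python
from math import comb

def binomial_test(observed: int, n: int, alpha: float = 0.05, q0: float = 0.5) -> tuple[int, str]:
    """One-sided binomial test of H0: q = q0 against H1: q > q0.

    Returns the critical value and the decision.
    """
    # Likelihood of every possible count 0..n under the null hypothesis
    null_probs = [comb(n, k) * q0**k * (1 - q0) ** (n - k) for k in range(n + 1)]
    # Smallest critical value c whose tail P(f > c) is at most alpha
    # (c = n always qualifies, since P(f > n) = 0)
    critical = next(c for c in range(n + 1) if sum(null_probs[c + 1:]) <= alpha)
    # Reject the null only if the actual result exceeds the critical value
    decision = "reject null" if observed > critical else "keep null"
    return critical, decision

print(binomial_test(16, 20))  # hypothetical: 16 of 20 patients improved
print(binomial_test(4, 5))    # the ESP example: right on 4 of 5 flips
```

With 5 flips the critical value at α = .05 is 4, so only a perfect 5 of 5 would lead us to abandon Luck; 4 of 5 keeps the null.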
Review
To test whether infants can read, you show pairs of good and bad words, and count how many times the baby crawls to the good word.
What’s the null hypothesis?
A. Baby is more likely to crawl to good word
B. Baby is more likely to crawl to bad word
C. Baby is equally likely to crawl to either word
D. Baby will not choose either word
[Example word pair: Teddy bear / Monster]
Review
What would be a Type II Error for this experiment?
A. Concluding babies can’t read when they really can
B. Concluding babies can read when they really can’t
C. Correctly concluding babies can read
D. Correctly concluding babies can’t read
[Example word pair: Teddy bear / Monster]
Review
Each baby gets 6 trials. If any baby chooses the good word ≥5 times, you declare (s)he can read.
Here are the probabilities of what will happen, according to the null hypothesis:
Number of good words:   0    1    2    3    4    5    6
Probability:           .02  .09  .23  .31  .23  .09  .02
What’s the Type I error rate?
A. 0.09
B. 0.11
C. 0.82
D. 0.89
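The probabilities in this review table can be checked by treating each trial as a 50/50 choice under the null hypothesis, so they follow Binomial(n = 6, q = .5). The sketch below recomputes the rounded table and adds up the null probability of the outcomes that lead to declaring "can read"; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N_TRIALS = 6
# Null hypothesis: the baby picks the good word with probability .5 on each trial
probs = {k: Fraction(comb(N_TRIALS, k), 2**N_TRIALS) for k in range(N_TRIALS + 1)}

for k, p in probs.items():
    print(k, f"{float(p):.2f}")

# Rule from the question: declare "can read" if the good word is chosen 5 or more times
type1 = sum(p for k, p in probs.items() if k >= 5)
print("Type I error rate:", type1, f"= {float(type1):.3f}")
```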