exam 1 median: 74 quartiles: 68, 84 interquartile range: 16 mean: 74.9 standard deviation: 12.5 z =...

Exam 1

Median: 74

Quartiles: 68, 84

Interquartile range: 16

Mean: 74.9

Standard deviation: 12.5

z = -1: 62.4    z = +1: 87.4

Worst Question: Operational definition (33/84)

Introduction to Hypothesis Testing: The Binomial Test

9/30

ESP

• Your friend claims he can predict the future
• You flip a coin 5 times, and he’s right on 4
• Is your friend psychic?

Two Hypotheses

• Hypothesis
  – A theory about how the world works
  – Proposed as an explanation for data
  – Posed as a statement about population parameters
• Psychic
  – Some ability to predict future
  – Not perfect, but better than chance
• Luck
  – Random chance
  – Right half the time, wrong half the time
• Hypothesis testing
  – A method that uses inferential statistics to decide which of two hypotheses the data support

Likelihood

• Likelihood
  – Probability distribution of a statistic, according to each hypothesis
  – If result is likely according to a hypothesis, we say data “support” or “are consistent with” the hypothesis
• Likelihood for f(correct)
  – Psychic: hard to say; how psychic?
  – Luck: can work out exactly; 50/50 chance each time

[Diagram: Hypotheses and Statistics, linked by Likelihood and Support; in parallel, Population and Sample, linked by Probability and Inference]

Likelihood According to Luck

All 32 possible sequences of 5 flips (Y = correct, N = incorrect):

Flip:  1 2 3 4 5    1 2 3 4 5    1 2 3 4 5    1 2 3 4 5
       Y Y Y Y Y    Y N Y Y Y    N Y Y Y Y    N N Y Y Y
       Y Y Y Y N    Y N Y Y N    N Y Y Y N    N N Y Y N
       Y Y Y N Y    Y N Y N Y    N Y Y N Y    N N Y N Y
       Y Y Y N N    Y N Y N N    N Y Y N N    N N Y N N
       Y Y N Y Y    Y N N Y Y    N Y N Y Y    N N N Y Y
       Y Y N Y N    Y N N Y N    N Y N Y N    N N N Y N
       Y Y N N Y    Y N N N Y    N Y N N Y    N N N N Y
       Y Y N N N    Y N N N N    N Y N N N    N N N N N

f(Y)   likelihood
5      1/32
4      5/32
3      10/32
2      10/32
1      5/32
0      1/32
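The likelihood table above can be reproduced by brute force. The following is a minimal sketch, assuming only that every sequence of five guesses is equally likely under Luck; the variable names are illustrative, not from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_FLIPS = 5

# Every possible sequence of correct (Y) and incorrect (N) guesses.
outcomes = list(product("YN", repeat=N_FLIPS))

# Under Luck each sequence is equally likely, so the likelihood of each
# value of f(Y) is the share of sequences containing that many Y's.
counts = Counter(seq.count("Y") for seq in outcomes)
likelihood = {k: Fraction(counts[k], len(outcomes)) for k in range(N_FLIPS, -1, -1)}

for k in likelihood:
    print(f"f(Y) = {k}: {counts[k]}/{len(outcomes)}")
```

The friend's result, 4 right out of 5, corresponds to the 5/32 row.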

Binomial Distribution

• Binary data
  – A set of two-choice outcomes, e.g. yes/no, right/wrong
• Binomial variable
  – A statistic for binary samples
  – Frequency of “yes” / “right” / etc.
• Binomial distribution
  – Probability distribution for a binomial variable
  – Gives probability for each possible value, from 0 to n
• A family of distributions
  – Like Normal (need to specify mean and SD)
  – n: number of observations (sample size)
  – q: probability correct each time
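To make the "family of distributions" idea concrete, here is a small sketch that builds the binomial distribution for any n and q. The function names are illustrative; the three (n, q) pairs are the ones plotted in this lecture.

```python
from math import comb

def binomial_pmf(k: int, n: int, q: float) -> float:
    """Probability of exactly k 'yes' outcomes in n observations."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)

def binomial_distribution(n: int, q: float) -> list[float]:
    """Probabilities for every possible count, from 0 to n."""
    return [binomial_pmf(k, n, q) for k in range(n + 1)]

# Each choice of n and q gives a different member of the family.
for n, q in [(5, 0.5), (20, 0.5), (20, 0.25)]:
    dist = binomial_distribution(n, q)
    most_likely = max(range(n + 1), key=lambda k: dist[k])
    print(f"n = {n}, q = {q}: most likely count = {most_likely}, "
          f"probability = {dist[most_likely]:.3f}")
```

Changing q shifts where the distribution peaks; changing n changes how many values are possible and how spread out they are.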

Binomial Distribution

[Figures: binomial distributions, probability by frequency (count), for n = 5, q = .5; n = 20, q = .5; and n = 20, q = .25]

Formula (optional):
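Assuming this slide refers to the standard binomial probability formula, it reads as follows in the n and q notation used above, where k (a symbol introduced here) is the frequency of “yes” outcomes:

```latex
P(f = k) = \binom{n}{k}\, q^{k}\, (1 - q)^{n - k}, \qquad k = 0, 1, \dots, n
```

For n = 5, q = .5, and k = 4 this gives 5 × (1/2)^5 = 5/32, matching the likelihood table for Luck.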


Binomial Test

• Hypothesis testing for binomial statistics
• Null hypothesis
  – Some fixed value for q, usually q = .5
  – Nothing interesting going on; blind chance (no ESP)
• Alternative hypothesis
  – q equals something else
  – One outcome more likely than expected by chance (ESP)
• Goal: Decide which hypothesis the data support
• Strategy
  – Find likelihood distribution for f(Y) according to null hypothesis
  – Compare actual result to this distribution
  – If actual result is too extreme, reject null hypothesis and accept alternative hypothesis
• “Innocent until proven guilty”
  – Believe null hypothesis unless compelling evidence to rule it out
  – Only accept ESP if luck can’t explain the data

Testing for ESP

• Null hypothesis: Luck, q = .5
• Alternative hypothesis: ESP, q > .5
• Need to decide rules in advance
  – If too extreme, abandon luck and accept ESP
  – How unlikely before we will give up on Luck?
• Where to draw cutoff?
  – Critical value: value our statistic must exceed to reject null hypothesis (Luck)

[Figure: likelihood according to Luck, Binomial(n = 20, q = .5), probability by frequency (0 to 20). Annotations: “Not at all unexpected”; “Luck still a viable explanation”; “Nearly impossible by luck alone”]
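To put numbers on "how unlikely", the sketch below computes the probability of getting at least a given number right out of 20 by luck alone. The helper name `prob_at_least` and the selected counts are illustrative choices, not from the slides.

```python
from math import comb

N = 20  # number of predictions

def prob_at_least(k: int, n: int = N) -> float:
    """P(f >= k) when each guess is right with probability .5 (Luck)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

for k in (10, 12, 14, 16, 18, 20):
    print(f"P(at least {k} of {N} correct by luck) = {prob_at_least(k):.6f}")
```

Results near the middle of the range are common under Luck, while the tail probabilities shrink quickly as the count approaches 20.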

Another Example: Treatment Evaluation

• Do people tend to get better with some treatment?
  – Less depression, higher WBC, better memory, etc.
  – Measure who improves and who worsens
  – Want more people better off than worse
• Sign test
  – Ignore magnitude of change; just direction
  – Same logic as other binomial tests
• Count number of patients improved
• Compare to probabilities according to chance
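A sign test along these lines might look like the sketch below. The change scores are hypothetical, and dropping patients with zero change is an assumption (a common convention the slides do not address).

```python
from math import comb

def sign_test(changes, q_null=0.5):
    """One-sided sign test: is improvement more common than chance predicts?

    Returns (number improved, number of patients counted,
    probability of at least that many improving under the null).
    """
    # Assumption: patients with no change carry no direction, so drop them.
    directions = [c for c in changes if c != 0]
    n = len(directions)
    improved = sum(1 for c in directions if c > 0)  # magnitude is ignored
    p_tail = sum(
        comb(n, k) * q_null**k * (1 - q_null) ** (n - k)
        for k in range(improved, n + 1)
    )
    return improved, n, p_tail

# Hypothetical change scores (after minus before); positive means improved.
changes = [3, -1, 4, 2, 5, -2, 1, 6, 2, 3, -4, 1]
improved, n, p_tail = sign_test(changes)
print(f"{improved} of {n} improved; P(at least {improved} by chance) = {p_tail:.4f}")
```

Only the direction of each change enters the calculation, which is why the same binomial logic as the ESP example applies.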

Structure of a Binomial Test

(each point is given first for the treatment example, then for the ESP example)

• Binary data
  – Treatment: Each patient is Better or Worse
  – ESP: Each coin prediction is Correct or Incorrect
• Population parameter q
  – Treatment: Probability each patient will improve
  – ESP: Probability each guess will be correct
• Null hypothesis, usually q = .5
  – Treatment: No effect of treatment; better or worse equally likely
  – ESP: No ESP; correct and incorrect equally likely
• Alternative hypothesis, here q > .5
  – Treatment: Effective treatment; more people improve
  – ESP: ESP; guess right more often than chance
• Work out likelihood according to null hypothesis
  – Treatment: Probability distribution for f(improve)
  – ESP: Probability distribution for f(correct)
• Compare actual result to these probabilities
  – Treatment: If more improve than likely by chance, accept treatment is useful
  – ESP: If more correct than likely by chance, abandon luck and accept ESP
• Need to decide critical value
  – Treatment: How many patients must improve?
  – ESP: How many times correct?

[Figure: probability by frequency (count 0 to 20)]

Errors

• Whatever the critical value, there will be errors
  – All values 0 to n are possible under null hypothesis
  – Even 20/20 happens once in 1,048,576 times
• Can only minimize how often errors occur
• Two kinds of errors:
• Type I error
  – Null hypothesis is true, but we reject it
  – Conclude a useless treatment is effective
• Type II error
  – Null hypothesis is false, but we don’t reject it
  – Don’t recognize when a treatment is effective
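The claim that no critical value eliminates errors can be checked exactly. This sketch assumes n = 20, q = .5 under the null, and a rule of rejecting when the count is at or above the critical value; the function name is illustrative.

```python
from fractions import Fraction
from math import comb

N = 20

def false_rejection_rate(critical_value: int, n: int = N) -> Fraction:
    """Exact P(f >= critical_value) when the null hypothesis (q = .5) is true."""
    return Fraction(sum(comb(n, k) for k in range(critical_value, n + 1)), 2**n)

rates = {c: false_rejection_rate(c) for c in range(N + 1)}

# Every possible cutoff still rejects a true null some of the time.
print(all(rate > 0 for rate in rates.values()))  # True
print(rates[20])  # 1/1048576
```

Even the strictest rule, demanding 20 out of 20, is wrong about a true null once in 1,048,576 times.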

Critical Value and Error Rates

[Figure: two panels of probability by frequency (0 to 20). Top panel: Null Hypothesis (Bogus Treatments), with Type I Errors marked. Bottom panel: Alternative Hypothesis (Effective Treatments), distribution unknown (drawn as question marks), with Type II Errors marked.]

• Increasing critical value reduces Type I error rate but increases Type II error rate (and vice versa)
• So, how do we decide critical value?
• Two principles
  – Type I errors are more important to avoid
  – Can’t figure out Type II error rate anyway
• Strategy
  – Decide how many Type I errors are acceptable
  – Choose critical value accordingly
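The trade-off can be illustrated numerically, with one important caveat: the Type II error rate depends on the true value of q, which is unknown in practice. The sketch below assumes a hypothetical q = .7 for effective treatments purely for illustration, along with n = 20 and a rule of rejecting at or above the critical value.

```python
from math import comb

N = 20
Q_NULL = 0.5
Q_ALT = 0.7  # hypothetical value for illustration only; not from the slides

def pmf(k: int, n: int, q: float) -> float:
    return comb(n, k) * q**k * (1 - q) ** (n - k)

def type_i_rate(critical_value: int) -> float:
    """P(reject | null true): count at or beyond the cutoff when q = Q_NULL."""
    return sum(pmf(k, N, Q_NULL) for k in range(critical_value, N + 1))

def type_ii_rate(critical_value: int) -> float:
    """P(keep null | alternative true): count below the cutoff when q = Q_ALT."""
    return sum(pmf(k, N, Q_ALT) for k in range(critical_value))

for c in range(11, 19):
    print(f"critical value {c:2d}: Type I = {type_i_rate(c):.4f}, "
          f"Type II = {type_ii_rate(c):.4f}")
```

Raising the cutoff moves the two rates in opposite directions, and a different assumed Q_ALT would change the Type II column but not the Type I column.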

Controlling Type I Error

• Type I error rate
  – Proportion of times, when null hypothesis is true, that we mistakenly reject it
  – Fraction of bogus treatments that we conclude are effective
• Type I error rate equals total probability beyond the critical value, according to null hypothesis
• Strategy
  – Decide what Type I error rate we want to allow
  – Pick critical value accordingly
• Alpha level (α)
  – Chosen Type I error rate
  – Usually .05 in Psychology
  – Determines critical value

[Figure: probability by frequency (0 to 20) under the null hypothesis, with the Type I Error Rate marked]
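One way to turn a chosen alpha level into a critical value is sketched below. It assumes the rule "reject when the count is at or above the critical value" and a one-sided test with q = .5 under the null; the function name `critical_value` is illustrative.

```python
from math import comb

def critical_value(n: int, alpha: float, q_null: float = 0.5) -> int:
    """Smallest count c such that P(f >= c) <= alpha under the null hypothesis.

    Returns n + 1 if even a perfect score is more likely than alpha,
    meaning no result could reject the null with this sample size.
    """
    tail = 0.0
    # Walk down from the most extreme count, accumulating P(f >= c).
    for c in range(n, -1, -1):
        tail += comb(n, c) * q_null**c * (1 - q_null) ** (n - c)
        if tail > alpha:
            return c + 1  # the previous count was the last one within alpha
    return 0

print(critical_value(20, 0.05))  # 15
print(critical_value(5, 0.05))   # 5
```

With n = 20, the probability of 15 or more is about .021 while the probability of 14 or more is about .058, so 15 is the smallest cutoff that keeps the Type I error rate at or below .05.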

Summary of Hypothesis Testing

• Determine Null and Alternative Hypotheses
  – Competing possibilities about a population parameter
  – Null is always precise; usually means “no effect”
• Find probability distribution of test statistic according to null hypothesis
  – Likelihood of the statistic under that hypothesis
• Choose acceptable rate of Type I errors (α)
• Pick critical value of test statistic based on α
  – Under Null, probability of a result past critical value equals α
• Compare actual result to critical value
  – If more extreme, reject Null as unable to explain data
  – Otherwise, stick with Null because it’s an adequate explanation

[Figure: probability by frequency (0 to 20) under the null hypothesis, with regions labeled “Keep Null” and “Reject Null” and the Type I Error Rate (α) marked]
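The summary steps can be strung together in a few lines. This is a sketch under stated assumptions: a one-sided test, rejection when the count is at or above the critical value, and α = .05. It is applied to the opening ESP example (4 correct out of 5), and the function name is illustrative.

```python
from math import comb

def binomial_test(successes: int, n: int, alpha: float = 0.05, q_null: float = 0.5):
    """Run the summary steps; returns (critical value, decision)."""
    # Probability distribution of the statistic under the null hypothesis.
    null_dist = [comb(n, k) * q_null**k * (1 - q_null) ** (n - k) for k in range(n + 1)]

    # Critical value: smallest count whose upper-tail probability is within alpha.
    critical = n + 1  # n + 1 means no possible count is extreme enough
    tail = 0.0
    for c in range(n, -1, -1):
        tail += null_dist[c]
        if tail > alpha:
            break
        critical = c

    # Compare the actual result to the critical value.
    decision = "reject null" if successes >= critical else "keep null"
    return critical, decision

print(binomial_test(4, 5))  # (5, 'keep null')
```

With only five flips, nothing short of a perfect score is extreme enough at α = .05, so 4 out of 5 leaves Luck as an adequate explanation.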

Review

To test whether infants can read, you show pairs of good and bad words, and count how many times the baby crawls to the good word.

What’s the null hypothesis?

A. Baby is more likely to crawl to good word
B. Baby is more likely to crawl to bad word
C. Baby is equally likely to crawl to either word
D. Baby will not choose either word

[Figure: example word pair, “Teddy bear” and “Monster”]

Review

What would be a Type II Error for this experiment?

A. Concluding babies can’t read when they really can
B. Concluding babies can read when they really can’t
C. Correctly concluding babies can read
D. Correctly concluding babies can’t read

[Figure: example word pair, “Teddy bear” and “Monster”]

Review

Each baby gets 6 trials. If any baby chooses the good word ≥5 times, you declare (s)he can read.

Here are the probabilities of what will happen, according to the null hypothesis:

Number of good words   0    1    2    3    4    5    6
Probability           .02  .09  .23  .31  .23  .09  .02

What’s the Type I error rate?

A. 0.09
B. 0.11
C. 0.82
D. 0.89
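One way to check an answer to the last review question is to rebuild the probability table from the binomial distribution and add up the part at or beyond the cutoff. This sketch assumes q = .5 under the null hypothesis, which is consistent with the rounded values in the table; the variable names are illustrative.

```python
from math import comb

N_TRIALS = 6
CUTOFF = 5  # declare "can read" at 5 or more good-word choices

# Null distribution: each choice is a 50/50 guess.
null_probs = [comb(N_TRIALS, k) / 2**N_TRIALS for k in range(N_TRIALS + 1)]
print([round(p, 2) for p in null_probs])  # [0.02, 0.09, 0.23, 0.31, 0.23, 0.09, 0.02]

# Type I error rate: total null probability at or beyond the cutoff.
type_i_rate = sum(null_probs[CUTOFF:])
print(round(type_i_rate, 2))  # 0.11
```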
