lectures 11/12 - york university

© Copyright 2004, Alan Marshall

Lectures 11/12

Introduction to Hypothesis Testing


Course Changes

>Class Schedule
>Assignments
>Quizzes
>Practice Questions


Class Schedule

>We will adjust the remaining schedule
>Classes with “Review Sections” will be eliminated to spend more time presenting the material
• i.e., slow down the pace of delivery
• at the expense of the review sessions


Revised Schedule: Next Two Weeks

>Oct. 25/27 - Hypothesis Testing
• These slides
>Nov. 1 - Two Population Tests
>Nov. 3 - Tests of Variance


Assignments

>Assignments are now “Pass/Fail”
>If you get your PHGA assignment grade to 80%, you will earn full (100%) credit
• This will reduce the time spent on assignments because of rounding errors and similar problems


Quizzes

We will:
>post some M/C questions for review to facilitate quizzes
>make quiz dates and material known in advance
>limit quizzes to 10 questions


General

>We will provide some "recommended practice questions" from the book.
>This will help you to manage your time.
>Those who wish can always do more


Inference

Where We Are Going

>Inference generally involves one of two tasks:
• Providing an estimate of a parameter, with the appropriate confidence interval; OR
• Testing to see if there is statistical evidence that an estimated statistic is similar to or different from a hypothesized value


Inference

>Inference is the process of getting useful decision-making information from data and statistics
>Two aspects
• Estimation: Deriving estimates with their associated confidence intervals (last lecture)
• Hypothesis testing: Drawing conclusions based on the probability that statements are correct


Types of Statistics

>We make inferences about three types of statistics:
• The Mean
• The Variance
• A Proportion of the data that meets some qualitative requirement
>Further, we can make these inferences on a single population or when comparing two populations


The Task Stays The Same

>Almost all Estimation problems involve the same basic structure:
• creating a confidence interval around the point estimate
• The size of the interval is determined by
–The standard deviation of the estimate
–The distribution of the estimate


The Task Stays The Same

>Almost all Hypothesis Tests involve the same basic steps:
• Determining the null and alternate hypotheses, the test statistic and the rejection region
• The test statistic is based on the standard deviation of the estimated value
• The rejection region is determined by the distribution of the estimate


What Hypothesis Testing Is

>Hypothesis testing involves testing the validity of a statistical statement (referred to as the Null Hypothesis)
• If the null hypothesis statement is statistically valid, we do not reject the null hypothesis
–This may seem picayune, but Statisticians NEVER “accept” hypotheses
• If the null hypothesis statement is not statistically valid, we reject the null hypothesis


4 Components

>Null Hypothesis, HO, “H-naught”
• Like a “straw man”, assumed to be true until demonstrated otherwise
>Alternative Hypothesis, HA
>Test Statistic
>Rejection Region
• Determined by the Confidence desired
–For now, we will use 95% Confidence


Hypothesis Statements

Null Hypothesis:
>The average tire life is 100,000 km
HO: µ = 100,000 km

Alternative Hypothesis: Simple Inequality
>The average tire life is not 100,000 km
HA: µ ≠ 100,000 km


Application

>The test implied is a two-tail test
>We are concerned about being too high or too low
>There is a rejection region in both the right and left tails
>The rejection regions for α = 0.05 are shown on the next slide


Two-Tail Test

[Figure: standard normal curve (Z from -4 to 4) centred on 100,000 km, with a 2.5% rejection region in the lower tail and a 2.5% rejection region in the upper tail]


Comment

>In this case, it is unlikely that we would be performing a two-tail test
>We would only likely be concerned if the average tire life was too low
• However, the manufacturer may not want the average life to be too high!



Null Hypothesis:
>The average tire life is at most 100,000 km
HO: µ ≤ 100,000 km

Alternative Hypothesis: Greater Than
>The average tire life is greater than 100,000 km
HA: µ > 100,000 km



>The test implied is a one-tail test
>We are concerned about the average tire life being too high
>There is a rejection region in the right tail
>The rejection region for α = 0.05 is shown on the next slide


One-tail Test

[Figure: standard normal curve (Z from -4 to 4) centred on 100,000 km, with a 5% rejection region in the upper tail]


Comment

>In this case, it is unlikely that we would be performing this one-tail test
>We would only likely be concerned about this result if the manufacturer does not want the average life to be too high!



Null Hypothesis:
>The average tire life is at least 100,000 km
HO: µ ≥ 100,000 km

Alternative Hypothesis: Less Than
>The average tire life is less than 100,000 km
HA: µ < 100,000 km



>The test implied is also a one-tail test
>We are concerned about the average tire life being too low
>The rejection region is in the left tail
>The rejection region for α = 0.05 is shown on the next slide


One-tail Test

[Figure: standard normal curve (Z from -4 to 4) centred on 100,000 km, with a 5% rejection region in the lower tail]


Comment

>This is the test that we would most likely be performing
>We would be concerned if the average tire life was too low


Synonyms

HO: ≤ ; HA: >
>At most
>No more than
>No greater than
>Less than or equal to

HO: ≥ ; HA: <
>At least
>No less than
>More than or equal to
>Greater than or equal to



>The Null and Alternative hypotheses must be mutually exclusive and exhaustive
>All the possible results must be considered
>No result can be ambiguous
• The ambiguity is encompassed in the Confidence Level


Test Statistic & Rejection Region

>The Test Statistic is used to determine the veracity or probity of the null hypothesis
>If the test statistic is in the rejection region, we reject the null hypothesis
• Recall that we never “accept” a hypothesis


Test Statistic

>What is an appropriate test statistic?
>In our example, we are concerned with average tire life
>We need to be able to determine whether the average taken from a sample is significantly different from the hypothesized value



>How do we measure the difference between the sample average and the hypothesized value?
>The difference is measured in standard deviations

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

>Dividing by the standard deviation of the mean computes the number of standard deviations the observed sample mean is from the hypothesized mean
>As before, Z is a number of standard deviations


How Confident?

>For most business decisions, we are typically satisfied with 95% confidence - only a 5% chance of making an error
>Occasionally, we might be prepared to be less confident, at 90%
>Some applications demand a higher level of confidence
>Typical values are on the next slide


Confidence Levels

Confidence Level   Total Tails   Each Tail   Critical Value
1−α                α             α/2         Zα/2
90% = .90          .10           .05         1.645
95% = .95          .05           .025        1.96
98% = .98          .02           .01         2.326
99% = .99          .01           .005        2.576
99.5% = .995       .005          .0025       2.807
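The critical values in this table can be reproduced in software rather than read from a Normal table. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is illustrative and not part of the lecture.

```python
from statistics import NormalDist

def z_critical(confidence, tails=2):
    """Critical Z value for a confidence level, splitting alpha over `tails` tails."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / tails)

# Reproduce the table rows (two-tail critical values, rounded to 3 decimals)
for confidence in (0.90, 0.95, 0.98, 0.99, 0.995):
    print(f"{confidence:.3f}  {z_critical(confidence):.3f}")
```

Calling `z_critical(0.95, tails=1)` gives the one-tail value of about 1.645 that the tire example uses.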


Confidence - Not Black & White

[Figure: p-value scale marked at 0.001, 0.01, 0.05 and 0.10, running from Overwhelming Evidence through Very Strong Evidence, Strong Evidence and Weak Evidence to Very Weak to No Evidence; results are labelled Significant or Insignificant]


Four Ways to Perform Test

>Confidence Interval about the Sample Statistic
>Confidence Interval about Hypothesized Value
>z-Value - Number of Standard Deviations
>p-Value - Probability associated with the test result


Example

Example

>A tire manufacturer has developed a new tire that they claim has an average life of 100,000 km. A sample of 36 tires is tested and found to have a mean of 97,500 km. The standard deviation is known to be 12,000 km and tire life is normally distributed.


Discussion

>Observation: 97,500 km is below 100,000 km
However,
>Question: Is 97,500 km sufficiently lower than 100,000 km to infer that the mean is less than 100,000 km?
>While the sample mean is below 100,000, is it a result that could have happened by chance?



>As noted previously, this will be a one-tail test

HO: µ ≥ 100,000 km
HA: µ < 100,000 km

$$\bar{X} = 97{,}500 \qquad \sigma_{\bar{X}} = \frac{12{,}000}{\sqrt{36}} = 2{,}000$$


Example - Solution
Confidence Interval about the Sample Statistic Approach

$$\bar{x} + Z_{\alpha}\frac{\sigma}{\sqrt{n}} = 97{,}500 + Z_{.05}\left(\frac{12{,}000}{\sqrt{36}}\right) = 97{,}500 + 1.645(2{,}000) = 97{,}500 + 3{,}290 = 100{,}790 \text{ km}$$

We reject HO if µ > 100,790 km
Since µ = 100,000 < 100,790, we cannot reject HO
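The arithmetic for this approach can be checked with a short Python sketch. It assumes the standard-library `statistics.NormalDist` for the critical value, and the variable names are illustrative. Because it uses the unrounded critical value (about 1.6449 rather than 1.645), the limit agrees with 100,790 km only after rounding.

```python
import math
from statistics import NormalDist

# Tire example: sigma known, lower-tail test at alpha = 0.05
x_bar, mu_0, sigma, n, alpha = 97_500, 100_000, 12_000, 36, 0.05

z_alpha = NormalDist().inv_cdf(1 - alpha)              # one-tail critical value
upper_limit = x_bar + z_alpha * sigma / math.sqrt(n)   # upper end of the interval

# Reject HO only if the hypothesized mean lies above the upper limit
reject_null = mu_0 > upper_limit
print(f"upper limit = {upper_limit:,.0f} km, reject HO: {reject_null}")
```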



Comment

>This interval is the same as the confidence interval that we would have found if we had been estimating the tire life based on this sample
>Since the Hypothesized value is not in the tail of the distribution, we do not reject the null hypothesis.
>There is not enough evidence to infer that the mean tire life is less than 100,000 km


CI from Sample Statistic

[Figure: normal curve (Z from -4 to 4) centred on the sample mean X̄ = 97,500, with the upper confidence limit CL = 100,790 and the hypothesized mean µ = 100,000 marked]


Example - Solution
Confidence Interval about the Parameter Approach

$$\mu - Z_{\alpha}\frac{\sigma}{\sqrt{n}} = 100{,}000 - Z_{.05}\left(\frac{12{,}000}{\sqrt{36}}\right) = 100{,}000 - 1.645(2{,}000) = 100{,}000 - 3{,}290 = 96{,}710 \text{ km}$$

We reject HO if x̄ < 96,710 km
Since x̄ = 97,500 > 96,710, we cannot reject HO
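The same numbers give the cutoff around the hypothesized value. This is a minimal sketch under the same assumptions as the slide (σ known, lower-tail test, α = 0.05); the names are illustrative, and `statistics.NormalDist` supplies the unrounded critical value.

```python
import math
from statistics import NormalDist

x_bar, mu_0, sigma, n, alpha = 97_500, 100_000, 12_000, 36, 0.05

z_alpha = NormalDist().inv_cdf(1 - alpha)
lower_limit = mu_0 - z_alpha * sigma / math.sqrt(n)   # about 96,710 km

# Any sample mean below the cutoff falls in the rejection region
reject_null = x_bar < lower_limit
print(f"cutoff = {lower_limit:,.0f} km, reject HO: {reject_null}")
```

Since the cutoff depends only on the hypothesized mean, σ, n and α, it can be computed once and reused for every new sample.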



Comment

>Notice that we are placing the same width of confidence interval around the hypothesized parameter value rather than the sample statistic
>Therefore it is quite logical that we get the same “do not reject the null” result
>This set-up would make sense in a quality control application
• Set up rejection region once


CI from Hypothesized Value

[Figure: normal curve (Z from -4 to 4) centred on the hypothesized mean µ = 100,000, with the lower confidence limit CL = 96,710 and the sample mean X̄ = 97,500 marked]


Z-Statistic Approach

>97,500 and 100,000 are 2,500 apart. Is this very much?
>We need to measure the distance in some standardized way
>Converting this difference to a number of standard deviations would work


Standardized Test Statistic

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

>In other words, we are simply measuring how many standard deviations the sample mean is from the hypothesized mean


Example - Solution
Z-Statistic Approach

>Since the one-tail Z-value for 95% Confidence (or α = 0.05) is 1.645, if the sample mean and the hypothesized mean were less than 1.645 standard deviations of the mean apart, then we would conclude that they are not far enough apart to reject the null hypothesis
>The reverse interpretation would be true if they were more than 1.645 standard deviations of the mean apart


Rejection Region

>If Z is below -1.645, then 97,500 km is more than 1.645 standard deviations below 100,000 km, and we reject HO

$$Z < -Z_{\alpha} = -1.645$$



>The sample result (97,500) is 1.25 standard deviations below the hypothesized value of 100,000, which is not far enough to cause us to reject the null hypothesis

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{97{,}500 - 100{,}000}{12{,}000/\sqrt{36}} = -1.25$$
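As a quick check of the standardized distance, the following sketch computes Z for the tire example and compares it with the lower-tail critical value. The function name `z_statistic` is illustrative; the critical value 1.645 is taken from the Confidence Levels table.

```python
import math

def z_statistic(x_bar, mu_0, sigma, n):
    """Number of standard deviations of the mean between x_bar and mu_0."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

z = z_statistic(97_500, 100_000, 12_000, 36)
z_critical = 1.645                 # one-tail value for alpha = 0.05

# Lower-tail test: reject HO only if Z falls below -1.645
reject_null = z < -z_critical
print(f"Z = {z:.2f}, reject HO: {reject_null}")
```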


Testing Using p-values

>The Z-statistic can also be used to determine the probability of observing a sample mean as low as 97,500 km when the mean really is the hypothesized value of 100,000 km
>In other words, if the mean IS 100,000 km, what is the probability that the average life of 36 tires is 97,500 km or less?


Example - Solution
p-value Approach

>When we look up Z = -1.25 in the Normal Table, we find that P(Z < -1.25) = P(Z > 1.25) = 0.5 - 0.3944 = 0.1056 or 10.56%

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{97{,}500 - 100{,}000}{12{,}000/\sqrt{36}} = -1.25$$
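The table lookup can be replaced by a direct calculation. This sketch assumes the standard-library `statistics.NormalDist`; its result differs from the table value only in the fifth decimal place.

```python
from statistics import NormalDist

z = -1.25                          # test statistic from the tire example
alpha = 0.05                       # significance level

p_value = NormalDist().cdf(z)      # lower-tail area P(Z < -1.25)
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject HO: {reject_null}")
```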



Interpreting p-values

>There is a 10.56% chance of getting a result of 97,500 or lower purely by chance, if the mean life is truly 100,000 km
>Since we had established our significance level to be 5%, we cannot reject the null hypothesis


Conclusion

>While the test result was not a particularly good one for the tire manufacturer, the evidence was not strong enough to infer that the manufacturer’s claim was false.


Equality of the Four Methods

>When we set up either confidence interval, we add ZCRITICAL standard deviations of the mean to, or subtract them from, the appropriate benchmark
>The Z or t statistic is compared to ZCRITICAL
>The Z or t statistic determines the p-value, which is compared to α, which is the tail area for ZCRITICAL


Comment

>My approach (4 ways to test) is a bit unorthodox
>What is important: Can you tell when the data are telling you that something is wrong, different, unusual, etc.?


Unknown Standard Deviation

>In the previous tire example, if the standard deviation had been estimated from the sample, how would things have changed?
>We would be using s not σ
>We would use the t-distribution, not the Normal
• tCRIT = 1.690 (df = 35)
–from a detailed book of tables that I have
–however, since n>30, we can use the Normal to approximate, tCRIT = 1.645


Example - Revised

>A tire manufacturer has developed a new tire that they claim has an average life of 100,000 km. A sample of 36 tires is tested and found to have a mean of 97,500 km and a standard deviation of 12,000 km. Tire life is known to be normally distributed.



>As noted previously, this will be a one-tail test

HO: µ ≥ 100,000 km
HA: µ < 100,000 km

$$\bar{X} = 97{,}500 \qquad s_{\bar{X}} = \frac{12{,}000}{\sqrt{36}} = 2{,}000$$


Example - Solution
Confidence Interval about the Sample Statistic Approach

$$\bar{x} + t_{\alpha}\frac{s}{\sqrt{n}} = 97{,}500 + t_{.05}\left(\frac{12{,}000}{\sqrt{36}}\right) = 97{,}500 + 1.645(2{,}000) = 97{,}500 + 3{,}290 = 100{,}790 \text{ km}$$

We reject HO if µ > 100,790 km
Since µ = 100,000 < 100,790, we cannot reject HO


Example - Solution
Confidence Interval about the Parameter Approach

$$\mu - t_{\alpha}\frac{s}{\sqrt{n}} = 100{,}000 - t_{.05}\left(\frac{12{,}000}{\sqrt{36}}\right) = 100{,}000 - 1.645(2{,}000) = 100{,}000 - 3{,}290 = 96{,}710 \text{ km}$$

We reject HO if x̄ < 96,710 km
Since x̄ = 97,500 > 96,710, we cannot reject HO



>The sample result (97,500) was 1.25 standard deviations below the hypothesized value of 100,000, which is not far enough to cause us to reject the null hypothesis

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{97{,}500 - 100{,}000}{12{,}000/\sqrt{36}} = -1.25$$


Example - Solution
p-value Approach

>When I looked up t = -1.25 in my big t-Table, I found that P(t < -1.25) = 0.1098 or 10.98%

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{97{,}500 - 100{,}000}{12{,}000/\sqrt{36}} = -1.25$$


t-distribution and p-values

>We can readily determine p-values when using the Normal distribution
>Due to the limited information in most t-tables, it is often impossible to determine exact p-values using the t-distribution
• Larger, more detailed tables are available
• Computer software will calculate them
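As one illustration of letting software do the work, the sketch below computes a t-distribution tail area using only Python's standard library. It integrates the Student's t density with Simpson's rule, which is an assumption of this sketch rather than the method behind any particular table; the function names are illustrative. A statistics package would normally provide this directly.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def t_lower_tail(t, df, steps=2000):
    """P(T < t), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3       # P(0 < T < |t|)
    # The density is symmetric about zero, so half the area lies on each side
    return 0.5 - central_area if t < 0 else 0.5 + central_area

p_value = t_lower_tail(-1.25, df=35)   # revised tire example
print(f"P(t < -1.25) with 35 df = {p_value:.4f}")
```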


Examples from Lectures 9 and 10


Gas Station Example

>The company has tried a coupon promotion which will be deemed successful if sales increase significantly from the current 18,000 gallons/day
>A sample of 100 stations has a mean of 18,200
>This is larger than 18,000, but is it significantly larger?


Setting Up the Test

HO: µ ≤ 18,000
HA: µ > 18,000

[Figure: standard normal curve (Z from -4 to 4) centred on 18,000]



>The standard deviation of sales is known to be 8,000 gal/day
>Recall from the previous lecture slides, where the standard deviation of the mean is 8,000/√100 = 800

$$P(\bar{x} > 18{,}200) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} > \frac{18{,}200 - 18{,}000}{800}\right) = P(Z > 0.25) = 0.5 - 0.0987 = 0.4013$$

>In other words, if the mean is actually 18,000 gal/day, there is a 40.13% chance that a random sample of 100 stations will have a mean of 18,200 or more. THIS IS QUITE LARGE.
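The same upper-tail probability can be computed directly. This is a minimal sketch assuming the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_0, x_bar, sigma, n = 18_000, 18_200, 8_000, 100

standard_error = sigma / math.sqrt(n)      # 8,000 / 10 = 800
z = (x_bar - mu_0) / standard_error        # 200 / 800 = 0.25
p_value = 1 - NormalDist().cdf(z)          # upper-tail area P(Z > 0.25)
print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
```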



Interpretation

>Now we can interpret that probability, as it is a p-value.
>With a p-value that large, we would not reject the null hypothesis
>Likewise, the z-value (0.25) is quite small
>There is not sufficient evidence that the average station sales have increased substantially


In-Class Example

>Tins of tuna are labeled as having a drained weight of 120g. According to the packer, they are packed with an average of 122g and a standard deviation of 5g.
>A sample of 25 tins has a mean drained weight of 119g


Hypotheses

>There are two hypotheses that can be tested:
• The canner’s claim of 122g
• The label claim of 120g

$$H_O: \mu \ge 122\text{ g} \qquad H_A: \mu < 122\text{ g}$$

$$H_O: \mu \ge 120\text{ g} \qquad H_A: \mu < 120\text{ g}$$


Tuna Tin Example
Canner’s Claim

$$P(\bar{x} < 119) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{119 - 122}{5/\sqrt{25}}\right) = P\left(Z < \frac{-3}{1}\right) = P(Z < -3.00) = P(Z > 3.00) = 0.5 - 0.4987 = 0.0013$$


Tuna Tin Example
Label’s Claim

$$P(\bar{x} < 119) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{119 - 120}{5/\sqrt{25}}\right) = P\left(Z < \frac{-1}{1}\right) = P(Z < -1.00) = P(Z > 1.00) = 0.5 - 0.3413 = 0.1587$$
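Both tuna calculations differ only in the hypothesized mean, so one small helper covers them. The sketch assumes the standard-library `statistics.NormalDist`; the helper name `lower_tail_p` is illustrative.

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 119, 5, 25
standard_error = sigma / math.sqrt(n)      # 5 / 5 = 1 g

def lower_tail_p(mu_0):
    """P(sample mean <= x_bar) if the true mean were mu_0."""
    z = (x_bar - mu_0) / standard_error
    return NormalDist().cdf(z)

p_canner = lower_tail_p(122)   # canner's claim, Z = -3.00
p_label = lower_tail_p(120)    # label claim,   Z = -1.00
print(f"canner: {p_canner:.4f}, label: {p_label:.4f}")
```

At α = 0.05, only the canner's claim produces a p-value below the significance level.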


CI from Sample Statistic

[Figure: normal curve (Z from -4 to 4) centred on the sample mean X̄ = 119, with the upper confidence limit CL = 120.65, the label claim µL = 120 and the canner's claim µc = 122 marked]


Tuna Tins Example

>Canner’s Claim:
• Reject HO
• There is evidence to refute the Canner’s claim that the tins are filled with 122g of tuna
>Label:
• Do Not Reject HO
• There is not sufficient evidence to support a claim that the tins are underfilled with respect to the label


Tuna Tins Example

>This illustrates the usefulness of the CI around the sample approach:
• Once we have set up our rejection region, we can easily test several hypotheses


In-Class Example

>The bottlers of Mega Cola have to monitor the amount of soft drink that is filled into 2 litre bottles: Too much, and the probability of the bottle bursting increases; too little, and they could be fined for under-filling. Ideally, a sample of 36 bottles has a mean of 2,005 ml. The fill amount is known to be normally distributed with a standard deviation of 12 ml.


Bottle Filling Example

>The 95% Confidence Interval for the mean filling size is 2,001.08 ml to 2,008.92 ml.

$$\bar{X} \pm z\frac{\sigma}{\sqrt{n}} = 2{,}005 \pm 1.96\left(\frac{12}{\sqrt{36}}\right) = 2{,}005 \pm 3.92 \quad\text{or}\quad 2{,}001.08 \text{ to } 2{,}008.92$$
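The interval can be verified with a few lines of Python. The sketch assumes the standard-library `statistics.NormalDist` for the two-tail critical value; variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar, sigma, n, confidence = 2_005, 12, 36, 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z * sigma / math.sqrt(n)                    # about 3.92 ml
lower, upper = x_bar - margin, x_bar + margin
print(f"{lower:,.2f} ml to {upper:,.2f} ml")
```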


Two-Tail Test

[Figure: normal curve (Z from -4 to 4) centred on 2,005, with the limits 2,001.08 and 2,008.92 marked]


Quality Control Application

>Suppose that the mean fill level of 2005ml was considered ideal
>How would you react if the mean of a sample of 36 bottles was:
• 2000ml
• 2006ml
• 2010ml

>This illustrates the usefulness of setting up a CI around the hypothesized value
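One way to work through the three sample means is to set up the limits around the ideal fill level once and then check each mean against them. This is a sketch under the bottle example's assumptions (σ = 12 ml, n = 36, two-tail test at 95%); the helper name `in_control` is illustrative.

```python
import math

target, sigma, n = 2_005, 12, 36
z_crit = 1.96                              # two-tail critical value, 95%

margin = z_crit * sigma / math.sqrt(n)     # 3.92 ml
lower, upper = target - margin, target + margin

def in_control(sample_mean):
    """True if the sample mean lies inside the interval around the target."""
    return lower <= sample_mean <= upper

results = {mean: in_control(mean) for mean in (2_000, 2_006, 2_010)}
print(results)
```

Only the 2006 ml sample falls inside the limits; the 2000 ml and 2010 ml samples fall in the lower and upper rejection regions respectively.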


Example

>The Natural Resources Minister claims the mean weight of trout in Upper Trout Lake is 2.4 kg. A game warden has taken a sample of 9 trout which has a mean weight of 2.23 kg and a standard deviation of 0.351 kg. The weight of trout is normally distributed.


Trout Example

>We do not know the population standard deviation, so we must use the t-distribution
>n = 9, df = 8
>90% CI implies α/2 = 0.05
>tα/2,n-1 = t.05,8 = 1.860

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 2.23 \pm 1.86\left(\frac{0.351}{\sqrt{9}}\right) = 2.23 \pm 0.21762$$
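The interval arithmetic can be checked as follows. Python's standard library has no inverse t-distribution, so this sketch takes the critical value 1.860 from the t-table as given; the variable names are illustrative.

```python
import math

x_bar, s, n = 2.23, 0.351, 9
t_crit = 1.860                         # t(.05, 8) read from a t-table
claimed_mean = 2.4                     # the Minister's claim

margin = t_crit * s / math.sqrt(n)     # 1.86 * 0.117
lower, upper = x_bar - margin, x_bar + margin
claim_inside = lower <= claimed_mean <= upper
print(f"{lower:.3f} kg to {upper:.3f} kg, claim inside: {claim_inside}")
```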



Trout Weight Example

[Figure: t curve (t from -4 to 4) with the sample mean 2.23 and the claimed mean 2.40 marked]


Trout Weight Example

>The Game Warden’s Confidence Interval includes the Minister’s claimed average weight
>The Game Warden’s sample does not refute the Minister’s claim


More Examples


Review Problem

>A cereal manufacturer claims that their high fibre cereal contains 13 g of fibre per 30 g serving. A test of 64 boxes of the cereal had a mean of 13.5 g per 30 g serving and a standard deviation of 1.6 g. Is the claim correct at the 95% level?



>This is a two-tail test, since the claim was a specific amount. There is error in being too high and too low.

HO: µ = 13 g
HA: µ ≠ 13 g

>The standard deviation is unknown, so our test statistic is actually t, not z.
• However, since n>30, we use the normal distribution


Example - Solution

$$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{13.5 - 13.0}{1.6/\sqrt{64}} = \frac{0.5}{0.2} = 2.50$$

>The critical value for a 95% two-tail test is 1.96
>We reject HO if |Z|>1.96
>Since 2.50>1.96, we reject the null hypothesis
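A short sketch of the same two-tail test, with the two-tail p-value added for comparison. It assumes the standard-library `statistics.NormalDist`, and it follows the slide in using the Normal approximation because n > 30; variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar, mu_0, s, n = 13.5, 13.0, 1.6, 64
z_crit = 1.96                                  # 95% two-tail critical value

z = (x_bar - mu_0) / (s / math.sqrt(n))        # 0.5 / 0.2
reject_null = abs(z) > z_crit

# Two-tail p-value: area beyond |Z| in both tails
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject HO: {reject_null}")
```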



Example

>A relief valve has to withstand pressure of 20,000 psi. 9 valves are tested and found to have a mean of 21,000 psi and a standard deviation of 1,500 psi. Are the valves within specifications? Use α = 0.05. Historically, the pressure withstood has been normally distributed.



>This is a two-tail test, since the specifications are for a specific value. There is error in being too high and too low.

HO: µ = 20,000 psi
HA: µ ≠ 20,000 psi

>The standard deviation is unknown and our sample size is smaller than 30, so our test statistic follows a t-distribution


Example - Solution

Working in thousands of psi:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{21.0 - 20.0}{1.5/\sqrt{9}} = \frac{1.0}{0.5} = 2.00$$

>The critical value for a 95% two-tail test with df = 8 is 2.306
>We reject HO if |t|>2.306
>Since 2.00<2.306, we do not reject HO
>The test does not provide evidence that the valves are not within specifications
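The valve test can be checked the same way. The critical value 2.306 is taken from the slide, since Python's standard library has no inverse t-distribution; variable names are illustrative.

```python
import math

x_bar, mu_0, s, n = 21_000, 20_000, 1_500, 9
t_crit = 2.306                               # t(.025, 8) from the slide

t = (x_bar - mu_0) / (s / math.sqrt(n))      # 1,000 / 500
reject_null = abs(t) > t_crit
print(f"t = {t:.2f}, reject HO: {reject_null}")
```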



Hypothesis Tests on Proportions

Proportions

>We created confidence intervals for proportions
>We can perform hypothesis tests as well


Comparing

Quantitative Data:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Qualitative Data:

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$


Tests of Proportions

>Instead of using the sample proportion, p̂, in the standard error of the proportion, as we did when we created confidence intervals, we use the hypothesized proportion, p.

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$


Example

>A magazine publisher claims that 60% of all its subscriptions come from renewals. You sample 200 subscribers and find that 130 had renewed. Does this evidence support the publisher’s claim? Use a 95% level of confidence.
>p = 0.6, p̂ = 0.65, n = 200


Example

>We would reject the publisher’s claim if the sample proportion was more than 1.96 standard errors from 0.60.

$$Z = \frac{0.65 - 0.60}{\sqrt{0.60(0.40)/200}} = \frac{0.05}{0.03464} = 1.443$$

>We do not reject HO. There is no statistical evidence to refute the publisher’s claim.
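The proportion test follows the same pattern as the tests on means. In this sketch the standard error uses the hypothesized proportion, as the Tests of Proportions slide describes; variable names are illustrative.

```python
import math

p_0 = 0.60                     # hypothesized proportion (publisher's claim)
renewals, n = 130, 200
z_crit = 1.96                  # 95% two-tail critical value

p_hat = renewals / n                              # 0.65
standard_error = math.sqrt(p_0 * (1 - p_0) / n)   # about 0.03464
z = (p_hat - p_0) / standard_error
reject_null = abs(z) > z_crit
print(f"Z = {z:.3f}, reject HO: {reject_null}")
```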



Type I and II Errors

The Wrong Conclusion

>If you perform a test at a 95% level of confidence (α = 0.05), there is a 5% chance that you draw the wrong conclusion, rejecting the null when the null was in fact true


Possible Results

                   Actually
Test Result        HO is true      HA is true
Do Not Reject HO   No Error        Type II Error
Reject HO          Type I Error    No Error


Type II Errors

>A Type II Error occurs when we do not reject HO when we should be rejecting it, because it is not true.


Criminal Justice

                   Actually
Trial Result       Innocent, HO    Guilty, HA
Acquitted          No Error        Type II Error
Convicted          Type I Error    No Error


Type II Errors

>We will limit the discussion of Type II errors to understanding
• what they are
• that there is an inverse relationship between the probabilities of Type I and Type II errors
• that, for a given sample size, if you decrease α, the probability of a Type I error, you increase β, the probability of a Type II error
• that increasing the sample size lets you decrease the probability of both Type I and Type II errors
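These relationships can be illustrated with the lower-tail tire test. The sketch below computes β for a hypothetical true mean of 97,000 km; that value is an assumption made purely for illustration and does not come from the lecture. The function name `beta` and its defaults are likewise illustrative, and the code relies on the standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def beta(alpha, n, mu_0=100_000, mu_true=97_000, sigma=12_000):
    """P(Type II error) for the lower-tail test HO: mu >= mu_0.

    mu_true is a hypothetical true mean chosen for illustration only.
    """
    standard_error = sigma / math.sqrt(n)
    # Sample means below this cutoff lead to rejecting HO
    cutoff = mu_0 - NormalDist().inv_cdf(1 - alpha) * standard_error
    # Type II error: the sample mean lands above the cutoff although HO is false
    return 1 - NormalDist(mu_true, standard_error).cdf(cutoff)

print(f"alpha=0.05, n=36:  beta = {beta(0.05, 36):.3f}")
print(f"alpha=0.01, n=36:  beta = {beta(0.01, 36):.3f}")   # smaller alpha, larger beta
print(f"alpha=0.05, n=144: beta = {beta(0.05, 144):.3f}")  # larger n, smaller beta
```

With the sample size fixed, lowering α from 0.05 to 0.01 raises β; holding α at 0.05 and quadrupling the sample size lowers β.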



YOU LEARN STATISTICS BY DOING STATISTICS