
Copyright © 2013, 2009, and 2007, Pearson Education, Inc.

CHAPTER 12

Significance Tests About Hypotheses

• TESTING HYPOTHESES ABOUT PROPORTIONS


HYPOTHESIS TESTING ABOUT PROPORTIONS

OBJECTIVES

• DEFINE THE NULL AND ALTERNATIVE HYPOTHESES

• STATE CONCLUSIONS TO HYPOTHESIS TESTS

• DISTINGUISH BETWEEN TYPE I AND TYPE II ERRORS


DEFINITION

• A HYPOTHESIS IS A STATEMENT REGARDING A CHARACTERISTIC OF ONE OR MORE POPULATIONS.

• HYPOTHESIS TESTING IS A PROCEDURE, BASED ON SAMPLE EVIDENCE AND PROBABILITY, USED TO TEST STATEMENTS REGARDING A CHARACTERISTIC OF ONE OR MORE POPULATIONS.


PROBLEM FORMULATION

• SUPPOSE WE TOSSED A COIN 100 TIMES AND WE OBTAINED 38 HEADS AND 62 TAILS. IS THE COIN BIASED?

• THERE IS NO WAY TO SAY YES OR NO WITH 100% CERTAINTY. BUT WE MAY EVALUATE THE STRENGTH OF SUPPORT FOR THE HYPOTHESIS [OR CLAIM] THAT “THE COIN IS BIASED.”
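One way to quantify that strength of support (a minimal sketch, anticipating the P-value developed later in the chapter, and using the normal approximation to the sample proportion):

```python
from math import sqrt, erf

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# Coin example: 38 heads in n = 100 tosses, testing H0: p = 0.5
p_hat, p0, n = 0.38, 0.5, 100
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # z = -2.4
p_value = 2 * normal_cdf(-abs(z))            # two-sided tail probability
print(round(z, 2), round(p_value, 4))
```

A two-sided P-value below 0.05 here would be taken as fairly strong evidence against the fair-coin hypothesis.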


THE NULL AND ALTERNATIVE HYPOTHESES

Null hypothesis: Represented by H0, is a statement that there is nothing happening. Generally thought of as the status quo, or no relationship, or no difference. Usually the researcher hopes to disprove or reject the null hypothesis.

Alternative hypothesis: Represented by Ha, is a statement that something is happening. In most situations, it is what the researcher hopes to prove. It may be a statement that the assumed status quo is false, or that there is a relationship, or that there is a difference.


NULL AND ALTERNATIVE HYPOTHESES

• NULL HYPOTHESIS
1. ESTABLISHED FACT;
2. A STATEMENT THAT WE EXPECT DATA TO CONTRADICT;
3. NO CHANGE OF PARAMETERS.

• ALTERNATIVE HYPOTHESIS
1. NEW CONJECTURE;
2. YOUR CLAIM;
3. A STATEMENT THAT NEEDS STRONG SUPPORT FROM DATA TO CLAIM IT;
4. CHANGE OF PARAMETERS.


NULL AND ALTERNATIVE HYPOTHESES

THE NULL HYPOTHESIS IS A STATEMENT TO BE TESTED. IT IS A STATEMENT OF NO CHANGE, NO EFFECT, OR NO DIFFERENCE AND IS ASSUMED TRUE UNTIL EVIDENCE INDICATES OTHERWISE.

THE ALTERNATIVE HYPOTHESIS IS A STATEMENT THAT WE ARE TRYING TO FIND EVIDENCE TO SUPPORT.


WRITING NULL AND ALTERNATIVE HYPOTHESES

Possible null and alternative hypotheses:

1. H0: p = p0 versus Ha: p ≠ p0 (two-sided)

2. H0: p = p0 versus Ha: p < p0 (one-sided)

3. H0: p = p0 versus Ha: p > p0 (one-sided)

p0 = specific value called the null value.


DEMONSTRATIVE EXAMPLE

H0: coin is fair; p = 0.5

Ha: coin is biased; p ≠ 0.5

where p is the probability that the coin turns heads.

EXAMPLES

WRITE THE NULL AND ALTERNATIVE HYPOTHESES YOU WOULD USE TO TEST EACH OF THE FOLLOWING SITUATIONS.

(A) IN THE 1950s ONLY ABOUT 40% OF HIGH SCHOOL GRADUATES WENT ON TO COLLEGE. HAS THE PERCENTAGE CHANGED?

(B) 20% OF CARS OF A CERTAIN MODEL HAVE NEEDED COSTLY TRANSMISSION WORK AFTER BEING DRIVEN BETWEEN 50,000 AND 100,000 MILES. THE MANUFACTURER HOPES THAT REDESIGN OF A TRANSMISSION COMPONENT HAS SOLVED THIS PROBLEM.

(C) WE FIELD TEST A NEW FLAVOR SOFT DRINK, PLANNING TO MARKET IT ONLY IF WE ARE SURE THAT OVER 60% OF THE PEOPLE LIKE THE FLAVOR.


The Steps of a Significance Test [Applicable For Proportions and Means]

A significance test is a method of using data to summarize the evidence about a hypothesis.

A significance test about a hypothesis has five steps.

1. Assumptions

2. Hypotheses

3. Test Statistic

4. P-value

5. Conclusion


Step 1: Assumptions

A significance test assumes that the data production used randomization.

Other assumptions may include:

• Assumptions about the sample size [The 10% Condition and the Success – Failure Condition].

• Assumptions about the shape of the population distribution.


Step 2: Hypothesis

A hypothesis is a statement about a population, usually of the form that a certain parameter takes a particular numerical value or falls in a certain range of values.

The main goal in many research studies is to check whether the data support certain hypotheses.


Each significance test has two hypotheses:

1. The null hypothesis is a statement that the parameter takes a particular value. It has a single parameter value.

2. The alternative hypothesis states that the parameter falls in some alternative range of values.

Step 2: Hypothesis


Null and Alternative Hypotheses

The value in the null hypothesis usually represents no effect. The symbol H0 denotes the null hypothesis.

The value in the alternative hypothesis usually represents an effect of some type. The symbol Ha denotes the alternative hypothesis.

The alternative hypothesis should express what the researcher hopes to show. The hypotheses should be formulated before viewing or analyzing the data!

Step 3: Test Statistic

A test statistic describes how far the point estimate falls from the parameter value given in the null hypothesis (usually in terms of the number of standard errors between the two).

If the test statistic falls far from the value suggested by the null hypothesis in the direction specified by the alternative hypothesis, it is evidence against the null hypothesis and in favor of the alternative hypothesis.

We use the test statistic to assess the evidence against the null hypothesis by giving a probability, the P-value.


Step 4: P-value

To interpret a test statistic value, we use a probability summary of the evidence against the null hypothesis, H0.

First, we presume that H0 is true.

Next, we consider the sampling distribution from which the test statistic comes.

We summarize how far out in the tail of this sampling distribution the test statistic falls.

We summarize how far out in the tail the test statistic falls by the tail probability of that value and values even more extreme.

This probability is called a P-value.

The smaller the P-value, the stronger the evidence is against H0.

Step 4: P-value

Figure 12.1 Suppose H0 Were True. The P-value Is the Probability of a Test Statistic Value Like the Observed One or Even More Extreme. This is the shaded area in the tail of the sampling distribution. Question: Which gives stronger evidence against the null hypothesis, a P-value of 0.20 or of 0.01? Why?

Step 4: P-value

The P-value is the probability that the test statistic equals the observed value or a value even more extreme.

It is calculated by presuming that the null hypothesis is true.

The smaller the P-value, the stronger the evidence the data provide against the null hypothesis. That is, a small P-value indicates a small likelihood of observing the sampled results if the null hypothesis were true.

Step 4: P-value


Step 5: Conclusion

The conclusion of a significance test reports the P-value and interprets what it says about the question that motivated the test.

Sometimes this includes a decision about the validity of the null hypothesis H0.


SIGNIFICANCE TESTS

BACK TO HYPOTHESIS TESTING ABOUT

PROPORTIONS


Steps of a Significance Test about a Population Proportion

Step 1: Assumptions

The variable is categorical.

The data are obtained using randomization.

The 10% Condition holds.

The sample size is sufficiently large that the sampling distribution of the sample proportion is approximately normal: np0 ≥ 15 and n(1 − p0) ≥ 15.

Step 2: Hypotheses

The null hypothesis has the form:

H0: p = p0

The alternative hypothesis has one of the forms:

Ha: p > p0 (one-sided test), or
Ha: p < p0 (one-sided test), or
Ha: p ≠ p0 (two-sided test)

Steps of a Significance Test about a Population Proportion

Step 3: Test Statistic

The test statistic measures how far the sample proportion falls from the null hypothesis value, p0, relative to what we’d expect if H0 were true.

The test statistic is:

z = (p̂ − p0) / √(p0(1 − p0) / n)

Steps of a Significance Test about a Population Proportion

Step 3: Test Statistic

The z-statistic for the significance test is

z = (p̂ − p0) / √(p0(1 − p0) / n) = (sample estimate − null value) / (null standard error)

p̂ represents the sample estimate of the proportion

p0 represents the specific value in the null hypothesis

n is the sample size

Step 4: P-value

The P-value summarizes the evidence.

It describes how unusual the observed data would be if H0 were true.

Steps of a Significance Test about a Population Proportion

Computing P-Value

For Ha less than, find probability the test statistic z could have been equal to or less than what it is.

For Ha greater than, find probability the test statistic z could have been equal to or greater than what it is.

For Ha two-sided, p-value includes the probability areas in both extremes of the distribution of the test statistic z.
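These three rules can be sketched as a small helper function (illustrative, not from the text), using the standard normal approximation for the test statistic z:

```python
from math import sqrt, erf

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def p_value(z, alternative):
    """P-value for a z test statistic under each alternative hypothesis."""
    if alternative == "less":        # Ha: p < p0, left-tail probability
        return normal_cdf(z)
    if alternative == "greater":     # Ha: p > p0, right-tail probability
        return 1 - normal_cdf(z)
    if alternative == "two-sided":   # both tails of the distribution
        return 2 * (1 - normal_cdf(abs(z)))
    raise ValueError(alternative)

# The two-sided P-value is double the matching single-tail probability
print(p_value(-2.4, "less"), p_value(-2.4, "two-sided"))
```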


Computing P-Value

Step 5: Conclusion

We summarize the test by reporting and interpreting the P-value.

Steps of a Significance Test about a Population Proportion


Using the P-Value to Reach a Conclusion

The level of significance, denoted by α (alpha), is a value chosen by the researcher to be the borderline between when a p-value is small enough to choose the alternative hypothesis over the null hypothesis, and when it is not.

When the p-value is less than or equal to α, we reject the null hypothesis. When the p-value is larger than α, we cannot reject the null hypothesis. The level of significance may also be called the α-level of the test.

Decision: reject H0 if the p-value is smaller than α (usually 0.05, sometimes 0.10 or 0.01). In this case the result is statistically significant.

Using the P-Value to Reach a Conclusion

When the p-value is small, we reject the null hypothesis or, equivalently, we accept the alternative hypothesis. “Small” is defined as a p-value ≤ α, where α = level of significance (usually 0.05).

When the p-value is not small, we conclude that we cannot reject the null hypothesis or, equivalently, there is not enough evidence to reject the null hypothesis. “Not small” is defined as a p-value > α, where α = level of significance (usually 0.05).

Examples From Practice Sheet


How Do We Interpret the P-value?

A significance test analyzes the strength of the evidence against the null hypothesis.

We start by presuming that H0 is true.

The burden of proof is on Ha.

The approach used in hypothesis testing is called a proof by contradiction.

To convince ourselves that Ha is true, we must show that the data contradict H0.

If the P-value is small, the data contradict H0 and support Ha.

How Do We Interpret the P-value?

Two-Sided Significance Tests

A two-sided alternative hypothesis has the form Ha: p ≠ p0.

The P-value is the two-tail probability under the standard normal curve.

We calculate this by finding the tail probability in a single tail and then doubling it.

Example: Dogs Detecting Cancer by Smell

Study: Investigate whether dogs can be trained to distinguish a patient with bladder cancer by smelling compounds released in the patient’s urine.


Experiment:

Each of 6 dogs was tested with 9 trials.

In each trial, one urine sample from a bladder cancer patient was randomly placed among 6 control urine samples.

Example: Dogs Detecting Cancer by Smell


Results:

In a total of 54 trials with the six dogs, the dogs made the correct selection 22 times (a success rate of 0.407).

Example: Dogs Detecting Cancer by Smell


Question to Explore:

Does this study provide strong evidence that the dogs’ predictions were better or worse than with random guessing?

Example: Dogs Detecting Cancer by Smell


Step 1: Check the sample size requirement:

Is the sample size sufficiently large to use the hypothesis test for a population proportion? That is, is np0 ≥ 15 and n(1 − p0) ≥ 15?

np0 = 54(1/7) = 7.7 and n(1 − p0) = 54(6/7) = 46.3

The first, np0, is not large enough.

We will see that the two-sided test is robust even when this assumption is not satisfied.

Example: Dogs Detecting Cancer by Smell

Step 2: Hypotheses

H0: p = 1/7
Ha: p ≠ 1/7

Example: Dogs Detecting Cancer by Smell

Step 3: Test Statistic

z = (0.407 − 1/7) / √((1/7)(6/7) / 54) = 5.6

Example: Dogs Detecting Cancer by Smell
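As a check on the Step 3 arithmetic, the same z statistic can be computed directly from the counts given in the example:

```python
from math import sqrt

# Dogs example: 22 correct selections in n = 54 trials, testing H0: p = 1/7
p_hat = 22 / 54                  # sample proportion, about 0.407
p0, n = 1 / 7, 54
se0 = sqrt(p0 * (1 - p0) / n)    # null standard error
z = (p_hat - p0) / se0
print(round(z, 1))               # ≈ 5.6, matching the slide
```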


Step 4: P-value

Figure 12.2 Calculation of P-value, When z = 5.6, for Testing H0: p = 1/7 Against Ha: p ≠ 1/7. Presuming H0 is true, the P-value is the two-tail probability of a test statistic value even more extreme than observed. Question: Is the P-value of 0.00000002 strong evidence supporting H0 or strong evidence against H0?

Example: Dogs Detecting Cancer by Smell

Step 5: Conclusion

Since the P-value is very small and the sample proportion is greater than 1/7, the evidence strongly suggests that the dogs’ selections are better than random guessing.

Example: Dogs Detecting Cancer by Smell


Insight:

In this study, the subjects were a convenience sample rather than a random sample from some population.

Also, the dogs were not randomly selected.

Any inferential predictions are highly tentative. They are valid only to the extent that the patients and the dogs are representative of their populations.

The predictions become more conclusive if similar results occur in other studies.

Example: Dogs Detecting Cancer by Smell


Summary: P-values for Different Alternative Hypotheses

Alternative Hypothesis | P-value

Ha: p > p0 | Right-tail probability

Ha: p < p0 | Left-tail probability

Ha: p ≠ p0 | Two-tail probability

Significance Level

• Supplementary Notes


The Significance Level Tells Us How Strong the Evidence Must Be

Sometimes we need to make a decision about whether the data provide sufficient evidence to reject H0.

Before seeing the data, we decide how small the P-value would need to be to reject H0.

This cutoff point is called the significance level.

Figure 12.3 The Decision in a Significance Test. Reject H0 if the P-value is less than or equal to a chosen significance level, usually 0.05.

The Significance Level Tells Us How Strong the Evidence Must Be

Significance Level

The significance level is a number such that we reject H0 if the P-value is less than or equal to that number.

In practice, the most common significance level is 0.05.

When we reject H0 we say the results are statistically significant.

Possible Decisions in a Test of Significance

Table 12.1 Possible Decisions in a Test of Significance.


Report the P-value

Learning the actual P-value is more informative than learning only whether the test is “statistically significant at the 0.05 level”.

The P-values of 0.01 and 0.049 are both statistically significant in this sense, but the first P-value provides much stronger evidence against H0 than the second.


“Do Not Reject H0” Is Not the Same as Saying “Accept H0”

Analogy: Legal trial

Null Hypothesis: Defendant is Innocent.

Alternative Hypothesis: Defendant is Guilty.

If the jury acquits the defendant, this does not mean that it accepts the defendant’s claim of innocence.

Innocence is plausible, because guilt has not been established beyond a reasonable doubt.

Deciding between a One-Sided and a Two-Sided Test?

Things to consider in deciding on the alternative hypothesis:

The context of the real problem.

In most research articles, significance tests use two-sided P-values.

Confidence intervals are two-sided.


TWO SAMPLE POPULATIONS

HYPOTHESIS TESTING ABOUT

DIFFERENCE IN TWO PROPORTIONS


STEPS

Step 1

Determine null and alternative hypotheses

H0: p1 – p2 = 0 versus

Ha: p1 – p2 ≠ 0 or Ha: p1 – p2 < 0 or Ha: p1 – p2 > 0

Watch how Population 1 and 2 are defined.

STEP 2: Verify Data Conditions

Samples are independent.

Sample sizes are large enough so that n1p̂1, n1(1 − p̂1), n2p̂2, and n2(1 − p̂2) are all at least 5 and preferably at least 10.

Step 2: The Test Statistic

Under the null hypothesis, there is a common population proportion p. This common value is estimated using all the data as

p̂ = (n1p̂1 + n2p̂2) / (n1 + n2)

The standardized test statistic is

z = (p̂1 − p̂2 − 0) / √(p̂(1 − p̂)(1/n1 + 1/n2)) = (sample statistic − null value) / (null standard error)
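The pooled two-proportion test statistic can be sketched as code (the function name is my own, and the large-sample conditions above are assumed to hold):

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)   # common proportion estimated under H0
    se0 = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat - 0) / se0

# Equal sample proportions give z = 0; a larger first proportion gives z > 0
print(two_prop_z(50, 100, 50, 100), two_prop_z(60, 100, 50, 100))
```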


Step 3: Assume the Null Hypothesis is True, Find the P-value

For Ha less than, the p-value is the area below z, even if z is positive.

For Ha greater than, the p-value is the area above z, even if z is negative.

For Ha two-sided, the p-value is 2 × the area above |z|.


Steps 4 and 5

• Decide Whether or Not the Result is Statistically Significant based on p-value and Make a Conclusion in Context

• Choose a level of significance α, and reject H0 if the p-value is less than (or equal to) α.


Example: Prevention of Ear Infections

Question: Does the use of the sweetener xylitol reduce the incidence of ear infections?

Randomized Experiment Results:

• Of 165 children on placebo, 68 got an ear infection.

• Of 159 children on xylitol, 46 got an ear infection.

Example:

Step 1: State the null and alternative hypotheses

H0: p1 – p2 = 0 versus Ha: p1 – p2 > 0

where
p1 = population proportion with ear infections on placebo
p2 = population proportion with ear infections on xylitol

Example:

Step 2: Verify conditions and compute z statistic

There are at least 10 children in each sample who did and did not get ear infections, so conditions are met.

p̂1 = 68/165 = .412, p̂2 = 46/159 = .289, and p̂1 − p̂2 = .123

Example:

p̂ = (n1p̂1 + n2p̂2) / (n1 + n2) = (68 + 46) / (165 + 159) = 114/324 = .35

z = (p̂1 − p̂2 − 0) / √(p̂(1 − p̂)(1/n1 + 1/n2)) = (.123 − 0) / √(.35(1 − .35)(1/165 + 1/159)) = 2.32

Example:

Steps 3, 4 and 5: Determine the p-value and make a conclusion in context.

The p-value is the area above z = 2.32, found using the normal table. We have p-value = 0.0102, so we reject the null hypothesis; the results are “statistically significant”.
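Recomputing the example from the raw counts (a sketch; the slides' z = 2.32 comes from the rounded values .123 and .35, while the unrounded z is about 2.31):

```python
from math import sqrt, erf

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

x1, n1 = 68, 165   # placebo group: 68 ear infections out of 165
x2, n2 = 46, 159   # xylitol group: 46 ear infections out of 159
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 1 - normal_cdf(z)    # one-sided, Ha: p1 - p2 > 0
print(round(z, 2), round(p_value, 4))
```

Either way, the one-sided p-value is about 0.01, well below 0.05.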


Example:

We can conclude that taking xylitol would reduce the proportion of ear infections in the population of similar preschool children in comparison to taking a placebo.

Examples From Practice Sheet


DECISIONS AND TYPES OF ERRORS

IN SIGNIFICANCE TESTS


Type I and Type II Errors

When H0 is true, a Type I Error occurs when H0 is rejected.

When H0 is false, a Type II Error occurs when H0 is not rejected.

As P(Type I Error) goes down, P(Type II Error) goes up.

The two probabilities are inversely related.

Significance Test Results

Table 12.2 The Four Possible Results of a Decision in a Significance Test. Type I and Type II errors are the two possible incorrect decisions. We make a correct decision if we do not reject H0 when it is true or if we reject it when it is false.

Table 12.3 Possible Results of a Legal Trial

Significance Test Results


Decision Errors: Type I

If we reject H0 when in fact H0 is true, this is a Type I error.

If we decide there is a significant relationship in the population (reject the null hypothesis):

This is an incorrect decision only if H0 is true. The probability of this incorrect decision is equal to α.

If we reject the null hypothesis when it is true and α = 0.05:

There really is no relationship and the extremity of the test statistic is due to chance.

About 5% of all samples from this population will lead us to incorrectly reject the null hypothesis and conclude significance.

P(Type I Error) = Significance Level

Suppose H0 is true. The probability of rejecting H0, thereby committing a Type I error, equals the significance level, α, for the test.

We can control the probability of a Type I error by our choice of the significance level.

The more serious the consequences of a Type I error, the smaller α should be. For example, suppose a convicted defendant gets the death penalty. Then, if a defendant is actually innocent, we would hope that the probability of conviction is smaller than 0.05.

Decision Errors: Type II

If we fail to reject H0 when in fact Ha is true, this is a Type II error.

If we decide not to reject the null hypothesis, we allow for the plausibility of the null hypothesis. We make an incorrect decision only if Ha is true.

The probability of this incorrect decision is denoted by β.

We’ll learn how to calculate P(Type II error) later in the chapter.

LIMITATIONS OF SIGNIFICANCE TESTS


Statistical Significance Does Not Mean Practical Significance

When we conduct a significance test, its main relevance is studying whether the true parameter value is:

Above, or below, the value in H0, and

Sufficiently different from the value in H0 to be of practical importance.

What the Significance Test Tells Us

A significance test gives us information about whether the parameter differs from the H0 value and its direction from that value.

It does not tell us about the practical significance (or importance) of the finding.

When the sample size is very large, tiny deviations from the null hypothesis (with little practical consequence) may be found to be statistically significant.

When the sample size is very small, large deviations from the null hypothesis (of great practical importance) might go undetected (statistically insignificant).

Statistical significance is not the same thing as practical significance.

Statistical Significance vs. Practical Significance


A small P-value, such as 0.001, is highly statistically significant, but it does not imply an important finding in any practical sense.

In particular, whenever the sample size is large, small P-values can occur when the point estimate is near the parameter value in H0.

Statistical Significance vs. Practical Significance

Significance Tests Are Less Useful Than Confidence Intervals

Although significance tests are useful, most statisticians believe that this method has been overemphasized in research.

A significance test merely indicates whether the particular parameter value in H0 is plausible.

When a P-value is small, the significance test indicates that the hypothesized value is not plausible, but it tells us little about which potential parameter values are plausible.

A Confidence Interval is more informative, because it displays the entire set of believable values.

Results of significance tests are often misinterpreted. Here are some important comments about significance tests, some of which we’ve already discussed:

“Do Not Reject H0” does not mean “Accept H0”. A P-value above 0.05 when the significance level is 0.05 does not mean that H0 is correct. A test merely indicates whether a particular parameter value is plausible.

A confidence interval shows that there is a range of plausible values, not just a single one.

Misinterpretations of Results of Significance Tests

Statistical significance does not mean practical significance. A small P-value does not tell us whether the parameter value differs by much in practical terms from the value in H0.

The P-value cannot be interpreted as the probability that H0 is true. The P-value is

P(test statistic takes observed value or beyond in tails | H0 true), NOT P(H0 true | observed test statistic value).

Misinterpretations of Results of Significance Tests

It is misleading to report results only if they are “statistically significant”.

Some tests may be statistically significant just by chance.

True effects may not be as large as initial estimates reported by the media.

Misinterpretations of Results of Significance Tests

THE LIKELIHOOD OF A TYPE II ERROR

(Not Rejecting H0, Even Though It’s False)

Type II Error

A Type II error occurs in a hypothesis test when we fail to reject H0 even though it is actually false.

This probability has more than one value because Ha contains a range of possible values for the parameter.

To calculate the probability of a Type II error, we must perform separate calculations for various values of the parameter of interest.

Type II Error

Summary: For a fixed significance level α, P(Type II error) decreases

as the parameter value moves farther into the Ha values and away from the H0 value, and

as the sample size increases.

Power of a Test

When H0 is false, you want the probability of rejecting it to be high. The probability of rejecting H0 when it is false is called the power of the test.

Power = 1 – P(Type II error) for a particular value of the parameter from the range of alternative hypothesis values.

The higher the power, the better.

In practice, it is ideal for studies to have high power while using a relatively small significance level.

In this example, the Power of the test when p = 0.50 is:

Power = 1 – P(Type II error) = 1 – 0.02 = 0.98

Since, the higher the power the better, a test with power of 0.98 is quite good.

Power of a Test
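The power calculation referred to here can be sketched as follows. The example's inputs (p0, n, α) are not shown in this excerpt, so the values used below are illustrative assumptions, not the slide's actual setup:

```python
from math import sqrt, erf

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def power_one_prop(p0, p_true, n, alpha_z=1.96):
    """Approximate power of the two-sided one-proportion z test.

    Reject H0 when the sample proportion falls outside
    p0 +/- alpha_z * (null standard error); power is the probability
    of that event when the true proportion is p_true.
    """
    se0 = sqrt(p0 * (1 - p0) / n)              # standard error under H0
    se_true = sqrt(p_true * (1 - p_true) / n)  # standard error under p_true
    lower, upper = p0 - alpha_z * se0, p0 + alpha_z * se0
    return (normal_cdf((lower - p_true) / se_true)
            + 1 - normal_cdf((upper - p_true) / se_true))

# Illustrative only: testing H0: p = 1/3 with n = 116 when truly p = 0.50
print(round(power_one_prop(1/3, 0.50, 116), 2))
```

Consistent with the summary two slides earlier, increasing n (or moving p_true farther from p0) raises the power.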


Factors Affecting the Power of a Test