© 2010 Pearson Prentice Hall. All rights reserved Chapter Nonparametric Statistics 15

Upload: eagan-becker

Post on 31-Dec-2015


DESCRIPTION

Chapter 15: Nonparametric Statistics. Section 15.1: An Overview of Nonparametric Statistics. Objective: Understand the difference between parametric statistical procedures and nonparametric statistical procedures. - PowerPoint PPT Presentation

TRANSCRIPT


Chapter 15: Nonparametric Statistics


Section 15.1: An Overview of Nonparametric Statistics


Objective

1. Understand the difference between parametric statistical procedures and nonparametric statistical procedures


Parametric statistical procedures are inferential procedures conducted under the assumption that the underlying distribution of the data belongs to some parametric family of distributions (such as the normal distribution).


Nonparametric statistical procedures are inferential procedures that make few assumptions about the underlying distribution of the data. They do not require that the population belong to any particular parametric family of distributions (such as the normal distribution) and, therefore, are often referred to as distribution-free procedures.


Advantages of Nonparametric Statistical Procedures

• Most of the tests have very few requirements, so it is unlikely that these tests will be used improperly.


• For some nonparametric procedures, the computations are fairly easy.


• The procedures can be used for count data or rank data, so nonparametric methods can be used on data such as ratings of a movie as excellent, good, fair, or poor.


Disadvantages of Nonparametric Statistical Procedures

• Nonparametric procedures are less efficient than parametric procedures. This means that a larger sample size is required when conducting a nonparametric procedure to have the same probability of a Type II error (at the same level of significance) as the equivalent parametric procedure.


• Nonparametric procedures often discard useful information. For example, the sign test uses only the sign of the data, and rank tests merely preserve order; the magnitude of the actual data values is lost. As a result, nonparametric procedures are typically less powerful. Recall that the power of a test is the probability of rejecting the null hypothesis when the alternative hypothesis is true, that is, 1 minus the probability of making a Type II error. A Type II error occurs when a researcher does not reject the null hypothesis when the alternative hypothesis is true.


• Because fewer requirements must be satisfied to conduct these tests, researchers sometimes use these procedures even when the more powerful parametric procedures could be used.


Nonparametric Test | Parametric Test | Efficiency
Runs test for randomness | No corresponding test | --
Sign test | Single-sample z-test or t-test | 0.955 (for small samples that come from a normal population); 0.75 (for large samples if data are normal)
Wilcoxon matched-pairs test | Inference about the difference of two means, dependent samples | 0.955 (if the differences are normal)
Mann-Whitney test | Inference about the difference of two means, independent samples | 0.955 (if data are normal)
Spearman rank-correlation coefficient | Linear correlation | 0.912 (if the data are bivariate normal)
Kruskal-Wallis test | One-way ANOVA | 0.955 (if the data are normal)


“In Other Words”

The lower the efficiency, the larger the sample size must be for a nonparametric test to have the same probability of a Type II error as its equivalent parametric test.


Section 15.2: Runs Test for Randomness


Objective

1. Perform a runs test for randomness


A runs test for randomness is used to test whether data have been obtained, or occur, randomly. A run is a sequence of similar events, items, or symbols that is followed by an event, item, or symbol of the other, mutually exclusive type (or by the end of the sequence). The number of events, items, or symbols in a run is called its length.


CAUTION!

Runs tests are used to test whether it is reasonable to conclude that data occur randomly, not whether the data are collected randomly. For example, we might wonder whether defective parts come off an assembly line randomly or systematically. If broken parts occur systematically (such as every fourth part), we might be led to believe that we have a broken machine. We don’t collect the data randomly; instead, we select 100 consecutive parts. We want to know whether the defective parts in the 100 selected occur randomly.


Notation Used in Conducting a Runs Test for Randomness

• Let n represent the sample size, where each observation is one of two mutually exclusive types.

• Let n1 represent the number of observations of the first type.

• Let n2 represent the number of observations of the second type.

• Let r represent the number of runs.


Parallel Example 1: Notation in a Runs Test for Randomness

The following data represent the league that won the World Series for the years 1996-2007. Let “AL” represent the American League and “NL” represent the National League.

AL NL AL AL AL NL AL NL AL AL NL AL

Identify the values of n, n1, n2 and r.


Solution

Let n represent the number of World Series in the sample. Let n1 represent the number of World Series won by the American League and n2 the number of World Series won by the National League. Lastly, let r represent the number of runs.

Then there are n = 12 World Series in the sample, n1 = 8 World Series won by the American League, n2 = 4 World Series won by the National League, and r = 9 runs.
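The counts in this solution can be checked mechanically. The following is a minimal Python sketch (the function name `runs_summary` is illustrative, not from the text) that treats a run as a maximal block of identical consecutive labels, which matches the definition of a run used here.

```python
from itertools import groupby


def runs_summary(seq, first_type):
    """Return (n, n1, n2, r) for a sequence with two mutually exclusive types.

    first_type is the label counted by n1; every other label is counted by n2.
    groupby yields one group per block of identical consecutive labels,
    so the number of groups is the number of runs r.
    """
    n = len(seq)
    n1 = sum(1 for x in seq if x == first_type)
    n2 = n - n1
    r = sum(1 for _ in groupby(seq))
    return n, n1, n2, r


# World Series winners, 1996-2007, in order of occurrence
winners = "AL NL AL AL AL NL AL NL AL AL NL AL".split()
print(runs_summary(winners, "AL"))  # (12, 8, 4, 9)
```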


Test Statistic for a Runs Test for Randomness

Small-Sample Case: If n1≤20 and n2≤20, the test statistic in the runs test for randomness is r, the number of runs.

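Large-Sample Case: If n1 > 20 or n2 > 20, the test statistic in the runs test for randomness is

$$z_0 = \frac{r - \mu_r}{\sigma_r}$$

where

$$\mu_r = \frac{2n_1n_2}{n} + 1 \quad\text{and}\quad \sigma_r = \sqrt{\frac{2n_1n_2(2n_1n_2 - n)}{n^2(n-1)}}$$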
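As a sketch of the large-sample computation, the function below evaluates μ_r, σ_r, and z0 exactly as in the formulas for the large-sample case and compares |z0| with z_{α/2} from the standard normal distribution (a two-tailed test). The function names are illustrative, and the counts in the usage line (n1 = 25, n2 = 20, r = 18) are hypothetical, chosen only to exercise the large-sample case.

```python
import math
from statistics import NormalDist


def runs_z(n1, n2, r):
    """Large-sample runs test statistic z0 = (r - mu_r) / sigma_r."""
    n = n1 + n2
    mu_r = 2 * n1 * n2 / n + 1
    sigma_r = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    return (r - mu_r) / sigma_r


def runs_test_large(n1, n2, r, alpha=0.05):
    """Return (z0, reject) for the two-tailed large-sample runs test."""
    z0 = runs_z(n1, n2, r)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    return z0, abs(z0) > z_crit


# Hypothetical counts: 25 observations of one type, 20 of the other, 18 runs
z0, reject = runs_test_large(25, 20, 18)
print(round(z0, 2), reject)  # -1.6 False
```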


Critical Values for a Runs Test for Randomness

Small-Sample Case: To find the critical value at the α = 0.05 level of significance for a runs test, we use Table X if n1 ≤ 20 and n2 ≤ 20.

Large-Sample Case: If n1>20 or n2>20, the critical value is found from Table V, the standard normal table.


Parallel Example 2: Obtaining Critical Values from Table X

Find the upper and lower critical values if n1=8 and n2=4.


Solution

From Table X, the lower critical value is 3 and the upper critical value is 10.
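Table X can be approximated by enumerating the exact null distribution of r: when the data are random, each of the C(n, n1) orderings (here C(12, 8) = 495) is equally likely, and the number of orderings with a given number of runs has a closed form. The sketch below assumes the table convention is the largest lower value c with P(R ≤ c) ≤ α/2 and the smallest upper value c with P(R ≥ c) ≤ α/2; under that assumption it reproduces 3 and 10 for n1 = 8 and n2 = 4. An upper value of 10 exceeds the largest possible number of runs here (2·4 + 1 = 9), so in this case only a small number of runs can lead to rejection. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def runs_pmf(n1, n2):
    """Exact null distribution {r: P(R = r)} of the number of runs (n1, n2 >= 1)."""
    total = comb(n1 + n2, n1)
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # r = 2k: k runs of each type, starting with either type
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # r = 2k + 1: k + 1 runs of one type and k runs of the other
            ways = comb(n1 - 1, k) * comb(n2 - 1, k - 1) + comb(n1 - 1, k - 1) * comb(n2 - 1, k)
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


def runs_critical_values(n1, n2, alpha=0.05):
    """Lower and upper critical values for the two-tailed runs test (assumed convention)."""
    pmf = runs_pmf(n1, n2)
    half = Fraction(alpha).limit_denominator() / 2
    lower = None  # None would mean no number of runs is small enough to reject
    cumulative = Fraction(0)
    for r in sorted(pmf):
        cumulative += pmf[r]
        if cumulative > half:
            break
        lower = r
    upper = max(pmf) + 1  # default: the upper tail can never reject
    cumulative = Fraction(0)
    for r in sorted(pmf, reverse=True):
        cumulative += pmf[r]
        if cumulative > half:
            break
        upper = r
    return lower, upper


pmf = runs_pmf(8, 4)
print(pmf[2] + pmf[3])             # P(R <= 3) = 4/165, about 0.024
print(runs_critical_values(8, 4))  # (3, 10)
```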


Runs Test for Randomness

To test the randomness of data, we can use the following steps, provided that

1. the sample is a sequence of observations recorded in the order of their occurrence, and

2. the observations can be categorized into two mutually exclusive categories.


Step 1: Assume the data are random. This forms the basis of the null and alternative hypotheses, which are structured as follows:

H0: The sequence of data is random
H1: The sequence of data is not random


Step 2: Determine a level of significance, α, based on the seriousness of making a Type I error. The level of significance is used to determine the critical value.

Note: For the small-sample case, we must use the level of significance α = 0.05.


Step 3: Use the number of runs, r, to compute the test statistic.

Small-Sample Case: The test statistic is r.

Large-Sample Case: The test statistic is

$$z_0 = \frac{r - \mu_r}{\sigma_r}$$


Step 4: Compare the critical value to the test statistic.

Small-Sample Case: If r ≤ lower critical value or r ≥ upper critical value, reject the null hypothesis.

Large-Sample Case: If $z_0 < -z_{\alpha/2}$ or $z_0 > z_{\alpha/2}$, reject the null hypothesis.


Step 5: State the conclusion.


Parallel Example 3: Testing for Randomness (Small-Sample Case)

The following data represent the league that won the World Series for the years 1996-2007. Let “AL” represent the American League and “NL” represent the National League.

AL NL AL AL AL NL AL NL AL AL NL AL

Test the claim that the leagues win the World Series in a nonrandom way at the α = 0.05 level of significance.


Solution

The sample is a sequence of observations (which league won the World Series in a particular year) recorded in the order of occurrence. The observations are in two mutually exclusive categories, American League or National League. The requirements for the test are satisfied.


Step 1: We are testing the hypothesis that the sequence of observations is random. Thus,

H0: The sequence of data is random
H1: The sequence of data is not random

Step 2: The level of significance is α = 0.05. The lower critical value is 3 and the upper critical value is 10 (Parallel Example 2).


Step 3: The test statistic is r = 9 (Parallel Example 1).

Step 4: Since the test statistic is between the lower and upper critical values, we do not reject the null hypothesis.

Step 5: There is insufficient evidence to conclude that the World Series were won by the two leagues in a nonrandom way during the years 1996-2007.


Section 15.3: Inferences About Measures of Central Tendency


Objective

1. Conduct a one-sample sign test


A one-sample sign test is a nonparametric test that uses data, converted to plus and minus signs, to test a hypothesis regarding the median of a population. Data values equal to the assumed value of the median are ignored during the test.


Test Statistic for a One-Sample Sign Test

The test statistic will depend on the structure of the hypothesis test and on the sample size.

Small-Sample Case: (n ≤ 25)

Two-Tailed: H0: M = M0 versus H1: M ≠ M0. The test statistic, k, will be the smaller of the number of minus signs and the number of plus signs.

Left-Tailed: H0: M = M0 versus H1: M < M0. The test statistic, k, will be the number of plus signs.

Right-Tailed: H0: M = M0 versus H1: M > M0. The test statistic, k, will be the number of minus signs.


Large-Sample Case: (n > 25)

The test statistic, z0, is

$$z_0 = \frac{(k + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$$

where n is the number of minus and plus signs and k is obtained as described in the small-sample case.
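For illustration, the large-sample statistic can be computed directly; the +0.5 acts as a continuity correction because k is a count. The sketch below also compares z0 with the left-tail critical value $-z_{\alpha}$ for a one-tailed test. The function name is illustrative, and the counts used (n = 40 signs, k = 13) are hypothetical.

```python
import math
from statistics import NormalDist


def sign_test_z(k, n):
    """Large-sample sign test statistic z0 = ((k + 0.5) - n/2) / (sqrt(n)/2)."""
    return ((k + 0.5) - n / 2) / (math.sqrt(n) / 2)


# Hypothetical one-tailed test: n = 40 nonzero signs, k = 13
alpha = 0.05
z0 = sign_test_z(13, 40)
critical = -NormalDist().inv_cdf(1 - alpha)  # -z_alpha, about -1.645
print(round(z0, 3), z0 < critical)  # -2.055 True
```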


Critical Values for a One-Sample Sign Test

Small-Sample Case: To find the critical value for a one-sample sign test, we use Table XI if n ≤ 25.

Large-Sample Case: If n > 25, the critical value is found from Table V, the standard normal table. The critical value is always located in the left tail of the standard normal distribution. For a two-tailed test, the critical value is $-z_{\alpha/2}$. For a left-tailed or right-tailed test, the critical value is $-z_{\alpha}$.


One-Sample Sign Test

To test hypotheses regarding the median of a population, we use the following steps, provided that the sample is a random sample.

Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways:

Two-Tailed: H0: M = M0 versus H1: M ≠ M0
Left-Tailed: H0: M = M0 versus H1: M < M0
Right-Tailed: H0: M = M0 versus H1: M > M0

Note: M0 is the assumed value of the median.


Step 2: Count the number of observations below M0, and assign them minus (-) signs. Count the number of observations above M0, and assign them plus (+) signs.


Step 3: Select a level of significance, α, based on the seriousness of making a Type I error. The level of significance is used to determine the critical value. The critical value for small samples (n ≤ 25) is found from Table XI. The critical value for large samples (n > 25) is found from Table V.


Step 4: Obtain the test statistic, k.

Note that k is the smaller of the number of minus signs and plus signs in the two-tailed test, k is the number of plus signs in the left-tailed test, and k is the number of minus signs in the right-tailed test. In addition, n is the total number of plus and minus signs.

Small-Sample Case: The test statistic is k.

Large-Sample Case: The test statistic is

$$z_0 = \frac{(k + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$$


Step 5: Compare the critical value to the test statistic.

Small-Sample Case: If k ≤ critical value, reject the null hypothesis.

Large-Sample Case:
Two-tailed: If $z_0 < -z_{\alpha/2}$, reject the null hypothesis.
Left-tailed or right-tailed: If $z_0 < -z_{\alpha}$, reject the null hypothesis.


Step 6: State the conclusion.


Parallel Example 1: Conducting a One-Sample Sign Test (Small-Sample Case)

According to the United States Bureau of Labor Statistics, in 2000 the median tenure of employees with their current employer was 3.5 years. An economist believes that the median has increased since then. To test this claim, he randomly selects 16 employed individuals, determines their length of employment (in years), and obtains the following data.

0.3 0.8 0.7 3.2 10.3 1.4 0.2 0.9 3.6 6.3 11.2 12.8 7.3 13.0 3.8 23.6

Test the claim at the α = 0.05 level of significance.


Solution

The data were obtained from a random sample so the conditions of the test are met.

Step 1: We want to know if the median tenure of employees with their current employer is greater than 3.5 years. This is a right-tailed test.

H0: M=3.5 versus H1: M > 3.5


Step 2: There are 7 observations less than 3.5 and 9 observations greater than 3.5. Thus, we have 7 minus signs and 9 plus signs with n=16.

Step 3: Because this is a right-tailed test and n ≤ 25, we find the critical value at the α = 0.05 level of significance with n = 16 to be 4 (see Table XI).

Step 4: The test statistic is the number of minus signs. Thus, k =7.


Step 5: Since the test statistic is greater than the critical value, 4, we do not reject the null hypothesis.

Step 6: There is insufficient evidence to support the hypothesis that the median tenure of employees with their employer is greater than 3.5 years.
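The example can be reproduced in a few lines. Under H0 the number of minus signs follows a binomial distribution with n = 16 and p = 0.5, so the sketch below computes the sign counts, a critical value, and the exact p-value. It assumes Table XI lists the largest c with P(K ≤ c) ≤ α; under that assumption it gives the same critical value, 4. The helper name `binom_cdf` is illustrative.

```python
from math import comb

tenure = [0.3, 0.8, 0.7, 3.2, 10.3, 1.4, 0.2, 0.9,
          3.6, 6.3, 11.2, 12.8, 7.3, 13.0, 3.8, 23.6]
m0 = 3.5

minus = sum(1 for x in tenure if x < m0)
plus = sum(1 for x in tenure if x > m0)
n = minus + plus  # values equal to m0 would be dropped
k = minus         # right-tailed test: k is the number of minus signs


def binom_cdf(k, n):
    """P(K <= k) for K ~ Binomial(n, 0.5), the null distribution of a sign count."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


# Critical value: largest c with P(K <= c) <= alpha (assumed table convention)
alpha = 0.05
critical = max(c for c in range(n + 1) if binom_cdf(c, n) <= alpha)
p_value = binom_cdf(k, n)
print(minus, plus, n, critical, round(p_value, 4))  # 7 9 16 4 0.4018
```

Since k = 7 exceeds the critical value 4 (equivalently, the p-value of about 0.40 exceeds 0.05), the sketch agrees with the decision not to reject H0.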


Section 15.4: Inferences About the Difference Between Two Medians: Dependent Samples


Objective

1. Test a hypothesis about the difference between the medians of two dependent samples


The Wilcoxon Matched-Pairs Signed-Ranks Test is a nonparametric procedure used to test the equality of two population medians using dependent samples.


Test Statistic for the Wilcoxon Matched-Pairs Signed-Ranks Test

The test statistic will depend on the size of the sample and on the alternative hypothesis. Let n represent the number of nonzero differences.

Small-Sample Case: (n ≤ 30)

Two-Tailed: H0: MD = 0 versus H1: MD ≠ 0. Test statistic: T is the smaller of T+ and |T-|.

Left-Tailed: H0: MD = 0 versus H1: MD < 0. Test statistic: T = T+.

Right-Tailed: H0: MD = 0 versus H1: MD > 0. Test statistic: T = |T-|.


Large-Sample Case: (n > 30)

The test statistic is given by

$$z_0 = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$

where T is the test statistic from the small-sample case.
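A minimal sketch of the large-sample statistic follows. The mean n(n + 1)/4 is half of the total rank sum n(n + 1)/2, which is the expected value of T when positive and negative differences are equally likely. The function name is illustrative, and the values in the usage line (n = 40 nonzero differences, T = 250) are hypothetical.

```python
import math


def wilcoxon_z(T, n):
    """Large-sample Wilcoxon signed-ranks statistic for n nonzero differences."""
    mean_T = n * (n + 1) / 4
    sd_T = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (T - mean_T) / sd_T


# Hypothetical values: n = 40 nonzero differences, T = 250
print(round(wilcoxon_z(250, 40), 3))  # -2.151
```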


Critical Value for Wilcoxon Matched-Pairs Signed-Ranks Test

Small-Sample Case: (n ≤ 30)

Using α as the level of significance, the critical value(s) is (are) obtained from Table XII in Appendix A.

Two-Tailed: $T_{\alpha/2}$
Left-Tailed: $T_{\alpha}$
Right-Tailed: $T_{\alpha}$


Large-Sample Case: (n > 30)

Using α as the level of significance, the critical value is obtained from Table V in Appendix A. The critical value is always in the left tail of the standard normal distribution.

Two-Tailed: $-z_{\alpha/2}$
Left-Tailed: $-z_{\alpha}$
Right-Tailed: $-z_{\alpha}$


Wilcoxon Matched-Pairs Signed-Ranks Test

If a hypothesis is made regarding the medians of two populations, we can use the following steps to test the hypothesis, provided that

1. the samples are dependent random samples and
2. the distribution of the differences is symmetric.

Although tests for verifying the symmetry of data exist, we do not present them in this text. All the data given satisfy the second requirement.


Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways:

Note: MD is the median of the differences of matched pairs.

Two-Tailed: H0: MD = 0 versus H1: MD ≠ 0
Left-Tailed: H0: MD = 0 versus H1: MD < 0
Right-Tailed: H0: MD = 0 versus H1: MD > 0


Step 2: Compute the differences in the matched-pairs observations. Rank the absolute value of all sample differences from smallest to largest after discarding those differences that equal 0. Handle ties by finding the mean of the ranks for tied values. Assign negative values to the ranks where the differences are negative and positive values to the ranks where the differences are positive. Find the sum of the positive ranks, T+, and the sum of the negative ranks, T-.
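Step 2 can be sketched in Python as follows. The function names `average_ranks` and `signed_rank_sums` are illustrative, and the five pairs in the usage line are hypothetical. Following the text, T- is returned as a negative number, so the test statistic uses |T-|.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the block of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_sums(x, y):
    """Return (T+, T-) for matched pairs (x[i], y[i]); zero differences are discarded."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = -sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus


# Hypothetical pairs: differences 2, 0 (dropped), -2, 2, 1
x = [10, 12, 9, 15, 11]
y = [8, 12, 11, 13, 10]
print(signed_rank_sums(x, y))  # (7.0, -3.0)
```

The three tied absolute differences of 2 occupy ranks 2, 3, and 4, so each receives the mean rank 3.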


Step 3: Draw a boxplot of the differences to compare the sample data from the two populations. This helps to visualize the difference in the medians.


Step 4: Choose a level of significance, α, based on the seriousness of making a Type I error. The level of significance is used to determine the critical value. The critical value is found from Table XII for small samples (n ≤ 30). The critical value is found from Table V for large samples (n > 30).


Step 5: Compute the test statistic.

Small-Sample Case (n ≤ 30)

Two-Tailed: H0: MD = 0 versus H1: MD ≠ 0. Test statistic: T is the smaller of T+ and |T-|.

Left-Tailed: H0: MD = 0 versus H1: MD < 0. Test statistic: T = T+.

Right-Tailed: H0: MD = 0 versus H1: MD > 0. Test statistic: T = |T-|.


Large-Sample Case (n > 30)

$$z_0 = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$

where T is the test statistic from the small-sample case.


Step 6: Compare the critical value with the test statistic.

Small-Sample Case:
Two-tailed: If $T < T_{\alpha/2}$, reject H0.
Left-tailed: If $T < T_{\alpha}$, reject H0.
Right-tailed: If $T < T_{\alpha}$, reject H0.

Large-Sample Case:
Two-tailed: If $z_0 < -z_{\alpha/2}$, reject H0.
Left-tailed: If $z_0 < -z_{\alpha}$, reject H0.
Right-tailed: If $z_0 < -z_{\alpha}$, reject H0.


Step 7: State the conclusion.


Parallel Example 1: Wilcoxon Matched-Pairs Signed-Ranks Test (Small-Sample Case)

The data below represent the cost of a one-night stay in Hampton Inn Hotels and La Quinta Inn Hotels for a random sample of 10 cities. Test the claim that Hampton Inn Hotels are priced differently from La Quinta Hotels at the α = 0.05 level of significance.


City Hampton Inn La Quinta

Dallas 129 105

Tampa Bay 149 96

St. Louis 149 49

Seattle 189 149

San Diego 109 119

Chicago 160 89

New Orleans 149 72

Phoenix 129 59

Atlanta 129 90

Orlando 119 69


Solution

The data were obtained randomly. We assume that the symmetry requirement is satisfied.

Step 1: We want to know if the hotels are priced differently. This is a two-tailed test.

H0: MD =0 versus H1: MD ≠ 0


Step 2: In order to calculate T+ and T-, we must find the differences, rank their absolute values, and then attach the sign of each difference to its rank. The differences and their signed ranks are given below.


City | Hampton Inn | La Quinta | D = HI - LQ | Absolute value of D | Signed Rank
Dallas | 129 | 105 | 24 | 24 | +2
Tampa Bay | 149 | 96 | 53 | 53 | +6
St. Louis | 149 | 49 | 100 | 100 | +10
Seattle | 189 | 149 | 40 | 40 | +4
San Diego | 109 | 119 | -10 | 10 | -1
Chicago | 160 | 89 | 71 | 71 | +8
New Orleans | 149 | 72 | 77 | 77 | +9
Phoenix | 129 | 59 | 70 | 70 | +7
Atlanta | 129 | 90 | 39 | 39 | +3
Orlando | 119 | 69 | 50 | 50 | +5


Step 2: From the signed ranks, T+ = 54 and |T-| = 1.


Step 3: A boxplot of the differences (the figure is not reproduced in this transcript) indicates that the sample median difference is about 51; computed from the data, the median of the ten differences is (50 + 53)/2 = 51.5.


Step 4: We are testing the hypothesis at the α = 0.05 level of significance. Since this is a two-tailed test and the sample size (n = 10) is less than 30, we find the critical value by using Table XII and obtain $T_{0.025} = 8$.

Step 5: The test statistic is the smaller of T+ and |T-| which is 1.


Step 6: The test statistic is less than the critical value (1< 8), so we reject the null hypothesis.

Step 7: There is sufficient evidence at the α = 0.05 level of significance to conclude that the median room price at Hampton Inns is different from the median room price at La Quinta Inns.
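The hotel example can be checked end to end. The sketch below recomputes the differences, ranks, T+, and |T-|, then enumerates the exact null distribution of the signed-rank sum for n = 10 (under H0 each of the 2^10 sign patterns is equally likely, so a subset-sum count gives the distribution). It assumes Table XII reports the largest c with P(T ≤ c) ≤ α/2; under that assumption it returns the same critical value, 8. Because these absolute differences contain no ties, plain ranks are used, and the code asserts that assumption. The helper name `signed_rank_counts` is illustrative.

```python
hampton = [129, 149, 149, 189, 109, 160, 149, 129, 129, 119]
la_quinta = [105, 96, 49, 149, 119, 89, 72, 59, 90, 69]

diffs = [h - q for h, q in zip(hampton, la_quinta) if h != q]
abs_sorted = sorted(abs(d) for d in diffs)
assert len(set(abs_sorted)) == len(abs_sorted), "sketch assumes no tied |D|"
rank = {v: i + 1 for i, v in enumerate(abs_sorted)}
t_plus = sum(rank[abs(d)] for d in diffs if d > 0)
t_minus = sum(rank[abs(d)] for d in diffs if d < 0)  # this is |T-|
n = len(diffs)
T = min(t_plus, t_minus)


def signed_rank_counts(n):
    """counts[s] = number of subsets of {1, ..., n} summing to s, i.e. the number
    of sign patterns giving a positive-rank sum of s under H0."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for rank_value in range(1, n + 1):
        for s in range(max_sum, rank_value - 1, -1):
            counts[s] += counts[s - rank_value]
    return counts


counts = signed_rank_counts(n)
total = 2 ** n
# Critical value: largest c with P(T <= c) <= alpha/2 (assumed table convention)
alpha = 0.05
cumulative = 0
critical = None
for c, ways in enumerate(counts):
    cumulative += ways
    if cumulative / total > alpha / 2:
        break
    critical = c
p_two_sided = 2 * sum(counts[: T + 1]) / total
print(t_plus, t_minus, T, critical, round(p_two_sided, 4))  # 54 1 1 8 0.0039
```

The exact two-sided p-value, 2(2/1024) ≈ 0.0039, is far below α = 0.05, in line with the decision to reject H0.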


Section 15.5: Inferences About the Difference Between Two Medians: Independent Samples


Objective

1. Test a hypothesis about the difference between the medians of two independent samples


The Mann-Whitney Test is a nonparametric procedure that is used to test the equality of two population medians from independent samples.


Test Statistic for the Mann-Whitney Test

The test statistic will depend on the size of the samples from each population. Let n1 represent the sample size for population X and n2 represent the sample size for population Y.

Small-Sample Case: (n1 ≤ 20 and n2 ≤ 20) If S is the sum of the ranks corresponding to the sample from population X, then the test statistic, T, is given by

$$T = S - \frac{n_1(n_1+1)}{2}$$

Note: The value of S is always obtained by summing the ranks of the sample data that correspond to Mx, the median of population X, in the hypothesis.


Large-Sample Case: (n1 > 20 or n2 > 20) From the Central Limit Theorem, the test statistic is given by

$$z_0 = \frac{T - \frac{n_1n_2}{2}}{\sqrt{\frac{n_1n_2(n_1+n_2+1)}{12}}}$$

where T is the test statistic from the small-sample case.
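A minimal sketch of both Mann-Whitney statistics follows. T subtracts n1(n1 + 1)/2, the smallest possible value of S, so T ranges from 0 to n1n2 and its null mean is n1n2/2. The function names are illustrative, and the rank sum S = 700 with n1 = 25 and n2 = 22 in the usage line is hypothetical.

```python
import math


def mann_whitney_T(S, n1):
    """Small-sample statistic T = S - n1(n1 + 1)/2, where S is the rank sum of sample X."""
    return S - n1 * (n1 + 1) / 2


def mann_whitney_z(T, n1, n2):
    """Large-sample statistic z0 = (T - n1*n2/2) / sqrt(n1*n2*(n1 + n2 + 1)/12)."""
    mean_T = n1 * n2 / 2
    sd_T = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (T - mean_T) / sd_T


# Hypothetical large-sample case: n1 = 25, n2 = 22, rank sum of sample X is 700
T = mann_whitney_T(700, 25)
print(T, round(mann_whitney_z(T, 25, 22), 3))  # 375.0 2.132
```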


Critical Value for Mann-Whitney Test

Small-Sample Case: (n1 ≤ 20 and n2 ≤ 20) Using α as the level of significance, the critical value(s) is (are) obtained from Table XIII in Appendix A.

Two-Tailed: $w_{\alpha/2}$ and $w_{1-\alpha/2} = n_1n_2 - w_{\alpha/2}$
Left-Tailed: $w_{\alpha}$
Right-Tailed: $w_{1-\alpha} = n_1n_2 - w_{\alpha}$


Large-Sample Case: (n1 > 20 or n2 > 20) Using α as the level of significance, the critical value(s) is (are) obtained from Table V in Appendix A.

Two-Tailed: $-z_{\alpha/2}$ and $z_{\alpha/2}$
Left-Tailed: $-z_{\alpha}$
Right-Tailed: $z_{\alpha}$


Mann-Whitney Test

To test hypotheses regarding the medians of two populations, we can use the following steps, provided that

1. the samples are independent random samples and
2. the distributions have the same shape.

Throughout this section, we will assume that the requirement that the distributions have the same shape is satisfied.


Step 1: Draw a side-by-side boxplot to compare the sample data from the two populations. This helps to visualize the difference in the medians.


Step 2: Determine the null and alternative hypotheses. The hypotheses are structured as follows:

Note: Mx is the median of population X and My is the median of population Y.

Two-Tailed: H0: Mx = My versus H1: Mx ≠ My
Left-Tailed: H0: Mx = My versus H1: Mx < My
Right-Tailed: H0: Mx = My versus H1: Mx > My


Step 3: Rank all sample observations from smallest to largest. Handle ties by finding the mean of the ranks for tied values. Find the sum of the ranks for the sample from population X.


Step 4: Choose a level of significance, α, to match the seriousness of making a Type I error. The level of significance is used to determine the critical value. The critical value is found from Table XIII for small samples (n1 ≤ 20 and n2 ≤ 20) and from Table V for large samples (n1 > 20 or n2 > 20).


Step 5: Compute the test statistic. Note that S is the sum of the ranks obtained from the sample observations from population X. In addition, n1 is the size of the sample from population X, and n2 is the size of the sample from population Y.

Small-Sample Case:

$$T = S - \frac{n_1(n_1+1)}{2}$$

Large-Sample Case:

$$z_0 = \frac{T - \frac{n_1n_2}{2}}{\sqrt{\frac{n_1n_2(n_1+n_2+1)}{12}}}$$


Step 6: Compare the critical value with the test statistic.

Small-Sample Case:
Two-tailed: If $T < w_{\alpha/2}$ or $T > w_{1-\alpha/2}$, reject H0.
Left-tailed: If $T < w_{\alpha}$, reject H0.
Right-tailed: If $T > w_{1-\alpha}$, reject H0.

Large-Sample Case:
Two-tailed: If $z_0 < -z_{\alpha/2}$ or $z_0 > z_{\alpha/2}$, reject H0.
Left-tailed: If $z_0 < -z_{\alpha}$, reject H0.
Right-tailed: If $z_0 > z_{\alpha}$, reject H0.


Step 7: State the conclusion.


Parallel Example 1: Mann-Whitney Test (Small-Sample Case)

A researcher wanted to know whether “state” quarters weigh more than “traditional” quarters. He randomly selected 18 “state” quarters and 16 “traditional” quarters, weighed each of them, and recorded their weights (the data, with their ranks, appear in Step 3 of the solution).


Test the claim that state quarters have a higher median weight than traditional quarters at the α = 0.05 level of significance.


Solution


Step 1: Based on side-by-side boxplots of the data (not reproduced in this transcript), the median weight for the state quarters is higher. We want to determine whether this difference is due to a difference in the population medians or to sampling error.


Step 2: We want to know if the median weight for the state quarters is higher than the median weight of the traditional quarters. This is a right-tailed test.

H0: MState = MTraditional versus

H1: MState > MTraditional


Step 3: In order to calculate the test statistic, we combine the two data sets into one data set and arrange the data in ascending order. The weights and their ranks (in parentheses) are shown below.


State quarters, weight (rank):
5.7 (20), 5.73 (27.5), 5.7 (20), 5.65 (9.5), 5.73 (27.5), 5.79 (34), 5.77 (33), 5.7 (20), 5.73 (27.5), 5.67 (13.5), 5.61 (6), 5.67 (13.5), 5.62 (8), 5.65 (9.5), 5.73 (27.5), 5.71 (23), 5.76 (32), 5.72 (24.5)

Traditional quarters, weight (rank):
5.67 (13.5), 5.7 (20), 5.72 (24.5), 5.66 (11), 5.7 (20), 5.68 (16.5), 5.67 (13.5), 5.61 (6), 5.55 (2.5), 5.61 (6), 5.58 (4), 5.74 (30.5), 5.68 (16.5), 5.53 (1), 5.55 (2.5), 5.74 (30.5)


Step 3: After ranking the observations, we add up the ranks corresponding to the state quarters to obtain

$$S = 20 + 27.5 + 20 + 9.5 + 27.5 + 34 + 33 + 20 + 27.5 + 13.5 + 6 + 13.5 + 8 + 9.5 + 27.5 + 23 + 32 + 24.5 = 376.5$$
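The combining, ranking, and summing in Step 3 can be sketched in Python as below. This is a minimal illustration under stated assumptions: the weights are the values listed in the ranked data above, `midranks` is a hypothetical helper name, and tied values receive the mean of the positions they occupy.

```python
# Weights from the ranked data above; `state` holds the 18 values whose
# ranks are summed to obtain S, `traditional` holds the other 16.
state = [5.70, 5.73, 5.70, 5.65, 5.73, 5.79, 5.77, 5.70, 5.73,
         5.67, 5.61, 5.67, 5.62, 5.65, 5.73, 5.71, 5.76, 5.72]
traditional = [5.67, 5.70, 5.72, 5.66, 5.70, 5.68, 5.67, 5.61,
               5.55, 5.61, 5.58, 5.74, 5.68, 5.53, 5.55, 5.74]


def midranks(values):
    """Rank values from smallest to largest; tied values share the mean of their positions."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        # extend j across the run of values equal to ordered[i]
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # 0-based positions i..j are 1-based ranks i+1..j+1; average them
        rank_of[ordered[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return [rank_of[v] for v in values]


combined = state + traditional      # combine the two samples into one data set
ranks = midranks(combined)
S = sum(ranks[:len(state)])         # sum of the ranks of the state quarters
print(S)  # 376.5
```

A useful sanity check is that all 34 ranks add up to 34(35)/2 = 595, which holds whenever ties are handled by averaging.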


Solution

Step 4: Since this is a right-tailed test and both sample sizes are less than 20, we determine the right critical value with n1 = 18 and n2 = 16 at the α = 0.05 level of significance from Table XIII and obtain $w_{0.95} = n_1 n_2 - w_{0.05} = 18(16) - 96 = 192$.


Solution

Step 5: The test statistic is

$$T = S - \frac{n_1(n_1+1)}{2} = 376.5 - \frac{18(19)}{2} = 205.5$$
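As a quick arithmetic check of Steps 4 through 6, the sketch below recomputes T from the rank sum and compares it with the right critical value. It assumes, as in Step 4, that w_0.05 = 96 is the value read from Table XIII; the variable names are illustrative only.

```python
n1, n2 = 18, 16      # sizes of the state and traditional samples
S = 376.5            # rank sum of the state quarters (Step 3)
w_lower = 96         # w_0.05 as read from Table XIII (Step 4)

T = S - n1 * (n1 + 1) / 2        # Mann-Whitney test statistic
w_upper = n1 * n2 - w_lower      # right critical value w_0.95
reject = T > w_upper             # right-tailed decision rule
print(T, w_upper, reject)  # 205.5 192 True
```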


Solution

Step 6: Since the test statistic is greater than the critical value (205.5 > 192), we reject the null hypothesis.

Step 7: There is sufficient evidence at the α = 0.05 level of significance to conclude that the median weight of “state” quarters is greater than that of “traditional” quarters.


Section 15.6: Spearman’s Rank-Correlation Test


Objective

1. Perform Spearman’s rank-correlation test


Spearman’s rank-correlation test is a nonparametric procedure used to test hypotheses regarding the association between two variables.


Test Statistic for Spearman’s Rank-Correlation Test

The test statistic depends on the sample size, n, and on the sum of the squared differences:

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$

where $d_i$ is the difference in the ranks of the two observations in the ith ordered pair.

The test statistic, $r_s$, is also called Spearman’s rank-correlation coefficient.
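A minimal Python transcription of this formula, assuming the inputs are already ranks; the function name `spearman_rs` is hypothetical. The shortcut formula equals the correlation of the ranks exactly only when there are no tied ranks, so this sketch should be read as a direct transcription of the formula rather than a general-purpose routine.

```python
def spearman_rs(x_ranks, y_ranks):
    """r_s = 1 - 6*sum(d_i^2) / (n(n^2 - 1)), where d_i is the rank difference in pair i."""
    if len(x_ranks) != len(y_ranks):
        raise ValueError("need the same number of X and Y ranks")
    n = len(x_ranks)
    # square each difference first, then add up the squared differences
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(spearman_rs([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0  (identical rankings)
print(spearman_rs([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0 (reversed rankings)
```

The two usage lines show the extremes of the coefficient: identical rankings give +1 and completely reversed rankings give -1.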


CAUTION!

$\sum d_i^2$ means to square the differences first and then add up the squared differences.


Critical Value for Spearman’s Rank-Correlation Test

Using α as the level of significance, the critical value(s) is (are) obtained from Table XIV in Appendix A. For a two-tailed test, be sure to divide the level of significance, α, by 2.


Spearman’s Rank-Correlation Test

To test hypotheses regarding the association between two variables X and Y, we can use the following steps, provided that

1. the data are a random sample of n ordered pairs and
2. each pair of observations is two measurements taken on the same individual.

Notice that there is no requirement about the form of the distribution of the data.


Step 1: Determine the null and alternative hypotheses, which are structured as follows:

Two-Tailed: H0: X and Y are not associated versus H1: X and Y are associated

One-Tailed: H0: X and Y are not associated versus H1: X and Y are positively associated

One-Tailed: H0: X and Y are not associated versus H1: X and Y are negatively associated


Step 2: Rank the X-values and rank the Y-values. Compute the differences between ranks and then square these differences. Compute the sum of the squared differences.


Step 3: Choose a level of significance, α, based on the seriousness of making a Type I error. The level of significance is used to determine the critical value, which is found in Table XIV.


Step 4: Compute the test statistic,

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$

where n is the sample size and $d_i$ is the difference in the ranks of the two observations in the ith ordered pair.


Step 5: Compare the critical value with the test statistic.

• H0: X and Y are not associated; H1: X and Y are associated. Reject H0 if $r_s$ is greater than the critical value or if $r_s$ is less than the negative of the critical value in Table XIV.

• H0: X and Y are not associated; H1: X and Y are positively associated. Reject H0 if $r_s$ is greater than the critical value in Table XIV.

• H0: X and Y are not associated; H1: X and Y are negatively associated. Reject H0 if $r_s$ is less than the negative of the critical value in Table XIV.
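The three decision rules can be written as one small function. This is a sketch: `critical` stands for the positive value read from Table XIV, and the labels "two-sided", "positive", and "negative" are hypothetical names for the three alternative hypotheses.

```python
def spearman_decision(rs, critical, alternative):
    """Return True if H0 is rejected under the Step 5 decision rules."""
    if alternative == "two-sided":    # H1: X and Y are associated
        return rs > critical or rs < -critical
    if alternative == "positive":     # H1: X and Y are positively associated
        return rs > critical
    if alternative == "negative":     # H1: X and Y are negatively associated
        return rs < -critical
    raise ValueError("alternative must be 'two-sided', 'positive', or 'negative'")


print(spearman_decision(0.262, 0.738, "two-sided"))  # False
```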


Step 6: State the conclusion.


Parallel Example 1: Spearman’s Rank-Correlation Test

Is the price of a sports car associated with its performance? The following data represent the ranks of the price and performance of 8 sports cars. Using Spearman’s rank-correlation test, determine whether the two variables are associated at the α = 0.05 level of significance.


Car                                Rank of Price    Rank of Performance
BMW M3 Coupe                       5                8
Chevy Corvette Z06                 4                4
Ferrari 360 Modena                 1                1
Lotus Elise                        7                2
Mazda MP3                          8                7
Mitsubishi Lancer Evolution VII    6                3
Porsche Boxster S                  3                6
Porsche 911 Turbo                  2                5


Solution

Step 1: We are looking for evidence that the price and performance of sports cars are associated. Let X represent the price of the sports car and Y represent its performance. The null and alternative hypotheses are as follows:

H0: X and Y are not associated

H1: X and Y are associated


Solution

Step 2: Rank the X-values and rank the Y-values. Compute the differences in ranks and then square the differences. Calculate the sum of the squared differences to obtain $\sum d_i^2$. The details are shown in the following table.


Rank of X    Rank of Y    d = X - Y    d²
5            8            -3           9
4            4            0            0
1            1            0            0
7            2            5            25
8            7            1            1
6            3            3            9
3            6            -3           9
2            5            -3           9

$\sum d_i^2 = 62$


Solution

Step 3: This is a two-tailed test with n = 8 and α = 0.05. From Table XIV we determine the critical value to be 0.738.


Solution

Step 4: The test statistic is

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6(62)}{8(64-1)} \approx 0.262$$
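The same arithmetic for the sports-car ranks, as a minimal Python check of Steps 2 and 4 (the variable names are illustrative):

```python
price_rank = [5, 4, 1, 7, 8, 6, 3, 2]          # X: rank of price
performance_rank = [8, 4, 1, 2, 7, 3, 6, 5]    # Y: rank of performance

n = len(price_rank)
d = [x - y for x, y in zip(price_rank, performance_rank)]  # d = X - Y
sum_d2 = sum(di ** 2 for di in d)                          # sum of squared differences
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))                   # Spearman's r_s

print(d)             # [-3, 0, 0, 5, 1, 3, -3, -3]
print(sum_d2)        # 62
print(round(rs, 3))  # 0.262
```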


Solution

Step 5: Since the test statistic is less than the critical value and greater than the negative of the critical value (-0.738 < 0.262 < 0.738), we fail to reject the null hypothesis.

Step 6: There is insufficient evidence at the α = 0.05 level of significance to conclude that the price and performance of sports cars are associated.


Large-Sample (n > 100) Approximation

If n > 100, the test statistic for Spearman’s rank-correlation test is

$$z_0 = r_s\sqrt{n-1}$$

Compare this test statistic with the critical value obtained from the standard normal table, Table V. For a two-tailed test, the critical values are $\pm z_{\alpha/2}$. When testing for positive association, the critical value is $z_\alpha$. When testing for negative association, the critical value is $-z_\alpha$.
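A sketch of the large-sample procedure in Python, assuming Python 3.8 or later for `statistics.NormalDist`, which supplies the standard normal quantiles in place of Table V. The function name and the values r_s = 0.2 and n = 101 in the usage lines are hypothetical, chosen only to illustrate the calculation; the function does not itself check that n > 100.

```python
import math
from statistics import NormalDist


def spearman_large_sample(rs, n, alpha=0.05, alternative="two-sided"):
    """Return (z0, reject) using z0 = r_s * sqrt(n - 1) and standard normal critical values."""
    z0 = rs * math.sqrt(n - 1)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        return z0, abs(z0) > std_normal.inv_cdf(1 - alpha / 2)   # compare with +/- z_{alpha/2}
    if alternative == "positive":
        return z0, z0 > std_normal.inv_cdf(1 - alpha)            # compare with z_alpha
    if alternative == "negative":
        return z0, z0 < -std_normal.inv_cdf(1 - alpha)           # compare with -z_alpha
    raise ValueError("alternative must be 'two-sided', 'positive', or 'negative'")


# Hypothetical values: r_s = 0.2 computed from n = 101 ordered pairs.
z0, reject = spearman_large_sample(0.2, 101)
print(round(z0, 3), reject)  # 2.0 True
```

Here z0 = 0.2 × 10 = 2.0 exceeds z_0.025 ≈ 1.96, so the two-tailed test would reject H0 for these illustrative numbers.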


Section 15.7: Kruskal-Wallis Test


Objective

1. Test a hypothesis using the Kruskal-Wallis test


The Kruskal-Wallis Test is a nonparametric procedure that is used to test whether k independent samples come from populations with the same distribution.


Test Statistic for the Kruskal-Wallis Test

The test statistic for the Kruskal-Wallis test is

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{1}{n_i}\left[R_i - \frac{n_i(N+1)}{2}\right]^2$$


A computational formula for the test statistic is

$$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right] - 3(N+1)$$

where

• $R_i$ is the sum of the ranks of the ith sample

• $R_1^2$ is the square of the sum of the ranks for the first sample, $R_2^2$ is the square of the sum of the ranks for the second sample, and so on

• $n_1$ is the number of observations in the first sample, $n_2$ is the number of observations in the second sample, and so on

• N is the total number of observations ($N = n_1 + n_2 + \cdots + n_k$)

• k is the number of populations being compared
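To see that the computational formula agrees with the definitional formula given earlier, the sketch below evaluates both from rank sums. The function names and the two-sample example (ranks 1 to 3 in one sample, 4 to 6 in the other) are hypothetical illustrations, not data from the text.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Computational formula: H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    N = sum(sizes)
    return 12 / (N * (N + 1)) * sum(R * R / n for R, n in zip(rank_sums, sizes)) - 3 * (N + 1)


def kruskal_wallis_h_definitional(rank_sums, sizes):
    """Definitional formula: H = 12/(N(N+1)) * sum((1/n_i) * (R_i - n_i(N+1)/2)^2)."""
    N = sum(sizes)
    return 12 / (N * (N + 1)) * sum((R - n * (N + 1) / 2) ** 2 / n
                                    for R, n in zip(rank_sums, sizes))


# Hypothetical example: sample 1 has ranks {1, 2, 3}, sample 2 has ranks {4, 5, 6}.
print(round(kruskal_wallis_h([6, 15], [3, 3]), 4),
      round(kruskal_wallis_h_definitional([6, 15], [3, 3]), 4))  # 3.8571 3.8571
```

The two agree because, when ties are averaged, the ranks always add up to N(N+1)/2; expanding the square in the definitional formula and using that fact gives the computational formula.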


Critical Value for Kruskal-Wallis Test

Small-Sample Case: When three populations are being compared and the sample size from each population is 5 or less, the critical value is obtained from Table XV in Appendix A.

Large-Sample Case: When four or more populations are being compared or the sample size from at least one population is more than 5, the critical value is $\chi^2_\alpha$ with k-1 degrees of freedom, where k is the number of populations and α is the level of significance.


Kruskal-Wallis Test

To test hypotheses regarding the distribution of three or more populations, we can use the following steps, provided that two requirements are satisfied:

1. The samples are independent random samples.
2. The data can be ranked.

Step 1: Draw side-by-side boxplots to compare the sample data from the populations. Doing so helps to visualize the differences, if any, between the medians.


Step 2: State the null and alternative hypotheses, which are structured as follows:

H0: the distributions of the populations are the same

H1: the distributions of the populations are not the same

Step 3: Rank all sample observations from smallest to largest. Handle ties by finding the mean of the ranks for tied values. Find the sum of the ranks for each sample.


Step 4: Choose a level of significance, α, to match the seriousness of making a Type I error. The level of significance is used to determine the critical value. For small samples, the critical value is found from Table XV. For large samples, the critical value is $\chi^2_\alpha$ with k-1 degrees of freedom (found in Table VII).


Step 5: Compute the test statistic.

$$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right] - 3(N+1)$$


Step 6: Compare the critical value to the test statistic. We reject the null hypothesis if the test statistic is greater than the critical value.


Step 7: State the conclusion.


Parallel Example 1: Kruskal-Wallis Test

The following data represent the weight (in grams) of pennies minted at the Denver mint in 1990, 1995, and 2000. Test the claim that the distribution of penny weights differs for the three years at the α = 0.05 level of significance.


1990    1995    2000
2.50    2.52    2.50
2.50    2.54    2.48
2.49    2.50    2.49
2.53    2.48    2.50
2.46    2.52    2.48
2.50    2.50    2.52
2.47    2.49    2.51
2.53    2.53    2.49
2.51    2.48    2.51
2.49    2.55    2.50
2.48    2.49    2.52


Solution

The samples are independent random samples that can be ranked. Therefore, the conditions for the Kruskal-Wallis test are met.


Solution

Step 1: Based on boxplots of the data, the medians do not appear to differ much.


Solution

Step 2: We are interested in determining whether the distribution of penny weights differs for the three years. The null and alternative hypotheses are as follows:

H0: the distributions of penny weights are the same for the three years

H1: the distributions of penny weights are not the same for the three years


Solution

Step 3: The ranks of the pennies are given in parentheses.

1990           1995           2000
2.50 (17.5)    2.52 (26.5)    2.50 (17.5)
2.50 (17.5)    2.54 (32)      2.48 (5)
2.49 (10.5)    2.50 (17.5)    2.49 (10.5)
2.53 (30)      2.48 (5)       2.50 (17.5)
2.46 (1)       2.52 (26.5)    2.48 (5)
2.50 (17.5)    2.50 (17.5)    2.52 (26.5)
2.47 (2)       2.49 (10.5)    2.51 (23)
2.53 (30)      2.53 (30)      2.49 (10.5)
2.51 (23)      2.48 (5)       2.51 (23)
2.49 (10.5)    2.55 (33)      2.50 (17.5)
2.48 (5)       2.49 (10.5)    2.52 (26.5)


Solution

Step 3: We sum the ranks for each of the three years to obtain the following:

Year            1990           1995         2000
Sample size     n1 = 11        n2 = 11      n3 = 11
Sum of ranks    R1 = 164.5     R2 = 214     R3 = 182.5
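The rank sums above can be reproduced with a short Python sketch that pools the three samples, gives tied values the mean of their positions, and sums the ranks by year. The data are the penny weights listed earlier; `midrank_table` is a hypothetical helper name.

```python
from collections import defaultdict

weights = {
    1990: [2.50, 2.50, 2.49, 2.53, 2.46, 2.50, 2.47, 2.53, 2.51, 2.49, 2.48],
    1995: [2.52, 2.54, 2.50, 2.48, 2.52, 2.50, 2.49, 2.53, 2.48, 2.55, 2.49],
    2000: [2.50, 2.48, 2.49, 2.50, 2.48, 2.52, 2.51, 2.49, 2.51, 2.50, 2.52],
}


def midrank_table(values):
    """Map each distinct value to the mean of the 1-based positions it occupies when sorted."""
    positions = defaultdict(list)
    for pos, v in enumerate(sorted(values), start=1):
        positions[v].append(pos)
    return {v: sum(p) / len(p) for v, p in positions.items()}


pooled = [w for sample in weights.values() for w in sample]   # rank all 33 observations together
rank_of = midrank_table(pooled)
rank_sums = {year: sum(rank_of[w] for w in sample) for year, sample in weights.items()}
print(rank_sums)  # {1990: 164.5, 1995: 214.0, 2000: 182.5}
```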


Solution

Step 4: Since the sample size from each population is greater than 5, we find the critical value from the chi-square distribution with k-1 = 3-1 = 2 degrees of freedom and α = 0.05. Thus, the critical value is $\chi^2_{0.05} = 5.991$.


Solution

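Step 5: Note that N = 11 + 11 + 11 = 33. The test statistic is

$$H = \frac{12}{N(N+1)}\left[\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3}\right] - 3(N+1) = \frac{12}{33(33+1)}\left[\frac{164.5^2}{11} + \frac{214^2}{11} + \frac{182.5^2}{11}\right] - 3(33+1) \approx 1.221$$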
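A minimal check of Steps 5 and 6 in Python, using the rank sums from Step 3 and the critical value 5.991 from Step 4. Library routines such as `scipy.stats.kruskal` apply a correction for ties, so they can report a slightly larger H than this uncorrected formula.

```python
rank_sums = [164.5, 214.0, 182.5]   # R1, R2, R3 for 1990, 1995, 2000 (Step 3)
sizes = [11, 11, 11]                # n1, n2, n3
N = sum(sizes)                      # N = 33

# Computational formula, without any tie correction
H = 12 / (N * (N + 1)) * sum(R ** 2 / n for R, n in zip(rank_sums, sizes)) - 3 * (N + 1)

critical = 5.991  # chi-square critical value, k-1 = 2 degrees of freedom, alpha = 0.05 (Step 4)
print(round(H, 3), H > critical)  # 1.221 False
```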


Solution

Step 6: Since the test statistic is less than the critical value, we fail to reject the null hypothesis.

Step 7: There is insufficient evidence at the α = 0.05 level of significance to conclude that the distribution of penny weights differs for the years 1990, 1995, and 2000.