
Upload: lauren-johns

Post on 18-Dec-2015


Chapter 15 Nonparametric Statistics

General Objectives:

In Chapters 8–10, we presented statistical techniques for comparing two populations by comparing their respective population parameters (usually their population means). The techniques in Chapters 8 and 9 are applicable to data that are at least quantitative, and the techniques in Chapter 10 are applicable to data that have normal distributions. The purpose of this chapter is to present several statistical tests for comparing populations for the many types of data that do not satisfy the assumptions specified in Chapters 8–10.

©1998 Brooks/Cole Publishing/ITP


Specific Topics

1. Parametric versus nonparametric tests

2. The Wilcoxon rank sum test: Independent random samples

3. The sign test for a paired experiment

4. The Wilcoxon signed-rank test for a paired experiment

5. The Kruskal-Wallis H test

6. The Friedman Fr test

7. The rank correlation coefficient


15.1 Introduction

Some experiments generate responses that can be ordered or ranked, but the actual value of the response cannot be measured numerically.

Here are a few examples:

- The sales abilities of four sales representatives are ranked from best to worst.

- The edibility and taste characteristics of five brands of raisin bran are rated on an arbitrary scale of 1 to 5.

- Five automobile designs are ranked from most appealing to least appealing.

Nonparametric statistical methods can also be used when the data do not appear to satisfy normality or the other assumptions of the parametric tests.


Parametric assumptions are often replaced by more general assumptions about the population distributions.

The ranks of the observations are often used in place of the actual measurements.

Many statisticians advocate the use of nonparametric statistics in all situations.

One alternative is to replace the values of the observations by their ranks and proceed as though the ranks were actual observations.


15.2 The Wilcoxon Rank Sum Test: Independent Random Samples

Two different nonparametric tests use a test statistic based on these sample ranks:

- Wilcoxon rank sum test

- Mann-Whitney U test

The null hypothesis to be tested is that the two population distributions are identical; the alternative hypothesis is that they are different.


These are the possibilities for the two populations:

- If H0 is true and the observations have come from the same or identical populations, then the observations from both samples should be randomly mixed when jointly ranked from small to large. The sum of the ranks should be comparable.

- If, on the other hand, the observations from population 1 tend to be smaller than those from population 2, then these observations would have the smaller ranks and the sum of the ranks would be “small.”

- If the observations from population 1 tend to be larger than those from population 2, the reverse would happen.

For example, see Table 15.1 for a set of observations and their ranks:

Observation x1 y1 x2 y2 x3 y3 y4

Data 2 3 4 5 6 8 9

Rank 1 2 3 4 5 6 7
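The joint ranking in Table 15.1 can be sketched in a few lines of Python. This is only an illustration; the variable names are assumptions, and the simple position-based ranking works here because the data contain no ties.

```python
# Pooled observations from Table 15.1, labelled by the sample they came from.
observations = [("x", 2), ("y", 3), ("x", 4), ("y", 5), ("x", 6), ("y", 8), ("y", 9)]

# Rank jointly from small to large; with no ties, rank = position in sorted order.
ordered = sorted(observations, key=lambda pair: pair[1])

rank_sums = {"x": 0, "y": 0}
for rank, (label, _value) in enumerate(ordered, start=1):
    rank_sums[label] += rank

print(rank_sums)
```

The two rank sums always add up to n(n + 1)/2 for n pooled observations, which is a convenient arithmetic check.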


The sum of ranks of the observations from a sample is called the rank sum.

Formulas for the Wilcoxon Rank Sum Statistic (for Independent Samples):

Let

T1 = Sum of the ranks for the first sample

$$T_1^* = n_1(n_1 + n_2 + 1) - T_1$$

T1* is the value of the rank sum for the first sample (of size n1) if the observations had been ranked from large to small. (It is not the rank sum for the second sample.) Depending on the nature of the alternative hypothesis, one of these two values will be chosen as the test statistic, T.

Table 7 in Appendix I can be used to locate critical values for the test statistic. A portion of this table appears in Table 15.2.


Table 15.2 A portion of the 5% left-tailed critical values, Table 7 in Appendix I

        n1
n2      2    3    4    5    6    7    8
 3      –    6
 4      –    6   11
 5      3    7   12   19
 6      3    8   13   20   28
 7      3    8   14   21   29   39
 8      4    9   15   23   31   41   51
 9      4   10   16   24   33   43   54
10      4   10   17   26   35   45   56


The Wilcoxon Rank Sum Test:

Let n1 denote the smaller of the two sample sizes. This sample comes from population 1. The hypotheses to be tested are:

H0 : The distributions for populations 1 and 2 are identical

versus one of three alternative hypotheses:

Ha : The distributions for populations 1 and 2 are different (a two-tailed test)

Ha : The distribution for population 1 lies to the left of that for population 2 (a left-tailed test)

Ha : The distribution for population 1 lies to the right of that for population 2 (a right-tailed test)

1. Rank all n1 + n2 observations from small to large.

2. Find T1, the rank sum for the observations in sample 1. This is the test statistic for a left-tailed test.


3. Find $T_1^* = n_1(n_1 + n_2 + 1) - T_1$, the sum of the ranks of the observations from population 1 if the assigned ranks had been reversed from large to small. (The value of T1* is not the sum of the ranks of the observations in sample 2.) This is the test statistic for a right-tailed test.

4. The test statistic for a two-tailed test is T, the minimum of T1 and T1*.

5. H0 is rejected if the observed test statistic is less than or equal to the critical value found using Table 7 in Appendix I.

The use of Table 7 is illustrated in Example 15.1.

Example 15.1

The wing stroke frequencies of two species of Euglossine bees were recorded for a sample of n1 = 4 Euglossa mandibularis Friese (species 1) and n2 = 6 Euglossa imperialis Cockerell (species 2). The frequencies are listed in Table 15.3. Can you conclude that the distributions of wing strokes differ for these two species?

Table 15.3

Species 1   Species 2
235         180
225         169
190         180
188         185
            178
            182


Solution

You first need to rank the observations from small to large, as shown in Table 15.4:

Data   Species   Rank
169    2          1
178    2          2
180    2          3
180    2          4
182    2          5
185    2          6
188    1          7
190    1          8
225    1          9
235    1         10

The hypotheses to be tested are as follows.


H0 : The distributions of the wing stroke frequencies are the same for the two species

versus

Ha : The distributions of the wing stroke frequencies differ for the two species

Since the sample size for individuals from species 1, n1 = 4, is the smaller of the two sample sizes, you have

$$T_1 = 7 + 8 + 9 + 10 = 34$$

and

$$T_1^* = n_1(n_1 + n_2 + 1) - T_1 = 4(4 + 6 + 1) - 34 = 10$$

For a two-tailed test, the test statistic is T = 10, the smaller of T1 = 34 and T1* = 10.

For this two-tailed test with α = .05, you can use Table 7(b) in Appendix I with n1 = 4 and n2 = 6. The critical value of T such that $P(T \le a) \le \alpha/2 = .025$ is 12, and you should reject the null hypothesis if the observed value of T is 12 or less.

Since the observed value of the test statistic, T = 10, is less than 12, you can reject the hypothesis of equal distributions of wing stroke frequencies at the 5% level of significance.

A Minitab printout of the Wilcoxon rank sum test (called Mann-Whitney by Minitab) for these data is given in Figure 15.1. You will find instructions for generating this output in the section “About Minitab” at the end of this chapter. Notice that the rank sum of the first sample is given as W = 34.0, which agrees with our calculations. With a reported p-value of .0142 calculated by Minitab, you can reject the null hypothesis at the 5% level.
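The hand calculation in Example 15.1 can be reproduced with a short Python sketch. The helper `average_ranks` is an illustrative name, not something from the text; it gives tied values the average of the ranks they occupy, so the two observations of 180 each receive rank 3.5 here. That leaves T1 unchanged because both tied values belong to species 2. The critical value 12 is simply copied from the worked example.

```python
def average_ranks(values):
    """Return ranks (1 = smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


species_1 = [235, 225, 190, 188]            # n1 = 4 (the smaller sample)
species_2 = [180, 169, 180, 185, 178, 182]  # n2 = 6

n1, n2 = len(species_1), len(species_2)
ranks = average_ranks(species_1 + species_2)

t1 = sum(ranks[:n1])               # rank sum for sample 1
t1_star = n1 * (n1 + n2 + 1) - t1  # rank sum if the ranks were reversed
t = min(t1, t1_star)               # two-tailed test statistic

critical_value = 12  # copied from the example (Table 7(b), n1 = 4, n2 = 6)
print(t1, t1_star, t, t <= critical_value)
```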


Figure 15.1 Minitab printout for Example 15.1


Normal Approximation for the Wilcoxon Rank Sum Test

- Provided n1 is not too small, approximations to the probabilities for the statistic T can be found using a normal approximation to the distribution of T.

- It can be shown that the mean and variance of T are

$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2} \quad \text{and} \quad \sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$

- The distribution of

$$z = \frac{T - \mu_T}{\sigma_T}$$

is approximately normal with mean 0 and standard deviation 1 for values of n1 and n2 as small as 10.

- If you try this approximation for Example 15.1, you get

$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{4(4 + 6 + 1)}{2} = 22$$

and

$$\sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} = \frac{4(6)(4 + 6 + 1)}{12} = 22$$

The p-value for this test is 2P(T ≥ 34). If you use a .5 correction for continuity in calculating the value of z because n1 and n2 are both small, you have

$$z = \frac{T - \mu_T}{\sigma_T} = \frac{(34 - .5) - 22}{\sqrt{22}} = 2.45$$

so the approximate p-value is 2P(z ≥ 2.45) = 2(.0071) = .0142, which agrees with the value reported by Minitab.

The Wilcoxon Rank Sum Test for Large Samples: n1 ≥ 10 and n2 ≥ 10

1. Null hypothesis:

H0 : The population distributions are identical.

2. Alternative hypothesis:

Ha : The two population distributions are not identical (a two-tailed test).

or Ha : The distribution of population 1 is shifted to the right (or left) of the distribution of population 2 (a one-tailed test)

3. Test statistic:

$$z = \frac{T - n_1(n_1 + n_2 + 1)/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$$

4. Rejection region:

a. For a two-tailed test, reject H0 if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$.

b. For a one-tailed test in the right tail, reject H0 if $z > z_{\alpha}$.

c. For a one-tailed test in the left tail, reject H0 if $z < -z_{\alpha}$.

Or reject H0 if p-value < α. Tabulated values of z are found in Table 3 of Appendix I.

If procedures indicate either nonnormality or inequality of variance, then the Wilcoxon Rank Sum Test is appropriate instead of the two-sample unpaired t test.
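As a rough numerical check of the normal approximation to the rank sum statistic, the sketch below recomputes the mean, variance, continuity-corrected z, and two-tailed p-value for Example 15.1. The function name `upper_tail` is an assumption made for this illustration; it uses the standard-library complementary error function rather than a printed z table, so the last digits may differ slightly from table-based values.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


n1, n2, t1 = 4, 6, 34  # values from Example 15.1

mu_t = n1 * (n1 + n2 + 1) / 2         # mean of T
var_t = n1 * n2 * (n1 + n2 + 1) / 12  # variance of T

z = ((t1 - 0.5) - mu_t) / math.sqrt(var_t)  # .5 correction for continuity
p_value = 2 * upper_tail(z)                 # two-tailed p-value

print(round(mu_t, 2), round(var_t, 2), round(z, 2), round(p_value, 4))
```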

15.3 The Sign Test for a Paired Experiment

In Section 10.5, you used the paired-difference or matched pairs design to compare the average wear for two types of tires.

The sign test is a fairly simple procedure that can be used to compare two populations when the samples consist of paired observations.

In general, for each pair, you measure whether the first response—say, A—exceeds the second response—say, B.

The test statistic is x, the number of times that A exceeds B in the n pairs of observations.

Only pairs without ties are included in the test.

Critical values for the rejection region or exact p-values can be found using the cumulative binomial tables in Appendix I.


The Sign Test for Comparing Two Populations:

1. Null hypothesis: H0 : The two population distributions are identical and P(A exceeds B) = p = .5

2. Alternative hypothesis:

a. Ha : The population distributions are not identical and p ≠ .5

b. Ha : The population of A measurements is shifted to the right of the population of B measurements and p > .5

c. Ha : The population of A measurements is shifted to the left of the population of B measurements and p < .5

3. Test statistic: For n, the number of pairs with no ties, use x, the number of times that (A − B) is positive.

4. Rejection region:

a. For the two-tailed test Ha : p ≠ .5, reject H0 if x ≤ xL or x ≥ xU, where P(x ≤ xL) ≤ α/2 and P(x ≥ xU) ≤ α/2 for x having a binomial distribution with p = .5.

b. For Ha : p > .5, reject H0 if x ≥ xU with P(x ≥ xU) ≤ α.

c. For Ha : p < .5, reject H0 if x ≤ xL with P(x ≤ xL) ≤ α.

Or calculate the p-value and reject H0 if p-value < α.

If there are tied observations, delete the tied pairs and reduce n, the total number of pairs.

Normal Approximation for the Sign Test

When the number of pairs n is large, the critical values for rejection of H0 and the approximate p-values can be found using a normal approximation to the distribution of x, which was discussed in Section 6.4.

Because the binomial distribution is perfectly symmetric when p = .5, this approximation works very well, even for n as small as 10.


Sign Test for Large Samples: n ≥ 25

1. Null hypothesis: H0 : p = .5 (one treatment is not preferred to a second treatment)

2. Alternative hypothesis: Ha : p ≠ .5, for a two-tailed test (Note: We use the two-tailed test as an example. Many analyses might require a one-tailed test.)

3. Test statistic:

$$z = \frac{x - .5n}{.5\sqrt{n}}$$

4. Rejection region: Reject H0 if $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$, where $z_{\alpha/2}$ is the z-value from Table 3 in Appendix I corresponding to an area of α/2 in the upper tail of the normal distribution.

See Example 15.4 for an application of the sign test.

Example 15.4

A production superintendent claims that there is no difference between the employee accident rates for the day versus the evening shifts in a large manufacturing plant. The number of accidents per day is recorded for both the day and evening shifts for n = 100 days. It is found that the number of accidents per day for the evening shift xE exceeded the corresponding number of accidents in the day shift xD on 63 of the 100 days. Do these results provide sufficient evidence to indicate that more accidents tend to occur on one shift than on the other or, equivalently, that P(xE > xD) ≠ 1/2?


Solution

This study is a paired-difference experiment, with n = 100 pairs of observations corresponding to the 100 days. To test the null hypothesis that the two distributions of accidents are identical, you can use the test statistic

$$z = \frac{x - .5n}{.5\sqrt{n}}$$

where x is the number of days in which the number of accidents on the evening shift exceeded the number of accidents on the day shift. Then for α = .05, you can reject the null hypothesis if z ≥ 1.96 or z ≤ −1.96.

Substituting into the formula for z, you get

$$z = \frac{x - .5n}{.5\sqrt{n}} = \frac{63 - (.5)(100)}{.5\sqrt{100}} = \frac{13}{5} = 2.60$$

Because z = 2.60 exceeds 1.96, you can reject the null hypothesis at the 5% level of significance.
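The arithmetic of Example 15.4 can be sketched in Python. Alongside the large-sample z statistic, the sketch also computes an exact two-tailed p-value from the binomial distribution with p = .5; that exact calculation is an addition for comparison and is not part of the original example. It assumes Python 3.8 or later for `math.comb`.

```python
import math

n, x = 100, 63  # days compared; days on which the evening shift exceeded the day shift

# Large-sample statistic: z = (x - .5n) / (.5 * sqrt(n))
z = (x - 0.5 * n) / (0.5 * math.sqrt(n))

# Exact two-tailed p-value: 2 * P(X >= x) for X ~ Binomial(n, .5), capped at 1.
upper = sum(math.comb(n, k) for k in range(x, n + 1)) / 2 ** n
p_exact = min(1.0, 2 * upper)

print(z, z >= 1.96, p_exact)
```

Both routes lead to the same decision here; the exact p-value is mainly useful when n is too small for the normal approximation.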

When there are doubts about the validity of the assumptions for the paired t test, both tests can be performed. If both tests reach the same conclusions, then parametric test results can be considered valid.


15.4 A Comparison of Statistical Tests

Definition: Power = 1 − β = P(reject H0 when Ha is true)

Since β is the probability of failing to reject the null hypothesis when it is false, the power of the test is the probability of rejecting the null hypothesis when it is false and some specified alternative is true.

The power is the probability that the test will do what it was designed to do—that is, detect a departure from the null hypothesis when a departure exists.

Relative efficiency is the ratio of the sample sizes for the two test procedures required to achieve the same α and β for a given alternative to the null hypothesis.


15.5 The Wilcoxon Signed-Rank Test for a Paired Experiment

Calculating the test statistic for the Wilcoxon Signed-Rank Test:

1. Calculate the differences (x1 − x2) for each of the n pairs. Differences equal to 0 are eliminated, and the number of pairs, n, is reduced accordingly.

2. Rank the absolute values of the differences by assigning 1 to the smallest, 2 to the second smallest, and so on. Tied observations are assigned the average of the ranks that would have been assigned with no ties.

3. Calculate the rank sum for the negative differences and label this value T−. Similarly, calculate T+, the rank sum for the positive differences.


For a two-tailed test, use the smaller of these two quantities, T, as a test statistic.

You will reject the null hypothesis if T is less than or equal to some value, say T0.

To detect the one-sided alternative, that distribution 1 is shifted to the right of distribution 2, use the rank sum T− of the negative differences.

If you wish to detect a shift of distribution 2 to the right of distribution 1, use the rank sum T+ of the positive differences.

The rejection region is shown symbolically in Figure 15.2.
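The three calculation steps for the signed-rank statistic can be sketched as follows. The function name and the paired data are hypothetical (they do not come from the text); the sketch drops zero differences, averages ranks for tied absolute differences, and returns both rank sums.

```python
def signed_rank_sums(sample_1, sample_2):
    """Return (t_plus, t_minus, n) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(sample_1, sample_2) if a != b]
    ordered = sorted(diffs, key=abs)
    t_plus = t_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        shared_rank = (i + 1 + j + 1) / 2  # average rank for tied |differences|
        for d in ordered[i:j + 1]:
            if d > 0:
                t_plus += shared_rank
            else:
                t_minus += shared_rank
        i = j + 1
    return t_plus, t_minus, len(diffs)


# Hypothetical paired measurements, used only to exercise the function.
first = [12, 15, 9, 20, 14, 11, 18]
second = [10, 15, 12, 16, 13, 13, 13]

t_plus, t_minus, n = signed_rank_sums(first, second)
t = min(t_plus, t_minus)  # two-tailed test statistic
print(t_plus, t_minus, n, t)
```

A quick sanity check on any such calculation is that the two rank sums add up to n(n + 1)/2, where n counts only the nonzero differences.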


Wilcoxon Signed-Rank Test for a Paired Experiment:

1. Null hypothesis:

H0 : The two population relative frequency distributions are identical

2. Alternative hypothesis:

Ha : The two population relative frequency distributions differ in location (a two-tailed test).

Or Ha : The population 1 relative frequency distribution is shifted to the right of the relative frequency distribution for population 2 (a one-tailed test).

3. Test statistic

a. For a two-tailed test, use T, the smaller of the rank sum for positive and the rank sum for negative differences.

b. For a one-tailed test (to detect the alternative hypothesis described above), use the rank sum T− of the negative differences.


4. Rejection region

a. For a two-tailed test, reject H0 if T ≤ T0, where T0 is the critical value given in Table 8 in Appendix I.

b. For a one-tailed test (to detect the alternative hypothesis described above), use the rank sum T− of the negative differences. Reject H0 if T− ≤ T0.

Note: It can be shown that

$$T^+ + T^- = \frac{n(n+1)}{2}$$

Normal Approximation Test for the Wilcoxon Signed-Rank Test

Although Table 8 in Appendix I has critical values for n as large as 50, T+, like the Wilcoxon rank sum statistic, will be approximately normally distributed when the null hypothesis is true and n is large (say, 25 or more).

This enables you to construct a large-sample z test, where

$$E(T^+) = \frac{n(n+1)}{4}$$

$$\sigma_{T^+}^2 = \frac{n(n+1)(2n+1)}{24}$$

Then the z statistic

$$z = \frac{T^+ - E(T^+)}{\sigma_{T^+}} = \frac{T^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

can be used as a test statistic.

Thus, for a two-tailed test and α = .05, you can reject the hypothesis of identical population distributions when |z| ≥ 1.96.

A Large-Sample Wilcoxon Signed-Rank Test for a Paired Experiment: n ≥ 25

1. Null hypothesis: H0 : The population relative frequency distributions 1 and 2 are identical.

2. Alternative hypothesis: Ha : The two population relative frequency distributions differ in location (a two-tailed test).

Or Ha : The population 1 relative frequency distribution is shifted to the right (or left) of the relative frequency distribution for population 2 (a one-tailed test).

3. Test statistic:

$$z = \frac{T^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

4. Rejection region: Reject H0 if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ for a two-tailed test. For a one-tailed test, place all of α in one tail of the z distribution. To detect a shift in distribution 1 to the right of distribution 2, reject H0 when $z > z_{\alpha}$. To detect a shift in the opposite direction, reject H0 if $z < -z_{\alpha}$.
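A minimal sketch of the large-sample signed-rank statistic follows. The function name and the input values (n = 30 nonzero differences with T+ = 350) are hypothetical and chosen only to show the calculation.

```python
import math


def signed_rank_z(t_plus, n):
    """Large-sample z statistic for the Wilcoxon signed-rank test, based on T+."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (t_plus - mean) / math.sqrt(variance)


# Hypothetical values, not taken from the text.
z = signed_rank_z(350, 30)
print(round(z, 3), abs(z) >= 1.96)
```

When T+ equals its expected value n(n + 1)/4, the statistic is exactly 0, which is the balance point between the two tails.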


15.6 The Kruskal-Wallis H Test for Completely Randomized Designs

The Kruskal-Wallis H test is the nonparametric alternative to the analysis of variance F test for a completely randomized design.

Procedure for Conducting the Kruskal-Wallis H test:

Suppose you are comparing k populations where

$$n_1 + n_2 + \cdots + n_k = n$$

Step 1. Rank all n observations from the smallest (rank 1) to the largest (rank n). Ties get the average rank.

Step 2. Calculate the rank sums T1, T2, …, Tk for the k samples and calculate the test statistic

$$H = \frac{12}{n(n+1)} \sum \frac{T_i^2}{n_i} - 3(n+1)$$

Step 3. For a given value of α, you can reject H0 when H exceeds $\chi^2_\alpha$. (See Figure 15.5.)

Figure 15.5 Approximate distribution of the H statistic when H0 is true

See Examples 15.6 and 15.7 for an application of the H statistic.


The Kruskal-Wallis H Test for Comparing More than Two Populations: Completely Randomized Design (Independent Random Samples)

1. Null hypothesis:

H0 : The k population distributions are identical

2. Alternative hypothesis:

Ha : At least two of the k population distributions differ in location.

3. Test statistic:

$$H = \frac{12}{n(n+1)} \sum \frac{T_i^2}{n_i} - 3(n+1)$$

where

ni = Sample size for population i

Ti = Rank sum for population i

n = Total number of observations = n1 + n2 + … + nk

4. Rejection region for a given α: $H > \chi^2_\alpha$ with (k − 1) df

Assumptions:

- All sample sizes are greater than or equal to five.

- Ties take on the average of the ranks that they would have occupied if they had not been tied.

The Kruskal-Wallis H test is a valuable alternative to a one-way analysis of variance when the normality and equality of variance assumptions are violated.
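The H statistic can be computed directly from its definition, as in the sketch below. The function name and the three samples are hypothetical; no correction for ties is applied, and the critical value 5.99 is the standard chi-square table entry for α = .05 with k − 1 = 2 degrees of freedom.

```python
def kruskal_wallis_h(samples):
    """Kruskal-Wallis H for a list of samples (average ranks, no tie correction)."""
    pooled = sorted(v for sample in samples for v in sample)
    n = len(pooled)

    # Average rank for each distinct value in the pooled data.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j + 1) / 2
        i = j + 1

    total = 0.0
    for sample in samples:
        t_i = sum(rank_of[v] for v in sample)  # rank sum T_i
        total += t_i ** 2 / len(sample)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical data: k = 3 samples of five observations each.
groups = [
    [27, 31, 35, 29, 40],
    [22, 25, 31, 28, 24],
    [38, 42, 36, 45, 33],
]
h = kruskal_wallis_h(groups)
critical = 5.99  # chi-square, alpha = .05, 2 df
print(round(h, 3), h > critical)
```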


15.7 The Friedman Fr Test for Randomized Block Design

The Friedman Fr test is a nonparametric test for comparing the distributions of measurements for k treatments laid out in b blocks using a randomized block design.

The procedure is very similar to the Kruskal-Wallis H test.

Rank the k treatment observations within each block.

Ties receive an average of the ranks occupied by the tied observations.

The Friedman Fr Test for a Randomized Block Design:

1. Null hypothesis:

H0 : The k population distributions are identical


2. Alternative hypothesis:

Ha : At least two of the k population distributions differ in location.

3. Test statistic:

$$F_r = \frac{12}{bk(k+1)} \sum T_i^2 - 3b(k+1)$$

where

b = Number of blocks

k = Number of treatments

Ti = Rank sum for treatment i, i = 1, 2, …, k

4. Rejection region: $F_r > \chi^2_\alpha$, where $\chi^2_\alpha$ is based on (k − 1) df

Assumption: Either the number k of treatments or the number b of blocks is greater than five.
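The Friedman statistic can be sketched in the same way: rank within each block, sum the ranks by treatment, and substitute into the formula for Fr. The function name and the data (b = 6 blocks, k = 3 treatments) are hypothetical.

```python
def friedman_fr(blocks):
    """Friedman Fr statistic; `blocks` holds one row of k treatment values per block."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for i, value in enumerate(row):
            smaller = sum(1 for other in row if other < value)
            equal = sum(1 for other in row if other == value)  # includes the value itself
            # Average rank within the block: tied values share the mean of their ranks.
            rank_sums[i] += smaller + (equal + 1) / 2
    fr = 12 / (b * k * (k + 1)) * sum(t ** 2 for t in rank_sums) - 3 * b * (k + 1)
    return fr, rank_sums


# Hypothetical measurements: each row is a block, each column a treatment.
blocks = [
    [1.2, 1.5, 1.1],
    [2.0, 2.4, 1.9],
    [1.7, 1.6, 1.4],
    [2.2, 2.8, 2.1],
    [1.3, 1.9, 1.5],
    [1.8, 2.3, 1.6],
]
fr, rank_sums = friedman_fr(blocks)
print(rank_sums, round(fr, 3))
```

The resulting value would be compared with the chi-square critical value for (k − 1) df, as in the rejection region above.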


15.8 Rank Correlation Coefficient

Two common rank correlation coefficients are the Spearman rs and the Kendall τ.

Suppose eight elementary school science teachers have been ranked by a judge according to their ability and all have taken a “national teachers examination.” The data are listed in Table 15.11. Do the data suggest an agreement between the judge’s ranking and the examination score? That is, is there a correlation between ranks and test scores?


Table 15.11 Ranks and test scores for eight teachers

Teacher   Judge’s Rank   Examination Score
1         7              44
2         4              72
3         2              69
4         6              70
5         1              93
6         3              82
7         8              67
8         5              80

The two variables of interest are rank and test score. The former is already in rank form and the test scores can be ranked similarly, as shown in Table 15.12. The ranks for tied observations are obtained by averaging the ranks that the tied observations would have had if no ties had been observed.


The Spearman rank correlation coefficient rs is calculated by using the ranks as the paired measurements on the two variables x and y in the formula for r (see Chapter 12).

Table 15.12 Ranks of data in Table 15.11

Teacher Judge’s Rank, xi Test Rank, yi

1 7 1

2 4 5

3 2 3

4 6 4

5 1 8

6 3 7

7 8 2

8 5 6


Spearman's Rank Correlation Coefficient:

$$r_s = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

where xi and yi represent the ranks of the i th pair of observations and

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

When there are no ties in either the x observations or the y observations, the expression for rs algebraically reduces to the simpler expression

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \quad \text{where } d_i = x_i - y_i$$


If the number of ties is small in comparison with the number of data pairs, little error results in using this shortcut formula.

Spearman's Rank Correlation Coefficient:

1. Null hypothesis:

H0 : There is no association between the rank pairs

2. Alternative hypothesis:

Ha : There is an association between the rank pairs (a two-tailed test).

Or

Ha : The correlation between the rank pairs is positive or negative (a one-tailed test).

3. Test statistic:

$$r_s = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

where xi and yi represent the ranks of the i th pair of observations.

4. Rejection region:

For a two-tailed test, reject H0 if rs ≥ r0 or rs ≤ −r0, where r0 is given in Table 9 in Appendix I. Double the tabulated probability to obtain the value of α for the two-tailed test.

For a one-tailed test, reject H0 if rs ≥ r0 (for an upper-tailed test) or rs ≤ −r0 (for a lower-tailed test). The α-value for a one-tailed test is the value shown in Table 9 in Appendix I.


Example 15.10

Calculate rs for the data in Table 15.12.

Solution:

The differences and squares of differences between the two rankings are provided in Table 15.13. Substituting values into the formula for rs, you have:

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(144)}{8(64 - 1)} = -.714$$

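Table 15.13 Differences and squares of the differences for the teacher ranks

| Teacher | $x_i$ | $y_i$ | $d_i$ | $d_i^2$ |
|---------|-------|-------|-------|---------|
| 1       | 7     | 1     | 6     | 36      |
| 2       | 4     | 5     | -1    | 1       |
| 3       | 2     | 3     | -1    | 1       |
| 4       | 6     | 4     | 2     | 4       |
| 5       | 1     | 8     | -7    | 49      |
| 6       | 3     | 7     | -4    | 16      |
| 7       | 8     | 2     | 6     | 36      |
| 8       | 5     | 6     | -1    | 1       |
| Total   |       |       |       | 144     |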
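The arithmetic in Example 15.10 can be checked with a few lines of code. The sketch below uses the ranks from Table 15.13 and takes d_i = x_i − y_i; the variable names are illustrative.

```python
# Ranks for the eight teachers in Table 15.13.
x = [7, 4, 2, 6, 1, 3, 8, 5]
y = [1, 5, 3, 4, 8, 7, 2, 6]

# Differences d_i = x_i - y_i and the sum of their squares.
d = [xi - yi for xi, yi in zip(x, y)]
sum_d2 = sum(di ** 2 for di in d)

# Shortcut formula, valid here because neither ranking contains ties.
n = len(x)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(rs, 3))  # 144 -0.714
```

The negative value reflects that the two rankings tend to run in opposite directions.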


The Spearman rank correlation can be used as a test statistic to test the hypothesis of no association between populations.

The rejection region of a two-tailed test is shown in Figure 15.12.

The critical values of rs are given in Table 9 in Appendix I.

An abbreviated version is shown in Table 15.14.

If you calculated rs for the two data sets in Table 15.15, both would produce a value of rs = 1 because the assigned ranks for x and y in both cases agree for all pairs (x, y).


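Table 15.15 Two data sets with rs = 1

| $x$ | $y = x^2$ | $x$       | $y = \log_{10}(x)$ |
|-----|-----------|-----------|--------------------|
| 1   | 1         | 10        | 1                  |
| 2   | 4         | 100       | 2                  |
| 3   | 9         | 1000      | 3                  |
| 4   | 16        | 10,000    | 4                  |
| 5   | 25        | 100,000   | 5                  |
| 6   | 36        | 1,000,000 | 6                  |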
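A short check of Table 15.15 is sketched below: the rank correlation is 1 for both data sets, while the ordinary correlation computed on the raw values falls below 1 because neither relationship is linear. The helpers `simple_ranks` and `correlation` are illustrative names, and `simple_ranks` assumes there are no ties.

```python
from math import log10, sqrt

def simple_ranks(values):
    """Ranks 1..n from smallest to largest (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def correlation(u, v):
    """S_uv / sqrt(S_uu * S_vv) for two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# First data set: y = x^2
x1 = [1, 2, 3, 4, 5, 6]
y1 = [xi ** 2 for xi in x1]

# Second data set: y = log10(x)
x2 = [10 ** k for k in range(1, 7)]
y2 = [log10(xi) for xi in x2]

for x, y in ((x1, y1), (x2, y2)):
    r_raw = correlation(x, y)                               # on the raw values
    r_s = correlation(simple_ranks(x), simple_ranks(y))     # on the ranks
    print(round(r_raw, 3), round(r_s, 3))
```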


15.9 Summary

The nonparametric analogues of the parametric procedures presented in Chapters 10–14 are straightforward and fairly simple to implement:

- The Wilcoxon rank sum test is the nonparametric analogue of the two-sample t test.

- The sign test and the Wilcoxon signed-rank test are the nonparametric analogues of the paired-sample t test.

- The Kruskal-Wallis H test is the rank equivalent of the one-way analysis of variance F test.

- The Friedman Fr test is the rank equivalent of the randomized block design two-way analysis of variance F test.

- Spearman's rank correlation rs is the rank equivalent of Pearson’s correlation coefficient.


Key Concepts and Formulas

I. Nonparametric Methods

1. These methods can be used when the data cannot be measured on a quantitative scale, or when

2. The numerical scale of measurement is arbitrarily set by the researcher, or when

3. The parametric assumptions such as normality or constant variance are seriously violated.

II. Wilcoxon Rank Sum Test: Independent Random Samples

1. Jointly rank the two samples. Designate the smaller sample as sample 1. Then

$$T_1 = \text{Rank sum of sample 1} \qquad T_1^* = n_1(n_1 + n_2 + 1) - T_1$$


2. Use $T_1$ to test for population 1 to the left of population 2.

Use $T_1^*$ to test for population 1 to the right of population 2.

Use the smaller of $T_1$ and $T_1^*$ to test for a difference in the locations of the two populations.

3. Table 7 of Appendix I has critical values for the rejection of H0.

4. When the sample sizes are large, use the normal approximation:

$$z = \frac{T - \mu_T}{\sigma_T}, \qquad \mu_T = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
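The rank sum calculations can be sketched in code as follows. The data are hypothetical, the function names are illustrative, and taking T in the z formula to be the rank sum T1 is an assumption. The samples here are far too small for the normal approximation to be used in practice; z is computed only to show the arithmetic.

```python
from math import sqrt

def midranks(values):
    """Rank from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_statistics(sample1, sample2):
    """sample1 is assumed to be the smaller sample."""
    n1, n2 = len(sample1), len(sample2)
    joint = midranks(list(sample1) + list(sample2))  # joint ranking
    t1 = sum(joint[:n1])                             # rank sum of sample 1
    t1_star = n1 * (n1 + n2 + 1) - t1
    mu_t = n1 * (n1 + n2 + 1) / 2
    sigma_t = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (t1 - mu_t) / sigma_t                        # assumes T = T1
    return t1, t1_star, mu_t, sigma_t, z

# Hypothetical measurements from two independent samples.
sample1 = [1.1, 2.3, 2.9, 3.4]
sample2 = [3.0, 3.8, 4.1, 4.6, 5.2]

t1, t1_star, mu_t, sigma_t, z = rank_sum_statistics(sample1, sample2)
print(t1, t1_star, mu_t, round(sigma_t, 3), round(z, 3))
```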


III. Sign Test for a Paired Experiment

1. Find x, the number of times that observation A exceeds observation B for a given pair.

2. To test for a difference in two populations, test H0 : p = .5 versus a one- or two-tailed alternative.

3. Use Table 1 of Appendix I to calculate the p-value for the test.

4. When the sample sizes are large, use the normal approximation:

$$z = \frac{x - .5n}{.5\sqrt{n}}$$
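As a rough sketch of the sign test, the code below computes an exact two-tailed p-value from the binomial distribution with p = .5, together with the large-sample z statistic. The counts are hypothetical, the function name is illustrative, and doubling the smaller tail probability is one common convention for the two-tailed p-value. `math.comb` requires Python 3.8 or later.

```python
from math import comb, sqrt

def sign_test(x, n):
    """x = number of pairs in which A exceeds B; n = number of untied pairs."""
    # Tail probabilities under H0: p = .5
    lower = sum(comb(n, k) for k in range(0, x + 1)) / 2 ** n   # P(X <= x)
    upper = sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n   # P(X >= x)
    p_two_tailed = min(1.0, 2 * min(lower, upper))
    # Normal approximation for large n
    z = (x - 0.5 * n) / (0.5 * sqrt(n))
    return p_two_tailed, z

# Hypothetical result: A exceeds B in 8 of 10 untied pairs.
p_value, z = sign_test(8, 10)
print(p_value, round(z, 3))
```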


IV. Wilcoxon Signed-Rank Test: Paired Experiment

1. Calculate the differences in the paired observations. Rank the absolute values of the differences. Calculate the rank sums $T^+$ and $T^-$ for the positive and negative differences, respectively. The test statistic T is the smaller of the two rank sums.

2. Table 8 of Appendix I has critical values for the rejection of H0 for both one- and two-tailed tests.

3. When the sample sizes are large, use the normal approximation:

$$z = \frac{T - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}}$$
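The signed-rank steps can be sketched as follows. The paired data are hypothetical and the function names are illustrative. The sketch assumes that pairs with a zero difference are dropped and n reduced accordingly, and that tied absolute differences receive the average of their ranks.

```python
from math import sqrt

def midranks(values):
    """Rank from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_statistics(a, b):
    # Differences, with zero differences removed (assumption stated above).
    d = [ai - bi for ai, bi in zip(a, b) if ai != bi]
    n = len(d)
    r = midranks([abs(di) for di in d])
    t_plus = sum(ri for ri, di in zip(r, d) if di > 0)
    t_minus = sum(ri for ri, di in zip(r, d) if di < 0)
    t = min(t_plus, t_minus)
    # Large-sample z statistic
    z = (t - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, t_minus, t, n, z

# Hypothetical paired observations; the fifth pair has a zero difference.
a = [10, 12, 9, 15, 11, 14]
b = [8, 13, 6, 11, 11, 9]

t_plus, t_minus, t, n, z = signed_rank_statistics(a, b)
print(t_plus, t_minus, t, n, round(z, 3))
```

A quick consistency check is that the two rank sums always add to n(n + 1)/2.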


V. Kruskal-Wallis H Test: Completely Randomized Design

1. Jointly rank the n observations in the k samples. Calculate the rank sums, $T_i$ = rank sum of sample i, and the test statistic

$$H = \frac{12}{n(n + 1)} \sum \frac{T_i^2}{n_i} - 3(n + 1)$$

2. If the null hypothesis of equality of distributions is false, H will be unusually large, resulting in a one-tailed test.

3. For sample sizes of five or greater, the rejection region for H is based on the chi-square distribution with $(k - 1)$ degrees of freedom.

VI. The Friedman Fr Test: Randomized Block Design

1. Rank the responses within each block from 1 to k. Calculate the rank sums $T_1, T_2, \ldots, T_k$, and the test statistic

$$F_r = \frac{12}{bk(k + 1)} \sum T_i^2 - 3b(k + 1)$$


2. If the null hypothesis of equality of treatment distributions is false, Fr will be unusually large, resulting in a one-tailed test.

3. For block sizes of five or greater, the rejection region for Fr is based on the chi-square distribution with $(k - 1)$ degrees of freedom.
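Both rank statistics, the Kruskal-Wallis H and the Friedman Fr, can be sketched in a few lines. The data are hypothetical and the function names are illustrative. Here n is the total number of observations, n_i the size of sample i, b the number of blocks, and k the number of treatments. The samples are deliberately tiny so the arithmetic is easy to follow; they are too small for the chi-square approximation.

```python
def midranks(values):
    """Rank from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(samples):
    """H = 12 / (n (n + 1)) * sum(T_i^2 / n_i) - 3 (n + 1)."""
    pooled = [v for s in samples for v in s]
    n = len(pooled)
    joint = midranks(pooled)          # joint ranking of all n observations
    total, start = 0.0, 0
    for s in samples:
        t_i = sum(joint[start:start + len(s)])   # rank sum of this sample
        total += t_i ** 2 / len(s)
        start += len(s)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

def friedman_fr(blocks):
    """F_r = 12 / (b k (k + 1)) * sum(T_i^2) - 3 b (k + 1).

    blocks is a list of b rows, each holding the k treatment responses.
    """
    b, k = len(blocks), len(blocks[0])
    t = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(midranks(row)):  # rank within the block
            t[j] += rank
    return 12 / (b * k * (k + 1)) * sum(tj ** 2 for tj in t) - 3 * b * (k + 1)

# Hypothetical data: three completely separated samples of size 3.
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(samples)

# Hypothetical data: 4 blocks, 3 treatments, same ordering in every block.
blocks = [[5.1, 6.0, 7.2], [4.8, 5.5, 6.9], [5.0, 5.9, 7.5], [4.9, 6.1, 7.0]]
fr = friedman_fr(blocks)

print(round(h, 3), round(fr, 3))
```

Because the treatments are ranked identically in every block, Fr takes its largest possible value for this design, b(k − 1).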

VII. Spearman's Rank Correlation Coefficient

1. Rank the responses for the two variables from smallest to largest.

2. Calculate the correlation coefficient for the ranked observations:

$$r_s = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \quad \text{or} \quad r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \ \text{if there are no ties}$$


3. Table 9 in Appendix I gives critical values for rank correlations significantly different from 0.

4. The rank correlation coefficient detects not only significant linear correlation but also any other monotonic relationship between the two variables.