
Upload: norman-caldwell

Post on 03-Jan-2016


Chapter 14: Elements of Nonparametric Statistics

Person         Weight Before   Weight After   Sign of Difference (After − Before)
Mrs. Smith     146             142            −
Mrs. Brown     175             178            +
Mrs. White     150             147            −
Mr. Collins    190             187            −
Mr. Gray       220             212            −
Ms. Collins    157             160            +
Mrs. Allen     136             135            −
Mrs. Noss      146             138            −
Ms. Wagner     128             132            +
Mr. Carroll    187             187            0
Mrs. Black     172             171            −


Chapter Goals

• Introduce the basic concepts of nonparametric statistics, or distribution-free techniques.

• Nonparametric statistics are versatile and easy to use.

• Consider some of the most common tests and applications.


14.1: Nonparametric Statistics

• Parametric methods: Assume the population is at least approximately normal, or use the central limit theorem.

• Nonparametric methods, or distribution-free methods: Assume very little about the population, subject to less confining restrictions.


Nonparametric statistics have become popular:

1. Require few assumptions about the underlying population.

2. Generally easier to apply than their parametric counterparts.

3. Relatively easy to understand.

4. Can be used in situations where the normality assumptions cannot be made.

5. Generally only slightly less efficient than their parametric counterparts.

Disadvantages?


14.2: Comparing Statistical Tests

• Four nonparametric tests presented in this chapter. There are many others.

• Many nonparametric tests may be used as well as certain parametric tests.

• Which statistical test is appropriate: the parametric or the nonparametric?


Which test is best?

1. When comparing two tests they must be equally qualified for use; they must both be appropriate test procedures.

2. Each test has a set of assumptions that must be satisfied.

3. The best test: The test that is best able to control the risks of error and at the same time keeps the sample size reasonable.

4. A larger sample size usually means a higher cost.

The Risk of Error:

1. Type I Error: Controlled directly (set) by the level of significance, α.

2. P(type I error) = α, P(type II error) = β.

3. We try to control β.

4. The power of a statistical test = 1 − β. The power of a test is the probability that we reject the null hypothesis when it is false (a correct decision).

If two appropriate statistical tests have the same significance level α, the one with the greater power is better.

The sample size:

1. Set acceptable values for α and β. Determine the sample size necessary to satisfy these values.

2. The statistical test that requires the smaller sample size is better.

3. Efficiency: The ratio of the sample size of the best parametric test to the sample size of the best nonparametric test when compared under a fixed set of risk values.

Example: Efficiency rating for the sign test is approximately 0.63. This means that a sample of size 63 with a parametric test will do the same job as a sample of size 100 for the sign test.


To determine the choice of test:

1. Often forced to use a certain test because of the nature of the data.

2. When there is a choice, consider three factors:

a. The power of the test.

b. The efficiency of the test.

c. The data (and the sample size).

Note: The following table shows a comparison of nonparametric tests (presented in this chapter) with the parametric tests presented earlier.

Comparison of Parametric and Nonparametric Tests:

Test Situation          Parametric Test       Nonparametric Test        Efficiency of Nonparametric Test
One mean                t test (p. 773)       Sign test (p. 1219)       0.63
Two independent means   t test (p. 910)       U test (p. 1243)          0.95
Two dependent means     t test (p. 886)       Sign test (p. 1225)       0.63
Correlation             Pearson's (p. 1137)   Spearman test (p. 1274)   0.91
Randomness              (none)                Runs test (p. 1260)       Not meaningful


14.3: The Sign Test

• Versatile, easy to apply, uses only plus and minus signs.

• Three sign test applications: confidence interval for a median, hypothesis test concerning a median, hypothesis test concerning the median difference (paired difference) for two dependent samples.

Assumptions for inferences about the population median using the sign test: The n random observations forming the sample are selected independently, and the population is continuous in the vicinity of the median, M.

Procedure for using the sign test to obtain a confidence interval for an unknown population median, M:

1. Arrange the data in ascending order (smallest to largest):

x1 (smallest), x2, x3, . . . , xn (largest)

2. Use Table 12, Appendix B to obtain the critical value, k (the maximum allowable number of the less frequent sign).

3. k indicates the number of positions to be dropped from each end of the ordered data.

4. The remaining extreme values are the bounds for a 1 − α confidence interval.

Confidence Interval: $x_{k+1}$ to $x_{n-k}$

Note: Based on the binomial distribution.

Example: Suppose 20 observations are selected at random and are given in ascending order (x1, x2, x3, . . . , x20).

19 21 23 28 31 32 33 34 34 35

38 41 43 43 44 46 47 48 52 55

Find a 95% confidence interval for the population median.

Solution:

Table 12: n = 20, α = 0.05, so k = 5

Drop 5 values from each end.

The confidence interval is bounded by x6 and x15.

The confidence interval: 32 to 44 (inclusive).

In general: $x_{k+1}$ to $x_{n-k}$ is a 1 − α confidence interval for M.
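The drop-k procedure can be sketched in Python. This is a minimal illustration, not the textbook's method: it assumes Table 12 lists the largest k for which the two-tailed binomial tail probability P(X ≤ k), with p = 0.5, does not exceed α/2. The function names `sign_test_k` and `median_confidence_interval` are hypothetical.

```python
from math import comb

def sign_test_k(n, alpha):
    """Largest k with P(X <= k) <= alpha/2 for X ~ Binomial(n, 0.5); -1 if none."""
    cumulative = 0.0
    k = -1
    for i in range(n + 1):
        cumulative += comb(n, i) / 2 ** n
        if cumulative > alpha / 2:
            break
        k = i
    return k

def median_confidence_interval(data, alpha):
    """Return (x_{k+1}, x_{n-k}) from the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    k = sign_test_k(n, alpha)
    if k < 0:
        raise ValueError("sample too small for this confidence level")
    # Positions in the text are 1-based: x_{k+1} is index k, x_{n-k} is index n-k-1.
    return ordered[k], ordered[n - k - 1]

data = [19, 21, 23, 28, 31, 32, 33, 34, 34, 35,
        38, 41, 43, 43, 44, 46, 47, 48, 52, 55]
k = sign_test_k(len(data), 0.05)
lower, upper = median_confidence_interval(data, 0.05)
print(k, lower, upper)
```

For the 20 observations above, this reproduces k = 5 and the interval 32 to 44.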


Single-Sample Hypothesis Test Procedure:

1. The sign test may be used when the null hypothesis concerns the population median M.

2. The test may be either one- or two-tailed.

Example: A random sample of 88 taxpayers was selected and each was asked the amount of time spent preparing their federal income tax return. Test the hypothesis “the median time required to prepare a return is 8 hours” against the alternative that the median is greater than 8 hours.

The data are summarized by:

Under 8: 37; Equal 8: 3; Over 8: 48

Use the sign test with α = 0.025.

Solution:

The data are converted to (+) and (−) signs according to whether each value is more or less than 8.

A plus sign is assigned to each observation greater than 8.

A minus sign is assigned to each observation less than 8.

A zero is assigned to each observation equal to 8.

The sign test uses only the plus and minus signs.

The zeros are discarded.

Usable sample size = 88 − 3 = 85

n(+) = 48, n(−) = 37

n(+) + n(−) = n = 85


1. The Set-up:

a. Population parameter of concern: M, population median time to prepare a federal income tax return.

b. The null and alternative hypothesis:

H0: M = 8

Ha: M > 8

2. The Hypothesis Test Criteria:

a. Assumptions: The 88 observations were randomly selected and the variable time to prepare a return is continuous.

b. Test statistic: x = the number of the less frequent sign = n(−)

c. Level of significance: α = 0.025

3. The Sample Evidence:

n = 85; x = n(−) = 37

4. The Probability Distribution (Classical Approach):

a. Critical value: The critical region is one-tail.

Table 12 is for two-tailed tests.

At the intersection of the column α = 0.05 (= 2 × 0.025) and the row n = 85: k = 32.

The critical value: k = 32.

b. x = 37 is greater than k = 32, so x is not in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: Using Table 12: P > (0.25/2) = 0.125

Using a computer: P = 0.1928

b. The p-value is larger than the level of significance, α.

5. The Results:

a. Decision: Do not reject H0.

b. Conclusion: At the 0.025 level of significance, there is no evidence to suggest the median time required to complete a federal income tax return is greater than 8 hours.

Two-Sample Hypothesis Test Procedure:

1. The sign test may also be used in tests concerning the median difference between paired data that result from two dependent samples.

2. A common application: the use of before-and-after testing to determine the effectiveness of some activity.

3. The signs of the differences are used to carry out the test. Zeros are discarded.

Assumptions for inferences about the median of paired differences using the sign test: The paired data are selected independently and the variables are ordinal or numerical.

Example: A new automobile engine additive (included during an oil change) is designed to decrease wear and improve engine performance by increasing gas mileage. Sixteen automobiles were randomly selected and the before-and-after miles per gallon were recorded. (The same driver was used before and after the engine treatment.) Is there any evidence to suggest the engine additive improves gas mileage? Use α = 0.05.

Note: The claim being tested is that the additive improves gas mileage. Form all the differences, After − Before. We will only reject the null hypothesis if there are significantly more plus signs.

Data:

Car   Before   After   Sign
1     17.5     24.1    +
2     30.7     23.8    −
3     28.1     27.9    −
4     25.5     26.0    +
5     23.2     24.2    +
6     23.3     23.9    +
7     17.8     16.9    −
8     27.4     26.0    −
9     22.3     33.0    +
10    24.2     27.1    +
11    20.9     22.4    +
12    15.8     20.9    +
13    24.8     22.2    −
14    15.1     27.2    +
15    22.6     18.6    −
16    22.2     29.7    +


1. The Set-up:

a. Population parameter of concern:

M, median gain in miles per gallon.

b. The null and alternative hypothesis:

H0: M = 0 (no mileage gain)

Ha: M > 0 (mileage gain)

2. The Hypothesis Test Criteria:

a. Assumptions: The automobiles were randomly selected and the variables, miles per gallon before and after, are both continuous.

b. Test statistic: The number of the less frequent sign.

In this example: x = n(−)

c. Level of significance: α = 0.05

3. The Sample Evidence:

n = 16; n(+) = 10; n(−) = 6

Observed value of the test statistic: x = n(−) = 6

4. The Probability Distribution (Classical Approach):

a. Critical Value: The critical region is one-tail.

Table 12 is for two-tailed tests.

At the intersection of the column α = 0.10 (= 2 × 0.05) and the row n = 16: k = 4.

The critical value: k = 4.

b. x is not in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: Using Table 12: P > (0.25/2) = 0.125

Using a computer: P = 0.2272

b. The p-value is larger than the level of significance, α.

5. The Results:

a. Decision: Do not reject H0.

b. Conclusion: At the 0.05 level of significance, there is no evidence to suggest the engine additive increases the miles per gallon.
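The paired sign test above can be checked with a short Python sketch. It assumes the computer p-value is the exact lower-tail binomial probability P(X ≤ x) with p = 0.5, where x is the number of minus signs; the variable names are illustrative only.

```python
from math import comb

before = [17.5, 30.7, 28.1, 25.5, 23.2, 23.3, 17.8, 27.4,
          22.3, 24.2, 20.9, 15.8, 24.8, 15.1, 22.6, 22.2]
after = [24.1, 23.8, 27.9, 26.0, 24.2, 23.9, 16.9, 26.0,
         33.0, 27.1, 22.4, 20.9, 22.2, 27.2, 18.6, 29.7]

# Differences are After - Before; only their signs are used.
diffs = [a - b for a, b in zip(after, before)]
n_plus = sum(d > 0 for d in diffs)
n_minus = sum(d < 0 for d in diffs)
n = n_plus + n_minus          # zeros (if any) are discarded

# Ha: M > 0, so a small number of minus signs supports Ha.
x = n_minus
p_value = sum(comb(n, i) for i in range(x + 1)) / 2 ** n
print(n_plus, n_minus, round(p_value, 4))
```

With these data the sketch gives 10 plus signs, 6 minus signs, and a p-value of about 0.2272, in agreement with the computer result quoted above.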

Normal Approximation:

1. The sign test may be carried out using a normal approximation and the standard normal variable z.

2. The normal approximation is used if Table 12 does not show the desired level of significance or if n is large.

Procedure:

1. x is the number of the less frequent sign or the most frequent sign; consistent with the alternative hypothesis.

2. x is a binomial random variable with p = 0.5.

$\mu_x = np = n\left(\tfrac{1}{2}\right) = \tfrac{n}{2}$

$\sigma_x = \sqrt{npq} = \sqrt{n\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)} = \tfrac{1}{2}\sqrt{n}$

3. x is a binomial random variable, but it does become approximately normal for large n.

Problem: A binomial random variable is discrete and a normal random variable is continuous.

Solution: Use the continuity correction: an adjustment in the normal random variable so that the approximation is more accurate.

Continuity Correction:

a. For the binomial random variable, the area of a rectangular bar represents probability: the bar has width 1, from 1/2 unit below to 1/2 unit above the value of interest.

b. When z is used, make a 1/2 unit adjustment before calculating the observed value of z.

c. x’ is the adjusted value for x.

If x > n/2 then x’ = x − (1/2)

If x < n/2 then x’ = x + (1/2)

Continuity Correction Illustration:

[Figure: bar of the binomial probability p(x) at x = 7; horizontal axis from 6.0 to 8.0]

P(x = 7) (discrete) ≈ P(6.5 ≤ x ≤ 7.5) (continuous)

1 − α confidence interval for M:

Using the normal approximation (including the continuity correction), the position numbers are:

$\tfrac{1}{2}(n) \pm \left(\tfrac{1}{2} + \tfrac{1}{2}\, z(\alpha/2)\sqrt{n}\right)$

The interval is $x_L$ to $x_U$, where

$L = \tfrac{n}{2} - \tfrac{1}{2} - \tfrac{z(\alpha/2)}{2}\sqrt{n}$ and $U = \tfrac{n}{2} + \tfrac{1}{2} + \tfrac{z(\alpha/2)}{2}\sqrt{n}$

Note: L should be rounded down and U should be rounded up to be sure the level of confidence is at least 1 − α.

Example: Estimate the population median with a 95% confidence interval for a given data set with 55 observations: x1, x2, x3, . . . , x54, x55.

Solution:

The position numbers are:

$\tfrac{1}{2}(n) \pm \left(\tfrac{1}{2} + \tfrac{1}{2}\, z(\alpha/2)\sqrt{n}\right) = \tfrac{1}{2}(55) \pm \left(\tfrac{1}{2} + \tfrac{1}{2}(1.96)\sqrt{55}\right) = 27.5 \pm (0.50 + 7.27) = 27.5 \pm 7.77$

L = 27.5 − 7.77 = 19.73; rounded down, L = 19.

U = 27.5 + 7.77 = 35.27; rounded up, U = 36.

Therefore: 95% confidence interval for M: x19 to x36.
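The position numbers can also be computed directly. The sketch below is one way to do it, assuming z(α/2) is taken from the standard normal quantile function in Python's `statistics` module. The function name `median_ci_positions` and the clamping of the positions to the range 1..n are choices made for this illustration.

```python
from math import ceil, floor, sqrt
from statistics import NormalDist

def median_ci_positions(n, alpha):
    """Return (L, U), the 1-based positions bounding a 1 - alpha interval for M."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z(alpha/2)
    half_width = 0.5 + 0.5 * z * sqrt(n)         # continuity correction included
    lower = max(1, floor(n / 2 - half_width))    # round L down
    upper = min(n, ceil(n / 2 + half_width))     # round U up
    return lower, upper

L, U = median_ci_positions(55, 0.05)
print(L, U)
```

For n = 55 and α = 0.05 this returns positions 19 and 36, matching the worked example.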

Hypothesis test concerning M:

Using the standard normal distribution, z is computed using the formula:

$z^* = \dfrac{x' - (n/2)}{\sqrt{n}/2}$

Example: In a recent study children between the ages of 8 and 12 were reported to watch a median of 18 hours of television per week. In order to test this claim, 105 children between 8 and 12 were selected at random and the number of hours of television watched per week was recorded. A plus sign was coded if the number of hours was greater than 18, a minus sign if less than or equal to 18: there were 71 plus signs and 34 minus signs. Use the normal approximation to the sign test to determine if there is any evidence to suggest the median number of hours watched is greater than 18. Use α = 0.05.


Solution:

1. The Set-up:

a. Population parameter of concern: M, the median number of hours of television watched per week.

b. The null and alternative hypothesis:

H0: M = 18 (≤) (at least as many minus signs as plus signs)

Ha: M > 18 (fewer minus signs than plus signs)

2. The Hypothesis Test Criteria:

a. Assumptions: The random sample of 105 children was independently surveyed and the variable, hours of television watched per week, is continuous.

b. Test statistic: z*

c. Level of significance: α = 0.05

3. The Sample Evidence:

a. Sample information: n(+) = 71, n(−) = 34

n = 105 and x = 71

b. Calculate the value of the test statistic (x > n/2, so x’ = 71 − 0.5 = 70.5):

$z^* = \dfrac{x' - (n/2)}{\sqrt{n}/2} = \dfrac{70.5 - (105/2)}{\sqrt{105}/2} = \dfrac{70.5 - 52.5}{10.25/2} = \dfrac{18}{5.125} = 3.51$

4. The Probability Distribution (Classical Approach):

a. Critical value: z(0.05) = 1.65

b. z* is in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: P = P(z* > 3.51) ≈ 0.0002

Using a computer: P ≈ 0.000224

b. The p-value is smaller than the level of significance, α.

5. The Results:

a. Decision: Reject H0.

b. Conclusion: At the 0.05 level of significance, there is evidence to suggest the median number of hours of television watched per week is greater than 18.
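The normal approximation with the continuity correction can be sketched as follows. The function name `sign_test_z` is hypothetical, and the handling of the case x = n/2 (no adjustment) is an assumption, since the procedure above only defines x’ for x above or below n/2.

```python
from math import sqrt
from statistics import NormalDist

def sign_test_z(x, n):
    """z* for the sign test using the continuity-corrected count x'."""
    if x > n / 2:
        x_adj = x - 0.5
    elif x < n / 2:
        x_adj = x + 0.5
    else:
        x_adj = x            # assumption: no adjustment when x equals n/2
    return (x_adj - n / 2) / (sqrt(n) / 2)

z_star = sign_test_z(71, 105)
p_value = 1 - NormalDist().cdf(z_star)   # upper tail, since Ha: M > 18
print(round(z_star, 2))
```

For the television example this gives z* ≈ 3.51 and an upper-tail p-value of roughly 0.0002.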


14.4: The Mann-Whitney U Test

• Nonparametric alternative for the t test for the difference between two independent means.

• Null hypothesis: the two sampled populations are identical.


Assumptions for inferences about two populations using the Mann-Whitney test:

The two independent random samples are independent within each sample as well as between samples, and the random variables are ordinal or numerical.

Note:

1. This test procedure is often applied in situations in which the two samples are drawn from the same population of subjects, but different treatments are used on each sample.

2. Test procedure described in the following example.

Example: A recent study claimed that adults who exercise regularly tend to have lower pulse rates. To test this claim, two independent random samples of adult males were selected, one from those who exercise regularly (A), and one from those who are more sedentary (B). The data are given below. Is there any evidence to suggest that adults who exercise regularly have lower pulse rates than those who do not exercise regularly? Use α = 0.05.

Sample   Data
A        63 71 61 66 61 63 68 69 78 70
B        83 75 63 69 77 76 69 65 70 68


Solution:

1. The Set-up:

a. Population parameter of concern: The distribution of pulse rates for each population of adult males.

b. The null and alternative hypothesis:

H0: Populations A and B have pulse rates with identical distributions.

Ha: The two distributions are not the same.

2. The Hypothesis Test Criteria:

a. Assumptions: The two samples are independent, and the random variable (pulse rate) is numerical.

b. Test statistic: Mann-Whitney U Statistic, described below.

c. Level of significance: α = 0.05

3. The Sample Evidence:

a. Sample information: Data given in the table above.

b. Calculate the value of the test statistic:

na = sample size from population A

nb = sample size from population B

Combine the two samples and order the data from smallest to largest.

Assign each observation a rank number.

The smallest observation is assigned rank 1, the next smallest is assigned rank 2, etc., up to the largest, which is assigned rank $n_a + n_b$.

For ties: assign each of the tied observations the mean rank of those rank positions that they occupy.

The rankings:

Ranked Data   Rank   Source      Ranked Data   Rank   Source
61            1.5    A           69            11     B
61            1.5    A           69            11     B
63            4      A           70            13.5   A
63            4      A           70            13.5   B
63            4      B           71            15     A
65            6      B           75            16     B
66            7      A           76            17     B
68            8.5    A           77            18     B
68            8.5    B           78            19     A
69            11     A           83            20     B

To Compute the U Statistic:


1. Compute the sum of the ranks for each of the two samples: Ra and Rb.

2. Compute the U score for each sample:

$U_a = n_a \cdot n_b + \dfrac{(n_b)(n_b + 1)}{2} - R_b$

$U_b = n_a \cdot n_b + \dfrac{(n_a)(n_a + 1)}{2} - R_a$

3. The test statistic, U*, is the smaller of Ua and Ub.

In this example:

$R_a = 1.5 + 1.5 + 4 + 4 + 7 + 8.5 + 11 + 13.5 + 15 + 19 = 85$

$R_b = 4 + 6 + 8.5 + 11 + 11 + 13.5 + 16 + 17 + 18 + 20 = 125$

$U_a = (10)(10) + \dfrac{(10)(10 + 1)}{2} - 125 = 30$

$U_b = (10)(10) + \dfrac{(10)(10 + 1)}{2} - 85 = 70$

$U^* = 30$


Background:

1. Suppose the two samples are very different.

Small ranks are associated with one sample, large ranks with the other.

U* would tend to be small, and we would want to reject the null hypothesis.

2. Suppose the two samples are very similar.

The ranks are evenly distributed between the two samples.

Ua and Ub tend to be about equal, U* tends to be larger.

Note: $U_a + U_b = n_a \cdot n_b$

Therefore: only need to consider the smaller U-value.
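The ranking and U computation for the pulse-rate example can be sketched in Python. The helper `average_ranks` is a hypothetical name; it assigns tied observations the mean of the rank positions they occupy, as the procedure describes.

```python
def average_ranks(values):
    """Rank values 1..n, giving ties the mean of the positions they occupy."""
    ordered = sorted(values)
    # A value first found at 0-based index i with c copies occupies ranks
    # i+1 .. i+c, whose mean is i + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

sample_a = [63, 71, 61, 66, 61, 63, 68, 69, 78, 70]   # exercise regularly
sample_b = [83, 75, 63, 69, 77, 76, 69, 65, 70, 68]   # more sedentary
n_a, n_b = len(sample_a), len(sample_b)

ranks = average_ranks(sample_a + sample_b)            # rank the combined data
r_a = sum(ranks[:n_a])
r_b = sum(ranks[n_a:])

u_a = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
u_b = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
u_star = min(u_a, u_b)
print(r_a, r_b, u_a, u_b, u_star)
```

The sketch reproduces Ra = 85, Rb = 125, Ua = 30, Ub = 70, and confirms that Ua + Ub equals na · nb = 100.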

4. The Probability Distribution (Classical Approach):

a. Critical value: Use Table 13B, one-tailed, α = 0.05

Critical value is at the intersection of column n1 = 10 and row n2 = 10: 27

b. U* = 30 is greater than 27, so U* is not in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: P = P(U* ≤ 30, for n1 = 10 and n2 = 10)

Using Table 13: P > 0.05

Using a computer: P ≈ 0.0694

b. The p-value is not smaller than α.


5. The Results:

a. Decision: Do not reject H0.

b. Conclusion: At the 0.05 level of significance, there is no evidence to suggest the two populations are different.

Normal Approximation:

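If the sample sizes are large, then U is approximately normal with

$\mu_U = \dfrac{n_a \cdot n_b}{2}$ and $\sigma_U = \sqrt{\dfrac{n_a \cdot n_b\,(n_a + n_b + 1)}{12}}$

The standard normal distribution may be used if both sample sizes are greater than 10; the test statistic is

$z^* = \dfrac{U^* - \mu_U}{\sigma_U}$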

Example: The data below represent the number of hours two different cellular phone batteries worked before a recharge was necessary. Is there any evidence to suggest battery type B lasts longer than battery type A? Use the Mann-Whitney test with α = 0.05.

Battery A      Battery B
44   42        51   52
44   44        45   47
42   49        49   41
53   38        48   37
41   49        40   49
47   45        43   53
               44   55
               35


Solution:

1. The Set-up:

a. Population parameter of concern: The distribution of battery life for each brand.

b. The null and alternative hypothesis:

H0: The distributions for battery life are the same for both brands.

Ha: The distributions are not the same.

2. The Hypothesis Test Criteria:

a. Assumptions: The two samples are independent and the random variable, battery life, is continuous.

b. Test statistic: Mann-Whitney U statistic (normal approximation).

c. Level of significance: α = 0.05

3. The Sample Evidence:

a. Sample information: Data given in the table above.

b. Calculate the value of the test statistic:

Rankings for battery life:

Ranked Data   Rank   Source      Ranked Data   Rank   Source
35            1      B           45            14.5   B
37            2      B           47            16.5   A
38            3      A           47            16.5   B
40            4      B           48            18     B
41            5.5    A           49            20.5   A
41            5.5    B           49            20.5   A
42            7.5    A           49            20.5   B
42            7.5    A           49            20.5   B
43            9      B           51            23     B
44            11.5   A           52            24     B
44            11.5   A           53            25.5   A
44            11.5   A           53            25.5   B
44            11.5   B           55            27     B
45            14.5   A

The sums:

$R_A = 3 + 5.5 + 7.5 + 7.5 + 11.5 + 11.5 + 11.5 + 14.5 + 16.5 + 20.5 + 20.5 + 25.5 = 155.5$

$R_B = 1 + 2 + 4 + 5.5 + 9 + 11.5 + 14.5 + 16.5 + 18 + 20.5 + 20.5 + 23 + 24 + 25.5 + 27 = 222.5$

The U scores:

$U_A = (12)(15) + \dfrac{(15)(15 + 1)}{2} - 222.5 = 77.5$

$U_B = (12)(15) + \dfrac{(12)(12 + 1)}{2} - 155.5 = 102.5$

$U^* = 77.5$

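Determine the z statistic:

$\mu_U = \dfrac{n_A \cdot n_B}{2} = \dfrac{(12)(15)}{2} = 90$

$\sigma_U = \sqrt{\dfrac{n_A \cdot n_B\,(n_A + n_B + 1)}{12}} = \sqrt{\dfrac{(12)(15)(12 + 15 + 1)}{12}} = \sqrt{\dfrac{(180)(28)}{12}} = \sqrt{420} = 20.49$

$z^* = \dfrac{U^* - \mu_U}{\sigma_U} = \dfrac{77.5 - 90}{20.49} = -0.6101$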

4. The Probability Distribution (Classical Approach):

a. Critical value: −z(0.05) = −1.65

b. z* is not in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: P = P(z* < −0.6101) = 0.2709

b. The p-value is not smaller than α.

5. The Results:

a. Decision: Do not reject H0.

b. Conclusion: At the 0.05 level of significance, there is no evidence to suggest the battery life for brand B is longer than the life for brand A.
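The normal approximation for U can be verified with a few lines of Python. This is a sketch under the formulas given for the mean and standard deviation of U; the function name `mann_whitney_z` is hypothetical, and the lower-tail p-value uses the standard normal distribution from the `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_z(u_star, n_a, n_b):
    """z* from the normal approximation to the Mann-Whitney U statistic."""
    mu_u = n_a * n_b / 2
    sigma_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u_star - mu_u) / sigma_u

# Battery example: U* = 77.5 with 12 type A and 15 type B batteries.
z_star = mann_whitney_z(77.5, 12, 15)
p_value = NormalDist().cdf(z_star)      # lower-tail probability
print(round(z_star, 2), round(p_value, 3))
```

The result is z* ≈ −0.61 with a p-value near 0.271, consistent with the values above up to rounding of the standard deviation.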


14.5: The Runs Test

• Used to test the randomness of data (or lack of randomness).

• Run: a sequence of data with a common property.

• Test statistic, V: the number of runs observed.

Example: A coin is tossed 15 times and a head (H) or a tail (T) is recorded on each toss. The sequence of tosses was

T H T T T H H T T T H H T T T

The number of runs is V = 7.

T | H | T T T | H H | T T T | H H | T T T

Note:

1. No randomness: only two runs (all heads, then all tails, or the other way around). Or H and T alternate.

2. n1 = number of data with property 1.

n2 = number of data with property 2.

n = n1 + n2 = sample size.
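Counting runs is easy to automate. The sketch below uses a hypothetical helper, `count_runs`, which starts a new run whenever an item differs from the one before it; it is applied to the coin-toss sequence above.

```python
def count_runs(sequence):
    """Number of maximal blocks of identical consecutive items."""
    runs = 0
    previous = object()          # sentinel that equals no real item
    for item in sequence:
        if item != previous:     # a change of property starts a new run
            runs += 1
        previous = item
    return runs

tosses = "T H T T T H H T T T H H T T T".split()
V = count_runs(tosses)
n1 = tosses.count("H")           # number of data with property 1
n2 = tosses.count("T")           # number of data with property 2
print(V, n1, n2)
```

For the 15 tosses this gives V = 7 with n1 = 5 heads and n2 = 10 tails.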


Assumptions for inferences about randomness using the Runs test:

Each observation may be classified into one of two categories.

Note:

1. A large number of runs, or a small number of runs, (more or less than what we would expect by chance), suggests the data is not random.

2. Another aspect of randomness: the ordering of observations above or below the mean or median of the sample.

Example: Consider the following sample and use the runs test to determine if the sequence is random with respect to being above or below the mean value.

24 27 30 24 29 26 33 27 32 35 25 26 24 25 31 19 15 23 18 20 28 30 25 31 24 23 28 25 26 22 24 15 26 32 17 38

Test the null hypothesis that this sequence is random.

Use α = 0.05.

Solution:

1. The Set-up:

a. Population parameter of concern: Randomness of the values above or below the mean.


b. The null and the alternative hypothesis:

H0: The numbers in the sample form a random sequence with respect to the two properties above and below the mean value.

Ha: The sequence is not random.

2. The Hypothesis Test Criteria:

a. Assumptions: Each observation may be classified as above or below the mean.

b. Test statistic: V, the number of observed runs.

c. Level of significance: α = 0.05

3. The Sample Evidence:

Compare each number in the original sample to the value of the mean, $\bar{x} = 25.75$, to obtain the following sequence of a’s (above) and b’s (below).

b a a b a a a a a a b a b b a b b b

b b a a b a b b a b a b b b a a b a

na = 18, nb = 18, V = 20

If n1 and n2 are both less than or equal to 20, and a two-tailed test with α = 0.05 is conducted, use Table 14, Appendix B.

4. The Probability Distribution (Classical Approach):

a. Critical value: Two-tailed test, α = 0.05, use Table 14.

Critical values at the intersection of column n1 = 18 and row n2 = 18: 12 and 26.

b. V is not in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: P = 2 × P(V ≥ 20, for na = 18 and nb = 18)

Using Table 14: P > 0.05

Using a computer: P = 0.7352

b. The p-value is not smaller than the level of significance, α.

5. The Results:

a. Decision: Do not reject H0.

b. Conclusion: At the 0.05 level of significance, there is no evidence to reject the null hypothesis that the sequence is random with respect to above and below the mean.

Normal Approximation:

1. If n1 and n2 are larger than 20, or if α is different from 0.05, a normal approximation may be used.

2. V is approximately normally distributed with

$\mu_V = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1$ and $\sigma_V = \sqrt{\dfrac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$

3. Test statistic:

$z^* = \dfrac{V^* - \mu_V}{\sigma_V}$

Example: The letters in the following sequence represent the direction each car turned after exiting at a certain ramp on the New Jersey Turnpike (L - left, R - right).

L L R R R R R R R L L L R R R R L L R R

R R L L L L L R R L L L R L R L L R R L

L R R R R R R L L L L L R R R R R R L L

Test the null hypothesis that the sequence is random with regard to direction. Use α = 0.01.

Solution:

1. The Set-up:

a. Population parameter of concern: Randomness with respect to direction turned after exiting the turnpike.

b. The null and alternative hypothesis:

H0: The sequence of directions (L and R) is random.

Ha: The sequence is not random.

2. The Hypothesis Test Criteria:

a. Assumptions: Each observation may be classified as L or R.

b. Test statistic: V, the number of runs.

c. Level of significance: α = 0.01

3. The Sample Evidence:

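Calculate the value of the test statistic:

From the sequence above: nL = 27, nR = 33, V = 19

Determine the z statistic:

$\mu_V = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1 = \dfrac{(2)(27)(33)}{27 + 33} + 1 = 30.7$

$\sigma_V = \sqrt{\dfrac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\dfrac{(2 \cdot 27 \cdot 33)(2 \cdot 27 \cdot 33 - 27 - 33)}{(27 + 33)^2 (27 + 33 - 1)}} = \sqrt{\dfrac{(1782)(1722)}{(60)^2 (59)}} = \sqrt{14.4473} = 3.801$

$z^* = \dfrac{V^* - \mu_V}{\sigma_V} = \dfrac{19 - 30.7}{3.801} = -3.078$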

4. The Probability Distribution (Classical Approach):

a. Critical values: −z(0.005) = −2.58 and z(0.005) = 2.58

b. z* is in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: P = 2 × P(z < −3.078) = 0.0021

b. The p-value is smaller than the level of significance, α.

5. The Results:

a. Decision: Reject H0.

b. Conclusion: At the 0.01 level of significance, there is evidence to suggest the turning direction for cars exiting the turnpike is not random.
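The turnpike example can be reproduced from the raw sequence. The sketch below counts the runs, applies the formulas for the mean and standard deviation of V, and computes a two-tailed p-value from the standard normal distribution. The function name `runs_test_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def runs_test_z(v, n1, n2):
    """z* from the normal approximation to the number of runs V."""
    mu_v = 2 * n1 * n2 / (n1 + n2) + 1
    sigma_v = sqrt((2 * n1 * n2) * (2 * n1 * n2 - n1 - n2)
                   / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (v - mu_v) / sigma_v

turns = ("L L R R R R R R R L L L R R R R L L R R "
         "R R L L L L L R R L L L R L R L L R R L "
         "L R R R R R R L L L L L R R R R R R L L").split()

n_left, n_right = turns.count("L"), turns.count("R")
# One run to start, plus one more for every change of direction.
v = 1 + sum(a != b for a, b in zip(turns, turns[1:]))

z_star = runs_test_z(v, n_left, n_right)
p_value = 2 * NormalDist().cdf(-abs(z_star))    # two-tailed
print(n_left, n_right, v, round(z_star, 3))
```

This yields nL = 27, nR = 33, V = 19, z* ≈ −3.078, and a p-value of about 0.0021.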


14.6: Rank Correlation

• Charles Spearman developed the rank correlation coefficient.

• A nonparametric alternative to the linear correlation coefficient (Pearson’s product moment, r).

Spearman rank correlation coefficient:

$r_S = 1 - \dfrac{6 \sum (d_i)^2}{n(n^2 - 1)}$

di = the difference in the paired rankings.

n = the number of pairs.

Note:

1. rS will range from −1 to +1.

2. rS is used in the same way as the linear correlation coefficient r.


Calculation of rS:

1. Rank the x-values from smallest to largest: 1, 2, ... , n.

2. Rank the y-values from smallest to largest: 1, 2, ... , n.

3. Use the ranks instead of the actual numerical values in the formula for r, the linear correlation coefficient.

4. If there are no ties, rS is equivalent to r.

5. rS is an easier procedure that uses the differences between the ranks: di.

6. In practice, rS is used even when there are ties.

7. For ties: assign each of the tied observations the mean rank of those rank positions that they occupy.


Assumptions for inferences about rank correlation:

The n ordered pairs of data form a random sample and the variables are ordinal or numerical.

Null Hypothesis:

There is no correlation between the two rankings.

Alternative Hypothesis:

Two-tailed: There is correlation between rankings.

May be one-tailed if positive or negative correlation is suspected.

Critical Region:

On the side(s) corresponding to the specific alternative.

Table 15, Appendix B: positive critical values only, add a plus or minus sign to the value found in the table, as appropriate.

Example: A researcher believes a certain toxic chemical accumulates in body tissues with age and may eventually cause heart disease. Twelve subjects were selected at random. Their ages and the chemical concentrations (in parts per million) in tissue samples are given in the table below. Is there any evidence to suggest the chemical concentration in tissue samples increases with age? Use α = 0.01.

Age, x   Chemical Concentration, y      Age, x   Chemical Concentration, y
82       170                            70       48
83       40                             62       34
64       64                             34       3
53       5                              27       7
47       15                             75       50
50       5                              28       10

Solution:

1. The Set-up:

a. Population parameter of concern: Rank correlation coefficient between age and chemical concentration, $\rho_S$.

b. The null and alternative hypothesis:

H0: Age and chemical concentration are not related.

Ha: Older people tend to have higher chemical concentrations in their tissues.

2. The Hypothesis Test Criteria:

a. Assumptions: The 12 ordered pairs of data form a random sample; both variables are continuous.

b. Test statistic: Rank correlation coefficient, rS.

c. Level of significance: α = 0.01.

3. The Sample Evidence:

The ranks and differences:

Age   Age Rank   Chemical Concentration   Chem. Con. Rank   Difference (d_i)   (d_i)^2
82    11         170                      12                -1                 1.00
83    12         40                       8                 4                  16.00
64    8          64                       11                -3                 9.00
53    6          5                        2.5               3.5                12.25
47    4          15                       6                 -2                 4.00
50    5          5                        2.5               2.5                6.25
70    9          48                       9                 0                  0.00
62    7          34                       7                 0                  0.00
34    3          3                        1                 2                  4.00
27    1          7                        4                 -3                 9.00
75    10         50                       10                0                  0.00
28    2          10                       5                 -3                 9.00
                                                            Sum                70.5

Use the formula for rS:

$r_S = 1 - \dfrac{6 \sum (d_i)^2}{n(n^2 - 1)} = 1 - \dfrac{(6)(70.5)}{(12)(12^2 - 1)} = 1 - \dfrac{423}{1716} = 1 - 0.2465 = 0.7535$

4. The Probability Distribution (Classical Approach):

a. Critical value: The critical region is one-tailed.

Table 15 lists critical values for two-tailed tests.

The critical value is located at the intersection of the α = 0.02 column (2 × 0.01) and the n = 12 row: 0.703

b. rS is in the critical region.

4. The Probability Distribution (p-Value Approach):

a. The p-value: P = P(rS ≥ 0.7535, for n = 12)

Using Table 15: P < 0.005

Using a computer: P ≈ 0.0025

b. The p-value is smaller than the level of significance, α.

5. The Results:

a. Decision: Reject H0.

b. Conclusion: At the α = 0.01 level of significance, there is evidence to suggest that older people tend to have higher levels of chemical concentration in their tissues.
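The rank correlation for this example can be recomputed from the raw data. The sketch below ranks each variable (ties receive the mean of the positions they occupy), forms the rank differences, and applies the rS formula. The helper name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving ties the mean of the positions they occupy."""
    ordered = sorted(values)
    # A value first found at 0-based index i with c copies occupies ranks
    # i+1 .. i+c, whose mean is i + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

age = [82, 83, 64, 53, 47, 50, 70, 62, 34, 27, 75, 28]
concentration = [170, 40, 64, 5, 15, 5, 48, 34, 3, 7, 50, 10]
n = len(age)

# d_i = age rank minus concentration rank for each subject.
d = [ra - rc for ra, rc in zip(average_ranks(age), average_ranks(concentration))]
sum_d2 = sum(di ** 2 for di in d)

r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r_s, 4))
```

The sum of squared differences is 70.5 and rS rounds to 0.7535, as in the worked solution.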

Normal Approximation:

1. As n gets large, rS approaches a normal distribution.

2. When n exceeds the values in Table 15, the following test statistic may be used:

$z^* = \dfrac{r_S - 0}{1/\sqrt{n - 1}} = r_S \sqrt{n - 1}$