nonparametric statistical methods introduction features of nonparametric … · 2018. 9. 2. ·...

Post on 12-Oct-2020


Nonparametric Statistical Methods

Introduction

In previous chapters, we presented statistical techniques for comparing two or more populations by comparing their respective population parameters (usually their means). Remember that these techniques require that our data be measured on at least the interval scale and that they be normally distributed. In this chapter, we present several statistical tests for comparing populations when the data do not satisfy these assumptions.

The social and behavioral sciences need the ability to use nonparametric statistics in research. Many studies in these areas involve data that are classified on the nominal or ordinal scale. At times, interval data from these fields do not meet the conditions needed to treat them as normally distributed. Nonparametric statistics is a useful tool for analyzing such data.

Features of Nonparametric Statistical Methods

1. Require few assumptions (i.e., do not require normality of the data).

2. Often easier to apply and quite easy to understand.

3. Slightly less efficient than their normal-theory counterparts when the data are normal.

4. Mildly to wildly more efficient than their normal-theory counterparts when the data are not normal.

5. Appropriate for data on the nominal or ordinal level, but can also be applied to data on at least the interval level.

6. Often, the original data are transformed into ranks, and test statistics are calculated from these ranks.

7. Hypotheses are not necessarily expressed in terms of population parameters, such as the mean.

The Wilcoxon Rank-sum Test

used in comparing two independent samples

used when data do not satisfy the requirements for the t test (e.g., normality)

appropriate for data measured on at least the ordinal level

the data are ranked and the statistic is computed based on the ranks

equivalent to the Mann-Whitney U test (the name used in SPSS)

Example:

A firm has a generous but rather complicated policy concerning end-of-year bonuses for its lower-level managerial personnel. The policy's key factor is a subjective judgment of "contribution to corporate goals." A personnel officer took samples of 24 female and 36 male managers to see whether there was any difference in bonuses, expressed as a percentage of yearly salary. The data are given below.

Objective: Determine if male and female managers received the same percent bonuses.


Analysis using Stata

1. Enter the data into the Stata Data Editor, or type the data in MS Excel and import them into Stata.

2. a. From the menu at the top of the screen, click Statistics > Nonparametric analysis > Tests of hypothesis > Wilcoxon rank-sum test.

b. In the Main tab, select pctbonus in the pull-down menu under Variable:.

c. In the Main tab, select sex in the pull-down menu under Grouping Variable:.

d. Click OK.
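The menu steps above correspond to the command ranksum pctbonus, by(sex). As a rough cross-check of what that command reports, the Python sketch below rebuilds the expected rank sum, the variance and the z statistic from the summary figures in the Stata output for this example (24 female managers with rank sum 503, 36 male managers, tie adjustment -20.75). The function name is hypothetical and the normal approximation is assumed; it is an illustration of the arithmetic, not Stata's implementation.

```python
import math


def ranksum_normal_approx(n1, n2, rank_sum_1, tie_adjustment=0.0):
    """Normal approximation for the Wilcoxon rank-sum test, based on group 1."""
    n = n1 + n2
    expected = n1 * (n + 1) / 2              # expected rank sum of group 1 under Ho
    unadjusted = n1 * n2 * (n + 1) / 12      # variance when there are no ties
    adjusted = unadjusted + tie_adjustment   # tie adjustment is zero or negative
    z = (rank_sum_1 - expected) / math.sqrt(adjusted)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return expected, unadjusted, adjusted, z, p_two_sided


# Figures read from the Stata output of the bonus example (group F).
expected, unadjusted, adjusted, z, p = ranksum_normal_approx(24, 36, 503, -20.75)
print(f"expected = {expected:.0f}, unadjusted variance = {unadjusted:.2f}")
print(f"adjusted variance = {adjusted:.2f}, z = {z:.3f}, p = {p:.4f}")
```

The rank sum of the female group (503) falls well below its expected value under Ho (732), which is what produces the negative z.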


OUTPUT

. ranksum pctbonus, by(sex)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

         sex |      obs    rank sum    expected
-------------+---------------------------------
           F |       24         503         732
           M |       36        1327        1098
-------------+---------------------------------
    combined |       60        1830        1830

unadjusted variance     4392.00
adjustment for ties      -20.75
adjusted variance       4371.25

Ho: pctbonus(sex==F) = pctbonus(sex==M)
             z =  -3.464
    Prob > |z| =   0.0005

Interpretation: There is a significant difference between the percent bonuses received by male and female managers (z = -3.464, p = 0.0005).


The Wilcoxon Signed-rank Test

used in comparing two dependent or matched samples

used when data do not satisfy the requirements for the paired-samples t test

the absolute differences between pairs of observations are ranked from smallest to largest, each rank is given the sign of its difference, and the test statistic is computed from these signed ranks

Example:

A study was designed to measure the effect of home environment on academic achievement of 12-year-old students. Because genetic differences may also contribute to academic achievement, the researcher wanted to control for this factor. Thirty sets of identical twins were identified who had been adopted prior to their first birthday, with one twin placed in a home in which academics were emphasized (Academic) and the other twin placed in a home in which academics were not emphasized (Nonacademic). The final grades (based on 100 points) for the 60 students are given below.

Objective: Determine if home environment has an effect on academic achievement. Specifically, determine if students in an academically oriented home environment perform better than students in a non-academically oriented home environment.

Analysis using Stata

1. From the menu at the top of the screen, click Statistics > Nonparametric analysis > Tests of hypothesis > Wilcoxon matched-pairs signed rank test.

2. In the Main tab, select academic in the pull-down menu under Variable:

3. Type nonacademic under Expression:

4. Click OK.
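The menu steps above correspond to the command signrank academic = nonacademic. The Python sketch below reworks the arithmetic behind the z statistic from the summary figures in the Stata output for this example (30 twin pairs, positive rank sum 414, tie adjustment -9.50, no zero differences). The function name is hypothetical and the normal approximation is assumed. Because the objective is directional (Academic better than Nonacademic), the sketch also halves the two-sided p-value, which is valid here because z has the sign the alternative predicts.

```python
import math


def signrank_normal_approx(n_pairs, positive_rank_sum, tie_adjustment=0.0,
                           zero_adjustment=0.0):
    """Normal approximation for the Wilcoxon signed-rank test."""
    expected = n_pairs * (n_pairs + 1) / 4                        # under Ho
    unadjusted = n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24
    adjusted = unadjusted + tie_adjustment + zero_adjustment      # both <= 0
    z = (positive_rank_sum - expected) / math.sqrt(adjusted)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))                # 2 * P(Z > |z|)
    return expected, unadjusted, adjusted, z, p_two_sided


# Figures read from the Stata output of the twin example.
expected, unadjusted, adjusted, z, p_two = signrank_normal_approx(30, 414, -9.50)
p_one = p_two / 2  # one-sided p-value for Ha: Academic > Nonacademic (z > 0)
print(f"expected = {expected}, adjusted variance = {adjusted:.2f}")
print(f"z = {z:.3f}, two-sided p = {p_two:.4f}, one-sided p = {p_one:.4f}")
```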


OUTPUT

. signrank academic = nonacademic

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |       26         414       232.5
    negative |        4          51       232.5
        zero |        0           0           0
-------------+---------------------------------
         all |       30         465         465

unadjusted variance     2363.75
adjustment for ties       -9.50
adjustment for zeros       0.00
adjusted variance       2354.25

Ho: academic = nonacademic
             z =   3.741
    Prob > |z| =   0.0002

Interpretation: Grades of students in an academically oriented home environment are significantly higher than the grades of their twins in a non-academically oriented home environment (z = 3.741; one-sided p = 0.0002/2 = 0.0001).

Kruskal-Wallis Test for a CRD Experiment

this is the nonparametric alternative to the analysis of variance F test for completely randomized design (CRD) experiments

the hypotheses to be tested are
Ho: The k population distributions are identical.
Ha: At least two of the k population distributions differ in location.

an extension of the Wilcoxon rank-sum test

all sample sizes should be at least five

Example:

A clinical psychologist wished to compare three methods for reducing hostility levels in university students, and used a certain test (HLT) to measure the degree of hostility. A high score on the test indicated great hostility. The psychologist used 24 students who obtained high and nearly equal scores in the experiment. Eight were selected at random from among the 24 problem cases and were treated with method 1. Seven of the remaining 16 students were selected at random and treated with method 2. The remaining nine students were treated with method 3. All treatments were continued for a one-semester period. Each student was given the HLT test at the end of the semester, with the results shown below.


Analysis using Stata

1. From the menu at the top of the screen, click Statistics > Nonparametric analysis > Tests of hypothesis > Kruskal-Wallis rank test.

2. In the Main tab, select score in the pull-down menu under Outcome Variable:

3. Select method in the pull-down menu under Variable defining groups:

4. Click OK.
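The menu steps above correspond to the command kwallis score, by(method). As a sketch of what the statistic measures, the Python code below computes the Kruskal-Wallis H statistic (without the tie correction) from the group sizes and rank sums in the Stata output for this example (8, 7 and 9 students with rank sums 163, 81.5 and 55.5). The function names are hypothetical. The p-value uses the chi-square approximation with k - 1 = 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-x/2); that shortcut is only valid for 2 d.f.

```python
import math


def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H statistic, not corrected for ties."""
    n = sum(sizes)
    weighted = sum(r * r / m for r, m in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


def chi2_upper_tail_2df(x):
    """P(chi-square with 2 d.f. > x); this closed form holds only for 2 d.f."""
    return math.exp(-x / 2)


# Figures read from the Stata output of the hostility example.
sizes = [8, 7, 9]
rank_sums = [163.0, 81.5, 55.5]

h = kruskal_wallis_h(rank_sums, sizes)
p = chi2_upper_tail_2df(h)
print(f"H = {h:.3f} with {len(sizes) - 1} d.f., p = {p:.4f}")
```

Stata also reports a version corrected for ties (17.313 in this example); reproducing it requires the tied values in the raw data, which the sketch does not use.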


OUTPUT

. kwallis score, by(method)

Kruskal-Wallis equality-of-populations rank test

     method   Obs   Rank Sum
    Method1     8     163.00
    Method2     7      81.50
    Method3     9      55.50

chi-squared =    17.245 with 2 d.f.
probability =     0.0002

chi-squared with ties =    17.313 with 2 d.f.
probability =     0.0002

Interpretation: HLT scores differ significantly among the three methods (chi-squared with ties = 17.313, 2 d.f., p = 0.0002).

Friedman Test for a RCBD Experiment

this is the nonparametric alternative to the analysis of variance F test for randomized complete block design (RCBD) experiments

Example:

An accounting firm, prior to introducing in the firm widespread training in statistical sampling for auditing, tested three training methods:

1. study at home with programmed training materials

2. training sessions at local offices conducted by local staff

3. training sessions in Manila conducted by national staff

Thirty auditors were grouped into 10 groups (blocks) of 3, according to time elapsed since college graduation, and the auditors in each block were randomly assigned to the 3 training methods. Block 1 consists of the auditors who graduated most recently, …, block 10 consists of those who graduated least recently. At the end of the training, each auditor was asked to analyze a complex case involving statistical application; a proficiency measure based on this analysis was obtained for each auditor. The results are given in the table below.


Analysis using Stata

1. Type this data set into the Stata Data Editor exactly as it appears in the table above.

2. Before the analysis, transpose the data by typing the command:

xpose, clear


3. In the Command Window, type friedman v1-v10. Then press Enter.
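As a sketch of how the figures reported by friedman relate to one another, the Python code below uses the Friedman statistic from the Stata output for this example (18.2, with 10 blocks and 3 training methods) to recover Kendall's coefficient of concordance, W = Fr / (n(k - 1)), and the chi-square p-value with k - 1 = 2 degrees of freedom. The function names are hypothetical. The output does not show the per-method rank sums, so the rank sums in the last step are purely illustrative: they are one hypothetical set that happens to be consistent with Fr = 18.2, not values taken from the data.

```python
import math


def friedman_statistic(rank_sums, n_blocks):
    """Friedman statistic from the rank sums of k treatments over n blocks."""
    k = len(rank_sums)
    sum_sq = sum(r * r for r in rank_sums)
    return 12 / (n_blocks * k * (k + 1)) * sum_sq - 3 * n_blocks * (k + 1)


def kendall_w(friedman, n_blocks, k):
    """Kendall's coefficient of concordance implied by a Friedman statistic."""
    return friedman / (n_blocks * (k - 1))


def chi2_upper_tail_2df(x):
    """P(chi-square with 2 d.f. > x); this closed form holds only for 2 d.f."""
    return math.exp(-x / 2)


reported_fr = 18.2  # Friedman statistic read from the Stata output
w = kendall_w(reported_fr, n_blocks=10, k=3)
p = chi2_upper_tail_2df(reported_fr)
print(f"Kendall W = {w:.4f}, p = {p:.4f}")

# Hypothetical rank sums (NOT from the source data) that add up to 60,
# the total of ranks 1 to 3 over 10 blocks.
hypothetical_rank_sums = [10, 21, 29]
fr_check = friedman_statistic(hypothetical_rank_sums, n_blocks=10)
print(f"Friedman statistic for the hypothetical rank sums = {fr_check:.2f}")
```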

OUTPUT

. friedman v1-v10

Friedman =  18.2000
Kendall  =   0.9100
p-value  =   0.0001

Interpretation: Proficiency scores differ significantly among the three training methods (Fr = 18.20, p-value = 0.0001).

Chi-square Test for Differences between Two Independent Samples

When the data consist of frequencies in discrete categories, the chi-square test may be used to determine the significance of differences between two independent groups. The measurement involved may be as weak as nominal (categorical) or ordinal scaling.

Example:

A university admissions officer was concerned that males and females were accepted at different rates into the four different colleges (business, engineering, liberal arts, and science) at his university. He collected the following data on the acceptance of 1200 males and 800 females who applied to the university:

Sex      Business   Engineering   Liberal Arts   Science   TOTAL (Fixed)
Male          300           240            300       360            1200
Female        200           160            200       240             800
TOTAL         500           400            500       600            2000

Research question: Are males and females distributed equally among the various schools?

Analysis using Stata

1. This data is encoded in Stata Data Editor as follows:

2. In the Command Window, type this command and press Enter:

tabulate sex college [fweight = numstuds], chi2 expected
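To illustrate what the chi2 and expected options of tabulate compute, the Python sketch below derives the expected counts (row total × column total / grand total) and the Pearson chi-square statistic for a two-way table. The function name is hypothetical. The observed counts are the ones printed in the Stata output for this example (F: 240, 80, 320, 160; M: 240, 480, 120, 360); they differ from the counts in the summary table of the example text, whose two rows are exactly proportional and would therefore give a statistic of 0.

```python
def pearson_chi_square(table):
    """Expected counts, Pearson chi-square and d.f. for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df


# Observed counts as printed in the Stata output (rows F and M; columns
# Business, Engg, Lib_Arts, Science).
stata_counts = [[240, 80, 320, 160],
                [240, 480, 120, 360]]
expected, stat, df = pearson_chi_square(stata_counts)
print(expected[0], expected[1])
print(f"Pearson chi2({df}) = {stat:.4f}")

# Counts from the summary table in the example text (rows Male and Female).
text_counts = [[300, 240, 300, 360],
               [200, 160, 200, 240]]
_, text_stat, _ = pearson_chi_square(text_counts)
print(f"chi2 for the summary table in the text = {text_stat:.4f}")
```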


OUTPUT

           |                  College
       Sex | Business      Engg  Lib_Arts   Science |     Total
-----------+----------------------------------------+----------
         F |      240        80       320       160 |       800
           |    192.0     224.0     176.0     208.0 |     800.0
-----------+----------------------------------------+----------
         M |      240       480       120       360 |     1,200
           |    288.0     336.0     264.0     312.0 |   1,200.0
-----------+----------------------------------------+----------
     Total |      480       560       440       520 |     2,000
           |    480.0     560.0     440.0     520.0 |   2,000.0

          Pearson chi2(3) = 389.1109   Pr = 0.000

In each cell, the first line is the observed frequency and the second line is the expected frequency.

Interpretation: Males and females are not distributed equally among the four colleges (chi-square = 389.1109, 3 d.f., p < 0.001).

Cochran Q test for Binary Responses from Randomized Block Experiments

Sometimes in a randomized block experiment, the response can only be measured in binary form (Yes-No, Pass-Fail, Male-Female, Dead-Alive). An appropriate nonparametric statistical method for such data is Cochran's Q test, named after William G. Cochran.

Example:

Workers at a large plant generally show two types of behavior: energetic and tired. This behavior was measured for 20 workers on Monday, Wednesday and Friday during one week in March, as shown in the table below (where 1 represents energetic and 0 represents tired).

Research question: Is there a significant difference in behavior among the three time periods?


Analysis using Stata

1. This data is encoded in Stata Data Editor as it appears in the above table:

2. In the Command Window, type this command and press Enter:

cochran monday wednesday friday
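As a sketch of the statistic behind this command, the Python code below computes Cochran's Q for a blocks × treatments table of 0/1 responses, using Q = (k - 1)(k ΣC² - N²) / (kN - ΣR²), where C are the column totals, R the row totals and N the grand total. The function names are hypothetical, and so is the small data set: the 20-worker table is not reproduced in this text, so six made-up workers are used only to show the mechanics. The last step checks the p-value for the Q reported in the Stata output for this example (6.705882) against the chi-square distribution with k - 1 = 2 degrees of freedom.

```python
import math


def cochran_q(rows):
    """Cochran's Q for a list of blocks, each a list of k binary responses."""
    k = len(rows[0])
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator  # undefined if every block is all 0s or all 1s


def chi2_upper_tail_2df(x):
    """P(chi-square with 2 d.f. > x); this closed form holds only for 2 d.f."""
    return math.exp(-x / 2)


# Hypothetical data (NOT the workers in the example): columns are Monday,
# Wednesday, Friday; 1 = energetic, 0 = tired.
hypothetical_rows = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
q_demo = cochran_q(hypothetical_rows)
print(f"Q for the hypothetical data = {q_demo}")

# p-value for the Q statistic reported in the Stata output of the example.
p_reported = chi2_upper_tail_2df(6.705882)
print(f"p for Q = 6.705882 with 2 d.f. = {p_reported:.4f}")
```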

OUTPUT

Test for equality of proportions of nonzero
outcomes in matched samples (Cochran's Q):

Number of obs     =        20
Cochran's chi2(2) =  6.705882
Prob > chi2       =    0.0350

Interpretation: There is a significant difference in behavior among the three time periods (Q = 6.705882, p-value = 0.0350). There are significantly fewer energetic workers on Mondays than on Wednesdays and Fridays.