Inferential Statistics: Non-Parametric Tests, by Dr Harshal P. Bhumbar


Uploaded by stuart-cameron on 03-Jan-2016



DEFINITION

Non-parametric statistics are statistical methods that do not require the data to fit a normal distribution. They often use ordinal data: instead of relying on the numerical values themselves, they rely on a ranking or ordering of the observations.

S. No. | Study design | Parametric test | Non-parametric test
1 | Two independent samples | Student's t test | Wilcoxon-Mann-Whitney test
2 | Two matched samples | Paired t test | Wilcoxon signed rank test
3 | Two or more independent samples | One-way ANOVA | Kruskal-Wallis test
4 | Two or more matched samples | Two-way ANOVA | Friedman test

Difference between parametric and non-parametric tests

No. | Parametric test | Non-parametric test
1 | Used for ratio or interval data | Used for ordinal or nominal data
2 | Assumes a normal distribution | Any distribution
3 | Mean is the usual measure of central tendency | Median is the usual measure of central tendency
4 | Information about the population is assumed to be completely known | No information about the population is required
5 | Specific assumptions are made regarding the population | No assumption about the form of the population distribution
6 | Null hypothesis is based on parameters of the population | Null hypothesis is free of parameters
7 | Applicable only to variables | Applicable to both variables and attributes
8 | More efficient | Less efficient
9 | More powerful (when its assumptions hold) | Less powerful

Wilcoxon-Mann-Whitney test

Example -

The effectiveness of advertising for two rival products (Brand X and Brand Y) was compared. Market research was carried out at a local shopping centre: participants were shown adverts for two rival brands of coffee and then rated the overall likelihood of buying the product (out of 10, with 10 being "definitely going to buy the product"). Half of the participants gave ratings for one of the products; the other half gave ratings for the other product.

Brand X participant | Rating | Brand Y participant | Rating
1 | 3 | 1 | 9
2 | 4 | 2 | 7
3 | 2 | 3 | 5
4 | 6 | 4 | 10
5 | 2 | 5 | 6
6 | 5 | 6 | 8

We have two conditions, and each participant takes part in only one of them. The data are ratings (ordinal data), so a non-parametric test is appropriate: the Mann-Whitney U test, the non-parametric counterpart of the independent-samples t test.

STEP ONE: Rank all scores together, ignoring which group they belong to.

Brand X participant | Rating | Rank | Brand Y participant | Rating | Rank
1 | 3 | 3 | 1 | 9 | 11
2 | 4 | 4 | 2 | 7 | 9
3 | 2 | 1.5 | 3 | 5 | 5.5
4 | 6 | 7.5 | 4 | 10 | 12
5 | 2 | 1.5 | 5 | 6 | 7.5
6 | 5 | 5.5 | 6 | 8 | 10
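A minimal Python sketch of step one, assuming the ratings are typed in from the table above. `average_ranks` is a hypothetical helper written for this illustration (not a library function); like the table, it gives tied scores the mean of the ranks they occupy.

```python
def average_ranks(values):
    """Rank from 1 (smallest) to n (largest); tied values get the mean of the ranks they span."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1         # first 1-based position of v in sorted order
        last = first + ordered.count(v) - 1  # last 1-based position of v
        ranks.append((first + last) / 2)
    return ranks


brand_x = [3, 4, 2, 6, 2, 5]
brand_y = [9, 7, 5, 10, 6, 8]

pooled_ranks = average_ranks(brand_x + brand_y)  # rank all 12 scores together
ranks_x = pooled_ranks[:len(brand_x)]
ranks_y = pooled_ranks[len(brand_x):]
print(ranks_x)  # [3.0, 4.0, 1.5, 7.5, 1.5, 5.5]
print(ranks_y)  # [11.0, 9.0, 5.5, 12.0, 7.5, 10.0]
```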

STEP TWO: Add up the ranks for Brand X to get T1.

T1 = 3 + 4 + 1.5 + 7.5 + 1.5 + 5.5 = 23

STEP THREE: Add up the ranks for Brand Y to get T2.

T2 = 11 + 9 + 5.5 + 12 + 7.5 + 10 = 55

STEP FOUR: Select the larger rank total. In this case it is T2 (55).

STEP FIVE: Calculate n1, n2 and nx: the number of participants in each group, and the number of participants in the group that gave the larger rank total.

• n1 = 6, n2 = 6, nx = 6

STEP SIX: Find U, where Tx is the larger rank total:

$$U = n_1 n_2 + \frac{n_x(n_x+1)}{2} - T_x = 6 \times 6 + \frac{6 \times (6+1)}{2} - 55 = 2$$
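Steps five and six can be sketched in Python as below; the function names are hypothetical. The second function is a cross-check that is not on the slides: it uses the standard equivalent definition of U as the number of (Brand X, Brand Y) pairs in which the Brand X rating is higher, counting each tie as one half.

```python
def u_from_rank_totals(t1, t2, n1, n2):
    """U = n1*n2 + nx(nx+1)/2 - Tx, where Tx is the larger rank total and nx its group size."""
    tx, nx = (t1, n1) if t1 >= t2 else (t2, n2)
    return n1 * n2 + nx * (nx + 1) / 2 - tx


def u_by_pair_counting(group_a, group_b):
    """Count pairs (a, b) with a > b, scoring each tie as 1/2."""
    total = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                total += 1
            elif a == b:
                total += 0.5
    return total


brand_x = [3, 4, 2, 6, 2, 5]
brand_y = [9, 7, 5, 10, 6, 8]

u = u_from_rank_totals(t1=23, t2=55, n1=6, n2=6)
u_check = u_by_pair_counting(brand_x, brand_y)  # Brand X has the smaller rank total
print(u, u_check)  # 2.0 2.0
```

Counting the pairs the other way round (Brand Y above Brand X) gives n1 × n2 - U = 34; the critical-value table is used with the smaller of the two values.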

STEP SEVEN: Use a table of critical U values for the Mann-Whitney U test. For n1 = 6 and n2 = 6 (two-tailed test):

• the critical value of U is 5 at the 0.05 significance level.

• the critical value of U is 2 at the 0.01 significance level.

STEP EIGHT: To be significant, the obtained U has to be equal to or LESS than the critical value. Our obtained U = 2.

• U = 2 is less than the critical value at the 0.05 significance level (5).

• U = 2 is equal to the critical value at the 0.01 significance level (2).

But what does this mean? We can say that there is a highly significant difference (p < .01) between the ratings given to each brand in terms of the likelihood of buying the product, with Brand Y (the larger rank total) rated higher.

Wilcoxon signed rank test

Example -

We want to know the effectiveness of a new drug designed to reduce repetitive behaviours in children with autism. A total of 8 children with autism enrol in the study, and the amount of time each child is engaged in repetitive behaviour during a three-hour observation period is measured both before treatment and again after taking the new medication for 1 week. The data are shown below.

Child | Before treatment | After 1 week of treatment
1 | 85 | 75
2 | 70 | 50
3 | 40 | 50
4 | 65 | 40
5 | 80 | 20
6 | 75 | 65
7 | 55 | 40
8 | 20 | 25

First, we compute the difference score for each child:

Child | Before treatment | After 1 week of treatment | Difference (before - after)
1 | 85 | 75 | 10
2 | 70 | 50 | 20
3 | 40 | 50 | -10
4 | 65 | 40 | 25
5 | 80 | 20 | 60
6 | 75 | 65 | 10
7 | 55 | 40 | 15
8 | 20 | 25 | -5

The next step is to rank the difference scores. Order the difference scores by absolute value, assign rank 1 to the smallest and rank n to the largest, and give tied absolute values the mean of the ranks they occupy.

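Observed differences (children 1 to 8): 10, 20, -10, 25, 60, 10, 15, -5

Difference ordered by absolute value | Absolute value | Rank
-5 | 5 | 1
10 | 10 | 3
-10 | 10 | 3
10 | 10 | 3
15 | 15 | 5
20 | 20 | 6
25 | 25 | 7
60 | 60 | 8

The three absolute differences of 10 occupy ranks 2, 3 and 4, so each receives the mean rank 3.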

The final step is to attach the sign (+ or -) of each observed difference to its rank:

Rank | Signed rank
1 | -1
3 | 3
3 | -3
3 | 3
5 | 5
6 | 6
7 | 7
8 | 8
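The ranking and signing steps can be reproduced with the Python sketch below, using the before and after times from the table. `average_ranks` is a hypothetical helper; dropping zero differences follows the usual convention for this test and has no effect here, since no child has a zero difference. Results are listed in child order rather than in order of absolute value.

```python
before = [85, 70, 40, 65, 80, 75, 55, 20]
after = [75, 50, 50, 40, 20, 65, 40, 25]


def average_ranks(values):
    """Rank from 1 (smallest) to n (largest); tied values get the mean of the ranks they span."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1         # first 1-based position of v
        last = first + ordered.count(v) - 1  # last 1-based position of v
        ranks.append((first + last) / 2)
    return ranks


# Difference scores (before - after), discarding any zero differences.
diffs = [b - a for b, a in zip(before, after) if b != a]
abs_ranks = average_ranks([abs(d) for d in diffs])
signed_ranks = [r if d > 0 else -r for d, r in zip(diffs, abs_ranks)]

print(diffs)         # [10, 20, -10, 25, 60, 10, 15, -5]
print(abs_ranks)     # [3.0, 6.0, 3.0, 7.0, 8.0, 3.0, 5.0, 1.0]
print(signed_ranks)  # [3.0, 6.0, -3.0, 7.0, 8.0, 3.0, 5.0, -1.0]
```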

The test statistic for the Wilcoxon signed rank test is W, computed from:

• W+ = sum of the positive ranks
• W- = sum of the negative ranks

If H0 is true, W+ and W- should be approximately equal. If the research hypothesis is true (the drug reduces repetitive behaviour), W+ > W-. In our example, W+ = 3 + 3 + 5 + 6 + 7 + 8 = 32 and W- = 1 + 3 = 4. As a check, the ranks always sum to n(n + 1)/2, which here is (8 × 9)/2 = 36 = 32 + 4. The test statistic is W = 4.
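The totals and the n(n + 1)/2 check are simple arithmetic; a Python sketch, taking the signed ranks as listed above and taking W as the smaller of the two totals (which equals W- here):

```python
# Signed ranks as listed on the slide (ordered by absolute difference).
signed_ranks = [-1, 3, -3, 3, 5, 6, 7, 8]
n = len(signed_ranks)

w_plus = sum(r for r in signed_ranks if r > 0)    # sum of positive ranks
w_minus = -sum(r for r in signed_ranks if r < 0)  # sum of negative ranks, as a positive number
assert w_plus + w_minus == n * (n + 1) // 2       # all ranks 1..n are accounted for
w = min(w_plus, w_minus)                          # test statistic W
print(w_plus, w_minus, w)  # 32 4 4
```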

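If the observed value of W is less than or equal to the critical value, we reject the null hypothesis; if it exceeds the critical value, we do not reject the null hypothesis. For n = 8 and a one-sided test at α = 0.05, the critical value is 5. Since W = 4 ≤ 5, we reject H0: repetitive behaviour was significantly reduced after treatment.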
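As a table-free check (a technique not used on the slides), the exact null distribution of W- can be enumerated: under H0 each of the 2^8 = 256 sign patterns on the ranks is equally likely. This sketch keeps the tied ranks of 3 as they are, so it is an exact test conditional on the ties.

```python
from itertools import product

# Absolute ranks from the slide, including the three tied ranks of 3.
ranks = [1, 3, 3, 3, 5, 6, 7, 8]
observed_w_minus = 4

# Count sign patterns whose W- is as small as, or smaller than, the observed value.
count = 0
for signs in product((1, -1), repeat=len(ranks)):
    w_minus = sum(r for r, s in zip(ranks, signs) if s < 0)
    if w_minus <= observed_w_minus:
        count += 1

p_one_sided = count / 2 ** len(ranks)
print(count, p_one_sided)  # 8 0.03125
```

The one-sided p-value, 8/256 = 0.03125, is below α = 0.05, which agrees with rejecting H0 for the one-sided research hypothesis W+ > W-.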

Friedman test

Example-

Hall et al. compared three methods of determining serum amylase values in patients with pancreatitis. The results are shown in the following table. We wish to know whether these data indicate a difference among the three methods (α = 0.05).

Serum amylase values (enzyme units per 100 ml of serum) in patients with pancreatitis, by method of determination:

Specimen | Method A | Method B | Method C
1 | 4000 | 3210 | 6120
2 | 1600 | 1040 | 2410
3 | 1600 | 647 | 2210
4 | 1200 | 570 | 2060
5 | 840 | 445 | 1400
6 | 352 | 156 | 249
7 | 224 | 155 | 224
8 | 200 | 99 | 208
9 | 184 | 70 | 227

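Hypotheses:

• H0: MA = MB = MC
• H1: at least one equality is violated.

Here b = 9 (blocks, i.e. specimens) and k = 3 (treatments, i.e. methods). After converting the original observations to ranks within each specimen (ties receive the mean rank), we have: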

Specimen | Method A | Method B | Method C
1 | 2 | 1 | 3
2 | 2 | 1 | 3
3 | 2 | 1 | 3
4 | 2 | 1 | 3
5 | 2 | 1 | 3
6 | 3 | 1 | 2
7 | 2.5 | 1 | 2.5
8 | 2 | 1 | 3
9 | 2 | 1 | 3
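A Python sketch of the within-specimen ranking, with the amylase values typed in from the table. `average_ranks` is a hypothetical helper that gives tied values (specimen 7) the mean rank.

```python
# Serum amylase values per specimen: (method A, method B, method C).
data = [
    (4000, 3210, 6120),
    (1600, 1040, 2410),
    (1600, 647, 2210),
    (1200, 570, 2060),
    (840, 445, 1400),
    (352, 156, 249),
    (224, 155, 224),
    (200, 99, 208),
    (184, 70, 227),
]


def average_ranks(values):
    """Rank from 1 (smallest) to n (largest); tied values get the mean of the ranks they span."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        ranks.append((first + last) / 2)
    return ranks


rank_rows = [average_ranks(row) for row in data]  # rank within each specimen (block)
rank_sums = [sum(col) for col in zip(*rank_rows)]  # column totals R_A, R_B, R_C
print(rank_sums)  # [19.5, 9.0, 25.5]
```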

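So RA = 19.5, RB = 9 and RC = 25.5.

With k = 3 and b = 9, the Friedman test statistic is

$$\chi_r^2 = \frac{12}{bk(k+1)} \sum_{j=1}^{k} R_j^2 - 3b(k+1) = \frac{12}{9 \times 3 \times 4}\left(19.5^2 + 9^2 + 25.5^2\right) - 3 \times 9 \times 4 = \frac{1111.5}{9} - 108 = 15.5$$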
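The formula can be evaluated directly from the rank sums. `friedman_statistic` is a hypothetical name for this sketch; the assertion checks that the rank sums total bk(k + 1)/2 = 54, as they must when every specimen contributes the ranks 1 to k.

```python
def friedman_statistic(rank_sums, b):
    """Friedman chi-square from the treatment rank sums R_j over b blocks."""
    k = len(rank_sums)
    assert sum(rank_sums) == b * k * (k + 1) / 2  # sanity check on the ranking
    sum_sq = sum(r ** 2 for r in rank_sums)
    return 12 * sum_sq / (b * k * (k + 1)) - 3 * b * (k + 1)


chi_r = friedman_statistic([19.5, 9, 25.5], b=9)
print(chi_r)  # 15.5
```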

From the chi-square table, χ²(1 - α, k - 1) with α = 0.05 and k = 3 gives χ²(0.95, 2) = 5.991. Since 15.5 > 5.991, we reject the null hypothesis.

Conclusion: there is enough evidence to support the claim that the three methods do not yield identical results.
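For k - 1 = 2 degrees of freedom the chi-square distribution has a closed-form upper tail, P(χ² > x) = e^(-x/2), so the table value and a p-value can be checked with the Python standard library alone. This shortcut holds only for 2 degrees of freedom.

```python
import math

alpha = 0.05
# Upper tail for 2 df is exp(-x / 2); the critical value solves exp(-c / 2) = alpha.
critical = -2 * math.log(alpha)
p_value = math.exp(-15.5 / 2)  # P(chi-square with 2 df > 15.5)
print(round(critical, 3))  # 5.991
print(p_value)             # roughly 0.00043
```

The p-value of roughly 0.0004 is well below 0.05, in line with rejecting H0.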

Kruskal-Wallis test

Example-

Does it make any difference to students' comprehension of statistics whether the lectures are given in English, Serbo-Croat or Cantonese?

• Group A: lectures in English
• Group B: lectures in Serbo-Croat
• Group C: lectures in Cantonese

DV (dependent variable): students' rating of the lectures' intelligibility on a 100-point scale

English (raw score) | English (rank) | Serbo-Croat (raw score) | Serbo-Croat (rank) | Cantonese (raw score) | Cantonese (rank)
20 | 3.5 | 25 | 7.5 | 19 | 1.5
27 | 9 | 33 | 10 | 20 | 3.5
19 | 1.5 | 35 | 11 | 25 | 7.5
23 | 6 | 36 | 12 | 22 | 5

Step 1: Rank the scores, ignoring which group they belong to.

• Lowest scores get the lowest ranks.
• Tied scores get the average rank.

Step 2: Find Tc, the total of the ranks for each group:

• Tc1 (English) = 20
• Tc2 (Serbo-Croat) = 40.5
• Tc3 (Cantonese) = 17.5
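Steps 1 and 2 in Python: a sketch with the raw scores typed in from the table and a hypothetical `average_ranks` helper that gives tied scores the average rank.

```python
english = [20, 27, 19, 23]
serbo_croat = [25, 33, 35, 36]
cantonese = [19, 20, 25, 22]


def average_ranks(values):
    """Rank from 1 (smallest) to n (largest); tied values get the mean of the ranks they span."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        ranks.append((first + last) / 2)
    return ranks


groups = [english, serbo_croat, cantonese]
pooled = [score for g in groups for score in g]  # ignore group membership
ranks = average_ranks(pooled)

# Split the pooled ranks back into groups and total them (Tc for each group).
totals = []
start = 0
for g in groups:
    totals.append(sum(ranks[start:start + len(g)]))
    start += len(g)
print(totals)  # [20.0, 40.5, 17.5]
```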

Step 3: Find H:

$$H = \frac{12}{N(N+1)} \sum \frac{T_c^2}{n_c} - 3(N+1)$$

where N is the total number of subjects, Tc is the rank total for each group and nc is the number of subjects in each group.

Hypotheses:

• H0: MA = MB = MC
• H1: at least one equality is violated.

Here there are k = 3 groups with nc = 4 subjects each, so N = 12.

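• ∑ Tc²/nc = 20²/4 + 40.5²/4 + 17.5²/4 = 100 + 410.0625 + 76.5625 = 586.625
• H = 12/(12 × 13) × 586.625 - 3 × 13 = 45.125 - 39 = 6.125 ≈ 6.12

Step 4: The degrees of freedom are the number of groups minus one: df = 3 - 1 = 2.

Step 5: For 2 df, a chi-square value of 5.99 has a probability p = 0.05 of occurring by chance. Our H is greater than 5.99, so it is even less likely to have occurred by chance: H = 6.12, p < 0.05.

Conclusion: the three groups differ significantly. The language in which statistics is taught does make a difference to how intelligible students find the lectures.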
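A sketch of Steps 3 to 5, using the rank totals from Step 2. The function name is hypothetical; like the slides, it omits the correction for ties and relies on the chi-square approximation, and the p-value line uses the closed-form upper tail e^(-x/2), which is valid only for 2 degrees of freedom.

```python
import math


def kruskal_wallis_h(rank_totals, group_sizes):
    """H = 12 / (N(N+1)) * sum(Tc^2 / nc) - 3(N+1), with no correction for ties."""
    n_total = sum(group_sizes)
    s = sum(t ** 2 / n for t, n in zip(rank_totals, group_sizes))
    return 12 * s / (n_total * (n_total + 1)) - 3 * (n_total + 1)


h = kruskal_wallis_h([20, 40.5, 17.5], [4, 4, 4])
# With k - 1 = 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p = math.exp(-h / 2)
print(h)            # 6.125
print(round(p, 3))  # 0.047
```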

Advantages

• Simple and easy to understand.
• Do not involve complicated sampling theory.
• No assumption is made about the form of the parent population's distribution.

Disadvantages

• Intended mainly for nominal or ordinal data; interval or ratio data are reduced to ranks.
• They use less information than parametric tests.
• They are not as efficient as parametric tests.

