Upload: yesenia-revere

Post on 14-Dec-2015


Nonparametric tests I

Back to basics

Lecture Outline

• What is a nonparametric test?

• Rank tests, distribution free tests and nonparametric tests

• Which type of test to use

MTB > dotplot 'Male' 'Female';
SUBC> same.

[Dotplots of MALE and FEMALE on a common axis, ticks at 0.32, 0.48, 0.64, 0.80, 0.96, 1.12]

MTB > desc 'Male' 'Female'

Variable    N     Mean   Median   TrMean    StDev   SEMean
MALE       50   0.5908   0.5600   0.5770   0.1979   0.0280
FEMALE     50   0.5180   0.4950   0.5102   0.1315   0.0186

Variable      Min      Max       Q1       Q3
MALE       0.2900   1.1300   0.4275   0.7150
FEMALE     0.3200   0.8500   0.4100   0.6125

Lecture Outline

• What is a nonparametric test?
– What is a parameter?
– What are examples of non-parametric tests?

• Rank tests, distribution free tests and nonparametric tests

• Which type of test to use

Parameters

• are central to inference in GLM and ANOVA

• and represent assumptions about the underlying processes

LET K1=4.7    # Group 1 mean minus grand mean
LET K2=-2.5   # Group 2 mean minus grand mean
LET K3=10.4   # The grand mean
LET K4=1.9    # Standard deviation of the error

RANDOM 30 'Error'
LET 'Y'=K3+K1*'DUM1'+K2*'DUM2'+K4*'Error'

Fitted value = μ + α

Group   α
1       α1
2       α2
3       −α1 − α2

Error has a Normal distribution with zero mean and standard deviation σ
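The Minitab commands can be mirrored in a short Python sketch. It assumes three groups of 10 observations each and effect-coded dummy columns, with (DUM1, DUM2) equal to (1, 0), (0, 1) and (-1, -1) for groups 1, 2 and 3, which is what the fitted-value table implies. The group sizes, the seed and the function names are illustrative assumptions, not part of the lecture.

```python
import random

K1 = 4.7    # group 1 mean minus grand mean
K2 = -2.5   # group 2 mean minus grand mean
K3 = 10.4   # grand mean
K4 = 1.9    # standard deviation of the error

# Effect coding (assumed): group 3 is coded -1 on both dummy variables.
DUMMIES = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}


def fitted_value(group):
    """Fitted value for a group: grand mean plus that group's deviation."""
    dum1, dum2 = DUMMIES[group]
    return K3 + K1 * dum1 + K2 * dum2


def simulate(n_per_group=10, seed=1):
    """Return (group, y) pairs with y = fitted value + K4 * standard Normal error."""
    rng = random.Random(seed)
    data = []
    for group in (1, 2, 3):
        for _ in range(n_per_group):
            error = rng.gauss(0.0, 1.0)   # plays the role of RANDOM 30 'Error'
            data.append((group, fitted_value(group) + K4 * error))
    return data


sample = simulate()
for group in (1, 2, 3):
    print(group, round(fitted_value(group), 1))
```

With these parameter values the three fitted values are 15.1, 7.9 and 8.2. Every simulated Y is fully determined by the four parameters plus Normal noise, which is the sense in which the parameters encode assumptions about the underlying process.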

Parameters

• are central to inference in GLM and ANOVA

• but represent assumptions about the underlying processes

• can be done without in some simple situations – BUT HOW?

Rnk    Wt  Sex    Rnk    Wt  Sex    Rnk    Wt  Sex    Rnk    Wt  Sex
  1  0.29    1     26  0.41    2     51  0.52    1     76  0.65    1
  2  0.32    2     27  0.42    1     52  0.52    2     77  0.66    1
  3  0.34    1     28  0.43    1     53  0.52    2     78  0.67    1
  4  0.34    2     29  0.43    2     54  0.53    2     79  0.67    2
  5  0.34    2     30  0.43    2     55  0.53    2     80  0.67    2
  6  0.36    1     31  0.45    1     56  0.55    2     81  0.67    2
  7  0.36    1     32  0.45    2     57  0.56    1     82  0.68    1
  8  0.37    1     33  0.45    2     58  0.56    1     83  0.71    1
  9  0.37    1     34  0.45    2     59  0.56    1     84  0.72    2
 10  0.37    1     35  0.46    2     60  0.57    1     85  0.73    1
 11  0.37    2     36  0.47    1     61  0.58    2     86  0.75    1
 12  0.37    2     37  0.47    1     62  0.58    2     87  0.75    1
 13  0.38    1     38  0.48    1     63  0.59    1     88  0.77    1
 14  0.38    1     39  0.48    1     64  0.59    2     89  0.78    1
 15  0.38    2     40  0.48    2     65  0.59    2     90  0.78    2
 16  0.38    2     41  0.48    2     66  0.60    1     91  0.78    2
 17  0.39    2     42  0.49    2     67  0.61    1     92  0.82    2
 18  0.40    2     43  0.49    2     68  0.61    2     93  0.83    1
 19  0.40    2     44  0.50    1     69  0.62    1     94  0.85    1
 20  0.40    2     45  0.50    1     70  0.62    1     95  0.85    2
 21  0.41    1     46  0.50    1     71  0.62    2     96  0.88    1
 22  0.41    1     47  0.50    2     72  0.62    2     97  0.98    1
 23  0.41    2     48  0.50    2     73  0.62    2     98  0.98    1
 24  0.41    2     49  0.51    1     74  0.63    1     99  1.05    1
 25  0.41    2     50  0.51    2     75  0.63    2    100  1.13    1

Remember ties

[Figure: Mean Rank]

The ‘Male’ mean rank = 55.26
The ‘Female’ mean rank = 45.74

MTB > mann-whitney male female

Mann-Whitney Test and CI: MALE, FEMALE

MALE     N = 50   Median = 0.5600
FEMALE   N = 50   Median = 0.4950
Point estimate for ETA1-ETA2 is 0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0

Sum of ranks of 2763 corresponds to a mean rank of 2763/50 = 55.26
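The rank sum can be checked by hand from the ranked weights. The sketch below is one way to do it in Python, not Minitab's own code. It takes the weights from the rank table, reading Sex = 1 as male and Sex = 2 as female (an inference from the table that is consistent with W), and gives tied weights the average of the ranks they occupy. The function name `midranks` is illustrative.

```python
# Weights copied from the rank table (Sex = 1 read as male, Sex = 2 as female).
MALE = [0.29, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37, 0.38, 0.38, 0.41,
        0.41, 0.42, 0.43, 0.45, 0.47, 0.47, 0.48, 0.48, 0.50, 0.50,
        0.50, 0.51, 0.52, 0.56, 0.56, 0.56, 0.57, 0.59, 0.60, 0.61,
        0.62, 0.62, 0.63, 0.65, 0.66, 0.67, 0.68, 0.71, 0.73, 0.75,
        0.75, 0.77, 0.78, 0.83, 0.85, 0.88, 0.98, 0.98, 1.05, 1.13]
FEMALE = [0.32, 0.34, 0.34, 0.37, 0.37, 0.38, 0.38, 0.39, 0.40, 0.40,
          0.40, 0.41, 0.41, 0.41, 0.41, 0.43, 0.43, 0.45, 0.45, 0.45,
          0.46, 0.48, 0.48, 0.49, 0.49, 0.50, 0.50, 0.51, 0.52, 0.52,
          0.53, 0.53, 0.55, 0.58, 0.58, 0.59, 0.59, 0.61, 0.62, 0.62,
          0.62, 0.63, 0.67, 0.67, 0.67, 0.72, 0.78, 0.78, 0.82, 0.85]


def midranks(values):
    """Map each distinct value to the average of the ranks it occupies."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; ties share their mean.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


rank_of = midranks(MALE + FEMALE)          # rank the pooled sample
w_male = sum(rank_of[x] for x in MALE)
w_female = sum(rank_of[x] for x in FEMALE)

print(w_male, w_male / len(MALE))          # 2763.0 55.26
print(w_female, w_female / len(FEMALE))    # 2287.0 45.74
```

The two rank sums add up to 100 × 101 / 2 = 5050, the sum of the ranks 1 to 100, so either one determines the other.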

MTB > mann-whitney male female

Mann-Whitney Test and CI: MALE, FEMALE

MALE     N = 50   Median = 0.5600
FEMALE   N = 50   Median = 0.4950
Point estimate for ETA1-ETA2 is 0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1016
The test is significant at 0.1014 (adjusted for ties)

Cannot reject at alpha = 0.05

The null hypothesis is better expressed as “the distributions of male and female weights are the same”.
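The two significance levels can be reproduced from W alone. The sketch below assumes a normal approximation to the null distribution of W with a continuity correction of 0.5. That the software uses exactly this formula is an assumption, although it does return the two printed values. The tie-group sizes in `TIES` were counted from the rank table (for example, 0.41 occurs six times), and the function name is illustrative.

```python
import math

N1, N2 = 50, 50   # males, females
W = 2763.0        # rank sum for males, from the Mann-Whitney output

# Tie groups counted from the rank table: {group size: number of such groups}.
TIES = {2: 11, 3: 7, 4: 4, 5: 3, 6: 1}


def two_sided_p(w, n1, n2, ties=None):
    """Two-sided p-value for a rank sum, normal approximation with continuity correction."""
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2                  # expected rank sum under the null
    tie_term = 0.0
    if ties:
        tie_term = sum(k * (t**3 - t) for t, k in ties.items()) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (abs(w - mean_w) - 0.5) / math.sqrt(var_w)
    return math.erfc(z / math.sqrt(2))         # equals 2 * (1 - Phi(z))


p_plain = two_sided_p(W, N1, N2)
p_ties = two_sided_p(W, N1, N2, TIES)
print(round(p_plain, 4), round(p_ties, 4))     # 0.1016 0.1014
```

Ties shrink the variance of W slightly, so the adjusted p-value is a little smaller, but with this many observations the difference is in the fourth decimal place and the conclusion at alpha = 0.05 is unchanged.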

Parameters

• are central to inference in GLM and ANOVA

• but represent assumptions about the underlying processes

• can be done without in some simple situations

Nonparametric vs Parametric

Nonparametric test      Parametric counterpart
Sign Test               One-sample t-test
Mann-Whitney Test       Two-sample t-test
Spearman Rank Test      Correlation/Regression
Kruskal-Wallis Test     One-way ANOVA
Friedman Test           One-way blocked ANOVA

Lecture Outline

• What is a nonparametric test?

• Rank tests, distribution free tests and nonparametric tests

• Which type of test to use

A rose by any other name..

• Non-parametric tests lack parameters

• Rank tests start by ranking the data

• Distribution-free tests don’t assume a Normal distribution (or any other)

These are mainly but not completely overlapping sets of tests (and some are scale-invariant too).
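A small Python sketch of what scale invariance means for a rank test, reading "scale-invariant" as "unchanged by any order-preserving change of scale" (an interpretation, not a definition given in the lecture). The five weights are taken from the rank table and the helper `ranks` is illustrative.

```python
import math

weights = [0.29, 0.56, 1.13, 0.34, 0.85]   # a few distinct weights from the data


def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties, for simplicity."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


raw_ranks = ranks(weights)
log_ranks = ranks([math.log(w) for w in weights])
print(raw_ranks, log_ranks)   # [1, 3, 5, 2, 4] [1, 3, 5, 2, 4]
```

Because a rank test only sees the ranks, it gives the same answer on the raw and the log scale, whereas a t-test does not.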

Lecture Outline

• What is a nonparametric test?

• Rank tests, distribution free tests and nonparametric tests

• Which type of test to use

Fewer assumptions but...

• still some assumptions (including independence)

• limited range of situations
– no more than 2 x-variables
– can’t mix continuous and categorical x-variables

• provide p-values but estimation is dodgy

• loss of efficiency if parametric assumptions are upheld

• there is a grand scheme for parametric statistics (GLM) but a lot of separate strange names for nonparametrics

When is there a choice?

• when there is a non-parametric test
– fewer than two or three variables altogether

• and prediction is not required

How to choose:

• If the assumptions of parametric test are upheld, use it – on grounds of efficiency

• If not upheld, consider fixing the assumptions (e.g. by transforming the data, as in the practical)

• If assumptions not fixable, use nonparametric test

MTB > dotplot 'LogM' 'LogF';
SUBC> same.

[Dotplots of LogM and LogF on a common axis, ticks at -1.25, -1.00, -0.75, -0.50, -0.25, 0.00]

MTB > desc 'LogM' 'LogF'

Variable    N      Mean    Median    TrMean    StDev   SEMean
LogM       50   -0.5786   -0.5798   -0.5850   0.3248   0.0459
LogF       50   -0.6878   -0.7032   -0.6928   0.2453   0.0347

Variable       Min       Max        Q1        Q3
LogM       -1.2379    0.1222   -0.8499   -0.3355
LogF       -1.1394   -0.1625   -0.8916   -0.4902
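The transformation step can be sketched in Python as follows. It assumes LogM and LogF are natural logs of the male and female weights, which matches the table (ln 0.29 ≈ -1.2379 and ln 1.13 ≈ 0.1222). The weights are copied from the rank table, reading Sex = 1 as male and Sex = 2 as female. The summary uses Python's `statistics` module rather than Minitab, so the trimmed mean and quartiles are left out.

```python
import math
import statistics

# Weights copied from the rank table (Sex = 1 read as male, Sex = 2 as female).
MALE = [0.29, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37, 0.38, 0.38, 0.41,
        0.41, 0.42, 0.43, 0.45, 0.47, 0.47, 0.48, 0.48, 0.50, 0.50,
        0.50, 0.51, 0.52, 0.56, 0.56, 0.56, 0.57, 0.59, 0.60, 0.61,
        0.62, 0.62, 0.63, 0.65, 0.66, 0.67, 0.68, 0.71, 0.73, 0.75,
        0.75, 0.77, 0.78, 0.83, 0.85, 0.88, 0.98, 0.98, 1.05, 1.13]
FEMALE = [0.32, 0.34, 0.34, 0.37, 0.37, 0.38, 0.38, 0.39, 0.40, 0.40,
          0.40, 0.41, 0.41, 0.41, 0.41, 0.43, 0.43, 0.45, 0.45, 0.45,
          0.46, 0.48, 0.48, 0.49, 0.49, 0.50, 0.50, 0.51, 0.52, 0.52,
          0.53, 0.53, 0.55, 0.58, 0.58, 0.59, 0.59, 0.61, 0.62, 0.62,
          0.62, 0.63, 0.67, 0.67, 0.67, 0.72, 0.78, 0.78, 0.82, 0.85]

log_m = [math.log(w) for w in MALE]      # natural log (assumed)
log_f = [math.log(w) for w in FEMALE]

# Print N, mean, median, standard deviation, minimum and maximum of the logs.
for name, logs in (("LogM", log_m), ("LogF", log_f)):
    print(name, len(logs),
          round(statistics.mean(logs), 4),
          round(statistics.median(logs), 4),
          round(statistics.stdev(logs), 4),
          round(min(logs), 4),
          round(max(logs), 4))
```

On the log scale the long upper tail of the male weights is pulled in, which is the kind of fix to the Normality assumption that a transformation is meant to provide.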

Lecture Outline

• What is a nonparametric test?

• Rank tests, distribution free tests and nonparametric tests

• Which type of test to use

Last remarks

• Nonparametric tests are an opportunity to revise the basic ideas of statistical inference

• They are sometimes useful in biology

• They are often used in biology

• NEXT WEEK: more nonparametrics, including confidence intervals and randomisation tests. READ the handout