Nonparametric Statistical Inference, Fourth Edition

Nonparametric Statistical Inference
Fourth Edition, Revised and Expanded

Jean Dickinson Gibbons
Subhabrata Chakraborti
The University of Alabama, Tuscaloosa, Alabama, U.S.A.

Marcel Dekker, Inc.  New York · Basel

Volume 168 in the series STATISTICS: Textbooks and Monographs (D. B. Owen, Founding Editor, 1972-1991)

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
ISBN: 0-8247-4052-1

This book is printed on acid-free paper.

Headquarters: Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution: Marcel Dekker AG, Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-260-6300; fax: 41-61-260-6333

World Wide Web: http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 2003 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA

To the memory of my parents, John and Alice,
and to my husband, John S. Fielden.
J.D.G.

To my parents, Himangshu and Pratima,
and to my wife, Anuradha, and son, Siddhartha Neil.
S.C.

Preface to the Fourth Edition

This book was first published in 1971 and last revised in 1992. During the span of over 30 years, it seems fair to say that the book has made a meaningful contribution to the teaching and learning of nonparametric statistics.
We have been gratified by the interest and the comments from our readers, reviewers, and users. These comments and our own experiences have resulted in many corrections, improvements, and additions.

We have two main goals in this revision: We want to bring the material covered in this book into the 21st century, and we want to make the material more user friendly.

With respect to the first goal, we have added new materials concerning the quantiles, the calculation of exact power and simulated power, sample size determination, other goodness-of-fit tests, and multiple comparisons. These additions will be discussed in more detail later. We have added and modified examples and included exact solutions done by hand and modern computer solutions using MINITAB,* SAS, STATXACT, and SPSS. We have removed most of the computer solutions to previous examples using BMDP, SPSSX, Execustat, or IMSL, because they seem redundant and take up too much valuable space. We have added a number of new references but have made no attempt to make the references comprehensive on some current minor refinements of the procedures covered. Given the sheer volume of the literature, preparing a comprehensive list of references on the subject of nonparametric statistics would truly be a challenging task. We apologize to the authors whose contributions could not be included in this edition.

With respect to our second goal, we have completely revised a number of sections and reorganized some of the materials, more fully integrated the applications with the theory, given tabular guides for applications of tests and confidence intervals, both exact and approximate, placed more emphasis on reporting results using P values, added some new problems, added many new figures and titled all figures and tables, supplied answers to almost all the problems, increased the number of numerical examples with solutions, and written concise but detailed summaries for each chapter.
We think the problem answers should be a major plus, something many readers have requested over the years. We have also tried to correct errors and inaccuracies from previous editions.

In Chapter 1, we have added Chebyshev's inequality, the Central Limit Theorem, and computer simulations, and expanded the listing of probability functions, including the multinomial distribution and the relation between the beta and gamma functions. Chapter 2 has been completely reorganized, starting with the quantile function and the empirical distribution function (edf), in an attempt to motivate the reader to see the importance of order statistics. The relation between rank and the edf is explained. The tests and confidence intervals for quantiles have been moved to Chapter 5 so that they are discussed along with other one-sample and paired-sample procedures, namely, the sign test and signed rank test for the median. New discussions of exact power, simulated power, and sample size determination, and the discussion of rank tests in Chapter 5 of the previous edition, are also included here. Chapter 4, on goodness-of-fit tests, has been expanded to include Lilliefors's test for the exponential distribution, computation of normal probability plots, and visual analysis of goodness of fit using P-P and Q-Q plots.

The new Chapter 6, on the general two-sample problem, defines "stochastically larger" and gives numerical examples with exact and computer solutions for all tests. We include sample size determination for the Mann-Whitney-Wilcoxon test. Chapters 7 and 8 are the previous-edition Chapters 8 and 9 on linear rank tests for the location and scale problems, respectively, with numerical examples for all procedures.

*MINITAB is a trademark of Minitab Inc. in the United States and other countries and is used herein with permission of the owner (on the Web at www.minitab.com).
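The relation between rank and the edf mentioned above reduces, for a sample with no tied observations, to the identity rank(X_i) = n·F_n(X_i), where F_n is the empirical distribution function. A minimal sketch in Python (our own illustration with made-up data, not an excerpt from the book):

```python
x = [2.9, 1.3, 4.7, 3.8, 0.6]   # hypothetical sample, no ties
n = len(x)

def edf(t):
    # Empirical distribution function: F_n(t) = #(observations <= t) / n
    return sum(1 for xi in x if xi <= t) / n

# Rank of each observation (1 = smallest)
ranks = [sum(1 for xj in x if xj <= xi) for xi in x]

print(ranks)                             # [3, 2, 5, 4, 1]
print([round(n * edf(xi)) for xi in x])  # [3, 2, 5, 4, 1]
```

With ties, n·F_n(X_i) counts all observations less than or equal to X_i, so it gives the largest rank within a tied group rather than the midrank.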
The method of positive variables to obtain a confidence interval estimate of the ratio of scale parameters when nothing is known about location has been added to Chapter 8, along with a much needed summary. Chapters 10 and 12, on tests for k samples, now include multiple comparisons procedures. The materials on nonparametric correlation in Chapter 11 have been expanded to include the interpretation of Kendall's tau as a coefficient of disarray, the Student's t approximation to the distribution of Spearman's rank correlation coefficient, and the definitions of Kendall's tau a, tau b, and the Goodman-Kruskal coefficient. Chapter 14, a new chapter, discusses nonparametric methods for analyzing count data. We cover analysis of contingency tables, tests for equality of proportions, Fisher's exact test, McNemar's test, and an adaptation of Wilcoxon's rank-sum test for tables with ordered categories.

Bergmann, Ludbrook, and Spooren (2000) warn of possible meaningful differences in the outcomes of P values from different statistical packages. These differences can be due to the use of exact versus asymptotic distributions, use or nonuse of a continuity correction, or use or nonuse of a correction for ties. The output seldom gives such details of calculations, and even the Help facility and the manuals do not always give a clear description or documentation of the methods used to carry out the computations. Because this warning is quite valid, we tried to explain to the best of our ability any differences between our hand calculations and the package results for each of our examples.

As we said at the beginning, it has been most gratifying to receive very positive remarks, comments, and helpful suggestions on earlier editions of this book, and we sincerely thank the many readers and colleagues who have taken the time. We would like to thank Minitab, Cytel, and Statsoft for providing complimentary copies of their software.
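As a concrete illustration of the warning about exact versus asymptotic P values, consider a two-sided sign test (covered in Chapter 5) with n = 12 nonzero sample differences, 9 of them positive. The sketch below is our own Python computation on hypothetical data, not the output of any package named in this preface; it compares the exact binomial P value with the normal approximations obtained with and without a continuity correction:

```python
from math import comb, erf, sqrt

# Two-sided sign test: n = 12 nonzero differences, k = 9 positive, H0: p = 1/2
n, k = 12, 9

# Exact P value from the binomial(n, 1/2) null distribution
exact = 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def phi(z):
    # Standard normal cumulative distribution function
    return 0.5 * (1 + erf(z / sqrt(2)))

mean, sd = n / 2, sqrt(n) / 2                  # null mean and standard deviation
p_cc = 2 * (1 - phi((k - 0.5 - mean) / sd))    # with continuity correction
p_nocc = 2 * (1 - phi((k - mean) / sd))        # without continuity correction

print(round(exact, 4), round(p_cc, 4), round(p_nocc, 4))  # 0.146 0.1489 0.0833
```

The exact P value and the continuity-corrected approximation agree to two decimal places, while the uncorrected approximation is little more than half as large; a package reporting only the last of these could lead to a different conclusion at the 0.10 level.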
The popularity of nonparametric statistics must depend, to some extent, on the availability of inexpensive and user-friendly software. Portions of MINITAB Statistical Software input and output in this book are reprinted with permission of Minitab Inc.

Many people have helped, directly and indirectly, to bring a project of this magnitude to a successful conclusion. We are thankful to the University of Alabama and to the Department of Information Systems, Statistics and Management Science for providing an environment conducive to creative work and for making some resources available. In particular, Heather Davis has provided valuable assistance with typing. We are indebted to Clifton D. Sutton of George Mason University for pointing out errors in the first printing of the third edition. These have all been corrected. We are grateful to Joseph Stubenrauch, Production Editor at Marcel Dekker, for giving us excellent editorial assistance. We also thank the reviewers of the third edition for their helpful comments and suggestions. These include Jones (1993), Prvan (1993), and Ziegel (1993). Ziegel's review in Technometrics stated, "This is the book for all statisticians and students in statistics who need to learn nonparametric statistics.... I am grateful that the author decided that one more edition could already improve a fine package." We sincerely hope that Mr. Ziegel and others will agree that this fine package has been improved in scope, readability, and usability.

Jean Dickinson Gibbons
Subhabrata Chakraborti

Preface to the Third Edition

The third edition of this book includes a large amount of additions and changes. The additions provide a broader coverage of the nonparametric theory and methods, along with the tables required to apply them in practice. The primary change in presentation is an integration of the discussion of theory, applications, and numerical examples of applications.
Thus the book has been returned to its original fourteen chapters, with illustrations of practical applications following the theory at the appropriate place within each chapter. In addition, many of the hand-calculated solutions to these examples are verified and illustrated further by showing the solutions found by using one or more of the frequently used computer packages. When the package solutions are not equivalent, which happens frequently because most of the packages use approximate sampling distributions, the reasons are discussed briefly. Two new packages have recently been developed exclusively for nonparametric methods: NONPAR (Nonparametric Statistics Package) and STATXACT (A Statistical Package for Exact Nonparametric Inference). The latter package claims to compute exact P values. We have not used them but still regard them as a welcome addition.

Additional new material is found in the problem sets at the end of each chapter. Some of the new theoretical problems request verification of results published in journals about inference procedures not covered specifically in the text. Other new problems refer to the new material included in this edition. Further, many new applied problems have been added.

The new topics that are covered extensively are as follows. In Chapter 2 we give more convenient expressions for the moments of order statistics in terms of the quantile function, introduce the empirical distribution function, and discuss both one-sample and two-sample coverages so that problems can be given relating to exceedance and precedence statistics. The rank von Neumann test for randomness is included in Chapter 3, along with applications of runs tests in analyses of time series data.
In Chapter 4 on goodness-of-fit tests, Lilliefors's test for a normal distribution with unspecified mean and variance has been added.

Chapter 7 now includes discussion of the control median test as another procedure appropriate for the general two-sample problem. The extension of the control median test to k mutually independent samples is given in Chapter 11. Other new materials in Chapter 11 are nonparametric tests for ordered alternatives appropriate for data based on k ≥ 3 mutually independent random samples. The tests proposed by Jonckheere and Terpstra are covered in detail. The problems relating to comparisons of treatments with a control or an unknown standard are also included here.

Chapter 13, on measures of association in multiple classifications, has an additional section on the Page test for ordered alternatives in k-related samples, illustration of the calculation of Kendall's tau for count data in ordered contingency tables, and calculation of Kendall's coefficient of partial correlation. Chapter 14 now includes calculations of asymptotic relative efficiency of more tests and also against more parent distributions.

For most tests covered, the corrections for ties are derived and discussions of relative performance are expanded. New tables included in the Appendix are the distributions of Lilliefors's test for normality, Kendall's partial tau, Page's test for ordered alternatives in the two-way layout, the Jonckheere-Terpstra test for ordered alternatives in the one-way layout, and the rank von Neumann test for randomness.

This edition also includes a large number of additional references. However, the list of references is not by any means purported to be complete, because the literature on nonparametric inference procedures is vast.
Therefore, we apologize to those authors whose contributions were not included in our list of references.

As always in a new edition, we have attempted to correct previous errors and inaccuracies and restate more clearly the text and problems retained from previous editions. We have also tried to take into account the valuable suggestions for improvement made by users of previous editions and reviewers of the second edition, namely, Moore (1986), Randles (1986), Sukhatme (1987), and Ziegel (1988).

As with any project of this magnitude, we are indebted to many persons for help. In particular, we would like to thank Pat Coons and Connie Harrison for typing and Nancy Kao for help in the bibliography search and computer solutions to examples. Finally, we are indebted to the University of Alabama, particularly the College of Commerce and Business Administration, for partial support during the writing of this version.

Jean Dickinson Gibbons
Subhabrata Chakraborti

Preface to the Second Edition

A large number of books on nonparametric statistics have appeared since this book was published in 1971. The majority of them are oriented toward applications of nonparametric methods and do not attempt to explain the theory behind the techniques; they are essentially users' manuals, called "cookbooks" by some. Such books serve a useful purpose in the literature because nonparametric methods have such a broad scope of application and have achieved widespread recognition as a valuable technique for analyzing data, particularly data which consist of ranks or relative preferences and/or are small samples from unknown distributions. These books are generally used by nonstatisticians, that is, persons in subject-matter fields. The more recent books that are oriented toward theory are Lehmann (1975), Randles and Wolfe (1979), and Pratt and Gibbons (1981).

A statistician needs to know about both the theory and methods of nonparametric statistical inference.
However, most graduate programs in statistics can afford to offer either a theory course or a methods course, but not both. The first edition of this book was frequently used for the theory course; consequently, the students were forced to learn applications on their own time.

This second edition not only presents the theory with corrections from the first edition, it also offers substantial practice in problem solving. Chapter 15 of this edition includes examples of applications of those techniques for which the theory has been presented in Chapters 1 to 14. Many applied problems are given in this new chapter; these problems involve real research situations from all areas of social, behavioral, and life sciences, business, engineering, and so on. The Appendix of Tables at the end of this new edition gives those tables of exact sampling distributions that are necessary for the reader to understand the examples given and to be able to work out the applied problems. To make it easy for the instructor to cover applications as soon as the relevant theory has been presented, the sections of Chapter 15 follow the order of presentation of theory. For example, after Chapter 3 on tests based on runs is completed, the next assignment can be Section 15.3 on applications of tests based on runs and the accompanying problems at the end of that section. At the end of Chapter 15 there are a large number of review problems arranged in random order as to type of application, so that the reader can obtain practice in selecting the appropriate nonparametric technique to use in a given situation.

While the first edition of this book received considerable acclaim, several reviewers felt that applied numerical examples and expanded problem sets would greatly enhance its usefulness as a textbook. This second edition incorporates these and other recommendations.
The author wishes to acknowledge her indebtedness to the following reviewers for helping to make this revised and expanded edition more accurate and useful for students and researchers: Dudewicz and Geller (1972), Johnson (1973), Klotz (1972), and Noether (1972). In addition to these persons, many users of the first edition have written or told me over the years about their likes and/or dislikes regarding the book, and these have all been gratefully received and considered for incorporation in this edition. I would also like to express my gratitude to Donald B. Owen for suggesting and encouraging this kind of revision, and to the Board of Visitors of the University of Alabama for partial support of this project.

Jean Dickinson Gibbons

Preface to the First Edition

During the last few years many institutions offering graduate programs in statistics have experienced a demand for a course devoted exclusively to a survey of nonparametric techniques and their justifications. This demand has arisen both from their own majors and from majors in social science or other quantitatively oriented fields such as psychology, sociology, or economics. Although the basic statistics courses often include a brief description of some of the better-known and simpler nonparametric methods, usually the treatment is necessarily perfunctory and perhaps even misleading. Discussion of only a few techniques in a highly condensed fashion may leave the impression that nonparametric statistics consists of a bundle of "tricks" which are simply applied by following a list of instructions dreamed up by some genie as a panacea for all sorts of vague and ill-defined problems.

One of the deterrents to meeting this demand has been the lack of a suitable textbook in nonparametric techniques. Our experience at the University of Pennsylvania has indicated that an appropriate text would provide a theoretical but readable survey.
Only a moderate amount of pure mathematical sophistication should be required so that the course would be comprehensible to a wide variety of graduate students and perhaps even some advanced undergraduates. The course should be available to anyone who has completed at least the rather traditional one-year sequence in probability and statistical inference at the level of Parzen, Mood and Graybill, Hogg and Craig, etc. The time allotment should be a full semester, or perhaps two semesters if outside reading in journal publications is desirable.

The texts presently available which are devoted exclusively to nonparametric statistics are few in number and seem to be predominantly either of the handbook style, with few or no justifications, or of the highly rigorous mathematical style. The present book is an attempt to bridge the gap between these extremes. It assumes the reader is well acquainted with statistical inference for the traditional parametric estimation and hypothesis-testing procedures, basic probability theory, and random-sampling distributions. The survey is not intended to be exhaustive, as the field is so extensive. The purpose of the book is to provide a compendium of some of the better-known nonparametric techniques for each problem situation. Those derivations, proofs, and mathematical details which are relatively easily grasped or which illustrate typical procedures in general nonparametric statistics are included. More advanced results are simply stated with references. For example, some of the asymptotic distribution theory for order statistics is derived since the methods are equally applicable to other statistical problems. However, the Glivenko-Cantelli theorem is given without proof since the mathematics may be too advanced. Generally those proofs given are not mathematically rigorous, ignoring details such as existence of derivatives or regularity conditions.
At the end of each chapter, some problems are included which are generally of a theoretical nature but on the same level as the related text material they supplement.

The organization of the material is primarily according to the type of statistical information collected and the type of questions to be answered by the inference procedures or according to the general type of mathematical derivation. For each statistic, the null distribution theory is derived, or when this would be too tedious, the procedure one could follow is outlined, or when this would be overly theoretical, the results are stated without proof. Generally the other relevant mathematical details necessary for nonparametric inference are also included. The purpose is to acquaint the reader with the mathematical logic on which a test is based, those test properties which are essential for understanding the procedures, and the basic tools necessary for comprehending the extensive literature published in the statistics journals. The book is not intended to be a user's manual for the application of nonparametric techniques. As a result, almost no numerical examples or problems are provided to illustrate applications or elicit applied motivation. With this approach, reproduction of an extensive set of tables is not required.

The reader may already be acquainted with many of the nonparametric methods. If not, the foundations obtained from this book should enable anyone to turn to a user's handbook and quickly grasp the application. Once armed with the theoretical background, the user of nonparametric methods is much less likely to apply tests indiscriminately or view the field as a collection of simple prescriptions. The only insurance against misapplication is a thorough understanding. Although some of the strengths and weaknesses of the tests covered are alluded to, no definitive judgments are attempted regarding the relative merits of comparable tests.
For each topic covered, some references are given which provide further information about the tests or are specifically related to the approach used in this book. These references are necessarily incomplete, as the literature is vast. The interested reader may consult Savage's Bibliography (1962).

I wish to acknowledge the helpful comments of the reviewers and the assistance provided unknowingly by the authors of other textbooks in the area of nonparametric statistics, particularly Gottfried E. Noether and James V. Bradley, for the approach to presentation of several topics, and Maurice G. Kendall, for much of the material on measures of association. The products of their endeavors greatly facilitated this project. It is a pleasure also to acknowledge my indebtedness to Herbert A. David, both as friend and mentor. His training and encouragement helped make this book a reality. Particular gratitude is also due to the Lecture Note Fund of the Wharton School, for typing assistance, and the Department of Statistics and Operations Research at the University of Pennsylvania for providing the opportunity and time to finish this manuscript.
Finally, I thank my husband for his enduring patience during the entire writing stage.

Jean Dickinson Gibbons

Contents

Preface to the Fourth Edition v
Preface to the Third Edition ix
Preface to the Second Edition xiii
Preface to the First Edition xv

1 Introduction and Fundamentals 1
1.1 Introduction 1
1.2 Fundamental Statistical Concepts 9

2 Order Statistics, Quantiles, and Coverages 32
2.1 Introduction 32
2.2 The Quantile Function 33
2.3 The Empirical Distribution Function 37
2.4 Statistical Properties of Order Statistics 40
2.5 Probability-Integral Transformation (PIT) 42
2.6 Joint Distribution of Order Statistics 44
2.7 Distributions of the Median and Range 50
2.8 Exact Moments of Order Statistics 53
2.9 Large-Sample Approximations to the Moments of Order Statistics 57
2.10 Asymptotic Distribution of Order Statistics 60
2.11 Tolerance Limits for Distributions and Coverages 64
2.12 Summary 69
Problems 69

3 Tests of Randomness 76
3.1 Introduction 76
3.2 Tests Based on the Total Number of Runs 78
3.3 Tests Based on the Length of the Longest Run 87
3.4 Runs Up and Down 90
3.5 A Test Based on Ranks 97
3.6 Summary 98
Problems 99

4 Tests of Goodness of Fit 103
4.1 Introduction 103
4.2 The Chi-Square Goodness-of-Fit Test 104
4.3 The Kolmogorov-Smirnov One-Sample Statistic 111
4.4 Applications of the Kolmogorov-Smirnov One-Sample Statistics 120
4.5 Lilliefors's Test for Normality 130
4.6 Lilliefors's Test for the Exponential Distribution 133
4.7 Visual Analysis of Goodness of Fit 143
4.8 Summary 147
Problems 150

5 One-Sample and Paired-Sample Procedures 156
5.1 Introduction 156
5.2 Confidence Interval for a Population Quantile 157
5.3 Hypothesis Testing for a Population Quantile 163
5.4 The Sign Test and Confidence Interval for the Median 168
5.5 Rank-Order Statistics 189
5.6 Treatment of Ties in Rank Tests 194
5.7 The Wilcoxon Signed-Rank Test and Confidence Interval 196
5.8 Summary 222
Problems 224

6 The General Two-Sample Problem 231
6.1 Introduction 231
6.2 The Wald-Wolfowitz Runs Test 235
6.3 The Kolmogorov-Smirnov Two-Sample Test 239
6.4 The Median Test 247
6.5 The Control Median Test 262
6.6 The Mann-Whitney U Test 268
6.7 Summary 279
Problems 280

7 Linear Rank Statistics and the General Two-Sample Problem 283
7.1 Introduction 283
7.2 Definition of Linear Rank Statistics 284
7.3 Distribution Properties of Linear Rank Statistics 285
7.4 Usefulness in Inference 294
Problems 295

8 Linear Rank Tests for the Location Problem 296
8.1 Introduction 296
8.2 The Wilcoxon Rank-Sum Test 298
8.3 Other Location Tests 307
8.4 Summary 314
Problems 315

9 Linear Rank Tests for the Scale Problem 319
9.1 Introduction 319
9.2 The Mood Test 323
9.3 The Freund-Ansari-Bradley-David-Barton Tests 325
9.4 The Siegel-Tukey Test 329
9.5 The Klotz Normal-Scores Test 331
9.6 The Percentile Modified Rank Tests for Scale 332
9.7 The Sukhatme Test 333
9.8 Confidence-Interval Procedures 337
9.9 Other Tests for the Scale Problem 338
9.10 Applications 341
9.11 Summary 348
Problems 350

10 Tests of the Equality of k Independent Samples 353
10.1 Introduction 353
10.2 Extension of the Median Test 355
10.3 Extension of the Control Median Test 360
10.4 The Kruskal-Wallis One-Way ANOVA Test and Multiple Comparisons 363
10.5 Other Rank-Test Statistics 373
10.6 Tests Against Ordered Alternatives 376
10.7 Comparisons with a Control 383
10.8 The Chi-Square Test for k Proportions 390
10.9 Summary 392
Problems 393

11 Measures of Association for Bivariate Samples 399
11.1 Introduction: Definition of Measures of Association in a Bivariate Population 399
11.2 Kendall's Tau Coefficient 404
11.3 Spearman's Coefficient of Rank Correlation 422
11.4 The Relations Between R and T; E(R), τ, and ρ 432
11.5 Another Measure of Association 437
11.6 Applications 438
11.7 Summary 443
Problems 445

12 Measures of Association in Multiple Classifications 450
12.1 Introduction 450
12.2 Friedman's Two-Way Analysis of Variance by Ranks in a k × n Table and Multiple Comparisons 453
12.3 Page's Test for Ordered Alternatives 463
12.4 The Coefficient of Concordance for k Sets of Rankings of n Objects 466
12.5 The Coefficient of Concordance for k Sets of Incomplete Rankings 476
12.6 Kendall's Tau Coefficient for Partial Correlation 483
12.7 Summary 486
Problems 487

13 Asymptotic Relative Efficiency 494
13.1 Introduction 494
13.2 Theoretical Bases for Calculating the ARE 498
13.3 Examples of the Calculation of Efficacy and ARE 503
13.4 Summary 518
Problems 518

14 Analysis of Count Data 520
14.1 Introduction 520
14.2 Contingency Tables 521
14.3 Some Special Results for k × 2 Contingency Tables 529
14.4 Fisher's Exact Test 532
14.5 McNemar's Test 537
14.6 Analysis of Multinomial Data 543
Problems 548

Appendix of Tables 552
Table A Normal Distribution 554
Table B Chi-Square Distribution 555
Table C Cumulative Binomial Distribution 556
Table D Total Number of Runs Distribution 568
Table E Runs Up and Down Distribution 573
Table F Kolmogorov-Smirnov One-Sample Statistic 576
Table G Binomial Distribution for θ = 0.5 577
Table H Probabilities for the Wilcoxon Signed-Rank Statistic 578
Table I Kolmogorov-Smirnov Two-Sample Statistic 581
Table J Probabilities for the Wilcoxon Rank-Sum Statistic 584
Table K Kruskal-Wallis Test Statistic 592
Table L Kendall's Tau Statistic 593
Table M Spearman's Coefficient of Rank Correlation 595
Table N Friedman's Analysis-of-Variance Statistic and Kendall's Coefficient of Concordance 598
Table O Lilliefors's Test for Normal Distribution Critical Values 599
Table P Significance Points of T_XY.Z (for Kendall's Partial Rank-Correlation Coefficient) 600
Table Q Page's L Statistic 601
Table R Critical Values and Associated Probabilities for the Jonckheere-Terpstra Test 602
Table S Rank von Neumann Statistic 607
Table T Lilliefors's Test for Exponential Distribution Critical Values 610

Answers to Selected Problems 611
References 617
Index 635

1 Introduction and Fundamentals

1.1 INTRODUCTION

In many elementary statistics courses, the subject matter is somewhat arbitrarily divided into two categories, called descriptive and inductive statistics.
Descriptive statistics usually relates only to the calculation or presentation of figures (visual or conceptual) to summarize or characterize a set of data. For such procedures, no assumptions are made or implied, and there is no question of legitimacy of techniques. The descriptive figures may be a mean, median, variance, range, histogram, etc. Each of these figures summarizes a set of numbers in its own unique way; each is a distinguishable and well-defined characterization of data. If such data constitute a random sample from a certain population, the sample represents the population in miniature and any set of descriptive statistics provides some information regarding this universe. The term parameter is generally employed to connote a characteristic of the population. A parameter is often an unspecified constant appearing in a family of probability distributions, but the word can also be interpreted in a broader sense to include almost all descriptions of population characteristics within a family.

When sample descriptions are used to infer some information about the population, the subject is called inductive statistics or statistical inference. The two types of problems most frequently encountered here are estimation and tests of hypotheses. The factor which makes inference a scientific method, thereby differentiating it from mere guessing, is the ability to make evaluations or probability statements concerning the accuracy of an estimate or reliability of a decision. Unfortunately, such scientific evaluations cannot be made without some information regarding the probability distribution of the random variable relating to the sample description used in the inference procedure. This means that certain types of sample descriptions will be more popular than others, because of their distribution properties or mathematical tractability. The sample arithmetic mean is a popular figure for describing the characteristic of central tendency for many reasons but perhaps least of all because it is a mean.
The unique position of the mean in inference stems largely from its "almost normal" distribution properties. If some other measure, say the sample median, had a property as useful as the central-limit theorem, surely it would share the spotlight as a favorite description of location.

The entire body of classical statistical-inference techniques is based on fairly specific assumptions regarding the nature of the underlying population distribution; usually its form and some parameter values must be stated. Given the right set of assumptions, certain test statistics can be developed using mathematics which is frequently elegant and beautiful. The derived distribution theory is qualified by certain prerequisite conditions, and therefore all conclusions reached using these techniques are exactly valid only so long as the assumptions themselves can be substantiated. In textbook problems, the requisite postulates are frequently just stated and the student practices applying the appropriate technique. However, in a real-world problem, everything does not come packaged with labels of population of origin. A decision must be made as to what population properties may judiciously be assumed for the model. If the reasonable assumptions are not such that the traditional techniques are applicable, the classical methods may be used and inference conclusions stated only with the appropriate qualifiers, e.g., "If the population is normal, then. . . ."

The mathematical statistician may claim that it is the users' problem to decide on the legitimacy of the postulates. Frequently in practice, those assumptions which are deemed reasonable by empirical evidence or past experience are not the desired ones, i.e., those for which a set of standard statistical techniques has been developed. Alternatively, the sample may be too small or previous experience too limited to determine what is a reasonable assumption.
Or, if the researcher is a product of the "cookbook school" of statistics, his particular expertise being in the area of application, he may not understand or even be aware of the preconditions implicit in the derivation of the statistical technique. In any of these three situations, the result often is a substitution of blind faith for scientific method, either because of ignorance or with the rationalization that an approximately accurate inference based on recognized and accepted scientific techniques is better than no answer at all or a conclusion based on common sense or intuition.

An alternative set of techniques is available, and the mathematical bases for these procedures are the subject of this book. They may be classified as distribution-free and nonparametric procedures. In a distribution-free inference, whether for testing or estimation, the methods are based on functions of the sample observations whose corresponding random variable has a distribution which does not depend on the specific distribution function of the population from which the sample was drawn. Therefore, assumptions regarding the underlying population are not necessary. On the other hand, strictly speaking, the term nonparametric test implies a test for a hypothesis which is not a statement about parameter values. The type of statement permissible then depends on the definition accepted for the term parameter. If parameter is interpreted in the broader sense, the hypothesis can be concerned only with the form of the population, as in goodness-of-fit tests, or with some characteristic of the probability distribution of the sample data, as in tests of randomness and trend. Needless to say, distribution-free tests and nonparametric tests are not synonymous labels or even in the same spirit, since one relates to the distribution of the test statistic and the other to the type of hypothesis to be tested.
A distribution-free test may be for a hypothesis concerning the median, which is certainly a population parameter within our broad definition of the term.

In spite of the inconsistency in nomenclature, we shall follow the customary practice and consider both types of tests as procedures in nonparametric inference, making no distinction between the two classifications. For the purpose of differentiation, the classical statistical techniques, whose justification in probability is based on specific assumptions about the population sampled, may be called parametric methods. This implies a definition of nonparametric statistics then as the treatment of either nonparametric types of inferences or analogies to standard statistical problems when specific distribution assumptions are replaced by very general assumptions and the analysis is based on some function of the sample observations whose sampling distribution can be determined without knowledge of the specific distribution function of the underlying population. The assumption most frequently required is simply that the population be continuous. More restrictive assumptions are sometimes made, e.g., that the population is symmetrical, but not to the extent that the distribution is specifically postulated. The information used in making nonparametric inferences generally relates to some function of the actual magnitudes of the random variables in the sample. For example, if the actual observations are replaced by their relative rankings within the sample and the probability distribution of some function of these sample ranks can be determined by postulating only very general assumptions about the basic population sampled, this function will provide a distribution-free technique for estimation or hypothesis testing. Inferences based on descriptions of these derived sample data may relate to whatever parameters are relevant and adaptable, such as the median for a location parameter.
The nonparametric and parametric hypotheses are analogous, both relating to location, and identical in the case of a continuous and symmetrical population.

Tests of hypotheses which are not statements about parameter values have no counterpart in parametric statistics, and thus here nonparametric statistics provides techniques for solving new kinds of problems. On the other hand, a distribution-free test simply relates to a different approach to solving standard statistical problems, and therefore comparisons of the merits of the two types of techniques are relevant. Some of the more obvious general advantages of nonparametric-inference procedures can be appreciated even before our systematic study begins. Nonparametric methods generally are quick and easy to apply, since they involve extremely simple arithmetic. The theory of nonparametric inference relates to properties of the statistic used in the inductive procedure. Discussion of these properties requires derivation of the random sampling distribution of the pertinent statistic, but this generally involves much less sophisticated mathematics than classical statistics. The test statistic in most cases is a discrete random variable with nonzero probabilities assigned to only a finite number of values, and its exact sampling distribution can often be determined by enumeration or simple combinatorial formulas. The asymptotic distributions are usually normal, chi-square, or other well-known functions. The derivations are easier to understand, especially for non-mathematically trained users of statistics. A cookbook approach to learning techniques is then not necessary, which lessens the danger of misuse of procedures. This advantage also minimizes the opportunities for inappropriate and indiscriminate applications, because the assumptions are so general.
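For example, the sign test (treated in Chapter 5) counts the number of positive signs among n observations; under the null hypothesis each sign is positive with probability 1/2, so the exact null distribution is Binomial(n, 1/2) and is obtained by direct combinatorial enumeration. A minimal sketch, with function names of our own choosing:

```python
from math import comb

def sign_test_null_dist(n):
    """Exact null distribution of the sign test statistic K:
    P(K = k) = C(n, k) / 2**n for k = 0, 1, ..., n."""
    return [comb(n, k) / 2 ** n for k in range(n + 1)]

def upper_tail_p(n, k_obs):
    """Exact upper-tail p-value P(K >= k_obs) under the null hypothesis."""
    dist = sign_test_null_dist(n)
    return sum(dist[k_obs:])

# With n = 10 observations, 9 of them with positive sign:
print(upper_tail_p(10, 9))  # (10 + 1) / 1024 = 0.0107421875
```

The p-value here requires no assumption about the population beyond continuity, which is exactly the point of the exact-enumeration approach.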
When no stringent postulations regarding the basic population are needed, there is little problem of violation of assumptions, with the result that conclusions reached in nonparametric methods usually need not be tempered by many qualifiers. The types of assumptions made in nonparametric statistics are generally easily satisfied, and decisions regarding their legitimacy almost obvious. Besides, in many cases the assumptions are sufficient, but not necessary, for the test's validity. Assumptions regarding the sampling process, usually that it is a random sample, are not relaxed with nonparametric methods, but a careful experimenter can generally adopt sampling techniques which render this problem academic. With so-called "dirty data," most nonparametric techniques are, relatively speaking, much more appropriate than parametric methods. The basic data available need not be actual measurements in many cases; if the test is to be based on ranks, for example, only the ranks are needed. The process of collecting and compiling sample data then may be less expensive and time consuming. Some new types of problems relating to sample-distribution characteristics are soluble with nonparametric tests. The scope of application is also wider because the techniques may be legitimately applied to phenomena for which it is impractical or impossible to obtain quantitative measurements. When information about actual observed sample magnitudes is provided but not used as such in drawing an inference, it might seem that some of the available information is being discarded, for which one usually pays a price in efficiency. This is really not true, however. The information embodied in these actual magnitudes, which is not directly employed in the inference procedure, really relates to the underlying distribution, information which is not relevant for distribution-free tests.
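Replacing observations by their within-sample ranks, as mentioned above, is a purely mechanical step. A minimal sketch (assuming no ties, consistent with sampling from a continuous population; the function name is ours):

```python
def ranks(sample):
    """Replace observations by their relative ranks within the sample,
    rank 1 for the smallest value (assumes no tied observations)."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    r = [0] * len(sample)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

print(ranks([2.3, 0.7, 5.1, 1.4]))  # [3, 1, 4, 2]
```

Any rank-based procedure can then work from this derived sample alone, discarding the raw magnitudes.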
On the other hand, if the underlying distribution is known, a classical approach to testing may legitimately be used and so this would not be a situation requiring nonparametric methods. The information of course may be consciously ignored, say for the purpose of speed or simplicity.

This discussion of relative merits has so far been concerned mainly with the application of nonparametric techniques. Performance is certainly a matter of concern to the experimenter, but generalizations about reliability are always difficult because of varying factors like sample size, significance levels or confidence coefficients, evaluation of the importance of speed, simplicity, and cost factors, and the nonexistence of a fixed and universally acceptable criterion of good performance. Box and Anderson (1955) state that to fulfill the needs of the experimenter, statistical criteria should (1) be sensitive to change in the specific factors tested, (2) be insensitive to changes, of a magnitude likely to occur in practice, in extraneous factors. These properties, usually called power and robustness, respectively, are generally agreed upon as the primary requirements of good performance in hypothesis testing. Parametric tests are often derived in such a way that the first requirement is satisfied for an assumed specific probability distribution, e.g., using the likelihood-ratio technique of test construction. However, since such tests are, strictly speaking, not even valid unless the assumptions are met, robustness is of great concern in parametric statistics. On the other hand, nonparametric tests are inherently robust because their construction requires only very general assumptions. One would expect some sacrifice in power to result. It is therefore natural to look at robustness as a performance criterion for parametric tests and power for nonparametric tests.
How then do we compare analogous tests of the two types?

Power calculations for any test require knowledge of the probability distribution of the test statistic under the alternative, but the alternatives in nonparametric problems are often extremely general. When the requisite assumptions are met, many of the classical parametric tests are known to be most powerful. In those cases where comparison studies have been made, however, nonparametric tests are frequently almost as powerful, especially for small samples, and therefore may be considered more desirable whenever there is any doubt about assumptions. No generalizations can be made for moderate-sized samples. The criterion of asymptotic relative efficiency is theoretically relevant only for very large samples. When the classical tests are known to be robust, comparisons may also be desirable for distributions which deviate somewhat from the exact parametric assumptions. However, with inexact assumptions, calculation of power of classical tests is often difficult except by Monte Carlo techniques, and studies of power here have been less extensive. Either type of test may be more reliable, depending on the particular tests compared and type or degree of deviations assumed. The difficulty with all these comparisons is that they can be made only for specific nonnull distribution assumptions, which are closely related to the conditions under which the parametric test is exactly valid and optimal.

Perhaps the chief advantage of nonparametric tests lies in their very generality, and an assessment of their performance under conditions unrestricted by, and different from, the intrinsic postulates in classical tests seems more expedient. A comparison under more nonparametric conditions would seem especially desirable for two or more nonparametric tests which are designed for the same general hypothesis testing situation.
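As a small illustration of how power is tied to one specific nonnull assumption, the sketch below estimates, by Monte Carlo, the power of the one-sided sign test against a positive median shift when sampling from a standard Cauchy distribution. The design, the Cauchy alternative, and all names are our own choices for illustration:

```python
import math
import random

def sign_test_critical_k(n, alpha):
    """Smallest k with P(K >= k) <= alpha when K ~ Binomial(n, 1/2)."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += math.comb(n, k) / 2 ** n
        if tail > alpha:
            return k + 1
    return 0

def mc_sign_test_power(n, shift, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo power of the one-sided sign test against a positive
    median shift, sampling from a shifted standard Cauchy distribution."""
    rng = random.Random(seed)
    k_crit = sign_test_critical_k(n, alpha)
    rejections = 0
    for _ in range(reps):
        # tan(pi * (U - 1/2)) is a standard Cauchy variate for U ~ Uniform(0, 1)
        positives = sum(
            1 for _ in range(n)
            if math.tan(math.pi * (rng.random() - 0.5)) + shift > 0
        )
        if positives >= k_crit:
            rejections += 1
    return rejections / reps

print(mc_sign_test_power(20, 1.0))  # close to the exact power, about 0.62
```

Under this alternative the number of positive signs is Binomial(n, p) with p = 1/2 + arctan(shift)/π, so the simulation can be checked against an exact tail probability: for n = 20 and shift = 1, p = 0.75 and the exact power is P(K ≥ 15) ≈ 0.62. The point remains that such a figure holds only for this one distributional assumption.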
Unlike the body of classical techniques, nonparametric techniques frequently offer a selection from interchangeable methods. With such a choice, some judgments of relative merit would be particularly useful. Power comparisons have been made, predominantly among the many tests designed to detect location differences, but again we must add that even with comparisons of nonparametric tests, power can be determined only with fairly specific distribution assumptions. The relative merits of the different tests depend on the conditions imposed. Comprehensive conclusions are thus still impossible for blanket comparisons of very general tests.

In conclusion, the extreme generality of nonparametric techniques and their wide scope of usefulness, while definite advantages in application, are factors which discourage objective criteria, particularly power, as assessments of performance, relative either to each other or to parametric techniques. The comparison studies so frequently published in the literature are certainly interesting, informative, and valuable, but they do not provide the sought-for comprehensive answers under more nonparametric conditions. Perhaps we can even say that specific comparisons are really contrary to the spirit of nonparametric methods. No definitive rules of choice will be provided in this book. The interested reader will find many pertinent articles in all the statistics journals. This book is a compendium of many of the large number of nonparametric techniques which have been proposed for various inference situations.

Before embarking on a systematic treatment of new concepts, some basic notation and definitions must be agreed upon and the groundwork prepared for development. Therefore, the remainder of this chapter will be devoted to an explanation of the notation adopted here and an abbreviated review of some of those definitions and terms from classical inference which are also relevant to the special world of nonparametric inference.
A few new concepts and terms will also be introduced which are uniquely useful in nonparametric theory. The general theory of order statistics will be the subject of Chapter 2, since they play a fundamental role in many nonparametric techniques. Quantiles, coverages, and tolerance limits are also introduced here. Starting with Chapter 3, the important nonparametric techniques will be discussed in turn, organized according to the type of inference problem (hypothesis to be tested) in the case of hypotheses not involving statements about parameters, or the type of sampling situation (one sample, two independent samples, etc.) in the case of distribution-free techniques, or whichever seems more pertinent. Chapters 3 and 4 will treat tests of randomness and goodness-of-fit tests, respectively, both nonparametric hypotheses which have no counterpart in classical statistics. Chapter 5 covers distribution-free tests of hypotheses and confidence interval estimates of the value of a population quantile in the case of one sample or paired samples. These procedures are based on order statistics, signs, and signed ranks. When the relevant quantile is the median, these procedures relate to the value of a location parameter and are analogies to the one-sample (paired-sample) tests for the population mean (mean difference) in classical statistics. Rank-order statistics are also introduced here, and we investigate the relationship between ranks and variate values. Chapter 6 introduces the two-sample problem and covers some distribution-free tests for the hypothesis of identical distributions against general alternatives. Chapter 7 is an introduction to a particular form of nonparametric test statistic, called a linear rank statistic, which is especially useful for testing a hypothesis that two independent samples are drawn from identical populations.
Those linear rank statistics which are particularly sensitive to differences only in location and only in scale are the subjects of Chapters 8 and 9, respectively. Chapter 10 extends this situation to the hypothesis that k independent samples are drawn from identical populations. Chapters 11 and 12 are concerned with measures of association and tests of independence in bivariate and multivariate sample situations, respectively. For almost all tests the discussion will center on logical justification, null distribution and moments of the test statistic, asymptotic distribution, and other relevant distribution properties. Whenever possible, related methods of interval estimation of parameters are also included. During the course of discussion, only the briefest attention will be paid to relative merits of comparable tests. Chapter 13 presents some theorems relating to calculation of asymptotic relative efficiency, a possible criterion for evaluating large sample performance of nonparametric tests relative to each other or to parametric tests when certain assumptions are met. These techniques are then used to evaluate the efficiency of some of the tests covered earlier. Chapter 14 covers some special tests based on count data.

Numerical examples of applications of the most commonly used nonparametric test and estimation procedures are included after the explanation of the theory. These illustrations of the techniques will serve to solidify the reader's understanding of proper uses of nonparametric methods. All of the solutions show the calculations clearly. In addition, many of the solutions are then repeated using one or more statistical computer packages.

Problems are given at the end of each chapter. The theoretical problems serve to amplify or solidify the explanations of theory given in the text. The applied problems give the reader practice in applications of the methods.
Answers to selected problems are given at the end of the book.

1.2 FUNDAMENTAL STATISTICAL CONCEPTS

In this section a few of the basic definitions and concepts of classical statistics are reviewed, but only very briefly, since the main purpose is to explain notation and terms taken for granted later on. A few of the new fundamentals needed for the development of nonparametric inference will also be introduced here.

BASIC DEFINITIONS

A sample space is the set of all possible outcomes of a random experiment.

A random variable is a set function whose domain is the elements of a sample space on which a probability function has been defined and whose range is the set of all real numbers. Alternatively, $X$ is a random variable if for every real number $x$ there exists a probability that the value assumed by the random variable does not exceed $x$, denoted by $P(X \leq x)$ or $F_X(x)$, and called the cumulative distribution function (cdf) of $X$.

The customary practice is to denote the random variable by a capital letter like $X$ and the actual value assumed (value observed in the experiment) by the corresponding letter in lowercase, $x$. This practice will generally be adhered to in this book. However, it is not always possible, strictly speaking, to make such a distinction. Occasional inconsistencies will therefore be unavoidable, but the statistically sophisticated reader is no doubt already accustomed to this type of conventional confusion in notation.

The mathematical properties of any function $F_X$ which is a cdf of a random variable $X$ are as follows:

1. $F_X(x_1) \leq F_X(x_2)$ for all $x_1 \leq x_2$, so that $F_X$ is nondecreasing.
2. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.
3. $F_X(x)$ is continuous from the right, or, symbolically, as $\varepsilon \to 0$ through positive values, $\lim_{\varepsilon \to 0} F_X(x + \varepsilon) = F_X(x)$.

A random variable $X$ is called continuous if its cdf is continuous. Every continuous cdf in this book will be assumed differentiable everywhere with the possible exception of a finite number of points.
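As a concrete check on the three cdf properties (this example is not from the text), the short sketch below verifies them numerically for the standard exponential cdf, $F(x) = 1 - e^{-x}$ for $x \geq 0$:

```python
import math

def exp_cdf(x):
    """cdf of the standard exponential distribution: F(x) = 1 - e^{-x} for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

# Property 1: nondecreasing on a grid of points
grid = [i / 10 for i in range(-20, 101)]
assert all(exp_cdf(a) <= exp_cdf(b) for a, b in zip(grid, grid[1:]))

# Property 2: limits at minus and plus infinity
assert exp_cdf(-50) == 0.0
assert abs(exp_cdf(50) - 1.0) < 1e-12

# Property 3: right-continuity (approach x = 0 from above)
assert abs(exp_cdf(1e-9) - exp_cdf(0.0)) < 1e-8

print("all three cdf properties hold for the exponential example")
```

Any valid cdf, continuous or discrete, must pass checks of this form; only the third property distinguishes a cdf with jumps from a continuous one.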
The derivative of the cdf will be denoted by $f_X(x)$, a nonnegative function called the probability density function (pdf) of $X$. Thus when $X$ is continuous,

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt \qquad f_X(x) = \frac{d}{dx}F_X(x) = F_X'(x) \geq 0$$

and

$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$

A random variable is called discrete if it can take on only a finite or a countably infinite number of values, called mass points. The probability mass function (pmf) of a discrete random variable $X$ is defined as

$$f_X(x) = P(X = x) = F_X(x) - \lim_{\varepsilon \to 0} F_X(x - \varepsilon)$$

where $\varepsilon \to 0$ through positive values. For a discrete random variable, $f_X(x) \geq 0$ and $\sum_{\text{all } x} f_X(x) = 1$, where the expression ``all $x$'' is to be interpreted as meaning all $x$ at which $F_X(x)$ is not continuous; in other words, the summation is over all the mass points. Thus for a discrete random variable there is a nonzero probability for any mass point, whereas the probability that a continuous random variable takes on any specific fixed value is zero.

The term probability function (pf) or probability distribution will be used to denote either a pdf or a pmf. For notation, capital letters will always be reserved for the cdf, while the corresponding lowercase letter denotes the pf.

The expected value of a function $g(X)$ of a random variable $X$, denoted by $E[g(X)]$, is

$$E[g(X)] = \begin{cases} \displaystyle\int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if } X \text{ is continuous} \\[2ex] \displaystyle\sum_{\text{all } x} g(x) f_X(x) & \text{if } X \text{ is discrete} \end{cases}$$
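The two branches of the expected-value definition can be evaluated directly: the discrete case by summing over mass points, the continuous case by numerical quadrature. The sketch below is an illustration, not from the text; the fair-die pmf and the standard exponential pdf are my own examples.

```python
import math

def expect_discrete(g, pmf):
    """E[g(X)] = sum of g(x) * f_X(x) over the mass points."""
    return sum(g(x) * p for x, p in pmf.items())

def expect_continuous(g, pdf, lo, hi, n=100_000):
    """E[g(X)] ~= integral of g(x) * f_X(x) dx, via the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    ys = [g(lo + i * h) * pdf(lo + i * h) for i in range(n + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# Discrete: a fair die, f_X(x) = 1/6 for x = 1, ..., 6; E[X] = 3.5
die = {x: 1 / 6 for x in range(1, 7)}
print(round(expect_discrete(lambda x: x, die), 9))  # 3.5

# Continuous: standard exponential, f_X(x) = e^{-x}; E[X] = 1, E[X^2] = 2
exp_pdf = lambda x: math.exp(-x)
print(round(expect_continuous(lambda x: x, exp_pdf, 0, 50), 4))      # 1.0
print(round(expect_continuous(lambda x: x * x, exp_pdf, 0, 50), 4))  # 2.0
```

The upper limit 50 truncates the infinite integration range; the truncated tail contributes only on the order of $e^{-50}$, far below the quadrature error.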

Joint probability functions and expectations for functions of more than one random variable are similarly defined and denoted by replacing single symbols by vectors, sometimes abbreviated to

$$X^n = (X_1, X_2, \ldots, X_n)$$

A set of $n$ random variables $(X_1, X_2, \ldots, X_n)$ is independent if and only if their joint probability function equals the product of the $n$ individual marginal probability functions.

A set of $n$ random variables $(X_1, X_2, \ldots, X_n)$ is called a random sample of the random variable $X$ (or from the population $F_X$ or $f_X$) if they are independent and identically distributed (i.i.d.), so that their joint probability density function is given by

$$f_{X^n}(x_1, x_2, \ldots, x_n) = f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i)$$

A statistic is any function of observable or sample random variables which does not involve unknown parameters.

A moment is a particular type of population parameter. The $k$th moment of $X$ about the origin is $\mu_k' = E(X^k)$, where $\mu_1' = E(X) = \mu$ is the mean, and the $k$th central moment about the mean is

$$\mu_k = E(X - \mu)^k$$

The second central moment about the mean, $\mu_2$, is the variance of $X$,

$$\mu_2 = \mathrm{var}(X) = \sigma^2(X) = E(X^2) - \mu^2 = \mu_2' - (\mu_1')^2$$

The $k$th factorial moment is $E[X(X-1)\cdots(X-k+1)]$.

For two random variables, their covariance and correlation, respectively, are

$$\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y$$

$$\mathrm{corr}(X, Y) = \rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma(X)\,\sigma(Y)}$$

The moment-generating function (mgf) of a function $g(X)$ of $X$ is

$$M_{g(X)}(t) = E\{\exp[t\,g(X)]\}$$

Some special properties of the mgf are

$$M_{aX+b}(t) = e^{bt} M_X(at) \quad \text{for } a \text{ and } b \text{ constant}$$

$$\mu_k' = \left.\frac{d^k}{dt^k} M_X(t)\right|_{t=0} = M_X^{(k)}(0)$$

MOMENTS OF LINEAR COMBINATIONS OF RANDOM VARIABLES

Let $X_1, X_2, \ldots, X_n$ be $n$ random variables and $a_i, b_i,\ i = 1, 2, \ldots, n$, be any constants. Then

$$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i)$$

$$\mathrm{var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\,\mathrm{var}(X_i) + 2 \sum_{1 \leq i < j \leq n} a_i a_j\,\mathrm{cov}(X_i, X_j)$$
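On a small finite joint distribution the covariance, correlation, and linear-combination variance identity can all be verified exactly. The sketch below uses an invented two-point joint pmf, not an example from the text.

```python
import math

# joint[(x1, x2)] = P(X1 = x1, X2 = x2); the probabilities sum to 1.
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def E(g):
    """Expected value of g(X1, X2) under the joint pmf: sum of g * probability."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

mean1, mean2 = E(lambda x, y: x), E(lambda x, y: y)
var1 = E(lambda x, y: (x - mean1) ** 2)
var2 = E(lambda x, y: (y - mean2) ** 2)
cov12 = E(lambda x, y: (x - mean1) * (y - mean2))      # E[(X1 - mu1)(X2 - mu2)]
corr12 = cov12 / math.sqrt(var1 * var2)                # cov / (sigma1 * sigma2)

# var(a1*X1 + a2*X2) computed directly from the definition ...
a1, a2 = 2.0, -3.0
lhs = E(lambda x, y: (a1 * x + a2 * y - (a1 * mean1 + a2 * mean2)) ** 2)
# ... must equal a1^2 var(X1) + a2^2 var(X2) + 2 a1 a2 cov(X1, X2)
rhs = a1 ** 2 * var1 + a2 ** 2 * var2 + 2 * a1 * a2 * cov12
assert abs(lhs - rhs) < 1e-9

print(round(cov12, 4), round(corr12, 4))  # 0.05 0.2182
```

Here $\mu_1 = 0.5$, $\mu_2 = 0.7$, so $\mathrm{cov} = E(X_1 X_2) - \mu_1\mu_2 = 0.4 - 0.35 = 0.05$, agreeing with the computed value.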