An Introduction to Medical Statistics by Martin Bland
Authors: Bland, Martin
Title: Introduction to Medical Statistics, An, 3rd Edition
Copyright © 2000 Oxford University Press
Author
Martin Bland
Professor of Medical Statistics
St George's Hospital Medical School, London
Dedication
To the memory of Ernest and Phyllis Bland, my parents
Preface to the Third Edition
In preparing this third edition of An Introduction to Medical Statistics, I have taken the opportunity to correct a number of mistakes and typographical errors, and to change some of the examples and add a few more. I have extended the treatment of several topics and introduced some new ones, previously omitted through lack of space or energy, or because they were then rarely seen in the medical literature. In one case, number needed to treat, the concept had not even been invented when the second edition was written. Other new topics include consent in clinical trials, design and analysis of cluster-randomized trials, ecological studies, conditional probability, repeated testing, random effects models, intraclass correlation, and conditional odds ratios. Thanks to the wonders of computerized typesetting, I have managed to extend the contents of the book with a very small increase in the number of pages.
This book is for medical students, doctors, medical researchers, nurses, members of professions allied to medicine, and all others concerned with medical data. The range of statistical methods used in the medical and health care literature, and hence described in this book, continues to grow, but the time available in the undergraduate curriculum does not. Some of the topics covered here are beyond the needs of many students, so I have indicated by an asterisk sections which would not usually be included in first courses. These are intended for postgraduate students and medical researchers.
This third edition is being published with a companion volume, Statistical Questions in Evidence-based Medicine (Bland and Peacock 2000). This book of questions and answers includes no calculations and is complementary to the exercises given here. In the solutions given we make many references to An Introduction to Medical Statistics. Because we wanted Statistical Questions in Evidence-based Medicine to be usable with the second edition of An Introduction to Medical Statistics (Bland 1995), I have kept the same order and numbering of the sections in the third edition. New material has all been added at the ends of the chapters. If the structure sometimes seems a little unwieldy, that is why.
This is a book about data, not statistical theory. The fundamental concepts of study design, data collection and data analysis are explained by illustration and example. Only enough mathematics and formulae are given to make clear what is going on. For those who wish to go a little further in their understanding, some of the more mathematical background to the techniques described is given as appendices to the chapters rather than in the main text.
The material covered includes all the statistical work that would be required for a course in medicine and for the examinations of most of the royal colleges. It includes the design of clinical trials and epidemiological studies, data collection, summarizing and presenting data, probability, the Binomial, Normal, Poisson, t and Chi-squared distributions, standard errors, confidence intervals, tests of significance, large sample and small sample comparisons of means, the use of transformations, regression and correlation, methods based on ranks, contingency tables, odds ratios, measurement error, reference ranges, mortality data, vital statistics, analysis of variance, multiple and logistic regression, survival analysis, sample size estimation, and the choice of the statistical method.
The book is firmly grounded in medical data, particularly in medical research, and the interpretation of the results of calculations in their medical context is emphasized. Except for a few obviously invented numbers used to illustrate the mechanics of calculations, all the data in the examples and exercises are real, from my own research and statistical consultation or from the medical literature.
There are two kinds of exercise in this book. Each chapter has a set of multiple choice questions of the ‘true or false’ type, 100 in all. Multiple choice questions can cover a large amount of material in a short time, so are a useful tool for revision. As MCQs are widely used in postgraduate examinations, these exercises should also be useful to those preparing for memberships. All the MCQs have solutions, with reference to an appropriate part of the text or a detailed explanation for most of the answers. Each chapter also has one long exercise. Although these usually involve calculation, I have tried to avoid merely slotting figures into formulae. These exercises include not only the application of statistical techniques, but also the interpretation of the results in the light of the source of the data.
I wish to thank many people who have contributed to the writing of this book. First, there are the many medical students, doctors, research workers, nurses, physiotherapists, and radiographers whom it has been my pleasure to teach, and from whom I have learned so much. Second, the book contains many examples drawn from research carried out with other statisticians, epidemiologists, and social scientists, particularly Douglas Altman, Ross Anderson, Mike Banks, Barbara Butland, Beulah Bewley, and Walter Holland. These studies could not have been done without the assistance of Patsy Bailey, Bob Harris, Rebecca McNair, Janet Peacock, Swatee Patel, and Virginia Pollard. Third, the clinicians and scientists with whom I have collaborated or who have come to me for statistical advice not only taught me about medical data but many of them have left me with data which are used here, including Naib Al-Saady, Thomas Bewley, Frances Boa, Nigel Brown, Jan Davies, Peter Fish, Caroline Flint, Nick Hall, Tessi Hanid, Michael Hutt, Riahd Jasrawi, Ian Johnston, Moses Kipembwa, Pam Luthra, Hugh Mather, Daram Maugdal, Douglas Maxwell, Charles Mutoka, Tim Northfield, Andreas Papadopoulos, Mohammed Raja, Paul Richardson, and Alberto Smith. I am particularly indebted to John Morgan, as Chapter 16 is partly based on his work.
The original manuscript was typed by Sue Nash, Sue Fisher, Susan Harding, Sheilah Skipp, and myself. This edition has been set by me using LaTeX, so any errors which remain are definitely my own. All the graphs have been drawn using Stata except for the pie charts, done using Harvard Graphics.
I thank Douglas Altman, David Jones, Robin Prescott, Klim McPherson, Janet Peacock, and Stuart Pocock for their helpful comments on earlier drafts. I have corrected a number of errors from the first and second editions, and I am grateful to colleagues who have pointed them out to me, in particular to Daniel Heitjan. I am very grateful to Janet Peacock, who proof-read this edition. Special thanks are due to my head of department, Ross Anderson, for all his support, and to the staff of Oxford University Press. Most of all I thank my wife, Pauline Bland, for her unfailing confidence and encouragement, and my children, Emily and Nicholas Bland, for keeping my feet firmly on the ground.
M.B.
London, March 2000
Sections marked * contain material usually found only in postgraduate courses
1
Introduction
1.1 Statistics and medicine
Evidence-based practice is the new watchword in every profession concerned with the treatment and prevention of disease and promotion of health and well-being. This requires both the gathering of evidence and its critical interpretation. The former is bringing more people into the practice of research, and the latter is requiring of all health professionals the ability to evaluate the research carried out. Much of this evidence is in the form of numerical data. The essential skill required for the collection, analysis, and evaluation of numerical data is statistics. Thus Statistics, the science of assembling and interpreting numerical data, is the core science of evidence-based practice.
In the past forty years medical research has become deeply involved with the techniques of statistical inference. The work published in medical journals is full of statistical jargon and the results of statistical calculations. This acceptance of statistics, though gratifying to the medical statistician, may even have gone too far. More than once I have told a colleague that he did not need me to prove that his difference existed, as anyone could see it, only to be told in turn that without the magic of the P value he could not have his paper published.
Statistics has not always been so popular with the medical profession. Statistical methods were first used in medical research in the 19th century by workers such as Pierre-Charles-Alexandre Louis, William Farr, Florence Nightingale and John Snow. Snow's studies of the modes of communication of cholera, for example, made use of epidemiological techniques upon which we have still made little improvement. Despite the work of these pioneers, however, statistical methods did not become widely used in clinical medicine until the middle of the twentieth century. It was then that the methods of randomized experimentation and statistical analysis based on sampling theory, which had been developed by Fisher and others, were introduced into medical research, notably by Bradford Hill. It rapidly became apparent that research in medicine raised many new problems in both design and analysis, and much work has been done since towards solving these by clinicians, statisticians and epidemiologists.
Although considerable progress has been made in such fields as the design of clinical trials, there remains much to be done in developing research methodology in medicine. It seems likely that this will always be so, for every research project is something new, something which has never been done before. Under these circumstances we make mistakes. No piece of research can be perfect and there will always be something which, with hindsight, we would have changed. Furthermore, it is often from the flaws in a study that we can learn most about research methods. For this reason, the work of several researchers is described in this book to illustrate the problems into which their designs or analyses led them. I do not wish to imply that these people were any more prone to error than the rest of the human race, or that their work was not a valuable and serious undertaking. Rather I want to learn from their experience of attempting something extremely difficult, trying to extend our knowledge, so that researchers and consumers of research may avoid these particular pitfalls in the future.
1.2 Statistics and mathematics
Many people are discouraged from the study of statistics by a fear of being overwhelmed by mathematics. It is true that many professional statisticians are also mathematicians, but not all are, and there are many very able appliers of statistics to their own fields. It is possible, though perhaps not very useful, to study statistics simply as a part of mathematics, with no concern for its application at all. Statistics may also be discussed without appearing to use any mathematics at all (e.g. Huff 1954).
The aspects of statistics described in this book can be understood and applied with the use of simple algebra. Only the algebra which is essential for explaining the most important concepts is given in the main text. This means that several of the theoretical results used are stated without a discussion of their mathematical basis. This is done when the derivation of the result would not aid much in understanding the application. For many readers the reasoning behind these results is not of great interest. For the reader who does not wish to take these results on trust, several chapters have appendices in which simple mathematical proofs are given. These appendices are designed to help increase the understanding of the more mathematically inclined reader and to be omitted by those who find that the mathematics serves only to confuse.
1.3 Statistics and computing
Practical statistics has always involved large amounts of calculation. When the methods of statistical inference were being developed in the first half of the twentieth century, calculations were done using pencil, paper, tables, slide rules and, with luck, a very expensive mechanical adding machine. Older books on statistics spend much time on the details of carrying out calculations and any reference to a ‘computer’ means a person who computes, not an electronic device. The development of the digital computer has brought changes to statistics as to many other fields. Calculations can be done quickly, easily and, we hope, accurately with a range of machines from pocket calculators with built-in statistical functions to powerful computers analysing data on many thousands of subjects. Many statistical methods would not be contemplated without computers, and the development of new methods goes hand in hand with the development of software to carry them out. The theory of multilevel modelling (Goldstein 1995) and the programs MLn and MLWin are a good example. Most of the calculations in this book were done using a computer and the graphs were produced with one.
As an added bonus, my little MSDOS program Clinstat (not to be confused with any commercial package of the same name) can be downloaded free from my website at http://www.sghms.ac.uk/depts/phs/staff/jmb/. It does most of the calculations in this book, including sample size calculations and random sampling and allocation. It does not do any multifactorial analyses, sorry. There is also a little program to find some exact confidence intervals.
There is therefore no need to consider the problems of manual calculation in detail. The important thing is to know why particular calculations should be done and what the results of these calculations actually mean. Indeed, the danger in the computer age is not so much that people carry out complex calculations wrongly, but that they apply very complicated statistical methods without knowing why or what the computer output means. More than once I have been approached by a researcher bearing a two inch thick computer printout, and asking what it all means. Sadly, too often, it means that another tree has died in vain.
The widespread availability of computers means that more calculations are being done, and being published, than ever before, and the chance of inappropriate statistical methods being applied may actually have increased. This misuse arises partly because people regard their data analysis problems as computing problems, not statistical ones, and seek advice from computer experts rather than statisticians. They often get good advice on how to do it, but rather poor advice about what to do, why to do it and how to interpret the results afterwards. It is therefore more important than ever that the consumers of research understand something about the uses and limitations of statistical techniques.
1.4 The scope of this book
This book is intended as an introduction to some of the statistical ideas important to medicine. It does not tell you all you need to know to do medical research. Once you have understood the concepts discussed here, it is much easier to learn about the techniques of study design and statistical analysis required to answer any particular question. There are several excellent standard works which describe the solutions to problems in the analysis of data (Armitage and Berry 1994, Snedecor and Cochran 1980, Altman 1991) and also more specialized books to which reference will be made where required.
What I hope the book will do is to give enough understanding of the statistical ideas commonly used in medicine to enable the health professional to read the medical literature competently and critically. It covers enough material (and more) for an undergraduate course in statistics for students of medicine, nursing, physiotherapy, etc. At the time of writing, as far as can be established, it covers the material required to answer statistical questions set in the examinations of most of the Royal Colleges, except for the MRCPsych. I have indicated by an asterisk in the subheading those sections which I think will be required only by the postgraduate or the researcher.
When working through a textbook, it is useful to be able to check your understanding of the material covered. Like most such books, this one has exercises at the end of each chapter, but to ease the tedium most of these are of the multiple choice type. There is also one long exercise, usually involving calculations, for each chapter. In keeping with the computer age, where laborious calculation would be necessary intermediate results are given to avoid this. Thus the exercises can be completed quite quickly and the reader is advised to try them. You can also download some of the data sets from my website (http://www.sghms.ac.uk/depts/phs/staff/jmb). Solutions are given at the end of the book, in full for the long exercises and as brief notes with references to the relevant sections in the text for MCQs. Readers who would like more numerical exercises are recommended to Osborn (1979). For a wealth of exercises in the understanding and interpretation of statistics in medical research, drawn from the published literature and popular media, you should try the companion volume to this one, Statistical Questions in Evidence-based Medicine (Bland and Peacock 2000).
Finally, a question many students of medicine ask as they struggle with statistics: is it worth it? As Altman (1982) has argued, bad statistics leads to bad research and bad research is unethical. Not only may it give misleading results, which can result in good therapies being abandoned and bad ones adopted, but it means that patients may have been exposed to potentially harmful new treatments for no good reason. Medicine is a rapidly changing field. In ten years' time, many of the therapies currently prescribed and many of our ideas about the causes and prevention of disease will be obsolete. They will be replaced by new therapies and new theories, supported by research studies and data of the kind described in this book, and probably presenting many of the same problems in interpretation. The practitioner will be expected to decide for her- or himself what to prescribe or advise based on these studies. So a knowledge of medical statistics is one of the most useful things any doctor could acquire during her or his training.
2
The design of experiments
2.1 Comparing treatments
There are two broad types of study in medical research: observational and experimental. In observational studies, aspects of an existing situation are observed, as in a survey or a clinical case report. We then try to interpret our data to give an explanation of how the observed state of affairs has come about. In experimental studies, we do something, such as giving a drug, so that we can observe the result of our action. This chapter is concerned with the way statistical thinking is involved in the design of experiments. In particular, it deals with comparative experiments where we wish to study the difference between the effects of two or more treatments. These experiments may be carried out in the laboratory in vitro or on animals or human volunteers, in the hospital or community on human patients, or, for trials of preventive interventions, on currently healthy people. We call trials of treatments on human subjects clinical trials. The general principles of experimental design are the same, although there are special precautions which must be taken when experimenting with human subjects. The experiments whose results most concern clinicians are clinical trials, so the discussion will deal mainly with them.
Suppose we want to know whether a new treatment is more effective than the present standard treatment. We could approach this in a number of ways.
First, we could compare the results of the new treatment on new patients with records of previous results using the old treatment. This is seldom convincing, because there may be many differences between the patients who received the old treatment and the patients who will receive the new. As time passes, the general population from which patients come may become healthier, standards of ancillary treatment and nursing care may improve, or the social mix in the catchment area of the hospital may change. The nature of the disease itself may change. All these factors may produce changes in the patients' apparent response to treatment. For example, Christie (1979) showed this by comparing the survival of stroke patients in 1978, after the introduction of a C-T head scanner, with that of patients treated in 1974, before the introduction of the scanner. He took the records of a group of patients treated in 1978, who received a C-T scan, and matched each of them with a patient treated in 1974 of the same age, diagnosis and level of consciousness on admission. As the first column of Table 2.1 shows, patients in 1978 clearly tended to have better survival than similar patients in 1974.
The scanned 1978 patient did better than the unscanned 1974 patient in 31% of pairs, whereas the unscanned 1974 patient did better than the scanned 1978 patient in only 7% of pairs. However, he also compared the survival of patients in 1978 who did not receive a C-T scan with matched patients in 1974. These patients too showed a marked improvement in survival from 1974 to 1978 (Table 2.1). The 1978 patients did better in 38% of pairs and the 1974 patients in only 19% of pairs. There was a general improvement in outcome over a fairly short period of time. If we did not have the data on the unscanned patients from 1978 we might be tempted to interpret these data as evidence for the effectiveness of the C-T scanner. Historical controls like this are seldom very convincing, and usually favour the new treatment. We need to compare the old and new treatments concurrently.
Table 2.1. Analysis of the difference in survival for matched pairs of stroke patients (Christie 1979)
                                    C-T scan in 1978    No C-T scan in 1978
Pairs with 1978 better than 1974    9 (31%)             34 (38%)
Pairs with same outcome             18 (62%)            38 (43%)
Pairs with 1978 worse than 1974     2 (7%)              17 (19%)
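As a rough check, the percentages in Table 2.1 can be recomputed from the counts of pairs. The Python sketch below is only an illustration and is not part of Christie's analysis; the names `pairs`, `percentages` and `table_pct` are assumptions made for this example.

```python
# Counts of matched pairs taken from Table 2.1 (Christie 1979).
# The dictionary keys and function name are illustrative assumptions.
pairs = {
    "C-T scan in 1978": {"better": 9, "same": 18, "worse": 2},
    "No C-T scan in 1978": {"better": 34, "same": 38, "worse": 17},
}


def percentages(counts):
    """Express each count as a whole-number percentage of all pairs in the column."""
    total = sum(counts.values())
    return {outcome: round(100 * n / total) for outcome, n in counts.items()}


table_pct = {group: percentages(counts) for group, counts in pairs.items()}

for group, pct in table_pct.items():
    print(group, pct)
```

Both columns show more pairs in which the 1978 patient did better, which is the point of the example: the improvement appears whether or not the 1978 patient was scanned.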
Second, we could obtain concurrent groups by comparing our own patients, given the new treatment, with patients given the standard treatment in another hospital or clinic, or by another clinician in our own institution. Again, there may be differences between the patient groups due to catchment, diagnostic accuracy, preference by patients for a particular clinician, or you might just be a better therapist. We cannot separate these differences from the treatment effect.
Third, we could ask people to volunteer for the new treatment and give the standard treatment to those who do not volunteer. The difficulty here is that people who volunteer and people who do not volunteer are likely to be different in many ways apart from the treatments we give them. They might be more likely to follow medical advice, for example. We will consider an example of the effects of volunteer bias in §2.4.
Fourth, we can allocate patients to the new treatment or the standard treatment and observe the outcome. The way in which patients are allocated to treatments can influence the results enormously, as the following example (Hill 1962) shows. Between 1927 and 1944 a series of trials of BCG vaccine were carried out in New York (Levine and Sackett 1946). Children from families where there was a case of tuberculosis were allocated to a vaccination group and given BCG vaccine, or to a control group who were not vaccinated. Between 1927 and 1932 physicians vaccinated half the children, the choice of which children to vaccinate being left to them. There was a clear advantage in survival for the BCG group (Table 2.2). However, there was also a clear tendency for the physician to vaccinate the children of more cooperative parents, and to leave those of less cooperative parents as controls. From 1933 allocation to treatment or control was done centrally, alternate children being assigned to control and vaccine.
The difference in degree of cooperation between the parents of the two groups of children disappeared, and so did the difference in mortality. Note that these were a special group of children, from families where there was tuberculosis. In large trials using children drawn from the general population, BCG was shown to be effective in greatly reducing deaths from tuberculosis (Hart and Sutherland 1977).
Table 2.2. Results of studies of BCG vaccine in New York City (Hill 1962)
Period of trial      No. of      No. of deaths   Death    Average no. of visits   Proportion of parents giving
                     children    from TB         rate     to clinic during 1st    good cooperation as judged
                                                          year of follow-up       by visiting nurses
1927–32 Selection made by physician
  BCG group          445         3               0.67%    3.6                     43%
  Control group      545         18              3.30%    1.7                     24%
1933–44 Alternative selection carried out centrally
  BCG group          566         8               1.41%    2.8                     40%
  Control group      528         8               1.52%    2.4                     34%
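The death rates in Table 2.2 follow directly from the numbers of children and deaths, and recomputing them shows how sharply the apparent benefit shrinks once allocation is taken out of the physician's hands. The following Python sketch is illustrative only; the ratio of control to BCG death rates is a summary added here for comparison and does not appear in Hill's table, and all names are assumptions.

```python
# (period, group, number of children, deaths from TB) from Table 2.2 (Hill 1962).
trials = [
    ("1927-32", "BCG", 445, 3),
    ("1927-32", "Control", 545, 18),
    ("1933-44", "BCG", 566, 8),
    ("1933-44", "Control", 528, 8),
]


def death_rate_percent(children, deaths):
    """Deaths as a percentage of children, to two decimal places."""
    return round(100 * deaths / children, 2)


rates = {(period, group): death_rate_percent(n, d) for period, group, n, d in trials}

# Illustrative summary (not in the original table): control rate / BCG rate.
raw = {(period, group): d / n for period, group, n, d in trials}
rate_ratio = {
    period: round(raw[(period, "Control")] / raw[(period, "BCG")], 2)
    for period in ("1927-32", "1933-44")
}

print(rates)
print(rate_ratio)
```

Under physician selection the control death rate is nearly five times the BCG rate; under central alternation the two rates are almost equal.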
Different methods of allocation to treatment can produce different results. This is because the method of allocation may not produce groups of subjects which are comparable, similar in every respect except the treatment. We need a method of allocation to treatments in which the characteristics of subjects will not affect their chance of being put into any particular group. This can be done using random allocation.
2.2 Random allocation
If we want to decide which of two people receives an advantage, in such a way that each has an equal chance of receiving it, we can use a simple, widely accepted method. We toss a coin. This is used to decide the way football matches begin, for example, and all appear to agree that it is fair. So if we want to decide which of two subjects should receive a vaccine, we can toss a coin. Heads and the first subject receives the vaccine, tails and the second receives it. If we do this for each pair of subjects we build up two groups which have been assembled without any characteristics of the subjects themselves being involved in the allocation. The only differences between the groups will be those due to chance. As we shall see later (Chapters 8 and 9), statistical methods enable us to measure the likely effects of chance. Any difference between the groups which is larger than this should be due to the treatment, since there will be no other differences between the groups. This method of dividing subjects into groups is called random allocation or randomization.
Several methods of randomizing have been in use for centuries, including coins, dice, cards, lots, and spinning wheels. Some of the theory of probability which we shall use later to compare randomized groups was first developed as an aid to gambling. For large randomizations we use a different, non-physical randomizing method: random number tables. Table 2.3 provides an example, a table of 1000 random digits. These are more properly called pseudo-random numbers, as they are generated by a mathematical process. They are available in tables (Kendall and Babington Smith 1971) or can be produced by computer and some calculators. We can use tables of random numbers in several ways to achieve random allocation. For example, let us randomly allocate 20 subjects to two groups, which I shall label A and B. We choose a random starting point in the table, using one of the physical methods described above. (I used decimal dice. These are 20-sided dice, numbered 0 to 9 twice, which fit our number system more conveniently than the traditional cube. Two such dice give a random number between 1 and 100, counting ‘0,0’ as 100.) The random starting point was row 22, column 20, and the first 20 digits were 3, 4, 6, 2, 9, 7, 5, 3, 2, 6, 9, 7, 9, 3, 9, 2, 3, 3, 2 and 4. We now allocate subjects corresponding to odd digits to group A and those corresponding to even digits to B. The first digit, 3, is odd, so the first subject goes into group A. The second digit, 4, is even, so the second subject goes into group B, and so on. We get the allocation shown in Table 2.4. We could allocate into three groups by assigning to A if the digit is 1, 2, or 3, to B if 4, 5, or 6, and to C if 7, 8, or 9, ignoring 0. There are many possibilities.
Table 2.3. The 1000 random digits
       Column
Row    1–4    5–8    9–12   13–16  17–20  21–24  25–28  29–32  33–36  37–40
1      3645   8831   2873   5943   4632   0032   6715   3249   5455   7517
2      9051   4066   1846   9554   6589   1680   9533   1588   1860   5646
3      9841   9022   4837   8031   9139   3380   4082   3826   2039   7182
4      5525   7127   1468   6404   9924   8230   7343   9268   1899   4754
5      0299   1075   7721   8855   7997   7032   5987   7535   1834   6253
6      7985   5566   6384   0863   0400   1834   5394   5801   5505   9099
7      3353   9528   0681   3495   1393   3716   9506   1591   8999   3716
8      7475   1313   2216   3776   1557   4238   9623   9024   5826   7146
9      0666   3043   0066   3260   3660   4605   1731   6680   9101   6235
10     9283   3160   8730   7683   1785   3148   1323   1732   6814   8496
11     6121   3149   9829   7770   7211   3523   6947   1427   1474   5235
12     2782   0101   7441   3877   5368   5326   5516   3566   3187   8209
13     6105   5010   9485   8632   1072   9567   8821   7209   4873   0397
14     1157   8567   9491   4948   3549   3941   8017   5445   2366   8260
15     1516   0890   9286   1332   2601   2002   7245   9474   9719   9946
16     2209   2966   1544   7674   9492   4813   7585   8128   9541   3630
17     6913   5355   3587   4323   8332   7940   9220   8376   8261   2420
18     0829   7937   0033   3534   8655   1091   1886   4350   6779   3358
19     3729   9985   5563   3266   7198   8520   3193   6391   7721   9962
20     6511   1404   8886   2892   0403   4299   8708   2055   3053   8224
21     6622   8158   3080   2110   1553   2690   3377   5119   1749   2714
22     3721   7713   6931   2022   6713   4629   7532   6979   3923   3243
23     5143   0972   6838   0577   1462   8907   3789   2530   9209   0692
24     3159   3783   9255   1531   2124   0393   3597   8461   9685   4551
25     7905   4369   5293   0077   4482   9165   1171   2537   8913   6387
Table 2.4. Allocation of 20 subjects to two groups
Subject    Digit    Group
1          3        A
2          4        B
3          6        B
4          2        B
5          9        A
6          7        A
7          5        A
8          3        A
9          2        B
10         6        B
11         9        A
12         7        A
13         9        A
14         3        A
15         9        A
16         2        B
17         3        A
18         3        A
19         2        B
20         4        B
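The odd/even rule that produces Table 2.4, and the three-group rule mentioned in the text, can be written out in a few lines. This Python sketch is an illustration under stated assumptions: the 20 digits are those quoted in the text as read from row 22, column 20 of Table 2.3, and the function names are invented for this example.

```python
from collections import Counter

# The 20 digits quoted in the text, starting at row 22, column 20 of Table 2.3.
digits = [3, 4, 6, 2, 9, 7, 5, 3, 2, 6, 9, 7, 9, 3, 9, 2, 3, 3, 2, 4]


def allocate_two_groups(digits):
    """Odd digit -> group A, even digit -> group B, one digit per subject."""
    return ["A" if d % 2 == 1 else "B" for d in digits]


def allocate_three_groups(digits):
    """1-3 -> A, 4-6 -> B, 7-9 -> C; a 0 is ignored and the next digit is used."""
    mapping = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "C", 8: "C", 9: "C"}
    return [mapping[d] for d in digits if d != 0]


two_groups = allocate_two_groups(digits)
three_groups = allocate_three_groups(digits)

# Print the two-group allocation in the layout of Table 2.4.
for subject, (digit, group) in enumerate(zip(digits, two_groups), start=1):
    print(subject, digit, group)

print(Counter(two_groups))
print(Counter(three_groups))
```

Because each subject's group depends only on a digit, the group sizes are themselves random, which is why the two-group allocation here is unbalanced.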
The system described above gave us unequal numbers in the two groups, 12 in A and 8 in B. We sometimes want the groups to be of equal size. One way to do this would be to proceed as above until either A or B has 10 subjects in it, all the remaining subjects going into the other group. This is satisfactory in that each subject has an equal chance of being allocated to A or B, but it has a disadvantage. There is a tendency for the last few subjects all to have the same treatment. This characteristic sometimes worries researchers, who feel that the randomization is not quite right. In statistical terms the possible allocations are not equally likely. If we use this method for the random allocation described above, the 10th subject in group A would be reached at subject 15 and the last five subjects would all be in group B. We can ensure that all randomizations are equally likely by using the table of random numbers in a different way. For example, we can use the table to draw a random sample of 10 subjects from 20, as described in §3.4. These would form group A, and the remaining 10 group B. Another way is to put our subjects into small equal-sized groups, called blocks, and within each block to allocate equal numbers to A and B. This gives approximately equal numbers on the two treatments and will do so whenever the trial stops.
The use of random numbers and the generation of the random numbers themselves are simple mathematical operations well suited to the computers which are now readily available to researchers. It is very easy to program a computer to carry out random allocation, and once a program is available it can be used over and over again for further experiments. My program Clinstat (§1.3) does several different randomization schemes, even offering blocks of random size.
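To give an idea of how little programming is involved, here is a minimal Python sketch of the three ways of obtaining equal groups discussed in this section. It is not how Clinstat works; the function names, the default block size of 4 and the use of Python's `random` module in place of a printed table are all assumptions made for illustration.

```python
import random

# The 20 digits quoted in the text (row 22, column 20 of Table 2.3).
digits = [3, 4, 6, 2, 9, 7, 5, 3, 2, 6, 9, 7, 9, 3, 9, 2, 3, 3, 2, 4]


def fill_until_full(digits, group_size=10):
    """Odd -> A, even -> B until one group is full; the rest go to the other group."""
    allocation = []
    for d in digits:
        if allocation.count("A") == group_size:
            allocation.append("B")
        elif allocation.count("B") == group_size:
            allocation.append("A")
        else:
            allocation.append("A" if d % 2 == 1 else "B")
    return allocation


def equal_groups_by_sampling(n_subjects, seed=None):
    """Draw a random sample of half the subjects to form A; the remainder form B."""
    rng = random.Random(seed)
    group_a = set(rng.sample(range(1, n_subjects + 1), n_subjects // 2))
    return ["A" if s in group_a else "B" for s in range(1, n_subjects + 1)]


def block_randomize(n_subjects, block_size=4, seed=None):
    """Allocate in blocks, each block holding equal numbers of A and B in random order."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]


print(fill_until_full(digits))           # last five subjects all fall in B
print(equal_groups_by_sampling(20, seed=1))
print(block_randomize(20, block_size=4, seed=1))
```

With blocks, the two groups can never differ by more than half a block at any point, which is why the balance holds whenever the trial stops.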
The trial carried out by the Medical Research Council (MRC 1948) to test the efficacy of streptomycin for the treatment of pulmonary tuberculosis is generally considered to have been the first randomized experiment in medicine. In this study the target population was patients with acute progressive bilateral pulmonary tuberculosis, aged 15–30 years. All cases were bacteriologically proved and were considered unsuitable for other treatments then available. The trial took place in three centres and allocation was by a series of random numbers, drawn up for each sex at each centre. The streptomycin group contained 55 patients and the control group 52 cases. The condition of the patients on admission is shown in Table 2.5. The frequency distributions of temperature and sedimentation rate were similar for the two groups; if anything the treated (S) group were slightly worse. However, this difference is no greater than could have arisen by chance, which, of course, is how it arose. The two groups are certain to be slightly different in some characteristics, especially with a fairly small sample, and we can take account of this in the analysis (Chapter 17).
Table 2.5. Condition of patients on admission to trial of streptomycin (MRC 1948)
                                                              Group
                                                              S      C
General condition                              Good           8      8
                                               Fair           17     20
                                               Poor           30     24
Max. evening temperature in first week (°F)    98-98.9        4      4
                                               99-99.9        13     12
                                               100-100.9      15     17
                                               101+           24     19
Sedimentation rate                             0-10           0      0
                                               11-20          3      2
                                               21-50          16     20
                                               51+            36     29
Table 2.6. Survival at six months in the MRC streptomycin trial, stratified by initial condition (MRC 1948)
Maximum evening temperature                     Group
during first observation week     Outcome       Streptomycin group    Control group
98-98.9°F                         Alive         3                     4
                                  Dead          0                     0
99-99.9°F                         Alive         13                    11
                                  Dead          0                     1
100-100.9°F                       Alive         15                    12
                                  Dead          0                     5
101°F and above                   Alive         20                    11
                                  Dead          4                     8
After six months, 93% of the S group survived, compared to 73% of the control group. There was a clear advantage to the streptomycin group. The relationship of survival to initial condition is shown in Table 2.6. Survival was more likely for patients with lower temperatures, but the difference in survival between the S and C groups is clearly present within each temperature category where deaths occurred.
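The survival percentages quoted above can be recovered from the counts in Table 2.6. The short Python sketch below is illustrative only; the dictionary layout and the function names are assumptions made for the example, while the counts are those in the table.

```python
# (alive, dead) counts from Table 2.6, keyed by maximum evening temperature (°F).
table_2_6 = {
    "98-98.9":   {"S": (3, 0),  "C": (4, 0)},
    "99-99.9":   {"S": (13, 0), "C": (11, 1)},
    "100-100.9": {"S": (15, 0), "C": (12, 5)},
    "101+":      {"S": (20, 4), "C": (11, 8)},
}

def survival_percent(alive, dead):
    """Percentage of patients alive at six months."""
    return 100 * alive / (alive + dead)

def overall_survival(table, group):
    """Survival percentage for one group, pooled over all temperature strata."""
    alive = sum(row[group][0] for row in table.values())
    dead = sum(row[group][1] for row in table.values())
    return survival_percent(alive, dead)

# Survival within each stratum, then overall.
for stratum, row in table_2_6.items():
    s = survival_percent(*row["S"])
    c = survival_percent(*row["C"])
    print(f"{stratum:>10}: S {s:5.1f}%  C {c:5.1f}%")

print(round(overall_survival(table_2_6, "S")))  # 93
print(round(overall_survival(table_2_6, "C")))  # 73
```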
Randomized trials are not restricted to two treatments. We can compare several treatments. A drug trial might include the new drug, a rival drug, and no drug at all. We can carry out experiments to compare several factors at once. For example, we might wish to study the effect of a drug at different doses in the presence or absence of a second drug, with the subject standing or supine. This is usually designed as a factorial experiment, where every possible combination of treatments is used. These designs are unusual in clinical research but are sometimes used in laboratory work. They are described in more advanced texts (Armitage and Berry 1994, Snedecor and Cochran 1980). For more on randomized trials in general, see Pocock (1983) and Johnson and Johnson (1977).
Randomized experimentation may be criticized because we are withholding a potentially beneficial treatment from patients. Any biologically active treatment is potentially harmful, however, and we are surely not justified in giving potentially harmful treatments to patients before the benefits have been demonstrated conclusively. Without properly conducted controlled clinical trials to support it, each administration of a treatment to a patient becomes an uncontrolled experiment, whose outcome, good or bad, cannot be predicted.
2.3* Methods of allocation without random numbers
In the second stage of the New York studies of BCG vaccine, the children were allocated to treatment or control alternately. Researchers often ask why this method cannot be used instead of randomization, arguing that the order in which patients arrive is random, so the groups thus formed will be comparable. First, although the patients may appear to be in a random order, there is no guarantee that this is the case. We could never be sure that the groups are comparable. Second, this method is very susceptible to mistakes, or even to cheating in the patients' perceived interest. The experimenter knows what treatment the subject will receive before the subject is admitted to the trial. This knowledge may influence the decision to admit the subject, and so lead to biased groups. For example, an experimenter might be more prepared to admit a frail patient if the patient will be on the control treatment than if the patient would be exposed to the risk of the new treatment. This objection applies to using the last digit of the hospital number for allocation.
Knowledge of what treatment the next patient will receive can certainly lead to bias. For example, Schulz et al. (1995) looked at 250 controlled trials. They compared trials where treatment allocation was not adequately concealed from researchers with trials where there was adequate concealment. They found an average treatment effect 41% larger in the trials with inadequate concealment.
There are several examples reported in the literature of alterations to treatment allocations. Holten (1951) reported a trial of anticoagulant therapy for patients with coronary thrombosis. Patients who presented on even dates were to be treated and patients arriving on odd dates were to form the control group. The author reports that some of the clinicians involved found it ‘difficult to remember’ the criterion for allocation. Overall the treated patients did better than the controls (Table 2.7). Curiously, the controls on the even dates (wrongly allocated) did considerably better than control patients on the odd dates (correctly allocated) and even managed to do marginally better than those who received the treatment. The best outcome, treated or not, was for those who were incorrectly allocated. Allocation in this trial appears to have been rather selective.
Table 2.7. Outcome of a clinical trial using systematic allocation, with errors in allocation (Holten 1951)
              Even dates                 Odd dates
Outcome       Treated      Control       Treated      Control
Survived      125          39            10           125
Died          39 (25%)     11 (22%)      0 (0%)       81 (36%)
Total         164          50            10           206
Other methods of allocation set out to be random but can fall into this sort of difficulty. For example, we could use physical mixing to achieve randomization. This is quite difficult to do. As an experiment, take a deck of cards and order them in suits from ace of clubs to king of spades. Now shuffle them in the usual way and examine them. You will probably see many runs of several cards which remain together in order. Cards must be shuffled very thoroughly indeed before the ordering ceases to be apparent. The physical randomization method can be applied to an experiment by marking equal numbers of slips of paper with the names of the treatments, sealing them into envelopes and shuffling them. The treatment for a subject is decided by withdrawing an envelope. This method was used in another study of anticoagulant therapy by Carleton et al. (1960). These authors reported that in the latter stages of the trial some of the clinicians involved had attempted to read the contents of the envelopes by holding them up to the light, in order to allocate patients to their own preferred treatment.
Interfering with the randomization can actually be built into the allocation procedure, with equally disastrous results. In the Lanarkshire Milk Experiment, discussed by Student (1931), 10000 school children received three quarters of a pint of milk per day and 10000 children acted as controls. The children were weighed and measured at the beginning and end of the six-month experiment. The object was to see whether the milk improved the growth of children. The allocation to the ‘milk’ or control group was done as follows:
The teachers selected the two classes of pupils, those getting milk and those acting as controls, in two different ways. In certain cases they selected them by ballot and in others on an alphabetical system. In any particular school where there was any group to which these methods had given an undue proportion of well-fed or ill-nourished children, others were substituted to obtain a more level selection.
The result of this was that the control group had a markedly greater average height and weight at the start of the experiment than did the milk group. Student interpreted this as follows:
Presumably this discrimination in height and weight was not made deliberately, but it would seem probable that the teachers, swayed by the very human feeling that the poorer children needed the milk more than the comparatively well to do, must have unconsciously made too large a substitution for the ill-nourished among the (milk group) and too few among the controls and that this unconscious selection affected secondarily, both measurements.
Whether the bias was conscious or not, it spoiled the experiment, despite being from the best possible motives.
There is one non-random method which can be used successfully in clinical trials: minimization. In this method, new subjects are allocated to treatments so as to make the treatment groups as similar as possible in terms of the important prognostic factors. It is beyond the scope of this book, but see Pocock (1983) for a description.
2.4 Volunteer bias
People who volunteer for new treatments and those who refuse them may be very different. An illustration is provided by the field trial of Salk poliomyelitis vaccine carried out in 1954 in the USA (Meier 1977). This was carried out using two different designs simultaneously, due to a dispute about the correct method. In some districts, second grade school children were invited to participate in the trial, and randomly allocated to receive vaccine or an inert saline injection. In other districts, all second grade children were offered vaccination and the first and third grade left unvaccinated as controls. The argument against this ‘observed control’ approach was that the groups may not be comparable, whereas the argument against the randomized control method was that the saline injection could provoke paralysis in infected children. The results are shown in Table 2.8. In the randomized control areas the vaccinated group clearly experienced far less polio than the control group. Since these were randomly allocated, the only difference between them should be the treatment, which is clearly preferable to saline. However, the control group also had more polio than those who had refused to participate in the trial. The difference between the control and not inoculated group is in both treatment (saline injection) and selection; they are self-selected as volunteers and refusers. The observed control areas enable us to distinguish between these two factors. The polio rates in the vaccinated children are very similar in both parts of the study, as are the rates in the not inoculated second grade children. It is the two control groups which differ. These were selected in different ways: in the randomized control areas they were volunteers, whereas in the observed control areas they were everybody eligible, both potential volunteers and potential refusers. Now suppose that the vaccine were saline instead, and that the randomized vaccinated children had the same polio experience as those receiving saline. We would expect 200745 × 57/100000 = 114 cases, instead of the 33 observed. The total number of cases in the randomized areas would be 114 + 115 + 121 = 350 and the rate per 100000 would be 47. This compares very closely with the rate of 46 in the observed control first and third grade group. Thus it seems that the principal difference between the saline control group of volunteers and the not inoculated group of refusers is selection, not treatment.
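The arithmetic of this ‘what if the vaccine were saline’ argument can be checked with a few lines of Python. This is a sketch only; the variable and function names are assumptions made for the example, and the figures are those for the randomized control areas in Table 2.8.

```python
# (number in group, cases of paralytic polio), randomized control areas, Table 2.8
vaccinated = (200745, 33)
control = (201229, 115)
not_inoculated = (338778, 121)

def rate_per_100000(n, cases):
    """Cases per 100000 children."""
    return 100000 * cases / n

# Rate observed in the saline controls, rounded as in the text.
control_rate = round(rate_per_100000(*control))                     # 57

# Cases expected among the vaccinated if they had the control rate.
expected_vaccinated = round(vaccinated[0] * control_rate / 100000)  # 114

# Everybody in the randomized areas, had nobody been protected.
total_cases = expected_vaccinated + control[1] + not_inoculated[1]  # 350
total_children = vaccinated[0] + control[0] + not_inoculated[0]
overall_rate = rate_per_100000(total_children, total_cases)

print(expected_vaccinated, total_cases, round(overall_rate))  # 114 350 47
```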
There is a simple explanation of this. Polio is a viral disease transmitted by the faecal-oral route. Before the development of vaccine almost everyone in the population was exposed to it at some time, usually in childhood. In the majority of cases, paralysis does not result and immunity is conferred without the child being aware of having been exposed to polio. In a small minority of cases, about 1 in 200, paralysis or death occurs and a diagnosis of polio is made. The older the exposed individual is, the greater the chance of paralysis developing. Hence, children who are protected from infection by high standards of hygiene are likely to be older when they are first exposed to polio than those children from homes with low standards of hygiene, and thus more likely to develop the clinical disease. There are many factors which may influence parents in their decision as to whether to volunteer or refuse their child for a vaccine trial. These may include education, personal experience, current illness, and others, but certainly include interest in health and hygiene. Thus in this trial the high risk children tended to be volunteered and the low risk children tended to be refused. The higher risk volunteer control children experienced 57 cases of polio per 100000, compared to 36 per 100000 among the lower risk refusers.
Table 2.8. Result of the field trial of Salk poliomyelitis vaccine (Meier 1977)
                                                     Paralytic polio
Study group                     Number in group      Number of cases     Rate per 100000
Randomized control
  Vaccinated                    200745               33                  16
  Control                       201229               115                 57
  Not inoculated                338778               121                 36
Observed control
  Vaccinated 2nd grade          221998               38                  17
  Control 1st and 3rd grade     725173               330                 46
  Unvaccinated 2nd grade        123605               43                  35
In most diseases, the effect of volunteer bias is opposite to this. Poor conditions are related both to refusal to participate and to high risk, whereas volunteers tend to be low risk. The effect of volunteer bias is then to produce an apparent difference in favour of the treatment. We can see that comparisons between volunteers and other groups can never be reliable indicators of treatment effects.
2.5 Intention to treat
In the observed control areas of the Salk trial (Table 2.8), quite apart from the non-random age difference, the vaccinated and control groups are not comparable. However, it is possible to make a reasonable comparison in this study by comparing all second grade children, both vaccinated and refused, to the control group. The rate in the second grade children is 23 per 100000, which is less than the rate of 46 in the control group, demonstrating the effectiveness of the vaccine. The ‘treatment’ which we are evaluating is not vaccination itself, but a policy of offering vaccination and treating those who accept. A similar problem can arise in a randomized trial, for example in evaluating the effectiveness of health checkups (South-east London Screening Study Group 1977). Subjects were randomized to a screening group or to a control group. The screening group were invited to attend for an examination; some accepted and were screened and some refused. When comparing the results in terms of subsequent mortality, it was essential to compare the controls to the whole screening group, containing both screened and refusers. For example, the refusers may have included people who were already too ill to come for screening. The important point is that the random allocation procedure produces comparable groups and it is these we must compare, whatever selection may be made within them. We therefore analyse the data according to the way we intended to treat subjects, not the way in which they were actually treated. This is analysis by intention to treat. The alternative, analysing by treatment actually received, is called on treatment analysis.
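The second grade rate of 23 per 100000 comes from pooling the vaccinated and unvaccinated second grade children in the observed control areas of Table 2.8. A minimal Python sketch follows, with variable and function names that are assumptions made for the example. It also shows the rate among the vaccinated children alone, which is what an on treatment comparison would use.

```python
# (number in group, cases of paralytic polio), observed control areas, Table 2.8
vaccinated_2nd = (221998, 38)
unvaccinated_2nd = (123605, 43)
control_1st_3rd = (725173, 330)

def rate_per_100000(n, cases):
    """Cases per 100000 children."""
    return 100000 * cases / n

def pooled(*groups):
    """Combine groups by adding their sizes and their case counts."""
    return (sum(g[0] for g in groups), sum(g[1] for g in groups))

# Intention to treat: everybody offered vaccination, whether or not they accepted.
offered = pooled(vaccinated_2nd, unvaccinated_2nd)
itt_rate = rate_per_100000(*offered)

# On treatment: only those actually vaccinated.
on_treatment_rate = rate_per_100000(*vaccinated_2nd)

control_rate = rate_per_100000(*control_1st_3rd)
print(round(itt_rate), round(on_treatment_rate), round(control_rate))  # 23 17 46
```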
Analysis by intention to treat is not free of bias. As some patients may receive the other group's treatment, the difference may be smaller than it should be. We know that there is a bias and we know that it will make the treatment difference smaller, by an unknown amount. On treatment analyses, on the other hand, are biased in favour of showing a difference, whether there is one or not. Statisticians call methods which are biased against finding any effect conservative. If we must err, we like to do so in the conservative direction.
2.6 Cross-over designs
Sometimes it is possible to use a subject as her or his own control. For example, when comparing analgesics in the treatment of arthritis, patients may receive in succession a new drug and a control treatment. The response to the two treatments can then be compared for each patient. These designs have the advantage of removing variability between subjects. We can carry out a trial with fewer subjects than would be needed for a two group trial.
Although all subjects receive all treatments, these trials must still be randomized. In the simplest case of treatment and control, patients may be given two different regimes: control followed by treatment or treatment followed by control. These may not give the same results, e.g. there may be a long-term carry-over effect or time trend which makes treatment followed by control show less of a difference than control followed by treatment. Subjects are, therefore, assigned to a given order at random. It is possible in the analysis of cross-over studies to estimate the size of any carry-over effects which may be present.
As an example of the advantages of a cross-over trial, consider a trial of pronethalol in the treatment of angina pectoris (Pritchard et al. 1963). Angina pectoris is a chronic disease characterized by attacks of acute pain. Patients in this trial received either pronethalol or an inert control treatment (or placebo, see §2.8) in four periods of two weeks, two periods on the drug and two on the control treatment. These periods were in random order. The outcome measure was the number of attacks of angina experienced. These were recorded by the patient in a diary. Twelve patients took part in the trial. The results are shown in Table 2.9. The advantage in favour of pronethalol is shown by 11 of the 12 patients reporting fewer attacks of pain while on pronethalol than while on the control treatment. If we had obtained the same data from two separate groups of patients instead of the same group under two conditions, it would be far from clear that pronethalol is superior because of the huge variation between subjects. Using a two group design, we would need a much larger sample of patients to demonstrate the efficacy of the treatment.
Table 2.9. Results of a trial of pronethalol for the treatment of angina pectoris (Pritchard et al. 1963)
                   Number of attacks while on      Difference
Patient number     Placebo       Pronethalol       placebo - pronethalol
1                  71            29                42
2                  323           348               -25
3                  8             1                 7
4                  14            7                 7
5                  23            16                7
6                  34            25                9
7                  79            65                14
8                  60            41                19
9                  2             0                 2
10                 3             0                 3
11                 17            15                2
12                 7             2                 5
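The within-patient comparison in Table 2.9 can be reproduced in Python. The sketch below recomputes the differences and counts the patients who did better on pronethalol. It then applies an exact sign test. The sign test is used here only as one simple way of putting a probability on ‘11 of 12’; it is an assumption of this sketch rather than an analysis given in this section, and the function name `sign_test_two_sided` is made up for the example.

```python
from math import comb

# Attacks of angina for patients 1 to 12, from Table 2.9.
placebo     = [71, 323, 8, 14, 23, 34, 79, 60, 2, 3, 17, 7]
pronethalol = [29, 348, 1, 7, 16, 25, 65, 41, 0, 0, 15, 2]

# Each patient acts as her or his own control.
differences = [p - d for p, d in zip(placebo, pronethalol)]
fewer_on_drug = sum(d > 0 for d in differences)
n = sum(d != 0 for d in differences)  # ties would be dropped; there are none here

def sign_test_two_sided(k, n):
    """Exact two-sided P for k positive signs out of n when both signs are equally likely."""
    tail = min(k, n - k)
    p_one = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one)

print(differences)
print(fewer_on_drug, n)                                 # 11 12
print(round(sign_test_two_sided(fewer_on_drug, n), 3))  # 0.006
```

The calculation uses only the sign of each difference, so the very large counts for patient 2 carry no more weight than any other patient's.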
Cross-over designs can be useful for laboratory experiments on animals or human volunteers. They should only be used in clinical trials where the treatment will not affect the course of the disease and where the patient's condition would not change appreciably over the course of the trial. A cross-over trial could be used to compare different treatments for the control of arthritis or asthma, for example, but not to compare different regimes for the management of myocardial infarction. However, a cross-over trial cannot be used to demonstrate the long-term action of a treatment, as the nature of the design means that the treatment period must be limited. As most treatments of chronic disease must be used by the patient for a long time, a two sample trial of long duration is usually required to investigate fully the effectiveness of the treatment. Pronethalol, for example, was later found to have quite unacceptable side effects in long term use.
For more on cross-over trials, see Senn (1993) and Jones and Kenward (1989).
2.7 Selection of subjects for clinical trials
I have discussed the allocation of subjects to treatments at some length, but we have not considered where they come from. The way in which subjects are selected for an experiment may have an effect on its outcome. In practice, we are usually limited to subjects which are easily available to us. For example, in an animal experiment we must take the latest batch from the animal house. In a clinical trial of the treatment of myocardial infarction, we must be content with patients who are brought into the hospital. In experiments on human volunteers we sometimes have to use the researchers themselves.
As we shall see more fully in Chapter 3, this has important consequences for the interpretation of results. In trials of myocardial infarction, for example, we would not wish to conclude that, say, the survival rate with a new treatment in a trial in London would be the same as in a trial in Edinburgh. The patients may have a different history of diet, for example, and this may have a considerable effect on the state of their arteries and hence on their prognosis. Indeed, it would be very rash to suppose that we would get the same survival rate in a hospital a mile down the road. What we rely on is the comparison between randomized groups from the same population of subjects, and hope that if a treatment reduces mortality in London it will also do so in Edinburgh. This may be a reasonable supposition, and effects which appear in one population are likely to appear in another, but it cannot be proved on statistical grounds alone. Sometimes in extreme cases it turns out not to be true. BCG vaccine has been shown, by large, well conducted randomized trials, to be effective in reducing the incidence of tuberculosis in children in the UK. However, in India it appears to be far less effective (Lancet 1980). This may be because the amount of exposure to tuberculosis is so different in the two populations.
Given that we can use only the experimental subjects available to us, there are some principles which we use to guide our selection from them. As we shall see later, the lower the variability between the subjects in an experiment is, the better chance we have of detecting a treatment difference if it exists. This means that uniformity is desirable in our subjects. In an animal experiment this can be achieved by using animals of the same strain raised under controlled conditions. In a clinical trial we usually restrict our attention to patients of a defined age group and severity of disease. The Salk vaccine trial (§2.4) only used children in one school year. In the streptomycin trial (§2.2) the subjects were restricted to patients with acute bilateral pulmonary tuberculosis, bacteriologically proved, aged between 15 and 30 years, and unsuitable for other current therapy. Even with this narrow definition there was considerable variation among the patients, as Tables 2.5 and 2.6 show. Tuberculosis had to be bacteriologically proved because it is important to make sure that everyone has the disease we wish to treat. Patients with a different disease are not only potentially being wrongly treated themselves, but may make the results difficult to interpret. Restricting attention to a particular subset of patients, though useful, can lead to difficulties. For example, a treatment shown to be effective and safe in young people may not necessarily be so in the elderly. Trials have to be carried out on the sort of patients it is proposed to treat.
2.8 Response bias and placebos
The knowledge that she or he is being treated may alter a patient's response to treatment. This is called the placebo effect. A placebo is a pharmacologically inactive treatment given as if it were an active treatment. This effect may take many forms, from a desire to please the doctor to measurable biochemical changes in the brain. Mind and body are intimately connected, and unless the psychological effect is actually part of the treatment we usually try to eliminate such factors from treatment comparisons. This is particularly important when we are dealing with subjective assessments, such as of pain or well-being.
Fig. 2.1. Pain relief in relation to drug and to colour of placebo (after Huskisson 1974)
A fascinating example of the power of the placebo effect is given by Huskisson (1974). Three active analgesics, aspirin, Codis and Distalgesic, were compared with an inert placebo. Twenty-two patients each received the four treatments in a cross-over design. The patients reported pain relief on a four point scale, from 0 = no relief to 3 = complete relief. All the treatments produced some pain relief, maximum relief being experienced after about two hours (Figure 2.1). The three active treatments were all superior to placebo, but not by very much. The four drug treatments were given in the form of tablets identical in shape and size, but each drug was given in four different colours. This was done so that patients could distinguish the drugs received, to say which they preferred. Each patient received four different colours, one for each drug, and the colour combinations were allocated randomly. Thus some patients received red placebos, some blue, and so on. As Figure 2.1 shows, red placebos were markedly more effective than other colours, and were just as effective as the active drugs! This study demonstrates not only the effect of a pharmacologically inert placebo in producing reported pain relief, but also the wide variability and unpredictability of this response. We must clearly take account of this in trial design. Incidentally, we should not conclude that red placebos always work best. There is, for example, some evidence that patients being treated for anxiety prefer tablets to be in a soothing green, and depressive symptoms respond best to a lively yellow (Schapira et al. 1970).
In any trial involving human subjects it is desirable that the subjects should not be able to tell which treatment is which. In a study to compare two or more treatments this should be done by making the treatments as similar as possible. Where subjects are to receive no treatment an inactive placebo should be used if possible. Sometimes when two very different active treatments are compared a double placebo or double dummy can be used. For example, when comparing a drug given as a single dose with a drug taken daily for seven days, subjects on the single dose drug may receive a daily placebo and those on the daily dose a single placebo at the start.
Placebos are not always possible or ethical. In the MRC trial of streptomycin, where the treatment involved several injections a day for several months, it was not regarded as ethical to do the same with an inert saline solution and no placebo was given. In the Salk vaccine trial, the inert saline injections were placebos. It could be argued that paralytic polio is not likely to respond to psychological influences, but how could we be really sure of this? The certain knowledge that a child had been vaccinated may have altered the risk of exposure to infection as parents allowed the child to go swimming, for example. Finally, the use of a placebo may also reduce the risk of assessment bias as we shall see in §2.9.
2.9 Assessment bias and double blind studies
The response of subjects is not the only thing affected by knowledge of the treatment. The assessment by the researcher of the response to treatment may also be influenced by the knowledge of the treatment.
Some outcome measures do not allow for much bias on the part of the assessor. For example, if the outcome is survival or death, there is little possibility that unconscious bias may affect the observation. However, if we are interested in an overall clinical impression of the patient's progress, or in changes in an X-ray picture, the measurement may be influenced by our desire (or otherwise) that the treatment should succeed. It is not enough to be aware of this danger and allow for it, as we may have the similar problem of ‘bending over backwards to be fair’. Even such an apparently objective measure as blood pressure can be influenced by the expectations of the experimenter, and special measuring equipment has been devised to avoid this (Rose et al. 1964).
We can avoid the possibility of such bias by using blind assessment, that is, the assessor does not know which treatment the subject is receiving. If a clinical trial cannot be conducted in such a way that the clinician in charge does not know the treatment, blind assessment can still be carried out by an external assessor. When the subject does not know the treatment and blind assessment is used, the trial is said to be double blind. (Researchers on eye disease hate the terms ‘blind’ and ‘double blind’, preferring ‘masked’ and ‘double masked’ instead.)
Placebos may be just as useful in avoiding assessment bias as for response bias. The subject is unable to tip the assessor off as to treatment, and there is likely to be less material evidence to indicate to an assessor what it is. In the anticoagulant study by Carleton et al. (1960) described above, the treatment was supplied through an intravenous drip. Control patients had a dummy drip set up, with a tube taped to the arm but no needle inserted, primarily to avoid assessment bias. In the Salk trial, the injections were coded and the code for a case was only broken after the decision had been made as to whether the child had polio and if so of what severity.
In the streptomycin trial, one of the outcome measures was radiological change. X-ray plates were numbered and then assessed by two radiologists and a clinician, none of whom knew to which patient and treatment the plate belonged. The assessment was done independently, and they only discussed a plate if they had not all come to the same conclusion. Only when a final decision had been arrived at was the link between plate and patient made. The results are shown in Table 2.10. The clear advantage of streptomycin is shown in the considerable improvement of over half the S group, compared to only 8% of the controls.
Table 2.10. Assessment of radiological appearance at six months as compared with appearance on admission (MRC 1948)
Radiological assessment               S group          C group
Considerable improvement              28    51%        4     8%
Moderate or slight improvement        10    18%        13    25%
No material change                    2     4%         3     6%
Moderate or slight deterioration      5     9%         12    23%
Considerable deterioration            6     11%        6     11%
Deaths                                4     7%         14    27%
Total                                 55    100%       52    100%
2.10* Laboratory experiments
So far we have looked at clinical trials, but exactly the same principles apply to laboratory research on animals. It may well be that in this area the principles of randomization are not so well understood and even more critical attention is needed from the reader of research reports. One reason for this may be that great effort has been put into producing genetically similar animals, raised in conditions as close to uniform as is practicable. The researcher using such animals as subjects may feel that the resulting animals show so little biological variability that any natural differences between them will be dwarfed by the treatment effects. This is not necessarily so, as the following examples illustrate.
A colleague was looking at the effect of tumour growth on macrophage counts in rats. The only significant difference was between the initial values in tumour induced and non-induced rats, that is, before the tumour-inducing treatment was given. There was a simple explanation for this surprising result. The original design had been to give the tumour-inducing treatment to each of a group of rats. Some would develop tumours and others would not, and then the macrophage counts would be compared between the two groups thus defined. In the event, all the rats developed tumours. In an attempt to salvage the experiment my colleague obtained a second batch of animals, which he did not treat, to act as controls. The difference between the treated and untreated animals was thus due to differences in parentage or environment, not to treatment.
That problem arose by changing the design during the course of the experiment. Problems can also arise from ignoring randomization in the design of a comparative experiment. Another colleague wanted to know whether a treatment would affect weight gain in mice. Mice were taken from a cage one by one and the treatment given, until half the animals had been treated. The treated animals were put into smaller cages, five to a cage, which were placed together in a constant environment chamber. The control mice were in cages also placed together in the constant environment chamber. When the data were analysed, it was discovered that the mean initial weight was greater in the treated animals than in the control group. In a weight gain experiment this could be quite important! Perhaps larger animals were easier to pick up, and so were selected first. What that experimenter should have done was to place the mice in the boxes, give each box a place in the constant environment chamber, then allocate the boxes to treatment or control at random. We would then have two groups which were comparable, both in initial values and in any environmental differences which may exist in the constant environment chamber.
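The recommended procedure, in which whole boxes rather than individual mice are allocated at random, might be sketched in Python as follows. The function name `allocate_boxes`, the box numbering and the seed are assumptions made for the example; in practice the boxes would already be filled and in position before the allocation is drawn.

```python
import random

def allocate_boxes(box_ids, rng):
    """Randomly split boxes into equal-sized treatment and control groups."""
    boxes = list(box_ids)
    if len(boxes) % 2:
        raise ValueError("need an even number of boxes for equal groups")
    rng.shuffle(boxes)               # every ordering of the boxes is equally likely
    half = len(boxes) // 2
    return {"treatment": sorted(boxes[:half]), "control": sorted(boxes[half:])}

rng = random.Random(7)  # fixed seed only so that the example is reproducible
print(allocate_boxes(range(1, 9), rng))
```

Because the allocation depends only on the shuffle, neither the size of the animals nor the position of a box in the chamber can influence which treatment it receives.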
2.11* Experimental units
In the weight gain experiment described above, each box of mice contained five animals. These animals were not independent of one another, but interacted. In a box the other four animals formed part of the environment of the fifth, and so might influence its growth. The box of five mice is called an experimental unit. An experimental unit is the smallest group of subjects in an experiment whose response cannot be affected by other subjects. We need to know the amount of natural variation which exists between experimental units before we can decide whether the treatment effect is distinguishable from this natural variation. In the weight gain experiment, the mean weight gain in each box should be calculated, and the mean difference estimated using the two-sample t method (§10.3). In human studies, the same thing happens when groups of patients, such as all those in a hospital ward or a general practice, are randomized as a group. This might happen in a trial of health promotion, for example, where special clinics are advertised and set up in GP surgeries. It would be impractical to exclude some patients from the clinic and impossible to prevent patients from the practice interacting with and influencing one another. All the practice patients must be treated as a single unit. Trials where experimental units contain more than one subject are called cluster randomized.
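The analysis by experimental unit can be sketched in Python: first reduce each box to a single value, its mean weight gain, then compare the two sets of box means. The sketch assumes the pooled-variance form of the two-sample t statistic. The function names are made up for the example, and the weight gains are invented purely for illustration; they are not data from the experiment described.

```python
from math import sqrt
from statistics import mean, variance

def unit_means(units):
    """One summary value (the mean) per experimental unit, e.g. per box of mice."""
    return [mean(values) for values in units]

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df

# INVENTED weight gains (g), for illustration only: three boxes of five mice per group.
treated_boxes = [[5.1, 4.8, 5.6, 5.0, 5.3], [4.2, 4.9, 4.4, 4.6, 4.5], [5.8, 5.5, 6.0, 5.7, 5.4]]
control_boxes = [[3.9, 4.1, 4.4, 4.0, 3.8], [4.6, 4.3, 4.8, 4.5, 4.7], [3.5, 3.9, 3.6, 3.7, 3.4]]

t, df = two_sample_t(unit_means(treated_boxes), unit_means(control_boxes))
print(f"t = {t:.2f} on {df} degrees of freedom")  # df is 4 (6 boxes), not 28 (30 mice)
```

The degrees of freedom reflect the number of boxes, not the number of animals, which is the practical consequence of treating the box as the experimental unit.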
The question of the experimental unit arises when the treatment is applied to the provider of care rather than to the patient directly. For example, White et al. (1989) compared three randomly allocated groups of GPs, the first given an intensive programme of small group education to improve their treatment of asthma, the second a lesser intervention, and the third no intervention at all. For each GP, a sample of her or his asthmatic patients was selected. These patients received questionnaires about their symptoms, the research hypothesis being that the intensive programme would result in fewer symptoms among their patients. The experimental unit was the GP, not the patient. The asthma patients treated by an individual GP were used to monitor the effect of the intervention on that GP. The proportion of patients who reported symptoms was used as a measure of the GP's effectiveness, and the mean of these proportions was compared between the groups using one-way analysis of variance (§10.9). Another example would be a trial of population screening for a disease (§15.3), where screening centres were set up in some health districts and not in others. We should find the mortality rate for each district separately and then compare the mean rate in the group of screening districts with that in the group of control districts.
The most extreme case arises when there is only one experimental unit per treatment. For example, consider a health education experiment involving two schools. In one school a special health education programme was mounted, aimed to discourage children from smoking. Both before and afterwards, the children in each school completed questionnaires about cigarette smoking. In this example the school is the experimental unit. There is no reason to suppose that two schools should have the same proportion of smokers among their pupils, or that two schools which do have equal proportions of smokers will remain so. The experiment would be much more convincing if we had several schools and randomly allocated them to receive the health education programme or to be controls. We would then look for a consistent difference between the treated and control schools, using the proportion of smokers in the school as the variable.
2.12* Consent in clinical trials
I started my research career in agriculture. Our experimental subjects, being barley plants, had no rights. We sprayed them with whatever chemicals we chose and burnt them after harvest and weighing. We cannot treat human subjects in the same way. We must respect the rights of our research subjects and their welfare must be our primary concern. This has not always been the case, most notoriously in the Nazi death camps (Leaning 1996). The Declaration of Helsinki (BMJ 1996a), which lays down the principles which govern research on human subjects, grew out of the trials in Nuremberg of the perpetrators of these atrocities (BMJ 1996b).
If there is a treatment, we should not leave patients untreated if this in any way affects their well-being. The world was rightly outraged by the Tuskegee Study, where men with syphilis were left untreated to see what the long-term effects of the disease might be (Brawley 1998, Ramsay 1998). This is an extreme example but it is not the only one. Women with dysplasia found at cervical cytology have been left untreated to see whether cancer developed (Mudur 1997). Patients are still being asked to enter pharmaceutical trials where they may get a placebo, even though an effective treatment is available, allegedly because regulators insist on it.
People should not be treated without their consent. This general principle is not confined to research. Patients should also be asked whether they wish to take part in a research project and whether they agree to be randomized. They should know to what they are consenting, and usually recruits to clinical trials are given information sheets which explain to them randomization, the alternative treatments, and the possible risks and benefits. Only then can they give informed or valid consent. For children who are old enough to understand, both child and parent should be informed and give their consent, otherwise parents must consent (Doyal 1997). People get very upset and angry if they think that they have been experimented on without their knowledge and consent, or if they feel that they have been tricked into it without being fully informed. A group of women with cervical cancer were given an experimental radiation treatment, which resulted in severe damage, without proper information (Anon 1997). They formed a group which they called RAGE, which speaks for itself.
Patients are sometimes recruited into trials when they are very distressed and very vulnerable. If possible they should have time to think about the trial and discuss it with their family. Patients in trials are often not at all clear about what is going on and have wrong ideas about what is happening (Snowdon et al. 1997). They may be unable to recall giving their consent, and deny having given it. They should always be asked to sign consent forms and should be given a separate patient information sheet and a copy of the form to keep.
A difficulty arises with the randomized consent design (Zelen 1979, 1992). In this, we have a new, active treatment and either no control treatment or usual care. We randomize subjects to active or control. We then offer the new treatment to the active group, who may refuse, and the control group gets usual care. The active group is asked to consent to the new treatment and all subjects are asked to consent to any measurement required. They might be told that they are in a research study, but not that they have been randomized. Thus only patients in the active group can refuse the trial, though all can refuse measurement. Analysis is then by intention to treat (§2.5). For example, Dennis et al. (1997) wanted to evaluate a stroke family care worker. They randomized patients without their knowledge, then asked them to consent to follow-up consisting of interviews by a researcher. The care worker visited those patients and their families who had been randomized to her. McLean (1997) argued that if patients could not be informed about the randomization without jeopardizing the trial, the research should not be done. Dennis (1997) argued that to ask for consent to randomization might bias the results, because patients who did not receive the care worker might be resentful and be harmed by this. My own view is that we should not allow one ethical consideration, informed consent, to outweigh all others and this design can be acceptable (Bland 1997).
There is a special problem in cluster randomized trials. Patients cannot consent to randomization, but only to treatment. In a trial where general practices are allocated to offer health checks, for example, patients can consent to the health checks only if they are in a health check practice, though all would have to consent to an end of trial assessment.
Research on human subjects should always be approved by an independent ethics committee, whose role is to represent the interests of the research subject. Where such a system is not in place, terrible things can happen. In the USA, research can be carried out without ethical approval if the subjects are private patients in a private hospital without any public funding, and no new drug or device is used. Under these circumstances, plastic surgeons carried out a trial comparing two methods of performing face-lifts, one on each side of the face, without patients' consent (Bulletin of Medical Ethics 1998).
2M Multiple choice questions 1 to 6
(Each branch is either true or false)
1. When testing a new medical treatment, suitable control groups include patients who:
(a) are treated by a different doctor at the same time;
(b) are treated in a different hospital;
(c) are not willing to receive the new treatment;
(d) were treated by the same doctor in the past;
(e) are not suitable for the new treatment.
2. In an experiment to compare two treatments, subjects are allocated using random numbers so that:
(a) the sample may be referred to a known population;
(b) when deciding to admit a subject to the trial, we do not know which treatment that subject would receive;
(c) the subjects will get the treatment best suited to them;
(d) the two groups will be similar, apart from treatment;
(e) treatments may be assigned according to the characteristics of the subject.
3. In a double blind clinical trial:
(a) the patients do not know which treatment they receive;
(b) each patient receives a placebo;
(c) the patients do not know that they are in a trial;
(d) each patient receives both treatments;
(e) the clinician making assessment does not know which treatment the patient receives.
4. In a trial of a new vaccine, children were assigned at random to a ‘vaccine’ and a ‘control’ group. The ‘vaccine’ group were offered vaccination, which two-thirds accepted. The control group were offered nothing:
(a) the group which should be compared to the controls is all children who accepted vaccination;
(b) those refusing vaccination should be included in the control group;
(c) the trial is double blind;
(d) those refusing vaccination should be excluded;
(e) the trial is useless because not all the treated group were vaccinated.
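Table 2.11. Method of delivery in the KYM study

| Method of delivery | Accepted KYM, % | Accepted KYM, n | Refused KYM, % | Refused KYM, n | Control women, % | Control women, n |
|---|---|---|---|---|---|---|
| Normal | 80.7 | 352 | 69.8 | 30 | 74.8 | 354 |
| Instrumental | 12.4 | 54 | 14.0 | 6 | 17.8 | 84 |
| Caesarian | 6.9 | 30 | 16.3 | 7 | 7.4 | 35 |
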
5. Cross-over designs for clinical trials:
(a) may be used to compare several treatments;
(b) involve no randomization;
(c) require fewer patients than do designs comparing independent groups;
(d) are useful for comparing treatments intended to alleviate chronic symptoms;
(e) use the patient as his own control.
6. Placebos are useful in clinical trials:
(a) when two apparently similar active treatments are to be compared;
(b) to guarantee comparability in non-randomized trials;
(c) because the fact of being treated may itself produce a response;
(d) because they may help to conceal the subject's treatment from assessors;
(e) when an active treatment is to be compared to no treatment.
2E Exercise: The ‘Know Your Midwife’ trial
The Know Your Midwife (KYM) scheme was a method of delivering maternity care for low-risk women. A team of midwives ran a clinic, and the same midwife would give all antenatal care for a mother, deliver the baby, and give postnatal care. The KYM scheme was compared to standard antenatal care in a randomized trial (Flint and Poulengeris 1986). It was thought that the scheme would be very attractive to women and that if they knew it was available they might be reluctant to be randomized to standard care. Eligible women were randomized without their knowledge to KYM or to the control group, who received the standard antenatal care provided by St. George's Hospital. Women randomized to KYM were sent a letter explaining the KYM scheme and inviting them to attend. Some women declined and attended the standard clinic instead. The mode of delivery for the women is shown in Table 2.11. Normal obstetric data were recorded on all women, and the women were asked to complete questionnaires (which they could refuse) as part of a study of antenatal care, though they were not told about the trial.
1. The women knew what type of care they were receiving. What effect might this have on the outcome?
2. What comparison should be made to test whether KYM has any effect on method of delivery?
3. Do you think it was ethical to randomize women without their knowledge?
3
Sampling and observational studies
3.1 Observational studies
In this chapter we shall be concerned with observational studies. Instead of changing something and observing the result, as in an experiment or clinical trial, we observe the existing situation and try to understand what is happening. Most medical studies are observational, including research into human biology in healthy people, the natural history of disease, the causes and distribution of disease, the quality of measurement, and the process of medical care.
One of the most important and difficult tasks in medicine is to determine the causes of disease, so that we may devise methods of prevention. We are working in an area where experiments are often neither possible nor ethical. For example, to determine that cigarette smoking caused cancer, we could imagine a study in which children were randomly allocated to a ‘twenty cigarettes a day for fifty years’ group and a ‘never smoke in your life’ group. All we would have to do then would be to wait for the death certificates. However, we could not persuade our subjects to stick to the treatment and deliberately setting out to cause cancer is hardly ethical. We must therefore observe the disease process as best we can, by watching people in the wild rather than under laboratory conditions.
We can never come to an unequivocal conclusion about causation in observational studies. The disease effect and possible cause do not exist in isolation but in a complex interplay of many intervening factors. We must do our best to assure ourselves that the relationship we observe is not the result of some other factor acting on both ‘cause’ and ‘effect’. For example, it was once thought that the African fever tree, the yellow-barked acacia, caused malaria, because those unwise enough to camp under them were likely to develop the disease. This tree grows by water where mosquitos breed, and provides an ideal day-time resting place for these insects, whose bite transmits the plasmodium parasite which produces the disease. It was the water and the mosquitos which were the important factors, not the tree. Indeed, the name ‘malaria’ comes from a similar incomplete observation. It means ‘bad air’ and comes from the belief that the disease was caused by the air in low-lying, marshy places, where the mosquitos bred. Epidemiological study designs must try to deal with the complex interrelationships between different factors in order to deduce the true mechanism of disease causation. We also use a number of different approaches to the study of these problems, to see whether all produce the same answer.
There are many problems in interpreting observational studies, and the medical consumer of such research must be aware of them. We have no better way to tackle many questions and so we must make the best of them and look for consistent relationships which stand up to the most severe examination. We can also look for confirmation of our findings indirectly, from animal models and from dose-response relationships in the human population. However, we must accept that perfect proof is impossible and it is unreasonable to demand it. Sometimes, as with smoking and health, we must act on the balance of the evidence.
We shall start by considering how to get descriptive information about populations in which we are interested. We shall go on to the problem of using such information to study disease processes and the possible causes of disease.
3.2 Censuses
One simple question we can ask about any group of interest is how many members it has. For example, we need to know how many people live in a country and how many of them are in various age and sex categories, in order to monitor the changing pattern of disease and to plan medical services. We can obtain this information by a census. In a census, the whole of a defined population is counted. In the United Kingdom, as in many developed countries, a population census is held every ten years. This is done by dividing the entire country into small areas called enumeration districts, usually containing between 100 and 200 households. It is the responsibility of an enumerator to identify every household in the district and ensure that a census form is completed, listing all members of the household and providing a few simple pieces of information. Even though completion of the census form is compelled by law, and enormous effort goes into ensuring that every household is included, there are undoubtedly some who are missed. The final data, though extremely useful, are not totally reliable.
The medical profession takes part in a massive, continuing census of deaths, by providing death certificates for each death which occurs, including not only the name of the deceased and cause of death, but also details of age, sex, place of residence and occupation. Census methods are not restricted to national populations. They can be used for more specific administrative purposes too. For example, we might want to know how many patients are in a particular hospital at a particular time, how many of them are in different diagnostic groups, in different age/sex groups, and so on. We can then use this information together with estimates of the death and discharge rates to estimate how many beds these patients will occupy at various times in the future (Bewley et al. 1975, 1981).
3.3 Sampling
A census of a single hospital can only give us reliable information about that hospital. We cannot easily generalize our results to hospitals in general. If we want to obtain information about the hospitals of the United Kingdom, two courses are open to us: we can study every hospital, or we can take a representative sample of hospitals and use that to draw conclusions about hospitals as a whole.
Most statistical work is concerned with using samples to draw conclusions about some larger population. In the clinical trials described in Chapter 2, the patients act as a sample from a larger population consisting of all similar patients and we do the trial to find out what would happen to this larger group were we to give them a new treatment.
The word ‘population’ is used in common speech to mean ‘all the people living in an area’, frequently of a country. In statistics, we define the term more widely. A population is any collection of individuals in which we may be interested, where these individuals may be anything, and the number of individuals may be finite or infinite. Thus, if we are interested in some characteristics of the British people, the population is ‘all people in Britain’. If we are interested in the treatment of diabetes the population is ‘all diabetics’. If we are interested in the blood pressure of a particular patient, the population is ‘all possible measurements of blood pressure in that patient’. If we are interested in the toss of two coins, the population is ‘all possible tosses of two coins’. The first two examples are finite populations and could in theory if not practice be completely examined; the second two are infinite populations and could not. We could only ever look at a sample, which we will define as being a group of individuals taken from a larger population and used to find out something about that population.
How should we choose a sample from a population? The problem of getting a representative sample is similar to that of getting comparable groups of patients discussed in §2.1, 2, 3. We want our sample to be representative, in some sense, of the population. We want it to have all the characteristics in terms of the proportions of individuals with particular qualities as has the whole population. In a sample from a human population, for example, we want the sample to have about the same proportion of men and women as in the population, the same proportions in different age groups, in occupational groups, with different diseases, and so on. In addition, if we use a sample to estimate the proportion of people with a disease, we want to know how reliable this estimate is, how far from the proportion in the whole population the estimate is likely to be.
It is not sufficient to choose the most convenient group. For example, if we wished to predict the results of an election, we would not take as our sample people waiting in bus queues. These may be easy to interview, at least until the bus comes, but the sample would be heavily biased towards those who cannot afford cars and thus towards lower income groups. In the same way, if we wanted a sample of medical students we would not take the front two rows of the lecture theatre. They may be unrepresentative in having an unusually high thirst for knowledge, or poor eyesight.
How can we choose a sample which does not have a built-in bias? We might divide our population into groups, depending on how we think various characteristics will affect the result. To ask about an election, for example, we might group the population according to age, sex and social class. We then choose a number of people in each group by knocking on doors until the quota is made up, and interview them. Then, knowing the distributions of these categories in the population (from census data, etc.) we can get a far better picture of the views of the population. This is called quota sampling. In the same way we could try to choose a sample of rats by choosing given numbers of each weight, age, sex, etc. There are difficulties with this approach. First, it is rarely possible to think of all the relevant classifications. Second, it is still difficult to avoid bias within the classifications, by picking interviewees who look friendly, or rats which are easy to catch. Third, we can only get an idea of the reliability of findings by repeatedly doing the same type of survey, and of the representativeness of the sample by knowing the true population values (which we can actually do in the case of elections), or by comparing the results with a sample which does not have these drawbacks. Quota sampling can be quite effective when similar surveys are made repeatedly as in opinion polls or market research. It is less useful for medical problems, where we are continually asking new questions. We need a method where bias is avoided and where we can estimate the reliability of the sample from the sample itself. As in §2.2, we use a random method: random sampling.
3.4 Random sampling
The problem of obtaining a sample which is representative of a larger population is very similar to that of allocating patients into two comparable groups. We want a way of choosing members of the sample which does not depend on their own characteristics. The only way to be sure of this is to select them at random, so that whether or not each member of the population is chosen for the sample is purely a matter of chance.
For example, to take a random sample of 5 students from a class of 80, we could write all the names on pieces of paper, mix them thoroughly in a hat or other suitable container, and draw out five. All students have the same chance of being chosen, and so we have a random sample. All samples of 5 students are equally likely, too, because each student is chosen quite independently of the others. This method is called simple random sampling.
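The hat can be imitated on a computer. The sketch below is only an illustration: it assumes the students are identified by the numbers 1 to 80, uses Python's standard `random.sample`, which draws without replacement, and fixes an arbitrary seed purely so that the run can be repeated.

```python
import random

# Students are identified here by the numbers 1 to 80 (an assumption for illustration).
class_list = list(range(1, 81))

# A fixed, arbitrary seed makes the sketch repeatable; a real draw would not reuse it.
rng = random.Random(2024)

# Draw five distinct students, every set of five being equally likely.
sample = sorted(rng.sample(class_list, 5))
print(sample)
```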
As we have seen in §2.2, physical methods of randomizing are often not very suitable for statistical work. We usually use tables of random digits, such as Table 2.3, or random numbers generated by a computer program. We could use Table 2.3 to draw our sample of 5 from 80 students in several ways. For example, we could list the students, numbered from 1 to 80. This list from which the sample is to be drawn is called the sampling frame. We choose a starting point in the random number table (Table 2.3), say row 20, column 5. This gives us the following pairs of digits:
14 04 88 86 28 92 04 03 42 99 87 08
We could use these pairs of digits directly as subject numbers. We choose subjects numbered 14 and 4. There is no subject 88 or 86, so the next chosen is number 28. There is no 92, so the next is 4. We already have this subject in the sample, so we carry on to the next pair of digits, 03. The final member of the sample is number 42. Our sample of 5 students is thus numbers 3, 4, 14, 28 and 42.
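The procedure just described can be written out step by step. In the sketch below the function name and its arguments are hypothetical, and the digit string is simply the run of digits quoted above from Table 2.3. Pairs outside the range 1 to 80 and pairs already chosen are skipped, as in the text.

```python
def draw_from_digit_pairs(digits, population_size, sample_size):
    """Read a string of random digits two at a time and use each pair as a
    subject number, skipping numbers outside the sampling frame and duplicates."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return sorted(chosen)


# The digits quoted in the text (Table 2.3, row 20, column 5).
digits = "140488862892040342998708"
sample = draw_from_digit_pairs(digits, population_size=80, sample_size=5)
print(sample)  # [3, 4, 14, 28, 42]
```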
There appears to be some pattern in this sample. Two numbers are adjacent (3 and 4) and three are divisible by 14 (14, 28 and 42). Random numbers often appear to us to have pattern, perhaps because the human mind is always looking for it. On the other hand, if we try to make the sample ‘more random’ by replacing either 3 or 4 by a subject near the end of the list, we are imposing a pattern of uniformity on the sample and destroying its randomness. All groups of five are equally likely and may happen, even 1, 2, 3, 4, 5.
This method of using the table is fine for drawing a small sample, but it can be tedious for drawing large samples, because of the need to check for duplicates. There are many other ways of doing it. For example, we can drop the requirement for a sample of fixed size, and only require that each member of the population will have a fixed probability of being in the sample. We could draw a 5/80 = 1/16 sample of our class by using the digits in groups to give a decimal number, say,
0.1404 0.8886 0.2892 0.0403 0.4299 0.8708
We then choose the first member of the population if 0.1404 is less than 1/16. It is not, so we do not include this member, nor the second, corresponding to 0.8886, nor the third, corresponding to 0.2892. The fourth corresponds to 0.0403, which is less than 1/16 (0.0625) and so the fourth member is chosen as a member of the sample, and so on. This method is only suitable for fairly large samples, as the size of the sample obtained can be very variable in small sampling problems. In the example there is a higher than 1 in 10 chance of finishing with a sample of 2 or fewer.
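Both the selection rule and the final claim about sample size can be checked with a short sketch. It assumes that each of the 80 students is included independently with probability 1/16, so that the sample size follows a Binomial distribution with n = 80 and p = 1/16; the function name is hypothetical.

```python
from math import comb

p = 5 / 80  # inclusion probability, 1/16 = 0.0625

# The six decimal numbers quoted in the text, one per student in list order.
decimals = [0.1404, 0.8886, 0.2892, 0.0403, 0.4299, 0.8708]
chosen = [i for i, u in enumerate(decimals, start=1) if u < p]
print(chosen)  # [4]: only the fourth student is selected so far


def prob_at_most(k, n, p):
    """Binomial probability of obtaining k or fewer sample members."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


prob_small = prob_at_most(2, 80, p)
print(f"P(sample of 2 or fewer) = {prob_small:.3f}")
```

Under these assumptions the probability comes out a little above 0.1, in line with the statement in the text.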
As with random allocation (§2.2), random sampling is an operation ideally suited to computers. My free program Clinstat (§1.3) provides two random sampling schemes. Some computer programs for managing primary care practices actually have the capacity to take a random sample for any defined group of patients built in.
Random sampling ensures that the only ways in which the sample differs from the population will be those due to chance. It has a further advantage: because the sample is random, we can apply the methods of probability theory to the data obtained. As we shall see in Chapter 8, this enables us to estimate how far from the population value the sample value is likely to be.
The problem with random sampling is that we must have a list of the population from which the sample is to be drawn. Lists of populations may be hard to find, or they may be very cumbersome. For example, to sample the adult population in the UK, we could use the electoral roll. But a list of some 40 000 000 names would be difficult to handle, and in practice we would first take a random sample of electoral wards, and then a random sample of electors within these wards. This is, for obvious reasons, a multi-stage random sample. This approach contains the element of randomness, and so samples will be representative of the populations from which they are drawn. However, not all samples have an equal chance of being chosen, so it is not the same as simple random sampling.
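A two-stage version of this scheme might look like the following sketch. Everything in it is invented for illustration: the 50 wards, the 200 electors per ward, the identifiers and the function name are all hypothetical, and a real electoral roll would of course have wards of unequal size.

```python
import random

rng = random.Random(1)  # arbitrary seed so the sketch is repeatable

# Hypothetical frame: 50 wards, each with 200 elector identifiers.
wards = {
    f"ward_{w:02d}": [f"ward_{w:02d}/elector_{e:03d}" for e in range(1, 201)]
    for w in range(1, 51)
}


def multi_stage_sample(wards, n_wards, n_per_ward, rng):
    """Stage 1: random sample of wards. Stage 2: random sample of electors
    within each chosen ward."""
    chosen_wards = rng.sample(sorted(wards), n_wards)
    electors = []
    for ward in chosen_wards:
        electors.extend(rng.sample(wards[ward], n_per_ward))
    return chosen_wards, electors


chosen_wards, electors = multi_stage_sample(wards, n_wards=5, n_per_ward=10, rng=rng)
print(chosen_wards)
print(len(electors), "electors sampled")
```

A set of 50 electors spread over more than five wards can never be drawn by this scheme, which is one way to see that not all samples are equally likely.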
We can also carry out sampling without a list of the population itself, provided we have a list of some larger units which contain all the members of the population. For example, we can obtain a random sample of schoolchildren in an area by starting with a list of schools, which is quite easy to come by. We then draw a simple random sample of schools and all the children within our chosen schools form the sample of children. This is called a cluster sample, because we take a sample of clusters of individuals. Another example would be sampling from any age/sex group in the general population by taking a sample of addresses and then taking everyone at the chosen addresses who matched our criteria.
Sometimes it is desirable to divide the population into different strata, for example into age and sex groups, and take random samples within these. This is rather like quota sampling, except that within the strata we choose at random. If the different strata have different values of the quantity we are measuring, this stratified random sampling can increase our precision considerably. There are many complicated sampling schemes for use in different situations. For example, in a study of cigarette smoking and respiratory disease in Derbyshire schoolchildren, we drew a random sample of schools, stratified by school type (single-sex/mixed, selective/non-selective, etc.). Some schools which took children to age 13 then fed into the same 14+ school were combined into one sampling unit. Our sample of children was all children in the chosen schools who were in their first secondary school year (Banks et al. 1978). We thus had a stratified random cluster sample. These sampling methods affect the estimate obtained. Stratification improves the precision, cluster sampling worsens it. The sampling scheme should be taken into account in the analysis (Cochran 1977, Kish 1994). Often it is ignored, as was done by Banks et al. (1978) (that is, by me), but it should not be and results may be reported as being more precise than they really are.
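Stratified random sampling can be sketched as follows. The population of 120 schools, the two school types and the 10% sampling fraction are all invented for illustration and are not taken from the Derbyshire study; the function name is hypothetical. The same fraction is sampled at random within each stratum.

```python
import random
from collections import defaultdict

rng = random.Random(7)  # arbitrary seed so the sketch is repeatable

# Hypothetical frame of 120 schools: every third one single-sex, the rest mixed.
schools = [(i, "single-sex" if i % 3 == 0 else "mixed") for i in range(1, 121)]


def stratified_sample(units, stratum_of, fraction, rng):
    """Group units into strata, then draw a simple random sample of the same
    fraction within each stratum (at least one unit per stratum)."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, max(1, round(fraction * len(members))))
        for name, members in strata.items()
    }


sample = stratified_sample(schools, stratum_of=lambda s: s[1], fraction=0.1, rng=rng)
for name, chosen in sample.items():
    print(name, [school_id for school_id, _ in chosen])
```

Taking all the children in each chosen school would then turn this into a stratified random cluster sample of children.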
In §2.3 I looked at the difficulties which can arise using methods of allocation which appear random but do not use random numbers. In sampling, two such methods are often suggested by researchers. One is to take every tenth subject from the list, or whatever fraction is required. The other is to use the last digit of some reference number, such as the hospital number, and take as the sample subjects where this is, say, 3 or 4. These sampling methods are systematic or quasi-random. It is not usually obvious why they should not give ‘random’ samples, and it may be that in many cases they would be just as good as random sampling. They are certainly easier. To use them, we must be very sure that there is no pattern to the list which could produce an unrepresentative group. If it is possible, random sampling seems safer.
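The danger of a pattern in the list can be shown with a deliberately artificial case. The sketch assumes a hypothetical list of 200 patients arranged ward by ward, ten beds to a ward, with the most severely ill patient always in bed 1. Taking every tenth patient then picks the same bed in every ward.

```python
import random

# Hypothetical list: 20 wards of 10 beds, listed ward by ward.
# By construction, the patient in bed 1 of each ward is severely ill.
patients = [
    {"ward": ward, "bed": bed, "severe": bed == 1}
    for ward in range(1, 21)
    for bed in range(1, 11)
]


def systematic_sample(items, step, start):
    """Take every step-th item, beginning at index start."""
    return items[start::step]


def proportion_severe(sample):
    return sum(p["severe"] for p in sample) / len(sample)


true_proportion = proportion_severe(patients)
from_bed_1 = proportion_severe(systematic_sample(patients, step=10, start=0))
from_bed_6 = proportion_severe(systematic_sample(patients, step=10, start=5))

rng = random.Random(3)  # arbitrary seed so the sketch is repeatable
random_estimate = proportion_severe(rng.sample(patients, 20))

print(true_proportion, from_bed_1, from_bed_6, random_estimate)
```

Depending on the starting point, the systematic sample consists entirely of severely ill patients or contains none at all, although the true proportion is one in ten. A random sample of the same size is not tied to the layout of the list in this way.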
Volunteer bias can be as serious a problem in sampling studies as it is in trials (§2.4). If we can only obtain data from a subset of our random sample, then this subset will not be a random sample of the population. Its members will be self selected. It is often very difficult to get data from every member of a sample. The proportion for whom data are obtained is called the response rate and in a sample survey of the general population is likely to be between 70% and 80%. The possibility that those lost from the sample are different in some way must be considered. For example, they may tend to be ill, which can be a serious problem in disease prevalence studies. In the school study of Banks et al. (1978), the response rate was 80%, most of those lost being absent from school on the day. Now, some of these absentees were ill and some were truants. Our sample may thus lead us to underestimate the prevalence of respiratory symptoms, by omitting sufferers with current acute disease, and the prevalence of cigarette smoking by omitting those who have gone for a quick smoke behind the bike sheds.
One of the most famous sampling disasters, the Literary Digest poll of 1936, illustrates these dangers (Bryson 1976). This was a poll of voting intentions in the 1936 US presidential election, fought by Roosevelt and Landon. The sample was a complex one. In some cities every registered voter was included, in others one in two, and for the whole of Chicago one in three. Ten million sample ballots were mailed to prospective voters, but only 2.3 million, less than a quarter, were returned. Still, two million is a lot of Americans, and these predicted a 60% vote to Landon. In fact, Roosevelt won with 62% of the vote. The response was so poor that the sample was most unlikely to be representative of the population, no matter how carefully the original sample was drawn. Two million Americans can be wrong! It is not the mere size of the sample, but its representativeness which is important. Provided the sample is truly representative, 2000 voters is all you need to estimate voting intentions to within 2%, which is enough for election prediction if they tell the truth and do not change their minds (see §18E).
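The figure of about 2% for 2000 voters can be checked with a rough calculation. The sketch assumes a simple random sample, a true proportion near 50% (where the uncertainty is greatest) and the usual large-sample Normal approximation with a multiplier of 1.96; the function name is hypothetical, and the calculation in §18E may be set out differently.

```python
from math import sqrt


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from
    a simple random sample of size n."""
    return z * sqrt(p * (1 - p) / n)


for n in (2000, 2_300_000):
    print(f"n = {n}: about {100 * margin_of_error(n):.2f} percentage points")
```

For 2000 voters the margin is roughly 2 percentage points. For 2.3 million it is far smaller, yet the Literary Digest prediction was wrong by much more than that, because the calculation describes only chance variation in a representative sample and says nothing about bias from non-response.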
3.5 Sampling in clinical and epidemiological studies
Having extolled the virtues of random sampling and cast doubt on all other sampling methods, I must admit that most medical data are not obtained in this way. This is partly because the practical difficulties are immense. To obtain a reasonable sample of the population of the UK, anyone can get a list of electoral wards, take a random sample of them, buy copies of the electoral rolls for the chosen wards and then take a random sample of names from them. But suppose you want to obtain a sample of patients with carcinoma of the bronchus. You could get a list of hospitals easily enough and get a random sample of them, but then things would become difficult. The names of patients will only be released by the consultant in charge should he so wish, and you will need his permission before approaching them. Any study of human patients requires ethical approval, and you will need this from the ethics committee of each of your chosen hospitals. Getting the cooperation of so many people is a task to daunt the hardiest, and obtaining ethical approval alone can take more than a year. In the UK, we now have a system of multi-centre research ethics committees, but as local approval must also be obtained the delays may still be immense.
The result of this is that clinical studies are done on the patients to hand. I have touched on this problem in the context of clinical trials (§2.7) and the same applies to other types of clinical study. In a clinical trial we are concerned with the comparison of two treatments and we hope that the superior treatment in Stockport will also be the superior treatment in Southampton. If we are studying clinical measurement, we can hope that a measurement method which is repeatable in Middlesbrough will be repeatable in Maidenhead, and that two different methods giving similar results in one place will give similar results in another. Studies which are not comparative give more cause for concern. The natural history of a disease described in one place may differ in unpredictable ways from that in another, due to differences in the environment and the genetic makeup of the local population. Reference ranges for quantities of clinical interest, the limits within which values from most healthy people will lie, may well differ from place to place.
Studies based on local groups of patients are not without value. This is particularly so when we are concerned with comparisons between groups, as in a clinical trial, or relationships between different variables. However, we must always bear the limitations of the sampling method in mind when interpreting the results of such studies.
In general, most medical research has to be carried out using samples drawn from populations which are much more restricted than those about which we wish to draw conclusions. We may have to use patients in one hospital instead of all patients, or the population of a small area rather than that of the whole country or planet. We may have to rely on volunteers for studies of normal subjects, given most people's dislike of having needles pushed into them and disinclination to spend hours hooked up to batteries of instruments. Groups of normal subjects contain medical students, nurses and laboratory staff far more often than would be expected by chance. In animal research the problem is even worse, for not only does one batch of one strain of mice have to represent the whole species, it often has to represent members of a different order, namely humans.
Findings from such studies can only apply to the population from which the sample was drawn. Any conclusion which we come to about wider populations, such as all patients with the disease in question, depends on evidence which is not statistical and often unspecified, namely our general experience of natural variability and experience of similar studies. This may let us down, and results established in one population may not apply to another. We have seen this in the use of BCG vaccine in India (§2.7). It is very important wherever possible that studies should be repeated by other workers on other populations, so that we can sample the larger population at least to some extent.
In two types of study, case reports and case series, the subjects come before the research, as it is suggested by their existence. There is no sampling. They are used to raise questions rather than to answer them.
A case report is a description of a single patient whose case displays interesting features. This is used to generate ideas and raise questions, rather than to answer them. It clearly cannot be planned in advance; it arises from the case. For example, Velzeboer et al. (1997) reported the case of an 11-month-old Pakistani girl who was admitted to hospital because of drowsiness, malaise and anorexia. She had stopped crawling or standing up and scratched her skin continuously. All investigations were negative. Her 6-year-old sister was then brought in with similar symptoms. (Note that there are two patients here, but they are part of the same case.) The doctors guessed that exposure to mercury might be to blame. When asked, the mother reported that 2 weeks before the younger child's symptoms started, mercury from a broken thermometer had been dropped on the carpet in the children's room. Mercury concentration in a urine sample taken on admission was 12.6 µg/l (slightly above the accepted normal value of 10 µg/l). Exposure was confirmed by a high mercury concentration in her hair. After 3 months' treatment the symptoms had disappeared totally and urinary mercury had fallen below the detection limit of 1 µg/l. This case called into question the normal values for mercury in children.
A case series is similar to a case report, except that a number of similar cases have been observed. For example, Shaker et al. (1997) described 15 patients examined for hypocalcaemia or skeletal disease, in whom the diagnosis of coeliac disease was subsequently made. In 11 of them gastrointestinal symptoms were absent or mild. They concluded that bone loss may be a sign of coeliac disease and this diagnosis should be considered. The design does not allow them to draw any conclusions about how often this might happen. To do that they would have to collect data systematically, using a cohort design (§3.7) for example.
3.6 Cross-sectional studies
One possible approach to the sampling problem is the cross-sectional study. We take some sample or whole narrowly defined population and observe them at one point in time. We get poor estimates of means and proportions in any more general population, but we can look at relationships within the sample. For example, in an epidemiological study, Banks et al. (1978) gave questionnaires to all first year secondary school boys in a random sample of schools in Derbyshire (§3.4). Among boys who had never smoked, 3% reported a cough first thing in the morning, compared to 19% of boys who said that they smoked one or more cigarettes per week. The sample was representative of boys of this age in Derbyshire who answer questionnaires, but we want our conclusions to apply at least to the United Kingdom, if not the developed world or the whole planet. We argue that although the prevalence of symptoms and the strength of the relationship may vary between populations, the existence of the relationship is unlikely to occur only in the population studied.
We cannot conclude that smoking causes respiratory symptoms. Smoking and respiratory symptoms may not be directly related, but may both be related to some other factor. A factor related to both possible cause and possible effect is called confounding. For example, children whose parents smoke may be more likely to develop respiratory symptoms, because of passive inhalation of their parents' smoke, and also be more influenced to try smoking themselves. We can test this by looking separately at the relationship between the child's smoking and symptoms for those whose parents are not smokers, and for those whose parents are smokers. As Figure 3.1 shows, this relationship in fact persisted and there was no reason to suppose that a third causal factor was at work.
Fig. 3.1. Prevalence of self-reported morning cough in Derbyshire schoolboys, by their own and their parents' cigarette smoking (Bland et al. 1978)
Most diseases are not suited to this simple cross-sectional approach, because they are rare events. For example, lung cancer accounts for 9% of male deaths in the UK (OPCS, DH2 No. 7), and so is a very important disease. However the proportion of people who are known to have the disease at any given time, the prevalence, is quite low. Most deaths from lung cancer take place after the age of 45, so we will consider a sample of men aged 45 and over. The average remaining life span of these men, in which they could contract lung cancer, will be about 30 years. The average time from diagnosis to death is about a year, so of those who will contract lung cancer only 1/30 will have been diagnosed when the sample is drawn. Only 9% of the sample will develop lung cancer anyway, so the proportion with the disease at any time is 1/30 × 9% = 0.3% or 3 per thousand. We would need a very large sample indeed to get a worthwhile number of lung cancer cases.
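The arithmetic in this paragraph can be checked with a short sketch. The figures are the approximate ones quoted in the text; the variable names, the helper function and the target of 100 cases are assumptions made only for illustration.

```python
# Rough prevalence of diagnosed lung cancer among men aged 45 and over,
# using the approximate figures quoted in the text.

lifetime_risk = 0.09         # proportion of the sample who will develop lung cancer
years_with_diagnosis = 1.0   # average time from diagnosis to death (years)
remaining_lifespan = 30.0    # average remaining life span of the men sampled (years)

# Of those who will contract the disease, the fraction already diagnosed
# at the moment the sample is drawn.
fraction_diagnosed_now = years_with_diagnosis / remaining_lifespan

prevalence = fraction_diagnosed_now * lifetime_risk


def sample_size_for_cases(expected_cases, prevalence):
    """Sample size at which we would expect `expected_cases` prevalent cases."""
    return expected_cases / prevalence


print(f"Prevalence: {prevalence:.1%}, or {prevalence * 1000:.0f} per thousand")
# The target of 100 cases is an arbitrary illustrative choice, not from the text.
print(f"Men needed to expect 100 cases: {sample_size_for_cases(100, prevalence):,.0f}")
```

On these assumptions a cross-sectional sample would need tens of thousands of men to yield even a hundred cases, which is the sense in which the sample must be very large indeed.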
Cross-sectional designs are used in clinical studies also. For example, Rodin et al. (1998) studied polycystic ovary disease (PCO) in a random sample of Asian women from the lists of local general practices and from a local translating service. We found that 52% of the sample had PCO, very high compared to that found in other UK samples. However, this would not provide a good estimate for Asian women in general, because there may be many differences between this sample, such as their regions of origin, and Asian women living elsewhere. We also found that PCO women had higher fasting glucose levels than non-PCO women. As this is a comparison within the sample, it seems plausible to conclude that among Asian women PCO tends to be associated with raised glucose. We cannot say whether PCO raises glucose or whether raised glucose increases the risk of PCO, because they are measured at the same time.
Table 3.1. Standardized death rates per year per 1000 men aged 35 or more in relation to most recent amount smoked, 53 months follow-up (Doll and Hill 1956)

Death rate among non-smokers, smokers, and men smoking a daily average weight of tobacco of 1–14 g, 15–24 g or 25+ g:

| Cause of death | Non-smokers | Smokers | 1–14 g | 15–24 g | 25+ g |
|---|---|---|---|---|---|
| Lung cancer | 0.07 | 0.90 | 0.47 | 0.86 | 1.66 |
| Other cancer | 2.04 | 2.02 | 2.01 | 1.56 | 2.63 |
| Other respiratory | 0.81 | 1.13 | 1.00 | 1.11 | 1.41 |
| Coronary thrombosis | 4.22 | 4.87 | 4.64 | 4.60 | 5.99 |
| Other causes | 6.11 | 6.89 | 6.82 | 6.38 | 7.19 |
| All causes | 13.25 | 15.78 | 14.92 | 14.49 | 18.84 |
3.7 Cohort studies
One way of getting round the problem of the small proportion of people with the disease of interest is the cohort study. We take a group of people, the cohort, and observe whether they have the suspected causal factor. We then follow them over time and observe whether they develop the disease. This is a prospective design, as we start with the possible cause and see whether this leads to the disease in the future. It is also longitudinal, meaning that subjects are studied at more than one time. A cohort study usually takes a long time, as we must wait for the future event to occur. It involves keeping track of large numbers of people, sometimes for many years, and often very large numbers must be included in the sample to ensure sufficient numbers will develop the disease to enable comparisons to be made between those with and without the factor.
A noted cohort study of mortality in relation to cigarette smoking was carried out by Doll and Hill (1956). They sent a questionnaire to all members of the medical profession in the UK, who were asked to give their name, address, age and details of current and past smoking habits. The deaths among this group were recorded. Only 60% of doctors cooperated, so in fact the cohort does not represent all doctors. The results for the first 53 months are shown in Table 3.1.
The cohort represents doctors willing to return questionnaires, not people as a whole. We cannot use the death rates as estimates for the whole population, or even for all doctors. What we can say is that, in this group, smokers were far more likely than non-smokers to die from lung cancer. It would be surprising if this relationship were only true for doctors, but we cannot definitely say that this would be the case for the whole population, because of the way the sample has been chosen.
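One way to put a number on "far more likely" is to divide the smokers' death rate by the non-smokers' rate for each cause in Table 3.1. The sketch below does this; the ratio is offered only as an illustrative summary (it is not a calculation made in the text), and the names `death_rates` and `rate_ratio` are assumptions.

```python
# Death rates per year per 1000 men from Table 3.1 (Doll and Hill 1956).
# Each entry is (non-smokers, all smokers).
death_rates = {
    "Lung cancer": (0.07, 0.90),
    "Other cancer": (2.04, 2.02),
    "Other respiratory": (0.81, 1.13),
    "Coronary thrombosis": (4.22, 4.87),
    "Other causes": (6.11, 6.89),
    "All causes": (13.25, 15.78),
}


def rate_ratio(non_smokers, smokers):
    """Ratio of the smokers' death rate to the non-smokers' death rate."""
    return smokers / non_smokers


rate_ratios = {cause: rate_ratio(*rates) for cause, rates in death_rates.items()}

for cause, ratio in rate_ratios.items():
    print(f"{cause:<20} {ratio:5.2f}")
```

The lung cancer ratio comes out at about 13, while the ratios for the other causes are all much closer to 1. As the paragraph above stresses, these figures describe the doctors who returned questionnaires, not the population as a whole.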
We also have the problem of other intervening variables. Doctors were not allocated to be smokers or non-smokers as in a clinical trial; they chose for themselves. The decision to begin smoking may be related to many things (social factors, personality factors, genetic factors) which may also be related to lung cancer. We must consider these alternative explanations very carefully before coming to any conclusion about the causes of cancer. In this study there were no data to test such hypotheses.
The same technique is used, usually on a smaller scale, in clinical studies. For example, Casey et al. (1996) studied 55 patients with very severe rheumatoid arthritis affecting the spine and the use of all four limbs. These patients were operated on in an attempt to improve their condition and their subsequent progress was monitored. We found that only 25% had a favourable outcome. We could not conclude from this that surgery would be worthwhile in 25% of such patients generally. Our patients might have been particularly ill or unusually fit, our surgeons might be the best or they might be (relatively speaking) ham-fisted butchers. However, we compared these results with other studies published in the medical literature, which were similar. These studies together gave a much better sample of such patients than any study alone could do (see §17.11, meta-analysis). We looked at which characteristics of the patients predicted a good or bad outcome and found that the area of cross-section of the spinal cord was the important predictor. We were much more confident of this finding, because it arose from studying relationships between variables within the sample. It seems quite plausible from this study alone that patients whose spinal cords have already atrophied are unlikely to benefit from surgery.
3.8 Case-control studies
Another solution to the problem of the small number of people with the disease of interest is the case-control study. In this we start with a group of people with the disease, the cases. We compare them to a second group without the disease, the controls. In an epidemiological study, we then find the exposure of each subject to the possible causative factor and see whether this differs between the two groups. Before their cohort study, Doll and Hill (1950) carried out a case-control study into the aetiology of lung cancer. Twenty London hospitals notified all patients admitted with carcinoma of the lung, the cases. An interviewer visited the hospital to interview the case, and, at the same time, selected a patient with diagnosis other than cancer, of the same sex and within the same 5 year age group as the case, in the same hospital at the same time, as a control. When more than one suitable patient was available, the patient chosen was the first in the ward list considered by the ward sister to be fit for interview. Table 3.2 shows the relationship between smoking and lung cancer for these patients. A smoker was anyone who had smoked as much as one cigarette a day for as much as one year. It appears that cases were more likely than controls to smoke cigarettes. Doll and Hill concluded that smoking is an important factor in the production of carcinoma of the lung.
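The comparison in Table 3.2 can be made concrete with a short sketch that recomputes the percentage of smokers in each group and a crude odds ratio. The odds ratio is not used at this point in the text and is included here only as one common summary for case-control data. It is an assumption of this sketch that the matching can be ignored; a proper analysis of matched pairs would treat each case with its own control. The function names are illustrative.

```python
# Counts from Table 3.2 (Doll and Hill 1950): (non-smokers, smokers).
table_3_2 = {
    "Males": {"cases": (2, 647), "controls": (27, 622)},
    "Females": {"cases": (19, 41), "controls": (32, 28)},
}


def percent_smokers(non_smokers, smokers):
    """Percentage of a group who were smokers."""
    return 100 * smokers / (non_smokers + smokers)


def crude_odds_ratio(cases, controls):
    """Odds of smoking among cases divided by odds of smoking among controls.

    This ignores the matching of cases to controls.
    """
    case_non, case_smk = cases
    ctrl_non, ctrl_smk = controls
    return (case_smk / case_non) / (ctrl_smk / ctrl_non)


odds_ratios = {}
for sex, groups in table_3_2.items():
    odds_ratios[sex] = crude_odds_ratio(groups["cases"], groups["controls"])
    print(
        f"{sex}: {percent_smokers(*groups['cases']):.1f}% of cases and "
        f"{percent_smokers(*groups['controls']):.1f}% of controls smoked; "
        f"crude odds ratio {odds_ratios[sex]:.1f}"
    )
```

For both sexes the odds of being a smoker are higher among the lung cancer patients than among the controls, which is the pattern described above.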
The case-control study is an attractive method of investigation, because of its relative speed and cheapness compared to other approaches. However, there are difficulties in the selection of cases, the selection of controls, and obtaining the data. Because of these, case-control studies sometimes produce contradictory and conflicting results.
The first problem is the selection of cases. This usually receives little consideration beyond a definition of the type of disease and a statement about the confirmation of the diagnosis. This is understandable, as there is usually little else that the investigators can do about it. They start with the available set of patients. However, these patients do not exist in isolation. They are the result of some process which has led to them being diagnosed as having the disease and thus being available for study. For example, suppose we suspect that oral contraceptives might cause cancer of the breast. We have a group of patients diagnosed as having cancer of the breast. We must ask ourselves whether any of these were detected at a medical examination which took place because the woman was seeing a doctor to receive a prescription. If this were so, the risk factor (pill) would be associated with the detection of the disease rather than its cause. This is called ascertainment bias.
Table 3.2. Numbers of smokers and non-smokers among lung cancer patients and age and sex matched controls with diseases other than cancer (Doll and Hill 1950)

| Sex | Group | Non-smokers | Smokers | Total |
|---|---|---|---|---|
| Males | Lung cancer patients | 2 (0.3%) | 647 (99.7%) | 649 |
| Males | Control patients | 27 (4.2%) | 622 (95.8%) | 649 |
| Females | Lung cancer patients | 19 (31.7%) | 41 (68.3%) | 60 |
| Females | Control patients | 32 (53.3%) | 28 (46.7%) | 60 |
Far more difficulty is caused by the selection of controls. We want a group of people who do not have the disease in question, but who are otherwise comparable to our cases. We must first decide the population from which they are to be drawn. There are two main sources of controls: the general population and patients with other diseases. The latter may be preferred because of its accessibility. Now these two populations are clearly not the same. For example, Doll and Hill (1950) gave the current smoking habits of 1014 men and women with diseases other than cancer, 14% of whom were currently non-smokers. They commented that there was no difference between smoking in the disease groups: respiratory disease, cardiovascular disease, gastro-intestinal disease and others. However, in the general population the percentage of current non-smokers was 18% for men and 59% for women (Todd 1972). The smoking rate in the patient group as a whole was high. Since their report, of course, smoking has been associated with diseases in each group. Smokers get more disease and are more likely to be in hospital than non-smokers.
Intuitively, the comparison we want to make is between people with the disease and healthy people, not people with a lot of other diseases. We want to find out how to prevent disease, not how to choose one disease or another! However, it is much easier to use hospital patients as controls. There may then be a bias because the factor of interest may be associated with other diseases. Suppose we want to investigate the relationship between a disease and cigarette smoking using hospital controls. Should we exclude patients with lung cancer from the control group? If we include them, our controls may have more smokers than the general population, but if we exclude them we may have fewer. This problem is usually resolved by choosing specific patient groups, such as fracture cases, whose illness is thought to be unrelated to the factor being investigated. In case-control studies using cancer registries, controls are sometimes people with other forms of cancer. Sometimes more than one control group is used.
Having defined the population we must choose the sample. There are many factors which affect exposure to risk factors, such as age and sex. The most straightforward way is to take a large random sample of the control population, ascertain all the relevant characteristics, and then adjust for differences during the analysis, using methods described in Chapter 17. The alternative is to try to match a control to each case, so that for each case there is a control of the same age, sex, etc. Having done this, then we can compare our cases and controls knowing that the effects of these intervening variables are automatically adjusted for. If we wish to exclude a case we must exclude its control, too, or the groups will no longer be comparable. We can have more than one control per case, but the analysis becomes complicated.
Matching on some variables does not ensure comparability on all. Indeed, if it did there would be no study. Doll and Hill matched on age, sex and hospital. They recorded area of residence and found that 25% of their cases were from outside London, compared to 14% of controls. If we want to see whether this influences the smoking and lung cancer relationship we must make a statistical adjustment anyway. What should we match for? The more we match for, the fewer intervening variables there are to worry about. On the other hand, it becomes more and more difficult to find matches. Even matching on age and sex, Doll and Hill could not always find a control in the same hospital, and had to look elsewhere. Matching for more than age and sex can be very difficult.
Having decided on the matching variables we then find in the control population all the possible matches. If there are more matches than we need, we should choose the number required at random. Other methods, such as that used by Doll and Hill who allowed the ward sister to choose, have obvious problems of potential bias. If no suitable control can be found, we can do two things. We can widen the matching criteria, say age to within ten years rather than five, or we can exclude the case.
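The procedure just described (list all possible matches, choose at random, and widen the criteria or drop the case if none is found) can be sketched in a few lines. This is a minimal illustration, not the method of any particular study: the record layout, the function names and the tiny control pool are all hypothetical, and matching is on sex and age only.

```python
import random


def find_matches(case, pool, age_window=5):
    """Potential controls of the same sex whose age is within `age_window` years."""
    return [
        person for person in pool
        if person["sex"] == case["sex"]
        and abs(person["age"] - case["age"]) <= age_window
    ]


def choose_controls(case, pool, n_controls=1, rng=None, age_windows=(5, 10)):
    """Choose controls at random, widening the age criterion if necessary.

    Returns an empty list when no window yields enough matches; the
    remaining option is then to exclude the case.
    """
    rng = rng or random.Random()
    for window in age_windows:
        matches = find_matches(case, pool, window)
        if len(matches) >= n_controls:
            # Random choice among all possible matches avoids selection by a person.
            return rng.sample(matches, n_controls)
    return []


# Hypothetical records, invented purely for illustration.
pool = [
    {"id": "c1", "sex": "M", "age": 61},
    {"id": "c2", "sex": "M", "age": 64},
    {"id": "c3", "sex": "F", "age": 62},
    {"id": "c4", "sex": "M", "age": 72},
]
case = {"id": "k1", "sex": "M", "age": 63}

controls = choose_controls(case, pool, n_controls=1, rng=random.Random(2000))
print(controls)
```

In a real study a chosen control would also be removed from the pool so that it could not be matched to a second case; the sketch leaves that bookkeeping out.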
There are difficulties in interpreting the results of case-control studies. One is that the case-control design is often retrospective, that is, we are starting with the present disease state, e.g. lung cancer, and relating it to the past, e.g. history of smoking. We may have to rely on the unreliable memories of our subjects. This may lead both to random errors among cases and controls and systematic recall bias, where one group, usually the cases, recalls events better than the other. For example, the mother of a handicapped child may be more likely than the mother of a normal child to remember events in pregnancy which may have caused damage. There is a problem of assessment bias in such studies, just as in clinical trials (§2.9). Interviewers will very often know whether the interviewee is a case or control and this may well affect the way questions are asked. These and other considerations make case-control studies extremely difficult to interpret. The evidence from such studies can be useful, but data from other types of investigation must be considered, too, before any firm conclusions are drawn.
The case-control design is used clinically to investigate the natural history of disease by comparing patients with healthy subjects or patients with another disease. For example, Kiely et al. (1995) were interested in lymphatic function in inflammatory arthritis. We compared arthritis patients (the cases) with healthy volunteers (the controls). Lymphatic flow was measured in the arms of these subjects and the groups compared. We found that lymphatic drainage was less in the cases than in the control group, but this was only so for arms which were swollen (oedematous).
3.9* Questionnaire bias in observational studies
In observational studies, much data may have to be supplied by the subjects themselves. The way in which a question is asked may influence the reply. Sometimes the bias in a question is obvious. Compare these:
(a) Do you think people should be free to provide the best medical care possible for themselves and their families, free of interference from a State bureaucracy?
(b) Should the wealthy be able to buy a place at the head of the queue for medical care, pushing aside those with greater need, or should medical care be shared solely on the basis of need for it?
Version (a) expects the answer yes, version (b) expects the answer no. We would hope not to be misled by such blatant manipulation, but the effects of question wording can be much more subtle than this. Hedges (1978) reports several examples of the effects of varying the wording of questions. He asked two groups of about 800 subjects one of the following:
(a) Do you feel you take enough care of your health, or not?
(b) Do you feel you take enough care of your health, or do you think you could take more care of your health?
In reply to question (a), 82% said that they took enough care, whereas only 68% said this in reply to question (b). Even more dramatic was the difference between this pair:
(a) Do you think a person of your age can do anything to prevent ill-health in the future or not?
(b) Do you think a person of your age can do anything to prevent ill-health in the future, or is it largely a matter of chance?
Not only was there a difference in the percentage who replied that they could do something, but as Table 3.3 shows this answer was related to age for version (a) but not for version (b). Here version (b) is ambiguous, as it is quite possible to think that health is largely a matter of chance but that there is still something one can do about it. Only if it is totally a matter of chance is there nothing one can do.
Table 3.3. Replies to two similar questions about ill health, by age (Hedges 1978)

| Reply | Age 16–34 years | Age 35–54 years | Age 55+ years | Total |
|---|---|---|---|---|
| Can do something (a) | 75% | 64% | 56% | 65% |
| Can do something (b) | 45% | 49% | 50% | 49% |
Sometimes the respondents may interpret the question in a different way from the questioner. For example, when asked whether they usually coughed first thing in the morning, 3.7% of the Derbyshire schoolchildren replied that they did. When their parents were asked about the child's symptoms 2.4% replied positively, not a dramatic difference. Yet when asked about cough at other times in the day or at night 24.8% of children said yes, compared to only 4.5% of their parents (Bland et al. 1979). These symptoms all showed relationships to the child's smoking and other potentially causal variables, and also to one another. We are forced to admit that we are measuring something, but that we are not sure what!
Another possibility is that respondents may not understand the question at all, especially when it includes medical terms. In an earlier study of cigarette smoking by children, we found that 85% of a sample agreed that smoking caused cancer, but that 41% agreed that smoking was not harmful (Bewley et al. 1974). There are at least two possible explanations for this: being asked to agree with the negative statement ‘smoking is not harmful’ may have confused the children, or they may not see cancer as harmful. We have evidence for both of these possibilities. In a repeat study in Kent we asked a further sample of children whether they agreed that smoking caused cancer and that ‘smoking is bad for your health’ (Bewley and Bland 1976). In this study 90% agreed that smoking causes cancer and 91% agreed that smoking is bad for your health. In another study (Bland et al. 1975), we asked children what was meant by the term ‘lung cancer’. Only 13% seemed to us to understand and 32% clearly did not, often saying ‘I don't know’. They nearly all knew that lung cancer was caused by smoking, however.
The setting in which a question is asked may also influence replies. Opinion pollsters International Communications and Market Research conducted a poll in which half the subjects were questioned by interviewers about their voting preference and half were given a secret ballot (McKie 1992). By each method 33% chose ‘Labour’, but 28% chose ‘Conservative’ at interview and 7% would not say, whereas 35% chose ‘Conservative’ by secret ballot and only 1% would not say. Hence the secret method produced a Conservative majority, as at the then recent general election, and the open interview a Labour majority. For another example, Sibbald et al. (1994) compared two random samples of GPs. One sample were approached by post and then by telephone if they did not reply after two reminders, and the other were contacted directly by telephone. Of the predominantly postal sample, 19% reported that they provided counselling themselves, compared to 36% of the telephone sample, and 14% reported that their health visitor provided counselling compared to 30% of the telephone group. Thus the method of asking the question influenced the answer. One must be very cautious when interpreting questionnaire replies.
Fig. 3.2. Volatile substance abuse mortality and unemployment in the counties of Great Britain (The area of the circle is proportional to the population of the county, so reflects the importance of the observation)
Often the easiest and best method, if not the only method, of obtaining data about people is to ask them. When we do it, we must be very careful to ensure that questions are straightforward, unambiguous and in language the respondents will understand. If we do not do this then disaster is likely to follow.
3.10* Ecological studies
Ecology is the study of living things in relation to their environment. In epidemiology, an ecological study is one where the disease is studied in relation to characteristics of the communities in which people live. For example, we might take the death rates from heart disease in several countries and see whether this is related to the national annual consumption of animal fat per head.
Esmail et al. (1977) carried out an ecological study of factors related to deaths from volatile substance abuse (VSA, also called solvent abuse, inhalant abuse or glue sniffing). The observational units were the administrative counties of Great Britain. The deaths were obtained from a national register of deaths held at St. George's and the age and sex distribution in each county from national census data. These were used to calculate an index of mortality adjusted for age, the standardized mortality ratio (§16.3). Indicators of social deprivation were also obtained from census data. Figure 3.2 shows the relationship between VSA mortality and unemployment in the counties. Clearly, there is a relationship. The mortality is higher in counties where unemployment is high.
Relationships found in ecological studies are indirect. We must not conclude that there is a relationship at the level of the person. This is the ecological fallacy. For example, we cannot conclude from Figure 3.2 that unemployed people are at a greater risk of dying from VSA than the employed. The peak age for VSA death is among schoolchildren, who are not included in the unemployment figures. It is not the unemployed people who are dying. Unemployment is just one indicator of social deprivation, and VSA deaths are associated with many of them.
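A toy calculation may help to show how a relationship between areas can exist with no relationship at the level of the person. The five counties below are entirely hypothetical, invented for illustration; they are not the data behind Figure 3.2. In them every VSA death is a schoolchild, so no unemployed person dies, yet mortality and unemployment are strongly correlated across counties.

```python
from math import sqrt

# Hypothetical counties, invented purely for illustration.
# Deaths are counts per 100,000 population, split by who died.
counties = [
    {"name": "A", "unemployment_pct": 4.0, "deaths_schoolchildren": 1, "deaths_unemployed": 0},
    {"name": "B", "unemployment_pct": 6.0, "deaths_schoolchildren": 2, "deaths_unemployed": 0},
    {"name": "C", "unemployment_pct": 8.0, "deaths_schoolchildren": 2, "deaths_unemployed": 0},
    {"name": "D", "unemployment_pct": 10.0, "deaths_schoolchildren": 4, "deaths_unemployed": 0},
    {"name": "E", "unemployment_pct": 12.0, "deaths_schoolchildren": 5, "deaths_unemployed": 0},
]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


unemployment = [c["unemployment_pct"] for c in counties]
total_deaths = [c["deaths_schoolchildren"] + c["deaths_unemployed"] for c in counties]

county_level_r = pearson(unemployment, total_deaths)
deaths_among_unemployed = sum(c["deaths_unemployed"] for c in counties)

print(f"County-level correlation: {county_level_r:.2f}")
print(f"Deaths among unemployed people: {deaths_among_unemployed}")
```

The correlation is computed between counties, so it says nothing about which individuals within a county are at risk.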
Ecological studies can be useful to generate hypotheses. For example, the observation that hypertension is common in countries where there is a high intake of dietary salt might lead us to investigate the salt consumption and blood pressure of individual people, and a relationship there might in turn lead to dietary interventions. These leads often turn out to be false, however, and the ecological study alone is never enough.
3M Multiple choice questions 7 to 13
(Each branch is either true or false)
7. In statistical terms, a population:
(a) consists only of people;
(b) may be finite;
(c) may be infinite;
(d) can be any set of things in which we are interested;
(e) may consist of things which do not actually exist.
8. A one day census of in-patients in a psychiatric hospital could:
(a) give good information about the patients in that hospital at that time;
(b) give reliable estimates of seasonal factors in admissions;
(c) enable us to draw conclusions about the psychiatric hospitals of Britain;
(d) enable us to estimate the distribution of different diagnoses in mental illness in the local area;
(e) tell us how many patients there were in the hospital.
9. In simple random sampling:
(a) each member of the population has an equal chance of being chosen;
(b) adjacent members of the population must not be chosen;
(c) likely errors cannot be estimated;
(d) each possible sample of the given size has an equal chance of being chosen;
(e) the decision to include a subject in the sample depends only on the subject's own characteristics.
10. Advantages of random sampling include:
(a) it can be applied to any population;
(b) likely errors can be estimated;
(c) it is not biassed;
(d) it is easy to do;
(e) the sample can be referred to a known population.
11. In a case-control study to investigate whether eczema in children is related to cigarette smoking by their parents:
(a) parents would be asked about their smoking habits at the child's birth and the child observed for subsequent development of eczema;
(b) children of a group of parents who smoke would be compared to children of a group of parents who are non-smokers;
(c) parents would be asked to stop smoking to see whether their children's eczema was reduced;
(d) the smoking habits of the parents of a group of children with eczema would be compared to the smoking habits of the parents of a group of children without eczema;
(e) parents would be randomly allocated to smoking or non-smoking groups.
12. To examine the relationship between alcohol consumption and cancer of the oesophagus, feasible studies include:
(a) questionnaire survey of a random sample from the electoral roll;
(b) comparison of history of alcohol consumption between a group of oesophageal cancer patients and a group of healthy controls matched for age and sex;
(c) comparison of current oesophageal cancer rates in a group of alcoholics and a group of teetotallers;
(d) comparison by questionnaire of history of alcohol consumption between a group of oesophageal cancer patients and a random sample from the electoral roll in the surrounding district;
(e) comparison of death rates due to cancer of the oesophagus in a large sample of subjects whose alcohol consumption has been determined in the past.
13.* In a study of hospital patients, 20 hospitals were chosen at random from a list of all hospitals. Within each hospital, 10% of patients were chosen at random:
(a) the sample of patients is a random sample;
(b) all hospitals had an equal chance of being chosen;
(c) all hospital patients had an equal chance of being chosen at the outset;
(d) the sample could be used to make inferences about all hospital patients at that time;
(e) all possible samples of patients had an equal chance of being chosen.
Table 3.4. Doorstep delivery of milk bottles and exposure to bird attack

| No. (%) exposed | Cases | Controls |
|---|---|---|
| Doorstep milk delivery | 29 (91%) | 47 (73%) |
| Previous milk bottle attack by birds | 26 (81%) | 25 (39%) |
| Milk bottle attack in week before illness | 26 (81%) | 5 (8%) |
| Protective measures taken | 6 (19%) | 14 (22%) |
| Handling attacked milk bottle in week before illness | 17 (53%) | 5 (8%) |
| Drinking milk from attacked bottle in week before illness | 25 (80%) | 5 (8%) |
Table 3.5. Frequency of bird attacks on milk bottles

| Number of days of week when attacks took place | Cases | Controls |
|---|---|---|
| 0 | 3 | 42 |
| 1–3 | 11 | 3 |
| 4–5 | 5 | 1 |
| 6–7 | 10 | 1 |
3E Exercise: Campylobacter jejuni infection
Campylobacter jejuni is a bacterium causing gastro-intestinal illness, spread by the faecal-oral route. It infects many species, and human infection has been recorded from handling pet dogs and cats, handling and eating chicken and other meats, and via milk and water supplies. Treatment is by antibiotics.
In May, 1990, there was a fourfold rise in the isolation rate of C. jejuni in the Ogwr District, Mid-Glamorgan. The mother of a young boy admitted to hospital with febrile convulsions resulting from C. jejuni infection reported that her milk bottles had been attacked by birds during the week before her son's illness, a phenomenon which had been associated with campylobacter infection in another area. This observation, with the rise in C. jejuni, prompted a case-control study (Southern et al. 1990).
A ‘case’ was defined as a person with laboratory confirmed C. jejuni infection with onset between May 1 and June 1 1990, resident in an area with Bridgend at its centre. Cases were excluded if they had spent one or more nights away from this area in the week before onset, if they could have acquired the infection elsewhere, or were members of a household in which there had been a case of diarrhea in the preceding four weeks.
The controls were selected from the register of the general practice of the case, or in a few instances from practices serving the same area. Two controls were selected for each case, matched for sex, age (within 5 years), and area of residence.
Cases and controls were interviewed by means of a standard questionnaire at home or by telephone. Cases were asked about their exposure to various factors in the week before the onset of illness. Controls were asked the same questions about the corresponding week for their matched cases. If a control or member of his or her family had had diarrhea lasting more than 3 days in the week before or during the illness of the respective case, or had spent any nights during that week away from home, another control was found. Evidence of bird attack included the pecking or tearing off of milk bottle tops. A history of bird attack was defined as a previous attack at that house.
Fifty-five people with Campylobacter infection resident in the area were reported during the study period. Of these, 19 were excluded and 4 could not be interviewed, leaving 32 cases and 64 matched controls. There was no difference in milk consumption between cases and controls, but more cases than controls reported doorstep delivery of bottled milk, previous milk bottle attack by birds, milk bottle attack by birds in the index week, and handling or drinking milk from an attacked bottle (Table 3.4). Cases reported bird attacks more frequently than controls (Table 3.5). Controls were more likely to have protected their milk bottles from attack or to have discarded milk from attacked bottles. Almost all subjects whose milk bottles had been attacked mentioned that magpies and jackdaws were common in their area, though only 3 had actually witnessed attacks and none reported bird droppings near bottles.
None of the other factors investigated (handling raw chicken; eating chicken bought raw; eating chicken, beef or ham bought cooked; eating out; attending barbecue; cat or dog in the house; contact with other cats or dogs; and contact with farm animals) were significantly more common in controls than cases. Bottle attacks seemed to have ceased when the study was carried out, and no milk could be obtained for analysis.
1. What problems were there in selecting cases?
2. What problems were there in the selection of controls?
3. Are there any problems about data collection?
4. From the above, do you think there is convincing evidence that bird attacks on milk bottles cause campylobacter infection?
5. What further studies might be carried out?
4
Summarizing data
4.1 Types of data
In Chapters 2 and 3 we looked at ways in which data are collected. In this chapter we shall see how data can be summarized to help to reveal information they contain. We do this by calculating numbers from the data which extract the important material. These numbers are called statistics. A statistic is anything calculated from the data alone.
It is often useful to distinguish between three types of data: qualitative, discrete quantitative and continuous quantitative. Qualitative data arise when individuals may fall into separate classes. These classes may have no numerical relationship with one another at all, e.g. sex: male, female; types of dwelling: house, maisonette, flat, lodgings; eye colour: brown, grey, blue, green, etc. Quantitative data are numerical, arising from counts or measurements. If the values of the measurements are integers (whole numbers), like the number of people in a household, or number of teeth which have been filled, those data are said to be discrete. If the values of the measurements can take any number in a range, such as height or weight, the data are said to be continuous. In practice there is overlap between these categories. Most continuous data are limited by the accuracy with which measurements can be made. Human height, for example, is difficult to measure more accurately than to the nearest millimetre and is more usually measured to the nearest centimetre. So only a finite set of possible measurements is actually available, although the quantity ‘height’ can take an infinite number of possible values, and the measured height is really discrete. However, the methods described below for continuous data will be seen to be those appropriate for its analysis.
We shall refer to qualities or quantities such as sex, height, age, etc. as variables, because they vary from one member of a sample to another. A qualitative variable is also termed a categorical variable or an attribute. We shall use these terms interchangeably.
4.2 Frequency distributions
When data are purely qualitative, the simplest way to deal with them is to count the number of cases in each category. For example, in the analysis of the census of a psychiatric hospital population (§3.2), one of the variables of interest was the patient's principal diagnosis (Bewley et al. 1975). To summarize these data, we count the number of patients having each diagnosis. The results are shown in Table 4.1. The count of individuals having a particular quality is called the frequency of that quality. For example, the frequency of schizophrenia is 474. The proportion of individuals having the quality is called the relative frequency or proportional frequency. The relative frequency of schizophrenia is 474/1467 = 0.32 or 32%. The set of frequencies of all the possible categories is called the frequency distribution of the variable.
Table 4.1. Principal diagnosis of patients in Tooting Bec Hospital
Diagnosis  Number of patients
Schizophrenia  474
Affective disorders  277
Organic brain syndrome  405
Subnormality  58
Alcoholism  57
Other and not known  196
Total  1467
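The relative frequencies for Table 4.1 can be reproduced in a few lines. The following is only a sketch: the counts are copied from the table, while the variable names are chosen here for illustration.

```python
# Frequencies copied from Table 4.1 (principal diagnosis).
diagnosis_counts = {
    "Schizophrenia": 474,
    "Affective disorders": 277,
    "Organic brain syndrome": 405,
    "Subnormality": 58,
    "Alcoholism": 57,
    "Other and not known": 196,
}

# The total number of patients is the sum of the frequencies.
total = sum(diagnosis_counts.values())

# Relative frequency = frequency / total, for each category.
relative_frequency = {d: c / total for d, c in diagnosis_counts.items()}

for diagnosis, count in diagnosis_counts.items():
    print(f"{diagnosis}: {count} ({relative_frequency[diagnosis]:.2f})")
```

The relative frequencies sum to one, which is a quick check that no category has been missed.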
Table 4.2. Likelihood of discharge of patients in Tooting Bec Hospital
Discharge  Frequency  Relative frequency  Cumulative frequency  Relative cumulative frequency
Unlikely 871 0.59 871 0.59
Possible 339 0.23 1210 0.82
Likely 257 0.18 1467 1.00
Total 1467 1.00 1467 1.00
In this census we assessed whether patients were ‘likely to be discharged’, ‘possibly to be discharged’ or ‘unlikely to be discharged’. The frequencies of these categories are shown in Table 4.2. Likelihood of discharge is a qualitative variable, like diagnosis, but the categories are ordered. This enables us to use another set of summary statistics, the cumulative frequencies. The cumulative frequency for a value of a variable is the number of individuals with values less than or equal to that value. Thus, if we order likelihood of discharge from ‘unlikely’, through ‘possibly’ to ‘likely’ the cumulative frequencies are 871, 1210 (= 871 + 339) and 1467. The relative cumulative frequency for a value is the proportion of individuals in the sample with values less than or equal to that value. For the example they are 0.59 (= 871/1467), 0.82 and 1.00. Thus we can see that the proportion of patients for whom discharge was not thought likely was 0.82 or 82%.
As we have noted, likelihood of discharge is a qualitative variable, with ordered categories. Sometimes this ordering is taken into account in analysis, sometimes not. Although the categories are ordered these are not quantitative data. There is no sense in which the difference between ‘likely’ and ‘possibly’ is the same as the difference between ‘possibly’ and ‘unlikely’.
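The cumulative columns of Table 4.2 are running totals of the frequencies, taken in category order. A minimal sketch, using the frequencies from the table and names chosen here for illustration:

```python
from itertools import accumulate

# Ordered categories and their frequencies, from Table 4.2.
categories = ["Unlikely", "Possible", "Likely"]
frequencies = [871, 339, 257]

total = sum(frequencies)

# Running totals give the cumulative frequencies.
cumulative = list(accumulate(frequencies))

# Dividing by the total gives the relative cumulative frequencies.
relative_cumulative = [round(c / total, 2) for c in cumulative]

for name, c, r in zip(categories, cumulative, relative_cumulative):
    print(f"{name}: cumulative {c}, relative cumulative {r:.2f}")
```

The running total only has a meaning because the categories are ordered; for an unordered variable such as diagnosis it would depend on an arbitrary listing order.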
Table 4.3. Parity of 125 women attending antenatal clinics at St. George's Hospital
Parity  Frequency  Relative frequency (per cent)  Cumulative frequency  Relative cumulative frequency (per cent)
0 59 47.2 59 47.2
1 44 35.2 103 82.4
2 14 11.2 117 93.6
3 3 2.4 120 96.0
4 4 3.2 124 99.2
5 1 0.8 125 100.0
Total 125 100.0 125 100.0
Table 4.4. FEV1 (litres) of 57 male medical students
2.85 3.19 3.50 3.69 3.90 4.14 4.32 4.50
2.85 3.20 3.54 3.70 3.96 4.16 4.44 4.56
2.98 3.30 3.54 3.70 4.05 4.20 4.47 4.68
3.04 3.39 3.57 3.75 4.08 4.20 4.47 4.70
3.10 3.42 3.60 3.78 4.10 4.30 4.47 4.71
3.10 3.48 3.60 3.83 4.14 4.30 4.50 4.78
Table 4.3 shows the frequency distribution of a quantitative variable, parity. This shows the number of previous pregnancies for a sample of women booking for delivery at St. George's Hospital. Only certain values are possible, as the number of pregnancies must be an integer, so this variable is discrete. The frequency of each separate value is given.
Table 4.4 shows a continuous variable, forced expiratory volume in one second (FEV1) in a sample of male medical students. As most of the values occur only once, to get a useful frequency distribution we need to divide the FEV1 scale into class intervals, e.g. from 3.0 to 3.5, from 3.5 to 4.0, and so on, and count the number of individuals with FEV1s in each class interval. The class intervals should not overlap, so we must decide which interval contains the boundary point to avoid it being counted twice. It is usual to put the lower boundary of an interval into that interval and the higher boundary into the next interval. Thus the interval starting at 3.0 and ending at 3.5 contains 3.0 but not 3.5. We can write this as ‘3.0-’ or ‘3.0-3.5-’ or ‘3.0-3.499’. Including the lower boundary in the class interval has this advantage. Most distributions of measurements have a zero point below which we cannot go, whereas few have an exact upper limit. If we were to include the upper boundary in the interval instead of the lower, we would have two possible ways of dealing with zero. It could be left as an isolated point, not in an interval. Alternatively, it could be included in the lowest interval, which would then not be exactly comparable to the others as it would include both boundaries while all the other intervals only included the upper.
If we take a starting point of 2.5 and an interval of 0.5 we get the frequency distribution shown in Table 4.5. Note that this is not unique. If we take a starting point of 2.4 and an interval of 0.2 we get a different set of frequencies.
Table 4.5. Frequency distribution of FEV1 in 57 male medical students
FEV1  Frequency  Relative frequency (per cent)
2.0 0 0.0
2.5 3 5.3
3.0 9 15.8
3.5 14 24.6
4.0 15 26.3
4.5 10 17.5
5.0 6 10.5
5.5 0 0.0
Total 57 100.0
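The rule that each class interval contains its lower boundary but not its upper one can be sketched as follows. This is an illustration only: the function name is hypothetical, and the 16 values are the first two rows of Table 4.4 as printed, not the full sample of 57, so the counts are not those of Table 4.5. The calculation is done in whole hundredths of a litre so that boundary values such as 3.50 are not misplaced by floating-point rounding.

```python
from collections import Counter

def class_interval_counts(values, start, width):
    """Count values in intervals [start + k*width, start + (k+1)*width).

    Each interval includes its lower boundary and excludes its upper one.
    Values are assumed to be recorded to two decimal places.
    """
    start_c = round(start * 100)   # work in integer hundredths
    width_c = round(width * 100)
    counts = Counter()
    for v in values:
        k = (round(v * 100) - start_c) // width_c   # index of the interval
        lower = (start_c + k * width_c) / 100       # its lower boundary
        counts[lower] += 1
    return dict(sorted(counts.items()))

# First two rows of Table 4.4 as printed (an illustrative subset).
fev1_sample = [2.85, 3.19, 3.50, 3.69, 3.90, 4.14, 4.32, 4.50,
               2.85, 3.20, 3.54, 3.70, 3.96, 4.16, 4.44, 4.56]

half_litre = class_interval_counts(fev1_sample, start=2.5, width=0.5)
fifth_litre = class_interval_counts(fev1_sample, start=2.4, width=0.2)

print(half_litre)
print(fifth_litre)
```

The two calls use the two choices of starting point and interval mentioned in the text, and give different sets of frequencies for the same observations.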
Table 4.6. Tally system for finding the frequency distribution of FEV1
FEV1 Frequency
2.0 0
2.5 /// 3
3.0 ///////// 9
3.5 ////////////// 14
4.0 /////////////// 15
4.5 ////////// 10
5.0 ////// 6
5.5 0
Total 57
The frequency distribution can be calculated easily and accurately using a computer. Manual calculation is not so easy and must be done carefully and systematically. One way recommended by many texts (e.g. Hill 1977) is to set up a tally system, as in Table 4.6. We go through the data and for each individual make a tally mark by the appropriate interval. We then count up the number in each interval. In practice this is very difficult to do accurately, and it needs to be checked and double-checked. Hill (1977) recommends writing each number on a card and dealing the cards into piles corresponding to the intervals. It is then easy to check that each pile contains only those cases in that interval and count them. This is undoubtedly superior to the tally system. Another method is to order the observations from lowest to highest before marking the interval boundaries and counting, or to use the stem and leaf plot described below. Personally, I always use a computer.
4.3 Histograms and other frequency graphs
Graphical methods are very useful for examining frequency distributions. Figure 4.1 shows a graph of the cumulative frequency distribution for the FEV1 data. This is called a step function. We can smooth this by joining successive points where the cumulative frequency changes by straight lines, to give a cumulative frequency polygon. Figure 4.2 shows this for the cumulative relative frequency distribution of FEV1. This plot is very useful for calculating some of the summary statistics referred to in §4.5.
Fig. 4.1. Cumulative frequency distribution of FEV1 in a sample of male medical students
Fig. 4.2. Cumulative frequency polygon of FEV1
The most common way of depicting a frequency distribution is by a histogram. This is a diagram where the class intervals are on an axis and rectangles with heights or areas proportional to the frequencies are erected on them. Figure 4.3 shows the histogram for the FEV1 distribution in Table 4.5. The vertical scale shows frequency, the number of observations in each interval.
Sometimes we want to show the distribution of a discrete variable (e.g. Table 4.3) as a histogram. If our intervals are 0–1-, 1–2-, etc., the actual observations will all be at one end of the interval. Making the starting point of the interval a fraction rather than an integer gives a slightly better picture (Figure 4.5). This can also be helpful for continuous data when there is a lot of digit preference (§15.1). For example, where most observations are recorded as integers or as something point five, starting the interval at something point seven five can give a more accurate picture.
Fig. 4.3. Histogram of FEV1: frequency scale
Fig. 4.4. Histogram of FEV1: frequency per unit FEV1 or frequency density scale
Fig. 4.5. Histograms of parity (Table 4.3) using integer and fractional cut-off points for the intervals
Table 4.7. Distribution of age in people suffering accidents in the home (Whittington 1977)
Age group  Relative frequency (per cent)  Relative frequency per year (per cent)
0–4  25.3  5.06
5–14  18.9  1.89
15–44  30.3  1.01
45–64  13.6  0.68
65+  11.7  0.33
Fig. 4.6. Histograms of age distribution of home accident victims, using the relative frequency scale and the relative frequency density scale
Figure 4.4 shows a histogram for the same distribution as Figure 4.3, with frequency per unit FEV1 (or frequency density) shown on the vertical axis. The distributions appear identical and we may well wonder whether it matters which method we choose. We see that it does matter when we consider a frequency distribution with unequal intervals, as in Table 4.7. If we plot the histogram using the heights of the rectangles to represent relative frequency in the interval we get the left-hand histogram in Figure 4.6, whereas if we use the relative frequency per year we get the right-hand histogram. These histograms tell different stories. The left-hand histogram in Figure 4.6 suggests that the most common age for accident victims is between 15 and 44 years, whereas the right-hand histogram suggests it is between 0 and 4. The right-hand histogram is correct, the left-hand histogram being distorted by the unequal class intervals. It is therefore preferable in general to use the frequency per unit (frequency density) rather than per class interval when plotting a histogram. The frequency for a particular interval is then represented by the area of the rectangle on that interval. Only when the class intervals are all equal can the frequency for the class interval be represented by the height of the rectangle. The computer programmer finds equal intervals much easier, however, and histograms with unequal intervals are now uncommon.
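The frequency density column of Table 4.7 is the relative frequency divided by the width of the class interval. The sketch below recomputes it; the interval widths are inferred from the age groups, and the width of 35 years for the open-ended ‘65+’ group is an assumption made here because it reproduces the tabulated 0.33.

```python
# (age group, width in years, relative frequency in per cent), from Table 4.7.
# The 35-year width for "65+" is an assumption, not stated in the table.
age_groups = [
    ("0-4", 5, 25.3),
    ("5-14", 10, 18.9),
    ("15-44", 30, 30.3),
    ("45-64", 20, 13.6),
    ("65+", 35, 11.7),
]

# Frequency density: relative frequency per year of age.
density = {label: round(rel / width, 2) for label, width, rel in age_groups}

# The group that looks most common depends on which scale is plotted.
tallest_by_frequency = max(age_groups, key=lambda g: g[2])[0]
tallest_by_density = max(age_groups, key=lambda g: g[2] / g[1])[0]

print(density)
print(tallest_by_frequency, tallest_by_density)
```

On the relative frequency scale the 15–44 group gives the tallest rectangle, while on the density scale it is the 0–4 group, which is the contrast between the two histograms of Figure 4.6.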
Fig. 4.7. Frequency polygons of FEV1 and PEF in medical students
Fig. 4.8. Stem and leaf plot for the FEV1 data, rounded down to one decimal place
Rather than a histogram consisting of vertical rectangles, we can plot a frequency polygon instead. To do this we join the centre points of the tops of the rectangles, then omit the rectangles (Figure 4.7(a)). Where a cell of the histogram is empty we join the line to the centre of the cell at the horizontal axis (Figure 4.7(b), males). This can be useful if we want to show two or more frequency distributions on the same graph, as in Figure 4.7(b). When we do this, the comparison is easier if we use relative frequency or relative frequency density rather than frequency. This makes it easier to compare distributions with different numbers of subjects.
A different version of the histogram has been developed by Tukey (1977), the stem and leaf plot (Figure 4.8). The rectangles are replaced by the numbers themselves. The ‘stem’ is the first digit or digits of the number and the ‘leaf’ the trailing digit. The first row of Figure 4.8 represents the numbers 2.8, 2.8, and 2.9, which in the data are 2.85, 2.85, and 2.98. The plot provides a good summary of data structure while at the same time we can see other characteristics such as a tendency to prefer some trailing digits to others, called digit preference (§15.1). It is also easy to construct and much less prone to error than the tally method of finding a frequency distribution.
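The construction of a stem and leaf plot can be sketched in a few lines. The function name and the row layout are choices made here for illustration and are not meant to reproduce Figure 4.8 exactly; the eight values are the lowest FEV1 measurements in Table 4.4.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for values recorded to two decimal places.

    Each value is first rounded down to one decimal place; the stem is the
    whole-number part and the leaf is the first decimal digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 100) // 10      # e.g. 2.98 -> 298 -> 29
        stem, leaf = divmod(tenths, 10)    # 29 -> stem 2, leaf 9
        rows[stem].append(str(leaf))
    return [f"{stem} | {''.join(leaves)}" for stem, leaves in sorted(rows.items())]

# The eight lowest FEV1 values from Table 4.4.
sample = [2.85, 2.85, 2.98, 3.04, 3.10, 3.10, 3.19, 3.20]

for row in stem_and_leaf(sample):
    print(row)
```

The first row, `2 | 889`, corresponds to the values 2.85, 2.85 and 2.98 described in the text.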
4.4 Shapes of frequency distribution
Figure 4.3 shows a frequency distribution of a shape often seen in medical data. The distribution is roughly symmetrical about its central value and has frequency concentrated about one central point. The most common value is called the mode of the distribution and Figure 4.3 has one such point. It is unimodal. Figure 4.9 shows a very different shape. Here there are two distinct modes, one near 5 and the other near 8.5. This distribution is bimodal. We must be careful to distinguish between the unevenness in the histogram which results from using a small sample to represent a large population and that which results from genuine bimodality in the data. The trough between 6 and 7 in Figure 4.9 is very marked and might represent a genuine bimodality. In this case we have children, some of whom have a condition which raises the cholesterol level and some of whom do not. We actually have two separate populations represented with some overlap between them. However, almost all distributions encountered in medical statistics are unimodal.
Fig. 4.9. Serum cholesterol in children from kinships with familial hypercholesterolaemia (Leonard et al. 1977)
Fig. 4.10. Serum triglyceride in cord blood from 282 babies (Table 4.8)
Figure 4.10 differs from Figure 4.3 in a different way. The distribution of serum triglyceride is skew, that is, the distance from the central value to the extreme is much greater on one side than it is on the other. The parts of the histogram near the extremes are called the tails of the distribution. If the tails are equal the distribution is symmetrical, as in Figure 4.3. If the tail on the right is longer than the tail on the left as in Figure 4.10, the distribution is skew to the right or positively skew. If the tail on the left is longer, the distribution is skew to the left or negatively skew. This is unusual, but Figure 4.11 shows an example. The negative skewness comes about because babies can be born alive at any gestational age from about 20 weeks, but soon after 40 weeks the baby will have to be born. Pregnancies will not be allowed to go on for more than 44 weeks; the birth would be induced artificially. Most distributions encountered in medical work are symmetrical or skew to the right, for reasons we shall discuss later (§7.4).
Table 4.8. Serum triglyceride measurements in cord blood from 282 babies
0.15 0.29 0.32 0.36 0.40 0.42 0.46 0.50
0.16 0.29 0.33 0.36 0.40 0.42 0.46 0.50
0.20 0.29 0.33 0.36 0.40 0.42 0.47 0.52
0.20 0.29 0.33 0.36 0.40 0.44 0.47 0.52
0.20 0.29 0.33 0.36 0.40 0.44 0.47 0.52
0.20 0.29 0.33 0.36 0.40 0.44 0.47 0.52
0.21 0.30 0.33 0.36 0.40 0.44 0.47 0.52
0.22 0.30 0.33 0.36 0.40 0.44 0.48 0.52
0.24 0.30 0.33 0.37 0.40 0.44 0.48 0.52
0.25 0.30 0.34 0.37 0.40 0.44 0.48 0.53
0.26 0.30 0.34 0.37 0.40 0.44 0.48 0.54
0.26 0.30 0.34 0.37 0.40 0.44 0.48 0.54
0.26 0.30 0.34 0.38 0.40 0.45 0.48 0.54
0.27 0.30 0.34 0.38 0.40 0.45 0.48 0.54
0.27 0.30 0.34 0.38 0.41 0.45 0.48 0.54
0.27 0.31 0.34 0.38 0.41 0.45 0.48 0.54
0.28 0.31 0.34 0.38 0.41 0.45 0.48 0.55
0.28 0.32 0.35 0.39 0.41 0.45 0.48 0.55
0.28 0.32 0.35 0.39 0.41 0.46 0.48 0.55
0.28 0.32 0.35 0.39 0.41 0.46 0.49 0.55
0.28 0.32 0.35 0.39 0.41 0.46 0.49 0.55
0.28 0.32 0.35 0.39 0.42 0.46 0.49 0.55
0.28 0.32 0.35 0.40 0.42 0.46 0.50 0.55
0.28 0.32 0.36 0.40 0.42 0.46 0.50 0.55
4.5 Medians and quantiles
We often want to summarize a frequency distribution in a few numbers, for ease of reporting or comparison. The most direct method is to use quantiles. The quantiles are values which divide the distribution such that there is a given proportion of observations below the quantile. For example, the median is a quantile. The median is the central value of the distribution, such that half the points are less than or equal to it and half are greater than or equal to it. We can estimate any quantiles easily from the cumulative frequency distribution or a stem and leaf plot. For the FEV1 data the median is 4.1, the 29th value in Table 4.4. If we have an even number of points, we choose a value midway between the two central values.
Fig. 4.11. Gestational age at birth for 1749 deliveries at St. George's Hospital
In general, we estimate the q quantile, the value such that a proportion q will be below it, as follows. We have n ordered observations which divide the scale into n + 1 parts: below the lowest observation, above the highest and between each adjacent pair. The proportion of the distribution which lies below the ith observation is estimated by i/(n + 1). We set this equal to q and get i = q(n + 1). If i is an integer, the ith observation is the required quantile estimate. If not, let j be the integer part of i, the part before the decimal point. The quantile will lie between the jth and j + 1th observations. We estimate it by
$$x_j + (x_{j+1} - x_j) \times (i - j)$$
For the median, for example, the 0.5 quantile, i = q(n + 1) = 0.5 × (57 + 1) = 29, the 29th observation as before.
Other quantiles which are particularly useful are the quartiles of the distribution. The quartiles divide the distribution into four equal parts, called fourths. The second quartile is the median. For the FEV1 data the first and third quartiles are 3.54 and 4.53. For the first quartile, i = 0.25 × 58 = 14.5. The quartile is between the 14th and 15th observations, which are both 3.54. For the third quartile, i = 0.75 × 58 = 43.5, so the quartile lies between the 43rd and 44th observations, which are 4.50 and 4.56. The quantile is given by 4.50 + (4.56 − 4.50) × (43.5 − 43) = 4.53. We often divide the distribution at 99 centiles or percentiles. The median is thus the 50th centile. For the 20th centile of FEV1, i = 0.2 × 58 = 11.6, so the quantile is between the 11th and 12th observation, 3.42 and 3.48, and can be estimated by 3.42 + (3.48 − 3.42) × (11.6 − 11) = 3.46. We can estimate these easily from Figure 4.2 by finding the position of the quantile on the vertical axis, e.g. 0.2 for the 20th centile or 0.5 for the median, drawing a horizontal line to intersect the cumulative frequency polygon, and reading the quantile off the horizontal axis.
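The i = q(n + 1) rule with interpolation can be written as a short function. This is a sketch rather than a library routine: the function name and its `n` argument are introduced here for illustration. The list holds the 48 lowest of the 57 ordered FEV1 values; the nine largest are not needed for the quantiles worked out in the text, so the full sample size is passed separately.

```python
import math

def quantile(ordered, q, n=None):
    """Estimate the q quantile by the i = q(n + 1) rule with interpolation.

    `ordered` holds observations in increasing order. `n` is the full sample
    size; it defaults to len(ordered) and only needs to be given when
    `ordered` lists just the lowest part of a larger sample.
    """
    if n is None:
        n = len(ordered)
    i = q * (n + 1)
    j = math.floor(i)                 # integer part of i
    needs_next = i > j                # True when we must interpolate
    if j < 1 or j + (1 if needs_next else 0) > len(ordered):
        raise ValueError("quantile position lies outside the listed observations")
    x_j = ordered[j - 1]              # observations are numbered from 1
    if not needs_next:
        return x_j
    return x_j + (ordered[j] - x_j) * (i - j)

# The 48 lowest of the 57 ordered FEV1 values (litres).
fev1_lowest_48 = [
    2.85, 2.85, 2.98, 3.04, 3.10, 3.10, 3.19, 3.20, 3.30, 3.39, 3.42, 3.48,
    3.50, 3.54, 3.54, 3.57, 3.60, 3.60, 3.69, 3.70, 3.70, 3.75, 3.78, 3.83,
    3.90, 3.96, 4.05, 4.08, 4.10, 4.14, 4.14, 4.16, 4.20, 4.20, 4.30, 4.30,
    4.32, 4.44, 4.47, 4.47, 4.47, 4.50, 4.50, 4.56, 4.68, 4.70, 4.71, 4.78,
]

median = quantile(fev1_lowest_48, 0.5, n=57)
first_quartile = quantile(fev1_lowest_48, 0.25, n=57)
third_quartile = quantile(fev1_lowest_48, 0.75, n=57)
centile_20 = quantile(fev1_lowest_48, 0.2, n=57)

print(f"{median:.2f} {first_quartile:.2f} {third_quartile:.2f} {centile_20:.2f}")
```

Statistical packages offer several slightly different quantile definitions, so results from other software may differ a little from this rule in small samples.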
Fig. 4.12. Box and whisker plots for FEV1 and for serum triglyceride
Fig. 4.13. Box plots showing a roughly symmetrical variable in four groups, with an outlying point (data in Table 10.8)
Tukey (1977) used the median, quartiles, maximum and minimum as a convenient five figure summary of a distribution. He also suggested a neat graph, the box and whisker plot, which represents this (Figure 4.12). The box shows the distance between the quartiles, with the median marked as a line, and the ‘whiskers’ show the extremes. The different shapes of the FEV1 and serum triglyceride distributions are clear from the graph. For display purposes, an observation whose distance from the edge of the box (i.e. the quartile) is more than 1.5 times the length of the box (i.e. the interquartile range, §4.7) may be called an outlier. Outliers may be shown as separate points (Figure 4.13). The plot can be useful for showing the comparison of several groups (Figure 4.13).
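The display rule for outliers can be checked against the FEV1 five figure summary. The sketch below uses the quartiles, median and extremes quoted in this chapter; the function name is hypothetical.

```python
def outlier_limits(q1, q3, multiplier=1.5):
    """Return the limits beyond which an observation may be called an outlier.

    The limits lie `multiplier` box lengths (interquartile ranges) below the
    first quartile and above the third quartile.
    """
    box_length = q3 - q1
    return q1 - multiplier * box_length, q3 + multiplier * box_length

# Five figure summary for FEV1 (litres), as quoted in the text.
minimum, first_quartile, median, third_quartile, maximum = 2.85, 3.54, 4.10, 4.53, 5.43

lower, upper = outlier_limits(first_quartile, third_quartile)
has_outliers = minimum < lower or maximum > upper

print(f"limits {lower:.3f} to {upper:.3f}; outliers present: {has_outliers}")
```

Both extremes fall inside the limits, so by this rule the FEV1 box and whisker plot would show no separate outlying points.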
4.6 The mean
The median is not the only measure of central value for a distribution. Another is the arithmetic mean or average, usually referred to simply as the mean. This is found by taking the sum of the observations and dividing by their number. For example, consider the following hypothetical data:
2 3 9 5 4 0 6 3 4
The sum is 36 and there are 9 observations, so the mean is 36/9 = 4.0. At this point we will need to introduce some algebraic notation, widely used in statistics. We denote the observations by
$$x_1, x_2, \ldots, x_i, \ldots, x_n$$
There are n observations and the ith of these is $x_i$. For the example, $x_4 = 5$ and n = 9. The sum of all the $x_i$ is
$$\sum_{i=1}^{n} x_i$$
The summation sign is an upper case Greek letter, sigma, the Greek S. When it is obvious that we are adding the values of $x_i$ for all values of i, which runs from 1 to n, we abbreviate this to $\sum x_i$ or simply to $\sum x$. The mean of the $x_i$ is denoted by $\bar{x}$ (‘x bar’), and
$$\bar{x} = \frac{1}{n}\sum x_i$$
The sum of the 57 FEV1s is 231.51 and hence the mean is 231.51/57 = 4.06. This is very close to the median, 4.1, so the median is within 1% of the mean. This is not so for the triglyceride data. The median triglyceride (Table 4.8) is 0.46 but the mean is 0.51, which is higher. The median is 10% away from the mean. If the distribution is symmetrical the sample mean and median will be about the same, but in a skew distribution they will not. If the distribution is skew to the right, as for serum triglyceride, the mean will be greater; if it is skew to the left the median will be greater. This is because the values in the tails affect the mean but not the median.
The sample mean has much nicer mathematical properties than the median and is thus more useful for the comparison methods described later. The median is a very useful descriptive statistic, but not much used for other purposes.
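The different behaviour of the mean and the median when a tail is lengthened can be seen with the nine hypothetical observations. In the sketch below, the second data set is an invented variant, made here purely for illustration, in which the largest value, 9, is replaced by 90.

```python
from statistics import mean, median

# The nine hypothetical observations used in the text.
observations = [2, 3, 9, 5, 4, 0, 6, 3, 4]

# Invented variant: the largest value is replaced by 90, which lengthens
# the right-hand tail without changing the order of the other values.
long_tailed = [90 if x == 9 else x for x in observations]

print(mean(observations), median(observations))   # mean and median agree
print(mean(long_tailed), median(long_tailed))     # only the mean moves
```

The median is unchanged because the middle observation is the same in both sets, while the mean is pulled towards the long tail.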
4.7 Variance, range and interquartile range
The mean and median are measures of the position of the middle of the distribution, which we call the central tendency. We shall also need a measure of the spread or variability of the distribution, called the dispersion.
One obvious measure is the range, the difference between the highest and lowest values. For the data of Table 4.4, the range is 5.43 − 2.85 = 2.58 litres. The range is often presented as the two extremes, 2.85–5.43 litres, rather than their difference. The range is a useful descriptive measure, but has two disadvantages. Firstly, it depends only on the extreme values and so can vary a lot from sample to sample. Secondly, it depends on the sample size. The larger the sample is, the further apart the extremes are likely to be. We can see this if we consider a sample of size 2. If we add a third member to the sample the range will only remain the same if the new observation falls between the other two, otherwise the range will increase. We can get round the second of these problems by using the interquartile range, the difference between the first and third quartiles. For the data of Table 4.4, the interquartile range is 4.53 − 3.54 = 0.99 litres. The interquartile range, too, is often presented as the two extremes, 3.54–4.53 litres. However, the interquartile range is quite variable from sample to sample and is also mathematically intractable. Although a useful descriptive measure, it is not the one preferred for purposes of comparison.
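The dependence of the range on sample size can be illustrated by adding observations one at a time. The sketch below does this for the nine hypothetical observations of §4.6, taken in the order given, and also repeats the range and interquartile range arithmetic for the FEV1 data; the variable names are chosen here for illustration.

```python
# The nine hypothetical observations of section 4.6, in the order given.
observations = [2, 3, 9, 5, 4, 0, 6, 3, 4]

# Range of the first k observations, for k = 2, 3, ..., 9.
running_range = [
    max(observations[:k]) - min(observations[:k])
    for k in range(2, len(observations) + 1)
]
print(running_range)

# Range and interquartile range for the FEV1 data (litres), from the text.
fev1_range = round(5.43 - 2.85, 2)
fev1_interquartile_range = round(4.53 - 3.54, 2)
print(fev1_range, fev1_interquartile_range)
```

Each new observation either leaves the range unchanged or increases it, so the running range can never decrease as the sample grows.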
Table 4.9. Deviations from the mean of 9 observations
Observations $x_i$  Deviations from the mean $x_i - \bar{x}$  Squared deviations $(x_i - \bar{x})^2$
2 -2 4
3 -1 1
9 5 25
5 1 1
4 0 0
0 -4 16
6 2 4
3 -1 1
4 0 0
36 0 52
The most commonly used measures of dispersion are the variance and standard deviation. We start by calculating the difference between each observation and the sample mean, called the deviations from the mean, Table 4.9. If the data are widely scattered, many of the observations $x_i$ will be far from the mean $\bar{x}$ and so many deviations $x_i - \bar{x}$ will be large. If the data are narrowly scattered, very few observations will be far from the mean and so few deviations $x_i - \bar{x}$ will be large. We need some kind of average deviation to measure the scatter. If we add all the deviations together, we get zero, because $\sum(x_i - \bar{x}) = \sum x_i - \sum\bar{x} = \sum x_i - n\bar{x}$ and $n\bar{x} = \sum x_i$. Instead we square the deviations and then add them, as shown in Table 4.9. This removes the effect of sign; we are only measuring the size of the deviation not the direction. This gives us $\sum(x_i - \bar{x})^2$, in the example equal to 52, called the sum of squares about the mean, usually abbreviated to sum of squares.
Clearly, the sum of squares will depend on the number of observations as well as the scatter. We want to find some kind of average squared deviation. This leads to a difficulty. Although we want an average squared deviation, we divide the sum of squares by n − 1, not n. This is not the obvious thing to do and puzzles many students of statistical methods. The reason is that we are interested in estimating the scatter of the population, rather than the sample, and the sum of squares about the sample mean is proportional to n − 1 (§4A, §6B). Dividing by n would lead to small samples producing lower estimates of variability than large samples. The minimum number of observations from which the variability can be estimated is 2; a single observation cannot tell us how variable the data are. If we used n as our divisor, for n = 1 the sum of squares would be zero, giving a variance of zero. With the correct divisor of n − 1, n = 1 gives the meaningless ratio 0/0, reflecting the impossibility of estimating variability from a single observation. The estimate of variability is called the variance, defined by
$$\text{variance} = \frac{1}{n-1}\sum(x_i - \bar{x})^2$$
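We have already said that $\sum(x_i - \bar{x})^2$ is called the sum of squares. The quantity n − 1 is called the degrees of freedom of the variance estimate (§7A). We have:
$$\text{variance} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$$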
We shall usually denote the variance by $s^2$. In the example, the sum of squares is 52 and there are 9 observations, giving 8 degrees of freedom. Hence $s^2 = 52/8 = 6.5$.
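The calculation in Table 4.9 can be followed step by step in code. This is a minimal sketch for the nine hypothetical observations, with variable names chosen here; the final line compares the result with `statistics.variance` from the Python standard library, which also uses the n − 1 divisor.

```python
from statistics import variance

# The nine hypothetical observations of Table 4.9.
observations = [2, 3, 9, 5, 4, 0, 6, 3, 4]

n = len(observations)
mean = sum(observations) / n

# Deviations from the mean; these always add up to zero.
deviations = [x - mean for x in observations]

# Sum of squares about the mean, then divide by the degrees of freedom.
sum_of_squares = sum(d ** 2 for d in deviations)
degrees_of_freedom = n - 1
s_squared = sum_of_squares / degrees_of_freedom

print(sum(deviations), sum_of_squares, s_squared)
print(variance(observations))  # library value, for comparison
```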
The formula $\sum(x_i - \bar{x})^2$ gives us a rather tedious calculation. There is another formula for the sum of squares, which makes the calculation easier to carry out. This is simply an algebraic manipulation of the first form and gives exactly the same answers. We thus have two formulae for variance:
$$s^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$$
$$s^2 = \frac{1}{n-1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right)$$
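The algebra is quite simple and is given in §4B. For example, using the second formula for the nine observations, we have:
$$s^2 = \frac{1}{9-1}\left(2^2 + 3^2 + 9^2 + 5^2 + 4^2 + 0^2 + 6^2 + 3^2 + 4^2 - \frac{36^2}{9}\right) = \frac{196 - 144}{8} = \frac{52}{8} = 6.5$$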
as before. On a calculator this is a much easier formula than the first, as the numbers need only be put in once. It can be inaccurate, because we subtract one large number from another to get a small one. For this reason the first formula would be used in a computer program.
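The loss of accuracy in the second formula can be demonstrated with ordinary floating-point arithmetic. In the sketch below both function names are hypothetical, and the shifted data set is invented for illustration: adding 1 000 000 000 to every observation leaves the true variance at 6.5 but makes the two large totals in the second formula nearly equal.

```python
def variance_from_deviations(xs):
    """First formula: sum of squared deviations from the mean over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_from_totals(xs):
    """Second formula: (sum of squares - (sum)^2 / n) over n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

observations = [2.0, 3.0, 9.0, 5.0, 4.0, 0.0, 6.0, 3.0, 4.0]

# Invented example: the same data with a large constant added to each value.
# The spread, and hence the true variance, is unchanged.
shifted = [x + 1e9 for x in observations]

print(variance_from_deviations(observations), variance_from_totals(observations))
print(variance_from_deviations(shifted), variance_from_totals(shifted))
```

For the original data the two formulae agree. For the shifted data the first still gives 6.5, while the second subtracts two numbers of about 9 × 10^18 that cannot be stored exactly, and the answer it returns is badly wrong.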
4.8 Standard deviation
The variance is calculated from the squares of the observations. This means that it is not in the same units as the observations, which limits its use as a descriptive statistic. The obvious answer to this is to take the square root, which will then have the same units as the observations and the mean. The square root of the variance is called the standard deviation, usually denoted by s. Thus,
$$s = \sqrt{\frac{1}{n-1}\sum(x_i - \bar{x})^2}$$
Returning to the FEV data, we calculate the variance and standard deviation as follows. We have n = 57, $\sum x_i = 231.51$, $\sum x_i^2 = 965.45$:
$$s^2 = \frac{1}{57-1}\left(965.45 - \frac{231.51^2}{57}\right) = \frac{25.154}{56} = 0.449$$
$$s = \sqrt{0.449} = 0.67 \text{ litres}$$
Figure 4.14 shows the relationship between mean, standard deviation and frequency distribution. For FEV1, we see that the majority of observations are within one standard deviation of the mean, and nearly all within two standard deviations of the mean. There is a small part of the histogram outside the $\bar{x} - 2s$ to $\bar{x} + 2s$ interval, on either side of this symmetrical histogram. As Figure 4.14 also shows, this is true for the highly skew triglyceride data, too. In this case, however, the outlying observations are all in one tail of the distribution. In general, we expect roughly 2/3 of observations to lie within one standard deviation of the mean and 95% to lie within two standard deviations of the mean.
![Page 114: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/114.jpg)
Fig. 4.14. Histograms of FEV1 and triglyceride with mean and standard deviation
Table 4.10. Population of 100 random digits for a sampling experiment
9 1 0 7 5 6 9 5 8 8 1 0 5 7
1 8 8 8 5 2 4 8 3 1 6 5 5 7
2 8 1 8 5 8 4 0 1 9 2 1 6 9
1 9 7 9 7 2 7 7 0 8 1 6 3 8
7 0 2 8 8 7 2 5 4 1 8 6 8 3
Appendices
4A Appendix: The divisor for the variance
The variance is found by dividing the sum of squares about the sample mean by n − 1, not by n. This is because we want the scatter about the population mean, and the scatter about the sample mean is always less. The sample mean is ‘closer’ to the data points than is the population mean. We shall try a little sampling experiment to show this. Table 4.10 shows a set of 100 random digits which we shall take as the population to be sampled. They have mean 4.74 and the sum of squares about the mean is 811.24. Hence the average squared difference from the mean is 8.1124. We can take samples of size two at random from this population using a pair of decimal dice, which will enable us to choose any digit numbered from 00 to 99. The first pair chosen was 5 and 6, which has mean 5.5. The sum of squares about the population mean 4.74 is (5 − 4.74)² + (6 − 4.74)² = 1.655. The sum of squares about the sample mean is (5 − 5.5)² + (6 − 5.5)² = 0.5.
The sum of squares about the population mean is greater than the sum of squares about the sample mean, and this will always be so. Table 4.11 shows this for 20 such samples of size two. The average sum of squares about the population mean is 13.6, and about the sample mean it is 5.7. Hence dividing by the sample size (n = 2) we have mean square differences of 6.8 about the population mean and 2.9 about the sample mean. Compare this to 8.1 for the population as a whole. We see that the mean square difference about the population mean is quite close to 8.1, while the mean square difference about the sample mean is much less. However, if we divide the sum of squares about the sample mean by n − 1, i.e. 1, instead of n we have 5.7, which is not much different from the 6.8 from the sum of squares about the population mean.
Table 4.11. Sampling pairs from Table 4.10
Sample    ∑(xi − µ)²    ∑(xi − x̄)²
5 6        1.655         0.5
8 8       21.255         0.0
6 1       15.575        12.5
9 3       21.175        18.0
5 5        0.135         0.0
7 7       10.215         0.0
1 7       19.095        18.0
9 8       28.775         0.5
3 3        6.055         0.0
5 1       14.055         8.0
8 3       13.655        12.5
5 7        5.175         2.0
5 2        7.575         4.5
5 7        5.175         2.0
8 8       21.255         0.0
3 2       10.535         0.5
0 4       23.015         8.0
9 3       21.175        18.0
5 2        7.575         4.5
6 9       19.735         4.5
Mean      13.6432        5.7
Table 4.12. Mean sums of squares about the sample mean for sets of 100 random samples from Table 4.10
Number in sample, n    Mean variance estimates
                       (1/n)∑(xi − x̄)²    (1/(n − 1))∑(xi − x̄)²
2                      4.5                 9.1
3                      5.4                 8.1
4                      5.9                 7.9
5                      6.2                 7.7
10                     7.2                 8.0
Table 4.12 shows the results of a similar experiment with more samples being taken. The table shows the two average variance estimates using n and n − 1 as the divisor of the sum of squares, for sample sizes 2, 3, 4, 5 and 10. We see that the sum of squares about the sample mean divided by n increases steadily with sample size, but if we divide it by n − 1 instead of n the estimate does not change as the sample size increases. The sum of squares about the sample mean is proportional to n − 1.
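The sampling experiment can be repeated exhaustively rather than with dice. The sketch below is an illustration under stated assumptions: the population is hypothetical (the digits 0 to 9, one of each, not the 100 digits of Table 4.10), samples are drawn with replacement, and every possible ordered sample is enumerated so that the averages are exact rather than subject to sampling variation.

```python
from itertools import product

# Hypothetical population: the digits 0-9, one of each (not Table 4.10).
population = list(range(10))
mu = sum(population) / len(population)
population_mean_square = sum((x - mu) ** 2 for x in population) / len(population)


def mean_variance_estimates(n):
    """Average of SS/n and of SS/(n - 1) over every ordered sample of size n."""
    total_ss = 0.0
    count = 0
    for sample in product(population, repeat=n):
        xbar = sum(sample) / n
        total_ss += sum((x - xbar) ** 2 for x in sample)
        count += 1
    mean_ss = total_ss / count
    return mean_ss / n, mean_ss / (n - 1)


print("population mean square:", population_mean_square)
for n in (2, 3, 4):
    by_n, by_n_minus_1 = mean_variance_estimates(n)
    print(n, round(by_n, 3), round(by_n_minus_1, 3))
```

As in Table 4.12, the divisor n gives an average that rises with sample size but stays below the population value, while the divisor n − 1 gives the population value at every sample size.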
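4B Appendix: Formulae for the sum of squares
The different formulae for sums of squares are derived as follows:
$$\sum (x_i - \bar{x})^2 = \sum \left(x_i^2 - 2 x_i \bar{x} + \bar{x}^2\right) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$$
because x̄ has the same value for each of the n observations. Now, ∑xi = nx̄, so
$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$
We thus have three formulae for variance:
$$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum x_i^2 - n\bar{x}^2\right) = \frac{1}{n-1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right)$$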
4M Multiple choice questions 14 to 19
(Each branch is either true or false)
14. Which of the following are qualitative variables:
(a) sex;
(b) parity;
(c) diastolic blood pressure;
(d) diagnosis;
(e) height.
15. Which of the following are continuous variables:
(a) blood glucose;
(b) peak expiratory flow rate;
(c) age last birthday;
(d) exact age;
(e) family size.
16. When a distribution is skew to the right:
(a) the median is greater than the mean;
(b) the distribution is unimodal;
(c) the tail on the left is shorter than the tail on the right;
(d) the standard deviation is less than the variance;
(e) the majority of observations are less than the mean.
17. The shape of a frequency distribution can be described using:
(a) a box and whisker plot;
(b) a histogram;
(c) a stem and leaf plot;
(d) mean and variance;
(e) a table of frequencies.
18. For the sample 3, 1, 7, 2, 2:
(a) the mean is 3;
(b) the median is 7;
(c) the mode is 2;
(d) the range is 1;
(e) the variance is 5.5.
19. Diastolic blood pressure has a distribution which is slightly skew to the right. If the mean and standard deviation were calculated for the diastolic pressures of a random sample of men:
(a) there would be fewer observations below the mean than above it;
(b) the standard deviation would be approximately equal to the mean;
(c) the majority of observations would be more than one standard deviation from the mean;
(d) the standard deviation would estimate the accuracy of blood pressure measurement;
(e) about 95% of observations would be expected to be within two standard deviations of the mean.
4E Exercise: Mean and standard deviation
This exercise gives some practice in one of the most fundamental calculations in statistics, that of the sum of squares and standard deviation. It also shows the relationship of the standard deviation to the frequency distribution. Table 4.13 shows blood glucose levels obtained from a group of medical students.
1. Make a stem and leaf plot for these data.
2. Find the minimum, maximum and quartiles and sketch a box and whisker plot.
3. Find the frequency distribution, using a class interval of 0.5.
Table 4.13. Random blood glucose levels from a group of first year medical students (mmol/litre)
4.7 3.6 3.8 2.2 4.7 4.1 3.6 4.0 4.4 5.1
4.2 4.1 4.4 5.0 3.7 3.6 2.9 3.7 4.7 3.4
3.9 4.8 3.3 3.3 3.6 4.6 3.4 4.5 3.3 4.0
3.4 4.0 3.8 4.1 3.8 4.4 4.9 4.9 4.3 6.0
4. Sketch the histogram of this frequency distribution. What term best describes the shape: symmetrical, skew to the right or skew to the left?
5. For the first column only, i.e. for 4.7, 4.2, 3.9, and 3.4, calculate the standard deviation using the deviations from the mean formula
$$s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$$
First calculate the sum of the observations and hence the mean. Then calculate the deviations from the mean and their squares. Hence calculate the sum of squares about the mean, the variance and the standard deviation.
6. For the same four numbers, calculate the standard deviation using the formula
$$s = \sqrt{\frac{1}{n-1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right)}$$
First calculate the sum of the observations and the sum of the observations squared. Hence calculate the sum of squares about the mean. Is this the same as that found in 5 above? Hence calculate the variance and the standard deviation.
7. Use the following summations for the whole sample: ∑xi = 162.2, ∑xi² = 676.74. Calculate the mean of the sample, the sum of squares about the mean, the degrees of freedom for this sum of squares, and hence estimate the variance and standard deviation.
8. Calculate the mean ± one standard deviation and mean ± two standard deviations. Indicate these points and the mean on the histogram. What do you observe about their relationship to the frequency distribution?
5
Presenting data
5.1 Rates and proportions
Having collected our data as described in Chapters 2 and 3 and extracted information from it using the methods of Chapter 4, we must find a way to convey this information to others. In this chapter we shall look at some of the methods of doing that. We begin with rates and proportions.
When we have data in the form of frequencies, we often need to compare the frequency with certain conditions in groups containing different totals. In Table 2.1, for example, two groups of patient pairs were compared, 29 where the later patient had a C-T scan and 89 where neither had a C-T scan. The later patient did better in 9 of the first group and 34 of the second group. To compare these frequencies we compare the proportions 9/29 and 34/89. These are 0.31 and 0.38, and we can conclude that there is little difference. In Table 2.1, these were given as percentages, that is, the proportion out of 100 rather than out of 1, to avoid the decimal point. In Table 2.8, the Salk vaccine trial, the proportions contracting polio were presented as the number per 100000 for the same reason.
A rate expresses the frequency of the characteristic of interest per 1000 (or per 100000, etc.) of the population, per unit of time. For example, in Table 3.1, the results of the study of smoking by doctors, the data were presented as the number of deaths per 1000 doctors per year. This is not a proportion, as a further adjustment has been made to allow for the time period observed. Furthermore, the rate has been adjusted to take account of any differences in the age distributions of smokers and non-smokers (§16.2). Sometimes the actual denominator for a rate may be continually changing. The number of deaths from lung cancer among men in England and Wales for 1983 was 26502. The denominator for the death rate, the number of males in England and Wales, changed throughout 1983, as some died, some were born, some left the country and some entered it. The death rate is calculated by using a representative number, the estimated population at the end of June 1983, the middle of the year. This was 24175900, giving a death rate of 26502/24175900, which equals 0.001096, or 109.6 deaths per 100000 at risk per year. A number of the rates used in medical statistics are described in §16.5.
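The proportions and the death rate above are simple divisions, sketched here with the figures quoted in the text. The variable names are ours.

```python
# Proportions from Table 2.1: later patient did better, out of the pairs in each group.
proportion_scan = 9 / 29
proportion_no_scan = 34 / 89
print(f"{proportion_scan:.2f} and {proportion_no_scan:.2f}")

# Lung cancer deaths among men, England and Wales, 1983,
# divided by the estimated mid-year male population.
deaths = 26502
mid_year_population = 24175900

rate = deaths / mid_year_population
per_100000 = rate * 100000

print(f"{rate:.6f}")
print(f"{per_100000:.1f} deaths per 100000 at risk per year")
```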
The use of rates and proportions enables us to compare frequencies obtained from unequal sized groups, base populations or time periods, but we must beware of their use when their bases or denominators are not given. Victora (1982) reported a drug advertisement sent to doctors which described the antibiotic phosphomycin as being ‘100% effective in chronic urinary infections’. This is very impressive. How could we fail to prescribe a drug which is 100% effective? The study on which this was based used 8 patients, after excluding ‘those whose urine contained phosphomycin-resistant bacteria’. If the advertisement had said the drug was effective in 100% of 8 cases, we would have been less impressed. Had we known that it worked in 100% of 8 cases selected because it might work in them, we would have been still less impressed. The same paper quotes an advertisement for a cold remedy, where 100% of patients showed improvement. This was out of 5 patients! As Victora remarked, such small samples are understandable in the study of very rare diseases, but not for the common cold.
Sometimes we can fool ourselves as well as others by omitting denominators. I once carried out a study of the distribution of the soft tissue tumour Kaposi's sarcoma in Tanzania (Bland et al. 1977), and while writing it up I came across a paper setting out to do the same thing (Schmid 1973). One of the factors studied was tribal group, of which there are over 100 in Tanzania. This paper reported ‘the tribal incidence in the Wabende, Wambwe and Washirazi is remarkable… These small tribes, each with fewer than 90000 people, constitute the group in which a tribal factor can be suspected’. This is based on the following rates of tumours per 10000 population: national, 0.1; Wabende, 1.3; Wambwe, 0.7; Washirazi, 1.3. These are very big rates compared to the national, but the populations on which they are based are small, 8000, 14000 and 15000 respectively (Egero and Henin 1973). To get a rate of 1.3/10000 out of 8000 Wabende people we must have 8000 × 1.3/10000 = 1 case! Similarly we have 1 case among the 14000 Wambwe and 2 among the 15000 Washirazi. We can see that there are not enough data to draw the conclusions which the author has done. Rates and proportions are powerful tools and we must beware of them becoming detached from the original data.
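The number of cases behind each tribal rate follows from multiplying the rate by the population. This short sketch uses only the rates and populations quoted above; the dictionary layout and names are ours.

```python
# Rate per 10000 population and tribal population, as quoted in the text.
tribes = {
    "Wabende": (1.3, 8000),
    "Wambwe": (0.7, 14000),
    "Washirazi": (1.3, 15000),
}

cases = {}
for name, (rate_per_10000, population) in tribes.items():
    implied = rate_per_10000 * population / 10000
    cases[name] = round(implied)   # a count of cases must be a whole number
    print(f"{name}: {implied:.2f}, i.e. about {cases[name]} case(s)")
```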
5.2 Significant figures
When we calculated the death rate due to lung cancer among men in 1983 we quoted the answer as 0.001096 or 109.6 per 100000 per year. This is an approximation. The rate to the greatest number of figures my calculator will give is 0.001096215653 and this number would probably go on indefinitely, turning into a recurring series of digits. The decimal system of representing numbers cannot in general represent fractions exactly. We know that 1/2 = 0.5, but 1/3 = 0.33333333…, recurring infinitely. This does not usually worry us, because for most applications the difference between 0.333 and 1/3 is too small to matter. Only the first few non-zero digits of the number are important and we call these the significant digits or significant figures. There is usually little point in quoting statistical data to more than three significant figures. After all, it hardly matters whether the lung cancer mortality rate is 0.001096 or 0.001097. The value 0.001096 is given to 4 significant figures. The leading zeros are not significant, the first significant digit in this number being ‘1’. To three significant figures we get 0.00110, because the last digit is 6 and so the 9 which precedes it is rounded up to 10. Note that significant figures are not the same as decimal places. The number 0.00110 is given to 5 decimal places, the number of digits after the decimal point. When rounding to the nearest digit, we leave the last significant digit, 9 in this case, if what follows it is less than 5, and increase by one if what follows is greater than 5. When we have exactly 5, I would always round up, i.e. 1.5 goes to 2. This means that 0, 1, 2, 3, 4 go down and 5, 6, 7, 8, 9 go up, which seems unbiased. Some writers take the view that 5 should go up half the time and down half the time, since it is exactly midway between the preceding digit and that digit plus one. Various methods are suggested for doing this but I do not recommend them myself. In any case, it is usually a mistake to round to so few significant figures that this matters.
How many significant figures we need depends on the use to which the number is to be put and on how accurate it is anyway. For example, if we have a sample of 10 sublingual temperatures measured to the nearest half degree, there is little point in quoting the mean to more than 3 significant figures. What we should not do is to round numbers to a few significant figures before we have completed our calculations. In the lung cancer mortality rate example, suppose we round the numerator and denominator to two significant figures. We have 27000/24000000 = 0.001125 and the answer is only correct to two figures. This can spread through calculations causing errors to build up. We always try to retain several more significant figures than we required for the final answer.
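Rounding to significant figures, with an exact 5 always going up, can be sketched as follows. The function `round_sig` is our own illustrative helper, not part of any library; it uses Python's standard `decimal` module so that the trailing zero in 0.00110 is kept. Note that Python's built-in `round` sends exact halves to the nearest even digit, which is one of the alternative conventions mentioned above, so it is not used here.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_sig(value, sig):
    """Round a decimal string (or Decimal) to `sig` significant figures, halves going up."""
    d = Decimal(value)
    if d == 0:
        return d
    # Power of ten of the last digit to be kept.
    exponent = d.adjusted() - (sig - 1)
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)


print(round_sig("0.001096", 3))   # leading zeros are not significant
print(round_sig("1.5", 1))        # an exact 5 is rounded up

# Rounding too early: numerator and denominator cut to two significant figures first.
early = round_sig("26502", 2) / round_sig("24175900", 2)
# Rounding only at the end, to four significant figures.
late = round_sig(Decimal("26502") / Decimal("24175900"), 4)
print(early, late)
```

The early-rounded ratio agrees with the late-rounded one only in its first two significant figures, which is the point made in the text.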
Consider Table 5.1. This shows mortality data in terms of the exact numbers of deaths in one year. The table is taken from a much larger table (OPCS 1991) which shows the numbers dying from every cause of death in the International Classification of Diseases (ICD), which gives numerical codes to many hundreds of causes of death. The full table, which also gives deaths by age group, covers 70 A4 pages. Table 5.1 shows deaths for broad groups of diseases called ICD chapters. This table is not a good way to present these data if we want to get an understanding of the frequency distribution of cause of death, and the differences between causes in men and women. This is even more true of the 70 page original. This is not the purpose of the table, of course. It is a source of data, a reference document from which users extract information for their own purposes. Let us see how Table 5.1 can be simplified. First, we can reduce the number of significant figures. Let us be extreme and reduce the data to one significant figure (Table 5.2). This makes comparisons rather easier, but it is still not obvious which are the most important causes of death. We can improve this by re-ordering the table to put the most frequent cause, diseases of the circulatory system, first (Table 5.3). We can also combine a lot of the smaller categories into an ‘others’ group. I did this arbitrarily, by combining all those accounting for less than 2% of the total. Now it is clear at a glance that the most important causes of death in England and Wales are diseases of the circulatory system, neoplasms and diseases of the respiratory system, and that these dwarf all the others. Of course, mortality is not the only indicator of the importance of a disease. ICD chapter XIII, diseases of the musculo-skeletal system and connective tissues, are easily seen from Table 5.2 to be only minor causes of death, but this group includes arthritis and rheumatism, the most important illness in its effects on daily activity.
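The three simplifying steps (combine small categories, re-order by frequency, round to one significant figure) can be sketched in a few lines. This is an illustration only: it is applied to the male column of Table 5.1, some category labels are abbreviated, the 2% cut-off is taken against the male total, and `round_one_sig` is our own helper.

```python
# Male deaths by ICD chapter, England and Wales, 1989 (Table 5.1).
male_deaths = {
    "Infectious and parasitic": 1246,
    "Neoplasms (cancers)": 75172,
    "Endocrine, nutritional, metabolic, immunity": 4395,
    "Blood and blood forming organs": 1002,
    "Mental disorders": 4493,
    "Nervous system and sense organs": 5466,
    "Circulatory system": 127435,
    "Respiratory system": 33489,
    "Digestive system": 7900,
    "Genitourinary system": 3616,
    "Pregnancy, childbirth and the puerperium": 0,
    "Skin and subcutaneous tissues": 250,
    "Musculo-skeletal and connective tissues": 1235,
    "Congenital anomalies": 897,
    "Perinatal conditions": 122,
    "Signs, symptoms and ill-defined conditions": 1582,
    "Injury and poisoning": 11073,
}


def round_one_sig(count):
    """Round a non-negative integer to one significant figure, halves going up."""
    if count == 0:
        return 0
    magnitude = 10 ** (len(str(count)) - 1)
    return (2 * count + magnitude) // (2 * magnitude) * magnitude


total = sum(male_deaths.values())
# Keep causes accounting for at least 2% of the total; pool the rest as 'others'.
major = {cause: d for cause, d in male_deaths.items() if d >= 0.02 * total}
others = total - sum(major.values())
ordered = sorted(major.items(), key=lambda item: item[1], reverse=True)

for cause, deaths in ordered:
    print(f"{cause:22s} {round_one_sig(deaths):>7d}")
print(f"{'Others':22s} {round_one_sig(others):>7d}")
print(f"{'Total':22s} {round_one_sig(total):>7d}")
```

Under these assumptions the output matches the male column of Table 5.3.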
Table 5.1. Deaths by sex and cause, England and Wales, 1989 (OPCS 1991, DH2 No. 10)
I.C.D. chapter and type of disease                                             Number of deaths
                                                                               Males     Females
I      Infectious and parasitic                                                1246      1297
II     Neoplasms (cancers)                                                     75172     69948
III    Endocrine, nutritional and metabolic diseases and immunity disorders    4395      5758
IV     Blood and blood forming organs                                          1002      1422
V      Mental disorders                                                        4493      9225
VI     Nervous system and sense organs                                         5466      5990
VII    Circulatory system                                                      127435    137165
VIII   Respiratory system                                                      33489     33223
IX     Digestive system                                                        7900      10779
X      Genitourinary system                                                    3616      4156
XI     Complications of pregnancy, childbirth and the puerperium               0         56
XII    Skin and subcutaneous tissues                                           250       573
XIII   Musculo-skeletal system and connective tissues                          1235      4139
XIV    Congenital anomalies                                                    897       869
XV     Certain conditions originating in the perinatal period                  122       118
XVI    Signs, symptoms and ill-defined conditions                              1582      3082
XVII   Injury and poisoning                                                    11073     6427
Total                                                                          279373    294227
5.3 Presenting tables
Tables 5.1, 5.2 and 5.3 illustrate a number of useful points about the presentation of tables. Like all the tables in this book, they are designed to stand alone from the text. There is no need to refer to material buried in some paragraph to interpret the table. A table is intended to communicate information, so it should be easy to read and understand. A table should have a clear title, stating clearly and unambiguously what the table represents. The rows and columns must also be labelled clearly.
When proportions, rates or percentages are used in a table together with frequencies, they must be easy to distinguish from one another. This can be done, as in Table 2.10, by adding a ‘%’ symbol, or by including a place of decimals. The addition in Table 2.10 of the ‘total’ row and the ‘100%’ makes it clear that the percentages are calculated from the number in the treatment group, rather than the number with that particular outcome or the total number of patients.
Table 5.2. Deaths by sex and cause, England and Wales, 1989, rounded to one significant figure
I.C.D. chapter and type of disease                                             Number of deaths
                                                                               Males     Females
I      Infectious and parasitic                                                1000      1000
II     Neoplasms (cancers)                                                     80000     70000
III    Endocrine, nutritional and metabolic diseases and immunity disorders    4000      6000
IV     Blood and blood forming organs                                          1000      1000
V      Mental disorders                                                        4000      9000
VI     Nervous system and sense organs                                         5000      6000
VII    Circulatory system                                                      100000    100000
VIII   Respiratory system                                                      30000     30000
IX     Digestive system                                                        8000      10000
X      Genitourinary system                                                    4000      4000
XI     Complications of pregnancy, childbirth and the puerperium               0         60
XII    Skin and subcutaneous tissues                                           300       600
XIII   Musculo-skeletal system and connective tissues                          1000      4000
XIV    Congenital anomalies                                                    900       900
XV     Certain conditions originating in the perinatal period                  100       100
XVI    Signs, symptoms and ill-defined conditions                              2000      3000
XVII   Injury and poisoning                                                    10000     6000
Total                                                                          300000    300000
Table 5.3. Deaths by sex, England and Wales, 1989, for major causes
I.C.D. chapter and type of disease    Number of deaths
                                      Males     Females
Circulatory system (VII)              100000    100000
Neoplasms (cancers) (II)              80000     70000
Respiratory system (VIII)             30000     30000
Injury and poisoning (XVII)           10000     6000
Digestive system (IX)                 8000      10000
Others                                20000     20000
Total                                 300000    300000
5.4 Pie charts
It is often convenient to present data pictorially. Information can be conveyed much more quickly by a diagram than by a table of numbers. This is particularly useful when data are being presented to an audience, as here the information has to be got across in a limited time. It can also help a reader get the salient points of a table of numbers. Unfortunately, unless great care is taken, diagrams can also be very misleading and should be treated only as an addition to numbers, not a replacement.
We have already discussed methods of illustrating the frequency distribution of a qualitative variable. We will now look at an equivalent of the histogram for qualitative data, the pie chart or pie diagram. This shows the relative frequency for each category by dividing a circle into sectors, the angles of which are proportional to the relative frequency. We thus multiply each relative frequency by 360, to give the corresponding angle in degrees.
Table 5.4. Calculations for a pie chart of the distribution of cause of death
Cause of death          Frequency    Relative frequency    Angle (degrees)
Circulatory system      137165       0.46619               168
Neoplasms (cancers)     69948        0.23773               86
Respiratory system      33223        0.11292               41
Injury and poisoning    6427         0.02184               8
Digestive system        10779        0.03663               13
Nervous system          5990         0.02036               7
Others                  30695        0.10432               38
Total                   294227       1.00000               361
Fig. 5.1. Pie chart showing the distribution of cause of death among females, England and Wales, 1989
Table 5.4 shows the calculation for drawing a pie chart to represent the distribution of cause of death for females, using the data of Tables 5.1 and 5.3. (The total degrees are 361 rather than 360 because of rounding errors in the calculations.) The resulting pie chart is shown in Figure 5.1. This diagram is said to resemble a pie cut into pieces for serving, hence the name.
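The calculation in Table 5.4 can be reproduced directly: each relative frequency is multiplied by 360 and rounded to a whole degree. This is a minimal sketch using the frequencies from the table; the variable names are ours.

```python
# Female deaths by cause, as grouped in Table 5.4.
frequencies = {
    "Circulatory system": 137165,
    "Neoplasms (cancers)": 69948,
    "Respiratory system": 33223,
    "Injury and poisoning": 6427,
    "Digestive system": 10779,
    "Nervous system": 5990,
    "Others": 30695,
}

total = sum(frequencies.values())
angles = {}
for cause, frequency in frequencies.items():
    relative_frequency = frequency / total
    angles[cause] = round(relative_frequency * 360)   # sector angle in whole degrees
    print(f"{cause:22s} {frequency:7d} {relative_frequency:.5f} {angles[cause]:4d}")

print("Total angle:", sum(angles.values()))
```

Rounding each angle separately is what makes the total come to 361 rather than 360.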
5.5 Bar charts
A bar chart or bar diagram shows data in the form of horizontal or vertical bars. For example, Table 5.5 shows the mortality due to cancer of the oesophagus in England and Wales over a 10 year period. Figure 5.2 shows these data in the form of a bar chart, the heights of the bars being proportional to the mortality.
Table 5.5. Cancer of the oesophagus: standardized mortality rate per 100000 per year, England and Wales, 1960–1969
Year    Mortality rate    Year    Mortality rate
60      5.1               65      5.4
61      5.0               66      5.4
62      5.2               67      5.6
63      5.2               68      5.8
64      5.2               69      6.0
Fig. 5.2. Bar chart showing the relationship between mortality due to cancer of the oesophagus and year, England and Wales, 1960–1969
There are many uses for bar charts. As in Figure 5.2, they can be used to show the relationship between two variables, one being quantitative and the other either qualitative or a quantitative variable which is grouped, as is time in years. The values of the first variable are shown by the heights of bars, one bar for each category of the second variable.
Bar charts can be used to represent relationships between more than two variables. Figure 5.3 shows the relationship between children's reports of breathlessness and cigarette smoking by themselves and their parents. We can see quickly that the prevalence of the symptom increases both with the child's smoking and with that of their parents. In the published paper reporting these respiratory symptom data (Bland et al. 1978) the bar chart was not used; the data were given in the form of tables. They were thus available for other researchers to compare to their own or to carry out calculations upon. The bar chart was used to present the results during a conference, where the most important thing was to convey an outline of the analysis quickly.
Bar charts can also be used to show frequencies. For example, Figure 5.4(a) shows the relative frequency distributions of causes of death among men and women, and Figure 5.4(b) shows the frequency distribution of cause of death among men. Figure 5.4(b) looks very much like a histogram. The distinction between these two terms is not clear. Most statisticians would describe Figures 4.3, 4.4, and 4.6 as histograms, and Figures 5.2 and 5.3 as bar charts, but I have seen books which actually reverse this terminology and others which reserve the term ‘histogram’ for a frequency density graph, like Figures 4.4 and 4.6.
Fig. 5.3. Bar chart showing the relationship between the prevalence of self-reported breathlessness among schoolchildren and two possible causative factors
Fig. 5.4. Bar charts showing data from Table 5.1
5.6 Scatter diagrams
The bar chart would be a rather clumsy method for showing the relationship between two continuous variables, such as vital capacity and height (Table 5.6). For this we use a scatter diagram or scattergram (Figure 5.5). This is made by marking the scales of the two variables along horizontal and vertical axes. Each pair of measurements is plotted with a cross, circle, or some other suitable symbol at the point indicated by using the measurements as coordinates.
Table 5.7 shows serum albumin measured from a group of alcoholic patients and a group of controls (Hickish et al. 1989). We can use a scatter diagram to present these data also. The vertical axis represents albumin and we choose two arbitrary points on the horizontal axis to represent the groups.
Table 5.6. Vital capacity (VC) and height for 44 female medical students
Height (cm)    VC (litres)    Height (cm)    VC (litres)    Height (cm)    VC (litres)    Height (cm)
155.0          2.20           161.2          3.39           166.0          3.66           170.0
155.0          2.65           162.0          2.88           166.0          3.69           171.0
155.4          3.06           162.0          2.96           166.6          3.06           171.0
158.0          2.40           162.0          3.12           167.0          3.48           171.5
160.0          2.30           163.0          2.72           167.0          3.72           172.0
160.2          2.63           163.0          2.82           167.0          3.80           172.0
161.0          2.56           163.0          3.40           167.6          3.06           174.0
161.0          2.60           164.0          2.90           167.8          3.70           174.2
161.0          2.80           165.0          3.07           168.0          2.78           176.0
161.0          2.90           166.0          3.03           168.0          3.63           177.0
161.0          3.40           166.0          3.50           169.4          2.80           180.6
Fig. 5.5. Scatter diagram showing the relationship between vital capacity and height for a group of female medical students
Table 5.7. Albumin measured in alcoholics and controls
Alcoholics Controls
15 28 39 41 44 48 34 41 43 45 45
16 29 39 43 45 48 39 42 43 45 45
17 32 39 43 45 49 39 42 43 45 45
18 37 40 44 46 51 40 42 43 45 46
20 38 40 44 46 51 41 42 44 45 46
21 38 40 44 46 52 41 42 44 45 47
28 38 41 44 47 41 42 44 45 47
Fig. 5.6. Scatter diagrams showing the data of Table 5.7
In Table 5.7 there are many identical observations in each group, so we need to allow for this in the scatter diagram. If there is more than one observation at the same coordinate we can indicate this in several ways. We can use the number of observations in place of the chosen symbol, but this method is becoming obsolete. As in Figure 5.6(a), we can displace the points slightly in a random direction (called jittering). This is what Stata does and so what I have done in most of this book. Alternatively, we can use a systematic sideways shift, to form a more orderly picture as in Figure 5.6(b). The latter is often used when the variable on the horizontal axis is categorical rather than continuous. Such scatter diagrams are very useful for checking the assumptions of some of the analytical methods which we shall use later. A scatter diagram where one variable is a group is also called a dot plot. As a presentational device, they enable us to show far more information than a bar chart of the group means can do. For this reason, statisticians usually prefer them to other types of graphical display.
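The two ways of separating tied points can be sketched as coordinate calculations, without any plotting library. Everything here is an assumption made for illustration: the function names, the step and jitter sizes, the horizontal position chosen for the group, and the simplification that jitter is applied horizontally only. It is not a description of how Stata or the figures in the book do it.

```python
import random
from collections import Counter


def jitter(x, amount=0.1, rng=random):
    """Displace a plotting coordinate by a small random amount."""
    return x + rng.uniform(-amount, amount)


def sideways_offsets(values, step=0.05):
    """Systematic shifts: tied values are spread evenly about the group position."""
    totals = Counter(values)
    seen = Counter()
    offsets = []
    for v in values:
        rank = seen[v]            # number of earlier points sharing this value
        seen[v] += 1
        offsets.append((rank - (totals[v] - 1) / 2) * step)
    return offsets


# Last column of the controls in Table 5.7, which contains several ties.
albumin = [45, 45, 45, 46, 46, 47, 47]
group_position = 2.0              # arbitrary point on the horizontal axis

shifted = [group_position + dx for dx in sideways_offsets(albumin)]
jittered = [jitter(group_position) for _ in albumin]

print(list(zip(shifted, albumin)))
print(list(zip(jittered, albumin)))
```

The systematic version gives the same picture every time, whereas the jittered coordinates change from run to run unless a seeded random generator is passed in.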
5.7 Line graphs and time series
The data of Table 5.5 are ordered in a way that those of Table 5.6 are not, in that they are recorded at intervals in time. Such data are called a time series. If we plot a scatter diagram of such data, as in Figure 5.7, it is natural to join successive points by lines to form a line graph. We do not even need to mark the points at all; all we need is the line. This would not be sensible in Figure 5.5, as the observations are independent of one another and quite unrelated, whereas in Figure 5.7 there is likely to be a relationship between adjacent points. Here the mortality rate recorded for cancer of the oesophagus will depend on a number of things which vary over time including possibly causal factors, such as tobacco and alcohol consumption, and clinical factors, such as better diagnostic techniques and methods of treatment.
Line graphs are particularly useful when we want to show the change of more than one quantity over time. Figure 5.8 shows levels of zidovudine (AZT) in the blood of AIDS patients at several times after administration of the drug, for patients with normal fat absorption and with fat malabsorption (§10.7). The difference in response between the two groups is very clear.
Fig. 5.7. Line graph showing changes in cancer of the oesophagus mortality over time
Fig. 5.8. Line graph to show the response to administration of zidovudine in two groups of AIDS patients
5.8 Misleading graphs
Figure 5.2 is clearly titled and labelled and can be read independently of the surrounding text. The principles of clarity outlined for tables apply equally here. After all, a diagram is a method of conveying information quickly and this object is defeated if the reader or audience has to spend time trying to sort out exactly what a diagram really means. Because the visual impact of diagrams can be so great, further problems arise in their use.
The first of these is the missing zero. Figure 5.9 shows a second bar chart representing the data of Table 5.5. This chart appears to show a very rapid increase in mortality, compared to the gradual increase shown in Figure 5.2. Yet both show the same data. Figure 5.9 omits most of the vertical scale, and instead stretches that small part of the scale where the change takes place. Even when we are aware of this, it is difficult to look at this graph and not think that it shows a large increase in mortality. It helps if we visualize the baseline as being somewhere near the bottom of the page.
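The size of the distortion can be put into numbers by comparing the drawn heights of the first and last bars. The sketch below uses the 1960 and 1969 rates from Table 5.5; the truncated baseline of 4.8 is a hypothetical value chosen for illustration, since the text does not state where the scale of Figure 5.9 starts.

```python
rate_1960 = 5.1   # Table 5.5
rate_1969 = 6.0   # Table 5.5


def bar_height_ratio(first, last, baseline):
    """Ratio of the drawn bar heights when the vertical axis starts at `baseline`."""
    return (last - baseline) / (first - baseline)


true_ratio = bar_height_ratio(rate_1960, rate_1969, baseline=0.0)
truncated_ratio = bar_height_ratio(rate_1960, rate_1969, baseline=4.8)   # hypothetical axis start

print(f"axis from zero: last bar is {true_ratio:.2f} times the first")
print(f"axis from 4.8:  last bar is {truncated_ratio:.2f} times the first")
```

The data show an increase of under one fifth, but with the assumed truncated axis the last bar is drawn about four times as tall as the first.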
Fig. 5.9. Bar chart with zero omitted on the vertical scale
There is no zero on the horizontal axis in Figures 5.2 and 5.9, either. There are two reasons for this. There is no practical ‘zero time’ on the calendar; we use an arbitrary zero. Also, there is an unstated assumption that mortality rates vary with time and not the other way round.
The zero is omitted in Figure 5.5. This is almost always done in scatter diagrams, yet if we are to gauge the importance of the relationship between vital capacity and height by the relative change in vital capacity over the height range we need the zero on the vital capacity scale. The origin is often omitted on scatter diagrams because we are usually concerned with the existence of a relationship and the distributions followed by the observations, rather than its magnitude. We estimate the latter in a different way, described in Chapter 11.
Line graphs are particularly at risk of undergoing the sort of distortion of missing zero described in §5.8. Many computer programs resist drawing bar charts like Figure 5.9, but will produce a line graph with a truncated scale as the default. Figure 5.10 shows a line graph with a truncated scale, corresponding to Figure 5.9. Just as there, the message of the graph is a dramatic increase in mortality, which the data themselves do not really support. We can make this even more dramatic by stretching the vertical scale and compressing the horizontal scale. The effect is now really impressive and looks much more likely than Figure 5.7 to attract research funds, Nobel prizes and interviews on television. Huff (1954) aptly names such horrors ‘gee whiz’ graphs. They are even more dramatic if we omit the scales altogether and show only the soaring line.
Fig. 5.10. Line graphs with a missing zero and with a stretched vertical and compressed horizontal scale, a ‘gee whiz’ graph
Fig. 5.11. Figure 5.1 with three-dimensional effects
This is not to say that authors who show only part of the scale are deliberately trying to mislead. There are often good arguments against graphs with vast areas of boring blank paper. In Figure 5.5, we are not interested in vital capacities near zero and can feel quite justified in excluding them. In Figure 5.10 we certainly are interested in zero mortality; it is surely what we are aiming for. The point is that graphs can so easily mislead the unwary reader, so let the reader beware.
The advent of powerful personal computers led to an increase in the ability to produce complicated graphics. Simple charts, such as Figure 5.1, are informative but not visually exciting. One way of decorating such graphs is to make them appear three-dimensional. Figure 5.11 shows the effect. The angles are no longer proportional to the numbers which they represent. The areas are, but because they are different shapes it is difficult to compare them. This defeats the primary object of conveying information quickly and accurately. Another approach to decorating diagrams is to turn them into pictures. In a pictogram the bars of the bar chart are replaced by pictures. Pictograms can be highly misleading, as the height of a picture, drawn with three-dimensional effect, is proportional to the number represented, but what we see is the volume. Such decorated graphs are like the illuminated capitals of medieval manuscripts: nice to look at but hard to read. I think they should be avoided.
Fig. 5.12. Tuberculosis mortality in England and Wales, 1871–1971 (DHSS 1976)
Huff (1954) recounts that the president of a chapter of the American Statistical Association criticized him for accusing presenters of data of trying to deceive. The statistician argued that incompetence was the problem. Huff's reply was that diagrams frequently sensationalize by exaggeration and rarely minimize anything, that presenters of data rarely distort those data to make their case appear weaker than it is. The errors are too one-sided for us to ignore the possibility that we are being fooled. When presenting data, especially graphically, be very careful that the data are shown fairly. When on the receiving end, beware!
5.9 Logarithmic scales
Figure 5.12 shows a line graph representing the fall in tuberculosis mortality in England and Wales over 100 years (DHSS 1976). We can see a rather unsteady curve, showing the continuing decline in the disease. Figure 5.12 also shows the mortality plotted on a logarithmic (or log) scale. A logarithmic scale is one where two pairs of points will be the same distance apart if their ratios are equal, rather than their differences. Thus the distance between 1 and 10 is equal to that between 10 and 100, not to that between 10 and 19. (See §5A if you do not understand this.) The logarithmic line shows a clear kink in the curve about 1950, the time when a number of effective anti-TB measures, chemotherapy with streptomycin, BCG vaccine and mass screening with X-rays, were introduced. If we consider the properties of logarithms (§5A), we can see how the log scale for the tuberculosis mortality data produced such sharp changes in the curve. If the relationship is such that the mortality is falling with a constant proportion, such as 10% per year, the absolute fall each year depends on the absolute level in the preceding year:
mortality in 1960 = constant × mortality in 1959
So if we plot mortality on a log scale we get:
log(mortality in 1960) = log(constant) + log(mortality in 1959)
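For mortality in 1961, we have
log(mortality in 1961) = log(constant) + log(mortality in 1960) = 2 × log(constant) + log(mortality in 1959)
Hence we get a straight line relationship between log mortality and time t:
log(mortality after t years) = t × log(constant) + log(mortality at start)
When the constant proportion changes, the slope of the straight line formed by plotting log(mortality) against time changes and there is a very obvious kink in the line.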
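The straight line and the kink can be checked numerically. The following is a minimal sketch in Python, using invented figures (a starting value of 1000, a 10% fall per year for five years, then a 30% fall per year); none of these numbers come from the tuberculosis data, and the function name `mortality_series` is only illustrative.

```python
import math

def mortality_series(start, yearly_factors):
    """Apply a multiplicative factor each year, starting from `start`."""
    values = [start]
    for factor in yearly_factors:
        values.append(values[-1] * factor)
    return values

# Hypothetical figures: a 10% fall per year for 5 years, then 30% per year.
factors = [0.9] * 5 + [0.7] * 5
series = mortality_series(1000.0, factors)
log_series = [math.log10(v) for v in series]

# Year-on-year changes: the absolute falls differ every year, while the
# changes in log10(mortality) are constant within each period.
abs_steps = [b - a for a, b in zip(series, series[1:])]
log_steps = [b - a for a, b in zip(log_series, log_series[1:])]
for year, (a, l) in enumerate(zip(abs_steps, log_steps), start=1):
    print(f"year {year:2d}: absolute change {a:8.1f}, log10 change {l:.4f}")
```

Within each period the log change equals log10 of the yearly factor, so the log plot is a straight line whose slope changes at year 5, which is the kink described above.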
Log scales are very useful analytic tools. However, a graph on a log scale can be very misleading if the reader does not allow for the nature of the scale. The log scale in Figure 5.12 shows the increased rate of reduction in mortality associated with the anti-TB measures quite plainly, but it gives the impression that these measures were important in the decline of TB. This is not so. If we look at the corresponding point on the natural scale, we can see that all these measures did was to accelerate a decline which had been going on for a long time (see Radical Statistics Health Group 1976).
Appendices
5A Appendix: Logarithms
Logarithms are not simply a method of calculation dating from before the computer age, but a set of fundamental mathematical functions. Because of their special properties they are much used in statistics. We shall start with logarithms (or logs for short) to base 10, the common logarithms used in calculations. The log to base 10 of a number x is y where
$x = 10^y$
We write $y = \log_{10}(x)$. Thus for example $\log_{10}(10) = 1$, $\log_{10}(100) = 2$, $\log_{10}(1000) = 3$, $\log_{10}(10000) = 4$, and so on. If we multiply two numbers, the log of the product is the sum of their logs:
$\log(xy) = \log(x) + \log(y)$
For example,
$100 \times 1000 = 10^2 \times 10^3 = 10^{2+3} = 10^5 = 100000$
Or in log terms:
$\log_{10}(100 \times 1000) = \log_{10}(100) + \log_{10}(1000) = 2 + 3 = 5$
Hence, $100 \times 1000 = 10^5 = 100000$. This means that any multiplicative relationship of the form
$y = a \times b \times c \times d$
can be made additive by a log transformation:
$\log(y) = \log(a) + \log(b) + \log(c) + \log(d)$
This is the process underlying the fit to the Lognormal Distribution described in §7.4.
There is no need to use 10 as the base for logarithms. We can use any number. The log of a number x to base b can be found from the log to base a by a simple calculation:
$\log_b(x) = \log_a(x) / \log_a(b)$
Ten is convenient for arithmetic using log tables, but for other purposes it is less so. For example, the gradient, slope or differential of the curve $y = \log_{10}(x)$ is $\log_{10}(e)/x$, where e = 2.718281… is a constant which does not depend on the base of the logarithm. This leads to awkward constants spreading through formulae. To keep this to a minimum we use logs to the base e, called natural or Napierian logarithms after the mathematician John Napier. This is the logarithm usually produced by LOG(X) functions in computer languages.
Figure 5.13 shows the log curve for three different bases, 2, e and 10. The curves all go through the point (1, 0), i.e. log(1) = 0. As x approaches 0, log(x) becomes a larger and larger negative number, tending towards minus infinity as x tends to zero. There are no logs of negative numbers. As x increases from 1, the curve becomes flatter and flatter. Though log(x) continues to increase, it does so more and more slowly. The curves all go through (base, 1), i.e. log(base) = 1. The curve for log to the base 2 goes through (2, 1), (4, 2), (8, 3) because $2^1 = 2$, $2^2 = 4$, $2^3 = 8$. We can see that the effect of replacing data by their logs will be to stretch out the scale at the lower end and contract it at the upper.
We often work with logarithms of data rather than the data themselves. This may have several advantages. Multiplicative relationships may become additive, curves may become straight lines and skew distributions may become symmetrical.
We transform back to the natural scale using the antilogarithm or antilog. If $y = \log_{10}(x)$, $x = 10^y$ is the antilog of y. If $z = \log_e(x)$, $x = e^z$ or x = exp(z) is the antilog of z. If your computer program does not transform back, most calculators have $e^x$ and $10^x$ functions for this purpose.
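The properties in this appendix are easy to try out on a computer. The sketch below uses Python's standard `math` module, where `math.log` is the natural log and `math.log10` the common log. The value 42.0 is an arbitrary illustration, not taken from the text.

```python
import math

# Product rule: log10(100 * 1000) = log10(100) + log10(1000)
product_log = math.log10(100 * 1000)
sum_of_logs = math.log10(100) + math.log10(1000)

# Change of base: the log of 8 to base 2, found from natural logs
log2_of_8 = math.log(8) / math.log(2)

# Antilogs transform back to the natural scale
x = 42.0  # arbitrary illustrative value
back_from_log10 = 10 ** math.log10(x)
back_from_ln = math.exp(math.log(x))

print(product_log, sum_of_logs)
print(log2_of_8)
print(back_from_log10, back_from_ln)
```

Because computers store these numbers to limited precision, results of this kind should be compared with a small tolerance rather than for exact equality.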
Fig. 5.13. Logarithmic curves to three different bases
5M Multiple choice questions 20 to 24
(Each branch is either true or false)
20. ‘After treatment with Wondermycin, 66.67% of patients made a complete recovery’
(a) Wondermycin is wonderful;
(b) this statement may be misleading because the denominator is not given;
(c) the number of significant figures used suggests a degree of precision which may not be present;
(d) some control information is required before we can draw any conclusions about Wondermycin;
(e) there might be only a very small number of patients.
21. The number 1729.54371:
(a) to two significant figures is 1700;
(b) to three significant figures is 1720;
(c) to six decimal places is 1729.54;
(d) to three decimal places is 1729.544;
(e) to five significant figures is 1729.5.
Fig. 5.14. A dubious graph
22. Figure 5.14:
(a) shows a histogram;
(b) should have the vertical axis labelled;
(c) should show the zero on the vertical axis;
(d) should show the zero on the horizontal axis;
(e) should show the units for the vertical axis.
23. Logarithmic scales used in graphs showing time trends:
(a) show changes in the trend clearly;
(b) often produce straight lines;
(c) give a clear idea of the magnitude of changes;
(d) should show the zero point from the original scale;
(e) compress intervals between large numbers compared to those between small numbers.
24. The following methods can be used to show the relationship between two variables:
(a) histogram;
(b) pie chart;
(c) scatter diagram;
(d) bar chart;
(e) line graph.
Table 5.8. Weekly geriatric admissions in Wandsworth Health District from May to September, 1982 and 1983 (Fish et al. 1985)
| Week | 1982 | 1983 | Week | 1982 | 1983 |
|---|---|---|---|---|---|
| 1 | 24 | 20 | 12 | 11 | 25 |
| 2 | 22 | 17 | 13 | 6 | 22 |
| 3 | 21 | 21 | 14 | 10 | 26 |
| 4 | 22 | 17 | 15 | 13 | 12 |
| 5 | 24 | 22 | 16 | 19 | 33 |
| 6 | 15 | 23 | 17 | 13 | 19 |
| 7 | 23 | 20 | 18 | 17 | 21 |
| 8 | 21 | 16 | 19 | 10 | 28 |
| 9 | 18 | 24 | 20 | 16 | 19 |
| 10 | 21 | 21 | 21 | 24 | 13 |
| 11 | 17 | 20 | 22 | 15 | 29 |
5E Exercise: Creating graphs
In this exercise we shall display graphically some of the data we have studied so far.
1. Table 4.1 shows diagnoses of patients in a hospital census. Display these data as a graph.
2. Table 2.8 shows the paralytic polio rates for several groups of children. Construct a bar chart for the results from the randomized control areas.
3. Table 3.1 shows some results from the study of mortality in British doctors. Show these graphically.
4. Table 5.8 shows the numbers of geriatric admissions in Wandsworth Health District for each week from May to September in 1982 and 1983. Show these data graphically. Why do you think the two years were different?
6
Probability
6.1 Probability
We use data from a sample to draw conclusions about the population from which it is drawn. For example, in a clinical trial we might observe that a sample of patients given a new treatment respond better than patients given an old treatment. We want to know whether an improvement would be seen in the whole population of patients, and if so how big it might be. The theory of probability enables us to link samples and populations, and to draw conclusions about populations from samples. We shall start the discussion of probability with some simple randomizing devices, such as coins and dice, but the relevance to medical problems should soon become apparent.
We first ask what exactly is meant by ‘probability’. In this book I shall take the frequency definition: the probability that an event will happen under given circumstances may be defined as the proportion of repetitions of those circumstances in which the event would occur in the long run. For example, if we toss a coin it comes down either heads or tails. Before we toss it, we have no way of knowing which will happen, but we do know that it will either be heads or tails. After we have tossed it, of course, we know exactly what the outcome is. If we carry on tossing our coin, we should get several heads and several tails. If we go on doing this for long enough, then we would expect to get as many heads as we do tails. So the probability of a head being thrown is half, because in the long run a head should occur on half of the throws. The number of heads which might arise in several tosses of the coin is called a random variable, that is, a variable which can take more than one value with given probabilities. In the same way, a thrown die can show six faces, numbered one to six, with equal probability. We can investigate random variables such as the number of sixes in a given number of throws, the number of throws before the first six, and so on. There is another, broader definition of probability which leads to a different approach to statistics, the Bayesian school (Bland and Altman 1998), but it is beyond the scope of this book.
The frequency definition of probability also applies to continuous measurement, such as human height. For example, suppose the median height in a population of women is 168 cm. Then half the women are above 168 cm in height. If we choose women at random (i.e. without the characteristics of the woman influencing the choice) then in the long run half the women chosen will have heights above 168 cm. The probability of a woman having height above 168 cm is one half. Similarly, if 1/10 of the women have height greater than 180 cm, a woman chosen at random will have height greater than 180 cm with probability 1/10. In the same way we can find the probability of height being between any given values. When we measure a continuous quantity we are always limited by the method of measurement, and so when we say a woman's height is 170 cm we mean that it is between, say, 169.5 and 170.5 cm, depending on the accuracy with which we measure. So what we are interested in is the probability of the random variable taking values between certain limits rather than particular values.
6.2 Properties of probability
The following simple properties follow from the definition of probability.
1. A probability lies between 0.0 and 1.0. When the event never happens the probability is 0.0, when it always happens the probability is 1.0.
2. Addition rule. Suppose two events are mutually exclusive, i.e. when one happens the other cannot happen. Then the probability that one or the other happens is the sum of their probabilities. For example, a thrown die may show a one or a two, but not both. The probability that it shows a one or a two = 1/6 + 1/6 = 2/6.
3. Multiplication rule. Suppose two events are independent, i.e. knowing one has happened tells us nothing about whether the other happens. Then the probability that both happen is the product of their probabilities. For example, suppose we toss two coins. One coin does not influence the other, so the results of the two tosses are independent, and the probability of two heads occurring is 1/2 × 1/2 = 1/4. Consider two independent events A and B. The proportion of times A happens in the long run is the probability of A. Since A and B are independent, of those times when A happens, a proportion, equal to the probability of B, will have B happen also. Hence the proportion of times that A and B happen together is the probability of A multiplied by the probability of B.
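The addition and multiplication rules can be checked by listing equally likely outcomes and counting. The sketch below does this in Python for the die and two-coin examples above; it assumes a fair die and fair coins, and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Addition rule: on one throw of a fair die, "one" and "two" are mutually exclusive.
die = range(1, 7)
p_one = Fraction(sum(1 for face in die if face == 1), 6)
p_two = Fraction(sum(1 for face in die if face == 2), 6)
p_one_or_two = Fraction(sum(1 for face in die if face in (1, 2)), 6)

# Multiplication rule: two independent fair coins give four equally likely outcomes.
tosses = list(product("HT", repeat=2))
p_first_head = Fraction(sum(1 for t in tosses if t[0] == "H"), len(tosses))
p_second_head = Fraction(sum(1 for t in tosses if t[1] == "H"), len(tosses))
p_two_heads = Fraction(sum(1 for t in tosses if t == ("H", "H")), len(tosses))

print(p_one_or_two, p_one + p_two)
print(p_two_heads, p_first_head * p_second_head)
```

Python prints fractions in lowest terms, so 2/6 appears as 1/3.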
6.3 Probability distributions and random variables
Suppose we have a set of events which are mutually exclusive and which includes all the events which can possibly happen. The sum of their probabilities is 1.0. The set of these probabilities makes up a probability distribution. For example, if we toss a coin the two possibilities, head or tail, are mutually exclusive and these are the only events which can happen. The probability distribution is:
PROB(head) = 1/2
PROB(tail) = 1/2
Now, let us define a variable, which we will denote by the symbol X, such that X = 0 if the coin shows a tail and X = 1 if the coin shows a head. X is the number of heads shown on a single toss, which must be 0 or 1. We do not know before the toss what X will be, but do know the probability of it having any possible value. X is a random variable (§6.1) and the probability distribution is also the distribution of X. We can represent this with a diagram, as in Figure 6.1(a).
Fig. 6.1. Probability distributions for the number of heads shown in the toss of one coin and in tosses of two coins
What happens if we toss two coins at once? We now have four possible events: a head and a head, a head and a tail, a tail and a head, a tail and a tail. Clearly, these are equally likely and each has probability 1/4. Let Y be the number of heads. Y has three possible values: 0, 1, and 2. Y = 0 only when we get a tail and a tail and has probability 1/4. Similarly, Y = 2 only when we get a head and a head, so has probability 1/4. However, Y = 1 either when we get a head and a tail, or when we have a tail and a head, and so has probability 1/4 + 1/4 = 1/2. We can write this probability distribution as:
PROB(Y = 0) = 1/4
PROB(Y = 1) = 1/2
PROB(Y = 2) = 1/4
The probability distribution of Y is shown in Figure 6.1(b).
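The distributions of X and Y can be built the same way as in the argument above, by listing every equally likely outcome and counting heads. This is a minimal Python sketch assuming fair coins; the function name `heads_distribution` is only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_coins):
    """Distribution of the number of heads in n_coins fair tosses, by listing every outcome."""
    outcomes = list(product("HT", repeat=n_coins))
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}

dist_x = heads_distribution(1)  # X in the text: one coin
dist_y = heads_distribution(2)  # Y in the text: two coins

for name, dist in (("X", dist_x), ("Y", dist_y)):
    for value, prob in dist.items():
        print(f"PROB({name} = {value}) = {prob}")
    print("sum of probabilities:", sum(dist.values()))
```

Listing outcomes works for a handful of coins, but the number of outcomes doubles with each extra coin, so it soon becomes impractical.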
6.4 The Binomial distribution
We have considered the probability distributions of two random variables: X, the number of heads in one toss of a coin, taking values 0 and 1, and Y, the number of heads when we toss two coins, taking values 0, 1 or 2. We can increase the number of coins; Figure 6.2 shows the distribution of the number of heads obtained when 15 coins are tossed. We do not need the probability of a ‘head’ to be 0.5: we can count the number of sixes when dice are thrown. Figure 6.2 also shows the distribution of the number of sixes obtained from 10 dice. In general, we can think of the coin or the die as trials, which can have outcomes success (head or six) or failure (tail or one to five). The distributions in Figures 6.1 and 6.2 are all examples of the Binomial distribution, which arises frequently in medical applications. The Binomial distribution is the distribution followed by the number of successes in n independent trials when the probability of any single trial being a success is p. The Binomial distribution is in fact a family of distributions, the members of which are defined by the values of n and p. The values which define which member of the distribution family we have are called the parameters of the distribution.
Fig. 6.2. Distribution of the number of heads shown when 15 coins are tossed and of the number of sixes shown when 10 dice are thrown, examples of the Binomial distribution
Simple randomizing devices like coins and dice are of interest in themselves, but not of obvious relevance to medicine. However, suppose we are carrying out a random sample survey to estimate the unknown prevalence, p, of a disease. Since members of the sample are chosen at random and independently from the population, the probability of any chosen subject having the disease is p. We thus have a series of independent trials, each with probability of success p, and the number of successes, i.e. members of the sample with the disease, will follow a Binomial distribution. As we shall see later, the properties of the Binomial distribution enable us to say how accurate is the estimate of prevalence obtained (§8.4).
We could calculate the probabilities for a Binomial distribution by listing all the ways in which, say, 15 coins can fall. However, there are $2^{15} = 32768$ combinations of 15 coins, so this is not very practical. Instead, there is a formula for the probability in terms of the number of throws and the probability of a head. This enables us to work these probabilities out for any probability of success and any number of trials. In general, we have n independent trials with the probability that a trial is a success being p. The probability of r successes is
$$\text{PROB}(r \text{ successes}) = \frac{n!}{r!(n-r)!}\, p^r (1-p)^{n-r}$$
where n!, called n factorial, is $n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$. This rather forbidding formula arises like this. For any particular series of r successes, each with probability p, and n − r failures, each with probability 1 − p, the probability of the series happening is $p^r(1-p)^{n-r}$, since the trials are independent and the multiplicative rule applies. The number of ways in which r things may be chosen from n things is $n!/[r!(n-r)!]$ (§6A). Only one combination can happen at one time, so we have $n!/[r!(n-r)!]$ mutually exclusive ways of having r successes, each with probability $p^r(1-p)^{n-r}$. The probability of having r successes is the sum of these $n!/[r!(n-r)!]$ probabilities, giving the formula above. Those who remember the binomial expansion in mathematics will see that this is one term of it, hence the name Binomial distribution.
Fig. 6.3. Binomial distributions with different n, p = 0.3
We can apply this to the number of heads in tosses of two coins. The number of heads will be from a Binomial distribution with p = 0.5 and n = 2. Hence the probability of two heads (r = 2) is:
$$\text{PROB}(r = 2) = \frac{2!}{2!\,0!} \times 0.5^2 \times 0.5^0 = 1 \times 0.25 \times 1 = 0.25$$
Note that 0! = 1 (§6A), and anything to the power 0 is 1. Similarly for r = 1 and r = 0:
$$\text{PROB}(r = 1) = \frac{2!}{1!\,1!} \times 0.5^1 \times 0.5^1 = 2 \times 0.5 \times 0.5 = 0.5$$
$$\text{PROB}(r = 0) = \frac{2!}{0!\,2!} \times 0.5^0 \times 0.5^2 = 1 \times 1 \times 0.25 = 0.25$$
This is what was found for two coins in §6.3. We can use this distribution whenever we have a series of trials with two possible outcomes. If we treat a group of patients, the number who recover is from a Binomial distribution. If we measure the blood pressure of a group of people, the number classified as hypertensive is from a Binomial distribution.
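The Binomial probability, the number of ways n!/[r!(n−r)!] multiplied by p to the power r and (1−p) to the power n−r, is short to program. The sketch below uses `math.comb` from the Python standard library for the number of combinations; the function name `binomial_prob` is an illustrative choice, not a library routine.

```python
from math import comb

def binomial_prob(r, n, p):
    """Probability of r successes in n independent trials, each with success probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Two fair coins (n = 2, p = 0.5), as in the text
two_coins = [binomial_prob(r, 2, 0.5) for r in range(3)]
print(two_coins)

# Number of sixes when 10 dice are thrown (n = 10, p = 1/6)
sixes = [binomial_prob(r, 10, 1 / 6) for r in range(11)]
print(sum(sixes))  # the probabilities of a distribution should sum to 1, up to rounding
```

Statistical packages provide the same calculation ready-made, so a hand-written function like this is mainly useful for seeing how the formula works.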
Figure 6.3 shows the Binomial distribution for p = 0.3 and increasing values of n. The distribution becomes more symmetrical as n increases. It is converging to the Normal distribution, described in the next chapter.
6.5 Mean and variance
The number of different probabilities in a Binomial distribution can be very large and unwieldy. When n is large, we usually need to summarize these probabilities in some way. Just as a frequency distribution can be described by its mean and variance, so can a probability distribution and its associated random variable.
The mean is the average value of the random variable in the long run. It is also called the expected value or expectation and the expectation of a random variable X is usually denoted by E(X). For example, consider the number of heads in tosses of two coins. We get 0 heads in 1/4 of pairs of coins, i.e. with probability 1/4. We get 1 head in 1/2 of pairs of coins, and 2 heads in 1/4 of pairs. The average value we should get in the long run is found by multiplying each value by the proportion of pairs in which it occurs and adding:
$$0 \times \frac{1}{4} + 1 \times \frac{1}{2} + 2 \times \frac{1}{4} = 0 + \frac{1}{2} + \frac{1}{2} = 1$$
If we kept on tossing pairs of coins, the average number of heads per pair would be 1. Thus for any random variable which takes discrete values the mean, expectation or expected value is found by summing each possible value multiplied by its probability.
Note that the expected value of a random variable does not have to be a value that the random variable can actually take. For example, for the mean number of heads in throws of one coin we have either no heads or 1 head, each with probability half, and the expected value is 0 × ½ + 1 × ½ = ½. The number of heads must be 0 or 1, but the expected value is half, the average which we would get in the long run.
The variance of a random variable is the average squared difference from the mean. For the number of heads in tosses of two coins, 0 is 1 unit from the mean and occurs for 1/4 of pairs of coins, 1 is 0 units from the mean and occurs for half of the pairs and 2 is 1 unit from the mean and occurs for 1/4 of pairs, i.e. with probability 1/4. The variance is then found by squaring these differences, multiplying by the proportion of times the difference will occur (the probability) and adding:
$$(0-1)^2 \times \frac{1}{4} + (1-1)^2 \times \frac{1}{2} + (2-1)^2 \times \frac{1}{4} = \frac{1}{4} + 0 + \frac{1}{4} = \frac{1}{2}$$
We denote the variance of a random variable X by VAR(X). In mathematical terms,
$$\text{VAR}(X) = E\left(X^2 - E(X)^2\right)$$
The square root of the variance is the standard deviation of the random variable or distribution. We often use the Greek letters µ, pronounced ‘mu’, and σ, ‘sigma’, for the mean and standard deviation of a probability distribution. The variance is then $\sigma^2$.
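For a discrete random variable both quantities are simple sums over the possible values. The sketch below represents a distribution as a Python dictionary mapping each value to its probability, a representation chosen here for illustration, and reproduces the coin examples with exact fractions.

```python
from fractions import Fraction

def mean(dist):
    """Expected value: each possible value multiplied by its probability, then added."""
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    """Average squared difference from the mean."""
    mu = mean(dist)
    return sum((value - mu) ** 2 * prob for value, prob in dist.items())

one_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

print(mean(one_coin), variance(one_coin))
print(mean(two_coins), variance(two_coins))

# The same variance from the expected square minus the squared expectation
mean_of_squares = sum(value**2 * prob for value, prob in two_coins.items())
print(mean_of_squares - mean(two_coins) ** 2)
```

The last line gives the same answer as `variance(two_coins)`, which is the identity stated in mathematical terms above.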
The mean and variance of the distribution of a continuous variable, of which more in Chapter 7, are defined in a similar way. Calculus is used to define them as integrals, but this need not concern us here. Essentially what happens is that the continuous scale is broken up into many very small intervals and the value of the variable in that very small interval is multiplied by the probability of being in it, then these are added.
6.6 Properties of means and variances
When we use the mean and variance of probability distributions in statistical calculations, it is not the details of their formulae which we need to know, but some of their simple properties. Most of the formulae used in statistical calculations are derived from these. The reasons for these properties are quite easy to see in a non-mathematical way.
If we add a constant to a random variable, the new variable so created has a mean equal to that of the original variable plus the constant. The variance and standard deviation will be unchanged. Suppose our random variable is human height. We can add a constant to the height by measuring the heights of people standing on a box. The mean height of people plus box will now be the mean height of the people plus the constant height of the box. The box will not alter the variability of the heights, however. The difference between the tallest and smallest, for example, will be unchanged. We can subtract a constant by asking the people to stand in a constant hole to be measured. This reduces the mean but leaves the variance unchanged as before. (My free program Clinstat (§1.3) has a simple graphics program which illustrates this.)
If we multiply a random variable by a positive constant, the mean and standard deviation are multiplied by the constant, and the variance is multiplied by the square of the constant. For example, if we change our units of measurements, say from inches to centimetres, we multiply each measurement by 2.54. This has the effect of multiplying the mean by the constant, 2.54, and multiplying the standard deviation by the constant since it is in the same units as the observations. However, the variance is measured in squared units, and so is multiplied by the square of the constant. Division by a constant works in the same way. If the constant is negative, the mean is multiplied by the constant and so changes sign. The variance is multiplied by the square of the constant, which is positive, so the variance remains positive. The standard deviation, which is the square root of the variance, is always positive. It is multiplied by the absolute value of the constant, i.e. the constant without the negative sign.
If we add two random variables the mean of the sum is the sum of the means, and, if the two variables are independent, the variance of the sum is the sum of their variances. We can do this by measuring the height of people standing on boxes of random height. The mean height of people on boxes is the mean height of people + the mean height of the boxes. The variability of the heights is also increased. This is because some short people will find themselves on small boxes, and some tall people will find themselves on large boxes. If the two variables are not independent, something different happens. The mean of the sum remains the sum of the means, but the variance of the sum is not the sum of the variances. Suppose our people have decided to stand on the boxes, not just at a statistician's whim, but for a purpose. They wish to change a light bulb, and so must reach a required height. Now the short people must pick large boxes, whereas tall people can make do with small ones. The result is a reduction in variability to almost nothing. On the other hand, if we told the tallest people to find the largest boxes and the shortest to find the smallest boxes, the variability would be increased. Independence is an important condition.
If we subtract one random variable from another, the mean of the difference is the difference between the means, and, if the two variables are independent, the variance of the difference is the sum of their variances. Suppose we measure the heights above ground level of our people standing in holes of random depth. The mean height above ground is the mean height of the people minus the mean depth of the hole. The variability is increased, because some short people stand in deep holes and some tall people stand in shallow holes. If the variables are not independent, the additivity of the variances breaks down, as it did for the sum of two variables. When the people try to hide in the holes, and so must find a hole deep enough to hold them, the variability is again reduced.
The effects of multiplying two random variables and of dividing one by another are much more complicated. Fortunately we rarely need to do this.
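These properties can be verified exactly for small discrete distributions. The sketch below is an illustration only: it uses a fair die and a fair coin as the two independent random variables (in place of people and boxes), represents each distribution as a dictionary of value to probability, and uses helper names (`transform`, `combine`) that are assumptions of this example.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((value - mu) ** 2 * prob for value, prob in dist.items())

def transform(dist, func):
    """Distribution of func(X) when X has distribution dist."""
    out = {}
    for value, prob in dist.items():
        key = func(value)
        out[key] = out.get(key, 0) + prob
    return out

def combine(dist_x, dist_y, func):
    """Distribution of func(X, Y) for independent X and Y (joint probability = product)."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        key = func(x, y)
        out[key] = out.get(key, 0) + px * py
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

shifted = transform(die, lambda x: x + 10)        # add a constant
scaled = transform(die, lambda x: -3 * x)         # multiply by a negative constant
total = combine(die, coin, lambda x, y: x + y)    # sum of independent variables
diff = combine(die, coin, lambda x, y: x - y)     # difference of independent variables

print(mean(die), variance(die))
print(mean(shifted), variance(shifted))
print(mean(scaled), variance(scaled))
print(mean(total), variance(total))
print(mean(diff), variance(diff))
```

Note that `combine` builds in independence by multiplying the two probabilities. For dependent variables the joint probabilities would have to be supplied directly, and the variances would no longer add.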
We can now find the mean and variance of the Binomial distribution with parameters n and p. First consider n = 1. Then the probability distribution is:
PROB(0 successes) = 1 − p
PROB(1 success) = p
The mean is therefore 0 × (1 − p) + 1 × p = p. The variance is
$$(0-p)^2 \times (1-p) + (1-p)^2 \times p = p(1-p)(p + 1 - p) = p(1-p)$$
Now, a variable from the Binomial distribution with parameters n and p is the sum of n independent variables from the Binomial distribution with parameters 1 and p. So its mean is the sum of n means all equal to p, and its variance is the sum of n variances all equal to p(1 − p). Hence the Binomial distribution has mean = np and variance = np(1 − p). For large sample problems, these are more useful than the Binomial probability formula.
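The results mean = np and variance = np(1 − p) can be checked against the individual Binomial probabilities. The sketch below does so for n = 10 and p = 3/10, values chosen only for illustration, using exact fractions and `math.comb` for the number of combinations.

```python
from fractions import Fraction
from math import comb

def binomial_dist(n, p):
    """All Binomial probabilities for r = 0..n, as a dictionary r -> probability."""
    return {r: comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)}

n, p = 10, Fraction(3, 10)  # illustrative parameters
dist = binomial_dist(n, p)

mean = sum(r * prob for r, prob in dist.items())
var = sum((r - mean) ** 2 * prob for r, prob in dist.items())

print(mean, n * p)
print(var, n * p * (1 - p))
```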
The properties of means and variances of random variables enable us to find a formal solution to the problem of degrees of freedom for the sample variance discussed in Chapter 4. We want an estimate of variance whose expected value is the population variance. The expected value of $\sum (x_i - \bar{x})^2$ can be shown to be (n − 1)VAR(x) (§6B) and hence we divide by n − 1, not n, to get our estimate of variance.
Fig. 6.4. Poisson distributions with four different means
6.7* The Poisson distribution
The Binomial distribution is one of many probability distributions which are used in statistics. It is a discrete distribution, that is it can take only a finite set of possible values, and is the discrete distribution most commonly encountered in medical applications. One other discrete distribution is worth discussing at this point, the Poisson distribution. Although, like the Binomial, the Poisson distribution arises from a simple probability model, the mathematics involved is more complicated and will be omitted.
Suppose events happen randomly and independently in time at a constant rate. The Poisson distribution is the distribution followed by the number of events which happen in a fixed time interval. If events happen with rate µ events per unit time, the probability of r events happening in unit time is
$$\text{PROB}(r \text{ events}) = \frac{e^{-\mu}\mu^r}{r!}$$
where e = 2.718…, the mathematical constant. If events happen randomly and independently in space, the Poisson distribution gives the probabilities for the number of events in unit volume or area.
There is seldom any need to use individual probabilities of this distribution, as its mean and variance suffice. The mean of the Poisson distribution for the number of events per unit time is simply the rate, µ. The variance of the Poisson distribution is also equal to µ. Thus the Poisson is a family of distributions, like the Binomial, but with only one parameter, µ. This distribution is important, because deaths from many diseases can be treated as occurring randomly and independently in the population. Thus, for example, the number of deaths from lung cancer in one year among people in an occupational group, such as coal miners, will be an observation from a Poisson distribution, and we can use this to make comparisons between mortality rates (§16.3).
Figure 6.4 shows the Poisson distribution for four different means. You will see that as the mean increases the Poisson distribution looks rather like the Binomial distribution in Figure 6.3. We shall discuss this similarity further in the next chapter.
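Although individual Poisson probabilities are seldom needed, computing them gives a quick numerical check that the mean and the variance both equal µ. The sketch below uses the standard Poisson probability, exp(−µ) × µ^r / r!. The rate µ = 4 is an arbitrary illustration, and the sum is cut off at r = 99, beyond which the probabilities are negligible for this rate.

```python
import math

def poisson_prob(r, mu):
    """Probability of r events in unit time when events occur at rate mu."""
    return math.exp(-mu) * mu**r / math.factorial(r)

mu = 4.0  # hypothetical rate, events per unit time
probs = [poisson_prob(r, mu) for r in range(100)]  # truncated: r = 0..99

total = sum(probs)
mean = sum(r * p for r, p in enumerate(probs))
var = sum((r - mean) ** 2 * p for r, p in enumerate(probs))

print(total, mean, var)
```

Unlike the Binomial, there is no largest possible value of r, which is why the sum has to be truncated. For much larger rates the cut-off would need to be raised.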
![Page 171: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/171.jpg)
6.8*ConditionalprobabilitySometimesweneedtothinkabouttheprobabilityofaneventifanothereventhashappened.Forexample,wemightaskwhatistheprobablitythatapatienthascoronaryarterydiseaseifheorshehastinglingpainintheleftarm.Thisiscalledaconditionalprobability,theprobabilityoftheevent(coronaryarterydisease)givenacondition(tinglingpain).Wewritethisprobabilitythus,separatingtheeventandtheconditionbyaverticalbar:
PROB(coronary artery disease | tingling pain)

Conditional probabilities are useful in statistical aids to diagnosis (§15.7). For a simpler example, we can go back to tosses of two coins. If we toss one coin then the other, the first toss alters the probabilities for the possible outcomes for the two coins:

PROB(both coins heads | first coin head) = 0.5
PROB(head and tail | first coin head) = 0.5
PROB(both coins tails | first coin head) = 0.0

and

PROB(both coins heads | first coin tail) = 0.0
PROB(head and tail | first coin tail) = 0.5
PROB(both coins tails | first coin tail) = 0.5

The multiplicative rule (§6.2) can be extended to deal with events which are not independent. For two events A and B:

PROB(A and B) = PROB(A | B) PROB(B) = PROB(B | A) PROB(A).
It is important to understand that PROB(A | B) and PROB(B | A) are not the same. For example, Table 6.1 shows the relationship between two diseases, hay fever and eczema, in a large group of children. The probability that in this group a child with hay fever will have eczema also is

PROB(eczema | hay fever) = 141/1069 = 0.13

the proportion of children with hay fever who have eczema also. This is clearly much less than the probability that a child with eczema will have hay fever,

PROB(hay fever | eczema) = 141/561 = 0.25

the proportion of children with eczema who have hay fever also.
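Table 6.1. Relationship between hay fever and eczema at age 11 in the National Child Development Study

| Eczema | Hay fever: Yes | Hay fever: No | Total |
|--------|----------------|---------------|-------|
| Yes    | 141            | 420           | 561   |
| No     | 928            | 13525         | 14453 |
| Total  | 1069           | 13945         | 15014 |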
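The two conditional probabilities, and the extended multiplicative rule, can be checked directly from the four cell counts of Table 6.1. This is only a sketch; the variable names are illustrative.

```python
# Cell counts from Table 6.1 (children at age 11).
both = 141            # eczema and hay fever
eczema_only = 420     # eczema, no hay fever
hayfever_only = 928   # hay fever, no eczema
neither = 13525

hayfever_total = both + hayfever_only   # 1069 children with hay fever
eczema_total = both + eczema_only       # 561 children with eczema
n = both + eczema_only + hayfever_only + neither

# The same numerator, but different denominators (different conditions).
p_eczema_given_hayfever = both / hayfever_total
p_hayfever_given_eczema = both / eczema_total

# Multiplicative rule: PROB(A and B) = PROB(A|B) PROB(B) = PROB(B|A) PROB(A).
p_both_via_hayfever = p_eczema_given_hayfever * (hayfever_total / n)
p_both_via_eczema = p_hayfever_given_eczema * (eczema_total / n)

print(round(p_eczema_given_hayfever, 2), round(p_hayfever_given_eczema, 2))
```

Both routes through the multiplicative rule give the same joint probability, `both / n`, even though the two conditional probabilities differ.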
This may look obvious, but confusion between conditional probabilities is common and can cause serious problems, for example in the consideration of forensic evidence. Typically, this will produce the probability that material found at a crime scene (DNA, fibres, etc.) will match the suspect as closely as it does given that the material did not come from the suspect. This is

PROB(evidence | suspect not at crime scene).

It is not the same as

PROB(suspect not at crime scene | evidence),

but this is often how it is interpreted, an inversion known as the prosecutor's fallacy.
Appendices
6A Appendix: Permutations and combinations
For those who never knew, or have forgotten, the theory of combinations, it goes like this. First, we look at the number of permutations, i.e. ways of arranging a set of objects. Suppose we have n objects. How many ways can we order them? The first object can be chosen n ways, i.e. any object. For each first object there are n - 1 possible second objects, so there are n × (n - 1) possible first and second permutations. There are now only n - 2 choices for the third object, n - 3 choices for the fourth, and so on, until there is only one choice for the last. Hence, there are n × (n - 1) × (n - 2) × … × 2 × 1 permutations of n objects. We call this number the factorial of n and write it ‘n!’.

Now we want to know how many ways there are of choosing r objects from n objects. Having made a choice of r objects, we can order those in r! ways. We can also order the n - r not chosen in (n - r)! ways. So the objects can be ordered in r!(n - r)! ways without altering the objects chosen. For example, say we choose the first two from three objects, A, B and C. Then if these are A and B, two permutations give this choice, ABC and BAC. This is, of course, 2! × 1! = 2 permutations. Each combination of r things accounts for r!(n - r)! of the n! permutations possible, so there are

$$\frac{n!}{r!\,(n-r)!}$$

possible combinations. For example, consider the number of combinations of two objects out of three, say A, B and C. The possible choices are AB, AC and BC. There is no other possibility. Applying the formula, we have n = 3 and r = 2 so

$$\frac{n!}{r!\,(n-r)!} = \frac{3!}{2!\,(3-2)!} = \frac{3 \times 2 \times 1}{2 \times 1 \times 1} = 3$$

Sometimes in using this formula we come across r = 0 or r = n leading to 0!. This cannot be defined in the way we have chosen, but we can calculate its only possible value, 0! = 1. Because there is only one way of choosing n objects from n, we have

$$\frac{n!}{n!\,(n-n)!} = \frac{n!}{n! \times 0!} = \frac{1}{0!} = 1$$

so 0! = 1.
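The combinations formula can be checked against a direct listing of the choices. This sketch assumes Python 3.8 or later; `n_choose_r` is an illustrative helper name.

```python
import math
from itertools import combinations


def n_choose_r(n, r):
    """Number of ways of choosing r objects from n: n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# List every choice of two objects from A, B and C, ignoring order.
choices = ["".join(pair) for pair in combinations("ABC", 2)]

print(choices, n_choose_r(3, 2))

# With r = n the formula needs 0!, which Python also takes to be 1.
only_one_way = n_choose_r(5, 5)
```

The listing contains AB, AC and BC, matching the count given by the formula.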
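6B Appendix: Expected value of a sum of squares
The properties of means and variances described in §6.6 can be used to answer the question raised in §4.7 and §4A about the divisor in the sample variance. We ask why the variance from a sample is

$$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$$

and not

$$s_n^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$

We shall be concerned with the general properties of samples of size n, so we shall treat n as a constant and $x_i$ and $\bar{x}$ as random variables. We shall suppose $x_i$ has mean µ and variance σ².

The expected value of the sum of squares is

$$E\left(\sum (x_i - \bar{x})^2\right) = E\left(\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2\right) = E\left(\sum x_i^2\right) - \frac{1}{n}E\left(\left(\sum x_i\right)^2\right)$$

because the expected value of the difference is the difference between the expected values and n is a constant. Now, the population variance σ² is the average squared distance from the population mean µ, so

$$\sigma^2 = E\left((x_i - \mu)^2\right) = E\left(x_i^2 - 2\mu x_i + \mu^2\right) = E(x_i^2) - 2\mu E(x_i) + \mu^2$$

because µ is a constant. Because $E(x_i) = \mu$, we have

$$\sigma^2 = E(x_i^2) - 2\mu^2 + \mu^2 = E(x_i^2) - \mu^2$$

and so we find $E(x_i^2) = \sigma^2 + \mu^2$ and so $E(\sum x_i^2) = n(\sigma^2 + \mu^2)$, being the sum of n numbers all of which are σ² + µ². We now find the value of $E((\sum x_i)^2)$. We need

$$E\left(\sum x_i\right) = \sum E(x_i) = n\mu, \qquad \mathrm{VAR}\left(\sum x_i\right) = \sum \mathrm{VAR}(x_i) = n\sigma^2$$

Just as $E(x_i^2) = \sigma^2 + \mu^2 = \mathrm{VAR}(x_i) + (E(x_i))^2$ so

$$E\left(\left(\sum x_i\right)^2\right) = \mathrm{VAR}\left(\sum x_i\right) + \left(E\left(\sum x_i\right)\right)^2 = n\sigma^2 + (n\mu)^2$$

So

$$E\left(\sum (x_i - \bar{x})^2\right) = n(\sigma^2 + \mu^2) - \frac{1}{n}\left(n\sigma^2 + n^2\mu^2\right) = n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2 = (n-1)\sigma^2$$

So the expected value of the sum of squares is (n - 1)σ² and we must divide the sum of squares by n - 1, not n, to obtain the estimate of the variance, σ².

We shall find the variance of the sample mean, $\bar{x}$, useful later (§8.2):

$$\mathrm{VAR}(\bar{x}) = \mathrm{VAR}\left(\frac{1}{n}\sum x_i\right) = \frac{1}{n^2}\mathrm{VAR}\left(\sum x_i\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$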
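The result that the expected sum of squares about the sample mean is (n - 1)σ² can be verified exactly for a small case by listing every possible sample. The sketch below assumes a fair six-sided die as the population and samples of size 3; both choices, and the function names, are illustrative and not from the book.

```python
from fractions import Fraction
from itertools import product


def population_variance(values):
    """Variance of a population in which every listed value is equally likely."""
    mu = Fraction(sum(values), len(values))
    return sum((v - mu) ** 2 for v in values) / len(values)


def expected_sum_of_squares(values, n):
    """Exact E[sum (x_i - xbar)^2] over all equally likely samples of size n,
    drawn independently (with replacement) from the population."""
    total = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((x - xbar) ** 2 for x in sample)
        count += 1
    return total / count


die = [1, 2, 3, 4, 5, 6]   # assumed population: one throw of a fair die
n = 3                      # assumed sample size

sigma2 = population_variance(die)
ess = expected_sum_of_squares(die, n)

print(ess, (n - 1) * sigma2)   # the two fractions agree
```

Dividing `ess` by n would give an average below σ², whereas dividing by n - 1 gives σ² exactly, which is the point of the appendix.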
6M Multiple choice questions 25 to 31
(Each branch is either true or false.)

25. The events A and B are mutually exclusive, so:
(a) PROB(A or B) = PROB(A) + PROB(B);
(b) PROB(A and B) = 0;
(c) PROB(A and B) = PROB(A) PROB(B);
(d) PROB(A) = PROB(B);
(e) PROB(A) + PROB(B) = 1.

26. The probability of a woman aged 50 having condition X is 0.20 and the probability of her having condition Y is 0.05. These probabilities are independent:
(a) the probability of her having both conditions is 0.01;
(b) the probability of her having both conditions is 0.25;
(c) the probability of her having either X, or Y, or both is 0.24;
(d) if she has condition X, the probability of her having Y also is 0.01;
(e) if she has condition Y, the probability of her having X also is 0.20.

27. The following variables follow a Binomial distribution:
(a) number of sixes in 20 throws of a die;
(b) human weight;
(c) number of a random sample of patients who respond to a treatment;
(d) number of red cells in 1 ml of blood;
(e) proportion of hypertensives in a random sample of adult men.

28. Two parents each carry the same recessive gene which each transmits to their child with probability 0.5. If their child will develop clinical disease if it inherits the gene from both parents and will be a carrier if it inherits the gene from one parent only then:
(a) the probability that their next child will have clinical disease is 0.25;
(b) the probability that two successive children will both develop clinical disease is 0.25 × 0.25;
(c) the probability their next child will be a carrier without clinical disease is 0.50;
(d) the probability of a child being a carrier or having clinical disease is 0.75;
(e) if the first child does not have clinical disease, the probability that the second child will not have clinical disease is 0.75².
Table 6.2. Number of men remaining alive at ten year intervals (from English Life Table No. 11, Males)

| Age in years, x | Number surviving, lx | Age in years, x | Number surviving, lx |
|-----------------|----------------------|-----------------|----------------------|
| 0               | 1000                 | 60              | 758                  |
| 10              | 959                  | 70              | 524                  |
| 20              | 952                  | 80              | 211                  |
| 30              | 938                  | 90              | 22                   |
| 40              | 920                  | 100             | 0                    |
| 50              | 876                  |                 |                      |
29. If a coin is spun twice in succession:
(a) the expected number of tails is 1.5;
(b) the probability of two tails is 0.25;
(c) the number of tails follows a Binomial distribution;
(d) the probability of at least one tail is 0.5;
(e) the distribution of the number of tails is symmetrical.

30. If X is a random variable, mean µ and variance σ²:
(a) E(X + 2) = µ;
(b) VAR(X + 2) = σ²;
(c) E(2X) = 2µ;
(d) VAR(2X) = 2σ²;
(e) VAR(X/2) = σ²/4.

31. If X and Y are independent random variables:
(a) VAR(X + Y) = VAR(X) + VAR(Y);
(b) E(X + Y) = E(X) + E(Y);
(c) E(X - Y) = E(X) - E(Y);
(d) VAR(X - Y) = VAR(X) - VAR(Y);
(e) VAR(-X) = -VAR(X).
6E Exercise: Probability and the life table
In this exercise we shall apply some of the basic laws of probability to a practical exercise. The data are based on a life table. (I shall say more about these in §16.4.) Table 6.2 shows the number of men, from a group numbering 1000 at birth, who we would expect to be alive at different ages. Thus, for example, after 10 years, we see that 959 survive and so 41 have died, at 20 years 952 survive and so 48 have died, 41 between ages 0 and 9 and 7 between ages 10 and 19.
1. What is the probability that an individual chosen at random will survive to age 10?

2. What is the probability that this individual will die before age 10? Which property of probability does this depend on?

3. What are the probabilities that the individual will survive to ages 10, 20, 30, 40, 50, 60, 70, 80, 90, 100? Is this set of probabilities a probability distribution?

4. What is the probability that an individual aged 60 years survives to age 70?

5. What is the probability that two men aged 60 will both survive to age 70? Which property of probability is used here?

6. If we had 100 individuals aged 60, how many would we expect to attain age 70?

7. What is the probability that a man dies in his second decade? You can use the fact that PROB(death in 2nd) + PROB(survives to 3rd) = PROB(survives to 2nd).

8. For each decade, what is the probability that a given man will die in that decade? This is a probability distribution. Why? Sketch the distribution.

9. As an approximation, we can assume that the average number of years lived in the decade of death is 5. Thus, those who die in the 2nd decade will have an average life span of 15 years. The probability of dying in the 2nd decade is 0.007, i.e. a proportion 0.007 of men have a mean lifetime of 15 years. What is the mean lifetime of all men? This is the expectation of life at birth.
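Readers who want to check their working by computer could set the life table up as follows. This is a sketch only, using the survivor counts of Table 6.2 and the exercise's own approximation that those dying in a decade live on average 5 years into it; the variable names are illustrative.

```python
# Table 6.2: number surviving to each age out of 1000 men at birth.
ages = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
survivors = [1000, 959, 952, 938, 920, 876, 758, 524, 211, 22, 0]

# Deaths in each decade are differences between successive survivor counts.
deaths = [survivors[i] - survivors[i + 1] for i in range(len(survivors) - 1)]

# Probability, at birth, of dying in each decade.
p_death = [d / survivors[0] for d in deaths]

# Conditional probability of surviving to 70 given survival to 60.
p_70_given_60 = survivors[ages.index(70)] / survivors[ages.index(60)]

# Approximate mean lifetime: decade midpoints weighted by probability of death.
midpoints = [age + 5 for age in ages[:-1]]
expectation_of_life = sum(m * p for m, p in zip(midpoints, p_death))

print(deaths)
print(round(p_70_given_60, 2), round(expectation_of_life, 1))
```

The decade probabilities are mutually exclusive and cover every possibility, so they sum to 1, which is what makes them a probability distribution.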
7
The Normal distribution

7.1 Probability for continuous variables
When we derived the theory of probability in the discrete case, we were able to say what the probability was of a random variable taking a particular value. As the number of possible values increases, the probability of a particular value decreases. For example, in the Binomial distribution with p = 0.5 and n = 2, the most likely value, 1, has probability 0.5. In the Binomial distribution with p = 0.5 and n = 100 the most likely value, 50, has probability 0.08. In such cases we are usually more interested in the probability of a range of values than one particular value.
For a continuous variable, such as height, the set of possible values is infinite and the probability of any particular value is zero (§6.1). We are interested in the probability of the random variable taking values between certain limits rather than taking particular values. If the proportion of individuals in the population whose values are between given limits is p, and we choose an individual at random, the probability of choosing an individual who lies between these limits is equal to p. This comes from our definition of probability, the choice of each individual being equally likely. The problem is finding and giving a value to this probability.
When we find the frequency distribution for a sample of observations, we count the number of values which fall within certain limits (§4.2). We can represent this as a histogram such as Figure 7.1 (§4.3). One way of presenting the histogram is as relative frequency density, the proportion of observations in the interval per unit of X (§4.3). Thus, when the interval size is 5, the relative frequency density is the relative frequency divided by 5 (Figure 7.1). The relative frequency in an interval is now represented by the width of the interval multiplied by the density, which gives the area of the rectangle. Thus, the relative frequency between any two points can be found from the area under the histogram between the points. For example, to estimate the relative frequency between 10 and 20 in Figure 7.1 we have the density from 10 to 15 as 0.05 and between 15 and 20 as 0.03. Hence the relative frequency is

0.05 × (15 - 10) + 0.03 × (20 - 15) = 0.25 + 0.15 = 0.40

If we take a larger sample we can use smaller intervals. We get a smoother looking histogram, as in Figure 7.2, and as we take larger and larger samples, and so smaller and smaller intervals, we get a shape very close to a smooth curve (Figure 7.3). As the sample size approaches that of the population, which we can assume to be very large, this curve becomes the relative frequency density of the whole population. Thus we can find the proportion of observations between any two limits by finding the area under the curve, as indicated in Figure 7.3.
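The area calculation for the histogram between 10 and 20 can be written as a sum of rectangle areas. This is a small sketch using the two densities quoted in the text; the list layout is an illustrative choice.

```python
# (lower limit, upper limit, relative frequency density) for each interval,
# using the densities quoted in the text for Figure 7.1.
intervals = [(10, 15, 0.05), (15, 20, 0.03)]

# Area of each rectangle = width of interval x density; sum over intervals.
relative_frequency = sum((upper - lower) * density
                         for lower, upper, density in intervals)

print(round(relative_frequency, 2))
```

With narrower intervals the same sum approaches the area under the smooth curve, which is what integration computes.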
Fig. 7.1. Histogram showing relative frequency density
Fig. 7.2. The effect on a frequency distribution of increasing sample size
If we know the equation of this curve, we can find the area under it. (Mathematically we do this by integration, but we do not need to know how to integrate to use or to understand practical statistics; all the integrals we need have been done and tabulated.) Now, if we choose an individual at random, the probability that X lies between any given limits is equal to the proportion of individuals who fall between these limits. Hence, the relative frequency distribution for the whole population gives us the probability distribution of the variable. We call this curve the probability density function.
Fig. 7.3. Relative frequency density or probability density function, showing the probability of an observation between 10 and 20
Fig. 7.4. Mean, µ, standard deviation, σ, and a probability density function
Probability density functions have a number of general properties. For example, the total area under the curve must be one, since this is the total probability of all possible events. Continuous random variables have means, variances and standard deviations defined in a similar way to those for discrete random variables and possessing the same properties (§6.5). The mean will be somewhere near the middle of the curve and most of the area under the curve will be between the mean minus two standard deviations and the mean plus two standard deviations (Figure 7.4).
The precise shape of the curve is more difficult to ascertain. There are many possible probability density functions and some of these can be shown to arise from simple probability situations, as were the Binomial and Poisson distributions. However, most continuous variables with which we have to deal, such as height, blood pressure, serum cholesterol, etc., do not arise from simple probability situations. As a result, we do not know the probability distribution for these measurements on theoretical grounds. As we shall see, we can often find a standard distribution whose mathematical properties are known, which fits observed data well and which enables us to draw conclusions about them. Further, as sample size increases the distribution of certain statistics calculated from the data, such as the mean, becomes independent of the distribution of the observations themselves and follows one particular distribution form, the Normal distribution. We shall devote the remainder of this chapter to a study of this distribution.
Fig. 7.5. Binomial distributions for p = 0.3 and six different values of n, with corresponding Normal distribution curves
7.2 The Normal distribution
The Normal distribution, also known as the Gaussian distribution, may be regarded as the fundamental probability distribution of statistics. The word ‘normal’ is not used here in its common meaning of ‘ordinary or common’, or its medical meaning of ‘not diseased’. The usage relates to its older meaning of ‘conforming to a rule or pattern’, and as we shall see, the Normal distribution is the form to which the Binomial distribution tends as its parameter n increases. There is no implication that most variables follow a Normal distribution.
We shall start by considering the Binomial distribution as n increases. We saw in §6.4 that, as n increases, the shape of the distribution changes. The most extreme possible values become less likely and the distribution becomes more symmetrical. This happens whatever the value of p. The position of the distribution along the horizontal axis, and its spread, are still determined by p, but the shape is not. A smooth curve can be drawn which goes very close to these points. This is the Normal distribution curve, the curve of the continuous distribution which the Binomial distribution approaches as n increases.

Any Binomial distribution may be approximated by the Normal distribution of the same mean and variance provided n is large enough. Figure 7.5 shows the Binomial distributions of Figure 6.3 with the corresponding Normal distribution curves. From n = 10 onwards the two distributions are very close. Generally, if both np and n(1 - p) exceed 5 the approximation of the Binomial to the Normal distribution is quite good enough for most practical purposes. See §8.4 for an application. The Poisson distribution has the same property, as Figure 6.4 suggests.
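The closeness of the Binomial probabilities to the Normal curve with the same mean and variance can be examined numerically. The sketch below assumes p = 0.3, as in Figure 7.5, and three illustrative values of n; the function names and the choice of the largest absolute difference as a measure of closeness are assumptions, not the book's method.

```python
import math


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def normal_pdf(x, mean, sd):
    """Height of the Normal distribution curve at x."""
    return math.exp(-((x - mean) / sd) ** 2 / 2) / (sd * math.sqrt(2 * math.pi))


def max_abs_difference(n, p):
    """Largest gap between the Binomial probabilities and the Normal curve
    with the same mean np and variance np(1 - p)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(r, n, p) - normal_pdf(r, mean, sd))
               for r in range(n + 1))


def rule_of_thumb_ok(n, p):
    """The text's guide: both np and n(1 - p) should exceed 5."""
    return n * p > 5 and n * (1 - p) > 5


p = 0.3
differences = {n: max_abs_difference(n, p) for n in (5, 25, 100)}
for n, diff in differences.items():
    print(n, rule_of_thumb_ok(n, p), diff)
```

The largest gap shrinks as n grows, in line with the statement that the approximation holds provided n is large enough.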
Fig. 7.6. Sums of observations from a Uniform distribution
The Binomial variable may be regarded as the sum of n independent identically distributed random variables, each being the outcome of one trial taking value 1 with probability p. In general, if we have any series of independent, identically distributed random variables, then their sum tends to a Normal distribution as the number of variables increases. This is known as the central limit theorem. As most sets of measurements are observations of such a series of random variables, this is a very important property. From it, we can deduce that the sum or mean of any large series of independent observations follows a Normal distribution, at least approximately.
For example, consider the Uniform or Rectangular distribution. This is the distribution where all values between two limits, say 0 and 1, are equally likely and no other values are possible. Observations from this arise if we take random digits from a table of random numbers such as Table 2.3. Each observation of the Uniform variable is formed by a series of such digits placed after a decimal point. On a microcomputer, this is usually the distribution produced by the RND(X) function in the BASIC language. Figure 7.6 shows the histogram for the frequency distribution of 500 observations from the Uniform distribution between 0 and 1. It is quite different from the Normal distribution. Now suppose we create a new variable by taking two Uniform variables and adding them (Figure 7.6). The shape of the distribution of the sum of two Uniform variables is quite different from the shape of the Uniform distribution. The sum is unlikely to be close to either extreme, here 0 or 2, and observations are concentrated in the middle near the expected value. The reason for this is that to obtain a low sum, both the Uniform variables forming it must be low; to make a high sum both must be high. But we get a sum near the middle if the first is high and the second low, or the first is low and second high, or both first and second are moderate. The distribution of the sum of two is much closer to the Normal than is the Uniform distribution itself. However, the abrupt cut-off at 0 and at 2 is unlike the corresponding Normal distribution. Figure 7.6 also shows the result of adding four Uniform variables and six Uniform variables. The similarity to the Normal distribution increases as the number added increases and for the sum of six the correspondence is so close that the distributions could not easily be told apart.
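The experiment of adding Uniform variables is easy to repeat by simulation. The following is a sketch, assuming Python's `random.random()` as the Uniform generator in place of the BASIC function mentioned in the text; the seed, the helper name and the summary statistics chosen are illustrative.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the sketch gives repeatable results


def sums_of_uniforms(k, n_obs=500):
    """n_obs observations, each the sum of k Uniform(0, 1) variables."""
    return [sum(random.random() for _ in range(k)) for _ in range(n_obs)]


results = {}
for k in (1, 2, 4, 6):
    sums = sums_of_uniforms(k)
    # A sum of k Uniform(0, 1) variables has mean k/2 and variance k/12.
    sd = (k / 12) ** 0.5
    # Proportion within one standard deviation of the mean: about 0.58 for a
    # single Uniform variable, about 0.68 for a Normal distribution.
    within_one_sd = sum(abs(s - k / 2) <= sd for s in sums) / len(sums)
    results[k] = (statistics.mean(sums), statistics.variance(sums), within_one_sd)
    print(k, results[k])
```

Plotting a histogram of each set of sums would reproduce the general pattern described for Figure 7.6, though not its exact bars, since the random numbers differ.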
The approximation of the Binomial to the Normal distribution is a special case of the central limit theorem. The Poisson distribution is another. If we take a set of Poisson variables with the same rate and add them, we will get a variable which is the number of random events in a longer time interval (the sum of the intervals for the individual variables) and which is therefore a Poisson distribution with increased mean. As it is the sum of a set of independent, identically distributed random variables it will tend towards the Normal as the mean increases. Hence as the mean increases the Poisson distribution becomes approximately Normal. For most practical purposes this is when the mean exceeds 10. The similarity between the Poisson and the Binomial noted in §6.7 is a part of a more general convergence shown by many other distributions.
7.3 Properties of the Normal distribution
In its simplest form the equation of the Normal distribution curve, called the Standard Normal distribution, is usually denoted by φ(z), where φ is the Greek letter ‘phi’:

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)$$

where π is the usual mathematical constant. The medical reader can be reassured that we do not need to use this forbidding formula in practice. The Standard Normal distribution has a mean of 0, a standard deviation of 1 and a shape as shown in Figure 7.7. The curve is symmetrical about the mean and often described as ‘bell-shaped’ (though I have never seen a bell like it). We can note that most of the area, i.e. the probability, is between -1 and +1, the large majority between -2 and +2, and almost all between -3 and +3.
Although the Normal distribution curve has many remarkable properties, it has one rather awkward one: it cannot be integrated analytically. In other words, there is no simple formula for the probability of a random variable from a Normal distribution lying between given limits. The areas under the curve can be found numerically, however, and these have been calculated and tabulated. Table 7.1 shows the area under the probability density curve for different values of the Normal distribution. To be more precise, for a value z the table shows the area under the curve to the left of z, i.e. from minus infinity to z (Figure 7.8). Thus Φ(z) is the probability that a value chosen at random from the Standard Normal distribution will be less than z. Φ is the Greek capital ‘phi’. Note that half this table is not strictly necessary. We need only the half for positive z as Φ(-z) + Φ(z) = 1. This arises from the symmetry of the distribution. To find the probability of z lying between two values a and b, where b > a, we find Φ(b) - Φ(a). To find the probability of z being greater than a we find 1 - Φ(a). These formulae are all examples of the additive law of probability. Table 7.1 gives only a few values of z, and much more extensive ones are available (Lindley and Miller 1955, Pearson and Hartley 1970). Good statistical computer programs will calculate these values when they are needed.
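As an illustration of how a program can supply these areas, Φ(z) can be written in terms of the error function, which Python provides as `math.erf`. This is a sketch of one common numerical route, not a description of how any particular statistical package works; the function name `Phi` is an illustrative choice.

```python
import math


def Phi(z):
    """Area under the Standard Normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Probability of lying between a and b, where b > a: Phi(b) - Phi(a).
between_minus1_and_plus1 = Phi(1.0) - Phi(-1.0)

# Probability of being greater than a: 1 - Phi(a).
greater_than_2 = 1 - Phi(2.0)

# Symmetry: Phi(-z) + Phi(z) = 1 for any z.
symmetry_check = Phi(-0.7) + Phi(0.7)

print(round(Phi(-1.0), 3), round(Phi(0.0), 3), round(Phi(1.0), 3))
```

Rounded to three decimal places, the values agree with the corresponding entries of Table 7.1.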
Fig. 7.7. The Standard Normal distribution

Table 7.1. The Normal distribution

| z    | Φ(z)  | z    | Φ(z)  | z    | Φ(z)  | z   | Φ(z)  |
|------|-------|------|-------|------|-------|-----|-------|
| -3.0 | 0.001 | -2.0 | 0.023 | -1.0 | 0.159 | 0.0 | 0.500 |
| -2.9 | 0.002 | -1.9 | 0.029 | -0.9 | 0.184 | 0.1 | 0.540 |
| -2.8 | 0.003 | -1.8 | 0.036 | -0.8 | 0.212 | 0.2 | 0.579 |
| -2.7 | 0.003 | -1.7 | 0.045 | -0.7 | 0.242 | 0.3 | 0.618 |
| -2.6 | 0.005 | -1.6 | 0.055 | -0.6 | 0.274 | 0.4 | 0.655 |
| -2.5 | 0.006 | -1.5 | 0.067 | -0.5 | 0.309 | 0.5 | 0.691 |
| -2.4 | 0.008 | -1.4 | 0.081 | -0.4 | 0.345 | 0.6 | 0.726 |
| -2.3 | 0.011 | -1.3 | 0.097 | -0.3 | 0.382 | 0.7 | 0.758 |
| -2.2 | 0.014 | -1.2 | 0.115 | -0.2 | 0.421 | 0.8 | 0.788 |
| -2.1 | 0.018 | -1.1 | 0.136 | -0.1 | 0.460 | 0.9 | 0.816 |
| -2.0 | 0.023 | -1.0 | 0.159 | 0.0  | 0.500 | 1.0 | 0.841 |
There is another way of tabulating a distribution, using what are called percentage points. The one-sided P percentage point of a distribution is the value z such that there is a probability P% of an observation from that distribution being greater than or equal to z (Figure 7.8). The two-sided P percentage point is the value z such that there is a probability P% of an observation being greater than or equal to z or less than or equal to -z (Figure 7.8). Table 7.2 shows both one sided and two sided percentage points for the Normal distribution. The probability is quoted as a percentage because when we use percentage points we are usually concerned with rather small probabilities, such as 0.05 or 0.01, and use of the percentage form, making them 5% and 1%, cuts out the leading zero.
Table 7.2. Percentage points of the Normal distribution

| One-sided P1(z) (%) | z    | Two-sided P2(z) (%) | z    |
|---------------------|------|---------------------|------|
| 50                  | 0.00 |                     |      |
| 25                  | 0.67 | 50                  | 0.67 |
| 10                  | 1.28 | 20                  | 1.28 |
| 5                   | 1.64 | 10                  | 1.64 |
| 2.5                 | 1.96 | 5                   | 1.96 |
| 1                   | 2.33 | 2                   | 2.33 |
| 0.5                 | 2.58 | 1                   | 2.58 |
| 0.1                 | 3.09 | 0.2                 | 3.09 |
| 0.05                | 3.29 | 0.1                 | 3.29 |

The table shows the probability P1(z) of a Normal variable with mean 0 and variance 1 being greater than z, and the probability P2(z) of a Normal variable with mean 0 and variance 1 being less than -z or greater than z.
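Percentage points are obtained by inverting Φ. A minimal sketch, assuming Python 3.8 or later, whose standard library provides `statistics.NormalDist` with an `inv_cdf` method; the two helper names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def one_sided_point(p_percent):
    """z with probability p_percent% of an observation exceeding z."""
    return standard_normal.inv_cdf(1 - p_percent / 100)


def two_sided_point(p_percent):
    """z with probability p_percent% of an observation lying outside -z to +z,
    i.e. half of p_percent% in each tail."""
    return standard_normal.inv_cdf(1 - p_percent / 200)


for p in (5, 1, 0.1):
    print(p, round(one_sided_point(p), 2), round(two_sided_point(p), 2))
```

The two-sided P% point is simply the one-sided P/2% point, which is why each z in Table 7.2 appears in both halves of the table.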
Fig. 7.8. One- and two-sided percentage points (5%) of the Standard Normal distribution

So far we have examined the Normal distribution with mean 0 and standard deviation 1. If we add a constant µ to a Standard Normal variable, we get a new variable which has mean µ (see §6.6). Figure 7.9 shows the Normal distribution with mean 0 and the distribution obtained by adding 1 to it together with their two-sided 5% points. The curves are identical apart from a shift along the axis.

On the curve with mean 0 nearly all the probability is between -3 and +3. For the curve with mean 1 it is between -2 and +4, i.e. between the mean - 3 and the mean + 3. The probability of being a given number of units from the mean is the same for both distributions, as is also shown by the 5% points.
Fig. 7.9. Normal distributions with different means and with different variances, showing two-sided 5% points

If we take a Standard Normal variable, with standard deviation 1, and multiply by a constant σ we get a new variable which has standard deviation σ. Figure 7.9 shows the Normal distribution with mean 0 and standard deviation 1 and the distribution obtained by multiplying by 2. The curves do not appear identical. For the distribution with standard deviation 2, nearly all the probability is between -6 and +6, a much wider interval than the -3 and +3 for the standard distribution. The values -6 and +6 are -3 and +3 standard deviations. We can see that the probability of being a given number of standard deviations from the mean is the same for both distributions. This is also seen from the 5% points, which represent the mean plus or minus 1.96 standard deviations in each case.
In fact if we multiply a Standard Normal variable by σ and then add µ, we get a Normal distribution of mean µ and standard deviation σ. Tables 7.1 and 7.2 apply to it directly, if we denote by z the number of standard deviations above the mean, rather than the numerical value of the variable. Thus, for example, the two sided 5% points of a Normal distribution with mean 10 and standard deviation 5 are found by 10 - 1.96 × 5 = 0.2 and 10 + 1.96 × 5 = 19.8, the value 1.96 being found from Table 7.2.
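The worked example for mean 10 and standard deviation 5 can be reproduced, and the probability enclosed by the two limits checked, with a short sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 10, 5
z = 1.96  # two-sided 5% point from Table 7.2

# A value z standard deviations from the mean is mean + z * sd.
lower = mean - z * sd
upper = mean + z * sd

# Probability that a Normal variable with this mean and SD lies between them.
distribution = NormalDist(mu=mean, sigma=sd)
probability_inside = distribution.cdf(upper) - distribution.cdf(lower)

print(round(lower, 1), round(upper, 1), round(probability_inside, 3))
```

The limits enclose 95% of the distribution, leaving 2.5% in each tail, just as -1.96 and +1.96 do for the Standard Normal distribution.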
This property of the Normal distribution, that multiplying or adding constants still gives a Normal distribution, is not as obvious as it might seem. The Binomial does not have it, for example. Take a Binomial variable with n = 3, possible values 0, 1, 2, and 3, and multiply by 2. The possible values are now 0, 2, 4, and 6. The Binomial distribution with n = 6 has also possible values 1, 3, and 5, so the distributions are different and the one which we have derived is not a member of the Binomial family.
We have seen that adding a constant to a variable from a Normal distribution gives another variable which follows a Normal distribution. If we add two variables from Normal distributions together, even with different means and variances, the sum follows a Normal distribution. The difference between two variables from Normal distributions also follows a Normal distribution.
Fig. 7.10. Distribution of height in a sample of 1794 pregnant women (data of Brooke et al. 1989)
Fig. 7.11. Distribution of serum triglyceride (Table 4.8) and log10 triglyceride in cord blood for 282 babies, with corresponding Normal distribution curves
7.4 Variables which follow a Normal distribution
So far we have discussed the Normal distribution as it arises from sampling as the sum or limit of other distributions. However, many naturally occurring variables, such as human height, appear to follow a Normal distribution very closely. We might expect this to happen if the variable were the result of adding variation from a number of different sources. The process shown by the central limit theorem may well produce a result close to Normal. Figure 7.10 shows the distribution of height in a sample of pregnant women, and the corresponding Normal distribution curve. The fit to the Normal distribution is very good.
If the variable we measure is the result of multiplying several different sources of variation we would not expect the result to be Normal from the properties discussed in §7.2, which were all based on addition of variables. However, if we take the log transformation of such a variable (§5A) we would then get a new variable which is the sum of several different sources of variation and which may well have a Normal distribution. This process often happens with quantities which are part of metabolic pathways, the rate at which reaction can take place depending on the concentrations of other compounds. Many measurements of blood constituents exhibit this, for example. Figure 7.11 shows the distribution of serum triglyceride measured in cord blood for 282 babies (Table 4.8). The distribution is highly skewed and quite unlike the Normal distribution curve. However, when we take the logarithm of the triglyceride concentration, we have a remarkably good fit to the Normal distribution (Fig. 7.11). If the logarithm of a random variable follows a Normal distribution, the random variable itself follows a Lognormal distribution.
We often want to change the scale on which we analyse our data so as to get a Normal distribution. We call this process of analysing a mathematical function of the data rather than the data themselves transformation. The logarithm is the transformation most often used; the square root and reciprocal are others (see also §10.4). For a single sample, transformation enables us to use the Normal distribution to estimate centiles (§4.5). For example, we often want to estimate the 2.5th and 97.5th centiles, which together enclose 95% of the observations. For a Normal distribution, these can be estimated by $\bar{x} \pm 1.96s$. We can transform the data so that the distribution is Normal, calculate the centile, and then transform back to the original scale.
Consider the triglyceride data of Figure 7.11 and Table 4.8. The mean is 0.51 and the standard deviation 0.22. The mean for the log10 transformed data is -0.33 and the standard deviation is 0.17. What happens if we transform back by the antilog? For the mean, we get $10^{-0.33} = 0.47$. This is less than the mean for the raw data. The antilog of the mean log is not the same as the untransformed arithmetic mean. In fact, this is the geometric mean, which is the nth root of the product of the observations. If we add the logs of the observations we get the log of their product (§5A). If we multiply the log of a number by a second number, we get the log of the first raised to the power of the second. So if we divide the log by n, we get the log of the nth root. Thus the mean of the logs is the log of the geometric mean. On back transformation, the reciprocal transformation also yields a mean with a special name, the harmonic mean, the reciprocal of the mean of the reciprocals.
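The relationships between the arithmetic, geometric and harmonic means can be seen in a few lines. The sketch below uses four made-up observations, which are hypothetical and are not the triglyceride data of Table 4.8; it assumes Python 3.8 or later for `math.prod`.

```python
import math

values = [0.25, 0.5, 1.0, 2.0]  # hypothetical observations, for illustration only
n = len(values)

arithmetic_mean = sum(values) / n

# Geometric mean two ways: antilog of the mean of the logs,
# and directly as the nth root of the product.
mean_log10 = sum(math.log10(v) for v in values) / n
geometric_mean_from_logs = 10 ** mean_log10
geometric_mean_direct = math.prod(values) ** (1 / n)

# Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic_mean = 1 / (sum(1 / v for v in values) / n)

print(arithmetic_mean, geometric_mean_from_logs, harmonic_mean)
```

For positive observations that are not all equal, the harmonic mean is below the geometric mean, which is below the arithmetic mean, consistent with the back-transformed triglyceride mean being less than the mean of the raw data.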
The geometric mean is in the original units. If triglyceride is measured in mmol/litre, the log of a single observation is the log of a measurement in mmol/litre. The sum of n logs is the log of the product of n measurements in mmol/litre and is the log of a measurement in mmol/litre to the nth power. Dividing by n gives the log of the nth root, which is thus again the log of a number in mmol/litre, and the antilog is back in the original units, mmol/litre (see §5A).
The antilog of the standard deviation, however, is not measured in the original units. To calculate the standard deviation we take the difference between each log observation and the log geometric mean, using the usual formula $\sum (x_i - \bar{x})^2/(n-1)$ (§4.8). Thus we have the difference between the logs of two numbers each measured in mmol/litre, giving the log of their ratio (§5A) which is the log of a dimensionless pure number. It would be the same if the triglycerides were measured in mmol/litre or mg/100 ml. We cannot transform the standard deviation back to the original scale.
If we want to use the standard deviation, it is easiest to do all calculations on the transformed scale and transform back, if necessary, at the end. For example, the 2.5th centile on the log scale is -0.33 - 1.96 × 0.17 = -0.66 and the 97.5th centile is -0.33 + 1.96 × 0.17 = 0.00. To get these we took the log of something in mmol/litre and added or subtracted the log of a pure number (i.e. multiplied on the natural scale), so we still have the log of something in mmol/litre. To get back to the original scale we antilog to get 2.5th centile = 0.22 and 97.5th centile = 1.00 mmol/litre.
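The back-transformation can be followed step by step from the two summary figures quoted in the text, the mean log10 of -0.33 and standard deviation 0.17. This is a sketch using those rounded summaries rather than the raw data, so the upper centile comes out very slightly above 1.00 because the log is not rounded to 0.00 before the antilog is taken.

```python
mean_log = -0.33   # mean of log10 triglyceride, as quoted in the text
sd_log = 0.17      # standard deviation of log10 triglyceride, as quoted

# Work entirely on the log scale first.
lower_log = mean_log - 1.96 * sd_log   # 2.5th centile on the log scale
upper_log = mean_log + 1.96 * sd_log   # 97.5th centile on the log scale

# Transform back only at the end, by taking antilogs.
geometric_mean = 10 ** mean_log
lower_centile = 10 ** lower_log        # mmol/litre
upper_centile = 10 ** upper_log        # mmol/litre

print(round(lower_log, 2), round(upper_log, 2))
print(round(geometric_mean, 2), round(lower_centile, 2), upper_centile)
```

Note that the standard deviation itself is never antilogged; it is only used on the transformed scale.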
Transforming the data to a Normal distribution and then analysing on the transformed scale may look like cheating. I do not think it is. The scale on which we choose to measure things need not be linear, though this is often convenient. Other scales can be much more useful. We measure pH on a logarithmic scale, for example. Should the magnitude of an earthquake be measured in mm of amplitude (linear) or on the Richter scale (logarithmic)? Should spectacle lenses be measured in terms of focal length in cm (linear) or dioptres (reciprocal)? We often choose non-linear scales because they suit our purpose and for statistical analysis it often suits us to make the distribution Normal, by finding a scale of measurement where this is the case.
7.5 The Normal plot
Many statistical methods can only be used if the observations follow a Normal distribution (see Chapters 10 and 11). There are several ways of investigating whether observations follow a Normal distribution. With a large sample we can inspect a histogram to see whether it looks like a Normal distribution curve. This does not work well with a small sample, and a more reliable method is the Normal plot. This is a graphical method, which can be done using ordinary graph paper and a table of the Normal distribution, with specially printed Normal probability paper, or, much more easily, using a computer. Any good general statistical package will give Normal plots; if it does not then it is not a good package. The Normal plot method can be used to investigate the Normal assumption in samples of any size, and is a very useful check when using methods such as the t distribution methods described in Chapter 10.
The Normal plot is a plot of the cumulative frequency distribution for the data against the cumulative frequency distribution for the Normal distribution. First, we order the data from lowest to highest. For each ordered observation we find the expected value of the observation if the data followed a Standard Normal distribution. There are several approximate formulae for this. I shall follow Armitage and Berry (1994) and use for the ith observation z where Φ(z) = (i - 0.5)/n. Some books and programs use Φ(z) = i/(n + 1) and there are other more complex formulae. It does not make much difference which is used. We find from a table of the Normal distribution the values of z which correspond to Φ(z) = 0.5/n, 1.5/n, etc. (Table 7.1 lacks detail for practical work, but will do for illustration.) For 5 points, for example, we have Φ(z) = 0.1, 0.3, 0.5, 0.7, and 0.9, and z = -1.3, -0.5, 0, 0.5, and 1.3. These are the points of the Standard Normal distribution which correspond to the observed data. Now, if the observed data come from a Normal distribution of mean µ and variance σ², the observed point should equal σz + µ, where z is the corresponding point of the Standard Normal distribution. If we plot the Standard Normal points against the observed values we should get something close to a straight line. We can write the equation of this line as σz + µ = x, where x is the observed variable and z the corresponding quantile of the Standard Normal distribution. We can rewrite this as

$$z = \frac{x - \mu}{\sigma}$$

which goes through the point defined by (µ, 0) and has slope 1/σ (see §11.1). If the data are not from a Normal distribution we will not get a straight line, but a curve of some sort. Because we plot the quantiles of the observed frequency distribution against the corresponding quantiles of the theoretical (here Normal) distribution, this is also referred to as a quantile–quantile plot or q–q plot.
Table 7.3. Vitamin D levels measured in the blood of 26 healthy men, data of Hickish et al. (1989)
14  25  30  42  54
17  26  31  43  54
20  26  31  46  63
21  26  32  48  67
22  27  35  52  83
24
Table 7.4. Calculation of the Normal plot for the vitamin D data
 i  Vit D  Φ(z)      z       i  Vit D  Φ(z)      z
 1   14    0.019  -2.07     14   31    0.519   0.05
 2   17    0.058  -1.57     15   32    0.558   0.15
 3   20    0.096  -1.30     16   35    0.596   0.24
 4   21    0.135  -1.10     17   42    0.635   0.34
 5   22    0.173  -0.94     18   43    0.673   0.45
 6   24    0.212  -0.80     19   46    0.712   0.56
 7   25    0.250  -0.67     20   48    0.750   0.67
 8   26    0.288  -0.56     21   52    0.788   0.80
 9   26    0.327  -0.45     22   54    0.827   0.94
10   26    0.365  -0.34     23   54    0.865   1.10
11   27    0.404  -0.24     24   63    0.904   1.30
12   30    0.442  -0.15     25   67    0.942   1.57
13   31    0.481  -0.05     26   83    0.981   2.07
Φ(z) = (i - 0.5)/26
Fig. 7.12. Blood vitamin D levels and log10 vitamin D for 26 normal men, with Normal plots
Table 7.3 shows vitamin D levels measured in the blood of 26 healthy men. The calculation of the Normal plot is shown in Table 7.4. Note that the values of Φ(z) = (i - 0.5)/26 are symmetrical about 0.5 and the values of z are symmetrical about zero, the second half being the first half in reverse order with opposite sign. The value of the Standard Normal deviate, z, can be found by interpolation in Table 7.1, by using a fuller table, or by computer. Figure 7.12 shows the histogram and the Normal plot for these data. The distribution is skew and the Normal plot shows a pronounced curve. Figure 7.12 also shows the vitamin D data after log transformation. It is quite easy to produce the Normal plot, as the corresponding Standard Normal deviate, z, is unchanged. We only need to log the observations and plot again. The Normal plot for the transformed data conforms very well to the theoretical line, suggesting that the distribution of log vitamin D level is close to the Normal.
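For readers doing the calculation by computer, the z column of Table 7.4 can be reproduced as follows. This is a sketch only, using `statistics.NormalDist` from the Python standard library for the inverse of Φ; the function name `normal_scores` is an illustrative choice, and drawing the plot itself is left to whatever graphics package is to hand.

```python
import math
from statistics import NormalDist

# Vitamin D data of Table 7.3, ordered from lowest to highest.
vit_d = sorted([14, 17, 20, 21, 22, 24, 25, 26, 26, 26, 27, 30, 31,
                31, 32, 35, 42, 43, 46, 48, 52, 54, 54, 63, 67, 83])
n = len(vit_d)


def normal_scores(n):
    """Return z for i = 1..n, where Phi(z) = (i - 0.5)/n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


z = normal_scores(n)

# Print the columns of Table 7.4: i, Vit D, Phi(z), z.
for i, (x, zi) in enumerate(zip(vit_d, z), start=1):
    print(f"{i:2d} {x:3d} {(i - 0.5) / n:.3f} {zi:6.2f}")

# For the log-transformed plot only the observations change; z is reused.
log_vit_d = [math.log10(x) for x in vit_d]
points_raw = list(zip(vit_d, z))
points_log = list(zip(log_vit_d, z))
```

Plotting `points_raw` and `points_log` with z on the vertical axis should give the two Normal plots described for Figure 7.12.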
A single bend in the Normal plot indicates skewness. A double curve indicates that both tails of the distribution are different from the Normal, usually being too long, and many curves may indicate that the distribution is bimodal (Figure 7.13). When the sample is small, of course, there will be some random fluctuations.
There are several different ways to display the Normal plot. Some programs plot the data distribution on the vertical axis and the theoretical Normal distribution on the horizontal axis, which reverses the direction of the curve. Some plot the theoretical Normal distribution with mean x̄, the sample mean, and standard deviation s, the sample standard deviation. This is done by calculating x̄ + sz. Figure 7.14(a) shows both these features, the Normal plot drawn by the program Stata's ‘qnorm’ command. The straight line is the line of equality. This plot is identical to the second plot in Figure 7.12, except for the change of scale and switching of the axes. A slight variation is the standardized Normal probability plot or p-p plot, where we standardize the observations to zero mean and standard deviation one, y = (x - x̄)/s, and plot the cumulative Normal probabilities, Φ(y), against (i - 0.5)/n or i/(n + 1) (Figure 7.14(b), produced by the Stata command ‘pnorm’). There is very little difference between Figure 7.14(a) and (b) and the quantile and probability versions of the Normal plot should be interpreted in the same way.
Fig. 7.13. Blood sodium and systolic blood pressure measured in 250 patients in the Intensive Therapy Unit at St. George's Hospital, with Normal plots (data of Freidland et al. 1996)
Fig. 7.14. Variations on the Normal plot for the vitamin D data
Appendices
7A Appendix: Chi-squared, t, and F
Less mathematically inclined readers can skip this section, but those who persevere should find that applications like chi-squared tests (Chapter 13) appear much more logical.
Many probability distributions can be derived for functions of Normal variables which arise in statistical analysis. Three of these are particularly important: the Chi-squared, t and F distributions. These have many applications, some of which we shall discuss in later chapters.
The Chi-squared distribution is defined as follows. Suppose Z is a Standard Normal variable, so having mean 0 and variance 1. Then the variable formed by Z² follows the Chi-squared distribution with 1 degree of freedom. If we have n such independent Standard Normal variables, Z1, Z2, …, Zn, then the variable defined by

$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$

is defined to follow the Chi-squared distribution with n degrees of freedom. χ is the Greek letter ‘chi’, pronounced ‘ki’ as in ‘kite’. The distribution curves for several different numbers of degrees of freedom are shown in Figure 7.15. The mathematical description of this curve is rather complicated, but we do not need to go into this.
Some properties of the Chi-squared distribution are easy to deduce. As the distribution is the sum of n independent identically distributed random variables it tends to the Normal as n increases, from the central limit theorem (§7.2). The convergence is slow, however (Figure 7.15), and the square root of chi-squared converges much more quickly. The expected value of Z² is the variance of Z, the expected value of Z being 0, and so E(Z²) = 1. The expected value of chi-squared with n degrees of freedom is thus n:

$$E(\chi^2) = E\left(\sum_{i=1}^{n} Z_i^2\right) = \sum_{i=1}^{n} E(Z_i^2) = n$$
The Chi-squared distribution has a very important property. Suppose we restrict our attention to a subset of possible outcomes for the n random variables Z1, Z2, …, Zn. The subset will be defined by those values of Z1, Z2, …, Zn which satisfy the equation a1Z1 + a2Z2 + … + anZn = k, where a1, a2, …, an, and k are constants. (This is called a linear constraint.) Then under this restriction, χ² = ΣZi² follows a Chi-squared distribution with n - 1 degrees of freedom. If there are m such constraints such that none of the equations can be calculated from the others, then we have a Chi-squared distribution with n - m degrees of freedom. This is the source of the name ‘degrees of freedom’.
Fig. 7.15. Some Chi-squared distributions
The proof of this is too complicated to give here, involving such mathematical abstractions as n dimensional spheres, but its implications are very important. First, consider the sum of squares about the population mean µ of a sample of size n from a Normal distribution, divided by σ². Σ(xi - µ)²/σ² will follow a Chi-squared distribution with n degrees of freedom, as the (xi - µ)/σ have mean 0 and variance 1 and they are independent. Now suppose we replace µ by an estimate calculated from the data, x̄. The variables are no longer independent: they must satisfy the relationship Σ(xi - x̄) = 0, and we now have n - 1 degrees of freedom. Hence Σ(xi - x̄)²/σ² follows a Chi-squared distribution with n - 1 degrees of freedom. The sum of squares about the mean of any Normal sample with variance σ² follows the distribution of a Chi-squared variable multiplied by σ². It therefore has expected value (n - 1)σ² and we divide by n - 1 to give the estimate of σ².
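Although the proof is omitted, the loss of one degree of freedom can be seen in a small simulation. The sketch below is illustrative only: the sample size, mean, standard deviation, number of replicates and random seed are arbitrary choices, not values from the text. It averages the sum of squares about µ and about x̄ over many Normal samples.

```python
import random

random.seed(1)  # arbitrary seed, so that the run is repeatable


def mean_sum_of_squares(n, mu, sigma, about_sample_mean, replicates=20000):
    """Average, over many Normal samples, of the sum of squares about mu or x-bar."""
    total = 0.0
    for _ in range(replicates):
        x = [random.gauss(mu, sigma) for _ in range(n)]
        centre = sum(x) / n if about_sample_mean else mu
        total += sum((xi - centre) ** 2 for xi in x)
    return total / replicates


n, mu, sigma = 5, 10.0, 2.0  # arbitrary illustrative values
about_mu = mean_sum_of_squares(n, mu, sigma, about_sample_mean=False)
about_xbar = mean_sum_of_squares(n, mu, sigma, about_sample_mean=True)

# Dividing by sigma squared puts both on the Chi-squared scale.
print(about_mu / sigma ** 2)    # should be close to n
print(about_xbar / sigma ** 2)  # should be close to n - 1
```

Apart from simulation error, the first average is close to n and the second to n - 1, which is why the sum of squares about the sample mean is divided by n - 1.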
Thus, provided the data are from a Normal distribution, not only does the sample mean follow a Normal distribution, but the sample variance is from a Chi-squared distribution times σ²/(n - 1). Because the square root of the Chi-squared distribution converges quite rapidly to the Normal, the distribution of the sample standard deviation is approximately Normal for n > 20, provided the data themselves are from a Normal distribution. Another important property of the variances of Normal samples is that, if we take many random samples from the same population, the sample variance and sample mean are independent if, and only if, the data are from a Normal distribution.
The F distribution with m and n degrees of freedom is the distribution of $(\chi^2_m/m)/(\chi^2_n/n)$, the ratio of two independent χ² variables each divided by its degrees of freedom. This distribution is used for comparing variances. If we have two independent estimates of the same variance calculated from Normal data, the variance ratio will follow the F distribution. We can use this for comparing two estimates of variance (§10.8), but its main uses are in comparing groups of means (§10.9) and in examining the effects of several factors together (§17.2).
7M Multiple choice questions 32 to 37
(Each branch is either true or false)
32. The Normal distribution:
(a) is also called the Gaussian distribution;
(b) is followed by many variables;
(c) is a family of distributions with two parameters;
(d) is followed by all measurements made in healthy people;
(e) is the distribution towards which the Poisson distribution tends as its mean increases.
33. The Standard Normal distribution:
(a) is skew to the left;
(b) has mean = 1.0;
(c) has standard deviation = 0.0;
(d) has variance = 1.0;
(e) has the median equal to the mean.
34. The PEFRs of a group of 11-year-old girls follow a Normal distribution with mean 300 litre/min and a standard deviation 20 litre/min:
(a) about 95% of the girls have PEFR between 260 and 340 litre/min;
(b) 50% of the girls have PEFR above 300 litre/min;
(c) the girls have healthy lungs;
(d) about 5% of girls have PEFR below 260 litre/min;
(e) all the PEFRs must be less than 340 litre/min.
35. The mean of a large sample:
(a) is always greater than the median;
(b) is calculated from the formula Σxi/n;
(c) is from an approximately Normal distribution;
(d) increases as the sample size increases;
(e) is always greater than the standard deviation.
36. If X and Y are independent variables which follow Standard Normal distributions, a Normal distribution is also followed by:
(a) 5X;
(b) X²;
(c) X + 5;
(d) X - Y;
(e) X/Y.
37. When a Normal plot is drawn with the Standard Normal deviate on the y axis:
(a) a straight line indicates that observations are from a Normal Distribution;
(b) a curve with decreasing slope indicates positive skewness;
(c) an ‘S’ shaped curve (or ogive) indicates long tails;
(d) a vertical line will occur if all observations are equal;
(e) if there is a straight line its slope depends on the standard deviation.
7E Exercise: A Normal plot
In this exercise we shall return to the blood glucose data of §4E and try to decide how well they conform to a Normal distribution.
1. From the box and whisker plot and the histogram found in exercise §4E (if you have not tried exercise §4E see the solution in Chapter 19), do the blood glucose levels look like a Normal distribution?
2. Construct a Normal plot for the data. This is quite easy as they are ordered already. Find (i - 0.5)/n for i = 1 to 40 and obtain the corresponding cumulative Normal probabilities from Table 7.1. Now plot these probabilities against the corresponding blood glucose.
3. Does the plot appear to give a straight line? Do the data follow a Normal distribution?
8
Estimation
8.1 Sampling distributions
We have seen in Chapter 3 how samples are drawn from much larger populations. Data are collected about the sample so that we can find out something about the population. We use samples to estimate quantities such as disease prevalence, mean blood pressure, mean exposure to a carcinogen, etc. We also want to know by how much these estimates might vary from sample to sample.
In Chapters 6 and 7 we saw how the theory of probability enables us to link random samples with the populations from which they are drawn. In this chapter we shall see how probability theory enables us to use samples to estimate quantities in populations, and to determine the precision of these estimates. First we shall consider what happens when we draw repeat samples from the same population. Table 8.1 shows a set of 100 random digits which we can use as the population for a sampling experiment. The distribution of the numbers in this population is shown in Figure 8.1. The population mean is 4.7 and the standard deviation is 2.9.
The sampling experiment is done by using a suitable random sampling method to draw repeated samples from the population. In this case decimal dice were a convenient method. A sample of size four was chosen: 6, 4, 6 and 1. The mean was calculated: 17/4 = 4.25. This was repeated to draw a second sample of 4 numbers: 7, 8, 1, 8. Their mean is 6.00. This sampling procedure was done 20 times altogether, to give the samples and their means shown in Table 8.2.
These sample means are not all the same. They show random variation. If we were able to draw all of the 3 921 225 possible samples of size 4 and calculate their means, these means themselves would form a distribution. Our 20 sample means are themselves a sample from this distribution. The distribution of all possible sample means is called the sampling distribution of the mean. In general, the sampling distribution of any statistic is the distribution of the values of the statistic which would arise from all possible samples.
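A sampling experiment of this kind is easily repeated by computer. The sketch below is not the author's experiment: the population is a hypothetical set of 100 random digits generated on the spot (not the digits of Table 8.1), the random generator replaces the decimal dice, and the seed is arbitrary. It also confirms the count of possible samples of size 4 from 100 items.

```python
import math
import random
import statistics

random.seed(8)  # arbitrary seed, so that the run is repeatable

# Hypothetical population of 100 random digits (NOT the digits of Table 8.1).
population = [random.randrange(10) for _ in range(100)]

# Number of distinct samples of size 4 that could be drawn from 100 items.
n_possible = math.comb(100, 4)

# Draw 20 random samples of size 4 and record the mean of each.
sample_means = []
for _ in range(20):
    sample = random.sample(population, 4)
    sample_means.append(sum(sample) / 4)

print(n_possible)
print(statistics.mean(population), statistics.pstdev(population))
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```

The mean of the 20 sample means should be close to the population mean, while their standard deviation should be noticeably smaller than that of the population.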
Table 8.1. Population of 100 random digits for a sampling experiment
9 1 0 7 5 6 9 5 8 8 1 0 5 7
1 8 8 8 5 2 4 8 3 1 6 5 5 7
2 8 1 8 5 8 4 0 1 9 2 1 6 9
1 9 7 9 7 2 7 7 0 8 1 6 3 8
7 0 2 8 8 7 2 5 4 1 8 6 8 3
Fig. 8.1. Distribution of the population of Table 8.1
8.2 Standard error of a sample mean
For the moment we shall consider the sampling distribution of the mean only. As our sample of 20 means is a random sample from it, we can use this to estimate some of the parameters of the distribution. The twenty means have their own mean and standard deviation. The mean is 5.1 and the standard deviation is 1.1. Now the mean of the whole population is 4.7, which is close to the mean of the samples. But the standard deviation of the population is 2.9, which is considerably greater than that of the sample means. If we plot a histogram for the sample of means (Figure 8.2) we see that the centre of the sampling distribution and the parent population distribution are the same, but the scatter of the sampling distribution is much less.
Table 8.2. Random samples drawn in a sampling experiment
Sample  6     7     7     1     5     5     4
        4     8     9     8     2     5     2
        6     1     2     8     9     7     7
        1     8     7     4     5     8     6
Mean    4.25  6.00  6.25  5.25  5.25  6.25  4.75
Sample  7     7     2     8     3     4     5
        8     3     5     0     7     8     5
        7     8     0     7     4     7     8
        2     7     8     7     8     7     3
Mean    6.00  6.25  3.75  5.50  5.50  6.50  5.25
Fig. 8.2. Distribution of the population of Table 8.1 and of the sample of the means of Table 8.2
The sample mean is an estimate of the population mean. The standard deviation of its sampling distribution is called the standard error of the estimate. It provides a measure of how far from the true value the estimate is likely to be. In most estimation, the estimate is likely to be within one standard error of the true mean and unlikely to be more than two standard errors from it. We shall look at this more precisely in §8.3.
In almost all practical situations we do not know the true value of the population variance σ² but only its estimate s² (§4.7). Since the standard error of the mean is σ/√n, we can use this to estimate the standard error by s/√n. This estimate is also referred to as the standard error of the mean. It is usually clear from the context whether the standard error is the true value or that estimated from the data.
When the sample size n is large, the sampling distribution of x̄ tends to a Normal distribution. Also, we can assume that s² is a good estimate of σ². So for large n, x̄ is, in effect, an observation from a Normal distribution with mean µ and standard deviation estimated by s/√n. So with probability 0.95, x̄ is within two, or more precisely within 1.96, standard errors of µ. With small samples we cannot assume either a Normal distribution or, more importantly, that s² is a good estimate of σ². We shall discuss this in Chapter 10.
Fig. 8.3. Samples of means from a Standard Normal variable
The mean and standard error are often written as 4.062 ± 0.089. This is rather misleading, as the true value may be up to two standard errors from the mean with a reasonable probability. This practice is not recommended.
There is often confusion between the terms ‘standard error’ and ‘standard deviation’. This is understandable, as the standard error is a standard deviation (of the sampling distribution) and the terms are often interchanged in this context. The convention is this: we use the term ‘standard error’ when we measure the precision of estimates, and the term ‘standard deviation’ when we are concerned with the variability of samples, populations or distributions. If we want to say how good our estimate of the mean FEV1 measurement is, we quote the standard error of the mean. If we want to say how widely scattered the FEV1 measurements are, we quote the standard deviation, s.
8.3 Confidence intervals
The estimate of mean FEV1 is a single value and so is called a point estimate. There is no reason to suppose that the population mean will be exactly equal to the point estimate, the sample mean. It is likely to be close to it, however, and the amount by which it is likely to differ from the estimate can be found from the standard error. What we do is find limits which are likely to include the population mean, and say that we estimate the population mean to lie somewhere in the interval (the set of all possible values) between these limits. This is called an interval estimate.
Fig. 8.4. Sampling distribution of the mean of 4 observations from a Standard Normal distribution
For instance, if we regard the 57 FEV1 measurements as being a large sample we can assume that the sampling distribution of the mean is Normal, and that the standard error is a good estimate of its standard deviation (see §10.6 for a discussion of how large is large). We therefore expect about 95% of such means to be within 1.96 standard errors of the population mean, µ. Hence, for about 95% of all possible samples, the population mean must be greater than the sample mean minus 1.96 standard errors and less than the sample mean plus 1.96 standard errors. If we calculated x̄ - 1.96 se and x̄ + 1.96 se for all possible samples, 95% of such intervals would contain the population mean. In this case these limits are 4.062 - 1.96 × 0.089 to 4.062 + 1.96 × 0.089, which gives 3.89 to 4.24, or 3.9 to 4.2 litres, rounding to two significant figures; 3.9 and 4.2 are called the 95% confidence limits for the estimate, and the set of values between 3.9 and 4.2 is called the 95% confidence interval. The confidence limits are the values at the ends of the confidence interval.
Strictly speaking, it is incorrect to say that there is a probability of 0.95 that the population mean lies between 3.9 and 4.2, though it is often put that way (even by me). The population mean is a number, not a random variable, and has no probability. It is the probability that limits calculated from a random sample will include the population value which is 95%. Figure 8.5 shows confidence intervals for the mean for 20 random samples of 100 observations from the Standard Normal distribution. The population mean is, of course, 0.0, shown by the horizontal line. Some sample means are close to 0.0, some further away, some above and some below. The population mean is contained by 19 of the 20 confidence intervals. In general, for 95% of confidence intervals it will be true to say that the population value lies within the interval. We just don't know which 95%. We express this by saying that we are 95% confident that the mean lies between these limits.
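The long-run interpretation can be illustrated by simulation, in the spirit of Figure 8.5 but with many more samples. This is a sketch under stated assumptions: the number of replicates and the seed are arbitrary, and the function name `interval_covers_zero` is illustrative.

```python
import math
import random
from statistics import NormalDist

random.seed(2000)  # arbitrary seed, so that the run is repeatable
z = NormalDist().inv_cdf(0.975)  # about 1.96


def interval_covers_zero(n=100):
    """Draw n Standard Normal observations; does the 95% interval include 0?"""
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    se = s / math.sqrt(n)
    return mean - z * se <= 0.0 <= mean + z * se


replicates = 2000
coverage = sum(interval_covers_zero() for _ in range(replicates)) / replicates
print(coverage)  # proportion of intervals containing the population mean
```

The proportion of intervals which include the population mean should be close to 0.95, although for any single interval we cannot tell whether it is one of those which do.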
Fig. 8.5. Mean and 95% confidence interval for 20 random samples of 100 observations from the Standard Normal distribution
In the FEV1 example, the sampling distribution of the mean is Normal and its standard deviation is well estimated because the sample is large. This is not always true and although it is usually possible to calculate confidence intervals for an estimate they are not all quite as simple as that for the mean estimated from a large sample. We shall look at the mean estimated from a small sample in §10.2.
There is no necessity for the confidence interval to have a probability of 95%. For example, we can also calculate 99% confidence limits. The upper 0.5% point of the Standard Normal distribution is 2.58 (Table 7.2), so the probability of a Standard Normal deviate being above 2.58 or below -2.58 is 1% and the probability of being within these limits is 99%. The 99% confidence limits for the mean FEV1 are therefore 4.062 - 2.58 × 0.089 and 4.062 + 2.58 × 0.089, i.e. 3.8 and 4.3 litres. These give a wider interval than the 95% limits, as we would expect since we are more confident that the mean will be included. The probability we choose for a confidence interval is thus a compromise between the desire to include the estimated population value and the desire to avoid parts of the scale where there is a low probability that the mean will be found. For most purposes, 95% confidence intervals have been found to be satisfactory.
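The 95% and 99% limits for the mean FEV1 can be obtained from the same small function. This is a sketch for the large sample case only; `normal_ci` is an illustrative name, and the multiplier is taken from the Standard Normal distribution by `statistics.NormalDist` rather than from Table 7.2, so it is 1.960 and 2.576 rather than the rounded 1.96 and 2.58.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level=0.95):
    """Large sample confidence interval: estimate +/- z standard errors."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


mean_fev1, se_fev1 = 4.062, 0.089  # mean and standard error quoted in the text

lo95, hi95 = normal_ci(mean_fev1, se_fev1, level=0.95)
lo99, hi99 = normal_ci(mean_fev1, se_fev1, level=0.99)

print(round(lo95, 2), round(hi95, 2))
print(round(lo99, 1), round(hi99, 1))
```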
Standard error is not the only way in which we can calculate confidence intervals, although at present it is the one used for most problems. In §8.8 I describe a different approach based on the exact probabilities of a distribution, which requires no large sample assumption. In §8.9 I describe a large sample method which uses the Binomial distribution directly. There are others, which I shall omit because they are rarely used.
8.4 Standard error and confidence interval for a proportion
The standard error of a proportion estimate can be calculated in the same way. Suppose the proportion of individuals who have a particular condition in a given population is p, and we take a random sample of size n, the number observed with the condition being r. Then the estimated proportion is r/n. We have seen (§6.4) that r comes from a Binomial distribution with mean np and variance np(1 - p). Provided n is large, this distribution is approximately Normal. So r/n, the estimated proportion, is Normally distributed with mean given by np/n = p, and variance given by

$$\mathrm{VAR}\left(\frac{r}{n}\right) = \frac{\mathrm{VAR}(r)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$

since n is constant, and the standard error is

$$\sqrt{\frac{p(1-p)}{n}}$$

We can estimate this by replacing p by r/n.
The standard error of the proportion is only of use if the sample is large enough for the Normal approximation to apply. A rough guide to this is that np and n(1 - p) should both exceed 5. This is usually the case when we are concerned with straightforward estimation. If we try to use the method for smaller samples, we may get absurd results. For example, in a study of the prevalence of HIV in ex-prisoners (Turnbull et al. 1992), of 29 women who did not inject drugs one was HIV positive. The authors reported this to be 3.4%, with a 95% confidence interval -3.1% to 9.9%. The lower limit of -3.1%, obtained from the observed proportion minus 1.96 standard errors, is impossible. As Newcombe (1992) pointed out, the correct 95% confidence interval can be obtained from the exact probabilities of the Binomial distribution and is 0.1% to 17.8% (§8.8).
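The failure of the standard error method for small numbers is easy to demonstrate. The sketch below applies it to one positive out of 29; the function names are illustrative, and the rough guide is applied with p replaced by its estimate r/n, which is an assumption made here because p itself is unknown. The limits it prints need not match the published figures to the last decimal, but the lower limit is negative in the same way.

```python
import math


def proportion_ci(r, n, z=1.96):
    """Estimated proportion, its standard error, and the large sample limits."""
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, p - z * se, p + z * se


def normal_approx_ok(r, n):
    """Rough guide: np and n(1 - p) both exceed 5, estimating p by r/n."""
    return r > 5 and (n - r) > 5


p, se, lower, upper = proportion_ci(1, 29)
print(p, se, lower, upper)
print(normal_approx_ok(1, 29))
```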
8.5 The difference between two means
In many studies we are more interested in the difference between two parameters than in their absolute value. These could be means, proportions, the slopes of lines, and many other statistics. When samples are large we can assume that sample means and proportions are observations from a Normal distribution, and that the calculated standard errors are good estimates of the standard deviations of these Normal distributions. We can use this to find confidence intervals.
For example, suppose we wish to compare the means, x̄1 and x̄2, of two large samples, sizes n1 and n2. The expected difference between the sample means is equal to the difference between the population means, i.e. E(x̄1 - x̄2) = µ1 - µ2. What is the standard error of the difference? The variance of the difference between two independent random variables is the sum of their variances (§6.6). Hence, the standard error of the difference between two independent estimates is the square root of the sum of the squares of their standard errors. The standard error of a mean is √(s²/n), so the standard error of the difference between two independent means is

$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
For an example, in a study of respiratory symptoms in schoolchildren (Bland et al. 1974), we wanted to know whether children reported by their parents to have respiratory symptoms had worse lung function than children who were not reported to have symptoms. Ninety-two children were reported to have cough during the day or at night, and their mean PEFR was 294.8 litre/min with standard deviation 57.1 litre/min, and 1643 children were not reported to have this symptom, their mean PEFR being 313.6 litre/min with standard deviation 55.2 litre/min. We thus have two large samples, and can apply the Normal distribution. We have
n1 = 92, x̄1 = 294.8, s1 = 57.1, n2 = 1643, x̄2 = 313.6, s2 = 55.2
The difference between the two groups is x̄1 - x̄2 = 294.8 - 313.6 = -18.8. The standard error of the difference is

$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{57.1^2}{92} + \frac{55.2^2}{1643}} = 6.11$$
We shall treat the sample as being large, so the difference between the means can be assumed to come from a Normal distribution and the estimated standard error to be a good estimate of the standard deviation of this distribution. (For small samples see §10.3 and §10.6.) The 95% confidence limits for the difference are thus -18.8 - 1.96 × 6.11 and -18.8 + 1.96 × 6.11, i.e. -30.8 and -6.8 litre/min. The confidence interval does not include zero, so we have good evidence that, in this population, children reported to have day or night cough have lower mean PEFR than others. The difference is estimated to be between 7 and 31 litre/min lower in children with the symptom, so it may be quite small.
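The PEFR calculation can be checked in a few lines. This is a sketch of the large sample method only, using the summary figures quoted above; the function name `se_difference` is illustrative.

```python
import math


def se_difference(s1, n1, s2, n2):
    """Standard error of the difference between two independent means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Children with cough (group 1) and without cough (group 2).
n1, mean1, s1 = 92, 294.8, 57.1
n2, mean2, s2 = 1643, 313.6, 55.2

diff = mean1 - mean2
se = se_difference(s1, n1, s2, n2)
lower = diff - 1.96 * se
upper = diff + 1.96 * se

print(round(diff, 1), round(se, 2))
print(round(lower, 1), round(upper, 1))
```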
When we have paired data, such as a cross-over trial (§2.6) or a matched case-control study (§3.8), the two-sample method does not work. Instead, we calculate the differences between the paired observations for each subject, then find the mean difference, its standard error and confidence interval as in §8.3.
Table 8.3. Cough during the day or at night at age 14 and bronchitis before age 5 (Holland et al. 1978)
               Bronchitis at 5
Cough at 14    Yes     No      Total
Yes             26      44       70
No             247    1002     1249
Total          273    1046     1319
8.6 Comparison of two proportions
Provided the conditions of Normal approximation are met (see §8.4) we can find a confidence interval for the difference in the usual way.
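For example, consider Table 8.3. The researchers wanted to know to what extent children with bronchitis in infancy get more respiratory symptoms in later life than others. We can estimate the difference between the proportions reported to cough during the day or at night among children with and children without a history of bronchitis before age 5 years. We have estimates of two proportions, p1 = 26/273 = 0.09524 and p2 = 44/1046 = 0.04207. The difference between them is p1 - p2 = 0.09524 - 0.04207 = 0.05317. The standard error of the difference is

$$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} = \sqrt{\frac{0.09524 \times (1 - 0.09524)}{273} + \frac{0.04207 \times (1 - 0.04207)}{1046}} = 0.0188$$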
The 95% confidence interval for the difference is 0.05317 - 1.96 × 0.0188 to 0.05317 + 1.96 × 0.0188 = 0.016 to 0.090. Although the difference is not very precisely estimated, the confidence interval does not include zero and gives us clear evidence that children with bronchitis reported in infancy are more likely than others to be reported to have respiratory symptoms in later life. The data on lung function in §8.5 give us some reason to suppose that this is not entirely due to response bias (§3.9). As in §8.4, the confidence interval must be estimated differently for small samples.
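The same calculation can be made directly from the frequencies of Table 8.3. The sketch below assumes the large sample conditions of §8.4 hold, as they do here; the variable names are illustrative.

```python
import math

# Frequencies from Table 8.3.
r1, n1 = 26, 273    # cough at 14 among those with bronchitis before 5
r2, n2 = 44, 1046   # cough at 14 among those without bronchitis before 5

p1 = r1 / n1
p2 = r2 / n2
diff = p1 - p2

# Standard error of the difference between two independent proportions.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

lower = diff - 1.96 * se
upper = diff + 1.96 * se

print(round(p1, 5), round(p2, 5), round(diff, 5))
print(round(se, 4), round(lower, 3), round(upper, 3))
```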
This difference in proportions may not be very easy to interpret. The ratio of two proportions is often more useful. Another method, the odds ratio, is described in §13.7. The ratio of the proportion with cough at age 14 for bronchitis before 5 to the proportion with cough at age 14 for those without bronchitis before 5 is p1/p2 = 0.09524/0.04207 = 2.26. Children with bronchitis before 5 are more than twice as likely to cough during the day or at night at age 14 than children with no such history.
The standard error for this ratio is complex, and as it is a ratio rather than a difference it does not approximate well to a Normal distribution. If we take the logarithm of the ratio, however, we get the difference between two logarithms, because log(p1/p2) = log(p1) - log(p2) (§5A). We can find the standard error for the log ratio quite easily. We use the result that, for any random variable X with mean µ and variance σ², the approximate variance of log(X) is given by VAR(loge(X)) = σ²/µ² (see Kendall and Stuart 1969). Hence, the variance of log(p) is

$$\mathrm{VAR}(\log_e(p)) = \frac{p(1-p)/n}{p^2} = \frac{1-p}{np}$$

For the difference between the two logarithms we get

$$\mathrm{VAR}(\log_e(p_1/p_2)) = \mathrm{VAR}(\log_e(p_1)) + \mathrm{VAR}(\log_e(p_2)) = \frac{1-p_1}{n_1p_1} + \frac{1-p_2}{n_2p_2}$$

The standard error is the square root of this. (This formula is often written in terms of frequencies, but I think this version is clearer.) For the example the log ratio is loge(2.26385) = 0.81707 and the standard error is

$$\sqrt{\frac{1 - 0.09524}{273 \times 0.09524} + \frac{1 - 0.04207}{1046 \times 0.04207}} = 0.23784$$
The95%confidenceintervalforthelogratioistherefore0.81707-1.96×0.23784to0.81707+1.96×0.23784=0.35089to1.28324.The95%confidenceintervalfortheratioofproportionsitselfistheantilogofthis:e0.35089toe1.28324=1.42to3.61.Thusweestimatethattheproportionofchildrenreportedtocoughduringthedayoratnightamongthosewithahistoryofbronchitisisbetween1.4to3.6timestheproportionamongthosewithoutahistoryofbronchitis.
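The two large-sample intervals in this section can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original text; the function name `diff_and_ratio_ci` is made up for illustration, and it assumes two independent samples large enough for the Normal approximation to be reasonable.

```python
import math

def diff_and_ratio_ci(r1, n1, r2, n2, z=1.96):
    """Large-sample confidence intervals for p1 - p2 and p1 / p2.

    r1, r2 are the numbers with the symptom; n1, n2 are the sample sizes.
    z = 1.96 gives 95% intervals.
    """
    p1, p2 = r1 / n1, r2 / n2

    # Difference: the variances of the two independent proportions add.
    diff = p1 - p2
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff_ci = (diff - z * se_diff, diff + z * se_diff)

    # Ratio: work on the log scale, then take antilogs of the limits.
    log_ratio = math.log(p1 / p2)
    se_log = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    ratio_ci = (math.exp(log_ratio - z * se_log),
                math.exp(log_ratio + z * se_log))
    return diff, diff_ci, p1 / p2, ratio_ci

# Table 8.3: cough at age 14 by history of bronchitis before age 5.
diff, diff_ci, ratio, ratio_ci = diff_and_ratio_ci(26, 273, 44, 1046)
print(f"difference {diff:.5f}, 95% CI {diff_ci[0]:.3f} to {diff_ci[1]:.3f}")
print(f"ratio {ratio:.2f}, 95% CI {ratio_ci[0]:.2f} to {ratio_ci[1]:.2f}")
```

Because the code carries full precision rather than the rounded proportions used in the hand calculation, the last decimal place of intermediate values may differ slightly from those quoted in the text.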
The proportion of individuals in a population who develop a disease or symptom is equal to the probability that any given individual will develop the disease, called the risk of an individual developing a disease. Thus in Table 8.3 the risk that a child with bronchitis before age 5 will cough at age 14 is 26/273 = 0.09524, and the risk for a child without bronchitis before age 5 is 44/1046 = 0.04207. To compare risks for people with and without a particular risk factor, we look at the ratio of the risk with the factor to the risk without the factor, the relative risk. The relative risk of cough at age 14 for bronchitis before 5 is thus 2.26. To estimate the relative risk directly, we need a cohort study (§3.7) as in Table 8.3. We estimate relative risk for a case-control study in a different way (§13.7).
In the unusual situation when the samples are paired, either matched or two observations on the same subject, we use a different method (§13.9).
8.7* Standard error of a sample standard deviation
8.8* Confidence interval for a proportion when numbers are small
In §8.4 I mentioned that the standard error method for a proportion does not work when the sample is small. Instead, the confidence interval can be found using the exact probabilities of the Binomial distribution. The method works like this. Given n, we find the value pL for the parameter p of the Binomial distribution which gives a probability 0.025 of getting an observed number of successes, r, as big as or bigger than the value observed. We do this by calculating the probabilities from the formula in §6.4, iterating round different possible values of p until we get the right one. We also find the value pU for the parameter p of the Binomial distribution which gives a probability 0.025 of getting an observed number of successes as small as or smaller than the value observed. The exact 95% confidence interval is pL to pU.
For example, suppose we observe 3 successes out of 10 trials. The Binomial distribution with n = 10 which has the total probability for 3 or more successes equal to 0.025 has parameter p = 0.067. The distribution which has the total probability for 3 or fewer successes equal to 0.025 has p = 0.652. Hence the 95% confidence interval for the proportion in the population is 0.067 to 0.652. Figure 8.6 shows the two distributions. No large sample approximation is required and we can use this for any size of sample. Pearson and Hartley (1970) give a table for calculating exact Binomial confidence intervals. Even better, you can download a free program from my website (§1.3).
Fig. 8.6. Distributions showing the calculation of the exact confidence interval for three successes out of ten trials.
Unless the observed proportion is zero or one, these values are never included in the exact confidence interval. The population proportion of successes cannot be zero if we have observed a success in the sample. It cannot be one if we have observed a failure.
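The iteration over possible values of p is easy to automate. The sketch below, in Python, is one way of doing it and is not the author's program: it uses bisection (an assumption of this sketch; any root-finding method would do) on the Binomial tail probabilities, and the function names are hypothetical.

```python
from math import comb

def binom_tail_ge(r, n, p):
    """P(X >= r) for X from a Binomial distribution with parameters n, p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

def binom_tail_le(r, n, p):
    """P(X <= r) for X from a Binomial distribution with parameters n, p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, r + 1))

def bisect(f, target, increasing, lo=0.0, hi=1.0, iters=60):
    """Find p in [lo, hi] with f(p) = target, where f is monotone in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid   # the solution lies above mid
        else:
            hi = mid   # the solution lies below mid
    return (lo + hi) / 2

def exact_ci(r, n, alpha=0.05):
    """Exact Binomial confidence interval for r successes in n trials."""
    # Lower limit: P(X >= r) increases with p; solve for a tail of alpha/2.
    p_lower = 0.0 if r == 0 else bisect(
        lambda p: binom_tail_ge(r, n, p), alpha / 2, increasing=True)
    # Upper limit: P(X <= r) decreases as p rises; solve for a tail of alpha/2.
    p_upper = 1.0 if r == n else bisect(
        lambda p: binom_tail_le(r, n, p), alpha / 2, increasing=False)
    return p_lower, p_upper

p_lower, p_upper = exact_ci(3, 10)
print(f"exact 95% CI: {p_lower:.3f} to {p_upper:.3f}")
```

When r = 0 or r = n, the sketch sets the corresponding limit to 0 or 1, which matches the remark that these values belong to the interval only when the observed proportion is zero or one.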
8.9* Confidence interval for a median and other quantiles
We round j and k up to the next integer. Then the 95% confidence interval is between the jth and the kth observations in the ordered data. For the 57 FEV measurements of Table 4.4, the median was 4.1 litres (§4.5). For the 95% confidence interval for the median, n = 57 and q = 0.5, and

$$j = nq - 1.96\sqrt{nq(1-q)} = 57 \times 0.5 - 1.96\sqrt{57 \times 0.5 \times 0.5} = 21.10$$

$$k = nq + 1.96\sqrt{nq(1-q)} = 57 \times 0.5 + 1.96\sqrt{57 \times 0.5 \times 0.5} = 35.90$$

The 95% confidence interval is thus from the 22nd to the 36th observation, 3.75 to 4.30 litres from Table 4.4. Compare this to the 95% confidence interval for the mean, 3.9 to 4.2 litres, which is completely included in the interval for the median. This method of estimating percentiles is relatively imprecise. Another example is given in §15.5.
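The ranks of the two bounding observations can be sketched in Python as follows. This is an illustration only: it assumes the usual Normal approximation to the Binomial, in which the ranks are nq ± 1.96√(nq(1-q)) rounded up to the next integer, and the function name `quantile_ci_ranks` is hypothetical.

```python
import math

def quantile_ci_ranks(n, q=0.5, z=1.96):
    """Ranks (counting from 1) of the ordered observations which bound the
    approximate 95% confidence interval for the q quantile."""
    half_width = z * math.sqrt(n * q * (1 - q))
    j = math.ceil(n * q - half_width)   # round up to the next integer
    k = math.ceil(n * q + half_width)
    return j, k

j, k = quantile_ci_ranks(57, q=0.5)
print(f"95% CI for the median: observation {j} to observation {k}")
```

For very small samples or extreme quantiles the ranks can fall below 1 or above n, in which case the approximation should not be used; the sketch does not guard against this.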
8.10 What is the correct confidence interval?
Confidence intervals only estimate errors due to sampling. They do not allow for any bias in the sample and give us an estimate for the population of which our data can be considered a random sample. As discussed in §3.5, it is often not clear what this population is, and we rely far more on the estimation of differences than absolute values. This is particularly true in clinical trials. We start with patients in one locality, exclude some, allow refusals, and the patients cannot be regarded as a random sample of patients in general. However, we then randomize into two groups which are then two samples from the same population, and only the treatment differs between them. Thus the difference is the thing we want the confidence interval for, not for either group separately. Yet researchers often ignore the direct comparison in favour of estimation using each group separately.
For example, Salvesen et al. (1992) reported follow-up of two randomized controlled trials of routine ultrasonography screening during pregnancy. At ages 8 to 9 years, children of women who had taken part in these trials were followed up. A subgroup of children underwent specific tests for dyslexia. The test results classified 21 of the 309 screened children (7%, 95% confidence interval 3–10%) and 26 of the 294 controls (9%, 95% confidence interval 4–12%) as dyslexic. Much more useful would be a confidence interval for the difference between prevalences (-6.3 to 2.2 percentage points) or their ratio (0.44 to 1.34), because we could then compare the groups directly.
8M Multiple choice questions 38 to 43
(Each branch is either true or false)
38. The standard error of the mean of a sample:
(a) measures the variability of the observations;
(b) is the accuracy with which each observation is measured;
(c) is a measure of how far the sample mean is likely to be from the population mean;
(d) is proportional to the number of observations;
(e) is greater than the estimated standard deviation of the population.
39. The 95% confidence limits for the mean estimated from a set of observations:
(a) are limits between which, in the long run, 95% of observations fall;
(b) are a way of measuring the precision of the estimate of the mean;
(c) are limits within which the sample mean falls with probability 0.95;
(d) are limits which would include the population mean for 95% of possible samples;
(e) are a way of measuring the variability of a set of observations.
40. If the size of a random sample were increased, we would expect:
(a) the mean to decrease;
(b) the standard error of the mean to decrease;
(c) the standard deviation to decrease;
(d) the sample variance to increase;
(e) the degrees of freedom for the estimated variance to increase.
41. The prevalence of a condition in a population is 0.1. If the prevalence is estimated repeatedly from samples of size 100, these estimates will form a distribution which:
(a) is a sampling distribution;
(b) is approximately Normal;
(c) has mean = 0.1;
(d) has variance = 9;
(e) is Binomial.
42. It is necessary to estimate the mean FEV1 by drawing a sample from a large population. The accuracy of the estimate will depend on:
(a) the mean FEV1 in the population;
(b) the number in the population;
(c) the number in the sample;
(d) the way the sample is selected;
(e) the variance of FEV1 in the population.
43. In a study of 88 births to women with a history of thrombocytopenia (Samuels et al. 1990), the same condition was recorded in 20% of babies (95% confidence interval 13% to 30%, exact method):
(a) Another sample of the same size will show a rate of thrombocytopenia between 13% and 30%;
(b) 95% of such women have a probability of between 13% and 30% of having a baby with thrombocytopenia;
(c) It is likely that between 13% and 30% of births to such women in the area would show thrombocytopenia;
(d) If the sample were increased to 880 births, the 95% confidence interval would be narrower;
(e) It would be impossible to get these data if the rate for all women was 10%.
8E Exercise: Means of large samples
Table 8.4 summarizes data collected in a study of plasma magnesium in diabetics. The diabetic subjects were all insulin-dependent subjects attending a diabetic clinic over a 5 month period. The non-diabetic controls were a mixture of blood donors and people attending day centres for the elderly, to give a wide age distribution. Plasma magnesium follows a Normal distribution very closely.
Table 8.4. Plasma magnesium in insulin-dependent diabetics and healthy controls
Group                          Number    Mean     Standard deviation
Insulin-dependent diabetics    227       0.719    0.068
Non-diabetic controls          140       0.810    0.057
Fig. 8.7. Distribution of magnesium in diabetics and controls, showing the proportion of diabetics above the lower limit of the reference interval
1. Calculate an interval which would include 95% of plasma magnesium measurements from the control population. This is what we call the 95% reference interval, described in detail in §15.5. It tells us something about the distribution of plasma magnesium in the population.
2. What proportion of insulin-dependent diabetics would lie within this 95% reference interval? (Hint: find how many standard deviations from the diabetic mean the lower limit is, then use the table of the Normal distribution, Table 7.1, to find the probability of exceeding this. See Figure 8.7.)
3. Find the standard error of the mean plasma magnesium for each group.
4. Find a 95% confidence interval for the mean plasma magnesium in the healthy population. How does the confidence interval differ from the 95% reference interval? Why are they different?
5. Find the standard error of the difference in mean plasma magnesium between insulin-dependent diabetics and healthy people.
6. Find a 95% confidence interval for the difference in mean plasma magnesium between insulin-dependent diabetics and healthy people. Is there any evidence that diabetics have lower plasma magnesium than non-diabetics in the population from which these data come?
7. Would plasma magnesium be a good diagnostic test for diabetes?
9 Significance tests
9.1 Testing a hypothesis
In Chapter 8 I dealt with estimation and the precision of estimates. This is one form of statistical inference, the process by which we use samples to draw conclusions about the populations from which they are taken. In this chapter I shall introduce a different form of inference, the significance test or hypothesis test.
A significance test enables us to measure the strength of the evidence which the data supply concerning some proposition of interest. For example, consider the cross-over trial of pronethalol for the treatment of angina (§2.6). Table 9.1 shows the number of attacks over four weeks on each treatment. These 12 patients are a sample from the population of all patients. Would the other members of this population experience fewer attacks while using pronethalol? We can see that the number of attacks is highly variable from one patient to another, and it is quite possible that this is true from one period of time to another as well. So it could be that some patients would have fewer attacks while on pronethalol than while on placebo quite by chance. In a significance test, we ask whether the difference observed was small enough to have occurred by chance if there were really no difference in the population. If it were so, then the evidence in favour of there being a difference between the treatment periods would be weak or absent. On the other hand, if the difference were much larger than we would expect due to chance if there were no real population difference, then the evidence in favour of a real difference would be strong.
To carry out the test of significance we suppose that, in the population, there is no difference between the two treatments. The hypothesis of ‘no difference’ or ‘no effect’ in the population is called the null hypothesis. If this is not true, then the alternative hypothesis must be true, that there is a difference between the treatments in one direction or the other. We then find the probability of getting data as different from what would be expected, if the null hypothesis were true, as are those data actually observed. If this probability is large the data are consistent with the null hypothesis; if it is small the data are unlikely to have arisen if the null hypothesis were true and the evidence is in favour of the alternative hypothesis.
Table 9.1. Trial of pronethalol for the prevention of angina pectoris
Number of attacks while on     Difference                Sign of
Placebo      Pronethalol       placebo - pronethalol     difference
71           29                42                        +
323          348               -25                       -
8            1                 7                         +
14           7                 7                         +
23           16                7                         +
34           25                9                         +
79           65                14                        +
60           41                19                        +
2            0                 2                         +
3            0                 3                         +
17           15                2                         +
7            2                 5                         +
9.2 An example: The sign test
I shall now describe a particular test of significance, the sign test, to test the null hypothesis that placebo and pronethalol have the same effect on angina. Consider the differences between the number of attacks on the two treatments for each patient, as in Table 9.1. If the null hypothesis were true, then differences in number of attacks would be just as likely to be positive as negative; they would be random. The probability of a change being negative would be equal to the probability of it being positive, so both probabilities would be 0.5. Then the number of negatives would be an observation from a Binomial distribution (§6.4) with n = 12 and p = 0.5. (If there were any subjects who had the same number of attacks on both regimes we would omit them, as they provide no information about the direction of any difference between the treatments. In this test, n is the number of subjects for whom there is a difference, one way or the other.)
If the null hypothesis were true, what would be the probability of getting an observation from this distribution as extreme as the value we have actually observed? The expected number of negatives would be np = 6. What is the probability of getting a value as far from expectation as is that observed? The number of negative differences is 1. The probability of getting one negative change is

$$\frac{12!}{1!\,11!} \times 0.5^{1} \times 0.5^{11} = 12 \times 0.5^{12} = 0.00293$$

This is not a likely event in itself. However, we are interested in the probability of getting a value as far or further from the expected value, 6, as is 1, and clearly 0 is further and must be included. The probability of no negative changes is

$$\frac{12!}{0!\,12!} \times 0.5^{0} \times 0.5^{12} = 0.5^{12} = 0.00024$$

So the probability of one or fewer negative changes is 0.00293 + 0.00024 = 0.00317. The null hypothesis is that there is no difference, so the alternative hypothesis is that there is a difference in one direction or the other. We must, therefore, consider the probability of getting a value as extreme on the other side of the mean, that is 11 or 12 negatives (Figure 9.1). The probability of 11 or 12 negatives is also 0.00317, because the distribution is symmetrical. Hence, the probability of getting as extreme a value as that observed, in either direction, is 0.00317 + 0.00317 = 0.00634. This means that if the null hypothesis were true we would have a sample which is so extreme that the probability of it arising by chance is 0.006, less than one in a hundred.
Fig. 9.1. Extremes of the Binomial distribution for the sign test
Thus, we would have observed a very unlikely event if the null hypothesis were true. This means that the data are not consistent with the null hypothesis, and we can conclude that there is strong evidence in favour of a difference between the treatments. (Since this was a double blind randomized trial, it is reasonable to suppose that this was caused by the activity of the drug.)
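The sign test calculation for Table 9.1 can be reproduced in a few lines. The following Python sketch is an illustration rather than a standard library routine; the function name `sign_test` is hypothetical, and the two-sided probability is obtained by doubling the smaller tail, which is valid here because the Binomial distribution with p = 0.5 is symmetrical.

```python
from math import comb

def sign_test(differences):
    """Two-sided sign test. Zero differences are omitted, as they carry no
    information about the direction of the difference."""
    signs = [d for d in differences if d != 0]
    n = len(signs)
    negatives = sum(1 for d in signs if d < 0)
    # Under the null hypothesis the number of negatives is Binomial(n, 0.5).
    extreme = min(negatives, n - negatives)
    one_tail = sum(comb(n, k) for k in range(extreme + 1)) * 0.5**n
    return negatives, n, min(1.0, 2 * one_tail)

# Number of attacks over four weeks on each treatment (Table 9.1).
placebo     = [71, 323, 8, 14, 23, 34, 79, 60, 2, 3, 17, 7]
pronethalol = [29, 348, 1,  7, 16, 25, 65, 41, 0, 0, 15, 2]
diffs = [a - b for a, b in zip(placebo, pronethalol)]

negatives, n, p_value = sign_test(diffs)
print(f"{negatives} negative out of {n}, two-sided P = {p_value:.5f}")
```

The exact value is 26/4096; the figure of 0.00634 in the text comes from adding probabilities already rounded to five decimal places.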
9.3 Principles of significance tests
The sign test is an example of a test of significance. The number of negative changes is called the test statistic, something calculated from the data which can be used to test the null hypothesis. The general procedure for a significance test is as follows.
1. Set up the null hypothesis and its alternative.
2. Find the value of the test statistic.
3. Refer the test statistic to a known distribution which it would follow if the null hypothesis were true.
4. Find the probability of a value of the test statistic arising which is as or more extreme than that observed, if the null hypothesis were true.
5. Conclude that the data are consistent or inconsistent with the null hypothesis.
We shall deal with several different significance tests in this and subsequent chapters. We shall see that they all follow this pattern.
If the data are not consistent with the null hypothesis, the difference is said to be statistically significant. If the data do not support the null hypothesis, it is sometimes said that we reject the null hypothesis, and if the data are consistent with the null hypothesis it is said that we accept it. Such an ‘all or nothing’ decision making approach is seldom appropriate in medical research. It is preferable to think of the significance test probability as an index of the strength of evidence against the null hypothesis. The term ‘accept the null hypothesis’ is also misleading because it implies that we have concluded that the null hypothesis is true, which we should not do. We cannot prove statistically that something, such as a treatment effect, does not exist. It is better to say that we have not rejected or have failed to reject the null hypothesis.
The probability of such an extreme value of the test statistic occurring if the null hypothesis were true is often called the P value. It is not the probability that the null hypothesis is true. This is a common misconception. The null hypothesis is either true or it is not; it is not random and has no probability. I suspect that many researchers have managed to use significance tests quite effectively despite holding this incorrect view.
9.4 Significance levels and types of error
We must still consider the question of how small is small. A probability of 0.006, as in the example above, is clearly small and we have a quite unlikely event. But what about 0.06, or 0.1? Suppose we take a probability of 0.01 or less as constituting reasonable evidence against the null hypothesis. If the null hypothesis is true, we shall make a wrong decision one in a hundred times. Deciding against a true null hypothesis is called an error of the first kind, type I error, or α error. We get an error of the second kind, type II error, or β error if we do not reject a null hypothesis which is in fact false. (α and β are the Greek letters ‘alpha’ and ‘beta’.) Now the smaller we demand the probability be before we decide against the null hypothesis, the larger the observed difference must be, and so the more likely we are to miss real differences. By reducing the risk of an error of the first kind we increase the risk of an error of the second kind.
The conventional compromise is to say that differences are significant if the probability is less than 0.05. This is a reasonable guideline, but should not be taken as some kind of absolute demarcation. There is not a great difference between probabilities of 0.06 and 0.04, and they surely indicate similar strength of evidence. It is better to regard probabilities around 0.05 as providing some evidence against the null hypothesis, which increases in strength as the probability falls. If we decide that the difference is significant, the probability is sometimes referred to as the significance level. We say that the significance level is high if the P value is low.
Fig. 9.2. One- and two-sided tests
As a rough and ready guide, we can think of P values as indicating the strength of evidence like this:
greater than 0.1: little or no evidence of a difference or relationship
between 0.05 and 0.1: weak evidence of a difference or relationship
between 0.01 and 0.05: evidence of a difference or relationship
less than 0.01: strong evidence of a difference or relationship
less than 0.001: very strong evidence of a difference or relationship
9.5 One- and two-sided tests of significance
In the above example, the alternative hypothesis was that there was a difference in one direction or the other. This is called a two-sided or two-tailed test, because we used the probabilities of extreme values in both directions. It would have been possible to have the alternative hypothesis that there was a decrease in the pronethalol direction, in which case the null hypothesis would be that the number of attacks on the placebo was less than or equal to the number on pronethalol. This would give P = 0.00317, and of course, a higher significance level than the two-sided test. This would be a one-sided or one-tailed test (Figure 9.2). The logic of this is that we should ignore any signs that the active drug is harmful to the patients. If what we were saying was ‘if this trial does not give a significant reduction in angina using pronethalol we will not use it again’, this might be reasonable, but the medical research process does not work like that. This is one of several pieces of evidence and so we should certainly use a method of inference which would enable us to detect effects in either direction.
The question of whether one- or two-sided tests should be the norm has been the subject of considerable debate among practitioners of statistical methods. Perhaps the position taken depends on the field in which the testing is usually done. In biological science, treatments seldom have only one effect and relationships between variables are usually complex. Two-sided tests are almost always preferable.
There are circumstances in which a one-sided test is appropriate. In a study of the effects of an investigative procedure, laparoscopy and hydrotubation, on the fertility of sub-fertile women (Luthra et al. 1982), we studied women presenting at an infertility clinic. These women were observed for several months, during which some conceived, before laparoscopy was carried out on those still infertile. These were then observed for several months afterwards and some of these women also conceived. We compared the conception rate in the period before laparoscopy with that afterwards. Of course, women who conceived during the first period did not have a laparoscopy. We argued that the less fertile a woman was the longer it was likely to take her to conceive. Hence, the women who had the laparoscopy should have a lower conception rate (by an unknown amount) than the larger group who entered the study, because the more fertile women had conceived before their turn for laparoscopy came. To see whether laparoscopy increased fertility, we could test the null hypothesis that the conception rate after laparoscopy was less than or equal to that before. The alternative hypothesis was that the conception rate after laparoscopy was higher than that before. A two-sided test was inappropriate because if the laparoscopy had no effect on fertility the post-laparoscopy rate was expected to be lower; chance did not come into it. In fact the post-laparoscopy conception rate was very high and the difference clearly significant.
9.6 Significant, real and important
If a difference is statistically significant, then it may well be real, but not necessarily important. For example, we may look at the effect of a drug, given for some other purpose, on blood pressure. Suppose we find that the drug raises blood pressure by an average of 1 mm Hg, and that this is significant. A rise in blood pressure of 1 mm Hg is not clinically important, so, although it may be there, it does not matter. It is (statistically) significant, and real, but not important.
On the other hand, if a difference is not statistically significant, it could still be real. We may simply have too small a sample to show that a difference exists. Furthermore, the difference may still be important. The difference in mortality in the anticoagulant trial of Carleton et al. (1960), described in Chapter 2, was not significant, the difference in percentage survival being 5.5 in favour of the active treatment. However, the authors also quote a confidence interval for the difference in percentage survival of 24.2 percentage points in favour of heparin to 13.3 percentage points in favour of the control treatment. A difference in survival of 24 percentage points in favour of the treatment would certainly be important if it turned out to be the case. ‘Not significant’ does not imply that there is no effect. It means that we have failed to demonstrate the existence of one. Later studies showed that anticoagulation is indeed effective.
A particular case of misinterpretation of non-significant results occurs in the interpretation of randomized clinical trials where there is a measurement before treatment and another afterwards. Rather than compare the after treatment measure between the two groups, researchers can be tempted to test separately the null hypotheses that the measure in the treatment group has not changed from baseline and that the measure in the control group has not changed from baseline. If one group shows a significant difference and the other does not, the researchers then conclude that the treatments are different.
For example, Kerrigan et al. (1993) assessed the effects of different levels of information on anxiety in patients due to undergo surgery. They randomized patients to receive either simple or detailed information about the procedure and its risks. Anxiety was again measured after patients had been given the information. Kerrigan et al. (1993) calculated significance tests for the mean change in anxiety score for each group separately. In the group given detailed information the mean change in anxiety was not significant (P = 0.2), interpreted incorrectly as ‘no change’. In the other group the reduction in anxiety was significant (P = 0.01). They concluded that there was a difference between the two groups because the change was significant in one group but not in the other. This is incorrect. There may, for example, be a difference in one group which just fails to reach the (arbitrary) significance level and a difference in the other which just exceeds it, the differences in the two groups being similar. We should compare the two groups directly. It is these which are comparable apart from the effects of treatment, being randomized, not the before and after treatment means which could be influenced by many other factors. An alternative analysis tested the null hypothesis that after adjustment for initial anxiety score the mean anxiety scores are the same in patients given simple and detailed information. This showed a significantly higher mean score in the detailed information group (Bland and Altman 1993). Testing within each group separately is essentially the same error as calculating a confidence interval for each group separately (§8.10).
9.7 Comparing the means of large samples
We can use this confidence interval to carry out a significance test of the null hypothesis that the difference between the means is zero, i.e. the alternative hypothesis is that µ1 and µ2 are not equal. If the confidence interval includes zero, then the probability of getting such extreme data if the null hypothesis were true is greater than 0.05 (i.e. 1 - 0.95). If the confidence interval excludes zero, then the probability of such extreme data under the null hypothesis is less than 0.05 and the difference is significant. Another way of doing the same thing is to note that

$$\frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

is from a Standard Normal distribution, i.e. mean 0 and variance 1. Under the null hypothesis that µ1 = µ2, or µ1 - µ2 = 0, this is

$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

This is the test statistic, and if it lies between -1.96 and +1.96 then the probability of such an extreme value is greater than 0.05 and the difference is not significant. If the test statistic is greater than 1.96 or less than -1.96, there is a less than 0.05 probability of such data arising if the null hypothesis were true, and the data are not consistent with the null hypothesis; the difference is significant at the 0.05 or 5% level. This is the large sample Normal test or z test for two means.
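For an example, in a study of respiratory symptoms in schoolchildren (§8.5), we wanted to know whether children reported by their parents to have respiratory symptoms had worse lung function than children who were not reported to have symptoms. Ninety-two children were reported to have cough during the day or at night, and their mean PEFR was 294.8 litre/min with standard deviation 57.1 litre/min; 1643 children were reported not to have the symptom, and their mean PEFR was 313.6 litre/min with standard deviation 55.2 litre/min. We thus have two large samples, and can apply the Normal test. We have

$$n_1 = 92, \quad \bar{x}_1 = 294.8, \quad s_1 = 57.1, \qquad n_2 = 1643, \quad \bar{x}_2 = 313.6, \quad s_2 = 55.2$$

The difference between the two groups is x̄1 - x̄2 = 294.8 - 313.6 = -18.8. The standard error of the difference is

$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{57.1^2}{92} + \frac{55.2^2}{1643}} = 6.11$$

The test statistic is

$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{-18.8}{6.11} = -3.08$$

Under the null hypothesis this is an observation from a Standard Normal distribution, and so P < 0.01 (Table 7.2). If the null hypothesis were true, the data which we have observed would be unlikely. We can conclude that there is good evidence that children reported to have cough during the day or at night have lower PEFR than other children.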
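As a check on the arithmetic, the large sample Normal test for the cough comparison can be sketched in Python. The function names are hypothetical; the two-sided probability is taken from the Standard Normal distribution using the complementary error function in the standard `math` module rather than from Table 7.2.

```python
import math

def normal_two_sided_p(z):
    """Probability of a Standard Normal value as far from zero as z, or
    further, in either direction: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_test_means(mean1, sd1, n1, mean2, sd2, n2):
    """Large sample Normal (z) test for the difference between two means."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = diff / se
    return diff, se, z, normal_two_sided_p(z)

# PEFR (litre/min): children with and without day or night cough.
diff, se, z, p_value = z_test_means(294.8, 57.1, 92, 313.6, 55.2, 1643)
print(f"difference {diff:.1f}, SE {se:.2f}, z = {z:.2f}, P = {p_value:.4f}")
```

The sketch relies on both samples being large, so that the sample standard deviations are good estimates and the difference in means is close to Normal; it is not a substitute for the small sample methods.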
This has a probability of about 0.16, and so the data are consistent with the null hypothesis. However, the 95% confidence interval for the difference is -14.6 - 1.96 × 10.5 to -14.6 + 1.96 × 10.5, giving -35 to 6 litre/min. We see that the difference could be just as great as for cough. Because the size of the smaller sample is not so great, the test is less likely to detect a difference for the phlegm comparison than for the cough comparison. The advantages of confidence intervals over tests of significance are discussed by Gardner and Altman (1986). Confidence intervals are usually more informative than P values, particularly non-significant ones.
9.8 Comparison of two proportions
Suppose we wish to compare two proportions p1 and p2, estimated from large independent samples of size n1 and n2. The null hypothesis is that the proportions in the populations from which the samples are drawn are the same, p say. Since under the null hypothesis the proportions for the two groups are the same, we can get one common estimate of the proportion and use it to estimate the standard errors. We estimate the common proportion from the data by

$$p = \frac{r_1 + r_2}{n_1 + n_2}$$

where p1 = r1/n1, p2 = r2/n2. We want to make inferences from the difference between sample proportions, p1 - p2, so we require the standard error of this difference.

$$\mathrm{SE}(p_1 - p_2) = \sqrt{\mathrm{SE}(p_1)^2 + \mathrm{SE}(p_2)^2}$$

since the samples are independent. Hence

$$\mathrm{SE}(p_1 - p_2) = \sqrt{\frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2}} = \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

As p is based on more subjects than either p1 or p2, if the null hypothesis were true then standard errors would be more reliable than those estimated in §8.6 using p1 and p2 separately. We then find the test statistic

$$z = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
In §8.6, we looked at the proportions of children with bronchitis in infancy and with no such history who were reported to have respiratory symptoms in later life. We had 273 children with a history of bronchitis before age 5 years, 26 of whom were reported to have day or night cough at age 14. We had 1046 children with no bronchitis before age 5 years, 44 of whom were reported to have day or night cough at age 14. We shall test the null hypothesis that the prevalence of the symptom is the same in both populations, against the alternative that it is not:

$$p = \frac{26 + 44}{273 + 1046} = 0.05307$$

$$\mathrm{SE}(p_1 - p_2) = \sqrt{0.05307 \times (1 - 0.05307) \times \left(\frac{1}{273} + \frac{1}{1046}\right)} = 0.01524$$

$$z = \frac{p_1 - p_2}{\mathrm{SE}(p_1 - p_2)} = \frac{0.05317}{0.01524} = 3.49$$

Referring this to Table 7.2 of the Normal distribution, we find the probability of such an extreme value is less than 0.01, so we conclude that the data are not consistent with the null hypothesis. There is good evidence that children with a history of bronchitis are more likely to be reported to have day or night cough at age 14.
Note that the standard error used here is not the same as that found in §8.6. It is only correct if the null hypothesis is true. The formula of §8.6 should be used for finding the confidence interval. Thus the standard error used for testing is not identical to that used for estimation, as was the case for the comparison of two means. It is possible for the test to be significant and the confidence interval include zero. This property is possessed by several related tests and confidence intervals.
This is a large sample method, and is equivalent to the chi-squared test for a 2 by 2 table (§13.1–2). How small the sample can be and methods for small samples are discussed in §13.3–6.
Note that we do not need a different test for the ratio of two proportions, as the null hypothesis that the ratio in the population is one is the same as the null hypothesis that the difference in the population is zero.
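The test for two proportions can be sketched in the same way. In this Python illustration (the function name `z_test_proportions` is hypothetical) the standard error is computed from the pooled proportion, so it is the one appropriate for testing the null hypothesis, not the one used for the confidence interval in §8.6.

```python
import math

def z_test_proportions(r1, n1, r2, n2):
    """Large sample Normal test of the null hypothesis that two independent
    samples come from populations with the same proportion."""
    p1, p2 = r1 / n1, r2 / n2
    # Common estimate of the proportion under the null hypothesis.
    p = (r1 + r2) / (n1 + n2)
    se_null = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_null
    # Two-sided probability from the Standard Normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p, se_null, z, p_value

# Day or night cough at age 14 by history of bronchitis before age 5.
p, se_null, z, p_value = z_test_proportions(26, 273, 44, 1046)
print(f"common p = {p:.5f}, SE = {se_null:.5f}, z = {z:.2f}, P = {p_value:.4f}")
```

Comparing `se_null` with the standard error of 0.0188 found for the confidence interval shows the point made above: the two standard errors differ because one assumes the null hypothesis and the other does not.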
9.9* The power of a test
The test for comparing means in §9.7 is more likely to detect a large difference between two populations than a small one. The probability that a test will produce a significant difference at a given significance level is called the power of the test. For a given test, this will depend on the true difference between the populations compared, the sample size and the significance level chosen. We have already noted in §9.4 that we are more likely to obtain a significant difference with a significance level of 0.05 than with one of 0.01. We have greater power if the P value chosen to be considered as significant is larger.
For the comparison of PEFR in children with and without phlegm (§9.7), for example, suppose that the population means were in fact µ1 = 310 and µ2 = 295 litre/min, and each population had standard deviation 55 litre/min. The sample sizes were n1 = 1708 and n2 = 27, so the standard error of the difference would be
$$\sqrt{\frac{55^2}{1708} + \frac{55^2}{27}} = 10.67 \text{ litre/min}$$
The population difference we want to be able to detect is µ1 - µ2 = 310 - 295 = 15, and so the test statistic $(\bar{x}_1 - \bar{x}_2)/10.67$ will exceed 1.96 whenever the Standard Normal variable $(\bar{x}_1 - \bar{x}_2 - 15)/10.67$ exceeds
$$1.96 - \frac{15}{10.67} = 1.96 - 1.41 = 0.55$$
The chance of a significant difference in the opposite direction is negligible here.
From Table 7.1, Φ(0.55) is between 0.691 and 0.726, about 0.71. The power of the test would be 1 - 0.71 = 0.29. If these were the population means and standard deviation, our test would have had a poor chance of detecting the difference in means, even though it existed. The test would have low power. Figure 9.3 shows how the power of this test changes with the difference between population means. As the difference gets larger, the power increases, getting closer and closer to 1. The power is not zero even when the population difference is zero, because there is always the possibility of a significant difference, even when the null hypothesis is true. 1 - power = β, the probability of a Type II or beta error (§9.4) if the population difference = 15 litre/min.
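The power calculation can be reproduced in a few lines of Python. This is a sketch under the same assumptions as the text (known common standard deviation, large sample Normal test at the 0.05 level); the function names `normal_cdf` and `power_two_means` are illustrative. It also reports the probability of significance in the opposite direction, to show that it is negligible.

```python
import math

def normal_cdf(x):
    """Standard Normal distribution function, Phi(x)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_means(mu1, mu2, sd, n1, n2, z_crit=1.96):
    """Power of the large sample Normal comparison of two means."""
    se = sd * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference
    shift = (mu1 - mu2) / se               # true difference in standard errors
    upper = 1 - normal_cdf(z_crit - shift) # significant in the expected direction
    lower = normal_cdf(-z_crit - shift)    # significant in the opposite direction
    return se, upper, lower

# PEFR example: means 310 and 295 litre/min, SD 55, samples of 1708 and 27
se, upper, lower = power_two_means(310, 295, 55, 1708, 27)
print(f"standard error = {se:.2f} litre/min")
print(f"power (expected direction) = {upper:.2f}")
print(f"probability of significance in the opposite direction = {lower:.5f}")
```

Calling `power_two_means` over a range of values for `mu1 - mu2` would trace out a power curve of the kind shown in Figure 9.3.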
Fig. 9.3. Power curve for a comparison of two means from samples of size 1708 and 27
9.10* Multiple significance tests
If we test a null hypothesis which is in fact true, using 0.05 as the critical significance level, we have a probability of 0.95 of coming to a ‘not significant’ (i.e. correct) conclusion. If we test two independent true null hypotheses, the probability that neither test will be significant is 0.95 × 0.95 = 0.90 (§6.2). If we test twenty such hypotheses the probability that none will be significant is $0.95^{20} = 0.36$. This gives a probability of 1 - 0.36 = 0.64 of getting at least one significant result; we are more likely to get one than not. The expected number of spurious significant results is 20 × 0.05 = 1.
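These probabilities are easy to tabulate. The following Python sketch assumes, as the text does, that the tests are independent and that every null hypothesis is true; the function name `prob_at_least_one_significant` is an illustrative choice.

```python
def prob_at_least_one_significant(k, alpha=0.05):
    """Chance of one or more 'significant' results among k independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 6, 20):
    p_none = (1 - 0.05) ** k
    p_any = prob_at_least_one_significant(k)
    expected = k * 0.05   # expected number of spurious significant results
    print(f"k={k:2d}  P(none)={p_none:.2f}  P(at least one)={p_any:.2f}  expected={expected:.2f}")
```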
Many medical research studies are published with large numbers of significance tests. These are not usually independent, being carried out on the same set of subjects, so the above calculations do not apply exactly. However, it is clear that if we go on testing long enough we will find something which is ‘significant’. We must beware of attaching too much importance to a lone significant result among a mass of non-significant ones. It may be the one in twenty which we should get by chance alone.
This is particularly important when we find that a clinical trial or epidemiological study gives no significant difference overall, but does so in a particular subset of subjects, such as women aged over 60. For example, Lee et al. (1980) simulated a clinical trial of the treatment of coronary artery disease by allocating 1073 patient records from past cases into two ‘treatment’ groups at random. They then analysed the outcome as if it were a genuine trial of two treatments. The analysis was quite detailed and thorough. As we would expect, it failed to show any significant difference in survival between those patients allocated to the two ‘treatments’. Patients were then subdivided by two variables which affect prognosis, the number of diseased coronary vessels and whether the left ventricular contraction pattern was normal or abnormal. A significant difference in survival between the two ‘treatment’ groups was found in those patients with three diseased vessels (the maximum) and abnormal ventricular contraction. As this would be the subset of patients with the worst prognosis, the finding would be easy to account for by saying that the superior ‘treatment’ had its greatest advantage in the most severely ill patients! The moral of this story is that if there is no difference between the treatments overall, significant differences in subsets are to be treated with the utmost suspicion. This method of looking for a difference in treatment effect between subgroups of subjects is incorrect. A correct approach would be to use a multifactorial analysis, as described in Chapter 17, with treatment and group as two factors, and test for an interaction between groups and treatments. The power for detecting such interactions is quite low, and we need a larger sample than would be needed simply to show a difference overall (Altman and Matthews 1996, Matthews and Altman 1996a, b).
This spurious significant difference comes about because, when there is no real difference, the probability of getting no significant differences in six subgroups is $0.95^6 = 0.74$, not 0.95. We can allow for this effect by the Bonferroni method. In general, if we have k independent significance tests, at the α level, of null hypotheses which are all true, the probability that we will get no significant differences is $(1-\alpha)^k$. If we make α small enough, we can make the probability that none of the separate tests is significant equal to 0.95. Then if any of the k tests has a P value less than α, we will have a significant difference between the treatments at the 0.05 level. Since α will be very small, it can be shown that $(1-\alpha)^k \approx 1 - k\alpha$. If we put kα = 0.05, so α = 0.05/k, we will have probability 0.05 that one of the k tests will have a P value less than α if the null hypotheses are true. Thus, if in a clinical trial we compare two treatments within 5 subsets of patients, the treatments will be significantly different at the 0.05 level if there is a P value less than 0.01 within any of the subsets. This is the Bonferroni method. Note that they are not significant at the 0.01 level, but at only the 0.05 level. The k tests together test the composite null hypothesis that there is no treatment effect on any variable.
We can do the same thing by multiplying the observed P value from the significance tests by the number of tests, k, any kP which exceeds one being ignored. Then if any kP is less than 0.05, the two treatments are significant at the 0.05 level.
For example, Williams et al. (1992) randomly allocated elderly patients discharged from hospital to two groups. The intervention group received timetabled visits by health visitor assistants; the control group patients were not visited unless there was perceived need. Soon after discharge and after one year, patients were assessed for physical health, disability, and mental state using questionnaire scales. There were no significant differences overall between the intervention and control groups, but among women aged 75–79 living alone the control group showed significantly greater deterioration in physical score than did the intervention group (P = 0.04), and among men over 80 years the control group showed significantly greater deterioration in disability score than did the intervention group (P = 0.03). The authors stated that ‘Two small sub-groups of patients were possibly shown to have benefited from the intervention…. These benefits, however, have to be treated with caution, and may be due to chance factors.’ Subjects were cross-classified by age groups, whether living alone, and sex, so there were at least eight subgroups, if not more. Thus even if we consider the three scales separately, only a P value less than 0.05/8 = 0.006 would provide evidence of a treatment effect. Alternatively, the true P values are 8 × 0.04 = 0.32 and 8 × 0.03 = 0.24.
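The Bonferroni adjustment for this example can be sketched in Python as follows. The helper name `bonferroni_adjust` is hypothetical, and capping an adjusted value at 1 is an assumption made here as one common way of handling any kP which exceeds one.

```python
def bonferroni_adjust(p_values, k=None):
    """Multiply each P value by the number of tests k, capping at 1.

    k defaults to the number of P values supplied; pass it explicitly when
    only some of the tests carried out are being reported.
    """
    if k is None:
        k = len(p_values)
    return [min(1.0, k * p) for p in p_values]

# Williams et al. (1992): two reported subgroup P values, at least 8 subgroups
k = 8
reported = [0.04, 0.03]
adjusted = bonferroni_adjust(reported, k=k)
threshold = 0.05 / k   # per-test level giving an overall level of about 0.05

for p, kp in zip(reported, adjusted):
    verdict = "significant" if kp < 0.05 else "not significant"
    print(f"P={p:.2f}  kP={kp:.2f}  {verdict} at the overall 0.05 level")
print(f"per-test threshold = {threshold:.4f}")
```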
A similar problem arises if we have multiple outcome measurements. For example, Newnham et al. (1993) randomized pregnant women to receive a series of Doppler ultrasound blood flow measurements or to control. They found a significantly higher proportion of birthweights below the 10th and 3rd centiles (P = 0.006 and P = 0.02). These were only two of many comparisons, however, and one would suspect that there may be some spurious significant differences among so many. At least 35 were reported in the paper, though only these two were reported in the abstract. (Birthweight was not the intended outcome variable for the trial.) These tests are not independent, because they are all on the same subjects, using variables which may not be independent. The proportions of birthweights below the 10th and 3rd centiles are clearly not independent, for example. The probability that two correlated variables both give non-significant differences when the null hypothesis is true is greater than $(1-\alpha)^2$ because if the first test is not significant, the second now has a probability greater than 1 - α of being not significant also. (Similarly, the probability that both are significant exceeds $\alpha^2$, and the probability that only one is significant is reduced.) For k tests the probability of no significant differences is greater than $(1-\alpha)^k$ and so greater than 1 - kα. Thus if we carry out each test at the α = 0.05/k level, we will have a probability of no significant differences which is greater than 0.95. A P value less than α for any variable, or kP < 0.05, would mean that the treatments were significantly different. For the example, the P values could be adjusted by 35 × 0.006 = 0.21 and 35 × 0.02 = 0.70.
Because the probability of obtaining no significant differences if the null hypotheses are all true is greater than the 0.95 which we want it to be, the overall P value is actually smaller than the nominal 0.05, by an unknown amount which depends on the lack of independence between the tests. The power of the test, its ability to detect true differences in the population, is correspondingly diminished. In statistical terms, the test is conservative.
Other multiple testing problems arise when we have more than two groups of subjects and wish to compare each pair of groups (§10.9), when we have a series of observations over time, such as blood pressure every 15 min after administration of a drug, where there may be a temptation to test each time point separately (§10.7), and when we have relationships between many variables to examine, as in a survey. For all these problems, the multiple tests are highly correlated and the Bonferroni method is inappropriate, as it will be highly conservative and may miss real differences.
9.11* Repeated significance tests and sequential analysis
A special case of multiple testing arises in clinical trials, where patients are admitted at different times. There can be a temptation to keep looking at the data and carrying out significance tests. As described above (§9.10), this is liable to produce spurious significant differences, detecting treatment effects where none exist. I have heard of researchers testing the difference each time a patient is added and stopping the trial as soon as the difference is significant, then submitting the paper for publication as if only one test had been carried out. I will be charitable and put this down to ignorance.
It is quite legitimate to set up a trial where the treatment difference is tested every time a patient is added, provided this repeated testing is designed into the trial and the overall chance of a significant difference when the null hypothesis is true remains 0.05. Such designs are called sequential clinical trials. A comprehensive account is given by Whitehead (1997).
An alternative approach which is quite often used is to take a small number of looks at the data as the trial progresses, testing at a predetermined P value. For example, we could test three times, rejecting the null hypothesis of no treatment effect the first time only if P < 0.001, the second time if P < 0.01, and the third time if P < 0.04. Then if the null hypothesis is true, the probability that there will not be a significant difference is approximately 0.999 × 0.99 × 0.96 = 0.949, so the overall alpha level will be 1 - 0.949 = 0.051, i.e. approximately 0.05. (The calculation is approximate because the tests are not independent.) If the null hypothesis is rejected at any of these tests, the overall P value is 0.05, not the nominal one. This approach can be used by data monitoring committees, where if the trial shows a large difference early on the trial can be stopped yet still allow a statistical conclusion to be drawn. This is called the alpha spending or P-value spending approach.
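The approximate overall alpha level for a set of planned looks can be computed as below. This Python sketch treats the looks as independent, exactly as the approximate calculation in the text does, so it is not a substitute for a properly designed group sequential boundary; the function name `overall_alpha` is illustrative.

```python
def overall_alpha(nominal_levels):
    """Approximate overall alpha for a series of looks at the data,
    treating the looks as if they were independent tests."""
    prob_no_rejection = 1.0
    for alpha in nominal_levels:
        prob_no_rejection *= (1 - alpha)   # no rejection at this look
    return 1 - prob_no_rejection

# Three looks at nominal levels 0.001, 0.01 and 0.04
levels = [0.001, 0.01, 0.04]
overall = overall_alpha(levels)
print(f"probability of no significant difference = {1 - overall:.3f}")
print(f"approximate overall alpha = {overall:.3f}")
```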
Two particular methods which you might come across are the grouped sequential design of Pocock (1977, 1982), where each test is done at the same nominal alpha value, and the method of O'Brien and Fleming (1979), widely used by the pharmaceutical industry, where the nominal alpha values decrease sharply as the trial progresses.
9M Multiple choice questions 44 to 49
(Each branch is either true or false)
44. In a case–control study, patients with a given disease drank coffee more frequently than did controls, and the difference was highly significant. We can conclude that:
(a) drinking coffee causes the disease;
(b) there is evidence of a real relationship between the disease and coffee drinking in the sampled population;
(c) the disease is not related to coffee drinking;
(d) eliminating coffee would prevent the disease;
(e) coffee and the disease always go together.
45. When comparing the means of two large samples using the Normal test:
(a) the null hypothesis is that the sample means are equal;
(b) the null hypothesis is that the means are not significantly different;
(c) standard error of the difference is the sum of the standard errors of the means;
(d) the standard errors of the means must be equal;
(e) the test statistic is the ratio of the difference to its standard error.
46. In a comparison of two methods of measuring PEFR, 6 of 17 subjects had higher readings on the Wright peak flow meter, 10 had higher readings on the mini peak flow meter and one had the same on both. If the difference between the instruments is tested using a sign test:
(a) the test statistic may be the number with the higher reading on the Wright meter;
(b) the null hypothesis is that there is no tendency for one instrument to read higher than the other;
(c) a one-tailed test of significance should be used;
(d) the test statistic should follow the Binomial distribution (n = 16 and p = 0.5) if the null hypothesis were true;
(e) the instruments should have been presented in random order.
47. In a small randomized double blind trial of a new treatment in acute myocardial infarction, the mortality in the treated group was half that in the control group, but the difference was not significant. We can conclude that:
(a) the treatment is useless;
(b) there is no point in continuing to develop the treatment;
(c) the reduction in mortality is so great that we should introduce the treatment immediately;
(d) we should keep adding cases to the trial until the Normal test for comparison of two proportions is significant;
(e) we should carry out a new trial of much greater size.
48. In a large sample comparison between two groups, increasing the sample size will:
(a) improve the approximation of the test statistic to the Normal distribution;
(b) decrease the chance of an error of the first kind;
(c) decrease the chance of an error of the second kind;
(d) increase the power against a given alternative;
(e) make the null hypothesis less likely to be true.
49. In a study of breast feeding and intelligence (Lucas et al. 1992), 300 children who were very small at birth were given their mother's breast milk or infant formula, at the choice of the mother. At the age of 8 years the IQ of these children was measured. The mean IQ in the formula group was 92.8, compared to a mean of 103.0 in the breast milk group. The difference was significant, P < 0.001:
(a) there is good evidence that formula feeding of very small babies reduces IQ at age eight;
(b) there is good evidence that choosing to express breast milk is related to higher IQ in the child at age eight;
(c) type of milk has no effect on subsequent IQ;
(d) the probability that type of milk affects subsequent IQ is less than 0.1%;
(e) if type of milk were unrelated to subsequent IQ, the probability of getting a difference in mean IQ as big as that observed is less than 0.001.
9E Exercise: Crohn's disease and cornflakes
The suggestion that cornflakes may cause Crohn's disease arose in the study of James (1977). Crohn's disease is an inflammatory disease, usually of the last part of the small intestine. It can cause a variety of symptoms, including vague pain, diarrhoea, acute pain and obstruction. Treatment may be by drugs or surgery, but many patients have had the disease for many years. James' initial hypothesis was that foods taken at breakfast may be associated with Crohn's disease. James studied 16 men and 18 women with Crohn's disease, aged 19–64 years, mean time since diagnosis 4.2 years. These were compared to controls, drawn from hospital patients without major gastro-intestinal symptoms. Two controls were chosen per patient, matched for age and sex. James interviewed all cases and controls himself. Cases were asked whether they ate various foods for breakfast before the onset of symptoms, and controls were asked whether they ate various foods before a corresponding time (Table 9.2). There was a significant excess of eating of cornflakes, wheat and bran among the Crohn's patients. The consumption of different cereals was interrelated, people reporting one cereal being likely to report others. In James' opinion the principal association of Crohn's disease was with cornflakes, based on the apparent strength of the association. Only one case had never eaten cornflakes.
Table 9.2. Numbers of Crohn's disease patients and controls who ate various cereals regularly (at least once per week) (James 1977)

| Cereal | Frequency | Patients | Controls | Significance test |
|---|---|---|---|---|
| Cornflakes | Regularly | 23 | 17 | P < 0.0001 |
| | Rarely or never | 11 | 51 | |
| Wheat | Regularly | 16 | 12 | P < 0.01 |
| | Rarely or never | 18 | 56 | |
| Porridge | Regularly | 11 | 15 | 0.5 > P > 0.1 |
| | Rarely or never | 23 | 53 | |
| Rice | Regularly | 8 | 10 | 0.5 > P > 0.1 |
| | Rarely or never | 26 | 56 | |
| Bran | Regularly | 6 | 2 | P = 0.02 |
| | Rarely or never | 28 | 66 | |
| Muesli | Regularly | 4 | 3 | P = 0.17 |
| | Rarely or never | 30 | 65 | |

Several papers soon appeared in which this study was repeated, with variations. None was identical in design to James' study and none appeared to support his findings. Mayberry et al. (1978) interviewed 100 patients with Crohn's disease, mean duration nine years. They obtained 100 controls, matched for age and sex, from patients and their relatives attending a fracture clinic. Cases and controls were interviewed about their current breakfast habits (Table 9.3). The only significant difference was an excess of fruit juice drinking in controls. Cornflakes were eaten by 29 cases compared to 22 controls, which was not significant. In this study there was no particular tendency for cases to report more foods than controls. The authors also asked cases whether they knew of an association between food (unspecified) and Crohn's disease. The association with cornflakes was reported by 29, and 12 of these had stopped eating them, having previously eaten them regularly. In their 29 matched controls, 3 were past cornflakes eaters. Of the 71 Crohn's patients who were unaware of the association, 21 had discontinued eating cornflakes compared to 10 of their 71 controls. The authors remarked ‘seemingly patients with Crohn's disease had significantly reduced their consumption of cornflakes compared with controls, irrespective of whether they were aware of the possible association’.
1. Are the cases and controls comparable in either of these studies?
2. What other sources of bias could there be in these designs?
Table 9.3. Number of patients and controls regularly consuming certain foods at least twice weekly (Mayberry et al. 1978)

| Foods at breakfast | Crohn's patients (n = 100) | Controls (n = 100) | Significance test |
|---|---|---|---|
| Bread | 91 | 86 | |
| Toast | 59 | 64 | |
| Egg | 31 | 37 | |
| Fruit or fruit juice | 14 | 30 | P < 0.02 |
| Porridge | 20 | 18 | |
| Weetabix, shreddies or shredded wheat | 21 | 19 | |
| Cornflakes | 29 | 22 | |
| Special K | 4 | 7 | |
| Rice krispies | 6 | 6 | |
| Sugar puffs | 3 | 1 | |
| Bran or all bran | 13 | 12 | |
| Muesli | 3 | 10 | |
| Any cereal | 55 | 55 | |

3. What is the main point of difference in design between the study of James and that of Mayberry et al.?
4. In the study of Mayberry et al. how many Crohn's cases and how many controls had ever been regular eaters of cornflakes? How does this compare with James' findings?
5. Why did James think that eating cornflakes was particularly important?
6. For the data of Table 9.2, calculate the percentage of cases and controls who said that they ate the various cereals. Now divide the proportion of cases who said that they had eaten the cereal by the proportion of controls who reported eating it. This tells us, roughly, how much more likely cases were to report the cereal than were controls. Do you think eating cornflakes is particularly important?
7. If we have an excess of all cereals when we ask what was ever eaten, and none when we ask what is eaten now, what possible factors could account for this?
10
Comparing the means of small samples
10.1 The t distribution
We have seen in Chapters 8 and 9 how the Normal distribution can be used to calculate confidence intervals and to carry out tests of significance for the means of large samples. In this chapter we shall see how similar methods may be used when we have small samples, using the t distribution, and go on to compare several means.
So far, the probability distributions we have used have arisen because of the way data were collected, either from the way samples were drawn (Binomial distribution), or from the mathematical properties of large samples (Normal distribution). The distribution did not depend on any property of the data themselves. To use the t distribution we must make an assumption about the distribution from which the observations themselves are taken, the distribution of the variable in the population. We must assume this to be a Normal distribution. As we saw in Chapter 7, many naturally occurring variables have been found to follow a Normal distribution closely. I shall discuss the effects of any deviations from the Normal later.
Fig. 10.1. Student's t distribution with 1, 4 and 20 degrees of freedom, showing convergence to the Standard Normal distribution
Table 10.1. Two-tailed probability points of the t distribution

| D.f. | 0.10 (10%) | 0.05 (5%) | 0.01 (1%) | 0.001 (0.1%) | D.f. | 0.10 (10%) | 0.05 (5%) |
|---|---|---|---|---|---|---|---|
| 1 | 6.31 | 12.70 | 63.66 | 636.62 | 16 | 1.75 | 2.12 |
| 2 | 2.92 | 4.30 | 9.93 | 31.60 | 17 | 1.74 | 2.11 |
| 3 | 2.35 | 3.18 | 5.84 | 12.92 | 18 | 1.73 | 2.10 |
| 4 | 2.13 | 2.78 | 4.60 | 8.61 | 19 | 1.73 | 2.09 |
| 5 | 2.02 | 2.57 | 4.03 | 6.87 | 20 | 1.72 | 2.09 |
| 6 | 1.94 | 2.45 | 3.71 | 5.96 | 21 | 1.72 | 2.08 |
| 7 | 1.89 | 2.36 | 3.50 | 5.41 | 22 | 1.72 | 2.07 |
| 8 | 1.86 | 2.31 | 3.36 | 5.04 | 23 | 1.71 | 2.07 |
| 9 | 1.83 | 2.26 | 3.25 | 4.78 | 24 | 1.71 | 2.06 |
| 10 | 1.81 | 2.23 | 3.17 | 4.59 | 25 | 1.71 | 2.06 |
| 11 | 1.80 | 2.20 | 3.11 | 4.44 | 30 | 1.70 | 2.04 |
| 12 | 1.78 | 2.18 | 3.05 | 4.32 | 40 | 1.68 | 2.02 |
| 13 | 1.77 | 2.16 | 3.01 | 4.22 | 60 | 1.67 | 2.00 |
| 14 | 1.76 | 2.14 | 2.98 | 4.14 | 120 | 1.66 | 1.98 |
| 15 | 1.75 | 2.13 | 2.95 | 4.07 | ∞ | 1.64 | 1.96 |

D.f. = Degrees of freedom.
∞ = infinity, same as the Standard Normal distribution.
Like the Normal distribution, the t distribution function cannot be integrated algebraically and its numerical values have been tabulated. Because the t distribution depends on the degrees of freedom, it is not usually tabulated in full like the Normal distribution in Table 7.1. Instead, probability points are given for different degrees of freedom. Table 10.1 shows two sided probability points for selected degrees of freedom. Thus, with 4 degrees of freedom, we can see that, with probability 0.05, t will be 2.78 or more from its mean, zero.
Because only certain probabilities are quoted, we cannot usually find the exact probability associated with a particular value of t. For example, suppose we want to know the probability of t on 9 degrees of freedom being further from zero than 3.7. From Table 10.1 we see that the 0.01 point is 3.25 and the 0.001 point is 4.78. We therefore know that the required probability lies between 0.01 and 0.001. We could write this as 0.001 < P < 0.01. Often the lower bound, 0.001, is omitted and we write P < 0.01. With a computer it is possible to calculate the exact probability every time, so this common practice is due to disappear.
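As an illustration of how a computer can supply the probability directly, the sketch below integrates the t density numerically with Simpson's rule, using only the Python standard library. The function names `t_density` and `two_tailed_p` and the choice of 2000 integration steps are assumptions made for this example; statistical packages use more refined algorithms, and `math.gamma` limits this version to moderate degrees of freedom such as those in Table 10.1.

```python
import math

def t_density(x, df):
    """Probability density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Probability that t is further from zero than the given value.

    Integrates the density from 0 to |t| by Simpson's rule (steps must be even)
    and doubles the result, since the distribution is symmetrical about zero.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    central = total * h / 3          # area between 0 and |t|
    return 1 - 2 * central

p = two_tailed_p(3.7, 9)
print(f"P(|t| > 3.7) on 9 d.f. = {p:.4f}")
print(f"check against Table 10.1: P(|t| > 2.26) on 9 d.f. = {two_tailed_p(2.26, 9):.3f}")
```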
Fig. 10.2. Sample t ratios derived from 750 samples of 4 human heights and the t distribution, after Student (1908)
10.2 The one-sample t method
We can use the t distribution to find confidence intervals for means estimated from a small sample from a Normal distribution. We do not usually have small samples in sample surveys, but we often find them in clinical studies. For example, we can use the t distribution to find confidence intervals for the size of difference between two treatment groups, or between measurements obtained from subjects under two conditions. I shall deal with the latter, single sample problem first.
The population mean, µ, is unknown and we wish to estimate it using a 95% confidence interval. We can see that, for 95% of samples, the difference between $\bar{x}$ and µ is at most t standard errors, where t is the value of the t distribution such that 95% of observations will be closer to zero than t. For a large sample this will be 1.96 as for the Normal distribution. For small samples we must use Table 10.1. In this table, the probability that the t distribution is further from zero than t is given, so we must first find one minus our desired probability, 0.95. We have 1 - 0.95 = 0.05, so we use the 0.05 column of the table to get the value of t. We then have the 95% confidence interval: $\bar{x}$ - t standard errors to $\bar{x}$ + t standard errors. The usual application of this is to differences between measurements made on the same or on matched pairs of subjects. In this application the one sample t test is also known as the paired t test.
Consider the data of Table 10.2. (I asked the researcher why there were so many missing data. He told me that some of the biopsies were not usable to count the capillaries, and that some of these patients were amputees and the foot itself was missing.) We shall estimate the difference in capillary density between the worse foot (in terms of ulceration, not capillaries) and the better foot for the ulcerated patients. The first step is to find the differences (worse – better). We then find the mean difference and its standard error, as described in §8.2. These are in the last column of Table 10.2.
Table 10.2. Capillary density (per mm²) in the feet of ulcerated patients and a healthy control group (data supplied by Marc Lamah)

| Controls: right foot | Controls: left foot | Controls: average of right and left† | Ulcerated: worse foot | Ulcerated: better foot | Ulcerated: average of worse and better† | Ulcerated: difference worse - better |
|---|---|---|---|---|---|---|
| 19 | 16 | 17.5 | 9 | ? | 9.0 | |
| 25 | 30 | 27.5 | 11 | ? | 11.0 | |
| 25 | 29 | 27.0 | 15 | 10 | 12.5 | 5 |
| 26 | 33 | 29.5 | 16 | 21 | 18.5 | -5 |
| 26 | 28 | 27.0 | 18 | 18 | 18.0 | 0 |
| 30 | 28 | 29.0 | 18 | 18 | 18.0 | 0 |
| 33 | 36 | 34.5 | 19 | 26 | 22.5 | -7 |
| 33 | 29 | 31.0 | 20 | ? | 20.0 | |
| 34 | 37 | 35.5 | 20 | 20 | 20.0 | 0 |
| 34 | 33 | 33.5 | 20 | 33 | 26.5 | -13 |
| 34 | 37 | 35.5 | 20 | 26 | 23.0 | -6 |
| 34 | ? | 34.0 | 21 | 15 | 18.0 | 6 |
| 35 | 38 | 36.5 | 22 | 23 | 22.5 | -1 |
| 36 | 40 | 38.0 | 22 | ? | 22.0 | |
| 39 | 41 | 40.0 | 23 | 23 | 23.0 | 0 |
| 40 | 39 | 39.5 | 25 | 30 | 27.5 | -5 |
| 41 | 39 | 40.0 | 26 | 31 | 28.5 | -5 |
| 41 | 39 | 40.0 | 27 | 26 | 26.5 | 1 |
| 56 | 48 | 52.0 | 27 | ? | 27.0 | |
| | | | 35 | 23 | 29.0 | 12 |
| | | | 47 | 42 | 44.5 | 5 |
| | | | ? | 24 | 24.0 | |
| | | | ? | 28 | 28.0 | |

| | Controls: average | Ulcerated: average | Ulcerated: difference |
|---|---|---|---|
| Number | 19 | 23 | 16 |
| Mean | 34.08 | 22.59 | -0.81 |
| Sum of squares | 956.13 | 1176.32 | 550.44 |
| Variance | 53.12 | 53.47 | 36.70 |
| Standard deviation | 7.29 | 7.31 | 6.06 |
| Standard error | 1.67 | 1.52 | 1.51 |

† When one observation is missing the average = the other observation. ? = Missing data.
To find the 95% confidence interval for the mean difference we must suppose that the differences follow a Normal distribution. To calculate the interval, we first require the relevant point of the t distribution from Table 10.1. There are 16 non-missing differences and hence n - 1 = 15 degrees of freedom associated with $s^2$. We want a probability of 0.95 of being closer to zero than t, so we go to Table 10.1 with probability = 1 - 0.95 = 0.05. Using the 15 d.f. row, we get t = 2.13. Hence the difference between a sample mean and the population mean is less than 2.13 standard errors for 95% of samples, and the 95% confidence interval is -0.81 - 2.13 × 1.51 to -0.81 + 2.13 × 1.51 = -4.03 to +2.41 capillaries/mm².
On the basis of these data, the capillary density could be less in the worse affected foot by as much as 4.03 capillaries/mm², or greater by as much as 2.41 capillaries/mm². In the large sample case, we would use the Normal distribution instead of the t distribution, putting 1.96 instead of 2.13. We would not then need the differences themselves to follow a Normal distribution.
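The interval can be reproduced from the paired observations in Table 10.2. In the Python sketch below the list `pairs` holds the (worse foot, better foot) counts for the 16 ulcerated patients with both measurements, and the multiplier 2.13 is read from the 15 d.f. row of Table 10.1 instead of being computed. The variable names are illustrative, and the limits differ slightly in the last digit from hand calculation because the mean and standard error are not rounded first.

```python
import math

# (worse foot, better foot) capillary densities for patients with both values
pairs = [(15, 10), (16, 21), (18, 18), (18, 18), (19, 26), (20, 20),
         (20, 33), (20, 26), (21, 15), (22, 23), (23, 23), (25, 30),
         (26, 31), (27, 26), (35, 23), (47, 42)]

diffs = [worse - better for worse, better in pairs]
n = len(diffs)
mean = sum(diffs) / n
sum_of_squares = sum((d - mean) ** 2 for d in diffs)
variance = sum_of_squares / (n - 1)      # n - 1 = 15 degrees of freedom
se = math.sqrt(variance / n)             # standard error of the mean difference

t_crit = 2.13                            # 0.05 point, 15 d.f., from Table 10.1
lower = mean - t_crit * se
upper = mean + t_crit * se

print(f"n = {n}, mean difference = {mean:.2f}, standard error = {se:.2f}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f} capillaries/mm2")
```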
Fig. 10.3. Normal plot for differences and plot of difference against average for the data of Table 10.2, ulcerated patients
We can also use the t distribution to test the null hypothesis that in the population the mean difference is zero. If the null hypothesis were true, and the differences follow a Normal distribution, the test statistic mean/standard error would be from a t distribution with n - 1 degrees of freedom. This is because the null hypothesis is that the mean difference µ = 0, hence the numerator $\bar{x}$ - µ = $\bar{x}$. We have the usual ‘estimate over standard error’ formula. For the example, we have
$$t = \frac{\bar{x}}{\text{standard error}} = \frac{-0.81}{1.51} = -0.54$$
If we go to the 15 degrees of freedom row of Table 10.1, we find that the probability of such an extreme value arising is greater than 0.10, the 0.10 point of the distribution being 1.75. Using a computer we would find P = 0.6. The data are consistent with the null hypothesis and we have failed to demonstrate the existence of a difference. Note that the confidence interval is more informative than the significance test.
We could also use the sign test to test the null hypothesis of no difference. This gives us 5 positives out of 12 differences (4 differences, being zero, give no useful information) which gives a two sided probability of 0.8, a little larger than that given by the t test. Provided the assumption of a Normal distribution is true, the t test is preferred because it is the most powerful test, and so most likely to detect differences should they exist.
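The sign test probability comes from the Binomial distribution with n = 12 and p = 0.5. A minimal Python sketch follows; the function name `sign_test_two_sided` is hypothetical, and doubling the smaller tail is one common convention for the two sided probability.

```python
from math import comb

def sign_test_two_sided(n_positive, n_negative):
    """Two sided sign test: zero differences are dropped before calling.

    Doubles the Binomial(n, 0.5) probability of a count as small as, or
    smaller than, the less frequent sign, capped at 1.
    """
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Table 10.2 differences: 5 positive, 7 negative, 4 zero (ignored)
p = sign_test_two_sided(5, 7)
print(f"two sided sign test P = {p:.2f}")
```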
The validity of the paired t method described above depends on the assumption that the differences are from a Normal distribution. We can check the assumption of a Normal distribution by a Normal plot (§7.5). Figure 10.3 shows a Normal plot for the differences. The points lie close to the expected line, suggesting that there is little deviation from the Normal.
Another plot which is a useful check here is the difference against the subject mean (Figure 10.3). If the difference depends on magnitude, then we should be careful of drawing any conclusion about the mean difference. We may want to investigate this further, perhaps by transforming the data (§10.4). In this case the difference between the two feet does not appear to be related to the level of capillary density and we need not be concerned about this.
The differences may look like a fairly good fit to the Normal even when the measurements themselves do not. There are two reasons for this: the subtraction removes variability between subjects, leaving the measurement error which is more likely to be Normal, and the two measurement errors are then added by the differencing, producing the tendency of sums to the Normal seen in the Central Limit theorem (§7.3). The assumption of a Normal distribution for the one sample case is quite likely to be met. I discuss this further in §10.5.
10.3 The means of two independent samples
Suppose we have two samples from populations which have a Normal distribution, with which we want to estimate the difference between the population means. If the samples were large, the 95% confidence interval for the difference would be the observed difference - 1.96 standard errors to observed difference + 1.96 standard errors. Unfortunately, we cannot simply replace 1.96 by a number from Table 10.1. This is because the standard error does not have the simple form described in §10.1. It is not based on a single sum of squares, but rather is the square root of the sum of two constants multiplied by two sums of squares. Hence, it does not follow the square root of the Chi-squared distribution as required for the denominator of a t distributed random variable (§7A). In order to use the t distribution we must make a further assumption about the data. Not only must the samples be from Normal distributions, they must be from Normal distributions with the same variance. This is not as unreasonable an assumption as it may sound. A difference in mean but not in variability is a common phenomenon. The PEFR data for children with and without symptoms analysed in §8.5 and §9.6 show the characteristic very clearly, as do the average capillary densities in Table 10.2.
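We now estimate the common variance, $s^2$. First we find the sum of squares about the sample mean for each sample, which we can label $SS_1$ and $SS_2$. We form a combined sum of squares by $SS_1 + SS_2$. The sum of squares for the first group, $SS_1$, has $n_1 - 1$ degrees of freedom and the second, $SS_2$, has $n_2 - 1$ degrees of freedom. The total degrees of freedom is therefore $n_1 - 1 + n_2 - 1 = n_1 + n_2 - 2$. We have lost 2 degrees of freedom because we have a sum of squares about two means, each estimated from the data. The combined estimate of variance is

$$s^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2}$$

The standard error of $\bar{x}_1 - \bar{x}_2$ is

$$\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}} = \sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Now we have a standard error related to the square root of the Chi-squared distribution and we can get a t distributed variable by

$$\frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

having $n_1 + n_2 - 2$ degrees of freedom. The 95% confidence interval for the difference between population means, $\mu_1 - \mu_2$, is

$$\bar{x}_1 - \bar{x}_2 - t\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \quad\text{to}\quad \bar{x}_1 - \bar{x}_2 + t\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where t is the 0.05 point with $n_1 + n_2 - 2$ degrees of freedom from Table 10.1. Alternatively, we can test the null hypothesis that in the population the difference is zero, i.e. that $\mu_1 = \mu_2$, using the test statistic

$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

which would follow the t distribution with $n_1 + n_2 - 2$ d.f. if the null hypothesis were true.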
Fig. 10.4. Scatter plot against group and Normal plot for the patient averages of Table 10.2
For a practical example, Table 10.2 shows the average capillary density over both feet (if present) for normal control subjects as well as ulcer patients. We shall estimate the difference between the ulcerated patients and controls. We can check the assumptions of Normal distribution and uniform variance. From Table 10.2 the variances appear remarkably similar, 53.12 and 53.47. Figure 10.4 shows that there appears to be a shift of mean only. The Normal plot combines the groups by taking the differences between each observation and its group mean, called the residuals. This has a slight kink at the end but no pronounced curve, suggesting that there is little deviation from the Normal. I therefore feel quite happy that the assumptions of the two-sample t method are met.
First we find the common variance estimate, s². The sums of squares about the two sample means are 956.13 and 1176.32. This gives the combined sum of squares about the sample means to be 956.13 + 1176.32 = 2132.45. The combined degrees of freedom are n1 + n2 - 2 = 19 + 23 - 2 = 40. Hence s² = 2132.45/40 = 53.31. The standard error of the difference between means is

$$\sqrt{53.31\left(\frac{1}{19} + \frac{1}{23}\right)} = 2.26$$

The value of the t distribution for the 95% confidence interval is found from the 0.05 column and 40 degrees of freedom row of Table 10.1, giving $t_{0.05} = 2.02$. The difference between means (control - ulcerated) is 34.08 - 22.59 = 11.49. Hence the 95% confidence interval is 11.49 - 2.02 × 2.26 to 11.49 + 2.02 × 2.26, giving 6.92 to 16.06 capillaries/mm². Hence there is clearly a difference in capillary density between normal controls and ulcerated patients.
To test the null hypothesis that in the population the control - ulcerated difference is zero, the test statistic is difference over standard error, 11.49/2.26 = 5.08. If the null hypothesis were true, this would be an observation from the t distribution with 40 degrees of freedom. From Table 10.1, the probability of such an extreme value is less than 0.001. Hence the data are not consistent with the null hypothesis and we can conclude that there is strong evidence of a difference in the populations which these patients represent.
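The arithmetic of this example can be retraced in a few lines. The sketch below is illustrative only: the function name `pooled_two_sample` and its argument names are assumptions of mine, and the t value 2.02 is simply the Table 10.1 figure quoted above rather than something the code computes. This passage does not say which group has 19 subjects and which has 23, but the pairing does not change any of the results here.

```python
from math import sqrt


def pooled_two_sample(mean_a, mean_b, ss_a, ss_b, n_a, n_b, t_crit):
    """Two-sample t method from summary figures (illustrative helper)."""
    df = n_a + n_b - 2                       # combined degrees of freedom
    s2 = (ss_a + ss_b) / df                  # common variance estimate
    se = sqrt(s2 * (1 / n_a + 1 / n_b))      # standard error of the difference
    diff = mean_a - mean_b
    return {
        "df": df,
        "s2": s2,
        "se": se,
        "diff": diff,
        "ci": (diff - t_crit * se, diff + t_crit * se),
        "t": diff / se,
    }


# Means, sums of squares and sample sizes as quoted in the text;
# t_crit = 2.02 is the 0.05 point for 40 d.f. quoted from Table 10.1.
result = pooled_two_sample(34.08, 22.59, 956.13, 1176.32, 19, 23, t_crit=2.02)
print(result)
```

Because the code carries the unrounded standard error through, the interval and t statistic agree with the text only to the two decimal places printed there.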
10.4 The use of transformations
We have already seen (§7.4) that some variables which do not follow a Normal distribution can be made so by a suitable transformation. The same transformations can be used to make the variance similar in different groups; these are called variance stabilizing transformations. Because mean and variance in samples from the same population are independent if and only if the distribution is Normal (§7A), stable variances and Normal distributions tend to go together.
Often standard deviation and mean are connected by a simple relationship of the form $s = a\bar{x}^b$, where a and b are constants. If this is so, it can be shown that the variance will be stabilized by raising the observations to the power 1 - b, unless b = 1, when we use the log. (I shall resist the temptation to prove this, though I can. Any book on mathematical statistics will do it.) Thus, if the standard deviation is proportional to the square root of the mean (i.e. variance proportional to mean), e.g. Poisson variance (§6.7), b = 0.5, 1 - b = 0.5, and we use a square root transformation. If the standard deviation is proportional to the mean we log. If the standard deviation is proportional to the square of the mean we have b = 2, 1 - b = -1, and we use the reciprocal. Another, rarely seen transformation is used when observations are Binomial proportions. Here the standard deviation increases as the proportion goes from 0.0 to 0.5, then decreases as the proportion goes from 0.5 to 1.0. This is the arcsine square root transformation. Whether it works depends on how much other variation there is. It has now been largely superseded by logistic regression (§17.8).
Table 10.3. Biceps skinfold thickness (mm) in two groups of patients
Crohn's disease           Coeliac disease
1.8   2.8   4.2   6.2     1.8   3.8
2.2   3.2   4.4   6.6     2.0   4.2
2.4   3.6   4.8   7.0     2.0   5.4
2.5   3.8   5.6   10.0    2.0   7.6
2.8   4.0   6.0   10.4    3.0
Fig. 10.5. Scatter plot, histogram, and Normal plot for the biceps skinfold data
When we have several groups we can plot log(s) against log($\bar{x}$) then draw a line through the points. The slope of the line is b (see Healy 1968). Trial and error, however, combined with scatter plots, histograms, and Normal plots, usually suffice.
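As a rough illustration of the slope method, the sketch below fits a least squares line to log(s) against log(mean) and then applies the power 1 - b rule described earlier. The group means and standard deviations are made-up numbers chosen so that the standard deviation is proportional to the mean; they are not data from the book, and the function names and the tolerance used to decide that b is "close to 1" are arbitrary assumptions.

```python
from math import log


def estimate_b(means, sds):
    """Least squares slope of log(sd) on log(mean) (illustrative helper)."""
    xs = [log(m) for m in means]
    ys = [log(s) for s in sds]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def suggest_transformation(b, tol=0.1):
    """Apply the rule: power 1 - b, or the log when b is about 1."""
    if abs(b - 1.0) <= tol:
        return "log"
    return f"raise to the power {1.0 - b:g}"


# Hypothetical group summaries: sd is always half the mean, so b should be 1.
group_means = [2.0, 4.0, 8.0, 16.0]
group_sds = [1.0, 2.0, 4.0, 8.0]

b = estimate_b(group_means, group_sds)
print(suggest_transformation(b))    # log
print(suggest_transformation(0.5))  # power 0.5, i.e. square root
print(suggest_transformation(2.0))  # power -1, i.e. reciprocal
```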
Table 10.3 shows some data from a study of anthropometry and diagnosis in patients with intestinal disease (Maugdal et al. 1985). We were interested in differences in anthropometrical measurements between patients with different diagnoses, and here we have the biceps skinfold measurements for 20 patients with Crohn's disease and 9 patients with coeliac disease. The data have been put into order of magnitude and it is fairly obvious that the distribution is skewed to the right. Figure 10.5 shows this clearly. I have subtracted the group mean from each observation, giving what is called the within-group residuals, and then found both the frequency distribution and Normal plot. The distribution is clearly skew, and this is reflected in the Normal plot, which shows a pronounced curvature.
Fig. 10.6. Scatter plot, histogram, and Normal plot for the biceps skinfold data, after square root, log, and reciprocal transformations
We need a Normalizing transformation, if one can be found. The usual best guesses are square root, log, and reciprocal, with the log being the most likely to succeed. Figure 10.6 shows the scatter plot, histogram, and Normal plot for the residuals after transformation. (These logarithms are natural, to base e, rather than to base 10. It makes no difference to the final result and the calculations are the same to the computer.) The fit to the Normal distribution is not perfect, but for each transformation is much better than in Figure 10.5. The log looks the best for the equality of variance and the Normal distribution. We could use the two-sample t method on these data quite happily.
Table 10.4 shows the results of the two sample t method used with the raw, untransformed data and with each transformation. The t test statistic increases and its associated probability decreases as we move closer to a Normal distribution, reflecting the increasing power of the t test as its assumptions are more closely met. Table 10.4 also shows the ratio of the variances in the two samples. We can see that, as the transformed data gets closer to a Normal distribution, the variances tend to become more equal also.
The transformed data clearly gives a better test of significance than the raw data. The confidence intervals for the transformed data are more difficult to interpret, however, so the gain here is not so apparent. The confidence limits for the difference cannot be transformed back to the original scale. If we try it, the square root and reciprocal limits give ludicrous results. The log gives interpretable results (0.89 to 2.03) but these are not limits for the difference in millimetres. How could they be, for they do not contain zero yet the difference is not significant? They are in fact the 95% confidence limits for the ratio of the Crohn's disease geometric mean to the coeliac disease geometric mean (§7.4). If there were no difference, of course, the expected value of this ratio would be one, not zero, and so lies within the limits. The reason is that when we take the difference between the logarithms of two numbers, we get the logarithm of their ratio, not of their difference (§5A).
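The back-transformation of the log limits is a one-line calculation, sketched here with the limits for the logarithm row of Table 10.4 (-0.114 to 0.706 on the natural log scale). The variable names are mine; the point is only that the antilog of a difference between logs is a ratio, so the no-difference value to look for inside the interval is 1 rather than 0.

```python
from math import exp

# 95% confidence limits for the difference between mean log skinfolds
# (natural logs), as given for the logarithm row of Table 10.4.
log_lower, log_upper = -0.114, 0.706

# Antilogs give limits for the ratio of geometric means, not a difference in mm.
ratio_lower = exp(log_lower)
ratio_upper = exp(log_upper)

print(round(ratio_lower, 2), round(ratio_upper, 2))  # 0.89 2.03
print(ratio_lower < 1.0 < ratio_upper)               # True: a ratio of 1 is inside
```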
Table 10.4. Biceps skinfold thickness compared for two groups of patients, using different transformations
Transformation    Two-sample t test, 27 d.f.    95% confidence interval for           Variance ratio,
                  t        P                    difference on transformed scale       larger/smaller
None, raw data    1.28     0.21                 -0.71 to 3.07 mm                      1.52
Square root       1.38     0.18                 -0.140 to 0.714                       1.16
Logarithm         1.48     0.15                 -0.114 to 0.706                       1.10
Reciprocal        -1.65    0.11                 -0.203 to 0.022                       1.63
Because the log transformation is the only one which gives useful confidence intervals, I would use it unless it were clearly inadequate for the data, and another transformation clearly superior. When this happens we are reduced to a significance test only, with no meaningful estimate.
10.5 Deviations from the assumptions of t methods
The methods described in this chapter depend on some strong assumptions about the distributions from which the data come. This often worries users of statistical methods, who feel that these assumptions must limit greatly the use of t distribution methods and find the attitude of many statisticians, who often use methods based on Normal assumptions almost as a matter of course, rather sanguine. We shall look at some consequences of deviations from the assumptions.
First we shall consider a non-Normal distribution. As we have seen, some variables conform very closely to the Normal distribution, others do not. Deviations occur in two main ways: grouping and skewness. Grouping occurs when a continuous variable, such as human height, is measured in units which are fairly large relative to the range. This happens, for example, if we measure human height to the nearest inch. The heights in Figure 10.2 were to the nearest inch, and the fit to the t distribution is very good. This was a very coarse grouping, as the standard deviation of heights was 2.5 inches and so 95% of the 3000 observations had values over a range of 10 inches, only 10 or 11 possible values in all. We can see from this that if the underlying distribution is Normal, rounding the measurement is not going to affect the application of the t distribution by much.
The other assumption of the two-sample t method is that the variances in the two populations are the same. If this is not correct, the t distribution will not necessarily apply. The effect is usually small if the two populations are from a Normal distribution. This situation is unusual because, for samples from the same population, mean and variance are independent if the distribution is Normal (§7A). There is an approximate t method, as we noted in §10.3. However, unequal variance is more often associated with skewness in the data, in which case a transformation designed to correct one fault often tends to correct the other as well.
Both the paired and two-sample t methods are robust to most deviations from the assumptions. Only large deviations are going to have much effect on these methods. The main problem is with skewed data in the one-sample method, but for reasons given in §10.2, the paired test will usually provide differences with a reasonable distribution. If the data do appear to be non-Normal, then a Normalizing transformation will improve matters. If this does not work, then we must turn to methods which do not require these assumptions (§9.2, §12.2, §12.3).
10.6 What is a large sample?
In this chapter we have looked at small sample versions of the large sample methods of §8.5 and §9.7. There we ignored both the distribution of the variable and the variability of s², on the grounds that they did not matter provided the samples were large. How small can a large sample be? This question is critical to the validity of these methods, but seldom seems to be discussed in textbooks.
Provided the assumptions of the t test apply, the question is easy enough to answer. Inspection of Table 10.1 will show that for 30 degrees of freedom the 5% point is 2.04, which is so close to the Normal value of 1.96 that it makes little difference which is used. So for Normal data with uniform variance we can forget the t distribution when we have more than 30 observations.
When the data are not in this happy state, things are not so simple. If the t method is not valid, we cannot assume that a large sample method which approximates to it will be valid. I recommend the following rough guide. First, if in doubt, treat the sample as small. Second, transform to a Normal distribution if possible. In the paired case you should transform before subtraction. Third, the more non-Normal the data, the larger the sample needs to be before we can ignore errors in the Normal approximation.
Table 10.5. Blood zidovudine levels at times after administration of the drug by presence of fat malabsorption
Time since administration of zidovudine
0 15 30 45 60 90 120 150
Malabsorption patients:
0.08 13.15 5.70 3.22 2.69 1.91 1.72 1.22
0.08 0.08 0.14 2.10 6.37 4.89 2.11 1.40
0.08 0.08 3.29 3.47 1.42 1.61 1.41 1.09
0.08 0.08 1.33 1.71 3.30 1.81 1.16 0.69
0.08 6.69 8.27 5.02 3.98 1.90 1.24 1.01
0.08 4.28 4.92 1.22 1.17 0.88 0.34 0.24
0.08 0.13 9.29 6.03 3.65 2.32 1.25 1.02
0.08 0.64 1.19 1.65 2.37 2.07 2.54 1.34
0.08 2.39 3.53 6.28 2.61 2.29 2.23 1.97
Normal absorption patients:
0.08 3.72 16.02 8.17 5.21 4.84 2.12 1.50
0.08 6.72 5.48 4.84 2.30 1.95 1.46 1.49
0.08 9.98 7.28 3.46 2.42 1.69 0.70 0.76
0.08 1.12 7.27 3.77 2.97 1.78 1.27 0.99
0.08 13.37 17.61 3.90 5.53 7.17 5.16 3.84
There is no simple answer to the question: 'how large is a large sample?'. We should be reasonably safe with inferences about means if the sample is greater than 100 for a single sample, or if both samples are greater than 50 for two samples. The application of statistical methods is a matter of judgement as well as knowledge.
10.7* Serial data
Table 10.5 shows levels of zidovudine (AZT) in the blood of AIDS patients at several times after administration of the drug, for patients with normal fat absorption or fat malabsorption. A line graph of these data was shown in Figure 5.6. One common approach to such data is to carry out a two-sample t test at each time separately, and researchers often ask at what time the difference becomes significant. This is a misleading question, as significance is a property of the sample rather than the population. The difference at 15 min may not be significant because the sample is small and the difference to be detected is small, not because there is no difference in the population. Further, if we do this for each time point we are carrying out multiple significance tests (§9.10) and each test only uses a small part of the data so we are losing power (§9.9). It is better to ask whether there is any evidence of a difference between the response of normal and malabsorption subjects over the whole period of observation.
The simplest approach is to reduce the data for a subject to one number. We can use the highest value attained by the subject, the time at which this peak value was reached, or the area under the curve. The first two are self-explanatory.
The area under the curve (AUC) is found by drawing a line through all the points and finding the area between it and the horizontal axis. The 'curve' is usually formed by a series of straight lines found by joining all the points for the subject, and Figure 10.7 shows this for the first subject in Table 10.5. The area under the curve can be calculated by taking each straight line segment and calculating the area under this. This is the base multiplied by the average of the two vertical heights. We calculate this for each line segment, i.e. between each pair of adjacent time points, and add. Thus for the first subject we get (15 - 0) × (0.08 + 13.15)/2 + (30 - 15) × (13.15 + 5.70)/2 + … + (360 - 300) × (0.43 + 0.32)/2 = 667.425. This can be done fairly easily by most statistical computer packages. The area for each subject is shown in Table 10.6.
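The segment-by-segment rule can be sketched as a small function. The name `area_under_curve` is my own choice. The example uses only the readings for the first subject that appear in Table 10.5 as reproduced here, which stop at 150 minutes; the worked figure of 667.425 in the text runs on to 360 minutes, so the partial area printed below is expected to be smaller and is not meant to reproduce it.

```python
def area_under_curve(times, values):
    """Area under straight lines joining the points (illustrative helper)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    total = 0.0
    # Take each pair of adjacent time points in turn.
    for i in range(len(times) - 1):
        base = times[i + 1] - times[i]
        average_height = (values[i] + values[i + 1]) / 2
        total += base * average_height
    return total


# First malabsorption subject, times 0 to 150 only (as shown in Table 10.5 here).
times = [0, 15, 30, 45, 60, 90, 120, 150]
levels = [0.08, 13.15, 5.70, 3.22, 2.69, 1.91, 1.72, 1.22]

partial_auc = area_under_curve(times, levels)
print(round(partial_auc, 3))  # 519.375 for the first 150 minutes only
```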
Fig. 10.7. Calculation of the area under the curve for one subject
Table 10.6. Area under the curve for data of Table 10.5
Malabsorption patients    Normal patients
667.425    256.275        919.875
569.625    527.475        599.850
306.000    388.800        499.500
298.200    505.875        472.875
617.850                   1377.975
Fig. 10.8. Normal plots for area under the curve and log area for the data of Table 10.5
10.8* Comparing two variances by the F test
We can test the null hypothesis that two population variances are equal using the F distribution. Provided the data are from a Normal distribution, the ratio of two independent estimates of the same variance will follow an F distribution (§7A), the degrees of freedom being the degrees of freedom of the two estimates. The F distribution is defined as that of the ratio of two independent Chi-squared variables divided by their degrees of freedom:

$$F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n}$$

where m and n are the degrees of freedom (§7A). For Normal data the distribution of a sample variance $s^2$ from n observations is that of $\sigma^2\chi^2_{n-1}/(n-1)$ and when we divide one estimate of variance by another to give the F ratio, the $\sigma^2$ cancels out. Like other distributions derived from the Normal, the F distribution cannot be integrated and so we must use a table. Because it has two degrees of freedom, the table is cumbersome, covering several pages, and I shall omit it. Most F methods are done using computer programs which calculate the probability directly. The table is usually only given as the upper percentage points.
To test the null hypothesis, we divide the larger variance by the smaller. For the skinfold data of §10.4, the variances are 5.860 with 19 degrees of freedom for the Crohn's patients and 3.860 with 8 degrees of freedom for the coeliacs, giving F = 5.860/3.860 = 1.52. The probability of this being exceeded by the F distribution with 19 and 8 degrees of freedom is 0.3, the 5% point of the distribution being 3.16, so there is no evidence from these data that the variance of skinfold differs between patients with Crohn's disease and coeliac disease.
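Forming the ratio is simple, but it is easy to attach the degrees of freedom the wrong way round. The sketch below, with a function name of my own choosing, puts the larger variance on top and keeps its degrees of freedom as the numerator degrees of freedom. It does not compute the probability; it only compares the ratio with the 5% point of 3.16 quoted in the text.

```python
def variance_ratio(var_a, df_a, var_b, df_b):
    """Return (F, numerator d.f., denominator d.f.), larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a


# Skinfold variances from the text: coeliac 3.860 (8 d.f.), Crohn's 5.860 (19 d.f.).
f_ratio, df_num, df_den = variance_ratio(3.860, 8, 5.860, 19)

print(round(f_ratio, 2), df_num, df_den)  # 1.52 19 8
print(f_ratio < 3.16)  # True: below the 5% point quoted in the text
```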
Several variances can be compared by Bartlett's test or the Levene test (see Armitage and Berry 1994, Snedecor and Cochran 1980).
Table 10.7. Mannitol and lactulose gut permeability tests in a group of HIV patients and controls
HIV status  Diarrhoea  % mannitol  % lactulose  HIV status  Diarrhoea
AIDS Yes 14.9 1.17 ARC Yes
AIDS Yes 7.074 1.203 ARC No
AIDS Yes 5.693 1.008 ARC No
AIDS Yes 16.82 0.367 HIV+ No
AIDS Yes 4.93 1.13 HIV+ No
AIDS Yes 9.974 0.545 HIV+ No
AIDS Yes 2.069 0.14 HIV+ No
AIDS Yes 10.9 0.86 HIV+ No
AIDS Yes 6.28 0.08 HIV+ No
AIDS Yes 11.23 0.398 HIV+ No
AIDS No 13.95 0.6 HIV- No
AIDS No 12.455 0.4 HIV- No
AIDS No 10.45 0.18 HIV- No
AIDS No 8.36 0.189 HIV- No
AIDS No 7.423 0.175 HIV- No
AIDS No 2.657 0.039 HIV- No
AIDS No 19.95 1.43 HIV- No
AIDS No 15.17 0.2 HIV- No
AIDS No 12.59 0.25 HIV- No
AIDS No 21.8 1.15 HIV- No
AIDS No 11.5 0.36 HIV- No
AIDS No 10.5 0.33 HIV- No
AIDS No 15.22 0.29 HIV- No
AIDS No 17.71 0.47 HIV- No
AIDS Yes 7.256 0.252 HIV- No
AIDS No 17.75 0.47 HIV- No
ARC Yes 7.42 0.21 HIV- No
ARC Yes 9.174 0.399 HIV- No
ARC Yes 9.77 0.215 HIV- No
ARC No 22.03 0.651
10.9* Comparing several means using analysis of variance
Consider the data of Table 10.7. These are measures of gut permeability obtained from four groups of subjects, diagnosed with AIDS, AIDS related complex (ARC), asymptomatic HIV positive, and HIV negative controls. We want to investigate the differences between the groups.
One approach would be to use the t test to compare each pair of groups. This has disadvantages. First, there are many comparisons, m(m - 1)/2 where m is the number of groups. The more groups we have, the more likely it is that two of them will be far enough apart to produce a 'significant' difference when the null hypothesis is true and the population means are the same (§9.10). Second, when groups are small, there may not be many degrees of freedom for the estimate of variance. If we can use all the data to estimate variance we will have more degrees of freedom and hence a more powerful comparison. We can do this by analysis of variance, which compares the variation between the groups to the variation within the groups.
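To give a feel for the first disadvantage, the sketch below counts the pairwise comparisons for m groups and gives a rough chance of at least one 'significant' result at the 5% level when every null hypothesis is true. The second calculation assumes the tests are independent, which pairwise comparisons sharing groups are not, so it should be read only as an order-of-magnitude illustration. Both function names are my own.

```python
def n_pairwise_comparisons(m: int) -> int:
    """Number of pairs among m groups, m(m - 1)/2."""
    return m * (m - 1) // 2


def chance_of_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Rough chance of at least one false positive, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** n_tests


pairs = n_pairwise_comparisons(4)           # four groups, as in Table 10.7
chance = chance_of_any_false_positive(pairs)

print(pairs)             # 6
print(round(chance, 2))  # 0.26 under the independence assumption
```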
Table 10.8. Some artificial data to illustrate how analysis of variance works
Group 1 Group 2 Group 3 Group 4
6 4 7 3
7 5 9 5
8 6 10 6
8 6 11 6
9 6 11 6
11 8 13 8
Mean 8.167 5.833 10.167 5.667
To illustrate how the analysis of variance, or anova, works, I shall use some artificial data, as set out in Table 10.8. In practice, equal numbers in each group are unusual in medical applications. We start by estimating the common variance within the groups, just as we do in a two-sample t test (§10.3). We find the sum of squares about the group mean for each group and add them. We call this the within groups sum of squares. For Table 10.8 this gives 57.833. For each group we estimate the mean from the data, so we have estimated 4 parameters and have 24 - 4 = 20 degrees of freedom. In general, for m groups of size n each we have nm - m = m(n - 1) degrees of freedom. This gives us an estimate of variance of

$$s^2 = \frac{57.833}{20} = 2.892$$
This is the within groups variance or residual variance. There is an assumption here. For a common variance, we assume that the variances are the same in the four populations represented by the four groups.
We can also find an estimate of variance from the group means. The variance of the four group means is 4.562. If there were no difference between the means in the population from which the sample comes, this variance would be the variance of the sampling distribution of the mean of n observations, which is s²/n, the square of the standard error (§8.2). Thus n times this variance should be equal to the within groups variance. For the example, this is 4.562 × 6 = 27.375, which is much greater than the 2.892 found within the groups. We express this by the ratio of one variance estimate to the other, between groups over within groups, which we call the variance ratio or F ratio. If the null hypothesis is true and if the observations are from a Normal distribution with uniform variance, this ratio follows a known distribution, the F distribution with m - 1 and m(n - 1) degrees of freedom (§10.8).
For the example we would have 3 and 20 degrees of freedom and

$$F_{3,20} = \frac{27.375}{2.892} = 9.47$$

If the null hypothesis were true, the expected value of this ratio would be 1.0.
A large value gives us evidence of a difference between the means in the four populations. For the example we have a large value of 9.47 and the probability of getting a value as big as this if the null hypothesis were true would be 0.0004. Thus there is a significant difference between the four groups.
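The sums of squares and the F ratio for the artificial data of Table 10.8 can be retraced as follows. This is a minimal sketch using only the standard library; the function name `one_way_anova` and the dictionary keys are illustrative choices, and the probability of 0.0004 is not computed because it needs the F distribution.

```python
from statistics import mean

# Artificial data of Table 10.8, one list per group.
groups = [
    [6, 7, 8, 8, 9, 11],
    [4, 5, 6, 6, 6, 8],
    [7, 9, 10, 11, 11, 13],
    [3, 5, 6, 6, 6, 8],
]


def one_way_anova(groups):
    """Sums of squares, mean squares and F for a one-way layout (sketch)."""
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    # Within groups: squared deviations from each group's own mean.
    within_ss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Between groups: group size times squared deviation of the group mean.
    between_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    ms_between = between_ss / df_between
    ms_within = within_ss / df_within
    return {
        "between_ss": between_ss, "within_ss": within_ss,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


anova = one_way_anova(groups)
for name, value in anova.items():
    print(name, round(value, 3))
```

The between and within sums of squares add up to the total sum of squares about the grand mean, which gives a useful check on the arithmetic.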
Table 10.9. One-way analysis of variance for the data of Table 10.8
Source of variation   Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Total                 23                   139.958
Between groups        3                    82.125           27.375        9.47                 0.0004
Within groups         20                   57.833           2.892
Table 10.10. One-way analysis of variance for the mannitol data
Source of variation   Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Total                 58                   1559.036
Between groups        3                    49.012           16.337        0.6
Residual              55                   1510.024         27.455
We can set these calculations out in an analysis of variance table, as shown in Table 10.9. The sum of squares in the 'between groups' row is the sum of squares of the group means times n. We call this the between groups sum of squares. Notice that in the 'degrees of freedom' and 'sum of squares' columns the 'within groups' and 'between groups' rows add up to the total. The within groups sum of squares is also called the residual sum of squares, because it is what is left when the group effect is removed, or the error sum of squares, because it measures the random variation or error remaining when all systematic effects have been removed.
The sum of squares of the whole data, ignoring the groups, is called the total sum of squares. It is the sum of the between groups and within groups sum of squares.
Returning to the mannitol data, as so often happens the groups are of unequal size. The calculation of the between groups sum of squares becomes more complicated and we usually do it by subtracting the within groups sum of squares from the total sum of squares. Otherwise, the table is the same, as shown in Table 10.10. As these calculations are usually done by computer the extra complexity in calculation does not matter. Here there is no significant difference between the groups.
If we have only two groups, one-way analysis of variance is another way of doing a two-sample t test. For example, the analysis of variance table for the comparison of average capillary density (§10.3) is shown in Table 10.11. The probability is the same and the F ratio, 25.78, is the square of the t statistic, 5.08. The residual mean square is the common variance of the t test.
Table 10.11. One-way analysis of variance for the comparison of mean capillary density between ulcerated patients and controls, Table 10.2
Source of variation   Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Total                 41                   3506.57
Between groups        1                    1374.114         1374.114      25.78                <0.0001
Residual              40                   2132.458         53.311
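The link between the two-group analysis of variance and the two-sample t test can be checked from the mean squares in Table 10.11. The sketch below simply forms the F ratio and takes its square root; the variable names are mine, and small discrepancies in the third decimal place are due to rounding in the printed figures.

```python
from math import sqrt

# Mean squares from Table 10.11.
between_ms = 1374.114
residual_ms = 53.311

f_ratio = between_ms / residual_ms  # variance ratio with 1 and 40 d.f.
t_from_f = sqrt(f_ratio)            # should match the two-sample t statistic

print(round(f_ratio, 2))   # 25.78
print(round(t_from_f, 2))  # 5.08
```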
Fig. 10.9. Plots of the mannitol data, showing that the assumptions of Normal distribution and homoscedasticity are reasonable
10.10* Assumptions of the analysis of variance
There are two assumptions for analysis of variance: that data come from Normal distributions within the groups and that the variances of these distributions are the same. The technical term for uniformity of variance is homoscedasticity; lack of uniformity is heteroscedasticity. Heteroscedasticity can affect analyses of variance a lot and we try to guard against it.
We can examine these assumptions graphically. For mannitol (Figure 10.9) the scatter plot for the groups shows that the spread of data in each group is similar, suggesting that the assumption of uniform variance is met, the histogram looks Normal and Normal plot looks straight. This is not the case for the lactulose data, as Figure 10.10 shows. The variances are not uniform and the histogram and Normal plot suggest positive skewness. As is often the case, the group with the highest mean, AIDS, has the greatest spread. The square root transformation of the lactulose fits better, giving a good Normal distribution although the variability is not uniform. The log transform over-compensates for skewness, by producing skewness in the opposite direction, though the variances appear uniform. Either the square root or the logarithmic transformation would be better than the raw data. I picked the square root because the distribution looked better. Table 10.12 shows the analysis of variance for square root transformed lactulose.
There are also significance tests which we can apply for Normal distribution and homoscedasticity. I shall omit the details.
10.11* Comparison of means after analysis of variance
Concluding from Tables 10.9 and 10.12 that there is a significant difference between the means is rather unsatisfactory. We want to know which means differ from which. There are a number of ways of doing this, called multiple comparisons procedures. These are mostly designed to give only one type I error (§9.3) per 20 analyses when the null hypothesis is true, as opposed to doing t tests for each pair of groups, which gives one error per 20 comparisons when the null hypothesis is true. I shall not go into details, but look at a couple of examples. There are several tests which can be used when the numbers in each group are the same, Tukey's Honestly Significant Difference, the Newman-Keuls sequential procedure (both called Studentized range tests), Duncan's multiple range test, etc. The one you use will depend on which computer program you have. The results of the Newman-Keuls sequential procedure for the data of Table 10.8 are shown in Table 10.13. Group 1 is significantly different from groups 2 and 4, and group 3 from groups 2 and 4. At the 1% level, the only significant differences are between group 3 and groups 2 and 4.
Fig. 10.10. Plots of the lactulose data on the natural scale and after square root and log transformation
Table 10.12. One-way analysis of variance for the square root transformed lactulose data of Table 10.7
Source of variation   Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Total                 58                   3.25441
HIV status            3                    0.42870          0.14290       2.78                 0.0495
Residual              55                   2.82571          0.05138
For unequal-sized groups, the choice of multiple comparison procedures is more limited. Gabriel's test can be used with unequal-sized groups. For the root transformed lactulose data, the results of Gabriel's test are shown in Table 10.14. This shows that the AIDS subjects are significantly different from the asymptomatic HIV+ patients and from the HIV- controls. For the mannitol data, most multiple comparison procedures will give no significant differences because they are designed to give only one type I error per analysis of variance. When the F test is not significant, no group comparisons will be either.
Table 10.13. The Newman-Keuls test for the data of Table 10.8
0.05 level              0.01 level
Group   Group           Group   Group
        1   2   3               1   2   3
2       S               2       N
3       N   S           3       N   S
4       S   N   S       4       N   N   S
S = significant, N = not significant.
Table 10.14. Gabriel's test for the root transformed lactulose data
0.05 level                     0.01 level
Group   Group                  Group   Group
        AIDS   ARC   HIV+              AIDS   ARC   HIV+
ARC     N                      ARC     N
HIV+    S      N               HIV+    N      N
HIV-    S      N     N         HIV-    N      N     N
S = significant, N = not significant.
10.12* Random effects in analysis of variance
Although the technique is called analysis of variance, in §10.9 to §10.11 we have been using it for the comparison of means. In this section we shall look at another application, where we shall indeed use anova to look at variances. When we estimate and compare the means of groups representing different diagnoses, different treatments, etc., we call these fixed effects. In other applications, groups are members of a random sample from a larger population and, rather than estimate the mean of each group, we estimate the variance between them. We call these groups random effects.
Consider Table 10.15, which shows repeated measurements of pulse rate on a group of medical students. Each measurement was made by a different observer. Observations made repeatedly under the same circumstances are called replicates and here we have two replicates per subject. We can do a one way analysis of variance on these data, with subject as the grouping factor (Table 10.16).
The test of significance in Table 10.16 is redundant, because we know each pair of measurements is from a different person, and the null hypothesis that all pairs are from the same population is clearly false. What we can use this anova for is to estimate some variances. There are two different variances in the data. One is between measurements on the same person, the within-subject variance which we shall denote by $\sigma_w^2$. In this example the within subject variance is the measurement error, and we shall assume it is the same for everyone. The other is the variance between the subjects' true or average pulse rates, about which the individual measurements for a subject are distributed. This is the average of all possible measurements for that subject, not the average of the two measurements we actually have. This variance is the between-subjects variance and we shall denote it by $\sigma_b^2$. A single measurement observed from a single individual is the sum of the subject's true pulse rate and the measurement error. Such measurements therefore have variance $\sigma_b^2 + \sigma_w^2$. We can estimate both these variances from the anova table.
Table 10.15. Paired measurements of 30 second pulse in 45 medical students

          Pulse                 Pulse                 Pulse
Subject   A    B      Subject   A    B      Subject   A    B
1         46   42     16        34   36     31        43   43
2         50   42     17        30   36     32        30   29
3         39   37     18        35   45     33        31   36
4         40   54     19        32   34     34        43   43
5         41   46     20        44   46     35        38   43
6         35   35     21        39   42     36        31   37
7         31   44     22        34   37     37        45   43
8         43   35     23        36   38     38        39   43
9         47   45     24        33   34     39        48   48
10        48   36     25        34   35     40        40   40
11        32   46     26        51   48     41        46   45
12        36   34     27        31   30     42        44   42
13        37   30     28        30   31     43        36   34
14        34   36     29        42   43     44        33   28
15        38   36     30        39   35     45        39   42
Table 10.16. One-way analysis of variance for the 30 second pulse data of Table 10.15

Source of variation   Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Total                 89                   3054.99
Between subjects      44                   2408.49          54.74         3.81                 <0.0001
Within subjects       45                   646.50           14.37
For the simple example of the same number of replicates m on each of n subjects, the estimation of the variances is quite simple. We estimate $\sigma_w^2$ directly from the mean square within subjects, $\mathrm{MS}_w$, giving an estimate $s_w^2$. We can show (although I shall omit it) that the mean square between subjects, $\mathrm{MS}_b$, is an estimate of $m\sigma_b^2 + \sigma_w^2$. The variance ratio, $F = \mathrm{MS}_b/\mathrm{MS}_w$, will be expected to be 1.0 if $\sigma_b^2 = 0$, i.e. if the null hypothesis that all subjects are the same is true. We can estimate $\sigma_b^2$ by $s_b^2 = (\mathrm{MS}_b - \mathrm{MS}_w)/m$.
For the example, $s_w^2 = 14.37$ and $s_b^2 = (54.74 - 14.37)/2 = 20.19$. Thus the variability between measurements by different observers on the same subject is not much less than the variability of the underlying pulse rate between different subjects. The measurement (by these untrained and inexperienced observers) does not tell us much about the subjects. We shall see a practical application in the study of measurement error and observer variation in §15.2, and consider another aspect of this analysis, intraclass correlation, in §11.13.
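As a rough check on this arithmetic, the balanced-design estimates can be sketched in Python. The function name `variance_components` is an illustrative choice, not something from the text; the mean squares are those of Table 10.16, and the sketch assumes the same number of replicates m on every subject.

```python
def variance_components(ms_between, ms_within, m):
    """Estimate within- and between-subject variances for a balanced
    one-way random effects anova with m replicates per subject."""
    s2_w = ms_within                      # MSw estimates sigma_w^2 directly
    s2_b = (ms_between - ms_within) / m   # MSb estimates m*sigma_b^2 + sigma_w^2
    return s2_w, s2_b


# Mean squares from Table 10.16; two pulse measurements per student.
s2_w, s2_b = variance_components(54.74, 14.37, 2)
f_ratio = 54.74 / 14.37

print(f"s2_w = {s2_w:.2f}, s2_b = {s2_b:.2f}, F = {f_ratio:.2f}")
```

The two estimates come out at roughly 14.4 and 20.2, which is the comparison made in the text: the measurement error is not much smaller than the variation between subjects.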
Table 10.17. Number of X-ray requests conforming to the guidelines for each practice in the intervention and control groups (Oakeshott et al. 1994)

        Intervention group                        Control group
        Number of requests     Percentage         Number of requests
        Total    Conforming    conforming         Total    Conforming
        20       20            100                7        7
        7        7             100                37       33
        16       15            94                 38       32
        31       28            90                 28       23
        20       18            90                 20       16
        24       21            88                 19       15
        7        6             86                 9        7
        6        5             83                 25       19
        30       25            83                 120      90
        66       53            80                 89       64
        5        4             80                 22       15
        43       33            77                 76       52
        43       32            74                 21       14
        23       16            70                 127      83
        64       44            69                 22       14
        6        4             67                 34       21
        18       10            56                 10       4
Total   429      341                              704      509
Mean                           81.6
SD                             11.9
If we have different numbers of replicates per subject or other factors to consider (e.g. if each observer made two repeated measurements) the analysis becomes fiendishly complicated (see Searle et al. 1992, if you must). These estimates of variance deserve confidence intervals like any other estimate, but these are even more fiendishly complicated, as Burdick and Graybill (1992) convincingly demonstrate. I would recommend you consult a statistician experienced in these matters, if you can find one.
10.13* Units of analysis and cluster-randomized trials
A cluster-randomized study (§2.11) is one where a group of subjects, such as the patients in a hospital ward or a general practice list, are randomized to the same treatment together. The treatment might be applied to the patient directly, such as an offer of breast cancer screening to all eligible women in a district, or be applied to the care provider, such as treatment guidelines given to the GP. The design of the study must be taken into account in the analysis.
Table 10.17 shows an example (Oakeshott et al. 1994, Kerry and Bland 1998).
Fig. 10.11. Scatter plots and Normal plots for the data of Table 10.17, showing the effect of an arcsine square root transformation
In this study guidelines as to appropriate referral for X-ray were given to GPs in 17 practices and another 17 practices served as controls. We could say we have 341 out of 429 appropriate referrals in the treated group and 509 out of 704 in the control group and compare these proportions as in §8.6 and §9.8. This would be wrong, because to follow a Binomial distribution, all the referrals must be independent (§6.4). They are not, as the individual GP may have a profound effect on the decision to refer. Even where the practitioner is not directly involved, members of a cluster may be more similar to one another than they are to members of another cluster and so not be independent. Ignoring the clustering may result in confidence intervals which are too narrow and P values which are too small, producing spurious significant differences.
The easiest way to analyse the data from such studies is to make the experimental unit, that which is randomized (§2.11), the unit of analysis. We can construct a summary statistic for each cluster and then analyse these summary values. The idea is similar to the analysis of repeated measurements on the same subject, where we construct a single summary statistic over the times for each individual (§10.7). For Table 10.17, the practice's percentage of referrals which are appropriate is the summary statistic. The mean percentages in the two groups can then be compared by the two-sample t method. The observed difference is 81.6 - 73.6 = 8.0 and the standard error of the difference is 4.3. There are 32 degrees of freedom and, from Table 10.1, the 5% point of the t distribution is 2.04 (more precisely, 2.037). This gives a 95% confidence interval for the treatment difference of 8.0 ± 2.037 × 4.3, or -1 to 17 percentage points. For the test of significance, the test statistic is 8.0/4.3 = 1.86, P = 0.07.
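The interval and test statistic for the cluster-level comparison can be reproduced with a few lines of Python. This is only a sketch using the summary figures quoted in the text (the two mean percentages, the standard error 4.3 and the t value 2.037); the function name `summary_interval` is hypothetical.

```python
def summary_interval(diff, se_diff, t_crit):
    """Confidence interval and t statistic for a difference between
    the means of cluster-level summary statistics."""
    half_width = t_crit * se_diff
    return diff - half_width, diff + half_width, diff / se_diff


# Practice-level mean percentages 81.6 and 73.6, standard error of the
# difference 4.3, and t = 2.037 on 32 degrees of freedom (17 + 17 - 2).
lower, upper, t_stat = summary_interval(81.6 - 73.6, 4.3, 2.037)

print(f"95% CI: {lower:.1f} to {upper:.1f} percentage points, t = {t_stat:.2f}")
```

The degrees of freedom come from the 34 practices, not from the 1133 referrals, which is the point of making the practice the unit of analysis.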
In this example, each observation is a Binomial proportion, so we could consider an arcsine square root transformation of the proportions (§10.4). As Figure 10.11 shows, if anything the transformation makes the fit to the Normal distribution worse. This is reflected in a larger P value, giving P = 0.10.
There is a widely varying number of referrals between practices, which must reflect the list size and number of GPs in the practice. We can take this into account with an analysis which weights each observation by the number of referrals. Bland and Kerry (1998) give details.
Appendices
10A Appendix: The ratio mean/standard error
As if by magic, we have our sample mean over its standard error. I shall not bother to go into this detail for the other similar ratios which we shall encounter. Any quantity which follows a Normal distribution with mean zero (such as $\bar{x} - \mu$), divided by its standard error, will follow a t distribution provided the standard error is based on one sum of squares and hence is related to the Chi-squared distribution.
10M Multiple choice questions 50 to 56
(Each branch is either true or false)
50. The paired t test is:
(a) impractical for large samples;
(b) useful for the analysis of qualitative data;
(c) suitable for very small samples;
(d) used for independent samples;
(e) based on the Normal distribution.
51. Which of the following conditions must be met for a valid t test between the means of two samples:
(a) the numbers of observations must be the same in the two groups;
(b) the standard deviations must be approximately the same in the two groups;
(c) the means must be approximately equal in the two groups;
(d) the observations must be from approximately Normal distributions;
(e) the samples must be small.
52. In a two-sample clinical trial, one of the outcome measures was highly skewed. To test the difference between the levels of this measure in the two groups of patients, possible approaches include:
(a) a standard t test using the observations;
(b) a Normal approximation if the sample is large;
(c) transforming the data to a Normal distribution and using a t test;
(d) a sign test;
(e) the standard error of the difference between two proportions.
53. In the two-sample t test, deviation from the Normal distribution by the data may seriously affect the validity of the test if:
(a) the sample sizes are equal;
(b) the distribution followed by the data is highly skewed;
(c) one sample is much larger than the other;
(d) both samples are large;
(e) the data deviate from a Normal distribution because the measurement unit is large and only a few values are possible.
Table 10.18. Semen analyses for successful and unsuccessful sperm donors (Paraskevaides et al. 1991)

                          Successful donors        Unsuccessful donors
                          n     Mean     (sd)      n     Mean     (sd)
Volume (ml)               17    3.14     (1.28)    19    2.91     (0.91)
Semen count (10^6/ml)     18    146.4    (95.7)    19    124.8    (81.8)
% Motility                17    60.7     (9.7)     19    58.5     (12.8)
% Abnormal morphology     13    22.8     (8.4)     16    20.3     (8.5)

All differences not significant, t test.
54. Table 10.18 shows a comparison of successful (i.e. fertile) and unsuccessful artificial insemination donors. The authors concluded that ‘Conventional semen analysis may be too insensitive an indicator of high fertility [in AID]’:
(a) the table would be more informative if P values were given;
(b) the t test is important to the conclusion given;
(c) it is likely that semen count follows a Normal distribution;
(d) if the null hypothesis were true, the sampling distribution of the t test statistic for semen count would approximate to a t distribution;
(e) if the null hypothesis were false, the power of the t test for semen count could be increased by a log transformation.
55. If we take samples of size n from a Normal distribution and calculate the sample mean $\bar{x}$ and variance $s^2$:
(a) samples with large values of $\bar{x}$ will tend to have large $s^2$;
(b) the sampling distribution of $\bar{x}$ will be Normal;
(c) the sampling distribution of $s^2$ will be related to the Chi-squared distribution with n - 1 degrees of freedom;
(e) the sampling distribution of s will be approximately Normal if n > 20.
56. In the one-way analysis of variance table for the comparison of three groups:
(a) the group mean square + the error mean square = the total mean square;
(b) there are two degrees of freedom for groups;
(c) the group sum of squares + the error sum of squares = the total sum of squares;
(d) the numbers in each group must be equal;
(e) the group degrees of freedom + the error degrees of freedom = the total degrees of freedom.
10E Exercise: The paired t method
Table 10.19 shows the total static compliance of the respiratory system and the arterial oxygen tension (pa(O2)) in 16 patients in intensive care (Al-Saady, personal communication). The patients' breathing was assisted by a respirator and the question was whether their respiration could be improved by varying the characteristics of the air flow. Table 10.19 compares a constant inspiratory flow waveform with a decelerating inspiratory flow waveform. We shall examine the effect of waveform on compliance.
Table 10.19. pa(O2) and compliance for two inspiratory flow waveforms

           pa(O2) (kPa)                Compliance (ml/cm H2O)
           Waveform                    Waveform
Patient    Constant   Decelerating    Constant   Decelerating
1          9.1        10.8            65.4       72.9
2          5.6        5.9             73.7       94.4
3          6.7        7.2             37.4       43.3
4          8.1        7.9             26.3       29.0
5          16.2       17.0            65.0       66.4
6          11.5       11.6            35.2       36.4
7          7.9        8.4             24.7       27.7
8          7.2        10.0            23.0       27.5
9          17.7       22.3            133.2      178.2
10         10.5       11.1            38.4       39.3
11         9.5        11.1            29.2       31.8
12         13.7       11.7            28.3       26.9
13         9.7        9.0             46.6       45.0
14         10.5       9.9             61.5       58.2
15         6.9        6.3             25.7       25.7
16         18.1       13.9            48.7       42.3
1. Calculate the changes in compliance. Draw a stem and leaf plot (hint: you will need both a zero and a minus zero row).
2. As a check on the validity of the t method, plot the difference against the subject's mean compliance. Do they appear to be related?
3. Calculate the mean, variance, standard deviation and standard error of the mean for the compliance differences.
4. Even though the compliance differences are far from a Normal distribution, calculate the 95% confidence interval using the t distribution. We will compare this with that for transformed data.
5. Find the logarithms of the compliance and repeat steps 1 to 3. Do the assumptions of the t distribution method apply more closely?
6. Calculate the 95% confidence interval for the log difference and transform back to the original scale. What does this mean and how does it compare to that based on the untransformed data?
7. What can be concluded about the effect of inspiratory waveform on static compliance in intensive care patients?
11
Regression and correlation
11.1 Scatter diagrams
In this chapter I shall look at methods of analysing the relationship between two quantitative variables. Consider Table 11.1, which shows data collected by a group of medical students in a physiology class. Inspection of the data suggests that there may be some relationship between FEV1 and height. Before trying to quantify this relationship, we can plot the data and get an idea of its nature. The usual first plot is a scatter diagram (§5.6). Which variable we choose for which axis depends on our ideas as to the underlying relationship between them, as discussed below. Figure 11.1 shows the scatter diagram for FEV1 and height.
Inspection of Figure 11.1 suggests that FEV1 increases with height. The next step is to try to draw a line which best represents the relationship. The simplest line is a straight one; I shall consider more complicated relationships in Chapter 17.
The equation of a straight line relationship between variables x and y is y = a + bx, where a and b are constants. The first, a, is called the intercept. It is the value of y when x is 0. The second, b, is called the slope or gradient of the line. It is the increase in y corresponding to an increase of one unit in x. Their geometrical meaning is shown in Figure 11.2. We can find the values of a and b which best fit the data by regression analysis.
11.2 Regression
Regression is a method of estimating the numerical relationship between variables. For example, we would like to know what is the mean or expected FEV1 for students of a given height, and what increase in FEV1 is associated with a unit increase in height.
Table 11.1. FEV1 and height for 20 male medical students

Height (cm)   FEV1 (litres)    Height (cm)   FEV1 (litres)    Height (cm)   FEV1 (litres)
164.0         3.54             172.0         3.78             178.0         2.98
167.0         3.54             174.0         4.32             180.7         4.80
170.4         3.19             176.0         3.75             181.0         3.96
171.2         2.85             177.0         3.09             183.1         4.78
171.2         3.42             177.0         4.05             183.6         4.56
171.3         3.20             177.0         5.43             183.7         4.68
172.0         3.60             177.4         3.60
Fig. 11.1. Scatter diagram showing the relationship between FEV1 and height for a group of male medical students
Fig. 11.2. Coefficients of a straight line
The name ‘regression’ is due to Galton (1886), who developed the technique to investigate the relationship between the heights of children and of their parents. He observed that if we choose a group of parents of a given height, the mean height of their children will be closer to the mean height of the population than is the given height. In other words, tall parents tend to be taller than their children, short parents tend to be shorter. Galton termed this phenomenon ‘regression towards mediocrity’, meaning ‘going back towards the average’. It is now called regression towards the mean (§11.4). The method used to investigate it was called regression analysis and the name has stuck. However, in Galton's terminology there was ‘no regression’ if the relationship between the variables was such that one predicted the other exactly; in modern terminology there is no regression if the variables are not related at all.
In regression problems we are interested in how well one variable can be used to predict another. In the case of FEV1 and height, for example, we are concerned with estimating the mean FEV1 for a given height rather than mean height for given FEV1. We have two kinds of variables: the outcome variable which we are trying to predict, in this case FEV1, and the predictor or explanatory variable, in this case height. The predictor variable is often called the independent variable and the outcome variable is called the dependent variable. However, these terms have other meanings in probability (§6.2), so I shall not use them. If we denote the predictor variable by X and the outcome by Y, the relationship between them may be written as
$$Y = a + bX + E$$
where a and b are constants and E is a random variable with mean 0, called the error, which represents that part of the variability of Y which is not explained by the relationship with X. If the mean of E were not zero, we could make it so by changing a. We assume that E is independent of X.
11.3 The method of least squares
If the points all lay along a line and there was no random variation, it would be easy to draw a line on the scatter diagram. In Figure 11.1 this is not the case. There are many possible values of a and b which could represent the data and we need a criterion for choosing the best line. Figure 11.3 shows the deviation of a point from the line, the distance from the point to the line in the Y direction. The line will fit the data well if the deviations from it are small, and will fit badly if they are large. These deviations represent the error E, that part of the variable Y not explained by X. One solution to the problem of finding the best line is to choose that which leaves the minimum amount of the variability of Y unexplained, by making the variance of E a minimum. This will be achieved by making the sum of squares of the deviations about the line a minimum. This is called the method of least squares and the line found is the least squares line.
The method of least squares is the best method if the deviations from the line are Normally distributed with uniform variance along the line. This is likely to be the case, as the regression tends to remove from Y the variability between subjects and leave the measurement error, which is likely to be Normal. I shall deal with deviations from this assumption in §11.8.
Many users of statistics are puzzled by the minimization of variation in one direction only. Usually both variables are measured with some error and yet we seem to ignore the error in X. Why not minimize the perpendicular distances to the line rather than the vertical? There are two reasons for this. First, we are finding the best prediction of Y from the observed values of X, not from the ‘true’ values of X. The measurement error in both variables is one of the causes of deviations from the line, and is included in these deviations measured in the Y direction. Second, the line found by minimizing the perpendicular distances depends on the units in which the variables are measured. For the data of Table 11.1 the line found by this method is
FEV1 (litre) = -9.33 + 0.075 × height (cm)
If we measure height in metres instead of centimetres, we get
FEV1 (litre) = -34.70 + 22.0 × height (m)
Thus by this method the predicted FEV1 for a student of height 170 cm is 3.42 litres, but for a student of height 1.70 m it is 2.70 litres. This is clearly unsatisfactory and we will not consider this approach further.
Fig. 11.3. Deviations from the line in the y direction
Returning to Figure 11.3, the equation of the line which minimizes the sum of squared deviations from the line in the outcome variable is found quite easily (§11A). The solution is:
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
We then find the intercept a by
$$a = \bar{y} - b\bar{x}$$
The equation Y = a + bX is called the regression equation of Y on X, Y being the outcome variable and X the predictor. The gradient, b, is also called the regression coefficient. We shall calculate it for the data of Table 11.1. We have
$$\sum x_i = 3507.6, \quad \bar{x} = 175.38, \quad \sum y_i = 77.12, \quad \bar{y} = 3.856$$
$$\sum (x_i - \bar{x})^2 = 576.352, \quad \sum (y_i - \bar{y})^2 = 9.43868, \quad \sum (x_i - \bar{x})(y_i - \bar{y}) = 42.8744$$
We do not need the sum of squares for Y yet, but we shall later.
$$b = \frac{42.8744}{576.352} = 0.074389 \text{ litres/cm}, \qquad a = 3.856 - 0.074389 \times 175.38 = -9.19 \text{ litres}$$
Hence the regression equation of FEV1 on height is
FEV1 = -9.19 + 0.0744 × height
Figure 11.4 shows the line drawn on the scatter diagram.
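The least squares calculation can be sketched in Python directly from the 20 pairs of Table 11.1. The function name `least_squares` and the variable names are illustrative only; the slope is taken as the sum of products about the means divided by the sum of squares of X about its mean, and the intercept as the mean of Y minus the slope times the mean of X.

```python
# Heights (cm) and FEV1 (litres) for the 20 students of Table 11.1.
height = [164.0, 167.0, 170.4, 171.2, 171.2, 171.3, 172.0, 172.0, 174.0, 176.0,
          177.0, 177.0, 177.0, 177.4, 178.0, 180.7, 181.0, 183.1, 183.6, 183.7]
fev1 = [3.54, 3.54, 3.19, 2.85, 3.42, 3.20, 3.60, 3.78, 4.32, 3.75,
        3.09, 4.05, 5.43, 3.60, 2.98, 4.80, 3.96, 4.78, 4.56, 4.68]


def least_squares(x, y):
    """Return intercept a, slope b and the sums about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)                       # sum of squares for X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of products
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b, sxx, sxy


a, b, sxx, sxy = least_squares(height, fev1)
print(f"FEV1 = {a:.2f} + {b:.4f} x height")
```

Dividing every height by 100 before the call multiplies b by 100 and leaves a unchanged, which is one way to see that a change of units does not move the fitted line.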
The coefficients a and b have dimensions, depending on those of X and Y. If we change the units in which X and Y are measured we also change a and b, but we do not change the line. For example, if height is measured in metres we divide the $x_i$ by 100 and we find that b is multiplied by 100 to give b = 7.4389 litres/m. The line is
FEV1 (litres) = -9.19 + 7.44 × height (m)
This is exactly the same line on the scatter diagram.
Fig. 11.4. The regression of FEV1 on height
Fig. 11.5. The two regression lines for the data of Tables 11.1 and 10.15
11.4* The regression of X on Y
What happens if we change our choice of outcome and predictor variables? The regression equation of height on FEV1 is
height = 158 + 4.54 × FEV1
This is not the same line as the regression of FEV1 on height. For if we rearrange this equation by dividing each side by 4.54 we get
FEV1 = -34.8 + 0.220 × height
The slope of the regression of height on FEV1 is greater than that of FEV1 on height (Figure 11.5). In general, the slope of the regression of X on Y is greater than that of Y on X, when X is the horizontal axis. Only if all the points lie exactly on a straight line are the two equations the same.
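The two regression lines can be compared numerically with a short Python sketch. The sums of squares 576.352 and 9.43868 are those quoted for these data; the sum of products 42.8744 and the means 175.38 and 3.856 are assumptions in the sense that they are computed here from Table 11.1 rather than quoted in this section.

```python
# Sums about the means for the data of Table 11.1.
sxx = 576.352     # sum of squares for height
syy = 9.43868     # sum of squares for FEV1
sxy = 42.8744     # sum of products (computed from Table 11.1)
x_bar, y_bar = 175.38, 3.856

# Regression of FEV1 on height: slope in litres per cm.
slope_y_on_x = sxy / sxx

# Regression of height on FEV1: slope in cm per litre.
slope_x_on_y = sxy / syy
intercept_x_on_y = x_bar - slope_x_on_y * y_bar

# The second line rearranged so that height is on the horizontal axis.
rearranged_slope = 1 / slope_x_on_y
rearranged_intercept = -intercept_x_on_y / slope_x_on_y

print(f"FEV1 on height:          slope {slope_y_on_x:.4f}")
print(f"height on FEV1:          height = {intercept_x_on_y:.0f} + {slope_x_on_y:.2f} x FEV1")
print(f"height on FEV1, redrawn: slope {rearranged_slope:.3f}")
```

The redrawn slope is about three times the slope of FEV1 on height, which is why the two lines in Figure 11.5 cross at the point of means and diverge away from it.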
Figure 11.5 also shows the two 30 second pulse measurements of Table 10.15, with the lines representing the regression of the second measurement on the first and the first measurement on the second. The regression equations are 2nd pulse = 17.3 + 0.572 × 1st pulse and 1st pulse = 14.9 + 0.598 × 2nd pulse. Each regression coefficient is less than one. This means that for subjects with any given first pulse measurement, the predicted second pulse measurement will be closer to the mean than the first measurement, and for any given second pulse measurement, the predicted first measurement will be closer to the mean than the second measurement. This is regression towards the mean (§11.2). Regression towards the mean is a purely statistical phenomenon, produced by the selection of the given value of the predictor and the imperfect relationship between the variables. Regression towards the mean may manifest itself in many ways. For example, suppose we measure the blood pressure of an unselected group of people and then select subjects with high blood pressure, e.g. diastolic > 95 mmHg. If we then measure the selected group again, the mean diastolic pressure for the selected group will be less on the second occasion than on the first, without any intervention or treatment. The apparent fall is caused by the initial selection.
11.5 The standard error of the regression coefficient
In any estimation procedure, we want to know how reliable our estimates are. We do this by finding their standard errors and hence confidence intervals. We can also test hypotheses about the coefficients, for example, the null hypothesis that in the population the slope is zero and there is no linear relationship. The details are given in §11C. We first find the sum of squares of the deviations from the line, that is, the differences between the observed $y_i$ and the values predicted by the regression line. This is
$$\sum (y_i - \bar{y})^2 - b^2 \sum (x_i - \bar{x})^2$$
In order to estimate the variance we need the degrees of freedom with which to divide the sum of squares. We have estimated not one parameter from the data, as for the sum of squares about the mean (§4.6), but two, a and b. We lose two degrees of freedom, leaving us with n - 2. Hence the variance of Y about the line, called the residual variance, is
$$s^2 = \frac{1}{n-2}\left(\sum (y_i - \bar{y})^2 - b^2 \sum (x_i - \bar{x})^2\right)$$
If we are to estimate the variation about the line, we must assume that it is the same all the way along the line, i.e. that the variance is uniform. This is the same as for the two-sample t method (§10.3) and analysis of variance (§10.9). For the FEV1 data the sum of squares due to the regression is $0.074389^2 \times 576.352 = 3.18937$ and the sum of squares about the regression is 9.43868 - 3.18937 = 6.24931. There are 20 - 2 = 18 degrees of freedom, so the variance about the regression is $s^2 = 6.24931/18 = 0.34718$. The standard error of b is given by
$$SE(b) = \sqrt{\frac{s^2}{\sum (x_i - \bar{x})^2}} = \sqrt{\frac{0.34718}{576.352}} = 0.02454 \text{ litres/cm}$$
We have already assumed that the error E is Normally distributed, so b must be, too. The standard error is based on a single sum of squares, so b/SE(b) is an observation from the t distribution with n - 2 degrees of freedom (§10.1). We can find a 95% confidence interval for b by taking t standard errors on either side of the estimate. For the example, we have 18 degrees of freedom. From Table 10.1, the 5% point of the t distribution is 2.10, so the 95% confidence interval for b is 0.074389 - 2.10 × 0.02454 to 0.074389 + 2.10 × 0.02454 or 0.02 to 0.13 litres/cm. We can see that FEV1 and height are related, though the slope is not very well estimated.
We can also test the null hypothesis that, in the population, the slope = 0 against the alternative that the slope is not equal to 0, a relationship in either direction. The test statistic is b/SE(b) and if the null hypothesis is true this will be from a t distribution with n - 2 degrees of freedom. For the example,
$$t = \frac{b}{SE(b)} = \frac{0.074389}{0.02454} = 3.03$$
From Table 10.1 this has two-tailed probability of less than 0.01. The computer tells us that the probability is about 0.007. Hence the data are inconsistent with the null hypothesis and provide fairly good evidence that a relationship exists. If the sample were much larger, we could dispense with the t distribution and use the Standard Normal distribution in its place.
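The chain of calculations in this section can be followed in a short Python sketch. It starts from the figures quoted in the text (b = 0.074389, sums of squares 576.352 and 9.43868, n = 20, and the tabulated t value 2.10) and assumes the standard error of b is the square root of the residual variance divided by the sum of squares for X, which reproduces the 0.02454 used above. The variable names are illustrative.

```python
import math

n = 20
sxx = 576.352      # sum of squares about the mean for height
syy = 9.43868      # sum of squares about the mean for FEV1
b = 0.074389       # regression coefficient, litres/cm
t_crit = 2.10      # 5% point of t with 18 degrees of freedom (Table 10.1)

ss_regression = b ** 2 * sxx          # sum of squares due to the regression
ss_residual = syy - ss_regression     # sum of squares about the regression
s2 = ss_residual / (n - 2)            # residual variance
se_b = math.sqrt(s2 / sxx)            # standard error of the slope

ci_lower = b - t_crit * se_b
ci_upper = b + t_crit * se_b
t_stat = b / se_b                     # test of the null hypothesis slope = 0

print(f"s2 = {s2:.5f}, SE(b) = {se_b:.5f}")
print(f"95% CI for b: {ci_lower:.2f} to {ci_upper:.2f} litres/cm, t = {t_stat:.2f}")
```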
11.6* Using the regression line for prediction
We can use the regression equation to predict the mean or expected Y for any given value of X. This is called the regression estimate of Y. We can use this to say whether any individual has an observed Y greater or less than would be expected given X. For example, the predicted FEV1 for students with height 177 cm is -9.19 + 0.0744 × 177 = 3.98 litres. Three subjects had height 177 cm. The first had observed FEV1 of 5.43 litres, 1.45 litres above that expected. The second had a rather low FEV1 of 3.09 litres, 0.89 litres below expectation, while the third with an FEV1 of 4.05 litres was very close to that predicted. We can use this clinically to adjust a measured lung function for height and thus get a better idea of the patient's status. We would, of course, use a much larger sample to establish a precise estimate of the regression equation. We can also use a variant of the method (§17.1) to adjust FEV1 for height in comparing different groups, where we can both remove variation in FEV1 due to variation in height and allow for differences in mean height between the groups. We may wish to do this to compare patients with respiratory disease on different therapies, or to compare subjects exposed to different environmental factors, such as air pollution, cigarette smoking, etc.
Fig. 11.6. Confidence intervals for the regression estimate
As with all sample estimates, the regression estimate is subject to sampling variation. We estimate its precision by standard error and confidence interval in the usual way. The standard error of the expected Y for an observed value x is
$$SE(\text{expected } Y) = \sqrt{s^2\left(\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right)}$$
We need not go into the algebraic details of this. It is very similar to that in §11C. For x = 177 we have
$$SE = \sqrt{0.34718 \times \left(\frac{1}{20} + \frac{(177 - 175.38)^2}{576.352}\right)} = 0.138$$
This gives a 95% confidence interval of 3.98 - 2.10 × 0.138 to 3.98 + 2.10 × 0.138, giving from 3.69 to 4.27 litres. Here 3.98 is the estimate and 2.10 is the 5% point of the t distribution with n - 2 = 18 degrees of freedom.
The standard error is a minimum at $x = \bar{x}$, and increases as we move away from $\bar{x}$ in either direction. It can be useful to plot the standard error and 95% confidence interval about the line on the scatter diagram. Figure 11.6 shows this for the FEV1 data. Notice that the lines diverge considerably as we reach the extremes of the data. It is very dangerous to extrapolate beyond the data. Not only do the standard errors become very large, but we often have no reason to suppose that the straight line relationship would persist.
The intercept a, the predicted value of Y when X = 0, is a special case of this. Clearly, we cannot actually have a medical student of height zero and with FEV1 of -9.19 litres. Figure 11.6 also shows the confidence interval for the regression estimate with a much smaller scale, to show the intercept. The confidence interval is very wide at height = 0, and this does not take account of any breakdown in linearity.
We may wish to use the value of X for a subject to estimate that subject's individual value of Y, rather than the mean for all subjects with this X. The estimate is the same as the regression estimate, but the standard error is much greater:
$$SE(\text{individual } Y) = \sqrt{s^2\left(1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right)}$$
For a student with a height of 177 cm, the predicted FEV1 is 3.98 litres, with standard error 0.61 litres. Figure 11.7 shows the precision of the prediction of a further observation. As we might expect, the 95% confidence intervals include all but one of the 20 observations. This is only going to be a useful prediction when the residual variance $s^2$ is small.
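The two standard errors can be compared in a small Python sketch. It assumes the usual formulas, $s^2(1/n + (x - \bar{x})^2/\sum(x_i - \bar{x})^2)$ for the variance of the regression estimate and the same expression plus $s^2$ for an individual observation; these reproduce the 0.138 and 0.61 quoted in the text. The function name `prediction_se` is hypothetical.

```python
import math


def prediction_se(x, x_bar, sxx, s2, n, individual=False):
    """Standard error of the regression estimate at x, or of a single
    further observation at x when individual=True."""
    variance = s2 * (1 / n + (x - x_bar) ** 2 / sxx)
    if individual:
        variance += s2   # add the scatter of one observation about the line
    return math.sqrt(variance)


# Quantities for the FEV1 data quoted in the text.
n, x_bar, sxx, s2 = 20, 175.38, 576.352, 0.34718

se_mean = prediction_se(177, x_bar, sxx, s2, n)
se_individual = prediction_se(177, x_bar, sxx, s2, n, individual=True)

estimate = 3.98   # predicted FEV1 at 177 cm
t_crit = 2.10     # 5% point of t with 18 degrees of freedom
print(f"mean FEV1 at 177 cm: {estimate - t_crit * se_mean:.2f} to "
      f"{estimate + t_crit * se_mean:.2f} litres")
print(f"SE for an individual student: {se_individual:.2f} litres")
```

Calling `prediction_se` at heights further from 175.38 cm gives larger values, which is the widening of the bands in Figures 11.6 and 11.7.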
WecanalsousetheregressionequationofYonXtopredictXfromY.ThisismuchlessaccuratethanpredictingYfromX.Thestandarderrorsare
Forexample,ifweusetheregressionofheightonFEV1(Figure11.5)topredicttheFEV1ofanindividualstudentwithheight177cm,wegetapredictionof4.21litres,withstandarderror1.05litres.ThisisalmosttwicethestandarderrorobtainedfromtheregressionofFEV1onheight,0.61.OnlyifthereisnopossibilityofdeviationsinXfulfillingtheassumptionsofNormaldistributionanduniformvariance,andsonowayoffittingX=a+bY,shouldweconsiderpredictingXfromtheregressionofYonX.ThismighthappenifXisfixedinadvance,e.g.thedoseofadrug.
11.7* Analysis of residuals
It is often very useful to examine the residuals, the differences between the observed and predicted Y. This is best done graphically. We can assess the assumption of a Normal distribution by looking at the histogram or Normal plot (§7.5). Figure 11.8 shows these for the FEV1 data. The fit is quite good.
Figure 11.9 shows a plot of residuals against the predictor variable. This plot enables us to examine deviations from linearity. For example, if the true relationship were quadratic, so that Y increases more and more rapidly as X increases, we should see that the residuals are related to X. Large and small X would tend to have positive residuals whereas central values would have negative residuals. Figure 11.9 shows no relationship between the residuals and height, and the linear model seems to be an adequate fit to the data.
Fig. 11.7. Confidence interval for a further observation
Fig. 11.8. Distribution of residuals for the FEV1 data
Fig. 11.9. Residuals against height for the FEV1 data
Fig. 11.10. Data which do not meet the conditions of the method of least squares, before and after log transformation
Figure 11.9 shows something else, however. One point stands out as having a rather larger residual than the others. This may be an outlier, a point which may well come from a different population. It is often difficult to know what to do with such data. At least we have been warned to double check this point for transcription errors. It is all too easy to transpose adjoining digits when transferring data from one medium to another. This may have been the case here, as an FEV1 of 4.53, rather than the 5.43 recorded, would have been more in line with the rest of the data. If this happened at the point of recording, there is not much we can do about it. We could try to measure the subject again, or exclude him and see whether this makes any difference. I think that, on the whole, we should work with all the data unless there are very good reasons for not doing so. I have retained this case here.
11.8* Deviations from assumptions in regression
Both the appropriateness of the method of least squares and the use of the t distribution for confidence intervals and tests of significance depend on the assumption that the residuals are from a Normal distribution with uniform variance. This assumption is easily met, for the same reasons that it is in the paired t test (§10.2). The removal of the variation due to X tends to remove some of the variation between individuals, leaving the measurement error. Problems can arise, however, and it is always a good idea to plot the original scatter diagram and the residuals to check that there are no gross departures from the assumptions of the method. Not only does this help preserve the validity of the statistical method used, but it may also help us learn more about the structure of the data.
Figure 11.10 shows the relationship between gestational age and cord blood levels of AVP, the antidiuretic hormone, in a sample of male foetuses. The variability of the outcome variable AVP depends on the actual value of the variable, being larger for large values of AVP. The assumptions of the method of least squares do not apply. However, we can use a transformation as we did for the comparison of means in §10.4. Figure 11.10 also shows the data after AVP has been log transformed, together with the least squares line.
As in §10.4, the transformation is found by trial and error. The log transformation enables us to interpret the regression coefficient in a way which other transformations do not. I used logs to base 10 for this transformation and got the following regression equation:
$\log_{10}(\text{AVP}) = -0.651253 + 0.011771 \times \text{gestational age}$
This means that for every one day increase in gestational age, $\log_{10}(\text{AVP})$ increases by 0.011771. Adding 0.011771 to $\log_{10}(\text{AVP})$ multiplies AVP by $10^{0.011771} = 1.027$, the antilog of 0.011771. We can antilog the confidence limits for the slope to give the confidence interval for this factor.
It may be more convenient to report the increase per week or per month. These would be factors of $10^{0.011771 \times 7} = 1.209$ or $10^{0.011771 \times 30} = 2.255$ respectively. When the data are a random sample, it is often convenient to quote the slope calculated from logs as the effect of a difference of one standard deviation of the predictor. For gestational age the standard deviation is 61.16104 days, so the effect of a change of one SD is to multiply AVP by $10^{0.011771 \times 61.16104} = 5.247$, so a difference of one standard deviation is associated with a fivefold increase in AVP. Another approach is to look at the difference between two centiles, such as the 10th and the 90th. For gestational age these are 98 and 273 days, so the effect on AVP would be to multiply it by $10^{0.011771 \times (273 - 98)} = 114.796$. Thus the difference over this intercentile range is to raise AVP 115-fold.
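The antilog arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `factor` is a hypothetical helper, and the slope, standard deviation and centiles are the values quoted in the text.

```python
# Slope of log10(AVP) on gestational age in days, as quoted in the text.
slope = 0.011771

def factor(slope_log10, change):
    """Factor by which the outcome is multiplied when the predictor rises by `change`.

    Hypothetical helper: a slope on the log10 scale becomes a ratio on the
    original scale by taking the antilog, 10 ** (slope * change).
    """
    return 10 ** (slope_log10 * change)

per_day = factor(slope, 1)
per_week = factor(slope, 7)
per_30_days = factor(slope, 30)
per_sd = factor(slope, 61.16104)        # one standard deviation of gestational age
centile_range = factor(slope, 273 - 98)  # 10th to 90th centile

print(round(per_day, 3), round(per_week, 3), round(per_30_days, 3),
      round(per_sd, 3), round(centile_range, 1))
```

The same function applied to the confidence limits of the slope would give the confidence interval for each factor.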
11.9 Correlation
The regression method tells us something about the nature of the relationship between two variables, how one changes with the other, but it does not tell us how close that relationship is. To do this we need a different coefficient, the correlation coefficient. The correlation coefficient is based on the sum of products about the mean of the two variables, so I shall start by considering the properties of the sum of products and why it is a good indicator of the closeness of the relationship.
Figure 11.11 shows the scatter diagram of Figure 11.1 with two new axes drawn through the mean point. The distances of the points from these axes represent the deviations from the mean. In the top right section of Figure 11.11, the deviations from the mean of both variables, FEV1 and height, are positive. Hence, their products will be positive. In the bottom left section, the deviations from the mean of the two variables will both be negative. Again, their product will be positive. In the top left section of Figure 11.11, the deviations of FEV1 from its mean will be positive, and the deviation of height from its mean will be negative. The product of these will be negative. In the bottom right section, the product will again be negative. So in Figure 11.11 nearly all these products will be positive, and their sum will be positive. We say that there is a positive correlation between the two variables; as one increases so does the other. If one variable decreased as the other increased, we would have a scatter diagram where most of the points lay in the top left and bottom right sections. In this case the sum of the products would be negative and there would be a negative correlation between the variables. When the two variables are not related, we have a scatter diagram with roughly the same number of points in each of the sections. In this case, there are as many positive as negative products, and the sum is zero. There is zero correlation or no correlation. The variables are said to be uncorrelated.
Fig. 11.11. Scatter diagram with axes through the mean point
The value of the sum of products depends on the units in which the two variables are measured. We can find a dimensionless coefficient if we divide the sum of products by the square roots of the sums of squares of X and Y. This gives us the product moment correlation coefficient, or the correlation coefficient for short, usually denoted by r.
If the n pairs of observations are denoted by $(x_i, y_i)$, then r is given by
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
For the FEV1 and height we have r = 0.58.
The effect of dividing the sum of products by the root sum of squares of deviations of each variable is to make the correlation coefficient lie between -1.0 and +1.0. When all the points lie exactly on a straight line such that Y increases as X increases, r = 1. This can be shown by putting $a + bx_i$ in place of $y_i$ in the equation for r; everything cancels out leaving r = 1. When all the points lie exactly on a straight line with negative slope, r = -1. When there is no relationship at all, r = 0, because the sum of products is zero. The correlation coefficient describes the closeness of the linear relationship between two variables. It does not matter which variable we take to be Y and which to be X. There is no choice of predictor and outcome variable, as there is in regression.
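A minimal sketch of the calculation, using made-up values rather than the FEV1 data, illustrates these properties. The function name `correlation` is an assumption for illustration.

```python
import math

def correlation(xs, ys):
    """Product moment correlation: sum of products / sqrt(SS of x * SS of y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # made-up values, not the FEV1 data
rising = [2 + 3 * x for x in xs]        # points exactly on y = a + bx with b > 0
falling = [2 - 3 * x for x in xs]       # points exactly on a line with b < 0

r_up = correlation(xs, rising)
r_down = correlation(xs, falling)
r_swapped = correlation(rising, xs)     # X and Y exchanged

print(r_up, r_down, r_swapped)
```

Exchanging the two arguments leaves the result unchanged, which is the symmetry between X and Y described above.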
Fig. 11.12. Data where the correlation coefficient may be misleading
The correlation coefficient measures how close the points are to a straight line. Even if there is a perfect mathematical relationship between X and Y, the correlation coefficient will not be exactly 1 unless this is of the form y = a + bx. For example, Figure 11.12 shows two variables which are perfectly related but have r = 0.86. Figure 11.12 also shows two variables which are clearly related but have zero correlation, because the relationship is not linear. This shows again the importance of plotting the data and not relying on summary statistics such as the correlation coefficient only. In practice, relationships like those of Figure 11.12 are rare in medical data, although the possibility is always there. More often, there is so much random variation that it is not easy to discern any relationship at all.
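The two situations can be imitated with invented numbers. The data below are not those of Figure 11.12; they are a hypothetical pair of exact quadratic relationships, chosen so that one gives r below 1 and the other gives r of zero.

```python
import math

def correlation(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Perfect but curved relationship: y = x squared over positive x only.
x_pos = list(range(1, 11))
r_curved = correlation(x_pos, [x ** 2 for x in x_pos])

# Perfect relationship, symmetric about x = 0: positive and negative products cancel.
x_sym = list(range(-3, 4))
r_symmetric = correlation(x_sym, [x ** 2 for x in x_sym])

print(r_curved, r_symmetric)
```

In both cases Y is determined exactly by X, yet neither coefficient is 1, so a plot is needed to see what is going on.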
The correlation coefficient r is related to the regression coefficient b in a simple way. If Y = a + bX is the regression of Y on X, and X = a′ + b′Y is the regression of X on Y, then $r^2 = bb'$. This arises from the formulae for r and b. For the FEV1 data, b = 0.074389 and b′ = 4.5424, so bb′ = 0.074389 × 4.5424 = 0.33790, the square root of which is 0.58129, the correlation coefficient. We also have
$$r^2 = \frac{(\text{sum of products about the mean})^2}{\text{sum of squares of } X \times \text{sum of squares of } Y} = \frac{\text{sum of squares due to regression}}{\text{total sum of squares of } Y}$$
This is the proportion of variability explained, described in §11.5.
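As a quick check of this identity, the sketch below first repeats the arithmetic with the two slopes quoted in the text, then confirms the relationship on a small invented data set (the `xs` and `ys` values are made up for illustration).

```python
import math

# Slopes quoted in the text for the FEV1 data.
b = 0.074389         # regression of FEV1 on height
b_prime = 4.5424     # regression of height on FEV1
r_squared = b * b_prime
r = math.sqrt(r_squared)   # r takes the sign of b, positive here

# The same identity on made-up data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)

toy_b = sp / ss_x                    # slope of y on x
toy_b_prime = sp / ss_y              # slope of x on y
toy_r = sp / math.sqrt(ss_x * ss_y)  # correlation coefficient

print(round(r_squared, 5), round(r, 5), toy_r ** 2, toy_b * toy_b_prime)
```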
Table 11.2. Two-sided 5% and 1% points of the distribution of the correlation coefficient, r, under the null hypothesis
n 5% 1% n 5% 1% n 5% 1%
3 1.00 1.00 16 0.50 0.62 29 0.37 0.47
4 0.95 0.99 17 0.48 0.61 30 0.36 0.46
5 0.88 0.96 18 0.47 0.59 40 0.31 0.40
6 0.81 0.92 19 0.46 0.58 50 0.28 0.36
7 0.75 0.87 20 0.44 0.56 60 0.25 0.33
8 0.71 0.83 21 0.43 0.55 70 0.24 0.31
9 0.67 0.80 22 0.42 0.54 80 0.22 0.29
10 0.63 0.77 23 0.41 0.53 90 0.21 0.27
11 0.60 0.74 24 0.40 0.52 100 0.20 0.25
12 0.58 0.71 25 0.40 0.51 200 0.14 0.18
13 0.55 0.68 26 0.39 0.50 500 0.09 0.12
14 0.53 0.66 27 0.38 0.49 1000 0.06 0.08
15 0.51 0.64 28 0.37 0.48
n = number of observations.
11.10 Significance test and confidence interval for r
Testing the null hypothesis that r = 0 in the population, i.e. that there is no linear relationship, is simple. The test is numerically equivalent to testing the null hypothesis that b = 0, and the test is valid provided at least one of the variables is from a Normal distribution. This condition is effectively the same as that for testing b, where the residuals in the Y direction must be Normal. If b = 0, the residuals in the Y direction are simply the deviations from the mean, and these will only be Normally distributed if Y is. If the condition is not met, we can use a transformation (§11.8), or one of the rank correlation methods (§12.4-5).
Because the correlation coefficient does not depend on the means or variances of the observations, the distribution of the sample correlation coefficient when the population coefficient is zero is easy to tabulate. Table 11.2 shows the correlation coefficient at the 5% and 1% level of significance. For the example we have r = 0.58 from 20 observations. The 1% point for 20 observations is 0.56, so we have P < 0.01, and the correlation is unlikely to have arisen if there were no linear relationship in the population. Note that the values of r which can arise by chance with small samples are quite high. With 10 points r would have to be greater than 0.63 to be significant. On the other hand with 1000 points very small values of r, as low as 0.06, will be significant.
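The tabulated points can be reproduced from the t distribution. The sketch below assumes the standard identity t = r√((n - 2)/(1 - r²)) on n - 2 degrees of freedom, which is not written out in the text, and takes the two-sided 5% and 1% points of t with 18 degrees of freedom (2.101 and 2.878) as given constants. The function names are hypothetical.

```python
import math

def critical_r(t_crit, n):
    """Smallest |r| that is significant, given the two-sided t point for n - 2 d.f."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

def t_statistic(r, n):
    """t statistic for testing zero population correlation (assumed identity)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

n = 20
r_5pc = critical_r(2.101, n)   # 2.101: two-sided 5% point of t, 18 d.f.
r_1pc = critical_r(2.878, n)   # 2.878: two-sided 1% point of t, 18 d.f.
t_obs = t_statistic(0.58, n)   # observed r = 0.58 from 20 observations

print(round(r_5pc, 2), round(r_1pc, 2), round(t_obs, 2))
```

The observed t exceeds the 1% point, in agreement with P < 0.01 read from Table 11.2.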
Finding a confidence interval for the correlation coefficient is more difficult. Even when X and Y are both Normally distributed, r does not itself approach a Normal distribution until the sample size is in the thousands. Furthermore, its distribution is rather sensitive to deviations from the Normal in X and Y. However, if both variables are from Normal distributions, Fisher's z transformation gives a Normally distributed variable whose mean and variance are known in terms of the population correlation coefficient which we wish to estimate. From this a confidence interval can be found. Fisher's z transformation is
$$z = \frac{1}{2}\log_e\left(\frac{1+r}{1-r}\right)$$
which follows a Normal distribution with mean
so for the lower limit we have
and for the upper limit
and the 95% confidence interval is 0.18 to 0.81. This is very wide, reflecting the sampling variation which the correlation coefficient has for small samples. Correlation coefficients must be treated with some caution when derived from small samples.
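The interval can be sketched in code under the usual large-sample assumptions, which are stated here as assumptions rather than taken from the text: z is treated as Normal with variance 1/(n - 3), and the small bias term in its mean is ignored. The function name `fisher_ci` is hypothetical.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z.

    Assumes z = 0.5 * log((1 + r) / (1 - r)) is Normal with standard error
    1 / sqrt(n - 3); the limits are transformed back with tanh, the inverse
    of the z transformation.
    """
    z = 0.5 * math.log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = fisher_ci(0.58, 20)   # r and n for the FEV1 example
print(round(lower, 3), round(upper, 3))
```

With r = 0.58 and n = 20 the limits come out close to the 0.18 to 0.81 quoted above; small differences in the second decimal place can arise from rounding of r.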
The ease of the significance test compared to the relative complexity of the confidence interval calculation has meant that in the past a significance test was usually given for the correlation coefficient. The increasing availability of computers with well-written statistical packages should lead to correlation coefficients appearing with confidence intervals in the future.
Table 11.3. Simulated data showing 10 pairs of measurements of two independent variables for four subjects
Subject 1 Subject 2 Subject 3 Subject 4
x y x y x y x
47 51 49 52 51 46 63
46 53 50 56 46 48 70
50 57 42 46 46 47 63
52 54 48 52 45 55 58
46 55 60 53 52 49 59
36 53 47 49 54 61 61
47 54 51 52 48 53 67
46 57 57 50 47 48 64
36 61 49 50 47 50 59
44 57 49 49 54 44 61
Means 45.0 55.2 50.2 50.9 49.0 50.1 62.5
r=-0.33 r=0.49 r=0.06 r=-0.39
P=0.35 P=0.15 P=0.86 P=0.27
11.11 Uses of the correlation coefficient
The correlation coefficient has several uses. Using Table 11.2, it provides a simple test of the null hypothesis that the variables are not linearly related, with less calculation than the regression method. It is also useful as a summary statistic for the strength of relationship between two variables. This is of great value when we are considering the interrelationships between a large number of variables. We can set up a square array of the correlations of each pair of variables, called the correlation matrix. Examination of the correlation matrix can be very instructive, but we must bear in mind the possibility of non-linear relationships. There is no substitute for plotting the data. The correlation matrix also provides the starting point for a number of methods for dealing with a large number of variables simultaneously.
Of course, for the reasons discussed in Chapter 3, the fact that two variables are correlated does not mean that one causes the other.
11.12* Using repeated observations
In clinical research we are often able to take several measurements on the same patient. We may want to investigate the relationship between two variables, and take pairs of readings with several pairs from each of several patients. The analysis of such data is quite complex. This is because the variability of measurements made on different subjects is usually much greater than the variability between measurements on the same subject, and we must take these two kinds of variability into account. What we must not do is to put all the data together, as if they were one sample.
Consider the simulated data of Table 11.3. The data were generated from random numbers, and there is no relationship between X and Y at all. First values of X and Y were generated for each ‘subject’, then a further random number was added to make the individual ‘observation’. For each subject separately, there was no significant correlation between X and Y. For the subject means, the correlation coefficient was r = 0.77, P = 0.23. However, if we put all 40 observations together we get r = 0.53, P = 0.0004. Even though the coefficient is smaller than that between subject means, because it is based on 40 pairs of observations rather than 4 it becomes significant. The data are plotted in Figure 11.13, with three other simulations. As the null hypothesis is always true in these simulated data, the population correlations for each ‘subject’ and for the means are zero. Because the numbers of observations are small, the sample correlations vary greatly. As Table 11.2 shows, large correlation coefficients can arise by chance in small samples. However, the overall correlation is ‘significant’ in three of the four simulations, though in different directions.
Fig. 11.13. Simulations of 10 pairs of observations on four subjects
We only have four subjects and only four points. By using the repeated data, we are not increasing the number of subjects, but the statistical calculation is done as if we have, and so the number of degrees of freedom for the significance test is incorrectly increased and a spurious significant correlation produced.
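A small constructed example may make the point concrete. It is not the simulation of Table 11.3: the subject levels and within-subject offsets below are invented so that the correlation within every subject is exactly zero, while the four subject means happen to be fairly well aligned.

```python
import math

def correlation(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Invented subject levels (x level, y level) for four hypothetical subjects.
levels = [(0.0, 0.0), (10.0, 20.0), (20.0, 10.0), (30.0, 30.0)]
# Invented within-subject offsets, balanced so that x and y are unrelated within a subject.
dx = [-1.0, 1.0, -1.0, 1.0]
dy = [-1.0, -1.0, 1.0, 1.0]

within_r, pooled_x, pooled_y = [], [], []
for x_level, y_level in levels:
    xs = [x_level + d for d in dx]
    ys = [y_level + d for d in dy]
    within_r.append(correlation(xs, ys))
    pooled_x += xs
    pooled_y += ys

means_r = correlation([lv[0] for lv in levels], [lv[1] for lv in levels])  # n = 4
pooled_r = correlation(pooled_x, pooled_y)                                 # n = 16

print(within_r, means_r, round(pooled_r, 3))
```

With only four subject means, Table 11.2 requires r of at least 0.95 for significance at the 5% level, so the correlation between means is not significant. The pooled coefficient is slightly smaller, yet treating the 16 pairs as independent would compare it with the 5% point for n = 16, which is 0.50, and declare it significant.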
There are two simple ways to approach this type of data, and which is chosen depends on the question being asked. If we want to know whether subjects with a high value of X tend to have a high value of Y also, we use the subject means and find the correlation between them. If we have different numbers of observations for each subject, we can use a weighted analysis, weighted by the number of observations for the subject. If we want to know whether changes in one variable in the same subject are parallelled by changes in the other, we need to use multiple regression, taking subjects out as a factor (§17.1, §17.6). In either case, we should not mix observations from different subjects indiscriminately.
Fig. 11.14. Scatter plots of the 30 second pulse data as in Table 10.15 and with half the pairs of observations reversed
11.13* Intraclass correlation
Sometimes we have pairs of observations where there is no obvious choice of X and Y. The data of Table 10.15 are a good example. Each subject has two measurements made by different observers, different pairs of observers being used for each subject. The choice of X and Y is arbitrary. Figure 11.14 shows the data as in Table 10.15 and with half the pairs arbitrarily reversed. The scatter plots look a little different and there is no good reason to choose one against the other. The correlation coefficients are a little different too: for the original order r = 0.5848 and for the second order r = 0.5804. These are very similar, of course, but which should we use?
It would be nice to have an average correlation coefficient across all the $2^{45}$ possible orderings. This is provided by the intraclass correlation coefficient or ICC. This can be found from the estimates of within subject variance, $s_w^2$, and between subjects variance, $s_b^2$, found from the analysis of variance in §10.12. We have:
$$\text{ICC} = \frac{s_b^2}{s_b^2 + s_w^2}$$
For the example, $s_w^2 = 14.37$ and $s_b^2 = 20.19$ (§10.12). Hence
$$\text{ICC} = \frac{20.19}{20.19 + 14.37} = 0.584$$
The ICC was originally developed for applications such as correlation between variables measured in pairs of twins (which twin is X and which is Y?). We do not have to have pairs of measurements to use the ICC. It works just as well for triplets or for any number of observations within the groups, not necessarily all the same.
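As a sketch of the calculation for the pulse example, the ICC can be taken as the between subjects variance divided by the sum of the between and within subject variances. The function name is hypothetical; the two variance estimates are those quoted from §10.12.

```python
def intraclass_correlation(s2_between, s2_within):
    """ICC as the share of total variance that lies between subjects."""
    return s2_between / (s2_between + s2_within)

s2_w = 14.37   # within subject variance, quoted from §10.12
s2_b = 20.19   # between subjects variance, quoted from §10.12
icc = intraclass_correlation(s2_b, s2_w)
print(round(icc, 3))
```

The result lies between the two product moment coefficients, 0.5848 and 0.5804, found for the two arbitrary orderings.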
Although not used nearly as often as the product moment correlation coefficient, the ICC has some important applications. One is in the study of measurement error and observer variation (§15.2), where if measurements are true replicates the order in which they were made is not important. Another is in the design of cluster-randomized trials where the group is the cluster and may have hundreds of observations within it (§18.8).
Appendices
11A Appendix: The least squares estimates
This section requires knowledge of calculus. We want to find a and b so that the sum of squares about the line y = a + bx is a minimum. We therefore want to minimize $\sum (y_i - a - bx_i)^2$. This will have a minimum when the partial differentials with respect to a and b are both zero.
Subtracting this from the second equation we get
This gives us
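The standard least squares argument can be sketched as follows. The intermediate steps are the usual textbook ones and are an assumption about the route taken here, so they may differ in detail from the steps referred to above. Setting both partial differentials to zero gives two equations:

```latex
\begin{align*}
\frac{\partial}{\partial a}\sum (y_i - a - b x_i)^2 &= -2\sum (y_i - a - b x_i) = 0 \\
\frac{\partial}{\partial b}\sum (y_i - a - b x_i)^2 &= -2\sum x_i (y_i - a - b x_i) = 0
\end{align*}
% The first equation gives the intercept in terms of the slope:
\[ a = \bar{y} - b\bar{x} \]
% Substituting this into the second equation:
\[ \sum x_i (y_i - \bar{y}) - b \sum x_i (x_i - \bar{x}) = 0 \]
% Since \sum \bar{x}(y_i - \bar{y}) = 0 and \sum \bar{x}(x_i - \bar{x}) = 0,
% both sums can be written about the means:
\[ b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
```

So the slope is the sum of products about the mean divided by the sum of squares of X, and the line passes through the mean point.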
11B Appendix: Variance about the regression line
11C Appendix: The standard error of b
To find the standard error of b, we must bear in mind that in our regression model all the random variation is in Y. We first rewrite the sum of products:
The variance of a constant times a random variable is the square of the constant times the variance of the random variable (§6.6). The $x_i$ are constants, not random variables, so
VAR($y_i$) is the same for all $y_i$, say VAR($y_i$) = $s^2$. Hence
The standard error of b is the square root of this.
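The argument can be sketched with the standard steps, given here as an assumption about the intermediate equations, and treating the $y_i$ as independent:

```latex
% The sum of products can be written with the y_i appearing singly,
% because \sum (x_i - \bar{x})\bar{y} = 0:
\[ \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x})\, y_i \]
% Hence b is a sum of constants times the random variables y_i:
\[ b = \frac{\sum (x_i - \bar{x})\, y_i}{\sum (x_i - \bar{x})^2} \]
% Applying the rule for the variance of a constant times a random variable:
\[ \mathrm{VAR}(b) = \frac{\sum (x_i - \bar{x})^2\, \mathrm{VAR}(y_i)}{\left(\sum (x_i - \bar{x})^2\right)^2} \]
% With VAR(y_i) = s^2 for all i:
\[ \mathrm{VAR}(b) = \frac{s^2}{\sum (x_i - \bar{x})^2},
   \qquad \mathrm{SE}(b) = \sqrt{\frac{s^2}{\sum (x_i - \bar{x})^2}} \]
```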
11M Multiple choice questions 57 to 61
(Each branch is either true or false)
57. In Figure 11.15(a):
(a) predictor and outcome are independent;
(b) predictor and outcome are uncorrelated;
(c) the correlation between predictor and outcome is less than 1;
(d) predictor and outcome are perfectly related;
(e) the relationship is best estimated by simple linear regression.
58. In Figure 11.15(b):
(a) predictor and outcome are independent random variables;
(b) the correlation between predictor and outcome is close to zero;
(c) outcome increases as predictor increases;
(d) predictor and outcome are linearly related;
(e) the relationship could be made linear by a logarithmic transformation of the outcome.
Fig. 11.15. Scatter diagrams
59. A simple linear regression equation:
(a) describes a line which goes through the origin;
(b) describes a line with zero slope;
(c) is not affected by changes of scale;
(d) describes a line which goes through the mean point;
(e) is affected by the choice of dependent variable.
60. If the t distribution is used to find a confidence interval for the slope of a regression line:
(a) deviations from the line in the independent variable must follow a Normal distribution;
(b) deviations from the line in the dependent variable must follow a Normal distribution;
(c) the variance about the line is assumed to be the same throughout the range of the predictor variable;
(d) the y variable must be log transformed;
(e) all the points must lie on the line.
61. The product moment correlation coefficient, r:
(a) must lie between -1 and +1;
(b) can only have a valid significance test carried out when at least one of the variables is from a Normal distribution;
(c) is 0.5 when there is no relationship;
(d) depends on the choice of dependent variable;
(e) measures the magnitude of the change in one variable associated with a change in the other.
11E Exercise: Comparing two regression lines
Table 11.4 and Figure 11.16 show the PEFR and heights of samples of male and female medical students. Table 11.5 shows the sums of squares and products for these data.
1. Estimate the slopes of the regression lines for females and males.
2. Estimate the standard errors of the slopes.
3. Find the standard error for the difference between the slopes, which are independent. Calculate a 95% confidence interval for the difference.
4. Use the standard error to test the null hypothesis that the slopes are the same in the population from which these data come.
Fig. 11.16. PEFR and height for female and male medical students
Table 11.4. Height and PEFR in a sample of medical students
Females
Ht PEFR Ht PEFR Ht PEFR Ht PEFR Ht
155 450 163 428 168 480 164 540 175
155 475 163 548 168 595 167 470 176
155 503 164 485 169 510 167 530 176
158 440 165 485 170 455 167 598 177
160 360 166 430 171 430 168 510 177
161 383 166 440 171 537 168 560 177
161 461 166 485 172 442 170 510 177
161 470 166 510 172 463 170 547 177
161 470 167 415 172 490 170 553 177
161 475 167 455 174 540 170 560 177
161 480 167 470 174 540 171 460 178
162 450 167 500 176 535 171 473 178
162 475 168 430 177 513 171 550 178
162 550 168 440 181 522 171 575 178
163 370 172 480 178
172 550 180
172 620 181
174 550 181
174 550 181
174 616
Table 11.5. Summary statistics for height and PEFR in a sample of medical students
 Females Males
Number 43 58
Sum of squares, height 1469.9 2292.0
Sum of squares, PEFR 101124.8 226994.1
Sum of products about mean 4220.1 9048.2
12
Methods based on rank order
12.1* Non-parametric methods
In Chapters 10 and 11 I described a number of methods of analysis which relied on the assumption that the data came from a Normal distribution. To be more precise, we could say the data come from one of the Normal family of distributions, the particular Normal distribution involved being defined by its mean and standard deviation, the parameters of the distribution. These methods are called parametric because we estimate the parameters of the underlying Normal distribution. Methods which do not assume a particular family of distributions for the data are said to be non-parametric. In this and the next chapter I shall consider some non-parametric tests of significance. There are many others, but these will illustrate the general principle. We have already met one non-parametric test, the sign test (§9.2). The large sample Normal test could also be regarded as non-parametric.
It is useful to distinguish between three types of measurement scales. On an interval scale, the size of the difference between two values on the scale has a consistent meaning. For example, the difference in temperature between 1°C and 2°C is the same as the difference between 31°C and 32°C. On an ordinal scale, observations are ordered, but differences may not have a meaning. For example, anxiety is often measured using sets of questions, the number of positive answers giving the anxiety scale. A set of 36 questions would give a scale from 0 to 36. The difference in anxiety between scores of 1 and 2 is not necessarily the same as the difference between scores 31 and 32. On a nominal scale, we have a qualitative or categorical variable, where individuals are grouped but not necessarily ordered. Eye colour is a good example. When categories are ordered, we can treat the scale as either ordered or nominal, as appropriate.
All the methods of Chapters 10 and 11 apply to interval data, being based on differences of observations from the mean. Most of the methods in this chapter apply to ordinal data. Any interval scale which does not meet the requirements of Chapters 10 and 11 may be treated as ordinal, since it is, of course, ordered. This is the more common application in medical work.
General texts such as Armitage and Berry (1994), Snedecor and Cochran (1980) and Colton (1974) tend not to go into a lot of detail about rank and related methods, and more specialized books are needed (Siegel 1956, Conover 1980).
12.2* The Mann-Whitney U test
This is the non-parametric analogue of the two-sample t test (§10.3). It works like this. Consider the following artificial data showing observations of a variable in two independent groups, A and B:
A 7 4 9 17
B 11 6 21 14
We want to know whether there is any evidence that A and B are drawn from populations with different levels of the variable. The null hypothesis is that there is no tendency for members of one population to exceed members of the other. The alternative is that there is such a tendency, in one direction or the other. First we arrange the observations in ascending order, i.e. we rank them:
4 6 7 9 11 14 17 21
A B A A B B A B
We now choose one group, say A. For each A, we count how many Bs precede it. For the first A, 4, no Bs precede. For the second, 7, one B precedes, for the third A, 9, one B, for the fourth, 17, three Bs. We add these numbers of preceding Bs together to give U = 0 + 1 + 1 + 3 = 5. Now, if U is very small, nearly all the As are less than nearly all the Bs. If U is large, nearly all As are greater than nearly all Bs. Moderate values of U mean that As and Bs are mixed. The minimum U is 0, when all Bs exceed all As, and maximum U is $n_1 \times n_2$ when all As exceed all Bs. The magnitude of U has a meaning, because $U/n_1 n_2$ is an estimate of the probability that an observation drawn at random from population A would exceed an observation drawn at random from population B.
There is another possible U, which we will call U′, obtained by counting the number of As before each B, rather than the number of Bs before each A. This would be 1 + 3 + 3 + 4 = 11. The two possible values of U and U′ are related by $U + U' = n_1 n_2$. So we subtract U′ from $n_1 n_2$ to give 4 × 4 - 11 = 5.
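The counting can be written as a short sketch. The function name `count_preceding` is hypothetical, and ties are not handled here because the artificial data contain none.

```python
def count_preceding(group, other):
    """For each value in `group`, count the values in `other` that precede it, and sum."""
    return sum(1 for g in group for o in other if o < g)

group_a = [7, 4, 9, 17]
group_b = [11, 6, 21, 14]

u = count_preceding(group_a, group_b)        # Bs before each A
u_prime = count_preceding(group_b, group_a)  # As before each B
print(u, u_prime, u + u_prime)
```

Every A and B pair is counted exactly once in either U or U′, which is why the two always add up to the product of the sample sizes.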
If we know the distribution of U under the null hypothesis that the samples come from the same population, we can say with what probability these data could have arisen if there were no difference. We can carry out the test of significance. The distribution of U under the null hypothesis can be found easily. The two sets of four observations can be arranged in 70 different ways, from AAAABBBB to BBBBAAAA (8!/4!4! = 70, §6A). Under the null hypothesis these arrangements are all equally likely and, hence, have probability 1/70. Each has its value of U, from 0 to 16, and by counting the number of arrangements which give each value of U we can find the probability of that value. For example, U = 0 only arises from the order AAAABBBB and so has probability 1/70 = 0.014. U = 1 only arises from AAABABBB and so has probability 1/70 = 0.014 also. U = 2 can arise in two ways: AAABBABB and AABAABBB. It has probability 2/70 = 0.029. The full set of probabilities is shown in Table 12.1.
We apply this to the example. For groups A and B, U = 5 and the probability of this is 0.071. As we did for the sign test (§9.2) we consider the probability of more extreme values of U, U = 5 or less, which is 0.071 + 0.071 + 0.043 + 0.029 + 0.014 + 0.014 = 0.242.
This gives a one sided test. For a two-sided test, we must consider the probabilities of a difference as extreme in the opposite direction. We can see from Table 12.1 that the distribution of U is symmetrical, so the probability of an equally extreme value in the opposite direction is also 0.242, hence the two-sided probability is 0.242 + 0.242 = 0.484. Thus the observed difference would have been quite probable if the null hypothesis were true and the two samples could have come from the same population.
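The null distribution can be generated by listing all 70 arrangements, as in this sketch. Each arrangement is represented by the positions of the four As among the eight ranks; the variable names are illustrative only.

```python
from collections import Counter
from itertools import combinations

n1 = n2 = 4
counts = Counter()
for a_positions in combinations(range(n1 + n2), n1):
    # a_positions is in ascending order. The k-th A (k = 0, 1, ...) at position
    # pos has pos - k observations from B before it.
    u = sum(pos - k for k, pos in enumerate(a_positions))
    counts[u] += 1

total = sum(counts.values())
probabilities = {u: counts[u] / total for u in sorted(counts)}
one_sided = sum(c for u, c in counts.items() if u <= 5) / total
two_sided = 2 * one_sided

for u, p in probabilities.items():
    print(u, round(p, 3))
print(round(one_sided, 3), round(two_sided, 3))
```

Working from the exact counts gives a one-sided probability of 17/70, so the two-sided value is 34/70; the figures of 0.242 and 0.484 above come from adding the rounded table entries and differ only in the third decimal place.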
Table 12.1. Distribution of the Mann-Whitney U statistic, for two samples of size 4
U Probability U Probability U Probability
0 0.014 6 0.100 12 0.071
1 0.014 7 0.100 13 0.043
2 0.029 8 0.114 14 0.029
3 0.043 9 0.100 15 0.014
4 0.071 10 0.100 16 0.014
5 0.071 11 0.071
Table 12.2. Two-sided 5% points for the distribution of the smaller value of U in the Mann-Whitney U test
n1 n2
 2 3 4 5 6 7 8 9 10 11
2 - - - - - - 0 0 0 0
3 - - - 0 1 1 2 2 3 3
4 - - 0 1 2 3 4 4 5 6
5 - 0 1 2 3 5 6 7 8 9
6 - 1 2 3 5 6 8 10 11 13
7 - 1 3 5 6 8 10 12 14 16
8 0 2 4 6 8 10 13 15 17 19
9 0 2 4 7 10 12 15 17 20 23
10 0 3 5 8 11 14 17 20 23 26
11 0 3 6 9 13 16 19 23 26 30
12 1 4 7 11 14 18 22 26 29 33
13 1 4 8 12 16 20 24 28 33 37
14 1 5 9 13 17 22 26 31 36 40
15 1 5 10 14 19 24 29 34 39 44
16 1 6 11 15 21 26 31 37 42 47
17 2 6 11 17 22 28 34 39 45 51
18 2 7 12 18 24 30 36 42 48 55
19 2 7 13 19 25 32 38 45 52 58
20 2 8 13 20 27 34 41 48 55 62
If U is less than or equal to the tabulated value the difference is significant.
In practice, there is no need to carry out the summation of probabilities described above, as these are already tabulated. Table 12.2 shows the 5% points of U for each combination of sample sizes $n_1$ and $n_2$ up to 20. For our groups A and B, U = 5. We find the $n_2 = 4$ column and the $n_1 = 4$ row. From this we see that the 5% point for U is 0, and so U = 5 is not significant. If we had calculated the larger of the two values of U, 11, we can use Table 12.2 by finding the lower value, $n_1 n_2 - U = 16 - 11 = 5$.
Table 12.3. Biceps skinfold thickness (mm) in two groups of patients
Crohn's Disease Coeliac Disease
1.8 2.8 4.2 6.2 1.8 3.8
2.2 3.2 4.4 6.6 2.0 4.2
2.4 3.6 4.8 7.0 2.0 5.4
2.5 3.8 5.6 10.0 2.0 7.6
2.8 4.0 6.0 10.4 3.0
We can now turn to the practical analysis of some real data. Consider the biceps skinfold thickness data of Table 10.4, reproduced as Table 12.3. We will analyse these using the Mann-Whitney U test. Denote the Crohn's disease group by A and the coeliac group by B. The joint order is as follows:
Let us count the As before each B. Immediately we have a problem. The first A and the first B have the same value. Does the first A come before the first B or after it? We resolve this dilemma by counting one half for the tied A. The ties between the second, third and fourth Bs do not matter, as we can count the number of As before each without difficulty. We have for the U statistic:
U=0.5+1+1+1+6+8.5+10.5+13+18=59.5
This is the lower value, since $n_1 n_2 = 9 \times 20 = 180$ and so the middle value is 90. We can therefore refer U to Table 12.2. The critical value at the 5% level for groups size 9 and 20 is 48, which our value exceeds. Hence the difference is not significant at the 5% level and the data are consistent with the null hypothesis that there is no tendency for members of one population to exceed members of the other. This is the same as the result of the t test of §10.4.
For larger values of $n_1$ and $n_2$ calculation of U can be rather tedious. A simple formula for U can be found using the ranks. The rank of the lowest observation is 1, of the next is 2, and so on. If a number of observations are tied, each having the same value and hence the same rank, we give each the average of the ranks they would have were they ordered. For example, in the skinfold data the first two observations are each 1.8. They each receive rank (1 + 2)/2 = 1.5. The third, fourth and fifth are tied at 2.0, giving each of them rank (3 + 4 + 5)/3 = 4. The sixth, 2.2, is not tied and so has rank 6. The ranks for the skinfold data are as follows:
skinfold   1.8   1.8   2.0   2.0   2.0   2.2   2.4   2.5   2.8   2.8
group      A     B     B     B     B     A     A     A     A     A
rank       1.5   1.5   4     4     4     6     7     8     9.5   9.5
                 r1    r2    r3    r4

skinfold   3.0   3.2   3.6   3.8   3.8   4.0   4.2   4.2   4.4   4.8
group      B     A     A     A     B     A     A     B     A     A
rank       11    12    13    14.5  14.5  16    17.5  17.5  19    20
           r5                      r6                r7

skinfold   5.4   5.6   6.0   6.2   6.6   7.0   7.6   10.0  10.4
group      B     A     A     A     A     A     B     A     A
rank       21    22    23    24    25    26    27    28    29
           r8                                  r9
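We denote the ranks of the B group by $r_1, r_2, \ldots, r_{n_1}$. The number of As preceding the first B must be $r_1 - 1$, since there are no Bs before it and it is the $r_1$th observation. The number of As preceding the second B is $r_2 - 2$, since it is the $r_2$th observation, and one preceding observation is a B. Similarly, the number preceding the third B is $r_3 - 3$, and the number preceding the ith B is $r_i - i$. Hence we have:

$$U = \sum_{i=1}^{n_1} (r_i - i) = \sum_{i=1}^{n_1} r_i - \sum_{i=1}^{n_1} i = \sum_{i=1}^{n_1} r_i - \frac{n_1(n_1+1)}{2}$$

That is, we add together the ranks of all the $n_1$ observations, subtract $n_1(n_1+1)/2$ and we have U. For the example, we have

$$U = 1.5 + 4 + 4 + 4 + 11 + 14.5 + 17.5 + 21 + 27 - \frac{9 \times 10}{2} = 104.5 - 45 = 59.5$$

as before. This formula is sometimes written

$$U' = n_1 n_2 + \frac{n_1(n_1+1)}{2} - \sum_{i=1}^{n_1} r_i$$

But this is simply based on the other group, since $U + U' = n_1 n_2$. For testing we use the smaller value, as before.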
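The rank formula can be checked against the counting method with a short Python sketch. The helper `average_ranks` is a hypothetical name introduced here for illustration; it gives tied observations the mean of the ranks they span, as described for the skinfold data.

```python
crohn = [1.8, 2.2, 2.4, 2.5, 2.8, 2.8, 3.2, 3.6, 3.8, 4.0,
         4.2, 4.4, 4.8, 5.6, 6.0, 6.2, 6.6, 7.0, 10.0, 10.4]   # group A
coeliac = [1.8, 2.0, 2.0, 2.0, 3.0, 3.8, 4.2, 5.4, 7.6]        # group B


def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2   # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


n1, n2 = len(coeliac), len(crohn)
ranks = average_ranks(coeliac + crohn)    # rank the two groups jointly
rank_sum_b = sum(ranks[:n1])              # ranks belonging to group B
u = rank_sum_b - n1 * (n1 + 1) / 2
u_prime = n1 * n2 + n1 * (n1 + 1) / 2 - rank_sum_b
print(rank_sum_b, u, u_prime)
```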
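If the null hypothesis is true, U has mean $n_1 n_2/2$ and standard deviation $\sqrt{n_1 n_2 (n_1+n_2+1)/12}$, and for large samples its distribution is approximately Normal. Hence

$$\frac{U - n_1 n_2/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$$

is an observation from a Standard Normal distribution. For the example, $n_1 = 9$ and $n_2 = 20$. We have

$$\frac{59.5 - 9 \times 20/2}{\sqrt{9 \times 20 \times (9 + 20 + 1)/12}} = \frac{-30.5}{21.21} = -1.44$$

From Table 7.1 this gives two-sided probability = 0.15, similar to that found by the two sample t test (§10.3).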
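As a sketch of the large sample calculation, the following Python fragment uses the usual null mean $n_1 n_2/2$ and standard deviation $\sqrt{n_1 n_2(n_1+n_2+1)/12}$ for U. The helper `standard_normal_cdf` is an assumed name; it is built on `math.erf` in place of Table 7.1.

```python
import math


def standard_normal_cdf(z):
    """Probability that a Standard Normal variable is below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n1, n2, u = 9, 20, 59.5
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mean_u) / sd_u
p_two_sided = 2 * standard_normal_cdf(-abs(z))
print(round(z, 2), round(p_two_sided, 2))
```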
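Neither Table 12.2 nor the above formula for the standard deviation of U takes ties into account; both assume the data can be fully ranked. Their use for data with ties is an approximation. For small samples we must accept this. For the Normal approximation, ties can be allowed for using the following formula for the standard deviation of U when the null hypothesis is true:

$$\sqrt{\frac{n_1 n_2 \sum_{j=1}^{n} r_j^2}{n(n-1)} - \frac{n_1 n_2 (n+1)^2}{4(n-1)}}$$

where $n = n_1 + n_2$ and $r_j$ is the rank of the jth of all n observations, tied observations being given their average rank.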
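To see how much difference the ties make in the skinfold example, here is a hedged Python sketch. It assumes the standard tie-adjusted form in which the sum of the squared ranks of all $n = n_1 + n_2$ observations replaces the untied value; the ranks are copied from the listing above and the variable names are illustrative.

```python
import math

# Average ranks of all 29 skinfold observations, from the ranked listing.
ranks = [1.5, 1.5, 4, 4, 4, 6, 7, 8, 9.5, 9.5,
         11, 12, 13, 14.5, 14.5, 16, 17.5, 17.5, 19, 20,
         21, 22, 23, 24, 25, 26, 27, 28, 29]
n1, n2 = 9, 20
n = n1 + n2

sum_sq = sum(r * r for r in ranks)
var_tied = n1 * n2 * sum_sq / (n * (n - 1)) - n1 * n2 * (n + 1) ** 2 / (4 * (n - 1))
sd_tied = math.sqrt(var_tied)
sd_untied = math.sqrt(n1 * n2 * (n + 1) / 12)
print(round(sd_tied, 2), round(sd_untied, 2))
```

With so few ties the two standard deviations differ only in the second decimal place, so the simple formula loses little here.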
The Mann-Whitney U test is a non-parametric analogue of the two sample t test. The advantage over the t test is that the only assumption about the distribution of the data is that the observations can be ranked, whereas for the t test we must assume the data are from Normal distributions with uniform variance. There are disadvantages. For data which are Normally distributed, the U test is less powerful than the t test, i.e. the t test, when valid, can detect smaller differences for given sample size. The U test is almost as powerful for moderate and large sample sizes, and this difference is important only for small samples. For very small samples, e.g. two groups of three observations, the test is useless as all possible values of U have probabilities above 0.05 (Table 12.2). The U test is primarily a test of significance. The t method also enables us to estimate the size of the difference and gives a confidence interval. Although as noted above $U/n_1 n_2$ has an interpretation, we cannot, so far as I know, find a confidence interval for it.
Table 12.4. Frequency distributions of number of nodes involved in breast cancers detected at screening and detected in the intervals between screens (data of Mohammed Raja)

Screening cancers        Interval cancers
Nodes    Frequency       Nodes    Frequency
0        291             0        66
1        43              1        22
2        16              2        7
3        20              3        7
4        13              4        2
5        3               5        4
6        1               6        4
7        4               7        3
8        3               8        3
9        1               9        2
10       1               10       2
11       2               12       2
12       1               13       1
15       1               15       1
16       1               16       1
17       2               20       1
18       2
20       1
27       1
33       1
Total    408                      128
Mean     1.21                     2.19
Median   0                        0
75%ile   1                        3
The null hypothesis of the Mann-Whitney test is sometimes presented as being that the populations have the same median. There is even a confidence interval for the difference between two medians based on the Mann-Whitney test (Campbell and Gardner 1989). This is surprising, as the medians are not involved in the calculation. Furthermore, we can have two groups which are significantly different using the Mann-Whitney U test yet have the same median. Table 12.4 shows an example. The majority of observations in both groups are zero, so transformation to the Normal is impossible. Although the samples are quite large, the distribution is so skew that a rank method, appropriately adjusted for ties, may be safer than the method of §9.7. The Mann-Whitney U test was highly significant, yet the medians are both zero. As the medians were equal, I suggested the 75th percentile as a measure of location for the distributions.
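The point that equal medians need not mean equal distributions can be illustrated from the frequencies of Table 12.4. The sketch below is an assumption-laden illustration, not the analysis reported in the text: it computes the two medians and the proportion of all screening-interval pairs in which the interval cancer has more nodes, counting ties as one half, which is $U/n_1 n_2$. The function names are hypothetical.

```python
import statistics

# Frequencies from Table 12.4: {number of nodes: number of cancers}.
screening = {0: 291, 1: 43, 2: 16, 3: 20, 4: 13, 5: 3, 6: 1, 7: 4, 8: 3, 9: 1,
             10: 1, 11: 2, 12: 1, 15: 1, 16: 1, 17: 2, 18: 2, 20: 1, 27: 1, 33: 1}
interval = {0: 66, 1: 22, 2: 7, 3: 7, 4: 2, 5: 4, 6: 4, 7: 3, 8: 3, 9: 2,
            10: 2, 12: 2, 13: 1, 15: 1, 16: 1, 20: 1}


def expand(freq):
    """Turn a frequency table into the list of individual observations."""
    return [value for value, count in sorted(freq.items()) for _ in range(count)]


def prob_first_exceeds_second(freq_a, freq_b):
    """Proportion of pairs with a > b, counting tied pairs as one half."""
    n_a, n_b = sum(freq_a.values()), sum(freq_b.values())
    score = 0.0
    for a, count_a in freq_a.items():
        for b, count_b in freq_b.items():
            if a > b:
                score += count_a * count_b
            elif a == b:
                score += 0.5 * count_a * count_b
    return score / (n_a * n_b)


median_screening = statistics.median(expand(screening))
median_interval = statistics.median(expand(interval))
p_interval_higher = prob_first_exceeds_second(interval, screening)
print(median_screening, median_interval, round(p_interval_higher, 3))
```

Both medians are zero, yet the proportion is above one half, which is the tendency the U test detects.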
The reason for these two different views of the Mann-Whitney U test lies in the assumptions we make about the distributions in the two populations. If we make no assumptions, we can test the null hypothesis: that the probability that a member of the first population drawn at random will exceed a member of the second population drawn at random is one half. Some people choose to make an assumption about the distributions: that they have the same shape and differ only in location (mean or median). If this assumption is true, then if the distributions are different the medians must be different. The means must differ by the same amount. It is a very strong assumption. For example, if it is true then the variances must be the same in the two populations. For the reasons given in §10.5 and §7A, it is unlikely that we could get this if the distributions were not Normal. Under this assumption the Mann-Whitney U test will rarely be valid if the two sample t test is not valid also.
There are other non-parametric tests which test the same or similar null hypotheses. Two of these, the Wilcoxon two sample test and the Kendall Tau test, are different versions of the Mann-Whitney U test which were developed around the same time and later shown to be identical. These names are sometimes used interchangeably. The test statistics and tables are not the same, and the user must be very careful that the calculation of the test statistic being used corresponds to the table to which it is referred. Another difficulty with tables is that some are drawn so that for a significant difference U must be less than or equal to the tabulated value (as in Table 12.2), for others U must be strictly less than the tabulated value.

For more than two groups, the rank analogue of one-way analysis of variance (§10.9) is the Kruskal-Wallis test, see Conover (1980) and Siegel (1956). Conover (1980) also describes a multiple comparison test for the pairs of groups, similar to those described in §10.11.
12.3* The Wilcoxon matched pairs test
This test is an analogue of the paired t test. We have a sample measured under two conditions and the null hypothesis is that there is no tendency for the outcome on one condition to be higher or lower than the other. The alternative hypothesis is that the outcome on one condition tends to be higher or lower than the other. As the test is based on the magnitude of the differences, the data must be interval.

Consider the data of Table 12.5, previously discussed in §2.6 and §9.2, where we used the sign test for the analysis. In the sign test, we have ignored the magnitude of differences, and only considered their signs. If we can use information about the magnitude, we would hope to have a more powerful test. Clearly, we must have interval data to do this. To avoid making assumptions about the distribution of the differences, we use their rank order in a similar manner to the Mann-Whitney U test.
Table 12.5. Results of a trial of pronethalol for the prevention of angina pectoris (Pritchard et al. 1963), in rank order of differences

Number of attacks while on     Difference                Rank of difference
Placebo    Pronethalol         placebo - pronethalol     All     Positive   Negative
2          0                   2                         1.5     1.5
17         15                  2                         1.5     1.5
3          0                   3                         3       3
7          2                   5                         4       4
8          1                   7                         6       6
14         7                   7                         6       6
23         16                  7                         6       6
34         25                  9                         8       8
79         65                  14                        9       9
60         41                  19                        10      10
323        348                 -25                       11                 11
71         29                  42                        12      12
Sum of ranks                                                     67         11
First, we rank the differences by their absolute values, i.e. ignoring the sign. As in §12.2, tied observations are given the average of their ranks. We now sum the ranks of the positive differences, 67, and the ranks of the negative differences, 11 (Table 12.5). If the null hypothesis were true and there was no difference, we would expect the rank sums for positive and negative differences to be about the same, equal to 39 (their average). The test statistic is the lesser of these sums, T. The smaller T is, the lower the probability of the data arising by chance.

The distribution of T when the null hypothesis is true can be found by enumerating all the possibilities, as described for the Mann-Whitney U statistic. Table 12.6 gives the 5% and 1% points for this distribution, for sample size n up to 25. For the example, n = 12 and so the difference would be significant at the 5% level if T were less than or equal to 14. We have T = 11, so the data are not consistent with the null hypothesis. The data support the view that there is a real tendency for patients to have fewer attacks while on the active treatment.
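The ranking and summing steps can be sketched in Python. The data are the twelve pairs of Table 12.5; `average_ranks` is a hypothetical helper written for this illustration, and zero differences (there are none here) are dropped before ranking.

```python
placebo =     [2, 17, 3, 7, 8, 14, 23, 34, 79, 60, 323, 71]
pronethalol = [0, 15, 0, 2, 1,  7, 16, 25, 65, 41, 348, 29]


def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Differences placebo - pronethalol, omitting any zero differences.
differences = [p - q for p, q in zip(placebo, pronethalol) if p != q]
ranks = average_ranks([abs(d) for d in differences])   # rank ignoring sign

positive_sum = sum(r for d, r in zip(differences, ranks) if d > 0)
negative_sum = sum(r for d, r in zip(differences, ranks) if d < 0)
t_statistic = min(positive_sum, negative_sum)
print(positive_sum, negative_sum, t_statistic)
```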
From Table 12.6, we can see that the probability that T ≤ 11 lies between 0.05 and 0.01. This is greater than the probability given by the sign test, which was 0.006 (§9.2). Usually we would expect greater power, and hence lower probabilities when the null hypothesis is false, when we use more of the information. In this case, the greater probability reflects the fact that the one negative difference, -25, is large. Examination of the original data shows that this individual had very large numbers of attacks on both treatments, and it seems possible that he may belong to a different population from the other eleven.

Like Table 12.2, Table 12.6 is based on the assumption that the differences can be fully ranked and there are no ties. Ties may occur in two ways in this test. Firstly, ties may occur in the ranking sense. In the example we had two differences of +2 and three of +7. These were ranked equally: 1.5 and 1.5, and 6, 6 and 6. When ties are present between negative and positive differences, Table 12.6 only approximates to the distribution of T.
Table 12.6. Two-sided 5% and 1% points of the distribution of T (lower value) in the Wilcoxon one-sample test

                 Probability that T ≤                     Probability that T ≤
Sample size n    the tabulated value     Sample size n    the tabulated value
                 5%      1%                               5%      1%
5                -       -               16               30      19
6                1       -               17               35      23
7                2       -               18               40      28
8                4       0               19               46      32
9                6       2               20               52      37
10               8       3               21               59      43
11               11      5               22               66      49
12               14      7               23               73      55
13               17      10              24               81      61
14               21      13              25               90      68
15               25      16
Ties may also occur between the paired observations, where the observed difference is zero. In the same way as for the sign test, we omit zero differences (§9.2). Table 12.6 is used with n as the number of non-zero differences only, not the total number of differences. This seems odd, in that a lot of zero differences would appear to support the null hypothesis. For example, if in Table 12.5 we had another dozen patients with zero differences, the calculation and conclusion would be the same. However, the mean difference would be smaller and the Wilcoxon test tells us nothing about the size of the difference, only its existence. This illustrates the danger of allowing significance tests to outweigh all other ways of looking at the data.
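If the sample is large, T is approximately Normally distributed when the null hypothesis is true, with mean $n(n+1)/4$ and standard deviation $\sqrt{n(n+1)(2n+1)/24}$. Hence

$$\frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

is from a Standard Normal distribution if the null hypothesis is true. For the example of Table 12.5, we have:

$$\frac{11 - 12 \times 13/4}{\sqrt{12 \times 13 \times 25/24}} = \frac{11 - 39}{\sqrt{162.5}} = \frac{-28}{12.75} = -2.20$$

From Table 7.1 this gives a two-tailed probability of 0.028, similar to that obtained from Table 12.6.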
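The large sample approximation for T can be sketched in Python, assuming the usual null mean $n(n+1)/4$ and standard deviation $\sqrt{n(n+1)(2n+1)/24}$. The helper `standard_normal_cdf` is an assumed name that stands in for Table 7.1.

```python
import math


def standard_normal_cdf(z):
    """Probability that a Standard Normal variable is below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, t_statistic = 12, 11            # non-zero differences and lesser rank sum
mean_t = n * (n + 1) / 4
sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t_statistic - mean_t) / sd_t
p_two_tailed = 2 * standard_normal_cdf(-abs(z))
print(mean_t, round(sd_t, 2), round(z, 2), round(p_two_tailed, 3))
```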
We have three possible tests for paired data, the Wilcoxon, sign and paired t methods. If the differences are Normally distributed, the t test is the most powerful test. The Wilcoxon test is almost as powerful, however, and in practice the difference is not great except for small samples. Like the Mann-Whitney U test, the Wilcoxon is useless for very small samples. The sign test is similar in power to the Wilcoxon for very small samples, but as the sample size increases the Wilcoxon test becomes much more powerful. This might be expected since the Wilcoxon test uses more of the information. The Wilcoxon test uses the magnitude of the differences, and hence requires interval data. This means that, as for t methods, we will get different results if we transform the data. For truly ordinal data we should use the sign test. The paired t method also gives a confidence interval for the difference. The Wilcoxon test is purely a test of significance, but a confidence interval for the median difference can be found using the Binomial method described in §8.9.
12.4* Spearman's rank correlation coefficient, ρ
We noted in Chapter 11 the sensitivity to assumptions of Normality of the product moment correlation coefficient, r. This led to the development of non-parametric approaches based on ranks. Spearman's approach was direct. First we rank the observations, then calculate the product moment correlation of the ranks, rather than of the observations themselves. The resulting statistic has a distribution which does not depend on the distribution of the original variables. It is usually denoted by the Greek letter ρ, pronounced 'rho', or by $r_s$.

Table 12.7 shows data from a study of the geographical distribution of a tumour, Kaposi's sarcoma, in mainland Tanzania. The incidence rates were calculated from cancer registry data and there was considerable doubt that all cases were notified. The degree of reporting of cases may have been related to population density or availability of health services. In addition, incidence was closely related to age and sex (where recorded) and so could be related to the age and sex distribution in the region. To check that none of these were producing artefacts in the geographical distribution, I calculated the rank correlation of disease incidence with each of the possible explanatory variables. Table 12.7 shows the relationship of incidence to the percentage of the population living within 10 km of a health centre. Figure 12.1 shows the scatter diagram of these data. The percentage within 10 km of a health centre is very highly skewed, whereas the disease incidence appears somewhat bimodal. The assumptions of the product moment correlation do not appear to be met, so rank correlation was preferred.
Table 12.7. Incidence of Kaposi's sarcoma and access of population to health centres for each region of mainland Tanzania (Bland et al. 1977)

              Incidence      Percent population     Rank order
              per million    within 10 km of
Region        per year       health centre          Incidence   Population %
Coast         1.28           4.0                    1           3
Shinyanga     1.66           9.0                    2           7
Mbeya         2.06           6.7                    3           6
Tabora        2.37           1.8                    4           1
Arusha        2.46           13.7                   5           13
Dodoma        2.60           11.1                   6           10
Kigoma        4.22           9.2                    7           8
Mara          4.29           4.4                    8           4
Tanga         4.54           23.0                   9           16
Singida       6.17           10.8                   10          9
Morogoro      6.33           11.7                   11          11
Mtwara        6.40           14.8                   12          14
West Lake     6.60           12.5                   13          12
Kilimanjaro   6.65           57.3                   14          17
Ruvuma        7.21           6.6                    15          5
Iringa        8.46           2.6                    16          2
Mwanza        8.54           20.7                   17          15

Fig. 12.1. Incidence of Kaposi's sarcoma per million per year and percentage of population within 10 km of a health centre, for 17 regions of mainland Tanzania
The calculation of Spearman's ρ proceeds as follows. The ranks for the two variables are found (Table 12.7). We apply the formula for the product moment correlation (§11.9) to these ranks. We define:

$$\rho = \frac{\sum (r_x - \bar{r}_x)(r_y - \bar{r}_y)}{\sqrt{\sum (r_x - \bar{r}_x)^2 \sum (r_y - \bar{r}_y)^2}}$$

where $r_x$ and $r_y$ are the ranks of the two variables for each subject and $\bar{r}_x$ and $\bar{r}_y$ are their means.
Table 12.8. Two-sided 5% and 1% points of the distribution of Spearman's ρ

                 Probability that ρ is as far or further
Sample size n    from 0 than the tabulated value
                 5%       1%
4                -        -
5                1.00     -
6                0.89     1.00
7                0.82     0.96
8                0.79     0.93
9                0.70     0.83
10               0.68     0.81
We have ignored the problem of ties in the above. We treat observations with the same value as described in §12.2. We give them the average of the ranks they would have if they were separable and apply the rank correlation formula as described above. In this case the distribution of Table 12.8 is only approximate.

There are several ways of calculating this coefficient, resulting in formulae which appear quite different, though they give the same result (see Siegel 1956).
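Two such calculations can be compared on the ranks of Table 12.7 with a short Python sketch. It is an illustration only: the function name `product_moment` is an assumption, and the second calculation uses the familiar form $1 - 6\sum d^2/(n(n^2-1))$, where d is the difference between a subject's two ranks, which applies when there are no ties.

```python
import math

# Rank order columns of Table 12.7 (regions listed in order of incidence).
incidence_rank = list(range(1, 18))
population_rank = [3, 7, 6, 1, 13, 10, 8, 4, 16, 9, 11, 14, 12, 17, 5, 2, 15]


def product_moment(x, y):
    """Product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


n = len(incidence_rank)
rho_from_ranks = product_moment(incidence_rank, population_rank)

sum_d2 = sum((a - b) ** 2 for a, b in zip(incidence_rank, population_rank))
rho_from_differences = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(round(rho_from_ranks, 2), round(rho_from_differences, 2))
```

Both routes give the same coefficient for these untied ranks.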
12.5* Kendall's rank correlation coefficient, τ
Spearman's rank correlation is quite satisfactory for testing the null hypothesis of no relationship, but is difficult to interpret as a measurement of the strength of the relationship. Kendall developed a different rank correlation coefficient, Kendall's τ, which has some advantages over Spearman's. (The Greek letter τ is pronounced 'tau'.) It is rather more tedious to calculate than Spearman's, but in the computer age this hardly matters. For each pair of subjects we observe whether the subjects are ordered in the same way by the two variables, a concordant pair, ordered in opposite ways, a discordant pair, or equal for one of the variables and so not ordered at all, a tied pair. Kendall's τ is the proportion of concordant pairs minus the proportion of discordant pairs. τ will be one if the rankings are identical, as all pairs will be ordered in the same way, and minus one if the rankings are exactly opposite, as all pairs will be ordered in the opposite way.

We shall denote the number of concordant pairs (ordered the same way) by $n_c$, the number of discordant pairs (ordered in opposite ways) by $n_d$, and the difference, $n_c - n_d$, by S. There are $n(n-1)/2$ pairs altogether, so

$$\tau = \frac{n_c - n_d}{n(n-1)/2} = \frac{S}{n(n-1)/2}$$

When there are no ties, $n_c + n_d = n(n-1)/2$.
The simplest way to calculate $n_c$ is to order the observations by one of the variables, as in Table 12.7 which is ordered by disease incidence. Now consider the second ranking (% population within 10 km of a health centre). The first region, Coast, has 14 regions below it which have greater rank, so the pairs formed by the first region and these will be in the correct order. There are 2 regions below it which have lower rank, so the pairs formed by the first region and these will be in the opposite order. The second region, Shinyanga, has 10 regions below it with greater rank and so contributes 10 further pairs in the correct order. Note that the pair 'Coast and Shinyanga' has already been counted. There are 5 pairs in opposite order. The third region, Mbeya, has 10 regions below it in the same order and 4 in opposite orders, and so on. We add these numbers to get $n_c$ and $n_d$:

$n_c$ = 14 + 10 + 10 + 13 + 4 + 6 + 7 + 8 + 1 + 5 + 4 + 2 + 2 + 0 + 1 + 1 + 0 = 88

$n_d$ = 2 + 5 + 4 + 0 + 8 + 5 + 3 + 1 + 7 + 2 + 2 + 3 + 2 + 3 + 1 + 0 + 0 = 48

The number of pairs is $n(n-1)/2 = 17 \times 16/2 = 136$. Because there are no ties, we could also calculate $n_d$ by $n_d = n(n-1)/2 - n_c = 136 - 88 = 48$. $S = n_c - n_d = 88 - 48 = 40$. Hence $\tau = S/(n(n-1)/2) = 40/136 = 0.29$.
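The pair counting can be sketched in Python. Because the regions in Table 12.7 are already in order of incidence, only the second ranking is needed; the variable names are illustrative.

```python
# Population % ranks from Table 12.7, with regions in order of incidence.
population_rank = [3, 7, 6, 1, 13, 10, 8, 4, 16, 9, 11, 14, 12, 17, 5, 2, 15]

n = len(population_rank)
n_concordant = 0
n_discordant = 0
for i in range(n):
    for j in range(i + 1, n):          # each pair is visited once
        if population_rank[j] > population_rank[i]:
            n_concordant += 1          # same order on both variables
        elif population_rank[j] < population_rank[i]:
            n_discordant += 1          # opposite order

s = n_concordant - n_discordant
tau = s / (n * (n - 1) / 2)
print(n_concordant, n_discordant, s, round(tau, 2))
```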
When there are ties, τ cannot be one. However, we could have perfect correlation if the ties were between the same subjects for both variables. To allow for this, we use a different version of τ, $\tau_b$. Consider the denominator. There are $n(n-1)/2$ possible pairs. If there are t individuals tied at a particular rank for variable X, no pairs from these t individuals contribute to S. There are $t(t-1)/2$ such pairs. If we consider all the groups of tied individuals we have $\sum t(t-1)/2$ pairs which do not contribute to S, summing over all groups of tied ranks. Hence the total number of pairs which can contribute to S is $n(n-1)/2 - \sum t(t-1)/2$, and S cannot be greater than $n(n-1)/2 - \sum t(t-1)/2$. The size of S is also limited by ties in the second ranking. If we denote the number of individuals with the same value of Y by u, then the number of pairs which can contribute to S is $n(n-1)/2 - \sum u(u-1)/2$. We now define $\tau_b$ by

$$\tau_b = \frac{S}{\sqrt{\left(n(n-1)/2 - \sum t(t-1)/2\right)\left(n(n-1)/2 - \sum u(u-1)/2\right)}}$$

Note that if there are no ties, $\sum t(t-1)/2 = 0 = \sum u(u-1)/2$. When the rankings are identical $\tau_b = 1$, no matter how many ties there are. Kendall (1970) also discusses two other ways of dealing with ties, obtaining coefficients $\tau_a$ and $\tau_c$, but their use is restricted.
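The effect of the tie adjustment can be seen with a small Python sketch. The six observations below are invented purely for illustration and the function names are hypothetical; the denominator is the geometric mean of the numbers of pairs untied on each variable, the usual definition of $\tau_b$.

```python
import math
from collections import Counter


def kendall_s(x, y):
    """Concordant minus discordant pairs; pairs tied on either variable count zero."""
    s = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[j] - x[i]) * (y[j] - y[i])
            if product > 0:
                s += 1
            elif product < 0:
                s -= 1
    return s


def tied_pairs(values):
    """Sum of t(t-1)/2 over groups of t tied values."""
    return sum(t * (t - 1) // 2 for t in Counter(values).values())


def kendall_tau_b(x, y):
    total_pairs = len(x) * (len(x) - 1) // 2
    untied_x = total_pairs - tied_pairs(x)
    untied_y = total_pairs - tied_pairs(y)
    return kendall_s(x, y) / math.sqrt(untied_x * untied_y)


# Hypothetical data: identical rankings containing ties.
x = [1, 1, 2, 3, 3, 3]
y = list(x)

tau_unadjusted = kendall_s(x, y) / (len(x) * (len(x) - 1) / 2)
tau_b = kendall_tau_b(x, y)
print(round(tau_unadjusted, 2), tau_b)
```

The unadjusted coefficient falls short of one even though the two rankings agree exactly, whereas $\tau_b$ is one.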
We often want to test the null hypothesis that there is no relationship between the two variables in the population from which our sample was drawn. As usual, we are concerned with the probability of S being as or more extreme (i.e. far from zero) than the observed value. Table 12.9 was calculated in the same way as Tables 12.1 and 12.2. It shows the probability of being as extreme as the observed value of S for n up to 10. For convenience, S is tabulated rather than τ. When ties are present this is only an approximation.

When the sample size is greater than 10, S has an approximately Normal distribution under the null hypothesis, with mean zero. If there are no ties, the variance is

$$\frac{n(n-1)(2n+5)}{18}$$

When there are ties, the variance formula is very complicated (Kendall 1970). I shall omit it, as in practice these calculations will be done using computers anyway. If there are not many ties it will not make much difference if the simple form is used.
For the example, S = 40, n = 17 and there are no ties, so the Standard Normal variate is

$$\frac{S}{\sqrt{n(n-1)(2n+5)/18}}$$

From Table 7.1 of the Normal distribution we find that the two-sided probability of a value as extreme as this is 0.06 × 2 = 0.12, which is very similar to that found using Spearman's ρ. The product moment correlation, r, gives r = 0.30, P = 0.24, but of course the non-Normal distributions of the variables make this P invalid.

Why have two different rank correlation coefficients? Spearman's ρ is older than Kendall's τ, and can be thought of as a simple analogue of the product moment correlation coefficient, Pearson's r. τ is a part of a more general and consistent system of ranking methods, and has a direct interpretation, as the difference between the proportions of concordant and discordant pairs. In general, the numerical value of ρ is greater than that of τ. It is not possible to calculate τ from ρ or ρ from τ; they measure different sorts of correlation. ρ gives more weight to reversals of order when data are far apart in rank than when there is a reversal close together in rank, τ does not. However in terms of tests of significance both have the same power to reject a false null hypothesis, so for this purpose it does not matter which is used.
Table 12.9. Two-sided 5% and 1% points of the distribution of S for Kendall's τ

                 Probability that S is as far or further
Sample size n    from the expected than the tabulated value
                 5%       1%
4                -        -
5                10       -
6                13       15
7                15       19
8                18       22
9                20       26
10               23       29
12.6* Continuity corrections
In this chapter, when samples were large we have used a continuous distribution, the Normal, to approximate to a discrete distribution, U, T or S. For example, Figure 12.2 shows the distribution of the Mann-Whitney U statistic for $n_1 = 4$, $n_2 = 4$ (Table 12.1) with the corresponding Normal curve. From the exact distribution, the probability that U ≤ 2 is 0.014 + 0.014 + 0.029 = 0.057. The corresponding Standard Normal deviate is

$$\frac{U - n_1 n_2/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}} = \frac{2 - 8}{\sqrt{4 \times 4 \times 9/12}} = -1.732$$

This has a probability of 0.042, interpolating in Table 7.1. This is smaller than the exact probability. The disparity arises because the continuous distribution gives probability to values other than the integers 0, 1, 2, etc. The estimated probability for U = 2 can be found by the area under the curve between U = 1.5 and U = 2.5. The corresponding Normal deviates are -1.876 and -1.588, which have probabilities from Table 7.1 of 0.030 and 0.056. This gives the estimated probability for U = 2 to be 0.056 - 0.030 = 0.026, which compares quite well with the exact figure of 0.029. Thus to estimate the probability that U ≤ 2, we estimate the area below U = 2.5, not below U = 2. This gives us a Standard Normal deviate of -1.588, as already noted, and hence a probability of 0.056. This corresponds remarkably well with the exact probability of 0.057, especially when we consider how small $n_1$ and $n_2$ are.

We will get a better approximation from our Standard Normal deviate if we make U closer to its expected value by 1/2. In general, we get a better fit if we make the observed value of the statistic closer to its expected value by half of the interval between adjacent discrete values. This is a continuity correction.
Fig. 12.2. Distribution of the Mann-Whitney U statistic, $n_1 = 4$, $n_2 = 4$, when the null hypothesis is true, with the corresponding Normal distribution and area estimating PROB(U = 2)
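The comparison between the exact and approximate probabilities for two groups of four can be reproduced by enumeration. The Python sketch below is illustrative: it lists every way of choosing which four of the eight ranks belong to one group, computes U from the rank sum, and compares the exact probability of a value of 2 or less with the Normal approximation with and without the half unit adjustment. The helper `standard_normal_cdf` is an assumed name.

```python
import math
from itertools import combinations


def standard_normal_cdf(z):
    """Probability that a Standard Normal variable is below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n1 = n2 = 4
counts = {}
for ranks_b in combinations(range(1, n1 + n2 + 1), n1):
    u = sum(ranks_b) - n1 * (n1 + 1) // 2     # U from the rank sum of one group
    counts[u] = counts.get(u, 0) + 1

total = sum(counts.values())                   # equally likely arrangements
exact = sum(c for u, c in counts.items() if u <= 2) / total

mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
uncorrected = standard_normal_cdf((2 - mean_u) / sd_u)
corrected = standard_normal_cdf((2.5 - mean_u) / sd_u)
print(round(exact, 3), round(uncorrected, 3), round(corrected, 3))
```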
For S, the interval between adjacent values is 2, not 1, for $S = n_c - n_d = 2n_c - n(n-1)/2$, and $n_c$ is an integer. A change of one unit in $n_c$ produces a change of two units in S. The continuity correction is therefore half of 2, which is 1. We make S closer to the expected value of 0 by 1 before applying the Normal approximation. For the Kaposi's sarcoma data, we had S = 40, with n = 17. Using the continuity correction gives

$$\frac{S - 1}{\sqrt{n(n-1)(2n+5)/18}}$$

This gives a two-sided probability of 0.066 × 2 = 0.13, slightly greater than the uncorrected value of 0.12.

Continuity corrections are important for small samples; for large samples they are negligible. We shall meet another in Chapter 13.
12.7* Parametric or non-parametric methods?
For many statistical problems there are several possible solutions, just as for many diseases there are several treatments, similar perhaps in their overall efficacy but displaying variation in their side effects, in their interactions with other diseases or treatments and in their suitability for different types of patients. There is often no one right treatment, but rather treatment is decided on the prescriber's judgement of these effects, past experience and plain prejudice. Many problems in statistical analysis are like this. In comparing the means of two small groups, for instance, we could use a t test, a t test with a transformation, a Mann-Whitney U test, or one of several others. Our choice of method depends on the plausibility of Normal assumptions, the importance of obtaining a confidence interval, the ease of calculation, and so on. It depends on plain prejudice, too. Some users of statistical methods are very concerned about the implications of Normal assumptions and will advocate non-parametric methods wherever possible, while others are too careless of the errors that may be introduced when assumptions are not met.
I sometimes meet people who tell me that they have used non-parametric methods throughout their analysis as if this is some kind of badge of statistical purity. It is nothing of the kind. It may mean that their significance tests have less power than they might have, and that results are left as 'not significant' when, for example, a confidence interval for a difference might be more informative.

On the other hand, such methods are very useful when the necessary assumptions of the t distribution method cannot be made, and it would be equally wrong to eschew their use. Rather, we should choose the method most suited to the problem, bearing in mind both the assumptions we are making and what we really want to know. We shall say more about what method to use when in Chapter 14.

There is a common misconception that when the number of observations is very small, usually said to be less than six, Normal distribution methods such as t tests and regression must not be used and that rank methods should be used instead. I have never seen any argument put forward in support of this, but inspection of Tables 12.2, 12.6, 12.8, and 12.9 will show that it is nonsense. For such small samples rank tests cannot produce any significance at the usual 5% level. Should one need statistical analysis of such small samples, Normal methods are required.
12M* Multiple choice questions 62 to 66
(Each branch is either true or false)

62. For comparing the responses to a new treatment of a group of patients with the responses of a control group to a standard treatment, possible approaches include:
(a) the two-sample t method;
(b) the sign test;
(c) the Mann-Whitney U test;
(d) the Wilcoxon matched pairs test;
(e) rank correlation between responses to the treatments.

63. Suitable methods for truly ordinal data include:
(a) the sign test;
(b) the Mann-Whitney U test;
(c) the Wilcoxon matched pairs test;
(d) the two sample t method;
(e) Kendall's rank correlation coefficient.

64. Kendall's rank correlation coefficient between two variables:
(a) depends on which variable is regarded as the predictor;
(b) is zero when there is no relationship;
(c) cannot have a valid significance test when there are tied observations;
(d) must lie between -1 and +1;
(e) is not affected by a log transformation of the variables.

65. Tests of significance based on ranks:
(a) are always to be preferred to methods which assume the data to be Normally distributed;
(b) are less powerful than methods based on the Normal distribution when data are Normally distributed;
(c) enable confidence intervals to be estimated easily;
(d) require no assumptions about the data;
(e) are often to be preferred when data cannot be assumed to follow any particular distribution.

66. Ten men with angina were given an active drug and a placebo on alternate days in random order. Patients were tested using the time in minutes for which they could exercise until angina or fatigue stopped them. The existence of an active drug effect could be examined by:
(a) paired t test;
(b) Mann-Whitney U test;
(c) sign test;
(d) Wilcoxon matched pairs test;
(e) Spearman's ρ.
12E* Exercise: Application of rank methods
In this exercise we shall analyse the respiratory compliance data of §10E using non-parametric methods.

1. For the data of Table 10.19, use the sign test to test the null hypothesis that changing the waveform has no effect on static compliance.
2. Test the same null hypothesis using a test based on ranks.
3. Repeat step 1 using log transformed compliance. Does the transformation make any difference?
4. Repeat step 2 using log compliance. Why do you get a different answer?
5. What do you conclude about the effect of waveform from the non-parametric tests?
6. How do the conclusions of the parametric and non-parametric approaches differ?
13
The analysis of cross-tabulations
13.1 The chi-squared test for association
Table 13.1 shows for a sample of mothers the relationship between housing tenure and whether they had a preterm delivery. This kind of cross-tabulation of frequencies is also called a contingency table or cross-classification. Each entry in the table is a frequency, the number of individuals having these characteristics (§4.1). It can be quite difficult to measure the strength of the association between two qualitative variables like these, but it is easy to test the null hypothesis that there is no relationship or association between the two variables. If the sample is large, we do this by a chi-squared test.

The chi-squared test for association in a contingency table works like this. The null hypothesis is that there is no association between the two variables, the alternative being that there is an association of any kind. We find for each cell of the table the frequency which we would expect if the null hypothesis were true. To do this we use the row and column totals, so we are finding the expected frequencies for tables with these totals, called the marginal totals.

There are 1443 women, of whom 899 were owner occupiers, a proportion 899/1443. If there were no relationship between time of delivery and housing tenure, we would expect each column of the table to have the same proportion, 899/1443, of its members in the first row. Thus the 99 patients in the first column would be expected to have 99 × 899/1443 = 61.7 in the first row. By 'expected' we mean the average frequency we would get in the long run. We could not actually observe 61.7 subjects. The 1344 patients in the second column would be expected to have 1344 × 899/1443 = 837.3 in the first row. The sum of these two expected frequencies is 899, the row total. Similarly, there are 258 patients in the second row and so we would expect 99 × 258/1443 = 17.7 in the second row, first column and 1344 × 258/1443 = 240.3 in the second row, second column. We calculate the expected frequency for each row and column combination, or cell. The 10 cells of Table 13.1 give us the expected frequencies shown in Table 13.2. Notice that the row and column totals are the same as in Table 13.1. In general, the expected frequency for a cell of the contingency table is found by

$$\frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

It does not matter which variable is the row and which the column.
Table 13.1. Contingency table showing time of delivery by housing tenure

Housing tenure        Preterm   Term   Total
Owner-occupier        50        849    899
Council tenant        29        229    258
Private tenant        11        164    175
Lives with parents    6         66     72
Other                 3         36     39
Total                 99        1344   1443
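The expected frequencies can be computed from the marginal totals with a short Python sketch. The dictionary layout and variable names are assumptions made for illustration; the observed frequencies are those of Table 13.1.

```python
# Observed frequencies from Table 13.1: tenure -> (preterm, term).
observed = {
    "Owner-occupier": (50, 849),
    "Council tenant": (29, 229),
    "Private tenant": (11, 164),
    "Lives with parents": (6, 66),
    "Other": (3, 36),
}

row_totals = {tenure: sum(cells) for tenure, cells in observed.items()}
column_totals = [sum(cells[j] for cells in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Expected frequency = row total x column total / grand total.
expected = {
    tenure: tuple(row_totals[tenure] * column_total / grand_total
                  for column_total in column_totals)
    for tenure in observed
}

for tenure, cells in expected.items():
    print(tenure, [round(e, 1) for e in cells])
```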
Wenowcomparetheobservedandexpectedfrequencies.Ifthetwovariablesarenotassociated,theobservedandexpectedfrequenciesshouldbeclosetogether,anydiscrepancybeingduetorandomvariation.Weneedateststatisticwhichmeasuresthis.Thedifferencesbetweenobservedandexpectedfrequenciesareagoodplacetostart.Wecannotsimplysumthemasthesumwouldbezero,bothobservedandexpectedfrequencieshavingthesamegrandtotal,1443.Wecanresolvethisasweresolvedasimilarproblemwithdifferencesfromthemean(§4.7),bysquaringthedifferences.Thesizeofthedifferencewillalsodependinsomewayonthenumberofpatients.Whentherowandcolumntotalsaresmall,thedifferencebetweenobservedandexpectedisforcedtobesmall.Itturnsout,forreasonsdiscussedin§13A,thatthebeststatisticis
Thisisoftenwrittenas
ForTable13.1thisis
Aswillbeexplainedin§13A,thedistributionofthisteststatisticwhenthenullhypothesisistrueandthesampleislargeenoughistheChi-squareddistribution(§7A)with(r-1)(c-1)degreesoffreedom,whereristhenumberofrowsand
cisthenumberofcolumns.Ishalldiscusswhatismeantby‘large
![Page 397: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/397.jpg)
enough’in§13.3.Wearetreatingtherowandcolumntotalsasfixedandonlyconsideringthedistributionoftableswiththesetotals.Thetestissaidtobeconditionalonthesetotals.Wecanprovethatweloseverylittleinformationbydoingthisandwegetasimpletest.
Table 13.2. Expected frequencies under the null hypothesis for Table 13.1

| Housing tenure | Preterm | Term | Total |
|---|---|---|---|
| Owner–occupier | 61.7 | 837.3 | 899 |
| Council tenant | 17.7 | 240.3 | 258 |
| Private tenant | 12.0 | 163.0 | 175 |
| Lives with parents | 4.9 | 67.1 | 72 |
| Other | 2.7 | 36.3 | 39 |
| Total | 99 | 1344 | 1443 |
Fig. 13.1. Percentage point of the Chi-squared distribution
For Table 13.1 we have (5 - 1) × (2 - 1) = 4 degrees of freedom. Table 13.3 shows some percentage points of the Chi-squared distribution for selected degrees of freedom. These are the upper percentage points, as shown in Figure 13.1. We see that for 4 degrees of freedom the 5% point is 9.49 and the 1% point is 13.28, so our observed value of 10.5 has probability between 1% and 5%, or 0.01 and 0.05. If we use a computer program which prints out the actual probability, we find P = 0.03. The data are not consistent with the null hypothesis and we can conclude that there is good evidence of a relationship between housing tenure and time of delivery.
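The whole calculation for Table 13.1 can be sketched as follows. The helper names are hypothetical, and the tail probability uses a closed form that holds only for even degrees of freedom, which is enough for the 4 degrees of freedom here; a statistics package would normally be used instead.

```python
import math

observed = [[50, 849], [29, 229], [11, 164], [6, 66], [3, 36]]  # Table 13.1

def chi_squared(table):
    """Return the statistic sum((O - E)^2 / E) and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def upper_tail_even_df(x, df):
    """Upper tail probability of Chi-squared; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this shortcut only handles even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

stat, df = chi_squared(observed)
p_value = upper_tail_even_df(stat, df)
print(round(stat, 1), df, round(p_value, 2))
```

The statistic should come out close to 10.5 on 4 degrees of freedom, with a probability of about 0.03, in line with the values quoted above.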
The chi-squared statistic is not an index of the strength of the association. If we double the frequencies in Table 13.1, this will double chi-squared, but the strength of the association is unchanged. Note that we can only use the chi-squared test when the numbers in the cells are frequencies, not when they are percentages, proportions or measurements.
Table 13.3. Percentage points of the Chi-squared distribution
Probability that the tabulated value is exceeded (Figure 13.1)

| Degrees of freedom | 10% | 5% | 1% | 0.1% |
|---|---|---|---|---|
| 1 | 2.71 | 3.84 | 6.63 | 10.83 |
| 2 | 4.61 | 5.99 | 9.21 | 13.82 |
| 3 | 6.25 | 7.81 | 11.34 | 16.27 |
| 4 | 7.78 | 9.49 | 13.28 | 18.47 |
| 5 | 9.24 | 11.07 | 15.09 | 20.52 |
| 6 | 10.64 | 12.59 | 16.81 | 22.46 |
| 7 | 12.02 | 14.07 | 18.48 | 24.32 |
| 8 | 13.36 | 15.51 | 20.09 | 26.13 |
| 9 | 14.68 | 16.92 | 21.67 | 27.88 |
| 10 | 15.99 | 18.31 | 23.21 | 29.59 |
| 11 | 17.28 | 19.68 | 24.73 | 31.26 |
| 12 | 18.55 | 21.03 | 26.22 | 32.91 |
| 13 | 19.81 | 22.36 | 27.69 | 34.53 |
| 14 | 21.06 | 23.68 | 29.14 | 36.12 |
| 15 | 22.31 | 25.00 | 30.58 | 37.70 |
| 16 | 23.54 | 26.30 | 32.00 | 39.25 |
| 17 | 24.77 | 27.59 | 33.41 | 40.79 |
| 18 | 25.99 | 28.87 | 34.81 | 42.31 |
| 19 | 27.20 | 30.14 | 36.19 | 43.82 |
| 20 | 28.41 | 31.41 | 37.57 | 45.32 |
Table 13.4. Cough during the day or at night at age 14 for children with and without a history of bronchitis before age 5 (Holland et al. 1978)

| | Bronchitis | No bronchitis | Total |
|---|---|---|---|
| Cough | 26 | 44 | 70 |
| No cough | 247 | 1002 | 1249 |
| Total | 273 | 1046 | 1319 |
13.2 Tests for 2 by 2 tables
Consider the data on cough symptom and history of bronchitis discussed in §9.8. We had 273 children with a history of bronchitis of whom 26 were reported to have day or night cough, and 1046 children without history of bronchitis, of whom 44 were reported to have day or night cough. We can set these data out as a contingency table, as in Table 13.4. We can also use the chi-squared test to test the null hypothesis of no association between cough and history. The expected values are shown in Table 13.5. The test statistic is

$$\frac{(26 - 14.49)^2}{14.49} + \frac{(44 - 55.51)^2}{55.51} + \frac{(247 - 258.51)^2}{258.51} + \frac{(1002 - 990.49)^2}{990.49} = 12.2$$

We have r = 2 rows and c = 2 columns, so there are (r - 1)(c - 1) = (2 - 1) × (2 - 1) = 1 degree of freedom. We see from Table 13.3 that the 5% point is 3.84, and the 1% point is 6.63, so we have observed something very unlikely if the null hypothesis were true. Hence we reject the null hypothesis of no association and conclude that there is a relationship between present cough and history of bronchitis.
Table 13.5. Expected frequencies for Table 13.4

| | Bronchitis | No bronchitis | Total |
|---|---|---|---|
| Cough | 14.49 | 55.51 | 70.00 |
| No cough | 258.51 | 990.49 | 1249.00 |
| Total | 273.00 | 1046.00 | 1319.00 |
Now the null hypothesis ‘no association between cough and bronchitis’ is the same as the null hypothesis ‘no difference between the proportions with cough in the bronchitis and no bronchitis groups’. If there were a difference, the variables would be associated. Thus we have tested the same null hypothesis in two different ways. In fact these tests are exactly equivalent. If we take the Normal deviate from §9.8, which was 3.49, and square it, we get 12.2, the chi-squared value. The method of §9.8 and §8.6 has the advantage that it can also give us a confidence interval for the size of the difference, which the chi-squared method does not. Note that the chi-squared test corresponds to the two-sided z test, even though only the upper tail of the chi-squared distribution is used.
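The equivalence of the two tests can be checked numerically. The sketch below assumes that the Normal deviate uses the pooled proportion under the null hypothesis, which is the form for which the square equals chi-squared exactly; the variable names are illustrative.

```python
import math

# Table 13.4: cough at age 14 by history of bronchitis before age 5.
cough = [26, 44]        # bronchitis, no bronchitis
no_cough = [247, 1002]

n1, n2 = cough[0] + no_cough[0], cough[1] + no_cough[1]
p1, p2 = cough[0] / n1, cough[1] / n2
pooled = (cough[0] + cough[1]) / (n1 + n2)

# Large sample Normal test with the pooled proportion under the null hypothesis.
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Chi-squared statistic from observed and expected frequencies.
table = [cough, no_cough]
row_totals = [sum(r) for r in table]
col_totals = [n1, n2]
n = n1 + n2
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)

# The two-sided Normal P value equals the upper tail of Chi-squared with 1 df.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(round(z, 2), round(z ** 2, 1), round(chi2, 1), round(p_value, 4))
```

The squared deviate and the chi-squared statistic agree to within floating point error, which is why only the upper tail of the Chi-squared distribution is needed for a two-sided test.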
13.3 The chi-squared test for small samples
When the null hypothesis is true, the test statistic Σ(O - E)²/E, which we can call the chi-squared statistic, follows the Chi-squared distribution provided the expected values are large enough. This is a large sample test, like those of §9.7 and §9.8. The smaller the expected values become, the more dubious will be the test.
The conventional criterion for the test to be valid is usually attributed to the great statistician W. G. Cochran. The rule is this: the chi-squared test is valid if at least 80% of the expected frequencies exceed 5 and all the expected frequencies exceed 1. We can see that Table 13.2 satisfies this requirement, since only 2 out of 10 expected frequencies, 20%, are less than 5 and none is less than 1. Note that this condition applies to the expected frequencies, not the observed frequencies. It is quite acceptable for an observed frequency to be 0, provided the expected frequencies meet the criterion.
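Cochran's rule is easy to apply mechanically. The following sketch uses a hypothetical helper, `meets_cochran_rule`, applied to the expected frequencies of Table 13.2 and, for contrast, to those of Table 13.6 in this chapter.

```python
def meets_cochran_rule(expected):
    """Check the rule: at least 80% of expected frequencies exceed 5 and all exceed 1."""
    cells = [e for row in expected for e in row]
    share_above_5 = sum(e > 5 for e in cells) / len(cells)
    return share_above_5 >= 0.8 and all(e > 1 for e in cells)

table_13_2 = [[61.7, 837.3], [17.7, 240.3], [12.0, 163.0], [4.9, 67.1], [2.7, 36.3]]
table_13_6 = [[8.4, 9.6], [4.2, 4.8], [2.3, 2.7]]
print(meets_cochran_rule(table_13_2), meets_cochran_rule(table_13_6))
```

Table 13.2 passes, with exactly 80% of its cells above 5, while Table 13.6 fails because four of its six expected frequencies are below 5.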
This criterion is open to question. Simulation studies appear to suggest that the condition may be too conservative and that the chi-squared approximation works for smaller expected values, especially for larger numbers of rows and columns. At the time of writing the analysis of tables based on small sample sizes, particularly 2 by 2 tables, is the subject of hot dispute among statisticians. As yet, no-one has succeeded in devising a better rule than Cochran's, so I would recommend keeping to it until the theoretical questions are resolved. Any chi-squared test which does not satisfy the criterion is always open to the charge that its validity is in doubt.
Table 13.6. Observed and expected frequencies of categories of radiological appearance at six months as compared with appearance on admission in the MRC streptomycin trial, patients with an initial temperature of 100–100.9°F

| Radiological assessment | Streptomycin observed | Streptomycin expected | Control observed | Control expected |
|---|---|---|---|---|
| Improvement | 13 | 8.4 | 5 | 9.6 |
| Deterioration | 2 | 4.2 | 7 | 4.8 |
| Death | 0 | 2.3 | 5 | 2.7 |
| Total | 15 | 15 | 17 | 17 |
If the criterion is not satisfied we can usually combine or delete rows and columns to give bigger expected values. Of course, this cannot be done for 2 by 2 tables, which we consider in more detail below. For example, Table 13.6 shows data from the MRC streptomycin trial (§2.2), the results of radiological assessment for a subgroup of patients defined by a prognostic variable. We want to know whether there is evidence of a streptomycin effect within this subgroup, so we want to test the null hypothesis of no effect using a chi-squared test. There are 4 out of 6 expected values less than 5, so the test on this table would not be valid. We can combine the rows so as to raise the expected values. Since the small expected frequencies are in the ‘deterioration’ and ‘death’ rows, it makes sense to combine these to give a ‘deterioration or death’ row. The expected values are then all greater than 5 and we can do the chi-squared test with 1 degree of freedom. This editing must be done with regard to the meaning of the various categories. In Table 13.6, there would be no point in combining rows 1 and 3 to give a new category of ‘considerable improvement or death’ to be compared to the remainder, as the comparison would be absurd. The new table is shown in Table 13.7. We have

$$\frac{(13 - 8.4)^2}{8.4} + \frac{(5 - 9.6)^2}{9.6} + \frac{(2 - 6.6)^2}{6.6} + \frac{(12 - 7.4)^2}{7.4} = 10.8$$

Under the null hypothesis this is from a Chi-squared distribution with one degree of freedom, and from Table 13.3 we can see that the probability of getting a value as extreme as 10.8 is less than 1%. We have data inconsistent with the null hypothesis and we can conclude that the evidence suggests a treatment effect in this subgroup.
If the table does not meet the criterion even after reduction to a 2 by 2 table, we can apply either a continuity correction to improve the approximation to the Chi-squared distribution (§13.5), or an exact test based on a discrete distribution (§13.4).
Table 13.7. Reduction of Table 13.6 to a 2 by 2 table

| Radiological assessment | Streptomycin observed | Streptomycin expected | Control observed | Control expected |
|---|---|---|---|---|
| Improvement | 13 | 8.4 | 5 | 9.6 |
| Deterioration or death | 2 | 6.6 | 12 | 7.4 |
| Total | 15 | 15.0 | 17 | 17.0 |
Table 13.8. Artificial data to illustrate Fisher's exact test

| | Survived | Died | Total |
|---|---|---|---|
| Treatment A | 3 | 1 | 4 |
| Treatment B | 2 | 2 | 4 |
| Total | 5 | 3 | 8 |
13.4 Fisher's exact test
The chi-squared test described in §13.1 is a large sample test. When the sample is not large and expected values are less than 5, we can turn to an exact distribution like that for the Mann–Whitney U statistic (§12.2). This method is called Fisher's exact test.
The exact probability distribution for the table can only be found when the row and column totals are given. Just as with the large sample chi-squared test, we restrict our attention to tables with these totals. This difficulty has led to much controversy about the use of this test. I shall show how the test works, then discuss its applicability.
Consider the following artificial example. In an experiment, we randomly allocate 4 patients to treatment A and 4 to treatment B, and get the outcome shown in Table 13.8. We want to know the probability of so large a difference in mortality between the two groups if the treatments have the same effect (the null hypothesis). We could have randomized the subjects into two groups in many ways, but if the null hypothesis is true the same three would have died. The row and column totals would therefore be the same for all these possible allocations. If we keep the row and column totals constant, there are only 4 possible tables, shown in Table 13.9. These tables are found by putting the values 0, 1, 2, 3 in the ‘Died in group A’ cell. Any other values would make the D total greater than 3.
Now, let us label our subjects a to h. The survivors we will call a to e, and the deaths f to h. How many ways can these patients be arranged in two groups of 4 to give tables i, ii, iii and iv? Table i can arise in 5 ways. Patients f, g, and h would have to be in group B, to give 3 deaths, and the remaining member of B could be a, b, c, d or e. Table ii can arise in 30 ways. The 3 survivors in group A can be abc, abd, abe, acd, ace, ade, bcd, bce, bde, cde, 10 ways. The death in A can be f, g or h, 3 ways. Hence the group can be made up in 10 × 3 = 30 ways. Table iii is the same as table ii, with A and B reversed, so arises in 30 ways. Table iv is the same as table i with A and B reversed, so arises in 5 ways.
Hence we can arrange the 8 patients into 2 groups of 4 in 5 + 30 + 30 + 5 = 70 ways. Now, the probability of any one arrangement arising by chance is 1/70, since they are all equally likely if the null hypothesis is true. Table i arises from 5 of the 70 arrangements, so has probability 5/70 = 0.071. Table ii arises from 30 out of 70 arrangements, so has probability 30/70 = 0.429. Similarly, Table iii has probability 30/70 = 0.429, and Table iv has probability 5/70 = 0.071.
Table 13.9. Possible tables for the totals of Table 13.8 (S = survived, D = died, T = total)

i.

| | S | D | T |
|---|---|---|---|
| A | 4 | 0 | 4 |
| B | 1 | 3 | 4 |
| T | 5 | 3 | 8 |

ii.

| | S | D | T |
|---|---|---|---|
| A | 3 | 1 | 4 |
| B | 2 | 2 | 4 |
| T | 5 | 3 | 8 |

iii.

| | S | D | T |
|---|---|---|---|
| A | 2 | 2 | 4 |
| B | 3 | 1 | 4 |
| T | 5 | 3 | 8 |

iv.

| | S | D | T |
|---|---|---|---|
| A | 1 | 3 | 4 |
| B | 4 | 0 | 4 |
| T | 5 | 3 | 8 |
Hence, under the null hypothesis that there is no association between treatment and survival, Table ii, which we observed, has a probability of 0.429. It could easily have arisen by chance and so it is consistent with the null hypothesis. As in §9.2, we must also consider tables more extreme than the observed. In this case, there is one more extreme table in the direction of the observed difference, Table i. In the direction of the observed difference, the probability of the observed table or a more extreme one is 0.071 + 0.429 = 0.5. This is the P value for a one-sided test (§9.5).
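The counting argument can be reproduced by brute force. This sketch labels the patients a to h as in the text and enumerates every possible choice of the four patients for treatment A; the variable names are illustrative only.

```python
from itertools import combinations
from collections import Counter

patients = "abcdefgh"
deaths = set("fgh")  # under the null hypothesis the same three patients die

# Every way of choosing the 4 patients who receive treatment A,
# classified by the number of deaths that fall in group A.
counts = Counter(
    len(deaths & set(group_a)) for group_a in combinations(patients, 4)
)
total = sum(counts.values())
probabilities = {d: counts[d] / total for d in sorted(counts)}

# The observed table has 1 death in group A; the more extreme table has 0.
one_sided_p = probabilities[0] + probabilities[1]
print(total, dict(sorted(counts.items())), round(one_sided_p, 2))
```

The enumeration gives 70 arrangements split 5, 30, 30 and 5 across the four tables, and a one-sided probability of 0.5.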
Fisher's exact test is essentially one sided. It is not clear what the corresponding deviations in the other direction would be, especially when all the marginal totals are different. This is because in that case the distribution is asymmetrical, unlike those of §12.2–5. One solution is to double the one-sided probability to get a two-sided test when this is required. I follow Armitage and Berry (1994) in preferring this option. Another solution is to calculate probabilities for every possible table and sum all probabilities less than or equal to the probability for the observed table to give the P value. This may give a smaller P value than the doubling method.
There is no need to enumerate all the possible tables, as above. The probability can be found from a simple formula (§13B). The probability of observing a set of frequencies f11, f12, f21, f22, when the row and column totals are r1, r2, c1, and c2 and the grand total is n, is

$$\frac{r_1!\, r_2!\, c_1!\, c_2!}{n!\, f_{11}!\, f_{12}!\, f_{21}!\, f_{22}!}$$

(See §6A for the meaning of n!.) We can calculate this for each possible table and so find the probability for the observed table and each more extreme one. For the example:

$$\text{Table i:}\quad \frac{4!\, 4!\, 5!\, 3!}{8!\, 4!\, 0!\, 1!\, 3!} = 0.071 \qquad \text{Table ii:}\quad \frac{4!\, 4!\, 5!\, 3!}{8!\, 3!\, 1!\, 2!\, 2!} = 0.429$$

giving a total of 0.50 as before.
Unlike the exact distributions for the rank statistics, this distribution is fairly easy to calculate but difficult to tabulate. A good table of this distribution required a small book (Finney et al. 1963).
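We can apply this test to Table 13.7. The 2 by 2 tables to be tested, the observed table and the two more extreme tables with the same totals, and their probabilities are:

| Improvement, streptomycin | Improvement, control | Deterioration or death, streptomycin | Deterioration or death, control | Probability |
|---|---|---|---|---|
| 13 | 5 | 2 | 12 | 0.0013782 |
| 14 | 4 | 1 | 13 | 0.0000757 |
| 15 | 3 | 0 | 14 | 0.0000014 |

The total one-sided probability is 0.0014553, which doubled for a two-sided test gives 0.0029. The method using all smaller probabilities gives P = 0.00159. Either is larger than the probability for the χ² value of 10.6, which is 0.0011. (The value 10.6 is obtained from unrounded expected frequencies; the rounded expected frequencies shown in Table 13.7 give 10.8.)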
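These probabilities can be checked with a short calculation. The sketch writes the factorial formula in its equivalent binomial-coefficient form; `table_probability` is a hypothetical helper, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def table_probability(f11, r1, r2, c1):
    """Probability of a 2 by 2 table with top-left cell f11, given fixed margins."""
    return comb(r1, f11) * comb(r2, c1 - f11) / comb(r1 + r2, c1)

# Table 13.7: rows are improvement (18) and deterioration or death (14);
# the first column is streptomycin (15 patients); 13 streptomycin patients improved.
r1, r2, c1, observed = 18, 14, 15, 13
lowest, highest = max(0, c1 - r2), min(r1, c1)
probs = {f: table_probability(f, r1, r2, c1) for f in range(lowest, highest + 1)}

one_sided = sum(p for f, p in probs.items() if f >= observed)
doubled = 2 * one_sided
# Small tolerance so that ties with the observed probability are included.
sum_of_smaller = sum(p for p in probs.values() if p <= probs[observed] * (1 + 1e-9))
print(round(one_sided, 7), round(doubled, 4), round(sum_of_smaller, 5))
```

Both two-sided versions are computed from the same set of table probabilities; they differ only in how the opposite tail is defined, which is why the doubled value is the larger of the two here.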
Fisher's exact test was originally devised for the 2 × 2 table and only used when the expected frequencies were small. This was because for larger numbers and larger tables the calculations were impractical. With computers things have changed, and Fisher's exact test can be done for any 2 × 2 table. Some programs will also calculate Fisher's exact test for larger tables. As the number of rows and columns increases, the number of possible tables increases very rapidly and it becomes impracticable to calculate and store the probability for each one. There are specialist programs such as StatXact which create a random sample of the possible tables and use them to estimate a distribution of probabilities whose tail area is then found. Methods which sample the possibilities in this way are (rather endearingly) called Monte Carlo methods.
13.5 Yates' continuity correction for the 2 by 2 table
The discrepancy in probabilities between the chi-squared test and Fisher's exact test arises because we are estimating the discrete distribution of the test statistic by the continuous Chi-squared distribution. A continuity correction like those of §12.6, called Yates' correction, can be used to improve the fit. The observed frequencies change in units of one, so we make them closer to their expected values by one half. Hence the formula for the corrected chi-squared statistic for a 2 by 2 table is

$$\sum \frac{\left(|O - E| - \tfrac{1}{2}\right)^2}{E}$$

For Table 13.7 this is about 8.4 when calculated from unrounded expected frequencies. This has probability 0.0037, which is closer to the exact probability, though there is still a considerable discrepancy. At such extremely low values any approximate probability model such as this is liable to break down. In the critical area between 0.10 and 0.01, the continuity correction usually gives a very good fit to the exact probability. As Fisher's exact test is now so easy to do, Yates' correction may soon disappear.
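A minimal sketch of the corrected statistic for Table 13.7 follows. It assumes unrounded expected frequencies, and it obtains the tail probability for 1 degree of freedom from the complementary error function; both function names are hypothetical.

```python
import math

def yates_chi_squared(table):
    """Continuity-corrected chi-squared for a 2 by 2 table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            stat += (abs(table[i][j] - e) - 0.5) ** 2 / e
    return stat

def upper_tail_1df(x):
    """Upper tail probability of the Chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

table_13_7 = [[13, 5], [2, 12]]   # observed frequencies only
corrected = yates_chi_squared(table_13_7)
p_corrected = upper_tail_1df(corrected)
print(round(corrected, 2), round(p_corrected, 4))
```

The probability should be close to the 0.0037 quoted above, compared with 0.0029 from the doubled exact test and 0.0011 from the uncorrected statistic.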
13.6* The validity of Fisher's and Yates' methods
There has been much dispute among statisticians about the validity of the exact test and the continuity correction which approximates to it. Among the more argumentative of the founding fathers of statistical inference, such as Fisher and Neyman, this was quite acrimonious. The problem is still unresolved, and generating almost as much heat as light.
Note that although both are 2 by 2 tables, Tables 13.4 and 13.7 arose in different ways. In Table 13.7, the column totals were fixed by the design of the experiment and only the row totals are from a random variable. In Table 13.4 neither row nor column totals were set in advance. Both are from the Binomial distribution, depending on the incidence of bronchitis and prevalence of chronic cough in the population. There is a third possibility, that both the row and column totals are fixed. This is rare in practice, but it can be achieved by the following experimental design. We want to know whether a subject can distinguish an active treatment from a placebo. We present him with 10 tablets, 5 of each, and ask him to sort the tablets into the 5 active and 5 placebo. This would give a 2 by 2 table, subject's choice versus truth, in which all row and column totals are preset to 5. There are several variations on these types of table, too. It can be shown that the same chi-squared test applies to all these cases when samples are large. When samples are small, this is not necessarily so. A discussion of the problem is well beyond the scope of this book. For some of these cases, Fisher's exact test and Yates' correction may be conservative, that is, give rather larger probabilities than they should, though this is a matter of debate. My own opinion is that Yates' correction and Fisher's exact test should be used. If we must err, it seems better to err on the side of caution.
Table 13.10. The 2 by 2 table in symbolic notation

| | | | Total |
|---|---|---|---|
| | a | b | a + b |
| | c | d | c + d |
| Total | a + c | b + d | a + b + c + d |
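13.7 Odds and odds ratios
If the probability of an event is p then the odds of that event is o = p/(1 - p). The probability that a coin shows a head is 0.5, the odds is 0.5/(1 - 0.5) = 1. Note that ‘odds’ is a singular word, not the plural of ‘odd’. The odds has advantages for some types of analysis, as it is not constrained to lie between 0 and 1, but can take any value from zero to infinity. We often use the logarithm to the base e of the odds, the log odds or logit:

$$\text{logit}(p) = \log_e(o) = \log_e\left(\frac{p}{1 - p}\right)$$

This can vary from minus infinity to plus infinity and thus is very useful in fitting regression type models (§17.8). The logit is zero when p = 1/2 and the logit of 1 - p is minus the logit of p:

$$\text{logit}(1 - p) = \log_e\left(\frac{1 - p}{p}\right) = -\log_e\left(\frac{p}{1 - p}\right) = -\text{logit}(p)$$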
Consider Table 13.4. The probability of cough for children with a history of bronchitis is 26/273 = 0.09524. The odds of cough for children with a history of bronchitis is 26/247 = 0.10526. The probability of cough for children without a history of bronchitis is 44/1046 = 0.04207. The odds of cough for children without a history of bronchitis is 44/1002 = 0.04391.
One way to compare children with and without bronchitis is to find the ratio of the proportions of children with cough in the two groups (the relative risk, §8.6). Another is to find the odds ratio, the ratio of the odds of cough in children with bronchitis and children without bronchitis. This is (26/247)/(44/1002) = 0.10526/0.04391 = 2.39718. Thus the odds of cough in children with a history of bronchitis is 2.39718 times the odds of cough in children without a history of bronchitis.
If we denote the frequencies in the table by a, b, c, and d, as in Table 13.10, the odds ratio is given by

$$or = \frac{a/c}{b/d} = \frac{ad}{bc}$$

This is symmetrical; we get the same thing by

$$\frac{a/b}{c/d} = \frac{ad}{bc}$$

We can estimate the standard error and confidence interval using the log of the odds ratio (§13C). The standard error of the log odds ratio is:

$$SE(\log_e(or)) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

Hence we can find the 95% confidence interval. For Table 13.4, the log odds ratio is log_e(2.39718) = 0.87429, with standard error

$$\sqrt{\frac{1}{26} + \frac{1}{44} + \frac{1}{247} + \frac{1}{1002}} = 0.25736$$

Provided the sample is large enough, we can assume that the log odds ratio comes from a Normal distribution and hence the approximate 95% confidence interval is
0.87429 - 1.96 × 0.25736 to 0.87429 + 1.96 × 0.25736 = 0.36986 to 1.37872
To get a confidence interval for the odds ratio itself we must antilog:

$$e^{0.36986} \text{ to } e^{1.37872} = 1.45 \text{ to } 3.97$$
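The odds ratio and its confidence interval for Table 13.4 can be sketched as below. The function name and argument order are assumptions of the sketch, following the a, b, c, d layout of Table 13.10.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a large sample confidence interval via the log odds ratio."""
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower, upper = math.exp(log_or - z * se), math.exp(log_or + z * se)
    return odds_ratio, se, lower, upper

# Table 13.4 laid out as in Table 13.10: a=26, b=44, c=247, d=1002.
odds_ratio, se, lower, upper = odds_ratio_ci(26, 44, 247, 1002)
print(round(odds_ratio, 5), round(se, 5), round(lower, 2), round(upper, 2))
```

Computed without intermediate rounding the odds ratio is about 2.3971 rather than 2.39718, because the figure in the text is the ratio of odds already rounded to five decimal places. The confidence limits are unaffected at two decimal places.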
The odds ratio can be used to estimate the relative risk in a case–control study. The calculation of relative risk in §8.6 depended on the fact that we could estimate the risks. We could do this because we had a prospective study and so knew how many of the risk group developed the symptom. This cannot be done if we start with the outcome, in this case cough at age 14, and try to work back to the risk factor, bronchitis, as in a case–control study.
Table 13.11 shows data from a case–control study of smoking and lung cancer (see §3.8). We start with a group of cases, patients with lung cancer, and a group of controls, here hospital patients without cancer. We cannot calculate risks (the column totals would be meaningless and have been omitted), but we can still estimate the relative risk.
Suppose the prevalence of lung cancer is p, a small number, and the table is as Table 13.10. Then we can estimate the probability of both having lung cancer and being a smoker by pa/(a + b), because a/(a + b) is the conditional probability of smoking in lung cancer patients (§6.8). Similarly, the probability of being a smoker without lung cancer is (1 - p)c/(c + d). The probability of being a smoker is therefore pa/(a + b) + (1 - p)c/(c + d), the probability of being a smoker with lung cancer plus the probability of being a smoker without lung cancer. Because p is much smaller than 1 - p, the first term can be ignored and the probability of being a smoker is approximately (1 - p)c/(c + d). The risk of lung cancer for smokers is found by dividing the probability of being a smoker with lung cancer by the probability of being a smoker:

$$\frac{pa/(a + b)}{(1 - p)c/(c + d)} = \frac{pa(c + d)}{(1 - p)c(a + b)}$$
Table 13.11. Smokers and non-smokers among male cancer patients and controls (Doll and Hill 1950)

| | Smokers | Non-smokers | Total |
|---|---|---|---|
| Lung cancer | 647 | 2 | 649 |
| Controls | 622 | 27 | 649 |
Similarly, the probability of both being a non-smoker and having lung cancer is pb/(a + b) and the probability of being a non-smoker without lung cancer is (1 - p)d/(c + d). The probability of being a non-smoker is therefore pb/(a + b) + (1 - p)d/(c + d), and since p is much smaller than 1 - p, the first term can be ignored and the probability of being a non-smoker is approximately (1 - p)d/(c + d). This gives a risk of lung cancer among non-smokers of approximately

$$\frac{pb/(a + b)}{(1 - p)d/(c + d)} = \frac{pb(c + d)}{(1 - p)d(a + b)}$$

The relative risk of lung cancer for smokers is thus, approximately,

$$\frac{pa(c + d)}{(1 - p)c(a + b)} \bigg/ \frac{pb(c + d)}{(1 - p)d(a + b)} = \frac{ad}{bc}$$

This is, of course, the odds ratio. Thus for case–control studies the relative risk is approximated by the odds ratio.
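The quality of this approximation can be examined numerically. In the sketch below the prevalences are hypothetical values chosen only for illustration, and the relative risk is computed without dropping the small first term, so that it can be compared with ad/bc.

```python
def relative_risk(p, a, b, c, d):
    """Risk ratio, smokers vs non-smokers, implied by prevalence p and a case-control table."""
    # Joint probabilities, using conditional probabilities within cases and controls.
    smoker_case = p * a / (a + b)
    smoker_control = (1 - p) * c / (c + d)
    nonsmoker_case = p * b / (a + b)
    nonsmoker_control = (1 - p) * d / (c + d)
    risk_smokers = smoker_case / (smoker_case + smoker_control)
    risk_nonsmokers = nonsmoker_case / (nonsmoker_case + nonsmoker_control)
    return risk_smokers / risk_nonsmokers

a, b, c, d = 647, 2, 622, 27          # Table 13.11
odds_ratio = (a * d) / (b * c)
for p in (0.1, 0.01, 0.001):          # hypothetical prevalences, not from the source
    print(p, round(relative_risk(p, a, b, c, d), 2), round(odds_ratio, 2))
```

As the assumed prevalence falls, the implied relative risk moves towards the odds ratio, which is the sense in which the odds ratio approximates the relative risk for a rare disease.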
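For Table 13.11 we have

$$\frac{ad}{bc} = \frac{647 \times 27}{2 \times 622} = 14.04$$

Thus the risk of lung cancer in smokers is about 14 times that of non-smokers. This is a surprising result from a table with so few non-smokers, but a direct estimate from the cohort study (Table 3.1) is 0.90/0.07 = 12.9, which is very similar. The log odds ratio is 2.64210 and its standard error is

$$\sqrt{\frac{1}{647} + \frac{1}{2} + \frac{1}{622} + \frac{1}{27}} = 0.73498$$

Hence the approximate 95% confidence interval is

$$2.64210 - 1.96 \times 0.73498 \text{ to } 2.64210 + 1.96 \times 0.73498 = 1.202 \text{ to } 4.083$$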
Table 13.12. Cough during the day or at night and cigarette smoking by 12-year-old boys (Bland et al. 1978)
Boy's smoking

| | Non-smoker | Occasional | Regular |
|---|---|---|---|
| Cough | 266 (20.4%) | 395 (28.8%) | 80 (46.5%) |
| No cough | 1037 (79.6%) | 977 (71.2%) | 92 (53.5%) |
| Total | 1303 (100.0%) | 1372 (100.0%) | 172 (100.0%) |
To get a confidence interval for the odds ratio itself we must antilog:

$$e^{1.202} \text{ to } e^{4.083} = 3.33 \text{ to } 59.3$$

The very wide confidence interval is because the numbers of non-smokers, particularly for lung cancer cases, are so small.
13.8* The chi-squared test for trend
Consider the data of Table 13.12. Using the chi-squared test described in §13.1, we can test the null hypothesis that there is no relationship between reported cough and smoking against the alternative that there is a relationship of some sort. The chi-squared statistic is 64.25, with 2 degrees of freedom, P < 0.001. The data are not consistent with the null hypothesis.
Now, we would have got the same value of chi-squared whatever the order of the columns. The test ignores the natural ordering of the columns, but we might expect that if there were a relationship between reported cough and smoking, the prevalence of cough would be greater for greater amounts of smoking. In other words, we look for a trend in cough prevalence from one end of the table to the other. We can test for this using the chi-squared test for trend.
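First, we define two random variables, X and Y, whose values depend on the categories of the row and column variables. For example, we could put X = 1 for non-smokers, X = 2 for occasional smokers and X = 3 for regular smokers, and put Y = 1 for ‘cough’ and Y = 2 for ‘no cough’. Then for a non-smoker who coughs, the value of X is 1 and the value of Y is 1. Both X and Y may have more than two categories, provided both are ordered. If there are n individuals, we have n pairs of observations (xi, yi). If there is a linear trend across the table, there will be linear regression of Y on X which has non-zero slope. We fit the usual least squares regression line, Y = a + bX, where

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad SE(b) = \sqrt{\frac{s^2}{\sum (x_i - \bar{x})^2}}$$

and where s² is the estimated variance of Y. In simple linear regression, as described in Chapter 11, we are usually concerned with estimating b and making statements about its precision. Here we are only going to test the null hypothesis that in the population b = 0. Under the null hypothesis, the variance about the line is equal to the total variance of Y, since the line has zero slope. We use the estimate

$$s^2 = \frac{1}{n} \sum (y_i - \bar{y})^2$$

(We use n as the denominator, not n - 1, because the test is conditional on the row and column totals as described in §13A. There is a good reason for it, but it is not worth going into here.) As in §11.5, the standard error of b is

$$SE(b) = \sqrt{\frac{\frac{1}{n} \sum (y_i - \bar{y})^2}{\sum (x_i - \bar{x})^2}}$$

The test statistic is the square of b divided by its standard error:

$$\chi^2 = \left(\frac{b}{SE(b)}\right)^2 = \frac{n \left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$$

For practical calculations we use the alternative forms of the sums of squares and products:

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \quad \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}, \quad \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}$$

Note that it does not matter which variable is X and which is Y. The sums of squares and products are easy to work out. For example, for the column variable, X, we have 1303 individuals with X = 1, 1372 with X = 2 and 172 with X = 3. For our data we have

$$\sum x_i^2 = 1303 \times 1^2 + 1372 \times 2^2 + 172 \times 3^2 = 8339, \qquad \sum x_i = 1303 \times 1 + 1372 \times 2 + 172 \times 3 = 4563$$

$$\sum (x_i - \bar{x})^2 = 8339 - \frac{4563^2}{2847} = 1025.70$$

Similarly, Σyi² = 9165 and Σyi = 4953;

$$\sum (y_i - \bar{y})^2 = 9165 - \frac{4953^2}{2847} = 548.14, \qquad \sum (x_i - \bar{x})(y_i - \bar{y}) = 7830 - \frac{4563 \times 4953}{2847} = -108.37$$

$$\chi^2 = \frac{2847 \times (-108.37)^2}{1025.70 \times 548.14} = 59.47$$

If the null hypothesis is true, this χ² is an observation from the Chi-squared distribution with 1 degree of freedom. The value 59.47 is highly unlikely from this distribution and the trend is significant.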
There are several points to note about this method. The choice of values for X and Y is arbitrary. By putting X = 1, 2 or 3 we assumed that the difference between non-smokers and occasional smokers is the same as that between occasional smokers and regular smokers. This need not be so and a different choice of X would give a different chi-squared for trend statistic. The choice is not critical, however. For example, putting X = 1, 2 or 4, so making regular smokers more different from occasional smokers than occasional smokers are from non-smokers, we get χ² for trend to be 64.22. The fit to the data is rather better, but the conclusions are unchanged.
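The effect of the choice of scores can be explored with a short sketch. It uses the form n × Sxy² / (Sxx × Syy), which is the square of the slope divided by its standard error when the variance of Y is estimated with divisor n; the function name and argument layout are assumptions of the sketch.

```python
def chi_squared_trend(counts, x_scores, y_scores):
    """Chi-squared for trend: n * Sxy^2 / (Sxx * Syy), with 1 degree of freedom.

    counts[i][j] is the frequency for row score y_scores[i] and column score x_scores[j].
    """
    n = sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0
    for i, row in enumerate(counts):
        for j, f in enumerate(row):
            x, y = x_scores[j], y_scores[i]
            n += f
            sum_x += f * x
            sum_y += f * y
            sum_xx += f * x * x
            sum_yy += f * y * y
            sum_xy += f * x * y
    sxx = sum_xx - sum_x ** 2 / n
    syy = sum_yy - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return n * sxy ** 2 / (sxx * syy)

# Table 13.12: rows are cough (Y=1) and no cough (Y=2); columns are smoking categories.
table_13_12 = [[266, 395, 80], [1037, 977, 92]]
equal_steps = chi_squared_trend(table_13_12, [1, 2, 3], [1, 2])
wider_step = chi_squared_trend(table_13_12, [1, 2, 4], [1, 2])
print(round(equal_steps, 2), round(wider_step, 2))
```

The two results should be close to 59.47 and 64.22 respectively. Swapping the roles of X and Y leaves the statistic unchanged, since the formula is symmetric in the two variables.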
The trend may be significant even if the overall contingency table chi-squared is not. This is because the test for trend has greater power for detecting trends than has the ordinary chi-squared test. On the other hand, if we had an association where those who were occasional smokers had far more symptoms than either non-smokers or regular smokers, the trend test would not detect it. If the hypothesis we wish to test involves the order of the categories, we should use the trend test; if it does not, we should use the contingency table test of §13.1. Note that the trend test statistic is always less than the overall chi-squared statistic.
The distribution of the trend chi-squared statistic depends on a large sample regression model, not on the theory given in §13A. The table does not have to meet Cochran's rule (§13.3) for the trend test to be valid. As long as there are at least 30 observations the approximation should be valid.
Some computer programs offer a slightly different test, the Mantel–Haenszel trend test (not to be confused with the Mantel–Haenszel method for combining 2 by 2 tables, §17.11). This is almost identical to the method described here. As an alternative to the chi-squared test for trend, we could calculate Kendall's rank correlation coefficient, τb, between X and Y (§12.5). For Table 13.12 we get τb = -0.136 with standard error 0.018. We get a χ² statistic with 1 degree of freedom by (τb/SE(τb))² = 57.09. This is very similar to the χ² for trend value 59.47.
13.9* Methods for matched samples
The chi-squared test described above enables us, among other things, to test the null hypothesis that binomial proportions estimated from two independent samples are the same. We can do this for the one sample or matched sample problem also. For example, Holland et al. (1978) obtained respiratory symptom questionnaires for 1319 Kent schoolchildren at ages 12 and 14. One question we asked was whether the prevalence of reported symptoms was different at the two ages. At age 12, 356 (27%) children were reported to have had severe colds in the past 12 months compared to 468 (35%) at age 14. Was there evidence of a real increase? Just as in the one sample or paired t test (§10.2) we would hope to improve our analysis by taking into account the fact that this is the same sample. We might expect, for instance, that symptoms on the two occasions will be related.
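Table 13.13. Severe colds reported at two ages for Kent schoolchildren (Holland et al. 1978)

| Severe colds at age 12 | Severe colds at age 14: Yes | Severe colds at age 14: No | Total |
|---|---|---|---|
| Yes | 212 | 144 | 356 |
| No | 256 | 707 | 963 |
| Total | 468 | 851 | 1319 |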
The method which enables us to do this is McNemar's test, another version of the sign test. We need to know that 212 children were reported to have colds on both occasions, 144 to have colds at 12 but not at 14, 256 to have colds at 14 but not at 12 and 707 to have colds at neither age. Table 13.13 shows the data in tabular form.
The null hypothesis is that the proportions saying yes on the first and second occasions are the same, the alternative being that one exceeds the other. This is a hypothesis about the row and column totals, quite different from that for the contingency table chi-squared test. If the null hypothesis were true we would expect the frequencies for ‘yes, no’ and ‘no, yes’ to be equal. In other words, as many should go up as down. (Compare this with the sign test, §9.2.) If we denote these frequencies by fyn and fny, then the expected frequencies will be (fyn + fny)/2. We get the test statistic:

$$\sum \frac{(O - E)^2}{E} = \frac{\left(f_{yn} - \frac{f_{yn} + f_{ny}}{2}\right)^2}{\frac{f_{yn} + f_{ny}}{2}} + \frac{\left(f_{ny} - \frac{f_{yn} + f_{ny}}{2}\right)^2}{\frac{f_{yn} + f_{ny}}{2}}$$

which follows a Chi-squared distribution provided the expected values are large enough. There are two observed frequencies and one constraint, that the sum of the observed frequencies = the sum of the expected frequencies. Hence there is one degree of freedom. Like the chi-squared test (§13.1) and Fisher's exact test (§13.4), we assume a total to be fixed. In this case it is fyn + fny, not the row and column totals, which are what we are testing. The test statistic can be simplified considerably, to:

$$\frac{(f_{yn} - f_{ny})^2}{f_{yn} + f_{ny}}$$

For Table 13.13, we have

$$\frac{(144 - 256)^2}{144 + 256} = 31.36$$

This can be referred to Table 13.3 with one degree of freedom and is clearly highly significant. There was a difference between the two ages. As there was no change in any of the other symptoms studied, we thought that this was possibly due to an epidemic of upper respiratory tract infection just before the second questionnaire.
There is a continuity correction, again due to Yates. If the observed frequency fyn increases by 1, fny decreases by 1 and fyn - fny increases by 2. Thus half the difference between adjacent possible values is 1 and we make the observed difference nearer to the expected difference (zero) by 1. Thus the continuity corrected test statistic is

$$\frac{(|f_{yn} - f_{ny}| - 1)^2}{f_{yn} + f_{ny}}$$

where |fyn - fny| is the absolute value, without sign. For Table 13.13:

$$\frac{(|144 - 256| - 1)^2}{144 + 256} = 30.80$$

There is very little difference because the expected values are so large, but if the expected values are small, say less than 20, the correction is advisable. For small samples, we can also take fny as an observation from the Binomial distribution with p = ½ and n = fyn + fny and proceed as for the sign test (§9.2).
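The three versions of the test can be sketched together. The function names are hypothetical, and the exact version below doubles the smaller Binomial tail, which is one common convention for a two-sided sign test rather than something specified in the text.

```python
import math

def mcnemar(f_yn, f_ny):
    """McNemar's chi-squared statistic, uncorrected and continuity corrected."""
    total = f_yn + f_ny
    plain = (f_yn - f_ny) ** 2 / total
    corrected = (abs(f_yn - f_ny) - 1) ** 2 / total
    return plain, corrected

def exact_sign_test(f_yn, f_ny):
    """Two-sided Binomial(n, 1/2) P value, doubling the smaller tail (capped at 1)."""
    n, k = f_yn + f_ny, min(f_yn, f_ny)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

plain, corrected = mcnemar(144, 256)   # discordant pairs from Table 13.13
exact_p = exact_sign_test(144, 256)
print(round(plain, 2), round(corrected, 2), exact_p < 0.001)
```

Only the two discordant frequencies enter the calculation; the 212 and 707 children who gave the same answer on both occasions carry no information about a change.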
We can find a confidence interval for the difference between the proportions. The estimated difference is $p_1-p_2=(f_{yn}-f_{ny})/n$. We rearrange this:
$$p_1-p_2=\frac{f_{yn}-f_{ny}}{n}=\frac{f_{yn}+f_{ny}}{n}\times\frac{f_{yn}-f_{ny}}{f_{yn}+f_{ny}}=\frac{f_{yn}+f_{ny}}{n}\left(\frac{2f_{yn}}{f_{yn}+f_{ny}}-1\right)$$
We can treat $f_{yn}$ as an observation from a Binomial distribution with parameter $n=f_{yn}+f_{ny}$, which, of course, we are treating as fixed. (I am using n here to mean the parameter of the Binomial distribution as in §6.4, not to mean the total sample size.) We find a confidence interval for $f_{yn}/(f_{yn}+f_{ny})$ using either the z method of §8.4 or the exact method of §8.8. We then multiply these limits by 2, subtract 1 and multiply by $(f_{yn}+f_{ny})/n$.
For the example, the estimated difference is (144 - 256)/1319 = -0.085. For the confidence interval, $f_{yn}+f_{ny}=400$ and $f_{yn}=144$. The 95% confidence interval for $f_{yn}/(f_{yn}+f_{ny})$ is 0.313 to 0.407 by the large sample method. Hence the confidence interval for $p_1-p_2$ is (2 × 0.313 - 1) × 400/1319 = -0.113 to (2 × 0.407 - 1) × 400/1319 = -0.056. We estimate that the proportion of colds on the first occasion was less than that on the second by between 0.06 and 0.11.
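The steps of this confidence interval can be sketched in Python. This is an illustration under stated assumptions rather than a definitive implementation: it uses the large sample (z) interval for the Binomial proportion with the multiplier 1.96, and the function name `paired_difference_ci` is ours.

```python
import math


def paired_difference_ci(f_yn, f_ny, n_total, z=1.96):
    """Estimate and approximate 95% limits for p1 - p2 in a matched table."""
    m = f_yn + f_ny                       # Binomial parameter, treated as fixed
    p = f_yn / m                          # proportion of discordant pairs that are 'yes, no'
    se = math.sqrt(p * (1 - p) / m)       # large sample standard error of p
    lower_p, upper_p = p - z * se, p + z * se
    scale = m / n_total
    estimate = (f_yn - f_ny) / n_total
    # Multiply the limits by 2, subtract 1, then multiply by (f_yn + f_ny)/n.
    return estimate, (2 * lower_p - 1) * scale, (2 * upper_p - 1) * scale


estimate, lower, upper = paired_difference_ci(144, 256, 1319)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

With the frequencies from the example this should reproduce the estimate of -0.085 and limits close to -0.113 and -0.056.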
We may wish to compare the distribution of a variable with three or more categories in matched samples. If the categories are ordered, like smoking experience in Table 13.12, we are usually looking for a shift from one end of the distribution to the other, and we can use the sign test (§9.2), counting positives when smoking increased, negatives when it decreased, and zero if the category was the same. When the categories are not ordered, as in Table 13.1, there is a test due to Stuart (1955), described by Maxwell (1970). The test is difficult to do and the situation is very unusual, so I shall omit details. My free program Clinstat will do it (§1.3).
Table 13.14. Parity of 125 women attending antenatal clinics at St. George's Hospital, with the calculation of the chi-squared goodness of fit test
We can also find an odds ratio for the matched table, called the conditional odds ratio. Like McNemar's method, it uses the frequencies in the off diagonal only. The estimate is very simple: $f_{yn}/f_{ny}$. Thus for Table 13.13 the odds of having severe colds at age 12 is 144/256 = 0.56 times that at age 14. This example is not very interesting, but the method is particularly useful in matched case–control studies, where it provides an estimate of the relative risk. A confidence interval is provided in the same way as for the difference between proportions. We can estimate $p=f_{yn}/(f_{yn}+f_{ny})$ and then the odds ratio is given by $p/(1-p)$. For the example, p = 144/400 = 0.36 and turning p back to the odds ratio gives p/(1 - p) = 0.36/(1 - 0.36) = 0.56 as before. The 95% confidence interval for p is 0.313 to 0.407, as above. Hence the 95% confidence interval for the conditional odds ratio is 0.31/(1 - 0.31) = 0.45 to 0.41/(1 - 0.41) = 0.69.
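The conditional odds ratio and its interval can be sketched in the same way. The Python below is an illustration only: it assumes the large sample interval for p with multiplier 1.96, and the function name `conditional_odds_ratio_ci` is ours. Because the hand calculation rounds the limits for p to two decimal places before converting them to odds, the computed lower limit may differ from the quoted 0.45 in the second decimal place.

```python
import math


def conditional_odds_ratio_ci(f_yn, f_ny, z=1.96):
    """Conditional odds ratio f_yn/f_ny with approximate 95% limits."""
    m = f_yn + f_ny
    p = f_yn / m
    se = math.sqrt(p * (1 - p) / m)       # large sample standard error of p
    lower_p, upper_p = p - z * se, p + z * se

    def to_odds(q):
        return q / (1 - q)                # turn a proportion back into odds

    return to_odds(p), to_odds(lower_p), to_odds(upper_p)


odds_ratio, lower, upper = conditional_odds_ratio_ci(144, 256)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```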
13.10 * The chi-squared goodness of fit test
Another use of the Chi-squared distribution is the goodness of fit test. Here we test the null hypothesis that a frequency distribution follows some theoretical distribution such as the Poisson or Normal. Table 13.14 shows a frequency distribution. We shall test the null hypothesis that it is from a Poisson distribution, i.e. that conception is a random event among fertile women.
First we estimate the parameter of the Poisson distribution, its mean, µ, in this case 0.816. We then calculate the probability for each value of the variable, using the Poisson formula of §6.7:
$$P(r)=\frac{e^{-\mu}\mu^{r}}{r!}$$
where r is the number of events. The probabilities are shown in Table 13.14. The probability that the variable exceeds five is found by subtracting the probabilities for 0, 1, 2, 3, 4, and 5 from 1.0. We then multiply these by the number of observations, 125, to give the frequencies we would expect from 125 observations from a Poisson distribution with mean 0.816.
Table 13.15. Time of onset of 554 strokes, Wroe et al. (1992)

| Time | Frequency | Time | Frequency |
|---|---|---|---|
| 00.01–02.00 | 21 | 12.01–14.00 | 34 |
| 02.01–04.00 | 16 | 14.01–16.00 | 59 |
| 04.01–06.00 | 22 | 16.01–18.00 | 44 |
| 06.01–08.00 | 104 | 18.01–20.00 | 51 |
| 08.01–10.00 | 95 | 20.01–22.00 | 32 |
| 10.01–12.00 | 66 | 22.01–24.00 | 10 |
We now have a set of observed and expected frequencies and can compute a chi-squared statistic in the usual way. We want all the expected frequencies to be greater than 5 if possible. We achieve this here by combining all the categories for parity greater than or equal to 3. We then add $(O-E)^2/E$ for the categories to give a $\chi^2$ statistic. We now find the degrees of freedom. This is the number of categories minus the number of parameters fitted from the data (one in the example) minus one. Thus we have 4 - 1 - 1 = 2 degrees of freedom. From Table 13.3 the observed $\chi^2$ value of 2.99 has P > 0.10 and the deviation from the Poisson distribution is clearly not significant.
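The expected frequencies for the Poisson fit can be sketched in Python. The block below uses only the mean (0.816) and the number of observations (125) given in the text; the function name `poisson_expected` and its `last_category` argument are ours. The observed frequencies of Table 13.14 are not reproduced here, so the sketch stops short of the $\chi^2$ statistic itself.

```python
import math


def poisson_expected(mean, n_obs, last_category):
    """Expected frequencies for r = 0, 1, ..., last_category - 1, followed by
    one combined category for r >= last_category."""
    probs = [math.exp(-mean) * mean ** r / math.factorial(r)
             for r in range(last_category)]
    probs.append(1.0 - sum(probs))        # upper tail found by subtraction from 1.0
    return [n_obs * p for p in probs]


# Parity 0, 1, 2 and '3 or more', as in the combined categories of the text.
expected = poisson_expected(0.816, 125, last_category=3)
parameters_fitted = 1                     # the Poisson mean was estimated from the data
degrees_of_freedom = len(expected) - parameters_fitted - 1
print([round(e, 1) for e in expected], degrees_of_freedom)
```

After combining parities of 3 or more, every expected frequency exceeds 5, which is why four categories remain and the test has 2 degrees of freedom.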
The same test can be used for testing the fit of any distribution. For example, Wroe et al. (1992) studied diurnal variation in onset of strokes. Table 13.15 shows the frequency distribution of times of onset. If the null hypothesis that there is no diurnal variation were true, the time at which strokes occurred would follow a Uniform distribution (§7.2). The expected frequency in each time interval would be the same. There were 554 cases altogether, so the expected frequency for each time is 554/12 = 46.167. We then work out $(O-E)^2/E$ for each interval and add to give the chi-squared statistic, in this case equal to 218.8. There is only one constraint, that the frequencies total 554, as no parameters have been estimated. Hence if the null hypothesis were true we would have an observation from the Chi-squared distribution with 12 - 1 = 11 degrees of freedom. The calculated value of 218.8 is very unlikely, P < 0.001 from Table 13.3, and the data are not consistent with the null hypothesis. When we test the equality of a set of frequencies like this the test is also called the Poisson heterogeneity test.
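The stroke calculation is short enough to check directly. This Python sketch uses the twelve frequencies of Table 13.15 in time order; the variable names are ours.

```python
# Frequencies from Table 13.15, in order of time of day.
observed = [21, 16, 22, 104, 95, 66, 34, 59, 44, 51, 32, 10]

total = sum(observed)
expected = total / len(observed)          # Uniform distribution: equal expected frequencies
chi_squared = sum((o - expected) ** 2 / expected for o in observed)
degrees_of_freedom = len(observed) - 1    # one constraint: the frequencies total 554

print(total, round(expected, 3), round(chi_squared, 1), degrees_of_freedom)
```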
Appendices
13A Appendix: Why the chi-squared test works
We noted some of the properties of the Chi-squared distribution in §7A. In particular, it is the sum of the squares of a set of independent Standard Normal variables, and if we look at a subset of values defined by independent linear relationships between these variables we lose one degree of freedom for each constraint. It is on these two properties that the chi-squared test depends.
Suppose we did not have a fixed size to the birth study of Table 13.1, but observed subjects as they delivered over a fixed time. Then the number in a given cell of the table would be from a Poisson distribution and the set of Poisson variables corresponding to the cell frequency would be independent of one another. Our table is one set of samples from these Poisson distributions. However, we do not know the expected values of these distributions under the null hypothesis; we only know their expected values if the table has the row and column totals we observed. We can only consider the subset of outcomes of these variables which has the observed row and column totals. The test is said to be conditional on these row and column totals.
Table 13.16. Symbolic representation of a 2 × 2 table

| | | | Total |
|---|---|---|---|
| | $f_{11}$ | $f_{12}$ | $r_1$ |
| | $f_{21}$ | $f_{22}$ | $r_2$ |
| Total | $c_1$ | $c_2$ | $n$ |
The mean and variance of a Poisson variable are equal (§6.7). If the null hypothesis is true, the means of these variables will be equal to the expected frequency calculated in §13.1. Thus O, the observed cell frequency, is from a Poisson distribution with mean E, the expected cell frequency, and standard deviation $\sqrt{E}$. Provided E is large enough, this Poisson distribution will be approximately Normal. Hence $(O-E)/\sqrt{E}$ is from a Normal distribution with mean 0 and variance 1. Hence if we find
$$\sum\left(\frac{O-E}{\sqrt{E}}\right)^2=\sum\frac{(O-E)^2}{E}$$
this is the sum of the squares of a set of Normally distributed random variables with mean 0 and variance 1, and so is from a Chi-squared distribution (§7A).
We will now find the degrees of freedom. Although the underlying variables are independent, we are only considering a subset defined by the row and column totals. Consider the table as in Table 13.16. Here, $f_{11}$ to $f_{22}$ are the observed frequencies, $r_1$, $r_2$ the row totals, $c_1$, $c_2$ the column totals, and $n$ the grand total. Denote the corresponding expected values by $e_{11}$ to $e_{22}$. There are three linear constraints on the frequencies:
$$f_{11}+f_{12}+f_{21}+f_{22}=n$$
$$f_{11}+f_{12}=r_1$$
$$f_{11}+f_{21}=c_1$$
Any other constraint can be made up of these. For example, we must have
$$f_{21}+f_{22}=r_2$$
This can be found by subtracting the second equation from the first. Each of these linear constraints on $f_{11}$ to $f_{22}$ is also a linear constraint on $(f_{11}-e_{11})/\sqrt{e_{11}}$ to $(f_{22}-e_{22})/\sqrt{e_{22}}$. This is because $e_{11}$ is fixed and so $(f_{11}-e_{11})/\sqrt{e_{11}}$ is a linear function of $f_{11}$. There are four observed frequencies and so four $(O-E)/\sqrt{E}$ variables, with three constraints. We lose one degree of freedom for each constraint and so have 4 - 3 = 1 degree of freedom.
If we have r rows and c columns, then we have one constraint that the sum of the frequencies is n. Each row must add up, but when we reach the last row the constraint can be obtained by subtracting the first r - 1 rows from the grand total. The rows contribute only r - 1 further constraints. Similarly the columns contribute c - 1 constraints. Hence, there being rc frequencies, the degrees of freedom are
$$rc-1-(r-1)-(c-1)=(r-1)(c-1)$$
So we have degrees of freedom given by the number of rows minus one times the number of columns minus one.
13B Appendix: The formula for Fisher's exact test
The derivation of Fisher's formula is strictly for the algebraically minded. Remember that the number of ways of choosing r things out of n things (§6A) is $n!/[r!(n-r)!]$. Now, suppose we have a 2 by 2 table made up of n individuals as shown in Table 13.16. First, we ask how many ways n individuals can be arranged to give marginal totals $r_1$, $r_2$, $c_1$ and $c_2$. They can be arranged in columns in $n!/(c_1!c_2!)$ ways, since we are choosing $c_1$ objects out of n, and in rows in $n!/(r_1!r_2!)$ ways. (Remember $n-c_1=c_2$ and $n-r_1=r_2$.) Hence they can be arranged in
$$\frac{n!}{c_1!c_2!}\times\frac{n!}{r_1!r_2!}$$
ways.Forexample,thetablewithtotals
canhappenin
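As we saw in §13.4, the columns can be arranged in 70 ways. Now we ask, of these ways how many make up a particular table? We are now dividing the n into four groups of sizes $f_{11}$, $f_{12}$, $f_{21}$ and $f_{22}$. We can choose the first group in $n!/[f_{11}!(n-f_{11})!]$ ways, as before. We are now left with $n-f_{11}$ individuals, so we can choose $f_{12}$ in $(n-f_{11})!/[f_{12}!(n-f_{11}-f_{12})!]$ ways. We are now left with $n-f_{11}-f_{12}$, and so we choose $f_{21}$ in $(n-f_{11}-f_{12})!/[f_{21}!(n-f_{11}-f_{12}-f_{21})!]$ ways. This leaves $n-f_{11}-f_{12}-f_{21}$, which is, of course, equal to $f_{22}$ and so $f_{22}$ can only be chosen in one way. Hence we have altogether:
$$\frac{n!}{f_{11}!(n-f_{11})!}\times\frac{(n-f_{11})!}{f_{12}!(n-f_{11}-f_{12})!}\times\frac{(n-f_{11}-f_{12})!}{f_{21}!(n-f_{11}-f_{12}-f_{21})!}=\frac{n!}{f_{11}!f_{12}!f_{21}!f_{22}!}$$
because $n-f_{11}-f_{12}-f_{21}=f_{22}$. So out of the
$$\frac{n!}{c_1!c_2!}\times\frac{n!}{r_1!r_2!}$$
possible tables, the given table arises in
$$\frac{n!}{f_{11}!f_{12}!f_{21}!f_{22}!}$$
ways. The probability of this table arising by chance is
$$\frac{n!}{f_{11}!f_{12}!f_{21}!f_{22}!}\div\frac{n!\,n!}{c_1!c_2!r_1!r_2!}=\frac{r_1!r_2!c_1!c_2!}{n!f_{11}!f_{12}!f_{21}!f_{22}!}$$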
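The result of this derivation, that the probability of a table is $r_1!r_2!c_1!c_2!/(n!f_{11}!f_{12}!f_{21}!f_{22}!)$, can be checked numerically. The Python sketch below is an illustration only: the cell frequencies 3, 1, 2, 2 are hypothetical values chosen for the example, not data from the text, and the function name `table_probability` is ours. As a cross-check it compares the factorial formula with the equivalent count of ways written using binomial coefficients.

```python
from math import comb, factorial


def table_probability(f11, f12, f21, f22):
    """Probability of one 2 by 2 table given its marginal totals."""
    r1, r2 = f11 + f12, f21 + f22         # row totals
    c1, c2 = f11 + f21, f12 + f22         # column totals
    n = r1 + r2
    numerator = factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2)
    denominator = (factorial(n) * factorial(f11) * factorial(f12)
                   * factorial(f21) * factorial(f22))
    return numerator / denominator


# Hypothetical table: rows (3, 1) and (2, 2).
p = table_probability(3, 1, 2, 2)

# The same probability from binomial coefficients.
check = comb(5, 3) * comb(3, 1) / comb(8, 4)
print(p, check)

# Probabilities of all tables sharing these marginal totals should sum to 1.
all_tables = [table_probability(f11, 4 - f11, 5 - f11, f11 - 1) for f11 in range(1, 5)]
print(sum(all_tables))
```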
13CAppendix:Standarderrorforthelogoddsratio
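This is for the mathematical reader. We start with a general result concerning log transformations. If X is a random variable with mean µ, the approximate variance of $\log_e(X)$ is given by
$$\mathrm{VAR}(\log_e X)=\frac{\mathrm{VAR}(X)}{\mu^2}$$
If an event happens a times and does not happen b times, the log odds is $\log_e(a/b)=\log_e(a)-\log_e(b)$. The frequencies a and b are from independent Poisson distributions with means estimated by a and b respectively, so their variances are also estimated by a and b. Hence the variances of $\log_e(a)$ and $\log_e(b)$ are estimated by $a/a^2=1/a$ and $b/b^2=1/b$ respectively. The variance of the log odds is given by
$$\mathrm{VAR}\left(\log_e(a/b)\right)=\frac{1}{a}+\frac{1}{b}$$
The standard error of the log odds is thus given by
$$\mathrm{SE}\left(\log_e(a/b)\right)=\sqrt{\frac{1}{a}+\frac{1}{b}}$$
The log odds ratio is the difference between the log odds:
$$\log_e(\mathrm{OR})=\log_e(a/b)-\log_e(c/d)$$
The variance of the log odds ratio is the sum of the variances of the log odds, and for a 2 by 2 table with frequencies a, b, c, and d we have
$$\mathrm{VAR}\left(\log_e(\mathrm{OR})\right)=\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}$$
The standard error is the square root of this:
$$\mathrm{SE}\left(\log_e(\mathrm{OR})\right)=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}$$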
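The standard error of the log odds ratio, the square root of 1/a + 1/b + 1/c + 1/d, is simple to apply. The following Python sketch is illustrative: the frequencies 10, 20, 5, 40 are hypothetical, the function names are ours, and the confidence interval assumes the usual Normal approximation on the log scale (limits found for the log odds ratio and then antilogged), which is not derived in this appendix.

```python
import math


def log_odds_ratio_se(a, b, c, d):
    """Standard error of the log odds ratio for a 2 by 2 table."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with approximate 95% limits, working on the log scale."""
    log_or = math.log(a / b) - math.log(c / d)   # difference between the log odds
    se = log_odds_ratio_se(a, b, c, d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical frequencies: event happens 10 and 5 times, does not happen 20 and 40 times.
odds_ratio, lower, upper = odds_ratio_ci(10, 20, 5, 40)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

Working on the log scale gives limits that are not symmetrical about the odds ratio itself, which is what we would expect for a ratio that cannot be negative.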
13M Multiple choice questions 67 to 73
(Each branch is either true or false)
67. The standard chi-squared test for a 2 by 2 contingency table is valid only if:
(a) all the expected frequencies are greater than five;
(b) both variables are continuous;
(c) at least one variable is from a Normal distribution;
(d) all the observed frequencies are greater than five;
(e) the sample is very large.
68. In a chi-squared test for a 5 by 3 contingency table:
(a) variables must be quantitative;
(b) observed frequencies are compared to expected frequencies;
(c) there are 15 degrees of freedom;
(d) at least 12 cells must have expected values greater than five;
(e) all the observed values must be greater than one.
Table 13.17. Cough first thing in the morning in a group of schoolchildren, as reported by the child and by the child's parents (Bland et al. 1979)

| Parents' report | Child's report: Yes | Child's report: No | Total |
|---|---|---|---|
| Yes | 29 | 104 | 133 |
| No | 172 | 5097 | 5269 |
| Total | 201 | 5201 | 5402 |
69. In Table 13.17:
(a) the association between reports by parents and children can be tested by a chi-squared test;
(b)* the difference between symptom prevalence as reported by children and parents can be tested by McNemar's test;
(c)* if McNemar's test is significant, the contingency chi-squared test is not valid;
(d) the contingency chi-squared test has one degree of freedom;
(e) it would be important to use the continuity correction in the contingency chi-squared test.
70. Fisher's exact test for a contingency table:
(a) applies to 2 by 2 tables;
(b) usually gives a larger probability than the ordinary chi-squared test;
(c) usually gives about the same probability as the chi-squared test with Yates' continuity correction;
(d) is suitable when expected frequencies are small;
(e) is difficult to calculate when the expected frequencies are large.
71. When an odds ratio is calculated from a 2 by 2 table:
(a) the odds ratio is a measure of the strength of the relationship between the row and column variables;
(b) if the order of the rows and the order of the columns is reversed, the odds ratio will be unchanged;
(c) the ratio may take any positive value;
(d) the odds ratio will be changed to its reciprocal if the order of the columns is changed;
(e) the odds ratio is the ratio of the proportions of observations in the first row for the two columns.
Table 13.18. Bird attacks on milk bottles reported by cases of Campylobacter jejuni infection and controls (Southern et al. 1990)

| Number of days of week when attacks took place | Number of cases | Number of controls | OR |
|---|---|---|---|
| 0 | 3 | 42 | 1 |
| 1–3 | 11 | 3 | 51 |
| 4–5 | 5 | 1 | 70 |
| 6–7 | 10 | 1 | 140 |
72. Table 13.18 appeared in the report of a case control study of infection with Campylobacter jejuni (§3E):
(a)* a chi-squared test for trend could be used to test the null hypothesis that risk of disease does not increase with the number of bird attacks;
(b) 'OR' means the odds ratio;
(c)* a significant chi-squared test would show that risk of disease increases with increasing numbers of bird attacks;
(d) 'OR' provides an estimate of the relative risk of Campylobacter jejuni infection;
(e)* Kendall's rank correlation coefficient, τb, could be used to test the null hypothesis that risk of disease does not increase with the number of bird attacks.
73.* McNemar's test could be used:
(a) to compare the numbers of cigarette smokers among cancer cases and age and sex matched healthy controls;
(b) to examine the change in respiratory symptom prevalence in a group of asthmatics from winter to summer;
(c) to look at the relationship between cigarette smoking and respiratory symptoms in a group of asthmatics;
(d) to examine the change in PEFR in a group of asthmatics from winter to summer;
(e) to compare the number of cigarette smokers among a group of cancer cases and a random sample of the general population.
13E Exercise: Admissions to hospital in a heatwave
In this exercise we shall look at some data assembled to test the hypothesis that there is a considerable increase in the number of admissions to geriatric wards during heatwaves. Table 13.19 shows the number of admissions to geriatric wards in a health district for each week during the summers of 1982, which was cold, and 1983, which was hot. Also shown are the average of the daily peak temperatures for each week.
1. When do you think the heatwave began and ended?
2. How many admissions were there during the heatwave and in the corresponding period of 1982? Would this be sufficient evidence to conclude that heatwaves produce an increase in admissions?
3. We can use the periods before and after the heatwave weeks as controls for changes in other factors between the years. Divide the years into three periods, before, during, and after the heatwave and set up a two-way table showing numbers of admissions by period and year.
Table 13.19. Mean peak daily temperatures for each week from May to September of 1982 and 1983, with geriatric admissions in Wandsworth (Fish 1985)

| Week | Mean peak, °C, 1982 | Mean peak, °C, 1983 | Admissions, 1982 | Admissions, 1983 | Week | Mean peak, °C, 1982 | Mean peak, °C, 1983 |
|---|---|---|---|---|---|---|---|
| 1 | 12.4 | 15.3 | 24 | 20 | 12 | 21.7 | 25.0 |
| 2 | 18.2 | 14.4 | 22 | 17 | 13 | 22.5 | 27.3 |
| 3 | 20.4 | 15.5 | 21 | 21 | 14 | 25.7 | 22.9 |
| 4 | 18.8 | 15.6 | 22 | 17 | 15 | 23.6 | 24.3 |
| 5 | 25.3 | 19.6 | 24 | 22 | 16 | 20.4 | 26.5 |
| 6 | 23.2 | 21.6 | 15 | 23 | 17 | 19.6 | 25.0 |
| 7 | 18.6 | 18.9 | 23 | 20 | 18 | 20.2 | 21.2 |
| 8 | 19.4 | 22.0 | 21 | 16 | 19 | 22.2 | 19.7 |
| 9 | 20.6 | 21.0 | 18 | 24 | 20 | 23.3 | 16.6 |
| 10 | 23.4 | 26.5 | 21 | 21 | 21 | 18.1 | 18.4 |
| 11 | 22.8 | 30.4 | 17 | 20 | 22 | 17.3 | 20.7 |
4. We can use this table to test for a heatwave effect. State the null hypothesis and calculate the frequencies expected if the null hypothesis were true.
5. Test the null hypothesis. What conclusions can you draw?
6. What other information could be used to test the relationship between heatwaves and geriatric admissions?
14
Choosing the statistical method
14.1 * Method oriented and problem oriented teaching
The choice of method of analysis for a problem depends on the comparison to be made and the data to be used. In Chapters 8, 9, 10, 11, 12, and 13, statistical methods have been arranged largely by type of data (large samples, Normal, ordinal, categorical, etc.) rather than by type of comparison. In this chapter we look at how the appropriate method is chosen for the three most common problems in statistical inference:
comparison of two independent groups, for example, groups of patients given different treatments;
comparison of the response of one group under different conditions, as in a cross-over trial, or of matched pairs of subjects, as in some case–control studies;
investigation of the relationship between two variables measured on the same sample of subjects.
This chapter acts as a map of the methods described in Chapters 8, 9, 10, 11, 12, and 13. Subsequent chapters describe methods for special problems in clinical medicine, population study, dealing with several factors at once, and the choice of sample size.
As was discussed in §12.7, there are often several different approaches to even a simple statistical problem. The methods described here and recommended for particular types of question may not be the only methods, and may not always be universally agreed as the best method. Statisticians are at least as prone to disagree as clinicians. However, these would usually be considered as valid and satisfactory methods for the purposes for which they are suggested here. When there is more than one valid approach to a problem, they will usually be found to give similar answers.
14.2 * Types of data
The study design is one factor which determines the method of analysis; the variable being analysed is another. We can classify variables into the following types:
Ratio scales. The ratio of two quantities has a meaning, so we can say that one observation is twice another. Human height is a ratio scale. Ratio scales allow us to carry out power transformations like log or square root.
Interval scales. The interval or distance between points on the scale has precise meaning: a change of one unit at one scale point is the same as a change of one unit at another. For example, temperature in °C is an interval scale, though not a ratio scale because the zero is arbitrary. We can add and subtract on an interval scale. All ratio scales are also interval scales. Interval scales allow us to calculate means and variances, and to find standard errors and confidence intervals for these.
Ordinal scale. The scale enables us to order the subjects, from that with the lowest value to that with the highest. Any ties which cannot be ordered are assumed to be because the measurement is not sufficiently precise. A typical example would be an anxiety score calculated from a questionnaire. A person scoring 10 is more anxious than a person scoring 8, but not necessarily higher by the same amount that a person scoring 4 is higher than a person scoring 2.
Ordered nominal scale. We can group subjects into several categories, which have an order. For example, we can ask patients if their condition is much improved, improved a little, no change, a little worse, much worse.
Nominal scale. We can group subjects into categories which need not be ordered in any way. Eye colour is measured on a nominal scale.
Dichotomous scales. Subjects are grouped into only two categories, for example: survived or died. This is a special case of the nominal scale.
Clearly these classes are not mutually exclusive, and an interval scale is also ordinal. Sometimes it is useful to apply methods appropriate to a lower level of measurement, ignoring some of the information. The combination of the type of comparison and the scale of measurement should direct us to the appropriate method.
14.3 * Comparing two groups
The methods used for comparing two groups are summarized in Table 14.1.
Interval data. For large samples, say more than 50 in each group, confidence intervals for the mean can be found by the Normal approximation (§8.5). For smaller samples, confidence intervals for the mean can be found using the t distribution provided the data follow or can be transformed to a Normal distribution (§10.3, §10.4). If not, a significance test of the null hypothesis that the means are equal can be carried out using the Mann–Whitney U test (§12.2). This can be useful when the data are censored, that is, there are values too small or too large to measure. This happens, for example, when concentrations are too small to measure and labelled 'not detectable'. Provided that data are from Normal distributions, it is possible to compare the variances of the groups using the F test (§10.8).
Ordinal data. The tendency for members of one group to exceed members of the other is tested by the Mann–Whitney U test (§12.2).
Ordered nominal data. First the data is set out as a two way table, one variable being group and the other the ordered nominal data. A chi-squared test (§13.1) will test the null hypothesis that there is no relationship between group and variable, but takes no account of the ordering. Use instead the chi-squared test for trend, which takes the ordering into account and provides a much more powerful test (§13.8).
Table 14.1. Methods for comparing two samples

| Type of data | Size of sample | Method |
|---|---|---|
| Interval | Large, >50 each sample | Normal distribution for means (§8.5, §9.7) |
| | Small, <50 each sample, with Normal distribution and uniform variance | Two-sample t method (§10.3) |
| | Small, <50 each sample, non-Normal | Mann–Whitney U test (§12.2) |
| Ordinal | Any | Mann–Whitney U test (§12.2) |
| Nominal, ordered | Large, n>30 | Chi-squared for trend (§13.8) |
| Nominal, not ordered | Large, most expected frequencies >5 | Chi-squared test (§13.1) |
| | Small, more than 20% expected frequencies <5 | Reduce number of categories by combining or excluding as appropriate (§13.3) |
| Dichotomous | Large, all expected frequencies >5 | Comparison of two proportions (§8.6, §9.8), chi-squared test (§13.1), odds ratio (§13.7) |
| | Small, at least one expected frequency <5 | Chi-squared test with Yates' correction (§13.5), Fisher's exact test (§13.4) |
Nominal data. Set the data out as a two way table as described above. The chi-squared test for a two way table is the appropriate test (§13.1). The condition for validity of the test, that at least 80% of the expected frequencies should be greater than 5, must be met by combining or deleting categories as appropriate (§13.3). If the table reduces to a 2 by 2 table without the condition being met, use Fisher's exact test.
Dichotomous data. For large samples, either present the data as two proportions and use the Normal approximation to find the confidence interval for the difference (§8.6), or set the data up as a 2 by 2 table and do a chi-squared test (§13.1). These are equivalent methods. An odds ratio can also be calculated (§13.7). If the sample is small, the fit to the Chi-squared distribution can be improved by using Yates' correction (§13.5). Alternatively, use Fisher's exact test (§13.4).
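The equivalence of the two large sample methods for dichotomous data can be seen numerically. The Python sketch below assumes the significance test form of the comparison of two proportions, with the standard error calculated from the pooled proportion, and the chi-squared test without continuity correction; under those assumptions the square of the Normal test statistic equals the chi-squared statistic. The counts (30 of 100 and 45 of 100) are hypothetical and the function names are ours.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Normal test statistic for two proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_squared_2x2(x1, n1, x2, n2):
    """Uncorrected chi-squared statistic for the same data as a 2 by 2 table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total


# Hypothetical data: 30 of 100 in one group and 45 of 100 in the other.
z = pooled_z(30, 100, 45, 100)
chi2 = chi_squared_2x2(30, 100, 45, 100)
print(z ** 2, chi2)
```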
Table 14.2. Methods for differences in one or paired sample

| Type of data | Size of sample | Method |
|---|---|---|
| Interval | Large, >100 | Normal distribution (§8.3) |
| | Small, <100, Normal differences | Paired t method (§10.2) |
| | Small, <100, non-Normal differences | Wilcoxon matched pairs test (§12.3) |
| Ordinal | Any | Sign test (§9.2) |
| Nominal, ordered | Any | Sign test (§9.2) |
| Nominal | Any | Stuart test (§13.9) |
| Dichotomous | Any | McNemar's test (§13.9) |
14.4 * One sample and paired samples
Methods of analysis for paired samples are summarized in Table 14.2.
Interval data. Inferences are on differences between the variable as observed on the two conditions. For large samples, say n > 100, the confidence interval for the mean difference is found using the Normal approximation (§8.3). For small samples, provided the differences are from a Normal distribution, use the paired t test (§10.2). This assumption is often very reasonable, as most of the variation between individuals is removed and random error is largely made up of measurement error. Furthermore, the error is the result of two added measurement errors and so tends to follow a Normal distribution anyway. If not, transformation of the original data will often make differences Normal (§10.4). If no assumption of a Normal distribution can be made, use the Wilcoxon signed-rank matched-pairs test (§12.3).
It is rarely asked whether there is a difference in variability in paired data. This can be tested by finding the differences between the two conditions and their sum. Then if there is no change in variance the correlation between difference and sum has expected value zero (Pitman's test). This is not obvious but it is true.
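The reason the correlation between difference and sum reflects a change in variability can be illustrated numerically: the sum of products of the differences and sums about their means equals the sum of squares for the first condition minus that for the second, so it is zero only when the two spreads are equal. The Python sketch below uses hypothetical paired measurements, and the function names are ours; it shows the calculation of the correlation only, not the significance test of it.

```python
import math


def centred_cross_products(u, v):
    """Sum of products of deviations from the means of u and v."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def correlation(u, v):
    """Product moment correlation coefficient."""
    return centred_cross_products(u, v) / math.sqrt(
        centred_cross_products(u, u) * centred_cross_products(v, v))


# Hypothetical paired observations under two conditions.
first = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2]
second = [4.0, 5.9, 7.1, 4.2, 5.8, 7.4]

differences = [x - y for x, y in zip(first, second)]
sums = [x + y for x, y in zip(first, second)]

r = correlation(differences, sums)
spread_gap = (centred_cross_products(first, first)
              - centred_cross_products(second, second))
print(round(r, 3), round(spread_gap, 3))
```

Here the second condition is the more variable, so the correlation between difference and sum is negative.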
Ordinal data. If the data do not form an interval scale, as noted in §14.2 the difference between conditions is not meaningful. However, we can say what direction the difference is in, and this can be examined by the sign test (§9.2).
Ordered nominal data. Use the sign test, with changes in one direction being positive, in the other negative, no change as zero (§9.2).
Nominal data. With more than two categories, this is difficult. Use Stuart's generalization to more than two categories of McNemar's test (§13.9).
Dichotomous data. Here we are comparing the proportions of individuals in a given state under the two conditions. The appropriate test is McNemar's test (§13.9).
14.5 * Relationship between two variables
The methods for studying relationships between variables are summarized in Table 14.3. Relationships with dichotomous variables can be studied as the difference between two groups (§14.3), the groups being defined by the two states of the dichotomous variable. Dichotomous data have been excluded from the text of this section, but are included in Table 14.3.
Interval and interval data. Two methods are used: regression and correlation. Regression (§11.2, §11.5) is usually preferred, as it gives information about the nature of the relationship as well as about its existence. Correlation (§11.9) measures the strength of the relationship. For regression, residuals about the line must follow a Normal distribution with uniform variance. For estimation, the correlation coefficient requires an assumption that both variables follow a Normal distribution, but to test the null hypothesis only one variable needs to follow a Normal distribution. If neither variable can be assumed to follow a Normal distribution or be transformed to it (§11.8), use rank correlation (§12.4, §12.5).
Interval and ordinal data. Rank correlation coefficient (§12.4, §12.5).
Interval and ordered nominal data. This can be approached by rank correlation, using Kendall's τ (§12.5) because it copes with the large number of ties better than does Spearman's ρ, or by analysis of variance as described for interval and nominal data. The latter requires an assumption of Normal distribution and uniform variance for the interval variable. These two approaches are not equivalent.
Interval and nominal data. If the interval scale follows a Normal distribution, use one-way analysis of variance (§10.9). The assumption is that within categories the interval variable is from Normal distributions with uniform variance. If this assumption is not reasonable, use Kruskal–Wallis analysis of variance by ranks (§12.2).
Ordinal and ordinal data. Use a rank correlation coefficient, Spearman's ρ (§12.4) or Kendall's τ (§12.5). Both will give very similar answers for testing the null hypothesis of no relationship in the absence of ties. For data with many ties and for comparing the strengths of different relationships, Kendall's τ is preferable.
Ordinal and ordered nominal data. Use Kendall's rank correlation coefficient, τ (§12.5).
Ordinal and nominal data. Kruskal–Wallis one-way analysis of variance by ranks (§12.2).
Ordered nominal and ordered nominal data. Use chi-squared for trend (§13.8).
Ordered nominal and nominal data. Use the chi-squared test for a two-way table (§13.1).
Nominal and nominal data. Use the chi-squared test for a two-way table (§13.1), provided the expected values are large enough. Otherwise use Yates' correction (§13.5) or Fisher's exact test (§13.4).
Table 14.3. Methods for relationships between variables

| | Interval, Normal | Interval, non-Normal | Ordinal |
|---|---|---|---|
| Interval, Normal | Regression (§11.2), correlation (§11.9) | Regression (§11.2), rank correlation (§12.4, §12.5) | Rank correlation (§12.4, §12.5) |
| Interval, non-Normal | Regression (§11.2), rank correlation (§12.4, §12.5) | Rank correlation (§12.4, §12.5) | Rank correlation (§12.4, §12.5) |
| Ordinal | Rank correlation (§12.4, §12.5) | Rank correlation (§12.4, §12.5) | Rank correlation (§12.4, §12.5) |
| Nominal, ordered | Kendall's rank correlation (§12.5) | Kendall's rank correlation (§12.5) | Kendall's rank correlation (§12.5) |
| Nominal | Analysis of variance (§10.9) | Kruskal–Wallis test (§12.2) | Kruskal–Wallis test (§12.2) |
| Dichotomous | t test (§10.3), Normal test (§8.5, §9.7) | Large sample Normal test (§8.5, §9.7), Mann–Whitney U test (§12.2) | Mann–Whitney U test (§12.2) |

| | Nominal, ordered | Nominal | Dichotomous |
|---|---|---|---|
| Interval, Normal | Rank correlation (§12.4, §12.5) | Analysis of variance (§10.9) | t test (§10.3), Normal test (§8.5, §9.7) |
| Interval, non-Normal | Kendall's rank correlation (§12.5) | Kruskal–Wallis test (§12.2) | Large sample Normal test (§8.5, §9.7), Mann–Whitney U test (§12.2) |
| Ordinal | Kendall's rank correlation (§12.5) | Kruskal–Wallis test (§12.2) | Mann–Whitney U test (§12.2) |
| Nominal, ordered | Chi-squared test for trend (§13.8) | Chi-squared test (§13.1) | Chi-squared test for trend (§13.8) |
| Nominal | Chi-squared test (§13.1) | Chi-squared test (§13.1) | Chi-squared test (§13.1) |
| Dichotomous | Chi-squared test for trend (§13.8) | Chi-squared test (§13.1) | Chi-squared test (§13.1, §13.5), Fisher's exact test (§13.4) |
14M Multiple choice questions 74 to 80
(Each branch is either true or false)
74. The following variables have interval scales of measurement:
(a) height;
(b) presence or absence of asthma;
(c) Apgar score;
(d) age;
(e) Forced Expiratory Volume.
75. The following methods may be used to investigate a relationship between two continuous variables:
(a) paired t test;
(b) the correlation coefficient, r;
(c) simple linear regression;
(d) Kendall's τ;
(e) Spearman's ρ.
76. When analysing nominal data the following statistical methods may be used:
(a) simple linear regression;
(b) correlation coefficient, r;
(c) paired t test;
(d) Kendall's τ;
(e) chi-squared test.
77.Tocomparelevelsofacontinuousvariableintwogroups,possiblemethodsinclude:
(a)theMann–WhitneyUtest;
(b)Fisher'sexacttest;
(c)attest;
(d)Wilcoxonmatched-pairssigned-ranktest;
(e)thesigntest.
ViewAnswer
Table 14.4. Number of rejection episodes over 16 weeks following heart transplant in two groups of patients
Episodes         Group A   Group B   Total
0                10        8         18
1                15        6         21
2                4         0         4
3                3         0         3
Total patients   32        14        46
78. Table 14.4 shows the number of rejection episodes following heart transplant in two groups of patients:
(a) the rejection rates in the two populations could be compared by a Mann–Whitney U test;
(b) the rejection rates in the two populations could be compared by a two-sample t test;
(c) the rejection rates in the two populations could be compared by a chi-squared test for trend;
(d) the chi-squared test for a 4 by 2 table would not be valid;
(e) the hypothesis that the number of episodes follows a Poisson distribution could be investigated using a chi-squared test for goodness of fit.
79. Twenty arthritis patients were given either a new analgesic or aspirin on successive days in random order. The grip strength of the patients was measured. Methods which could be used to investigate the existence of a treatment effect include:
(a) Mann–Whitney U test;
(b) paired t method;
(c) sign test;
(d) Normal confidence interval for the mean difference;
(e) Wilcoxon matched-pairs signed-rank test.
80. In a study of boxers, computer tomography revealed brain atrophy in 3 of 6 professionals and 1 of 8 amateurs (Kaste et al. 1982). These groups could be compared using:
(a) Fisher's exact test;
(b) the chi-squared test;
(c) the chi-squared test with Yates' correction;
(d) *McNemar's test;
(e) the two-sample t test.
Table 14.5. Gastric pH and urinary nitrite concentrations in 26 subjects (Hall and Northfield, private communication)
pH Nitrite pH Nitrite pH Nitrite pH Nitrite
1.72 1.64 2.64 2.33 5.29 50.6 5.77 48.9
1.93 7.13 2.73 52.0 5.31 43.9 5.86 3.26
1.94 12.1 2.94 6.53 5.50 35.2 5.90 63.4
2.03 15.7 4.07 22.7 5.55 83.8 5.91 81.2
2.11 0.19 4.91 17.8 5.59 52.5 6.03 19.5
2.17 1.48 4.94 55.6 5.59 81.8
2.17 9.36 5.18 0.0 5.17 21.9
14E* Exercise: Choosing a statistical method
1. In a cross-over trial to compare two appliances for ileostomy patients, of 14 patients who received system A first, 5 expressed a preference for A, 9 for system B and none had no preference. Of the patients who received system B first, 7 preferred A, 5 preferred B and 4 had no preference. How would you decide whether one treatment was preferable? How would you decide whether the order of treatment influenced the choice?
2. Burr et al. (1976) tested a procedure to remove house-dust mites from the bedding of adult asthmatics in an attempt to improve subjects' lung function, which they measured by PEFR. The trial was a two period cross-over design, the control or placebo treatment being thorough dust removal from the living room. The means and standard errors for PEFR in the 32 subjects were:
active treatment: 335 litres/min, SE = 19.6 litres/min
placebo treatment: 329 litres/min, SE = 20.8 litres/min
differences within subjects: (treatment – placebo) 6.45 litres/min, SE = 5.05 litres/min
How would you decide whether the treatment improves PEFR?
3. In a trial of screening and treatment for mild hypertension (Reader et al. 1980), 1138 patients completed the trial on active treatment, with 9 deaths, and 1080 completed on placebo, with 19 deaths. A further 583 patients allocated to active treatment withdrew, of whom 6 died, and 626 allocated to placebo withdrew, of whom 16 died during the trial period. How would you decide whether screening and treatment for mild hypertension reduces the risk of dying?
4. Table 14.5 shows the pH and nitrite concentrations in samples of gastric fluid from 26 patients. A scatter diagram is shown in Figure 14.1. How would you assess the evidence of a relationship between pH and nitrite concentration?
5. The lung function of 79 children with a history of hospitalization for whooping cough and 178 children without a history of whooping cough, taken from the same school classes, was measured. The mean transit time for the whooping cough cases was 0.49 seconds (s.d. = 0.14 seconds) and for the controls 0.47 seconds (s.d. = 0.11 seconds) (Johnston et al. 1983). How could you analyse the difference in lung function between children who had had whooping cough and those who had not? Each case had two matched controls. If you had all the data, how could you use this information?
Fig. 14.1. Gastric pH and urinary nitrite
Table 14.6. Visual acuity and results of a contrast sensitivity vision test before and after cataract surgery (Wilkins, personal communication)
Case   Visual acuity      Contrast sensitivity test
       Before   After     Before   After
1 6/9 6/9 1.35 1.50
2 6/9 6/9 0.75 1.05
3 6/9 6/9 1.05 1.35
4 6/9 6/9 0.45 0.90
5 6/12 6/6 1.05 1.35
6 6/12 6/9 0.90 1.20
7 6/12 6/9 0.90 1.05
8 6/12 6/12 1.05 1.20
9 6/12 6/12 0.60 1.05
10 6/18 6/6 0.75 1.05
11 6/18 6/12 0.90 1.05
12 6/18 6/12 0.90 1.50
13 6/24 6/18 0.45 0.75
14 6/36 6/18 0.15 0.45
15 6/36 6/36 0.45 0.60
16 6/60 6/9 0.45 1.05
17 6/60 6/12 0.30 1.05
6. Table 14.6 shows some data from a pre- and post-treatment study of cataract patients. The second number in the visual acuity score represents the size of letter which can be read at a distance of six metres, so high numbers represent poor vision. For the contrast sensitivity test, which is a measurement, high numbers represent good vision. What methods could be used to test the difference in visual acuity and in the contrast sensitivity test pre- and post-operation? What method could be used to investigate the relationship between visual acuity and the contrast sensitivity test post-operation?
Table 14.7. Asthma or wheeze by maternal age (Anderson et al. 1986)
Asthma or wheeze reported   Mother's age at child's birth
                            15–19    20–29    30+
Never                       261      4017     2146
Onset by age 7              103      984      487
Onset from 8 to 11          27       189      95
Onset from 12 to 16         20       157      67
Table 14.8. Colon transit time (hours) in groups of mobile and immobile elderly patients (data of Dr Michael O'Connor)
Mobile patients                      Immobile patients
8.4    21.6   45.5   62.4   68.4     15.6   38.8   54.0
14.4   25.2   48.0   66.0            24.0   42.0   54.0
19.2   30.0   50.4   66.0            24.0   43.2   57.6
20.4   36.0   60.0   66.0            32.4   47.0   58.8
20.4   38.4   60.0   67.2            34.8   52.8   62.4
n1 = 21, $\bar{x}_1$ = 42.57, s1 = 20.58
n2 = 21, $\bar{x}_2$ = 49.63, s2 = 16.39
7. Table 14.7 shows the relationship between age of onset of asthma in children and maternal age at the child's birth. How would you test whether these were related? The children were all born in one week in March, 1958. Apart from the possibility that young mothers in general tend to have children prone to asthma, what other possible explanations are there for this finding?
8. In a study of thyroid hormone in premature babies, we wanted to study the relationship of free T3 measured at several time points over seven days with the number of days the babies remained oxygen dependent. Some babies died, mostly within a few days of birth, and some babies went home still oxygen dependent and were not followed any longer by the researchers. How could you reduce the series of T3 measurements on a baby to a single variable? How could you test the relationship with time on oxygen?
9. Table 14.8 shows colon transit times measured in a group of elderly patients who were mobile and in a second group who were unable to move independently. Figure 14.2 shows a scatter diagram and histogram and Normal plot of residuals for these data. What two statistical approaches could be used here? Which would you prefer and why?
Fig. 14.2. Scatter plot, histogram, and Normal plot for the colon transit time data of Table 14.8
15 Clinical measurement
15.1 Making measurements
In this chapter we shall look at a number of problems associated with clinical measurement. These include how precisely we can measure, how different methods of measurement can be compared, how measurements can be used in diagnosis and how to deal with incomplete measurements of survival.
When we make a measurement, particularly a biological measurement, the number we obtain is the result of several things: the true value of the quantity we want to measure, biological variation, the measurement instrument itself, the position of the subject, the skill, experience and expectations of the observer, and even the relationship between observer and subject. Some of these factors, such as the variation within the subject, are outside the control of the observer. Others, such as position, are not, and it is important to standardize these. One which is most under our control is the precision with which we read scales and record the result. When blood pressure is measured, for example, some observers record to the nearest 5 mm Hg, others to the nearest 10 mm Hg. Some observers may record diastolic pressure at Korotkov sound four, others at five. Observers may think that as blood pressure is such a variable quantity, errors in recording of this magnitude are unimportant. In the monitoring of the individual patient, such lack of uniformity may make apparent changes difficult to interpret. In research, imprecise measurement can lead to problems in the analysis and to loss of power.
How precisely should we record data? While this must depend to some extent on the purpose for which the data are to be recorded, any data which are to be subjected to statistical analysis should be recorded as precisely as possible. A study can only be as good as the data, and data are often very costly and time-consuming to collect. The precision to which data are to be recorded and all other procedures to be used in measurement should be decided in advance and stated in the protocol, the written statement of how the study is to be carried out. We should bear in mind that the precision of recording depends on the number of significant figures (§5.2) recorded, not the number of decimal places. The observations 0.15 and 1.66 from Table 4.8, for example, are both recorded to two decimal places, but 0.15 has two significant figures and 1.66 has three. The second observation is recorded more precisely. This becomes very important when we come to analyse the data, for the data of Table 4.8 have a skew distribution which we wish to log transform. The greater imprecision of recording at the lower end of the scale is magnified by the transformation.
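The effect of the log transformation on recording precision can be illustrated numerically. The sketch below assumes that a value recorded to two decimal places could have come from anywhere within half a unit of the last digit, and compares the width of that range on the natural log scale for the two observations quoted above. The function name is hypothetical.

```python
import math

def log_interval_width(recorded: float, decimals: int = 2) -> float:
    """Width, on the natural log scale, of the range of true values that
    would all be recorded as `recorded` when rounding to `decimals` places."""
    half_unit = 0.5 * 10 ** (-decimals)
    return math.log(recorded + half_unit) - math.log(recorded - half_unit)

width_low = log_interval_width(0.15)   # two significant figures
width_high = log_interval_width(1.66)  # three significant figures

print(f"0.15: width on log scale = {width_low:.4f}")
print(f"1.66: width on log scale = {width_high:.4f}")
print(f"ratio = {width_low / width_high:.1f}")
```

On the original scale both observations carry the same rounding uncertainty of ±0.005; after the transformation the uncertainty for the smaller value is roughly ten times larger.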
In measurement there is usually uncertainty in the last digit. Observers will often have some values for this last digit which they record more often than others. Many observers are more likely to record a terminal zero than a nine or a one, for example. This is known as digit preference. The tendency to read blood pressure to the nearest 5 or 10 mm Hg mentioned above is an example of this. Observer training and awareness of the problem help to minimize digit preference, but if possible readings should be taken to sufficient significant figures for the last digit to be unimportant. Digit preference is particularly important when differences in the last digit are of importance to the outcome, as it might be in Table 15.1, where we are dealing with the difference between two similar numbers. Because of this it is a mistake to have one measurer take readings under one set of conditions and a second under another, as their degree of digit preference may differ. It is also important to agree the precision to which data are to be recorded and to ensure that instruments have sufficiently fine scales for the job in hand.
15.2* Repeatability and measurement error
I have already discussed some factors which may produce bias in measurements (§2.7, §2.8, §3.6). I have not yet considered the natural biological variability, in subject and in measurement method, which may lead to measurement error. 'Error' comes from a Latin root meaning 'to wander', and its use in statistics is closely related to this, as in §11.2, for example. Thus error in measurement may include the natural continual variation of a biological quantity, when a single observation will be used to characterize the individual. For example, in the measurement of blood pressure we are dealing with a quantity that varies continuously, not only from heart beat to heart beat but from day to day, season to season, and even with the sex of the measurer. The measurer, too, will show variation in the perception of the Korotkov sound and reading of the manometer. Because of this, most clinical measurements cannot be taken at face value without some consideration being given to their error.
The quantification of measurement error is not difficult in principle. To do it we need a set of replicate readings, obtained by measuring each member of a sample of subjects more than once. We can then estimate the standard deviation of repeated measurements on the same subject. Table 15.1 shows some replicated measurements of peak expiratory flow rate, made by the same observer (myself) with a Wright Peak Flow Meter. For each subject, the measured PEFR varies from observation to observation. This variation is the measurement error. We can quantify measurement error in two ways: using the standard deviation for repeated measurements on the same subject and by correlation.
Table 15.1. Pairs of readings made with a Wright Peak Flow Meter on 17 healthy volunteers
Subject   PEFR (litres/min)     Subject   PEFR (litres/min)
          First   Second                  First   Second
1 494 490 10 433 429
2 395 397 11 417 420
3 516 512 12 656 633
4 434 401 13 267 275
5 476 470 14 478 492
6 557 611 15 178 165
7 413 415 16 423 372
8 442 431 17 427 421
9 650 638
Table 15.2. Analysis of variance by subject for the PEFR data of Table 15.1
Source of variation          Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Total                        33                   445581.5
Between subjects             16                   441598.5         27599.9       117.8
Residual (within subjects)   17                   3983.0           234.3
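The residual line of Table 15.2 can be reproduced directly from the pairs in Table 15.1. The sketch below relies on the fact that, with exactly two readings per subject, each subject contributes half the squared difference to the within-subjects sum of squares and one degree of freedom. The square root of the residual mean square is the within-subject standard deviation. Variable names are illustrative.

```python
import math

# First and second PEFR readings (litres/min) for the 17 subjects of Table 15.1
first = [494, 395, 516, 434, 476, 557, 413, 442, 650,
         433, 417, 656, 267, 478, 178, 423, 427]
second = [490, 397, 512, 401, 470, 611, 415, 431, 638,
          429, 420, 633, 275, 492, 165, 372, 421]

# With two readings per subject, the sum of squares about the subject mean
# is (difference squared) / 2, and each subject gives one degree of freedom.
ss_within = sum((a - b) ** 2 / 2 for a, b in zip(first, second))
df_within = len(first)
ms_within = ss_within / df_within

# Within-subject standard deviation of repeated measurements
s_w = math.sqrt(ms_within)

print(f"within-subjects sum of squares = {ss_within:.1f}")
print(f"within-subjects mean square    = {ms_within:.1f}")
print(f"within-subject s.d.            = {s_w:.1f} litres/min")
```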
Table 15.3. Analysis of variance by subject for the log (base e) transformed PEFR data of Table 15.1
Source of variation          Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Total                        33                   3.160104
Subjects                     16                   3.139249         0.196203      159.9
Residual (within subjects)   17                   0.020855         0.001227
We should check to see whether the error does depend on the value of the measurement, usually being larger for larger values. We can do this by plotting a scatter diagram of the absolute value of the difference (i.e. ignoring the sign) and the mean of the two observations (Figure 15.1). For the PEFR data, there is no obvious relationship. We can check this by calculating a correlation (§11.9) or rank correlation coefficient (§12.4, §12.5). For Figure 15.1 we have τ = 0.17, P = 0.3, so there is little to suggest that the measurement error is related to the size of the PEFR. Hence the coefficient of variation is not as appropriate as the within subjects standard deviation as a representation of the measurement error. For most medical measurements, the standard deviation is either independent of or proportional to the measurement and so one of these two approaches can be used.
Fig. 15.1. Absolute difference versus sum for 17 pairs of Wright Peak Flow Meter measurements
Measurement error may also be presented as the correlation coefficient between pairs of readings. This is sometimes called the reliability of the measurement, and is often used for psychological measurements using questionnaire scales. However, the correlation depends on the amount of variation between subjects. If we deliberately choose subjects to have a wide spread of possible values, the correlation will be bigger than if we take a random sample of subjects. Thus this method should only be used if we have a representative sample of the subjects in whom we are interested. The intra-class correlation coefficient (§11.13), which does not take into account the order in which observations were taken and which can be used with more than two observations per subject, is preferred for this application. Applying the method of §11.13 to Table 15.1 we get ICC = 0.98. ICC and $s_w$ are closely related, because $\mathrm{ICC} = 1 - s_w^2/(s_b^2 + s_w^2)$. ICC therefore depends also on the variation between subjects, and thus relates to the population of which the subjects can be considered a random sample. Streiner and Norman (1996) give an interesting discussion.
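As a rough check on the quoted ICC, the variance components can be estimated from the mean squares of Table 15.2. This is a sketch that assumes the usual one-way analysis of variance estimator with m = 2 readings per subject; the method of §11.13 may be set out differently while giving the same value.

```python
# Mean squares taken from Table 15.2
ms_between = 27599.9
ms_within = 234.3
m = 2  # readings per subject

# Variance components (assumed one-way ANOVA estimators)
var_within = ms_within                       # estimate of s_w squared
var_between = (ms_between - ms_within) / m   # estimate of s_b squared

icc = 1 - var_within / (var_between + var_within)
print(f"ICC = {icc:.2f}")
```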
15.3* Comparing two methods of measurement
In clinical measurement, most of the things we want to measure, hearts, lungs, livers and so on, are deep within living bodies and out of reach. This means that many of the methods we use to measure them are indirect and we cannot be sure how closely they are related to what we really want to know. When a new method of measurement is developed, rather than compare its outcome to a set of known values we must often compare it to another method just as indirect. This is a common type of study, and one which is often badly done (Altman and Bland 1983, Bland and Altman 1986).
Table 15.4 shows measurements of PEFR by two different methods, the Wright meter data coming from Table 15.1. For simplicity, I shall use only one measurement by each method here. We could make use of the duplicate data by using the average of each pair first, but this introduces an extra stage in the calculation. Bland and Altman (1986) give details.
Table 15.4. Comparison of two methods of measuring PEFR
Subject number   PEFR (litres/min)            Difference
                 Wright meter   Mini meter    Wright - mini
1 494 512 -18
2 395 430 -35
3 516 520 -4
4 434 428 6
5 476 500 -24
6 557 600 -43
7 413 364 49
8 442 380 62
9 650 658 -8
10 433 445 -12
11 417 432 -15
12 656 626 30
13 267 260 7
14 478 477 1
15 178 259 -81
16 423 350 73
17 427 451 -24
Total -36
Mean -2.1
S.d. 38.8
The first step in the analysis is to plot the data as a scatter diagram (Figure 15.2). If we draw the line of equality, along which the two measurements would be exactly equal, this gives us an idea of the extent to which the two methods agree. This is not the best way of looking at data of this type, because much of the graph is empty space and the interesting information is clustered along the line. A better approach is to plot the difference between the methods against the sum or average. The sign of the difference is important, as there is a possibility that one method may give higher values than the other and this may be related to the true value we are trying to measure. This plot is also shown in Figure 15.2.
Two methods of measurement agree if the difference between observations on the same subject using both methods is small enough for us to use the methods interchangeably. How small this difference has to be depends on the measurement and the use to which it is to be put. It is a clinical, not a statistical, decision. We quantify the differences by estimating the bias, which is the mean difference, and the limits within which most differences will lie. We estimate these limits from the mean and standard deviation of the differences. If we are to estimate these quantities, we want them to be the same for high values and for low values of the measurement. We can check this from the plot. There is no clear evidence of a relationship between difference and mean in Figure 15.4, and we can check this by a test of significance using the correlation coefficient. We get r = 0.19, P = 0.5.
The mean difference is close to zero, so there is little evidence of overall bias.
We can find a confidence interval for the mean difference as described in §10.2. The differences have a mean $\bar{d} = -2.1$ litres/min, and a standard deviation of 38.8. The standard error of the mean is thus $s/\sqrt{n} = 38.8/\sqrt{17} = 9.41$ litres/min and the corresponding value of t with 16 degrees of freedom is 2.12. The 95% confidence interval for the bias is thus $-2.1 \pm 2.12 \times 9.41$ = -22 to +18 litres/min. Thus on the basis of these data we could have a bias of as much as 22 litres/min, which could be clinically important. The original comparison of these instruments used a much larger sample and found that any bias was very small (Oldham et al. 1979).
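The confidence interval for the bias can be recomputed from the two columns of Table 15.4. In the sketch below the t value of 2.12 for 16 degrees of freedom is taken from the text rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# PEFR (litres/min) by each method for the 17 subjects of Table 15.4
wright = [494, 395, 516, 434, 476, 557, 413, 442, 650,
          433, 417, 656, 267, 478, 178, 423, 427]
mini = [512, 430, 520, 428, 500, 600, 364, 380, 658,
        445, 432, 626, 260, 477, 259, 350, 451]

diffs = [w - m for w, m in zip(wright, mini)]  # Wright - mini

mean_diff = statistics.mean(diffs)       # estimated bias
sd_diff = statistics.stdev(diffs)        # sample s.d. (n - 1 divisor)
se_mean = sd_diff / math.sqrt(len(diffs))

t_16 = 2.12  # two-sided 5% point of t with 16 d.f., as quoted in the text
ci_lower = mean_diff - t_16 * se_mean
ci_upper = mean_diff + t_16 * se_mean

print(f"mean difference = {mean_diff:.1f}, s.d. = {sd_diff:.1f}")
print(f"standard error  = {se_mean:.2f}")
print(f"95% CI for bias = {ci_lower:.0f} to {ci_upper:.0f} litres/min")
```

Working from the unrounded standard deviation gives a standard error of about 9.40 rather than 9.41; the interval is the same to the nearest litre/min.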
Fig. 15.2. PEFR measured by two different instruments, mini meter versus Wright meter and difference versus mean of mini and Wright meters
Fig. 15.3. Distribution of differences between PEFR measured by two methods
The standard deviation of the differences between measurements made by the two methods provides a good index of the comparability of the methods. If we can estimate the mean and standard deviation reliably, with small standard errors, we can then say that the difference between methods will be at most two standard deviations on either side of the mean for 95% of observations. These $\bar{d} \pm 2s$ limits for the difference are called the 95% limits of agreement. For the PEFR data, the standard deviation of the differences is estimated to be 38.8 litres/min and the mean is -2 litres/min. Two standard deviations is therefore 78 litres/min. The reading with the mini meter is expected to be 80 litres below to 76 litres above for most subjects. These limits are shown as horizontal lines in Figure 15.4. The limits depend on the assumption that the distribution of the differences is approximately Normal, which can be checked by histogram and Normal plot (§7.5) (Figure 15.3).
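The limits of agreement follow from the same differences. The sketch below uses the Wright minus mini differences listed in Table 15.4 and the mean ± 2 s.d. rule described above. Because it works from unrounded values, the upper limit comes out nearer 75 than the 76 obtained from the rounded mean of -2 and 2 s.d. of 78.

```python
import statistics

# Differences (Wright - mini, litres/min) from Table 15.4
diffs = [-18, -35, -4, 6, -24, -43, 49, 62, -8,
         -12, -15, 30, 7, 1, -81, 73, -24]

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

# 95% limits of agreement: mean difference plus or minus two s.d.
lower_limit = mean_diff - 2 * sd_diff
upper_limit = mean_diff + 2 * sd_diff

# How many of the observed differences fall inside the limits
inside = sum(lower_limit <= d <= upper_limit for d in diffs)

print(f"limits of agreement: {lower_limit:.1f} to {upper_limit:.1f} litres/min")
print(f"{inside} of {len(diffs)} differences lie within the limits")
```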
Fig. 15.4. Difference versus sum for PEFR measured by two methods
On the basis of these data we would not conclude that the two methods are comparable or that the mini meter could reliably replace the Wright peak flow meter. As remarked in §10.2, this meter had received considerable wear.
When there is a relationship between the difference and the mean, we can try to remove it by a transformation. This is usually accomplished by the logarithm, and leads to an interpretation of the limits similar to that described in §15.2. Bland and Altman (1986, 1999) give details.
15.4 Sensitivity and specificity
One of the main reasons for making clinical measurements is to aid in diagnosis. This may be to identify one of several possible diagnoses in a patient, or to find people with a particular disease in an apparently healthy population. The latter is known as screening. In either case the measurement provides a test which enables us to classify subjects into two groups, one group whom we think are likely to have the disease in which we are interested, and another group unlikely to have the disease. When developing such a test, we need to compare the test result with a true diagnosis. The test may be based on a continuous variable and the disease indicated if it is above or below a given level, or it may be a qualitative observation such as carcinoma in situ cells on a cervical smear. In either case I shall call the test positive if it indicates the disease and negative if not, and the disease positive if the disease is later confirmed, negative if not.
How do we measure the effectiveness of the test? Table 15.5 shows three artificial sets of test and disease data. We could take as an index of test effectiveness the proportion giving the correct diagnosis from the test. For Test 1 in the example it is 94%. Now consider Test 2, which always gives a negative result. Test 2 will never detect any cases of the disease. We are now right for 95% of the subjects! However, the first test is useful, in that it detects some cases of the disease, and the second is not, so this is clearly a poor index.
Table 15.5. Some artificial test and diagnosis data
Disease   Test 1        Test 2        Test 3        Total
          +ve    -ve    +ve    -ve    +ve    -ve
Yes       4      1      0      5      2      3      5
No        5      90     0      95     0      95     95
Total     9      91     0      100    2      98     100
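There is no one simple index which enables us to compare different tests in all the ways we would like. This is because there are two things we need to measure: how good the test is at finding disease positives, i.e. those with the condition, and how good the test is at excluding disease negatives, i.e. those who do not have the condition. The indices conventionally employed to do this are:
$$\text{sensitivity} = \frac{\text{number who are both disease positive and test positive}}{\text{number who are disease positive}}$$
$$\text{specificity} = \frac{\text{number who are both disease negative and test negative}}{\text{number who are disease negative}}$$
In other words, the sensitivity is the proportion of disease positives who are test positive, and the specificity is the proportion of disease negatives who are test negative. For our three tests these are: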
Test1 Test2 Test3
Sensitivity 0.80 0.00 0.40
Specificity 0.95 1.00 1.00
Test 2, of course, misses all the disease positives and finds all the disease negatives, by saying all are negative. The difference between Tests 1 and 3 is brought out by the greater sensitivity of 1 and the greater specificity of 3. We are comparing tests in two dimensions. We can see that Test 3 is better than Test 2, because its sensitivity is higher and specificity the same. However, it is more difficult to see whether Test 3 is better than Test 1. We must come to a judgement based on the relative importance of sensitivity and specificity in the particular case.
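The three indices discussed for Table 15.5 can be computed side by side. This is a minimal sketch; the function name and the tuple layout (true positives, false negatives, false positives, true negatives) are illustrative choices.

```python
def summarize(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, proportion correct) for a 2 by 2 table."""
    sensitivity = tp / (tp + fn)   # disease positives who are test positive
    specificity = tn / (tn + fp)   # disease negatives who are test negative
    correct = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, correct

# Counts from Table 15.5: (tp, fn, fp, tn)
tests = {
    "Test 1": (4, 1, 5, 90),
    "Test 2": (0, 5, 0, 95),
    "Test 3": (2, 3, 0, 95),
}

for name, counts in tests.items():
    sens, spec, correct = summarize(*counts)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f} correct={correct:.2f}")
```

Test 2 has the second highest proportion correct while detecting no cases at all, which is why the proportion correct is a poor index on its own.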
Sensitivity and specificity are often multiplied by 100 to give percentages. They are both binomial proportions, so their standard errors and confidence intervals are found as described in §8.4 and §8.8. Because the proportions are often near to 1.0, the large sample approach (§8.4) may not be valid. The exact method using the Binomial probabilities (§8.8) is preferable. Harper and Reeves (1999) point out that confidence intervals are almost always omitted in studies of diagnostic tests reported outside the major general medical journals, and recommend that they should always be given. As the reader might expect, I agree with them! The sample size required for the reliable estimation of sensitivity and specificity can be calculated as described in §18.2.
Sometimes a test is based on a continuous variable. For example, Table 15.6 shows measurements of creatine kinase (CK) in patients with unstable angina and acute myocardial infarction (AMI). Figure 15.5(a) shows a scatter plot. We wish to detect patients with AMI among patients who may have either condition and this measurement is a potential test, AMI patients tending to have high values. How do we choose the cut-off point? The lowest CK in AMI patients is 90, so a cut-off below this will detect all AMI patients. Using 80, for example, we would detect all AMI patients, sensitivity = 1.00, but would also only have 42% of angina patients below 80, so the specificity = 0.42. We can alter the sensitivity and specificity by changing the cut-off point. Raising the cut-off point will mean fewer cases will be detected and so the sensitivity will be decreased. However, there will be fewer false positives, those who are positive on test but do not in fact have the disease, and the specificity will be increased. For example, if CK ≥ 100 were the criterion for AMI, sensitivity would be 0.96 and specificity 0.62. There is a trade-off between sensitivity and specificity. It can be helpful to plot sensitivity against specificity to examine this trade-off. This is called a receiver operating characteristic or ROC curve. (The name comes from telecommunications.)
We often plot sensitivity against 1 – specificity, as in Figure 15.5(b). We can see from Figure 15.5(b) that we can get both high sensitivity and high specificity if we choose the right cut-off. With 1 – specificity less than 0.1, i.e. specificity greater than 0.9, we can get sensitivity greater than 0.9 also. In fact, a cut-off of 200 would give sensitivity = 0.93 and specificity = 0.91 in this sample. These estimates will be biased, because we are estimating the cut-off and testing it in the same sample. We should check the sensitivity and specificity of this cut-off in a different sample to be sure.
Table 15.6. Creatine kinase in patients with unstable angina and acute myocardial infarction (AMI) (data of Frances Boa)
Unstable angina                            AMI
23   48   62   83    104   130   307       90    648
33   49   63   84    105   139   351       196   894
36   52   63   85    105   150   360       302   962
37   52   65   86    107   155             311   1015
37   52   65   88    108   157             325   1143
41   53   66   88    109   162             335   1458
41   54   67   88    111   176             347   1955
41   57   71   89    114   180             349   2139
42   57   72   91    116   188             363   2200
42   58   72   94    118   198             377   3044
43   58   73   94    121   226             390   7590
45   58   73   95    121   232             398   11138
47   60   75   97    122   257             545
48   60   80   100   126   257             577
48   60   80   103   130   297             629
Fig. 15.5. Scatter diagram and ROC curve for the data of Table 15.6
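The trade-off between sensitivity and specificity can be explored by evaluating a few cut-offs on the data of Table 15.6. The sketch below assumes that the flattened table is read as 93 unstable angina values and 27 AMI values, and that a subject is test positive when CK is at or above the cut-off. The function name is hypothetical.

```python
# Creatine kinase values from Table 15.6 (column split is an assumption)
angina = [
    23, 33, 36, 37, 37, 41, 41, 41, 42, 42, 43, 45, 47, 48, 48,
    48, 49, 52, 52, 52, 53, 54, 57, 57, 58, 58, 58, 60, 60, 60,
    62, 63, 63, 65, 65, 66, 67, 71, 72, 72, 73, 73, 75, 80, 80,
    83, 84, 85, 86, 88, 88, 88, 89, 91, 94, 94, 95, 97, 100, 103,
    104, 105, 105, 107, 108, 109, 111, 114, 116, 118, 121, 121, 122, 126, 130,
    130, 139, 150, 155, 157, 162, 176, 180, 188, 198, 226, 232, 257, 257, 297,
    307, 351, 360,
]
ami = [
    90, 196, 302, 311, 325, 335, 347, 349, 363, 377, 390, 398, 545, 577, 629,
    648, 894, 962, 1015, 1143, 1458, 1955, 2139, 2200, 3044, 7590, 11138,
]

def sens_spec(cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when CK >= cutoff is called test positive."""
    sensitivity = sum(ck >= cutoff for ck in ami) / len(ami)
    specificity = sum(ck < cutoff for ck in angina) / len(angina)
    return sensitivity, specificity

for cutoff in (100, 200, 300):
    sens, spec = sens_spec(cutoff)
    print(f"CK >= {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Evaluating `sens_spec` at every distinct observed value and plotting sensitivity against 1 – specificity would trace out the ROC curve.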
The area under the ROC curve is often quoted (here it is 0.9753). It estimates the probability that a member of one population chosen at random will exceed a member of the other population, in the same way as does $U/n_1n_2$ in the Mann–Whitney U test (§12.2). It can be useful in comparing different tests. In this study another blood test gave us an area under the ROC curve = 0.9825, suggesting that the test may be slightly better than CK.
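The interpretation of the area as a probability suggests a direct way to compute it: compare every AMI value with every angina value and count the proportion of pairs in which the AMI value is larger, with ties counted as one half. The sketch below does this for Table 15.6, again assuming the table is read as 93 angina and 27 AMI values.

```python
# Creatine kinase values from Table 15.6 (column split is an assumption)
angina = [
    23, 33, 36, 37, 37, 41, 41, 41, 42, 42, 43, 45, 47, 48, 48,
    48, 49, 52, 52, 52, 53, 54, 57, 57, 58, 58, 58, 60, 60, 60,
    62, 63, 63, 65, 65, 66, 67, 71, 72, 72, 73, 73, 75, 80, 80,
    83, 84, 85, 86, 88, 88, 88, 89, 91, 94, 94, 95, 97, 100, 103,
    104, 105, 105, 107, 108, 109, 111, 114, 116, 118, 121, 121, 122, 126, 130,
    130, 139, 150, 155, 157, 162, 176, 180, 188, 198, 226, 232, 257, 257, 297,
    307, 351, 360,
]
ami = [
    90, 196, 302, 311, 325, 335, 347, 349, 363, 377, 390, 398, 545, 577, 629,
    648, 894, 962, 1015, 1143, 1458, 1955, 2139, 2200, 3044, 7590, 11138,
]

# Count pairs in which the AMI value exceeds the angina value (ties count 1/2).
# This count is the Mann-Whitney U statistic for the AMI group.
u = 0.0
for a in ami:
    for b in angina:
        if a > b:
            u += 1.0
        elif a == b:
            u += 0.5

auc = u / (len(ami) * len(angina))
print(f"U = {u:.0f}, area under ROC curve = {auc:.4f}")
```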
We can also estimate the positive predictive value or PPV, the probability that a subject who is test positive will be a true positive (i.e. has the disease and is correctly classified), and the negative predictive value or NPV, the probability that a subject who is test negative will be a true negative (i.e. does not have the disease and is correctly classified). These depend on the prevalence of the condition, $p_{prev}$, as well as the sensitivity, $p_{sens}$, and the specificity, $p_{spec}$. If the sample is a single group of people, we know the prevalence and can estimate PPV and NPV for this population directly as simple proportions. If we started with a sample of cases and a sample of controls, we do not know the prevalence, but we can estimate PPV and NPV for a population with any given prevalence. As described in §6.8, $p_{sens}$ is the conditional probability of a positive test given the disease, so the probability of being both test positive and disease positive is $p_{sens} \times p_{prev}$. Similarly, the probability of being both test positive and disease negative is $(1 - p_{spec}) \times (1 - p_{prev})$. The probability of being test positive is the sum of these (§6.2): $p_{sens} \times p_{prev} + (1 - p_{spec}) \times (1 - p_{prev})$ and the PPV is
$$\mathrm{PPV} = \frac{p_{sens}\, p_{prev}}{p_{sens}\, p_{prev} + (1 - p_{spec})(1 - p_{prev})}$$
Similarly, the NPV is
$$\mathrm{NPV} = \frac{p_{spec}(1 - p_{prev})}{p_{spec}(1 - p_{prev}) + (1 - p_{sens})\, p_{prev}}$$
In screening situations the prevalence is almost always small and the PPV is low. Suppose we have a fairly sensitive and specific test, $p_{sens} = 0.95$ and $p_{spec} = 0.90$, and the disease has prevalence $p_{prev} = 0.01$ (1%). Then
$$\mathrm{PPV} = \frac{0.95 \times 0.01}{0.95 \times 0.01 + (1 - 0.90) \times (1 - 0.01)} = 0.088$$
$$\mathrm{NPV} = \frac{0.90 \times (1 - 0.01)}{0.90 \times (1 - 0.01) + (1 - 0.95) \times 0.01} = 0.999$$
so only 8.8% of test positives would be true positives, but almost all test negatives would be true negatives. Most screening tests are dealing with much smaller prevalences than this, so most test positives are false positives.
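The dependence of the predictive values on prevalence is easy to explore numerically. The sketch below applies the conditional probability argument above (Bayes' theorem) to the screening example; the function name is hypothetical, and the extra prevalences in the loop are illustrative values, not taken from the text.

```python
def predictive_values(p_sens: float, p_spec: float, p_prev: float) -> tuple[float, float]:
    """Return (PPV, NPV) for given sensitivity, specificity and prevalence."""
    true_pos = p_sens * p_prev               # test positive and disease positive
    false_pos = (1 - p_spec) * (1 - p_prev)  # test positive and disease negative
    true_neg = p_spec * (1 - p_prev)         # test negative and disease negative
    false_neg = (1 - p_sens) * p_prev        # test negative and disease positive
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# The worked example: sensitivity 0.95, specificity 0.90, prevalence 1%
ppv, npv = predictive_values(0.95, 0.90, 0.01)
print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")

# Illustrative prevalences showing how quickly the PPV falls
for prev in (0.10, 0.01, 0.001):
    p, _ = predictive_values(0.95, 0.90, prev)
    print(f"prevalence {prev}: PPV = {p:.3f}")
```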
15.5 Normal range or reference interval
In §15.4 we were concerned with the diagnosis of particular diseases. In this section we look at it the other way round and ask what values measurements on normal, healthy people are likely to have. There are difficulties in doing this. Who is 'normal' anyway? In the UK population almost everyone has hard fatty deposits in their coronary arteries, which result in death for many of them. Very few Africans have this; they die from other causes. So it is normal in the UK to have an abnormality. We usually say that normal people are the apparently healthy members of the local population. We can draw a sample of these as described in Chapter 3 and make the measurement on them.
The next problem is to estimate the set of values. If we use the range of the observations, the difference between the two most extreme values, we can be fairly confident that if we carry on sampling we will eventually find observations outside it, and the range will get bigger and bigger (§4.7). To avoid this we use a range between two quantiles (§4.7), usually the 2.5 centile and the 97.5 centile, which is called the normal range, 95% reference range or 95% reference interval. This leaves 5% of normals outside the 'normal range', which is the set of values within which 95% of measurements from apparently healthy individuals will lie.
A third difficulty comes from confusion between 'normal' as used in medicine and 'Normal distribution' as used in statistics. This has led some people to develop approaches which say that all data which do not fit under a Normal curve are abnormal! Such methods are simply absurd; there is no reason to suppose that all variables follow a Normal distribution (§7.4, §7.5). The term 'reference interval', which is becoming widely used, has the advantage of avoiding this confusion. However, the most commonly used method of calculation rests on the assumption that the variable follows a Normal distribution.
![Page 480: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/480.jpg)
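We have already seen that in general most observations fall within two standard deviations of the mean, and that for a Normal distribution 95% are within these limits with 2.5% below and 2.5% above. If we estimate the mean and standard deviation of data from a Normal population we can estimate the reference interval as $\bar{x} - 2s$ to $\bar{x} + 2s$.
Consider the FEV1 data of Table 4.5. We will estimate the reference interval for FEV1 in male medical students. We have 57 observations, mean 4.06 and standard deviation 0.67 litres. The reference interval is thus from 2.7 to 5.4 litres. From Table 4.4 we see that in fact only one student (2%) is outside these limits, although the sample is rather small.
For a sample from a Normal distribution, $\bar{x}$ and $s$ are independent, $\bar{x}$ has variance estimated by $s^2/n$ and $s$ has variance approximately $s^2/2n$, so the limit $\bar{x} \pm 2s$ has variance approximately $s^2/n + 4s^2/2n = 3s^2/n$. Hence, provided Normal assumptions hold, the standard error of the limit of the reference interval is

$$\sqrt{\frac{3s^2}{n}}$$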
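As a rough illustration, the Normal method can be written as a short Python function. The function name is invented for this sketch, and the standard error of each limit is taken to be $\sqrt{3s^2/n}$, the usual large-sample approximation under Normal assumptions; the inputs are the FEV1 summary statistics quoted above.

```python
import math


def normal_reference_interval(mean, sd, n):
    """Return (lower, upper, se_limit) for a 95% reference interval.

    Assumes the measurements follow a Normal distribution; se_limit uses
    the approximation sqrt(3 * sd**2 / n) for the standard error of a limit.
    """
    lower = mean - 2 * sd
    upper = mean + 2 * sd
    se_limit = math.sqrt(3 * sd ** 2 / n)
    return lower, upper, se_limit


# FEV1 in male medical students: n = 57, mean 4.06, SD 0.67 litres.
lower, upper, se_limit = normal_reference_interval(4.06, 0.67, 57)
print(f"{lower:.2f} to {upper:.2f} litres, SE of each limit {se_limit:.2f}")
```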
Compare the serum triglyceride measurements of Table 4.8. As already noted (§4.4, §7.4), the data are highly skewed, and we cannot use the Normal method directly. If we did, the lower limit would be 0.07, well below any of the observations, and the upper limit would be 0.94, greater than which are 5% of the observations. It is possible for such data to give a negative lower limit.
Because of the obviously unsatisfactory nature of the Normal method for some data, some authors have advocated the estimation of the percentiles directly (§4.5), without any distributional assumptions. This is an attractive idea. We want to know the point below which 2.5% of values will fall. Let us simply rank the observations and find the point below which 2.5% of the observations fall. For the 282 triglycerides, the 2.5 and 97.5 centiles are found as follows. For the 2.5 centile, we find $i = q(n + 1) = 0.025 \times (282 + 1) = 7.08$. The required quantile will be between the 7th and 8th observation. The 7th is 0.21, the 8th is 0.22, so the 2.5 centile would be estimated by $0.21 + (0.22 - 0.21) \times (7.08 - 7) = 0.211$. Similarly the 97.5 centile is 1.039.
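The ranking rule $i = q(n+1)$ with linear interpolation can be sketched as follows. The function name and the data are hypothetical: the 282 made-up values below are chosen only so that the 7th and 8th smallest are 0.21 and 0.22, as in the text; they are not the triglyceride measurements.

```python
def direct_quantile(sorted_values, q):
    """Estimate the q quantile from sorted data using i = q(n + 1)."""
    n = len(sorted_values)
    i = q * (n + 1)
    j = int(i)  # rank of the observation just below the quantile
    if j < 1:
        return sorted_values[0]
    if j >= n:
        return sorted_values[-1]
    below = sorted_values[j - 1]  # j-th smallest (ranks start at 1)
    above = sorted_values[j]      # (j + 1)-th smallest
    return below + (above - below) * (i - j)


# Hypothetical sorted data, n = 282, with 7th value 0.21 and 8th value 0.22.
values = [0.15 + 0.01 * k for k in range(282)]
lower_limit = direct_quantile(values, 0.025)
print(round(lower_limit, 3))
```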
This approach gives an unbiased estimate whatever the distribution. The log transformed triglyceride would give exactly the same results. Note that the Normal theory limits from the log transformed data are very similar. We now look at the confidence interval. The 95% confidence interval for the $q$ quantile, here $q$ being 0.025 or 0.975, estimated directly from the data is found by the Binomial distribution method (§8.9). For the triglyceride data, $n = 282$ and so for the lower limit, $q = 0.025$, we have

$$j = nq - 1.96\sqrt{nq(1-q)} = 282 \times 0.025 - 1.96\sqrt{282 \times 0.025 \times 0.975}$$

$$k = nq + 1.96\sqrt{nq(1-q)} = 282 \times 0.025 + 1.96\sqrt{282 \times 0.025 \times 0.975}$$

This gives $j = 1.9$ and $k = 12.2$, which we round up to $j = 2$ and $k = 13$. In the triglyceride data the second observation, corresponding to $j = 2$, is 0.16 and the 13th is 0.26. Thus the 95% confidence interval for the lower reference limit is 0.16 to 0.26. The corresponding calculation for $q = 0.975$ gives $j = 270$ and $k = 281$. The 270th observation is 0.96 and the 281st is 1.64, giving a 95% confidence interval for the upper reference limit of 0.96 to 1.64. These are wider confidence intervals than those found by the Normal method, those for the long tail particularly so. This method of estimating percentiles in long tails is relatively imprecise.
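The ranks $j$ and $k$ can be checked with a small Python sketch. The function name is invented here; it uses the large-sample Normal approximation to the Binomial, $nq \pm 1.96\sqrt{nq(1-q)}$, and rounds both ranks up as the text does.

```python
import math


def quantile_ci_ranks(n, q, z=1.96):
    """Ranks (j, k) of the observations bounding the 95% CI for the q quantile."""
    centre = n * q
    half_width = z * math.sqrt(n * q * (1 - q))
    j = math.ceil(centre - half_width)  # round up, as in the text
    k = math.ceil(centre + half_width)
    return j, k


lower_ranks = quantile_ci_ranks(282, 0.025)
upper_ranks = quantile_ci_ranks(282, 0.975)
print(lower_ranks, upper_ranks)
```

The confidence limits themselves are then read off as the $j$th and $k$th smallest observations.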
15.6* Survival data
We often have data which represent the time from some event to death, such as time from diagnosis or from entry to a clinical trial, but survival analysis does not have to be about death. In cancer studies we can use survival analysis for the time to metastasis or to local recurrence of a tumour; in a study of medical care we can use it to analyse the time to readmission to hospital; in a study of breast-feeding we could look at the age at which breast-feeding ceased or at which bottle feeding was first introduced; and in a study of the treatment of infertility we can treat the time from treatment to conception as survival data. We usually refer to the terminal event, death, conception, etc., as the endpoint.
Problems arise in the measurement of survival because often we do not know the exact survival times of all subjects. This is because some will still be surviving when we want to analyse the data. When cases have entered the study at different times, some of the recent entrants may be surviving, but only have been observed for a short time. Their observed survival time may be less than those cases admitted early in the study and who have since died. The method of calculating survival curves described below takes this into account. Observations which are known only to be greater than some value are right censored, often shortened to censored. (We get left censored data when the measurement method cannot detect anything below some cut-off value, and observations are recorded as ‘none detectable’. The rank methods in Chapter 12 are useful for such data.)
Table 15.7. Survival time in years of patients after diagnosis of parathyroid cancer

| Alive | Deaths |
|---|---|
| <1 | <1 |
| <1 | 2 |
| 1 | 6 |
| 1 | 6 |
| 4 | 7 |
| 5 | 9 |
| 6 | 9 |
| 8 | 11 |
| 10 | 14 |
| 10 | |
| 17 | |
Table 15.7 shows some survival data, for patients with parathyroid cancer. The survival times are recorded in completed years. A patient who survived for 6 years and then died can be taken as having lived for 6 years and then died in the seventh. In the first year from diagnosis, one patient died, two patients were observed for only part of this year, and 17 survived into the next year. The subjects who have only been observed for part of the year are censored, also called lost to follow-up or withdrawn from follow-up. (These are rather misleading names, often wrongly interpreted as meaning that these subjects have dropped out of the study. This may be the case, but most of these subjects are simply still alive and their further survival is unknown.) There is no information about the survival of these subjects after the first year, because it has not happened yet. These patients are only at risk of dying for part of the year and we cannot say that 1 out of 20 died as they may yet contribute another death in the first year. We can say that such patients will contribute half a year of risk, on average, so the number of patient years at risk in the first year is 18 (17 who survived and 1 who died) plus 2 halves for those withdrawn from follow-up, giving 19 altogether. We get an estimate of the probability of dying in the first year of 1/19, and an estimated probability of surviving of 1 - 1/19. We can do this for each year until the limits of the data are reached. We thus trace the survival of these patients estimating the probability of death or survival at each year and the cumulative probability of survival to each year. This set of probabilities is called a life table.
To carry out the calculation, we first set out for each year, $x$, the number alive at the start, $n_x$, the number withdrawn during the year, $w_x$, the number at risk, $r_x$, and the number dying, $d_x$ (Table 15.8). Thus in year 1 the number at the start is 20, the number withdrawn is 2, the number at risk is $r_1 = n_1 - \frac{1}{2}w_1 = 20 - \frac{1}{2} \times 2 = 19$ and the number of deaths is 1. As there were 2 withdrawals and 1 death the number at the start of year 2 is 17. For each year we calculate the probability of dying in that year for patients who have reached the beginning of it, $q_x = d_x/r_x$, and hence the probability of surviving to the next year, $p_x = 1 - q_x$. Finally we calculate the cumulative survival probability.
Table 15.8. Life table calculation for parathyroid cancer survival

| Year, $x$ | Number at start, $n_x$ | Withdrawn during year, $w_x$ | At risk, $r_x$ | Deaths, $d_x$ | Prob. of death, $q_x$ | Prob. of surviving year, $p_x$ |
|---|---|---|---|---|---|---|
| 1 | 20 | 2 | 19 | 1 | 0.0526 | 0.9474 |
| 2 | 17 | 2 | 16 | 0 | 0 | 1 |
| 3 | 15 | 0 | 15 | 1 | 0.0667 | 0.9333 |
| 4 | 14 | 0 | 14 | 0 | 0 | 1 |
| 5 | 14 | 1 | 13.5 | 0 | 0 | 1 |
| 6 | 13 | 1 | 12.5 | 0 | 0 | 1 |
| 7 | 12 | 1 | 11.5 | 2 | 0.1739 | 0.8261 |
| 8 | 9 | 0 | 9 | 1 | 0.1111 | 0.8889 |
| 9 | 8 | 1 | 7.5 | 0 | 0 | 1 |
| 10 | 7 | 0 | 7 | 2 | 0.2857 | 0.7143 |
| 11 | 5 | 2 | 4 | 0 | 0 | 1 |
| 12 | 3 | 0 | 3 | 1 | 0.3333 | 0.6667 |
| 13 | 2 | 0 | 2 | 0 | 0 | 1 |
| 14 | 2 | 0 | 2 | 0 | 0 | 1 |
| 15 | 2 | 0 | 2 | 1 | 0.5000 | 0.5000 |
| 16 | 1 | 0 | 1 | 0 | 0 | 1 |
| 17 | 1 | 0 | 1 | 0 | 0 | 1 |
| 18 | 1 | 1 | 0.5 | 0 | 0 | 1 |

$r_x = n_x - \frac{1}{2}w_x$, $q_x = d_x/r_x$, $p_x = 1 - q_x$, $P_x = p_x P_{x-1}$.
For the first year, this is the probability of surviving that year, $P_1 = p_1$. For the second year, it is the probability of surviving up to the start of the second year, $P_1$, times the probability of surviving that year, $p_2$, to give $P_2 = p_2 P_1$. The probability of surviving for 3 years is similarly $P_3 = p_3 P_2$, and so on. From this life table we can estimate the five year survival rate, a useful measure of prognosis in cancer. For the parathyroid cancer, the five year survival rate is 0.8842, or 88%. We can see that the prognosis for this cancer is quite good. If we know the exact time of death or withdrawal for each subject, then instead of using fixed time intervals we use $x$ as the exact time, with a row of the table for each time when either an endpoint or a withdrawal occurs. Then $r_x = n_x$ and we can omit the $r_x = n_x - \frac{1}{2}w_x$ step.
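The life table arithmetic is easy to reproduce in code. The sketch below is illustrative only: the function name and tuple layout are choices made here, while the yearly withdrawals and deaths are those listed in Table 15.8.

```python
def life_table(n_start, withdrawn, deaths):
    """Build life table rows (x, n, w, r, d, q, p, P) for yearly intervals."""
    rows = []
    n = n_start
    cumulative = 1.0
    for year, (w, d) in enumerate(zip(withdrawn, deaths), start=1):
        at_risk = n - w / 2   # withdrawals count as half a year at risk
        q = d / at_risk       # probability of dying during the year
        p = 1 - q             # probability of surviving the year
        cumulative *= p       # P_x = p_x * P_(x-1)
        rows.append((year, n, w, at_risk, d, q, p, cumulative))
        n = n - w - d         # number alive at the start of the next year
    return rows


# Withdrawals and deaths for years 1 to 18, from Table 15.8.
withdrawn = [2, 2, 0, 0, 1, 1, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 1]
deaths = [1, 0, 1, 0, 0, 0, 2, 1, 0, 2, 0, 1, 0, 0, 1, 0, 0, 0]
rows = life_table(20, withdrawn, deaths)
five_year_survival = rows[4][7]
print(f"Five year survival = {five_year_survival:.4f}")
```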
We can draw a graph of the cumulative survival probability, the survival curve. This is usually drawn in steps, with abrupt changes in probability (Figure 15.6). This convention emphasizes the relatively poor estimation at the long survival end of the curve, where the small numbers at risk produce large steps. When the exact times of death and censoring are known, this is called a Kaplan-Meier survival curve. The times at which observations are censored may be marked by small vertical lines above the survival curve (Figure 15.7), and the number remaining at risk may be written at suitable intervals below the time axis.
The standard error and confidence interval for the survival probabilities can be found (see Armitage and Berry 1994). These are useful for estimates such as five year survival rate. They do not provide a good method for comparing survival curves, as they do not include all the data, only using those up to the chosen time. Survival curves start off together at 100% survival, possibly diverge, but eventually come together at zero survival. Thus the comparison would depend on the time chosen. Survival curves can be compared by several significance tests, of which the best known is the logrank test. This is a non-parametric test which makes use of the full survival data without making any assumption about the shape of the survival curve.
Fig. 15.6. Survival curve for parathyroid cancer patients
Table 15.9 shows the time to recurrence of gallstones following dissolution by bile acid treatment or lithotripsy. Here we shall compare the two groups defined by having single or multiple gallstones, using the logrank test. We shall look at the quantitative variables diameter of gallstone and months to dissolve in §17.9. Figure 15.7 shows the time to recurrence for subjects with single primary gallstones and multiple primary gallstones. The null hypothesis is that there is no difference in recurrence-free survival time, the alternative that there is such a difference. The calculation of the logrank test is set out in Table 15.10. For each time at which a recurrence or a censoring occurred, we have the numbers under observation in each group, $n_1$ and $n_2$, the number of recurrences, $d_1$ and $d_2$ ($d$ for death), and the number of censorings, $w_1$ and $w_2$ ($w$ for withdrawal). For each time, we calculate the probability of recurrence, $p_d = (d_1 + d_2)/(n_1 + n_2)$, which each subject would have if the null hypothesis were true. For each group, we calculate the expected number of recurrences, $e_1 = p_d \times n_1$ and $e_2 = p_d \times n_2$. We then calculate the numbers at risk at the next time, $n_1 - d_1 - w_1$ and $n_2 - d_2 - w_2$. We do this for each time. We then add the $d_1$ and $d_2$ columns to get the observed numbers of recurrences, and the $e_1$ and $e_2$ columns to get the numbers of recurrences expected if the null hypothesis were true.
We have observed frequencies of recurrence $d_1$ and $d_2$, and expected frequencies $e_1$ and $e_2$. Of course, $d_1 + d_2 = e_1 + e_2$, so we only need to calculate $e_1$ as in Table 15.10, and hence $e_2$ by subtraction. This only works for two groups, however, and the method of Table 15.10 works for any number of groups.
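The accumulation of expected recurrences can be sketched as below. The function name and the row layout `(n1, d1, n2, d2)` are choices made for this illustration; the example rows are the first six times (3 to 8 months) of Table 15.10, so the totals printed are partial sums, not the full-table totals.

```python
def logrank_expected(rows):
    """Sum expected events in each group over rows of (n1, d1, n2, d2)."""
    e1 = 0.0
    e2 = 0.0
    for n1, d1, n2, d2 in rows:
        at_risk = n1 + n2
        if at_risk == 0:
            continue  # nobody under observation at this time
        p_d = (d1 + d2) / at_risk  # common event probability under the null
        e1 += p_d * n1
        e2 += p_d * n2
    return e1, e2


# Times 3, 4, 5, 6, 7 and 8 months from Table 15.10.
first_rows = [
    (65, 0, 79, 0),
    (64, 0, 77, 0),
    (64, 0, 76, 0),
    (63, 0, 76, 5),
    (62, 0, 66, 2),
    (62, 1, 63, 2),
]
e1, e2 = logrank_expected(first_rows)
print(f"e1 = {e1:.3f}, e2 = {e2:.3f}")
```

The two expected totals always add up to the observed number of events in the rows supplied, here 10.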
Table 15.9. Time to recurrence of gallstones following dissolution, whether previous gallstones were multiple, maximum diameter of previous gallstones, and months previous gallstones took to dissolve
Time Rec. Mult. Diam. Dis. Time Rec. Mult. Diam.
3 No Yes 4 10 13 No No 11
3 No No 18 3 13 No No 22
3 No Yes 5 27 13 No No 13
4 No Yes 4 4 13 Yes Yes
5 No No 19 20 14 No Yes
6 No Yes 3 10 14 No No 23
6 No Yes 4 6 14 No No 15
6 No Yes 4 20 16 Yes Yes
6 Yes Yes 5 8 16 Yes Yes
6 Yes Yes 3 18 16 No No 18
6 Yes Yes 7 9 17 No No
6 No No 25 9 17 No Yes
6 No Yes 4 6 17 No Yes
6 Yes Yes 10 38 17 Yes No
6 Yes Yes 8 15 17 No Yes
6 No Yes 4 13 18 Yes No 10
7 Yes Yes 4 15 18 Yes Yes
7 No Yes 3 7 18 No Yes 11
7 Yes Yes 10 48 19 No No 26
8 Yes Yes 14 29 19 No Yes 11
8 Yes No 18 14 19 Yes Yes
8 Yes Yes 6 6 20 No No 11
8 No No 15 1 20 No No 13
8 No Yes 1 12 20 No No
8 No Yes 5 6 21 No Yes 11
9 No Yes 2 15 21 No Yes 13
9 Yes Yes 7 6 21 No Yes
9 No No 19 8 22 No No 10
10 Yes Yes 14 8 22 No No 20
11 No Yes 8 12 23 No No 16
11 No No 15 15 24 No No 15
11 Yes No 5 8 24 No Yes
11 No Yes 3 6 24 No No 15
11 Yes Yes 5 12 24 Yes Yes
11 No Yes 4 6 25 No No 13
11 No Yes 4 3 25 Yes Yes
11 No Yes 13 18 25 No No
11 Yes No 7 8 26 No No 17
12 Yes Yes 5 7 26 No Yes
12 Yes Yes 8 12 26 Yes No 16
12 No Yes 4 6 28 No No 20
12 No Yes 4 8 28 Yes No 30
12 Yes Yes 7 19 29 No No 16
12 Yes No 7 3 29 Yes No 12
12 No Yes 5 22 29 Yes Yes 10
12 Yes No 8 1 29 No Yes
12 No No 6 6 30 No Yes
12 No No 26 4 30 No No
13 No Yes 5 6 30 Yes Yes 22
13 No No 13 6 30 Yes Yes
31 No Yes 5 6 38 No No 10
31 No No 26 3 38 Yes Yes
31 No No 7 24 38 No No
32 Yes Yes 10 12 40 No No 23
32 No Yes 5 6 41 No No 16
32 No No 4 6 41 No No
32 No No 18 10 42 No No 15
33 No No 13 9 42 No Yes 16
34 No No 15 8 42 No Yes
34 No No 20 30 42 No Yes 14
34 No Yes 15 8 43 Yes No
34 No No 27 8 44 No Yes
35 No No 6 12 44 No Yes 10
36 No No 18 5 45 No No 12
36 No Yes 6 16 47 No Yes
36 No Yes 5 6 48 No No 21
36 No Yes 8 17 48 No No
36 No No 5 4 53 No Yes
37 No Yes 5 7 60 Yes No 15
37 No No 19 4 61 No No 10
37 No Yes 4 4 65 No Yes
37 No Yes 4 12 70 No Yes
Fig. 15.7. Gallstone-free survival after the dissolution of single and multiple gallstones
Table 15.10. Calculation for the logrank test
Time n1 d1 w1 n2 d2 w2 pd e1
3 65 0 1 79 0 2 0.000 0.000
4 64 0 0 77 0 1 0.000 0.000
5 64 0 1 76 0 0 0.000 0.000
6 63 0 1 76 5 5 0.036 2.266
7 62 0 0 66 2 1 0.016 0.969
8 62 1 1 63 2 2 0.024 1.488
9 60 0 1 59 1 1 0.008 0.504
10 59 0 0 57 1 0 0.009 0.509
11 59 2 1 56 1 5 0.026 1.539
12 56 2 2 50 3 3 0.047 2.642
13 52 0 4 44 1 1 0.010 0.542
14 48 0 2 42 0 1 0.000 0.000
16 46 0 1 41 2 0 0.023 1.057
17 45 1 1 39 0 3 0.012 0.536
18 43 1 0 36 1 1 0.025 1.089
19 42 0 1 34 1 1 0.013 0.553
20 41 0 3 32 0 0 0.000 0.000
21 38 0 0 32 0 3 0.000 0.000
22 38 0 2 29 0 0 0.000 0.000
23 36 0 1 29 0 0 0.000 0.000
24 35 0 2 29 1 1 0.016 0.547
25 33 0 2 27 1 0 0.017 0.550
26 31 1 1 26 0 1 0.018 0.544
28 29 1 1 25 0 0 0.019 0.537
29 27 1 1 25 1 1 0.038 1.038
30 25 0 1 23 2 1 0.042 1.042
31 24 0 2 20 0 1 0.000 0.000
32 22 0 2 19 1 1 0.024 0.537
33 20 0 1 17 0 0 0.000 0.000
34 19 0 3 17 0 1 0.000 0.000
35 16 0 1 16 0 0 0.000 0.000
36 15 0 2 16 0 3 0.000 0.000
37 13 0 1 13 0 3 0.000 0.000
38 12 0 2 10 1 0 0.045 0.545
40 10 0 1 9 0 0 0.000 0.000
41 9 0 2 9 0 0 0.000 0.000
42 7 0 1 9 0 3 0.000 0.000
43 6 1 0 6 0 0 0.083 0.500
44 5 0 0 4 0 2 0.000 0.000
45 5 0 1 4 0 0 0.000 0.000
47 4 0 0 4 0 1 0.000 0.000
48 4 0 2 3 0 0 0.000 0.000
53 2 0 0 3 0 1 0.000 0.000
60 2 1 0 2 0 0 0.250 0.500
61 1 0 1 2 0 0 0.000 0.000
65 0 0 0 2 0 1 0.000 0.000
70 0 0 0 1 0 1 0.000 0.000
Total 12 27 20.032
$p_d = (d_1 + d_2)/(n_1 + n_2)$, $e_1 = p_d n_1$, $e_2 = p_d n_2$.
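We can test the null hypothesis that the risk of recurrence in any month is equal for the two populations by a chi-squared test:

$$\sum \frac{(d_i - e_i)^2}{e_i} = \frac{(12 - 20.032)^2}{20.032} + \frac{(27 - 18.968)^2}{18.968} = 6.62$$

where $e_2 = 12 + 27 - 20.032 = 18.968$. There is one constraint, that the two frequencies add to the sum of the expected (i.e. the total number of recurrences), so we lose one degree of freedom, giving 2 - 1 = 1 degree of freedom. From Table 13.3, this has a probability of 0.01.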
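A short sketch of the test statistic, using the totals from Table 15.10. The function name is invented here. The P value uses the identity that, for one degree of freedom, the upper tail probability of chi-squared is `erfc(sqrt(x / 2))`, which avoids needing a statistics library; this is a convenience of the sketch rather than the table lookup used in the text.

```python
import math


def logrank_chi_squared(d1, d2, e1):
    """Two-group logrank chi-squared statistic and its P value (1 d.f.)."""
    e2 = d1 + d2 - e1  # expected totals must add to the observed total
    chi2 = (d1 - e1) ** 2 / e1 + (d2 - e2) ** 2 / e2
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    return chi2, p_value


chi2, p_value = logrank_chi_squared(d1=12, d2=27, e1=20.032)
print(f"chi-squared = {chi2:.2f}, P = {p_value:.3f}")
```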
Some texts describe this test differently, saying that under the null hypothesis $d_1$ is from a Normal distribution with mean $e_1$ and variance $e_1 e_2/(e_1 + e_2)$. This is algebraically identical to the chi-squared method, but only works for two groups.
The logrank test is non-parametric, because we make no assumptions about either the distribution of survival time or any difference in recurrence rates. It requires the survival or censoring times to be exact. A similar method for grouped data as in Table 15.8 is given by Mantel (1966).
The logrank test is a test of significance and, of course, an estimate of the difference is preferable if we can get one. The logrank test calculation can be used to give us one: the hazard ratio. This is the ratio of the risk of death in group 1 to the risk of death in group 2. For this to make sense, we have to assume that this ratio is the same at all times, otherwise there could not be a single estimate. (Compare the paired t method, §10.2.) The risk of death is the number of deaths divided by the population at risk, but the population keeps changing due to censoring. However, the populations at risk in the two groups are proportional to the numbers of expected deaths, $e_1$ and $e_2$. We can thus calculate the hazard ratio by

$$h = \frac{d_1/e_1}{d_2/e_2}$$

For Table 15.10, we have

$$h = \frac{12/20.032}{27/18.968} = 0.42$$

Thus we estimate the risk of recurrence with single stones to be 0.42 times the risk for multiple stones. The direct calculation of a confidence interval for the hazard ratio is tedious and I shall omit it. Altman (1991) gives details. It can also be done by Cox regression (§17.9).
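The hazard ratio from the logrank totals is a one-line calculation; the sketch below, with an invented function name, uses the observed and expected totals from Table 15.10.

```python
def hazard_ratio(d1, d2, e1, e2):
    """Ratio of observed/expected events in group 1 to that in group 2."""
    return (d1 / e1) / (d2 / e2)


e1 = 20.032
e2 = 12 + 27 - e1  # expected recurrences in group 2, by subtraction
h = hazard_ratio(d1=12, d2=27, e1=e1, e2=e2)
print(f"hazard ratio = {h:.2f}")
```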
15.7* Computer aided diagnosis
Reference intervals (§15.5) are one area where statistical methods are involved directly in diagnosis; computer aided diagnosis is another. The ‘aided’ is put in to persuade clinicians that the main purpose is not to do them out of a job, but, naturally, they have their doubts. Computer aided diagnosis is partly a statistical exercise. There are two types of computer aided diagnosis: statistical methods, where diagnosis is based on a set of data obtained from past cases, and decision tree methods, which try to imitate the thought processes of an expert in the field. We shall look briefly at each approach.
There are several methods of statistical computer aided diagnosis. One uses discriminant analysis. In this we start with a set of data on subjects whose diagnosis was subsequently confirmed, and calculate one or more discriminant functions. A discriminant function has the form:

$$\text{constant}_1 \times \text{variable}_1 + \text{constant}_2 \times \text{variable}_2 + \dots + \text{constant}_k \times \text{variable}_k$$

The constants are calculated so that the values of the functions are as similar as possible for members of the same group and as different as possible for members of different groups. In the case of only two groups, we have one discriminant function and all the subjects in one group will have high values of the function and all subjects in the other will have low values. For each new subject we evaluate the discriminant function and use it to allocate the subject to a group or diagnosis. We can estimate the probability of the subject falling in that group, and in any other. Many forms of discriminant analysis have been developed to try and improve this form of computer diagnosis, but it does not seem to make much difference which is used. Logistic regression (§17.8) can also be used.
A different approach uses Bayesian analysis. This is based on Bayes' theorem, a result about conditional probability (§6.8) which may be stated in terms of the probability of diagnosis A being true if we have observed data B, as:

$$\mathrm{PROB}(\text{diagnosis A} \mid \text{data B}) = \frac{\mathrm{PROB}(\text{data B} \mid \text{diagnosis A}) \times \mathrm{PROB}(\text{diagnosis A})}{\mathrm{PROB}(\text{data B})}$$

If we have a large data set of known diagnoses and their associated symptoms and signs, we can determine PROB(diagnosis A) easily. It is simply the proportion of times A has been diagnosed. The problem of finding the probability of a particular combination of symptoms and signs is more difficult. If they are all independent, we can say that the probability of a given symptom is the proportion of times it occurs, and the probability of the symptom for each diagnosis is found in the same way. The probability of any combination of symptoms can be found by multiplying their individual probabilities together, as described in §6.2. In practice the assumption that signs and symptoms are independent is most unlikely to be met and a more complicated analysis would be required to deal with this. However, some systems of computer aided diagnosis have been found to work quite well with the simple approach.
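The simple independence approach can be sketched as follows. Everything in this example is hypothetical: the diagnoses, symptoms and probabilities are made up for illustration, and the function name is invented. One design choice differs slightly from the description above: the sketch assumes symptoms are independent within each diagnosis and that the listed diagnoses are mutually exclusive and exhaustive, so PROB(data B) is obtained by summing over diagnoses rather than from overall symptom proportions.

```python
def simple_bayes_diagnosis(priors, symptom_probs, observed):
    """Posterior probability of each diagnosis given observed symptoms.

    priors:        {diagnosis: PROB(diagnosis)}
    symptom_probs: {diagnosis: {symptom: PROB(symptom present | diagnosis)}}
    observed:      {symptom: True if present, False if absent}
    """
    joint = {}
    for diagnosis, prior in priors.items():
        likelihood = 1.0
        for symptom, present in observed.items():
            p = symptom_probs[diagnosis][symptom]
            likelihood *= p if present else (1 - p)  # independence assumption
        joint[diagnosis] = prior * likelihood
    total = sum(joint.values())  # PROB(data B), summed over all diagnoses
    return {diagnosis: value / total for diagnosis, value in joint.items()}


# Hypothetical proportions, as might be taken from a data set of past cases.
priors = {"A": 0.2, "other": 0.8}
symptom_probs = {
    "A": {"fever": 0.9, "pain": 0.7},
    "other": {"fever": 0.1, "pain": 0.3},
}
posterior = simple_bayes_diagnosis(priors, symptom_probs,
                                   {"fever": True, "pain": True})
print(posterior)
```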
Expert or knowledge-based systems work in a different way. Here the knowledge of a human expert or group of experts in the field is converted into a series of decision rules, e.g. ‘if the patient has high CK then the patient has myocardial infarction, if not then on to the next decision’. These systems can be modified by asking further experts to test the system with cases from their own experience and to suggest further decision rules if the program fails. They also have the advantage that the program can ‘explain’ the reason for its ‘decision’ by listing the series of steps which led to it. Most of Chapter 14 consists of rules of just this type and could be turned into an expert system for statistical analysis.
Although there have been some impressive achievements in the field of computer diagnosis, it has to date made little progress towards acceptance in routine medical practice. As computers become more familiar to clinicians, more common in their surgeries and more powerful in terms of data storage and processing speed, we may expect computer aided diagnosis to become as well established as computer aided statistical analysis is today.
15.8* Number needed to treat
When a clinical trial has a dichotomous outcome measure, such as survival or death, there are several ways in which we can express the difference between the two treatments. These include the difference between proportions of successes, ratio of proportions (risk ratio or relative risk), and the odds ratio. The number needed to treat (NNT) is the number of patients we would need to treat with the new treatment to achieve one more success than we would on the old treatment (Laupacis et al. 1988; Cook and Sackett 1995). It is the reciprocal of the difference between the proportion of success on the new treatment and the proportion on the old treatment. For example, in the MRC streptomycin trial (Table 2.10) the survival rates after 6 months were 93% in the streptomycin group and 73% in the control group. The difference in proportions surviving was thus 0.93 - 0.73 = 0.20 and the number needed to treat to prevent one death over six months was 1/0.20 = 5. The smaller the NNT, the more effective the treatment will be. The smallest possible value for NNT is 1.0, when the proportions successful are 1.0 and 0.0. This would mean that the new treatment was always effective and the old treatment was never effective. The NNT cannot be zero. If the treatment has no effect at all, the NNT will be infinite, because the difference in the proportion of successes will be zero. If the treatment is harmful, so that the success rate is less than on the control treatment, the NNT will be negative. The number is then called the number needed to harm (NNH). This idea has caught on very quickly and has been widely used and developed, for example as the number needed to screen (Rembold 1998).
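As a minimal sketch, the NNT is just a reciprocal, with the no-effect case handled separately. The function name is invented for this illustration; the proportions are the 6-month survival rates quoted for the MRC streptomycin trial.

```python
import math


def number_needed_to_treat(p_new, p_old):
    """NNT as the reciprocal of the difference in success proportions.

    A negative result means the new treatment does worse (number needed
    to harm); no difference at all gives an infinite NNT.
    """
    difference = p_new - p_old
    if difference == 0:
        return math.inf
    return 1 / difference


nnt = number_needed_to_treat(0.93, 0.73)
print(f"NNT = {nnt:.1f}")
```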
The NNT is an estimate and should have a confidence interval. This is apparently quite straightforward. We find the confidence interval for the difference in the proportions, and the reciprocals of these limits are the confidence limits for the NNT. For the MRC streptomycin trial the 95% confidence interval for the difference is 0.0578 to 0.3352, reciprocals 17.3 and 3.0. Thus the 95% confidence interval for the NNT is 3 to 17.
This is deceptively simple. As Altman (1998) pointed out, there are problems when the difference is not significant. The confidence interval for the difference between proportions includes zero, so infinity is a possible value for NNT, and negative values are also possible, i.e. the treatment may harm. The confidence interval must allow for this.
For example, Henzi et al. (2000) calculated NNT for several studies, including that of Lopez-Olaondo et al. (1996). This study compared dexamethasone against placebo to prevent postoperative nausea and vomiting. They observed nausea in 5/25 patients on dexamethasone and 10/25 on placebo. Thus the difference in proportions without nausea (success) is 0.80 - 0.60 = 0.20, 95% confidence interval -0.0479 to 0.4479 (§8.6). The number needed to treat is the reciprocal of this difference, 1/0.20 = 5.0. The reciprocals of the confidence limits are 1/(-0.0479) = -20.9 and 1/0.4479 = 2.2. But the confidence interval for the NNT is not -20.9 to 2.2. Zero, which this includes, is not a possible value for the NNT. Since there may be no treatment difference at all, zero difference between proportions, the NNT may be infinite. In fact, the confidence interval for NNT is not the values between -20.9 and 2.2, but the values outside this interval, i.e. 2.2 to infinity (number needed to achieve an extra success, NNT) and minus infinity to -20.9 (number needed to achieve an extra failure, NNH). Thus the NNT is estimated to be anything greater than 2.2, and the NNH to be anything greater than 20.9. The confidence interval is in two parts, -∞ to -20.9 and 2.2 to ∞. (‘∞’ is the symbol for infinity.) Henzi et al. (2000) quote this confidence interval as 2.2 to -21, which they say the reader should interpret as including infinity. Altman (1998) recommends ‘NNTH = 20.9 to ∞ to NNTB 2.2’, where NNTH means ‘number needed to harm’ and NNTB means ‘number needed to benefit’. I prefer ‘-∞ to -20.9, 2.2 to ∞’. Here -∞ and ∞ each tell us that it does not matter which treatment is used.
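The two-part interval can be sketched in code. The function name and return format (a list of one or two `(lower, upper)` pairs) are choices made for this illustration, and the confidence interval for the difference uses the simple large-sample standard error for two independent proportions, which is assumed here to be the method of §8.6. The data are the nausea-free counts from the dexamethasone example.

```python
import math


def nnt_confidence_interval(success_new, n_new, success_old, n_old, z=1.96):
    """95% CI for NNT as a list of one or two (lower, upper) parts."""
    p_new = success_new / n_new
    p_old = success_old / n_old
    difference = p_new - p_old
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    low = difference - z * se
    high = difference + z * se
    if low > 0 or high < 0:
        # Interval for the difference excludes zero: a single range.
        return [(1 / high, 1 / low)]
    parts = []
    if low < 0:
        parts.append((-math.inf, 1 / low))   # possible harm (NNH)
    if high > 0:
        parts.append((1 / high, math.inf))   # possible benefit (NNT)
    return parts


# Without nausea: 20/25 on dexamethasone, 15/25 on placebo.
parts = nnt_confidence_interval(20, 25, 15, 25)
for lower, upper in parts:
    print(f"{lower:.1f} to {upper:.1f}")
```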
Two-part confidence intervals are not exactly intuitive and I think that the problems of interpretation of NNT in negative trials limit its value to being a supplementary description of trial results.
15M Multiple choice questions 81 to 86
(Each answer is true or false)
81.* The repeatability or precision of measurements may be measured by:
(a) the coefficient of variation of repeated measurements;
(b) the standard deviation of measurements between subjects;
(c) the standard deviation of the difference between pairs of measurements;
(d) the standard deviation of repeated measurements within subjects;
(e) the difference between the means of two sets of measurements on the same set of subjects.
82. The specificity of a test for a disease:
(a) has a standard error derived from the Binomial distribution;
(b) measures how well the test detects cases of the disease;
(c) measures how well the test excludes subjects without the disease;
(d) measures how often a correct diagnosis is obtained from the test;
(e) is all we need to tell us how good the test is.
83. The level of an enzyme measured in blood is used as a diagnostic test for a disease, the test being positive if the enzyme concentration is above a critical value. The sensitivity of the diagnostic test:
(a) is one minus the specificity;
(b) is a measure of how well the test detects cases of the disease;
(c) is the proportion of people with the disease who are positive on the test;
(d) increases if the critical value is lowered;
(e) measures how well people without the disease are excluded.
84. A 95% reference interval, 95% reference range, or normal range:
(a) may be calculated as two standard deviations on either side of the mean;
(b) may be calculated directly from the frequency distribution;
(c) can only be calculated if the observations follow a Normal distribution;
(d) gets wider as the sample size increases;
(e) may be calculated from the mean and its standard error.
85. If the 95% reference interval for haematocrit in men is 43.2 to 49.2:
(a) any man with haematocrit outside these limits is abnormal;
(b) haematocrits outside these limits are proof of disease;
(c) a man with a haematocrit of 46 must be very healthy;
(d) a woman with a haematocrit of 48 has a haematocrit within normal limits;
(e) a man with a haematocrit of 42 may be ill.
86.* When a survival curve is calculated from censored survival times:
(a) the estimated proportion surviving becomes less reliable as survival time increases;
(b) individuals withdrawn during the first time interval are excluded from the analysis;
(c) survival estimates depend on the assumption that survival rates remain constant over the study period;
(d) it may be that the survival curve will not reach zero survival;
(e) the five year survival rate can be calculated even if some of the subjects were identified less than five years ago.
15E Exercise: A reference interval
In this exercise we shall estimate a reference interval. Mather et al. (1979) measured plasma magnesium in 140 apparently healthy people, to compare with a sample of diabetics. The normal sample was chosen from blood donors and people attending day centres for the elderly in the area of St. George's Hospital, to give 10 male and 10 female subjects in each age decade from 15–24 to 75 years and over. Questionnaires were used to exclude any subject with persistent diarrhoea, excessive alcohol intake or who were on regular drug therapy other than hypnotics and mild analgesics in the elderly. The distribution of plasma magnesium is shown in Figure 15.8. The mean was 0.810 mmol/litre and the standard deviation 0.057 mmol/litre.
Fig. 15.8. Distribution of plasma magnesium in 140 apparently healthy people
1. What do you think of the sampling method? Why use blood donors and elderly people attending day centres?
2. Why were some potential subjects excluded? Was this a good idea? Why were certain drugs allowed for the elderly?
3. Does plasma magnesium appear to follow a Normal distribution?
4. What is the reference interval for plasma magnesium, using the Normal distribution method?
5. Find confidence intervals for the reference limits.
6. Would it matter if mean plasma magnesium in normal people increased with age? What method might be used to improve the estimate of the reference interval in this case?
16 Mortality statistics and population structure
16.1 Mortality rates
Mortality statistics are one of our principal sources of information about the changing pattern of disease within a country and the differences in disease between countries. In most developed countries, any death must be certified by a doctor, who records the cause, date and place of death and some data about the deceased. In Britain, these include the date of birth, area of residence and known occupation. These death certificates form the raw material from which mortality statistics are compiled by a national bureau of censuses, in Britain the Office for National Statistics. The numbers of deaths can be tabulated by cause, sex, age, types of occupation, area of residence, and marital status. Table 5.1 shows one such tabulation, of deaths by cause and sex.
For purposes of comparison we must relate the number of deaths to the number in the population in which they occur. We have this information fairly reliably at 10 year intervals from the decennial census of the country. We can estimate the size and age and sex structure of the population between censuses using registration of births and deaths. Each birth or death is notified to an official registrar, and so we can keep some track of changes in the population. There are other, less well documented changes taking place, such as immigration and emigration, which mean that population size estimates between the census years are only approximations. Some estimates, such as the numbers in different occupations, are so unreliable that mortality data is only tabulated by them for census years.
If we take the number of deaths over a given period of time and divide it by the number in the population and the time period, we get a mortality rate, the number of deaths per unit time per person. We usually take the number of deaths over one calendar year, although when the number of deaths is small we may take deaths over several years, to increase the precision of the numerator. The number in the population is changing continually, and we take as the denominator the estimated population at the mid-point of the time period. Mortality rates are often very small numbers, so we usually multiply them by a constant, such as 1000 or 100000, to avoid strings of zeros after the decimal point.
When we are dealing with deaths in the whole population, irrespective of age, the rate we obtain is called the crude mortality rate or crude death rate. The terms ‘death rate’ and ‘mortality rate’ are used interchangeably. We calculate the crude mortality rate for a population as:
$$\text{crude mortality rate} = \frac{\text{deaths in period} \times 1000}{\text{population at mid-point of period} \times \text{length of period}}$$
If the period is in years, this gives the crude mortality rate as deaths per 1000 population per year.
Table 16.1. Age-specific mortality rates and age distribution in adult males, England and Wales, 1901 and 1981

| Age group (years) | Age-specific death rate per 1000 per year, 1901 | Age-specific death rate per 1000 per year, 1981 | % adult population in age group, 1901 | % adult population in age group, 1981 |
|---|---|---|---|---|
| 15–19 | 3.5 | 0.8 | 15.36 | 11.09 |
| 20–24 | 4.7 | 0.8 | 14.07 | 9.75 |
| 25–34 | 6.2 | 0.9 | 23.76 | 18.81 |
| 35–44 | 10.6 | 1.8 | 18.46 | 15.99 |
| 45–54 | 18.0 | 6.1 | 13.34 | 14.75 |
| 55–64 | 33.5 | 17.7 | 8.68 | 14.04 |
| 65–74 | 67.8 | 45.6 | 4.57 | 10.65 |
| 75–84 | 139.8 | 105.2 | 1.58 | 4.28 |
| 85+ | 276.5 | 226.2 | 0.17 | 0.64 |
The crude mortality rate is so called because no allowance is made for the age distribution of the population, and comparisons between populations with different age structures may therefore be misleading. For example, in 1901 the crude mortality rate among adult males (aged over 15 years) in England and Wales was 15.7 per 1000 per year, and in 1981 it was 14.8 per 1000 per year. It seems strange that with all the improvements in medicine, housing and nutrition between these times there has been so little improvement in the crude mortality rate. To see why we must look at the age-specific mortality rates, the mortality rates within narrow age groups. Age-specific mortality rates are usually calculated for one, five or ten year age groups. In 1901 the age-specific mortality rate for men aged 15 to 19 was 3.5 deaths per 1000 per year, whereas in 1981 it was only 0.8. As Table 16.1 shows, the age-specific mortality rate in 1901 was greater than that in 1981 for every age group. However, in 1901 there was a much greater proportion of the population in the younger age groups, where mortality was low, than there was in 1981. Correspondingly, there was a smaller proportion of the 1901 population than the 1981 population in the higher mortality older age groups. Although mortality was lower at any given age in 1981, the greater proportion of older people meant that there were almost as many deaths as in 1901.
To eliminate the effects of different age structures in the populations which we want to compare, we can look at the age-specific death rates. But if we are comparing several populations, this is a rather cumbersome procedure, and it is often more convenient to calculate a single summary figure from the age-specific rates. There are many ways of doing this, of which three are frequently used: the direct and indirect methods of age standardization and the life table.
Table 16.2. Calculation of the age-standardized mortality rate by the direct method

| Age group (years) | Standard proportion in age group (a) | Observed mortality rate per 1000 (b) | a × b |
|---|---|---|---|
| 15–19 | 0.1536 | 0.8 | 0.1229 |
| 20–24 | 0.1407 | 0.8 | 0.1126 |
| 25–34 | 0.2376 | 0.9 | 0.2138 |
| 35–44 | 0.1846 | 1.8 | 0.3323 |
| 45–54 | 0.1334 | 6.1 | 0.8137 |
| 55–64 | 0.0868 | 17.7 | 1.5364 |
| 65–74 | 0.0457 | 45.6 | 2.0839 |
| 75–84 | 0.0158 | 105.2 | 1.6622 |
| 85+ | 0.0017 | 226.2 | 0.3845 |
| Sum | | | 7.2623 |
16.2 Age standardization using the direct method
I shall describe the direct method first. We use a standard population structure, i.e. a standard age distribution or set of proportions of people in each age group. We then calculate the overall mortality rate which a population with the standard age structure would have if it experienced the age-specific mortality rates of the observed population, the population whose mortality rate is to be adjusted. We shall take the 1901 population as the standard and calculate the mortality rate the 1981 population would have experienced if the age distribution were the same as in 1901. We do this by multiplying each 1981 age-specific mortality rate by the proportion in that age group in the standard 1901 population, and adding. This then gives us an average mortality rate for the whole population, the age-standardized mortality rate. For example, the 1981 mortality rate in age group 15–19 was 0.8 per 1000 per year and the proportion in the standard population in this age group is 15.36% or 0.1536. The contribution of this age group is 0.8 × 0.1536 = 0.1229. The calculation is set out in Table 16.2.
If we used the population's own proportions in each age group in this calculation we would get the crude mortality rate. Since 1901 has been chosen as the standard population, its crude mortality rate of 15.7 is also the age-standardized mortality rate. The age-standardized mortality rate for 1981 was 7.3 per 1000 men per year. We can see that there was a much higher age-standardized mortality in 1901 than 1981, reflecting the difference in age-specific mortality rates.
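The direct method is a weighted sum, so it is easy to check by computer. The following is a minimal Python sketch, not part of the original text, using the rates and proportions of Table 16.1; the function and variable names are my own. It also illustrates the point that weighting by a population's own age distribution recovers its crude rate.

```python
# Age-specific death rates per 1000 per year and age distributions (as
# proportions) for adult males, England and Wales, copied from Table 16.1.
rates_1901 = [3.5, 4.7, 6.2, 10.6, 18.0, 33.5, 67.8, 139.8, 276.5]
rates_1981 = [0.8, 0.8, 0.9, 1.8, 6.1, 17.7, 45.6, 105.2, 226.2]
props_1901 = [0.1536, 0.1407, 0.2376, 0.1846, 0.1334, 0.0868, 0.0457, 0.0158, 0.0017]
props_1981 = [0.1109, 0.0975, 0.1881, 0.1599, 0.1475, 0.1404, 0.1065, 0.0428, 0.0064]


def weighted_rate(rates, proportions):
    """Sum of age-specific rates, each weighted by the proportion in its age group."""
    return sum(r * p for r, p in zip(rates, proportions))


# 1981 rates applied to the 1901 (standard) age structure: direct standardization.
standardized_1981 = weighted_rate(rates_1981, props_1901)

# Weighting by a population's own age structure gives back its crude rate.
crude_1901 = weighted_rate(rates_1901, props_1901)
crude_1981 = weighted_rate(rates_1981, props_1981)

print(f"Age-standardized 1981 rate: {standardized_1981:.1f} per 1000 per year")
print(f"Crude rates: 1901 = {crude_1901:.1f}, 1981 = {crude_1981:.1f} per 1000 per year")
```

The three figures agree with those quoted in the text: 7.3 for the standardized 1981 rate, and 15.7 and 14.8 for the crude rates.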
16.3 Age standardization by the indirect method
The direct method relies upon age-specific mortality rates for the observed population. If we have very few deaths, these age-specific rates will be very poorly estimated. This will be particularly so in the younger age groups, where we may even have no deaths at all. Such situations arise when considering mortality due to particular conditions or in relatively small groups, such as those defined by occupation. The indirect method of standardization is used for such data. We calculate the number of deaths we would expect in the observed population if it experienced the age-specific mortality rates of a standard population. We then compare the expected number of deaths with that actually observed.
Table 16.3. Age-specific mortality rates due to cirrhosis of the liver and age distributions of all men and medical practitioners, England and Wales, 1971

| Age group (years) | Mortality per million men per year | Number of men | Number of doctors |
|---|---|---|---|
| 15–24 | 5.859 | 3584320 | 1080 |
| 25–34 | 13.050 | 3065100 | 12860 |
| 35–44 | 46.937 | 2876170 | 11510 |
| 45–54 | 161.503 | 2965880 | 10330 |
| 55–64 | 271.358 | 2756510 | 7790 |
I shall take as an example the deaths due to cirrhosis of the liver among male qualified medical practitioners in England and Wales, recorded around the 1971 census. There were 14 deaths among 43570 doctors aged below 65, a crude mortality rate of 14/43570 = 321 per million, compared to 1423 out of 15247980 adult males (aged 15–64), or 93 per million. The mortality among doctors appears high, but the medical population may be older than the population of men as a whole, as it will contain relatively few below the age of 25. Also the actual number of deaths among doctors is small and any difference not explained by the age effect may be due to chance. The indirect method enables us to test this. Table 16.3 shows the age-specific mortality rates for cirrhosis of the liver among all men aged 15 to 64, and the number of men estimated in each ten-year age group, for all men and for doctors. We can see that the two age distributions do appear to be different.
The calculation of the expected number of deaths is similar to the direct method, but different populations and rates are used. For each age group, we take the number in the observed population, and multiply it by the standard age-specific mortality rate, which would be the probability of dying if mortality in the observed population were the same as that in the standard population. This gives us the number we would expect to die in this age group in the observed population. We add these over the age groups and obtain the expected number of deaths. The calculation is set out in Table 16.4.
The expected number of deaths is 4.4965, which is considerably less than the 14 observed. We usually express the result of the calculation as the ratio of observed to expected deaths, called the standardized mortality ratio or SMR. Thus the SMR for cirrhosis among doctors is
$$\text{SMR} = \frac{\text{observed deaths}}{\text{expected deaths}} = \frac{14}{4.4965} = 3.11$$
We usually multiply the SMR by 100 to get rid of the decimal point, and report the SMR as 311. If we do not adjust for age at all, the ratio of the crude death rates is 3.44, compared to the age-adjusted figure of 3.11, so the adjustment has made some, but not much, difference in this case.
Table 16.4. Calculation of the expected number of deaths due to cirrhosis of the liver among medical practitioners, using the indirect method

| Age group (years) | Standard mortality rate (a) | Observed population, number of doctors (b) | a × b |
|---|---|---|---|
| 15–24 | 0.000005859 | 1080 | 0.0063 |
| 25–34 | 0.000013050 | 12860 | 0.1678 |
| 35–44 | 0.000046937 | 11510 | 0.5402 |
| 45–54 | 0.000161503 | 10330 | 1.6683 |
| 55–64 | 0.000271358 | 7790 | 2.1139 |
| Total | | | 4.4965 |
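The arithmetic of Table 16.4 can be reproduced in a few lines. This is an illustrative Python sketch rather than anything from the original text; the variable names are my own and the data are copied from Table 16.3.

```python
# Standard rates (all men) for cirrhosis, per million men per year, and the
# number of doctors in each ten-year age group, copied from Table 16.3.
standard_rate_per_million = [5.859, 13.050, 46.937, 161.503, 271.358]
doctors = [1080, 12860, 11510, 10330, 7790]
observed_deaths = 14

# Expected deaths: standard rate (as a probability) times number at risk,
# summed over the age groups.
expected_deaths = sum(
    rate / 1_000_000 * n for rate, n in zip(standard_rate_per_million, doctors)
)

# Standardized mortality ratio, reported on the conventional scale of 100.
smr = 100 * observed_deaths / expected_deaths

print(f"Expected deaths: {expected_deaths:.4f}")
print(f"SMR: {smr:.0f}")
```

Working from unrounded products gives an expected number a shade above the 4.4965 in the table, which sums rounded entries; the SMR is 311 either way.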
We can calculate a confidence interval for the SMR quite easily. Denote the observed deaths by O and expected by E. It is reasonable to suppose that the deaths are independent of one another and happening randomly in time, so the observed number of deaths is from a Poisson distribution (§6.7). The standard deviation of this Poisson distribution is the square root of its mean and so can be estimated by the square root of the observed deaths, √O. The expected number is calculated from a very much larger sample and is so well estimated it can be treated as a constant, so the standard deviation of 100 × O/E, which is the standard error of the SMR, is estimated by 100 × √O/E. Provided the number of deaths is large enough, say more than 10, an approximate 95% confidence interval is given by
$$100 \times \frac{O}{E} - 1.96 \times 100 \times \frac{\sqrt{O}}{E} \quad \text{to} \quad 100 \times \frac{O}{E} + 1.96 \times 100 \times \frac{\sqrt{O}}{E}$$
For the cirrhosis data the formula gives
$$100 \times \frac{14}{4.4965} - 1.96 \times 100 \times \frac{\sqrt{14}}{4.4965} \quad \text{to} \quad 100 \times \frac{14}{4.4965} + 1.96 \times 100 \times \frac{\sqrt{14}}{4.4965}$$
$$= 311 - 163 \ \text{to} \ 311 + 163 = 148 \ \text{to} \ 474$$
The confidence interval clearly excludes 100 and the high mortality cannot be ascribed to chance.
For small observed frequencies tables based on the exact probabilities of the Poisson distribution are available (Pearson and Hartley 1970). The calculations are easily done by computer and my free program Clinstat (§1.3) does them. There is also an exact method for comparing two SMRs, which Clinstat does. For the cirrhosis data the exact 95% confidence interval is 170 to 522. This is not quite the same as the large sample approximation. Better approximations and exact methods of calculating confidence intervals are described by Morris and Gardner (1989) and Breslow and Day (1987).
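Both intervals can be checked with a short Python sketch using only the standard library. This is my own illustration, not Clinstat's algorithm, which the text does not describe: I assume the usual definition of exact Poisson limits (the means for which the observed count sits at the 2.5% point of each tail) and find them by bisection. The helper names are hypothetical.

```python
import math

observed = 14       # O: cirrhosis deaths observed among doctors
expected = 4.4965   # E: expected deaths, from Table 16.4

# Large sample approximation: 100*O/E -/+ 1.96 * 100*sqrt(O)/E.
smr = 100 * observed / expected
se = 100 * math.sqrt(observed) / expected
approx_low, approx_high = smr - 1.96 * se, smr + 1.96 * se


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def solve_decreasing(f, target, lo, hi, tol=1e-10):
    """Bisection for f(m) = target, where f decreases as m increases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Assumed definition of the exact limits for the Poisson mean:
#   lower limit: P(X >= O) = 0.025, i.e. P(X <= O - 1) = 0.975
#   upper limit: P(X <= O) = 0.025
upper_bound = 10 * observed + 10   # generous search bracket
mean_low = solve_decreasing(lambda m: poisson_cdf(observed - 1, m), 0.975, 0.0, upper_bound)
mean_high = solve_decreasing(lambda m: poisson_cdf(observed, m), 0.025, 0.0, upper_bound)
exact_low, exact_high = 100 * mean_low / expected, 100 * mean_high / expected

print(f"Large sample 95% CI: {approx_low:.0f} to {approx_high:.0f}")
print(f"Exact 95% CI:        {exact_low:.0f} to {exact_high:.0f}")
```

Under these assumptions the exact limits come out at 170 to 522, as quoted, against 148 to 474 for the large sample formula.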
We can also test the null hypothesis that in the population the SMR = 100. If the null hypothesis is true, O is from a Poisson distribution with mean E and hence standard deviation √E, provided the sample is large enough, say E > 10. Then (O - E)/√E would be an observation from the Standard Normal distribution if the null hypothesis were true. The sample of doctors is too small for this test to be reliable, but if it were, we would have (O - E)/√E = (14 - 4.4965)/√4.4965 = 4.48, P = 0.0001. Again, there is an exact method. This gives P = 0.0005. As so often happens, large sample methods become too liberal and give P values which are too small when used with samples which are too small for the test to be valid.
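The two tests can be compared directly. The sketch below is illustrative only: the text does not say how the exact two-sided P value is defined, so I assume the common convention of doubling the upper-tail Poisson probability P(X ≥ O) under the null mean E. The function name is my own.

```python
import math

observed, expected = 14, 4.4965

# Large sample test: (O - E) / sqrt(E) referred to the Standard Normal distribution.
z = (observed - expected) / math.sqrt(expected)
p_normal = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail area


def poisson_upper_tail(k, mean):
    """P(X >= k) for a Poisson variable with the given mean."""
    term = math.exp(-mean)      # P(X = 0)
    below = 0.0
    for i in range(k):          # accumulate P(X = 0) ... P(X = k - 1)
        below += term
        term *= mean / (i + 1)
    return 1.0 - below


# Assumed convention: two-sided exact P is twice the one-sided tail, capped at 1.
p_exact = min(1.0, 2 * poisson_upper_tail(observed, expected))

print(f"z = {z:.2f}")
print(f"Exact Poisson P = {p_exact:.4f}")
print(f"Normal approximation P is smaller than exact P: {p_normal < p_exact}")
```

With this convention the exact P value rounds to 0.0005, in agreement with the text, and the Normal approximation gives a smaller P, which is the sense in which it is too liberal here.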
The highly significant difference suggests that doctors are at increased risk of death from cirrhosis of the liver, compared to employed men as a whole. The news is not all bad for medical practitioners, however. Their SMR for cancer of the trachea, bronchus and lung is only 32. Doctors may drink, but they do not smoke!
16.4 Demographic life tables
We have already discussed a use of the life table technique for the analysis of clinical survival data (§15.6). The life table was found by following the survival of a group of subjects from some starting point to death. In demography, which means the study of human populations, this longitudinal method of analysis is impractical, because we could only study people born more than 100 years ago. Demographic life tables are generated in a different way, using a cross-sectional approach. Rather than charting the progress of a group from birth to death, we start with the present age-specific mortality rates. We then calculate what would happen to a cohort of people from birth if these age-specific mortality rates applied unchanged throughout their lives. We denote the probability of dying between ages x and x+1 years (the age-specific mortality rate at age x) by $q_x$. As in Table 15.8, the probability of surviving from age x to x+1 is $p_x = 1 - q_x$. We now suppose that we have a cohort of size $l_0$ at age 0, i.e. at birth. $l_0$ is usually 100000 or 10000. The number who would still be alive after x years is $l_x$. We can see that the number alive after x+1 years is $l_{x+1} = p_x \times l_x$, so given all the $p_x$ from x = 0 onwards we can calculate the $l_x$. The cumulative survival probability to age x is then $P_x = l_x/l_0$.
Table 16.5 shows an extract from Life Table Number 11, 1950–52, for England and Wales. With the exception of 1941, a life table like this has been produced every 10 years since 1871, based on the decennial census year. The life table is based on the census year because only then do we have a good measure of the number of people at each age, the denominator in the calculation of $q_x$. A three year period is used to increase the number of deaths for a year of age and so improve the estimation of $q_x$. Separate tables are produced for males and females because the mortality of the two sexes is very different. Age-specific death rates are higher in males than females at every age. Between census years life tables are still produced but are only published in an abridged form, giving $l_x$ at five year intervals only after age five (Table 16.6).
Table 16.5. Extract from English Life Table Number 11, 1950–52, Males

| Age in years, x | Expected number alive at age x, lx | Probability an individual dies between ages x and x+1, qx | Expected life at age x years, ex |
|---|---|---|---|
| 0 | 100000 | 0.03266 | 66.42 |
| 1 | 96734 | 0.00241 | 67.66 |
| 2 | 96501 | 0.00141 | 66.82 |
| 3 | 96365 | 0.00102 | 65.91 |
| 4 | 96267 | 0.00084 | 64.98 |
| ... | ... | ... | ... |
| 100 | 23 | 0.44045 | 1.67 |
| 101 | 13 | 0.45072 | 1.62 |
| 102 | 7 | 0.46011 | 1.58 |
| 103 | 4 | 0.46864 | 1.53 |
| 104 | 2 | 0.47636 | 1.50 |
The final column in Tables 16.5 and 16.6 is the expected life, expectation of life or life expectancy, $e_x$. This is the average life still to be lived by those reaching age x. We have already calculated this as the expected value of the probability distribution of year of death (§6E). We can do the calculation in a number of other ways. For example, if we add $l_{x+1}$, $l_{x+2}$, $l_{x+3}$, etc. we will get the total number of years to be lived, because the $l_{x+1}$ who survive to x+1 will have added $l_{x+1}$ years to the total, the $l_{x+2}$ of these who survive from x+1 to x+2 will add a further $l_{x+2}$ years, and so on. If we divide this sum by $l_x$ we get the average number of whole years to be lived. If we then remember that people do not die on their birthdays, but scattered throughout the year, we can add half to allow for the average of half a year lived in the year of death. We thus get
$$e_x = \frac{1}{2} + \frac{1}{l_x} \sum_{i=x+1}^{\infty} l_i$$
i.e. summing the $l_i$ from age x+1 to the end of the life table.
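The two life table calculations described above, building the survivors from the death probabilities and then averaging the years still to be lived, can be sketched in Python as follows. This is an illustration of my own, with hypothetical function names. The first part applies the recursion to the death probabilities for ages 0 to 3 from Table 16.5. The expectation of life needs the survivors at every age up to the end of the table, which the extract does not give, so the second part uses a made-up toy cohort purely to show the arithmetic.

```python
def survivors(q, l0=100000):
    """Return [l_0, l_1, ...] using l_{x+1} = (1 - q_x) * l_x."""
    l = [float(l0)]
    for qx in q:
        l.append(l[-1] * (1 - qx))
    return l


def expectation_of_life(l, x):
    """e_x = 1/2 + (l_{x+1} + l_{x+2} + ...) / l_x.

    The list l must run to the end of the life table (last entries near zero).
    """
    return 0.5 + sum(l[x + 1:]) / l[x]


# Death probabilities for ages 0-3, males, from Table 16.5.
q_males = [0.03266, 0.00241, 0.00141, 0.00102]
l_males = [round(v) for v in survivors(q_males)]
print("Survivors at ages 0 to 4:", l_males)

# Hypothetical toy cohort (NOT real data): 100 at birth, 50 alive at age 1,
# 20 alive at age 2, none alive at age 3.
toy_l = [100, 50, 20, 0]
toy_e0 = expectation_of_life(toy_l, 0)
print("Toy expectation of life at birth:", toy_e0)
```

In the toy cohort, 70 whole years are lived after birth by 100 people, an average of 0.7, and adding the half year gives 1.2.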
If many people die in early life, with high age-specific death rates for children, this has a great effect on expectation of life at birth. Table 16.7 shows expectation of life at selected ages from four English Life Tables (Office for National Statistics 1997). In 1991, for example, expectation of life at birth for males was 74 years, compared to only 40 years in 1841, an improvement of 34 years. However, expectation of life at age 45 in 1991 was 31 years compared to 23 years in 1841, an improvement of only 8 years. At age 65, male expectation of life was 11 years in 1841 and 14 years in 1991, an even smaller change. Hence the change in life expectancy at birth was due to changes in mortality in early life, not late life.
Table 16.6. Abridged Life Table 1988–90, England and Wales

| Age x | Males lx | Males ex | Females lx | Females ex |
|---|---|---|---|---|
| 0 | 10000 | 73.0 | 10000 | 78.5 |
| 1 | 9904 | 72.7 | 9928 | 78.0 |
| 2 | 9898 | 71.7 | 9922 | 77.1 |
| 3 | 9893 | 70.8 | 9919 | 76.1 |
| 4 | 9890 | 69.8 | 9916 | 75.1 |
| 5 | 9888 | 68.8 | 9914 | 74.2 |
| 10 | 9877 | 63.9 | 9907 | 69.2 |
| 15 | 9866 | 58.9 | 9899 | 64.3 |
| 20 | 9832 | 54.1 | 9885 | 59.4 |
| 25 | 9790 | 49.3 | 9870 | 54.4 |
| 30 | 9749 | 44.5 | 9852 | 49.5 |
| 35 | 9702 | 39.7 | 9826 | 44.6 |
| 40 | 9638 | 35.0 | 9784 | 39.8 |
| 45 | 9542 | 30.3 | 9718 | 35.1 |
| 50 | 9375 | 25.8 | 9607 | 30.5 |
| 55 | 9097 | 21.5 | 9431 | 26.0 |
| 60 | 8624 | 17.5 | 9135 | 21.7 |
| 65 | 7836 | 14.0 | 8645 | 17.8 |
| 70 | 6689 | 11.0 | 7918 | 14.2 |
| 75 | 5177 | 8.4 | 6869 | 11.0 |
| 80 | 3451 | 6.4 | 5446 | 8.2 |
| 85 | 1852 | 4.9 | 3659 | 5.9 |
There is a common misconception that a life expectancy at birth of 40 years, as in 1841, meant that most people died about age 40. For example (Rowe 1992):
Mothers have always provoked rage and resentment in their adult daughters, while the adult daughters have always provoked anguish and guilt in their mothers. In past centuries, however, such matched misery did not last for long. Daughters could bury their rage and resentment under a concern for duty while they cared for their mothers who, turning 40, rapidly aged, grew frail and died. Now mothers turning 40 are strong and healthy, and only halfway through their lives.
This is absurd. As Table 16.7 shows, since life expectancy was first estimated women turning 40 have had average remaining lives of more than 20 years. They did not rapidly age, grow frail, and die.
‘Expectation’ is used in its statistical sense of the average of a distribution. It does not mean that each person can know when they will die. From the most recent life table for England and Wales, for 1994–96 (Office for National Statistics 1998a), a man aged 53 (myself, for example) has a life expectancy of 24 years. This is the average lifetime which all men aged 53 years would have if the present age-specific mortality rates do not change. (These should go down over time, putting life-spans up.) About half of these men will have shorter lives and half longer. If we could calculate life expectancies for men with different combinations of risk factors, we might find that my life expectancy would be decreased because I am short (so unfair I think) and fat and increased because I do not smoke (like almost all medical statisticians) and am of professional social class. However my expectation of life was adjusted, it would remain an average, not a guaranteed figure for me.
Table 16.7. Life expectancy in 1841, 1901, 1951, and 1991, England and Wales (expectation of life in years)

| Age | Sex | 1841 | 1901 | 1951 | 1991 |
|---|---|---|---|---|---|
| Birth | Males | 40 | 49 | 66 | 74 |
| Birth | Females | 42 | 52 | 72 | 79 |
| 15 yrs | Males | 43 | 47 | 54 | 59 |
| 15 yrs | Females | 44 | 50 | 59 | 65 |
| 45 yrs | Males | 23 | 23 | 27 | 31 |
| 45 yrs | Females | 24 | 26 | 31 | 36 |
| 65 yrs | Males | 11 | 11 | 12 | 14 |
| 65 yrs | Females | 12 | 12 | 14 | 18 |
Life tables have a number of uses, both medical and non-medical. Expectation of life provides a useful summary of mortality without the need for a standard population. The table enables us to predict the future size of and age structure of a population given its present state, called a population projection. This can be very useful in predicting such things as the future requirement for geriatric beds in a health district. Life tables are also invaluable in non-medical applications, such as the calculation of insurance premiums, pensions and annuities.
The main difficulty with prediction from a life table is finding a table which applies to the populations under consideration. For the general population of, say, a health district, the national life table will usually be adequate, but for special populations this may not be the case. If we want to predict the future need for care of an institutionalized population, such as in a long stay psychiatric hospital or old people's home, the mortality may be considerably greater than that in the general population. Predictions based on the national life table can only be taken as a very rough guide. If possible, life tables calculated on that type of population should be used.
16.5 Vital statistics
We have seen a number of occasions where ordinary words have been given quite different meanings in statistics from those they have in common speech; ‘Normal’ and ‘significant’ are good examples. ‘Vital statistics’ is the opposite, a technical term which has acquired a completely unrelated popular meaning. As far as the medical statistician is concerned, vital statistics have nothing to do with the dimensions of female bodies. They are the statistics relating to life and death: birth rates, fertility rates, marriage rates and death rates. I have already mentioned crude mortality rate, age-specific mortality rates, age-standardized mortality rate, standardized mortality ratio, and expectation of life. In this section I shall define a number of other statistics which are often quoted in the medical literature.
The infant mortality rate is the number of deaths under one year of age divided by the number of live births, usually expressed as deaths per 1000 live births. The neonatal mortality rate is the same thing for deaths in the first 4 weeks of life. The stillbirth rate is the number of stillbirths divided by the total number of births, live and still. A stillbirth is a child born dead after 28 weeks gestation. The perinatal mortality rate is the number of stillbirths and deaths in the first week of life divided by the total births, again usually presented per 1000 births. Infant and perinatal mortality rates are regarded as particularly sensitive indicators of the health status of the population. The maternal mortality rate is the number of deaths of mothers ascribed to problems of pregnancy and birth, divided by the total number of births. The birth rate is the number of live births per year divided by the total population. The fertility rate is the number of live births per year divided by the number of women of childbearing age, taken as 15–44 years.
The attack rate for a disease is the proportion of people exposed to infection who develop the disease. The case fatality rate is the proportion of people with the disease who die from it. The prevalence of a disease is the proportion of people who have it at one point in time. The incidence is the number of new cases in one year divided by the number at risk.
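The definitions above differ mainly in their numerators and denominators, which a small Python sketch may make easier to see. The counts below are invented for an imaginary district and are not real data; the function and variable names are my own.

```python
def per_thousand(events, denominator):
    """Express a proportion as a rate per 1000."""
    return 1000 * events / denominator


# Hypothetical counts for one year in an imaginary district (NOT real data).
live_births = 12000
stillbirths = 60
deaths_under_one_year = 72
deaths_first_four_weeks = 48
deaths_first_week = 36

total_births = live_births + stillbirths   # live and still

# Infant and neonatal rates use live births as the denominator.
infant_mortality = per_thousand(deaths_under_one_year, live_births)
neonatal_mortality = per_thousand(deaths_first_four_weeks, live_births)

# Stillbirth and perinatal rates use total births, live and still.
stillbirth_rate = per_thousand(stillbirths, total_births)
perinatal_mortality = per_thousand(stillbirths + deaths_first_week, total_births)

print(f"Infant mortality rate:    {infant_mortality:.1f} per 1000 live births")
print(f"Neonatal mortality rate:  {neonatal_mortality:.1f} per 1000 live births")
print(f"Stillbirth rate:          {stillbirth_rate:.1f} per 1000 total births")
print(f"Perinatal mortality rate: {perinatal_mortality:.1f} per 1000 total births")
```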
16.6 The population pyramid
The age distribution of a population can be presented as a histogram, using the methods of §4.3. However, because the mortality of males and females is so different the age distributions for males and females are also different. It is usual to present the age distributions for the two sexes separately. Figure 16.1 shows the age distributions for the male and female populations of England and Wales in 1901. Now, these histograms have the same horizontal scale. The conventional way to display them is with the age scale vertically and the frequency scale horizontally as in Figure 16.2. The frequency scale has zero in the middle and increases to the right for females and to the left for males. This is called a population pyramid, because of its shape.
Figure 16.3 shows the population pyramid for England and Wales in 1991. The shape is quite different. Instead of a triangle we have an irregular figure with almost vertical sides which begin to bend very sharply inwards at about age 65. The post-war and 1960s baby booms can be seen as bulges at ages 25–30 and 40–45. A major change in population structure has taken place, with a vast increase in the proportion of elderly. This has major implications for medicine, as the care of the elderly has become a large proportion of the work of doctors, nurses and their colleagues. It is interesting to see how this has come about.
It is popularly supposed that people are now living much longer as a result of modern medicine, which prevents deaths in middle life. This is only partly true.
Fig. 16.1. Age distributions for the population of England and Wales, by sex, 1901
Fig. 16.2. Population pyramid for England and Wales, 1901
Fig. 16.3. Population pyramid for England and Wales, 1991
As Table 16.7 shows, life expectancy at birth increased dramatically between 1901 and 1991, but the increase in later life is much less. The change is not an extension of every life by 25 years, which would be seen at every age, but mainly a reduction in mortality in childhood and early adulthood. Mortality in later life has changed relatively little. Now, a big reduction in mortality in childhood would result in an increase in the base part of the pyramid, as more children survived, unless there was a corresponding fall in the number of babies being born. In the 19th century, women were having many children and despite the high mortality in childhood the number who survived into adulthood to have children of their own exceeded that of their own parents. The population expanded and this history is embodied in the 1901 population pyramid. In the 20th century, infant mortality fell and people responded to this by having fewer children. In 1841–45, the infant mortality rate was 148 per 1000 live births, 138 in 1901–05, 10 in 1981–85 (OPCS 1992) and only 5.9 in 1997 (Office for National Statistics 1999). The birth rate was 32.2 per 1000 population per year in 1841–45, in 1901–05 it was 28.2, and in 1987–97 it was 13.5 (Office for National Statistics 1998b). The base of the pyramid ceased to expand. As those who were in the base of the 1901 pyramid grew older, the population in the top half of the pyramid increased. The survivors of the 0–4 age group in the 1901 pyramid are the 90+ age group in the 1991 pyramid. Had the birth rate not fallen, the population would have continued to expand and we would have as great or greater a proportion of young people in 1991 as we did in 1901, and a vastly larger population. Thus the increase in the proportion of the elderly is not primarily because adult lives have been extended, although this has a small effect, but because fertility has declined. Life expectancy for the elderly has changed relatively little. Most developed countries have stable population pyramids like Figure 16.3 and those of most developing countries have expanding pyramids like Figure 16.2.
16M Multiple choice questions 87 to 92
(Each branch is either true or false)
87. Age-specific mortality rate:
(a) is a ratio of observed to expected deaths;
(b) can be used to compare mortality between different age groups;
(c) is an age adjusted mortality rate;
(d) measures the number of deaths in a year;
(e) measures the age structure of the population.
88. Expectation of life:
(a) is the number of years most people live;
(b) is a way of summarizing age-specific death rates;
(c) is the expected value of a particular probability distribution;
(d) varies with age;
(e) is derived from life tables.
89. In a study of post-natal suicide (Appleby 1991), the SMR for suicide among women who had just had a baby was 17 with a 95% confidence interval 14 to 21 (all women = 100). For women who had had a stillbirth, the SMR was 105 (95% confidence interval 31 to 277). We can conclude that:
(a) women who had just had a baby were less likely to commit suicide than other women of the same age;
(b) women who had just had a stillbirth were less likely to commit suicide than other women of the same age;
(c) women who had just had a live baby were less likely to commit suicide than women of the same age who had had a stillbirth;
(d) it is possible that having a stillbirth increases the risk of suicide;
(e) suicidal women should have babies.
90. In 1971, the SMR for cirrhosis of the liver for men was 773 for publicans and innkeepers and 25 for window cleaners, both being significantly different from 100 (Donnan and Haskey 1977). We can conclude that:
(a) publicans are more than 7 times as likely as the average person to die from cirrhosis of the liver;
(b) the high SMR for publicans may be because they tend to be found in the older age groups;
(c) being a publican causes cirrhosis of the liver;
(d) window cleaning protects men from cirrhosis of the liver;
(e) window cleaners are at high risk of cirrhosis of the liver.
91. The age and sex structure of a population may be described by:
(a) a life table;
(b) a correlation coefficient;
(c) a standardized mortality ratio;
(d) a population pyramid;
(e) a bar chart.
92. The following statistics are adjusted to allow for the age distribution of the population:
(a) age-standardized mortality rate;
(b) fertility rate;
(c) perinatal mortality rate;
(d) crude mortality rate;
(e) expectation of life at birth.
16E Exercise: Deaths from volatile substance abuse
Anderson et al. (1985) studied mortality associated with volatile substance abuse (VSA), often called glue sniffing. In this study all known deaths associated with VSA from 1971 to 1983 inclusive were collected, using sources including three press cuttings agencies and a six-monthly systematic survey of all coroners. Cases were also notified by the Office of Population Censuses and Surveys for England and Wales and by the Crown Office and procurators fiscal in Scotland.
Table 16.8 shows the age distribution of these deaths for Great Britain and for Scotland alone, with the corresponding age distributions at the 1981 decennial census.
1. Calculate age-specific mortality rates for VSA per year and for the whole period. What is unusual about these age-specific mortality rates?
2. Calculate the SMR for VSA deaths for Scotland.
3. Calculate the 95% confidence interval for this SMR.
4. Does the number of deaths in Scotland appear particularly high? Apart from a lot of glue sniffing, are there any other factors which should be considered as possible explanations for this finding?
Table 16.8. Volatile substance abuse mortality and population size, Great Britain and Scotland, 1971–83 (Anderson et al. 1985)

| Age group (years) | Great Britain: VSA deaths | Great Britain: population (thousands) | Scotland: VSA deaths | Scotland: population (thousands) |
|---|---|---|---|---|
| 0–9 | 0 | 6770 | 0 | 653 |
| 10–14 | 44 | 4271 | 13 | 425 |
| 15–19 | 150 | 4467 | 29 | 447 |
| 20–24 | 45 | 3959 | 9 | 394 |
| 25–29 | 15 | 3616 | 0 | 342 |
| 30–39 | 8 | 7408 | 0 | 659 |
| 40–49 | 2 | 6055 | 0 | 574 |
| 50–59 | 7 | 6242 | 0 | 579 |
| 60+ | 4 | 10769 | 0 | 962 |
17
Multifactorial methods
17.1* Multiple regression
In Chapters 10 and 11 we looked at methods of analysing the relationship between a continuous outcome variable and a predictor. The predictor could be quantitative, as in regression, or qualitative, as in one-way analysis of variance. In this chapter we shall look at the extension of these methods to more than one predictor variable, and describe related methods for use when the outcome is dichotomous or censored survival data. These methods are very difficult to do by hand and computer programs are always used. I shall omit the formulae.
Table 17.1 shows the ages, heights and maximum voluntary contraction of the quadriceps muscle (MVC) in a group of male alcoholics. The outcome variable is MVC. Figure 17.1 shows the relationship between MVC and height. We can fit a regression line of the form MVC = a + b × height (§11.2–3). This enables us to predict what the mean MVC would be for men of any given height. But MVC varies with other things beside height. Figure 17.2 shows the relationship between MVC and age.
Table 17.1. Maximum voluntary contraction (MVC) of quadriceps muscle, age and height, of 41 male alcoholics (Hickish et al. 1989)

| Age (years) | Height (cm) | MVC (newtons) | Age (years) | Height (cm) | MVC (newtons) |
|---|---|---|---|---|---|
| 24 | 166 | 466 | 42 | 178 | 417 |
| 27 | 175 | 304 | 47 | 171 | 294 |
| 28 | 173 | 343 | 47 | 162 | 270 |
| 28 | 175 | 404 | 48 | 177 | 368 |
| 31 | 172 | 147 | 49 | 177 | 441 |
| 31 | 172 | 294 | 49 | 178 | 392 |
| 32 | 160 | 392 | 50 | 167 | 294 |
| 32 | 172 | 147 | 51 | 176 | 368 |
| 32 | 179 | 270 | 53 | 159 | 216 |
| 32 | 177 | 412 | 53 | 173 | 294 |
| 34 | 175 | 402 | 53 | 175 | 392 |
| 34 | 180 | 368 | 53 | 172 | 466 |
| 35 | 167 | 491 | 55 | 170 | 304 |
| 37 | 175 | 196 | 55 | 178 | 324 |
| 38 | 172 | 343 | 55 | 155 | 196 |
| 39 | 172 | 319 | 58 | 160 | 98 |
| 39 | 161 | 387 | 61 | 162 | 216 |
| 39 | 173 | 441 | 62 | 159 | 196 |
| 40 | 173 | 441 | 65 | 168 | 137 |
| 41 | 168 | 343 | 65 | 168 | 74 |
| 41 | 178 | 540 | | | |
Fig. 17.1. Muscle strength (MVC) against height
Fig. 17.2. Muscle strength (MVC) against age
We can show the strengths of the linear relationships between all three variables by their correlation matrix. This is a tabular display of the correlation coefficients between each pair of variables, matrix being used in its mathematical sense as a rectangular array of numbers. The correlation matrix for the data of Table 17.1 is shown in Table 17.2. The coefficients of the main diagonal are all 1.0, because they show the correlation of the variable with itself, and the correlation matrix is symmetrical about this diagonal. Because of this symmetry many computer programs print only the part of the matrix below the diagonal. Inspection of Table 17.2 shows that older men were shorter and weaker than younger men, that taller men were stronger than shorter men, and that the magnitudes of all three relationships were similar. Reference to Table 11.2 with 41 - 2 = 39 degrees of freedom shows that all three correlations are significant.
Table 17.2. Correlation matrix for the data of Table 17.1

| | Age | Height | MVC |
|---|---|---|---|
| Age | 1.000 | -0.338 | -0.417 |
| Height | -0.338 | 1.000 | 0.419 |
| MVC | -0.417 | 0.419 | 1.000 |
We could fit a regression line of the form MVC = a + b × age, from which we could predict the mean MVC for any given age. However, MVC would still vary with height. To investigate the effect of both age and height, we can use multiple regression to fit a regression equation of the form
MVC = b0 + b1 × height + b2 × age
The coefficients are calculated by a least squares procedure, exactly the same in principle as for simple regression. In practice, this is always done using a computer program. For the data of Table 17.1, the multiple regression equation is
MVC = -466 + 5.40 × height - 3.08 × age
From this, we would estimate the mean MVC of men with any given age and height, in the population of which these are a sample.
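The least squares fit of MVC on height and age can be reproduced in a few lines of code. What follows is a minimal sketch in Python, not the program used to produce the results in the text: it solves the two normal equations for the height and age coefficients from the sums of squares and products about the means. The function and variable names are assumptions introduced for illustration; the data are those of Table 17.1.

```python
# Data of Table 17.1 (41 men): age (years), height (cm), MVC (newtons).
age = [24, 27, 28, 28, 31, 31, 32, 32, 32, 32, 34, 34, 35, 37, 38, 39, 39, 39,
       40, 41, 41, 42, 47, 47, 48, 49, 49, 50, 51, 53, 53, 53, 53, 55, 55, 55,
       58, 61, 62, 65, 65]
height = [166, 175, 173, 175, 172, 172, 160, 172, 179, 177, 175, 180, 167, 175,
          172, 172, 161, 173, 173, 168, 178, 178, 171, 162, 177, 177, 178, 167,
          176, 159, 173, 175, 172, 170, 178, 155, 160, 162, 159, 168, 168]
mvc = [466, 304, 343, 404, 147, 294, 392, 147, 270, 412, 402, 368, 491, 196,
       343, 319, 387, 441, 441, 343, 540, 417, 294, 270, 368, 441, 392, 294,
       368, 216, 294, 392, 466, 304, 324, 196, 98, 216, 196, 137, 74]


def mean(values):
    return sum(values) / len(values)


def sum_products(x, y):
    """Sum of products of x and y about their means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


# Sums of squares and products needed for the normal equations.
s_hh = sum_products(height, height)
s_aa = sum_products(age, age)
s_ha = sum_products(height, age)
s_hy = sum_products(height, mvc)
s_ay = sum_products(age, mvc)

# Solve  s_hh*b1 + s_ha*b2 = s_hy  and  s_ha*b1 + s_aa*b2 = s_ay.
det = s_hh * s_aa - s_ha ** 2
b1 = (s_aa * s_hy - s_ha * s_ay) / det      # coefficient of height
b2 = (s_hh * s_ay - s_ha * s_hy) / det      # coefficient of age
b0 = mean(mvc) - b1 * mean(height) - b2 * mean(age)   # intercept

# Sum of squares due to regression on both predictors.
regression_ss = b1 * s_hy + b2 * s_ay

print(f"MVC = {b0:.0f} + {b1:.2f} x height + {b2:.2f} x age")
print(f"regression sum of squares = {regression_ss:.0f}")
```

With more than two predictors the same normal equations are usually solved by a matrix routine in a statistical package rather than written out by hand.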
There are a number of assumptions implicit here. One is that the relationship between MVC and height is the same at each age, that is, that there is no interaction between height and age. Another is that the relationship between MVC and height is linear, that is of the form MVC = a + b × height. Multiple regression analysis enables us to test both of these assumptions.
Multiple regression is not limited to two predictor variables. We can have any number, although the more variables we have the more difficult it becomes to interpret the regression. We must, however, have more points than variables. As the degrees of freedom for the residual variance are n - 1 - q if q variables are fitted, this number should be large enough for satisfactory estimation of confidence intervals and tests of significance. This will become clear after the next section.
17.2* Significance tests and estimation in multiple regression
As we saw in §11.5, the significance of a simple linear regression line can be tested using the t distribution. We can carry out the same test using analysis of variance. For the FEV1 and height data of Table 11.1 the sums of squares and products were calculated in §11.3. The total sum of squares for FEV1 is Syy = 9.43868, with n - 1 = 19 degrees of freedom. The sum of squares due to regression was calculated in §11.5 to be 3.18937. The residual sum of squares, i.e. the sum of squares about the regression line, is found by subtraction as 9.43868 - 3.18937 = 6.24931, and this has n - 2 = 18 degrees of freedom. We can now set up an analysis of variance table as described in §10.9, shown in Table 17.3.
Table 17.3. Analysis of variance for the regression of FEV1 on height

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
|---|---|---|---|---|---|
| Total | 19 | 9.43868 | | | |
| Due to regression | 1 | 3.18937 | 3.18937 | 9.19 | |
| Residual (about regression) | 18 | 6.24931 | 0.34718 | | |
Table 17.4. Analysis of variance for the regression of MVC on height and age

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
|---|---|---|---|---|---|
| Total | 40 | 503344 | | | |
| Regression | 2 | 131495 | 65748 | 6.72 | |
| Residual | 38 | 371849 | 9785 | | |
Note that the square root of the variance ratio is 3.03, the value of t found in §11.5. The two tests are equivalent. Note also that the regression sum of squares divided by the total sum of squares = 3.18937/9.43868 = 0.3379 is the square of the correlation coefficient, r = 0.58 (§11.5, §11.10). This ratio, sum of squares due to regression over total sum of squares, is the proportion of the variability accounted for by the regression. The percentage variability accounted for or explained by the regression is 100 times this, i.e. 34%.
Returning to the MVC data, we can test the significance of the regression of MVC on height and age together by analysis of variance. If we fit the regression model in §17.1, the regression sum of squares has two degrees of freedom, because we have fitted two regression coefficients. The analysis of variance for the MVC regression is shown in Table 17.4.
The regression is significant; it is unlikely that this association could have arisen by chance if the null hypothesis were true. The proportion of variability accounted for, denoted by R², is 131495/503344 = 0.26. The square root of this is called the multiple correlation coefficient, R. R² must lie between 0 and 1, and as no meaning can be given to the direction of correlation in the multivariate case, R is also taken as positive. The larger R is, the more closely correlated with the outcome variable the set of predictor variables are. When R = 1 the variables are perfectly correlated in the sense that the outcome variable is a linear combination of the others. When the outcome variable is not linearly related to any of the predictor variables, R will be small, but not zero.
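The arithmetic behind Tables 17.3 and 17.4 is short enough to check directly. The sketch below, in Python, takes the regression and total sums of squares quoted in the text and returns the variance ratio and the proportion of variability explained; the function name `anova_summary` is an assumption made for this illustration.

```python
import math


def anova_summary(regression_ss, regression_df, total_ss, total_df):
    """Return (F, R squared) for a regression analysis of variance."""
    residual_ss = total_ss - regression_ss        # found by subtraction
    residual_df = total_df - regression_df
    regression_ms = regression_ss / regression_df
    residual_ms = residual_ss / residual_df
    return regression_ms / residual_ms, regression_ss / total_ss


# FEV1 on height (Table 17.3): one predictor, 20 observations.
f_fev, r2_fev = anova_summary(3.18937, 1, 9.43868, 19)
# MVC on height and age (Table 17.4): two predictors, 41 observations.
f_mvc, r2_mvc = anova_summary(131495, 2, 503344, 40)

print(f"FEV1: F = {f_fev:.2f}, sqrt(F) = {math.sqrt(f_fev):.2f}, R2 = {r2_fev:.4f}")
print(f"MVC:  F = {f_mvc:.2f}, R2 = {r2_mvc:.2f}, R = {math.sqrt(r2_mvc):.2f}")
```

For the single-predictor case the square root of F reproduces the t statistic, which is why the two tests are equivalent.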
We may wish to know whether both or only one of our variables leads to the association. To do this, we can calculate a standard error for each regression coefficient (Table 17.5). This will be done automatically by the regression program. We can use this to test each coefficient separately by a t test. We can also find a confidence interval for each, using t standard errors on either side of the estimate. For the example, both age and height have P = 0.04 and we can conclude that both age and height are independently associated with MVC.
Table 17.5. Coefficients for the regression of MVC on height and age, with standard errors and confidence intervals

| Predictor variable | Coefficient | Standard error | t ratio | P | 95% Confidence interval |
|---|---|---|---|---|---|
| height | 5.40 | 2.55 | 2.12 | 0.04 | 0.25 to 10.55 |
| age | -3.08 | 1.47 | -2.10 | 0.04 | -6.05 to -0.10 |
| intercept | -465.63 | 460.33 | -1.01 | 0.3 | -1397.52 to 466.27 |
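The t ratios and confidence intervals in Table 17.5 follow from each coefficient and its standard error. The sketch below, in Python, shows the calculation. It assumes the two-sided 5% point of the t distribution with 38 degrees of freedom (41 - 1 - 2), taken from tables as 2.024, and it starts from the rounded figures printed in the table, so the limits agree with the published ones only to within rounding.

```python
# Two-sided 5% point of t with 38 d.f. (assumed value, taken from t tables).
T_38 = 2.024

# Coefficient and standard error for each predictor, as printed in Table 17.5.
coefficients = {"height": (5.40, 2.55), "age": (-3.08, 1.47)}

results = {}
for name, (b, se) in coefficients.items():
    t_ratio = b / se                 # test of the null hypothesis b = 0
    lower = b - T_38 * se            # t standard errors below the estimate
    upper = b + T_38 * se            # t standard errors above the estimate
    results[name] = (t_ratio, lower, upper)
    print(f"{name}: t = {t_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```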
A difficulty arises when the predictor variables are correlated with one another. This increases the standard error of the estimates, and variables may have a multiple regression coefficient which is not significant despite being related to the outcome variable. We can see that this will be so most clearly by taking an extreme case. Suppose we try to fit
MVC = b0 + b1 × height + b2 × height
For the MVC data
MVC = -908 + 6.20 × height + 1.00 × height
is a regression equation which minimizes the residual sum of squares. However, it is not unique, because
MVC = -908 + 5.20 × height + 2.00 × height
will do so too. The two equations give the same predicted MVC. There is no unique solution, and so no regression equation can be fitted, even though there is a clear relationship between MVC and height. When the predictor variables are highly correlated the individual coefficients will be poorly estimated and have large standard errors. Correlated predictor variables may obscure the relationship of each with the outcome variable.
A different (and equivalent) way of testing the effects of two correlated predictor variables separately is to proceed as follows. We fit three models:
1. MVC on height and age, regression sum of squares = 131495, d.f. = 2
2. MVC on height, regression sum of squares = 88511, d.f. = 1
3. MVC on age, regression sum of squares = 87471, d.f. = 1
Note that 88511 + 87471 = 175982 is greater than 131495. This is because age and height are correlated. We then test the effect of height if age is taken into account, referred to as the effect of height given age. The regression sum of squares for height given age is the regression sum of squares (age and height) minus regression sum of squares (age only), which is 131495 - 87471 = 44024. This has degrees of freedom = 2 - 1 = 1. Similarly, the effect of age allowing for height, i.e. age given height, is tested by regression sum of squares (age and height) minus regression sum of squares (height only) = 131495 - 88511 = 42984, with degrees of freedom = 2 - 1 = 1. We can set all this out in an analysis of variance table (Table 17.6). The third to sixth rows of the table are indented for the source of variation, degrees of freedom and sum of squares columns, to indicate that they are different ways of looking at variation already accounted for in the second row. The indented rows are not included when the degrees of freedom and sums of squares are added to give the total. After adjustment for age there is still evidence of a relationship between MVC and height, and after adjustment for height there is still evidence of a relationship between MVC and age. Note that the P values are the same as those found by a t test for the regression coefficient. This approach is essential for qualitative predictor variables with more than two categories (§17.6), when several t statistics may be printed for the variable.
Table 17.6. Analysis of variance for the regression of MVC on height and age, showing adjusted sums of squares

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
|---|---|---|---|---|---|
| Total | 40 | 503344 | | | |
| Regression | 2 | 131495 | 65748 | 6.72 | |
|     Age alone |     1 |     87471 | 87471 | 8.94 | |
|     Height given age |     1 |     44024 | 44024 | 4.50 | |
|     Height alone |     1 |     88511 | 88511 | 9.05 | |
|     Age given height |     1 |     42984 | 42984 | 4.39 | |
| Residual | 38 | 371849 | 9785 | | |
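The adjusted rows of Table 17.6 come from differences between regression sums of squares, each tested against the residual mean square of the full model. The following Python sketch repeats that arithmetic with the figures quoted in the text; the variable names are assumptions made for illustration.

```python
import math

total_ss, total_df = 503344, 40
ss_height_and_age = 131495      # model 1, 2 d.f.
ss_height_only = 88511          # model 2, 1 d.f.
ss_age_only = 87471             # model 3, 1 d.f.

# Residual mean square from the model containing both predictors.
residual_ms = (total_ss - ss_height_and_age) / (total_df - 2)

# Extra sum of squares for each predictor after allowing for the other (1 d.f. each).
ss_height_given_age = ss_height_and_age - ss_age_only
ss_age_given_height = ss_height_and_age - ss_height_only

f_height_given_age = ss_height_given_age / residual_ms
f_age_given_height = ss_age_given_height / residual_ms

print(f"height given age: SS = {ss_height_given_age}, F = {f_height_given_age:.2f}")
print(f"age given height: SS = {ss_age_given_height}, F = {f_age_given_height:.2f}")
# The square roots match the t ratios of Table 17.5 apart from sign.
print(f"sqrt(F) = {math.sqrt(f_height_given_age):.2f}, {math.sqrt(f_age_given_height):.2f}")
```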
17.3* Interaction in multiple regression
An interaction between two predictor variables arises when the effect of one on the outcome depends on the value of the other. For example, tall men may be stronger than short men when they are young, but the difference may disappear as they age.
We can test for interaction as follows. We have fitted
MVC = b0 + b1 × height + b2 × age
An interaction may take two simple forms. As height increases, the effect of age may increase so that the difference in MVC between young and old tall men is greater than the difference between young and old short men. Alternatively, as height increases, the effect of age may decrease. More complex interactions are beyond the scope of this discussion. Now, if we fit
MVC = b0 + b1 × height + b2 × age + b3 × height × age
for fixed height the effect of age is b2 + b3 × height. If there is no interaction, the effect of age is the same at all heights, and b3 will be zero. Of course, b3 will not be exactly zero, but only within the limits of random variation. We can fit such a model just as we fitted the first one. We get
MVC = 4661 - 24.7 × height - 112.8 × age + 0.650 × height × age

Table 17.7. Analysis of variance for the interaction of height and age

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
|---|---|---|---|---|---|
| Total | 40 | 503344 | | | |
| Regression | 3 | 202719 | 67573 | 8.32 | 0.0002 |
|     Height and age |     2 |     131495 | 65748 | 8.09 | 0.001 |
|     Height × age |     1 |     71224 | 71224 | 8.77 | 0.005 |
| Residual | 37 | 300625 | 8125 | | |
The regression is still significant, as we would expect. However, the coefficients of height and age have changed; they have even changed sign. The coefficient of height depends on age. The regression equation can be written
MVC = 4661 + (-24.7 + 0.650 × age) × height - 112.8 × age
The coefficient of height depends on age, the difference in strength between short and tall subjects being greater for older subjects than for younger.
The analysis of variance for this regression equation is shown in Table 17.7. The regression sum of squares is divided into two parts: that due to age and height, and that due to the interaction term after the main effects of age and height have been accounted for. The interaction row is the difference between the regression row in Table 17.7, which has 3 degrees of freedom, and the regression row in Table 17.4, which has 2. From this we see that the interaction is highly significant. The effects of height and age on MVC are not additive. Another example of the investigation of a possible interaction is given in §17.7.
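To see what the interaction model implies, the fitted equation can be evaluated at a few ages. The Python sketch below uses the coefficients quoted above; the function names and the chosen ages and heights are assumptions made purely for illustration, and the predictions inherit the rounding of the printed coefficients.

```python
def predicted_mvc(height, age):
    """Mean MVC (newtons) predicted by the fitted interaction model."""
    return 4661 - 24.7 * height - 112.8 * age + 0.650 * height * age


def height_coefficient(age):
    """Change in predicted MVC per cm of height, for men of a given age."""
    return -24.7 + 0.650 * age


for age in (30, 45, 60):
    # Difference in predicted MVC between men of 180 cm and 160 cm at this age.
    difference = predicted_mvc(180, age) - predicted_mvc(160, age)
    print(f"age {age}: height coefficient = {height_coefficient(age):.2f}, "
          f"180 cm v. 160 cm difference = {difference:.0f} newtons")
```

The coefficient of height rises with age, so the predicted difference between tall and short men is larger for the older subjects, as the text describes.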
17.4* Polynomial regression
So far, we have assumed that all the regression relationships have been linear, i.e. that we are dealing with straight lines. This is not necessarily so. We may have data where the underlying relationship is a curve rather than a straight line. Unless there is a theoretical reason for supposing that a particular form of the equation, such as logarithmic or exponential, is needed, we test for non-linearity by using a polynomial. Clearly, if we can fit a relationship of the form
MVC = b0 + b1 × height + b2 × age
we can also fit one of the form
MVC = b0 + b1 × height + b2 × height²
to give a quadratic equation, and continue adding powers of height to give equations which are cubic, quartic, etc.

Table 17.8. Analysis of variance for polynomial regression of MVC on height

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
|---|---|---|---|---|---|
| Total | 40 | 503344 | | | |
| Regression | 2 | 89103 | 44552 | 4.09 | 0.02 |
|     Linear |     1 |     88522 | 88522 | 7.03 | 0.01 |
|     Quadratic |     1 |     581 | 581 | 0.05 | 0.8 |
| Residual | 38 | 414241 | 12584 | | |
Height and height squared are highly correlated, which can lead to problems in estimation. To reduce the correlation, we can subtract a number close to mean height from height before squaring. For the data of Table 17.1, the correlation between height and height squared is 0.9998. Mean height is 170.7 cm, so 170 is a convenient number to subtract. The correlation between height and height minus 170 squared is -0.44, so the correlation has been reduced, though not eliminated. The regression equation is
MVC = -961 + 7.49 × height + 0.092 × (height - 170)²
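The effect of subtracting 170 before squaring can be checked from the heights in Table 17.1. The following is a minimal Python sketch; the helper `correlation` is an assumption written for this illustration, not a routine from any particular statistical package.

```python
import math

# Heights (cm) of the 41 men in Table 17.1.
height = [166, 175, 173, 175, 172, 172, 160, 172, 179, 177, 175, 180, 167, 175,
          172, 172, 161, 173, 173, 168, 178, 178, 171, 162, 177, 177, 178, 167,
          176, 159, 173, 175, 172, 170, 178, 155, 160, 162, 159, 168, 168]


def correlation(x, y):
    """Product moment correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


raw_square = [h ** 2 for h in height]              # height squared
centred_square = [(h - 170) ** 2 for h in height]  # (height - 170) squared

r_raw = correlation(height, raw_square)
r_centred = correlation(height, centred_square)

print(f"mean height = {sum(height) / len(height):.1f} cm")
print(f"correlation of height with height squared:         {r_raw:.4f}")
print(f"correlation of height with (height - 170) squared: {r_centred:.2f}")
```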
To test for non-linearity, we proceed as in §17.2. We fit two regression equations, a linear and a quadratic. The non-linearity is then tested by the difference between the sum of squares due to the quadratic equation and the sum of squares due to the linear. The analysis of variance is shown in Table 17.8. In this case the quadratic term is not significant, so there is no evidence of non-linearity. Were the quadratic term significant, we could fit a cubic equation and test the effect of the cubic term in the same way. Polynomial regression of one variable can be combined with ordinary linear regression of others to give regression equations of the form
MVC = b0 + b1 × height + b2 × height² + b3 × age
and so on. Royston and Altman (1994) have shown that quite complex curves can be fitted with a small number of coefficients if we use log(x) and powers -1, -0.5, 0.5, 1 and 2 in the regression equation.
17.5* Assumptions of multiple regression
For the regression estimates to be optimal and the F tests valid, the residuals (the difference between observed values of the dependent variable and those predicted by the regression equation) should follow a Normal distribution and have the same variance throughout the range. We also assume that the relationships which we are modelling are linear. These assumptions are the same as for simple linear regression (§11.8) and can be checked graphically in the same way, using histograms, Normal plots and scatter diagrams. If the assumptions of Normal distribution and uniform variance are not met, we can use a transformation as described in §10.4 and §11.8. Non-linearity can be dealt with using polynomial regression.
Fig. 17.3. Histogram and Normal plot of residuals of MVC about height and age
Fig. 17.4. Residuals against observed MVC, to check uniformity of variance, and age, to check linearity
The regression equation of strength on height and age is MVC = -466 + 5.40 × height - 3.08 × age and the residuals are given by
residual = MVC - (-466 + 5.40 × height - 3.08 × age)
Figure 17.3 shows a histogram and a Normal plot of the residuals for the MVC data. The distribution looks quite good. Figure 17.4 shows a plot of residuals against MVC. The variability looks uniform. We can also check the linearity by plotting residuals against the predictor variables. Figure 17.4 also shows the residual against age. There is an indication that residual may be related to age. The possibility of a nonlinear relationship can be checked by polynomial regression, which, in this case, does not produce a quadratic term which approaches significance.
17.6* Qualitative predictor variables
In §17.1 the predictor variables, height and age, were quantitative. In the study from which these data come, we also recorded whether or not subjects had cirrhosis of the liver. Cirrhosis was recorded as ‘present’ or ‘absent’, so the variable was dichotomous. It is easy to include such variables as predictors in multiple regression. We create a variable which is 0 if the characteristic is absent, 1 if present, and use this in the regression equation just as we did height. The regression coefficient of this dichotomous variable is the difference in the mean of the outcome variable between subjects with the characteristic and subjects without. If the coefficient in this example were negative, it would mean that subjects with cirrhosis were not as strong as subjects without cirrhosis. In the same way, we can use sex as a predictor variable by creating a variable which is 0 for females and 1 for males. The coefficient then represents the difference in mean between male and female. If we use only one, dichotomous predictor variable in the equation, the regression is exactly equivalent to a two-sample t test between the groups defined by the variable (§10.3).
A predictor variable with more than two categories or classes is called a class variable or a factor. We cannot simply use a class variable in the regression equation, unless we can assume that the classes are ordered in the same way as their codes, and that adjoining classes are in some sense the same distance apart. For some variables, such as the diagnosis data of Table 4.1 and the housing data of Table 13.1, this is absurd. For others, such as the AIDS categories of Table 10.7, it is a very strong assumption. What we do instead is to create a set of dichotomous variables to represent the factor. For the AIDS data of Table 10.7, we can create three variables:
hiv1 = 1 if subject has AIDS, 0 otherwise
hiv2 = 1 if subject has ARC, 0 otherwise
hiv3 = 1 if subject is HIV positive but has no symptoms, 0 otherwise
If the subject is HIV negative, all three variables are zero. hiv1, hiv2, and hiv3 are called dummy variables. Some computer programs will calculate the dummy variables automatically if the variable is declared to be a factor; for others the user must define them. We put the three dummy variables into the regression equation. This gives the equation:
mannitol = 11.4 - 0.066 × hiv1 - 2.56 × hiv2 - 1.69 × hiv3
Each coefficient is the difference in mannitol absorption between the class represented by that variable and the class represented by all dummy variables being zero, HIV negative, called the reference class. The analysis of variance for this regression equation is shown in Table 17.9, and the F test shows that there is no significant relationship between mannitol absorption and HIV status. The regression program prints out standard errors and t tests for each dummy variable, but these t tests should be ignored, because we cannot interpret one dummy variable in isolation from the others.
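Where a program does not create dummy variables automatically, they are simple to define. The Python sketch below codes the four HIV classes as three 0/1 variables and evaluates the fitted equation above for each class. The class labels, the function names and the dictionary layout are assumptions made for illustration; only the coefficients come from the text.

```python
# The three classes that get a dummy variable; HIV negative is the reference class.
DUMMY_CLASSES = ["AIDS", "ARC", "HIV positive, no symptoms"]

# Coefficients of the fitted equation: intercept, then hiv1, hiv2, hiv3.
INTERCEPT = 11.4
COEFFICIENTS = [-0.066, -2.56, -1.69]


def dummy_variables(status):
    """Return [hiv1, hiv2, hiv3] for a subject's HIV status."""
    return [1 if status == cls else 0 for cls in DUMMY_CLASSES]


def predicted_mannitol(status):
    """Mean mannitol predicted by the regression for a given HIV status."""
    dummies = dummy_variables(status)
    return INTERCEPT + sum(b * d for b, d in zip(COEFFICIENTS, dummies))


for status in ["HIV negative"] + DUMMY_CLASSES:
    print(f"{status}: dummies = {dummy_variables(status)}, "
          f"predicted mannitol = {predicted_mannitol(status):.2f}")
```

Because the reference class has all dummies equal to zero, its predicted mean is the intercept, and each coefficient is read as a difference from that class.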
Table 17.9. Analysis of variance for the regression of mannitol excretion on HIV status

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
|---|---|---|---|---|---|
| Total | 58 | 1559.035 | | | |
| Regression | 3 | 49.011 | 16.337 | 0.60 | 0.6 |
| Residual | 55 | 1510.024 | 27.455 | | |
Table 17.10. Two-way analysis of variance for mannitol excretion, with HIV status and diarrhoea as factors

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
|---|---|---|---|---|---|
| Total | 58 | 1559.035 | | | |
| Model | 4 | 134.880 | 33.720 | 1.28 | 0.3 |
|     HIV |     3 |     58.298 | 19.432 | 0.74 | 0.5 |
|     Diarrhoea |     1 |     85.869 | 85.869 | 3.26 | 0.08 |
| Residual | 54 | 1424.155 | 26.373 | | |
17.7* Multi-way analysis of variance
A different approach to the analysis of multifactorial data is provided by the direct calculation of analysis of variance. Table 17.9 is identical to the one-way analysis of variance for the same data in Table 10.8. We can also produce analyses of variance for several factors at once. Table 17.10 shows the two-way analysis of variance for the mannitol data, the factors being HIV status and presence or absence of diarrhoea. This could be produced equally well by multiple regression with two categorical predictor variables. If there were the same number of patients with and without diarrhoea in each HIV group the factors would be balanced. The model sum of squares would then be the sum of the sums of squares for HIV and for diarrhoea, and these could be calculated very simply from the total of the HIV groups and the diarrhoea groups. For balanced data we can assess many categorical factors and their interactions quite easily by manual calculation. See Armitage and Berry (1994) for details. Complex multifactorial balanced experiments are rare in medical research, and they can be analysed by regression anyway to get identical results. Most computer programs in fact use the regression method to calculate analyses of variance.
For another example, consider Table 17.11, which shows the results of a study of the production of Tumour Necrosis Factor (TNF) by cells in vitro. Two different potential stimulating factors, Mycobacterium tuberculosis (MTB) and Fixed Activated T-cells (FAT), have been added, singly and together. Cells from the same 11 donors have been used throughout. Thus we have three factors, MTB, FAT, and donor. Three measurements were made at each combination of factors; Figure 17.5(a) shows the means of these sets of three. Every possible combination of factors is used the same number of times in a perfect three-way factorial arrangement. There are two missing observations. These things happen, even in the best regulated laboratories. There are some negative values of TNF.
Table 17.11. TNF measured under four different conditions using cells from 11 donors (data of Dr. Jan Davies)

| No MTB: FAT | Donor | TNF, replicate 1 | TNF, replicate 2 | TNF, replicate 3 | MTB: FAT | Donor |
|---|---|---|---|---|---|---|
| No | 1 | -0.01 | -0.01 | -0.13 | No | 1 |
| No | 2 | 16.13 | -9.62 | -14.88 | No | 2 |
| No | 3 | Missing | -0.3 | -0.95 | No | 3 |
| No | 4 | 3.63 | 47.5 | 55.2 | No | 4 |
| No | 5 | -3.21 | -5.64 | -5.32 | No | 5 |
| No | 6 | 16.26 | 52.21 | 17.93 | No | 6 |
| No | 7 | -12.74 | -5.23 | -4.06 | No | 7 |
| No | 8 | -4.67 | 20.1 | 110 | No | 8 |
| No | 9 | -5.4 | 20 | 10.3 | No | 9 |
| No | 10 | -10.94 | -5.26 | -2.73 | No | 10 |
| No | 11 | -4.19 | -11.83 | -6.29 | No | 11 |
| Yes | 1 | 88.16 | 97.58 | 66.27 | Yes | 1 |
| Yes | 2 | 196.5 | 114.1 | 134.2 | Yes | 2 |
| Yes | 3 | 6.02 | 1.19 | 3.38 | Yes | 3 |
| Yes | 4 | 935.4 | 1011 | 951.2 | Yes | 4 |
| Yes | 5 | 606 | 592.7 | 608.4 | Yes | 5 |
| Yes | 6 | 1457 | 1349 | 1625 | Yes | 6 |
| Yes | 7 | 1457 | 1349 | 1625 | Yes | 7 |
| Yes | 8 | 196.7 | 270.8 | 160.7 | Yes | 8 |
| Yes | 9 | 135.2 | 221.5 | 268 | Yes | 9 |
| Yes | 10 | -14.47 | 79.62 | 304.1 | Yes | 10 |
| Yes | 11 | 516.3 | 585.9 | 562.6 | Yes | 11 |
Fig. 17.5. Tumour Necrosis Factor (TNF) measured in the presence and absence of Fixed Activated T-cells (FAT) and Mycobacterium tuberculosis (MTB), on the natural and a transformed scale
This does not mean that the cells were sucking TNF in from their environment, but was an artifact of the assay method and represents measurement error.
The subject means are shown in Figure 17.5(a). This suggests several things: there is a strong donor effect (donor 6 is always high, donor 3 is always low, for example), MTB and FAT each increase TNF, both together have a greater effect than either individually, the distribution of TNF is highly skew, the variance of TNF varies greatly from group to group, and increases with the mean. As the mean for MTB and FAT combined is much greater than the sum of their individual means, the researcher thought there was synergy, i.e. that MTB and FAT worked together, the presence of one enhancing the effect of the other. She was seeking statistical support for this conclusion (Jan Davies, personal communication).
Table 17.12. Analysis of variance for the effects of MTB, FAT and donor on transformed TNF

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
|---|---|---|---|---|---|
| Total | 43 | 194.04030 | | | |
| Donor | 10 | 38.89000 | 3.88900 | 3.72 | 0.003 |
| MTB | 1 | 58.49320 | 58.49320 | 55.88 | <0.0001 |
| FAT | 1 | 65.24482 | 65.24482 | 62.33 | <0.0001 |
| MTB × FAT | 1 | 0.00811 | 0.00811 | 0.01 | 0.9 |
| Residual | 30 | 31.40418 | 1.04681 | | |
For statistical analysis, we would like Normal distributions with uniform variances between the groups. A log transformation looks like a good bet, but some observations are negative. As the log (or the square root) will not work for negative numbers, we have to adjust the data further. The easiest approach is to add a constant to all the observations before transformation. I chose 20, which makes all the observations positive but is small compared to most of the observations. I did this by trial and error. As Figure 17.5(b) shows, the transformation has not been totally successful, but the transformed data look much more amenable to a Normal theory analysis than do the raw data.
The repeated measurements give us a more accurate measurement of TNF, but do not contribute anything else. I therefore analysed the mean transformed TNF. The analysis of variance is shown in Table 17.12. Donor is a factor with 11 categories, hence has 10 degrees of freedom. It is not of any importance to the science here, but is what we call a nuisance variable, one we need to allow for but are not interested in. I have included an interaction between MTB and FAT, because looking for this is one of the objectives of the experiment. The main effects of MTB and FAT are highly significant, but the interaction term is not. The estimates of the effects with their confidence intervals are shown in Table 17.13. As the analysis was on a log scale, the antilogs (exponentials) are also shown. The antilog gives us the ratio of the (geometric) mean in the presence of the factor to the mean in the absence of the factor, i.e. the amount by which TNF is multiplied when the factor is present. Strictly speaking, of course, it is the ratio of the geometric means of TNF plus 20, but as 20 is small compared to most TNF measurements the ratio will be approximately the increase in TNF.
The estimated interaction is small and not significant. The confidence interval is wide (the sample is very small), so we cannot exclude the possibility of an interaction, but there is certainly no evidence that one exists. This was not what the researcher expected. This contradiction comes about because the statistical model used is of additive effects on the logarithmic scale, i.e. of multiplicative effects on the natural scale. This is forced on us by the nature of the data. The lack of interaction between the effects shows that the data are consistent with this model, this view of what is happening. The lack of interaction can be seen quite clearly in Figure 17.5(b), as the mean for MTB and FAT looks very similar to the sum of the means for MTB alone and FAT alone.
Table 17.13. Estimated effects on TNF of MTB, FAT and their interaction

| | Effect (log scale) | 95% Confidence interval | Ratio effect (natural scale) | 95% Confidence interval |
|---|---|---|---|---|
| With interaction term | | | | |
| MTB | 2.333 | (1.442 to 3.224) | 10.3 | (4.2 to 25.1) |
| FAT | 2.463 | (1.572 to 3.354) | 11.7 | (4.8 to 28.6) |
| MTB × FAT | 0.054 | (-1.206 to 1.314) | 1.1 | (0.3 to 3.7) |
| Without interaction term | | | | |
| MTB | 2.306 | (1.687 to 2.925) | 10.0 | (5.4 to 18.6) |
| FAT | 2.435 | (1.816 to 3.054) | 11.4 | (6.1 to 21.2) |
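The link between the two halves of Table 17.13 is the exponential function. The Python sketch below shows the shifted log transformation and the back-transformation of the estimates with the interaction term. It assumes natural logarithms, which is consistent with the antilogs being described as exponentials; the function names are assumptions made for illustration.

```python
import math

SHIFT = 20  # constant added to every TNF value before taking logs


def transform(tnf):
    """Shifted log transformation applied before the analysis of variance."""
    return math.log(tnf + SHIFT)


def ratio_effect(effect, lower, upper):
    """Antilog an effect and its confidence limits from the log scale."""
    return math.exp(effect), math.exp(lower), math.exp(upper)


# The most negative observation shown in Table 17.11 becomes a positive number.
smallest_transformed = transform(-14.88)

# Estimates from the model with the interaction term (Table 17.13).
mtb = ratio_effect(2.333, 1.442, 3.224)
fat = ratio_effect(2.463, 1.572, 3.354)
interaction = ratio_effect(0.054, -1.206, 1.314)

for name, (ratio, low, high) in [("MTB", mtb), ("FAT", fat), ("MTB x FAT", interaction)]:
    print(f"{name}: ratio {ratio:.1f} (95% CI {low:.1f} to {high:.1f})")
```

A ratio near 1 for the interaction corresponds to an effect near 0 on the log scale, which is what additive effects on the log scale (multiplicative effects on the natural scale) would produce.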
Multiple regression in which qualitative and quantitative predictor variables are both used is also known as analysis of covariance. For ordinal data, there is a two-way analysis of variance using ranks, the Friedman test (see Conover 1980, Altman 1991).
17.8* Logistic regression
Logistic regression is used when the outcome variable is dichotomous, a ‘yes or no’, whether or not the subject has a particular characteristic such as a symptom. We want a regression equation which will predict the proportion of individuals who have the characteristic, or, equivalently, estimate the probability that an individual will have the symptom. We cannot use an ordinary linear regression equation, because this might predict proportions less than zero or greater than one, which would be meaningless. Instead we use the logit of the proportion as the outcome variable. The logit of a proportion p is the log odds (§13.7):
logit(p) = log(p/(1 - p))
The logit can take any value from minus infinity, when p = 0, to plus infinity, when p = 1. We can fit regression models to the logit which are very similar to the ordinary multiple regression and analysis of variance models found for data from a Normal distribution. We assume that relationships are linear on the logistic scale:
logit(p) = b0 + b1 × x1 + b2 × x2 + … + bm × xm
where x1, …, xm are the predictor variables and p is the proportion to be predicted. The method is called logistic regression, and the calculation is computer intensive. The effects of the predictor variables are found as log odds ratios. We will look at the interpretation in an example.
Fig. 17.6. Body mass index (BMI) in women undergoing trial of scar

Table 17.14. Coefficients in the logistic regression for predicting caesarian section

| | Coef. | Std. Err. | z | P | 95% Confidence interval |
|---|---|---|---|---|---|
| BMI | 0.0883 | 0.0200 | 4.42 | <0.001 | 0.0492 to 0.1275 |
| Induction | 0.6471 | 0.2141 | 3.02 | 0.003 | 0.2276 to 1.0667 |
| Prev. vag. del. | -1.7963 | 0.2981 | -6.03 | <0.001 | -2.3805 to -1.2120 |
| Intercept | -3.7000 | 0.5343 | -6.93 | <0.001 | -4.7473 to -2.6528 |
When giving birth, women who have had a previous caesarian section usually have a trial of scar, that is, they attempt a natural labour with vaginal delivery and only have another caesarian if this is deemed necessary. Several factors may increase the risk of a caesarian, and in this study the factor of interest was obesity, as measured by the body mass index or BMI, defined as weight/height². The distribution of BMI is shown in Figure 17.6 (data of Andreas Papadopoulos). For caesarians the mean BMI was 26.4 kg/m² and for vaginal deliveries the mean was 24.9 kg/m². Two other variables had a strong relationship with a subsequent caesarian. Women who had had a previous vaginal delivery (PVD) were less likely to need a caesarian, odds ratio = 0.18, 95% confidence interval 0.10 to 0.32. Women whose labour was induced had an increased risk of a caesarian, odds ratio = 2.11, 95% confidence interval 1.44 to 3.08. All these relationships were highly significant. The question to be answered was whether the relationship between BMI and caesarian section remained when the effects of induction and previous deliveries were allowed for.
The results of the logistic regression are shown in Table 17.14. We have the coefficients for the equation predicting the log odds of a caesarian:
log(o) = -3.7000 + 0.0883 × BMI + 0.6471 × induction - 1.7963 × PVD
whereinductionandPVDare1ifpresent,0ifnot.ThusforwomanwhohadBMI=25kg/m2,notbeeninducedandhadapreviousvaginaldeliverythelog
oddsofacaesarianisestimatedtobe
Table17.15.Oddsratiosfromthelogisticregression
![Page 562: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/562.jpg)
forpredictingcaesariansection
Oddsratio P 95%Confidenceinterval
BMI 1.092 <0.001 1.050to1.136
Induction 1.910 0.003 1.256to2.906
Prev.vag.del.
0.166 <0.001 0.096to0.298
log(o)=-3.7000+0.0883×25+0.6471×0-1.7963×1=-3.2888
Theoddsisexp(-3.2888)=0.03730andtheprobabilityisgivenbyp=o/(1+o)=0.03730/(1+0.03730)=0.036.Iflabourhadbeeninduced,thelogoddswouldriseto
log(o)=-3.7000+0.0883×25+0.6471×1-1.7963×1=-2.6417
givingoddsexp(-2.6417)=0.07124andhenceprobability0.07124/(1+0.07124)=0.067.
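The two predicted probabilities worked out from the fitted equation can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `log_odds` and `probability` are invented here, and the coefficients are simply copied from the equation in the text.

```python
import math

# Coefficients copied from the fitted equation in the text.
INTERCEPT = -3.7000
B_BMI = 0.0883
B_INDUCTION = 0.6471
B_PVD = -1.7963


def log_odds(bmi, induction, pvd):
    """Predicted log odds of a caesarian; induction and pvd are 1 if present, 0 if not."""
    return INTERCEPT + B_BMI * bmi + B_INDUCTION * induction + B_PVD * pvd


def probability(bmi, induction, pvd):
    """Turn the predicted log odds into a probability using p = o/(1 + o)."""
    odds = math.exp(log_odds(bmi, induction, pvd))
    return odds / (1 + odds)


# Woman with BMI 25 kg/m2 and a previous vaginal delivery,
# first without and then with induction of labour.
not_induced = probability(25, 0, 1)
induced = probability(25, 1, 1)
print(f"not induced: {not_induced:.3f}; induced: {induced:.3f}")
```

Changing the arguments shows how the predicted probability moves with BMI or with the two 0/1 predictors.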
Because the logistic regression equation predicts the log odds, the coefficients represent the difference between two log odds, a log odds ratio. The antilog of the coefficients is thus an odds ratio. Some programs will print these odds ratios directly, as in Table 17.15. We can see that induction increases the odds of a caesarian by a factor of 1.910 and a previous vaginal delivery reduces the odds by a factor of 0.166. These are often called adjusted odds ratios. In this example they and their confidence intervals are similar to the unadjusted odds ratios given above, because the three predictor variables happen not to be closely related to each other.
For a continuous predictor variable, such as BMI, the coefficient is the change in log odds for an increase of one unit in the predictor variable. The antilog of the coefficient, the odds ratio, is the factor by which the odds must be multiplied for a unit increase in the predictor. Two units increase in the predictor increases the odds by the square of the odds ratio, and so on. A difference of 5 kg/m² in BMI gives an odds ratio for a caesarian of 1.092⁵ = 1.55, thus the odds of a caesarian are multiplied by 1.55. See §11.8 for a similar interpretation and fuller discussion when a continuous outcome variable is log transformed.
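As a rough check on this interpretation, the odds ratios can be obtained by taking antilogs of the coefficients in the fitted equation. The sketch below assumes only those three coefficients; the helper name `odds_ratio_for_difference` is made up for illustration.

```python
import math

# Coefficients (log odds ratios) from the fitted logistic regression equation.
coefficients = {
    "BMI": 0.0883,
    "Induction": 0.6471,
    "Prev. vag. del.": -1.7963,
}

# The antilog of each coefficient is the adjusted odds ratio.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}


def odds_ratio_for_difference(coefficient, difference):
    """Odds ratio for a given difference in a continuous predictor.

    Multiplying the coefficient by the difference is the same as raising
    the one-unit odds ratio to the power of the difference.
    """
    return math.exp(coefficient * difference)


or_5_units = odds_ratio_for_difference(coefficients["BMI"], 5)

for name, value in odds_ratios.items():
    print(f"{name}: {value:.3f}")
print(f"5 kg/m2 difference in BMI: {or_5_units:.2f}")
```

Working from the unrounded coefficient rather than the rounded odds ratio 1.092 may change the last digit slightly, but to two decimal places the result agrees with the text.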
When we have a case control study, we can analyse the data by using the case or control status as the outcome variable in a logistic regression. The coefficients are then the approximate log relative risks due to the factors (§13.7). There is a variant called conditional logistic regression, which can be used when the cases and controls are in matched pairs, triples, etc.
Logistic regression is a large sample method. A rule of thumb is that there should be at least 10 ‘yes’s and 10 ‘no’s, and preferably 20, for each predictor variable (Peduzzi et al. 1996).
17.9* Survival data using Cox regression
One problem of survival data, the censoring of individuals who have not died at the time of analysis, has been discussed in §15.6. There is another which is important for multifactorial analysis. We often have no suitable mathematical model of the way survival is related to time, i.e. the survival curve. The solution now widely adopted to this problem was proposed by Cox (1972), and is known as Cox regression or the proportional hazards model. In this approach, we say that for subjects who have lived to time t, the probability of an endpoint (e.g. dying) instantaneously at time t is h(t), which is an unknown function of time. We call the probability of an endpoint the hazard, and h(t) is the hazard function. We then assume that anything which affects the hazard does so by the same ratio at all times. Thus, something which doubles the risk of an endpoint on day one will also double the risk of an endpoint on day two, day three and so on. Thus, if h0(t) is the hazard function for subjects with all the predictor variables equal to zero, and h(t) is the hazard function for a subject with some other values for the predictor variables, h(t)/h0(t) depends only on the predictor variables, not on time t. We call h(t)/h0(t) the hazard ratio. It is the relative risk of an endpoint occurring at any given time.
In statistics, it is convenient to work with differences rather than ratios, so we take the logarithm of the ratio (see §5A) and have a regression-like equation:
log(h(t)/h0(t)) = b1x1 + b2x2 + … + bpxp
where x1, …, xp are the predictor variables and b1, …, bp are the coefficients which we estimate from the data. This is Cox's proportional hazards model. Cox regression enables us to estimate the values of b1, …, bp which best predict the observed survival. There is no constant term b0, its place being taken by the baseline hazard function h0(t).
Table 15.7 shows the time to recurrence of gallstones, or the time for which patients are known to have been gallstone-free, following dissolution by bile acid treatment or lithotripsy, with the number of previous gallstones, their maximum diameter, and the time required for their dissolution. The difference between patients with a single and with multiple previous gallstones was tested using the logrank test (§15.6). Cox regression enables us to look at continuous predictor variables, such as diameter of gallstone, and to examine several predictor variables at once. Table 17.16 shows the result of the Cox regression. We can carry out an approximate test of significance by dividing the coefficient by its standard error, and if the null hypothesis that the coefficient would be zero in the population is true, this follows a Standard Normal distribution. The chi-squared statistic tests the relationship between the time to recurrence and the three variables together. The maximum diameter has no significant relationship to time to recurrence, so we can try a model without it (Table 17.17). As the change in overall chi-squared shows, removing diameter has had very little effect.
The coefficients in Table 17.17 are the log hazard ratios. The coefficient for multiple gallstones is 0.963. If we antilog this, we get exp(0.963) = 2.62. As multiple gallstones is a 0 or 1 variable, the coefficient measures the difference between those with single and multiple stones. A patient with multiple gallstones is 2.62 times as likely to have a recurrence at any time as a patient with a single stone. The 95% confidence interval for this estimate is found from the antilogs of the confidence interval in Table 17.17, 1.30 to 5.26. Note that a positive coefficient means an increased risk of the event, in this case recurrence. The coefficient for months to dissolution is 0.043, which has antilog = 1.04. This is a quantitative variable, and for each month to dissolve the hazard ratio increases by a factor of 1.04. Thus a patient whose stone took two months to dissolve has a risk of recurrence 1.04 times that for a patient whose stone took one month, a patient whose stone took three months has a risk 1.04² times that for a one month patient, and so on.
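The antilog step can be sketched in Python. The figures are the coefficients and confidence limits reported in Table 17.17; the dictionary layout and the function name `hazard_ratio` are choices made here for illustration only.

```python
import math

# (coefficient, lower 95% limit, upper 95% limit) on the log hazard scale,
# copied from Table 17.17.
table_17_17 = {
    "Mult. gallstones": (0.963, 0.266, 1.661),
    "Months to dissol.": (0.043, 0.010, 0.076),
}


def hazard_ratio(row):
    """Antilog a coefficient and its confidence limits to give a hazard ratio."""
    return tuple(math.exp(value) for value in row)


hazard_ratios = {name: hazard_ratio(row) for name, row in table_17_17.items()}

# For a quantitative predictor, a difference of k units multiplies the hazard
# by the one-unit hazard ratio raised to the power k.
two_month_difference = math.exp(2 * table_17_17["Months to dissol."][0])

for name, (hr, low, high) in hazard_ratios.items():
    print(f"{name}: hazard ratio {hr:.2f}, 95% CI {low:.2f} to {high:.2f}")
print(f"Two month difference in dissolution time: {two_month_difference:.2f}")
```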
Table 17.16. Cox regression of time to recurrence of gallstones on presence of multiple stones, maximum diameter of stone and months to dissolution
Variable  Coef.  Std. Err.  z  P  95% Conf. interval
Mult. gallstones  0.838  0.401  2.09  0.038  0.046 to 1.631
Max. diam.  -0.023  0.036  -0.63  0.532  -0.094 to 0.049
Months to dissol.  0.044  0.017  2.64  0.009  0.011 to 0.078
χ² = 12.57, 3 d.f., P = 0.006.
Table 17.17. Cox regression of time to recurrence of gallstones on presence of multiple stones and months to dissolution
Variable  Coef.  Std. Err.  z  P  95% Conf. interval
Mult. gallstones  0.963  0.353  2.73  0.007  0.266 to 1.661
Months to dissol.  0.043  0.017  2.59  0.011  0.010 to 0.076
χ² = 12.16, 2 d.f., P = 0.002.
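The approximate significance tests in these tables can be illustrated as follows. This is a sketch under two assumptions: the z statistic is referred to the Standard Normal distribution, and the change in the overall chi-squared statistic between the two models is referred to a chi-squared distribution with 1 degree of freedom. Because the tabulated coefficients and standard errors are rounded, the results agree with the tables only approximately. The function names are invented for the example.

```python
import math
from statistics import NormalDist


def z_test(coef, std_err):
    """z = coefficient / standard error, with a two sided Normal P value."""
    z = coef / std_err
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def chi_squared_p_1df(x):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Multiple gallstones in Table 17.16.
z_mult, p_mult = z_test(0.838, 0.401)

# Change in overall chi-squared when maximum diameter is dropped
# (Table 17.16 versus Table 17.17): 3 - 2 = 1 degree of freedom.
change = 12.57 - 12.16
p_change = chi_squared_p_1df(change)

print(f"z = {z_mult:.2f}, P = {p_mult:.3f}")
print(f"change in chi-squared = {change:.2f}, P = {p_change:.2f}")
```

The large P value for the change in chi-squared is consistent with the statement that removing diameter has had very little effect.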
If we have only the dichotomous variable multiple gallstones in the Cox model, we get for the overall test statistic χ² = 6.11, 1 degree of freedom. In §15.6 we analysed these data by comparison of two groups using the logrank test, which gave χ² = 6.62, 1 degree of freedom. The two methods give similar, but not identical, results. The logrank test is non-parametric, making no assumption about the distribution of survival time. The Cox method is said to be semi-parametric, because although it makes no assumption about the shape of the distribution of survival time, it does require assumptions about the hazard ratio.
Like logistic regression (§17.8), Cox regression is a large sample method. A rule of thumb is that there should be at least 10, and preferably 20, events (deaths) for each predictor variable. Fuller accounts of Cox regression are given by Altman (1991), Matthews and Farewell (1988), Parmar and Machin (1995), and Hosmer and Lemeshow (1999).
17.10* Stepwise regression
Stepwise regression is a technique for choosing predictor variables from a large set. The stepwise approach can be used with multiple linear, logistic and Cox regression and with other, less often seen, regression techniques (§17.12) too.
There are two basic strategies: step-up and step-down, also called forward and backward. In step-up or forward regression, we fit all possible one-way regression equations. Having found the one which accounts for the greatest variance, all two-way regressions including this variable are fitted. The equation accounting for the most variation is chosen, and all three-way regressions including these are fitted, and so on. This continues until no significant increase in variation accounted for is found. In the step-down or backward method, we first fit the regression with all the predictor variables, and then the variable is removed which reduces the amount of variation accounted for by the least amount, and so on. There are also more complex methods, in which variables can both enter and leave the regression equation.
These methods must be treated with care. Different stepwise techniques may produce different sets of predictor variables in the regression equation. This is especially likely when the predictor variables are correlated with one another. The technique is very useful for selecting a small set of predictor variables for purposes of standardization and prediction. For trying to get an understanding of the underlying system, stepwise methods can be very misleading. When predictor variables are highly correlated, once one has entered the equation in a step-up analysis, the other will not enter, even though it is related to the outcome. Thus it will not appear in the final equation.
17.11* Meta-analysis: Data from several studies
Meta-analysis is the combination of data from several studies to produce a single estimate. From the statistical point of view, meta-analysis is a straightforward application of multifactorial methods. We have several studies of the same thing, which might be clinical trials or epidemiological studies, perhaps carried out in different countries. Each trial gives us an estimate of an effect. We assume that these are estimates of the same global population value. We check the assumptions of the analysis, and, if these assumptions are satisfied, we combine the separate study estimates to make a common estimate. This is a multifactorial analysis, where the treatment or risk factor is one predictor variable and the study is another, categorical, predictor variable.
The main problems of meta-analysis arise before we begin the analysis of the data. First, we must have a clear definition of the question so that we only include studies which address this. For example, if we want to know whether lowering serum cholesterol reduces mortality from coronary artery disease, we would not want to include a study where the attempt to lower cholesterol failed. On the other hand, if we ask whether dietary advice lowers mortality, we would include such a study. Which studies we include may have a profound influence on the conclusions (Thompson 1993). Second, we must have all the relevant studies. A simple literature search is not enough. Not all studies which have been started are published; studies which produce significant differences are more likely to be published than those which do not (e.g. Pocock and Hughes 1990; Easterbrook et al. 1991). Within a study, results which are significant may be emphasized and parts of the data which produce no differences may be ignored by the investigators as uninteresting. Publication of unfavourable results may be discouraged by the sponsors of research. Researchers who are not native English speakers may feel that publication in the English language literature is more prestigious as it will reach a wider audience, and so try there first, only publishing in their own language if they cannot publish in English. The English language literature may thus contain more positive results than do other literatures. The phenomenon by which significant and positive results are more likely to be reported, and reported more prominently, than non-significant and negative ones is called publication bias. Thus we must not only trawl the published literature for studies, but use personal knowledge of ourselves and others to locate all the unpublished studies. Only then should we carry out the meta-analysis.
When we have all the studies which meet the definition, we combine them to get a common estimate of the effect of the treatment or risk factor. We regard the studies as providing several observations of the same population value. There are two stages in meta-analysis. First we check that the studies do provide estimates of the same thing. Second, we calculate the common estimate and its confidence interval. To do this we may have the original data from all the studies, which we can combine into one large data file with study as one of the variables, or we may only have summary statistics obtained from publications.
If the outcome measure is continuous, such as mean fall in blood pressure, we can check that subjects are from the same population by analysis of variance, with treatment or risk factor, study, and interaction between them in the model. Multiple regression can also be used, remembering that study is a categorical variable and dummy variables are required. We test the treatment times study interaction in the usual way. If the interaction is significant this indicates that the treatment effect is not the same in all studies, and so we cannot combine the studies. It is the interaction which is important. It does not matter much if the mean blood pressure varies from study to study. What matters is whether the effect of the treatment on blood pressure varies more than we would expect. We may want to examine the studies to see whether any characteristic of the studies explains this variation. This might be a feature of the subjects, the treatment or the data collection. If there is no interaction, then the data are consistent with the treatment or risk factor effect being constant. This is called a fixed effects model (see §10.12). We can drop the interaction term from the model and the treatment or risk factor effect is then the estimate we want. Its standard error and confidence interval are found as described in §17.2. If there is an interaction, we cannot estimate a single treatment effect. We can think of the studies as a random sample of the possible trials and estimate the mean treatment effect for this population. This is called the random effects model (§10.12). The confidence interval is usually much wider than that found using the fixed effects model.
Table 17.18. Odds ratios and confidence intervals in five studies of vitamin A supplementation in infectious disease (Glasziou and Mackerras 1993)
Study  Dose regime  Vitamin A: deaths  Vitamin A: number  Controls: deaths  Controls: number
1  200000 IU six-monthly  101  12991  130  12209
2  200000 IU six-monthly  39  7076  41  7006
3  8333 IU weekly  37  7764  80  7755
4  200000 IU four-monthly  152  12541  210  12264
5  200000 IU once  138  3786  167  3411
Table 17.19. Odds ratios and confidence intervals in five studies of vitamin A supplementation in infectious disease
Study  Odds ratio  95% Confidence interval
1  0.73  0.56 to 0.95
2  0.94  0.61 to 1.46
3  0.46  0.31 to 0.68
4  0.70  0.57 to 0.87
5  0.73  0.58 to 0.93
If the outcome measure is dichotomous, such as survived or died, the estimate of the treatment or risk factor effect will be in the form of an odds ratio (§13.7). We can proceed in the same way as for a continuous outcome, using logistic regression (§17.8). Several other methods exist for checking the homogeneity of the odds ratios across studies, such as Woolf's test (see Armitage and Berry 1994) or that of Breslow and Day (1980). They all give similar answers, and, since they are based on different large-sample approximations, the larger the study samples the more similar the results will be. Provided the odds ratios are homogeneous across studies, we can then estimate the common odds ratio. This can be done using the Mantel-Haenszel method (see Armitage and Berry 1994) or by logistic regression.
For example, Glasziou and Mackerras (1993) carried out a meta-analysis of vitamin A supplementation in infectious disease. Their data for five community studies are shown in Table 17.18. We can obtain odds ratios and confidence intervals as described in §13.7, shown in Table 17.19.
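The odds ratios in Table 17.19 can be checked from the counts in Table 17.18. The sketch below assumes the usual large sample method, in which the standard error of the log odds ratio is the square root of the sum of the reciprocals of the four cell frequencies; the function name `odds_ratio_ci` and the data layout are illustrative choices rather than anything prescribed by the text.

```python
import math

# (deaths, number) for the vitamin A group and for controls, from Table 17.18.
studies = {
    1: ((101, 12991), (130, 12209)),
    2: ((39, 7076), (41, 7006)),
    3: ((37, 7764), (80, 7755)),
    4: ((152, 12541), (210, 12264)),
    5: ((138, 3786), (167, 3411)),
}


def odds_ratio_ci(treated, control, z=1.96):
    """Odds ratio of death with an approximate 95% confidence interval.

    Assumes the log odds ratio is approximately Normal with standard error
    sqrt(1/a + 1/b + 1/c + 1/d), where a, b, c, d are the four cell counts.
    """
    a, n_treated = treated          # deaths on vitamin A
    c, n_control = control          # deaths among controls
    b = n_treated - a               # survivors on vitamin A
    d = n_control - c               # survivors among controls
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


results = {study: odds_ratio_ci(*groups) for study, groups in studies.items()}

for study, (odds_ratio, low, high) in results.items():
    print(f"Study {study}: {odds_ratio:.2f} ({low:.2f} to {high:.2f})")
```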
The common odds ratio can be found in several ways. To use logistic regression, we regress the event of death on vitamin A treatment and study. I shall treat the treatment as a dichotomous variable, set to 1 if treated with vitamin A, 0 if control. Study is a categorical variable, so we create dummy variables study1 to study4, which are set to one for studies 1 to 4 respectively, and to zero otherwise. We test the interaction by creating another set of variables, the products of study1 to study4 and vitamin A. Logistic regression of death on vitamin A, study and interaction gives a chi-squared statistic for the model of 496.99 with 9 degrees of freedom, which is highly significant. Logistic regression without the interaction terms gives 490.33 with 5 degrees of freedom. The difference is 496.99 - 490.33 = 6.66 with 9 - 5 = 4 degrees of freedom, which has P = 0.15, so we can drop the interaction from the model. The adjusted odds ratio for vitamin A is 0.70, 95% confidence interval 0.62 to 0.79, P < 0.0001.
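Two of these steps can be sketched without fitting the logistic regression itself. The first function gives the P value for the interaction test from the difference in chi-squared statistics, using the closed form of the chi-squared upper tail for 4 degrees of freedom. The second computes a Mantel-Haenszel common odds ratio from Table 17.18, which is a different method from the logistic regression used in the text but would be expected to give a similar answer. All function names are assumptions made for this example.

```python
import math

# (deaths, number) for the vitamin A group and for controls, from Table 17.18.
studies = [
    ((101, 12991), (130, 12209)),
    ((39, 7076), (41, 7006)),
    ((37, 7764), (80, 7755)),
    ((152, 12541), (210, 12264)),
    ((138, 3786), (167, 3411)),
]


def chi_squared_p_4df(x):
    """Upper tail probability of chi-squared with 4 d.f.: exp(-x/2)(1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)


def mantel_haenszel_odds_ratio(tables):
    """Common odds ratio: sum(a*d/N) / sum(b*c/N) over the 2 by 2 tables."""
    numerator = 0.0
    denominator = 0.0
    for (a, n_treated), (c, n_control) in tables:
        b = n_treated - a           # survivors on vitamin A
        d = n_control - c           # survivors among controls
        total = n_treated + n_control
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator


# Interaction test: difference between the two model chi-squared statistics.
p_interaction = chi_squared_p_4df(496.99 - 490.33)
pooled = mantel_haenszel_odds_ratio(studies)

print(f"interaction P = {p_interaction:.2f}")
print(f"Mantel-Haenszel odds ratio = {pooled:.2f}")
```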
Fig. 17.7. Meta-analysis of five vitamin A trials (data of Glasziou and Mackerras 1993). The vertical lines are the confidence intervals.
The odds ratios and their confidence intervals are shown in Figure 17.7. The confidence interval is indicated by a line, the point estimate of the odds ratio by a circle. In this picture the most important trial appears to be study 2, with the widest confidence interval. In fact, it is the study with the least effect on the whole estimate, because it is the study where the odds ratio is least well estimated. In the second picture, the odds ratio is indicated by the middle of a square. The area of the square is proportional to the number of subjects in the study. This now makes study 2 appear relatively unimportant, and makes the overall estimate stand out.
There are many variants on this style of graph, which is sometimes called a forest diagram. The graph is often shown with the studies on the vertical axis and the odds ratio or difference in mean on the horizontal axis (Figure 17.8). The combined estimate of the effect may be shown as a lozenge or diamond shape and for odds ratios a logarithmic scale is often employed, as in Figure 17.8.
Fig. 17.8. Meta-analysis of five vitamin A trials, vertical version
17.12* Other multifactorial methods
The choice of multiple, logistic or Cox regression is determined by the nature of the outcome variable: continuous, dichotomous, or survival times respectively. There are other types of outcome variable and corresponding multifactorial techniques. I shall not go into any details, but this list may help should you come across any of them. I would recommend you consult a statistician should you actually need to use one of these methods. The techniques for dealing with predictor variables described in §17.2–17.4 and §17.6 apply to all of them.
If the outcome variable is categorical with more than two categories, e.g. several diagnostic groups, we use a procedure called multinomial logistic regression. This estimates for a subject with given values of the predictor variable the probability that the subject will be in each category. If the categories are ordered, e.g. tumour stage, we can take the ordering into account using ordered logistic regression. Both these techniques are closely related to logistic regression (§17.8).
If the outcome is a count, such as hospital admissions in a day or deaths related to a specific cause per week or month, we can use Poisson regression. This is particularly useful when we have many time intervals but the number of events per interval is small, so that the assumptions of multiple regression (§17.5) do not apply.
A slightly different problem arises with multi-way contingency tables where there is no obvious outcome variable. We can use a technique called log linear modelling. This enables us to test the relationship between any two of the variables in the table holding the others constant.
17M* Multiple choice questions 93 to 97
(Each answer is true or false)
93. In multiple regression, R²:
(a) is the square of the multiple correlation coefficient;
(b) would be unchanged if we exchanged the outcome (dependent) variable and one of the predictor (independent) variables;
(c) is called the proportion of variability explained by the regression;
(d) is the ratio of the error sum of squares to the total sum of squares;
(e) would increase if more predictor variables were added to the model.
Table 17.20. Analysis of variance for the effects of age, sex and ethnic group (Afro-Caribbean versus White) on inter-pupil distance (Imafedon, personal communication)
Source of variation  Degrees of freedom  Sum of squares  Mean square  Variance ratio (F)  Probability
Total  37  603.586
Age group  2  124.587  62.293  6.81  0.003
Sex  1  1.072  1.072  0.12  0.7
Ethnic group  1  134.783  134.783  14.74  0.0005
Residual  33  301.782  9.145
94. The analysis of variance table for a study of the distance between the pupils of the eyes is shown in Table 17.20:
(a) there were 34 observations;
(b) there is good evidence of an ethnic group difference in the population;
(c) we can conclude that there is no difference in inter-pupil distance between men and women;
(d) there were two age groups;
(e) the difference between ethnic groups is likely to be due to a relationship between ethnicity and age in the sample.
Table 17.21. Logistic regression of graft failure after 6 months (Thomas et al. 1993)
Variable  Coef.  Std. Err.  z = coef/se  P  95% Conf.
White cell count  1.238  0.273  4.539  <0.001  0.695
Graft type 1  0.175  0.876  0.200  0.842  -1.570
Graft type 2  0.973  1.030  0.944  0.348  -1.080
Graft type 3  0.038  1.518  0.025  0.980  -2.986
Female  -0.289  0.767  -0.377  0.708  -1.816
Age  0.022  0.035  0.633  0.528  -0.048
Smoker  0.998  0.754  1.323  0.190  -0.504
Diabetic  1.023  0.709  1.443  0.153  -0.389
Constant  -13.726  3.836  -3.578  0.001  -21.369
Number of observations = 84, chi-squared = 38.05, d.f. = 8, P < 0.0001.
95. Table 17.21 shows the logistic regression of vein graft failure on some potential explanatory variables. From this analysis:
(a) patients with high white cell counts were more likely to have graft failure;
(b) the log odds of graft failure for a diabetic is between 0.389 less and 2.435 greater than that for a non-diabetic;
(c) grafts were more likely to fail in female subjects, though this is not significant;
(d) there were four types of graft;
(e) any relationship between white cell count and graft failure may be due to smokers having higher white cell counts.
Fig. 17.9. Oral and forehead temperature measurements made in a group of pyrexic patients
96. For the data in Figure 17.9:
(a) the relationship could be investigated by linear regression;
(b) an ‘oral squared’ term could be used to test whether there is any evidence that the relationship is not a straight line;
(c) if an ‘oral squared’ term were included there would be 2 degrees of freedom for the model;
(d) the coefficients of an ‘oral’ and an ‘oral squared’ term would be uncorrelated;
(e) the estimation of the coefficient of a quadratic term would be improved by subtracting the mean from the oral temperature before squaring.
Table 17.22. Cox regression of time to readmission for asthmatic children following discharge from hospital (Mitchell et al. 1994)
Variable  Coef.  Std. err.  coef/se  P
Boy  -0.197  0.088  -2.234  0.026
Age  -0.126  0.017  -7.229  <0.001
Previous admissions (square root)  0.395  0.034  11.695  <0.001
Inpatient i.v. therapy  0.267  0.093  2.876  0.004
Inpatient theophylline  -0.728  0.295  -2.467  0.014
Number of observations = 1024, χ² = 167.15, 5 d.f., P < 0.0001.
97. Table 17.22 shows the results of an observational study following up asthmatic children discharged from hospital. From this table:
(a) the analysis could only have been done if all children had been readmitted to hospital;
(b) the proportional hazards model would have been better than Cox regression;
(c) boys have a shorter average time before readmission than do girls;
(d) the use of theophylline prevents readmission to hospital;
(e) children with several previous admissions have an increased risk of readmission.
Fig. 17.10. Cushion volume against number of pairs of somites for two groups of mouse embryos (Webb and Brown, personal communication)
Table 17.23. Number of somites and cushion volume in mouse embryos
Normal                          Trisomy-16
som.  c.vol.  som.  c.vol.      som.  c.vol.  som.
17    2.674   28    3.704       15    0.919   28
20    3.299   31    6.358       17    2.047   28
21    2.486   32    3.966       18    3.302   28
23    1.202   32    7.184       20    4.667   31
23    4.263   34    8.803       20    4.930   32
23    4.620   35    4.373       23    4.942   34
25    4.644   40    4.465       23    6.500   35
25    4.403   42    10.940      23    7.122   36
27    5.417   43    6.035       25    7.688   40
27    4.395                     25    4.230   42
27    8.647
17E* Exercise: A multiple regression analysis
Trisomy-16 mice can be used as an animal model for Down's syndrome. This analysis looks at the volume of a region of the heart, the atrioventricular cushion, of a mouse embryo, compared between trisomic and normal embryos. The embryos were at varying stages of development, indicated by the number of pairs of somites (precursors of vertebrae). Figure 17.10 and Table 17.23 show the data. The group was coded 1 = normal, 2 = trisomy-16. Table 17.24 shows the results of a regression analysis and Figure 17.11 shows residual plots.
1. Is there any evidence of a difference in volume between groups for given stage of development?
2. Figure 17.11 shows residual plots for the analysis of Table 17.24. Are there any features of the data which might make the analysis invalid?
Table 17.24. Regression of cushion volume on number of pairs of somites and group in mouse embryos
Source of variation  Degrees of freedom  Sum of squares  Mean square  Variance ratio (F)  Probability
Total  39  328.976
Due to regression  2  197.708  98.854  27.86  P < 0.0001
Residual (about regression)  37  131.268  3.548
Variable  Coef.  Std. Err.  t  P  95% Conf. interval
group  2.44  0.60  4.06  <0.001  1.29 to 3.65
somites  0.27  0.04  6.70  <0.001  0.19 to 0.36
Fig. 17.11. Residual against number of pairs of somites and Normal plot of residuals for the analysis of Table 17.24
3. It appears from Figure 17.10 that the relationship between volume and number of pairs of somites may not be the same in the two groups. Table 17.25 shows the analysis of variance for regression analysis including an interaction term. Calculate the F-ratio to test the evidence that the relationship is different in normal and trisomy-16 embryos. You can find the probability from Table 10.1, using the fact that the square root of F with 1 and n degrees of freedom is t with n degrees of freedom.
Table 17.25. Analysis of variance for regression with number of pairs of somites × group interaction
Source of variation  Degrees of freedom  Sum of squares  Mean square  Variance ratio (F)  Probability
Total  39  328.976
Due to regression  3  207.139  69.046  20.40  P < 0.0001
Residual (about regression)  36  121.837  3.384
18
Determination of sample size
18.1* Estimation of a population mean
One of the questions most frequently asked of a medical statistician is ‘How large a sample should I take?’ In this chapter we shall see how statistical methods for deciding sample sizes can be used in practice as an aid in designing investigations. The methods we shall use are large sample methods, that is, they assume that large sample methods will be used in the analysis and so take no account of degrees of freedom.
We can use the concepts of standard error and confidence interval to help decide how many subjects should be included in a sample. If we want to estimate some population quantity, such as the mean, and we know how the standard error is related to the sample size, then we can calculate the sample size required to give a confidence interval with the desired width. The difficulty is that the standard error may also depend either on the quantity we wish to estimate, or on some other property of the population, such as the standard deviation. We must estimate these quantities from data already available, or carry out a pilot study to obtain a rough estimate. The calculation of sample size can only be approximate anyway, so the estimates used to do it need not be precise.
If we want to estimate the mean of a population, we can use the formula for the standard error of a mean, s/√n, to estimate the sample size required. For example, suppose we wish to estimate the mean FEV1 in a population of young men. We know that in another study FEV1 had standard deviation s = 0.67 litre (§4.8). We therefore expect the standard error of the mean to be 0.67/√n. We can set the size of the standard error we want and choose the sample size to achieve this. We might decide that a standard error of 0.1 litre is what we want, so that we would estimate the mean to within 1.96 × 0.1 = 0.2 litre. Then: SE = 0.67/√n, n = 0.67²/SE² = 0.67²/0.1² = 45. We can also see what the standard error and width of the 95% confidence interval would be for different values of n:
n  10  20  50  100  200  500
standard error  0.212  0.150  0.095  0.067  0.047  0.030
95% confidence interval  ±0.42  ±0.29  ±0.19  ±0.13  ±0.09  ±0.06
Thus if we had a sample size of 200, we would expect the 95% confidence interval to be 0.09 litre on either side of the sample mean (1.96 standard errors), whereas with a sample of 50 the 95% confidence interval would be 0.19 litre on either side of the mean.
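The FEV1 calculation and the small table of standard errors can be reproduced with the following sketch. It assumes only the standard deviation of 0.67 litre quoted in the text; the function names are invented for the example.

```python
import math

SD = 0.67  # standard deviation of FEV1 in litres, from the earlier study


def standard_error(sd, n):
    """Standard error of a mean, s / sqrt(n)."""
    return sd / math.sqrt(n)


def sample_size_for_se(sd, se):
    """Sample size giving the chosen standard error: n = s^2 / SE^2."""
    return sd ** 2 / se ** 2


# Sample size for a standard error of 0.1 litre, rounded up to a whole subject.
n_required = math.ceil(sample_size_for_se(SD, 0.1))

# Standard error and 95% confidence interval half width for several sample sizes.
sizes = [10, 20, 50, 100, 200, 500]
errors = [round(standard_error(SD, n), 3) for n in sizes]
half_widths = [round(1.96 * standard_error(SD, n), 2) for n in sizes]

print("n required:", n_required)
for n, se, hw in zip(sizes, errors, half_widths):
    print(f"n = {n}: standard error {se}, 95% CI +/- {hw}")
```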
18.2* Estimation of a population proportion
When we wish to estimate a proportion we have a further problem. The standard error depends on the very quantity which we wish to estimate. We must guess the proportion first. For example, suppose we wish to estimate the prevalence of a disease, which we suspect to be about 2%, to within 5%, i.e. to the nearest 1 per 1000. The unknown proportion, p, is guessed to be 0.02 and we want the 95% confidence interval to be 0.001 on either side, so the standard error must be half this, 0.0005.
0.0005 = √(p(1 - p)/n) = √(0.02 × 0.98/n), so n = 0.02 × 0.98/0.0005² = 78400
The accurate estimation of very small proportions requires very large samples. This is a rather extreme example and we do not usually need to estimate proportions with such accuracy. A wider confidence interval, obtainable with a smaller sample, is usually acceptable. We can also ask ‘If we can only afford a sample size of 1000, what will be the standard error?’
SE = √(0.02 × 0.98/1000) = 0.0044
The 95% confidence limits would be, roughly, p ± 0.009. For example, if the estimate were 0.02, the 95% confidence limits would be 0.011 to 0.029. If this accuracy were sufficient we could proceed.
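Both directions of this calculation, from a target standard error to a sample size and from an affordable sample size to a standard error, can be sketched as follows. The sketch assumes the usual large sample standard error of a proportion, √(p(1 - p)/n), with the guessed value p = 0.02; the function names are illustrative.

```python
import math

P_GUESS = 0.02  # guessed prevalence


def sample_size_for_se(p, se):
    """Sample size giving the chosen standard error: n = p(1 - p) / SE^2."""
    return p * (1 - p) / se ** 2


def standard_error(p, n):
    """Large sample standard error of a proportion, sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)


# Standard error 0.0005, so that the 95% limits are about 0.001 either side.
n_needed = sample_size_for_se(P_GUESS, 0.0005)

# What a sample of 1000 would give instead.
se_1000 = standard_error(P_GUESS, 1000)
half_width = 1.96 * se_1000

print(f"n needed: {n_needed:.0f}")
print(f"n = 1000: standard error {se_1000:.4f}, 95% limits p +/- {half_width:.3f}")
```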
These estimates of sample size are based on the assumption that the sample is large enough to use the Normal distribution. If a very small sample is indicated it will be inadequate and other methods must be used which are beyond the scope of this book.
18.3*SamplesizeforsignificancetestsWeoftenwanttodemonstratetheexistenceofadifferenceorrelationshipaswellaswantingtoestimateitsmagnitude,asinaclinicaltrial,forexample.Webasethesesamplesizecalculationsonsignificancetests,usingthepowerofatest(§9.9)tohelpchoosethesamplesizerequiredtodetectadifferenceifitexists.Thepowerofatestisrelatedtothepostulateddifferenceinthepopulation,thestandarderrorofthesampledifference(whichinturndependsonthesamplesize),andthesignificancelevel,whichweusuallytaketobeα=0.05.Thesequantitiesarelinkedbyanequationwhichenablesustodetermineanyoneofthemgiventheothers.Wecanthensaywhatsamplesizewouldberequiredtodetectanygivendifference.Wethendecidewhatdifferenceweneedtobeable
todetect.Thismightbeadifferencewhichwouldhaveclinicalimportanceofadifferencewhichwethinkthetreatmentmayproduce.
Suppose we have a sample which gives an estimate $d$ of the population difference $\mu_d$. We assume $d$ comes from a Normal distribution with mean $\mu_d$ and has standard error $SE(d)$. Here $d$ might be the difference between two means, two proportions, or anything else we can calculate from data. We are interested in testing the null hypothesis that there is no difference in the population, i.e. $\mu_d = 0$. We are going to use a significance test at the α level, and want the power, the probability of detecting a significant difference, to be P.
I shall define $u_\alpha$ to be the value such that the Standard Normal distribution (mean 0 and variance 1) is less than $-u_\alpha$ or greater than $u_\alpha$ with probability α. For example, $u_{0.05} = 1.96$. The probability of lying between $-u_\alpha$ and $u_\alpha$ is 1 − α. Thus $u_\alpha$ is the two sided α probability point of the Standard Normal distribution, as shown in Table 7.2.
If the null hypothesis were true, the test statistic $d/SE(d)$ would be from a Standard Normal distribution. We reject the null hypothesis at the α level if the test statistic is greater than $u_\alpha$ or less than $-u_\alpha$, 1.96 for the usual 5% significance level. For significance we must have:
$$\frac{d}{SE(d)} < -u_\alpha \quad \text{or} \quad \frac{d}{SE(d)} > u_\alpha$$
Let us assume that we are trying to detect a difference such that $d$ will be greater than 0. The first alternative is then extremely unlikely and can be ignored. Thus we must have, for a significant difference: $d/SE(d) > u_\alpha$, so $d > u_\alpha SE(d)$. The critical value which $d$ must exceed is $u_\alpha SE(d)$.
Now, $d$ is a random variable, and for some samples it will be greater than its mean, $\mu_d$, for some it will be less than its mean. $d$ is an observation from a Normal distribution with mean $\mu_d$ and variance $SE(d)^2$. We want $d$ to exceed the critical value with probability P, the chosen power of the test. The value of the Standard Normal distribution which is exceeded with probability P is $-u_{2(1-P)}$ (see Figure 18.1). (1 − P) is often represented as β (beta). This is the probability of failing to obtain a significant difference when the null hypothesis is false and the population difference is $\mu_d$. It is the probability of a Type II error (§9.4). The value which $d$ exceeds with probability P is the mean minus $u_{2(1-P)}$ standard deviations: $\mu_d - u_{2(1-P)} SE(d)$. Hence for significance this must exceed the critical value, $u_\alpha SE(d)$. This gives
$$\mu_d - u_{2(1-P)} SE(d) = u_\alpha SE(d)$$
Putting the correct standard error formula into this will yield the required sample size. We can rearrange it as
$$\mu_d^2 = (u_\alpha + u_{2(1-P)})^2 SE(d)^2$$
This is the condition which must be met if we are to have a probability P of detecting a significant difference at the α level. We shall use the expression $(u_\alpha + u_{2(1-P)})^2$ a lot, so for convenience I shall denote it by f(α,P). Table 18.1 shows the values of the factor f(α,P) for different values of α and P. The usual value used for α is 0.05, and P is usually 0.80, 0.90, or 0.95.
Fig. 18.1. Relationship between P and $u_{2(1-P)}$
Table 18.1. Values of $f(\alpha,P) = (u_\alpha + u_{2(1-P)})^2$ for different P and α
Power, P    Significance level, α
            0.05    0.01
0.50        3.8     6.6
0.70        6.2     9.6
0.80        7.9     11.7
0.90        10.5    14.9
0.95        13.0    17.8
0.99        18.4    24.0
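The factor f(α,P) can be computed directly from Normal quantiles, which is convenient for combinations not tabulated. The sketch below assumes Python's standard-library `statistics.NormalDist`; the function name `f_factor` is hypothetical. Here $u_\alpha$ is the point with α/2 in each tail, and $u_{2(1-P)}$ is the point exceeded with probability 1 − P.

```python
from statistics import NormalDist

def f_factor(alpha, power):
    """f(alpha, P) = (u_alpha + u_{2(1-P)})^2 for a two-sided test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    u_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided alpha point, 1.96 for 0.05
    u_power = std_normal.inv_cdf(power)          # u_{2(1-P)}, exceeded with probability 1 - P
    return (u_alpha + u_power) ** 2

for power in (0.50, 0.70, 0.90, 0.95, 0.99):
    print(power, round(f_factor(0.05, power), 1), round(f_factor(0.01, power), 1))
```

Computed values should agree with the table to within rounding; a printed entry may differ in the last digit when it was obtained from already-rounded quantiles.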
Sometimes we do not expect the new treatment to be better than the standard treatment, but hope that it will be as good. We want to test treatments which may be as good as the existing treatment because the new treatment may be cheaper, have fewer side effects, be less invasive, or under our patent. We cannot use the power method based on the difference we want to be able to detect, because we are not looking for a difference. What we do is specify how different the treatments might be in the population and still be regarded as equivalent, and design our study to detect such a difference. This can get rather complicated and specialised, so I shall leave the details to Machin et al. (1998).
18.4* Comparison of two means
When we are comparing the means of two samples, sample sizes $n_1$ and $n_2$, from populations with means $\mu_1$ and $\mu_2$, with the variance of the measurements being $\sigma^2$, we have $\mu_d = \mu_1 - \mu_2$ and
$$SE(d) = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
so the equation becomes:
$$(\mu_1 - \mu_2)^2 = f(\alpha,P)\,\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$
For example, suppose we want to compare biceps skinfold in patients with Crohn's disease and coeliac disease, following up the inconclusive comparison of biceps skinfold in Table 10.4 with a larger study. We shall need an estimate of the variability of biceps skinfold in the population we are considering. We can usually get this from the medical literature, or as here from our own data. If not we must do a pilot study, a small preliminary investigation to collect some data and calculate the standard deviation. For the data of Table 10.4, the within-groups standard deviation is 2.3 mm. We must decide what difference we want to detect. In practice this may be difficult. In my small study the mean skinfold thickness in the Crohn's patients was 1 mm greater than in my coeliac patients. I will design my larger study to detect a difference of 0.5 mm. I shall take the usual significance level of 0.05. I want a fairly high power, so that there is a high probability of detecting a difference of the chosen size should it exist. I shall take 0.90, which gives f(α,P) = 10.5 from Table 18.1. The equation becomes:
$$0.5^2 = 10.5 \times 2.3^2 \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$
We have one equation with two unknowns, so we must decide on the relationship between $n_1$ and $n_2$. I shall try to recruit equal numbers in the two groups:
$$0.5^2 = 10.5 \times 2.3^2 \times \frac{2}{n}, \qquad n = \frac{10.5 \times 2.3^2 \times 2}{0.5^2} = 444.36$$
and I need 444 subjects in each group.
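The equal-groups calculation is simple enough to check by machine. The following is a minimal sketch, assuming equal group sizes and the tabulated value f(α,P) = 10.5; the function name is hypothetical.

```python
def n_per_group_two_means(difference, sd, f=10.5):
    """Subjects per group needed to detect `difference` between two means.

    Solves difference^2 = f * sd^2 * (2 / n) for n, assuming n1 = n2 = n.
    The default f = 10.5 corresponds to alpha = 0.05 and power = 0.90.
    """
    return 2 * f * sd ** 2 / difference ** 2

# Biceps skinfold example: detect 0.5 mm when the standard deviation is 2.3 mm
n_skinfold = n_per_group_two_means(difference=0.5, sd=2.3)
print(round(n_skinfold, 2))
```

Because n varies with the inverse square of the difference, halving the difference to be detected quadruples the sample required.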
It may be that we do not know exactly what size of difference we are interested in. A useful approach is to look at the size of the difference we could detect using different sample sizes, as in Table 18.2. This is done by putting different values of n in the sample size equation.
Table 18.2. Difference in mean biceps skinfold thickness (mm) detected at the 5% significance level with power 90% for different sample sizes, equal groups
Size of each group, n    Difference detected with probability 0.90
10                       3.33
20                       2.36
50                       1.49
100                      1.05
200                      0.75
500                      0.47
1000                     0.33
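Table 18.2 comes from solving the same equation for the difference rather than for n. A short sketch, assuming equal groups, a standard deviation of 2.3 mm and f(α,P) = 10.5 as in the example (the names are hypothetical):

```python
import math

def detectable_difference(n, sd, f=10.5):
    """Difference between two means detectable with n subjects per group."""
    return math.sqrt(2 * f * sd ** 2 / n)

group_sizes = (10, 20, 50, 100, 200, 500, 1000)
detectable = {n: round(detectable_difference(n, sd=2.3), 2) for n in group_sizes}

for n, diff in detectable.items():
    print(n, diff)
```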
Table 18.3. Sample size required in each group to detect a difference between two means at the 5% significance level with power 90%, using equally sized samples
Difference in                Difference in                Difference in
standard deviations  n       standard deviations  n       standard deviations  n
0.01                 210000  0.1                  2100    0.6                  58
0.02                 52500   0.2                  525     0.7                  43
0.03                 23333   0.3                  233     0.8                  33
0.04                 13125   0.4                  131     0.9                  26
0.05                 8400    0.5                  84      1.0                  21
If we measure the difference in terms of standard deviations, we can make a general table. Table 18.3 gives the sample size required to detect differences between two equally sized groups. Altman (1982) gives a neat graphical method of calculation.
We do not need to have $n_1 = n_2 = n$. We can calculate $\mu_1 - \mu_2$ for different combinations of $n_1$ and $n_2$. The size of difference, in terms of standard deviations, which would be detected is given in Table 18.4. We can see from this that what matters is the size of the smaller sample. For example, if we have 10 in group 1 and 20 in group 2, we do not gain very much by increasing the size of group 2: increasing group 2 from 20 to 100 produces less advantage than increasing group 1 from 10 to 20. In this case the optimum is clearly to have samples of equal size.
Table 18.4. Difference (in standard deviations) detectable at the 5% significance level with power 90% for different sample sizes, unequal groups
n2      n1
        10      20      50      100     200     500     1000
10      1.45    1.25    1.13    1.08    1.05    1.03    1.03
20      1.25    1.03    0.85    0.80    0.75    0.75    0.73
50      1.13    0.85    0.65    0.55    0.50    0.48    0.48
100     1.08    0.80    0.55    0.45    0.40    0.35    0.35
200     1.05    0.75    0.50    0.40    0.33    0.28    0.25
500     1.03    0.75    0.48    0.35    0.28    0.20    0.18
1000    1.03    0.73    0.48    0.35    0.25    0.18    0.15
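The point that the smaller sample dominates can be seen by evaluating the general equation with unequal groups. This sketch expresses the difference in standard deviation units (σ = 1) and assumes f(α,P) = 10.5; the function name is hypothetical, and the printed table appears to be rounded more coarsely than two decimal places, so small discrepancies in the last digit are to be expected.

```python
import math

def detectable_difference_sd(n1, n2, f=10.5):
    """Detectable difference, in standard deviations, for group sizes n1 and n2."""
    return math.sqrt(f * (1 / n1 + 1 / n2))

start = detectable_difference_sd(10, 20)          # 10 in group 1, 20 in group 2
bigger_group2 = detectable_difference_sd(10, 100)  # increase group 2 from 20 to 100
bigger_group1 = detectable_difference_sd(20, 20)   # increase group 1 from 10 to 20

print(round(start, 2), round(bigger_group2, 2), round(bigger_group1, 2))
```

Adding 80 subjects to the larger group improves the detectable difference less than adding 10 subjects to the smaller group.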
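18.5* Comparison of two proportions
Using the same approach, we can also calculate the sample sizes for comparing two proportions. If we have two samples with sizes $n_1$ and $n_2$ from Binomial populations with proportions $p_1$ and $p_2$, the difference is $\mu_d = p_1 - p_2$ and the standard error of the difference between the sample proportions (§8.6) is:
$$SE(d) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
If we put these into the previous formula we have:
$$(p_1 - p_2)^2 = f(\alpha,P)\left(\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)$$
The size of the proportions, $p_1$ and $p_2$, is important, as well as their difference. (The significance test implied here is similar to the chi-squared test for a 2 by 2 table.) When the sample sizes are equal, i.e. $n_1 = n_2 = n$, we have
$$n = \frac{f(\alpha,P)\,\bigl(p_1(1-p_1) + p_2(1-p_2)\bigr)}{(p_1 - p_2)^2}$$
There are several slight variations on this formula. Different computer programs may therefore give slightly different sample size estimates.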
Suppose we wish to compare the survival rate with a new treatment with that with an old treatment, where it is about 60%. What values of $n_1$ and $n_2$ will have 90% chance of giving a significant difference at the 5% level for different values of $p_2$? For P = 0.90 and α = 0.05, f(α,P) = 10.5. Suppose we wish to detect an increase in the survival rate on the new treatment to 80%, so $p_2 = 0.80$, and $p_1 = 0.60$.
$$n = \frac{10.5 \times (0.80 \times 0.20 + 0.60 \times 0.40)}{(0.80 - 0.60)^2} = 105$$
Table 18.5. Sample size in each group required to detect different proportions p2 when p1 = 0.6 at the 5% significance level with power 90%, equal groups
p2      n
0.90    39
0.80    105
0.70    473
0.65    1964
Table 18.6. n2 for different n1 and p2 when p1 = 0.05 at the 5% significance level with power 90%
p2      n1
        50      100     200     500     1000    2000    5000
0.06    .       .       .       .       .       .       237000
0.07    .       .       .       .       .       4500    2300
0.08    .       .       .       .       1900    1200    970
0.10    .       .       1500    630     472     420     390
0.15    5400    270     180     150     140     140     140
0.20    134     96      84      78      76      76      75
We would require 105 in each group to have a 90% chance of showing a significant difference if the population proportions were 0.6 and 0.8.
When we do not have a clear idea of the value of $p_2$ in which we are interested, we can calculate the sample size required for several proportions, as in Table 18.5. It is immediately apparent that to detect small differences between proportions we need very large samples.
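The equal-groups calculation for two proportions can be repeated for any candidate value of p2. A minimal sketch, assuming f(α,P) = 10.5 and a hypothetical function name; results are rounded up to whole subjects.

```python
import math

def n_per_group_two_proportions(p1, p2, f=10.5):
    """Subjects per group needed to detect p1 versus p2 with equal group sizes."""
    return f * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

# Survival 60% on the old treatment, 80% hoped for on the new
n_survival = n_per_group_two_proportions(0.60, 0.80)

# Smaller or larger improvements over 60%
for p2 in (0.90, 0.70, 0.65):
    print(p2, math.ceil(n_per_group_two_proportions(0.60, p2)))
```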
The case where samples are of equal size is usual in experimental studies, but not in observational studies. Suppose we wish to compare the prevalence of a certain condition in two populations. We expect that in one population it will be 5% and that it may be more common in the second. We can rearrange the equation:
$$n_2 = \frac{f(\alpha,P)\,p_2(1-p_2)}{(p_1 - p_2)^2 - f(\alpha,P)\,p_1(1-p_1)/n_1}$$
Table 18.6 shows $n_2$ for different $n_1$ and $p_2$. For some values of $n_1$ we get a negative value of $n_2$. This means that no value of $n_2$ is large enough. It is clear that when the proportions themselves are small, the detection of small differences requires very large samples indeed.
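The rearranged equation, including the case where no second sample is large enough, might be coded as follows. This is a sketch with a hypothetical function name, assuming f(α,P) = 10.5; it returns `None` when the denominator is zero or negative, which corresponds to the cells marked with a dot in the table.

```python
def n2_required(p1, p2, n1, f=10.5):
    """Size of the second sample needed, given the first sample size n1.

    Returns None when no value of n2 is large enough, i.e. when
    (p1 - p2)^2 does not exceed f * p1 * (1 - p1) / n1.
    """
    denominator = (p1 - p2) ** 2 - f * p1 * (1 - p1) / n1
    if denominator <= 0:
        return None
    return f * p2 * (1 - p2) / denominator

print(n2_required(0.05, 0.20, 50))    # small first sample, large difference
print(n2_required(0.05, 0.10, 1000))  # large first sample, smaller difference
print(n2_required(0.05, 0.06, 50))    # no n2 is large enough
```

When the denominator is only just positive the result is very large and highly sensitive to the precision used for f(α,P), so such values are best read as orders of magnitude.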
18.6* Detecting a correlation
Investigations are often set up to look for a relationship between two continuous variables. It is convenient to treat this as an estimation of or test of a correlation coefficient. The correlation coefficient has an awkward distribution, which tends only very slowly to the Normal, even when both variables themselves follow a Normal distribution. We can use Fisher's z transformation:
$$z = \frac{1}{2}\log_e\left(\frac{1+r}{1-r}\right)$$
which follows a Normal distribution with mean
$$z_\rho = \frac{1}{2}\log_e\left(\frac{1+\rho}{1-\rho}\right) + \frac{\rho}{2(n-1)}$$
and variance 1/(n − 3) approximately, where ρ is the population correlation coefficient and n is the sample size (§11.10). For sample size calculations we can approximate $z_\rho$ by
$$z_\rho = \frac{1}{2}\log_e\left(\frac{1+\rho}{1-\rho}\right)$$
Thus we have
$$\left(\frac{1}{2}\log_e\left(\frac{1+\rho}{1-\rho}\right)\right)^2 = f(\alpha,P)\,\frac{1}{n-3}$$
and we can estimate n, ρ or P given the other two. Table 18.7 shows the sample size required to detect a correlation coefficient with a power of P = 0.9 and a significance level α = 0.05.
Table 18.7. Approximate sample size required to detect a correlation at the 5% significance level with power 90%
ρ       n         ρ      n       ρ      n
0.01    100000    0.1    1000    0.6    25
0.02    26000     0.2    260     0.7    17
0.03    12000     0.3    110     0.8    12
0.04    6600      0.4    62      0.9    8
0.05    4200      0.5    38
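Solving the approximate equation for n gives n = f(α,P)/z² + 3, where z is Fisher's transformation of ρ. The sketch below assumes f(α,P) = 10.5 and uses a hypothetical function name; results are rounded up to whole subjects, whereas the larger entries in the table are given only to two significant figures.

```python
import math

def n_for_correlation(rho, f=10.5):
    """Approximate sample size needed to detect a population correlation rho."""
    z_rho = 0.5 * math.log((1 + rho) / (1 - rho))  # Fisher's z transformation
    return f / z_rho ** 2 + 3

for rho in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(rho, math.ceil(n_for_correlation(rho)))
```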
18.7* Accuracy of the estimated sample size
In this chapter I have assumed that samples are sufficiently large for sampling distributions to be approximately Normal and for estimates of variance to be good estimates. With very small samples this may not be the case. Various more accurate methods exist, but any sample size calculation is approximate and except for very small samples, say less than 10, the methods described above should be adequate. When the sample is very small, we might need to replace the significance test component of f(α,P) by the corresponding number from the t distribution.
These methods depend on assumptions about the size of difference sought and the variability of the observations. It may be that the population to be studied does not have exactly the same characteristics as those from which the standard deviation or proportions were estimated. The likely effects of changes in these can be examined by putting different values of them in the formula. However, there is always an element of venturing into the unknown when embarking on a study and we can never be sure that the sample and population will be as we expect. The determination of sample size as described above is thus only a guide, and it is probably as well always to err on the side of a larger sample when coming to a final decision.
The choice of power is arbitrary, in that there is no optimum choice of power for a study. I usually recommend 90%, but 80% is often quoted. This gives smaller estimated sample sizes, but, of course, a greater chance of failing to detect effects.
For a fuller treatment of sample size estimation and fuller tables see Machin et al. (1998) and Lemeshow et al. (1990).
18.8* Trials randomized in clusters
When we randomize by cluster rather than individual (§2.11) we lose power compared to an individually-randomized trial of the same size. Hence to get the power we want, we must increase the sample size from that required for an individually randomized trial. The ratio of the number of patients required for a cluster trial to that for a simply randomized trial is called the design effect of the study. It depends on the number of subjects per cluster. For the purpose of sample size calculations we usually assume this is constant.
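If the outcome measurement is continuous, e.g. serum cholesterol, a simple method of analysis is based on the mean of the observations for all subjects in the cluster, and compares these means between the treatment groups (§10.13). We will denote the variance of observations within one cluster by $s_w^2$ and assume that this variance is the same for all clusters. If there are m subjects in each cluster then the variance of a single sample mean is $s_w^2/m$. The true cluster mean (unknown) will vary from cluster to cluster, with variance $s_c^2$ (see §10.12). The observed variance of the cluster means will be the sum of the variance between clusters and the variance within clusters, i.e. variance of outcome = $s_c^2 + s_w^2/m$. Hence the standard error for the difference between means is given by
$$SE(d) = \sqrt{\left(s_c^2 + \frac{s_w^2}{m}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $n_1$ and $n_2$ are the numbers of clusters in the two groups. For most trials $n_1 = n_2 = n$, so
$$SE(d) = \sqrt{\left(s_c^2 + \frac{s_w^2}{m}\right)\frac{2}{n}}$$
Hence, using the general method of §18.3, we can calculate the required number of clusters by
$$n = \frac{2 f(\alpha,P)\left(s_c^2 + s_w^2/m\right)}{(\mu_1 - \mu_2)^2}$$
When the outcome is a dichotomous, ‘yes or no’ variable, we replace $s_w^2$ by p(1 − p), where p is the probability of a ‘yes’.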
For example, in a proposed study of a behavioural intervention to lower cholesterol in general practice, practices were to be randomised into two groups, one to offer intensive dietary intervention by specially trained practice nurses using a behavioural approach and the other to usual general practice care. The outcome measure would be mean cholesterol levels in patients attending each practice one year later. Estimates of between practice variance and within practice variance were obtained from the MRC thrombosis prevention trial (Meade et al. 1992) and were $s_c^2 = 0.0046$ and $s_w^2 = 1.28$ respectively. The minimum difference considered to be clinically relevant was 0.1 mmol/l. If we recruit 50 patients per practice, we would have $s^2 = s_c^2 + s_w^2/m = 0.0046 + 1.28/50 = 0.0302$. If we choose power P = 0.90 and significance level α = 0.05, from Table 18.1 f(α,P) = 10.5. The number of practices required to detect a difference of 0.1 mmol/l is given by $n = 10.5 \times 0.0302 \times 2/0.1^2 = 63$ in each group. This would give us 63 × 50 = 3150 patients in each group. A completely randomized trial without clusters would have $s^2 = 0.0046 + 1.28 = 1.2846$ and we would need $n = 10.5 \times 1.2846 \times 2/0.1^2 = 2698$ patients per group. Thus the design effect of having clusters of 50 patients is 3150/2698 = 1.17.
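The cholesterol example can be reproduced step by step. This is a sketch only, using the variance estimates quoted above and f(α,P) = 10.5; the function and variable names are hypothetical.

```python
def clusters_per_group(difference, var_between, var_within, m, f=10.5):
    """Number of clusters per group when each cluster contributes m subjects."""
    variance_of_cluster_mean = var_between + var_within / m
    return 2 * f * variance_of_cluster_mean / difference ** 2

var_between, var_within = 0.0046, 1.28  # between- and within-practice variances
m = 50                                  # patients per practice

# Practices per group for a cluster-randomized trial
practices = round(clusters_per_group(0.1, var_between, var_within, m))
patients_cluster_trial = practices * m

# Patients per group if individuals were randomized instead
patients_individual_trial = round(2 * 10.5 * (var_between + var_within) / 0.1 ** 2)

design_effect = patients_cluster_trial / patients_individual_trial
print(practices, patients_cluster_trial, patients_individual_trial, round(design_effect, 2))
```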
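The equation for the design effect is
$$\text{Deff} = \frac{s_c^2 + s_w^2/m}{(s_c^2 + s_w^2)/m}$$
If we calculate an intra-class correlation coefficient (ICC) for these clusters (§11.13), we have
$$\text{ICC} = \frac{s_c^2}{s_c^2 + s_w^2}$$
In this context, the ICC is called the intra-cluster correlation coefficient. By a bit of algebra we get
$$\text{Deff} = 1 + (m - 1)\,\text{ICC}$$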
If there is only one observation per cluster, m = 1 and the design effect is 1.0 and the two designs are the same. Otherwise, the larger the ICC, i.e. the more important the variation between clusters is, the bigger the design effect and the more subjects we will need to get the same power as a simply-randomized study. Even a small ICC will have an impact if the cluster size is large. The X-ray guidelines study (§10.13) had ICC = 0.019. A study with the same ICC and m = 50 referrals per practice would have design effect D = 1 + (50 − 1) × 0.019 = 1.93. Thus it would require almost twice as many subjects as a trial where patients were randomized to treatment individually.
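The relationship between the ICC, the cluster size and the design effect is easily explored numerically. A minimal sketch with hypothetical function names; the second example reuses the cholesterol variances from earlier in this section.

```python
def intra_cluster_correlation(var_between, var_within):
    """ICC: the share of total variance that lies between clusters."""
    return var_between / (var_between + var_within)

def design_effect(m, icc):
    """Design effect for clusters of m subjects: 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc

# X-ray guidelines study: ICC = 0.019 with 50 referrals per practice
deff_xray = design_effect(50, 0.019)

# Cholesterol example: ICC derived from the variance estimates
icc_cholesterol = intra_cluster_correlation(0.0046, 1.28)
deff_cholesterol = design_effect(50, icc_cholesterol)

print(round(deff_xray, 2), round(icc_cholesterol, 4), round(deff_cholesterol, 2))
```

The design effect from the ICC matches the ratio of variances, and differs slightly from a ratio of rounded sample sizes only because of that rounding.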
The main difficulty in calculating sample size for cluster randomized studies is obtaining an estimate of the between cluster variation or ICC. Estimates of variation between individuals can often be obtained from the literature but even studies that use the cluster as the unit of analysis may not publish their results in such a way that the between practice variation can be estimated. Donner et al. (1990), recognizing this problem, recommended that authors publish the cluster-specific event rates observed in their trial. This would enable other workers to use this information to plan further studies.
In some trials, where the intervention is directed at the individual subjects and the number of subjects per cluster is small, we may judge that the design effect can be ignored. On the other hand, where the number of subjects per cluster is large, an estimate of the variability between clusters will be very important. When the number of clusters is very small, we may have to use small sample adjustments mentioned in §18.7.
18M* Multiple choice questions 98 to 100
(Each answer is true or false)
98.* The power of a two-sample t test:
(a) increases if the sample sizes are increased;
(b) depends on the difference between the population means which we wish to detect;
(c) depends on the difference between the sample means;
(d) is the probability that the test will detect a given population difference;
(e) cannot be zero.
99.* The sample size required for a study to compare two proportions:
(a) depends on the magnitude of the effect we wish to detect;
(b) depends on the significance level we wish to employ;
(c) depends on the power we wish to have;
(d) depends on the anticipated values of the proportions themselves;
(e) should be decided by adding subjects until the difference is significant.
100.* The sample size required for a study to estimate a mean:
(a) depends on the width of the confidence interval which we want;
(b) depends on the variability of the quantity being studied;
(c) depends on the power we wish to have;
(d) depends on the anticipated value of the mean;
(e) depends on the anticipated value of the standard deviation.
18E* Exercise: Estimation of sample sizes
1. What sample size would be required to estimate a 95% reference interval using the Normal distribution method, so that the 95% confidence interval for the reference limits were at most 20% of the reference interval size?
2. How big a sample would be required for an opinion pollster to estimate voter preferences to within two percentage points?
3. Mortality from myocardial infarction after admission to hospital is about 15%. How many patients would be required for a clinical trial to detect a 10% reduction in mortality, i.e. to 13.5%, if the power required was 90%? How many would be needed if the power were only 80%?
4. How many patients would be required in a clinical study to compare an enzyme concentration in patients with a particular disease and controls, if differences of less than one standard deviation would not be clinically important? If there was already a sample of measurements from 100 healthy controls, how many disease cases would be required?
5. In a proposed trial of a health promotion programme, the programme was to be implemented across a whole county. The plan was to use four counties, two counties to be allocated to receive the programme and two counties to act as controls. The programme would be evaluated by a survey of samples of about 750 subjects drawn from the at-risk populations in each county. A conventional sample size calculation, which ignored the clustering, had indicated that 1500 subjects in each treatment group would be required to give power 80% to detect the required difference. The applicants were aware of the problem of cluster randomisation and the need to take it into account in the analysis, e.g. by analysis at the level of the cluster (county). They had an estimate of the intracluster correlation = 0.005, based on a previous study. They argued that this was so small that they could ignore the clustering. Were they correct?
19
Solutions to exercises
Some of the multiple choice questions are quite hard. If you score +1 for a correct answer, −1 for an incorrect answer, and 0 for a part which you omitted, I would regard 40% as the pass level, 50% as good, 60% as very good, and 70% as excellent. These questions are hard to set and some may be ambiguous, so you will not score 100%.
Solution to Exercise 2M: Multiple choice questions 1 to 6
1. FFFFF. Controls should be treated in the same place at the same time, under the same conditions other than the treatment under test (§2.1). All must be willing and eligible to receive either treatment (§2.4).
2. FTFTF. Random allocation is done to achieve comparable groups, allocation being unrelated to the subjects' characteristics (§2.2). The use of random numbers helps to prevent bias in recruitment (§2.3).
3. TFFFT. Patients do not know their treatment, but they usually do know that they are in a trial (§2.9). Not the same as a cross-over trial (§2.6).
4. FFFFF. Vaccinated and refusing children are self-selected (§2.4). We analyse by intention to treat (§2.5). We can compare the effect of a vaccination programme by comparing the whole vaccination group, vaccinated and refusers, to the controls.
5. TFTTT. §2.6. The order is randomized.
6. FFTTT. §2.8, §2.9. The purpose of placebos is to make dissimilar treatments appear similar. Only in randomized trials can we rely on comparability, and then only within the limits of random variation (§2.2).
Solution to Exercise 2E
1. It was hoped that women in the KYM group would be more satisfied with their care. The knowledge that they would receive continuity of care was an important part of the treatment, and so the lack of blindness is essential. More difficult is that KYM women were given a choice and so may have felt more committed to whichever scheme, KYM or standard, they had chosen, than did the control group. We must accept this element of patient control as part of the treatment.
2. The study should be (and was) analysed by intention to treat (§2.5). As often happens, the refusers did worse than did the acceptors of KYM, and worse than the control group. When we compare all those allocated to KYM with those allocated to control, there is very little difference (Table 19.1).
Table 19.1. Method of delivery in the KYM study
Method of delivery    Allocated to KYM    Allocated to control
                      %       n           %       n
Normal                79.7    382         74.8    354
Instrumental          12.5    60          17.8    84
Caesarian             7.7     37          7.4     35
3. Women had booked for hospital antenatal care expecting the standard service. Those allocated to this therefore received what they had requested. Those allocated to the KYM scheme were offered a treatment which they could refuse if they wished, refusers getting the care for which they had originally booked. No extra examinations were carried out for research purposes, the only special data being the questionnaires, which could be refused. There was therefore no need to get the women's permission for the randomization. I thought this was a convincing argument.
Solution to Exercise 3M: Multiple choice questions 7 to 13
7. FTTTT. A population can be anything (§3.3).
8. TFFFT. A census tells us who is there on that day, and only applies to current in-patients. The hospital could be quite unusual. Some diagnoses are less likely than others to lead to admission or to long stay (§3.2).
9. TFFTF. All members and all samples have equal chances of being chosen (§3.4). We must stick to the sample the random process produces. Errors can be estimated using confidence intervals and significance tests. Choice does not depend on the subject's characteristics at all, except for its being in the population.
10. FTTFT. Some populations are unidentifiable and some cannot be listed easily (§3.4).
11. FFFTF. In a case-control study we start with a group with the disease, the cases, and a group without the disease, the controls (§3.8).
12. FTFTT. We must have a cohort or case control study to get enough cases (§3.7, §3.8).
13. TTTTF. This is a random cluster sample (§3.4). Each patient had the same chance of their hospital being chosen and then the same chance of being chosen within the hospital. This would not be so if we chose a fixed number from each hospital rather than a fixed proportion, as those in small hospitals would be more likely to be chosen than those in large hospitals. In part (e), what about a sample with patients in every hospital?
Solution to Exercise 3E
1. Many cases of infection may be unreported, but there is not much that could be done about that. Many organisms produce similar symptoms, hence the need for laboratory confirmation. There are many sources of infection, including direct transmission, hence the exclusion of cases exposed to other water supplies and to infected people.
2. Controls must be matched for age and sex as these may be related to their exposure to risk factors such as handling raw meat. Inclusion of controls who may have had the disease would have weakened any relationships with the cause, and the same exclusion criteria were applied as for the cases, to keep them comparable.
3. Data are obtained by recall. Cases may remember events in relation to the disease more easily than controls in relation to the same time. Cases may have been thinking about possible causes of the disease and so be more likely to recall milk attacks. The lack of positive association with any other risk factors suggests that this is not important here.
4. I was convinced. The relationship is very strong and these scavenging birds are known to carry the organism. There was no relationship with any other risk factor. The only problem is that there was little evidence that these birds had actually attacked the milk. Others have suggested that cats may also remove the tops of milk bottles to drink the milk and may be the real culprits (Balfour 1991).
5. Further studies: testing of attacked milk bottles for Campylobacter (have to wait for the next year). Possibly a cohort study, asking people about history of bird attacks and drinking attacked milk, then follow for future Campylobacter (and other) infections. Possibly an intervention study. Advise people to protect their milk and observe the subsequent pattern of infection.
Solution to Exercise 4M: Multiple choice questions 14 to 19
14. TFFTF. §4.1. Parity is quantitative and discrete, height and blood pressure are continuous.
15. TTFTF. §4.1. Age last birthday is discrete, exact age includes years and fraction of a year.
16. FFTFT. §4.4, §4.6. It could have more than one mode, we cannot say. Standard deviation is less than variance if the variance is greater than one (§4.7, 8).
17. TTTFT. §4.2, 3, 4. Mean and variance only tell us the location and spread of the distribution (§4.6, 7).
18. TFTFT. §4.5, 6, 7. Median = 2, the observations must be ordered before the central one is found, mode = 2, range = 7 − 1 = 6, variance = 22/4 = 5.5.
19. FFFFT. §4.6, 7, 8. There would be more observations below the mean than above, because the median would be less than the mean. Most observations will be within one standard deviation of the mean whatever the shape. The standard deviation measures how widely the blood pressure is spread between people, not for a single person, which would be needed to estimate accuracy. See also §15.2.
Fig. 19.1. Stem and leaf plot of blood glucose
Fig. 19.2. Box and whisker plot of blood glucose
Solution to Exercise 4E
1. The stem and leaf plot is shown in Figure 19.1.
2. Minimum = 2.2, maximum = 6.0. The median is the average of the 20th and 21st ordered observations, since the number of observations is even. These are both 4.0, so the median is 4.0. The first quartile is between the 10th and 11th, which are both 3.6. The third quartile is between the 30th and 31st observations, which are 4.5 and 4.6. We have q = 0.75, i = 0.75 × 41 = 30.75, and the quartile is given by 4.5 + (4.6 − 4.5) × 0.75 = 4.575 (§4.5). The box and whisker plot is shown in Figure 19.2.
Fig. 19.3. Histogram of blood glucose
3. The frequency distribution is derived easily from the stem and leaf plot:
Interval    Frequency
2.0–2.4     1
2.5–2.9     1
3.0–3.4     6
3.5–3.9     10
4.0–4.4     11
4.5–4.9     8
5.0–5.4     2
5.5–5.9     0
6.0–6.4     1
Total       40
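4. The histogram is shown in Figure 19.3. The distribution is symmetrical.
5. The mean is given by
$$\bar{x} = \frac{\sum x_i}{n} = \frac{16.2}{4} = 4.05$$
The deviations and their squares are as follows:
         $x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
         4.7      0.65               0.4225
         4.2      0.15               0.0225
         3.9      −0.15              0.0225
         3.4      −0.65              0.4225
Total    16.2     0.00               0.8900
There are n − 1 = 4 − 1 = 3 degrees of freedom. The variance is given by
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{0.89}{3} = 0.2967$$
6. As before, the sum is $\sum x_i = 16.2$, and $\sum x_i^2 = 66.5$. The sum of squares about the mean is then given by
$$\sum x_i^2 - \frac{(\sum x_i)^2}{n} = 66.5 - \frac{16.2^2}{4} = 66.5 - 65.61 = 0.89$$
This is the same as found in 5 above, so, as before,
$$s^2 = \frac{0.89}{3} = 0.2967$$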
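Both routes to the sum of squares about the mean can be checked on the four observations tabulated above. This is an illustrative sketch using Python's standard library; the variable names are hypothetical.

```python
from statistics import mean, variance

glucose = [4.7, 4.2, 3.9, 3.4]  # the four observations used in parts 5 and 6
n = len(glucose)

x_bar = mean(glucose)

# Part 5: sum of squared deviations from the mean
ss_deviations = sum((x - x_bar) ** 2 for x in glucose)

# Part 6: the same quantity from the sum and the sum of squares
ss_shortcut = sum(x * x for x in glucose) - sum(glucose) ** 2 / n

s_squared = ss_deviations / (n - 1)  # n - 1 = 3 degrees of freedom

print(round(x_bar, 2), round(ss_deviations, 4), round(ss_shortcut, 4), round(s_squared, 4))
```

`statistics.variance` uses the same n − 1 divisor, so it gives the same result as `s_squared`.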
7. For the mean we have $\sum x_i = 162.2$, so $\bar{x} = 162.2/40 = 4.055$.
The sum of squares about the mean is given by:
There are n − 1 = 40 − 1 = 39 degrees of freedom. The variance is given by
9. For the limits, $\bar{x} - 2s$ = 4.055 − 2 × 0.698 = 2.659, $\bar{x} - s$ = 4.055 − 0.698 = 3.357, $\bar{x}$ = 4.055, $\bar{x} + s$ = 4.055 + 0.698 = 4.753, and $\bar{x} + 2s$ = 4.055 + 2 × 0.698 = 5.451. Figure 19.3 shows the mean and standard deviation marked on the histogram. The majority of points fall within one standard deviation of the mean and nearly all within two standard deviations of the mean. Because the distribution is symmetrical, it extends just beyond the $\bar{x} \pm 2s$ points on either side.
Solution to Exercise 5M: Multiple choice questions 20 to 24
20. FTTTT. §5.1, §5.2. Without a control group we have no idea how many would get better anyway (§2.1). 66.67% is 2/3. We may only have 3 patients.
21. TFFTT. §5.2. To three significant figures, it should be 1730. We round up because of the 9. To six decimal places it is 1729.543710.
22. FTTFT. This is a bar chart showing the relationship between two variables (§5.5). See Figure 19.4. Calendar time has no true zero to show.
23. TTFFT. §5.9, §5A. There is no logarithm of zero.
24. FFTTT. §5.5, 6, 7. A histogram (§4.3) and a pie chart (§5.4) each show the distribution of a single variable.
Fig. 19.4. A dubious graph revised
Table 19.2. Calculations for a pie chart for the Tooting Bec data
Category                  Frequency    Relative frequency    Angle
Schizophrenia             474          0.32311               116
Affective illness         277          0.18882               68
Organic brain syndrome    405          0.27607               99
Subnormality              58           0.03954               14
Alcoholism                57           0.03885               14
Other                     196          0.13361               48
Total                     1467         1.00000               359
Solution to Exercise 5E
1. This is the frequency distribution of a qualitative variable, so a pie chart can be used to display it. The calculations are set out in Table 19.2. Notice that we have lost one degree through rounding errors. We could work to fractions of a degree, but the eye is unlikely to spot the difference. The pie chart is shown in Figure 19.5.
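The pie chart calculation, including the degree lost to rounding, can be reproduced from the frequencies in Table 19.2. A short sketch with hypothetical variable names:

```python
frequencies = {
    "Schizophrenia": 474,
    "Affective illness": 277,
    "Organic brain syndrome": 405,
    "Subnormality": 58,
    "Alcoholism": 57,
    "Other": 196,
}

total = sum(frequencies.values())

# Each angle is the relative frequency times 360 degrees, rounded to a whole degree
angles = {category: round(360 * count / total) for category, count in frequencies.items()}

for category, angle in angles.items():
    print(category, round(frequencies[category] / total, 5), angle)
print("Total", total, sum(angles.values()))
```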
2. See Figure 19.6.
3. There are several possibilities. In the original paper, Doll and Hill used a separate bar chart for each disease, similar to Figure 19.7.
4. Line graphs can be used here, as we have simple time series (Figure 19.8). For an explanation of the difference between years, see §13E.
Solution to Exercise 6M: Multiple choice questions 25 to 31
25. TTFFF. §6.2. If they are mutually exclusive they cannot both happen. There is no reason why they should be equiprobable or exhaustive, the only events which can happen (§6.3).
26. TFTFT. For both, the probabilities are multiplied, 0.2 × 0.05 = 0.01 (§6.2). Clearly the probability of both must be less than that for each one. The probability of both is 0.01, so the probability of X alone is 0.20 − 0.01 = 0.19 and the probability of Y alone is 0.05 − 0.01 = 0.04. The probability of having X or Y is the probability of X alone + probability of Y alone + probability of X and Y together, because these are three mutually exclusive events. Having X and having Y are not mutually exclusive as she can have both. Having X tells us nothing about whether she has Y. If she has X the probability of having Y is still 0.05, because X and Y are independent.
Fig.19.5.PiechartshowingthedistributionofpatientsinTootingBecHospitalbydiagnosticgroup
![Page 619: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/619.jpg)
Fig.19.6.BarchartshowingtheresultsoftheSalkvaccinetrial
27.TFTFF.§6.4.Weightiscontinuous.Patientsrespondornotwithequalprobability,beingselectedatrandomfromapopulationwheretheprobabilityofrespondingvaries.ThenumberofredcellsmightfollowaPoissondistribution(§6.7);thereisnosetofindependenttrials.Thenumberofhypertensivesfollows
aBinominaldistribution,nottheproportion
![Page 620: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/620.jpg)
Fig.19.7.MortalityinBritishdoctorsbysmokinghabits,afterDollandHill(1956)
Fig.19.8.LinegraphsforgeriatricadmissionsinWandsworthinthesummersof1982and1983
28.TTTTF.Theprobabilityofclinicaldiseaseis0.5×0.5=0.25.Theprobabilityofcarrierstatus=probabilitythatfatherpassesthegeneandmotherdoesnot+probabilitythatmotherpassesthegeneandfatherdoesnot=0.5×0.5+0.5×0.5=0.5.Probabilityofnotinheritingthegene=0.5×0.5=0.25.Probabilityofnothavingclinicaldisease=1-0.25=0.75.Successivechidrenareindependent,sotheprobabilitiesforthesecondchildareunaffectedbythefirst(§6.2)
29.FTTFT.§6.3,4.Theexpectednumberisone(§6.6).Thespinsareindependent(§6.2).Atleastonetailmeansonetail(PROB=0.5)ortwotails(PROB=0.25).Thesearemutuallyexclusive,sotheprobabilityofatleastonetailis0.5+0.25=0.75.
![Page 621: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/621.jpg)
Table 19.3. Probability of surviving to different ages
Survive to age   Probability   Survive to age   Probability
10               0.959         60               0.758
20               0.952         70               0.524
30               0.938         80               0.211
40               0.920         90               0.022
50               0.876         100              0.000
30. FTTFT. §6.6. E(X + 2) = µ + 2, VAR(2X) = 4σ².
31. TTTFF. §6.6. The variance of a difference is the sum of the variances. Variances cannot be negative. VAR(-X) = (-1)² × VAR(X) = VAR(X).
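The reasoning in answer 26 can be checked numerically. The sketch below assumes only the two probabilities quoted there (0.20 for X and 0.05 for Y) and independence; the variable names are illustrative.

```python
# Probabilities quoted in answer 26; X and Y are assumed independent.
p_x = 0.20
p_y = 0.05

# Independent events: multiply to get the probability of both.
p_both = p_x * p_y

# Split "X or Y" into three mutually exclusive events and add them.
p_x_alone = p_x - p_both
p_y_alone = p_y - p_both
p_x_or_y = p_x_alone + p_y_alone + p_both

# Conditional probability of Y given X; equals p_y under independence.
p_y_given_x = p_both / p_x

print(f"P(X and Y) = {p_both:.2f}")
print(f"P(X or Y)  = {p_x_or_y:.2f}")
print(f"P(Y | X)   = {p_y_given_x:.2f}")
```

The sum of the three exclusive pieces agrees with the usual addition rule P(X) + P(Y) - P(X and Y).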
Solution to Exercise 6E
1. Probability of survival to age 10. This illustrates the frequency definition of probability. 959 out of 1000 survive, so the probability is 959/1000 = 0.959.
2. Survival and death are mutually exclusive, exhaustive events, so PROB(survives) + PROB(dies) = 1. Hence PROB(dies) = 1 - 0.959 = 0.041.
3. These are the number surviving divided by 1000 (Table 19.3). The events are not mutually exclusive, e.g. a man cannot survive to age 20 if he does not survive to age 10. This does not form a probability distribution.
4. The probability is found by
5. Independent events. PROB(survival 60 to 70) = 0.691, PROB(both survive) = 0.691 × 0.691 = 0.477.
6. The proportion surviving on average is the probability of survival = 0.691. So a proportion of 0.691 of the 100 survive. We expect 0.691 × 100 = 69.1 to survive.
7. The probability is found by
8. As in 7, we find probabilities of dying for each decade (Table 19.4). This is a set of mutually exclusive events and they are exhaustive: there is no other decade in which death can take place. The sum of the probabilities is therefore 1.0. The distribution is shown in Figure 19.9.
9. We find the expected value or mean of a probability distribution by summing each value times its probability (§6.4), to give life expectancy at birth = 66.6 years (Table 19.5).
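The life-table arithmetic in parts 5 to 9 can be sketched in Python from the survival probabilities of Table 19.3. Two assumptions are made here and flagged in the comments: survival at birth is taken as 1.000 (1000 out of 1000), and each death is placed at the midpoint of its decade, as the products in Table 19.5 suggest. The conditional probability for ages 60 to 70 is computed as the ratio of the two tabulated survival probabilities, which reproduces the 0.691 quoted in part 5.

```python
# Survival probabilities from Table 19.3; the value at age 0 is an
# assumption (everyone in the cohort of 1000 is alive at birth).
ages = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
survival = [1.000, 0.959, 0.952, 0.938, 0.920, 0.876,
            0.758, 0.524, 0.211, 0.022, 0.000]

# Probability of dying in each decade = drop in survival over the decade.
p_die = [survival[i] - survival[i + 1] for i in range(10)]

# Deaths are assumed to occur at the decade midpoints 5, 15, ..., 95.
midpoints = [ages[i] + 5 for i in range(10)]
life_expectancy = sum(m * p for m, p in zip(midpoints, p_die))

# Conditional probability of surviving from 60 to 70.
p_60_to_70 = survival[7] / survival[6]
p_both_survive = p_60_to_70 ** 2  # two independent men

for m, p in zip(midpoints, p_die):
    print(f"{m:>2} x {p:.3f} = {m * p:6.3f}")
print(f"Life expectancy at birth: {life_expectancy:.1f} years")
print(f"PROB(survival 60 to 70):  {p_60_to_70:.3f}")
```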
Table 19.4. Probability of dying in each decade
Decade   Probability of dying   Decade   Probability of dying
1st      0.041                  6th      0.118
2nd      0.007                  7th      0.234
3rd      0.014                  8th      0.313
4th      0.018                  9th      0.189
5th      0.044                  10th     0.022
Fig. 19.9. Probability distribution of decade of death
Solution to Exercise 7M: Multiple choice questions 32 to 37
32. TTTFT. §7.2, 3, 4.
33. FFFTT. Symmetrical, µ = 0, σ = 1 (§7.3, §4.6).
34. TTFFF. §7.2. Median = mean. The Normal distribution has nothing to do with normal physiology. 2.5% will be less than 260, 2.5% will be greater than 340 litres/min.
Table 19.5. Calculation of expectation of life
 5 × 0.041 =  0.205
15 × 0.007 =  0.105
25 × 0.014 =  0.350
35 × 0.018 =  0.630
45 × 0.044 =  1.980
55 × 0.118 =  6.490
65 × 0.234 = 15.210
75 × 0.313 = 23.475
85 × 0.189 = 16.065
95 × 0.022 =  2.090
Total        66.600
Fig. 19.10. Histogram of the blood glucose data with the corresponding Normal distribution curve, and Normal plot
35. FTTFF. §4.6, §7.3. The sample size should not affect the mean. The relative sizes of mean, median and standard deviation depend on the shape of the frequency distribution.
36. TFTTF. §7.2, §7.3. Adding, subtracting or multiplying by a constant, or adding or subtracting an independent Normal variable gives a Normal distribution. X² follows a very skew Chi-squared distribution with one degree of freedom and X/Y follows a t distribution with one degree of freedom (§7A).
37. TTTTT. A gentle slope indicates that observations are far apart, a steep slope that there are many observations close together. Hence gentle-steep-gentle (‘S’ shaped) indicates long tails (§7.5).
Solution to Exercise 7E
1. The box and whisker plot shows a very slight degree of skewness, the lower whisker being shorter than the upper and the lower half of the box smaller than the upper. From the histogram it appears that the tails are a little longer than the Normal curve of Figure 7.10 would suggest. Figure 19.10 shows the Normal distribution with the same mean and variance superimposed on the histogram, which also indicates this.
2. We have n = 40. For i = 1 to 40 we want to calculate (i - 0.5)/n = (2i - 1)/2n. This gives us a probability. We use Table 7.1 to find the value of the Normal distribution corresponding to this probability. For example, for i = 1 we have (2 × 1 - 1)/(2 × 40) = 1/80 = 0.0125.
From Table 7.1 we cannot find the value of x corresponding to Φ(x) = 0.0125 directly, but we see that x = -2.3 corresponds to Φ(x) = 0.011 and x = -2.2 to Φ(x) = 0.014. Φ(x) = 0.0125 is mid-way between these probabilities so we can estimate the value of x as mid-way between -2.3 and -2.2, giving -2.25. This corresponds to the lowest blood glucose, 2.2. For i = 2 we have Φ(x) = 0.0375. Referring to the table we have x = -1.8, Φ(x) = 0.036 and x = -1.7, Φ(x) = 0.045. For Φ(x) = 0.0375 we must have x just above -1.8, about -1.78. The corresponding blood glucose is 2.9. We do not have to be very accurate because we are only using this plot for a rough guide. We get a set of probabilities as follows:
i   (2i-1)/2n = Φ(x)   x       Blood glucose
1   1/80 = 0.0125      -2.25   2.2
2   3/80 = 0.0375      -1.78   2.9
3   5/80 = 0.0625      -1.53   3.3
4   7/80 = 0.0875      -1.36   3.3
and so on. Because of the symmetry of the Normal distribution, from i = 21 onwards the values of x are those corresponding to 40 - i + 1, but with a positive sign. The Normal plot is shown in Figure 19.10.
3. The points do not lie on a straight line. There are pronounced bends near each end. These bends reflect rather long tails of the distribution of blood glucose. If the line showed a steady curve, rising less steeply as the blood glucose value increased, this would show simple skewness which can often be corrected by a log transformation. This would not work here; the bend at the lower end would be made worse.
The deviation from a straight line is not very great, compared, say, to the vitamin D measurements in Figure 7.12. As we see in Chapter 10, such small deviations from the Normal do not usually matter.
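The Normal scores used for the plot in part 2 need not be read from Table 7.1. As a rough sketch, Python's standard library can invert the Standard Normal distribution function directly; the use of `statistics.NormalDist` is our choice of tool, not part of the book's method, and the results differ slightly from the hand interpolation (for example about -2.24 rather than -2.25 for i = 1).

```python
from statistics import NormalDist

n = 40  # number of blood glucose observations
std_normal = NormalDist()  # Standard Normal: mean 0, standard deviation 1

# Plotting positions (i - 0.5)/n = (2i - 1)/2n for i = 1, ..., n.
probabilities = [(i - 0.5) / n for i in range(1, n + 1)]

# Normal scores: the x with Phi(x) equal to each plotting position.
scores = [std_normal.inv_cdf(p) for p in probabilities]

# First four rows, for comparison with the table in part 2.
for i in range(4):
    print(f"{i + 1:>2}  {probabilities[i]:.4f}  {scores[i]:6.2f}")
```

Plotting the sorted blood glucose values against `scores` would give the Normal plot of Figure 19.10.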
Solution to Exercise 8M: Multiple choice questions 38 to 43
39. FTFTF. §8.3. The sample mean is always in the middle of the limits.
41. TTTFF. §8.1, 2, §6.4. Variance is p(1 - p)/n = 0.1 × 0.9/100 = 0.0009. The number in the sample with the condition follows a Binomial distribution, not the proportion.
42. FFTTT. It depends on the variability of FEV1 and the number in the sample (§8.2). The sample should be random (§3.3, 4).
43. FFTTF. §8.3, 4. It is unlikely that we would get these data if the population rate were 10%, but not impossible.
Solution to Exercise 8E
1. The interval will be 1.96 standard deviations less than and greater than the mean. The lower limit is 0.810 - 1.96 × 0.057 = 0.698 mmol/litre. The upper limit is 0.810 + 1.96 × 0.057 = 0.922 mmol/litre.
2. For the diabetics, the mean is 0.719 and the standard deviation 0.068, so the lower limit of 0.698 will be (0.698 - 0.719)/0.068 = -0.309 standard deviations from the mean. From Table 7.1, the probability of being below this is 0.38, so the probability of being above is 1 - 0.38 = 0.62. Thus the probability that an insulin-dependent diabetic would be within the reference interval would be 0.62 or 62%. This is the proportion we require.
4. The 95% confidence interval is the mean ± 1.96 standard errors. For the controls, 0.810 - 1.96 × 0.00482 to 0.810 + 1.96 × 0.00482 gives us 0.801 to 0.819 mmol/litre. This is much narrower than the interval of part 1. This is because the confidence interval tells us how far the sample mean might be from the population mean. The 95% reference interval tells us how far an individual observation might be from the population mean.
5. The groups are independent, so the standard error of the difference between means is given by:
6. The difference between the means is 0.719 - 0.810 = -0.091 mmol/litre. The 95% confidence interval is thus -0.091 - 1.96 × 0.00660 to -0.091 + 1.96 × 0.00660, giving -0.104 to -0.078. Hence the mean plasma magnesium level for insulin dependent diabetics is between 0.078 and 0.104 mmol/litre below that of non-diabetics.
7. Although the difference is significant, this would not be a good test because the majority of diabetics are within the 95% reference interval.
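The contrast between the reference interval (part 1) and the confidence intervals (parts 4 and 6) can be made concrete with a short sketch. All summary figures below are taken from the solution text; the standard error of the difference (0.00660) is used as quoted in part 6, since its derivation is not reproduced here. Following the solution, only the lower reference limit is considered when estimating the proportion of diabetics inside the interval.

```python
from statistics import NormalDist

Z = 1.96  # two-sided 5% point of the Standard Normal distribution

# Summary statistics quoted in the solution (plasma magnesium, mmol/litre).
control_mean, control_sd, control_se = 0.810, 0.057, 0.00482
diabetic_mean, diabetic_sd = 0.719, 0.068
se_difference = 0.00660  # quoted in part 6

# Part 1: 95% reference interval for individual controls.
ref_low = control_mean - Z * control_sd
ref_high = control_mean + Z * control_sd

# Part 2: proportion of diabetics above the (rounded) lower limit.
z_low = (round(ref_low, 3) - diabetic_mean) / diabetic_sd
p_above = 1 - NormalDist().cdf(z_low)

# Part 4: 95% confidence interval for the control mean.
ci_mean = (control_mean - Z * control_se, control_mean + Z * control_se)

# Part 6: 95% confidence interval for the difference between means.
difference = diabetic_mean - control_mean
ci_diff = (difference - Z * se_difference, difference + Z * se_difference)

print(f"Reference interval: {ref_low:.3f} to {ref_high:.3f}")
print(f"Diabetics above lower limit: {p_above:.2f}")
print(f"CI for control mean: {ci_mean[0]:.3f} to {ci_mean[1]:.3f}")
print(f"CI for difference: {ci_diff[0]:.3f} to {ci_diff[1]:.3f}")
```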
Solution to Exercise 9M: Multiple choice questions 44 to 49
44. FTFFF. There is evidence for a relationship (§9.6), which is not necessarily causal. There may be other differences related to coffee drinking, such as smoking (§3.8).
46. TTFTT. §9.2. It is quite possible for either to be higher and deviations in either direction are important (§9.5). n = 16 because the subject giving the same reading on both gives no information about the difference and is excluded from the test. The order should be random, as in a cross-over trial (§2.6).
47. FFFFT. The trial is small and the difference may be due to chance, but there may also be a large treatment effect. We must do a bigger trial to increase the power (§9.9). Adding cases would completely invalidate the test. If the null hypothesis is true, the test will give a ‘significant’ result one in 20 times. If we keep adding cases and doing many tests we have a very high chance of getting a ‘significant’ result on one of them, even though there is no treatment effect (§9.10).
48. TFTTF. Large sample methods depend on estimates of variance obtained from the data. This estimate gets closer to the population value as the sample size increases (§9.7, §9.8). The chance of an error of the first kind is the significance level set in advance, say 5%. The larger the sample the more likely we are to detect a difference should one exist (§9.9). The null hypothesis depends on the phenomena we are investigating, not on the sample size.
49. FTFFT. We cannot conclude causation in an observational study (§3.6, 7, 8), but we can conclude that there is evidence of a difference (§9.6). 0.001 is the probability of getting so large a difference if the null hypothesis were true (§9.3).
Solution to Exercise 9E
1. Both control groups are drawn from populations which were easy to get to, one being hospital patients without gastro-intestinal symptoms, the other being fracture patients and their relatives. Both are matched for age and sex; Mayberry et al. (1978) also matched for social class and marital status. Apart from the matching factors, we have no way of knowing whether cases and controls are comparable, or any way of knowing whether controls are representative of the general population. This is usual in case control studies and is a major problem with this design.
2. There are two obvious sources of bias: interviews were not blind and information is being recalled by the subject. The latter is particularly a problem for data about the past. In James' study subjects were asked what they used to eat several years in the past. For the cases this was before a definite event, onset of Crohn's disease, for the controls it was not, the time being time of onset of the disease in the matched case.
3. The question in James' study was ‘what did you use to eat in the past?’, that in Mayberry et al. (1978) was ‘what do you eat now?’
4. Of the 100 patients with Crohn's disease, 29 were current eaters of cornflakes. Of 29 cases who knew of the cornflakes association, 12 were ex-eaters of cornflakes, and among the other 71 cases 21 were ex-eaters of cornflakes, giving a total of 33 past but not present eaters of cornflakes. Combining these with the 29 current consumers, we get 62 cases who had at some time been regular eaters of cornflakes. If we carry out the same calculation for the controls, we obtain 3 + 10 = 13 past eaters and with 22 current eaters this gives 35 sometime regular cornflakes eaters. Cases were more likely than controls to have eaten cornflakes regularly at some time, the proportion of cases reporting having eaten cornflakes being almost twice as great as for controls. Compare this to James' data, where 17/68 = 25% of controls and 23/34 = 68% of cases, 2.7 times as many, had eaten cornflakes regularly. The results are similar.
5. The relationship between Crohn's disease and reported consumption of cornflakes had a much smaller probability for the significance test and hence stronger evidence that a relationship existed. Also, only one case had never eaten cornflakes (it was also the most popular cereal among controls).
6. Of the Crohn's cases, 67.6% (i.e. 23/34) reported having eaten cornflakes regularly compared to 25.0% of controls. Thus cases were 67.6/25.0 = 2.7 times as likely as controls to report having eaten cornflakes. The corresponding ratios for the other cereals are: wheat, 2.7; porridge, 1.5; rice, 1.6; bran, 6.1; muesli, 2.7. Cornflakes does not stand out when we look at the data in this way. The small probability simply arises because it is the most popular cereal. The P value is a property of the sample, not of the population.
7. We can conclude that there is no evidence that eating cornflakes is more closely related to Crohn's disease than is consumption of other cereals. The tendency for Crohn's cases to report excessive eating of breakfast foods before onset of the disease may be the result of greater variation in diet than in controls, as they try different foods in response to their symptoms. They may also be more likely to recall what they used to eat, being more aware of the effects of diet because of their disease.
Solution to Exercise 10M: Multiple choice questions 50 to 56
50. FFTFT. §10.2. It is equivalent to the Normal distribution method (§8.3).
51. FTFTF. §10.3. Whether the (population) means are equal is what we are trying to find out. The large sample case is like the Normal test of (§9.7), except for the common variance estimate. It is valid for any sample size.
52. FTTFF. The assumption of Normality would not be met for a small sample t test (§10.3) without transformation (§10.4), but for a large sample the distribution followed by the data would not matter (§9.7). The sign test is for paired data. We have measurements, not qualitative data.
53. FTTFF. §10.5. The more different the sample sizes are, the worse is the approximation to the t distribution. When both samples are large, this becomes a large sample Normal distribution test (§9.7). Grouping of data is not a serious problem.
54. TFFTT. A P value conveys more information than a statement that the difference is significant or not significant. A confidence interval would be even better. What is important is how well the diagnostic test discriminates, i.e. by how much the distributions overlap, not any difference in mean. Semen count cannot follow a Normal distribution because two standard deviations exceed the mean and some observations would be negative (§7.4). Approximately equal numbers make the t test very robust but skewness reduces the power (§10.5).
56. FTTFT. §10.9. Sums of squares and degrees of freedom add up, mean squares do not. Three groups gives two degrees of freedom. We can have any sizes of groups.
Solution to Exercise 10E
1. The differences for compliance are shown in Table 19.6. The stem and leaf plot is shown in Figure 19.11.
Table 19.6. Differences and means for static compliance
Patient Constant Decelerating Difference Mean
1 65.4 72.9 -7.5 69.15
2 73.7 94.4 -20.7 84.05
3 37.4 43.3 -5.9 40.35
4 26.3 29.0 -2.7 27.65
5 65.0 66.4 -1.4 65.70
6 35.2 36.4 -1.2 35.80
7 24.7 27.7 -3.0 26.20
8 23.0 27.5 -4.5 25.25
9 133.2 178.2 -45.0 155.70
10 38.4 39.3 -0.9 38.85
11 29.2 31.8 -2.6 30.50
12 28.3 26.9 1.4 27.60
13 46.6 45.0 1.6 45.80
14 61.5 58.2 3.3 59.85
15 25.7 25.7 0.0 25.70
16 48.7 42.3 6.4 45.50
Fig. 19.11. Stem and leaf plot for compliance
2. The plot of difference against mean is Figure 19.12. The distribution is highly skewed and the difference closely related to the mean.
3. The sum and sum of the squared differences are ∑dᵢ = -82.7 and ∑dᵢ² = 2648.43, hence the mean is d̄ = -82.7/16 = -5.16875. For the sum of squares about the mean
4. We have 15 degrees of freedom and from Table 10.1 the 5% point of the t distribution is 2.13. The 95% confidence interval is -5.16875 - 2.13 × 3.04205 to -5.16875 + 2.13 × 3.04205, giving -11.6 to +1.3.
Fig. 19.12. Difference versus mean for compliance
6. The 95% confidence interval is -0.028688 - 2.13 × 0.012503 to -0.028688 + 2.13 × 0.012503 which gives -0.055312 to -0.002057. This has not been rounded, because we need to transform them first. If we transform these limits back by taking the antilogs we get 0.880 to 0.995. This means that the compliance with a decelerating waveform is between 0.880 and 0.995 times that with a constant waveform. There is some evidence that waveform has an effect, whereas with the untransformed data the confidence interval for the difference included zero. Because of the skewness of the raw data the confidence interval was too wide.
7. We can conclude that there is some evidence of a reduction in mean compliance, which could be up to 12% (from (1 - 0.880) × 100), but could also be negligibly small.
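The paired t confidence intervals of parts 4 and 6 can be reproduced from the difference columns of Tables 19.6 and 19.7. The sketch below is illustrative: the helper `paired_ci` is a name introduced here, and the t value 2.13 (15 degrees of freedom) is taken from the solution text rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT = 2.13  # 5% point of t with 15 d.f., as quoted in the solution

# Differences (constant minus decelerating) from Table 19.6.
raw_diffs = [-7.5, -20.7, -5.9, -2.7, -1.4, -1.2, -3.0, -4.5,
             -45.0, -0.9, -2.6, 1.4, 1.6, 3.3, 0.0, 6.4]

# Differences of log10 compliance from Table 19.7.
log_diffs = [-0.047, -0.108, -0.063, -0.042, -0.009, -0.014, -0.049, -0.077,
             -0.126, -0.010, -0.037, 0.022, 0.015, 0.024, 0.000, 0.062]


def paired_ci(differences, t_crit):
    """Return mean difference, its standard error and the confidence limits."""
    n = len(differences)
    d_bar = mean(differences)
    se = stdev(differences) / sqrt(n)  # stdev uses the n - 1 divisor
    return d_bar, se, (d_bar - t_crit * se, d_bar + t_crit * se)


d_bar, se, (low, high) = paired_ci(raw_diffs, T_CRIT)
print(f"Raw: mean {d_bar:.5f}, SE {se:.5f}, CI {low:.1f} to {high:.1f}")

log_mean, log_se, (log_low, log_high) = paired_ci(log_diffs, T_CRIT)
# Antilog (base 10) turns the limits for the log difference into ratio limits.
ratio_low, ratio_high = 10 ** log_low, 10 ** log_high
print(f"Log: mean {log_mean:.6f}, SE {log_se:.6f}")
print(f"Ratio of compliances: {ratio_low:.3f} to {ratio_high:.3f}")
```

The interval for the raw differences includes zero, while the back-transformed ratio interval lies wholly below 1, as described in part 6.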
Solution to Exercise 11M: Multiple choice questions 57 to 61
57. FFTTF. Outcome and predictor variables are perfectly related but do not lie on a straight line, so r < 1 (§11.9).
Fig. 19.13. Stem and leaf plots for log compliance
Table 19.7. Difference and mean for log transformed compliance (to base 10)
Patient Constant Decelerating Difference Mean
1 1.816 1.863 -0.047 1.8395
2 1.867 1.975 -0.108 1.9210
3 1.573 1.636 -0.063 1.6045
4 1.420 1.462 -0.042 1.4410
5 1.813 1.822 -0.009 1.8175
6 1.547 1.561 -0.014 1.5540
7 1.393 1.442 -0.049 1.4175
8 1.362 1.439 -0.077 1.4005
9 2.125 2.251 -0.126 2.1880
10 1.584 1.594 -0.010 1.5890
11 1.465 1.502 -0.037 1.4835
12 1.452 1.430 0.022 1.4410
13 1.668 1.653 0.015 1.6605
14 1.789 1.765 0.024 1.7770
15 1.410 1.410 0.000 1.4100
16 1.688 1.626 0.062 1.6570
Fig. 19.14. Difference versus mean for log compliance
58. FTFFF. Knowledge of the predictor tells us something about the outcome variable (§6.2). This is not a straight line relationship. For part of the scale the outcome variable decreases as the predictor increases, then the outcome variable increases again. The correlation coefficient will be close to zero (§11.9). A logarithmic transformation would work if the outcome increased more and more rapidly as the predictor increased (§5.9).
59. FFFTT. A regression line usually has non-zero intercept and slope, which have dimensions (§11.3). Exchanging X and Y changes the line (§11.4).
60. FTTFF. The predictor variable has no error in the regression model (§11.3). Transformations are only used if needed to meet the assumptions (§11.8). There is a scatter about the line (§11.3).
61. TTFFF. §11.9, 10. There is no distinction between predictor and outcome. r should not be confused with the regression coefficient (§11.3).
Solution to Exercise 11E
1. The slope is found by
For females,
For males,
2. For the standard error, we first need the variances about the line:
then the standard error is
For females:
For males:
3. The standard error of the difference between two independent variables is the square root of the sum of their standard errors squared (§8.5):
The sample is reasonably large, almost attaining 50 in each group, so this standard error should be fairly well estimated and we can use a large sample Normal approximation. The 95% confidence interval is thus 1.96 standard errors on either side of the estimate. The observed difference is bf - bm = 2.8710 - 3.9477 = -1.0767. The 95% confidence interval is thus -1.0767 - 1.96 × 1.7225 = -4.5 to -1.0767 + 1.96 × 1.7225 = 2.3. If the samples were small, we could do this using the t distribution, but we would need to estimate a common variance. It would be better to use multiple regression, testing the height × sex interaction (§17.3).
4. For the test of significance the test statistic is observed difference over standard error:
If the null hypothesis were true, this would be an observation from a Standard Normal distribution. From Table 7.2, P > 0.5.
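The large sample comparison of the two slopes in parts 3 and 4 can be sketched as follows. The slopes and the standard error of their difference (1.7225) are taken as quoted in the solution, because the displayed formulae are not reproduced here; the P value comes from Python's `statistics.NormalDist` instead of Table 7.2.

```python
from statistics import NormalDist

# Slopes and standard error of their difference, as quoted in part 3.
b_female = 2.8710
b_male = 3.9477
se_difference = 1.7225

difference = b_female - b_male

# Large sample 95% confidence interval: 1.96 standard errors either side.
ci = (difference - 1.96 * se_difference, difference + 1.96 * se_difference)

# Test statistic: observed difference over its standard error.
z = difference / se_difference

# Two-sided P value from the Standard Normal distribution.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Difference in slopes: {difference:.4f}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f}")
print(f"z = {z:.2f}, two-sided P = {p_two_sided:.2f}")
```

The P value exceeds 0.5, consistent with the conclusion in part 4.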
Solution to Exercise 12M: Multiple choice questions 62 to 66
62. TFTFF. §10.3, §12.2. The sign and Wilcoxon tests are for paired data (§9.2, §12.3). Rank correlation looks for the existence of relationships between two ordinal variables, not a comparison between two groups (§12.4, §12.5).
63. TTFFT. §9.2, §12.2, §10.3, §12.5. The Wilcoxon test is for interval data (§12.3).
64. FTFTT. §12.5. There is no predictor variable in correlation. Log transformation would not affect the rank order of the observations.
65. FTFFT. If Normal assumptions are met the methods using them are better (§12.7). Estimation of confidence intervals using rank methods is difficult. Rank methods require the assumption that the scale is ordinal, i.e. that the data can be ranked.
66. TFTTF. We need a paired test: t, sign or Wilcoxon (§10.2, §9.2, §12.3).
Solution to Exercise 12E
1. The differences are shown in Table 19.6. We have 4 positive, 11 negative and 1 zero. Under the null hypothesis of no difference, the number of positives is from the Binomial distribution with p = 0.5, n = 15. We have n = 15 because the single zero contributes no information about the direction of the difference. For PROB(r ≤ 4) we have
If we double this for a two-sided test we get 0.11848, again not significant.
2. Using the Wilcoxon matched pairs test we get
Diff. -0.9 -1.2 -1.4 1.4 1.6 -2.6 -2.7 -3.0
Rank 1 2 3.5 3.5 5 6 7 8
Diff. 3.3 -4.5 -5.9 6.4 -7.5 -20.7 -45.0
Rank 9 10 11 12 13 14 15
As for the sign test, the zero is omitted. Sum of ranks for positive differences is T = 3.5 + 5 + 9 + 12 = 29.5. From Table 12.5 the 5% point for n = 15 is 25, which T exceeds, so the difference is not significant at the 5% level. The three tests give similar answers.
3. Using the log transformed differences in Table 19.7, we still have 4 positives, 11 negatives and 1 zero, with a sign test probability of 0.11848. The transformation does not alter the direction of the changes and so does not affect the sign test.
4. For the Wilcoxon matched pairs test on the log compliance:
Diff. -0.009 -0.010 -0.014 0.015 0.022 0.024
Rank 1 2 3 4 5 6
Diff. -0.037 -0.042 -0.047 -0.049 0.062 -0.063
Rank 7 8 9 10 11 12
Diff. -0.077 -0.108 -0.126
Rank 13 14 15
Hence T = 4 + 5 + 6 + 11 = 26. This is just above the 5% point of 25 and is different from that in the untransformed data. This is because the transformation has altered the relative size of the differences. This test assumes interval data. By changing to a log scale we have moved to a scale where the differences are more comparable, because the change does depend on the magnitude of the original value. This does not happen with the other rank tests, the Mann–Whitney U test and rank correlation coefficients, which involve no differencing.
5. Although there is a possibility of a reduction in compliance it does not reach the conventional level of significance.
6. The conclusions are broadly similar, but the effect on compliance is more strongly suggested by the t method. Provided the data can be transformed to approximate Normality the t distribution analysis is more powerful, and as it also gives confidence intervals more easily, I would prefer it.
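The sign test probability of part 1 and the Wilcoxon rank sums of parts 2 and 4 can be checked with a short sketch. The two helper functions are names introduced here for illustration; tied absolute differences share the average of their ranks, and zero differences are dropped, as in the solution.

```python
from math import comb


def sign_test_two_sided(n_positive, n_negative):
    """Two-sided sign test P value from the Binomial distribution, p = 0.5."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    one_tail = sum(comb(n, r) for r in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


def wilcoxon_positive_rank_sum(differences):
    """Sum of ranks of the positive differences (zeros omitted, ties averaged)."""
    ordered = sorted((d for d in differences if d != 0), key=abs)
    t_positive = 0.0
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # ranks i+1 .. j share their mean
        t_positive += average_rank * sum(1 for d in ordered[i:j] if d > 0)
        i = j
    return t_positive


# Difference columns of Table 19.6 (raw) and Table 19.7 (log10).
raw_diffs = [-7.5, -20.7, -5.9, -2.7, -1.4, -1.2, -3.0, -4.5,
             -45.0, -0.9, -2.6, 1.4, 1.6, 3.3, 0.0, 6.4]
log_diffs = [-0.047, -0.108, -0.063, -0.042, -0.009, -0.014, -0.049, -0.077,
             -0.126, -0.010, -0.037, 0.022, 0.015, 0.024, 0.000, 0.062]

p_sign = sign_test_two_sided(4, 11)
t_raw = wilcoxon_positive_rank_sum(raw_diffs)
t_log = wilcoxon_positive_rank_sum(log_diffs)

print(f"Sign test, two-sided P: {p_sign:.4f}")
print(f"Wilcoxon T (raw): {t_raw}")
print(f"Wilcoxon T (log): {t_log}")
```

The rank sums are then compared with the 5% point of 25 from Table 12.5, as in the text.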
Solution to Exercise 13M: Multiple choice questions 67 to 73
67. TFFFF. §13.3. 80% of 4 is greater than 3, so all expected frequencies must exceed 5. The sample size can be as small as 20, if all row and column totals are 10.
68. FTFTF. §13.1, §13.3. (5 - 1) × (3 - 1) = 8 degrees of freedom, 80% × 15 = 12 cells must have expected frequencies > 5. It is O.K. for an observed frequency to be zero.
69. TTFTF. §13.1, §13.9. The two tests are independent. There are (2 - 1) × (2 - 1) = 1 degree of freedom. With such large numbers Yates' correction does not make much difference. Without it we get χ² = 124.5, with it we get χ² = 119.4 (§13.5).
70. TTTTT. §13.4, 5. The factorials of large numbers can be difficult to calculate.
71. TTTTF. §13.7.
72. TTFTT. Chi-squared for trend and τb will both test the null hypothesis of no trend in the table, but an ordinary chi-squared test will not (§13.8). The odds ratio (OR) is an estimate of the relative risk for a case-control study (§13.7).
73. TTFFF. The test compares proportions in matched samples (§13.9). For a relationship, we use the chi-squared test (§13.1). PEFR is a continuous variable, we use the paired t method (§10.2). For two independent samples we use the chi-squared test (§13.1).
Solution to Exercise 13E
1. The heatwave appears to begin in week 10 and continue to include week 17. This period was much hotter than the corresponding period of 1982.
Table 19.8. Cross-tabulation of time period by year for geriatric admissions
Year    Period                                               Total
        Before heatwave   During heatwave   After heatwave
1982    190               110               82               382
1983    180               178               110              468
Total   370               288               192              850
Table 19.9. Expected frequencies for Table 19.8
Year    Period                                               Total
        Before heatwave   During heatwave   After heatwave
1982    166.3             129.4             86.3             382.0
1983    203.7             158.6             105.7            468.0
Total   370.0             288.0             192.0            850.0
2. There were 178 admissions during the heatwave in 1983 and 110 in the corresponding weeks of 1982. We could test the null hypothesis that these came from distributions with the same admission rate and we would get a significant difference. This would not be convincing, however. It could be due to other factors, such as the closure of another hospital with resulting changes in catchment area.
3. The cross-tabulation is shown in Table 19.8.
4. The null hypothesis is that there is no association between year and period, in other words that the distribution of admissions between the periods will be the same for each year. The expected values are shown in Table 19.9.
5. The chi-squared statistic is given by:
There are 2 rows and 3 columns, giving us (2 - 1) × (3 - 1) = 2 degrees of freedom. Thus we have chi-squared = 11.8 with 2 degrees of freedom. From Table 13.3 we see that this has probability of less than 0.01. The data are not consistent with the null hypothesis. The evidence supports the view that admissions rose by more than could be ascribed to chance during the 1983 heatwave. We cannot be certain that this was due to the heatwave and not some other factor which happened to operate at the same time.
6. We could see whether the same effect occurred in other districts between 1982 and 1983. We could also look at older records to see whether there was a similar increase in admissions, say for the heatwaves of 1975 and 1976.
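The expected frequencies of Table 19.9 and the chi-squared statistic of part 5 can be recomputed from Table 19.8. This is a sketch in plain Python; the P value uses the fact that, for exactly 2 degrees of freedom, the upper tail of the chi-squared distribution is exp(-χ²/2), so the shortcut would not apply to other tables.

```python
from math import exp

# Observed admissions from Table 19.8: rows are 1982 and 1983,
# columns are before, during and after the heatwave period.
observed = [
    [190, 110, 82],
    [180, 178, 110],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared = sum over cells of (observed - expected)^2 / expected.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper tail probability; this closed form holds only for 2 d.f.
p_value = exp(-chi_squared / 2)

for row in expected:
    print("  ".join(f"{e:6.1f}" for e in row))
print(f"chi-squared = {chi_squared:.1f}, d.f. = {degrees_of_freedom}")
print(f"P = {p_value:.4f}")
```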
Solution to Exercise 14M: Multiple choice questions 74 to 80
74. TFFTT. Table 14.2.
75. FTTTT. §14.5.
76. FFFFT. Regression, correlation and paired t methods need continuous data (§11.3, §11.9, §10.2). Kendall's τ can be used for ordered categories.
77. TFTFF. §14.2.
78. TFTTT. A t test could not be used because the data do not follow a Normal distribution (§10.3). The expected frequencies will be too small for a chi-squared test (§13.3), but a trend test would be O.K. (§13.8). A goodness of fit test could be used (§13.10).
79. FTTFT. A small-sample, paired method is needed (Table 14.4).
80. TFTFF. For a two by two table with small expected frequencies we can use Fisher's exact test or Yates' correction (§13.4, 5). McNemar's test is inappropriate because the groups are not matched (§13.9).
Solution to Exercise 14E
1. Overall preference: we have one sample of patients so we use Table 14.2. Of these 12 preferred A, 14 preferred B and 4 did not express a preference. We can use a Binomial or sign test (§9.2), only considering those who expressed a preference. Those for A are positives, those for B are negatives. We get two-sided P = 0.85, not significant.
Preference and order: we have the relationship between two variables (Table 14.3), preference and order, both nominal. We set up a two way table and do a chi-squared test. For the 3 by 2 table we have two expected frequencies less than five, so we must edit the table. There are no obvious combinations, but we can delete those who expressed no preference, leaving a 2 by 2 table, χ² = 1.3, 1 degree of freedom, P > 0.05.
2. The data are paired (Table 14.2) so we use a paired t test (§10.2). The assumption of a Normal distribution for the differences should be met as PEFR itself follows a Normal distribution fairly well. We get t = 6.45/5.05 = 1.3, degrees of freedom = 31, which is not significant. Using t = 2.04 (Table 10.1) we get a 95% confidence interval of -3.85 to 16.75 litres/min.
3. We have two independent samples (Table 14.1). We must use the total number of patients we randomized to treatments, in an intention to treat analysis (§2.5). Thus we have 1721 active treatment patients including 15 deaths, and 1706 placebo patients with 35 deaths. A chi-squared test gives us χ² = 8.3, d.f. = 1, P < 0.01. A comparison of two proportions gives a difference of -0.0118 with 95% confidence interval -0.0198 to -0.0038 (§8.6) and test of significance using the Standard Normal distribution gives a value of 2.88, P < 0.01 (§9.8).
4. We are looking at the relationship between two variables (Table 14.3). Both variables have very non-Normal distributions. Nitrite is highly skew and pH is bimodal. It might be possible to transform the nitrites to a Normal distribution but the transformation would not be a simple one. The zero prevents a simple logarithmic transformation, for example. Because of this, regression and correlation are not appropriate and rank correlation can be used. Spearman's ρ = 0.58 and Kendall's τ = 0.40, both giving a probability of 0.004.
5. We have two independent samples (Table 14.1). We have two large samples and can do the Normal comparison of two means (§8.5). The standard error of the difference is 0.0178 s and the observed difference is 0.02 s, giving a 95% confidence interval of -0.015 to 0.055 for the excess mean transit time in the controls. If we had all the data, for each case we could calculate the mean MTT for the two controls matched to each case, find the difference between case MTT and control mean MTT, and use the one sample method of §8.3.
![Page 647: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/647.jpg)
6.Thesearepaireddata,sowerefertoTable14.2.Theunequalstepsinthevisualacuityscalesuggestthatitisbesttreatedasanordinalscale,sothesigntestisappropriate.Preminuspost,thereare10positivedifferences,nonegativedifferencesand7zeros.Thuswerefer0totheBinomialdistributionwithp=0.5andn=10.Theprobabilityisgivenby
7. We want to test for the relationship between two variables, which are both presented as categorical (Table 14.3). We use a chi-squared test for a contingency table, χ² = 38.1, d.f. = 6, P < 0.001. One possibility is that some other variable, such as the mother's smoking or poverty, is related to both maternal age and asthma. Another is that there is a cohort effect. All the age 14–19 mothers were born during the second world war, and some common historical experience may have produced the asthma in their children.
8. The serial measurements of thyroid hormone could be summarized using the area under the curve (§10.7). The oxygen dependence is tricky. The babies who died had the worst outcome, but if we took their survival time as the time they were oxygen dependent, we would be treating them as if they had a good outcome. We must also allow for the babies who went home on oxygen having a long but unknown oxygen dependence. My solution was to assign an arbitrary large number of days, larger than any for the babies sent home without oxygen, to the babies sent home on oxygen. I assigned an even larger number of days to the babies who died. I then used Kendall's tau b (§12.5) to assess the relationship with thyroid hormone AUC. Kendall's rank correlation was chosen in preference to Spearman's because of the large number of ties which the arbitrary assignment of large numbers produced.
9. This is a comparison of two independent samples, so we use Table 14.1. The variable is interval and the samples are small. We could either use the two sample t method (§10.3) or the Mann–Whitney U test (§12.2). The groups have similar variances, but the distribution shows a slight negative skewness. As the two sample t method is fairly robust to deviations from the Normal distribution and as I wanted a confidence interval for the difference I chose this option. I did not think that the slight skewness was sufficient to cause any problems.
By the two sample t method we get the difference between the means, immobile - mobile, to be 7.06, standard error = 5.74, t = 1.23, P = 0.23, 95% confidence interval = -4.54 to 18.66 hours. By the Mann–Whitney, we get U = 178.5, z = -1.06, P = 0.29. The two methods give very similar results and lead to similar conclusions, as we expect them to do when both methods are valid.
Solution to Exercise 15M: Multiple choice questions 81 to 86
81. TFTTF. §15.2. Unless the measurement process changes the subject, we would expect the difference in mean to be zero.
82. TFTFF. §15.4. We need the sensitivity as well as specificity. There are other things, dependent on the population studied, which may be important too, like the positive predictive value.
83. FTTTF. §15.4. Specificity, not sensitivity, measures how well people without the disease are eliminated.
84. TTFFF. §15.5. The 95% reference interval should not depend on the sample size.
85. FFFFT. §15.5. We expect 5% of ‘normal’ men to be outside these limits. The patient may have a disease which does not produce an abnormal haematocrit. This reference interval is for men, not women, who may have a different distribution of haematocrit. It is dangerous to extrapolate the reference interval to a different population. In fact, for women the reference interval is 35.8 to 45.4, putting a woman with a haematocrit of 48 outside the reference interval. A haematocrit outside the 95% reference interval suggests that the man may be ill, although it does not prove it.
86. TFTTT. §15.6. As time increases, rates are based on fewer potential survivors. Withdrawals during the first interval contribute half an interval at risk. If survival rates change, those subjects starting later in calendar time, and so more likely to be withdrawn, will have a different survival to those starting earlier. The first part of the curve will represent a different population to the second. The longest survivor may still be alive and so become a withdrawal.
Solution to Exercise 15E
1. The blood donors were used because it was easy to get the blood. This would produce a sample deficient in older people, so it was supplemented by people attending day centres. This would ensure that these were reasonably active, healthy people for their age. Given the problem of getting blood and the limited resources available, this seems a fairly satisfactory sample for the purpose. The alternative would be to take a random sample from the local population and try to persuade them to give the blood. There might have been so many refusals that volunteer bias would make the sample unrepresentative anyway. The sample is also biased geographically, being drawn from one part of London. In the context of the study, where we wanted to compare diabetics with normals, this did not matter so much, as both groups came from the same place. For a reference interval which would apply nationally, if there were a geographical factor the interval would be biased in other places. To look at this we would have to repeat the study in several places, compare the resulting reference intervals and pool as appropriate.
2. We want normal, healthy people for the sample, so we want to exclude people with obvious pathology and especially those with disease known to affect the quantity being measured. However, if we excluded all elderly people currently receiving drug therapy we would find it very difficult to get a sufficiently large sample. It is indeed ‘normal’ for the elderly to be taking analgesics and hypnotics, so these were permitted.
3. From the shape of the histogram and the Normal plot, the distribution of plasma magnesium does indeed appear Normal.
4. The reference interval, outside which about 5% of normal values are expected to lie, is $\bar{x} - 2s$ to $\bar{x} + 2s$, or 0.810 - 2 × 0.057 to 0.810 + 2 × 0.057, which is 0.696 to 0.924, or 0.70 to 0.92 mmol/litre.
5. As the sample is large and the data Normally distributed the standard error of the limits is approximately $\sqrt{3s^2/n} = \sqrt{3 \times 0.057^2/140} = 0.0083439$.
For the 95% confidence interval we take 1.96 standard errors on either side of the limit, 1.96 × 0.0083439 = 0.016. The 95% confidence interval for the lower reference limit is 0.696 - 0.016 to 0.696 + 0.016 = 0.680 to 0.712 or 0.68 to 0.71 mmol/litre. The confidence interval for the upper limit is 0.924 - 0.016 to 0.924 + 0.016 = 0.908 to 0.940 or 0.91 to 0.94 mmol/litre. The reference interval is well estimated as far as sampling errors are concerned.
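The reference interval and the confidence intervals for its limits can be sketched as follows. The mean 0.810 and standard deviation 0.057 are quoted in the solutions. The sample size n = 140 is an assumption, chosen because it reproduces the quoted standard error of 0.0083439 under the large sample approximation $\sqrt{3s^2/n}$. The variable names are illustrative only.

```python
import math

mean, sd = 0.810, 0.057  # plasma magnesium, mmol/litre (from the solution)
n = 140                  # ASSUMPTION: sample size inferred from the quoted standard error

# 95% reference interval: mean plus or minus 2 standard deviations.
ref_low, ref_high = mean - 2 * sd, mean + 2 * sd

# Approximate standard error of either limit for large Normal samples.
se_limit = math.sqrt(3 * sd**2 / n)
half_width = 1.96 * se_limit

ci_lower_limit = (ref_low - half_width, ref_low + half_width)
ci_upper_limit = (ref_high - half_width, ref_high + half_width)

print(f"reference interval: {ref_low:.3f} to {ref_high:.3f}")
print(f"SE of limits: {se_limit:.7f}")
print(f"CI for lower limit: {ci_lower_limit[0]:.3f} to {ci_lower_limit[1]:.3f}")
print(f"CI for upper limit: {ci_upper_limit[0]:.3f} to {ci_upper_limit[1]:.3f}")
```

Note that both confidence intervals have the same half width, centred on their own limit.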
6. Plasma magnesium did indeed increase with age. The variability did not. This would mean that for older people the lower limit would be too low and the upper limit too high, as the few above this would all be elderly. We could simply estimate the reference interval separately at different ages. We could do this using separate means but a common estimate of variance, obtained by one-way analysis of variance (§10.9). Or we could use the regression of magnesium on age to get a formula which would predict the reference interval for any age. The method chosen would depend on the nature of the relationship.
Solution to Exercise 16M: Multiple choice questions 87 to 92
87. FTFFF. §16.1. It is for a specific age group, not age adjusted. It measures the number of deaths per person at risk, not the total number. It tells us nothing about age structure.
88. FTTTT. §16.4. The life table is calculated from age specific death rates. Expectation of life is the expected value of the distribution of age at death if these mortality rates apply (§6E). It usually increases with age.
89. TFTTF. The SMR (§16.3) for women who had just had a baby is lower than 100 (all women) and 105 (stillbirth women). The confidence intervals do not overlap so there is good evidence for a difference. Women who had had a stillbirth may be less or more likely than all women to commit suicide; we cannot tell. We cannot conclude that giving birth prevents suicide; it may be that optimists conceive, for example.
90. TFFFF. §16.3. Age effects have been adjusted for. It may also be that heavy drinkers become publicans. It is difficult to infer causation from observational data. Men at high risk of cirrhosis of the liver, i.e. heavy drinkers, may not become window cleaners, or window cleaners who drink may change their occupation, which requires good balance. Window cleaners have low risk. The ‘average’ ratio is 100, not 1.0.
91. FFFTF. §16.6. A life table tells us about mortality, not population structure. A bar chart shows the relationship between two variables, not their frequency distribution (§5.5).
92. TFFFT. §16.1, §16.2, §16.5. Expectation of life does not depend on age distribution (§16.4).
Solution to Exercise 16E
1. We obtain the rates for the whole period by dividing the number of deaths in an age group by the population size. Thus for ages 10–14 we have 44/4271 = 0.01030 cases per thousand population. This is for a 13 year period so the rate per year is 0.01030/13 = 0.00079 per 1000 per year, or 0.79 per million per year. Table 19.10 shows the rates for each age group. The rates are unusual because they are highest among the adolescent group, where mortality rates for most causes are low. Anderson et al. (1985) note that ‘… our results suggest that among adolescent males abuse of volatile substances currently account for 2% of deaths from all causes …’. The rates are also unusual because we have not calculated them separately for each sex. This is partly for simplicity and partly because the number of cases in most age groups is small as it is.
2. We find the expected number of deaths by multiplying the number in the age group in Scotland by the death rate for the period, i.e. per 13 years, for Great Britain. We then add these to get 27.19 deaths expected altogether. We observed 48, so the SMR is 48/27.19 = 1.77, or 177 with Great Britain as 100.
Table 19.10. Age-specific mortality rates for volatile substance abuse, Great Britain, and calculation of SMR for Scotland

| Age group | Great Britain ASMR (per million per year) | Great Britain ASMR (per thousand per 13 years) | Scotland population (thousands) | Scotland expected deaths |
|---|---|---|---|---|
| 0–9 | 0.00 | 0.00000 | 653 | 0.00000 |
| 10–14 | 0.79 | 0.01030 | 425 | 4.37750 |
| 15–19 | 2.58 | 0.03358 | 447 | 15.01026 |
| 20–24 | 0.87 | 0.01137 | 394 | 4.47978 |
| 25–29 | 0.32 | 0.00415 | 342 | 1.41930 |
| 30–39 | 0.08 | 0.00108 | 659 | 0.71172 |
| 40–49 | 0.03 | 0.00033 | 574 | 0.18942 |
| 50–59 | 0.09 | 0.00112 | 579 | 0.64848 |
| 60+ | 0.03 | 0.00037 | 962 | 0.35594 |
| Total | | | | 27.19240 |
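3. We find the standard error of the SMR by $\sqrt{O}/E = \sqrt{48}/27.19 = 0.2548$, where O is the observed and E the expected number of deaths.
The 95% confidence interval is then 1.77 - 1.96 × 0.2548 to 1.77 + 1.96 × 0.2548, or 1.27 to 2.27. Multiplying by 100 as usual, we get 127 to 227. The observed number is quite large enough for the Normal approximation to the Poisson distribution to be used.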
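The indirect standardization in solutions 2 and 3 can be sketched in a few lines. The rates and populations are those of Table 19.10 and the observed count of 48 is from the solution. The standard error $\sqrt{O}/E$ assumes the observed count follows a Poisson distribution; variable names are illustrative.

```python
import math

# (age group, Great Britain rate per thousand per 13 years,
#  Scotland population in thousands), taken from Table 19.10.
rows = [
    ("0-9", 0.00000, 653),
    ("10-14", 0.01030, 425),
    ("15-19", 0.03358, 447),
    ("20-24", 0.01137, 394),
    ("25-29", 0.00415, 342),
    ("30-39", 0.00108, 659),
    ("40-49", 0.00033, 574),
    ("50-59", 0.00112, 579),
    ("60+", 0.00037, 962),
]
observed = 48  # deaths observed in Scotland

# Expected deaths: apply the Great Britain rates to the Scottish population.
expected = sum(rate * population for _, rate, population in rows)
smr = observed / expected
se_smr = math.sqrt(observed) / expected  # Poisson standard error of O, divided by E
ci_low, ci_high = smr - 1.96 * se_smr, smr + 1.96 * se_smr

print(f"expected = {expected:.2f}, SMR = {100 * smr:.0f}")
print(f"95% CI = {100 * ci_low:.0f} to {100 * ci_high:.0f}")
```

Because the code keeps the unrounded SMR, the limits can differ slightly in the last digit from those in the text, which start from the rounded value 1.77.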
4. Yes, the confidence interval is well away from 100. Other factors relate to the data collection, which was from newspapers, coroners, death registrations etc. Scotland has different newspapers and other news media and a different legal system to the rest of Great Britain. It may be that the association of deaths with VSA is more likely to be reported there than in England and Wales.
Solution to Exercise 17M: Multiple choice questions 93 to 97
93. TFTFT. §17.2. It is the ratio of the regression sum of squares to the total sum of squares.
94. FTFFF. §17.2. There were 37 + 1 = 38 observations. There is a highly significant ethnic group effect. The non-significant sex effect does not mean that there is no difference (§9.6). There are three age groups, so two degrees of freedom. If the effect of ethnicity were due entirely to age, it would have disappeared when age was included in the model.
95. TTTTF. §17.8. A four-level factor has three dummy variables (§17.6). If the effect of white cell count were due entirely to smoking, it would have disappeared when smoking was included in the model.
96. TTTFT. §17.4
97. FFFFT. §17.9. Boys have a lower risk of readmission than girls, shown by the negative coefficient, and hence a longer time before being readmitted. Theophylline is related to a lower risk of readmission but we cannot conclude causation. Treatment may depend on the type and severity of asthma.
Solution to Exercise 17E
1. The difference is highly significant (P < 0.001) and is estimated to be between 1.3 and 3.7, i.e. volumes are higher in group 2, the trisomy-16 group.
2. From both the Normal plot and the plot against number of pairs of somites there appears to be one point which may be rather separate from the rest of the data, an outlier. Inspection of the data showed no reason to suppose that the point was an error, so it was retained. Otherwise the fit to the Normal distribution seems quite good. The plot against number of pairs of somites shows that there may be a relationship between mean and variability, but this is very small and will not affect the analysis too much. There is also a possible non-linear relationship, which should be investigated. (The addition of a quadratic term did not improve the fit significantly.)
3. Model difference in sum of squares = 207.139 - 197.708 = 9.431, residual mean square = 3.384, F ratio = 9.431/3.384 = 2.79 with 1 and 36 degrees of freedom, corresponding to t = 1.67, P > 0.1, not significant.
Solution to Exercise 18M: Multiple choice questions 98 to 100
98. TTFTT. §9.9. Power is a property of the test, not the sample. It cannot be zero, as even when there is no population difference at all the test may be significant.
99. TTTTF. §18.5. If we keep on adding observations and testing, we are carrying out multiple testing and so invalidate the test (§9.10).
100. TTFFT. §18.1. Power is not involved in estimation.
Solution to Exercise 18E
3. This is a comparison of two proportions (§18.5). We have p1 = 0.15 and p2 = 0.15 × 0.9 = 0.135, a reduction of 10%. With a power of 90% and a significance level of 5%, we have $n = \frac{10.5 \times (0.15 \times 0.85 + 0.135 \times 0.865)}{(0.15 - 0.135)^2} = 11399.5$.
Hence we need 11400 in each group, 22800 patients altogether. With a power of 80% and a significance level of 5%, we have $n = \frac{7.9 \times (0.15 \times 0.85 + 0.135 \times 0.865)}{(0.15 - 0.135)^2} = 8576.8$.
Hence we need 8577 in each group, 17154 patients altogether. Lowering the power reduces the required sample size, but, of course, reduces the chance of detecting a difference if there really is one.
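The sample sizes in solution 3 can be checked with a sketch of the large sample formula for comparing two proportions, n = f(α,P) × [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)². The multiplier 7.9 for 80% power appears later in these solutions; the multiplier 10.5 for 90% power is an assumption here, being approximately (1.96 + 1.28)². The function name is hypothetical.

```python
import math


def n_per_group_two_proportions(p1, p2, multiplier):
    """Large sample size per group for detecting p1 versus p2.

    multiplier is f(alpha, P); 10.5 (90% power) is an assumed tabulated value,
    7.9 (80% power) is the value quoted in the solutions, both at the 5% level.
    """
    return multiplier * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2


p1 = 0.15
p2 = p1 * 0.9  # a 10% relative reduction

n_power_90 = math.ceil(n_per_group_two_proportions(p1, p2, 10.5))
n_power_80 = math.ceil(n_per_group_two_proportions(p1, p2, 7.9))
print(f"90% power: {n_power_90} per group, {2 * n_power_90} in total")
print(f"80% power: {n_power_80} per group, {2 * n_power_80} in total")
```

Rounding up with `math.ceil` is a design choice so that the achieved power is not below the target.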
4. This is the comparison of two means (§18.4). We estimate the sample size for a difference of one standard deviation, µ1 - µ2 = σ. With a power of 90% and a significance level of 5%, the number in each group is given by $n = \frac{2 \times 10.5 \times \sigma^2}{(\mu_1 - \mu_2)^2} = 2 \times 10.5 = 21$.
Hence we need 21 in each group. If we have unequal samples and n1 = 100, n2 is given by $\frac{1}{n_2} = \frac{(\mu_1 - \mu_2)^2}{10.5 \times \sigma^2} - \frac{1}{n_1} = \frac{1}{10.5} - \frac{1}{100} = 0.0852$, so that $n_2 = 11.7$,
and so we need 12 subjects in the disease group.
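Both calculations in solution 4 follow from the requirement that σ²(1/n1 + 1/n2) equal (µ1 - µ2)²/f(α,P). The sketch below works with the standardized difference (µ1 - µ2)/σ; the multiplier 10.5 for 90% power at the 5% level is an assumed tabulated value, and the function names are hypothetical.

```python
import math


def n_equal_groups(multiplier, std_diff):
    """Number per group when both groups are the same size."""
    return 2 * multiplier / std_diff**2


def n_second_group(multiplier, std_diff, n1):
    """Size of the second group when the first group size n1 is fixed."""
    remaining = std_diff**2 / multiplier - 1 / n1
    if remaining <= 0:
        raise ValueError("n1 is too small for any n2 to give the required power")
    return 1 / remaining


f_90 = 10.5     # ASSUMED multiplier f(0.05, 0.90), roughly (1.96 + 1.28) squared
std_diff = 1.0  # difference of one standard deviation

n_each = math.ceil(n_equal_groups(f_90, std_diff))
n_disease = math.ceil(n_second_group(f_90, std_diff, n1=100))
print(f"equal groups: {n_each} each; with n1 = 100, n2 = {n_disease}")
```

The guard in `n_second_group` reflects that a fixed group smaller than f(α,P)/(standardized difference)² can never reach the target power, however large the other group.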
5. When the number of clusters is very small and the number of individuals within a cluster is large, as in this study, clustering can have a major effect. The design effect, by which the estimated sample size should be multiplied, is DEFF = 1 + (750 - 1) × 0.005 = 4.745. Thus the estimated sample size for any given comparison should be multiplied by 4.745. Looking at it another way, the effective sample size is the actual sample size, 3000, divided by 4.745, about 632. Further, sample size calculations should take into account degrees of freedom. In large sample approximation sample size calculations, power 80% and alpha 5% are embodied in the multiplier f(α,P) = f(0.05, 0.80) = (1.96 + 0.85)² = 7.90. For a small sample calculation using the t test, 1.96 must be replaced by the corresponding 5% point of the t distribution with the appropriate degrees of freedom, here 2 degrees of freedom giving t = 4.30. Hence the multiplier is (4.30 + 0.85)² = 26.52, 3.36 times that for the large sample.
The effect of the small number of clusters would reduce the effective sample size even more, down to 630/3.36 = 188. Thus the 3000 men in two groups of two clusters would give the same power to detect the same difference as 188 men randomized individually. The applicants resubmitted a proposal with many more clusters.
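The chain of adjustments in solution 5 can be followed step by step in a short sketch. All inputs (cluster size 750, intracluster correlation 0.005, 3000 men, and the points 1.96, 0.85 and 4.30) are quoted in the solution; the function and variable names are illustrative.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


deff = design_effect(750, 0.005)
effective_n = 3000 / deff  # sample size after allowing for clustering

# Multipliers f(alpha, P) for 80% power at the 5% level.
large_sample_multiplier = (1.96 + 0.85) ** 2  # Normal approximation
small_sample_multiplier = (4.30 + 0.85) ** 2  # t with 2 degrees of freedom
ratio = small_sample_multiplier / large_sample_multiplier

# Further reduction because only four clusters supply degrees of freedom.
equivalent_n = effective_n / ratio

print(f"DEFF = {deff:.3f}, effective n = {effective_n:.0f}")
print(f"multiplier ratio = {ratio:.2f}, equivalent individually randomized n = {equivalent_n:.0f}")
```

The text rounds the intermediate values (630 and 3.36) before dividing; carrying the unrounded values through gives the same answer to the nearest whole number.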
References
Altman, D.G. (1982). Statistics and ethics in medical research. In Statistics in Practice (ed. S.M. Gore and D.G. Altman). British Medical Association, London.
Altman, D.G. (1991). Practical Statistics for Medical Research, Chapman and Hall, London.
Altman, D.G. (1998). Confidence intervals for the number needed to treat. British Medical Journal, 317, 1309–12.
Altman, D.G. and Bland, J.M. (1983). Measurement in medicine: the analysis of method comparison studies. The Statistician, 32, 307–17.
Altman, D.G. and Matthews, J.N.S. (1996). Statistics Notes: Interaction 1: heterogeneity of effects. British Medical Journal, 313, 486.
Anderson, H.R., Bland, J.M., Patel, S., and Peckham, C. (1986). The natural history of asthma in childhood. Journal of Epidemiology and Community Health, 40, 121–9.
Anderson, H.R., MacNair, R.S., and Ramsey, J.D. (1985). Deaths from abuse of substances, a national epidemiological study. British Medical Journal, 290, 304–7.
Anon (1997). All trials must have informed consent. British Medical Journal, 314, 1134–5.
Appleby, L. (1991). Suicide during pregnancy and in the first postnatal year. British Medical Journal, 302, 137–40.
Armitage, P. and Berry, G. (1994). Statistical Methods in Medical Research, Blackwell, Oxford.
Balfour, R.P. (1991). Birds, milk and campylobacter. Lancet, 337, 176.
Ballard, R.A., Ballard, P.C., Creasy, R.K., Padbury, J., Polk, D.H., Bracken, M., Maya, F.R., and Gross, I. (1992). Respiratory disease in very-low-birthweight infants after prenatal thyrotropin releasing hormone and glucocorticoid. Lancet, 339, 510–5.
Banks, M.H., Bewley, B.R., Bland, J.M., Dean, J.R., and Pollard, V.M. (1978). A long term study of smoking by secondary schoolchildren. Archives of Disease in Childhood, 53, 12–19.
Bewley, B.R. and Bland, J.M. (1976). Academic performance and social factors related to cigarette smoking by schoolchildren. British Journal of Preventive and Social Medicine, 31, 18–24.
Bewley, B.R., Bland, J.M., and Harris, R. (1974). Factors associated with the starting of cigarette smoking by primary school children. British Journal of Preventive and Social Medicine, 28, 37–44.
Bewley, T.H., Bland, J.M., Ilo, M., Walch, E., and Willington, G. (1975). Census of mental hospital patients and life expectancy of those unlikely to be discharged. British Medical Journal, 4, 671–5.
Bewley, T.H., Bland, J.M., Mechen, D., and Walch, E. (1981). ‘New chronic’ patients. British Medical Journal, 283, 1161–4.
Bland, J.M. and Altman, D.G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i, 307–10.
Bland, J.M. and Altman, D.G. (1993). Informed consent. British Medical Journal, 306, 928.
Bland, J.M. and Altman, D.G. (1998). Statistics Notes. Bayesians and frequentists. British Medical Journal, 317, 1151.
Bland, J.M. and Altman, D.G. (1999). Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 8, 135–60.
Bland, J.M., Bewley, B.R., Banks, M.H., and Pollard, V.M. (1975). Schoolchildren's beliefs about smoking and disease. Health Education Journal, 34, 71–8.
Bland, J.M., Bewley, B.R., Pollard, V., and Banks, M.H. (1978). Effect of children's and parents' smoking on respiratory symptoms. Archives of Disease in Childhood, 53, 100–5.
Bland, J.M., Bewley, B.R., and Banks, M.H. (1979). Cigarette smoking and children's respiratory symptoms: validity of questionnaire method. Revue d'Epidemiologie et Santé Publique, 27, 69–76.
Bland, J.M., Holland, W.W., and Elliott, A. (1974). The development of respiratory symptoms in a cohort of Kent schoolchildren. Bulletin Physio-Pathologie Respiratoire, 10, 699–716.
Bland, J.M. and Kerry, S.M. (1998). Statistics Notes. Weighted comparison of means. British Medical Journal, 316, 129.
Bland, J.M., Mutoka, C., and Hutt, M.S.R. (1977). Kaposi's sarcoma in Tanzania. East African Journal of Medical Research, 4, 47–53.
Bland, J.M. and Peacock, J.L. (2000). Statistical Questions in Evidence-Based Medicine, University Press, Oxford.
Bland, M. (1995). An Introduction to Medical Statistics, 2nd. ed., University Press, Oxford.
Bland, M. (1997). Informed consent in medical research: Let readers judge for themselves. British Medical Journal, 314, 1477–8.
BMJ (1996a). The Declaration of Helsinki. British Medical Journal, 313, 1448.
BMJ (1996b). The Nuremberg code (1947). British Medical Journal, 313, 1448.
Brawley, O.W. (1998). The study of untreated syphilis in the negro male. International Journal of Radiation Oncology, Biology, Physics, 40, 5–8.
Breslow, N.E. and Day, N.E. (1987). Statistical Methods in Cancer Research. Volume II: The Design and Analysis of Cohort Studies, IARC, Lyon.
British Standards Institution (1979). Precision of test methods. 1: Guide for the determination and reproducibility of a standard test method (BS5497, part 1), BSI, London.
Brooke, O.G., Anderson, H.R., Bland, J.M., Peacock, J., and Stewart, M. (1989). The influence on birthweight of smoking, alcohol, caffeine, psychosocial and socio-economic factors. British Medical Journal, 298, 795–801.
Bryson, M.C. (1976). The Literary Digest poll: making of a statistical myth. The American Statistician, 30, 184–5.
Bulletin of Medical Ethics (1998). News: Lively debate on research ethics in the US. November, 3–4.
Burdick, R.K. and Graybill, F.A. (1992). Confidence Intervals on Variance Components, New York, Dekker.
Burr, M.L., St. Leger, A.S., and Neale, E. (1976). Anti-mite measures in mite-sensitive adult asthma: a controlled trial. Lancet, i, 333–5.
Campbell, M.J. and Gardner, M.J. (1989). Calculating confidence intervals for some non-parametric analyses. In Statistics with Confidence (ed. Gardner, M.J. and Altman, D.G.). British Medical Journal, London.
Carleton, R.A., Sanders, C.A., and Burack, W.R. (1960). Heparin administration after acute myocardial infarction. New England Journal of Medicine, 263, 1002–4.
Casey, A.T.H., Crockard, H.A., Bland, J.M., Stevens, J., Moskovich, R., and Ransford, A. (1996). Predictors of outcome in the quadriparetic nonambulatory myelopathic patient with rheumatoid-arthritis: a prospective study of 55 surgically treated Ranawat class IIIB patients. Journal of Neurosurgery, 85, 574–81.
Christie, D. (1979). Before-and-after comparisons: a cautionary tale. British Medical Journal, 2, 1629–30.
Cochran, W.G. (1977). Sampling Techniques, Wiley, New York.
Colton, T. (1974). Statistics in Medicine, Little Brown, Boston.
Cook, R.J. and Sackett, D.L. (1995). The number needed to treat: a clinically useful measure of treatment effect. British Medical Journal, 310, 452–4.
Conover, W.J. (1980). Practical Nonparametric Statistics, John Wiley and Sons, New York.
Cox, D.R. (1972). Regression models and life tables. Journal of the Royal Statistical Society Series B, 34, 187–220.
Curtis, M.J., Bland, J.M., and Ring, P.A. (1992). The Ring total knee replacement: a comparison of survivorship. Journal of the Royal Society of Medicine, 85, 208–10.
Davies, O.L. and Goldsmith, P.L. (1972). Statistical Methods in Research and Production, Oliver and Boyd, Edinburgh.
Dennis, M. (1997). Commentary: Why we didn't ask patients for their consent. British Medical Journal, 314, 1077.
Dennis, M., O'Rourke, S., Slattery, J., Staniforth, T., and Warlow, C. (1997). Evaluation of a stroke family care worker: results of a randomised controlled trial. British Medical Journal, 314, 1071–11.
DHSS (1976). Prevention and Health: Everybody's Business, HMSO, London.
Doll, R. and Hill, A.B. (1950). Smoking and carcinoma of the lung. British Medical Journal, ii, 739–48.
Doll, R. and Hill, A.B. (1956). Lung cancer and other causes of death in relation to smoking: a second report on the mortality of British doctors. British Medical Journal, ii, 1071–81.
Donnan, S.P.B. and Haskey, J. (1977). Alcoholism and cirrhosis of the liver. Population Trends, 7, 18–24.
Donner, A., Brown, K.S., and Brasher, P. (1990). A methodological review of non-therapeutic intervention trials employing cluster randomisation 1979–1989. International Journal of Epidemiology, 19, 795–800.
Doyal, L. (1997). Informed consent in medical research: Journals should not publish research to which patients have not given fully informed consent, with three exceptions. British Medical Journal, 314, 1107–11.
Easterbrook, P.J., Berlin, J.A., Gopalan, R., and Mathews, D.R. (1991). Publication bias in clinical research. Lancet, 337, 867–72.
Egero, B. and Henin, R.A. (1973). The Population of Tanzania, Bureau of Statistics, Dar es Salaam.
Esmail, A., Warburton, B., Bland, J.M., Anderson, H.R., and Ramsey, J. (1997). Regional variations in deaths from volatile substance abuse in Great Britain. Addiction, 92, 1765–71.
Finney, D.J., Latscha, R., Bennett, B.M., and Hsu, P. (1963). Tables for Testing Significance in a 2×2 Contingency Table, Cambridge University Press, London.
Fish, P.D., Bennett, G.C.J., and Millard, P.H. (1985). Heatwave morbidity and mortality in old age. Age and Ageing, 14, 243–5.
Flint, C. and Poulengeris, P. (1986). The ‘Know Your Midwife’ Report, Caroline Flint, London.
Friedland, J.S., Porter, J.C., Daryanani, S., Bland, J.M., Screaton, N.J., Vesely, M.J.J., Griffin, G.E., Bennett, E.D., and Remick, D.G. (1996). Plasma proinflammatory cytokine concentrations, Acute Physiology and Chronic Health Evaluation (APACHE) III scores and survival in patients in an intensive care unit. Critical Care Medicine, 24, 1775–81.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, 15, 246–63.
Gardner, M.J. and Altman, D.G. (1986). Confidence intervals rather than P values: estimation rather than hypothesis testing. British Medical Journal, 292, 746–50.
Glasziou, P.P. and Mackerras, D.E.M. (1993). Vitamin A supplementation in infectious disease: a meta-analysis. British Medical Journal, 306, 366–70.
Goldstein, H. (1995). Multilevel Statistical Models, Edward Arnold, London.
Harper, R. and Reeves, B. (1999). Reporting of precision of estimates for diagnostic accuracy: a review. British Medical Journal, 318, 1322–3.
Hart, P.D. and Sutherland, I. (1977). BCG and vole bacillus in the prevention of tuberculosis in adolescence and early adult life. British Medical Journal, 2, 293–5.
Healy, M.J.R. (1968). Disciplining medical data. British Medical Bulletin, 24, 210–4.
Hedges, B.M. (1978). Question wording effects: presenting one or both sides of a case. The Statistician, 28, 83–99.
Henzi, I., Walder, B., and Tramèr, M.R. (2000). Dexamethasone for the prevention of postoperative nausea and vomiting: a quantitative systematic review. Anesthesia and Analgesia, 90, 186–94.
Hickish, T., Colston, K., Bland, J.M., and Maxwell, J.D. (1989). Vitamin D deficiency and muscle strength in male alcoholics. Clinical Science, 77, 171–6.
Hill, A.B. (1962). Statistical Methods in Clinical and Preventive Medicine, Churchill Livingstone, Edinburgh.
Hill, A.B. (1977). A Short Textbook of Medical Statistics, Hodder and Stoughton, London.
Holland, W.W., Bailey, P., and Bland, J.M. (1978). Long-term consequences of respiratory disease in infancy. Journal of Epidemiology and Community Health, 32, 256–9.
Holten, C. (1951). Anticoagulants in the treatment of coronary thrombosis. Acta Medica Scandinavica, 140, 340–8.
Hosmer, D.W. and Lemeshow, S. (1999). Applied Survival Analysis, John Wiley and Sons, New York.
Huff, D. (1954). How to Lie with Statistics, Gollancz, London.
Huskisson, E.C. (1974). Simple analgesics for arthritis. British Medical Journal, 4, 196–200.
James, A.H. (1977). Breakfast and Crohn's disease. British Medical Journal, 1, 943–7.
Johnson, F.N. and Johnson, S. (ed.) (1977). Clinical Trials, Blackwell, Oxford.
Johnston, I.D.A., Anderson, H.R., Lambert, H.P., and Patel, S. (1983). Respiratory morbidity and lung function after whooping cough. Lancet, ii, 1104–8.
Jones, B. and Kenward, M.G. (1989). Design and Analysis of Cross-Over Trials, Chapman and Hall, London.
Kaste, M., Kuurne, T., Vilkki, J., Katevuo, K., Sainio, K., and Meurala, H. (1982). Is chronic brain damage in boxing a hazard of the past? Lancet, ii, 1186–8.
Kendall, M.G. (1970). Rank Correlation Methods, Charles Griffin, London.
Kendall, M.G. and Babington Smith, B. (1971). Tables of Random Sampling Numbers, Cambridge University Press, Cambridge.
Kendall, M.G. and Stuart, A. (1969). The Advanced Theory of Statistics, 3rd. ed., vol. 1, Charles Griffin, London.
Kerrigan, D.D., Thevasagayam, R.S., Woods, T.O., McWelch, I., Thomas, W.E.G., Shorthouse, A.J., and Dennison, A.R. (1993). Who's afraid of informed consent? British Medical Journal, 306, 298–300.
Kerry, S.M. and Bland, J.M. (1998). Statistics Notes: Analysis of a trial randomized in clusters. British Medical Journal, 316, 54.
Kiely, P.D.W., Bland, J.M., Joseph, A.E.A., Mortimer, P.S., and Bourke, B.E. (1995). Upper limb lymphatic function in inflammatory arthritis. Journal of Rheumatology, 22, 214–17.
Kish, L. (1994). Survey Sampling, Wiley Classic Library, New York.
Lancet (1980). BCG: bad news from India. Lancet, i, 73–4.
Laupacis, A., Sackett, D.L., and Roberts, R.S. (1988). An assessment of clinically useful measures of the consequences of treatment. New England Journal of Medicine, 318, 1728–33.
Leaning, J. (1996). War crimes and medical science. British Medical Journal, 313, 1413–15.
Lee, K.L., McNeer, J.F., Starmer, F.C., Harris, P.J., and Rosati, R.A. (1980). Clinical judgements and statistics: lessons from a simulated randomized trial in coronary artery disease. Circulation, 61, 508–15.
Lemeshow, S., Hosmer, D.W., Klar, J., and Lwanga, S.K. (1990). Adequacy of Sample Size in Health Studies, John Wiley and Sons, Chichester.
Leonard, J.V., Whitelaw, A.G.L., Wolff, O.H., Lloyd, J.K., and Slack, S. (1977). Diagnosing familial hypercholesterolaemia in childhood by measuring serum cholesterol. British Medical Journal, 1, 1566–8.
Levine, M.I. and Sackett, M.F. (1946). Results of BCG immunization in New York City. American Review of Tuberculosis, 53, 517–32.
Lindley, M.I. and Miller, J.C.P. (1955). Cambridge Elementary Statistical Tables, Cambridge University Press, Cambridge.
Lopez-Olaondo, L., Carrascosa, F., Pueyo, F.J., Monedero, P., Busto, N., and Saez, A. (1996). Combination of ondansetron and dexamethasone in the prophylaxis of postoperative nausea and vomiting. British Journal of Anaesthesia, 76, 835–40.
Lucas, A., Morley, R., Cole, T.J., Lister, G., and Leeson-Payne, C. (1992). Breast milk and subsequent intelligence quotient in children born preterm. Lancet, 339, 510–5.
Luthra, P., Bland, J.M., and Stanton, S.L. (1982). Incidence of pregnancy after laparoscopy and hydrotubation. British Medical Journal, 284, 1013.
Machin, D., Campbell, M.J., Fayers, P., and Pinol, A. (1998). Statistical Tables for the Design of Clinical Studies, Second Edition, Blackwell, Oxford.
Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50, 163–70.
Mather, H.M., Nisbet, J.A., Burton, G.H., Poston, G.J., Bland, J.M., Bailey, P.A., and Pilkington, T.R.E. (1979). Hypomagnesaemia in diabetes. Clinica Chemica Acta, 95, 235–42.
Matthews, D.E. and Farewell, V. (1988). Using and Understanding Medical Statistics, Second Edition, Karger, Basel.
Matthews, J.N.S. and Altman, D.G. (1996a). Statistics Notes: Interaction 2: compare effect sizes not P values. British Medical Journal, 313, 808.
Matthews, J.N.S. and Altman, D.G. (1996b). Statistics Notes: Interaction 3: how to examine heterogeneity. British Medical Journal, 313, 862.
Matthews, J.N.S., Altman, D.G., Campbell, M.J., and Royston, P. (1990). Analysis of serial measurements in medical research. British Medical Journal, 300, 230–5.
Maugdal, D.P., Ang, L., Patel, S., Bland, J.M., and Maxwell, J.D. (1985). Nutritional assessment in patients with chronic gastro-intestinal symptoms: comparison of functional and organic disorders. Human Nutrition: Clinical Nutrition, 39, 203–12.
Maxwell, A.E. (1970). Comparing the classification of subjects by two independent judges. British Journal of Psychiatry, 116, 651–5.
Mayberry, J.F., Rhodes, J., and Newcombe, R.G. (1978). Breakfast and dietary aspects of Crohn's disease. British Medical Journal, 2, 1401.
McKie, D. (1992). Pollsters turn to secret ballot. The Guardian, London, 24 August, p. 20.
McLean, S. (1997). Commentary: No consent means not treating the patient with respect. British Medical Journal, 314, 1076.
Meade, T.W., Roderick, P.J., Brennan, P.J., Wilkes, H.C., and Kelleher, C.C. (1992). Extra-cranial bleeding and other symptoms due to low dose aspirin and low intensity oral anticoagulation. Thrombosis and Haemostasis, 68, 1–6.
Meier, P. (1977). The biggest health experiment ever: the 1954 field trial of the Salk poliomyelitis vaccine. In Statistics: A Guide to the Biological and Health Sciences (ed. J.M. Tanur, et al.). Holden-Day, San Francisco.
![Page 671: An Introduction to Medical Statistics by Martin Bland](https://reader035.vdocument.in/reader035/viewer/2022071523/613cb931a3339922f86eeda2/html5/thumbnails/671.jpg)
Mitchell,E.A.,Bland,J.M.,andThompson,J.M.D.(1994).Riskfactorsforreadmissiontohospitalforasthma.Thorax,49,33–36.
Morris,J.A.andGardner,M.J.(1989).Calculatingconfidenceintervalsforrelativerisks,oddsratiosandstandardizedratiosandrates.InStatisticswithConfidence(ed.Gardner,M.J.andAltman,D.G.).BritishMedicalJournal,London.
MRC(1948).Streptomycintreatmentofpulmonarytuberculosis.BritishMedicalJournal,2,769–82.
Mudur,G.(1997).Indianstudyofwomenwithcervicallesionscalledunethical.BritishMedicalJournal,314,1065.
Newcombe,R.G.(1992).Confidenceintervals:enlighteningormystifying.BritishMedicalJournal,304,381–2.
Newnham,J.P.,Evans,S.F.,Con,A.M.,Stanley,F.J.,andLandau,L.I.(1993).Effectsoffrequentultrasoundduringpregnancy:arandomizedcontrolledtrial.Lancet,342,887–91.
Oakeshott,P.,Kerry,S.M.,andWilliams,J.E.(1994).RandomisedcontrolledtrialoftheeffectoftheRoyalCollegeofRadiologists'guidelinesongeneralpractitioners'referralforradiographicexamination.BritishJournalofGeneralPractice,44,197–200.
O'Brien,P.C.andFleming,T.R.(1979).Amultipletestingprocedureforclinicaltrials.Biometrics,35,549–56.
OfficeforNationalStatistics(1997).1995,1996,1997MortalityStatistics,General,SeriesDH1,No.28,HMSO,London.
OfficeforNationalStatistics(1998a).1998MortalityStatistics,General,SeriesDH1,No.29,HMSO,London.
OfficeforNationalStatistics(1998b).1997BirthStatistics,SeriesFM1,No.26,HMSO,London.
OfficeforNationalStatistics(1999).MortalityStatistics,Childhood,InfantandPerinatal,SeriesDH3,No.30,HMSO,London.
Oldham,H.G.,Bevan,M.M.,andMcDermott,M.(1979).ComparisonofthenewminiatureWrightpeakflowmeterwiththestandardWrightpeakflowmeter.Thorax,34,807–8.
OPCS(1991).MortalityStatistics,SeriesDH2,No.16,HMSO,London.
OPCS(1992).MortalityStatistics,SeriesDH1,No.24,HMSO,London.
Osborn,J.F.(1979).StatisticalExercisesinMedicalResearch,Blackwell,Oxford.
Paraskevaides,E.C.,Pennington,G.W.,Naik,S.,andGibbs,A.A.(1991).Prefreeze/post-freezesemenmotilityratio.Lancet,337,366–7.
Parmar,M.andMachin,D.(1995).SurvivalAnalysis,JohnWileyandSons,Chichester.
Pearson,E.S.andHartley,H.O.(1970).BiometrikaTablesforStatisticians,vol.1,CambridgeUniversityPress,Cambridge.
Pearson,E.S.andHartley,H.O.(1972).BiometrikaTablesforStatisticians,vol.2,CambridgeUniversityPress,Cambridge.
Peduzzi,P.,Concato,J.,Kemper,E.,Holford,T.R.,andFeinstein,A.R.(1996).Asimulationstudyofthenumberofeventspervariableinlogisticregressionanalysis.JournalofClinicalEpidemiology,49,1373–9.
Pocock,S.J.(1977).Groupsequentialmethodsinthedesignandanalysisofclinicaltrials.Biometrika,64,191–9.
Pocock,S.J.(1982).Interimanalysesforrandomisedclinicaltrials:thegroupsequentialapproach.Biometrics,38,153–62.
Pocock,S.J.(1983).ClinicalTrials:APracticalApproach,JohnWileyandSons,Chichester.
Pocock,S.J.andHughes,M.D.(1990).Estimationissuesinclinicaltrialsandoverviews.StatisticsinMedicine,9,657–71.
Pritchard,B.N.C.,Dickinson,C.J.,Alleyne,G.A.O,Hurst,P.,Hill,I.D.,Rosenheim,M.L.,andLaurence,D.R.(1963).ReportofaclinicaltrialfromMedicalUnitandMRCStatisticalUnit,UniversityCollegeHospitalMedicalSchool,London.BritishMedicalJournal,2,1226–7.
RadicalStatisticsHealthGroup(1976).WhosePriorities?,RadicalStatistics,London.
Ramsay,S.(1998).MissEvers'Boys(review).Lancet,352,1075.
Reader,R.,etal.(1980).TheAustraliantrialinmildhypertension:reportbythemanagementcommittee.Lancet,i,1261–7.
Rembold,C.(1998).Numberneededtoscreen:developmentofastatisticfordiseasescreening.BritishMedicalJournal,317,307–12.
Rodin,D.A.,Bano,G.,Bland,J.M.,Taylor,K.,andNussey,S.S.(1998).PolycysticovariesandassociatedmetabolicabnormalitiesinIndiansubcontinentAsianwomen.ClinicalEndocrinology,49,91–9.
Rose,G.A.,Holland,W.W.,andCrowley,E.A.(1964).Asphygmomanometerforepidemiologists.Lancet,i,296–300.
Rowe,D.(1992).Motheranddaughteraren'tdoingwell.TheGuardian,London,14July,p.33.
Royston,P.andAltman,D.G.(1994).Regressionusingfractionalpolynomialsofcontinuouscovariates:parsimoniousparametricmodelling.AppliedStatistics,43,429–467.
Salvesen,K.A.,Bakketeig,L.S.,Eik-nes,S.H.,Undheim,J.O.,andOkland,O.(1992).Routineultrasonographyinuteroandschoolperformanceatage8–9years.Lancet,339,85–9.
Samuels,P.,Bussel,J.B.,Braitman,L.E.,Tomaski,A.,Druzin,M.L.,Mennuti,M.T.,andCines,D.B.(1990).Estimationoftheriskofthrombocytopeniaintheoffspringofpregnantwomenwithpresumedimmunethrombocytopeniapurpura.NewEnglandJournalofMedicine,323,229–35.
Schapira,K.,McClelland,H.A.,Griffiths,N.R.,andNewell,D.J.(1970).Studyontheeffectsoftabletcolourinthetreatmentofanxietystates.BritishMedicalJournal,2,446–9.
Schmid,H.(1973).Kaposi'ssarcomainTanzania:astatisticalstudyof220cases.TropicalGeographicalMedicine,25,266–76.
Schulz,K.F.,Chalmers,I.,Hayes,R.J.,andAltman,D.G.(1995).Biasduetonon-concealmentofrandomizationandnon-double-blinding.JournaloftheAmericanMedicalAssociation,273,408–12.
Searle,S.R.,Casella,G.,andMcCulloch,C.E.(1992).VarianceComponents,Wiley,NewYork.
Senn,S.(1989).Cross-OverTrialsinClinicalResearch,Wiley,Chichester.
Shaker,J.L.,Brickner,R.C.,Findling,J.W.,Kelly,T.M.,Rapp,R.,Rizk,G.,Haddad,J.G.,Schalch,D.S.,andShenker,Y.(1997).Hypocalcemiaandskeletaldiseaseaspresentingfeaturesofceliacdisease.ArchivesofInternalMedicine,157,1013–6.
Siegel,S.(1956).Non-parametricStatisticsfortheBehaviouralSciences,McGraw-HillKagakusha,Tokyo.
Sibbald,B.,AddingtonHall,J.,Brenneman,D.,andFreeling,P.(1994).Telephoneversuspostalsurveysofgeneralpractitioners.BritishJournalofGeneralPractice,44,297–300.
Snedecor,G.W.andCochran,W.G.(1980).StatisticalMethods,7thedn.,IowaStateUniversityPress,Ames,Iowa.
Snowdon,C.,Garcia,J.,andElbourne,D.R.(1997).Makingsenseofrandomisation:Responsesofparentsofcriticallyillbabiestorandomallocationoftreatmentinaclinicaltrial.SocialScienceandMedicine,15,1337–55.
South-eastLondonScreeningStudyGroup(1977).Acontrolledtrialofmultiphasicscreeninginmiddle-age:resultsoftheSouth-EastLondonScreeningStudy.InternationalJournalofEpidemiology,6,357–63.
Southern,J.P.,Smith,R.M.M.,andPalmer,S.R.(1990).Birdattackonmilkbottles:possiblemodeoftransmissionofCampylobacterjejunitoman.Lancet,336,1425–7.
Streiner,D.L.andNorman,G.R.(1996).HealthMeasurementScales:APracticalGuidetoTheirDevelopmentandUse,secondedition,OxfordUniversityPress,Oxford.
Stuart,A.(1955).Atestforhomogeneityofthemarginaldistributionsinatwo-wayclassification.Biometrika,42,412.
‘Student’(1908).Theprobableerrorofamean.Biometrika,6,1–24.
‘Student’(1931).TheLanarkshireMilkExperiment.Biometrika,23,398–406.
Thomas,P.R.S.,Queraishy,M.S.,Bowyer,R.,Scott,R.A.P.,Bland,J.M.,andDormandy,J.A.(1993).Leucocytecount:apredictorofearlyfemoropoplitealgraftfailure.CardiovascularSurgery,1,369–72.
Thompson,S.G.(1993).Controversiesinmeta-analysis:thecaseofthetrialsofserumcholesterolreduction.StatisticalMethodsinMedicalResearch,2,173–92.
Todd,G.F.(1972).StatisticsofSmokingintheUnitedKingdom,6thed.,TobaccoResearchCouncil,London.
Tukey,J.W.(1977).ExploratoryDataAnalysis,Addison-Wesley,NewYork.
Turnbull,P.J.,Stimson,G.V.,andDolan,K.A.(1992).PrevalenceofHIVinfectionamongex-prisoners.BritishMedicalJournal,304,90–1.
Velzeboer,S.C.J.M.,Frenkel,J.,anddeWolff,F.A.(1997).Ahypertensivetoddler.Lancet,349,1810.
Victora,C.G.(1982).Statisticalmalpracticeindrugpromotion:acase-studyfromBrazil.SocialScienceandMedicine,16,707–9.
White,P.T.,Pharoah,C.A.,Anderson,H.R.,andFreeling,P.(1989).Improvingtheoutcomeofchronicasthmaingeneralpractice:arandomizedcontrolledtrialofsmallgroupeducation.JournaloftheRoyalCollegeofGeneralPractitioners,39,182–6.
Whitehead,J.(1997).TheDesignandAnalysisofSequentialClinicalTrials,revised2nd.ed.,Chichester,Wiley.
Whittington,C.(1977).Safetybeginsathome.NewScientist,76,340–2.
Williams,E.I.,Greenwell,J.,andGroom,L.M.(1992).Thecareofpeopleover75yearsoldafterdischargefromhospital:anevaluationoftimetabledvisitingbyHealthVisitorAssistants.JournalofPublicHealthMedicine,14,138–44.
Wroe,S.J.,Sandercock,P.,Bamford,J.,Dennis,M.,Slattery,J.,andWarlow,C.(1992).Diurnalvariationinincidenceofstroke:Oxfordshirecommunitystrokeproject.BritishMedicalJournal,304,155–7.
Zelen,M.(1979).Anewdesignforclinicaltrials.NewEnglandJournalofMedicine,300,1242–5.
Zelen,M.(1992).Randomizedconsentdesignsforclinicaltrials:anupdate.StatisticsinMedicine,11,131–2.
Index
Aabridgedlifetable200–1absolutedifference271–2absolutevalue239acceptingnullhypothesis140accidents53acutemyocardialinfarction277additionrule88adjustedoddsratio323admissionstohospital86 255–6 354 356 370–1age53 56–7 267 308–14 316 373age,gestational56–7ageinlifetableseelifetableage-specificmortalityrate295–6 299–300 302 307 376–7age-standardizedmortalityrate74 296 302age-standardizedmortalityratio297–9 303 307 376–7agreement272–5AIDS58 77–8 169–71 172 174–8 317–8alphaspending152albumin76–7alcoholics76–7 308–17allocationtotreatment6–13 15 20–1 23alterationsto11–13 21alternate6–7 11alternatedates11–12bygeneralpractice21 23byward21cheatingin12–13knowninadvance11inclusters21–2 179–81 344–6
minimization13non-random11–13 21–2physicalrandomization12random7–11 15 17 20–1 25systematic11–12usingenvelopes12usinghospitalnumber11
alphaerror140alternateallocation6–7 11alternativehypothesis137 139–42ambiguousquestions40–1analgesics15 18analysisofcovariance321analysisofvariance172–9 261–2 267–8 318–21assumptions173 175–6balanced318inestimationofmeasurementerror271fixedeffects177Friedman321Kruskal–Wallis217 261–2inmeta-analysis327multi-way318–21one-way172–9 261–2randomeffects177–9inregression310–15 315two-way318usingranks217 261–2 321
anginapectoris15–16 138–9 218–20animalexperiments5 16–17 20–1 33anticoagulanttherapy11–12 19 142antidiuretichormone196–7antilogarithm83appropriateconfidenceintervalsforcomparison134appropriatesignificancetestsforcomparison142–3anxiety18 143 210ARC58 172 174–7arcsinesquareroottransformation165
areaunderthecurve104–5 109–11 169–71 278 373–4probability104–5 109–11serialdata169–71 373–4ROCcurve278
arithmeticmean59arterialoxygentension183–4arthritis15 18 37 40Asianwomen35assessment19–20ascertainmentbias38association230–2asthma21 265 267 332 372 373atrophyofspinalcord37attackrate303attribute47AUCseeareaunderthecurveaverageseemeanAVP196–7AZT(zidovudine)77–8 169–71
Bbabies267 373–4back-transformation166–7 271backwardsregression326barchart73–5 354–6barnotation59Bartlett'stest172baseoflogarithm82–4baseline79baselinehazard324BASIC107Bayesianprobability87Bayes'theorem289BCGvaccine6–7 11 17 33 81betaerror140 337betweengroupssumofsquares174betweenclustervariance345–6betweensubjectsvariance178–9 204bias6 11–14 17–20 28 39–42 283–4 327 350 363inallocation11–13ascertainment38inassessment19–20publication327inquestionwording40–2recall39 350 363inreporting17–19response17–19insampling28 31volunteer6 13–14 32
bicepsskinfold165–7 213–15 339bimodaldistribution54–5binaryvariableseedichotomousvariableBinomialdistribution89–91 94 103 106–8 110 128 130–1 132–3 180andNormaldistribution91 106–8meanandvariance94probability90–1insigntest138–9 247
biologicalvariation269birds45–6 255 350birthrate303 305birthweight150blindassessment19–20blocks9bloodpressure19 28 117 191 268–9BMIseebodymassindexbodymassindex(BMI)322–3Bonferronimethod148–51boxandwhiskerplot58 66 351 359boxers264boxes93–4breastcancer37 216–17breastfeeding153breathlessness74–5BritishStandardsInstitution270bronchitis130–2 146 233–4
CCampylobacterjejuni44–6 255 350C-Tscanner5–6 68caesariansection25 349calculationerror70calibration194cancer23 32–9 41 69–74 216–17 241–3breast37 216–17cervicalcancer23lung32 35–9 68–70 241–3 299oesophagus74 78–80parathyroidcancerregistry39
capillarydensity159–64 174cards7 12 50carry-overeffect15case-controlstudy37–40 45–6 153–5 241–3 248 323 349–50 362–3casefatalityrate303casereport33–4caseseries33–4cataracts266 373categoricaldata47–8 373 seenominaldatacats350causeofdeath70–3 75celloftable230censoredobservations281 308 324–5census27 47–8 86 294decennial27 294hospital27 47–8 86local27
national27 294years294 299
centile57–8 279–81centrallimittheorem107–8cervicalcancer23cervicalsmear275cervicalcytology22chartbarseebarchartpieseepiechart
cheatinginallocation12–13Chi-squareddistribution118–20 232–3andsamplevariance119–20 132contingencytables231–3 249–51degreesoffreedom118–19 231–2 251table233
chi-squaredtest230–6 238–40 243–51 249–51 258–9 261–2 371 372 373contingencytable230–6 238–40 243–7 249–51 258–9 261–2 371 372 373continuitycorrection238–40 247 259 261degreesoffreedom231–2 251goodnessoffit248–9logranktest287–8samplesize341trend243–5 259 261–2validity234–6 239–40 245
childrenseeschoolchildrenchoiceofstatisticalmethod257–267cholesterol55 326 345cigarettesmokingseesmokingcirrhosis297–9 306 317classinterval49–50classvariable317clinicaltrials5–25 32–3 326–30allocationtotreatment6–15 20–1 23assessment19–20
combiningresultsfrom326–30clusterrandomized21–2 179–81 205 344–6 380consentofsubjects22–4cross-over15–16 341doubleblind19–20doublemaskedseedoubleblindethics19 22–4groupedsequential152informedconsent22–4intentiontotreat14–15 23 348 372meta-analysis326–30placeboeffect17–19randomized7–11samplesize336–42 344–6 347selectionofsubjects16–17sequential151–2volunteerbias13–14
Clinstatcomputerprogram3 9 30 93 248 298clusterrandomization21–2 179–81 205 344–6 380clustersampling31 344–6Cochran,W.G.230coefficientofvariation271coefficientsinregression189 191–2 310–12 314 317 322–3 325Cox325andinteraction314logistic322–3multiple310–12 314 317simplelinear189 191–2
coeliacdisease34 165–7 213–15 339cohortstudy36–7 350cohort,hypotheticalinlifetable299coins7 28 87–92colds69 241–3colontransittime267combinations97–8combiningdatafromdifferentstudies326–30commoncoldseecolds
commonestimate326–30commonoddsratio328–30commonproportion145–7commonvariance162–4 173comparisonmultipleseemultiplecomparisonsofmeans12–19 143–5 162–4 170–6 338–41 347 361 379–80ofmethodsofmeasurement269–73ofproportions130–2 145–7 233–4 245–7 259 341–3 347 372 379ofregressionlines208–9 367–8oftwogroups128–32 143–7 162–4 211–17 233–4 254 255–7 338–43 344–6 347 361 372 379–80ofvariances172 260withinonegroup159–62 217–20 245–7 257 260–1 341
compliance183–4 228–9 363–7 369–70computer2 8–9 30 107 166 174 201 238 288–90 298 308 310 318diagnosis288–90randomnumbergeneration8–9 107programforconfidenceintervalofproportion132programsforsampling30statisticalanalysis2 174 201 298 308 310 318
conception142conditionallogisticregression323conditionaloddsratio248conditionalprobability96–7conditionaltest250confidenceinterval126–34appropriateforcomparison134centile133 280–1correlationcoefficient200–1differencebetweentwomeans128–9 136 162–4 361differencebetweentwoproportions130–1 243differencebetweentworegressioncoefficients208–9 368hazardratio288 325mean126–7 136 159–60 335–6 361median133numberneededtotreat290–1
oddsratio241–3 248percentile133 280–1predictedvalueinregression194–5proportion128 132–3 336quantile133 280–1ratiooftwoproportions131–2referenceinterval280–1 290 375 378regressioncoefficient191–2regressionestimate192–4andsamplesize335–6orsignificancetest142 145 227SMR298–9 307 376–7sensitivity276sensitivity276survivalprobability283transformeddata166–7usingrankorder216 220
confidencelimits126–34confounding34–5consentofresearchsubjects22–4conservativemethods15constraint118–19 250–1contingencytable230 330
continuitycorrection225–6 238–40 247chi-squaredtest238–40Kendall'srankcorrelationcoefficient226Mann-WhitneyUtest225McNemar'stest247
continuousvariable47–50 75 87–8 93 103–6 276–8 323indiagnostictest276–8
contrastsensitivity266 373controlgroupcasecontrolstudy37–9 350 362–3clinicaltrial5–7
controlledtrialseeclinicaltrialcornflakes153–5 362–3
coronaryarterydisease34 149 326coronarythrombosis11–12 36correlation197–205 220 260–2 309–11assumptions200–1betweenrepeatedmeasurements341coefficient197–204confidenceinterval200–1Fisher'sztransformation201 339–40 343intra-class179 204–5 272 346intra-cluster346 347linearrelationship199matrix202 309–10multiple311negative198positive197productmoment198r198–200r2199–200rankseerankcorrelationandregression199–200 311repeatedobservations202–4samplesize343–4significancetest200–1tableof200tableofsamplesize344zero198
cough34–5 41 128–32 144–7 233–4 240–1 254counselling41–2counties347covarianceanalysis321Coxregression324–5crime97Crohn'sdisease153–5 165–7 213–15 339 362–3cross-classification230 370–1cross-overtrial15–16 137 341cross-sectionalstudy34–5cross-tabulation230 370–1
crudedeathrate294–5crudemortalityrate294–5 302cumulativefrequency48–51 56cumulativesurvivalprobability282–3 299cushionvolume333–4 378cut-offpoint277–8 281
Ddeath27 70–3 96 101–2 281deathcertificate27 294deathrateseemortalityratedecennialcensus27 294decimaldice8decimalplaces70 268decimalpoint70decimalsystem69–70decisiontree289–90DeclarationofHelsinki22degreesoffreedom61 67 118–20 153–4 159 169 171–2 191 231–2 251 288 309 311 319 331analysisofvariance173–5Chi-squareddistribution118–20chi-squaredtest231–2 251Fdistribution120Ftest171 173–5goodnessoffittest248–9logranktest288regression191 310 313samplesizecalculations335tdistribution120 157–8tmethod157–8 160–4varianceestimate61 67 94–5 119 352–3
delivery25 230–1 322–3 349demography299denominator68–9dependentvariable187depressivesymptoms18
Derbyshire128designeffect344–6 380detection,belowlimitof281deviationfromassumptions161–2 164 167–8 175–6 196–7deviationsfrommean61 352deviationsfromregressionline187–8dexamethasone290–1diabetes135–6 360–1diagnosis47–8 86 275–9 288–90 317diagnostictest136 275–9 361diagrams72–82 85–6barseebarchartpieseepiechartscatterseescatterdiagram
diarrhoea172 318diastolicbloodpressureseebloodpressuredice7–8 87–9 122dichotomousvariable258–62 308 317 321–3 325 328differenceagainstmeanplot161–2 184 271–5 364–5 367differences129–30 138–9 159–62 184 217–20 271–5 341 364–5 369–70differencesbetweentwogroups128–31 136 143–7 162–7 211–17 258–9 338–43 344–6 347 362–3digitpreference269directstandardization296dischargefromhospital48discretedata47 49discriminantanalysis289distributionBinomialseeBinomialdistributionChi-squaredseeChi-squareddistributioncumulativefrequencyseecumulativefrequencydistributionFseeFdistributionfrequencyseefrequencydistributionNormalseeNormaldistributionPoissonseePoissondistributionprobabilityseeprobabilitydistributionRectangularseeRectangulardistribution
tseetdistributionUniformseeUniformdistribution
distribution-freemethods210diurnalvariation249DNA97doctors36 68 86 297–9 356Dopplerultrasound150dotplot77doubleblind19–20doubledummy18doublemaskedseedoubleblinddoubleplaceboseedoubledummydrug69dummytreatmentseeplacebodummyvariables317 328Duncan'smultiplerangetest176
Ee,mathematicalconstant83–4 95ecologicalfallacy42–3ecologicalstudies42–3eczema97election28 32 41electoralroll30 32embryos333–4 378enumerationdistrict27envelopes12enzymeconcentration347 379–80epidemiologicalstudies32 34–40 42–3 45–6 326equality,lineof273–4error70 140 187 192 269–72 337alpha140beta140 337calculation70firstkind140measurement269–72secondkind140 337terminregressionmodel187 192typeI140typeII140 337
estimate61 122–36 326–30estimation122–36 335–6ethicalapproval32ethics4 19 22–4 32evidence-basedpractice1expectation92–4ofadistribution92–3
ofBinomialdistribution94ofChi-squareddistribution118oflife102 300–2 305 357–8ofsumofsquares60–4 98–9 119
expectedfrequency230–1 250expectednumberofdeaths297–9expectedvalueseeexpectation,expectedfrequencyexperimentalunit21–2 180experiments5–25animal5 16–17 20–1 33clinicalseeclinicaltrialsdesignof5–25factorial10–11laboratory5 16–17 20–1
expertsystem288–90ex-prisoners128
FFdistribution118 120 334Ftest171 173–5 311 313–15 317–18 320 334 378face-lifts23factor317–18factorial90 97factorialexperiment10–11falsenegative277–9falsepositive277–9familyofdistributions90 96Farr,William1FATseefixedactivatedT-cellsfatabsorption78 169–71fatalityrate303feet,ulcerated159–64 174fertility142 302–3fertilityrate303FEV1 49–54 57–60 62–3 125–7 133 185–6 188–95 197–9 201 279–80 310–11 335–6fevertree26Fisher1Fisher'sexacttest236–40 251–2 259 262Fisher'sztransformation201 343fivefiguresummary58fiveyearsurvivalrate283fixedactivatedT-cells(FAT)318–21fixedeffects177–9 328follow-up,losttoorwithdrawnfrom282footulcers159–64 174forcedexpiratoryvolumeseeFEV1
forestdiagram330forwardregression326fourths57frequency48–56 68–9 230–1 250cumulative48–51density52–4 104–5distribution48–56 66–7 103–5 351–2 354expected230–1 250perunit52–4polygon54andprobability87 103–5proportion68relative48–50 53–4 104–5tallysystem50 54intables71 230–1
GG.P.41Gabriel'stest177gallstones284–8 324–5Galton186gastricpH265–6 372–3GaussiandistributionseeNormaldistributiongeewhizgraph79–80geometricmean113 167 320geriatricadmissions86 255–6 354 356 370–1gestationalage196–7glucose35 66–7 121–2 351–3 359–60gluesniffingseevolatilesubstanceabusegoodnessoffittest248–9GossettseeStudentgradient185–6graftfailure331graphs72–82 85–6groupcomparisonseecomparisonsgroupedsequentialtrials152groupingofdata167guidelines179–81
Hharmonicmean113hayfever97hazardratio288 324–25health40–1healthcentre220–1healthpromotion347healthypopulation279 292–3hearttransplants264heatwave86 255–6 356 370–1height75–6 87–8 93–4 112 159 185–6 188–95 197–9 201 208–9 308–17 367–9Helsinki,Declarationof22heteroscedasticity175heterogeneitytest249 328–9Hill,Bradford1histogram50–7 67 72 75 103–4 267 303–4 352 354 356 359historicalcontrols6HIV58 128 172 174–7holes93–4homogeneityofoddsratios328–9homogeneityofvarianceseeuniformvariancehomoscedasticity175hospitaladmissions86 255–6 356 370–1hospitalcensus27 47–8 85hospitalcontrols38–9house-dustmite265 372housingtenure230–1 317Huff79 81humanimmunodeficiencyvirusseeHIV
hypercholesterolaemia55hypertension43 91 265 372hypocalcaemia34hypothesis,alternativeseealternativehypothesishypothesis,nullseenullhypothesis
IICCseeintra-classcorrelationICDseeInternationalClassificationofDiseaseileostomy265 372incidence303independentevents88 357independentgroups128–32 143–7 162–4 172–7 211–17independentrandomvariables93–4independenttrials90independentvariableinregression187India17 33indirectstandardization296–9inductionoflabour322–3infantmortalityrate303infinity(∞)291inflammatoryarthritis40informedconsent22–3instrumentaldelivery25 349intentiontotreat14–15 348–9 372interaction310 313–14 320–1 327–9 334 378intercept185–6InternationalClassificationofDisease70–72inter-pupildistance331interquartilerange60interval,class49intervalestimate126intervalscale210 217 258–62 373intra-classcorrelationcoefficient179 204–5 272 380intra-clustercorrelationcoefficient272 380
Jjitteringinscatterdiagrams77
KKaplan-Meiersurvivalcurve283Kaposi'ssarcoma69 220–1Kendall'srankcorrelationcoefficient222–6 245 261–2 373 374continuitycorrection226incontingencytables245τ222table225tau222ties23–4comparedtoSpearman's224–5
Kendall'stestfortwogroups217Kent245–7KnowYourMidwifetrial25 348–9knowledgebasedsystem289–90Korotkovsounds268–9Kruskal-Wallistest217 261–2
Llabour322–3 348–9laboratoryexperiment5 16 20–1lactulose172 175–7Lanarkshiremilkexperiment12laparoscopy142largesample126 128–32 143–7 168–9 258–60 335–6leastsquares187–90 205–6 310leftcensoreddata281Levenetest172lifeexpectancy102 300–2 305 357–8lifetable101–2 282–3 296 299–302limitsofagreement274–5linegraph77–80 354 356lineofequality273–4linearconstraint118–19 243–5 250–1linearregressionseeregression,multipleregressionlinearrelationship185–209 243–5lineartrendincontingencytable243–5LiteraryDigest31lithotrypsy284logseelogarithm,logarithmicloghazard324–5log-linearmodel330logodds240 252–3 321–3logoddsratio241–2 252–3 323logarithm82–4 131baseof82–4
logarithmofproportion131logarithmofratio131
logarithmicscale81–2logarithmictransformation113–14 116 164–7 175–6 184andcoefficientofvariation271andconfidenceinterval167geometricmean113 167toequalvariance164–7 175–6 196–7 271toNormaldistribution113–14 116 164–7 175–6 184 360 364–5 372standarddeviation113–14varianceof131 248
logisticregression289 321–3 326 328–9 330conditional323multinomial330ordinal330
logittransformation235 248–9 321–3Lognormaldistribution83 113logranktest284 287–9 325longitudinalstudy36–7losstofollow-up282Louis,Pierre-Charles-Alexandre1lungcancer32 35–9 68–70 96 242–3 299lungfunctionseeFEV1,PEFR,meantransittime,vitalcapacitylymphaticdrainage40
Mmagnesium135–6 292–3 360–1 375–6malaria26mannitol58 172 174–7 317–18Mann–WhitneyUtest164 211–17 225–7 258–9 259 278 373–4andtwo-sampletmethod211 215–17continuitycorrection225–6Normalapproximation215 225–6andROCcurve278table212tablesof217ties213 215
Mantel'smethodforsurvivaldata288Mantel-Haenszelmethodforcombining,2by2tables328methodfortrend245
marginaltotals230–1matchedsamples159–62 217–20 245–7 260 341 363–7 369–70matching39 45–6maternalage267 373maternalmortalityrate303maternitycare25mathematics2matrix309maximum58 65 169 345maximumvoluntarycontraction308–16McNemar'stest245–7 260meantransittime265 368mean59–60 67arithmetic59
  comparison of two 128–9, 143–5, 162–4, 338–41, 361, 378–9
  confidence interval for 126–7, 132, 335, 361
  deviations from 60
  geometric 113, 167
  harmonic 113
  of population 126–7, 335–6
  of probability distribution 92–4, 105–6
  of a sample 56–8, 65–6, 352–3
  sample size 335–6, 338–41
  sampling distribution of 122–5
  standard error of 126–7, 136, 156, 335, 361
  sum of squares about 60–65
measurement 268–9
measurement error 269–72
measurement methods 272–5
median 56–9, 133, 216–7, 220, 351
  confidence interval for 133, 220
Medical Research Council 9
mercury 34
meta-analysis 326–30
methods of measurement 269–73
mice 21, 33, 333–4, 378
midwives 25, 342–3
mild hypertension 265, 368
milk 12–13, 45–6, 255, 349–50
mini Wright peak flow meter see peak flow meter
minimization 13
minimum 58, 66, 351
misleading graphs 78–81
missing denominator 69
missing zero 79–80
mites 265, 372
MLn 3
MLWin 3
mode 55
modulus 239
Monte Carlo methods 238
mortality 15, 36, 70–6, 86, 294–6, 302–3, 347, 356, 357–8, 376–7
mortality rate 36, 294–6, 302–3
  age-specific 295–6, 299–300, 302, 307, 376–7
  age-standardized 296, 302
  crude 294–5, 302
  infant 303, 305
  neonatal 303
  perinatal 303
mosquitos 26
MTB see mycobacterium tuberculosis
MTT see mean transit time
multifactorial methods 308–34
multi-level modelling 3
multinomial logistic regression 330
multiple comparisons 175–7
multiple regression 308–18, 333–4
  analysis of variance for 310–15
  and analysis of variance 318
  assumptions 310, 315–16
  backward 326
  class variable 317–18
  coefficients 310–12, 314, 378
  computer programs 308, 310, 318
  correlated predictor variables 312
  degrees of freedom 310, 312
  dichotomous predictor 317
  dummy variables 317–18
  F test 311, 313, 317
  factor 317–18
  forward 326
  interaction 310, 313–14, 333–4, 378
  least squares 310
  linear 310, 314
  in meta-analysis 327
  non-linear 310, 314–15, 378
  Normal assumption 315–16
  outcome variable 308
  polynomial 314–15
  predictor variable 308, 312–13, 316–18
  quadratic term 315, 316, 378
  qualitative predictors 316–18
  R² 311
  reference class 317
  residual variance 310
  residuals 315–16, 333–4, 378
  significance tests 310–13
  standard errors 311–12
  stepwise 326
  sum of squares 310, 313–14, 378
  t tests 310–12, 317
  transformations 316
  uniform variance 316
  variance ratio 311
  variation explained 311
multiple significance tests 148–52, 169
multiplicative rule 88, 90, 92–4, 96
multi-way analysis of variance 318–21
multi-way contingency tables 330
muscle strength 308–16
mutually exclusive events 88, 90, 357
mycobacterium tuberculosis (MTB) 318–21
myocardial infarction 277, 347, 379
N
Napier 83
natural history 26, 33
natural logarithm 83
natural scale 81–2
nausea and vomiting 290–1
Nazi death camps 22
negative predictive value 278–9
neonatal mortality rate 300
New York 6–7, 10
Newman-Keuls test 176
Nightingale, Florence 1
nitrite 265, 372–3
NNH see number needed to harm
NNT see number needed to treat
nodes in breast cancer 216–17
nominal scale 210, 258–62
non-parametric methods 210, 226–7
non-significant 140–1, 142–3, 149
none detectable 281
Normal curve 106–9
normal delivery 25, 349
Normal distribution 91, 101–20
  and Binomial 91, 106–8
  in confidence intervals 126–7, 258–60, 262, 373
  in correlation 200–1
  derived distributions 118–20
  independence of sample mean and variance 119–20
  as limit 106–8
  and normal range 279–81, 293
  of observations 112–18, 156, 210, 258–62, 359–60
  and reference interval 279–81, 293, 375, 378
  in regression 187, 192, 194, 315–16
  in significance tests 143–7, 258–60, 262, 368
  standard error of sample standard deviation 132
  in t method 156–8
  tables 109–10
Normal plot 114–19, 121–2, 161, 163, 165–7, 170–3, 175–6, 180–1, 267, 359–60
Normal probability paper 114
normal range see reference interval
null hypothesis 137, 139–42
number needed to harm 290
number needed to treat 290–1
Nuremberg trials 22
nuisance variable 320
O
observational studies 5, 26–46
observed and expected frequencies 230–1
occupation 96
odds 240, 321–3
odds ratio 240–2, 248, 252–3, 259, 323, 328–9
oesophageal cancer 74, 77–80
Office of National Statistics 294
on treatment analysis 15
one-sided percentage point 110
one-sided test 141–2, 237
one-tailed test 141–2, 237
opinion poll 29, 32, 41, 347, 378–9
ordered nominal scale 258–62
ordinal logistic regression 330
ordinal scale 210, 220, 258–62, 373
outcome variable 187, 190, 308, 321
outliers 58, 196, 378
overview 326–30
oxygen dependence 267, 373–4
P
pa(O2) 183–4
pain 15–16, 18
pain relief score 18
paired data 129–30, 138–9, 159–62, 167–8, 217–20, 245–7, 260, 341, 363–7, 369–70, 372
  in large sample 129–30
  McNemar's test see McNemar's test
  sample size 341
  sign test see sign test
  t method see t methods
  Wilcoxon test see Wilcoxon test
parameter 90
parametric methods 210, 226–7
parathyroid cancer 282–4
parity 49, 52–3, 248–9
passive smoking 34–5
PCO see polycystic ovary disease
peak expiratory flow rate see PEFR
peak flow meter 269–75
peak value 169
Pearson's correlation coefficient see correlation coefficient
PEFR 54, 128–9, 144–5, 147–8, 208–9, 265, 269–75, 363–4, 368
percentage 68, 71
percentage point 109–10, 347, 378
percentile 57, 279–81
perinatal mortality rate 303
permutation 97–8
pH 265, 372–3
phlegm 145, 147–8
phosphomycin 69
physical mixing 12
pictogram 80–1
pie chart 72–3, 80, 354–5
pie diagram see pie chart
pilot study 335, 339, 341
Pitman's test 260
placebo 17–20, 22
point estimate 125
Poisson distribution 95–6, 108, 165, 248–50, 252, 298–9
Poisson heterogeneity test 249
Poisson regression 330
poliomyelitis 13–14, 19, 68, 86, 355
polycystic ovary disease 35
polygon see frequency polygon
polynomial regression 314–15
population 27–34, 36, 39, 87, 335–6
  census 27, 294
  estimate 294
  mean 126–7, 335–6
  national 27, 294
  projection 302
  pyramid 303–5
  restricted 33
  standard deviation 124–5
  statistical usage 28
  variance 124–5
positive predictive value 278
power 147–8, 337–46
p–p plot 117–18
precision 268–9
predictor variable 187, 190, 308, 312–13, 316–18, 321, 323, 324
pregnancy 25, 49, 348–9
premature babies 267
presenting data 68–86
presenting tables 71–2
prevalence 35, 90, 278–9, 303
probability 87–122
  addition rule 88
  conditional 96–7
  density function 104–6
  distribution 88–9, 92–4, 103–6, 357–8
  of dying 101–2, 299–300, 357–8
  multiplication rule 88, 96
  paper 114
  in significance tests 137–9
  of survival 101–2, 357–8
  that null hypothesis is true 140
product moment correlation coefficient see correlation coefficient
pronethalol 15–16, 138–9, 217–20
proportion 68–9, 71, 128, 130–3, 165, 321–3
  arcsine square root transformation 165
  confidence interval for 128, 132–3, 336
  denominator 69
  difference between two 130–1, 145–7, 233–4, 245–7, 341–3, 347
  as outcome variable 321–3
  ratio of two 131–2, 147
  sample size 336, 341–3, 347
  standard error 128, 336
  in tables 71
  of variability explained 191, 200
proportional frequency 48
proportional hazards model 324–5
prosecutor's fallacy 97
prospective study 36–7
protocol 268
pseudo-random 8
publication bias 327
pulmonary tuberculosis see tuberculosis
pulse rate 178–9, 190–1, 204
P value 1, 139–41
P value spending 152
pyramid, population 303–5
Q
q–q plot see quantile–quantile plot
quadratic term 315, 316, 378
qualitative data 47, 258–62, 316–18
quantile 56–8, 116–18, 133, 279–81
  confidence interval 133, 280–1
quantile–quantile plot 116–18
quantitative data 47, 49
quartile 57–8, 66, 351
quasi-random sampling 31
questionnaires 36, 40–2
quota sampling 28–29
R
r, correlation coefficient 198–9
r² 199–200, 311
rS, Spearman rank correlation 220
R, multiple correlation coefficient 311
R² 311
radiological appearance 20
RAGE 23
random allocation 7–11, 15, 17, 20–3, 25
  by general practice 21, 23
  by ward 21
  in clusters 21–2, 344–6
random blood glucose 66–7
random effects 177–9, 328
random numbers 8, 10, 29–30
random sampling 9, 29–32, 38, 90
random variable 87–118
  addition of a constant 93
  difference between two 94
  expected value of 92–4
  mean of 92–4
  multiplied by a constant 92
  sum of two 92–3
  variance of 92–4
randomization see random allocation
randomized consent 23
randomizing devices 7–8, 87, 90
range 59–60, 279
  interquartile 59–60
  normal see reference interval
  reference see reference interval
rank 211, 213–14, 218, 221, 223
rank correlation 220–6, 261–2, 373, 374
  choice of 226, 261–2
  Kendall's 222–6, 261–2, 373, 374
  Spearman's 220–2, 226, 261–2, 374
rank order 211, 213–14, 221
rank sum test 210–20
  one sample see Wilcoxon
  two sample see Mann–Whitney
rate 68–9, 71
  age specific mortality 295–6, 299–300, 302, 307
  age standardized mortality 296, 302
  attack 303
  birth 303, 305
  case fatality 303
  crude mortality 294–5, 302
  denominator 69
  fertility 303
  five year survival 283
  incidence 303
  infant mortality 303, 305
  maternal mortality 303
  mortality 294–6, 302–3
  multiplier 68, 295
  neonatal mortality 303
  perinatal mortality 303
  prevalence 303
  response 31–2
  stillbirth 303
  survival 283
ratio
  odds see odds ratio
  of proportions 131–2, 147
  scale 257–8
  standardized mortality see standardized mortality ratio
rats 20
raw data 167
recall bias 39, 350, 363
receiver operating characteristic curve see ROC curve
reciprocal transformation 165–7
Rectangular distribution 107–8
reference class 317
reference interval 33, 136, 279–81, 293, 361, 375, 378
  confidence interval 280–1, 293, 375, 378
  by direct estimation 280–1
  sample size 347, 378
  using Normal distribution 279–80, 293, 361, 375, 378
  using transformation 280
refusing treatment 13–15, 25
register of deaths 27
regression 185–9, 199–200, 205–7, 208–9, 261–2, 308–18, 312–30, 333–4
  analysis of variance for 310–15
  assumptions 187, 191–2, 194–5, 196–7
  backward 326
  coefficient 189, 191–2
  comparing two lines 208–9, 367–8
  confidence interval 192
  in contingency table 234–5
  and correlation 199–200
  Cox 324–5
  dependent variable 187
  deviations from 187
  deviations from assumptions 196–7
  equation 189
  error term 187, 192
  estimate 192–3
  explanatory variable 187
  forward 326
  gradient 185–6
  independent variable 187
  intercept 185–6
  least squares 187–90, 205–6
  line 187
  linear 189
  logistic 321–3, 326, 328–9
  multinomial logistic 330
  multiple see multiple regression
  ordinal logistic 330
  outcome variable 187, 190
  outliers 196
  perpendicular distance from line 187–8
  Poisson 330
  polynomial see polynomial regression
  prediction 192–4
  predictor variable 187, 190
  proportional hazards 324–5
  residual sum of squares 191
  residual variance 191
  residuals 194–6
  significance test 192
  simple linear 189
  slope 185–6
  standard error 191–4
  stepwise 326
  sum of products 189
  sum of squares about 191–2, 310
  sum of squares due to 191–2
  towards the mean 186–7, 191
  variability explained 191, 200
  variance about line 191–2, 205–6
  X on Y 190–1
rejecting null hypothesis 140–1
relationship between variables 33, 73–8, 185–209, 220–6, 230–45, 257, 261–2, 308–34
relative frequency 48–50, 53, 103–5
relative risk 132, 241–3, 248, 323
reliability 272
repeatability 33, 269–72
repeated observations 169–71, 202–3
repeated significance tests 151–2, 169
replicates 177
representative sample 28–32, 34
residual mean square 174, 270
residual standard deviation 191–2, 270
residual sum of squares 174, 310, 312
residual variance 173, 310
residuals 165–6, 175–6, 267, 315–16, 333–4
  about regression line 194–6, 315–16
  plots of 162–4, 173–4, 194–6, 315–16, 333–4, 378
  within groups 165–6, 175–6
respiratory disease 32, 34–5
respiratory symptoms 32, 34–5, 41, 125–9, 142–7, 233–4, 240–1, 243–7, 254
response bias 17–19
response rate 31–2
response variable see outcome variable
retrospective study 39
rheumatoid arthritis 37
Richter scale 114
risk 131–2
risk factor 39, 326–7, 350
RND(X) 107
robustness to deviations from assumptions 167–9
ROC curve 277–8
S
s², symbol for variance 61
saline 13–14
Salk vaccine 13–14, 17, 19, 68, 355
salt 43
sample 87
  large 127–31, 168–9, 258–60, 262, 335–6
  mean see mean
  size see size of sample
  small 130–1, 132–3, 156–69, 227, 258–60, 262, 344
  variance see variance
sampling 27–34
  in clinical studies 32–4, 293, 375
  cluster 31
  distribution 122–5, 127
  in epidemiological studies 32, 34–9
  experiment 63–4, 122–5
  frame 29
  multi-stage 30
  quasi-random 31
  quota 29
  random 29–31
  simple random 29–30
  stratified 31
  systematic 31
scanner 5–6
scatter diagram 75–7, 185–6
scattergram see scatter diagram
schoolchildren 12–13, 17, 22, 31, 34–5, 41, 43, 128–32, 143–7, 233–4, 240–1, 243–7, 254
schools 22, 31, 34
screening 15, 22, 81, 216–7, 265, 275–9
selection of subjects 16–17, 32–3, 37–9
  in case control studies 37–9
  in clinical trials 16–17
  self 31–2
self selection 31–2
semen analysis 183
semi-parametric 325
sensitivity 276–8
sequential analysis 151–2
sequential trials 151–2
serial measurements 169–71
sex 71–2
sign test 138–9, 161, 210, 217, 219–20, 228, 246–7, 260, 369–70, 372, 373
signed-rank test see Wilcoxon
significance and importance 142–3
significance and publication 327
significance level 140–1, 147
significance tests 137–55
  multiple 148–52, 169
  and sample size 336–8
  in subsets 149–50
  inferior to confidence intervals 142, 145
significant difference 140
significant digits see significant figures
significant figures 69–72, 268–9
size of sample 32, 147–8, 335–47
  accuracy of estimation 344
  in cluster randomization 344–6
  correlation coefficient 343–4
  and estimation 335–6
  paired samples 341
  reference interval 347, 378
  and significance tests 147–8, 336–8
  single mean 335–6
  single proportion 336, 378–9
  two means 338–41, 379–80
  two proportions 341–3, 379
skew distribution 56, 59, 67, 112–14, 116–17, 165, 167–8, 360
skinfold thickness 165–7, 213–15, 335
slope 185–6
small samples 156–67, 227, 258–60
smoking 22, 26, 31–2, 34–9, 41, 67, 74–5, 241–3, 356
SMR 297–9, 303, 307, 376–7
Snow, John 1
sodium 116–17
somites 333–4, 378
South East London Screening Study 15
Spearman's rank correlation coefficient 220–2, 226, 261–2, 373
  table 219
  ties 219
specificity 276–8
spinal cord atrophy 37
square root transformation 165–7, 175–7
squares, sum of see sum of squares
standard age specific mortality rates 297–8
standard deviation 60, 62–4, 67, 92–4, 119–21
  degrees of freedom for 63–4, 67, 119
  of differences 159–62
  of population 123–4
  of probability distribution 92–4, 105
  of sample 62–4, 67, 119, 353
  of sampling distribution 123–4
  and transformation 113–14
  and standard error 126
  standard error of 132
  within subjects 269–70
standard error 122–5
  and confidence intervals 126–7
  centile 280
  correlation coefficient 201, 343
  difference between two means 128–9, 136, 338–41, 361, 379–80
  difference between two proportions 130–1, 145–7, 341–3, 379
  difference between two regression coefficients 208, 367–8
  different in significance test and confidence interval 147
  log hazard ratio 325
  log odds ratio 241–2, 252–3
  logistic regression coefficient 322
  mean 123–5, 136, 335–6, 361
  percentile 280
  predicted value in regression 192–4
  proportion 128, 336, 378–9
  quantile 280
  ratio of two proportions 121–2
  reference interval 280, 370–1, 378
  regression coefficient 191–2, 311–12, 317
  regression estimate 192–3
  SMR 298–9, 377
  standard deviation 132
  survival rate 283–4, 341
Standard Normal deviate 114–17, 225–6
Standard Normal distribution 108–11, 143, 156–8, 337–8
standard population 296
standardized mortality rate 74, 296
standardized mortality ratio 296–9, 303, 307
standardized Normal probability plot 117–18
Stata 118
StatExact 238
statistic 47, 139, 302–3, 337
  test 139, 337
  vital 302–3
Statistics 1
statistical significance see significance test
stem and leaf plot 54, 57, 66, 184, 351, 364–6
step function 51, 283
step-down 326
step-up 326
stepwise regression 326
stillbirth rate 303
stratification 31
strength 308–16
strength of evidence 137, 140, 362
streptomycin 9–10, 17, 19–20, 81, 235–6, 290
stroke 5–6, 23, 249
Stuart test 248, 260
Student 12–13, 156, 158–9
Student's t distribution see t distribution
Studentized range 176
subsets 149–50
success 90
suicide 306
sum of products about mean 189, 198–200
sum of squares 60–1, 63–5, 98–9, 119, 173–4, 310, 313–14
  about mean 60–1, 63–5, 119, 352–3
  about regression 191–2, 310, 313–14
  due to regression 191–2, 310, 313–14
  expected value of 63–4, 98–9, 119
summary statistics 169, 180–1, 327
summation 59
survey 28–9, 42, 90
survival 10, 101–2, 281–8, 324–5
  analysis 281–8, 324–5
  curve 283–4, 286, 324
  probability 101–2, 282–4, 287
  rate 283
  time 162, 281–8
symmetrical distribution 54, 56, 59
synergy 320
syphilis 22
systolic blood pressure see blood pressure
T
t distribution 120, 156–9
  degrees of freedom 120, 153–4, 157–8
  and Normal distribution 120, 156–8
  shape of 157
  table 158
t methods 114, 156–69
  assumptions 161–8, 184, 365–7
  confidence intervals 159–63, 164, 167
  deviations from assumptions 161–2, 164, 167–8
  difference between means in matched sample 159–61, 184, 260, 363–7, 370, 372
  difference between means in two samples 162–7, 258–9, 262
  one sample 159–62, 184, 260, 363–7, 370, 372
  paired 159–62, 167–8, 184, 217, 220, 260, 363–7, 370, 372
  regression coefficient 191–2, 310–12, 317
  single mean 159–62, 176
  two sample 162–7, 217, 258–9, 262, 317, 373–4
  unpaired same as two sample
table of probability distribution
  Chi-squared 233
  correlation coefficient 200
  Kendall's τ 225
  Mann–Whitney U 212
  Normal 109–10
  Spearman's ρ 222
  t 158
  Wilcoxon matched pairs 219
table of sample size for correlation coefficient 344
tables of random numbers 8–9, 29–30
tables, presentation of 71–2
tables, two way 230–48
tails of distributions 56, 359–60
tally system 49–50, 54
Tanzania 69, 220–4
TB see tuberculosis
telephone survey 42
temperature 10, 70, 86, 210, 255–6, 332
test, diagnostic 136, 275–9, 361
test, significance see significance test
test statistic 136, 337
three dimensional effect in graphs 80
thrombosis 11–12, 36, 345
thyroid hormone 267, 373–4
ties in rank tests 213, 215, 218–19, 222–4
ties in sign test 138
time 324–5
time series 77–8, 169–71, 354, 356
time to peak 169
time, survival see survival time
TNF see tumour necrosis factor
total sum of squares 174
transformations 112–14, 163–7, 320
  arcsine square root 165
  and confidence intervals 167
  Fisher's z 201, 343
  logarithmic 112–14, 116, 163–7, 170–1, 175–6, 184, 320, 364–7, 369–70
  logit 240, 252–3
  to Normal distribution 112–14, 116, 164–7, 175–6, 184
  reciprocal 113, 165–7
  and significant figures 269
  square root 165–7, 175–7
  to uniform variance 163–7, 168, 175–6, 196–7, 271
treated group 5–7
treatment 5–7, 326–7
treatment guidelines 179–81
trend in contingency tables 243–5
  chi-squared test 243–5
  Kendall's τb 245
  Mantel–Haenszel 245
trial, clinical see clinical trial
trial of scar 322–3
triglyceride 55–6, 58–59, 63, 112–13, 280–2
trisomy-16 333–4, 378
true difference 147
true negative 278
true positive 278
tuberculosis 6–7, 9–10, 17, 81–2, 290
Tukey 54, 58
Tukey's Honestly Significant Difference 176
tumour growth 20
tumour necrosis factor (TNF) 318–21
Tuskegee Study 22
twins 204
two-sample t test see t methods
two-sample trial 16
two-sided percentage point 110
two-sided test 141–2
two-tailed test 141–2
type I error 140
type II error 140, 337
U
ulcerated feet 159–64, 174
ultrasonography 134
unemployment 42
Uniform distribution 107–8, 249
uniform variance 159, 162–4, 167–8, 175–6, 187, 191, 196–7, 316, 319–20
unimodal distribution 55
unit of analysis 21–2, 179–81
urinary infection 69
urinary nitrite 265, 372–3
V
vaccine 6–7, 11, 13–14, 17, 19
validity of chi-squared test 234–6, 239–40, 245
variability 59–64, 269
variability explained by regression 191, 200
variable 47
  categorical 47
  continuous 47, 49
  dependent 187
  dichotomous 259–62
  discrete 47, 49
  explanatory 187
  independent 187
  nominal 210, 259–62
  nuisance 320
  ordinal 210, 259–62
  outcome 187, 190, 308, 321
  predictor 187, 190, 308, 312–13, 316–18, 321, 323, 324
  qualitative 47, 316–18
  quantitative 47
  random see random variable
variance 59–64, 67
  about regression line 191–2, 205–6
  analysis of see analysis of variance
  between clusters 345–6
  between subjects 178–9, 204
  common 162–4, 170, 173
  comparison in paired data 260
  comparison of several 172
  comparison of two 171, 260
  degrees of freedom for 61, 63–4, 352–3
  estimate 59–64, 124–5
  of logarithm 131, 252
  population 123–4
  of probability distribution 91–4, 105
  of random variable 91–4
  ratio 120, 311
  residual 192, 205–6, 310
  sample 59–64, 67, 94, 98–9, 119, 352–3
  uniform 162, 163–7, 168, 174–6, 187, 196–7, 316
  within clusters 345–6
  within subjects 178–9, 204, 269–72
variation, coefficient of 271
visual acuity 266, 373
vital capacity 75–6
vital statistics 302–3
vitamin A 328–9
vitamin D 115–16
volatile substance abuse 42, 307, 376–7
volunteer bias 6, 13–14, 32
volunteers 5–6, 13–14, 16–17
VSA see volatile substance abuse
W
Wandsworth Health District 86, 255–6, 356
website 3, 4
weight gain 20–1
wheeze 267
whooping cough 265, 373
Wilcoxon test 217–20, 260, 369–70, 373
  matched pairs 217–20, 260, 369–70, 373
  one sample 217–20, 260, 369–70, 373
  signed rank 217–20, 260, 369–70, 373
  table 219
  ties 218–19
  two sample 217, see Mann–Whitney
withdrawn from follow-up 282
within cluster variance 345–6
within group residuals see residuals
within groups sum of squares 173
within groups variance 173
within subjects variance 178–9, 204
within subjects variation 178–9, 269–72
Woolf's test 328
Wright peak flow meter see peak flow meter
X
x̄ (x with bar above), symbol for mean 59
X-ray 19–20, 81, 179–81
Y
Yates' correction 238–40, 247, 259, 261
Z
z test 143–7, 258–9, 262, 234
z transformation 201, 343
zero, missing 78–80
zidovudine see AZT
% symbol 71
! (symbol for factorial) 90, 97
∞ (symbol for infinity) 291
| (symbol for given) 96
| (symbol for absolute value) 239
α (symbol for alpha) 140
β (symbol for beta) 140
χ (symbol for chi) 118–19
µ (symbol for mu) 92–3
φ (symbol for phi) 108
Φ (symbol for Phi) 109
ρ (symbol for rho) 220–2
Σ (symbol for summation) 57
σ (symbol for sigma) 92–3
τ (symbol for tau) 222–5