biostatistics workbook aug07-1

Upload: waelinchawi

Post on 10-Oct-2015


DESCRIPTION

Statistics manual: Understanding the practical

TRANSCRIPT

  • Biostatistics Workbook: Field Epidemiology and Lab Training Programs (FELTP)

    DRAFT

    Department of Health and Human Services, Centers for Disease Control and Prevention

    Coordinating Office for Global Health, Office of Capacity Development and Program Coordination

    Division of Epidemiology and Surveillance Capacity Development

  • Acknowledgements:

    We thank the following for their time and efforts in developing the content of this workbook:

    Donna Jones, Michael A. Joseph, Jennifer Scharff, Nadine Sunderland

    Content Review:

    Edmond Maes, Peter Nsubuga

  • Biostatistics Workbook, DRAFT: Aug. 28, 2007

    Table of Contents

    How to Use this Workbook
    Introduction to Biostatistics

    Scales of Measurement
    Frequency Distributions

    Central Location and Dispersion
      Measures of Central Tendency
      Measures of Dispersion

    Probability and the Normal Distribution
      Probability Distribution
      Normal Distribution
      Central Limit Theorem

    Statistical Inference
      Confidence Interval Around a Mean
      Confidence Interval Around a Proportion
      Hypothesis Testing: Two-Sample t-test
      Confidence Interval Estimation: Two-Sample t-test
      Hypothesis Testing: z-test for Difference in Proportions
      Confidence Interval Estimation: z-test for Difference in Proportions
      Hypothesis Testing: Paired t-test
      Confidence Interval Estimation: Paired t-test
      Fisher's Exact Test
      Chi-Square Test for Independence

    Confidence Intervals for Case-Control and Cohort Studies
      Confidence Intervals: Odds Ratios and Relative Risks

    Sample Size
      Sample Size for Descriptive Studies
      Sample Size for Analytic Studies

    Correlation and Regression Analysis
      Pearson Product-Moment Correlation Coefficient
      Simple Linear Regression
      One-Way Analysis of Variance (ANOVA)

    References
    Appendix 1: Answer Key
    Appendix 2: Distribution Tables
      Student's t Table
      Standard Normal z
      Chi-Square Distribution
      F Distribution

  • How to Use this Workbook

    This workbook is intended as a resource for students in introductory biostatistics courses. It provides students with step-by-step guidance through example problems calculated by hand and with readily available statistical software programs. Practice problems are given, along with an answer key, so that students are able to solidify what they have learned in their biostatistics courses.

    The workbook may also be used as a reference once a student has completed a biostatistics course. Though it does not provide detailed information on the theory of biostatistical concepts, it will serve as a refresher as to what statistical test should be used in a given situation and how to do the calculations that accompany that test.

  • Introduction to Biostatistics

    This workbook provides an overview of basic biostatistics topics including scales of measurement, central location and dispersion, normal distribution, tests of statistical inference, sample size, and correlation and regression analysis. Following the description of each topic are examples and practice problems to be completed both by hand and with the aid of a statistical computer program. These examples and practice problems will give you an opportunity to apply the concepts to situations that you may find in the field. Datasets for the practice problems are either included in the workbook or on the accompanying CD. As you complete the practice problems, you may check your work by referring to the answer key located in Appendix 1.

    This workbook is meant as a supplemental text and is not intended to replace your regular biostatistics course. However, we all need a friendly reminder from time to time. For this reason, we have included definitions of commonly used terms in biostatistics for your reference.

    Data: The raw material of statistics. Data generally consist of measurements or counts taken on a population or sample. For example, a nurse may record the temperature of patients (a measurement) or count the number of patients with a temperature above normal (a count).

    Variable: A characteristic that differs among the members of a population or sample, such as height. Because the measurement is not constant, it is variable. Variables can be qualitative or quantitative, continuous or discrete. The value of a random variable cannot be predicted exactly in advance; random variables are the most useful for statistical purposes.

    Population: A collection of entities. A statistical population refers to the largest collection of entities in which we have an interest. For example, we may be interested in looking at women of reproductive age who have had one child. Therefore, our population is limited to only those women aged 15-45 who have one child.

    Sample: Part of a population. A sample of the example population of women aged 15-45 with one child might consist of an estimated 25 percent of the population.

    Parameter: A descriptive measure computed from the data of a population.

    Statistic: A descriptive measure computed from the data of a sample. Statistics, by contrast, is the field that examines the collection, organization, summarization, and analysis of data and draws inferences about a population through observation of a sample.

    Descriptive Statistics: Methods for presenting and summarizing data. Descriptive statistics allow us to understand general patterns in a large quantity of data without conducting a formal test of a hypothesis.

    Inferential Statistics: Statistics used to reach a conclusion about a population based on information gathered from a sample of that population. Involves estimation or hypothesis testing.

    Statistical Symbols

    $\mu$: population mean
    $\sigma$: population standard deviation
    $\bar{x}$: sample mean
    s: sample standard deviation
    .50: median

  • Scales of Measurement

    Overview

    Scales of measurement allow you to categorize data in order to provide information about the characteristic being measured.

    The type of scale used in measuring data affects the type and amount of information that can be obtained. This affects how data will be treated statistically.

    Recognizing the different scales of measurement and understanding their implications for analyzing data will also assist you in creating questionnaires for epidemiologic studies.

    There are four commonly recognized scales of measurement for variables.

    Nominal Scale
    The nominal scale classifies persons or things based on a qualitative assessment of the characteristic being assessed. It neither includes information on quantity or amount nor indicates "more than" or "less than."

    Example: Gender (male or female) is a common nominal variable used in epidemiologic studies.

    Example: Country telephone codes are an example of numeric variables that do not indicate more or less (country code 82 is not "more than" country code 37).

    Ordinal Scale
    The ordinal scale also classifies persons or things based on the characteristic being assessed, but it does indicate "more than" or "less than." In this sense, it provides more information than the nominal scale. However, the ordinal scale does not indicate how much more than or less than.

    Example: Rating students' performance as poor, average, good, or excellent indicates how well students perform and provides a basis for comparison. However, it does not indicate how much better an excellent performance is compared to a good one.

    Interval Scale
    The interval scale has the same characteristics as the ordinal scale (classifying persons or things based on the characteristic assessed and indicating "more than" or "less than"), but the interval scale also indicates how much more than or less than. What the interval scale does not do is indicate a true zero point, meaning that a value of zero does not represent an absence of the characteristic being measured. Additionally, ratios made with two numbers on the interval scale do not have meaning.

    Example: Temperature (in degrees Celsius or Fahrenheit) is an interval variable in that different values can tell you how much more or less. However, there is no true zero point. The value of zero in temperature does not indicate an absence of temperature. Also, when comparing two temperatures, their ratio is not meaningful. We would not say that a 90-degree temperature is twice as hot as a 45-degree temperature.

    Ratio Scale
    The ratio scale includes all the characteristics of the interval scale but does indicate a true zero point.

    Example: Height and weight measurements indicate how much more or less, but also have a true zero point. A weight of zero indicates an absence of weight.

    Scales of Measurement: SUMMARY

    Nominal
      - Classifies persons or things based on a qualitative assessment
      - Similar or dissimilar, but not more or less
      - Can be numeric, but there is no implication of more or less

    Ordinal
      - Classifies persons or things based on a qualitative assessment
      - More or less, but not how much more or less

    Interval
      - Indicates how much more or less
      - Does not contain a true zero point
      - Cannot create meaningful ratios of two numbers

    Ratio
      - Includes all the characteristics of the interval scale, but contains a true zero point

    Practice: Scales of Measurement
    Identify the scale described in each situation below:

    1. Temperature of patients at a health facility
    2. The weight of children under five at a weekly baby weighing
    3. The religion of families in a village
    4. The length of time spent in the hospital
    5. The diagnosis of patients upon admission to the hospital

    Related Concepts: Frequency Distribution

  • Frequency Distributions

    Overview

    Frequency distributions show how often each value for a variable occurs in a sample or population.

    Example: Malaria cases may be reported as a frequency by month in order to determine the high-risk months in the year.

    One of the most common ways to summarize data for better understanding and clearer presentation is through a frequency distribution. A frequency distribution is a presentation of the number of times (or the frequency) that each value (or group of values) occurs in the study population.

    A frequency distribution helps to give a picture of the shape of the distribution of the data. Data are unimodal if the distribution has only one peak, bimodal if it has two peaks, and multimodal if there are more than two peaks. Measures of dispersion will help you to form a clearer picture of the distribution of the data by describing the width, or spread, of the data. We will discuss this in more detail in the section titled Measures of Dispersion.

    A frequency distribution can be displayed as a table, a bar chart, a histogram, or a frequency polygon. Each display should be clearly labeled with the frequencies. The method usually depends on the type of variable being described.

    Categorical variables are qualitative in nature and are best displayed as a table or a bar chart.

    Table
    A frequency table simply shows the number of times each specific observation appears in a sample or population.

    Cases of Malaria

    Day          Frequency
    Monday       6
    Tuesday      4
    Wednesday    2
    Thursday     5
    Friday       3
    Saturday     4
    Total        24

    Bar chart
    A bar chart, like a table, displays the number of observations in each category, but provides a better visual representation.

    [Figure: bar chart "Cases of Malaria"; x-axis: day of the week (Monday to Saturday); y-axis: Frequency (0 to 7)]

    Numerical variables are quantitative in nature and are best displayed as a frequency histogram or a frequency polygon.

    Frequency histogram
    A frequency histogram shows the frequencies relative to each other. The width of each bar is in proportion to the class interval that it represents. Typically there are no spaces between the bars in a frequency histogram, though you may see histograms drawn with spaces at times.

    [Figure: histogram "Frequency of Malaria Cases in the Past Year"; x-axis: Number of Cases (0, 1, 2, 3, 3+); y-axis: People (0 to 25)]

    Frequency polygon
    A frequency polygon encloses the same area under the line that a histogram displays within the bars. Each point is plotted at the midpoint of a class interval. Though a frequency polygon may look like a line graph, a frequency polygon must be closed at the ends: the line returns to a frequency of zero on both sides.

    [Figure: frequency polygon "Frequency of Malaria Cases in the Past Year"; x-axis: Number of Cases (0, 1, 2, 3, 3+); y-axis: People (0 to 25)]

    Numerical variables may need to be grouped for presentation if the number of values is large or if the variable is continuous. The box below gives guidelines on how to group variables.

    Relative Frequency

    Often it is useful to know the proportion of the values that fall within a specific category or group. This is obtained by dividing the number of values in that category by the total number in the sample. This is referred to as the relative frequency and is presented as a proportion (values from 0.0 to 1.0) or a percent (values from 0% to 100%).

    When reporting either the frequency or the relative frequency in table or graph form, make sure that all data are clearly labeled.

    Cases of Malaria

    Day          Frequency   Percent   Cum. Percent
    Monday       6           25.0      25.0
    Tuesday      4           16.7      41.7
    Wednesday    2           8.3       50.0
    Thursday     5           20.8      70.8
    Friday       3           12.5      83.3
    Saturday     4           16.7      100.0
    Total        24          100.0     100.0

    In the table above, the relative frequency is presented as a percent of the whole.
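The percent and cumulative percent columns can be checked with a short script. The following is a minimal Python sketch, not part of the workbook; the function name `frequency_table` and the expanded list of observations are assumptions made for illustration, while the counts themselves come from the malaria table.

```python
from collections import Counter

def frequency_table(observations, order=None):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(observations)
    categories = order if order is not None else sorted(counts)
    total = sum(counts[c] for c in categories)
    rows, running = [], 0
    for category in categories:
        running += counts[category]
        rows.append((category,
                     counts[category],
                     round(100 * counts[category] / total, 1),
                     round(100 * running / total, 1)))
    return rows

# Malaria cases by day, expanded from the Frequency column of the table.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
cases = dict(zip(days, [6, 4, 2, 5, 3, 4]))
observations = [day for day, n in cases.items() for _ in range(n)]

table = frequency_table(observations, order=days)
for row in table:
    print("{:<10} {:>3} {:>6.1f} {:>6.1f}".format(*row))
```

Passing `order=days` keeps the rows in calendar order; without it the categories would be sorted alphabetically.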

    Grouping Variables

    Continuous numeric variables must often be regrouped into categories for analysis purposes. Listed below are some general guidelines to use when grouping variables:

    - Create class intervals that are mutually exclusive and include all data. It should be clear where one interval stops and the next one begins. No value should fall into more than one interval.

    - Use a large number of narrow class intervals for the initial analysis. All intervals should be the same size. You can combine intervals later if needed, but it is impossible to break intervals down further without referring back to the original data.

    - Use natural or meaningful groupings when possible. There are many groupings, such as five-year age intervals and body mass index (BMI) categories, which are used frequently and, therefore, have become standard. Some groupings have been established by organizations such as WHO or CDC.

    - Create a separate category for unknowns. This will avoid confusion when comparing subgroup observations (n) to the total number of observations (N).
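The grouping guidelines can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ages are hypothetical, `class_interval` is a made-up helper, and the starting value of 20 and width of 5 are arbitrary choices. It shows equal-width, mutually exclusive intervals for whole-number values and a separate category for unknowns.

```python
from collections import Counter

def class_interval(value, start=20, width=5):
    """Return the inclusive (low, high) interval containing a whole-number value.

    Missing values (None) go to a separate "Unknown" category.
    """
    if value is None:
        return "Unknown"
    low = start + ((value - start) // width) * width
    return (low, low + width - 1)

# Hypothetical ages; None marks a missing value.
ages = [23, 31, 27, None, 35, 29, 40, 34]

counts = Counter(class_interval(age) for age in ages)
for interval, n in counts.items():
    label = interval if interval == "Unknown" else "{}-{}".format(*interval)
    print(label, n)
```

Because each value maps to exactly one interval, no observation is counted twice, and the counts always add up to the total number of observations.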

    Step-by-Step Example: Frequency Distributions
    Use the data below to create frequency distributions. The data might represent a class of master's students. First, create a frequency table for Gender, then display the same information in a bar chart. Next, create a histogram of Number of children. Also, display this information in a frequency polygon.

    Subject   Gender   Age   Number of children   Marital Status*
    1         M        32    1                    M
    2         M        35    0                    M
    3         F        28    0                    S
    4         M        45    3                    D
    5         F        47    3                    M
    6         F        36    2                    D
    7         M        29    1                    S
    8         M        31    0                    S
    9         F        42    2                    D
    10        F        44    2                    M
    *M = married, S = single, D = divorced

    1. Create a frequency table.
       Determine the number of observations in each category of Gender. Display this in a table.

       Gender   Frequency
       Female   5
       Male     5

    2. Create a bar chart.
       Display the frequency of the observations for Gender in a bar chart.

       [Figure: bar chart "Gender of Participants"; x-axis: Gender (Male, Female); y-axis: Frequency (0 to 6)]

    3. Create a histogram.
       Display the frequency of the observations for Number of children in a histogram.

       [Figure: histogram "Number of Children of Participants"; x-axis: Number of Children (0, 1, 2, 3); y-axis: frequency (0 to 3.5)]

    4. Create a frequency polygon.
       Display the frequency for Number of children as a polygon.

       [Figure: frequency polygon "Number of Children of Participants"; x-axis: Children (0, 1, 2, 3, closed at both ends); y-axis: frequency (0 to 3.5)]

    5. Describe the data.
       There are equal numbers of men and women among the participants. The frequency distribution shows that the variable Number of children is bimodal. The majority of participants have either no children or two children.
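The counts behind this example can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the workbook: the two lists are transcribed from the ten-subject table, and the text bar chart is only a stand-in for the figures.

```python
from collections import Counter

# Columns transcribed from the ten-subject example table (subjects 1 to 10).
gender = ["M", "M", "F", "M", "F", "F", "M", "M", "F", "F"]
children = [1, 0, 0, 3, 3, 2, 1, 0, 2, 2]

gender_freq = Counter(gender)
children_freq = Counter(children)

# The modes are the values that share the highest frequency.
peak = max(children_freq.values())
modes = sorted(v for v, c in children_freq.items() if c == peak)

print("Gender:", dict(gender_freq))
for value in sorted(children_freq):
    # One '#' per participant gives a rough text histogram.
    print(value, "#" * children_freq[value])
print("Modes for Number of children:", modes)
```

Two values tie for the highest frequency, which is why the description above calls the variable bimodal.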

    Practice: Frequency Distributions
    Using the following dataset, create visual representations of the frequency distributions for the variables.

    Subject   Gender   Age   Number of children   Marital Status
    1         M        32    1                    M
    2         M        35    0                    M
    3         F        28    0                    S
    4         M        45    3                    D
    5         F        47    3                    M
    6         F        36    2                    D
    7         M        29    1                    S
    8         M        31    0                    S
    9         F        42    2                    D
    10        F        44    2                    M

    1. Create a frequency table for the variable Marital Status. (Include the cumulative percent.)
    2. Show the same information in a bar chart.
    3. Draw a frequency histogram for the variable Age. Group the ages in intervals of five before beginning.
    4. Display the same information in a frequency polygon.

    Space has been provided on the following pages to complete your work.

    Practice Space

    1. Create a frequency table.

    2. Create a bar chart.

    3. Create a histogram.

    4. Create a frequency polygon.

    5. Describe the dataset.

    Epi Info Example: Frequency Distributions
    You are attending a fictitious international conference. Demographic data were collected on the attendees. Use what you know about frequency distributions to summarize the data. First, create a table and a bar chart of the categorical variable, Occupation. Then, create a histogram and a frequency polygon for the continuous numerical variable, Weight_kg. The dataset is called Frequency_Dist and is found in the Bios_Workbook_Examples.mdb database.

    Frequency Table

    1. READ the dataset.
       Open Epi Info and choose Analyze Data.
       Select READ under Data Analysis Commands.
       Open Frequency_Dist in the database Bios_Workbook_Examples.mdb.

    2. Create a frequency table.
       Select the FREQUENCIES command.
       In the Frequency drop-down box, highlight the variable that you want to examine. For this example, highlight Occupation.
       Click OK.

    3. Describe the data.
       You should see a frequency table for Occupation on your screen.

       [Screenshot: Epi Info frequency table for Occupation]

       This table provides information on the variable Occupation by presenting frequencies and relative frequencies.

    Bar Chart

    1. Make a frequency bar chart in Epi Info.
       Choose GRAPH under Statistics.
       In the Graph Type drop-down box, choose Bar (default).
       In the box labeled 1st Title | 2nd Title, type "Occupation of Participants". This is the title of your chart.
       Under X-Axis, choose Occupation as the Main Variable.
       Under Y-Axis, Show Value of Count (default).
       Click OK.

    2. Describe the data.
       Epi Info will display the bar chart.

       [Screenshot: Epi Info bar chart "Occupation of Participants"]

       Notice that the graph represents the exact numbers listed in the table created previously.

       You can make a bar chart of the percentage of participants in each occupation by choosing Show Value of Count % under Y-Axis.

    Histogram

    1. Make a histogram in Epi Info.
       Choose GRAPH under Statistics.
       Under Graph Type, choose Histogram.
       Create a title for your graph.
       Choose Weight_kg as the main variable and Show Value of Count.

       Notice that when you select Histogram as the Graph Type, you are given the option to create intervals. This allows you to group the variable Weight_kg without creating a new variable. Using the Intervals option makes the data easier to view. If you create a FREQUENCIES table, you can see that there are nearly 50 different weights recorded. It may not be useful to have each one listed separately.

       To create intervals, look at the column marked X-Axis. Type 5 in the first space under Interval. Type 45 in the space under First Value.
       Click OK.

    2. Describe the data.
       The graph that you see now presents the weight of participants in 5 kg intervals.

    Epi Info Practice: Frequency Distributions
    Use the dataset from the fictitious conference (Frequency_Dist) once again to create frequency distributions for Height_cm and Preferred Language in Epi Info.

    1. Create a frequency table of Preferred Language in Epi Info.

    2. Make a frequency bar chart of Preferred Language in Epi Info.

    3. Make a histogram of Height in Epi Info.

    Review each of these displays and describe the dataset.

    Practice Space

    4. Describe the dataset using the frequency charts and graphs that you have created.

    Excel Example: Frequency Distributions
    Now use Excel to create a frequency polygon for the continuous numerical variable, Weight_kg. The dataset is called Frequency_Dist and is found in the Bios_Workbook_Examples.mdb database.

    1. Create a frequency polygon in Excel.

    a. Open Excel and import the dataset.

    From the toolbar, select Data. Highlight Import External Data. Choose Import Data. Locate Frequency_Dist in the Bios_Workbook_Examples.mdb database. Click Open.

    The dataset should appear as an Excel spreadsheet.

    b. Create a frequency table for Weight_kg.

    Copy the variable Weight_kg by highlighting the column. Press Ctrl+C to copy. Choose a blank cell on the spreadsheet and paste the variable by pressing Ctrl+V.

    In the cell next to the variable heading, type Interval. Complete the column by entering the intervals that you have chosen for the data. In this case, create intervals of 5, beginning with 45-49 and continuing until 100-104. You should anchor the intervals by including an open-ended interval at each end (below 45 and >=105). The first and last intervals should have a frequency of zero.

    The next column will be titled Bin. Bin is a word used by Excel to define interval limits. In this column, we tell Excel how to read the intervals that we have created. The first number in the bin array, n, tells Excel to count all observations less than or equal to n. The second number, p, tells Excel to count all observations greater than n and less than or equal to p (between n+1 and p for whole numbers). This continues through the final number in the bin array. Observations greater than the final bin value are reported only if the highlighted output range extends one cell beyond the bin column.

    Create the bin by typing in the highest number that should be included in each interval. In this example the open-ended intervals at both ends are expected to be empty, so the frequencies for the first and last bins should both be zero.

    [Table: spreadsheet columns Weight (kg), Intervals, BIN, Frequency; the first weights listed are 73, 67, 75, 87, and the last row has interval >=105 with BIN 105]
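To make the bin rule concrete, here is a minimal Python sketch of the counting rule that Excel's FREQUENCY function is documented to apply: each bin value counts the observations greater than the previous bin value and less than or equal to it, and one extra count holds anything above the last bin value. The function name `frequency`, the choice to stop the bins at 104 so that the extra count plays the role of the >=105 interval, and the use of only the four weights visible in the table are assumptions made for illustration.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count observations per bin; bins must be sorted in ascending order.

    counts[i] holds values <= bins[i] and greater than bins[i - 1].
    The final extra entry holds values greater than the last bin value.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts

bins = list(range(44, 105, 5))   # 44, 49, 54, ..., 104
labels = (["<=44"]
          + ["{}-{}".format(b - 4, b) for b in bins[1:]]
          + [">=105"])

weights = [73, 67, 75, 87]       # the four weights visible in the table
counts = frequency(weights, bins)

for label, n in zip(labels, counts):
    print("{:<8} {}".format(label, n))
```

With whole-number weights, the label 45-49 and the bin value 49 describe the same interval, which is why the Bin column holds the highest number that belongs in each interval.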

    Your final column will be called Frequency. We will let Excel calculate the frequencies for us.

    Highlight the Frequency column by clicking on the first cell under the heading and dragging the mouse until the shaded area equals the length of the Bin column. Do not include the column label (Frequency) when highlighting.

    Under Insert in the toolbar, choose Function. Select the function FREQUENCY. You may have to search for it by typing the word "frequency" at the prompt.

    Click OK.

    You will see the FREQUENCY function box:

    [Screenshot: FREQUENCY function box with the fields Data_array and Bins_array]

    Click on the chart icon to the right of the box labeled Data_array. Highlight all the values for the variable Weight_kg.

    Click on the chart icon again to return to the function box.

    Click on the chart icon to the right of the box labeled Bins_array. Highlight all the values in the Bin column. Click on the chart icon again to return to the function box.

    Press Control and Shift together and hit Enter while continuing to hold the other two keys down. (DO NOT CLICK OK!)

    The number of observations included in each interval will be shown in the Frequency column. You now have a frequency table. Note that there is a frequency of zero at the high end and at the low end of the weight intervals. You will need this in order to create a frequency polygon correctly.

    c. Create a frequency polygon.

    Using the frequency table that you just made, highlight all the values in the Frequency column.

    Under Insert in the toolbar, select Chart.

    Choose Chart Type: Line. The first line graph in the second row is preferred because it shows the midpoints in the graph.

    Click Next.

    A frequency polygon will appear.

    To correctly label the polygon, choose the Series tab.

    Click the chart icon next to the box labeled Category (X) axis labels.

    Highlight the values in the column Intervals.

    Your chart should now be labeled with the weight intervals.

    [Screenshot: Excel chart with the intervals as x-axis labels]

    Click Next.

    Choose Title to give your chart a title and label the X axis.

    Click Finish.

    2. Describe the data.

    [Figure: frequency polygon "Weight of Conference Participants"; x-axis: Weight in kg (5 kg intervals, ending with >=105); y-axis: frequency (0 to 9)]

    This distribution is unimodal: it has one peak that is clearly higher than the rest. The majority of participants' weights fall to the left of the peak. Most participants weigh less than 84 kg.

    Excel Practice: Frequency Distributions
    Use the dataset from the fictitious conference (Frequency_Dist) to create a frequency polygon for Height_cm in Excel.

    1. Create a frequency polygon of Height in Excel.

    Use your graph to answer the following questions.

    Practice Space

    2. Describe the dataset using the frequency polygon.

    3. How is this similar to the histogram that you created in Epi Info?

    Related Concepts: Central Location and Dispersion

  • Central Location and Dispersion

    Measures of central location and dispersion are generally referred to as descriptive statistics because they describe the distribution of the dataset.

    A frequency distribution provides a picture of the number of times that each value of a variable occurs, but it does not summarize the center or the spread of the data in a single number. In order to gain a clearer picture of how data are distributed, we will calculate:

    - Measures of central tendency: mean, median, and mode
    - Measures of dispersion: range, variance, standard deviation, and standard error

    Through these measures, the data begin to take shape. When they are combined with a frequency distribution, we can visualize the distribution of the data. We obtain the number and height of the peaks in the distribution from the frequencies. Measures of dispersion allow us to obtain an idea of the width, or spread, of the distribution of the data.

    Data can be either symmetric or skewed. If the distribution can be divided at its center into two halves that are close to mirror images of each other, we can say that the data are symmetric. If one tail of a unimodal distribution is longer than the other tail, then the data are skewed, meaning that the data are not spread evenly. Data can be either right skewed or left skewed. If data are skewed to the right, the distribution will rise quickly to a peak and have a long tail on the right. The opposite is true for data that are skewed to the left.

    Measures of Central Tendency

    Overview

    Measures of central tendency are used to describe the data in the sample by giving an idea of the center and the distribution of the data.

    There are three common measures of central tendency: mean, median, and mode.

    Mean
    The mean is simply the arithmetic average of the data and is calculated by taking the sum of all values in the dataset and dividing that total by the number of values in the dataset. The mean is the most commonly used measure of central tendency.

    Formula: The arithmetic mean is calculated as follows:

    $\bar{x} = \frac{\sum x}{n}$

    Median
    The median is the 50th percentile of the values in a dataset and represents the literal middle of the data. The median is found by arranging all values in the dataset in numerical order and then choosing the middle value. If the number of values in a dataset is even, take the mean of the two middle numbers to find the median.

    Mode
    The mode represents the value that is found most frequently in a set of numbers. Note that it is possible to have more than one mode. In the following set of numbers, {8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7}, the mode is both 8 and 6, since each is included in the dataset three times. This dataset is referred to as bimodal because it has two modes. It is also possible not to have a mode in a set of numbers. In the following set of numbers, {5, 4, 9, 7, 6, 3, 8}, there is no number which occurs more frequently than any other. Therefore, there is no mode.
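The two mode examples can be checked with a small helper. This is a minimal Python sketch under one stated assumption: the hypothetical function `modes` treats a set in which no value occurs more often than any other as having no mode, which is the convention used in the text. (Python's built-in `statistics.multimode` follows a different convention and would return every value for such a set.)

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value stands out."""
    counts = Counter(values)
    top = max(counts.values())
    most_frequent = [v for v, c in counts.items() if c == top]
    # If every distinct value ties for the top count, report no mode.
    if len(counts) > 1 and len(most_frequent) == len(counts):
        return []
    return most_frequent

bimodal_set = [8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7]
no_mode_set = [5, 4, 9, 7, 6, 3, 8]

print(modes(bimodal_set))   # two modes: 8 and 6
print(modes(no_mode_set))   # empty list: no mode
```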

    Comparison of mean, median, and mode
    When you are told to average the data, it is generally expected that you will take the mean. Technically, however, the average could refer to the mean, the median, or the mode of the data. The mean is able to give us the most information about the dataset as a whole, especially when combined with the standard deviation. Therefore, we prefer to use the mean when we can.

    There are certain advantages to the median. The median is resistant to skewing, which is the result of outliers pulling the mean of the data either to the left or to the right. Unlike the mean, the median is not affected by extreme values, and it is more representative of the center of the data when the data are asymmetrical.

    Let's consider skewed data. Look at the graph of the population distribution by state in the United States.

    [Figure: bar chart "Population of the United States by State"; x-axis: State, ordered from the most populous (California, Texas, New York, Florida, ...) to the least populous (..., Vermont, District of Columbia, Wyoming); y-axis: Population (0 to 40,000,000); the mean and the median are marked on the chart]

    The states appearing on the left side of the chart have a much larger population than the other states. Because of this, we expect the mean to be higher in value than the median. The calculated mean in this sample is 5,811,968.706, which is marked on the graph above. The median is 4,173,405, also marked on the graph. The mean in this example is greater than the median. A general rule to follow is that if the data are skewed either to the left or to the right, the median represents the data better than the mean. If a sample is normally distributed, the mean and median will be nearly the same. With symmetrical, unimodal data, the mode will be similar as well.

  • Central Location and Dispersion

    Biostatistics Workbook 36 DRAFT: Aug. 28, 2007

    When the sample size is small, the mode may represent the data most accurately. In bimodal data, the modes may also describe the data more accurately than the mean or the median. The mode is also frequently used to describe qualitative data. For example, you might report a modal diagnosis, that is, the diagnosis that was seen most frequently over a given period of time.

    Step by Step Example: Mean, Median, Mode

    The following are ages of patients seen by the doctor for a broken bone in the past month:

    15 17 20 14 16 15 17 22 18 13 15 14 16 18 20

    Use the data to answer the following questions:

    - What is the mean age of the patients?
    - What is the median age of the patients?
    - What is the modal age of the patients?
    - Which measure is the most representative of the sample?

    Step 1. Find the mean, $\bar{x}$, of the sample.

    $$\bar{x} = \frac{\sum x_i}{n}$$

    $$\bar{x} = \frac{15+17+20+14+16+15+17+22+18+13+15+14+16+18+20}{15} = \frac{250}{15} = 16.7$$

    Step 2. Find the median of the sample.

    First line the numbers up in numerical order:

    13 14 14 15 15 15 16 16 17 17 18 18 20 20 22

    Find the middle number (marked here with brackets):

    13 14 14 15 15 15 16 [16] 17 17 18 18 20 20 22

    There are 7 numbers on either side of the marked value, thus 16 is the median.

    Step 3. Find the mode of the sample.

    13 14 14 15 15 15 16 16 17 17 18 18 20 20 22

    The number that appears most, at three times, in this data set is 15. Therefore, 15 is the mode.

    Step 4. Which statistic is most representative of the center of the data set?

    In this case, the mean and the median are nearly equal. Therefore, we can treat the distribution as approximately symmetrical and use the mean to represent the center of the data. If the mean and the median are different, we can assume that the data is skewed and the median will generally be more appropriate.
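    The hand calculation above can be cross-checked with a few lines of code. The sketch below is not part of the workbook's Epi Info workflow; it uses Python's standard `statistics` module, and the variable names are illustrative.

```python
import statistics

# Ages of the 15 patients from the step-by-step example.
ages = [15, 17, 20, 14, 16, 15, 17, 22, 18, 13, 15, 14, 16, 18, 20]

mean_age = statistics.mean(ages)      # sum of the values divided by n
median_age = statistics.median(ages)  # middle value of the sorted data
modal_age = statistics.mode(ages)     # most frequent value

print(f"mean   = {mean_age:.1f}")
print(f"median = {median_age}")
print(f"mode   = {modal_age}")
```

    Because the mean and the median come out close together, the code leads to the same reading as Step 4: the mean is a reasonable summary of the center of this sample.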

    Practice: Mean, Median, Mode

    In order to determine if there is a relationship between age and the number of visits to the doctor, you decide to count the number of doctor visits that individuals make over the course of a year. Below is the data that you have collected:

    Individual   Age   Visits
    1            45    15
    2            60    8
    3            52    22
    4            46    9
    5            23    2
    6            52    15
    7            37    3
    8            33    13

    Describe the average age of your sample and the average number of doctor visits made by an individual using the mean, median, and mode.

    Practice Space

    Step 1. Find the mean, $\bar{x}$.

    $$\bar{x} = \frac{\sum x_i}{n}$$

    Step 2. Find the median.

    Step 3. Find the mode.

    Step 4. Which statistic is most representative of the center of the data set and why?

    Epi Info Example: Mean, Median, Mode

    Using the same data that we practiced with before on page 36, we can find the mean, median, and mode in two simple steps using Epi Info.

    Step 1. Use Epi Info to determine descriptive statistics.

    a. READ the data set.

    - Open Epi Info and choose Analyze Data.
    - Select READ in Data Analysis Commands.
    - Highlight Central_Tendency from the Data Source Bios_Workbook_Examples.
    - Click OK.

    b. Find the MEANS of the data.

    - Select MEANS from the Commands column under Statistics.
    - Choose Age from the drop-down box under Means of.
    - Click OK.

    Step 2. Identify the mean, median, and mode of the data.

    This is the output that you should see:

    [Figure: Epi Info MEANS output for the variable Age.]

    The output gives you the mean, the median, and the mode. Epi Info reports the mean to be 16.7, the median to be 16.0, and the mode to be 15.0. This does not differ from the hand calculations that we performed previously.

    Step 3. Interpret the results.

    As we determined earlier, the mean and the median are nearly equal. Therefore, we can treat the distribution as approximately symmetrical and use the mean to represent the center of the data. If the mean and the median are different, we can assume that the data is skewed and the median will generally be more appropriate.

    Epi Info Practice: Mean, Median, Mode

    You are weighing babies from 9 AM to 11 AM at an under-five clinic in the village. Your results are as follows:

    Age (months)   Length (cm)   Weight (kg)
    21             77            9.8
    34             87            11.5
    23             84            10.8
    30             92            14.0
    27             85            12.0
    24             82            10.8
    31             87            11.6
    26             85            11.8
    22             85            12.4
    32             86            12.0

    Use Epi Info to find the mean, median, and mode. Then, answer the questions that follow. The data set you are working from is called BabyWeighing. Remember to open the data set in Epi Info by using the READ command.

    Practice Space

    Step 1. Identify the mean, median, and mode of the data.

    Length:            Weight:
    Mean ______        Mean ______
    Median _____       Median _____
    Mode ______        Mode ______

    Step 2. What is the average length and weight of babies that came into the clinic on this morning?

    Step 3. What can you determine about the distribution of the data based on your results?

    Related Concepts

    Measures of Dispersion
    Normal Distribution

    Measures of Dispersion

    In the previous section, we discussed methods of describing the center of the data. Now we want to examine ways to describe the spread of the data, or how far each data point is from the center.

    Range: The range of the data is the difference between the largest observation (maximum value) and the smallest observation (minimum value) in a set of data.

    $$\text{range} = \text{maximum} - \text{minimum}$$

    Interquartile Range (IQR): The interquartile range is the difference between the 75th percentile (3rd quartile, Q3) and the 25th percentile (1st quartile, Q1) in a set of data. This measurement describes the spread of the middle 50 percent of the observations and is, therefore, less likely to be influenced by outliers or extreme values. In the ordered data, Q1 is the value at position (n+1)/4 and Q3 is the value at position 3(n+1)/4.

    $$\text{IQR} = Q_3 - Q_1$$

    Overview

    Measures of dispersion describe variability of data in a sample by describing the spread of the data.

    Formulas:

    $$\text{Range} = \text{maximum} - \text{minimum}$$

    $$\text{Interquartile Range} = Q_3 - Q_1, \quad \text{position of } Q_1 = \frac{n+1}{4}, \quad \text{position of } Q_3 = \frac{3(n+1)}{4}$$

    $$\text{Variance} = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{OR} \quad s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}$$

    $$\text{Standard deviation} = s = \sqrt{s^2}$$

    $$\text{Standard error} = SE = \frac{s}{\sqrt{n}}$$

    Variance ($s^2$): The variance represents the amount of spread or variability around the mean of a set of data. Because the variance is in units squared, we find the standard deviation to describe our data in the proper units. The symbol $s^2$ is used when we are referring to the variance of a sample and the symbol $\sigma^2$ (pronounced sigma squared) when we are referring to the variance of a population.

    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{OR} \quad s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}$$

    Standard Deviation ($s$): The standard deviation of a set of data is the square root of the variance. It describes roughly the average distance of the observations from the mean of the sample and is used as a measure of variability to describe the spread of the data. A large standard deviation represents a wide spread because the observations are far from the mean. When we refer to the standard deviation of a population, we use the symbol $\sigma$ (sigma).

    $$s = \sqrt{s^2}$$

    Standard Error (SE): The standard error is the standard deviation of the sampling distribution of the means, rather than of the observations themselves. The smaller the standard error, the closer any given sample mean is likely to be to the true population mean.

    $$SE = \frac{s}{\sqrt{n}}$$

    Step by Step Example: Measures of Dispersion

    Using the data below, follow the instructions to identify the measures of dispersion for Age.

    Individual   Age   Visits
    1            45    15
    2            60    8
    3            52    22
    4            46    9
    5            23    2
    6            52    15
    7            37    3
    8            33    13

    Minimum, maximum, and range

    Step 1. Identify the minimum value of Age.

    The minimum value is the lowest value in the sample. In this case, it is 23.

    Step 2. Identify the maximum value of Age.

    The maximum value is the highest value in the sample. In this case, it is 60.

    Step 3. Determine the range of Age.

    max - min = range

    60 - 23 = 37

    37 is the range of the sample.

    Step 4. State your conclusions.

    The observations in Age cover a range of 37 years.

    Interquartile Range

    Step 1. Arrange observations of the variable Age in order of increasing value.

    1) 23   2) 33   3) 37   4) 45   5) 46   6) 52   7) 52   8) 60

    Step 2. Find the position of the 1st (Q1) and 3rd (Q3) quartiles.

    $$\text{position of } Q_1 = \frac{n+1}{4} \qquad \text{position of } Q_3 = \frac{3(n+1)}{4}$$

    $$\text{position of } Q_1 = \frac{8+1}{4} = 2.25 \qquad \text{position of } Q_3 = \frac{3(8+1)}{4} = 6.75$$

    Step 3. Locate each number indicated in the data set.

    Q1, with a position of 2.25, is one fourth of the way between the 2nd and 3rd observations in the set. The 2nd value is 33 and the 3rd is 37, so

    $$Q_1 = 33 + \frac{1}{4}(37 - 33) = 33 + 1 = 34$$

    Q3, with a position of 6.75, is three fourths of the way between the 6th and 7th observations in the set. The 6th value is 52 and the 7th value is also 52. Therefore, Q3 = 52.

    Step 4. Find the difference between Q1 and Q3 to determine the interquartile range.

    Q3 - Q1 = IQR

    Q1 = 34, Q3 = 52

    52 - 34 = 18

    Step 5. State your conclusions.

    The middle 50 percent of the data has a range of 18. This means that the middle half of all the observations in Age is spread across 18 years.

    Variance, standard deviation, and standard error

    Step 1. Find the mean of the data set.

    1) 23   2) 33   3) 37   4) 45   5) 46   6) 52   7) 52   8) 60

    $$\bar{x} = \frac{23+33+37+45+46+52+52+60}{8} = \frac{348}{8} = 43.5$$

    Step 2. Calculate the variance using the formula below.

    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

    $$s^2 = \frac{1}{8-1}\left[(23-43.5)^2 + (33-43.5)^2 + (37-43.5)^2 + (45-43.5)^2 + (46-43.5)^2 + 2(52-43.5)^2 + (60-43.5)^2\right]$$

    $$s^2 = \frac{1}{7}\left[420.25 + 110.25 + 42.25 + 2.25 + 6.25 + 2(72.25) + 272.25\right]$$

    $$s^2 = \frac{998}{7}$$

    $$s^2 = 142.57$$

    Step 3. Calculate the standard deviation.

    $$s = \sqrt{s^2}$$

    $$s = \sqrt{142.57}$$

    $$s = 11.94$$

    Step 4. Calculate the standard error of the means.

    $$SE = \frac{s}{\sqrt{n}}$$

    $$SE = \frac{11.94}{\sqrt{8}}$$

    $$SE = 4.22$$

    Step 5. State your conclusions.

    The observations are an average of 11.94 years away from the mean. If we were to take many samples of this size from the same population, the sample means would typically fall about 4.22 years from the actual population mean.
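    The measures of dispersion for Age can also be checked in code. This is a minimal sketch using Python's standard library rather than the workbook's Epi Info tools; it assumes that the `"exclusive"` method of `statistics.quantiles` matches the (n+1)/4 and 3(n+1)/4 position rule used in the hand calculation, which it does for this data set.

```python
import math
import statistics

# Age values from the step-by-step example.
ages = [45, 60, 52, 46, 23, 52, 37, 33]
n = len(ages)

data_range = max(ages) - min(ages)

# method="exclusive" places the quartiles at positions (n+1)/4 and 3(n+1)/4
# in the sorted data and interpolates between neighbouring observations.
q1, _, q3 = statistics.quantiles(ages, n=4, method="exclusive")
iqr = q3 - q1

variance = statistics.variance(ages)  # sample variance, divides by n - 1
std_dev = statistics.stdev(ages)      # square root of the sample variance
std_err = std_dev / math.sqrt(n)      # standard error of the mean

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}, SE = {std_err:.2f}")
```

    Other software may use a different quartile rule, so Q1 and Q3 from a statistics package can differ slightly from the hand-calculated values.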

    Practice: Measures of Dispersion

    Use the same data set to describe the dispersion of the observations of the variable Visits.

    Individual   Age   Visits
    1            45    15
    2            60    8
    3            52    22
    4            46    9
    5            23    2
    6            52    15
    7            37    3
    8            33    13

    Minimum, maximum, and range

    Practice Space

    Step 1. Identify the minimum value of Visits.

    Step 2. Identify the maximum value of Visits.

    Step 3. Determine the range of Visits.

    max - min = range

    Step 4. State your conclusions.

    Interquartile Range

    Practice Space

    Step 1. Arrange observations of the variable Visits in order of increasing value.

    Step 2. Find the position of the 1st (Q1) and 3rd (Q3) quartiles.

    $$\text{position of } Q_1 = \frac{n+1}{4} \qquad \text{position of } Q_3 = \frac{3(n+1)}{4}$$

    Step 3. Locate each number indicated in the data set.

    Step 4. Find the difference between Q1 and Q3 to determine the interquartile range.

    Q3 - Q1 = IQR

    Step 5. State your conclusions.

    Variance, standard deviation, and standard error

    Practice Space

    Step 1. Find the mean of the variable Visits.

    Step 2. Calculate the variance using the formula below.

    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

    Step 3. Calculate the standard deviation.

    $$s = \sqrt{s^2}$$

    Step 4. Calculate the standard error of the means.

    $$SE = \frac{s}{\sqrt{n}}$$

    Step 5. State your conclusions.

    Epi Info Example: Measures of Dispersion

    Use the table below (data set BabyWeighing) to find measures of dispersion for the variable Age in Epi Info. First find the maximum, minimum, range, and interquartile range. Then calculate the variance, the standard deviation, and the standard error.

    Age (months)   Length (cm)   Weight (kg)
    21             77            9.8
    34             87            11.5
    23             84            10.8
    30             92            14.0
    27             85            12.0
    24             82            10.8
    31             87            11.6
    26             85            11.8
    22             85            12.4
    32             86            12.0

    Step 1. READ the data set in Epi Info.

    - Open Epi Info and choose Analyze Data.
    - Select READ and open the database, Bios_Workbook_Examples. Choose the data set BabyWeighing.
    - Click OK.

    Step 2. Find the MEANS of the data set.

    - Select MEANS under the Statistics heading.
    - In the drop-down menu for Means Of, choose Age_in_months.
    - Click OK.

    Step 3. Use the output to determine the range and the interquartile range.

    The output provides you with the maximum and the minimum in the data. Find the difference to determine the range.

    Range = maximum - minimum
    Range = 34 - 21 = 13

    The output also provides the 25th percentile, equal to Q1, and the 75th percentile, equal to Q3, so that we can determine the interquartile range.

    IQR = Q3 - Q1
    IQR = 31 - 23 = 8

    Step 4. Use the output to identify the variance and standard deviation of the variable.

    Variance = 20.67
    Standard Deviation = 4.55

    If we want to calculate the standard error, we simply divide the standard deviation by the square root of the number of observations:

    $$SE = \frac{4.5461}{\sqrt{10}} = 1.44$$

    Step 5. Describe the variable in terms of dispersion.

    The range of the variable Age_in_months is 13 months. The middle half of the data spans 8 months. The average distance of each observation from the mean of the data is 4.55 months. If we were to take many samples of this size from the same population, the sample means would typically fall about 1.44 months from the actual population mean.
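    As a cross-check on the Epi Info output, the same quantities can be computed directly from the ten Age values. This is only an illustrative sketch in Python's standard library. Quartiles are left out on purpose, because different programs use different quartile rules and may not reproduce the 25th and 75th percentiles that Epi Info reports.

```python
import math
import statistics

# Age (months) column of the BabyWeighing table.
age_in_months = [21, 34, 23, 30, 27, 24, 31, 26, 22, 32]

data_range = max(age_in_months) - min(age_in_months)
variance = statistics.variance(age_in_months)         # divides by n - 1
std_dev = math.sqrt(variance)
std_err = std_dev / math.sqrt(len(age_in_months))     # s / sqrt(n)

print(f"range = {data_range}")
print(f"variance = {variance:.2f}, SD = {std_dev:.2f}, SE = {std_err:.2f}")
```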

    Epi Info Practice: Measures of Dispersion

    Use the same data set, BabyWeighing, to practice describing data in terms of dispersion with the help of Epi Info. Determine the range and interquartile range and identify the variance, standard deviation, and the standard error of the variable Length.

    Find the MEANS of the data set in Epi Info.

    Use the output to answer the following questions.

    Practice Space

    Step 1. Determine the range and the interquartile range.

    Range =

    IQR =

    Step 2. Identify the variance and standard deviation of the variable.

    $s$ = ______

    $s^2$ = ______

    Step 3. Describe the variable in terms of dispersion.

    Related Concepts

    Normal Distribution

  • Probability and the Normal Distribution

    Probability and the Normal Distribution

    Up to this point, we have focused on descriptive statistics. We have simply been organizing and summarizing data that has been collected. We also want to explore some methods for drawing conclusions about populations based solely on data that we have for a sample of that population. Because we can never be certain that our conclusions based on this sample accurately represent the target population, we refer to this as inferential statistics. Inferential statistics is based on probability theory, or the science of uncertainty. The following sections describe how probability theory allows us to make inferences about a population based on data obtained from a sample of that population.

  • Normal Distribution

    Probability Distribution

    Probability is an indicator of the likelihood that an event or condition will occur. Some describe it as the long-run relative frequency of the event in repeated trials under similar conditions. It reflects the proportion of the population with the condition or event. For example, if 40% of workers in a factory are female, the probability that a randomly selected worker will be a female is 40%. Stated another way, if we randomly select n workers, the expected number of females in the sample is n x 40%. Likewise, the expected number of males is n x (100% - 40%), or n x 60%.

    Probability can also be used to consider continuous variables (not just conditions or events as noted above). It can indicate the likelihood of a value in a particular range. For example, if 5% of men at the factory have a height over 180 cm, the probability that a randomly selected man will have a height over 180 cm is 5%.

    Probability distributions represent the probability of the different outcomes (e.g. male, female) for a sample selection. The relationship between the values of a variable and the probabilities of their occurrence can be summarized in a probability distribution.

    If we select a single worker from this factory, the probability distribution for the possible outcomes for gender is simple.

    Possible outcome   Probability
    Male               0.60
    Female             0.40

    If we select three workers, then the probability distribution becomes more complicated. Each mixed outcome can occur in three different orders, so its probability is three times the product for any single order.

    Possible outcomes    Probability
    All male             0.216 = (0.60 x 0.60 x 0.60)
    2 male, 1 female     0.432 = 3 x (0.60 x 0.60 x 0.40)
    2 female, 1 male     0.288 = 3 x (0.40 x 0.40 x 0.60)
    All female           0.064 = (0.40 x 0.40 x 0.40)

    Overview

    A probability distribution is a distribution of data based on the likelihood that an event or indicator will occur in a sample of the population.

    Knowledge of the probability distribution of a variable allows us to draw conclusions about a population based on data taken from a sample of that population.
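    The probabilities for three workers follow the binomial pattern: the product of the individual probabilities, multiplied by the number of orders in which the outcome can occur (3 for each mixed outcome, 1 for "all male" and "all female"). The sketch below illustrates this in Python. It assumes that each selection is independent with the same 40% chance of a female worker, and the function name `prob_females` is purely illustrative.

```python
from math import comb

P_FEMALE = 0.40   # proportion of female workers in the factory example
N_WORKERS = 3     # number of workers selected

def prob_females(k, n=N_WORKERS, p=P_FEMALE):
    """Probability of exactly k females among n independently selected workers."""
    # comb(n, k) counts the orders in which k females and n - k males can occur.
    return comb(n, k) * p**k * (1 - p)**(n - k)

distribution = {k: prob_females(k) for k in range(N_WORKERS + 1)}

for k, prob in distribution.items():
    print(f"{k} female, {N_WORKERS - k} male: {prob:.3f}")
```

    The four probabilities add up to 1, as they must for a complete probability distribution.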

    There are several model or theoretical probability distributions that will allow us to determine the probability of a given value for a random variable even if we do not have (or know) the full probability distribution for that variable. These probability distributions are given or calculated by mathematical formulae called probability functions. We can apply the model to create a probability density curve where the height of the curve reflects the frequency of the individual values and the area under the curve within an interval reflects the proportion of a population in that interval. This is also a probability distribution.

    Examples of probability distributions include the normal, binomial, Poisson, chi-square, F, and t distributions. For the sake of simplicity, the only distribution we will cover in this workbook is the normal distribution.

    Related Concepts

    Normal Distribution

    Normal Distribution

    The normal distribution is the most famous and important of the theoretical probability distributions for two main reasons. First, for many variables we encounter in the health field (e.g. height, blood pressure, hemoglobin level, etc.), it is a good description of the distribution of the variable. Second, and more importantly, the normal distribution has a central role in statistical analysis as it is used as the probability distribution of the sample means. Calculations based on the normal distribution are used to derive confidence intervals and determine p values for quantitative data, proportions, and rates.

    Characteristics of a normal distribution:

    - It is specified by two parameters: the population mean ($\mu$) and the standard deviation ($\sigma$).
    - It is symmetrical around the mean, bell shaped, and unimodal. This is why the normal curve is frequently referred to as the bell curve.
    - The mean, median, and mode are all in the middle of the curve.
    - The total area under the curve above the x-axis is one square unit, with 50% of the area to the right of the mean and 50% to the left of the mean.
    - According to the Empirical Rule:
      - The area bounded by one standard deviation to the right and one standard deviation to the left of the mean represents approximately 68% of the values.
      - The area bounded by two standard deviations to the right and two to the left represents approximately 95% of the values.
      - 99.7% of the values will be within three standard deviations of the mean.

    This is demonstrated in the graph on the next page:

    Overview

    The normal distribution is a bell-shaped curve with both the mean and the median at the center of the curve.

    The standard normal distribution is a distribution of data with a mean of zero and a standard deviation of one. It allows different populations to be compared to each other.

    Formula: The formula below is used to calculate the standard score, or the z score, when comparing normally distributed populations.

    $$z = \frac{x - \mu}{\sigma}$$

    Knowing the mean and standard deviation of a normal distribution allows one to determine the following values:

    - The proportion of individuals who fall into any range of values
    - The percentile at which a given value falls
    - The value which corresponds to a given percentile

    Below is a frequency distribution of the height of men in the US population, characterized by a normal distribution with a mean of 171.5 cm and a standard deviation of 6.5 cm.

    [Figure: normal curve of the height of men in the US population, centered at $\mu$ = 171.5 cm.]

    Given that the mean height of the men in the US is 171.5 cm ($\mu$ = 171.5 cm) and the standard deviation is 6.5 cm ($\sigma$ = 6.5 cm), and using our knowledge of the normal curve, we know the following information:

    - 68.3% of men are between 165 and 178 cm ($\mu \pm 1\sigma = 171.5 \pm 6.5$)
    - 95.5% of men are between 158.5 and 184.5 cm ($\mu \pm 2\sigma = 171.5 \pm 2 \times 6.5$)
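    The percentages quoted from the Empirical Rule can be reproduced from the normal distribution itself. The following is a small sketch using `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is an illustrative choice, not something defined in this workbook.

```python
from statistics import NormalDist

# Heights of US men as described in the text: mean 171.5 cm, SD 6.5 cm.
heights = NormalDist(mu=171.5, sigma=6.5)

def share_within(k):
    """Return (low, high, proportion) for the interval mean +/- k standard deviations."""
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    # cdf(x) is the area under the curve to the left of x.
    return low, high, heights.cdf(high) - heights.cdf(low)

for k in (1, 2, 3):
    low, high, share = share_within(k)
    print(f"within {k} SD: {low:.1f} to {high:.1f} cm, {share:.1%} of men")
```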

    What if we want to know specific information such as:

    - What proportion of men are over 180 cm?
    - What height value is at the 10th percentile?

    Statisticians have devised a method to transform all normal distributions so that they use the same scale. This is known as the standard normal distribution. The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. A normal distribution can be compared with other normal distributions by converting it to a standard normal distribution using the formula shown below. The standard normal distribution specifies how far an individual value is from the mean in units of the standard deviation, which allows us to calculate a standard score. The standard score is a way of expressing an individual value in terms of standard deviation units. The standard score, referred to as the z score, is calculated as (observed value - mean) divided by the standard deviation. The formula is below:

    $$z = \frac{x - \mu}{\sigma}$$

    The z score will also be referred to as a test statistic. Each distribution has a corresponding test statistic. The z score corresponds with the standard normal distribution.

    Example: Using the Standard Normal Distribution

    Given a normal distribution of male heights with $\mu$ = 171.5 cm and $\sigma$ = 6.5 cm, what is the proportion of men taller than 180 cm?

    $$z = \frac{x - \mu}{\sigma} = \frac{180 - 171.5}{6.5}$$

    $$z = \frac{8.5}{6.5} = 1.31$$

    Now that we know the z score, we must find the area of the standard normal curve above 1.31.

    In order to find the area of the curve that is represented by the z score, 1.31, we must refer to the standard normal z distribution located in Appendix 2.

    On the Standard Normal z Table, locate the z score 1.31. Under the column labeled z, find the value 1.3. The row labeled z will provide you with the hundredths place of your z score, so follow it over until 0.01. If you place one finger on 1.3 and one finger on 0.01 and follow those paths until your two fingers meet, you find the value 0.9049. Use the excerpt from the Standard Normal z Table on the following page to help you locate the z score.

    [Figure: standard normal curve with 0 and z = 1.31 marked on the horizontal axis.]

    This table will give us the area of the curve located to the LEFT of the z score. As you can see by the diagram, we want to find the area of the curve located to the RIGHT of the z score. To find the area to the right of the z score, we subtract 0.9049 from 1.

    1 - 0.9049 = 0.0951

    Therefore, approximately 9.5% (0.0951 x 100%) of the curve is above 180 cm (or more than 1.31 SD above the mean). We can also say that men whose heights are 180 cm and above are taller than 90.5% of American men. Thus, a height of 180 cm represents approximately the 90th percentile.
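    The table lookup can be mirrored in code. This sketch uses `NormalDist` from Python's standard library in place of the printed z table. The z score is rounded to two decimal places only so that the result matches the table; software would normally keep the unrounded value.

```python
from statistics import NormalDist

mu, sigma = 171.5, 6.5   # mean and SD of male heights (cm)
x = 180                  # height of interest (cm)

# Standard score, rounded to two decimals as when using a printed z table.
z = round((x - mu) / sigma, 2)

area_left = NormalDist().cdf(z)   # area to the LEFT of z, as given by the table
area_right = 1 - area_left        # area to the RIGHT of z

print(f"z = {z}")
print(f"area to the left  = {area_left:.4f}")
print(f"area to the right = {area_right:.4f}")
```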

    To practice using the table for the standard normal distribution, answer the following question.

    What height value is at the 10th percentile? We manipulate the formula to solve for x rather than z:

    $$x = \mu + (z \times \sigma)$$

    where:

    - x is the observed value
    - $\mu$ is the population mean (given)
    - $\sigma$ is the population standard deviation (given)
    - z comes from the standard normal distribution

    To find the answer to this problem, first look up the z score from the table in Appendix 2 which corresponds to the lowest 10% of the area beneath the curve. This area will be on the left-hand side of the curve. Do this by reversing the steps we previously used to find the area.

    Locate the area closest to 0.10 in the z table. Then follow the row and column to identify the z score that it is associated with. You should find a z score of -1.28.

    $$x = \mu + (z \times \sigma)$$

    $$x = 171.5 + (-1.28 \times 6.5)$$

    $$x = 171.5 - 8.32 = 163.18$$

    The 10th percentile is 163.2 cm. This means that 10% of American men are 163.2 cm or shorter and 90% of American men are taller than 163.2 cm.
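    Reversing the table lookup corresponds to the inverse of the cumulative distribution function. The sketch below uses `NormalDist.inv_cdf` from Python's standard library. Because it keeps the unrounded z score (about -1.2816 rather than -1.28), the height it returns can differ in the second decimal place from a hand calculation, and it rounds to about 163.2 cm.

```python
from statistics import NormalDist

mu, sigma = 171.5, 6.5   # mean and SD of male heights (cm)

# z score with 10% of the area under the standard normal curve to its left.
z_10 = NormalDist().inv_cdf(0.10)

# Convert the z score back to a height: x = mu + (z * sigma).
x_10 = mu + z_10 * sigma

print(f"z at the 10th percentile = {z_10:.2f}")
print(f"height at the 10th percentile = {x_10:.1f} cm")
```

    The same value is returned in one step by `NormalDist(mu, sigma).inv_cdf(0.10)`.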

    Practice: Using the Standard Normal Distribution

    You have attended an HIV/AIDS training where a pretest and a posttest were given in order to measure knowledge gained. Pretest scores are included in the table below. Use the table to answer the following questions.

    Pretest Scores: HIV Knowledge

            Females   Males
    Mean    60        40
    SD      12        10
    N       138       97

    1. If a male gets a score of 70, what is his z score?
    2. What is the z score for a female with a score of 35?
    3. What score for females is equivalent to a male's score of 78?

    Related Concepts

    Central Limit Theorem

  • Central Limit Theorem

    Central Limit Theorem

    Not all data is normally distributed. Data that is not normally distributed requires different tests in order to properly analyze and compare it. Fortunately, if we have an adequately large sample size (n > 30), the sampling distribution of the sample mean tends to approach normality and we are able to treat it as normal, even when the underlying data is not normally distributed. This concept is known as the Central Limit Theorem.

    Just as we calculated the standard deviation for a distribution of individual values around a mean, we now can calculate a similar measure of variability for a series of samples from the population. This is the Standard Error of the statistic, and it measures the precision of the statistic (mean or proportion) as an estimate of the population mean or population proportion. It indicates the degree to which a sample statistic reflects the true population value.

    The standard error is the basis for calculating confidence intervals and conducting hypothesis tests for means and proportions. This allows us to make generalizations about a larger group of individuals based on a subset or sample.

    As you know, most epidemiologic studies are carried out with the aim of learning about a characteristic in a target population. It is rarely feasible to study every individual. Therefore, we usually compare exposures or disease within a sample of the population. A major role of statistics is to allow us to generalize results from a sample to the larger group and understand how accurately that generalization reflects the actual population mean (or proportion).

    Overview

    The sampling distribution of sample statistics (mean or proportion) will look normally distributed for large sample sizes.

    Simply, if the sample size is large (typically n > 30), the distribution of sample means or sample proportions approximates a normal distribution.

    Formula:

    $$SE = \frac{s}{\sqrt{n}}$$

    Thus, the standard error becomes smaller as n gets bigger, meaning that the larger the sample size, the more likely it is that the sample mean, $\bar{x}$, is close to the population mean, $\mu$.
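    A short simulation can make the Central Limit Theorem and the standard error more concrete. The sketch below is illustrative only: the population (an exponential distribution with mean 1 and standard deviation 1), the sample size of 40, and the 2,000 repeated samples are all assumptions chosen for the demonstration, not values from this workbook.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers on every run

SAMPLE_SIZE = 40     # n > 30, in line with the rule of thumb above
NUM_SAMPLES = 2000   # number of repeated samples (arbitrary choice)

def draw_sample(n):
    """Draw n values from a right-skewed population (exponential, mean 1, SD 1)."""
    return [random.expovariate(1.0) for _ in range(n)]

# Take many samples and record the mean of each one.
sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE)) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)      # observed spread of the sample means
theoretical_se = 1.0 / math.sqrt(SAMPLE_SIZE)     # sigma / sqrt(n), with sigma = 1

# If the sample means are roughly normal, about 95% of them should lie
# within 1.96 standard errors of the population mean.
share_within = sum(
    abs(m - 1.0) <= 1.96 * theoretical_se for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"SD of sample means   = {sd_of_means:.3f}")
print(f"sigma / sqrt(n)      = {theoretical_se:.3f}")
print(f"share within 1.96 SE = {share_within:.3f}")
```

    Even though the individual values are strongly skewed, the spread of the sample means should be close to sigma divided by the square root of n. Increasing `SAMPLE_SIZE` makes that spread smaller.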

    Related Concepts

    Statistical Inference

    Standard Deviation vs. Standard Error

    Both are measures of variation in a data set.

    Standard deviation is a measure of variation of individual observations from the mean in a set of data.

    Standard error of the mean measures the standard deviation of the sample means.

  • Statistical Inference

    Statistical Inference

    For individual values we use the z score to tell us how far an individual value is from the mean of the sample. Any sample will have an element of random error, meaning that by chance it may not look exactly like the population from which it was drawn. Inferential statistics allows us to quantify the amount of random error.

    The steps for conducting inferential statistical tests are similar for each test:

    1. State the null and alternative hypotheses.
    2. Determine the decision rule.
    3. Conduct the appropriate test.
    4. Interpret the results.

    1. State the null and alternative hypotheses

    Hypotheses are formulated based on proving or disproving the status quo, or what we currently regard to be true. Each time we test a new idea, we are in actuality comparing it to our old idea of what already is known. For example, if we know chloroquine to be an effective malaria drug, then when we test the effectiveness of a new drug such as sulfadoxine-pyrimethamine, we use the old drug, chloroquine, as the baseline. Thus, our expectation is that chloroquine works and there will be no difference found by using the new drug. This becomes the null hypothesis, or $H_0$. The alternate hypothesis ($H_A$), often referred to as the research hypothesis, then represents the possibility that there is a difference between the new drug and the old drug. As we know, a difference can be either higher or lower, better or worse. If we are testing for any difference, we will use a two-tailed test. If we are testing to see in which direction the difference lies, we use a one-tailed test. Using the same level of significance (alpha value), a two-tailed test is more stringent than a one-tailed test.

    2. Determine the decision rule

    An alpha value ($\alpha$) determines the level of significance at which you will conduct your test. This value is chosen by the researcher. The most common alpha value, and one which is considered an acceptable level of significance by researchers worldwide, is 0.05, or 5 percent. You will also see an alpha value of 0.10, but anything above that is generally considered to be too lenient to account for differences beyond those which are random or coincidental occurrences.

    We can generally determine the results of hypothesis testing in three ways: 1) by comparing a calculated value ($t_{calc}$) to a critical value ($t_{crit}$), 2) by comparing the alpha value to a p value, and 3) by determining if the value specified in the null hypothesis is contained within the limits of a confidence interval. The calculated value is also referred to as the test statistic and is calculated through the use of descriptive statistics for the sample. A critical value is identified by using the correct table. An alpha value, as previously discussed, is specified by the researcher and will be given. The p value corresponds to the value of the computed test statistic and can be found in some tables, or determined using a statistical software package.

    When the value of the computed test statistic exceeds the critical value (i.e., $t_{calc} > t_{crit}$, comparing absolute values for a two-tailed test), we can reject the null hypothesis. When $\alpha > p$, we can also reject the null hypothesis. Lastly, if the value specified in the null hypothesis is not contained within the limits of our confidence interval, we can once again reject the null hypothesis. Note that when we are not able to reject the null, we use the phrase "fail to reject the null." We never accept the null. We only reject it or fail to reject it. By rejecting the null, we conclude that the data support our alternative hypothesis.

    3. Conduct the appropriate test

    There are several different test statistics that you must choose from when testing for statistical significance. The test statistic you will use depends on the known parameters of the variable. If a population standard deviation ($\sigma$) is known, then we use the z test. With the exception of tests of proportion or very small populations, we will generally know only the standard deviation of a sample ($s$), in which case we use the t test. Therefore, when talking about statistical tests in general, we are referring to the t distribution. The t distribution looks very similar to the standard normal (z) distribution, but the tails on either side of the curve are longer.

    Let us now revisit the general formula for the construction of a test statistic:

    $$\text{test statistic} = \frac{\text{sample statistic} - \text{hypothesized population parameter}}{\text{standard error of the relevant sample statistic}}$$

    For continuous data analyzed using the two-sample t test, the numerator compares the difference between the two sample means, $(\bar{x}_1 - \bar{x}_2)$, referred to as the sample statistic or point estimate here, with the difference that would be expected under a true null hypothesis (i.e., $H_0: \mu_1 - \mu_2 = 0$), referred to as the hypothesized population parameter, which often equals zero. The denominator is made up of the standard error, which serves as our measure of variability.

    4. Interpret the results

    The distribution tables that you will need in order to interpret results when conducting tests by hand are included at the end of this workbook. They include the Student's t table, the standard normal z distribution, and the chi-square distribution tables. Tables needed to complete the exercises presented in this workbook are included in Appendix 2.

  • Confidence Interval Around a Mean

    Confidence Interval Around a Mean

    The sample mean ($\bar{x}$) estimates the population mean ($\mu$) but supplies no information on the variability of, or our confidence in, the estimate. For this reason, we use confidence intervals.

    TheintervalestimatemakesuseoftheCentralLimitTheoremandthezscore.Wefirstdeterminehowconfidentwewanttobeinourestimate.Themostcommonlevelofconfidenceis95%.AswelearnedwiththeEmpiricalRule,afeatureofthenormalcurveisthat95%ofthevalueswillbewithintwostandarddeviationsofthemean. Thisvalueof2isroundedupfromtheexactvalueof1.96. Thustheprobability(P)thatzfallsbetween1.96and+1.96is0.95,or95%.

    Ifwesubstituteourformula,n/)x( ,forz,weget

    Aftersomealgebra,weendupwiththeformulaforthe95%confidenceintervalaroundthemeanas:

    Theprobabilitythatthepopulationmeanliesbetweenoursamplemeanisplusorminus1.96timesthestandarderror,whichisequalto95%. Themultiplier1.96waschosenfromthestandardztablewithanalpha0.05.If,forexample,wewantedtocalculatea99%confidenceinterval,wewouldusethezscorethatcorrespondswithanalphaof0.01. (Notethatitisthestandarderrorofthemeanthatwearemultiplyingbythezscore.)

    Overview

    Theconfidenceintervalofthemeangivestherangeofplausiblevaluesforthetruepopulationmean.

    95%ofthetime,thepopulationmeanwillbewithinapproximatelytwostandarderrorsofthesamplemean.

    Formula:

    95%CI= )n

    96.1+x,

    n

    96.1x(

    95.0)96.196.1( = + - zP

    95.0)96.1(

    96.1( = + /

    ) -

    n

    xP

    s m

    95.0=)n

    96.1+x

    n

    96.1x(P

    )n

    96.1+x,

    n

    96.1x(

  • ConfidenceIntervalAroundaMean

    BiostatisticsWorkbook 66DRAFT:Aug.28,2007

    Thus,the95%confidenceintervalis:
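The z multipliers discussed above can also be recovered without a printed table. The following is a minimal sketch, assuming Python 3.8 or later and its standard `statistics.NormalDist` class; the helper name `z_multiplier` is hypothetical and is not part of the workbook.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided z multiplier for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence
    # Half of alpha goes in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_multiplier(level):.3f}")
```

For a 95% interval this returns approximately 1.96, and for a 99% interval approximately 2.576, which is the value to substitute for 1.96 in the formula.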

    Step-by-Step Example: Confidence Interval Around a Mean

    You want to determine the mean blood pressure among government employees. In order to do this, you measure the blood pressure of 200 employees. Use the descriptive statistics below to determine a 95% confidence interval around the mean.

    n = 200, $\bar{x}$ = 127 mmHg, s = 13

    Step 1. Calculate the standard error of the mean: $SE = s/\sqrt{n}$

    Example: $SE = 13/\sqrt{200} = 0.92$

    Step 2. Find the lower limit of the 95% confidence interval: $95\%\ LL = \bar{x} - 1.96(SE)$

    Example: 95% LL = 127 - 1.96(0.92) = 127 - 1.80 = 125.2

    Step 3. Find the upper limit of the 95% confidence interval: $95\%\ UL = \bar{x} + 1.96(SE)$

    Example: 95% UL = 127 + 1.96(0.92) = 127 + 1.80 = 128.8

    Step 4. Interpret the 95% confidence interval.

    Example: The 95% confidence interval is (125.2, 128.8). This means that with repeated random sampling, 95% of the intervals calculated in this way will contain the true population mean. We are, therefore, 95% confident that this is one of those intervals and that the true mean of the population ($\mu$) is between 125.2 and 128.8.
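The hand calculation above can be checked with a few lines of code. This is a sketch under stated assumptions: it uses only Python's standard `math` module, the function name `mean_ci_z` is hypothetical, and it carries the unrounded standard error through the calculation, whereas the worked example rounds the standard error to 0.92 first.

```python
import math


def mean_ci_z(mean, sd, n, z=1.96):
    """Return (standard error, lower limit, upper limit) for a z-based CI."""
    se = sd / math.sqrt(n)      # Step 1: standard error of the mean
    margin = z * se             # z multiplier times the standard error
    return se, mean - margin, mean + margin


# Blood pressure example: n = 200, mean = 127 mmHg, s = 13
se, lower, upper = mean_ci_z(mean=127, sd=13, n=200)
print(f"SE = {se:.2f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

Rounded to one decimal place, the limits agree with the hand-calculated interval of (125.2, 128.8).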


    Practice: Confidence Interval Around a Mean

    You record gestational age at birth for live births in the past month at three primary health facilities in the region. Calculate a 95% confidence interval around the mean.

    n = 350, $\bar{x}$ = 37.5 weeks, s = 12.2

    Step 1. Calculate the standard error of the mean: $SE = s/\sqrt{n}$

    Practice space:

    Step 2. Find the lower limit of the 95% confidence interval: $95\%\ LL = \bar{x} - 1.96(SE)$

    Practice space:

    Step 3. Find the upper limit of the 95% confidence interval: $95\%\ UL = \bar{x} + 1.96(SE)$

    Practice space:

    Step 4. Interpret the 95% confidence interval.

    Practice space:


    OpenEpi Example: Confidence Interval Around a Mean

    Using the same blood pressure data as before, use OpenEpi to calculate a 95% confidence interval around the mean.

    n = 200, $\bar{x}$ = 127 mmHg, s = 13

    Step 1. Open the OpenEpi application.

    Example: From the OpenEpi menu, choose Mean CI under the heading Continuous Variables.

    Step 2. Enter the descriptive statistics as prompted.

    Example: Click on Enter New Data. The data entry screen will open. Use the given information to fill in the boxes.

    Notice that you only need to provide one of the standard deviation, the standard error, or the variance. You do not need to provide all three. Since the standard deviation is given, this is the statistic that we will use.

    Because our population is large and unknown, we can use the default number, 999999999, to represent the population size. If you have a known population, specify that number here.

    Step 3. Calculate the 95% confidence interval.

    Example: Click on the button labeled Calculate. A pop-up will open displaying the results of the calculation. Note that you must set your browser to allow pop-ups in order to view the results.

    Step 4. Interpret the results.

    Example: Choose the 95% confidence interval corresponding with the t test, since we do not know the variance of the population, only the standard deviation of the sample.

    The 95% confidence interval is (125.2, 128.8).

    With repeated random sampling, 95% of the intervals calculated in this way will contain the true population mean. We are, therefore, 95% confident that this is one of those intervals and that the true mean of the population ($\mu$) is between 125.2 and 128.8.


    Excel Example: Confidence Interval Around a Mean

    We can find a confidence interval around a mean using descriptive statistics in Excel as well. Use the same blood pressure data that we used in the previous example.

    Step 1. Select the confidence interval function in Excel.

    Example: In a blank worksheet, choose Insert from the toolbar. From the drop-down menu, select Function.

    Type confidence interval in the box labeled Search for a function. The function for confidence intervals, CONFIDENCE, will appear as your only option. Alternatively, you can scroll down the list of functions until you find the one labeled CONFIDENCE.

    Click on OK.

    Step 2. Enter the descriptive statistics.

    Example: You will be prompted to enter the alpha, standard deviation, and sample size. Since we are calculating a 95% confidence interval, $\alpha$ = 1.00 - 0.95 and is therefore 0.05.

    Click on OK.

    The result will then be displayed on the worksheet in the cell marked by your cursor.

    The result is the equivalent of z × SE.

    Step 3. Calculate the 95% confidence interval.

    Example: We can calculate the 95% confidence interval by subtracting 1.80 from, and adding 1.80 to, our sample mean of 127.

    95% LL = 127 - 1.80 = 125.2

    95% UL = 127 + 1.80 = 128.8

    Step 4. Interpret your results.

    Example: The 95% confidence interval is (125.2, 128.8). This means that with repeated random sampling, 95% of the intervals calculated in this way will contain the true population mean. We are, therefore, 95% confident that this is one of those intervals and that the true mean of the population ($\mu$) is between 125.2 and 128.8.
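For readers who want to see what the CONFIDENCE result represents, the quantity z × SE can be reproduced as follows. This is a sketch, not Excel's implementation: the function name `confidence_margin` and its parameter names are hypothetical, and it assumes Python 3.8 or later for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def confidence_margin(alpha, standard_dev, size):
    """Margin of error z * SE for a two-sided interval at level 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    return z * standard_dev / math.sqrt(size)


margin = confidence_margin(alpha=0.05, standard_dev=13, size=200)
lower, upper = 127 - margin, 127 + margin
print(f"margin = {margin:.2f}; 95% CI = ({lower:.1f}, {upper:.1f})")
```

The margin rounds to 1.80, the same value that is subtracted from and added to the sample mean in Step 3.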

    You can also use Excel to find the confidence interval around the mean if you are given a dataset instead of descriptive statistics.

    Excel Example: Confidence Interval Around a Mean

    For this example, we will use the dataset Sit/Lie. Calculate a 95% confidence interval around the mean for the variable Sitting.

    Step 1. Import the dataset into Excel.

    Example: Import the dataset, two sample t, by using the directions in the box below.

    To open a dataset in Excel:

    1. Choose the heading Data from the toolbar.
    2. Click on Import External Data.
    3. Click on Import Data.
    4. Open the folder where you have stored the database.
    5. Choose the table that you will be working from.
    6. Click OK.
    7. Choose where you would like to put the data by selecting a cell of the current worksheet or selecting a new worksheet.
    8. Click OK.

    Step 2. Calculate the 95% confidence interval using Excel.

    Example: Choose Tools from the toolbar. Select Data Analysis from the drop-down box. Highlight Descriptive Statistics and click OK. The Descriptive Statistics dialogue box will open.

    Click on the chart icon next to the text box marked Input Range.

    Highlight the column for the variable Sitting by clicking on the letter which corresponds with the column.

    Click on the chart icon in the box labeled Descriptive Statistics to return to the dialogue box.

    Check the box next to Labels in First Row.

    Next, choose your output options. A new worksheet is chosen as the default, but if you would like your output to appear on the same worksheet as your dataset, select the first option under Output options, Output Range. Click on the icon next to the text box. Choose the area where you would like your output to appear by clicking on a cell. Click on the icon again to return to the dialogue box.

    Check the boxes next to Summary statistics and Confidence level for Mean.

    Click OK.

    Step 3. Use the output to calculate the confidence interval.

    Example: Excel will display the summary statistics output. Notice that the output does not actually provide you with a confidence interval. Instead, you are given a number which represents the distance from the mean to each limit. To find the confidence interval around the mean, subtract this number from the mean and add this number to the mean.

    95% CI = $\bar{x}$ ± confidence level = (80.95 - 14.13, 80.95 + 14.13) = (66.82, 95.08)

    Step 4. Interpret the results.

    Example: The 95% confidence interval around the mean is (66.82, 95.08). With repeated random sampling, 95% of the intervals calculated in this way will contain the true population mean. We are, therefore, 95% confident that this is one of those intervals and that the true mean of the population ($\mu$) is between 66.82 and 95.08.
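The same "mean plus or minus a margin" logic can be sketched for raw data. Everything specific here is an assumption made for illustration: the `sitting` values are hypothetical (they are not the workbook's Sit/Lie dataset), the function name `mean_ci_from_data` is invented, and the critical value is supplied by the caller from the Student's t table (2.262 for 9 degrees of freedom at a two-sided alpha of 0.05), on the understanding that spreadsheet output of this kind is based on the t distribution.

```python
import math
import statistics


def mean_ci_from_data(values, critical_value):
    """Return (mean, margin, (lower, upper)) for a list of observations."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1 denominator)
    margin = critical_value * sd / math.sqrt(n)
    return mean, margin, (mean - margin, mean + margin)


# Hypothetical measurements for a variable such as Sitting (n = 10).
sitting = [72, 85, 91, 64, 78, 102, 88, 69, 95, 80]
T_CRIT_9_DF = 2.262   # Student's t table: two-sided alpha = 0.05, df = 9

mean, margin, (lower, upper) = mean_ci_from_data(sitting, T_CRIT_9_DF)
print(f"mean = {mean:.2f}, margin = {margin:.2f}, "
      f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The margin plays the role of the single number reported in the spreadsheet output: it is subtracted from and added to the mean to obtain the two limits.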

    Excel or OpenEpi Practice: Confidence Interval Around a Mean

    Using the data from the HIV Knowledge pretest, calculate the 95% confidence interval around the mean score for females in either Excel or OpenEpi.

    Pretest Scores: HIV Knowledge

                Females    Males
    Mean        60         40
    SD          12         10
    N           138        97

    For additional practice, calculate the 95% confidence interval around the mean score for males by using the computer application that you did not previously use.

    Step 1. Open the appropriate application.

    Practice space:

    Step 2. Enter the descriptive statistics.

    Practice space:

    Step 3. Calculate the 95% confidence interval.

    Practice space:

    Step 4. Interpret your results.

    Practice space:

    Related Concepts

    Confidence Interval Around a Proportion
    Confidence Interval: Two Sample t Test


    Confidence Interval Around a Proportion

    The Central Limit Theorem also applies when considering a distribution of sample proportions, when the sample size is large enough. The sampling distribution would be constructed similarly as for the mean. However, the characteristics of the sampling distribution will be different, as this is a binomial distribution. We will be estimating the population proportion rather than the population mean. Since the binomial distribution is a sampling distribution for p, its mean equals the population proportion and its standard deviation represents the standard error (SE).

    n = sample size or number of trials
    p = probability of success
    1 - p = probability of failure

    $$SE\ \text{of the proportion} = \sqrt{\frac{p(1-p)}{n}}$$

    As the sample size, n, increases, the binomial distribution becomes very close to a normal distribution due to the Central Limit Theorem.

    Therefore, the normal distribution can be used to calculate confidence intervals and do hypothesis tests.

    If np and n(1 - p) are both equal to 10 or more, then the normal approximation may be used.

    Similar to the method used to calculate a confidence interval around a mean, to calculate the 95% confidence interval around a proportion, we first calculate the standard error of the proportion and then use the same formula:

    $$95\%\ CI = p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$$

    Overview

    The confidence interval around a proportion gives the range of plausible values for the true population proportion.

    95% of the time, the population proportion will be within approximately two standard errors of the sample proportion.

    Formula:

    $$95\%\ CI = \left(p - 1.96\sqrt{\frac{p(1-p)}{n}},\ p + 1.96\sqrt{\frac{p(1-p)}{n}}\right)$$


    Step-by-Step Example: Confidence Interval Around a Proportion

    Out of 212 pregnant women tested for HIV, 53 had positive results. Use this information to find a 95% confidence interval for the population.

    Step 1. Identify p and 1 - p.

    Example: p, the proportion of successes = 53/212 = 0.25

    1 - p, the proportion of failures = 1 - 0.25 = 0.75

    Step 2. Calculate the 95% lower limit: $95\%\ LL = p - 1.96\sqrt{\frac{p(1-p)}{n}}$

    Example:

    $$95\%\ LL = 0.25 - 1.96\sqrt{\frac{0.25(0.75)}{212}} = 0.25 - 1.96\sqrt{\frac{0.1875}{212}} = 0.25 - 1.96\sqrt{0.00088}$$

    = 0.25 - (1.96 × 0.0297) ≈ 0.25 - 0.0583 = 0.1917

    Step 3. Calculate the 95% upper limit: $95\%\ UL = p + 1.96\sqrt{\frac{p(1-p)}{n}}$

    Example:

    $$95\%\ UL = 0.25 + 1.96\sqrt{\frac{0.25(0.75)}{212}}$$

    = 0.25 + 0.0583 = 0.3083

    Step 4. Interpret the interval.

    Example: The 95% confidence interval is (0.19, 0.31). With repeated random sampling, 95% of intervals calculated will contain the true proportion of the population. We are 95% confident that this is one of those intervals and that the prevalence of HIV in the population is between 19% and 31%.

    Note: You will see (1 - p) referred to as q later in this workbook, as well as in many biostatistics texts.
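The proportion calculation, together with the rule that np and n(1 - p) should both be at least 10, can be sketched in code. The function name `proportion_ci_wald` is hypothetical, and raising an error when the rule is not met is a design choice made for this illustration rather than something the workbook prescribes.

```python
import math


def proportion_ci_wald(successes, n, z=1.96):
    """Normal-approximation (Wald) CI for a proportion.

    Returns (p, standard error, (lower limit, upper limit)).
    """
    p = successes / n
    # Check the workbook's rule of thumb before using the normal approximation.
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("np or n(1 - p) is below 10; "
                         "the normal approximation may not be appropriate")
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)


# HIV example: 53 positive results out of 212 women tested
p, se, (lower, upper) = proportion_ci_wald(53, 212)
print(f"p = {p:.2f}, SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Rounded to two decimal places, the limits match the interval (0.19, 0.31) reported in the example.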


    Practice: Confidence Interval Around a Proportion

    Upon testing 250 confirmed AIDS cases, you find that 116 are positive for tuberculosis. Find the 95% confidence interval around the proportion of AIDS patients infected with TB.

    Step 1. Identify p and 1 - p.

    Practice space:

    Step 2. Calculate the 95% lower limit: $95\%\ LL = p - 1.96\sqrt{\frac{p(1-p)}{n}}$

    Practice space:

    Step 3. Calculate the 95% upper limit: $95\%\ UL = p + 1.96\sqrt{\frac{p(1-p)}{n}}$

    Practice space:

    Step 4. Interpret the interval.

    Practice space:


    OpenEpi Example: Confidence Interval Around a Proportion

    Using the previous example, we will demonstrate how to calculate a 95% confidence interval around a proportion. Out of 212 pregnant women tested for HIV, 53 had positive results. Use this information to find a 95% confidence interval for the population in OpenEpi.

    Step 1. Open the OpenEpi application.

    Example: From the OpenEpi menu, choose Proportion under the heading Counts.

    Step 2. Enter the proportion data as prompted.

    Example: Click on Enter New Data. The data entry screen will open.

    Use the given information to fill in the boxes. The numerator is always the number of successes (here, 53). The denominator is the size of the population or sample (here, 212).

    Step 3. Calculate the 95% confidence interval.

    Example: Click on the button labeled Calculate.

    A pop-up will open displaying the results of the calculation. Note that you must set your browser to allow pop-ups in order to view the results.

    Step 4. Interpret the results.

    Example: OpenEpi calculates the 95% confidence interval by using several different methods. Though the editors recommend looking at the Mid-P Exact interval first, it is the Wald (Normal Approximation) interval that corresponds most closely with our hand calculations.

    The 95% confidence interval is (0.19, 0.31). With repeated random sampling, 95% of intervals calculated will contain the true proportion of the population. We are 95% confident that this is one of those intervals and that the prevalence of HIV in the population is between 19% and 31%.


    OpenEpi Practice: Confidence Interval Around a Proportion

    There has been a meningitis outbreak. You find that in one school, three students out of an enrolled 400 have been infected with meningitis. Use OpenEpi to calculate a 95% confidence interval.

    Step 1. Open the OpenEpi application.

    Practice space:

    Step 2. Enter the proportion data as prompted.

    Practice space:

    Step 3. Calculate the 95% confidence interval.

    Practice space:

    Step 4. Interpret the results.

    Practice space:

    Related Concepts

    Confidence Interval: z test of Proportions


    Hypothesis Testing: Two Sample t Test

    Used for continuous data, the t test is one of the most commonly used statistical tests performed in the public health and clinical literature. Hypothesis testing using the t test allows us to determine whether the observed difference between the mean values of two groups is statistically significant.

    Overview

    Test employed to evaluate the null hypothesis ($H_0$) that the population means are equal versus the alternative hypothesis ($H_a$) that the population means are different. This test is used to compare the means of two independent samples.

    Example: Comparing the difference in mean blood pressure for a sample of refugees to that of a sample of host country residents.

    Formula:

    $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$$

    Where:

    $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

    and is referred to as the pooled variance.

    Assumptions:
    o Two independent random samples
    o Normally distributed population
    o Equal, but unknown variances in the two samples (Note: There is a method to compare two samples with unequal variances called Satterthwaite's method. Please refer to a biostatistics text for further explanation.)

    Type of variables: Continuous

    Decision rule: If the calculated value of t ($t_{calc}$) is greater than the critical value of t ($t_{crit}$), then we can reject the null hypothesis.

    Table used: Student's t table

    A vital component used in the calculation of the standard error for the two sample t test is the pooled variance, denoted $s_p^2$. As indicated above, a major assumption necessary for the validity of the two sample t test is that the variances are unknown, but assumed to be equal. We can justify this assumption by dividing the variance of one sample by the variance of the second sample ($s_1^2/s_2^2$). If $s_1^2/s_2^2$ equals a value of less than three, assume that the variances are approximately equal. The closer that this value is to one, the more equal the variances are. When this assumption is justified, a pooled estimate of the common variance ($s_p^2$) can be calculated, which estimates the overall variance of the entire study population.

    The pooled estimate is obtained by computing the weighted average of the two sample variances. The sample variances ($s_1^2$ and $s_2^2$) are weighted according to the number of observations in each. If the sample sizes are equal ($n_1 = n_2$), this weighted average is the mean of the two sample variances. If the two groups are of unequal size ($n_1 \ne n_2$), the pooled variance is calculated as follows:

    $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

    Our test statistic follows the Student's t distribution with $n_1 + n_2 - 2$ degrees of freedom.
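The variance-ratio check and the pooled variance can be sketched as two small functions. The names `variance_ratio` and `pooled_variance` are hypothetical, and placing the larger variance in the numerator of the ratio is an assumption made here so that the result is always at least one; the inputs are the summary statistics of the birth weight example that follows (n = 80, s = 100 and n = 100, s = 82).

```python
def variance_ratio(s1, s2):
    """Ratio of the larger sample variance to the smaller one."""
    v1, v2 = s1 ** 2, s2 ** 2
    return max(v1, v2) / min(v1, v2)


def pooled_variance(n1, s1, n2, s2):
    """Weighted average of two sample variances, weights n - 1."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


ratio = variance_ratio(100, 82)
sp2 = pooled_variance(80, 100, 100, 82)
print(f"variance ratio = {ratio:.2f} (less than 3: {ratio < 3})")
print(f"pooled variance = {sp2:.3f}")
```

With equal sample sizes the weights are identical, so `pooled_variance` reduces to the simple mean of the two variances, as the text describes.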

    Step-by-Step Example: Hypothesis Testing Two Sample t Test

    Can we conclude that infants born at a low income area clinic, on the average, tend to be lighter than those born at a clinic serving a high income population area? Within the past month, a student has collected data on birth weights (grams) from a random sample of 80 deliveries at a clinic serving a high income population (High) and 100 deliveries at a clinic serving a low income population (Low). The relevant information is summarized in the table below. Let alpha equal 0.05.

    Clinic             n      $\bar{x}$    s
    High Clinic (1)    80     2800         100
    Low Clinic (2)     100    2650         82

    Step 1. State the null and alternative hypotheses.

    Example: The researcher will determine if the mean value for one group is lower than that of the other, so a one-sided test of our hypotheses is indicated.

    Our null hypothesis states that the mean birth weight of babies born at the high income clinic (1) is less than or equal to that of babies who are born at the low income clinic (2). The null hypothesis is written as:

    $$H_0: \mu_1 \le \mu_2$$

    The alternative hypothesis states that the mean birth weight of babies born at the high income clinic (1) is greater than that of those born at the low income clinic (2), and is written as:

    $$H_a: \mu_1 > \mu_2$$

    Another way of stating the hypotheses is below. Here you are stating that the difference between the two population means ($\Delta$) is less than or equal to zero (null) or that the difference is greater than zero (alternative).

    $$H_0: \mu_1 - \mu_2 \le 0 \qquad H_a: \mu_1 - \mu_2 > 0$$

    Step 2. State the decision rule.

    Example: Using a one-sided test with an alpha value of 0.05 and $n_1 + n_2 - 2$ = 178 df, the critical value of the test statistic is 1.645. We obtain this value from the Student's t table. Note that 178 degrees of freedom is not on the table, so we approximate it by using infinity ($\infty$).

    Thus, we should reject $H_0$ if $t_{calc} > 1.645$.


    Step 3. Calculate the value of the test statistic.

    Example: Computing the value of the test statistic involves several steps. The formula we will follow is

    $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$$

    Step 3a. Calculate the difference in sample means: $\bar{x}_d = \bar{x}_1 - \bar{x}_2$

    Example: Begin by computing the difference in sample means.

    $(\mu_1 - \mu_2)$ is taken to be 0, the boundary value of the null hypothesis (no difference between the two populations).

    $(\bar{x}_1 - \bar{x}_2)$ is computed as: 2800 - 2650 = 150

    Step 3b. Compute the value of the pooled variance:

    $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

    Example: The pooled variance is calculated as:

    $$s_p^2 = \frac{79(100)^2 + 99(82)^2}{178} = 8177.955$$

    Step 3c. Find the value for the standard error:

    $$SE = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

    Example: This will be the denominator of the $t_{calc}$ equation. Using the pooled variance calculated above, the standard error is computed as:

    $$SE = \sqrt{\frac{8177.955}{80} + \frac{8177.955}{100}} = 13.56$$

    Step 3d. Determine the value of $t_{calc}$:

    $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$$

    Example: Specifically, we are taking our calculations from parts a and c and substituting those into our formula.

    $$t_{calc} = \frac{150 - 0}{13.56} = 11.06$$


    Step 4. State the statistical decision.

    Example: We reject $H_0$ since the value of our test statistic, $t_{calc}$ = 11.06, exceeds the t critical value of 1.645. Our test statistic therefore falls in the rejection region.

    Step 5. Report the p value. For this test, a p value
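The calculations in steps 3 and 4 of the birth weight example can be checked from the summary statistics alone. This is a minimal sketch, assuming only Python's standard `math` module; the function name `two_sample_t` and the constant `T_CRIT` are hypothetical, and the critical value 1.645 is taken from the decision rule stated in step 2 rather than computed.

```python
import math


def two_sample_t(n1, mean1, s1, n2, mean2, s2, null_difference=0.0):
    """Pooled-variance two sample t statistic from summary statistics.

    Returns (t, degrees of freedom, standard error).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df   # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)                    # standard error
    t = ((mean1 - mean2) - null_difference) / se
    return t, df, se


# High income clinic (1): n = 80, mean = 2800, s = 100
# Low income clinic (2):  n = 100, mean = 2650, s = 82
t_calc, df, se = two_sample_t(80, 2800, 100, 100, 2650, 82)

T_CRIT = 1.645   # one-sided, alpha = 0.05, from the workbook's decision rule
print(f"t = {t_calc:.2f} with {df} df (SE = {se:.2f})")
print("Reject H0" if t_calc > T_CRIT else "Fail to reject H0")
```

The standard error and t statistic agree with the hand-calculated values of 13.56 and 11.06, and the comparison against 1.645 leads to the same decision to reject the null hypothesis.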


    Step 3. Calculate the value of the test statistic.

    Practice space:

    Step 3a. Calculate the difference in sample means: $\bar{x}_d = \bar{x}_1 - \bar{x}_2$

    Practice space:

    Step 3b. Compute the value of the pooled variance:

    $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

    Practice space:

    Step 3c. Find the value for the standard error:

    $$SE = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

    Practice space:

    Step 3d. Determine the value of $t_{calc}$:

    $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$$

    Practice space:

    Step 4. State the statistical decision.

    Practice space:

    Step 5. Report the p value.

    Practice space:

    Step 6. State the practical conclusion.

    Practice space:

    Epi Info Example: Hypothesis Testing Two Sample t Test

    We will use the birth weight example worked by hand earlier (page 86 of the workbook) to conduct a two sample t test in Epi Info. We are determining whether infants born at a low income area clinic tend to have a lower birth weight than those born at a clinic serving an area with a high income population. For this statistical test, we will use a one-tailed analysis, since we want to know specifically whether babies born at the clinic serving a low income population area, on the average, tend to be lighter than those born at the clinic serving a high income population area, and not only whether the birth weights differ. Assume an $\alpha$ of 0.05.

    Step 1. State the null and alternative hypotheses.

    Example: $H_0: \mu_1 \le \mu_2$ or $\mu_1 - \mu_2 \le 0$ (Babies born in the high income area clinic weigh less than or equal to those born in a clinic serving a low income area.)

    $H_a: \mu_1 > \mu_2$ or $\mu_1 - \mu_2 > 0$ (Babies born in the high income area clinic weigh more than those babies born in a clinic serving a low income area.)

    Step 2. State the decision rule.

    Example: We will choose an alpha value of 0.05 in order to compare our results from the computer program with those which we previously calculated by hand.

    If $\alpha$ > p, we can reject the null hypothesis.

    In addition, if we know the critical t value, then if $t_{calc} > t_{crit}$, we can reject the null hypothesis.

    Step 3. Execute the two sample t test.

    Step 3a. READ the database file.

    Example: Open Epi Info and choose Analyze Data.

    Choose the table two_sample_t from the dataset Bios_Workbook_Examples.

    Step 3b. Select the MEANS command.

    Example: Use the arrow under Means of to scroll through the variables. Choose Birthweight.

    Scroll down under Cross-tabulate by Value of and choose Clinic.

    Click on OK.

    Scroll down in the output to find the descriptive statistics.

    Step 4. Report the p value and/or the calculated t value.

    Example: Our p value given in the output is 0.00.

    We have found a t statistic of 11.05, which differs only slightly from the t statistic calculated by hand (11.06, page 88 of the workbook). This could be due to rounding errors that we made in our calculations.

    Note that Epi Info uses an alpha value of 0.05 and a two-tailed test as defaults.

    Step 5. State the statistical decision.

    Example: Since our p value of 0.00* is less than the alpha of 0.05, we have sufficient evidence to conclude that there is a significant difference between birth weights in the two clinics.

    Remember that we can find our critical t value by using the Student's t table. In this case it is 1.645 (use the total observations to find N and the total degrees of freedom). Since our calculated t is 11.0545 and is greater than 1.645, we can confirm the ability to reject the null hypothesis.

    Step 6. State the practical conclusion.

    Because p


    Epi Info Practice: Hypothesis Testing Two Sample t Test

    There was an outbreak of cholera among students in a village school. You were given a record of those infected by the school director. Of the students infected with cholera, you want to determine if there is a significant difference in the age of the infected by gender. Use the t test in Epi Info to determine if there is a significant difference (alpha = 0.05) between the mean ages of males and females infected with cholera. Use the table AgeInSchool from the dataset Bios_Workbook_Examples.

    Step 1. State the null and alternative hypotheses.

    Practice space:

    Step 2. State the decision rule.

    Practice space:

    3. Performatwosamp