P8130: Biostatistical Methods I · 2017-11-29 · Measures of Location: Median • Compared to the...
P8130: Biostatistical Methods I
Lecture 2: Descriptive Statistics
Cody Chiuzan, PhD
Department of Biostatistics, Mailman School of Public Health (MSPH)
Lecture 1: Recap
• Intro to Biostatistics
• Types of Data
• Study Designs
Descriptive Statistics
• The collection and presentation of the data through graphical and numerical displays
• Look for patterns in the data and summarize information
• Measures of location
• Measures of dispersion
• Graphical display
Measures of Location
• Measures of location, or central tendency, indicate the center of the data
• Mean (average)
• Median (the 50th percentile)
• Mode
Measures of Location: Mean
Definition: the arithmetic mean is the sum of all observations divided by the number of observations
The sample mean for a sample of n observations is given by:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
The sample mean is used to estimate the population mean μ, which is typically unknown
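The formula translates directly into code. Below is a minimal Python sketch, assuming a small hypothetical data set; `sample_mean` and the blood-pressure values are illustrative and not from the lecture.

```python
def sample_mean(xs):
    """Return x-bar = (1/n) * sum(x_i) for a non-empty sequence of numbers."""
    if len(xs) == 0:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)


# Hypothetical systolic blood pressures (mmHg), for illustration only
bp = [118, 122, 130, 125, 120]
print(sample_mean(bp))  # 123.0
```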
Measures of Location: Mean
• The most commonly used measure of location
• Overly sensitive to outliers (unusual observations), so not recommended if the data are skewed
• Not appropriate for nominal or categorical variables
Measures of Location: Median
Definition: the sample median is computed as:
1. If n is odd, the median is the $\left(\frac{n+1}{2}\right)$th largest item in the sample
2. If n is even, the median is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th largest items
Example: given n = 7 (odd) total sample observations, the median is the $\frac{7+1}{2} = 4$th largest item.
Given n = 10 (even) total sample observations, the median is the average of the $\frac{10}{2} = 5$th and $\frac{10}{2} + 1 = 6$th largest items.
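A minimal Python sketch of the odd/even rule above, assuming the data fit in memory; `sample_median` is a hypothetical helper name, and its results agree with Python's built-in `statistics.median` on these inputs.

```python
def sample_median(xs):
    """Median via the odd/even rule applied to the ordered sample."""
    if len(xs) == 0:
        raise ValueError("need at least one observation")
    ordered = sorted(xs)  # order the data first (increasing)
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th item; subtract 1 for Python's 0-based indexing
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the (n / 2)th and (n / 2 + 1)th items
    k = n // 2
    return (ordered[k - 1] + ordered[k]) / 2


odd = [7, 1, 5, 3, 9, 2, 8]              # n = 7: the 4th ordered item
even = [4, 10, 1, 7, 3, 9, 2, 6, 8, 5]   # n = 10: mean of the 5th and 6th
print(sample_median(odd))   # 5
print(sample_median(even))  # 5.5
```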
Measures of Location: Median
• Unlike the mean, the median does not depend on every value in the data set, so it is not pulled toward outliers
• The median is defined as the middle value, or the 50th percentile
• This means that at least half of the data are less than or equal to it, and at least half are greater than or equal to it
• Median calculation starts by first ordering the data (increasing order)
• Appropriate measure for ordinal data
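To see the difference in practice, the sketch below uses hypothetical values, replaces one observation with an extreme value, and compares Python's `statistics.mean` and `statistics.median`.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]  # largest value replaced by an extreme one

print(mean(data), median(data))                  # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 12.4 3
```

The mean more than triples, while the median does not move.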
Other Measures of Location
Percentiles: the median is the 50th percentile
• In general, the kth percentile is a value such that at most k% of the data are smaller than it and at most (100 − k)% are larger
• Deciles: 10th, 20th, 30th, …
• Quartiles: 25th (Q1), 50th, 75th (Q3)
• Question: what does it mean if your GRE score is in the 90th percentile?
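As a sketch of how software computes these, Python's `statistics.quantiles` (Python 3.8+) returns the cut points for quartiles (`n=4`) or deciles (`n=10`). Percentile definitions vary across textbooks and packages; `method="inclusive"` is one interpolation convention, so results may differ slightly from hand calculations. The scores are hypothetical.

```python
from statistics import quantiles

# Hypothetical scores, for illustration only
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles: three cut points Q1, Q2 (the median) and Q3.
# method="inclusive" interpolates between ordered values; other conventions exist.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
print(q1, q2, q3)  # 3.0 5.0 7.0

# Deciles: nine cut points (10th, 20th, ..., 90th percentiles)
deciles = quantiles(scores, n=10, method="inclusive")
print(len(deciles))  # 9
print(deciles[-1])   # 8.2, the 90th percentile under this convention
```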
Measures of Location: Mode
Definition: the most frequently occurring value in the data
• You can have multiple modes or none (really?)
• Problematic if there is a large number of possible values, each occurring infrequently
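Python's `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, which illustrates the multiple-modes case. When no value repeats, every value ties, which is why one can say there is effectively no mode. The data are hypothetical.

```python
from statistics import multimode

print(multimode([1, 2, 2, 3, 4]))  # [2]           one mode
print(multimode([1, 1, 2, 3, 3]))  # [1, 3]        two modes (bimodal)
print(multimode([1, 2, 3, 4]))     # [1, 2, 3, 4]  no value repeats
```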
Measures of Dispersion
Describe the spread of the data:
• Range
• Inter-quartile range (IQR)
• Variance / Standard deviation
• Coefficient of variation (CV)
Measures of Dispersion
Range: Max − Min
Inter-quartile range: IQR = 75th (Q3) − 25th (Q1)
Since the range depends only on the minimum and maximum values, it can be heavily influenced by the extremes
Solution? Use the IQR
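A small sketch, assuming hypothetical data and the "inclusive" convention of `statistics.quantiles` for Q1 and Q3: replacing the maximum with an extreme value changes the range dramatically but leaves the IQR unchanged. `data_range` and `iqr` are illustrative helper names.

```python
from statistics import quantiles


def data_range(xs):
    """Range = max - min; depends only on the two extremes."""
    return max(xs) - min(xs)


def iqr(xs):
    """IQR = Q3 - Q1, using the 'inclusive' quantile convention."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    return q3 - q1


base = [1, 2, 3, 4, 5, 6, 7, 8, 9]
extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # maximum replaced by an extreme value

print(data_range(base), data_range(extreme))  # 8 89
print(iqr(base), iqr(extreme))                # 4.0 4.0
```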
Measures of Dispersion
The population variance is the average squared deviation from the mean:
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
The population standard deviation is just the square root of the variance:
$\sigma = \sqrt{\sigma^2}$
These values are often unknown, so we turn back to the sample…
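The population formula as a short Python sketch, treating the list as the entire population (so the divisor is N); `population_variance` is a hypothetical helper, and the values are illustrative.

```python
import math


def population_variance(xs):
    """sigma^2 = (1/N) * sum((x_i - mu)^2), treating xs as the whole population."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
var = population_variance(population)
print(var, math.sqrt(var))  # 4.0 2.0
```

Python's `statistics.pvariance` and `statistics.pstdev` implement the same divide-by-N formulas.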
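Measures of Dispersion
The sample variance is (almost) the average squared deviation from the sample mean, dividing by n − 1 rather than n:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
The sample standard deviation is just the square root of the variance:
$s = \sqrt{s^2}$
Lots of changes in notation and also in the formula!!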
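A minimal sketch of the n − 1 formula, using the hypothetical values [2, 4, 4, 4, 5, 5, 7, 9] as a sample: the squared deviations from x̄ = 5 sum to 32, so dividing by n − 1 = 7 gives about 4.5714, whereas dividing by N = 8 would give 4. `sample_variance` is a hypothetical helper, checked against Python's `statistics.variance`.

```python
import math
from statistics import variance


def sample_variance(xs):
    """s^2 = (1/(n - 1)) * sum((x_i - x_bar)^2); needs at least two observations."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
s2 = sample_variance(sample)
print(round(s2, 4), round(math.sqrt(s2), 4))  # 4.5714 2.1381
print(math.isclose(s2, variance(sample)))     # True
```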
Measures of Dispersion
The mean and standard deviation are the most used measures of location and spread. Why? It's all about the…
Property: linear transformations affect these measures in a predictable way
Let $Y = cX + b$ be a linear transformation of a variable X
Mean: $\bar{y} = c\bar{x} + b$
Standard deviation: $s_y = |c| s_x$
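A quick numerical check of the property, assuming hypothetical temperatures converted from Celsius to Fahrenheit (c = 1.8, b = 32); a second transformation with a negative c shows why the standard deviation scales by |c|.

```python
import math
from statistics import mean, stdev

celsius = [36.5, 37.0, 37.5, 38.0]  # hypothetical body temperatures
c, b = 1.8, 32                      # Y = cX + b converts Celsius to Fahrenheit
fahrenheit = [c * x + b for x in celsius]

# Mean shifts and scales: y_bar = c * x_bar + b
print(math.isclose(mean(fahrenheit), c * mean(celsius) + b))  # True
# Spread only scales (the shift b drops out): s_y = |c| * s_x
print(math.isclose(stdev(fahrenheit), abs(c) * stdev(celsius)))  # True

# With a negative multiplier the SD still scales by |c|, never becoming negative
flipped = [-2 * x for x in celsius]
print(math.isclose(stdev(flipped), 2 * stdev(celsius)))  # True
```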
Measures of Dispersion
The coefficient of variation (CV) is a measure that relates the mean and the standard deviation.
• Sometimes the variance changes with the mean
• Population: $CV = \frac{\sigma}{\mu} \times 100\%$
• Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
• The CV is unitless and can be interpreted as the variability relative to the average
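A sketch of the sample CV in Python, assuming ratio-scale data with a positive mean (the guard in the code is an added safeguard, since the ratio is not meaningful otherwise); `coefficient_of_variation` is a hypothetical helper. Expressing the same hypothetical measurements in grams and in kilograms gives the same CV, which is what "unitless" means in practice.

```python
from statistics import mean, stdev


def coefficient_of_variation(xs):
    """Sample CV = s / x_bar * 100 (percent); guarded to require a positive mean."""
    x_bar = mean(xs)
    if x_bar <= 0:
        raise ValueError("CV assumes a positive mean")
    return stdev(xs) / x_bar * 100


grams = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical weights in grams
kilograms = [g / 1000 for g in grams]   # the same weights in kilograms
print(round(coefficient_of_variation(grams), 2))      # 42.76
print(round(coefficient_of_variation(kilograms), 2))  # 42.76
```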
Graphical Display
• A picture is worth a thousand words (sometimes)
• Bar graphs
• Histograms
• Box-plots
• Scatterplots (later, in linear regression)
Bar Graph
• Data are divided into groups and frequencies are determined for each group
• Rectangles are constructed with bases of constant width and heights proportional to the frequencies
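The frequency step of a bar graph can be sketched with `collections.Counter`; the text bars below (one `#` per observation, one line per group) stand in for rectangles of equal width. The blood-type data are hypothetical.

```python
from collections import Counter

# Hypothetical blood types for 10 subjects
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]
freq = Counter(blood_types)  # frequency for each group

for group in ["O", "A", "B", "AB"]:
    # Bar length is proportional to the group's frequency
    print(f"{group:>2} | {'#' * freq[group]} ({freq[group]})")
```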
Histogram
• Numerical values are grouped into measurement classes, defined by equal-length intervals along the numerical scale
• Each value belongs to only one class
• Usually 5–12 classes
• Like a bar graph, this plot has frequencies on the vertical axis
• If the mean > median: right skew
• If the mean < median: left skew
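A minimal sketch of equal-width class counts, assuming k classes that span [min, max], with the maximum placed in the last class so every value lands in exactly one class; `histogram_counts` and the data are hypothetical. The mean-versus-median comparison is a rule of thumb for skewness, not a guarantee.

```python
from statistics import mean, median


def histogram_counts(xs, k):
    """Frequencies for k equal-width classes spanning [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [len(xs)] + [0] * (k - 1)  # all values identical: one class holds them
    width = (hi - lo) / k
    counts = [0] * k
    for x in xs:
        # Class index from the left edge; the maximum goes in the last class
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return counts


values = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 12, 20]  # hypothetical data
print(histogram_counts(values, 5))              # [8, 3, 1, 1, 1]
print(round(mean(values), 2), median(values))   # 5.07 3.0
print(mean(values) > median(values))            # True: consistent with a long right tail
```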
Box-plot
• The box extends from the Q1 (25th) to the Q3 (75th) quartile
• The 'whiskers' extend from the box to the smallest and largest values that are not outliers
• If one of the whiskers is long, it indicates skewness in that direction
• If a data value is less than Q1 − 1.5(IQR) or greater than Q3 + 1.5(IQR), it is considered an outlier and given a separate mark on the box plot
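A sketch of the 1.5 × IQR outlier rule, assuming `statistics.quantiles` with `method="inclusive"` for the quartiles (other quartile conventions can move the fences slightly) and the common convention that whiskers stop at the most extreme values inside the fences; `boxplot_summary` is a hypothetical helper and the data are illustrative.

```python
from statistics import quantiles


def boxplot_summary(xs):
    """Quartiles, 1.5 * IQR fences, whisker ends, and the values flagged as outliers."""
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower or x > upper]
    inside = [x for x in xs if lower <= x <= upper]
    # Whiskers reach the most extreme values that are not outliers
    return {"q1": q1, "median": q2, "q3": q3, "fences": (lower, upper),
            "whiskers": (min(inside), max(inside)), "outliers": outliers}


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 90])
print(summary)
```

Here the fences are −3 and 13, so 90 is marked separately and the upper whisker stops at 8.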
Readings
Rosner, Fundamentals of Biostatistics, Chapter 2
• Sections 2.2–2.6
• Sections 2.9–2.10