Post on 06-Sep-2018

Analysing foodborne outbreaks in the USA
Project for Design of Experiments, by Renske Bouma

Introduction

Food is vital to life, but it can also cause illness or even death. Food can carry dangerous micro-organisms, which then cause foodborne disease. The World Health Organisation (WHO) defines a foodborne disease as: ‘Any disease of an infectious or toxic nature caused by, or thought to be caused by, the consumption of food or water’. This definition also includes diseases caused by non-microbial substances, such as harmful pesticides or processing chemicals. Most common, however, are the illnesses caused by micro-organisms and their toxins (Adams & Moss 2008), and these are the focus of this report. A foodborne outbreak is defined as: ‘An incident in which two or more persons experience a similar illness resulting from the ingestion of a common food’ (CDC 2000). Outbreaks often involve more than two illnesses. This report investigates whether the average size of foodborne outbreaks in the USA depends on the location where the food vehicle was prepared, the micro-organism that caused the disease and the state in which the outbreak occurred.

The database

The database I use is compiled by the Centers for Disease Control and Prevention (CDC), the organisation in the USA that works towards better public health. To do this, the CDC needs to know where the problems lie, and it therefore monitors the prevalence of diseases, including foodborne diseases. It created the FOOD tool, the Foodborne Outbreak Online Database (CDC 2015), which includes all foodborne outbreaks reported to the CDC since 1998. I downloaded an extensive Excel file from their website to use for the statistical analysis. The CDC warns that the database is not final: reports can still be changed when new information is gathered. The database I used was last updated on 16 October 2015. Newer reports could therefore reflect the true outbreak less accurately than older reports, which could lead to systematic errors. However, the database does not contain reports newer than 2014, so even the newest outbreaks had almost a year to be fully reported.

The database includes the following information for every outbreak (see Figure 1): the year and month it occurred, the state, the species that (probably) caused the disease, the serotype (if known) of the micro-organism, the etiology status (confirmed or only suspected origin), the location(s) of preparation of the infected food, the resulting illnesses, hospitalizations and deaths, the food vehicle and the contaminated ingredient in this food item. To simplify, I only use year, state, genus, location of preparation and resulting illnesses in my analysis. These factors are the easiest to group into categories that are still big enough for analysis, and they seem the most interesting to me.

Figure 1: the original database

Hypothesis

I would like to know where a mistake causes the most illnesses. Does a mistake by a caterer cause more illnesses on average than a mistake at a banquet? Does an outbreak at a restaurant cause more illnesses than one at a health-care facility? In addition, I am curious whether the other factors, genus and state, play a role as well. Which genus causes the most illnesses per outbreak? Do outbreak sizes differ between states?

Experimental design

I want to know whether the differences in the number of illnesses per outbreak between different locations of preparation, genera and states are significant. Is it simply due to chance that they look different, or is it likely that there is a real difference? To find out, I will analyse the variance of the data with a one-way ANOVA in R (the groups differ in size, so the design is unbalanced rather than balanced). Before I can do this, I have to structure the data.

Structuring of the data

To get a clear result I removed all data-points with multiple possible species or multiple locations of preparation, as well as the multistate outbreaks. I grouped the different species of the most common genera as displayed in Table 1 and omitted all data-points from less common genera. I also grouped the different locations of preparation as displayed in Table 2 and left out all data-points from less common locations (such as camps and festivals). None of the groups has fewer than 100 data-points, which I believe gives good reliability.

Table 1: grouping of different species in their respective genus

group | contains | data-points
Bacillus (B) | B. cereus, B. other, B. unknown | 246
Campylobacter (Ca) | C. jejuni, C. coli, C. fetus, C. other, C. unknown | 185
Clostridium (Cl) | C. perfringens, C. botulinum | 547
Escherichia (Es) | E. coli enteroaggregative, E. coli enteropathogenic, E. coli other, E. coli shiga toxin-producing | 222
Norovirus (N) | Norovirus, Norovirus Genogroup 1, Norovirus Genogroup 2, Norovirus unknown | 3729
Salmonella (Sa) | Salmonella, S. enterica, S. other, S. unknown | 1335
Shigella (Sh) | Shigella, S. boydii, S. dysenteriae, S. flexneri, S. sonnei, S. unknown | 112
Staphylococcus (St) | S. aureus, S. other, S. unknown | 415

Table 2: grouping of different locations of preparation

group | contains | data-points
Banquet | Banquet facility (food prepared and served on-site) | 210
Caterer | Caterer (food prepared off-site from where served), Caterer; unknown, Caterer; other | 687
Health care | Hospital, Long-term care/nursing home/assisted living facility, long-term care..; Hospital, long-term care..; Other | 162
Private home | Private home/residence | 861
Restaurant | Restaurant- “Fast food” (drive up service or pay at counter), Restaurant- other or unknown type, Restaurant – other or unknown type; Other, Restaurant – Sit-down dining, Restaurant – sit-down dining; Other | 4871
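The grouping described above amounts to a lookup from species label to genus, followed by dropping every record without a match. The sketch below illustrates that idea in R on a handful of toy records. The column names (`Species`, `Illnesses`), the toy values and the assumption that multiple species are separated by a semicolon are all hypothetical; they are not taken from the CDC file, and the lookup lists only a few of the Table 1 entries.

```r
# Toy records; column names and the ";" separator are assumptions for illustration.
raw <- data.frame(
  Species = c("B. cereus", "C. jejuni", "C. perfringens",
              "Norovirus Genogroup 2", "S. sonnei", "Vibrio other",
              "S. aureus; B. cereus"),
  Illnesses = c(4, 12, 30, 55, 21, 3, 9),
  stringsAsFactors = FALSE
)

# Partial species-to-genus lookup in the spirit of Table 1 (only a few entries).
genus_of <- c(
  "B. cereus"             = "Bacillus",
  "C. jejuni"             = "Campylobacter",
  "C. perfringens"        = "Clostridium",
  "Norovirus Genogroup 2" = "Norovirus",
  "S. sonnei"             = "Shigella",
  "S. aureus"             = "Staphylococcus"
)

# Drop records that name more than one species.
single <- raw[!grepl(";", raw$Species, fixed = TRUE), ]

# Map species to genus; labels missing from the lookup become NA.
single$Genus <- unname(genus_of[single$Species])

# Keep only the records that belong to one of the retained genera.
clean <- single[!is.na(single$Genus), ]
print(table(clean$Genus))
```

The same pattern, with a second lookup, would cover the grouping of locations in Table 2.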

After this structuring and ‘cleaning’, the data-set looked as shown in Figure 2. The data-set still contained more than 6500 data-points.

First look at the data

Before going into the statistics, I have a look at the raw data. The mean number of illnesses over all outbreaks is 20.33. Plotting the data gives an idea of the range of the number of illnesses per outbreak. Figure 3 shows the outbreak size (the number of illnesses per outbreak) for the different locations. Most outbreaks result in fewer than 100 illnesses, but there are many exceptions. Five outbreaks were relatively extreme and resulted in more than 600 reported ill people. Most outliers are seen for restaurants, but no conclusions can be drawn from this, as the number of data-points is not equal for all locations. The data-set contains by far the most points for restaurants, which can explain the bigger range of points within this group. Looking only at the boxplots, there seems to be a difference between the first three locations and the last two.

Figure 3: boxplot of outbreak size versus location of preparation

Figure 2: the structured data-set in Excel

Figure 4 gives the boxplots per genus. Of course, most of the outbreaks are again below 100 illnesses, as the same data are plotted as before, just in a different grouping. What is interesting is where the outliers are. At first sight Escherichia seems quite dangerous because of the high outliers, but the box itself is quite small. Norovirus has an exceptional number of outliers but, just as for the restaurants, no conclusion can be drawn from this: the Norovirus group contains the most data-points of all groups, which could explain the larger number of outliers. Judging from the boxplots alone, Clostridium, Norovirus and Shigella seem to result in more illnesses than the other genera. We will see whether the ANOVA shows this difference as well.

Figure 4: boxplot of outbreak size versus genus

Figure 5 gives the boxplots per state. It is clear that there are differences between the states. We will find out later whether these differences are significant.

Figure 5: boxplot of outbreak size versus state

Hypothesis testing

My main interest is whether the location of preparation of the food that causes the outbreak has a significant influence on the outbreak size. So I perform an ANOVA in R on the location. This is the output:

              Df   Sum Sq Mean Sq F value Pr(>F)
Location       4   333252   83313   54.51 <2e-16 ***
Residuals   6786 10371990    1528
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The result shows that there is a significant difference in average outbreak size between at least two locations. The difference is so extreme that the probability of a result at least this extreme under the null hypothesis (there is no difference) is below 2e-16, far less than 0.1%.
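The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares. As a check, the short R sketch below recomputes the F value and its tail probability from the numbers printed in the table; the variable names are only illustrative.

```r
# Values copied from the ANOVA table for Location (untransformed data).
ss_location <- 333252
df_location <- 4
ss_resid    <- 10371990
df_resid    <- 6786

ms_location <- ss_location / df_location  # mean square between locations
ms_resid    <- ss_resid / df_resid        # mean square within locations
f_value     <- ms_location / ms_resid     # F ratio

# Upper-tail probability of the F distribution under the null hypothesis.
p_value <- pf(f_value, df_location, df_resid, lower.tail = FALSE)

print(round(c(MS_location = ms_location, MS_resid = ms_resid, F = f_value), 2))
print(p_value < 2e-16)
```

R prints `<2e-16` in the table because the p-value is smaller than the smallest value it chooses to display, not because it is exactly that number.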

Validation of assumptions

The interpretation of the ANOVA is only valid when the assumptions are acceptable. The result of an ANOVA is meaningful when it can be assumed that the residuals are normally, independently and identically distributed (NIID). Normality can be checked by making a Q-Q plot. When the residuals are more or less normally distributed, a straight line is observed. Figure 6 shows that this is not the case. A non-parametric test should be used, or the data should be transformed so that the normality assumption becomes valid.

Figure 6: Q-Q plot of residuals from the ANOVA of location
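The report continues with the transformation route. For comparison, the non-parametric route could look like the sketch below, which uses the Kruskal-Wallis rank test from base R. This is not part of the author's analysis. The data are simulated (right-skewed, log-normal) purely for illustration; the group names, sample sizes and parameters are assumptions and are not taken from the CDC database.

```r
set.seed(1)  # make the simulated example reproducible

# Simulated right-skewed outbreak sizes for three hypothetical locations.
sim <- data.frame(
  Location  = factor(rep(c("Banquet", "Private home", "Restaurant"), each = 200)),
  Illnesses = c(rlnorm(200, meanlog = 3.2, sdlog = 1),
                rlnorm(200, meanlog = 2.3, sdlog = 1),
                rlnorm(200, meanlog = 2.5, sdlog = 1))
)

# Kruskal-Wallis compares the groups on ranks, so it does not require
# normally distributed residuals.
kw <- kruskal.test(Illnesses ~ Location, data = sim)
print(kw)
```

A rank test answers a slightly different question (whether the groups tend to have larger or smaller values) than an ANOVA on means, so the two routes are not interchangeable.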

Hypothesis testing

Box et al. (2005) suggest that taking the logarithm of the output (here, outbreak size) can stabilize the variance. I perform the ANOVA with the log-transformed data. This is the output from R:

              Df Sum Sq Mean Sq F value Pr(>F)
Location       4    846  211.52   209.2 <2e-16 ***
Residuals   6786   6861    1.01
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The result is the same as before: there is a very significant difference. Let us see whether we can trust the result this time.

Validation of assumptions

Again, I make a Q-Q plot to check whether the residuals are normally distributed. As can be seen in Figure 7, the dots lie more or less on one line. Normality can therefore be assumed.

Figure 7: Q-Q plot of residuals of the ANOVA of location after transformation
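Why the logarithm helps can be seen on simulated data. The sketch below fits the same one-way model to log-normal data before and after the transformation, and summarises each Q-Q plot by the correlation between theoretical and sample quantiles (closer to 1 means closer to a straight line). The data and group names are assumptions for illustration, and the correlation is only a rough numerical stand-in for looking at the plot itself.

```r
set.seed(2)  # make the simulated example reproducible

# Simulated log-normal outbreak sizes for three hypothetical locations.
sim <- data.frame(
  Location  = factor(rep(c("Banquet", "Private home", "Restaurant"), each = 200)),
  Illnesses = c(rlnorm(200, meanlog = 3.2, sdlog = 1),
                rlnorm(200, meanlog = 2.3, sdlog = 1),
                rlnorm(200, meanlog = 2.5, sdlog = 1))
)

fit_raw <- aov(Illnesses ~ Location, data = sim)
fit_log <- aov(log(Illnesses) ~ Location, data = sim)

# qqnorm() returns the plotted coordinates; plot.it = FALSE skips the drawing.
qq_cor <- function(fit) {
  qq <- qqnorm(resid(fit), plot.it = FALSE)
  cor(qq$x, qq$y)
}

cor_raw <- qq_cor(fit_raw)
cor_log <- qq_cor(fit_log)
print(c(raw = cor_raw, log = cor_log))
```

Dropping `plot.it = FALSE` and adding `qqline()` gives the usual picture, as in the code in the appendix.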

The residuals should also be approximately independently and identically distributed. Figure 8 shows that the residuals are spread more or less the same over the years. There is no trend, so there is probably no autocorrelation between the data-points. This means the assumption of independence can be made.

Figure 8: plot of residuals in time order

In Figure 9 the residuals are plotted against the expected (fitted) values. Curiously, only four lines appear, while I am studying five locations. Apparently two locations have expected values very close to each other. To assume identical distribution, the spread of the residuals should be approximately the same for each expected value. The spread is not identical here, but it is close enough.

Figure 9: plot of residuals versus expected value

Hypothesis testing

Now that I have found a useful transformation of my data, I do some more hypothesis testing with the transformed data. I already saw a significant difference between different locations of preparation. Now I am also curious whether there are differences between genera. The ANOVA table below shows that the different genera do not result in the same average outbreak size.

              Df Sum Sq Mean Sq F value Pr(>F)
Genus          7    648   92.50   88.88 <2e-16 ***
Residuals   6783   7060    1.04
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I would also like to know whether there are significant differences between states. The ANOVA table below makes clear that there are very significant differences between states.

              Df Sum Sq Mean Sq F value Pr(>F)
State         53    828  15.620    15.3 <2e-16 ***
Residuals   6737   6879   1.021
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The plots to check the assumptions for these residuals are left out for reasons of space, but can be found in the appendix. They give no reason to doubt any of the assumptions.

Blocking?

We have seen that the average outbreak size differs between states, between locations of preparation and between genera. Combining the effects in one ANOVA, so that the variation is explained by all factors together, would reduce the unexplained residual variation. This would increase the sensitivity of the ANOVA test and with that the significance of the results. However, the results are already as significant as they can get, so blocking is not necessary.
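What combining the factors would do can be sketched on simulated data. The example below compares the residual mean square of a model with one factor against a model with two factors. It is not run on the CDC data: the factor levels, effect sizes and sample size are assumptions chosen only to make the effect visible.

```r
set.seed(42)  # make the simulated example reproducible
n <- 400

sim <- data.frame(
  Location = factor(sample(c("Banquet", "Caterer", "Restaurant"), n, replace = TRUE)),
  Genus    = factor(sample(c("Bacillus", "Norovirus", "Shigella"), n, replace = TRUE))
)

# Assumed effects on the log scale, chosen only for illustration.
loc_effect <- c(Banquet = 0.6, Caterer = 0.6, Restaurant = 0)
gen_effect <- c(Bacillus = -0.8, Norovirus = 0.2, Shigella = 0.6)

sim$logIllnesses <- 2.5 +
  unname(loc_effect[as.character(sim$Location)]) +
  unname(gen_effect[as.character(sim$Genus)]) +
  rnorm(n, sd = 1)

fit_one <- aov(logIllnesses ~ Location, data = sim)
fit_all <- aov(logIllnesses ~ Location + Genus, data = sim)

# Residual mean square: residual sum of squares divided by residual df.
ms_resid <- function(fit) deviance(fit) / df.residual(fit)

print(c(location_only = ms_resid(fit_one), location_and_genus = ms_resid(fit_all)))
print(summary(fit_all))
```

Because the real data are unbalanced, the sums of squares reported by `aov` are sequential, so the order of the terms in the formula can change the individual F values.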

Graphical ANOVAs

To see exactly where the differences lie, I perform a graphical ANOVA for the locations, genera and states. To show which point in the graph corresponds to which group, I also give the averages in a table. In addition, by looking at the real averages (and not the transformed ones) in these tables, we can see whether a significant difference is also an interesting one. With such a large data-set a difference quickly becomes significant, but if the difference is small, it might not be very interesting.
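In a graphical ANOVA the deviations of the group means from the grand average are multiplied by the square root of (residual degrees of freedom / treatment degrees of freedom), so that they can be compared by eye with the spread of the residuals; this is the factor `sqrt(6786/4)` in the code in the appendix. The sketch below shows the scaling on simulated data with equal group sizes, where the scaled deviations reproduce the F ratio exactly. The groups, means and sample sizes are assumptions for illustration; with unequal group sizes, as in the real data, the relation to F holds only approximately.

```r
set.seed(7)  # make the simulated example reproducible
n_per_group <- 50

sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = n_per_group)),
  y     = c(rnorm(n_per_group, mean = 2.0),
            rnorm(n_per_group, mean = 2.1),
            rnorm(n_per_group, mean = 3.5))
)

fit      <- aov(y ~ group, data = sim)
df_treat <- nlevels(sim$group) - 1
df_resid <- df.residual(fit)

# Deviations of the group means from the grand average, rescaled so that
# they can be read against the spread of the residuals.
group_means <- tapply(sim$y, sim$group, mean)
scaled_dev  <- sqrt(df_resid / df_treat) * (group_means - mean(sim$y))

# With equal group sizes, the scaled deviations reproduce the F ratio.
f_from_plot  <- sum(n_per_group * scaled_dev^2) / sum(resid(fit)^2)
f_from_table <- anova(fit)[["F value"]][1]

print(round(scaled_dev, 2))
print(range(resid(fit)))
print(c(f_from_plot = f_from_plot, f_from_table = f_from_table))
```

Passing `scaled_dev` and `resid(fit)` to `stripchart()`, one above the other, gives the two dot plots of the graphical ANOVA.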

Locations

The graphical ANOVA for the different locations of preparation can be seen in Figure 10 and the averages are given in Table 3.

Figure 10: graphical ANOVA for locations

Table 3: average outbreak size per location of preparation

Location of preparation | Average outbreak size
Banquet | 36.30
Caterer | 37.37
Health care | 31.68
Private home | 14.31
Restaurant | 17.93

The residuals are spread from -3 to 5, while the difference between locations is as big as 40. It can therefore be seen clearly from this graphical ANOVA that there is a significant difference between some locations. The low point in the graph is actually both the point for private home and the point for restaurant. They are so close to each other that they cannot be seen as individual points, so they are virtually the same. They are, however, very different from the other locations: banquet, caterer and health care. This makes sense, as at a banquet and by a caterer food is provided to big groups of people. At health-care facilities the number of people eating the food is not as big as for a caterer or banquet, but as the people eating the food are fragile, they are more prone to get sick from a food contamination.

Genera

The graphical ANOVA for the different genera causing the illnesses can be seen in Figure 11 and the averages are given in Table 4. The order of average outbreak size from smallest to largest is given to make it easier to see what effect the different genera have on the average outbreak size.

Figure 11: graphical ANOVA for genera

Table 4: average outbreak size per genus causing it

Genus | Average outbreak size | Order
Bacillus (B) | 7.34 | 1
Campylobacter (Ca) | 10.79 | 2
Clostridium (Cl) | 24.54 | 6
Escherichia (Es) | 21.44 | 5
Norovirus (N) | 22.34 | 7
Salmonella (Sa) | 18.23 | 4
Shigella (Sh) | 33.39 | 8
Staphylococcus (St) | 11.38 | 3

Bacillus outbreaks result on average in the fewest illnesses, while Shigella outbreaks cause the most. The large average outbreak size for Shigella could be explained by its low infectious dose: a few cells are already enough to cause disease. Shigella is commonly spread person-to-person, but when food is prepared by personnel who carry the bacterium, the food can become contaminated (Adams & Moss 2008). In such situations many people often get ill, as one sick staff member of, for example, a catering company handles the food for many. Also, people who get ill from Shigella are likely to report it, as the symptoms of shigellosis are often quite severe and need medical attention.

Bacillus is a very common food pathogen. It forms spores with which it can survive harsh conditions. Different species of Bacillus form different enterotoxins, which can result in two different illnesses: the diarrhoeal and the emetic syndrome. Both illnesses are often over in less than 24 hours and the symptoms are in most cases quite ordinary, such as vomiting and diarrhoea (Adams & Moss 2008). As most people do not report these kinds of symptoms if they are over in a day, it is very likely that the number of illnesses in an outbreak of a Bacillus species is underreported. It is, however, also possible that the outbreaks really are smaller, as individual products can be the source of an outbreak (instead of personnel handling food for many). For example, B. cereus can survive in pasteurized milk, but will only grow and produce toxins when the milk is stored at too high a temperature (Adams & Moss 2008). In this way it is possible that only one package becomes unsafe. Only the people consuming from that one package (a family, for example) then get sick. This results in smaller outbreaks.

States

The graphical ANOVA for the different states where the outbreaks occurred can be seen in Figure 12 and the averages are given in Table 5. As there are many states, only the most interesting ones are discussed.

Figure 12: graphical ANOVA for states

Table 5: average outbreak size per state it occurred in

State | Average outbreak size | State | Average outbreak size
Alabama | 18.25 | Nebraska | 32.93
Alaska | 12.78 | Nevada | 51.45
Arizona | 24.33 | New Hampshire | 21.74
Arkansas | 28.27 | New Jersey | 21.02
California | 17.58 | New Mexico | 24.89
Colorado | 18.46 | New York | 25.37
Connecticut | 15.89 | North Carolina | 31.21
Delaware | 17.25 | North Dakota | 32.06
Florida | 10.87 | Ohio | 18.71
Georgia | 24.60 | Oklahoma | 23.62
Guam | 3.50 | Oregon | 16.63
Hawaii | 22.69 | Pennsylvania | 20.22
Idaho | 19.71 | Puerto Rico | 22.18
Illinois | 24.98 | Republic of Palau | 6.00
Indiana | 21.63 | Rhode Island | 17.53
Iowa | 31.23 | South Carolina | 29.57
Kansas | 20.51 | South Dakota | 36.83
Kentucky | 30.45 | Tennessee | 29.62
Louisiana | 31.25 | Texas | 47.88
Maine | 10.19 | Utah | 33.97
Maryland | 18.49 | Vermont | 7.40
Massachusetts | 27.59 | Virginia | 28.89
Michigan | 35.71 | Washington | 13.68
Minnesota | 15.90 | Washington DC | 48.86
Mississippi | 50.62 | West Virginia | 15.63
Missouri | 28.08 | Wisconsin | 24.08
Montana | 57.40 | Wyoming | 44.09

Two striking averages are those for Guam and the Republic of Palau, with only 3.5 and 6 illnesses respectively on average per outbreak. A quick Google search explains why. Guam is a small island far east of the Philippines and is inhabited by fewer than 200,000 people (Wikipedia 2016a). Palau is another island relatively close to Guam and has even fewer people: about 25,000 (Wikipedia 2016c). The small number of people and the distance between the mainland and the islands probably explain the small outbreaks. As these islands are not US states and have quite a different culture, the eagerness to report is likely to be smaller. Also, there are simply fewer people to get ill.

Among the mainland states, Vermont then stands out with an average outbreak size of only 7.4. Vermont has a good reputation when it comes to public health. It ranked first for health outcomes in the USA in 2010. From 2000 to 2008 Vermont was ranked as the healthiest place to live seven out of eight times (Wikipedia 2016d). The low average outbreak size fits this picture.

Nevada and Mississippi are at the opposite end of the range, with 51.5 and 50.6 illnesses respectively on average per outbreak. Mississippi is infamous for its health care. It was given the lowest rank for health care among all the American states by the Commonwealth Fund (Wikipedia 2016b). The large average outbreak size in Nevada might be caused by the popularity of Las Vegas. Massive buffets are very common in Las Vegas, which means that if there is an outbreak, many people (tourists) will get sick at once.

Conclusion and discussion

It can be concluded that foodborne outbreaks differ in size depending on the location of preparation of the food, the micro-organism causing the illness and the state in which the outbreak occurs. Size in this case refers to the reported number of illnesses. The question is, however, how realistically the reports reflect the real outbreak sizes. Probably all outbreaks are underreported, but some might be more underreported than others, which is problematic as this might create significant differences where actually there are none.

In addition, I want to mention that a bigger outbreak size does not directly say something about the seriousness of the outbreak. A big outbreak could mean that 50 people had to vomit once, and a small one could mean that 10 people died. This report is only about the number of people affected per outbreak.

Another remark I would like to make is that I omitted data because of ambiguity. It could be that a specific genus is hard to distinguish but actually creates big outbreaks. This would not be seen in this analysis, because all data-points with multiple possible causative micro-organisms were left out. Omitting data does have an advantage for the conclusions about location of preparation. The fact that unsafe food was prepared somewhere does not mean that something went wrong at that particular location. It might be that something went wrong at the factory, at the farm or during transport, creating unsafe food. However, if food becomes unsafe at an early step in production, it is likely to end up in different places. By omitting the data-points with multiple locations, it is more probable that, in the remaining data-points, a mistake at that location actually caused the disease. This makes it possible to draw more reliable conclusions about where a mistake results in the biggest foodborne outbreak.

References

Adams, M.R. & Moss, M.O., 2008. Food Microbiology 3rd ed., The Royal Society of Chemistry.

Box, G.E.P., Hunter, J.S. & Hunter, W.G., 2005. Statistics for Experimenters 2nd ed., Wiley-Interscience.

CDC, 2000. Appendix B Guidelines for Confirmation of Foodborne-Disease Outbreaks. Available at: http://www.cdc.gov/mmwr/preview/mmwrhtml/ss4901a3.htm [Accessed May 23, 2016].

CDC, 2015. Foodborne Outbreak Online Database (FOOD Tool). Available at: http://wwwn.cdc.gov/foodborneoutbreaks/ [Accessed May 20, 2016].

Wikipedia, 2016a. Guam. Available at: https://en.wikipedia.org/wiki/Guam [Accessed May 20, 2016].

Wikipedia, 2016b. Mississippi - health. Available at: https://en.wikipedia.org/wiki/Mississippi#Health [Accessed May 20, 2016].

Wikipedia, 2016c. Palau. Available at: https://en.wikipedia.org/wiki/Palau [Accessed May 20, 2016].

Wikipedia, 2016d. Vermont - Public health. Available at: https://en.wikipedia.org/wiki/Vermont#Public_health [Accessed May 20, 2016].

Appendix

The code

setwd("~/R/workingdirectory")
# stringsAsFactors = TRUE keeps Location, Genus and State as factors,
# which plot() and aggregate() below rely on (no longer the default since R 4.0)
Data <- read.table("projectdata.txt", header = TRUE, sep = "\t", stringsAsFactors = TRUE)
attach(Data)
mean(Illnesses)
plot(Location, Illnesses)
plot(Genus, Illnesses)
plot(State, Illnesses)

## ANOVA ##
# influence of location on amount of Illnesses
r.l <- aov(Illnesses ~ Location)
summary(r.l)  # location matters, noise did not prevent result, blocking necessary?
# assumptions #
res.l <- resid(r.l)
qqnorm(res.l)  # normality data --> not normal!
qqline(res.l)
plot(x = Year, y = res.l)  # independence --> no trend
plot(fitted(r.l), res.l)   # equal variance --> not really!
# interpretation ANOVA is questionable
## ! need of data transformation or other test ! ##

# data transformation: log
r.tl <- aov(log(Illnesses) ~ Location)
summary(r.tl)
res.tl <- resid(r.tl)
qqnorm(res.tl)
qqline(res.tl)
plot(Year, res.tl)
plot(fitted(r.tl), res.tl)
# Better! Still very significant influence of location of preparation

# influence of genus on amount of Illnesses
r.g <- aov(log(Illnesses) ~ Genus)
summary(r.g)
res.g <- resid(r.g)
qqnorm(res.g)
qqline(res.g)
plot(Year, res.g)
plot(fitted(r.g), res.g)
# Assumptions are alright, genus matters very significantly!

# influence of state on amount of Illnesses
r.s <- aov(log(Illnesses) ~ State)
summary(r.s)
res.s <- resid(r.s)
qqnorm(res.s)
qqline(res.s)
plot(Year, res.s)
plot(fitted(r.s), res.s)
# Assumptions are alright, State matters also very significantly!
# blocking is possible, but necessary? Test is already very sign.
# now I want to know where the differences are!

## Graphical ANOVA ##
raw.total <- c(Data$Illnesses)
total <- log(raw.total)
ga <- mean(total)  # grand average
par(mfrow = 2:1)   # two plots stacked: scaled deviations above residuals

# Locations: deviations scaled by sqrt(residual df / treatment df)
meanL <- aggregate(x = log(Illnesses), by = list(Location = Location), mean)
devL <- meanL$x - ga
stripchart(sqrt(6786/4) * devL, main = "Locations")
stripchart(res.tl, main = "Residuals", method = "stack", offset = 0.005)

# Genus
meanG <- aggregate(x = log(Illnesses), by = list(Genus = Genus), mean)
devG <- meanG$x - ga
stripchart(sqrt(6783/7) * devG, main = "Genera")
stripchart(res.g, main = "Residuals", method = "stack", offset = 0.005)

# State
meanS <- aggregate(x = log(Illnesses), by = list(State = State), mean)
devS <- meanS$x - ga
stripchart(sqrt(6737/53) * devS, main = "States")
stripchart(res.s, main = "Residuals", method = "stack", offset = 0.005)

Assumptions

Genera

States
