Analysing foodborne outbreaks in the USA
Project for Design of Experiments – by Renske Bouma

Introduction
Food is vital to life, but can also cause illness or even death. Food can be a carrier of dangerous micro-organisms, which then result in foodborne disease. According to the World Health Organisation (WHO) a foodborne disease is defined as: ‘Any disease of an infectious or toxic nature caused by, or thought to be caused by, the consumption of food or water’. This definition also includes diseases caused by non-microbial substances, like harmful pesticides or processing chemicals. Most common, however, are the illnesses caused by micro-organisms and their toxins (Adams & Moss 2008), and these will be the focus of this report. A foodborne outbreak is defined as: ‘An incident in which two or more persons experience a similar illness resulting from the ingestion of a common food’ (CDC 2000). Often, outbreaks involve more than two illnesses. In this report foodborne outbreaks in the USA are investigated for the dependence of average outbreak size on the location of preparation of the food vehicle, the micro-organism that caused the disease and the state it occurred in.

The database
The database I use is put together by the Centers for Disease Control and Prevention (CDC), the organisation in the USA that works towards better public health. To do this, the CDC needs to know where the problems lie, and therefore it monitors the prevalence of diseases, including foodborne diseases. It created the FOOD tool, the Foodborne Outbreak Online Database (CDC 2015), which includes all foodborne outbreaks reported to the CDC since 1998. I downloaded an extensive Excel file from their website to use for the statistical analysis. The CDC warns that the database is not final: reports can still be changed when new information is gathered. The database I used was last updated on 16 October 2015. The newer reports could therefore reflect the true outbreak less accurately than the older reports, which could lead to systematic errors. However, the database does not contain reports newer than 2014, so even the newest outbreaks had almost a year to be fully reported.

The database includes the following information for every outbreak (see Figure 1): the year and month it occurred, the state, the species that (probably) caused the disease, the serotype (if known) of the micro-organism, the etiology status (confirmed or only suspected origin), the location(s) of preparation of the infected food, the resulting illnesses, hospitalizations and deaths, the food vehicle and the contaminated ingredient in this food item. To simplify, I only use year, state, genus, location of preparation and resulting illnesses in my analysis. These are the factors that can most easily be grouped into groups that are still big enough for analysis, and they seem the most interesting to me.

Figure 1 the original database


Hypothesis
I would like to know where a mistake causes the most illnesses. Does a mistake by a caterer cause more illnesses on average than a mistake at a banquet? Does an outbreak at a restaurant cause more illnesses than one in health care? Next to that, I am curious whether the other factors, genus and state, play a role as well. Which genus causes the most illnesses per outbreak? Are there differences between states in how big the outbreaks are?

Experimental design
I want to know whether the differences in the number of illnesses per outbreak between different locations of preparation, genera and states are significant. Is it simply because of chance that they look different, or is it likely that there is a real difference? To find out, I will analyse the variance of the data with a one-way ANOVA in R. Before I can do this, I have to structure the data.

Structuring of the data
To get a clear result I removed all data-points with multiple possible species, multiple locations of preparation and the multistate outbreaks. I grouped the different species of the most common genera as displayed in Table 1 and omitted all data-points from less common genera. I also grouped the different locations of preparation as displayed in Table 2 and left out all data-points from less common locations (like camps and festivals). None of the groups has fewer than 100 data-points, which I believe gives good reliability.

Table 1 grouping of different species in their respective genus

Group                Data-points  Contains
Bacillus (B)         246          B. cereus, B. other, B. unknown
Campylobacter (Ca)   185          C. jejuni, C. coli, C. fetus, C. other, C. unknown
Clostridium (Cl)     547          C. perfringens, C. botulinum
Escherichia (Es)     222          E. coli enteroaggregative, E. coli enteropathogenic, E. coli other, E. coli shiga toxin-producing
Norovirus (N)        3729         Norovirus, Norovirus Genogroup 1, Norovirus Genogroup 2, Norovirus unknown
Salmonella (Sa)      1335         Salmonella, S. enterica, S. other, S. unknown
Shigella (Sh)        112          Shigella, S. boydii, S. dysenteriae, S. flexneri, S. sonnei, S. unknown
Staphylococcus (St)  415          S. aureus, S. other, S. unknown

Table 2 grouping of different locations of preparation

Group         Data-points  Contains
Banquet       210          Banquet facility (food prepared and served on-site)
Caterer       687          Caterer (food prepared off-site from where served), Caterer; unknown, Caterer; other
Health care   162          Hospital, Long-term care/nursing home/assisted living facility, long-term care..; Hospital, long-term care..; Other
Private home  861          Private home/residence
Restaurant    4871         Restaurant – “Fast food” (drive up service or pay at counter), Restaurant – other or unknown type, Restaurant – other or unknown type; Other, Restaurant – Sit-down dining, Restaurant – sit-down dining; Other

After this structuring and ‘cleaning’, the data-set looked as shown in Figure 2. The data-set still contained more than 6500 data-points.
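The structuring step could be sketched in R roughly as follows. This is only an illustration on a small made-up data frame, not the actual cleaning of the CDC file: the column names, the lookup table `loc_map` and the use of ";" to mark multiple possible species are assumptions.

```r
# Toy stand-in for the CDC export; rows and column names are invented.
raw <- data.frame(
  Species   = c("Bacillus cereus", "Norovirus Genogroup 2",
                "Salmonella enterica; Shigella sonnei", "Vibrio cholerae",
                "Clostridium perfringens"),
  Location  = c("Private home/residence", "Caterer; other",
                "Caterer", "Private home/residence", "Camp"),
  Illnesses = c(4, 35, 60, 3, 12),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from raw location labels to the groups of Table 2.
loc_map <- c("Banquet facility"       = "Banquet",
             "Caterer"                = "Caterer",
             "Caterer; other"         = "Caterer",
             "Private home/residence" = "Private home")

# The eight common genera of Table 1.
genera <- c("Bacillus", "Campylobacter", "Clostridium", "Escherichia",
            "Norovirus", "Salmonella", "Shigella", "Staphylococcus")

raw$LocGroup <- unname(loc_map[raw$Location])   # NA for locations not in the lookup
raw$Genus    <- sub(" .*", "", raw$Species)     # first word of the species label

# Keep single-species outbreaks from a common genus at a grouped location.
keep  <- !grepl(";", raw$Species) & raw$Genus %in% genera & !is.na(raw$LocGroup)
clean <- raw[keep, c("Genus", "LocGroup", "Illnesses")]
print(clean)
```

Using an explicit lookup means that every location label that is not listed is dropped, which mirrors leaving out the less common locations.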

First look at the data
Before going into the statistics, I have a look at the raw data. The mean number of illnesses over all outbreaks is 20.33. Plotting the data gives an idea of the range the number of illnesses per outbreak can be in. In Figure 3 the outbreak size (the number of illnesses per outbreak) for the different locations is given. It can be seen that most of the outbreaks result in fewer than 100 illnesses, but there are many exceptions. Five outbreaks were relatively extreme and resulted in more than 600 reported ill people. Most outliers are seen at the restaurant, but no conclusions can be drawn from this, as the number of data-points is not equal for all locations. The data-set contains by far the most points for restaurants, which can explain the bigger range of points within this group. When only looking at the boxplots, there seems to be a difference between the first three locations and the last two.

Figure 3 boxplot of outbreak size versus location of preparation

Figure 2 the structured data-set in Excel

The boxplots per genus are given in Figure 4. Of course most of the outbreaks are again below 100 illnesses, as the same data is plotted as before, just in a different grouping. It is interesting to see where the outliers are. Escherichia seems at first sight to be quite dangerous because of the high outliers, but the box itself is quite small. Norovirus has an exceptional number of outliers, but just as for the restaurants, no conclusion can be drawn from this. The Norovirus group contains the most data-points of all groups, so this could explain the higher number of outliers. When just looking at the boxplots, it seems that Clostridium, Norovirus and Shigella result in more illnesses than the other genera. We will see whether this difference also shows up in the ANOVA.

Figure 4 boxplot of outbreak size versus genus

The boxplot per state is given in Figure 5. It is clear that there are differences between the states. We will find out later whether these differences are significant.

Figure 5 boxplot of outbreak size versus state

Hypothesis testing
My main interest is whether the location of preparation of the food that causes the outbreak has a significant influence on the outbreak size. So I perform an ANOVA in R on the location. This is the output:

            Df   Sum Sq Mean Sq F value Pr(>F)
Location     4   333252   83313   54.51 <2e-16 ***
Residuals 6786 10371990    1528
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The result shows that there is a significant difference in average outbreak size between at least two locations. The difference is so extreme that the probability of it occurring under the null hypothesis (there is no difference) is less than 2e-16, far below the 0.1% level.
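How the F value and the p-value in such a table come about can be sketched on simulated data. The three locations and the log-normal outbreak sizes below are invented; only `aov`, `summary` and `pf` are standard R. The F value is the ratio of the two mean squares; for the Location table of the report this is 83313 / (10371990 / 6786) ≈ 54.51.

```r
set.seed(1)

# Simulated, right-skewed outbreak sizes for three made-up locations (not the CDC data).
toy <- data.frame(
  Location  = factor(rep(c("Banquet", "Private home", "Restaurant"),
                         times = c(30, 60, 120))),
  Illnesses = c(rlnorm(30,  meanlog = 3.2),
                rlnorm(60,  meanlog = 2.3),
                rlnorm(120, meanlog = 2.5))
)

fit <- aov(Illnesses ~ Location, data = toy)
tab <- summary(fit)[[1]]   # the ANOVA table as a data frame
print(tab)

# F is the treatment mean square divided by the residual mean square.
f_by_hand <- tab[1, "Mean Sq"] / tab[2, "Mean Sq"]

# The p-value is the upper tail of the F distribution with (df_treatment, df_residual).
p_by_hand <- pf(f_by_hand, tab[1, "Df"], tab[2, "Df"], lower.tail = FALSE)
```

Both hand-computed values should agree with the "F value" and "Pr(>F)" columns printed by `summary`.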

Validation of assumptions
The interpretation of the ANOVA is only valid when the assumptions are acceptable. The result of an ANOVA is meaningful when it can be assumed that the residuals are normally, independently and identically distributed (NIID). Normality can be checked by making a Q-Q plot. When the residuals are more or less normally distributed, a straight line is observed. In Figure 6 it can be seen that this is not the case. A non-parametric test should be used, or the data should be transformed so that the normality assumption becomes valid.

Figure 6 Q-Q plot of residuals from ANOVA of location

Hypothesis testing
In the book (Box et al. 2005) it is suggested that taking the logarithm of the output (here outbreak size) can stabilize the variance. I perform the ANOVA with the log-transformed data. This is the output from R:

            Df Sum Sq Mean Sq F value Pr(>F)
Location     4    846  211.52   209.2 <2e-16 ***
Residuals 6786   6861    1.01
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The result is the same as before: there is a very significant difference. Let us see if this time we can trust the result.

Validation of assumptions
Again, I make a Q-Q plot to check if the residuals are normally distributed. As can be seen in Figure 7, the dots are more or less on one line. Normality can therefore be assumed.

Figure 7 Q-Q plot of residuals of the ANOVA of location after transformation
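The effect of the log transformation on the Q-Q plot can be illustrated numerically. The sketch below uses simulated log-normal sizes (an assumption, chosen because such data become exactly normal after taking logs) and summarises each Q-Q plot by the correlation between sample and theoretical quantiles; a value close to 1 corresponds to dots lying on a straight line. The helper `qq_cor` is made up for this illustration.

```r
set.seed(2)

g    <- factor(rep(c("A", "B", "C"), each = 200))          # three made-up groups
size <- rlnorm(600, meanlog = c(2, 2.5, 3)[g], sdlog = 1)   # simulated, right-skewed sizes

# Correlation of the Q-Q plot coordinates, computed without drawing the plot.
qq_cor <- function(res) {
  q <- qqnorm(res, plot.it = FALSE)
  cor(q$x, q$y)
}

r_raw <- qq_cor(resid(aov(size ~ g)))        # residuals on the original scale
r_log <- qq_cor(resid(aov(log(size) ~ g)))   # residuals after the log transformation

print(c(raw = r_raw, log = r_log))
```

Calling `qqnorm(res)` followed by `qqline(res)` draws the corresponding plots, as in the appendix code.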

The residuals should also be approximately independently and identically distributed. In Figure 8 it can be seen that the residuals are spread more or less the same over the years. There is no trend, so there is probably no autocorrelation between the data-points. This means the assumption of independence can be made.

Figure 8 plot of residuals in time order

In Figure 9 the residuals are plotted against the expected values. Curiously, only four lines appear, while I am investigating five locations. Apparently two locations have expected values very close to each other. To be able to assume identical distribution, the spread of the residuals should be approximately the same for each expected value. The spread is not identical here, but it is close enough.

Figure 9 plot of residuals versus expected value
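The vertical lines in such a plot arise because, in a one-way ANOVA, the expected (fitted) value of every data-point is simply the mean of its group. A small sketch on simulated data (the group means and the common standard deviation of 1 are assumptions) shows this, together with one possible numerical summary of the spread per group:

```r
set.seed(3)

loc <- factor(rep(c("Banquet", "Caterer", "Health care", "Private home", "Restaurant"),
                  each = 100))
# Simulated log outbreak sizes with a common standard deviation of 1.
logsize <- rnorm(500, mean = c(3.2, 3.2, 3.0, 2.3, 2.4)[loc], sd = 1)

fit <- aov(logsize ~ loc)

# One distinct fitted value per group, hence one vertical line per group in the plot.
n_lines <- length(unique(round(fitted(fit), 8)))

# Spread of the residuals per group; a ratio near 1 supports equal variance.
sd_by_group <- tapply(resid(fit), loc, sd)
sd_ratio    <- max(sd_by_group) / min(sd_by_group)

print(n_lines)
print(round(sd_by_group, 2))
```

When two group means nearly coincide, as for two of the locations in Figure 9, their lines overlap and fewer lines are visible than there are groups.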

Hypothesis testing
Now that I have found a useful transformation of my data, I do some more hypothesis testing with the transformed data. I already saw a significant difference between different locations of preparation. Now I am also curious whether there are differences between genera. In the ANOVA table below it can be seen that the different genera do not result in the same average outbreak size.

            Df Sum Sq Mean Sq F value Pr(>F)
Genus        7    648   92.50   88.88 <2e-16 ***
Residuals 6783   7060    1.04
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I would also like to know whether there are significant differences between states. From the ANOVA table below it is clear that there are very significant differences between states.

            Df Sum Sq Mean Sq F value Pr(>F)
State       53    828  15.620    15.3 <2e-16 ***
Residuals 6737   6879   1.021
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The plots to check the assumptions for these residuals are left out for space reasons, but can be found in the appendix. They give no reason to doubt any of the assumptions.

Blocking?
We have seen that there are differences in average outbreak size between outbreaks in different states, from different locations of preparation and caused by different genera. Combining the effects in one ANOVA, so that the variation is explained by all factors, would reduce the unexplained residual variation. This would increase the sensitivity of the ANOVA test and with that the significance of the results. However, the results are already as significant as they get, so blocking is not necessary.
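What combining factors would do to the residual variation can be sketched on simulated data. The effect sizes below are invented, and the additive model `genus + loc` is only one possible way to combine the factors; it was not fitted in this report.

```r
set.seed(4)

n     <- 400
loc   <- factor(sample(c("Banquet", "Private home", "Restaurant"), n, replace = TRUE))
genus <- factor(sample(c("Bacillus", "Norovirus", "Shigella"), n, replace = TRUE))

# Simulated log outbreak sizes with an effect of both factors (invented numbers).
logsize <- 2 + c(0.8, 0, 0.1)[loc] + c(-0.6, 0.3, 0.7)[genus] + rnorm(n)

one_factor <- aov(logsize ~ loc)
two_factor <- aov(logsize ~ genus + loc)   # genus acts as the block here

# Residual mean square = residual sum of squares / residual degrees of freedom.
ms_res_one <- deviance(one_factor) / df.residual(one_factor)
ms_res_two <- deviance(two_factor) / df.residual(two_factor)

print(c(one_factor = ms_res_one, two_factor = ms_res_two))
print(summary(two_factor))
```

With unequal group sizes, the sums of squares that `aov` reports for each term depend on the order of the terms in the formula, so the table for `genus + loc` can differ from the one for `loc + genus`.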

Graphical ANOVAs
To see where exactly the differences lie, I perform a graphical ANOVA for the locations, genera and states. To know which point in the graph corresponds to what, I also show the averages in a table. Next to this, by looking at the real averages (and not the transformed ones) in these tables, we can see whether a significant difference is also an interesting one. With such a large data-set a difference quickly becomes significant, but if the difference is small, it might not be very interesting.
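The scaling behind a graphical ANOVA can be sketched as follows. The deviations of the group means from the grand average are multiplied by sqrt(residual df / treatment df), as in the appendix code, so that they can be compared directly with the residuals. The data are simulated and the groups are of equal size, which is an assumption made here so that the F value can be recovered exactly; with the unequal group sizes of the report this identity does not hold exactly.

```r
set.seed(5)

g <- factor(rep(c("A", "B", "C", "D"), each = 50))   # four made-up groups of equal size
y <- rnorm(200, mean = c(2.0, 2.1, 2.9, 3.0)[g])     # simulated log outbreak sizes

fit  <- aov(y ~ g)
df_t <- nlevels(g) - 1       # treatment degrees of freedom
df_r <- df.residual(fit)     # residual degrees of freedom

dev        <- tapply(y, g, mean) - mean(y)   # group mean minus grand average
scaled_dev <- sqrt(df_r / df_t) * dev        # points of the upper dot plot

# For equal group sizes, F is the mean square of the scaled deviations
# divided by the mean square of the residuals (both taken as plain averages).
f_from_plot <- (sum(scaled_dev^2) / nlevels(g)) / (sum(resid(fit)^2) / length(y))
f_from_aov  <- summary(fit)[[1]][1, "F value"]

print(c(from_plot = f_from_plot, from_aov = f_from_aov))
```

Drawing `stripchart(scaled_dev)` above `stripchart(resid(fit), method = "stack")` gives the two dot plots; scaled deviations that fall far outside the spread of the residuals indicate groups that really differ.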

Locations
The graphical ANOVA for the different locations of preparation can be seen in Figure 10 and the averages are given in Table 3.

Figure 10 graphical ANOVA for locations

Table 3 average outbreak size per location of preparation

Location of preparation  Average outbreak size
Banquet                  36.30
Caterer                  37.37
Health care              31.68
Private home             14.31
Restaurant               17.93

The residuals are spread from -3 to 5, while the difference between locations is as big as 40. It can therefore be seen clearly from this graphical ANOVA that there is a significant difference between some locations. The low point in the graph is actually both the point for private home and the one for restaurant. They are so close to each other that they cannot be seen as individual points; they are virtually the same. They are however very different from the other locations: banquet, caterer and health care. This makes sense, as at a banquet and by a caterer food is provided to big groups of people. In health care the number of people eating the food is not as big as for a caterer or banquet, but as the people eating the food are fragile, they are more prone to get sick from a food contamination.

Genera
The graphical ANOVA for the different genera causing the illnesses can be seen in Figure 11 and the averages are given in Table 4. The order of average outbreak size from smallest to largest is given to make it easier to see what effect the different genera have on the average outbreak size.

Figure 11 graphical ANOVA for genera

Table 4 average outbreak size per genus causing it

Genus                Average outbreak size  Order
Bacillus (B)         7.34                   1
Campylobacter (Ca)   10.79                  2
Clostridium (Cl)     24.54                  6
Escherichia (Es)     21.44                  5
Norovirus (N)        22.34                  7
Salmonella (Sa)      18.23                  4
Shigella (Sh)        33.39                  8
Staphylococcus (St)  11.38                  3

Bacillus outbreaks result on average in the fewest illnesses, while Shigella outbreaks cause the most. The big average outbreak size for Shigella could be explained by the fact that the infectious dose is low. In other words, a few cells are already enough to cause disease. Commonly, Shigella is spread person-to-person, but when food is prepared by personnel that carry the bacterium, the food can get contaminated (Adams & Moss 2008). In such situations often many people get ill, as one sick person among the personnel of, for example, a catering company handles the food for many. Also, people that get ill from Shigella are likely to report themselves, as the symptoms of shigellosis are often quite extreme and need medical attention.

Bacillus is a very common food pathogen. It forms spores by which it can survive harsh conditions. Different species of Bacillus form different enterotoxins, which can result in two different illnesses: diarrhoeal and emetic syndrome. Both illnesses are often over in less than 24 hours and the symptoms are in most cases quite ordinary, like vomiting and diarrhoea (Adams & Moss 2008). As most people do not report these kinds of symptoms if they are over in a day, it is very likely that the number of illnesses in an outbreak of a Bacillus species is underreported. It is however also possible that outbreaks are really smaller, as individual products can be the source of an outbreak (instead of personnel handling food for many). For example, B. cereus can survive in pasteurized milk, but will only grow and produce toxins when stored at too high a temperature (Adams & Moss 2008). In this way it is possible that only one package becomes unsafe. Only the people eating from that one package (like a family) then get sick. This results in smaller outbreaks.

States
The graphical ANOVA for the different states where the outbreaks occurred can be seen in Figure 12 and the averages are given in Table 5. As there are many states, the most interesting ones are highlighted.

Figure 12 graphical ANOVA for states

Table 5 average outbreak size per state it occurred in

State          Average outbreak size  State              Average outbreak size
Alabama        18.25                  Nebraska           32.93
Alaska         12.78                  Nevada             51.45
Arizona        24.33                  New Hampshire      21.74
Arkansas       28.27                  New Jersey         21.02
California     17.58                  New Mexico         24.89
Colorado       18.46                  New York           25.37
Connecticut    15.89                  North Carolina     31.21
Delaware       17.25                  North Dakota       32.06
Florida        10.87                  Ohio               18.71
Georgia        24.60                  Oklahoma           23.62
Guam           3.50                   Oregon             16.63
Hawaii         22.69                  Pennsylvania       20.22
Idaho          19.71                  Puerto Rico        22.18
Illinois       24.98                  Republic of Palau  6.00
Indiana        21.63                  Rhode Island       17.53
Iowa           31.23                  South Carolina     29.57
Kansas         20.51                  South Dakota       36.83
Kentucky       30.45                  Tennessee          29.62
Louisiana      31.25                  Texas              47.88
Maine          10.19                  Utah               33.97
Maryland       18.49                  Vermont            7.40
Massachusetts  27.59                  Virginia           28.89
Michigan       35.71                  Washington         13.68
Minnesota      15.90                  Washington DC      48.86
Mississippi    50.62                  West Virginia      15.63
Missouri       28.08                  Wisconsin          24.08
Montana        57.40                  Wyoming            44.09

Two striking averages are those for Guam and the Republic of Palau, with only 3.5 and 6 illnesses respectively on average per outbreak. A quick search explains why. Guam is a small island far east of the Philippines, inhabited by fewer than 200,000 people (Wikipedia 2016a). Palau is another island relatively close to Guam and has even fewer people: about 25,000 (Wikipedia 2016c). The small number of people and the distance between the mainland and the islands probably explain the small outbreaks. As Guam is a territory of the USA with quite a different culture, the eagerness to report is likely to be smaller. Also, there are simply fewer people to get ill.

The mainland state with the lowest average, Vermont, then becomes interesting, with an average outbreak size of only 7.4. Vermont has a good reputation when it comes to public health. Vermont was ranked first for health outcomes in the USA in 2010. From 2000 to 2008 Vermont was ranked as the healthiest place to live seven out of eight times (Wikipedia 2016d). The low average outbreak size fits this picture.

Nevada and Mississippi are on the complete other side of the range, with 51.5 and 50.6 illnesses respectively on average per outbreak. Mississippi is infamous for its health care. It was given the lowest rank for health care among all the American states by the Commonwealth Fund (Wikipedia 2016b). The large average outbreak size in Nevada might be caused by the popularity of Las Vegas. Massive-scale buffets are very common in Las Vegas, which means that if there is an outbreak, many people (tourists) will get sick at once.

Conclusion and discussion
It can be concluded that foodborne outbreaks differ in size depending on the location of preparation of the food, the micro-organism causing the illness and the state it occurs in. Size in this case refers to the reported number of illnesses. The question is however how realistically the reports reflect the real outbreak sizes. Probably all outbreaks are underreported, but some might be more underreported than others, which is problematic as this might create significant differences where actually there are none.

Next to this, I want to mention that a bigger outbreak size does not directly say something about the seriousness of the outbreak. A big outbreak could mean that 50 people had to vomit once, and a small one could mean that 10 people died. This report is simply and only about the number of people affected per outbreak.

Another remark I would like to make is that I omitted data because of ambiguity. It could be that a specific genus is hard to distinguish, but actually creates big outbreaks. This would not be seen in this analysis, because all data-points with multiple possible causative micro-organisms were not taken into account. The omission of data does have an advantage for the conclusions about location of preparation. The fact that unsafe food was prepared somewhere does not mean that something went wrong at that particular location. It might be that something went wrong at the factory, at the farm or during transport, by which unsafe food was created. However, if food becomes unsafe in an early step of the production, it is likely to end up in different places. By omitting the data-points with multiple locations, it is more probable that in the remaining data-points it was actually a mistake at that location that caused the disease. With this it is possible to make more reliable conclusions about where a mistake results in the biggest foodborne outbreak.

References

Adams, M.R. & Moss, M.O., 2008. Food Microbiology 3rd ed., The Royal Society of Chemistry.

Box, G.E.P., Hunter, J.S. & Hunter, W.G., 2005. Statistics for Experimenters 2nd ed., Wiley-Interscience.

CDC, 2000. Appendix B Guidelines for Confirmation of Foodborne-Disease Outbreaks. Available at: http://www.cdc.gov/mmwr/preview/mmwrhtml/ss4901a3.htm [Accessed May 23, 2016].

CDC, 2015. Foodborne Outbreak Online Database (FOOD Tool). Available at: http://wwwn.cdc.gov/foodborneoutbreaks/ [Accessed May 20, 2016].

Wikipedia, 2016a. Guam. Available at: https://en.wikipedia.org/wiki/Guam [Accessed May 20, 2016].

Wikipedia, 2016b. Mississippi - health. Available at: https://en.wikipedia.org/wiki/Mississippi#Health [Accessed May 20, 2016].

Wikipedia, 2016c. Palau. Available at: https://en.wikipedia.org/wiki/Palau [Accessed May 20, 2016].

Wikipedia, 2016d. Vermont - Public health. Available at: https://en.wikipedia.org/wiki/Vermont#Public_health [Accessed May 20, 2016].


Appendix

The code

setwd("~/R/workingdirectory")
# stringsAsFactors = TRUE: from R 4.0 on, text columns are no longer read as
# factors by default, and plot(Location, Illnesses) needs factors to draw boxplots
Data = read.table("projectdata.txt", header = TRUE, sep = "\t", stringsAsFactors = TRUE)
attach(Data)
mean(Illnesses)
plot(Location, Illnesses)
plot(Genus, Illnesses)
plot(State, Illnesses)

## ANOVA ##
# influence location on amount of Illnesses
r.l = aov(Illnesses ~ Location)
summary(r.l)
# location matters, noise did not prevent result, blocking necessary?

# assumptions #
res.l = resid(r.l)
qqnorm(res.l)                 # normality data --> not normal!
qqline(res.l)
plot(x = Year, y = res.l)     # independence --> no trend
plot(fitted(r.l), res.l)      # equal variance --> not really!
# interpretation ANOVA is questionable
##! need of data transformation or other test !##

## data transformation: log
r.tl = aov(log(Illnesses) ~ Location)
summary(r.tl)
res.tl = resid(r.tl)
qqnorm(res.tl)
qqline(res.tl)
plot(Year, res.tl)
plot(fitted(r.tl), res.tl)
# Better! Still very significant influence of location of preparation

# influence genus on amount of Illnesses
r.g = aov(log(Illnesses) ~ Genus)
summary(r.g)
res.g = resid(r.g)
qqnorm(res.g)
qqline(res.g)
plot(Year, res.g)
plot(fitted(r.g), res.g)
# Assumptions are alright, genus matters very significantly!

# influence state on amount of Illnesses
r.s = aov(log(Illnesses) ~ State)
summary(r.s)
res.s = resid(r.s)
qqnorm(res.s)
qqline(res.s)
plot(Year, res.s)
plot(fitted(r.s), res.s)
# Assumptions are alright, State matters also very significantly!
# blocking is possible, but necessary? Test is already very sign.
# now I want to know where the differences are!

## Graphical ANOVA ##
raw.total = c(Data$Illnesses)
total = log(raw.total)
ga = mean(total)              # grand average
par(mfrow = 2:1)              # two plots stacked: group deviations above residuals

# Locations (scaling factor: sqrt(residual df / treatment df))
meanL = aggregate(x = log(Illnesses), by = list(Location = Location), mean)
devL = meanL$x - ga
stripchart(sqrt(6786/4) * devL, main = "Locations")
stripchart(res.tl, main = "Residuals", method = "stack", offset = 0.005)

# Genus
meanG = aggregate(x = log(Illnesses), by = list(Genus = Genus), mean)
devG = meanG$x - ga
stripchart(sqrt(6783/7) * devG, main = "Genera")
stripchart(res.g, main = "Residuals", method = "stack", offset = 0.005)

# State
meanS = aggregate(x = log(Illnesses), by = list(State = State), mean)
devS = meanS$x - ga
stripchart(sqrt(6737/53) * devS, main = "States")
stripchart(res.s, main = "Residuals", method = "stack", offset = 0.005)

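Assumptions

Genera
[Figures: plots of the residuals checking the assumptions for the ANOVA of genera]

States
[Figures: plots of the residuals checking the assumptions for the ANOVA of states]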