a study of calorie estimation in pictures of foodclulab.cs.arizona.edu/papers/zhou2018.pdf · a...

24
A Study of Calorie Estimation in Pictures of Food Jun Zhou 1 , Dane Bell 2 , Sabrina Nusrat 3 , Melanie Hingle 4 , Mihai Surdeanu 3 , Stephen Kobourov 3 1 Department of Computer Science, Columbia University, New York, NY, United States 2 Department of Linguistics, University of Arizona, Tucson, AZ, United States 3 Department of Computer Science, University of Arizona, Tucson, AZ, United States 4 Department of Nutritional Sciences, University of Arizona, Tucson, AZ, United States Abstract Background: Software designed to accurately estimate food calories from still images could help users and health professionals more efficiently identify dietary patterns and food choices associated with health and health risks. However, calorie estimation from images is difficult, and no publicly available software can do so accurately while minimizing the burden associated with data collection and analysis. Objective: The aim of this study is to determine the accuracy of crowdsourced annotations of calorie content in food images, and to identify and quantify sources of bias and noise as a function of respondent characteristics and food qualities (e.g., energy density). Methods: We invited adult social media users to provide calorie estimates for 20 food images (for which ground truth calorie data were known) using a custom-built webpage that administers an online quiz. The images were selected to provide a range of food types and energy density. Participants optionally provided age range, gender, and their height and weight. Additionally, five nutrition experts provided annotations for the same data to form a basis of comparison. We examined estimate accuracy on the basis of expertise, demographic data, and food qualities using linear mixed effects models with participant and image index as random variables. We also analyzed the advantage of aggregating nonexpert estimates. Results: 2028 respondents agreed to participate in the study (males: 770 [38%], mean body mass index: 27.5). Average accuracy was 5 out of 20 correct guesses, where “correct” was defined as a number within 20% of the ground truth. Even a small crowd of 10 individuals achieved an accuracy of 7, exceeding the average individual's and expert annotator’s accuracy of 5. Women were more accurate than men (P<.001), and younger people were more accurate than older people (P<.001). The calorie content of energy-dense foods was overestimated (P=.024). Participants performed worse when images contained reference objects, such as credit cards, for scale (P=.014). Conclusions: Our findings provide new information about how calories are estimated from food images, which can inform the design of related software and analyses.

Upload: others

Post on 24-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

AStudyofCalorieEstimationinPicturesofFoodJun Zhou1, Dane Bell2, Sabrina Nusrat3, Melanie Hingle4, Mihai Surdeanu3, Stephen Kobourov3

1DepartmentofComputerScience,ColumbiaUniversity,NewYork,NY,UnitedStates2DepartmentofLinguistics,UniversityofArizona,Tucson,AZ,UnitedStates3DepartmentofComputerScience,UniversityofArizona,Tucson,AZ,UnitedStates4DepartmentofNutritionalSciences,UniversityofArizona,Tucson,AZ,UnitedStates

AbstractBackground: Software designed to accurately estimate food calories from stillimages could help users and health professionalsmore efficiently identify dietarypatternsandfoodchoicesassociatedwithhealthandhealthrisks.However,calorieestimation from images is difficult, and no publicly available software can do soaccurately while minimizing the burden associated with data collection andanalysis.

Objective: The aim of this study is to determine the accuracy of crowdsourcedannotationsofcaloriecontentinfoodimages,andtoidentifyandquantifysourcesofbias andnoise as a functionof respondent characteristics and foodqualities (e.g.,energydensity).

Methods:We invited adult socialmediausers toprovide calorie estimates for20foodimages(forwhichgroundtruthcaloriedatawereknown)usingacustom-builtwebpage that administers an online quiz. The imageswere selected to provide arangeoffoodtypesandenergydensity.Participantsoptionallyprovidedagerange,gender, and their height andweight. Additionally, five nutrition experts providedannotationsforthesamedatatoformabasisofcomparison.Weexaminedestimateaccuracyonthebasisofexpertise,demographicdata,andfoodqualitiesusinglinearmixedeffectsmodelswithparticipantandimageindexasrandomvariables.Wealsoanalyzedtheadvantageofaggregatingnonexpertestimates.

Results: 2028 respondents agreed to participate in the study (males: 770 [38%],mean bodymass index: 27.5). Average accuracywas 5 out of 20 correct guesses,where “correct”wasdefinedasanumberwithin20%of theground truth.Evenasmall crowd of 10 individuals achieved an accuracy of 7, exceeding the averageindividual'sandexpertannotator’saccuracyof5.Womenweremoreaccuratethanmen(P<.001),andyoungerpeopleweremoreaccuratethanolderpeople(P<.001).Thecaloriecontentofenergy-densefoodswasoverestimated(P=.024).Participantsperformedworsewhenimagescontainedreferenceobjects,suchascreditcards,forscale(P=.014).

Conclusions: Our findings provide new information about how calories areestimated from food images,which can inform thedesignof related softwareandanalyses.

Page 2: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

Keywords: Calorie estimation, image annotation, crowdsourcing, obesity, publichealth

IntroductionEstimatingcaloriesinpicturesoffoodisanimportanttask,providingdatatoinformnutrition researchandpractice, andhelping individuals achieveoptimal,balanceddietary intakes. Yet this task turns out to be difficult for both experts andnonexperts. We are using this study as an opportunity to enhance ourunderstanding ofwhether and how calorie estimationworks “in thewild”, i.e., inreal-world scenarios. There aremany applications of this understanding, rangingfrom improving themethodological rigor (and reducing theassociatedburden)ofdietary assessment, a pervasive and unanswered question in nutrition science, aswellasinfluencingthedesignofinterventionsfocusedondietarybehaviorchange.

Thefactthatindividualsdonotestimatecalorieswell[23,65-67]hasmotivatedthedesignofsoftwareapplications("apps")tohelpindividualsbetterestimatedifferentaspects of dietary intake (e.g., calories, energydensity, nutrient density, portions)usingmachine learning (ML) and by harnessing the “wisdom of the crowd”. Thelatterphenomenonwasfirstdocumentedina1907Naturepaper[62]andhasbeensuccessfully used inmany domains, ranging from gene network inference [63] tocomputational problems [64]. Apps in this space remain quite difficult to use,requiring burdensome manual logging of what one eats, or, when ML is used toclassifypicturesoffoods,explicitweightvaluestobeenteredmanually.Toalargeextent, the identification of calorie content from images of food, either throughcrowdsourcingormachinelearning,remainsanopenresearchquestion.

This work is a necessary step towards the automated identification of caloriecontentfromimagesoffood.

Theaimofthisstudyistodeterminetheaccuracyofcrowdsourcedannotationsofcaloriecontentinfoodimages,andtoidentifyandquantifysourcesofbiasandnoiseasafunctionofrespondentcharacteristicsandfoodqualities(e.g.,energydensity).

Methods

Theproposedtaskisessentiallyacombinationoftwotestsindividualsmustengageinwhenestimatingcalories.The first test relates to the relativeenergydensityofthe food pictured, whereas the second test discerns the portion size. Thus, wecontend that the ecological validity of our approach is high, despite the task'scomplexity.The studyprotocoldescribedhereinwas reviewedbyan InstitutionalReviewBoardattheUniversityofArizonaandmetthecriteriaforexemptionunder45CFR46.101(b).

We designed a simple online quiz administered by a custom-built webpage tomeasuretheaccuracyofcalorieestimationinpicturesoffood,verifytheexistenceofcollectivewisdom,andanalyzedataandfindpatternsandtrendsthatcanbeusefulinthedesignofcalorietrackingapps.

Page 3: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

We posted the quiz to SampleSize [59], a subreddit (i.e., a forum on reddit)dedicatedtopostingsurveysandsurveyresults.Thischoicewasmadeonthebasisof having a large, active user base that reflects the demographics likely to makelarge-scalefoodannotationsforreasonsofpersonalinterestinself-quantification.

Thequizbeganwithashortintroduction“Wewouldliketoseewhetheryouhaveagoodunderstandingaboutcalories.Wewillshowyouseveralpicturesof foodandyourtask,shouldyouchoosetoaccept it, istoguesshowmanycaloriesareinthefood.We will not share any identifying information about you. All of the data isanonymous.”

Figure1.Untrainedparticipantsestimatedthefoodcaloriesinthesetwentyimages.

Thequiz included20questions.Eachquestionconsistedofapictureofsomefooditem(seeFig.1)andtheprompt,“Howmanycaloriesareinthefoodpicturedhere?(Type anumber in theboxbetween50 and800).” Implausibledietarydata, from(un)intentional under-reporting or over-reporting, are a pervasive problem innutrition research and can introduce bias or lead to erroneous interpretations ofdiet-weightordiet-diseaserelationships.Acommonwayofhandlingthisissueistoexclude extreme values after the fact based on the distribution of the data (e.g.,removing datamore than 2 standard deviations from themean) or by subjectiveassessment[35].Incontrast,weprovidedtheupperandlowerlimitsontheguesses,basedonthegroundtruthdata,inordertoeaseanalreadydifficulttaskandtherebyreducetheamountofdataitwouldlaterbenecessarytoremove.Thenumbersalsohelped clarify that we were referring to kilocalories and helped reduce outliers.

Page 4: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

Neitherthecorrectcalorieamountsnorotherparticipants'answerswerevisibletoaparticipantduringtheestimationportionoftheexperiment,althoughitispossiblethat some might have read the reddit comments prior to participation, whichrevealedsomecalorievalues.Wedecidedtonotaddadditional informationtothepictures (e.g., does the sandwich contain mayonnaise?) in order to keep the taskclosertoarealisticimageannotationtask.

Following the food-relatedquestions, theparticipantswereasked toprovide theirage group, gender, and BMI. An option to calculate BMI via height and weightinformation was also available. We deliberately chose not to ask for additionaldemographic questions (e.g., location, income, education) in order to protectparticipantprivacy.

Wereportedtheaccuracyoftheindividualparticipantwhojustcompletedthequiz,aswellastheaverageaccuracyofallpriorparticipants,usingabreakdownshowingtheperformanceofeachquestion.

We used two categories of food for the pictures: single-ingredient (e.g., broccoli,cheese) and mixed-ingredients (e.g., sandwich, pizza). There were 20 pictures offooditemsintotal(Figure1),12single-ingredientand8mixed.Thefoodshowninthepicturesranged from100 to720kcal. Importantly,wechose these food itemsaccording to theUSDA'sMyPlatemodel [9] thatcaptures thebuildingblocks forahealthydiet,andwhichincludesfivetypesoffood(vegetables,fruits,protein,dairy,andgrain),aswellasmixedfoodscontainingtheseingredients.Ourselectionaimedtofollowthismodel,toincluderealisticfoodsthatappearindailyconsumption,andtobeconcisesoparticipantsengagewiththequiz.

ThefoodportionsselectedaresummarizedinTable1.Theimageswereorderedsothateachfoodtypewasmaximallyseparatedfromother instancesof its type,andthe order was the same for each participant. We collected nutrition informationaboutsomefooditemsfromofficialrestaurantwebsites.Whilethecaloriecontentof the foods pictured was not directly measured, US federal statute requires thepublishedcalorievaluesofrestaurantfooditemstobewithin20%ofactualcalorievalue[31].

Table1.Foodswerechosenforthequiztoattainmaximumcoverageoffoodtypesencountered in daily life by likely participants. Scaling refers to the presence ofreferenceobjectssuchascreditcards,whichcouldindicatefoodvolume.

Food TypeEnergy(kcal) Mass(g) Scaling? Source

Cheddarcheese dairy 200 51 no [11]

Goudacheese dairy 300 84 yes [1]

Page 5: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

avocado fruit 200 125 no [11]

kiwi fruit 200 328 no [11]

brownrice grain 420 297.7 no [2]

cereal grain 200 55 no [11]

ham meat 300 185.1 yes [1]

salami meat 300 72.9 yes [1]

redonion vegetable 200 475 no [11]

potato vegetable 100 141.7 no [6]

broccoli vegetable 200 588 no [11]

cauliflower vegetable 300 1200 yes [1]

cheeseburger mixed 270 104 no [3]

hotdog mixed 310 123 no [3]

greenteacake mixed 136 40 no [8]

longcheeseburger mixed 590 213 no [3]

pepperoni and sausagepizza mixed 240 97 no [10]

Swissroll mixed 251 96 no [5]

tunasandwich mixed 720 420 yes [7]

turkeysandwich mixed 510 254 yes [7]

We chose not to inform the participants of the sources of the images, in order toreducethepotentialthattheywouldsearchthewebfor“groundtruth”data,e.g.,bygoing to the actual Burger King website. Likewise, the participants were notexplicitly told that some images were from fast food restaurants, and thus morelikely to be subject to food engineering, e.g., replacing sugar or using sweetnessenhancers,oraddingwaterorprotein toenhance foodpropertiesandpalatability[47,50].Thefactthatthiswasnotexplicitlymentionedtotheparticipantsraisesthepossibility that participants might have considered these foods as "homemade",whichmayhaveinfluencedperceivedenergydensityandcalories.However,sinceamajorityofhamburgersareeatenatrestaurantsratherthanhomemade,judgments

Page 6: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

aboutengineered foodsareasormorerelevant thanhome-cooked foods forbothnaturalisticandapp-relatedpurposes.

PatternsandAnalysisThreemeasureswererelevanttoouranalysis:

1. Error, e, is estimated kilocalories (𝑐) minus ground truth kilocalories (c),𝑒 = 𝑐 − 𝑐,andpercenterror, η, iserrorasapercentageof thegroundtruthkilocalories,𝜂 = !!!

!, both of which are positive in overestimation and

negative in underestimation. Because of the variation in the ground truthkilocaloriesofthefoods,thelatterisamorereliableindicatorofthescaleofresponsebias.

2. Absolute error, |e|, measures accuracy irrespective of the direction ofestimationbias( 𝑒 = 𝑐 − 𝑐 ).

3. Discreteaccuracy,D,isthenumberofestimatesthatwerewithin20%ofthetrue calorie value (out of twenty estimates).

𝑑! =1, 𝑖𝑓0.8𝑐! ≤ 𝑐! ≤ 1.2𝑐!

0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

𝐷 = 𝑑!

!"

!!!

Discreteaccuracywasthemeasurereportedtoquizparticipants.

Priortothisanalysis,weremovedparticipantswhoreportedaBMIlessthan15ormorethan50kg/m2(whichareunlikelytobecorrect),andparticipantswhodidnotreport their gender.Additionally,we eliminated responses of less than50kcal orgreaterthan800kcal,andkeptallremainingones.

Weanalyzedtheresultsofthesurveyusinglinearmixed-effectsmodelinginR[16,43], allowing regression with random intercepts for both participants and foodssimultaneously.TheR2valuesaretheproportionofthevarianceinthedatathatisdescribedbythemodels'predictedvalues.Forallanalyses,aPvaluelessthanα=.05wasconsideredindicativeofastatisticallysignificantrelation.

ResultsIn total, 2,125 individuals participated in our reddit quiz. After removing 97participants with missing or invalid demographic data, 2,028 individuals wereincludedintheanalysis.

ParticipantDemographicsThe demographics of the participants are summarized in Figure 2. Although wecollected no location data, an earlier study, again recruiting from the SampleSize

Page 7: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

subreddit, found that 67% of participants reported a location within the UnitedStates[18,51],aratethatissimilarthe64%reportedinanothervoluntarysurveywith participants from across reddit [51]. We also have a higher percentage offemaleparticipants than theUS average, anda larger fractionofpeoplewithBMIaround 25 kg/m2. It is possible that the participants in our quiz were moreinterested in this topic than the average person. However, in their self-selection,they are more demographically similar than the average person to likelycrowdsourcingannotatorsforpotentialfutureappdevelopment.

Figure 2. Demographic data from our quiz is compared to data from NHANES

(NationalHealthandNutritionExaminationSurvey).

ParticipantFeedback

TheparticipantsvolunteeredtheirBMIandotherdemographicinformation,and18participantsleft31commentsontheredditthread.Table2summarizesthetypesoffeedbackcommentswereceived,aswellassomeexamples.

Table 2. Representative comments from the reddit post of the calorie estimationquiz.

Type Example

Page 8: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

fun“Thatwasfun!Ithinkthefolksin‘loseIt’[anothersubreddit]andonvariousMFP[MyFitnessPal]forumswouldenjoytakingthis,too.”

surprise

“I’mreallyreallydoubtfulthatburgerisonly270cal.”

“[N]owayaretworedonions200calories.”

units “[C]ountriesotherthantheUSusetheactualunitofenergy-Joules”

scale

“Itwouldhavebeengreattohavearulernexttothefood.”

“[I]fyoushowmeaplateofrice,Ican’tguesshowmuchriceareontheplatebecauseIdon’tknowhowbigtheplateis.”

difficulty

“Shoot,got1rightoutof20LOL.NowondermyBMIis29.”

“Idontknowif therewasmayoon[thesubmarinesandwiches]ornot,whichchangesthingsalot.”

The feedback from the participants demonstrates engagement, interest, andcuriosity.Thisimpliesthatsuchtaskscouldbelegitimatelygamified(applyinggamemechanics andgamedesign techniques to engage andmotivatepeople to achievetheirgoals).ItalsoshowsthatunlikeMechanicalTurkparticipants,theparticipantsinourstudywereengagedandmotivatedbyintrinsicinterest.

NotethatourworkaddressessomeoftherequestsshowninTable2.Forexample,wefoundnoincreasedaccuracyfromthepresenceofreferenceobjectsforscaleinthepictures.

Howgoodarepeopleatestimatingcalories?Theparticipants' estimateshad amean absolute error (|e|) of 58% (136kcal). Interms of discrete accuracy (D), the mean participant answered 5.15 questionscorrectlyoutof20.Figure3showsthedistributionofcorrectresponses.Absoluteerrorvariedconsiderablybyitemfromthemostaccurateitem—aturkeysandwich,with a mean absolute error of 23.0% (39 kcal)—to the least—green tea cake, at241% (327 kcal) absolute error. Figure 4 illustrates the variety of estimates andpercenterror (η)distributions fordifferent items.Together, these facts show thathuman calorie estimates are both inaccurate overall and inconsistent in theirinaccuracies.

Page 9: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

Figure3.Ahistogramofthenumberofcorrectestimateseachparticipantmade.See

PatternsandAnalysisforthedefinitionofthismeasure,D.

Doesthewisdomofthecrowdphenomenonapplyhere?A consensus formed rapidly for each food, as shown in Figure 5 (see the dashedorange line in the figure), so that ten responses gave a very good estimate of thenext1,000responses.Infact,abootstrapsignificancetestshowsthattheaverageof10 randomly selectedparticipants’ guesses is nomore (or less) accurate than theaverage of those of 1,000 randomparticipants (P=.358).Moreover, the consensusresponseshadgreaterdiscreteaccuracy(D)thanthatoftheindividualparticipants,achieving7correctresponsesoutof20,a36%relativeimprovementoverthe5.15correct among individual participants. This result is consistent with previousstudiesdemonstratingthewisdomofthecrowd,inwhichtheaccuracyofconsensusjudgmentsexceedsthatofindividualjudgments(seeComparisonwithPriorWork).

Another important observation is that, although error was high for individualresponses and individual foods, the bias in the errors was low overall across allquestions, such that the median of the error across items and participants is 0(when using crowdsourcing over 2,028 participants). While this result is notactionableinitself,sinceitisaveragedacrossallquestions,itdoesdemonstratethepowerofcrowdstoconvergetowardhigh-accuracyjudgments.

Dothenutritionalexpertsoutperformthecrowd?Inadditiontoredditors,wesolicitedparticipationfromfivenutritionalexperts.WerecruitedfacultyonavoluntarybasisfromtheDepartmentofNutritionalScienceatthe University of Arizona and the School of Nutrition and Health Promotion atArizonaStateUniversity.Somewhatsurprisingly,neithertheabsoluteerroroftheirresponses nor their discrete accuracywas statistically different from those of theaveragenonexpertparticipant (P=.195). In fact, a small crowdofonly2 randomly

Page 10: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

selected nonexperts was required to outperform the highest performing expert,achieving an average absolute error (|e|) of 119.3 (52.34%) compared with theexpert’s 130.2 (55.34%). Expert performance is shown in comparison withnonexpert performance in Figure 5. This result is consistent with the hypothesisthat the sources of error (e.g., erroneous volume estimation due to a notion oftypicalportionsize)applyequally toexpertsandnonexperts.Priorwork inmanydomains of estimation has supported the notion that a relatively small group ofnonexpertscanestimatejustaswellasasingleexpert[54,57](seealsoComparisonwithPriorWork).

Figure4. (Top)Calorieestimates for20 food items.Foreach food item, theviolinplot represents the distribution of the calorie estimates by the participants. Thebottom and top of the boxes and the red band represent the first and the thirdquartile, and the mean of the calorie estimates, respectively. The green bandrepresentstheactualcalorievalueforeachfooditem.Theactualcaloriesareshownin parentheses next to each food item. (Bottom) Percent error (𝜂) for each fooditem.Thebottomandtopoftheboxesandtheredbandrepresentthefirstandthethirdquartile,andthemeanoftheerrorrates,respectively.

Page 11: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

Doeshavinganobjectforscaleinthepicturehelp?Several comments in the reddit thread expressed the hypothesis that picturesfeaturing standard-sized reference object (such as a credit card) were easier toanswer. The results showed that reference objects, far from aiding estimation,increasedabsoluteerror(|e|)byamean4.6kcal(P=.014,R2=.31).Ourhypothesisisthatparticipantsusedbackgroundknowledgeaboutthetypicalsizeoffoodstoscalefoods, butwere not able to profit from comparison against the reference objects.This is statistically significant evidence for the notion that scale information doesnotaidcalorieestimationindigitalimages(compare[68]).However,itisimportanttonotethatthiswasapost-hocanalysisonly;theexperimentwasnotdesignedtoanalyze this hypothesis. For example, we included objects that come in manydifferent sizes, e.g., forks, as referenceobjects,whichmayhave confused thequiztakers.We leaveamorecarefulevaluationof thisparticularobservationas futurework.

Figure5.Meanestimatesforeachfoodasmoreparticipantsareaddedshowthataconsensusformsrapidly.Thedottedbluelinesshowthetruecalorievalueforeachfood. The x-axis uses a logarithmic scale. The orange dashed line indicates theestimates of non-experts. The green continuous line represents the estimates ofnutrition science experts.Note that the range of acceptable calorie estimateswas50–800calories,foreachfooditem.

Doesenergydensityoffoodspredictestimationerror?AsshowninFigure6,thecaloriccontentofenergy-densefoodswassystematicallyoverestimated, and that of energy-sparse foods underestimated, as measured by

Page 12: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

error(e,P=.024,R2=.57).ThisbiasissimilartoonefoundbyAlmiron-Roigetal.[13]inestimating in-personportionsizes,andcouldreflect twonon-exclusivesources.First, it could result from the perceived healthiness of the food items [30]. Forexample, broccoli is a prototypically healthy food, but is not devoid of calories;conversely,prototypicallyunhealthy foods suchas cheeseburgershaveoftenbeen“engineered”forlowcalories[50].ThisexplanationalignswiththeresultsofCarelset al. [23], who found that college students overestimated the caloric content offoodsconsideredtobeunhealthywhiletheyunderestimatedtheamountofcaloriesin healthy foods. Second, the bias could result froman assumption that the itemswould have a similar weight to one another, when in fact there was an inverserelationship between the energy density and weight of the items (Pearsoncorrelation: ρ=–0.70). We hypothesize that inelastic adjustment of portion sizeaccordingtoenergydensitycouldcontributetoobesity.

Figure 6. Participants underestimated the calorie content of calorie-sparse foodsandoverestimatedthatofcalorie-richfoods.

DoesBMIpredictestimationerrors?BMIitselfdoesnotpredictaccuracyorbiasinthisdata,similartoBlakeetal.[21]and Chandon and Wansink [25]. Other studies show that overweight and obeseindividuals consistently under-report calorie intake to a greater degree thannon-overweight individuals [14, 42]. However, BMI does significantly interact withenergydensityinpredictingpercenterror(η,P=.002,R2=.57),suchthatthehigheraparticipant's BMI, the more they exaggerated the calorie content of calorie-rich

Page 13: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

foods. We hypothesize that overweight individuals are more sensitive toperceptionsoffood.

Dogenderandagepredictestimationerrors?No biasing effect (toward underestimation, for example)was found, but absoluteerror (|e|) was greater for men than for women (P<.001, R2=.31), similar to theportionjudgmentresultbyAlmiron-Roigetal.[13].Additionally,theabsoluteerrorwasgreaterforolderparticipants(P<.001,R2=.31),buttheseeffectsdidnotinteract.Figure7summarizesthesedifferences.Wehypothesizethattheprimaryreasonforthese differences is cultural, reflecting gender norms and the relatively recentculturalemphasisoncaloriesasameasureofhealthiness.

Figure 7. The absolute error (|e|) of participants differs by gender (top) and age(bottom).Boxedgesshowthefirstandthirdquartilesandaresplitbythemedian.Theboxes’whiskersextendtothefarthestpointwithin1.5timestheinterquartilerange from the box ends. The notches denote the 95% confidence interval of the

Page 14: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

median.They-axisisonasquare-rootscale.

Doestimationerrorsclusterbyfoodtype?The over- and underestimation errors for some foods correlate with those forothers. For example, a participant who underestimates the calories in broccoli islikely todo so for cauliflower aswell. Figure8 shows an automatically generatedmap[41]illustratingthesecorrelations,withclustersshowingsimilarsubnetworks.A largermapwithmore food itemswouldbea strongbasis forpredictinghumanbiasonclustersoffoodtypes(e.g.,vegetables).

Figure 8. This map shows a network of food items in our survey based oncorrelationofestimationerrors.Pairsofstronglycorrelatedfoodsareconnectedbyedges. The stronger the correlation, the closer the two (distances are inverse tocorrelation)are.Clustersshowgroupsofsimilarsubnetworks.

Discussion

PrincipalResultsTheaboveanalysis identifiesseveralpatterns thatare important for thedesignofcalorieestimationapps.

First and foremost, our study demonstrates that individuals are poor judges ofcalorie content in images, andpriorworkhas shown that theyarepoor judgesofportionsizeinreal-lifesituations(seenextsubsection).Thissuggeststheutilityofamachine-learningapproachtocalorieestimationtofacilitatemealplanning.Keeping

Page 15: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

trackofcaloriesbydescribingfoodsandguessingquantitiesandvaluesisatediousandinaccuratestrategy,yetitistheonemostcommonlyusedinappstoday.Giventhat “a picture isworth a thousandwords,” our initial hypothesiswas that usingimages (rather than descriptions of foods) should lead to better estimates. Ourresults, however, do not support this hypothesis: on average, participantsperformed poorly at estimating the amount of calories in pictures of food,answering 5 of 20 questions correctly on average. Our analysis indicates thatparticipants inourdataset tendedtoexaggeratecommondietaryknowledge: theyunderestimated theamountof calories in energy-sparse foods, andoverestimatedtheminenergy-denseones.

Our relatedworkdiscussion (ComparisonwithPriorWork,below)highlights thatestimating calories using machine learning remains an open research problem.However,ourworksuggeststhatsuchappscouldtakeadvantageofthewisdomofthecrowdforestimation.Weshowedthatthecrowdperformsbetterthanexperts,onaverage,evenwhenthecrowdissmall.Thissuggeststhatthisannotationcouldbeimplementedaccuratelyandatlowcost.

The results suggest that for apps that focus on caloriemonitoring (including self-reporting), itmight be a good idea to characterize users' demographic data (age,gender, and BMI) shown to influence the accuracy of calories estimates, eitherdirectlyorwhencombinedwithotherfactorssuchasenergydensity.

We identified additional patterns that simplify the design and implementation ofcalorie-tracking apps. The first such pattern is that scale information does notimproveestimationaccuracy.Thesecond is thatestimationerrorsclusterby foodtypes,whichindicatesthattheappmayextrapolateuserpatternsbetweenfoodsinthesamegroup.

Itisimportanttonotethattheobservationsofthisstudyarestatisticallysignificantand applicable to the population of interest to us (i.e., individuals likely toparticipateincrowdsourcedannotations).Thispopulationisconsiderablyyoungerthan the US population (χ2=3363, P<.001), and contains more womenproportionally(χ2=80.98,P<.001).Infuturework,weaimtorepeatthisstudyforalarger population thatmatches known demographics to verify the validity of ouranalysisonsuchpopulations.

ComparisonwithPriorWorkRelatedwork includespriorwork innutritional sciences,machine learning, imageprocessing,andcrowdsourcing.Wereviewasmallbutrepresentativesubsetbelow.

NutritionanddietBandini et al. [15] andSchoeller et al. [53]have reported that individuals tend toselectively under-report the energy intake when these data aremanually logged.Thisseemstobeespeciallytrueforoverweightandobeseindividuals[14,42],andcouldbeassociatedwithafailuretoaccuratelyestimateportions,althoughBlakeetal.[21]andChandonandWansink[25]foundthatBMIdoesnotcorrelatewiththe

Page 16: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

abilitytoestimatecalorieswhenthistaskisconductedinperson.Portionestimationof in-person food remains poor, whether in reference to images on computerscreensoronprintedimages[34].However,calorieestimationoflargemealsmaybeworsethanthatofsmallmeals[56].

To monitor dietary intake more accurately, third-party automated food analysissystemshave been proposed.Martin et al. [46] use the remote food photographymethod(RFPM),whichrequiresindividualstouploadthreepictureswhenhavingameal: theplateof the foodsselectedbyan individual, standardportionsofknownquantities of the foods, and the leftovers. These pictures are sent to traineddietitians who verify portions with participants, and analyze this data using astandardized nutrient database. This approach relies on the judgment of trainednutrition professionals, and argues for the validity of RFPM. Providing all threepicturesforeachmealisachallenge,asindicatedbyWilliamsonetal.[58].Beltranetal.[19]testedthereliabilityoftheeButtonsystem,inwhichacamerawornonthechest records images continuously. The images are captured passively while theparticipantgoesabouttheirday,butsuchasystemstillrequiresexpertstoidentifyfoods in the images, and confirm them with participants. Similar to the RFPMemployedbyMartin et al. [46], the eButton system requires valid pictures beforeandaftereachmeal,cameraplacementatacertainangle,andproperlighting.Whilepromising,suchsystemsareunlikely toscale to themillionsofpeoplewhowouldliketoaccuratelytracktheirnutritionalintake.

MachinelearningandimageprocessingGiven the challenges of the systems described above, a system that canautomaticallymeasurecaloriesinpicturesoffoodwouldbeingreatdemand.Imageprocessing techniques can be used to recognize food in images, and machinelearningcanbeusedtoestimatethecaloriesinthefood.

Menu-Match[17]usesadatabaseofrestaurantsandGPSlocations,andattemptstoguesswhatisinthepicture,usingimagefeaturessuchascolorandscale-invariantfeature transforms [44]. It has not been made available to the general public.Im2Calories [48] is built on the work of menu-match. A multi-label classifier istrainedona collectionof imagesof food.Theapp locates the restaurant auser isdining in and, given an image from the user, the classifier (running on the user'sphone)guesseswhichfoodsarepresentinthemeal.Lookingupthenutritionalfactsprovidedbytherestaurant,usingtheresultingestimates,yieldsgoodresults.Note,however,thatIm2Calorieshasnotbeenmadeavailabletothegeneralpublicorevenforresearchpurposes.

Bettadapura et al. [20] show that food recognition using location data improvesaccuracy. Such systems, however, are inherently limited to the restaurantswhosemenusareinthedatabase.Thesealsoassumethatmenusdonotchangeoften,andthat thevolumeof food is thesame fromplate toplate. In reality,mostmealsareeateneitheroutsideofrestaurantsorinrestaurantswhosemenusarenotincludedinsomedataset.The“inthewild”problemismorenatural,butalsomoredifficult.

Page 17: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

ThewebapplicationandappFoodlog[39,40]dividesfoodimagesinto300blockseachandextractsDiscreteCosineTransform(DCT)coefficientsandcolorhistogramfrom each block. Using this data, Foodlog classifies the food into five categoriesaccording to the USDA's My Pyramid system. Experimental results report 88%accuracyintheextractionoffoodand73%accuracyinfoodbalanceestimation.TheFoodCam system [38] segments the region of each food by GrabCut (an imagesegmentationapproachbasedoniterativegraph-cuts)[52],extractsimagefeaturesof histogram of oriented gradients (HOG) [27] and color patches with the FisherVector(animagerepresentationobtainedbypoolinglocalimagefeatures)[26]andfinally classifies it into one of 100 food categories using linear support vectormachines(SVM).

Withtheexceptionof Im2Calories, thesystemsaboveachieverelativelygoodfoodrecognition, but without volume estimation. To estimate volume, Chae et al. [24]minimizethefalse-segmentedregions,smooththesegmentationboundariesoffood,andreconstruct3Dprimitiveshapesfromasinglefoodimage.Heetal.[33]estimatetheweight of food given a single image using a shape template for regular-shapefoods and area-based weight estimation for irregularly shaped food. TheIm2Caloriessystem[48]estimates thedistanceofeverypixel fromthecamerabyusingaconvolutionalneuralnet(CNN)architecture,convertsthedepthmap[4]intoa voxel representation, and estimates the volume of the food. Although suchapproachesareeffective,thereisnoapplicationforestimationoffoodvolumesthatisavailabletothegeneralpublic.

CrowdsourcingCrowdsourcing sometimesmakes itpossible tousemultiplenonexpert judgmentstoapproachthehighqualityofexpertannotation[22].Surowiecki[54]arguesthatinmany instances, the averagenonexpert estimates canevenoutperforma singleexpert.Watsonhasshownthattheaverageoftheindividualjudgmentscanbeequalorsuperiortothejudgmentofthebestindividualwithinthegroup[57].Moreover,the validity of judgments increases with more judges [32]. The strength of thewisdom of the crowd overmachine learning is well understood and exploited inindustry. For example, CardMunch (now a service of Evernote [60]) usescrowdsourcingwithAmazonMechanicalTurktoconvertpicturesofbusinesscardsinto digital contact information. Eloquent Labs [61] uses amix of crowdsourcingwithanartificial intelligenceto implementaconversationalassistantforcustomerservice.

In thenutritiondomain,Mamykinaet al. [45] show that crowdsourced ingredientannotations fromfood imagesare improvedbyexpertannotationandbyshowingthe annotators previous annotations of the images. The PlateMate [49] appleverages crowdsourcing to implement the first step in the Remote FoodPhotographyMethod.Rather than typingnamesof foods andestimatingportions,userstakephotographsoftheirplatesbothatthebeginningofthemealandattheend to accurately capture how much food was actually eaten. PlateMate usesannotations from nonexpert Amazon Mechanical Turk workers instead of expertdietitianstoestimatethecompositionoffoodsinstaticimages.PlateMate'sresults

Page 18: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

are as accurate as the experts. Similarly, the Im2Calories [48] project usescrowdsourcing to annotate all the food terms that apply to an image. Manuallymerging synonymous terms, they create the Food201 multi-label dataset fortraining. Compared to the original Food101 classes, the new classes of Food201-MultiLabel do better according to mean average precision, since they oftencorrespondtosidedishesorsmallfooditems.

In sum, despite the abundance of interest in this and related topics, includingcalorie-trackingappswithmanualentry,thereexistsnopubliclyavailableappthatwillaccuratelyestimatecaloriesfromasingleimage.Likewise,whiletherearemanystudies of human bias in tracking calories and lack of skill in estimating portionsizes, no previouswork establishes the accuracy and biases of crowdsourcing forcalorieestimation,orwhatdemographicfactorsmightcorrelatewithaccuracy.

Learning from our study, we envision a very simple app, where the only actionrequiredfromtheuseristotakeapictureofherorhisfood.Theestimationlogic,drivenbythewisdomofthecrowdandmachinelearning,wouldbetransparenttotheuser,i.e.,itwouldbetriggeredautomaticallywhenthecameraisused.Thelogicincludes: (a)detecting if thepicture is apictureof foodusing image classification[36, 37], and (b) routing the image for crowdannotations (similar toCardMunch,which routes the task of processing images of business cards to the crowd).Wehope that this simplicity will yield wide adoption, which in turn will lead tomeasurableeffectsindietarychoices.

ConclusionsWedescribed a studymeasuring the ability of over two thousands individuals toestimatecalories in20picturesof foodchosen tocapture thebuildingblocksofahealthydiet[9].Webelievethisstudyshouldbereadasananalysisthatdrivesthedesign of future food-related apps, with additional impacts on crowdsourcingstrategiesandthedesignofhuman-computerinterfaces.

Ouranalysisconfirmssomeearlierobservations(e.g.,calorieestimationisadifficulttask,evenfortheexperts),andoffersnewinsights:

1. Evenasmallcrowdoftwononexpertsachievescalorieestimationaccuracygreater than that of the expert annotators. This suggests that semi-automated food labeling apps can be implemented at a low cost byharnessingthewisdomofthecrowd,evenwhenthecrowdissmall.Notethatsome prior approaches in this space, such as PlateMate [49], usecrowdsourcing to provide calorie information to users. To the best of ourknowledge,thecrowdsourcingmethodhadneverbeentestedasasourceofdataforalgorithmiccalorieestimationbefore.

2. We found new type-of-food effects, with energy-dense foods (such ashamburgers) being consistently over-estimated and energy-sparse foods(such as broccoli) consistently underestimated. Future crowdsourcing (ormachine learning)projects aiming to annotate food for calorie contentwillbenefitfromcorrectionusingthesebiases.

Page 19: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

3. We found the absence of some expected correlations. For example, thepresenceofreferenceobjectsforscaledoesnotimproveaccuracybutratherslightlydecreasesaccuracy,andtheBodyMassIndexisnotcorrelatedwithaccuracy.Theseobservations impactthedesignof interfaces forannotationapps,aswellasdatacollectionprotocols.

Allinall,thisworksuggeststhatcalorie-estimationappsareneededandcanbebuiltat low cost (e.g., using small annotator groups, and without the overhead ofincludingreferenceobjectsinimages,orcontrollingfortheBMIofusers).

Several interesting research questions remain. First, given the low calorieestimation accuracy (5 out of 20), and some clear patterns (underestimating“healthy" foodsandoverestimating“unhealthy” foods) it isnatural toaskwhethersimpletrainingwithfeedbackcanhelpimproveaccuracyfornonexperts.Ifso,howmuchtraining isrequired,whatgains inaccuracycanbeobtained,andhowmuchfurthercan thecrowdboost theresults?Second,canwe factor inbiases(e.g.,age,gender) to obtain better crowdsourced prediction? Third, can better (moreconsistent) referenceobjects lead to improvements inaccuracy?Fourth,assumingthebaselineaccuracyfor“simple”foods(e.g.,fruits,vegetables,sandwiches)canbeimproved with some of the ideas above, can we hope to tackle more difficultchallenges, such as amorphous foods (porridge, mashed potatoes) and liquids(soups, smoothies) inwhich ingredients and volume are less obvious? Lastly, butperhapsmost importantly,weaim toapply theknowledgegained fromthis studybeyondtheunderstandingofhow(orhowwell)peopleestimatecalories,toincludeassessmentofdietquality,whichhasbecomeadietaryconstructof interest inthepastfiveyears[55].Thischangehasoccurredbecausedietarypatternsanddietaryquality (e.g., increasednutrientdensity,nutrientdiversity,andnutrientadequacy)havebeenstronglyassociatedwithhealthanddiseaseoutcomes.Thisinformationprovidespotentiallymoremeaningfulmetricsthanamountofcalories(whichsaysnothingaboutthequalityor“healthiness”ofthefood)whenprovidingparticipantsorpatientswithfeedback.

Webelievethisstudyshouldbereadasananalysisthatinformsthedesignoffuturefood-related apps (in particular, apps that feature calorie estimation), withadditionalpotentialimpactsoncrowdsourcingstrategiesandthedesignofhuman-computerinterfaces.Ourfuturegoalistoprovideestimatesaboutjudgingcaloriesfrom images for the purpose of mass annotation (e.g., in support of a calorie-estimationapp),which,inturn,ispartofalargersystemthatanalyzestext,images,and videos to estimate risk of diet-sensitive diseases such as type 2 diabetesmellitus[18,69].

ConflictsofInterestNonedeclared.

AbbreviationsBMI:bodymassindex(kg/m2)

Page 20: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

ML:machinelearning

USDA:UnitedStatesDepartmentofAgriculture

References1. World Health Organization. 2017. Obesity and overweight.

http://www.who.int/mediacentre/factsheets/fs311/en/. Archived at:http://www.webcitation.org/6tg2Yf7s5

2. HealthAssist. 2017. 300CalorieFoodPictureGallery.http://www.healthassist.net/food/300kcal/300.shtml. Archived at:http://www.webcitation.org/6tdAdUSkj

3. Panda Express. 2017. Brown Steamed Rice—Panda Express Chinese Restaurant.https://www.pandaexpress.com/menu/sides/brown-steamed-rice. Archived at:http://www.webcitation.org/6tbsukhFK

4. Burger King. 2017. Burger King. https://www.bk.com/. Archived at:http://www.webcitation.org/6tbqVFj2e

5. Slism. 2017. Food nutrients can be seen calorie calculation at a glance.http://calorie.slism.jp/200528/. Archived at:http://www.webcitation.org/6tbt9Z9Ym

6. Food Network. 2017. Foods with 100 Calories.https://www.foodnetwork.com/healthy/photos/foods-with-100-calories.html.Archivedat:http://www.webcitation.org/6tbqxkJsp

7. Jimmy John’s. 2017. Jimmy John’s Gourmet Sandwiches.https://www.jimmyjohns.com/. Archived at:http://www.webcitation.org/6tbr3E3nW

8. Wit Co., Ltd. 2017. Matcha calorie calculation of cake.http://www.asken.jp/calculate/meal/94371. Archived at:http://www.webcitation.org/6tbs116BG

9. United States Department of Agriculture. 2017. MyPlate Model.https://www.choosemyplate.gov/MyPlate. Archived at:http://www.webcitation.org/6tbsR3lvJ

10. PapaJohn’sPizza.2017.PapaJohn’sPizza.https://www.papajohns.com/.Archivedathttp://www.webcitation.org/6tbsSuImT

11. wiseGEEK. 2017. What does 200 Calories Look Like?http://www.wisegeek.com/what-does-200-calories-look-like.htm. Archived at:http://www.webcitation.org/6tbsWxbS1

12. Holleman F, Gale E. Diagnosis and Classification of diabetes mellitus (revisionnumber21).Diapedia.2014.doi:10.14496/dia.11040851106.21.

13. Almiron-RoigE,Solis-TrapalaI,DoddJ,JebbSA.Estimatingfoodportions.Influenceof unit number, meal type and energy density. Appetite. 2013;71:95-103.doi:10.1016/j.appet.2013.07.012.

14. Bailey RL, Mitchell DC, Miller C, Smiciklas-Wright H. Assessing the effect ofunderreportingenergyintakeondietarypatternsandweightstatus.JAmDietAssoc.2007;107(1):64-71.doi:10.1016/j.jada.2006.10.009

15. Bandini,LG,Schoeller,DA,Cyr,HN,Dietz,WH.Validityofreportedenergyintakeinobeseandnonobeseadolescents.AmJClinNutr.1990;52(3):421-425.

16. BatesD,MächlerM,BolkerB,WalkerS.FittingLinearMixed-EffectsModelsUsinglme4.JStatSoftw.2015;67(1).doi:10.18637/jss.v067.i01.

Page 21: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

17. Beijbom,Oscar,etal.“Menu-Match:Restaurant-SpecificFoodLoggingfromImages.”2015 IEEE Winter Conference on Applications of Computer Vision, 2015,doi:10.1109/wacv.2015.117.

18. BellD,FriedD,HuangfuL,SurdeanuM,KobourovS.Towardsusingsocialmediatoidentifyindividualsatriskforpreventablechronicillness.ProceedingsoftheTenthInternationalConferenceonLanguageResourcesandEvaluation(LREC2016);2016May23-28;Portorož,Slovenia;2016.

19. Beltran A, Dadabhoy H, Chen TA, Lin C, Jia W, Baranowski J, Yan G, Sun M,BaranowskiT.AdaptingtheeButtontotheabilitiesofchildrenfordietassessment.ProceedingsofMeasuringBehavior2016;2016May25-27;Dublin,Ireland;2016.

20. Bettadapura V, Thomaz E, Parnami A, Abowd GD, Essa I. Leveraging Context toSupportAutomatedFoodRecognitioninRestaurants.2015IEEEWinterConferenceonApplicationsofComputerVision.2015.doi:10.1109/wacv.2015.83.

21. Blake,AJ,Guthrie,HA,Smiciklas-Wright,H.Accuracyof foodportionestimationbyoverweightandnormal-weightsubjects.JAmDietAssoc.1989;89(7):962-964.

22. Brabham,DC.Crowdsourcingasamodel forproblemsolving:An introductionandcases.Convergence.2008;14(1):75-90.

23. CarelsRA,KonradK,HarperJ.Individualdifferencesinfoodperceptionsandcalorieestimation: An examination of dieting status, weight, and gender. Appetite.2007;49(2):450-458.doi:10.1016/j.appet.2007.02.009.

24. Chae J,Woo I,KimS,MaciejewskiR,ZhuF,DelpEJ,BousheyCJ,EbertDS.Volumeestimation using food specific shape templates in mobile image-based dietaryassessment. Proc SPIE Int Soc Opt Eng. 2011 Feb 7;7873:78730K. doi:10.1117/12.876669.

25. Chandon P, Wansink B. Is Obesity Caused by Calorie Underestimation? APsychophysical Model of Meal Size Estimation. J Mark Res. 2007;44(1):84-99.doi:10.1509/jmkr.44.1.84.

26. Csurka G, Perronnin F. Fisher vectors: Beyond bag-of-visual-words imagerepresentations. Communications in Computer and Information Science ComputerVision, Imaging and Computer Graphics Theory and Applications. 2011;28-42.doi:10.1007/978-3-642-25382-9_2.

27. Dalal N, Triggs B. Histograms of oriented gradients for human detection.Proceedingsofthe2005IEEEComputerSocietyConferenceonComputerVisionandPatternRecognition(CVPR'05);2005Jun20-26;1:886-893.

28. Duffey KJ, Popkin BM. Energy density, portion size, and eating occasions:ContributionstoincreasedenergyintakeintheUnitedStates,1977–2006.PLoSMed.2011;8(6):e1001050.doi:10.1371/journal.pmed.1001050.

29. Ello-Martin,JA,Ledikwe,JH,Rolls,BJ.Theinfluenceoffoodportionsizeandenergydensity on energy intake: Implications for weight management. Am J Clin Nutr2005;82(1):236S-241S.

30. Faulkner GP, Pourshahidi LK, Wallace JMW, Kerr MA, Mccaffrey TA, LivingstoneMBE. Perceived ‘healthiness’ of foods can influence consumers’ estimations ofenergy density and appropriate portion size. Int J Obes 2013;38(1):106-112.doi:10.1038/ijo.2013.69.

31. Food andDrug Administration, HHS. Food labeling; nutrition labeling of standardmenuitemsinrestaurantsandsimilarretailfoodestablishments.Finalrule.FedReg2014;79(230):71155.

32. Gordon K. Group judgments in the field of lifted weights. J Exp Psychol.1924;7(5):398-400.doi:10.1037/h0074666.

Page 22: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

33. He Y, Xu C, Khanna N, Boushey CJ, Delp EJ. Food image analysis: Segmentation,identification and weight estimation. 2013 IEEE International Conference onMultimediaandExpo(ICME).2013.doi:10.1109/icme.2013.6607548.

34. HernándezT,WilderL,KuehnD,RubotzkyK,Moser-VeillonP,GodwinS,ThompsonC,WangC.Portionsizeestimationandexpectationofaccuracy. JFoodCompAnal2006;19:S14-S21.

35. Huang TT-K, Roberts SB, Howarth NC, Mccrory MA. Effect of screening outimplausibleenergyintakereportsonrelationshipsbetweendietandBMI.ObesRes2005;13(7):1205-1217.doi:10.1038/oby.2005.143.

36. KagayaH,AizawaK.Highlyaccuratefood/non-foodimageclassificationbasedonadeepconvolutionalneuralnetwork.InternationalConferenceonImageAnalysisandProcessing; 2015 Sep 7; Genova, Italy. 2015:350-357. doi:10.1007/978-3-319-23222-5_43.

37. KagayaH,AizawaK,OgawaM.FoodDetectionandRecognitionUsingConvolutionalNeuralNetwork.ProceedingsoftheACMInternationalConferenceonMultimedia-MM14.2014.doi:10.1145/2647868.2654970.

38. KawanoY,YanaiK.FoodCam:Areal-timefoodrecognitionsystemonasmartphone.MultimedToolsAppl2014;74(14):5263-5287.doi:10.1007/s11042-014-2000-8

39. KitamuraK,YamasakiT,AizawaK.Foodlogbyanalyzingfoodimages.Proceedingofthe 16th ACM international conference on Multimedia - MM 08. 2008.doi:10.1145/1459359.1459548.

40. Kitamura K, Yamasaki T, AizawaK. FoodLog. Proceedings of the ACMmultimedia2009 workshop on Multimedia for cooking and eating activities - CEA 09. 2009.doi:10.1145/1630995.1631001.

41. KobourovSG,PupyrevS,SimonettoP.Visualizinggraphsasmapswithcontiguousregions.EurographicsConferenceonVisualization(EuroVis).2014.

42. Kretsch MJ, Fong AK, Green MW. Behavioral and Body Size Correlates of EnergyIntake Underreporting by Obese and Normal-weight Women. J Amer Diet Assoc1999;99(3):300-306.doi:10.1016/s0002-8223(99)00078-4.

43. KuznetsovaA, BrockhoPB, ChristensenRHB. lmerTest: Tests in linearmix effectsmodels.2016.

44. LoweD. Object recognition from local scale-invariant features. Proceedings of theSeventh IEEE International Conference on Computer Vision. 1999.doi:10.1109/iccv.1999.790410.

45. Mamykina L, Smyth TN, Dimond JP, Gajos KZ. Learning from the crowd:Observationallearningincrowdsourcingcommunities.Proceedingsofthe2016CHIConference on Human Factors in Computing Systems - CHI 16. 2016.doi:10.1145/2858036.2858560.

46. MartinCK,HanH,CoulonSM,AllenHR,ChampagneCM,AntonSD.Anovelmethodtoremotelymeasure food intakeof free-living individuals inreal time: theremotefood photography method. Br J Nutr 2008;101(03):446.doi:10.1017/s0007114508027438.

47. Meister K, Doyle ME. 2009. Obesity and Food Technology. American Council onScience and Health. https://www.scribd.com/document/37170221/. Archived athttp://www.webcitation.org/6tfsQIMLD.

48. MyersA, JohnstonN,RathodV, et al. Im2Calories: Towards anAutomatedMobileVisionFoodDiary.2015IEEEInternationalConferenceonComputerVision(ICCV).2015.doi:10.1109/iccv.2015.146.

Page 23: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

49. NoronhaJ,HysenE,ZhangH,GajosKZ.Platemate.Proceedingsof the24thAnnualACM symposium on User interface software and technology - UIST 11. 2011.doi:10.1145/2047196.2047198.

50. Pérez-EscamillaR,ObbagyJE,AltmanJM,EsseryEV,McGraneMM,WongYP,SpahnJM,WilliamsCL.Dietaryenergydensityandbodyweight inadultsandchildren:asystematicreview.JAcadNutrDiet2012;112(5):671-84.

51. Reddit. 2011. Who in the World is reddit? Results are in....https://redditblog.com/2011/09/12/who-in-the-world-is-reddit-results-are-in/.Archivedat:http://www.webcitation.org/6tbvF5LJE.

52. RotherC,KolmogorovV,BlakeA.Grabcut: Interactive foregroundextractionusingiterated graph cuts. ACM SIGGRAPH 2004 Papers on - SIGGRAPH 04. 2004.doi:10.1145/1186562.1015720.

53. SchoellerDA,BandiniLG,DietzWH. Inaccuracies in self-reported intake identifiedby comparison with the doubly labelled water method. Can J Phys Pharmacol1990;68(7):941-949.doi:10.1139/y90-143.

54. Surowiecki J.Thewisdomofcrowds:Whythemanyaresmarter thanthe fewandhow collectivewisdom shapes business, economies, societies and nations. 1st ed.NewYork:Doubleday;2004.ISBN:978-0-385-50386-0.

55. Tapsell LC, Neale EP, Satija A, Hu FB. Foods, nutrients, and dietary patterns:Interconnectionsand implications fordietaryguidelines.AdvNutr2016;7(3):445-454.doi:10.3945/an.115.011718.

56. WansinkB, ChandonP.Meal size, not body size, explains errors in estimating thecalorie content of meals. Ann Intern Med 2006;145(5):326. doi:10.7326/0003-4819-145-5-200609050-00005.

57. WatsonGB.Dogroupsthinkmoreefficientlythanindividuals?JAbnormSocPsychol.1928;23(3):328-336.doi:10.1037/h0072661.

58. Williamson DA, Allen HR, Martin PD, Alfonso A, Gerald B, Hunt A. Digitalphotography: A new method for estimating food intake in cafeteria settings. EatWeightDisord.2004;9(1):24-28.doi:10.1007/bf03325041.

59. Reddit. 2017. /r/SampleSize: Where your opinions actually matter!https://www.reddit.com/r/SampleSize/. Archived at:https://www.webcitation.org/6tg5V8wTB

60. Evernote. 2017. Get organized. Work smarter. Remember everything.http://evernote.com.Archivedat:http://www.webcitation.org/6udUVuy9M

61. Eloquent Labs. 2017. Eloquent Labs. http://eloquent.ai. Archivedathttp://www.webcitation.org/6uhsMCtST

62. Galton F. Vox populi (The wisdom of crowds). Nature 1907;75(7):450-451.doi:10.1038/075450a0.

63. Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR,Aderhold A, Bonneau R, Chen Y, Collins JJ. Wisdom of crowds for robust genenetworkinference.Natmethods2002;9(8):796-806.doi:10.1038/nmeth.2016.

64. Yi SKM, Steyvers M, Lee MD, Dry MJ. The wisdom of the crowd in combinatorialproblems.Cognsci2012;36(3):452-470.doi:10.1111/j.1551-6709.2011.01223.x.

65. Carels RA, Harper J, Konrad K. Qualitative perceptions and caloric estimations ofhealthy and unhealthy foods by behavioral weight loss participants. Appetite2006;46(2)199-206.doi:10.1016/j.appet.2005.12.002.

66. Block JP,CondonSK,KleinmanK,Mullen J,LinakisS,Rifas-ShimanS,GillmanMW.Consumers’ estimation of calorie content at fast food restaurants: cross sectionalobservationalstudy.BMJ2013;346:f2907.doi:10.1136/bmj.f2907.

Page 24: A Study of Calorie Estimation in Pictures of Foodclulab.cs.arizona.edu/papers/Zhou2018.pdf · A Study of Calorie Estimation in Pictures of Food Jun Zhou1, Dane Bell2, Sabrina Nusrat3,

67. Brown RE, Canning KL, Fung M, Jiandani D, Riddell MC, Macpherson AK, Kuk JL.CalorieEstimationinAdultsDifferinginBodyWeightClassandWeightLossStatus.MedSciSportsExerc2016;48(3):521-526.doi:10.1249/MSS.0000000000000796

68. HernándezT,WilderL,KuehnD,RubotzkyK,Moser-VeillonP,GodwinS,ThompsonC,WangC.Portionsizeestimationandexpectationofaccuracy.JFoodComposAnal2006;19:S14-S21.doi:10.1016/j.jfca.2006.02.010

69. RainsSA,HingleMD,SurdeanuM,BellD,KobourovS.ATestofTheRiskPerceptionAttitudeFrameworkasaMessageTailoringStrategytoPromoteDiabetesScreening.HealthComm2018;0(0):1-8.doi:10.1080/10410236.2018.1431024