multi-task learning of dish detection and calorie estimation · dish detection and calorie...

1
P Multi-task Learning of Dish Detection and Calorie Estimation Takumi Ege and Keiji Yanai The University of Electro-Communications, Tokyo Background Method [1] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. E. Reed. SSD: single shot multibox detector. CoRR, abs/1512.02325, 2015. [2] T. Ege and K. Yanai. Simultaneous estimation of food categories and calories with multi-task cnn. In Proc. of IAPR International Conference on Machine Vision Applications(MVA), 2017. [3] Y. Matsuda, H. Hajime, and K. Yanai. Recognition of multiple-food images by detecting candidate regions. In Proc. of IEEE International Conference on Multimedia and Expo, 2012. [4] K. Simonyan and A Zisserman. Very deep convolutional networks for large-scale image recognition. In arXiv preprint arXiv:1409.1556, 2014. Result Model mAP (%) Rel. err. Abs. err. Corr. 20%err. Detection (uecfood100) 34.1 --- --- --- --- Detection (uecfood100+calorie50) (a) 37.8 --- --- --- --- Calorie estimation (calorie50) (b) --- 27.1 91.8 80.7 50.9 Sequential model ((a)→(b)) --- 27.3 92.5 80.5 50.6 Detection + Calorie estimation (Ours) 37.7 26.6 89.4 81.0 50.7 Comparison of single-task and multi-task learning Conclusion Future work P Experiment Baseline (Single-task learning) UEC Food-100 dataset [3] with bounding boxes. Single-task learning of dish detection task with UECFood-100 dataset [3]. (a) Single-task learning of calorie estimation task with Calorie-50 dataset. (b) Sequential model ((a)→(b)) that detects each dish in a food image by (a) and the food calorie of each dish image based on detected results are estimated by (b). Our method (Multi-task learning) Examples of dish detection and food calorie estimation from multiple-dish food photos. (based on YOLOv2) We estimate food calories from multiple-dishes food photos. Multi-task learning of food detection and food calorie estimation. Image-based food calorie estimation based on amount of food. Construction of large-scale food photos dataset. Some meal recording app can estimate food calories. But they … Need user’s manual input of food categories and volumes. Estimate food calories for each dish one by one. Are paid service to hire nutritionists who estimate food calories. Grilled fish 221kcal Rice 168kcal Salad 35kcal Miso soup 60kcal Tofu 42kcal Purpose : Image-based food calorie estimation Multi-task learning of dish detection and food calorie estimation Conv layers ( Detection Network ) (x,y,w,h) (class) Bbox 1 Bbox B 1,2,3 C-1,C (calorie) Output feature map ( Ours ) High-speed and highly accurate CNN-based detection system. End-to-end learning of the whole system is possible. The architecture of network of SSD ([1]). 1. SSD : Wei Liu et. al. [1] 2016 ( Detection Net ) Simultaneous estimation of bounding boxes of food dishes and their calories. The output channels of estimated calories are added to the output features. We use the network for single-dish detection from multiple-dish photos. Image-based food calorie estimation with CNN. Regression based-method. Output food calories directly from single-dish food photos. We denote as an relative error and as a absolute error, is defined as follows: 2. Single-task CNN : Ege and Yanai [2] 2017 ( Calorie estimation Net ) Conv layers ( VGG16[4] ) 335 kcal Food calorie estimation fc6 fc8c Food calorie for one person. Single-dish Food Photo. = = is an estimated food calorie. is ground-truth. We use the network for food calorie estimation. Multi-task learning of dish detection and calorie estimation with both datasets. In training of UECFood-100 dataset, we use the detection loss. In training of Calorie-50 dataset, we use the calorie loss and detection loss. Calorie50 dataset with food calories. Pseudo-bounding boxes for Calorie50 dataset. Fried noodle 601 kcal(ES) Pilaf 495kcal(ES) Miso soup 102kcal(ES) Nikujaga 417kcal(ES) Potato salad 144kcal(ES) Curry 483kcal(ES) Potato salad 170kcal(ES) Gratin 456kcal(ES) Spaghetti 536kcal(ES) Fried noodle 539kcal(GT) Pilaf 475kcal(GT) Potato salad 169kcal(GT) Miso soup 74kcal(GT) Nikujaga 352kcal(GT) Gratin 450kcal(GT) Spaghetti 518kcal(GT) Curry 761kcal(GT) Potato salad 169kcal(GT) Potato salad 175 kcal(ES) Cold tofu 76kcal(ES) Stew 270kcal(ES) Gratin 482kcal(ES) Potato salad 147kcal(ES) Gratin 496kcal(ES) Miso soup 106kcal(ES) Cold tofu 89kcal(ES) Nikujaga 387kcal(ES) Nikujaga 450kcal(ES) Potato salad 169kcal(GT) Cold tofu 95kcal(GT) Nikujaga 352kcal(GT) Gratin 450kcal(GT) Stew 382kcal(GT) Cold tofu 95kcal(GT) Miso soup 74kcal(GT) Potato salad 169kcal(GT) Gratin 450kcal(GT) Nikujaga 352kcal(GT) Our method achieves high speed and save memory by simultaneous estimation in a single network. Our method achieves the calorie estimation from multiple-dish photos without degrading dish detection accuracy.

Upload: others

Post on 25-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multi-task Learning of Dish Detection and Calorie Estimation · Dish Detection and Calorie Estimation Takumi Ege and Keiji Yanai The University of Electro-Communications, Tokyo Background

P

Multi-task Learning of

Dish Detection and Calorie EstimationTakumi Ege and Keiji Yanai

The University of Electro-Communications, Tokyo

Background

Method

[1] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. E. Reed. SSD: single shot multibox detector. CoRR, abs/1512.02325, 2015.

[2] T. Ege and K. Yanai. Simultaneous estimation of food categories and calories with multi-task cnn. In Proc. of IAPR International Conference on Machine Vision Applications(MVA), 2017.

[3] Y. Matsuda, H. Hajime, and K. Yanai. Recognition of multiple-food images by detecting candidate regions. In Proc. of IEEE International Conference on Multimedia and Expo, 2012.

[4] K. Simonyan and A Zisserman. Very deep convolutional networks for large-scale image recognition. In arXiv preprint arXiv:1409.1556, 2014.

Result

Model mAP

(%)

Rel.

err.

Abs.

err.

Corr. ≤20%err.

Detection (uecfood100) 34.1 --- --- --- ---

Detection (uecfood100+calorie50) (a) 37.8 --- --- --- ---

Calorie estimation (calorie50) (b) --- 27.1 91.8 80.7 50.9

Sequential model ((a)→(b)) --- 27.3 92.5 80.5 50.6

Detection + Calorie estimation (Ours) 37.7 26.6 89.4 81.0 50.7

Comparison of single-task and multi-task learning

Conclusion

Future work

P

Experiment Baseline (Single-task learning)

UEC Food-100 dataset [3]with bounding boxes.

• Single-task learning of dish detection task with UECFood-100 dataset [3]. (a)

• Single-task learning of calorie estimation task with Calorie-50 dataset. (b)

• Sequential model ((a)→(b)) that detects each dish in a food image by (a) and

the food calorie of each dish image based on detected results are estimated by (b).

Our method (Multi-task learning)

Examples of dish detection and food calorie estimation

from multiple-dish food photos. (based on YOLOv2)

We estimate food calories from multiple-dishes food photos.

Multi-task learning of food detection and food calorie estimation.

Image-based food calorie estimation based on amount of food.

Construction of large-scale food photos dataset.

Some meal recording app can estimate food calories.

But they …

• Need user’s manual input of food categories and volumes.

• Estimate food calories for each dish one by one.

• Are paid service to hire nutritionists who estimate food calories.

Grilled fish

221kcal

Rice

168kcal

Salad 35kcal

Miso soup 60kcal

Tofu 42kcal

Purpose : Image-based food calorie estimation

Multi-task learning of dish detection and food calorie estimation

Conv layers ( Detection Network )

(x,y,w,h) (class)

Bbox 1 Bbox B

1,2,3 C-1,C

(calorie)

Output feature map ( Ours )

• High-speed and highly accurate CNN-based detection system.

• End-to-end learning of the whole system is possible.

The architecture of network of SSD ([1]).

1. SSD : Wei Liu et. al. [1] 2016 ( Detection Net )

• Simultaneous estimation of bounding boxes of food dishes and their calories.

• The output channels of estimated calories are added to the output features.

We use the network for single-dish detection from multiple-dish photos.

• Image-based food calorie estimation with CNN.

• Regression based-method.

• Output food calories directly from single-dish food photos.

• We denote 𝑳𝒓𝒆 as an relative error and 𝑳𝒂𝒃 as a absolute error, 𝑳𝒄𝒂𝒍 is

defined as follows:

2. Single-task CNN : Ege and Yanai [2] 2017 ( Calorie estimation Net )

Conv layers ( VGG16[4] )

335 kcal

Food calorie estimation

fc6 fc8c

Food calorie for one person.Single-dish Food Photo.

𝑳𝒄𝒂𝒍 = λ𝑟𝑒𝐿𝑟𝑒 + λ𝑎𝑏𝐿𝑎𝑏

𝑳𝒓𝒆 =𝑦 − 𝑔

𝑔𝐿𝑎𝑏 = 𝑦 − 𝑔

𝑦 is an estimated food calorie.

𝑔 is ground-truth.

We use the network for food calorie estimation.

• Multi-task learning of dish detection and calorie estimation with both datasets.

In training of UECFood-100 dataset, we use the detection loss.

In training of Calorie-50 dataset, we use the calorie loss and detection loss.

Calorie50 dataset

with food calories.

• Pseudo-bounding boxes

for Calorie50 dataset.

Fried noodle 601 kcal(ES)

Pilaf 495kcal(ES)Miso soup 102kcal(ES)

Nikujaga 417kcal(ES)

Potato salad 144kcal(ES)

Curry 483kcal(ES)

Potato salad 170kcal(ES)

Gratin 456kcal(ES)

Spaghetti 536kcal(ES)

Fried noodle 539kcal(GT)Pilaf 475kcal(GT)

Potato salad 169kcal(GT)Miso soup 74kcal(GT)Nikujaga 352kcal(GT)

Gratin 450kcal(GT)Spaghetti 518kcal(GT)

Curry 761kcal(GT)Potato salad 169kcal(GT)

Potato salad 175 kcal(ES)

Cold tofu 76kcal(ES) Stew 270kcal(ES)

Gratin 482kcal(ES)

Potato salad 147kcal(ES)

Gratin 496kcal(ES)

Miso soup 106kcal(ES)

Cold tofu 89kcal(ES)

Nikujaga 387kcal(ES)

Nikujaga 450kcal(ES)

Potato salad 169kcal(GT)Cold tofu 95kcal(GT)Nikujaga 352kcal(GT)

Gratin 450kcal(GT)Stew 382kcal(GT)

Cold tofu 95kcal(GT)Miso soup 74kcal(GT)

Potato salad 169kcal(GT)Gratin 450kcal(GT)Nikujaga 352kcal(GT)

Our method achieves high speed and save memory by

simultaneous estimation in a single network.

Our method achieves the calorie estimation from

multiple-dish photos without degrading dish detection accuracy.