Page 1: YOLO: You Only Look Onceweb.cs.ucdavis.edu/~yjlee/teaching/ecs289g-winter2018/YOLO.pdf · It outperforms methods like DPM and R-CNN when generalizing to person detection in artwork

YOLO: You Only Look Once

Unified Real-Time Object Detection

Presenters: Liyang Zhong, Quan Zou

Outline

1. Review: R-CNN

2. YOLO:

-- Detection Procedure

-- Network Design

-- Training Part

-- Experiments

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Proposal + Classification

Shortcomings:

1. Slow, impossible for real-time detection

2. Hard to optimize

WHAT’S NEW

Regression

YOLO Features:

1. Extremely fast (45 frames per second)

2. Reasons globally about the entire image

3. Learns generalizable representations

Detection Procedure

https://docs.google.com/presentation/d/1kAa7NOamBt4calBU9iHgT8a86RRHz9Yz2oh4-GTdX6M/edit#slide=id.g151008b386_0_44

We split the image into an S*S grid

7*7 grid

Each cell predicts B boxes (x, y, w, h) and a confidence for each box: P(Object)

Each box predicts:

P(Object): probability that the box contains an object

B = 2

Each cell predicts boxes and confidences: P(Object)

Each cell also predicts class probabilities.

[Figure labels: Dog, Bicycle, Car, Dining Table]

Conditioned on object: P(Car | Object)

E.g.: Dog = 0.8, Cat = 0, Bike = 0

Then we combine the box and class predictions.

P(class | Object) * P(Object) = P(class)
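The product above can be sketched for a single cell as follows. This is a minimal illustration, not the presenters' code: the function name `class_scores` and the confidence values are hypothetical, and the class probabilities reuse the example values from the slide (Dog = 0.8, Cat = 0, Bike = 0).

```python
def class_scores(box_confidences, class_probs):
    """Multiply each box's P(Object) by the cell's P(class | Object).

    Returns one dict of per-class scores for every box in the cell.
    """
    return [
        {name: conf * p for name, p in class_probs.items()}
        for conf in box_confidences
    ]


# Class probabilities from the slide's example; box confidences are made up.
cell_class_probs = {"Dog": 0.8, "Cat": 0.0, "Bike": 0.0}
cell_box_confidences = [0.9, 0.1]  # B = 2 boxes in this cell

scores = class_scores(cell_box_confidences, cell_class_probs)
for i, per_class in enumerate(scores):
    print(f"box {i}: {per_class}")
```

Each box therefore carries one score per class, which is what the later thresholding step operates on.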

Finally, we threshold the detections and apply non-maximum suppression (NMS)
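A minimal sketch of this last step, assuming greedy NMS applied to the detections of one class. The corner box format (x1, y1, x2, y2), the function names, and both threshold values are assumptions chosen for illustration; the slides do not specify them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def threshold_and_nms(detections, score_threshold=0.2, iou_threshold=0.5):
    """Keep high-scoring boxes and drop those overlapping a better box.

    detections: list of (score, box) pairs for a single class.
    The default thresholds are illustrative, not values from the slides.
    """
    candidates = sorted(
        (d for d in detections if d[0] >= score_threshold),
        key=lambda d: d[0],
        reverse=True,
    )
    kept = []
    for score, box in candidates:
        # Keep the box only if it does not overlap any already-kept box too much.
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


dets = [
    (0.9, (10, 10, 50, 50)),
    (0.8, (12, 12, 52, 52)),      # heavy overlap with the first box
    (0.7, (100, 100, 140, 140)),  # separate object
    (0.05, (0, 0, 5, 5)),         # below the score threshold
]
kept = threshold_and_nms(dets)
print(kept)
```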

S * S * (B * 5 + C) tensor

https://docs.google.com/presentation/d/1kAa7NOamBt4calBU9iHgT8a86RRHz9Yz2oh4-GTdX6M/edit#slide=id.g151008b386_0_44
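As a rough illustration of the S * S * (B * 5 + C) output size, the sketch below plugs in S = 7 and B = 2 from the slides and assumes C = 20, the number of PASCAL VOC classes. The ordering of values inside a cell (boxes first, then class probabilities) and the helper name `split_cell` are assumptions made only for this example; a real implementation may lay the values out differently.

```python
S, B, C = 7, 2, 20  # S and B from the slides; C = 20 assumes PASCAL VOC

values_per_cell = B * 5 + C  # each box has x, y, w, h and a confidence
output_shape = (S, S, values_per_cell)
print(output_shape)


def split_cell(cell_vector, num_boxes=B, num_classes=C):
    """Split one cell's flat vector into boxes and class probabilities.

    Assumed layout: B groups of (x, y, w, h, confidence), then C class values.
    """
    boxes = [tuple(cell_vector[i * 5:(i + 1) * 5]) for i in range(num_boxes)]
    start = num_boxes * 5
    class_probs = list(cell_vector[start:start + num_classes])
    return boxes, class_probs


# Dummy cell vector whose values are just their own indices.
cell = [float(i) for i in range(values_per_cell)]
boxes, class_probs = split_cell(cell)
```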

Network

[Figure: network architecture, annotated "pretrain" and "stride = 2"]

https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote

Train

During training, match example to the right cell

https://docs.google.com/presentation/d/1kAa7NOamBt4calBU9iHgT8a86RRHz9Yz2oh4-GTdX6M/edit#slide=id.g151008b386_0_44
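One way to read "the right cell" is the grid cell that contains the center of the ground-truth box, which is how the YOLO paper assigns responsibility. The sketch below follows that reading; the function name, the pixel coordinates, and the 448 x 448 image size are hypothetical values for illustration.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Find the grid cell containing a box center.

    cx, cy are the center in pixels. Returns (row, col, x_offset, y_offset),
    where the offsets are the center position within the cell, in [0, 1).
    """
    gx = cx / img_w * S  # center in grid units along x
    gy = cy / img_h * S  # center in grid units along y
    col = min(int(gx), S - 1)  # clamp so a center on the far edge stays in-grid
    row = min(int(gy), S - 1)
    return row, col, gx - col, gy - row


row, col, x_off, y_off = responsible_cell(cx=224, cy=100, img_w=448, img_h=448)
print(row, col, x_off, y_off)
```

Dividing by the image size before scaling to the grid is what makes the coordinates independent of resolution.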

Dog = 1, Cat = 0, Bike = 0, ...

Adjust that cell’s class prediction

Look at that cell’s predicted boxes

Find the best one, adjust it, increase the confidence
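The slide does not say how "the best one" is chosen; in the YOLO paper it is the predicted box with the highest IOU against the ground truth. A small sketch under that assumption, with boxes written as (center x, center y, width, height) in normalized units and all values invented for the example:

```python
def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou_center(a, b):
    """IOU of two boxes given in (cx, cy, w, h) form."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def best_box_index(predicted_boxes, truth_box):
    """Return the index of the prediction with the highest IOU, plus all IOUs."""
    ious = [iou_center(p, truth_box) for p in predicted_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    return best, ious


truth = (0.5, 0.5, 0.4, 0.4)
preds = [(0.5, 0.5, 0.2, 0.8), (0.52, 0.5, 0.4, 0.4)]  # B = 2 predictions
best, ious = best_box_index(preds, truth)
print(best, ious)
```

The selected box is the one whose coordinates and confidence are pushed toward the ground truth.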

Decrease the confidence of the other box

Some cells don’t have any ground truth detections!

Decrease the confidence of these boxes

Don’t adjust the class probabilities or coordinates

https://www.slideshare.net/TaegyunJeon1/pr12-you-only-look-once-yolo-unified-realtime-object-detection?from_action=save

https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote

Experiments

• Datasets

• PASCAL VOC 2007 & VOC 2012

Accurate object detection is slow!

Method          Pascal 2007 mAP   Speed                    Distance annotation
DPM v5          33.7              .07 FPS   14 s/img
R-CNN           66.0              .05 FPS   20 s/img       ⅓ mile, 1760 feet
Fast R-CNN      70.0              .5 FPS    2 s/img        176 feet
Faster R-CNN    73.2              7 FPS     140 ms/img     12 feet, 8 feet
YOLO            63.4              45 FPS    22 ms/img      2 feet

Ref: https://pjreddie.com/publications/
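The speed figures can be checked with a little arithmetic. The distance annotations on the slides (1760 feet alongside 20 s/img, 176 feet alongside 2 s/img) are consistent with something moving at 88 feet per second, which is 60 mph. That speed is inferred from the numbers and is not stated in the slides, so treat it as an assumption in the sketch below.

```python
# Assumed speed: 88 ft/s (60 mph), inferred from 1760 feet / 20 s.
FEET_PER_SECOND = 88.0


def seconds_per_image(fps):
    """Processing time for one image at a given frame rate."""
    return 1.0 / fps


def feet_travelled(seconds):
    """Distance covered at the assumed speed during that time."""
    return FEET_PER_SECOND * seconds


# Frame rates as listed in the slide table.
speeds_fps = {
    "DPM v5": 0.07,
    "R-CNN": 0.05,
    "Fast R-CNN": 0.5,
    "Faster R-CNN": 7,
    "YOLO": 45,
}

for name, fps in speeds_fps.items():
    t = seconds_per_image(fps)
    print(f"{name}: {t:.3f} s/img, about {feet_travelled(t):.1f} feet")
```

The slide's time figures appear to be rounded; for instance, 1/7 s is about 143 ms rather than 140 ms.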

Error Analysis

Loc: Localization Error: correct class, .1 < IOU < .5

Background: IOU < 0.1
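The two error categories above can be written as a small classifier. The "localization" and "background" rules come from the slide; the "correct" rule (correct class with IOU above .5) and the catch-all "other" label are assumptions added so that every detection gets a label, and the function name is hypothetical.

```python
def error_type(correct_class, iou):
    """Label a detection using the IOU bands from the error analysis slide."""
    if iou < 0.1:
        return "background"       # slide: IOU < 0.1
    if correct_class and iou > 0.5:
        return "correct"          # assumed: not listed on the slide
    if correct_class and 0.1 < iou < 0.5:
        return "localization"     # slide: correct class, .1 < IOU < .5
    return "other"                # assumed catch-all for everything else


examples = [(True, 0.8), (True, 0.3), (False, 0.05), (False, 0.6)]
for correct_class, value in examples:
    print(correct_class, value, error_type(correct_class, value))
```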

YOLO generalizes well to new domains (like art)

Ref: https://pjreddie.com/publications/

It outperforms methods like DPM and R-CNN when generalizing to person detection in artwork

S. Ginosar, D. Haas, T. Brown, and J. Malik. Detecting people in cubist art. In Computer Vision-ECCV 2014 Workshops, pages 101–116. Springer, 2014.

H. Cai, Q. Wu, T. Corradi, and P. Hall. The cross-depiction problem: Computer vision algorithms for recognising objects in artwork and in photographs.

Demo: https://youtu.be/VOC3huqHrss

Strengths and Weaknesses

● Strengths:

○ Fast: 45 fps, smaller version 155 fps

○ End-to-end training

○ Background error is low

● Weaknesses:

○ Accuracy is lower than the state of the art

○ Makes more localization errors

Open Questions

● How should the number of cells, the number of bounding boxes, and the size of the boxes be determined?

● Why normalize x, y, w, h even though all the input images have the same resolution?

Extension Part

YOLOv2!

arXiv:1612.08242