Classification and Semantic Segmentation Most slides are from Fei-Fei Li, Justin Johnson, Andrej Karpathy, Serena Yeung, Jia-Bin Huang, Bharath Hariharan, Jeremy Jordan

Post on 07-Jul-2020


Page 1: Classification and Semantic Segmentationyboykov/Courses/cs898/Lectures/lec5... · Semantic Segmentation Most slides are from Fei-Fei Li, Justin Johnson, Andrej Karpathy, SerenaYeung,

Classification and Semantic Segmentation

Most slides are from Fei-Fei Li, Justin Johnson, Andrej Karpathy, Serena Yeung, Jia-Bin Huang, Bharath Hariharan, Jeremy Jordan

Page 2:

Supervised Machine Learning

▪ Training data $x_1, \dots, x_N$ with true labels (targets) $y_1, \dots, y_N$
▪ Choose hypothesis class $h(x, W)$
▪ Define loss function for $x$ when the true label is $y$
  ▪ e.g. $L(h(x, W), y) = (y - h(x, W))^2$
▪ Training stage
  ▪ minimize total loss on training set using gradient descent:
    $\min_W \sum_{i=1}^{N} L(h(x_i, W), y_i)$
▪ Test stage
  ▪ compute accuracy on test data, unseen during training
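The training stage can be sketched in a few lines. This is a minimal illustration only: it assumes a scalar linear hypothesis h(x, W) = W·x, the squared loss from the slide, and a toy data set and learning rate chosen by hand; none of these choices come from the lecture.

```python
# Toy training set (hypothetical): targets follow y = 3x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def h(x, W):
    """Hypothesis: a scalar linear model (an assumption for this sketch)."""
    return W * x

def total_loss(W):
    """Sum over the training set of the squared loss (y - h(x, W))^2."""
    return sum((y - h(x, W)) ** 2 for x, y in zip(xs, ys))

def grad(W):
    """Derivative of total_loss with respect to W."""
    return sum(-2.0 * x * (y - h(x, W)) for x, y in zip(xs, ys))

W = 0.0
learning_rate = 0.01  # hand-picked step size
for _ in range(200):
    W -= learning_rate * grad(W)  # gradient descent update

print(W, total_loss(W))
```

On this toy problem W approaches 3, the slope that generated the targets. A test stage would evaluate the learned W on data not used in the loop.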

Page 4:

Single Layer Neural Network on Images

▪ 2 classes (cat vs dog)
▪ $h(x, W) = \sigma(Wx)$
  ▪ range in (0,1)

[Figure: image → $W$ → sigmoid → p(dog)]

▪ Also called Linear Classifier
▪ Works well only for linearly separable classes
  ▪ not expressive enough
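A minimal sketch of the single-layer classifier, assuming the image is flattened into a list of pixel values and that outputs above 0.5 are read as "dog". The 4-pixel input, the weights, and the 0.5 threshold are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_dog(x, W):
    """h(x, W) = sigmoid(W . x) for a flattened image x."""
    score = sum(w_i * x_i for w_i, x_i in zip(W, x))
    return sigmoid(score)

# Hypothetical 4-pixel "image" and weights, for illustration only.
x = [0.2, 0.8, 0.5, 0.1]
W = [1.0, -2.0, 0.5, 3.0]
prob = p_dog(x, W)
label = "dog" if prob > 0.5 else "cat"
print(prob, label)
```

The decision boundary is the set of inputs where W·x = 0, a hyperplane, which is why this model only separates linearly separable classes.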

Page 5:

Single Layer NN for multiple classes

▪ Several classes (dog, cat, horse)

[Figure: softmax outputs p(horse), p(dog), p(cat)]

▪ One-hot encoding for labels $y$: horse = (1, 0, 0), dog = (0, 1, 0), cat = (0, 0, 1)
▪ Cross-entropy loss: $-\sum_{\text{classes}} y_{\text{true}} \log(y_{\text{pred}})$
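A small sketch of softmax followed by cross-entropy against a one-hot label. The class order (horse, dog, cat) follows the one-hot codes on the slide; the raw scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_pred):
    """-sum over classes of y_true * log(y_pred), with one-hot y_true."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

# Class order: (horse, dog, cat).
scores = [2.0, 1.0, 0.1]   # hypothetical outputs of Wx
y_pred = softmax(scores)
y_true = [1, 0, 0]         # true label: horse
loss = cross_entropy(y_true, y_pred)
print(y_pred, loss)
```

With a one-hot label the sum keeps a single term, so the loss is minus the log of the probability assigned to the true class.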

Page 6:

Multilayer Neural Network on images

▪ cat vs dog

[Figure: 256x256 image → Linear + ReLU → Linear + ReLU → Linear + sigmoid → p(dog); layer sizes labelled 2048, 1024, 32]

▪ Layers are called fully-connected
▪ Expressive enough, but huge number of parameters
  ▪ expensive, requires lots of data to train well
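To make "huge number of parameters" concrete, here is a rough count. The exact arrangement of layer widths is an assumption, since the figure did not survive extraction: a 256x256 single-channel input flattened to 65,536 values, followed by 2048, 1024 and 32 units and one sigmoid output.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out

# Assumed layer widths (see the note above); only the numbers 256x256,
# 2048, 1024 and 32 appear on the slide.
widths = [256 * 256, 2048, 1024, 32, 1]
per_layer = [fc_params(a, b) for a, b in zip(widths, widths[1:])]
total = sum(per_layer)
print(per_layer, total)
```

Under these assumptions the first layer alone holds more than 134 million parameters, which dominates the total.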

Page 7:

Reducing Number of Parameters

65,536

Page 8:

Idea 1: local connectivity

Pixels only related to nearby pixels

Page 9:

Idea 2: Translation invariance

Pixels only related to nearby pixels
Weights should not depend on the location of the neighborhood

Page 10:

Linear function + translation invariance = convolution

▪ Local connectivity determines kernel size

[Figure: 3x3 array of values]
5.4 0.1 3.6
1.8 2.3 4.5
1.1 3.4 7.2
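A minimal sketch of a 2D convolution in the CNN sense (no kernel flip, no padding). It treats the 3x3 values from the slide as the kernel, which is an assumption about what the figure showed; the 4x4 image of ones is hypothetical.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every location (i, j).
            out[i][j] = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
    return out

# 3x3 values from the slide, used here as the kernel (an assumption).
kernel = [[5.4, 0.1, 3.6],
          [1.8, 2.3, 4.5],
          [1.1, 3.4, 7.2]]
# Hypothetical 4x4 image of ones.
image = [[1.0] * 4 for _ in range(4)]
result = conv2d(image, kernel)
print(result)
```

The layer has only 9 weights regardless of image size, because the kernel size is fixed by local connectivity and the weights are shared across locations.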

Page 11:

Convolution over multiple channels

[Figure: each input channel is convolved (*) with its own kernel and the results are summed (+) into one output map]

Page 12:

CNN: Convolutional layer

[Figure: a convolution maps a w × h × c input to a w × h × c’ output]
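A sketch of a convolutional layer that maps an h × w × c input to an h × w × c’ output. Zero padding, the odd filter size n, and all concrete sizes are assumptions made for this illustration.

```python
def conv_layer(x, filters):
    """Map an h x w x c input to an h x w x c' output.

    x:       x[i][j][k], height h, width w, c channels
    filters: filters[f][a][b][k], c' filters of size n x n x c (n odd)
    Zero padding keeps the spatial size unchanged.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    n = len(filters[0])
    pad = n // 2
    out = [[[0.0] * len(filters) for _ in range(w)] for _ in range(h)]
    for f, filt in enumerate(filters):
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for a in range(n):
                    for b in range(n):
                        ii, jj = i + a - pad, j + b - pad
                        if 0 <= ii < h and 0 <= jj < w:
                            # Sum over all c input channels.
                            for k in range(c):
                                acc += filt[a][b][k] * x[ii][jj][k]
                out[i][j][f] = acc
    return out

h, w, c, c_out, n = 4, 5, 3, 2, 3  # hypothetical sizes
x = [[[1.0] * c for _ in range(w)] for _ in range(h)]
filters = [[[[1.0] * c for _ in range(n)] for _ in range(n)]
           for _ in range(c_out)]
y = conv_layer(x, filters)
num_weights = c_out * n * n * c
print(len(y), len(y[0]), len(y[0][0]), num_weights)
```

Each of the c’ filters spans all c input channels, so the weight count is c’ · n · n · c (plus biases, omitted here).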

Page 13:

CNN: Convolution, Subsampling, Convolution

▪ Subsampling can be implemented by applying convolution in strides
  ▪ every 2 (or 3, or 4, …) pixels
▪ number of features is usually increased after subsampling, to maintain expressiveness

[Figure label: subsampling]

▪ Convolution in earlier steps detects more local patterns, less resilient to distortion
▪ Convolution in later steps detects more global patterns, more resilient to distortion
▪ Subsampling allows capture of larger, more invariant patterns
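Subsampling by strided convolution can be sketched as evaluating the kernel only at every second position. The 8x8 image and the 2x2 averaging kernel are hypothetical.

```python
def conv2d_strided(image, kernel, stride):
    """'Valid' convolution evaluated only every `stride` pixels."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# Hypothetical 8x8 image whose pixel value is its column index.
image = [[float(j) for j in range(8)] for _ in range(8)]
kernel = [[0.25, 0.25], [0.25, 0.25]]  # 2x2 averaging kernel
dense = conv2d_strided(image, kernel, stride=1)
sub = conv2d_strided(image, kernel, stride=2)
print(len(dense), len(dense[0]), len(sub), len(sub[0]))
```

With stride 2 the output here is 4x4 instead of 7x7, and each strided output equals the dense output at the corresponding even position.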

Page 14:

Invariance to distortions: Pooling

▪ Each window reduced to one value
  ▪ with max or average

Page 15:

Invariance to distortions: Max Pooling

Input (6x6):
4 7 6 9 3 11
8 3 21 4 0 0
1 2 1 3 5 6
7 9 4 3 1 8
5 2 1 5 5 0
0 1 6 4 5 6

Output (3x3), max over each 2x2 window:
8 21 11
9 4 8
5 6 6

Page 16:

Invariance to distortions: Average Pooling

Input (6x6):
4 7 6 9 3 11
8 3 21 4 0 0
1 2 1 3 5 6
7 9 4 3 1 8
5 2 1 5 5 0
0 1 6 4 5 6

Output (3x3), average over each 2x2 window:
5.5 10 3.5
4.75 2.75 5
2 4 4

▪ Each pooling layer takes a collection of feature maps as input and produces a collection of feature maps as output
▪ Output feature maps are usually smaller in height and width
▪ Parameters: None
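The two pooling examples can be reproduced with a short sketch. The non-overlapping 2x2 windows are inferred from the numbers on the slides; the helper names are hypothetical.

```python
def pool2d(x, size, reduce_fn):
    """Reduce each non-overlapping size x size window to one value."""
    out = []
    for i in range(0, len(x) - size + 1, size):
        row = []
        for j in range(0, len(x[0]) - size + 1, size):
            window = [x[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

# The 6x6 input from the pooling slides.
x = [[4, 7, 6, 9, 3, 11],
     [8, 3, 21, 4, 0, 0],
     [1, 2, 1, 3, 5, 6],
     [7, 9, 4, 3, 1, 8],
     [5, 2, 1, 5, 5, 0],
     [0, 1, 6, 4, 5, 6]]

max_pooled = pool2d(x, 2, max)
avg_pooled = pool2d(x, 2, mean)
print(max_pooled)  # [[8, 21, 11], [9, 4, 8], [5, 6, 6]]
print(avg_pooled)  # [[5.5, 10.0, 3.5], [4.75, 2.75, 5.0], [2.0, 4.0, 4.0]]
```

Neither function has anything to learn: the reduction is fixed, which is why a pooling layer has no parameters.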

Page 17:

CNN: Pooling Layer

Page 18:

Convolutional networks

[Figure: image → convolutional and pooling layers → fully connected layers → "Horse"]

Page 19:

First Successful Classification CNN

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86.11 (1998): 2278-2324.

Page 20:

AlexNet - 2012

▪ Won ImageNet competition by a large margin

Page 21:

VGGNet 2014

▪ First simple widely used net
▪ Smaller filters and Deeper Network

Page 22:

ResNet 2015

▪ Many more layers
▪ Special skip connections for better training

Page 23:

Semantic Segmentation

[Figure: image with regions labelled person, grass, trees, motorbike, road]

Page 24:

Semantic Segmentation

Page 25:

Semantic Segmentation: One-hot encoding

Page 26:

Semantic Segmentation: Cross-Entropy Loss Function

$-\sum_{\text{classes}} y_{\text{true}} \log(y_{\text{pred}})$

▪ Pixelwise loss
▪ Added over all pixels
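A sketch of the per-pixel one-hot encoding and the cross-entropy summed over all pixels. The 1x2 label map, the 3 classes, and the predicted probabilities are hypothetical.

```python
import math

def one_hot(label, num_classes):
    """Encode a class index as a one-hot list."""
    return [1 if k == label else 0 for k in range(num_classes)]

def pixelwise_cross_entropy(labels, probs):
    """Sum over pixels of -sum over classes of y_true * log(y_pred)."""
    num_classes = len(probs[0][0])
    total = 0.0
    for i, row in enumerate(labels):
        for j, label in enumerate(row):
            y_true = one_hot(label, num_classes)
            total -= sum(t * math.log(p)
                         for t, p in zip(y_true, probs[i][j]) if t)
    return total

# Hypothetical 1x2 image with 3 classes.
labels = [[0, 2]]
probs = [[[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]]
loss = pixelwise_cross_entropy(labels, probs)
print(loss)
```

Each pixel contributes the same loss a whole image would contribute in classification, so segmentation can be read as one classification problem per pixel.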

Page 27:

Semantic Segmentation with CNNs

[Figure: input image of size h × w × 3]

Page 28:

Semantic Segmentation with CNNs

[Figure: feature map of size h/4 × w/4 × d]

Page 29:

Semantic Segmentation with CNNs

[Figure: feature map of size h/4 × w/4 × d]

Page 30:

Semantic Segmentation with CNNs

[Figure: feature map of size h/4 × w/4 × d]

$d$ good features for classifying top left ‘pixel’

Page 31:

Semantic Segmentation with CNNs

[Figure: h/4 × w/4 × $d$ feature map → convolve with $c$ filters of size 1x1 → h/4 × w/4 × $c$ output]

▪ Finally pass the $c$ features of each pixel through softmax
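A 1x1 convolution applies the same d → c linear map at every location, which the sketch below makes explicit. The 2x2 feature map, d = 3, c = 2 and the filter weights are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conv1x1(features, weights):
    """Apply c filters of size 1x1 to an H x W x d feature map.

    weights[k] holds the d weights of filter k, so every location gets
    the same d -> c linear map.
    """
    return [[[sum(w * v for w, v in zip(filt, pixel)) for filt in weights]
             for pixel in row]
            for row in features]

# Hypothetical sizes: a 2x2 feature map with d = 3 and c = 2 classes.
features = [[[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
            [[0.5, 0.5, 0.5], [2.0, 0.0, 0.0]]]
weights = [[1.0, 0.0, 1.0],   # filter for class 0
           [0.0, 2.0, 0.0]]   # filter for class 1
scores = conv1x1(features, weights)
probs = [[softmax(pixel) for pixel in row] for row in scores]
print(scores)
```

This is the single-layer multi-class classifier from earlier in the lecture, applied independently to the d features of each pixel.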

Page 32:

Semantic Segmentation with CNNs

▪ Pass image through convolution and subsampling layers
▪ Final convolution with #classes outputs
▪ Get scores for subsampled image
▪ Upsample back to original size

[Figure: segmentation output with labels person and bicycle]
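The last two steps (scores on the subsampled grid, then upsampling) can be sketched as follows. Nearest-neighbour upsampling is used only because it is the simplest option; the lecture does not say which method is meant here, and the score values are hypothetical.

```python
def upsample_nearest(scores, factor):
    """Repeat every location `factor` times along height and width."""
    out = []
    for row in scores:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def argmax_labels(scores):
    """Pick the highest-scoring class at each pixel."""
    return [[max(range(len(p)), key=lambda k: p[k]) for p in row]
            for row in scores]

# Hypothetical 1x2 score map with 2 classes (0 = person, 1 = bicycle).
coarse = [[[0.9, 0.1], [0.2, 0.8]]]
full = upsample_nearest(coarse, 4)  # undo a 4x subsampling
labels = argmax_labels(full)
print(len(full), len(full[0]))
```

Every 4x4 block of output pixels receives the same label, so object boundaries can only fall on the coarse grid.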

Page 33:

The Resolution Issue

▪ Problem: Need fine details!
▪ Shallower network/earlier layers?
  ▪ not very semantic

[Figure: feature visualizations; label: Horse]

Visualizations from: M. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. In ECCV 2014

Page 34:

The Resolution Issue

▪ Problem: Need fine details!
▪ Remove subsampling?
  ▪ Need many features per pixel
    ▪ expensive without subsampling
  ▪ Need large field of view for final features
    ▪ very deep network, expensive without subsampling

Page 35:

Solution 1: Image pyramids

Learning Hierarchical Features for Scene Labeling. Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun. In TPAMI, 2013

[Figure: image pyramid; axis labels: Higher resolution, Less context]

▪ Does not scale well to deep architectures

Page 36:

Solution 2: CNN+Conditional Random Fields

▪ Combine with CRF as post-processing
▪ “Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs”, Chen et al., ICLR 2015

Page 37:

Solution 2: CNN+Conditional Random Fields

[Figure: input → CNN → class probabilities → Full CRF (RNN) → final output]

◼ Combine with CRF in end-to-end trainable system
  ◼ mean field inference implemented as RNN
◼ Zheng et al., “Conditional Random Fields as Recurrent Neural Networks”, ICCV 2015

Page 38:

Solution 3: Learn to Upsample

◼ Encoder/decoder structure

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al., “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
Badrinarayanan et al., “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”, TPAMI 2017

Page 39:

Methods for Upsampling

Page 40:

Methods for Upsampling

Page 41:

Decoding using only Upsampling

Struggles to produce fine-grained segmentations.

From Long et al.: “Semantic segmentation faces an inherent tension between semantics and location: global information resolves what while local information resolves where... Combining fine layers and coarse layers lets the model make local predictions that respect global structure.”

Page 42:

Solution 4: Skip connections

upsample

Page 43:

Skip connections

Fully convolutional networks for semantic segmentation. Evan Shelhamer, Jon Long, Trevor Darrell. In CVPR 2015

[Figure: segmentation results without skip connections and with skip connections]
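One simple way to combine coarse and fine layers is to upsample the coarse class scores and add them to scores predicted from an earlier, higher-resolution layer. The sketch below assumes nearest-neighbour 2x upsampling and element-wise addition; both are simplifications chosen for illustration, and all values are hypothetical.

```python
def upsample2x(scores):
    """Nearest-neighbour 2x upsampling of an H x W x c score map."""
    out = []
    for row in scores:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fuse(coarse, fine):
    """Add upsampled coarse scores to fine-layer scores, element-wise."""
    up = upsample2x(coarse)
    return [[[a + b for a, b in zip(pu, pf)] for pu, pf in zip(ru, rf)]
            for ru, rf in zip(up, fine)]

# Hypothetical scores for 2 classes.
coarse = [[[2.0, 0.0]]]               # 1x1 map from a deep layer
fine = [[[0.0, 0.5], [0.0, 3.0]],     # 2x2 map from an earlier layer
        [[0.0, 0.5], [0.0, 0.5]]]
fused = fuse(coarse, fine)
print(fused)
```

In this toy case the coarse layer votes for class 0 everywhere, but the fine layer overrides it at one pixel, which is the kind of local correction skip connections are meant to allow.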

Page 44:

Solution 5: Dilation

▪ Need subsampling to allow convolutional layers to capture large regions with small filters
  ▪ can we do this without subsampling?

Page 45:

Solution 5: Dilation

▪ Need subsampling to allow convolutional layers to capture large regions with small filters
  ▪ can we do this without subsampling?

Fully convolutional networks for semantic segmentation. Evan Shelhamer, Jon Long, Trevor Darrell. In CVPR 2015
Multi-Scale Context Aggregation by Dilated Convolutions. Yu et al. In ICLR 2016

Page 47:

Solution 5: Dilation

▪ Instead of subsampling by factor of 2: dilate by factor of 2
  ▪ allows for exponential increase in field of view without decrease of spatial dimensions
▪ Not a panacea: without subsampling, feature maps are much larger
  ▪ memory issues
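A 1D sketch of dilated convolution and of how the field of view grows when the dilation doubles at each layer. The 3-tap kernel, the signal, and the dilation schedule 1, 2, 4, 8 are assumptions for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1D 'valid' convolution with gaps of `dilation` between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # extent covered by the kernel
    return [sum(k * signal[i + t * dilation] for t, k in enumerate(kernel))
            for i in range(len(signal) - span + 1)]

def receptive_field(kernel_size, dilations):
    """Field of view after stacking one layer per entry in `dilations`."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

signal = [float(i) for i in range(10)]
out = dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2)
print(out)  # [6.0, 9.0, 12.0, 15.0, 18.0, 21.0]

plain = receptive_field(3, [1, 1, 1, 1])    # four ordinary 3-tap layers
dilated = receptive_field(3, [1, 2, 4, 8])  # dilation doubles each layer
print(plain, dilated)  # 9 31
```

The number of weights per layer is the same in both stacks; only the spacing of the taps changes. The feature maps stay at full resolution, which is where the memory cost comes from.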

Page 48:

Putting it all together

[Figure: bar chart of mean IoU on PASCAL VOC (axis ticks 55 to 70) for Basic, +Skip, +Dilation, +CRF]

Best Non-CNN approach: ~46.4%

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan Yuille. In ICLR, 2015.
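For reference, mean IoU (intersection over union, averaged over classes) can be sketched as below. This version scores a single pair of label maps; benchmark protocols typically accumulate the counts over a whole test set, and the label maps here are hypothetical.

```python
def mean_iou(pred, truth, num_classes):
    """Average over classes of intersection / union of the label masks."""
    ious = []
    for k in range(num_classes):
        inter = union = 0
        for row_p, row_t in zip(pred, truth):
            for p, t in zip(row_p, row_t):
                inter += (p == k and t == k)
                union += (p == k or t == k)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Hypothetical 2x3 label maps with 2 classes.
truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 0, 0],
        [0, 1, 1]]
score = mean_iou(pred, truth, 2)
print(score)
```

Here class 0 has IoU 3/4 and class 1 has IoU 2/3, so a single mislabelled pixel lowers the score for both classes it touches.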

Page 49:

More Architectures: U-Net

▪ expanding the decoder with symmetry

“U-Net: Convolutional Networks for Biomedical Image Segmentation”, Ronneberger et al., MICCAI 2015

Page 50:

PSPNet

▪ Pyramid Pooling Module
  ▪ new module to capture global scene context
▪ 82.6 mean IoU on PASCAL VOC

“Pyramid Scene Parsing Network”, Zhao et al., CVPR 2017

Page 51:

ICNet for Real-Time Semantic Segmentation

“ICNet for Real-Time Semantic Segmentation on High-Resolution Images”, Zhao et al., ECCV 2018

▪ Apply heavier CNN to small resolution