Pixel-Level Image Understanding with Semantic Segmentation and Panoptic Segmentation

Hengshuang Zhao

The Chinese University of Hong Kong

May 29, 2019

Part I: Semantic Segmentation

Semantic Segmentation

Original Image / Per-Pixel Annotation (classes: person, horse, car, background)

Images adapted from PASCAL VOC 2012 and ADE20K

Fully Convolutional Network

FCN [Long et al. 2015]

Conditional Random Field

DeepLabV1 [Chen et al. 2015], DPN [Liu et al. 2015], CRF-RNN [Zheng et al. 2015]

Encoder-Decoder

UNet [Ronneberger et al. 2015], DeconvNet [Noh et al. 2015], SegNet [Badrinarayanan et al. 2015], LRR [Ghiasi et al. 2016], RefineNet [Lin et al. 2017], FRRN [Pohlen et al. 2017]

Atrous Convolution / Dilated Convolution

DeepLabV1 [Chen et al. 2015], Dilation [Yu and Koltun 2016]
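To make the idea of atrous (dilated) convolution concrete, here is a minimal 1D sketch in plain Python. It is not taken from any of the cited networks; the function name `dilated_conv1d` and the toy signal are illustrative assumptions. It only shows that spacing the kernel taps `dilation` samples apart enlarges the receptive field without adding weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with a dilated (atrous) kernel."""
    # Number of input samples covered by one application of the kernel.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        acc = sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        out.append(acc)
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)   # each output sees 3 samples
atrous = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 samples
```

With the same three weights, `dilation=2` compares samples four positions apart instead of two, which is the property dense prediction networks exploit to see more context at full resolution.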

Context Aggregation

Pooling: ParseNet [Liu et al. 2015], PSPNet [Zhao et al. 2017], DeepLabV2 [Chen et al. 2016]

Large Kernel: GCN [Peng et al. 2017]

Neural Architecture Search

Search for backbone: Auto-DeepLab [Liu et al. 2019]

Search for head: DPC [Chen et al. 2018]

Attention Mechanism

Spatial attention (dot product): Transformer [Vaswani et al. 2017], Non-Local-Net [Wang et al. 2018], OCNet [Yuan et al. 2018], DANet [Fu et al. 2018], CCNet [Huang et al. 2018]

Channel reweighting: SENet [Hu et al. 2018], EncNet [Zhang et al. 2018], DFN [Yu et al. 2018]
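As a rough illustration of the dot-product spatial attention family listed above, the sketch below aggregates every pixel's feature from all other pixels using softmax-normalized dot products. It is a simplification under stated assumptions: the raw features serve as queries, keys, and values (the cited networks use learned projections), and the names `softmax` and `dot_product_attention` are hypothetical helpers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(features):
    """features: list of N per-pixel feature vectors; returns N aggregated vectors."""
    dim = len(features[0])
    out = []
    for query in features:
        # Similarity of this pixel to every pixel (including itself).
        scores = [sum(a * b for a, b in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Weighted sum of all pixel features.
        out.append([sum(w * v[d] for w, v in zip(weights, features)) for d in range(dim)])
    return out


features = [[1.0, 0.0], [0.0, 1.0]]
attended = dot_product_attention(features)
```

Each output is a convex combination of the inputs, weighted toward the pixels most similar to the query pixel.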

Point-wise Spatial Attention Network (PSANet)

• Conv & Dilated Conv: Fixed grid, information flow restricted inside local regions

• Pooling Operation: Fixed weights at each position, applied in a non-adaptive manner

• Feature Correlation: Relative position information ignored

• Point-wise Spatial Attention:

• Long-range context aggregation for dense prediction

• Bi-direction information propagation

• Self-adaptively learned and location-sensitive masks
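The bi-directional propagation in the list above can be sketched as follows. This is a toy version under stated assumptions, not the PSANet implementation: the attention masks are passed in as plain N x N lists rather than predicted by convolutions, the 1/N normalization and the concatenation of the two branch outputs are assumptions about the aggregation, and the names `bidirectional_aggregate`, `collect_masks`, and `distribute_masks` are hypothetical.

```python
def bidirectional_aggregate(features, collect_masks, distribute_masks):
    """
    features:         N per-pixel feature vectors.
    collect_masks:    collect_masks[i][j] is the weight that position i
                      (using its own mask) assigns to gathering from j.
    distribute_masks: distribute_masks[j][i] is the weight that position j
                      (using its own mask) assigns to sending to i.
    Returns, per position, the collected feature concatenated with the
    distributed feature.
    """
    n = len(features)
    dim = len(features[0])
    fused = []
    for i in range(n):
        # Collection branch: position i gathers from every j.
        collected = [
            sum(collect_masks[i][j] * features[j][d] for j in range(n)) / n
            for d in range(dim)
        ]
        # Distribution branch: every j pushes its feature to position i.
        distributed = [
            sum(distribute_masks[j][i] * features[j][d] for j in range(n)) / n
            for d in range(dim)
        ]
        fused.append(collected + distributed)
    return fused


features = [[1.0], [3.0]]
collect_masks = [[1, 1], [0, 2]]
distribute_masks = [[2, 0], [0, 2]]
fused = bidirectional_aggregate(features, collect_masks, distribute_masks)
```

The only difference between the two branches is whose mask supplies the weight: the receiving position's in collection, the sending position's in distribution.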

Point-wise Spatial Attention Network


Information collection branch

Information distribution branch

Over-completed / Compact

Feature fusion: local & global

Attention Mask Generation

Incorporation with FCN

Result on ADE20K and VOC 2012

ADE20K: information aggregation approaches

ADE20K: result on val set

PASCAL VOC 2012: result on val set

Result on Cityscapes

result on val set

result on test set (trained with fine set)

result on test set (trained with fine + coarse set)

Visual Prediction on ADE20K

Visual Prediction on VOC 2012

Visual Prediction on Cityscapes

Mask Visualization

Part II: Panoptic Segmentation

Semantic Segmentation

semantic segmentation: instances indistinguishable

Instance Segmentation

instance segmentation: stuff not handled

Panoptic Segmentation

panoptic segmentation: both stuff and things handled, instances distinguishable

Heuristic Combination

Mask R-CNN [He et al. 2017]

PSPNet [Zhao et al. 2017]

Instance

Semantic

redundant computation for independent models

Heuristic Combination

Heuristic Merge

heuristic merge logic is not end-to-end trainable
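For readers unfamiliar with what such a merge step looks like, here is a minimal sketch of one possible heuristic. It is an illustration only, not the exact rule set of the baseline in this talk: instances are pasted in descending score order onto still-unassigned pixels, and the remaining pixels take the semantic prediction. The function name `heuristic_merge` and the dictionary keys are hypothetical. Note that every step is a hand-written rule with no learnable parameters, which is why it cannot be trained end to end.

```python
def heuristic_merge(semantic, instances):
    """
    semantic:  H x W list of class ids from the semantic model.
    instances: list of dicts with keys "score", "class_id", "mask"
               (mask is an H x W list of 0/1) from the instance model.
    Returns an H x W list of (class_id, instance_id); instance_id 0 means
    the pixel was filled from the semantic prediction.
    """
    h, w = len(semantic), len(semantic[0])
    panoptic = [[None] * w for _ in range(h)]

    # Rule 1: higher-scoring instances win overlapping pixels.
    ordered = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for inst_id, inst in enumerate(ordered, start=1):
        for y in range(h):
            for x in range(w):
                if inst["mask"][y][x] and panoptic[y][x] is None:
                    panoptic[y][x] = (inst["class_id"], inst_id)

    # Rule 2: everything left over takes the semantic label.
    for y in range(h):
        for x in range(w):
            if panoptic[y][x] is None:
                panoptic[y][x] = (semantic[y][x], 0)
    return panoptic


semantic = [[0, 0, 1],
            [0, 1, 1]]
instances = [
    {"score": 0.6, "class_id": 1, "mask": [[0, 1, 1], [0, 1, 1]]},
    {"score": 0.9, "class_id": 1, "mask": [[0, 0, 1], [0, 0, 1]]},
]
merged = heuristic_merge(semantic, instances)
```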

heuristic combination

our end-to-end output

Unified Panoptic Segmentation Network (UPSNet)

Unified Backbone Network: Save Computation!

Pixel-wise Classification: Consistent Estimation!

Semantic & Instance Head

Semantic Head: FPN with Deformable Conv

Instance Head: Same as Mask R-CNN

Panoptic Head

• Thing & Stuff logits from Semantic head: X_thing, X_stuff (N_stuff x H x W)

• Mask logits from Instance head: Y_i (after resize/pad)

• X_mask_i (N_inst x H x W)

• Logits for Unknown: max, max (1 x H x W)

• Panoptic logits
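The sketch below shows how the quantities in the panoptic head diagram could be combined into per-pixel panoptic logits. It rests on assumptions not spelled out on the slide: that each instance channel is X_mask_i + Y_i and that the unknown channel is max(X_thing) - max(X_mask), both taken over the channel dimension, as I understand the UPSNet paper to describe. The channel order (stuff, then instances, then unknown), the function names, and the toy values are illustrative. At least one instance is assumed to be present.

```python
def panoptic_logits(x_stuff, x_thing, x_mask, y):
    """
    x_stuff: N_stuff maps (H x W) of stuff logits from the semantic head.
    x_thing: thing-class maps (H x W) from the semantic head.
    x_mask:  N_inst maps (H x W); semantic thing logits restricted to each
             instance's box (zero outside the box).
    y:       N_inst maps (H x W); instance mask logits after resize/pad.
    Returns N_stuff + N_inst + 1 maps: stuff, instances, unknown.
    """
    h, w = len(x_stuff[0]), len(x_stuff[0][0])
    # Assumed rule: instance channel i is X_mask_i + Y_i.
    inst = [
        [[x_mask[i][r][c] + y[i][r][c] for c in range(w)] for r in range(h)]
        for i in range(len(y))
    ]
    # Assumed rule: unknown = max over thing channels - max over instances.
    unknown = [
        [max(ch[r][c] for ch in x_thing) - max(ch[r][c] for ch in x_mask)
         for c in range(w)]
        for r in range(h)
    ]
    return x_stuff + inst + [unknown]


def argmax_per_pixel(logits):
    """Pick the winning channel index at every pixel."""
    h, w = len(logits[0]), len(logits[0][0])
    return [
        [max(range(len(logits)), key=lambda ch: logits[ch][r][c]) for c in range(w)]
        for r in range(h)
    ]


# Toy 1 x 3 image: one stuff class, one thing class, one detected instance.
x_stuff = [[[2.0, 0.0, 1.0]]]
x_thing = [[[0.0, 3.0, 5.0]]]
x_mask = [[[0.0, 3.0, 0.0]]]
y = [[[-1.0, 1.0, -2.0]]]

logits = panoptic_logits(x_stuff, x_thing, x_mask, y)
labels = argmax_per_pixel(logits)
```

In the toy example the third pixel has a strong thing response but no supporting instance, so the unknown channel wins there instead of forcing a stuff or instance label.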

Performance Comparison

Results on COCO (800 x 1300): UPSNet vs. MR-CNN-PSP

Results on Cityscapes (1024 x 2048): UPSNet vs. MR-CNN-PSP

Detailed Result

result on COCO

result on Cityscapes

result on internal data

run time comparison

Visual Prediction

result on COCO

result on Cityscapes

Code Resource

I. Semantic Segmentation:

• Caffe:
  • https://github.com/hszhao/PSPNet
  • https://github.com/hszhao/PSANet
  • https://github.com/hszhao/ICNet

• PyTorch:
  • https://github.com/hszhao/semseg (new)
  • highly optimized codebase with better reimplementation results

II. Panoptic Segmentation:

• PyTorch:
  • https://github.com/uber-research/UPSNet
  • the first open-sourced codebase for unified end-to-end panoptic segmentation


Remaining Problems

I. Semantic Segmentation:

• imbalanced classes: long-tail distribution

• confusable classes: use the human confusion matrix (e.g., ADE20K) as a prior

• data augmentation: adaptive augmentation or auto augmentation

• hard mining: effective but not elegant

• robustness and generalization: one model for different datasets

• accuracy and efficiency: can both be achieved?
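As one concrete reading of the hard mining item in the list above, the sketch below averages only the highest per-pixel losses instead of all of them. It is a generic illustration, not a method from this talk; the function name `hard_pixel_mining_loss` and the `keep_ratio` parameter are hypothetical. The hand-tuned ratio hints at why such mining is effective yet inelegant.

```python
def hard_pixel_mining_loss(pixel_losses, keep_ratio=0.25):
    """Average the largest `keep_ratio` fraction of per-pixel losses."""
    if not pixel_losses:
        return 0.0
    # Always keep at least one pixel.
    k = max(1, int(len(pixel_losses) * keep_ratio))
    hardest = sorted(pixel_losses, reverse=True)[:k]
    return sum(hardest) / k


losses = [0.1, 2.0, 0.3, 1.0]
mined = hard_pixel_mining_loss(losses, keep_ratio=0.5)
plain = sum(losses) / len(losses)
```

Here the mined loss focuses on the two hardest pixels, so it is larger than the plain mean over all four.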

II. Panoptic Segmentation:

• introduce parameters into panoptic head (e.g., 3D Conv)

• new frameworks with a single panoptic head

Thanks!
