Pixel-Level Image Understanding with Semantic Segmentation and Panoptic Segmentation
Hengshuang Zhao
The Chinese University of Hong Kong
May 29, 2019
Part I: Semantic Segmentation
Semantic Segmentation
Original Image Per-Pixel Annotation
person
horse
car
background
Images adapted from PASCAL VOC 2012
Images adapted from ADE20K
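Per-pixel annotation means every pixel carries exactly one class label. The following is a minimal sketch in plain Python, assuming a hypothetical score map indexed as scores[class][y][x]; the class names follow the slide's legend, while the function name and the score values are made up for illustration.

```python
CLASSES = ["background", "person", "horse", "car"]

def label_map(scores):
    """Pick the highest-scoring class index for every pixel.

    scores[c][y][x] is the score of class c at pixel (y, x).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

# Toy 2 x 2 image with made-up scores for the four classes.
scores = [
    [[0.9, 0.1], [0.2, 0.1]],  # background
    [[0.0, 0.8], [0.1, 0.1]],  # person
    [[0.1, 0.0], [0.6, 0.2]],  # horse
    [[0.0, 0.1], [0.1, 0.6]],  # car
]
labels = label_map(scores)
names = [[CLASSES[c] for c in row] for row in labels]
```

Here each of the four pixels ends up with a different class, so `names` reads background, person, horse, car in row-major order.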
Fully Convolutional Network
FCN [Long et al. 2015]
Conditional Random Field
DeepLabV1 [Chen et al. 2015], DPN [Liu et al. 2015], CRF-RNN [Zheng et al. 2015]
Encoder-Decoder
UNet [Ronneberger et al. 2015], DeconvNet [Noh et al. 2015], SegNet [Badrinarayanan et al. 2015], LRR [Ghiasi et al. 2016],
RefineNet [Lin et al. 2017], FRRN [Pohlen et al. 2017]
Atrous Convolution / Dilated Convolution
DeepLabV1 [Chen et al. 2015], Dilation [Yu et al. 2016]
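An atrous (dilated) convolution keeps the same number of kernel weights but spaces the taps apart, which enlarges the receptive field without extra parameters. The sketch below shows the idea in one dimension with plain Python; the function name and the toy signal are illustrative assumptions, not code from the cited works.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with gaps of `dilation` between taps."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * dilation] for k, w in enumerate(kernel)))
    return out

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
dense = dilated_conv1d(signal, kernel, dilation=1)   # 3 taps span 3 samples
atrous = dilated_conv1d(signal, kernel, dilation=2)  # same 3 taps span 5 samples
```

With dilation 2 each output compares samples four positions apart instead of two, using the same three weights.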
Context Aggregation
Pooling: ParseNet [Liu et al. 2015], PSPNet [Zhao et al. 2017], DeepLabV2 [Chen et al. 2016]
Large Kernel: GCN [Peng et al. 2017]
Neural Architecture Search
Search for backbone: Auto-DeepLab [Liu et al. 2019]
Search for head: DPC [Chen et al. 2018]
Attention Mechanism
Spatial attention (dot product): Transformer [Vaswani et al. 2017], Non-Local-Net [Wang et al. 2018], OCNet [Yuan et al. 2018], DANet [Fu et al. 2018], CCNet [Huang et al. 2018]
Channel reweighting: SENet [Hu et al. 2018], EncNet [Zhang et al. 2018], DFN [Yu et al. 2018]
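To make the "spatial attention (dot product)" entry concrete, here is a simplified sketch: every position attends to every other position, with weights given by a softmax over feature dot products. It is an assumption-laden toy, since it omits the learned projections that the cited models apply before the dot product, and the function name is hypothetical.

```python
import math

def dot_product_attention(features):
    """Self-attention over N positions without learned projections.

    features[i] is the feature vector at position i. Each output is a
    softmax-weighted average of all feature vectors, weighted by similarity.
    """
    outputs = []
    for query in features:
        sims = [sum(q * k for q, k in zip(query, key)) for key in features]
        peak = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(query)
        outputs.append([
            sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)
        ])
    return outputs

# Three positions; the first two are similar, the third differs.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attended = dot_product_attention(features)
```

Because the weights depend only on feature similarity, swapping the locations of two positions leaves their outputs unchanged; relative position plays no role.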
Point-wise Spatial Attention Network (PSANet)
• Conv & Dilated Conv: Fixed grid, information flow restricted inside local regions
• Pooling Operation: Fixed weights at each position, applied in a non-adaptive manner
• Feature Correlation: Relative position information ignored
• Point-wise Spatial Attention:
  • Long-range context aggregation for dense prediction
  • Bi-directional information propagation
  • Self-adaptively learned and location-sensitive masks
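The bi-directional propagation in the bullets above can be sketched on a 1-D feature sequence. In the collect direction, position i uses its own mask row to gather from every j; in the distribute direction, position j uses its own mask row to decide how much it sends to every i. This is a rough sketch of my reading of the idea: the masks are written by hand here (in PSANet they are predicted by the network), and the function names are hypothetical.

```python
def collect(features, masks):
    """Position i gathers: z_i = (1/N) * sum_j masks[i][j] * features[j]."""
    n = len(features)
    return [sum(masks[i][j] * features[j] for j in range(n)) / n for i in range(n)]

def distribute(features, masks):
    """Position j sends: z_i = (1/N) * sum_j masks[j][i] * features[j]."""
    n = len(features)
    return [sum(masks[j][i] * features[j] for j in range(n)) / n for i in range(n)]

features = [1.0, 2.0, 4.0]
# Hypothetical masks; row k holds the weights produced at position k.
masks = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0],
]
collected = collect(features, masks)
distributed = distribute(features, masks)
```

The same mask values give different results in the two directions, because the row index is the receiver when collecting and the sender when distributing.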
Point-wise Spatial Attention Network
Information collection branch
Information distribution branch
Over-completed, Compact
Feature fusion: local & global
Attention Mask Generation
Incorporation with FCN
Result on ADE20K and VOC 2012
ADE20K: information aggregation approaches
ADE20K: result on val set
PASCAL VOC 2012: result on val set
Result on Cityscapes
result on val set
result on test set (trained on the fine set)
result on test set (trained on the fine + coarse sets)
Visual Prediction on ADE20K
Visual Prediction on VOC 2012
Visual Prediction on Cityscapes
Mask Visualization
Part II: Panoptic Segmentation
Semantic Segmentation
semantic segmentation: instances are indistinguishable
Instance Segmentation
instance segmentation: stuff is not addressed
Panoptic Segmentation
panoptic segmentation: both stuff and things are handled, and instances are distinguishable
Heuristic Combination
Mask R-CNN [He et al. 2017]
PSPNet [Zhao et al. 2017]
Instance
Semantic
redundant computation for independent models
Heuristic Merge
heuristic merge logic is not end-to-end trainable
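To illustrate why such merge logic is not trainable, here is a minimal sketch of one common style of heuristic merge. The rules used (paste instances in descending score order, then fill the rest with stuff labels) and all names are assumptions for illustration; real pipelines typically add overlap and area thresholds that are omitted here.

```python
VOID = (-1, 0)  # (class id, instance id) for unassigned pixels

def heuristic_merge(semantic, instances, stuff_classes):
    """Combine independent predictions into a (class, instance id) map.

    semantic[y][x] is a class id from the semantic model; each instance is a
    dict with 'mask' (H x W of 0/1), 'cls', and 'score' from the instance model.
    """
    height, width = len(semantic), len(semantic[0])
    panoptic = [[None] * width for _ in range(height)]
    # Rule 1: paste instances by descending score; earlier ones win overlaps.
    ranked = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for inst_id, inst in enumerate(ranked, start=1):
        for y in range(height):
            for x in range(width):
                if inst["mask"][y][x] and panoptic[y][x] is None:
                    panoptic[y][x] = (inst["cls"], inst_id)
    # Rule 2: fill remaining pixels with stuff labels; leave the rest void.
    for y in range(height):
        for x in range(width):
            if panoptic[y][x] is None:
                cls = semantic[y][x]
                panoptic[y][x] = (cls, 0) if cls in stuff_classes else VOID
    return panoptic

semantic = [[0, 1, 1],
            [0, 1, 2]]  # 0 = stuff class, 1 and 2 = thing classes
instances = [
    {"mask": [[0, 1, 1], [0, 0, 0]], "cls": 1, "score": 0.6},
    {"mask": [[0, 0, 1], [0, 0, 1]], "cls": 2, "score": 0.9},
]
result = heuristic_merge(semantic, instances, stuff_classes={0})
```

Every decision here is a hard-coded rule applied after both models have run, so no gradient can flow from the merged output back into either model.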
heuristic combination
our end-to-end output
Unified Panoptic Segmentation Network (UPSNet)
Unified Backbone Network: Save Computation!
Pixel-wise Classification: Consistent Estimation!
Semantic & Instance Head
Semantic Head: FPN with Deformable Conv
Instance Head: Same as Mask R-CNN
Panoptic Head
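[Figure: panoptic head]
Inputs: mask logits Y_i from the instance head (resize/pad); thing and stuff logits X_thing and X_stuff from the semantic head
Instance channels: X_mask_i, N_inst x H x W
Stuff channels: X_stuff, N_stuff x H x W
Unknown channel: 1 channel of logits for Unknown, computed with two max operations
Output: panoptic logits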
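The panoptic head can be sketched as follows. This is my reading of the design rather than the released implementation: each instance channel adds the instance's mask logits Y_i to X_mask_i, taken here as the matching class channel of X_thing restricted to the instance's box; the unknown channel is assumed to be the max over X_thing minus the max over all X_mask_i. The function name, the dict layout, and the toy values are hypothetical.

```python
def panoptic_logits(x_stuff, x_thing, instances):
    """Stack stuff, per-instance, and unknown logits into one tensor.

    x_stuff[c][y][x], x_thing[c][y][x]: semantic-head logits.
    Each instance: 'cls' (thing channel), 'box' (y0, x0, y1, x1; end
    exclusive), and 'y' (H x W mask logits, already resized and zero-padded).
    """
    height, width = len(x_stuff[0]), len(x_stuff[0][0])
    inst_channels, x_masks = [], []
    for inst in instances:
        y0, x0, y1, x1 = inst["box"]
        # X_mask_i: the instance's class channel, kept inside its box only.
        x_mask = [
            [x_thing[inst["cls"]][y][x] if y0 <= y < y1 and x0 <= x < x1 else 0.0
             for x in range(width)]
            for y in range(height)
        ]
        x_masks.append(x_mask)
        inst_channels.append([
            [x_mask[y][x] + inst["y"][y][x] for x in range(width)]
            for y in range(height)
        ])
    # Unknown (assumed form): thing evidence that no instance accounts for.
    unknown = [
        [max(ch[y][x] for ch in x_thing)
         - max((m[y][x] for m in x_masks), default=0.0)
         for x in range(width)]
        for y in range(height)
    ]
    return x_stuff + inst_channels + [unknown]

# Toy 1 x 2 image: one stuff class, one thing class, one detected instance.
x_stuff = [[[1.0, 0.5]]]
x_thing = [[[3.0, 2.0]]]
instances = [{"cls": 0, "box": (0, 0, 1, 1), "y": [[1.5, 0.0]]}]
logits = panoptic_logits(x_stuff, x_thing, instances)
```

In this toy case the left pixel is won by the instance channel, while the right pixel has thing evidence but no covering instance, so the unknown channel scores highest there. Because the head only adds, crops, and takes maxima, it has no parameters of its own yet remains differentiable.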
Performance Comparison
[Charts: UPSNet vs. MR-CNN-PSP. Results on COCO (800 x 1300); results on Cityscapes (1024 x 2048)]
Detailed Result
result on COCO
result on Cityscapes
result on internal data
run time comparison
Visual Prediction
result on COCO
result on Cityscapes
Code Resource
I. Semantic Segmentation:
• Caffe:
  • https://github.com/hszhao/PSPNet
  • https://github.com/hszhao/PSANet
  • https://github.com/hszhao/ICNet
• PyTorch:
  • https://github.com/hszhao/semseg (new)
  • highly optimized codebase with better reimplementation results
II. Panoptic Segmentation:
• PyTorch:
  • https://github.com/uber-research/UPSNet
  • the first open-source codebase for unified end-to-end panoptic segmentation
Remaining Problems
I. Semantic Segmentation:
• imbalanced classes: long-tail distribution
• confusable classes: use a human confusion matrix (e.g., from ADE20K) as a prior
• data augmentation: adaptive augmentation or auto augmentation
• hard mining: effective but not elegant
• robustness and generalization: one model for different datasets
• accuracy and efficiency: can both be achieved?
II. Panoptic Segmentation:
• introduce parameters into the panoptic head (e.g., 3D Conv)
• new frameworks with a single panoptic head
Thanks!