Pixel-Level Image Understanding with Semantic Segmentation and Panoptic Segmentation

Hengshuang Zhao

The Chinese University of Hong Kong

May 29, 2019

Part I: Semantic Segmentation

Semantic Segmentation

Figure: original image and per-pixel annotation (labels shown: person, background)

Images adapted from PASCAL VOC 2012 and ADE20K

Fully Convolutional Network

FCN [Long et al. 2015]

Conditional Random Field

DeepLabV1 [Chen et al. 2015], DPN [Liu et al. 2015], CRF-RNN [Zheng et al. 2015]

Encoder-Decoder

UNet [Ronneberger et al. 2015], DeconvNet [Noh et al. 2015], SegNet [Badrinarayanan et al. 2015], LRR [Ghiasi et al. 2016], RefineNet [Lin et al. 2017], FRRN [Pohlen et al. 2017]

Atrous Convolution / Dilated Convolution

DeepLabV1 [Chen et al. 2015], Dilation [Yu and Koltun 2016]
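To make the idea behind atrous (dilated) convolution concrete, here is a minimal 1-D sketch in NumPy. The function name `dilated_conv1d` and the toy inputs are illustrative assumptions, not code from any of the cited works. It only shows how spacing out the kernel taps widens the receptive field without adding weights.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D cross-correlation with gaps of `dilation` between kernel taps."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    span = (len(w) - 1) * dilation + 1  # receptive field of one output element
    n_out = len(x) - span + 1
    if n_out <= 0:
        raise ValueError("input shorter than the dilated kernel span")
    out = np.zeros(n_out)
    for k, wk in enumerate(w):
        # tap k reads the input shifted by k * dilation
        out += wk * x[k * dilation : k * dilation + n_out]
    return out

x = np.arange(8.0)
w = [1.0, 1.0, 1.0]
dense = dilated_conv1d(x, w, dilation=1)   # each output sees 3 neighbouring inputs
atrous = dilated_conv1d(x, w, dilation=2)  # same 3 weights, spread over 5 inputs
```

With the same three weights, the dilated version covers a span of five input samples per output, which is the property that dense-prediction networks exploit to enlarge context while keeping resolution.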

Context Aggregation

Pooling: ParseNet [Liu et al. 2015], PSPNet [Zhao et al. 2017], DeepLabV2 [Chen et al. 2016]

Large Kernel: GCN [Peng et al. 2017]

Neural Architecture Search

Search for backbone: Auto-DeepLab [Liu et al. 2019]

Search for head: DPC [Chen et al. 2018]

Attention Mechanism

Spatial attention (dot product): Transformer [Vaswani et al. 2017], Non-Local-Net [Wang et al. 2018], OCNet [Yuan et al. 2018], DANet [Fu et al. 2018], CCNet [Huang et al. 2018]

Channel reweighting: SENet [Hu et al. 2018], EncNet [Zhang et al. 2018], DFN [Yu et al. 2018]

Point-wise Spatial Attention Network (PSANet)

• Conv & Dilated Conv: Fixed grid, information flow restricted inside local regions

• Pooling Operation: Fixed weights at each position, applied in a non-adaptive manner

• Feature Correlation: Relative position information ignored

• Point-wise Spatial Attention:

• Long-range context aggregation for dense prediction

• Bi-directional information propagation

• Self-adaptively learned and location-sensitive masks
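The bi-directional propagation named in the bullets above can be sketched as two weighted sums per position: one where position i collects from every j using weights predicted at i, and one where every j distributes to i using weights predicted at j. The sketch assumes the attention maps are already available as dense N×N arrays. The names `a_collect` and `a_distribute` and the 1/N normalisation are assumptions of this example, and it is not the released PSANet code.

```python
import numpy as np

def bidirectional_aggregate(x, a_collect, a_distribute):
    """x: (N, C) features at N positions.
    a_collect[i, j]: weight position i assigns to information coming from j.
    a_distribute[j, i]: weight position j assigns to sending information to i.
    """
    n = x.shape[0]
    collected = a_collect @ x          # z_i += sum_j a_collect[i, j] * x_j
    distributed = a_distribute.T @ x   # z_i += sum_j a_distribute[j, i] * x_j
    return (collected + distributed) / n

rng = np.random.default_rng(0)
n_pos, channels = 6, 4
x = rng.normal(size=(n_pos, channels))
a_c = rng.random((n_pos, n_pos))
a_d = rng.random((n_pos, n_pos))
z = bidirectional_aggregate(x, a_c, a_d)
```

Because the weights are indexed by pairs of positions rather than shared across the map, every output position can draw on the whole feature map, unlike a fixed-grid convolution.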

Point-wise Spatial Attention Network

Information collection branch

Information distribution branch

Over-complete map → compact map

Feature fusion: local & global
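One way to read the "over-complete → compact" step: if each position predicts a mask over all relative offsets, that mask needs (2H−1)×(2W−1) entries, of which only an H×W window falls inside the feature map. The sketch below assumes this reading and a centre-aligned layout, with offset zero at index (H−1, W−1). `compact_mask` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def compact_mask(over_complete, i, j, height, width):
    """Crop the H x W window of a (2H-1, 2W-1) relative-offset mask that is
    valid for the feature-map position (i, j)."""
    assert over_complete.shape == (2 * height - 1, 2 * width - 1)
    top = height - 1 - i    # index of row offset -i, i.e. target row 0
    left = width - 1 - j    # index of column offset -j, i.e. target column 0
    return over_complete[top : top + height, left : left + width]

H, W = 3, 4
# Label every entry with its relative offset (dy, dx) so the crop is easy to check.
dy, dx = np.meshgrid(np.arange(-(H - 1), H), np.arange(-(W - 1), W), indexing="ij")
mask_dy = compact_mask(dy, i=2, j=1, height=H, width=W)
mask_dx = compact_mask(dx, i=2, j=1, height=H, width=W)
```

In the cropped window, entry (i, j) carries offset (0, 0), so the mask stays aligned with the position that predicted it.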

Attention Mask Generation

Incorporation with FCN

Result on ADE20K and VOC 2012

ADE20K: comparison of information aggregation approaches

ADE20K: result on val set

PASCAL VOC 2012: result on val set

Result on Cityscapes

result on val set

result on test set (trained with fine set)

result on test set (trained with fine + coarse set)

Visual Prediction on ADE20K

Visual Prediction on VOC 2012

Visual Prediction on Cityscapes

Mask Visualization

Part II: Panoptic Segmentation

Semantic Segmentation

semantic segmentation: instances indistinguishable

Instance Segmentation

instance segmentation: stuff unsolved

Panoptic Segmentation

panoptic segmentation: stuff and things are solved, instances distinguishable
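The difference between the three tasks shows up in what each pixel stores. Below is a toy illustration that combines a semantic map and an instance map into a single panoptic map. The class ids, the `THING_CLASSES` set and the `class_id * 1000 + instance_id` encoding are assumptions made for this example, not part of the slides.

```python
import numpy as np

THING_CLASSES = {1}   # assumed: class 1 = person (a "thing"), class 0 = sky ("stuff")

# Semantic segmentation: both persons share label 1.
semantic = np.array([[0, 0, 0, 0],
                     [1, 1, 0, 1],
                     [1, 1, 0, 1]])
# Instance segmentation: persons 1 and 2 are separated, stuff pixels stay 0.
instance = np.array([[0, 0, 0, 0],
                     [1, 1, 0, 2],
                     [1, 1, 0, 2]])

def to_panoptic(semantic, instance, thing_classes, offset=1000):
    """Encode every pixel as class_id * offset + instance_id (0 for stuff)."""
    is_thing = np.isin(semantic, list(thing_classes))
    return semantic * offset + np.where(is_thing, instance, 0)

panoptic = to_panoptic(semantic, instance, THING_CLASSES)
segments = np.unique(panoptic)   # one id per stuff region or thing instance
```

Each pixel now carries both a class and, for things, an instance identity, which neither input map provides on its own.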

Heuristic Combination

Mask R-CNN [He et al. 2017]

PSPNet [Zhao et al. 2017]

Instance

Semantic

redundant computation for independent models

Heuristic Merge

heuristic merge logic is not end-to-end trainable
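For reference, a common form of heuristic merge can be sketched as follows: paste instance masks in order of decreasing confidence, drop instances that are mostly covered already, then fill the remaining pixels with the semantic prediction for stuff classes. The function name, the `overlap_thresh` value and the id encoding are assumptions for illustration. The slides do not specify the exact rules used.

```python
import numpy as np

def heuristic_merge(inst_masks, inst_scores, inst_classes, semantic,
                    stuff_classes, overlap_thresh=0.5, offset=1000):
    """inst_masks: list of boolean (H, W) arrays; semantic: (H, W) class ids.
    Returns an (H, W) map of class_id * offset + instance_id,
    with -1 marking pixels that no rule assigned."""
    panoptic = np.full(semantic.shape, -1, dtype=np.int64)
    next_id = 1
    for idx in np.argsort(inst_scores)[::-1]:        # most confident first
        mask = inst_masks[idx]
        free = mask & (panoptic == -1)               # part not yet claimed
        if mask.sum() == 0 or free.sum() / mask.sum() < overlap_thresh:
            continue                                 # mostly occluded: discard
        panoptic[free] = inst_classes[idx] * offset + next_id
        next_id += 1
    for cls in stuff_classes:                        # stuff fills what is left
        fill = (semantic == cls) & (panoptic == -1)
        panoptic[fill] = cls * offset
    return panoptic

semantic = np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]])
m1 = np.zeros((3, 3), dtype=bool)
m1[1:, 1:] = True                                    # 4 pixels, score 0.9
m2 = np.zeros((3, 3), dtype=bool)
m2[1:, 1] = True                                     # 2 pixels inside m1, score 0.6
merged = heuristic_merge([m1, m2], [0.9, 0.6], [1, 1], semantic, stuff_classes=[0])
```

Every step here is a hard, non-differentiable decision (sorting, thresholding, overwriting), which is why this kind of merge cannot be trained end to end.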

heuristic combination

our end-to-end output

Unified Panoptic Segmentation Network (UPSNet)

Unified Backbone Network: Save Computation!

Pixel-wise Classification: Consistent Estimation!

Semantic & Instance Head

Semantic Head: FPN with Deformable Conv

Instance Head: Same as Mask R-CNN

Panoptic Head

Mask logits from Instance head: Y_i (resize/pad)

Thing & Stuff logits from Semantic head: X_thing, X_stuff

Panoptic logits: N_stuff channels from X_stuff, N_inst channels from X_mask_i and Y_i, plus 1 channel of logits for Unknown
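A rough sketch of how the panoptic logits could be assembled from the pieces named above. Following one reading of the diagram, it assumes that each instance channel is the sum of X_mask_i (the thing-class logits inside the instance's box, zero elsewhere) and Y_i (the mask logits resized/padded to image size), and that the unknown channel is max(X_thing) − max(X_mask). These formulas, the function name and the array shapes are assumptions of this example and are not confirmed by the slides.

```python
import numpy as np

def panoptic_logits(x_stuff, x_thing, y_masks, inst_classes, inst_boxes):
    """x_stuff: (N_stuff, H, W), x_thing: (N_thing, H, W) semantic-head logits.
    y_masks: (N_inst, H, W) instance mask logits, already resized/padded.
    inst_classes: thing-class index per instance (assumes N_inst >= 1).
    inst_boxes: (y0, x0, y1, x1) per instance.
    Returns (N_stuff + N_inst + 1, H, W); the last channel is "unknown"."""
    n_inst, h, w = y_masks.shape
    x_mask = np.zeros((n_inst, h, w))
    for i, (cls, (y0, x0, y1, x1)) in enumerate(zip(inst_classes, inst_boxes)):
        # keep the semantic logits of the instance's class inside its box only
        x_mask[i, y0:y1, x0:x1] = x_thing[cls, y0:y1, x0:x1]
    inst_logits = x_mask + y_masks
    # assumed rule: thing evidence that no instance explains goes to "unknown"
    unknown = x_thing.max(axis=0) - x_mask.max(axis=0)
    return np.concatenate([x_stuff, inst_logits, unknown[None]], axis=0)

rng = np.random.default_rng(0)
H, W, N_STUFF, N_THING = 4, 6, 3, 2
x_stuff = rng.normal(size=(N_STUFF, H, W))
x_thing = rng.normal(size=(N_THING, H, W))
y_masks = rng.normal(size=(2, H, W))
logits = panoptic_logits(x_stuff, x_thing, y_masks,
                         inst_classes=[0, 1],
                         inst_boxes=[(0, 0, 2, 3), (1, 2, 4, 6)])
segment_ids = logits.argmax(axis=0)   # stuff class, instance, or unknown per pixel
```

The channel count depends on the number of detected instances, so the pixel-wise classification is taken over a tensor whose size varies per image.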

Performance Comparison

Results on COCO (800 x 1300)

Results on Cityscapes (1024 x 2048)

UPSNet vs. MR-CNN-PSP

Detailed Result

result on COCO

result on Cityscapes

result on internal data

run time comparison

Visual Prediction

result on COCO

result on Cityscapes

Code Resource

I. Semantic Segmentation:

• Caffe:
  • https://github.com/hszhao/PSPNet
  • https://github.com/hszhao/PSANet
  • https://github.com/hszhao/ICNet

• PyTorch:
  • https://github.com/hszhao/semseg (new)
  • highly optimized codebase with better reimplementation results

II. Panoptic Segmentation:

• PyTorch:
  • https://github.com/uber-research/UPSNet
  • the first open-sourced codebase for unified end-to-end panoptic segmentation

Remaining Problems

I. Semantic Segmentation:

• imbalanced classes: long-tail distribution

• confusable classes: using a human confusion matrix (e.g., from ADE20K) as a prior

• data augmentation: adaptive augmentation or auto augmentation

• hard mining: effective but not elegant

• robustness and generalization: one model for different datasets

• accuracy and efficiency: can both be achieved?

II. Panoptic Segmentation:

• introduce parameters into the panoptic head (e.g., 3D Conv)

• new frameworks with a single panoptic head

Thanks!
