espnet: efficient spatial pyramid of dilated convolutions ... · complexity of the esp module...

Introduction • Real-world applications demand online processing of data locally on resource constrained edge devices. • We introduce ESPNet, a deep learning based segmentation network for edge devices which is fast, light-weight, memory efficient, and power efficient. ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation Sachin Mehta 1 , Mohammad Rastegari 2 , Anat Caspi 1 , Linda Shapiro 1 , and Hannaneh Hajishirzi 1 1 University of Washington, Seattle, WA 2 Allen Institute of Artificial Intelligence and XNOR.AI Email: {sacmehta, caspian, shapiro, hannaneh}@cs.washington.edu [email protected] Efficient spatial pyramid (ESP) module. PASCAL VOC 2012 (GPU: TitanX) Unseen dataset mIOU ENet 0.33 ERFNet 0.25 ESPNet 0.40 Project Webpage ESPNet • Encoder-decoder with ESP as the basic building block • For efficiency and accuracy: – Input-reinforcement (IR): concatenate image with feature maps at different spatial-levels – Light-weight decoder (LWD): bottom-up decoding with low- dimensional feature maps from the encoder Results on the CityScapes dataset • ESP outperforms MobileNet (7%) and ShuffleNet (12%) while learning a similar number of parameters with comparable inference speed • ESPNet is more accurate, faster, and more power efficient than ENet • ESPNet is 22x faster and 180x smaller than PSPNet, while its 8% less accurate Device: Laptop, GPU: GTX960M Device: Desktop, GPU: TitanX Device: NVIDIA TX2 • ESPNet delivers competitive performance without pre-training on the ImageNet. • ESPNet outperforms SegNet (4%) with 81x fewer parameters. • Reduce: project high-dimensional feature maps to low-dimensional space • Split and Transform: learn representations in parallel with different dilation rates • Merge: concatenate hierarchically fused feature maps to produce output Hierarchical feature fusion (HFF) • removes the gridding artifacts caused by dilated convolutions • does not increase the computational complexity of the ESP module ESPNet has more generalization power than ENet and ERFNet on an unseen dataset, Mapillary. Breast biopsy WSI dataset ESPNet achieves the same accuracy as the SOTA method while learning 9.5x fewer parameters. ESPNet segments high-resolution images at 112 FPS Compared to the standard convolution, ESP • learns fewer parameters • has fewer FLOPs • has higher receptive field

Upload: lynguyet

Post on 29-Aug-2019

225 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: ESPNet: Efficient Spatial Pyramid of Dilated Convolutions ... · complexity of the ESP module ESPNet has more generalization power than ENet and ERFNet on an unseen dataset, Mapillary

Introduction• Real-world applications demand online processing of data

locally on resource constrained edge devices.

• We introduce ESPNet, a deep learning based segmentation network for edge devices which is fast, light-weight, memory efficient, and power efficient.

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Sachin Mehta1, Mohammad Rastegari2, Anat Caspi1, Linda Shapiro1, and Hannaneh Hajishirzi1

1University of Washington, Seattle, WA 2Allen Institute of Artificial Intelligence and XNOR.AI

Email: {sacmehta, caspian, shapiro, hannaneh}@cs.washington.edu [email protected]

Efficient spatial pyramid (ESP) module.

PASCAL VOC 2012 (GPU: TitanX)

Unseen datasetmIOU

ENet 0.33

ERFNet 0.25

ESPNet 0.40

Project Webpage

ESPNet• Encoder-decoder with ESP as the basic building block

• For efficiency and accuracy:

– Input-reinforcement (IR): concatenate image with feature maps at different spatial-levels

– Light-weight decoder (LWD): bottom-up decoding with low-dimensional feature maps from the encoder

Results on the CityScapes dataset

• ESP outperforms MobileNet (7%) and ShuffleNet (12%) while learning a similar number of parameters with comparable inference speed

• ESPNet is more accurate, faster, and more power efficient than ENet• ESPNet is 22x faster and 180x smaller than PSPNet, while its 8% less accurate

Device: Laptop, GPU: GTX960M Device: Desktop, GPU: TitanX Device: NVIDIA TX2

• ESPNet delivers competitive performance without pre-training on the ImageNet.• ESPNet outperforms SegNet (4%) with 81xfewer parameters.

• Reduce: project high-dimensional feature maps to low-dimensional space

• Split and Transform: learn representations in parallel with different dilation rates

• Merge: concatenate hierarchically fused feature maps to produce output

Hierarchical feature fusion (HFF)• removes the gridding artifacts caused

by dilated convolutions• does not increase the computational

complexity of the ESP module

ESPNet has more generalization power than ENet and ERFNet on an unseen dataset, Mapillary.

Breast biopsy WSI datasetESPNet achieves the same accuracy as the SOTA method while learning 9.5x fewer parameters.

ESPNet segments

high-resolution

images at 112 FPS

Compared to the standard convolution, ESP

• learns fewer parameters

• has fewer FLOPs

• has higher receptive field

Cross-Linkage Between Mapillary Street Level Photos …flrec.ifas.ufl.edu/geomatics/hochmair/pubs/JuhaszHochmair_AGILE... · Cross-Linkage Between Mapillary Street Level Photos and

ESPNet: Efficient Spatial Pyramid of Dilated arXiv:1803 ... · tion [15]). We introduce an efficient convolutional module, ESP (efficient spatial pyra-mid), which is based on the

MSCOCO & Mapillary Panoptic Segmentation Challenge 2018presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Megvii.pdf · Huanyu Liu Xiangyu Zhang Gang Yu Jian Sun Xu Liu Zeming Li

Exploratory Completeness Analysis of Mapillary for Selected …flrec.ifas.ufl.edu/geomatics/hochmair/pubs/Juhasz... · 2015. 7. 13. · As the last step, TileMill and Mapnik toolkits

Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

ESPNet: Efﬁcient Spatial Pyramid of Dilated arXiv:1803 ... · The ESP module decomposes a convolutional layer into a point-wise convolution and spatial pyramid of dilated convolutions

Convolutional Neural Network’s...Densely Connected Convolutional Networks (DenseNet161) Method and Model > ESPNet A fast and efficient convolutional neural network for high resolution

arXiv:1912.02096v3 [cs.CV] 30 Mar 2020Lorenzo Porziy, Markus Hoﬁngerz, Idoia Ruiz , Joan Serrat , Samuel Rota Bulo` y, Peter Kontschieder Mapillary Research y , Graz University of

ERFNet: Efﬁcient Residual Factorized ConvNet for Real-time ... · such as the road pavement, pedestrians, cars, signs or trafﬁc lights independently [1]. However, recent advances

Joint COCO and Mapillary Workshop at ICCV 2019: COCO ...Joint COCO and Mapillary Workshop at ICCV 2019: COCO Keypoint Challenge Track Technical Report: Res-Steps-Net for Multi-Person

Thursday 17 October 2019...for road asset management and broader community Edoardo Neerhut Mapillary 2.45pm Native Title and the role of the Cadastral Surveyor Dale Atkinson Atkinson

Joint COCO and Mapillary Recognition Challenge Workshop

Convolutional Neural Networks for Computer VisionConvolutional Block: ESPNet •This block works on following principle: •Reduce •Split •Transform •Merge •This block uses

ERFNet: Efﬁcient Residual Factorized ConvNet for Real ... · ERFNet: Efﬁcient Residual Factorized ConvNet for Real-time Semantic Segmentation ... We propose to redesign the non-bottleneck

Privacy Protection in Street-View Panoramas Using Depth and … · 2019. 6. 10. · In recent years, street-view services such as Google Street View, Bing Maps Streetside, Mapillary

workshop timetable - FoundationComputer Vision for Wildlife Conservation (pg. 5) Visual Recognition for Medical Images (pg. 5) Joint COCO and Mapillary Recognition Challenge (pg. 6)

The Mapillary Vistas Dataset for Semantic Understanding of ... · The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes Supplementary Material Gerhard Neuhold,

The Mapillary Vistas Dataset for Semantic Understanding of ...openaccess.thecvf.com/...The_Mapillary_Vistas_ICCV...The Mapillary Vistas Dataset is a novel, large-scale street-level

Enhancing Collaborative Data Collection Using Mapillary · Enhancing Collaborative Data Collection Using Mapillary Jay Dahlstrom, Christian Matthews, Tommy Nguyen August, 2017 . Recommended

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions ...openaccess.thecvf.com/content_ECCV_2018/papers/Sachin_Mehta_ESPNet... · our ESP module is more efﬁcient than other

Mapillary Street-Level Sequences: A Dataset for Lifelong ...openaccess.thecvf.com/content_CVPR_2020/papers/... · task in computer vision, with vast applications in robust local-ization

The State of Mapillary: An Exploratory Analysis...International Journal of Geo-Information Article The State of Mapillary: An Exploratory Analysis Dawei Ma 1, Hongchao Fan 2,*, Wenwen

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for … · 2018. 8. 28. · 2 Mehta et al. (a) Reduce M,1×1,d ESP Strategy Split Transform Merge 1 1 2 2 d,n 3 × 3 ···

ESPNet: Efﬁcient Spatial Pyramid of Dilated arXiv:1803 ... · ESPNet: Efﬁcient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation Sachin Mehta1, Mohammad Rastegari2,

Efficient Semantic Segmentation using Gradual Grouping · 2020. 4. 8. · DG29 18 16 20 Models ERFNet-pretrained D*-proposed DG2*-proposed DG4*-proposed IOU 72.10 68.39 66.10 63.80

Heatmap-based Vanishing Point boosts Lane Detection · 2 Fig. 1: Illustration of our proposed network architecture and four possible structures. The ERFNet is the backbone of the

Mapillary Street-Level Sequences: A Dataset for Lifelong Place … · 2020. 6. 28. · Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition Frederik Warburg†,∗,