deep inside convolutional networks: visualising image...

Given an image classification ConvNet, we aim to answer two questions: • What does a class model look like? • What makes an image belong to a class? To this end, we visualise: • Canonical image of a class • Class saliency map for a given image and class Both visualisations are based on the class score derivative w.r.t. the input image (computed using back-prop) • Derivative of a ConvNet class score w.r.t. the input image is useful for visualising: • canonical image of a class • image-specific class saliency • Image-specific class saliency can be further processed to perform weakly-supervised object segmentation and detection 1. OVERVIEW Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps Karen Simonyan, Andrea Vedaldi, Andrew Zisserman Visual Geometry Group, University of Oxford, UK This work was supported by ERC grant VisRec no. 228180 We acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU 2. CLASS MODEL VISUALISATION fully connected classifier layer soft-max layer … • We compute a (regularised) image with a high class score : [Erhan et al., 2009] • Optimised using gradient descent, initialised with the zero image • Gradient is computed using back-prop • Maximising soft-max score leads to worse visualisation • We visualise a ConvNet trained on ImageNet ILSVRC 2013 (1000 classes) 3. IMAGE-SPECIFIC CLASS SALIENCY VISUALISATION • Linear approximation of the class score in the neighbourhood of an image : • has the same size as the image • Magnitude of defines a saliency map for image and class – computed using back-prop – score of -th class Image-Specific Class Saliency Properties: • Weakly supervised • computed using classification ConvNet, trained on image labels • no additional annotation required (e.g. boxes or masks) • Highlights discriminative object parts • Instant computation – no sliding window, just a single back-prop pass • Fires on several object instances • Given an image and a saliency map: 1. Saliency map is thresholded to obtain foreground / background masks 2. GraphCut colour segmentation [Boykov and Jolly, 2001] is initialised with the masks 3. Object localisation: bounding box of the largest foreground connected component • GraphCut propagates segmentation from the most salient areas of the object • ILSVRC 2013 localisation accuracy: 46.4% • weak supervision: ground-truth bounding boxes were not used for training • saliency maps for top-5 predicted classes were used to compute five bounding box predictions Layer Forward pass DeconvNet [Zeiler & Fergus, 2013] Back-prop w.r.t. input Convolution RELU Max-pooling equivalent equivalent max location “switch” slightly different: threshold layer output vs input – n th layer activity; – n th layer DeconvNet reconstruction; – visualised neuron activity goose toucan ostrich keyboard dumbbell husky kit fox dalmatian bell pepper lemon 4. WEAKLY-SUPERVISED OBJECT LOCALISATION 5. RELATION TO DECONVOLUTIONAL NETS 6. CONCLUSION

Upload: others

Post on 26-Sep-2020

0 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Deep Inside Convolutional Networks: Visualising Image ...vgg/publications/2014/Simonyan14a/poster.pdf · Karen Simonyan, Andrea Vedaldi, Andrew Zisserman Visual Geometry Group, University

Given an image classification ConvNet, we aim to answer two questions: • What does a class model look like? • What makes an image belong to a class?

To this end, we visualise: • Canonical image of a class • Class saliency map for a given image and class

Both visualisations are based on the class score derivative w.r.t. the input image (computed using back-prop)

• Derivative of a ConvNet class score w.r.t. the input image is useful for visualising: • canonical image of a class • image-specific class saliency

• Image-specific class saliency can be further processed to perform weakly-supervised object segmentation and detection

1. OVERVIEW

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Visual Geometry Group, University of Oxford, UK

This work was supported by ERC grant VisRec no. 228180 We acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU

2. CLASS MODEL VISUALISATION

fully connected classifier layer

soft-max layer

…

• We compute a (regularised) image with a high class score : [Erhan et al., 2009]

• Optimised using gradient descent, initialised with the zero image

• Gradient is computed using back-prop

• Maximising soft-max score leads to worse visualisation

• We visualise a ConvNet trained on ImageNet ILSVRC 2013 (1000 classes) 3. IMAGE-SPECIFIC

CLASS SALIENCY VISUALISATION

• Linear approximation of the class score in the neighbourhood of an image :

• has the same size as the image • Magnitude of defines a saliency map for image

and class

– computed using back-prop

– score of -th class

Image-Specific Class Saliency Properties:

• Weakly supervised

• computed using classification ConvNet, trained on image labels

• no additional annotation required (e.g. boxes or masks)

• Highlights discriminative object parts

• Instant computation – no sliding window, just a single back-prop pass

• Fires on several object instances

• Given an image and a saliency map:

1. Saliency map is thresholded to obtain foreground / background masks

2. GraphCut colour segmentation [Boykov and Jolly, 2001] is initialised with the masks

3. Object localisation: bounding box of the largest foreground connected component

• GraphCut propagates segmentation from the most salient areas of the object

• ILSVRC 2013 localisation accuracy: 46.4%

• weak supervision: ground-truth bounding boxes were not used for training

• saliency maps for top-5 predicted classes were used to compute five bounding box predictions

Layer Forward pass DeconvNet [Zeiler & Fergus, 2013] Back-prop w.r.t. input

Convolution

RELU

Max-pooling

equivalent

max location “switch”

slightly different: threshold layer output vs input

– nth layer activity; – nth layer DeconvNet reconstruction; – visualised neuron activity

goose toucan ostrich keyboard dumbbell

husky kit fox dalmatian bell pepper lemon

4. WEAKLY-SUPERVISED OBJECT LOCALISATION

5. RELATION TO DECONVOLUTIONAL NETS 6. CONCLUSION

Relja Arandjelović and Andrew Zisserman · 2014. 10. 28. · Visual vocabulary with a semantic twist Relja Arandjelović and Andrew Zisserman Visual Geometry Group, Department of

{Charles, Pfister, Magee, Hogg, and Zisserman} 2013 ... · {Charles, Pfister, Everingham, and Zisserman} 2013{} {Dantone, Gall, Leistner, and Vanprotect unhbox voidb@x penalty @M

Synthetic Data for Text Localisation in Natural Imagesankush/textloc.pdfSynthetic Data for Text Localisation in Natural Images Ankush Gupta Andrea Vedaldi Andrew Zisserman Dept. of

VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION does size matter? Karen Simonyan Andrew Zisserman

Hanbyul Joo Natalia Neverova Andrea Vedaldi Facebook AI ... · Hanbyul Joo Natalia Neverova Andrea Vedaldi Facebook AI Research Abstract We propose a method for building large collections

Fcv taxo zisserman

Spatial Transformer Networkspapers.nips.cc/paper/5854-spatial-transformer-networks.pdf · Spatial Transformer Networks Max Jaderberg Karen Simonyan Andrew Zisserman Koray Kavukcuoglu

Parallel Separable 3D Convolution for Video and Volumetric ... · arXiv:1809.04096v1 [cs.CV] 11 Sep 2018 {He, Zhang, Ren, and Sun} 2015{} {Simonyan and Zisserman} 2014{} {Yu and Koltun}

Cats and Dogs - University of Oxfordvgg/publications/2012/parkhi12a/parkhi… · Cats and Dogs Omkar M Parkhi 1;2 Andrea Vedaldi Andrew Zisserman C. V. Jawahar2 1Department of Engineering

A arXiv:1412.6598v2 [cs.CV] 11 Apr 2015vgg/publications/2015/Parizi15/parizi15.pdf · Sobhan Naderi Parizi Andrea Vedaldi Andrew Zisserman Pedro Felzenszwalb Brown University University

arXiv:1805.08136v3 [cs.CV] 24 Jul 2019 · Philip H.S. Torr Andrea Vedaldi FiveAI & University of Oxford University of Oxford [email protected] [email protected] ABSTRACT

Lecture 10: Convolutional Neural Networks · CS231n: Convolutional Neural Networks K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large -Scale Image Recognition,

Ken Chatfield James Philbin Andrew Zisserman

Andrew Zisserman Talk - Part 2

Polymeric Nano-Systems Used in Drug Delivery Arsen Simonyan SUNY-ESF

The SVM classifier zisserman lecture note.pdf

Vedaldi sift.pdf

A. Simonyan, V. Danielyan, V. Dekhtiarov,

Andrew Zisserman Talk - Part 1b

Springer 2008 Vedaldi A

Objects in Contextcseweb.ucsd.edu/~sjb/iccv2007a.pdf · Objects in Context Andrew Rabinovich [email protected] Andrea Vedaldi y [email protected] Carolina Galleguillos [email protected]

Simonyan 1000 i 1 Urok

Andrew Zisserman - UCLAhelper.ipam.ucla.edu/publications/sews2/sews2_7272.pdfAndrew Zisserman (work with Ondřej Chum, ... 2. Scaling up ... Part 2: Scaling up: the Oxford buildings

Deep Features for Text Spotting - Information … Features for Text Spotting Max Jaderberg, Andrea Vedaldi, Andrew Zisserman Visual Geometry Group, Department of Engineering Science,

A arXiv:1412.6598v2 [cs.CV] 11 Apr 2015vedaldi/assets/pubs/parizi...Sobhan Naderi Parizi Andrea Vedaldi Andrew Zisserman Pedro Felzenszwalb Brown University University of Oxford Brown

CSE 559A: Computer Visionayan/courses/cse559a/PDFs/lec...Let's talk about VGG-16. Winner of Imagenet 2014. Reference Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman,

Hanbyul Joo Natalia Neverova Andrea Vedaldi Facebook AI … · 2020-04-09 · Hanbyul Joo Natalia Neverova Andrea Vedaldi Facebook AI Research Abstract We propose a method for building

Attribute Recognition from Adaptive Parts arXiv:1607 ...pingtan/Papers/bmvc16.pdf · arXiv:1607.01437v2 [cs.CV] 18 Jul 2016 {Jader berg, Simonyan, Zisserman, etprotect unhbox voidb@x

arXiv:1412.7062v2 [cs.CV] 28 Feb 2015 › pdf › 1412.7062v2.pdf · (Krizhevsky et al., 2013; Sermanet et al., 2013; Simonyan & Zisserman, 2014; Szegedy et al., 2014; Work initiated

VLFeat - An open and portable library of computer vision ...vedaldi/assets/pubs/vedaldi10vlfeat.pdf · VLFeat - An open and portable library of computer vision algorithms Andrea Vedaldi

Very Deep ConvNets for Large-Scale Image Recognitionkaren/pdf/ILSVRC_2014.pdf · Very Deep ConvNets for Large-Scale Image Recognition Karen Simonyan, Andrew Zisserman Visual Geometry

EEP STRUCTURED OUTPUT LEARNING FOR ...vgg/publications/2015/...DEEP STRUCTURED OUTPUT LEARNING FOR UNCONSTRAINED TEXT RECOGNITION Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew

Speeding up Convolutional Neural Networks with Low …vedaldi/assets/pubs/... · 2016-01-05 · JADERBERG, VEDALDI, ZISSERMAN: SPEEDING UP CONVOLUTIONAL NEURAL... 1 Speeding up Convolutional

On-the-fly Specific Person Retrieval University of Oxford 24 th May 2012 Omkar M. Parkhi, Andrea Vedaldi and Andrew Zisserman

SpatialTransformerNetworks · 2018. 10. 9. · SpatialTransformerNetworks Max Jaderberg Karen Simonyan Andrew Zisserman Koray Kavukcuoglu Google DeepMind, London, UK 15210240008 贺珂珂