
SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks

Xun Huang (1,2), Chengyao Shen (1), Xavier Boix (1,3), and Qi Zhao (1)

(1) Department of Electrical and Computer Engineering, National University of Singapore
(2) School of Computer Science and Engineering, Beihang University
(3) CBMM, Massachusetts Institute of Technology

Introduction

Saliency in Context (SALICON) is an ongoing effort that aims at understanding and predicting visual attention.

Motivation

Semantic Gap: Conventional saliency models typically rely on low-level image statistics to predict human fixations. While these models perform significantly better than chance, there is still a large gap between model predictions and human behavior. This gap is largely due to the models' limited ability to predict eye fixations driven by strong semantic content, the so-called semantic gap.

Contributions

This poster presents a focused study to narrow the semantic gap with an architecture based on Deep Neural Networks (DNNs). Two key components of our architecture are:

Saliency Objective: objective functions based on the saliency evaluation metrics

Extension to Multi-scale: feature integration at different image scales

Compared with 14 saliency models on 6 public eye-tracking benchmark datasets, our DNNs automatically learn features for saliency prediction that surpass the state-of-the-art by a large margin. In addition, our model ranks top under all seven metrics on the MIT300 challenge set.


[Figure: Intra-model comparison on multi-scale and fine-tuning. Stimuli and human fixation maps are shown alongside predictions of the fine, coarse, and fine+coarse models on the OSIE, FIFA, NUSEF, MIT1003, Toronto, and PASCAL-S datasets.]

Methods

Saliency Objective

Four saliency evaluation metrics are used as objective functions for back-propagation (a code sketch of two of them follows the list):

Kullback-Leibler divergence (KLD)

Normalized Scanpath Saliency (NSS)

Similarity (SIM)

Linear Correlation Coefficient (CC)
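
As a concrete illustration, here is a minimal PyTorch-style sketch (not the authors' implementation) of two of these objectives, KLD and NSS; the tensor shapes and normalization details are assumptions.

    # Sketch of two saliency objectives used for back-propagation (illustrative).
    # pred:      predicted saliency map,            shape (B, H, W)
    # fix_map:   blurred human fixation map,        shape (B, H, W)
    # fixations: binary map of fixation locations,  shape (B, H, W)
    import torch

    def kld_loss(pred, fix_map, eps=1e-8):
        # KL divergence between the two maps treated as probability distributions.
        p = pred.flatten(1)
        p = p / (p.sum(dim=1, keepdim=True) + eps)
        q = fix_map.flatten(1)
        q = q / (q.sum(dim=1, keepdim=True) + eps)
        return (q * torch.log(eps + q / (p + eps))).sum(dim=1).mean()

    def nss_loss(pred, fixations):
        # Normalized Scanpath Saliency: mean standardized saliency value at
        # fixated pixels, negated so that minimizing the loss maximizes NSS.
        p = pred.flatten(1)
        p = (p - p.mean(dim=1, keepdim=True)) / (p.std(dim=1, keepdim=True) + 1e-8)
        f = fixations.flatten(1).float()
        nss = (p * f).sum(dim=1) / (f.sum(dim=1) + 1e-8)
        return -nss.mean()

SIM and CC can be written analogously as differentiable functions of the predicted and ground-truth maps.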

Extension to Multi-scale

We use the input image at different scales, obtained by downsampling the original image. Each scale is processed by a DNN with shared weights, and the neural responses of all DNNs are concatenated to predict the saliency map (a minimal code sketch follows the architecture figure below). The effect of multi-scale is qualitatively illustrated in the intra-model comparison figure above.

[Figure: Architecture of the DNN. Input images (800x600) are resized to two scales and processed by DNNs with shared weights; the 512-channel responses are concatenated (1024 channels), linearly integrated into a saliency map, then upsampled and normalized. The saliency objectives (KLD, NSS, CC, SIM) compare the prediction against human fixation maps.]
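
For concreteness, the multi-scale arrangement can be sketched as follows in PyTorch; the backbone choice (torchvision VGG-16 features), the two-scale setup, and the softmax normalization are illustrative assumptions rather than the exact configuration on the poster.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import vgg16

    class MultiScaleSaliency(nn.Module):
        def __init__(self):
            super().__init__()
            # One backbone applied to every scale, i.e. shared weights.
            # (ImageNet-pretrained; the exact torchvision API varies by version.)
            self.backbone = vgg16(pretrained=True).features      # 512 channels out
            # Linear integration of the concatenated responses (1x1 convolution).
            self.integrate = nn.Conv2d(2 * 512, 1, kernel_size=1)

        def forward(self, fine):                                  # (B, 3, 600, 800)
            coarse = F.interpolate(fine, scale_factor=0.5,
                                   mode='bilinear', align_corners=False)
            f_fine = self.backbone(fine)                          # (B, 512, h, w)
            f_coarse = self.backbone(coarse)                      # (B, 512, h/2, w/2)
            # Bring the coarse responses to the fine resolution, then concatenate.
            f_coarse = F.interpolate(f_coarse, size=f_fine.shape[-2:],
                                     mode='bilinear', align_corners=False)
            sal = self.integrate(torch.cat([f_fine, f_coarse], dim=1))
            # Upsample to the input resolution and normalize to sum to one.
            sal = F.interpolate(sal, size=fine.shape[-2:],
                                mode='bilinear', align_corners=False)
            b, c, h, w = sal.shape
            return sal.flatten(1).softmax(dim=1).view(b, c, h, w)

In the intra-model comparison above, "fine" and "coarse" presumably correspond to using only one of the two branches, while "fine+coarse" uses the concatenated model.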

Results

Results on MIT300 Online Benchmark (in 7 evaluation metrics):

Evaluation Metric   Gauss   State-of-the-art    Ours   Infinite Humans   Relative Advance
AUC-Judd            0.78    0.84 (Deep Gaze)    0.87   0.91              42.9%
AUC-Borji           0.77    0.83 (Deep Gaze)    0.85   0.87              50.0%
sAUC                0.51    0.68 (AWS)          0.74   0.80              50.0%
NSS                 0.92    1.41 (BMS)          2.12   3.18              40.1%
CC                  0.38    0.55 (BMS)          0.74   1.00              42.2%
Similarity          0.39    0.51 (BMS)          0.60   1.00              18.4%
EMD                 4.81    3.33 (OS)           2.62   0.00              21.3%
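
A note on the last column (this reading is inferred from the table values rather than stated on the poster): the relative advance appears to measure how much of the gap between the previous state-of-the-art and the infinite-humans upper bound is closed by our model,

    Relative Advance = (Ours - State-of-the-art) / (Infinite Humans - State-of-the-art)

e.g. for AUC-Judd, (0.87 - 0.84) / (0.91 - 0.84) ≈ 42.9%; for EMD, where lower is better, the differences are taken in the opposite direction: (3.33 - 2.62) / (3.33 - 0) ≈ 21.3%.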

[Figure: Quantitative comparison (in sAUC) to the state-of-the-art models on the 6 datasets.]

[Figure: Qualitative comparison to the state-of-the-art. Columns: Stimuli, Human, Ours, BMS, AWS, AIM, RARE, LG, eDN, Judd, SigSal, CAS, CovSal, QDCT, SUN, GBVS, ITTI.]

Visualization

[Figure: Histogram of the weights of the convolutional filter.]

[Figure: Top 3 salient features and top 3 non-salient features.]

Scan the QR code to try our demo!
