fast semantic proposals for image and video annotation using … · 2019-04-09 · srikar...

Srikar Muppirisetty, Manager, Machine Learning and Data Analytics, Volvo Cars

Joint work with Sohini RoyChowdary, presented at ICMLA 2018

Fast Semantic Proposals for Image and Video Annotation using Modified ESNs

Motivation

Annotated data is the "future source code" - Nvidia

Data is the new oil; mined, annotated data is the currency/gold

With DL models, massive amounts of high-quality annotated data become a necessity

Challenges with Data Annotation

Cost

• Manual annotation is expensive and time consuming

• Automatic generation of fast and accurate semantic pre-proposals for video and images.

Scalability

• Scalability of algorithms across data sets is often a challenge

• Proposed framework based on a variant of RNN: high-level feature abstraction with a very small number of image frames.

Recurrent Neural Network

• Directed cycles in the connections between units

• Useful for sequential (series) data

• In an RNN, decisions are influenced by what the network has learnt from the past

• Difficulty in training RNNs: the vanishing gradient problem

Echo State Network

• Variant of RNN

• Input layer, dynamical reservoir, output layer

• Reservoir unit: large number of sparsely and randomly connected neurons

• Inner weights are fixed and only the output layer is trained, giving a low computational cost for training

Source: https://www.mdpi.com/1996-1073/8/10/12228/htm

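$\mathbf{x}(k) = f\big(\mathbf{W}_{in}\,\mathbf{u}(k) + \mathbf{W}\,\mathbf{x}(k-1) + \mathbf{W}_{back}\,\mathbf{y}(k-1)\big)$

$\mathbf{y}(k) = f_{out}\big(\mathbf{W}_{out}\,(\mathbf{u}(k), \mathbf{x}(k))\big)$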
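The state update and readout above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes f = tanh, an identity f_out, a ridge-regression readout, teacher forcing for the feedback term, and a toy sine-wave task. The function names (`run_esn`, `train_readout`), the washout length, and all sizes are hypothetical choices made for the example.

```python
import numpy as np

def run_esn(U, Y_teacher, W_in, W, W_back, f=np.tanh):
    """Collect reservoir states x(k) for an input sequence U (T x n_in).

    Y_teacher (T x n_out) supplies y(k-1) for the feedback term
    (teacher forcing, an assumption of this sketch).
    """
    T = U.shape[0]
    x = np.zeros(W.shape[0])
    y_prev = np.zeros(W_back.shape[1])
    X = np.zeros((T, W.shape[0]))
    for k in range(T):
        # x(k) = f(W_in u(k) + W x(k-1) + W_back y(k-1))
        x = f(W_in @ U[k] + W @ x + W_back @ y_prev)
        X[k] = x
        y_prev = Y_teacher[k]
    return X

def train_readout(U, X, Y, ridge=1e-6):
    """Fit W_out on the concatenated (u(k), x(k)); only these weights are trained."""
    Z = np.hstack([U, X])                        # T x (n_in + n_res)
    A = Z.T @ Z + ridge * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ Y).T         # n_out x (n_in + n_res)

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_in, n_res, n_out, T, washout = 1, 50, 1, 300, 50
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # fixed inner weights
W_back = np.zeros((n_res, n_out))                # feedback switched off here

signal = np.sin(0.2 * np.arange(T + 1))
U, Y = signal[:-1, None], signal[1:, None]       # predict the next sample

X = run_esn(U, Y, W_in, W, W_back)
W_out = train_readout(U[washout:], X[washout:], Y[washout:])
Y_hat = np.hstack([U, X]) @ W_out.T              # y(k) with identity f_out
mse = float(np.mean((Y_hat[washout:] - Y[washout:]) ** 2))
```

Because `W_in`, `W`, and `W_back` are never updated, training reduces to a single linear solve for `W_out`, which is where the low training cost comes from.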

Tuning of ESN

Spectral radius

• Maximum absolute value of the eigenvalues of the weight matrix

• Higher spectral radius for longer memory

Reservoir size

• Number of neurons in the reservoir

• Larger reservoir offers better performance due to more non-linearity in the ESN

Connectivity

• Connectivity between different neurons in the weight matrix

• A 10-neuron network with connectivity of 0.6 has 40 zero weights (10 × 10 = 100 weights, 60 of them non-zero)
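The three tuning knobs above map directly onto how a reservoir weight matrix can be built. The sketch below is one possible construction, not the one used in the talk: `make_reservoir` is a hypothetical helper, and the uniform weight distribution is an assumption. It reproduces the slide's example of a 10-neuron network with connectivity 0.6.

```python
import numpy as np

def make_reservoir(n_neurons, connectivity, spectral_radius, seed=0):
    """Build a random reservoir matrix with a given fraction of non-zero
    weights, rescaled so its largest |eigenvalue| equals spectral_radius."""
    rng = np.random.default_rng(seed)
    n_weights = n_neurons * n_neurons
    n_nonzero = int(round(connectivity * n_weights))
    flat = np.zeros(n_weights)
    # Pick exactly n_nonzero positions to hold a random weight.
    idx = rng.choice(n_weights, size=n_nonzero, replace=False)
    flat[idx] = rng.uniform(-1.0, 1.0, size=n_nonzero)
    W = flat.reshape(n_neurons, n_neurons)
    # Spectral radius = max absolute eigenvalue of the weight matrix.
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / rho)

W = make_reservoir(n_neurons=10, connectivity=0.6, spectral_radius=0.9)
n_zero = int(np.sum(W == 0))                       # zero weights out of 100
rho = float(np.max(np.abs(np.linalg.eigvals(W))))  # achieved spectral radius
```

Rescaling by the current spectral radius is a common way to set the memory length after the sparsity pattern has been drawn.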

ESN approach for Semantic Segmentation

Input image

→ Extract some hand-crafted image planes

→ Extract input feature vector per image pixel

→ ESN model with a reservoir layer

→ Probability image predicted by ESN

→ Post processing

→ Final segmented binary image mask
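The pipeline above can be walked through end to end on a synthetic image. Everything specific in this sketch is an assumption made for illustration: the image planes (intensity and absolute gradients), the raster-order pixel sequence, the clipped linear readout standing in for the probability image, and the 0.5 threshold standing in for post-processing. It uses a plain baseline-style reservoir, not the modified ESN from the talk.

```python
import numpy as np

def image_planes(img):
    """Hypothetical hand-crafted planes: intensity and absolute gradients."""
    gy, gx = np.gradient(img)
    return np.stack([img, np.abs(gx), np.abs(gy)], axis=-1)   # H x W x 3

def pixel_features(planes):
    """One feature vector per pixel, in raster order."""
    return planes.reshape(-1, planes.shape[-1])

def reservoir_states(U, W_in, W):
    """Drive a fixed reservoir with the per-pixel feature sequence."""
    x = np.zeros(W.shape[0])
    X = np.empty((U.shape[0], W.shape[0]))
    for k, u in enumerate(U):
        x = np.tanh(W_in @ u + W @ x)
        X[k] = x
    return X

rng = np.random.default_rng(0)

def make_image(size=16):
    """Synthetic image: a bright 6x6 square on a dark noisy background."""
    img = 0.1 * rng.random((size, size))
    mask = np.zeros((size, size), dtype=bool)
    r, c = rng.integers(2, 8, size=2)
    mask[r:r + 6, c:c + 6] = True
    img[mask] += 0.8
    return img, mask

n_res = 30
W_in = rng.uniform(-0.5, 0.5, (n_res, 3))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.5 / np.max(np.abs(np.linalg.eigvals(W)))

def design(img):
    """Readout inputs: pixel features, reservoir states, and a bias column."""
    U = pixel_features(image_planes(img))
    return np.hstack([U, reservoir_states(U, W_in, W), np.ones((U.shape[0], 1))])

# Train the readout on one annotated image (ridge regression).
train_img, train_mask = make_image()
Z = design(train_img)
w_out = np.linalg.solve(Z.T @ Z + 1e-3 * np.eye(Z.shape[1]),
                        Z.T @ train_mask.ravel().astype(float))

# Predict a probability image for a new image, then threshold it.
test_img, test_mask = make_image()
prob = np.clip(design(test_img) @ w_out, 0.0, 1.0).reshape(test_img.shape)
binary_mask = prob > 0.5
accuracy = float(np.mean(binary_mask == test_mask))
```

In a real annotation setting the thresholding step would likely be replaced by more careful post-processing, and the planes by features suited to the dataset.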

Proposed ESN framework

[Figure panels: Baseline ESN, Modified ESN]

Mathematical Formulation

• Reservoir state in Baseline ESN

• Reservoir state in Modified ESN

• Output state

Data Sets

Weizmann two object dataset

• 100 color images

• Manually annotated

LISA vehicle detection dataset

• 3 video subsets (30 fps)

• Urban (300): 1 car

• Sunny (300): 3-4 cars

• Dense (1600): 4 or more cars

ADE Challenge subset data

• 20K images

• 125 images for drivable surface

• Manually annotated

Performance Metrics

• Sensitivity / Recall

• Specificity

• F-score

• Max F-score

• IoU

• AUC

• FDR

• Time
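Several of the metrics listed above follow directly from the pixel-level confusion counts of a predicted mask against its ground-truth annotation. The sketch below uses the standard definitions; the helper name `binary_mask_metrics` and the tiny example masks are illustrative assumptions. AUC, max F-score, and time are omitted because they need the probability image or timing data rather than a single binary mask.

```python
import numpy as np

def binary_mask_metrics(pred, truth):
    """Sensitivity, specificity, FDR, F-score and IoU for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))     # object pixels correctly detected
    fp = int(np.sum(pred & ~truth))    # background marked as object
    fn = int(np.sum(~pred & truth))    # object pixels missed
    tn = int(np.sum(~pred & ~truth))   # background correctly rejected

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),        # also called recall
        "specificity": ratio(tn, tn + fp),
        "fdr": ratio(fp, fp + tp),                # false discovery rate
        "f_score": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),           # intersection over union
    }

truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]], dtype=bool)
pred = np.array([[1, 0, 0, 0],
                 [1, 1, 1, 0]], dtype=bool)
m = binary_mask_metrics(pred, truth)
# Here tp=3, fp=1, fn=1, tn=3, so sensitivity=0.75 and iou=0.6.
```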

Experiments & Results

Weizmann 2-object Dataset

ADE Challenge Data Set

LISA Vehicle Detection Data Set (Video)

Summary

• A modification to existing Echo State Network (ESN) models to incorporate spatial and temporal features within an image and also across a batch of training images.

• A modified ESN framework that generates fast automated proposals (~1 second per image) for detection/segmentation tasks, using only 20-30% of a dataset for training and testing on the remaining 70-80%.
