fast semantic proposals for image and video annotation using … · 2019-04-09 · srikar...
TRANSCRIPT
Srikar Muppirisetty, Manager, Machine Learning and Data Analytics, Volvo Cars
Joint work with Sohini RoyChowdary, presented at ICMLA 2018
Fast Semantic Proposals for Image and Video Annotation using Modified ESNs
Motivation
Annotated data is the "future source code" (Nvidia)
Data is the new oil; mined, annotated data is the currency (gold)
With deep learning (DL) models, massive amounts of high-quality annotated data become a necessity
Challenges with Data Annotation
Cost
• Manual annotation is expensive and time-consuming
• Automatic generation of fast and accurate semantic pre-proposals for images and video
Scalability
• Scaling algorithms across data sets is often a challenge
• Proposed framework: a variant of an RNN that performs high-level feature abstraction from a very small number of image frames
Recurrent Neural Network
• Connections between units form a directed cycle
• Useful for sequential (series) data
• In an RNN, decisions are influenced by what the network has learned from the past
• Training RNNs is difficult because of the vanishing gradient problem
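To make the vanishing-gradient point concrete, here is a minimal sketch for a hypothetical scalar RNN h_k = tanh(w·h_{k-1} + x_k). Each step multiplies the gradient ∂h_T/∂h_0 by w·(1 − h_k²), whose magnitude is at most |w|, so for |w| < 1 the gradient shrinks at least geometrically with sequence length. The weight, inputs, and function name are illustrative assumptions, not values from the talk.

```python
import math

def recurrent_gradient(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_k = tanh(w * h_{k-1} + x_k)."""
    h = h0
    grad = 1.0  # running product of the per-step derivatives
    for x in inputs:
        h = math.tanh(w * h + x)
        # d h_k / d h_{k-1} = w * (1 - h_k^2), since tanh'(z) = 1 - tanh(z)^2
        grad *= w * (1.0 - h * h)
    return grad

for steps in (1, 10, 50):
    g = recurrent_gradient(0.9, [0.5] * steps)
    print(f"{steps:>3} steps: |dh_T/dh_0| = {abs(g):.2e}")
```

The printed magnitudes drop quickly with the number of steps, which is why plain RNNs struggle to learn long-range dependencies by gradient descent.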
Echo State Network
• Variant of an RNN
• Input layer, dynamical reservoir, output layer
• Reservoir: a large number of sparsely and randomly connected neurons
• Inner weights are fixed and only the output layer is trained, so training has a low computational cost
Source: https://www.mdpi.com/1996-1073/8/10/12228/htm
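$$\mathbf{x}(k) = f\big(\mathbf{W}_{in}\,\mathbf{u}(k) + \mathbf{W}\,\mathbf{x}(k-1) + \mathbf{W}_{back}\,\mathbf{y}(k-1)\big)$$
$$\mathbf{y}(k) = f_{out}\big(\mathbf{W}_{out}\,[\mathbf{u}(k); \mathbf{x}(k)]\big)$$
where $[\mathbf{u}(k); \mathbf{x}(k)]$ denotes the concatenation of the input and the reservoir state.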
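The equations above can be illustrated with a minimal, stdlib-only sketch of a baseline ESN. It assumes f = tanh, an identity f_out, no output feedback (W_back = 0), an added bias term in the readout, and a ridge-regression readout on a toy one-step-ahead sine prediction task. All function names and parameter values are hypothetical; this is not the authors' modified ESN. Note that only w_out is fitted, while W and w_in stay fixed after random initialisation.

```python
import math
import random

def make_reservoir(n, connectivity, max_row_sum, rng):
    """Sparse random reservoir W rescaled so its largest absolute row sum is max_row_sum.

    The max row sum (infinity norm) upper-bounds the spectral radius, and with
    tanh units a value below 1 makes the state update a contraction.
    """
    W = [[rng.uniform(-1.0, 1.0) if rng.random() < connectivity else 0.0
          for _ in range(n)] for _ in range(n)]
    largest = max(sum(abs(v) for v in row) for row in W) or 1.0
    return [[v * max_row_sum / largest for v in row] for row in W]

def run_reservoir(W, w_in, inputs):
    """x(k) = tanh(w_in * u(k) + W x(k-1)); the W_back y(k-1) term is omitted."""
    n = len(W)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [math.tanh(w_in[i] * u + sum(W[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        states.append(x)
    return states

def solve(A, b):
    """Gaussian elimination with partial pivoting for A z = b."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][k] * z[k] for k in range(r + 1, m))) / M[r][r]
    return z

def train_readout(features, targets, ridge):
    """Ridge regression: w_out = (X^T X + ridge * I)^-1 X^T y."""
    d = len(features[0])
    A = [[sum(f[i] * f[j] for f in features) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(d)]
    return solve(A, b)

rng = random.Random(0)
n = 30
W = make_reservoir(n, connectivity=0.2, max_row_sum=0.9, rng=rng)
w_in = [rng.uniform(-0.5, 0.5) for _ in range(n)]

signal = [math.sin(0.2 * k) for k in range(301)]
inputs, targets = signal[:-1], signal[1:]   # predict u(k + 1) from history up to u(k)
states = run_reservoir(W, w_in, inputs)

washout = 50                                 # drop the initial transient
feats = [[1.0, u] + x for u, x in zip(inputs[washout:], states[washout:])]  # [bias; u(k); x(k)]
w_out = train_readout(feats, targets[washout:], ridge=1e-4)
preds = [sum(a * b for a, b in zip(w_out, f)) for f in feats]
mse = sum((p - t) ** 2 for p, t in zip(preds, targets[washout:])) / len(preds)
print(f"training MSE: {mse:.2e}")
```

Because training reduces to one linear solve over the collected states, its cost is small compared with backpropagation through time, which is the point made in the last bullet above.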
Tuning an ESN
Spectral radius
• Maximum absolute value of the eigenvalues of the reservoir weight matrix
• A higher spectral radius gives a longer memory
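A common way to set the spectral radius is to rescale a random reservoir matrix by target / ρ(W). The usual route is an eigenvalue routine such as numpy.linalg.eigvals; as a stdlib-only substitute, this sketch estimates ρ(W) with Gelfand's formula ρ(W) = lim ||W^k||^(1/k), using k = 2^10 obtained by repeated squaring with renormalisation so large radii do not overflow. The function names and the number of squarings are assumptions for illustration. The estimate approaches ρ from above and, unlike plain power iteration, also handles complex dominant eigenvalues.

```python
import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def frobenius(A):
    return math.sqrt(sum(v * v for row in A for v in row))

def spectral_radius(W, squarings=10):
    """Estimate rho(W) = lim ||W^k||^(1/k) with k = 2**squarings (Gelfand's formula)."""
    norm = frobenius(W)
    if norm == 0.0:
        return 0.0
    M = [[v / norm for v in row] for row in W]
    log_scale = math.log(norm)          # invariant: W^(2^i) = exp(log_scale) * M
    for _ in range(squarings):
        M = matmul(M, M)
        norm = frobenius(M)
        if norm == 0.0:                 # nilpotent matrix: every eigenvalue is zero
            return 0.0
        M = [[v / norm for v in row] for row in M]
        log_scale = 2.0 * log_scale + math.log(norm)
    return math.exp(log_scale / 2 ** squarings)

def scale_to_spectral_radius(W, target):
    """Rescale W so that its (estimated) spectral radius equals target."""
    rho = spectral_radius(W)
    return [[v * target / rho for v in row] for row in W]

rng = random.Random(1)
W = [[rng.uniform(-1, 1) if rng.random() < 0.3 else 0.0 for _ in range(20)] for _ in range(20)]
W_scaled = scale_to_spectral_radius(W, 0.9)
print(round(spectral_radius(W_scaled), 6))  # → 0.9
```

Choosing a target closer to 1 lengthens the reservoir's memory of past inputs, in line with the bullet above; targets well above 1 usually destroy the echo state property.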
Reservoir size
• Number of neurons in the reservoir
• A larger reservoir offers better performance due to more non-linearity in the ESN
Connectivity
• Fraction of nonzero connections between neurons in the reservoir weight matrix
• E.g., a 10-neuron network has 10 × 10 = 100 weights; with a connectivity of 0.6, 40 of them are zero
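Reservoir size and connectivity together fix how many weights are nonzero: an n-neuron reservoir has n² entries, of which about connectivity · n² are nonzero. This sketch (hypothetical function name, uniform weights in [-1, 1] assumed) places exactly round(connectivity · n²) nonzero weights at random positions, reproducing the 10-neuron, 0.6-connectivity example above.

```python
import random

def sparse_reservoir(n, connectivity, rng):
    """n x n reservoir whose fraction of nonzero weights equals connectivity (rounded)."""
    total = n * n
    nonzero = round(connectivity * total)
    positions = rng.sample(range(total), nonzero)   # distinct flat indices
    W = [[0.0] * n for _ in range(n)]
    for p in positions:
        W[p // n][p % n] = rng.uniform(-1.0, 1.0)
    return W

W = sparse_reservoir(10, 0.6, random.Random(0))
zeros = sum(v == 0.0 for row in W for v in row)
print(zeros)  # → 40
```

Fixing the number of nonzeros exactly (rather than drawing each entry independently) keeps the stated connectivity exact, which makes comparisons across reservoir sizes easier.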
ESN approach for Semantic Segmentation
Input image → hand-crafted image planes → input feature vector per image pixel → ESN model with a reservoir layer → probability image predicted by the ESN → post-processing → final segmented binary image mask
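The pipeline above can be sketched end to end with the ESN stage left as a pluggable function. Everything specific here is an assumption: the feature planes (R, G, B and BT.601 luma), raster pixel order, and post-processing by a fixed 0.5 threshold are illustrative choices, not the talk's hand-crafted planes or post-processing, and red_channel is a hypothetical stand-in for a trained ESN.

```python
def pixel_features(image):
    """Per-pixel feature vectors from hypothetical planes: R, G, B and luma."""
    feats = []
    for row in image:
        for r, g, b in row:
            luma = 0.299 * r + 0.587 * g + 0.114 * b   # ITU-R BT.601 weights
            feats.append([r, g, b, luma])
    return feats

def segment(image, predict_proba, threshold=0.5):
    """Features -> per-pixel probabilities -> probability image -> binary mask."""
    h, w = len(image), len(image[0])
    probs = predict_proba(pixel_features(image))        # one probability per pixel, raster order
    prob_image = [probs[i * w:(i + 1) * w] for i in range(h)]
    mask = [[1 if p >= threshold else 0 for p in row] for row in prob_image]
    return prob_image, mask

def red_channel(feats):
    """Stand-in model (not an ESN): probability = red intensity."""
    return [f[0] for f in feats]

image = [[(0.9, 0.1, 0.1), (0.2, 0.2, 0.2)],
         [(0.1, 0.1, 0.1), (0.8, 0.3, 0.2)]]
prob_image, mask = segment(image, red_channel)
print(mask)  # → [[1, 0], [0, 1]]
```

Keeping the model behind a single function makes it easy to swap the stand-in for a trained baseline or modified ESN without touching feature extraction or post-processing.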
Proposed ESN framework
[Figure: Baseline ESN vs. Modified ESN]
Mathematical Formulation
• Reservoir state in baseline ESN
• Reservoir state in modified ESN
• Output state
Data Sets
Weizmann two-object dataset
• 100 color images
• Manually annotated
LISA vehicle detection dataset
• 3 video subsets (30 fps)
• Urban (300 frames): 1 car
• Sunny (300 frames): 3-4 cars
• Dense (1600 frames): 4 or more cars
ADE Challenge data (subset)
• 20K images
• 125 images for the drivable surface
• Manually annotated
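The summary reports training on only 20-30% of each data set and testing on the rest. This sketch shows the resulting split sizes; the split strategy is an assumption (a chronological split for video frames, a seeded shuffle for still images), and split_indices is a hypothetical helper.

```python
import random

def split_indices(n_items, train_fraction, shuffle=False, seed=0):
    """Return (train, test) index lists; chronological unless shuffle=True."""
    idx = list(range(n_items))
    if shuffle:
        random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * n_items)
    return idx[:n_train], idx[n_train:]

for name, size in [("LISA Urban", 300), ("LISA Sunny", 300), ("LISA Dense", 1600)]:
    train, test = split_indices(size, 0.2)
    print(f"{name}: {len(train)} train / {len(test)} test frames")
```

A chronological split avoids training on frames that are nearly identical to neighbouring test frames, which a random split of 30 fps video would allow.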
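Performance Metrics
• Sensitivity (recall)
• Specificity
• FDR (false discovery rate)
• F-score and maximum F-score (MaxF)
• IOU (intersection over union)
• AUC (area under the curve)
• Time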
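Most of the metrics listed above follow from the pixel-level confusion counts between a predicted mask and the ground truth. The sketch below uses the standard definitions; treating masks as flattened 0/1 lists, returning 0 for empty denominators, and computing MaxF over a caller-supplied threshold list are assumptions, and AUC and timing are not covered.

```python
def confusion(pred, truth):
    """Pixel-level TP, FP, TN, FN for flattened binary masks."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def safe_div(a, b):
    return a / b if b else 0.0

def mask_metrics(pred, truth):
    tp, fp, tn, fn = confusion(pred, truth)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)                    # sensitivity
    return {
        "sensitivity": recall,
        "specificity": safe_div(tn, tn + fp),
        "fdr": safe_div(fp, tp + fp),                 # 1 - precision when tp + fp > 0
        "f_score": safe_div(2 * precision * recall, precision + recall),
        "iou": safe_div(tp, tp + fp + fn),
    }

def max_f_score(probs, truth, thresholds):
    """Best F-score over thresholds applied to a probability image."""
    return max(mask_metrics([p >= th for p in probs], truth)["f_score"] for th in thresholds)

truth = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
print(mask_metrics(pred, truth))
print(max_f_score([0.9, 0.4, 0.2, 0.6, 0.8, 0.1], truth, [0.3, 0.5, 0.7]))
```

MaxF is useful when the final threshold in post-processing is not yet fixed, because it summarises the best operating point of the probability image.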
Experiments & Results
Weizmann 2-object Dataset
ADE Challenge Data Set
LISA Vehicle Detection Data Set (Video)
Summary
• A modification to existing echo state network (ESN) models that incorporates spatial and temporal features within an image and across a batch of training images.
• A modified ESN framework that generates fast automated proposals (~1 second per image) for detection/segmentation tasks, using only 20-30% of a data set for training and testing on the remaining 70-80%.