Discriminative Models for Multi-Class Object Layout
Chaitanya Desai, Deva Ramanan, Charles Fowlkes
Presented by:
Vignesh Ramanathan, Vivardhan Kanoria, Kevin Truong
Introduction: Why Another Object Detector?
Issues with existing detectors:
Binary 0-1 classification of each image window for each object class, independent of the rest of the image and of the other objects present in it
Heuristic post-processing is needed to improve detector performance on datasets, e.g. non-maximal suppression
Interactions between Objects 1. Activation
Intra Class – Textures of Objects
[17] Y. Liu, W. Lin, and J. Hays. Near-regular texture analysis and manipulation. ACM Transactions on Graphics, 23(3):368–376, 2004
Between Class – Spatial Cueing
Interactions between Objects 2. Inhibition
Intra Class – Non Maximal Suppression
Between Class – Mutual Exclusion
Interactions between Objects 3. Global Properties
Between Class – Co-occurrence
At most 1 biker per bike
Intra Class – Total Counts
At most 1 Sydney Opera House
Summary of Spatial Interactions Modeled

             | Within Class             | Between Class
Activation   | Textures of Objects      | Spatial Cueing
Inhibition   | Non-Maximal Suppression  | Mutual Exclusion
Global       | Expected Counts          | Co-occurrence
Contributions of Multi-Class Object Layout
The object layout framework formulates detection as a structured prediction task for an entire image rather than a binary classification task on sub-windows
The model learns all of the listed spatial interactions, in addition to learning local appearance statistics
Problem Formulation
The objective is to train a model that detects multiple classes of objects in test images, given training images with annotated bounding boxes for each class
[Diagram: annotated training images → learning → model parameters; test image + model parameters → inference]
Model Formulation
Suppose we wish to model K different object classes. Construct an image pyramid and let M be the total number of sub-windows. An image X is represented by a set of features, one per sub-window: X = {x_i : i = 1…M}. The vector of object labels is Y = {y_i : i = 1…M}, with y_i ∈ {0…K}, where 0 denotes background.
Example: x_i = HOG features, y_i = 3 (human)
Task: Model should predict all labels Y, given an image X
Spatial Interaction Model
The spatial configuration of a window j with respect to a window i is encoded as a 7-dimensional sparse binary vector:

d_ij = [Near? Far? Above? On-top? Below? Next-to? >50% Overlap?]^T

Example: d_i1 = [1 0 0 0 0 0 0]^T (window 1 is near window i); d_i2 = [1 0 0 0 0 1 0]^T (window 2 is near and next to window i)

The first 6 components depend only on the relative location of the center of window j with respect to window i.
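As a minimal sketch of how such a relation vector might be computed, the Python function below encodes the seven components from two bounding boxes. The specific thresholds (the "near" radius, the center-offset tolerance) are illustrative assumptions, not the paper's definitions, which set them relative to window size:

```python
import numpy as np

def spatial_relation(box_i, box_j, near_frac=1.5):
    """Encode window j's layout relative to window i as a sparse 7-dim
    binary vector: [near, far, above, on-top, below, next-to, >50% overlap].
    Boxes are (x1, y1, x2, y2); thresholds are illustrative assumptions."""
    xi1, yi1, xi2, yi2 = box_i
    xj1, yj1, xj2, yj2 = box_j
    wi, hi = xi2 - xi1, yi2 - yi1
    cxi, cyi = (xi1 + xi2) / 2, (yi1 + yi2) / 2
    cxj, cyj = (xj1 + xj2) / 2, (yj1 + yj2) / 2

    d = np.zeros(7)
    # components 0-5 depend only on the relative center location
    dist = np.hypot(cxj - cxi, cyj - cyi)
    if dist <= near_frac * max(wi, hi):
        d[0] = 1                        # near
    else:
        d[1] = 1                        # far
    dy, dx = cyj - cyi, abs(cxj - cxi)
    if abs(dy) <= 0.25 * hi and dx <= 0.25 * wi:
        d[3] = 1                        # on top (centers nearly coincide)
    elif dy < 0:
        d[2] = 1                        # above (image y grows downward)
    elif dy > 0:
        d[4] = 1                        # below
    if abs(dy) <= 0.25 * hi and dx > 0.25 * wi:
        d[5] = 1                        # next to
    # component 6: intersection covers more than 50% of window i's area
    ix = max(0, min(xi2, xj2) - max(xi1, xj1))
    iy = max(0, min(yi2, yj2) - max(yi1, yj1))
    if ix * iy > 0.5 * wi * hi:
        d[6] = 1
    return d
```

For instance, an identical box comes out near, on-top, and overlapping, while a distant box at the same height comes out far and next-to.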
Model Parameters
The score of labeling an image X with labels Y is:

S(X, Y) = Σ_{i,j} ω_{y_i,y_j}^T d_ij + Σ_i ω_{y_i}^T x_i

where ω_{a,b} and ω_c are model parameters; the first sum runs over all pairs of windows, the second over all windows.

ω_{a,b} captures spatial interactions between object classes a and b: a 7 × 1 vector for each (a, b) ∈ {0…K} × {0…K}
ω_c captures the local appearance of object class c: the size of the feature x_i (HOG, etc.), for each c ∈ {0…K}

Append a 1 to each x_i so that per-class biases are learned
Fix the background parameters to zero: ω_0 = 0 and ω_{0,c} = ω_{c,0} = 0
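The score above can be sketched directly in Python. This is a naive reference implementation under assumed array shapes (the names and shapes are my choices, not the paper's):

```python
import numpy as np

def score(X, Y, w_pair, w_app, d):
    """S(X, Y) = sum_{i,j} w_{y_i,y_j}^T d_ij + sum_i w_{y_i}^T x_i.

    Assumed shapes:
      X: (M, F) per-window features (bias 1 already appended);
      Y: length-M label sequence, values in 0..K (0 = background);
      w_pair: (K+1, K+1, 7) spatial weights; w_app: (K+1, F) appearance weights;
      d: (M, M, 7) precomputed spatial-relation vectors d_ij.
    """
    M = len(Y)
    s = sum(float(w_app[Y[i]] @ X[i]) for i in range(M))    # local appearance
    s += sum(float(w_pair[Y[i], Y[j]] @ d[i, j])            # pairwise layout
             for i in range(M) for j in range(M) if i != j)
    return s
```

With the background rows of w_app and w_pair set to zero, labeling every window 0 scores exactly 0, which is what makes the greedy search's all-background initialization a valid starting point.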
Inference: NP Hard
To get the desired detection, we need to compute:

arg max_Y S(X, Y) = arg max_Y Σ_{i,j} ω_{y_i,y_j}^T d_ij + Σ_i ω_{y_i}^T x_i

i.e. find the labeling Y that maximizes the score S for image X, given the learnt model parameters ω.
There are (K + 1)^M possible values for Y.
This is NP hard.
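To make the (K + 1)^M search space concrete, here is a brute-force maximizer, feasible only for tiny M. It is a sketch under assumed array shapes, of the kind the authors used as a ground-truth baseline on small problems:

```python
import itertools
import numpy as np

def brute_force(X, d, w_pair, w_app):
    """Enumerate all (K+1)^M labelings and return the highest-scoring one.
    Assumed shapes: X (M, F); d (M, M, 7);
    w_pair (K+1, K+1, 7); w_app (K+1, F)."""
    M, K1 = X.shape[0], w_app.shape[0]

    def S(Y):
        s = sum(float(w_app[Y[i]] @ X[i]) for i in range(M))
        s += sum(float(w_pair[Y[i], Y[j]] @ d[i, j])
                 for i in range(M) for j in range(M) if i != j)
        return s

    # itertools.product yields every labeling; max picks the best by score
    return max(itertools.product(range(K1), repeat=M), key=S)
```

Even at K = 20 classes and M = 1000 windows this is 21^1000 labelings, hence the need for the approximate search below.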
Inference: Greedy Forward Search Algorithm
1. Initialize all labels to 0 (i.e. background)
2. Repeatedly change the label of window 𝑖 to class 𝑐, where:
𝑖, 𝑐 is the window-class pair that maximizes the increase in score S(X,Y)
3. Stop when all windows have been instanced or when step 2 would decrease the score
Effectiveness was tested on small-scale problems where the brute-force solution could be computed
The score of the greedy forward search was found to be quite close to that of the exact solution
The two solutions typically differed in the labels of only 1-3 windows
Greedy Forward Search: Details

Initialize: I = ∅ (set of instanced window-class pairs); S = 0; Δ(i, c) = ω_c^T x_i (change in score from instancing window i as class c)
Repeat:
1. (i*, c*) = arg max_{(i,c) ∉ I} Δ(i, c)
2. I = I ∪ {(i*, c*)}
3. S = S + Δ(i*, c*)
4. Δ(i, c) = Δ(i, c) + ω_{c*,c}^T d_{i*,i} + ω_{c,c*}^T d_{i,i*}
Stop when: Δ(i*, c*) < 0 or all windows have been instanced
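The four steps above can be sketched as follows. This is a straightforward O(M²·K) sketch under assumed array shapes, without the priority-queue bookkeeping a real implementation would use:

```python
import numpy as np

def greedy_forward_search(X, d, w_pair, w_app):
    """Greedy inference for arg max_Y S(X, Y).

    Assumed shapes: X (M, F) window features; d (M, M, 7) spatial-relation
    vectors; w_pair (K+1, K+1, 7) spatial weights; w_app (K+1, F) appearance
    weights, with row 0 (background) all-zero.
    """
    M, K1 = X.shape[0], w_app.shape[0]
    labels = np.zeros(M, dtype=int)      # start with everything background
    instanced = set()
    delta = X @ w_app.T                  # delta[i, c] = w_c . x_i
    total = 0.0
    while len(instanced) < M:
        # 1. pick the (window, class) pair with the largest score increase
        best = max(((i, c) for i in range(M) if i not in instanced
                    for c in range(1, K1)), key=lambda ic: delta[ic])
        if delta[best] < 0:              # stop: any further change hurts
            break
        i_s, c_s = best
        labels[i_s] = c_s                # 2. instance the pair
        instanced.add(i_s)
        total += delta[best]             # 3. accumulate the score
        # 4. fold the new instance's pairwise terms into remaining deltas
        for i in range(M):
            if i not in instanced:
                for c in range(K1):
                    delta[i, c] += (w_pair[c_s, c] @ d[i_s, i]
                                    + w_pair[c, c_s] @ d[i, i_s])
    return labels, total
```

Because background weights are zero, leaving a window uninstanced contributes nothing, so the search only ever considers promoting windows to foreground classes.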
CRF Formulation - Scoring

Model P(Y|X) as a CRF with pairwise potentials between Y and each X, exponential in S(X, Y), i.e.:

P(Y|X) = (1/Z(X)) e^{S(X,Y)}

A natural choice for scoring each detection is the log odds ratio between the probability of detecting class c versus detecting any other class:

m(y_i = c) = log [ P(y_i = c | X) / P(y_i ≠ c | X) ] = log [ Σ_{y_r} P(y_i = c, y_r | X) / Σ_{y_s, c' ≠ c} P(y_i = c', y_s | X) ]

where y_r and y_s range over the labels of the remaining windows. Assume that both marginals are dominated by their largest terms. These are given by:

r* = arg max_r S(X, y_i = c, y_r)
(s*, c*) = arg max_{s, c' ≠ c} S(X, y_i = c', y_s)

Then the log odds ratio is given by:

m(y_i = c) ≈ log [ P(y_i = c, y_{r*} | X) / P(y_i = c*, y_{s*} | X) ] = S(X, y_i = c, y_{r*}) − S(X, y_i = c*, y_{s*})
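The max-term approximation reduces the log-odds to a difference of two best scores. The sketch below computes it by exhaustive maximization (so it only runs for tiny M); names and shapes are my own assumptions:

```python
import itertools
import numpy as np

def detection_score(i, c, X, d, w_pair, w_app):
    """m(y_i = c) under the max-term approximation: the best achievable
    score with y_i = c, minus the best achievable score with y_i != c.
    Exhaustive maximization, illustrative only. Assumed shapes:
    X (M, F); d (M, M, 7); w_pair (K+1, K+1, 7); w_app (K+1, F)."""
    M, K1 = X.shape[0], w_app.shape[0]

    def S(Y):
        s = sum(float(w_app[Y[k]] @ X[k]) for k in range(M))
        s += sum(float(w_pair[Y[a], Y[b]] @ d[a, b])
                 for a in range(M) for b in range(M) if a != b)
        return s

    labelings = list(itertools.product(range(K1), repeat=M))
    best_with = max(S(Y) for Y in labelings if Y[i] == c)     # y_i = c
    best_without = max(S(Y) for Y in labelings if Y[i] != c)  # y_i = c' != c
    return best_with - best_without
```

Note the Z(X) of the CRF cancels in the ratio, which is why the final expression needs only score differences, not the intractable partition function.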