composite statistical modeling in segmentation
Post on 07-Jan-2017
Composite Statistical Modeling
in Segmentation
Fuxin Li
Georgia Institute of Technology
http://www.cc.gatech.edu/~fli/
Collaborators
Joao Carreira, Cristian Sminchisescu, Guy Lebanon, Ahmad Humayun, David Tsai, James M. Rehg, Taeyoung Kim
Outline
• Composite statistical modeling in semantic segmentation
– Learning
– Inference
• Composite statistical modeling in video segmentation
– Learning
Recognizing Objects in a Scene
• Given an image, identify the category and spatial
extent of all relevant objects
– a.k.a. semantic segmentation (Shotton et al. 2006, 2008; Csurka and Perronnin 2010; Boix et al. 2010; Ladicky et al. 2010; Bourdev and Malik 2009; Bourdev et al. 2010; Xia et al. 2013; Yadollahpour et al. 2013; Z. Li et al. 2013)
[Figure: an image shown with its category labels (Horse, Person) and its object labels (Obj 1 to Obj 4).]
Semantic Segmentation
Multiple Segmentation Hypotheses
• First used in the Bonn entry (Carreira, Li, Sminchisescu) winning the PASCAL VOC Segmentation Challenge 2009
• (CVPR 2014) New algorithm RIGOR that can achieve CPMC accuracy in 2-4 seconds per image (CPU-only)
Semantic Segmentation: Learning
SVRSEGM: Regression on overlap
• Regress on maximal class-specific overlap
Overlap: $O(S, G) = |S \cap G| / |S \cup G|$ for a segment $S$ and a ground-truth object $G$
[Figure: example segments whose overlap with the Horse class (maximized over the 2 horses) is 80.8%, 36.5%, and 4.7%.]
Li, Carreira, Sminchisescu, CVPR 2010, IJCV 2012
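The regression target on this slide can be sketched in a few lines. This is a minimal illustration, assuming segments and ground-truth objects are represented as sets of pixel coordinates; the function names and the toy pixel sets are hypothetical and not taken from the talk.

```python
def overlap(segment, ground_truth):
    """Intersection over union of two pixel sets."""
    union = segment | ground_truth
    if not union:
        return 0.0
    return len(segment & ground_truth) / len(union)


def max_class_overlap(segment, objects_of_class):
    """Regression target: best overlap with any ground-truth object of one class."""
    return max((overlap(segment, gt) for gt in objects_of_class), default=0.0)


# Toy example: pixels are (row, col) tuples; two hypothetical "horse" objects.
horse_a = {(r, c) for r in range(4) for c in range(4)}      # columns 0-3
horse_b = {(r, c) for r in range(4) for c in range(6, 10)}  # columns 6-9
segment = {(r, c) for r in range(4) for c in range(2, 6)}   # columns 2-5
target = max_class_overlap(segment, [horse_a, horse_b])
```

Taking the maximum over objects means a segment is scored against the single object of the class it matches best, rather than against the union of all objects of that class.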
SVRSEGM
• 1-vs-all class-specific overlap regression on many segment hypotheses
• Heuristic sequential post-processing
Composite Statistical Modeling
• Composite Statistical Learning
– Training set: for each image, generate segments (bottom-up)
– Extract features on segments, learn models that predict segment statistics (overlap)
• Composite Statistical Inference
– Testing image: generate segments, predict segment statistics (overlap), run inference
– Recover pixel labels from prediction
Open Inference Problem
• Resolve noisy predictions on noisy segments
• Identify complicated object interactions,
especially occluded/disconnected objects
[Figure: the goal, shown as a category labeling and an object labeling of the image.]
Semantic Segmentation: Inference
Li, Carreira, Lebanon, Sminchisescu, CVPR 2013
Idea #1: Break and Recombine
• Break the segments apart and recombine them
– Initial enumerations are constrained
• e.g. continuity, boundary adherence
– Interactions among objects
• Create occlusions!
Dissecting Segments
Segment   Chair   Person
Seg #1    0.53    0.29
Seg #2    0.36    0.47
Seg #3    0.34    0.54
Seg #4    0.19    0.43
[Figure: the segments broken into superpixels 1 to 7.]
Generating the Overlap Statistic
• Parametrize on superpixels:
$\theta_{ij} |S_i|$ = num. of category $c_j$ ground-truth pixels in $S_i$
$$V_j(A_1; \theta) = \frac{|A_1 \cap GT|}{|A_1 \cup GT|} = \frac{\text{all GT pixels in } A_1}{|A_1| + \text{all GT pixels outside } A_1} = \frac{\sum_{S_i \in A_1} \theta_{ij} |S_i|}{\sum_{S_i \in A_1} |S_i| + \sum_{S_i \notin A_1} \theta_{ij} |S_i|}$$
[Figure: superpixels $S_1$ to $S_4$, each annotated with $\theta_{i1}$ (% of chair) and $\theta_{i2}$ (% of person); segment $A_1$ has predicted overlaps Chair 0.53, Person 0.29.]
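To make the generated overlap concrete, here is a minimal sketch that evaluates it for one segment and one category. It assumes the segment is given as a membership flag per superpixel; the function name and all numbers are hypothetical.

```python
def generated_overlap(in_segment, theta_j, sizes):
    """V_j(A; theta): overlap of segment A with category j implied by theta.

    in_segment[i] is True if superpixel S_i belongs to A,
    theta_j[i] is the fraction of S_i covered by category j,
    sizes[i] is |S_i| in pixels.
    """
    rows = list(zip(in_segment, theta_j, sizes))
    gt_inside = sum(t * s for a, t, s in rows if a)       # category pixels in A
    gt_outside = sum(t * s for a, t, s in rows if not a)  # category pixels outside A
    area = sum(s for a, _, s in rows if a)                # |A|
    denom = area + gt_outside
    return gt_inside / denom if denom > 0 else 0.0


# Hypothetical numbers: four superpixels, segment A_1 = {S_1, S_2}.
sizes = [100, 50, 80, 70]
theta_chair = [0.9, 0.4, 0.1, 0.0]
a1 = [True, True, False, False]
v_chair = generated_overlap(a1, theta_chair, sizes)
```

With these numbers the segment contains 110 chair pixels, has area 150, and misses 8 chair pixels, so the generated overlap is 110 / 158.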
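Idea #2: Composite Statistical Inference
• MCLE (maximum composite likelihood estimation), moment matching:
• $A_i$: segments
• $V_j$: predicted overlap with category $C_j$
$$\min_\theta \sum_{j=1}^{c} \sum_{i=1}^{m} \left( V_j(A_i; \theta) - V_j(A_i) \right)^2$$
Here $V_j(A_i; \theta)$ is the "generated" statistic and $V_j(A_i)$ is the value predicted from the regressor.
Jointly over all categories + all segments!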
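The moment-matching objective can be minimized numerically. The sketch below is one possible approach, not the solver used in the talk: it handles a single category, keeps each superpixel fraction in [0, 1] by clipping, and uses projected gradient descent with a simple backtracking step. Any coupling between categories is ignored, and all names and numbers are hypothetical.

```python
import numpy as np


def generated_overlaps(theta, membership, sizes):
    """V(A_i; theta) for every segment; membership[i, k] = 1 if S_k is in A_i."""
    mass = theta * sizes                     # category pixels per superpixel
    inside = membership @ mass               # category pixels inside A_i
    outside = (1.0 - membership) @ mass      # category pixels outside A_i
    denom = membership @ sizes + outside
    return inside / denom, inside, denom


def loss_and_grad(theta, membership, sizes, predicted):
    v, inside, denom = generated_overlaps(theta, membership, sizes)
    residual = v - predicted
    # dV_i/dtheta_k = |S_k| * (a_ik * denom_i - (1 - a_ik) * inside_i) / denom_i^2
    dv = sizes * (membership * denom[:, None]
                  - (1.0 - membership) * inside[:, None]) / denom[:, None] ** 2
    return float(residual @ residual), 2.0 * residual @ dv


def fit_theta(membership, sizes, predicted, iters=500):
    """Projected gradient descent with backtracking, keeping theta in [0, 1]."""
    theta = np.full(len(sizes), 0.5)
    loss, grad = loss_and_grad(theta, membership, sizes, predicted)
    step = 1.0
    for _ in range(iters):
        candidate = np.clip(theta - step * grad, 0.0, 1.0)
        new_loss, new_grad = loss_and_grad(candidate, membership, sizes, predicted)
        if new_loss < loss:                  # accept only steps that reduce the loss
            theta, loss, grad = candidate, new_loss, new_grad
            step *= 1.5
        else:
            step *= 0.5
    return theta, loss


# Hypothetical setup: 4 overlapping segments over 4 superpixels.
membership = np.array([[1, 1, 0, 0],
                       [0, 1, 1, 0],
                       [0, 0, 1, 1],
                       [1, 0, 0, 1]], dtype=float)
sizes = np.array([100.0, 50.0, 80.0, 70.0])
true_theta = np.array([0.9, 0.4, 0.1, 0.0])
predicted, _, _ = generated_overlaps(true_theta, membership, sizes)
initial_loss, _ = loss_and_grad(np.full(4, 0.5), membership, sizes, predicted)
theta, final_loss = fit_theta(membership, sizes, predicted)
```

Because steps are accepted only when they lower the loss, the fit never ends worse than its starting point; how close it gets to the underlying fractions depends on how many segments constrain each superpixel.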
Joint optimization
• 𝜃 map after joint optimization on all objects:
[Figure: 𝜃 maps for the Chair and Person categories.]
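Idea #3: Separating Multiple Objects from the Same Category
• MAP within each category to determine the number of objects
– Geometric prior favors fewer objects
– Posterior for each candidate count (a 𝜃 map is shown for each):
$n_{horse} = 1$   Posterior: -40.133
$n_{horse} = 2$   Posterior: -35.889
$n_{horse} = 3$   Posterior: -47.600
– $n_{horse} = 2$ has the highest posterior and is selected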
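The selection step amounts to an argmax over candidate object counts. The sketch below uses the three posterior values listed on the slide; the geometric prior helper and its parameter `p` are hypothetical, included only to show why such a prior penalizes larger counts.

```python
import math


def geometric_log_prior(n, p=0.5):
    """log P(n) for a geometric prior on n = 1, 2, ...; decreases as n grows."""
    return (n - 1) * math.log(1.0 - p) + math.log(p)


def map_object_count(posterior_scores):
    """Return the object count with the highest posterior score."""
    return max(posterior_scores, key=posterior_scores.get)


# Posterior values as listed on the slide for the horse category.
horse_scores = {1: -40.133, 2: -35.889, 3: -47.600}
best_n = map_object_count(horse_scores)
```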
Joint optimization
• 𝜃 map after joint optimization on all objects:
[Figure: 𝜃 maps for the Horse and Person categories, and the final result with objects Obj1 to Obj4.]
Results: PASCAL 2012
• CSI does especially well on high-interaction objects such as bike, person, chair, sofa, etc.
With only PASCAL training data, only overlap:
SVRSEGM   JSL     CSI
46.8%     47.0%   47.5%
+ mix of models, more data:
Xia et al. 2013   Yadollahpour et al. 2013   Li et al. 2013
48.0%             48.1%                      48.3%
[Figure: example results with labels including Person, Horse, Plant, Chair, Table, Dog, and Bike.]
PASCAL: noise-free case
• Supply ground truth overlap to different algorithms
– Upper bound performance with perfect regressor, noisy segments
• Recombination is important!
SVRSEGM   CPMC Best   CSI     Superpixel Best
79.0%     81.8%       90.2%   95.1%
[Figure: example results labeled Person, Person, Motorbike, with objects Obj 1 to Obj 3.]
Composite Statistical Modeling
• Composite Statistical Learning
– Training set: for each image, generate segments (bottom-up)
– Extract features on segments, perform regression to learn model
– Statistic: class-specific overlap
• Composite Statistical Inference (top-down)
– Testing image: generate segments, predict, run inference
– Break and recombine
Approach
• Track all segments from each frame
– Long-term appearance model for each track
– Every segment starts a track (1000+ tracks)
– Training: Use all segments, regress against overlap
with each track (0-1 segment per frame)
Video Segmentation
[Figure: example segment tracks (Track 1, Track 2, Track 3).]
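Least squares works wonders
• Ridge regression for all appearance models at once, where each row of $\mathbf{X}$ is the feature vector of one segment and each column of $\mathbf{Y}$ holds the overlaps with one track:
$$\min_{\mathbf{W}} \|\mathbf{X}\mathbf{W} - \mathbf{Y}\|_F^2 + \lambda \|\mathbf{W}\|_F^2$$
$$\mathbf{W} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{Y}$$
• Store one vector per appearance model plus a global covariance matrix $\mathbf{X}^\top \mathbf{X}$
• Enables learning and optimal online updating of 1000+ appearance models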
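The benefit of sharing one covariance matrix can be seen in a short NumPy sketch of ridge regression with many targets. It assumes the standard closed form W = (XᵀX + λI)⁻¹ XᵀY with one segment per row of X; the function name, the dimensions, and the random data are hypothetical.

```python
import numpy as np


def fit_all_models(features, targets, lam=1.0):
    """Ridge regression for many targets sharing one covariance matrix.

    features: (n_segments, d) matrix X, one row per segment.
    targets:  (n_segments, n_tracks) matrix Y of overlaps with each track.
    Returns W of shape (d, n_tracks), one weight vector per appearance model.
    """
    d = features.shape[1]
    cov = features.T @ features + lam * np.eye(d)  # shared by every model
    xty = features.T @ targets                     # one column per model
    return np.linalg.solve(cov, xty)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # hypothetical segment features
Y = rng.uniform(size=(200, 1000))    # hypothetical overlaps with 1000 tracks
W = fit_all_models(X, Y, lam=0.1)
```

Only the d-by-d matrix is factorized, so adding another appearance model costs one extra column in XᵀY rather than a new regression problem.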
How to use that in video?
• Always use the whole segment pool to train
– If we go from 1st – 20th frame, our training set is
always all the segments in all the frames, for ANY
target
– Online update: At each frame, add all segments from
the frame to 𝐗⊤𝐗 and 𝐗⊤𝒚
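The online update described above can be sketched as a running sum of the two sufficient statistics. This is an illustrative helper, assuming a fixed set of targets and the ridge closed form W = (XᵀX + λI)⁻¹ XᵀY; the class name `OnlineRidge` and the random frames are hypothetical.

```python
import numpy as np


class OnlineRidge:
    """Accumulates X^T X and X^T Y frame by frame (hypothetical helper)."""

    def __init__(self, dim, n_targets, lam=1.0):
        self.xtx = np.zeros((dim, dim))
        self.xty = np.zeros((dim, n_targets))
        self.lam = lam

    def add_frame(self, features, targets):
        """Add all segments of one frame: features (n, dim), targets (n, n_targets)."""
        self.xtx += features.T @ features
        self.xty += features.T @ targets

    def weights(self):
        dim = self.xtx.shape[0]
        return np.linalg.solve(self.xtx + self.lam * np.eye(dim), self.xty)


rng = np.random.default_rng(1)
frames = [(rng.normal(size=(30, 5)), rng.uniform(size=(30, 3))) for _ in range(4)]
model = OnlineRidge(dim=5, n_targets=3, lam=0.1)
for feats, tgts in frames:
    model.add_frame(feats, tgts)

# Batch solution on the stacked frames, for comparison.
X = np.vstack([f for f, _ in frames])
Y = np.vstack([t for _, t in frames])
W_batch = np.linalg.solve(X.T @ X + 0.1 * np.eye(5), X.T @ Y)
```

The accumulated solution matches the batch solution on all frames seen so far, which is why the update can be called optimal: no earlier segment has to be revisited.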
Greedy Trimming of Tracks
• Test on the next frame
– Obtain the regression result of every segment against
every track
– Choose best-scoring segment to match
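The matching step can be sketched as a per-track argmax over the regression scores. The slide does not spell out the criterion used to trim tracks, so this sketch covers only the matching; the function name and the scores are hypothetical.

```python
def match_tracks(scores):
    """For each track, pick the index of its best-scoring segment.

    scores[t][s] is the regression output of segment s under track t's model.
    """
    matches = {}
    for track_id, row in enumerate(scores):
        best_segment = max(range(len(row)), key=row.__getitem__)
        matches[track_id] = (best_segment, row[best_segment])
    return matches


# Hypothetical scores: 2 tracks, 3 candidate segments in the next frame.
scores = [[0.10, 0.72, 0.30],
          [0.55, 0.20, 0.05]]
matches = match_tracks(scores)
```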
Results
• Automatically reduce number of tracks from
1200 (CPMC) to 60 per sequence
Numbers
• We beat the closest competitor by 14%
• CSI refinement improves results by 3%
• Purely automatic, no user input
Method                  Mean per object   Mean per sequence   Avg. number of tracks
SPT                     62.7              68.0                60.0
SPT + CSI               65.9              71.2                60.0
Pairwise                55.4              58.6                702.8
Kim et al. 2011         45.3              57.3                10.6
Grundmann et al. 2010   51.8              50.8                336.6
Oracle Segment          78.6              81.5                1219.3
Conclusion
• Composite statistical modeling
– Holistic segments, object-scale models
– Training is a breeze (regression)
– Least squares offer additional benefits
– Breaking down and recombining segments for refinement
– Refinement will be needed when we are going from 85% to 90%
• Or when inferring higher-level semantics, occlusion, etc.