Holistic Scene Understanding via Multiple Structured Hypotheses from Perception Modules
Gordon Christie1*, Ankit Laddha2*, Aishwarya Agrawal1, Stanislaw Antol1, Yash Goyal1, Dhruv Batra1
1Virginia Tech, 2CMU (*equal contribution)
Overview
Motivation
• Perception problems are hard
Goal
• Holistic Scene Understanding (inputs from multiple modules)
Challenges
• Inaccurate models
• Search space explosion
Experiment 1: Captioned Scene Understanding
Experiment 2: Indoor Scene Understanding
Proposed Solution
[Figure: three perception modules (Semantic Segmentation, Sentence Parsing, Support Estimation) with outputs denoted y, z, and a. Each module's output space (all possible segmentations, all possible sentence parsings, all possible support estimations) is exponentially large, so Score(y), Score(z), and Score(a) are each replaced by a delta approximation. The resulting hypotheses are connected in a factor graph with single-hypothesis factors S(y_i), S(z_j), S(a_k), pairwise factors C(y_i, z_j), C(z_j, a_k), C(y_i, a_k), and a joint factor C(y_i, z_j, a_k).]
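The factor names in the figure suggest how a tuple of hypotheses might be scored. The poster does not state the functional form, so the following is only a sketch that assumes the factors combine additively, with $y_i$, $z_j$, $a_k$ denoting the $i$-th, $j$-th, and $k$-th hypotheses of the three modules and $M$ hypotheses per module:

```latex
\begin{aligned}
\mathrm{Score}(y_i, z_j, a_k) ={}& S(y_i) + S(z_j) + S(a_k) \\
 &+ C(y_i, z_j) + C(z_j, a_k) + C(y_i, a_k) + C(y_i, z_j, a_k), \\
(\hat{\imath}, \hat{\jmath}, \hat{k}) ={}& \operatorname*{arg\,max}_{i, j, k \in \{1, \dots, M\}} \mathrm{Score}(y_i, z_j, a_k).
\end{aligned}
```

Under this reading, the S terms carry each module's own score and the C terms measure agreement between hypotheses. Restricting every module to M hypotheses shrinks the search from the full product of output spaces to M^3 tuples.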
[Figure: Hypothesis #1, Hypothesis #2, ..., Hypothesis #M from each module are combined (X) and a consistent tuple is selected. Three panels:
Exp1: Semantic Segmentation + Sentence Parsing, with the caption “A dog is standing next to a woman on a couch” (labels: Couch, Person, Dog).
Exp2: Semantic Segmentation + 3D Support Estimation (labels: Wall, Table, Sofa, Chair, Television, Other Structure, Other Prop).
Exp3: 2D Semantic Segmentation + 3D Semantic Segmentation (labels: Road, Building, Sky, Person, Car, Sidewalk, Curb, Sign, Mark, Tree/bush).]
Module 1: Semantic Segmentation (SS)
Module 2: Prepositional Phrase Attachment Resolution (PPAR)
Datasets: ABSTRACT-50S / PASCAL-50S / NYUv2
Features: Module, Consistent Preposition and Presence
Approach
• Extract diverse hypotheses from multiple modules [1]
• Jointly reason about hypotheses
  o Develop “Mediator” model (factor graph)
  o Infer consistency
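The first step relies on DivMBest [1], which repeatedly re-solves a model while penalizing similarity to the solutions already found. The following is a toy sketch of that idea only, assuming a model with per-variable (unary) scores and a Hamming-style diversity penalty. The function name `div_m_best`, the weight `lam`, and the scores are illustrative and not taken from the poster; the actual modules use structured models with their own solvers.

```python
from typing import List


def div_m_best(unary: List[List[float]], m: int, lam: float) -> List[List[int]]:
    """Return m labelings for a toy model that has unary scores only.

    unary[v][l] is the score of giving variable v label l (higher is better).
    After each round, the labels just chosen are penalized by lam, so the
    next round is pushed toward a different labeling.
    """
    penalty = [[0.0] * len(row) for row in unary]
    solutions: List[List[int]] = []
    for _ in range(m):
        labeling = []
        for v, row in enumerate(unary):
            # Diversity-augmented score for every label of variable v.
            augmented = [s - lam * penalty[v][l] for l, s in enumerate(row)]
            labeling.append(max(range(len(row)), key=augmented.__getitem__))
        solutions.append(labeling)
        for v, l in enumerate(labeling):
            penalty[v][l] += 1.0
    return solutions


# Hypothetical scores for 3 variables (e.g. pixels) and 2 labels.
toy_unary = [[2.0, 1.5], [0.2, 1.0], [3.0, 0.1]]
hypotheses = div_m_best(toy_unary, m=2, lam=1.0)
print(hypotheses)
```

The first labeling is the ordinary 1-best; the second differs wherever the diversity penalty outweighs the score gap. Because this is a soft penalty, later rounds are not guaranteed to be distinct from earlier ones.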
ABSTRACT-50S
Module    INDEP   Ours-MEDIATOR   oracle
PPAR      56.73   77.39           97.53
NYUv2
Module    INDEP   Ours-CASCADE   Ours-MEDIATOR   oracle
SS        46.13   46.05          46.37           51.30
PPAR      61.54   57.69          64.42           92.31
Average   53.84   51.87          55.40           71.81
PASCAL-50S
Module    INDEP   Ours-CASCADE   Ours-MEDIATOR   oracle
SS        31.14   32.68          34.12           38.87
PPAR      62.42   78.92          87.00           96.50
Average   46.78   55.80          60.56           67.68
Methods:
• INDEP: 1-best solution for each module
• Ours-CASCADE: DivMBest for module 1 + 1-best for module 2
• Ours-MEDIATOR: DivMBest for module 1 and module 2
• oracle: best tuple always selected
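The four methods differ only in which hypothesis pairs they are allowed to choose from. A minimal sketch for two modules follows, assuming each hypothesis list is ordered with the 1-best at index 0 and that a pair is scored as the sum of the two module scores and a consistency term. That scoring rule, the reading of Ours-CASCADE as a search over module 1 with module 2 fixed, and all names and numbers below are assumptions for illustration.

```python
from itertools import product
from typing import Callable, List, Tuple

Hyp = Tuple[str, float]  # (hypothesis id, module score); index 0 is the 1-best
Pair = Tuple[int, int]


def best_pair(ys: List[Hyp], zs: List[Hyp],
              consistency: Callable[[str, str], float]) -> Pair:
    """Exhaustively pick (i, j) maximizing module scores plus consistency."""
    def total(ij: Pair) -> float:
        (y, score_y), (z, score_z) = ys[ij[0]], zs[ij[1]]
        return score_y + score_z + consistency(y, z)
    return max(product(range(len(ys)), range(len(zs))), key=total)


def indep(ys, zs, consistency) -> Pair:
    return (0, 0)  # 1-best of each module, no joint reasoning


def cascade(ys, zs, consistency) -> Pair:
    return best_pair(ys, zs[:1], consistency)  # module 2 fixed to its 1-best


def mediator(ys, zs, consistency) -> Pair:
    return best_pair(ys, zs, consistency)  # search over all pairs


def oracle(ys, zs, accuracy: Callable[[int, int], float]) -> Pair:
    # Upper bound: uses ground-truth accuracy, which is unavailable at test time.
    return max(product(range(len(ys)), range(len(zs))),
               key=lambda ij: accuracy(*ij))


# Hypothetical hypotheses, consistency values, and accuracies.
segs = [("seg_1", 1.0), ("seg_2", 0.9)]
parses = [("parse_1", 1.0), ("parse_2", 0.8)]
CONSISTENCY = {("seg_1", "parse_1"): 0.0, ("seg_1", "parse_2"): 0.1,
               ("seg_2", "parse_1"): 0.3, ("seg_2", "parse_2"): 0.9}
ACCURACY = {(0, 0): 0.5, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}


def agree(y: str, z: str) -> float:
    return CONSISTENCY[(y, z)]


choices = {
    "INDEP": indep(segs, parses, agree),
    "Ours-CASCADE": cascade(segs, parses, agree),
    "Ours-MEDIATOR": mediator(segs, parses, agree),
    "oracle": oracle(segs, parses, lambda i, j: ACCURACY[(i, j)]),
}
print(choices)
```

In this toy setup the 1-best pair is the least consistent one, so each method that widens the search picks a different pair, and only the full search reaches the pair the oracle would choose.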
Module 1: Semantic Segmentation (SS)
Module 2: 3D Support Estimation (SE)
Dataset: NYUv2
Experiment 2 Results
Module    INDEP   Joint   Ours-CASCADE   Ours-MEDIATOR   oracle
SS        64.24   62.00   64.22          64.24           70.24
SE        55.48   56.43   57.38          57.33           62.29
Average   59.86   59.22   60.80          60.79           66.27
Experiment 3: Urban Scene Understanding
Module 1: 2D Semantic Segmentation
Module 2: 3D Semantic Segmentation
Dataset: CITY (stereo)
Experiment 3 Results
Module    INDEP   Ours-CASCADE   Ours-MEDIATOR   oracle
2D SS     54.80   55.65          55.65           57.82
3D SS     32.07   57.16          57.98           61.15
Average   43.44   56.41          56.82           59.49
[1] D. Batra et al. Diverse M-Best Solutions in Markov Random Fields. In ECCV, 2012.
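[Figure: INDEP vs. Ours-MEDIATOR comparisons annotated with +20.7 %, +1.6 %, +13.8 %, +0.93 %, and +13.4 %. These values match the Ours-MEDIATOR minus INDEP differences in the result tables: ABSTRACT-50S (PPAR), NYUv2 (Average), and PASCAL-50S (Average) for Experiment 1, then the Average rows of Experiment 2 and Experiment 3.]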
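As a quick arithmetic check, the annotated percentages can be recomputed as absolute differences between the Ours-MEDIATOR and INDEP columns of the result tables. The pairing of each percentage with a table row is inferred from the numbers rather than stated on the poster, and the dictionary keys are descriptive labels chosen here.

```python
# (INDEP, Ours-MEDIATOR) accuracies copied from the result tables.
results = {
    "Exp1 ABSTRACT-50S (PPAR)": (56.73, 77.39),
    "Exp1 NYUv2 (Average)": (53.84, 55.40),
    "Exp1 PASCAL-50S (Average)": (46.78, 60.56),
    "Exp2 NYUv2 (Average)": (59.86, 60.79),
    "Exp3 CITY (Average)": (43.44, 56.82),
}

# Absolute gain in percentage points, rounded to two decimals.
gains = {name: round(mediator - indep, 2)
         for name, (indep, mediator) in results.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```

The differences come out as 20.66, 1.56, 13.78, 0.93, and 13.38 points, which round to the annotated values.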