

Holistic Scene Understanding via Multiple Structured Hypotheses from Perception Modules

Gordon Christie1*, Ankit Laddha2*, Aishwarya Agrawal1, Stanislaw Antol1, Yash Goyal1, Dhruv Batra1

1 Virginia Tech, 2 CMU
* equal contribution

Overview

Motivation: Perception problems are hard

Goal: Holistic Scene Understanding (inputs from multiple modules)

Challenges
• Inaccurate models
• Search space explosion

Experiments
• Experiment 1: Captioned Scene Understanding
• Experiment 2: Indoor Scene Understanding
• Experiment 3: Urban Scene Understanding

Proposed Solution

(all possible segmentations) × (all possible support estimations) × ... × (all possible sentence parsings)

[Figure: "Mediator" factor graph over the outputs y, z and a of the perception modules (Semantic Segmentation, Sentence Parsing, Support Estimation). Each module's score function, Score(y), Score(z), Score(a), is replaced by a delta approximation over its hypotheses. Factors: S(y_i), S(z_j), S(a_k); C(y_i, z_j), C(z_j, a_k), C(y_i, a_k); C(y_i, z_j, a_k).]

[Figure: Hypothesis #1, Hypothesis #2, ..., Hypothesis #M from each module and the consistent pair selected among them, for Exp1 (Semantic Segmentation and Sentence Parsing of the caption “A dog is standing next to a woman on a couch”), Exp2 (Semantic Segmentation and 3D Support Estimation) and Exp3 (2D Semantic Segmentation and 3D Semantic Segmentation).]
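The factor graph in the Proposed Solution figure scores tuples of hypotheses rather than full output spaces. A minimal sketch of that idea for two modules follows. It assumes each module returns a short list of hypotheses with scores S(y_i) and S(z_j), that a consistency table C(y_i, z_j) is available, and that the joint value is a weighted sum of these terms. The function name `mediator_select`, the weights and all numbers are hypothetical, and this is not the authors' implementation.

```python
from itertools import product


def mediator_select(scores_y, scores_z, consistency, w_score=1.0, w_cons=1.0):
    """Pick the hypothesis pair (i, j) with the highest joint value.

    Assumed joint value: w_score * (S(y_i) + S(z_j)) + w_cons * C(y_i, z_j).
    """
    best_pair, best_value = None, float("-inf")
    # Only len(scores_y) * len(scores_z) tuples are scored,
    # not the full product of the two output spaces.
    for i, j in product(range(len(scores_y)), range(len(scores_z))):
        value = w_score * (scores_y[i] + scores_z[j]) + w_cons * consistency[i][j]
        if value > best_value:
            best_pair, best_value = (i, j), value
    return best_pair, best_value


# Toy inputs (made up for illustration); index 0 is each module's 1-best.
seg_scores = [0.9, 0.8]      # S(y_1), S(y_2)
parse_scores = [0.7, 0.6]    # S(z_1), S(z_2)
consistency = [[0.0, 0.1],   # C(y_i, z_j)
               [0.2, 1.0]]

pair, value = mediator_select(seg_scores, parse_scores, consistency)
print(pair)  # (1, 1)
```

In this toy case the two second-best hypotheses agree with each other, so the selected pair differs from the pair of independent 1-best outputs (0, 0).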

Experiment 1: Captioned Scene Understanding

Module 1: Semantic Segmentation (SS)
Module 2: Prepositional Phrase Attachment Resolution (PPAR)
Datasets: ABSTRACT-50S / PASCAL-50S / NYUv2
Features: Module, Consistent Preposition and Presence

Approach

• Extract diverse hypotheses from multiple modules [1]
• Jointly reason about hypotheses
  o Develop “Mediator” model (factor graph)
  o Infer consistency
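The first step of the approach, extracting diverse hypotheses, can be illustrated on a toy problem. The sketch below follows the general idea of [1]: each new solution maximizes the model score plus a reward for differing from the solutions already found. It enumerates all labelings by brute force, uses Hamming distance as the dissimilarity, and excludes exact repeats as a simplification. The score function, `lam` and all names are assumptions for illustration, not the setup used on the poster.

```python
from itertools import product


def hamming(a, b):
    """Number of positions where two labelings differ."""
    return sum(x != y for x, y in zip(a, b))


def diverse_m_best(score, n_vars, labels, m, lam):
    """Greedy diverse selection by brute force (toy problem sizes only)."""
    candidates = list(product(labels, repeat=n_vars))
    solutions = []
    for _ in range(m):
        # Simplification: never return the same labeling twice.
        remaining = [y for y in candidates if y not in solutions]
        best = max(
            remaining,
            key=lambda y: score(y) + lam * sum(hamming(y, s) for s in solutions),
        )
        solutions.append(best)
    return solutions


# Hypothetical per-variable scores for labels 0 and 1.
UNARY = [[2.0, 1.0], [2.0, 1.75], [2.0, 0.5]]


def toy_score(y):
    """Unary scores plus a 0.5 bonus for each pair of equal neighbours."""
    unary = sum(UNARY[v][label] for v, label in enumerate(y))
    smooth = 0.5 * sum(a == b for a, b in zip(y, y[1:]))
    return unary + smooth


hypotheses = diverse_m_best(toy_score, n_vars=3, labels=(0, 1), m=3, lam=1.5)
print(hypotheses)  # [(0, 0, 0), (1, 1, 1), (0, 1, 0)]
```

With `lam=0` the same function returns the plain top 3 by score, (0, 0, 0), (0, 1, 0) and (1, 0, 0), which differ from the best labeling in only one position each.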

Experiment 1 Results

ABSTRACT-50S

Module    INDEP    Ours-MEDIATOR    oracle
PPAR      56.73    77.39            97.53

NYUv2

Module    INDEP    Ours-CASCADE    Ours-MEDIATOR    oracle
SS        46.13    46.05           46.37            51.30
PPAR      61.54    57.69           64.42            92.31
Average   53.84    51.87           55.40            71.81

PASCAL-50S

Module    INDEP    Ours-CASCADE    Ours-MEDIATOR    oracle
SS        31.14    32.68           34.12            38.87
PPAR      62.42    78.92           87.00            96.50
Average   46.78    55.80           60.56            67.68

Methods:
• INDEP: 1-best solution for each module
• Ours-CASCADE: DivMBest for module 1 + 1-best for module 2
• Ours-MEDIATOR: DivMBest for module 1 and module 2
• oracle: best tuple always selected
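The four methods differ mainly in which hypothesis tuples they may choose from and in what drives the choice. The sketch below makes that concrete for two modules with M = 3 hypotheses each, assuming index 0 is each module's 1-best. The `mediator` and `accuracy` tables are made-up numbers and `best_tuple` is a hypothetical helper.

```python
from itertools import product


def best_tuple(idx_y, idx_z, value):
    """Return the (i, j) pair with the highest value among the allowed indices."""
    return max(product(idx_y, idx_z), key=lambda p: value[p[0]][p[1]])


M = 3
# Hypothetical joint scores assigned by a mediator to each tuple (i, j).
mediator = [[1.0, 1.2, 0.8],
            [1.4, 2.0, 0.7],
            [0.9, 1.1, 2.5]]
# Hypothetical ground-truth accuracy of each tuple (known only at evaluation time).
accuracy = [[50.0, 55.0, 40.0],
            [60.0, 70.0, 45.0],
            [52.0, 58.0, 65.0]]

indep = (0, 0)                                         # 1-best of each module
cascade = best_tuple(range(M), [0], mediator)          # M hypotheses x 1-best
med = best_tuple(range(M), range(M), mediator)         # M x M hypotheses
oracle = best_tuple(range(M), range(M), accuracy)      # best tuple by accuracy

for name, (i, j) in [("INDEP", indep), ("CASCADE", cascade),
                     ("MEDIATOR", med), ("oracle", oracle)]:
    print(name, (i, j), accuracy[i][j])
```

Because the oracle selects by ground-truth accuracy over the same tuples, its accuracy is an upper bound on what any selection rule over those hypotheses can reach.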


Experiment 2: Indoor Scene Understanding

Module 1: Semantic Segmentation (SS)
Module 2: 3D Support Estimation (SE)
Dataset: NYUv2

Experiment 2 Results

Module    INDEP    Joint    Ours-CASCADE    Ours-MEDIATOR    oracle
SS        64.24    62.00    64.22           64.24            70.24
SE        55.48    56.43    57.38           57.33            62.29
Average   59.86    59.22    60.80           60.79            66.27

Experiment 3: Urban Scene Understanding

Module 1: 2D Semantic Segmentation
Module 2: 3D Semantic Segmentation
Dataset: CITY (stereo)


Experiment 3 Results

Module    INDEP    Ours-CASCADE    Ours-MEDIATOR    oracle
2D SS     54.80    55.65           55.65            57.82
3D SS     32.07    57.16           57.98            61.15
Average   43.44    56.41           56.82            59.49

[1] D. Batra et al. Diverse M-Best Solutions in Markov Random Fields. In ECCV, 2012.

[Figure: bar charts comparing INDEP and Ours-MEDIATOR, annotated with gains of +20.7%, +1.6%, +13.8%, +0.93% and +13.4%.]
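The annotated gains appear to be absolute differences between the Ours-MEDIATOR and INDEP entries of the results tables. The short check below recomputes them. Pairing each annotation with a particular table row is an assumption based on this arithmetic, not something the poster states.

```python
# (INDEP, Ours-MEDIATOR) entries copied from the results tables.
ROWS = {
    "Exp 1, ABSTRACT-50S, PPAR": (56.73, 77.39),
    "Exp 1, NYUv2, Average": (53.84, 55.40),
    "Exp 1, PASCAL-50S, Average": (46.78, 60.56),
    "Exp 2, NYUv2, Average": (59.86, 60.79),
    "Exp 3, CITY, Average": (43.44, 56.82),
}

# Absolute gain of Ours-MEDIATOR over INDEP for each row.
gains = {name: round(med - indep, 2) for name, (indep, med) in ROWS.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```

The computed differences are 20.66, 1.56, 13.78, 0.93 and 13.38, which round to the annotated values.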
