TRANSCRIPT
Parsing Image Facades with Reinforcement Learning
Symmetry Detection from Real World Images Workshop, CVPR 2011
Olivier Teboul, Iasonas Kokkinos, Panagiotis Koutsourakis, Loic Simon and Nikos Paragios
6/29/2011 1
Semantic Segmentation of Urban Scenes
• Image Parsing
  – Labels: building, pedestrian, street lamp, pavement, road, cars, trees, sky, shops
• Cropped and Rectified Building Images: 'Facade Parsing'
  – Labels: wall, balconies, shop, windows, sky, roof
Image-based Procedural 3D Models
• Based on 2D parsing: simple extrude and insertion rules turn 2D into 3D…
Problem Statement
• Input: image
• Output: labelling
• Pixel-level classification function:
• Objective:
• Wanted:
Shape Grammars: Recursive Derivation of Labelling
• Partition domain and assign label to each part
  – Top level: axiom
  – Recursive application of shape operators
  – Terminals: semantic labels (e.g. window, door, etc.)
Binary Split Grammars
– Binary:
    N1 → a N2
    N2 → b N1
– Split: one dimension at a time
Challenges
REINFORCEMENT LEARNING FOR SHAPE GRAMMAR PARSING
The 1D case
Task: horizontal split(s) of an image slice
Binary Split Grammar:
• 2 rules
    floorWall → wall floorWin
    floorWin → win floorWall
• Recursive segmentation
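The recursive segmentation above can be sketched in a few lines of Python (a hypothetical encoding for illustration, not the authors' implementation): each non-terminal emits a terminal label and hands control to the other non-terminal, alternating wall and window segments along the slice.

```python
# Minimal sketch of the 1D binary split grammar above (illustrative
# encoding, not the authors' code). Each rule expands a non-terminal
# into a terminal label plus the next non-terminal.
RULES = {
    "floorWall": ("wall", "floorWin"),   # floorWall -> wall floorWin
    "floorWin":  ("win",  "floorWall"),  # floorWin  -> win  floorWall
}

def derive(symbol, n_splits):
    """Recursively derive a label sequence with n_splits splits."""
    terminal, next_symbol = RULES[symbol]
    if n_splits == 0:
        return [terminal]
    return [terminal] + derive(next_symbol, n_splits - 1)

print(derive("floorWall", 4))  # ['wall', 'win', 'wall', 'win', 'wall']
```

The mutual recursion between the two rules is what makes the grammar "binary": every derivation strictly alternates the two terminals.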
Markov Decision Process (MDP) formulation
• Agent iteratively interacting with environment
  – Agent takes action, lands in new state
  – Environment yields reward
  – Potentially stochastic state transition and reward functions
[Diagram: Agent ↔ Environment loop]
• Goal: maximize cumulative reward
• Markov assumption
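The interaction loop described by these bullets can be sketched as follows. The toy environment, its reward, and all names here are illustrative assumptions, not from the talk: the agent proposes a split position, the environment transitions and rewards splits near a "true" boundary.

```python
import random

# Toy MDP sketch (all names and the reward are illustrative): the agent
# picks a split position on a 1D slice; the environment rewards splits
# that land near a "true" boundary at x = 7.
class ToyEnv:
    def __init__(self):
        self.state = 0  # current position along the slice

    def step(self, action):
        self.state = action              # transition to the new state
        reward = -abs(action - 7)        # closer to the boundary = better
        return self.state, reward

env = ToyEnv()
random.seed(0)
total = 0.0
for _ in range(5):
    action = random.randint(0, 10)       # agent takes an action...
    state, reward = env.step(action)     # ...lands in a new state...
    total += reward                      # ...and accumulates reward
print("cumulative reward:", total)
```

The goal stated on the slide, maximizing cumulative reward, corresponds to choosing actions that drive `total` as high as possible.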
MDP & Policy functions
• Policy function adopted by agent:
• Merit function
– Expected reward-to-go if at s we perform α, and then follow π
• ε-greedy policy:
• Reinforcement learning (Q-learning)
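An ε-greedy policy can be sketched in a few lines (hypothetical names, with the merit function stored as a Q-table): with probability ε pick a random action to explore, otherwise pick the action with the highest merit Q(s, α).

```python
import random

def epsilon_greedy(Q, state, actions, eps=0.1):
    """epsilon-greedy policy sketch: explore with probability eps,
    otherwise exploit the action with the highest merit Q(s, a)."""
    if random.random() < eps:
        return random.choice(actions)                               # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))       # exploit

Q = {("s0", "left"): 0.2, ("s0", "right"): 0.8}
a = epsilon_greedy(Q, "s0", ["left", "right"], eps=0.0)
print(a)  # with eps=0 the policy is purely greedy: 'right'
```

Annealing ε from high to low trades early exploration for late exploitation; the data-driven exploration slide later replaces the uniform random choice with an image-driven proposal.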
Reinforcement Learning Algorithm
[Diagram: loop alternating Policy Evaluation and Policy Improvement]
Enforcing Symmetry
• Straightforward extension: 2D state, 2D action
– Large state: Slow convergence
– Impossible to enforce floor symmetry
• Can we use single policy for all floors?
– DP: ?
– RL: Yes, with state aggregation
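One way to realize state aggregation (an assumption about the mechanism, not necessarily the authors' exact scheme) is to drop the floor index from the state, so all floors read and write the same Q-table entries. This both shrinks the state space and forces every floor to follow the same policy, i.e. identical, symmetric splits:

```python
# State-aggregation sketch (illustrative): states are (floor, position)
# pairs; aggregation discards the floor index so every floor shares one
# policy, which enforces floor symmetry and speeds convergence.
def aggregate(state):
    floor, position = state
    return position  # all floors map to the same aggregated state

Q = {}
for floor in range(3):                  # updates coming from three floors...
    s = aggregate((floor, 5))
    Q[(s, "split")] = Q.get((s, "split"), 0.0) + 1.0

print(Q)  # {(5, 'split'): 3.0} -- one shared entry, not three
```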
Data-Driven Exploration
• Bottom-up cues:
  – Line detection, window detection, ...
• How can we exploit them in model fitting?
  – Modify ε-greedy exploration strategy
  – Accumulate gradients:
  – Use to 'propose' actions:
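One way to turn accumulated gradients into action proposals (a sketch consistent with the bullets above, not the exact formulation from the talk) is to sample split positions with probability proportional to the accumulated gradient magnitude, instead of uniformly as in plain ε-greedy exploration:

```python
import random

# Data-driven exploration sketch (illustrative): accumulate a gradient
# magnitude per column, then sample split positions proportionally to
# it, so proposals concentrate on strong image edges.
def propose_split(gradients, rng):
    total = sum(gradients)
    r = rng.random() * total
    acc = 0.0
    for x, g in enumerate(gradients):
        acc += g
        if r <= acc:
            return x
    return len(gradients) - 1

# Strong edges at columns 2 and 6 attract most of the proposals.
grads = [0.1, 0.1, 5.0, 0.1, 0.1, 0.1, 5.0, 0.1]
rng = random.Random(0)
samples = [propose_split(grads, rng) for _ in range(1000)]
print("fraction at edges:", sum(s in (2, 6) for s in samples) / 1000)
```

Replacing only the exploration step leaves the learning update untouched, which is why this plugs directly into the ε-greedy strategy.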
EXPERIMENTAL VALIDATION
Randomized Forest
[Figure: per-class classifier outputs: original, wall, window, door, balcony, roof, shop, sky, MAP]
Quantitative Validation: Benchmark 2010
• 20 images for training, 10 images for testing
[Figure: original vs. MAP]
[Figure: Potts vs. RL results; [Simon 2011] vs. RL Parsing]
#generated buildings / Time (sec): [Simon 2011] ~600, RL Parsing ~30
Quantitative Validation: Benchmark 2011
• Complete Benchmark:
– 104 annotated images
– Manual parsing
             Mean   Std
Topology     0.93   0.09
Appearance   0.81   0.07
Qualitative Validation
Robustness to Artificial Noise and Occlusions
• Salt-and-pepper noise from 0 to 100% (GMM learnt on noise-free image)
• Artificial occlusions added on images
Robustness to Real Occlusions and Illuminations
• Natural Occlusions
• Cast Shadows
• Night Lights
CONCLUSION
Theoretical contributions
– Binary Split Grammars: natural fit for façade modeling
– Reinforcement learning: flexible techniques for shape parsing
– Enforcing symmetry via state aggregation
– Data-driven exploration
– Efficient exploration of state-action space
– State-of-the-art results on many grammars
Practical contributions
– Annotated benchmark for façade parsing
– Rflib: Open Source Libraries for Randomized Forests
– grapes: software for Facade Parsing with Shape Grammar
Q&A
• Thank you!
Parsing Algorithm Convergence
Artificial Data
Real Data
Contributions
Theory
– Binary Shape Grammars (BSG): generic mutually recursive grammars well-suited for façade modeling and optimization.
– Reformulation of the Parsing problem in the Reinforcement Learning framework
– Generic reinforcement learning algorithm suitable for any BSG
– State aggregation for fast and consistent parsing
– Data-driven exploration to boost the convergence
– State-of-the-art quantitative and qualitative results on many grammars
Practice
– Annotated benchmark for façade parsing
– Rflib: Open Source Libraries for Randomized Forests
– grapes: software for Facade Parsing with Shape Grammar
3 classes of solutions
…
• Dynamic Programming
  – At each state s, consider all actions α.
  – Obtain merit of state s, backpropagate.
  Needs to consider all state-action combinations
• Monte Carlo
  – Fix first action, α
  – Probabilistically sample subsequent actions.
  Needs to consider full episode
• Reinforcement Learning
  – At each state pick single action
  – Back-propagate locally
Dynamic Programming vs. Reinforcement Learning
User-defined Constraints
MDP & Policy functions
• Policy function adopted by agent:
• Merit function
– Expected reward-to-go if at s we perform α, and then follow π
• Bellman’s recursion:
• Bellman’s recursion for the optimal policy π*:
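For reference, the standard textbook forms of the two recursions the bullets denote, in common RL notation (γ is the discount factor; these may differ cosmetically from the slide's formulas):

```latex
% Merit (action-value) function of policy \pi:
Q^{\pi}(s,\alpha) = \mathbb{E}\left[ r(s,\alpha) + \gamma\, Q^{\pi}\big(s',\pi(s')\big) \right]

% Bellman's recursion for the optimal policy \pi^{*}:
Q^{*}(s,\alpha) = \mathbb{E}\left[ r(s,\alpha) + \gamma \max_{\alpha'} Q^{*}(s',\alpha') \right]
```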
Q-Learning Algorithm
[Diagram: Policy Evaluation by sampling (MC): new estimate = blend of old estimate and current estimate; followed by Policy Improvement]
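The "old estimate / current estimate / new estimate" blend in the diagram is the standard Q-learning update, sketched here on a toy Q-table (names are illustrative): the sampled target blends into the stored value at learning rate lr.

```python
# Q-learning update sketch: blend the old estimate with the sampled
# current estimate (reward + discounted best next merit):
#   new = (1 - lr) * old + lr * (r + gamma * max_a' Q(s', a'))
def q_update(Q, s, a, r, s_next, actions, lr=0.5, gamma=0.9):
    old = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    current = r + gamma * best_next          # sampled one-step target
    Q[(s, a)] = (1 - lr) * old + lr * current
    return Q[(s, a)]

Q = {("s1", "go"): 1.0}
v = q_update(Q, "s0", "go", r=2.0, s_next="s1", actions=["go"])
print(v)  # 0.5 * 0 + 0.5 * (2.0 + 0.9 * 1.0) = 1.45
```

Unlike dynamic programming, each update touches only the single state-action pair that was actually visited, which is the "back-propagate locally" point from the DP-vs-RL slide.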
Semi-Markov Decision Processes
Other RL-friendly Techniques
Gaussian Mixture Models
[Figure: user strokes and per-class outputs: wall, window, roof, shop, MAP]
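The idea of the figure, per-class appearance models learnt from user strokes, can be sketched with a single Gaussian per class (a one-component special case of the GMM used in the slides; all data and names here are illustrative): fit each class on its stroke pixels, then label pixels by maximum likelihood.

```python
import math

# Appearance-model sketch: one Gaussian per class fitted on stroke
# pixels (a degenerate one-component GMM), then per-pixel maximum
# likelihood labelling (MAP with a uniform prior).
def fit(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values) + 1e-6
    return m, var

def loglik(v, params):
    m, var = params
    return -0.5 * (math.log(2 * math.pi * var) + (v - m) ** 2 / var)

# Toy stroke intensities: bright wall pixels, dark window pixels.
strokes = {"wall": [0.8, 0.75, 0.85], "window": [0.2, 0.25, 0.15]}
models = {c: fit(vals) for c, vals in strokes.items()}

def classify(v):
    return max(models, key=lambda c: loglik(v, models[c]))

print(classify(0.9), classify(0.1))  # wall window
```

A real GMM adds several weighted components per class and fits them by EM, but the classification rule, pick the class with the highest likelihood, is the same.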
Hue
[Figure: original, hue, MAP; histogram and polar histogram]