TRANSCRIPT
Parsing Image Facades with Reinforcement Learning
Symmetry Detection from Real World Images Workshop, CVPR 2011
Olivier Teboul, Iasonas Kokkinos, Panagiotis Koutsourakis, Loic Simon and Nikos Paragios
6/29/2011 1
Semantic Segmentation of Urban Scenes
• Image Parsing
  – Labels: building, pedestrian, street lamp, pavement, road, cars, trees, sky, shops
• Cropped and Rectified Building Images: 'Facade Parsing'
  – Labels: wall, balconies, shop, windows, sky, roof
Image-based Procedural 3D Models
• Based on 2D parsing: simple extrude and insertion rules turn 2D into 3D…
Problem Statement
• Input: image
• Output: labelling
• Pixel-level classification function:
• Objective:
• Wanted:
Shape Grammars: Recursive Derivation of Labelling
• Partition domain and assign label to each part
  – Top level: axiom
  – Recursive application of shape operators
  – Terminals: semantic labels (e.g. window, door, etc.)
Binary Split Grammars
– Binary:
    N1 → a N2
    N2 → b N1
– Split: one dimension at a time
Challenges
REINFORCEMENT LEARNING FOR SHAPE GRAMMAR PARSING
The 1D case
Task: horizontal split(s) of an image slice
Binary Split Grammar:
• 2 rules
    floorWall → wall floorWin
    floorWin → win floorWall
• Recursive segmentation
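The recursive segmentation above can be sketched in a few lines of Python (a hypothetical encoding for illustration, not the authors' implementation): each non-terminal emits a terminal label and hands control to the other non-terminal, alternating wall and window segments along the slice.

```python
# Minimal sketch of the 1D binary split grammar above (illustrative
# encoding, not the authors' code). Each rule expands a non-terminal
# into a terminal label plus the next non-terminal.
RULES = {
    "floorWall": ("wall", "floorWin"),   # floorWall -> wall floorWin
    "floorWin":  ("win",  "floorWall"),  # floorWin  -> win  floorWall
}

def derive(symbol, n_splits):
    """Recursively derive a label sequence with n_splits splits."""
    terminal, next_symbol = RULES[symbol]
    if n_splits == 0:
        return [terminal]
    return [terminal] + derive(next_symbol, n_splits - 1)

print(derive("floorWall", 4))  # ['wall', 'win', 'wall', 'win', 'wall']
```

The mutual recursion between the two rules is what makes the grammar "binary": every derivation strictly alternates the two terminals.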
Markov Decision Process (MDP) formulation
• Agent iteratively interacting with environment
  – Agent takes action, lands in new state
  – Environment yields reward
  – Potentially stochastic state transition and reward functions
[Diagram: Agent ↔ Environment loop]
• Goal: maximize cumulative reward
• Markov assumption
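The interaction loop described by these bullets can be sketched as follows. The toy environment, its reward, and all names here are illustrative assumptions, not from the talk: the agent proposes a split position, the environment transitions and rewards splits near a "true" boundary.

```python
import random

# Toy MDP sketch (all names and the reward are illustrative): the agent
# picks a split position on a 1D slice; the environment rewards splits
# that land near a "true" boundary at x = 7.
class ToyEnv:
    def __init__(self):
        self.state = 0  # current position along the slice

    def step(self, action):
        self.state = action              # transition to the new state
        reward = -abs(action - 7)        # closer to the boundary = better
        return self.state, reward

env = ToyEnv()
random.seed(0)
total = 0.0
for _ in range(5):
    action = random.randint(0, 10)       # agent takes an action...
    state, reward = env.step(action)     # ...lands in a new state...
    total += reward                      # ...and accumulates reward
print("cumulative reward:", total)
```

The goal stated on the slide, maximizing cumulative reward, corresponds to choosing actions that drive `total` as high as possible.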
MDP & Policy functions
• Policy function adopted by agent:
• Merit function
– Expected reward-to-go if at s we perform α, and then follow π
• ε-greedy policy:
• Reinforcement learning (Q-learning)
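An ε-greedy policy can be sketched in a few lines (hypothetical names, with the merit function stored as a Q-table): with probability ε pick a random action to explore, otherwise pick the action with the highest merit Q(s, α).

```python
import random

def epsilon_greedy(Q, state, actions, eps=0.1):
    """epsilon-greedy policy sketch: explore with probability eps,
    otherwise exploit the action with the highest merit Q(s, a)."""
    if random.random() < eps:
        return random.choice(actions)                               # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))       # exploit

Q = {("s0", "left"): 0.2, ("s0", "right"): 0.8}
a = epsilon_greedy(Q, "s0", ["left", "right"], eps=0.0)
print(a)  # with eps=0 the policy is purely greedy: 'right'
```

Annealing ε from high to low trades early exploration for late exploitation; the data-driven exploration slide later replaces the uniform random choice with an image-driven proposal.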
Reinforcement Learning Algorithm
[Diagram: loop alternating Policy Evaluation and Policy Improvement]
Enforcing Symmetry
• Straightforward extension: 2D state, 2D action
– Large state: Slow convergence
– Impossible to enforce floor symmetry
• Can we use single policy for all floors?
– DP: ?
– RL: Yes, with state aggregation
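One way to realize state aggregation (an assumption about the mechanism, not necessarily the authors' exact scheme) is to drop the floor index from the state, so all floors read and write the same Q-table entries. This both shrinks the state space and forces every floor to follow the same policy, i.e. identical, symmetric splits:

```python
# State-aggregation sketch (illustrative): states are (floor, position)
# pairs; aggregation discards the floor index so every floor shares one
# policy, which enforces floor symmetry and speeds convergence.
def aggregate(state):
    floor, position = state
    return position  # all floors map to the same aggregated state

Q = {}
for floor in range(3):                  # updates coming from three floors...
    s = aggregate((floor, 5))
    Q[(s, "split")] = Q.get((s, "split"), 0.0) + 1.0

print(Q)  # {(5, 'split'): 3.0} -- one shared entry, not three
```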
Data-Driven Exploration
• Bottom-up cues:
  – Line detection, window detection, ...
• How can we exploit them in model fitting?
  – Modify ε-greedy exploration strategy
  – Accumulate gradients:
  – Use to 'propose' actions:
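One way to turn accumulated gradients into action proposals (a sketch consistent with the bullets above, not the exact formulation from the talk) is to sample split positions with probability proportional to the accumulated gradient magnitude, instead of uniformly as in plain ε-greedy exploration:

```python
import random

# Data-driven exploration sketch (illustrative): accumulate a gradient
# magnitude per column, then sample split positions proportionally to
# it, so proposals concentrate on strong image edges.
def propose_split(gradients, rng):
    total = sum(gradients)
    r = rng.random() * total
    acc = 0.0
    for x, g in enumerate(gradients):
        acc += g
        if r <= acc:
            return x
    return len(gradients) - 1

# Strong edges at columns 2 and 6 attract most of the proposals.
grads = [0.1, 0.1, 5.0, 0.1, 0.1, 0.1, 5.0, 0.1]
rng = random.Random(0)
samples = [propose_split(grads, rng) for _ in range(1000)]
print("fraction at edges:", sum(s in (2, 6) for s in samples) / 1000)
```

Replacing only the exploration step leaves the learning update untouched, which is why this plugs directly into the ε-greedy strategy.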
EXPERIMENTAL VALIDATION
Randomized Forest
[Figure: per-class classifier outputs: original, wall, window, door, balcony, roof, shop, sky, MAP]
Quantitative Validation: Benchmark 2010
• 20 images for training, 10 images for testing
[Figure: original vs. MAP]
[Figure: Potts vs. RL results; [Simon 2011] vs. RL Parsing]
#generated buildings / Time (sec): [Simon 2011] ~600, RL Parsing ~30
Quantitative Validation: Benchmark 2011
• Complete Benchmark:
– 104 annotated images
– Manual parsing
             Mean   Std
Topology     0.93   0.09
Appearance   0.81   0.07
Qualitative Validation
Robustness to Artificial Noise and Occlusions
• Salt-and-pepper noise from 0 to 100% (GMM learnt on noise-free image)
• Artificial occlusions added on images
Robustness to Real Occlusions and Illuminations
• Natural Occlusions
• Cast Shadows
• Night Lights
CONCLUSION
Theoretical contributions
– Binary Split Grammars: natural fit for façade modeling
– Reinforcement learning: flexible techniques for shape parsing
– Enforcing symmetry via state aggregation
– Data-driven exploration
– Efficient exploration of state-action space
– State-of-the-art results on many grammars
Practical contributions
– Annotated benchmark for façade parsing
– Rflib: Open Source Libraries for Randomized Forests
– grapes: software for Facade Parsing with Shape Grammar
Q&A
• Thank you!
Parsing Algorithm Convergence
Artificial Data
Real Data
Contributions
Theory
– Binary Shape Grammars (BSG): generic mutually recursive grammars well-suited for façade modeling and optimization.
– Reformulation of the Parsing problem in the Reinforcement Learning framework
– Generic reinforcement learning algorithm suitable for any BSG
– State aggregation for fast and consistent parsing
– Data-driven exploration to boost the convergence
– State-of-the-art quantitative and qualitative results on many grammars
Practice
– Annotated benchmark for façade parsing
– Rflib: Open Source Libraries for Randomized Forests
– grapes: software for Facade Parsing with Shape Grammar
3 classes of solutions
…
• Dynamic Programming
  – At each state s, consider all actions α.
  – Obtain merit of state s, backpropagate.
  Needs to consider all state-action combinations
• Monte Carlo
  – Fix first action, α
  – Probabilistically sample subsequent actions.
  Needs to consider full episode
• Reinforcement Learning
  – At each state pick single action
  – Back-propagate locally
Dynamic Programming vs. Reinforcement Learning
User-defined Constraints
MDP & Policy functions
• Policy function adopted by agent:
• Merit function
– Expected reward-to-go if at s we perform α, and then follow π
• Bellman’s recursion:
• Bellman’s recursion for the optimal policy π*:
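For reference, the standard textbook forms of the two recursions the bullets denote, in common RL notation (γ is the discount factor; these may differ cosmetically from the slide's formulas):

```latex
% Merit (action-value) function of policy \pi:
Q^{\pi}(s,\alpha) = \mathbb{E}\left[ r(s,\alpha) + \gamma\, Q^{\pi}\big(s',\pi(s')\big) \right]

% Bellman's recursion for the optimal policy \pi^{*}:
Q^{*}(s,\alpha) = \mathbb{E}\left[ r(s,\alpha) + \gamma \max_{\alpha'} Q^{*}(s',\alpha') \right]
```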
Q-Learning Algorithm
[Diagram: Policy Evaluation by sampling (MC): new estimate = blend of old estimate and current estimate; followed by Policy Improvement]
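The "old estimate / current estimate / new estimate" blend in the diagram is the standard Q-learning update, sketched here on a toy Q-table (names are illustrative): the sampled target blends into the stored value at learning rate lr.

```python
# Q-learning update sketch: blend the old estimate with the sampled
# current estimate (reward + discounted best next merit):
#   new = (1 - lr) * old + lr * (r + gamma * max_a' Q(s', a'))
def q_update(Q, s, a, r, s_next, actions, lr=0.5, gamma=0.9):
    old = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    current = r + gamma * best_next          # sampled one-step target
    Q[(s, a)] = (1 - lr) * old + lr * current
    return Q[(s, a)]

Q = {("s1", "go"): 1.0}
v = q_update(Q, "s0", "go", r=2.0, s_next="s1", actions=["go"])
print(v)  # 0.5 * 0 + 0.5 * (2.0 + 0.9 * 1.0) = 1.45
```

Unlike dynamic programming, each update touches only the single state-action pair that was actually visited, which is the "back-propagate locally" point from the DP-vs-RL slide.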
Semi-Markov Decision Processes
Other RL-friendly Techniques
Gaussian Mixture Models
[Figure: user strokes and per-class outputs: wall, window, roof, shop, MAP]
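The idea of the figure, per-class appearance models learnt from user strokes, can be sketched with a single Gaussian per class (a one-component special case of the GMM used in the slides; all data and names here are illustrative): fit each class on its stroke pixels, then label pixels by maximum likelihood.

```python
import math

# Appearance-model sketch: one Gaussian per class fitted on stroke
# pixels (a degenerate one-component GMM), then per-pixel maximum
# likelihood labelling (MAP with a uniform prior).
def fit(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values) + 1e-6
    return m, var

def loglik(v, params):
    m, var = params
    return -0.5 * (math.log(2 * math.pi * var) + (v - m) ** 2 / var)

# Toy stroke intensities: bright wall pixels, dark window pixels.
strokes = {"wall": [0.8, 0.75, 0.85], "window": [0.2, 0.25, 0.15]}
models = {c: fit(vals) for c, vals in strokes.items()}

def classify(v):
    return max(models, key=lambda c: loglik(v, models[c]))

print(classify(0.9), classify(0.1))  # wall window
```

A real GMM adds several weighted components per class and fits them by EM, but the classification rule, pick the class with the highest likelihood, is the same.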
Hue
[Figure: original, hue, MAP; histogram and polar histogram]