
Unsupervised Video Object Segmentation for Deep Reinforcement Learning

Machine Learning and Data Analytics Symposium, Doha, Qatar, April 1, 2019

Vikash Goel, Jameson Weng, Pascal Poupart

Pascal: RBC Borealis AI Research Director

• Research institute funded by RBC

• 5 research centers: Montreal, Toronto, Waterloo, Edmonton and Vancouver

• 80 researchers

– Integrated (applied & fundamental) research model

• ML, RL, NLP, computer vision, private AI, knowledge graphs

• We are hiring!

Pascal: ML Professor at U of Waterloo

• Deep learning

– Automated structure learning, sum-product networks, transfer learning

• Reinforcement learning

– Constrained RL, motion-oriented RL, sport analytics

• NLP

– Conversational agents, machine translation, automated proofreading

• Theory

– Convex relaxations of sum-product networks, characterization of local optima in mixture models, consistent approximate Bayesian techniques

Outline

• Background

– Reinforcement learning: data inefficiency

– Solution: self-supervised learning

• MOREL: Motion-Oriented REinforcement Learning

– Unsupervised object & motion recognition

– Faster policy optimization & interpretability

Reference: Goel, Weng, Poupart (2018) Unsupervised Video Object Segmentation for Deep Reinforcement Learning, NeurIPS.

Reinforcement Learning

Games, robotics, automated trading, autonomous driving, recommender systems, conversational agents, operations research, data center optimization

[Figure: agent-environment loop. The agent sends an action to the environment and receives an observation and a reward.]
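The observation-reward-action loop on this slide can be sketched in a few lines of Python. The corridor environment, its sparse reward, and the `reset`/`step` method names are illustrative assumptions; they resemble, but are not, any particular library's API.

```python
class ToyEnvironment:
    """Hypothetical 1-D corridor: the agent starts at 0 and is rewarded only at `goal`."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial observation

    def step(self, action):
        # action is -1 (left) or +1 (right); the position cannot go below 0
        self.position = max(0, self.position + action)
        done = self.position == self.goal
        reward = 1.0 if done else 0.0  # sparse reward: nonzero only at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe, act, receive a reward, repeat."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(observation):
    return +1


print(run_episode(ToyEnvironment(), always_right))  # 1.0
```

Every step before the goal returns a reward of 0, which is the sparsity that the following slides identify as a source of data inefficiency.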

Data Inefficiency

• Most RL successes: simulated environments

• Atari baselines: 40M frames (Schulman et al., 2017)

[Figure: Atari, MuJoCo, VizDoom, Computer Go]

Image-based RL

[Figure: image → deep neural network → actions or values, trained from a sparse reward]

Self-supervised learning

• Auxiliary tasks and objectives

– Future observation/reward prediction

– Past observation prediction (inverse dynamics)

– Observation reconstruction (auto-encoder)
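To make the auxiliary-task idea concrete, here is a minimal sketch of the observation-reconstruction (auto-encoder) variant: the network minimizes the RL loss plus a weighted reconstruction error. The weight `aux_weight`, the choice of mean-squared error, and the function names are assumptions for illustration; the other auxiliary tasks would swap in a different term.

```python
def mean_squared_error(x, y):
    """Average squared difference between two equal-length (flattened) images."""
    assert len(x) == len(y)
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def self_supervised_loss(rl_loss, observation, reconstruction, aux_weight=0.5):
    """RL loss plus a weighted auxiliary reconstruction loss.

    The auxiliary term is computed from the observation alone, so it gives a
    dense training signal at every step, even when the reward is zero.
    `aux_weight` is a hypothetical hyperparameter.
    """
    return rl_loss + aux_weight * mean_squared_error(observation, reconstruction)


observation = [0.0, 0.5, 1.0, 0.5]
reconstruction = [0.0, 0.5, 0.5, 0.5]
print(self_supervised_loss(0.0, observation, reconstruction))  # 0.03125
```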


Image-based RL

• Deep RL: image → deep neural network → actions or values, trained from a sparse reward

• Self-supervised RL (auxiliary tasks): an auxiliary deep neural network on the image adds a dense signal alongside the sparse reward

Prior knowledge

• What do you see?

– Humans: moving objects

– RL agent: sequence of pixels

[Figure: Seaquest, Space Invaders, Breakout]

Discovery of relevant features slows down learning

[Figure: image → deep neural network → actions or values under a sparse reward; the network must perform both feature extraction and policy optimization]

Faster Learning

Can we learn a policy that automatically segments moving objects and identifies relevant objects?



MOREL: Motion-Oriented RL

• Phase 1: Unsupervised object segmentation

– Only 1% of the frames (random actions)

• Phase 2: Faster policy optimization

– Based on object segmentation and motion
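Phase 1 trains the segmentation network on frames gathered with random actions. As a rough sketch, assuming the 1% is measured against a 40M-frame budget like the Atari baselines cited earlier (the slide does not state the base) and reducing the environment to hypothetical `env_reset`/`env_step` callables, the budget arithmetic and data collection could look like this:

```python
import random


def pretraining_budget(total_frames, percent=1):
    """Frames available to Phase 1 if it uses `percent`% of the total frame budget."""
    return total_frames * percent // 100  # integer arithmetic avoids float rounding


def collect_random_transitions(env_reset, env_step, actions, num_frames, seed=0):
    """Roll out a uniformly random policy and keep consecutive frame pairs.

    env_reset() -> first frame and env_step(action) -> next frame are
    hypothetical callables standing in for a real environment.
    """
    rng = random.Random(seed)
    frame = env_reset()
    pairs = []
    for _ in range(num_frames):
        next_frame = env_step(rng.choice(actions))  # random action, no learned policy
        pairs.append((frame, next_frame))           # (frame t, frame t+1)
        frame = next_frame
    return pairs


# Toy "environment" whose frame is just a step counter.
_state = {"t": 0}


def toy_reset():
    _state["t"] = 0
    return _state["t"]


def toy_step(action):
    _state["t"] += 1
    return _state["t"]


print(pretraining_budget(40_000_000))  # 400000
print(collect_random_transitions(toy_reset, toy_step, [0, 1, 2], 5))
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

Keeping consecutive pairs rather than isolated frames matters because the segmentation is learned from the motion between frame t and frame t+1.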

Motion Consistency

• Supervised segmentation: labor-intensive labeling

• Idea: leverage optical flow (structure from motion)
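Optical flow gives a training signal without labels: if the predicted flow is right, moving the pixels of frame 1 along it should reproduce frame 2. The sketch below assumes integer per-pixel displacements and nearest-pixel backward warping; learned models such as SfM-Net typically use differentiable bilinear sampling instead, so the error can be backpropagated. `warp_backward` is a hypothetical helper, not code from the paper.

```python
def warp_backward(frame, flow):
    """Predict the next frame by sampling `frame` at positions displaced by `flow`.

    frame: H x W nested list of pixel intensities.
    flow:  H x W nested list of integer (dx, dy); pixel (x, y) of the
           prediction is copied from (x - dx, y - dy) in `frame`.
    Samples that fall outside the image are filled with 0.
    """
    height, width = len(frame), len(frame[0])
    predicted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            src_x, src_y = x - dx, y - dy
            if 0 <= src_x < width and 0 <= src_y < height:
                predicted[y][x] = frame[src_y][src_x]
    return predicted


frame1 = [[0, 9, 0, 0]]   # a bright "object" at x = 1
frame2 = [[0, 0, 9, 0]]   # the same object one pixel to the right
flow = [[(1, 0)] * 4]     # predicted flow: everything moves right by one pixel

predicted2 = warp_backward(frame1, flow)
print(predicted2)  # [[0, 0, 9, 0]]
error = sum(abs(p - q) for rp, rq in zip(predicted2, frame2) for p, q in zip(rp, rq))
print(error)  # 0
```

A wrong flow would leave a nonzero reconstruction error, which is what the network is trained to reduce.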

SfM-Net

Vijayanarasimhan, Ricco, Schmid, Sukthankar, Fragkiadaki (2017) SfM-Net: Learning of Structure and Motion from Video, arXiv.

SfM-Net predictions (KITTI 2015)

Simplified 2D SfM-Net

• No skip connection

• Reconstruction loss: DSSIM (structural dissimilarity)

• Flow regularization: L1 loss

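• Curriculum: gradually increase $\lambda_{\text{reg}}$ from 0 to 1

$\mathcal{L}_{\text{reconstruct}} = \text{DSSIM}$

$\mathcal{L}_{\text{reg}} = \sum_k \lVert m^{(k)} \times t^{(k)} \rVert_1$ ($m^{(k)}$: mask of object $k$; $t^{(k)}$: its 2D translation)

$\mathcal{L} = \mathcal{L}_{\text{reconstruct}} + \lambda_{\text{reg}} \, \mathcal{L}_{\text{reg}}$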
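The training loss on this slide can be sketched in plain Python. Several assumptions are made and should be treated as such: the flow regularizer is taken to be the L1 norm of each object's mask-weighted 2D translation; DSSIM is computed from a single global window (standard SSIM averages over local windows); the curriculum for λ_reg is a linear ramp; and the predicted second frame is assumed to come from warping frame 1 with the predicted flow. All function names are hypothetical.

```python
def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from global statistics of two flattened images with values in [0, 1]."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def dssim(x, y):
    """Structural dissimilarity, (1 - SSIM) / 2: 0 for identical images."""
    return (1 - ssim_global(x, y)) / 2


def flow_l1(masks, translations):
    """Sum over objects k and pixels of |m^(k) * t^(k)| (an L1 penalty on flow).

    masks: K flattened per-pixel mask values in [0, 1].
    translations: K (tx, ty) pairs, one 2D translation per object.
    """
    return sum(abs(m * tx) + abs(m * ty)
               for mask, (tx, ty) in zip(masks, translations)
               for m in mask)


def reg_weight(step, ramp_steps):
    """lambda_reg curriculum: linear ramp from 0 to 1 (the linear shape is assumed)."""
    return min(1.0, step / ramp_steps)


def total_loss(frame2, predicted2, masks, translations, step, ramp_steps=10_000):
    """L = DSSIM(frame 2, predicted frame 2) + lambda_reg * L_reg."""
    return dssim(frame2, predicted2) + reg_weight(step, ramp_steps) * flow_l1(masks, translations)


print(flow_l1([[1.0, 0.0], [0.5, 0.5]], [(2.0, -1.0), (0.0, 4.0)]))  # 7.0
```

Starting λ_reg at 0 lets the network first learn to reconstruct frames before the flow penalty pushes it toward small, sparse motions.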

Simplified 2D SfM-Net

[Figure: Frame 1, Frame 2, Masks (summed), Most salient mask, Optical flow]

Unsupervised object segmentation

[Figure: Frame 1, Frame 2, Masks (summed), Most salient mask, Optical flow]

MOREL: Motion-Oriented RL

Multi-objective: max rewards and min optical flow error
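One simple way to read "max rewards and min optical flow error" is as a weighted sum of two losses minimized over shared parameters. The toy below is only an illustration of that scalarization, not MOREL's actual losses: two hypothetical quadratics over a single shared parameter θ stand in for the RL loss and the flow error. The minimizer of $(\theta-2)^2 + w(\theta-1)^2$ is $(2+w)/(1+w)$, so the weight $w$ slides the solution between the two objectives.

```python
def minimize_weighted_sum(flow_weight, theta=0.0, lr=0.05, steps=500):
    """Gradient descent on policy_loss + flow_weight * flow_loss.

    Toy stand-ins (hypothetical): policy_loss = (theta - 2)^2, where lower
    loss plays the role of higher reward, and flow_loss = (theta - 1)^2.
    The iteration converges when lr * (1 + flow_weight) < 1.
    """
    for _ in range(steps):
        grad_policy = 2 * (theta - 2.0)
        grad_flow = 2 * (theta - 1.0)
        theta -= lr * (grad_policy + flow_weight * grad_flow)
    return theta


for w in (0.0, 1.0, 3.0):
    print(w, round(minimize_weighted_sum(w), 6))
# 0.0 2.0
# 1.0 1.5
# 3.0 1.25
```

With w = 0 only the reward term matters; as w grows, the shared parameter is pulled toward what keeps the flow error low.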

Comparison with PPO

• Better: 25 games

• Similar: 25 games

• Worse: 9 games

Comparison with A2C

• Better: 26 games

• Similar: 30 games

• Worse: 3 games

Videos: Pong, Breakout, Seaquest, Beamrider

Performance Curves

[Figure: performance versus frames for Breakout, Seaquest, Beamrider]

Ablation Study

[Figure: ablation results versus frames for Breakout, Seaquest, Beamrider]

Conclusion

• MOREL: Motion-Oriented REinforcement Learning

– Unsupervised object & motion recognition

– Faster policy optimization & interpretability

• Future work: 3D environments, physics-based dynamics, object-oriented RL, model-based RL

RBC Borealis AI

• Graduating soon?

– Join RBC Borealis AI (https://www.borealisai.com)

– Email: pascal.poupart@borealisai.com

• Research Institute

– Fundamental research (publications)

– Applied research (products)

• Topics

– RL: automated trading

– NLP: news filtering, information extraction, text generation

– Computer Vision: satellite-based house valuation

– Privacy: differential privacy

– Knowledge graphs: recommender systems
