challenges in deep reinforcement learning · pdf file• discuss some recent work in deep...

24
Challenges in Deep Reinforcement Learning Sergey Levine UC Berkeley

Upload: buihanh

Post on 07-Feb-2018

222 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Challenges in Deep Reinforcement Learning

Sergey Levine

UC Berkeley

Page 2: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy
Page 3: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

• Discuss some recent work in deep reinforcement learning

• Present a few major challenges

• Show some of our recent work toward tackling these challenges

Page 4: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Deep Q-NetworksMnih et al.2013

Guided policy searchLevine et al.2013

RL on raw visual inputLange et al.2009

Deep deterministic policy gradientsLillicrap et al.2015

Trust region policy optimizationSchulman et al.2015

Some recent work on deep RL

End-to-end visuomotor policiesLevine*, Finn* et al.2015

Supersizing self-supervisionPinto & Gupta2016

stability efficiency scale

AlphaGoSilver et al.2016

Page 5: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Challenges in Deep Reinforcement Learning

1. Stability

2. Efficiency

3. Scale

Page 6: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Challenges in Deep Reinforcement Learning

1. Stability

2. Efficiency

3. Scale

Page 7: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Deep RL with Policy Gradients

• Unbiased but high-variance gradient• Stable• Requires many samples• Example: TRPO [Schulman et al. ‘15]

Page 8: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Deep RL with Off-Policy Q-Function Critic

• Low-variance but biased gradient• Much more efficient (because off-policy)• Much less stable (because biased)• Example: DDPG [Lillicrap et al. ‘16]

Page 9: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Improving Efficiency & Stability with Q-Prop

Shane Gu

Policy gradient: Q-function critic:

Q-Prop:

• Unbiased gradient, stable• Efficient (uses off-policy samples)• Critic comes from off-policy data• Gradient comes from on-policy data• Automatic variance-based adjustment

Page 10: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Comparisons

• Works with smaller batches than TRPO• More efficient than TRPO• More stable than DDPG with respect to hyperparameters

• Likely responsible for the better performance on harder task

Page 11: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Challenges in Deep Reinforcement Learning

1. Stability

2. Efficiency

3. Scale

Page 12: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Parameter Space vs Policy Space

parameters

why policy space?

• local optima/easier optimization landscapes

• can be easier to update in policy space vs parameter space

Page 13: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Mirror Descent Guided Policy Search (MDGPS)

Page 14: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Mirror Descent Guided Policy Search (MDGPS)

“projection”: supervised learning

local policy optimization:

• trajectory-centric model-based RL [Montgomery ‘16]

• path integral policy iteration [Chebotar ‘16]

Page 15: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

MDGPS with Random Initial States and Local Models

Chelsea FinnHarley

Montgomery Anurag Ajay

Page 16: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Efficiency & Real-World Evaluation

Learning 2D reaching(simple benchmark task):• TRPO (best known value): 3000 trials• DDPG, NAF (best known value): 2000 trials• Q-Prop: 2000 trials• MDGPS: 500 trials

Page 17: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

MDGPS with Demonstrations and Path Integral Policy Iteration

MrinalKalakrishnanYevgen Chebotar Adrian LiAli Yahya

much better handling of non-smoothproblems (e.g. discontinuities)

requires more samples, works bestwith demo initialization

Page 18: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Challenges in Deep Reinforcement Learning

1. Stability

2. Efficiency

3. Scale

Page 19: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

ingredients for success in learning:

supervised learning: reinforcement learning:

computation

algorithms

data

computation

algorithms~data?

L., Pastor, Krizhevsky, Quillen ‘16

Page 20: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Policy Learning with Multiple Robots

Local policy optimization Global policy optimization

Rollout execution

MrinalKalakrishnan Yevgen ChebotarAdrian LiAli Yahya

Page 21: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Yahya, Li, Kalakrishnan, Chebotar, L., ‘16

Page 22: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Policy Learning with Multiple Robots: Deep RL with NAF

Gu*, Holly*, Lillicrap, L., ‘16

Shane Gu Ethan Holly Tim Lillicrap

Page 23: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Future Outlook & Future Challenges

Stability remains a huge challenge• Can’t do hyperparameter sweeps in the real world…• Likely missing a few more pieces of theory

High efficiency is important, but what about diversity?• Efficiency seems at odds with generalization• Massively off-policy learning• Semi-supervised learning

(not addressed in this talk) What about the reward function? Highly nonobvious how to set in the real world

Page 24: Challenges in Deep Reinforcement Learning · PDF file• Discuss some recent work in deep reinforcement learning ... Schulman et al. ... • path integral policy

Acknowledgements

Shane Gu Ethan Holly Tim LillicrapChelsea FinnHarley

Montgomery Anurag Ajay

MrinalKalakrishnan Yevgen ChebotarAdrian LiAli Yahya