elec 576: deep reinforcement learning€¦ · elec 576: deep reinforcement learning ankit b. patel,...

51
ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE Dept.) 11-08-2017

Upload: others

Post on 31-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

ELEC 576: Deep Reinforcement Learning

Ankit B. Patel, CJ BarberanBaylor College of Medicine (Neuroscience Dept.)

Rice University (ECE Dept.) 11-08-2017

Page 2: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Value-Based Deep Reinforcement Learning

Page 3: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Value-Based Deep RL• The value function is

represented by Q-network with weights w

[David Silver ICML 2016]

Page 4: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Q-Learning• Optimal Q-values obey Bellman Equation

• Minimise MSE loss via SGD

• Diverges due to

• Correlations between samples

• Non-stationary targets

Page 5: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Interesting Question• What can go wrong in reinforcement learning that

doesn’t go wrong in standard supervised learning?

Page 6: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Experience Replay• Build dataset from agent’s own

experience

• Sample experiences from the dataset and apply update

[David Silver ICML 2016]

Page 7: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Benefits of Experience Replay

• Greater data efficiency

• Breaks the correlations

• Smoothing out learning and avoiding oscillations or divergence in parameters

Page 8: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

DQN w/ Experience Replay

[Deepmind 2014]

Page 9: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

DQN: Atari 2600

[David Silver ICML 2016]

Page 10: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

DQN: Atari 2600• End-to-end learning of Q(s,a) from the frames

• Input state is stack of pixels from last 4 frames

• Output is Q(s,a) for the joystick/button positions

• Varies with different games (~3-18)

• Reward is change in score for that step

Page 11: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

DQN: Atari 2600• Network architecture and hyper parameters are

fixed across all the games

[David Silver ICML 2016]

Page 12: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

DQN: Atari Video

[Jon Juett Youtube]

Page 13: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

DQN: Atari Video

[PhysX Vision Youtube]

Page 14: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Improvements on the Basic DQN Algorithm

Page 15: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Double DQN• Double DQN

• Current Q-network w is used to select actions

• Older Q-network is w- is used to evaluate actions

• Prioritized reply

• Store experience in priority queue by the DQN error

Page 16: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Dueling Network• Duelling Network

• Action-independent value function V(s,v)

• Action-dependent advantage function A(s,a,w)

• Q(s,a) = V(s,v) + A(s,a,w)

Page 17: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Prioritized Experience Replay

[Schaul et al 2016]

Page 18: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Prioritized Experience Replay

[Schaul et al 2016]

Page 19: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Improvements after DQN

[Marc G. Bellemare Youtube]

Page 20: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

General Reinforcement Learning Architecture

[David Silver ICML 2016]

Page 21: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Policy-based Deep Reinforcement Learning

Page 22: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Deep Policy Network• The policy is represented by a network with weights u

• Define objective function as total discounted reward

• Optimise objective via SGD

• Adjust policy parameters u to achieve higher reward

[David Silver ICML 2016]

Page 23: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Policy Gradients• Gradient of a stochastic policy

• Gradient of a deterministic policy

[David Silver ICML 2016]

Page 24: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Actor Critic Algorithm• Estimate value function Q(s, a, w)

• Update policy parameters u via SGD

[David Silver ICML 2016]

Page 25: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Asynchronous Advantage Actor-Critic: Labyrinth

[DeepMind Youtube]

Page 26: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

A3C: Labyrinth• End-to-end learning of softmax policy from raw

pixels

• Observations are pixels of the current frame

• State is an LSTM

• Outputs both value V(s) & softmax over actions

• Task is to collect apples (+1) and escape (+10)

[David Silver ICML 2016]

Page 27: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Model-based Deep Reinforcement Learning

Page 28: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Learning Models of the Environment

• Generative model of Atari 2600

• Issues

• Errors in transition model compound over the trajectory

• Planning trajectory differ from executed trajectories

• Long, unusual trajectory rewards are totally wrong

[David Silver ICML 2016]

Page 29: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Learning Models of the Environment

[Junhyuk Oh Youtube]

Page 30: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Newer Implementations

Page 31: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

AlphaGo

[Deepmind Nature]

Page 32: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Policy and Value Networks

[Deepmind Nature]

Page 33: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Monte Carlo Tree Search

[Deepmind Nature]

Page 34: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

AlphaGo Results

[Deepmind Nature]

Page 35: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Which Move to Make

[Deepmind Nature]

Page 36: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Five Matches Against Fan Hui

[Deepmind Nature]

Page 37: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Neural Architecture Search with Reinforcement Learning• Uses an RNN to generate model descriptions of the

NNs

• Trains the RNN with RL to maximize expected accuracy of generated architectures on validation set

Page 38: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Neural Architecture Search

[Zoph & Le Arxiv 2016]

Page 39: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

RNN Controller

[Zoph & Le Arxiv 2016]

Page 40: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Training with REINFORCE

[Zoph & Le Arxiv 2016]

Page 41: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Parallel Training

[Zoph & Le Arxiv 2016]

Page 42: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Increase Architecture Complexity

[Zoph & Le Arxiv 2016]

Page 43: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Constructing the Network

[Zoph & Le Arxiv 2016]

Page 44: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Results

[Zoph & Le Arxiv 2016]

Page 45: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Reinforcement Learning with Unsupervised Auxiliary Tasks• Train an agent that maximises other pseudo-reward

functions simultaneously by RL

• Those tasks have to develop in absence of extrinsic rewards

• Outperforms previous state-of-the-art on atari

• Averaging 880% expert human performance

Page 46: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

UNREAL Agent

[Jaderberg et. al Arxiv 2016]

Page 47: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Auxiliary Control Tasks

[Jaderberg et. al Arxiv 2016]

Page 48: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Auxiliary Tasks

[Jaderberg et. al Arxiv 2016]

Page 49: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

UNREAL Algorithm

A3C Loss is minimised on policyValue function is optimised from replayed data

Auxiliary control loss is optimised off-policy from replied dataReward loss is optimised from rebalanced replay data

[Jaderberg et. al Arxiv 2016]

Page 50: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Atari Results

[Jaderberg et. al Arxiv 2016]

Page 51: ELEC 576: Deep Reinforcement Learning€¦ · ELEC 576: Deep Reinforcement Learning Ankit B. Patel, CJ Barberan Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE

Reinforcement Learning with Unsupervised Auxiliary Tasks Video

[DeepMind Youtube]