reinforcement learning with matlab & simulinkcode for configuring agent and training. 17 create...

37
1 © 2019 The MathWorks, Inc. Reinforcement Learning with MATLAB & Simulink Christoph Stockhammer MathWorks Application Engineering

Upload: others

Post on 04-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

1© 2019 The MathWorks, Inc.

Reinforcement Learning with MATLAB & Simulink

Christoph Stockhammer MathWorks Application Engineering

Page 2: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

2

Unsupervised

Learning[No Labeled Data]

Clustering

Machine Learning

Machine Learning, Deep Learning, and Reinforcement Learning

Page 3: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

3

Unsupervised

Learning[No Labeled Data]

Supervised Learning

[Labeled Data]

Clustering Classification Regression

Machine Learning

Machine Learning, Deep Learning, and Reinforcement Learning

Page 4: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

4

Unsupervised

Learning[No Labeled Data]

Supervised Learning

[Labeled Data]

Clustering Classification Regression

Machine Learning

Machine Learning, Deep Learning, and Reinforcement Learning

Deep Learning

Supervised learning typically involves

manual feature extraction

Deep learning typically does not

involve feature extraction

Page 5: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

5

Unsupervised

Learning[No Labeled Data]

Supervised Learning

[Labeled Data]

Clustering Classification Regression

Deep Learning

Machine Learning

Reinforcement

Learning

[Interaction Data]

Decision

MakingControl

Reinforcement Learning vs Machine Learning

Reinforcement learning:

Learning a behavior or

accomplishing a task through

trial & error

[interaction]

Complex problems typically need

deep models

[Deep Reinforcement Learning]

Reinforcement Learning Toolbox

New in

Page 6: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

6

Reinforcement Learning enables the use of Deep Learning for

Controls and Decision Making Applications

A.I. Gameplay

Finance

Robotics

Autonomous driving

Page 7: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

7

Why is reinforcement learning appealing?

Teach a robot to follow a straight line using camera data

Page 8: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

8

Let’s try to solve this problem the traditional way

Motor

Control

Leg &

Trunk

Trajectories

Balance

Motor

Commands

Observations

Sensors

Camera

Data

Feature

Extraction

State

EstimationController

Observations

Motor

Commands

Page 9: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

9

What is the alternative approach?

Sensors

Camera

Data

Feature

Extraction

State

EstimationController

Observations

Motor

Commands

Camera

Data

Sensors

Motor

Commands

How do we

design this?

Page 10: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

10

A Practical Example of Reinforcement LearningTraining a Robot to Walk

Robot’s computer learns how to walk…

(agent)

using sensor readings from joints, torso,…

(state)

that represent robot’s pose and orientation,…

(environment)

by generating joint torque commands,…

(action)

based on an internal state-to-action mapping…

(policy)

that tries to optimize forward locomotion, …

(reward).

The policy is updated through repeated trial-

and-error by a reinforcement learning

algorithm

AGENT

Reinforcement

Learning

Algorithm

Policy

ENVIRONMENT

ACTION

REWARD

STATE

Policy

update

Page 11: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

11

Connections with Controls

OtAt

Rt = 𝑥 𝑡 𝑇𝑅𝑥(𝑡) + 𝑢 𝑡 𝑇𝑄𝑢(𝑡)

Page 12: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

12

Define Environment to Generate Data

Physical modeling of robot

dynamics and contact

forces using Simscape

Agent

The environment provides 29 observations to the agent.

The observations are: Y (lateral) and Z (vertical) translations of the torso center of mass; X (forward), Y (lateral), and

Z (vertical) translation velocities; yaw, pitch, and roll angles of the torso; yaw, pitch, and roll angular velocities;

angular position and velocity of 3 joints (ankle, knee, hip) on both legs; and previous actions from the agent. The

translation in the Z direction is normalized to a similar range as the other observations.

Page 13: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

13

Define Environment to Generate Data

AgentReward defines task to

learn

Page 14: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

14

Define Environment to Generate Data

*Reward function inspired by: N. Heess et al, "Emergence of Locomotion Behaviours in Rich Environments," Technical Report, ArXiv, 2017.

Page 15: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

15

Define Policy and Learning Algorithm

Define agent’s policy and

learning algorithm

Agent

Page 16: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

16

Code for Configuring Agent and Training

Page 17: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

17

Create Critic Network

Page 18: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

18

Create Actor Network

Page 19: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

19

Create DDPG Agent

Page 20: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

20

Training the Agent

Page 21: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

21

Train Robot to Walk and Track Progress

Episode Number

Epis

ode R

ew

ard

Page 22: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

22

Train Robot to Walk and Track Progress

Episode Number

...

Accelerate Training with Parallel Computing

Epis

ode R

ew

ard

Page 23: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

23

Deploy Policy to Embedded Device

Page 24: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

24

Everything is Great, Right?

Page 25: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

25

Reward Function Design Matters

Page 26: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

26

Reward Function Design Matters

Page 27: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

27

Reward Function Design Matters

Page 28: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

28

Reward Function Design Matters

Page 29: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

29

Reward Function Design Matters

Page 30: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

30

Simulation and Virtual Models are a Key Aspect of Reinforcement

Learning

Reinforcement learning needs a lot of data

(sample inefficient)

– Training on hardware can be prohibitively

expensive and dangerous

Virtual models allow you to simulate conditions

hard to emulate in the real world

– This can help develop a more robust

solution

Many of you have already developed MATLAB

and Simulink models that can be reused

Page 31: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

31

Pros & Cons of Reinforcement Learning

Pros Cons

No need to label data before training A lot of simulation trials required

Complex end-to-end solutions can be developed

(e.g. camera input→ car steering wheel)

Reward signal design, network layer structure &

hyperparameter tuning can be challenging

Can be applied to uncertain, nonlinear

environments

No performance guarantees, Training may not

converge

Virtual models allow simulations of varying

conditions and training parallelization

Further training might be necessary after

deployment on real hardware

Page 32: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

32

Reinforcement Learning Toolbox

New in

Built-in and custom algorithms for reinforcement

learning

Environment modeling in MATLAB and Simulink

Deep Learning Toolbox support for designing policies

Training acceleration through GPUs and cloud

resources

Deployment to embedded devices and production

systems

Reference examples for getting started

Page 33: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

33

Predefined Environments and Many Examples

MATLAB Environment

– 'BasicGridWorld'

– 'CartPole-Discrete'

– 'CartPole-Continuous'

– 'DoubleIntegrator-Discrete'

– 'DoubleIntegrator-Continuous'

– 'SimplePendulumWithImage-Discrete'

– 'SimplePendulumWithImage-Continuous'

– 'WaterFallGridWorld-Stochastic'

– 'WaterFallGridWorld-Deterministic'

Simulink Environment

– 'SimplePendulumModel-Discrete'

– 'SimplePendulumModel-Continuous'

– 'CartPoleSimscapeModel-Discrete'

– 'CartPoleSimscapeModel-Continuous'

Examples

– Grid World, MDP

– Classical Control Benchmarks

– Automotive

– Robotics

– Custom LQR Agent

Page 34: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

34

Extensible Environment Interface

env = rlFunctionEnv(obsInfo,actInfo,stepfcn,resetfcn)

– obsInfo: Observation Specification

– actInfo: Action Specification

– stepfcn: Function handle for stepping the environment

– resetfcn: Function handle for resetting the environment

Subclassing from rl.env.MATLABEnvironment

– Custom MATLAB Environments

– Interfacing with 3rd party simulators (e.g. OpenAI Gym)

Page 35: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

35

Resources

Reference examples for controls,

robotics, and autonomous system

applications

Documentation written for

engineers and domain experts

Tech Talk video series on

reinforcement learning concepts for

engineers

Page 36: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

36

Reinforcement Learning Toolbox

New in

Built-in and custom algorithms for reinforcement

learning

Environment modeling in MATLAB and Simulink

Deep Learning Toolbox support for designing policies

Training acceleration through GPUs and cloud

resources

Deployment to embedded devices and production

systems

Reference examples for getting started

Page 37: Reinforcement Learning with MATLAB & SimulinkCode for Configuring Agent and Training. 17 Create Critic Network. 18 Create Actor Network. 19 Create DDPG Agent. 20 Training the Agent

37

Questions?