reinforcement learning in control theory
TRANSCRIPT
Reinforcement Learning in
Control TheoryFarnaz Adib Yaghmaie
1 Introduction to RL
2 RL for Continuous-time Systems
3 Simulation results
4 Open Problems
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 2 / 34
Introduction to RL
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 3 / 34
Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Finding suitable actions to take in a given
situation in order to maximize a reward 1.
1Richard S Sutton & Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT
press Cambridge, 1998.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 4 / 34
Beating Atari champions2
2Mnih et al. Playing Atari with Deep Reinforcement Learning, arXiv preprint, 2013.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 5 / 34
Make a Robot Walk 3
3Schulman et al. Trust Region Policy Optimization, arXiv preprint, 2017.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 6 / 34
A graphical representation
Photo Credit: @en.wikipedia.org/wiki/Reinforcement_learning
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 7 / 34
Optimality or adaptivity or both?
Optimal Design
• Offline• Model-base
Adaptive Design
• Online• Non-optimal• Model-free
Reinforcement Learning: Optimality +Adaptivity 4.
4Lewis et al. Reinforcement learning and feedback control, IEEE Control Systems Magazine,
2012.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 8 / 34
Markov Decision Process (MDP)
• States
• Actions
• Transition
probability
• Reward
• Discount factor
Photo Credit: @ https://en.wikipedia.org/wiki/Markov_decision_process
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 9 / 34
RL for Continuous-timeSystems
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 10 / 34
Optimal control problem
• Agent
u = k(x).• Environment
x = f(x) + g(x)u = F (x, u).• Average value function
Va(x(t), u(t)) = limT →∞
1T
∫ t+T
tr(x(τ), u(τ))dτ.
Optimal average cost
λ∗.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 11 / 34
Optimal control problem
Optimize the value function
Va(x(t), u(t)) = limT →∞
1T
∫ t+T
tr(x(τ), u(τ))dτ.
with respect to
x = f(x) + g(x)u = F (x, u).
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 12 / 34
Do I sell optimal control problem again with the fancy
name RL???
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 13 / 34
Why to use RL at all?
• No analytical solution in general
• Difficult to solve
• Computation is offline
• Exact knowledge of dynamics is required
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 14 / 34
The key to RL
Bellman Principle ofOptimality
&Finding Fixed point equations!
λ∗ = minu
[r(x(t), u(t)) + ∇V ∗T∞ F (x, u)].
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 15 / 34
Integral Reinforcement Learning (IRL)
Bellman in integral form 5 6
V ∗(x(t)) = minu
∫ t+T
tr(x(τ), u(τ)) − λ∗dτ + V ∗(x(t + T ))).
No dynamics is involved.
5Vrabie et al. Adaptive optimal control for continuous-time linear systems based on policy
iteration, Automatica, 20096Adib Yaghmaie et al. Reinforcement Learning for Average Cost Optimal Control Problem with
an Application in Tracking, TAC, Under revision, 2018.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 16 / 34
Methods to solve Bellman equation
• Monte-Carlo
• Temporal Difference Learning
• Least squares Temporal Difference Learning
• Their variations/combinations
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 17 / 34
Algorithm 1 Policy Iteration Algoritm
1: Initialize: u(0), k = 0.2: repeat
3: Step 1: Policy evaluation
V (k)∞ (x(t)) =
∫ t+∆T
tQ(x) + u(k)T Ru(k)dτ
+ V (k)∞ (x(t + ∆T )) − λ(k)∆T.
4: Step 2: Policy improvement
u(k+1) = −12R−1gT ∇V (k)
∞ .
5: until Convergence
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 18 / 34
Simulation results
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 19 / 34
Tracking problem
• System
xs = fs(xs) + gs(xs)u.
• Reference
xr = fr(xr).
• Design u such that
xs − xr → 0.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 20 / 34
Solution to tracking problem
Theorem 3.1 If a feedback controller ufb stabilizes
the system and
fr(xr) = fs(xr) + gs(xr)uff (xr),
then the following controller solves the tracking
problem
u(xr, xr) = uff (xr) + ufb(xs − xr).
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 21 / 34
System Model
System 6
xs =[
0 1−5 −0.5
]xs +
[01
]u.
Reference
xr =[
0 1−5 0
]xr, xr0 = [0, 0.5
√5]T .
Tracking problem
limt→∞
e = limt→∞
(xs − xr) → 0.
6Adib Yaghmaie et al. Reinforcement Learning for Average Cost Optimal Control Problem with
an Application in Tracking, TAC, Under revision, 2018.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 22 / 34
Define x = [eT , xTr ]T
x =
0 1 0 0
−5 −0.5 0 −0.50 0 0 10 0 −5 0
x +
0100
u.
Running cost
r(x, u) = xT
100 0 0 00 100 0 00 0 0 00 0 0 0
x + u2.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 23 / 34
Analytical solution
u = kfbe + kff xr.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 24 / 34
Offline solution-Feedback gain
ARE([
0 1−5 −0.5
],
[01
],
[100 00 100
], 1)
Feedback controller
ufb(e) = kfbe =[−6.18 −10.11
]e.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 25 / 34
Offline solution-Feedforward gain
Tracking condition
fr(xr) = fs(xr) + gs(xr)uff (xr).
Feedforward controller
uff (xr) = kff xr =[0 0.5
]xr.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 26 / 34
Offline solution-Tracking controller
u =[−6.18 −10.11
]e +
[0 0.5
]xr
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 27 / 34
RL framework
Optimize the value function
Va(x(t), u(t)) = limT →∞
1T
∫ t+T
teT Qe + u2dτ.
with respect to
[exr
]=
0 1 0 0
−5 −0.5 0 −0.50 0 0 10 0 −5 0
[
exr
]+
0100
u
which is unknown.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 28 / 34
RL framework
Value function V ∗∞ = wT Φ(x)
Φ(x) = [x21, x1x2, x1x3, x1x4, x2
2, x2x3, x2x4,
x23, x3x4, x2
4].
Optimal controller
u∗ = −12R−1gT ∇Φw.
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 29 / 34
RL solution and comparison with offline method
Analytical
w∗ = [116.14 12.36 # #10.11 # # #
# #]
K∗ = [-6.18 -10.11 0 0.5]
λ∗ = 0.1563
RL Algorithm
w(9) = [116.14 12.36 -4.51 -0.06
10.11 0.034 -0.96 0
-0.12 0]
K(9) = [-6.18 -10.11 -0.01 0.48]
λ(9) = 0.1559
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 30 / 34
State and control
0 5 10 15 20 25 30 35 40
t
-5
-4
-3
-2
-1
0
1
2
3
4
x
e1
e2
Figure: state e
0 5 10 15 20 25 30 35 40
t
-200
-150
-100
-50
0
50
100
150
200
250
u
Figure: Control
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 31 / 34
Weights
Figure: The weight w(k)
0 5 10 15 20 25 30 35 40
t
0
10
20
30
40
50
60
70
80
35 400.1559250.15593
Figure: The average cost λ(k)
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 32 / 34
Tracking from random initial condition
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
x1
-1.5
-1
-0.5
0
0.5
1
1.5
2x 2
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 33 / 34
Open Problems
Reinforcement Learning in Control Theory Farnaz Adib Yaghmaie Nov 12, 2018 34 / 34
• Stability of dynamic system
• Really model-free?
• Deep RL in control?