aa203 optimal and learning-based control · 5/6/20 aa 203 | lecture 10 9. simplifying the notation...
TRANSCRIPT
![Page 1: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/1.jpg)
AA203Optimal and Learning-based Control
Introduction to MPC, persistent feasibility
![Page 2: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/2.jpg)
Roadmap
Optimal control
Open-loop
Indirect methods
Direct methods
Closed-loop
DP HJB / HJI
MPC
Adaptiveoptimal control Model-based RL
Linear methods
Non-linear methods
AA 203 | Lecture 105/6/20LQR iLQR DDP
Model-free RL
LQR Reachability analysis
State/controlparam
Controlparam
2
![Page 3: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/3.jpg)
Agenda
• Introduction to MPC• Persistent feasibility of MPC
• Readings:• F. Borrelli, A. Bemporad, M. Morari. Predictive Control for Linear and Hybrid
Systems, 2017.• J. B. Rawlings, D. Q. Mayne, M. M. Diehl. Model Predictive Control: Theory,
Computation, and Design, 2017.
5/6/20 AA 203 | Lecture 10 3
![Page 4: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/4.jpg)
Model predictive control • Model predictive control (or, more broadly, receding horizon
control) entails solving finite-time optimal control problems in a receding horizon fashion
5/6/20 AA 203 | Lecture 10 4
![Page 5: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/5.jpg)
Model predictive control
Key steps:1. At each sampling time 𝑡, solve an open-loop optimal control
problem over a finite horizon2. Apply optimal input signal during the following sampling interval
𝑡, 𝑡 + 13. At the next time step 𝑡 + 1, solve new optimal control problem
based on new measurements of the state over a shifted horizon
5/6/20 AA 203 | Lecture 10 5
![Page 6: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/6.jpg)
Basic formulation
• Consider the problem of regulating to the origin the discrete-time linear invariant system
𝐱 𝑡 + 1 = 𝐴𝐱 𝑡 + 𝐵𝐮 𝑡 , 𝐱 𝑡 ∈ ℝ! , 𝐮 𝑡 ∈ ℝ"
subject to the constraints𝐱 𝑡 ∈ 𝑋, 𝐮 𝑡 ∈ 𝑈, 𝑡 ≥ 0
where the sets 𝑋 and 𝑈 are polyhedra
5/6/20 AA 203 | Lecture 10 6
![Page 7: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/7.jpg)
Basic formulation
• Assume that a full measurement of the state 𝐱(𝑡) is available at the current time 𝑡• The finite-time optimal control problem solved at each stage is
𝐽!∗ 𝐱 𝑡 = min𝐮!|!,…,𝐮!#$%&|!
𝑝 𝐱!&'|! +*)*+
',-
𝑐(𝐱!&)|!, 𝐮!&)|!)
5/6/20 AA 203 | Lecture 10 7
subject to 𝐱!&)&-|!= 𝐴𝐱!&)|! + 𝐵𝐮!&)|!, 𝑘 = 0,… ,𝑁 − 1
𝐱!&)|!∈ 𝑋, 𝐮!&)|!∈ 𝑈, 𝑘 = 0,… ,𝑁 − 1
𝐱!&'|!∈ 𝑋.𝐱!|!= 𝐱(𝑡)
![Page 8: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/8.jpg)
Basic formulation
Notation:• 𝐱#$%|# is the state vector at time 𝑡 + 𝑘 predicted at time 𝑡 (via the
system’s dynamics)• 𝐮#$%|# is the input 𝐮 at time 𝑡 + 𝑘 computed at time 𝑡
Note: 𝐱'|(≠ 𝐱'|)
5/6/20 AA 203 | Lecture 10 8
![Page 9: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/9.jpg)
Basic formulation
• Let 𝑈#→#$+|#∗ ≔ {𝐮#|#∗ , 𝐮#$(|#∗ , … , 𝐮#$+-(|#∗ } be the optimal solution, then
𝐮 𝑡 = 𝐮#|#∗ (𝐱(𝑡))• The optimization problem is then repeated at time 𝑡 + 1, based on
the new state 𝐱#$(|#$(= 𝐱(𝑡 + 1)• Define 𝜋# 𝐱 𝑡 ≔ 𝐮#|#∗ (𝐱(𝑡))• Then the closed-loop system evolves as
𝐱 𝑡 + 1 = 𝐴𝐱 𝑡 + 𝐵𝜋# 𝐱 𝑡 ≔ 𝐟./(𝐱 𝑡 , 𝑡)• Central question: characterize the behavior of closed-loop system
5/6/20 AA 203 | Lecture 10 9
![Page 10: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/10.jpg)
Simplifying the notation• Note that the setup is time-invariant, hence, to simplify the notation, we
can let 𝑡 = 0 in the finite-time optimal control problem, namely
• Denote 𝑈!∗ 𝐱 𝑡 = {𝐮!∗ , … , 𝐮#$%∗ }5/6/20 AA 203 | Lecture 10 10
𝐽+∗ 𝐱 𝑡 = min𝐮',…,𝐮$%&
𝑝 𝐱' +*)*+
',-
𝑐(𝐱), 𝐮))
subject to 𝐱)&-= 𝐴𝐱) + 𝐵𝐮), 𝑘 = 0,… ,𝑁 − 1
𝐱)∈ 𝑋, 𝐮)∈ 𝑈, 𝑘 = 0,… ,𝑁 − 1
𝐱'∈ 𝑋.𝐱+= 𝐱(𝑡)
![Page 11: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/11.jpg)
Simplifying the notation
• With new notation,𝐮 𝑡 = 𝐮0∗ 𝐱 𝑡 = 𝜋(𝐱(𝑡))
and closed-loop system becomes 𝐱 𝑡 + 1 = 𝐴𝐱 𝑡 + 𝐵𝜋 𝐱 𝑡 ≔ 𝐟./(𝐱 𝑡 )
5/6/20 AA 203 | Lecture 10 11
![Page 12: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/12.jpg)
Typical cost functions
• 2-norm:𝑝 𝐱+ = 𝐱+1 𝑃𝐱+ , 𝑐 𝐱% , 𝐮% = 𝐱%1 𝑄𝐱%+ 𝐮%1 𝑅𝐮% , 𝑃 ≥ 0, 𝑄 ≥ 0, 𝑅 > 0
• 1-norm or ∞-norm:𝑝 𝐱+ = 𝑃𝐱+ 2 𝑐 𝐱% , 𝐮% = 𝑄𝐱% 2+ 𝑅𝐮% 2, 𝑝 = 1 or ∞
where 𝑃, 𝑄, 𝑅 are full column ranks
5/6/20 AA 203 | Lecture 10 12
![Page 13: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/13.jpg)
Online model predictive control
repeat
5/6/20 AA 203 | Lecture 10 13
measure the state 𝐱(𝑡) at time instant 𝑡obtain 𝑈0∗ 𝐱 𝑡 by solving finite-time optimal control problemif 𝑈0∗ 𝐱 𝑡 = ∅ then ‘problem infeasible’ stopapply the first element 𝐮0∗ of 𝑈0∗ 𝐱 𝑡 to the systemwait for the new sampling time 𝑡 + 1
![Page 14: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/14.jpg)
Main implementation issues
1. The controller may lead us into a situation where after a few steps the finite-time optimal control problem is infeasible→ persistent feasibility issue
2. Even if the feasibility problem does not occur, the generated control inputs may not lead to trajectories that converge to the origin (i.e., closed-loop system is unstable) → stability issue
Key question: how do we guarantee that such a “short- sighted” strategy leads to effective long-term behavior?
5/6/20 AA 203 | Lecture 10 14
![Page 15: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/15.jpg)
Analysis approaches
1. Analyze closed-loop behavior directly → generally very difficult
2. Derive conditions on terminal function 𝑝, and terminal constraint set 𝑋3 so that persistent feasibility and closed-loop stability are guaranteed
5/6/20 AA 203 | Lecture 10 15
![Page 16: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/16.jpg)
Addressing persistent feasibility
Goal: design MPC controller so that feasibility for all future times is guaranteed
Approach: leverage tools from invariant set theory
5/6/20 AA 203 | Lecture 10 16
![Page 17: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/17.jpg)
Set of feasible initial states
• Set of feasible initial states
𝑋+ ≔ 𝐱+ ∈ 𝑋 ∃ 𝐮+, … , 𝐮',- such that 𝐱) ∈ 𝑋, 𝐮) ∈ 𝑈, 𝑘 = 0,… ,𝑁 − 1,𝐱' ∈ 𝑋. where 𝐱)&- = 𝐴𝐱) + 𝐵𝐮), 𝑘 = 0,… ,𝑁 − 1}
• A control input can be found only if 𝐱(0) ∈ 𝑋0!
5/6/20 AA 203 | Lecture 10 17
![Page 18: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/18.jpg)
Controllable sets
• For the autonomous system 𝐱 𝑡 + 1 = 𝜙(𝐱(𝑡))with constraints 𝐱 𝑡 ∈ 𝑋, 𝐮 𝑡 ∈ 𝑈, the one-step controllable set to set 𝑆 is defined as
Pre 𝑆 ≔ {𝐱 ∈ ℝ! ∶ 𝜙 𝐱 ∈ 𝑆}
• For the system 𝐱 𝑡 + 1 = 𝜙 𝐱 𝑡 , 𝐮 𝑡 with constraints 𝐱 𝑡 ∈ 𝑋,𝐮 𝑡 ∈ 𝑈, the one-step controllable set to set 𝑆 is defined as
Pre 𝑆 ≔ {𝐱 ∈ ℝ! ∶ ∃𝑢 ∈ 𝑈 such that 𝜙 𝐱, 𝐮 ∈ 𝑆}
5/6/20 AA 203 | Lecture 10 18
![Page 19: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/19.jpg)
Control invariant sets
• A set 𝐶 ⊆ 𝑋 is said to be a control invariant set for the system 𝐱 𝑡 + 1 = 𝜙 𝐱 𝑡 , 𝐮 𝑡 with constraints 𝐱 𝑡 ∈ 𝑋, 𝐮 𝑡 ∈ 𝑈, if:
𝐱 𝑡 ∈ 𝐶 ⇒ ∃𝐮 ∈ 𝑈 such that 𝜙 𝐱 𝑡 , 𝐮 𝑡 ∈ 𝐶, for all 𝑡
• The set 𝐶4 ⊆ 𝑋 is said to be the maximal control invariant set for the system 𝐱 𝑡 + 1 = 𝜙 𝐱 𝑡 , 𝐮 𝑡 with constraints 𝐱 𝑡 ∈ 𝑋,𝐮 𝑡 ∈ 𝑈, if it is control invariant and contains all control invariant sets contained in 𝑋
• These sets can be computed by using the MPT toolbox https://www.mpt3.org/
5/6/20 AA 203 | Lecture 10 19
![Page 20: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/20.jpg)
Persistent feasibility lemma
• Define “truncated” feasibility set:𝑋- ≔ 𝐱- ∈ 𝑋 ∃ 𝐮-, … , 𝐮',- such that 𝐱) ∈ 𝑋, 𝐮) ∈ 𝑈, 𝑘 = 1,… ,𝑁 − 1,
𝐱' ∈ 𝑋. where 𝐱)&- = 𝐴𝐱) + 𝐵𝐮), 𝑘 = 1,… ,𝑁 − 1}
• Feasibility lemma: if set 𝑋( is a control invariant set for system:𝐱 𝑡 + 1 = 𝐴𝐱 𝑡 + 𝐵𝐮 𝑡 , 𝐱 𝑡 ∈ 𝑋, 𝐮 𝑡 ∈ 𝑈, 𝑡 ≥ 0
then the MPC law is persistently feasible
5/6/20 AA 203 | Lecture 10 20
![Page 21: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/21.jpg)
Persistent feasibility lemma
• Proof:1. Pre 𝑋( = {𝐱 ∈ ℝ! ∶ ∃𝐮 ∈ 𝑈 such that 𝐴𝐱 + 𝐵𝐮 ∈ 𝑋(}2. Since 𝑋( is control invariant
∀𝐱 ∈ 𝑋( ∃𝐮 ∈ 𝑈 such that 𝐴𝐱 + 𝐵𝐮 ∈ 𝑋(3. Thus 𝑋( ⊆ Pre 𝑋( ∩ 𝑋4. One can write𝑋0 = 𝐱0 ∈ 𝑋 ∃𝐮0 ∈ 𝑈 such that 𝐴𝐱0 + 𝐵𝐮 ∈ 𝑋(} = Pre 𝑋( ∩ 𝑋
5. Thus, 𝑋( ⊆𝑋0
5/6/20 AA 203 | Lecture 10 21
![Page 22: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/22.jpg)
Persistent feasibility lemma
• Proof:6. Pick some 𝐱0 ∈ 𝑋0. Let 𝑈0∗ be the solution to the finite-time
optimization problem, and 𝐮0∗ be the first control. Let𝐱( = 𝐴𝐱0 + 𝐵𝐮0∗
7. Since 𝑈0∗ is clearly feasible, one has 𝐱( ∈ 𝑋(. Since 𝑋( ⊆𝑋0, one has
𝐱( ∈ 𝑋0hence the next optimization problem is feasible!
5/6/20 AA 203 | Lecture 10 22
![Page 23: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/23.jpg)
Practical significance
• For 𝑁 = 1, we can set 𝑋3 = 𝑋(. If we choose the terminal set to be control invariant, then MPC will be persistently feasible independentof chosen control objectives and parameters• Designer can choose the parameters to affect performance (e.g.,
stability)• How to extend this result to 𝑁 > 1?
5/6/20 AA 203 | Lecture 10 23
![Page 24: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/24.jpg)
Persistent feasibility theorem
• Feasibility theorem: if set 𝑋3 is a control invariant set for system:𝐱 𝑡 + 1 = 𝐴𝐱 𝑡 + 𝐵𝐮 𝑡 , 𝐱 𝑡 ∈ 𝑋, 𝐮 𝑡 ∈ 𝑈, 𝑡 ≥ 0
then the MPC law is persistently feasible
5/6/20 AA 203 | Lecture 10 24
![Page 25: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/25.jpg)
Persistent feasibility theorem
• Proof1. Define “truncated” feasibility set at step 𝑁 − 1:
𝑋+-( ≔ 𝐱+-( ∈ 𝑋 ∃ 𝐮+-( such that 𝐱+-( ∈ 𝑋, 𝐮+-( ∈ 𝑈,𝐱+∈ 𝑋3 where 𝐱+ = 𝐴𝐱+-( + 𝐵𝐮+-(}
2. Due to the terminal constraint𝐴𝐱+-( + 𝐵𝐮+-( = 𝐱+ ∈ 𝑋3
5/6/20 AA 203 | Lecture 10 25
![Page 26: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/26.jpg)
Persistent feasibility theorem
• Proof3. Since 𝑋3 is a control invariant set, there exists a 𝐮 ∈ 𝑈 such that
𝐱$ = 𝐴𝐱+ + 𝐵𝐮 ∈ 𝑋34. The above is indeed the requirement to belong to set 𝑋+-(5. Thus, 𝐴𝐱+-( + 𝐵𝐮+-( = 𝐱+ ∈ 𝑋+-(6. We have just proved that 𝑋+-( is control invariant 7. Repeating this argument, one can recursively show that 𝑋+-),
𝑋+-', ⋯ , 𝑋( are control invariant, and the persistent feasibility lemma then applies
5/6/20 AA 203 | Lecture 10 26
![Page 27: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/27.jpg)
Practical aspects of persistent feasibility
• The terminal set 𝑋3 is introduced artificially for the sole purpose of leading to a sufficient condition for persistent feasibility• We want it to be large so that it does not compromise closed-loop
performance• Though it is simplest to choose 𝑋3 = 0, this is generally undesirable • We’ll discuss better choices in the next lecture
5/6/20 AA 203 | Lecture 10 27
![Page 28: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/28.jpg)
Example
• System with 𝑛 = 3 states, 𝑚 = 2 inputs• 𝐴, 𝐵 chosen randomly • quadratic stage cost: 𝑐 𝐱, 𝐮 = 𝐱1𝐱 + 𝐮1𝐮• 𝑋 = 𝐱 𝐱 4 ≤ 1}, 𝑈 = {𝐮 | 𝐮 4 ≤ 0.5}• initial point: (−0.9, 0.9, −0.9)
5/6/20 AA 203 | Lecture 10 28
![Page 29: AA203 Optimal and Learning-based Control · 5/6/20 AA 203 | Lecture 10 9. Simplifying the notation •Note that the setup is time-invariant, hence, to simplify the notation, we can](https://reader033.vdocument.in/reader033/viewer/2022052106/60412e962d483b014116923c/html5/thumbnails/29.jpg)
Next time
• Stability of MPC, explicit MPC, and practical aspects
AA 203 | Lecture 105/6/20 29