
Analytical Solution of Constrained LQ Optimal Control for Horizon 2

José De Doná

September 2004

Centre of Complex Dynamic Systems and Control

Outline

1 A Motivating Example: Cautious Design; Serendipitous Design; Tactical Design

2 Explicit Solutions: Explicit vs Implicit

3 A Simple Problem with N = 2: Dynamic Programming

4 Motivating Example Revisited

5 Conclusions


A Motivating Example

Consider the double integrator plant:

$$\frac{d^2 y(t)}{dt^2} = u(t).$$

The zero-order hold discretisation with sampling period 1 is:

$$x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k,$$

with

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$

Assume the actuator has a maximum and minimum allowed value (saturation) of ±1. Thus, the controller has to satisfy the constraint $|u_k| \le 1$ for all $k$.
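The discretisation is easy to verify numerically. The following is a minimal sketch, assuming Python with NumPy and SciPy (tools not mentioned in the original slides):

```python
import numpy as np
from scipy.signal import cont2discrete

# Continuous-time double integrator: d^2 y / dt^2 = u,
# with state x = [position, velocity]^t.
Ac = np.array([[0.0, 1.0],
               [0.0, 0.0]])
Bc = np.array([[0.0],
               [1.0]])
Cc = np.array([[1.0, 0.0]])
Dc = np.array([[0.0]])

# Zero-order hold discretisation with sampling period 1.
Ad, Bd, Cd, Dd, dt = cont2discrete((Ac, Bc, Cc, Dc), dt=1.0, method='zoh')
print(Ad)  # [[1. 1.] [0. 1.]]
print(Bd)  # [[0.5] [1. ]]
```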


A Motivating Example

The schematic of the feedback control loop is shown in the Figure.

Figure: Feedback control loop (controller → sat → linear system, with the state $x_k$ fed back to the controller).

“sat” represents the actuator modelled by the saturation function

$$\mathrm{sat}(u) = \begin{cases} 1 & \text{if } u > 1, \\ u & \text{if } |u| \le 1, \\ -1 & \text{if } u < -1. \end{cases}$$
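In code, this saturation is a single clip operation; a sketch, again assuming NumPy:

```python
def sat(u, delta=1.0):
    """Saturation function: clip u to the interval [-delta, delta]."""
    return float(np.clip(u, -delta, delta))
```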


(i) Cautious Design

Suppose the initial condition is $x_0 = \begin{bmatrix} -6 & 0 \end{bmatrix}^t$.

We start with a “low gain” linear state feedback controller (LQR), computed so as to minimise the objective function:

$$V_\infty(\{x_k\}, \{u_k\}) = \frac{1}{2} \sum_{k=0}^{\infty} \left( x_k^t Q x_k + u_k^t R u_k \right),$$

with weighting matrices $Q = C^t C = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $R = 20$.

The solution is then computed by solving the algebraic Riccati equation $P = A^t P A + Q - K^t (R + B^t P B) K$, where $K = (R + B^t P B)^{-1} B^t P A$. We obtain the linear state feedback law:

$$u_k = -K x_k = -\begin{bmatrix} 0.1603 & 0.5662 \end{bmatrix} x_k.$$
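The gain above can be reproduced with a standard discrete-time Riccati solver; a minimal sketch, assuming SciPy (the slides do not prescribe any particular software):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
C = np.array([[1.0, 0.0]])
Q = C.T @ C             # state weighting Q = C'C
R = np.array([[20.0]])  # cautious design: large input weighting

# Solve P = A'PA + Q - K'(R + B'PB)K, with K = (R + B'PB)^{-1} B'PA.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print(K)  # approximately [[0.1603 0.5662]]
```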


(i) Cautious Design

“Low gain” control: $Q = C^t C = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $R = 20$, $u_k = -K x_k$.

The resulting input and output sequences are shown in the Figure.

[Figure: input $u_k$ and output $y_k$ versus $k$, for $k = 0, \dots, 25$.]

We can see from the figure that the input $u_k$ satisfies the given constraint $|u_k| \le 1$ for all $k$, for this initial condition.

The response is rather slow. The “settling time” is of the order of 8 samples.


(ii) Serendipitous Design

“High gain” control: $Q = C^t C = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $R = 2$, $u_k = \mathrm{sat}(-K x_k)$.

The resulting input and output sequences are shown in the Figure.

[Figure: input $u_k$ and output $y_k$ versus $k$, for $k = 0, \dots, 25$.]

We can see from the figure that the input $u_k$ satisfies the given constraint $|u_k| \le 1$ for all $k$. The controller makes better use of the available control authority.

The amount of overshoot is essentially the same. The “settling time” is of the order of 5 samples.
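These responses follow from a simple closed-loop simulation; a sketch reusing A, B, C, Q, sat, and solve_discrete_are from the snippets above, with the gain recomputed for R = 2:

```python
# "High gain" design: recompute the LQR gain with R = 2.
R_hg = np.array([[2.0]])
P_hg = solve_discrete_are(A, B, Q, R_hg)
K_hg = np.linalg.solve(R_hg + B.T @ P_hg @ B, B.T @ P_hg @ A)

# Simulate x_{k+1} = A x_k + B sat(-K x_k) from x0 = [-6, 0]^t.
x = np.array([[-6.0], [0.0]])
us, ys = [], []
for k in range(26):
    u = sat((-K_hg @ x).item())  # saturated linear state feedback
    us.append(u)                 # input sequence u_k
    ys.append((C @ x).item())    # output sequence y_k = C x_k
    x = A @ x + B * u
```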


(ii) Serendipitous Design

“High gain” control: $Q = C^t C = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $R = 0.1$, $u_k = \mathrm{sat}(-K x_k)$.

The resulting input and output sequences are shown in the Figure.

[Figure: input $u_k$ and output $y_k$ versus $k$, for $k = 0, \dots, 25$.]

We can see from the figure that the input $u_k$ satisfies the given constraint $|u_k| \le 1$ for all $k$. The control sequence stays saturated much longer.

Significant overshoot. The “settling time” blows out to 12 samples.


Recapitulation

Going from $R = 20$ to $R = 2$: same overshoot, faster response.

Going from $R = 2$ to $R = 0.1$: large overshoot, long settling time.

Let us examine the state space trajectory corresponding to the serendipitous strategy $R = 0.1$, $u_k = \mathrm{sat}(-K x_k)$:

[Figure: state-space trajectory in the $(x_{1k}, x_{2k})$ plane, showing the three regions $R_0$, $R_1$, $R_2$.]

The control law $u = -\mathrm{sat}(Kx)$ partitions the state space into three regions. Hence, the controller can be characterised as a switched control strategy:

$$u = \mathcal{K}(x) = \begin{cases} -Kx & \text{if } x \in R_0, \\ 1 & \text{if } x \in R_1, \\ -1 & \text{if } x \in R_2. \end{cases}$$
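Classifying a state into these regions only requires comparing $Kx$ with the saturation limits; a hypothetical helper along these lines (the function name is my own):

```python
def region_sat(x, K, delta=1.0):
    """Region of the state space for the saturated feedback u = sat(-K x)."""
    v = (K @ x).item()
    if v < -delta:
        return "R1"  # feedback saturates: u = +delta
    if v > delta:
        return "R2"  # feedback saturates: u = -delta
    return "R0"      # unconstrained: u = -K x
```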


Recapitulation

[Figure: state-space trajectory in the $(x_{1k}, x_{2k})$ plane, showing the three regions $R_0$, $R_1$, $R_2$.]

$$u = \mathcal{K}(x) = \begin{cases} -Kx & \text{if } x \in R_0, \\ 1 & \text{if } x \in R_1, \\ -1 & \text{if } x \in R_2. \end{cases}$$

Heuristically, we can think, in this example, of $x_2$ as “velocity” and $x_1$ as “position.” Now, in our attempt to change the position rapidly (from −6 to 0), the velocity has been allowed to grow to a relatively high level (+3). This would be fine if the braking action were unconstrained. However, our input (including braking) is limited to the range [−1, 1]. Hence, the available braking is inadequate to “pull the system up,” and overshoot occurs.


(iii) Tactical Design

We will now start “afresh” with a formulation that incorporates constraints from the beginning in the design process.

A sensible idea would seem to be to try to “look ahead” and take account of future input constraints (that is, the limited braking authority available).

We now consider a Model Predictive Control law with prediction horizon $N = 2$. At each sampling instant $i$ and for the current state $x_i$, the two-step objective function:

$$V_2(\{x_k\}, \{u_k\}) = \frac{1}{2} x_{i+2}^t P x_{i+2} + \frac{1}{2} \sum_{k=i}^{i+1} \left( x_k^t Q x_k + u_k^t R u_k \right), \qquad (1)$$

is minimised subject to the equality and inequality constraints:

$$x_{k+1} = A x_k + B u_k, \quad |u_k| \le 1, \qquad k = i, i+1. \qquad (2)$$


(iii) Tactical Design

In the objective function (1), we set, as before, $Q = C^t C$ and $R = 0.1$. The terminal state weighting matrix $P$ is taken to be the solution of the algebraic Riccati equation $P = A^t P A + Q - K^t (R + B^t P B) K$, where $K = (R + B^t P B)^{-1} B^t P A$.

As a result of minimising (1) subject to the constraints (2), we obtain an optimal fixed-horizon control sequence $\{u_i^{\text{opt}}, u_{i+1}^{\text{opt}}\}$.

We then apply the resulting value of $u_i^{\text{opt}}$ to the system in a receding horizon control form.

We can see that this strategy has the ability to “look ahead” by considering the constraints not only at the current time $i$, but also one step ahead, $i + 1$.
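In code, each sampling instant amounts to solving a small box-constrained QP. One possible sketch, assuming SciPy's bounded optimiser and reusing A, B, Q, and solve_discrete_are from the earlier snippets (the slides do not specify an implementation):

```python
from scipy.optimize import minimize

r = 0.1  # input weighting for the MPC design
P = solve_discrete_are(A, B, Q, np.array([[r]]))  # terminal weight, from the Riccati equation

def V2(u, x0):
    """Two-step objective (1): stage costs plus terminal cost 0.5 x2' P x2."""
    x1 = A @ x0 + B * u[0]
    x2 = A @ x1 + B * u[1]
    stage = x0.T @ Q @ x0 + r * u[0]**2 + x1.T @ Q @ x1 + r * u[1]**2
    return 0.5 * (stage + x2.T @ P @ x2).item()

def mpc_step(x0, delta=1.0):
    """Minimise (1) subject to |u_k| <= delta and return the first move."""
    res = minimize(V2, np.zeros(2), args=(x0,),
                   bounds=[(-delta, delta)] * 2, method='L-BFGS-B')
    return res.x[0]
```

In a receding horizon loop, mpc_step would be called at every sample with the measured state, and only the returned first move applied.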


(iii) Tactical Design

MPC: $Q = C^t C = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $R = 0.1$, horizon $N = 2$.

The resulting input and output sequences are shown in the Figure.

[Figure: input $u_k$ and output $y_k$ versus $k$, for $k = 0, \dots, 25$.]

Dashed line: control $u_k = -K x_k$. Solid line: MPC.

We can see from the figure that the output trajectory with constrained input now has minimal overshoot. Thus, the idea of “looking ahead” has paid dividends.


Recapitulation

As we will see in this part of the course, the receding horizon control strategy (MPC) we have used can be described as a partition of the state space into different regions in which affine control laws hold.

Serendipitous strategy: $R = 0.1$, $u_k = \mathrm{sat}(-K x_k)$.

[Figure: state-space partition into the regions $R_0$, $R_1$, $R_2$ in the $(x_{1k}, x_{2k})$ plane.]

Receding horizon tactical design: $R = 0.1$, $N = 2$.

[Figure: state-space partition into the regions $R_0, R_1, R_2, R_3, R_4$ in the $(x_{1k}, x_{2k})$ plane.]


Explicit vs Implicit

Before we start studying how to find an explicit characterisation of the MPC solution, let us first define what procedures we would call explicit and which ones we would call implicit.

Explicit solution ($p$ = parameter; $a$, $b$ = constants):

$$f_p(z) = z^2 + 2apz + bp, \qquad \frac{\partial f_p}{\partial z} = 0 \;\Rightarrow\; z^{\text{opt}}(p) = -ap.$$

Implicit (numerical) solution ($p$ = parameter; $a$, $b$ = constants):

$$f_p(z) = z^2 + 2apz + bp,$$

minimised by a numerical iteration $z_0^p, \dots, z_k^p, z_{k+1}^p, \dots$ for each given value of $p$.


Problem Setup

We consider the discrete time system

$$x_{k+1} = A x_k + B u_k,$$

where $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}$. The pair $(A, B)$ is assumed to be stabilisable. We consider the following fixed horizon optimal control problem:

$$\mathcal{P}_N(x): \quad V_N^{\text{opt}}(x) = \min V_N(\{x_k\}, \{u_k\}), \qquad (3)$$

subject to:

$$x_0 = x,$$

$$x_{k+1} = A x_k + B u_k \quad \text{for } k = 0, 1, \dots, N-1,$$

$$u_k \in \mathbb{U} = [-\Delta, \Delta] \quad \text{for } k = 0, 1, \dots, N-1,$$

where ∆ > 0 is the input constraint level. The objective function is:

$$V_N(\{x_k\}, \{u_k\}) = \frac{1}{2} x_N^t P x_N + \frac{1}{2} \sum_{k=0}^{N-1} \left( x_k^t Q x_k + u_k^t R u_k \right). \qquad (4)$$


Problem Setup

Let the control sequence that achieves the minimum in (3) be

$$\{u_0^{\text{opt}}, u_1^{\text{opt}}, \dots, u_{N-1}^{\text{opt}}\}.$$

The receding horizon control law, which depends on the current state $x = x_0$, is

$$\mathcal{K}_N(x) = u_0^{\text{opt}}.$$

Can we obtain an explicit expression for $\mathcal{K}_N(\cdot)$ defined above, as a function of the parameter $x$?


A Simple Problem with N = 2

Consider now the following fixed horizon optimal control problem with prediction horizon $N = 2$:

$$\mathcal{P}_2(x): \quad V_2^{\text{opt}}(x) = \min V_2(\{x_k\}, \{u_k\}), \qquad (5)$$

subject to:

$$x_0 = x,$$

$$x_{k+1} = A x_k + B u_k \quad \text{for } k = 0, 1,$$

$$u_k \in \mathbb{U} = [-\Delta, \Delta] \quad \text{for } k = 0, 1.$$

The objective function in (5) is

$$V_2(\{x_k\}, \{u_k\}) = \frac{1}{2} x_2^t P x_2 + \frac{1}{2} \sum_{k=0}^{1} \left( x_k^t Q x_k + u_k^t R u_k \right).$$


A Simple Problem with N = 2

Objective function:

$$V_2(\{x_k\}, \{u_k\}) = \frac{1}{2} x_2^t P x_2 + \frac{1}{2} \sum_{k=0}^{1} \left( x_k^t Q x_k + u_k^t R u_k \right). \qquad (6)$$

The matrices $Q$ and $R$ are assumed positive semidefinite and positive definite, respectively. $P$ is taken as the solution to the algebraic Riccati equation

$$P = A^t P A + Q - K^t \bar{R} K,$$

where $K = \bar{R}^{-1} B^t P A$ and $\bar{R} = R + B^t P B$.

Let the control sequence that minimises (6) be

$$\{u_0^{\text{opt}}, u_1^{\text{opt}}\}.$$

Then the RHC law is given by the first element of the optimal sequence (which depends on the current state $x_0 = x$), that is,

$$\mathcal{K}_2(x) = u_0^{\text{opt}}.$$


Dynamic Programming for N = 2

We will use Dynamic Programming to find the analytical solution for this problem. We define the partial value functions as follows:

[Diagram: the trajectory $x_0 = x \;\xrightarrow{u_0}\; x_1 \;\xrightarrow{u_1}\; x_2$.]

$$V_0^{\text{opt}}(x_2) = \frac{1}{2} x_2^t P x_2,$$

$$V_1^{\text{opt}}(x_1) = \min_{\substack{u_1 \in \mathbb{U} \\ x_2 = A x_1 + B u_1}} \left\{ \frac{1}{2} x_2^t P x_2 + \frac{1}{2} x_1^t Q x_1 + \frac{1}{2} u_1^t R u_1 \right\},$$

$$V_2^{\text{opt}}(x) = \min_{\substack{u_k \in \mathbb{U} \\ x_{k+1} = A x_k + B u_k}} \left\{ \frac{1}{2} x_2^t P x_2 + \frac{1}{2} \sum_{k=0}^{1} \left( x_k^t Q x_k + u_k^t R u_k \right) \right\}.$$


Dynamic Programming for N = 2

The dynamic programming algorithm makes use of the Principle of Optimality, which states that any portion of the optimal trajectory is itself an optimal trajectory:

$$V_k^{\text{opt}}(x) = \min_{u \in \mathbb{U}} \left\{ \frac{1}{2} x^t Q x + \frac{1}{2} u^t R u + V_{k-1}^{\text{opt}}(Ax + Bu) \right\},$$

where $x$ and $u$ denote the state and control with $k$ steps to go, that is, $x = x_{N-k}$ and $u = u_{N-k}$.

In the sequel we will use the saturation function:

$$\mathrm{sat}_\Delta(u) = \begin{cases} \Delta & \text{if } u > \Delta, \\ u & \text{if } |u| \le \Delta, \\ -\Delta & \text{if } u < -\Delta. \end{cases}$$


Dynamic Programming for N = 2

Theorem (RHC Characterisation for N = 2)

The RHC law has the form

$$\mathcal{K}_2(x) = \begin{cases} -\mathrm{sat}_\Delta(Gx + h) & \text{if } x \in Z^-, \\ -\mathrm{sat}_\Delta(Kx) & \text{if } x \in Z, \\ -\mathrm{sat}_\Delta(Gx - h) & \text{if } x \in Z^+, \end{cases} \qquad (7)$$

where the gains $K, G \in \mathbb{R}^{1 \times n}$ and the constant $h \in \mathbb{R}$ are

$$K = \bar{R}^{-1} B^t P A, \qquad G = \frac{K + KB\,KA}{1 + (KB)^2}, \qquad h = \frac{KB}{1 + (KB)^2}\,\Delta,$$

and the sets $(Z^-, Z, Z^+)$ are defined by

$$Z^- = \left\{ x : K(A - BK)x < -\Delta \right\}, \qquad Z = \left\{ x : |K(A - BK)x| \le \Delta \right\}, \qquad Z^+ = \left\{ x : K(A - BK)x > \Delta \right\}.$$
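The theorem translates directly into code; a minimal sketch, reusing the sat helper from earlier (the function and variable names are my own):

```python
def rhc_n2(x, A, B, K, delta=1.0):
    """Explicit RHC law (7) for horizon N = 2."""
    KB = (K @ B).item()
    G = (K + KB * (K @ A)) / (1.0 + KB**2)  # gain of the affine pieces
    h = KB * delta / (1.0 + KB**2)          # offset of the affine pieces
    s = (K @ (A - B @ K) @ x).item()        # region indicator K(A - BK)x
    if s < -delta:                          # x in Z^-
        return -sat((G @ x).item() + h, delta)
    if s > delta:                           # x in Z^+
        return -sat((G @ x).item() - h, delta)
    return -sat((K @ x).item(), delta)      # x in Z
```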


Dynamic Programming for N = 2

Outline of the Proof:

(i) The partial value function $V_0^{\text{opt}}$: Here $x = x_2$. By definition, the partial value function at time $k = N = 2$ is

$$V_0^{\text{opt}}(x) = \frac{1}{2} x^t P x \quad \text{for all } x \in \mathbb{R}^n.$$

(ii) The partial value function $V_1^{\text{opt}}$: Here $x = x_1$ and $u = u_1$. By the principle of optimality, for all $x \in \mathbb{R}^n$,

$$\begin{aligned}
V_1^{\text{opt}}(x) &= \min_{u \in \mathbb{U}} \left\{ \frac{1}{2} x^t Q x + \frac{1}{2} u^t R u + V_0^{\text{opt}}(Ax + Bu) \right\} \\
&= \min_{u \in \mathbb{U}} \left\{ \frac{1}{2} x^t Q x + \frac{1}{2} u^t R u + \frac{1}{2} (Ax + Bu)^t P (Ax + Bu) \right\} \\
&= \min_{u \in \mathbb{U}} \left\{ \frac{1}{2} x^t P x + \frac{1}{2} \bar{R} (u + Kx)^2 \right\},
\end{aligned}$$

where the last step completes the square and uses the Riccati equation together with the definitions of $K$ and $\bar{R}$.


Dynamic Programming for N = 2

$$V_1^{\text{opt}}(x) = \min_{u \in \mathbb{U}} \left\{ \frac{1}{2} x^t P x + \frac{1}{2} \bar{R} (u + Kx)^2 \right\}.$$

[Figure: the cost $\frac{1}{2} x^t P x + \frac{1}{2} \bar{R}(u + Kx)^2$ as a function of $u$; left: the unconstrained minimiser $u_1^{\text{opt}} = -Kx$ lies inside $\mathbb{U}$; right: $-Kx$ lies outside $\mathbb{U}$ and $u_1^{\text{opt}}$ is at the constraint boundary.]

From the convexity of the function $\bar{R}(u + Kx)^2$ it then follows that the constrained ($u \in \mathbb{U}$) optimal control law is given by

$$u_1^{\text{opt}} = \mathrm{sat}_\Delta(-Kx) = -\mathrm{sat}_\Delta(Kx) \quad \text{for all } x \in \mathbb{R}^n.$$


Dynamic Programming for N = 2

Substituting the optimal control

$$u_1^{\text{opt}} = -\mathrm{sat}_\Delta(Kx) \quad \text{for all } x \in \mathbb{R}^n$$

into the minimisation problem

$$V_1^{\text{opt}}(x) = \min_{u \in \mathbb{U}} \left\{ \frac{1}{2} x^t P x + \frac{1}{2} \bar{R} (u + Kx)^2 \right\},$$

we have that the partial value function at time $k = N - 1 = 1$ is

$$V_1^{\text{opt}}(x) = \frac{1}{2} x^t P x + \frac{1}{2} \bar{R} \left[ Kx - \mathrm{sat}_\Delta(Kx) \right]^2 \quad \text{for all } x \in \mathbb{R}^n.$$
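This closed form is easy to sanity-check against a direct numerical minimisation; a sketch, assuming SciPy's bounded scalar minimiser, with $P$ and $r$ from the MPC snippet and $K$, $\bar{R}$ recomputed to match the theorem's definitions:

```python
from scipy.optimize import minimize_scalar

Rbar = r + (B.T @ P @ B).item()  # Rbar = R + B'PB (scalar input case)
K = (B.T @ P @ A) / Rbar         # K = Rbar^{-1} B'PA

def V1_closed(x, delta=1.0):
    """Closed-form partial value function V_1^opt."""
    Kx = (K @ x).item()
    return 0.5 * (x.T @ P @ x).item() + 0.5 * Rbar * (Kx - sat(Kx, delta))**2

def V1_numeric(x, delta=1.0):
    """Minimise 0.5 x'Px + 0.5 Rbar (u + Kx)^2 over u in [-delta, delta]."""
    Kx = (K @ x).item()
    res = minimize_scalar(
        lambda u: 0.5 * (x.T @ P @ x).item() + 0.5 * Rbar * (u + Kx)**2,
        bounds=(-delta, delta), method='bounded')
    return res.fun
```

The two functions should agree, up to solver tolerance, for any test state x.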


Dynamic Programming for N = 2

(iii) The partial value function $V_2^{\text{opt}}$: Here $x = x_0$ and $u = u_0$. By the principle of optimality, we have that, for all $x \in \mathbb{R}^n$,

$$\begin{aligned}
V_2^{\text{opt}}(x) &= \min_{u \in \mathbb{U}} \left\{ \frac{1}{2} x^t Q x + \frac{1}{2} u^t R u + V_1^{\text{opt}}(Ax + Bu) \right\} \\
&= \min_{u \in \mathbb{U}} \left\{ \frac{1}{2} x^t Q x + \frac{1}{2} u^t R u + \frac{1}{2} (Ax + Bu)^t P (Ax + Bu) \right. \\
&\qquad \left. + \frac{1}{2} \bar{R} \left[ K(Ax + Bu) - \mathrm{sat}_\Delta(K(Ax + Bu)) \right]^2 \right\} \\
&= \frac{1}{2} \min_{u \in \mathbb{U}} \left\{ x^t P x + \bar{R} (u + Kx)^2 + \bar{R} \left[ KAx + KBu - \mathrm{sat}_\Delta(KAx + KBu) \right]^2 \right\}.
\end{aligned}$$


Dynamic Programming for N = 2

$$u_0^{\text{opt}} = \arg\min_{u \in \mathbb{U}} \Big\{ \underbrace{\bar{R}(u + Kx)^2}_{f_1(u)} + \underbrace{\bar{R}\left[KAx + KBu - \mathrm{sat}_\Delta(KAx + KBu)\right]^2}_{f_2(u)} \Big\}.$$

Case (a): $x \in Z^-$, that is, $KAx + KB(-Kx) < -\Delta$.

[Figure: $f_1(u)$, $f_2(u)$, and $f_1(u) + f_2(u)$ versus $u$, with the unconstrained minimiser of $f_1$ at $u = -Kx$.]


Dynamic Programming for N = 2

We conclude that

$$\begin{aligned}
u_0^{\text{opt}} &= \arg\min_{u \in \mathbb{U}} \left\{ \bar{R}(u + Kx)^2 + \bar{R}\left[KAx + KBu - \mathrm{sat}_\Delta(KAx + KBu)\right]^2 \right\} \\
&= \arg\min_{u \in \mathbb{U}} \left\{ \bar{R}(u + Kx)^2 + \bar{R}\left[KAx + KBu + \Delta\right]^2 \right\}.
\end{aligned}$$

Hence

$$u_0^{\text{opt}} = -\mathrm{sat}_\Delta(Gx + h) \quad \text{for all } x \in Z^-,$$

where

$$G = \frac{K + KB\,KA}{1 + (KB)^2}, \qquad h = \frac{KB}{1 + (KB)^2}\,\Delta.$$


Dynamic Programming for N = 2

$$u_0^{\text{opt}} = \arg\min_{u \in \mathbb{U}} \Big\{ \underbrace{\bar{R}(u + Kx)^2}_{f_1(u)} + \underbrace{\bar{R}\left[KAx + KBu - \mathrm{sat}_\Delta(KAx + KBu)\right]^2}_{f_2(u)} \Big\}.$$

Case (b): $x \in Z$, that is, $|KAx + KB(-Kx)| \le \Delta$.

[Figure: $f_1(u)$, $f_2(u)$, and $f_1(u) + f_2(u)$ versus $u$, with the unconstrained minimiser of $f_1$ at $u = -Kx$.]


Dynamic Programming for N = 2

We conclude that

$$\begin{aligned}
u_0^{\text{opt}} &= \arg\min_{u \in \mathbb{U}} \left\{ \bar{R}(u + Kx)^2 + \bar{R}\left[KAx + KBu - \mathrm{sat}_\Delta(KAx + KBu)\right]^2 \right\} \\
&= \arg\min_{u \in \mathbb{U}} \left\{ \bar{R}(u + Kx)^2 \right\}.
\end{aligned}$$

Hence

$$u_0^{\text{opt}} = -\mathrm{sat}_\Delta(Kx) \quad \text{for all } x \in Z.$$


Dynamic Programming for N = 2

$$u_0^{\text{opt}} = \arg\min_{u \in \mathbb{U}} \Big\{ \underbrace{\bar{R}(u + Kx)^2}_{f_1(u)} + \underbrace{\bar{R}\left[KAx + KBu - \mathrm{sat}_\Delta(KAx + KBu)\right]^2}_{f_2(u)} \Big\}.$$

Case (c): $x \in Z^+$, that is, $KAx + KB(-Kx) > \Delta$.

[Figure: $f_1(u)$, $f_2(u)$, and $f_1(u) + f_2(u)$ versus $u$, with the unconstrained minimiser of $f_1$ at $u = -Kx$.]


Dynamic Programming for N = 2

We conclude that

$$\begin{aligned}
u_0^{\text{opt}} &= \arg\min_{u \in \mathbb{U}} \left\{ \bar{R}(u + Kx)^2 + \bar{R}\left[KAx + KBu - \mathrm{sat}_\Delta(KAx + KBu)\right]^2 \right\} \\
&= \arg\min_{u \in \mathbb{U}} \left\{ \bar{R}(u + Kx)^2 + \bar{R}\left[KAx + KBu - \Delta\right]^2 \right\}.
\end{aligned}$$

Hence

$$u_0^{\text{opt}} = -\mathrm{sat}_\Delta(Gx - h) \quad \text{for all } x \in Z^+,$$

where

$$G = \frac{K + KB\,KA}{1 + (KB)^2}, \qquad h = \frac{KB}{1 + (KB)^2}\,\Delta.$$

∎

Motivating Example Revisited

We have, from the previous result, that the control law is:

$$\mathcal{K}_2(x) = \begin{cases} -\mathrm{sat}_\Delta(Gx + h) & \text{if } x \in Z^-, \\ -\mathrm{sat}_\Delta(Kx) & \text{if } x \in Z, \\ -\mathrm{sat}_\Delta(Gx - h) & \text{if } x \in Z^+, \end{cases}$$

or, equivalently:

$$\mathcal{K}_2(x) = \begin{cases} -\Delta & \text{if } x \in R_1, \\ -Gx - h & \text{if } x \in R_2, \\ -Kx & \text{if } x \in R_3, \\ -Gx + h & \text{if } x \in R_4, \\ \Delta & \text{if } x \in R_5. \end{cases}$$

[Figures: the state space in the $(x_{1k}, x_{2k})$ plane, partitioned into $Z^-$, $Z$, $Z^+$ (left) and into the five regions $R_1, \dots, R_5$ (right).]


Conclusions

Note that we have obtained, for the case $N = 2$, the following expression for the receding horizon control law:

$$\mathcal{K}_2(x) = \begin{cases} -\Delta & \text{if } x \in R_1, \\ -Gx - h & \text{if } x \in R_2, \\ -Kx & \text{if } x \in R_3, \\ -Gx + h & \text{if } x \in R_4, \\ \Delta & \text{if } x \in R_5. \end{cases}$$

That is, the control law can be characterised as a piecewise affine function of the state $x$.

The calculation using Dynamic Programming can be extended to longer prediction horizons, $N > 2$. However, we will instead reconsider this same problem, and further extensions, using geometric arguments later in the course.
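As a closing sanity check, the explicit piecewise affine law can be compared with the numerical fixed-horizon solution on random states; a sketch reusing the illustrative rhc_n2 and mpc_step helpers defined earlier (neither is from the slides):

```python
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.uniform(-6.0, 6.0, size=(2, 1))
    u_explicit = rhc_n2(x, A, B, K)  # piecewise affine law of the theorem
    u_numeric = mpc_step(x)          # box-constrained QP, horizon 2
    assert abs(u_explicit - u_numeric) < 1e-4
print("explicit and numerical RHC laws agree on all test states")
```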
