Combining State-Dependent Riccati Equation (SDRE) methods with Model Predictive Neural Control (MPNC)

Eric A. Wan and Alexander A. Bogdanov

Oregon Graduate Institute of Science & Technology, 20000 NW Walker Rd, Beaverton, Oregon 97006

[email protected], [email protected]

Abstract

In this paper, we present a method for control of MIMO non-linear systems based on receding horizon model predictive control (MPC), used to augment state-dependent Riccati equation (SDRE) control. The MPC is implemented using a neural network (NN) feedback controller, hence we refer to the approach as model predictive neural control (MPNC). The NN is optimized to minimize state and control costs, subject to dynamic and kinematic constraints. While SDRE control provides robust control based on a pseudo-linear formulation of the dynamics, the MPNC utilizes highly accurate numerical nonlinear models. By combining the two techniques, we achieve improved overall robustness and performance. Specifically, the SDRE technique is used in a number of ways to provide a) enhanced local stability of the system, b) an initial feasible control trajectory for use in the MPC optimization, c) a control Lyapunov function (CLF) for approximation of the 'cost-to-go' used in the receding horizon optimization, and d) increased computational performance by approximating plant Jacobians used in the MPC optimization. Results are illustrated with an example involving control of a highly realistic helicopter model.

1. Introduction

Model predictive control (MPC) is an optimization based framework for learning a stabilizing control sequence which minimizes a specified cost function [9, 8]. For general nonlinear systems, this requires a numerical optimization procedure involving iterative forward (and backward) simulations of the model dynamics. An 'initial' guess of a stabilizing control sequence is essential for implementation and proper convergence. The resulting control sequence represents an 'open-loop' control law, which can then be re-optimized on-line at periodic update intervals to improve robustness.

Our approach to MPC utilizes a combination of an SDRE controller and an optimal NN controller. The SDRE technique [3, 4] is an improvement over traditional linearization

This work was sponsored by DARPA under grant F33615-98-C-3516 with principal co-investigators Richard Kieburtz, Eric Wan, and Antonio Baptista. We also would like to express special thanks to Ying Long Zhang, Andy Moran and Magnus Carlsson for assistance with the Flightlab model.

based Linear Quadratic (LQ) controllers (SDRE control will be elaborated on in Section 3.1). In our framework, the SDRE controller provides an initial stabilizing controller which is augmented by a NN controller. The NN controller is optimized using a calculus of variations approach to minimize the MPC cost function. For simplicity, we will refer to the combined control system (SDRE+NN) as model predictive neural control (MPNC). Earlier related work on our approach is described in [11, 2]. We begin in the next section with additional details on the MPC and receding horizon framework.

2. MPC and Receding Horizon Control

The general MPC optimization problem involves minimizing a cost function

$$J_t = \sum_{k=t}^{t_{final}} L_k(x_k, u_k), \qquad (1)$$

which represents an accumulated cost of the sequence of states $x_k$ and controls $u_k$ from the current discrete time $t$ to the final time $t_{final}$. For regulation problems $t_{final} = \infty$. Optimization is done with respect to the control sequence subject to constraints of the system dynamics,

$$x_{k+1} = f(x_k, u_k). \qquad (2)$$

As an example, $L_k(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k$ corresponds to the standard Linear Quadratic cost. For linear systems, this leads to linear state-feedback control, which is found by solving a Riccati Equation [5]. In this paper we consider general MIMO non-linear systems with tracking error costs of the form

$$L(e_k, u_k) = e_k^T Q e_k + u_k^T R u_k + u_k^{over\,T} R_{sat}\, u_k^{over}, \qquad (3)$$

where $e_k = x_k - x_k^{des}$, with $x_k^{des}$ corresponding to a desired reference state trajectory. The last term provides a penalty for control saturation, where each element ($j = 1, \ldots, m$) of the vector $u^{over}$ is defined as

$$u^{over}_{k_j} = \begin{cases} 0, & \text{if } |u_{k_j}| \le u^{sat}_j \\ u_{k_j} - u^{sat}_j\,\mathrm{sign}(u_{k_j}), & \text{otherwise.} \end{cases}$$


In general, a numerical optimization approach is used to solve for the sequence of controls, $\{u_k\}_{t}^{t_{final}} = \arg\min J_t$, corresponding to an open-loop control law, which can then be re-optimized on-line at periodic update intervals. The complexity of the approach is a function of the final time $t_{final}$, which determines the length of the optimal control sequence. In practice, we can reduce the number of computations by taking a Receding Horizon (RH) approach, in which optimization is performed over a shorter fixed length time interval. This is accomplished by rewriting the cost function as

$$J_t = \sum_{k=t}^{t+N-1} L_k(x_k, u_k) + \varphi(e_{t+N}), \qquad (4)$$

where the last term $\varphi(e_{t+N})$ denotes the cost-to-go from time $t+N$ to time $t_{final}$. The advantage is that this yields an optimization problem of fixed length $N$. In practice, the true value of $\varphi(e_{t+N})$ is unknown, and must be approximated. Most common is to simply set $\varphi(e_{t+N}) = 0$; however, this may lead to reduced stability and poor performance for short horizon lengths [7]. Alternatively, we may include a control Lyapunov function (CLF), which guarantees local stability if the CLF is an upper bound on the cost-to-go [7]. In Section 3.2, we will use the solution to the SDRE controller to form the cost-to-go function.
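To make the cost structure concrete, the following is a minimal sketch of evaluating the receding-horizon cost (4) with the stage cost (3) for a generic discrete-time model. The plant f, the policy, the reference x_des, the weighting matrices, and the quadratic terminal matrix P_terminal are all placeholders supplied by the user, not quantities taken from the paper.

```python
import numpy as np

def saturation_excess(u, u_sat):
    """u_over as in Eq. (3): zero inside the limits, signed overshoot beyond them."""
    excess = np.abs(u) - u_sat
    return np.where(excess > 0.0, np.sign(u) * excess, 0.0)

def receding_horizon_cost(x0, t, N, f, policy, x_des, Q, R, R_sat, u_sat, P_terminal):
    """Accumulate Eq. (4): stage costs L(e_k, u_k) over the horizon plus a
    quadratic (CLF-style) terminal term approximating the cost-to-go."""
    x, J = np.array(x0, dtype=float), 0.0
    for k in range(t, t + N):
        e = x - x_des(k)                          # tracking error e_k
        u = policy(x, e, k)                       # feedback control, e.g. Eq. (5)
        u_over = saturation_excess(u, u_sat)      # control-saturation excess
        J += e @ Q @ e + u @ R @ u + u_over @ R_sat @ u_over   # stage cost, Eq. (3)
        x = f(x, u)                               # propagate the dynamics, Eq. (2)
    e_N = x - x_des(t + N)
    return J + e_N @ P_terminal @ e_N             # terminal cost-to-go term, cf. Eq. (10)
```

With P_terminal set to zero this reduces to the common choice of dropping the cost-to-go discussed above.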

3. Neural and SDRE Receding Horizon Control

Difficulties with MPC include a) the need for a 'good' initial sequence of controls that is capable of stabilizing the model, and b) the need to re-optimize at short intervals to avoid problems associated with open-loop control. We address these issues by directly implementing a feedback controller as a combination of an SDRE stabilizing controller and a neural controller:

$$u_k = \alpha\,N_{net}(x_k, e_k; w) + (1-\alpha)\,K(x_k)e_k, \qquad (5)$$

where $0 < \alpha < 1$ is a constant. The SDRE controller $K(x_k)e_k$ provides a robust stabilizing control (as established via simulation), while the weights of the neural network $w$ are optimized to minimize the overall receding horizon MPC cost. Each of these components is detailed in the following sections.

3.1. SDRE Controller

Referring to the system state-space Equation 2, an SDRE controller [3] is designed by reformulating $f(x_k, u_k)$ as

$$f(x_k, u_k) = \Phi(x_k)x_k + \Gamma(x_k)u_k. \qquad (6)$$

This yields the resulting system

$$x_{k+1} = \Phi(x_k)x_k + \Gamma(x_k)u_k. \qquad (7)$$

This representation is not a linearization. To illustrate the principle, consider a simple scalar example: $x_{k+1} = \sin x_k + x_k \cos x_k\, u_k$; then $\Phi(x_k) = \frac{\sin x_k}{x_k}$ and $\Gamma(x_k) = x_k \cos x_k$. Based on this new state-space representation, we design an optimal LQ controller to track the desired state $x_k^{des}$. This leads to the nonlinear controller,

$$u_k^{sd} = -R^{-1}B^T(x_k)P(x_k)(x_k - x_k^{des}) \equiv K(x_k)e_k, \qquad (8)$$

where $P(x_k)$ is a solution of the standard Riccati Equations using state-dependent matrices $\Phi(x_k)$ and $\Gamma(x_k)$, which are treated as being constant. The procedure is repeated at every time step at the current state $x_k$ and provides local asymptotic stability of the plant [3]. In practice, the approach has been found to be far more robust than LQ controllers based on standard linearization techniques.
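As an illustration of the frozen-coefficient LQ solve, here is a small sketch of one SDRE step for the discrete-time factorization (7), applied to the scalar example above. It uses SciPy's discrete-time algebraic Riccati solver and the corresponding discrete LQ gain; Equation 8 writes the gain in its continuous-time form, so this is one possible realization rather than the paper's exact computation.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def sdre_control(x, e, Phi_of_x, Gamma_of_x, Q, R):
    """One SDRE step: freeze the state-dependent matrices at the current state,
    solve the Riccati equation, and apply the resulting LQ tracking gain."""
    Phi, Gamma = Phi_of_x(x), Gamma_of_x(x)       # pseudo-linear factorization, Eq. (7)
    P = solve_discrete_are(Phi, Gamma, Q, R)      # frozen-coefficient solution P(x_k)
    K = np.linalg.solve(R + Gamma.T @ P @ Gamma, Gamma.T @ P @ Phi)  # discrete LQ gain
    return -K @ e          # u_sd = K(x_k) e_k, with the minus sign absorbed as in Eq. (8)

# Scalar example from the text: x_{k+1} = sin(x_k) + x_k cos(x_k) u_k
Phi_of_x   = lambda x: np.array([[np.sinc(x[0] / np.pi)]])   # sin(x)/x, well-defined at x = 0
Gamma_of_x = lambda x: np.array([[x[0] * np.cos(x[0])]])
x = np.array([1.2])
u_sd = sdre_control(x, x - np.zeros(1), Phi_of_x, Gamma_of_x, Q=np.eye(1), R=np.eye(1))
```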

3.2. Neural Network Controller

The overall flowgraph of the MPNC system is shown in Figure 1. Optimization is performed by learning the weights of the NN in order to minimize the receding horizon MPC cost function (Equation 4) subject to the system dynamics and the composite form of the overall feedback controller (Equation 5). The problem is solved by taking a standard calculus of variations approach, where $\lambda_k$ and $\mu_k$ are vectors of Lagrange multipliers in the augmented cost function

$$J_t = \sum_{k=t}^{t+N-1}\Big[ L(e_k, u_k) + \lambda_k^T\big(x_{k+1} - f(x_k, u_k)\big) + \mu_k^T\big(u_k - \alpha u^{nn}_k - (1-\alpha)K(x_k)e_k\big)\Big] + \varphi(e_{t+N}), \qquad (9)$$

where $u^{nn}_k = N_{net}(x_k, e_k; w)$. The cost-to-go $\varphi(e_{t+N})$ is approximated using the solution of the SDRE at time $t+N$,

$$\varphi(e_{t+N}) = e_{t+N}^T P(x_{t+N})\, e_{t+N} \approx \sum_{k=N}^{\infty}\Big\{(x_k - x_{t+N}^{des})^T Q\, (x_k - x_{t+N}^{des}) + u_k^T R\, u_k\Big\}. \qquad (10)$$

This CLF provides the exact cost-to-go for regulation assuming a linear system at the horizon length. A similar formulation was used for nonlinear regulation control in [10].

We can now derive the recurrent Euler-Lagrange equations

$$\lambda_k = \left(\frac{\partial f(x_k,u_k)}{\partial x_k}\right)^{T} \lambda_{k+1} + \left(\frac{\partial L(e_k,u_k)}{\partial e_k}\frac{\partial e_k}{\partial x_k}\right)^{T} + \left(\alpha\frac{\partial N_{net}(x_k,e_k;w)}{\partial x_k} + \alpha\frac{\partial N_{net}(x_k,e_k;w)}{\partial e_k}\frac{\partial e_k}{\partial x_k} + (1-\alpha)\frac{\partial\big(K(x_k)e_k\big)}{\partial x_k}\right)^{T} \mu_k \qquad (11)$$

$$\mu_{k_i} = \begin{cases} \left[-\left(\dfrac{\partial f(x_k,u_k)}{\partial u_k}\right)^{T} \lambda_{k+1} + \left(\dfrac{\partial L(e_k,u_k)}{\partial u_k}\right)^{T}\right]_i, & \text{if } |u_{k_i}| \le u^{sat}_i \\[2ex] \left[2\,R_{sat}\,u^{over}_k\right]_i, & \text{if } |u_{k_i}| > u^{sat}_i \end{cases} \qquad (12)$$

2

Page 3: Combining State-Dependent Riccati Equation (SDRE) methods with Model Predictive Neural Control (MPNC).pdf

Figure 1: MPNC signal flow diagram (block diagram of the plant dynamics, the neural and SDRE control paths combined as in Equation 5, the desired-state generation from the target's coordinates and velocities, and the cost terms of Equations 3 and 4).

Figure 2: Adjoint system (backward signal flow through the plant, neural network, SDRE, and desired-state Jacobians that accumulates the multipliers $\lambda_k$ and $\mu_k$).

with $\lambda_{t+N} = \left(\frac{\partial\varphi(e_{t+N})}{\partial e_{t+N}}\frac{\partial e_{t+N}}{\partial x_{t+N}}\right)^{T}$, $k = (t+N)-1, (t+N)-2, \ldots, t$, and $i = 1, \ldots, m$ ($m$ is the dimension of the control vector $u$). For $L(e_k, u_k)$ given by Equation 3, $\frac{\partial L(e_k,u_k)}{\partial e_k} = 2e_k^T Q$, and each element $i$ of the gradient vector $\left(\frac{\partial L(e_k,u_k)}{\partial u_k}\right)_i = \left(2u_k^T R\right)_i$, if $|u_{k_i}| \le u^{sat}_i$.

These equations correspond to an adjoint system shown graphically in Figure 2, with optimality condition

$$\frac{\partial J_t}{\partial w} = \sum_{k=t}^{t+N-1} \alpha\,\mu_k^T\,\frac{\partial N_{net}(x_k,e_k;w)}{\partial w} = 0. \qquad (13)$$

The overall training procedure for the NN can now be summarized as follows (a simplified sketch of this loop is given after the list):

1. Simulate the system forward in time for $N$ time steps (Figure 1). Note that the SDRE controller is updated at each time step.

2. Run the adjoint system backward in time to accumulate the Lagrange multipliers (Figure 2). Jacobians are evaluated analytically or by perturbation. In practice $\partial\big(K(x_k)e_k\big)/\partial x_k \approx K(x_k)\,\partial e_k/\partial x_k$.

3. Update the weights using gradient descent¹, $\Delta w = -\eta\,\frac{\partial J_t}{\partial w}$.

4. Repeat until convergence or until an acceptable level of cost reduction is achieved.
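The following is a minimal sketch of this loop for a generic plant and network, assuming user-supplied callables f, nnet, sdre_gain, x_des, and cost_fn (none of these names come from the paper). To keep it short, the adjoint recursion of Equations 11-12 is replaced by a finite-difference gradient of the horizon cost; the paper instead accumulates the gradient with the backward adjoint pass.

```python
import numpy as np

def mpnc_train(w0, x0, t, N, f, nnet, sdre_gain, x_des, cost_fn,
               alpha=0.7, lr=1e-3, epochs=10, fd_eps=1e-5):
    """Sketch of the MPNC training loop (Section 3.2): simulate the composite
    controller of Eq. (5) over the horizon, evaluate the receding-horizon cost,
    and descend its gradient with respect to the NN weights w."""

    def horizon_cost(w):
        x, J = np.array(x0, dtype=float), 0.0
        for k in range(t, t + N):
            e = x - x_des(k)
            u = alpha * nnet(x, e, w) + (1.0 - alpha) * sdre_gain(x) @ e  # Eq. (5)
            J += cost_fn(e, u)                                            # stage cost, Eq. (3)
            x = f(x, u)                                                   # plant step, Eq. (2)
        return J                                     # (terminal CLF term omitted in this sketch)

    w = np.array(w0, dtype=float)
    for _ in range(epochs):                          # step 4: repeat
        J0 = horizon_cost(w)                         # step 1: forward simulation
        grad = np.zeros_like(w)                      # step 2 stand-in: numerical gradient
        for i in range(w.size):
            w_pert = w.copy()
            w_pert[i] += fd_eps
            grad[i] = (horizon_cost(w_pert) - J0) / fd_eps
        w -= lr * grad                               # step 3: gradient-descent weight update
    return w
```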

The NN is initialized with small weights, which allows the SDRE controller to initially dominate, providing stable tracking and good conditions for training. As training progresses, the NN decreases the tracking error. This reduces the SDRE

¹In practice we use an adaptive learning rate for each weight in the network using a procedure similar to delta-bar-delta [6].


control output, which in turn gives more authority to the neural controller. The training process is repeated at the update interval to recompute new weights $w$ for the next horizon.

3.3. Computation considerations and approximations

MPC design implies numerous simulations of the system forward in time and the adjoint system backward in time. Computations scale with the number of training epochs and the horizon length. The most computationally demanding operations correspond to solving the Riccati equation for the SDRE controller in the forward simulation, and the numeric (i.e., by perturbation) computation of the plant Jacobians in the backward simulation. In order to approach real-time feasibility, we consider possible trade-offs between computational effort and controller performance through a number of successive simplifications:

1. The plant Jacobian with respect to $x_k$ is approximated as $\frac{\partial f(x_k,u_k)}{\partial x_k} \approx \Phi(x_k)$, i.e., we use the state-dependent transition matrix found analytically from the model. In addition, the SDRE Jacobian is approximated as $\partial\big(K(x_k)e_k\big)/\partial x_k \approx K(x_k)\,\partial e_k/\partial x_k$, where $\partial e_k/\partial x_k$ can then be calculated analytically as given in Equation 15.

2. Same as (1), plus the discrete system matrix $\Phi(x_k)$ is memorized at each time step $k$ during the first epoch and then used for all subsequent epochs within the current horizon. Here we assume the control sequence, and thus the Jacobians, do not change significantly during training.

3. Same as (2), plus the matrices $\Gamma(x_k)$ and $K(x_k)$ are memorized for the first epoch and again used for all subsequent epochs. This simplification allows us to avoid re-solving the SDRE in the corresponding forward simulations.

4. Same as (3), plus all matrices that were calculated in the previous horizon are re-used in the current horizon for the time segment where the two overlap. For example, if the horizon length is 10 and the update interval is 2, then there is an overlap of 8 time steps, and only the last 2 time steps require computation of new matrices. (A sketch of this caching scheme is given after the list.)
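A minimal sketch of the memoization idea behind levels 2-4, with illustrative names only (the callable compute_matrices stands in for the analytic $\Phi(x_k)$, the perturbation-based $\Gamma(x_k)$, and the Riccati solve for $K(x_k)$):

```python
class SDREMatrixCache:
    """Memoize Phi(x_k), Gamma(x_k) and K(x_k) per time step within a horizon
    (simplification levels 2-3): compute them only in the first training epoch,
    then reuse the stored values for all subsequent epochs."""

    def __init__(self, compute_matrices):
        self.compute_matrices = compute_matrices   # callable: x_k -> (Phi, Gamma, K)
        self.store = {}                            # time step k -> cached matrices

    def get(self, k, x_k, first_epoch):
        if first_epoch or k not in self.store:
            self.store[k] = self.compute_matrices(x_k)   # full SDRE solve only once per k
        return self.store[k]

    def new_horizon(self, shift):
        """Simplification level 4: keep entries that overlap the next horizon,
        re-indexing by the update interval `shift`."""
        self.store = {k - shift: m for k, m in self.store.items() if k - shift >= 0}
```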

The computational versus performance trade-offs associated with these simplifications are evaluated in the context of helicopter control in Section 4.4.

4. Application to Helicopter Control

We develop helicopter control through the use of the FlightLab simulator [1]. FlightLab is a commercial software product developed by Advanced Rotorcraft Technologies. It includes modeling of the main rotor, tail rotor and fuselage flow interference, effects of dynamic stall, transonic flow, aeroelastic response, dynamic loads, vortex wake, and blade element aerodynamics, and can also provide finite element structural analysis. For our research purposes, we generated a high-fidelity helicopter model with a rigid fuselage, flexible blades, quasi-unsteady airloads and 3-state inflow. The model is presented as a numerical discrete-time nonlinear system with 76 internal states.

Figure 3: Reference trajectory generation (desired body-frame velocities $u^{des}_k$, $v^{des}_k$, $w^{des}_k$ directed toward the target).

A challenge in using flight simulators such as FlightLab to design controllers is that governing dynamic equations are not readily available (i.e., the aircraft represents a 'black-box' model). This precludes the use of most traditional nonlinear control approaches that require an analytic model. To utilize the MPNC approach, the numeric simulator model is approximated by a 6DOF rigid body dynamic model, providing a set of governing equations at each time instance necessary to design the SDRE. The neural network controller, however, is still trained to minimize the MPC cost based on the full nonlinear simulator model. Details of this are discussed in the following section.

4.1. Helicopter and FlightLab Design Considerations

For helicopter control, we define the state $x_k$ to correspond to the standard states of a 6DOF rigid body model. This 12 dimensional state vector consists of Cartesian coordinates in the inertial frame $x, y, z$, Euler angles $\psi, \phi, \theta$ (yaw, roll, pitch), linear velocities $u, v, w$ and angular velocities $p, q, r$ in the body coordinate frame. Technically, this represents a reduced state-space, as our FlightLab model utilizes a total of 76 internal states (e.g., rotor states). However, we treat the additional states as both unobservable and uncontrollable for purposes of deriving the controller. There are 4 control inputs, $\theta_0, \theta_{1C}, \theta_{1S}, \theta_{T0}$, corresponding to the main collective, lateral cyclic, longitudinal cyclic, and tail collective (incident angles of blades)². While the dynamics of the control mechanisms are simulated (and thus accounted for in the MPC design), the explicit states associated with these dynamics are again not utilized in our derivations.

The tracking error for the helicopter, $e_k = x_k - x_k^{des}$, is determined by the trajectory of a reference target $x_k^{tar}$.

²We set control constraints as follows: Max. $\theta_0 = \pm 20$ deg., Max. $\theta_{1C} = \pm 20$ deg., Max. $\theta_{1S} = \pm 20$ deg., Max. $\theta_{T0} = \pm 45$ deg.


Figure 4: State-dependent system matrix representation. The figure gives the full $12 \times 12$ matrix $A(x)$ of the 6DOF model in the state ordering $x_k = (u, w, q, \theta, v, p, \phi, r, \psi, x, y, z)^T$, with $s(\cdot)$, $c(\cdot)$ denoting $\sin$ and $\cos$ and with the inertia-dependent coefficients $a_0 = \frac{I_{zz}-I_{xx}}{2I_{yy}}$, $a_1 = \frac{I_{xz}}{I_{yy}}$, $a_2 = \frac{I_{xz}(I_{xx}-I_{yy}+I_{zz})}{2(I_{xx}I_{zz}-I_{xz}^2)}$, $a_3 = \frac{I_{yy}I_{zz}-I_{zz}^2-I_{xz}^2}{2(I_{xx}I_{zz}-I_{xz}^2)}$, $a_4 = \frac{I_{xx}^2-I_{yy}I_{xx}+I_{xz}^2}{2(I_{xx}I_{zz}-I_{xz}^2)}$, $a_5 = \frac{I_{xz}(-I_{xx}+I_{yy}-I_{zz})}{2(I_{xx}I_{zz}-I_{xz}^2)}$. (The individual matrix entries are omitted here.)

The target specifies desired coordinates and velocities in the inertial frame (with roll, pitch and angular velocities set to zero). The reference state is then projected into the body frame to produce the desired state,

$$x_k^{des} = T(x_k)\,x_k^{tar}, \qquad (14)$$

where $T(x_k)$ is an appropriate projection matrix consisting of necessary rotation operators. Minimization of this error causes the helicopter to move in the direction of the target motion (see Figure 3).
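As an illustration of what such a projection can look like, the sketch below rotates a target's inertial-frame velocity into the body frame using a standard yaw-pitch-roll rotation. The target vector layout and the output ordering are assumptions made for this example only; the paper's exact $T(x_k)$ and 12-state ordering are those of Section 4.1 and Figure 4.

```python
import numpy as np

def rot_inertial_to_body(psi, theta, phi):
    """Z-Y-X Euler rotation taking inertial-frame vectors into the body frame."""
    cps, sps = np.cos(psi),   np.sin(psi)
    cth, sth = np.cos(theta), np.sin(theta)
    cph, sph = np.cos(phi),   np.sin(phi)
    Rz = np.array([[ cps, sps, 0], [-sps, cps, 0], [0, 0, 1]])
    Ry = np.array([[ cth, 0, -sth], [0, 1, 0], [sth, 0, cth]])
    Rx = np.array([[1, 0, 0], [0, cph, sph], [0, -sph, cph]])
    return Rx @ Ry @ Rz

def desired_state(x_tar, psi, theta, phi):
    """Illustrative projection in the spirit of Eq. (14): keep the target's inertial
    position, rotate its inertial velocity into the body frame, and command zero
    attitude and angular rates.  Assumes x_tar = [x, y, z, vx, vy, vz] (inertial)."""
    R = rot_inertial_to_body(psi, theta, phi)
    pos_des = x_tar[:3]                 # desired inertial coordinates
    vel_des = R @ x_tar[3:6]            # desired body-frame velocities (u, v, w)
    return np.concatenate([pos_des, vel_des, np.zeros(6)])  # attitude/rates set to zero
```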

4.2. Specifics of the SDRE Controller

As stated earlier, the SDRE controller requires a set of governing equations, which are not available in FlightLab. Thus we derive a 6DOF rigid body model as an analytical approximation. The simplified dynamics are given by

$$\begin{aligned}
\dot{u} &= -(wq - vr) - g\sin\theta + F_x/M_a \\
\dot{v} &= -(ur - wp) + g\cos\theta\sin\phi + F_y/M_a \\
\dot{w} &= -(vp - uq) + g\cos\theta\cos\phi + F_z/M_a \\
I_{xx}\dot{p} &= (I_{yy}-I_{zz})qr + I_{xz}(\dot{r}+pq) + L \\
I_{yy}\dot{q} &= (I_{zz}-I_{xx})rp + I_{xz}(r^2-p^2) + M \\
I_{zz}\dot{r} &= (I_{xx}-I_{yy})pq + I_{xz}(\dot{p}-qr) + N \\
\dot{\phi} &= p + q\sin\phi\tan\theta + r\cos\phi\tan\theta \\
\dot{\theta} &= q\cos\phi - r\sin\phi \\
\dot{\psi} &= q\sin\phi\sec\theta + r\cos\phi\sec\theta \\
(\dot{x},\,\dot{y},\,\dot{z})^T &= \mathrm{Rot}_1(\psi,\theta,\phi)\,(u,\,v,\,w)^T
\end{aligned}$$

where $\mathrm{Rot}_1(\psi,\theta,\phi)$ is a rotation matrix (coordinate transformation) from body frame to inertial frame, $M_a$ is the aircraft mass, and $F_x, F_y, F_z, L, M, N$ are rotor-induced forces and moments. The forces and moments are nonlinear functions of helicopter states and control inputs. We then rewrite this into an SDRE continuous canonical representation $\dot{x} = A(x)x + B(x)u$. The matrix $A(x)$ is given explicitly in Figure 4. Thus $\Phi(x_k)$ is obtained from $A(x)$ by discretization at each time step (e.g., $\Phi(x_k) = e^{A(x_k)\Delta t}$)³.

Since the nonlinear mapping of states and control inputs to rotor-induced forces and moments is not known, $B(x)$ cannot be explicitly found. Thus we approximate $\Gamma(x_k)$ by linearizing the full FlightLab model with respect to the control inputs $u_k$ around an equilibrium point in hover at the current altitude or appropriate trim state. This is accomplished numerically by successively perturbing each control input to the simulator at the current state and time.
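A minimal sketch of these two steps for a generic black-box simulator. The function step(x, u) (advancing the simulator by one sample), the trim input u_trim, and the perturbation size eps are placeholders; FlightLab's actual interface is not described in the paper.

```python
import numpy as np
from scipy.linalg import expm

def discretize_A(A_of_x, x, dt):
    """Phi(x_k) from the analytic 6DOF matrix A(x) via the matrix exponential."""
    return expm(A_of_x(x) * dt)

def linearize_controls(step, x, u_trim, eps=1e-4):
    """Approximate the discrete control matrix Gamma(x_k) by perturbing each
    control input of the black-box simulator around the trim input u_trim."""
    x_nom = step(x, u_trim)                 # nominal one-step response
    n, m = x.size, u_trim.size
    Gamma = np.zeros((n, m))
    for j in range(m):
        u_pert = u_trim.copy()
        u_pert[j] += eps
        Gamma[:, j] = (step(x, u_pert) - x_nom) / eps   # finite-difference column
    return Gamma
```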

Finally, given $\Phi(x_k)$ and $\Gamma(x_k)$, we can design the SDRE control gain $K(x_k)$ at each time step. Note that while $\Phi(\cdot)$ is based on the 6DOF model, the state argument $x_k$ comes directly from the FlightLab model. We have found that this mixed approach, using the approximate model plus the linearized control matrix $\Gamma(x_k)$, is far more robust than simply using a standard LQ approach based on linearization for all system matrices.

4.3. Specifics of the Neural Network Controller

The neural controller is designed using the full FlightLab helicopter model and augments the SDRE controller. The overall flowgraph of the system is consistent with that shown in

³Parameter settings to match FlightLab are $M_a = 16308$ lb, $I_{xx} = 9969$ lb·ft², $I_{yy} = 44493$ lb·ft², $I_{zz} = 44265$ lb·ft², $I_{xz} = -1478$ lb·ft².


Figure 5: Test trajectory: a) SDRE, b) MPNC (position in XYZ space; x-, y-, and z-position in ft, with numbered waypoints 1-12).

Figure 1. However, the neural controller is specified as:

$$u^{nn} = N_{net}(u, v, w, p, q, r, s_\psi, s_\theta, s_\phi, c_\psi, c_\theta, c_\phi, e;\, w),$$

where we have included sines and cosines of the yaw, pitch, and roll angles. This is motivated by the fact that the helicopter dynamics depend on such trigonometric functions of the Euler angles. Coordinates of the aircraft in the inertial frame do not influence the dynamics, and are excluded as inputs.
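For concreteness, a tiny sketch of assembling this input vector; the function name and argument order are illustrative only.

```python
import numpy as np

def nn_inputs(u, v, w, p, q, r, psi, theta, phi, e):
    """Assemble the neural-controller input vector of Section 4.3: body-frame
    velocities and rates, sines/cosines of the Euler angles, and the tracking
    error e.  Inertial-frame coordinates are deliberately excluded."""
    angles = np.array([psi, theta, phi])
    return np.concatenate([[u, v, w, p, q, r], np.sin(angles), np.cos(angles), e])
```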

The adjoint system (see Figure 2) also requires a slight modification to incorporate the effects of the tracking error. Specifically, the term appearing in Equation 11 is given by

$$\frac{\partial e_k}{\partial x_k} = I - \frac{\partial\big(T(x_k)\,x_k^{tar}\big)}{\partial x_k}, \qquad (15)$$

where the partial is evaluated analytically by specifying $T(x_k)$.

All other aspects of the NN training (e.g., approximation of the cost-to-go, computational considerations, etc.) are as described previously.

Figure 6: Collective control: a) SDRE, b) MPNC (main rotor collective angle [deg] versus time [s]).

4.4. Simulation Results

Figure 5 shows a test trajectory for the helicopter (vertical rise, forward flight, left turn, u-turn, forward flight to hover). The figure compares tracking performance at a velocity of 20 ft/s for the MPNC system (SDRE+NN) versus a standard SDRE controller (MPNC settings are: horizon = 25, update interval = 5, training epochs = 10, sampling time = 0.097 sec, $\alpha$ = 0.7). Note that a standard LQ controller based on linearization exhibits loss of tracking and crashes for velocities above 12 ft/s. The smaller tracking error for the MPNC controller is apparent (note the reduced overshooting and oscillations at mode transitions). Figure 6 shows that the control effort spent by the MPNC controller is also less.

Table 1 illustrates the trade-offs between computational effort and control performance (accumulated cost, $J_t$) for the simplifications discussed in Section 3.3. Clearly, substantial speed-up in simulation time can be achieved with only a minor loss of performance (while we have included representative simulation times for relative comparison, the experiments were performed in MATLAB, with the standalone vehicle model generated by the FlightLab simulator, and were not optimized for efficient implementation). Note that all simplifications still result in a substantial improvement over the standard SDRE controller, which has an associated cost of 31.28. For all subsequent simulations we use simplification level 3.

Table 1: Simplification levels vs. computing time and performance costs (Pentium-3 750 MHz, Linux).

Simp. level   Accurate   1         2         3         4
Cost          5.85       6.15      7.40      7.17      7.84
GFLOPs        1060       84.1      84.0      13.7      7.22
Sim. time     34 hr.     3.5 hr.   54 min.   33 min.   24 min.

Table 2: Performance cost comparisons.

Hid. neurons     50 neurons                200 neurons
Horiz./update    10/2    25/5    50/10     10/2     25/5    50/10
MPNC + φ         8.76    8.29    10.31     7.56     7.17    8.95
MPNC w/o φ       --      7.47    9.58      10.78    6.21    8.50
SDRE             31.28

Finally, Table 2 summarizes comparisons of the accumulated cost with respect to the horizon length and MPC update interval, number of neurons in the hidden layer, and with and without the use of the cost-to-go $\varphi(e_{t+N})$. The weights of the NN are trained for 10 epochs for each horizon (number of simulated trajectories). Missing data in the table corresponds to a case where states and control inputs exceeded the envelope of FlightLab model consistency. Results indicate that the optimal horizon length is between 10 to 25 time steps (1-2 seconds). The importance of the cost-to-go function is apparent for short horizon lengths. On the other hand, inclusion of the cost-to-go does not appear to help for longer horizons. Overall, significant performance improvement is clearly achieved with the MPNC controller relative to the pure SDRE controller.

5. Conclusions

In this paper, we have presented a new approach to receding horizon MPC based on a NN feedback controller in combination with an SDRE controller. The approach exploits both a sophisticated numerical model of the vehicle (FlightLab) and its analytical nonlinear approximation (6DOF model). The NN is optimized using the full FlightLab simulator to minimize the MPC cost, while the SDRE controller is designed using the approximate model and provides a baseline stabilizing control trajectory. In addition, we considered a number of simplifications in order to improve the computational requirements of the approach. Overall, results verify the superior performance of the approach over traditional SDRE (and LQ) control. Future work includes incorporation of a vehicle-environment interaction model and use of environmental short-term forecasts for improved flight control.

6. References

[1] FlightLab release note - version 2.8.4. Advanced Rotorcraft Technology, Inc., 1999.

[2] A. A. Bogdanov and E. A. Wan. Model predictive neural control of a high fidelity helicopter model. In Submitted to AIAA Guidance, Navigation and Control Conference, Montreal, Quebec, Canada, August 2001.

[3] J. R. Cloutier, C. N. D'Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 1, Theory. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.

[4] J. R. Cloutier, C. N. D'Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 2, Examples. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.

[5] G. F. Franklin, J. D. Powell, and M. L. Workman. Digital Control of Dynamic Systems. Addison-Wesley, Reading, MA, second edition, 1990.

[6] R. A. Jacobs. Increasing rates of convergence through learning rate adaptation. Neural Networks, 1(4):295-307, 1988.

[7] A. Jadbabaie, J. Yu, and J. Hauser. Stabilizing receding horizon control of nonlinear systems: a control Lyapunov function approach. In Proceedings of the American Control Conference, 1999.

[8] E. S. Meadows and J. B. Rawlings. Nonlinear Process Control, chapter Model predictive control. Prentice Hall, 1997.

[9] S. J. Qin and T. A. Badgwell. An overview of industrial model predictive control technology. Chemical Process Control - AIChE Symposium Series, pages 232-256, 1997.

[10] M. Sznaier, J. Cloutier, R. Hull, D. Jacques, and C. Mracek. Receding horizon control Lyapunov function approach to suboptimal regulation of nonlinear systems. The Journal of Guidance, Control, and Dynamics, 23(3):399-405, May-June 2000.

[11] E. A. Wan and A. A. Bogdanov. Model predictive neural control with applications to a 6 DoF helicopter model. In Proceedings of the IEEE American Control Conference, Arlington, VA, June 2001.
