
Applications of Optimal Control Theory using Artificial Neural Networks

J.M. Martinez (*), C. Barret, M. Houkari, P. Meyne, M. Dominguez (*)

Laboratoire de Robotique d'Evry, Allée Jean Rostand, 91025 Evry, France

(*) Centre d'Etudes de Saclay, DMT, 91190 Gif sur Yvette, France

e-mail : [email protected]

Abstract: This paper shows the capabilities of neural networks in optimal control applications for non linear dynamic systems. Our method is based on a classical method concerning the direct search for the optimal control using gradient techniques. We show that the neural approach and the backpropagation paradigm are able to solve efficiently the equations relative to the necessary conditions for an optimizing solution. We have taken into account the known capabilities of neural networks in function approximation, and for dynamic systems we have generalized the indirect-learning inverse-model adaptive architecture so that it is capable of defining an optimal control in relation to a temporal criterion.

Keywords: neural control theory, adaptive identification, adaptive control, optimal control.

1. Introduction

Neural techniques have already shown their ability in the identification and control of processes [1,2,3,4,5,6]. At first, these techniques were introduced for static processes, using direct or indirect learning architectures of the inverse model. Then they were extended to dynamic processes. These approaches are only based on local optimisation, i.e. the effects of the control over time are not really taken into account.

An alternative to this limitation consists in elaborating the control from the minimization of a criterion related to the temporal evolution of the process states. This approach is based on recurrent networks, where the behaviour of the process is analyzed from a sequence of neural network models. To fit dynamical behaviour, one can use Backpropagation Through Time (BTT) [12]. Our approach is similar, but here we will use BTT to deal with optimal control learning. We will use a sequence of neural networks to estimate the temporal evolution of the process states in order to define the best control.

Today, classical methods in control theory are sufficiently mature and well formalized. Nevertheless, these methods are not suitable in real applications concerning optimal control. They may be implemented by coupling them with neural methods. In [8], one emphasizes the "continuity of the research on artificial neural networks with more traditional research", in order to take advantage of the background knowledge of control processes.

The definition of an optimal control by classical methods requires good knowledge of the process: an analytical model, in algebraic-differential or recurrent non linear equations, is necessary. In real processes, models are not sufficiently well known, and when they are, it is not possible to use them on line because of temporal constraints. So, in real applications, one usually deals with the identification problem using adaptive linear models, which provide the typical features of the process to classical controllers. But these models are not able to estimate the process behaviour over a long time, and optimal control is not possible.

In relation to classical methods, neural techniques can be distinguished by two characteristics. The first one consists in the structure of neural models, i.e. non linear models generalizing the classical linear approach. The second is relative to adaptation algorithms like backpropagation, which fit real applications in adaptive and optimal control. We detail these two points below.

For the most part, real processes are non linear. The neural approach, with its non linear features, is better suited than the linear approach. Neural models are to be seen as a generalization of linear models: indeed, if the activation functions of the neural units are linear, we are back to linear identification models. In an adaptive scheme, the parameters of the identification models are the synaptic weights. On the other hand, it is known from [9] that a two-layer network with an arbitrarily large number of units in the hidden layer can approximate any continuous function f ∈ C(R^n, R^p) over a compact subset of R^n. And we can add the high degree of parallelism of neural computation, capable of dealing with complex applications using dedicated hardware.
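As a minimal illustration of this approximation property (a sketch with layer sizes, learning rate, and target function chosen arbitrarily by us, not tied to the paper's setup), a one-hidden-layer tanh network can be fitted to a continuous function on a compact interval by plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on a compact set, here sin on [-pi, pi].
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
Y = np.sin(X)

# Two-layer network: one hidden tanh layer, linear output.
n_hidden = 20
W1 = rng.normal(0, 1.0, (1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

_, Y0 = forward(X)
mse_init = np.mean((Y0 - Y) ** 2)

lr = 0.05
for _ in range(5000):
    H, P = forward(X)
    E = P - Y                          # output error
    # Backpropagation: full-batch gradient descent on the mean squared error.
    gW2 = H.T @ E / len(X)
    gb2 = E.mean(0)
    dH = (E @ W2.T) * (1 - H ** 2)     # error backpropagated through tanh
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, Yf = forward(X)
mse_final = np.mean((Yf - Y) ** 2)
print(mse_init, mse_final)
```

With 20 hidden units the fit improves by orders of magnitude over the initial random network; in the linear-activation limit the same code degenerates to linear regression, matching the remark above that linear identification is a special case.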

The second point, relative to backpropagation, is more technical. It has been shown in [2, 6, 9] that backpropagation easily provides the jacobian of the neural model, so we can use it as if it were the jacobian of the process. The first idea is to help operators of the process, in direct or indirect mode, to define better actions in relation to a given goal, i.e. to answer the requests "What if?" and "What for?": "What if?" to help process operators estimate perturbations on the process before any decision on the control, and "What for?" to propose to them variations on the control to reach a desired goal on the process state. The other idea, which we present here, is to use this appropriateness to extend control help towards the definition of a real optimal control.

0-7803-2129-4/94 $3.00 © 1994 IEEE

Section 2 describes the classical direct method to find an optimizing solution. This method gives necessary conditions; it is well known and can be found in [10]. Section 3 presents the resolution of these necessary conditions using neural techniques. This approach is based on indirect learning of the inverse model to identify the process with a multi-layered network. Section 4 gives our views on this approach, which seems very attractive for real applications. We conclude in Section 5.

2. Optimization Method

We consider non linear dynamic systems which can be described by (1):

$$X_{t+1} = F(X_t, U_t), \qquad X \in \mathbb{R}^p,\ U \in \mathbb{R}^q \tag{1}$$

where X_t is the state vector and U_t the control vector at discrete time t. From a given initial state X_0, the problem is to find a sequence of optimal controls U_0, U_1, ..., U_T that minimizes a given cost function defined by equation (2):

$$C(X_1, \ldots, X_{T+1}, U_0, \ldots, U_T) = \sum_{t=1}^{T+1} c_t(X_t, U_t) \tag{2}$$

This N-stage optimal control problem, when analytical solutions are not available, can be solved by numerical techniques, computing the gradients of the cost with respect to the control sequence (3):

$$\Delta U_t = -\alpha \, \frac{\partial C}{\partial U_t}, \qquad t \in [0, T] \tag{3}$$

To compute the sensitivities of the cost with respect to variations in control space, the direct classical method leads to solving the associated recurrence equations from the final condition (4):

$$Y_{t-1} = c_{X_t} + Y_t \cdot F_{X_t}, \qquad \text{with } Y_{T+1} = 0 \tag{4}$$

The gradients C_{U_t} = ∂C/∂U_t are calculated by (5):

$$C_{U_t} = c_{U_t} + Y_t \cdot F_{U_t} \tag{5}$$

Details of the calculations are given in the Appendix. This scheme needs the process model F (Equation 1) and the associated jacobians F_X (Equation 4) and F_U (Equation 5). We deal with this by using neural identification techniques to provide a neural model, and backpropagation to compute the gradients of the cost function with respect to the control inputs.
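As an illustration of equations (4) and (5) (using a toy scalar model F and a quadratic tracking cost of our own choosing, not the paper's process), the backward adjoint recursion can be checked against finite-difference gradients:

```python
import numpy as np

# Toy nonlinear scalar system X_{t+1} = F(X_t, U_t); F is an arbitrary
# illustration, not the process of the paper.
def F(x, u):   return x + 0.1 * np.sin(x) + u
def F_x(x, u): return 1 + 0.1 * np.cos(x)   # jacobian w.r.t. state
def F_u(x, u): return 1.0                   # jacobian w.r.t. control
x_d = 1.0                                   # desired state
def c_x(x):    return 2 * (x - x_d)         # local cost gradient c_{X_t}

T = 5
def rollout(U, x0=0.0):
    X = [x0]
    for t in range(T + 1):
        X.append(F(X[t], U[t]))
    return X                                # states X_0 .. X_{T+1}

def cost(U):
    X = rollout(U)
    return sum((X[t] - x_d) ** 2 for t in range(1, T + 2))

def adjoint_gradient(U):
    # Equations (4) and (5): backward recursion on the adjoint vectors Y_t.
    X = rollout(U)
    Y = np.zeros(T + 2)                     # final condition Y_{T+1} = 0
    for t in range(T + 1, 0, -1):
        fx = F_x(X[t], U[t]) if t <= T else 0.0   # term vanishes when Y[t] = 0
        Y[t - 1] = c_x(X[t]) + Y[t] * fx
    return np.array([Y[t] * F_u(X[t], U[t]) for t in range(T + 1)])

U = np.full(T + 1, 0.05)
g = adjoint_gradient(U)

# Finite-difference check of dC/dU_t.
eps = 1e-6
g_fd = np.zeros(T + 1)
for t in range(T + 1):
    d = np.zeros(T + 1); d[t] = eps
    g_fd[t] = (cost(U + d) - cost(U - d)) / (2 * eps)
print(np.max(np.abs(g - g_fd)))
```

One backward sweep yields all T+1 gradients at once, which is the point of solving the adjoint system rather than perturbing each control separately.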

3. Neural Method

A feedforward network can be seen as a parameterized mapping from inputs to outputs. The vector of parameters (i.e. the synaptic weights) is calculated using backpropagation to minimize a cost function based on the discrepancy between target outputs and network outputs. We can therefore use this adaptive scheme to deal with process identification. To build a model of the process, we propose for example the classical series-parallel method, as shown in Figure 3.1. Other identification methods can be used (e.g. feedforward or recurrent networks using stochastic or batch gradients) [11].

Here, the method enables us to identify a process described by equation (1). So we suppose that the identification problem is solved by a feedforward neural network. The neural model which identifies the process is a good model for control as long as it gives a good enough mapping from the inputs (state and control) to the outputs (state). Besides, this kind of learning is capable of adapting to possible process drifts if it is kept on line.
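A minimal sketch of series-parallel identification along the lines of Figure 3.1: the network receives the measured state X_t and control U_t (not its own prediction) and is trained to reproduce X_{t+1}. The plant, layer sizes, and rates here are illustrative assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(2)

# An illustrative "real" scalar process, not the paper's plant.
def process(x, u):
    return 0.8 * x + 0.2 * np.tanh(x) + u

# Series-parallel data: measured (X_t, U_t) -> X_{t+1} pairs.
x, rows = 0.0, []
for _ in range(500):
    u = rng.uniform(-0.5, 0.5)
    x_next = process(x, u)
    rows.append((x, u, x_next))
    x = x_next
data = np.array(rows)
inp, target = data[:, :2], data[:, 2:3]

# One-hidden-layer neural model of F.
W1, b1 = rng.normal(0, 0.5, (2, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.1, (16, 1)), np.zeros(1)

def predict(Z):
    H = np.tanh(Z @ W1 + b1)
    return H, H @ W2 + b2

_, P0 = predict(inp)
mse_init = np.mean((P0 - target) ** 2)

lr = 0.05
for _ in range(3000):
    H, P = predict(inp)
    E = (P - target) / len(inp)        # mean-gradient scaling
    gW2, gb2 = H.T @ E, E.sum(0)
    dH = (E @ W2.T) * (1 - H ** 2)
    gW1, gb1 = inp.T @ dH, dH.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, Pf = predict(inp)
mse_final = np.mean((Pf - target) ** 2)
print(mse_init, mse_final)
```

Because training feeds the measured state back as input, the scheme stays stable even while the model is still poor, and kept on line it can track slow process drifts as the text notes.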

Figure 3.1: Series-Parallel Identification

The delay block in Figure 3.1 denotes the unit time delay. Backpropagation also gives the differentiations of the outputs with respect to the inputs. So we are going to use backpropagation to solve equations (4) and (5), in which we substitute the jacobian of the neural model for the jacobian of the real process model F (6 and 7):
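The claim that backpropagation yields these input-output differentiations can be sketched as follows: for a one-hidden-layer model (an illustrative assumption, not the paper's network), the chain rule gives the jacobian in closed form, which we check against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)

# A small neural model y = W2 tanh(W1 x + b1) + b2 : R^3 -> R^2.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def model(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def jacobian_backprop(x):
    # Backpropagation through the single hidden layer: the chain rule
    # gives dy/dx = W2 diag(1 - h^2) W1.
    h = np.tanh(W1 @ x + b1)
    return W2 @ np.diag(1 - h ** 2) @ W1        # shape (2, 3)

x = rng.normal(size=3)
J = jacobian_backprop(x)

# Finite-difference check of the same jacobian.
eps = 1e-6
J_fd = np.zeros((2, 3))
for i in range(3):
    e = np.zeros(3); e[i] = eps
    J_fd[:, i] = (model(x + e) - model(x - e)) / (2 * eps)
print(np.max(np.abs(J - J_fd)))
```

This is why an identified neural model can stand in for the process jacobians F_X and F_U in equations (4) and (5).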

$$Y_{t-1} = c_{X_t} + Y_t \cdot \Phi_{X_t}, \qquad \text{with } Y_{T+1} = 0 \tag{6}$$

$$C_{U_t} = c_{U_t} + Y_t \cdot \Phi_{U_t} \tag{7}$$

These equations can be solved using backpropagation through the neural model. We now describe our method. The building blocks of the propagation and backpropagation steps are described in Figures 3.2 and 3.3 respectively. The arrows in thick lines represent the result of each step.

Figure 3.2: Propagation Step

The PROPAGATION step is the classical forward step for multi-layered networks, to which we have added the calculation of the gradients of the local cost function c_t(X_t, U_t). This step defines each state X_{t+1} according to the previous state X_t and the control value U_t. During this step, the network Φ_t relates to the process state at discrete time t, so the step must be repeated to get the estimated temporal behaviour of the process at discrete times t = 1, 2, ..., T+1. The partial derivatives of the local cost function with respect to state and control, i.e. c_{X_t} = ∂c_t(X_t, U_t)/∂X_t and c_{U_t} = ∂c_t(X_t, U_t)/∂U_t, are also calculated at discrete times t = 0, 1, ..., T.

Figure 3.3: Backpropagation Step

The BACKPROPAGATION step, as shown in Figure 3.3, performs a classical backpropagation of the adjoint vectors Y_t through the internal state of the neural models Φ_t. This step provides the terms Φ_{X_t}·Y_t and Φ_{U_t}·Y_t. To define each adjoint vector Y_t and the sensitivity relation C_{U_t} of the global cost with respect to the control vector U_t, one must add c_{X_t} and c_{U_t}, corresponding to the sensitivities of the local cost c_t with respect to variations in X_t and U_t respectively. Each of these terms has been calculated during the PROPAGATION step.
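The products Φ_{X_t}·Y_t and Φ_{U_t}·Y_t obtained in this step can be sketched as vector-jacobian products, computed by backpropagating the adjoint vector Y_t through a small model without ever forming the full jacobians (the network and its sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Neural model Phi: (x, u) -> x_next, with x in R^2 and u in R^1.
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

def phi(x, u):
    z = np.concatenate([x, u])
    return W2 @ np.tanh(W1 @ z + b1) + b2

def backprop_vjp(x, u, y):
    # Backpropagate the adjoint row-vector y through the network:
    # returns (y . Phi_x, y . Phi_u) in a single backward pass.
    z = np.concatenate([x, u])
    h = np.tanh(W1 @ z + b1)
    dz = ((y @ W2) * (1 - h ** 2)) @ W1     # y . dPhi/d(x,u)
    return dz[:2], dz[2:]                   # split into state / control parts

x, u, y = rng.normal(size=2), rng.normal(size=1), rng.normal(size=2)
yPx, yPu = backprop_vjp(x, u, y)

# Check against finite differences of the scalar y . phi(x, u).
eps = 1e-6
z0 = np.concatenate([x, u])
fd = np.zeros(3)
for i in range(3):
    e = np.zeros(3); e[i] = eps
    fd[i] = (y @ phi((z0 + e)[:2], (z0 + e)[2:]) -
             y @ phi((z0 - e)[:2], (z0 - e)[2:])) / (2 * eps)
print(np.max(np.abs(np.concatenate([yPx, yPu]) - fd)))
```

Splitting the backpropagated input gradient into its state and control parts is exactly what feeds the two axes of Figure 3.4: Y_t flows horizontally, C_{U_t} vertically.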

From this interpretation, one can find a minimum of the global cost function with respect to the sequence of control vectors U_t for t = 0, 1, ..., T. We show in Figure 3.4 the general architecture used to provide the sensitivity relations C_{U_t}, i.e. to solve the adjoint system. To lighten this figure, we have included in each block Φ_t the calculation of the respective local gradients c_{U_t} and c_{X_t}.

In the following figure, two data streams go through each elementary block (Φ_t, c_t). The first one consists in the propagation of the pairs (X_t, U_t), initializing the internal state of each unit. The second stream is relative to backpropagation, seen as an echo occurring at the temporal terminal T+1. There is no hypothesis concerning the depth of the temporal terminal. This echo propagates along the horizontal axis the values of the adjoint vectors Y_t, and along the vertical axis the values C_{U_t}, in relation to the variations that must be applied to the control vector. In this figure, the sequential distribution of the cost function appears along the sequence of blocks Φ_t. This distribution fits the definition of the global cost function as the sum of the local cost functions.

Figure 3.4: Adjoint System Resolution

This method requires initial values of the control vectors U_t. To deal with this problem, we use the learning capabilities of neural networks. To define the initial conditions, the idea is to build a neural controller to estimate the optimal control in relation to the initial state X_0 and each local cost function c_t. An iterative solution to perform the learning step of the neural controller consists in channelling the sensitivity relations C_{U_t} to the neural controller.

These sensitivity relations are seen as errors on the last layer of the neural controller. From these errors, backpropagation will adapt the synaptic weights of the neural controller to minimize them. Little by little, after several iterations, the neural controller will learn the optimal control in relation to the initial states X_0 and the desired states X_t^d included in the local costs c_t.

Figure 3.5 shows the general principle of this method. In general, the number of neural controller inputs depends on the number of desired states and desired control vectors, i.e. on the depth of the temporal window of the cost function. In general, the main objective is to control a path in state space, so the cost function depends only on the desired states and the neural controller inputs are X_0, X_1^d, ..., X_{T+1}^d. Similarly, sometimes the main goal is to reach a desired state at discrete time T+1; in this case the neural controller inputs are X_0 and X_{T+1}^d. In this case, and for the particular value T = 0, we recognize the neural architecture which was proposed in [1, 6, 7].

Figure 3.5: Optimal Control Learning Architecture

4. Discussion

This approach can easily be generalized to processes described by non linear recurrence relations such as X_{t+1} = F(X_t, X_{t-1}, ..., U_t, U_{t-1}, ...). This representation is certainly better adapted to processes in which delay lines link the state and control vectors. On the other hand, if there is no access to the state vector, estimation techniques such as Kalman filters or other neural techniques can be used.

Gradient-related problems must still be solved: the value of the step, the criterion to stop the iterative procedure, and the convergence to a local minimum. We must also deal with all problems concerning the numeric stability of solving the adjoint system.

Nevertheless, we have applied our method to solve the problem of optimal control for a second-order system described by d²x/dt² = u. For the states of the process, we dealt with the position and the variation of the position, using an Euler approximation (10 ms sampling period). The step α of the gradient (as seen in Equation 3) was changed between 5 and 200 according to the variations of the cost function. After about 1000 iterations we found the optimal solution, i.e. the bang-bang control law.
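A hedged sketch in the spirit of this experiment: an Euler-discretized double integrator, adjoint gradients on the control sequence, and a projected step keeping |u| ≤ 1. The horizon, step size, and smooth terminal cost (with a target well inside the reachable set) are our own choices, so this sketch does not reproduce the bang-bang law of the minimum-time problem:

```python
import numpy as np

h, T = 0.1, 20                         # sampling period and horizon
A = np.array([[1.0, h], [0.0, 1.0]])   # Euler model of d2x/dt2 = u
B = np.array([0.0, h])
x_target = np.array([0.5, 0.0])        # reach position 0.5 at rest

def rollout(U):
    X = [np.zeros(2)]
    for t in range(T):
        X.append(A @ X[t] + B * U[t])
    return X

def cost(U):
    xT = rollout(U)[-1]
    return float((xT - x_target) @ (xT - x_target))

def gradient(U):
    # Adjoint recursion for a terminal-only cost: starting from the
    # final-state gradient, propagate the row vector y backwards through A;
    # dC/dU_t = y . B at each step.
    X = rollout(U)
    g = np.zeros(T)
    y = 2 * (X[-1] - x_target)
    for t in range(T - 1, -1, -1):
        g[t] = y @ B
        y = y @ A
    return g

U = np.zeros(T)
c0 = cost(U)
alpha = 0.5
for _ in range(2000):
    U = np.clip(U - alpha * gradient(U), -1.0, 1.0)   # bounded control
c_final = cost(U)
print(c0, c_final)
```

With a reachable interior target the projected gradient descent drives the terminal error essentially to zero; pushing the target to the edge of the reachable set and penalizing time is what makes the saturated, bang-bang profile emerge.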


5. Conclusions

Today it is known that supervised learning is not completely dependent on a teacher [1]. To solve control problems, this kind of learning is used to build a model of the "world" and to rely on this model to give directives to a controller in order to reach a goal. Our work tries to apply this approach to process optimal control, i.e. when a trajectory in state space is desired.

Our approach is a generalization of the neural architectures which were proposed by Jordan and Barto [1, 7]. Indeed, with only one neural model to estimate the state, i.e. with T = 0, we recognize their architectures. When the goal is specified over a long time (T > 0), our method is reminiscent of Widrow's work in [2]. The difference consists in the formalization of the optimal control using classical background methods. We hope to have shown that a sequence of feedforward networks fitted to the process can provide the optimal control, and that backpropagation through this sequence of neural models solves the adjoint system of necessary conditions for an optimizing solution.

6. References

[1] M. I. Jordan, D. E. Rumelhart, "Forward Models: Supervised Learning with a Distal Teacher", Cognitive Science, 16, pp. 307-354.
[2] D. Nguyen, B. Widrow, "The Truck Backer-Upper", International Neural Network Conference, July 9-13, 1990, Paris, France.
[3] K. S. Narendra, K. Parthasarathy, "Identification and Control of Dynamical Systems Using Neural Networks", IEEE Trans. on Neural Networks, Vol. 1, No. 1, March 1990.
[4] D. Psaltis, A. Sideris, A. Yamamura, "Neural Controllers", ICNN, San Diego, 1987.
[5] M. Kawato, "Computational Schemes and Neural Network Models for Formation and Control of Multijoint Arm Trajectory", in Neural Networks for Control, edited by W. Thomas Miller, R. Sutton and P. J. Werbos, Bradford Books, 1990.
[6] J. M. Martinez, Ch. Parey, M. Houkari, "La rétropropagation sous l'angle de la théorie du Contrôle", NEURO-NIMES'91, 4-8 November 1991, Nîmes, France.
[7] A. G. Barto, "Connectionist Learning for Control", in Neural Networks for Control, edited by W. Thomas Miller, R. Sutton and P. J. Werbos, Bradford Books, 1990.
[8] K. M. Hornik, M. Stinchcombe, H. White, "Multilayer Feedforward Networks are Universal Approximators", UCSD Department of Economics Discussion Paper, June 1988.
[9] Y. LeCun, "A Theoretical Framework for Back-Propagation", Connectionist Models Summer School, Morgan Kaufmann Publishers.
[10] R. Boudarel, J. Delmas, P. Guichet, "Commande Optimale des Processus", Techniques de l'Automatisme, Dunod, Paris, 1968.
[11] S.-Z. Qin, H.-T. Su, T. J. McAvoy, "Comparison of Four Neural Net Learning Methods for Dynamic System Identification", IEEE Trans. on Neural Networks, Vol. 3, No. 1, Jan. 1992.
[12] P. J. Werbos, "Backpropagation Through Time: What It Does and How to Do It", Proc. IEEE, Vol. 78, No. 10, Oct. 1990, pp. 1550-1560.

Appendix

We deal with systems and cost functions which are defined by the following equations:

$$X_{t+1} = F(X_t, U_t), \qquad X \in \mathbb{R}^p,\ U \in \mathbb{R}^q$$

$$C(U_0, \ldots, U_T) = \sum_{t=1}^{T} c_t(X_t, U_t) + c_{T+1}(X_{T+1})$$

From the cost function C, considered as a function of the control vectors U_t, we have:

$$\frac{\partial C}{\partial U_t} = \frac{\partial c_t}{\partial U_t} + \sum_{k=t+1}^{T+1} \frac{\partial c_k}{\partial X_k}\,\frac{\partial X_k}{\partial U_t}$$

Let us define the adjoint vector by:

$$Y_t = \sum_{k=t+1}^{T+1} \frac{\partial c_k}{\partial X_k}\,\frac{\partial X_k}{\partial X_{t+1}}$$

The adjoint vectors Y_t are linked by the following recurrent equations:

$$Y_{t-1} = \frac{\partial c_t}{\partial X_t} + \sum_{k=t+1}^{T+1} \frac{\partial c_k}{\partial X_k}\,\frac{\partial X_k}{\partial X_{t+1}}\,\frac{\partial X_{t+1}}{\partial X_t}$$

Using the following notations:

$$c_{X_t} = \frac{\partial c_t}{\partial X_t}, \qquad c_{U_t} = \frac{\partial c_t}{\partial U_t}, \qquad F_{X_t} = \frac{\partial X_{t+1}}{\partial X_t}, \qquad F_{U_t} = \frac{\partial X_{t+1}}{\partial U_t}$$

we obtain the sensitivity relations of the global cost C with respect to the control vectors U_t:

$$Y_{t-1} = c_{X_t} + Y_t \cdot F_{X_t}, \qquad \text{with } Y_{T+1} = 0$$

$$C_{U_t} = c_{U_t} + Y_t \cdot F_{U_t}$$