
American Institute of Aeronautics and Astronautics

1

An Optimal Tracking Approach to Formation Control of

Nonlinear Multi-Agent Systems

Ali Heydari1 and S. N. Balakrishnan2

Missouri University of Science and Technology, Rolla, MO, 65409

Formation control of a network of multi-agent systems with heterogeneous nonlinear dynamics is formulated as an optimal tracking problem, and a decentralized controller is developed using the framework of 'adaptive critics' to solve the optimal control problem. The reference signal is assumed to be available only in online implementation, so its dynamics are unavailable for offline training of the neurocontroller. This issue is resolved by re-optimizing the network output through online retraining of the neurocontroller. Finally, the developed controller is applied to the formation control of multiple spacecraft orbiting the Earth in different orbits and seeking consensus on their position in order to dock.

I. Introduction

The formation control of a network of multi-agent systems with heterogeneous nonlinear dynamics has many applications and benefits compared to single-agent systems; designing a decentralized controller for such a network of agents, however, is a challenging problem. With a decentralized controller, each agent is supposed to have access to its neighbors only, while the mission for the whole system is to achieve a desired formation.

Solutions for networks of linear agents have been developed by several authors. A method developed in Ref. 1, called behavior-based decentralized control, performs multi-agent control for a network of agents with ring-wise communication topology by considering two desired behaviors: formation keeping and goal seeking. The role of the communication topology in the formation's stability is shown in Ref. 2 through the appearance of the eigenvalues of the Laplacian matrix of the communication graph in the dynamics of the formation. In Ref. 3, too, the stability of the formation/consensus is shown to be related to the eigenvalues of the Laplacian matrix, and for the double-integrator system, the authors of Ref. 3 have designed a controller based on the eigenvalues of this matrix. Minimization of a performance index to reach a consensus is the approach used in Ref. 4 and Ref. 5 for single-integrator and double-integrator dynamics, respectively. Many of the methods developed in the literature are applicable only to single- or double-integrator dynamics 1,3-8. In some other papers, the consensus protocol is developed for higher-order dynamics but is still limited to a multi-integrator structure, i.e., the linear Brunovsky canonical form, for example in Ref. 9.

In practice, the dynamics of the agents can be much more complicated; hence, developing a consensus protocol which guarantees consensus for agents with general linear dynamics is of importance. Studies in Refs. 10-12 consider this point by decomposing the eigenvalues of the Laplacian matrix 3 and designing control gains such that the agents' closed-loop dynamics are stable. A different approach is selected in Ref. 13 to serve this goal, by solving a linear matrix inequality which contains the Laplacian matrix. Despite the promising advantages of these methods, they rely on the communication topology; consequently, once this topology changes, the controller's stability is no longer guaranteed. In reality, this topology may change due to a communication link failure or the creation of a new link, which leads to changes in the Laplacian and its eigenvalues. Even if the switching topology always has a rooted spanning tree 3, the eigenvalues change, and the controllers designed through Refs. 10-13 may fail to reach consensus.

The issue of switching topology has been discussed by some researchers in the literature 12,14. For a single

integrator dynamics, the authors of Ref. 14 have analyzed the effect of switching directed topologies and for general

linear dynamics, the authors of Ref. 12 have developed a method to remedy the problem of dependency on the

1 Ph.D. Student, Dept. of Mechanical and Aerospace Eng., 400 W. 13th St., Rolla, MO 65409, [email protected], AIAA Student Member.
2 Curators' Professor of Aerospace Engineering, Dept. of Mechanical and Aerospace Eng., 400 W. 13th St., Rolla, MO 65409, [email protected], AIAA Associate Fellow.

AIAA Guidance, Navigation, and Control Conference, 13-16 August 2012, Minneapolis, Minnesota. AIAA 2012-4694.
Copyright © 2012 by Ali Heydari & S. N. Balakrishnan. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.

Laplacian eigenvalues. In Ref. 12, all the possible network topologies for a fixed number of agents are selected and

the eigenvalues of all of the resulting Laplacians are calculated and used in the controller design to stabilize the

system under all of those topologies. As the number of agents grows, the computation required by this technique grows rapidly, since the eigenvalues of the Laplacians of all possible network topologies must be calculated.
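To make the scaling concrete, the Laplacian of a given communication graph and the count of possible directed topologies can be sketched as follows (a minimal illustration with a made-up 4-agent ring topology; the example graph is not from the paper):

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A, where row i of A lists the in-neighbors
    of agent i and D is the diagonal in-degree matrix."""
    A = np.asarray(adj, dtype=float)
    return np.diag(A.sum(axis=1)) - A

# A ring-wise topology on 4 agents: each agent listens to one neighbor.
A_ring = [[0, 0, 0, 1],
          [1, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 1, 0]]
eigs = np.linalg.eigvals(laplacian(A_ring))  # one zero eigenvalue; the rest determine formation stability

# Enumerating every directed topology on N agents, as in the approach of
# Ref. 12, means considering 2^(N(N-1)) graphs, which grows very quickly.
n_topologies = lambda N: 2 ** (N * (N - 1))
```

Even for N = 5 agents there are already 2^20 (over a million) possible directed topologies, which illustrates why the enumeration-based design does not scale.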

Despite the numerous developments for networks of linear agents, designing a decentralized control law for a network of nonlinear heterogeneous dynamics is still an open problem in the literature. The methods developed in this regard mainly include feedback-linearization-based approaches 1,21 and the virtual structure scheme 22. In the feedback-linearization-based papers, the nonlinearity is cancelled through a suitable expression for the control, so as to end up with a network of linear agents. In a virtual structure design, as proposed in Ref. 22, one node is responsible for generating desired signals to be tracked by the different agents; this results in a centralized control in the sense that all the agents need to be able to communicate with that node/agent. For a specific communication topology, the virtual structure scheme is modified in Ref. 23 to make it decentralized for agents with particular dynamics.

This paper's contribution is developing an 'adaptive critic' based neurocontroller 20 for solving the optimal tracking problem and using it, along with online re-optimization, to track a signal whose value is available to the agent at each instant but whose evolution is not known a priori. Using this scheme, the problem of formation control of heterogeneous agents can be solved by defining the average of the states of the neighbors as the reference signal to be tracked. The neurocontroller is first trained using a reinforcement learning scheme, based on an estimated dynamics for the reference signal. Next, through online re-optimization, the controller is re-trained to be optimal for the true dynamics of the reference signal. A nice feature of this controller is its robustness to a time-varying communication topology.

Solving optimal tracking problems for nonlinear systems using adaptive critics has been investigated by researchers in Refs. 24-29. In Ref. 24 the authors have developed a tracking controller for systems whose input gain matrix, i.e., the matrix g(x_k) in the state equation (1), is invertible:

x_{k+1} = f(x_k) + g(x_k) u_k    (1)

In Ref. 25 the reference signal is limited to those which satisfy the dynamics of the system, i.e., denoting the reference signal by r_k, it needs to satisfy r_{k+1} = f(r_k) + g(r_k) u_k^d, where u_k^d denotes the desired control and the state equation is given by (1). Developments in Refs. 26-29 solve the tracking problem for systems in nonlinear Brunovsky canonical form.

The neurocontroller developed in this study is called the Tracking Single Network Adaptive Critic (Tracking-SNAC); it solves the optimal tracking problem for nonlinear control-affine dynamics, tracking a signal whose dynamics can be nonlinear and arbitrary.

The rest of the paper is organized as follows: after the problem formulation in Section II, a neurocontroller, called Tracking-SNAC, is developed for infinite-horizon optimal tracking of a reference signal in Section III and is used for the formation control of the heterogeneous nonlinear agents in Section IV. Numerical results are provided in Section V, followed by conclusions in Section VI.

II. Problem Formulation

Assume a network of N agents whose heterogeneous discrete-time nonlinear dynamics are described by

x_{k+1}^i = f_i(x_k^i) + g_i(x_k^i) u_k^i,  k = 0, 1, 2, ...,  i = 1, 2, ..., N    (2)

with initial conditions x_0^i, where x_k^i ∈ R^n and u_k^i ∈ R^m are the state and the control vectors. The subscripts denote the time index and the superscripts denote the agent's index. Functions f_i: R^n → R^n and g_i: R^n → R^{n×m} are the system dynamics of the i-th agent. Positive integers n and m denote the dimensions of the state space and the input, respectively. The objective is to design N decentralized controls u_k^i such that they guarantee the convergence of the agents to a consensus/formation, i.e.,

lim_{k→∞} (x_k^i − x_k^j) = 0,  ∀ i, j    (3)

The selected approach here is formulating the problem as a tracking problem and solving it using optimal control tools. Assume the cost function


J_i = (1/2) Σ_{k=0}^∞ [ (x_k^i − r_k^i)^T Q (x_k^i − r_k^i) + u_k^{iT} R u_k^i ],  i = 1, 2, ..., N    (4)

r_k^i ≜ (1/|N_i|) Σ_{j∈N_i} x_k^j    (5)

where N_i denotes the set of neighbors of agent i and |N_i| denotes its number of neighbors. Matrices Q ∈ R^{n×n} and R ∈ R^{m×m} are weighting matrices for the state errors and the control effort, respectively. The matrix Q should be positive definite or positive semi-definite, while R has to be a positive definite matrix. Optimizing J_i with positive definite Q for the different i's leads to (3); hence, the consensus problem converts to a tracking problem. In solving the tracking problem, one needs to note that, for the solution to be decentralized, the controller has access only to x_k^j for j ∈ N_i, as well as x_k^i, for calculating each u_k^i. In this paper, the framework of adaptive critics is selected for solving the optimal tracking problem of the discrete-time nonlinear system.
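As a concrete illustration of Eqs. (4) and (5), the neighbor-average reference and the per-agent stage cost can be sketched as below (a minimal sketch; the agent states, neighbor sets, and weight matrices are made-up examples, not values from the paper):

```python
import numpy as np

def reference_signal(states, neighbors, i):
    """Eq. (5): the reference for agent i is the mean of its neighbors' states."""
    return np.mean([states[j] for j in neighbors[i]], axis=0)

def stage_cost(x_i, r_i, u_i, Q, R):
    """One term of the sum in the cost function of Eq. (4)."""
    e = x_i - r_i
    return 0.5 * (e @ Q @ e + u_i @ R @ u_i)

# Hypothetical 4-agent network with 2-dimensional states.
states = {1: np.array([0.0, 0.0]), 2: np.array([1.0, 0.0]),
          3: np.array([0.0, 1.0]), 4: np.array([1.0, 1.0])}
neighbors = {2: [1, 3], 3: [2, 4], 4: [1, 3]}

r2 = reference_signal(states, neighbors, 2)   # mean of agents 1 and 3
c2 = stage_cost(states[2], r2, np.zeros(2), np.eye(2), np.eye(2))
```

Note that agent 2 only reads the states of its neighbors (agents 1 and 3), which is exactly the decentralization requirement stated above.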

III. Tracking-SNAC

Consider the nonlinear discrete-time input-affine system

x_{k+1} = f(x_k) + g(x_k) u_k,  k = 0, 1, 2, ...    (6)

where x_k ∈ R^n and u_k ∈ R^m denote the state and the control vectors at time k, respectively; f: R^n → R^n and g: R^n → R^{n×m} are the system dynamics, and the initial state is given by x_0. Given the reference signal r_k ∈ R^n with the dynamics

r_{k+1} = F(r_k),  k = 0, 1, 2, ...    (7)

where the reference signal is propagated by the dynamics F: R^n → R^n with the given initial value r_0, the objective is selecting a control history u_k, k = 0, 1, 2, ..., such that the cost function below is minimized:

J = (1/2) Σ_{k=0}^∞ [ (x_k − r_k)^T Q (x_k − r_k) + u_k^T R u_k ]    (8)

Denoting the cost-to-go at instant k by J_k(x_k, r_k), because of its dependency on the current state x_k and the current reference signal r_k, the optimal solution, superscripted by *, is given by the HJB equation below:

J*(x_k, r_k) = min_{u_k} [ (1/2)( (x_k − r_k)^T Q (x_k − r_k) + u_k^T R u_k ) + J*(x_{k+1}, r_{k+1}) ],  k = 0, 1, 2, ...    (9)

u_k* = argmin_{u_k} [ (1/2)( (x_k − r_k)^T Q (x_k − r_k) + u_k^T R u_k ) + J*(x_{k+1}, r_{k+1}) ]    (10)

Define the costate vector as λ_k ≜ ∂J_k/∂x_k, where J_k(x_k, r_k) is denoted by J_k, to get

u_k* = −R^{-1} g(x_k)^T ∂J_{k+1}/∂x_{k+1} = −R^{-1} g(x_k)^T λ_{k+1}    (11)

Replacing u_k in (9) by u_k*, the HJB equation reads

J*(x_k, r_k) = (1/2)( (x_k − r_k)^T Q (x_k − r_k) + u_k*^T R u_k* ) + J*(x_{k+1}, r_{k+1})    (12)

The costate equation can be derived by taking the derivative of both sides of (12) with respect to x_k as

λ_k = Q(x_k − r_k) + A_k^T λ_{k+1}    (13)

or equivalently


λ_{k+1} = Q(x_{k+1} − r_{k+1}) + A_{k+1}^T λ_{k+2}    (14)

where

A_{k+1} ≜ ∂x_{k+2}/∂x_{k+1} = ∂[ f(x_{k+1}) + g(x_{k+1}) u_{k+1} ]/∂x_{k+1}    (15)

Using the SNAC scheme 30, a neural network (NN) is selected to generate λ_{k+1}. From (14), the costate vector λ is observed to be not only a function of the state x but also a function of the reference signal r; hence, the inputs to the network are selected to be x_k and r_k. Denoting the NN mapping by N(., .), one has

λ_{k+1} = N(x_k, r_k)    (16)

Selecting a network structure linear in the tunable weights, one has

λ_{k+1} = W^T φ(x_k, r_k)    (17)

where W ∈ R^{p×n} denotes the network weight matrix and φ: R^n × R^n → R^p is composed of p linearly independent scalar basis functions.

Considering (14), the network training target, denoted by λ^t, can be calculated using the following equation:

λ^t_{k+1} = Q(x_{k+1} − r_{k+1}) + A_{k+1}^T λ_{k+2},  k = 0, 1, 2, ...    (18)

where, in the training process, λ_{k+2} on the right-hand side of (18) is substituted by W^T φ(x_{k+1}, r_{k+1}). Noting that the closed-loop dynamics, using (6), (11), and (17), is given by

x_{k+1} = f(x_k) − g(x_k) R^{-1} g(x_k)^T W^T φ(x_k, r_k)    (19)

equation (18) changes to

λ^t_{k+1} = Q( f(x_k) − g(x_k) R^{-1} g(x_k)^T W^T φ(x_k, r_k) − F(r_k) )
          + A_{k+1}^T W^T φ( f(x_k) − g(x_k) R^{-1} g(x_k)^T W^T φ(x_k, r_k), F(r_k) )    (20)

Note that in (20), which is supposed to generate λ^t_{k+1} to be used for training the weights, the right-hand side of the equation depends on W; hence, the calculated target λ^t_{k+1} is itself a function of the weights, and the optimal target needs to be obtained through a successive approximation scheme, i.e., reinforcement learning. Defining the training error as

e_k ≜ λ_{k+1} − λ^t_{k+1} = W^T φ(x_k, r_k) − λ^t_{k+1}    (21)

the iterative process of learning W can be summarized in the algorithm below.

Algorithm 1:

1- Randomly select the state vector x_k and the reference signal r_k, belonging to the selected domain of interest.

2- Through the process shown in Fig. 1 and given by equation (20), calculate λ^t_{k+1}.

3- Train the network weights W using the input-target pair {(x_k, r_k), λ^t_{k+1}}.

4- Calculate the training error e_k using (21).

5- Repeat steps 1 to 4 until the error e_k converges to zero or to a small value for the different x_k's and r_k's selected in step 1.

Having the input-target pair {(x_k, r_k), λ^t_{k+1}} calculated, the network can be trained using any training method 31. The training law selected in this study is the least squares method. Assume that, in each iteration of Algorithm 1, instead of one random state and reference signal, q random states and reference signals, denoted by x_k^[j] and r_k^[j], j = 1, 2, ..., q, are selected. Denoting the training target λ^t calculated using x_k^[j] and r_k^[j] by λ^t(x_k^[j], r_k^[j]), the objective is finding W such that it solves

Page 5: An Optimal Tracking Approach to Formation Control of Nonlinear …faculty.smu.edu/aheydari/Research/Conference_Papers/PDF... · 2016-10-13 · American Institute of Aeronautics and

American Institute of Aeronautics and Astronautics

5

Fig. 1. Tracking-SNAC training diagram.

W^T φ(x_k^[j], r_k^[j]) = λ^t(x_k^[j], r_k^[j]),  j = 1, 2, ..., q    (22)

Define

Φ ≜ [ φ(x_k^[1], r_k^[1])  φ(x_k^[2], r_k^[2])  ...  φ(x_k^[q], r_k^[q]) ]    (23)

Λ^t ≜ [ λ^t(x_k^[1], r_k^[1])  λ^t(x_k^[2], r_k^[2])  ...  λ^t(x_k^[q], r_k^[q]) ]    (24)

Using least squares, the solution to the system of linear equations (22) is given by

W = (Φ Φ^T)^{-1} Φ Λ^{tT}    (25)

Note that for the inverse of the matrix (Φ Φ^T) to exist, one needs the basis functions φ to be linearly independent and the number of random states q to be greater than or equal to the number of neurons, p.

Though (25) looks like a one-shot solution for the ideal NN weights, the training is an iterative process which requires selecting different random states and reference signals and updating the weights through solving (25) successively. Note that Λ^t used in the weight update (25), as explained earlier, is not the true optimal costate and is a function of the current estimate of the ideal unknown weights, i.e., Λ^t = Λ^t(W). Denoting the weights at the i-th epoch of the weight update by W_i, the iterative procedure is given by

W_{i+1} = (Φ Φ^T)^{-1} Φ Λ^t(W_i)^T    (26)



Upon convergence of the weights, one has

lim_{i→∞} W_i = W*    (27)

where W* denotes the optimal NN weights, in the sense that they generate the optimal costate vector λ_{k+1}. For this purpose, one starts with an initial weight W_0 and iterates through (26) until the weights converge. The initial weights can be set to zero or can be selected based on the linearized solution of the given nonlinear system.
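Algorithm 1 with the least-squares update of Eq. (26) can be sketched as the following loop. This is a sketch under several assumptions of my own: the Jacobian of Eq. (15) is approximated by finite differences, the sampling domain is a made-up unit box, and the least-squares update is blended with a rate `alpha` for numerical stability (a damping choice of this sketch, not prescribed by the paper):

```python
import numpy as np

def jac_fd(h, x, eps=1e-6):
    """Finite-difference Jacobian of h: R^n -> R^n at x."""
    hx = h(x)
    J = np.zeros((hx.size, x.size))
    for i in range(x.size):
        dx = np.zeros(x.size)
        dx[i] = eps
        J[:, i] = (h(x + dx) - hx) / eps
    return J

def train_tracking_snac(f, g, F, Q, R, phi, n, p, q=50, epochs=200,
                        alpha=0.5, seed=0):
    """Offline Tracking-SNAC training (Algorithm 1): learn W in
    lambda_{k+1} = W^T phi(x_k, r_k), with the batch least-squares
    update of Eq. (25) applied successively as in Eq. (26)."""
    rng = np.random.default_rng(seed)
    Rinv = np.linalg.inv(R)
    W = np.zeros((p, n))                       # zero initial weights
    for _ in range(epochs):
        Phi = np.zeros((p, q))
        Lam = np.zeros((n, q))
        for j in range(q):
            x = rng.uniform(-1, 1, n)          # random state in the domain
            r = rng.uniform(-1, 1, n)          # random reference signal
            lam1 = W.T @ phi(x, r)             # current costate estimate
            u = -Rinv @ g(x).T @ lam1          # optimal control, Eq. (11)
            x1 = f(x) + g(x) @ u               # closed-loop step, Eq. (19)
            r1 = F(r)                          # assumed reference dynamics
            u1 = -Rinv @ g(x1).T @ (W.T @ phi(x1, r1))
            A1 = jac_fd(lambda z: f(z) + g(z) @ u1, x1)   # Eq. (15), u1 held fixed
            lam2 = W.T @ phi(x1, r1)
            Phi[:, j] = phi(x, r)
            Lam[:, j] = Q @ (x1 - r1) + A1.T @ lam2       # target, Eq. (18)
        W_ls = np.linalg.solve(Phi @ Phi.T, Phi @ Lam.T)  # Eq. (25)
        W = W + alpha * (W_ls - W)             # damped successive update
    return W
```

For a scalar linear plant x_{k+1} = 0.9 x_k + u_k with Q = R = 1 and the basis φ(x, r) = [x, r]^T, the learned weight W[0,0] converges to the corresponding LQR gain (≈ 0.538), and the resulting closed loop regulates the state to zero.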

Comment 1: If the reference signal is such that no control is needed after the system achieves perfect tracking, then the term u_k^T R u_k in the cost function goes to zero. If not, the cost function becomes unbounded. In this case, an alternative formulation is suggested as given below.

Use the discounted cost function

J = Σ_{k=0}^∞ γ^k [ (1/2)( (x_k − r_k)^T Q (x_k − r_k) + u_k^T R u_k ) ]    (28)

with 0 < γ < 1 being the discount factor. In this case, the costate equation (14) changes to

λ_k = Q(x_k − r_k) + γ A_k^T λ_{k+1}    (29)

and the optimal control equation (11) reads

u_k = −γ R^{-1} g(x_k)^T λ_{k+1}    (30)

A. Re-optimization of Tracking-SNAC

As seen in (20) and Fig. 1, the target calculation for training the Tracking-SNAC depends on the dynamics of the reference signal, i.e., F(.). In case this dynamics is not perfectly known, one may use the available/estimated dynamics for offline training of the Tracking-SNAC and then, in online implementation, re-optimize the network through re-training it. Note that in online implementation, where r_k is directly available, knowledge of the function F(.) is not needed, since the equation given below can be used for target calculation in online re-optimization:

λ^t_k = Q(x_k − r_k) + A_k^T W^T φ(x_k, r_k)    (31)

One then trains the network using the input-target pairs {(x_{k−1}, r_{k−1}), λ^t_k}. In this manner, one needs to start the re-optimization process after the first time index; then, using the current x_k and r_k along with x_{k−1} and r_{k−1}, the target λ^t_k can be calculated and the weights can be re-optimized.

The weight update law used for offline training was a batch training method, in which one trains the network based on q different input-target pairs. For the re-optimization, however, only one input-target pair is available at each instant, and the network should be trained based on it so that the subsequent pairs are generated using a network with more optimized weights; hence, the online training needs to be done using a sequential training method 31.
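The sequential (per-sample) update can be sketched as a single least-mean-squares gradient step on the target of Eq. (31). The paper only requires some sequential training method 31, so the LMS rule and the learning rate below are illustrative choices of this sketch:

```python
import numpy as np

def reoptimize_step(W, phi, Q, x_prev, r_prev, x, r, A, lr=0.05):
    """One online re-optimization step of the Tracking-SNAC.
    Target: Eq. (31), lambda^t_k = Q (x_k - r_k) + A_k^T W^T phi(x_k, r_k).
    The pair ((x_{k-1}, r_{k-1}), lambda^t_k) trains the network with one
    LMS gradient step on the squared training error (cf. Eq. (21))."""
    lam_t = Q @ (x - r) + A.T @ (W.T @ phi(x, r))
    feat = phi(x_prev, r_prev)
    err = W.T @ feat - lam_t
    return W - lr * np.outer(feat, err)
```

Each call uses only quantities available online at step k: the current and previous state and reference, plus the state Jacobian A_k; the unknown reference dynamics F(.) never appears.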

IV. Formation Control Using Tracking-SNAC

Using the developed Tracking-SNAC for the formation control problem, the reference signal is given by (5). However, the problem is that the dynamics of the reference signal is not known a priori for use in the training. The suggested solution is estimating this dynamics, in order to be able to propagate the randomly selected r_k to r_{k+1} as needed in the training process, and training the network based on that; then, in online implementation, the network is re-optimized to learn the reference signal's true dynamics. As explained in the previous section, one needs to know only r_k and r_{k−1} about the reference signal during the re-optimization phase at each time step k. In this approach, one uses the knowledge of x_k^j for j ∈ N_i at each time step k to calculate r_k^i using (5). Hence, the controller is decentralized in the sense that no knowledge of the states of non-neighbors is required.

Note that, even if the dynamics of the agents are similar, each agent needs its own Tracking-SNAC, to re-optimize it based on its observed reference signal, which differs from those of the other agents. In the case of identical agents, one may train one network offline and duplicate it for the different agents; then, each one re-optimizes its own copy.


V. Numerical Analysis

In this section, the problem of formation control of several spacecraft is selected and simulated to demonstrate the performance of the developed controller. The problem consists of four spacecraft, modeled as point masses in different orbits, for which a decentralized controller is required to bring them all to one location for docking.

A. Modeling

The equation of motion of a spacecraft in a gravity field, described in an inertial frame centered at the center of gravity, is 32

r̈ = −(μ/‖r‖^3) r + F    (32)

where r, ‖r‖, μ, and F denote the position vector of the spacecraft from the center of gravity, the magnitude of the vector r, the gravitational coefficient of the gravity field, and the vector of force exerted on the spacecraft per unit mass, respectively. Moreover, r̈ denotes the second time derivative of the vector r with respect to the inertial frame.

Selecting some reference length R and reference time τ, one can normalize the parameters by defining the normalized r and F, denoted by r̄ and F̄, respectively, as

r̄ ≜ r/R,  dr̄/dt̄ ≜ (τ/R) dr/dt,  F̄ ≜ (τ^2/R) F    (33)

Selecting the reference time τ = sqrt(R^3/μ) and normalizing (32) leads to

r̄̈ = −(1/‖r̄‖^3) r̄ + F̄    (34)

Representing the vector r̄ in the inertial frame by the three elements [x_1, x_2, x_3]^T and their rates by [x_4, x_5, x_6]^T ≜ [ẋ_1, ẋ_2, ẋ_3]^T, the state vector can be formed as

x = [x_1, x_2, x_3, x_4, x_5, x_6]^T    (35)

and the normalized control force per unit mass, represented in the inertial frame by F̄ = [u_1, u_2, u_3]^T, forms the control vector

u = [u_1, u_2, u_3]^T    (36)

Using (34), the state equation reads

ẋ = f̄(x) + ḡ(x) u(t)    (37)

where

f̄(x) = [ x_4,  x_5,  x_6,  −x_1/‖r̄‖^3,  −x_2/‖r̄‖^3,  −x_3/‖r̄‖^3 ]^T,    ḡ(x) = [ 0_{3×3} ; I_{3×3} ]    (38)

with 0_{3×3} stacked above the 3×3 identity, and

‖r̄‖ = sqrt(x_1^2 + x_2^2 + x_3^2)    (39)


Discretizing dynamics (37) using a small sampling time Δt, the discrete-time state equation given in (6) results, where

f(x_k) = x(kΔt) + Δt f̄(x(kΔt)),  g(x_k) = Δt ḡ(x(kΔt))    (40)
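The normalized dynamics of Eqs. (37)-(40) can be sketched as follows (positions and velocities in normalized units; the circular-orbit check in the usage below is an illustration of my own, not a case from the paper):

```python
import numpy as np

def f_bar(x):
    """Normalized two-body drift dynamics of Eq. (38); state is [pos; vel]."""
    pos, vel = x[:3], x[3:]
    rho = np.linalg.norm(pos)                 # Eq. (39)
    return np.concatenate([vel, -pos / rho**3])

def g_bar(x):
    """Input matrix of Eq. (38): thrust acts only on the velocity states."""
    return np.vstack([np.zeros((3, 3)), np.eye(3)])

def step(x, u, dt):
    """Euler discretization of Eq. (40): x_{k+1} = f(x_k) + g(x_k) u_k."""
    return x + dt * f_bar(x) + dt * (g_bar(x) @ u)
```

In normalized units a circular orbit of radius 1 has orbital speed 1, so a short uncontrolled propagation from x = [1, 0, 0, 0, 1, 0]^T should stay near radius 1.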

B. Numerical Results

Having modeled the dynamics of the spacecraft, four agents in the different orbits listed in Table 1 are selected. Assuming that at the beginning of the maneuver the spacecraft are all passing through the equatorial plane, moving from the southern hemisphere toward the northern hemisphere, the objective is defined as performing maneuvers to dock at the location of the first spacecraft. In other words, the first spacecraft keeps rotating along its orbit without any control force applied to it, and the rest are supposed to reach the orbiting location of spacecraft 1 and dock with it.

The communication topology between the spacecraft is assumed time-varying and is depicted in Fig. 2. Indexing the spacecraft from 1 to 4: during the first five time units, spacecraft 2 and 4 can communicate with spacecraft 1 and have direct knowledge of its location, while spacecraft 3 does not have access to spacecraft 1. During the next five time units, spacecraft 4 has access only to the location of spacecraft 3, which itself can only access the location of spacecraft 2, as seen in the figure; hence, the decentralized nature of the controller can be examined in this example.

Table 1: The specifications of the orbits of spacecraft 1 to 4.

Characteristic                            Orbit 1     Orbit 2     Orbit 3     Orbit 4
Semi-major axis                           9,000 km    11,000 km   13,000 km   15,000 km
Right ascension of the ascending node     20 deg      0 deg       40 deg      50 deg
Inclination                               20 deg      0 deg       40 deg      50 deg
Eccentricity                              0           0           0           0

Fig. 2. Graphs representing communication between the spacecraft: a) for the first five time units, b) for the next five time units, and c) for the rest of the simulation.

The reference length is selected as R = 10,000 km; hence, the reference time is τ = sqrt(R^3/μ) ≈ 1.58 × 10^3 s. With this normalization, each control unit corresponds to R/τ^2 ≈ 3.99 × 10^{-3} km/s^2. The initial conditions of the spacecraft, calculated from the orbital elements in Table 1, are denoted x_0^1 through x_0^4 (Eqs. (41)-(44)).

The weight matrices of the cost function are selected as diagonal matrices Q and R (Eqs. (45) and (46)).



The first three elements of Q correspond to the position error; by putting high weights on them, one emphasizes the minimization of the position error.

The Tracking-SNAC design and training is the next step. A small sampling time Δt is selected for discretizing the continuous dynamics (37). Denoting the basis-function inputs by x and r and their elements by x_i and r_i, i = 1, ..., 6, the vector φ(x, r) is selected to consist of the elements x_i^2 and r_i^2, i = 1, ..., 6, along with the terms x_1 x_2, x_1 x_3, x_2 x_3, x_4 x_5, x_4 x_6, x_5 x_6, and r_1 r_2, r_1 r_3, r_2 r_3, r_4 r_5, r_4 r_6, r_5 r_6; in other words, the squared input elements along with the pairwise products of the position elements with each other and of the velocity elements with each other, for the state vector and the reference signal separately. This leads to φ(x, r) ∈ R^24.

Note that the cost function (8) is finite for this problem, since no control is needed after spacecraft 2 to 4 reach the location of spacecraft 1 and keep orbiting with it along the orbit (see Comment 1). As discussed earlier, for the offline training an estimate of the dynamics of the reference signal r, i.e., F(.), is required. Denoting the elements of r by r_i, i = 1, ..., 6, in this study the following selection is made:

F([r_1, r_2, r_3, r_4, r_5, r_6]^T) = [r_1, r_2, r_3, r_4, r_5, r_6]^T + Δt [r_4, r_5, r_6, 0, 0, 0]^T    (47)

which means that the reference signal's position is propagated based on its velocity, which is assumed constant.
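The 24-element basis vector and the estimated reference dynamics of Eq. (47) can be sketched as below (a direct transcription of the construction described above; the helper names are my own):

```python
import numpy as np

def basis(x, r):
    """24-element basis phi(x, r): squares of all six state and reference
    elements, plus pairwise products of the position elements with each
    other and of the velocity elements with each other, for x and r
    separately."""
    def terms(v):
        p, w = v[:3], v[3:]
        cross = [p[0]*p[1], p[0]*p[2], p[1]*p[2],
                 w[0]*w[1], w[0]*w[2], w[1]*w[2]]
        return np.concatenate([v**2, cross])
    return np.concatenate([terms(x), terms(r)])   # shape (24,)

def F_hat(r, dt):
    """Estimated reference dynamics of Eq. (47): propagate the position
    elements with a constant-velocity assumption."""
    return r + dt * np.concatenate([r[3:], np.zeros(3)])
```

Since 6 squares plus 6 cross terms are built for the state and again for the reference, the output is 24-dimensional, matching φ(x, r) ∈ R^24 above.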

The training is done for 200 epochs, where in each epoch 50 random states and reference signals are selected to perform the least-squares technique given in (25). The weights of the Tracking-SNAC converged after around 70 iterations of Algorithm 1, as seen in Fig. 3, which shows the history of the weights versus the offline training iterations.
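The basis construction and the offline training loop described above can be sketched as follows. The costate target of Eq. (25) is not reproduced in this excerpt, so a placeholder `target` callable stands in for it, and the sampling ranges and the ordering of the basis terms are assumptions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def phi(x, r):
    """24-element quadratic basis per the text: squares of the six state and
    six reference elements, plus pairwise products of the position elements
    (1-3) and of the velocity elements (4-6), for x and r separately."""
    def terms(v):
        sq = [v[i] ** 2 for i in range(6)]
        pos = [v[i] * v[j] for i, j in combinations(range(3), 2)]
        vel = [v[i] * v[j] for i, j in combinations(range(3, 6), 2)]
        return sq + pos + vel
    return np.array(terms(x) + terms(r))

def train_snac(target, n_iter=200, batch=50):
    """Offline training loop: each iteration draws 50 random states and
    reference signals and refits the weights by batch least squares.
    `target(x, r, W)` is a stand-in for the target computation of Eq. (25)."""
    W = np.zeros((24, 6))
    for _ in range(n_iter):
        X = rng.uniform(-1.0, 1.0, (batch, 6))          # random training states
        R = rng.uniform(-1.0, 1.0, (batch, 6))          # random reference signals
        Phi = np.array([phi(x, r) for x, r in zip(X, R)])       # batch x 24
        Y = np.array([target(x, r, W) for x, r in zip(X, R)])   # batch x 6
        W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)     # least-squares refit
    return W
```

The repeated refitting of W from fresh random batches mirrors the successive-approximation character of Algorithm 1: the targets themselves depend on the current weights, so the loop runs until the weights stop changing.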

Once trained, the neurocontroller was duplicated for use on the different spacecraft and the simulation was run for 30 time units. The position histories resulting from simulating the proposed controller are depicted in Fig. 4. As seen, spacecraft 2 to 4, starting at different locations, have converged to the location of spacecraft 1 and kept orbiting along with it. This is confirmed by Fig. 5 as well, which shows the errors between the positions of spacecraft 2 to 4 and the position of spacecraft 1. Convergence of the errors to zero corresponds to the desired formation.

Fig. 6 shows the history of the network weights versus time during the online re-optimization. As expected, the weights initially change rapidly and then converge to final values. These changes result from re-optimizing the network to give optimal results based on the actual dynamics of the reference signal.
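One way to picture such an online weight update is a gradient step toward targets computed from the actually observed reference. The paper's update rule is the retraining of Algorithm 1 and is not reproduced in this excerpt, so the plain gradient step below is only an illustrative stand-in.

```python
import numpy as np

def reoptimize_step(W, phi_vec, target_vec, lr=0.01):
    """One schematic online re-optimization step: nudge the weights so the
    network output W.T @ phi_vec moves toward the target computed from the
    actual observed reference dynamics. A plain gradient step on the squared
    output error is used here in place of the paper's retraining rule."""
    err = W.T @ phi_vec - target_vec      # output error, length 6
    W = W - lr * np.outer(phi_vec, err)   # gradient of 0.5 * ||err||^2 w.r.t. W
    return W
```

Repeating this step as new reference measurements arrive produces exactly the behavior seen in Fig. 6: large early corrections while the network unlearns the assumed constant-velocity model, then settling as the actual dynamics are captured.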

In order to analyze the effect of the re-optimization of the Tracking-SNAC, a separate simulation was performed using the same offline-trained network without re-optimization. The position error histories, with respect to the position of spacecraft 1, are depicted in Fig. 7. As seen in this figure, even though the controller decreases the initially large errors, the errors do not approach zero; hence, the Tracking-SNAC without re-optimization cannot make the spacecraft perfectly track the desired signal and achieve the desired formation. This illustrates the suboptimality of the offline-trained networks when tracking a signal whose dynamics differ from those used in training, and shows the need for the online re-optimization.

VI. Conclusions

The problem of formation control of a network of agents with heterogeneous nonlinear dynamics was converted to a tracking problem, and the developed Tracking-SNAC controller was shown to solve it through offline training based on an estimated dynamics of the reference signal, followed by online re-optimization of the network to learn the actual dynamics of the reference signal. The simulation results indicate that the proposed technique holds great promise.

Acknowledgement

This research was partially supported by a grant from the National Science Foundation.


Fig. 3. History of weights versus the training iterations for the offline training phase.

Fig. 4. Histories of the positions of the spacecraft in different axes.

Spacecraft 1 to 4 are denoted by a_i, i = 1, 2, 3, 4.



Fig. 5. Histories of the errors in the positions of the spacecraft in different axes.

Spacecraft 2 to 4 are denoted by a_i, i = 2, 3, 4.

Fig. 6. History of weights versus time for the online re-optimization.



Fig. 7. Histories of the errors in the positions of the spacecraft in different axes without re-optimization of the Tracking-SNACs. Spacecraft 2 to 4 are denoted by a_i, i = 2, 3, 4.

References

1. J. Lawton, R. Beard, and B. Young, "A Decentralized Approach to Formation Maneuvers," IEEE Transactions on Robotics and Automation, Vol. 19, No. 6, 2003.
2. A. Fax and R. Murray, "Information Flow and Cooperative Control of Vehicle Formations," IEEE Transactions on Automatic Control, Vol. 49, No. 9, 2004.
3. G. Lafferriere, A. Williams, J. Caughman, and J.J.P. Veerman, "Decentralized control of vehicle formations," Systems and Control Letters, Vol. 54, 2005.
4. Y.C. Cao and W. Ren, "Optimal Linear Consensus Algorithm: an LQR Perspective," IEEE Trans. on Systems, Man, and Cybernetics, Part B, Vol. 40, 2010, pp. 819–830.
5. J. Wang and M. Xin, "Multi-agent Consensus Algorithm with Obstacle Avoidance via Optimal Control Approach," Proc. American Control Conference, San Francisco, 2011, pp. 2783–2788.
6. R. Olfati-Saber and R. Murray, "Consensus protocols for networks of dynamic agents," Proc. American Control Conference, 2003, pp. 951–956.
7. X.H. Wang, V. Yadav, and S.N. Balakrishnan, "Cooperative UAV Formation Flying with Obstacle/Collision Avoidance," IEEE Trans. on Control Systems Technology, Vol. 15, 2007, pp. 672–679.
8. S. Zhang and G. Duan, "Consensus seeking in multiagent cooperative control systems with bounded control input," Journal of Control Theory and Applications, Vol. 9, No. 2, 2011, pp. 210–214.
9. W. He and J. Cao, "Consensus control for high-order multi-agent systems," IET Control Theory and Applications, Vol. 5, No. 1, 2011, pp. 231–238.
10. J. Seo, H. Shima, and J. Back, "Consensus of high-order linear systems using dynamic output feedback compensator: Low gain approach," Automatica, Vol. 45, 2009, pp. 2659–2664.



11. F. Xiao and L. Wang, "Consensus problems for high-dimensional multi-agent systems," IET Control Theory and Applications, Vol. 1, No. 3, 2007, pp. 830–837.
12. J. Wang, D. Cheng, and X. Hu, "Consensus of Multi-Agent Linear Dynamic Systems," Asian Journal of Control, Vol. 10, No. 2, 2008, pp. 144–155.
13. G. Zhai, S. Okuno, J. Imae, and T. Kobayashi, "A Matrix Inequality Based Design Method for Consensus Problems in Multi-Agent Systems," Int. J. Appl. Math. Comput. Sci., Vol. 19, No. 4, 2009, pp. 639–646.
14. R. Olfati-Saber and R. Murray, "Consensus problems in networks of agents with switching topology and time-delays," IEEE Transactions on Automatic Control, Vol. 49, No. 9, 2004, pp. 1520–1533.
15. R.A. Horn and C.R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1994.
16. C. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2001.
17. C. Chen, Linear System Theory and Design, Oxford University Press, USA, 1999.
18. A. Al-Tamimi, F.L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof," IEEE Trans. Syst., Man, Cybern. B, Cybern., Vol. 38, 2008, pp. 943–949.
19. L. Jian-Zhen, "Reference model based consensus control of second-order multi-agent systems," Chinese Physics B, Vol. 20, No. 2, 2011.
20. D.V. Prokhorov and D.C. Wunsch, "Adaptive Critic Designs," IEEE Trans. Neural Networks, Vol. 8, No. 5, 1997, pp. 997–1007.
21. A. Das and F.L. Lewis, "Cooperative adaptive control for synchronization of second-order systems with unknown nonlinearities," Int. J. Robust Nonlinear Control, Vol. 21, 2011, pp. 1509–1524.
22. W. Ren and R. Beard, "Formation feedback control for multiple spacecraft via virtual structures," IEE Proc. Control Theory Appl., Vol. 151, No. 3, 2004, pp. 357–368.
23. W. Ren and R. Beard, "Decentralized Scheme for Spacecraft Formation Flying via the Virtual Structure Approach," Journal of Guidance, Control, and Dynamics, Vol. 27, No. 1, 2004, pp. 73–82.
24. H. Zhang, Q. Wei, and Y. Luo, "A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm," IEEE Trans. Syst., Man, Cybern. B, Cybern., Vol. 38, No. 4, 2008, pp. 937–942.
25. T. Dierks and S. Jagannathan, "Online optimal control of nonlinear discrete-time systems using approximate dynamic programming," Journal of Control Theory and Applications, Vol. 9, No. 3, 2011, pp. 361–369.
26. Q. Yang and S. Jagannathan, "Near Optimal Neural Network-based Output Feedback Control of Affine Nonlinear Discrete-Time Systems," Proc. IEEE Int. Symp. on Intelligent Control, 2007, pp. 578–583.
27. J. Yao, X. Liu, and X. Zhu, "Asymptotically Stable Adaptive Critic Design for Uncertain Nonlinear Systems," Proc. American Control Conference, 2009, pp. 5156–5161.
28. L. Yang, J. Si, K. Tsakalis, and A. Rodriguez, "Direct Heuristic Dynamic Programming for Nonlinear Tracking Control With Filtered Tracking Error," IEEE Trans. Syst., Man, Cybern. B, Cybern., Vol. 39, No. 6, 2009, pp. 1617–1622.
29. S. Bhasin, N. Sharma, P. Patre, and W. Dixon, "Asymptotic tracking by a reinforcement learning-based adaptive critic controller," Journal of Control Theory and Applications, Vol. 9, No. 3, 2011, pp. 400–409.
30. R. Padhi, N. Unnikrishnan, X. Wang, and S.N. Balakrishnan, "A Single Network Adaptive Critic (SNAC) Architecture for Optimal Control Synthesis for a Class of Nonlinear Systems," Neural Networks, Vol. 19, 2006, pp. 1648–1660.
31. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1998.
32. M.J. Sidi, Spacecraft Dynamics and Control: A Practical Engineering Approach, Cambridge University Press, 2000.