American Institute of Aeronautics and Astronautics
An Optimal Tracking Approach to Formation Control of
Nonlinear Multi-Agent Systems
Ali Heydari1 and S. N. Balakrishnan2
Missouri University of Science and Technology, Rolla, MO, 65409
Formation control of a network of multi-agent systems with heterogeneous nonlinear dynamics is formulated as an optimal tracking problem, and a decentralized controller is developed in the framework of 'adaptive critics' to solve the resulting optimal control problem. The reference signal is assumed to be available only during online implementation, so its dynamics are unavailable for offline training of the neurocontroller. This issue is resolved by re-optimizing the network output through online retraining of the neurocontroller. Finally, the developed controller is applied to the formation control of multiple spacecraft orbiting the Earth in different orbits and seeking consensus on their positions in order to dock.
I. Introduction
THE formation control of a network of multi-agent systems with heterogeneous nonlinear dynamics has many applications and benefits compared to single-agent systems; designing a decentralized controller for such a network of agents, however, is a challenging problem. With a decentralized controller, each agent has access only to its neighbors, while the mission for the whole system is to achieve a desired formation.
Solutions for networks of linear agents have been developed by several authors. A method developed in Ref. 1, called behavior-based decentralized control, performs multi-agent control for a network of agents with a ring communication topology by considering two desired behaviors: formation keeping and goal seeking. The role of the communication topology in the formation's stability is shown in Ref. 2 through the appearance of the eigenvalues of the Laplacian matrix of the communication graph in the dynamics of the formation. In Ref. 3 as well, the stability of the formation/consensus is shown to be related to the eigenvalues of the Laplacian matrix, and for the double-integrator system the authors of Ref. 3 designed a controller based on the eigenvalues of this matrix. Minimization of a performance index to reach consensus is the approach used in Refs. 4 and 5 for single-integrator and double-integrator dynamics, respectively. Many of the methods developed in the literature are applicable only to single- or double-integrator dynamics (Refs. 1, 3-8). In some other papers, the consensus protocol is developed for higher-order dynamics but is still limited to a multi-integrator structure, i.e., the linear Brunovsky canonical form, for example in Ref. 9.
In practice, the dynamics of the agents can be much more complicated; hence, developing a consensus protocol which guarantees consensus for agents with general linear dynamics is important. Studies in Refs. 10-12 address this point by decomposing the eigenvalues of the Laplacian matrix (Ref. 3) and designing control gains such that the agents' closed-loop dynamics are stable. A different approach is taken in Ref. 13, which serves the same goal by solving a linear matrix inequality that contains the Laplacian matrix. Despite the promising advantages of these methods, they rely on the communication topology; consequently, once this topology changes, the controller's stability is no longer guaranteed. In reality, the topology may change due to a communication link failure or the creation of a new link, which changes the Laplacian and its eigenvalues. Even if the switching topology always has a rooted spanning tree (Ref. 3), the eigenvalues change, and the controllers designed through Refs. 10-13 may fail to reach consensus.
The issue of switching topology has been discussed by some researchers in the literature (Refs. 12, 14). For single-integrator dynamics, the authors of Ref. 14 analyzed the effect of switching directed topologies, and for general linear dynamics, the authors of Ref. 12 developed a method to remedy the problem of dependency on the
1 Ph.D. Student, Dept. of Mechanical and Aerospace Eng., 400 W. 13th St., Rolla, MO 65409, [email protected], AIAA Student Member.
2 Curators' Professor of Aerospace Engineering, Dept. of Mechanical and Aerospace Eng., 400 W. 13th St., Rolla, MO 65409, [email protected], AIAA Associate Fellow.
AIAA Guidance, Navigation, and Control Conference, 13-16 August 2012, Minneapolis, Minnesota
AIAA 2012-4694
Copyright © 2012 by Ali Heydari & S. N. Balakrishnan. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.
Laplacian eigenvalues. In Ref. 12, all possible network topologies for a fixed number of agents are enumerated, and the eigenvalues of all the resulting Laplacians are calculated and used in the controller design to stabilize the system under all of those topologies. As the number of agents grows, the computation required by this technique, namely calculating the eigenvalues of the Laplacians of all possible network topologies, grows rapidly.
Despite the numerous developments for networks of linear agents, designing a decentralized control law for networks of nonlinear heterogeneous agents is still an open problem in the literature. The methods developed in this regard mainly include feedback-linearization-based approaches (Refs. 1, 21) and the virtual-structure scheme (Ref. 22). In the feedback-linearization-based papers, the nonlinearity is cancelled through a suitable choice of control, yielding a network of linear agents. In a virtual-structure design, as proposed in Ref. 22, one node is responsible for generating desired signals for the different agents to track; hence, the control is centralized in the sense that all agents must be able to communicate with that node/agent. For a specific communication topology, Ref. 23 modifies the virtual-structure scheme to make it decentralized for agents with particular dynamics.
This paper's contribution is the development of an 'adaptive critic' based neurocontroller (Ref. 20) for solving the optimal tracking problem, and its use, along with online re-optimization, to track a signal whose value is available to the agent at each instant but whose evolution is not known a priori. Using this scheme, the problem of formation control of heterogeneous agents can be solved by defining the average of the states of the neighbors as the reference signal to be tracked. The neurocontroller is first trained using a reinforcement learning scheme based on estimated dynamics for the reference signal. Next, through online re-optimization, the controller is re-trained to be optimized for the true dynamics of the reference signal. A nice feature of this controller is its robustness to a time-varying communication topology.
Solving optimal tracking problems for nonlinear systems using adaptive critics has been investigated in Refs. 24-29. In Ref. 24 the authors developed a tracking controller for systems whose input gain matrix, i.e., the matrix $g(\cdot)$ in state equation (1), is invertible:

$x_{k+1} = f(x_k) + g(x_k)u_k$ (1)

In Ref. 25 the reference signal is limited to those which satisfy the dynamics of the system; i.e., denoting the reference signal by $r$, it needs to satisfy $r_{k+1} = f(r_k) + g(r_k)u_k^d$, where $u_k^d$ denotes the desired control and the state equation is given by (1). The developments in Refs. 26-29 solve the tracking problem for systems in nonlinear Brunovsky canonical form.
The neurocontroller developed in this study is called the Tracking Single Network Adaptive Critic (Tracking-SNAC); it solves the optimal tracking problem for nonlinear control-affine dynamics, tracking a signal whose dynamics can be nonlinear and arbitrary.
The rest of the paper is organized as follows: after the problem formulation in Section II, a neurocontroller, called Tracking-SNAC, is developed for infinite-horizon optimal tracking of a reference signal in Section III and is used for the formation control of heterogeneous nonlinear agents in Section IV. Numerical results are provided in Section V, followed by conclusions in Section VI.
II. Problem Formulation
Assume a network of $N$ agents whose heterogeneous discrete-time nonlinear dynamics are described by

$x_{k+1}^i = f^i(x_k^i) + g^i(x_k^i)u_k^i, \quad k = 0, 1, \ldots, \quad i = 1, 2, \ldots, N$ (2)

with initial conditions $x_0^i$, where $x_k^i \in \mathbb{R}^n$ and $u_k^i \in \mathbb{R}^m$ are the state and control vectors. The subscripts denote the time index and the superscripts denote the agent index. Functions $f^i: \mathbb{R}^n \to \mathbb{R}^n$ and $g^i: \mathbb{R}^n \to \mathbb{R}^{n \times m}$ are the system dynamics of the $i$th agent. Positive integers $n$ and $m$ denote the dimensions of the state space and the input, respectively. The objective is to design $N$ decentralized controls $u^i(\cdot)$ such that they guarantee the convergence of the agents to a consensus/formation, i.e.,

$\lim_{k \to \infty}\left(x_k^i - x_k^j\right) = 0, \quad \forall i, \forall j$ (3)

The selected approach here is to formulate the problem as a tracking problem and solve it using optimal control tools. Assume the cost function
$J^i = \sum_{k=0}^{\infty} \frac{1}{2}\left[(x_k^i - r_k^i)^T Q (x_k^i - r_k^i) + u_k^{iT} R u_k^i\right], \quad i = 1, 2, \ldots, N$ (4)

$r_k^i \triangleq \frac{1}{|\bar{N}_i|}\sum_{j \in \bar{N}_i} x_k^j$ (5)

where $\bar{N}_i$ denotes the set of neighbors of agent $i$ and $|\bar{N}_i|$ denotes its number of neighbors. Matrices $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are weighting matrices for the state errors and the control effort, respectively. The matrix $Q$ should be positive definite or positive semi-definite, while $R$ has to be positive definite. Optimizing $J^i$ with positive definite $Q$ for the different $i$'s leads to (3); hence, the consensus problem converts to a tracking problem. In solving the tracking problem, one needs to note that, for the solution to be decentralized, the controller has access only to $x_k^j$ for $j \in \bar{N}_i$, as well as $x_k^i$, for calculating each $u_k^i$. In this paper, the framework of adaptive critics is selected for solving the optimal tracking problem of the discrete-time nonlinear system.
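To make the neighbor-average reference of Eq. (5) concrete, it can be sketched in a few lines of code; the function and variable names below are illustrative, not from the paper:

```python
import numpy as np

# Reference signal of Eq. (5): the average of the neighbors' states.
def reference_signal(states, neighbors, i):
    """r^i_k = (1/|N_i|) * sum over j in N_i of x^j_k."""
    nbrs = neighbors[i]
    return sum(states[j] for j in nbrs) / len(nbrs)

# Toy example: three agents with scalar states on the line graph 1 - 2 - 3.
states = {1: np.array([0.0]), 2: np.array([1.0]), 3: np.array([4.0])}
neighbors = {1: [2], 2: [1, 3], 3: [2]}
r2 = reference_signal(states, neighbors, 2)   # average of agents 1 and 3
```

Each agent evaluates this average using only its neighbors' states, which is what makes the resulting controller decentralized.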
III. Tracking-SNAC
Consider the nonlinear discrete-time input-affine system

$x_{k+1} = f(x_k) + g(x_k)u_k, \quad k = 0, 1, \ldots$ (6)

where $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}^m$ denote the state and control vectors at time $k$, respectively; $f: \mathbb{R}^n \to \mathbb{R}^n$ and $g: \mathbb{R}^n \to \mathbb{R}^{n \times m}$ are the system dynamics, and the initial state is given by $x_0$. Given the reference signal $r_k \in \mathbb{R}^n$ with the dynamics

$r_{k+1} = F(r_k), \quad k = 0, 1, \ldots$ (7)

where the reference signal is propagated by the dynamics $F: \mathbb{R}^n \to \mathbb{R}^n$ with the given initial value $r_0$, the objective is to select a control history $u_k$, $k = 0, 1, \ldots$, such that the cost function below is minimized:

$J = \sum_{k=0}^{\infty} \frac{1}{2}\left[(x_k - r_k)^T Q (x_k - r_k) + u_k^T R u_k\right]$ (8)

Denoting the cost-to-go at instant $k$ by $J_k(x_k, r_k)$, because of its dependence on the current state $x_k$ and the current reference signal $r_k$, the optimal solution, superscripted by $*$, is given by the HJB equation below:

$J^*(x_k, r_k) = \min_{u_k}\left\{\frac{1}{2}\left[(x_k - r_k)^T Q (x_k - r_k) + u_k^T R u_k\right] + J^*(x_{k+1}, r_{k+1})\right\}, \quad k = 0, 1, \ldots$ (9)

$u_k^* = \arg\min_{u_k}\left\{\frac{1}{2}\left[(x_k - r_k)^T Q (x_k - r_k) + u_k^T R u_k\right] + J^*(x_{k+1}, r_{k+1})\right\}$ (10)

Define the costate vector as $\lambda_k \triangleq \partial J_k/\partial x_k$, where $J_k(x_k, r_k)$ is denoted by $J_k$, to get

$u_k^* = -R^{-1}g(x_k)^T \frac{\partial J_{k+1}}{\partial x_{k+1}} = -R^{-1}g(x_k)^T \lambda_{k+1}$ (11)

Replacing $u_k$ in (9) by $u_k^*$, the HJB equation reads

$J^*(x_k, r_k) = \frac{1}{2}\left[(x_k - r_k)^T Q (x_k - r_k) + u_k^{*T} R u_k^*\right] + J^*(x_{k+1}, r_{k+1})$ (12)

The costate equation can be derived by taking the derivative of both sides of (12) with respect to $x_k$:

$\lambda_k = Q(x_k - r_k) + A_k^T \lambda_{k+1}$ (13)
or equivalently
$\lambda_{k+1} = Q(x_{k+1} - r_{k+1}) + A_{k+1}^T \lambda_{k+2}$ (14)

where

$A_{k+1} \triangleq \frac{\partial x_{k+2}}{\partial x_{k+1}} = \frac{\partial\left(f(x_{k+1}) + g(x_{k+1})u_{k+1}\right)}{\partial x_{k+1}}$ (15)

Using the SNAC scheme (Ref. 30), a neural network (NN) is selected to generate $\lambda_{k+1}$. From (14), the costate vector $\lambda_{k+1}$ is observed to be a function not only of the state $x$ but also of the reference signal $r$; hence, the inputs to the network are selected to be $x_k$ and $r_k$. Denoting the NN mapping by $N(\cdot, \cdot)$, one has

$\lambda_{k+1} = N(x_k, r_k)$ (16)

Selecting a network structure linear in the tunable weights, one has

$\lambda_{k+1} = W^T \phi(x_k, r_k)$ (17)

where $W \in \mathbb{R}^{p \times n}$ denotes the network weight matrix and $\phi: \mathbb{R}^{2n} \to \mathbb{R}^p$ is composed of $p$ linearly independent scalar basis functions.
Considering (14), the network training target, denoted by $\lambda^t$, can be calculated using the following equation:

$\lambda_{k+1}^t = Q(x_{k+1} - r_{k+1}) + A_{k+1}^T \lambda_{k+2}, \quad k = 0, 1, \ldots$ (18)

where, in the training process, $\lambda_{k+2}$ on the right-hand side of (18) is substituted by $W^T \phi(x_{k+1}, r_{k+1})$. Noting that the closed-loop dynamics, using (6), (11), and (17), are given by

$x_{k+1} = f(x_k) - g(x_k)R^{-1}g(x_k)^T W^T \phi(x_k, r_k)$ (19)

equation (18) becomes

$\lambda_{k+1}^t = Q\left(f(x_k) - g(x_k)R^{-1}g(x_k)^T W^T \phi(x_k, r_k) - F(r_k)\right) + A_{k+1}^T W^T \phi\left(f(x_k) - g(x_k)R^{-1}g(x_k)^T W^T \phi(x_k, r_k),\, F(r_k)\right)$ (20)

Note that in (20), which is supposed to generate $\lambda_{k+1}^t$ for training the weights, the right-hand side depends on $W$; hence, the calculated target $\lambda_{k+1}^t$ is itself a function of the weights, and the optimal target needs to be obtained through a successive approximation scheme, i.e., reinforcement learning. Defining the training error as

$e_k \triangleq \lambda_{k+1} - \lambda_{k+1}^t = W^T \phi(x_k, r_k) - \lambda_{k+1}^t$ (21)
the iterative process of learning $W$ can be summarized in the algorithm below.

Algorithm 1:
1. Randomly select a state vector $x_k$ and a reference signal $r_k$ belonging to the selected domain of interest.
2. Through the process shown in Fig. 1 and given by equation (20), calculate $\lambda_{k+1}^t$.
3. Train the network weights $W$ using the input-target pair $\{(x_k, r_k), \lambda_{k+1}^t\}$.
4. Calculate the training error $e_k$ using (21).
5. Repeat steps 1 to 4 until the error $e_k$ converges to zero or a small value for the different $x_k$'s and $r_k$'s selected in step 1.

Having the input-target pair $\{(x_k, r_k), \lambda_{k+1}^t\}$ calculated, the network can be trained using any training method (Ref. 31). The selected training law in this study is the least squares method. Assume that, in each iteration of Algorithm 1, instead of one random state and reference signal, $q$ random states and reference signals, denoted by $x^{[j]}$ and $r^{[j]}$, $j = 1, 2, \ldots, q$, are selected. Denoting the training target calculated using $x^{[j]}$ and $r^{[j]}$ by $\lambda^t(x^{[j]}, r^{[j]})$, the objective is to find $W$ such that it solves
Fig. 1. Tracking-SNAC training diagram.
$W^T \phi(x^{[j]}, r^{[j]}) = \lambda^t(x^{[j]}, r^{[j]}), \quad j = 1, 2, \ldots, q$ (22)

Define

$\Phi \triangleq \left[\phi(x^{[1]}, r^{[1]}),\ \phi(x^{[2]}, r^{[2]}),\ \ldots,\ \phi(x^{[q]}, r^{[q]})\right]$ (23)

$\Lambda \triangleq \left[\lambda^t(x^{[1]}, r^{[1]}),\ \lambda^t(x^{[2]}, r^{[2]}),\ \ldots,\ \lambda^t(x^{[q]}, r^{[q]})\right]$ (24)

Using least squares, the solution to the system of linear equations (22) is given by

$W = (\Phi\Phi^T)^{-1}\Phi\Lambda^T$ (25)

Note that, for the inverse of the matrix $(\Phi\Phi^T)$ to exist, the basis functions $\phi$ need to be linearly independent and the number of random samples $q$ must be greater than or equal to the number of neurons, $p$.

Though (25) looks like a one-shot solution for the ideal NN weights, the training is an iterative process which requires selecting different random states and reference signals and updating the weights through solving (25) successively. Note that $\Lambda$ used in the weight update (25), as explained earlier, is not the true optimal costate but a function of the current estimate of the ideal unknown weights, i.e., $\Lambda = \Lambda(W)$. Denoting the weights at the $i$th epoch of the weight update by $W_i$, the iterative procedure is given by

$W_{i+1} = (\Phi\Phi^T)^{-1}\Phi\Lambda(W_i)^T$ (26)
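The successive least-squares iteration can be exercised on a toy problem. The sketch below is illustrative only: it uses a hypothetical scalar linear plant $x_{k+1} = ax_k + bu_k$ with reference dynamics $r_{k+1} = ar_k$ (so the reference can be tracked with zero control and the undiscounted cost stays finite), a two-element linear basis (which is exact for this linear-quadratic case), and a relaxation factor `alpha` that is not part of the paper's update but damps the toy iteration numerically:

```python
import numpy as np

# Toy scalar plant x+ = a*x + b*u, reference r+ = a*r (finite undiscounted cost).
a, b = 0.9, 1.0
Q, R = 1.0, 1.0
rng = np.random.default_rng(0)

def phi(x, r):
    """Two-element linear basis; exact for this linear-quadratic toy problem."""
    return np.array([x, r])

W = np.zeros(2)      # costate weights: lambda_{k+1} = W @ phi(x_k, r_k)
alpha = 0.5          # relaxation factor (not in the paper) to damp the iteration

for _ in range(200):                       # epochs of Algorithm 1
    xs = rng.uniform(-1.0, 1.0, 50)        # 50 random states and references
    rs = rng.uniform(-1.0, 1.0, 50)
    Phi, Lam = [], []
    for x, r in zip(xs, rs):
        u = -(1.0 / R) * b * (W @ phi(x, r))      # optimal control form
        x1, r1 = a * x + b * u, a * r             # propagate plant and reference
        lam2 = W @ phi(x1, r1)                    # lambda_{k+2} from the network
        target = Q * (x1 - r1) + a * lam2         # costate target; here A = a
        Phi.append(phi(x, r))
        Lam.append(target)
    Phi = np.array(Phi).T                         # p x q regressor matrix
    W_ls = np.linalg.solve(Phi @ Phi.T, Phi @ np.array(Lam))   # least-squares fit
    W_new = (1 - alpha) * W + alpha * W_ls        # damped weight update
    converged = np.max(np.abs(W_new - W)) < 1e-10
    W = W_new
    if converged:
        break

# Closed-loop check: the tracking error x - r should contract geometrically.
x, r = 1.0, -0.5
for _ in range(30):
    u = -(1.0 / R) * b * (W @ phi(x, r))
    x, r = a * x + b * u, a * r
```

For these numbers the converged weights are approximately $[0.538, -0.538]$, i.e., the learned control is roughly $u_k \approx -0.538(x_k - r_k)$, and the closed-loop tracking error shrinks by about a factor of 0.36 per step.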
Upon convergence of the weights, one has

$\lim_{i \to \infty} W_i = W^*$ (27)

where $W^*$ denotes the optimal NN weights in the sense that they generate the optimal costate vector $\lambda_{k+1}^*$. For this purpose, one starts with an initial weight $W_0$ and iterates through (26) until the weights converge. The initial weights can be set to zero or selected based on the linearized solution of the given nonlinear system.

Comment 1: If the reference signal is such that no control is needed after the system achieves perfect tracking, then the term $u_k^T R u_k$ in the cost function becomes zero. If not, the cost function becomes unbounded. In that case, an alternative formulation is suggested: use the discounted cost function

$J = \sum_{k=0}^{\infty} \gamma^k \frac{1}{2}\left[(x_k - r_k)^T Q (x_k - r_k) + u_k^T R u_k\right]$ (28)

with $0 < \gamma < 1$ being the discount factor. In this case, the costate equation (14) changes to

$\lambda_k = Q(x_k - r_k) + \gamma A_k^T \lambda_{k+1}$ (29)

and the optimal control equation (11) reads

$u_k = -\gamma R^{-1} g(x_k)^T \lambda_{k+1}$ (30)
A. Re-optimization of Tracking-SNAC
As seen in (20) and Fig. 1, the target calculation for training the Tracking-SNAC depends on the dynamics of the reference signal, i.e., $F(\cdot)$. In case these dynamics are not perfectly known, one may use the available/estimated dynamics for offline training of the Tracking-SNAC and then, in online implementation, re-optimize the network by re-training it. Note that in online implementation, where $r_k$ is directly available, knowledge of the function $F(\cdot)$ is not needed, since the equation below can be used for target calculation in online re-optimization:

$\lambda_k^t = Q(x_k - r_k) + A_k^T W^T \phi(x_k, r_k)$ (31)

One then trains the network using the input-target pairs $\{(x_{k-1}, r_{k-1}), \lambda_k^t\}$. In this manner, the re-optimization process starts after the first time index; then, using the current $x_k$ and $r_k$, along with $x_{k-1}$ and $r_{k-1}$, the target $\lambda_k^t$ can be calculated and the weights re-optimized.

The weight update law used for offline training was a batch method, in which the network is trained on $q$ different input-target pairs. For the re-optimization, however, only one input-target pair is available at each instant, and the network should be trained on it so that subsequent pairs are generated by a network with more optimized weights; hence, the online training needs to be done using a sequential training method (Ref. 31).
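The sequential re-training based on Eq. (31) can likewise be sketched. The code below is one possible realization of a sequential training method, using a standard recursive-least-squares (RLS) update; the scalar plant, the initial weights, and the "true" reference factor of 0.95 are hypothetical illustration values, not from the paper:

```python
import numpy as np

# Sequential re-training of the costate weights using the target of Eq. (31).
# The offline weights W were trained for one reference model, while the observed
# reference decays with a different factor (0.95) unknown to the designer.
a, b, Q, R = 0.9, 1.0, 1.0, 1.0

def phi(x, r):
    return np.array([x, r])

W = np.array([0.54, -0.54])   # illustrative offline-trained weights
P = 100.0 * np.eye(2)         # RLS covariance; a large value -> fast adaptation

x, r = 1.0, -0.5
x_prev, r_prev = x, r
for k in range(100):
    if k > 0:                 # re-optimization starts after the first time index
        # Target for lambda_k from observed quantities only, Eq. (31); A_k = a.
        target = Q * (x - r) + a * (W @ phi(x, r))
        z = phi(x_prev, r_prev)               # training input (x_{k-1}, r_{k-1})
        gain = (P @ z) / (1.0 + z @ P @ z)    # standard RLS gain
        W = W + gain * (target - W @ z)       # sequential weight update
        P = P - np.outer(gain, z @ P)         # covariance update
    u = -(1.0 / R) * b * (W @ phi(x, r))      # control with re-optimized weights
    x_prev, r_prev = x, r
    x, r = a * x + b * u, 0.95 * r            # true reference dynamics
```

Each step consumes exactly one input-target pair, which is what the batch update (25) cannot do; any other sequential training law could be substituted for the RLS step.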
IV. Formation Control Using Tracking-SNAC
Using the developed Tracking-SNAC for the formation control problem, the reference signal is given by (5). However, the dynamics of this reference signal are not known a priori for use in training. The suggested solution is to estimate the dynamics, in order to be able to propagate a randomly selected $r_k$ to $r_{k+1}$ as needed in the training process, train the network based on that estimate, and then, in online implementation, re-optimize the network to learn the reference signal's true dynamics. As explained in the previous section, only $r_k$ and $r_{k-1}$ need to be known about the reference signal during the re-optimization phase at each time step $k$. In this approach, one uses the knowledge of $x_k^j$ for $j \in \bar{N}_i$ at each time step $k$ to calculate $r_k^i$ using (5). Hence, the controller is decentralized in the sense that no knowledge of the states of non-neighbors is required.
Note that, even if the dynamics of the agents are similar, each agent needs its own Tracking-SNAC, re-optimized based on its observed reference signal, which differs from those of the others. In the case of identical agents, one may train one network offline and duplicate it for the different agents; each one then re-optimizes its own copy.
V. Numerical Analysis
In this section, the problem of formation control of several spacecraft is simulated to demonstrate the performance of the developed controller. The problem consists of four spacecraft, modeled as point masses in different orbits, for which a decentralized controller is required to bring them all to one location for docking.
A. Modeling
The equation of motion of a spacecraft in a gravity field, described in an inertial frame centered at the center of gravity, is (Ref. 32)

$\ddot{X} + \frac{\mu}{|X|^3}X = F$ (32)

where $X$, $|X|$, $\mu$, and $F$ denote the position vector of the spacecraft from the center of gravity, the magnitude of the vector $X$, the gravitational coefficient of the gravity field, and the force exerted on the spacecraft per unit mass, respectively. Moreover, $\ddot{X}$ denotes the second time derivative of the vector $X$ with respect to the inertial frame.
Selecting some reference length $R_0$ and reference time $T_0$, one can normalize the parameters by defining the normalized $X$ and $F$, denoted by $\bar{X}$ and $\bar{F}$, respectively, as

$\bar{X} \triangleq \frac{X}{R_0}, \quad \dot{\bar{X}} \triangleq \frac{T_0}{R_0}\dot{X}, \quad \bar{F} \triangleq \frac{T_0^2}{R_0}F$ (33)

Selecting the reference time $T_0 = \sqrt{R_0^3/\mu}$ and normalizing (32) leads to

$\ddot{\bar{X}} + \frac{1}{|\bar{X}|^3}\bar{X} = \bar{F}$ (34)
Representing the vector $\bar{X}$ in the inertial frame by the three elements $[x_1, x_2, x_3]^T$ and their rates by $[x_4, x_5, x_6]^T \triangleq [\dot{x}_1, \dot{x}_2, \dot{x}_3]^T$, the state vector can be formed as

$x = [x_1, x_2, x_3, x_4, x_5, x_6]^T$ (35)

and the normalized control force per unit mass, represented in the inertial frame by $\bar{F} = [F_1, F_2, F_3]^T$, forms the control vector

$u = [F_1, F_2, F_3]^T$ (36)

Using (34), the state equation reads

$\dot{x} = f_c(x) + g_c(x)u(t), \quad t \geq 0$ (37)

where

$f_c(x) = \begin{bmatrix} x_4 \\ x_5 \\ x_6 \\ -x_1/|\bar{X}|^3 \\ -x_2/|\bar{X}|^3 \\ -x_3/|\bar{X}|^3 \end{bmatrix}, \quad g_c(x) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ (38)

and

$|\bar{X}| = \sqrt{x_1^2 + x_2^2 + x_3^2}$ (39)
Discretizing dynamics (37) with a small sampling time $\Delta t$ results in the discrete-time state equation given in (6), where

$f(x_k) = x(k\Delta t) + \Delta t\, f_c\big(x(k\Delta t)\big), \quad g(x_k) = \Delta t\, g_c\big(x(k\Delta t)\big)$ (40)
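The discretization (40) of the normalized dynamics (37)-(38) can be sketched directly; the sampling time `dt` below is an illustrative value:

```python
import numpy as np

# Euler discretization (Eq. (40)) of the normalized two-body dynamics (37)-(38).
# State x = [x1, x2, x3, x4, x5, x6]: normalized inertial position and velocity.
dt = 0.01   # illustrative sampling time

def f_c(x):
    """Drift term of Eq. (38): velocity, then gravity -x_pos/|x_pos|^3 (mu = 1)."""
    pos, vel = x[:3], x[3:]
    rho = np.linalg.norm(pos)
    return np.concatenate([vel, -pos / rho**3])

def g_c(x):
    """Input matrix of Eq. (38): thrust enters the velocity states (x-independent)."""
    return np.vstack([np.zeros((3, 3)), np.eye(3)])

def f_d(x):
    return x + dt * f_c(x)    # f(x_k) = x(k dt) + dt * f_c(x(k dt))

def g_d(x):
    return dt * g_c(x)        # g(x_k) = dt * g_c(x(k dt))

# Sanity check: in these units an unforced circular orbit of radius 1 has speed 1,
# so the radius should stay near 1 over a short propagation.
x = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
for _ in range(100):
    x = f_d(x) + g_d(x) @ np.zeros(3)
```

The circular-orbit check works because the normalization sets $\mu = 1$, so a radius-1 circular orbit has unit speed; a simple Euler step slowly gains energy, which is acceptable here since the controller operates on the discretized model itself.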
B. Numerical Results
Having modeled the dynamics of the spacecraft, four agents in the different orbits listed in Table 1 are selected. Assuming that at the beginning of the maneuver the spacecraft are all passing through the equatorial plane, moving from the southern hemisphere toward the northern hemisphere, the objective is to perform maneuvers so as to dock at the location of the first spacecraft. In other words, the first spacecraft keeps rotating along its orbit without any control force applied to it, and the rest are supposed to reach the orbiting location of spacecraft 1 and dock with it.
The communication topology between the spacecraft is assumed time-varying, as depicted in Fig. 2. Indexing the spacecraft from 1 to 4: in the first five time units, spacecraft 2 and 4 can communicate with spacecraft 1 and have direct knowledge of its location, while spacecraft 3 does not have access to spacecraft 1. For the next five time units, spacecraft 4 has access only to the location of spacecraft 3, which itself can only access the location of spacecraft 2, as seen in the figure. Hence, the decentralized nature of the controller can be examined in this example.
Table 1: The specifications of the orbits of spacecraft 1 to 4.
Characteristic Orbit 1 Orbit 2 Orbit 3 Orbit 4
Semi-major axis 9,000 km 11,000 km 13,000 km 15,000 km
Right Ascension of the Ascending Node 20 deg. 0 deg. 40 deg. 50 deg.
Inclination 20 deg. 0 deg. 40 deg. 50 deg.
Eccentricity 0 0 0 0
Fig. 2. Graphs representing communication between the spacecraft: a) for the first five time units, b) for the next five time units, and c) for the rest of the simulation.
The reference length is selected as $R_0$; hence, the reference time is $T_0 = \sqrt{R_0^3/\mu}$, and each control unit corresponds to $R_0/T_0^2$ km/s$^2$. The initial conditions $x_0^1, \ldots, x_0^4$ of the spacecraft, calculated from the orbital elements in Table 1, are given in Eqs. (41)-(44).
The weighting matrices of the cost function, $Q$ and $R$, are selected as diagonal matrices in Eqs. (45) and (46).
The first three elements of $Q$ correspond to the position error, and by putting a high weight on them, one emphasizes the minimization of the position error.
The Tracking-SNAC design and training is the next step. A small sampling time $\Delta t$ is selected for discretizing the continuous dynamics (37). Denoting the basis-function inputs by $x$ and $r$ and their elements by $x_l$ and $r_l$, $l = 1, \ldots, 6$, the vector $\phi(x, r)$ is selected to consist of the elements $x_l$ and $r_l$, $l = 1, \ldots, 6$, along with the terms $x_1x_2$, $x_1x_3$, $x_2x_3$, $x_4x_5$, $x_4x_6$, $x_5x_6$, and also $r_1r_2$, $r_1r_3$, $r_2r_3$, $r_4r_5$, $r_4r_6$, $r_5r_6$; in other words, the input elements along with the pairwise products of the position elements with each other and of the velocity elements with each other, for the state vector and the reference signal separately. This leads to $\phi(x, r) \in \mathbb{R}^{24}$.
Note that the cost function (8) is finite for this problem, since no control is needed after spacecraft 2 to 4 reach the location of spacecraft 1 and keep orbiting with it along the orbit (see Comment 1). As discussed earlier, for the offline training, an estimate of the dynamics of the reference signal $r$, i.e., $F(\cdot)$, is required. Denoting the elements of $r$ by $r_l$, $l = 1, \ldots, 6$, the following selection is made in this study:
$F\left([r_1, r_2, r_3, r_4, r_5, r_6]^T\right) = [r_1, r_2, r_3, r_4, r_5, r_6]^T + \Delta t\,[r_4, r_5, r_6, 0, 0, 0]^T$ (47)

which means that the reference signal's position is propagated based on its velocity, which is assumed constant.
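The 24-element basis vector described above, together with the assumed constant-velocity reference model of Eq. (47), can be sketched as follows; `dt` is an illustrative sampling time:

```python
import numpy as np

dt = 0.01
# Index pairs for the pairwise products: positions (0,1,2), velocities (3,4,5).
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]

def basis(x, r):
    """12 linear terms, plus pairwise products of the position elements and of
    the velocity elements, for the state and the reference separately: 24 total."""
    cross_x = [x[i] * x[j] for i, j in pairs]
    cross_r = [r[i] * r[j] for i, j in pairs]
    return np.concatenate([x, r, cross_x, cross_r])

def ref_model(r):
    """Eq. (47): propagate the position with the (assumed constant) velocity."""
    return r + dt * np.concatenate([r[3:], np.zeros(3)])

x = np.arange(1.0, 7.0)     # sample 6-element state and reference vectors
r = np.arange(7.0, 13.0)
assert basis(x, r).shape == (24,)
```

The reference model only advances the position states by the current velocity; the velocity states are left untouched, which is exactly the constant-velocity assumption that the online re-optimization later corrects.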
The training is performed for 200 epochs, where in each iteration 50 random states and reference signals are selected to perform the least-squares update given in (25). The weights of the Tracking-SNAC converged after around 70 iterations of Algorithm 1, as seen in Fig. 3, which shows the evolution of the weights versus the offline training iterations.
Once trained, the neurocontroller was duplicated for use by the different spacecraft, and the simulation was run for 30 time units. The position histories resulting from the proposed controller are depicted in Fig. 4. As seen, spacecraft 2 to 4, starting at different locations, converge to the location of spacecraft 1 and keep orbiting along with it. This is confirmed by Fig. 5 as well, which shows the errors between the positions of spacecraft 2 to 4 and the position of spacecraft 1. Convergence of the errors to zero corresponds to the desired formation.
Fig. 6 shows the history of the network weights versus time during the online re-optimization. As expected, the weights initially change rapidly and then converge to some final values. These changes are due to re-optimizing the network to give optimal results based on the actual dynamics of the reference signal.
In order to analyze the effect of re-optimization of the Tracking-SNAC, a separate simulation was performed using the same offline-trained network without re-optimization. The resulting position error histories, with respect to the position of spacecraft 1, are depicted in Fig. 7. As seen in this figure, even though the controller decreases the initially large errors, the errors do not approach zero; hence, the Tracking-SNAC without re-optimization cannot make the satellites perfectly track the desired signal and achieve the desired formation. This demonstrates the suboptimality of the offline-trained networks for tracking a signal whose dynamics differ from those used in training, and shows the need for the online re-optimization.
VI. Conclusions
The problem of formation control of a network of agents with heterogeneous nonlinear dynamics is converted to a tracking problem, and the developed Tracking-SNAC controller is shown to solve it through offline training based on estimated dynamics of the reference signal, followed by online re-optimization of the network to learn the actual dynamics of the reference signal. The simulation results indicate that the proposed technique is promising.
Acknowledgement
This research was partially supported by a grant from the National Science Foundation.
Fig. 3. History of weights versus the training iterations for the offline training phase.
Fig. 4. Histories of the positions of the spacecraft in different axes.
Spacecraft 1 to 4 are denoted by ai, i =1,2,3,4.
Fig. 5. Histories of the errors in the positions of the spacecraft in different axes.
Spacecraft 2 to 4 are denoted by ai, i = 2,3,4.
Fig. 6. History of weights versus time for the online re-optimization.
Fig. 7. Histories of the errors in the positions of the spacecraft in different axes without re-optimization of the Tracking-SNACs. Spacecraft 2 to 4 are denoted by ai, i = 2, 3, 4.
References

1. J. Lawton, R. Beard, and B. Young, "A Decentralized Approach to Formation Maneuvers," IEEE Transactions on Robotics and Automation, Vol. 19, No. 6, 2003.
2. A. Fax and R. Murray, "Information Flow and Cooperative Control of Vehicle Formations," IEEE Transactions on Automatic Control, Vol. 49, No. 9, 2004.
3. G. Lafferriere, A. Williams, J. Caughman, and J. J. P. Veerman, "Decentralized Control of Vehicle Formation," Systems and Control Letters, Vol. 54, 2005.
4. Y. C. Cao and W. Ren, "Optimal Linear Consensus Algorithm: An LQR Perspective," IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 40, 2010, pp. 819–830.
5. J. Wang and M. Xin, "Multi-Agent Consensus Algorithm with Obstacle Avoidance via Optimal Control Approach," Proc. American Control Conference, San Francisco, 2011, pp. 2783–2788.
6. R. Olfati-Saber and R. Murray, "Consensus Protocols for Networks of Dynamic Agents," Proc. American Control Conference, 2003, pp. 951–956.
7. X. H. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV Formation Flying with Obstacle/Collision Avoidance," IEEE Transactions on Control Systems Technology, Vol. 15, 2007, pp. 672–679.
8. S. Zhang and G. Duan, "Consensus Seeking in Multiagent Cooperative Control Systems with Bounded Control Input," Journal of Control Theory and Applications, Vol. 9, No. 2, 2011, pp. 210–214.
9. W. He and J. Cao, "Consensus Control for High-Order Multi-Agent Systems," IET Control Theory and Applications, Vol. 5, No. 1, 2011, pp. 231–238.
10. J. Seo, H. Shima, and J. Back, "Consensus of High-Order Linear Systems Using Dynamic Output Feedback Compensator: Low Gain Approach," Automatica, Vol. 45, 2009, pp. 2659–2664.
11. F. Xiao and L. Wang, "Consensus Problems for High-Dimensional Multi-Agent Systems," IET Control Theory and Applications, Vol. 1, No. 3, 2007, pp. 830–837.
12. J. Wang, D. Cheng, and X. Hu, "Consensus of Multi-Agent Linear Dynamic Systems," Asian Journal of Control, Vol. 10, No. 2, 2008, pp. 144–155.
13. G. Zhai, S. Okuno, J. Imae, and T. Kobayashi, "A Matrix Inequality Based Design Method for Consensus Problems in Multi-Agent Systems," International Journal of Applied Mathematics and Computer Science, Vol. 19, No. 4, 2009, pp. 639–646.
14. R. Olfati-Saber and R. Murray, "Consensus Problems in Networks of Agents with Switching Topology and Time-Delays," IEEE Transactions on Automatic Control, Vol. 49, No. 9, 2004, pp. 1520–1533.
15. R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1994.
16. C. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2001.
17. C. Chen, Linear System Theory and Design, Oxford University Press, 1999.
18. A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof," IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 38, 2008, pp. 943–949.
19. L. Jian-Zhen, "Reference Model Based Consensus Control of Second-Order Multi-Agent Systems," Chinese Physics B, Vol. 20, No. 2, 2011.
20. D. V. Prokhorov and D. C. Wunsch, "Adaptive Critic Designs," IEEE Transactions on Neural Networks, Vol. 8, No. 5, 1997, pp. 997–1007.
21. A. Das and F. L. Lewis, "Cooperative Adaptive Control for Synchronization of Second-Order Systems with Unknown Nonlinearities," International Journal of Robust and Nonlinear Control, Vol. 21, 2011, pp. 1509–1524.
22. W. Ren and R. Beard, "Formation Feedback Control for Multiple Spacecraft via Virtual Structures," IEE Proceedings: Control Theory and Applications, Vol. 151, No. 3, 2004, pp. 357–368.
23. W. Ren and R. Beard, "Decentralized Scheme for Spacecraft Formation Flying via the Virtual Structure Approach," Journal of Guidance, Control, and Dynamics, Vol. 27, No. 1, 2004, pp. 73–82.
24. H. Zhang, Q. Wei, and Y. Luo, "A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm," IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 38, No. 4, 2008, pp. 937–942.
25. T. Dierks and S. Jagannathan, "Online Optimal Control of Nonlinear Discrete-Time Systems Using Approximate Dynamic Programming," Journal of Control Theory and Applications, Vol. 9, No. 3, 2011, pp. 361–369.
26. Q. Yang and S. Jagannathan, "Near Optimal Neural Network-Based Output Feedback Control of Affine Nonlinear Discrete-Time Systems," Proc. IEEE International Symposium on Intelligent Control, 2007, pp. 578–583.
27. J. Yao, X. Liu, and X. Zhu, "Asymptotically Stable Adaptive Critic Design for Uncertain Nonlinear Systems," Proc. American Control Conference, 2009, pp. 5156–5161.
28. L. Yang, J. Si, K. Tsakalis, and A. Rodriguez, "Direct Heuristic Dynamic Programming for Nonlinear Tracking Control with Filtered Tracking Error," IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 39, No. 6, 2009, pp. 1617–1622.
29. S. Bhasin, N. Sharma, P. Patre, and W. Dixon, "Asymptotic Tracking by a Reinforcement Learning-Based Adaptive Critic Controller," Journal of Control Theory and Applications, Vol. 9, No. 3, 2011, pp. 400–409.
30. R. Padhi, N. Unnikrishnan, X. Wang, and S. N. Balakrishnan, "A Single Network Adaptive Critic (SNAC) Architecture for Optimal Control Synthesis for a Class of Nonlinear Systems," Neural Networks, Vol. 19, 2006, pp. 1648–1660.
31. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1998.
32. M. J. Sidi, Spacecraft Dynamics and Control: A Practical Engineering Approach, Cambridge University Press, 2000.