
Applied Mathematics and Computation 219 (2013) 4444–4464

Finding all real roots of 3 × 3 nonlinear algebraic systems using neural networks

Konstantinos Goulianas (a), Athanasios Margaris (b,*), Miltiades Adamopoulos (c)
(a) TEI of Thessaloniki, Department of Informatics, Thessaloniki, Greece
(b) TEI of Larissa, Department of Computer Science and Telecommunications, Greece
(c) University of Macedonia, Thessaloniki, Greece

Article info

Keywords: Nonlinear algebraic systems; Neural networks; Numerical analysis; Computational methods

http://dx.doi.org/10.1016/j.amc.2012.10.049
* Corresponding author. E-mail addresses: [email protected] (K. Goulianas), [email protected], [email protected] (A. Margaris), [email protected] (M. Adamopoulos).

Abstract

The objective of this research is the description of a feed-forward neural network capable of solving nonlinear algebraic systems with polynomial equations. The basic features of the proposed structure include, among other things, product units trained by the back-propagation algorithm and a fixed input unit with a constant input of unity. The presented theory is demonstrated by solving complete 3 × 3 nonlinear algebraic system paradigms, and the accuracy of the method is tested by comparing the experimental results produced by the network with the theoretical values of the system roots.

© 2012 Elsevier Inc. All rights reserved.

    1. Introduction

A typical nonlinear algebraic system is defined as $F(\vec z) = 0$, with the mapping function $F : \mathbb{R}^n \to \mathbb{R}^n$ ($n > 1$) described as an $n$-dimensional vector

$$F = [f_1, f_2, \ldots, f_n]^T, \qquad (1)$$

where $f_i : \mathbb{R}^n \to \mathbb{R}$ ($i = 1, 2, \ldots, n$). Generally speaking, there are no good methods for solving such systems: even in the simple case of only two equations of the form $f_1(z_1, z_2) = 0$ and $f_2(z_1, z_2) = 0$, the estimation of the system roots is reduced to the identification of the common points of the zero contours of the functions $f_1(z_1, z_2)$ and $f_2(z_1, z_2)$. This is a very difficult task, since in general these two functions have no relation to each other at all. In the general case of $N$ nonlinear equations, solving the system requires the identification of points that are mutually common to $N$ unrelated zero-contour hypersurfaces, each of dimension $N - 1$ [28].

    2. Nonlinear algebraic systems

According to the basic principles of nonlinear algebra [26], a complete nonlinear algebraic system of $n$ polynomial equations with $n$ unknowns $\vec z = (z_1, z_2, \ldots, z_n)$ is identified completely by the number of equations $n$ and their degrees $(s_1, s_2, \ldots, s_n)$; it is expressed mathematically as

$$F_i(\vec z) = \sum_{j_1, j_2, \ldots, j_{s_i}}^{n} A_i^{j_1 j_2 \ldots j_{s_i}} z_{j_1} z_{j_2} \cdots z_{j_{s_i}} = 0 \qquad (2)$$



($i = 1, 2, \ldots, n$), and it has a non-vanishing solution (i.e. at least one $z_j \neq 0$) if and only if the equation

$$\mathcal{R}_{s_1, s_2, \ldots, s_n}\{A_i^{j_1 j_2 \ldots j_{s_i}}\} = 0 \qquad (3)$$

holds. In this equation the function $\mathcal{R}$ is called the resultant, and it is a straightforward generalization of the determinant of a linear system. The resultant $\mathcal{R}$ is a polynomial of the coefficients of $A$ of degree

$$d_{s_1, s_2, \ldots, s_n} = \deg_A \mathcal{R}_{s_1, s_2, \ldots, s_n} = \sum_{i=1}^{n} \left( \prod_{j \neq i} s_j \right). \qquad (4)$$

When all degrees coincide, i.e. $s_1 = s_2 = \cdots = s_n = s$, the resultant $\mathcal{R}_{n|s}$ is reduced to a simple polynomial of degree $d_{n|s} = \deg_A \mathcal{R}_{n|s} = n s^{n-1}$ and it is described completely by the values of the parameters $n$ and $s$. It can be proven that the coefficients of the matrix $A$, which is actually a tensor for $n > 2$, are not all independent of each other. More specifically, for the simple case $s_1 = s_2 = \cdots = s_n = s$, the matrix $A$ is symmetric in the last $s$ contravariant indices and it contains only $n M_{n|s}$ independent coefficients, with

$$M_{n|s} = \frac{(n + s - 1)!}{(n - 1)!\, s!}. \qquad (5)$$

An interesting discussion concerning the existence of solutions for a nonlinear algebraic system can be found in [30]. Even though the notion of the resultant has been defined for homogeneous nonlinear equations, it can also describe non-homogeneous algebraic equations. In the general case, the resultant $\mathcal{R}$ satisfies the nonlinear Cramer rule

$$\mathcal{R}_{s_1, s_2, \ldots, s_n}\{A^{(k)}(Z_k)\} = 0, \qquad (6)$$

where $Z_k$ is the $k$th component of the solution of the non-homogeneous system and $A^{(k)}$ is the $k$th column of the coefficient matrix $A$.
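As a quick numerical illustration of these counting formulas (our own sketch, not part of the original text; the helper names are hypothetical), the reduced resultant degree $d_{n|s} = n s^{n-1}$ and the number $M_{n|s}$ of independent coefficients from Eq. (5) can be computed as follows; for the 3 × 3 cubic systems studied later ($n = s = 3$) they give $d_{3|3} = 27$ and $M_{3|3} = 10$, the latter matching the ten cubic monomials of a complete cubic equation in three unknowns.

```python
from math import factorial

def resultant_degree(n: int, s: int) -> int:
    """Degree d_{n|s} = n * s**(n-1) of the resultant when all equation degrees coincide."""
    return n * s ** (n - 1)

def independent_coefficients(n: int, s: int) -> int:
    """M_{n|s} = (n+s-1)! / ((n-1)! s!), the number of independent coefficients
    per equation of a symmetric coefficient tensor, cf. Eq. (5)."""
    return factorial(n + s - 1) // (factorial(n - 1) * factorial(s))

if __name__ == "__main__":
    for n, s in [(2, 2), (3, 3)]:
        print(n, s, resultant_degree(n, s), independent_coefficients(n, s))
```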

    3. Review of previous work

The solution of nonlinear algebraic systems is generally possible not with analytical but with numerical algorithms. Besides the well known fixed-point based methods, (quasi-)Newton and gradient descent methods, a well known class of such algorithms is the ABS class introduced in 1984 by Abaffy, Broyden, and Spedicato [10] to solve linear systems as well as nonlinear equations and systems of equations [11,4]. The basic function of the initial ABS algorithms is to solve a determined or under-determined $n \times m$ linear system $Az = b$ ($z \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $m \le n$) by using special matrices known as Abaffians. The choice of those matrices, as well as of the quantities used in their defining equations, determines particular subclasses of the ABS algorithms, the most important of them being the conjugate direction subclass, the orthogonally scaled subclass and the optimally stable subclass. The extension of the ABS methods to nonlinear algebraic systems is straightforward and can be found in many sources such as [9,8]. It can be proven that, under appropriate conditions, the ABS methods are locally convergent with a speed of Q-order two, while the computational cost of one iteration is $O(n^3)$ flops plus one function and one Jacobian matrix evaluation. To save the cost of the Jacobian matrix evaluations, Huang [5] introduced quasi-Newton based ABS methods known as row update methods. These methods do not require the a priori computation of the Jacobian matrix, and their computational cost is $O(n^3)$ flops.

Galantai and Jeney [1] have proposed alternative methods for solving nonlinear systems of equations that are combinations of the nonlinear ABS methods and quasi-Newton methods. Another class of methods has been proposed by Kublanovskaya and Simonova [27] for estimating the roots of $m$ nonlinear coupled algebraic equations with two unknowns $\lambda$ and $\mu$. In their work, the nonlinear system under consideration is described by the algebraic equation $F(\lambda, \mu) = [f_1(\lambda, \mu), f_2(\lambda, \mu), \ldots, f_m(\lambda, \mu)]^T = 0$, with each function $f_k(\lambda, \mu)$ ($k = 1, \ldots, m$) a polynomial of the form

$$f_k(\lambda, \mu) = [a_{ts}^{(k)} \mu^t + \cdots + a_{0s}^{(k)}] \lambda^s + \cdots + [a_{t0}^{(k)} \mu^t + \cdots + a_{00}^{(k)}]. \qquad (7)$$

In this equation the coefficients $a_{ij}$ ($i = 0, 1, \ldots, t$ and $j = 0, 1, \ldots, s$) are, in general, complex numbers, while $s$ and $t$ are the maximum degrees of the polynomials in $\lambda$ and $\mu$, respectively, found in $F(\lambda, \mu) = 0$. The algorithms proposed by Kublanovskaya and Simonova are capable of finding the zero-dimensional roots $(\lambda^*, \mu^*)$, i.e. the pairs of fixed numbers satisfying the nonlinear system, as well as the one-dimensional roots defined as $(\lambda, \mu) = [\varphi(\mu), \mu]$ and $(\lambda, \mu) = [\lambda, \tilde\varphi(\lambda)]$, whose components are functionally related.

The first method of Kublanovskaya and Simonova consists of two stages. At the first stage, the process passes from the system $F(\lambda, \mu) = 0$ to the spectral problem for a pencil $D(\lambda, \mu) = A(\mu) - \lambda B(\mu)$ of polynomial matrices $A(\mu)$ and $B(\mu)$, whose zero-dimensional and one-dimensional eigenvalues coincide with the zero-dimensional and one-dimensional roots of the nonlinear system under consideration. At the second stage, the spectral problem for $D(\lambda, \mu)$ is solved, i.e. all zero-dimensional eigenvalues of $D(\lambda, \mu)$ are found, together with a regular polynomial matrix pencil whose spectrum gives all one-dimensional eigenvalues of $D(\lambda, \mu)$. The second method is based on the factorization of $F(\lambda, \mu)$ into irreducible factors and on the estimation of the roots $(\mu, \lambda)$ one after the other, since the resulting polynomials produced by this factorization are polynomials of only one variable.


Emiris, Mourrain, and Vrahatis [7] developed a method for the counting and identification of the roots of a nonlinear algebraic system based on the concept of the topological degree and on bisection techniques (for the application of those techniques see also [22]). A method for solving such a system using linear programming techniques can be found in [29], and another interesting method that uses evolutionary algorithms is described in [6]. There are many other methods concerning this subject; some of them are described in [2,25,24,12]. In the concluding section, the proposed neural model is compared against some of those methods, such as the trust-region-dogleg [3], the trust-region-reflective [3] and the Levenberg–Marquardt [15,20] algorithms, as well as the multivariate Newton–Raphson method with partial derivatives.

    4. ANNs as nonlinear system solvers

Artificial neural networks (ANNs for short) have been used, among other methods, for solving linear algebra problems and simple linear systems and equations [14,13], as well as nonlinear equations and systems of nonlinear equations; however, they are not used as frequently as the methods described in the previous section, especially in the case of nonlinear systems. Mathia and Saeks [21] used recurrent neural networks composed of linear Hopfield networks to solve nonlinear equations which are approximated by a multilayer perceptron. Mishra and Kalra [23] used a modified Hopfield network with a properly selected energy function to solve a nonlinear algebraic system of m equations with n unknowns. Hopfield neural networks were also used by Luo and Han [17] to solve a nonlinear system of equations which in a previous stage has been transformed to a kind of quadratic optimization problem. Finally, Li and Zeng [16] used the gradient descent rule with a variable step size to solve nonlinear algebraic systems with very rapid convergence and very high accuracy.

The main idea behind the proposed approach for solving a nonlinear algebraic system of p equations with p unknowns [18] is to construct a network with p output neurons, with the desired output of the ℓth neuron ($1 \le \ell \le p$) equal to zero; in this way, the nonlinear algebraic system under consideration is simulated completely by the neural model. An efficient way to construct such a network is shown in Fig. 1 for the case of a complete 2 × 2 nonlinear algebraic system [19]. In this figure, the proposed model is composed of four layers: an input layer with one summation unit with a constant input equal to unity; a hidden layer of linear units that generate the linear terms x and y; a hidden layer of summation and product units that generate all the terms of the left hand side of the system equations except the fixed term – namely, the terms x, y, x², xy and y² – by using the appropriate activation function (for example, the activation function of the neuron producing the x² term is the function f(x) = x²); and finally, an output layer of summation units whose total input is the expression of the left hand side of the corresponding equation. These output neurons use the hyperbolic tangent as the activation function and have an additional input from a bias unit, whose fixed weight is the constant term of the associated nonlinear equation. To simulate the system, the synaptic weights are configured in the way shown in the figure. It is not difficult to note that the weights of the synapses joining the neurons of the second layer to the neurons of the third layer are fixed and equal to unity, while the weights joining the summation and product units of the third layer to the neurons of the fourth (output) layer have been set to the fixed values of the coefficients of the system equations associated with the linear and nonlinear terms provided by the neurons (it is clear that if the algebraic system is not complete but there are missing terms, the corresponding synaptic weights are set to zero). The only variable-weight synapses are the two synapses that join the input neuron to the two linear neurons of the first hidden layer. Since this network structure is required to produce a zero output, a training operation with the traditional back propagation algorithm and [0 0]^T as the desired output vector will assign to the weights of the two variable-weight synapses the components x and y of one of the system roots. Training this network with many different initial conditions is capable of identifying all roots of a typical 2 × 2 complete nonlinear algebraic system with four distinct roots, as well as the basin of attraction of each one of them and the basin of infinity that leads to nonconverging behavior [19].

Fig. 1. The structure of the complete 2 × 2 nonlinear algebraic system neural solver.
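The essential mechanism can be condensed into a few lines of code. The sketch below is our own illustration in Python (the paper's experiments use MATLAB), written under the assumption that the trainable quantities are just the unknowns x and y, the outputs are o_i = tanh(F_i(x, y)), and plain gradient descent on E = ½ Σ o_i² is used; the toy system and all names are ours, not the authors'.

```python
import numpy as np

def solve_2x2(F, J, x0, beta=0.01, max_iter=100_000, tol=1e-16):
    """Gradient descent on E = 0.5 * sum(tanh(F_i)^2); x plays the role of the
    trainable input weights of the solver network."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        o = np.tanh(F(x))                     # network outputs
        delta5 = -o * (1.0 - o ** 2)          # output-layer deltas
        x += beta * J(x).T @ delta5           # back-propagated update of the weights
        if 0.5 * np.sum(np.tanh(F(x)) ** 2) < tol:
            break
    return x

# toy system used only for illustration: x^2 + y^2 - 2 = 0, x - y = 0
F = lambda v: np.array([v[0]**2 + v[1]**2 - 2.0, v[0] - v[1]])
J = lambda v: np.array([[2*v[0], 2*v[1]], [1.0, -1.0]])
print(solve_2x2(F, J, x0=[0.3, 1.7]))         # converges near the root (1, 1)
```

Starting points in different basins of attraction drive the same loop to different roots, which is exactly how the solver enumerates them.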

5. The neural model for the 3 × 3 complete nonlinear algebraic system

The complete 3 × 3 nonlinear algebraic system is defined as

$$F_i(x_1, x_2, x_3) = a_{i,1} x_1^3 + a_{i,2} x_2^3 + a_{i,3} x_3^3 + a_{i,4} x_1^2 x_2 + a_{i,5} x_1 x_2^2 + a_{i,6} x_1^2 x_3 + a_{i,7} x_1 x_3^2 + a_{i,8} x_2^2 x_3 + a_{i,9} x_2 x_3^2 + a_{i,10} x_1 x_2 x_3 + a_{i,11} x_1 x_2 + a_{i,12} x_1 x_3 + a_{i,13} x_2 x_3 + a_{i,14} x_1^2 + a_{i,15} x_2^2 + a_{i,16} x_3^2 + a_{i,17} x_1 + a_{i,18} x_2 + a_{i,19} x_3 - a_{i,20} = 0$$

($i = 1, 2, 3$).
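For reference, $F_i$ can be evaluated directly from a 3 × 20 coefficient array holding $a_{i,1}, \ldots, a_{i,20}$; the following Python sketch (our own helper, with hypothetical names and zero-based indexing) lists the nineteen monomials in exactly the order used in the equation above.

```python
import numpy as np

# exponent triples (p, q, r) of x1^p * x2^q * x3^r for the terms a_{i,1} ... a_{i,19},
# in the order they appear in the defining equation of F_i
EXPONENTS = [(3,0,0), (0,3,0), (0,0,3), (2,1,0), (1,2,0), (2,0,1), (1,0,2),
             (0,2,1), (0,1,2), (1,1,1), (1,1,0), (1,0,1), (0,1,1),
             (2,0,0), (0,2,0), (0,0,2), (1,0,0), (0,1,0), (0,0,1)]

def F(A, x):
    """Evaluate F_i(x1, x2, x3) = sum_k a_{i,k} * monomial_k - a_{i,20} for i = 1, 2, 3.
    A is a (3, 20) array holding a_{i,1} ... a_{i,20} in columns 0 ... 19."""
    monomials = np.array([x[0]**p * x[1]**q * x[2]**r for (p, q, r) in EXPONENTS])
    return A[:, :19] @ monomials - A[:, 19]
```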

This system is a straightforward generalization of the 2 × 2 complete nonlinear system studied in [19]. The neural structure that solves this type of system is based on the approach described in the previous section; it is described in Table 1 and shown in Fig. 2. In complete accordance with the proposed design, there are full connections between two consecutive layers; however, for the sake of clarity, the figure shows only the connections with nonzero synaptic weights. These weights are defined as follows:

$$W_{12} = \left[ W_{12}^{(1)} \;\; W_{12}^{(2)} \;\; W_{12}^{(3)} \right] = \left[ x_1 \;\; x_2 \;\; x_3 \right],$$

$$W_{23} = \{ W_{2,3}^{(i,j)} \}_{i=1,2,3}^{j=1,2,\ldots,9} =
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{bmatrix},$$

$$W_{34} = \{ W_{3,4}^{(i,j)} \}_{i=1,2,\ldots,9}^{j=1,2,\ldots,19} =
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},$$

Table 1. The structure of the neural solver for the 3 × 3 complete nonlinear algebraic system.

Layer | Neurons | Neuron type | Weight table
Layer 1 | 01 | Summation | W12 (1 × 3)
Layer 2 | 03 | Summation | W23 (3 × 9)
Layer 3 | 09 | Summation | W34 (9 × 19)
Layer 4 | 19 | Product & Summation (neurons 1, 2, 3, 8, 9, 10, 17, 18, 19 are summation units; neurons 4, 5, 6, 7, 11, 12, 13, 14, 15, 16 are product units) | W45 (19 × 3)
Layer 5 | 03 | Summation | Not available

Fig. 2. The structure of the complete 3 × 3 nonlinear algebraic system neural solver. Since the back propagation algorithm requires full connections between consecutive layers, the missing synapses are modeled as synapses with zero weight values. For the sake of simplicity, the parameters x1, x2 and x3 are denoted as x, y and z, respectively, while the parameters a_{i,20} (i = 1, 2, 3) are denoted as a20, b20 and c20.


$$W_{45} =
\begin{bmatrix}
a_{1,17} & a_{2,17} & a_{3,17} \\
a_{1,14} & a_{2,14} & a_{3,14} \\
a_{1,1} & a_{2,1} & a_{3,1} \\
a_{1,11} & a_{2,11} & a_{3,11} \\
a_{1,4} & a_{2,4} & a_{3,4} \\
a_{1,5} & a_{2,5} & a_{3,5} \\
a_{1,10} & a_{2,10} & a_{3,10} \\
a_{1,18} & a_{2,18} & a_{3,18} \\
a_{1,15} & a_{2,15} & a_{3,15} \\
a_{1,2} & a_{2,2} & a_{3,2} \\
a_{1,13} & a_{2,13} & a_{3,13} \\
a_{1,9} & a_{2,9} & a_{3,9} \\
a_{1,8} & a_{2,8} & a_{3,8} \\
a_{1,12} & a_{2,12} & a_{3,12} \\
a_{1,7} & a_{2,7} & a_{3,7} \\
a_{1,6} & a_{2,6} & a_{3,6} \\
a_{1,19} & a_{2,19} & a_{3,19} \\
a_{1,16} & a_{2,16} & a_{3,16} \\
a_{1,3} & a_{2,3} & a_{3,3}
\end{bmatrix}.$$

The activation functions for the neurons of the network structure are defined as follows:

- For the second layer: all neurons use the identity function, i.e. $f_2^{(i)}(t) = t$, $i = 1, 2, 3$.
- For the third layer: the activation functions of the nine neurons are
$$f_3 = \left[ f_3^{(1)} \; f_3^{(2)} \; f_3^{(3)} \; f_3^{(4)} \; f_3^{(5)} \; f_3^{(6)} \; f_3^{(7)} \; f_3^{(8)} \; f_3^{(9)} \right] = \left[ t \;\; t^2 \;\; t^3 \;\; t \;\; t^2 \;\; t^3 \;\; t \;\; t^2 \;\; t^3 \right].$$
- For the fourth layer: all neurons use the identity function, namely $f_4^{(i)}(t) = t$, $i = 1, 2, \ldots, 19$.
- For the fifth (output) layer: all neurons use the hyperbolic tangent function, namely $f_5^{(i)}(t) = \tanh(t)$, $i = 1, 2, 3$.

    6. Building the back propagation equations

By following the traditional back propagation approach, the equations for training the neural network are constructed as follows:

    6.1. Forward pass

The inputs to the three neurons of the second layer are estimated as

$$t_2^{(i)} = W_{12}^{(i)} = x_i, \quad i = 1, 2, 3,$$

while the outputs produced by them have the values

$$m_2^{(i)} = f_2^{(i)}(t_2^{(i)}) = f_2^{(i)}(x_i) = x_i, \quad i = 1, 2, 3.$$

These outputs are sent as inputs to the neurons of the third layer, whose total inputs are thus given by the equation

$$t_3^{(i)} = \sum_{j=1}^{3} W_{23}^{(j,i)} m_2^{(j)}, \quad i = 1, 2, \ldots, 9,$$

or in expanded form

$$t_3^{(1)} = W_{23}^{(1,1)} m_2^{(1)} + W_{23}^{(2,1)} m_2^{(2)} + W_{23}^{(3,1)} m_2^{(3)} = x_1,$$
$$t_3^{(2)} = W_{23}^{(1,2)} m_2^{(1)} + W_{23}^{(2,2)} m_2^{(2)} + W_{23}^{(3,2)} m_2^{(3)} = x_1,$$
$$t_3^{(3)} = W_{23}^{(1,3)} m_2^{(1)} + W_{23}^{(2,3)} m_2^{(2)} + W_{23}^{(3,3)} m_2^{(3)} = x_1,$$
$$t_3^{(4)} = W_{23}^{(1,4)} m_2^{(1)} + W_{23}^{(2,4)} m_2^{(2)} + W_{23}^{(3,4)} m_2^{(3)} = x_2,$$
$$t_3^{(5)} = W_{23}^{(1,5)} m_2^{(1)} + W_{23}^{(2,5)} m_2^{(2)} + W_{23}^{(3,5)} m_2^{(3)} = x_2,$$
$$t_3^{(6)} = W_{23}^{(1,6)} m_2^{(1)} + W_{23}^{(2,6)} m_2^{(2)} + W_{23}^{(3,6)} m_2^{(3)} = x_2,$$
$$t_3^{(7)} = W_{23}^{(1,7)} m_2^{(1)} + W_{23}^{(2,7)} m_2^{(2)} + W_{23}^{(3,7)} m_2^{(3)} = x_3,$$
$$t_3^{(8)} = W_{23}^{(1,8)} m_2^{(1)} + W_{23}^{(2,8)} m_2^{(2)} + W_{23}^{(3,8)} m_2^{(3)} = x_3,$$
$$t_3^{(9)} = W_{23}^{(1,9)} m_2^{(1)} + W_{23}^{(2,9)} m_2^{(2)} + W_{23}^{(3,9)} m_2^{(3)} = x_3.$$

On the other hand, the outputs of the neurons of the third layer are estimated as

$$m_3^{(1)} = f_3^{(1)}(t_3^{(1)}) = f_3^{(1)}(x) = x_1, \quad m_3^{(2)} = f_3^{(2)}(t_3^{(2)}) = f_3^{(2)}(x) = x_1^2, \quad m_3^{(3)} = f_3^{(3)}(t_3^{(3)}) = f_3^{(3)}(x) = x_1^3,$$
$$m_3^{(4)} = f_3^{(4)}(t_3^{(4)}) = f_3^{(4)}(y) = x_2, \quad m_3^{(5)} = f_3^{(5)}(t_3^{(5)}) = f_3^{(5)}(y) = x_2^2, \quad m_3^{(6)} = f_3^{(6)}(t_3^{(6)}) = f_3^{(6)}(y) = x_2^3,$$
$$m_3^{(7)} = f_3^{(7)}(t_3^{(7)}) = f_3^{(7)}(z) = x_3, \quad m_3^{(8)} = f_3^{(8)}(t_3^{(8)}) = f_3^{(8)}(z) = x_3^2, \quad m_3^{(9)} = f_3^{(9)}(t_3^{(9)}) = f_3^{(9)}(z) = x_3^3.$$

These outputs are sent as inputs to the neurons of the fourth layer. For the summation neurons, the total input is estimated as

$$t_4^{(i)} = \sum_{j=1}^{9} W_{34}^{(j,i)} m_3^{(j)}, \quad i = 1, 2, 3, 8, 9, 10, 17, 18, 19,$$

or in expanded form

$$t_4^{(1)} = \sum_{j=1}^{9} W_{34}^{(j,1)} m_3^{(j)} = W_{34}^{(1,1)} m_3^{(1)} = x_1, \qquad t_4^{(2)} = \sum_{j=1}^{9} W_{34}^{(j,2)} m_3^{(j)} = W_{34}^{(2,2)} m_3^{(2)} = x_1^2,$$
$$t_4^{(3)} = \sum_{j=1}^{9} W_{34}^{(j,3)} m_3^{(j)} = W_{34}^{(3,3)} m_3^{(3)} = x_1^3, \qquad t_4^{(8)} = \sum_{j=1}^{9} W_{34}^{(j,8)} m_3^{(j)} = W_{34}^{(4,8)} m_3^{(4)} = x_2,$$
$$t_4^{(9)} = \sum_{j=1}^{9} W_{34}^{(j,9)} m_3^{(j)} = W_{34}^{(5,9)} m_3^{(5)} = x_2^2, \qquad t_4^{(10)} = \sum_{j=1}^{9} W_{34}^{(j,10)} m_3^{(j)} = W_{34}^{(6,10)} m_3^{(6)} = x_2^3,$$
$$t_4^{(17)} = \sum_{j=1}^{9} W_{34}^{(j,17)} m_3^{(j)} = W_{34}^{(7,17)} m_3^{(7)} = x_3, \qquad t_4^{(18)} = \sum_{j=1}^{9} W_{34}^{(j,18)} m_3^{(j)} = W_{34}^{(8,18)} m_3^{(8)} = x_3^2,$$
$$t_4^{(19)} = \sum_{j=1}^{9} W_{34}^{(j,19)} m_3^{(j)} = W_{34}^{(9,19)} m_3^{(9)} = x_3^3,$$

while the total inputs of the product units of this fourth layer are estimated as

$$t_4^{(i)} = \prod_{j \in S_i} W_{34}^{(j,i)} m_3^{(j)}, \quad i = 4, 5, 6, 7, 11, 12, 13, 14, 15, 16,$$

where $S_i$ is the set of indices of the neurons of the third layer that are connected to the $i$th neuron of the fourth layer ($i = 4, 5, 6, 7, 11, 12, 13, 14, 15, 16$). Expanding this equation we get the set of equations

$$t_4^{(4)} = (W_{34}^{(1,4)} m_3^{(1)})(W_{34}^{(4,4)} m_3^{(4)}) = x_1 x_2,$$
$$t_4^{(5)} = (W_{34}^{(2,5)} m_3^{(2)})(W_{34}^{(4,5)} m_3^{(4)}) = x_1^2 x_2,$$
$$t_4^{(6)} = (W_{34}^{(1,6)} m_3^{(1)})(W_{34}^{(5,6)} m_3^{(5)}) = x_1 x_2^2,$$
$$t_4^{(7)} = (W_{34}^{(1,7)} m_3^{(1)})(W_{34}^{(4,7)} m_3^{(4)})(W_{34}^{(7,7)} m_3^{(7)}) = x_1 x_2 x_3,$$
$$t_4^{(11)} = (W_{34}^{(4,11)} m_3^{(4)})(W_{34}^{(7,11)} m_3^{(7)}) = x_2 x_3,$$
$$t_4^{(12)} = (W_{34}^{(4,12)} m_3^{(4)})(W_{34}^{(8,12)} m_3^{(8)}) = x_2 x_3^2,$$
$$t_4^{(13)} = (W_{34}^{(5,13)} m_3^{(5)})(W_{34}^{(7,13)} m_3^{(7)}) = x_2^2 x_3,$$
$$t_4^{(14)} = (W_{34}^{(1,14)} m_3^{(1)})(W_{34}^{(7,14)} m_3^{(7)}) = x_1 x_3,$$
$$t_4^{(15)} = (W_{34}^{(1,15)} m_3^{(1)})(W_{34}^{(8,15)} m_3^{(8)}) = x_1 x_3^2,$$
$$t_4^{(16)} = (W_{34}^{(2,16)} m_3^{(2)})(W_{34}^{(7,16)} m_3^{(7)}) = x_1^2 x_3,$$

with the outputs of the fourth layer estimated as

$$m_4^{(1)} = f_4^{(1)}(t_4^{(1)}) = f_4^{(1)}(x) = x_1, \quad m_4^{(2)} = f_4^{(2)}(t_4^{(2)}) = f_4^{(2)}(x^2) = x_1^2, \quad m_4^{(3)} = f_4^{(3)}(t_4^{(3)}) = f_4^{(3)}(x^3) = x_1^3,$$
$$m_4^{(4)} = f_4^{(4)}(t_4^{(4)}) = f_4^{(4)}(xy) = x_1 x_2, \quad m_4^{(5)} = f_4^{(5)}(t_4^{(5)}) = f_4^{(5)}(x^2 y) = x_1^2 x_2, \quad m_4^{(6)} = f_4^{(6)}(t_4^{(6)}) = f_4^{(6)}(x y^2) = x_1 x_2^2,$$
$$m_4^{(7)} = f_4^{(7)}(t_4^{(7)}) = f_4^{(7)}(xyz) = x_1 x_2 x_3, \quad m_4^{(8)} = f_4^{(8)}(t_4^{(8)}) = f_4^{(8)}(y) = x_2, \quad m_4^{(9)} = f_4^{(9)}(t_4^{(9)}) = f_4^{(9)}(y^2) = x_2^2,$$
$$m_4^{(10)} = f_4^{(10)}(t_4^{(10)}) = f_4^{(10)}(y^3) = x_2^3, \quad m_4^{(11)} = f_4^{(11)}(t_4^{(11)}) = f_4^{(11)}(yz) = x_2 x_3, \quad m_4^{(12)} = f_4^{(12)}(t_4^{(12)}) = f_4^{(12)}(y z^2) = x_2 x_3^2,$$
$$m_4^{(13)} = f_4^{(13)}(t_4^{(13)}) = f_4^{(13)}(y^2 z) = x_2^2 x_3, \quad m_4^{(14)} = f_4^{(14)}(t_4^{(14)}) = f_4^{(14)}(xz) = x_1 x_3, \quad m_4^{(15)} = f_4^{(15)}(t_4^{(15)}) = f_4^{(15)}(x z^2) = x_1 x_3^2,$$
$$m_4^{(16)} = f_4^{(16)}(t_4^{(16)}) = f_4^{(16)}(x^2 z) = x_1^2 x_3, \quad m_4^{(17)} = f_4^{(17)}(t_4^{(17)}) = f_4^{(17)}(z) = x_3, \quad m_4^{(18)} = f_4^{(18)}(t_4^{(18)}) = f_4^{(18)}(z^2) = x_3^2,$$
$$m_4^{(19)} = f_4^{(19)}(t_4^{(19)}) = f_4^{(19)}(z^3) = x_3^3.$$

Finally, the total input to the three summation neurons of the output layer is estimated as

$$t_5^{(1)} = \sum_{j=1}^{19} W_{45}^{(j,1)} m_4^{(j)} = a_{1,17} x_1 + a_{1,14} x_1^2 + a_{1,1} x_1^3 + a_{1,11} x_1 x_2 + a_{1,4} x_1^2 x_2 + a_{1,5} x_1 x_2^2 + a_{1,10} x_1 x_2 x_3 + a_{1,18} x_2 + a_{1,15} x_2^2 + a_{1,2} x_2^3 + a_{1,13} x_2 x_3 + a_{1,9} x_2 x_3^2 + a_{1,8} x_2^2 x_3 + a_{1,12} x_1 x_3 + a_{1,7} x_1 x_3^2 + a_{1,6} x_1^2 x_3 + a_{1,19} x_3 + a_{1,16} x_3^2 + a_{1,3} x_3^3 - a_{1,20} = F_1(x_1, x_2, x_3),$$

$$t_5^{(2)} = \sum_{j=1}^{19} W_{45}^{(j,2)} m_4^{(j)} = a_{2,17} x_1 + a_{2,14} x_1^2 + a_{2,1} x_1^3 + a_{2,11} x_1 x_2 + a_{2,4} x_1^2 x_2 + a_{2,5} x_1 x_2^2 + a_{2,10} x_1 x_2 x_3 + a_{2,18} x_2 + a_{2,15} x_2^2 + a_{2,2} x_2^3 + a_{2,13} x_2 x_3 + a_{2,9} x_2 x_3^2 + a_{2,8} x_2^2 x_3 + a_{2,12} x_1 x_3 + a_{2,7} x_1 x_3^2 + a_{2,6} x_1^2 x_3 + a_{2,19} x_3 + a_{2,16} x_3^2 + a_{2,3} x_3^3 - a_{2,20} = F_2(x_1, x_2, x_3),$$

$$t_5^{(3)} = \sum_{j=1}^{19} W_{45}^{(j,3)} m_4^{(j)} = a_{3,17} x_1 + a_{3,14} x_1^2 + a_{3,1} x_1^3 + a_{3,11} x_1 x_2 + a_{3,4} x_1^2 x_2 + a_{3,5} x_1 x_2^2 + a_{3,10} x_1 x_2 x_3 + a_{3,18} x_2 + a_{3,15} x_2^2 + a_{3,2} x_2^3 + a_{3,13} x_2 x_3 + a_{3,9} x_2 x_3^2 + a_{3,8} x_2^2 x_3 + a_{3,12} x_1 x_3 + a_{3,7} x_1 x_3^2 + a_{3,6} x_1^2 x_3 + a_{3,19} x_3 + a_{3,16} x_3^2 + a_{3,3} x_3^3 - a_{3,20} = F_3(x_1, x_2, x_3),$$

while the outputs produced by them, which constitute the real output of the neural network, are given by the equations

$$m_5^{(1)} \equiv o_1 = \tanh[t_5^{(1)}] = \tanh[F_1(x_1, x_2, x_3)] = f(F_1),$$
$$m_5^{(2)} \equiv o_2 = \tanh[t_5^{(2)}] = \tanh[F_2(x_1, x_2, x_3)] = f(F_2),$$
$$m_5^{(3)} \equiv o_3 = \tanh[t_5^{(3)}] = \tanh[F_3(x_1, x_2, x_3)] = f(F_3),$$

where for the sake of brevity we define $f(x) = \tanh(x)$ and use the convention $F_i = F_i(x_1, x_2, x_3)$ ($i = 1, 2, 3$).
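A compact way to check the forward-pass algebra is to generate the nineteen fourth-layer outputs directly as monomials and then apply the fixed W45 weights; the sketch below is our own Python rendering (not the authors' MATLAB code) and assumes the hypothetical (3, 20) coefficient array `A` of the earlier sketch.

```python
import numpy as np

# fourth-layer neurons 1..19 as exponent triples of (x1, x2, x3), in the order
# x1, x1^2, x1^3, x1x2, x1^2x2, x1x2^2, x1x2x3, x2, x2^2, x2^3,
# x2x3, x2x3^2, x2^2x3, x1x3, x1x3^2, x1^2x3, x3, x3^2, x3^3
LAYER4_TERMS = [(1,0,0), (2,0,0), (3,0,0), (1,1,0), (2,1,0), (1,2,0), (1,1,1),
                (0,1,0), (0,2,0), (0,3,0), (0,1,1), (0,1,2), (0,2,1),
                (1,0,1), (1,0,2), (2,0,1), (0,0,1), (0,0,2), (0,0,3)]

# zero-based columns of the coefficient array feeding the W45 synapses, i.e.
# a_{i,17}, a_{i,14}, a_{i,1}, a_{i,11}, a_{i,4}, a_{i,5}, a_{i,10}, a_{i,18}, a_{i,15},
# a_{i,2}, a_{i,13}, a_{i,9}, a_{i,8}, a_{i,12}, a_{i,7}, a_{i,6}, a_{i,19}, a_{i,16}, a_{i,3}
W45_COLUMNS = [16, 13, 0, 10, 3, 4, 9, 17, 14, 1, 12, 8, 7, 11, 6, 5, 18, 15, 2]

def forward(A, x):
    """Forward pass: fourth-layer outputs m4 (the 19 monomials), then o_i = tanh(F_i)."""
    m4 = np.array([x[0]**p * x[1]**q * x[2]**r for (p, q, r) in LAYER4_TERMS])
    W45 = A[:, W45_COLUMNS].T          # (19, 3) weight matrix of Section 5 / Fig. 2
    t5 = W45.T @ m4 - A[:, 19]         # total inputs of the output neurons = F_i
    return np.tanh(t5)
```

Because the second, third and fourth layers only reproduce monomials of x1, x2 and x3, the network output is tanh(F_i(x1, x2, x3)), exactly as derived above.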

6.2. Backward pass – estimation of the δ parameters

In this stage we use the vectors

$$\delta_5 = \left[ \delta_5^{(1)} \;\; \delta_5^{(2)} \;\; \delta_5^{(3)} \right],$$
$$\delta_4 = \left[ \delta_4^{(1)} \;\; \delta_4^{(2)} \;\; \delta_4^{(3)} \;\; \cdots \;\; \delta_4^{(17)} \;\; \delta_4^{(18)} \;\; \delta_4^{(19)} \right],$$
$$\delta_3 = \left[ \delta_3^{(1)} \;\; \delta_3^{(2)} \;\; \delta_3^{(3)} \;\; \delta_3^{(4)} \;\; \delta_3^{(5)} \;\; \delta_3^{(6)} \;\; \delta_3^{(7)} \;\; \delta_3^{(8)} \;\; \delta_3^{(9)} \right],$$
$$\delta_2 = \left[ \delta_2^{(1)} \;\; \delta_2^{(2)} \;\; \delta_2^{(3)} \right]$$

to denote the δ values of the fifth, fourth, third and second layer, respectively.

Since the desired outputs of the neural network are the values $d_1 = d_2 = d_3 = 0$, it can easily be seen that


$$\delta_5^{(1)} = (d_1 - o_1) f_5^{(1)\prime}(t_5^{(1)}) = -o_1 f'(F_1) = -f(F_1)\{1 - f^2(F_1)\},$$
$$\delta_5^{(2)} = (d_2 - o_2) f_5^{(2)\prime}(t_5^{(2)}) = -o_2 f'(F_2) = -f(F_2)\{1 - f^2(F_2)\},$$
$$\delta_5^{(3)} = (d_3 - o_3) f_5^{(3)\prime}(t_5^{(3)}) = -o_3 f'(F_3) = -f(F_3)\{1 - f^2(F_3)\}.$$

Having estimated the components of the $\delta_5$ vector, we can now estimate the components of the $\delta_4$ vector (namely the delta values of the fourth layer) by using the equation

$$\delta_4^{(k)x_i} = \left( \sum_{j=1}^{3} W_{45}^{(k,j)} \delta_5^{(j)} \right) n_4^{(k)x_i}, \quad k = 1, \ldots, 19, \qquad \text{where} \quad n_4^{(k)x_i} = \frac{\partial f_4^{(k)}}{\partial t_4^{(k)}} \frac{\partial t_4^{(k)}}{\partial x_i}.$$

Since these neurons produce nonlinear terms, we generally have to estimate more than one delta value for each of them, one for each independent variable participating in the nonlinear term they produce.

In the above equation there are many function derivatives with respect to $x_1$, $x_2$ and $x_3$; these derivatives are estimated as follows (a small sketch that generates them mechanically is given after the list):

- $n_4^{(k)x_i} = 1$ for $(k, i) \in \{(1,1), (8,2), (17,3)\}$, with $n_4^{(k)x_j} = 0$ for $j \neq i$.
- $n_4^{(k)x_i} = 2x_i$ for $(k, i) \in \{(2,1), (9,2), (18,3)\}$, with $n_4^{(k)x_j} = 0$ for $j \neq i$.
- $n_4^{(k)x_i} = 3x_i^2$ for $(k, i) \in \{(3,1), (10,2), (19,3)\}$, with $n_4^{(k)x_j} = 0$ for $j \neq i$.
- $n_4^{(7)x_i} = x_j x_\ell$ for $(i, j, \ell) \in \{(1,2,3), (2,1,3), (3,1,2)\}$.
- $n_4^{(k)x_i} = x_j$ for $(k, i, j) \in \{(4,1,2), (4,2,1), (11,2,3), (11,3,2), (14,1,3), (14,3,1)\}$.
- $n_4^{(k)x_i} = 0$ for $(k, i) \in \{(4,3), (11,1), (14,2)\}$.
- $n_4^{(k)x_i} = x_j^2$ for $(k, i, j) \in \{(5,2,1), (6,1,2), (12,2,3), (13,3,2), (15,1,3), (16,3,1)\}$.
- $n_4^{(k)x_i} = 2 x_i x_j$ for $(k, i, j) \in \{(5,1,2), (6,2,1), (12,3,2), (13,2,3), (15,3,1), (16,1,3)\}$.
- $n_4^{(k)x_i} = 0$ for $(k, i) \in \{(5,3), (6,3), (12,1), (13,1), (15,2), (16,2)\}$.
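These case-by-case factors are simply the partial derivatives of the corresponding fourth-layer monomials, so they can also be generated mechanically from the exponent triples; the following small Python sketch (our own illustration, not from the paper) does exactly that.

```python
def monomial_partials(p, q, r, x):
    """Return (d/dx1, d/dx2, d/dx3) of x1^p * x2^q * x3^r at x = (x1, x2, x3);
    these are the factors n_4^{(k) x_i} of the text for a fourth-layer unit."""
    x1, x2, x3 = x
    return (p * x1**(p-1) * x2**q * x3**r if p else 0.0,
            q * x1**p * x2**(q-1) * x3**r if q else 0.0,
            r * x1**p * x2**q * x3**(r-1) if r else 0.0)

# example: neuron 5 produces x1^2 * x2, so its derivatives are (2*x1*x2, x1^2, 0)
print(monomial_partials(2, 1, 0, (2.0, 3.0, 5.0)))   # (12.0, 4.0, 0.0)
```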

Based on the above expressions, we get the results

$$\delta_4^{(1)x_1} = \sum_{j=1}^{3} a_{j,17} \delta_5^{(j)}, \qquad \delta_4^{(1)x_2} = 0, \qquad \delta_4^{(1)x_3} = 0,$$
$$\delta_4^{(2)x_1} = 2x_1 \sum_{j=1}^{3} a_{j,14} \delta_5^{(j)}, \qquad \delta_4^{(2)x_2} = 0, \qquad \delta_4^{(2)x_3} = 0,$$
$$\delta_4^{(3)x_1} = 3x_1^2 \sum_{j=1}^{3} a_{j,1} \delta_5^{(j)}, \qquad \delta_4^{(3)x_2} = 0, \qquad \delta_4^{(3)x_3} = 0,$$
$$\delta_4^{(4)x_1} = x_2 \sum_{j=1}^{3} a_{j,11} \delta_5^{(j)}, \qquad \delta_4^{(4)x_2} = x_1 \sum_{j=1}^{3} a_{j,11} \delta_5^{(j)}, \qquad \delta_4^{(4)x_3} = 0,$$
$$\delta_4^{(5)x_1} = 2x_1 x_2 \sum_{j=1}^{3} a_{j,4} \delta_5^{(j)}, \qquad \delta_4^{(5)x_2} = x_1^2 \sum_{j=1}^{3} a_{j,4} \delta_5^{(j)}, \qquad \delta_4^{(5)x_3} = 0,$$
$$\delta_4^{(6)x_1} = x_2^2 \sum_{j=1}^{3} a_{j,5} \delta_5^{(j)}, \qquad \delta_4^{(6)x_2} = 2x_1 x_2 \sum_{j=1}^{3} a_{j,5} \delta_5^{(j)}, \qquad \delta_4^{(6)x_3} = 0,$$
$$\delta_4^{(7)x_1} = x_2 x_3 \sum_{j=1}^{3} a_{j,10} \delta_5^{(j)}, \qquad \delta_4^{(7)x_2} = x_1 x_3 \sum_{j=1}^{3} a_{j,10} \delta_5^{(j)}, \qquad \delta_4^{(7)x_3} = x_1 x_2 \sum_{j=1}^{3} a_{j,10} \delta_5^{(j)},$$
$$\delta_4^{(8)x_1} = 0, \qquad \delta_4^{(8)x_2} = \sum_{j=1}^{3} a_{j,18} \delta_5^{(j)}, \qquad \delta_4^{(8)x_3} = 0,$$
$$\delta_4^{(9)x_1} = 0, \qquad \delta_4^{(9)x_2} = 2x_2 \sum_{j=1}^{3} a_{j,15} \delta_5^{(j)}, \qquad \delta_4^{(9)x_3} = 0,$$
$$\delta_4^{(10)x_1} = 0, \qquad \delta_4^{(10)x_2} = 3x_2^2 \sum_{j=1}^{3} a_{j,2} \delta_5^{(j)}, \qquad \delta_4^{(10)x_3} = 0,$$
$$\delta_4^{(11)x_1} = 0, \qquad \delta_4^{(11)x_2} = x_3 \sum_{j=1}^{3} a_{j,13} \delta_5^{(j)}, \qquad \delta_4^{(11)x_3} = x_2 \sum_{j=1}^{3} a_{j,13} \delta_5^{(j)},$$
$$\delta_4^{(12)x_1} = 0, \qquad \delta_4^{(12)x_2} = x_3^2 \sum_{j=1}^{3} a_{j,9} \delta_5^{(j)}, \qquad \delta_4^{(12)x_3} = 2x_2 x_3 \sum_{j=1}^{3} a_{j,9} \delta_5^{(j)},$$
$$\delta_4^{(13)x_1} = 0, \qquad \delta_4^{(13)x_2} = 2x_2 x_3 \sum_{j=1}^{3} a_{j,8} \delta_5^{(j)}, \qquad \delta_4^{(13)x_3} = x_2^2 \sum_{j=1}^{3} a_{j,8} \delta_5^{(j)},$$
$$\delta_4^{(14)x_1} = x_3 \sum_{j=1}^{3} a_{j,12} \delta_5^{(j)}, \qquad \delta_4^{(14)x_2} = 0, \qquad \delta_4^{(14)x_3} = x_1 \sum_{j=1}^{3} a_{j,12} \delta_5^{(j)},$$
$$\delta_4^{(15)x_1} = x_3^2 \sum_{j=1}^{3} a_{j,7} \delta_5^{(j)}, \qquad \delta_4^{(15)x_2} = 0, \qquad \delta_4^{(15)x_3} = 2x_1 x_3 \sum_{j=1}^{3} a_{j,7} \delta_5^{(j)},$$
$$\delta_4^{(16)x_1} = 2x_1 x_3 \sum_{j=1}^{3} a_{j,6} \delta_5^{(j)}, \qquad \delta_4^{(16)x_2} = 0, \qquad \delta_4^{(16)x_3} = x_1^2 \sum_{j=1}^{3} a_{j,6} \delta_5^{(j)},$$
$$\delta_4^{(17)x_1} = 0, \qquad \delta_4^{(17)x_2} = 0, \qquad \delta_4^{(17)x_3} = \sum_{j=1}^{3} a_{j,19} \delta_5^{(j)},$$
$$\delta_4^{(18)x_1} = 0, \qquad \delta_4^{(18)x_2} = 0, \qquad \delta_4^{(18)x_3} = 2x_3 \sum_{j=1}^{3} a_{j,16} \delta_5^{(j)},$$
$$\delta_4^{(19)x_1} = 0, \qquad \delta_4^{(19)x_2} = 0, \qquad \delta_4^{(19)x_3} = 3x_3^2 \sum_{j=1}^{3} a_{j,3} \delta_5^{(j)}.$$

In the next step we use these values to estimate the parameters $\delta_3^{(i)}$ ($i = 1, 2, \ldots, 9$) via the expression

$$\delta_3^{(i)} =
\begin{cases}
\displaystyle\sum_{j=1}^{19} W_{34}^{(i,j)} \delta_4^{(j)x_1}, & i = 1, 2, 3, \\[2mm]
\displaystyle\sum_{j=1}^{19} W_{34}^{(i,j)} \delta_4^{(j)x_2}, & i = 4, 5, 6, \\[2mm]
\displaystyle\sum_{j=1}^{19} W_{34}^{(i,j)} \delta_4^{(j)x_3}, & i = 7, 8, 9.
\end{cases}$$

The results are the following:

$$\delta_3^{(1)} = \delta_4^{(1)x_1} + \delta_4^{(4)x_1} + \delta_4^{(6)x_1} + \delta_4^{(7)x_1} + \delta_4^{(14)x_1} + \delta_4^{(15)x_1} = \sum_{j=1}^{3} a_{j,17} \delta_5^{(j)} + x_2 \sum_{j=1}^{3} a_{j,11} \delta_5^{(j)} + x_2^2 \sum_{j=1}^{3} a_{j,5} \delta_5^{(j)} + x_2 x_3 \sum_{j=1}^{3} a_{j,10} \delta_5^{(j)} + x_3 \sum_{j=1}^{3} a_{j,12} \delta_5^{(j)} + x_3^2 \sum_{j=1}^{3} a_{j,7} \delta_5^{(j)},$$

$$\delta_3^{(2)} = \delta_4^{(2)x_1} + \delta_4^{(5)x_1} + \delta_4^{(16)x_1} = 2x_1 \sum_{j=1}^{3} a_{j,14} \delta_5^{(j)} + 2x_1 x_2 \sum_{j=1}^{3} a_{j,4} \delta_5^{(j)} + 2x_1 x_3 \sum_{j=1}^{3} a_{j,6} \delta_5^{(j)},$$

$$\delta_3^{(3)} = \delta_4^{(3)x_1} = 3x_1^2 \sum_{j=1}^{3} a_{j,1} \delta_5^{(j)},$$

$$\delta_3^{(4)} = \delta_4^{(4)x_2} + \delta_4^{(5)x_2} + \delta_4^{(7)x_2} + \delta_4^{(8)x_2} + \delta_4^{(11)x_2} + \delta_4^{(12)x_2} = x_1 \sum_{j=1}^{3} a_{j,11} \delta_5^{(j)} + x_1^2 \sum_{j=1}^{3} a_{j,4} \delta_5^{(j)} + x_1 x_3 \sum_{j=1}^{3} a_{j,10} \delta_5^{(j)} + \sum_{j=1}^{3} a_{j,18} \delta_5^{(j)} + x_3 \sum_{j=1}^{3} a_{j,13} \delta_5^{(j)} + x_3^2 \sum_{j=1}^{3} a_{j,9} \delta_5^{(j)},$$

$$\delta_3^{(5)} = \delta_4^{(6)x_2} + \delta_4^{(9)x_2} + \delta_4^{(13)x_2} = 2x_1 x_2 \sum_{j=1}^{3} a_{j,5} \delta_5^{(j)} + 2x_2 \sum_{j=1}^{3} a_{j,15} \delta_5^{(j)} + 2x_2 x_3 \sum_{j=1}^{3} a_{j,8} \delta_5^{(j)},$$

$$\delta_3^{(6)} = \delta_4^{(10)x_2} = 3x_2^2 \sum_{j=1}^{3} a_{j,2} \delta_5^{(j)},$$

$$\delta_3^{(7)} = \delta_4^{(7)x_3} + \delta_4^{(11)x_3} + \delta_4^{(13)x_3} + \delta_4^{(14)x_3} + \delta_4^{(16)x_3} + \delta_4^{(17)x_3} = x_1 x_2 \sum_{j=1}^{3} a_{j,10} \delta_5^{(j)} + x_2 \sum_{j=1}^{3} a_{j,13} \delta_5^{(j)} + x_2^2 \sum_{j=1}^{3} a_{j,8} \delta_5^{(j)} + x_1 \sum_{j=1}^{3} a_{j,12} \delta_5^{(j)} + x_1^2 \sum_{j=1}^{3} a_{j,6} \delta_5^{(j)} + \sum_{j=1}^{3} a_{j,19} \delta_5^{(j)},$$

$$\delta_3^{(8)} = \delta_4^{(12)x_3} + \delta_4^{(15)x_3} + \delta_4^{(18)x_3} = 2x_2 x_3 \sum_{j=1}^{3} a_{j,9} \delta_5^{(j)} + 2x_1 x_3 \sum_{j=1}^{3} a_{j,7} \delta_5^{(j)} + 2x_3 \sum_{j=1}^{3} a_{j,16} \delta_5^{(j)},$$

$$\delta_3^{(9)} = \delta_4^{(19)x_3} = 3x_3^2 \sum_{j=1}^{3} a_{j,3} \delta_5^{(j)}.$$


Finally, the values of $\delta_2^{(i)}$ ($i = 1, 2, 3$) are estimated via the equation

$$\delta_2^{(i)} = \sum_{j=1}^{9} W_{23}^{(i,j)} \delta_3^{(j)}, \quad i = 1, 2, 3,$$

    and the results are

$$\delta_2^{(1)} = \delta_3^{(1)} + \delta_3^{(2)} + \delta_3^{(3)} = \sum_{j=1}^{3} \left( 3x_1^2 a_{j,1} + 2x_1 x_2 a_{j,4} + x_2^2 a_{j,5} + 2x_1 x_3 a_{j,6} + x_3^2 a_{j,7} + x_2 x_3 a_{j,10} + x_2 a_{j,11} + x_3 a_{j,12} + 2x_1 a_{j,14} + a_{j,17} \right) \delta_5^{(j)} = \sum_{j=1}^{3} \frac{\partial F_j(x_1, x_2, x_3)}{\partial x_1} \, \delta_5^{(j)},$$

$$\delta_2^{(2)} = \delta_3^{(4)} + \delta_3^{(5)} + \delta_3^{(6)} = \sum_{j=1}^{3} \left( 3x_2^2 a_{j,2} + x_1^2 a_{j,4} + 2x_1 x_2 a_{j,5} + 2x_2 x_3 a_{j,8} + x_3^2 a_{j,9} + x_1 x_3 a_{j,10} + x_1 a_{j,11} + x_3 a_{j,13} + 2x_2 a_{j,15} + a_{j,18} \right) \delta_5^{(j)} = \sum_{j=1}^{3} \frac{\partial F_j(x_1, x_2, x_3)}{\partial x_2} \, \delta_5^{(j)},$$

$$\delta_2^{(3)} = \delta_3^{(7)} + \delta_3^{(8)} + \delta_3^{(9)} = \sum_{j=1}^{3} \left( 3x_3^2 a_{j,3} + x_1^2 a_{j,6} + 2x_1 x_3 a_{j,7} + x_2^2 a_{j,8} + 2x_2 x_3 a_{j,9} + x_1 x_2 a_{j,10} + x_1 a_{j,12} + x_2 a_{j,13} + 2x_3 a_{j,16} + a_{j,19} \right) \delta_5^{(j)} = \sum_{j=1}^{3} \frac{\partial F_j(x_1, x_2, x_3)}{\partial x_3} \, \delta_5^{(j)}.$$
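The net effect of the backward pass is therefore δ2 = Jᵀ δ5, where J is the Jacobian of (F1, F2, F3) with respect to (x1, x2, x3). A brief Python sketch of this end result (our own illustration; the finite-difference Jacobian is used here only for brevity and is not part of the paper, which obtains the derivatives analytically through the δ4 and δ3 values) is:

```python
import numpy as np

def deltas(F, x, eps=1e-6):
    """delta5_j = -tanh(F_j)(1 - tanh(F_j)^2) and delta2_i = sum_j dF_j/dx_i * delta5_j,
    with the Jacobian approximated here by central differences for brevity."""
    o = np.tanh(np.asarray(F(x), dtype=float))
    delta5 = -o * (1.0 - o ** 2)
    J = np.empty((3, 3))
    for i in range(3):
        step = np.zeros(3)
        step[i] = eps
        J[:, i] = (np.asarray(F(x + step)) - np.asarray(F(x - step))) / (2 * eps)
    delta2 = J.T @ delta5
    return delta5, delta2
```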

    6.3. Update of the synaptic weights

Proof. According to the back propagation algorithm, the update of the three variable weights $W_{12}^{(i)} = x_i$ ($i = 1, 2, 3$) is performed as

$$\{W_{12}^{(i)}\}^{(k+1)} = \{W_{12}^{(i)}\}^{(k)} + \beta \delta_2^{(i)} = \{W_{12}^{(i)}\}^{(k)} + \beta \sum_{j=1}^{3} \frac{\partial F_j}{\partial x_i} \, \delta_5^{(j)}$$

or equivalently, as

$$\{W_{12}^{(i)}\}^{(k+1)} = \{W_{12}^{(i)}\}^{(k)} - \beta \sum_{j=1}^{3} f(F_j)\left(1 - f^2(F_j)\right) \frac{\partial F_j}{\partial W_{12}^{(i)}},$$

where we used the expressions for the $\delta_2^{(i)}$ and $\delta_5^{(j)}$ parameters.

From the definition of the mean square error of the back propagation algorithm

$$E = \frac{1}{2} \sum_{j=1}^{3} (d_j - o_j)^2 = \frac{1}{2} \sum_{j=1}^{3} o_j^2 = \frac{1}{2} \sum_{j=1}^{3} f^2(F_j)$$

it can easily be seen that

$$\frac{\partial E}{\partial W_{12}^{(i)}} = \sum_{j=1}^{3} f(F_j) f'(F_j) \frac{\partial F_j}{\partial W_{12}^{(i)}} = \sum_{j=1}^{3} f(F_j)\left(1 - f^2(F_j)\right) \frac{\partial F_j}{\partial W_{12}^{(i)}},$$

and therefore the weight update equation takes the form

$$\{W_{12}^{(i)}\}^{(k+1)} = \{W_{12}^{(i)}\}^{(k)} - \beta \frac{\partial E}{\partial W_{12}^{(i)}},$$

which completes the proof. □
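Putting the two passes together, the whole solver reduces to repeated application of this update. The sketch below is our own condensed Python rendering (the paper's implementation is in MATLAB); it reuses the hypothetical `deltas` helper of the previous sketch, and the stopping rule and names are ours.

```python
import numpy as np

def neural_solve(F, x0, beta=0.005, max_iter=10_000, tol=1e-16):
    """Iterate the back-propagation update x <- x + beta * delta2 until the
    mean square error E = 0.5 * sum(tanh(F_i)^2) drops below tol."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        d5, d2 = deltas(F, x)                      # backward pass (previous sketch)
        x += beta * d2                             # weight update of Section 6.3
        if 0.5 * np.sum(np.tanh(np.asarray(F(x))) ** 2) < tol:
            return x, k + 1                        # converged root and iteration count
    return x, max_iter                             # characterized as nonconvergent
```

With β = 0.005 and a cap of N = 10000 iterations, this mirrors the settings reported for the first example system in the next section.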

    7. Experimental results and comparison with other methods

To test the validity of the neural model and to evaluate its precision and performance, typical 3 × 3 nonlinear systems were constructed and solved using the network described above. The results and the analysis of the associated simulations are presented below.


    Example 1. The first system we present has eight real roots and it consists of the following equations:

$$F_1(x_1, x_2, x_3) = x_1^3 + x_2^3 + x_3^3 - x_1^2 x_2 + x_1 x_2^2 - x_2^2 x_3 + x_1 x_2 x_3 - x_1 - x_2 - x_3 = 0,$$
$$F_2(x_1, x_2, x_3) = 0.5 x_1^3 + 0.5 x_2^3 + 0.5 x_3^3 - x_1^2 x_2 + x_1 x_2^2 - x_2^2 x_3 + x_1 x_2 x_3 - 0.5 x_1^2 - 0.5 x_2^2 - 0.5 x_3^2 - 0.5 x_1 - 0.5 x_2 - 0.5 x_3 + 1.5 = 0,$$
$$F_3(x_1, x_2, x_3) = x_1^3 + x_2^3 + x_3^3 + x_1^2 x_2 - x_1 x_2^2 + x_2^2 x_3 - x_1 x_2 x_3 + x_1^2 + x_2^2 + x_3^2 - x_1 - x_2 - x_3 - 3 = 0.$$

The solution of this system by means of symbolic packages such as Mathematica reveals the existence of 8 discrete roots with the values

1. (x1, x2, x3) = (−1, −1, −1)
2. (x1, x2, x3) = (−1, −1, +1)
3. (x1, x2, x3) = (−1, +1, −1)
4. (x1, x2, x3) = (+1, −1, +1)
5. (x1, x2, x3) = (+1, +1, −1)
6. (x1, x2, x3) = (+1, +1, +1)
7. (x1, x2, x3) = (−1.224744, −2E−8, 1.224744)
8. (x1, x2, x3) = (1.224744, 2E−8, −1.224744)
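These roots can be verified by direct substitution; a short Python check (our own, with the three polynomials typed in verbatim from the definitions above) is:

```python
def F1(x1, x2, x3):
    return (x1**3 + x2**3 + x3**3 - x1**2*x2 + x1*x2**2 - x2**2*x3
            + x1*x2*x3 - x1 - x2 - x3)

def F2(x1, x2, x3):
    return (0.5*x1**3 + 0.5*x2**3 + 0.5*x3**3 - x1**2*x2 + x1*x2**2 - x2**2*x3
            + x1*x2*x3 - 0.5*x1**2 - 0.5*x2**2 - 0.5*x3**2
            - 0.5*x1 - 0.5*x2 - 0.5*x3 + 1.5)

def F3(x1, x2, x3):
    return (x1**3 + x2**3 + x3**3 + x1**2*x2 - x1*x2**2 + x2**2*x3
            - x1*x2*x3 + x1**2 + x2**2 + x3**2 - x1 - x2 - x3 - 3)

for root in [(-1, -1, -1), (1, 1, 1), (-1.224744, 0.0, 1.224744)]:
    # residuals vanish up to the 6-decimal rounding of the printed roots
    print(root, F1(*root), F2(*root), F3(*root))
```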

To solve the above system, the neural model was run many times with different initial conditions in the intervals −3 ≤ x1 ≤ 2.5, −3 ≤ x2 ≤ 1.4, and −3 ≤ x3 ≤ 0.5 with a variation step Δx1 = Δx2 = Δx3 = 0.1. In total, the model was run 207375 times to identify the basin of attraction of each root and to calculate the percentage of the initial conditions associated with each root.

The operation of the neural network was simulated in MATLAB, and the components x1, x2 and x3 of each root were estimated by recurrently applying the last equation of the previous section. The learning rate for the back propagation algorithm was β = 0.005. Although the speed of convergence was in most cases very fast, the maximum number of iterations was set to N = 10000; after 10000 iterations the system was characterized as nonconvergent if it had been unable to reach a root, and the process moved to the next point of initial conditions. For each such point, the network either reached one of the eight system roots (it is important to note that the network was able to identify all roots) or it was unable to estimate one of those roots. The basins of attraction of the eight roots, as well as the number of points in each basin, are depicted graphically in Figs. 3–6. Since these figures are three dimensional, the plots show the projections x1 − x2, x1 − x3 and x2 − x3. Even though MATLAB supports the construction of 3D plots, these 2D graphs are more suitable for viewing and they can very easily be extended to three dimensions.

The basin of attraction of a root is defined by the set of parameter space points (x1, x2, x3) – namely, the values of the initial conditions – for which the neural network converged to that root. The ratio of the number of these basin points to the total number of tested initial condition points, multiplied by 100, gives the percentage defined above. For each one of the estimated roots, the best simulation run with respect to the minimal iteration number was identified.

The experimental results associated with this system are shown in Tables 2 and 3. Table 2 contains, for each one of the identified roots, the number of runs that converged to that root, the average number of iterations and the mean square error, the standard deviation of those values, and the average values of the x1, x2 and x3 components of the root. On the other hand, the data of Table 3 are associated with the best run for each root, with the criterion for the best run being the minimum number of iterations. The values shown for each best run are the initial conditions $(x_1^0, x_2^0, x_3^0)$, the number of iterations, and the values of the functions F1(x1, x2, x3), F2(x1, x2, x3) and F3(x1, x2, x3). For all best runs the MSE value was of the order of 10^−16.
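The basin-of-attraction scan itself is a triple loop over the grid of initial conditions, with each run classified by the root it ends near; a condensed Python sketch (our own, reusing the hypothetical `neural_solve`, `F1`, `F2`, `F3` helpers of the earlier sketches; only the grid bounds and step come from the text, and the classification tolerance is an assumption) is:

```python
import numpy as np

ROOTS = np.array([(-1, -1, -1), (-1, -1, 1), (-1, 1, -1), (1, -1, 1),
                  (1, 1, -1), (1, 1, 1),
                  (-1.224744, 0.0, 1.224744), (1.224744, 0.0, -1.224744)])

def classify(x, tol=1e-3):
    """Index (0-7) of the root a run ended at, or -1 for nonconverging behavior."""
    dist = np.linalg.norm(ROOTS - x, axis=1)
    return int(np.argmin(dist)) if dist.min() < tol else -1

def F_system1(x):
    x1, x2, x3 = x
    return np.array([F1(x1, x2, x3), F2(x1, x2, x3), F3(x1, x2, x3)])

counts = {}
for x0 in np.arange(-3.0, 2.5 + 1e-9, 0.1):
    for y0 in np.arange(-3.0, 1.4 + 1e-9, 0.1):
        for z0 in np.arange(-3.0, 0.5 + 1e-9, 0.1):
            x, _ = neural_solve(F_system1, (x0, y0, z0), beta=0.005, max_iter=10_000)
            idx = classify(x)
            counts[idx] = counts.get(idx, 0) + 1
print(counts)   # basin sizes per root index; -1 counts the nonconvergent runs
```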

    Fig. 3. Number of basin points for each one of the roots of System 1 and the nonconverging behavior, and the associated pie chart.

Fig. 4. The basin of attraction for the roots 1, 2 and 3 of System 1 in the intervals defined above. For each root, the projections x1 − x2, x1 − x3, and x2 − x3 are shown in the top, middle, and bottom figure, respectively.

Fig. 5. The basin of attraction for the roots 4, 5 and 6 of System 1 in the intervals defined above. For each root, the projections x1 − x2, x1 − x3, and x2 − x3 are shown in the top, middle, and bottom figure, respectively.

Fig. 6. The basin of attraction for the roots 7 and 8 of System 1 in the intervals defined above. For each root, the projections x1 − x2, x1 − x3, and x2 − x3 are shown in the top, middle, and bottom figure, respectively.

The variation of the iteration number with the learning rate for the first example system is shown in Fig. 7. Due to the large variation of the iteration number over the eight roots of the system, the vertical axis is plotted on a logarithmic scale, while the horizontal axis remains linear. From this figure it is clear that in most cases the number of iterations decreases as the learning rate increases, a fact that is of course expected. However, it has to be mentioned that for greater values of the learning rate the neural network may be unable to converge to some roots, and therefore the allowable learning rate values are much smaller than the typical values of 0.5–0.7 associated with the classical back propagation algorithm. These relatively small values of the learning rate lead to a relatively large number of iterations, compared with the iteration numbers associated with other methods, as pointed out in the concluding section.

    Example 2. The second system we present has four real roots and it consists of the following equations:

$$F_1(x_1, x_2, x_3) = x_1^3 + x_2^3 + x_3^3 - 3x_1^2 - 2x_2^2 - 2x_3^2 + 2x_1 + x_2 + x_3 = 0,$$
$$F_2(x_1, x_2, x_3) = x_1^3 + x_2^3 + x_3^3 - x_1^2 - 5x_2^2 - x_3^2 + 8x_2 - 4 = 0,$$
$$F_3(x_1, x_2, x_3) = x_1^3 + x_2^3 + x_3^3 - 4x_1^2 - 4x_2^2 - 5x_3^2 + 4x_1 + 5x_2 + 8x_3 - 6 = 0.$$

The solution of this system with traditional methods reveals the existence of four roots with the values

1. (x1, x2, x3) = (1.428042, 0.384132, 1.383866)
2. (x1, x2, x3) = (1.107007, 1.008547, 0.568966)
3. (x1, x2, x3) = (−0.020060, 0.856390, 1.143860)
4. (x1, x2, x3) = (0, 1, 1)

Table 2. The experimental results for the eight roots of the first algebraic system.

Root no. | AVG iteration | STDEV iteration | AVG MSE | STDEV MSE
1 | 01451 | 04383.449 | 9.33 × 10^−17 | 2.58 × 10^−17
2 | 19804 | 17287.220 | 2.66 × 10^−15 | 1.24 × 10^−13
3 | 19688 | 17374.550 | 3.67 × 10^−15 | 1.53 × 10^−13
4 | 24297 | 16527.800 | 3.81 × 10^−15 | 1.29 × 10^−13
5 | 23075 | 15753.260 | 3.71 × 10^−15 | 1.20 × 10^−13
6 | 3193 | 10744.260 | 8.89 × 10^−17 | 3.23 × 10^−17
7 | 19162 | 16127.700 | 3.97 × 10^−15 | 1.52 × 10^−13
8 | 19121 | 16205.900 | 3.50 × 10^−15 | 1.56 × 10^−13

Root no. | AVG x1 | AVG x2 | AVG x3
1 | −1 | −1 | −1
2 | −1 | −1 | +1
3 | −1 | +1 | −1
4 | +1 | −1 | +1
5 | +1 | +1 | −1
6 | +1 | +1 | +1
7 | −1.224744 | 4 × 10^−9 | 1.224744
8 | 1.224744 | 8 × 10^−9 | −1.224744

Table 3. The results for the best run for each one of the eight roots of the first algebraic system.

Root | x1^0 | x2^0 | x3^0 | Iterations | F1 | F2 | F3
1 | −1.1 | −1.1 | −1.1 | 0138 | −10^−8 | −10^−8 | 0
2 | −1.0 | −0.5 | +1.1 | 6224 | −10^−8 | −10^−8 | 0
3 | −1.1 | +0.7 | −1.3 | 6188 | −10^−8 | −10^−8 | 0
4 | +0.5 | −1.4 | +0.7 | 7357 | +10^−8 | −10^−8 | 0
5 | +1.0 | +0.7 | −1.7 | 7045 | −10^−8 | +10^−8 | 0
6 | +1.1 | +1.1 | +1.1 | 0054 | 0 | +10^−8 | 0
7 | −1.2 | +0.1 | +1.7 | 5434 | +10^−8 | −10^−8 | 0
8 | +0.6 | −0.3 | −1.2 | 5615 | +10^−8 | −10^−8 | 0

    Fig. 7. The variation of the iteration number as a function of the learning rate for the first example system.



Fig. 8. Number of basin points for each one of the roots of System 2 and the nonconverging behavior, and the associated pie chart.

Fig. 9. The basin of attraction for the roots 1 and 2 of System 2 in the intervals defined above. For each root, the projections x1 − x2, x1 − x3, and x2 − x3 are shown in the top, middle, and bottom figure, respectively.

Fig. 10. The basin of attraction for the roots 3 and 4 of System 2 in the intervals defined above. For each root, the projections x1 − x2, x1 − x3, and x2 − x3 are shown in the top, middle, and bottom figure, respectively.

Table 4. The experimental results for the four roots of the second algebraic system.

Root no. | AVG iteration | STDEV iteration | AVG MSE | STDEV MSE | AVG x1 | AVG x2 | AVG x3
1 | 05755 | 16561.010 | 10^−16 | 3.82 × 10^−32 | 1.428042 | 0.384133 | 1.383867
2 | 06778 | 15292.079 | 10^−16 | 5.71 × 10^−30 | 1.070070 | 1.008547 | 0.568966
3 | 09366 | 15683.740 | 1.03 × 10^−16 | 9.22 × 10^−17 | −0.020060 | 0.856387 | 1.143864
4 | 15433 | 20414.910 | 1.01 × 10^−16 | 4.45 × 10^−17 | 9.52 × 10^−9 | 1 | 1


As in the previous case, the neural model solved the system for many different initial conditions in order to identify all four roots and estimate the basin of attraction of each one of them. More specifically, the parameters x1, x2 and x3 were varied in the interval −3 ≤ xi ≤ 3 with a variation step Δxi = 0.2 (i = 1, 2, 3). The learning rate value was β = 0.05 and the maximum number of iterations was set to N = 100,000. The percentages of the basin points for the four roots and the nonconverging behavior are shown in Fig. 8, while the projections x1 − x2, x1 − x3 and x2 − x3 are plotted in Figs. 9 and 10. Tables 4 and 5 are similar to Tables 2 and 3, respectively, and contain the data of the statistical analysis applied to the second system.

Table 5. The results for the best run for each one of the four roots of the second algebraic system.

Root | x1^0 | x2^0 | x3^0 | Iterations | F1 | F2 | F3
1 | +1.2 | 1 | +1.6 | 0304 | −2 × 10^−8 | 0 | 0
2 | +1.4 | 1 | −0.2 | 0734 | −2 × 10^−8 | −10^−8 | 0
3 | +0.2 | 0 | +0.8 | 1992 | +2 × 10^−8 | 0 | −10^−8
4 | −0.8 | 1 | +1.0 | 1930 | +2 × 10^−8 | 0 | −10^−8

    Fig. 11. The variation of the iteration number as a function of the learning rate for the second example system.

Table 6. Simulation results for the proposed neural method, the trust-region-dogleg and the trust-region-reflective algorithms. In the results associated with the neural method, the last column gives the learning rate of the run that produced the presented results; for the other algorithms, the F columns are expressed as F × 10^−a with the exponent a given in the last column.

Neural based nonlinear system solver
Root | x1 | x2 | x3 | Iterations | F1 | F2 | F3 | LRate
1 | −1.000000 | −1.000000 | −1.000000 | 0006 | −10^−8 | −10^−8 | 10^−8 | 0.040
2 | −1.000000 | −1.000000 | +1.000000 | 0797 | 10^−8 | −10^−8 | −10^−8 | 0.050
3 | −1.000000 | +0.999999 | −0.999999 | 0886 | −10^−8 | −10^−8 | 0 | 0.050
4 | +0.999999 | −0.999999 | +1.000000 | 1397 | −10^−8 | +10^−8 | 0 | 0.045
5 | +1.000000 | +0.999999 | −0.999999 | 1602 | −10^−8 | +10^−8 | 0 | 0.045
6 | +1.000000 | +1.000000 | +1.000000 | 0010 | +10^−8 | 0 | +10^−8 | 0.015
7 | −1.224744 | −0.000000 | +1.224744 | 1524 | −10^−8 | +10^−8 | 0 | 0.030
8 | +1.224744 | −0.000000 | −1.224744 | 0754 | −10^−8 | −10^−8 | −10^−8 | 0.030

Trust-region-dogleg algorithm
Root | x1 | x2 | x3 | Iterations | F1 × 10^−a | F2 × 10^−a | F3 × 10^−a | a
1 | −0.999982 | −0.999999 | −1.000017 | 15 | −0.123602 | −0.062667 | −0.180998 | 08
2 | −0.999999 | −1.000000 | +0.999999 | 07 | −0.105471 | −0.071054 | −0.004440 | 13
3 | −1.000000 | +1.000000 | −0.999999 | 05 | −0.000000 | +0.666133 | +0.444089 | 15
4 | +1.000000 | −0.999999 | +1.000000 | 05 | +0.174749 | +0.013256 | +0.373923 | 11
5 | +0.999999 | +0.999999 | −1.000000 | 05 | −0.654587 | −0.462963 | −0.385114 | 11
6 | +0.999982 | +0.999999 | +1.000023 | 15 | +0.298266 | +0.122312 | +0.317619 | 08
7 | −1.224744 | −0.000000 | +1.224744 | 05 | +0.888178 | +0.444089 | −0.888178 | 15
8 | +1.224744 | −0.000000 | −1.224744 | 05 | −0.103483 | −0.068491 | −0.069544 | 10

Trust-region-reflective algorithm
Root | x1 | x2 | x3 | Iterations | F1 × 10^−a | F2 × 10^−a | F3 × 10^−a | a
1 | −0.998635 | −1.001183 | −1.000180 | 12 | −0.998635 | −1.001183 | −1.000180 | 04
2 | −1.000000 | −1.000000 | +1.000000 | 06 | +0.222044 | +0.444089 | −0.000000 | 15
3 | −1.000000 | +1.000000 | −1.000000 | 05 | +0.444089 | +0.444089 | +0.000000 | 15
4 | +1.000000 | −0.999999 | +1.000000 | 05 | +0.179523 | +0.005684 | +0.357758 | 11
5 | +0.999999 | +0.999999 | −1.000000 | 05 | −0.648014 | −0.450794 | −0.383248 | 11
6 | +0.999999 | +1.000000 | +1.000000 | 07 | +0.088817 | +0.000000 | +0.222044 | 14
7 | −1.224744 | −0.000000 | +1.224744 | 05 | +0.000000 | +0.222044 | +0.440089 | 15
8 | +1.224744 | −0.000000 | −1.224744 | 07 | +0.932587 | +0.999200 | −0.799360 | 14


The variation of the iteration number with the learning rate for the second example system is shown in Fig. 11, and it is characterized by the same features as in the case of the first example system discussed above. The variation with the learning rate can be seen more clearly in Fig. 11 than in Fig. 7, since the curves are plotted on lin–lin axes rather than the lin–log axes of Fig. 7, which were needed there because of the large spread of the iteration numbers over the available roots.

Table 7. Simulation results for the Levenberg–Marquardt algorithm and the multivariate Newton–Raphson method with partial derivatives. The λ parameter of the Levenberg–Marquardt algorithm has the default value λ = 0.01.

Levenberg–Marquardt algorithm
Root | x1 | x2 | x3 | Iterations | F1 × 10^−a | F2 × 10^−a | F3 × 10^−a | a
1 | −1.000000 | −0.999999 | −0.999999 | 04 | −0.024558 | +0.007482 | −0.112176 | 11
2 | −1.000000 | −1.000000 | +1.000000 | 07 | +0.341948 | −0.421884 | −0.133226 | 13
3 | −1.000000 | +1.000000 | −1.000000 | 05 | +0.177635 | −0.444089 | −0.266453 | 14
4 | +0.999999 | −0.999999 | +1.000000 | 05 | +0.236477 | +0.190514 | +0.607514 | 12
5 | +0.999999 | +1.000000 | −1.000000 | 05 | −0.515010 | −0.675504 | −0.469135 | 11
6 | +0.999982 | +1.000000 | +1.000000 | 04 | +0.113242 | −0.008881 | +0.275335 | 13
7 | −1.224744 | −0.000000 | +1.224744 | 05 | +0.888178 | +0.444089 | −0.000000 | 15
8 | +1.224744 | −0.000000 | −1.224744 | 10 | −0.187116 | +0.235652 | +0.073100 | 09

Multivariate Newton–Raphson method with partial derivatives
Root | x1 | x2 | x3 | Iterations | F1 × 10^−a | F2 × 10^−a | F3 × 10^−a | a
1 | −1.000000 | −1.000000 | −1.000000 | 2865 | −0.444089 | −0.222044 | −0.444089 | 15
2 | +1.000000 | −1.000000 | +1.000000 | 0009 | −0.444089 | +0.000000 | +0.888178 | 15
3 | −1.000000 | +1.000000 | −1.000000 | 0005 | +0.000000 | +0.000000 | +0.000000 | 00
4 | +1.000000 | −1.000000 | +1.000000 | 0006 | −0.044408 | −0.044408 | +0.177635 | 14
5 | +1.000000 | +1.000000 | −1.000000 | 0006 | −0.111022 | +0.000000 | +0.000000 | 15
6 | +1.000000 | −1.000000 | +1.000000 | 0661 | +0.000000 | +0.000000 | +0.000000 | 00
7 | −1.224744 | −0.000000 | +1.224744 | 0005 | −0.155431 | −0.128785 | −0.479616 | 13
8 | −0.999999 | −0.999999 | −1.000000 | 0045 | −0.888178 | +0.666133 | −0.888178 | 15


After the presentation of the experimental results that demonstrate the application of the proposed neural solver, let us now proceed to a comparison between the proposed method and other well known methods with respect to their performance. Since the proposed algorithm is an iterative one, we chose for this comparison well known iterative numerical algorithms, namely the trust-region-dogleg [3], the trust-region-reflective [3] and the Levenberg–Marquardt [15,20] algorithms, as well as the multivariate Newton–Raphson method with partial derivatives. The first three algorithms are supported by the fsolve function of the Optimization Toolbox of the MATLAB programming environment, while a Newton–Raphson MATLAB implementation can be found in the literature – for a theoretical description of this algorithm see, for example, [28]. These algorithms were used to solve the problem using the initial conditions (x0, y0, z0) of Table 3 associated with the best runs (to save space, this comparison is restricted to the first example system with eight roots, since its extension to the second example system is straightforward).
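The comparison in the paper relies on MATLAB's fsolve and a Newton–Raphson implementation. For readers working outside MATLAB, a rough (not equivalent) cross-check can be sketched with SciPy, whose root finder offers a Levenberg–Marquardt option; SciPy has no exact counterpart of the trust-region-dogleg and trust-region-reflective options, so the sketch below only illustrates the kind of comparison performed and reuses the F1, F2, F3 helpers defined earlier.

```python
import numpy as np
from scipy.optimize import root

def system1(v):
    x1, x2, x3 = v
    return [F1(x1, x2, x3), F2(x1, x2, x3), F3(x1, x2, x3)]   # polynomials of Example 1

# best-run initial conditions of Table 3, one per root
X0 = [(-1.1, -1.1, -1.1), (-1.0, -0.5, 1.1), (-1.1, 0.7, -1.3), (0.5, -1.4, 0.7),
      (1.0, 0.7, -1.7), (1.1, 1.1, 1.1), (-1.2, 0.1, 1.7), (0.6, -0.3, -1.2)]

for x0 in X0:
    sol = root(system1, x0, method='lm')       # Levenberg-Marquardt, cf. Table 7
    print(np.round(sol.x, 6), sol.success)
```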

The simulation results of these comparisons can be found in Tables 6 and 7. Table 6 contains the results of the proposed neural solver as well as of the trust-region-dogleg and trust-region-reflective algorithms, while Table 7 contains the results of the Levenberg–Marquardt and Newton–Raphson methods. Regarding the results associated with the proposed algorithm, the last column contains the learning rate value that gave the associated iteration number for the specified initial conditions, while for the other methods this column contains the order of magnitude of the function values at the position of the estimated root.

From Tables 6 and 7 it can easily be seen that even though the neural based approach gave the correct results with the same accuracy, it requires many more iterations than the other methods, due to the small learning rate values that allow the network to converge (see the discussion above). Of course, this situation can be improved to some extent by adjusting the learning rate appropriately, as can be seen by comparing the numbers of iterations in Table 3 (a fixed learning rate value β = 0.005) and Table 6 (the optimum learning rate identified via experimentation). In some cases (namely for some methods and some roots) the neural method gave more accurate results than the other methods. From the last row of Table 7 it can be seen that the Newton–Raphson method converged to Root 1 instead of Root 8, possibly because the associated initial condition is located near the boundary of the basins of attraction of these two roots. Besides this disadvantage of the neural based approach (which is not actually a major problem, since modern computing systems are characterized by very high speeds and in most cases the duration of the simulation is not an issue), the proposed method is easy to implement, since it uses the classical back propagation approach and therefore computes the function values in the forward pass and the values of the partial derivatives via the estimation of the δ parameters. The other methods are more difficult to implement, since they require a lot of mathematics as well as complicated operations such as Jacobian evaluations at specified points.

    8. Conclusions and future work

The objective of this research was the numerical estimation of the roots of complete 3 × 3 nonlinear algebraic systems of polynomial equations using back propagation neural networks. The main advantage of this approach is the simple and straightforward solution of the system, achieved by building a structure that simulates exactly the nonlinear system under consideration and finds its roots via the classical back propagation approach. Depending on the position in the parameter space of the initial condition used in each case, each run converged to one of the eight possible solutions or diverged to infinity; therefore, the proposed neural structure is capable of finding all the roots of such a system, although this cannot be done in only one step. One of the tasks associated with future work on this subject is to improve or redesign the neural solver in order to find all roots in only one run.

This research is going to be extended to cover systems with more dimensions and different types of equations of non-polynomial nature. Special cases have to be investigated, such as the case of double roots (and, in general, of roots with arbitrary multiplicity), and the associated theory has to be formulated. Finally, the structure can be extended to cover the case of complex roots and complex coefficients in nonlinear algebraic systems.

    References

[1] A. Galantai, A. Jeney, Quasi-Newton ABS methods for solving nonlinear algebraic systems of equations, J. Optim. Theory Appl. 89 (3) (1996) 561–573.
[2] C. Jager, D. Ratz, A combined method for enclosing all solutions of nonlinear systems of polynomial equations, Reliab. Comput. 1 (1) (1995) 41–64.
[3] A.R. Conn, N.I.M. Gould, P.L. Toint, Trust-Region Methods, MPS/SIAM Series on Optimization, SIAM, 2000.
[4] E. Spedicato, E. Bodon, et al., ABS methods and ABSPACK for linear systems and optimization: a review, in: Proceedings of the Third Seminar of Numerical Analysis, Zahedan, November 15–17, 2000.
[5] E. Spedicato, Z. Huang, Numerical experience with Newton-like methods for nonlinear algebraic systems, Computing 58 (1997) 69–89.
[6] C. Grosan, A. Abraham, Solving nonlinear equation systems using evolutionary algorithms, IEEE Trans. Syst. Man Cybern. A 38 (3) (2008).
[7] I.Z. Emiris, B. Mourrain, M.N. Vrahatis, Sign methods for counting and computing real roots of algebraic systems, Technical Report RR-3669, INRIA, Sophia Antipolis, 1999.
[8] J. Abaffy, A. Galantai, E. Spedicato, The local convergence of ABS methods for nonlinear algebraic equations, Numer. Math. 51 (1987) 429–439.
[9] J. Abaffy, A. Galantai, Conjugate direction methods for linear and nonlinear systems of algebraic equations, in: P. Rozsa, D. Greenspan (Eds.), Numerical Methods, Colloquia Mathematica Soc., Amsterdam, 1987, pp. 481–502.
[10] J. Abaffy, C.G. Broyden, E. Spedicato, A class of direct methods for linear systems, Numer. Math. 45 (1984) 361–376.
[11] J. Abaffy, E. Spedicato, ABS Projection Algorithms: Mathematical Techniques for Linear and Nonlinear Equations, Ellis Horwood, 1989.
[12] Jiwei Xue, Yaohui Zi, Z. Lin, et al., An intelligent hybrid algorithm for solving nonlinear polynomial systems, in: Proceedings of the International Conference on Computational Science, LNCS 3037, 2004, pp. 26–33.
[13] K.G. Margaritis, M. Adamopoulos, et al., Artificial neural networks and iterative linear algebra methods, Parallel Algorithms Appl. 3 (1) (1994) 31–44.
[14] K.G. Margaritis, M. Adamopoulos, et al., Solving linear systems by artificial neural network energy minimisation, University of Macedonia Annals XII (1993) 502–525.
[15] K. Levenberg, A method for the solution of certain problems in least squares, Quart. Appl. Math. 2 (1944) 164–168.
[16] G. Li, Z. Zeng, A neural network algorithm for solving nonlinear equation systems, in: Proceedings of the 20th International Conference on Computational Intelligence and Security, Los Alamitos, USA, 2008, pp. 20–23.
[17] D. Luo, Z. Han, Solving nonlinear equation systems by neural networks, Proc. Int. Conf. Syst. Man Cybern. 1 (1995) 858–862.
[18] A. Margaris, M. Adamopoulos, Solving nonlinear algebraic systems using artificial neural networks, in: Proceedings of the 10th International Conference on Engineering Applications of Artificial Neural Networks, Thessaloniki, Greece, 2007, pp. 107–120.
[19] A. Margaris, K. Goulianas, Finding all roots of 2 × 2 nonlinear algebraic systems using back-propagation neural networks, Neural Comput. Appl. 21 (2012) 891–904.
[20] D. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, SIAM J. Appl. Math. 11 (1963) 431–441.
[21] K. Mathia, R. Saeks, Solving nonlinear equations using recurrent neural networks, in: World Congress on Neural Networks, Washington DC, USA, 1995.
[22] P. Mejzlik, A bisection method to find all solutions of a system of nonlinear equations, in: Proceedings of the Seventh International Conference on Domain Decomposition, The Pennsylvania State University, October 27–30, 1993, pp. 277–282.
[23] D. Mishra, P.K. Kalra, Modified Hopfield neural network approach for solving nonlinear algebraic equations, Eng. Lett. 14 (1) (2007).
[24] M.W. Smiley, C. Chun, An algorithm for finding all solutions of a nonlinear system, J. Comput. Appl. Math. 137 (2) (2001) 293–315.
[25] T.-F. Tsai, M.-H. Lin, Finding all solutions of systems of nonlinear equations with free variables, Eng. Optim. 39 (6) (2007) 649–659.
[26] V. Dolotin, A. Morozov, Introduction to Nonlinear Algebra, World Scientific Publishing Company, 2007.
[27] V.N. Kublanovskaya, V.N. Simonova, An approach to solving nonlinear algebraic systems 2, J. Math. Sci. (3) (1996) 1077–1092.
[28] W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge University Press, 1992.
[29] Y. Nakaya, S. Oishi, Find all solutions of nonlinear systems of equations using linear programming with guaranteed accuracies, J. Univ. Comput. Sci. 4 (2) (1998) 171–177.
[30] G. Zhang, L. Bi, Existence of solutions for a nonlinear algebraic system, Discrete Dyn. Nat. Soc. (2009).
