
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 5, NO. 6, NOVEMBER 1994

Simplified Neural Networks for Solving Linear Least Squares and Total Least Squares Problems in Real Time

Andrzej Cichocki and Rolf Unbehauen, Fellow, IEEE

Abstract—In this paper a new class of simplified low-cost analog artificial neural networks with on-chip adaptive learning algorithms is proposed for solving linear systems of algebraic equations in real time. The proposed learning algorithms for linear least squares (LS), total least squares (TLS) and data least squares (DLS) problems can be considered as modifications and extensions of well-known algorithms: the row-action projection (Kaczmarz) algorithm [25] and/or the LMS (Adaline) Widrow-Hoff algorithms [21]. The algorithms can be applied to any problem which can be formulated as a linear regression problem. The correctness and high performance of the proposed neural networks are illustrated by extensive computer simulation results.

I. INTRODUCTION

MANY problems in science and technology involve solving a set of linear equations

$$A\mathbf{x} \approx \mathbf{b} \qquad (1)$$

where A is the m × n data matrix and b is the m-dimensional observation vector. Usually, A and b are perturbed versions of the exact but unobservable data A₀ and b₀, respectively, i.e., the exact relation is described by the matrix equation

$$A_0\,\mathbf{x} = \mathbf{b}_0. \qquad (2)$$

Such a description arises in a broad class of scientific and engineering problems. For example, in image and signal restoration the matrix A is the degradation matrix, in deconvolution problems it is the convolution matrix, and in wave propagation problems in electromagnetics and acoustics it is the propagation matrix. Moreover, every linear parameter estimation problem arising in automatic control, system identification, signal processing, physics, biology and medicine gives rise to an overdetermined set of linear equations (1) [1]-[13], [23], [24]. The system of equations is sometimes underdetermined due to the lack of information, but often it is greatly overdetermined and usually inconsistent (i.e., it is self-contradictory) due to errors, noise and other perturbations.

Manuscript received December 13, 1992; revised July 26, 1993. This work was supported by the Deutsche Forschungsgemeinschaft, Bonn, Germany.

A. Cichocki is with the Lehrstuhl für Allgemeine und Theoretische Elektrotechnik, University of Erlangen-Nürnberg, Erlangen, Germany, on leave from the Warsaw University of Technology, Poland.

R. Unbehauen is with the Lehrstuhl für Allgemeine und Theoretische Elektrotechnik, University of Erlangen-Nürnberg, D-91085 Erlangen, Germany.

IEEE Log Number 92132203.


Generally speaking, in signal processing applications the overdetermined case (m > n) describes filtering, estimation of parameters, enhancement, deconvolution and identification problems, while the underdetermined case (m < n) describes inverse and extrapolation problems [3], [7], [10], [23], [24].

In many applications (e.g., robotics, signal processing, automatic control) an on-line (i.e., real-time) solution of a system of linear equations is desired. For such real-time applications, when the solution is to be obtained within a time of the order of hundreds of nanoseconds, a digital computer often cannot comply with the desired computation time or its use is too expensive. One possible approach for solving such a problem is to employ analog artificial neural networks [8]-[22]. However, known methods and network architectures require at least n artificial neurons (processing units), where n is the number of unknown variables [10]-[13], [16]. In many engineering applications, e.g., in image reconstruction and in computed tomography, it is required to solve very large systems of linear algebraic equations with high speed and throughput rate [24], [26]-[28]. For such problems a known neural network architecture requires an extremely large number of processing units (artificial neurons), so that a practical hardware implementation of the neural network may be difficult, expensive and even impossible.

In this paper we propose a new analog artificial neural network containing only one simplified neuron (a single adaptive processing unit) with an on-chip implemented adaptive learning algorithm.

Many researchers in different scientific disciplines have developed a class of efficient adaptive algorithms for solving systems of linear equations and related problems [1]-[14], [18]-[28]. As early as 1937, Kaczmarz developed a very simple and efficient row-action projection (RAP) algorithm [25]. Because systems of linear equations are used in many diverse disciplines and problems, the Kaczmarz RAP algorithm and its modifications have been independently and repeatedly rediscovered, and in fact it appears under different names in different applications. For example, in the field of medical imaging for computerized tomography it is often called the algebraic reconstruction technique (ART) [28]; in the adaptive filtering literature it is known as the α-LMS algorithm or Widrow-Hoff delta rule, or at least it is very similar to this algorithm [21], [24]-[26], although the notations there are slightly different.

The main objective of this paper is the analog circuit design of a neural network for implementing such adaptive



algorithms. The second objective is to propose some extensions and modifications of the existing adaptive algorithms. The third objective is to demonstrate the validity and high performance of the proposed neural network models by computer simulation experiments.

II. FORMULATION OF THE PROBLEM

In general, without reference to any specific application, the problem can be stated as follows. Let us assume that we want to solve a set of linear algebraic equations written in scalar form as

$$\sum_{j=1}^{n} a_{ij}\,x_j \approx b_i \qquad (i = 1, 2, \ldots, m) \qquad (3a)$$

or in matrix form

$$A\mathbf{x} \approx \mathbf{b}. \qquad (3b)$$

Here, x is the n-dimensional unknown vector, b is the m-dimensional observation vector and A = [a_ij] is the m × n real coefficient matrix with known elements. Note that the number of equations is generally not restricted to n; it can be less than, equal to, or greater than the number of variables, i.e., the components of x. From a practical point of view it is usually required to find a (minimal norm) solution x* if an exact solution exists at all, or to find an approximate solution which comes as close as possible to the original one subject to a suitable optimality criterion. In the ordinary least squares (LS) approach to the problem (3a) or (3b) the measurements in the matrix A are assumed to be free from error and all errors are confined to the observation vector b [3], [10], [11]. In this approach one defines a cost (error) function E(x) as

$$E(\mathbf{x}) = \frac{1}{2}\sum_{i=1}^{m} r_i^2(\mathbf{x}) + \frac{\alpha}{2}\,\mathbf{x}^T\mathbf{x} = \frac{1}{2}(A\mathbf{x} - \mathbf{b})^T(A\mathbf{x} - \mathbf{b}) + \frac{\alpha}{2}\,\mathbf{x}^T\mathbf{x} \qquad (4)$$

where the residuals are given as

$$r_i(\mathbf{x}) = \mathbf{a}_i^T\mathbf{x} - b_i = \sum_{j=1}^{n} a_{ij}\,x_j - b_i$$

and α ≥ 0 is the regularization parameter. The first term is the standard least squares term and it forces the sum of squared residuals to be minimal. The second term is the regularization term, whose purpose is to force a smoothness constraint on the estimated solution x* for ill-conditioned problems. The regularization parameter α determines the relative importance of this term [10]. Using a standard gradient approach for the minimization of the cost function, the problem can be mapped to the system of linear differential equations [10], [11]

$$\frac{d\mathbf{x}}{dt} = -\mu\big[A^T(A\mathbf{x} - \mathbf{b}) + \alpha\,\mathbf{x}\big] \qquad (5a)$$

with any initial conditions x(0) = x⁽⁰⁾ (typically x(0) = 0), where

$$\mu = \mathrm{diag}(\mu_1, \mu_2, \ldots, \mu_n), \qquad \mu_j > 0 \quad \forall j.$$

Note that the direct implementation of the system (5a) requires the use of m + n linear processing units [11]. In fact, the number of processing units can be reduced to n units, since (5a) can be simplified as

$$\frac{d\mathbf{x}}{dt} = -\mu\big[(W\mathbf{x} - \boldsymbol{\theta}) + \alpha\,\mathbf{x}\big] \qquad (5b)$$

where

$$W = A^T A, \qquad \boldsymbol{\theta} = A^T\mathbf{b}$$

but this requires extra precalculations and it is inconvenient for large matrices, especially when the entries a_ij and/or b_i are slowly changing in time (i.e., are time-variable) [10]. In the next section we will show how we can avoid this disadvantage. In other words, the known neural network realizations for solving the linear least squares (LS) problem require [cf. (5a) and (5b)] an excessive number of building blocks (analog multipliers and summers). In the next section we will propose a new approach which enables us to solve the LS problem more efficiently and economically.
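As a point of reference for the circuit realizations discussed in this paper, the gradient system (5a) can be integrated numerically. The following Python/NumPy sketch is not the authors' analog circuit; it is a forward-Euler discretization with an illustrative scalar learning rate, step size, regularization value and data, and it shows the trajectory x(t) settling to the regularized LS solution.

```python
import numpy as np

# Forward-Euler simulation of dx/dt = -mu * (A^T (A x - b) + alpha * x), cf. (5a).
# A, b, mu, alpha, dt and n_steps are illustrative values, not taken from the paper.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))          # overdetermined data matrix (m = 8, n = 3)
b = rng.standard_normal(8)               # observation vector
mu, alpha, dt, n_steps = 1.0, 1e-3, 1e-2, 5000

x = np.zeros(3)                          # x(0) = 0
for _ in range(n_steps):
    x -= dt * mu * (A.T @ (A @ x - b) + alpha * x)

# Compare with the closed-form regularized LS solution (A^T A + alpha I)^{-1} A^T b.
x_ref = np.linalg.solve(A.T @ A + alpha * np.eye(3), A.T @ b)
print(np.allclose(x, x_ref, atol=1e-4))  # True: the gradient flow settles at x*
```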

The ordinary LS problem is optimal only if all errors are confined to the observation vector b and they have a Gaussian distribution. The measurements in the data matrix A are assumed to be free from errors. However, such an assumption is often unrealistic (e.g., in image recognition and computer vision), since sampling errors, modeling errors and instrument errors may imply noise and inaccuracies in the data matrix A [3], [4]. The total least squares (TLS) problem has been devised as a more global and often more reliable fitting method than the standard LS problem for solving an overdetermined set of linear equations when the measurements in b as well as in A are subject to errors [1]-[7]. Especially if the errors are (approximately) uncorrelated with zero mean and equal variance, the TLS solution is more accurate than the LS solution [2]-[5]. Golub and Van Loan were the first to introduce the TLS approach into the field of numerical analysis and they developed an algorithm based on the singular value decomposition (SVD) [1], [2]. Van Huffel and Vandewalle investigated the problem, presented a detailed analysis and extended the analysis to cover the nongeneric TLS case [3]-[5].

While the LS problem confines all errors to the observation vector b, the TLS problem assumes a perturbation of the data matrix A (by a matrix ΔA) and of the observation vector b (by a vector Δb) and tries to compensate for these. The TLS problem can be formulated as the optimization problem to find the vector x*_TLS that minimizes

$$\|\Delta A\|_F^2 + \|\Delta\mathbf{b}\|_2^2 \qquad (6)$$

subject to the equality constraint

$$(A - \Delta A)\,\mathbf{x}^*_{TLS} = \mathbf{b} - \Delta\mathbf{b}$$


where ‖A‖_F denotes the Frobenius norm of A. In other words, the TLS problem seeks to

$$\underset{[A_0,\,\mathbf{b}_0]\,\in\,\mathbb{R}^{m\times(n+1)}}{\text{minimize}}\ \big\|[A,\,\mathbf{b}] - [A_0,\,\mathbf{b}_0]\big\|_F \quad \text{subject to} \quad \mathrm{range}(\mathbf{b}_0) \subset \mathrm{range}(A_0). \qquad (7)$$

The main tool for solving the TLS problem is the singular value decomposition (SVD) of the matrix A and of the extended matrix [A, b]. Let

$$\mathbf{v}_{n+1} = [v_{1,n+1},\,v_{2,n+1},\,\ldots,\,v_{n+1,n+1}]^T$$

be the right singular vector corresponding to the smallest singular value σ_{n+1} of the extended matrix [A, b]. The TLS solution x*_TLS is obtained as [2]

$$\mathbf{x}^*_{TLS} = -\frac{1}{v_{n+1,n+1}}\,[v_{1,n+1},\,v_{2,n+1},\,\ldots,\,v_{n,n+1}]^T. \qquad (8)$$
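For comparison with the adaptive networks developed later in the paper, the closed-form TLS solution (8) can be evaluated directly from the SVD of the extended matrix [A, b]. The short NumPy sketch below assumes the generic case v_{n+1,n+1} ≠ 0 and uses illustrative synthetic data with noise in both A and b.

```python
import numpy as np

def tls_solve(A, b):
    """TLS solution from the right singular vector of [A, b], cf. (8),
    assuming the generic case where the last component is nonzero."""
    Ab = np.column_stack([A, b])
    _, _, Vt = np.linalg.svd(Ab)
    v = Vt[-1]                       # right singular vector for the smallest singular value
    return -v[:-1] / v[-1]

# Illustrative example: exact data plus small perturbations of both A and b.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((20, 2))
x_true = np.array([0.7, 0.3])
A = A0 + 0.01 * rng.standard_normal(A0.shape)
b = A0 @ x_true + 0.01 * rng.standard_normal(20)
print(tls_solve(A, b))               # close to x_true
```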

As the singular value σ_{n+1} goes to zero, the LS and TLS solutions approach each other [3], [4].

It is important to point out that the ordinary LS problem minimizes the sum of the squared residuals

$$E_{LS}(\mathbf{x}) = \sum_{i=1}^{m} r_i^2(\mathbf{x}) = \sum_{i=1}^{m}\Big(\sum_{j=1}^{n} a_{ij}\,x_j - b_i\Big)^2 \qquad (9)$$

while the TLS problem minimizes a sum of weighted squared residuals [3], [4], [9]

$$E_{TLS}(\mathbf{x}) = \sum_{i=1}^{m}\frac{r_i^2(\mathbf{x})}{1 + \mathbf{x}^T\mathbf{x}}. \qquad (10)$$

In other words, the TLS problem can be equivalently formulated as the minimization (with respect to the vector x) of the cost function built from the normalized residual vector

$$\mathbf{r}_{TLS}(\mathbf{x}) = \frac{A\mathbf{x} - \mathbf{b}}{(1 + \mathbf{x}^T\mathbf{x})^{1/2}}. \qquad (11)$$

In comparison with the standard LS approach, obtaining the solution of the TLS problem is generally quite burdensome and very time consuming [3], [9]. This is probably why the TLS approach has not been as widely used as the usual LS approach, although the TLS approach was investigated in robust statistics long ago. In Section IV we will propose a very simple and efficient algorithm (or, more precisely, a class of algorithms) to solve the TLS problem by using only one single artificial neuron.

III. A SIMPLIFIED NEURON FOR THE LEAST SQUARES PROBLEM

A. Development of a Highly Parallel Continuous-Time Learning Algorithm

In the design of an algorithm for neural networks the key step is to construct an appropriate cost (computational energy) function E(x) so that the lowest energy state corresponds to the desired solution x* [10]. The formulation of the cost function enables us to transform the minimization problem into a system of differential equations, on the basis of which we design an appropriate neural network with an associated learning algorithm. For our purpose we have developed the following instantaneous error function

$$e(t) := \mathbf{s}^T(t)\big(A\mathbf{x} - \mathbf{b}\big) = \mathbf{s}^T(t)\,\mathbf{r}(\mathbf{x}) = \sum_{i=1}^{m} s_i(t)\,r_i(\mathbf{x}) \qquad (12)$$

where

$$r_i(\mathbf{x}) = \sum_{j=1}^{n} a_{ij}\,x_j - b_i$$

and

$$\mathbf{s}(t) = [s_1(t),\,s_2(t),\,\ldots,\,s_m(t)]^T$$

is in general a set of zero-mean, stochastically independent (uncorrelated), identically distributed (i.i.d.) excitation signals (e.g., uncorrelated zero-mean noise sources or orthogonal signals). For more explanation see the Appendix.

The actual error e(t) defined by (12) can be written as

$$e(t) = \sum_{i=1}^{m} s_i(t)\Big(\sum_{j=1}^{n} a_{ij}\,x_j - b_i\Big) = \sum_{j=1}^{n}\tilde{a}_j(t)\,x_j - \tilde{b}(t) \qquad (13)$$

where

$$\tilde{a}_j(t) := \sum_{i=1}^{m} a_{ij}\,s_i(t) \quad (j = 1, 2, \ldots, n), \qquad \tilde{b}(t) := \sum_{i=1}^{m} b_i\,s_i(t).$$

For the so-formulated error e(t) we can construct the instantaneous estimate of the energy (cost) function at time t as

$$E(\mathbf{x}, t) = \rho\big(e(t)\big) \qquad (14)$$

where ρ is a suitable convex function called weighting or loss function and its derivative is called activation or influence function [10], [32], [33]. The minimization of the cost (computational energy) function (14) leads to the set of differential equations

$$\frac{dx_j}{dt} = -\mu_j\,\frac{\partial E}{\partial x_j} = -\mu_j\,\Psi\big(e(t)\big)\,\tilde{a}_j(t) \qquad (j = 1, 2, \ldots, n) \qquad (15a)$$

where

$$\Psi(e) = \frac{\partial \rho(e)}{\partial e}.$$


Fig. 1. Simplified artificial neuron with on-chip learning capability for solving systems of linear equations Ax ≈ b in real time.

The system of the above differential equations can be written in the compact matrix form

$$\frac{d\mathbf{x}(t)}{dt} = -\mu\,\Psi\big(e(t)\big)\,\tilde{\mathbf{a}}(t) \qquad (15b)$$

where

$$\mu = \mathrm{diag}(\mu_1, \mu_2, \ldots, \mu_n), \quad \mu_j > 0\ \forall j, \qquad \tilde{\mathbf{a}}(t) = [\tilde{a}_1(t),\,\tilde{a}_2(t),\,\ldots,\,\tilde{a}_n(t)]^T.$$

In fact, the system of differential equations (15) constitutes the basic adaptive learning algorithm of a single artificial neuron (processing unit) shown in Fig. 1.
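A software imitation of the learning rule (15a)/(15b) can be obtained by drawing fresh zero-mean excitation signals s_i at every small time step. The sketch below is a forward-Euler discretization with the quadratic loss ρ(e) = e²/2 (so Ψ(e) = e), α = 0 and illustrative step sizes; it is not a model of the analog hardware, only of the underlying dynamics.

```python
import numpy as np

# Euler simulation of dx/dt = -mu * Psi(e(t)) * a_tilde(t), cf. (15),
# with e(t) = s(t)^T (A x - b) and Psi(e) = e (quadratic loss).
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
x_true = rng.standard_normal(4)
b = A @ x_true                            # consistent system for illustration
mu, dt, n_steps = 1.0, 5e-3, 100000

x = np.zeros(4)
for _ in range(n_steps):
    s = rng.uniform(-1.0, 1.0, size=6)    # zero-mean i.i.d. excitation signals s_i(t)
    a_tilde = A.T @ s                     # preprocessed (modulated) coefficients, cf. (13)
    b_tilde = b @ s
    e = a_tilde @ x - b_tilde             # instantaneous error e(t)
    x -= dt * mu * e * a_tilde
print(np.linalg.norm(x - x_true))         # small residual error: x approaches x_true
```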

There are many possible loss functions ρ(e) which can be employed in the cost function (14) [32], [33]. Taking ρ(e) = e²/2 gives the ordinary least squares problem. In order to reduce the influence of outliers many different functions ρ(e) have been proposed in robust statistics. Here we give only four examples [10], [32], [33]:

1) The absolute value function

$$\rho_1(e) = |e| \qquad (16)$$

2) Huber's function (17)

3) Talwar's function (18)

4) The logistic function

$$\rho_4(e) = \beta^2 \ln\big(\cosh(e/\beta)\big) \qquad (19)$$

where β > 0 is a problem-dependent parameter, called the cut-off parameter. Note that the above loss functions are twice continuously differentiable almost everywhere with a nonnegative second derivative wherever it is defined.

Fig. 2. Robust loss functions ρ(e) and the corresponding activation (influence) functions Ψ(e) = ∂ρ(e)/∂e. (a) Absolute value function. (b) Huber's function. (c) Talwar's function. (d) Logistic function (dashed lines in (b)-(d) show the least squares function ρ(e) = ½e²).

The loss functions and their influence (activation) functions Ψ(e) = ∂ρ/∂e are shown in Fig. 2. It is interesting to note that the activation functions Ψ(e) shown in Fig. 2(b) and (d) have a direct implementation in VLSI CMOS technology, since such activation functions are naturally provided by the saturating characteristic of a summing amplifier (adder); i.e., if we take the power supply voltages as ±β, then the output voltage of the corresponding amplifier is limited to the range from −β to β. The loss functions given by (16)-(19) belong to the class of so-called robust estimators and their aim is to reduce the influence of large instantaneous errors or wild (spiky or impulsive) noise, called outliers, on the parameter estimation. In other words, the purpose of applying such functions is to compress a large actual (instantaneous) error by preventing the absolute value of the error from being greater than the prescribed cut-off parameter β > 0. The practical importance of the robust loss functions for "automated" handling of outliers comes from the very great number of measurements (data) and from real-time requirements which do not allow any visualization in general [10], [32], [33].
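To make the role of the cut-off parameter β concrete, the sketch below implements the Huber and Talwar influence functions Ψ(e) = ∂ρ/∂e in their commonly used form (a plausible reading of (17) and (18); the constants are illustrative) and shows how they clip or reject large errors.

```python
import numpy as np

def psi_huber(e, beta=1.0):
    """Huber influence function: linear for small errors, clipped at +/- beta."""
    return np.clip(e, -beta, beta)

def psi_talwar(e, beta=1.0):
    """Talwar influence function: least squares inside the cut-off, zero outside."""
    return np.where(np.abs(e) <= beta, e, 0.0)

e = np.array([-5.0, -0.5, 0.2, 3.0])
print(psi_huber(e))    # [-1.  -0.5  0.2  1. ]  large errors are clipped
print(psi_talwar(e))   # [ 0.  -0.5  0.2  0. ]  outliers are rejected entirely
```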


Fig. 3. (a) Block diagram illustrating the implementation of the continuous-time learning algorithm (21) for solving the regularized LS problem. (b) Exemplary realization of m uncorrelated pseudo-random noise sources.

A detailed discussion of the application of the various possible loss functions in artificial neural network implementations is beyond the scope of this paper [10]. We will concentrate here rather on the standard regularized least squares (LS) problem, formulated as follows:

Find the vector x*_LS which minimizes the cost function

$$E(\mathbf{x}, t) = \frac{1}{2}\,e^2(t) + \frac{\alpha}{2}\,\mathbf{x}^T\mathbf{x} \qquad (20)$$

where α ≥ 0 and e(t) is given by (13).

The minimization of the cost function according to the gradient descent rule leads to the learning algorithm (cf. (15a) and (15b))

$$\frac{d\mathbf{x}(t)}{dt} = -\mu\big[e(t)\,\tilde{\mathbf{a}}(t) + \alpha\,\mathbf{x}(t)\big]. \qquad (21)$$

B. Neural Network Implementations

An implementation of the learning algorithm (21) using a single artificial neuron (cf. Fig. 1) is illustrated in Fig. 3(a). The network consists of analog integrators, summers and analog multipliers. The network is driven by the independent source signals s_i(t) multiplied by the incoming data a_ij, b_i (i = 1, 2, …, m; j = 1, 2, …, n). The artificial neuron (processing unit) with an on-chip adaptive learning algorithm shown in Fig. 3(a) allows processing of the input information (contained in the available input data a_ij, b_i) fully simultaneously, i.e., all m equations are acted upon simultaneously in time. This is an important feature of the proposed neural network. The network can be excited by zero-mean, random, high-frequency, uncorrelated, identically distributed signals s_i(t) (i = 1, 2, …, m). If only one noise generator (or a pseudo-random generator) is available, then in order to generate m uncorrelated signals s_i(t) a chain of unit delays can be employed as shown in Fig. 3(b). It should be noted that for zero-mean i.i.d. excitation signals s_i(t) all the preprocessed (modulated) signals ã_j(t) and b̃(t) are also zero-mean signals (cf. Appendix).

In the VLSI implementation of the network the rather expensive analog multipliers can be replaced by simple CMOS switches S₁, S₂, …, Sₘ (or sign reversers [35]), as shown in Fig. 4(a).

Various different strategies for controlling the switches can be chosen. In the simplest strategy the switches can be controlled by a multiphase clock, i.e., the switches are closed and opened cyclically. In this way the network processes the set of equations in a cyclical order, i.e., in each clock phase only one equation is active. However, the network proposed in Fig. 4(a) is very flexible and it allows, in general, the simultaneous processing of all equations or of groups of equations called blocks. In general, these blocks need not be fixed but rather may vary during the optimization (learning) process, i.e., the number of blocks, their sizes and the assignment of equations to blocks may all vary in time [10], [28]. As demonstrated by extensive computer simulation experiments, fully simultaneous processing of all equations usually shows a better initial performance and a higher convergence rate than "row-action" processing, in which at each single iterative step only one equation is processed (cf. Section VI, Example 3).


Fig. 4. (a) Simplified realization of the learning algorithm (21) for solving the regularized LS problem (the analog four-quadrant multipliers are replaced by simple CMOS switches). (b) Exemplary realization of a digital circuit creating multiple pseudo-random bit streams for controlling the switches [34].

In order to perform a fully simultaneous processing of all equations, the switches S₁, S₂, …, Sₘ should be controlled by a digital generator producing multiple, random, uncorrelated bit streams. As such a generator, a simple linear feedback shift register can be used which is able to generate multiple, mutually shifted pseudo-random bit streams with good noise-like properties [34] (see Fig. 4(b)). However, it should be noted that, in order to ensure that all preprocessed (modulated) signals ã_j(t) and b̃(t) are zero-mean signals, it is necessary to subtract appropriate constants (local mean values) as illustrated in Fig. 4(a).¹ In such a case the preprocessed signals can be expressed as

$$\tilde{a}_j(t) = \sum_{i=1}^{m} a_{ij}\,\tilde{s}_i(t) \qquad (j = 1, 2, \ldots, n) \qquad (22a)$$

and

$$\tilde{b}(t) = \sum_{i=1}^{m} b_i\,\tilde{s}_i(t) \qquad (22b)$$

where

$$\tilde{s}_i(t) := s_i(t) - \gamma$$

and the signals s_i(t) can take only one of the two discrete (binary) values 0 or 1. The value of the parameter γ depends on the strategy employed for controlling the switches. If all the switches are operating fully in parallel (i.e., if they are controlled by a multiple pseudo-random generator producing i.i.d. binary signals), the parameter γ is set to 0.5. It should be noted that, in general, the discrete-time excitation vectors s(t+1), s(t+2), …, s(t+m) (with s(t) = [s₁(t), s₂(t), …, sₘ(t)]^T) must be linearly independent for any discrete time t = 0, 1, 2, …. In other words, the matrix S = [s(t+1), s(t+2), …, s(t+m)]^T ∈ R^{m×m} must be nonsingular, since then the matrix equation

$$S^T(A\mathbf{x} - \mathbf{b}) = \mathbf{0}$$

is satisfied if and only if Ax* = b (cf. Appendix). However, if the matrix S is singular (i.e., its rows are linearly dependent), then the energy function (14) may have local minima which are not solutions of the original problem.

¹Alternatively, we can use high-pass filters (HPFs) or local mean estimators (LMEs) in order to extract the mean values of the preprocessed signals.

C. Discrete-Time Learning Algorithms

The networks shown in Figs. 3(a) and 4(a) are continuous-time (analog) networks. In order to realize them by discrete-time networks we can transform the set of differential equations (with α = 0 and μ_j = μ > 0 ∀j) into a corresponding system of difference equations

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \eta^{(k)}\,e(k)\,\tilde{\mathbf{a}}(k), \qquad e(k) = \tilde{\mathbf{a}}^T(k)\,\mathbf{x}^{(k)} - \tilde{b}(k) \qquad (23)$$

where ã(k) and b̃(k) are the preprocessed signals of (22a) and (22b) sampled at the k-th step and k is the iteration count. The above system of difference equations can practically be implemented in CMOS switched-capacitor (SC) technology [35]. A functional block diagram illustrating the implementation of the algorithm is shown in Fig. 5(a).

It can be demonstrated that the iterative algorithm (23) encompasses other known iterative procedures that have been proposed for solving a linear system of equations [1], [24], [26], [27]. In the special case that a set of linear consistent equations is processed in a cyclical order (cf. Fig. 5(b)), the system of difference equations (23) can take the form

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \eta^{(k)}\big(\mathbf{a}_i^T\mathbf{x}^{(k)} - b_i\big)\,\mathbf{a}_i \qquad (24a)$$

with γ = 0,

$$\mathbf{a}_i = [a_{i1},\,a_{i2},\,\ldots,\,a_{in}]^T, \qquad i = k\ (\mathrm{mod}\ m) + 1$$

where m is the number of rows in the matrix A.


Fig. 5. (a) Block diagram illustrating the implementation of the discrete-time (iterative) algorithm (23). (b) Exemplary clock generator timing diagram for the control of the switches in (a) when employing the Kaczmarz algorithm. (Switches are closed and opened cyclically. For all switches it is assumed, as usual, that the ON state corresponds to the control signal being at the logical value "1," i.e., a high voltage level.) It is interesting to note that the adders (summers) are not required if the switches are controlled by nonoverlapping clock phases.

The learning algorithm (24a) can be considered as a modified (nonnormalized) version of the Kaczmarz algorithm [25], [26], [28]:

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \omega_i\,\frac{\mathbf{a}_i^T\mathbf{x}^{(k)} - b_i}{\mathbf{a}_i^T\mathbf{a}_i}\,\mathbf{a}_i, \qquad k = 0, 1, \ldots, \quad i = k\ (\mathrm{mod}\ m) + 1 \qquad (24b)$$

where ω_i (0 < ω_i < 2) is a real relaxation parameter. The algorithm (24b) was first developed by Kaczmarz for square matrices with ω_i = 1 for all i in 1937 [25]. It is interesting to note that the Kaczmarz algorithm is very similar to the normalized least mean squares (NLMS) method in adaptive filtering and to the algebraic reconstruction technique (ART) in computed tomography [14], [28]. Moreover, (24a) and (24b) are very similar to Widrow's Adaline [21].
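For readers who prefer an explicit algorithmic statement, the following sketch implements the relaxed Kaczmarz iteration in the form of (24b), with the rows processed cyclically; the data, the number of sweeps and the choice ω_i = 1 are illustrative.

```python
import numpy as np

def kaczmarz(A, b, n_sweeps=500, omega=1.0):
    """Cyclic row-action projection (Kaczmarz) iteration, cf. (24b)."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(n_sweeps):
        for i in range(m):                     # i = k (mod m) + 1 in the paper's notation
            a_i = A[i]
            r_i = a_i @ x - b[i]               # residual of the i-th equation
            x -= omega * r_i * a_i / (a_i @ a_i)
    return x

# Consistent overdetermined system: the iteration converges to the exact solution.
rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))
x_true = rng.standard_normal(4)
b = A @ x_true
print(np.linalg.norm(kaczmarz(A, b) - x_true))  # close to zero
```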

In some applications it is desirable that some of the more important equations are satisfied at the expense of others, i.e., it is required that some of the equations in (3a) and (3b) are nearly satisfied while "larger" errors are acceptable in the remaining equations. In such a case the weighted least-squares approach can be used [26]. It is interesting to note that the Kaczmarz algorithm can easily be adapted to solve the weighted LS problem by choosing, e.g., the relaxation parameters as

$$\omega_i := \omega\,w_i^2\,\mathbf{a}_i^T\mathbf{a}_i \quad (i = 1, 2, \ldots, m) \quad \text{with } \omega > 0. \qquad (25)$$

The weights w_i² indicate the reliability of particular measurements and can be used to represent constraints on certain equations by choosing the corresponding weights large [26].

It can be proved that the Kaczmarz algorithm given by (24b) converges for any initial guess x⁽⁰⁾ if [26]

$$0 < \omega_i < 2. \qquad (26)$$

The theoretical robustness of the algorithms (23), (24a) and (24b) is remarkable and the iteration converges even when the matrix A is singular or rectangular. However, as with any linear stationary process, the rate of convergence may be arbitrarily slow [26]. For inconsistent linear systems Ax ≈ b the learning rate η must be quite small, otherwise the learning process will become unstable and diverge; but a small learning rate gives a low learning speed.² Furthermore, as shown by computer simulation experiments, the learning rule (23) using a constant learning rate η^(k) = η may not behave well, and better results can be obtained if it decreases to zero in an appropriate way (e.g., proportionally to 1/k) [29], [9]. One general problem arises: how to select the learning rate η^(k) in order to maximally increase the convergence speed and simultaneously prevent divergence of the learning process? In Section IV we will propose a new iterative algorithm which enables us to considerably increase the convergence speed.

IV. ADAPTIVE LEARNING ALGORITHMS FOR THE TLS PROBLEM

For the TLS problem formulated in Section II we can construct the instantaneous energy function

$$E(\mathbf{x}, t) = \frac{1}{2}\,\frac{e^2(t)}{1 + \mathbf{x}^T\mathbf{x}} \qquad (27)$$

where

$$e(t) = \mathbf{s}^T(t)\big(A\mathbf{x}(t) - \mathbf{b}\big) = \tilde{\mathbf{a}}^T(t)\,\mathbf{x}(t) - \tilde{b}(t).$$

Applying the standard gradient descent algorithm we obtain the set of differential equations

$$\frac{d\mathbf{x}(t)}{dt} = -\mu(t)\,\frac{\partial E}{\partial \mathbf{x}} = -\frac{\mu(t)}{1 + \mathbf{x}^T\mathbf{x}}\,e(t)\left[\tilde{\mathbf{a}}(t) - \frac{e(t)}{1 + \mathbf{x}^T\mathbf{x}}\,\mathbf{x}(t)\right] \qquad (28)$$

where μ(t) > 0. The above system of differential equations, after linearization, can be simplified as

$$\frac{d\mathbf{x}(t)}{dt} = -\mu(t)\,e(t)\big[\tilde{\mathbf{a}}(t) + \tilde{b}(t)\,\mathbf{x}(t)\big]. \qquad (29)$$

²It should be noted that the learning rate η^(k) for iterative (discrete-time) algorithms must be bounded in a small range to assure the convergence of the algorithm. In contrast, for the continuous-time (analog) algorithms (cf. (15), (21)) the learning parameter μ(t) can be set sufficiently large without affecting the stability of the algorithm [10].


Fig. 6. Implementation of the continuous-time learning algorithm (29) for the TLS problem (β = 1) and the standard LS problem (β = 0).

The above set of differential equations constitutes a basic adaptive parallel learning algorithm for solving the TLS problem for overdetermined linear systems Ax ≈ b.

For an ill-conditioned problem the instantaneous estimate of the energy function can be formulated as

$$E(\mathbf{x}, \alpha) = \frac{1}{2}\,\frac{e^2(t)}{\mathbf{x}^T\mathbf{x} + 1} + \frac{\alpha}{2}\,\|\mathbf{x}\|_2^2 \qquad (30)$$

where α ≥ 0 is the standard regularization parameter. In this case the learning algorithm (29) is modified as

$$\frac{d\mathbf{x}(t)}{dt} = -\mu(t)\left\{e(t)\big[\tilde{\mathbf{a}}(t) + \tilde{b}(t)\,\mathbf{x}(t)\big] + \alpha\,\mathbf{x}(t)\right\} \qquad (31)$$

with μ(t) > 0.

An analog (continuous-time) implementation of the algorithm (29) is illustrated in Fig. 6. Note that the algorithms (29) and (31) can be considered as a generalization (extension) of the "standard" LS algorithm given by (21) (cf. Fig. 3(a)). Clearly, the algorithms (29) and (31) can be converted to discrete-time iterative algorithms by applying the Euler rule, similarly as in Section III-C.
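A discrete-time software analogue of the TLS learning rule (29), obtained by the Euler rule as suggested above, is sketched below with random excitation, the quadratic loss and a constant learning rate (all values illustrative); it typically lands close to the SVD-based TLS solution used here as a reference.

```python
import numpy as np

# Forward-Euler simulation of dx/dt = -mu * e(t) * (a_tilde(t) + b_tilde(t) * x), cf. (29).
# Illustrative TLS problem: noise in both A and b.
rng = np.random.default_rng(4)
A0 = rng.standard_normal((30, 2))
x_true = np.array([0.7, 0.3])
A = A0 + 0.05 * rng.standard_normal(A0.shape)
b = A0 @ x_true + 0.05 * rng.standard_normal(30)

mu, dt, n_steps = 1.0, 1e-3, 100000
x = np.zeros(2)
for _ in range(n_steps):
    s = rng.uniform(-1.0, 1.0, size=30)   # zero-mean i.i.d. excitation signals
    a_tilde, b_tilde = A.T @ s, b @ s
    e = a_tilde @ x - b_tilde
    x -= dt * mu * e * (a_tilde + b_tilde * x)

# Reference: closed-form TLS solution from the SVD of [A, b], cf. (8).
v = np.linalg.svd(np.column_stack([A, b]))[2][-1]
print(x, -v[:-1] / v[-1])                 # the two estimates should roughly agree
```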

Motivated by the desire to increase the convergence speed of the iterative algorithms, and inspired by the "averaging concept" proposed recently by Polyak for recursive stochastic approximation [30], [31], we have developed a new iterative LS/TLS learning algorithm:

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \eta_1^{(k)}\,e(k)\big[\tilde{\mathbf{a}}(k) + \beta\,\tilde{b}(k)\,\mathbf{x}^{(k)}\big] \qquad (32a)$$

$$\bar{\mathbf{x}}^{(k+1)} = \bar{\mathbf{x}}^{(k)} + \eta_2^{(k)}\big(\mathbf{x}^{(k+1)} - \bar{\mathbf{x}}^{(k)}\big) \qquad (32b)$$

where η₀ > 0, ½ < λ < 1, ã(k) = [ã₁(k), ã₂(k), …, ã_n(k)]^T, k = 0, 1, …, and β = 0 for the LS problem or β = 1 for the TLS problem. The proposed learning algorithm combines two processes. The first computing process is a standard stochastic recursive algorithm described by the set of difference equations (32a) with a learning rate η₁^(k) = η₀/k^λ (½ < λ < 1). The second process is an averaging process described by the set of difference equations (32b) with a learning rate η₂^(k) = 1/(k+1). The implementation of the algorithm is illustrated in Fig. 7. The switches are controlled in general by a multiphase pseudo-random generator. The important feature of the learning algorithm (32a) and (32b) is that, in contrast to the standard approach, two discrete-time sequences are constructed, where {x̄^(k)} is the arithmetic average of {x^(k)}. Another essential feature of the algorithm is that it employs two learning rates, η₁^(k) = η₀k^(−λ) and η₂^(k) = (k+1)^(−1), where ½ < λ < 1. Note that λ = 1 is excluded from the above algorithm and that the learning rate η₁^(k) decreases more slowly than η₂^(k) = 1/(k+1). It should be noted that one may use only the set of equations (32a), but at the expense of a slower convergence speed. The use of the second set of equations (cf. (32b)) improves the convergence speed, i.e., it allows us to reach the optimal values x* considerably faster.

In practice the averaging process will start not at k = 0 but from some k ≥ k₀ for which x^(k) enters the neighborhood of the desired optimal solution x*; i.e., the algorithm can take the special form (33), in which one row of the matrix A is processed per iterative step and the averaging recursion (32b) is activated only for k ≥ k₀ (controlled by the switch S₀ in Fig. 7).

A sequence of indices i = {i(k)}, k = 0, 1, …, according to which the rows of the matrix A and the elements of the vector b are taken up, is called a control strategy. Clearly, various different strategies for controlling the switches {S_i} can be chosen (cf. Fig. 7). In the simplest control strategy we have i = k (mod m) + 1; i.e., the switches are closed and opened cyclically. In this case the switches can be controlled by a simple multiphase clock generator. The important problem of choosing an optimal control strategy is beyond the scope of this paper and is left for future investigations.

The proposed algorithm (33) has the following features:

1) No operations are performed on the linear system Ax ≈ b as a whole;

2) Each iterative step requires access to only one row

    of the matrix A and b; 3) The fact that the algorithm needs only one row at a single

    iterative step makes this scheme an especially useful tool for solving large unstructured systems according to the LS and/or TLS criterion;

  • ~

    918 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 5 , NO. 6, NOVEMBER 1994

Fig. 7. Implementation of the discrete-time (iterative) algorithm (32a) and (32b).

4) As shown by some computer experiments, the algorithm allows an increase in the rate of convergence in comparison to the "standard" iterative algorithm without averaging.
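A minimal sketch of the two-time-scale scheme (32a), (32b) is given below, under the assumption that one row of A is processed per step (the row-action special case (33)) and with illustrative choices η₀ = 0.5 and λ = 0.8; the raw iterate is driven by the decaying learning rate while the second sequence keeps its running arithmetic average.

```python
import numpy as np

def ls_tls_averaged(A, b, beta=0.0, eta0=0.5, lam=0.8, n_iter=20000):
    """Row-action stochastic iteration (cf. (32a), special case (33))
    with Polyak-style averaging (32b). beta = 0 ~ LS, beta = 1 ~ TLS (per the paper)."""
    m, n = A.shape
    x = np.zeros(n)
    x_avg = np.zeros(n)
    for k in range(1, n_iter + 1):
        i = (k - 1) % m                      # cyclic control strategy i = k (mod m) + 1
        e = A[i] @ x - b[i]
        x = x - (eta0 / k**lam) * e * (A[i] + beta * b[i] * x)
        x_avg = x_avg + (x - x_avg) / k      # running arithmetic average of the iterates
    return x, x_avg

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 3))
x_true = np.array([1.0, -0.5, 0.25])
b = A @ x_true + 0.05 * rng.standard_normal(40)
x_raw, x_bar = ls_tls_averaged(A, b, beta=0.0)
print(x_bar)                                  # the averaged iterate is the smoother estimate
```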

V. FURTHER EXTENSIONS AND GENERALIZATIONS OF NEURAL NETWORK MODELS

It is interesting that the neural network models shown in Figs. 3-7 can be employed not only to solve LS or TLS problems but can easily be modified and/or extended to related problems: e.g., linear and quadratic programming, iteratively reweighted LS problems [10], minimum Lp-norm (1 ≤ p ≤ ∞) and/or maximum entropy problems [38].

Furthermore, by changing the value of the parameter β (cf. Figs. 6 and 7) more or less emphasis can be given to errors of the matrix A with respect to errors of the vector b. In an extreme case, for large β (say β = 100) it can be assumed that the vector b is almost free of error and the error lies in the data matrix A only. Such a case is referred to as the data least squares (DLS) problem (since the error occurs in A but not in b) [36].

The DLS problem can be formulated as an optimization problem of finding an optimal vector x* = x*_DLS which satisfies the overdetermined system of linear equations

$$(A - \Delta A)\,\mathbf{x}^*_{DLS} = \mathbf{b} \qquad (34)$$

under the condition that the Frobenius norm of the error matrix ‖ΔA‖_F is minimal [37]. It can easily be demonstrated that the neural networks of Figs. 6 and 7 solve the DLS problem approximately for a large β (typically β = 50 to 100; see Example 3). In other words, the DLS problem can be solved by simulating the system of differential equations

$$\frac{d\mathbf{x}(t)}{dt} = -\mu(t)\,e(t)\big[\tilde{\mathbf{a}}(t) + \beta\,\tilde{b}(t)\,\mathbf{x}(t)\big], \qquad \beta \gg 1. \qquad (35a)$$

For complex-valued elements (signals) the algorithm can further be generalized as

$$\frac{d\mathbf{x}(t)}{dt} = -\mu(t)\,e(t)\big[\tilde{\mathbf{a}}^{c}(t) + \beta\,\tilde{b}^{c}(t)\,\mathbf{x}(t)\big] \qquad (35b)$$

where the superscript c denotes the complex-conjugate operation, and β = 0 for the LS problem, β = 1 for the TLS problem, or β ≫ 1 for the DLS problem. Very recently, it has been shown that DLS estimation is more appropriate than TLS and ordinary LS for certain types of signal processing problems [36].
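Because (29), (31) and (35a) differ only in the value of β, a single routine covers the LS, TLS and DLS criteria. The sketch below integrates the expected-value (averaged) form of the rule, dx/dt = −μ[Aᵀr + β(bᵀr)x] with r = Ax − b, which is what the stochastic rule approximates for zero-mean, unit-variance, uncorrelated excitation; the data and the values β = 0, 1, 50 are illustrative.

```python
import numpy as np

def beta_flow(A, b, beta, mu=0.2, dt=1e-3, n_steps=50000):
    """Euler integration of dx/dt = -mu * (A^T r + beta * (b^T r) * x), r = A x - b.
    Expected-value form of the stochastic rule (35a):
    beta = 0 ~ LS, beta = 1 ~ TLS, beta >> 1 ~ approximately DLS (per the paper)."""
    x = np.zeros(A.shape[1])
    for _ in range(n_steps):
        r = A @ x - b
        x -= dt * mu * (A.T @ r + beta * (b @ r) * x)
    return x

# Illustrative data with perturbations in both A and b.
rng = np.random.default_rng(6)
A0 = rng.standard_normal((25, 2))
x_true = np.array([0.7, 0.3])
A = A0 + 0.05 * rng.standard_normal(A0.shape)
b = A0 @ x_true + 0.05 * rng.standard_normal(25)
for beta in (0.0, 1.0, 50.0):
    print(beta, beta_flow(A, b, beta))    # three slightly different estimates of x
```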

    VI. COMPUTER SIMULATION RESULTS

    To check the correctness and performance of the proposed algorithms and associated neural network structures we have simulated them extensively on a computer. The networks shown in Figs. 1, 3-6, and 7(a) have been investigated for many different numerical examples and a good agreement with the theoretical considerations has been obtained. Due to limited space we shall present in this paper only some illustrative examples.

Example 1: Consider the problem of finding the minimal L₂-norm solution of the underdetermined system of linear equations

$$A\mathbf{x} = \mathbf{b}$$

with

$$A = \begin{bmatrix} 2 & -1 & 4 & 0 & 3 & 1 \\ 1 & -2 & 1 & -5 & -1 & 4 \\ & 1 & -3 & 1 & 2 & 0 \end{bmatrix}, \qquad \mathbf{b} \in \mathbb{R}^3.$$

Clearly the above set of equations has infinitely many solutions. However, there is a unique minimum-norm solution which we want to find.

On the basis of the general architectures shown in Fig. 3, corresponding continuous-time (analog) circuits have been


designed and simulated on a computer. For the network of Fig. 3(a), pseudo-random, independent excitation signals s_i(t) (i = 1, 2, 3) over the range [−1, 1] were used.

For the network of Fig. 4(a) the switches were controlled by a three-phase clock with clock frequency f_c = 100 MHz. Exemplary computer simulation results are shown in Fig. 8(a) and (b). The final solution (equilibrium point) was

x* = [0.0882, 0.1083, 0.2733, 0.5047, 0.3828, −0.3097]^T

which is in excellent agreement with the exact minimum L₂-norm solution obtained by using MATLAB. Note that the network of Fig. 3(a) reaches the solution in a time of less than 100 ns, while the network of Fig. 4(a) does so in a time of less than 500 ns. The implementation of the network architecture shown in Fig. 6 leads to the same results (cf. Fig. 8(c)).

Example 2: Find the pseudo-inverse of a given singular 3 × 3 matrix A.

In order to find the pseudo-inverse matrix B = A⁺, we make the observation vector b successively [1, 0, 0]^T, [0, 1, 0]^T and [0, 0, 1]^T. We have repeated the computation three times, once for each observation vector b, to find the three columns of the pseudo-inverse matrix B. Fig. 9 shows exemplary computer-simulated state trajectories b_ij(t) (i, j = 1, 2, 3) obtained by employing the network architecture shown in Fig. 6. The network was able to find

$$B = A^{+} = \begin{bmatrix} -0.6387 & -0.1663 & 0.3055 \\ 0.5277 & 0.1666 & -0.1942 \\ -0.0554 & -0.0001 & 0.0552 \end{bmatrix}$$

in a total time of less than 8 μs. The pseudo-inverse matrix found by MATLAB reads

$$A^{+}_{\mathrm{MATLAB}} = \begin{bmatrix} -0.6389 & -0.1667 & 0.3056 \\ 0.5278 & 0.1667 & -0.1944 \\ -0.0556 & -0.0000 & 0.0556 \end{bmatrix}.$$
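The column-by-column procedure of Example 2 is easy to verify numerically; in the sketch below an ordinary minimum-norm LS solver plays the role of the analog network, and the singular matrix is an illustrative stand-in rather than the matrix of the example.

```python
import numpy as np

# Column-by-column pseudo-inverse: solve the LS problem once for each unit vector b = e_i
# (lstsq returns the minimum-norm solution, standing in for the analog network here).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])             # illustrative singular matrix (rank 2)
B = np.zeros((3, 3))
for i in range(3):
    e_i = np.eye(3)[:, i]
    B[:, i] = np.linalg.lstsq(A, e_i, rcond=None)[0]
print(np.allclose(B, np.linalg.pinv(A)))    # True: the columns assemble the pseudo-inverse
```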

Example 3: Let us consider a linear parameter estimation problem described by an overdetermined set of three linear equations in the two unknown parameters x₁ and x₂.

It is required to find an optimal solution according to the LS, TLS and DLS criteria.

To solve the problem we have employed the neural network architectures depicted in Figs. 6 and 7. Exemplary computer simulation results are shown in Figs. 10 and 11. It has been found that the analog (fully simultaneous) network of Fig. 6 shows better performance (a higher convergence speed) in comparison with the discrete-time (one equation per iterative step) network of Fig. 7.

Fig. 8. Computer-simulated trajectories of the estimated parameters x_j(t) (j = 1, 2, …, 6) for the problem in Example 1. (a) Using the network architecture of Fig. 3 with μ_j = 10⁷. (b) Using the network architecture of Fig. 4(a) with μ_j = 5 × 10⁶ and clock frequency f_c = 100 MHz. (c) Using the network of Fig. 6 with μ_j = 10⁷, α = 0, and β = 0.


Fig. 9. Computer-simulated trajectories for Example 2.

For example, the network of Fig. 6 was able to find, in a time less than 400 ns, the following parameters:

x*_LS = [x₁*, x₂*]^T = [0.706, 0.298]^T for β = 0,
x*_TLS = [x₁*, x₂*]^T = [0.8531, 0.2591]^T for β = 1,
x*_DLS = [x₁*, x₂*]^T = [1.1132, 0.1898]^T for β = 50,

which are in good agreement with the exact results obtained by using MATLAB:

x*_LS,MATLAB = [0.7, 0.3]^T, x*_TLS,MATLAB = [0.8544, 0.2587]^T, x*_DLS,MATLAB = [1.1200, 0.1867]^T.

Example 4: In order to test the performance of the proposed neural network for a large number of variables we have simulated it for a tomographic image reconstruction problem from projections [24], [28]. In the computer simulation experiments a modification of the Shepp and Logan head phantom has been used as the original image [24]. The Shepp and Logan phantom is often used for testing different numerical algorithms for image reconstruction because it approximately represents a cross-section of the human head, which is known to place the greatest demands on a tomographic system with respect to its accuracy [24], [28].

Consider a 2-D discrete model of the cross-sectional image. Let x_j (j = 1, 2, …, n) denote the density (gray) level of the j-th pixel and let

$$b_i = \sum_{j=1}^{n} a_{ij}\,x_j \qquad (i = 1, 2, \ldots, m)$$

represent the i-th ray-sum, where a_ij is the weight factor representing the fractional area intercepted by the i-th ray and the j-th pixel. Thus the reconstruction of images from projections can be formulated as the problem of solving a linear system of equations Ax = b, where A ∈ R^{m×n} is the projection matrix, b ∈ R^m is the projection data (measurement) vector and x ∈ R^n is the density vector, while n is the number of pixels and m is the number of measurements. The goal is thus to find or estimate the density vector x assuming that the projection matrix A and the data vector b are known. Using the Shepp and Logan head phantom model, we analytically determined the projection matrix A for the test image with 64 rays in each of the 56 projections; the image is reconstructed on a 64 × 64 sampling lattice.

    e.+" 8.3888888

    e.z8eeeee

    8 I 3888888

    8. 2888888

    E. iEB8888

    8.8888 - __ ~- 8.8888 Tine 58 .8EEE-&

    (C)

    Fig. 10. Simulated trajectories for Example 3. (a) By using the network architecture of Fig. 6 with = 0, p(t) = 10' = const. (b) By using the same architecture but with a learning rate p ( t ) exponentially decreasing to zero. (c) By using the network architecture of Fig. 7 with So = 0 (switch opened), X = 0.8, and fc = 100 MHz.

Note that the involved system of linear equations is underdetermined: n = 4096 variables (pixels) and m = 3584 equations have been used.


Fig. 11. Simulated trajectories for Example 3 (for solving the TLS problem). (a) Using the network architecture of Fig. 6 with β = 1, μ(t) = 10⁷ = const. (b) Using the same architecture but with a learning rate μ(t) exponentially decreasing to zero. (c) Using the network architecture of Fig. 7 with S₀ = 1 (switch closed), β = 1, λ = 0.8, and f_c = 100 MHz.


Fig. 12. Reconstructed 64 × 64 head-phantom image at the times 1, 2, 5, and 10 μs.

The image reconstruction has been simulated by using the neural network architecture shown in Fig. 4. The parameters of the network have been chosen as follows: μ(t) = 10⁶ = const, α = 0.02, and all switches were controlled by a multiphase pseudo-random binary generator with a basic clock frequency f_c = 100 MHz = 10⁸ Hz. Exemplary images reconstructed from the same projection data, starting from zero initial conditions, at the times 1, 2, 5, and 10 μs are shown in Fig. 12. The image quality has been evaluated by using the normalized root mean squared (NRMS) error defined as

$$\varepsilon = \left[\frac{\sum_{j=1}^{n}(\hat{x}_j - x_j)^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}\right]^{1/2}$$

where x̂_j is the reconstructed image at the j-th pixel, x_j is the original image, x̄ is the mean of the original image and n is the total number of pixels. In our computer experiments we obtained the NRMS error ε = 0.0547 after a time of less than 10 μs, which means that the reconstructed image was a quite good replica of the original image. It is also interesting to note that the computation time (i.e., the settling time, in our experiments less than 10 μs) was principally not influenced by the size of the problem (in contrast to standard digital computers) but rather depends on the values of the parameters: the learning rate μ(t), the regularization parameter α(t) and the clock frequency f_c.

VII. CONCLUSION

New, very simple and low-cost analog neural networks for solving least squares and total least squares problems have been proposed. In fact, such problems can be solved by using only one single, highly simplified artificial neuron with an on-chip learning capability. The proposed networks are able to estimate the unknown parameters in real time, i.e., in a time of the


order of hundreds or thousands of nanoseconds. The network architectures are suitable for currently available VLSI implementations. The potential usefulness of the proposed networks may be attractive for real-time and/or high-throughput-rate applications, e.g., in robotics, computed tomography and automatic control, when the entries of the observation vector and the model matrix are changing in time and it is necessary to continuously track or update the solutions. An interesting and important feature of the proposed algorithmic scheme is its universality and flexibility. It allows either a processing of all equations fully simultaneously in time (cf. Figs. 3, 4, and 6) or a processing of groups of equations (i.e., blocks) in every iterative step (cf. Figs. 5 and 7). These blocks need not be fixed but may rather vary dynamically throughout the iterations. In a special case it allows the processing of only one equation per block, i.e., in each iterative step only one single equation is processed. We believe that the new fully simultaneous algorithmic scheme (cf. Figs. 3, 4, 6, and 7) opens new, and till now unexplored, vistas in many areas in which the problem arises of solving large systems of linear equations in real time. The continuous-time (analog) formalism employed in the proposed algorithms (in fact the basic learning algorithms are expressed completely by a system of nonlinear differential equations, cf. (15), (21), (31), (35)) enables us to select the learning rate μ(t) sufficiently large (to assure an extremely high computation speed) without affecting the stability of the networks. In contrast, for the associated discrete-time iterative schemes (see (23), (32), (33)) the corresponding learning rate η must be upper bounded in a small range or else the system will be unstable (i.e., the learning algorithms will diverge).

    APPENDIX

In this appendix we shall attempt to explain why the external excitation signals s_i(t) are needed, and which conditions they must satisfy in order to solve the specified problems.

Let us define the energy (cost) function as

$$E(\mathbf{x}) = \frac{1}{2}\,E_M\big[e^2(\mathbf{x}(t))\big] \qquad (A1)$$

where E_M[·] is the expected value of its argument and e(x(t)) is the instantaneous error defined (cf. Fig. 3(a)) as

$$e(\mathbf{x}(t)) = \mathbf{s}^T(t)\big(A\mathbf{x}(t) - \mathbf{b}\big) = \mathbf{s}^T\mathbf{r}. \qquad (A2)$$

The above energy function can be evaluated as

$$E(\mathbf{x}) = \frac{1}{2}E_M[e^2] = \frac{1}{2}E_M\big[\mathbf{r}^T\mathbf{s}\,\mathbf{s}^T\mathbf{r}\big] = \frac{1}{2}\,\mathbf{r}^T E_M[\mathbf{s}\mathbf{s}^T]\,\mathbf{r} = \frac{1}{2}\,\mathbf{r}^T R_s\,\mathbf{r} \qquad (A3)$$

where R_s = E_M[s sᵀ] is the autocorrelation matrix of the excitation (noise) signals. It should be noted that for a zero-mean i.i.d. stochastic process (excitation signals) the correlation matrix R_s ∈ R^{m×m} is a diagonal matrix with all diagonal elements equal to σ² = E_M[s_i²] ∀i (i = 1, 2, …, m). Thus the energy function can be expressed as

$$E(\mathbf{x}) = \frac{\sigma^2}{2}\,\mathbf{r}^T\mathbf{r} = \frac{\sigma^2}{2}\,\|A\mathbf{x} - \mathbf{b}\|_2^2. \qquad (A4)$$

In fact, for zero-mean noise signals with variance σ² = 1 the so-defined energy function (A1) is equivalent to the cost function (4) with the regularization parameter α = 0. The minimization of the cost function (A1) with respect to the vector x(t) by using the gradient descent method leads to the system of differential equations

$$\frac{d\mathbf{x}(t)}{dt} = -\mu\,\frac{\partial E}{\partial \mathbf{x}} = -\mu\,P_c[\mathbf{x}(t)] \qquad (A5)$$

where

$$\mu > 0, \qquad P_c[\mathbf{x}(t)] = E_M\big[e(t)\,\tilde{\mathbf{a}}(t)\big], \qquad \tilde{\mathbf{a}}(t) = [\tilde{a}_1(t),\,\tilde{a}_2(t),\,\ldots,\,\tilde{a}_n(t)]^T.$$

In practice, the cross-correlation vector P_c[x(t)] is not available and its computation is rather difficult. Therefore, an unbiased approximation is usually made by replacing the vector P_c[x(t)] by the instantaneous vector P_c(t) = e(t)ã(t), so the system of differential equations in our case is approximated by (cf. (21))

$$\frac{d\mathbf{x}(t)}{dt} = -\mu\,e(t)\,\tilde{\mathbf{a}}(t). \qquad (A6)$$

Let us now consider a more practical case (cf. Figs. 4, 5, and 7) in which the random, uncorrelated, identically distributed excitation signals s_i(t) take only the two discrete values 0 and 1. In this case the excitation signals no longer have zero-mean values (under the assumption that γ = 0). However, the stochastic process s(t) = [s₁(t), s₂(t), …, sₘ(t)]^T can be decomposed as

$$\mathbf{s}(t) = \tilde{\mathbf{s}}(t) + \mathbf{s}_c \qquad (A7)$$

where

$$\tilde{\mathbf{s}}(t) = [\tilde{s}_1(t),\,\tilde{s}_2(t),\,\ldots,\,\tilde{s}_m(t)]^T, \qquad \tilde{s}_i(t) \in \{-\tfrac{1}{2},\,\tfrac{1}{2}\}$$

is a zero-mean, uncorrelated, identically distributed process and

$$\mathbf{s}_c = s_c\,[1, 1, \ldots, 1]^T, \qquad s_c = \tfrac{1}{2}$$

is a constant process. For this case the energy function (A3) can be evaluated as

$$E(\mathbf{x}) = \frac{1}{2}E_M\big[(\mathbf{s}^T\mathbf{r})^2\big] = \frac{1}{2}E_M\Big[\big((\tilde{\mathbf{s}}(t) + \mathbf{s}_c)^T\mathbf{r}\big)^2\Big] = \frac{1}{2}\,\mathbf{r}^T\big(c^2 I + s_c^2\,\mathbf{1}\big)\mathbf{r} \qquad (A8)$$

where I is the m × m unit matrix, 1 is the m × m matrix with all entries equal to one, and c² = E_M[s̃_i²(t)] ∀i. The energy function (A8) consists of two terms: the standard quadratic term and a distortion term caused by the nonzero mean value of the random excitation signals. This additional distortion can degrade the accuracy and convergence rate of the proposed adaptive algorithms. In order to eliminate this term, one can introduce the parameter γ = ½ as illustrated in Figs. 4 and 7.
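The decomposition (A7) and the need for the offset γ = ½ can also be checked numerically: for 0/1 switching signals the modulated coefficients acquire a bias proportional to the column sums of A, which disappears once ½ is subtracted from the excitation. Below is a small Monte Carlo sketch with illustrative data.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))
S = rng.integers(0, 2, size=(200000, 6)).astype(float)   # binary 0/1 excitation samples

raw = S @ A                     # a_tilde without centering: mean ~ 0.5 * column sums of A
centered = (S - 0.5) @ A        # subtracting gamma = 1/2 restores zero-mean signals
print(raw.mean(axis=0))         # approximately 0.5 * A.sum(axis=0)
print(centered.mean(axis=0))    # approximately zero
```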

  • CICHOCKI AND UNBEHAUEN: SOLVING PROBLEMS IN REAL TIME 923

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for many valuable comments that have greatly improved the presentation. They are grateful to their student B. Spimler for making extensive computer simulation experiments in the area of tomographic image reconstruction by employing the proposed neural network models.

REFERENCES

[1] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1989.
[2] —, "An analysis of the total least squares problem," SIAM J. Numer. Anal., vol. 17, pp. 883-893, 1980.
[3] S. Van Huffel and J. Vandewalle, The Total Least Squares Problem: Computational Aspects and Analysis (Frontiers in Applied Mathematics, vol. 9). Philadelphia, PA: SIAM, 1991.
[4] —, "Algebraic connections between the least squares and total least squares problems," Numer. Math., vol. 55, pp. 431-449, 1989.
[5] S. Van Huffel, "On the significance of nongeneric total least squares problems," SIAM J. Matrix Anal. Appl., vol. 13, pp. 20-35, 1992.
[6] M. Wei, "Algebraic relations between the total least squares and least squares problems with more than one solution," Numer. Math., vol. 62, pp. 123-148, 1992.
[7] K. S. Arun, "A unitarily constrained total least squares problem in signal processing," SIAM J. Matrix Anal. Appl., vol. 13, no. 3, pp. 728-745, 1992.
[8] E. Oja, "A simplified neuron model as a principal component analyzer," J. Math. Biology, vol. 16, pp. 267-273, 1982.
[9] L. Xu, E. Oja, and C. Y. Suen, "Modified Hebbian learning for curve and surface fitting," Neural Networks, vol. 5, pp. 441-457, 1992.
[10] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. Chichester, U.K.: Teubner-John Wiley, 1993.
[11] —, "Neural networks for solving systems of linear equations and related problems," IEEE Trans. Circ. Syst., vol. 39, pp. 124-138, 1992.
[12] —, "Neural networks for solving systems of linear equations. Part II: Minimax and least absolute value problems," IEEE Trans. Circ. Syst., vol. 39, pp. 619-633, 1992.
[13] S. I. Sudharsanan and M. K. Sundareshan, "Exponential stability and a systematic synthesis of a neural network for quadratic minimization," Neural Networks, vol. 4, pp. 599-613, 1991.
[14] B. Widrow and S. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[15] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Acad. Sci., vol. 81, pp. 3088-3091, 1984.
[16] D. W. Tank and J. J. Hopfield, "Simple neural optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit," IEEE Trans. Circ. Syst., vol. CAS-33, pp. 533-541, 1986.
[17] M. P. Kennedy and L. O. Chua, "Neural networks for nonlinear programming," IEEE Trans. Circ. Syst., vol. CAS-35, pp. 554-562, 1988.
[18] A. D. Culhane, M. C. Peckerar, and C. R. K. Marrian, "A neural net approach to discrete Hartley and Fourier transforms," IEEE Trans. Circ. Syst., vol. CAS-36, pp. 695-702, 1989.
[19] L.-X. Wang and J. M. Mendel, "Parallel structured networks for solving a wide variety of matrix algebra problems," J. Parallel and Distributed Computing, vol. 14, pp. 236-247, 1992.
[20] M. Forti, S. Manetti, and M. Marini, "Neural network for adaptive FIR filtering," Electron. Lett., vol. 26, no. 14, pp. 1018-1019, 1990.
[21] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: Perceptron, Madaline, and backpropagation," Proc. IEEE, vol. 78, pp. 1415-1442, 1990.
[22] S. Grossberg, Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control. Boston, MA: Reidel, 1982.
[23] H. W. Sorenson, Parameter Estimation: Principles and Problems. New York: Marcel Dekker, 1990.
[24] R. J. Mammone, Ed., Computational Methods of Signal Recovery and Recognition. New York: Wiley, 1992.
[25] S. Kaczmarz, "Angenäherte Auflösung von Systemen linearer Gleichungen," Bull. Acad. Polon. Sci. Lett. A, vol. 35, pp. 355-357, 1937.
[26] M. Hanke and W. Niethammer, "On the acceleration of Kaczmarz's method for inconsistent linear systems," Linear Algebra and Its Applications, vol. 130, pp. 83-98, 1990.
[27] J. Mandel and W. L. Miranker, "New techniques for fast hybrid solutions of systems of equations," Int. J. Numer. Meth. Eng., vol. 27, pp. 455-467, 1989.
[28] G. T. Herman, A. K. Louis, and F. Natterer, Eds., Mathematical Methods in Tomography. Berlin, Germany: Springer-Verlag, 1991.
[29] L. Ljung, "Analysis of recursive stochastic algorithms," IEEE Trans. Automat. Contr., vol. AC-22, pp. 551-575, 1977.
[30] B. T. Polyak, "New method of stochastic approximation type," Automat. Remote Contr., vol. 51, pp. 937-947, 1990.
[31] G. Yin and Y. Zhu, "Averaging procedures in adaptive filtering: An efficient approach," IEEE Trans. Automat. Contr., vol. 37, pp. 466-475, 1992.
[32] D. P. O'Leary, "Robust regression computation using iteratively reweighted least squares," SIAM J. Matrix Anal. Appl., vol. 11, pp. 466-480, 1990.
[33] F. R. Hampel, E. M. Ronchetti, and W. A. Stahel, Robust Statistics: The Approach Based on Influence Functions. New York: Wiley, 1987.
[34] J. Alspector, J. W. Gannett, S. Haber, M. B. Parker, and R. Chu, "A VLSI-efficient technique for generating multiple uncorrelated noise sources and its application to stochastic neural networks," IEEE Trans. Circ. Syst., vol. 38, pp. 109-123, 1991.
[35] R. Unbehauen and A. Cichocki, MOS Switched-Capacitor and Continuous-Time Integrated Circuits and Systems: Analysis and Design. New York: Springer-Verlag, 1989.
[36] R. D. DeGroat and E. M. Dowling, "The data least squares problem and channel equalization," IEEE Trans. Signal Process., vol. 41, pp. 407-411, 1993.
[37] A. Cichocki, R. Unbehauen, M. Lendl, and K. Weinzierl, "Neural networks for linear inverse problems with incomplete data especially in applications to signal and image reconstruction," Neurocomputing, 1994.

Andrzej Cichocki received the M.Sc. (with honors), Ph.D., and Habilitate Doctorate (Dr.Sc.) degrees, all in electrical engineering, from the Warsaw Technical University in 1972, 1975, and 1982, respectively. Since 1972, he has been with the Institute of the Theory of Electrical Engineering and Electrical Measurements at the Warsaw Technical University, where he became a full Professor in 1991. Currently he is at the University of Erlangen-Nürnberg as a Research Fellow and Guest Professor. He is the co-author of two books, MOS Switched-Capacitor and Continuous-Time Integrated Circuits and Systems (Springer-Verlag, 1989) and Neural Networks for Optimization and Signal Processing (Teubner-Wiley, 1993). His current research interests lie in the fields of neural computing, MOS analog circuit design, and nonlinear network analysis and synthesis.

Dr. Cichocki was awarded Research Fellowships by the Alexander von Humboldt Foundation, Germany, in 1984 and 1986, and he spent his fellowships with the Lehrstuhl für Allgemeine und Theoretische Elektrotechnik, University of Erlangen-Nürnberg.

Rolf Unbehauen (M'61-SM'82-F'91) received the diploma in mathematics, the Ph.D. degree in electrical engineering, and the qualification for inauguration as academic lecturer from Stuttgart University, Stuttgart, Germany, in 1954, 1957, and 1964, respectively.

From 1955 to 1966 he was a member of the Institute of Mathematics, the Computer Center, and the Institute of Electrical Engineering at Stuttgart University, where he was appointed Associate Professor in 1965. Since 1966 he has been a Full Professor of Electrical Engineering at the University of Erlangen-Nürnberg, Erlangen, Germany. He has published a great number of papers on his research results, has authored four books in German, and is co-author of MOS Switched-Capacitor and Continuous-Time Integrated Circuits and Systems (Springer-Verlag, 1989) and Neural Networks for Optimization and Signal Processing (Teubner-Wiley, 1993). His teaching and research interests are in the areas of network theory and its applications, system theory, and electromagnetics.

Dr. Unbehauen is a member of the Informationstechnische Gesellschaft of Germany and of URSI, Commission C: Signals and Systems.