Fundamentals of Higher Order Neural Networks for Modeling and Simulation
Chapter · October 2012
DOI: 10.4018/978-1-4666-2175-6.ch006
All content following this page was uploaded by Ivo Bukovsky on 01 February 2017.
Fundamentals of Higher Order Neural Networks
for Modeling and Simulation
Madan M. Gupta1,
Ivo Bukovsky2, Noriyasu Homma3, Ashu M. G. Solo4, and Zeng-Guang Hou5
Summary: In this chapter, we provide fundamental principles of higher order neural units (HONUs) and higher order neural networks (HONNs) for modeling and simulation. An essential core of HONNs can be found in higher order weighted combinations of, or correlations between, the input variables of an HONU. Beyond the high-quality nonlinear approximation achieved by static HONUs, the capability of dynamic HONUs for modeling dynamic systems is shown and compared to conventional recurrent neural networks when a practical learning algorithm is used. The potential of continuous dynamic HONUs to approximate systems of high dynamic order is also discussed, as adaptable time delays can be implemented. Using some typical examples, this chapter describes how and why higher order combinations or correlations can be effective for modeling systems.
Keywords: higher order neural networks, higher order neural units, second order
neural networks, second order neural units, sigma-pi networks, pi-sigma networks,
ridge polynomial neural networks, tapped delay line neural networks
1. Introduction
The human brain has more than 10 billion neurons, which have complicated
interconnections, and these neurons constitute a large-scale signal processing and
memory network. The mathematical study of a single neural model and its various
extensions is the first step in the design of a complex neural network for solving a
variety of problems in the fields of signal processing, pattern recognition, control of
complex processes, neurovision systems, and other decision making processes. Neural
network solutions for these problems can be directly used for computer science and
engineering applications.
1 University of Saskatchewan, Saskatoon, SK, Canada, [email protected]
2 Czech Technical University in Prague, Prague, Czech Republic, [email protected]
3 Tohoku University, Sendai, Japan, [email protected]
4 Maverick Technologies America Inc., Wilmington, DE, USA, [email protected]
5 The Chinese Academy of Sciences, Beijing, China, [email protected]
A simple neural model is presented in Figure 1. In terms of information processing,
an individual neuron with dendrites as multiple-input terminals and an axon as a
single-output terminal may be considered a multiple-input/single-output (MISO)
system. The processing functions of this MISO neural processor may be divided into
the following four categories:
(i) Dendrites: They consist of a highly branching tree of fibers and act as input
points to the main body of the neuron. On average, there are 10^3 to 10^4
dendrites per neuron, which form receptive surfaces for input signals to the
neurons.
(ii) Synapse: It is a storage area of past experience (knowledge base). It
provides long-term memory (LTM) to the past accumulated experience. It
receives information from sensors and other neurons and provides outputs
through the axons.
(iii) Soma: The neural cell body is called the soma. It is the large, round central
neuronal body. It receives synaptic information and performs further
processing of the information. Almost all logical functions of the neuron are
carried out in the soma.
(iv) Axon: The neural output line is called the axon. The output appears in the
form of an action potential that is transmitted to other neurons for further
processing.
The electrochemical activities at the synaptic junctions of neurons exhibit a complex
behavior because each neuron makes hundreds of interconnections with other neurons.
Each neuron acts as a parallel processor because it receives action potentials in parallel
from the neighboring neurons and then transmits pulses in parallel to other
neighboring synapses. In terms of information processing, the synapse also performs a
crude pulse frequency-to-voltage conversion as shown in Figure 1.
Figure 1. A simple neural model as a multiple-input (dendrites) and single-output
(axon) processor.
1.1. Neural mathematical operations
In general, it can be argued that the role played by neurons in the brain reasoning
processes is analogous to the role played by a logical switching element in a digital
computer. However, this analogy is too simple. A neuron contains a sensitivity
threshold, adjustable signal amplification or attenuation at each synapse and an
internal structure that allows incoming nerve signals to be integrated over both space
and time. From a mathematical point of view, it may be concluded that the processing
of information within a neuron involves the following two distinct mathematical
operations:
(i) Synaptic operation: The strength (weight) of the synapse is a representation
of the storage of knowledge and thus the memory for previous knowledge.
The synaptic operation assigns a relative weight (significance) to each
incoming signal according to the past experience (knowledge) stored in the
synapse.
(ii) Somatic operation: The somatic operation provides various mathematical
operations such as aggregation, thresholding, nonlinear activation, and
dynamic processing to the synaptic inputs. If the weighted aggregation of the
neural inputs exceeds a certain threshold, the soma will produce an output
signal to its axon.
A simplified representation of the above neural operations for a typical neuron is shown in Figure 2. A biological neuron exhibits some interesting mathematical mapping properties because of its nonlinear operations combined with a threshold in the soma.
If neurons were only capable of carrying out linear operations, the complex human
cognition and robustness of neural systems would disappear.
Figure 2. Simple model of a neuron showing (a) synaptic and (b) somatic operations.
Observations from both experimental and mathematical analysis have indicated
that neural cells can transmit reliable information if they are sufficiently redundant in
numbers. However, in general, a biological neuron has an unpredictable mechanism
for processing information. Therefore, it is postulated that the collective activity
generated by large numbers of locally redundant neurons is more significant than the
activity generated by a single neuron.
1.2. Synaptic operation
As shown in Figure 2, let us consider a neural memory vector of accumulated past experiences, w = [w_1, w_2, ..., w_n]^T in R^n, which is usually called the vector of synaptic weights, and a neural input vector x = [x_1, x_2, ..., x_n]^T in R^n as the current external stimuli. Through the
comparison process between the neural memory w and the input x , the neuron can
calculate a similarity between the usual (memory base) and current stimuli and thus
know the current situation (Kobayashi, 2006). According to the similarity, the neuron
can then derive its internal value as the membrane potential.
A similarity measure u can be calculated as an inner product of the neural memory vector w and the current input vector x given by

u = w^T x = w_1 x_1 + w_2 x_2 + ... + w_n x_n = Σ_{i=1}^{n} w_i x_i   (1)
The similarity implies the linear combination of the neural memory and the current
input, or correlation between them. This idea can be traced back to the milestone
model proposed by McCulloch and Pitts (1943).
As shown in Figure 3, the inner product can also be represented as

u = w^T x = ||w|| ||x|| cos θ   (2)

where ||·|| denotes the Euclidean norm (length) of a vector and θ is the angle between the vectors w and x.
Figure 3. Inner product as a measure of similarity between a neural memory (past
experience) w and a neural input (current experience) x .
When a current input x points in the same or a very similar direction as the neural memory w, the similarity measure u becomes large, and the correlation between the memory w and the input x becomes strongly positive because cos θ ≈ 1. If the input x points in the opposite or nearly opposite direction of the memory w, the absolute value of the similarity measure |u| also becomes large, but the negative correlation becomes strong because cos θ ≈ −1. In these two cases, the magnitudes of the memory w and the input x also influence the similarity measure. The other particular case is that the input x and the memory w are orthogonal to each other. In this case, the similarity measure u becomes very small because cos θ ≈ 0. If the two vectors are strictly orthogonal, the similarity measure u is equal to 0; the similarity measure is then independent of the magnitudes of the memory w and the input x.

The inner product indicates how similar the directions of two vectors are to each other. Indeed, in the case of normalized vectors w and x, i.e., ||w|| = ||x|| = 1, the similarity measure is nothing but cos θ:

u = ||w|| ||x|| cos θ = cos θ   (3)
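The similarity measure of eqns. (1)-(3) can be sketched numerically as follows; this is an illustrative example with made-up vectors, not material from the chapter.

```python
import numpy as np

# Sketch: the similarity measure u as an inner product of a memory vector w
# and a current input x, eqn. (1); with normalized vectors it reduces to
# cos(theta), eqn. (3). The example vectors are arbitrary.
def similarity(w, x):
    return float(np.dot(w, x))  # u = w^T x

w = np.array([1.0, 2.0, 2.0])        # neural memory (past experience)
x_same = np.array([0.5, 1.0, 1.0])   # input pointing in the same direction as w
x_orth = np.array([2.0, -1.0, 0.0])  # input orthogonal to w

u1 = similarity(w, x_same)           # large positive: directions agree
u2 = similarity(w, x_orth)           # zero: cos(theta) = 0

# With normalized vectors the measure is exactly cos(theta), eqn. (3)
cos_theta = similarity(w / np.linalg.norm(w), x_same / np.linalg.norm(x_same))
```

Here `u1` is large and positive while `u2` is exactly zero, matching the geometric discussion above.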
Note that the linear combination can be extended to higher order combinations, as in the following section.
1.2.1. Higher Order Terms of Neural Inputs
In the linear combination given in eqn. (1), we considered a neural input vector
consisting of only the first order terms of neural inputs in the polynomial. Naturally,
we can extend the first order terms to the higher order terms of the neural inputs or any
other nonlinear ones. To separate different classes of data with a nonlinear
discriminant line, an HONN (Rumelhart et al., 1986a; Giles and Maxwell, 1987;
Softky and Kammen, 1991; Xu et al., 1992; Taylor and Commbes, 1993; Homma and
Gupta, 2002) is used. An HONN is composed of one or more HONUs.
Here let us consider the second order polynomial of the neural inputs. In this case, the extended neural input and memory vectors, x_a and w_a, can be defined by

x_a = [x_1, x_2, ..., x_n, x_1^2, x_1 x_2, ..., x_1 x_n, x_2^2, ..., x_{n−1} x_n, x_n^2]^T   (4)

w_a = [w_1, w_2, ..., w_n, w_11, w_12, ..., w_1n, w_22, ..., w_{(n−1)n}, w_nn]^T   (5)
Then the similarity measure can be given with the same notation:

u_a = w_a^T x_a
    = w_1 x_1 + w_2 x_2 + ... + w_n x_n + w_11 x_1^2 + w_12 x_1 x_2 + ... + w_1n x_1 x_n + w_22 x_2^2 + ... + w_{(n−1)n} x_{n−1} x_n + w_nn x_n^2
    = Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i}^{n} w_ij x_i x_j   (6)
The second order terms x_i x_j can be related to correlations between the two inputs x_i and x_j. That is, if the two inputs are statistically independent of each other, then the second order terms become 0, while the absolute values of the terms become large if there is a linear relation between them. The squared terms of the neural inputs, x_i^2, indicate the power of the inputs from the physical point of view.
Consequently, the similarity measure with general higher order terms can be defined as

u_a = Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i}^{n} w_ij x_i x_j + ... + Σ_{i_1=1}^{n} Σ_{i_2=i_1}^{n} ... Σ_{i_N=i_{N−1}}^{n} w_{i_1 i_2 ... i_N} x_{i_1} x_{i_2} ... x_{i_N}   (7)
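Building the augmented second order input of eqn. (4) and evaluating eqn. (6) can be sketched as follows; the function name and the example weights are ours, not the chapter's.

```python
import itertools
import numpy as np

# Sketch: build the second order augmented input x_a of eqn. (4) and
# evaluate u_a = w_a^T x_a, eqn. (6). Only products x_i * x_j with j >= i
# are kept, so each correlation term appears once.
def second_order_inputs(x):
    x = list(x)
    pairs = [x[i] * x[j]
             for i, j in itertools.combinations_with_replacement(range(len(x)), 2)]
    return np.array(x + pairs)  # first order terms, then second order terms

x = [2.0, 3.0]
x_a = second_order_inputs(x)            # [x1, x2, x1^2, x1*x2, x2^2]
w_a = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # arbitrary example weights
u_a = float(np.dot(w_a, x_a))           # similarity measure of eqn. (6)
```

For n = 2 the augmented vector has the five entries [x_1, x_2, x_1^2, x_1 x_2, x_2^2], matching the ordering in eqn. (4).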
1.3. Somatic operation
Typical neural outputs are generated by a sigmoidal activation function of the similarity measure u, the inner product of the neural memories (past experiences) and the current inputs. In this case, the neural output y ∈ R can be given as

y = φ(u)   (8)

where φ is a neural activation function. An example of the activation function is the so-called sigmoidal function given by

φ(u) = 1 / (1 + exp(−u))   (9)
and shown in Figure 4.
Figure 4. A sigmoidal activation function.
Note that the activation function is not limited to the sigmoid one. However, this
type of sigmoid function has been widely used in various fields. Here if the similarity u
is large—that is, the current input x is similar to the corresponding neural memory
w —the neural output y is also large. On the other hand, if the similarity u is small,
the neural output y is also small. This is a basic characteristic of biological neural
activities. Note that the neural output is not proportional to the similarity u but is a nonlinear function of u with saturation characteristics. This nonlinearity might be a key mechanism in making neural activities as complex as those of the brain.
1.4. Learning from experiences
From the computational point of view, we have discussed how neurons, which are
elemental computational units in the brain, produce outputs y as the results of neural
information processing based on comparison of current external stimuli x with neural
memories of past experiences w . Consequently, the neural outputs y are strongly
dependent on the neural memories w. Thus, how neurons can memorize past
experiences is crucial for neural information processing. Indeed, one of the most
remarkable features of the human brain is its ability to adaptively learn in response to
knowledge, experience, and environment. The basis of this learning appears to be a
network of interconnected adaptive elements by means of which transformation
between inputs and outputs is performed.
Learning can be defined as the acquisition of new information. In other words,
learning is a process of memorizing new information. Adaptation implies that the
element can change in a systematic manner and in so doing alter the transformation
between input and output. In the brain, transmission within the neural system
involves coded nerve impulses and other physical chemical processes that form
reflections of sensory stimuli and incipient motor behavior.
Many biological aspects are associated with such learning processes, including (Harston, 1990):
- Learning overlays hardwired connections
- Synaptic plasticity versus stability: a crucial design dilemma
- Synaptic modification providing a basis for observable organism behavior
Here, we have presented the basic foundation of neural networks starting from a
basic introduction to the biological foundations, neural models, and learning properties
inherent in neural networks. The rest of the chapter contains the following five
sections:
In section 2, as the first step to understanding HONNs, we will develop a general
matrix form of the second order neural units (SONUs) and the learning algorithm.
Using the general form, it will be shown that, from the point of view of both the neural
computing process and its learning algorithm, the widely used linear combination
neural units described above are only a subset of the developed SONUs.
In section 3, we will conduct some simulation studies to support the theoretical
development of second order neural networks (SONNs). The results will show how and
why SONNs can be effective for many problems.
In section 4, HONUs and HONNs with a learning algorithm will be presented.
Toward computer science and engineering applications, function approximation and
time series analysis problems will be considered in section 5.
Concluding remarks and future research directions will be given in section 6.
2. Second Order Neural Units and Second Order Neural Networks
Neural networks, consisting of first order neurons which provide the neural output as a
nonlinear function of the weighted linear combination of neural inputs, have been
successfully used in various applications such as pattern recognition/classification,
system identification, adaptive control, optimization, and signal processing (Sinha et al.,
1999; Gupta et al., 2003; Narendra and Parthasarathy, 1990; Cichocki and Unbehauen, 1993).
The higher order combination of the inputs and weights will yield higher neural
performance. However, one of the disadvantages encountered in the previous
development of HONUs is the larger number of learning parameters (weights) required
(Schmidt, 1993). To optimize the features space, a learning capability assessment
method has been proposed by Villalobos and Merat (1995).
In this section, in order to reduce the number of parameters without loss of higher performance, an SONU is presented (Homma and Gupta, 2002); an SONU is also sometimes denoted as a quadratic neural unit (Bukovsky et al., 2010). Using a general matrix form of the second order operation, the SONU provides the output as a nonlinear function of the weighted second order combination of input signals. Note that the matrix form can contribute to high speed computing, such as parallel and vector processing, which is essential for scientific and image processing.
2.1. Formulation of the second order neural unit
An SONU with n-dimensional neural inputs, x(t) ∈ R^n, and a single neural output, y(t) ∈ R, is developed in this section (Figure 5). Let x_a = [x_0, x_1, ..., x_n]^T ∈ R^{n+1}, x_0 = 1, be an augmented neural input vector. Here a new second order aggregating formulation is proposed by using an augmented weight matrix W_a(t) ∈ R^{(n+1)×(n+1)} as

u = x_a^T W_a x_a   (10)

Then the neural output, y, is given by a nonlinear function of the variable u as

y = φ(u)   (11)
Figure 5. An SONU defined by eqns. (10) and (11).
Because both the weights w_ij and w_ji, i, j ∈ {0, 1, ..., n}, in the augmented weight matrix W_a yield the same second order term x_i x_j (or x_j x_i), an upper triangular matrix or lower triangular matrix is sufficient. For instance, instead of separately determining values for w_01 and w_10, both of which are weights for x_0 x_1, one can eliminate one of these weights and let the remaining one carry as much as both of them combined would if they were computed separately. This saves time in the neural network's computationally intensive procedure of adapting weights. The same applies to the other redundant weights. The discriminant function can then be reexpressed as the transpose of the vector of neural inputs multiplied by the upper triangular matrix of neural weights multiplied by the vector of neural inputs again:

u = x_a^T W_a x_a = Σ_{i=0}^{n} Σ_{j=i}^{n} w_ij x_i x_j   (12)
0 1,xWx (12)
The number of elements in the matrix of neural weights with redundant elements included is (n+1)·(n+1). To calculate the number of elements in the final matrix of neural weights, W_a, with the redundant elements eliminated, first take the total number of elements, (n+1)·(n+1), and subtract the number of diagonal elements, n+1. Dividing this by 2 gives the number of elements above (or below) the diagonal. Then add back the number of diagonal elements. Therefore, the number of elements in W_a with redundant elements eliminated is given as

((n+1)·(n+1) − (n+1)) / 2 + (n+1) = (n^2 + 3n + 2) / 2
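The counting argument above can be checked with a few lines of code; the function name is ours.

```python
# Sketch: number of free weights in the upper triangular matrix W_a of an
# n-input SONU, following the counting steps in the text.
def sonu_weight_count(n):
    full = (n + 1) * (n + 1)          # all elements, redundant ones included
    off_diag = (full - (n + 1)) // 2  # elements strictly above the diagonal
    return off_diag + (n + 1)         # add back the diagonal

# n = 2 gives the six parameters (including the threshold) cited for the
# XOR problem in section 3.1.
assert sonu_weight_count(2) == 6
```

The result agrees with the closed form (n^2 + 3n + 2)/2 for every n.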
Note that the conventional first order weighted linear combination is only a special case of this second order matrix formulation. For example, the special weight matrix whose only nonzero row is the first one, Row_0(W_a) = [w_00, w_01, ..., w_0n] ∈ R^{1×(n+1)}, produces the equivalent weighted linear combination u = Σ_{j=0}^{n} w_0j x_j (with x_0 = 1, so w_00 acts as the bias). Therefore, the proposed neural model with the second order matrix operation is more general and, for this reason, it is called an SONU.
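The forward pass of eqns. (10) and (11) can be sketched as follows; the function names and the tanh activation are our choices for illustration, not the chapter's.

```python
import numpy as np

# Minimal sketch of the SONU forward pass, eqns. (10)-(11): W_a is an upper
# triangular augmented weight matrix and x_a = [1, x_1, ..., x_n]^T.
def sonu_forward(W_a, x, phi=np.tanh):
    x_a = np.concatenate(([1.0], x))  # augmented input with x_0 = 1
    u = x_a @ W_a @ x_a               # u = x_a^T W_a x_a, eqn. (10)
    return phi(u), u                  # y = phi(u), eqn. (11)

n = 2
W_a = np.triu(np.full((n + 1, n + 1), 0.1))  # example upper triangular weights
y, u = sonu_forward(W_a, np.array([1.0, -1.0]))
```

If only the first row of W_a is nonzero, the same function reduces to the conventional weighted linear combination, illustrating the special case described above.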
2.2. Learning algorithms for second order neural units
Here learning algorithms are developed for SONUs. Let k denote the discrete time step, k = 1, 2, ..., and y_d(k) ∈ R be the desired output signal corresponding to the neural input vector x(k) ∈ R^n at the k-th time step. A square error, E(k), is defined from the error e(k) = y_d(k) − y(k) as

E(k) = (1/2) e(k)^2   (13)

where y(k) is the neural output corresponding to the neural input x(k) at the k-th time instant.
The purpose of the neural unit's learning is to minimize the error E by adapting the weight matrix W_a as

W_a(k+1) = W_a(k) + ΔW_a(k)   (14)

Here ΔW_a(k) denotes the change in the weight matrix, which is defined as proportional to the negative gradient of the error function E(k):

ΔW_a(k) = −η ∂E(k)/∂W_a(k)   (15)
where η > 0 is a learning coefficient. The derivatives ∂E/∂w_ij, i, j ∈ {0, 1, ..., n}, are calculated by the chain rule as

∂E(k)/∂w_ij(k) = (∂E(k)/∂y(k)) (∂y(k)/∂u(k)) (∂u(k)/∂w_ij(k)) = −e(k) φ'(u(k)) x_i(k) x_j(k)   (16)

or, in matrix form,

∂E(k)/∂W_a(k) = −e(k) φ'(u(k)) x_a(k) x_a^T(k)   (17)
The changes in the weight matrix are then given by

ΔW_a(k) = η e(k) φ'(u(k)) x_a(k) x_a^T(k)   (18)

Here φ'(u) is the slope of the nonlinear activation function used in eqn. (11). For activation functions such as the sigmoidal function, φ'(u) > 0, and φ'(u) can be regarded as a gain of the changes in the weights. Then

ΔW_a(k) = η_g e(k) x_a(k) x_a^T(k)   (19)

where η_g = η φ'(u). Note that, taking the average of the changes over some input vectors, the change in the weights, Δw_ij(k), implies the correlation between the error e(k) and the corresponding input term x_i(k) x_j(k).
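One adaptation step of eqns. (13)-(19) can be sketched as follows; variable names are ours, and φ(u) = u is assumed for simplicity (so φ'(u) = 1), as in the time series setting of Table I.

```python
import numpy as np

# Sketch of the static gradient learning step of eqns. (13)-(19) with
# phi(u) = u. Only the upper triangle of W_a is adapted, since the lower
# weights are redundant (each pair x_i*x_j has one weight, eqn. (12)).
def sonu_update(W_a, x_a, y_d, eta_g=0.1):
    u = x_a @ W_a @ x_a                            # forward pass, eqn. (10)
    e = y_d - u                                    # error e(k) = y_d(k) - y(k)
    dW = np.triu(eta_g * e * np.outer(x_a, x_a))   # eqn. (19), upper triangle
    return W_a + dW, e

W_a = np.zeros((3, 3))
x_a = np.array([1.0, 0.5, -0.5])           # augmented input, x_0 = 1
W_a, e = sonu_update(W_a, x_a, y_d=1.0)    # one adaptation step
u_new = x_a @ W_a @ x_a                    # output moves toward y_d
```

After the step, the output for the same input is closer to the desired value, as the gradient derivation above predicts.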
Therefore, conventional learning algorithms such as the backpropagation algorithm
can easily be extended for multilayered neural network structures having the proposed
SONUs.
In Table I, fundamental learning rules of static and dynamic SONUs are summarized (for clarity, with the simplification φ(u) = u) for the case of time series prediction. As an extension of the above static learning rule of SONUs, the update rule of dynamic SONUs includes the recurrently calculated derivatives of the neural output, ∂y_n(k + n_s)/∂w_ij, where j_ij denotes columns of a recurrently calculated Jacobian matrix (Table I).

Table I. Summary of fundamental static and dynamic learning techniques for SONUs for time series prediction, where φ(u) = u for simplicity.
[Table I, condensed reconstruction; the original two-column layout (SONU mathematical structure vs. learning rule) is flattened here.]

Static SONU:
Mathematical structure: y_n = x_a^T W x_a = Σ_{i=0}^{n} Σ_{j=i}^{n} w_ij x_i x_j, where x_a = [1, x_1, ..., x_n]^T, x_1, x_2, ..., x_n are external neural inputs, y_n is the neural output, and W is the upper triangular weight matrix.
- Gradient descent (k ... sample number):
  Δw_ij(k) = −(1/2) η ∂e(k)^2/∂w_ij = η e(k) x_i(k) x_j(k),  w_ij(k+1) = w_ij(k) + Δw_ij(k)
- Levenberg-Marquardt (L-M):
  Δw_ij = (j_ij^T j_ij + (1/μ) I)^{−1} j_ij^T e, where e = [e(1), e(2), ..., e(N)]^T, j_ij = ∂y_n/∂w_ij = [x_i(1) x_j(1), x_i(2) x_j(2), ..., x_i(N) x_j(N)]^T, and N is the number of samples (data length).

Discrete dynamic SONU:
Mathematical structure: y_n(k + n_s) = x_a^T W x_a, where the augmented input x_a = [1, x_1(k), ..., x_m(k), y_n(k + n_s − 1), y_n(k + n_s − 2), ..., y_n(k + 1)]^T contains both external inputs and fed-back neural outputs; typically for prediction, x_1(k) = y_r(k), ..., x_m(k) = y_r(k − m + 1), where y_r is the real (measured) value.
- Recurrent gradient descent (RTRL):
  Δw_ij(k) = η e(k) ∂y_n(k + n_s)/∂w_ij, where the derivatives are calculated recurrently as ∂y_n(k + n_s)/∂w_ij = j_ij^T W x_a + x_a^T (∂W/∂w_ij) x_a + x_a^T W j_ij, with j_ij = ∂x_a/∂w_ij = [0, ..., 0, ∂y_n(k + n_s − 1)/∂w_ij, ..., ∂y_n(k + 1)/∂w_ij]^T.
- Backpropagation through time (BPTT): the BPTT learning technique may be implemented as the combination of (a) RTRL for recurrent calculation of neural outputs and their derivatives (with respect to weights) at every sample time k, and (b) the Levenberg-Marquardt algorithm for calculation of the weight increments ΔW once the recurrent calculations are accomplished.
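A heavily simplified sketch of the discrete prediction setting of Table I follows. All names are ours; the augmented input holds a bias and tapped delays of the measured signal, and the weights adapt by a plain gradient step. Full RTRL would additionally propagate the output derivatives ∂y/∂w recurrently through the feedback, as the table summarizes.

```python
import numpy as np

# Simplified sketch of one-step-ahead SONU prediction with phi(u) = u:
# x_a = [1, delayed samples], upper triangular W adapted by an
# eqn. (19)-style gradient step. This truncates the recurrent derivative
# terms that full RTRL (Table I) would include.
def predict_series(series, n_taps=3, eta=0.01):
    dim = n_taps + 1
    W = np.zeros((dim, dim))
    errors = []
    for k in range(n_taps, len(series)):
        x_a = np.concatenate(([1.0], series[k - n_taps:k]))  # tapped delays
        y = x_a @ W @ x_a                     # prediction of series[k]
        e = series[k] - y                     # one-step prediction error
        errors.append(abs(e))
        W += np.triu(eta * e * np.outer(x_a, x_a))  # gradient update
    return errors, W

series = np.sin(0.2 * np.arange(400))  # toy periodic signal
errors, W = predict_series(series)
```

On this toy signal the prediction error shrinks as the weights adapt, which is the qualitative behavior the simulation study below examines on richer signals.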
3. Performance Assessment of Second Order Neural Units
To evaluate the learning and generalization abilities of the proposed general SONUs,
the XOR classification problem is used. The XOR problem will provide a simple
example of how well an SONU works for the nonlinear classification problem.
3.1. XOR problem
Because the two-input XOR function is not linearly separable, it is one of the simplest
logic functions that cannot be realized by a single linear combination neural unit.
Therefore, it requires a multilayered neural network structure consisting of linear
combination neural units.
On the other hand, a single SONU can solve this XOR problem by using its general second order function defined in eqn. (12). To implement the XOR function using a single SONU, the four learning patterns corresponding to the four combinations of two binary inputs, (x_1, x_2) ∈ {(−1, −1), (−1, 1), (1, −1), (1, 1)}, and the desired outputs y_d(x_1, x_2) ∈ {−1, 1} were applied to the SONU.
For the XOR problem, the neural output, y, is defined by the signum function as y = φ(u) = sgn(u). The correlation learning algorithm with a constant gain, η_g = 1, in eqn. (19) was used in this case. The learning was terminated as soon as the error converged to 0. Because the SONU with the signum function classifies the neural input data by using the second order nonlinear function of the neural inputs, x_a^T W_a x_a, as in eqn. (10), many nonlinear classification boundaries are possible, such as a hyperbolic boundary and an elliptical boundary (Table II).
Table II. Initial weights (k = 0), final weights, and the classification boundaries for the XOR problem.
Note that the results of the classification boundary are dependent on the initial weights
(Table II), and any classification boundary by the second order functions can be realized
by a single SONU. This realization ability of the SONU is obviously superior to the
linear combination neural unit, which cannot achieve such nonlinear classification
using a single neural unit. At least three linear combination neural units in a layered
structure are needed to solve the XOR problem.
Secondly, the number of parameters (weights) required for solving this problem can
be reduced by using the SONU. In this simulation study, by using the upper
triangular weight matrix, only six parameters including the threshold were required for
the SONU whereas at least nine parameters were required for the layered structure
with three linear combination neural units.
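The XOR experiment of this section can be reproduced in miniature as follows; the random initialization and the small training loop are ours, while the ±1 patterns, signum output, and η_g = 1 correlation rule follow the text.

```python
import numpy as np

# Sketch of the XOR experiment: a single SONU with signum activation,
# trained with the eqn. (19) correlation rule at gain eta_g = 1 until the
# error converges to 0. Initialization is an arbitrary choice of ours.
patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
targets = [-1, 1, 1, -1]                  # XOR with -1 as logical false

rng = np.random.default_rng(0)
W = np.triu(rng.normal(0.0, 0.5, (3, 3))) # upper triangular initial weights

for epoch in range(100):
    errors = 0
    for (x1, x2), y_d in zip(patterns, targets):
        x_a = np.array([1.0, x1, x2])
        y = np.sign(x_a @ W @ x_a)        # y = sgn(u), eqn. (10)
        e = y_d - y
        if e != 0:
            errors += 1
            W += np.triu(e * np.outer(x_a, x_a))  # eqn. (19), eta_g = 1
    if errors == 0:
        break                             # all four patterns classified
```

Only the six upper triangular weights are adapted, matching the parameter count discussed above, and the learned w_12 carries the dominant negative contribution of the x_1 x_2 term.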
Each weight w_ij represents how the corresponding input correlation term x_i x_j affects the neural output. If the absolute value of the weight is very small, then the
effect of the corresponding input term on the output may also be very small. On the
other hand, the corresponding term may be dominant or important if the absolute value
of the weight is large compared to the other weights.
The weights in Table II suggest that the absolute value of w_12 is always large independent of the initial values, and it is the largest except in only one case (middle row, where it is still the second largest). The absolute value of w_00 is the largest in one case (middle row) among the three cases, but the smallest in another (top row). The input term corresponding to the weight w_00 is nothing but the bias. Note that a large |w_12| implies a large contribution of the correlation term x_1 x_2 to the output, and that the contribution of the term may be negative because w_12 < 0. Indeed, the target XOR function can be defined as y = −x_1 x_2.
Consequently, if the target (unknown) function involves a higher order combination of the input variables, higher order neural units can be superior to neural units that lack the necessary higher order input terms. Of course, this discussion concerns only the synaptic operation; the somatic operation may also create higher order terms in the sense of a Taylor expansion of the nonlinear activation function. However, such higher order terms produced by the somatic operation may be limited or indirect. Thus, the direct effect of the higher order terms is a reason why higher order neural units can be effective for problems that involve higher order terms of the input variables.
3.2. Time Series Prediction
In this subsection, the time series prediction performance of dynamic SONUs (Figure 7), adapted by dynamic gradient descent (RTRL), is demonstrated and compared to single-hidden-layer perceptron-type recurrent neural networks with various numbers of sigmoid neurons in the hidden layer (from 3 to 10) and two recurrent configurations: a recurrent hidden layer (RNN) and tapped delay feedbacks of the neural output (TptDNN). For comparison of the performance, extensive simulation analysis was performed on the theoretical and real data shown in Figure 6; white noise was also added to the training and testing data to compare the generalization and overfitting of the SONUs.
[Figure 6 signal panels (amplitude vs. sample index k, k = 0 to 2000): Art-1 Quasiperiodic; Art-2 Nonlinear periodic; Art-3 Artificial ECG; Art-4 Lorenz system; Art-5 Mackey-Glass; Real-1 Respiration; Real-2 Real ECG; Real-3 EEG; Real-4 R-R.]
Figure 6. All signals (clean data) that were used in the experimental study. The first 1000 samples were training data; samples k = 1001-2000 were used as testing data.
Table III. Total counts of simulation experiments with SONU (QNU), recurrent perceptron-type neural networks (RNN), and tapped-delay neural networks (TptDNN) with a single hidden layer and various numbers of hidden neurons (3, 5, or 7).
Table IV. The percentage of simulation runs in which each tested neural architecture performed better than the average of all tested architectures, measured by the sum of square errors (SSE).
Table V. Count of the neural architecture types that reached the absolute minimum SSE for three prediction horizons (after averaging results over three levels of noise distortion).

[Table V data; the grouped column header (QNU / RNN / TptDNN under the groups pure, smooth, G, and OF) could not be fully recovered. Per-dataset counts, column totals, and percentages:
  Art-1 Quasiperiodic:  3 1 1 3 1 2 8 3
  Art-2 NonlinPeriodic: 3 2 3 3 11
  Art-3 ECG_Art:        3 2 3 3 11
  Art-4 Lorenz:         3 2 3 2 1 8 3
  Art-5 MacKeyGlass:    3 2 3 3 11
  Real-1 Respiration:   2 1 2 3 2 1 9 1 1
  Real-2 ECG_Real:      3 2 3 3 11
  Real-3 EEG:           2 1 2 1 2 1 2 6 2 3
  Real-4 RR:            3 2 3 3 11
  Column totals:        22 4 1 | 15 3 | 22 5 | 19 2 6
  Percentage:           81% 15% 4% | 83% 17% | 81% 19% | 70.4% 7.4% 22.2% (each group sums to 100%)]
[Table IV data. Percentages of better-than-average SSE; column groups G (pure, smooth) and OF (pure, smooth) plus the row average, each listing QNU / RNN / TptDNN.]

  data info              G, pure        G, smooth      OF, pure       OF, smooth     Row average
  Art-1 Quasiperiodic    81% 46% 50%    89% 49% 59%    91% 85% 69%    89% 76% 67%    87% 64% 61%
  Art-2 NonlinPeriodic   81% 46% 50%    84% 49% 56%    96% 75% 70%    90% 59% 57%    88% 57% 58%
  Art-3 ECG_Art         100% 40% 33%    96% 43% 43%    93% 58% 59%    97% 48% 51%    97% 47% 46%
  Art-4 Lorenz           76% 47% 46%    81% 48% 52%    81% 80% 69%    80% 79% 71%    80% 63% 59%
  Art-5 MacKeyGlass      89% 51% 54%    76% 50% 50%    78% 65% 56%    82% 57% 55%    81% 56% 54%
  Real-1 Respiration     82% 56% 51%    82% 50% 57%    97% 69% 59%    84% 59% 57%    86% 58% 56%
  Real-2 ECG_Real       100% 33% 36%    97% 34% 36%    95% 55% 57%    97% 43% 42%    97% 41% 43%
  Real-3 EEG             81% 63% 63%    61% 45% 46%    40% 73% 62%    37% 68% 59%    55% 62% 58%
  Real-4 RR              89% 42% 49%    55% 44% 64%    83% 57% 57%    95% 52% 57%    81% 49% 56%
  Column Average         86% 47% 48%    80% 46% 51%    84% 68% 62%    84% 60% 57%
[Figure 7 block diagram: delay operators \(z^{-1}\) feed previous neural outputs back to the input, where they are combined with the external inputs into the augmented input vector \(\mathbf{x}(k)\); the unit computes the quadratic synaptic operation \(u = \sum_{i} \sum_{j \ge i} x_i x_j w_{ij} = \mathbf{x}_a^T \mathbf{W} \mathbf{x}_a\) over the \(n_s + n_r\) components.]
Figure 7. Schematic of the recurrent QNU with \(n_s - 1\) state feedbacks (recurrences) and \(n_r\) external inputs (real measured values) as used for time series prediction.
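One prediction step of such a recurrent QNU can be sketched as follows (variable names and dimensions are assumed for illustration, not taken from the original implementation):

```python
import numpy as np

# One prediction step of a recurrent QNU in the spirit of Figure 7:
# delayed neural outputs and external inputs form an augmented vector,
# and the output is the quadratic form x^T W x.
def qnu_step(prev_outputs, ext_inputs, W):
    x = np.concatenate(([1.0], prev_outputs, ext_inputs))  # leading 1 yields the bias terms
    return x @ W @ x

n_s, n_r = 3, 2                            # assumed numbers of feedbacks and inputs
dim = 1 + n_s + n_r
rng = np.random.default_rng(0)
W = np.triu(rng.normal(size=(dim, dim)))   # upper-triangular weight matrix
y = qnu_step(np.zeros(n_s), np.ones(n_r), W)
```

In adaptive prediction, `y` would be delayed and fed back as the next element of `prev_outputs` while `W` is updated by RTRL.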
4. Higher Order Neural Units and Higher Order Neural Networks
To capture the higher order nonlinear properties of the input pattern space, extensive
efforts have been made by Rumelhart et al. (1986), Giles and Maxwell (1987), Softky
and Kammen (1991), Xu et al. (1992), Taylor and Coombes (1993), and Homma and
Gupta (2002) toward developing architectures of neurons that are capable of capturing
not only the linear correlation between components of input patterns, but also the
higher order correlation between components of input patterns. HONNs have proven
to have good computational, storage, pattern recognition, and learning properties and
are realizable in hardware (Taylor and Coombes, 1993). Regular polynomial networks
that contain the higher order correlations of the input components satisfy the
Stone-Weierstrass theorem that is a theoretical background of universal function
approximators by means of neural networks (Gupta et al., 2003), but the number of
weights required to accommodate all the higher order correlations increases
exponentially with the number of the inputs. HONUs are the basic building block for
such an HONN. For such an HONN as shown in Figure 8, the output is given by
\[ y = \varphi(u) \qquad (20) \]
\[ u = w_0 + \sum_{i_1=1}^{n} w_{i_1} x_{i_1} + \sum_{i_1=1}^{n} \sum_{i_2=i_1}^{n} w_{i_1 i_2} x_{i_1} x_{i_2} + \cdots + \sum_{i_1=1}^{n} \cdots \sum_{i_N=i_{N-1}}^{n} w_{i_1 \cdots i_N} x_{i_1} \cdots x_{i_N} \qquad (21) \]
where \(\mathbf{x} = [x_1, x_2, \ldots, x_n]^T\) is a vector of neural inputs, \(y\) is the output, and \(\varphi(\cdot)\) is a strictly monotonic activation function, such as a sigmoidal function, whose inverse \(\varphi^{-1}(\cdot)\) exists. The summation for the \(j\)th-order correlation is taken over the set \(C(i_1 \cdots i_j)\), \(1 \le j \le N\), the set of combinations of \(j\) indices defined by
\[ C(i_1 \cdots i_j) = \{\, i_1 i_2 \cdots i_j : 1 \le i_1 \le i_2 \le \cdots \le i_j \le n \,\}, \quad 1 \le j \le N. \]
Also, the number of \(j\)th-order correlation terms is given by
\[ \binom{n+j-1}{j} = \frac{(n+j-1)!}{j!\,(n-1)!}, \quad 1 \le j \le N. \]
The set \(C(i_1 \cdots i_j)\) is introduced to absorb the redundant terms that arise from the symmetry of the induced combinations. In fact, eqn. (21) is a truncated Taylor series with adjustable coefficients. The \(N\)th-order neural unit thus needs a total of
\[ \sum_{j=0}^{N} \binom{n+j-1}{j} = \sum_{j=0}^{N} \frac{(n+j-1)!}{j!\,(n-1)!} \]
weights, including the bias, to cover all products of up to \(N\) components.
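This count is easy to compute; a small sketch using Python's standard-library `math.comb`:

```python
from math import comb

# Total number of weights of an Nth-order neural unit with n inputs,
# per the summation above: sum_{j=0}^{N} C(n+j-1, j).
def honu_weight_count(n, N):
    return sum(comb(n + j - 1, j) for j in range(N + 1))

# For the N = 3, n = 2 case of Example 1 below: 1 + 2 + 3 + 4 = 10 weights.
assert honu_weight_count(2, 3) == 10
```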
Figure 8. Block diagram of the HONU, eqns. (20) and (21).
Example 1 In this example, we consider a case of the third order (N = 3) neural network with two neural inputs (n = 2). Here
\[ C(i_1) = \{0, 1, 2\}, \quad C(i_1 i_2) = \{11, 12, 22\}, \quad C(i_1 i_2 i_3) = \{111, 112, 122, 222\} \]
where the index 0 corresponds to the bias, and the network equation is
\[ y = \varphi\big(w_0 + w_1 x_1 + w_2 x_2 + w_{11} x_1^2 + w_{12} x_1 x_2 + w_{22} x_2^2 + w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{222} x_2^3\big). \]
The HONUs may be used as hidden units in conventional feedforward neural network structures to form HONNs. In this case, consideration of the higher order correlations may improve the approximation and generalization capabilities of the neural networks. Typically, only SONNs are employed in practice, to keep the number of weights tolerable, as discussed in Sections 2 and 3. On the other hand, if the order of the HONU is high enough, eqns. (20) and (21) may themselves be considered a neural network with n inputs and a single output. This structure is capable of dealing with problems of function approximation and pattern recognition.
To accomplish an approximation task for given input-output data \(\{\mathbf{x}(k), y(k)\}\), a learning algorithm for the HONN can easily be developed on the basis of the gradient descent method. Assume that the error function is formulated as
\[ E(k) = \frac{1}{2}\,[d(k) - y(k)]^2 = \frac{1}{2}\, e^2(k) \]
where \(e(k) = d(k) - y(k)\), \(d(k)\) is the desired output, and \(y(k)\) is the output of the neural network. Minimization of the error function by a standard steepest descent technique yields the following set of learning equations:
\[ w_0^{new} = w_0^{old} + \eta\,(d - y)\,\varphi'(u) \qquad (22) \]
\[ w_{i_1 \cdots i_j}^{new} = w_{i_1 \cdots i_j}^{old} + \eta\,(d - y)\,\varphi'(u)\, x_{i_1} x_{i_2} \cdots x_{i_j} \qquad (23) \]
where \(\varphi'(u) = d\varphi(u)/du\) and \(\eta > 0\) is the learning rate. As with the backpropagation algorithm for a multilayered feedforward neural network (MFNN), a momentum version of the above is easily obtained.
Alternatively, because all the weights of the HONN appear linearly in eqn. (21), one may use methods for solving linear algebraic equations to carry out the preceding learning task if the number of patterns is finite. To do so, one introduces the following two augmented vectors
\[ \mathbf{w} = [w_0, w_1, \ldots, w_n, w_{11}, w_{12}, \ldots, w_{nn}, \ldots, w_{n \cdots n}]^T \]
and
\[ \mathbf{u}(\mathbf{x}) = [x_0, x_1, \ldots, x_n, x_1^2, x_1 x_2, \ldots, x_n^2, \ldots, x_n^N]^T \]
where \(x_0 = 1\), so that the network equations, eqns. (20) and (21), may be rewritten in the following compact form:
\[ y = \varphi\big(\mathbf{w}^T \mathbf{u}(\mathbf{x})\big) \qquad (24) \]
For the given \(p\) pattern pairs \(\{\mathbf{x}(k), d(k)\}\), \(1 \le k \le p\), define the following matrix and vector
\[ \mathbf{U} = [\mathbf{u}(1), \mathbf{u}(2), \ldots, \mathbf{u}(p)]^T, \qquad \mathbf{d} = [\varphi^{-1}(d(1)), \varphi^{-1}(d(2)), \ldots, \varphi^{-1}(d(p))]^T \]
where \(\mathbf{u}(k) = \mathbf{u}(\mathbf{x}(k))\), \(1 \le k \le p\). Then the learning problem becomes one of finding a solution of the following linear algebraic equation
\[ \mathbf{U}\mathbf{w} = \mathbf{d} \qquad (25) \]
If the number of weights is equal to the number of data and the matrix \(\mathbf{U}\) is nonsingular, then eqn. (25) has the unique solution
\[ \mathbf{w} = \mathbf{U}^{-1}\mathbf{d}. \]
A more interesting case occurs when the dimension of the weight vector \(\mathbf{w}\) is less than the number of data \(p\). Then an exact solution of the above linear equation exists if and only if
\[ \operatorname{rank}\,\mathbf{U} = \operatorname{rank}\,[\mathbf{U}, \mathbf{d}]. \]
In case this condition is not satisfied, the pseudoinverse solution is usually an option and gives the best fit in the least squares sense.
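Because the learning task reduces to this linear system, the weights of a linear-in-parameters HONU can be recovered by ordinary least squares. A sketch for a second-order unit with two inputs and a hypothetical target function (not from the original study):

```python
import numpy as np

# Because the weights enter eqn. (21) linearly, training reduces to the
# linear least squares problem U w = d.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
d = 0.5 + 2.0 * X[:, 0] * X[:, 1] - X[:, 1] ** 2      # hypothetical targets (phi = identity)

# augmented regressors u(x) = [1, x1, x2, x1^2, x1*x2, x2^2]
U = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])
w, *_ = np.linalg.lstsq(U, d, rcond=None)             # pseudoinverse solution
assert np.allclose(w, [0.5, 0.0, 0.0, 0.0, 2.0, -1.0])  # true weights recovered
```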
The following example shows how to use the HONN presented in this section to deal
with pattern recognition problems that are also typical applications in computer science
and engineering situations. It is of interest to show that solving such problems is
equivalent to finding the decision surfaces in the pattern space such that the given data
patterns are located on the surfaces.
Example 2 Consider the three-variable XOR function defined as
\[ y = f(x_1, x_2, x_3) = x_1 \oplus x_2 \oplus x_3 = (x_1 \oplus x_2) \oplus x_3 = x_1 \oplus (x_2 \oplus x_3) \]
where \(\oplus\) denotes the exclusive-OR operation.
The eight input pattern pairs and corresponding outputs are given in Table VI. This is
a typical nonlinear pattern classification problem. A single linear neuron with a
nonlinear activation function is unable to form a decision surface such that the patterns are separated in the pattern space. Our objective here is to find all the possible solutions using the third order neural network to realize the logic function.
Table VI. Truth table of the XOR function \(x_1 \oplus x_2 \oplus x_3\).
A third order neural network is designed as
\[ y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{23} x_2 x_3 + w_{123} x_1 x_2 x_3 \]
where \(x_1, x_2, x_3 \in \{-1, 1\}\) are the bipolar inputs, and the network contains eight weights.
To implement the above logic XOR function, one may consider the solution of the set of eight linear algebraic equations
\[ w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{23} x_2 x_3 + w_{123} x_1 x_2 x_3 = y \]
obtained by evaluating the network equation at the eight patterns of Table VI. The coefficient matrix \(\mathbf{U}\), whose \(k\)th row is \([1,\, x_1,\, x_2,\, x_3,\, x_1 x_2,\, x_1 x_3,\, x_2 x_3,\, x_1 x_2 x_3]\) evaluated at the \(k\)th pattern, is
\[ \mathbf{U} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & -1 & 1 & -1 & -1 & -1 \\
1 & 1 & -1 & 1 & -1 & 1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & 1 & 1 & -1 & -1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1 & 1 & -1
\end{bmatrix} \]
Pattern   Input x1   Input x2   Input x3   Output y
A             1          1          1          1
B             1          1         -1         -1
C             1         -1          1         -1
D             1         -1         -1          1
E            -1          1          1         -1
F            -1          1         -1          1
G            -1         -1          1          1
H            -1         -1         -1         -1
which is nonsingular. The equations then have the unique solution
\[ w_0 = w_1 = w_2 = w_3 = w_{12} = w_{13} = w_{23} = 0, \qquad w_{123} = 1. \]
Therefore, the logic function is realized by the third order polynomial \(y = x_1 x_2 x_3\). This solution is unique in terms of the third order polynomial.
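Example 2 can be verified numerically by building \(\mathbf{U}\) over all eight bipolar patterns and solving the system (the pattern ordering here is illustrative):

```python
import numpy as np
from itertools import product

# Numerical check of Example 2: solve U w = y for the three-variable
# XOR targets y = x1*x2*x3 (odd parity under the +1 = true encoding).
rows, targets = [], []
for x1, x2, x3 in product((1, -1), repeat=3):
    rows.append([1, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3])
    targets.append(x1 * x2 * x3)
U = np.array(rows, dtype=float)
w = np.linalg.solve(U, np.array(targets, dtype=float))
# only the third-order weight w123 survives
assert np.allclose(w, [0, 0, 0, 0, 0, 0, 0, 1])
```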
Xu et al. (1992) as well as Taylor and Coombes (1993) also demonstrated that HONNs may be effectively applied to problems of fitting a curve, surface, or hypersurface to a given data set. This problem, called nonlinear surface fitting, is often encountered in computer science and engineering applications. Some learning algorithms for solving such problems can be found in their papers. Moreover, if one assumes \(\varphi(x) = x\) in the HONU, the weights appear linearly in the network, and the learning algorithms for the HONNs may be characterized as a linear least squares (LS) procedure. The well-known local minimum problems that afflict many nonlinear neural learning schemes may then be avoided.
4.1. Representation of Higher Order Neural Network Discriminant Using
Multidimensional Matrix Product
The discriminant of a HONN is a summation of quadratic terms. This can be alternatively represented using multidimensional matrix multiplication (Solo, 2010). For example,
\[ \sum_{i=1}^{3} \sum_{j=1}^{3} w_{ij}\, x_i x_j = w_{11} x_1^2 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{21} x_2 x_1 + w_{22} x_2^2 + w_{23} x_2 x_3 + w_{31} x_3 x_1 + w_{32} x_3 x_2 + w_{33} x_3^2 \]
\[ = w_{11} x_1^2 + w_{22} x_2^2 + w_{33} x_3^2 + x_1 x_2 (w_{12} + w_{21}) + x_1 x_3 (w_{13} + w_{31}) + x_2 x_3 (w_{23} + w_{32}) \]
This weighted summation is easily represented using classical matrices multiplied together:
\[ \sum_{i=1}^{3} \sum_{j=1}^{3} w_{ij}\, x_i x_j = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} * \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} * \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]
It is extremely useful to express these weighted summations as matrix products in order to eliminate unnecessary terms in neural network designs. Because the weights \(w_{ij}\) and \(w_{ji}\) in the matrix above multiply the same second-order term \(x_i x_j\), it is sufficient to use only an upper triangular or lower triangular weight matrix. For instance, instead of separately determining values for \(w_{12}\) and \(w_{21}\), both of which are weights for \(x_1 x_2\), one can eliminate one of them and determine a single value equal to the sum the two would take if computed separately. The same applies to the other redundant weights. This saves time in the computationally intensive procedure of training the weights.
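The weight-folding argument can be checked numerically; a minimal sketch:

```python
import numpy as np

# Folding redundant weights: since w_ij and w_ji multiply the same term
# x_i*x_j, a full weight matrix can be replaced by an upper-triangular one
# with entries w_ij + w_ji (i < j) without changing the discriminant.
rng = np.random.default_rng(2)
W = rng.normal(size=(3, 3))                 # full (redundant) weight matrix
W_tri = np.triu(W) + np.triu(W.T, k=1)      # fold the lower part into the upper part

x = rng.normal(size=3)
assert np.isclose(x @ W @ x, x @ W_tri @ x)  # identical quadratic form
```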
However, the following equation and more complicated equations used in neural network applications cannot be expressed using classical matrices. Here the variables \(x_i\), \(x_j\), and \(x_k\) are inputs and the \(w_{ijk}\) are their weights:
\[ \sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{121} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{211} x_1^2 x_2 + w_{212} x_1 x_2^2 + w_{221} x_1 x_2^2 + w_{222} x_2^3 \]
\[ = w_{111} x_1^3 + x_1^2 x_2 (w_{112} + w_{121} + w_{211}) + x_1 x_2^2 (w_{122} + w_{212} + w_{221}) + w_{222} x_2^3 \]
This weighted summation can alternatively be represented using multidimensional matrices (Solo, 2010) multiplied together. Premultiply the 2 × 2 × 2 weight matrix by a 1 × 2 × 2 input matrix in the first and second dimensions. Then postmultiply the 2 × 2 × 2 weight matrix by a 2 × 1 × 2 input matrix in the first and second dimensions. Finally, premultiply this entire product by a 1 × 2 input matrix in the first and second dimensions. Because the first and second dimensions of these multidimensional matrices are being multiplied throughout, this need not be indicated in the equations below; the two slices along the third dimension are written side by side, separated by a vertical bar:
\[ \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} * \left[ \begin{matrix} x_1 & x_2 \end{matrix} \,\middle|\, \begin{matrix} x_1 & x_2 \end{matrix} \right] * \left[ \begin{matrix} w_{111} & w_{121} \\ w_{211} & w_{221} \end{matrix} \,\middle|\, \begin{matrix} w_{112} & w_{122} \\ w_{212} & w_{222} \end{matrix} \right] * \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \,\middle|\, \begin{matrix} x_1 \\ x_2 \end{matrix} \right] \]
The multidimensional matrix product (Solo, 2010) of the first and second dimensions of the 1 × 2 × 2 input matrix and the 2 × 2 × 2 weight matrix results in a 1 × 2 × 2 matrix:
\[ \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} * \left[ \begin{matrix} w_{111} x_1 + w_{211} x_2 & w_{121} x_1 + w_{221} x_2 \end{matrix} \,\middle|\, \begin{matrix} w_{112} x_1 + w_{212} x_2 & w_{122} x_1 + w_{222} x_2 \end{matrix} \right] * \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \,\middle|\, \begin{matrix} x_1 \\ x_2 \end{matrix} \right] \]
The multidimensional matrix product of the first and second dimensions of the 1 × 2 × 2 matrix and the 2 × 1 × 2 input matrix results in a 1 × 1 × 2 matrix:
\[ \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} * \left[ \, w_{111} x_1^2 + (w_{211} + w_{121})\, x_1 x_2 + w_{221} x_2^2 \,\middle|\, w_{112} x_1^2 + (w_{212} + w_{122})\, x_1 x_2 + w_{222} x_2^2 \, \right] \]
The 1 × 1 × 2 matrix can be simplified into a one-dimensional matrix with two elements, so it can be premultiplied by the 1 × 2 input matrix in the first and second dimensions:
\[ \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} w_{111} x_1^2 + (w_{211} + w_{121})\, x_1 x_2 + w_{221} x_2^2 \\ w_{112} x_1^2 + (w_{212} + w_{122})\, x_1 x_2 + w_{222} x_2^2 \end{bmatrix} \]
\[ = w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{121} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{211} x_1^2 x_2 + w_{212} x_1 x_2^2 + w_{221} x_1 x_2^2 + w_{222} x_2^3 \]
\[ = w_{111} x_1^3 + x_1^2 x_2 (w_{112} + w_{121} + w_{211}) + x_1 x_2^2 (w_{122} + w_{212} + w_{221}) + w_{222} x_2^3 \]
Thus, this multidimensional matrix multiplication yields the same result as the summation of third-order terms above.
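In an array library the same multidimensional contraction can be written in one call; a sketch using NumPy's `einsum` (a modern shorthand, not part of the original formulation):

```python
import numpy as np

# The third-order weighted summation is a full contraction of a 2x2x2
# weight array against three copies of the input vector.
rng = np.random.default_rng(3)
w = rng.normal(size=(2, 2, 2))
x1, x2 = rng.normal(size=2)
x = np.array([x1, x2])

s_einsum = np.einsum('ijk,i,j,k->', w, x, x, x)
s_expanded = (w[0,0,0]*x1**3 + (w[0,0,1] + w[0,1,0] + w[1,0,0])*x1**2*x2
              + (w[0,1,1] + w[1,0,1] + w[1,1,0])*x1*x2**2 + w[1,1,1]*x2**3)
assert np.isclose(s_einsum, s_expanded)   # matches the folded polynomial
```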
4.2. Modified Polynomial Neural Networks
4.2.1. Sigma-Pi Neural Networks
Note that an HONU contains all the linear and nonlinear correlation terms of the input components up to the order N. A slightly generalized structure of the HONU is a polynomial network that includes weighted sums of products of selected input components raised to appropriate powers. Mathematically, the input-output transfer function of this network structure is given by
\[ u_i = \prod_{j=1}^{n} \big[\varphi(x_j)\big]^{w_{ij}} \qquad (26) \]
\[ y = \sum_{i=1}^{N} w_i u_i \qquad (27) \]
where \(w_i, w_{ij} \in \mathbb{R}\), \(N\) is the order of the network, and \(u_i\) is the output of the \(i\)th hidden unit. This type of feedforward network is called a sigma-pi network (Rumelhart et al., 1986). It is easy to show that this network satisfies the Stone-Weierstrass theorem if \(\varphi(x)\) is a linear function. Moreover, a modified version of the sigma-pi network, as proposed by Hornik et al. (1989) and Cotter (1990), is
\[ u_i = \prod_{j=1}^{n} \big[p(x_j)\big]^{w_{ij}} \qquad (28) \]
\[ y = \sum_{i=1}^{N} w_i u_i \qquad (29) \]
where \(w_i, w_{ij} \in \mathbb{R}\) and \(p(x_j)\) is a polynomial of \(x_j\). It is easy to verify that this network satisfies the Stone-Weierstrass theorem, and thus it can serve as an approximator for functional approximation problems. The sigma-pi network defined in eqns. (26) and (27) is a special case of the above network in which \(p(x_j)\) is assumed to be a linear function of \(x_j\). In fact, the weights \(w_{ij}\) in both networks, eqns. (26) and (28), may be restricted to integer or nonnegative integer values.
4.2.2. Ridge Polynomial Neural Networks
To obtain fast learning and powerful mapping capabilities, and to avoid the combinatorial increase in the number of weights of HONNs, some modified polynomial network structures have been introduced. One of these is the pi-sigma network (Shin and Ghosh, 1991), which is a regular higher order structure involving far fewer weights than sigma-pi networks. The mapping equations of a pi-sigma network can be represented as
\[ u_i = \sum_{j=1}^{n} w_{ij} x_j \qquad (30) \]
\[ y = \varphi\Big( \prod_{i=1}^{N} u_i \Big) = \varphi\Big( \prod_{i=1}^{N} \sum_{j=1}^{n} w_{ij} x_j \Big) \qquad (31) \]
The total number of weights for an Nth-order pi-sigma network with n inputs is only \((n+1)N\). Compared with the sigma-pi network structure, the number of weights involved in this network is significantly reduced. Unfortunately, when \(\varphi(x) = x\), the pi-sigma network does not match the conditions of the Stone-Weierstrass theorem because the linear subspace condition is not satisfied (Gupta et al., 2003). However, some studies have shown that it is a good network model for smooth functions (Shin and Ghosh, 1991).
To modify the structure of the above pi-sigma networks so that they satisfy the Stone-Weierstrass theorem, Shin and Ghosh (1991) suggested the ridge polynomial neural network (RPNN). For the vectors \(\mathbf{w}_{ij} = [w_{ij1}, w_{ij2}, \ldots, w_{ijn}]^T\) and \(\mathbf{x} = [x_1, x_2, \ldots, x_n]^T\), let
\[ \langle \mathbf{x}, \mathbf{w}_{ij} \rangle = \sum_{k=1}^{n} w_{ijk}\, x_k \]
which represents the inner product of the two vectors. A one-variable continuous function \(f\) of the form \(f(\langle \mathbf{x}, \mathbf{w}_{ij} \rangle)\) is called a ridge function. A ridge polynomial is a ridge function that can be represented as
\[ \sum_{i=0}^{N} \sum_{j=0}^{M_i} a_{ij}\, \langle \mathbf{x}, \mathbf{w}_{ij} \rangle^{i} \]
for some \(a_{ij} \in \mathbb{R}\) and \(\mathbf{w}_{ij} \in \mathbb{R}^n\). The operation equation of an RPNN is expressed as
\[ y = \varphi\Big( \sum_{i=1}^{N} \prod_{j=1}^{i} \langle \mathbf{x}, \mathbf{w}_{ij} \rangle \Big) \]
where \(\varphi(x) = x\). The denseness of this network, a fundamental requirement for the universal function approximators described by the Stone-Weierstrass theorem, can be verified (Gupta et al., 2003).
The total number of weights involved in this structure is \(N(N+1)(n+1)/2\). A comparison of the number of weights of the three types of polynomial network structures is given in Table VII. The results show that when the networks contain the same higher-order terms, an RPNN has significantly fewer weights than a sigma-pi network. This is a very attractive improvement offered by RPNNs.
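The closed-form counts can be checked against Table VII. The pi-sigma and RPNN formulas appear in the text; the sigma-pi column is assumed here to equal the number of monomials of degree at most N in n variables, C(n+N, N), which reproduces the tabulated values:

```python
from math import comb

# Weight counts of the three polynomial architectures.
def pi_sigma_weights(n, N):
    return (n + 1) * N                  # (n+1)N, from the text

def rpnn_weights(n, N):
    return N * (N + 1) * (n + 1) // 2   # N(N+1)(n+1)/2, from the text

def sigma_pi_weights(n, N):
    return comb(n + N, N)               # assumed: monomials of degree <= N

# Reproduce the Table VII rows for n = 5 and n = 10.
assert (pi_sigma_weights(5, 2), rpnn_weights(5, 2), sigma_pi_weights(5, 2)) == (12, 18, 21)
assert (pi_sigma_weights(5, 3), rpnn_weights(5, 3), sigma_pi_weights(5, 3)) == (18, 36, 56)
assert (pi_sigma_weights(10, 4), rpnn_weights(10, 4), sigma_pi_weights(10, 4)) == (44, 110, 1001)
```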
Table VII. The number of weights in the polynomial networks.

  Order of network N | Pi-sigma (n=5) | Pi-sigma (n=10) | RPNN (n=5) | RPNN (n=10) | Sigma-pi (n=5) | Sigma-pi (n=10)
  2                  | 12             | 22              | 18         | 33          | 21             | 66
  3                  | 18             | 33              | 36         | 66          | 56             | 286
  4                  | 24             | 44              | 60         | 110         | 126            | 1001

5. Engineering Applications
Function approximation problems are typical examples in many computer science and engineering situations. The capability to approximate nonlinear complex functions can also serve as a basis for complex pattern classification. Furthermore, a neural network approach with high approximation ability can be used for time series analysis by introducing time delay features into the neural network structure. Time series analysis or estimation is one of the most important problems in computer science and engineering applications. In this section, we first explain the function approximation ability of HONNs. Neural network structures with time delay features will then be introduced for time series analysis.
5.1. Function Approximation Problem
For evaluating the function approximation ability of HONNs, an example was taken from Klassen et al. (1988). The task consists of learning a representation for an unknown, one-variable nonlinear function, \(F(x)\), with the only available information being the 18 sample patterns (Villalobos and Merat, 1995).
For this function approximation problem, a two-layered neural network structure was composed of two SONUs in the first layer and a single SONU in the output layer (Figure 9). The nonlinear activation function of the SONUs in the first layer was the bipolar sigmoid \(\varphi(u) = (1 - e^{-u})/(1 + e^{-u})\), but for the single output SONU the linear function \(y = \varphi(u) = u\) was used instead of the sigmoidal function. The gradient learning algorithm with learning rate \(\eta = 0.1\) was used for this problem.
Figure 9. A two-layered neural network structure with two SONUs in the first layer
and a single SONU in the output layer for the function approximation problem.
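A forward-pass sketch of this two-layered SONN (untrained, illustrative weights; the quadratic synaptic operation acts on the augmented input):

```python
import numpy as np

# Sketch of the Figure 9 network: two hidden SONUs with the bipolar
# sigmoid and one linear output SONU; upper-triangular weight matrices
# over the augmented input [1, x] (random illustrative values).
def bipolar_sigmoid(u):
    return (1.0 - np.exp(-u)) / (1.0 + np.exp(-u))

def sonu(x_aug, W):
    return x_aug @ W @ x_aug                 # quadratic synaptic operation

def sonn_forward(x, W1, W2, W_out):
    xa = np.array([1.0, x])                              # augmented scalar input
    h = bipolar_sigmoid(np.array([sonu(xa, W1), sonu(xa, W2)]))
    ha = np.concatenate(([1.0], h))                      # augment hidden outputs
    return sonu(ha, W_out)                               # linear output SONU

rng = np.random.default_rng(4)
W1, W2 = np.triu(rng.normal(size=(2, 2))), np.triu(rng.normal(size=(2, 2)))
W_out = np.triu(rng.normal(size=(3, 3)))
y = sonn_forward(0.3, W1, W2, W_out)
```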
The mapping function obtained by the SONU network after \(10^7\) learning iterations appears in Figure 10. In this case, the average square error taken over the 18 patterns was 4.566E-6. The extremely high approximation accuracy shown in Figure 10 is evidence of the high approximation ability of the SONN.
Figure 10. Training pairs and outputs estimated by the network with SONUs for the
Klassen's function approximation problem (Klassen et al., 1988).
Five particular trigonometric functions, \(\sin(\pi x)\), \(\cos(\pi x)\), \(\sin(2\pi x)\), \(\cos(2\pi x)\), and \(\sin(4\pi x)\), were used as special features of the extra neural inputs (Klassen et al., 1988). Also, it has been reported (Villalobos and Merat, 1995) that the term \(\cos(\pi x)\) is not necessary for achieving accuracy within the error tolerance 1.125E-4, but four extra features were still required.
On the other hand, in this study, the high approximation accuracy of the proposed
SONU network was achieved by only two SONUs with the sigmoidal activation function
in the first layer and a single SONU with the linear activation function in the output
layer, and no special features were required for high accuracy. These are remarkable
advantages of the proposed SONN structure.
To highlight the superiority of HONNs over simple first-order neural networks in capturing nonlinear correlations among multiple inputs, we show another example of function approximation. For simplicity, and to further emphasize the strength of the HONN concept, we demonstrate the example using a single higher-order neural unit of various orders N = 2, 3, 4, 5.
We consider the multiple-input static function
\[ f(x, y, z) = \frac{x^2 y + x y z}{x^2 + y^2 + z^2 + 0.1} \qquad (32) \]
where \(x\), \(y\), and \(z\) are normally distributed random variables (stdev = 1) that represent the input pattern data, and \(f(\cdot)\) represents the target data. The length of the training data was 300. For training both the MLP and the HONU, a basic version of the Levenberg-Marquardt algorithm was implemented with a learning rate that decreases when the training performance, the sum of square errors (SSE), stops decreasing in two consecutive training epochs.
Figure 11. The upper plot shows the training performance of a static MLP neural network with 10 sigmoidal neurons in a hidden layer and a linear output neuron; the MLP needs many epochs. The bottom plot shows that the training performance of the HONU improves with increasing order N; HONUs are trained in very few epochs with the same Levenberg-Marquardt algorithm.
Figure 12. Testing of the trained MLP network and HONU from Figure 11 on different data. The upper plot shows testing of the static MLP network from the upper part of Figure 11. The bottom plot shows testing of the best trained HONU (N = 5). The mean absolute error of the HONU is better than that of the MLP.
Figure 13. Simulation run from different initial weights than in Figure 11. Again, the upper plot shows the training performance of a static MLP; this time the MLP gets stuck in a local minimum, as it typically does. The bottom plot shows a very similar training performance of the HONU for different initial weights and the same training data. This is because a pure HONU (a polynomial neural unit) is linear in its parameters, yet it performs a strong nonlinear mapping.
Figure 14. Testing of the trained MLP network and HONU from Figure 13 on different data. The upper plot shows testing of the static MLP network from the upper part of Figure 13. The bottom plot shows testing of the best trained HONU (N = 5). The HONU is more often precise than the MLP; however, its MAE is worse this time because three outliers of the HONU become very imprecise. This may occasionally happen with pure HONUs without an output sigmoid function, and it relates to a lack of training data.
6. Concluding Remarks and Future Research Directions
In this chapter, the basic foundation of neural networks, starting from a basic
introduction to biological foundations, neural unit models, and learning properties, has
been introduced. Then as the first step to understanding HONNs, a general SONU
was developed. Simulation studies for both the pattern classification and function
approximation problems demonstrated that the learning and generalization abilities of the proposed SONU, and of neural networks built from SONUs, are greatly superior to those of the widely used linear combination neural units and their networks. Indeed, from the point of view of both the neural computing process and its learning algorithm, it has been found that the linear combination neural units widely used in multilayered neural networks are only a subset of the proposed SONUs. Some extensions of these concepts to radial basis function (RBF) networks, fuzzy neural networks, and dynamic neural units will be interesting future research projects.
To further strengthen the readers’ interest in HONUs and HONNs, it should be
mentioned that HONUs are powerful nonlinear approximators that are linear in their
parameters. That is, if we look at the fundamental HONU representations, such as
eqn. (21) in this chapter, we clearly see that when the input variables are substituted with training data, the weight optimization of many fundamental HONN architectures becomes a linear optimization problem that is uniquely solvable by the Levenberg-Marquardt algorithm or even by the least squares method. We believe that HONNs represent a great opportunity for many researchers, as the need for more advanced optimization methods is not so urgent for the many HONUs that are basic polynomials, yet nonlinearly powerful architectures. Therefore, rather than searching for complicated optimization techniques, neural network researchers and practitioners may spend more effort on proper data selection and signal processing, which play a crucial role in the performance of neural networks, including HONNs.
There is certainly rapidly growing research interest in the field of HONNs. There
are increasing complexities in applications not only in the fields of aerospace, process
control, ocean exploration, manufacturing, and resource based industry, but also in
computer science and engineering. This chapter deals with the theoretical foundations
of HONNs and will help readers to develop or apply the methods to their own modeling
and simulation problems. Most of the book deals with real modeling and simulation
applications.
We hope that our efforts in this chapter will stimulate research interests, provide
some new challenges to its readers, generate curiosity for learning more in the field, and
arouse a desire to seek new theoretical tools and applications. We will consider our
efforts successful if this chapter raises one’s level of curiosity.
7. Acknowledgements
Dr. Madan M. Gupta wishes to acknowledge the support from the Natural Sciences and
Engineering Research Council of Canada through the Discovery Grant. Dr. Ivo
Bukovsky’s research is supported by grants SGS12/177/OHK2/3T/12 and
SGS10/252/OHK2/3T/12. Dr. Zeng-Guang Hou’s research is partially supported by the
National Natural Science Foundation of China (Grant 61175076).
8. References
Bukovsky, I., Bila, J., Gupta, M. M., Hou, Z.-G., & Homma, N. (2010a). Foundation and Classification of Nonconventional Neural Units and Paradigm of Nonsynaptic Neural Interaction. In Y. Wang (Ed.), Discoveries and Breakthroughs in Cognitive Informatics and Natural Intelligence, Advances in Cognitive Informatics and Natural Intelligence (ACINI) (pp. 508-523). IGI Publishing, USA.
Bukovsky, I., Homma, N., Smetana, L., Rodriguez, R., Mironovova M., Vrana S.,
(2010b): “Quadratic Neural Unit is a Good Compromise between Linear Models and
Neural Networks for Industrial Applications”, ICCI 2010 The 9th IEEE International
Conference on Cognitive Informatics, Beijing, China.
Bukovsky, I., Bila, J., & Gupta, M. M. (2005). Linear Dynamic Neural Units with Time Delay for Identification and Control (in Czech). Automatizace, 48(10), 628-635. Prague, Czech Republic. ISSN 0005-125X.
Bukovsky, I., & Simeunovic, G. (2006). Dynamic-Order-Extended Time-Delay Dynamic
Neural Units. 8th Seminar on Neural Network Applications in Electrical Engineering
NEUREL-2006, IEEE (SCG) CAS-SP. Belgrade. ISBN 1-4244-0432-0
Bukovsky, I., Bila, J., & Gupta, M. M. (2006). Stable Neural Architecture of Dynamic
Neural Units with Adaptive Time Delays. In 7th International FLINS Conference on
Applied Artificial Intelligence. ISBN 981-256-690-2. pp. 215-222.
Cichocki, A., & Unbehauen, R. (1993). Neural Networks for Optimization and Signal Processing. Chichester: Wiley.
Cotter, N. (1990). The Stone-Weierstrass Theorem and Its Application to Neural
Networks. IEEE Trans. Neural Networks, 1(4), 290-295.
Giles, C. L., & Maxwell, T. (1987). Learning invariance, and generalization in
higher-order networks. Appl. Optics, 26, 4972-4978.
Gupta, M. M., Jin, L., & Homma, N. (2003). Static and Dynamic Neural Networks:
From Fundamentals to Advanced Theory. Hoboken, NJ: IEEE & Wiley.
Harston, C. T. (1990). The Neurological Basis for Neural Computation. In Maren, A.
J., Harston, C. T., & Pap, R. M. (Eds.), Handbook of Neural Computing Applications, Vol.
1. (pp. 29-44). New York: Academic.
Homma, N., & Gupta, M. M. (2002). A general second order neural unit. Bull. Coll.
Med. Sci., Tohoku Univ., 11(1), 1-6.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer Feedforward Networks
Are Universal Approximators. Neural Networks, 2(5), 359-366.
Klassen, M., Pao, Y., & Chen, V. (1988). Characteristics of the functional link net: a
higher order delta rule net. Proc. of IEEE 2nd Annual Int'l. Conf. Neural Networks.
Kobayashi, S. (2006). Sensation World Made by the Brain – Animals Do Not Have
Sensors. Tokyo: Corona (in Japanese).
Matsuba, I. (2000). Nonlinear time series analysis. Tokyo: Asakura-syoten (in
Japanese).
McCulloch, W. S., & Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys., 5, 115-133.
Narendra, K., & Parthasarathy, K. (1990). Identification and control of dynamical
systems using neural networks. IEEE Trans. Neural Networks, 1, 4-27.
Pao, Y. H. (1989). Adaptive Pattern Recognition and Neural Networks. Reading, MA: Addison-Wesley.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Internal
Representations by Error Propagation. In Rumelhart, D. E. and McClelland, J. L.
(Eds.), Parallel Distributed Processing: Explorations in the Microstructure of
Cognition, Vol. 1 (pp. 318-362). Cambridge, MA: MIT Press.
Schmidt, W., & Davis, J. (1993). Pattern recognition properties of various feature
spaces for higher order neural networks. IEEE Trans. Pattern Analysis and Machine
Intelligence, 15, 795-801.
Shin, Y., & Ghosh, J. (1991). The Pi-sigma Network: An Efficient Higher-order
Neural Network for Pattern Classification and Function Approximation. Proc. Int.
Joint Conf. on Neural Networks (pp. 13-18).
Sinha, N., Gupta, M. M., & Zadeh, L. (1999). Soft Computing and Intelligent Control
Systems: Theory and Applications. New York: Academic.
Softky, R. W., & Kammen, D. M. (1991). Correlations in high dimensional or
asymmetrical data sets: Hebbian neuronal processing. Neural Networks, 4, 337-347.
Taylor, J. G., & Coombes, S. (1993). Learning higher order correlations. Neural
Networks, 6, 423-428.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 1 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 353-359. CSREA Press.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 2 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 360-366. CSREA Press.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 3 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 367-372. CSREA Press.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 4 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 373-378. CSREA Press.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 5 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 379-381. CSREA Press.
Villalobos, L., & Merat, F. (1995). Learning capability assessment and feature space
optimization for higher-order neural networks. IEEE Trans. Neural Networks, 6,
267-272.
Werbos, P. J. (1990). Backpropagation through time: What it is and how to do it.
Proc. IEEE, 78(10), 1550-1560.
Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running
fully recurrent neural networks. Neural Computation, 1, 270-280.
Xu, L., Oja, E., & Suen, C. Y. (1992). Modified Hebbian learning for curve and surface
fitting. Neural Networks, 5, 441-457.
9. Additional Reading
[A] Biological Motivation on Neural Networks
[A.1] Ding, M.-Z., & Yang, W.-M. (1997). Stability of Synchronous Chaos and On-Off Intermittency in Coupled Map Lattices. Phys. Rev. E, 56(4), 4009-4016.
[A.2] Durbin, R. (1989). On the Correspondence Between Network Models and the Nervous System. In R. Durbin, C. Miall, & G. Mitchison (Eds.), The Computing Neurons. Reading, MA: Addison-Wesley.
[A.3] Engel, K., Konig, P., Kreiter, A. K., & Singer, W. (1991). Interhemispheric Synchronization of Oscillatory Neuronal Responses in Cat Visual Cortex. Science, 252, 1177-1178.
[A.4] Ersu, E., & Tolle, H. (1984). A New Concept for Learning Control Inspired by Brain Theory. Proc. 9th World Congress IFAC (pp. 245-250).
[A.5] Forbus, K. D., & Gentner, D. (1983). Causal Reasoning About Quantities. Proc. 5th Annual Conf. of the Cognitive Science Society (pp. 196-206).
[A.6] Fujita, M. (1982). Adaptive Filter Model of the Cerebellum. Biological Cybernetics, 45, 195-206.
[A.7] Garliaskas, A., & Gupta, M. M. (1995). A Generalized Model of Synapse-Dendrite-Cell Body as a Complex Neuron. World Congress on Neural Networks, Vol. 1 (pp. 304-307).
[A.8] Gupta, M. M. (1988). Biological Basis for Computer Vision: Some Perspective. SPW Conf. on Intelligent Robots and Computer Vision (pp. 811-823).
[A.9] Gupta, M. M., & Knopf, G. K. (1992). A Multitask Visual Information Processor with a Biologically Motivated Design. J. Visual Communicat., Image Representation, 3(3), 230-246.
[A.10] Hiramoto, M., Hiromi, Y., Giniger, E., & Hotta, Y. (2000). The Drosophila Netrin Receptor Frazzled Guides Axons by Controlling Netrin Distribution. Nature, 406(6798), 886-888.
[A.11] Honma, N., Abe, K., Sato, M., & Takeda, H. (1998). Adaptive Evolution of Holon Networks by an Autonomous Decentralized Method. Applied Mathematics and Computation, 9(1), 43-61.
[A.12] Kaneko, K. (1994). Relevance of Dynamic Clustering to Biological Networks. Phys. D, 75, 55-73.
[A.13] Kohara, K., Kitamura, A., Morishima, M., & Tsumoto, T. (2001). Activity-Dependent Transfer of Brain-Derived Neurotrophic Factor to Postsynaptic Neurons. Science, 291, 2419-2423.
[A.14] LeCun, Y., Boser, B., & Solla, S. A. (1990). Optimal Brain Damage. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems, Vol. 2 (pp. 598-605), Morgan Kaufmann.
[A.15] Lovejoy, C. O. (1981). The Origin of Man. Science, 211, 341-350.
[A.16] Maire, M. (2000). On the Convergence of Validity Interval Analysis. IEEE Trans. on Neural Networks, 11(3), 799-801.
[A.17] Mantere, K., Parkkinen, J., Jaasketainen, T., & Gupta, M. M. (1993). Wilson-Cowan Neural Network Model in Image Processing. J. of Mathematical Imaging and Vision, 2, 251-259.
[A.18] McCarthy, J., & Hayes, P. J. (1969). Some Philosophical Problems from the Standpoint of Artificial Intelligence. In Meltzer & Michie (Eds.), Machine Intelligence, 4 (pp. 463-502). Edinburgh: Edinburgh Univ. Press.
[A.19] McCulloch, W. S., & Pitts, W. H. (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5, 115-133.
[A.20] McDermott, D. (1982). A Temporal Logic for Reasoning About Processes and Plans. Cognitive Science, 6, 101-155.
[A.21] Melkonian, D. S. (1990). Mathematical Theory of Chemical Synaptic Transmission. Biological Cybernetics, 62, 539-548.
[A.22] Pecht, O. Y., & Gur, M. (1995). A Biologically-Inspired Improved MAXNET. IEEE Trans. Neural Networks, 6, 757-759.
[A.23] Petshe, T., & Dickinson, B. W. (1990). Trellis Codes, Receptive Fields, and Fault-Tolerant Self-Repairing Neural Networks. IEEE Trans. Neural Networks, 1(2), 154-166.
[A.24] Poggio, T., & Koch, C. (1987). Synapses that Compute Motion. Scientific American, May, pp. 46-52.
[A.25] Rao, D. H., & Gupta, M. M. (1993). A Generic Neural Model Based on Excitatory-Inhibitory Neural Population. IJCNN-93 (pp. 1393-1396).
[A.26] Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65, 386-408.
[A.27] Skarda, C. A., & Freeman, W. J. (1987). How Brains Make Chaos in Order to Make Sense of the World. Behavioral and Brain Sciences, 10, 161-195.
[A.28] Stevens, C. F. (1968). Synaptic Physiology. Proc. IEEE, 79(9), 916-930.
[A.29] Wilson, H. R., & Cowan, J. D. (1972). Excitatory and Inhibitory Interactions in Localized Populations of Model Neurons. Biophysical J., 12, 1-24.
[B] Neuronal Morphology: Concepts and Mathematical Models
[B.1] Amari, S. (1971). Characteristics of Randomly Connected Threshold-Element Networks and Network Systems. Proc. IEEE, 59(1), 35-47.
[B.2] Amari, S. (1972). Characteristics of Random Nets of Analog Neuron-Like Elements. IEEE Trans. Systems, Man and Cybernetics, 2, 643-654.
[B.3] Amari, S. (1972). Learning Patterns and Pattern Sequences by Self-Organizing Nets of Threshold Elements. IEEE Trans. on Computers, 21, 1197-1206.
[B.4] Amari, S. (1977). A Mathematical Approach to Neural Systems. In J. Metzler (Ed.), Systems Neuroscience (pp. 67-118). New York: Academic.
[B.5] Amari, S. (1977). Neural Theory of Association and Concept Formation. Biological Cybernetics, 26, 175-185.
[B.6] Amari, S. (1990). Mathematical Foundations of Neurocomputing. Proc. IEEE, 78(9), 1443-1462.
[B.7] Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Spin-Glass Model of Neural Networks. Physical Review A, 32, 1007-1018.
[B.8] Anagun, A. S., & Cin, I. (1998). A Neural-Network-Based Computer Access Security System for Multiple Users. Proc. 23rd Inter. Conf. Comput. Ind. Eng., Vol. 35 (pp. 351-354).
[B.9] Anderson, J. A. (1983). Cognition and Psychological Computation with Neural Models. IEEE Trans. System, Man and Cybernetics, 13, 799-815.
[B.10] Anninos, P. A., Beek, B., Csermel, T. J., Harth, E. E., & Pertile, G. (1970). Dynamics of Neural Structures. J. of Theoretical Biology, 26, 121-148.
[B.11] Aoki, C., & Siekevitz, P. (1988). Plasticity in Brain Development. Scientific American, Dec., 56-64.
[B.12] Churchland, P. S., & Sejnowski, T. J. (1988). Perspectives on Cognitive Neuroscience. Science, 242, 741-745.
[B.13] Holmes C. C., & Mallick, B. K. (1998). Bayesian Radial Basis Functions of Variable Dimension. Neural Computations, 10(5), 1217-1233.
[B.14] Hopfield, J. (1990). Artificial Neural Networks are Coming. An Interview by W. Myers, IEEE Expert, Apr., 3-6.
[B.15] Joshi, A., Ramakrishnan, N., Houstis, E. N., & Rice, J. R. (1997). On Neurobiological, Neurofuzzy, Machine Learning, and Statistical Pattern Recognition Techniques. IEEE Trans. Neural Networks, 8.
[B.16] Kaneko, K. (1994). Relevance of Dynamic Clustering to Biological Networks. Phys. D, 75, 55-73.
[B.17] Kaneko, K. (1997). Coupled Maps with Growth and Death: An Approach to Cell Differentiation. Phys. D, 103, 505-527.
[B.18] Knopf, G. K., & Gupta, M. M. (1993). Dynamics of Antagonistic Neural Processing Elements. Inter. J. of Neural Systems, 4(3), 291-303.
[B.19] Kohonen, T. (1988). An Introduction to Neural Computing. Neural Networks, 1(1), 3-16.
[B.20] Kohonen, T. (1990). The Self-Organizing Map. Proc. of the IEEE, 78(9), 1464-1480.
[B.21] Kohonen, T. (1991). Self-Organizing Maps: Optimization Approaches. In T. Kohonen, K. Makisara, O. Simula, & J. Kangas (Eds.), Artificial Neural Networks (pp. 981-990). Amsterdam: Elsevier.
[B.22] Kohonen, T. (1993). Things You Haven't Heard About The Self-Organizing Map. Proc. Inter. Conf. Neural Networks 1993 (pp. 1147-1156).
[B.23] Kohonen, T. (1998). Self Organization of Very Large Document Collections: State of the Art. Proc. 8th Inter. Conf. Artificial Neural Networks, Vol. 1 (pp. 65-74).
[B.24] LeCun, Y., Boser, B., & Solla, S. A. (1990). Optimal Brain Damage. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems, Vol. 2 (pp. 598-605). Morgan Kaufmann.
[B.25] Lippmann, R. P. (1987). An Introduction to Computing with Neural Networks. IEEE Acoustics, Speech and Signal Processing Magazine, 4(2), 4-22.
[B.26] Mantere, K., Parkkinen, J., Jaasketainen, T., & Gupta, M. M. (1993). Wilson-Cowan Neural Network Model in Image Processing. J. of Mathematical Imaging and Vision, 2, 251-259.
[B.27] McCarthy, J., & Hayes, P. J. (1969). Some Philosophical Problems from the Standpoint of Artificial Intelligence. In Meltzer & Michie (Eds.), Machine Intelligence, 4 (pp. 463-502). Edinburgh: Edinburgh Univ.
[B.28] McCulloch, W. S., & Pitts, W. H. (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5, 115-133.
[B.29] McDermott, D. (1982). A Temporal Logic for Reasoning About Processes and Plans. Cognitive Science, 6, 101-155.
[B.30] Melkonian, D. S. (1990). Mathematical Theory of Chemical Synaptic Transmission. Biological Cybernetics, 62, 539-548.
[B.31] Poggio, T., & Koch, C. (1987). Synapses that Compute Motion. Scientific American, May, 46-52.
[B.32] Sandewall, E. (1989). Combining Logic and Differential Equations for Describing Real-World Systems. Proc. 1st Inter. Conf. on Principles of Knowledge Representation and Reasoning (pp. 412-420). Morgan Kaufmann.
[B.33] Setiono, R., & Liu, H. (1996). Symbolic Representation of Neural Networks. Computer, 29(3), 71-77.
[B.34] Wilson, H. R., & Cowan, J. D. (1972). Excitatory and Inhibitory Interactions in Localized Populations of Model Neurons. Biophysical J., 12, 1-24.