Fundamentals of Higher Order Neural Networks for Modeling and Simulation
Chapter · October 2012
DOI: 10.4018/978-1-4666-2175-6.ch006
All content following this page was uploaded by Ivo Bukovsky on 01 February 2017.
Fundamentals of Higher Order Neural Networks
for Modeling and Simulation
Madan M. Gupta1,
Ivo Bukovsky2, Noriyasu Homma3, Ashu M. G. Solo4, and Zeng-Guang Hou5
Summary: In this chapter, we provide fundamental principles of higher order neural units (HONUs) and higher order neural networks (HONNs) for modeling and simulation. An essential core of HONNs can be found in higher order weighted combinations of, or correlations between, the input variables of an HONU. Beyond the high-quality nonlinear approximation achieved by static HONUs, the capability of dynamic HONUs for modeling dynamic systems is shown and compared to conventional recurrent neural networks when a practical learning algorithm is used. The potential of continuous dynamic HONUs to approximate systems of high dynamic order is also discussed, as adaptable time delays can be implemented. Using some typical examples, this chapter describes how and why higher order combinations or correlations can be effective for modeling systems.
Keywords: higher order neural networks, higher order neural units, second order
neural networks, second order neural units, sigma-pi networks, pi-sigma networks,
ridge polynomial neural networks, tapped delay line neural networks
1. Introduction
The human brain has more than 10 billion neurons, which have complicated
interconnections, and these neurons constitute a large-scale signal processing and
memory network. The mathematical study of a single neural model and its various
extensions is the first step in the design of a complex neural network for solving a
variety of problems in the fields of signal processing, pattern recognition, control of
complex processes, neurovision systems, and other decision making processes. Neural
network solutions for these problems can be directly used for computer science and
engineering applications.
1 University of Saskatchewan, Saskatoon, SK, Canada, [email protected]
2 Czech Technical University in Prague, Prague, Czech Republic, [email protected]
3 Tohoku University, Sendai, Japan, [email protected]
4 Maverick Technologies America Inc., Wilmington, DE, USA, [email protected]
5 The Chinese Academy of Sciences, Beijing, China, [email protected]
A simple neural model is presented in Figure 1. In terms of information processing,
an individual neuron with dendrites as multiple-input terminals and an axon as a
single-output terminal may be considered a multiple-input/single-output (MISO)
system. The processing functions of this MISO neural processor may be divided into
the following four categories:
(i) Dendrites: They consist of a highly branching tree of fibers and act as input
points to the main body of the neuron. On average, there are 10^3 to 10^4
dendrites per neuron, which form receptive surfaces for input signals to the
neurons.
(ii) Synapse: It is a storage area of past experience (knowledge base). It
provides long-term memory (LTM) to the past accumulated experience. It
receives information from sensors and other neurons and provides outputs
through the axons.
(iii) Soma: The neural cell body is called the soma. It is the large, round central
neuronal body. It receives synaptic information and performs further
processing of the information. Almost all logical functions of the neuron are
carried out in the soma.
(iv) Axon: The neural output line is called the axon. The output appears in the
form of an action potential that is transmitted to other neurons for further
processing.
The electrochemical activities at the synaptic junctions of neurons exhibit a complex
behavior because each neuron makes hundreds of interconnections with other neurons.
Each neuron acts as a parallel processor because it receives action potentials in parallel
from the neighboring neurons and then transmits pulses in parallel to other
neighboring synapses. In terms of information processing, the synapse also performs a
crude pulse frequency-to-voltage conversion as shown in Figure 1.
Figure 1. A simple neural model as a multiple-input (dendrites) and single-output
(axon) processor.
1.1. Neural mathematical operations
In general, it can be argued that the role played by neurons in the brain reasoning
processes is analogous to the role played by a logical switching element in a digital
computer. However, this analogy is too simple. A neuron contains a sensitivity
threshold, adjustable signal amplification or attenuation at each synapse and an
internal structure that allows incoming nerve signals to be integrated over both space
and time. From a mathematical point of view, it may be concluded that the processing
of information within a neuron involves the following two distinct mathematical
operations:
(i) Synaptic operation: The strength (weight) of the synapse is a representation
of the storage of knowledge and thus the memory for previous knowledge.
The synaptic operation assigns a relative weight (significance) to each
incoming signal according to the past experience (knowledge) stored in the
synapse.
(ii) Somatic operation: The somatic operation provides various mathematical
operations such as aggregation, thresholding, nonlinear activation, and
dynamic processing to the synaptic inputs. If the weighted aggregation of the
neural inputs exceeds a certain threshold, the soma will produce an output
signal to its axon.
A simplified representation of the above neural operations for a typical neuron is shown in Figure 2. A biological neuron exhibits some interesting mathematical mapping properties because of its nonlinear operations combined with a threshold in the soma.
If neurons were only capable of carrying out linear operations, the complex human
cognition and robustness of neural systems would disappear.
Figure 2. Simple model of a neuron showing (a) synaptic and (b) somatic operations.
Observations from both experimental and mathematical analysis have indicated
that neural cells can transmit reliable information if they are sufficiently redundant in
numbers. However, in general, a biological neuron has an unpredictable mechanism
for processing information. Therefore, it is postulated that the collective activity
generated by large numbers of locally redundant neurons is more significant than the
activity generated by a single neuron.
1.2. Synaptic operation
As shown in Figure 2, let us consider a neural memory vector of accumulated past experiences, w = [w_1, w_2, ..., w_n]^T in R^n, which is usually called the vector of synaptic weights, and a neural input vector x = [x_1, x_2, ..., x_n]^T in R^n as the current external stimuli. Through the
comparison process between the neural memory w and the input x , the neuron can
calculate a similarity between the usual (memory base) and current stimuli and thus
know the current situation (Kobayashi, 2006). According to the similarity, the neuron
can then derive its internal value as the membrane potential.
A similarity measure u can be calculated as an inner product of the neural memory vector w and the current input vector x given by

u = w^T x = w_1 x_1 + w_2 x_2 + ... + w_n x_n = Σ_{i=1}^{n} w_i x_i   (1)
The similarity implies the linear combination of the neural memory and the current
input, or correlation between them. This idea can be traced back to the milestone
model proposed by McCulloch and Pitts (1943).
As shown in Figure 3, the inner product can also be represented as

u = w^T x = ||w|| ||x|| cos θ   (2)

where ||·|| denotes the Euclidean norm (length) of a vector and θ is the angle between the vectors w and x.
Figure 3. Inner product as a measure of similarity between a neural memory (past
experience) w and a neural input (current experience) x .
When a current input x points in the same or a very similar direction as the neural memory w, the similarity measure u becomes large, and the correlation between the memory w and the input x becomes strongly positive because cos θ ≈ 1. If the input x points in the opposite or nearly opposite direction of the memory w, the absolute value of the similarity measure |u| also becomes large, but the negative correlation becomes strong because cos θ ≈ −1. In these two cases, the magnitudes of the memory w and the input x also influence the similarity measure. The other particular case is that the input x and the memory w are orthogonal to each other. In this case, the similarity measure u becomes very small because cos θ ≈ 0. If the two vectors are strictly orthogonal, the similarity measure u is equal to 0; the similarity measure is then independent of the magnitudes of the memory w and the input x.

The inner product indicates how similar the directions of two vectors are to each other. Indeed, in the case of normalized vectors w and x, i.e., ||w|| = ||x|| = 1, the similarity measure is nothing but cos θ:

u = ||w|| ||x|| cos θ = cos θ   (3)
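The similarity measure of eqns. (1)-(3) can be sketched numerically as follows; this is an illustrative example with made-up vectors, not material from the chapter.

```python
import numpy as np

# Sketch: the similarity measure u as an inner product of a memory vector w
# and a current input x, eqn. (1); with normalized vectors it reduces to
# cos(theta), eqn. (3). The example vectors are arbitrary.
def similarity(w, x):
    return float(np.dot(w, x))  # u = w^T x

w = np.array([1.0, 2.0, 2.0])        # neural memory (past experience)
x_same = np.array([0.5, 1.0, 1.0])   # input pointing in the same direction as w
x_orth = np.array([2.0, -1.0, 0.0])  # input orthogonal to w

u1 = similarity(w, x_same)           # large positive: directions agree
u2 = similarity(w, x_orth)           # zero: cos(theta) = 0

# With normalized vectors the measure is exactly cos(theta), eqn. (3)
cos_theta = similarity(w / np.linalg.norm(w), x_same / np.linalg.norm(x_same))
```

Here `u1` is large and positive while `u2` is exactly zero, matching the geometric discussion above.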
Note that the linear combination can be extended to higher order combinations, as in the following section.
1.2.1. Higher Order Terms of Neural Inputs
In the linear combination given in eqn. (1), we considered a neural input vector
consisting of only the first order terms of neural inputs in the polynomial. Naturally,
we can extend the first order terms to the higher order terms of the neural inputs or any
other nonlinear ones. To separate different classes of data with a nonlinear
discriminant line, an HONN (Rumelhart et al., 1986a; Giles and Maxwell, 1987;
Softky and Kammen, 1991; Xu et al., 1992; Taylor and Commbes, 1993; Homma and
Gupta, 2002) is used. An HONN is composed of one or more HONUs.
Here let us consider the second order polynomial of the neural inputs. In this case, the extended neural input and memory vectors, x_a and w_a, can be defined by

x_a = [x_1, x_2, ..., x_n, x_1^2, x_1 x_2, ..., x_1 x_n, x_2^2, ..., x_{n−1} x_n, x_n^2]^T   (4)

w_a = [w_1, w_2, ..., w_n, w_11, w_12, ..., w_1n, w_22, ..., w_{(n−1)n}, w_nn]^T   (5)
Then the similarity measure can be given with the same notation:

u_a = w_a^T x_a
    = w_1 x_1 + w_2 x_2 + ... + w_n x_n + w_11 x_1^2 + w_12 x_1 x_2 + ... + w_1n x_1 x_n + w_22 x_2^2 + ... + w_{(n−1)n} x_{n−1} x_n + w_nn x_n^2
    = Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i}^{n} w_ij x_i x_j   (6)
The second order terms x_i x_j can be related to correlations between the two inputs x_i and x_j. That is, if the two inputs are statistically independent of each other, then the second order terms become 0, while the absolute values of the terms become large if there is a linear relation between them. The squared terms of the neural inputs, x_i^2, indicate the power of the inputs from the physical point of view.
Consequently, the similarity measure with general higher order terms can be defined as

u_a = Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i}^{n} w_ij x_i x_j + ... + Σ_{i_1=1}^{n} Σ_{i_2=i_1}^{n} ... Σ_{i_N=i_{N−1}}^{n} w_{i_1 i_2 ... i_N} x_{i_1} x_{i_2} ... x_{i_N}   (7)
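Building the augmented second order input of eqn. (4) and evaluating eqn. (6) can be sketched as follows; the function name and the example weights are ours, not the chapter's.

```python
import itertools
import numpy as np

# Sketch: build the second order augmented input x_a of eqn. (4) and
# evaluate u_a = w_a^T x_a, eqn. (6). Only products x_i * x_j with j >= i
# are kept, so each correlation term appears once.
def second_order_inputs(x):
    x = list(x)
    pairs = [x[i] * x[j]
             for i, j in itertools.combinations_with_replacement(range(len(x)), 2)]
    return np.array(x + pairs)  # first order terms, then second order terms

x = [2.0, 3.0]
x_a = second_order_inputs(x)            # [x1, x2, x1^2, x1*x2, x2^2]
w_a = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # arbitrary example weights
u_a = float(np.dot(w_a, x_a))           # similarity measure of eqn. (6)
```

For n = 2 the augmented vector has the five entries [x_1, x_2, x_1^2, x_1 x_2, x_2^2], matching the ordering in eqn. (4).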
1.3. Somatic operation
Typical neural outputs are generated by a sigmoidal activation function of the similarity measure u, the inner product of the neural memories (past experiences) and the current inputs. In this case, the neural output y ∈ R can be given as

y = φ(u)   (8)

where φ is a neural activation function. An example of the activation function is the so-called sigmoidal function given by

φ(u) = 1 / (1 + exp(−u))   (9)
and shown in Figure 4.
Figure 4. A sigmoidal activation function.
Note that the activation function is not limited to the sigmoid one. However, this
type of sigmoid function has been widely used in various fields. Here if the similarity u
is large—that is, the current input x is similar to the corresponding neural memory
w —the neural output y is also large. On the other hand, if the similarity u is small,
the neural output y is also small. This is a basic characteristic of biological neural
activities. Note that the neural output is not proportional to the similarity u but is a nonlinear function of u with saturation characteristics. This nonlinearity might be a key mechanism in making neural activities as complex as those of the brain.
1.4. Learning from experiences
From the computational point of view, we have discussed how neurons, which are
elemental computational units in the brain, produce outputs y as the results of neural
information processing based on comparison of current external stimuli x with neural
memories of past experiences w . Consequently, the neural outputs y are strongly
dependent on the neural memories w. Thus, how neurons can memorize past
experiences is crucial for neural information processing. Indeed, one of the most
remarkable features of the human brain is its ability to adaptively learn in response to
knowledge, experience, and environment. The basis of this learning appears to be a
network of interconnected adaptive elements by means of which transformation
between inputs and outputs is performed.
Learning can be defined as the acquisition of new information. In other words,
learning is a process of memorizing new information. Adaptation implies that the
element can change in a systematic manner and in so doing alter the transformation
between input and output. In the brain, transmission within the neural system
involves coded nerve impulses and other physical chemical processes that form
reflections of sensory stimuli and incipient motor behavior.
Many biological aspects are associated with such learning processes, including (Harston, 1990):
- Learning overlays hardwired connections
- Synaptic plasticity versus stability: a crucial design dilemma
- Synaptic modification providing a basis for observable organism behavior
Here, we have presented the basic foundation of neural networks starting from a
basic introduction to the biological foundations, neural models, and learning properties
inherent in neural networks. The rest of the chapter contains the following five
sections:
In section 2, as the first step to understanding HONNs, we will develop a general
matrix form of the second order neural units (SONUs) and the learning algorithm.
Using the general form, it will be shown that, from the point of view of both the neural
computing process and its learning algorithm, the widely used linear combination
neural units described above are only a subset of the developed SONUs.
In section 3, we will conduct some simulation studies to support the theoretical
development of second order neural networks (SONNs). The results will show how and
why SONNs can be effective for many problems.
In section 4, HONUs and HONNs with a learning algorithm will be presented.
Toward computer science and engineering applications, function approximation and
time series analysis problems will be considered in section 5.
Concluding remarks and future research directions will be given in section 6.
2. Second Order Neural Units and Second Order Neural Networks
Neural networks, consisting of first order neurons which provide the neural output as a
nonlinear function of the weighted linear combination of neural inputs, have been
successfully used in various applications such as pattern recognition/classification,
system identification, adaptive control, optimization, and signal processing (Sinha et al.,
1999; Gupta et al., 2003; Narendra and Parthasarathy, 1990; Cichocki and Unbehauen, 1993).
The higher order combination of the inputs and weights will yield higher neural
performance. However, one of the disadvantages encountered in the previous
development of HONUs is the larger number of learning parameters (weights) required
(Schmidt, 1993). To optimize the features space, a learning capability assessment
method has been proposed by Villalobos and Merat (1995).
In this section, in order to reduce the number of parameters without loss of higher performance, an SONU is presented (Homma and Gupta, 2002); an SONU is also sometimes denoted as a quadratic neural unit (Bukovsky et al., 2010). Using a general matrix form of the second order operation, the SONU provides the output as a nonlinear function of the weighted second order combination of input signals. Note that the matrix form can contribute to high speed computing, such as parallel and vector processing, which is essential for scientific and image processing.
2.1. Formulation of the second order neural unit
An SONU with n-dimensional neural inputs, x(t) ∈ R^n, and a single neural output, y(t) ∈ R, is developed in this section (Figure 5). Let x_a = [x_0, x_1, ..., x_n]^T ∈ R^{n+1}, x_0 = 1, be an augmented neural input vector. Here a new second order aggregating formulation is proposed by using an augmented weight matrix W_a(t) ∈ R^{(n+1)×(n+1)} as

u = x_a^T W_a x_a   (10)

Then the neural output, y, is given by a nonlinear function of the variable u as

y = φ(u)   (11)
Figure 5. An SONU defined by eqns. (10) and (11).
Because both the weights w_ij and w_ji, i, j ∈ {0, 1, ..., n}, in the augmented weight matrix W_a yield the same second order term x_i x_j (or x_j x_i), an upper triangular matrix or lower triangular matrix is sufficient. For instance, instead of separately determining values for w_01 and w_10, both of which are weights for x_0 x_1, one can eliminate one of these weights and let the remaining one carry as much as both of them combined would if they were computed separately. This saves time in the neural network's computationally intensive procedure of adapting weights. The same applies to the other redundant weights. The discriminant function can then be reexpressed as the transpose of the vector of neural inputs multiplied by the upper triangular matrix of neural weights multiplied by the vector of neural inputs again:

u = x_a^T W_a x_a = Σ_{i=0}^{n} Σ_{j=i}^{n} w_ij x_i x_j   (12)
0 1,xWx (12)
The number of elements in the matrix of neural weights with redundant elements included is (n+1)·(n+1). To calculate the number of elements in the final matrix of neural weights, W_a, with the redundant elements eliminated, first take the total number of elements, (n+1)·(n+1), and subtract the number of diagonal elements, n+1. Dividing this by 2 gives the number of elements above (or below) the diagonal. Then add back the number of diagonal elements. Therefore, the number of elements in W_a with redundant elements eliminated is given as

((n+1)·(n+1) − (n+1)) / 2 + (n+1) = (n^2 + 3n + 2) / 2
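The counting argument above can be checked with a few lines of code; the function name is ours.

```python
# Sketch: number of free weights in the upper triangular matrix W_a of an
# n-input SONU, following the counting steps in the text.
def sonu_weight_count(n):
    full = (n + 1) * (n + 1)          # all elements, redundant ones included
    off_diag = (full - (n + 1)) // 2  # elements strictly above the diagonal
    return off_diag + (n + 1)         # add back the diagonal

# n = 2 gives the six parameters (including the threshold) cited for the
# XOR problem in section 3.1.
assert sonu_weight_count(2) == 6
```

The result agrees with the closed form (n^2 + 3n + 2)/2 for every n.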
Note that the conventional first order weighted linear combination is only a special case of this second order matrix formulation. For example, the special weight matrix whose only nonzero row is the first one, Row_0(W_a) = [w_00, w_01, ..., w_0n] ∈ R^{1×(n+1)}, produces the equivalent weighted linear combination u = Σ_{j=0}^{n} w_0j x_j (with x_0 = 1, so w_00 acts as the bias). Therefore, the proposed neural model with the second order matrix operation is more general and, for this reason, it is called an SONU.
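The forward pass of eqns. (10) and (11) can be sketched as follows; the function names and the tanh activation are our choices for illustration, not the chapter's.

```python
import numpy as np

# Minimal sketch of the SONU forward pass, eqns. (10)-(11): W_a is an upper
# triangular augmented weight matrix and x_a = [1, x_1, ..., x_n]^T.
def sonu_forward(W_a, x, phi=np.tanh):
    x_a = np.concatenate(([1.0], x))  # augmented input with x_0 = 1
    u = x_a @ W_a @ x_a               # u = x_a^T W_a x_a, eqn. (10)
    return phi(u), u                  # y = phi(u), eqn. (11)

n = 2
W_a = np.triu(np.full((n + 1, n + 1), 0.1))  # example upper triangular weights
y, u = sonu_forward(W_a, np.array([1.0, -1.0]))
```

If only the first row of W_a is nonzero, the same function reduces to the conventional weighted linear combination, illustrating the special case described above.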
2.2. Learning algorithms for second order neural units
Here learning algorithms are developed for SONUs. Let k denote the discrete time step, k = 1, 2, ..., and y_d(k) ∈ R be the desired output signal corresponding to the neural input vector x(k) ∈ R^n at the k-th time step. A square error, E(k), is defined from the error e(k) = y_d(k) − y(k) as

E(k) = (1/2) e(k)^2   (13)

where y(k) is the neural output corresponding to the neural input x(k) at the k-th time instant.
The purpose of the neural unit's learning is to minimize the error E by adapting the weight matrix W_a as

W_a(k+1) = W_a(k) + ΔW_a(k)   (14)

Here ΔW_a(k) denotes the change in the weight matrix, which is defined as proportional to the negative gradient of the error function E(k):

ΔW_a(k) = −η ∂E(k)/∂W_a(k)   (15)
where η > 0 is a learning coefficient. The derivatives ∂E/∂w_ij, i, j ∈ {0, 1, ..., n}, are calculated by the chain rule as

∂E(k)/∂w_ij(k) = (∂E(k)/∂y(k)) (∂y(k)/∂u(k)) (∂u(k)/∂w_ij(k)) = −e(k) φ'(u(k)) x_i(k) x_j(k)   (16)

or, in matrix form,

∂E(k)/∂W_a(k) = −e(k) φ'(u(k)) x_a(k) x_a^T(k)   (17)
The changes in the weight matrix are then given by

ΔW_a(k) = η e(k) φ'(u(k)) x_a(k) x_a^T(k)   (18)

Here φ'(u) is the slope of the nonlinear activation function used in eqn. (11). For activation functions such as the sigmoidal function, φ'(u) > 0, and φ'(u) can be regarded as a gain of the changes in the weights. Then

ΔW_a(k) = η_g e(k) x_a(k) x_a^T(k)   (19)

where η_g = η φ'(u). Note that, taking the average of the changes over some input vectors, the change in the weights, Δw_ij(k), implies the correlation between the error e(k) and the corresponding input term x_i(k) x_j(k).
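One adaptation step of eqns. (13)-(19) can be sketched as follows; variable names are ours, and φ(u) = u is assumed for simplicity (so φ'(u) = 1), as in the time series setting of Table I.

```python
import numpy as np

# Sketch of the static gradient learning step of eqns. (13)-(19) with
# phi(u) = u. Only the upper triangle of W_a is adapted, since the lower
# weights are redundant (each pair x_i*x_j has one weight, eqn. (12)).
def sonu_update(W_a, x_a, y_d, eta_g=0.1):
    u = x_a @ W_a @ x_a                            # forward pass, eqn. (10)
    e = y_d - u                                    # error e(k) = y_d(k) - y(k)
    dW = np.triu(eta_g * e * np.outer(x_a, x_a))   # eqn. (19), upper triangle
    return W_a + dW, e

W_a = np.zeros((3, 3))
x_a = np.array([1.0, 0.5, -0.5])           # augmented input, x_0 = 1
W_a, e = sonu_update(W_a, x_a, y_d=1.0)    # one adaptation step
u_new = x_a @ W_a @ x_a                    # output moves toward y_d
```

After the step, the output for the same input is closer to the desired value, as the gradient derivation above predicts.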
Therefore, conventional learning algorithms such as the backpropagation algorithm
can easily be extended for multilayered neural network structures having the proposed
SONUs.
In Table I, fundamental learning rules of static and dynamic SONUs are summarized (for clarity, with the simplification φ(u) = u) for the case of time series prediction. As an extension of the above static learning rule of SONUs, the update rule of dynamic SONUs includes the recurrently calculated derivatives of the neural output, ∂y_n(k + n_s)/∂w_ij, where j_ij denotes columns of a recurrently calculated Jacobian matrix (Table I).

Table I. Summary of fundamental static and dynamic learning techniques for SONUs for time series prediction, where φ(u) = u for simplicity.
[Table I, condensed reconstruction; the original two-column layout (SONU mathematical structure vs. learning rule) is flattened here.]

Static SONU:
Mathematical structure: y_n = x_a^T W x_a = Σ_{i=0}^{n} Σ_{j=i}^{n} w_ij x_i x_j, where x_a = [1, x_1, ..., x_n]^T, x_1, x_2, ..., x_n are external neural inputs, y_n is the neural output, and W is the upper triangular weight matrix.
- Gradient descent (k ... sample number):
  Δw_ij(k) = −(1/2) η ∂e(k)^2/∂w_ij = η e(k) x_i(k) x_j(k),  w_ij(k+1) = w_ij(k) + Δw_ij(k)
- Levenberg-Marquardt (L-M):
  Δw_ij = (j_ij^T j_ij + (1/μ) I)^{−1} j_ij^T e, where e = [e(1), e(2), ..., e(N)]^T, j_ij = ∂y_n/∂w_ij = [x_i(1) x_j(1), x_i(2) x_j(2), ..., x_i(N) x_j(N)]^T, and N is the number of samples (data length).

Discrete dynamic SONU:
Mathematical structure: y_n(k + n_s) = x_a^T W x_a, where the augmented input x_a = [1, x_1(k), ..., x_m(k), y_n(k + n_s − 1), y_n(k + n_s − 2), ..., y_n(k + 1)]^T contains both external inputs and fed-back neural outputs; typically for prediction, x_1(k) = y_r(k), ..., x_m(k) = y_r(k − m + 1), where y_r is the real (measured) value.
- Recurrent gradient descent (RTRL):
  Δw_ij(k) = η e(k) ∂y_n(k + n_s)/∂w_ij, where the derivatives are calculated recurrently as ∂y_n(k + n_s)/∂w_ij = j_ij^T W x_a + x_a^T (∂W/∂w_ij) x_a + x_a^T W j_ij, with j_ij = ∂x_a/∂w_ij = [0, ..., 0, ∂y_n(k + n_s − 1)/∂w_ij, ..., ∂y_n(k + 1)/∂w_ij]^T.
- Backpropagation through time (BPTT): the BPTT learning technique may be implemented as the combination of (a) RTRL for recurrent calculation of neural outputs and their derivatives (with respect to weights) at every sample time k, and (b) the Levenberg-Marquardt algorithm for calculation of the weight increments ΔW once the recurrent calculations are accomplished.
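A heavily simplified sketch of the discrete prediction setting of Table I follows. All names are ours; the augmented input holds a bias and tapped delays of the measured signal, and the weights adapt by a plain gradient step. Full RTRL would additionally propagate the output derivatives ∂y/∂w recurrently through the feedback, as the table summarizes.

```python
import numpy as np

# Simplified sketch of one-step-ahead SONU prediction with phi(u) = u:
# x_a = [1, delayed samples], upper triangular W adapted by an
# eqn. (19)-style gradient step. This truncates the recurrent derivative
# terms that full RTRL (Table I) would include.
def predict_series(series, n_taps=3, eta=0.01):
    dim = n_taps + 1
    W = np.zeros((dim, dim))
    errors = []
    for k in range(n_taps, len(series)):
        x_a = np.concatenate(([1.0], series[k - n_taps:k]))  # tapped delays
        y = x_a @ W @ x_a                     # prediction of series[k]
        e = series[k] - y                     # one-step prediction error
        errors.append(abs(e))
        W += np.triu(eta * e * np.outer(x_a, x_a))  # gradient update
    return errors, W

series = np.sin(0.2 * np.arange(400))  # toy periodic signal
errors, W = predict_series(series)
```

On this toy signal the prediction error shrinks as the weights adapt, which is the qualitative behavior the simulation study below examines on richer signals.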
3. Performance Assessment of Second Order Neural Units
To evaluate the learning and generalization abilities of the proposed general SONUs,
the XOR classification problem is used. The XOR problem will provide a simple
example of how well an SONU works for the nonlinear classification problem.
3.1. XOR problem
Because the two-input XOR function is not linearly separable, it is one of the simplest
logic functions that cannot be realized by a single linear combination neural unit.
Therefore, it requires a multilayered neural network structure consisting of linear
combination neural units.
On the other hand, a single SONU can solve this XOR problem by using its general second order function defined in eqn. (12). To implement the XOR function using a single SONU, the four learning patterns corresponding to the four combinations of two binary inputs, (x_1, x_2) ∈ {(−1, −1), (−1, 1), (1, −1), (1, 1)}, and the desired outputs y_d(x_1, x_2) ∈ {−1, 1} were applied to the SONU.
For the XOR problem, the neural output, y, is defined by the signum function as y = φ(u) = sgn(u). The correlation learning algorithm with a constant gain, η_g = 1, in eqn. (19) was used in this case. The learning was terminated as soon as the error converged to 0. Because the SONU with the signum function classifies the neural input data by using the second order nonlinear function of the neural inputs, x_a^T W_a x_a, as in eqn. (10), many nonlinear classification boundaries are possible, such as a hyperbolic boundary and an elliptical boundary (Table II).
Table II. Initial weights (k = 0), final weights, and the classification boundaries for the XOR problem.
Note that the results of the classification boundary are dependent on the initial weights
(Table II), and any classification boundary by the second order functions can be realized
by a single SONU. This realization ability of the SONU is obviously superior to the
linear combination neural unit, which cannot achieve such nonlinear classification
using a single neural unit. At least three linear combination neural units in a layered
structure are needed to solve the XOR problem.
Secondly, the number of parameters (weights) required for solving this problem can
be reduced by using the SONU. In this simulation study, by using the upper
triangular weight matrix, only six parameters including the threshold were required for
the SONU whereas at least nine parameters were required for the layered structure
with three linear combination neural units.
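The XOR experiment of this section can be reproduced in miniature as follows; the random initialization and the small training loop are ours, while the ±1 patterns, signum output, and η_g = 1 correlation rule follow the text.

```python
import numpy as np

# Sketch of the XOR experiment: a single SONU with signum activation,
# trained with the eqn. (19) correlation rule at gain eta_g = 1 until the
# error converges to 0. Initialization is an arbitrary choice of ours.
patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
targets = [-1, 1, 1, -1]                  # XOR with -1 as logical false

rng = np.random.default_rng(0)
W = np.triu(rng.normal(0.0, 0.5, (3, 3))) # upper triangular initial weights

for epoch in range(100):
    errors = 0
    for (x1, x2), y_d in zip(patterns, targets):
        x_a = np.array([1.0, x1, x2])
        y = np.sign(x_a @ W @ x_a)        # y = sgn(u), eqn. (10)
        e = y_d - y
        if e != 0:
            errors += 1
            W += np.triu(e * np.outer(x_a, x_a))  # eqn. (19), eta_g = 1
    if errors == 0:
        break                             # all four patterns classified
```

Only the six upper triangular weights are adapted, matching the parameter count discussed above, and the learned w_12 carries the dominant negative contribution of the x_1 x_2 term.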
Each weight w_ij represents how the corresponding input correlation term x_i x_j affects the neural output. If the absolute value of the weight is very small, then the
effect of the corresponding input term on the output may also be very small. On the
other hand, the corresponding term may be dominant or important if the absolute value
of the weight is large compared to the other weights.
The weights in Table II suggest that the absolute value of w_12 is always large independent of the initial values, and it is the largest except in only one case (middle row, where it is still the second largest). The absolute value of w_00 is the largest in one case (middle row) among the three cases, but the smallest in another (top row). The input term corresponding to the weight w_00 is nothing but the bias. Note that a large |w_12| implies a large contribution of the correlation term x_1 x_2 to the output, and that the contribution of the term may be negative because w_12 < 0. Indeed, the target XOR function can be defined as y = −x_1 x_2.
Consequently, if the target (unknown) function involves a higher order combination of the input variables, higher order neural units can be superior to neural units that lack the necessary higher order input terms. Of course, this discussion concerns only the synaptic operation; the somatic operation may also create higher order terms in the sense of a Taylor expansion of the nonlinear activation function. However, such higher order terms produced by the somatic operation may be limited or indirect. Thus, the direct effect of the higher order terms is a reason why higher order neural units can be effective for problems that involve higher order terms of the input variables.
3.2. Time Series Prediction
In this subsection, the time series prediction performance of dynamic SONUs (Figure 7), adapted by dynamic gradient descent (RTRL), is demonstrated and compared to single-hidden-layer perceptron-type recurrent neural networks with various numbers of sigmoid neurons in the hidden layer (from 3 to 10) and two recurrent configurations: a recurrent hidden layer (RNN) and tapped delay feedbacks of the neural output (TptDNN). For comparison of the performance, extensive simulation analysis was performed on the theoretical and real data shown in Figure 6; white noise was also added to the training and testing data to compare the generalization and overfitting of the SONUs.
[Figure 6 signal panels (amplitude vs. sample index k, k = 0 to 2000): Art-1 Quasiperiodic; Art-2 Nonlinear periodic; Art-3 Artificial ECG; Art-4 Lorenz system; Art-5 Mackey-Glass; Real-1 Respiration; Real-2 Real ECG; Real-3 EEG; Real-4 R-R.]
Figure 6. All signals (clean data) that were used in the experimental study. The first 1000 samples were training data; samples k = 1001-2000 were used as testing data.
Table III. Total counts of simulation experiments with SONU (QNU), recurrent perceptron-type neural networks (RNN), and tapped-delay neural networks (TptDNN) with a single hidden layer and various numbers of hidden neurons (3, 5, or 7).
Table IV. The percentage of simulation runs in which each tested neural architecture performed better than the average of all tested architectures, measured by the sum of square errors (SSE).
Table V. Count of the neural architecture types that reached the absolute minimum SSE for three prediction horizons (after averaging results over three levels of noise distortion).

[Table V data; the grouped column header (QNU / RNN / TptDNN under the groups pure, smooth, G, and OF) could not be fully recovered. Per-dataset counts, column totals, and percentages:
  Art-1 Quasiperiodic:  3 1 1 3 1 2 8 3
  Art-2 NonlinPeriodic: 3 2 3 3 11
  Art-3 ECG_Art:        3 2 3 3 11
  Art-4 Lorenz:         3 2 3 2 1 8 3
  Art-5 MacKeyGlass:    3 2 3 3 11
  Real-1 Respiration:   2 1 2 3 2 1 9 1 1
  Real-2 ECG_Real:      3 2 3 3 11
  Real-3 EEG:           2 1 2 1 2 1 2 6 2 3
  Real-4 RR:            3 2 3 3 11
  Column totals:        22 4 1 | 15 3 | 22 5 | 19 2 6
  Percentage:           81% 15% 4% | 83% 17% | 81% 19% | 70.4% 7.4% 22.2% (each group sums to 100%)]
[Table IV data. Percentages of better-than-average SSE; column groups G (pure, smooth) and OF (pure, smooth) plus the row average, each listing QNU / RNN / TptDNN.]

  data info              G, pure        G, smooth      OF, pure       OF, smooth     Row average
  Art-1 Quasiperiodic    81% 46% 50%    89% 49% 59%    91% 85% 69%    89% 76% 67%    87% 64% 61%
  Art-2 NonlinPeriodic   81% 46% 50%    84% 49% 56%    96% 75% 70%    90% 59% 57%    88% 57% 58%
  Art-3 ECG_Art         100% 40% 33%    96% 43% 43%    93% 58% 59%    97% 48% 51%    97% 47% 46%
  Art-4 Lorenz           76% 47% 46%    81% 48% 52%    81% 80% 69%    80% 79% 71%    80% 63% 59%
  Art-5 MacKeyGlass      89% 51% 54%    76% 50% 50%    78% 65% 56%    82% 57% 55%    81% 56% 54%
  Real-1 Respiration     82% 56% 51%    82% 50% 57%    97% 69% 59%    84% 59% 57%    86% 58% 56%
  Real-2 ECG_Real       100% 33% 36%    97% 34% 36%    95% 55% 57%    97% 43% 42%    97% 41% 43%
  Real-3 EEG             81% 63% 63%    61% 45% 46%    40% 73% 62%    37% 68% 59%    55% 62% 58%
  Real-4 RR              89% 42% 49%    55% 44% 64%    83% 57% 57%    95% 52% 57%    81% 49% 56%
  Column Average         86% 47% 48%    80% 46% 51%    84% 68% 62%    84% 60% 57%
[Figure 7 block diagram: delay operators \(z^{-1}\) feed previous neural outputs back to the input, where they are combined with the external inputs into the augmented input vector \(\mathbf{x}(k)\); the unit computes the quadratic synaptic operation \(u = \sum_{i} \sum_{j \ge i} x_i x_j w_{ij} = \mathbf{x}_a^T \mathbf{W} \mathbf{x}_a\) over the \(n_s + n_r\) components.]
Figure 7. Schematic of the recurrent QNU with \(n_s - 1\) state feedbacks (recurrences) and \(n_r\) external inputs (real measured values) as used for time series prediction.
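One prediction step of such a recurrent QNU can be sketched as follows (variable names and dimensions are assumed for illustration, not taken from the original implementation):

```python
import numpy as np

# One prediction step of a recurrent QNU in the spirit of Figure 7:
# delayed neural outputs and external inputs form an augmented vector,
# and the output is the quadratic form x^T W x.
def qnu_step(prev_outputs, ext_inputs, W):
    x = np.concatenate(([1.0], prev_outputs, ext_inputs))  # leading 1 yields the bias terms
    return x @ W @ x

n_s, n_r = 3, 2                            # assumed numbers of feedbacks and inputs
dim = 1 + n_s + n_r
rng = np.random.default_rng(0)
W = np.triu(rng.normal(size=(dim, dim)))   # upper-triangular weight matrix
y = qnu_step(np.zeros(n_s), np.ones(n_r), W)
```

In adaptive prediction, `y` would be delayed and fed back as the next element of `prev_outputs` while `W` is updated by RTRL.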
4. Higher Order Neural Units and Higher Order Neural Networks
To capture the higher order nonlinear properties of the input pattern space, extensive
efforts have been made by Rumelhart et al. (1986), Giles and Maxwell (1987), Softky
and Kammen (1991), Xu et al. (1992), Taylor and Coombes (1993), and Homma and
Gupta (2002) toward developing architectures of neurons that are capable of capturing
not only the linear correlation between components of input patterns, but also the
higher order correlation between components of input patterns. HONNs have proven
to have good computational, storage, pattern recognition, and learning properties and
are realizable in hardware (Taylor and Coombes, 1993). Regular polynomial networks
that contain the higher order correlations of the input components satisfy the
Stone-Weierstrass theorem that is a theoretical background of universal function
approximators by means of neural networks (Gupta et al., 2003), but the number of
weights required to accommodate all the higher order correlations increases
exponentially with the number of the inputs. HONUs are the basic building block for
such an HONN. For such an HONN as shown in Figure 8, the output is given by
\[ y = \varphi(u) \qquad (20) \]
\[ u = w_0 + \sum_{i_1=1}^{n} w_{i_1} x_{i_1} + \sum_{i_1=1}^{n} \sum_{i_2=i_1}^{n} w_{i_1 i_2} x_{i_1} x_{i_2} + \cdots + \sum_{i_1=1}^{n} \cdots \sum_{i_N=i_{N-1}}^{n} w_{i_1 \cdots i_N} x_{i_1} \cdots x_{i_N} \qquad (21) \]
where \(\mathbf{x} = [x_1, x_2, \ldots, x_n]^T\) is a vector of neural inputs, \(y\) is the output, and \(\varphi(\cdot)\) is a strictly monotonic activation function, such as a sigmoidal function, whose inverse \(\varphi^{-1}(\cdot)\) exists. The summation for the \(j\)th-order correlation is taken over the set \(C(i_1 \cdots i_j)\), \(1 \le j \le N\), the set of combinations of \(j\) indices defined by
\[ C(i_1 \cdots i_j) = \{\, i_1 i_2 \cdots i_j : 1 \le i_1 \le i_2 \le \cdots \le i_j \le n \,\}, \quad 1 \le j \le N. \]
Also, the number of \(j\)th-order correlation terms is given by
\[ \binom{n+j-1}{j} = \frac{(n+j-1)!}{j!\,(n-1)!}, \quad 1 \le j \le N. \]
The set \(C(i_1 \cdots i_j)\) is introduced to absorb the redundant terms that arise from the symmetry of the induced combinations. In fact, eqn. (21) is a truncated Taylor series with adjustable coefficients. The \(N\)th-order neural unit thus needs a total of
\[ \sum_{j=0}^{N} \binom{n+j-1}{j} = \sum_{j=0}^{N} \frac{(n+j-1)!}{j!\,(n-1)!} \]
weights, including the bias, to cover all products of up to \(N\) components.
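This count is easy to compute; a small sketch using Python's standard-library `math.comb`:

```python
from math import comb

# Total number of weights of an Nth-order neural unit with n inputs,
# per the summation above: sum_{j=0}^{N} C(n+j-1, j).
def honu_weight_count(n, N):
    return sum(comb(n + j - 1, j) for j in range(N + 1))

# For the N = 3, n = 2 case of Example 1 below: 1 + 2 + 3 + 4 = 10 weights.
assert honu_weight_count(2, 3) == 10
```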
Figure 8. Block diagram of the HONU, eqns. (20) and (21).
Example 1 In this example, we consider a case of the third order (N = 3) neural network with two neural inputs (n = 2). Here
\[ C(i_1) = \{0, 1, 2\}, \quad C(i_1 i_2) = \{11, 12, 22\}, \quad C(i_1 i_2 i_3) = \{111, 112, 122, 222\} \]
where the index 0 corresponds to the bias, and the network equation is
\[ y = \varphi\big(w_0 + w_1 x_1 + w_2 x_2 + w_{11} x_1^2 + w_{12} x_1 x_2 + w_{22} x_2^2 + w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{222} x_2^3\big). \]
The HONUs may be used as hidden units in conventional feedforward neural network structures to form HONNs. In this case, consideration of the higher order correlations may improve the approximation and generalization capabilities of the neural networks. Typically, only SONNs are employed in practice, to keep the number of weights tolerable, as discussed in Sections 2 and 3. On the other hand, if the order of the HONU is high enough, eqns. (20) and (21) may themselves be considered a neural network with n inputs and a single output. This structure is capable of dealing with problems of function approximation and pattern recognition.
To accomplish an approximation task for given input-output data \(\{\mathbf{x}(k), y(k)\}\), a learning algorithm for the HONN can easily be developed on the basis of the gradient descent method. Assume that the error function is formulated as
\[ E(k) = \frac{1}{2}\,[d(k) - y(k)]^2 = \frac{1}{2}\, e^2(k) \]
where \(e(k) = d(k) - y(k)\), \(d(k)\) is the desired output, and \(y(k)\) is the output of the neural network. Minimization of the error function by a standard steepest descent technique yields the following set of learning equations:
\[ w_0^{new} = w_0^{old} + \eta\,(d - y)\,\varphi'(u) \qquad (22) \]
\[ w_{i_1 \cdots i_j}^{new} = w_{i_1 \cdots i_j}^{old} + \eta\,(d - y)\,\varphi'(u)\, x_{i_1} x_{i_2} \cdots x_{i_j} \qquad (23) \]
where \(\varphi'(u) = d\varphi(u)/du\) and \(\eta > 0\) is the learning rate. As with the backpropagation algorithm for a multilayered feedforward neural network (MFNN), a momentum version of the above is easily obtained.
Alternatively, because all the weights of the HONN appear linearly in eqn. (21), one may use methods for solving linear algebraic equations to carry out the preceding learning task if the number of patterns is finite. To do so, one introduces the following two augmented vectors
\[ \mathbf{w} = [w_0, w_1, \ldots, w_n, w_{11}, w_{12}, \ldots, w_{nn}, \ldots, w_{n \cdots n}]^T \]
and
\[ \mathbf{u}(\mathbf{x}) = [x_0, x_1, \ldots, x_n, x_1^2, x_1 x_2, \ldots, x_n^2, \ldots, x_n^N]^T \]
where \(x_0 = 1\), so that the network equations, eqns. (20) and (21), may be rewritten in the following compact form:
\[ y = \varphi\big(\mathbf{w}^T \mathbf{u}(\mathbf{x})\big) \qquad (24) \]
For the given \(p\) pattern pairs \(\{\mathbf{x}(k), d(k)\}\), \(1 \le k \le p\), define the following matrix and vector
\[ \mathbf{U} = [\mathbf{u}(1), \mathbf{u}(2), \ldots, \mathbf{u}(p)]^T, \qquad \mathbf{d} = [\varphi^{-1}(d(1)), \varphi^{-1}(d(2)), \ldots, \varphi^{-1}(d(p))]^T \]
where \(\mathbf{u}(k) = \mathbf{u}(\mathbf{x}(k))\), \(1 \le k \le p\). Then the learning problem becomes one of finding a solution of the following linear algebraic equation
\[ \mathbf{U}\mathbf{w} = \mathbf{d} \qquad (25) \]
If the number of weights is equal to the number of data and the matrix \(\mathbf{U}\) is nonsingular, then eqn. (25) has the unique solution
\[ \mathbf{w} = \mathbf{U}^{-1}\mathbf{d}. \]
A more interesting case occurs when the dimension of the weight vector \(\mathbf{w}\) is less than the number of data \(p\). Then an exact solution of the above linear equation exists if and only if
\[ \operatorname{rank}\,\mathbf{U} = \operatorname{rank}\,[\mathbf{U}, \mathbf{d}]. \]
In case this condition is not satisfied, the pseudoinverse solution is usually an option and gives the best fit in the least squares sense.
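Because the learning task reduces to this linear system, the weights of a linear-in-parameters HONU can be recovered by ordinary least squares. A sketch for a second-order unit with two inputs and a hypothetical target function (not from the original study):

```python
import numpy as np

# Because the weights enter eqn. (21) linearly, training reduces to the
# linear least squares problem U w = d.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
d = 0.5 + 2.0 * X[:, 0] * X[:, 1] - X[:, 1] ** 2      # hypothetical targets (phi = identity)

# augmented regressors u(x) = [1, x1, x2, x1^2, x1*x2, x2^2]
U = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])
w, *_ = np.linalg.lstsq(U, d, rcond=None)             # pseudoinverse solution
assert np.allclose(w, [0.5, 0.0, 0.0, 0.0, 2.0, -1.0])  # true weights recovered
```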
The following example shows how to use the HONN presented in this section to deal
with pattern recognition problems that are also typical applications in computer science
and engineering situations. It is of interest to show that solving such problems is
equivalent to finding the decision surfaces in the pattern space such that the given data
patterns are located on the surfaces.
Example 2 Consider the three-variable XOR function defined as
\[ y = f(x_1, x_2, x_3) = x_1 \oplus x_2 \oplus x_3 = (x_1 \oplus x_2) \oplus x_3 = x_1 \oplus (x_2 \oplus x_3) \]
where \(\oplus\) denotes the exclusive-OR operation.
The eight input pattern pairs and corresponding outputs are given in Table VI. This is
a typical nonlinear pattern classification problem. A single linear neuron with a
nonlinear activation function is unable to form a decision surface such that the patterns are separated in the pattern space. Our objective here is to find all the possible solutions using the third order neural network to realize the logic function.
Table VI. Truth table of the XOR function \(x_1 \oplus x_2 \oplus x_3\).
A third order neural network is designed as
\[ y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{23} x_2 x_3 + w_{123} x_1 x_2 x_3 \]
where \(x_1, x_2, x_3 \in \{-1, 1\}\) are the bipolar inputs, and the network contains eight weights.
To implement the above logic XOR function, one may consider the solution of the set of eight linear algebraic equations
\[ w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{23} x_2 x_3 + w_{123} x_1 x_2 x_3 = y \]
obtained by evaluating the network equation at the eight patterns of Table VI. The coefficient matrix \(\mathbf{U}\), whose \(k\)th row is \([1,\, x_1,\, x_2,\, x_3,\, x_1 x_2,\, x_1 x_3,\, x_2 x_3,\, x_1 x_2 x_3]\) evaluated at the \(k\)th pattern, is
\[ \mathbf{U} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & -1 & 1 & -1 & -1 & -1 \\
1 & 1 & -1 & 1 & -1 & 1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & 1 & 1 & -1 & -1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1 & 1 & -1
\end{bmatrix} \]
Pattern   Input x1   Input x2   Input x3   Output y
A             1          1          1          1
B             1          1         -1         -1
C             1         -1          1         -1
D             1         -1         -1          1
E            -1          1          1         -1
F            -1          1         -1          1
G            -1         -1          1          1
H            -1         -1         -1         -1
which is nonsingular. The equations then have the unique solution
\[ w_0 = w_1 = w_2 = w_3 = w_{12} = w_{13} = w_{23} = 0, \qquad w_{123} = 1. \]
Therefore, the logic function is realized by the third order polynomial \(y = x_1 x_2 x_3\). This solution is unique in terms of the third order polynomial.
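Example 2 can be verified numerically by building \(\mathbf{U}\) over all eight bipolar patterns and solving the system (the pattern ordering here is illustrative):

```python
import numpy as np
from itertools import product

# Numerical check of Example 2: solve U w = y for the three-variable
# XOR targets y = x1*x2*x3 (odd parity under the +1 = true encoding).
rows, targets = [], []
for x1, x2, x3 in product((1, -1), repeat=3):
    rows.append([1, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3])
    targets.append(x1 * x2 * x3)
U = np.array(rows, dtype=float)
w = np.linalg.solve(U, np.array(targets, dtype=float))
# only the third-order weight w123 survives
assert np.allclose(w, [0, 0, 0, 0, 0, 0, 0, 1])
```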
Xu et al. (1992) as well as Taylor and Coombes (1993) also demonstrated that HONNs may be effectively applied to problems of fitting a curve, surface, or hypersurface to a given data set. This problem, called nonlinear surface fitting, is often encountered in computer science and engineering applications. Some learning algorithms for solving such problems can be found in their papers. Moreover, if one assumes \(\varphi(x) = x\) in the HONU, the weights appear linearly in the network, and the learning algorithms for the HONNs may be characterized as a linear least squares (LS) procedure. The well-known local minimum problems that afflict many nonlinear neural learning schemes may then be avoided.
4.1. Representation of Higher Order Neural Network Discriminant Using
Multidimensional Matrix Product
The discriminant of a HONN is a summation of quadratic terms. This can be alternatively represented using multidimensional matrix multiplication (Solo, 2010). For example,
\[ \sum_{i=1}^{3} \sum_{j=1}^{3} w_{ij}\, x_i x_j = w_{11} x_1^2 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{21} x_2 x_1 + w_{22} x_2^2 + w_{23} x_2 x_3 + w_{31} x_3 x_1 + w_{32} x_3 x_2 + w_{33} x_3^2 \]
\[ = w_{11} x_1^2 + w_{22} x_2^2 + w_{33} x_3^2 + x_1 x_2 (w_{12} + w_{21}) + x_1 x_3 (w_{13} + w_{31}) + x_2 x_3 (w_{23} + w_{32}) \]
This weighted summation is easily represented using classical matrices multiplied together:
\[ \sum_{i=1}^{3} \sum_{j=1}^{3} w_{ij}\, x_i x_j = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} * \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} * \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]
It is extremely useful to express these weighted summations as matrix products in order to eliminate unnecessary terms in neural network designs. Because the weights \(w_{ij}\) and \(w_{ji}\) in the matrix above multiply the same second-order term \(x_i x_j\), it is sufficient to use only an upper triangular or lower triangular weight matrix. For instance, instead of separately determining values for \(w_{12}\) and \(w_{21}\), both of which are weights for \(x_1 x_2\), one can eliminate one of them and determine a single value equal to the sum the two would take if computed separately. The same applies to the other redundant weights. This saves time in the computationally intensive procedure of training the weights.
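The weight-folding argument can be checked numerically; a minimal sketch:

```python
import numpy as np

# Folding redundant weights: since w_ij and w_ji multiply the same term
# x_i*x_j, a full weight matrix can be replaced by an upper-triangular one
# with entries w_ij + w_ji (i < j) without changing the discriminant.
rng = np.random.default_rng(2)
W = rng.normal(size=(3, 3))                 # full (redundant) weight matrix
W_tri = np.triu(W) + np.triu(W.T, k=1)      # fold the lower part into the upper part

x = rng.normal(size=3)
assert np.isclose(x @ W @ x, x @ W_tri @ x)  # identical quadratic form
```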
However, the following equation and more complicated equations used in neural network applications cannot be expressed using classical matrices. Here the variables \(x_i\), \(x_j\), and \(x_k\) are inputs and the \(w_{ijk}\) are their weights:
\[ \sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{121} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{211} x_1^2 x_2 + w_{212} x_1 x_2^2 + w_{221} x_1 x_2^2 + w_{222} x_2^3 \]
\[ = w_{111} x_1^3 + x_1^2 x_2 (w_{112} + w_{121} + w_{211}) + x_1 x_2^2 (w_{122} + w_{212} + w_{221}) + w_{222} x_2^3 \]
This weighted summation can alternatively be represented using multidimensional matrices (Solo, 2010) multiplied together. Premultiply the 2 × 2 × 2 weight matrix by a 1 × 2 × 2 input matrix in the first and second dimensions. Then postmultiply the 2 × 2 × 2 weight matrix by a 2 × 1 × 2 input matrix in the first and second dimensions. Finally, premultiply this entire product by a 1 × 2 input matrix in the first and second dimensions. Because the first and second dimensions of these multidimensional matrices are being multiplied throughout, this need not be indicated in the equations below; the two slices along the third dimension are written side by side, separated by a vertical bar:
\[ \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} * \left[ \begin{matrix} x_1 & x_2 \end{matrix} \,\middle|\, \begin{matrix} x_1 & x_2 \end{matrix} \right] * \left[ \begin{matrix} w_{111} & w_{121} \\ w_{211} & w_{221} \end{matrix} \,\middle|\, \begin{matrix} w_{112} & w_{122} \\ w_{212} & w_{222} \end{matrix} \right] * \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \,\middle|\, \begin{matrix} x_1 \\ x_2 \end{matrix} \right] \]
The multidimensional matrix product (Solo, 2010) of the first and second dimensions of the 1 × 2 × 2 input matrix and the 2 × 2 × 2 weight matrix results in a 1 × 2 × 2 matrix:
\[ \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} * \left[ \begin{matrix} w_{111} x_1 + w_{211} x_2 & w_{121} x_1 + w_{221} x_2 \end{matrix} \,\middle|\, \begin{matrix} w_{112} x_1 + w_{212} x_2 & w_{122} x_1 + w_{222} x_2 \end{matrix} \right] * \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \,\middle|\, \begin{matrix} x_1 \\ x_2 \end{matrix} \right] \]
The multidimensional matrix product of the first and second dimensions of the 1 × 2 × 2 matrix and the 2 × 1 × 2 input matrix results in a 1 × 1 × 2 matrix:
\[ \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} * \left[ \, w_{111} x_1^2 + (w_{211} + w_{121})\, x_1 x_2 + w_{221} x_2^2 \,\middle|\, w_{112} x_1^2 + (w_{212} + w_{122})\, x_1 x_2 + w_{222} x_2^2 \, \right] \]
The 1 × 1 × 2 matrix can be simplified into a one-dimensional matrix with two elements, so it can be premultiplied by the 1 × 2 input matrix in the first and second dimensions:
\[ \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{2} w_{ijk}\, x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} w_{111} x_1^2 + (w_{211} + w_{121})\, x_1 x_2 + w_{221} x_2^2 \\ w_{112} x_1^2 + (w_{212} + w_{122})\, x_1 x_2 + w_{222} x_2^2 \end{bmatrix} \]
\[ = w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{121} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{211} x_1^2 x_2 + w_{212} x_1 x_2^2 + w_{221} x_1 x_2^2 + w_{222} x_2^3 \]
\[ = w_{111} x_1^3 + x_1^2 x_2 (w_{112} + w_{121} + w_{211}) + x_1 x_2^2 (w_{122} + w_{212} + w_{221}) + w_{222} x_2^3 \]
Thus, this multidimensional matrix multiplication yields the same result as the summation of third-order terms above.
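In an array library the same multidimensional contraction can be written in one call; a sketch using NumPy's `einsum` (a modern shorthand, not part of the original formulation):

```python
import numpy as np

# The third-order weighted summation is a full contraction of a 2x2x2
# weight array against three copies of the input vector.
rng = np.random.default_rng(3)
w = rng.normal(size=(2, 2, 2))
x1, x2 = rng.normal(size=2)
x = np.array([x1, x2])

s_einsum = np.einsum('ijk,i,j,k->', w, x, x, x)
s_expanded = (w[0,0,0]*x1**3 + (w[0,0,1] + w[0,1,0] + w[1,0,0])*x1**2*x2
              + (w[0,1,1] + w[1,0,1] + w[1,1,0])*x1*x2**2 + w[1,1,1]*x2**3)
assert np.isclose(s_einsum, s_expanded)   # matches the folded polynomial
```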
4.2. Modified Polynomial Neural Networks
4.2.1. Sigma-Pi Neural Networks
Note that an HONU contains all the linear and nonlinear correlation terms of the input components up to the order N. A slightly generalized structure of the HONU is a polynomial network that includes weighted sums of products of selected input components raised to appropriate powers. Mathematically, the input-output transfer function of this network structure is given by
\[ u_i = \prod_{j=1}^{n} \big[\varphi(x_j)\big]^{w_{ij}} \qquad (26) \]
\[ y = \sum_{i=1}^{N} w_i u_i \qquad (27) \]
where \(w_i, w_{ij} \in \mathbb{R}\), \(N\) is the order of the network, and \(u_i\) is the output of the \(i\)th hidden unit. This type of feedforward network is called a sigma-pi network (Rumelhart et al., 1986). It is easy to show that this network satisfies the Stone-Weierstrass theorem if \(\varphi(x)\) is a linear function. Moreover, a modified version of the sigma-pi network, as proposed by Hornik et al. (1989) and Cotter (1990), is
\[ u_i = \prod_{j=1}^{n} \big[p(x_j)\big]^{w_{ij}} \qquad (28) \]
\[ y = \sum_{i=1}^{N} w_i u_i \qquad (29) \]
where \(w_i, w_{ij} \in \mathbb{R}\) and \(p(x_j)\) is a polynomial of \(x_j\). It is easy to verify that this network satisfies the Stone-Weierstrass theorem, and thus it can serve as an approximator for functional approximation problems. The sigma-pi network defined in eqns. (26) and (27) is a special case of the above network in which \(p(x_j)\) is assumed to be a linear function of \(x_j\). In fact, the weights \(w_{ij}\) in both networks, eqns. (26) and (28), may be restricted to integer or nonnegative integer values.
4.2.2. Ridge Polynomial Neural Networks
To obtain fast learning and powerful mapping capabilities, and to avoid the combinatorial increase in the number of weights of HONNs, some modified polynomial network structures have been introduced. One of these is the pi-sigma network (Shin and Ghosh, 1991), which is a regular higher order structure involving far fewer weights than sigma-pi networks. The mapping equations of a pi-sigma network can be represented as
\[ u_i = \sum_{j=1}^{n} w_{ij} x_j \qquad (30) \]
\[ y = \varphi\Big( \prod_{i=1}^{N} u_i \Big) = \varphi\Big( \prod_{i=1}^{N} \sum_{j=1}^{n} w_{ij} x_j \Big) \qquad (31) \]
The total number of weights for an Nth-order pi-sigma network with n inputs is only \((n+1)N\). Compared with the sigma-pi network structure, the number of weights involved in this network is significantly reduced. Unfortunately, when \(\varphi(x) = x\), the pi-sigma network does not match the conditions of the Stone-Weierstrass theorem because the linear subspace condition is not satisfied (Gupta et al., 2003). However, some studies have shown that it is a good network model for smooth functions (Shin and Ghosh, 1991).
To modify the structure of the above pi-sigma networks so that they satisfy the Stone-Weierstrass theorem, Shin and Ghosh (1991) suggested the ridge polynomial neural network (RPNN). For the vectors \(\mathbf{w}_{ij} = [w_{ij1}, w_{ij2}, \ldots, w_{ijn}]^T\) and \(\mathbf{x} = [x_1, x_2, \ldots, x_n]^T\), let
\[ \langle \mathbf{x}, \mathbf{w}_{ij} \rangle = \sum_{k=1}^{n} w_{ijk}\, x_k \]
which represents the inner product of the two vectors. A one-variable continuous function \(f\) of the form \(f(\langle \mathbf{x}, \mathbf{w}_{ij} \rangle)\) is called a ridge function. A ridge polynomial is a ridge function that can be represented as
\[ \sum_{i=0}^{N} \sum_{j=0}^{M_i} a_{ij}\, \langle \mathbf{x}, \mathbf{w}_{ij} \rangle^{i} \]
for some \(a_{ij} \in \mathbb{R}\) and \(\mathbf{w}_{ij} \in \mathbb{R}^n\). The operation equation of an RPNN is expressed as
\[ y = \varphi\Big( \sum_{i=1}^{N} \prod_{j=1}^{i} \langle \mathbf{x}, \mathbf{w}_{ij} \rangle \Big) \]
where \(\varphi(x) = x\). The denseness of this network, a fundamental requirement for the universal function approximators described by the Stone-Weierstrass theorem, can be verified (Gupta et al., 2003).
The total number of weights involved in this structure is \(N(N+1)(n+1)/2\). A comparison of the number of weights of the three types of polynomial network structures is given in Table VII. The results show that when the networks contain the same higher-order terms, an RPNN has significantly fewer weights than a sigma-pi network. This is a very attractive improvement offered by RPNNs.
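The closed-form counts can be checked against Table VII. The pi-sigma and RPNN formulas appear in the text; the sigma-pi column is assumed here to equal the number of monomials of degree at most N in n variables, C(n+N, N), which reproduces the tabulated values:

```python
from math import comb

# Weight counts of the three polynomial architectures.
def pi_sigma_weights(n, N):
    return (n + 1) * N                  # (n+1)N, from the text

def rpnn_weights(n, N):
    return N * (N + 1) * (n + 1) // 2   # N(N+1)(n+1)/2, from the text

def sigma_pi_weights(n, N):
    return comb(n + N, N)               # assumed: monomials of degree <= N

# Reproduce the Table VII rows for n = 5 and n = 10.
assert (pi_sigma_weights(5, 2), rpnn_weights(5, 2), sigma_pi_weights(5, 2)) == (12, 18, 21)
assert (pi_sigma_weights(5, 3), rpnn_weights(5, 3), sigma_pi_weights(5, 3)) == (18, 36, 56)
assert (pi_sigma_weights(10, 4), rpnn_weights(10, 4), sigma_pi_weights(10, 4)) == (44, 110, 1001)
```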
Table VII. The number of weights in the polynomial networks.

  Order of network N | Pi-sigma (n=5) | Pi-sigma (n=10) | RPNN (n=5) | RPNN (n=10) | Sigma-pi (n=5) | Sigma-pi (n=10)
  2                  | 12             | 22              | 18         | 33          | 21             | 66
  3                  | 18             | 33              | 36         | 66          | 56             | 286
  4                  | 24             | 44              | 60         | 110         | 126            | 1001

5. Engineering Applications
Function approximation problems are typical examples in many computer science and engineering situations. The capability to approximate nonlinear complex functions can also serve as a basis for complex pattern classification. Furthermore, a neural network approach with high approximation ability can be used for time series analysis by introducing time delay features into the neural network structure. Time series analysis or estimation is one of the most important problems in computer science and engineering applications. In this section, we first explain the function approximation ability of HONNs. Neural network structures with time delay features will then be introduced for time series analysis.
5.1. Function Approximation Problem
For evaluating the function approximation ability of HONNs, an example was taken from Klassen et al. (1988). The task consists of learning a representation for an unknown, one-variable nonlinear function, \(F(x)\), with the only available information being the 18 sample patterns (Villalobos and Merat, 1995).
For this function approximation problem, a two-layered neural network structure was composed of two SONUs in the first layer and a single SONU in the output layer (Figure 9). The nonlinear activation function of the SONUs in the first layer was the bipolar sigmoid \(\varphi(u) = (1 - e^{-u})/(1 + e^{-u})\), but for the single output SONU the linear function \(y = \varphi(u) = u\) was used instead of the sigmoidal function. The gradient learning algorithm with learning rate \(\eta = 0.1\) was used for this problem.
Figure 9. A two-layered neural network structure with two SONUs in the first layer
and a single SONU in the output layer for the function approximation problem.
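A forward-pass sketch of this two-layered SONN (untrained, illustrative weights; the quadratic synaptic operation acts on the augmented input):

```python
import numpy as np

# Sketch of the Figure 9 network: two hidden SONUs with the bipolar
# sigmoid and one linear output SONU; upper-triangular weight matrices
# over the augmented input [1, x] (random illustrative values).
def bipolar_sigmoid(u):
    return (1.0 - np.exp(-u)) / (1.0 + np.exp(-u))

def sonu(x_aug, W):
    return x_aug @ W @ x_aug                 # quadratic synaptic operation

def sonn_forward(x, W1, W2, W_out):
    xa = np.array([1.0, x])                              # augmented scalar input
    h = bipolar_sigmoid(np.array([sonu(xa, W1), sonu(xa, W2)]))
    ha = np.concatenate(([1.0], h))                      # augment hidden outputs
    return sonu(ha, W_out)                               # linear output SONU

rng = np.random.default_rng(4)
W1, W2 = np.triu(rng.normal(size=(2, 2))), np.triu(rng.normal(size=(2, 2)))
W_out = np.triu(rng.normal(size=(3, 3)))
y = sonn_forward(0.3, W1, W2, W_out)
```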
The mapping function obtained by the SONU network after \(10^7\) learning iterations appears in Figure 10. In this case, the average square error taken over the 18 patterns was 4.566E-6. The extremely high approximation accuracy shown in Figure 10 is evidence of the high approximation ability of the SONN.
Figure 10. Training pairs and outputs estimated by the network with SONUs for the
Klassen's function approximation problem (Klassen et al., 1988).
Five particular trigonometric functions, \(\sin(\pi x)\), \(\cos(\pi x)\), \(\sin(2\pi x)\), \(\cos(2\pi x)\), and \(\sin(4\pi x)\), were used as special features of the extra neural inputs (Klassen et al., 1988). Also, it has been reported (Villalobos and Merat, 1995) that the term \(\cos(\pi x)\) is not necessary for achieving accuracy within the error tolerance 1.125E-4, but four extra features were still required.
On the other hand, in this study, the high approximation accuracy of the proposed
SONU network was achieved by only two SONUs with the sigmoidal activation function
in the first layer and a single SONU with the linear activation function in the output
layer, and no special features were required for high accuracy. These are remarkable
advantages of the proposed SONN structure.
To highlight the superiority of HONNs over simple first-order neural networks in capturing nonlinear correlations among multiple inputs, we show another example of function approximation. For simplicity, and to further emphasize the strength of the HONN concept, we demonstrate the example using a single higher-order neural unit of various orders N = 2, 3, 4, 5.
We consider the multiple-input static function
\[ f(x, y, z) = \frac{x^2 y + x y z}{x^2 + y^2 + z^2 + 0.1} \qquad (32) \]
where \(x\), \(y\), and \(z\) are normally distributed random variables (stdev = 1) that represent the input pattern data, and \(f(\cdot)\) represents the target data. The length of the training data was 300. For training both the MLP and the HONU, a basic version of the Levenberg-Marquardt algorithm was implemented with a learning rate that decreases when the training performance, the sum of square errors (SSE), stops decreasing in two consecutive training epochs.
Figure 11. The upper plot shows the training performance of a static MLP neural network with 10 sigmoidal neurons in a hidden layer and a linear output neuron; the MLP needs many epochs. The bottom plot shows that the training performance of the HONU improves with increasing order N; HONUs are trained in very few epochs with the same Levenberg-Marquardt algorithm.
Figure 12. Testing of the trained MLP network and HONU from Figure 11 on different data. The upper plot shows testing of the static MLP network from the upper part of Figure 11. The bottom plot shows testing of the best trained HONU (N = 5). The mean absolute error of the HONU is better than that of the MLP.
Figure 13. Simulation run from different initial weights than in Figure 11. Again, the upper plot shows the training performance of a static MLP; this time the MLP gets stuck in a local minimum, as it typically does. The bottom plot shows a very similar training performance of the HONU for different initial weights and the same training data. This is because a pure HONU (a polynomial neural unit) is linear in its parameters, yet it performs a strong nonlinear mapping.
Figure 14. Testing of the trained MLP network and HONU from Figure 13 on different data. The upper plot shows testing of the static MLP network from the upper part of Figure 13. The bottom plot shows testing of the best trained HONU (N = 5). The HONU is more often precise than the MLP; however, its MAE is worse this time because three outliers of the HONU become very imprecise. This may occasionally happen with pure HONUs without an output sigmoid function, and it relates to a lack of training data.
6. Concluding Remarks and Future Research Directions
In this chapter, the basic foundation of neural networks, starting from a basic
introduction to biological foundations, neural unit models, and learning properties, has
been introduced. Then as the first step to understanding HONNs, a general SONU
was developed. Simulation studies for both the pattern classification and function
approximation problems demonstrated that the learning and generalization abilities of the proposed SONU, and of neural networks built from SONUs, are greatly superior to those of the widely used linear combination neural units and their networks. Indeed, from the point of view of both the neural computing process and its learning algorithm, it has been found that the linear combination neural units widely used in multilayered neural networks are only a subset of the proposed SONUs. Some extensions of these concepts to radial basis function (RBF) networks, fuzzy neural networks, and dynamic neural units will be interesting future research projects.
To further strengthen the readers’ interest in HONUs and HONNs, it should be
mentioned that HONUs are powerful nonlinear approximators that are linear in their
parameters. That is, if we look at the fundamental HONU representations, such as
eqn. (21) in this chapter, we clearly see that when the input variables are substituted with training data, the weight optimization of many fundamental HONN architectures becomes a linear optimization problem that is uniquely solvable by the Levenberg-Marquardt algorithm or even by the least squares method. We believe that HONNs represent a great opportunity for many researchers, as the need for more advanced optimization methods is not so urgent for the many HONUs that are basic polynomials, yet nonlinearly powerful architectures. Therefore, rather than searching for complicated optimization techniques, neural network researchers and practitioners may spend more effort on proper data selection and signal processing, which play a crucial role in the performance of neural networks, including HONNs.
There is certainly rapidly growing research interest in the field of HONNs. There
are increasing complexities in applications not only in the fields of aerospace, process
control, ocean exploration, manufacturing, and resource based industry, but also in
computer science and engineering. This chapter deals with the theoretical foundations
of HONNs and will help readers to develop or apply the methods to their own modeling
and simulation problems. Most of the book deals with real modeling and simulation
applications.
We hope that our efforts in this chapter will stimulate research interests, provide
some new challenges to its readers, generate curiosity for learning more in the field, and
arouse a desire to seek new theoretical tools and applications. We will consider our
efforts successful if this chapter raises one’s level of curiosity.
7. Acknowledgements
Dr. Madan M. Gupta wishes to acknowledge the support from the Natural Sciences and
Engineering Research Council of Canada through the Discovery Grant. Dr. Ivo
Bukovsky’s research is supported by grants SGS12/177/OHK2/3T/12 and
SGS10/252/OHK2/3T/12. Dr. Zeng-Guang Hou’s research is partially supported by the
National Natural Science Foundation of China (Grant 61175076).
8. References
Bukovsky, I., Bila, J., Gupta, M. M., Hou, Z.-G., & Homma, N. (2010a). Foundation and Classification of Nonconventional Neural Units and Paradigm of Nonsynaptic Neural Interaction. In Y. Wang (Ed.), Discoveries and Breakthroughs in Cognitive Informatics and Natural Intelligence, Advances in Cognitive Informatics and Natural Intelligence (ACINI) (pp. 508-523). IGI Publishing, USA.
Bukovsky, I., Homma, N., Smetana, L., Rodriguez, R., Mironovova M., Vrana S.,
(2010b): “Quadratic Neural Unit is a Good Compromise between Linear Models and
Neural Networks for Industrial Applications”, ICCI 2010 The 9th IEEE International
Conference on Cognitive Informatics, Beijing, China.
Bukovsky, I., Bila, J., & Gupta, M. M. (2005). Linear Dynamic Neural Units with Time Delay for Identification and Control (in Czech). Automatizace, 48(10), 628-635. Prague, Czech Republic. ISSN 0005-125X.
Bukovsky, I., & Simeunovic, G. (2006). Dynamic-Order-Extended Time-Delay Dynamic
Neural Units. 8th Seminar on Neural Network Applications in Electrical Engineering
NEUREL-2006, IEEE (SCG) CAS-SP. Belgrade. ISBN 1-4244-0432-0
Bukovsky, I., Bila, J., & Gupta, M. M. (2006). Stable Neural Architecture of Dynamic
Neural Units with Adaptive Time Delays. In 7th International FLINS Conference on
Applied Artificial Intelligence. ISBN 981-256-690-2. pp. 215-222.
Cichocki, A., & Unbehauen, R. (1993). Neural Networks for Optimization and Signal Processing. Chichester: Wiley.
Cotter, N. (1990). The Stone-Weierstrass Theorem and Its Application to Neural
Networks. IEEE Trans. Neural Networks, 1(4), 290-295.
Giles, C. L., & Maxwell, T. (1987). Learning invariance, and generalization in
higher-order networks. Appl. Optics, 26, 4972-4978.
Gupta, M. M., Jin, L., & Homma, N. (2003). Static and Dynamic Neural Networks:
From Fundamentals to Advanced Theory. Hoboken, NJ: IEEE & Wiley.
Harston, C. T. (1990). The Neurological Basis for Neural Computation. In Maren, A.
J., Harston, C. T., & Pap, R. M. (Eds.), Handbook of Neural Computing Applications, Vol.
1. (pp. 29-44). New York: Academic.
Homma, N., & Gupta, M. M. (2002). A general second order neural unit. Bull. Coll.
Med. Sci., Tohoku Univ., 11(1), 1-6.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer Feedforward Networks
Are Universal Approximators. Neural Networks, 2(5), 359-366.
Klassen, M., Pao, Y., & Chen, V. (1988). Characteristics of the functional link net: a
higher order delta rule net. Proc. of IEEE 2nd Annual Int'l. Conf. Neural Networks.
Kobayashi, S. (2006). Sensation World Made by the Brain – Animals Do Not Have
Sensors. Tokyo: Corona (in Japanese).
Matsuba, I. (2000). Nonlinear time series analysis. Tokyo: Asakura-syoten (in
Japanese).
McCulloch, W. S., & Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys., 5, 115-133.
Narendra, K., & Parthasarathy, K. (1990). Identification and control of dynamical
systems using neural networks. IEEE Trans. Neural Networks, 1, 4-27.
Pao, Y. H. (1989). Adaptive Pattern Recognition and Neural Networks. Reading, MA: Addison-Wesley.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Internal
Representations by Error Propagation. In Rumelhart, D. E. and McClelland, J. L.
(Eds.), Parallel Distributed Processing: Explorations in the Microstructure of
Cognition, Vol. 1 (pp. 318-362). Cambridge, MA: MIT Press.
Schmidt, W., & Davis, J. (1993). Pattern recognition properties of various feature
spaces for higher order neural networks. IEEE Trans. Pattern Analysis and Machine
Intelligence, 15, 795-801.
Shin, Y., & Ghosh, J. (1991). The Pi-sigma Network: An Efficient Higher-order
Neural Network for Pattern Classification and Function Approximation. Proc. Int.
Joint Conf. on Neural Networks (pp. 13-18).
Sinha, N., Gupta, M. M., & Zadeh, L. (1999). Soft Computing and Intelligent Control
Systems: Theory and Applications. New York: Academic.
Softky, R. W., & Kammen, D. M. (1991). Correlations in high dimensional or
asymmetrical data sets: Hebbian neuronal processing. Neural Networks, 4, 337-347.
Taylor, J. G., & Coombes, S. (1993). Learning higher order correlations. Neural
Networks, 6, 423-428.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 1 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 353-359. CSREA Press.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 2 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 360-366. CSREA Press.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 3 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 367-372. CSREA Press.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 4 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 373-378. CSREA Press.
Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional
Matrix Calculus: Part 5 of 5. Proceedings of the 2010 International Conference on
Scientific Computing (CSC'10), 379-381. CSREA Press.
Villalobos, L., & Merat, F. (1995). Learning capability assessment and feature space
optimization for higher-order neural networks. IEEE Trans. Neural Networks, 6,
267-272.
Werbos, P. J. (1990). Backpropagation through time: What it is and how to do it.
Proc. IEEE, 78(10), 1550-1560.
Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running
fully recurrent neural networks. Neural Computation, 1, 270-280.
Xu, L., Oja, E., & Suen, C. Y. (1992). Modified Hebbian learning for curve and surface
fitting. Neural Networks, 5, 441-457.
9. Additional Reading
[A] Biological Motivation on Neural Networks
[A.1] Ding, M.-Z., & Yang, W.-M. (1997). Stability of Synchronous Chaos and On-Off Intermittency in Coupled Map Lattices. Phys. Rev. E, 56(4), 4009-4016.
[A.2] Durbin, R. (1989). On the Correspondence Between Network Models and the Nervous System. In R. Durbin, C. Miall, & G. Mitchison (Eds.), The Computing Neurons. Reading, MA: Addison-Wesley.
[A.3] Engel, K., Konig, P., Kreiter, A. K., & Singer, W. (1991). Interhemispheric Synchronization of Oscillatory Neuronal Responses in Cat Visual Cortex. Science, 252, 1177-1178.
[A.4] Ersu, E., & Tolle, H. (1984). A New Concept for Learning Control Inspired by Brain Theory. Proc. 9th World Congress IFAC (pp. 245-250).
[A.5] Forbus, K. D., & Gentner, D. (1983). Causal Reasoning About Quantities. Proc. 5th Annual Conf. of the Cognitive Science Society (pp. 196-206).
[A.6] Fujita, M. (1982). Adaptive Filter Model of the Cerebellum. Biological Cybernetics, 45, 195-206.
[A.7] Garliaskas, A., & Gupta, M. M. (1995). A Generalized Model of Synapse-Dendrite-Cell Body as a Complex Neuron. World Congress on Neural Networks, Vol. 1 (pp. 304-307).
[A.8] Gupta, M. M. (1988). Biological Basis for Computer Vision: Some Perspective. SPW Conf. on Intelligent Robots and Computer Vision (pp. 811-823).
[A.9] Gupta, M. M., & Knopf, G. K. (1992). A Multitask Visual Information Processor with a Biologically Motivated Design. J. Visual Communicat., Image Representation, 3(3), 230-246.
[A.10] Hiramoto, M., Hiromi, Y., Giniger, E., & Hotta, Y. (2000). The Drosophila Netrin Receptor Frazzled Guides Axons by Controlling Netrin Distribution. Nature, 406(6798), 886-888.
[A.11] Honma, N., Abe, K., Sato, M., & Takeda, H. (1998). Adaptive Evolution of Holon Networks by an Autonomous Decentralized Method. Applied Mathematics and Computation, 9(1), 43-61.
[A.12] Kaneko, K. (1994). Relevance of Dynamic Clustering to Biological Networks. Phys. D, 75, 55-73.
[A.13] Kohara, K., Kitamura, A., Morishima, M., & Tsumoto, T. (2001). Activity-Dependent Transfer of Brain-Derived Neurotrophic Factor to Postsynaptic Neurons. Science, 291, 2419-2423.
[A.14] LeCun, Y., Boser, B., & Solla, S. A. (1990). Optimal Brain Damage. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems, Vol. 2 (pp. 598-605), Morgan Kaufmann.
[A.15] Lovejoy, C. O. (1981). The Origin of Man. Science, 211, 341-350.
[A.16] Maire, M. (2000). On the Convergence of Validity Interval Analysis. IEEE Trans. on Neural Networks, 11(3), 799-801.
[A.17] Mantere, K., Parkkinen, J., Jaasketainen, T., & Gupta, M. M. (1993). Wilson-Cowan Neural Network Model in Image Processing. J. of Mathematical Imaging and Vision, 2, 251-259.
[A.18] McCarthy, J., & Hayes, P. J. (1969). Some Philosophical Problems from the Standpoint of Artificial Intelligence. In Meltzer & Michie (Eds.), Machine Intelligence, 4 (pp. 463-502). Edinburgh: Edinburgh Univ. Press.
[A.19] McCulloch, W. S., & Pitts, W. H. (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5, 115-133.
[A.20] McDermott, D. (1982). A Temporal Logic for Reasoning About Processes and Plans. Cognitive Science, 6, 101-155.
[A.21] Melkonian, D. S. (1990). Mathematical Theory of Chemical Synaptic Transmission. Biological Cybernetics, 62, 539-548.
[A.22] Pecht, O. Y., & Gur, M. (1995). A Biologically-Inspired Improved MAXNET. IEEE Trans. Neural Networks, 6, 757-759.
[A.23] Petshe, T., & Dickinson, B. W. (1990). Trellis Codes, Receptive Fields, and Fault-Tolerant Self-Repairing Neural Networks. IEEE Trans. Neural Networks, 1(2), 154-166.
[A.24] Poggio, T., & Koch, C. (1987). Synapses that Compute Motion. Scientific American, May, pp. 46-52.
[A.25] Rao, D. H., & Gupta, M. M. (1993). A Generic Neural Model Based on Excitatory-Inhibitory Neural Population. IJCNN-93 (pp. 1393-1396).
[A.26] Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65, 386-408.
[A.27] Skarda, C. A., & Freeman, W. J. (1987). How Brains Make Chaos in Order to Make Sense of the World. Behavioral and Brain Sciences, 10, 161-195.
[A.28] Stevens, C. F. (1968). Synaptic Physiology. Proc. IEEE, 79(9), 916-930.
[A.29] Wilson, H. R., & Cowan, J. D. (1972). Excitatory and Inhibitory Interactions in Localized Populations of Model Neurons. Biophysical J., 12, 1-24.
[B] Neuronal Morphology: Concepts and Mathematical Models
[B.1] Amari, S. (1971). Characteristics of Randomly Connected Threshold-Element Networks and Network Systems. Proc. IEEE, 59(1), 35-47.
[B.2] Amari, S. (1972). Characteristics of Random Nets of Analog Neuron-Like Elements. IEEE Trans. Systems, Man and Cybernetics, 2, 643-654.
[B.3] Amari, S. (1972). Learning Patterns and Pattern Sequences by Self-Organizing Nets of Threshold Elements. IEEE Trans. on Computers, 21, 1197-1206.
[B.4] Amari, S. (1977). A Mathematical Approach to Neural Systems. In J. Metzler (Ed.), Systems Neuroscience (pp. 67-118). New York: Academic.
[B.5] Amari, S. (1977). Neural Theory of Association and Concept Formation. Biological Cybernetics, 26, 175-185.
[B.6] Amari, S. (1990). Mathematical Foundations of Neurocomputing. Proc. IEEE, 78(9), 1443-1462.
[B.7] Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Spin-Glass Model of Neural Networks. Physical Review A, 32, 1007-1018.
[B.8] Anagun, A. S., & Cin, I. (1998). A Neural-Network-Based Computer Access Security System for Multiple Users. Proc. 23rd Inter. Conf. Comput. Ind. Eng., Vol. 35 (pp. 351-354).
[B.9] Anderson, J. A. (1983). Cognition and Psychological Computation with Neural Models. IEEE Trans. System, Man and Cybernetics, 13, 799-815.
[B.10] Anninos, P. A., Beek, B., Csermel, T. J., Harth, E. E., & Pertile, G. (1970). Dynamics of Neural Structures. J. of Theoretical Biology, 26, 121-148.
[B.11] Aoki, C., & Siekevitz, P. (1988). Plasticity in Brain Development. Scientific American, Dec., 56-64.
[B.12] Churchland, P. S., & Sejnowski, T. J. (1988). Perspectives on Cognitive Neuroscience. Science, 242, 741-745.
[B.13] Holmes C. C., & Mallick, B. K. (1998). Bayesian Radial Basis Functions of Variable Dimension. Neural Computations, 10(5), 1217-1233.
[B.14] Hopfield, J. (1990). Artificial Neural Networks are Coming. An Interview by W. Myers, IEEE Expert, Apr., 3-6.
[B.15] Joshi, A., Ramakrishnan, N., Houstis, E. N., & Rice, J. R. (1997). On Neurobiological, Neurofuzzy, Machine Learning, and Statistical Pattern Recognition Techniques. IEEE Trans. Neural Networks, 8.
[B.16] Kaneko, K. (1994). Relevance of Dynamic Clustering to Biological Networks. Phys. D, 75, 55-73.
[B.17] Kaneko, K. (1997). Coupled Maps with Growth and Death: An Approach to Cell Differentiation. Phys. D, 103, 505-527.
[B.18] Knopf, G. K., & Gupta, M. M. (1993). Dynamics of Antagonistic Neural Processing Elements. Inter. J. of Neural Systems, 4(3), 291-303.
[B.19] Kohonen, T. (1988). An Introduction to Neural Computing. Neural Networks, 1(1), 3-16.
[B.20] Kohonen, T. (1990). The Self-Organizing Map. Proc. of the IEEE, 78(9), 1464-1480.
[B.21] Kohonen, T. (1991). Self-Organizing Maps: Optimization Approaches. In T. Kohonen, K. Makisara, O. Simula, & J. Kangas (Eds.), Artificial Neural Networks (pp. 981-990). Amsterdam: Elsevier.
[B.22] Kohonen, T. (1993). Things You Haven't Heard About The Self-Organizing Map. Proc. Inter. Conf. Neural Networks 1993 (pp. 1147-1156).
[B.23] Kohonen, T. (1998). Self Organization of Very Large Document Collections: State of the Art. Proc. 8th Inter. Conf. Artificial Neural Networks, Vol. 1 (pp. 65-74).
[B.24] LeCun, Y., Boser, B., & Solla, S. A. (1990). Optimal Brain Damage. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems, Vol. 2 (pp. 598-605). Morgan Kaufmann.
[B.25] Lippmann, R. P. (1987). An Introduction to Computing with Neural Networks. IEEE Acoustics, Speech and Signal Processing Magazine, 4(2), 4-22.
[B.26] Mantere, K., Parkkinen, J., Jaasketainen, T., & Gupta, M. M. (1993). Wilson-Cowan Neural Network Model in Image Processing. J. of Mathematical Imaging and Vision, 2, 251-259.
[B.27] McCarthy, J., & Hayes, P. J. (1969). Some Philosophical Problems from the Standpoint of Artificial Intelligence. In Meltzer & Michie (Eds.), Machine Intelligence, 4 (pp. 463-502). Edinburgh: Edinburgh Univ.
[B.28] McCulloch, W. S., & Pitts, W. H. (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5, 115-133.
[B.29] McDermott, D. (1982). A Temporal Logic for Reasoning About Processes and Plans. Cognitive Science, 6, 101-155.
[B.30] Melkonian, D. S. (1990). Mathematical Theory of Chemical Synaptic Transmission. Biological Cybernetics, 62, 539-548.
[B.31] Poggio, T., & Koch, C. (1987). Synapses that Compute Motion. Scientific American, May, 46-52.
[B.32] Sandewall, E. (1989). Combining Logic and Differential Equations for Describing Real-World Systems. Proc. 1st Inter. Conf. on Principles of Knowledge Representation and Reasoning (pp. 412-420). Morgan Kaufmann.
[B.33] Setiono, R., & Liu, H. (1996). Symbolic Representation of Neural Networks. Computer, 29(3), 71-77.
[B.34] Wilson, H. R., & Cowan, J. D. (1972). Excitatory and Inhibitory Interactions in Localized Populations of Model Neurons. Biophysical J., 12, 1-24.