
  • Fundamentals of Higher Order Neural Networks

    for Modeling and Simulation

    Madan M. Gupta1,

    Ivo Bukovsky2, Noriyasu Homma3, Ashu M. G. Solo4, and Zeng-Guang Hou5

Summary: In this chapter, we provide the fundamental principles of higher order neural units (HONUs) and higher order neural networks (HONNs) for modeling and simulation. The essential core of HONNs lies in the higher order weighted combinations, or correlations, between the input variables of an HONU. Besides the high quality of nonlinear approximation achieved by static HONUs, the capability of dynamic HONUs for modeling dynamic systems is shown and compared to conventional recurrent neural networks when a practical learning algorithm is used. The potential of continuous dynamic HONUs to approximate systems of high dynamic order is also discussed, since adaptable time delays can be implemented. Using some typical examples, this chapter describes how and why higher order combinations or correlations can be effective for modeling systems.

    Keywords: higher order neural networks, higher order neural units, second order

    neural networks, second order neural units, sigma-pi networks, pi-sigma networks,

    ridge polynomial neural networks, tapped delay line neural networks

    1. Introduction

    The human brain has more than 10 billion neurons, which have complicated

    interconnections, and these neurons constitute a large-scale signal processing and

    memory network. The mathematical study of a single neural model and its various

    extensions is the first step in the design of a complex neural network for solving a

    variety of problems in the fields of signal processing, pattern recognition, control of

    complex processes, neurovision systems, and other decision making processes. Neural

    network solutions for these problems can be directly used for computer science and

    engineering applications.

1 University of Saskatchewan, Saskatoon, SK, Canada, [email protected]
2 Czech Technical University in Prague, Prague, Czech Republic, [email protected]
3 Tohoku University, Sendai, Japan, [email protected]
4 Maverick Technologies America Inc., Wilmington, DE, USA, [email protected]
5 The Chinese Academy of Sciences, Beijing, China, [email protected]

A simple neural model is presented in Figure 1. In terms of information processing,

    an individual neuron with dendrites as multiple-input terminals and an axon as a

    single-output terminal may be considered a multiple-input/single-output (MISO)

    system. The processing functions of this MISO neural processor may be divided into

    the following four categories:

    (i) Dendrites: They consist of a highly branching tree of fibers and act as input

points to the main body of the neuron. On average, there are 10^3 to 10^4

    dendrites per neuron, which form receptive surfaces for input signals to the

    neurons.

    (ii) Synapse: It is a storage area of past experience (knowledge base). It

provides long-term memory (LTM) for the past accumulated experience. It

    receives information from sensors and other neurons and provides outputs

    through the axons.

    (iii) Soma: The neural cell body is called the soma. It is the large, round central

    neuronal body. It receives synaptic information and performs further

    processing of the information. Almost all logical functions of the neuron are

    carried out in the soma.

    (iv) Axon: The neural output line is called the axon. The output appears in the

    form of an action potential that is transmitted to other neurons for further

    processing.

    The electrochemical activities at the synaptic junctions of neurons exhibit a complex

    behavior because each neuron makes hundreds of interconnections with other neurons.

    Each neuron acts as a parallel processor because it receives action potentials in parallel

    from the neighboring neurons and then transmits pulses in parallel to other

    neighboring synapses. In terms of information processing, the synapse also performs a

    crude pulse frequency-to-voltage conversion as shown in Figure 1.

Figure 1. A simple neural model as a multiple-input (dendrites) and single-output

    (axon) processor.

    1.1. Neural mathematical operations

    In general, it can be argued that the role played by neurons in the brain reasoning

    processes is analogous to the role played by a logical switching element in a digital

    computer. However, this analogy is too simple. A neuron contains a sensitivity

    threshold, adjustable signal amplification or attenuation at each synapse and an

    internal structure that allows incoming nerve signals to be integrated over both space

    and time. From a mathematical point of view, it may be concluded that the processing

    of information within a neuron involves the following two distinct mathematical

    operations:

    (i) Synaptic operation: The strength (weight) of the synapse is a representation

    of the storage of knowledge and thus the memory for previous knowledge.

    The synaptic operation assigns a relative weight (significance) to each

    incoming signal according to the past experience (knowledge) stored in the

    synapse.

    (ii) Somatic operation: The somatic operation provides various mathematical

    operations such as aggregation, thresholding, nonlinear activation, and

    dynamic processing to the synaptic inputs. If the weighted aggregation of the

    neural inputs exceeds a certain threshold, the soma will produce an output

    signal to its axon.

    A simplified representation of the above neural operations for a typical neuron is

    shown in

    Figure 2. A biological neuron deals with some interesting mathematical mapping

    properties because of its nonlinear operations combined with a threshold in the soma.


If neurons were only capable of carrying out linear operations, the complex human

    cognition and robustness of neural systems would disappear.

    Figure 2. Simple model of a neuron showing (a) synaptic and (b) somatic operations.

    Observations from both experimental and mathematical analysis have indicated

    that neural cells can transmit reliable information if they are sufficiently redundant in

    numbers. However, in general, a biological neuron has an unpredictable mechanism

    for processing information. Therefore, it is postulated that the collective activity

    generated by large numbers of locally redundant neurons is more significant than the

    activity generated by a single neuron.

    1.2. Synaptic operation

    As shown in

Figure 2, let us consider a neural memory vector of accumulated past experiences, $\mathbf{w} = [w_1, w_2, \ldots, w_n]^T \in \mathbb{R}^n$, which is usually called the synaptic weights, and a neural input vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ as the current external stimuli. Through the comparison process between the neural memory $\mathbf{w}$ and the input $\mathbf{x}$, the neuron can

    calculate a similarity between the usual (memory base) and current stimuli and thus

    know the current situation (Kobayashi, 2006). According to the similarity, the neuron

    can then derive its internal value as the membrane potential.

    A similarity measure u can be calculated as an inner product of the neural memory

    vector w and the current input vector x given by

$u = \mathbf{w}^T \mathbf{x} = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = \sum_{i=1}^{n} w_i x_i$   (1)

    The similarity implies the linear combination of the neural memory and the current

    input, or correlation between them. This idea can be traced back to the milestone

    model proposed by McCulloch and Pitts (1943).

As shown in Figure 3, the inner product can also be represented as

$u = \|\mathbf{w}\|\,\|\mathbf{x}\| \cos\theta$   (2)

where $\|\cdot\|$ denotes the length (norm) of a vector and $\theta$ is the angle between the vectors $\mathbf{w}$ and $\mathbf{x}$.

    Figure 3. Inner product as a measure of similarity between a neural memory (past

    experience) w and a neural input (current experience) x .

When a current input $\mathbf{x}$ points in the same or a very similar direction as the neural memory $\mathbf{w}$, the similarity measure u becomes large and the correlation between the memory $\mathbf{w}$ and the input $\mathbf{x}$ becomes strongly positive because $\cos\theta \approx 1$. If the input $\mathbf{x}$ points in the opposite or a nearly opposite direction to the memory $\mathbf{w}$, the absolute value of the similarity measure |u| also becomes large, but the negative correlation becomes strong because $\cos\theta \approx -1$. In these two cases, the magnitudes of the memory $\mathbf{w}$ and the input $\mathbf{x}$ also influence the similarity measure. The other particular case is that the input $\mathbf{x}$ and the memory $\mathbf{w}$ are orthogonal to each other. In this case, the similarity measure u becomes very small because $\cos\theta \approx 0$. If the two vectors are strictly orthogonal, the similarity measure u is equal to 0, and it is then independent of the magnitudes of the memory $\mathbf{w}$ and the input $\mathbf{x}$.

The inner product indicates how similar the directions of two vectors are to each other. Indeed, in the case of normalized vectors $\mathbf{w}$ and $\mathbf{x}$, i.e., $\|\mathbf{w}\| = \|\mathbf{x}\| = 1$, the similarity measure is nothing but $\cos\theta$:

$u = \|\mathbf{w}\|\,\|\mathbf{x}\| \cos\theta = \cos\theta$   (3)
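A tiny numeric illustration of eqns. (1)-(3) (a sketch for this chapter's notation, not code from the chapter):

```python
import numpy as np

w = np.array([1.0, 2.0, -1.0])                  # neural memory (past experience)
for x in (w, -w, np.array([2.0, -1.0, 0.0])):   # similar, opposite, and orthogonal inputs
    u = w @ x                                    # inner product similarity, eqn. (1)
    cos_theta = u / (np.linalg.norm(w) * np.linalg.norm(x))   # eqns. (2)-(3)
    print(u, round(cos_theta, 3))
```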

    Note that the linear combination can be extended to higher order combinations as in

the following section.

    1.2.1. Higher Order Terms of Neural Inputs

    In the linear combination given in eqn. (1), we considered a neural input vector

    consisting of only the first order terms of neural inputs in the polynomial. Naturally,

    we can extend the first order terms to the higher order terms of the neural inputs or any

    other nonlinear ones. To separate different classes of data with a nonlinear

    discriminant line, an HONN (Rumelhart et al., 1986a; Giles and Maxwell, 1987;

Softky and Kammen, 1991; Xu et al., 1992; Taylor and Coombes, 1993; Homma and

    Gupta, 2002) is used. An HONN is composed of one or more HONUs.

Here let us consider the second order polynomial of the neural inputs. In this case, the extended neural input and memory vectors, $\mathbf{x}_a$ and $\mathbf{w}_a$, can be defined by

$\mathbf{x}_a = [x_1, x_2, \ldots, x_n, x_1^2, x_1 x_2, \ldots, x_1 x_n, x_2^2, x_2 x_3, \ldots, x_n^2]^T$   (4)

$\mathbf{w}_a = [w_1, w_2, \ldots, w_n, w_{11}, w_{12}, \ldots, w_{1n}, w_{22}, \ldots, w_{(n-1)n}, w_{nn}]^T$   (5)

Then the similarity measure can be given with the same notation as

$u_a = \mathbf{w}_a^T \mathbf{x}_a = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{11} x_1^2 + w_{12} x_1 x_2 + \cdots + w_{(n-1)n} x_{n-1} x_n + w_{nn} x_n^2 = \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} x_i x_j$   (6)

The second order terms $x_i x_j$ can be related to correlations between the two inputs $x_i$ and $x_j$. That is, if the two inputs are statistically independent of each other, then the second order terms become 0, while the absolute values of the terms become large if there is a linear relation between them. The squared terms of the neural inputs, $x_i^2$, indicate the power of the inputs from the physical point of view.

Consequently, the similarity measure with general higher order terms can be defined as

$u_a = \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} x_i x_j + \cdots + \sum_{i_1=1}^{n} \sum_{i_2=i_1}^{n} \cdots \sum_{i_N=i_{N-1}}^{n} w_{i_1 i_2 \cdots i_N} x_{i_1} x_{i_2} \cdots x_{i_N}$   (7)
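To make the construction of the extended vectors concrete, the following minimal Python sketch (an illustration, not code from the chapter; the helper name second_order_features is an assumption) builds the $\mathbf{x}_a$ of eqn. (4) and evaluates the similarity measure of eqn. (6):

```python
import numpy as np

def second_order_features(x):
    """Extend the input vector x with all products x_i*x_j, j >= i (eqn. (4))."""
    n = len(x)
    quad = [x[i] * x[j] for i in range(n) for j in range(i, n)]
    return np.concatenate([x, quad])

rng = np.random.default_rng(0)
n = 3
x = rng.standard_normal(n)                        # current external stimuli
w_a = rng.standard_normal(n + n * (n + 1) // 2)   # first + second order weights, eqn. (5)

x_a = second_order_features(x)
u_a = w_a @ x_a                                   # similarity measure, eqn. (6)
print(u_a)
```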

1.3. Somatic operation

Typical neural outputs are generated by a sigmoidal activation function of the similarity measure u, the inner product of the neural memories (past experiences) and the current inputs. In this case, the neural output y can be given as

$y = \phi(u) \in \mathbb{R}^1$   (8)

where $\phi$ is a neural activation function. An example of the activation function is the so-called sigmoidal function given by

$\phi(x) = \dfrac{1}{1 + \exp(-x)}$   (9)

and shown in Figure 4.

    Figure 4. A sigmoidal activation function.

Note that the activation function is not limited to the sigmoidal one; however, this type of sigmoidal function has been widely used in various fields. Here, if the similarity u is large, that is, the current input $\mathbf{x}$ is similar to the corresponding neural memory $\mathbf{w}$, the neural output y is also large. On the other hand, if the similarity u is small, the neural output y is also small. This is a basic characteristic of biological neural activities. Note that the neural output is not proportional to the similarity u, but is a nonlinear function of u with saturation characteristics. This nonlinearity might be a key mechanism in making neural activities as complex as those of the brain.

    1.4. Learning from experiences

    From the computational point of view, we have discussed how neurons, which are

    elemental computational units in the brain, produce outputs y as the results of neural

    information processing based on comparison of current external stimuli x with neural

    memories of past experiences w . Consequently, the neural outputs y are strongly

dependent on the neural memories w. Thus, how neurons can memorize past

    experiences is crucial for neural information processing. Indeed, one of the most

    remarkable features of the human brain is its ability to adaptively learn in response to

    knowledge, experience, and environment. The basis of this learning appears to be a

    network of interconnected adaptive elements by means of which transformation

    between inputs and outputs is performed.

    Learning can be defined as the acquisition of new information. In other words,

    learning is a process of memorizing new information. Adaptation implies that the

    element can change in a systematic manner and in so doing alter the transformation

    between input and output. In the brain, transmission within the neural system

involves coded nerve impulses and other physical and chemical processes that form

    reflections of sensory stimuli and incipient motor behavior.

Many biological aspects are associated with such learning processes, including (Harston, 1990):

(i) learning overlays hardwired connections;

(ii) synaptic plasticity versus stability, a crucial design dilemma; and

(iii) synaptic modification providing a basis for observable organism behavior.

    Here, we have presented the basic foundation of neural networks starting from a

    basic introduction to the biological foundations, neural models, and learning properties

    inherent in neural networks. The rest of the chapter contains the following five

    sections:

    In section 2, as the first step to understanding HONNs, we will develop a general

    matrix form of the second order neural units (SONUs) and the learning algorithm.

    Using the general form, it will be shown that, from the point of view of both the neural

    computing process and its learning algorithm, the widely used linear combination

    neural units described above are only a subset of the developed SONUs.

    In section 3, we will conduct some simulation studies to support the theoretical

    development of second order neural networks (SONNs). The results will show how and

    why SONNs can be effective for many problems.

    In section 4, HONUs and HONNs with a learning algorithm will be presented.

    Toward computer science and engineering applications, function approximation and

    time series analysis problems will be considered in section 5.

    Concluding remarks and future research directions will be given in section 6.

    2. Second Order Neural Units and Second Order Neural Networks

Neural networks, consisting of first order neurons which provide the neural output as a

    nonlinear function of the weighted linear combination of neural inputs, have been

    successfully used in various applications such as pattern recognition/classification,

    system identification, adaptive control, optimization, and signal processing (Sinha et al.,

1999; Gupta et al., 2003; Narendra and Parthasarathy, 1990; Cichocki and Unbehauen,

    1993).

The higher order combination of the inputs and weights can yield higher neural performance. However, one of the disadvantages encountered in the previous development of HONUs is the larger number of learning parameters (weights) required (Schmidt, 1993). To optimize the feature space, a learning capability assessment method has been proposed by Villalobos and Merat (1995).

In this section, in order to reduce the number of parameters without loss of higher performance, an SONU is presented (Homma and Gupta, 2002); an SONU is also sometimes denoted as a quadratic neural unit (QNU) (Bukovsky et al., 2010). Using a general

    matrix form of the second order operation, the SONU provides the output as a nonlinear

    function of the weighted second order combination of input signals. Note that the

    matrix form can contribute to high speed computing, such as parallel and vector

    processing, which is essential for scientific and image processing.

    2.1. Formulation of the second order neural unit

An SONU with n-dimensional neural inputs, $\mathbf{x}(t) \in \mathbb{R}^n$, and a single neural output, $y(t) \in \mathbb{R}^1$, is developed in this section (Figure 5). Let $\mathbf{x}_a = [x_0, x_1, \ldots, x_n]^T \in \mathbb{R}^{n+1}$, $x_0 = 1$, be an augmented neural input vector. Here a new second order aggregating formulation is proposed by using an augmented weight matrix $\mathbf{W}_a(t) \in \mathbb{R}^{(n+1)\times(n+1)}$ as

$u = \mathbf{x}_a^T \mathbf{W}_a \mathbf{x}_a$   (10)

Then the neural output, y, is given by a nonlinear function of the variable u as

$y = \phi(u) \in \mathbb{R}^1$   (11)

Figure 5. An SONU defined by eqns. (10) and (11).

Because both the weights $w_{ij}$ and $w_{ji}$, $i, j \in \{0, 1, \ldots, n\}$, in the augmented weight matrix $\mathbf{W}_a$ yield the same second order term $x_i x_j$ (or $x_j x_i$), an upper triangular matrix or a lower triangular matrix is sufficient to use. For instance, instead of separately determining values for $w_{01}$ and $w_{10}$, both of which are weights for $x_0 x_1$, one can eliminate one of these weights and determine a value for either $w_{01}$ or $w_{10}$ that would be as much as both of these combined if they were computed separately. This saves time in the neural network's intensive procedure of computing weights. The same applies for the other redundant weights.

The equation for the discriminant line can be reexpressed as the transpose of the vector of neural inputs multiplied by the upper triangular matrix of neural weights multiplied by the vector of neural inputs again:

$u = \mathbf{x}_a^T \mathbf{W}_a \mathbf{x}_a = \sum_{i=0}^{n} \sum_{j=i}^{n} w_{ij} x_i x_j, \quad x_0 = 1$   (12)

The number of elements in the full matrix of neural weights with redundant elements is $(n+1) \times (n+1)$. To calculate the number of elements in the final weight matrix $\mathbf{W}_a$ with the redundant elements eliminated, first take the total number of elements, $(n+1)(n+1)$, and subtract the number of diagonal elements, $n+1$. Dividing this by 2 gives the number of elements above (or below) the diagonal. Then add back the number of diagonal elements. Therefore, the number of elements in $\mathbf{W}_a$ with redundant elements eliminated is given as

$\dfrac{(n+1)(n+1) - (n+1)}{2} + (n+1) = \dfrac{n^2 + 3n + 2}{2}$
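As a quick check of this count, a short Python sketch (illustrative only; the function name sonu_weight_count is an assumption, not from the chapter) enumerates the upper triangular weights of an SONU and compares them with the closed form:

```python
from math import comb

def sonu_weight_count(n: int) -> int:
    """Independent weights of an SONU with augmented input x_a = [1, x_1, ..., x_n]."""
    full = (n + 1) * (n + 1)           # all elements, redundant pairs included
    diag = n + 1                       # diagonal elements
    return (full - diag) // 2 + diag   # keep one of each off-diagonal pair

for n in (2, 5, 10):
    closed_form = (n * n + 3 * n + 2) // 2          # (n^2 + 3n + 2) / 2
    assert sonu_weight_count(n) == closed_form == comb(n + 2, 2)
    print(n, closed_form)
```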

Note that the conventional first order weighted linear combination is only a special case of this second order matrix formulation. For example, the special weight matrix $\mathbf{W}_a \in \mathbb{R}^{(n+1)\times(n+1)}$ whose only nonzero row is the first one, $[w_{00}, w_{01}, \ldots, w_{0n}]$, produces the equivalent weighted linear combination $u = \sum_{j=0}^{n} w_{0j} x_j$. Therefore, the proposed neural model with the second order matrix operation is more general and, for this reason, it is called an SONU.

    2.2. Learning algorithms for second order neural units

Here, learning algorithms are developed for SONUs. Let k denote the discrete time step, $k = 1, 2, \ldots$, and let $y_d(k) \in \mathbb{R}^1$ be the desired output signal corresponding to the neural input vector $\mathbf{x}(k) \in \mathbb{R}^n$ at the k-th time step. A square error, E(k), is defined by the error $e(k) = y_d(k) - y(k)$ as

$E(k) = \dfrac{1}{2} e(k)^2$   (13)

where y(k) is the neural output corresponding to the neural input $\mathbf{x}(k)$ at the k-th time instant.

The purpose of the neural unit is to minimize the error E by adapting the weight matrix $\mathbf{W}_a$ as

$\mathbf{W}_a(k+1) = \mathbf{W}_a(k) + \Delta \mathbf{W}_a(k)$   (14)

Here $\Delta \mathbf{W}_a(k)$ denotes the change in the weight matrix, which is defined as proportional to the negative gradient of the error function E(k):

$\Delta \mathbf{W}_a(k) = -\eta\, \dfrac{\partial E(k)}{\partial \mathbf{W}_a(k)}$   (15)

where $\eta > 0$ is a learning coefficient. Since the derivatives $\partial E / \partial w_{ij}$, $i, j \in \{0, 1, \ldots, n\}$, are calculated by the chain rule as

$\dfrac{\partial E(k)}{\partial w_{ij}(k)} = \dfrac{\partial E(k)}{\partial y(k)} \dfrac{\partial y(k)}{\partial u(k)} \dfrac{\partial u(k)}{\partial w_{ij}(k)} = -e(k)\, \phi'(u(k))\, x_i(k)\, x_j(k)$   (16)

or, in matrix form,

$\dfrac{\partial E(k)}{\partial \mathbf{W}_a(k)} = -e(k)\, \phi'(u(k))\, \mathbf{x}_a(k)\, \mathbf{x}_a^T(k)$   (17)

the changes in the weight matrix are given by

$\Delta \mathbf{W}_a(k) = \eta\, e(k)\, \phi'(u(k))\, \mathbf{x}_a(k)\, \mathbf{x}_a^T(k)$   (18)

Here $\phi'(u)$ is the slope of the nonlinear activation function used in eqn. (11). For activation functions such as the sigmoidal function, $\phi'(u) > 0$, and $\phi'(u)$ can be regarded as a gain of the changes in the weights. Then

$\Delta \mathbf{W}_a(k) = \eta_g\, e(k)\, \mathbf{x}_a(k)\, \mathbf{x}_a^T(k)$   (19)

where $\eta_g = \eta\, \phi'(u)$. Note that, taking the average of the changes over some input vectors, the change in the weights, $\Delta w_{ij}(k)$, implies the correlation between the error e(k) and the corresponding input term $x_i(k) x_j(k)$.

    Therefore, conventional learning algorithms such as the backpropagation algorithm

    can easily be extended for multilayered neural network structures having the proposed

    SONUs.
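The static learning rule of eqns. (10)-(18) is compact enough to sketch in a few lines of Python. The following is a hedged illustration with assumed names such as train_sonu; it is not the authors' implementation and uses the full augmented weight matrix rather than its upper triangular form:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_sonu(X, y_d, epochs=200, eta=0.1, seed=0):
    """Gradient-descent training of a single SONU, following eqns. (10)-(18)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = 0.1 * rng.standard_normal((n + 1, n + 1))        # augmented weight matrix W_a
    for _ in range(epochs):
        for x, yd in zip(X, y_d):
            xa = np.concatenate(([1.0], x))              # augmented input, x_0 = 1
            u = xa @ W @ xa                              # eqn. (10)
            y = sigmoid(u)                               # eqn. (11)
            e = yd - y                                   # error
            W += eta * e * y * (1.0 - y) * np.outer(xa, xa)  # eqn. (18), phi'(u) = y(1-y)
    return W
```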

In Table I, the fundamental learning rules of static and dynamic SONUs are summarized for the case of time series prediction (for clarity, the activation function is taken as the identity, $\phi(u) = u$). As an extension of the above static learning rule of SONUs, the update rule of dynamic SONUs includes the recurrently calculated derivatives of the neural output, $\partial y_n(k+n_s)/\partial w_{ij}$, where $\mathbf{j}_{ij}$ denotes a column of a recurrently calculated Jacobian matrix (Table I).

Table I. Summary of fundamental static and dynamic learning techniques for an SONU for time series prediction, where $\phi(u) = u$ for simplicity.

Static SONU, mathematical structure:

$y_n = \sum_{i=0}^{n} \sum_{j=i}^{n} x_i x_j w_{ij} = \mathbf{x}_a^T \mathbf{W} \mathbf{x}_a, \quad \mathbf{x}_a = [1, x_1, \ldots, x_n]^T$

where $x_1, x_2, \ldots, x_n$ are the external neural inputs, $y_n$ is the neural output, and $\mathbf{W}$ is the upper triangular weight matrix

$\mathbf{W} = \begin{bmatrix} w_{00} & w_{01} & \cdots & w_{0n} \\ 0 & w_{11} & \cdots & w_{1n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & w_{nn} \end{bmatrix}$

Static SONU, learning rules:

(a) Gradient descent (k ... sample number):
$\Delta w_{ij}(k) = -\eta\, \dfrac{\partial \frac{1}{2} e(k)^2}{\partial w_{ij}} = \eta\, e(k)\, x_i(k)\, x_j(k), \quad w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}(k)$

(b) Levenberg-Marquardt (L-M), batch update over the whole data set (N ... number of samples, i.e., data length):
$\Delta w_{ij} = \big(\mathbf{j}_{ij}^T \mathbf{j}_{ij} + 1/\mu\big)^{-1} \mathbf{j}_{ij}^T \mathbf{e}$, with $\mathbf{e} = [e(1), e(2), \ldots, e(N)]^T$ and $\mathbf{j}_{ij} = [x_i(1)x_j(1), x_i(2)x_j(2), \ldots, x_i(N)x_j(N)]^T = \partial \mathbf{y}_n / \partial w_{ij}$.

Dynamic (discrete) SONU, mathematical structure:

$y_n(k+n_s) = \mathbf{x}_a^T \mathbf{W} \mathbf{x}_a, \quad \mathbf{x}_a(k) = [1, \; y_n(k+n_s-1), \ldots, y_n(k+1), \; x_1(k), \ldots, x_m(k)]^T$

where the delayed neural outputs are state feedbacks and, typically for prediction, $x_1(k) = y_r(k), \ldots, x_m(k) = y_r(k-m+1)$, with $y_r$ the real (measured) value.

Dynamic (discrete) SONU, learning rules:

(a) Recurrent gradient descent (RTRL):
$\Delta w_{ij}(k) = \eta\, e(k)\, \dfrac{\partial y_n(k+n_s)}{\partial w_{ij}}$, where the derivatives are calculated recurrently as
$\dfrac{\partial y_n(k+n_s)}{\partial w_{ij}} = \dfrac{\partial (\mathbf{x}_a^T \mathbf{W} \mathbf{x}_a)}{\partial w_{ij}} = \mathbf{j}_{ij}^T \mathbf{W} \mathbf{x}_a + \mathbf{x}_a^T \dfrac{\partial \mathbf{W}}{\partial w_{ij}} \mathbf{x}_a + \mathbf{x}_a^T \mathbf{W} \mathbf{j}_{ij}$,
with $\mathbf{j}_{ij} = \dfrac{\partial \mathbf{x}_a}{\partial w_{ij}} = \Big[0, \; \dfrac{\partial y_n(k+n_s-1)}{\partial w_{ij}}, \ldots, \dfrac{\partial y_n(k+1)}{\partial w_{ij}}, \; 0, \ldots, 0\Big]^T$ and $\mathbf{x}_a^T \dfrac{\partial \mathbf{W}}{\partial w_{ij}} \mathbf{x}_a = x_i x_j$.

(b) Backpropagation through time (BPTT): the BPTT learning technique may be implemented as the combination of (i) RTRL for the recurrent calculation of the neural outputs and their derivatives (with respect to the weights) at every sample time k, and (ii) the Levenberg-Marquardt algorithm for the calculation of the weight increments $\Delta \mathbf{W}$ once the recurrent calculations are accomplished.

    3. Performance Assessment of Second Order Neural Units

    To evaluate the learning and generalization abilities of the proposed general SONUs,

    the XOR classification problem is used. The XOR problem will provide a simple

    example of how well an SONU works for the nonlinear classification problem.

    3.1. XOR problem

    Because the two-input XOR function is not linearly separable, it is one of the simplest

    logic functions that cannot be realized by a single linear combination neural unit.

    Therefore, it requires a multilayered neural network structure consisting of linear

    combination neural units.

On the other hand, a single SONU can solve this XOR problem by using its general second order function defined in eqn. (12). To implement the XOR function using a single SONU, the four learning patterns corresponding to the four combinations of two binary inputs, $(x_1, x_2) \in \{(1, 1), (1, -1), (-1, 1), (-1, -1)\}$, and the desired output $y_d = x_1 \oplus x_2 \in \{-1, 1\}$ were applied to the SONU.

For the XOR problem, the neural output, y, is defined by the signum function as $y = \phi(u) = \operatorname{sgn}(u)$. The correlation learning algorithm with a constant gain, $\eta_g = 1$, in eqn. (19) was used in this case. The learning was terminated as soon as the error converged to 0. Because the SONU with the signum function classifies the neural input data by using the second order nonlinear function of the neural inputs, $\mathbf{x}_a^T \mathbf{W}_a \mathbf{x}_a$, as in eqn. (10), many nonlinear classification boundaries are possible, such as a hyperbolic boundary and an elliptical boundary (Table II).

Table II. Initial weights (k = 0), final weights, and the classification boundaries for the XOR problem.

Note that the resulting classification boundary depends on the initial weights (Table II), and any classification boundary realizable by a second order function can be realized by a single SONU. This realization ability of the SONU is obviously superior to the

    linear combination neural unit, which cannot achieve such nonlinear classification

    using a single neural unit. At least three linear combination neural units in a layered

    structure are needed to solve the XOR problem.
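As an illustration of this XOR experiment (a sketch assuming bipolar targets, +1 for true, and the correlation rule of eqn. (19) with $\eta_g = 1$; it is not the authors' code), a single SONU can be trained as follows:

```python
import numpy as np

# XOR with a single SONU: y = sgn(x_a^T W_a x_a), correlation learning of eqn. (19).
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
y_d = np.array([-1, 1, 1, -1], dtype=float)           # bipolar XOR targets

rng = np.random.default_rng(1)
W = np.triu(0.1 * rng.standard_normal((3, 3)))        # upper triangular W_a

while True:
    errors = 0
    for x, yd in zip(X, y_d):
        xa = np.concatenate(([1.0], x))
        y = np.sign(xa @ W @ xa)
        e = yd - y
        if e != 0.0:
            errors += 1
            W += np.triu(e * np.outer(xa, xa))        # eqn. (19) with eta_g = 1
    if errors == 0:
        break

print(np.round(W, 2))   # e.g. a dominant negative w_12 realizes y = -x1*x2
```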

    Secondly, the number of parameters (weights) required for solving this problem can

    be reduced by using the SONU. In this simulation study, by using the upper

    triangular weight matrix, only six parameters including the threshold were required for

    the SONU whereas at least nine parameters were required for the layered structure

    with three linear combination neural units.

Each weight $w_{ij}$ represents how the corresponding input correlation term $x_i x_j$ affects the neural output. If the absolute value of the weight is very small, then the

    effect of the corresponding input term on the output may also be very small. On the

    other hand, the corresponding term may be dominant or important if the absolute value

    of the weight is large compared to the other weights.

The weights in Table II suggest that the absolute value of $w_{12}$ is always large independently of the initial values, and it is the largest except in only one case (middle row, where it is still the second largest). The absolute value of $w_{00}$ is the largest in one case (middle row) among the three cases, but the smallest in another case (top row). The input term corresponding to the weight $w_{00}$ is nothing but the bias. Note that a large $|w_{12}|$ implies a large contribution of the correlation term $x_1 x_2$ to the output, and that the contribution of this term is negative because $w_{12} < 0$. Indeed, the target XOR function can be defined as $y = -x_1 x_2$.

Consequently, if the target (unknown) function involves a higher order combination of the input variables, higher order neural units can be superior to neural units that do not have the necessary higher order input terms. Of course, this is only a discussion of the synaptic operation; the somatic operation may also create higher order terms in the sense of a Taylor expansion of the nonlinear activation function. However, such higher order terms created by the somatic operation may be limited or indirect. Thus, the direct effect of the higher order terms is a reason why higher order neural units can be effective for problems that involve higher order terms of the input variables.

    3.2. Time Series Prediction

    In this subsection, the time-series prediction performance of dynamic SONUs

    (Figure 7) adapted by dynamic gradient descent (RTRL) is demonstrated and compared

    to single hidden layer perceptron-type recurrent neural networks with various numbers

    of sigmoid neurons in the hidden layer (from 3 to 10) and two recurrent configurations,

    recurrent hidden layer (RNN) and tapped delay feedbacks of neural output (TptDNN).

For comparison of the performance, an extensive simulation analysis was performed on the theoretical and real data shown in Figure 6; white noise was also added to the training and testing data to compare the generalization and overfitting of SONUs.

(Figure 6 panels, 2000 samples each: Art-1: Quasiperiodic; Art-2: Nonlinear periodic; Art-3: Artificial ECG; Art-4: Lorenz system; Art-5: MacKey-Glass; Real-1: Respiration; Real-2: Real ECG; Real-3: EEG; Real-4: R-R.)

    Figure 6. All signals (clean data) that were used in the experimental study. The first

    1000 samples were training data. Samples for k=1001-2000 were used as testing

    data.

    Table III. Total counts of simulation experiments with SONU (QNU), recurrent

    perceptron-type neural networks (RNN), and tapped-delay neural networks

    (TptDNN) with a single hidden layer and various numbers of hidden neurons (3,

5, or 7)

Table IV. The percentage of counts of neural architectures that were tested with better than average performance, measured with the sum of square errors (SSE), of all neural architectures that were tested. Columns give QNU / RNN / TptDNN in four groups, G-pure, G-smooth, OF-pure, and OF-smooth, followed by the row average.

Art-1 Quasiperiodic: 81% 46% 50% | 89% 49% 59% | 91% 85% 69% | 89% 76% 67% | 87% 64% 61%
Art-2 NonlinPeriodic: 81% 46% 50% | 84% 49% 56% | 96% 75% 70% | 90% 59% 57% | 88% 57% 58%
Art-3 ECG_Art: 100% 40% 33% | 96% 43% 43% | 93% 58% 59% | 97% 48% 51% | 97% 47% 46%
Art-4 Lorenz: 76% 47% 46% | 81% 48% 52% | 81% 80% 69% | 80% 79% 71% | 80% 63% 59%
Art-5 MacKeyGlass: 89% 51% 54% | 76% 50% 50% | 78% 65% 56% | 82% 57% 55% | 81% 56% 54%
Real-1 Respiration: 82% 56% 51% | 82% 50% 57% | 97% 69% 59% | 84% 59% 57% | 86% 58% 56%
Real-2 ECG_Real: 100% 33% 36% | 97% 34% 36% | 95% 55% 57% | 97% 43% 42% | 97% 41% 43%
Real-3 EEG: 81% 63% 63% | 61% 45% 46% | 40% 73% 62% | 37% 68% 59% | 55% 62% 58%
Real-4 RR: 89% 42% 49% | 55% 44% 64% | 83% 57% 57% | 95% 52% 57% | 81% 49% 56%
Column average: 86% 47% 48% | 80% 46% 51% | 84% 68% 62% | 84% 60% 57%

Table V. Count of the types of neural architectures that reached the absolute minimum SSE for three predicting horizons (after averaging results over three levels of noise distortion). Summed over all nine signals, the column totals and percentages were: QNU 22 (81%), RNN 4 (15%), TptDNN 1 (4%) in the first column group; QNU 15 (83%), TptDNN 3 (17%) in the second; QNU 22 (81%), RNN 5 (19%) in the third; and QNU 19 (70.4%), RNN 2 (7.4%), TptDNN 6 (22.2%) in the fourth, i.e., the QNU (SONU) architecture reached the minimum SSE in the large majority of the tested cases.

(Figure 7 structure: the recurrent QNU computes $y_n(k+n_s) = \mathbf{x}_a^T \mathbf{W} \mathbf{x}_a = \sum_{i} \sum_{j \ge i} x_i x_j w_{ij}$, where $\mathbf{x}_a(k) \in \mathbb{R}^{n_s + n_r}$ collects a unit bias, the $n_s - 1$ delayed neural outputs $y_n(k+n_s-1), \ldots, y_n(k+1)$ fed back through unit delays $z^{-1}$, and the $n_r$ external inputs $y_r(k), \ldots, y_r(k-n_r+1)$.)

Figure 7. Schematic of the recurrent QNU with $n_s - 1$ state feedbacks (recurrences) and $n_r$ external inputs (real measured values), as used for time series prediction.
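A minimal Python sketch of this recurrent QNU forward pass may clarify the data flow (an illustration under the assumptions above; the name recurrent_qnu_predict is not from the chapter, and weight adaptation by RTRL or BPTT is omitted):

```python
import numpy as np

def recurrent_qnu_predict(W, y_r, n_s, n_r, steps):
    """Run the recurrent QNU of Figure 7 forward for `steps` samples.

    W    : (1 + (n_s-1) + n_r) square upper triangular weight matrix
    y_r  : measured time series (1-D array)
    n_s  : prediction horizon; n_s - 1 previous predictions are fed back
    n_r  : number of external inputs y_r(k), ..., y_r(k - n_r + 1)
    """
    feedback = np.zeros(n_s - 1)                 # delayed neural outputs (state feedbacks)
    predictions = []
    for k in range(n_r - 1, n_r - 1 + steps):
        external = y_r[k - n_r + 1:k + 1][::-1]  # y_r(k), ..., y_r(k - n_r + 1)
        x_a = np.concatenate(([1.0], feedback, external))
        y_n = x_a @ W @ x_a                      # prediction y_n(k + n_s)
        predictions.append(y_n)
        feedback = np.concatenate(([y_n], feedback[:-1]))   # shift the state feedbacks
    return np.array(predictions)

# toy usage on a sine wave with assumed sizes
t = np.arange(200)
y_r = np.sin(0.1 * t)
n_s, n_r = 3, 5
dim = 1 + (n_s - 1) + n_r
W = np.triu(0.01 * np.random.default_rng(0).standard_normal((dim, dim)))
print(recurrent_qnu_predict(W, y_r, n_s, n_r, steps=10))
```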

    4. Higher Order Neural Units and Higher Order Neural Networks

    To capture the higher order nonlinear properties of the input pattern space, extensive

    efforts have been made by Rumelhart et al. (1986), Giles and Maxwell (1987), Softky

and Kammen (1991), Xu et al. (1992), Taylor and Coombes (1993), and Homma and

    Gupta (2002) toward developing architectures of neurons that are capable of capturing

    not only the linear correlation between components of input patterns, but also the

    higher order correlation between components of input patterns. HONNs have proven

    to have good computational, storage, pattern recognition, and learning properties and

are realizable in hardware (Taylor and Coombes, 1993). Regular polynomial networks

    that contain the higher order correlations of the input components satisfy the

    Stone-Weierstrass theorem that is a theoretical background of universal function

    approximators by means of neural networks (Gupta et al., 2003), but the number of

    weights required to accommodate all the higher order correlations increases

    exponentially with the number of the inputs. HONUs are the basic building block for

such an HONN. For such an HONN, as shown in Figure 8, the output is given by

$y = \phi(u)$   (20)

$u = w_0 + \sum_{i_1=1}^{n} w_{i_1} x_{i_1} + \sum_{i_1=1}^{n} \sum_{i_2=i_1}^{n} w_{i_1 i_2} x_{i_1} x_{i_2} + \cdots + \sum_{C(i_1 \cdots i_N)} w_{i_1 i_2 \cdots i_N} x_{i_1} x_{i_2} \cdots x_{i_N}$   (21)

where $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$ is the vector of neural inputs, y is the output, and $\phi(\cdot)$ is a strictly monotonic activation function, such as a sigmoidal function, whose inverse $\phi^{-1}(\cdot)$ exists. The summation for the j-th order correlation is taken over the set $C(i_1 \cdots i_j)$, $1 \le j \le N$, which is the set of combinations of j indices defined by

$C(i_1 i_2 \cdots i_j) = \{ i_1 i_2 \cdots i_j : 1 \le i_1 \le i_2 \le \cdots \le i_j \le n \}, \quad 1 \le j \le N$

Also, the number of the j-th order correlation terms is given by

$\binom{n+j-1}{j} = \dfrac{(n+j-1)!}{j!\,(n-1)!}, \quad 1 \le j \le N$

The introduction of the set $C(i_1 \cdots i_j)$ is to absorb the redundant terms due to the symmetry of the induced combinations. In fact, eqn. (21) is a truncated Taylor series with some adjustable coefficients. The Nth-order neural unit needs a total of

$\sum_{j=0}^{N} \binom{n+j-1}{j} = \sum_{j=0}^{N} \dfrac{(n+j-1)!}{j!\,(n-1)!}$

weights, including the bias term and all of the products of up to N components.
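For a feel of this combinatorial growth, a short Python sketch (illustrative only, not from the chapter) tabulates the weight count for a few input dimensions and orders:

```python
from math import comb

def honu_weight_count(n: int, N: int) -> int:
    """Total weights of an Nth-order HONU with n inputs: sum_j C(n+j-1, j), j = 0..N."""
    return sum(comb(n + j - 1, j) for j in range(N + 1))

for n in (2, 5, 10):
    print(n, [honu_weight_count(n, N) for N in (1, 2, 3, 4)])
```

For n = 5 this gives 21, 56, and 126 weights for N = 2, 3, and 4, in agreement with the sigma-pi column of Table VII.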

    Figure 8. Block diagram of the HONU, eqns. (20) and (21).

Example 1. In this example, we consider a case of the third order (N = 3) neural network with two neural inputs (n = 2). Here

$C(i) = \{0, 1, 2\}$
$C(i_1 i_2) = \{11, 12, 22\}$
$C(i_1 i_2 i_3) = \{111, 112, 122, 222\}$

and the network equation is

$y = \phi(w_0 + w_1 x_1 + w_2 x_2 + w_{11} x_1^2 + w_{12} x_1 x_2 + w_{22} x_2^2 + w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{222} x_2^3)$

    The HONUs may be used in conventional feedforward neural network structures as

    hidden units to form HONNs. In this case, however, consideration of the higher

    correlation may improve the approximation and generalization capabilities of the

    neural networks. Typically, only SONNs are usually employed in practice to give a

    tolerable number of weights as discussed in sections 2 and 3. On the other hand, if the

    order of the HONU is high enough, eqns. (20) and (21) may be considered as a neural

    network with n inputs and a single output. This structure is capable of dealing with

    the problems of function approximation and pattern recognition.

To accomplish an approximation task for given input-output data $\{\mathbf{x}(k), d(k)\}$, the learning algorithm for the HONN can easily be developed on the basis of the gradient descent method. Assume that the error function is formulated as

$E(k) = \dfrac{1}{2}\,[d(k) - y(k)]^2 = \dfrac{1}{2} e^2(k)$

where $e(k) = d(k) - y(k)$, d(k) is the desired output, and y(k) is the output of the neural network. Minimization of the error function by a standard steepest descent technique yields the following set of learning equations:

$w_0^{new} = w_0^{old} + \eta\, (d - y)\, \phi'(u)$   (22)

$w_{i_1 i_2 \cdots i_j}^{new} = w_{i_1 i_2 \cdots i_j}^{old} + \eta\, (d - y)\, \phi'(u)\, x_{i_1} x_{i_2} \cdots x_{i_j}$   (23)

where $\phi'(u) = d\phi(u)/du$. Like the backpropagation algorithm for a multilayered feedforward neural network (MFNN), a momentum version of the above is easily obtained.

Alternatively, because all the weights of the HONN appear linearly in eqn. (21), one may use methods for solving linear algebraic equations to carry out the preceding learning task if the number of patterns is finite. To do so, one has to introduce the following two augmented vectors

$\mathbf{w} = [w_0, w_1, \ldots, w_n, w_{11}, w_{12}, \ldots, w_{nn}, \ldots, w_{n \cdots n}]^T$

and

$\mathbf{u}(\mathbf{x}) = [x_0, x_1, \ldots, x_n, x_1^2, x_1 x_2, \ldots, x_n^2, \ldots, x_n^N]^T$

where $x_0 = 1$, so that the network equations, eqns. (20) and (21), may be rewritten in the following compact form:

$y = \phi(\mathbf{w}^T \mathbf{u}(\mathbf{x}))$   (24)

For the given p pattern pairs $\{\mathbf{x}(k), d(k)\}$, $1 \le k \le p$, define the following matrix and vector

$\mathbf{U} = [\mathbf{u}(1), \mathbf{u}(2), \ldots, \mathbf{u}(p)]^T, \quad \mathbf{d} = [\phi^{-1}(d(1)), \phi^{-1}(d(2)), \ldots, \phi^{-1}(d(p))]^T$

where $\mathbf{u}(k) = \mathbf{u}(\mathbf{x}(k))$, $1 \le k \le p$. Then, the learning problem becomes one of finding a solution of the following linear algebraic equation

$\mathbf{U}\mathbf{w} = \mathbf{d}$   (25)

If the number of weights is equal to the number of data and the matrix $\mathbf{U}$ is nonsingular, then eqn. (25) has a unique solution

$\mathbf{w} = \mathbf{U}^{-1}\mathbf{d}$

A more interesting case occurs when the dimension of the weight vector $\mathbf{w}$ is less than the number of data p. Then the existence of an exact solution of the above linear equation is guaranteed by

$\operatorname{rank}\, \mathbf{U} = \operatorname{rank}\, [\mathbf{U} \;\; \mathbf{d}]$

In case this condition is not satisfied, the pseudoinverse solution is usually an option and gives the best fit.
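A brief Python sketch of this linear formulation (an illustration assuming $\phi(x) = x$ and a second order feature map; the name design_matrix is not from the chapter):

```python
import numpy as np

def design_matrix(X):
    """Rows u(x(k)) of eqn. (24): a bias 1, the inputs x_i, and all products x_i*x_j, j >= i."""
    rows = []
    for x in X:
        n = len(x)
        feats = [1.0, *x] + [x[i] * x[j] for i in range(n) for j in range(i, n)]
        rows.append(feats)
    return np.array(rows)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
d = 1.0 + 2.0 * X[:, 0] * X[:, 1] - X[:, 2] ** 2          # an unknown quadratic target

U = design_matrix(X)
w, *_ = np.linalg.lstsq(U, d, rcond=None)                 # pseudoinverse / best fit of eqn. (25)
print(np.round(w, 3))
```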

    The following example shows how to use the HONN presented in this section to deal

    with pattern recognition problems that are also typical applications in computer science

    and engineering situations. It is of interest to show that solving such problems is

    equivalent to finding the decision surfaces in the pattern space such that the given data

    patterns are located on the surfaces.

Example 2. Consider a three-variable XOR function defined as

$y = f(x_1, x_2, x_3) = x_1 \oplus x_2 \oplus x_3$

    The eight input pattern pairs and corresponding outputs are given in Table VI. This is

    a typical nonlinear pattern classification problem. A single linear neuron with a

    nonlinear activation function is unable to form a decision surface such that the patterns

    are separated in the pattern space. Our objective here is to find all the possible

solutions using the third order neural network to realize the logic function.

Table VI. Truth table of the XOR function $x_1 \oplus x_2 \oplus x_3$. (The eight patterns, A through H, are the eight combinations of the binary inputs $x_1, x_2, x_3 \in \{-1, 1\}$, with the corresponding output $y \in \{-1, 1\}$.)

    A third order neural network is designed as

$y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{23} x_2 x_3 + w_{123} x_1 x_2 x_3$

where $x_1, x_2, x_3 \in \{-1, 1\}$ are the binary inputs, and the network contains eight weights.

To implement the above mentioned logic XOR function, one may consider the solution of the following set of linear algebraic equations, one equation per pattern k = A, B, ..., H of Table VI:

$w_0 + w_1 x_1(k) + w_2 x_2(k) + w_3 x_3(k) + w_{12} x_1(k) x_2(k) + w_{13} x_1(k) x_3(k) + w_{23} x_2(k) x_3(k) + w_{123} x_1(k) x_2(k) x_3(k) = y(k)$

The coefficient matrix $\mathbf{U}$ is the $8 \times 8$ matrix whose k-th row is

$[\,1, \; x_1(k), \; x_2(k), \; x_3(k), \; x_1(k)x_2(k), \; x_1(k)x_3(k), \; x_2(k)x_3(k), \; x_1(k)x_2(k)x_3(k)\,]$

so that all of its entries are $\pm 1$,


which is nonsingular. The equations have a unique solution:

$w_0 = w_1 = w_2 = w_3 = w_{12} = w_{13} = w_{23} = 0, \quad w_{123} = 1$

Therefore, the logic function is realized by the third order polynomial $y = x_1 x_2 x_3$. This solution is unique in terms of the third order polynomial.
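The claim is easy to check numerically; the following Python sketch (illustrative, assuming the bipolar coding with +1 for logical true) builds U from all eight patterns and solves Uw = d:

```python
import numpy as np
from itertools import product

# Verify Example 2: third order network and the three-input XOR in bipolar coding.
patterns = list(product([-1.0, 1.0], repeat=3))
d = np.array([1.0 if ((x1 > 0) ^ (x2 > 0) ^ (x3 > 0)) else -1.0
              for x1, x2, x3 in patterns])

U = np.array([[1, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3]
              for x1, x2, x3 in patterns])
w = np.linalg.solve(U, d)                 # unique solution, U is nonsingular
print(np.round(w, 6))                     # only the last weight, w_123, equals 1
```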

Xu et al. (1992) as well as Taylor and Coombes (1993) also demonstrated that

    HONNs may be effectively applied to problems using a model of a curve, surface, or

    hypersurface to fit a given data set. This problem, called nonlinear surface fitting, is

    often encountered in many computer science and engineering applications. Some

    learning algorithms for solving such problems can be found in their papers. Moreover,

if one assumes $\phi(x) = x$ in the HONU, the weights appear linearly in the network

    and the learning algorithms for the HONNs may be characterized as a linear least

    square (LS) procedure. Then the well-known local minimum problems existing in

    many nonlinear neural learning schemes may be avoided.

    4.1. Representation of Higher Order Neural Network Discriminant Using

    Multidimensional Matrix Product

The discriminant of an HONN is a summation of quadratic terms. This can be alternatively represented using multidimensional matrix multiplication (Solo, 2010). For example,

$\sum_{i=1}^{3} \sum_{j=1}^{3} w_{ij} x_i x_j = w_{11}x_1^2 + w_{12}x_1x_2 + w_{13}x_1x_3 + w_{21}x_2x_1 + w_{22}x_2^2 + w_{23}x_2x_3 + w_{31}x_3x_1 + w_{32}x_3x_2 + w_{33}x_3^2$

$= w_{11}x_1^2 + w_{22}x_2^2 + w_{33}x_3^2 + x_1x_2(w_{12} + w_{21}) + x_1x_3(w_{13} + w_{31}) + x_2x_3(w_{23} + w_{32})$

This weighted summation is easily represented using classical matrices multiplied together:

$\sum_{i=1}^{3} \sum_{j=1}^{3} w_{ij} x_i x_j = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} * \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} * \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

    It is extremely useful to express these weighted summations as matrices multiplied

    together to eliminate unnecessary terms in neural network designs. Because both the

weights $w_{ij}$ and $w_{ji}$ in the matrix above correspond to the same second-order term $x_i x_j$, it

    is sufficient to use only an upper triangular or lower triangular matrix. For instance,

    instead of separately determining values for w12 and w21, both of which are weights for

    x1x2, one can eliminate one of these weights and determine a value for either w12 or w21

    that would be as much as both of these combined if they were computed separately.

    The same applies for other redundant weights. This saves time in the neural

  • network’s intensive procedure of computing weights.

    However, the following equation and more complicated equations used for neural

    network applications cannot be expressed using classical matrices. Variables xi, xj, and

    xk are inputs and wijk are weights for these inputs.

$\sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} w_{ijk} x_i x_j x_k = w_{111}x_1^3 + w_{112}x_1^2x_2 + w_{121}x_1^2x_2 + w_{122}x_1x_2^2 + w_{211}x_1^2x_2 + w_{212}x_1x_2^2 + w_{221}x_1x_2^2 + w_{222}x_2^3$

$= w_{111}x_1^3 + x_1^2x_2(w_{112} + w_{121} + w_{211}) + x_1x_2^2(w_{122} + w_{212} + w_{221}) + w_{222}x_2^3$

    This weighted summation can be alternatively represented using multidimensional

    matrices (Solo, 2010) multiplied together. Premultiply the 2 * 2 * 2 weight matrix by a

    1 * 2 * 2 input matrix in the first dimension and second dimension. Then postmultiply

    the 2 * 2 * 2 weight matrix by a 2 * 1 * 2 input matrix in the first dimension and second

    dimension. Premultiply this entire product by a 1 * 2 input matrix in the first

    dimension and second dimension. Note that because the first dimension and second

    dimension of these multidimensional matrices are being multiplied, this does not need

    to be indicated in the equations below.

$\sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} w_{ijk} x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} * \left( \begin{bmatrix} x_1 & x_2 \end{bmatrix}_{1\times2\times2} * \begin{bmatrix} w_{11k} & w_{12k} \\ w_{21k} & w_{22k} \end{bmatrix}_{2\times2\times2} * \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{2\times1\times2} \right)$

where each bracketed factor is written for layer k of the third dimension (k = 1, 2); the two input matrices hold the same entries in both layers.

The multidimensional matrix product (Solo, 2010) of the first dimension and second dimension of the 1 * 2 * 2 input matrix and the 2 * 2 * 2 weight matrix results in a 1 * 2 * 2 matrix whose layer k is

$\begin{bmatrix} w_{11k} x_1 + w_{21k} x_2 & \;\; w_{12k} x_1 + w_{22k} x_2 \end{bmatrix}, \quad k = 1, 2$

The multidimensional matrix product of the first dimension and second dimension of this 1 * 2 * 2 matrix and the 2 * 1 * 2 input matrix results in a 1 * 1 * 2 matrix whose layer k is the scalar

$w_{11k} x_1^2 + w_{21k} x_1 x_2 + w_{12k} x_1 x_2 + w_{22k} x_2^2, \quad k = 1, 2$

The 1 * 1 * 2 matrix can be simplified into a 1-D matrix with 2 elements, so it can be premultiplied by the 1 * 2 input matrix in the first dimension and second dimension:

$\sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} w_{ijk} x_i x_j x_k = \begin{bmatrix} x_1 & x_2 \end{bmatrix} * \begin{bmatrix} w_{111} x_1^2 + w_{211} x_1 x_2 + w_{121} x_1 x_2 + w_{221} x_2^2 \\ w_{112} x_1^2 + w_{212} x_1 x_2 + w_{122} x_1 x_2 + w_{222} x_2^2 \end{bmatrix}$

$= w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{121} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{211} x_1^2 x_2 + w_{212} x_1 x_2^2 + w_{221} x_1 x_2^2 + w_{222} x_2^3$

$= w_{111} x_1^3 + x_1^2 x_2 (w_{112} + w_{121} + w_{211}) + x_1 x_2^2 (w_{122} + w_{212} + w_{221}) + w_{222} x_2^3$

Thus, this multidimensional matrix multiplication yields the same result as the weighted summation of third order terms above.
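The equivalence is also easy to check numerically; the following Python sketch (an illustration, not code from the chapter) evaluates the triple sum both directly and as a tensor contraction:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2)            # inputs x_1, x_2
w = rng.standard_normal((2, 2, 2))    # third order weights w_ijk

# Direct triple summation sum_ijk w_ijk * x_i * x_j * x_k
direct = sum(w[i, j, k] * x[i] * x[j] * x[k]
             for i in range(2) for j in range(2) for k in range(2))

# The same value as a multidimensional (tensor) contraction
contracted = np.einsum('ijk,i,j,k->', w, x, x, x)

print(np.isclose(direct, contracted))   # True
```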

    4.2. Modified Polynomial Neural Networks

    4.2.1. Sigma-Pi Neural Networks

Note that an HONU contains all the linear and nonlinear correlation terms of the input components up to the order N. A slightly generalized structure of the HONU is a polynomial network that includes weighted sums of products of selected input components raised to an appropriate power. Mathematically, the input-output transfer function of this network structure is given by

$u_i = \prod_{j=1}^{n} x_j^{w_{ij}}$   (26)

$y = \phi\!\left( \sum_{i=1}^{N} w_i u_i \right)$   (27)

where $w_i, w_{ij} \in \mathbb{R}$, N is the order of the network, and $u_i$ is the output of the i-th hidden

    unit. This type of feedforward network is called a sigma-pi network (Rumelhart et al.

    1986). It is easy to show that this network satisfies the Stone-Weierstrass theorem if

$\phi(x)$ is a linear function. Moreover, a modified version of the sigma-pi network, as

    proposed by Hornik et al. (1989) and Cotter (1990), is

$u_i = \prod_{j=1}^{n} \left[ p(x_j) \right]^{w_{ij}}$   (28)

$y = \phi\!\left( \sum_{i=1}^{N} w_i u_i \right)$   (29)

where $w_i, w_{ij} \in \mathbb{R}$ and $p(x_j)$ is a polynomial of $x_j$. It is easy to verify that this

network satisfies the Stone-Weierstrass theorem, and thus, it can be an approximator for problems of functional approximation. The sigma-pi network defined in eqns. (26) and (27) is a special case of the above network in which $p(x_j)$ is assumed to be a linear function of $x_j$. In fact, the weights $w_{ij}$ in both of the networks given in eqns. (26) and (28) may be restricted to integer or nonnegative integer values.
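A compact Python sketch of a sigma-pi unit of eqns. (26)-(27) (illustrative; the nonnegative integer exponents, the sigmoid choice, and the function names are assumptions):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigma_pi(x, W_exp, w_out, phi=sigmoid):
    """Sigma-pi unit: u_i = prod_j x_j**W_exp[i, j] (eqn. (26)); y = phi(sum_i w_i u_i) (eqn. (27))."""
    u = np.prod(x[np.newaxis, :] ** W_exp, axis=1)
    return phi(w_out @ u)

x = np.array([0.5, 2.0, 1.5])
W_exp = np.array([[1, 0, 0],   # u_1 = x_1
                  [1, 1, 0],   # u_2 = x_1 * x_2
                  [0, 2, 1]])  # u_3 = x_2**2 * x_3
w_out = np.array([0.3, -0.7, 0.1])
print(sigma_pi(x, W_exp, w_out))
```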

    4.2.2. Ridge Polynomial Neural Networks

    To obtain fast learning and powerful mapping capabilities, and to avoid the

    combinatorial increase in the number of weights of HONNs, some modified polynomial

    network structures have been introduced. One of these is the pi-sigma network (Shin

    and Ghosh, 1991), which is a regular higher order structure and involves a much

    smaller number of weights than sigma-pi networks. The mapping equation of a

    pi-sigma network can be represented as

    n

    j

    ijiji xwu1

    (30)

    N

    i

    n

    j

    ijij

    N

    i

    i xwuy1 11

    (31)

    The total number of weights for an Nth-order pi-sigma network with n inputs is only

    Nn )1( . Compared with the sigma-pi network structure, the number of weights

    involved in this network is significantly reduced. Unfortunately, when xx )( , the

    pi-sigma network does not match the conditions provided by the Stone-Weierstrass

    theorem because the linear subspace condition is not satisfied (Gupta et al., 2003).

    However, some studies have shown that it is a good network model for smooth functions

    (Shin and Ghosh, 1991).

    To modify the structure of the above mentioned pi-sigma networks such that they

    satisfy the Stone-Weierstrass theorem, Shin and Ghosh (1991) suggested considering

    the ridge polynomial neural network (RPNN). For the vectors Tijnijijij www ,...,, 21w

    and Tnxxx ,...,, 21x , let

    n

    k

    kijkij xw1

    ,wx

    which represents an inner product between the two vectors. A one-variable continuous
    function $f$ of the form $f(\langle \mathbf{x}, \mathbf{w}_{ij} \rangle)$ is called a ridge function. A ridge polynomial is a
    ridge function that can be represented as

$$ \sum_{i=0}^{N} \sum_{j=0}^{M_i} a_{ij} \langle \mathbf{x}, \mathbf{w}_{ij} \rangle^{i} $$

    for some $a_{ij} \in \mathbb{R}$ and $\mathbf{w}_{ij} \in \mathbb{R}^{n}$. The operation equation of an RPNN is expressed as

$$ y = \sigma\!\left( \sum_{i=1}^{N} \prod_{j=1}^{i} \bigl( \langle \mathbf{x}, \mathbf{w}_{ij} \rangle + w_{ij0} \bigr) \right) , $$

    where $\sigma(x) = x$ and $w_{ij0}$ denotes the bias of each summing unit. The denseness of this
    network, which is a fundamental property of universal function approximators described
    by the Stone-Weierstrass theorem, can be verified (Gupta et al., 2003).
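    A sketch of the RPNN forward pass described above follows: each block is a pi-sigma unit whose degree equals its index, and the blocks are summed before the (here linear) activation. The list-of-pairs weight layout, the inclusion of bias terms, and all names are our assumptions for illustration.

```python
import numpy as np

def rpnn(x, weights, sigma=lambda s: s):
    """Ridge polynomial network sketch: the i-th block is a pi-sigma unit of
    degree i, i.e. a product of i ridge terms <x, w> + b; the blocks are summed
    and passed through sigma (here sigma(x) = x, as in the text).  'weights' is
    a list of (W_i, b_i) pairs with W_i of shape (i, n)."""
    total = 0.0
    for W_i, b_i in weights:              # degree-i pi-sigma block
        total += np.prod(W_i @ x + b_i)   # product of i ridge functions
    return sigma(total)

# Example: order N = 2, n = 3 inputs -> N(N+1)(n+1)/2 = 12 adjustable weights
rng = np.random.default_rng(2)
x = rng.normal(size=3)
weights = [(rng.normal(size=(i, 3)), rng.normal(size=i)) for i in (1, 2)]
print(rpnn(x, weights))
```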

    The total number of weights involved in this structure is $N(N+1)(n+1)/2$. A
    comparison of the number of weights of the three types of polynomial network
    structures is given in Table VII. The results show that when the networks contain the
    same higher-order terms, an RPNN requires significantly fewer weights than a
    sigma-pi network. This is a very attractive improvement offered by RPNNs.

    Table VII. The number of weights in the polynomial networks.

      Order of network (N)   Pi-sigma          RPNN              Sigma-pi
                             n=5     n=10      n=5     n=10      n=5     n=10
      2                      12      22        18      33        21      66
      3                      18      33        36      66        56      286
      4                      24      44        60      110       126     1001
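    The table entries can be reproduced from the weight-count formulas quoted in the text, as the short check below does. Note that the sigma-pi column is matched by the binomial count C(n+N, N) of all monomials of the n inputs up to order N; that formula is our reading of the table, not one stated in the chapter.

```python
# Reproducing Table VII:
#   pi-sigma: (n+1)N,  RPNN: N(N+1)(n+1)/2,  sigma-pi: C(n+N, N) (assumed reading,
#   e.g. C(5+4, 4) = 126 and C(10+4, 4) = 1001 match the last row of the table).
from math import comb

for N in (2, 3, 4):
    for n in (5, 10):
        pi_sigma = (n + 1) * N
        rpnn = N * (N + 1) * (n + 1) // 2
        sigma_pi = comb(n + N, N)
        print(f"N={N}, n={n}: pi-sigma={pi_sigma}, RPNN={rpnn}, sigma-pi={sigma_pi}")
```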

    5. Engineering Applications

    Function approximation problems are typical examples in many computer science and

    engineering situations. The capability to approximate complex nonlinear functions
    also provides a basis for complex pattern classification. Furthermore, the

    neural network approach with high approximation ability can be used for time series

    analysis by introducing time delay features into the neural network structure. Time

    series analysis or estimation is one of the most important problems in computer science

    and engineering applications. In this section, we will explain the function

    approximation ability of HONNs first. Neural network structures that represent
    time-delay features will then be introduced for time series analysis.

    5.1. Function approximation problem

    For evaluating the function approximation ability of HONNs, an example was taken

    from Klassen et al. (1988). The task consists of learning a representation for an

    unknown, one-variable nonlinear function, $F(x)$, with the only available information
    being 18 sample patterns (Villalobos and Merat, 1995).

    For this function approximation problem, a two-layered neural network structure

    was composed of two SONUs in the first layer and a single SONU in the output layer

    (Figure 9). The nonlinear activation function of the SONUs in the first layer was

    defined by the bipolar sigmoidal function $\varphi(u) = (1 - \exp(-u)) / (1 + \exp(-u))$, but for
    the single output SONU the linear function $y = \varphi(u) = u$ was used instead of the
    sigmoidal function. The gradient learning algorithm with learning rate $\eta = 0.1$ was
    used for this problem.
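    For concreteness, the sketch below shows the bipolar sigmoidal activation and one gradient-descent weight update for a linear-output SONU written in the common quadratic form $u = \mathbf{x}_a^T \mathbf{W} \mathbf{x}_a$ with augmented input $\mathbf{x}_a = [1, x]^T$. This form, the single training pair, and all names are assumptions for illustration, not the authors' two-layer implementation.

```python
import numpy as np

def bipolar_sigmoid(u):
    """First-layer activation used in the text: phi(u) = (1 - e^-u) / (1 + e^-u)."""
    return (1.0 - np.exp(-u)) / (1.0 + np.exp(-u))

# One illustrative gradient step (learning rate eta = 0.1) for a linear-output SONU
# minimizing E = e^2 / 2 on a single training pair (x, d); a sketch only.
eta = 0.1
x, d = 0.3, 0.8                    # one (input, desired output) pair
W = np.zeros((2, 2))               # SONU weight matrix (bias, linear, quadratic terms)

xa = np.array([1.0, x])            # augmented input
y = xa @ W @ xa                    # linear-output SONU: y = phi(u) = u
e = d - y                          # output error
W += eta * e * np.outer(xa, xa)    # gradient step, since du/dW = xa xa^T

print(bipolar_sigmoid(np.array([-2.0, 0.0, 2.0])))  # hidden-layer activation values
print(y, e)
```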

    Figure 9. A two-layered neural network structure with two SONUs in the first layer

    and a single SONU in the output layer for the function approximation problem.

    The mapping function obtained by the SONU network after $10^7$ learning iterations
    appears in Figure 10. In this case, the average square error taken over the 18 patterns
    was 4.566E-6. The extremely high approximation accuracy shown in Figure 10 is
    evidence of the high approximation ability of the SONN.

    Figure 10. Training pairs and outputs estimated by the network with SONUs for
    Klassen's function approximation problem (Klassen et al., 1988).

    Five particular trigonometric functions, $\sin(\pi x)$, $\cos(\pi x)$, $\sin(2\pi x)$, $\cos(2\pi x)$, and
    $\sin(4\pi x)$, were used as special features of the extra neural inputs (Klassen et al., 1988).
    Also, it has been reported (Villalobos and Merat, 1995) that the term $\cos(\pi x)$ is not
    necessary for achieving accuracy within the error tolerance of 1.125E-4, but four
    extra features were still required.

    On the other hand, in this study, the high approximation accuracy of the proposed

    SONU network was achieved by only two SONUs with the sigmoidal activation function

    in the first layer and a single SONU with the linear activation function in the output

    layer, and no special features were required for high accuracy. These are remarkable

    advantages of the proposed SONN structure.

    To highlight the superiority of HONN over the simple first-order neural networks in

    capturing nonlinear correlations among multiple inputs, we show another example of

    function approximation. For simplicity, and to further emphasize the strength of the
    HONN concept, we will demonstrate the example using a single higher-order neural
    unit of various orders $N = 2, 3, 4, 5$.

    We consider a multiple-input static function

$$ f(x, y, z) = \frac{x^{2} y + x y z}{x^{2} + y^{2} + z^{2} + 0.1} \qquad (32) $$

    where $x$, $y$, and $z$ are normally distributed random variables (standard deviation 1) that
    represent the input pattern data, and $f(\cdot)$ represents the target data. The length of the
    training data was 300 samples. For training both the MLP and the HONU, a basic version
    of the Levenberg-Marquardt algorithm was implemented, using a decreasing learning rate
    whenever the training performance, the sum of square errors (SSE), stopped decreasing in
    two consecutive training epochs.
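    Because a pure HONU is linear in its weights (as noted in the concluding remarks), a quick sketch of this experiment can replace the Levenberg-Marquardt training used here with an ordinary least-squares fit of the monomial expansion. The reconstructed form of eqn. (32), the random seed, and all names below are assumptions for illustration only.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(3)

# Training data as described in the text: 300 samples, inputs ~ N(0, 1)
X = rng.normal(size=(300, 3))
x, y, z = X.T
# Target function of eqn (32) as reconstructed here (an assumption)
t = (x**2 * y + x * y * z) / (x**2 + y**2 + z**2 + 0.1)

def monomials(X, N):
    """Column-wise monomial expansion of the augmented input up to order N --
    the long-vector form of a pure HONU, which is linear in its weights."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    cols = [np.prod(Xa[:, list(idx)], axis=1)
            for idx in combinations_with_replacement(range(Xa.shape[1]), N)]
    return np.column_stack(cols)

for N in (2, 3, 4, 5):
    Phi = monomials(X, N)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # linear-in-parameters fit
    sse = np.sum((Phi @ w - t) ** 2)
    print(f"order N={N}: {Phi.shape[1]} weights, training SSE = {sse:.4f}")
```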

    Figure 11. The upper plot shows the training performance of a static MLP neural network
    with 10 sigmoidal neurons in a hidden layer and a linear output neuron; the MLP needs
    many epochs. The bottom plot shows that the training performance of the HONU improves
    with increasing order N; HONUs are trained in very few epochs with the same
    Levenberg-Marquardt algorithm.

    Figure 12. Testing of the trained MLP network and HONU from Figure 11 on different
    data. The upper plot shows the testing of the static MLP network from the upper part of
    Figure 11. The bottom plot shows the testing of the best trained HONU (N=5). The
    mean absolute error (MAE) of the HONU is better than that of the MLP.

    Figure 13. Simulation run from different initial weights than in Figure 11. Again, the
    upper plot shows the training performance of a static MLP; this time, the MLP typically
    gets stuck in a local minimum. The bottom plot shows a very similar training performance
    of a HONU for different initial weights and the same training data. This is because a pure
    HONU (a polynomial neural unit) is linear in its parameters, yet it performs a strongly
    nonlinear mapping.

    Figure 14. Testing of the trained MLP network and HONU from Figure 13 on different
    data. The upper plot shows the testing of the static MLP network from the upper part of
    Figure 13. The bottom plot shows the testing of the best trained HONU (N=5). The
    HONU is more often precise than the MLP; however, its MAE is worse this time because
    three outliers of the HONU output are very imprecise. This may occasionally happen with
    pure HONUs without an output sigmoid function, and it relates to a lack of training data.

    6. Concluding Remarks and Future Research Directions

    In this chapter, the basic foundation of neural networks, starting from an introduction
    to biological foundations, neural unit models, and learning properties, has been
    presented. Then, as the first step toward understanding HONNs, a general SONU
    was developed. Simulation studies for both the pattern classification and function
    approximation problems demonstrated that the learning and generalization abilities of
    the proposed SONU, and of neural networks built from SONUs, are greatly superior to
    those of the widely used linear-combination neural units and their networks. Indeed,
    from the point of view of both the neural computing process and its learning algorithm,
    it has been found that the linear-combination neural units widely used in multilayered
    neural networks are only a subset of the proposed SONUs. Some extensions of these

    concepts to radial basis function (RBF) networks, fuzzy neural networks, and dynamic

    neural units will be interesting future research projects.

    To further strengthen the readers’ interest in HONUs and HONNs, it should be

    mentioned that HONUs are powerful nonlinear approximators that are linear in their

    parameters. That is, if we look at the fundamental HONU representations, such as

    eqn. (21) in this chapter, we clearly see that when input variables are substituted with

    training data, the weight optimization of many fundamental HONN architectures

    yields a linear optimization problem that is uniquely solvable by the

    Levenberg-Marquardt algorithm or even by the least squares method. We believe that

    HONNs represent a significant opportunity for many researchers, since the need for more
    advanced optimization methods is not so urgent for the many HONUs that are basic
    polynomials yet nonlinearly powerful architectures. Therefore, rather than searching for
    complicated optimization techniques, neural network researchers and practitioners may
    spend more effort on proper data selection and signal processing, which play a crucial
    role in the performance of neural networks, including, of course, HONNs.

    There is certainly rapidly growing research interest in the field of HONNs. Applications
    of increasing complexity arise not only in the fields of aerospace, process control, ocean
    exploration, manufacturing, and resource-based industries, but also in

    computer science and engineering. This chapter deals with the theoretical foundations

    of HONNs and will help readers to develop or apply the methods to their own modeling

    and simulation problems. Most of the book deals with real modeling and simulation

    applications.

    We hope that our efforts in this chapter will stimulate research interests, provide

    some new challenges to its readers, generate curiosity for learning more in the field, and

    arouse a desire to seek new theoretical tools and applications. We will consider our

    efforts successful if this chapter raises one’s level of curiosity.

    7. Acknowledgements

    Dr. Madan M. Gupta wishes to acknowledge the support from the Natural Sciences and

    Engineering Research Council of Canada through the Discovery Grant. Dr. Ivo

    Bukovsky’s research is supported by grants SGS12/177/OHK2/3T/12 and

    SGS10/252/OHK2/3T/12. Dr. Zeng-Guang Hou’s research is partially supported by the

    National Natural Science Foundation of China (Grant 61175076).

8. References

    Bukovsky, I., Bila, J., Gupta, M. M., Hou, Z.-G., & Homma, N. (2010a). Foundation
    and Classification of Nonconventional Neural Units and Paradigm of Nonsynaptic
    Neural Interaction. In Y. Wang (Ed.), Discoveries and Breakthroughs in Cognitive
    Informatics and Natural Intelligence, Advances in Cognitive Informatics and Natural
    Intelligence (ACINI) series (pp. 508-523). USA: IGI Publishing.

    Bukovsky, I., Homma, N., Smetana, L., Rodriguez, R., Mironovova, M., & Vrana, S.
    (2010b). Quadratic Neural Unit is a Good Compromise between Linear Models and
    Neural Networks for Industrial Applications. ICCI 2010, The 9th IEEE International
    Conference on Cognitive Informatics, Beijing, China.

    Bukovsky, I., Bila, J., & Gupta, M. M. (2005). Linear Dynamic Neural Units with Time
    Delay for Identification and Control (in Czech). Automatizace, 48(10), 628-635. Prague,
    Czech Republic. ISSN 0005-125X.

    Bukovsky, I., & Simeunovic, G. (2006). Dynamic-Order-Extended Time-Delay Dynamic

    Neural Units. 8th Seminar on Neural Network Applications in Electrical Engineering

    NEUREL-2006, IEEE (SCG) CAS-SP. Belgrade. ISBN 1-4244-0432-0

    Bukovsky, I., Bila, J., & Gupta, M. M. (2006). Stable Neural Architecture of Dynamic

    Neural Units with Adaptive Time Delays. In 7th International FLINS Conference on

    Applied Artificial Intelligence. ISBN 981-256-690-2. pp. 215-222.

    Cichocki, A., & Unbehauen, R. (1993). Neural Networks for Optimization and Signal

    Processing. Chichester: Wiley.

    Cotter, N. (1990). The Stone-Weierstrass Theorem and Its Application to Neural

    Networks. IEEE Trans. Neural Networks, 1(4), 290-295.

    Giles, C. L., & Maxwell, T. (1987). Learning invariance, and generalization in

    higher-order networks. Appl. Optics, 26, 4972-4978.

    Gupta, M. M., Jin, L., & Homma, N. (2003). Static and Dynamic Neural Networks:

    From Fundamentals to Advanced Theory. Hoboken, NJ: IEEE & Wiley.

    Harston, C. T. (1990). The Neurological Basis for Neural Computation. In Maren, A.

    J., Harston, C. T., & Pap, R. M. (Eds.), Handbook of Neural Computing Applications, Vol.

    1. (pp. 29-44). New York: Academic.

    Homma, N., & Gupta, M. M. (2002). A general second order neural unit. Bull. Coll.

    Med. Sci., Tohoku Univ., 11(1), 1-6.

    Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer Feedforward Networks

    Are Universal Approximators. Neural Networks, 2(5), 359-366.


    Klassen, M., Pao, Y., & Chen, V. (1988). Characteristics of the functional link net: a

    higher order delta rule net. Proc. of IEEE 2nd Annual Int'l. Conf. Neural Networks.

    Kobayashi, S. (2006). Sensation World Made by the Brain – Animals Do Not Have

    Sensors. Tokyo: Corona (in Japanese).

    Matsuba, I. (2000). Nonlinear time series analysis. Tokyo: Asakura-syoten (in

    Japanese).

    McCulloch, W. S., & Pitts, W. H. (1943). A logical calculus of the ideas immanent in

    nervous activity. Bull. Math. Biophys., 5, 115-133.

    Narendra, K., & Parthasarathy, K. (1990). Identification and control of dynamical

    systems using neural networks. IEEE Trans. Neural Networks, 1, 4-27.

    Pao, Y. H. (1989). Adaptive Pattern Recognition and Neural Networks, Reading, MA:

    Addison-Wesley.

    Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Internal

    Representations by Error Propagation. In Rumelhart, D. E. and McClelland, J. L.

    (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of

    Cognition, Vol. 1 (pp. 318-362). Cambridge, MA: MIT Press.

    Schmidt, W., & Davis, J. (1993). Pattern recognition properties of various feature

    spaces for higher order neural networks. IEEE Trans. Pattern Analysis and Machine

    Intelligence, 15, 795-801.

    Shin, Y., & Ghosh, J. (1991). The Pi-sigma Network: An Efficient Higher-order

    Neural Network for Pattern Classification and Function Approximation. Proc. Int.

    Joint Conf. on Neural Networks (pp. 13-18).

    Sinha, N., Gupta, M. M., & Zadeh, L. (1999). Soft Computing and Intelligent Control

    Systems: Theory and Applications. New York: Academic.

    Softky, R. W., & Kammen, D. M. (1991). Correlations in high dimensional or

    asymmetrical data sets: Hebbian neuronal processing. Neural Networks, 4, 337-347.

    Taylor, J. G., & Coombes, S. (1993). Learning higher order correlations. Neural

    Networks, 6, 423-428.

    Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional

    Matrix Calculus: Part 1 of 5. Proceedings of the 2010 International Conference on

    Scientific Computing (CSC'10), 353-359. CSREA Press.

    Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional

    Matrix Calculus: Part 2 of 5. Proceedings of the 2010 International Conference on

    Scientific Computing (CSC'10), 360-366. CSREA Press.

    Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional

    Matrix Calculus: Part 3 of 5. Proceedings of the 2010 International Conference on

    Scientific Computing (CSC'10), 367-372. CSREA Press.

    Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional

    Matrix Calculus: Part 4 of 5. Proceedings of the 2010 International Conference on

    Scientific Computing (CSC'10), 373-378. CSREA Press.

    Solo, A. M. G. (2010). Multidimensional Matrix Algebra and Multidimensional

    Matrix Calculus: Part 5 of 5. Proceedings of the 2010 International Conference on

    Scientific Computing (CSC'10), 379-381. CSREA Press.

    Villalobos, L., & Merat, F. (1995). Learning capability assessment and feature space

    optimization for higher-order neural networks. IEEE Trans. Neural Networks, 6,

    267-272.

    Werbos, P. J. (1990). Backpropagation through time: What it is and how to do it.
    Proceedings of the IEEE, 78(10), 1550-1560.

    Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully
    recurrent neural networks. Neural Computation, 1, 270-280.

    Xu, L., Oja, E., & Suen, C. Y. (1992). Modified hebbian learning for curve and surface

    fitting. Neural Networks, 5, 441-457.

