
LETTER Communicated by Shimon Edelman

Clustering Irregular Shapes Using High-Order Neurons

H. Lipson
Mechanical Engineering Department, Technion–Israel Institute of Technology, Haifa 32000, Israel

H. T. Siegelmann
Industrial Engineering and Management Department, Technion–Israel Institute of Technology, Haifa 32000, Israel

This article introduces a method for clustering irregularly shaped data arrangements using high-order neurons. Complex analytical shapes are modeled by replacing the classic synaptic weight of the neuron by high-order tensors in homogeneous coordinates. In the first- and second-order cases, this neuron corresponds to a classic neuron and to an ellipsoidal-metric neuron. We show how high-order shapes can be formulated to follow the maximum-correlation activation principle and permit simple local Hebbian learning. We also demonstrate decomposition of spatial arrangements of data clusters, including very close and partially overlapping clusters, which are difficult to distinguish using classic neurons. Superior results are obtained for the Iris data.

1 Introduction

In classical self-organizing networks, each neuron j is assigned a synaptic weight denoted by the column vector w^(j). The winning neuron j(x) in response to an input x is the one showing the highest correlation with the input, that is, the neuron j for which w^(j)T x is the largest,

$$j(\mathbf{x}) = \arg\max_{j} \left\| \mathbf{w}^{(j)T}\mathbf{x} \right\| \qquad (1.1)$$

where the operator ‖·‖ represents the Euclidean norm of a vector. The use of the term w^(j)T x as the matching criterion corresponds to selection of the neuron exhibiting the maximum correlation with the input. Note that when the synaptic weights w^(j) are normalized to a constant Euclidean length, then the above criterion becomes the minimum Euclidean distance matching criterion,

$$j(\mathbf{x}) = \arg\min_{j} \left\| \mathbf{x} - \mathbf{w}^{(j)} \right\|, \quad j = 1, 2, \ldots, N. \qquad (1.2)$$
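For concreteness, here is a minimal numpy sketch (not part of the original article; names are illustrative) of these two classical matching rules, including the equivalence under weight normalization noted above.

```python
# Minimal sketch (not the authors' code): classical winner selection for a
# layer of N first-order neurons, equations 1.1 and 1.2.
import numpy as np

def winner_max_correlation(W, x):
    """W: N x d array whose rows are the synaptic weights w(j); x: input (d,).
    Returns arg max_j of w(j)^T x (the maximum-correlation rule, equation 1.1)."""
    return int(np.argmax(W @ x))

def winner_min_distance(W, x):
    """Returns arg min_j of ||x - w(j)|| (the minimum-distance rule, equation 1.2)."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

# With the weights normalized to constant Euclidean length, the two rules agree.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
W /= np.linalg.norm(W, axis=1, keepdims=True)
x = rng.normal(size=2)
assert winner_max_correlation(W, x) == winner_min_distance(W, x)
```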

However, the use of a minimum-distance matching criterion incorporates several difficulties. The minimum-distance criterion implies that the features of the input domain are spherical: matching deviations are considered equally in all directions, and distances between features must be larger than the distances between points within a feature. These aspects preclude the ability to detect higher-order, complex, or ill-posed feature configurations and topologies, as these are based on higher geometrical properties such as directionality and curvature. This constitutes a major difficulty, especially when the input is of high dimensionality, where such configurations are difficult to visualize and detect.

Figure 1: (a) Two data clusters with nonisotropic distributions, (b) close clusters, (c) curved clusters, not linearly separable, and (d) overlapping clusters.

Consider, for example, the simple arrangements of two data clusters residing in a two-dimensional domain shown in Figure 1. The input data points are marked as small circles, and two classical neurons (a and b) are located at the centers of each cluster. In Figure 1a, data cluster b exhibits a nonisotropic distribution; therefore the nearest-neighbor or maximum-correlation criterion does not yield the desired classification, because deviations cannot be considered equally in all directions. For example, point p is farther away from neuron b than is point q, yet point p definitely belongs to cluster b while point q does not. Moreover, point p is closer to neuron a but clearly belongs to cluster b. Similarly, two close and elongated clusters will result in the neural locations shown in Figure 1b. More complex clusters, as shown in Figures 1c and 1d, may require still more complex metrics for separation.

There have been several attempts to overcome the above difficulties using second-order metrics: ellipses and second-order curves. Generally, the use of an inverse covariance matrix in the clustering metric enables capturing linear directionality properties of the cluster and has been used in a variety of clustering algorithms. Gustafson and Kessel (1979) use the covariance matrix to capture ellipsoidal properties of clusters. Dave (1989) used fuzzy clustering with a non-Euclidean metric to detect lines in images. This concept was later expanded, and Krishnapuram, Frigui, and Nasraoui (1995) use general second-order shells such as ellipsoidal shells and surfaces. (For an overview and comparison of these methods, see Frigui & Krishnapuram, 1996.) Abe and Thawonmas (1997) discuss a fuzzy classifier (ML) with ellipsoidal units. Incorporation of Mahalanobis (elliptical) metrics in neural networks was addressed by Kavuri and Venkatasubramanian (1993) for fault analysis applications, and by Mao and Jain (1996) as a general competitive network with embedded principal component analysis units. Kohonen (1997) also discusses the use of adaptive tensorial weights to capture significant variances in the components of input signals, thereby introducing a weighted elliptic Euclidean distance in the matching law. We have also used ellipsoidal units in previous work (Lipson, Hod, & Siegelmann, 1998).

At the other end of the spectrum, nonparametric clustering methods such as agglomerative techniques (Blatt, Wiseman, & Domany, 1996) avoid the need to provide a parametric shape for clusters, thereby permitting them to take arbitrary forms. However, agglomerative clustering techniques cannot handle overlapping clusters, as shown in Figure 1d. Overlapping clusters are common in real-life data and represent inherent uncertainty or ambiguity in the natural classification of the data. Areas of overlap are therefore of interest and should be explicitly detected.

This work presents an attempt to generalize the spherical-ellipsoidal scheme to more general metrics, thereby giving rise to a continuum of possible cluster shapes between the classic spherical-ellipsoidal units and the fully nonparametric approach. To visualize a cluster metric, we draw a hypersurface at an arbitrary constant threshold of the metric. The shape of the resulting surface is characteristic of the shape of the local dominance zone of the neuron and will be referred to as the shape of the neuron. Some examples of neuron distance metrics yielding practically arbitrary shapes at various orders of complexity are shown in Figure 2. The shown hypersurfaces are the optimal (tight) boundary of each cluster, respectively.


Figure 2: Some examples of single neuron shapes computed for the given clusters (after one epoch). (a–f) Orders two to seven, respectively.

In general, the shape restriction of classic neurons is relaxed by replacing the weight vector or covariance matrix of a classic neuron with a general high-order tensor, which can capture multilinear correlations among the signals associated with the neurons. This also permits capturing shapes with holes and/or detached areas, as in Figure 3. We also use the concept of homogeneous coordinates to combine correlations of different orders into a single tensor.

Although carrying the same name, our notion of high-order neurons is different from the one used, for example, by Giles, Griffin, and Maxwell (1988), Pollack (1991), Goudreau, Giles, Chakradhar, and Chen (1994), and Balestrino and Verona (1994). There, high-order neurons are those that receive multiplied combinations of inputs rather than individual inputs. Those neurons were also not used for clustering or in the framework of competitive learning. It appears that high-order neurons were generally abandoned, mainly due to the difficulty in training such neurons and their inherent instability, due to the fact that small variations in weight may cause a large variation in the output. Some also questioned the usefulness of high-order statistics for improving predictions (Myung, Levy, & William, 1992). In this article we demonstrate a different notion of high-order neurons, where the neurons are tensors and their order pertains to their internal tensorial order rather than to the degree of the inputs. Our high-order neurons exhibit good stability and excellent training performance with simple Hebbian learning. Furthermore, we believe that capturing high-order information within a single neuron facilitates explicit manipulation and extraction of this information. General high-order neurons have not been used for clustering or competitive learning in this manner.

Figure 3: A shape neuron can have nonsimple topology. (a) A single fifth-order neuron with noncompact shape, including two "holes". (b) A single fourth-order neuron with three detached active zones.

Section 2 derives the tensorial formulation of a single neuron, starting with a simple second-order (elliptical) metric, moving into homogeneous coordinates, and then increasing order for a general shape. Section 3 discusses the learning capacity of such neurons and their functionality within a layer. Section 4 summarizes the resulting algorithms, and section 5 demonstrates an implementation of the proposed neurons on synthetic and real data. The article concludes with some remarks on possible future extensions.

2 A "General Shape" Neuron

We adopt the following notation:

x, w: column vectors
W, D: matrices
x^(j), D^(j): the jth vector/matrix, corresponding to the jth neuron/class
x_i, D_ij: element of a vector/matrix
x_H, D_H: vector/matrix in homogeneous coordinates
m: order of the high-order neuron
d: dimensionality of the input
N: size of the layer (the number of neurons)

The modeling constraints imposed by the maximum-correlation matching criterion stem from the fact that the neuron's synaptic weight w^(j) has the same dimensionality as the input x, that is, the same dimensionality as a single point in the input domain, while in fact the neuron is modeling a cluster of points, which may have higher-order attributes such as directionality and curvature. We shall therefore refer to the classic neuron as a first-order (zero-degree) neuron, due to its correspondence to a point in multidimensional input space.

To circumvent this restriction, we augment the neuron with the capacity to map additional geometric and topological information by increasing its order. For example, as the first order is a point neuron, the second-order case will correspond to orientation and size components, effectively attaching a local oriented coordinate system with nonuniform scaling to the neuron center and using it to define a new distance metric. Thus, each high-order neuron will represent not only the mean value of the data points in the cluster it is associated with, but also the principal directions, curvatures, and other features of the cluster, and the variance of the data points along these directions. Intuitively, we can say that rather than defining a sphere, the high-order distance metric now defines a multidimensional oriented shape with an adaptive shape and topology (see Figures 2 and 3).

In the following pages, we briefly review the ellipsoidal metric and establish the basic concepts of encapsulation and base functions, to be used in higher-order cases where tensorial notation becomes more difficult to follow.

For a second-order neuron, define the matrix D^(j) ∈ R^{d×d} to encapsulate the direction and size properties of a neuron for each dimension of the input, where d is the number of dimensions. Each row i of D^(j) is a unit row vector v_i^T corresponding to a principal direction of the cluster, divided by the standard deviation of the cluster in that direction (the square root of the variance σ_i). The Euclidean metric can now be replaced by the neuron-dependent metric

$$\mathrm{dist}(\mathbf{x}, j) = \left\| D^{(j)} \left( \mathbf{x} - \mathbf{w}^{(j)} \right) \right\|$$

where

$$D^{(j)} = \begin{bmatrix} \frac{1}{\sqrt{\sigma_1}} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sqrt{\sigma_2}} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \frac{1}{\sqrt{\sigma_d}} \end{bmatrix} \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_d^T \end{bmatrix} \qquad (2.1)$$

such that deviations are normalized with respect to the variance of the data in the direction of the deviation. The new matching criterion, based on the new metric, becomes

$$i(\mathbf{x}) = \arg\min_{j} \left\| \mathrm{dist}(\mathbf{x}, j) \right\| = \arg\min_{j} \left\| D^{(j)} \left( \mathbf{x} - \mathbf{w}^{(j)} \right) \right\|, \quad j = 1, 2, \ldots, N. \qquad (2.2)$$

The neuron is now represented by the pair (D^(j), w^(j)), where D^(j) represents rotation and scaling and w^(j) represents translation. The values of D^(j) can be obtained from an eigenstructure analysis of the correlation matrix R^(j) = Σ x^(j) x^(j)T ∈ R^{d×d} of the data associated with neuron j. Two well-known properties are derived from the eigenstructure of principal component analysis: (1) the eigenvectors of a correlation matrix R pertaining to the zero-mean data vector x define the unit vectors v_i representing the principal directions along which the variances attain their extremal values, and (2) the associated eigenvalues define the extremal values of the variances σ_i. Hence,

$$D^{(j)} = \lambda^{(j)-\frac{1}{2}} V^{(j)} \qquad (2.3)$$

where λ^(j) = diag(R^(j)) produces the eigenvalues of R^(j) in a diagonal form and V^(j) = eig(R^(j)) produces the unit eigenvectors of R^(j) in matrix form. Also, by definition of eigenvalue analysis of a general matrix A,

$$A V = \lambda V$$

and hence

$$A = V^{T} \lambda V. \qquad (2.4)$$

Now, since a term ‖a‖² equals a^T a, we can expand the term ‖D^(j)(x − w^(j))‖ of equation 2.2 as follows:

$$\left\| D^{(j)} \left( \mathbf{x} - \mathbf{w}^{(j)} \right) \right\|^{2} = \left( \mathbf{x} - \mathbf{w}^{(j)} \right)^{T} D^{(j)T} D^{(j)} \left( \mathbf{x} - \mathbf{w}^{(j)} \right). \qquad (2.5)$$

Substituting equations 1.2 and 2.3, we see that

$$D^{(j)T} D^{(j)} = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_d \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \frac{1}{\sigma_d} \end{bmatrix} \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_d^T \end{bmatrix} = R^{(j)-1} \qquad (2.6)$$

meaning that the scaling and rotation matrix is in fact the inverse of R^(j). Hence,

$$i(\mathbf{x}) = \arg\min_{j} \left[ \left( \mathbf{x} - \mathbf{w}^{(j)} \right)^{T} R^{-1} \left( \mathbf{x} - \mathbf{w}^{(j)} \right) \right], \quad j = 1, 2, \ldots, N \qquad (2.7)$$

which corresponds to the term used by Gustafson and Kessel (1979) and in the maximum likelihood gaussian classifier (Duda & Hart, 1973).
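As an illustration, here is a small numpy sketch (helper names are assumptions, not the authors' implementation) of the second-order neuron: the matrix D^(j) of equation 2.1 is assembled from the eigenstructure of the cluster covariance, and the identity D^(j)T D^(j) = R^(j)−1 of equation 2.6 can be checked numerically.

```python
# Illustrative sketch of the second-order (ellipsoidal) neuron metric,
# equations 2.1-2.7; not the authors' code.
import numpy as np

def ellipsoidal_metric(points):
    """points: n x d array of data assigned to one neuron.
    Returns (w, D): the neuron center and the scaling-rotation matrix of
    equation 2.1, so that dist(x, j) = ||D (x - w)||."""
    w = points.mean(axis=0)
    X = points - w
    R = X.T @ X / len(points)                 # covariance of the zero-mean data
    sigma, V = np.linalg.eigh(R)              # variances and principal directions
    D = np.diag(1.0 / np.sqrt(sigma)) @ V.T   # rows are v_i^T / sqrt(sigma_i)
    return w, D

def ellipsoidal_winner(neurons, x):
    """neurons: list of (w, D) pairs; winner = arg min ||D(j)(x - w(j))|| (eq. 2.2)."""
    return int(np.argmin([np.linalg.norm(D @ (x - w)) for w, D in neurons]))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
    w, D = ellipsoidal_metric(pts)
    R = (pts - w).T @ (pts - w) / len(pts)
    print(np.allclose(D.T @ D, np.linalg.inv(R)))   # D^T D = R^{-1}, equation 2.6
```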

Equation 2.7 involves two explicit terms, R^(j) and w^(j), representing the orientation and size information, respectively. Implementations involving ellipsoidal metrics therefore need to track these two quantities separately, in a two-stage step. Separate tracking, however, cannot be easily extended to higher orders, which requires simultaneous tracking of additional higher-order qualities such as curvatures. We therefore use an equivalent representation that encapsulates these terms into one and permits a more systematic extension to higher orders.

We combine these two coefficients into one expanded covariance, denoted R_H^(j), in homogeneous coordinates (Faux & Pratt, 1981), as

$$R_H^{(j)} = \sum \mathbf{x}_H^{(j)} \mathbf{x}_H^{(j)T} \qquad (2.8)$$

where, in homogeneous coordinates, x_H^(j) ∈ R^{(d+1)×1} and is defined by

$$\mathbf{x}_H = \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}.$$

In this notation, the expanded covariance also contains coordinate permutations involving 1 as one of the multiplicands, and so different (lower) orders are introduced. Consequently, R_H^(j) ∈ R^{(d+1)×(d+1)}. In analogy to the correlation matrix memory and autoassociative memory (Haykin, 1994), the extended matrix R_H^(j) in its general form can be viewed as a homogeneous autoassociative tensor. Now the new matching criterion becomes simply

$$i(\mathbf{x}) = \arg\min_{j} \left[ \mathbf{x}_H^{T} R_H^{(j)-1} \mathbf{x}_H \right] = \arg\min_{j} \left\| R_H^{(j)-\frac{1}{2}} \mathbf{x}_H \right\| = \arg\min_{j} \left\| R_H^{(j)-1} \mathbf{x}_H \right\|, \quad j = 1, 2, \ldots, N. \qquad (2.9)$$

This representation retains the notion of maximum correlation, and for convenience we now denote R_H^(j)−1 as the synaptic tensor. When we moved into homogeneous coordinates, we gained the following:

- The eigenvectors of the original covariance matrix R^(j) represented directions. The eigenvectors of the homogeneous covariance matrix R_H^(j) correspond to both the principal directions and the offset (both second-order and first-order) properties of the cluster accumulated in R_H^(j). In fact, the elements of each eigenvector correspond to the coefficients of the line equation ax + by + ··· + c = 0. This property permits direct extension to higher-order tensors in the next section, where direct eigenstructure analysis is not well defined.

- The transition to homogeneous coordinates dispensed with the problem created by the fact that the eigenstructure properties cited above pertained only to zero-mean data, whereas R_H^(j) is defined as the covariance matrix of non-zero-mean data. By using homogeneous coordinates, we effectively consider the translational element of the cluster as yet another (additional) dimension. In this higher space, the data can be considered to be zero mean.
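A short sketch (hypothetical helper names) of the homogeneous formulation: the expanded covariance of equation 2.8 is an ordinary (d+1)×(d+1) matrix, and the matching rule of equation 2.9 reduces to one linear solve per neuron.

```python
# Sketch of the expanded covariance in homogeneous coordinates (equation 2.8)
# and the corresponding matching rule (equation 2.9); names are assumptions.
import numpy as np

def homogeneous(x):
    """x_H = [x; 1]."""
    return np.append(x, 1.0)

def expanded_covariance(points):
    """R_H = sum over the cluster of x_H x_H^T, a (d+1) x (d+1) matrix."""
    XH = np.hstack([points, np.ones((len(points), 1))])
    return XH.T @ XH

def homogeneous_winner(R_H_per_neuron, x):
    """i(x) = arg min_j || R_H(j)^{-1} x_H ||  (the last form of equation 2.9)."""
    xH = homogeneous(x)
    return int(np.argmin([np.linalg.norm(np.linalg.solve(RH, xH))
                          for RH in R_H_per_neuron]))
```

In this second-order case, the single matrix already encapsulates both the center w^(j) and the orientation information of R^(j), which is what allows the uniform extension to higher orders below.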

2.1 Higher Orders. The ellipsoidal neuron can capture only linear directionality, not curved directionalities such as those appearing, for instance, in a fork, in a curved, or in a sharp-cornered shape. In order to capture more complex spatial configurations, higher-order geometries must be introduced.

The classic neuron possesses a synaptic weight vector w^(j), which corresponds to a point in the input domain. The synaptic weight w^(j) can be seen as the first-order average of its signals, that is, w^(j) = Σ x_H (the last element x_{H,d+1} = 1 of the homogeneous coordinates has no effect in this case). Ellipsoidal units hold information regarding the linear correlations among the coordinates of data points represented by the neuron by using R_H^(j) = Σ x_H x_H^T. Each element of R_H is thus a proportionality constant relating two specific dimensions of the cluster. The second-order neuron can therefore be regarded as a second-order approximation of the corresponding data distribution. Higher-order relationships can be introduced by considering correlations among more than just two variables. We may consequently introduce a neuron capable of modeling a d-dimensional data cluster to an mth-order approximation.

Higher-order correlations are obtained by multiplying the generating vector x_H by itself in successive outer products more than just once. The process yields a covariance tensor, rather than just a covariance matrix (which is a second-order tensor). The resulting homogeneous covariance tensor is thus represented by a 2(m − 1)-order tensor of rank d + 1, denoted by Z_H^(j) ∈ R^{(d+1)×···×(d+1)}, obtained by 2(m − 1) outer products of x_H by itself:

$$Z_H^{(j)} = \mathbf{x}_H^{2(m-1)}. \qquad (2.10)$$

The exponent consists of the factor (m − 1), which is the degree of the approximation, and a factor of 2, since we are performing autocorrelation, so the function is multiplied by itself. In analogy to the reasoning that leads to equation 2.9, each eigenvector (now an eigentensor of order m − 1) corresponds to the coefficients of a principal curve, and multiplying it by input points produces an approximation of the distance of that input point from the curve. Consequently, the inverse of the tensor Z_H^(j) can then be used to compute the high-order correlation of the signal with the nonlinear shape neuron by simple tensor multiplication:

$$i(\mathbf{x}) = \arg\min_{j} \left\| Z_H^{(j)-1} \otimes \mathbf{x}_H^{m-1} \right\|, \quad j = 1, 2, \ldots, N \qquad (2.11)$$

where ⊗ denotes tensor multiplication. Note that the covariance tensor can be inverted only if its order is an even number, as satisfied by equation 2.10. Note also that the amplitude operator is carried out by computing the root of the sum of the squares of the elements of the argument. The computed metric is now not necessarily spherical and may take various other forms, as plotted in Figure 4.

Figure 4: Metrics of various orders. (a) A regular neuron (spherical metric), (b) a "square" neuron, and (c) a neuron with "two holes".

In practice, however, high-order tensor inversion is not directly required. To make this analysis simpler, we use a Kronecker notation for tensor products (see Graham, 1981). The Kronecker tensor product flattens out the elements of X ⊗ Y into a large matrix formed by taking all possible products between the elements of X and those of Y. For example, if X is a 2×3 matrix, then X ⊗ Y is

$$X \otimes Y = \begin{bmatrix} Y \cdot X_{11} & Y \cdot X_{12} & Y \cdot X_{13} \\ Y \cdot X_{21} & Y \cdot X_{22} & Y \cdot X_{23} \end{bmatrix} \qquad (2.12)$$

where each block is a matrix of the size of Y. In this notation, the internal structure of higher-order tensors is easier to perceive, and their correspondence to linear regression of principal polynomial curves is revealed. Consider, for example, a fourth-order covariance tensor of the vector x = {x, y}.


The fourth-order tensor corresponds to the simplest nonlinear neuron according to equation 2.10 and takes the form of the 2×2×2×2 tensor

$$\left\langle \mathbf{x} \otimes \mathbf{x} \otimes \mathbf{x} \otimes \mathbf{x} \right\rangle = \left\langle R \otimes R \right\rangle = \begin{bmatrix} x^2 & xy \\ xy & y^2 \end{bmatrix} \otimes \begin{bmatrix} x^2 & xy \\ xy & y^2 \end{bmatrix} = \begin{bmatrix} x^4 & x^3 y & x^3 y & x^2 y^2 \\ x^3 y & x^2 y^2 & x^2 y^2 & x y^3 \\ x^3 y & x^2 y^2 & x^2 y^2 & x y^3 \\ x^2 y^2 & x y^3 & x y^3 & y^4 \end{bmatrix} \qquad (2.13)$$

The homogeneous version of this tensor also includes all lower-order permutations of the coordinates of x_H = {x, y, 1}, namely the 3×3×3×3 tensor

$$Z_H^{(4)} = \left\langle \mathbf{x}_H \otimes \mathbf{x}_H \otimes \mathbf{x}_H \otimes \mathbf{x}_H \right\rangle = \left\langle R_H \otimes R_H \right\rangle = \begin{bmatrix} x^2 & xy & x \\ xy & y^2 & y \\ x & y & 1 \end{bmatrix} \otimes \begin{bmatrix} x^2 & xy & x \\ xy & y^2 & y \\ x & y & 1 \end{bmatrix}$$

$$= \begin{bmatrix}
x^4 & x^3 y & x^3 & x^3 y & x^2 y^2 & x^2 y & x^3 & x^2 y & x^2 \\
x^3 y & x^2 y^2 & x^2 y & x^2 y^2 & x y^3 & x y^2 & x^2 y & x y^2 & x y \\
x^3 & x^2 y & x^2 & x^2 y & x y^2 & x y & x^2 & x y & x \\
x^3 y & x^2 y^2 & x^2 y & x^2 y^2 & x y^3 & x y^2 & x^2 y & x y^2 & x y \\
x^2 y^2 & x y^3 & x y^2 & x y^3 & y^4 & y^3 & x y^2 & y^3 & y^2 \\
x^2 y & x y^2 & x y & x y^2 & y^3 & y^2 & x y & y^2 & y \\
x^3 & x^2 y & x^2 & x^2 y & x y^2 & x y & x^2 & x y & x \\
x^2 y & x y^2 & x y & x y^2 & y^3 & y^2 & x y & y^2 & y \\
x^2 & x y & x & x y & y^2 & y & x & y & 1
\end{bmatrix} \qquad (2.14)$$

Remark. It is immediately apparent that the matrix in equation 2.14 corresponds to the matrix to be solved for finding a least-squares fit of a conic section equation ax² + by² + cxy + dx + ey + f = 0 to the data points. Moreover, the set of eigenvectors of this matrix corresponds to the coefficients of the set of mutually orthogonal best-fit conic section curves that are the principal curves of the data. This notion adheres with Gnanadesikan's (1977) method for finding principal curves. Substitution of a data point into the equation of a principal curve yields an approximation of the distance of the point from that curve, and the sum of squared distances amounts to equation 2.11. Note that each time we increase order, we are seeking a set of principal curves of one degree higher. This implies that the least-squares matrix needs to be two degrees higher (because it is minimizing the squared error), thus yielding the coefficient 2 in the exponent of equation 2.10.
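In this flattened Kronecker representation, the construction of Z_H and the matching rule of equation 2.11 reduce to ordinary matrix operations. The sketch below (assumed names; the pseudo-inverse is a numerical-robustness choice the article does not prescribe) builds the flattened homogeneous tensor of a neuron of order m and evaluates the high-order distance.

```python
# Sketch: flattened homogeneous covariance tensor (equations 2.10 and 2.13-2.14)
# and the high-order matching metric (equation 2.11), via Kronecker powers.
import numpy as np
from functools import reduce

def kron_power(v, k):
    """k-fold Kronecker power of a vector (k >= 1)."""
    return reduce(np.kron, [v] * k)

def high_order_tensor(points, m):
    """Z_H = sum of x_H^{2(m-1)} over the cluster, stored as a
    (d+1)^(m-1) x (d+1)^(m-1) matrix of all coordinate products."""
    Z = 0.0
    for x in points:
        v = kron_power(np.append(x, 1.0), m - 1)
        Z = Z + np.outer(v, v)
    return Z

def high_order_distance(Z, x, m):
    """|| Z^{-1} tensor-multiplied by x_H^{m-1} ||, as in equation 2.11."""
    v = kron_power(np.append(x, 1.0), m - 1)
    return float(np.linalg.norm(np.linalg.pinv(Z) @ v))
```

For d = 2 and m = 3, the stored matrix is exactly the 9×9 array of monomials of equation 2.14, summed over the cluster; for m = 2 the construction collapses to the homogeneous covariance of equation 2.8.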


3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classic neurons, we show that they are also subject to simple Hebbian learning. Following an interpretation of Hebb's postulate of learning, synaptic modification (i.e., learning) occurs when there is a correlation between presynaptic and postsynaptic activities. We have already shown in equation 2.9 that the presynaptic activity x_H and postsynaptic activity of neuron j coincide when the synapse is strong, that is, R_H^(j)−1 x_H is minimum. We now proceed to show that, in accordance with Hebb's postulate of learning, it is sufficient to incur self-organization of the neurons by increasing synapse strength when there is a coincidence of presynaptic and postsynaptic signals.

We need to show how self-organization is obtained merely by increasing R_H^(j), where j is the winning neuron. As each new data point arrives at a specific neuron j in the net, the synaptic weight of that neuron is adapted by the increment corresponding to Hebbian learning,

$$Z_H^{(j)}(k+1) = Z_H^{(j)}(k) + \gamma(k)\, \mathbf{x}_H^{(j)\,2(m-1)} \qquad (3.1)$$

where k is the iteration counter and γ(k) is the iteration-dependent learning-rate coefficient. It should be noted that in equation 2.10, Z_H^(j) becomes a weighted sum of the input signals x_H^{2(m−1)} (unlike R_H in equation 2.8, which is a uniform sum). The eigenstructure analysis of a weighted sum still provides the principal components under the assumption that the weights are uniformly distributed over the cluster signals. This assumption holds true if the process generating the signals of the cluster is stable over time, a basic assumption of all neural network algorithms (Bishop, 1997). In practice this assumption is easily acceptable, as will be demonstrated in the following sections.
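In the same flattened representation, the update of equation 3.1 is a rank-one addition to the winner's tensor. The decay schedule for γ(k) below is an assumption; the experiments reported later use a learning factor of 0.3 decreasing asymptotically toward zero.

```python
# Sketch of the Hebbian update of equation 3.1 for the winning neuron only;
# the particular decay of gamma(k) is an assumption, not taken from the paper.
import numpy as np
from functools import reduce

def hebbian_update(Z_winner, x, m, k, gamma0=0.3):
    """Z(k+1) = Z(k) + gamma(k) * x_H^{2(m-1)}, in flattened (matrix) form."""
    gamma = gamma0 / (1.0 + 0.05 * k)          # assumed asymptotically decreasing rate
    v = reduce(np.kron, [np.append(x, 1.0)] * (m - 1))
    return Z_winner + gamma * np.outer(v, v)
```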

In principle, continuous updating of the covariance tensor may create instability in a competitive environment, as a winner neuron becomes increasingly dominant. To force competition, the covariance tensor can be normalized using any of a number of factors, dependent on the application, such as the number of signals assigned to the neuron so far, or the distribution of data among the neurons (forcing uniformity). However, a more natural factor for normalization is the size and complexity of the neuron.

A second-order neuron is geometrically equivalent to a d-dimensional ellipsoid. A neuron may therefore be normalized by a factor V proportional to the ellipsoid volume,

$$V \propto \sqrt{\sigma_1} \cdot \sqrt{\sigma_2} \cdots \sqrt{\sigma_d} = \prod_{i=1}^{d} \sqrt{\sigma_i} \propto \sqrt{\det \left| R^{(j)} \right|} \qquad (3.2)$$

During the competition phase, the factors R^(j)x/V are compared, rather than just R^(j)x, thus promoting smaller neurons and forcing neurons to become equally "fat". The final matching criteria for a homogeneous representation are therefore given by

$$j(\mathbf{x}) = \arg\min_{j} \left\| \hat{R}_H^{(j)-1} \mathbf{x}_H \right\|, \quad j = 1, 2, \ldots, N \qquad (3.3)$$

where

$$\hat{R}_H^{(j)} = \frac{R_H^{(j)}}{\sqrt{\det \left| R_H^{(j)} \right|}}. \qquad (3.4)$$

Covariance tensors of higher-order neurons can also be normalized by their determinant. However, unlike second-order neurons, higher-order shapes are not necessarily convex, and hence their volume is not necessarily proportional to their determinant. The determinant of higher-order tensors therefore relates to more complex characteristics, which include the deviation of the data points from the principal curves and the curvature of the boundary. Nevertheless, our experiments showed that this is a simple and effective means of normalization that promotes small and simple shapes over complex or large concave fittings to data. The relationship between geometric shape and the tensor determinant should be investigated further for more elaborate use of the determinant as a normalization criterion.
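In matrix form, the normalization of equations 3.3 and 3.4 amounts to dividing each neuron's flattened covariance by the square root of its determinant before the comparison; a brief sketch, with names that are assumptions, follows.

```python
# Sketch of determinant normalization (equations 3.3-3.4): the matrix stored for
# the competition is the inverse of R_H / sqrt(det R_H).
import numpy as np

def normalized_inverse(R_H):
    """Return R_hat^{-1}, where R_hat = R_H / sqrt(det R_H)  (equation 3.4)."""
    return np.linalg.inv(R_H / np.sqrt(np.linalg.det(R_H)))

def competition_winner(normalized_inverses, xH):
    """j(x) = arg min_j || R_hat(j)^{-1} x_H ||  (equation 3.3)."""
    return int(np.argmin([np.linalg.norm(M @ xH) for M in normalized_inverses]))
```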

Finally, in order to determine the likelihood of a point's being associated with a particular neuron, we need to assume a distribution function. Assuming gaussian distribution, the likelihood p^(j)(x_H) of data point x_H being associated with cluster j is proportional to the distance from the neuron and is given by

$$p^{(j)}(\mathbf{x}_H) = \frac{1}{(2\pi)^{d/2}} \, e^{-\frac{1}{2} \left\| \hat{Z}^{(j)} \mathbf{x}_H \right\|^{2}}. \qquad (3.5)$$

4 Algorithm Summary

To conclude the previous sections, we provide a summary of the high-order competitive learning algorithm, for both unsupervised and supervised learning. These are the algorithms that were used in the test cases described later in section 5; an illustrative code sketch follows each summary below.

Unsupervised Competitive Learning

1. Select the layer size (number of neurons N) and neuron order (m) for a given problem.

2. Initialize all neurons with Z_H = Σ x_H^{2(m−1)}, where the sum is taken over all input data or a representative portion, or use a unit tensor or random numbers if no data are available. To compute this value, write down x_H^{m−1} as a vector with all mth-degree permutations of {x_1, x_2, ..., x_d, 1}, and Z_H as a matrix summing the outer products of these vectors. Store Z_H and Z_H^{−1}f for each neuron, where f is a normalization factor (e.g., equation 3.4).

3. Compute the winner for input x: First compute the vector x_H^{m−1} from x as in section 2, and then multiply it by the stored matrix Z_H^{−1}f. The winner j is the one for which the modulus of the product vector is smallest.

4. Output the winner j.

5. Update the winner by adding x_H^{2(m−1)} to Z_H^(j), weighted by the learning coefficient γ(k), where k is the iteration counter. Store Z_H and Z_H^{−1}f for the updated neuron.

6. Go to step 3.
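The following compact sketch puts steps 1 through 6 together. It is illustrative only: initializing each neuron from a random representative portion of the data, using a pseudo-inverse in place of plain inversion, and the simple decay of γ(k) are assumptions about details the summary leaves open.

```python
# Sketch of the unsupervised competitive learning summarized in steps 1-6 above.
# data: n x d numpy array; N: number of neurons; m: neuron order.
import numpy as np
from functools import reduce

def phi(x, m):
    """x_H^{m-1}: the (m-1)-fold Kronecker power of the homogeneous input."""
    return reduce(np.kron, [np.append(x, 1.0)] * (m - 1))

def normalized_inverse(Z):
    """Stored matching matrix: inverse of Z normalized by sqrt(det Z) (equation 3.4)."""
    det = np.linalg.det(Z)
    f = np.sqrt(det) if det > 1e-12 else 1.0
    return np.linalg.pinv(Z / f)

def train_unsupervised(data, N, m, epochs=1, gamma0=0.3, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: initialize each neuron from a random representative portion of the data.
    Z = []
    for _ in range(N):
        idx = rng.choice(len(data), size=max(len(data) // N, 1), replace=False)
        Z.append(sum(np.outer(phi(x, m), phi(x, m)) for x in data[idx]))
    M = [normalized_inverse(Zj) for Zj in Z]          # stored Z_H^{-1} f
    k = 0
    for _ in range(epochs):
        for x in data:
            v = phi(x, m)
            j = int(np.argmin([np.linalg.norm(Mj @ v) for Mj in M]))  # steps 3-4
            Z[j] = Z[j] + (gamma0 / (1.0 + k)) * np.outer(v, v)        # step 5
            M[j] = normalized_inverse(Z[j])
            k += 1
    return Z, M                                        # step 6: repeat over the data
```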

Supervised Learning

1. Set the layer size (number of neurons) to the number of classes, and select the neuron order (m) for a given problem.

2. Train: For each neuron, compute Z_H^(j) = Σ x_H^{2(m−1)}, where the sum is taken over all training points to be associated with that neuron. To compute this value, write down x_H^{m−1} as a vector with all mth-degree permutations of {x_1, x_2, ..., x_d, 1}, and Z_H as a matrix summing the outer products of these vectors. Store Z_H^{−1}f for each neuron.

3. Test: Use test points to evaluate the quality of training.

4. Use: For each new input x, first compute the vector x_H^{m−1} from x as in section 2, and then multiply it by the stored matrix Z_H^{−1}f. The winner j is the one for which the modulus of the product vector is smallest.
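A matching sketch of the supervised variant (steps 1 through 4 above); phi and normalized_inverse are the same illustrative helpers as in the unsupervised sketch, repeated so that the block stands alone. Integer class labels are an assumption about the encoding.

```python
# Sketch of the supervised procedure: one neuron per class, trained in a single
# pass on the training points of its class (step 2), then used as a classifier.
import numpy as np
from functools import reduce

def phi(x, m):
    return reduce(np.kron, [np.append(x, 1.0)] * (m - 1))

def normalized_inverse(Z):
    det = np.linalg.det(Z)
    return np.linalg.pinv(Z / (np.sqrt(det) if det > 1e-12 else 1.0))

def train_supervised(data, labels, m):
    """data: n x d array; labels: integer class indices. Returns one stored
    matching matrix Z_H^{-1} f per class, in label order."""
    return [normalized_inverse(sum(np.outer(phi(x, m), phi(x, m))
                                   for x in data[labels == c]))
            for c in np.unique(labels)]

def classify(matrices, x, m):
    """Step 4: the winner is the neuron whose stored matrix yields the smallest
    modulus of the product vector."""
    v = phi(x, m)
    return int(np.argmin([np.linalg.norm(M @ v) for M in matrices]))
```

For the Iris experiments described in the next section, data would be the 150 × 4 measurement array, with each neuron trained on a random 80% of its class and the remaining 20% held out for cross-validation.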

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network, by replacing lower-dimensionality neurons, and will enhance the modeling capability of each neuron. Depending on the application of the net, this may lead to improvement or degradation of the overall performance of the layer, due to the added degrees of freedom. This section provides some examples of an implementation at different orders, in both supervised and unsupervised learning.

First we discuss two examples of second-order (ellipsoidal) neurons, to show that our formulation is compatible with previous work on ellipsoidal neural units (e.g., Mao & Jain, 1996). Figure 5 illustrates how a competitive network consisting of three second-order neurons gradually evolves to model a two-dimensional domain exhibiting three main activation areas with significant overlap. The second-order neurons are visualized as ellipses, with their boundaries drawn at 2σ of the distribution.

Figure 5: (a–c) Stages in self-classification of overlapping clusters after learning 200, 400, and 650 data points, respectively (one epoch).

The next test uses a two-dimensional distribution of three arbitrary clusters consisting of a total of 75 data points, shown in Figure 6a. The network of three second-order neurons self-organized to map this data set and created the decision boundaries shown in Figure 6b. The dark curves are the decision boundaries, and the light lines show the local distance metric within each cell. Note how the decision boundary in the overlap area accounts for the different distribution variances.

Figure 6: (a) Original data set. (b) Self-organized map of the data set. Dark curves are the decision boundaries, and light lines show the local distance metric within each cell. Note how the decision boundary in the overlap area accounts for the different distribution variances.


Table 1: Comparison of Self-Organization Results for Iris Data

  Method                                                                             Epochs   Number misclassified or unclassified
  Superparamagnetic (Blatt et al., 1996)                                                —        25
  Learning vector quantization / generalized learning vector quantization
    (Pal, Bezdek, & Tsao, 1993)                                                       200        17
  K-means (Mao & Jain, 1996)                                                            —        16
  Hyperellipsoidal clustering (Mao & Jain, 1996)                                        —         5
  Second-order unsupervised                                                            20         4
  Third-order unsupervised                                                             30         3

The performance of the different orders of shape neuron has also been assessed on the Iris data (Anderson, 1939). Anderson's time-honored data set has become a popular benchmark problem for clustering algorithms. The data set consists of four quantities measured on each of 150 flowers chosen from three species of iris: Iris setosa, Iris versicolor, and Iris virginica. The data set constitutes 150 points in four-dimensional space. Some results of other methods are compared in Table 1. It should be noted that the results presented in the table are "best results"; third-order neurons are not always stable, because the layer sometimes converges into different classifications.¹

The neurons have also been tried in a supervised setup. In supervised mode, we use the same training setup of equations 2.8 and 2.10, but train each neuron individually on the signals associated with it, using

$$Z_H^{(j)} = \sum_{\mathbf{x}_H \in \Omega^{(j)}} \mathbf{x}_H^{2(m-1)}$$

where, in homogeneous coordinates, x_H = [x; 1] and Ω^(j) is the jth class.

In order to evaluate the performance of this technique, we trained each neuron on a random selection of 80% of the data points of its class and then tested the classification of the whole data set (including the remaining 20% for cross-validation). The test determined the most active neuron for each input and compared it with the manual classification. This test was repeated 250 times, with the average and best results shown in Table 2, along with results reported for other methods. It appears that, on average, supervised learning for this task reaches an optimum with fifth-order neurons.

¹ Performance of third-order neurons was evaluated with a layer consisting of three neurons, randomly initialized with normal distribution within the input domain, with mean and variance of the input, and with a learning factor of 0.3 decreasing asymptotically toward zero. Of 100 experiments carried out, 95 locked correctly on the clusters, with an average 7.6 (5%) misclassifications after 30 epochs. Statistics on performance of the cited works have not been reported. Download MATLAB demos from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

Table 2: Supervised Classification Results for Iris Data

  Method                                              Epochs   Misclassified (average)   Misclassified (best)
  3-hidden-layer neural network (Abe et al., 1997)      1000          2.2                        1
  Fuzzy hyperbox (Abe et al., 1997)                        —            —                        2
  Fuzzy ellipsoids (Abe et al., 1997)                   1000            —                        1
  Second-order supervised                                  1          3.08                       1
  Third-order supervised                                   1          2.27                       1
  Fourth-order supervised                                  1          1.60                       0
  Fifth-order supervised                                   1          1.07                       0
  Sixth-order supervised                                   1          1.20                       0
  Seventh-order supervised                                 1          1.30                       0

  Notes: Results after one training epoch with 20% cross-validation. Averaged results are for 250 experiments.

Figure 7 shows classification results using third-order neurons for an arbitrary distribution containing close and overlapping clusters.

Figure 7: Self-classification of synthetic point clusters using third-order neurons.

Finally, Figure 8 shows an application of third- through fifth-order neurons to detection and modeling of chromosomes in a human cell. The corresponding separation maps and convergence error rates are shown in Figures 9 and 10, respectively.


Figure 8: Chromosome separation. (a) Designated area. (b) Four third-order neurons self-organized to mark the chromosomes in the designated area. Data from Schrock et al. (1996).

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated their practical use for modeling the structure of spatial distributions and for geometric feature recognition. Although high-order neurons do not directly correspond to neurobiological details, we believe that they can provide powerful data modeling capabilities. In particular, they exhibit useful properties for correctly handling close and partially overlapping clusters. Furthermore, we have shown that ellipsoidal and tensorial clustering methods, as well as classic competitive neurons, are special cases of the general-shape neuron. We showed how lower-order information can be encapsulated using tensor representation in homogeneous coordinates, enabling systematic continuation to higher-order metrics. This formulation also allows for direct extraction of the encapsulated high-order information, in the form of principal curves represented by the eigentensors of each neuron.

The use of shapes as neurons raises some further practical questions, which have not been addressed in this article and require further research; they are listed below. Although most of these issues pertain to neural networks in general, the approach required here may be different.

- Number of neurons. In the examples provided, we have explicitly used the number of neurons appropriate to the number of clusters. The question of network size is applicable to most neural architectures. However, our experiments showed that overspecification tends to cause clusters to split into subsections, while underspecification may cause clusters to unite.

Figure 9: Self-organization for chromosome separation. (a–d) Second- through fifth-order, respectively.

- Initial weights. It is well known that sensitivity to initial weights is a general problem of neural networks (see, for example, Kolen & Pollack, 1990, and Pal et al., 1993). However, when third-order neurons were evaluated on the Iris data with random initial weights, they performed with an average 5% misclassifications. Thus, the capacity of neurons to be spread over volumes may provide new possibilities for addressing the initialization problem.

- Computational cost. The need to invert a high-order tensor introduces a significant computational cost, especially when the input dimensionality is large. There are some factors that counterbalance this computational cost. First, the covariance tensor needs to be inverted only when a neuron is updated, not when it is activated or evaluated. When the neuron is updated, the inverted tensor is computed and then stored, and used whenever the neuron needs to compete or evaluate an input signal. This means that the whole layer will need only one inversion for each training point and no inversions for nontraining operation. Second, because of the complexity of each neuron, sometimes fewer of them are required. Finally, relatively complex shapes can be attained with low orders (say, third order).

Figure 10: Average error rate in self-organization for chromosome separation of Figure 9, respectively.

- Selection of neuron order. As in many other networks, there is much domain knowledge about the problem that comes into the solution through the choice of parameters, such as the order of the neurons. However, we also estimate that, due to the rather analytic representation of each neuron, and since high-order information is contained within each neuron independently, this information can be directly extracted (using the eigentensors). Thus it is possible to decompose analytically a cross-shaped fourth-order cluster into two overlapping elliptic clusters, and vice versa.

- Stability versus flexibility and enforcement of particular shapes. Further research is required to find methods for biasing neurons toward particular shapes, in an attempt to seek particular shapes and help neuron stability. For example, how would one encourage detection of triangles and not rectangles?

- Nonpolynomial base functions. We intend to investigate the properties of shape layers based on nonpolynomial base functions. Noting that the high-order homogeneous tensors give rise to various degrees of polynomials, it is possible to derive such tensors with other base functions, such as wavelets, trigonometric functions, and other well-established kernel functions. Such an approach may enable modeling arbitrary geometrical shapes.

Acknowledgments

We thank Eitan Domani for his helpful comments and insight. This work was supported in part by the US-Israel Binational Science Foundation (BSF), the Israeli Ministry of Arts and Sciences, and the Fund for Promotion of Research at the Technion. H. L. acknowledges the generous support of the Charles Clore Foundation and the Fischbach Postdoctoral Fellowship. We thank Applied Spectral Imaging for supplying the data for the experiments described in Figures 8 through 10.

Download further information, MATLAB demos, and implementation details of this work from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

References

Abe, S., & Thawonmas, R. (1997). A fuzzy classifier with ellipsoidal regions. IEEE Transactions on Fuzzy Systems, 5, 358–368.

Anderson, E. (1939). The irises of the Gaspé Peninsula. Bulletin of the American Iris Society, 59, 2–5.

Balestrino, A., & Verona, B. F. (1994). New adaptive polynomial neural network. Mathematics and Computers in Simulation, 37, 189–194.

Bishop, C. M. (1997). Neural networks for pattern recognition. Oxford: Clarendon Press.

Blatt, M., Wiseman, S., & Domany, E. (1996). Superparamagnetic clustering of data. Physical Review Letters, 76(18), 3251–3254.

Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600–611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193–199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511–513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761–766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers Chem. Eng., 17, 765–784.

Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269–280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation, Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29–60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel–Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling. Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16–29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks IJCNN'91, Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549–557.

Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494–497.

Received August 24, 1998; accepted September 3, 1999.

Page 2: Clustering Irregular Shapes Using High-Order Neurons

2332 H Lipson and H T Siegelmann

Figure 1 (a) Two data clusters with nonisotropic distributions (b) close clusters(c) curved clusters not linearly separable and (d) overlapping clusters

equally in all directions and distances between features must be larger thanthe distances between points within a feature These aspects preclude theability to detect higher-order complex or ill-posed feature congurationsand topologies as these are based on higher geometrical properties such asdirectionality and curvature This constitutes a major difculty especiallywhen the input is of high dimensionality where such congurations aredifcult to visualize and detect

Consider for example the simple arrangements of two data clusters re-siding in a two-dimensional domain shown in Figure 1 The input datapoints are marked as small circles and two classical neurons (a and b) arelocated at the centers of each cluster In Figure 1a data cluster b exhibitsa nonisotropic distribution therefore the nearest neighbor or maximumcorrelation criterion does not yield the desired classication because devia-

Clustering Irregular Shapes 2333

tions cannot be considered equally in all directions For example point p isfarther away from neuron b than is point q yet point p denitely belongs tocluster b while point q does not Moreover point p is closer to neuron a butclearly belongs to cluster b Similarly two close and elongated clusters willresult in the neural locations shown in Figure 1b More complex clusters asshown in Figure 1c and Figure 1d may require still more complex metricsfor separation

There have been several attempts to overcome the above difculties us-ing second-order metrics ellipses and second-order curves Generally theuse of an inverse covariance matrix in the clustering metric enables cap-turing linear directionality properties of the cluster and has been used ina variety of clustering algorithms Gustafson and Kessel (1979) use the co-variance matrix to capture ellipsoidal properties of clusters Dave (1989)used fuzzy clustering with a non-Euclidean metric to detect lines in imagesThis concept was later expanded and Krishnapuram Frigui and Nasraoui(1995) use general second-order shells such as ellipsoidal shells and sur-faces (For an overview and comparison of these methods see Frigui ampKrishnapuram 1996) Abe and Thawonmas (1997) discuss a fuzzy classi-er (ML) with ellipsoidal units Incorporation of Mahalanobis (elliptical)metrics in neural networks was addressed by Kavuri and Venkatasubrama-nian (1993) for fault analysis applications and by Mao and Jain (1996) as ageneral competitive network with embedded principal component analysisunits Kohonen (1997) also discusses the use of adaptive tensorial weightsto capture signicant variances in the components of input signals therebyintroducing a weighted elliptic Euclidean distance in the matching law Wehave also used ellipsoidal units in previous work (Lipson Hod amp Siegel-mann 1998)

At the other end of the spectrum nonparametric clustering methods suchas agglomerative techniques (Blatt Wiseman amp Domany 1996) avoid theneed to provide a parametric shape for clusters thereby permitting them totake arbitrary forms However agglomerative clustering techniques cannothandle overlappingclusters as shown in Figure 1d Overlapping clusters arecommon in real-life data and represent inherent uncertainty or ambiguityin the natural classication of the data Areas of overlap are therefore ofinterest and should be explicitly detected

This work presents an attempt to generalize the spherical-ellipsoidalscheme to more general metrics thereby giving rise to a continuum of pos-sible cluster shapes between the classic spherical-ellipsoidal units and thefully nonparametric approach To visualize a cluster metric we draw a hy-persurface at an arbitrary constant threshold of the metric The shape of theresulting surface is characteristic of the shape of the local dominance zone ofthe neuron and will be referred to as the shape of the neuron Some examplesof neuron distance metrics yielding practically arbitrary shapes at variousorders of complexity are shown in Figure 2 The shown hypersurfaces arethe optimal (tight) boundary of each cluster respectively

2334 H Lipson and H T Siegelmann

Figure 2 Some examples of single neuron shapes computed for the given clus-ters (after one epoch) (andashf) Orders two to seven respectively

In general the shape restriction of classic neurons is relaxed by replacingthe weight vector or covariance matrix of a classic neuron with a generalhigh-order tensor which can capture multilinear correlations among thesignals associated with the neurons This also permits capturing shapeswith holes andor detached areas as in Figure 3 We also use the concept ofhomogeneous coordinates to combine correlations of different orders intoa single tensor

Although carrying the same name our notion of high-order neurons isdifferent from the one used for example by Giles Griffen and Maxwell(1988) Pollack (1991) Goudreau Giles Chakradhar amp Chen (1994) andBalestrino and Verona (1994) There high-order neurons are those thatreceive multiplied combinations of inputs rather than individual inputsThose neurons were also not used for clustering or in the framework ofcompetitive learning It appears that high-order neurons were generallyabandoned mainly due to the difculty in training such neurons and theirinherent instability due to the fact that small variations in weight may causea large variation in the output Some also questioned the usefulness of high-order statistics for improvingpredictions (Myung Levy amp William1992) Inthis article we demonstrate a different notion of high-order neurons where

Clustering Irregular Shapes 2335

Figure 3 A shape neuron can have nonsimple topology (a) A single fth-orderneuron with noncompact shape including two ldquoholesrdquo (b)A single fourth-orderneuron with three detached active zones

the neurons are tensors and their order pertains to their internal tenso-rial order rather than to the degree of the inputs Our high-order neuronsexhibit good stability and excellent training performance with simple Heb-bian learning Furthermore we believe that capturing high-order informa-tion within a single neuron facilitates explicit manipulation and extractionof this information General high-order neurons have not been used forclustering or competitive learning in this manner

Section 2 derives the tensorial formulation of a single neuron startingwith a simple second-order (elliptical) metric moving into homogeneouscoordinates and then increasing order for a general shape Section 3 dis-cusses the learning capacity of such neurons and their functionality withina layer Finally section 4 demonstrates an implementation of the proposedneurons on synthetic and real data The article concludes with some remarkson possible future extensions

2 A ldquoGeneral Shaperdquo Neuron

We adopt the following notation

x w column vectorsW D matricesx(j) D(j) the jth vector matrix corresponding to the jth neuron classxi Dij element of a vectormatrixxH DH vector matrix in homogeneous coordinates

2336 H Lipson and H T Siegelmann

m order of the high-order neurond dimensionality of the inputN size of the layer (the number of neurons)

The modeling constraints imposed by the maximum correlation match-ing criterion stem from the fact that the neuronrsquos synaptic weight w(j) hasthe same dimensionality as the input x that is the same dimensionality asa single point in the input domain while in fact the neuron is modelinga cluster of points which may have higher-order attributes such as direc-tionality and curvature We shall therefore refer to the classic neuron as arst-order (zero degree) neuron due to its correspondence to a point inmultidimensional input space

To circumvent this restriction we augment the neuron with the capacityto map additional geometric and topological information by increasing itsorder For example as the rst-order is a pointneuron the second-order casewill correspond to orientation and size components effectively attaching alocal oriented coordinate system with nonuniform scaling to the neuroncenter and using it to dene a new distance metric Thus each high-orderneuron will represent not only the mean value of the data points in thecluster it is associated with but also the principal directions curvaturesand other features of the cluster and the variance of the data points alongthese directions Intuitively we can say that rather than dening a spherethe high-order distance metric now denes a multidimensional-orientedshape with an adaptive shape and topology (see Figures 2 and 3)

In the following pages we briey review the ellipsoidal metric and es-tablish the basic concepts of encapsulation and base functions to be usedin higher-order cases where tensorial notation becomes more difcult tofollow

For a second-order neuron dene the matrix D(j) 2 Rdpoundd to encapsulatethe direction and size properties of a neuron for each dimension of theinput where d is the number of dimensions Each row i of D(j) is a unit rowvector vT

i corresponding to a principal direction of the cluster divided bythe standard deviation of the cluster in that direction (the square root ofthe variance si) The Euclidean metric can now be replaced by the neuron-dependent metric

dist(x j) DregregD(j) iexcl

x iexcl w (j)centregreg

where

D (j) D

266666664

1ps1

0 cent cent cent 0

0 1ps2

00 cent cent cent 0 1p

sd

377777775

2666664

vT1

vT2

vTd

3777775

(21)

Clustering Irregular Shapes 2337

such that deviations are normalized with respect to the variance of the datain the direction of the deviation The new matching criterion based on thenew metric becomes

i(x) D argj minregregregdist(x)(j)

regregreg

D argj minregregregD (j)

plusmnx iexcl w (j)

sup2regregreg j D 1 2 N (22)

The neuron is now represented by the pair (D(j) w(j)) where D(j) repre-sents rotation and scaling and w(j) represents translation The values of D(j)

can be obtained from an eigenstructure analysis of the correlation matrixR(j) D Sx(j)x(j)T 2 Rdpoundd of the data associated with neuron j Two well-known properties are derived from the eigenstructure of principal compo-nent analysis (1) the eigenvectors of a correlation matrix R pertaining tothe zero mean data vector x dene the unit vectors vi representing the prin-cipal directions along which the variances attain their extremal values and(2) the associated eigenvalues dene the extremal values of the variancessi Hence

D (j) D l(j)iexcl 1

2 V(j) (23)

where l (j) D diag(R(j)) produces the eigenvalues of R(j) in a diagonal formand V(j) D eig(R(j)) produces the unit eigenvectors of R(j) in matrix formAlso by denition of eigenvalue analysis of a general matrix A

AV D lV

and hence

A D VTlV (24)

Now since a term kak equals aTa we can expand the term kD(j)(x iexcl w(j))kof equation 22 as follows

regregregD (j)plusmn

x iexcl w (j)sup2regregreg D

plusmnx iexcl w (j)

sup2TD (j)TD (j)

plusmnx iexcl w (j)

sup2 (25)

Substituting equations 12 and 23 we see that

D (j)TD (j) Dpoundv1 v2 vd

curren

26666664

1s1

0 cent cent cent 0

0 1s2

00 cent cent cent 0 1

sd

37777775

2666664

vT1

vT2

vTd

3777775

D R(j)iexcl1 (26)

2338 H Lipson and H T Siegelmann

meaning that the scaling and rotation matrix is in fact the inverse of R(j)Hence

i(x) D argj minmicroplusmn

x iexcl w (j)sup2T

Riexcl1plusmn

x iexcl w (j)sup2para

j D 1 2 N (27)

which corresponds to the term used by Gustafson and Kessel (1979) and inthe maximum likelihood gaussian classier (Duda amp Hart 1973)

Equation 27 involves two explicit terms R(j) and w(j) representing theorientation and size information respectively Implementations involvingellipsoidal metrics therefore need to track these two quantities separately ina two-stage step Separate tracking however cannot be easily extended tohigher orders which requires simultaneous tracking of additional higher-order qualities such as curvatures We therefore use an equivalent represen-tation that encapsulates these terms into one and permits a more systematicextension to higher orders

We combine these two coefficients into one expanded covariance, denoted $R_H^{(j)}$, in homogeneous coordinates (Faux & Pratt, 1981), as

$$R_H^{(j)} = \sum \mathbf{x}_H^{(j)} \mathbf{x}_H^{(j)T}, \tag{2.8}$$

where, in homogeneous coordinates, $\mathbf{x}_H^{(j)} \in \mathbb{R}^{(d+1)\times 1}$ and is defined by $\mathbf{x}_H = \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}$.

In this notation, the expanded covariance also contains coordinate permutations involving 1 as one of the multiplicands, and so different (lower) orders are introduced. Consequently, $R_H^{(j)} \in \mathbb{R}^{(d+1)\times(d+1)}$. In analogy to the correlation matrix memory and autoassociative memory (Haykin, 1994), the extended matrix $R_H^{(j)}$ in its general form can be viewed as a homogeneous autoassociative tensor. Now the new matching criterion becomes simply

$$i(\mathbf{x}) = \arg_j \min \left[ \mathbf{x}_H^T R_H^{(j)-1} \mathbf{x}_H \right] = \arg_j \min \left\| R_H^{(j)-\frac{1}{2}} \mathbf{x}_H \right\| = \arg_j \min \left\| R_H^{(j)-1} \mathbf{x}_H \right\|, \quad j = 1, 2, \ldots, N. \tag{2.9}$$

This representation retains the notion of maximum correlation, and for convenience we now denote $R_H^{(j)-1}$ as the synaptic tensor. When we moved into homogeneous coordinates, we gained the following:

The eigenvectors of the original covariance matrix $R^{(j)}$ represented directions. The eigenvectors of the homogeneous covariance matrix $R_H^{(j)}$ correspond to both the principal directions and the offset (both second-order and first-order) properties of the cluster accumulated in $R_H^{(j)}$. In fact, the elements of each eigenvector correspond to the coefficients of the line equation $ax + by + \cdots + c = 0$. This property permits direct extension to higher-order tensors in the next section, where direct eigenstructure analysis is not well defined.

The transition to homogeneous coordinates dispensed with the problem created by the fact that the eigenstructure properties cited above pertained only to zero-mean data, whereas $R_H^{(j)}$ is defined as the covariance matrix of non-zero-mean data. By using homogeneous coordinates, we effectively consider the translational element of the cluster as yet another (additional) dimension. In this higher space, the data can be considered to be zero mean.
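A minimal sketch of the homogeneous formulation of equations 2.8 and 2.9, again in NumPy with illustrative function names:

```python
import numpy as np

def to_homogeneous(x):
    """x_H = [x; 1]."""
    return np.append(x, 1.0)

def synaptic_tensor(points):
    """Inverse of the expanded covariance R_H of equation 2.8 for one neuron."""
    xH = np.array([to_homogeneous(p) for p in points])
    R_H = xH.T @ xH                     # sum of outer products x_H x_H^T
    return np.linalg.inv(R_H)

def winner(x, synaptic_tensors):
    """Equation 2.9: the smallest ||R_H^{-1} x_H|| wins."""
    xH = to_homogeneous(x)
    return int(np.argmin([np.linalg.norm(T @ xH) for T in synaptic_tensors]))
```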

2.1 Higher Orders. The ellipsoidal neuron can capture only linear directionality, not curved directionalities such as those appearing, for instance, in a fork, in a curved, or in a sharp-cornered shape. In order to capture more complex spatial configurations, higher-order geometries must be introduced.

The classic neuron possesses a synaptic weight vector $\mathbf{w}^{(j)}$, which corresponds to a point in the input domain. The synaptic weight $\mathbf{w}^{(j)}$ can be seen as the first-order average of its signals, that is, $\mathbf{w}^{(j)} = \sum \mathbf{x}_H$ (the last element $x_{H,d+1} = 1$ of the homogeneous coordinates has no effect in this case). Ellipsoidal units hold information regarding the linear correlations among the coordinates of data points represented by the neuron by using $R_H^{(j)} = \sum \mathbf{x}_H \mathbf{x}_H^T$. Each element of $R_H$ is thus a proportionality constant relating two specific dimensions of the cluster. The second-order neuron can therefore be regarded as a second-order approximation of the corresponding data distribution. Higher-order relationships can be introduced by considering correlations among more than just two variables. We may consequently introduce a neuron capable of modeling a d-dimensional data cluster to an mth-order approximation.

Higher-order correlations are obtained by multiplying the generating vector $\mathbf{x}_H$ by itself in successive outer products more than just once. The process yields a covariance tensor, rather than just a covariance matrix (which is a second-order tensor). The resulting homogeneous covariance tensor is thus represented by a $2(m-1)$-order tensor of rank $d+1$, denoted by $Z_H^{(j)} \in \mathbb{R}^{(d+1)\times\cdots\times(d+1)}$, obtained by $2(m-1)$ outer products of $\mathbf{x}_H$ by itself:

$$Z_H^{(j)} = \mathbf{x}_H^{\otimes 2(m-1)}. \tag{2.10}$$

The exponent consists of the factor $(m-1)$, which is the degree of the approximation, and a factor of 2, since we are performing autocorrelation, so the function is multiplied by itself. In analogy to the reasoning that leads to equation 2.9, each eigenvector (now an eigentensor of order $m-1$) corresponds to the coefficients of a principal curve, and multiplying it by input points produces an approximation of the distance of that input point from the curve. Consequently, the inverse of the tensor $Z_H^{(j)}$ can then be used to compute the high-order correlation of the signal with the nonlinear shape neuron by simple tensor multiplication:

$$i(\mathbf{x}) = \arg_j \min \left\| Z_H^{(j)-1} \otimes \mathbf{x}_H^{\otimes(m-1)} \right\|, \quad j = 1, 2, \ldots, N, \tag{2.11}$$

where $\otimes$ denotes tensor multiplication. Note that the covariance tensor can be inverted only if its order is an even number, as satisfied by equation 2.10. Note also that the amplitude operator is carried out by computing the root of the sum of the squares of the elements of the argument. The computed metric is now not necessarily spherical and may take various other forms, as plotted in Figure 4.

Figure 4: Metrics of various orders. (a) A regular neuron (spherical metric), (b) a "square" neuron, and (c) a neuron with "two holes".

In practice, however, high-order tensor inversion is not directly required. To make this analysis simpler, we use a Kronecker notation for tensor products (see Graham, 1981). The Kronecker tensor product flattens out the elements of $X \otimes Y$ into a large matrix formed by taking all possible products between the elements of X and those of Y. For example, if X is a $2\times 3$ matrix, then $X \otimes Y$ is

$$X \otimes Y =
\begin{bmatrix}
Y \cdot X_{11} & Y \cdot X_{12} & Y \cdot X_{13} \\
Y \cdot X_{21} & Y \cdot X_{22} & Y \cdot X_{23}
\end{bmatrix}, \tag{2.12}$$

where each block is a matrix of the size of Y. In this notation, the internal structure of higher-order tensors is easier to perceive, and their correspondence to linear regression of principal polynomial curves is revealed. Consider, for example, a fourth-order covariance tensor of the vector $\mathbf{x} = \{x, y\}$.
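In this flattened notation, the tensor of equation 2.10 can be accumulated with ordinary matrix operations: each data point contributes the outer product of the $(m-1)$-fold Kronecker power of $\mathbf{x}_H$ with itself. The sketch below (NumPy; names are illustrative assumptions) builds that matrix:

```python
import numpy as np
from functools import reduce

def kron_power(v, p):
    """p-fold Kronecker power of a vector (p >= 1)."""
    return reduce(np.kron, [v] * p)

def high_order_tensor(points, m):
    """Flattened homogeneous covariance tensor Z_H (equation 2.10) for one neuron.

    The result is a (d+1)^(m-1) x (d+1)^(m-1) matrix storing the
    2(m-1)-order tensor in Kronecker (flattened) form.
    """
    size = (points.shape[1] + 1) ** (m - 1)
    Z = np.zeros((size, size))
    for x in points:
        xk = kron_power(np.append(x, 1.0), m - 1)
        Z += np.outer(xk, xk)
    return Z
```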


The fourth-order tensor corresponds to the simplest nonlinear neuron according to equation 2.10 and takes the form of the $2\times2\times2\times2$ tensor

$$\left\langle \mathbf{x} \otimes \mathbf{x} \otimes \mathbf{x} \otimes \mathbf{x} \right\rangle =
\left\langle R \otimes R \right\rangle =
\begin{bmatrix} x^2 & xy \\ xy & y^2 \end{bmatrix} \otimes
\begin{bmatrix} x^2 & xy \\ xy & y^2 \end{bmatrix} =
\begin{bmatrix}
x^4 & x^3y & x^3y & x^2y^2 \\
x^3y & x^2y^2 & x^2y^2 & xy^3 \\
x^3y & x^2y^2 & x^2y^2 & xy^3 \\
x^2y^2 & xy^3 & xy^3 & y^4
\end{bmatrix}. \tag{2.13}$$

The homogeneous version of this tensor also includes all lower-order permutations of the coordinates of $\mathbf{x}_H = \{x, y, 1\}$, namely the $3\times3\times3\times3$ tensor

$$Z_H^{(4)} = \left\langle \mathbf{x}_H \otimes \mathbf{x}_H \otimes \mathbf{x}_H \otimes \mathbf{x}_H \right\rangle =
\left\langle R_H \otimes R_H \right\rangle =
\begin{bmatrix} x^2 & xy & x \\ xy & y^2 & y \\ x & y & 1 \end{bmatrix} \otimes
\begin{bmatrix} x^2 & xy & x \\ xy & y^2 & y \\ x & y & 1 \end{bmatrix}$$

$$=
\begin{bmatrix}
x^4 & x^3y & x^3 & x^3y & x^2y^2 & x^2y & x^3 & x^2y & x^2 \\
x^3y & x^2y^2 & x^2y & x^2y^2 & xy^3 & xy^2 & x^2y & xy^2 & xy \\
x^3 & x^2y & x^2 & x^2y & xy^2 & xy & x^2 & xy & x \\
x^3y & x^2y^2 & x^2y & x^2y^2 & xy^3 & xy^2 & x^2y & xy^2 & xy \\
x^2y^2 & xy^3 & xy^2 & xy^3 & y^4 & y^3 & xy^2 & y^3 & y^2 \\
x^2y & xy^2 & xy & xy^2 & y^3 & y^2 & xy & y^2 & y \\
x^3 & x^2y & x^2 & x^2y & xy^2 & xy & x^2 & xy & x \\
x^2y & xy^2 & xy & xy^2 & y^3 & y^2 & xy & y^2 & y \\
x^2 & xy & x & xy & y^2 & y & x & y & 1
\end{bmatrix}. \tag{2.14}$$

Remark. It is immediately apparent that the matrix in equation 2.14 corresponds to the matrix to be solved for finding a least-squares fit of a conic section equation $ax^2 + by^2 + cxy + dx + ey + f = 0$ to the data points. Moreover, the set of eigenvectors of this matrix corresponds to the coefficients of the set of mutually orthogonal best-fit conic section curves that are the principal curves of the data. This notion adheres with Gnanadesikan's (1977) method for finding principal curves. Substitution of a data point into the equation of a principal curve yields an approximation of the distance of the point from that curve, and the sum of squared distances amounts to equation 2.11. Note that each time we increase order, we are seeking a set of principal curves of one degree higher. This implies that the least-squares matrix needs to be two degrees higher (because it is minimizing the squared error), thus yielding the coefficient 2 in the exponent of equation 2.10.
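Following the remark, one can extract the principal conics of a two-dimensional, third-order (m = 3) neuron directly from the eigenstructure of the flattened tensor. The sketch below is our illustrative reading of that procedure, not the authors' exact implementation; each eigenvector holds coefficients over the redundant monomial basis $[x^2, xy, x, xy, y^2, y, x, y, 1]$ produced by the Kronecker product:

```python
import numpy as np

def principal_conics(points):
    """Eigenstructure of the flattened Z_H for d = 2, m = 3 (cf. equation 2.14)."""
    Z = np.zeros((9, 9))
    for x, y in points:
        xk = np.kron([x, y, 1.0], [x, y, 1.0])
        Z += np.outer(xk, xk)
    eigenvalues, eigenvectors = np.linalg.eigh(Z)
    return eigenvalues, eigenvectors   # columns: coefficients of principal conic curves
```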


3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classic neurons, we show that they are also subject to simple Hebbian learning. Following an interpretation of Hebb's postulate of learning, synaptic modification (i.e., learning) occurs when there is a correlation between presynaptic and postsynaptic activities. We have already shown in equation 2.9 that the presynaptic activity $\mathbf{x}_H$ and the postsynaptic activity of neuron j coincide when the synapse is strong, that is, when $\|R_H^{(j)-1}\mathbf{x}_H\|$ is minimum. We now proceed to show that, in accordance with Hebb's postulate of learning, it is sufficient to incur self-organization of the neurons by increasing synapse strength when there is a coincidence of presynaptic and postsynaptic signals.

We need to show how self-organization is obtained merely by increasing $R_H^{(j)}$, where j is the winning neuron. As each new data point arrives at a specific neuron j in the net, the synaptic weight of that neuron is adapted by the increment corresponding to Hebbian learning,

$$Z_H^{(j)}(k+1) = Z_H^{(j)}(k) + \eta(k)\, \mathbf{x}_H^{(j)\otimes 2(m-1)}, \tag{3.1}$$

where k is the iteration counter and $\eta(k)$ is the iteration-dependent learning rate coefficient. It should be noted that in equation 2.10, $Z_H^{(j)}$ becomes a weighted sum of the input signals $\mathbf{x}_H^{\otimes 2(m-1)}$ (unlike $R_H$ in equation 2.8, which is a uniform sum). The eigenstructure analysis of a weighted sum still provides the principal components, under the assumption that the weights are uniformly distributed over the cluster signals. This assumption holds true if the process generating the signals of the cluster is stable over time, a basic assumption of all neural network algorithms (Bishop, 1997). In practice, this assumption is easily acceptable, as will be demonstrated in the following sections.
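The Hebbian update of equation 3.1, in the flattened Kronecker form used above (NumPy; names are illustrative):

```python
import numpy as np
from functools import reduce

def hebbian_update(Z_winner, x, eta, m):
    """Equation 3.1: reinforce the winning neuron's tensor with the new point.

    Z_winner : flattened tensor of the winning neuron (matrix form)
    x        : input point; eta : learning rate eta(k); m : neuron order
    """
    xk = reduce(np.kron, [np.append(x, 1.0)] * (m - 1))
    return Z_winner + eta * np.outer(xk, xk)
```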

In principle, continuous updating of the covariance tensor may create instability in a competitive environment, as a winner neuron becomes increasingly dominant. To force competition, the covariance tensor can be normalized using any of a number of factors, dependent on the application, such as the number of signals assigned to the neuron so far, or the distribution of data among the neurons (forcing uniformity). However, a more natural factor for normalization is the size and complexity of the neuron.

A second-order neuron is geometrically equivalent to a d-dimensional ellipsoid. A neuron may therefore be normalized by a factor V proportional to the ellipsoid volume,

$$V \propto \sqrt{s_1} \cdot \sqrt{s_2} \cdots \sqrt{s_d} = \prod_{i=1}^{d} \sqrt{s_i} \propto \sqrt{\det\left|R^{(j)}\right|}. \tag{3.2}$$

During the competition phase, the factors $R^{(j)}\mathbf{x}/V$ are compared rather than just $R^{(j)}\mathbf{x}$, thus promoting smaller neurons and forcing neurons to become equally "fat." The final matching criterion for a homogeneous representation is therefore given by

$$j(\mathbf{x}) = \arg_j \min \left\| \hat{R}_H^{(j)-1} \mathbf{x}_H \right\|, \quad j = 1, 2, \ldots, N, \tag{3.3}$$

where

$$\hat{R}_H^{(j)} = \frac{R_H^{(j)}}{\sqrt{\det\left|R_H^{(j)}\right|}}. \tag{3.4}$$

Covariance tensors of higher-order neurons can also be normalized by their determinant. However, unlike second-order neurons, higher-order shapes are not necessarily convex, and hence their volume is not necessarily proportional to their determinant. The determinant of higher-order tensors therefore relates to more complex characteristics, which include the deviation of the data points from the principal curves and the curvature of the boundary. Nevertheless, our experiments showed that this is a simple and effective means of normalization that promotes small and simple shapes over complex or large concave fittings to data. The relationship between geometric shape and the tensor determinant should be investigated further for more elaborate use of the determinant as a normalization criterion.
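A minimal sketch of the determinant normalization of equations 3.3 and 3.4; for higher-order neurons the same scaling would be applied to the flattened tensor $Z_H$ (an assumption on our part):

```python
import numpy as np

def normalized_inverse(R_H):
    """Equation 3.4: divide by sqrt(det) so large neurons do not dominate, then invert."""
    R_hat = R_H / np.sqrt(np.linalg.det(R_H))
    return np.linalg.inv(R_hat)

def compete(x, normalized_inverses):
    """Equation 3.3: the winner has the smallest ||R_hat^{-1} x_H||."""
    xH = np.append(x, 1.0)
    return int(np.argmin([np.linalg.norm(T @ xH) for T in normalized_inverses]))
```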

Finally, in order to determine the likelihood of a point's being associated with a particular neuron, we need to assume a distribution function. Assuming gaussian distribution, the likelihood $p^{(j)}(\mathbf{x}_H)$ of data point $\mathbf{x}_H$ being associated with cluster j is proportional to the distance from the neuron and is given by

$$p^{(j)}(\mathbf{x}_H) = \frac{1}{(2\pi)^{d/2}}\, e^{-\frac{1}{2}\left\| \hat{Z}^{(j)} \mathbf{x}_H \right\|^2}. \tag{3.5}$$
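A one-line realization of equation 3.5 for the second-order case, assuming $\hat{Z}^{(j)}$ denotes the neuron's stored normalized synaptic tensor (higher orders would use the Kronecker power of $\mathbf{x}_H$ instead):

```python
import numpy as np

def membership_likelihood(x, Z_hat, d):
    """Equation 3.5: gaussian-style likelihood that x is associated with the neuron."""
    xH = np.append(x, 1.0)
    r = np.linalg.norm(Z_hat @ xH)
    return np.exp(-0.5 * r ** 2) / (2.0 * np.pi) ** (d / 2.0)
```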

4 Algorithm Summary

To conclude the previous sections, we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervised learning. These are the algorithms that were used in the test cases described later in section 5. An illustrative code sketch follows each procedure below.

Unsupervised Competitive Learning

1. Select layer size (number of neurons N) and neuron order (m) for a given problem.

2. Initialize all neurons with $Z_H = \sum \mathbf{x}_H^{\otimes 2(m-1)}$, where the sum is taken over all input data or a representative portion, or with a unit tensor or random numbers if no data are available. To compute this value, write down $\mathbf{x}_H^{\otimes(m-1)}$ as a vector with all mth-degree permutations of $\{x_1, x_2, \ldots, x_d, 1\}$, and $Z_H$ as a matrix summing the outer products of these vectors. Store $Z_H$ and $Z_H^{-1}/f$ for each neuron, where f is a normalization factor (e.g., equation 3.4).

3. Compute the winner for input x: first compute the vector $\mathbf{x}_H^{\otimes(m-1)}$ from x as in section 2, and then multiply it by the stored matrix $Z_H^{-1}/f$. The winner j is the one for which the modulus of the product vector is smallest.

4. Output winner j.

5. Update the winner by adding $\mathbf{x}_H^{\otimes 2(m-1)}$ to $Z_H^{(j)}$, weighted by the learning coefficient $\eta(k)$, where k is the iteration counter. Store $Z_H$ and $Z_H^{-1}/f$ for the updated neuron.

6. Go to step 3.
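The unsupervised procedure can be condensed into a short loop. The sketch below assembles the pieces shown earlier; the learning-rate schedule, the subset-based initialization, and the small ridge term are our assumptions for a runnable example, not the authors' settings:

```python
import numpy as np
from functools import reduce

def phi(x, m):
    """Flattened Kronecker power of the homogeneous input, x_H^(m-1)."""
    return reduce(np.kron, [np.append(x, 1.0)] * (m - 1))

def normalized_inverse(Z):
    """Stored matrix Z^{-1}/f with f = sqrt(det Z), as in equation 3.4."""
    return np.linalg.inv(Z / np.sqrt(np.linalg.det(Z)))

def train_unsupervised(data, n_neurons, m, epochs=20, eta0=0.3, seed=0):
    rng = np.random.default_rng(seed)
    dim = (data.shape[1] + 1) ** (m - 1)
    # Step 2: initialize each neuron from a representative portion of the data;
    # the tiny ridge keeps the initial tensors invertible (our addition).
    Z = []
    for _ in range(n_neurons):
        idx = rng.choice(len(data), size=max(dim, len(data) // n_neurons), replace=True)
        Z.append(sum(np.outer(phi(x, m), phi(x, m)) for x in data[idx]) + 1e-8 * np.eye(dim))
    W = [normalized_inverse(Zj) for Zj in Z]
    for k in range(epochs):
        eta = eta0 / (1.0 + k)           # assumed asymptotically decreasing learning rate
        for x in data:
            p = phi(x, m)
            j = int(np.argmin([np.linalg.norm(Wj @ p) for Wj in W]))   # steps 3-4
            Z[j] = Z[j] + eta * np.outer(p, p)                         # step 5 (equation 3.1)
            W[j] = normalized_inverse(Z[j])                            # re-invert only the winner
    return Z, W
```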

Supervised Learning

1. Set layer size (number of neurons) to the number of classes and select neuron order (m) for a given problem.

2. Train: For each neuron, compute $Z_H^{(j)} = \sum \mathbf{x}_H^{\otimes 2(m-1)}$, where the sum is taken over all training points to be associated with that neuron. To compute this value, write down $\mathbf{x}_H^{\otimes(m-1)}$ as a vector with all mth-degree permutations of $\{x_1, x_2, \ldots, x_d, 1\}$, and $Z_H$ as a matrix summing the outer products of these vectors. Store $Z_H^{-1}/f$ for each neuron.

3. Test: Use test points to evaluate the quality of training.

4. Use: For each new input x, first compute the vector $\mathbf{x}_H^{\otimes(m-1)}$ from x as in section 2, and then multiply it by the stored matrix $Z_H^{-1}/f$. The winner j is the one for which the modulus of the product vector is smallest.
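A corresponding sketch of the supervised procedure (one pass over the training data; scaling by the determinant may need log-domain care for high orders, which we omit here):

```python
import numpy as np
from functools import reduce

def phi(x, m):
    """Flattened Kronecker power of the homogeneous input, x_H^(m-1)."""
    return reduce(np.kron, [np.append(x, 1.0)] * (m - 1))

def train_supervised(data, labels, m):
    """Accumulate Z_H per class and store its normalized inverse (equation 3.4)."""
    stored = {}
    for c in np.unique(labels):
        Zc = sum(np.outer(phi(x, m), phi(x, m)) for x in data[labels == c])
        stored[c] = np.linalg.inv(Zc / np.sqrt(np.linalg.det(Zc)))
    return stored

def classify(x, stored, m):
    """Winner: the class whose stored matrix yields the smallest product modulus."""
    p = phi(x, m)
    return min(stored, key=lambda c: np.linalg.norm(stored[c] @ p))
```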

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacing lower-dimensionality neurons, and will enhance the modeling capability of each neuron. Depending on the application of the net, this may lead to improvement or degradation of the overall performance of the layer, due to the added degrees of freedom. This section provides some examples of an implementation at different orders, in both supervised and unsupervised learning.

First, we discuss two examples of second-order (ellipsoidal) neurons to show that our formulation is compatible with previous work on ellipsoidal neural units (e.g., Mao & Jain, 1996). Figure 5 illustrates how a competitive network consisting of three second-order neurons gradually evolves to model a two-dimensional domain exhibiting three main activation areas with significant overlap. The second-order neurons are visualized as ellipses, with their boundaries drawn at 2s of the distribution.

Figure 5: (a-c) Stages in self-classification of overlapping clusters after learning 200, 400, and 650 data points, respectively (one epoch).

The next test uses a two-dimensional distribution of three arbitrary clusters, consisting of a total of 75 data points, shown in Figure 6a. The network of three second-order neurons self-organized to map this data set and created the decision boundaries shown in Figure 6b. The dark curves are the decision boundaries, and the light lines show the local distance metric within each cell. Note how the decision boundary in the overlap area accounts for the different distribution variances.

Figure 6: (a) Original data set. (b) Self-organized map of the data set. Dark curves are the decision boundaries, and light lines show the local distance metric within each cell. Note how the decision boundary in the overlap area accounts for the different distribution variances.


Table 1: Comparison of Self-Organization Results for Iris Data

Method                                                          Epochs    Number Misclassified or Unclassified
Superparamagnetic (Blatt et al., 1996)                                    25
Learning vector quantization / generalized learning
  vector quantization (Pal, Bezdek, & Tsao, 1993)               200       17
K-means (Mao & Jain, 1996)                                                16
Hyperellipsoidal clustering (Mao & Jain, 1996)                            5
Second-order unsupervised                                       20        4
Third-order unsupervised                                        30        3

The performance of the different orders of shape neuron has also been assessed on the Iris data (Anderson, 1939). Anderson's time-honored data set has become a popular benchmark problem for clustering algorithms. The data set consists of four quantities measured on each of 150 flowers chosen from three species of iris: Iris setosa, Iris versicolor, and Iris virginica. The data set constitutes 150 points in four-dimensional space. Some results of other methods are compared in Table 1. It should be noted that the results presented in the table are "best results"; third-order neurons are not always stable, because the layer sometimes converges into different classifications.¹

The neurons have also been tried in a supervised setup. In supervised mode, we use the same training setup of equations 2.8 and 2.10, but train each neuron individually on the signals associated with it, using

$$Z_H^{(j)} = \sum \mathbf{x}_H^{\otimes 2(m-1)}, \quad \mathbf{x}_H \in \Omega^{(j)},$$

where, in homogeneous coordinates, $\mathbf{x}_H = \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}$ and $\Omega^{(j)}$ is the jth class.

In order to evaluate the performance of this technique, we trained each neuron on a random selection of 80% of the data points of its class and then tested the classification of the whole data set (including the remaining 20% for cross validation).

¹ Performance of third-order neurons was evaluated with a layer consisting of three neurons, randomly initialized with normal distribution within the input domain with the mean and variance of the input, and with a learning factor of 0.3 decreasing asymptotically toward zero. Of 100 experiments carried out, 95 locked correctly on the clusters, with an average of 7.6 (5%) misclassifications after 30 epochs. Statistics on performance of cited works have not been reported. Download MATLAB demos from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.


Table 2: Supervised Classification Results for Iris Data

                                                              Number Misclassified
Order                                               Epochs    Average     Best
3-hidden-layer neural network (Abe et al., 1997)    1000      2.2         1
Fuzzy hyperbox (Abe et al., 1997)                                          2
Fuzzy ellipsoids (Abe et al., 1997)                 1000                   1
Second-order supervised                             1         3.08        1
Third-order supervised                              1         2.27        1
Fourth-order supervised                             1         1.60        0
Fifth-order supervised                              1         1.07        0
Sixth-order supervised                              1         1.20        0
Seventh-order supervised                            1         1.30        0

Notes: Results are after one training epoch with 20% cross validation. Averaged results are for 250 experiments.

The test determined the most active neuron for each input and compared it with the manual classification. This test was repeated 250 times, with the average and best results shown in Table 2, along with results reported for other methods. It appears that, on average, supervised learning for this task reaches an optimum with fifth-order neurons.

Figure 7 shows classification results using third-order neurons for an arbitrary distribution containing close and overlapping clusters.

Finally, Figure 8 shows an application of third- through fifth-order neurons to detection and modeling of chromosomes in a human cell. The corresponding separation maps and convergence error rates are shown in Figures 9 and 10, respectively.

Figure 7: Self-classification of synthetic point clusters using third-order neurons.


Figure 8: Chromosome separation. (a) Designated area. (b) Four third-order neurons self-organized to mark the chromosomes in the designated area. Data from Schrock et al. (1996).

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated their practical use for modeling the structure of spatial distributions and for geometric feature recognition. Although high-order neurons do not directly correspond to neurobiological details, we believe that they can provide powerful data modeling capabilities. In particular, they exhibit useful properties for correctly handling close and partially overlapping clusters. Furthermore, we have shown that ellipsoidal and tensorial clustering methods, as well as classic competitive neurons, are special cases of the general-shape neuron. We showed how lower-order information can be encapsulated using tensor representation in homogeneous coordinates, enabling systematic continuation to higher-order metrics. This formulation also allows for direct extraction of the encapsulated high-order information in the form of principal curves, represented by the eigentensors of each neuron.

The use of shapes as neurons raises some further practical questions, which have not been addressed in this article and require further research; they are listed below. Although most of these issues pertain to neural networks in general, the approach required here may be different.

Number of neurons. In the examples provided, we have explicitly used the number of neurons appropriate to the number of clusters. The question of network size is applicable to most neural architectures. However, our experiments showed that overspecification tends to cause clusters to split into subsections, while underspecification may cause clusters to unite.

Figure 9: Self-organization for chromosome separation. (a-d) Second- through fifth-order, respectively.

Initial weights. It is well known that sensitivity to initial weights is a general problem of neural networks (see, for example, Kolen & Pollack, 1990, and Pal et al., 1993). However, when third-order neurons were evaluated on the Iris data with random initial weights, they performed with an average of 5% misclassifications. Thus, the capacity of neurons to be spread over volumes may provide new possibilities for addressing the initialization problem.

Computational cost. The need to invert a high-order tensor introduces a significant computational cost, especially when the input dimensionality is large. There are some factors that counterbalance this computational cost. First, the covariance tensor needs to be inverted only when a neuron is updated, not when it is activated or evaluated. When the neuron is updated, the inverted tensor is computed and then stored, and used whenever the neuron needs to compete or evaluate an input signal. This means that the whole layer will need only one inversion for each training point, and no inversions for nontraining operation. Second, because of the complexity of each neuron, sometimes fewer of them are required. Finally, relatively complex shapes can be attained with low orders (say, third order).

Figure 10: Average error rate in self-organization for chromosome separation of Figure 9, respectively.

Selection of neuron order. As in many other networks, there is much domain knowledge about the problem that comes into the solution through the choice of parameters such as the order of the neurons. However, we also estimate that due to the rather analytic representation of each neuron, and since high-order information is contained within each neuron independently, this information can be directly extracted (using the eigentensors). Thus, it is possible to decompose analytically a cross-shaped fourth-order cluster into two overlapping elliptic clusters, and vice versa.

Stability versus flexibility and enforcement of particular shapes. Further research is required to find methods for biasing neurons toward particular shapes, both to seek specific shapes of interest and to help neuron stability. For example, how would one encourage detection of triangles and not rectangles?

Nonpolynomial base functions. We intend to investigate the properties of shape layers based on nonpolynomial base functions. Noting that the high-order homogeneous tensors give rise to various degrees of polynomials, it is possible to derive such tensors with other base functions, such as wavelets, trigonometric functions, and other well-established kernel functions. Such an approach may enable modeling arbitrary geometrical shapes.

Acknowledgments

We thank Eitan Domani for his helpful comments and insight. This work was supported in part by the U.S.-Israel Binational Science Foundation (BSF), the Israeli Ministry of Arts and Sciences, and the Fund for Promotion of Research at the Technion. H. L. acknowledges the generous support of the Charles Clore Foundation and the Fischbach Postdoctoral Fellowship. We thank Applied Spectral Imaging for supplying the data for the experiments described in Figures 8 through 10.

Download further information, MATLAB demos, and implementation details of this work from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

References

Abe, S., & Thawonmas, R. (1997). A fuzzy classifier with ellipsoidal regions. IEEE Trans. on Fuzzy Systems, 5, 358–368.

Anderson, E. (1939). The irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2–5.

Balestrino, A., & Verona, B. F. (1994). New adaptive polynomial neural network. Mathematics and Computers in Simulation, 37, 189–194.

Bishop, C. M. (1997). Neural networks for pattern recognition. Oxford: Clarendon Press.

Blatt, M., Wiseman, S., & Domany, E. (1996). Superparamagnetic clustering of data. Physical Review Letters, 76(18), 3251–3254.

Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600–611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193–199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511–513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761–766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers Chem. Eng., 17, 765–784.

Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269–280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation—Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29–60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel-Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling. Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16–29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks IJCNN'91. Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549–557.

Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494–497.

Received August 24, 1998; accepted September 3, 1999.


To circumvent this restriction we augment the neuron with the capacityto map additional geometric and topological information by increasing itsorder For example as the rst-order is a pointneuron the second-order casewill correspond to orientation and size components effectively attaching alocal oriented coordinate system with nonuniform scaling to the neuroncenter and using it to dene a new distance metric Thus each high-orderneuron will represent not only the mean value of the data points in thecluster it is associated with but also the principal directions curvaturesand other features of the cluster and the variance of the data points alongthese directions Intuitively we can say that rather than dening a spherethe high-order distance metric now denes a multidimensional-orientedshape with an adaptive shape and topology (see Figures 2 and 3)

In the following pages we briey review the ellipsoidal metric and es-tablish the basic concepts of encapsulation and base functions to be usedin higher-order cases where tensorial notation becomes more difcult tofollow

For a second-order neuron dene the matrix D(j) 2 Rdpoundd to encapsulatethe direction and size properties of a neuron for each dimension of theinput where d is the number of dimensions Each row i of D(j) is a unit rowvector vT

i corresponding to a principal direction of the cluster divided bythe standard deviation of the cluster in that direction (the square root ofthe variance si) The Euclidean metric can now be replaced by the neuron-dependent metric

dist(x j) DregregD(j) iexcl

x iexcl w (j)centregreg

where

D (j) D

266666664

1ps1

0 cent cent cent 0

0 1ps2

00 cent cent cent 0 1p

sd

377777775

2666664

vT1

vT2

vTd

3777775

(21)

Clustering Irregular Shapes 2337

such that deviations are normalized with respect to the variance of the datain the direction of the deviation The new matching criterion based on thenew metric becomes

i(x) D argj minregregregdist(x)(j)

regregreg

D argj minregregregD (j)

plusmnx iexcl w (j)

sup2regregreg j D 1 2 N (22)

The neuron is now represented by the pair (D(j) w(j)) where D(j) repre-sents rotation and scaling and w(j) represents translation The values of D(j)

can be obtained from an eigenstructure analysis of the correlation matrixR(j) D Sx(j)x(j)T 2 Rdpoundd of the data associated with neuron j Two well-known properties are derived from the eigenstructure of principal compo-nent analysis (1) the eigenvectors of a correlation matrix R pertaining tothe zero mean data vector x dene the unit vectors vi representing the prin-cipal directions along which the variances attain their extremal values and(2) the associated eigenvalues dene the extremal values of the variancessi Hence

D (j) D l(j)iexcl 1

2 V(j) (23)

where l (j) D diag(R(j)) produces the eigenvalues of R(j) in a diagonal formand V(j) D eig(R(j)) produces the unit eigenvectors of R(j) in matrix formAlso by denition of eigenvalue analysis of a general matrix A

AV D lV

and hence

A D VTlV (24)

Now since a term kak equals aTa we can expand the term kD(j)(x iexcl w(j))kof equation 22 as follows

regregregD (j)plusmn

x iexcl w (j)sup2regregreg D

plusmnx iexcl w (j)

sup2TD (j)TD (j)

plusmnx iexcl w (j)

sup2 (25)

Substituting equations 12 and 23 we see that

D (j)TD (j) Dpoundv1 v2 vd

curren

26666664

1s1

0 cent cent cent 0

0 1s2

00 cent cent cent 0 1

sd

37777775

2666664

vT1

vT2

vTd

3777775

D R(j)iexcl1 (26)

2338 H Lipson and H T Siegelmann

meaning that the scaling and rotation matrix is in fact the inverse of R(j)Hence

i(x) D argj minmicroplusmn

x iexcl w (j)sup2T

Riexcl1plusmn

x iexcl w (j)sup2para

j D 1 2 N (27)

which corresponds to the term used by Gustafson and Kessel (1979) and inthe maximum likelihood gaussian classier (Duda amp Hart 1973)

Equation 27 involves two explicit terms R(j) and w(j) representing theorientation and size information respectively Implementations involvingellipsoidal metrics therefore need to track these two quantities separately ina two-stage step Separate tracking however cannot be easily extended tohigher orders which requires simultaneous tracking of additional higher-order qualities such as curvatures We therefore use an equivalent represen-tation that encapsulates these terms into one and permits a more systematicextension to higher orders

We combine these two coefcients into one expanded covariance denotedRH

(j) in homogenous coordinates (Faux amp Pratt 1981) as

RH(j) D Sx

(j)H x

(j)H

T (28)

where in homogenous coordinatesx(j)H 2R(dC1)pound1 and is dened by xH D

microxndash1

para

In this notation the expanded covariance also contains coordinate per-mutations involving 1 as one of the multiplicands and so different (lower)orders are introduced Consequently RH

(j) 2 R(dC1)pound(dC1) In analogy to thecorrelation matrix memoryand autoassociative memory(Haykin 1994) theextended matrix RH

(j) in its general form can be viewed as a homogeneousautoassociative tensor Now the new matching criterion becomes simply

i(x) D argj minmicro

XTR(j)H

iexcl1X

para

D argj minregregregregR

(j)H

iexcl 12 X

regregregreg

D argj minregregregregR

(j)H

iexcl1X

regregregreg j D 1 2 N (29)

This representation retains the notion of maximum correlation and for con-venience we now denote RH

iexcl1(j) as the synaptic tensorWhen we moved into homogenous coordinates we gained the following

The eigenvectors of the original covariance matrix R(j) representeddirections The eigenvectors of the homogeneous covariance matrixRH

(j) correspond to both the principal directions and the offset (both

Clustering Irregular Shapes 2339

second-order and rst-order) properties of the cluster accumulatedin RH

(j) In fact the elements of each eigenvector correspond to thecoefcient of the line equation ax C by C cent cent cent C c D 0 This propertypermits direct extension to higher-order tensors in the next sectionwhere direct eigenstructure analysis is not well dened

The transition to homogeneous coordinates dispensed with the prob-lem created by the fact that the eigenstructure properties cited abovepertained only to zero-mean data whereas RH

(j) is dened as the co-variance matrix of non-zero-mean data By using homogeneous coor-dinates we effectively consider the translational element of the clusteras yet another (additional) dimensions In this higher space the datacan be considered to be zero mean

21 Higher Orders The ellipsoidal neuron can capture only linear di-rectionality not curved directionalities such as that appearing for instancein a fork in a curved or in a sharp-cornered shape In order to capturemore complex spatial congurations higher-order geometries must be in-troduced

The classic neuron possesses a synaptic weight vector w(j) which cor-responds to a point in the input domain The synaptic weight w(j) can beseen as the rst-order average of its signals that is w(j) D

PxH (the last

element xHdC1 D 1 of the homogeneous coordinates has no effect in thiscase) Ellipsoidal units hold information regarding the linear correlationsamong the coordinates of data points represented by the neuron by usingRH

(j) DP

xHxHT Each element of RH is thus a proportionality constant

relating two specic dimensions of the cluster The second-order neuroncan therefore be regarded as a second-order approximation of the corre-sponding data distribution Higher-order relationships can be introducedby considering correlations among more than just two variables We mayconsequently introduce a neuron capable of modeling a d-dimensional datacluster to an mth-order approximation

Higher-order correlations are obtained by multiplying the generatingvector xH by itself in successive outer products more than just once Theprocess yields a covariance tensor rather than just a covariance matrix(which is a second-order tensor) The resulting homogeneous covariancetensor is thus represented by a 2(m iexcl 1)-order tensor of rank d C 1 denotedby Z

(j)H 2 R(dC1)poundcentcentcentpound(dC1) obtained by 2(m iexcl1) outer products of xH by itself

Z(j)H D x2(miexcl1)

H (210)

The exponent consists of the factor (m iexcl 1) which is the degree of the ap-proximation and a factor of 2 since we are performing autocorrelation sothe function is multiplied by itself In analogy to reasoning that leads toequation 29 each eigenvector (now an eigentensor of order m iexcl 1) corre-

2340 H Lipson and H T Siegelmann

Figure 4 Metrics of various orders (a) a regular neuron (spherical metric) (b) aldquosquarerdquo neuron and (c) a neuron with ldquotwo holesrdquo

sponds to the coefcients of a principal curve and multiplying it by inputpoints produces an approximation of the distance of that input point fromthe curve Consequently the inverse of the tensor Z

(j)H can then be used to

compute the high-order correlation of the signal with the nonlinear shapeneuron by simple tensor multiplication

i(x) D argi minregregregregZ

(j)H

iexcl1shyxmiexcl1

H

regregregreg j iexcl 1 2 N (211)

where shydenotes tensor multiplication Note that the covariance tensor canbe inverted only if its order is an even number as satised by equation 210Note also that the amplitude operator is carried out by computing the rootof the sum of the squares of the elements of the argument The computedmetric is now not necessarily spherical and may take various other formsas plotted in Figure 4

In practice however high-order tensor inversion is not directly requiredTo make this analysis simpler we use a Kronecker notation for tensor prod-ucts (see Graham 1981) Kronecker tensor product attens out the elementsof X shyY into a large matrix formed by taking all possible products betweenthe elements of X and those of Y For example if X is a 2pound3 matrix thenXshyY is

X shyY D

Y cent X11 Y cent X12 Y cent X13- - - - - - - - - - - - - - - - - - - -Y cent X21 Y cent X22 Y cent X23

(212)

where each block is a matrix of the size of Y In this notation the internalstructure of higher-order tensors is easier to perceive and their correspon-dence to linear regression of principal polynomial curves is revealed Con-sider for example a fourth-order covariance tensor of the vector x D fx yg

Clustering Irregular Shapes 2341

The fourth-order tensor corresponds to the simplest nonlinear neuron ac-cording to equation 210 and takes the form of the 2pound2pound2pound2 tensor

laquox shyx shyx shyx

notD

laquoR shyR

notD

x2 xyxy y2

shy

x2 xyxy y2

D

266664

x4 x3y x3y x2y2

x3y x2y2 x2y2 xy3

- - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y2 xy3

x2y2 xy3 xy3 y4

377775

(213)

The homogeneous version of this tensor also includes all lower-order per-mutations of the coordinates of xH D fx y 1g namely the 3pound3pound3pound3 tensor

ZH(4) DlaquoxH xH xH xH

notD

laquoRH RH

notD

264

x2 xy xxy y2 yx y 1

375shy

264

x2 xy xxy y2 yx y 1

375

D

26666666666666664

x4 x3y x3 x3y x2y2 x2y x3 x2y x2

x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx3 x2y x2 x2y xy2 xy x2 xy x

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx2y2 xy3 xy2 xy3 y4 y3 xy2 y3 y2

x2y xy2 xy xy2 y3 y2 xy y2 y- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3 x2y x2 x2y xy2 xy x2 xy xx2y xy2 xy xy2 y3 y2 xy y2 yx2 xy x xy y2 y x y 1

37777777777777775

(214)

Remark It is immediately apparent that the matrix in equation 214 corre-sponds to the matrix to be solved for nding a least-squares t of a conicsection equation ax2 C by2 C cxy C dxC eyC f D 0 to the data points Moreoverthe set of eigenvectors of this matrix corresponds to the coefcients of theset of mutually orthogonal best-t conic section curves that are the principalcurves of the data This notion adheres with Gnanadesikanrsquos (1977) methodfor nding principal curves Substitution of a data point into the equation ofa principal curve yields an approximation of the distance of the point fromthat curve and the sum of squared distances amounts to equation 211 Notethat each time we increase order we are seeking a set of principal curves ofone degree higher This implies that the least-squares matrix needs to be twodegrees higher (because it is minimizing the squared error) thus yieldingthe coefcient 2 in the exponent of equation 210

2342 H Lipson and H T Siegelmann

3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classicneurons we show that they are also subject to simple Hebbian learning Fol-lowing an interpretation of Hebbrsquos postulate of learning synaptic modica-tion (ie learning) occurs when there is a correlation between presynapticand postsynaptic activities We have already shown in equation 29 that thepresynaptic activity xH and postsynaptic activity of neuron j coincide when

the synapse is strong that is RH(j)iexcl1

xH is minimum We now proceed toshow that in accordance with Hebbrsquos postulate of learning it is sufcient toincur self-organization of the neurons by increasing synapse strength whenthere is a coincidence of presynaptic and postsynaptic signals

We need to show how self-organization is obtained merely by increasingRH

(j) where j is the winning neuron As each new data point arrives at aspecic neuron j in the net the synaptic weight of that neuron is adaptedby the incremental corresponding to Hebbian learning

ZH(j)(k C 1) D ZH

(j)(k) C g(k)x(j)H

2(miexcl1) (31)

where k is the iteration counter and g(k) is the iteration-dependent learn-ing rate coefcient It should be noted that in equation 210 Z

(j)H becomes a

weighted sum of the input signals x2(miexcl1)H (unlike RH in equation 28 which

is a uniform sum) The eigenstructure analysis of a weighted sum still pro-vides the principal components under the assumption that the weights areuniformly distributed over the cluster signals This assumption holds trueif the process generating the signals of the cluster is stable over time a basicassumption of all neural network algorithms (Bishop 1997) In practice thisassumption is easily acceptable as will be demonstrated in the followingsections

In principle continuous updating of the covariance tensor may createinstability in a competitive environment as a winner neuron becomes in-creasingly dominant To force competition the covariance tensor can benormalized using any of a number of factors dependent on the applicationsuch as the number of signals assigned to the neuron so far or the distri-bution of data among the neuron (forcing uniformity) However a morenatural factor for normalization is the size and complexity of the neuron

A second-order neuron is geometrically equivalent to a d-dimensionalellipsoid A neuron may be therefore normalized by a factor V proportionalto the ellipsoid volume

V ps1 cent s2 cent cent cent cent cent sd D

dY

iD1

psi 1q

detshyshyR(j)

shyshy (32)

During the competition phase the factors R(j)x V are compared rather than

Clustering Irregular Shapes 2343

just R(j)x thus promoting smaller neurons and forcing neurons to becomeequally ldquofatrdquo The nal matching criteria for a homogeneous representationare therefore given by

j(x) D argj minregregreg ORiexcl1

H(j)xH

regregreg j D 1 2 N (33)

where

OR(j)H D

R(j)Hr

detshyshyshyR(j)

H

shyshyshy (34)

Covariance tensors of higher-order neurons can also be normalized bytheir determinant However unlike second-order neurons higher-ordershapes are not necessarily convex and hence their volume is not neces-sarily proportional to their determinant The determinant of higher-ordertensors therefore relates to more complex characteristics which include thedeviation of the data points from the principal curves and the curvatureof the boundary Nevertheless our experiments showed that this is a sim-ple and effective means of normalization that promotes small and simpleshapes over complex or large concave ttings to data The relationship be-tween geometric shape and the tensor determinant should be investigatedfurther for more elaborate use of the determinant as a normalization crite-rion

Finally in order to determine the likelihood of a pointrsquos being associatedwith a particular neuron we need to assume a distribution function As-suming gaussian distribution the likelihood p(j)(xH ) of data point xH beingassociated with cluster j is proportional to the distance from the neuron andis given by

p(j)(xH) D1

(2p )d 2eiexcl 1

2

regregOZ( j) XH

regreg2

(35)

4 Algorithm Summary

To conclude the previous sections we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervisedlearning These are the algorithms that were used in the test cases describedlater in section 5

Unsupervised Competitive learning

1 Select layer size (number of neurons N) and neuron order (m) for agiven problem

2344 H Lipson and H T Siegelmann

2 Initialize all neurons with ZH DP

x2(miexcl1)H where the sum is taken over

all input data or a representative portion or unit tensor or randomnumbers if no data are available To compute this value write downxmiexcl1

H as a vector with all mth degree permutations of fx1 x2 xd 1gand ZH as a matrix summing the outer product of these vectors StoreZH and Ziexcl1

H f for each neuron where f is a normalization factor (egequation 34)

3 Compute the winner for input x First compute vector xmiexcl1H from x as

in section 2 and then multiply it by the stored matrix Ziexcl1H f Winner

j is the one for which the modulus of the product vector is smallest

4 Output winner j

5 Update the winner by adding x2(miexcl1)H to ZH

(j) weighted by the learningcoefcient g(k) where k is the iteration counter Store ZH and ZH

iexcl1 f for the updated neuron

6 Go to step 3

Supervised Learning

1 Set layer size (number of neurons) to number of classes and selectneuron order (m) for a given problem

2 Train For each neuron compute Z(j)H D

Px2(miexcl1)

H where the sum istaken over all training points to be associated with that neuron Tocompute this value write down xH

miexcl1 as a vector with all mth degreepermutations of fx1 x2 xd 1g and ZH as a matrix summing theouter product of these vectors Store ZH

iexcl1 f for each neuron

3 Test Use test points to evaluate the quality of training

4 Use For each new input x rst compute vector xmiexcl1H from x as in

section 2 and then multiply it by the stored matrix Ziexcl1H f Winner j is

the one for which modulus of the product vector is smallest

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacinglower-dimensionality neurons and will enhance the modeling capability ofeach neuron Depending on the application of the net this may lead toimprovement or degradation of the overall performance of the layer due tothe added degrees of freedom This section provides some examples of animplementation at different orders in both supervised and unsupervisedlearning

First we discuss two examples of second-order (ellipsoidal) neuronsto show that our formulation is compatible with previous work on el-lipsoidal neural units (eg Mao amp Jain 1996) Figure 5 illustrates how

Clustering Irregular Shapes 2345

Figure 5 (andashc) Stages in self-classication of overlapping clusters after learning200 400 and 650 data points respectively (one epoch)

a competitive network consisting of three second-order neurons gradu-ally evolves to model a two-dimensional domain exhibiting three mainactivation areas with signicant overlap The second-order neurons arevisualized as ellipses with their boundaries drawn at 2s of the distribu-tion

The next test uses a two-dimensional distribution of three arbitrary clus-ters consisting of a total of 75 data points shown in Figure 6a The networkof three second-order neurons self-organized to map this data set and createsthe decision boundaries shown in Figure 6b The dark curves are the deci-sion boundaries and the light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts forthe different distribution variances

Figure 6 (a) Original data set (b)Self-organized map of the data set Dark curvesare the decision boundaries and light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts for thedifferent distribution variances

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup. In supervised mode we use the same training setup of equations 2.8 and 2.10, but train each neuron individually on the signals associated with it, using

Z_H^(j) = Σ x_H^(2(m-1)),   x_H ∈ ω^(j),

where in homogeneous coordinates x_H = [x; 1], and ω^(j) is the jth class.

In order to evaluate the performance of this technique, we trained each neuron on a random selection of 80% of the data points of its class and then tested the classification of the whole data set (including the remaining 20% for cross validation). The test determined the most active neuron for each input and compared it with the manual classification. This test was repeated 250 times, with the average and best results shown in Table 2, along with results reported for other methods. It appears that, on average, supervised learning for this task reaches an optimum with fifth-order neurons.

Table 2: Supervised Classification Results for Iris Data

Order | Epochs | Misclassified (Average) | Misclassified (Best)
3-hidden-layer neural network (Abe et al., 1997) | 1000 | 2.2 | 1
Fuzzy hyperbox (Abe et al., 1997) | | | 2
Fuzzy ellipsoids (Abe et al., 1997) | 1000 | | 1
Second-order supervised | 1 | 3.08 | 1
Third-order supervised | 1 | 2.27 | 1
Fourth-order supervised | 1 | 1.60 | 0
Fifth-order supervised | 1 | 1.07 | 0
Sixth-order supervised | 1 | 1.20 | 0
Seventh-order supervised | 1 | 1.30 | 0

Notes: Results after one training epoch with 20% cross validation. Averaged results are for 250 experiments.

1. Performance of third-order neurons was evaluated with a layer consisting of three neurons, randomly initialized with normal distribution within the input domain with mean and variance of the input, and with a learning factor of 0.3 decreasing asymptotically toward zero. Of 100 experiments carried out, 95 locked correctly on the clusters, with an average of 7.6 (5%) misclassifications after 30 epochs. Statistics on performance of cited works have not been reported. Download MATLAB demos from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.
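The evaluation protocol described above can be sketched as follows, reusing the classes given earlier. The array names X and y (the 150 four-dimensional Iris measurements and their integer labels 0-2) are assumptions, and the sketch makes no attempt to reproduce the exact figures of Table 2.

def evaluate_supervised(X, y, m=5, trials=250, train_frac=0.8, seed=0):
    # Train one order-m neuron per class on a random 80% of that class (one epoch),
    # then classify the whole data set, including the held-out 20%.
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        neurons = []
        for c in np.unique(y):                      # assumes labels are 0, 1, 2, ...
            pts = X[y == c]
            idx = rng.permutation(len(pts))[: int(train_frac * len(pts))]
            nrn = HighOrderNeuron(X.shape[1], m)
            nrn.train(pts[idx])
            neurons.append(nrn)
        preds = np.array([classify(neurons, x) for x in X])
        errors.append(int(np.sum(preds != y)))      # misclassifications over all points
    return np.mean(errors), np.min(errors)          # average and best, as reported in Table 2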

Figure 7 shows classification results using third-order neurons for an arbitrary distribution containing close and overlapping clusters.

Finally, Figure 8 shows an application of third- through fifth-order neurons to detection and modeling of chromosomes in a human cell. The corresponding separation maps and convergence error rates are shown in Figures 9 and 10, respectively.

Figure 7: Self-classification of synthetic point clusters using third-order neurons.


Figure 8: Chromosome separation. (a) Designated area. (b) Four third-order neurons self-organized to mark the chromosomes in the designated area. Data from Schrock et al. (1996).

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated their practical use for modeling the structure of spatial distributions and for geometric feature recognition. Although high-order neurons do not directly correspond to neurobiological details, we believe that they can provide powerful data modeling capabilities. In particular, they exhibit useful properties for correctly handling close and partially overlapping clusters. Furthermore, we have shown that ellipsoidal and tensorial clustering methods, as well as classic competitive neurons, are special cases of the general-shape neuron. We showed how lower-order information can be encapsulated using tensor representation in homogeneous coordinates, enabling systematic continuation to higher-order metrics. This formulation also allows for direct extraction of the encapsulated high-order information in the form of principal curves, represented by the eigentensors of each neuron.

The use of shapes as neurons raises some further practical questions, which have not been addressed in this article and require further research; they are listed below. Although most of these issues pertain to neural networks in general, the approach required here may be different.

Number of neurons. In the examples provided, we have explicitly used the number of neurons appropriate to the number of clusters. The question of network size is applicable to most neural architectures. However, our experiments showed that overspecification tends to cause clusters to split into subsections, while underspecification may cause clusters to unite.

Figure 9: Self-organization for chromosome separation. (a-d) Second- through fifth-order, respectively.

Initial weights. It is well known that sensitivity to initial weights is a general problem of neural networks (see, for example, Kolen & Pollack, 1990, and Pal et al., 1993). However, when third-order neurons were evaluated on the Iris data with random initial weights, they performed with an average of 5% misclassifications. Thus the capacity of neurons to be spread over volumes may provide new possibilities for addressing the initialization problem.

Computational cost. The need to invert a high-order tensor introduces a significant computational cost, especially when the input dimensionality is large. There are some factors that counterbalance this computational cost. First, the covariance tensor needs to be inverted only when a neuron is updated, not when it is activated or evaluated. When the neuron is updated, the inverted tensor is computed and then stored, and used whenever the neuron needs to compete or evaluate an input signal. This means that the whole layer will need only one inversion for each training point, and no inversions for nontraining operation. Second, because of the complexity of each neuron, sometimes fewer of them are required. Finally, relatively complex shapes can be attained with low orders (say, third order).

Figure 10: Average error rates in self-organization for the chromosome separation of Figure 9, respectively.

Selection of neuron order. As in many other networks, there is much domain knowledge about the problem that comes into the solution through the choice of parameters, such as the order of the neurons. However, we also estimate that due to the rather analytic representation of each neuron, and since high-order information is contained within each neuron independently, this information can be directly extracted (using the eigentensors); a brief sketch of this extraction follows the list below. Thus it is possible to decompose analytically a cross-shaped fourth-order cluster into two overlapping elliptic clusters, and vice versa.

Stability versus flexibility and enforcement of particular shapes. Further research is required to find methods for biasing neurons toward particular shapes, in an attempt to seek particular shapes and help neuron stability. For example, how would one encourage detection of triangles and not rectangles?

Nonpolynomial base functions. We intend to investigate the properties of shape layers based on nonpolynomial base functions. Noting that the high-order homogeneous tensors give rise to various degrees of polynomials, it is possible to derive such tensors with other base functions, such as wavelets, trigonometric functions, and other well-established kernel functions. Such an approach may enable modeling arbitrary geometrical shapes.
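As a pointer to the eigentensor extraction mentioned under "Selection of neuron order," the following fragment (an illustrative assumption built on the HighOrderNeuron sketch given with the algorithm summary, not the authors' implementation) reads principal-curve coefficients off the eigen-decomposition of a trained neuron's stored tensor.

import numpy as np

def principal_curves(neuron):
    # Eigen-decomposition of the symmetric flattened tensor Z_H: each eigenvector holds
    # the coefficients of one principal polynomial curve of the cluster, and its
    # eigenvalue measures the algebraic spread of the data about that curve.
    eigenvalues, eigenvectors = np.linalg.eigh(neuron.Z)
    return eigenvalues, eigenvectors.T              # rows are curve-coefficient vectors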

Acknowledgments

We thank Eitan Domani for his helpful comments and insight. This work was supported in part by the US-Israel Binational Science Foundation (BSF), the Israeli Ministry of Arts and Sciences, and the Fund for Promotion of Research at the Technion. H. L. acknowledges the generous support of the Charles Clore Foundation and the Fischbach Postdoctoral Fellowship. We thank Applied Spectral Imaging for supplying the data for the experiments described in Figures 8 through 10.

Download further information, MATLAB demos, and implementation details of this work from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

References

Abe, S., & Thawonmas, R. (1997). A fuzzy classifier with ellipsoidal regions. IEEE Transactions on Fuzzy Systems, 5, 358–368.

Anderson, E. (1939). The irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2–5.

Balestrino, A., & Verona, B. F. (1994). New adaptive polynomial neural network. Mathematics and Computers in Simulation, 37, 189–194.

Bishop, C. M. (1997). Neural networks for pattern recognition. Oxford: Clarendon Press.

Blatt, M., Wiseman, S., & Domany, E. (1996). Superparamagnetic clustering of data. Physical Review Letters, 76(18), 3251–3254.


Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600–611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193–199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511–513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761–766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers Chem. Eng., 17, 765–784.

Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269–280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation, Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29–60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel-Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling. Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16–29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks, IJCNN'91, Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549–557.


Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494–497.

Received August 24, 1998; accepted September 3, 1999.

Page 4: Clustering Irregular Shapes Using High-Order Neurons

2334 H Lipson and H T Siegelmann

Figure 2 Some examples of single neuron shapes computed for the given clus-ters (after one epoch) (andashf) Orders two to seven respectively

In general the shape restriction of classic neurons is relaxed by replacingthe weight vector or covariance matrix of a classic neuron with a generalhigh-order tensor which can capture multilinear correlations among thesignals associated with the neurons This also permits capturing shapeswith holes andor detached areas as in Figure 3 We also use the concept ofhomogeneous coordinates to combine correlations of different orders intoa single tensor

Although carrying the same name our notion of high-order neurons isdifferent from the one used for example by Giles Griffen and Maxwell(1988) Pollack (1991) Goudreau Giles Chakradhar amp Chen (1994) andBalestrino and Verona (1994) There high-order neurons are those thatreceive multiplied combinations of inputs rather than individual inputsThose neurons were also not used for clustering or in the framework ofcompetitive learning It appears that high-order neurons were generallyabandoned mainly due to the difculty in training such neurons and theirinherent instability due to the fact that small variations in weight may causea large variation in the output Some also questioned the usefulness of high-order statistics for improvingpredictions (Myung Levy amp William1992) Inthis article we demonstrate a different notion of high-order neurons where

Clustering Irregular Shapes 2335

Figure 3 A shape neuron can have nonsimple topology (a) A single fth-orderneuron with noncompact shape including two ldquoholesrdquo (b)A single fourth-orderneuron with three detached active zones

the neurons are tensors and their order pertains to their internal tenso-rial order rather than to the degree of the inputs Our high-order neuronsexhibit good stability and excellent training performance with simple Heb-bian learning Furthermore we believe that capturing high-order informa-tion within a single neuron facilitates explicit manipulation and extractionof this information General high-order neurons have not been used forclustering or competitive learning in this manner

Section 2 derives the tensorial formulation of a single neuron startingwith a simple second-order (elliptical) metric moving into homogeneouscoordinates and then increasing order for a general shape Section 3 dis-cusses the learning capacity of such neurons and their functionality withina layer Finally section 4 demonstrates an implementation of the proposedneurons on synthetic and real data The article concludes with some remarkson possible future extensions

2 A ldquoGeneral Shaperdquo Neuron

We adopt the following notation

x w column vectorsW D matricesx(j) D(j) the jth vector matrix corresponding to the jth neuron classxi Dij element of a vectormatrixxH DH vector matrix in homogeneous coordinates

2336 H Lipson and H T Siegelmann

m order of the high-order neurond dimensionality of the inputN size of the layer (the number of neurons)

The modeling constraints imposed by the maximum correlation match-ing criterion stem from the fact that the neuronrsquos synaptic weight w(j) hasthe same dimensionality as the input x that is the same dimensionality asa single point in the input domain while in fact the neuron is modelinga cluster of points which may have higher-order attributes such as direc-tionality and curvature We shall therefore refer to the classic neuron as arst-order (zero degree) neuron due to its correspondence to a point inmultidimensional input space

To circumvent this restriction we augment the neuron with the capacityto map additional geometric and topological information by increasing itsorder For example as the rst-order is a pointneuron the second-order casewill correspond to orientation and size components effectively attaching alocal oriented coordinate system with nonuniform scaling to the neuroncenter and using it to dene a new distance metric Thus each high-orderneuron will represent not only the mean value of the data points in thecluster it is associated with but also the principal directions curvaturesand other features of the cluster and the variance of the data points alongthese directions Intuitively we can say that rather than dening a spherethe high-order distance metric now denes a multidimensional-orientedshape with an adaptive shape and topology (see Figures 2 and 3)

In the following pages we briey review the ellipsoidal metric and es-tablish the basic concepts of encapsulation and base functions to be usedin higher-order cases where tensorial notation becomes more difcult tofollow

For a second-order neuron dene the matrix D(j) 2 Rdpoundd to encapsulatethe direction and size properties of a neuron for each dimension of theinput where d is the number of dimensions Each row i of D(j) is a unit rowvector vT

i corresponding to a principal direction of the cluster divided bythe standard deviation of the cluster in that direction (the square root ofthe variance si) The Euclidean metric can now be replaced by the neuron-dependent metric

dist(x j) DregregD(j) iexcl

x iexcl w (j)centregreg

where

D (j) D

266666664

1ps1

0 cent cent cent 0

0 1ps2

00 cent cent cent 0 1p

sd

377777775

2666664

vT1

vT2

vTd

3777775

(21)

Clustering Irregular Shapes 2337

such that deviations are normalized with respect to the variance of the datain the direction of the deviation The new matching criterion based on thenew metric becomes

i(x) D argj minregregregdist(x)(j)

regregreg

D argj minregregregD (j)

plusmnx iexcl w (j)

sup2regregreg j D 1 2 N (22)

The neuron is now represented by the pair (D(j) w(j)) where D(j) repre-sents rotation and scaling and w(j) represents translation The values of D(j)

can be obtained from an eigenstructure analysis of the correlation matrixR(j) D Sx(j)x(j)T 2 Rdpoundd of the data associated with neuron j Two well-known properties are derived from the eigenstructure of principal compo-nent analysis (1) the eigenvectors of a correlation matrix R pertaining tothe zero mean data vector x dene the unit vectors vi representing the prin-cipal directions along which the variances attain their extremal values and(2) the associated eigenvalues dene the extremal values of the variancessi Hence

D (j) D l(j)iexcl 1

2 V(j) (23)

where l (j) D diag(R(j)) produces the eigenvalues of R(j) in a diagonal formand V(j) D eig(R(j)) produces the unit eigenvectors of R(j) in matrix formAlso by denition of eigenvalue analysis of a general matrix A

AV D lV

and hence

A D VTlV (24)

Now since a term kak equals aTa we can expand the term kD(j)(x iexcl w(j))kof equation 22 as follows

regregregD (j)plusmn

x iexcl w (j)sup2regregreg D

plusmnx iexcl w (j)

sup2TD (j)TD (j)

plusmnx iexcl w (j)

sup2 (25)

Substituting equations 12 and 23 we see that

D (j)TD (j) Dpoundv1 v2 vd

curren

26666664

1s1

0 cent cent cent 0

0 1s2

00 cent cent cent 0 1

sd

37777775

2666664

vT1

vT2

vTd

3777775

D R(j)iexcl1 (26)

2338 H Lipson and H T Siegelmann

meaning that the scaling and rotation matrix is in fact the inverse of R(j)Hence

i(x) D argj minmicroplusmn

x iexcl w (j)sup2T

Riexcl1plusmn

x iexcl w (j)sup2para

j D 1 2 N (27)

which corresponds to the term used by Gustafson and Kessel (1979) and inthe maximum likelihood gaussian classier (Duda amp Hart 1973)

Equation 27 involves two explicit terms R(j) and w(j) representing theorientation and size information respectively Implementations involvingellipsoidal metrics therefore need to track these two quantities separately ina two-stage step Separate tracking however cannot be easily extended tohigher orders which requires simultaneous tracking of additional higher-order qualities such as curvatures We therefore use an equivalent represen-tation that encapsulates these terms into one and permits a more systematicextension to higher orders

We combine these two coefcients into one expanded covariance denotedRH

(j) in homogenous coordinates (Faux amp Pratt 1981) as

RH(j) D Sx

(j)H x

(j)H

T (28)

where in homogenous coordinatesx(j)H 2R(dC1)pound1 and is dened by xH D

microxndash1

para

In this notation the expanded covariance also contains coordinate per-mutations involving 1 as one of the multiplicands and so different (lower)orders are introduced Consequently RH

(j) 2 R(dC1)pound(dC1) In analogy to thecorrelation matrix memoryand autoassociative memory(Haykin 1994) theextended matrix RH

(j) in its general form can be viewed as a homogeneousautoassociative tensor Now the new matching criterion becomes simply

i(x) D argj minmicro

XTR(j)H

iexcl1X

para

D argj minregregregregR

(j)H

iexcl 12 X

regregregreg

D argj minregregregregR

(j)H

iexcl1X

regregregreg j D 1 2 N (29)

This representation retains the notion of maximum correlation and for con-venience we now denote RH

iexcl1(j) as the synaptic tensorWhen we moved into homogenous coordinates we gained the following

The eigenvectors of the original covariance matrix R(j) representeddirections The eigenvectors of the homogeneous covariance matrixRH

(j) correspond to both the principal directions and the offset (both

Clustering Irregular Shapes 2339

second-order and rst-order) properties of the cluster accumulatedin RH

(j) In fact the elements of each eigenvector correspond to thecoefcient of the line equation ax C by C cent cent cent C c D 0 This propertypermits direct extension to higher-order tensors in the next sectionwhere direct eigenstructure analysis is not well dened

The transition to homogeneous coordinates dispensed with the prob-lem created by the fact that the eigenstructure properties cited abovepertained only to zero-mean data whereas RH

(j) is dened as the co-variance matrix of non-zero-mean data By using homogeneous coor-dinates we effectively consider the translational element of the clusteras yet another (additional) dimensions In this higher space the datacan be considered to be zero mean

21 Higher Orders The ellipsoidal neuron can capture only linear di-rectionality not curved directionalities such as that appearing for instancein a fork in a curved or in a sharp-cornered shape In order to capturemore complex spatial congurations higher-order geometries must be in-troduced

The classic neuron possesses a synaptic weight vector w(j) which cor-responds to a point in the input domain The synaptic weight w(j) can beseen as the rst-order average of its signals that is w(j) D

PxH (the last

element xHdC1 D 1 of the homogeneous coordinates has no effect in thiscase) Ellipsoidal units hold information regarding the linear correlationsamong the coordinates of data points represented by the neuron by usingRH

(j) DP

xHxHT Each element of RH is thus a proportionality constant

relating two specic dimensions of the cluster The second-order neuroncan therefore be regarded as a second-order approximation of the corre-sponding data distribution Higher-order relationships can be introducedby considering correlations among more than just two variables We mayconsequently introduce a neuron capable of modeling a d-dimensional datacluster to an mth-order approximation

Higher-order correlations are obtained by multiplying the generatingvector xH by itself in successive outer products more than just once Theprocess yields a covariance tensor rather than just a covariance matrix(which is a second-order tensor) The resulting homogeneous covariancetensor is thus represented by a 2(m iexcl 1)-order tensor of rank d C 1 denotedby Z

(j)H 2 R(dC1)poundcentcentcentpound(dC1) obtained by 2(m iexcl1) outer products of xH by itself

Z(j)H D x2(miexcl1)

H (210)

The exponent consists of the factor (m iexcl 1) which is the degree of the ap-proximation and a factor of 2 since we are performing autocorrelation sothe function is multiplied by itself In analogy to reasoning that leads toequation 29 each eigenvector (now an eigentensor of order m iexcl 1) corre-

2340 H Lipson and H T Siegelmann

Figure 4 Metrics of various orders (a) a regular neuron (spherical metric) (b) aldquosquarerdquo neuron and (c) a neuron with ldquotwo holesrdquo

sponds to the coefcients of a principal curve and multiplying it by inputpoints produces an approximation of the distance of that input point fromthe curve Consequently the inverse of the tensor Z

(j)H can then be used to

compute the high-order correlation of the signal with the nonlinear shapeneuron by simple tensor multiplication

i(x) D argi minregregregregZ

(j)H

iexcl1shyxmiexcl1

H

regregregreg j iexcl 1 2 N (211)

where shydenotes tensor multiplication Note that the covariance tensor canbe inverted only if its order is an even number as satised by equation 210Note also that the amplitude operator is carried out by computing the rootof the sum of the squares of the elements of the argument The computedmetric is now not necessarily spherical and may take various other formsas plotted in Figure 4

In practice however high-order tensor inversion is not directly requiredTo make this analysis simpler we use a Kronecker notation for tensor prod-ucts (see Graham 1981) Kronecker tensor product attens out the elementsof X shyY into a large matrix formed by taking all possible products betweenthe elements of X and those of Y For example if X is a 2pound3 matrix thenXshyY is

X shyY D

Y cent X11 Y cent X12 Y cent X13- - - - - - - - - - - - - - - - - - - -Y cent X21 Y cent X22 Y cent X23

(212)

where each block is a matrix of the size of Y In this notation the internalstructure of higher-order tensors is easier to perceive and their correspon-dence to linear regression of principal polynomial curves is revealed Con-sider for example a fourth-order covariance tensor of the vector x D fx yg

Clustering Irregular Shapes 2341

The fourth-order tensor corresponds to the simplest nonlinear neuron ac-cording to equation 210 and takes the form of the 2pound2pound2pound2 tensor

laquox shyx shyx shyx

notD

laquoR shyR

notD

x2 xyxy y2

shy

x2 xyxy y2

D

266664

x4 x3y x3y x2y2

x3y x2y2 x2y2 xy3

- - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y2 xy3

x2y2 xy3 xy3 y4

377775

(213)

The homogeneous version of this tensor also includes all lower-order per-mutations of the coordinates of xH D fx y 1g namely the 3pound3pound3pound3 tensor

ZH(4) DlaquoxH xH xH xH

notD

laquoRH RH

notD

264

x2 xy xxy y2 yx y 1

375shy

264

x2 xy xxy y2 yx y 1

375

D

26666666666666664

x4 x3y x3 x3y x2y2 x2y x3 x2y x2

x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx3 x2y x2 x2y xy2 xy x2 xy x

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx2y2 xy3 xy2 xy3 y4 y3 xy2 y3 y2

x2y xy2 xy xy2 y3 y2 xy y2 y- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3 x2y x2 x2y xy2 xy x2 xy xx2y xy2 xy xy2 y3 y2 xy y2 yx2 xy x xy y2 y x y 1

37777777777777775

(214)

Remark It is immediately apparent that the matrix in equation 214 corre-sponds to the matrix to be solved for nding a least-squares t of a conicsection equation ax2 C by2 C cxy C dxC eyC f D 0 to the data points Moreoverthe set of eigenvectors of this matrix corresponds to the coefcients of theset of mutually orthogonal best-t conic section curves that are the principalcurves of the data This notion adheres with Gnanadesikanrsquos (1977) methodfor nding principal curves Substitution of a data point into the equation ofa principal curve yields an approximation of the distance of the point fromthat curve and the sum of squared distances amounts to equation 211 Notethat each time we increase order we are seeking a set of principal curves ofone degree higher This implies that the least-squares matrix needs to be twodegrees higher (because it is minimizing the squared error) thus yieldingthe coefcient 2 in the exponent of equation 210

2342 H Lipson and H T Siegelmann

3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classicneurons we show that they are also subject to simple Hebbian learning Fol-lowing an interpretation of Hebbrsquos postulate of learning synaptic modica-tion (ie learning) occurs when there is a correlation between presynapticand postsynaptic activities We have already shown in equation 29 that thepresynaptic activity xH and postsynaptic activity of neuron j coincide when

the synapse is strong that is RH(j)iexcl1

xH is minimum We now proceed toshow that in accordance with Hebbrsquos postulate of learning it is sufcient toincur self-organization of the neurons by increasing synapse strength whenthere is a coincidence of presynaptic and postsynaptic signals

We need to show how self-organization is obtained merely by increasingRH

(j) where j is the winning neuron As each new data point arrives at aspecic neuron j in the net the synaptic weight of that neuron is adaptedby the incremental corresponding to Hebbian learning

ZH(j)(k C 1) D ZH

(j)(k) C g(k)x(j)H

2(miexcl1) (31)

where k is the iteration counter and g(k) is the iteration-dependent learn-ing rate coefcient It should be noted that in equation 210 Z

(j)H becomes a

weighted sum of the input signals x2(miexcl1)H (unlike RH in equation 28 which

is a uniform sum) The eigenstructure analysis of a weighted sum still pro-vides the principal components under the assumption that the weights areuniformly distributed over the cluster signals This assumption holds trueif the process generating the signals of the cluster is stable over time a basicassumption of all neural network algorithms (Bishop 1997) In practice thisassumption is easily acceptable as will be demonstrated in the followingsections

In principle continuous updating of the covariance tensor may createinstability in a competitive environment as a winner neuron becomes in-creasingly dominant To force competition the covariance tensor can benormalized using any of a number of factors dependent on the applicationsuch as the number of signals assigned to the neuron so far or the distri-bution of data among the neuron (forcing uniformity) However a morenatural factor for normalization is the size and complexity of the neuron

A second-order neuron is geometrically equivalent to a d-dimensionalellipsoid A neuron may be therefore normalized by a factor V proportionalto the ellipsoid volume

V ps1 cent s2 cent cent cent cent cent sd D

dY

iD1

psi 1q

detshyshyR(j)

shyshy (32)

During the competition phase the factors R(j)x V are compared rather than

Clustering Irregular Shapes 2343

just R(j)x thus promoting smaller neurons and forcing neurons to becomeequally ldquofatrdquo The nal matching criteria for a homogeneous representationare therefore given by

j(x) D argj minregregreg ORiexcl1

H(j)xH

regregreg j D 1 2 N (33)

where

OR(j)H D

R(j)Hr

detshyshyshyR(j)

H

shyshyshy (34)

Covariance tensors of higher-order neurons can also be normalized bytheir determinant However unlike second-order neurons higher-ordershapes are not necessarily convex and hence their volume is not neces-sarily proportional to their determinant The determinant of higher-ordertensors therefore relates to more complex characteristics which include thedeviation of the data points from the principal curves and the curvatureof the boundary Nevertheless our experiments showed that this is a sim-ple and effective means of normalization that promotes small and simpleshapes over complex or large concave ttings to data The relationship be-tween geometric shape and the tensor determinant should be investigatedfurther for more elaborate use of the determinant as a normalization crite-rion

Finally in order to determine the likelihood of a pointrsquos being associatedwith a particular neuron we need to assume a distribution function As-suming gaussian distribution the likelihood p(j)(xH ) of data point xH beingassociated with cluster j is proportional to the distance from the neuron andis given by

p(j)(xH) D1

(2p )d 2eiexcl 1

2

regregOZ( j) XH

regreg2

(35)

4 Algorithm Summary

To conclude the previous sections we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervisedlearning These are the algorithms that were used in the test cases describedlater in section 5

Unsupervised Competitive learning

1 Select layer size (number of neurons N) and neuron order (m) for agiven problem

2344 H Lipson and H T Siegelmann

2 Initialize all neurons with ZH DP

x2(miexcl1)H where the sum is taken over

all input data or a representative portion or unit tensor or randomnumbers if no data are available To compute this value write downxmiexcl1

H as a vector with all mth degree permutations of fx1 x2 xd 1gand ZH as a matrix summing the outer product of these vectors StoreZH and Ziexcl1

H f for each neuron where f is a normalization factor (egequation 34)

3 Compute the winner for input x First compute vector xmiexcl1H from x as

in section 2 and then multiply it by the stored matrix Ziexcl1H f Winner

j is the one for which the modulus of the product vector is smallest

4 Output winner j

5 Update the winner by adding x2(miexcl1)H to ZH

(j) weighted by the learningcoefcient g(k) where k is the iteration counter Store ZH and ZH

iexcl1 f for the updated neuron

6 Go to step 3

Supervised Learning

1 Set layer size (number of neurons) to number of classes and selectneuron order (m) for a given problem

2 Train For each neuron compute Z(j)H D

Px2(miexcl1)

H where the sum istaken over all training points to be associated with that neuron Tocompute this value write down xH

miexcl1 as a vector with all mth degreepermutations of fx1 x2 xd 1g and ZH as a matrix summing theouter product of these vectors Store ZH

iexcl1 f for each neuron

3 Test Use test points to evaluate the quality of training

4 Use For each new input x rst compute vector xmiexcl1H from x as in

section 2 and then multiply it by the stored matrix Ziexcl1H f Winner j is

the one for which modulus of the product vector is smallest

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacinglower-dimensionality neurons and will enhance the modeling capability ofeach neuron Depending on the application of the net this may lead toimprovement or degradation of the overall performance of the layer due tothe added degrees of freedom This section provides some examples of animplementation at different orders in both supervised and unsupervisedlearning

First we discuss two examples of second-order (ellipsoidal) neuronsto show that our formulation is compatible with previous work on el-lipsoidal neural units (eg Mao amp Jain 1996) Figure 5 illustrates how

Clustering Irregular Shapes 2345

Figure 5 (andashc) Stages in self-classication of overlapping clusters after learning200 400 and 650 data points respectively (one epoch)

a competitive network consisting of three second-order neurons gradu-ally evolves to model a two-dimensional domain exhibiting three mainactivation areas with signicant overlap The second-order neurons arevisualized as ellipses with their boundaries drawn at 2s of the distribu-tion

The next test uses a two-dimensional distribution of three arbitrary clus-ters consisting of a total of 75 data points shown in Figure 6a The networkof three second-order neurons self-organized to map this data set and createsthe decision boundaries shown in Figure 6b The dark curves are the deci-sion boundaries and the light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts forthe different distribution variances

Figure 6 (a) Original data set (b)Self-organized map of the data set Dark curvesare the decision boundaries and light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts for thedifferent distribution variances

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup In supervisedmode we use the same training setup of equations 28 and 210 but traineach neuron individually on the signals associated with it using

Z(j)H D Sx2(miexcl1)

H xH 2 ordf(j)

where in homogeneous coordinates xH Dhxndash1

iand ordf(j) is the jth class

In order to evaluate the performance of this technique we trained eachneuron on a random selection of 80 of the data points of its class andthen tested the classication of the whole data set (including the remain-ing 20 for cross validation) The test determined the most active neuron

1 Performance of third-order neurons was evaluated with a layer consisting of threeneurons randomly initialized with normal distribution within the input domain withmean and variance of the input and with learning factor of 03 decreasing asymp-totically toward zero Of 100 experiments carried out 95 locked correctly on theclusters with an average 76 (5) misclassications after 30 epochs Statistics on per-formance of cited works have not been reported Download MATLAB demos fromhttpwwwcsbrandeisedu lipsonpapersgeomhtm

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe S amp Thawonmas R (1997) A fuzzy classier with ellipsoidal regionsIEEE Trans on Fuzzy Systems 5 358ndash368

Anderson E (1939) The Irises of the Gaspe Peninsula Bulletin of the AmericanIris Society 59 2ndash5

Balestrino A amp Verona B F (1994) New adaptive polynomial neural networkMathematics and Computers in Simulation 37 189ndash194

Bishop C M (1997) Neural networks for pattern recognition Oxford Claren-don Press

Blatt M Wiseman S amp Domany E (1996) Superparamagnetic clustering ofdata Physical Review Letters 7618 3251ndash3254

2352 H Lipson and H T Siegelmann

Dave R N (1989) Use of the adaptive fuzzy clustering algorithm to detect linesin digital images In Proc SPIE Conf Intell Robots and Computer Vision 1192600ndash611

Duda R O amp Hart P E (1973) Pattern classication and scene analysis NewYork Wiley

Faux I D amp Pratt M J (1981) Computational geometry for design and manufac-ture New York Wiley

Frigui H amp Krishnapuram R (1996) A comparison of fuzzy shell-clusteringmethods for the detection of ellipses IEEE Transactions on Fuzzy Systems 4193ndash199

Giles C L Grifn R D amp Maxwell T (1988) Encoding geometric invariancesin higher-order neural networks In D Z Anderson (Ed) Neural informa-tion processing systems American Institute of Physics Conference Proceed-ings

Gnanadesikan R (1977) Methods for statistical data analysis of multivariate obser-vations New York Wiley

Goudreau M W Giles C L Chakradhar S T amp Chen D (1994) First-orderversus second-order single layer recurrent neural networks IEEE Transactionson Neural Networks 5 511ndash513

Graham A (1981) Kronecker products and matrix calculus With applications NewYork Wiley

Gustafson E E amp Kessel W C (1979) Fuzzy clustering with fuzzy covariancematrix In Proc IEEE CDC (pp 761ndash766) San Diego CA

Haykin S (1994) Neural networksA comprehensive foundation Englewood CliffsNH Prentice Hall

Kavuri S N amp Venkatasubramanian V (1993) Using fuzzy clustering withellipsoidal units in neural networks for robust fault classication ComputersChem Eng 17 765ndash784

Kohonen T (1997) Self organizing maps Berlin Springer-VerlagKolen J F amp Pollack J B (1990) Back propagation is sensitive to initial condi-

tions Complex Systems 4 269ndash280Krishnapuram R Frigui H amp Nasraoui O (1995) Fuzzy and possibilistic

shell clustering algorithms and their application to boundary detection andsurface approximationmdashParts I and II IEEE Transactions on Fuzzy Systems 329ndash60

Lipson H Hod Y amp Siegelmann H T (1998) High-order clustering metricsfor competitive learning neural networks In Proceedings of the Israel-KoreaBi- National Conference on New Themes in Computer Aided Geometric ModelingTel-Aviv Israel

Mao J amp Jain A (1996) A self-organizing network for hyperellipsoidal clus-tering (HEC) IEEE Transactions on Neural Networks 7 16ndash29

Myung I Levy J amp William B (1992) Are higher order statistics better formaximum entropy prediction in neural networks In Proceedings of the IntJoint Conference on Neural Networks IJCNNrsquo91 Seattle WA

Pal N Bezdek J C amp Tsao E C-K (1993) Generalized clustering networksand Kohonenrsquos self-organizing scheme IEEE Transactions on Neural Networks4 549ndash557

Clustering Irregular Shapes 2353

Pollack J B (1991) The induction of dynamical recognizers Machine Learning7 227ndash252

Schrock E duManoir S Veldman T Schoell B Wienberg J Feguson SmithM A Ning Y Ledbetter D H BarAm I Soenksen D Garini Y amp RiedT (1996) Multicolor spectral karyotyping of human chromosomes Science273 494ndash497

Received August 24 1998 accepted September 3 1999

Page 5: Clustering Irregular Shapes Using High-Order Neurons

Clustering Irregular Shapes 2335

Figure 3 A shape neuron can have nonsimple topology (a) A single fth-orderneuron with noncompact shape including two ldquoholesrdquo (b)A single fourth-orderneuron with three detached active zones

the neurons are tensors and their order pertains to their internal tenso-rial order rather than to the degree of the inputs Our high-order neuronsexhibit good stability and excellent training performance with simple Heb-bian learning Furthermore we believe that capturing high-order informa-tion within a single neuron facilitates explicit manipulation and extractionof this information General high-order neurons have not been used forclustering or competitive learning in this manner

Section 2 derives the tensorial formulation of a single neuron startingwith a simple second-order (elliptical) metric moving into homogeneouscoordinates and then increasing order for a general shape Section 3 dis-cusses the learning capacity of such neurons and their functionality withina layer Finally section 4 demonstrates an implementation of the proposedneurons on synthetic and real data The article concludes with some remarkson possible future extensions

2 A ldquoGeneral Shaperdquo Neuron

We adopt the following notation

x w column vectorsW D matricesx(j) D(j) the jth vector matrix corresponding to the jth neuron classxi Dij element of a vectormatrixxH DH vector matrix in homogeneous coordinates

2336 H Lipson and H T Siegelmann

m order of the high-order neurond dimensionality of the inputN size of the layer (the number of neurons)

The modeling constraints imposed by the maximum correlation match-ing criterion stem from the fact that the neuronrsquos synaptic weight w(j) hasthe same dimensionality as the input x that is the same dimensionality asa single point in the input domain while in fact the neuron is modelinga cluster of points which may have higher-order attributes such as direc-tionality and curvature We shall therefore refer to the classic neuron as arst-order (zero degree) neuron due to its correspondence to a point inmultidimensional input space

To circumvent this restriction we augment the neuron with the capacityto map additional geometric and topological information by increasing itsorder For example as the rst-order is a pointneuron the second-order casewill correspond to orientation and size components effectively attaching alocal oriented coordinate system with nonuniform scaling to the neuroncenter and using it to dene a new distance metric Thus each high-orderneuron will represent not only the mean value of the data points in thecluster it is associated with but also the principal directions curvaturesand other features of the cluster and the variance of the data points alongthese directions Intuitively we can say that rather than dening a spherethe high-order distance metric now denes a multidimensional-orientedshape with an adaptive shape and topology (see Figures 2 and 3)

In the following pages we briey review the ellipsoidal metric and es-tablish the basic concepts of encapsulation and base functions to be usedin higher-order cases where tensorial notation becomes more difcult tofollow

For a second-order neuron dene the matrix D(j) 2 Rdpoundd to encapsulatethe direction and size properties of a neuron for each dimension of theinput where d is the number of dimensions Each row i of D(j) is a unit rowvector vT

i corresponding to a principal direction of the cluster divided bythe standard deviation of the cluster in that direction (the square root ofthe variance si) The Euclidean metric can now be replaced by the neuron-dependent metric

dist(x j) DregregD(j) iexcl

x iexcl w (j)centregreg

where

D (j) D

266666664

1ps1

0 cent cent cent 0

0 1ps2

00 cent cent cent 0 1p

sd

377777775

2666664

vT1

vT2

vTd

3777775

(21)

Clustering Irregular Shapes 2337

such that deviations are normalized with respect to the variance of the datain the direction of the deviation The new matching criterion based on thenew metric becomes

i(x) D argj minregregregdist(x)(j)

regregreg

D argj minregregregD (j)

plusmnx iexcl w (j)

sup2regregreg j D 1 2 N (22)

The neuron is now represented by the pair (D(j) w(j)) where D(j) repre-sents rotation and scaling and w(j) represents translation The values of D(j)

can be obtained from an eigenstructure analysis of the correlation matrixR(j) D Sx(j)x(j)T 2 Rdpoundd of the data associated with neuron j Two well-known properties are derived from the eigenstructure of principal compo-nent analysis (1) the eigenvectors of a correlation matrix R pertaining tothe zero mean data vector x dene the unit vectors vi representing the prin-cipal directions along which the variances attain their extremal values and(2) the associated eigenvalues dene the extremal values of the variancessi Hence

D (j) D l(j)iexcl 1

2 V(j) (23)

where l (j) D diag(R(j)) produces the eigenvalues of R(j) in a diagonal formand V(j) D eig(R(j)) produces the unit eigenvectors of R(j) in matrix formAlso by denition of eigenvalue analysis of a general matrix A

AV D lV

and hence

A D VTlV (24)

Now since a term kak equals aTa we can expand the term kD(j)(x iexcl w(j))kof equation 22 as follows

regregregD (j)plusmn

x iexcl w (j)sup2regregreg D

plusmnx iexcl w (j)

sup2TD (j)TD (j)

plusmnx iexcl w (j)

sup2 (25)

Substituting equations 12 and 23 we see that

D (j)TD (j) Dpoundv1 v2 vd

curren

26666664

1s1

0 cent cent cent 0

0 1s2

00 cent cent cent 0 1

sd

37777775

2666664

vT1

vT2

vTd

3777775

D R(j)iexcl1 (26)

2338 H Lipson and H T Siegelmann

meaning that the scaling and rotation matrix is in fact the inverse of R(j)Hence

i(x) D argj minmicroplusmn

x iexcl w (j)sup2T

Riexcl1plusmn

x iexcl w (j)sup2para

j D 1 2 N (27)

which corresponds to the term used by Gustafson and Kessel (1979) and inthe maximum likelihood gaussian classier (Duda amp Hart 1973)

Equation 27 involves two explicit terms R(j) and w(j) representing theorientation and size information respectively Implementations involvingellipsoidal metrics therefore need to track these two quantities separately ina two-stage step Separate tracking however cannot be easily extended tohigher orders which requires simultaneous tracking of additional higher-order qualities such as curvatures We therefore use an equivalent represen-tation that encapsulates these terms into one and permits a more systematicextension to higher orders

We combine these two coefcients into one expanded covariance denotedRH

(j) in homogenous coordinates (Faux amp Pratt 1981) as

RH(j) D Sx

(j)H x

(j)H

T (28)

where in homogenous coordinatesx(j)H 2R(dC1)pound1 and is dened by xH D

microxndash1

para

In this notation the expanded covariance also contains coordinate per-mutations involving 1 as one of the multiplicands and so different (lower)orders are introduced Consequently RH

(j) 2 R(dC1)pound(dC1) In analogy to thecorrelation matrix memoryand autoassociative memory(Haykin 1994) theextended matrix RH

(j) in its general form can be viewed as a homogeneousautoassociative tensor Now the new matching criterion becomes simply

i(x) D argj minmicro

XTR(j)H

iexcl1X

para

D argj minregregregregR

(j)H

iexcl 12 X

regregregreg

D argj minregregregregR

(j)H

iexcl1X

regregregreg j D 1 2 N (29)

This representation retains the notion of maximum correlation and for con-venience we now denote RH

iexcl1(j) as the synaptic tensorWhen we moved into homogenous coordinates we gained the following

The eigenvectors of the original covariance matrix R(j) representeddirections The eigenvectors of the homogeneous covariance matrixRH

(j) correspond to both the principal directions and the offset (both

Clustering Irregular Shapes 2339

second-order and rst-order) properties of the cluster accumulatedin RH

(j) In fact the elements of each eigenvector correspond to thecoefcient of the line equation ax C by C cent cent cent C c D 0 This propertypermits direct extension to higher-order tensors in the next sectionwhere direct eigenstructure analysis is not well dened

The transition to homogeneous coordinates dispensed with the prob-lem created by the fact that the eigenstructure properties cited abovepertained only to zero-mean data whereas RH

(j) is dened as the co-variance matrix of non-zero-mean data By using homogeneous coor-dinates we effectively consider the translational element of the clusteras yet another (additional) dimensions In this higher space the datacan be considered to be zero mean

21 Higher Orders The ellipsoidal neuron can capture only linear di-rectionality not curved directionalities such as that appearing for instancein a fork in a curved or in a sharp-cornered shape In order to capturemore complex spatial congurations higher-order geometries must be in-troduced

The classic neuron possesses a synaptic weight vector w(j) which cor-responds to a point in the input domain The synaptic weight w(j) can beseen as the rst-order average of its signals that is w(j) D

PxH (the last

element xHdC1 D 1 of the homogeneous coordinates has no effect in thiscase) Ellipsoidal units hold information regarding the linear correlationsamong the coordinates of data points represented by the neuron by usingRH

(j) DP

xHxHT Each element of RH is thus a proportionality constant

relating two specic dimensions of the cluster The second-order neuroncan therefore be regarded as a second-order approximation of the corre-sponding data distribution Higher-order relationships can be introducedby considering correlations among more than just two variables We mayconsequently introduce a neuron capable of modeling a d-dimensional datacluster to an mth-order approximation

Higher-order correlations are obtained by multiplying the generating vector x_H by itself in successive outer products more than just once. The process yields a covariance tensor, rather than just a covariance matrix (which is a second-order tensor). The resulting homogeneous covariance tensor is thus represented by a 2(m-1)-order tensor of rank d+1, denoted by Z_H^{(j)} \in R^{(d+1)\times \cdots \times(d+1)}, obtained by 2(m-1) outer products of x_H by itself,

Z_H^{(j)} = x_H^{\otimes 2(m-1)}.   (2.10)

The exponent consists of the factor (m-1), which is the degree of the approximation, and a factor of 2, since we are performing autocorrelation, so the function is multiplied by itself. In analogy to the reasoning that leads to equation 2.9, each eigenvector (now an eigentensor of order m-1) corresponds to the coefficients of a principal curve, and multiplying it by input points produces an approximation of the distance of that input point from the curve. Consequently, the inverse of the tensor Z_H^{(j)} can then be used to compute the high-order correlation of the signal with the nonlinear shape neuron by simple tensor multiplication,

i(x) = \arg\min_j \left\| Z_H^{(j)-1} \otimes x_H^{m-1} \right\|,   j = 1, 2, ..., N,   (2.11)

where \otimes denotes tensor multiplication. Note that the covariance tensor can be inverted only if its order is an even number, as satisfied by equation 2.10. Note also that the amplitude operator is carried out by computing the root of the sum of the squares of the elements of the argument. The computed metric is now not necessarily spherical and may take various other forms, as plotted in Figure 4.

Figure 4: Metrics of various orders: (a) a regular neuron (spherical metric), (b) a "square" neuron, and (c) a neuron with "two holes."

In practice, however, high-order tensor inversion is not directly required. To make this analysis simpler, we use a Kronecker notation for tensor products (see Graham, 1981). The Kronecker tensor product flattens out the elements of X \otimes Y into a large matrix formed by taking all possible products between the elements of X and those of Y. For example, if X is a 2\times 3 matrix, then X \otimes Y is

X \otimes Y = \begin{bmatrix} Y \cdot X_{11} & Y \cdot X_{12} & Y \cdot X_{13} \\ Y \cdot X_{21} & Y \cdot X_{22} & Y \cdot X_{23} \end{bmatrix},   (2.12)

where each block is a matrix of the size of Y. In this notation, the internal structure of higher-order tensors is easier to perceive, and their correspondence to linear regression of principal polynomial curves is revealed. Consider, for example, a fourth-order covariance tensor of the vector x = \{x, y\}.

The fourth-order tensor corresponds to the simplest nonlinear neuron according to equation 2.10 and takes the form of the 2\times2\times2\times2 tensor

\left[ x \otimes x \otimes x \otimes x \right] = \left[ R \otimes R \right]
= \begin{bmatrix} x^2 & xy \\ xy & y^2 \end{bmatrix} \otimes \begin{bmatrix} x^2 & xy \\ xy & y^2 \end{bmatrix}
= \begin{bmatrix}
x^4 & x^3y & x^3y & x^2y^2 \\
x^3y & x^2y^2 & x^2y^2 & xy^3 \\
x^3y & x^2y^2 & x^2y^2 & xy^3 \\
x^2y^2 & xy^3 & xy^3 & y^4
\end{bmatrix}.   (2.13)

The homogeneous version of this tensor also includes all lower-order permutations of the coordinates of x_H = \{x, y, 1\}, namely, the 3\times3\times3\times3 tensor

Z_H^{(4)} = \left[ x_H \otimes x_H \otimes x_H \otimes x_H \right] = \left[ R_H \otimes R_H \right]
= \begin{bmatrix} x^2 & xy & x \\ xy & y^2 & y \\ x & y & 1 \end{bmatrix} \otimes \begin{bmatrix} x^2 & xy & x \\ xy & y^2 & y \\ x & y & 1 \end{bmatrix}

= \begin{bmatrix}
x^4 & x^3y & x^3 & x^3y & x^2y^2 & x^2y & x^3 & x^2y & x^2 \\
x^3y & x^2y^2 & x^2y & x^2y^2 & xy^3 & xy^2 & x^2y & xy^2 & xy \\
x^3 & x^2y & x^2 & x^2y & xy^2 & xy & x^2 & xy & x \\
x^3y & x^2y^2 & x^2y & x^2y^2 & xy^3 & xy^2 & x^2y & xy^2 & xy \\
x^2y^2 & xy^3 & xy^2 & xy^3 & y^4 & y^3 & xy^2 & y^3 & y^2 \\
x^2y & xy^2 & xy & xy^2 & y^3 & y^2 & xy & y^2 & y \\
x^3 & x^2y & x^2 & x^2y & xy^2 & xy & x^2 & xy & x \\
x^2y & xy^2 & xy & xy^2 & y^3 & y^2 & xy & y^2 & y \\
x^2 & xy & x & xy & y^2 & y & x & y & 1
\end{bmatrix}.   (2.14)

Remark. It is immediately apparent that the matrix in equation 2.14 corresponds to the matrix to be solved for finding a least-squares fit of a conic section equation ax^2 + by^2 + cxy + dx + ey + f = 0 to the data points. Moreover, the set of eigenvectors of this matrix corresponds to the coefficients of the set of mutually orthogonal best-fit conic section curves that are the principal curves of the data. This notion accords with Gnanadesikan's (1977) method for finding principal curves. Substitution of a data point into the equation of a principal curve yields an approximation of the distance of the point from that curve, and the sum of squared distances amounts to equation 2.11. Note that each time we increase the order, we are seeking a set of principal curves of one degree higher. This implies that the least-squares matrix needs to be two degrees higher (because it is minimizing the squared error), thus yielding the coefficient 2 in the exponent of equation 2.10.
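To make the Remark concrete, the following sketch (ours, with illustrative names and toy data, not code from the article) builds the flattened homogeneous tensor of equation 2.14 with Kronecker products and recovers a best-fit conic from points sampled near a circle. Because the Kronecker form repeats the mixed monomials, the conic coefficients are read from the equivalent reduced monomial basis {x^2, xy, y^2, x, y, 1}; merging the duplicates is our implementation choice.

import numpy as np

def z_h_single(x, y):
    # Homogeneous fourth-order tensor of one point, flattened to 9x9 (equation 2.14),
    # written in Kronecker form as [R_H (x) R_H].
    x_h = np.array([x, y, 1.0])
    r_h = np.outer(x_h, x_h)
    return np.kron(r_h, r_h)

def principal_conic(points):
    # Least-squares conic through the points, using the distinct monomials of equation 2.14.
    m = np.array([[x * x, x * y, y * y, x, y, 1.0] for x, y in points])
    w, v = np.linalg.eigh(m.T @ m)
    return v[:, 0]                     # eigenvector of the smallest eigenvalue

# The flattened tensor of a single point matches the outer product of its Kronecker vector.
x_h = np.array([2.0, -1.0, 1.0])
assert np.allclose(z_h_single(2.0, -1.0), np.outer(np.kron(x_h, x_h), np.kron(x_h, x_h)))

# Points near the unit circle: the principal conic recovers x^2 + y^2 - 1 = 0 (up to scale).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
pts = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.normal(size=(100, 2))
coeffs = principal_conic(pts)
print(np.round(coeffs / np.abs(coeffs).max(), 2))   # approximately +-[1, 0, 1, 0, 0, -1]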


3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classic neurons, we show that they are also subject to simple Hebbian learning. Following an interpretation of Hebb's postulate of learning, synaptic modification (i.e., learning) occurs when there is a correlation between presynaptic and postsynaptic activities. We have already shown in equation 2.9 that the presynaptic activity x_H and the postsynaptic activity of neuron j coincide when the synapse is strong, that is, when \| R_H^{(j)-1} x_H \| is minimum. We now proceed to show that, in accordance with Hebb's postulate of learning, it is sufficient to incur self-organization of the neurons by increasing synapse strength when there is a coincidence of presynaptic and postsynaptic signals.

We need to show how self-organization is obtained merely by increasing R_H^{(j)}, where j is the winning neuron. As each new data point arrives at a specific neuron j in the net, the synaptic weight of that neuron is adapted by the increment corresponding to Hebbian learning,

Z_H^{(j)}(k+1) = Z_H^{(j)}(k) + \eta(k)\, x_H^{(j)\,\otimes 2(m-1)},   (3.1)

where k is the iteration counter and \eta(k) is the iteration-dependent learning-rate coefficient. It should be noted that Z_H^{(j)} of equation 2.10 thereby becomes a weighted sum of the input signals x_H^{\otimes 2(m-1)} (unlike R_H in equation 2.8, which is a uniform sum). The eigenstructure analysis of a weighted sum still provides the principal components under the assumption that the weights are uniformly distributed over the cluster signals. This assumption holds true if the process generating the signals of the cluster is stable over time, a basic assumption of all neural network algorithms (Bishop, 1997). In practice, this assumption is easily acceptable, as will be demonstrated in the following sections.
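A minimal rendering of the update in equation 3.1 might look as follows; this is our own sketch, in which the tensor is kept in flattened (matrix) form and the particular decay schedule of the learning rate is an assumption, not a prescription from the article.

import numpy as np

def hebbian_update(z_h, x, order, k, eta0=0.3):
    # Equation 3.1: Z_H(k+1) = Z_H(k) + eta(k) * x_H^{(x) 2(m-1)}, applied to the winner only.
    x_h = np.append(x, 1.0)
    v = x_h.copy()
    for _ in range(order - 2):          # flattened Kronecker power x_H^{(x)(m-1)}
        v = np.kron(v, x_h)
    eta = eta0 / (1.0 + k)              # iteration-dependent learning rate (one common choice)
    return z_h + eta * np.outer(v, v)   # the outer product completes the 2(m-1) power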

In principle, continuous updating of the covariance tensor may create instability in a competitive environment, as a winner neuron becomes increasingly dominant. To force competition, the covariance tensor can be normalized using any of a number of factors, dependent on the application, such as the number of signals assigned to the neuron so far, or the distribution of data among the neurons (forcing uniformity). However, a more natural factor for normalization is the size and complexity of the neuron.

A second-order neuron is geometrically equivalent to a d-dimensional ellipsoid. A neuron may therefore be normalized by a factor V proportional to the ellipsoid volume,

V \propto \sqrt{\sigma_1} \cdot \sqrt{\sigma_2} \cdots \sqrt{\sigma_d} = \prod_{i=1}^{d} \sqrt{\sigma_i} \propto \sqrt{\bigl|\det R^{(j)}\bigr|}.   (3.2)

During the competition phase, the factors \| R^{(j)-1} x \| \cdot V are compared, rather than just \| R^{(j)-1} x \|, thus promoting smaller neurons and forcing neurons to become equally "fat." The final matching criterion for a homogeneous representation is therefore given by

j(x) = \arg\min_j \left\| \hat{R}_H^{(j)-1} x_H \right\|,   j = 1, 2, ..., N,   (3.3)

where

\hat{R}_H^{(j)} = \frac{R_H^{(j)}}{\sqrt{\bigl|\det R_H^{(j)}\bigr|}}.   (3.4)

Covariance tensors of higher-order neurons can also be normalized by their determinant. However, unlike second-order neurons, higher-order shapes are not necessarily convex, and hence their volume is not necessarily proportional to their determinant. The determinant of higher-order tensors therefore relates to more complex characteristics, which include the deviation of the data points from the principal curves and the curvature of the boundary. Nevertheless, our experiments showed that this is a simple and effective means of normalization that promotes small and simple shapes over complex or large concave fittings to data. The relationship between geometric shape and the tensor determinant should be investigated further for more elaborate use of the determinant as a normalization criterion.
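As a small illustration of this normalization (a sketch under our own conventions: the tensor is treated in flattened matrix form, assumed positive definite, and the log-determinant is used only to avoid overflow), the stored quantity implied by equations 3.3 and 3.4 can be computed as:

import numpy as np

def normalized_inverse(z_h):
    # Equations 3.3-3.4 for a flattened neuron tensor:
    # Z_hat = Z_H / sqrt(|det Z_H|), so the stored matrix is Z_hat^{-1} = sqrt(|det Z_H|) * Z_H^{-1}.
    sign, logdet = np.linalg.slogdet(z_h)
    return np.exp(0.5 * logdet) * np.linalg.inv(z_h)

Dividing by the volume-like factor before inverting makes large neurons produce larger distances, which is what penalizes them during the competition.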

Finally, in order to determine the likelihood of a point's being associated with a particular neuron, we need to assume a distribution function. Assuming a gaussian distribution, the likelihood p^{(j)}(x_H) of data point x_H being associated with cluster j is proportional to the distance from the neuron and is given by

p^{(j)}(x_H) = \frac{1}{(2\pi)^{d/2}} \, e^{-\frac{1}{2}\left\| \hat{Z}^{(j)} x_H \right\|^2}.   (3.5)
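A direct transcription of equation 3.5 for the second-order case might look as follows; this is our sketch, the argument names are ours, and we assume the per-neuron matrices passed in are the same normalized quantities used in the matching criterion.

import numpy as np

def membership_likelihood(x, neuron_mats, d):
    # Equation 3.5: gaussian membership computed from the neuron distance ||Z_hat x_H||.
    x_h = np.append(x, 1.0)
    dist_sq = np.array([np.linalg.norm(m @ x_h) ** 2 for m in neuron_mats])
    return np.exp(-0.5 * dist_sq) / (2.0 * np.pi) ** (d / 2.0)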

4 Algorithm Summary

To conclude the previous sections, we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervised learning. These are the algorithms that were used in the test cases described later in section 5.

Unsupervised competitive learning:

1. Select the layer size (number of neurons, N) and the neuron order (m) for a given problem.

2. Initialize all neurons with Z_H = \sum x_H^{\otimes 2(m-1)}, where the sum is taken over all input data or a representative portion, or with a unit tensor or random numbers if no data are available. To compute this value, write down x_H^{\otimes(m-1)} as a vector containing all products of (m-1) elements of \{x_1, x_2, ..., x_d, 1\}, and Z_H as a matrix summing the outer products of these vectors. Store Z_H and the normalized inverse (Z_H/f)^{-1} for each neuron, where f is a normalization factor (e.g., equation 3.4).

3. Compute the winner for input x: first compute the vector x_H^{\otimes(m-1)} from x as in section 2, and then multiply it by the stored matrix (Z_H/f)^{-1}. The winner j is the one for which the modulus of the product vector is smallest.

4. Output the winner j.

5. Update the winner by adding x_H^{\otimes 2(m-1)} to Z_H^{(j)}, weighted by the learning coefficient \eta(k), where k is the iteration counter. Store Z_H and (Z_H/f)^{-1} for the updated neuron.

6. Go to step 3.
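The unsupervised procedure above can be prototyped compactly. The sketch below is our own illustrative NumPy rendering, not the authors' implementation: the class and function names, the learning-rate schedule, and the random initialization from the input statistics (one of the options in step 2) are our choices, and the flattened tensor keeps only the distinct monomials of x_H^{\otimes(m-1)} so that it remains invertible as a matrix.

import numpy as np
from itertools import combinations_with_replacement

def phi(x, order):
    # Distinct monomials of x_H^{(x)(m-1)}: all products of (m-1) elements of {x_1,...,x_d,1}.
    x_h = np.append(np.asarray(x, dtype=float), 1.0)
    return np.array([np.prod(c) for c in combinations_with_replacement(x_h, order - 1)])

class HighOrderCompetitiveLayer:
    # Minimal sketch of the unsupervised algorithm (steps 1-6); names are illustrative.

    def __init__(self, n_neurons, order, seed=0):
        self.n, self.order = n_neurons, order
        self.rng = np.random.default_rng(seed)

    def _refresh(self, j):
        # Store the normalized inverse once per update (see the computational-cost note in section 6).
        sign, logdet = np.linalg.slogdet(self.z[j])
        f = np.exp(0.5 * logdet) if sign > 0 else 1.0       # normalization factor, cf. equation 3.4
        self.z_inv[j] = f * np.linalg.inv(self.z[j])

    def winner(self, x):
        v = phi(x, self.order)                              # step 3
        return int(np.argmin([np.linalg.norm(zi @ v) for zi in self.z_inv]))

    def fit(self, data, epochs=20, eta0=0.3):
        mu, sd = data.mean(axis=0), data.std(axis=0)
        dim_phi = phi(data[0], self.order).size
        # Step 2: initialize each neuron from a few random points drawn with the mean and
        # variance of the input (one of the initialization options listed above).
        self.z = [sum(np.outer(phi(p, self.order), phi(p, self.order))
                      for p in self.rng.normal(mu, sd, (dim_phi + 5, data.shape[1])))
                  for _ in range(self.n)]
        self.z_inv = [None] * self.n
        for j in range(self.n):
            self._refresh(j)
        k = 0
        for _ in range(epochs):
            for x in self.rng.permutation(data):
                j = self.winner(x)                          # steps 3-4
                v = phi(x, self.order)
                self.z[j] += (eta0 / (1.0 + 0.01 * k)) * np.outer(v, v)   # step 5, equation 3.1
                self._refresh(j)
                k += 1
        return self

# Toy usage: two gaussian blobs, second-order (ellipsoidal) neurons.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(6.0, 1.0, (100, 2))])
layer = HighOrderCompetitiveLayer(n_neurons=2, order=2).fit(data, epochs=5)
print(layer.winner(np.array([0.0, 0.0])), layer.winner(np.array([6.0, 6.0])))
# Ideally the two blob centers map to different neurons; as noted in section 5,
# convergence can depend on the initialization.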

Supervised learning:

1. Set the layer size (number of neurons) to the number of classes, and select the neuron order (m) for a given problem.

2. Train: for each neuron, compute Z_H^{(j)} = \sum x_H^{\otimes 2(m-1)}, where the sum is taken over all training points to be associated with that neuron. To compute this value, write down x_H^{\otimes(m-1)} as a vector containing all products of (m-1) elements of \{x_1, x_2, ..., x_d, 1\}, and Z_H as a matrix summing the outer products of these vectors. Store (Z_H/f)^{-1} for each neuron.

3. Test: use test points to evaluate the quality of training.

4. Use: for each new input x, first compute the vector x_H^{\otimes(m-1)} from x as in section 2, and then multiply it by the stored matrix (Z_H/f)^{-1}. The winner j is the one for which the modulus of the product vector is smallest.
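A supervised counterpart, again an illustrative sketch rather than the authors' code (the small ridge term that keeps the flattened tensor invertible and all names are our additions), can be written as follows; the commented usage assumes hypothetical arrays data, labels, and train_idx.

import numpy as np
from itertools import combinations_with_replacement

def phi(x, order):
    # Distinct monomials of x_H^{(x)(m-1)} for x_H = [x; 1] (section 4, step 2).
    x_h = np.append(np.asarray(x, dtype=float), 1.0)
    return np.array([np.prod(c) for c in combinations_with_replacement(x_h, order - 1)])

def train_supervised(points, labels, n_classes, order, ridge=1e-8):
    # One neuron per class: Z_H^{(j)} summed over the class's training points,
    # then the normalized inverse is stored for matching (cf. equation 3.4).
    stored = []
    for j in range(n_classes):
        p = np.array([phi(x, order) for x in points[labels == j]])
        z = p.T @ p
        z += ridge * np.trace(z) / len(z) * np.eye(len(z))   # our addition: keeps z invertible
        sign, logdet = np.linalg.slogdet(z)
        f = np.exp(0.5 * logdet) if sign > 0 else 1.0
        stored.append(f * np.linalg.inv(z))
    return stored

def classify(x, stored, order):
    # The most active neuron is the one with the smallest product modulus.
    v = phi(x, order)
    return int(np.argmin([np.linalg.norm(m @ v) for m in stored]))

# Hypothetical usage with an Iris-style array `data` (150 x 4) and integer `labels` (0..2):
#   stored = train_supervised(data[train_idx], labels[train_idx], n_classes=3, order=5)
#   predictions = [classify(x, stored, order=5) for x in data]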

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacing lower-dimensionality neurons, and it will enhance the modeling capability of each neuron. Depending on the application of the net, this may lead to improvement or degradation of the overall performance of the layer, due to the added degrees of freedom. This section provides some examples of an implementation at different orders, in both supervised and unsupervised learning.

First, we discuss two examples of second-order (ellipsoidal) neurons, to show that our formulation is compatible with previous work on ellipsoidal neural units (e.g., Mao & Jain, 1996). Figure 5 illustrates how a competitive network consisting of three second-order neurons gradually evolves to model a two-dimensional domain exhibiting three main activation areas with significant overlap. The second-order neurons are visualized as ellipses, with their boundaries drawn at 2\sigma of the distribution.

Figure 5: (a-c) Stages in self-classification of overlapping clusters after learning 200, 400, and 650 data points, respectively (one epoch).

The next test uses a two-dimensional distribution of three arbitrary clusters, consisting of a total of 75 data points, shown in Figure 6a. The network of three second-order neurons self-organized to map this data set and creates the decision boundaries shown in Figure 6b. The dark curves are the decision boundaries, and the light lines show the local distance metric within each cell. Note how the decision boundary in the overlap area accounts for the different distribution variances.

Figure 6: (a) Original data set. (b) Self-organized map of the data set. Dark curves are the decision boundaries, and light lines show the local distance metric within each cell. Note how the decision boundary in the overlap area accounts for the different distribution variances.

Table 1: Comparison of Self-Organization Results for Iris Data.

Method                                                      Epochs    Misclassified or unclassified
Superparamagnetic (Blatt et al., 1996)                                25
Learning vector quantization / generalized learning
  vector quantization (Pal, Bezdek, & Tsao, 1993)           200       17
K-means (Mao & Jain, 1996)                                            16
Hyperellipsoidal clustering (Mao & Jain, 1996)                        5
Second-order unsupervised                                   20        4
Third-order unsupervised                                    30        3

The performance of the different orders of shape neuron has also been assessed on the Iris data (Anderson, 1939). Anderson's time-honored data set has become a popular benchmark problem for clustering algorithms. The data set consists of four quantities measured on each of 150 flowers chosen from three species of iris: Iris setosa, Iris versicolor, and Iris virginica. The data set constitutes 150 points in four-dimensional space. Some results of other methods are compared in Table 1. It should be noted that the results presented in the table are "best results"; third-order neurons are not always stable, because the layer sometimes converges into different classifications.^1

The neurons have also been tried in a supervised setup. In supervised mode, we use the same training setup of equations 2.8 and 2.10, but train each neuron individually on the signals associated with it, using

Z_H^{(j)} = \sum_{x_H \in \Omega^{(j)}} x_H^{\otimes 2(m-1)},

where in homogeneous coordinates x_H = [x; 1] and \Omega^{(j)} is the jth class.

In order to evaluate the performance of this technique, we trained each neuron on a random selection of 80% of the data points of its class and then tested the classification of the whole data set (including the remaining 20% for cross validation). The test determined the most active neuron for each input and compared it with the manual classification. This test was repeated 250 times, with the average and best results shown in Table 2, along with results reported for other methods. It appears that, on average, supervised learning for this task reaches an optimum with fifth-order neurons.

Table 2: Supervised Classification Results for Iris Data.

                                                              Number misclassified
Order                                              Epochs     Average     Best
3-hidden-layer neural network (Abe et al., 1997)   1000       2.2         1
Fuzzy hyperbox (Abe et al., 1997)                                         2
Fuzzy ellipsoids (Abe et al., 1997)                1000                   1
Second-order supervised                            1          3.08        1
Third-order supervised                             1          2.27        1
Fourth-order supervised                            1          1.60        0
Fifth-order supervised                             1          1.07        0
Sixth-order supervised                             1          1.20        0
Seventh-order supervised                           1          1.30        0

Notes: Results after one training epoch, with 20% cross validation. Averaged results are for 250 experiments.

^1 Performance of third-order neurons was evaluated with a layer consisting of three neurons, randomly initialized with normal distribution within the input domain with the mean and variance of the input, and with a learning factor of 0.3 decreasing asymptotically toward zero. Of 100 experiments carried out, 95 locked correctly on the clusters, with an average of 7.6 (5%) misclassifications after 30 epochs. Statistics on performance of the cited works have not been reported. Download MATLAB demos from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

Figure 7 shows classification results using third-order neurons for an arbitrary distribution containing close and overlapping clusters.

Finally, Figure 8 shows an application of third- through fifth-order neurons to the detection and modeling of chromosomes in a human cell. The corresponding separation maps and convergence error rates are shown in Figures 9 and 10, respectively.

Figure 7: Self-classification of synthetic point clusters using third-order neurons.

Figure 8: Chromosome separation. (a) Designated area. (b) Four third-order neurons self-organized to mark the chromosomes in the designated area. Data from Schrock et al. (1996).

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated their practical use for modeling the structure of spatial distributions and for geometric feature recognition. Although high-order neurons do not directly correspond to neurobiological details, we believe that they can provide powerful data modeling capabilities. In particular, they exhibit useful properties for correctly handling close and partially overlapping clusters. Furthermore, we have shown that ellipsoidal and tensorial clustering methods, as well as classic competitive neurons, are special cases of the general-shape neuron. We showed how lower-order information can be encapsulated using tensor representation in homogeneous coordinates, enabling systematic continuation to higher-order metrics. This formulation also allows for direct extraction of the encapsulated high-order information, in the form of principal curves represented by the eigentensors of each neuron.

The use of shapes as neurons raises some further practical questions, which have not been addressed in this article and require further research; they are listed below. Although most of these issues pertain to neural networks in general, the approach required here may be different.

Number of neurons. In the examples provided, we have explicitly used the number of neurons appropriate to the number of clusters. The question of network size is applicable to most neural architectures. However, our experiments showed that overspecification tends to cause clusters to split into subsections, while underspecification may cause clusters to unite.

Figure 9: Self-organization for chromosome separation. (a-d) Second- through fifth-order, respectively.

Initial weights. It is well known that sensitivity to initial weights is a general problem of neural networks (see, for example, Kolen & Pollack, 1990, and Pal et al., 1993). However, when third-order neurons were evaluated on the Iris data with random initial weights, they performed with an average of 5% misclassifications. Thus, the capacity of neurons to be spread over volumes may provide new possibilities for addressing the initialization problem.

Computational cost. The need to invert a high-order tensor introduces a significant computational cost, especially when the input dimensionality is large. There are some factors that counterbalance this computational cost. First, the covariance tensor needs to be inverted only when a neuron is updated, not when it is activated or evaluated. When the neuron is updated, the inverted tensor is computed and then stored, and it is used whenever the neuron needs to compete or evaluate an input signal. This means that the whole layer will need only one inversion for each training point, and no inversions for nontraining operation. Second, because of the complexity of each neuron, sometimes fewer of them are required. Finally, relatively complex shapes can be attained with low orders (say, third order).

Figure 10: Average error rate in self-organization for the chromosome separation of Figure 9, respectively.

Selection of neuron order. As in many other networks, there is much domain knowledge about the problem that comes into the solution through the choice of parameters, such as the order of the neurons. However, we also estimate that due to the rather analytic representation of each neuron, and since high-order information is contained within each neuron independently, this information can be directly extracted (using the eigentensors). Thus, it is possible to decompose analytically a cross-shaped fourth-order cluster into two overlapping elliptic clusters, and vice versa.

Stability versus flexibility and enforcement of particular shapes. Further research is required to find methods for biasing neurons toward particular shapes, in an attempt to seek particular shapes and help neuron stability. For example, how would one encourage detection of triangles and not rectangles?

Nonpolynomial base functions. We intend to investigate the properties of shape layers based on nonpolynomial base functions. Noting that the high-order homogeneous tensors give rise to various degrees of polynomials, it is possible to derive such tensors with other base functions, such as wavelets, trigonometric functions, and other well-established kernel functions. Such an approach may enable modeling arbitrary geometrical shapes.

Acknowledgments

We thank Eitan Domany for his helpful comments and insight. This work was supported in part by the U.S.-Israel Binational Science Foundation (BSF), the Israeli Ministry of Arts and Sciences, and the Fund for Promotion of Research at the Technion. H. L. acknowledges the generous support of the Charles Clore Foundation and the Fischbach Postdoctoral Fellowship. We thank Applied Spectral Imaging for supplying the data for the experiments described in Figures 8 through 10.

Download further information, MATLAB demos, and implementation details of this work from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

References

Abe, S., & Thawonmas, R. (1997). A fuzzy classifier with ellipsoidal regions. IEEE Trans. on Fuzzy Systems, 5, 358-368.

Anderson, E. (1939). The irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2-5.

Balestrino, A., & Verona, B. F. (1994). New adaptive polynomial neural network. Mathematics and Computers in Simulation, 37, 189-194.

Bishop, C. M. (1997). Neural networks for pattern recognition. Oxford: Clarendon Press.

Blatt, M., Wiseman, S., & Domany, E. (1996). Superparamagnetic clustering of data. Physical Review Letters, 76(18), 3251-3254.

Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600-611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193-199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511-513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761-766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers Chem. Eng., 17, 765-784.

Kohonen, T. (1997). Self organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269-280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation, Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29-60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel-Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling. Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16-29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks, IJCNN'91, Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549-557.

Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227-252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494-497.

Received August 24, 1998; accepted September 3, 1999.

Page 6: Clustering Irregular Shapes Using High-Order Neurons

2336 H Lipson and H T Siegelmann

m order of the high-order neurond dimensionality of the inputN size of the layer (the number of neurons)

The modeling constraints imposed by the maximum correlation match-ing criterion stem from the fact that the neuronrsquos synaptic weight w(j) hasthe same dimensionality as the input x that is the same dimensionality asa single point in the input domain while in fact the neuron is modelinga cluster of points which may have higher-order attributes such as direc-tionality and curvature We shall therefore refer to the classic neuron as arst-order (zero degree) neuron due to its correspondence to a point inmultidimensional input space

To circumvent this restriction we augment the neuron with the capacityto map additional geometric and topological information by increasing itsorder For example as the rst-order is a pointneuron the second-order casewill correspond to orientation and size components effectively attaching alocal oriented coordinate system with nonuniform scaling to the neuroncenter and using it to dene a new distance metric Thus each high-orderneuron will represent not only the mean value of the data points in thecluster it is associated with but also the principal directions curvaturesand other features of the cluster and the variance of the data points alongthese directions Intuitively we can say that rather than dening a spherethe high-order distance metric now denes a multidimensional-orientedshape with an adaptive shape and topology (see Figures 2 and 3)

In the following pages we briey review the ellipsoidal metric and es-tablish the basic concepts of encapsulation and base functions to be usedin higher-order cases where tensorial notation becomes more difcult tofollow

For a second-order neuron dene the matrix D(j) 2 Rdpoundd to encapsulatethe direction and size properties of a neuron for each dimension of theinput where d is the number of dimensions Each row i of D(j) is a unit rowvector vT

i corresponding to a principal direction of the cluster divided bythe standard deviation of the cluster in that direction (the square root ofthe variance si) The Euclidean metric can now be replaced by the neuron-dependent metric

dist(x j) DregregD(j) iexcl

x iexcl w (j)centregreg

where

D (j) D

266666664

1ps1

0 cent cent cent 0

0 1ps2

00 cent cent cent 0 1p

sd

377777775

2666664

vT1

vT2

vTd

3777775

(21)

Clustering Irregular Shapes 2337

such that deviations are normalized with respect to the variance of the datain the direction of the deviation The new matching criterion based on thenew metric becomes

i(x) D argj minregregregdist(x)(j)

regregreg

D argj minregregregD (j)

plusmnx iexcl w (j)

sup2regregreg j D 1 2 N (22)

The neuron is now represented by the pair (D(j) w(j)) where D(j) repre-sents rotation and scaling and w(j) represents translation The values of D(j)

can be obtained from an eigenstructure analysis of the correlation matrixR(j) D Sx(j)x(j)T 2 Rdpoundd of the data associated with neuron j Two well-known properties are derived from the eigenstructure of principal compo-nent analysis (1) the eigenvectors of a correlation matrix R pertaining tothe zero mean data vector x dene the unit vectors vi representing the prin-cipal directions along which the variances attain their extremal values and(2) the associated eigenvalues dene the extremal values of the variancessi Hence

D (j) D l(j)iexcl 1

2 V(j) (23)

where l (j) D diag(R(j)) produces the eigenvalues of R(j) in a diagonal formand V(j) D eig(R(j)) produces the unit eigenvectors of R(j) in matrix formAlso by denition of eigenvalue analysis of a general matrix A

AV D lV

and hence

A D VTlV (24)

Now since a term kak equals aTa we can expand the term kD(j)(x iexcl w(j))kof equation 22 as follows

regregregD (j)plusmn

x iexcl w (j)sup2regregreg D

plusmnx iexcl w (j)

sup2TD (j)TD (j)

plusmnx iexcl w (j)

sup2 (25)

Substituting equations 12 and 23 we see that

D (j)TD (j) Dpoundv1 v2 vd

curren

26666664

1s1

0 cent cent cent 0

0 1s2

00 cent cent cent 0 1

sd

37777775

2666664

vT1

vT2

vTd

3777775

D R(j)iexcl1 (26)

2338 H Lipson and H T Siegelmann

meaning that the scaling and rotation matrix is in fact the inverse of R(j)Hence

i(x) D argj minmicroplusmn

x iexcl w (j)sup2T

Riexcl1plusmn

x iexcl w (j)sup2para

j D 1 2 N (27)

which corresponds to the term used by Gustafson and Kessel (1979) and inthe maximum likelihood gaussian classier (Duda amp Hart 1973)

Equation 27 involves two explicit terms R(j) and w(j) representing theorientation and size information respectively Implementations involvingellipsoidal metrics therefore need to track these two quantities separately ina two-stage step Separate tracking however cannot be easily extended tohigher orders which requires simultaneous tracking of additional higher-order qualities such as curvatures We therefore use an equivalent represen-tation that encapsulates these terms into one and permits a more systematicextension to higher orders

We combine these two coefcients into one expanded covariance denotedRH

(j) in homogenous coordinates (Faux amp Pratt 1981) as

RH(j) D Sx

(j)H x

(j)H

T (28)

where in homogenous coordinatesx(j)H 2R(dC1)pound1 and is dened by xH D

microxndash1

para

In this notation the expanded covariance also contains coordinate per-mutations involving 1 as one of the multiplicands and so different (lower)orders are introduced Consequently RH

(j) 2 R(dC1)pound(dC1) In analogy to thecorrelation matrix memoryand autoassociative memory(Haykin 1994) theextended matrix RH

(j) in its general form can be viewed as a homogeneousautoassociative tensor Now the new matching criterion becomes simply

i(x) D argj minmicro

XTR(j)H

iexcl1X

para

D argj minregregregregR

(j)H

iexcl 12 X

regregregreg

D argj minregregregregR

(j)H

iexcl1X

regregregreg j D 1 2 N (29)

This representation retains the notion of maximum correlation and for con-venience we now denote RH

iexcl1(j) as the synaptic tensorWhen we moved into homogenous coordinates we gained the following

The eigenvectors of the original covariance matrix R(j) representeddirections The eigenvectors of the homogeneous covariance matrixRH

(j) correspond to both the principal directions and the offset (both

Clustering Irregular Shapes 2339

second-order and rst-order) properties of the cluster accumulatedin RH

(j) In fact the elements of each eigenvector correspond to thecoefcient of the line equation ax C by C cent cent cent C c D 0 This propertypermits direct extension to higher-order tensors in the next sectionwhere direct eigenstructure analysis is not well dened

The transition to homogeneous coordinates dispensed with the prob-lem created by the fact that the eigenstructure properties cited abovepertained only to zero-mean data whereas RH

(j) is dened as the co-variance matrix of non-zero-mean data By using homogeneous coor-dinates we effectively consider the translational element of the clusteras yet another (additional) dimensions In this higher space the datacan be considered to be zero mean

21 Higher Orders The ellipsoidal neuron can capture only linear di-rectionality not curved directionalities such as that appearing for instancein a fork in a curved or in a sharp-cornered shape In order to capturemore complex spatial congurations higher-order geometries must be in-troduced

The classic neuron possesses a synaptic weight vector w(j) which cor-responds to a point in the input domain The synaptic weight w(j) can beseen as the rst-order average of its signals that is w(j) D

PxH (the last

element xHdC1 D 1 of the homogeneous coordinates has no effect in thiscase) Ellipsoidal units hold information regarding the linear correlationsamong the coordinates of data points represented by the neuron by usingRH

(j) DP

xHxHT Each element of RH is thus a proportionality constant

relating two specic dimensions of the cluster The second-order neuroncan therefore be regarded as a second-order approximation of the corre-sponding data distribution Higher-order relationships can be introducedby considering correlations among more than just two variables We mayconsequently introduce a neuron capable of modeling a d-dimensional datacluster to an mth-order approximation

Higher-order correlations are obtained by multiplying the generatingvector xH by itself in successive outer products more than just once Theprocess yields a covariance tensor rather than just a covariance matrix(which is a second-order tensor) The resulting homogeneous covariancetensor is thus represented by a 2(m iexcl 1)-order tensor of rank d C 1 denotedby Z

(j)H 2 R(dC1)poundcentcentcentpound(dC1) obtained by 2(m iexcl1) outer products of xH by itself

Z(j)H D x2(miexcl1)

H (210)

The exponent consists of the factor (m iexcl 1) which is the degree of the ap-proximation and a factor of 2 since we are performing autocorrelation sothe function is multiplied by itself In analogy to reasoning that leads toequation 29 each eigenvector (now an eigentensor of order m iexcl 1) corre-

2340 H Lipson and H T Siegelmann

Figure 4 Metrics of various orders (a) a regular neuron (spherical metric) (b) aldquosquarerdquo neuron and (c) a neuron with ldquotwo holesrdquo

sponds to the coefcients of a principal curve and multiplying it by inputpoints produces an approximation of the distance of that input point fromthe curve Consequently the inverse of the tensor Z

(j)H can then be used to

compute the high-order correlation of the signal with the nonlinear shapeneuron by simple tensor multiplication

i(x) D argi minregregregregZ

(j)H

iexcl1shyxmiexcl1

H

regregregreg j iexcl 1 2 N (211)

where shydenotes tensor multiplication Note that the covariance tensor canbe inverted only if its order is an even number as satised by equation 210Note also that the amplitude operator is carried out by computing the rootof the sum of the squares of the elements of the argument The computedmetric is now not necessarily spherical and may take various other formsas plotted in Figure 4

In practice however high-order tensor inversion is not directly requiredTo make this analysis simpler we use a Kronecker notation for tensor prod-ucts (see Graham 1981) Kronecker tensor product attens out the elementsof X shyY into a large matrix formed by taking all possible products betweenthe elements of X and those of Y For example if X is a 2pound3 matrix thenXshyY is

X shyY D

Y cent X11 Y cent X12 Y cent X13- - - - - - - - - - - - - - - - - - - -Y cent X21 Y cent X22 Y cent X23

(212)

where each block is a matrix of the size of Y In this notation the internalstructure of higher-order tensors is easier to perceive and their correspon-dence to linear regression of principal polynomial curves is revealed Con-sider for example a fourth-order covariance tensor of the vector x D fx yg

Clustering Irregular Shapes 2341

The fourth-order tensor corresponds to the simplest nonlinear neuron ac-cording to equation 210 and takes the form of the 2pound2pound2pound2 tensor

laquox shyx shyx shyx

notD

laquoR shyR

notD

x2 xyxy y2

shy

x2 xyxy y2

D

266664

x4 x3y x3y x2y2

x3y x2y2 x2y2 xy3

- - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y2 xy3

x2y2 xy3 xy3 y4

377775

(213)

The homogeneous version of this tensor also includes all lower-order per-mutations of the coordinates of xH D fx y 1g namely the 3pound3pound3pound3 tensor

ZH(4) DlaquoxH xH xH xH

notD

laquoRH RH

notD

264

x2 xy xxy y2 yx y 1

375shy

264

x2 xy xxy y2 yx y 1

375

D

26666666666666664

x4 x3y x3 x3y x2y2 x2y x3 x2y x2

x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx3 x2y x2 x2y xy2 xy x2 xy x

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx2y2 xy3 xy2 xy3 y4 y3 xy2 y3 y2

x2y xy2 xy xy2 y3 y2 xy y2 y- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3 x2y x2 x2y xy2 xy x2 xy xx2y xy2 xy xy2 y3 y2 xy y2 yx2 xy x xy y2 y x y 1

37777777777777775

(214)

Remark It is immediately apparent that the matrix in equation 214 corre-sponds to the matrix to be solved for nding a least-squares t of a conicsection equation ax2 C by2 C cxy C dxC eyC f D 0 to the data points Moreoverthe set of eigenvectors of this matrix corresponds to the coefcients of theset of mutually orthogonal best-t conic section curves that are the principalcurves of the data This notion adheres with Gnanadesikanrsquos (1977) methodfor nding principal curves Substitution of a data point into the equation ofa principal curve yields an approximation of the distance of the point fromthat curve and the sum of squared distances amounts to equation 211 Notethat each time we increase order we are seeking a set of principal curves ofone degree higher This implies that the least-squares matrix needs to be twodegrees higher (because it is minimizing the squared error) thus yieldingthe coefcient 2 in the exponent of equation 210

2342 H Lipson and H T Siegelmann

3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classicneurons we show that they are also subject to simple Hebbian learning Fol-lowing an interpretation of Hebbrsquos postulate of learning synaptic modica-tion (ie learning) occurs when there is a correlation between presynapticand postsynaptic activities We have already shown in equation 29 that thepresynaptic activity xH and postsynaptic activity of neuron j coincide when

the synapse is strong that is RH(j)iexcl1

xH is minimum We now proceed toshow that in accordance with Hebbrsquos postulate of learning it is sufcient toincur self-organization of the neurons by increasing synapse strength whenthere is a coincidence of presynaptic and postsynaptic signals

We need to show how self-organization is obtained merely by increasingRH

(j) where j is the winning neuron As each new data point arrives at aspecic neuron j in the net the synaptic weight of that neuron is adaptedby the incremental corresponding to Hebbian learning

ZH(j)(k C 1) D ZH

(j)(k) C g(k)x(j)H

2(miexcl1) (31)

where k is the iteration counter and g(k) is the iteration-dependent learn-ing rate coefcient It should be noted that in equation 210 Z

(j)H becomes a

weighted sum of the input signals x2(miexcl1)H (unlike RH in equation 28 which

is a uniform sum) The eigenstructure analysis of a weighted sum still pro-vides the principal components under the assumption that the weights areuniformly distributed over the cluster signals This assumption holds trueif the process generating the signals of the cluster is stable over time a basicassumption of all neural network algorithms (Bishop 1997) In practice thisassumption is easily acceptable as will be demonstrated in the followingsections

In principle continuous updating of the covariance tensor may createinstability in a competitive environment as a winner neuron becomes in-creasingly dominant To force competition the covariance tensor can benormalized using any of a number of factors dependent on the applicationsuch as the number of signals assigned to the neuron so far or the distri-bution of data among the neuron (forcing uniformity) However a morenatural factor for normalization is the size and complexity of the neuron

A second-order neuron is geometrically equivalent to a d-dimensionalellipsoid A neuron may be therefore normalized by a factor V proportionalto the ellipsoid volume

V ps1 cent s2 cent cent cent cent cent sd D

dY

iD1

psi 1q

detshyshyR(j)

shyshy (32)

During the competition phase the factors R(j)x V are compared rather than

Clustering Irregular Shapes 2343

just R(j)x thus promoting smaller neurons and forcing neurons to becomeequally ldquofatrdquo The nal matching criteria for a homogeneous representationare therefore given by

j(x) D argj minregregreg ORiexcl1

H(j)xH

regregreg j D 1 2 N (33)

where

OR(j)H D

R(j)Hr

detshyshyshyR(j)

H

shyshyshy (34)

Covariance tensors of higher-order neurons can also be normalized bytheir determinant However unlike second-order neurons higher-ordershapes are not necessarily convex and hence their volume is not neces-sarily proportional to their determinant The determinant of higher-ordertensors therefore relates to more complex characteristics which include thedeviation of the data points from the principal curves and the curvatureof the boundary Nevertheless our experiments showed that this is a sim-ple and effective means of normalization that promotes small and simpleshapes over complex or large concave ttings to data The relationship be-tween geometric shape and the tensor determinant should be investigatedfurther for more elaborate use of the determinant as a normalization crite-rion

Finally in order to determine the likelihood of a pointrsquos being associatedwith a particular neuron we need to assume a distribution function As-suming gaussian distribution the likelihood p(j)(xH ) of data point xH beingassociated with cluster j is proportional to the distance from the neuron andis given by

p(j)(xH) D1

(2p )d 2eiexcl 1

2

regregOZ( j) XH

regreg2

(35)

4 Algorithm Summary

To conclude the previous sections we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervisedlearning These are the algorithms that were used in the test cases describedlater in section 5

Unsupervised Competitive learning

1 Select layer size (number of neurons N) and neuron order (m) for agiven problem

2344 H Lipson and H T Siegelmann

2 Initialize all neurons with ZH DP

x2(miexcl1)H where the sum is taken over

all input data or a representative portion or unit tensor or randomnumbers if no data are available To compute this value write downxmiexcl1

H as a vector with all mth degree permutations of fx1 x2 xd 1gand ZH as a matrix summing the outer product of these vectors StoreZH and Ziexcl1

H f for each neuron where f is a normalization factor (egequation 34)

3 Compute the winner for input x First compute vector xmiexcl1H from x as

in section 2 and then multiply it by the stored matrix Ziexcl1H f Winner

j is the one for which the modulus of the product vector is smallest

4 Output winner j

5 Update the winner by adding x2(miexcl1)H to ZH

(j) weighted by the learningcoefcient g(k) where k is the iteration counter Store ZH and ZH

iexcl1 f for the updated neuron

6 Go to step 3

Supervised Learning

1 Set layer size (number of neurons) to number of classes and selectneuron order (m) for a given problem

2 Train For each neuron compute Z(j)H D

Px2(miexcl1)

H where the sum istaken over all training points to be associated with that neuron Tocompute this value write down xH

miexcl1 as a vector with all mth degreepermutations of fx1 x2 xd 1g and ZH as a matrix summing theouter product of these vectors Store ZH

iexcl1 f for each neuron

3 Test Use test points to evaluate the quality of training

4 Use For each new input x rst compute vector xmiexcl1H from x as in

section 2 and then multiply it by the stored matrix Ziexcl1H f Winner j is

the one for which modulus of the product vector is smallest

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacinglower-dimensionality neurons and will enhance the modeling capability ofeach neuron Depending on the application of the net this may lead toimprovement or degradation of the overall performance of the layer due tothe added degrees of freedom This section provides some examples of animplementation at different orders in both supervised and unsupervisedlearning

First we discuss two examples of second-order (ellipsoidal) neuronsto show that our formulation is compatible with previous work on el-lipsoidal neural units (eg Mao amp Jain 1996) Figure 5 illustrates how

Clustering Irregular Shapes 2345

Figure 5 (andashc) Stages in self-classication of overlapping clusters after learning200 400 and 650 data points respectively (one epoch)

a competitive network consisting of three second-order neurons gradu-ally evolves to model a two-dimensional domain exhibiting three mainactivation areas with signicant overlap The second-order neurons arevisualized as ellipses with their boundaries drawn at 2s of the distribu-tion

The next test uses a two-dimensional distribution of three arbitrary clus-ters consisting of a total of 75 data points shown in Figure 6a The networkof three second-order neurons self-organized to map this data set and createsthe decision boundaries shown in Figure 6b The dark curves are the deci-sion boundaries and the light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts forthe different distribution variances

Figure 6 (a) Original data set (b)Self-organized map of the data set Dark curvesare the decision boundaries and light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts for thedifferent distribution variances

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup In supervisedmode we use the same training setup of equations 28 and 210 but traineach neuron individually on the signals associated with it using

Z(j)H D Sx2(miexcl1)

H xH 2 ordf(j)

where in homogeneous coordinates xH Dhxndash1

iand ordf(j) is the jth class

In order to evaluate the performance of this technique we trained eachneuron on a random selection of 80 of the data points of its class andthen tested the classication of the whole data set (including the remain-ing 20 for cross validation) The test determined the most active neuron

1 Performance of third-order neurons was evaluated with a layer consisting of threeneurons randomly initialized with normal distribution within the input domain withmean and variance of the input and with learning factor of 03 decreasing asymp-totically toward zero Of 100 experiments carried out 95 locked correctly on theclusters with an average 76 (5) misclassications after 30 epochs Statistics on per-formance of cited works have not been reported Download MATLAB demos fromhttpwwwcsbrandeisedu lipsonpapersgeomhtm

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe S amp Thawonmas R (1997) A fuzzy classier with ellipsoidal regionsIEEE Trans on Fuzzy Systems 5 358ndash368

Anderson E (1939) The Irises of the Gaspe Peninsula Bulletin of the AmericanIris Society 59 2ndash5

Balestrino A amp Verona B F (1994) New adaptive polynomial neural networkMathematics and Computers in Simulation 37 189ndash194

Bishop C M (1997) Neural networks for pattern recognition Oxford Claren-don Press

Blatt M Wiseman S amp Domany E (1996) Superparamagnetic clustering ofdata Physical Review Letters 7618 3251ndash3254

2352 H Lipson and H T Siegelmann

Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600–611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193–199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511–513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761–766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers Chem. Eng., 17, 765–784.

Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269–280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation—Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29–60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel-Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling. Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16–29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks, IJCNN'91. Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549–557.

Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494–497.

Received August 24, 1998; accepted September 3, 1999.

Page 8: Clustering Irregular Shapes Using High-Order Neurons

2338 H Lipson and H T Siegelmann

meaning that the scaling and rotation matrix is in fact the inverse of R(j)Hence

i(x) D argj minmicroplusmn

x iexcl w (j)sup2T

Riexcl1plusmn

x iexcl w (j)sup2para

j D 1 2 N (27)

which corresponds to the term used by Gustafson and Kessel (1979) and inthe maximum likelihood gaussian classier (Duda amp Hart 1973)

Equation 27 involves two explicit terms R(j) and w(j) representing theorientation and size information respectively Implementations involvingellipsoidal metrics therefore need to track these two quantities separately ina two-stage step Separate tracking however cannot be easily extended tohigher orders which requires simultaneous tracking of additional higher-order qualities such as curvatures We therefore use an equivalent represen-tation that encapsulates these terms into one and permits a more systematicextension to higher orders

We combine these two coefcients into one expanded covariance denotedRH

(j) in homogenous coordinates (Faux amp Pratt 1981) as

RH(j) D Sx

(j)H x

(j)H

T (28)

where in homogenous coordinatesx(j)H 2R(dC1)pound1 and is dened by xH D

microxndash1

para

In this notation the expanded covariance also contains coordinate per-mutations involving 1 as one of the multiplicands and so different (lower)orders are introduced Consequently RH

(j) 2 R(dC1)pound(dC1) In analogy to thecorrelation matrix memoryand autoassociative memory(Haykin 1994) theextended matrix RH

(j) in its general form can be viewed as a homogeneousautoassociative tensor Now the new matching criterion becomes simply

i(x) D argj minmicro

XTR(j)H

iexcl1X

para

D argj minregregregregR

(j)H

iexcl 12 X

regregregreg

D argj minregregregregR

(j)H

iexcl1X

regregregreg j D 1 2 N (29)

This representation retains the notion of maximum correlation and for con-venience we now denote RH

iexcl1(j) as the synaptic tensorWhen we moved into homogenous coordinates we gained the following

The eigenvectors of the original covariance matrix R(j) representeddirections The eigenvectors of the homogeneous covariance matrixRH

(j) correspond to both the principal directions and the offset (both

Clustering Irregular Shapes 2339

second-order and rst-order) properties of the cluster accumulatedin RH

(j) In fact the elements of each eigenvector correspond to thecoefcient of the line equation ax C by C cent cent cent C c D 0 This propertypermits direct extension to higher-order tensors in the next sectionwhere direct eigenstructure analysis is not well dened

The transition to homogeneous coordinates dispensed with the prob-lem created by the fact that the eigenstructure properties cited abovepertained only to zero-mean data whereas RH

(j) is dened as the co-variance matrix of non-zero-mean data By using homogeneous coor-dinates we effectively consider the translational element of the clusteras yet another (additional) dimensions In this higher space the datacan be considered to be zero mean

21 Higher Orders The ellipsoidal neuron can capture only linear di-rectionality not curved directionalities such as that appearing for instancein a fork in a curved or in a sharp-cornered shape In order to capturemore complex spatial congurations higher-order geometries must be in-troduced

The classic neuron possesses a synaptic weight vector w(j) which cor-responds to a point in the input domain The synaptic weight w(j) can beseen as the rst-order average of its signals that is w(j) D

PxH (the last

element xHdC1 D 1 of the homogeneous coordinates has no effect in thiscase) Ellipsoidal units hold information regarding the linear correlationsamong the coordinates of data points represented by the neuron by usingRH

(j) DP

xHxHT Each element of RH is thus a proportionality constant

relating two specic dimensions of the cluster The second-order neuroncan therefore be regarded as a second-order approximation of the corre-sponding data distribution Higher-order relationships can be introducedby considering correlations among more than just two variables We mayconsequently introduce a neuron capable of modeling a d-dimensional datacluster to an mth-order approximation

Higher-order correlations are obtained by multiplying the generatingvector xH by itself in successive outer products more than just once Theprocess yields a covariance tensor rather than just a covariance matrix(which is a second-order tensor) The resulting homogeneous covariancetensor is thus represented by a 2(m iexcl 1)-order tensor of rank d C 1 denotedby Z

(j)H 2 R(dC1)poundcentcentcentpound(dC1) obtained by 2(m iexcl1) outer products of xH by itself

Z(j)H D x2(miexcl1)

H (210)

The exponent consists of the factor (m iexcl 1) which is the degree of the ap-proximation and a factor of 2 since we are performing autocorrelation sothe function is multiplied by itself In analogy to reasoning that leads toequation 29 each eigenvector (now an eigentensor of order m iexcl 1) corre-

2340 H Lipson and H T Siegelmann

Figure 4 Metrics of various orders (a) a regular neuron (spherical metric) (b) aldquosquarerdquo neuron and (c) a neuron with ldquotwo holesrdquo

sponds to the coefcients of a principal curve and multiplying it by inputpoints produces an approximation of the distance of that input point fromthe curve Consequently the inverse of the tensor Z

(j)H can then be used to

compute the high-order correlation of the signal with the nonlinear shapeneuron by simple tensor multiplication

i(x) D argi minregregregregZ

(j)H

iexcl1shyxmiexcl1

H

regregregreg j iexcl 1 2 N (211)

where shydenotes tensor multiplication Note that the covariance tensor canbe inverted only if its order is an even number as satised by equation 210Note also that the amplitude operator is carried out by computing the rootof the sum of the squares of the elements of the argument The computedmetric is now not necessarily spherical and may take various other formsas plotted in Figure 4

In practice however high-order tensor inversion is not directly requiredTo make this analysis simpler we use a Kronecker notation for tensor prod-ucts (see Graham 1981) Kronecker tensor product attens out the elementsof X shyY into a large matrix formed by taking all possible products betweenthe elements of X and those of Y For example if X is a 2pound3 matrix thenXshyY is

X shyY D

Y cent X11 Y cent X12 Y cent X13- - - - - - - - - - - - - - - - - - - -Y cent X21 Y cent X22 Y cent X23

(212)

where each block is a matrix of the size of Y In this notation the internalstructure of higher-order tensors is easier to perceive and their correspon-dence to linear regression of principal polynomial curves is revealed Con-sider for example a fourth-order covariance tensor of the vector x D fx yg

Clustering Irregular Shapes 2341

The fourth-order tensor corresponds to the simplest nonlinear neuron ac-cording to equation 210 and takes the form of the 2pound2pound2pound2 tensor

laquox shyx shyx shyx

notD

laquoR shyR

notD

x2 xyxy y2

shy

x2 xyxy y2

D

266664

x4 x3y x3y x2y2

x3y x2y2 x2y2 xy3

- - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y2 xy3

x2y2 xy3 xy3 y4

377775

(213)

The homogeneous version of this tensor also includes all lower-order per-mutations of the coordinates of xH D fx y 1g namely the 3pound3pound3pound3 tensor

ZH(4) DlaquoxH xH xH xH

notD

laquoRH RH

notD

264

x2 xy xxy y2 yx y 1

375shy

264

x2 xy xxy y2 yx y 1

375

D

26666666666666664

x4 x3y x3 x3y x2y2 x2y x3 x2y x2

x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx3 x2y x2 x2y xy2 xy x2 xy x

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx2y2 xy3 xy2 xy3 y4 y3 xy2 y3 y2

x2y xy2 xy xy2 y3 y2 xy y2 y- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3 x2y x2 x2y xy2 xy x2 xy xx2y xy2 xy xy2 y3 y2 xy y2 yx2 xy x xy y2 y x y 1

37777777777777775

(214)

Remark It is immediately apparent that the matrix in equation 214 corre-sponds to the matrix to be solved for nding a least-squares t of a conicsection equation ax2 C by2 C cxy C dxC eyC f D 0 to the data points Moreoverthe set of eigenvectors of this matrix corresponds to the coefcients of theset of mutually orthogonal best-t conic section curves that are the principalcurves of the data This notion adheres with Gnanadesikanrsquos (1977) methodfor nding principal curves Substitution of a data point into the equation ofa principal curve yields an approximation of the distance of the point fromthat curve and the sum of squared distances amounts to equation 211 Notethat each time we increase order we are seeking a set of principal curves ofone degree higher This implies that the least-squares matrix needs to be twodegrees higher (because it is minimizing the squared error) thus yieldingthe coefcient 2 in the exponent of equation 210

2342 H Lipson and H T Siegelmann

3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classicneurons we show that they are also subject to simple Hebbian learning Fol-lowing an interpretation of Hebbrsquos postulate of learning synaptic modica-tion (ie learning) occurs when there is a correlation between presynapticand postsynaptic activities We have already shown in equation 29 that thepresynaptic activity xH and postsynaptic activity of neuron j coincide when

the synapse is strong that is RH(j)iexcl1

xH is minimum We now proceed toshow that in accordance with Hebbrsquos postulate of learning it is sufcient toincur self-organization of the neurons by increasing synapse strength whenthere is a coincidence of presynaptic and postsynaptic signals

We need to show how self-organization is obtained merely by increasingRH

(j) where j is the winning neuron As each new data point arrives at aspecic neuron j in the net the synaptic weight of that neuron is adaptedby the incremental corresponding to Hebbian learning

ZH(j)(k C 1) D ZH

(j)(k) C g(k)x(j)H

2(miexcl1) (31)

where k is the iteration counter and g(k) is the iteration-dependent learn-ing rate coefcient It should be noted that in equation 210 Z

(j)H becomes a

weighted sum of the input signals x2(miexcl1)H (unlike RH in equation 28 which

is a uniform sum) The eigenstructure analysis of a weighted sum still pro-vides the principal components under the assumption that the weights areuniformly distributed over the cluster signals This assumption holds trueif the process generating the signals of the cluster is stable over time a basicassumption of all neural network algorithms (Bishop 1997) In practice thisassumption is easily acceptable as will be demonstrated in the followingsections

In principle continuous updating of the covariance tensor may createinstability in a competitive environment as a winner neuron becomes in-creasingly dominant To force competition the covariance tensor can benormalized using any of a number of factors dependent on the applicationsuch as the number of signals assigned to the neuron so far or the distri-bution of data among the neuron (forcing uniformity) However a morenatural factor for normalization is the size and complexity of the neuron

A second-order neuron is geometrically equivalent to a d-dimensionalellipsoid A neuron may be therefore normalized by a factor V proportionalto the ellipsoid volume

V ps1 cent s2 cent cent cent cent cent sd D

dY

iD1

psi 1q

detshyshyR(j)

shyshy (32)

During the competition phase the factors R(j)x V are compared rather than

Clustering Irregular Shapes 2343

just R(j)x thus promoting smaller neurons and forcing neurons to becomeequally ldquofatrdquo The nal matching criteria for a homogeneous representationare therefore given by

j(x) D argj minregregreg ORiexcl1

H(j)xH

regregreg j D 1 2 N (33)

where

OR(j)H D

R(j)Hr

detshyshyshyR(j)

H

shyshyshy (34)

Covariance tensors of higher-order neurons can also be normalized bytheir determinant However unlike second-order neurons higher-ordershapes are not necessarily convex and hence their volume is not neces-sarily proportional to their determinant The determinant of higher-ordertensors therefore relates to more complex characteristics which include thedeviation of the data points from the principal curves and the curvatureof the boundary Nevertheless our experiments showed that this is a sim-ple and effective means of normalization that promotes small and simpleshapes over complex or large concave ttings to data The relationship be-tween geometric shape and the tensor determinant should be investigatedfurther for more elaborate use of the determinant as a normalization crite-rion

Finally in order to determine the likelihood of a pointrsquos being associatedwith a particular neuron we need to assume a distribution function As-suming gaussian distribution the likelihood p(j)(xH ) of data point xH beingassociated with cluster j is proportional to the distance from the neuron andis given by

p(j)(xH) D1

(2p )d 2eiexcl 1

2

regregOZ( j) XH

regreg2

(35)

4 Algorithm Summary

To conclude the previous sections we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervisedlearning These are the algorithms that were used in the test cases describedlater in section 5

Unsupervised Competitive learning

1 Select layer size (number of neurons N) and neuron order (m) for agiven problem

2344 H Lipson and H T Siegelmann

2 Initialize all neurons with ZH DP

x2(miexcl1)H where the sum is taken over

all input data or a representative portion or unit tensor or randomnumbers if no data are available To compute this value write downxmiexcl1

H as a vector with all mth degree permutations of fx1 x2 xd 1gand ZH as a matrix summing the outer product of these vectors StoreZH and Ziexcl1

H f for each neuron where f is a normalization factor (egequation 34)

3 Compute the winner for input x First compute vector xmiexcl1H from x as

in section 2 and then multiply it by the stored matrix Ziexcl1H f Winner

j is the one for which the modulus of the product vector is smallest

4 Output winner j

5 Update the winner by adding x2(miexcl1)H to ZH

(j) weighted by the learningcoefcient g(k) where k is the iteration counter Store ZH and ZH

iexcl1 f for the updated neuron

6 Go to step 3

Supervised Learning

1 Set layer size (number of neurons) to number of classes and selectneuron order (m) for a given problem

2 Train For each neuron compute Z(j)H D

Px2(miexcl1)

H where the sum istaken over all training points to be associated with that neuron Tocompute this value write down xH

miexcl1 as a vector with all mth degreepermutations of fx1 x2 xd 1g and ZH as a matrix summing theouter product of these vectors Store ZH

iexcl1 f for each neuron

3 Test Use test points to evaluate the quality of training

4 Use For each new input x rst compute vector xmiexcl1H from x as in

section 2 and then multiply it by the stored matrix Ziexcl1H f Winner j is

the one for which modulus of the product vector is smallest

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacinglower-dimensionality neurons and will enhance the modeling capability ofeach neuron Depending on the application of the net this may lead toimprovement or degradation of the overall performance of the layer due tothe added degrees of freedom This section provides some examples of animplementation at different orders in both supervised and unsupervisedlearning

First we discuss two examples of second-order (ellipsoidal) neuronsto show that our formulation is compatible with previous work on el-lipsoidal neural units (eg Mao amp Jain 1996) Figure 5 illustrates how

Clustering Irregular Shapes 2345

Figure 5 (andashc) Stages in self-classication of overlapping clusters after learning200 400 and 650 data points respectively (one epoch)

a competitive network consisting of three second-order neurons gradu-ally evolves to model a two-dimensional domain exhibiting three mainactivation areas with signicant overlap The second-order neurons arevisualized as ellipses with their boundaries drawn at 2s of the distribu-tion

The next test uses a two-dimensional distribution of three arbitrary clus-ters consisting of a total of 75 data points shown in Figure 6a The networkof three second-order neurons self-organized to map this data set and createsthe decision boundaries shown in Figure 6b The dark curves are the deci-sion boundaries and the light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts forthe different distribution variances

Figure 6 (a) Original data set (b)Self-organized map of the data set Dark curvesare the decision boundaries and light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts for thedifferent distribution variances

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup In supervisedmode we use the same training setup of equations 28 and 210 but traineach neuron individually on the signals associated with it using

Z(j)H D Sx2(miexcl1)

H xH 2 ordf(j)

where in homogeneous coordinates xH Dhxndash1

iand ordf(j) is the jth class

In order to evaluate the performance of this technique we trained eachneuron on a random selection of 80 of the data points of its class andthen tested the classication of the whole data set (including the remain-ing 20 for cross validation) The test determined the most active neuron

1 Performance of third-order neurons was evaluated with a layer consisting of threeneurons randomly initialized with normal distribution within the input domain withmean and variance of the input and with learning factor of 03 decreasing asymp-totically toward zero Of 100 experiments carried out 95 locked correctly on theclusters with an average 76 (5) misclassications after 30 epochs Statistics on per-formance of cited works have not been reported Download MATLAB demos fromhttpwwwcsbrandeisedu lipsonpapersgeomhtm

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus flexibility and enforcement of particular shapes. Further research is required to find methods for biasing neurons toward particular shapes, both to search for specific shapes and to help neuron stability. For example, how would one encourage detection of triangles but not rectangles?

Nonpolynomial base functions. We intend to investigate the properties of shape layers based on nonpolynomial base functions. Noting that the high-order homogeneous tensors give rise to polynomials of various degrees, it is possible to derive such tensors with other base functions, such as wavelets, trigonometric functions, and other well-established kernel functions. Such an approach may enable modeling arbitrary geometrical shapes; the sketch after this list illustrates how the feature map could be swapped.
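
To make the computational-cost and base-function points above concrete, here is a minimal sketch; it is an illustrative reconstruction, not the authors' MATLAB implementation. The class name, the learning rate, the toy data, and the trigonometric feature map are assumptions introduced only for this example. The covariance tensor is kept in its flattened (Kronecker) matrix form, it is inverted only inside the update step, and the polynomial feature map can be swapped for a nonpolynomial one without changing the competitive machinery.

import numpy as np
from itertools import combinations_with_replacement

def monomial_features(x, degree):
    # Homogeneous polynomial feature map: all monomials of degree <= `degree`
    # built from (x1, ..., xd, 1). For degree m - 1, the outer square of this
    # vector is a flattened form of the covariance tensor of an mth-order neuron.
    x_h = np.append(np.asarray(x, dtype=float), 1.0)
    return np.array([np.prod(c) for c in combinations_with_replacement(x_h, degree)])

def trig_features(x, freqs=(1.0, 2.0)):
    # A hypothetical nonpolynomial basis (trigonometric); only the feature map
    # changes, the neuron below stays exactly the same.
    x = np.asarray(x, dtype=float)
    parts = [np.ones(1)]
    for w in freqs:
        parts += [np.sin(w * x), np.cos(w * x)]
    return np.concatenate(parts)

class ShapeNeuron:
    # Shape neuron with a cached inverse: the flattened covariance tensor is
    # inverted once per update and never during activation.
    def __init__(self, feature_map, n_features):
        self.phi = feature_map
        self.Z = np.eye(n_features)                 # unit-tensor initialization
        self._refresh_inverse()

    def _refresh_inverse(self):
        Z_norm = self.Z / np.sqrt(abs(np.linalg.det(self.Z)))   # size normalization
        self.Z_inv = np.linalg.inv(Z_norm)

    def activation(self, x):
        # Matching criterion: modulus of the (cached) inverse applied to the
        # feature vector; a smaller value means a stronger match.
        return np.linalg.norm(self.Z_inv @ self.phi(x))

    def update(self, x, lr=0.3):
        v = self.phi(x)
        self.Z += lr * np.outer(v, v)               # Hebbian-style increment
        self._refresh_inverse()                     # the only inversion per update

# A third-order polynomial neuron (feature degree m - 1 = 2) on toy data.
phi = lambda x: monomial_features(x, degree=2)
neuron = ShapeNeuron(phi, n_features=len(phi(np.zeros(2))))
for p in np.random.default_rng(1).normal(size=(50, 2)):
    neuron.update(p, lr=0.1)
print(neuron.activation(np.array([0.0, 0.0])))      # near the data: relatively small
print(neuron.activation(np.array([8.0, 8.0])))      # far from the data: larger

# Swapping in the trigonometric basis requires no other change:
neuron_t = ShapeNeuron(trig_features, n_features=len(trig_features(np.zeros(2))))

In a full layer, each training point would first be routed to the winning (lowest-activation) neuron and only that neuron would be updated, so the layer as a whole still performs one inversion per training point and none during plain evaluation.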

Acknowledgments

We thank Eitan Domany for his helpful comments and insight. This work was supported in part by the U.S.-Israel Binational Science Foundation (BSF), the Israeli Ministry of Arts and Sciences, and the Fund for Promotion of Research at the Technion. H. L. acknowledges the generous support of the Charles Clore Foundation and the Fischbach Postdoctoral Fellowship. We thank Applied Spectral Imaging for supplying the data for the experiments described in Figures 8 through 10.

Further information, MATLAB demos, and implementation details of this work can be downloaded from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

References

Abe, S., & Thawonmas, R. (1997). A fuzzy classifier with ellipsoidal regions. IEEE Trans. on Fuzzy Systems, 5, 358-368.

Anderson, E. (1939). The Irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2-5.

Balestrino, A., & Verona, B. F. (1994). New adaptive polynomial neural network. Mathematics and Computers in Simulation, 37, 189-194.

Bishop, C. M. (1997). Neural networks for pattern recognition. Oxford: Clarendon Press.

Blatt, M., Wiseman, S., & Domany, E. (1996). Superparamagnetic clustering of data. Physical Review Letters, 76(18), 3251-3254.

Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600-611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193-199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511-513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761-766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers Chem. Eng., 17, 765-784.

Kohonen, T. (1997). Self organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269-280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation, Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29-60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel-Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling, Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16-29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks, IJCNN'91, Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549-557.

Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227-252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494-497.

Received August 24, 1998; accepted September 3, 1999.


6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe S amp Thawonmas R (1997) A fuzzy classier with ellipsoidal regionsIEEE Trans on Fuzzy Systems 5 358ndash368

Anderson E (1939) The Irises of the Gaspe Peninsula Bulletin of the AmericanIris Society 59 2ndash5

Balestrino A amp Verona B F (1994) New adaptive polynomial neural networkMathematics and Computers in Simulation 37 189ndash194

Bishop C M (1997) Neural networks for pattern recognition Oxford Claren-don Press

Blatt M Wiseman S amp Domany E (1996) Superparamagnetic clustering ofdata Physical Review Letters 7618 3251ndash3254

2352 H Lipson and H T Siegelmann

Dave R N (1989) Use of the adaptive fuzzy clustering algorithm to detect linesin digital images In Proc SPIE Conf Intell Robots and Computer Vision 1192600ndash611

Duda R O amp Hart P E (1973) Pattern classication and scene analysis NewYork Wiley

Faux I D amp Pratt M J (1981) Computational geometry for design and manufac-ture New York Wiley

Frigui H amp Krishnapuram R (1996) A comparison of fuzzy shell-clusteringmethods for the detection of ellipses IEEE Transactions on Fuzzy Systems 4193ndash199

Giles C L Grifn R D amp Maxwell T (1988) Encoding geometric invariancesin higher-order neural networks In D Z Anderson (Ed) Neural informa-tion processing systems American Institute of Physics Conference Proceed-ings

Gnanadesikan R (1977) Methods for statistical data analysis of multivariate obser-vations New York Wiley

Goudreau M W Giles C L Chakradhar S T amp Chen D (1994) First-orderversus second-order single layer recurrent neural networks IEEE Transactionson Neural Networks 5 511ndash513

Graham A (1981) Kronecker products and matrix calculus With applications NewYork Wiley

Gustafson E E amp Kessel W C (1979) Fuzzy clustering with fuzzy covariancematrix In Proc IEEE CDC (pp 761ndash766) San Diego CA

Haykin S (1994) Neural networksA comprehensive foundation Englewood CliffsNH Prentice Hall

Kavuri S N amp Venkatasubramanian V (1993) Using fuzzy clustering withellipsoidal units in neural networks for robust fault classication ComputersChem Eng 17 765ndash784

Kohonen T (1997) Self organizing maps Berlin Springer-VerlagKolen J F amp Pollack J B (1990) Back propagation is sensitive to initial condi-

tions Complex Systems 4 269ndash280Krishnapuram R Frigui H amp Nasraoui O (1995) Fuzzy and possibilistic

shell clustering algorithms and their application to boundary detection andsurface approximationmdashParts I and II IEEE Transactions on Fuzzy Systems 329ndash60

Lipson H Hod Y amp Siegelmann H T (1998) High-order clustering metricsfor competitive learning neural networks In Proceedings of the Israel-KoreaBi- National Conference on New Themes in Computer Aided Geometric ModelingTel-Aviv Israel

Mao J amp Jain A (1996) A self-organizing network for hyperellipsoidal clus-tering (HEC) IEEE Transactions on Neural Networks 7 16ndash29

Myung I Levy J amp William B (1992) Are higher order statistics better formaximum entropy prediction in neural networks In Proceedings of the IntJoint Conference on Neural Networks IJCNNrsquo91 Seattle WA

Pal N Bezdek J C amp Tsao E C-K (1993) Generalized clustering networksand Kohonenrsquos self-organizing scheme IEEE Transactions on Neural Networks4 549ndash557

Clustering Irregular Shapes 2353

Pollack J B (1991) The induction of dynamical recognizers Machine Learning7 227ndash252

Schrock E duManoir S Veldman T Schoell B Wienberg J Feguson SmithM A Ning Y Ledbetter D H BarAm I Soenksen D Garini Y amp RiedT (1996) Multicolor spectral karyotyping of human chromosomes Science273 494ndash497

Received August 24 1998 accepted September 3 1999

Page 10: Clustering Irregular Shapes Using High-Order Neurons

2340 H Lipson and H T Siegelmann

Figure 4 Metrics of various orders (a) a regular neuron (spherical metric) (b) aldquosquarerdquo neuron and (c) a neuron with ldquotwo holesrdquo

sponds to the coefcients of a principal curve and multiplying it by inputpoints produces an approximation of the distance of that input point fromthe curve Consequently the inverse of the tensor Z

(j)H can then be used to

compute the high-order correlation of the signal with the nonlinear shapeneuron by simple tensor multiplication

i(x) D argi minregregregregZ

(j)H

iexcl1shyxmiexcl1

H

regregregreg j iexcl 1 2 N (211)

where shydenotes tensor multiplication Note that the covariance tensor canbe inverted only if its order is an even number as satised by equation 210Note also that the amplitude operator is carried out by computing the rootof the sum of the squares of the elements of the argument The computedmetric is now not necessarily spherical and may take various other formsas plotted in Figure 4

In practice however high-order tensor inversion is not directly requiredTo make this analysis simpler we use a Kronecker notation for tensor prod-ucts (see Graham 1981) Kronecker tensor product attens out the elementsof X shyY into a large matrix formed by taking all possible products betweenthe elements of X and those of Y For example if X is a 2pound3 matrix thenXshyY is

X shyY D

Y cent X11 Y cent X12 Y cent X13- - - - - - - - - - - - - - - - - - - -Y cent X21 Y cent X22 Y cent X23

(212)

where each block is a matrix of the size of Y In this notation the internalstructure of higher-order tensors is easier to perceive and their correspon-dence to linear regression of principal polynomial curves is revealed Con-sider for example a fourth-order covariance tensor of the vector x D fx yg

Clustering Irregular Shapes 2341

The fourth-order tensor corresponds to the simplest nonlinear neuron ac-cording to equation 210 and takes the form of the 2pound2pound2pound2 tensor

laquox shyx shyx shyx

notD

laquoR shyR

notD

x2 xyxy y2

shy

x2 xyxy y2

D

266664

x4 x3y x3y x2y2

x3y x2y2 x2y2 xy3

- - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y2 xy3

x2y2 xy3 xy3 y4

377775

(213)

The homogeneous version of this tensor also includes all lower-order per-mutations of the coordinates of xH D fx y 1g namely the 3pound3pound3pound3 tensor

ZH(4) DlaquoxH xH xH xH

notD

laquoRH RH

notD

264

x2 xy xxy y2 yx y 1

375shy

264

x2 xy xxy y2 yx y 1

375

D

26666666666666664

x4 x3y x3 x3y x2y2 x2y x3 x2y x2

x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx3 x2y x2 x2y xy2 xy x2 xy x

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3y x2y2 x2y x2y2 xy3 xy2 x2y xy2 xyx2y2 xy3 xy2 xy3 y4 y3 xy2 y3 y2

x2y xy2 xy xy2 y3 y2 xy y2 y- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -x3 x2y x2 x2y xy2 xy x2 xy xx2y xy2 xy xy2 y3 y2 xy y2 yx2 xy x xy y2 y x y 1

37777777777777775

(214)

Remark It is immediately apparent that the matrix in equation 214 corre-sponds to the matrix to be solved for nding a least-squares t of a conicsection equation ax2 C by2 C cxy C dxC eyC f D 0 to the data points Moreoverthe set of eigenvectors of this matrix corresponds to the coefcients of theset of mutually orthogonal best-t conic section curves that are the principalcurves of the data This notion adheres with Gnanadesikanrsquos (1977) methodfor nding principal curves Substitution of a data point into the equation ofa principal curve yields an approximation of the distance of the point fromthat curve and the sum of squared distances amounts to equation 211 Notethat each time we increase order we are seeking a set of principal curves ofone degree higher This implies that the least-squares matrix needs to be twodegrees higher (because it is minimizing the squared error) thus yieldingthe coefcient 2 in the exponent of equation 210

2342 H Lipson and H T Siegelmann

3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classicneurons we show that they are also subject to simple Hebbian learning Fol-lowing an interpretation of Hebbrsquos postulate of learning synaptic modica-tion (ie learning) occurs when there is a correlation between presynapticand postsynaptic activities We have already shown in equation 29 that thepresynaptic activity xH and postsynaptic activity of neuron j coincide when

the synapse is strong that is RH(j)iexcl1

xH is minimum We now proceed toshow that in accordance with Hebbrsquos postulate of learning it is sufcient toincur self-organization of the neurons by increasing synapse strength whenthere is a coincidence of presynaptic and postsynaptic signals

We need to show how self-organization is obtained merely by increasingRH

(j) where j is the winning neuron As each new data point arrives at aspecic neuron j in the net the synaptic weight of that neuron is adaptedby the incremental corresponding to Hebbian learning

ZH(j)(k C 1) D ZH

(j)(k) C g(k)x(j)H

2(miexcl1) (31)

where k is the iteration counter and g(k) is the iteration-dependent learn-ing rate coefcient It should be noted that in equation 210 Z

(j)H becomes a

weighted sum of the input signals x2(miexcl1)H (unlike RH in equation 28 which

is a uniform sum) The eigenstructure analysis of a weighted sum still pro-vides the principal components under the assumption that the weights areuniformly distributed over the cluster signals This assumption holds trueif the process generating the signals of the cluster is stable over time a basicassumption of all neural network algorithms (Bishop 1997) In practice thisassumption is easily acceptable as will be demonstrated in the followingsections

In principle continuous updating of the covariance tensor may createinstability in a competitive environment as a winner neuron becomes in-creasingly dominant To force competition the covariance tensor can benormalized using any of a number of factors dependent on the applicationsuch as the number of signals assigned to the neuron so far or the distri-bution of data among the neuron (forcing uniformity) However a morenatural factor for normalization is the size and complexity of the neuron

A second-order neuron is geometrically equivalent to a d-dimensionalellipsoid A neuron may be therefore normalized by a factor V proportionalto the ellipsoid volume

V ps1 cent s2 cent cent cent cent cent sd D

dY

iD1

psi 1q

detshyshyR(j)

shyshy (32)

During the competition phase the factors R(j)x V are compared rather than

Clustering Irregular Shapes 2343

just R(j)x thus promoting smaller neurons and forcing neurons to becomeequally ldquofatrdquo The nal matching criteria for a homogeneous representationare therefore given by

j(x) D argj minregregreg ORiexcl1

H(j)xH

regregreg j D 1 2 N (33)

where

OR(j)H D

R(j)Hr

detshyshyshyR(j)

H

shyshyshy (34)

Covariance tensors of higher-order neurons can also be normalized bytheir determinant However unlike second-order neurons higher-ordershapes are not necessarily convex and hence their volume is not neces-sarily proportional to their determinant The determinant of higher-ordertensors therefore relates to more complex characteristics which include thedeviation of the data points from the principal curves and the curvatureof the boundary Nevertheless our experiments showed that this is a sim-ple and effective means of normalization that promotes small and simpleshapes over complex or large concave ttings to data The relationship be-tween geometric shape and the tensor determinant should be investigatedfurther for more elaborate use of the determinant as a normalization crite-rion

Finally in order to determine the likelihood of a pointrsquos being associatedwith a particular neuron we need to assume a distribution function As-suming gaussian distribution the likelihood p(j)(xH ) of data point xH beingassociated with cluster j is proportional to the distance from the neuron andis given by

p(j)(xH) D1

(2p )d 2eiexcl 1

2

regregOZ( j) XH

regreg2

(35)

4 Algorithm Summary

To conclude the previous sections we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervisedlearning These are the algorithms that were used in the test cases describedlater in section 5

Unsupervised Competitive learning

1 Select layer size (number of neurons N) and neuron order (m) for agiven problem

2344 H Lipson and H T Siegelmann

2 Initialize all neurons with ZH DP

x2(miexcl1)H where the sum is taken over

all input data or a representative portion or unit tensor or randomnumbers if no data are available To compute this value write downxmiexcl1

H as a vector with all mth degree permutations of fx1 x2 xd 1gand ZH as a matrix summing the outer product of these vectors StoreZH and Ziexcl1

H f for each neuron where f is a normalization factor (egequation 34)

3 Compute the winner for input x First compute vector xmiexcl1H from x as

in section 2 and then multiply it by the stored matrix Ziexcl1H f Winner

j is the one for which the modulus of the product vector is smallest

4 Output winner j

5 Update the winner by adding x2(miexcl1)H to ZH

(j) weighted by the learningcoefcient g(k) where k is the iteration counter Store ZH and ZH

iexcl1 f for the updated neuron

6 Go to step 3

Supervised Learning

1 Set layer size (number of neurons) to number of classes and selectneuron order (m) for a given problem

2 Train For each neuron compute Z(j)H D

Px2(miexcl1)

H where the sum istaken over all training points to be associated with that neuron Tocompute this value write down xH

miexcl1 as a vector with all mth degreepermutations of fx1 x2 xd 1g and ZH as a matrix summing theouter product of these vectors Store ZH

iexcl1 f for each neuron

3 Test Use test points to evaluate the quality of training

4 Use For each new input x rst compute vector xmiexcl1H from x as in

section 2 and then multiply it by the stored matrix Ziexcl1H f Winner j is

the one for which modulus of the product vector is smallest

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacinglower-dimensionality neurons and will enhance the modeling capability ofeach neuron Depending on the application of the net this may lead toimprovement or degradation of the overall performance of the layer due tothe added degrees of freedom This section provides some examples of animplementation at different orders in both supervised and unsupervisedlearning

First we discuss two examples of second-order (ellipsoidal) neuronsto show that our formulation is compatible with previous work on el-lipsoidal neural units (eg Mao amp Jain 1996) Figure 5 illustrates how

Clustering Irregular Shapes 2345

Figure 5 (andashc) Stages in self-classication of overlapping clusters after learning200 400 and 650 data points respectively (one epoch)

a competitive network consisting of three second-order neurons gradu-ally evolves to model a two-dimensional domain exhibiting three mainactivation areas with signicant overlap The second-order neurons arevisualized as ellipses with their boundaries drawn at 2s of the distribu-tion

The next test uses a two-dimensional distribution of three arbitrary clus-ters consisting of a total of 75 data points shown in Figure 6a The networkof three second-order neurons self-organized to map this data set and createsthe decision boundaries shown in Figure 6b The dark curves are the deci-sion boundaries and the light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts forthe different distribution variances

Figure 6 (a) Original data set (b)Self-organized map of the data set Dark curvesare the decision boundaries and light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts for thedifferent distribution variances

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup In supervisedmode we use the same training setup of equations 28 and 210 but traineach neuron individually on the signals associated with it using

Z(j)H D Sx2(miexcl1)

H xH 2 ordf(j)

where in homogeneous coordinates xH Dhxndash1

iand ordf(j) is the jth class

In order to evaluate the performance of this technique we trained eachneuron on a random selection of 80 of the data points of its class andthen tested the classication of the whole data set (including the remain-ing 20 for cross validation) The test determined the most active neuron

1 Performance of third-order neurons was evaluated with a layer consisting of threeneurons randomly initialized with normal distribution within the input domain withmean and variance of the input and with learning factor of 03 decreasing asymp-totically toward zero Of 100 experiments carried out 95 locked correctly on theclusters with an average 76 (5) misclassications after 30 epochs Statistics on per-formance of cited works have not been reported Download MATLAB demos fromhttpwwwcsbrandeisedu lipsonpapersgeomhtm

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe S amp Thawonmas R (1997) A fuzzy classier with ellipsoidal regionsIEEE Trans on Fuzzy Systems 5 358ndash368

Anderson E (1939) The Irises of the Gaspe Peninsula Bulletin of the AmericanIris Society 59 2ndash5

Balestrino A amp Verona B F (1994) New adaptive polynomial neural networkMathematics and Computers in Simulation 37 189ndash194

Bishop C M (1997) Neural networks for pattern recognition Oxford Claren-don Press

Blatt M Wiseman S amp Domany E (1996) Superparamagnetic clustering ofdata Physical Review Letters 7618 3251ndash3254

2352 H Lipson and H T Siegelmann

Dave R N (1989) Use of the adaptive fuzzy clustering algorithm to detect linesin digital images In Proc SPIE Conf Intell Robots and Computer Vision 1192600ndash611

Duda R O amp Hart P E (1973) Pattern classication and scene analysis NewYork Wiley

Faux I D amp Pratt M J (1981) Computational geometry for design and manufac-ture New York Wiley

Frigui H amp Krishnapuram R (1996) A comparison of fuzzy shell-clusteringmethods for the detection of ellipses IEEE Transactions on Fuzzy Systems 4193ndash199

Giles C L Grifn R D amp Maxwell T (1988) Encoding geometric invariancesin higher-order neural networks In D Z Anderson (Ed) Neural informa-tion processing systems American Institute of Physics Conference Proceed-ings

Gnanadesikan R (1977) Methods for statistical data analysis of multivariate obser-vations New York Wiley

Goudreau M W Giles C L Chakradhar S T amp Chen D (1994) First-orderversus second-order single layer recurrent neural networks IEEE Transactionson Neural Networks 5 511ndash513

Graham A (1981) Kronecker products and matrix calculus With applications NewYork Wiley

Gustafson E E amp Kessel W C (1979) Fuzzy clustering with fuzzy covariancematrix In Proc IEEE CDC (pp 761ndash766) San Diego CA

Haykin S (1994) Neural networksA comprehensive foundation Englewood CliffsNH Prentice Hall

Kavuri S N amp Venkatasubramanian V (1993) Using fuzzy clustering withellipsoidal units in neural networks for robust fault classication ComputersChem Eng 17 765ndash784

Kohonen T (1997) Self organizing maps Berlin Springer-VerlagKolen J F amp Pollack J B (1990) Back propagation is sensitive to initial condi-

tions Complex Systems 4 269ndash280Krishnapuram R Frigui H amp Nasraoui O (1995) Fuzzy and possibilistic

shell clustering algorithms and their application to boundary detection andsurface approximationmdashParts I and II IEEE Transactions on Fuzzy Systems 329ndash60

Lipson H Hod Y amp Siegelmann H T (1998) High-order clustering metricsfor competitive learning neural networks In Proceedings of the Israel-KoreaBi- National Conference on New Themes in Computer Aided Geometric ModelingTel-Aviv Israel

Mao J amp Jain A (1996) A self-organizing network for hyperellipsoidal clus-tering (HEC) IEEE Transactions on Neural Networks 7 16ndash29

Myung I Levy J amp William B (1992) Are higher order statistics better formaximum entropy prediction in neural networks In Proceedings of the IntJoint Conference on Neural Networks IJCNNrsquo91 Seattle WA

Pal N Bezdek J C amp Tsao E C-K (1993) Generalized clustering networksand Kohonenrsquos self-organizing scheme IEEE Transactions on Neural Networks4 549ndash557

Clustering Irregular Shapes 2353

Pollack J B (1991) The induction of dynamical recognizers Machine Learning7 227ndash252

Schrock E duManoir S Veldman T Schoell B Wienberg J Feguson SmithM A Ning Y Ledbetter D H BarAm I Soenksen D Garini Y amp RiedT (1996) Multicolor spectral karyotyping of human chromosomes Science273 494ndash497

Received August 24 1998 accepted September 3 1999

Page 11: Clustering Irregular Shapes Using High-Order Neurons

Clustering Irregular Shapes 2341

The fourth-order tensor corresponds to the simplest nonlinear neuron according to equation 2.10 and takes the form of the 2 × 2 × 2 × 2 tensor

$$
\langle x \otimes x \otimes x \otimes x \rangle = \langle R \otimes R \rangle =
\begin{bmatrix} x^2 & xy \\ xy & y^2 \end{bmatrix} \otimes
\begin{bmatrix} x^2 & xy \\ xy & y^2 \end{bmatrix} =
\begin{bmatrix}
x^4 & x^3 y & x^3 y & x^2 y^2 \\
x^3 y & x^2 y^2 & x^2 y^2 & x y^3 \\
x^3 y & x^2 y^2 & x^2 y^2 & x y^3 \\
x^2 y^2 & x y^3 & x y^3 & y^4
\end{bmatrix}
\qquad (2.13)
$$

The homogeneous version of this tensor also includes all lower-order permutations of the coordinates of x_H = {x, y, 1}, namely the 3 × 3 × 3 × 3 tensor

$$
Z_H^{(4)} = \langle x_H \otimes x_H \otimes x_H \otimes x_H \rangle = \langle R_H \otimes R_H \rangle =
\begin{bmatrix} x^2 & xy & x \\ xy & y^2 & y \\ x & y & 1 \end{bmatrix} \otimes
\begin{bmatrix} x^2 & xy & x \\ xy & y^2 & y \\ x & y & 1 \end{bmatrix}
$$

$$
= \begin{bmatrix}
x^4 & x^3 y & x^3 & x^3 y & x^2 y^2 & x^2 y & x^3 & x^2 y & x^2 \\
x^3 y & x^2 y^2 & x^2 y & x^2 y^2 & x y^3 & x y^2 & x^2 y & x y^2 & x y \\
x^3 & x^2 y & x^2 & x^2 y & x y^2 & x y & x^2 & x y & x \\
x^3 y & x^2 y^2 & x^2 y & x^2 y^2 & x y^3 & x y^2 & x^2 y & x y^2 & x y \\
x^2 y^2 & x y^3 & x y^2 & x y^3 & y^4 & y^3 & x y^2 & y^3 & y^2 \\
x^2 y & x y^2 & x y & x y^2 & y^3 & y^2 & x y & y^2 & y \\
x^3 & x^2 y & x^2 & x^2 y & x y^2 & x y & x^2 & x y & x \\
x^2 y & x y^2 & x y & x y^2 & y^3 & y^2 & x y & y^2 & y \\
x^2 & x y & x & x y & y^2 & y & x & y & 1
\end{bmatrix}
\qquad (2.14)
$$

Remark. It is immediately apparent that the matrix in equation 2.14 corresponds to the matrix to be solved for finding a least-squares fit of a conic section equation $ax^2 + by^2 + cxy + dx + ey + f = 0$ to the data points. Moreover, the set of eigenvectors of this matrix corresponds to the coefficients of the set of mutually orthogonal best-fit conic section curves, which are the principal curves of the data. This notion is consistent with Gnanadesikan's (1977) method for finding principal curves. Substitution of a data point into the equation of a principal curve yields an approximation of the distance of the point from that curve, and the sum of squared distances amounts to equation 2.11. Note that each time we increase the order, we are seeking a set of principal curves of one degree higher. This implies that the least-squares matrix needs to be two degrees higher (because it is minimizing the squared error), thus yielding the coefficient 2 in the exponent of equation 2.10.
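As a concrete illustration of equation 2.14 and of the remark above, the following sketch (our own code, not from the article; function names and the synthetic data are assumptions) accumulates the homogeneous fourth-order tensor as a 9 × 9 matrix of quadratic-monomial outer products and reads off best-fit conic coefficients from its eigenvectors.

```python
import numpy as np

def homogeneous_fourth_order_tensor(points):
    """Sum of x_H^{(x)4} over the data, unfolded as the 9x9 matrix of
    equation 2.14; x_H = (x, y, 1) and r = x_H (x) x_H lists all
    second-degree monomials."""
    Z = np.zeros((9, 9))
    for x, y in points:
        xH = np.array([x, y, 1.0])
        r = np.kron(xH, xH)        # (x^2, xy, x, xy, y^2, y, x, y, 1)
        Z += np.outer(r, r)
    return Z

# Synthetic points near the ellipse x^2 + 4y^2 = 1.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 300)
pts = np.c_[np.cos(t), 0.5 * np.sin(t)] + 0.01 * rng.standard_normal((300, 2))

Z = homogeneous_fourth_order_tensor(pts)
# Eigenvectors of Z hold coefficients of mutually orthogonal best-fit conics
# (the principal curves); the eigenvector of the smallest eigenvalue gives the
# conic that the points approximately satisfy.
eigvals, eigvecs = np.linalg.eigh(Z)
conic = eigvecs[:, 0]   # coefficients over (x^2, xy, x, xy, y^2, y, x, y, 1)
print(np.round(conic / np.abs(conic).max(), 3))
```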


3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classic neurons, we show that they are also subject to simple Hebbian learning. Following an interpretation of Hebb's postulate of learning, synaptic modification (i.e., learning) occurs when there is a correlation between presynaptic and postsynaptic activities. We have already shown in equation 2.9 that the presynaptic activity x_H and the postsynaptic activity of neuron j coincide when the synapse is strong, that is, when the modulus of R_H^(j)-1 x_H is minimal. We now proceed to show that, in accordance with Hebb's postulate of learning, it is sufficient, in order to induce self-organization of the neurons, to increase synapse strength when there is a coincidence of presynaptic and postsynaptic signals.

We need to show how self-organization is obtained merely by increasing R_H^(j), where j is the winning neuron. As each new data point arrives at a specific neuron j in the net, the synaptic weight of that neuron is adapted by an increment corresponding to Hebbian learning,

$$
Z_H^{(j)}(k+1) = Z_H^{(j)}(k) + \eta(k)\, x_H^{2(m-1)}, \qquad (3.1)
$$

where k is the iteration counter and η(k) is the iteration-dependent learning-rate coefficient. It should be noted that in equation 2.10, Z_H^(j) becomes a weighted sum of the input signals x_H^{2(m-1)} (unlike R_H in equation 2.8, which is a uniform sum). The eigenstructure analysis of a weighted sum still provides the principal components under the assumption that the weights are uniformly distributed over the cluster signals. This assumption holds true if the process generating the signals of the cluster is stable over time, a basic assumption of all neural network algorithms (Bishop, 1997). In practice this assumption is easily acceptable, as will be demonstrated in the following sections.
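For example, a minimal sketch of the update in equation 3.1 for a second-order (m = 2) neuron in homogeneous coordinates; the function name and the constants are our assumptions.

```python
import numpy as np

def hebbian_update(Z_H, x, eta):
    """Equation 3.1 with m = 2: Z_H(k+1) = Z_H(k) + eta(k) * x_H x_H^T."""
    x_H = np.append(x, 1.0)                 # homogeneous coordinates
    return Z_H + eta * np.outer(x_H, x_H)   # outer product = x_H^{2(m-1)}

# One winning neuron absorbing a single data point.
Z = np.eye(3)                               # e.g., unit-tensor initialization
Z = hebbian_update(Z, np.array([0.4, -1.2]), eta=0.3)
```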

In principle, continuous updating of the covariance tensor may create instability in a competitive environment, as a winner neuron becomes increasingly dominant. To force competition, the covariance tensor can be normalized using any of a number of factors, dependent on the application, such as the number of signals assigned to the neuron so far or the distribution of data among the neurons (forcing uniformity). However, a more natural factor for normalization is the size and complexity of the neuron.

A second-order neuron is geometrically equivalent to a d-dimensional ellipsoid. A neuron may therefore be normalized by a factor V proportional to the ellipsoid volume,

$$
V \propto \sqrt{\sigma_1}\cdot\sqrt{\sigma_2}\cdots\sqrt{\sigma_d} = \prod_{i=1}^{d}\sqrt{\sigma_i} \propto \sqrt{\det\bigl|R^{(j)}\bigr|}. \qquad (3.2)
$$

During the competition phase, the factors R^(j)x / V are compared rather than just R^(j)x, thus promoting smaller neurons and forcing neurons to become equally "fat." The final matching criteria for a homogeneous representation are therefore given by

$$
j(x) = \arg\min_{j} \left\| \hat{R}_H^{(j)\,-1} x_H \right\|, \qquad j = 1, 2, \ldots, N, \qquad (3.3)
$$

where

$$
\hat{R}_H^{(j)} = \frac{R_H^{(j)}}{\sqrt{\det\bigl|R_H^{(j)}\bigr|}}. \qquad (3.4)
$$

Covariance tensors of higher-order neurons can also be normalized by their determinant. However, unlike second-order neurons, higher-order shapes are not necessarily convex, and hence their volume is not necessarily proportional to their determinant. The determinant of higher-order tensors therefore relates to more complex characteristics, which include the deviation of the data points from the principal curves and the curvature of the boundary. Nevertheless, our experiments showed that this is a simple and effective means of normalization that promotes small and simple shapes over complex or large concave fittings to data. The relationship between geometric shape and the tensor determinant should be investigated further for more elaborate use of the determinant as a normalization criterion.

Finally, in order to determine the likelihood of a point's being associated with a particular neuron, we need to assume a distribution function. Assuming a gaussian distribution, the likelihood p^(j)(x_H) of data point x_H being associated with cluster j is proportional to the distance from the neuron and is given by

$$
p^{(j)}(x_H) = \frac{1}{(2\pi)^{d/2}}\, e^{-\frac{1}{2}\left\| \hat{Z}^{(j)} x_H \right\|^{2}}. \qquad (3.5)
$$
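A sketch, under our own naming, of how equations 3.3 through 3.5 could be evaluated for second-order neurons: the covariance tensor is scaled by the square root of its determinant (equation 3.4), the winner minimizes the modulus of the product vector (equation 3.3), and the gaussian likelihood follows equation 3.5.

```python
import numpy as np

def normalized_inverse(Z):
    """Invert the determinant-normalized tensor R_hat = Z / sqrt(det Z)
    (equation 3.4); this is the matrix used during competition."""
    return np.linalg.inv(Z / np.sqrt(np.linalg.det(Z)))

def winner(x, inv_tensors):
    """Equation 3.3: argmin_j || R_hat_j^{-1} x_H ||."""
    x_H = np.append(x, 1.0)
    return int(np.argmin([np.linalg.norm(M @ x_H) for M in inv_tensors]))

def likelihood(x, inv_tensor, d):
    """Equation 3.5, using the same product vector as the distance measure."""
    x_H = np.append(x, 1.0)
    dist_sq = np.linalg.norm(inv_tensor @ x_H) ** 2
    return np.exp(-0.5 * dist_sq) / (2.0 * np.pi) ** (d / 2.0)
```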

4 Algorithm Summary

To conclude the previous sections, we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervised learning. These are the algorithms that were used in the test cases described later in section 5.

Unsupervised Competitive Learning

1. Select layer size (number of neurons N) and neuron order (m) for a given problem.


2. Initialize all neurons with Z_H = ∑ x_H^{2(m-1)}, where the sum is taken over all input data or a representative portion, or with a unit tensor or random numbers if no data are available. To compute this value, write down x_H^{m-1} as a vector with all mth-degree permutations of {x_1, x_2, ..., x_d, 1}, and Z_H as a matrix summing the outer product of these vectors. Store Z_H and Z_H^{-1}f for each neuron, where f is a normalization factor (e.g., equation 3.4).

3. Compute the winner for input x: first compute the vector x_H^{m-1} from x as in section 2, and then multiply it by the stored matrix Z_H^{-1}f. The winner j is the one for which the modulus of the product vector is smallest.

4. Output winner j.

5. Update the winner by adding x_H^{2(m-1)} to Z_H^{(j)}, weighted by the learning coefficient η(k), where k is the iteration counter. Store Z_H and Z_H^{-1}f for the updated neuron.

6. Go to step 3.
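Putting the steps above together, here is a compact sketch of the unsupervised procedure for second-order (m = 2) neurons. The initialization from points drawn with the mean and variance of the input and the decaying learning rate mirror the settings reported in note 1 for the Iris experiments, but the specific constants and the function names are our assumptions.

```python
import numpy as np

def x_h(x):
    """Homogeneous monomial vector; for m = 2 it is simply (x, 1)."""
    return np.append(x, 1.0)

def stored_inverse(Z):
    """Normalized inverse Z_H^{-1} f used for competition (equation 3.4)."""
    return np.linalg.inv(Z) * np.sqrt(np.linalg.det(Z))

def train_unsupervised(data, n_neurons, epochs=20, eta0=0.3, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = data.mean(axis=0), data.std(axis=0)
    # Step 2: initialize each neuron from a few points drawn with the mean and
    # variance of the input (sample size of 10 per neuron is an assumption).
    Z = []
    for _ in range(n_neurons):
        pts = rng.normal(mu, sigma, size=(10, data.shape[1]))
        Z.append(sum(np.outer(x_h(p), x_h(p)) for p in pts))
    inv = [stored_inverse(Zj) for Zj in Z]

    for k in range(epochs):
        eta = eta0 / (1.0 + k)             # learning rate decaying toward zero
        for x in data:
            v = x_h(x)
            # Steps 3-4: winner has the smallest modulus of the product vector.
            j = int(np.argmin([np.linalg.norm(M @ v) for M in inv]))
            # Step 5: Hebbian update of the winner, then re-store Z^{-1} f.
            Z[j] = Z[j] + eta * np.outer(v, v)
            inv[j] = stored_inverse(Z[j])
    return Z, inv
```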

Supervised Learning

1. Set layer size (number of neurons) to the number of classes and select neuron order (m) for a given problem.

2. Train: For each neuron, compute Z_H^{(j)} = ∑ x_H^{2(m-1)}, where the sum is taken over all training points to be associated with that neuron. To compute this value, write down x_H^{m-1} as a vector with all mth-degree permutations of {x_1, x_2, ..., x_d, 1}, and Z_H as a matrix summing the outer product of these vectors. Store Z_H^{-1}f for each neuron.

3. Test: Use test points to evaluate the quality of training.

4. Use: For each new input x, first compute the vector x_H^{m-1} from x as in section 2, and then multiply it by the stored matrix Z_H^{-1}f. The winner j is the one for which the modulus of the product vector is smallest.
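A matching sketch of the supervised variant (again with m = 2 and our own helper names, reusing the same homogeneous-coordinate and normalization conventions as the unsupervised sketch above).

```python
import numpy as np

def x_h(x):
    return np.append(x, 1.0)              # homogeneous coordinates (m = 2)

def stored_inverse(Z):
    return np.linalg.inv(Z) * np.sqrt(np.linalg.det(Z))   # Z^{-1} f, eq. 3.4

def train_supervised(data, labels, n_classes):
    """Step 2: one neuron per class, summing x_H outer products over the
    training points associated with that class; store the normalized inverse."""
    stored = []
    for c in range(n_classes):
        pts = data[labels == c]
        Z_c = sum(np.outer(x_h(p), x_h(p)) for p in pts)
        stored.append(stored_inverse(Z_c))
    return stored

def classify(x, stored):
    """Step 4: the winner is the neuron whose product vector has the smallest
    modulus."""
    v = x_h(x)
    return int(np.argmin([np.linalg.norm(M @ v) for M in stored]))
```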

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacing lower-dimensionality neurons and will enhance the modeling capability of each neuron. Depending on the application of the net, this may lead to improvement or degradation of the overall performance of the layer due to the added degrees of freedom. This section provides some examples of an implementation at different orders in both supervised and unsupervised learning.

First we discuss two examples of second-order (ellipsoidal) neurons to show that our formulation is compatible with previous work on ellipsoidal neural units (e.g., Mao & Jain, 1996). Figure 5 illustrates how a competitive network consisting of three second-order neurons gradually evolves to model a two-dimensional domain exhibiting three main activation areas with significant overlap. The second-order neurons are visualized as ellipses with their boundaries drawn at 2σ of the distribution.

Figure 5: (a–c) Stages in self-classification of overlapping clusters after learning 200, 400, and 650 data points, respectively (one epoch).

The next test uses a two-dimensional distribution of three arbitrary clusters consisting of a total of 75 data points, shown in Figure 6a. The network of three second-order neurons self-organized to map this data set and created the decision boundaries shown in Figure 6b. The dark curves are the decision boundaries, and the light lines show the local distance metric within each cell. Note how the decision boundary in the overlap area accounts for the different distribution variances.

Figure 6: (a) Original data set. (b) Self-organized map of the data set. Dark curves are the decision boundaries, and light lines show the local distance metric within each cell. Note how the decision boundary in the overlap area accounts for the different distribution variances.


Table 1: Comparison of Self-Organization Results for Iris Data.

Method                                                         Epochs   Number Misclassified
                                                                        or Unclassified
Superparamagnetic (Blatt et al., 1996)                                  25
Learning vector quantization / generalized learning
  vector quantization (Pal, Bezdek, & Tsao, 1993)                200    17
K-means (Mao & Jain, 1996)                                              16
Hyperellipsoidal clustering (Mao & Jain, 1996)                           5
Second-order unsupervised                                         20     4
Third-order unsupervised                                          30     3

The performance of the different orders of shape neuron has also been assessed on the Iris data (Anderson, 1939). Anderson's time-honored data set has become a popular benchmark problem for clustering algorithms. The data set consists of four quantities measured on each of 150 flowers chosen from three species of iris: Iris setosa, Iris versicolor, and Iris virginica. The data set constitutes 150 points in four-dimensional space. Some results of other methods are compared in Table 1. It should be noted that the results presented in the table are "best results"; third-order neurons are not always stable, because the layer sometimes converges into different classifications.¹

The neurons have also been tried in a supervised setup. In supervised mode we use the same training setup of equations 2.8 and 2.10, but train each neuron individually on the signals associated with it, using

$$
Z_H^{(j)} = \sum_{x_H \in \Omega^{(j)}} x_H^{2(m-1)},
$$

where in homogeneous coordinates x_H = (x, 1) and Ω^(j) is the jth class.

In order to evaluate the performance of this technique, we trained each neuron on a random selection of 80% of the data points of its class and then tested the classification of the whole data set (including the remaining 20% for cross-validation). The test determined the most active neuron for each input and compared it with the manual classification.

1. Performance of third-order neurons was evaluated with a layer consisting of three neurons, randomly initialized with a normal distribution within the input domain with the mean and variance of the input, and with a learning factor of 0.3 decreasing asymptotically toward zero. Of 100 experiments carried out, 95% locked correctly on the clusters, with an average of 7.6 (5%) misclassifications after 30 epochs. Statistics on the performance of the cited works have not been reported. Download MATLAB demos from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.


Table 2: Supervised Classification Results for Iris Data.

                                                              Number Misclassified
Method / Order                                     Epochs     Average      Best
3-hidden-layer neural network (Abe et al., 1997)     1000       2.2          1
Fuzzy hyperbox (Abe et al., 1997)                                            2
Fuzzy ellipsoids (Abe et al., 1997)                  1000                    1
Second-order supervised                                 1       3.08         1
Third-order supervised                                  1       2.27         1
Fourth-order supervised                                 1       1.60         0
Fifth-order supervised                                  1       1.07         0
Sixth-order supervised                                  1       1.20         0
Seventh-order supervised                                1       1.30         0

Notes: Results after one training epoch with 20% cross-validation. Averaged results are for 250 experiments.

This test was repeated 250 times, with the average and best results shown in Table 2, along with results reported for other methods. It appears that, on average, supervised learning for this task reaches an optimum with fifth-order neurons.
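The evaluation protocol just described (80% of each class for training, classification of the whole set, 250 repetitions) could be scripted roughly as follows, reusing the train_supervised and classify sketches given after the algorithm summary; the resampling details are our assumptions.

```python
import numpy as np

def repeated_holdout(data, labels, n_classes, trials=250, train_frac=0.8, seed=0):
    """Average and best number of misclassifications over repeated random
    80/20 splits, as in Table 2 (sketch; relies on train_supervised/classify)."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        train_idx = np.concatenate([
            rng.choice(np.where(labels == c)[0],
                       size=int(train_frac * np.sum(labels == c)),
                       replace=False)
            for c in range(n_classes)
        ])
        stored = train_supervised(data[train_idx], labels[train_idx], n_classes)
        predictions = np.array([classify(x, stored) for x in data])
        errors.append(int(np.sum(predictions != labels)))
    return float(np.mean(errors)), int(min(errors))
```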

Figure 7 shows classification results using third-order neurons for an arbitrary distribution containing close and overlapping clusters.

Finally, Figure 8 shows an application of third- through fifth-order neurons to detection and modeling of chromosomes in a human cell. The corresponding separation maps and convergence error rates are shown in Figures 9 and 10, respectively.

Figure 7: Self-classification of synthetic point clusters using third-order neurons.


Figure 8: Chromosome separation. (a) Designated area. (b) Four third-order neurons self-organized to mark the chromosomes in the designated area. Data from Schrock et al. (1996).

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated their practical use for modeling the structure of spatial distributions and for geometric feature recognition. Although high-order neurons do not directly correspond to neurobiological details, we believe that they can provide powerful data modeling capabilities. In particular, they exhibit useful properties for correctly handling close and partially overlapping clusters. Furthermore, we have shown that ellipsoidal and tensorial clustering methods, as well as classic competitive neurons, are special cases of the general-shape neuron. We showed how lower-order information can be encapsulated using tensor representation in homogeneous coordinates, enabling systematic continuation to higher-order metrics. This formulation also allows for direct extraction of the encapsulated high-order information in the form of principal curves, represented by the eigentensors of each neuron.

The use of shapes as neurons raises some further practical questions, which have not been addressed in this article, require further research, and are listed below. Although most of these issues pertain to neural networks in general, the approach required here may be different.

Number of neurons. In the examples provided, we have explicitly used the number of neurons appropriate to the number of clusters. The question of network size is applicable to most neural architectures. However, our experiments showed that overspecification tends to cause clusters to split into subsections, while underspecification may cause clusters to unite.

Figure 9: Self-organization for chromosome separation. (a–d) Second- through fifth-order, respectively.

Initial weights. It is well known that sensitivity to initial weights is a general problem of neural networks (see, for example, Kolen & Pollack, 1990, and Pal et al., 1993). However, when third-order neurons were evaluated on the Iris data with random initial weights, they performed with an average of 5 misclassifications. Thus, the capacity of neurons to be spread over volumes may provide new possibilities for addressing the initialization problem.

Computational cost. The need to invert a high-order tensor introduces a significant computational cost, especially when the input dimensionality is large. There are some factors that counterbalance this computational cost. First, the covariance tensor needs to be inverted only when a neuron is updated, not when it is activated or evaluated. When the neuron is updated, the inverted tensor is computed and then stored, and it is used whenever the neuron needs to compete or evaluate an input signal. This means that the whole layer will need only one inversion for each training point and no inversions for nontraining operation. Second, because of the complexity of each neuron, sometimes fewer of them are required. Finally, relatively complex shapes can be attained with low orders (say, third order).

Figure 10: Average error rate in self-organization for chromosome separation of Figure 9, respectively.

Selection of neuron order. As in many other networks, much domain knowledge about the problem comes into the solution through the choice of parameters such as the order of the neurons. However, we also estimate that, due to the rather analytic representation of each neuron, and since high-order information is contained within each neuron independently, this information can be directly extracted (using the eigentensors). Thus it is possible to decompose analytically a cross-shaped fourth-order cluster into two overlapping elliptic clusters, and vice versa.

Stability versus flexibility and enforcement of particular shapes. Further research is required to find methods for biasing neurons toward particular shapes, in an attempt to seek particular shapes and help neuron stability. For example, how would one encourage detection of triangles and not rectangles?

Nonpolynomial base functions. We intend to investigate the properties of shape layers based on nonpolynomial base functions. Noting that the high-order homogeneous tensors give rise to various degrees of polynomials, it is possible to derive such tensors with other base functions, such as wavelets, trigonometric functions, and other well-established kernel functions. Such an approach may enable modeling arbitrary geometrical shapes.

Acknowledgments

We thank Eitan Domany for his helpful comments and insight. This work was supported in part by the U.S.-Israel Binational Science Foundation (BSF), the Israeli Ministry of Arts and Sciences, and the Fund for Promotion of Research at the Technion. H. L. acknowledges the generous support of the Charles Clore Foundation and the Fischbach Postdoctoral Fellowship. We thank Applied Spectral Imaging for supplying the data for the experiments described in Figures 8 through 10.

Download further information, MATLAB demos, and implementation details of this work from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

References

Abe, S., & Thawonmas, R. (1997). A fuzzy classifier with ellipsoidal regions. IEEE Trans. on Fuzzy Systems, 5, 358–368.

Anderson, E. (1939). The irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2–5.

Balestrino, A., & Verona, B. F. (1994). New adaptive polynomial neural network. Mathematics and Computers in Simulation, 37, 189–194.

Bishop, C. M. (1997). Neural networks for pattern recognition. Oxford: Clarendon Press.

Blatt, M., Wiseman, S., & Domany, E. (1996). Superparamagnetic clustering of data. Physical Review Letters, 76(18), 3251–3254.


Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600–611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193–199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511–513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761–766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers Chem. Eng., 17, 765–784.

Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269–280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation, Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29–60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel-Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling. Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16–29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks IJCNN'91, Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549–557.


Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494–497.

Received August 24, 1998; accepted September 3, 1999.

Page 12: Clustering Irregular Shapes Using High-Order Neurons

2342 H Lipson and H T Siegelmann

3 Learning and Functionality

In order to show that higher-order shapes are a direct extension of classicneurons we show that they are also subject to simple Hebbian learning Fol-lowing an interpretation of Hebbrsquos postulate of learning synaptic modica-tion (ie learning) occurs when there is a correlation between presynapticand postsynaptic activities We have already shown in equation 29 that thepresynaptic activity xH and postsynaptic activity of neuron j coincide when

the synapse is strong that is RH(j)iexcl1

xH is minimum We now proceed toshow that in accordance with Hebbrsquos postulate of learning it is sufcient toincur self-organization of the neurons by increasing synapse strength whenthere is a coincidence of presynaptic and postsynaptic signals

We need to show how self-organization is obtained merely by increasingRH

(j) where j is the winning neuron As each new data point arrives at aspecic neuron j in the net the synaptic weight of that neuron is adaptedby the incremental corresponding to Hebbian learning

ZH(j)(k C 1) D ZH

(j)(k) C g(k)x(j)H

2(miexcl1) (31)

where k is the iteration counter and g(k) is the iteration-dependent learn-ing rate coefcient It should be noted that in equation 210 Z

(j)H becomes a

weighted sum of the input signals x2(miexcl1)H (unlike RH in equation 28 which

is a uniform sum) The eigenstructure analysis of a weighted sum still pro-vides the principal components under the assumption that the weights areuniformly distributed over the cluster signals This assumption holds trueif the process generating the signals of the cluster is stable over time a basicassumption of all neural network algorithms (Bishop 1997) In practice thisassumption is easily acceptable as will be demonstrated in the followingsections

In principle continuous updating of the covariance tensor may createinstability in a competitive environment as a winner neuron becomes in-creasingly dominant To force competition the covariance tensor can benormalized using any of a number of factors dependent on the applicationsuch as the number of signals assigned to the neuron so far or the distri-bution of data among the neuron (forcing uniformity) However a morenatural factor for normalization is the size and complexity of the neuron

A second-order neuron is geometrically equivalent to a d-dimensionalellipsoid A neuron may be therefore normalized by a factor V proportionalto the ellipsoid volume

V ps1 cent s2 cent cent cent cent cent sd D

dY

iD1

psi 1q

detshyshyR(j)

shyshy (32)

During the competition phase the factors R(j)x V are compared rather than

Clustering Irregular Shapes 2343

just R(j)x thus promoting smaller neurons and forcing neurons to becomeequally ldquofatrdquo The nal matching criteria for a homogeneous representationare therefore given by

j(x) D argj minregregreg ORiexcl1

H(j)xH

regregreg j D 1 2 N (33)

where

OR(j)H D

R(j)Hr

detshyshyshyR(j)

H

shyshyshy (34)

Covariance tensors of higher-order neurons can also be normalized bytheir determinant However unlike second-order neurons higher-ordershapes are not necessarily convex and hence their volume is not neces-sarily proportional to their determinant The determinant of higher-ordertensors therefore relates to more complex characteristics which include thedeviation of the data points from the principal curves and the curvatureof the boundary Nevertheless our experiments showed that this is a sim-ple and effective means of normalization that promotes small and simpleshapes over complex or large concave ttings to data The relationship be-tween geometric shape and the tensor determinant should be investigatedfurther for more elaborate use of the determinant as a normalization crite-rion

Finally in order to determine the likelihood of a pointrsquos being associatedwith a particular neuron we need to assume a distribution function As-suming gaussian distribution the likelihood p(j)(xH ) of data point xH beingassociated with cluster j is proportional to the distance from the neuron andis given by

p(j)(xH) D1

(2p )d 2eiexcl 1

2

regregOZ( j) XH

regreg2

(35)

4 Algorithm Summary

To conclude the previous sections we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervisedlearning These are the algorithms that were used in the test cases describedlater in section 5

Unsupervised Competitive learning

1 Select layer size (number of neurons N) and neuron order (m) for agiven problem

2344 H Lipson and H T Siegelmann

2 Initialize all neurons with ZH DP

x2(miexcl1)H where the sum is taken over

all input data or a representative portion or unit tensor or randomnumbers if no data are available To compute this value write downxmiexcl1

H as a vector with all mth degree permutations of fx1 x2 xd 1gand ZH as a matrix summing the outer product of these vectors StoreZH and Ziexcl1

H f for each neuron where f is a normalization factor (egequation 34)

3 Compute the winner for input x First compute vector xmiexcl1H from x as

in section 2 and then multiply it by the stored matrix Ziexcl1H f Winner

j is the one for which the modulus of the product vector is smallest

4 Output winner j

5 Update the winner by adding x2(miexcl1)H to ZH

(j) weighted by the learningcoefcient g(k) where k is the iteration counter Store ZH and ZH

iexcl1 f for the updated neuron

6 Go to step 3

Supervised Learning

1 Set layer size (number of neurons) to number of classes and selectneuron order (m) for a given problem

2 Train For each neuron compute Z(j)H D

Px2(miexcl1)

H where the sum istaken over all training points to be associated with that neuron Tocompute this value write down xH

miexcl1 as a vector with all mth degreepermutations of fx1 x2 xd 1g and ZH as a matrix summing theouter product of these vectors Store ZH

iexcl1 f for each neuron

3 Test Use test points to evaluate the quality of training

4 Use For each new input x rst compute vector xmiexcl1H from x as in

section 2 and then multiply it by the stored matrix Ziexcl1H f Winner j is

the one for which modulus of the product vector is smallest

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacinglower-dimensionality neurons and will enhance the modeling capability ofeach neuron Depending on the application of the net this may lead toimprovement or degradation of the overall performance of the layer due tothe added degrees of freedom This section provides some examples of animplementation at different orders in both supervised and unsupervisedlearning

First we discuss two examples of second-order (ellipsoidal) neuronsto show that our formulation is compatible with previous work on el-lipsoidal neural units (eg Mao amp Jain 1996) Figure 5 illustrates how

Clustering Irregular Shapes 2345

Figure 5 (andashc) Stages in self-classication of overlapping clusters after learning200 400 and 650 data points respectively (one epoch)

a competitive network consisting of three second-order neurons gradu-ally evolves to model a two-dimensional domain exhibiting three mainactivation areas with signicant overlap The second-order neurons arevisualized as ellipses with their boundaries drawn at 2s of the distribu-tion

The next test uses a two-dimensional distribution of three arbitrary clus-ters consisting of a total of 75 data points shown in Figure 6a The networkof three second-order neurons self-organized to map this data set and createsthe decision boundaries shown in Figure 6b The dark curves are the deci-sion boundaries and the light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts forthe different distribution variances

Figure 6 (a) Original data set (b)Self-organized map of the data set Dark curvesare the decision boundaries and light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts for thedifferent distribution variances

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup In supervisedmode we use the same training setup of equations 28 and 210 but traineach neuron individually on the signals associated with it using

Z(j)H D Sx2(miexcl1)

H xH 2 ordf(j)

where in homogeneous coordinates xH Dhxndash1

iand ordf(j) is the jth class

In order to evaluate the performance of this technique we trained eachneuron on a random selection of 80 of the data points of its class andthen tested the classication of the whole data set (including the remain-ing 20 for cross validation) The test determined the most active neuron

1 Performance of third-order neurons was evaluated with a layer consisting of threeneurons randomly initialized with normal distribution within the input domain withmean and variance of the input and with learning factor of 03 decreasing asymp-totically toward zero Of 100 experiments carried out 95 locked correctly on theclusters with an average 76 (5) misclassications after 30 epochs Statistics on per-formance of cited works have not been reported Download MATLAB demos fromhttpwwwcsbrandeisedu lipsonpapersgeomhtm

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe S amp Thawonmas R (1997) A fuzzy classier with ellipsoidal regionsIEEE Trans on Fuzzy Systems 5 358ndash368

Anderson E (1939) The Irises of the Gaspe Peninsula Bulletin of the AmericanIris Society 59 2ndash5

Balestrino A amp Verona B F (1994) New adaptive polynomial neural networkMathematics and Computers in Simulation 37 189ndash194

Bishop C M (1997) Neural networks for pattern recognition Oxford Claren-don Press

Blatt M Wiseman S amp Domany E (1996) Superparamagnetic clustering ofdata Physical Review Letters 7618 3251ndash3254

2352 H Lipson and H T Siegelmann

Dave R N (1989) Use of the adaptive fuzzy clustering algorithm to detect linesin digital images In Proc SPIE Conf Intell Robots and Computer Vision 1192600ndash611

Duda R O amp Hart P E (1973) Pattern classication and scene analysis NewYork Wiley

Faux I D amp Pratt M J (1981) Computational geometry for design and manufac-ture New York Wiley

Frigui H amp Krishnapuram R (1996) A comparison of fuzzy shell-clusteringmethods for the detection of ellipses IEEE Transactions on Fuzzy Systems 4193ndash199

Giles C L Grifn R D amp Maxwell T (1988) Encoding geometric invariancesin higher-order neural networks In D Z Anderson (Ed) Neural informa-tion processing systems American Institute of Physics Conference Proceed-ings

Gnanadesikan R (1977) Methods for statistical data analysis of multivariate obser-vations New York Wiley

Goudreau M W Giles C L Chakradhar S T amp Chen D (1994) First-orderversus second-order single layer recurrent neural networks IEEE Transactionson Neural Networks 5 511ndash513

Graham A (1981) Kronecker products and matrix calculus With applications NewYork Wiley

Gustafson E E amp Kessel W C (1979) Fuzzy clustering with fuzzy covariancematrix In Proc IEEE CDC (pp 761ndash766) San Diego CA

Haykin S (1994) Neural networksA comprehensive foundation Englewood CliffsNH Prentice Hall

Kavuri S N amp Venkatasubramanian V (1993) Using fuzzy clustering withellipsoidal units in neural networks for robust fault classication ComputersChem Eng 17 765ndash784

Kohonen T (1997) Self organizing maps Berlin Springer-VerlagKolen J F amp Pollack J B (1990) Back propagation is sensitive to initial condi-

tions Complex Systems 4 269ndash280Krishnapuram R Frigui H amp Nasraoui O (1995) Fuzzy and possibilistic

shell clustering algorithms and their application to boundary detection andsurface approximationmdashParts I and II IEEE Transactions on Fuzzy Systems 329ndash60

Lipson H Hod Y amp Siegelmann H T (1998) High-order clustering metricsfor competitive learning neural networks In Proceedings of the Israel-KoreaBi- National Conference on New Themes in Computer Aided Geometric ModelingTel-Aviv Israel

Mao J amp Jain A (1996) A self-organizing network for hyperellipsoidal clus-tering (HEC) IEEE Transactions on Neural Networks 7 16ndash29

Myung I Levy J amp William B (1992) Are higher order statistics better formaximum entropy prediction in neural networks In Proceedings of the IntJoint Conference on Neural Networks IJCNNrsquo91 Seattle WA

Pal N Bezdek J C amp Tsao E C-K (1993) Generalized clustering networksand Kohonenrsquos self-organizing scheme IEEE Transactions on Neural Networks4 549ndash557

Clustering Irregular Shapes 2353

Pollack J B (1991) The induction of dynamical recognizers Machine Learning7 227ndash252

Schrock E duManoir S Veldman T Schoell B Wienberg J Feguson SmithM A Ning Y Ledbetter D H BarAm I Soenksen D Garini Y amp RiedT (1996) Multicolor spectral karyotyping of human chromosomes Science273 494ndash497

Received August 24 1998 accepted September 3 1999

Page 13: Clustering Irregular Shapes Using High-Order Neurons

Clustering Irregular Shapes 2343

just R(j)x thus promoting smaller neurons and forcing neurons to becomeequally ldquofatrdquo The nal matching criteria for a homogeneous representationare therefore given by

j(x) D argj minregregreg ORiexcl1

H(j)xH

regregreg j D 1 2 N (33)

where

OR(j)H D

R(j)Hr

detshyshyshyR(j)

H

shyshyshy (34)

Covariance tensors of higher-order neurons can also be normalized bytheir determinant However unlike second-order neurons higher-ordershapes are not necessarily convex and hence their volume is not neces-sarily proportional to their determinant The determinant of higher-ordertensors therefore relates to more complex characteristics which include thedeviation of the data points from the principal curves and the curvatureof the boundary Nevertheless our experiments showed that this is a sim-ple and effective means of normalization that promotes small and simpleshapes over complex or large concave ttings to data The relationship be-tween geometric shape and the tensor determinant should be investigatedfurther for more elaborate use of the determinant as a normalization crite-rion

Finally in order to determine the likelihood of a pointrsquos being associatedwith a particular neuron we need to assume a distribution function As-suming gaussian distribution the likelihood p(j)(xH ) of data point xH beingassociated with cluster j is proportional to the distance from the neuron andis given by

p(j)(xH) D1

(2p )d 2eiexcl 1

2

regregOZ( j) XH

regreg2

(35)

4 Algorithm Summary

To conclude the previous sections we provide a summary of the high-order competitive learning algorithm for both unsupervised and supervisedlearning These are the algorithms that were used in the test cases describedlater in section 5

Unsupervised Competitive learning

1 Select layer size (number of neurons N) and neuron order (m) for agiven problem

2344 H Lipson and H T Siegelmann

2 Initialize all neurons with ZH DP

x2(miexcl1)H where the sum is taken over

all input data or a representative portion or unit tensor or randomnumbers if no data are available To compute this value write downxmiexcl1

H as a vector with all mth degree permutations of fx1 x2 xd 1gand ZH as a matrix summing the outer product of these vectors StoreZH and Ziexcl1

H f for each neuron where f is a normalization factor (egequation 34)

3 Compute the winner for input x First compute vector xmiexcl1H from x as

in section 2 and then multiply it by the stored matrix Ziexcl1H f Winner

j is the one for which the modulus of the product vector is smallest

4 Output winner j

5 Update the winner by adding x2(miexcl1)H to ZH

(j) weighted by the learningcoefcient g(k) where k is the iteration counter Store ZH and ZH

iexcl1 f for the updated neuron

6 Go to step 3

Supervised Learning

1 Set layer size (number of neurons) to number of classes and selectneuron order (m) for a given problem

2 Train For each neuron compute Z(j)H D

Px2(miexcl1)

H where the sum istaken over all training points to be associated with that neuron Tocompute this value write down xH

miexcl1 as a vector with all mth degreepermutations of fx1 x2 xd 1g and ZH as a matrix summing theouter product of these vectors Store ZH

iexcl1 f for each neuron

3 Test Use test points to evaluate the quality of training

4 Use For each new input x rst compute vector xmiexcl1H from x as in

section 2 and then multiply it by the stored matrix Ziexcl1H f Winner j is

the one for which modulus of the product vector is smallest

5 Implementation with Synthesized and Real Data

The proposed neuron can be used as an element in any network by replacinglower-dimensionality neurons and will enhance the modeling capability ofeach neuron Depending on the application of the net this may lead toimprovement or degradation of the overall performance of the layer due tothe added degrees of freedom This section provides some examples of animplementation at different orders in both supervised and unsupervisedlearning

First we discuss two examples of second-order (ellipsoidal) neuronsto show that our formulation is compatible with previous work on el-lipsoidal neural units (eg Mao amp Jain 1996) Figure 5 illustrates how

Clustering Irregular Shapes 2345

Figure 5 (andashc) Stages in self-classication of overlapping clusters after learning200 400 and 650 data points respectively (one epoch)

a competitive network consisting of three second-order neurons gradu-ally evolves to model a two-dimensional domain exhibiting three mainactivation areas with signicant overlap The second-order neurons arevisualized as ellipses with their boundaries drawn at 2s of the distribu-tion

The next test uses a two-dimensional distribution of three arbitrary clus-ters consisting of a total of 75 data points shown in Figure 6a The networkof three second-order neurons self-organized to map this data set and createsthe decision boundaries shown in Figure 6b The dark curves are the deci-sion boundaries and the light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts forthe different distribution variances

Figure 6 (a) Original data set (b)Self-organized map of the data set Dark curvesare the decision boundaries and light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts for thedifferent distribution variances

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup In supervisedmode we use the same training setup of equations 28 and 210 but traineach neuron individually on the signals associated with it using

Z(j)H D Sx2(miexcl1)

H xH 2 ordf(j)

where in homogeneous coordinates xH Dhxndash1

iand ordf(j) is the jth class

In order to evaluate the performance of this technique we trained eachneuron on a random selection of 80 of the data points of its class andthen tested the classication of the whole data set (including the remain-ing 20 for cross validation) The test determined the most active neuron

1 Performance of third-order neurons was evaluated with a layer consisting of threeneurons randomly initialized with normal distribution within the input domain withmean and variance of the input and with learning factor of 03 decreasing asymp-totically toward zero Of 100 experiments carried out 95 locked correctly on theclusters with an average 76 (5) misclassications after 30 epochs Statistics on per-formance of cited works have not been reported Download MATLAB demos fromhttpwwwcsbrandeisedu lipsonpapersgeomhtm

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe, S., & Thawonmas, R. (1997). A fuzzy classifier with ellipsoidal regions. IEEE Trans. on Fuzzy Systems, 5, 358–368.

Anderson, E. (1939). The Irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2–5.

Balestrino, A., & Verona, B. F. (1994). New adaptive polynomial neural network. Mathematics and Computers in Simulation, 37, 189–194.

Bishop, C. M. (1997). Neural networks for pattern recognition. Oxford: Clarendon Press.

Blatt, M., Wiseman, S., & Domany, E. (1996). Superparamagnetic clustering of data. Physical Review Letters, 76(18), 3251–3254.


Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600–611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193–199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511–513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761–766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers Chem. Eng., 17, 765–784.

Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269–280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation, Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29–60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel-Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling. Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16–29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks, IJCNN'91. Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549–557.


Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494–497.

Received August 24, 1998; accepted September 3, 1999.



Page 15: Clustering Irregular Shapes Using High-Order Neurons

Clustering Irregular Shapes 2345

Figure 5 (andashc) Stages in self-classication of overlapping clusters after learning200 400 and 650 data points respectively (one epoch)

a competitive network consisting of three second-order neurons gradu-ally evolves to model a two-dimensional domain exhibiting three mainactivation areas with signicant overlap The second-order neurons arevisualized as ellipses with their boundaries drawn at 2s of the distribu-tion

The next test uses a two-dimensional distribution of three arbitrary clus-ters consisting of a total of 75 data points shown in Figure 6a The networkof three second-order neurons self-organized to map this data set and createsthe decision boundaries shown in Figure 6b The dark curves are the deci-sion boundaries and the light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts forthe different distribution variances

Figure 6 (a) Original data set (b)Self-organized map of the data set Dark curvesare the decision boundaries and light lines show the local distance metric withineach cell Note how the decision boundary in the overlap area accounts for thedifferent distribution variances

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup In supervisedmode we use the same training setup of equations 28 and 210 but traineach neuron individually on the signals associated with it using

Z(j)H D Sx2(miexcl1)

H xH 2 ordf(j)

where in homogeneous coordinates xH Dhxndash1

iand ordf(j) is the jth class

In order to evaluate the performance of this technique we trained eachneuron on a random selection of 80 of the data points of its class andthen tested the classication of the whole data set (including the remain-ing 20 for cross validation) The test determined the most active neuron

1 Performance of third-order neurons was evaluated with a layer consisting of threeneurons randomly initialized with normal distribution within the input domain withmean and variance of the input and with learning factor of 03 decreasing asymp-totically toward zero Of 100 experiments carried out 95 locked correctly on theclusters with an average 76 (5) misclassications after 30 epochs Statistics on per-formance of cited works have not been reported Download MATLAB demos fromhttpwwwcsbrandeisedu lipsonpapersgeomhtm

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe S amp Thawonmas R (1997) A fuzzy classier with ellipsoidal regionsIEEE Trans on Fuzzy Systems 5 358ndash368

Anderson E (1939) The Irises of the Gaspe Peninsula Bulletin of the AmericanIris Society 59 2ndash5

Balestrino A amp Verona B F (1994) New adaptive polynomial neural networkMathematics and Computers in Simulation 37 189ndash194

Bishop C M (1997) Neural networks for pattern recognition Oxford Claren-don Press

Blatt M Wiseman S amp Domany E (1996) Superparamagnetic clustering ofdata Physical Review Letters 7618 3251ndash3254

2352 H Lipson and H T Siegelmann

Dave R N (1989) Use of the adaptive fuzzy clustering algorithm to detect linesin digital images In Proc SPIE Conf Intell Robots and Computer Vision 1192600ndash611

Duda R O amp Hart P E (1973) Pattern classication and scene analysis NewYork Wiley

Faux I D amp Pratt M J (1981) Computational geometry for design and manufac-ture New York Wiley

Frigui H amp Krishnapuram R (1996) A comparison of fuzzy shell-clusteringmethods for the detection of ellipses IEEE Transactions on Fuzzy Systems 4193ndash199

Giles C L Grifn R D amp Maxwell T (1988) Encoding geometric invariancesin higher-order neural networks In D Z Anderson (Ed) Neural informa-tion processing systems American Institute of Physics Conference Proceed-ings

Gnanadesikan R (1977) Methods for statistical data analysis of multivariate obser-vations New York Wiley

Goudreau M W Giles C L Chakradhar S T amp Chen D (1994) First-orderversus second-order single layer recurrent neural networks IEEE Transactionson Neural Networks 5 511ndash513

Graham A (1981) Kronecker products and matrix calculus With applications NewYork Wiley

Gustafson E E amp Kessel W C (1979) Fuzzy clustering with fuzzy covariancematrix In Proc IEEE CDC (pp 761ndash766) San Diego CA

Haykin S (1994) Neural networksA comprehensive foundation Englewood CliffsNH Prentice Hall

Kavuri S N amp Venkatasubramanian V (1993) Using fuzzy clustering withellipsoidal units in neural networks for robust fault classication ComputersChem Eng 17 765ndash784

Kohonen T (1997) Self organizing maps Berlin Springer-VerlagKolen J F amp Pollack J B (1990) Back propagation is sensitive to initial condi-

tions Complex Systems 4 269ndash280Krishnapuram R Frigui H amp Nasraoui O (1995) Fuzzy and possibilistic

shell clustering algorithms and their application to boundary detection andsurface approximationmdashParts I and II IEEE Transactions on Fuzzy Systems 329ndash60

Lipson H Hod Y amp Siegelmann H T (1998) High-order clustering metricsfor competitive learning neural networks In Proceedings of the Israel-KoreaBi- National Conference on New Themes in Computer Aided Geometric ModelingTel-Aviv Israel

Mao J amp Jain A (1996) A self-organizing network for hyperellipsoidal clus-tering (HEC) IEEE Transactions on Neural Networks 7 16ndash29

Myung I Levy J amp William B (1992) Are higher order statistics better formaximum entropy prediction in neural networks In Proceedings of the IntJoint Conference on Neural Networks IJCNNrsquo91 Seattle WA

Pal N Bezdek J C amp Tsao E C-K (1993) Generalized clustering networksand Kohonenrsquos self-organizing scheme IEEE Transactions on Neural Networks4 549ndash557

Clustering Irregular Shapes 2353

Pollack J B (1991) The induction of dynamical recognizers Machine Learning7 227ndash252

Schrock E duManoir S Veldman T Schoell B Wienberg J Feguson SmithM A Ning Y Ledbetter D H BarAm I Soenksen D Garini Y amp RiedT (1996) Multicolor spectral karyotyping of human chromosomes Science273 494ndash497

Received August 24 1998 accepted September 3 1999

Page 16: Clustering Irregular Shapes Using High-Order Neurons

2346 H Lipson and H T Siegelmann

Table 1 Comparison of Self-Organization Results for IRIS Data

Method Epochs Number misclassied orUnclassied

Super paramagnetic (Blatt et al 1996) 25Learning vector quantization Generalized learning vector quantization(Pal Bezdek amp Tsao 1993) 200 17K-means (Mao amp Jain 1996) 16Hyperellipsoidal clustering(Mao amp Jain 1996) 5

Second-order unsupervised 20 4

Third-order unsupervised 30 3

The performance of the different orders of shape neuron has also beenassessed on the Iris data (Anderson 1939) Andersonrsquos time-honored dataset has become a popular benchmark problem for clustering algorithmsThe data set consists of four quantities measured on each of 150 owerschosen from three species of iris Iris setosa Iris versicolor and Iris virginicaThe data set constitutes 150 points in four-dimensional space Some re-sults of other methods are compared in Table 1 It should be noted thatthe results presented in the table are ldquobest resultsrdquo third-order neuronsare not always stable because the layer sometimes converges into differentclassications1

The neurons have also been tried in a supervised setup In supervisedmode we use the same training setup of equations 28 and 210 but traineach neuron individually on the signals associated with it using

Z(j)H D Sx2(miexcl1)

H xH 2 ordf(j)

where in homogeneous coordinates xH Dhxndash1

iand ordf(j) is the jth class

In order to evaluate the performance of this technique we trained eachneuron on a random selection of 80 of the data points of its class andthen tested the classication of the whole data set (including the remain-ing 20 for cross validation) The test determined the most active neuron

1 Performance of third-order neurons was evaluated with a layer consisting of threeneurons randomly initialized with normal distribution within the input domain withmean and variance of the input and with learning factor of 03 decreasing asymp-totically toward zero Of 100 experiments carried out 95 locked correctly on theclusters with an average 76 (5) misclassications after 30 epochs Statistics on per-formance of cited works have not been reported Download MATLAB demos fromhttpwwwcsbrandeisedu lipsonpapersgeomhtm

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe S amp Thawonmas R (1997) A fuzzy classier with ellipsoidal regionsIEEE Trans on Fuzzy Systems 5 358ndash368

Anderson E (1939) The Irises of the Gaspe Peninsula Bulletin of the AmericanIris Society 59 2ndash5

Balestrino A amp Verona B F (1994) New adaptive polynomial neural networkMathematics and Computers in Simulation 37 189ndash194

Bishop C M (1997) Neural networks for pattern recognition Oxford Claren-don Press

Blatt M Wiseman S amp Domany E (1996) Superparamagnetic clustering ofdata Physical Review Letters 7618 3251ndash3254

2352 H Lipson and H T Siegelmann

Dave R N (1989) Use of the adaptive fuzzy clustering algorithm to detect linesin digital images In Proc SPIE Conf Intell Robots and Computer Vision 1192600ndash611

Duda R O amp Hart P E (1973) Pattern classication and scene analysis NewYork Wiley

Faux I D amp Pratt M J (1981) Computational geometry for design and manufac-ture New York Wiley

Frigui H amp Krishnapuram R (1996) A comparison of fuzzy shell-clusteringmethods for the detection of ellipses IEEE Transactions on Fuzzy Systems 4193ndash199

Giles C L Grifn R D amp Maxwell T (1988) Encoding geometric invariancesin higher-order neural networks In D Z Anderson (Ed) Neural informa-tion processing systems American Institute of Physics Conference Proceed-ings

Gnanadesikan R (1977) Methods for statistical data analysis of multivariate obser-vations New York Wiley

Goudreau M W Giles C L Chakradhar S T amp Chen D (1994) First-orderversus second-order single layer recurrent neural networks IEEE Transactionson Neural Networks 5 511ndash513

Graham A (1981) Kronecker products and matrix calculus With applications NewYork Wiley

Gustafson E E amp Kessel W C (1979) Fuzzy clustering with fuzzy covariancematrix In Proc IEEE CDC (pp 761ndash766) San Diego CA

Haykin S (1994) Neural networksA comprehensive foundation Englewood CliffsNH Prentice Hall

Kavuri S N amp Venkatasubramanian V (1993) Using fuzzy clustering withellipsoidal units in neural networks for robust fault classication ComputersChem Eng 17 765ndash784

Kohonen T (1997) Self organizing maps Berlin Springer-VerlagKolen J F amp Pollack J B (1990) Back propagation is sensitive to initial condi-

tions Complex Systems 4 269ndash280Krishnapuram R Frigui H amp Nasraoui O (1995) Fuzzy and possibilistic

shell clustering algorithms and their application to boundary detection andsurface approximationmdashParts I and II IEEE Transactions on Fuzzy Systems 329ndash60

Lipson H Hod Y amp Siegelmann H T (1998) High-order clustering metricsfor competitive learning neural networks In Proceedings of the Israel-KoreaBi- National Conference on New Themes in Computer Aided Geometric ModelingTel-Aviv Israel

Mao J amp Jain A (1996) A self-organizing network for hyperellipsoidal clus-tering (HEC) IEEE Transactions on Neural Networks 7 16ndash29

Myung I Levy J amp William B (1992) Are higher order statistics better formaximum entropy prediction in neural networks In Proceedings of the IntJoint Conference on Neural Networks IJCNNrsquo91 Seattle WA

Pal N Bezdek J C amp Tsao E C-K (1993) Generalized clustering networksand Kohonenrsquos self-organizing scheme IEEE Transactions on Neural Networks4 549ndash557

Clustering Irregular Shapes 2353

Pollack J B (1991) The induction of dynamical recognizers Machine Learning7 227ndash252

Schrock E duManoir S Veldman T Schoell B Wienberg J Feguson SmithM A Ning Y Ledbetter D H BarAm I Soenksen D Garini Y amp RiedT (1996) Multicolor spectral karyotyping of human chromosomes Science273 494ndash497

Received August 24 1998 accepted September 3 1999

Page 17: Clustering Irregular Shapes Using High-Order Neurons

Clustering Irregular Shapes 2347

Table 2 Supervised Classication Results for Iris Data

Number MisclassiedOrder Epochs Average Best

3-hidden-layer neural network (Abe et al 1997) 1000 22 1Fuzzy hyperbox (Abe et al 1997) 2Fuzzy ellipsoids (Abe et al 1997) 1000 1Second-order supervised 1 308 1Third-order supervised 1 227 1Fourth-order supervised 1 160 0Fifth-order supervised 1 107 0Sixth-order supervised 1 120 0Seventh-order supervised 1 130 0

Notes Results after one training epoch with 20 cross validation Averaged results arefor 250 experiments

for each input and compared it with the manual classication This testwas repeated 250 times with the average and best results shown in Table2 along with results reported for other methods It appears that on aver-age supervised learning for this task reaches an optimum with fth-orderneurons

Figure 7 shows classication results using third-order neurons for anarbitrary distribution containing close and overlapping clusters

Finally Figure 8 shows an application of third- through fth-order neu-rons to detection and modeling of chromosomes in a human cell The cor-responding separation maps and convergence error rates are shown in Fig-ures 9 and 10 respectively

Figure 7 Self-classication of synthetic point clusters using third-order neu-rons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons In the examples provided we have explicitlyused the number of neurons appropriate to the number of clustersThe question of network size is applicable to most neural architec-ture However our experiments showed that overspecication tends

Clustering Irregular Shapes 2349

Figure 9 Self-organization for chromosome separation (andashd) Second- throughfth-order respectively

to cause clusters to split into subsections whileunderspecication maycause clusters to unite

Initial weights It is well known that sensitivity to initial weights is ageneral problem of neural networks (see for example Kolen amp Pollack1990 and Pal et al 1993) However when third-order neurons wereevaluated on the Iris data with random initial weights they performedwith average 5 misclassications Thus the capacity of neurons tobe spread over volumes may provide new possibilities for addressingthe initialization problem

Computational cost The need to inverta high-order tensor introducesa signicant computational cost especially when the input dimension-

2350 H Lipson and H T Siegelmann

Figure 10 Average error rate in self-organization for chromosome separationof Figure 9 respectively

ality is large There are some factors that counterbalance this computa-tional cost First the covariance tensor needs to be inverted only whena neuron is updated not when it is activated or evaluated When theneuron is updated the inverted tensor is computed and then storedand used whenever the neuron needs to compete or evaluate an inputsignal This means that the whole layer will need only one inversionfor each training point and no inversions for nontraining operationSecond because of the complexity of each neuron sometimes fewer ofthem are required Finally relatively complex shapes can be attainedwith low orders (say third order)

Selection of neuron order As in many other networks there is muchdomain knowledge about the problem that comes into the solutionthrough choice of parameters such as the order of the neurons How-ever we also estimate that due to the rather analytic representation

Clustering Irregular Shapes 2351

of each neuron and since high-order information is contained witheach neuron independently this information can be directly extracted(using the eigentensors) Thus it is possible to decompose analyticallya cross-shapes fourth-order cluster into two overlapping elliptic clus-ters and vice versa

Stability versus exibility and enforcement of particular shapesFurther research is required to nd methods for biasing neurons to-ward particular shapes in an attempt to seek particular shapes and helpneuron stability For example how would one encourage detection oftriangles and not rectangles

Nonpolynomial base functions We intend to investigate the proper-ties of shape layers based on nonpolynomial base functions Notingthat the high-order homogeneous tensors give rise to various degreesof polynomials it is possible to derive such tensors with other basefunctions such as wavelets trigonometric functions and other well-established kernel functions Such an approach may enable modelingarbitrary geometrical shapes

Acknowledgments

Wethank Eitan Domani for hishelpful commentsand insight Thiswork wassupported in part by the US-Israel Binational Science Foundation (BSF)the Israeli Ministry of Arts and Sciences and the Fund for Promotion ofResearch at the Technion H L acknowledges the generous support of theCharles Clore Foundation and the Fischbach Postdoctoral Fellowship Wethank Applied Spectral Imaging for supplying the data for the experimentsdescribed in Figure 8 through 10

Download further information MATLAB demos and implementationdetails of this work from httpwwwcsbrandeisedulaquolipsonpapersgeomhtm

References

Abe S amp Thawonmas R (1997) A fuzzy classier with ellipsoidal regionsIEEE Trans on Fuzzy Systems 5 358ndash368

Anderson E (1939) The Irises of the Gaspe Peninsula Bulletin of the AmericanIris Society 59 2ndash5

Balestrino A amp Verona B F (1994) New adaptive polynomial neural networkMathematics and Computers in Simulation 37 189ndash194

Bishop C M (1997) Neural networks for pattern recognition Oxford Claren-don Press

Blatt M Wiseman S amp Domany E (1996) Superparamagnetic clustering ofdata Physical Review Letters 7618 3251ndash3254

2352 H Lipson and H T Siegelmann

Dave R N (1989) Use of the adaptive fuzzy clustering algorithm to detect linesin digital images In Proc SPIE Conf Intell Robots and Computer Vision 1192600ndash611

Duda R O amp Hart P E (1973) Pattern classication and scene analysis NewYork Wiley

Faux I D amp Pratt M J (1981) Computational geometry for design and manufac-ture New York Wiley

Frigui H amp Krishnapuram R (1996) A comparison of fuzzy shell-clusteringmethods for the detection of ellipses IEEE Transactions on Fuzzy Systems 4193ndash199

Giles C L Grifn R D amp Maxwell T (1988) Encoding geometric invariancesin higher-order neural networks In D Z Anderson (Ed) Neural informa-tion processing systems American Institute of Physics Conference Proceed-ings

Gnanadesikan R (1977) Methods for statistical data analysis of multivariate obser-vations New York Wiley

Goudreau M W Giles C L Chakradhar S T amp Chen D (1994) First-orderversus second-order single layer recurrent neural networks IEEE Transactionson Neural Networks 5 511ndash513

Graham A (1981) Kronecker products and matrix calculus With applications NewYork Wiley

Gustafson E E amp Kessel W C (1979) Fuzzy clustering with fuzzy covariancematrix In Proc IEEE CDC (pp 761ndash766) San Diego CA

Haykin S (1994) Neural networksA comprehensive foundation Englewood CliffsNH Prentice Hall

Kavuri S N amp Venkatasubramanian V (1993) Using fuzzy clustering withellipsoidal units in neural networks for robust fault classication ComputersChem Eng 17 765ndash784

Kohonen T (1997) Self organizing maps Berlin Springer-VerlagKolen J F amp Pollack J B (1990) Back propagation is sensitive to initial condi-

tions Complex Systems 4 269ndash280Krishnapuram R Frigui H amp Nasraoui O (1995) Fuzzy and possibilistic

shell clustering algorithms and their application to boundary detection andsurface approximationmdashParts I and II IEEE Transactions on Fuzzy Systems 329ndash60

Lipson H Hod Y amp Siegelmann H T (1998) High-order clustering metricsfor competitive learning neural networks In Proceedings of the Israel-KoreaBi- National Conference on New Themes in Computer Aided Geometric ModelingTel-Aviv Israel

Mao J amp Jain A (1996) A self-organizing network for hyperellipsoidal clus-tering (HEC) IEEE Transactions on Neural Networks 7 16ndash29

Myung I Levy J amp William B (1992) Are higher order statistics better formaximum entropy prediction in neural networks In Proceedings of the IntJoint Conference on Neural Networks IJCNNrsquo91 Seattle WA

Pal N Bezdek J C amp Tsao E C-K (1993) Generalized clustering networksand Kohonenrsquos self-organizing scheme IEEE Transactions on Neural Networks4 549ndash557

Clustering Irregular Shapes 2353

Pollack J B (1991) The induction of dynamical recognizers Machine Learning7 227ndash252

Schrock E duManoir S Veldman T Schoell B Wienberg J Feguson SmithM A Ning Y Ledbetter D H BarAm I Soenksen D Garini Y amp RiedT (1996) Multicolor spectral karyotyping of human chromosomes Science273 494ndash497

Received August 24 1998 accepted September 3 1999

Page 18: Clustering Irregular Shapes Using High-Order Neurons

2348 H Lipson and H T Siegelmann

Figure 8 Chromosome separation (a) Designated area (b) Four third-orderneuron self-organized to mark the chromosomes in designated area Data fromSchrock et al (1996)

6 Conclusions and Further Research

We have introduced high-order shape neurons and demonstrated theirpractical use for modeling the structure of spatial distributions and forgeometric feature recognition Although high-order neurons do not di-rectly correspond to neurobiological details we believe that they can pro-vide powerful data modeling capabilities In particular they exhibit use-ful properties for correctly handling close and partially overlapping clus-ters Furthermore we have shown that ellipsoidal and tensorial clusteringmethods as well as classic competitive neurons are special cases of thegeneral-shape neuron We showed how lower-order information can beencapsulated using tensor representation in homogeneous coordinates en-abling systematic continuation to higher-order metrics This formulationalso allows for direct extraction of the encapsulated high-order informationin the form of principal curves represented by the eigentensors of eachneuron

The use of shapes as neurons raises some further practical questionswhich have not been addressed in this article and require further researchand are listed below Although most of these issues pertain to neural net-works in general the approach required here may be different

Number of neurons. In the examples provided, we have explicitly used the number of neurons appropriate to the number of clusters. The question of network size is applicable to most neural architectures. However, our experiments showed that overspecification tends to cause clusters to split into subsections, while underspecification may cause clusters to unite.

Figure 9: Self-organization for chromosome separation. (a–d) Second- through fifth-order, respectively.

Initial weights. It is well known that sensitivity to initial weights is a general problem of neural networks (see, for example, Kolen & Pollack, 1990, and Pal et al., 1993). However, when third-order neurons were evaluated on the Iris data with random initial weights, they performed with an average of 5 misclassifications. Thus the capacity of neurons to be spread over volumes may provide new possibilities for addressing the initialization problem.

Computational cost. The need to invert a high-order tensor introduces a significant computational cost, especially when the input dimensionality is large. There are some factors that counterbalance this cost. First, the covariance tensor needs to be inverted only when a neuron is updated, not when it is activated or evaluated. When the neuron is updated, the inverted tensor is computed and then stored, and it is used whenever the neuron needs to compete or evaluate an input signal. This means that the whole layer will need only one inversion for each training point, and no inversions for nontraining operation (a minimal sketch of this caching pattern appears after this list). Second, because of the complexity of each neuron, sometimes fewer of them are required. Finally, relatively complex shapes can be attained with low orders (say, third order).

Figure 10: Average error rate in self-organization for the chromosome separation of Figure 9.

Selection of neuron order. As in many other networks, there is much domain knowledge about the problem that comes into the solution through the choice of parameters such as the order of the neurons. However, we also estimate that, due to the rather analytic representation of each neuron, and since high-order information is contained within each neuron independently, this information can be directly extracted (using the eigentensors). Thus it is possible to decompose analytically a cross-shaped fourth-order cluster into two overlapping elliptic clusters, and vice versa.

Stability versus flexibility and enforcement of particular shapes. Further research is required to find methods for biasing neurons toward particular shapes, both to seek specific shapes and to help neuron stability. For example, how would one encourage detection of triangles and not rectangles?

Nonpolynomial base functions. We intend to investigate the properties of shape layers based on nonpolynomial base functions. Noting that the high-order homogeneous tensors give rise to various degrees of polynomials, it is possible to derive such tensors with other base functions, such as wavelets, trigonometric functions, and other well-established kernel functions. Such an approach may enable modeling arbitrary geometrical shapes; an illustrative sketch follows this list.
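
As a concrete illustration of this last item, the following minimal sketch (an assumed construction with hypothetical names, not the formulation used in this article) swaps the monomial entries of the homogeneous coordinate vector for a small trigonometric dictionary; the neuron's tensor is then built from outer products of the resulting feature vector exactly as in the polynomial case.

import numpy as np

def feature_vector(x, basis="poly"):
    # "poly" reproduces the homogeneous coordinate vector [x1, ..., xd, 1];
    # "trig" replaces the monomial terms with cosine/sine base functions.
    x = np.asarray(x, dtype=float)
    if basis == "poly":
        return np.append(x, 1.0)
    if basis == "trig":
        return np.concatenate([np.cos(np.pi * x), np.sin(np.pi * x), [1.0]])
    raise ValueError("unknown basis: " + basis)

def second_order_tensor(points, basis="poly"):
    # Average outer product of the feature vectors: with "poly" this is the
    # usual second-order (ellipsoidal) tensor in homogeneous coordinates;
    # with another basis it defines a nonpolynomial shape metric.
    phis = np.array([feature_vector(p, basis) for p in points])
    return phis.T @ phis / len(phis)

With basis="trig", the same competitive machinery would cluster data along periodic level sets rather than polynomial ones.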

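Returning to the computational-cost item above, here is a minimal sketch of the caching pattern it describes (the class name, update rule, and regularization are illustrative assumptions rather than the exact algorithm of this article): the tensor is inverted once per update, and the cached inverse is reused for every subsequent competition or evaluation.

import numpy as np

class ShapeNeuron:
    # Illustrative neuron holding a (matricized) covariance tensor Z.
    # Inversion happens only in update(); activation reuses the cache.

    def __init__(self, dim, reg=1e-3):
        self.Z = np.eye(dim) * reg           # small identity keeps Z invertible
        self.Z_inv = np.linalg.inv(self.Z)   # cached inverse

    def update(self, phi, lr=0.05):
        # Hebbian-style move of the tensor toward the outer product of the
        # winning input's feature vector; the single inversion happens here.
        self.Z = (1.0 - lr) * self.Z + lr * np.outer(phi, phi)
        self.Z_inv = np.linalg.inv(self.Z)

    def distance(self, phi):
        # Competition/evaluation: no inversion, only the cached Z_inv is used.
        return float(phi @ self.Z_inv @ phi)

A layer of such neurons therefore performs at most one inversion per training point (for the updated winner) and none when it is only evaluating inputs.
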
Acknowledgments

We thank Eitan Domany for his helpful comments and insight. This work was supported in part by the U.S.-Israel Binational Science Foundation (BSF), the Israeli Ministry of Arts and Sciences, and the Fund for Promotion of Research at the Technion. H. L. acknowledges the generous support of the Charles Clore Foundation and the Fischbach Postdoctoral Fellowship. We thank Applied Spectral Imaging for supplying the data for the experiments described in Figures 8 through 10.

Further information, MATLAB demos, and implementation details of this work can be downloaded from http://www.cs.brandeis.edu/~lipson/papers/geom.htm.

References

Abe, S., & Thawonmas, R. (1997). A fuzzy classifier with ellipsoidal regions. IEEE Transactions on Fuzzy Systems, 5, 358–368.

Anderson, E. (1939). The Irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2–5.

Balestrino, A., & Verona, B. F. (1994). New adaptive polynomial neural network. Mathematics and Computers in Simulation, 37, 189–194.

Bishop, C. M. (1997). Neural networks for pattern recognition. Oxford: Clarendon Press.

Blatt, M., Wiseman, S., & Domany, E. (1996). Superparamagnetic clustering of data. Physical Review Letters, 76(18), 3251–3254.

Dave, R. N. (1989). Use of the adaptive fuzzy clustering algorithm to detect lines in digital images. In Proc. SPIE Conf. Intell. Robots and Computer Vision, 1192, 600–611.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Faux, I. D., & Pratt, M. J. (1981). Computational geometry for design and manufacture. New York: Wiley.

Frigui, H., & Krishnapuram, R. (1996). A comparison of fuzzy shell-clustering methods for the detection of ellipses. IEEE Transactions on Fuzzy Systems, 4, 193–199.

Giles, C. L., Griffin, R. D., & Maxwell, T. (1988). Encoding geometric invariances in higher-order neural networks. In D. Z. Anderson (Ed.), Neural information processing systems. American Institute of Physics Conference Proceedings.

Gnanadesikan, R. (1977). Methods for statistical data analysis of multivariate observations. New York: Wiley.

Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single layer recurrent neural networks. IEEE Transactions on Neural Networks, 5, 511–513.

Graham, A. (1981). Kronecker products and matrix calculus: With applications. New York: Wiley.

Gustafson, E. E., & Kessel, W. C. (1979). Fuzzy clustering with fuzzy covariance matrix. In Proc. IEEE CDC (pp. 761–766). San Diego, CA.

Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Prentice Hall.

Kavuri, S. N., & Venkatasubramanian, V. (1993). Using fuzzy clustering with ellipsoidal units in neural networks for robust fault classification. Computers and Chemical Engineering, 17, 765–784.

Kohonen, T. (1997). Self-organizing maps. Berlin: Springer-Verlag.

Kolen, J. F., & Pollack, J. B. (1990). Back propagation is sensitive to initial conditions. Complex Systems, 4, 269–280.

Krishnapuram, R., Frigui, H., & Nasraoui, O. (1995). Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation: Parts I and II. IEEE Transactions on Fuzzy Systems, 3, 29–60.

Lipson, H., Hod, Y., & Siegelmann, H. T. (1998). High-order clustering metrics for competitive learning neural networks. In Proceedings of the Israel-Korea Bi-National Conference on New Themes in Computer Aided Geometric Modeling. Tel-Aviv, Israel.

Mao, J., & Jain, A. (1996). A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 7, 16–29.

Myung, I., Levy, J., & William, B. (1992). Are higher order statistics better for maximum entropy prediction in neural networks? In Proceedings of the Int. Joint Conference on Neural Networks, IJCNN'91. Seattle, WA.

Pal, N., Bezdek, J. C., & Tsao, E. C.-K. (1993). Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4, 549–557.

Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.

Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M. A., Ning, Y., Ledbetter, D. H., Bar-Am, I., Soenksen, D., Garini, Y., & Ried, T. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273, 494–497.

Received August 24, 1998; accepted September 3, 1999.
