characterization of a class of sigmoid functions with applications to neural networks
Pergamon. Neural Networks, Vol. 9, No. 5, pp. 819-835, 1996
Copyright © 1996 Elsevier Science Ltd. All rights reserved. Printed in Great Britain
0893-6080/96 $15.00 + .00 0893-6080(95)00107-7
CONTRIBUTED ARTICLE
Characterization of a Class of Sigmoid Functions with Applications to Neural Networks
ANIL MENON, KISHAN MEHROTRA, CHILUKURI K. MOHAN AND SANJAY RANKA
Syracuse University
(Received 21 November 1994; revised and accepted 15 August 1995)
Abstract-We study two classes of sigmoids: the simple sigmoids, defined to be odd, asymptotically bounded, completely monotone functions in one variable, and the hyperbolic sigmoids, a proper subset of simple sigmoids and a natural generalization of the hyperbolic tangent. We obtain a complete characterization for the inverses of hyperbolic sigmoids using Euler's incomplete beta functions, and describe composition rules that illustrate how such functions may be synthesized from others. These results are applied to two problems. First, we show that with respect to simple sigmoids the continuous Cohen-Grossberg-Hopfield model can be reduced to the (associated) Legendre differential equations. Second, we show that the effect of using simple sigmoids as node transfer functions in a one-hidden layer feedforward network with one summing output may be interpreted as representing the output function as a Fourier series sine transform evaluated at the hidden layer node inputs, thus extending and complementing earlier results in this area. Copyright © 1996 Elsevier Science Ltd
Keywords-Sigmoid functions, Hypergeometric series, Legendre equation, Cohen-Grossberg-Hopfield model, Additive model, Fourier transform.
1. INTRODUCTION
Sigmoid functions, whose graphs are “S-shaped” curves, appear in a great variety of contexts, such as the transfer functions used in many neural networks.1 Their ubiquity is no accident; these curves are among the simplest non-linear curves, striking a graceful balance between linear and non-linear behavior. Grossberg (1973) gives an illuminating and pertinent discussion of this point.
Figure 1 shows three sigmoidal functions, viz. the hyperbolic tangent $y = \tanh(x)$ (graph A), the “logistic” sigmoid $y = 1/(1 + \exp(-x))$ (graph B), and the “algebraic” sigmoid $y = x/\sqrt{1 + x^2}$ (graph C). Figure 2 shows their inverses, $x = \tanh^{-1}(y)$, $x = \ln(y/(1 - y))$, and $x = y/\sqrt{1 - y^2}$, labeled A′, B′ and C′ respectively. In a few cases, sigmoid curves can be described by formulae; this rubric includes power series expansions (e.g., hyperbolic tangent), integral expressions (e.g., error function), compositions of simpler functions (e.g., the Gudermannian function), inverses of functions definable by formulae [e.g., the “complexified” Langevin function, a sigmoid defined as the inverse of the function $1/x - \cot(x)$], and differential equations.

FIGURE 1. Some sigmoids. [Graphs A, B and C plotted for x from -3 to 3; vertical axis from -1 to 1.]

FIGURE 2. Some inverse sigmoids. [Graphs A′, B′ and C′ plotted for y from -1 to 1; vertical axis from -2 to 2.]

Acknowledgements: We would like to acknowledge the helpful comments of the reviewers of this paper. We would also like to thank K. George Joseph for several useful discussions.

Requests for reprints should be sent to K. Mehrotra, School of Computer and Information Science, 2-120 Center for Science and Technology, Syracuse University, Syracuse, NY 13244-4100, USA; e-mail: kishan@top.ek.syr.edu

1 Other examples of the use of sigmoid functions are the logistic function in population models, the hyperbolic tangent in spin models, the Langevin function in magnetic dipole models, the Gudermannian function in special functions theory, the (cumulative) distribution functions in mathematical statistics, the piecewise approximators in nonlinear approximation theory, and the hysteresis curves in certain nonlinear systems.
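As a small numerical companion to Figures 1 and 2, the following Python sketch evaluates the three sigmoids and their inverses and checks that each pair really is a function and its inverse. The function names are illustrative only, and the algebraic sigmoid is assumed to be $x/\sqrt{1+x^2}$ with inverse $y/\sqrt{1-y^2}$.

```python
import math

# The three sigmoids of Figure 1 (names are illustrative, not from the paper).
def tanh_sigmoid(x):
    return math.tanh(x)

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def algebraic(x):
    # Assumed form of the "algebraic" sigmoid: x / sqrt(1 + x^2).
    return x / math.sqrt(1.0 + x * x)

# Their inverses, as in Figure 2.
def tanh_inverse(y):
    return math.atanh(y)

def logistic_inverse(y):
    return math.log(y / (1.0 - y))

def algebraic_inverse(y):
    return y / math.sqrt(1.0 - y * y)

PAIRS = [
    (tanh_sigmoid, tanh_inverse),
    (logistic, logistic_inverse),
    (algebraic, algebraic_inverse),
]

def max_round_trip_error(xs=(-3.0, -1.0, -0.25, 0.5, 2.0)):
    """Largest |inverse(sigmoid(x)) - x| over all pairs and sample points."""
    return max(abs(inv(f(x)) - x) for f, inv in PAIRS for x in xs)

if __name__ == "__main__":
    print(max_round_trip_error())
```

The round-trip error should be at the level of floating-point rounding for moderate $x$; it grows as the sigmoid saturates, which is the numerical face of the vertical asymptotes of the inverses.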
Although the level of abstraction in many problems is such that one does not need to work with explicit formulae,2 it is useful to study networks with specific transfer functions, as demonstrated by the following considerations.
1. In determining whether a single layered feedforward net is uniquely determined by its corresponding input-output map, Sussmann (1992) in his elegant proof of uniqueness specifically used the properties of the tanh(.) function. A later analysis by Albertini et al. (1993) obtained the same results with fewer assumptions on the node transfer function, but still requires such functions to be odd, and to satisfy certain “independence” properties. With respect to the uniqueness problem, not all node transfer functions are equivalent.3

2. Without tractable analytical forms to work with, many problems relating to sigmoids are resistant to theory. Neural net theory offers many examples. For example, there have been claims in the literature about the advantage (with respect to computability, training times, etc.) of certain sigmoidal transfer functions over others in back-propagation networks (Elliot, 1993; Barrington, 1993). Some theoretical support comes from considering the first derivatives (if defined) of the various transfer functions proposed; the first derivatives are partially responsible for controlling the step size in the weight adjustment phase of the back-propagation algorithms, which in turn influences the rate of convergence. Explicit expressions for sigmoids are useful in such considerations.

3. The dynamical system describing the continuous Cohen-Grossberg-Hopfield (CGH) model (Grossberg, 1988), also known as the additive STM model and the Cohen-Grossberg model, raises an intriguing query. If one assumes a tanh(.) node transfer function, one can show that the CGH model is transformable to the Legendre differential equation (see Section 6.1). An important question is whether this relationship is robust with respect to the choice of the transfer function.

4. The recent study of sigmoidal derivatives by Minai and Williams (1993) is another case in point; they derived a connection with Eulerian numbers [see Graham et al. (1989) for definitions], but restricted their inquiry to the very specific logistic sigmoid. Any generalization of their results requires a careful look at sigmoids representable by formulae.

5. There are other related issues. For instance, the hyperbolic tangent and logistic sigmoid are essentially equivalent, in that one can be obtained from the other by simple translation and scaling transformations. Specifically,

$$\frac{1}{1 + \exp(-x)} - \frac{1}{2} = \frac{1}{2}\tanh\left(\frac{x}{2}\right). \tag{1.1}$$

Many sigmoids have power series expansions which alternate in sign. Many have inverses with hypergeometric series expansions. On the other hand, many sigmoids have no such simple forms, or obvious connections with well-known sigmoids. It is natural to ask whether these varied analytical expressions for sigmoids have anything in common. It is difficult to answer such questions without a thorough understanding of the analytical expressions for sigmoid functions.

2 For example, in neural net approximation theory, significant results can be obtained about the existence of realizations within preassigned tolerances, with very few constraints on the nature of the node transfer function; classic results along these lines are found in Cybenko (1989), Diaconis and Shahshahani (1984), Funahashi (1989), and Hornik et al. (1989).

3 Another example of the non-equivalence of “sigmoids” is offered by the work of Macintyre and Sontag (1993) on the Vapnik-Chervonenkis (VC) dimension of feedforward networks, which showed that it is finite only for a class of sigmoidal functions they call the exp-RA functions. They showed that analyticity of the transfer function is crucial, and cannot be relaxed, for instance by making the function $C^\infty$.
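The translation-and-scaling relation between the logistic sigmoid and the hyperbolic tangent, eqn (1.1), can be checked numerically. The following is a minimal Python sketch; the function names are illustrative.

```python
import math

def logistic(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_from_tanh(x):
    """Right-hand side of eqn (1.1), rearranged: 1/2 + (1/2) tanh(x/2)."""
    return 0.5 + 0.5 * math.tanh(x / 2.0)

if __name__ == "__main__":
    for x in (-4.0, -0.3, 0.0, 1.7, 5.0):
        print(x, logistic(x), logistic_from_tanh(x))
```

The two columns agree to rounding error, so results stated for tanh(.) carry over to the logistic sigmoid after shifting the output by 1/2 and halving both the input and the output scale.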
In view of these considerations, this paper undertakes a study of two classes of sigmoids: the simple sigmoids, defined to be odd, asymptotically bounded, completely monotone functions in one variable, and the hyperbolic sigmoids, a proper subset of simple sigmoids and a natural generalization of the hyperbolic tangent. The class of hyperbolic sigmoids includes a surprising number of well-known sigmoids.
The main contributions of the paper are as follows:

● Simple sigmoids, hyperbolic sigmoids and their inverses are completely characterized in Sections 4 and 5.
● Using series inversion techniques, in Section 5, we obtain the series expansions of hyperbolic sigmoids from those of their inverses. These results extend those of Minai and Williams (1993) for the logistic function.
● In Section 4, we study the composition of simple sigmoids via differentiation, addition, multiplication, and functional composition. These results also completely specify the relationship between Euler's incomplete beta function and the parametrized sigmoids.
● In Section 6.1 we show that the continuous Cohen-Grossberg-Hopfield equations belong to the class of non-homogeneous Legendre differential equations if the neural transfer function is a simple sigmoid.
● In Section 6.2 we establish a connection between Fourier transforms and feedforward nets with one summing output and one hidden layer whose nodes contain simple sigmoidal transfer functions.
We do not purport to have discovered a general framework to describe all sigmoids; indeed, such a quest is largely meaningless; nor are we arguing for limiting the notion of sigmoids to the classes considered in this paper. Simple sigmoids are rather special sigmoids, but their regular structure often makes the related theory tractable, paving the way for more general analysis.
2. PRELIMINARIES
Notation: $\mathbb{R}$ and $\mathbb{R}^+$ denote real space, and the set of positive real numbers, respectively; $(a, b)$ and $[a, b]$ denote the open and closed intervals from $a$ to $b$. If $A$ is a set, then $|A|$ is the cardinality of $A$. Given a function $f$, its domain and range are denoted by Dom($f$) and Ran($f$), respectively; $f^{(k)}$ refers to the $k$th derivative of $f$ (if it exists). Occasionally, we shall use $f'(x)$ in place of $f^{(1)}(x)$. If a function is $k$ times continuously differentiable on a given interval $I$, then we write $f \in C^k(I)$. $C^\infty$ functions are called smooth functions. The term “propositions” refers to results cited from external sources.
The concepts of real analytic functions (Krantz & Parks, 1992), absolutely monotonic and completely monotonic functions (Widder, 1946) and hypergeometric functions (Erdélyi et al., 1953), are central to what follows; for convenience they are reviewed below.
DEFINITION 2.1 (real analyticity). Let $U \subseteq \mathbb{R}$ be an open set. A function $f: U \to \mathbb{R}$ is said to be real analytic4 at $x_0 \in U$, if the function may be represented by a convergent power series on some interval of positive radius centered at $x_0$, i.e., $f(x) = \sum_{j=0}^{\infty} a_j (x - x_0)^j$. The function is said to be real analytic on $V \subseteq U$, if it is real analytic at each $x_0 \in V$. ■
DEFINITION 2.2 (monotonicity). A function $f: \mathbb{R} \to \mathbb{R}$ is absolutely monotonic in $(a, b)$ iff it has non-negative derivatives of all orders, i.e., $f \in C^\infty((a, b))$ and,

$$f^{(k)}(x) \ge 0, \quad a < x < b, \quad k = 0, 1, 2, \ldots \tag{2.1}$$

A function $f: \mathbb{R} \to \mathbb{R}$ is completely monotonic in $(a, b)$, iff $f(-x)$ is absolutely monotonic in $(-b, -a)$. Equivalently, $f$ is completely monotonic in $(a, b)$ iff $f \in C^\infty((a, b))$ and,

$$(-1)^k f^{(k)}(x) \ge 0, \quad a < x < b, \quad k = 0, 1, 2, \ldots \tag{2.2}$$

A function $f: \mathbb{R} \to \mathbb{R}$ is completely convex in $(a, b)$, iff $f \in C^\infty((a, b))$, and for all non-negative $k$ and $x \in (a, b)$, $(-1)^k f^{(2k)}(x) \ge 0$. ■
A fundamental property of absolutely monotone and completely monotone functions is that they are necessarily real analytic on their domains.5 Additionally, if $f$ is absolutely monotone on an interval $I \subseteq \mathbb{R}$, then it is non-negative, nondecreasing, convex, and continuous on $I$.
DEFINITION 2.3. The generalized Gauss hypergeometric (GH) series ${}_pF_q(\alpha_1, \ldots, \alpha_p; \gamma_1, \ldots, \gamma_q; z)$ is defined by,

$${}_pF_q(\alpha_1, \ldots, \alpha_p; \gamma_1, \ldots, \gamma_q; z) = \sum_{k=0}^{\infty} \frac{(\alpha_1)_k (\alpha_2)_k \cdots (\alpha_p)_k}{(\gamma_1)_k (\gamma_2)_k \cdots (\gamma_q)_k} \frac{z^k}{k!}, \qquad \forall i: \gamma_i \ne 0, -1, -2, \ldots \tag{2.3}$$

where $(a)_n = a(a + 1) \cdots (a + n - 1)$ is the rising factorial or Pochhammer's polynomial in $a$. By definition, $(a)_0 = 1$. The $\alpha_i$'s are the numerator parameters, and the $\gamma_i$'s are referred to as the denominator parameters of the GH series. ■

4 Real analytic functions are also referred to as regular, holomorphic, or monogenic functions.

5 This is the celebrated “Bernstein's theorem” (Polya, 1974). In full, Bernstein's theorem asserts that given a function $f(x)$, if infinitely many of its derivatives $f^{(n_1)}, f^{(n_2)}, \ldots$ are of constant sign in the open interval $I$, and if the sequence $n_1, n_2, \ldots$ does not increase more rapidly than a geometric progression (i.e., there is a fixed quantity $C$, such that $\forall k$, $n_{k+1}/n_k < C$), then $f(x)$ is analytic on the interval $I$.
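In particular, the classical GH series6 in $z$, ${}_2F_1(\alpha, \beta; \gamma; z)$, is defined by,

$${}_2F_1(\alpha, \beta; \gamma; z) = F(\alpha, \beta; \gamma; z) = \sum_{k=0}^{\infty} \frac{(\alpha)_k (\beta)_k}{(\gamma)_k} \frac{z^k}{k!}. \tag{2.4}$$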
REMARK 2.1. The ${}_pF_q$ representation of a hypergeometric series, though standard, is not a unique one. For example, the series

$$\sum_{k=0}^{\infty} \frac{z^k}{k!}$$

could be viewed as ${}_0F_0(; ; z)$, or as ${}_1F_1(1; 1; z)$, or as ${}_3F_3(1, 1, 1; 1, 1, 1; z)$, etc. We shall henceforth use the “minimal” representation, in this case ${}_0F_0(; ; z)$.
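The definition of the GH series and the non-uniqueness noted in Remark 2.1 can be made concrete with a direct partial-sum evaluation. The sketch below is only a naive illustration (the helper names `pochhammer` and `hyp_pfq` and the fixed term count are assumptions, and no attempt is made at numerical robustness near the boundary of convergence).

```python
import math

def pochhammer(a, n):
    """Rising factorial (a)_n = a (a + 1) ... (a + n - 1), with (a)_0 = 1."""
    result = 1.0
    for i in range(n):
        result *= a + i
    return result

def hyp_pfq(numer, denom, z, terms=40):
    """Partial sum of the generalized GH series pFq(numer; denom; z)."""
    total = 0.0
    for k in range(terms):
        num = 1.0
        for a in numer:
            num *= pochhammer(a, k)
        den = 1.0
        for c in denom:
            den *= pochhammer(c, k)
        total += (num / den) * z ** k / math.factorial(k)
    return total

if __name__ == "__main__":
    z = 0.7
    # Three representations of the same series, all equal to exp(z).
    print(hyp_pfq([], [], z), hyp_pfq([1], [1], z), hyp_pfq([1, 1, 1], [1, 1, 1], z))
    print(math.exp(z))
```

All three parameter lists give the same partial sums because matching numerator and denominator parameters cancel term by term, which is why the “minimal” representation loses nothing.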
REMARK 2.2. In general, the parameters $\alpha_i$ and $\gamma_i$, as well as the variable $z$, are allowed to be complex; however, we follow common practice and restrict our attention to real values, i.e., $\forall i: \alpha_i, \gamma_i, z \in \mathbb{R}$. Even with this restriction, the hypergeometric function is amazingly versatile. Spanier and Oldham (1987) list over 170 functions that are representable in terms of the hypergeometric function. The hypergeometric function is a periodic table à la Mendeleev for mathematical functions; different functions get neatly pegged into various groups7 by the values of the parameters and the form of the dependent variable.
3. SIMPLE AND HYPERBOLIC SIGMOIDS
DEFINITION 3.1 (simple sigmoids). A function $\sigma: \mathbb{R} \to (-1, 1)$ is said to be a simple sigmoid if it satisfies the following conditions:

1. $\sigma(\cdot)$ is a smooth function, i.e., $\sigma(\cdot)$ is $C^\infty$.
2. $\sigma(\cdot)$ is an odd function, i.e., $\sigma(-x) = -\sigma(x)$.
3. $\sigma(\cdot)$ has $y = \pm 1$ as horizontal asymptotes, i.e., $\lim_{x \to \infty} \sigma(x) = 1$.
4. $\sigma(x)/x$ is a completely convex function in $(0, 1)$. ■
Simple sigmoids are required to be odd smooth functions bounded by horizontal asymptotes; these constraints impose a degree of standardization on the kinds of sigmoids being considered. The following results clarify the implications of the fourth constraint.

6 The classical GH series is referred to as the Gauss function in the literature (Spanier and Oldham, 1987).

7 “There must be many universities today where 95 per cent, if not 100 per cent, of the functions studied by physics, engineering, and even mathematics students, are covered by this single symbol F(a, b; c; z).” (W. W. Sawyer, cited by Graham et al., 1989.)
PROPOSITION 3.1 (Feller, 1965). A function $f: (0, 1) \to \mathbb{R}$ is absolutely monotone on $(0, 1)$ iff it possesses a power series expansion with non-negative coefficients, converging for $0 < x < 1$. ■
LEMMA 3.1. A function $f: (0, 1) \to \mathbb{R}$ is completely monotone on $(0, 1)$ iff it possesses an alternating power series expansion, converging for $0 < x < 1$.
Proof.8 If $f$ is completely monotone in $(0, 1)$, then the power series expansion of $f$ in $(0, 1)$ has to be alternating because $(-1)^k f^{(k)} \ge 0$. On the other hand, consider an alternating power series $f(x)$ converging for all $0 < x < 1$ and its derivatives:

$$\begin{aligned} f(x) &= a_0 - a_1 x + a_2 x^2 - a_3 x^3 + \cdots \\ (-1) f^{(1)}(x) &= a_1 - 2 a_2 x + 3 a_3 x^2 - \cdots \\ f^{(2)}(x) &= 2 a_2 - 6 a_3 x + \cdots \end{aligned} \tag{3.1}$$

where $a_i \ge 0\ \forall i$. From real analysis we know that each of $(-1)^n f^{(n)}(x)$ has the same convergence properties as eqn (3.1). Also, the sum of a convergent infinite alternating series is always less than or equal to the first term. This fact, along with the above equations, implies that $(-1)^k f^{(k)}(x) \ge 0$, i.e., $f(x)$ is completely monotone on $(0, 1)$. ■
COROLLARY 3.1. $\sigma(x)/x$ is a completely convex function in $(0, 1)$ iff $\sigma(\sqrt{x})/\sqrt{x}$ is a completely monotone function in $(0, 1)$.
Proof. If $\sigma(x)/x$ is completely convex in $(0, 1)$, then it has to be analytic in $(0, 1)$ (Widder, 1946). Also, $\sigma(x)/x$ is an even function, implying that its power series expansion will consist only of even powers in $x$, which alternate in sign. From Lemma 3.1, $\sigma(\sqrt{x})/\sqrt{x}$ will hence be completely monotone in $(0, 1)$. The same argument suffices for the converse. ■
If a simple sigmoid is also strictly increasing, then a much stronger statement can be made, as demonstrated by the following proposition.
PROPOSITION 3.2 (Krantz & Parks, 1992). Let $y = \sigma(x)$ be a strictly increasing simple sigmoid (i.e., $\forall x \in \mathbb{R}$, $\sigma'(x) > 0$). Then:

1. $\eta \equiv \sigma^{-1}: (-1, 1) \to \mathbb{R}$ exists.
2. $\eta(y)$ is a strictly increasing function, analytic in the interval $(-1, 1)$.
3. $\eta'(y) = 1/\sigma'(\eta(y))$, where $\eta'$ and $\sigma'$ are the first derivatives of $\eta$ and $\sigma$ respectively.
4. $\eta(y)/y$ is absolutely monotone in $(0, 1)$.

8 Lemma 3.1 appears to be “folklore”; we have been unable to find a reference.
REMARK 3.1. If $\sigma(x)/x$ is completely monotone on $(0, 1)$ and $\sigma$ is invertible then $\eta(y)/y$ is absolutely monotone on $(0, 1)$, where $\eta$ denotes the inverse of $\sigma$. The converse is also true, and is an immediate consequence of Lemma 3.1.
REMARK 3.2. A simple sigmoid has two horizontal asymptotes, hence its inverse (if it exists) will have two vertical asymptotes (i.e., $\lim_{y \to \pm 1} \eta(y) = \pm\infty$). It will be seen that as they have been defined, sigmoids and their inverses are quite similar; both are odd, increasing, univalent, analytical functions. However, the two differ fundamentally in that sigmoids are asymptotically bounded, while their inverses are not.
Simple sigmoids encompass many of the often used sigmoids described by formulae. The hyperbolic tangent and its close relative, the “exponential” or logistic sigmoid, are often used in many neural network theoretical studies and applications. For example, most of the “spin-glass explanations” of the CGH net use the hyperbolic tangent.9 The hyperbolic tangent has, among others, the following properties:
1. It is an odd, strictly increasing analytical function, asymptotically bounded by the lines $y = \pm 1$.
2. Its inverse $\tanh^{-1}(y)$ has a GH expansion given by $yF(1, 1/2; 3/2; y^2)$.
3. The first derivative of $\tanh^{-1}(y)$ is given by $1/(1 - y^2) = {}_1F_0(1; ; y^2)$, i.e., the GH expansion of the first derivative of $\tanh^{-1}(y)$ is dependent on only one numerator parameter.
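Properties 2 and 3 above are easy to confirm numerically. The following Python sketch sums the Gauss series with a simple term-ratio recurrence; the helper `hyp2f1` is a hypothetical name for this illustration and is only meant for $|z| < 1$ well inside the unit disc.

```python
import math

def hyp2f1(a, b, c, z, terms=200):
    """Partial sum of the classical GH series F(a, b; c; z) for |z| < 1."""
    term = 1.0
    total = 1.0
    for k in range(terms):
        # Ratio of consecutive terms: (a + k)(b + k) z / ((c + k)(k + 1)).
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def atanh_via_gh(y):
    """tanh^{-1}(y) written as y F(1, 1/2; 3/2; y^2)."""
    return y * hyp2f1(1.0, 0.5, 1.5, y * y)

def atanh_derivative_via_gh(y):
    """1 / (1 - y^2) written as 1F0(1; ; y^2), i.e. F(1, b; b; y^2) with b = 1."""
    return hyp2f1(1.0, 1.0, 1.0, y * y)

if __name__ == "__main__":
    y = 0.6
    print(atanh_via_gh(y), math.atanh(y))
    print(atanh_derivative_via_gh(y), 1.0 / (1.0 - y * y))
```

Writing ${}_1F_0(1; ; z)$ as $F(1, b; b; z)$ uses the same cancellation of equal numerator and denominator parameters that the paper later denotes by $F(\alpha, -; -; z)$.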
It can be shown that many other simple sigmoids, such as that of Elliot (1993), and the Gudermannian (Section 4.2), also have inverses with classical GH series representations.10 The function $\tanh^{-1}(y)/y$ satisfies a second order linear homogeneous differential equation, with three regular singular points, located at 0, 1 and $\infty$. A sigmoid with similar analytical behavior could be expected to have an inverse that is a solution to some second order Fuchsian equation.11 Since any second order Fuchsian equation with three singularities can be transformed into the Gauss hypergeometric differential equation, one solution of which is the classical GH series (Klein-Bôcher theorem; Whittaker & Watson, 1927), it follows that the inverses would have classical series expansions. These considerations motivate the following definition.

9 Stochastic versions of neural nets often replace deterministic state assignment rules by probabilistic ones, obtained from some distribution, usually the Gibbs distribution (e.g., Boltzmann machines and stochastic CGH models). Computing expected values for the states of the system then leads to the hyperbolic tangent function. See Hertz et al. (1991) for a typical example.

10 The phenomenon is not unduly surprising. A heuristic explanation is that if the graphs of two functions “look” the same, their respective differential equations are usually members of the same family.
DEFINITION 3.2 (hyperbolic sigmoids). A function $\sigma: \mathbb{R} \to (-1, 1)$ is said to be a hyperbolic sigmoid function if it satisfies the following conditions:

1. $\sigma$ is a real analytic, odd, strictly increasing sigmoid, such that $\lim_{x \to \infty} \sigma(x) = 1$.
2. Let $\eta: (-1, 1) \to \mathbb{R}$ denote the inverse of $\sigma$, and $\eta'$ its first derivative. Then,
   (a) $\eta(y)/y$ has a Gauss hypergeometric series expansion in $y^2$ with at most three parameters.
   (b) $\eta'(y)$ has a Gauss hypergeometric series expansion in $y^2$ with at most one parameter.
4. CHARACTERIZATION: INVERSE HYPERBOLIC SIGMOIDS
The following result is a complete characterization for the inverses of hyperbolic sigmoids. Proofs are presented in the Appendix.
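THEOREM 4.1 (inverses). Let $y = \sigma(x)$ be a hyperbolic sigmoid, and let $\eta: (-1, 1) \to \mathbb{R}$ be its inverse. Then, either

$$\eta(y) = yF(\alpha, 1/2; 3/2; y^2) \tag{4.1}$$

or

$$\eta(y) = yF(\alpha, -; -; y^2) = \frac{y}{(1 - y^2)^\alpha}, \qquad \alpha > 0 \tag{4.2}$$

where, by $F(\alpha, -; -; y^2)$, we mean $F(\alpha, \beta; \beta; y^2)$ ($\beta \in \mathbb{R}$). ■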
Notation. Each inverse hyperbolic sigmoid is denoted by $\eta_\alpha$ and is characterized by a single parameter $\alpha$.
COROLLARY 4.1. The set of hyperbolic sigmoids is a proper subset of the set of simple sigmoids.
11 Fuchsian equations are linear differential equations each of whose singular points is regular (Rainville, 1964).
A proof of Corollary 4.1 may be given along the following lines. If $\sigma$ is a hyperbolic sigmoid, then it is simple on the interval $(-1, 1)$. This follows from Theorem 4.1. The series representation for its inverse in $(-1, 1)$ has non-negative coefficients, and this implies $\eta(y)/y$ is absolutely monotone (Proposition 3.1). Hence $\sigma(x)/x$ is completely monotone, and therefore simple (Lemma 3.1 and Remark 3.1). The converse is not true. Simple sigmoids need not be hyperbolic. The error function erf(.) is simple, but one can use the study of Carlitz (1963) on this function to show that it does not have an inverse representable by a classical hypergeometric series. It follows that erf(.) is not a hyperbolic sigmoid, and hence the set of hyperbolic sigmoids is a proper subset of the set of simple sigmoids. ■
For specific values of its parameters, the hypergeometric function often reduces to other well known special functions. When inverse hyperbolic sigmoids are characterized by eqn (4.1), there is an intimate connection with Euler's incomplete beta function.
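PROPOSITION 4.1 (Spanier and Oldham, 1987). Let $\alpha$, $\beta$ and $\gamma$ be such that $\beta = \gamma - 1$. Then,

$$F(\alpha, \beta; \gamma; z) = \frac{\beta}{z^\beta} B(\beta; 1 - \alpha; z) \tag{4.3}$$

where $B(\nu; \mu; z)$ is the incomplete beta function, defined by $\int_0^z t^{\nu - 1}(1 - t)^{\mu - 1}\,dt$, where $0 < z < 1$. In particular,

$$\frac{1}{2} B(1/2; 1 - \alpha; z^2) = \int_0^{\tanh^{-1}(z)} \cosh^{2(\alpha - 1)}(t)\,dt. \qquad ■$$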
Spanier and Oldham (1987) give a detailed description of the many properties of this important special function. The following corollary is an immediate consequence of Theorem 4.1 and Proposition 4.1. It gives the connection between inverse hyperbolic sigmoids and Euler's incomplete beta function.
COROLLARY 4.2. If $\eta_\alpha(y) = yF(\alpha, 1/2; 3/2; y^2)$, then

$$\eta_\alpha(y) = \frac{1}{2} B(1/2; 1 - \alpha; y^2). \qquad ■$$
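The link between the series form $yF(\alpha, 1/2; 3/2; y^2)$, the incomplete beta function and the hyperbolic-cosine integral can be sketched numerically. The Python fragment below is an illustration under stated assumptions: the incomplete beta integral is evaluated after the substitution $t = s^2$ (which removes the integrable singularity at $t = 0$ and turns $\frac{1}{2}B(1/2; 1-\alpha; y^2)$ into $\int_0^y (1 - s^2)^{-\alpha}\,ds$), and all helper names are hypothetical.

```python
import math

def hyp2f1(a, b, c, z, terms=200):
    """Partial sum of F(a, b; c; z) for |z| < 1, via the term-ratio recurrence."""
    term = 1.0
    total = 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def eta_series(alpha, y):
    """Inverse hyperbolic sigmoid of the form y F(alpha, 1/2; 3/2; y^2)."""
    return y * hyp2f1(alpha, 0.5, 1.5, y * y)

def half_incomplete_beta(alpha, y):
    """(1/2) B(1/2; 1 - alpha; y^2), computed after substituting t = s^2."""
    return simpson(lambda s: (1.0 - s * s) ** (-alpha), 0.0, y)

def eta_cosh_integral(alpha, y):
    """Integral of cosh^(2(alpha - 1)) from 0 to atanh(y)."""
    return simpson(lambda t: math.cosh(t) ** (2.0 * (alpha - 1.0)), 0.0, math.atanh(y))

if __name__ == "__main__":
    alpha, y = 2.5, 0.5
    print(eta_series(alpha, y), half_incomplete_beta(alpha, y), eta_cosh_integral(alpha, y))
```

For $\alpha = 1$ all three expressions reduce to $\tanh^{-1}(y)$, which is the sense in which the hyperbolic tangent anchors this family.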
The relationship between hyperbolic sigmoids and the incomplete beta function also makes explicit the relationship between the hyperbolic tangent function and inverse hyperbolic sigmoids of the form $yF(\alpha, 1/2; 3/2; y^2)$. Other consequences include:
1. The availability of good approximations for small values of $y$ and $(1 - y)$.
2. Rapidly converging series expansions for $y$ close to 1.
3. Connections with other indefinite integrals of powers of trigonometric or hyperbolic functions.
4. Connections with statistics via the function L(p, q) (Spanier and Oldham, 1987).
When inverse hyperbolic sigmoids are characterized by eqn (4.2), we can use the identity,

$$\cosh(\tanh^{-1}(y)) = \frac{1}{\sqrt{1 - y^2}} \tag{4.4}$$

to show that,

$$y \cosh^{2\alpha}(\tanh^{-1}(y)) = \frac{y}{(1 - y^2)^\alpha}. \tag{4.5}$$

The fundamental role played by the hyperbolic tangent is once again evident. Here, it relates the two types of hyperbolic sigmoids defined by eqns (4.1) and (4.2).
4.1. New Inverses from Old
Theorem 4.1 makes it possible to generate new inverse hyperbolic sigmoids from others. The key idea is that if $yF(\alpha, 1/2; 3/2; y^2)$ is an inverse hyperbolic sigmoid, then so is $yF(\alpha + 1, 1/2; 3/2; y^2)$. A similar statement may be made for inverse hyperbolic sigmoids of the form $yF(\alpha, -; -; y^2)$. GH functions such as $F(\alpha, \beta; \gamma; z)$ and $F(\alpha + 1, \beta; \gamma; z)$ are said to be contiguous. Erdélyi et al. (1953) give a complete listing of the many identities that relate them. Lemma 4.1 is a straightforward consequence of three such identities.
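LEMMA 4.1. If $\eta_\alpha: (-1, 1) \to \mathbb{R}$ is an inverse hyperbolic sigmoid, then the functions $\eta_{\alpha+1}$ and $\eta_{\alpha-1}$ defined by:

$$\eta_{\alpha+1}(y) = \frac{y^{2(1-\alpha)}}{2\alpha} \frac{d}{dy}\left(y^{2\alpha - 1}\eta_\alpha(y)\right), \qquad \alpha \ge 1 \tag{4.6}$$

$$\eta_{\alpha-1}(y) = \frac{y^{2\alpha - 1}}{(3 - 2\alpha)(1 - y^2)^{\alpha - 2}} \times \frac{d}{dy}\left\{y^{2(1-\alpha)}(1 - y^2)^{\alpha - 1}\eta_\alpha(y)\right\}, \qquad \alpha \ge 2 \tag{4.7}$$

are also inverse hyperbolic sigmoids. Also, there exist functions $K_1(\alpha, z)$, $K_2(\alpha, z)$ and $K_3(\alpha, z)$ such that the following relation holds:

$$K_1(\alpha, z)\eta_{\alpha-1}(y) + K_2(\alpha, z)\eta_\alpha(y) + K_3(\alpha, z)\eta_{\alpha+1}(y) = 0. \tag{4.8}$$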
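Proof. Equation (4.6), which defines $\eta_{\alpha+1}(y)$, results from the following identity:

$$(\alpha)_n z^{\alpha - 1} F(\alpha + n, \beta; \gamma; z) = \frac{d^n}{dz^n}\left[z^{\alpha + n - 1} F(\alpha, \beta; \gamma; z)\right]. \tag{4.9}$$

In the following, we will use $F(\delta)$ as an abbreviation for $F(\delta, \beta; \gamma; z)$. Equation (4.7) follows from the identity:

$$(\gamma - \alpha)_n z^{\gamma - \alpha - 1}(1 - z)^{\alpha + \beta - \gamma - n} F(\alpha - n) = \frac{d^n}{dz^n}\left[z^{\gamma - \alpha + n - 1}(1 - z)^{\alpha + \beta - \gamma} F(\alpha)\right]. \tag{4.10}$$

Equation (4.8), relating $\eta_{\alpha-1}(y)$, $\eta_\alpha(y)$, and $\eta_{\alpha+1}(y)$, is a consequence of the identity:

$$(\gamma - \alpha)F(\alpha - 1) + (2\alpha - \gamma - \alpha z + \beta z)F(\alpha) + \alpha(z - 1)F(\alpha + 1) = 0. \tag{4.11}$$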
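The three-term contiguous relation behind eqn (4.8) can be spot-checked numerically. The Python sketch below evaluates the left-hand side of the standard Gauss relation $(\gamma - \alpha)F(\alpha - 1) + (2\alpha - \gamma - \alpha z + \beta z)F(\alpha) + \alpha(z - 1)F(\alpha + 1)$ with a truncated series, and also checks the $\alpha \to \alpha + 1$ step for $\alpha = 1$, where the closed form $\eta_2(y) = \frac{1}{2}\left(\tanh^{-1}(y) + y/(1 - y^2)\right)$ follows from differentiating $y\tanh^{-1}(y)$. Helper names are illustrative.

```python
import math

def hyp2f1(a, b, c, z, terms=300):
    """Partial sum of F(a, b; c; z) for |z| < 1, via the term-ratio recurrence."""
    term = 1.0
    total = 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def contiguous_residual(alpha, beta, gamma, z):
    """Left-hand side of the three-term relation; it should vanish."""
    f_minus = hyp2f1(alpha - 1.0, beta, gamma, z)
    f_zero = hyp2f1(alpha, beta, gamma, z)
    f_plus = hyp2f1(alpha + 1.0, beta, gamma, z)
    return ((gamma - alpha) * f_minus
            + (2.0 * alpha - gamma - alpha * z + beta * z) * f_zero
            + alpha * (z - 1.0) * f_plus)

def eta(alpha, y):
    """Inverse hyperbolic sigmoid y F(alpha, 1/2; 3/2; y^2)."""
    return y * hyp2f1(alpha, 0.5, 1.5, y * y)

def eta_two_closed_form(y):
    """(1/2) d/dy [y atanh(y)], i.e. the alpha = 1 step of the raising rule."""
    return 0.5 * (math.atanh(y) + y / (1.0 - y * y))

if __name__ == "__main__":
    print(contiguous_residual(2.3, 0.5, 1.5, 0.3))
    print(eta(2.0, 0.4), eta_two_closed_form(0.4))
```

A residual at the level of rounding error is consistent with the relation; it is a numerical sanity check only, not a substitute for the proof.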
Inverse hyperbolic sigmoids come in two forms; one form has three parameters [eqn (4.1)], while the other has two “missing” parameters [eqn (4.2)]. Subject to a minor condition, the latter form is always obtainable from the former.
LEMMA 4.2. Let $\eta_\alpha = yF(\alpha, 1/2; 3/2; y^2)$, where $\alpha > 1$. Then the function $\eta_{\alpha-1}$ defined by:

$$\eta_{\alpha-1}(y) = y(1 - y^2)\frac{d}{dy}\eta_\alpha(y) = yF(\alpha - 1, -; -; y^2) \tag{4.12}$$

is an inverse hyperbolic sigmoid, with parameter $\alpha - 1$. ■
For inverse hyperbolic sigmoids with “missing” parameters, there is a very simple composition rule.
LEMMA 4.3. If $\eta_\alpha(y) = y/(1 - y^2)^\alpha$ and $\eta_{\alpha'}(y) = y/(1 - y^2)^{\alpha'}$ are two inverse hyperbolic sigmoids with $\alpha, \alpha' > 0$, then the function $(\eta_\alpha(y)\eta_{\alpha'}(y))/y$ is also an inverse hyperbolic sigmoid with parameter $(\alpha + \alpha')$. ■
In general, the set of inverse hyperbolic sigmoids is not closed under multiplication or addition. But if $\eta_\alpha$ and $\eta_{\alpha'}$ are inverses of two hyperbolic sigmoids then their sum would also be an inverse hyperbolic sigmoid $\eta_\mu$ for some $\mu \in \mathbb{R}$, i.e., $\eta_\alpha + \eta_{\alpha'} = K\eta_\mu$, for some $K$, if and only if

$$\eta_\alpha + \eta_{\alpha'} = K\eta_\mu \iff (\alpha)_n + (\alpha')_n = K(\mu)_n \quad \forall n \ge 1 \tag{4.13}$$

which in turn, is possible12 if and only if $\alpha = \alpha'$, or $\alpha = 0$, or $\alpha' = 0$.
12 Equation (4.13), with $K = 1$, provides an amusing application for Fermat's last theorem; if we accept that for all $n > 2$, there cannot exist positive integers $a$, $b$ and $c$ satisfying the identity $a^n + b^n = c^n$, then we may conclude that the sum of inverse hyperbolic sigmoids with different integral parameters cannot be an inverse hyperbolic sigmoid with an integral parameter.
The definition of hyperbolic sigmoids implies that their inverses have GH expansions in $y^2$. Theorem 4.2 relaxes this requirement by only requiring GH expansions in some odd, injective $C^\infty$ function $g(y)$. A proof is provided in the Appendix.
THEOREM 4.2. Let $\sigma: \mathbb{R} \to (-1, 1)$ be a real analytic, odd, strictly increasing sigmoid, such that its inverse $\eta: (-1, 1) \to \mathbb{R}$ has a GH series expansion in some injective, odd, increasing $C^\infty$ function $g(\cdot)$, with at most three parameters, convergent in $(-1, 1)$. Also let $\eta'$ have a GH series expansion in $g(\cdot)$, with at most one parameter. Then, either

$$\eta(y) = g(y)F\left(\alpha, \tfrac{1}{2}; \tfrac{3}{2}; (g(y))^2\right) \tag{4.14}$$

or,

$$\eta(y) = g(y)F\left(\alpha, -; -; (g(y))^2\right) = \frac{g(y)}{\left(1 - (g(y))^2\right)^\alpha} \quad \text{with } \alpha > 0 \tag{4.15}$$

provided
i?’(Y)PI (1 - yq” + “
where $g'(\cdot)$ is the first derivative of $g(\cdot)$. ■
In the case $g(y) = y$, we obtain the characterization for inverse hyperbolic sigmoids. Another interesting special case is when $g(y) = \eta(y)$, where $\eta(y)$ is an inverse hyperbolic sigmoid (since $\eta(y)$ is an injective, smooth, odd, increasing function the conditions of the theorem are satisfied). The elementary composition rules presented here allow the generation of an infinite variety of inverse hyperbolic sigmoids.13 The next section presents some examples.
4.2. Examples
Any function of the form $y/(1 - y^2)^\alpha$, where $\alpha > 0$, is the inverse of a hyperbolic sigmoid. For example, for $\alpha = 1/2$, the function $y/\sqrt{1 - y^2}$ is the inverse of the hyperbolic sigmoid $x/\sqrt{1 + x^2}$.
Of all inverse hyperbolic sigmoids of the form $yF(\alpha, 1/2; 3/2; y^2)$, the function tanh(.) is noteworthy: first, it corresponds to the case $\alpha = 1$; secondly, all inverse hyperbolic sigmoids with integral values of $\alpha$ may be generated from tanh(x) by a process of differentiation (Lemma 4.1); and thirdly, it is a function often encountered in neural nets. As was mentioned in the Introduction, the logistic function may be thought of as a translated and scaled version of the hyperbolic tangent.

13 An interesting case is the piecewise rational sigmoid defined by Elliot (1993), $\sigma(x) = x/(1 + |x|)$. Although its inverse $\eta(y) = y/(1 - |y|)$ does not fit in an obvious way into the framework developed in the last few sections, it is easy to relax the conditions placed on $g(y)$, in Theorem 4.2, so as to include this sigmoid as well.
There is a good example of the hypergeometric composition described in Theorem 4.2. Since $\tan(\beta y)$ is an odd, injective, smooth, increasing function of $y$ (for some constant $\beta > 0$), from Theorem 4.2, one may conclude that for positive $\alpha$, the function $\tan(\beta y)F(\alpha, 1/2; 3/2; \tan^2(\beta y))$ is the inverse of some real analytic, odd, strictly increasing sigmoid. It turns out that the inverse Gudermannian function14 may be obtained from this function, by choosing $\alpha = 1$ as follows:

$$\mathrm{gd}^{-1}(y) = \ln(\sec(y) + \tan(y)) \quad \text{for } -\frac{\pi}{2} < y < \frac{\pi}{2}$$
$$\phantom{\mathrm{gd}^{-1}(y)} = 2\tan(y/2)F\left(1, 1/2; 3/2; \tan^2(y/2)\right).$$
Many such examplescould be generated.15
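The inverse Gudermannian example can be checked numerically by comparing the closed form $\ln(\sec y + \tan y)$ with the hypergeometric composition in $\tan(y/2)$. The following Python sketch uses a truncated Gauss series; the helper names are illustrative and the comparison is restricted to arguments with $\tan^2(y/2) < 1$, where the series converges.

```python
import math

def hyp2f1(a, b, c, z, terms=300):
    """Partial sum of F(a, b; c; z) for |z| < 1, via the term-ratio recurrence."""
    term = 1.0
    total = 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def inverse_gudermannian_closed(y):
    """gd^{-1}(y) = ln(sec(y) + tan(y)) for -pi/2 < y < pi/2."""
    return math.log(1.0 / math.cos(y) + math.tan(y))

def inverse_gudermannian_gh(y):
    """gd^{-1}(y) = 2 tan(y/2) F(1, 1/2; 3/2; tan^2(y/2))."""
    t = math.tan(y / 2.0)
    return 2.0 * t * hyp2f1(1.0, 0.5, 1.5, t * t)

if __name__ == "__main__":
    for y in (-1.2, -0.4, 0.0, 0.7, 1.0):
        print(y, inverse_gudermannian_closed(y), inverse_gudermannian_gh(y))
```

The agreement reflects the identity $\mathrm{gd}^{-1}(y) = 2\tanh^{-1}(\tan(y/2))$, i.e. the choice $g(y) = \tan(y/2)$ with $\alpha = 1$ and an overall factor of 2.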
5. CHARACTERIZATION: HYPERBOLIC SIGMOIDS
It is often desirable and necessary to work with sigmoids themselves, rather than their inverses. In this section, we obtain power series expansions of sigmoids.
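If $x = \eta(y)$ is an inverse hyperbolic sigmoid, then $\sigma \equiv \eta^{-1}$ must have a Maclaurin series expansion of the following form:

$$\sigma(x) = \sum_{k=0}^{\infty} b_{2k+1} \frac{x^{2k+1}}{(2k+1)!}.$$

We are interested in determining the coefficients $\{b_{2k+1}\}_{k=0}^{\infty}$ associated with the inverse hyperbolic sigmoids:

$$\frac{y}{(1 - y^2)^\alpha} \quad \text{and} \quad yF(\alpha, 1/2; 3/2; y^2).$$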
5.1. Hyperbolic Sigmoids of the First Kind
When an inverse hyperbolic sigmoid is of the form $y/(1 - y^2)^\alpha$, a remarkably explicit form for the coefficients $\{b_{2k+1}\}_{k=0}^{\infty}$ may be given:

14 The inverse Gudermannian function finds use in relating circular and hyperbolic functions, without the use of complex functions.

15 The tables of Spanier and Oldham (1987), and Hansen (1975) in particular, contain many such functions and expansions.
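THEOREM 5.1 (hyperbolic sigmoids-I). If the inverse sigmoid is given by $y/(1 - y^2)^\alpha$, $\alpha > 0$, then in some neighborhood of the origin, we have the valid expansion

$$\sigma(x) = \sum_{k=0}^{\infty} b_{2k+1} \frac{x^{2k+1}}{(2k+1)!}$$

where,

$$b_{2k+1} = (-1)^k \frac{(2k+1)!}{2k+1} \binom{(2k+1)\alpha}{k}. \tag{5.1}$$

Proof. See the Appendix. ■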
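For sigmoids of the first kind, the series coefficients can be illustrated with the Lagrange inversion formula, which gives the coefficient of $x^{2k+1}$ in the inverse of $y/(1 - y^2)^\alpha$ as $\frac{(-1)^k}{2k+1}\binom{(2k+1)\alpha}{k}$. The Python sketch below uses this plain power-series normalization, which is an assumption of the example (multiply by $(2k+1)!$ to obtain the derivative of order $2k+1$ at the origin); the function names are hypothetical.

```python
import math

def generalized_binomial(r, k):
    """Binomial coefficient C(r, k) for real r and non-negative integer k."""
    result = 1.0
    for i in range(k):
        result *= (r - i) / (i + 1)
    return result

def maclaurin_coefficient(alpha, k):
    """Coefficient of x^(2k+1) in the inverse of y / (1 - y^2)^alpha (Lagrange inversion)."""
    return (-1) ** k * generalized_binomial((2 * k + 1) * alpha, k) / (2 * k + 1)

def sigmoid_series(alpha, x, terms=30):
    """Truncated Maclaurin series of the sigmoid; valid only near the origin."""
    return sum(maclaurin_coefficient(alpha, k) * x ** (2 * k + 1) for k in range(terms))

if __name__ == "__main__":
    # alpha = 1/2 should reproduce the algebraic sigmoid x / sqrt(1 + x^2).
    x = 0.3
    print(sigmoid_series(0.5, x), x / math.sqrt(1.0 + x * x))
    # alpha = 2: check that applying the inverse sigmoid recovers x.
    y = sigmoid_series(2.0, 0.1)
    print(y / (1.0 - y * y) ** 2)
```

The series only converges in a neighborhood of the origin (for $\alpha = 2$ the radius is roughly 0.32), so the check deliberately uses small arguments.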
5.2. Hyperbolic Sigmoids of the Second Kind
When an inverse hyperbolic sigmoid is of the form $x = yF(\alpha, 1/2; 3/2; y^2)$, the problem is much harder. The Lagrange inversion formula leads to an intractable expression. Kamber's formulae, as presented by Goodman (1983), can be used to give explicit expressions for the coefficients. Unfortunately, the resulting expressions involve determinants, and are of little computational value. The method of repeated differentiation is more successful. The starting point for this line of attack is the observation that if $x = \eta(y)$ is an inverse hyperbolic sigmoid, then:

$$\frac{dx}{dy} = \frac{d}{dy}\eta(y) = \eta'(y) = \frac{1}{(1 - y^2)^\alpha}. \tag{5.2}$$

From Proposition 3.2, we see that for $y = \sigma(x)$,

$$\frac{dy}{dx} = \frac{d}{dx}\sigma(x) = \frac{1}{\eta'(y)} = (1 - y^2)^\alpha. \tag{5.3}$$
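By virtue of eqn (5.3), we can compute the higher derivatives of $\sigma(\cdot)$ and hence compute

$$b_{2k+1} = \left.\frac{d^{2k+1}\sigma(x)}{dx^{2k+1}}\right|_{x=0}.$$

Note that $dy/dx$ is expressed in terms of $y$; this necessitates the use of the chain rule. For example, to calculate the second derivative:

$$\frac{d^2y}{dx^2} = \frac{dy}{dx}\,\frac{d}{dy}\left(\frac{dy}{dx}\right) = (1 - y^2)^\alpha\left(\frac{d}{dy}(1 - y^2)^\alpha\right). \tag{5.4}$$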
The following theorem presents an efficient way to implement this procedure.
THEOREM 5.2 (hyperbolic sigmoids-IIA). Let the inverse hyperbolic sigmoid be $\eta_\alpha = yF(\alpha, 1/2; 3/2; y^2)$, and $\sigma \equiv \eta_\alpha^{-1}$. Let $D \equiv d/dx$. Then,

$$D^n(y) = D^n(\sigma(x)) = G_{n-1}(y)(1 - y^2)^{n\alpha} \tag{5.5}$$

where $G_n: (-1, 1) \to \mathbb{R}$ is a function satisfying the recursion

$$G_0(y) = 1, \qquad G_n(y) = \frac{d}{dy}G_{n-1}(y) - \frac{2yn\alpha}{1 - y^2}G_{n-1}(y), \quad n \ge 1. \tag{5.6}$$

In particular, $b_{2k} = 0$, and $b_{2k+1} = D^{2k+1}(\sigma(x))\big|_{x=0} = G_{2k}(0)$.

Proof. Proved by induction on $n$. ■
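The recursion of Theorem 5.2 can be run exactly with polynomial arithmetic. The sketch below makes one assumption beyond the text: it clears denominators by setting $P_n(y) = G_n(y)(1 - y^2)^n$, which turns eqn (5.6) into the polynomial recursion $P_n = (1 - y^2)P'_{n-1} + 2y(n - 1 - n\alpha)P_{n-1}$ with $P_0 = 1$, so that $D^n\sigma(0) = G_{n-1}(0) = P_{n-1}(0)$. Polynomials are stored as coefficient lists in increasing powers of $y$; all function names are illustrative.

```python
from fractions import Fraction

def poly_derivative(p):
    """Derivative of a polynomial given as [c0, c1, c2, ...]."""
    return [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]

def poly_mul(p, q):
    """Product of two polynomials in coefficient-list form."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two polynomials in coefficient-list form."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def derivative_at_origin(alpha, order):
    """D^order sigma(0) for the sigmoid whose inverse is y F(alpha, 1/2; 3/2; y^2)."""
    alpha = Fraction(alpha)
    p = [Fraction(1)]  # P_0 = G_0 = 1
    one_minus_y2 = [Fraction(1), Fraction(0), Fraction(-1)]
    for n in range(1, order):
        first = poly_mul(one_minus_y2, poly_derivative(p))
        second = poly_mul([Fraction(0), 2 * (n - 1 - n * alpha)], p)
        p = poly_add(first, second)  # now holds P_n
    return p[0]  # P_{order-1}(0) = G_{order-1}(0)

if __name__ == "__main__":
    # alpha = 1 corresponds to tanh: derivatives at 0 are 1, 0, -2, 0, 16, 0, -272.
    print([derivative_at_origin(1, n) for n in range(1, 8)])
```

For $\alpha = 3/2$ the inverse integrates to $y/\sqrt{1 - y^2}$, so the same routine reproduces the odd derivatives 1, -3, 45 of $x/\sqrt{1 + x^2}$ at the origin.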
While the procedure implicit in Theorem 5.2 is efficient, it does involve the computation of the derivative of $G_n(y)$. Equation (5.6) is a partial difference equation with variable coefficients. Therefore, there is little hope of solving it in any generality and obtaining a closed form expression. Even more sophisticated methods, such as Truesdell's generating function technique and Weisner's group theoretic approach (McBride, 1970), do not give any special insight into the nature of the polynomials $G_n(y)$.16 The next theorem offers a somewhat different approach to the method of repeated derivatives.
THEOREM 5.3 (hyperbolic sigmoids-IIB). Let

$$\sigma(x) = \sum_{k=0}^{\infty} b_k \frac{x^k}{k!}$$

be an expansion for a hyperbolic sigmoid, whose inverse is of the form $yF(\alpha, 1/2; 3/2; y^2)$, valid in some neighborhood of the origin. Then $b_{2k} = 0$, and $b_{2k+1} = C(2k + 1, k)$, where the sequence $C(n, k)$ satisfies:

$$\begin{aligned} C(1, 0) &= 1 \\ C(n, k) &= 0 \quad \forall k \ge n,\ k < 0 \\ C(n + 1, k) &= (2k - n + 1)\,C(n, k) - 2(n\alpha - k + 1)\,C(n, k - 1) \end{aligned} \tag{5.7}$$

and $n$ and $k$ are natural numbers. Furthermore, $D^n(\sigma(x))$, the $n$th derivative of $\sigma$, is given by:

$$D^n(y) = D^n(\sigma(x)) = \sum_{k=0}^{n-1} C(n, k)\,y^{2k - n + 1}(1 - y^2)^{n\alpha - k}, \quad \text{for } n \ge 1. \tag{5.8}$$

Proof. See Appendix. ■

16 Equation (5.6) is a differential-difference system of the ascending type; it can then be shown that the polynomials $\{G_n(y)\}_{n=1}^{\infty}$ satisfy Truesdell's F-equation. Unfortunately, the resulting generating function for $G_n(y)$ is too complicated for any practical use.
The recursive system described by eqn (5.7) does not involve any differentiation. The desired value $b_{2k+1}$ may be obtained by computing the value of $C(2k+1, k)$. Equation (5.8) gives information about the shapes of the derivatives of the hyperbolic sigmoid. From eqn (5.7),
$$b_1 = 1, \qquad b_3 = -2\alpha, \qquad (5.9)$$
$$b_5 = 4\alpha(7\alpha - 3), \qquad b_7 = -8\alpha(127\alpha^2 - 123\alpha + 30). \qquad (5.10)$$
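The recursion (5.7) is straightforward to mechanize. The following Python sketch is illustrative only: the function names are ours, not the paper's. It builds the rows $C(n, \cdot)$ level by level, treats every entry outside $0 \le k \le n-1$ as zero (an assumption that the base case requires), and reads off $b_{2k+1} = C(2k+1, k)$.

```python
def derivative_coefficients(alpha, n_max):
    """Return a dict rows[n] = [C(n, 0), ..., C(n, n-1)] for 1 <= n <= n_max,
    built with the recursion of eqn (5.7)."""
    rows = {1: [1]}  # C(1, 0) = 1
    for n in range(1, n_max):
        prev = rows[n]

        def c(k, prev=prev, n=n):
            # Entries outside 0 <= k <= n - 1 are taken to be zero.
            return prev[k] if 0 <= k < n else 0

        rows[n + 1] = [
            (2 * k - n + 1) * c(k) - 2 * (n * alpha - k + 1) * c(k - 1)
            for k in range(n + 1)
        ]
    return rows


def maclaurin_b(alpha, terms):
    """Return [b_1, b_3, ..., b_(2*terms - 1)], using b_(2k+1) = C(2k+1, k)."""
    rows = derivative_coefficients(alpha, 2 * terms - 1)
    return [rows[2 * k + 1][k] for k in range(terms)]
```

For $\alpha = 1$ (the hyperbolic tangent) the sketch gives $b_1, b_3, b_5, b_7 = 1, -2, 16, -272$, in agreement with eqns (5.9) and (5.10) and with the Maclaurin series $\tanh x = x - x^3/3 + 2x^5/15 - 17x^7/315 + \cdots$.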
Theorem 5.3 may be viewed as a generalization of the work of Minai and Williams (1993) on the derivatives of the logistic sigmoid. They obtained relations similar to eqn (5.7).^17 In general, eqn (5.7) is a partial difference equation with variable coefficients, and the system does not appear to be related to any well known sets of numbers. Obtaining a closed form solution for the numbers $C(n, k)$ appears to be intractable.

6. APPLICATIONS
In this section, we present two applications. The first shows that if the neural network transfer function is a hyperbolic sigmoid, then the dynamical equations describing the CGH neural network (Hopfield & Tank, 1986; Grossberg, 1988) can be transformed into a set of non-homogeneous associated Legendre differential equations. Some conclusions regarding the behavior of the CGH model can be drawn, as the outputs saturate (i.e., output $\to \pm 1$).
The second application derives an interesting connection between Fourier transforms and one-hidden layer feedforward nets (one-HL nets). Subject to an additional minor constraint, we show that the use of one-HL nets with simple sigmoidal transfer functions for function approximation is tantamount to assuming that the function being approximated is the product of two functions; one the derivative of a bounded non-negative function, and the other satisfying some linear $n$th order differential equation, where $n$ is the number of nodes in the hidden layer.

^17 Interestingly, in the case of the logistic sigmoid, the relations happened to be the recursions corresponding to the Eulerian numbers (Graham et al., 1989). In other words, the coefficients arising in the computation of higher order derivatives of the logistic sigmoid turn out to be the Eulerian numbers.
6.1. Continuous CGH Nets and Legendre Differential Equations
The continuous CGH network with $N$ neurons is described by the following dynamics:^18
$$\frac{du_i}{dt} + g_i u_i = \sum_j T_{ij} v_j + I_i = E_i = -\frac{\partial E}{\partial v_i}, \quad \forall\, i \in \{1, \ldots, N\} \qquad (6.1)$$
where $u_i$ and $v_i$ are the net input and net output of the $i$th neuron, respectively, $I_i$ is a constant external excitation, and $E$ is the “energy” of the network, given by:
E = –~ ~ Tijvivj = zvi#=-D’’Ei” (6”2)l,]
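Assume $-1 < v_i < 1$. Let $v_i = \sigma(u_i)$, where $\sigma(\cdot)$ is a hyperbolic sigmoid. Let $\eta \equiv \sigma^{-1}$, or, $u_i = \eta(v_i)$. There are two cases to consider.

Case I. Let $\eta(v_i) = v_i F(\alpha, 1/2; 3/2; v_i^2)$. In this case
$$\frac{d\eta}{dv_i} = \frac{1}{(1 - v_i^2)^{\alpha}} \qquad (6.3)$$
and substituting eqn (6.3) into eqn (6.1), we get:
$$\frac{1}{(1 - v_i^2)^{\alpha}} \frac{dv_i}{dt} + g_i\, \eta(v_i) = \sum_j T_{ij} v_j + I_i = E_i. \qquad (6.4)$$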
^18 The synaptic weights $T_{ij}$ are assumed to be symmetric. It makes the derivations simpler with no loss in generality.

The following sequence of operations is applied to eqn (6.4):

1. Substitute $y_i = dv_i/dt$, and differentiate with respect to $v_i$.
2. Multiply throughout by $(1 - v_i^2)^{\alpha+1}$.
3. Differentiate once more with respect to $v_i$.
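The three steps can be traced explicitly. The following is a sketch under an assumption: we read eqn (6.4) as $(1-v_i^2)^{-\alpha}\,dv_i/dt + g_i\,\eta(v_i) = E_i$ with $E_i = \sum_j T_{ij}v_j + I_i$. We write $v$, $y$, $g$ for $v_i$, $y_i$, $g_i$, and a prime for $d/dv$:

```latex
\begin{aligned}
&\text{Step 1:} && (1-v^2)^{-\alpha}\,y' + 2\alpha v\,(1-v^2)^{-\alpha-1}\,y + g\,(1-v^2)^{-\alpha} = E_i' ,\\
&\text{Step 2:} && (1-v^2)\,y' + 2\alpha v\,y + g\,(1-v^2) = (1-v^2)^{\alpha+1}\,E_i' ,\\
&\text{Step 3:} && (1-v^2)\,y'' - 2(1-\alpha)\,v\,y' + 2\alpha\,y
      = \frac{d}{dv}\!\left[(1-v^2)^{\alpha+1}\,E_i'\right] + 2 g v .
\end{aligned}
```

Step 1 uses $d\eta/dv = (1-v^2)^{-\alpha}$ from eqn (6.3); in Step 3 the term $-2gv$ produced on the left has been moved to the right hand side.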
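Equation (6.4) is then transformed into:
$$(1 - v_i^2)\frac{d^2 y_i}{dv_i^2} - 2(1 - \alpha)\, v_i \frac{dy_i}{dv_i} + 2\alpha y_i = Q_i \qquad (6.5)$$
where
$$Q_i = \frac{d}{dv_i}\left[(1 - v_i^2)^{\alpha+1} \frac{dE_i}{dv_i}\right] + 2 g_i v_i.$$
Finally, substitute $y_i = z_i (1 - v_i^2)^{\alpha/2}$ in eqn (6.5), yielding
$$(1 - v_i^2)\frac{d^2 z_i}{dv_i^2} - 2 v_i \frac{dz_i}{dv_i} + \left[\alpha(\alpha+1) - \frac{\alpha^2}{1 - v_i^2}\right] z_i = R_i \qquad (6.6)$$
where $R_i = (1 - v_i^2)^{-\alpha/2} Q_i$. The associated Legendre differential equation is of the form (Spanier & Oldham, 1987),
$$(1 - x^2)\frac{d^2 f}{dx^2} - 2x\frac{df}{dx} + \left[\nu(\nu+1) - \frac{\mu^2}{1 - x^2}\right] f = 0. \qquad (6.7)$$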
It is clear that the left hand side in eqn (6.6) is the associated Legendre differential equation with parameters $\mu = -\alpha$ [eqn (6.5) requires us to choose $\mu = -\alpha$, rather than $+\alpha$], and $\nu = \alpha$. In other words, the continuous CGH model with a neural transfer function given by $\eta(v_i) = v_i F(\alpha, 1/2; 3/2; v_i^2)$, is reducible to the non-homogeneous associated Legendre differential equation with parameters $\mu = -\alpha$ and $\nu = \alpha$.

Case II. Let $\eta(v_i) = v_i F(\alpha, -; -; v_i^2)$. An analogous approach leads to the very same conclusion as in Case I, i.e., it is possible to transform the continuous CGH equation with the above transfer function to a non-homogeneous associated Legendre equation. However, the right hand side of the transformed equation is complicated and we do not consider this case further.

We emphasize that the link between the continuous CGH equation and the Legendre differential equation is not accidental, given that it can be established for all hyperbolic sigmoidal transfer functions. For $u_i = \tanh^{-1}(v_i)$, $\alpha = 1$, and the above equations have a rather elementary form.
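The identification of $\tanh^{-1}$ with $\alpha = 1$, and the derivative formula (6.3), can be checked numerically. The Python sketch below is our own illustration (function names and truncation length are assumptions). It sums the series for $vF(\alpha, 1/2; 3/2; v^2)$ directly, using $(1/2)_k/(3/2)_k = 1/(2k+1)$.

```python
import math


def eta(v, alpha, terms=400):
    """Partial sum of v * F(alpha, 1/2; 3/2; v^2) for |v| < 1.

    The k-th term is (alpha)_k / k! * v^(2k+1) / (2k + 1).
    """
    total = 0.0
    coeff = 1.0  # (alpha)_k / k!, starting at k = 0
    for k in range(terms):
        total += coeff * v ** (2 * k + 1) / (2 * k + 1)
        coeff *= (alpha + k) / (k + 1)
    return total


def eta_prime(v, alpha, h=1e-5):
    """Central-difference estimate of d(eta)/dv."""
    return (eta(v + h, alpha) - eta(v - h, alpha)) / (2 * h)
```

With $\alpha = 1$ the series reproduces `math.atanh`, and for any $\alpha$ the finite-difference derivative agrees with $(1 - v^2)^{-\alpha}$ to within the discretization error.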
An immediate application of the above transformation is in studying the saturation behavior of the CGH neural net. By saturation, we mean that the outputs of the neurons tend to $\pm 1$. As discussed by Hopfield and Tank (1986), this usually occurs when the network is heading towards a critical point (local or global). Saturation implies that as a node output $v_i \to \pm 1$, the quantity $R_i \to 0$. In other words, we may study the saturation behavior of the continuous CGH model by considering the homogeneous version of eqn (6.6), viz.,
$$(1 - v_i^2)\frac{d^2 z_i}{dv_i^2} - 2 v_i \frac{dz_i}{dv_i} + \left[\alpha(\alpha+1) - \frac{\alpha^2}{1 - v_i^2}\right] z_i = 0. \qquad (6.8)$$
From the theory of associated Legendre equations, it is seen that eqn (6.8) has a solution in terms of the associated Legendre functions, $P_\nu^{(\mu)}(x)$ and $Q_\nu^{(\mu)}(x)$ (Erdélyi et al., 1953). Here, $\mu = -\alpha$, $\nu = \alpha$, and $x = v_i$, and we have:
$$z_i = c_1 P_\alpha^{(-\alpha)}(v_i) + c_2 Q_\alpha^{(-\alpha)}(v_i)$$
$$\frac{y_i}{(1 - v_i^2)^{\alpha/2}} = c_1 P_\alpha^{(-\alpha)}(v_i) + c_2 Q_\alpha^{(-\alpha)}(v_i)$$
$$\frac{1}{(1 - v_i^2)^{\alpha/2}} \frac{dv_i}{dt} = c_1 P_\alpha^{(-\alpha)}(v_i) + c_2 Q_\alpha^{(-\alpha)}(v_i). \qquad (6.9)$$
Neglecting the effect of $g_i$, as is common practice, we obtain from eqn (6.4):
$$E_i = \frac{1}{(1 - v_i^2)^{\alpha}} \frac{dv_i}{dt}. \qquad (6.10)$$
Equation (6.9), in conjunction with eqn (6.10), implies:
$$E_i = -\frac{\partial E}{\partial v_i} = (1 - v_i^2)^{-\alpha/2}\left[c_1 P_\alpha^{(-\alpha)}(v_i) + c_2 Q_\alpha^{(-\alpha)}(v_i)\right]. \qquad (6.11)$$
Equation (6.11) in conjunction with eqn (6.2) implies that the overall energy at saturation may be written as follows:
$$E = \cdots + c_2 Q_\alpha^{(-\alpha)}(v_i)\,]. \qquad (6.12)$$
Here, $E_i$ does not depend on $E_j$ for $i \ne j$. Thus, to a crude first approximation, the CGH network “dissociates” at saturation, into independent units, and the quadratic energy function may be written as a linear sum of non-linear univalent functions, given by eqns (6.11) and (6.12).

We wish to stress the possibilities revealed by dealing with the CGH equation in a general context. For example, in eqn (6.6),
$$(1 - v_i^2)\frac{d^2 z_i}{dv_i^2} - 2 v_i \frac{dz_i}{dv_i} + \left[\alpha(\alpha+1) - \frac{\alpha^2}{1 - v_i^2}\right] z_i = R_i \qquad (6.13)$$
where $R_i = (1 - v_i^2)^{-\alpha/2} Q_i$, and
$$Q_i = \frac{d}{dv_i}\left[(1 - v_i^2)^{\alpha+1} \frac{dE_i}{dv_i}\right] + 2 g_i v_i.$$
Consider the case when $Q_i = K$ is a constant. Then the above equation reduces to the non-homogeneous equation,
$$(1 - v_i^2)\frac{d^2 z_i}{dv_i^2} - 2 v_i \frac{dz_i}{dv_i} + \left[\alpha(\alpha+1) - \frac{\alpha^2}{1 - v_i^2}\right] z_i = K (1 - v_i^2)^{-\alpha/2} \qquad (6.14)$$
which may be solved using a special function defined and described by Babister (1967). Equation (6.14) first arose in the context of solving for Poisson's equation in spherical polar coordinates.
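The homogeneous equation (6.8) admits one elementary solution that is easy to verify. Substituting $z_i = (1 - v_i^2)^{\alpha/2}$ into its left hand side, $(1 - v^2) z'' - 2 v z' + [\alpha(\alpha+1) - \alpha^2/(1 - v^2)] z$, gives zero identically. The Python sketch below is our own illustration; it uses central finite differences with a hand-picked step and confirms this numerically. In terms of the original variable this solution corresponds to $y_i = dv_i/dt = (1 - v_i^2)^{\alpha}$.

```python
def legendre_lhs(z, v, alpha, h=1e-4):
    """Left hand side of eqn (6.8) at v, with derivatives by central differences:
    (1 - v^2) z'' - 2 v z' + [alpha (alpha + 1) - alpha^2 / (1 - v^2)] z.
    """
    d1 = (z(v + h) - z(v - h)) / (2.0 * h)
    d2 = (z(v + h) - 2.0 * z(v) + z(v - h)) / (h * h)
    bracket = alpha * (alpha + 1.0) - alpha ** 2 / (1.0 - v * v)
    return (1.0 - v * v) * d2 - 2.0 * v * d1 + bracket * z(v)


def candidate(alpha):
    """The trial solution z(v) = (1 - v^2)^(alpha / 2)."""
    return lambda v: (1.0 - v * v) ** (alpha / 2.0)
```

A function that is not a solution, such as $z(v) = v$, leaves a residual of order one, so the check is not vacuous.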
6.2. Fourier Transforms and Feedforward Nets
There have been many different attempts to describe the behavior of feedforward networks such as the group theoretic analysis of the perceptron, proposed by Minsky and Papert (1988), the space partition (via hyperplanes) interpretation discussed by Lippmann (1987) (and many others), the metric synthesis viewpoint introduced by Pao and Sobajic (1987), and the statistical interpretation emphasized by White (1989). Gallant and White (1988) showed that a one-HL feedforward net with “monotone cosine” squashing at the hidden layer, and a summing output node, embeds as a special case a “Fourier network” that yields a Fourier series approximation to a given function as its output. We present a related construction in this section; it is shown that one-hidden-layer (one-HL) nets with simple sigmoidal convex transfer functions (at the hidden layer), and a single summing output, can be thought of as performing trigonometric approximation (regression). Specifically, the inverse Fourier transform of the function (to be learned) is approximated as a linear combination of weighted sinusoids.

The result is a consequence of a connection between a class of simple sigmoids and Fourier transforms, that facilitates a novel interpretation of one-HL feedforward nets. The following classic theorem due to Polya (1949) is a starting point.
PROPOSITION 6.1 (Polya, 1949). A real valued and continuous function $f(x)$ defined for all real $x$ and satisfying the following properties:

● $f(0) = 1$,
● $f(x) = f(-x)$,
● $f(x)$ is convex for $x > 0$,
● $\lim_{x \to \infty} f(x) = 0$,

is always a characteristic function (Fourier transform) of an absolutely continuous distribution function,* i.e.,
$$f(x) = \mathcal{F}(h(t); x) = \int_{-\infty}^{\infty} \exp(ixt)\, h(t)\, dt.$$
Furthermore, the density $h(t)$ is an even function, and is continuous everywhere except possibly at $t = 0$. ■
The following result connects simple sigmoids with Fourier transforms.

THEOREM 6.1. Let $\sigma(x)$ be a simple sigmoid. If $\sigma(x)/x$ is a convex function, then it is the Fourier transform of an absolutely continuous distribution function, i.e.,
$$\frac{\sigma(x)}{x} = \mathcal{F}(h(t); x) = \int_{-\infty}^{\infty} \exp(ixt)\, h(t)\, dt. \qquad (6.15)$$
Proof. It suffices to prove that $\sigma(x)/x$ satisfies the conditions of Polya's theorem; $\sigma(x)$ being simple is bounded, and hence $\lim_{x \to \infty} \sigma(x)/x = 0$. Also, $\sigma(-x)/(-x) = -\sigma(x)/(-x) = \sigma(x)/x$. Since $\sigma(x)$ is completely monotone in $(0, 1)$, it follows that $\lim_{x \to 0} \sigma(x)/x = K$ (some positive constant). There is no loss of generality in assuming $K = 1$, since one can always scale $\sigma(\cdot)$ appropriately. Finally, the convexity of $\sigma(x)/x$ ensures that all of the conditions of Polya's theorem are satisfied and the conclusion follows. ■
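The hypotheses of Polya's theorem are easy to probe numerically for a given candidate. The sketch below is an illustration under our own assumptions. It uses $\sigma(x) = x/(1+|x|)$ as a sample sigmoid (not one singled out by the text) and normalizes $f(0) = 1$. It tests convexity of $f(x) = \sigma(x)/x$ only through second differences on a finite grid, which is a necessary check rather than a proof.

```python
import math


def ratio(sigma):
    """Return f(x) = sigma(x)/x, with f(0) set to 1 (assumes sigma'(0) = 1)."""
    return lambda x: 1.0 if x == 0 else sigma(x) / x


def convex_on_grid(f, xs, tol=1e-12):
    """True if all second differences of f on the evenly spaced grid xs are >= -tol."""
    return all(f(a) - 2.0 * f(b) + f(c) >= -tol
               for a, b, c in zip(xs, xs[1:], xs[2:]))


def elliott(x):
    """Sample sigmoid x / (1 + |x|); chosen here purely for illustration."""
    return x / (1.0 + abs(x))


grid = [0.05 * i for i in range(1, 400)]            # points in (0, 20)
elliott_ok = convex_on_grid(ratio(elliott), grid)   # passes the grid test
tanh_ok = convex_on_grid(ratio(math.tanh), grid)    # fails near the origin
```

The sample ratio $1/(1+|x|)$ is even, equals 1 at the origin, decays to zero, and passes the grid test, whereas $\tanh(x)/x$ fails it near $x = 0$, where it behaves like $1 - x^2/3$.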
REMARK 6.1. Polya's theorem is a sufficient but not necessary condition for $f(x)$ to be the Fourier transform of some function $h(t)$. Hence, Theorem 6.1 is also only a sufficient condition for a simple sigmoid to be a Fourier transform. The case in point is the non-convex function $\tanh(x)$, which is still the Fourier transform of a well defined function [this and other examples may be found in Oberhettinger (1973)]:

In other words, the conclusions we draw in the next few paragraphs may be valid for some non-convex simple sigmoids as well.
* An absolutely continuous function $F(x)$ is a distribution function if it can be written in the form $F(x) = \int_{-\infty}^{x} h(t)\, dt$, where $h(t)$ is called the density of $F(x)$.
REMARK 6.2. In eqn (6.15), $h(t)$ is an even function. Hence the transform is a Fourier cosine transform. The sine component vanishes during the course of an integration.
Consider a one-HL net, with $k$ input nodes, $n$ hidden layer nodes with convex simple sigmoidal transfer functions $\sigma(\cdot)$, and one summing output node. Let $w_{ij}$ denote the weight of the connection between the $i$th node in the hidden layer and the $j$th node in the input layer; similarly, let $c_i$ denote the weight of the connection between the $i$th hidden layer node and the output node. Then the output $O$ may be expressed as,
$$O = \sum_{i=1}^{n} c_i \sigma_i = \sum_{i=1}^{n} c_i\, \sigma(u_i) = \sum_{i=1}^{n} c_i\, \sigma\Big(\sum_{j=1}^{k} w_{ij} x_j + \theta_i\Big) \qquad (6.17)$$
where $u_i$ and $\theta_i$ are the input and bias for the $i$th hidden node, respectively. Since $\sigma(\cdot)$ is a convex simple sigmoid, using Theorem 6.1, eqn (6.17) may be rewritten as,
$$O(t) = \sum_{i=1}^{n} c_i \sigma_i = \sum_{i=1}^{n} c_i u_i\, \mathcal{F}(h(t); u_i) \qquad (6.18)$$
where $\mathcal{F}(h(t); u_i)$ denotes that $\mathcal{F}(h(t); x)$ is evaluated at the point
$$u_i = \sum_{j=1}^{k} w_{ij} x_j + \theta_i.$$
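The forward computation in eqn (6.17) can be sketched in a few lines of Python. This is an illustration only: the function names are ours, and the default transfer function $x/(1+|x|)$ is an assumed stand-in for whichever simple sigmoid is intended.

```python
def elliott(x):
    """Illustrative sigmoid x / (1 + |x|); any simple sigmoid could be substituted."""
    return x / (1.0 + abs(x))


def one_hl_output(x, w, theta, c, sigma=elliott):
    """Output O = sum_i c_i * sigma(sum_j w[i][j] * x[j] + theta[i]) of eqn (6.17).

    x: list of k inputs; w: n rows of k weights; theta: n biases; c: n output weights.
    """
    total = 0.0
    for w_i, theta_i, c_i in zip(w, theta, c):
        u_i = sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + theta_i
        total += c_i * sigma(u_i)
    return total
```

For example, with inputs `[1.0, -2.0]`, weights `[[0.5, 0.25], [1.0, 0.0]]`, biases `[0.0, 1.0]` and output weights `[2.0, -1.0]`, the hidden inputs are $u_1 = 0$ and $u_2 = 2$.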
Using the well known property of Fourier transforms (Davies, 1978), that if $f(x) = \mathcal{F}(h(t); x)$, then $x f(x) = i\mathcal{F}(h'(t); x) = \mathcal{F}(-ih'(t); x)$, where $h'(\cdot)$ is the first derivative of $h(\cdot)$, and $i = \sqrt{-1}$, eqn (6.18) may be rewritten^19 as,
$$O(t) = \sum_{i=1}^{n} c_i\, \mathcal{F}(-ih'(t); u_i). \qquad (6.19)$$
Equation (6.19) can be recognized as being analogous to the Heaviside expansion formula in Laplace transform theory,^20 which allows the reconstruction of a time varying function using information relating to its spectral components. Equation (6.19) suggests that 1-HL nets with convex simple sigmoidal transfer functions can be thought of as implementing a spectral reconstruction of the output using the weighted inputs $u_i$ to evaluate the associated pole coefficients (residues) of the Heaviside expansion.

^19 In eqn (6.19), the $i$ term in $\mathcal{F}(-ih'(t); u)$ converts the Fourier cosine transform representation of $\sigma(x)/x$ (see Remark 6.2) into a Fourier sine transform.

^20 For convenience, we restate a simple version of the formula: If the Laplace transform of a function $h(t)$ is given by $f(z)$, i.e., $f(z) = \mathcal{L}(h(t); z) = \int_0^{\infty} h(t)\exp(-zt)\,dt$, and $f(z)$ has only first order poles at $z_1, z_2, \ldots, z_n$, then $h(t) = \sum_{k=1}^{n} F_k(z_k)$, where $F_k(z_k)$ is the residue or pole-coefficient of $f(z)\exp(zt)$ (Bohn, 1963).
In particular, it can be demonstrated that the results of Gallant and White (1988) are implied by eqn (6.19). In what follows, we shall use $\mathcal{F}_s(h; x)$ and $\mathcal{F}_c(h; x)$ to indicate the Fourier sine and cosine transforms of $h(t)$.

Since $h(t)$, the continuous distribution function corresponding to $\sigma(x)/x$, is an even function (from Polya's theorem), it follows that $\sigma(x) = x\mathcal{F}(h(t); x) = x\mathcal{F}_c(h(t); x)$. Using the property of Fourier transforms that $x\mathcal{F}_c(g(t); x) = \mathcal{F}_s(-g'(t); x)$ (Davies, 1978), we may conclude that $\sigma(x) = \mathcal{F}_s(-h'(t); x)$.

Let $u_i = u + r_i$, where the $r_i$ are appropriate functions of the $x_i$ (since the $u_i$ are functions of the inputs $x_i$). Then
$$O(u) = \sum_{i=1}^{n} c_i\, \mathcal{F}_s(-h'(t); u + r_i). \qquad (6.20)$$
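From the frequency shifting property of Fourier transforms (Davies, 1978),
$$\mathcal{F}_s(f(t); x + a) = \mathcal{F}_s(f(t)\cos(at); x) + \mathcal{F}_c(f(t)\sin(at); x),$$
it follows that
$$\begin{aligned}
O(u) &= \sum_{i=1}^{n} c_i\, \mathcal{F}_s(-h'(t); u + r_i) \\
&= \sum_{i=1}^{n} c_i \left\{ \mathcal{F}_s(-h'(t)\cos(r_i t); u) + \mathcal{F}_c(-h'(t)\sin(r_i t); u) \right\} \\
&= \mathcal{F}_s\Big\{ \sum_{i=1}^{n} c_i(-h'(t)\cos(r_i t)); u \Big\} + \mathcal{F}_c\Big\{ \sum_{i=1}^{n} c_i(-h'(t)\sin(r_i t)); u \Big\} \qquad (6.21)
\end{aligned}$$
$$\mathcal{F}^{-1}(O(u)) = -h'(t) \sum_{i=1}^{n} c_i \sin((r_i + u)t). \qquad (6.22)$$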
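But we may choose $u$ arbitrarily: let $u = 0$, implying
$$r_i = u_i = \sum_{j=1}^{k} w_{ij} x_j + \theta_i,$$
and eqn (6.22) becomes,
$$\mathcal{F}^{-1}(O(u)) = -h'(t) \sum_{i=1}^{n} c_i \sin\Big(\Big(\sum_{j=1}^{k} w_{ij} x_j + \theta_i\Big) t\Big). \qquad (6.23)$$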
Equation (6.23) may be used as a starting point for an analysis identical to that adopted by Gallant and White (1988) in their study of one-HL nets with “cosine squashing” functions. It is then straightforward to show that the weights may be so chosen (hardwired) so that the one-HL net embeds as a special case a Fourier network, which yields a Fourier series approximation to a given function as its output. In this sense, the results of this section extend those of Gallant and White.

More generally, one can draw similar conclusions by considering sigmoids that are the Laplace transforms of some function; for example $\tanh(x)/x$ is the Laplace transform of $\mathrm{sgn}(\sin(\pi t/2))$, where $\mathrm{sgn}(x)$ is $+1$, $0$ or $-1$ depending on whether $x$ is greater, equal or less than zero (Spanier & Oldham, 1987). A similar analysis would lead to a connection with real exponential approximation (rather than trigonometric approximation). Efficient algorithms, such as Prony's, exist for certain restricted forms of the exponential approximation problem (Su, 1971).
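The Laplace transform example can be checked directly. The square wave $\mathrm{sgn}(\sin(\pi t/2))$ equals $(-1)^m$ on $[2m, 2m+2)$, so its transform is an alternating sum of elementary integrals. The Python sketch below is our own illustration (the truncation length is an assumption); it compares a truncated sum against $\tanh(x)/x$.

```python
import math


def laplace_square_wave(x, periods=200):
    """Laplace transform at x > 0 of sgn(sin(pi * t / 2)).

    The wave equals (-1)**m on [2m, 2m + 2), so each piece integrates in
    closed form; the sum is truncated after `periods` pieces.
    """
    total = 0.0
    for m in range(periods):
        piece = (math.exp(-2 * m * x) - math.exp(-2 * (m + 1) * x)) / x
        total += (-1) ** m * piece
    return total
```

Summing the geometric series by hand gives $(1 - e^{-2x})/(x(1 + e^{-2x})) = \tanh(x)/x$, which the truncated sum reproduces for moderate $x$.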
Also related are the considerations of Marks and Arabshahi (1994) on the multidimensional Fourier transforms of the output of a one-HL feedforward net; they showed that the transform of the output is the sum of certain scaled Dirac delta functions. Here, we view the sigmoid itself as the Fourier transform of some function; the main advantage of our interpretation is the algorithms it suggests for training one-HL nets of the type considered in this section.

Another potential use of eqn (6.23) is in exploring the “goodness” of the approximation obtained by a one-HL net with simple sigmoidal transfer functions. In the last 200 years, much has been learned about the errors associated with exponential and trigonometric approximation, and ways to deal with them; however, consideration of these issues is beyond the scope of this paper.
7. CONCLUSION
We have analyzed the behavior of important classes of sigmoid functions, called simple and hyperbolic sigmoids, instances of which are extensively used as node transfer functions in artificial neural network implementations. We have obtained a complete characterization for the inverses of hyperbolic sigmoids using Euler's incomplete beta functions, and have described composition rules that illustrate how such functions may be synthesized from others. We have obtained power series expansions of hyperbolic sigmoids, and suggested procedures for obtaining coefficients of the expansions. For a large class of node functions, we have shown that the continuous CGH net equations can be reduced to Legendre differential equations. The fact that the connection between Legendre differential equations and the CGH equation holds for such a wide variety of sigmoids, and is not just an accidental consequence of a particular sigmoid, strongly indicates that further exploration of this connection is warranted. Finally, we have shown that a large class of feedforward networks represent the output function as a Fourier series sine transform evaluated at the hidden layer node inputs, thus extending an earlier result due to Gallant and White.
REFERENCES
Albertini, F., Sontag, E., & Maillot, V. (1993). Uniqueness of weights for neural networks. In R. Mammone (Ed.), Artificial neural networks with applications in speech and vision. London: Chapman and Hall.
Babister, A. W. (1967). Transcendental functions satisfying nonhomogeneous linear differential equations. New York: Macmillan.
Bohn, E. V. (1963). The transform analysis of linear systems. Reading, MA: Addison-Wesley.
Carlitz, L. (1963). The inverse of the error function. Pacific J. Math., 13(2), 459-470.
Cybenko, G. (1989). Approximation by superposition of a sigmoidal function. Math. Control, Signals and Systems, 2, 303-314.
Davies, B. (1978). Integral transforms and their applications. New York: Springer Verlag.
Diaconis, P., & Shahshahani, M. (1984). On nonlinear functions of linear combinations. SIAM J. Sci. Stat. Comput., 5, 175-191.
Elliot, L. D. (1993). A better activation function for artificial neural networks. Technical Report TR 93-8, Institute for Systems Research, University of Maryland, College Park, MD.
Erdélyi, A., Magnus, W., Oberhettinger, F., & Tricomi, F. G. (1953). Higher transcendental functions (Vol. 1). New York: McGraw-Hill.
Feller, W. (1965). An introduction to probability theory and its application (Vol. II). New York: John Wiley.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3), 183-192.
Gallant, A. R., & White, H. (1988). There exists a neural network that does not make avoidable mistakes. In IEEE International Conf. on Neural Networks (Vol. 1, pp. 657-664). San Diego, CA.
Goodman, A. W. (1983). Univalent functions (Vol. I). New York: Mariner Publishing Co.
Graham, R. L., Knuth, D. E., & Patashnik, O. (1989). Concrete mathematics. Reading, MA: Addison-Wesley.
Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies Appl. Math., 52, 217-257.
Grossberg, S. (1988). Nonlinear neural networks: principles, mechanisms and architectures. Neural Networks, 1, 17-61.
Hansen, E. R. (1975). A table of series and products. Englewood Cliffs, NJ: Prentice-Hall.
Harrington, P. (1993). Sigmoid transfer functions in backpropagation neural networks. Analyt. Chem., 65(15), 2167-2168.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). An introduction to the theory of neural computation (Vol. 1), Santa Fe Institute studies in the sciences of complexity. Redwood City, CA: Addison-Wesley.
Hopfield, J. J., & Tank, D. W. (1986). Computing with neural circuits: A model. Science, 233, 625-633.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multi-layer feedforward networks are universal approximators. Neural Networks, 2, 359-366.
Krantz, S. G., & Parks, H. R. (1992). A primer of real analytic functions. Berlin: Birkhauser Verlag.
Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4, 4-22.
Macintyre, A., & Sontag, E. (1993). Finiteness results for sigmoidal “neural” networks. In Proc. 25th Annual Symp. Theory Computing, San Diego. New York: Association for Computing Machinery (ACM).
Marks, R. J., & Arabshahi, P. (1994). Fourier analysis and filtering of a single hidden layer perceptron. In Int. Conf. Artificial Neural Networks (IEEE/ENNS), Sorrento, Italy.
McBride, R. E. (1970). Obtaining generating functions (Vol. 21). Berlin: Springer Verlag.
Minai, A., & Williams, R. (1993). On the derivatives of the sigmoid. Neural Networks, 6(6), 845-853.
Minsky, M., & Papert, S. A. (1988). Perceptrons, an introduction to computational geometry. Cambridge, MA: The MIT Press.
Oberhettinger, F. (1973). Fourier transforms of distributions and their inverses. New York: Academic Press.
Pao, Y. H., & Sobajic, D. J. (1987). Metric synthesis and concept discovery with connectionist networks. In Proc. IEEE Systems, Man and Cybernetics Conf., Alexandria, VA.
Polya, G. (1949). Remarks on the characteristic function. In Proc. 4th Berkeley Symp. Math. Statist. & Probab. (pp. 115-123).
Polya, G. (1974). On the zeroes of the derivatives of a function and its analytic character. In R. P. Boas (Ed.), George Polya: collected works (pp. 178-189). Cambridge, MA: MIT Press.
Rainville, E. D. (1964). Intermediate differential equations. New York: Macmillan.
Spanier, J., & Oldham, K. (1987). An atlas of functions. Washington, DC: Hemisphere.
Su, K. L. (1971). Time-domain synthesis of linear networks. Englewood Cliffs, NJ: Prentice-Hall.
Sussmann, H. J. (1992). Uniqueness of weights for minimal feedforward nets with a given input-output map. Neural Networks, 5(4), 589-593.
White, H. (1989). Learning in artificial neural networks: a statistical perspective. Neural Computation, 1, 425-464.
Whittaker, E. T., & Watson, G. N. (1927). Modern analysis (4th ed.). Cambridge: Cambridge University Press.
Widder, D. V. (1946). The Laplace transform. Princeton, NJ: Princeton University Press.
Wilf, H. S. (1989). Generatingfunctionology. New York: Academic Press.
APPENDIX
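THEOREM 4.1. Let $y = \sigma(x)$ be a hyperbolic sigmoid, and let $\eta : (-1, 1) \to \Re$ be its inverse. Then, either
$$\eta(y) = yF(\alpha, 1/2; 3/2; y^2), \quad \alpha \ge 1 \qquad (A.1)$$
or
$$\eta(y) = yF(\alpha, -; -; y^2) = \frac{y}{(1 - y^2)^{\alpha}}, \quad \alpha > 0 \qquad (A.2)$$
where, by $F(\alpha, -; -; y^2)$, we mean $F(\alpha, \beta; \beta; y^2)$ $(\beta \in \Re)$.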
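Proof. Since $\sigma(\cdot)$ is hyperbolic, by definition $\eta(x)/x$ is described by a GH series with at most three parameters. There are four possibilities:
$$\eta(x) = x\,{}_3F_0(\alpha_1, \alpha_2, \alpha_3; \,; x^2) \leftarrow \text{case 1} \qquad (A.3)$$
$$\eta(x) = x\,{}_2F_1(\alpha_1, \alpha_2; \gamma_1; x^2) \leftarrow \text{case 2} \qquad (A.4)$$
$$\eta(x) = x\,{}_1F_2(\alpha_1; \gamma_1, \gamma_2; x^2) \leftarrow \text{case 3} \qquad (A.5)$$
$$\eta(x) = x\,{}_0F_3(\,; \gamma_1, \gamma_2, \gamma_3; x^2) \leftarrow \text{case 4}. \qquad (A.6)$$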
The following proposition shows why there is no need to consider cases 1, 3, and 4, as possible forms for hyperbolic sigmoids.

PROPOSITION A (Spanier & Oldham, 1987). Let ${}_pF_q(\alpha_1, \ldots, \alpha_p; \gamma_1, \ldots, \gamma_q; z)$ be a GH series in $z$ with $p + q$ parameters. If none of the numerator parameters are non-positive integers, i.e., $\forall i: \alpha_i \ne 0, -1, -2, \ldots$, then the convergence behavior of ${}_pF_q$ is as follows:

$p < q + 1$: ${}_pF_q$ necessarily converges for all finite $z$.
$p = q + 1$: convergence of ${}_pF_q$ is limited to $-1 < z < 1$, and depends on the parameters $\alpha_i$ and $\gamma_i$. (A.7)
$p > q + 1$: ${}_pF_q$ necessarily diverges for all non-zero $z$.
Since $\lim_{z \to \pm 1} \eta(z) \to \pm\infty$, but is finite in the interval $(-1, 1)$, it follows that if a GH series is to represent $\eta(\cdot)$, then it has to converge in the interval $(-1, 1)$, but diverge at $z = \pm 1$.

This rules out non-positive integral values for the numerator parameters; otherwise, the series would converge for all $z \in \Re$ (and not just in the interval $(-1, 1)$). Yet, even if the numerator parameters do not have non-positive integral values, in three of the above cases, the number of numerator parameters to denominator parameters is such that, by Proposition A, each series diverges for all non-zero $z$ (case 1), or converges for all $z$ (cases 3 and 4). That leaves just one case to consider, viz., the classical series, ${}_2F_1(\alpha_1, \alpha_2; \gamma_1; z) = F(\alpha, \beta; \gamma; z)$, i.e., we may take $\eta(x) = xF(\alpha, \beta; \gamma; x^2)$.
Since $\eta(\cdot)$ has to be a GH series with at most three parameters, some of the parameters are allowed to be “missing”. In other words, case 2 spawns the following possibilities:
$$\eta(x) = xF(\alpha, \beta; \gamma; x^2) \leftarrow \text{Case 2(a)} \qquad (A.8)$$
$$\eta(x) = xF(\alpha, \beta; -; x^2) \leftarrow \text{Case 2(b)} \qquad (A.9)$$
$$\eta(x) = xF(\alpha, -; \gamma; x^2) \leftarrow \text{Case 2(c)} \qquad (A.10)$$
$$\eta(x) = xF(\alpha, -; -; x^2) \leftarrow \text{Case 2(d)} \qquad (A.11)$$
$$\eta(x) = xF(-, -; \gamma; x^2) \leftarrow \text{Case 2(e)} \qquad (A.12)$$
$$\eta(x) = xF(-, -; -; x^2) \leftarrow \text{Case 2(f)}. \qquad (A.13)$$
Proposition A can be used once again to weed out all but two of the above set, viz., cases 2(a) and 2(d). The rest lead to inappropriate divergence or convergence behavior in the interval. The following property of GH functions will be needed.
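PROPOSITION B (Spanier & Oldham, 1987). If $y = F(\alpha, \beta; \gamma; x)$, then
$$\frac{dy}{dx} = \frac{\alpha\beta}{\gamma} F(\alpha + 1, \beta + 1; \gamma + 1; x).$$
■

Case 2(a): Three parameter GH series:
$$\eta(x) = xF(\alpha, \beta; \gamma; x^2) = x \sum_{k \ge 0} \frac{(\alpha)_k (\beta)_k}{(\gamma)_k} \frac{x^{2k}}{k!}. \qquad (A.14)$$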
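Let
$$A : (-1, 1) \to \Re^{+},$$
with
$$A(x) = \frac{d\eta}{dx}.$$
Then
$$\begin{aligned}
A(x) &= \frac{d\eta}{dx} = \frac{d}{dx}\left(xF(\alpha, \beta; \gamma; x^2)\right) \\
&= F(\alpha, \beta; \gamma; x^2) + x \frac{d}{dx} F(\alpha, \beta; \gamma; x^2) \\
&= F(\alpha, \beta; \gamma; x^2) + 2x^2 \frac{\alpha\beta}{\gamma} F(\alpha + 1, \beta + 1; \gamma + 1; x^2) \\
&= \sum_{k \ge 0} \frac{(\alpha)_k (\beta)_k}{(\gamma)_k} \frac{x^{2k}}{k!} + 2 \sum_{k \ge 1} \frac{(\alpha)_k (\beta)_k}{(\gamma)_k} \frac{x^{2k}}{(k-1)!} \\
&= \sum_{k \ge 0} \frac{(\alpha)_k (\beta)_k}{(\gamma)_k} (2k + 1) \frac{x^{2k}}{k!} \\
&= \sum_{k \ge 0} \frac{(\alpha)_k (\beta)_k}{(\gamma)_k} \frac{(3/2)_k}{(1/2)_k} \frac{x^{2k}}{k!}. \qquad (A.15)
\end{aligned}$$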
From the definition of hyperbolic sigmoids, $A(x)$ is representable by a GH function with at most three parameters; we must therefore make the identification, $\beta = 1/2$ and $\gamma = 3/2$. From the symmetry properties of the GH function, we need not consider the case when $\alpha = 1/2$, $\gamma = 3/2$. It follows that,
$$y = xF(\alpha, 1/2; 3/2; x^2)$$
$$\frac{dy}{dx} = \frac{d\eta(x)}{dx} = F(\alpha, -; -; x^2) = \frac{1}{(1 - x^2)^{\alpha}}. \qquad (A.16)$$
The parameter $\alpha$ cannot take any arbitrary real value. The behavior of $\eta(x)$ at the end points of its interval requires that,
$$\lim_{x \to \pm 1} \eta(x) \to \pm\infty \;\Rightarrow\; \lim_{x \to \pm 1} A(x) \to \infty. \qquad (A.17)$$
Equations (A.16) and (A.17) taken together imply that $\alpha > 0$. This is a necessary but not sufficient condition. The following two propositions allow us to pin down $\alpha$'s value more precisely.
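PROPOSITION C (Erdélyi et al., 1953). If $\alpha$ and $\beta$ are different from $0, -1, \ldots$ then $F(\alpha, \beta; \gamma; z)$ converges absolutely for $|z| < 1$. For $|z| = 1$:

$F(\alpha, \beta; \gamma; z)$ converges absolutely if $(\alpha + \beta - \gamma) < 0$ (A.18)
$F(\alpha, \beta; \gamma; z)$ converges conditionally if $0 \le (\alpha + \beta - \gamma) < 1$ (A.19)
$F(\alpha, \beta; \gamma; z)$ diverges if $1 \le (\alpha + \beta - \gamma)$. ■ (A.20)

PROPOSITION D (Erdélyi et al., 1953). If $(\gamma - \alpha - \beta) > 0$ then
$$F(\alpha, \beta; \gamma; 1) = \frac{\Gamma(\gamma)\,\Gamma(\gamma - \alpha - \beta)}{\Gamma(\gamma - \alpha)\,\Gamma(\gamma - \beta)}$$
where
$$\Gamma(x) = \int_0^{\infty} \exp(-t)\, t^{x-1}\, dt$$
is Euler's gamma function. ■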
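Proposition D (Gauss's summation formula) can be spot-checked numerically. The Python sketch below is our own illustration: the parameter values are chosen with $\gamma - \alpha - \beta = 2$ so that the series at $z = 1$ converges quickly enough for a plain partial sum. Names and the truncation length are assumptions.

```python
import math


def hyp2f1_at_one(a, b, c, terms=200000):
    """Partial sum of F(a, b; c; 1) = sum_k (a)_k (b)_k / ((c)_k k!)."""
    total = 0.0
    term = 1.0  # the k = 0 term
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1))
    return total


def gauss_closed_form(a, b, c):
    """Right hand side of Proposition D; requires c - a - b > 0."""
    return math.gamma(c) * math.gamma(c - a - b) / (math.gamma(c - a) * math.gamma(c - b))
```

When $\gamma - \alpha - \beta$ is small and positive, as in the case $\beta = 1/2$, $\gamma = 3/2$, $\alpha < 1$ used in the proof, the same identity holds, but the partial sums converge slowly and a direct summation like this one is a poor way to evaluate it.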
If $\alpha < 1$, from Proposition C we see that the series converges absolutely at $z = x^2 = 1$. From Proposition D, this in turn implies that $\eta(x)/x$ will have a finite value at the end point of its domain interval. Therefore, $\alpha \ge 1$. The final form for the three parameter GH representation for $\eta(x)$ is, therefore, $xF(\alpha, 1/2; 3/2; x^2)$ where $\alpha \ge 1$.

Case 2(d): One-parameter GH series: In this case,
$$\eta(x) = xF(\alpha, -; -; x^2) = x \sum_{k \ge 0} (\alpha)_k \frac{x^{2k}}{k!} = x(1 - x^2)^{-\alpha}.$$
The situation is much simpler, since we have to place bounds on the value of one parameter alone. An argument almost identical to the one above allows us to conclude that for $\eta(x)/x$ to satisfy the properties of a hyperbolic sigmoid, it is both necessary and sufficient that we take $\alpha > 0$. ■
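THEOREM 4.2. Let $\sigma : \Re \to (-1, 1)$ be a real analytic, odd, strictly increasing sigmoid, such that its inverse $\eta : (-1, 1) \to \Re$ has a GH series expansion in some injective, odd, increasing $C^1$ function $g(\cdot)$, with at most three parameters, convergent in $(-1, 1)$. Also let $\eta'$ have a GH series expansion in $g(\cdot)$, with at most one parameter. Then, either
$$\eta(y) = g(y)\,F(\alpha, 1/2; 3/2; (g(y))^2) = g(y) \sum_{k \ge 0} \frac{(\alpha)_k}{k!} \frac{(g(y))^{2k}}{2k + 1}, \quad \text{with } \alpha \ge 1, \qquad (A.21)$$
or
$$\eta(y) = g(y)\,F(\alpha, -; -; (g(y))^2) = \frac{g(y)}{(1 - (g(y))^2)^{\alpha}}, \quad \text{with } \alpha > 0 \qquad (A.22)$$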
provided
g’(Y)!5 (1–yq” + “
where g’(.) is thejirst derivative of g(.).
Proof. The proof of Theorem 4.2 is very similar to that for Theorem 4.1. With $\eta(x) = g(x)F(\alpha; 1/2; 3/2; (g(x))^2)$, we have
$$\eta'(x) = \frac{g'(x)}{(1 - (g(x))^2)^{\alpha}} \qquad (A.23)$$
where $g'(x)$ is the first derivative of $g(x)$. Since $g'(x) > 0$ for all $x \in \mathrm{Dom}(g)$, and $\alpha > 0$, it follows that $\eta'(x) > 0$ for all $x \in \mathrm{Dom}(\eta)$, i.e., $\eta(x)$ is a strictly increasing function. The analyticity, continuity and oddness of $\eta(\cdot)$ follow from the respective properties of the GH function. We ensure that $\lim_{x \to 1} \eta(x) \to \infty$, by forcing its derivative $\eta'(x)$ to go to infinity at the end points of its interval. ■
THEOREM 5.1. If the inverse sigmoid is given by $y/(1 - y^2)^{\alpha}$, $\alpha > 0$, then in some neighborhood of the origin, we have the valid expression
where,
~+’=(-l)k(u+l)’((ui’)a) (A.24)
Proof. We will need the Lagrange inversion formula, stated below (Wilf, 1989).

Consider the functional equation: $u = t\phi(u)$. Suppose $f(u)$ and $\phi(u)$ are analytic in some neighborhood of the origin ($u$-plane), with $\phi(0) = 1$. Then there is a neighborhood of the origin (in the $t$-plane) in which the equation $u = t\phi(u)$ has exactly one root for $u$. Let
$$f(u(t)) = \sum_{n} a_n t^n$$
be the Maclaurin expansion of $f(u(t))$ in $t$, and
$$f'(u)[\phi(u)]^n = \sum_{k} c_k u^k$$
be the Maclaurin expansion of the function $f'(u)[\phi(u)]^n$. Then
$$a_n = \frac{1}{n}\, c_{n-1}.$$
Here, $y \equiv u$, $x \equiv t$, and $\phi(u) = (1 - y^2)^{\alpha}$. Take $f(u) = u \equiv y$, and the theorem follows from the Lagrange inversion formula.
THEOREM 5.3. Let
$$\sigma(x) = \sum_{k \ge 0} b_{2k+1} \frac{x^{2k+1}}{(2k+1)!}$$
be an expansion for a hyperbolic sigmoid, with an inverse of the form $yF(\alpha, 1/2; 3/2; y^2)$, valid in some neighborhood of the origin. Then, $b_{2k} = 0$, and $b_{2k+1} = C(2k+1, k)$, where we define the sequence $C(n, k)$ as follows:
$$C(1, 0) = 1,$$
$$C(n, k) = 0 \quad \forall\, k \ge n,\ k < 0,$$
$$C(n+1, k) = (2k - n + 1)\,C(n, k) - 2(n\alpha - k + 1)\,C(n, k-1), \quad n \ge 1. \qquad (A.25)$$
Here, $n$ and $k$ are natural numbers. $D^n(\sigma(x))$, the $n$th derivatives of $\sigma$, are given by:
$$D^n(y) = D^n(\sigma(x)) = \sum_{k=0}^{n-1} C(n, k)\, y^{2k-n+1} (1 - y^2)^{n\alpha - k}. \qquad (A.26)$$
Proof. This theorem was obtained by a process identical to that described in Minai and Williams (1993), on the derivatives of the logistic sigmoid. We therefore restrict ourselves to an outline.

It is given that $y = \eta(x) = xF(\alpha, 1/2; 3/2; x^2)$, and $x = \sigma(y)$. It can be shown that
$$D(x) = \frac{d}{dy}\sigma(y) = 1/\eta'(x) = (1 - x^2)^{\alpha}.$$
Consider the derivatives of the polynomial $f_{k,l}(x) = x^k (1 - x^2)^l$,
$$\begin{aligned}
D(f_{k,l}(x)) &= \frac{d}{dy} f_{k,l}(x) = k x^{k-1} (1 - x^2)^{l+\alpha} - 2l\, x^{k+1} (1 - x^2)^{l+\alpha-1} \\
&= (k)\, f_{k-1,\, l+\alpha}(x) + (-2l)\, f_{k+1,\, l+\alpha-1}(x) \\
&= L(f_{k,l}(x)) + R(f_{k,l}(x)). \qquad (A.27)
\end{aligned}$$
In eqn (A.27) we have split the effect of the operator $D \equiv d/dy$ into the sum of the actions of two operators $L$ and $R$ (Minai and Williams refer to them by other names). With respect to the polynomials $f_{k,l}$, these operators are defined by:
$$L(A f_{k,l}(x)) = kA\, f_{k-1,\, l+\alpha}(x) \qquad (A.28)$$
$$R(A f_{k,l}(x)) = -2lA\, f_{k+1,\, l+\alpha-1}(x) \qquad (A.29)$$
where $A$ is a constant. The main advantage of introducing these operators is that they give a systematic way of visualizing the production of $D^{n+1}(x)$ from $D^n(x)$; $L$ and $R$ may be thought of as being applied to a binary tree of expressions, where each node is some polynomial $f_{k,l}(x)$, and the root is the polynomial $f_{0,\alpha} = (1 - x^2)^{\alpha}$. The action of $L$ on each node of this tree is to produce a left child, given by eqn (A.28), and that of $R$ is to produce a right child, given by eqn (A.29); $L$ acting upon $f_{k,l}(x)$ does three things: multiplies it by $k$ (= the degree of $x$), reduces the degree of $x$ by 1, increases the degree of $(1 - x^2)$ by $\alpha$. On the other hand, $R$ increases the degree of $x$ by 1, that of $(1 - x^2)$ by $(\alpha - 1)$, and multiplies the operand by $-2l$, where $l$ is the degree of $(1 - x^2)$. Figure 3 depicts the process for the first four levels. By a detailed study of this “derivative” tree, the following observations may be proved:

1. The $n$th level of the tree corresponds to the $n$th derivative of $\sigma(y)$, $D^n(x) = D^n(\sigma(y)) = L(D^{n-1}(x)) + R(D^{n-1}(x))$ (the root of the tree is designated $n = 1$, and $D^0(f_{k,l}(x)) = f_{k,l}(x)$).
2. At the $n$th level, the tree has $n$ nodes, and the $k$th node ($k$ runs from 0 through $n - 1$), is a polynomial in $x$, given by $C(n, k) f_{2k-n+1,\, n\alpha-k} = C(n, k)\, x^{2k-n+1} (1 - x^2)^{n\alpha-k}$, where $C(n, k)$ is a constant. It can be seen that the $n$th derivatives of $\sigma$ satisfy:
$$D^n(x) = \sum_{k=0}^{n-1} C(n, k)\, x^{2k-n+1} (1 - x^2)^{n\alpha-k}.$$
3. There are two sources contributing to the value of $C(n, k)$. One is the action of $R$ on the $(k - 1)$th term, and the other is that of $L$ on the $k$th term on the $(n - 1)$th level.

Induction arguments in conjunction with the above arguments then give:
$$C(1, 0) = 1,$$
$$C(n, k) = 0 \quad \forall\, k \ge n,\ k < 0,$$
$$C(n+1, k) = (2k - n + 1)\,C(n, k) - 2(n\alpha - k + 1)\,C(n, k-1), \quad n \ge 1. \qquad (A.30)$$
Now, all terms in $D^n(x)$ containing an $x$ term having positive degree will vanish, when evaluated at $x = 0$. For even $n$, all the nodes have an $x$ term with an odd degree, and hence $D^n(x)$ vanishes identically at $x = 0$. For odd $n$, all terms, except the term corresponding to $k = (n - 1)/2$ (the one whose power of $x$ is zero), vanish at $x = 0$. Since $b_n = D^n(x)|_{x=0}$, it follows that $b_{2k} = 0$ and $b_{2k+1} = C(2k+1, k)$.

FIGURE 3. Binary “derivation” tree for hyperbolic sigmoids. Nodes by level:
n = 1: $C(1,0) f_{0,\alpha}(x)$
n = 2: $C(2,0) f_{-1,2\alpha}(x)$, $C(2,1) f_{1,2\alpha-1}(x)$
n = 3: $C(3,0) f_{-2,3\alpha}(x)$, $C(3,1) f_{0,3\alpha-1}(x)$, $C(3,2) f_{2,3\alpha-2}(x)$
n = 4: $C(4,0) f_{-3,4\alpha}(x)$, $C(4,1) f_{-1,4\alpha-1}(x)$, $C(4,2) f_{1,4\alpha-2}(x)$, $C(4,3) f_{3,4\alpha-3}(x)$