researcharticle mexican hat wavelet kernel elm for

Research ArticleMexican Hat Wavelet Kernel ELM for Multiclass Classification

Jie Wang Yi-Fan Song and Tian-Lei Ma

School of Electrical Engineering Zhengzhou University Zhengzhou China

Correspondence should be addressed to Yi-Fan Song 574839023qqcom

Received 25 November 2016 Revised 23 January 2017 Accepted 24 January 2017 Published 21 February 2017

Academic Editor Jose David Martın-Guerrero

Copyright copy 2017 Jie Wang et alThis is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Kernel extreme learning machine (KELM) is a novel feedforward neural network which is widely used in classification problemsTo some extent it solves the existing problems of the invalid nodes and the large computational complexity in ELM However thetraditional KELM classifier usually has a low test accuracy when it faces multiclass classification problems In order to solve theabove problem a new classifier MexicanHat wavelet KELM classifier is proposed in this paperThe proposed classifier successfullyimproves the training accuracy and reduces the training time in the multiclass classification problems Moreover the validity of theMexican Hat wavelet as a kernel function of ELM is rigorously proved Experimental results on different data sets show that theperformance of the proposed classifier is significantly superior to the compared classifiers

1 Introduction

Extreme learning machine which was proposed by Huang etal [1] in 2004 is a model of single-hidden layer feedforwardneural network In this model input weights and hiddenlayer biases are initialized randomly and output weights areobtained by using the Moore-Penrose generalized inverse ofthe hidden layer output matrix Compared with the conven-tional BP neural networks ELM has faster learning speedhigher testing accuracy and lower computational complexityTherefore ELM is widely used in sales forecasting [2] imagequality assessment [3] power loss analysis [4] and so on In2006Huang et al [5] proposed incremental extreme learningmachine (I-ELM) which continuously increased the numberof hidden layer nodes to improve the training accuracySubsequently Li [6] combined I-ELM with the convex opti-mization learning method and proposed ECI-ELM in 2014which reduced the training time of I-ELMThis improvementovercame the weakness of randomly selecting weights inI-ELM and eventually improved the training accuracy Atthe same time Wang and Zhang [7] introduced the Gram-Schmidt orthogonalizationmethod into I-ELMand saved thetraining time of I-ELM to a large degree But in general I-ELM and its varieties only improve the training accuracyTheir numbers of hidden layer nodes are very likely to exceedthe number of samples Thus I-ELM greatly improves thetraining time In another perspective in order to achieve

a higher training accuracy Rong et al [8] used statisticalmethods to measure the relevance of hidden nodes of ELMand proposed P-ELM in 2008 Then in 2010 Miche et al[9] proposed OP-ELM which is an improvement of P-ELMIn addition Akusok et al [10] proposed a high-performanceELM model in 2015 which provides a solid ground fortackling numerous Big Data challenges However none ofthese methods has changed the characteristic of the randomselection of input weights In addition the linear weightedmapping method in original ELM is not replaced at all

Therefore both ELM and its varieties have someinevitable problems A Because of the random selection ofinput weights some hidden nodes may be given an inputweight that is very close to 0 which are commonly called deadnodes This phenomenon leads to the minimal effect of thesenodes and eventually affects the output accuracy B Withthe increment of the number of samples the hidden nodesnumber also becomes large Thus some high dimensionaldot product operations will appear in the training processEventually that will cause the increase of computationalcomplexity and training time This problem is commonlycalled dimension explosion C For nonlinear samples thelinear weighted mapping method often has inevitable errorwhich leads to the reduction of the training accuracy

In order to solve the above problems Huang et al [11]proposed the kernel extreme learning machine (KELM) in2012 which utilized the kernel function to replace the linear

HindawiComputational Intelligence and NeuroscienceVolume 2017 Article ID 7479140 8 pageshttpsdoiorg10115520177479140

2 Computational Intelligence and Neuroscience

weighted mapping method Initially the kernel function theyselected is a Gauss function Although [11] solves the problemof dead nodes and dimension explosion in a sense the perfor-mance of the traditional kernel function for multiclass classi-fication problems is still not very good From [12 13] we knowthat wavelet functions can be used in SVM and ELM whichhave a strong fitting capability Therefore in this paper wepropose a Mexican Hat wavelet kernel ELM (MHW-KELM)classifier which effectively solves the problems in the con-ventional classifier Compared with the traditional KELMthe MHW-KELM classifier achieves better results on dealingwith the multiclass classification problems Because of thatthe new kernel function improves the training accuracy

The basic principle of ELM and some theorems are shownin Section 2 of this paper In Section 3 the Mexican Hatwavelet kernel ELM is proposed and its validity is alsoproved Performance evaluation is presented in Section 4Conclusion is given in Section 5

2 Preliminary Work

21 ELM Model Let us suppose that there are arbitrarydistinct samples (119909119894 119905119894) | 119909119894 isin 119877119863 119905119894 isin 119877119872 119894 = 1 2 119873If the number of the hidden nodes is 119871 and the activationfunction is 119892(119909) then we can randomly select the initial valueof the inputweights119882 and the hidden biases 119887 So the hiddenlayer output function of ELM can be obtained It is shown as

119867 = [[[[[

119892 (1199081198791 1199091 + 1198871)119892 (1199081198791 119909119873 + 1198871)

sdot sdot sdotdsdot sdot sdot

119892 (1199081198791198711199091 + 119887119871)119892 (119908119879119871119909119873 + 119887119871)

]]]]] (1)

where 119908119894 isin 119877119863 119887119894 isin 119877 119894 = 1 2 119871If the output weights are 120573 according to the proof given

by Huang et al [1] the norm of 120573 is smaller and thegeneralization performance of ELM is better Therefore theoutput weights 120573 can be obtained by finding the least squaresolution of the problem

Minimize 119871119875 = 12 100381710038171003817100381712057310038171003817100381710038172 + 1198622119873sumi=1

100381710038171003817100381712058511989410038171003817100381710038172Subject to ℎ (119909119894) 120573 = 119905119879119894 minus 120585119879119894 119894 = 1 2 119873

(2)

where ℎ(119909119894) is the 119894th output vector of hidden layer 119905119894 is the119894th label vector and 120585119894 is the error between the 119894th networkoutput vector and the label vector

According to KKT theory the above problem can betransformed into a Lagrange function

119871119863 = 12 100381710038171003817100381712057310038171003817100381710038172 + 1198622119873sum119894=1

100381710038171003817100381712058511989410038171003817100381710038172

minus 119873sum119894=1

119872sum119895=1

120572119894119895 (ℎ (119909119894) 120573119895 minus 119905119894119895 + 120585119894119895) (3)

where each of the Lagrange multipliers 120572119894 corresponds to asample 119909119894 By calculating the partial derivative of (3) we canget the following set of equations120597119871119863120597120573119895 = 0 997888rarr

120573119895 = 119873sum119894=1

120572119894ℎ (119909119894)119879 = 119867119879120572(4a)

120597119871119863120597120585119894 = 0 997888rarr120572119894 = 119862120585119894

119894 = 1 2 119873(4b)

120597119871119863120597120572119894 = 0 997888rarrℎ (119909119894) 120573 minus 119905119879119894 + 120585119879119894 = 0

119894 = 1 2 119873(4c)

where 120572 = [1205721 120572119873]119879 And the least square solution of120573 can be obtained by calculating the three equations in (4a)(4b) and (4c) The solution is

and120573= 119867119879 ( 119868119862 + 119867119867119879)minus1 119879 (5)

and the output function of ELM is

119891 (119909) = ℎ (119909)119867119879 ( 119868119862 + 119867119867119879)minus1 119879 (6)

22 Translation-Invariant Kernel Theorem Kernel functionmethod is often used in SVM as a method of replacing dotproduct According to the Mercer theorem (see [14]) byintroducing the kernel function119870(119909119894 119909119895) we can replace thecalculation of dot product in ELM In order to reduce thecomputational complexity of high dimensional dot productit is necessary to ensure that 119870(119909119894 119909119895) is only a mappingmethod of the relative position of two input samples (see (7))

119870(119909119894 119909119895) = 119870 (119909119894 minus 119909119895) (7)

The kernel functions which satisfy (7) are called thetranslation-invariant kernel function In fact it is difficultto prove that a translation-invariant kernel function satisfiestheMercer theorem Fortunately for the translation-invariantkernel function the following theorem provides a necessaryand sufficient condition to make it become an admissiblesupport vector kernel

Theorem 1 (translation-invariant kernel theorem see [1516]) A translation-invariant kernel 119870(119909119894 119909119895) = 119870(119909119894 minus 119909119895)is an admissible support vector kernel if and only if the Fouriertransform

119865 [119870] (120596) = (2120587)minus1198632 int119877119863

exp (minus119895120596119909)119870 (119909) 119889119909 (8)

is nonnegative

Computational Intelligence and Neuroscience 3

The kernel function selection method of ELM is the sameas SVM Therefore the above theorem can also be used todetermine whether a function is an admissible ELM kernelThe commonly used translation-invariant kernel functionsare Gauss kernel function and polynomial kernel functionIn these two functions Gauss kernel function is a kind oftranslation-invariant kernel function And the expression ofthe two kernel functions can be given as

119866119886119906119904119904 119870(119909119894 119909119895) = exp(minus10038171003817100381710038171003817119909119894 minus 11990911989510038171003817100381710038171003817221205902 )119875119900119897119910 119870(119909119894 119909119895) = (1 + 119909119894119909119895)119889

(9)

In (9) 120590 is a Gauss core width and 119889 is an adjustablepolynomial power exponent

3 Mexican Hat Wavelet Kernel ELM

31 Kernel ELM In original ELMmodel the linear weightedhidden output function ℎ(119909) is usually not satisfied with themapping method of the nonlinear samples In order to solvethis problem we can replace ℎ(119909)119867119879 and119867119867119879 in (6) with akernel function119870(119906 V) And the result is

119891 (119909) = [[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879 (10)

whereΩELM is the kernel function matrix of119883 (see (11))

ΩELM = [119870 (119909119894 119909119895)] 119894 = 1 2 119873 119895 = 1 2 119873 (11)

32 Mexican Hat Wavelet Kernel Function In this part theMexican Hat wavelet kernel function is proposed It is alsoproved that Mexican Hat wavelet function is an admissibleELM kernel

Theorem 2 (see [12]) Let120595(119909) be a mother wavelet Let 119886 and119888 denote the dilation and translation respectively and 119909 119886 119888 isin119877 If 119909119894 119909119895 isin 119877119863 then the dot product wavelet kernel is

119870(119909119894 119909119895) = 119863prod119889=1

120595(119909119889119894 minus 119888119886 )120595(119909119889119895 minus 119888119886 ) (12)

If it satisfies the translation-invariant kernel theoremthe following translation-invariant kernel function can beobtained

119870(119909119894 119909119895) = 119863prod119889=1

120595(119909119889119894 minus 119909119889119895119886 ) (13)

The proof ofTheorem 2 is given in [12] we will not repeatit in this paper We use Mexican Hat wavelet as the motherwavelet (see (14)) Then the Mexican Hat wavelet kernelfunction is derived (see (15)) In this paper it is also provedthat Mexican Hat wavelet satisfies the translation-invariantkernel theorem In other words it is also an admissible ELMkernel

120595 (119909) = (1 minus 1199092) exp(minus11990922 ) (14)

119870(119909119894 119909119895)= 119863prod119889=1

[1 minus (119909119889119894 minus 119909119889119895119886 )2] exp[minus12 (119909119889119894 minus 119909119889119895119886 )2] (15)

Lemma 3 As a kind of translation-invariant kernel functionMexican Hat wavelet is an admissible ELM kernel

Proof Firstly it should be proved that the Fourier transformof Mexican Hat wavelet is nonnegative (see (16))

119865 (120596) = (2120587)1198632 119863prod119889=1

int+infinminusinfin

exp (minus119895120596119889119909119889) (1 minus 11990921198891198862 )sdot exp(minus 119909211988921198862)119889119909119889 ge 0

(16)

Equation (17) can be decomposed into a set of integralinequalities (see (19)) And the derivation process is

119865 (120596) = (2120587)1198632 119886119863 119863prod119889=1

int+infinminusinfin

exp [minus119895120596119886 (119909119889119886 ) minus 12 (119909119889119886 )2] minus (119909119889119886 )2 exp [minus119895120596119886 (119909119889119886 ) minus 12 (119909119889119886 )2]119889(119909119889119886 )= (2120587)1198632 119886119863 119863prod

119889=1

exp(minus1211988621205962119889)int+infin

minusinfinexp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889 minus int+infin

minusinfin1199092119889exp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889

(17)

The integral term in (17) can be written as

119868 = 119863prod119889=1

(119865(119889)1 (120596) minus 119865(119889)2 (120596)) (18)

where 119868 is the integral term in (17)

119865(119889)1 (120596) = int+infinminusinfin

exp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889 (19)


Table 1 Basic features of 12 data sets

Data set Training number Testing number Attribute CategoryAbalone 2000 2177 8 3Auto MPG 200 198 7 5Bank 2000 2521 16 2Car Evaluation 1000 728 6 4Wine 100 78 13 3Wine Quality 2000 4497 11 7Iris 100 50 4 3Glass 100 114 9 2Image 100 110 19 7Yeast 1000 484 8 4Zoo 50 51 16 7Letter 2000 18000 16 26


1199092119889 exp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889 (20)

According to the translation invariance of the integral itis easy to get (21) by using the partial integrationmethodTheanswer is

119865(119889)1 (120596) = (2120587)12 119865(119889)2 (120596) = (2120587)12 (1 minus 11988621205962) (21)

Substituting (21) into (18) we have

119863prod119889minus1

(119865(119889)1 (120596) minus 119865(119889)2 (120596)) = (2120587)1198632 11988621198631205962119863 (22)

Then substituting (22) into (17) we can obtain theFourier transform

119865 (120596) = (2120587)119863 1198863119863 exp(minus11988622119863sum119889=1

1205962119889) 119863prod119889=1

1205962119889 (23)

From (23) it is known that if 119886 ge 0 119865(120596) ge 0 Thereforeaccording to the translation-invariant kernel theorem Mexi-can Hat wavelet is an admissible ELM kernel

33 MHW-KELM Classifier We have already proved thatMexican Hat wavelet is an admissible ELM kernel So we cansubstitute (15) into (10) and constructMHW-KELMclassifierFor a binary classification problem the output function of thenew classifier is

119891 (119909) = sgn[[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879

(24)

Besides this classifier can also be used for the multiclassclassification problems And the output function is

119891 (119909)

= argmax[[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879

(25)

Equation (25) means the classification result is expressedby the index value of the maximum value in output vector Inaddition we can combine the nonnegative constant param-eter 119886 of Mexican Hat wavelet and the penalty factor 119862 intoan individual and use some evolutionary algorithms such asPSO [17 18] to find the best values of these parameters Nextwe will analyze the performance of the proposed classifier

4 Performance Evaluation

This section will analyze the performance of MHW-KELMand compare it with the traditional Gauss-KELM Poly-KELM original ELM and BP classifier All these algorithmsrun on the R2014a MATLAB software The operating envi-ronment is Core-i7 26GHz CPU 8G RAM We choosescaled conjugate gradient algorithm to optimize BP neuralnetwork which is faster than normal BP neural network Inorder to get excellent performance the number of hiddennodes of original ELM and BP is selected as 100 and 30of training samples respectively The data sets used in theexperiment are from theUCI database [19]They areAbaloneAutoMPG Bank EvaluationWineWineQuality Iris GlassImage Yeast Zoo and Letter respectively The basic featuresof these 12 data sets are shown in Table 1

Then we use the 12 data sets given in Table 1 to test therunning time and training accuracy of 5 algorithms Eachdata set will be tested by each algorithm 100 times For eachtime the training sample will be randomly selected fromthe total sample In order to conduct a rigorous comparisonpaired Studentrsquos test is performed which gives the probability


Table 2 Performance comparison with statistical test on Abalone

Data set (training number category) MHW-KELM Gauss-KELM Poly-KELM Original ELM SCG-BP

Abalone (2000 3)

Mean 7942 7926 7724 6253 6420Std plusmn036 plusmn044 plusmn092 plusmn289 plusmn313119901 value 100 044 735e minus 04 212e minus 07 347e minus 07Time 0665 0673 0968 3357 7835

Table 3 Performance comparison with statistical test on Auto MPG


Auto MPG (200 5)

Mean 8224 7355 7901 5671 lt10Std plusmn051 plusmn122 plusmn090 plusmn324119901 value 100 115e minus 06 735e minus 04 516e minus 05 0Time 0070 0071 0075 0058 1235

Table 4 Performance comparison with statistical test on Bank


Bank (2000 2)

Mean 8971 8989 8650 6585 8799Std plusmn043 plusmn036 plusmn064 plusmn143 plusmn125119901 value 021 100 447e minus 06 109e minus 09 003Time 0657 0652 08917 3227 7611

that two sets come from distributions with an equal meanTables 2ndash13 record the results of these experiments and eachtable corresponds to a data set All tables have four elementswhich represent mean accuracy standard deviation 119901 valueobtained by paired Studentrsquos test and the running timerespectively For each data set the data with bold type meansthis is the best accuracy or the best running time (119901 value= 100) while the data with italic type means there is nostatistical difference between this one and the best accuracyor it is very close to the best time (119901 value ge 005)

By drawing the running time in all tables to a line graphwe can get Figure 1 In Figure 1 the horizontal coordinatecorresponds to the number of training samples 50 100200 1000 and 2000 respectively Without loss of generalitywe can select five data sets Zoo Image Auto MPG CarEvaluation and Abalone as the representations of differentnumbers of samples The vertical coordinate shows the meanrunning time of each data set Moreover the running timesofMHW-KELM andGauss-KELM are very close So we onlydraw the running time ofMHW-KELM Four lines are drawnwith different styles

From all tables and Figure 1 it is clear to see that whenthe training number is larger than 1000 compared to otheralgorithms MHW-KELM shows an obvious advantage inrunning time For the data sets whose training number ismore than 1000 such asAbalone Bank Car EvaluationWineQuality Yeast and Letter we can obtain that the runningtime of MHW-KELM and Gauss-KELM is less than thatof other algorithms That means translation-invariant kernelis superior to other kernels Therefore it can be concludedthat the choice of translation-invariant kernel function caneffectively shorten the running time when the training size islarge enough

50 100 200 1000 20000

1

2

3

4

5

6

7

8

Runn

ing

time (

s)

Number of training samples

MHWPoly

ELMBP

Figure 1 Comparison of running time of 4 algorithms

From Tables 2ndash13 it can be obviously seen that the clas-sification performance of MHW-KELM is better than otheralgorithmswhen the number of categories ismore than 4Theresults of paired Studentrsquos test show that the performance ofMHW-KELM is significantly different (119901 value le 005) fromthat of the original ELM and SCG-BP on all data sets and itis also different from Gauss-KELM and Poly-KELM on AutoMPG Car Evaluation Wine Quality and Image These fourdata sets have one thing in common which is the fact thatthe category numbers of these data sets are all more than 4


Table 5 Performance comparison with statistical test on Car Evaluation


Car Evaluation (1000 4)


Table 6 Performance comparison with statistical test on Wine


Wine (100 3)

Mean 9982 8363 9928 5010 3687Std plusmn009 plusmn081 plusmn013 plusmn293 plusmn128119901 value 100 552e minus 07 067 414e minus 09 805e minus 10Time 0070 0072 0058 0058 1088

Table 7 Performance comparison with statistical test on Wine Quality


Wine Quality (2000 7)


Table 8 Performance comparison with statistical test on Iris


Iris (100 3)

Mean 9920 9932 9885 6134 3541Std plusmn016 plusmn011 plusmn012 plusmn078 plusmn033119901 value 062 100 001 459e minus 05 621e minus 08Time 0071 0075 0062 0055 1290

Table 9 Performance comparison with statistical test on Glass


Glass (100 2)


Table 10 Performance comparison with statistical test on Image


Image (100 7)

Mean 9387 8512 8758 3556 1613Std plusmn154 plusmn078 plusmn046 plusmn194 plusmn325119901 value 100 664e minus 06 178e minus 06 891e minus 09 245e minus 11Time 0075 0072 0061 0056 1193

Table 11 Performance comparison with statistical test on Yeast


Yeast (1000 4)



Table 12 Performance comparison with statistical test on Zoo


Zoo (50 7)


Table 13 Performance comparison with statistical test on Letter


Letter (2000 26)

Mean 7262 8680 6879 1551 lt10Std plusmn1326 plusmn432 plusmn388 plusmn548119901 value 011 100 001 243e minus 03 0Time 1668 1833 2132 4559 7270

Besides when the category number is less than 4 such asAbalone Bank Wine Iris Yeast and Letter MHW-KELMstill has a similar performance to Gauss-KELM or Poly-KELM Therefore MHW-KELM is an excellent classifierin multiclass classification problems which is better thantraditional kernel ELMThat means the Mexican Hat waveletfunction is a better ELM kernel than the Gaussian function

5 Conclusion

In this paper we propose a classifier the Mexican Hatwavelet kernel ELM classifier which can be applied to themulticlass classification problem Besides its validity as anadmissible ELMkernel is also provedThis classifier solves theinevitable problems in original ELM by replacing the linearweighted mapping method with Mexican Hat wavelet Theexperimental results show that the training time of MHW-KELMclassifier ismuch less than that of original ELMwhichsolves the problem of the dimension explosion in originalELM Meanwhile the training accuracy of this classifier issuperior to the traditional Gauss-KELM and original ELM indealing with the multiclass classification problems

In future work in order to reduce the impact of inequalityof the training data on the performance we plan to utilize theboosting weighted ELMproposed by Li et al [20] to optimizethe proposed classifier In addition from the experimentalresults of this paper it can be seen that a single kernel functioncannot meet the requirements of all data sets So we areprepared to combine multiple kernel functions to constructmixed kernel ELM in order to suit different situations

Competing Interests

The authors declare that there are no competing interestsregarding the publication of this paper

Acknowledgments

The authors gratefully acknowledge the support of the fol-lowing foundations 973 Project of China (2013CB733605)

the National Natural Science Foundation of China (21176073and 61603343) and the Fundamental Research Funds for theCentral Universities

References

[1] G B Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine a new learning scheme of feedforward neural net-worksrdquo in Proceedings of the International Joint Conference onNeural Networks vol 2 no 2 pp 985ndash990 Budapest Hungary2004

[2] Z-L Sun T-M Choi K-F Au and Y Yu ldquoSales forecastingusing extreme learning machine with applications in fashionretailingrdquo Decision Support Systems vol 46 no 1 pp 411ndash4192008

[3] S Suresh R Venkatesh Babu and H J Kim ldquoNo-referenceimage quality assessment using modified extreme learningmachine classifierrdquo Applied Soft Computing Journal vol 9 no2 pp 541ndash552 2009

[4] A H Nizar Z Y Dong and YWang ldquoPower utility nontechni-cal loss analysis with extreme learning machine methodrdquo IEEETransactions on Power Systems vol 23 no 3 pp 946ndash955 2008

[5] G-B Huang L Chen and C-K Siew ldquoUniversal approxima-tion using incremental constructive feedforward networks withrandom hidden nodesrdquo IEEE Transactions on Neural Networksvol 17 no 4 pp 879ndash892 2006

[6] Y Li ldquoOrthogonal incremental extreme learning machine forregression and multiclass classificationrdquo Neural Computing ampApplications vol 27 no 1 pp 111ndash120 2016

[7] WWang andR Zhang ldquoImproved convex incremental extremelearning machine based on enhanced random searchrdquo inUnifying Electrical Engineering and Electronics Engineering vol238 of Lecture Notes in Electrical Engineering pp 2033ndash2040Springer New York NY USA 2014

[8] H-J Rong Y-S Ong A-H Tan and Z Zhu ldquoA fast pruned-extreme learning machine for classification problemrdquo Neuro-computing vol 72 no 1ndash3 pp 359ndash366 2008

[9] Y Miche A Sorjamaa P Bas O Simula C Jutten andA Lendasse ldquoOP-ELM optimally pruned extreme learningmachinerdquo IEEE Transactions on Neural Networks vol 21 no 1pp 158ndash162 2010


[10] A Akusok K-M Bjork Y Miche and A Lendasse ldquoHigh-performance extreme learning machines a complete toolboxfor big data applicationsrdquo IEEEAccess vol 3 pp 1011ndash1025 2015

[11] G-B Huang H Zhou X Ding and R Zhang ldquoExtremelearning machine for regression and multiclass classificationrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 42 no 2 pp 513ndash529 2012

[12] L Zhang W Zhou and L Jiao ldquoWavelet support vectormachinerdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 34 no 1 pp 34ndash39 2004

[13] WQin S Yuantong K YuWQiang and S Lin ldquoParsimoniouswavelet kernel extreme learning machinerdquo Journal of Engineer-ing Science and Technology Review vol 8 no 5 pp 219ndash2262015

[14] J Mercer ldquoFunctions of positive and negative type and theirconnection with the theory of integral equationsrdquo PhilosophicalTransactions of the Royal Society A Mathematical Physical andEngineering Sciences vol 209 no 441-458 pp 415ndash446 1909

[15] A J Smola B Scholkopf and K-R Muller ldquoThe connectionbetween regularization operators and support vector kernelsrdquoNeural Networks vol 11 no 4 pp 637ndash649 1998

[16] C J C Burges ldquoGeometry and invariance in kernel basedmethodsrdquo in Advances in Kernel Methods MIT Press 1999

[17] J Kennedy and R Eberhart ldquoParticle swarm optimizationrdquoin Proceedings of the IEEE International Conference on NeuralNetworks pp 1942ndash1948 December 1995

[18] R Eberhart and J Kennedy ldquoA new optimizer using particleswarm theoryrdquo in Proceedings of the 6th International Sympo-sium on Micro Machine and Human Science (MHS rsquo95) pp 39ndash43 Nagoya Japan October 1995

[19] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[20] K Li X Kong Z Lu LWenyin and J Yin ldquoBoosting weightedELM for imbalanced learningrdquo Neurocomputing vol 128 pp15ndash21 2014

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Distributed Sensor Networks


Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014


ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014


Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications


Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia


Biomedical Imaging


ArtificialNeural Systems

Advances in


RoboticsJournal of



Computational Intelligence and Neuroscience

Industrial EngineeringJournal of


Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in



weighted mapping method Initially the kernel function theyselected is a Gauss function Although [11] solves the problemof dead nodes and dimension explosion in a sense the perfor-mance of the traditional kernel function for multiclass classi-fication problems is still not very good From [12 13] we knowthat wavelet functions can be used in SVM and ELM whichhave a strong fitting capability Therefore in this paper wepropose a Mexican Hat wavelet kernel ELM (MHW-KELM)classifier which effectively solves the problems in the con-ventional classifier Compared with the traditional KELMthe MHW-KELM classifier achieves better results on dealingwith the multiclass classification problems Because of thatthe new kernel function improves the training accuracy

The basic principle of ELM and some theorems are shownin Section 2 of this paper In Section 3 the Mexican Hatwavelet kernel ELM is proposed and its validity is alsoproved Performance evaluation is presented in Section 4Conclusion is given in Section 5

2 Preliminary Work

21 ELM Model Let us suppose that there are arbitrarydistinct samples (119909119894 119905119894) | 119909119894 isin 119877119863 119905119894 isin 119877119872 119894 = 1 2 119873If the number of the hidden nodes is 119871 and the activationfunction is 119892(119909) then we can randomly select the initial valueof the inputweights119882 and the hidden biases 119887 So the hiddenlayer output function of ELM can be obtained It is shown as

119867 = [[[[[

119892 (1199081198791 1199091 + 1198871)119892 (1199081198791 119909119873 + 1198871)

sdot sdot sdotdsdot sdot sdot

119892 (1199081198791198711199091 + 119887119871)119892 (119908119879119871119909119873 + 119887119871)

]]]]] (1)

where 119908119894 isin 119877119863 119887119894 isin 119877 119894 = 1 2 119871If the output weights are 120573 according to the proof given

by Huang et al [1] the norm of 120573 is smaller and thegeneralization performance of ELM is better Therefore theoutput weights 120573 can be obtained by finding the least squaresolution of the problem

Minimize 119871119875 = 12 100381710038171003817100381712057310038171003817100381710038172 + 1198622119873sumi=1

100381710038171003817100381712058511989410038171003817100381710038172Subject to ℎ (119909119894) 120573 = 119905119879119894 minus 120585119879119894 119894 = 1 2 119873

(2)

where ℎ(119909119894) is the 119894th output vector of hidden layer 119905119894 is the119894th label vector and 120585119894 is the error between the 119894th networkoutput vector and the label vector

According to KKT theory the above problem can betransformed into a Lagrange function

119871119863 = 12 100381710038171003817100381712057310038171003817100381710038172 + 1198622119873sum119894=1

100381710038171003817100381712058511989410038171003817100381710038172

minus 119873sum119894=1

119872sum119895=1

120572119894119895 (ℎ (119909119894) 120573119895 minus 119905119894119895 + 120585119894119895) (3)

where each of the Lagrange multipliers 120572119894 corresponds to asample 119909119894 By calculating the partial derivative of (3) we canget the following set of equations120597119871119863120597120573119895 = 0 997888rarr

120573119895 = 119873sum119894=1

120572119894ℎ (119909119894)119879 = 119867119879120572(4a)

120597119871119863120597120585119894 = 0 997888rarr120572119894 = 119862120585119894

119894 = 1 2 119873(4b)

120597119871119863120597120572119894 = 0 997888rarrℎ (119909119894) 120573 minus 119905119879119894 + 120585119879119894 = 0

119894 = 1 2 119873(4c)

where 120572 = [1205721 120572119873]119879 And the least square solution of120573 can be obtained by calculating the three equations in (4a)(4b) and (4c) The solution is

and120573= 119867119879 ( 119868119862 + 119867119867119879)minus1 119879 (5)

and the output function of ELM is

119891 (119909) = ℎ (119909)119867119879 ( 119868119862 + 119867119867119879)minus1 119879 (6)

22 Translation-Invariant Kernel Theorem Kernel functionmethod is often used in SVM as a method of replacing dotproduct According to the Mercer theorem (see [14]) byintroducing the kernel function119870(119909119894 119909119895) we can replace thecalculation of dot product in ELM In order to reduce thecomputational complexity of high dimensional dot productit is necessary to ensure that 119870(119909119894 119909119895) is only a mappingmethod of the relative position of two input samples (see (7))

119870(119909119894 119909119895) = 119870 (119909119894 minus 119909119895) (7)

The kernel functions which satisfy (7) are called thetranslation-invariant kernel function In fact it is difficultto prove that a translation-invariant kernel function satisfiestheMercer theorem Fortunately for the translation-invariantkernel function the following theorem provides a necessaryand sufficient condition to make it become an admissiblesupport vector kernel

Theorem 1 (translation-invariant kernel theorem see [1516]) A translation-invariant kernel 119870(119909119894 119909119895) = 119870(119909119894 minus 119909119895)is an admissible support vector kernel if and only if the Fouriertransform

119865 [119870] (120596) = (2120587)minus1198632 int119877119863

exp (minus119895120596119909)119870 (119909) 119889119909 (8)

is nonnegative



119866119886119906119904119904 119870(119909119894 119909119895) = exp(minus10038171003817100381710038171003817119909119894 minus 11990911989510038171003817100381710038171003817221205902 )119875119900119897119910 119870(119909119894 119909119895) = (1 + 119909119894119909119895)119889

(9)




119891 (119909) = [[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879 (10)


ΩELM = [119870 (119909119894 119909119895)] 119894 = 1 2 119873 119895 = 1 2 119873 (11)



119870(119909119894 119909119895) = 119863prod119889=1

120595(119909119889119894 minus 119888119886 )120595(119909119889119895 minus 119888119886 ) (12)


119870(119909119894 119909119895) = 119863prod119889=1

120595(119909119889119894 minus 119909119889119895119886 ) (13)


120595 (119909) = (1 minus 1199092) exp(minus11990922 ) (14)

119870(119909119894 119909119895)= 119863prod119889=1




119865 (120596) = (2120587)1198632 119863prod119889=1

int+infinminusinfin


(16)


119865 (120596) = (2120587)1198632 119886119863 119863prod119889=1

int+infinminusinfin


119889=1




(17)


119868 = 119863prod119889=1

(119865(119889)1 (120596) minus 119865(119889)2 (120596)) (18)



exp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889 (19)





1199092119889 exp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889 (20)


119865(119889)1 (120596) = (2120587)12 119865(119889)2 (120596) = (2120587)12 (1 minus 11988621205962) (21)



(119865(119889)1 (120596) minus 119865(119889)2 (120596)) = (2120587)1198632 11988621198631205962119863 (22)


119865 (120596) = (2120587)119863 1198863119863 exp(minus11988622119863sum119889=1

1205962119889) 119863prod119889=1

1205962119889 (23)



119891 (119909) = sgn[[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879

(24)


119891 (119909)

= argmax[[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879

(25)








Abalone (2000 3)




Auto MPG (200 5)




Bank (2000 2)





50 100 200 1000 20000

1

2

3

4

5

6

7

8

Runn

ing

time (

s)


MHWPoly

ELMBP










Wine (100 3)








Iris (100 3)




Glass (100 2)




Image (100 7)




Yeast (1000 4)





Zoo (50 7)




Letter (2000 26)



5 Conclusion



Competing Interests


Acknowledgments



References





























Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in





119866119886119906119904119904 119870(119909119894 119909119895) = exp(minus10038171003817100381710038171003817119909119894 minus 11990911989510038171003817100381710038171003817221205902 )119875119900119897119910 119870(119909119894 119909119895) = (1 + 119909119894119909119895)119889

(9)




119891 (119909) = [[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879 (10)


ΩELM = [119870 (119909119894 119909119895)] 119894 = 1 2 119873 119895 = 1 2 119873 (11)



119870(119909119894 119909119895) = 119863prod119889=1

120595(119909119889119894 minus 119888119886 )120595(119909119889119895 minus 119888119886 ) (12)


119870(119909119894 119909119895) = 119863prod119889=1

120595(119909119889119894 minus 119909119889119895119886 ) (13)


120595 (119909) = (1 minus 1199092) exp(minus11990922 ) (14)

119870(119909119894 119909119895)= 119863prod119889=1




119865 (120596) = (2120587)1198632 119863prod119889=1

int+infinminusinfin


(16)


119865 (120596) = (2120587)1198632 119886119863 119863prod119889=1

int+infinminusinfin


119889=1




(17)


119868 = 119863prod119889=1

(119865(119889)1 (120596) minus 119865(119889)2 (120596)) (18)



exp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889 (19)





1199092119889 exp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889 (20)


119865(119889)1 (120596) = (2120587)12 119865(119889)2 (120596) = (2120587)12 (1 minus 11988621205962) (21)



(119865(119889)1 (120596) minus 119865(119889)2 (120596)) = (2120587)1198632 11988621198631205962119863 (22)


119865 (120596) = (2120587)119863 1198863119863 exp(minus11988622119863sum119889=1

1205962119889) 119863prod119889=1

1205962119889 (23)



119891 (119909) = sgn[[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879

(24)


119891 (119909)

= argmax[[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879

(25)








Abalone (2000 3)




Auto MPG (200 5)




Bank (2000 2)





50 100 200 1000 20000

1

2

3

4

5

6

7

8

Runn

ing

time (

s)


MHWPoly

ELMBP










Wine (100 3)








Iris (100 3)




Glass (100 2)




Image (100 7)




Yeast (1000 4)





Zoo (50 7)




Letter (2000 26)



5 Conclusion



Competing Interests


Acknowledgments



References





























Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in







1199092119889 exp [minus12 (119909119889 + 119895119886120596119889)2] 119889119909119889 (20)


119865(119889)1 (120596) = (2120587)12 119865(119889)2 (120596) = (2120587)12 (1 minus 11988621205962) (21)



(119865(119889)1 (120596) minus 119865(119889)2 (120596)) = (2120587)1198632 11988621198631205962119863 (22)


119865 (120596) = (2120587)119863 1198863119863 exp(minus11988622119863sum119889=1

1205962119889) 119863prod119889=1

1205962119889 (23)



119891 (119909) = sgn[[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879

(24)


119891 (119909)

= argmax[[[[[

119870 (119909 1199091)119870 (119909 119909119873)]]]]]

119879

( 119868119862 + ΩELM)minus1 119879

(25)








Abalone (2000 3)




Auto MPG (200 5)




Bank (2000 2)





50 100 200 1000 20000

1

2

3

4

5

6

7

8

Runn

ing

time (

s)


MHWPoly

ELMBP










Wine (100 3)








Iris (100 3)




Glass (100 2)




Image (100 7)




Yeast (1000 4)





Zoo (50 7)




Letter (2000 26)



5 Conclusion



Competing Interests


Acknowledgments



References





























Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in






Abalone (2000 3)




Auto MPG (200 5)




Bank (2000 2)





50 100 200 1000 20000

1

2

3

4

5

6

7

8

Runn

ing

time (

s)


MHWPoly

ELMBP










Wine (100 3)








Iris (100 3)




Glass (100 2)




Image (100 7)




Yeast (1000 4)





Zoo (50 7)




Letter (2000 26)



5 Conclusion



Competing Interests


Acknowledgments



References





























Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in










Wine (100 3)








Iris (100 3)




Glass (100 2)




Image (100 7)




Yeast (1000 4)





Zoo (50 7)




Letter (2000 26)



5 Conclusion



Competing Interests


Acknowledgments



References





























Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in






Zoo (50 7)




Letter (2000 26)



5 Conclusion



Competing Interests


Acknowledgments



References





























Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in






















Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in



researcharticle mexican hat wavelet kernel elm for

Documents