improving rolling bearing fault diagnosis by ds evidence...

15
Research Article Improving Rolling Bearing Fault Diagnosis by DS Evidence Theory Based Fusion Model Xuemei Yao, 1 Shaobo Li, 1,2 and Jianjun Hu 2,3 1 Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang 550025, China 2 School of Mechanical Engineering, Guizhou University, Guiyang 550025, China 3 Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA Correspondence should be addressed to Shaobo Li; [email protected] Received 28 May 2017; Revised 30 August 2017; Accepted 17 September 2017; Published 22 October 2017 Academic Editor: Guiyun Tian Copyright © 2017 Xuemei Yao et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Rolling bearing plays an important role in rotating machinery and its working condition directly affects the equipment efficiency. While dozens of methods have been proposed for real-time bearing fault diagnosis and monitoring, the fault classification accuracy of existing algorithms is still not satisfactory. is work presents a novel algorithm fusion model based on principal component analysis and Dempster-Shafer evidence theory for rolling bearing fault diagnosis. It combines the advantages of the learning vector quantization (LVQ) neural network model and the decision tree model. Experiments under three different spinning bearing speeds and two different crack sizes show that our fusion model has better performance and higher accuracy than either of the base classification models for rolling bearing fault diagnosis, which is achieved via synergic prediction from both types of models. 1. Introduction Healthy operation of machinery systems is important in modern manufacturing enterprises, which leads to increasing attention to fault diagnosis technology that detects, identifies, and predicts abnormal states of manufacturing systems. Complex fault states and uncertain fault information bring high demand for real-time and intelligent fault diagnosis. Rolling bearing is a key component of rotating machinery, and any failure may cause equipment malfunction or catas- trophic consequences. Almost every machine has at least one of these components, and their faults can be the direct cause of subsequent problems in other parts. us, bearing faults should be detected at the early stage [1]. Generally, rolling bearing consists of shaſt, balls, inner race, outer race, cage, and housing. In principle, each component may fail. How- ever, the inner race, outer race, and balls are the most vul- nerable components due to friction and thus more prone to malfunction. erefore, rolling bearing fault detection and diagnosis is of great significance to ensure production efficiency and equipment safety. e essence of fault diagnosis process is signal processing and pattern recognition. Signal processing functions to extract the features that characterize the nature of the faults from complex original signals, whereas pattern recognition can classify the fault types and identify specific faults according to the input features, which can thus reduce reliance on technical personnel. us far, several methods have been used for bearing fault diagnosis, and each has its case history of successes and failures. ese methods can be classified according to their information source types, such as acoustic measurements, current and temperature monitoring, wear debris detection, and vibration analysis. Vibration analysis is broadly consid- ered as the most effective monitoring technique in rotating machinery. Numerous vibration phenomena can be inter- preted to be an amplitude modulation of the characteristic vibration frequency of a machine. Once the bearing fails, vibration pulses are produced. Pulse signals with smooth and less fluctuations are produced even when the bearing is normally operating. Recently, many fault diagnosis studies for rolling bearing based on vibration data have been reported. Sanz et al. [2] presented a method for detecting the states of rotating machinery with vibration analysis. Zhou and Cheng [3] proposed a fault diagnosis method based on image recognition for rolling bearing to realize fault classification under variable working conditions. Li et al. [4] presented Hindawi Journal of Sensors Volume 2017, Article ID 6737295, 14 pages https://doi.org/10.1155/2017/6737295

Upload: others

Post on 18-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

Research ArticleImproving Rolling Bearing Fault Diagnosis byDS Evidence Theory Based Fusion Model

Xuemei Yao1 Shaobo Li12 and Jianjun Hu23

1Key Laboratory of Advanced Manufacturing Technology Ministry of Education Guizhou University Guiyang 550025 China2School of Mechanical Engineering Guizhou University Guiyang 550025 China3Department of Computer Science and Engineering University of South Carolina Columbia SC 29208 USA

Correspondence should be addressed to Shaobo Li lishaobogzueducn

Received 28 May 2017 Revised 30 August 2017 Accepted 17 September 2017 Published 22 October 2017

Academic Editor Guiyun Tian

Copyright copy 2017 Xuemei Yao et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Rolling bearing plays an important role in rotating machinery and its working condition directly affects the equipment efficiencyWhile dozens of methods have been proposed for real-time bearing fault diagnosis andmonitoring the fault classification accuracyof existing algorithms is still not satisfactory This work presents a novel algorithm fusion model based on principal componentanalysis and Dempster-Shafer evidence theory for rolling bearing fault diagnosis It combines the advantages of the learning vectorquantization (LVQ) neural networkmodel and the decision tree model Experiments under three different spinning bearing speedsand two different crack sizes show that our fusion model has better performance and higher accuracy than either of the baseclassification models for rolling bearing fault diagnosis which is achieved via synergic prediction from both types of models

1 Introduction

Healthy operation of machinery systems is important inmodernmanufacturing enterprises which leads to increasingattention to fault diagnosis technology that detects identifiesand predicts abnormal states of manufacturing systemsComplex fault states and uncertain fault information bringhigh demand for real-time and intelligent fault diagnosisRolling bearing is a key component of rotating machineryand any failure may cause equipment malfunction or catas-trophic consequences Almost every machine has at least oneof these components and their faults can be the direct causeof subsequent problems in other parts Thus bearing faultsshould be detected at the early stage [1] Generally rollingbearing consists of shaft balls inner race outer race cageand housing In principle each component may fail How-ever the inner race outer race and balls are the most vul-nerable components due to friction and thus more proneto malfunction Therefore rolling bearing fault detectionand diagnosis is of great significance to ensure productionefficiency and equipment safetyThe essence of fault diagnosisprocess is signal processing and pattern recognition Signalprocessing functions to extract the features that characterize

the nature of the faults fromcomplex original signals whereaspattern recognition can classify the fault types and identifyspecific faults according to the input features which can thusreduce reliance on technical personnel

Thus far several methods have been used for bearingfault diagnosis and each has its case history of successes andfailures These methods can be classified according to theirinformation source types such as acoustic measurementscurrent and temperature monitoring wear debris detectionand vibration analysis Vibration analysis is broadly consid-ered as the most effective monitoring technique in rotatingmachinery Numerous vibration phenomena can be inter-preted to be an amplitude modulation of the characteristicvibration frequency of a machine Once the bearing failsvibration pulses are produced Pulse signals with smoothand less fluctuations are produced even when the bearing isnormally operating Recentlymany fault diagnosis studies forrolling bearing based on vibration data have been reportedSanz et al [2] presented a method for detecting the statesof rotating machinery with vibration analysis Zhou andCheng [3] proposed a fault diagnosis method based on imagerecognition for rolling bearing to realize fault classificationunder variable working conditions Li et al [4] presented

HindawiJournal of SensorsVolume 2017 Article ID 6737295 14 pageshttpsdoiorg10115520176737295

2 Journal of Sensors

Rollingbearing

Vibrationsignal

Statisticfeatures

Lowdimension

dataset

LVQ neuralnetwork

Decisiontree

DSfusionmodel

Faulttype

Accelerometersensor

Wavelettransform

DecisionmakingPCA

BADC

Figure 1 Structure of our fusion model It mainly includesA feature extraction based on wavelet transformB dimension reduction usingPCAC training of LVQ neural network and decision tree models andD DS evidence theory based fusion prediction model

a model for deep statistical feature learning from vibrationmeasurements of rotating machinery and the results showedthat deep learning with statistical feature extraction hasan essential improvement potential for diagnosing rotatingmachinery faults

Initial high-dimensional features are obtained by decom-posing the vibration signals with wavelet transform How-ever redundant information of high-dimensional featuresmay cause dimensionality problems to subsequent patternanalysis Hence principal component analysis (PCA) hasbeen introduced to reduce dimensionality and eliminateredundant information aiming at improving the classificationspeed and accuracy Taouali et al [5] proposed a newmethodfor fault detection using a reduced kernel PCA and obtainedsatisfactory results Wodecki et al [6] presented a multichan-nel processing method for local damage detection in gearboxusing the combination of PCA and time frequency Cho et al[7] suggested a fault identification method that is especiallyapplicable to process monitoring using PCA and achieveda high efficiency Nguyen and Golinval [8] addressed thefault detection problem inmechanical systems using the PCAmethod which effectively improved the detection resultsTheir findings indicate that the optimal features extractedwith PCA are considered accurate and efficient

As a simple and efficient classifier the decision tree canbe used to infer the classification rules from a set of trainingsamples which has been extensively used in fault diagnosisKarabadji et al [9] discussed a new approach for faultdiagnosis in rotating machines based on improved decisiontrees and gained ideal results Rutkowski et al [10] presenteda new algorithm based on decision trees to determine thebest attributes and obtained a high accuracy of classificationwith a short processing time Amarnath et al [11] selected thedescriptive features extracted from acoustic signals using adecision tree algorithm to realize the fault bearing diagnosisKrishnakumari et al [12] selected the best features through adecision tree to train a fuzzy classifier for fault diagnosis

In recent years artificial neural networks (ANNs) arewidely used in fault diagnosis because of their capabilityof learning highly nonlinear relationships The back prop-agation (BP) neural network is extensively used but it caneasily fall into local optimum The learning vector quanti-zation (LVQ) neural network is a learning algorithm thattrains hidden layers under the supervision state which canovercome the shortcomings of the BP network and achievebetter prediction performance Rafiee et al [13] presented anANN for fault detection and identification in gearbox usingfeatures that are extracted from vibration signals Umer and

Khiyal [14] evaluated the LVQ network for classifying textdocuments and the results showed that the LVQ requiredless training of samples and exceeded other classificationmethods Melin et al [15] described the application of com-petitive neural networks using the LVQ algorithm to classifyelectrocardiogram signals and produced desired results

Another strategy to improve prediction performanceis to use information fusion approach For example theDempster-Shafer (DS) evidence theory has been adopted tohandle information fusion Kushwah et al [16] proposed amultisensor fusion methodology using evidence theory forindoor activity recognition which gained ideal identificationaccuracy Basir and Yuan [17] investigated the use of DStheory as a tool for modelling and fusing multisensor piecesof evidence pertinent to engine quality Bhalla et al [18]used DS evidence theory to integrate the results of BP neuralnetwork and fuzzy logic to overcome the conflicts of faultdiagnosis

Previous methods for fault diagnosis of rolling bearingsuffer from either a single source of information or singletype of models which leads to biased prediction To addressthese issues we propose an information fusion model forbearing fault diagnosis by combining the LVQneural networkand the decision tree classifier of which the predictions arefused using theDS evidence theory Our algorithm is inspiredby that fact that ensemble machine learning algorithmsbased on fusing predictions from multiple base machinelearning models have been shown to be able to achieve mostcompetitive performance [19 20] and DS evidence theorybased fusion method has been successfully applied to faultdiagnosis of hydraulic pump [21] rolling bearing [22 23] andcomplex electromechanical systems [24] However previouswork of DS fusion for bearing fault diagnosis has not studiedwhether DS based evidence fusion can be used to combineheterogeneous base models into more accurate predictionmodelsThe inputs of our model are statistical characteristicsobtained by decomposing the vibration signals using wavelettransform We use the PCA technique to reduce the featuresdimensions according to the cumulative contribution rate ofthe eigenvalues We also use the LVQ neural network anddecision tree to perform the initial fault prediction Then wecalculate the basic probability assignment (BPA) of the twomodels through normalized operations Finally we fuse theresults of the two methods and adopt the DS evidence theoryto identify the fault type of the rolling bearing The structureof our fusion model is shown in Figure 1

The rest of this paper is organized as follows Section 2describes the experimental setup and design Section 3

Journal of Sensors 3

Inductionmotor

AD

AmplifierComputer Feature

extraction

Accelerometer

Rolling bearing Signal processing unit

USB

Figure 2 Schematic of the setup

presents the methodology used in this study includingfeature extraction decision tree LVQ neural network PCAtechnique and DS evidence theory Section 4 presents theexperimental results together with some discussions FinallySection 5 concludes the paper

2 Experimental Setup and Design

21 Test Rig In the present study the bearing fault diagnosisproblem is to confirm whether a rolling bearing is in goodor faulty condition The rolling bearing state needs to becategorized into the three states good inner race fault andouter race fault The test rig is equipped with fixing andclamping devices for fixing the bearing outer race while theinner race rotates with the shaft The test rig consists of aninduction motor bearing piezoelectric accelerometer signalprocessing unit and computer

In this study five KBC 6203 rolling bearings are derivedby the induction motor (2HP) One is a new bearing withoutany defects to simulate the good condition The other fourbearings are designed to simulate the inner and outer racefaults Two bearings are used to simulate two types of cracksin each fault The defect is created by the spark erosion tech-nique to control the defectThe piezoelectric type accelerom-eter IMI Sensors 608A111 measures the vibration signalsand is mounted on the base of the rolling bearing usingadhesiveThe output of the accelerometer is sent to the signalprocessing unit where the signal goes through the chargeamplifier and analogue-to-digital (AD) converter Vibrationsignals are amplified by a DACELL DNAM100 amplifier andare transformed from analogue to digital signals The signalsare then transmitted to the computer memory through theUSB port Subsequently the signals are read and processedfrom the memory to extract different statistical featuresaccording to the requirements The schematic of the setup isshown in Figure 2

22 Experiment Design Some rubber sheets are added inthe legs of the test rig to avoid environmental noise andvibration thereby obtaining more realistic data The signalprocessing unit is switched on and the first few signals arediscarded purposefully to avoid initial random variationThevibration signals with undamped natural frequency of 10KHzare accepted after the signal becomes stable

The vibration signals are gained from the accelerometerat the base of the rolling bearing The sampling frequencyis 24 kHz the length of the record is 10 s and the samplelength is 1024 for all experiment cases The highest frequencyis found to be 12 kHz via experimenting The sampling

frequency must be twice that of the highest measured fre-quency according to the Nyquist sampling theorem thusit is set at 24 kHz The choice of the sample length isconsidered arbitrary to a certain extent However the statis-tical measurement is more meaningful when the number ofsamples is large meanwhile the computing time increaseswith the number of samples Generally a sample length ofapproximately 1000 is selected to achieve balance In somefeature extraction techniques the sample length is always 2119899and the nearest 2119899 to 1000 is 1024 Therefore 1024 is selectedas the sample length under normal circumstances [25]

Ninety experiments are conducted by varying the param-eters under three different spinning speeds of the bearing(500 700 and 900 rpm) First a rolling bearing without anydefects is used for the good case Ten samples are collectedunder each spinning speed of the bearing Then 30 differentcases are obtained by changing the shaft speed Second theouter race fault condition is conducted in the test rig Thecrack of the outer race fault is created via spark erosiontechnique The crack size is 05mm wide and 07mm deepand the other crack is 03mm wide and 06mm deep Theperformance characteristics of the outer race fault of thebearing are studied as explained for the good case Vibrationsignals with outer race fault are recorded in memory keepingall other modules in good condition A crack corresponds tothree different spinning speeds of the bearing Fifteen samplesare obtained for each speed with five samplesThirty differentcases are obtained by changing the shaft speed and crack sizeThird the inner race fault is simulated by the same crack sizeas that of the outer raceThirty samples are also obtainedThecumulative 90 samples of which the dimensions are reducedby PCA are used as input for the LVQ neural networkSimilarly 90 samples are collected again as the input for thedecision tree

3 Methodology

31 Feature Extraction Generally statistical features are goodindicators of the state of machine operation Vibrationsignals are obtained from different spinning speeds andfault types and the required statistical characteristics canbe extracted using time or frequency domain analysis Themost commonly used statistical feature extraction methodsare fast Fourier transform (FFT) and wavelet transform FFTconverts time domain signal into frequency domain signaland is widely used in signal detection However the FFTmethod is inherently flawed in handling unstable processesThis method only acquires the frequency components of thesignal as a whole but is unaware of the moment at which

4 Journal of Sensors

the components appear The difference between two timedomain signals is large but the spectrum may be the samethus FFT cannot render good performance [26] In compar-ison with FFT wavelet transform is a local transformationof time and frequency [27] Wavelet transform has goodspatial and frequency domain localization characteristics Itcan effectively extract information from the signal throughthe expansion translation and other computing functionsWavelet transform is widely applied in multiscale refinementanalysis of functions or signals It can focus on the detail ofthe analysis object by using fine time domain or space step inhigh frequency which solvesmany problems that FFT cannot[28]

In the present study wavelet transform is used to collecttime domain features of the vibration signals which aregained from the accelerometerWavelet coefficients cannot bedirectly used as input of the diagnostic model thus a featureextraction preprocessing step is required to prepare the datafor the model A large number of features can be extractedfrom each signal which can be divided into two categoriesfeatures with dimensions and dimensionless features Thefeatures with dimensions such as variance mean and peakare more likely to be affected by working conditions Dimen-sionless features such as kurtosis crest and pulse are lesssensitive to external factors Different features reflect differentaspects of the fault information of the bearing Effectivefeature extraction and selection and preprocessing are criticalfor successful classification [29] The increase in the numberof features will inevitably lead to redundancy and curse ofdimensionality while ensuring comprehensive and completeaccess to the fault information To achieve a balanced controlonly 10 statistical characteristics with good sensitivity anddifferentiation to the fault type are selected as inputs to themodel in this work as shown below

(1) Variance It is the measurement of signal dispersiondegree A larger variance indicates a greater fluctuation ofdata whereas a smaller variance indicates a smaller fluc-tuation of data The following formula is used to computevariance

Variance = sum119899119894=1 (119909119894 minus 119909)2119899 (1)

(2) Kurtosis It indicates the flatness or the spikiness of thesignal It is considerably low under normal condition butincreases rapidly with the occurrence of faults It is partic-ularly effective in detecting faults in the signal The followingformula is used to solve for kurtosis

Kurtosis = (1119899)sum119899119894=1 (119909119894 minus 119909)4((1119899)sum119899119894=1 (119909119894 minus 119909)2)2 (2)

(3) Mean It represents the central tendency of the amplitudevariations of the waveform and can be used to describe signalstability which is the static component of the signal Thefollowing formula is used to obtain the mean

Mean = sum119899119894=1 119909119894119899 (3)

(4) Standard Deviation (Std) It is the measurement of theeffective energy of the signal and reflects the discrepancydegree between individuals within the group The followingformula is used for its computation

Std = radic 1119899119899sum119894=1

(119909119894 minus 119909)2 (4)

(5) Skewness It is the measurement of the skew direction andextent of the data distribution and is a numerical feature ofthe degree of asymmetry of the statistical data The followingformula is used to compute skewness

Skewness = (1119899)sum119899119894=1 (119909119894 minus 119909)3((1119899)sum119899119894=1 (119909119894 minus 119909)2)32 (5)

(6) Peak It refers to the instantaneous maximum value of thefault signal in a given time The following formula is used tocompute peak

Peak = max 10038161003816100381610038161199091198941003816100381610038161003816 (6)

(7) Median It refers to the value of a variable in the middle ofthe array which is sorted from small to largewith all variablesThe following formula is used to determine the median

Median = 119909(119899+1)2 119899 is odd1199091198992 + 119909(1198992)+12 119899 is even

(7)

(8) Root Mean Square (RMS) It is an important index indetermining whether the running state is normal in themechanical fault diagnosis system Moreover it reflects themagnitude of the signal energyThe following formula is usedto compute for RMS

RMS = radicsum119899119894=1 (119909119894)2119899 (8)

(9) Crest Factor (CF) It is the measurement of a waveformshowing the ratio of peak values to the effective value In otherwords crest factor indicates the extremeness of peaks in awaveform The following formula is used to compute it

CF = 119909peak119909RMS (9)

(10)119870 Factor It reflects the shock characteristics of vibrationsignals and is sensitive to abnormal pulses that are producedby bearing faults Its normal value is 3 If it is close to or morethan 4 shock vibration exists The following formula is usedto determine 119870 factor

119870 factor = 119909kurtosisx4RMS

(10)

The statistical feature matrix of some samples is shown inTable 1

Journal of Sensors 5

Table1Featurem

atrix

ofsomes

amples

Class

Varia

nce

Kurtosis

Mean

Std

Skew

ness

Peak

Median

RMS

CF119870fa

ctor

Train

set

100214

1615

3900011

00870

20854

05870

minus00307

01377

83919

00873

101115

124934

00011

00984

206

47066

43minus00

13001063

87519

01421

100075

122625

00011

01346

20755

08143

minus00127

0114

882310

02199

200340

150917

00012

01843

26764

12198

minus00255

01843

66173

02248

2000

98107345

000

0900988

16925

06694

minus00104

00998

67763

006

612

01396

2013

6500010

03736

34306

26430

minus00684

03736

70734

09875

300886

2874

87000

0703918

41160

25614

minus00290

02149

63282

09030

301843

268139

00011

05596

40948

32983

minus00845

03269

49603

16723

302343

338579

00010

03422

34560

24916

minus00324

01991

37395

10920

Test

set

100274

120830

00010

00863

20656

09590

minus00218

01096

86386

01770

100559

137078

000

08006

6224363

07831

minus00092

00959

67902

00352

100088

130394

00010

01602

21901

12309

minus00227

01201

68103

00270

200934

172709

00012

01513

05766

06190

minus00632

02841

91493

09153

200059

245557

00011

0119

708212

200

96minus00

70701739

98313

07793

20117

8180737

00011

03140

10114

22261

minus00400

01789

106499

05463

301387

342424

000

0903473

43834

33007

minus00508

04638

93689

14088

303126

287760

000

0703842

41692

18429

minus00832

03311

106652

05623

302959

2913

2200010

04585

33189

23107

minus00182

01846

44879

12974

6 Journal of Sensors

32 Dimensionality Reduction PCA is a statistical methodand is widely used in data reduction By means of orthogonaltransformation a group of variables that may be related toone another are transformed into a set of linearly uncorre-lated variables which is called the principal components Itfunctions to maintain the primary information of originalfeatures while reducing the complexity of the data whichreveals the simple structure behind the complex data PCAis a simple and nonparametric method of extracting relevantinformation from intricate data

The purpose of PCA is to reduce the dimensionality ofdata while preserving as many changes as possible in theoriginal dataset PCA transforms the data into the coordinatesystem thus themaximum variance of any projection of datalies on the first coordinate and the second largest variancefalls on the second coordinate and so on PCA algorithmcan remove redundant information simplify the difficultyof dealing with the problem and improve the resistance toexternal interference through the processing of raw dataTherefore PCA is used in this paperThe specific steps of PCAalgorithm are shown as follows

Step 1 Input sample matrix 119863 = 1199091 1199092 119909119899119879 The rowsof the matrix represent the number of samples whereas thecolumns represent the dimensions

Input percentage of information retention after dimen-sion reduction (e)

Step 2 Calculate the mean by columns

119909 = sum119899119894=1 119909119894119899 (11)

Step 3 Obtain the new sample matrix 119872 with data central-ization 120579119894 = 119909119894 minus 119909

119860 = [1205791 1205792 120579119899] 119872 = 119860119860119879

(12)

Step 4 Calculate the eigenvalues and eigenvectors

119872119880 = 120582119880 997888rarr1205821 gt 1205822 gt sdot sdot sdot gt 120582119899119860119880 = 120582119880 997888rarr119880 = 1199061 1199062 119906119899

(13)

Step 5 Determine the final dimension k

sum119896119894=1 120582119894sum119899119894minus1 120582119894 ge 119890 997888rarr1205821 1205822 120582119896

(14)

The cumulative contribution rate of the eigenvalues (119890) isused to measure the representation of the newly generatedprincipal components to the original data Generally 119890 shouldbe greater than or equal to 85 to extract the former 119896principal components as the sample features

Step 6 Output the principal components

119880119896 = (1199061 1199062 119906119896) 119875 = 119909 lowast 119880119896 (15)

Notably the dataset is divided into training and testingsets prior to importing the model in this study Thereforeboth sets must be processed separately when PCA is usedSubtracting the mean value of the training sample and usingthe transformationmatrix obtained from the training sampleare important when the dimension of the testing sample isreduced to ensure that the training and testing samples areswitched to the same sample space In this study the firstfour principal components are selected and some of them areshown in Table 2

33 Decision Tree Decision tree is a common supervisedclassification method in data mining In supervised learninga bunch of samples is provided Each sample has a set ofattributes and a category label These categories are deter-mined in advance and a classifier is then created by a learningalgorithm The topmost node of the decision tree is the rootnode Decision tree classifies a sample from the root to theleaf node Each nonleaf node represents a test of the attributevalue Each branch of the tree represents the results of the testEach leaf node represents a category of the object In shortthe decision tree is a tree structure similar to a flow diagram

A decision tree is built recursively following a top-downapproach It compares and tests the attribute values on itsinternal nodes from the root node then determines thecorresponding branch according to the attribute value ofthe given instance and finally draws the conclusion in itsleaf node The process is repeated on the subtree whichis rooted with a new node Thus far many decision treealgorithms exist but themost commonly used one is the C45algorithmThe pseudocode of the C45 algorithm is shown inPseudocode 1

A leafy decision tree may be created due to the noises andoutliers of the training data which will result in overfittingMany branches reflect anomalies of the data The solutionis pruning and cutting off the most unreliable branchesin which pre- and postpruning are widely used The C45algorithm adopts a pessimistic postpruning If the error ratecan be reduced through replacing a subtree with its leaf nodethe subtree will be pruned

34 LVQNeural Network LVQ is an input feed-forward neu-ral network with supervised learning for training the hiddenlayers The LVQ neural network consists of input hiddenand output layers The hidden layer automatically learns andclassifies the input vectorsThe results of classification dependonly on the distance between the input vectors If the distancebetween the two input vectors is particularly similar thenthe hidden layer divides them into the same class and outputthem

The network is completely connected between the inputand hidden layers meanwhile the hidden layer is partiallyconnected with the output layer Each output layer neuron isconnected to different groups of hidden layer neurons The

Journal of Sensors 7

Table 2 Principal components of some samples

Class PCA1 PCA2 PCA3 PCA4

Trainset

1 minus19913 minus02596 minus10560 minus024121 minus21547 minus19631 19927 086111 minus31183 minus00118 11916 015642 minus21922 06848 minus05967 030522 minus08282 12599 minus06597 066122 minus13281 minus05063 minus08747 044673 37605 16074 02554 minus051143 72074 minus13968 11143 minus056163 42621 18465 02609 00197

Testset

1 minus15410 minus10017 03682 020041 minus12102 minus21453 07366 056071 minus16120 04242 04177 minus056932 22970 minus14313 minus09372 minus261932 minus35065 minus13869 minus17524 minus096302 minus03051 minus08110 minus03567 minus071033 28365 minus04960 minus11518 184983 39953 minus04302 03341 074503 29665 minus09305 minus06439 03039

Input an attribute set dataset DOutput a decision tree

(a) Tree = (b) if119863 is ldquopurerdquo or other end conditions are met then(c) terminate(d) end if(e) for each attribute 119886 isin 119863 do(f) compute information gain ratio (InGR)(g) end for(h) 119886best = attribute with the highest InGR(i) Tree = create a tree with only one node 119886best in the root(j)119863V = generate a subset from119863 except 119886best(k) for all119863V do(l) subtree = C45 (119863V)(m) set the subtree to the corresponding branch of the Tree according to the InGR(n) end for

Pseudocode 1 Pseudocode of the C45 algorithm

number of neurons in the hidden layer is always greater thanthat of the output layer Each hidden layer neuron is con-nectedwith only one output layer neuron and the connectionweight is always 1 However each output layer neuron canbe connected to multiple hidden layer neuronsThe values ofthe hidden and output layer neurons can only be 1 or 0 Theweights between the input and hidden layers are graduallyadjusted to the clustering center during the training When asample is placed into the LVQ neural network the neurons ofthe hidden layer generate the winning neuron by the winner-learning rules allowing the output to be 1 and the other to be

0 The output of the output layer neurons that are connectedto the winning neuron is 1 whereas the other is 0 then theoutput provides the pattern class of the current input sampleThe class learned by the hidden layer becomes a subclass andthe class learned by the output layer becomes the target class[30] The architecture of the LVQ neural network is shown inFigure 3

The training steps of the LVQ algorithm are as follows

Step 1 The learning rate 120578 (120578 gt 0) and the weights 119882119894119895between the input and hidden layers are initialized

8 Journal of Sensors

Input layer

Hidden layerOutput layer

Class 1

Class n

Class i

Input vector

middot middot middot

middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middotx1

xi

xn

Figure 3 Architecture of the LVQ neural network

Step 2 The input vector 119909 = (1199091 1199092 119909119899)119879 is fed to theinput layer and the distance (119889119894) between the hidden layerneuron and the input vector is calculated

119889119894 = radic 119899sum119895=1

(119909119895 minus 119908119894119895)2 (16)

Step 3 Select the hidden layer neuron with the smallestdistance from the input vector If 119889119894 is the minimum then theoutput layer neuron connected to it is labeled with 119888119894Step 4 The input vector is labeled with 119888119909 If 119888119894 = 119888119909 theweights are adjusted as follows

119908119894119895-new = 119908119894119895-old + 120578 (119909 minus 119908119894119895-old) (17)

Otherwise the weights are updated as follows

119908119894119895-new = 119908119894119895-old minus 120578 (119909 minus 119908119894119895-old) (18)

Step 5 Determine whether the maximum number of iter-ations is satisfied The algorithm ends if it is satisfiedOtherwise return to Step 2 and continue the next round oflearning

35 Evidence Theory Data fusion is a method of obtainingthe best decision from different sources of data In recentyears it has attracted significant attention for its wideapplication in fault diagnosis Generally data fusion can beconducted at three levels The first is data-level fusion [31]Raw data from different sources are directly fused to producemore information than the original data At this level thefusion exhibits small loss and high precision but is time-consuming and unstable and has weak anti-interferencecapabilities The second is feature-level fusion [32] At thislevel statistical features are extracted separately using signalprocessing technique All features are fused to find an optimalsubset which is then fed to a classifier for better accuracyAt this level information compression for transmission isachieved but with poor integration accuracy The third isdecision-level fusion [33] This fusion is the highest level

Recognition frame

a b c a b d

a b

a b c d

a c a db c b d

b c d

c d

a c d

Figure 4 Recognition framework of the DS theory

of integration which influences decision making and it isthe ultimate result of three-level integration The decision-level fusion exhibits strong anti-interference ability and smallamount of communication but suffers from large amount ofdata loss and high cost of pretreatment In this paper we focuson decision-level fusion which is known as the DS evidencetheory

The DS evidence theory has been originally establishedin 1967 by Dempster and developed later in 1976 by Shaferwho is a student of Dempster Evidence theory is an extensionof Bayesian method In the Bayesian method the proba-bility must satisfy the additivity which is not the case forthe evidence theory The DS evidence theory can expressuncertainty leaving the rest of the trust to the recognitionframework This theory involves the following mathematicaldefinitions

Definition 1 (recognition framework) Define Ω = 1205791 1205792 120579119899 as a set where Ω is a finite set of possible valuesand 120579119894 is the conclusion of the model The set is called therecognition framework 2Ω is a power set composed of allthe subsets The recognition framework with capacity of fourlayers and the relationship between the subsets is shown inFigure 4 a b c and 119889 are the elements of the framework

Definition 2 (BPA) BPA is a primitive function in DSevidence theory Assume Ω as the recognition frameworkthen m is a mapping from 2120579 to [0 1] and 119860 is the subset ofΩm is called the BPA when it meets the following equation

119898(0) = 0sum119860subeΩ

119898(119860) = 1 (19)

Definition 3 (combination rules) For forall119860 sube Ω a finite num-ber of 119898 functions (1198981 1198982 119898119899) exist on the recognitionframework The combination rules are as follows

119898(0) = 0119898 (119860)

= 1119896 sum1198601cap1198602capsdotsdotsdotcap119860119899=119860

1198981 (1198601) lowast 1198982 (1198602) sdot sdot sdot 119898119899 (119860119899) (20)

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 2: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

2 Journal of Sensors

Rollingbearing

Vibrationsignal

Statisticfeatures

Lowdimension

dataset

LVQ neuralnetwork

Decisiontree

DSfusionmodel

Faulttype

Accelerometersensor

Wavelettransform

DecisionmakingPCA

BADC

Figure 1 Structure of our fusion model It mainly includesA feature extraction based on wavelet transformB dimension reduction usingPCAC training of LVQ neural network and decision tree models andD DS evidence theory based fusion prediction model

a model for deep statistical feature learning from vibrationmeasurements of rotating machinery and the results showedthat deep learning with statistical feature extraction hasan essential improvement potential for diagnosing rotatingmachinery faults

Initial high-dimensional features are obtained by decom-posing the vibration signals with wavelet transform How-ever redundant information of high-dimensional featuresmay cause dimensionality problems to subsequent patternanalysis Hence principal component analysis (PCA) hasbeen introduced to reduce dimensionality and eliminateredundant information aiming at improving the classificationspeed and accuracy Taouali et al [5] proposed a newmethodfor fault detection using a reduced kernel PCA and obtainedsatisfactory results Wodecki et al [6] presented a multichan-nel processing method for local damage detection in gearboxusing the combination of PCA and time frequency Cho et al[7] suggested a fault identification method that is especiallyapplicable to process monitoring using PCA and achieveda high efficiency Nguyen and Golinval [8] addressed thefault detection problem inmechanical systems using the PCAmethod which effectively improved the detection resultsTheir findings indicate that the optimal features extractedwith PCA are considered accurate and efficient

As a simple and efficient classifier the decision tree canbe used to infer the classification rules from a set of trainingsamples which has been extensively used in fault diagnosisKarabadji et al [9] discussed a new approach for faultdiagnosis in rotating machines based on improved decisiontrees and gained ideal results Rutkowski et al [10] presenteda new algorithm based on decision trees to determine thebest attributes and obtained a high accuracy of classificationwith a short processing time Amarnath et al [11] selected thedescriptive features extracted from acoustic signals using adecision tree algorithm to realize the fault bearing diagnosisKrishnakumari et al [12] selected the best features through adecision tree to train a fuzzy classifier for fault diagnosis

In recent years artificial neural networks (ANNs) arewidely used in fault diagnosis because of their capabilityof learning highly nonlinear relationships The back prop-agation (BP) neural network is extensively used but it caneasily fall into local optimum The learning vector quanti-zation (LVQ) neural network is a learning algorithm thattrains hidden layers under the supervision state which canovercome the shortcomings of the BP network and achievebetter prediction performance Rafiee et al [13] presented anANN for fault detection and identification in gearbox usingfeatures that are extracted from vibration signals Umer and

Khiyal [14] evaluated the LVQ network for classifying textdocuments and the results showed that the LVQ requiredless training of samples and exceeded other classificationmethods Melin et al [15] described the application of com-petitive neural networks using the LVQ algorithm to classifyelectrocardiogram signals and produced desired results

Another strategy to improve prediction performanceis to use information fusion approach For example theDempster-Shafer (DS) evidence theory has been adopted tohandle information fusion Kushwah et al [16] proposed amultisensor fusion methodology using evidence theory forindoor activity recognition which gained ideal identificationaccuracy Basir and Yuan [17] investigated the use of DStheory as a tool for modelling and fusing multisensor piecesof evidence pertinent to engine quality Bhalla et al [18]used DS evidence theory to integrate the results of BP neuralnetwork and fuzzy logic to overcome the conflicts of faultdiagnosis

Previous methods for fault diagnosis of rolling bearingsuffer from either a single source of information or singletype of models which leads to biased prediction To addressthese issues we propose an information fusion model forbearing fault diagnosis by combining the LVQneural networkand the decision tree classifier of which the predictions arefused using theDS evidence theory Our algorithm is inspiredby that fact that ensemble machine learning algorithmsbased on fusing predictions from multiple base machinelearning models have been shown to be able to achieve mostcompetitive performance [19 20] and DS evidence theorybased fusion method has been successfully applied to faultdiagnosis of hydraulic pump [21] rolling bearing [22 23] andcomplex electromechanical systems [24] However previouswork of DS fusion for bearing fault diagnosis has not studiedwhether DS based evidence fusion can be used to combineheterogeneous base models into more accurate predictionmodelsThe inputs of our model are statistical characteristicsobtained by decomposing the vibration signals using wavelettransform We use the PCA technique to reduce the featuresdimensions according to the cumulative contribution rate ofthe eigenvalues We also use the LVQ neural network anddecision tree to perform the initial fault prediction Then wecalculate the basic probability assignment (BPA) of the twomodels through normalized operations Finally we fuse theresults of the two methods and adopt the DS evidence theoryto identify the fault type of the rolling bearing The structureof our fusion model is shown in Figure 1

The rest of this paper is organized as follows Section 2describes the experimental setup and design Section 3

Journal of Sensors 3

Inductionmotor

AD

AmplifierComputer Feature

extraction

Accelerometer

Rolling bearing Signal processing unit

USB

Figure 2 Schematic of the setup

presents the methodology used in this study includingfeature extraction decision tree LVQ neural network PCAtechnique and DS evidence theory Section 4 presents theexperimental results together with some discussions FinallySection 5 concludes the paper

2 Experimental Setup and Design

21 Test Rig In the present study the bearing fault diagnosisproblem is to confirm whether a rolling bearing is in goodor faulty condition The rolling bearing state needs to becategorized into the three states good inner race fault andouter race fault The test rig is equipped with fixing andclamping devices for fixing the bearing outer race while theinner race rotates with the shaft The test rig consists of aninduction motor bearing piezoelectric accelerometer signalprocessing unit and computer

In this study five KBC 6203 rolling bearings are derivedby the induction motor (2HP) One is a new bearing withoutany defects to simulate the good condition The other fourbearings are designed to simulate the inner and outer racefaults Two bearings are used to simulate two types of cracksin each fault The defect is created by the spark erosion tech-nique to control the defectThe piezoelectric type accelerom-eter IMI Sensors 608A111 measures the vibration signalsand is mounted on the base of the rolling bearing usingadhesiveThe output of the accelerometer is sent to the signalprocessing unit where the signal goes through the chargeamplifier and analogue-to-digital (AD) converter Vibrationsignals are amplified by a DACELL DNAM100 amplifier andare transformed from analogue to digital signals The signalsare then transmitted to the computer memory through theUSB port Subsequently the signals are read and processedfrom the memory to extract different statistical featuresaccording to the requirements The schematic of the setup isshown in Figure 2

22 Experiment Design Some rubber sheets are added inthe legs of the test rig to avoid environmental noise andvibration thereby obtaining more realistic data The signalprocessing unit is switched on and the first few signals arediscarded purposefully to avoid initial random variationThevibration signals with undamped natural frequency of 10KHzare accepted after the signal becomes stable

The vibration signals are gained from the accelerometerat the base of the rolling bearing The sampling frequencyis 24 kHz the length of the record is 10 s and the samplelength is 1024 for all experiment cases The highest frequencyis found to be 12 kHz via experimenting The sampling

frequency must be twice that of the highest measured fre-quency according to the Nyquist sampling theorem thusit is set at 24 kHz The choice of the sample length isconsidered arbitrary to a certain extent However the statis-tical measurement is more meaningful when the number ofsamples is large meanwhile the computing time increaseswith the number of samples Generally a sample length ofapproximately 1000 is selected to achieve balance In somefeature extraction techniques the sample length is always 2119899and the nearest 2119899 to 1000 is 1024 Therefore 1024 is selectedas the sample length under normal circumstances [25]

Ninety experiments are conducted by varying the param-eters under three different spinning speeds of the bearing(500 700 and 900 rpm) First a rolling bearing without anydefects is used for the good case Ten samples are collectedunder each spinning speed of the bearing Then 30 differentcases are obtained by changing the shaft speed Second theouter race fault condition is conducted in the test rig Thecrack of the outer race fault is created via spark erosiontechnique The crack size is 05mm wide and 07mm deepand the other crack is 03mm wide and 06mm deep Theperformance characteristics of the outer race fault of thebearing are studied as explained for the good case Vibrationsignals with outer race fault are recorded in memory keepingall other modules in good condition A crack corresponds tothree different spinning speeds of the bearing Fifteen samplesare obtained for each speed with five samplesThirty differentcases are obtained by changing the shaft speed and crack sizeThird the inner race fault is simulated by the same crack sizeas that of the outer raceThirty samples are also obtainedThecumulative 90 samples of which the dimensions are reducedby PCA are used as input for the LVQ neural networkSimilarly 90 samples are collected again as the input for thedecision tree

3 Methodology

31 Feature Extraction Generally statistical features are goodindicators of the state of machine operation Vibrationsignals are obtained from different spinning speeds andfault types and the required statistical characteristics canbe extracted using time or frequency domain analysis Themost commonly used statistical feature extraction methodsare fast Fourier transform (FFT) and wavelet transform FFTconverts time domain signal into frequency domain signaland is widely used in signal detection However the FFTmethod is inherently flawed in handling unstable processesThis method only acquires the frequency components of thesignal as a whole but is unaware of the moment at which

4 Journal of Sensors

the components appear The difference between two timedomain signals is large but the spectrum may be the samethus FFT cannot render good performance [26] In compar-ison with FFT wavelet transform is a local transformationof time and frequency [27] Wavelet transform has goodspatial and frequency domain localization characteristics Itcan effectively extract information from the signal throughthe expansion translation and other computing functionsWavelet transform is widely applied in multiscale refinementanalysis of functions or signals It can focus on the detail ofthe analysis object by using fine time domain or space step inhigh frequency which solvesmany problems that FFT cannot[28]

In the present study wavelet transform is used to collecttime domain features of the vibration signals which aregained from the accelerometerWavelet coefficients cannot bedirectly used as input of the diagnostic model thus a featureextraction preprocessing step is required to prepare the datafor the model A large number of features can be extractedfrom each signal which can be divided into two categoriesfeatures with dimensions and dimensionless features Thefeatures with dimensions such as variance mean and peakare more likely to be affected by working conditions Dimen-sionless features such as kurtosis crest and pulse are lesssensitive to external factors Different features reflect differentaspects of the fault information of the bearing Effectivefeature extraction and selection and preprocessing are criticalfor successful classification [29] The increase in the numberof features will inevitably lead to redundancy and curse ofdimensionality while ensuring comprehensive and completeaccess to the fault information To achieve a balanced controlonly 10 statistical characteristics with good sensitivity anddifferentiation to the fault type are selected as inputs to themodel in this work as shown below

(1) Variance It is the measurement of signal dispersiondegree A larger variance indicates a greater fluctuation ofdata whereas a smaller variance indicates a smaller fluc-tuation of data The following formula is used to computevariance

Variance = sum119899119894=1 (119909119894 minus 119909)2119899 (1)

(2) Kurtosis It indicates the flatness or the spikiness of thesignal It is considerably low under normal condition butincreases rapidly with the occurrence of faults It is partic-ularly effective in detecting faults in the signal The followingformula is used to solve for kurtosis

Kurtosis = (1119899)sum119899119894=1 (119909119894 minus 119909)4((1119899)sum119899119894=1 (119909119894 minus 119909)2)2 (2)

(3) Mean It represents the central tendency of the amplitudevariations of the waveform and can be used to describe signalstability which is the static component of the signal Thefollowing formula is used to obtain the mean

Mean = sum119899119894=1 119909119894119899 (3)

(4) Standard Deviation (Std) It is the measurement of theeffective energy of the signal and reflects the discrepancydegree between individuals within the group The followingformula is used for its computation

Std = radic 1119899119899sum119894=1

(119909119894 minus 119909)2 (4)

(5) Skewness It is the measurement of the skew direction andextent of the data distribution and is a numerical feature ofthe degree of asymmetry of the statistical data The followingformula is used to compute skewness

Skewness = (1119899)sum119899119894=1 (119909119894 minus 119909)3((1119899)sum119899119894=1 (119909119894 minus 119909)2)32 (5)

(6) Peak It refers to the instantaneous maximum value of thefault signal in a given time The following formula is used tocompute peak

Peak = max 10038161003816100381610038161199091198941003816100381610038161003816 (6)

(7) Median It refers to the value of a variable in the middle ofthe array which is sorted from small to largewith all variablesThe following formula is used to determine the median

Median = 119909(119899+1)2 119899 is odd1199091198992 + 119909(1198992)+12 119899 is even

(7)

(8) Root Mean Square (RMS) It is an important index indetermining whether the running state is normal in themechanical fault diagnosis system Moreover it reflects themagnitude of the signal energyThe following formula is usedto compute for RMS

RMS = radicsum119899119894=1 (119909119894)2119899 (8)

(9) Crest Factor (CF) It is the measurement of a waveformshowing the ratio of peak values to the effective value In otherwords crest factor indicates the extremeness of peaks in awaveform The following formula is used to compute it

CF = 119909peak119909RMS (9)

(10)119870 Factor It reflects the shock characteristics of vibrationsignals and is sensitive to abnormal pulses that are producedby bearing faults Its normal value is 3 If it is close to or morethan 4 shock vibration exists The following formula is usedto determine 119870 factor

119870 factor = 119909kurtosisx4RMS

(10)

The statistical feature matrix of some samples is shown inTable 1

Journal of Sensors 5

Table1Featurem

atrix

ofsomes

amples

Class

Varia

nce

Kurtosis

Mean

Std

Skew

ness

Peak

Median

RMS

CF119870fa

ctor

Train

set

100214

1615

3900011

00870

20854

05870

minus00307

01377

83919

00873

101115

124934

00011

00984

206

47066

43minus00

13001063

87519

01421

100075

122625

00011

01346

20755

08143

minus00127

0114

882310

02199

200340

150917

00012

01843

26764

12198

minus00255

01843

66173

02248

2000

98107345

000

0900988

16925

06694

minus00104

00998

67763

006

612

01396

2013

6500010

03736

34306

26430

minus00684

03736

70734

09875

300886

2874

87000

0703918

41160

25614

minus00290

02149

63282

09030

301843

268139

00011

05596

40948

32983

minus00845

03269

49603

16723

302343

338579

00010

03422

34560

24916

minus00324

01991

37395

10920

Test

set

100274

120830

00010

00863

20656

09590

minus00218

01096

86386

01770

100559

137078

000

08006

6224363

07831

minus00092

00959

67902

00352

100088

130394

00010

01602

21901

12309

minus00227

01201

68103

00270

200934

172709

00012

01513

05766

06190

minus00632

02841

91493

09153

200059

245557

00011

0119

708212

200

96minus00

70701739

98313

07793

20117

8180737

00011

03140

10114

22261

minus00400

01789

106499

05463

301387

342424

000

0903473

43834

33007

minus00508

04638

93689

14088

303126

287760

000

0703842

41692

18429

minus00832

03311

106652

05623

302959

2913

2200010

04585

33189

23107

minus00182

01846

44879

12974

6 Journal of Sensors

32 Dimensionality Reduction PCA is a statistical methodand is widely used in data reduction By means of orthogonaltransformation a group of variables that may be related toone another are transformed into a set of linearly uncorre-lated variables which is called the principal components Itfunctions to maintain the primary information of originalfeatures while reducing the complexity of the data whichreveals the simple structure behind the complex data PCAis a simple and nonparametric method of extracting relevantinformation from intricate data

The purpose of PCA is to reduce the dimensionality ofdata while preserving as many changes as possible in theoriginal dataset PCA transforms the data into the coordinatesystem thus themaximum variance of any projection of datalies on the first coordinate and the second largest variancefalls on the second coordinate and so on PCA algorithmcan remove redundant information simplify the difficultyof dealing with the problem and improve the resistance toexternal interference through the processing of raw dataTherefore PCA is used in this paperThe specific steps of PCAalgorithm are shown as follows

Step 1 Input sample matrix 119863 = 1199091 1199092 119909119899119879 The rowsof the matrix represent the number of samples whereas thecolumns represent the dimensions

Input percentage of information retention after dimen-sion reduction (e)

Step 2 Calculate the mean by columns

119909 = sum119899119894=1 119909119894119899 (11)

Step 3 Obtain the new sample matrix 119872 with data central-ization 120579119894 = 119909119894 minus 119909

119860 = [1205791 1205792 120579119899] 119872 = 119860119860119879

(12)

Step 4 Calculate the eigenvalues and eigenvectors

119872119880 = 120582119880 997888rarr1205821 gt 1205822 gt sdot sdot sdot gt 120582119899119860119880 = 120582119880 997888rarr119880 = 1199061 1199062 119906119899

(13)

Step 5 Determine the final dimension k

sum119896119894=1 120582119894sum119899119894minus1 120582119894 ge 119890 997888rarr1205821 1205822 120582119896

(14)

The cumulative contribution rate of the eigenvalues (119890) isused to measure the representation of the newly generatedprincipal components to the original data Generally 119890 shouldbe greater than or equal to 85 to extract the former 119896principal components as the sample features

Step 6 Output the principal components

119880119896 = (1199061 1199062 119906119896) 119875 = 119909 lowast 119880119896 (15)

Notably the dataset is divided into training and testingsets prior to importing the model in this study Thereforeboth sets must be processed separately when PCA is usedSubtracting the mean value of the training sample and usingthe transformationmatrix obtained from the training sampleare important when the dimension of the testing sample isreduced to ensure that the training and testing samples areswitched to the same sample space In this study the firstfour principal components are selected and some of them areshown in Table 2

33 Decision Tree Decision tree is a common supervisedclassification method in data mining In supervised learninga bunch of samples is provided Each sample has a set ofattributes and a category label These categories are deter-mined in advance and a classifier is then created by a learningalgorithm The topmost node of the decision tree is the rootnode Decision tree classifies a sample from the root to theleaf node Each nonleaf node represents a test of the attributevalue Each branch of the tree represents the results of the testEach leaf node represents a category of the object In shortthe decision tree is a tree structure similar to a flow diagram

A decision tree is built recursively following a top-downapproach It compares and tests the attribute values on itsinternal nodes from the root node then determines thecorresponding branch according to the attribute value ofthe given instance and finally draws the conclusion in itsleaf node The process is repeated on the subtree whichis rooted with a new node Thus far many decision treealgorithms exist but themost commonly used one is the C45algorithmThe pseudocode of the C45 algorithm is shown inPseudocode 1

A leafy decision tree may be created due to the noises andoutliers of the training data which will result in overfittingMany branches reflect anomalies of the data The solutionis pruning and cutting off the most unreliable branchesin which pre- and postpruning are widely used The C45algorithm adopts a pessimistic postpruning If the error ratecan be reduced through replacing a subtree with its leaf nodethe subtree will be pruned

34 LVQNeural Network LVQ is an input feed-forward neu-ral network with supervised learning for training the hiddenlayers The LVQ neural network consists of input hiddenand output layers The hidden layer automatically learns andclassifies the input vectorsThe results of classification dependonly on the distance between the input vectors If the distancebetween the two input vectors is particularly similar thenthe hidden layer divides them into the same class and outputthem

The network is completely connected between the inputand hidden layers meanwhile the hidden layer is partiallyconnected with the output layer Each output layer neuron isconnected to different groups of hidden layer neurons The

Journal of Sensors 7

Table 2 Principal components of some samples

Class PCA1 PCA2 PCA3 PCA4

Trainset

1 minus19913 minus02596 minus10560 minus024121 minus21547 minus19631 19927 086111 minus31183 minus00118 11916 015642 minus21922 06848 minus05967 030522 minus08282 12599 minus06597 066122 minus13281 minus05063 minus08747 044673 37605 16074 02554 minus051143 72074 minus13968 11143 minus056163 42621 18465 02609 00197

Testset

1 minus15410 minus10017 03682 020041 minus12102 minus21453 07366 056071 minus16120 04242 04177 minus056932 22970 minus14313 minus09372 minus261932 minus35065 minus13869 minus17524 minus096302 minus03051 minus08110 minus03567 minus071033 28365 minus04960 minus11518 184983 39953 minus04302 03341 074503 29665 minus09305 minus06439 03039

Input an attribute set dataset DOutput a decision tree

(a) Tree = (b) if119863 is ldquopurerdquo or other end conditions are met then(c) terminate(d) end if(e) for each attribute 119886 isin 119863 do(f) compute information gain ratio (InGR)(g) end for(h) 119886best = attribute with the highest InGR(i) Tree = create a tree with only one node 119886best in the root(j)119863V = generate a subset from119863 except 119886best(k) for all119863V do(l) subtree = C45 (119863V)(m) set the subtree to the corresponding branch of the Tree according to the InGR(n) end for

Pseudocode 1 Pseudocode of the C45 algorithm

number of neurons in the hidden layer is always greater thanthat of the output layer Each hidden layer neuron is con-nectedwith only one output layer neuron and the connectionweight is always 1 However each output layer neuron canbe connected to multiple hidden layer neuronsThe values ofthe hidden and output layer neurons can only be 1 or 0 Theweights between the input and hidden layers are graduallyadjusted to the clustering center during the training When asample is placed into the LVQ neural network the neurons ofthe hidden layer generate the winning neuron by the winner-learning rules allowing the output to be 1 and the other to be

0 The output of the output layer neurons that are connectedto the winning neuron is 1 whereas the other is 0 then theoutput provides the pattern class of the current input sampleThe class learned by the hidden layer becomes a subclass andthe class learned by the output layer becomes the target class[30] The architecture of the LVQ neural network is shown inFigure 3

The training steps of the LVQ algorithm are as follows

Step 1 The learning rate 120578 (120578 gt 0) and the weights 119882119894119895between the input and hidden layers are initialized

8 Journal of Sensors

Input layer

Hidden layerOutput layer

Class 1

Class n

Class i

Input vector

middot middot middot

middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middotx1

xi

xn

Figure 3 Architecture of the LVQ neural network

Step 2 The input vector 119909 = (1199091 1199092 119909119899)119879 is fed to theinput layer and the distance (119889119894) between the hidden layerneuron and the input vector is calculated

119889119894 = radic 119899sum119895=1

(119909119895 minus 119908119894119895)2 (16)

Step 3 Select the hidden layer neuron with the smallestdistance from the input vector If 119889119894 is the minimum then theoutput layer neuron connected to it is labeled with 119888119894Step 4 The input vector is labeled with 119888119909 If 119888119894 = 119888119909 theweights are adjusted as follows

119908119894119895-new = 119908119894119895-old + 120578 (119909 minus 119908119894119895-old) (17)

Otherwise the weights are updated as follows

119908119894119895-new = 119908119894119895-old minus 120578 (119909 minus 119908119894119895-old) (18)

Step 5 Determine whether the maximum number of iter-ations is satisfied The algorithm ends if it is satisfiedOtherwise return to Step 2 and continue the next round oflearning

35 Evidence Theory Data fusion is a method of obtainingthe best decision from different sources of data In recentyears it has attracted significant attention for its wideapplication in fault diagnosis Generally data fusion can beconducted at three levels The first is data-level fusion [31]Raw data from different sources are directly fused to producemore information than the original data At this level thefusion exhibits small loss and high precision but is time-consuming and unstable and has weak anti-interferencecapabilities The second is feature-level fusion [32] At thislevel statistical features are extracted separately using signalprocessing technique All features are fused to find an optimalsubset which is then fed to a classifier for better accuracyAt this level information compression for transmission isachieved but with poor integration accuracy The third isdecision-level fusion [33] This fusion is the highest level

Recognition frame

a b c a b d

a b

a b c d

a c a db c b d

b c d

c d

a c d

Figure 4 Recognition framework of the DS theory

of integration which influences decision making and it isthe ultimate result of three-level integration The decision-level fusion exhibits strong anti-interference ability and smallamount of communication but suffers from large amount ofdata loss and high cost of pretreatment In this paper we focuson decision-level fusion which is known as the DS evidencetheory

The DS evidence theory has been originally establishedin 1967 by Dempster and developed later in 1976 by Shaferwho is a student of Dempster Evidence theory is an extensionof Bayesian method In the Bayesian method the proba-bility must satisfy the additivity which is not the case forthe evidence theory The DS evidence theory can expressuncertainty leaving the rest of the trust to the recognitionframework This theory involves the following mathematicaldefinitions

Definition 1 (recognition framework) Define Ω = 1205791 1205792 120579119899 as a set where Ω is a finite set of possible valuesand 120579119894 is the conclusion of the model The set is called therecognition framework 2Ω is a power set composed of allthe subsets The recognition framework with capacity of fourlayers and the relationship between the subsets is shown inFigure 4 a b c and 119889 are the elements of the framework

Definition 2 (BPA) BPA is a primitive function in DSevidence theory Assume Ω as the recognition frameworkthen m is a mapping from 2120579 to [0 1] and 119860 is the subset ofΩm is called the BPA when it meets the following equation

119898(0) = 0sum119860subeΩ

119898(119860) = 1 (19)

Definition 3 (combination rules) For forall119860 sube Ω a finite num-ber of 119898 functions (1198981 1198982 119898119899) exist on the recognitionframework The combination rules are as follows

119898(0) = 0119898 (119860)

= 1119896 sum1198601cap1198602capsdotsdotsdotcap119860119899=119860

1198981 (1198601) lowast 1198982 (1198602) sdot sdot sdot 119898119899 (119860119899) (20)

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 3: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

Journal of Sensors 3

Inductionmotor

AD

AmplifierComputer Feature

extraction

Accelerometer

Rolling bearing Signal processing unit

USB

Figure 2 Schematic of the setup

presents the methodology used in this study includingfeature extraction decision tree LVQ neural network PCAtechnique and DS evidence theory Section 4 presents theexperimental results together with some discussions FinallySection 5 concludes the paper

2 Experimental Setup and Design

21 Test Rig In the present study the bearing fault diagnosisproblem is to confirm whether a rolling bearing is in goodor faulty condition The rolling bearing state needs to becategorized into the three states good inner race fault andouter race fault The test rig is equipped with fixing andclamping devices for fixing the bearing outer race while theinner race rotates with the shaft The test rig consists of aninduction motor bearing piezoelectric accelerometer signalprocessing unit and computer

In this study five KBC 6203 rolling bearings are derivedby the induction motor (2HP) One is a new bearing withoutany defects to simulate the good condition The other fourbearings are designed to simulate the inner and outer racefaults Two bearings are used to simulate two types of cracksin each fault The defect is created by the spark erosion tech-nique to control the defectThe piezoelectric type accelerom-eter IMI Sensors 608A111 measures the vibration signalsand is mounted on the base of the rolling bearing usingadhesiveThe output of the accelerometer is sent to the signalprocessing unit where the signal goes through the chargeamplifier and analogue-to-digital (AD) converter Vibrationsignals are amplified by a DACELL DNAM100 amplifier andare transformed from analogue to digital signals The signalsare then transmitted to the computer memory through theUSB port Subsequently the signals are read and processedfrom the memory to extract different statistical featuresaccording to the requirements The schematic of the setup isshown in Figure 2

22 Experiment Design Some rubber sheets are added inthe legs of the test rig to avoid environmental noise andvibration thereby obtaining more realistic data The signalprocessing unit is switched on and the first few signals arediscarded purposefully to avoid initial random variationThevibration signals with undamped natural frequency of 10KHzare accepted after the signal becomes stable

The vibration signals are gained from the accelerometerat the base of the rolling bearing The sampling frequencyis 24 kHz the length of the record is 10 s and the samplelength is 1024 for all experiment cases The highest frequencyis found to be 12 kHz via experimenting The sampling

frequency must be twice that of the highest measured fre-quency according to the Nyquist sampling theorem thusit is set at 24 kHz The choice of the sample length isconsidered arbitrary to a certain extent However the statis-tical measurement is more meaningful when the number ofsamples is large meanwhile the computing time increaseswith the number of samples Generally a sample length ofapproximately 1000 is selected to achieve balance In somefeature extraction techniques the sample length is always 2119899and the nearest 2119899 to 1000 is 1024 Therefore 1024 is selectedas the sample length under normal circumstances [25]

Ninety experiments are conducted by varying the param-eters under three different spinning speeds of the bearing(500 700 and 900 rpm) First a rolling bearing without anydefects is used for the good case Ten samples are collectedunder each spinning speed of the bearing Then 30 differentcases are obtained by changing the shaft speed Second theouter race fault condition is conducted in the test rig Thecrack of the outer race fault is created via spark erosiontechnique The crack size is 05mm wide and 07mm deepand the other crack is 03mm wide and 06mm deep Theperformance characteristics of the outer race fault of thebearing are studied as explained for the good case Vibrationsignals with outer race fault are recorded in memory keepingall other modules in good condition A crack corresponds tothree different spinning speeds of the bearing Fifteen samplesare obtained for each speed with five samplesThirty differentcases are obtained by changing the shaft speed and crack sizeThird the inner race fault is simulated by the same crack sizeas that of the outer raceThirty samples are also obtainedThecumulative 90 samples of which the dimensions are reducedby PCA are used as input for the LVQ neural networkSimilarly 90 samples are collected again as the input for thedecision tree

3 Methodology

31 Feature Extraction Generally statistical features are goodindicators of the state of machine operation Vibrationsignals are obtained from different spinning speeds andfault types and the required statistical characteristics canbe extracted using time or frequency domain analysis Themost commonly used statistical feature extraction methodsare fast Fourier transform (FFT) and wavelet transform FFTconverts time domain signal into frequency domain signaland is widely used in signal detection However the FFTmethod is inherently flawed in handling unstable processesThis method only acquires the frequency components of thesignal as a whole but is unaware of the moment at which

4 Journal of Sensors

the components appear The difference between two timedomain signals is large but the spectrum may be the samethus FFT cannot render good performance [26] In compar-ison with FFT wavelet transform is a local transformationof time and frequency [27] Wavelet transform has goodspatial and frequency domain localization characteristics Itcan effectively extract information from the signal throughthe expansion translation and other computing functionsWavelet transform is widely applied in multiscale refinementanalysis of functions or signals It can focus on the detail ofthe analysis object by using fine time domain or space step inhigh frequency which solvesmany problems that FFT cannot[28]

In the present study wavelet transform is used to collecttime domain features of the vibration signals which aregained from the accelerometerWavelet coefficients cannot bedirectly used as input of the diagnostic model thus a featureextraction preprocessing step is required to prepare the datafor the model A large number of features can be extractedfrom each signal which can be divided into two categoriesfeatures with dimensions and dimensionless features Thefeatures with dimensions such as variance mean and peakare more likely to be affected by working conditions Dimen-sionless features such as kurtosis crest and pulse are lesssensitive to external factors Different features reflect differentaspects of the fault information of the bearing Effectivefeature extraction and selection and preprocessing are criticalfor successful classification [29] The increase in the numberof features will inevitably lead to redundancy and curse ofdimensionality while ensuring comprehensive and completeaccess to the fault information To achieve a balanced controlonly 10 statistical characteristics with good sensitivity anddifferentiation to the fault type are selected as inputs to themodel in this work as shown below

(1) Variance It is the measurement of signal dispersiondegree A larger variance indicates a greater fluctuation ofdata whereas a smaller variance indicates a smaller fluc-tuation of data The following formula is used to computevariance

Variance = sum119899119894=1 (119909119894 minus 119909)2119899 (1)

(2) Kurtosis It indicates the flatness or the spikiness of thesignal It is considerably low under normal condition butincreases rapidly with the occurrence of faults It is partic-ularly effective in detecting faults in the signal The followingformula is used to solve for kurtosis

Kurtosis = (1119899)sum119899119894=1 (119909119894 minus 119909)4((1119899)sum119899119894=1 (119909119894 minus 119909)2)2 (2)

(3) Mean It represents the central tendency of the amplitudevariations of the waveform and can be used to describe signalstability which is the static component of the signal Thefollowing formula is used to obtain the mean

Mean = sum119899119894=1 119909119894119899 (3)

(4) Standard Deviation (Std) It is the measurement of theeffective energy of the signal and reflects the discrepancydegree between individuals within the group The followingformula is used for its computation

Std = radic 1119899119899sum119894=1

(119909119894 minus 119909)2 (4)

(5) Skewness It is the measurement of the skew direction andextent of the data distribution and is a numerical feature ofthe degree of asymmetry of the statistical data The followingformula is used to compute skewness

Skewness = (1119899)sum119899119894=1 (119909119894 minus 119909)3((1119899)sum119899119894=1 (119909119894 minus 119909)2)32 (5)

(6) Peak It refers to the instantaneous maximum value of thefault signal in a given time The following formula is used tocompute peak

Peak = max 10038161003816100381610038161199091198941003816100381610038161003816 (6)

(7) Median It refers to the value of a variable in the middle ofthe array which is sorted from small to largewith all variablesThe following formula is used to determine the median

Median = 119909(119899+1)2 119899 is odd1199091198992 + 119909(1198992)+12 119899 is even

(7)

(8) Root Mean Square (RMS) It is an important index indetermining whether the running state is normal in themechanical fault diagnosis system Moreover it reflects themagnitude of the signal energyThe following formula is usedto compute for RMS

RMS = radicsum119899119894=1 (119909119894)2119899 (8)

(9) Crest Factor (CF) It is the measurement of a waveformshowing the ratio of peak values to the effective value In otherwords crest factor indicates the extremeness of peaks in awaveform The following formula is used to compute it

CF = 119909peak119909RMS (9)

(10)119870 Factor It reflects the shock characteristics of vibrationsignals and is sensitive to abnormal pulses that are producedby bearing faults Its normal value is 3 If it is close to or morethan 4 shock vibration exists The following formula is usedto determine 119870 factor

119870 factor = 119909kurtosisx4RMS

(10)

The statistical feature matrix of some samples is shown inTable 1

Journal of Sensors 5

Table1Featurem

atrix

ofsomes

amples

Class

Varia

nce

Kurtosis

Mean

Std

Skew

ness

Peak

Median

RMS

CF119870fa

ctor

Train

set

100214

1615

3900011

00870

20854

05870

minus00307

01377

83919

00873

101115

124934

00011

00984

206

47066

43minus00

13001063

87519

01421

100075

122625

00011

01346

20755

08143

minus00127

0114

882310

02199

200340

150917

00012

01843

26764

12198

minus00255

01843

66173

02248

2000

98107345

000

0900988

16925

06694

minus00104

00998

67763

006

612

01396

2013

6500010

03736

34306

26430

minus00684

03736

70734

09875

300886

2874

87000

0703918

41160

25614

minus00290

02149

63282

09030

301843

268139

00011

05596

40948

32983

minus00845

03269

49603

16723

302343

338579

00010

03422

34560

24916

minus00324

01991

37395

10920

Test

set

100274

120830

00010

00863

20656

09590

minus00218

01096

86386

01770

100559

137078

000

08006

6224363

07831

minus00092

00959

67902

00352

100088

130394

00010

01602

21901

12309

minus00227

01201

68103

00270

200934

172709

00012

01513

05766

06190

minus00632

02841

91493

09153

200059

245557

00011

0119

708212

200

96minus00

70701739

98313

07793

20117

8180737

00011

03140

10114

22261

minus00400

01789

106499

05463

301387

342424

000

0903473

43834

33007

minus00508

04638

93689

14088

303126

287760

000

0703842

41692

18429

minus00832

03311

106652

05623

302959

2913

2200010

04585

33189

23107

minus00182

01846

44879

12974

6 Journal of Sensors

32 Dimensionality Reduction PCA is a statistical methodand is widely used in data reduction By means of orthogonaltransformation a group of variables that may be related toone another are transformed into a set of linearly uncorre-lated variables which is called the principal components Itfunctions to maintain the primary information of originalfeatures while reducing the complexity of the data whichreveals the simple structure behind the complex data PCAis a simple and nonparametric method of extracting relevantinformation from intricate data

The purpose of PCA is to reduce the dimensionality ofdata while preserving as many changes as possible in theoriginal dataset PCA transforms the data into the coordinatesystem thus themaximum variance of any projection of datalies on the first coordinate and the second largest variancefalls on the second coordinate and so on PCA algorithmcan remove redundant information simplify the difficultyof dealing with the problem and improve the resistance toexternal interference through the processing of raw dataTherefore PCA is used in this paperThe specific steps of PCAalgorithm are shown as follows

Step 1 Input sample matrix 119863 = 1199091 1199092 119909119899119879 The rowsof the matrix represent the number of samples whereas thecolumns represent the dimensions

Input percentage of information retention after dimen-sion reduction (e)

Step 2 Calculate the mean by columns

119909 = sum119899119894=1 119909119894119899 (11)

Step 3 Obtain the new sample matrix 119872 with data central-ization 120579119894 = 119909119894 minus 119909

119860 = [1205791 1205792 120579119899] 119872 = 119860119860119879

(12)

Step 4 Calculate the eigenvalues and eigenvectors

119872119880 = 120582119880 997888rarr1205821 gt 1205822 gt sdot sdot sdot gt 120582119899119860119880 = 120582119880 997888rarr119880 = 1199061 1199062 119906119899

(13)

Step 5 Determine the final dimension k

sum119896119894=1 120582119894sum119899119894minus1 120582119894 ge 119890 997888rarr1205821 1205822 120582119896

(14)

The cumulative contribution rate of the eigenvalues (119890) isused to measure the representation of the newly generatedprincipal components to the original data Generally 119890 shouldbe greater than or equal to 85 to extract the former 119896principal components as the sample features

Step 6 Output the principal components

119880119896 = (1199061 1199062 119906119896) 119875 = 119909 lowast 119880119896 (15)

Notably the dataset is divided into training and testingsets prior to importing the model in this study Thereforeboth sets must be processed separately when PCA is usedSubtracting the mean value of the training sample and usingthe transformationmatrix obtained from the training sampleare important when the dimension of the testing sample isreduced to ensure that the training and testing samples areswitched to the same sample space In this study the firstfour principal components are selected and some of them areshown in Table 2

33 Decision Tree Decision tree is a common supervisedclassification method in data mining In supervised learninga bunch of samples is provided Each sample has a set ofattributes and a category label These categories are deter-mined in advance and a classifier is then created by a learningalgorithm The topmost node of the decision tree is the rootnode Decision tree classifies a sample from the root to theleaf node Each nonleaf node represents a test of the attributevalue Each branch of the tree represents the results of the testEach leaf node represents a category of the object In shortthe decision tree is a tree structure similar to a flow diagram

A decision tree is built recursively following a top-downapproach It compares and tests the attribute values on itsinternal nodes from the root node then determines thecorresponding branch according to the attribute value ofthe given instance and finally draws the conclusion in itsleaf node The process is repeated on the subtree whichis rooted with a new node Thus far many decision treealgorithms exist but themost commonly used one is the C45algorithmThe pseudocode of the C45 algorithm is shown inPseudocode 1

A leafy decision tree may be created due to the noises andoutliers of the training data which will result in overfittingMany branches reflect anomalies of the data The solutionis pruning and cutting off the most unreliable branchesin which pre- and postpruning are widely used The C45algorithm adopts a pessimistic postpruning If the error ratecan be reduced through replacing a subtree with its leaf nodethe subtree will be pruned

34 LVQNeural Network LVQ is an input feed-forward neu-ral network with supervised learning for training the hiddenlayers The LVQ neural network consists of input hiddenand output layers The hidden layer automatically learns andclassifies the input vectorsThe results of classification dependonly on the distance between the input vectors If the distancebetween the two input vectors is particularly similar thenthe hidden layer divides them into the same class and outputthem

The network is completely connected between the inputand hidden layers meanwhile the hidden layer is partiallyconnected with the output layer Each output layer neuron isconnected to different groups of hidden layer neurons The

Journal of Sensors 7

Table 2 Principal components of some samples

Class PCA1 PCA2 PCA3 PCA4

Trainset

1 minus19913 minus02596 minus10560 minus024121 minus21547 minus19631 19927 086111 minus31183 minus00118 11916 015642 minus21922 06848 minus05967 030522 minus08282 12599 minus06597 066122 minus13281 minus05063 minus08747 044673 37605 16074 02554 minus051143 72074 minus13968 11143 minus056163 42621 18465 02609 00197

Testset

1 minus15410 minus10017 03682 020041 minus12102 minus21453 07366 056071 minus16120 04242 04177 minus056932 22970 minus14313 minus09372 minus261932 minus35065 minus13869 minus17524 minus096302 minus03051 minus08110 minus03567 minus071033 28365 minus04960 minus11518 184983 39953 minus04302 03341 074503 29665 minus09305 minus06439 03039

Input an attribute set dataset DOutput a decision tree

(a) Tree = (b) if119863 is ldquopurerdquo or other end conditions are met then(c) terminate(d) end if(e) for each attribute 119886 isin 119863 do(f) compute information gain ratio (InGR)(g) end for(h) 119886best = attribute with the highest InGR(i) Tree = create a tree with only one node 119886best in the root(j)119863V = generate a subset from119863 except 119886best(k) for all119863V do(l) subtree = C45 (119863V)(m) set the subtree to the corresponding branch of the Tree according to the InGR(n) end for

Pseudocode 1 Pseudocode of the C45 algorithm

number of neurons in the hidden layer is always greater thanthat of the output layer Each hidden layer neuron is con-nectedwith only one output layer neuron and the connectionweight is always 1 However each output layer neuron canbe connected to multiple hidden layer neuronsThe values ofthe hidden and output layer neurons can only be 1 or 0 Theweights between the input and hidden layers are graduallyadjusted to the clustering center during the training When asample is placed into the LVQ neural network the neurons ofthe hidden layer generate the winning neuron by the winner-learning rules allowing the output to be 1 and the other to be

0 The output of the output layer neurons that are connectedto the winning neuron is 1 whereas the other is 0 then theoutput provides the pattern class of the current input sampleThe class learned by the hidden layer becomes a subclass andthe class learned by the output layer becomes the target class[30] The architecture of the LVQ neural network is shown inFigure 3

The training steps of the LVQ algorithm are as follows

Step 1 The learning rate 120578 (120578 gt 0) and the weights 119882119894119895between the input and hidden layers are initialized

8 Journal of Sensors

Input layer

Hidden layerOutput layer

Class 1

Class n

Class i

Input vector

middot middot middot

middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middotx1

xi

xn

Figure 3 Architecture of the LVQ neural network

Step 2 The input vector 119909 = (1199091 1199092 119909119899)119879 is fed to theinput layer and the distance (119889119894) between the hidden layerneuron and the input vector is calculated

119889119894 = radic 119899sum119895=1

(119909119895 minus 119908119894119895)2 (16)

Step 3 Select the hidden layer neuron with the smallestdistance from the input vector If 119889119894 is the minimum then theoutput layer neuron connected to it is labeled with 119888119894Step 4 The input vector is labeled with 119888119909 If 119888119894 = 119888119909 theweights are adjusted as follows

119908119894119895-new = 119908119894119895-old + 120578 (119909 minus 119908119894119895-old) (17)

Otherwise the weights are updated as follows

119908119894119895-new = 119908119894119895-old minus 120578 (119909 minus 119908119894119895-old) (18)

Step 5 Determine whether the maximum number of iter-ations is satisfied The algorithm ends if it is satisfiedOtherwise return to Step 2 and continue the next round oflearning

35 Evidence Theory Data fusion is a method of obtainingthe best decision from different sources of data In recentyears it has attracted significant attention for its wideapplication in fault diagnosis Generally data fusion can beconducted at three levels The first is data-level fusion [31]Raw data from different sources are directly fused to producemore information than the original data At this level thefusion exhibits small loss and high precision but is time-consuming and unstable and has weak anti-interferencecapabilities The second is feature-level fusion [32] At thislevel statistical features are extracted separately using signalprocessing technique All features are fused to find an optimalsubset which is then fed to a classifier for better accuracyAt this level information compression for transmission isachieved but with poor integration accuracy The third isdecision-level fusion [33] This fusion is the highest level

Recognition frame

a b c a b d

a b

a b c d

a c a db c b d

b c d

c d

a c d

Figure 4 Recognition framework of the DS theory

of integration which influences decision making and it isthe ultimate result of three-level integration The decision-level fusion exhibits strong anti-interference ability and smallamount of communication but suffers from large amount ofdata loss and high cost of pretreatment In this paper we focuson decision-level fusion which is known as the DS evidencetheory

The DS evidence theory has been originally establishedin 1967 by Dempster and developed later in 1976 by Shaferwho is a student of Dempster Evidence theory is an extensionof Bayesian method In the Bayesian method the proba-bility must satisfy the additivity which is not the case forthe evidence theory The DS evidence theory can expressuncertainty leaving the rest of the trust to the recognitionframework This theory involves the following mathematicaldefinitions

Definition 1 (recognition framework) Define Ω = 1205791 1205792 120579119899 as a set where Ω is a finite set of possible valuesand 120579119894 is the conclusion of the model The set is called therecognition framework 2Ω is a power set composed of allthe subsets The recognition framework with capacity of fourlayers and the relationship between the subsets is shown inFigure 4 a b c and 119889 are the elements of the framework

Definition 2 (BPA) BPA is a primitive function in DSevidence theory Assume Ω as the recognition frameworkthen m is a mapping from 2120579 to [0 1] and 119860 is the subset ofΩm is called the BPA when it meets the following equation

119898(0) = 0sum119860subeΩ

119898(119860) = 1 (19)

Definition 3 (combination rules) For forall119860 sube Ω a finite num-ber of 119898 functions (1198981 1198982 119898119899) exist on the recognitionframework The combination rules are as follows

119898(0) = 0119898 (119860)

= 1119896 sum1198601cap1198602capsdotsdotsdotcap119860119899=119860

1198981 (1198601) lowast 1198982 (1198602) sdot sdot sdot 119898119899 (119860119899) (20)

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 4: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

4 Journal of Sensors

the components appear The difference between two timedomain signals is large but the spectrum may be the samethus FFT cannot render good performance [26] In compar-ison with FFT wavelet transform is a local transformationof time and frequency [27] Wavelet transform has goodspatial and frequency domain localization characteristics Itcan effectively extract information from the signal throughthe expansion translation and other computing functionsWavelet transform is widely applied in multiscale refinementanalysis of functions or signals It can focus on the detail ofthe analysis object by using fine time domain or space step inhigh frequency which solvesmany problems that FFT cannot[28]

In the present study wavelet transform is used to collecttime domain features of the vibration signals which aregained from the accelerometerWavelet coefficients cannot bedirectly used as input of the diagnostic model thus a featureextraction preprocessing step is required to prepare the datafor the model A large number of features can be extractedfrom each signal which can be divided into two categoriesfeatures with dimensions and dimensionless features Thefeatures with dimensions such as variance mean and peakare more likely to be affected by working conditions Dimen-sionless features such as kurtosis crest and pulse are lesssensitive to external factors Different features reflect differentaspects of the fault information of the bearing Effectivefeature extraction and selection and preprocessing are criticalfor successful classification [29] The increase in the numberof features will inevitably lead to redundancy and curse ofdimensionality while ensuring comprehensive and completeaccess to the fault information To achieve a balanced controlonly 10 statistical characteristics with good sensitivity anddifferentiation to the fault type are selected as inputs to themodel in this work as shown below

(1) Variance It is the measurement of signal dispersiondegree A larger variance indicates a greater fluctuation ofdata whereas a smaller variance indicates a smaller fluc-tuation of data The following formula is used to computevariance

Variance = sum119899119894=1 (119909119894 minus 119909)2119899 (1)

(2) Kurtosis It indicates the flatness or the spikiness of thesignal It is considerably low under normal condition butincreases rapidly with the occurrence of faults It is partic-ularly effective in detecting faults in the signal The followingformula is used to solve for kurtosis

Kurtosis = (1119899)sum119899119894=1 (119909119894 minus 119909)4((1119899)sum119899119894=1 (119909119894 minus 119909)2)2 (2)

(3) Mean It represents the central tendency of the amplitudevariations of the waveform and can be used to describe signalstability which is the static component of the signal Thefollowing formula is used to obtain the mean

Mean = sum119899119894=1 119909119894119899 (3)

(4) Standard Deviation (Std) It is the measurement of theeffective energy of the signal and reflects the discrepancydegree between individuals within the group The followingformula is used for its computation

Std = radic 1119899119899sum119894=1

(119909119894 minus 119909)2 (4)

(5) Skewness It is the measurement of the skew direction andextent of the data distribution and is a numerical feature ofthe degree of asymmetry of the statistical data The followingformula is used to compute skewness

Skewness = (1119899)sum119899119894=1 (119909119894 minus 119909)3((1119899)sum119899119894=1 (119909119894 minus 119909)2)32 (5)

(6) Peak It refers to the instantaneous maximum value of thefault signal in a given time The following formula is used tocompute peak

Peak = max 10038161003816100381610038161199091198941003816100381610038161003816 (6)

(7) Median It refers to the value of a variable in the middle ofthe array which is sorted from small to largewith all variablesThe following formula is used to determine the median

Median = 119909(119899+1)2 119899 is odd1199091198992 + 119909(1198992)+12 119899 is even

(7)

(8) Root Mean Square (RMS) It is an important index indetermining whether the running state is normal in themechanical fault diagnosis system Moreover it reflects themagnitude of the signal energyThe following formula is usedto compute for RMS

RMS = radicsum119899119894=1 (119909119894)2119899 (8)

(9) Crest Factor (CF) It is the measurement of a waveformshowing the ratio of peak values to the effective value In otherwords crest factor indicates the extremeness of peaks in awaveform The following formula is used to compute it

CF = 119909peak119909RMS (9)

(10)119870 Factor It reflects the shock characteristics of vibrationsignals and is sensitive to abnormal pulses that are producedby bearing faults Its normal value is 3 If it is close to or morethan 4 shock vibration exists The following formula is usedto determine 119870 factor

119870 factor = 119909kurtosisx4RMS

(10)

The statistical feature matrix of some samples is shown inTable 1

Journal of Sensors 5

Table1Featurem

atrix

ofsomes

amples

Class

Varia

nce

Kurtosis

Mean

Std

Skew

ness

Peak

Median

RMS

CF119870fa

ctor

Train

set

100214

1615

3900011

00870

20854

05870

minus00307

01377

83919

00873

101115

124934

00011

00984

206

47066

43minus00

13001063

87519

01421

100075

122625

00011

01346

20755

08143

minus00127

0114

882310

02199

200340

150917

00012

01843

26764

12198

minus00255

01843

66173

02248

2000

98107345

000

0900988

16925

06694

minus00104

00998

67763

006

612

01396

2013

6500010

03736

34306

26430

minus00684

03736

70734

09875

300886

2874

87000

0703918

41160

25614

minus00290

02149

63282

09030

301843

268139

00011

05596

40948

32983

minus00845

03269

49603

16723

302343

338579

00010

03422

34560

24916

minus00324

01991

37395

10920

Test

set

100274

120830

00010

00863

20656

09590

minus00218

01096

86386

01770

100559

137078

000

08006

6224363

07831

minus00092

00959

67902

00352

100088

130394

00010

01602

21901

12309

minus00227

01201

68103

00270

200934

172709

00012

01513

05766

06190

minus00632

02841

91493

09153

200059

245557

00011

0119

708212

200

96minus00

70701739

98313

07793

20117

8180737

00011

03140

10114

22261

minus00400

01789

106499

05463

301387

342424

000

0903473

43834

33007

minus00508

04638

93689

14088

303126

287760

000

0703842

41692

18429

minus00832

03311

106652

05623

302959

2913

2200010

04585

33189

23107

minus00182

01846

44879

12974

6 Journal of Sensors

32 Dimensionality Reduction PCA is a statistical methodand is widely used in data reduction By means of orthogonaltransformation a group of variables that may be related toone another are transformed into a set of linearly uncorre-lated variables which is called the principal components Itfunctions to maintain the primary information of originalfeatures while reducing the complexity of the data whichreveals the simple structure behind the complex data PCAis a simple and nonparametric method of extracting relevantinformation from intricate data

The purpose of PCA is to reduce the dimensionality ofdata while preserving as many changes as possible in theoriginal dataset PCA transforms the data into the coordinatesystem thus themaximum variance of any projection of datalies on the first coordinate and the second largest variancefalls on the second coordinate and so on PCA algorithmcan remove redundant information simplify the difficultyof dealing with the problem and improve the resistance toexternal interference through the processing of raw dataTherefore PCA is used in this paperThe specific steps of PCAalgorithm are shown as follows

Step 1 Input sample matrix 119863 = 1199091 1199092 119909119899119879 The rowsof the matrix represent the number of samples whereas thecolumns represent the dimensions

Input percentage of information retention after dimen-sion reduction (e)

Step 2 Calculate the mean by columns

119909 = sum119899119894=1 119909119894119899 (11)

Step 3 Obtain the new sample matrix 119872 with data central-ization 120579119894 = 119909119894 minus 119909

119860 = [1205791 1205792 120579119899] 119872 = 119860119860119879

(12)

Step 4 Calculate the eigenvalues and eigenvectors

119872119880 = 120582119880 997888rarr1205821 gt 1205822 gt sdot sdot sdot gt 120582119899119860119880 = 120582119880 997888rarr119880 = 1199061 1199062 119906119899

(13)

Step 5 Determine the final dimension k

sum119896119894=1 120582119894sum119899119894minus1 120582119894 ge 119890 997888rarr1205821 1205822 120582119896

(14)

The cumulative contribution rate of the eigenvalues (119890) isused to measure the representation of the newly generatedprincipal components to the original data Generally 119890 shouldbe greater than or equal to 85 to extract the former 119896principal components as the sample features

Step 6 Output the principal components

119880119896 = (1199061 1199062 119906119896) 119875 = 119909 lowast 119880119896 (15)

Notably the dataset is divided into training and testingsets prior to importing the model in this study Thereforeboth sets must be processed separately when PCA is usedSubtracting the mean value of the training sample and usingthe transformationmatrix obtained from the training sampleare important when the dimension of the testing sample isreduced to ensure that the training and testing samples areswitched to the same sample space In this study the firstfour principal components are selected and some of them areshown in Table 2

33 Decision Tree Decision tree is a common supervisedclassification method in data mining In supervised learninga bunch of samples is provided Each sample has a set ofattributes and a category label These categories are deter-mined in advance and a classifier is then created by a learningalgorithm The topmost node of the decision tree is the rootnode Decision tree classifies a sample from the root to theleaf node Each nonleaf node represents a test of the attributevalue Each branch of the tree represents the results of the testEach leaf node represents a category of the object In shortthe decision tree is a tree structure similar to a flow diagram

A decision tree is built recursively following a top-downapproach It compares and tests the attribute values on itsinternal nodes from the root node then determines thecorresponding branch according to the attribute value ofthe given instance and finally draws the conclusion in itsleaf node The process is repeated on the subtree whichis rooted with a new node Thus far many decision treealgorithms exist but themost commonly used one is the C45algorithmThe pseudocode of the C45 algorithm is shown inPseudocode 1

A leafy decision tree may be created due to the noises andoutliers of the training data which will result in overfittingMany branches reflect anomalies of the data The solutionis pruning and cutting off the most unreliable branchesin which pre- and postpruning are widely used The C45algorithm adopts a pessimistic postpruning If the error ratecan be reduced through replacing a subtree with its leaf nodethe subtree will be pruned

34 LVQNeural Network LVQ is an input feed-forward neu-ral network with supervised learning for training the hiddenlayers The LVQ neural network consists of input hiddenand output layers The hidden layer automatically learns andclassifies the input vectorsThe results of classification dependonly on the distance between the input vectors If the distancebetween the two input vectors is particularly similar thenthe hidden layer divides them into the same class and outputthem

The network is completely connected between the inputand hidden layers meanwhile the hidden layer is partiallyconnected with the output layer Each output layer neuron isconnected to different groups of hidden layer neurons The

Journal of Sensors 7

Table 2 Principal components of some samples

Class PCA1 PCA2 PCA3 PCA4

Trainset

1 minus19913 minus02596 minus10560 minus024121 minus21547 minus19631 19927 086111 minus31183 minus00118 11916 015642 minus21922 06848 minus05967 030522 minus08282 12599 minus06597 066122 minus13281 minus05063 minus08747 044673 37605 16074 02554 minus051143 72074 minus13968 11143 minus056163 42621 18465 02609 00197

Testset

1 minus15410 minus10017 03682 020041 minus12102 minus21453 07366 056071 minus16120 04242 04177 minus056932 22970 minus14313 minus09372 minus261932 minus35065 minus13869 minus17524 minus096302 minus03051 minus08110 minus03567 minus071033 28365 minus04960 minus11518 184983 39953 minus04302 03341 074503 29665 minus09305 minus06439 03039

Input an attribute set dataset DOutput a decision tree

(a) Tree = (b) if119863 is ldquopurerdquo or other end conditions are met then(c) terminate(d) end if(e) for each attribute 119886 isin 119863 do(f) compute information gain ratio (InGR)(g) end for(h) 119886best = attribute with the highest InGR(i) Tree = create a tree with only one node 119886best in the root(j)119863V = generate a subset from119863 except 119886best(k) for all119863V do(l) subtree = C45 (119863V)(m) set the subtree to the corresponding branch of the Tree according to the InGR(n) end for

Pseudocode 1 Pseudocode of the C45 algorithm

number of neurons in the hidden layer is always greater thanthat of the output layer Each hidden layer neuron is con-nectedwith only one output layer neuron and the connectionweight is always 1 However each output layer neuron canbe connected to multiple hidden layer neuronsThe values ofthe hidden and output layer neurons can only be 1 or 0 Theweights between the input and hidden layers are graduallyadjusted to the clustering center during the training When asample is placed into the LVQ neural network the neurons ofthe hidden layer generate the winning neuron by the winner-learning rules allowing the output to be 1 and the other to be

0 The output of the output layer neurons that are connectedto the winning neuron is 1 whereas the other is 0 then theoutput provides the pattern class of the current input sampleThe class learned by the hidden layer becomes a subclass andthe class learned by the output layer becomes the target class[30] The architecture of the LVQ neural network is shown inFigure 3

The training steps of the LVQ algorithm are as follows

Step 1 The learning rate 120578 (120578 gt 0) and the weights 119882119894119895between the input and hidden layers are initialized

8 Journal of Sensors

Input layer

Hidden layerOutput layer

Class 1

Class n

Class i

Input vector

middot middot middot

middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middotx1

xi

xn

Figure 3 Architecture of the LVQ neural network

Step 2 The input vector 119909 = (1199091 1199092 119909119899)119879 is fed to theinput layer and the distance (119889119894) between the hidden layerneuron and the input vector is calculated

119889119894 = radic 119899sum119895=1

(119909119895 minus 119908119894119895)2 (16)

Step 3 Select the hidden layer neuron with the smallestdistance from the input vector If 119889119894 is the minimum then theoutput layer neuron connected to it is labeled with 119888119894Step 4 The input vector is labeled with 119888119909 If 119888119894 = 119888119909 theweights are adjusted as follows

119908119894119895-new = 119908119894119895-old + 120578 (119909 minus 119908119894119895-old) (17)

Otherwise the weights are updated as follows

119908119894119895-new = 119908119894119895-old minus 120578 (119909 minus 119908119894119895-old) (18)

Step 5 Determine whether the maximum number of iter-ations is satisfied The algorithm ends if it is satisfiedOtherwise return to Step 2 and continue the next round oflearning

35 Evidence Theory Data fusion is a method of obtainingthe best decision from different sources of data In recentyears it has attracted significant attention for its wideapplication in fault diagnosis Generally data fusion can beconducted at three levels The first is data-level fusion [31]Raw data from different sources are directly fused to producemore information than the original data At this level thefusion exhibits small loss and high precision but is time-consuming and unstable and has weak anti-interferencecapabilities The second is feature-level fusion [32] At thislevel statistical features are extracted separately using signalprocessing technique All features are fused to find an optimalsubset which is then fed to a classifier for better accuracyAt this level information compression for transmission isachieved but with poor integration accuracy The third isdecision-level fusion [33] This fusion is the highest level

Recognition frame

a b c a b d

a b

a b c d

a c a db c b d

b c d

c d

a c d

Figure 4 Recognition framework of the DS theory

of integration which influences decision making and it isthe ultimate result of three-level integration The decision-level fusion exhibits strong anti-interference ability and smallamount of communication but suffers from large amount ofdata loss and high cost of pretreatment In this paper we focuson decision-level fusion which is known as the DS evidencetheory

The DS evidence theory has been originally establishedin 1967 by Dempster and developed later in 1976 by Shaferwho is a student of Dempster Evidence theory is an extensionof Bayesian method In the Bayesian method the proba-bility must satisfy the additivity which is not the case forthe evidence theory The DS evidence theory can expressuncertainty leaving the rest of the trust to the recognitionframework This theory involves the following mathematicaldefinitions

Definition 1 (recognition framework) Define Ω = 1205791 1205792 120579119899 as a set where Ω is a finite set of possible valuesand 120579119894 is the conclusion of the model The set is called therecognition framework 2Ω is a power set composed of allthe subsets The recognition framework with capacity of fourlayers and the relationship between the subsets is shown inFigure 4 a b c and 119889 are the elements of the framework

Definition 2 (BPA) BPA is a primitive function in DSevidence theory Assume Ω as the recognition frameworkthen m is a mapping from 2120579 to [0 1] and 119860 is the subset ofΩm is called the BPA when it meets the following equation

119898(0) = 0sum119860subeΩ

119898(119860) = 1 (19)

Definition 3 (combination rules) For forall119860 sube Ω a finite num-ber of 119898 functions (1198981 1198982 119898119899) exist on the recognitionframework The combination rules are as follows

119898(0) = 0119898 (119860)

= 1119896 sum1198601cap1198602capsdotsdotsdotcap119860119899=119860

1198981 (1198601) lowast 1198982 (1198602) sdot sdot sdot 119898119899 (119860119899) (20)

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 5: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

Journal of Sensors 5

Table1Featurem

atrix

ofsomes

amples

Class

Varia

nce

Kurtosis

Mean

Std

Skew

ness

Peak

Median

RMS

CF119870fa

ctor

Train

set

100214

1615

3900011

00870

20854

05870

minus00307

01377

83919

00873

101115

124934

00011

00984

206

47066

43minus00

13001063

87519

01421

100075

122625

00011

01346

20755

08143

minus00127

0114

882310

02199

200340

150917

00012

01843

26764

12198

minus00255

01843

66173

02248

2000

98107345

000

0900988

16925

06694

minus00104

00998

67763

006

612

01396

2013

6500010

03736

34306

26430

minus00684

03736

70734

09875

300886

2874

87000

0703918

41160

25614

minus00290

02149

63282

09030

301843

268139

00011

05596

40948

32983

minus00845

03269

49603

16723

302343

338579

00010

03422

34560

24916

minus00324

01991

37395

10920

Test

set

100274

120830

00010

00863

20656

09590

minus00218

01096

86386

01770

100559

137078

000

08006

6224363

07831

minus00092

00959

67902

00352

100088

130394

00010

01602

21901

12309

minus00227

01201

68103

00270

200934

172709

00012

01513

05766

06190

minus00632

02841

91493

09153

200059

245557

00011

0119

708212

200

96minus00

70701739

98313

07793

20117

8180737

00011

03140

10114

22261

minus00400

01789

106499

05463

301387

342424

000

0903473

43834

33007

minus00508

04638

93689

14088

303126

287760

000

0703842

41692

18429

minus00832

03311

106652

05623

302959

2913

2200010

04585

33189

23107

minus00182

01846

44879

12974

6 Journal of Sensors

32 Dimensionality Reduction PCA is a statistical methodand is widely used in data reduction By means of orthogonaltransformation a group of variables that may be related toone another are transformed into a set of linearly uncorre-lated variables which is called the principal components Itfunctions to maintain the primary information of originalfeatures while reducing the complexity of the data whichreveals the simple structure behind the complex data PCAis a simple and nonparametric method of extracting relevantinformation from intricate data

The purpose of PCA is to reduce the dimensionality ofdata while preserving as many changes as possible in theoriginal dataset PCA transforms the data into the coordinatesystem thus themaximum variance of any projection of datalies on the first coordinate and the second largest variancefalls on the second coordinate and so on PCA algorithmcan remove redundant information simplify the difficultyof dealing with the problem and improve the resistance toexternal interference through the processing of raw dataTherefore PCA is used in this paperThe specific steps of PCAalgorithm are shown as follows

Step 1 Input sample matrix 119863 = 1199091 1199092 119909119899119879 The rowsof the matrix represent the number of samples whereas thecolumns represent the dimensions

Input percentage of information retention after dimen-sion reduction (e)

Step 2 Calculate the mean by columns

119909 = sum119899119894=1 119909119894119899 (11)

Step 3 Obtain the new sample matrix 119872 with data central-ization 120579119894 = 119909119894 minus 119909

119860 = [1205791 1205792 120579119899] 119872 = 119860119860119879

(12)

Step 4 Calculate the eigenvalues and eigenvectors

119872119880 = 120582119880 997888rarr1205821 gt 1205822 gt sdot sdot sdot gt 120582119899119860119880 = 120582119880 997888rarr119880 = 1199061 1199062 119906119899

(13)

Step 5 Determine the final dimension k

sum119896119894=1 120582119894sum119899119894minus1 120582119894 ge 119890 997888rarr1205821 1205822 120582119896

(14)

The cumulative contribution rate of the eigenvalues (119890) isused to measure the representation of the newly generatedprincipal components to the original data Generally 119890 shouldbe greater than or equal to 85 to extract the former 119896principal components as the sample features

Step 6 Output the principal components

119880119896 = (1199061 1199062 119906119896) 119875 = 119909 lowast 119880119896 (15)

Notably the dataset is divided into training and testingsets prior to importing the model in this study Thereforeboth sets must be processed separately when PCA is usedSubtracting the mean value of the training sample and usingthe transformationmatrix obtained from the training sampleare important when the dimension of the testing sample isreduced to ensure that the training and testing samples areswitched to the same sample space In this study the firstfour principal components are selected and some of them areshown in Table 2

33 Decision Tree Decision tree is a common supervisedclassification method in data mining In supervised learninga bunch of samples is provided Each sample has a set ofattributes and a category label These categories are deter-mined in advance and a classifier is then created by a learningalgorithm The topmost node of the decision tree is the rootnode Decision tree classifies a sample from the root to theleaf node Each nonleaf node represents a test of the attributevalue Each branch of the tree represents the results of the testEach leaf node represents a category of the object In shortthe decision tree is a tree structure similar to a flow diagram

A decision tree is built recursively following a top-downapproach It compares and tests the attribute values on itsinternal nodes from the root node then determines thecorresponding branch according to the attribute value ofthe given instance and finally draws the conclusion in itsleaf node The process is repeated on the subtree whichis rooted with a new node Thus far many decision treealgorithms exist but themost commonly used one is the C45algorithmThe pseudocode of the C45 algorithm is shown inPseudocode 1

A leafy decision tree may be created due to the noises andoutliers of the training data which will result in overfittingMany branches reflect anomalies of the data The solutionis pruning and cutting off the most unreliable branchesin which pre- and postpruning are widely used The C45algorithm adopts a pessimistic postpruning If the error ratecan be reduced through replacing a subtree with its leaf nodethe subtree will be pruned

34 LVQNeural Network LVQ is an input feed-forward neu-ral network with supervised learning for training the hiddenlayers The LVQ neural network consists of input hiddenand output layers The hidden layer automatically learns andclassifies the input vectorsThe results of classification dependonly on the distance between the input vectors If the distancebetween the two input vectors is particularly similar thenthe hidden layer divides them into the same class and outputthem

The network is completely connected between the inputand hidden layers meanwhile the hidden layer is partiallyconnected with the output layer Each output layer neuron isconnected to different groups of hidden layer neurons The

Journal of Sensors 7

Table 2 Principal components of some samples

Class PCA1 PCA2 PCA3 PCA4

Trainset

1 minus19913 minus02596 minus10560 minus024121 minus21547 minus19631 19927 086111 minus31183 minus00118 11916 015642 minus21922 06848 minus05967 030522 minus08282 12599 minus06597 066122 minus13281 minus05063 minus08747 044673 37605 16074 02554 minus051143 72074 minus13968 11143 minus056163 42621 18465 02609 00197

Testset

1 minus15410 minus10017 03682 020041 minus12102 minus21453 07366 056071 minus16120 04242 04177 minus056932 22970 minus14313 minus09372 minus261932 minus35065 minus13869 minus17524 minus096302 minus03051 minus08110 minus03567 minus071033 28365 minus04960 minus11518 184983 39953 minus04302 03341 074503 29665 minus09305 minus06439 03039

Input an attribute set dataset DOutput a decision tree

(a) Tree = (b) if119863 is ldquopurerdquo or other end conditions are met then(c) terminate(d) end if(e) for each attribute 119886 isin 119863 do(f) compute information gain ratio (InGR)(g) end for(h) 119886best = attribute with the highest InGR(i) Tree = create a tree with only one node 119886best in the root(j)119863V = generate a subset from119863 except 119886best(k) for all119863V do(l) subtree = C45 (119863V)(m) set the subtree to the corresponding branch of the Tree according to the InGR(n) end for

Pseudocode 1 Pseudocode of the C45 algorithm

number of neurons in the hidden layer is always greater thanthat of the output layer Each hidden layer neuron is con-nectedwith only one output layer neuron and the connectionweight is always 1 However each output layer neuron canbe connected to multiple hidden layer neuronsThe values ofthe hidden and output layer neurons can only be 1 or 0 Theweights between the input and hidden layers are graduallyadjusted to the clustering center during the training When asample is placed into the LVQ neural network the neurons ofthe hidden layer generate the winning neuron by the winner-learning rules allowing the output to be 1 and the other to be

0 The output of the output layer neurons that are connectedto the winning neuron is 1 whereas the other is 0 then theoutput provides the pattern class of the current input sampleThe class learned by the hidden layer becomes a subclass andthe class learned by the output layer becomes the target class[30] The architecture of the LVQ neural network is shown inFigure 3

The training steps of the LVQ algorithm are as follows

Step 1 The learning rate 120578 (120578 gt 0) and the weights 119882119894119895between the input and hidden layers are initialized

8 Journal of Sensors

Input layer

Hidden layerOutput layer

Class 1

Class n

Class i

Input vector

middot middot middot

middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middotx1

xi

xn

Figure 3 Architecture of the LVQ neural network

Step 2 The input vector 119909 = (1199091 1199092 119909119899)119879 is fed to theinput layer and the distance (119889119894) between the hidden layerneuron and the input vector is calculated

119889119894 = radic 119899sum119895=1

(119909119895 minus 119908119894119895)2 (16)

Step 3 Select the hidden layer neuron with the smallestdistance from the input vector If 119889119894 is the minimum then theoutput layer neuron connected to it is labeled with 119888119894Step 4 The input vector is labeled with 119888119909 If 119888119894 = 119888119909 theweights are adjusted as follows

119908119894119895-new = 119908119894119895-old + 120578 (119909 minus 119908119894119895-old) (17)

Otherwise the weights are updated as follows

119908119894119895-new = 119908119894119895-old minus 120578 (119909 minus 119908119894119895-old) (18)

Step 5 Determine whether the maximum number of iter-ations is satisfied The algorithm ends if it is satisfiedOtherwise return to Step 2 and continue the next round oflearning

35 Evidence Theory Data fusion is a method of obtainingthe best decision from different sources of data In recentyears it has attracted significant attention for its wideapplication in fault diagnosis Generally data fusion can beconducted at three levels The first is data-level fusion [31]Raw data from different sources are directly fused to producemore information than the original data At this level thefusion exhibits small loss and high precision but is time-consuming and unstable and has weak anti-interferencecapabilities The second is feature-level fusion [32] At thislevel statistical features are extracted separately using signalprocessing technique All features are fused to find an optimalsubset which is then fed to a classifier for better accuracyAt this level information compression for transmission isachieved but with poor integration accuracy The third isdecision-level fusion [33] This fusion is the highest level

Recognition frame

a b c a b d

a b

a b c d

a c a db c b d

b c d

c d

a c d

Figure 4 Recognition framework of the DS theory

of integration which influences decision making and it isthe ultimate result of three-level integration The decision-level fusion exhibits strong anti-interference ability and smallamount of communication but suffers from large amount ofdata loss and high cost of pretreatment In this paper we focuson decision-level fusion which is known as the DS evidencetheory

The DS evidence theory has been originally establishedin 1967 by Dempster and developed later in 1976 by Shaferwho is a student of Dempster Evidence theory is an extensionof Bayesian method In the Bayesian method the proba-bility must satisfy the additivity which is not the case forthe evidence theory The DS evidence theory can expressuncertainty leaving the rest of the trust to the recognitionframework This theory involves the following mathematicaldefinitions

Definition 1 (recognition framework) Define Ω = 1205791 1205792 120579119899 as a set where Ω is a finite set of possible valuesand 120579119894 is the conclusion of the model The set is called therecognition framework 2Ω is a power set composed of allthe subsets The recognition framework with capacity of fourlayers and the relationship between the subsets is shown inFigure 4 a b c and 119889 are the elements of the framework

Definition 2 (BPA) BPA is a primitive function in DSevidence theory Assume Ω as the recognition frameworkthen m is a mapping from 2120579 to [0 1] and 119860 is the subset ofΩm is called the BPA when it meets the following equation

119898(0) = 0sum119860subeΩ

119898(119860) = 1 (19)

Definition 3 (combination rules) For forall119860 sube Ω a finite num-ber of 119898 functions (1198981 1198982 119898119899) exist on the recognitionframework The combination rules are as follows

119898(0) = 0119898 (119860)

= 1119896 sum1198601cap1198602capsdotsdotsdotcap119860119899=119860

1198981 (1198601) lowast 1198982 (1198602) sdot sdot sdot 119898119899 (119860119899) (20)

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 6: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

6 Journal of Sensors

32 Dimensionality Reduction PCA is a statistical methodand is widely used in data reduction By means of orthogonaltransformation a group of variables that may be related toone another are transformed into a set of linearly uncorre-lated variables which is called the principal components Itfunctions to maintain the primary information of originalfeatures while reducing the complexity of the data whichreveals the simple structure behind the complex data PCAis a simple and nonparametric method of extracting relevantinformation from intricate data

The purpose of PCA is to reduce the dimensionality ofdata while preserving as many changes as possible in theoriginal dataset PCA transforms the data into the coordinatesystem thus themaximum variance of any projection of datalies on the first coordinate and the second largest variancefalls on the second coordinate and so on PCA algorithmcan remove redundant information simplify the difficultyof dealing with the problem and improve the resistance toexternal interference through the processing of raw dataTherefore PCA is used in this paperThe specific steps of PCAalgorithm are shown as follows

Step 1 Input sample matrix 119863 = 1199091 1199092 119909119899119879 The rowsof the matrix represent the number of samples whereas thecolumns represent the dimensions

Input percentage of information retention after dimen-sion reduction (e)

Step 2 Calculate the mean by columns

119909 = sum119899119894=1 119909119894119899 (11)

Step 3 Obtain the new sample matrix 119872 with data central-ization 120579119894 = 119909119894 minus 119909

119860 = [1205791 1205792 120579119899] 119872 = 119860119860119879

(12)

Step 4 Calculate the eigenvalues and eigenvectors

119872119880 = 120582119880 997888rarr1205821 gt 1205822 gt sdot sdot sdot gt 120582119899119860119880 = 120582119880 997888rarr119880 = 1199061 1199062 119906119899

(13)

Step 5 Determine the final dimension k

sum119896119894=1 120582119894sum119899119894minus1 120582119894 ge 119890 997888rarr1205821 1205822 120582119896

(14)

The cumulative contribution rate of the eigenvalues (119890) isused to measure the representation of the newly generatedprincipal components to the original data Generally 119890 shouldbe greater than or equal to 85 to extract the former 119896principal components as the sample features

Step 6 Output the principal components

119880119896 = (1199061 1199062 119906119896) 119875 = 119909 lowast 119880119896 (15)

Notably the dataset is divided into training and testingsets prior to importing the model in this study Thereforeboth sets must be processed separately when PCA is usedSubtracting the mean value of the training sample and usingthe transformationmatrix obtained from the training sampleare important when the dimension of the testing sample isreduced to ensure that the training and testing samples areswitched to the same sample space In this study the firstfour principal components are selected and some of them areshown in Table 2

33 Decision Tree Decision tree is a common supervisedclassification method in data mining In supervised learninga bunch of samples is provided Each sample has a set ofattributes and a category label These categories are deter-mined in advance and a classifier is then created by a learningalgorithm The topmost node of the decision tree is the rootnode Decision tree classifies a sample from the root to theleaf node Each nonleaf node represents a test of the attributevalue Each branch of the tree represents the results of the testEach leaf node represents a category of the object In shortthe decision tree is a tree structure similar to a flow diagram

A decision tree is built recursively following a top-downapproach It compares and tests the attribute values on itsinternal nodes from the root node then determines thecorresponding branch according to the attribute value ofthe given instance and finally draws the conclusion in itsleaf node The process is repeated on the subtree whichis rooted with a new node Thus far many decision treealgorithms exist but themost commonly used one is the C45algorithmThe pseudocode of the C45 algorithm is shown inPseudocode 1

A leafy decision tree may be created due to the noises andoutliers of the training data which will result in overfittingMany branches reflect anomalies of the data The solutionis pruning and cutting off the most unreliable branchesin which pre- and postpruning are widely used The C45algorithm adopts a pessimistic postpruning If the error ratecan be reduced through replacing a subtree with its leaf nodethe subtree will be pruned

34 LVQNeural Network LVQ is an input feed-forward neu-ral network with supervised learning for training the hiddenlayers The LVQ neural network consists of input hiddenand output layers The hidden layer automatically learns andclassifies the input vectorsThe results of classification dependonly on the distance between the input vectors If the distancebetween the two input vectors is particularly similar thenthe hidden layer divides them into the same class and outputthem

The network is completely connected between the inputand hidden layers meanwhile the hidden layer is partiallyconnected with the output layer Each output layer neuron isconnected to different groups of hidden layer neurons The

Journal of Sensors 7

Table 2 Principal components of some samples

Class PCA1 PCA2 PCA3 PCA4

Trainset

1 minus19913 minus02596 minus10560 minus024121 minus21547 minus19631 19927 086111 minus31183 minus00118 11916 015642 minus21922 06848 minus05967 030522 minus08282 12599 minus06597 066122 minus13281 minus05063 minus08747 044673 37605 16074 02554 minus051143 72074 minus13968 11143 minus056163 42621 18465 02609 00197

Testset

1 minus15410 minus10017 03682 020041 minus12102 minus21453 07366 056071 minus16120 04242 04177 minus056932 22970 minus14313 minus09372 minus261932 minus35065 minus13869 minus17524 minus096302 minus03051 minus08110 minus03567 minus071033 28365 minus04960 minus11518 184983 39953 minus04302 03341 074503 29665 minus09305 minus06439 03039

Input an attribute set dataset DOutput a decision tree

(a) Tree = (b) if119863 is ldquopurerdquo or other end conditions are met then(c) terminate(d) end if(e) for each attribute 119886 isin 119863 do(f) compute information gain ratio (InGR)(g) end for(h) 119886best = attribute with the highest InGR(i) Tree = create a tree with only one node 119886best in the root(j)119863V = generate a subset from119863 except 119886best(k) for all119863V do(l) subtree = C45 (119863V)(m) set the subtree to the corresponding branch of the Tree according to the InGR(n) end for

Pseudocode 1 Pseudocode of the C45 algorithm

number of neurons in the hidden layer is always greater thanthat of the output layer Each hidden layer neuron is con-nectedwith only one output layer neuron and the connectionweight is always 1 However each output layer neuron canbe connected to multiple hidden layer neuronsThe values ofthe hidden and output layer neurons can only be 1 or 0 Theweights between the input and hidden layers are graduallyadjusted to the clustering center during the training When asample is placed into the LVQ neural network the neurons ofthe hidden layer generate the winning neuron by the winner-learning rules allowing the output to be 1 and the other to be

0 The output of the output layer neurons that are connectedto the winning neuron is 1 whereas the other is 0 then theoutput provides the pattern class of the current input sampleThe class learned by the hidden layer becomes a subclass andthe class learned by the output layer becomes the target class[30] The architecture of the LVQ neural network is shown inFigure 3

The training steps of the LVQ algorithm are as follows

Step 1 The learning rate 120578 (120578 gt 0) and the weights 119882119894119895between the input and hidden layers are initialized

8 Journal of Sensors

Input layer

Hidden layerOutput layer

Class 1

Class n

Class i

Input vector

middot middot middot

middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middotx1

xi

xn

Figure 3 Architecture of the LVQ neural network

Step 2 The input vector 119909 = (1199091 1199092 119909119899)119879 is fed to theinput layer and the distance (119889119894) between the hidden layerneuron and the input vector is calculated

119889119894 = radic 119899sum119895=1

(119909119895 minus 119908119894119895)2 (16)

Step 3 Select the hidden layer neuron with the smallestdistance from the input vector If 119889119894 is the minimum then theoutput layer neuron connected to it is labeled with 119888119894Step 4 The input vector is labeled with 119888119909 If 119888119894 = 119888119909 theweights are adjusted as follows

119908119894119895-new = 119908119894119895-old + 120578 (119909 minus 119908119894119895-old) (17)

Otherwise the weights are updated as follows

119908119894119895-new = 119908119894119895-old minus 120578 (119909 minus 119908119894119895-old) (18)

Step 5 Determine whether the maximum number of iter-ations is satisfied The algorithm ends if it is satisfiedOtherwise return to Step 2 and continue the next round oflearning

35 Evidence Theory Data fusion is a method of obtainingthe best decision from different sources of data In recentyears it has attracted significant attention for its wideapplication in fault diagnosis Generally data fusion can beconducted at three levels The first is data-level fusion [31]Raw data from different sources are directly fused to producemore information than the original data At this level thefusion exhibits small loss and high precision but is time-consuming and unstable and has weak anti-interferencecapabilities The second is feature-level fusion [32] At thislevel statistical features are extracted separately using signalprocessing technique All features are fused to find an optimalsubset which is then fed to a classifier for better accuracyAt this level information compression for transmission isachieved but with poor integration accuracy The third isdecision-level fusion [33] This fusion is the highest level

Recognition frame

a b c a b d

a b

a b c d

a c a db c b d

b c d

c d

a c d

Figure 4 Recognition framework of the DS theory

of integration which influences decision making and it isthe ultimate result of three-level integration The decision-level fusion exhibits strong anti-interference ability and smallamount of communication but suffers from large amount ofdata loss and high cost of pretreatment In this paper we focuson decision-level fusion which is known as the DS evidencetheory

The DS evidence theory has been originally establishedin 1967 by Dempster and developed later in 1976 by Shaferwho is a student of Dempster Evidence theory is an extensionof Bayesian method In the Bayesian method the proba-bility must satisfy the additivity which is not the case forthe evidence theory The DS evidence theory can expressuncertainty leaving the rest of the trust to the recognitionframework This theory involves the following mathematicaldefinitions

Definition 1 (recognition framework) Define Ω = 1205791 1205792 120579119899 as a set where Ω is a finite set of possible valuesand 120579119894 is the conclusion of the model The set is called therecognition framework 2Ω is a power set composed of allthe subsets The recognition framework with capacity of fourlayers and the relationship between the subsets is shown inFigure 4 a b c and 119889 are the elements of the framework

Definition 2 (BPA) BPA is a primitive function in DSevidence theory Assume Ω as the recognition frameworkthen m is a mapping from 2120579 to [0 1] and 119860 is the subset ofΩm is called the BPA when it meets the following equation

119898(0) = 0sum119860subeΩ

119898(119860) = 1 (19)

Definition 3 (combination rules) For forall119860 sube Ω a finite num-ber of 119898 functions (1198981 1198982 119898119899) exist on the recognitionframework The combination rules are as follows

119898(0) = 0119898 (119860)

= 1119896 sum1198601cap1198602capsdotsdotsdotcap119860119899=119860

1198981 (1198601) lowast 1198982 (1198602) sdot sdot sdot 119898119899 (119860119899) (20)

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 7: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

Journal of Sensors 7

Table 2 Principal components of some samples

Class PCA1 PCA2 PCA3 PCA4

Trainset

1 minus19913 minus02596 minus10560 minus024121 minus21547 minus19631 19927 086111 minus31183 minus00118 11916 015642 minus21922 06848 minus05967 030522 minus08282 12599 minus06597 066122 minus13281 minus05063 minus08747 044673 37605 16074 02554 minus051143 72074 minus13968 11143 minus056163 42621 18465 02609 00197

Testset

1 minus15410 minus10017 03682 020041 minus12102 minus21453 07366 056071 minus16120 04242 04177 minus056932 22970 minus14313 minus09372 minus261932 minus35065 minus13869 minus17524 minus096302 minus03051 minus08110 minus03567 minus071033 28365 minus04960 minus11518 184983 39953 minus04302 03341 074503 29665 minus09305 minus06439 03039

Input an attribute set dataset DOutput a decision tree

(a) Tree = (b) if119863 is ldquopurerdquo or other end conditions are met then(c) terminate(d) end if(e) for each attribute 119886 isin 119863 do(f) compute information gain ratio (InGR)(g) end for(h) 119886best = attribute with the highest InGR(i) Tree = create a tree with only one node 119886best in the root(j)119863V = generate a subset from119863 except 119886best(k) for all119863V do(l) subtree = C45 (119863V)(m) set the subtree to the corresponding branch of the Tree according to the InGR(n) end for

Pseudocode 1 Pseudocode of the C45 algorithm

number of neurons in the hidden layer is always greater thanthat of the output layer Each hidden layer neuron is con-nectedwith only one output layer neuron and the connectionweight is always 1 However each output layer neuron canbe connected to multiple hidden layer neuronsThe values ofthe hidden and output layer neurons can only be 1 or 0 Theweights between the input and hidden layers are graduallyadjusted to the clustering center during the training When asample is placed into the LVQ neural network the neurons ofthe hidden layer generate the winning neuron by the winner-learning rules allowing the output to be 1 and the other to be

0 The output of the output layer neurons that are connectedto the winning neuron is 1 whereas the other is 0 then theoutput provides the pattern class of the current input sampleThe class learned by the hidden layer becomes a subclass andthe class learned by the output layer becomes the target class[30] The architecture of the LVQ neural network is shown inFigure 3

The training steps of the LVQ algorithm are as follows

Step 1 The learning rate 120578 (120578 gt 0) and the weights 119882119894119895between the input and hidden layers are initialized

8 Journal of Sensors

Input layer

Hidden layerOutput layer

Class 1

Class n

Class i

Input vector

middot middot middot

middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middotx1

xi

xn

Figure 3 Architecture of the LVQ neural network

Step 2 The input vector 119909 = (1199091 1199092 119909119899)119879 is fed to theinput layer and the distance (119889119894) between the hidden layerneuron and the input vector is calculated

119889119894 = radic 119899sum119895=1

(119909119895 minus 119908119894119895)2 (16)

Step 3 Select the hidden layer neuron with the smallestdistance from the input vector If 119889119894 is the minimum then theoutput layer neuron connected to it is labeled with 119888119894Step 4 The input vector is labeled with 119888119909 If 119888119894 = 119888119909 theweights are adjusted as follows

119908119894119895-new = 119908119894119895-old + 120578 (119909 minus 119908119894119895-old) (17)

Otherwise the weights are updated as follows

119908119894119895-new = 119908119894119895-old minus 120578 (119909 minus 119908119894119895-old) (18)

Step 5 Determine whether the maximum number of iter-ations is satisfied The algorithm ends if it is satisfiedOtherwise return to Step 2 and continue the next round oflearning

35 Evidence Theory Data fusion is a method of obtainingthe best decision from different sources of data In recentyears it has attracted significant attention for its wideapplication in fault diagnosis Generally data fusion can beconducted at three levels The first is data-level fusion [31]Raw data from different sources are directly fused to producemore information than the original data At this level thefusion exhibits small loss and high precision but is time-consuming and unstable and has weak anti-interferencecapabilities The second is feature-level fusion [32] At thislevel statistical features are extracted separately using signalprocessing technique All features are fused to find an optimalsubset which is then fed to a classifier for better accuracyAt this level information compression for transmission isachieved but with poor integration accuracy The third isdecision-level fusion [33] This fusion is the highest level

Recognition frame

a b c a b d

a b

a b c d

a c a db c b d

b c d

c d

a c d

Figure 4 Recognition framework of the DS theory

of integration which influences decision making and it isthe ultimate result of three-level integration The decision-level fusion exhibits strong anti-interference ability and smallamount of communication but suffers from large amount ofdata loss and high cost of pretreatment In this paper we focuson decision-level fusion which is known as the DS evidencetheory

The DS evidence theory has been originally establishedin 1967 by Dempster and developed later in 1976 by Shaferwho is a student of Dempster Evidence theory is an extensionof Bayesian method In the Bayesian method the proba-bility must satisfy the additivity which is not the case forthe evidence theory The DS evidence theory can expressuncertainty leaving the rest of the trust to the recognitionframework This theory involves the following mathematicaldefinitions

Definition 1 (recognition framework) Define Ω = 1205791 1205792 120579119899 as a set where Ω is a finite set of possible valuesand 120579119894 is the conclusion of the model The set is called therecognition framework 2Ω is a power set composed of allthe subsets The recognition framework with capacity of fourlayers and the relationship between the subsets is shown inFigure 4 a b c and 119889 are the elements of the framework

Definition 2 (BPA) BPA is a primitive function in DSevidence theory Assume Ω as the recognition frameworkthen m is a mapping from 2120579 to [0 1] and 119860 is the subset ofΩm is called the BPA when it meets the following equation

119898(0) = 0sum119860subeΩ

119898(119860) = 1 (19)

Definition 3 (combination rules) For forall119860 sube Ω a finite num-ber of 119898 functions (1198981 1198982 119898119899) exist on the recognitionframework The combination rules are as follows

119898(0) = 0119898 (119860)

= 1119896 sum1198601cap1198602capsdotsdotsdotcap119860119899=119860

1198981 (1198601) lowast 1198982 (1198602) sdot sdot sdot 119898119899 (119860119899) (20)

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 8: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

8 Journal of Sensors

Input layer

Hidden layerOutput layer

Class 1

Class n

Class i

Input vector

middot middot middot

middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middot

middot middot middot middot middot middot

middot middot middotx1

xi

xn

Figure 3 Architecture of the LVQ neural network

Step 2 The input vector 119909 = (1199091 1199092 119909119899)119879 is fed to theinput layer and the distance (119889119894) between the hidden layerneuron and the input vector is calculated

119889119894 = radic 119899sum119895=1

(119909119895 minus 119908119894119895)2 (16)

Step 3 Select the hidden layer neuron with the smallestdistance from the input vector If 119889119894 is the minimum then theoutput layer neuron connected to it is labeled with 119888119894Step 4 The input vector is labeled with 119888119909 If 119888119894 = 119888119909 theweights are adjusted as follows

119908119894119895-new = 119908119894119895-old + 120578 (119909 minus 119908119894119895-old) (17)

Otherwise the weights are updated as follows

119908119894119895-new = 119908119894119895-old minus 120578 (119909 minus 119908119894119895-old) (18)

Step 5 Determine whether the maximum number of iter-ations is satisfied The algorithm ends if it is satisfiedOtherwise return to Step 2 and continue the next round oflearning

35 Evidence Theory Data fusion is a method of obtainingthe best decision from different sources of data In recentyears it has attracted significant attention for its wideapplication in fault diagnosis Generally data fusion can beconducted at three levels The first is data-level fusion [31]Raw data from different sources are directly fused to producemore information than the original data At this level thefusion exhibits small loss and high precision but is time-consuming and unstable and has weak anti-interferencecapabilities The second is feature-level fusion [32] At thislevel statistical features are extracted separately using signalprocessing technique All features are fused to find an optimalsubset which is then fed to a classifier for better accuracyAt this level information compression for transmission isachieved but with poor integration accuracy The third isdecision-level fusion [33] This fusion is the highest level

Recognition frame

a b c a b d

a b

a b c d

a c a db c b d

b c d

c d

a c d

Figure 4 Recognition framework of the DS theory

of integration which influences decision making and it isthe ultimate result of three-level integration The decision-level fusion exhibits strong anti-interference ability and smallamount of communication but suffers from large amount ofdata loss and high cost of pretreatment In this paper we focuson decision-level fusion which is known as the DS evidencetheory

The DS evidence theory has been originally establishedin 1967 by Dempster and developed later in 1976 by Shaferwho is a student of Dempster Evidence theory is an extensionof Bayesian method In the Bayesian method the proba-bility must satisfy the additivity which is not the case forthe evidence theory The DS evidence theory can expressuncertainty leaving the rest of the trust to the recognitionframework This theory involves the following mathematicaldefinitions

Definition 1 (recognition framework) Define Ω = 1205791 1205792 120579119899 as a set where Ω is a finite set of possible valuesand 120579119894 is the conclusion of the model The set is called therecognition framework 2Ω is a power set composed of allthe subsets The recognition framework with capacity of fourlayers and the relationship between the subsets is shown inFigure 4 a b c and 119889 are the elements of the framework

Definition 2 (BPA) BPA is a primitive function in DSevidence theory Assume Ω as the recognition frameworkthen m is a mapping from 2120579 to [0 1] and 119860 is the subset ofΩm is called the BPA when it meets the following equation

119898(0) = 0sum119860subeΩ

119898(119860) = 1 (19)

Definition 3 (combination rules) For forall119860 sube Ω a finite num-ber of 119898 functions (1198981 1198982 119898119899) exist on the recognitionframework The combination rules are as follows

119898(0) = 0119898 (119860)

= 1119896 sum1198601cap1198602capsdotsdotsdotcap119860119899=119860

1198981 (1198601) lowast 1198982 (1198602) sdot sdot sdot 119898119899 (119860119899) (20)

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 9: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

Journal of Sensors 9

2 2 3

1 2

lt 3066 ge 3066

lt 176181 ge 176181 lt 001715 ge 001715

lt 0151858 ge 0151858

x5 x5

x1x1

x8x8

x2x2

Figure 5 Pruning of the decision tree

where 119896 = sum1198601cap1198602capsdotsdotsdotcap119860119899 =01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) or 119896 =1minussum1198601cap1198602capsdotsdotsdotcap119860119899=01198981(1198601)lowast1198982(1198602) sdot sdot sdot 119898119899(119860119899) which reflectsthe conflict degree between evidences

4 Results and Discussions

The experiments are conducted to predict good and outerand inner race fault conditions of the rolling bearing asdiscussed in Section 2 The diagnosis model in this articleshould undergo three steps whether it is a neural networkor decision tree First the relevant model is created with thetraining set Then the testing set is imported to simulateresults Finally simulation and actual results are comparedto obtain the fault diagnosis accuracy Hence each group ofexperimental data which are extracted from the vibrationsignals is separated into two parts Sixty samples are ran-domly selected for training and the remaining 30 samples areused for testing

41 Results of the Tree-PCA Sixty samples in different casesof fault severity have been fed into the C45 algorithmThe algorithm creates a leafy decision tree and the sampleclassification accuracy is usually high with the trainingset However the leafy decision tree is often overfitted orovertrained thus such a decision tree does not guarantee anapproximate classification accuracy for the independent test-ing set which may be lower Therefore pruning is requiredto obtain a decision tree with relatively simple structure (ieless bifurcation and fewer leaf nodes) Pruning the decisiontree reduces the classification accuracy of the training setbut improves that of the testing set The re-substitution andcross-validation errors are a good evidence of the changeSixty samples including 10 statistical features extracted fromthe vibration signals are used as input of the algorithm andthe output is the pruning of the decision tree as shown inFigure 5

Figure 5 shows that the decision tree has leaf nodeswhich stand for class labels (namely 1 as good 2 as outerrace fault and 3 as inner race fault) and decision nodeswhich stand for the capability of discriminating (namely 1199095as skewness 1199092 as kurtosis 1199091 as variance and 1199098 as RMS)

Table 3 Error values before and after pruning

Before pruning After pruningre-sub-err 001 007cross-val-err 009 008Averageaccuracy 8298 8409

Not every statistical feature can be a decision node whichdepends on the contribution of the entropy and informationgain Attributes that meet certain thresholds appear in thedecision tree otherwise they are discarded intentionallyThe contribution of 10 features is not the same and theimportance is not consistent Only four features appear in thetreeThe importance of the decision nodes decreases from topto bottomThe top node is the best node for classificationThemost dominant features suggested by Figure 5 are kurtosisRMS mean and variance

Re-substitution error refers to the difference betweenthe actual and predicted classification accuracy which isobtained by importing the training set into the model againafter creating the decision tree using the training set Thecross-validation error is an error value of prediction modelin practical application by cross-validation Both are used toevaluate the generalization capability of the predictionmodelIn this study the re-substitution error is expressed by ldquore-sub-errrdquo the cross-validation error is expressed by ldquocross-val-errrdquo and the average classification accuracy rate is expressedby ldquoaverage accuracyrdquo In the experiment we can obtain theresults shown in Table 3

Table 3 shows that the cross-val-err is approximatelyequal (008 asymp 009) and the re-sub-err after pruning is greaterthan before (007 gt 001) but the average accuracy of thefault of the testing set after pruning significantly improves(8409 gt 8298)

At the same time the PCA technique is used to reducethe dimension of statistical features The first four principalcomponents are extracted to create the decision tree accord-ing to the principle that the cumulative contribution rate ofeigenvalues is more than 85 Thus far the dimension of the

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 10: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

10 Journal of Sensors

2

1 3

lt 0619423 ge 0619423

lt minus072165 ge minus072165x1x1

x2 x2

Figure 6 Decision tree after dimension reduction

Table 4 Classification errors before and after dimensionalityreduction

Before PCA After PCAre-sub-err 007 003cross-val-err 008 008Averageaccuracy 8409 8656

Time (s) 202 133

statistical feature is reduced from 10 to 4 and the amount ofdata is significantly reduced The decision tree is constructedwith the first four main components as shown in Figure 6

Figure 6 shows that the testing set can be classifieddepending on the first (1199091) and second (1199092) principal com-ponents The remaining two principal components do notappear in the decision tree because their contribution valuedoes not reach the thresholdsWhen comparing Figure 5withFigure 6 the decision tree after dimension reduction is sim-pler and has fewer decision nodes than before Furthermorethe cross-val-err is equal and the average accuracy is not lowTable 4 shows the experimental results

Table 4 shows that the cross-val-err is equal (008 = 008)the re-sub-err of the decision tree after dimension reductionis lower (003 lt 007) and the average accuracy is slightlyhigher (8656 gt 8409) however the running time of theprogram is considerably lower (133 seconds lt 202 seconds)Therefore dimension reduction is necessary and effective forconstructing the decision tree especiallywithmany statisticalattribute values

42 Results of the LVQ-PCA The LVQ neural network be-longs to the feed-forward supervised neural network it isone of the most widely used methods in fault diagnosisThus LVQ neural network is used in this study to distinguishthe different fault states of the rolling bearing The trainingsamples are imported into the LVQ neural network Theinput layer contains 10 statistical characteristics which areextracted from the vibration signals The output layer is theclassification accuracy of the fault including three types offault namely good outer race fault and inner race fault

Best training performance is 0066667 at epoch 48

TrainBestGoal

3510 15 20 25 30 400 45548 epochs

10minus2

10minus1

100

Mea

n sq

uare

d er

ror (

MSE

)Figure 7 LVQ neural network training error convergence diagram

Meanwhile the design of the hidden layer is importantin the LVQ neural network thus it is defined by 119870 cross-validation method The initial sample is divided into 119870 sub-samples A subsample is retained as the validation data andthe other 119870 minus 1 subsamples are used for training The cross-validation process is repeated119870 times Each subsample is val-idated once and a single estimate is gained by considering theaverage value of119870 timesThismethod can avoid overlearningor underlearning state and the results are more convincingIn this study the optimal number of neurons in the hiddenlayer is 11 through 10-fold cross-validation which is mostcommonly used

Sixty samples are used for training and the remaining 30samples are used for testing in the LVQ neural network Thenetwork structure of 10-11-3 is used in the experiment Thenetwork parameters are set as follows

Maximum number of training steps 1000Minimum target training error 01Learning rate 01

The LVQ neural network is created to obtain the errorcurve as shown in Figure 7 To highlight the superiority ofLVQ a BP neural network is also created using the sameparameter settings and the error curve is shown in Figure 8

We used the mean squared error (MSE) as the evaluationmeasure which calculates the average squared differencebetween outputs and targets A lower value is better and zeromeans no error When comparing Figure 7 with Figure 8the BP neural network has less training time and smallerMSE Evidently the BP neural network is superior to LVQin this case However the BP neural network algorithm isessentially a gradient descent method It is a way of localoptimization and can easily fall into the local optimal solu-tion which is demonstrated by the results in Table 5 The

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 11: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

Journal of Sensors 11

Table 5 Comparisons of classification performances of different models

Bearing condition Classification accuracyLVQ BP LVQ-PCA Decision tree Tree pruning Tree-PCA DS fusion

Good 1198651 () 700 91 875 855 862 824 904Outer race fault 1198652 () 667 111 789 727 740 979 966Inner race fault 1198653 () 1000 900 1000 907 921 794 949Average accuracy () 789 367 888 830 841 866 940Time (s) 23 16 15 26 20 13 42

Best training performance is 007486 at epoch 8

TrainBestGoal

63 80 521 4 78 epochs

10minus2

10minus1

100

101

Mea

n sq

uare

d er

ror (

MSE

)

Figure 8 BP neural network training error convergence diagram

maximum classification accuracy of BP neural network is900 and the minimum is 91 under the same data andnetwork parameters The gap is significantly large whichleads to the low average accuracy of approximately 367The classification accuracies of the LVQ neural network arenot much different from each other and the average accuracyis 789 This phenomenon indicates that the performanceof the LVQ neural network is better than that of BP neuralnetworks

Likewise better performance can be achieved by com-bining PCA and LVQ The original 10 feature attributes arereplaced by the four principal components and the othernetwork parameters are unchanged Figures 9 and 10 areobtained from the experiment

Figure 9 shows the ROC curve which is a plot of the truepositive rate (sensitivity) versus the false positive rate (1 minusspecificity) as the thresholds vary A perfect test would showpoints in the upper-left corner with 100 sensitivity and100 specificity Figure 10 shows the training regressionplot Regression value (R) measures the correlation betweenoutputs and targets 1 means a close relationship and 0meansa random relationship All ROC curves are in the upper-leftcorner and the value of 119877 is 096982 which is approximately

Training ROC

Class 1Class 2Class 3

0

02

04

06

08

1

True

pos

itive

rate

02 04 06 080 1False positive rate

Figure 9 Receiver operating characteristic (ROC) plot

equal to 1 Therefore the network performs well The classi-fication accuracy of LVQ-PCA further illustrated it as shownin Table 5

43 Results of the Fusion Model The decision tree and theLVQ neural network are widely used in fault diagnosis dueto their simplicity and good generalization performanceHowever their classification accuracy is still dependent onthe datasets and may get unsatisfactory performance Tosolve this problem the DS evidence theory is introducedin this study The target recognition frame 119880 is establishedby considering the three states of the bearing good (1198651)outer race fault (1198652) and inner race fault (1198653) Each faultsample belongs to only one of the three failure modes andis independent The outputs of the LVQ neural networkare used as evidence 1 and those of decision tree are usedas evidence 2 then data fusion is performed based on theaforementioned DS method The experiment has been run20 times to reduce the occurrence of extreme values andto obtain reliable results The classification accuracies on

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 12: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

12 Journal of Sensors

Training R = 096283

15 25 321Target

DataFitY = T

1

15

2

25

3

targ

et +

02

6lowast

ONJONsim=

088

Figure 10 Regression plot

Boxplot of DS fusion results

86

88

90

92

94

96

98

100

Accu

racy

()

-trainF1 -testF1 -trainF2 -testF2 -trainF3 -testF3

Figure 11 Boxplot of DS fusion results

the training and testing sets for each run are recorded andthe final performance comparisons are plotted as a boxplot(Figure 11)

Figure 11 shows that the accuracy on the training setsof three types of faults fluctuates slightly around 98 Theaccuracy on 1198651-train is low and has an outlier The accuracyon 1198652-train is on the higher side with a maximum of up

to 100 The accuracy on 1198653-train is between those of 1198651-train and1198652-trainwith small variationThe small variations ofthe accuracy on the training sets indicate that the predictionmodels are stableThe accuracy on the testing sets is relativelyscattered The accuracy on 1198651-test is concentrated on 90and has an exception of up to 97 The accuracy on 1198652-test is concentrated on 97 while the accuracy on 1198653-test isnear 94 The average value of the 20 experimental resultsis considered as the final result of data fusion to reduce theerror as shown in Table 5

Table 5 presents the results of all the seven algorithmsused in the present study Each of the algorithms hasbeen run 20 times with different partition of the trainingand test datasets and the average prediction accuracy forthree fault types is recorded First it was found that theBP neural network falls into local optimal solutions asits average accuracy is only 367 (see second column ofTable 5) Therefore we conclude it as a failing method inour experiments The average accuracy of the LVQ neuralnetwork has increased from 789 to 888 after applyingdimension reduction The performance of the decision treeimproves slightly using pruning (from 830 to 841) butincreases to 866 through combining PCA and decisiontree The results indicate that dimensionality reduction is aneffective means to improve prediction performance for bothbase classification models The DS fusion model proposed inthis study achieved an average accuracy of 940 by fusingpredictions of LVQ-PCA and Tree-PCA which is the bestcompared to all other 6 base methods This demonstrates thecapability of DS fusion to take advantage of complementaryprediction performance of LVQ and decision tree classifiersThis can be clearly seen from the second row and the thirdrow which show the performance of the algorithms onpredicting outer race fault and inner race fault In the priorcase the Tree-PCA achieves a higher performance (979)compared to LVQ-PCA (789) while in the latter caseLVQ-PCA achieved an accuracy of 1000 compared to794 of Tree-PCA Through the DS fusion algorithm theprediction performances are 966 and 949 respectivelywhich avoids the weakness of either of the base models inpredicting some specific fault types

5 Conclusion

We have developed a DS evidence theory based algorithmfusion model for diagnosing the fault states of rollingbearings It combines the advantages of the LVQ neuralnetwork and decision tree This model uses vibration signalscollected by the accelerometer to identify bearing failuresTen statistical features are extracted from the vibration signalsas the input of the model for training and testing To improvethe classification accuracy and reduce the input redundancythe PCA technique is used to reduce 10 statistical features to4 principal components

We compared different methods in terms of their faultclassification performance using the same dataset Experi-mental results show that PCA can improve the classificationaccuracy of LVQ neural network inmost cases but not alwaysfor the decision tree Both LVQ neural network and decision

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 13: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

Journal of Sensors 13

tree do not achieve good performance for some classesThe proposed DS evidence theory based fusion model fullyutilizes the advantages of the LVQ neural network decisiontree PCA and evidence theory and obtains the best accuracycompared with other signal models Our results show thatthe DS evidence theory can be used not only for informationfusion but also for model fusion in fault diagnosis

The accuracy of the prediction models is important inbearing fault diagnosis while the convergence speed and therunning time of the algorithms also need special attentionespecially in the case of large number of samples The resultsin Table 5 show that the fusion model has the highest classifi-cation accuracy but takes the longest time to run Thereforeour future research is not only to ensure the accuracy but alsoto speed up the convergence and reduce the running time

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (51475097)Ministry of Industry and IntelligentManufacturing Demonstration Project (Ministry of Industry[2016]213) and Program of Guizhou Province of China (nosJZ[2014]2001 [2015]02 and [2016]5103)References

[1] P N Saavedra and C G Rodriguez ldquoAccurate assessment ofcomputed order trackingrdquo Shock Vibration vol 13 no 1 pp 13ndash32 2006

[2] J Sanz R Perera and C Huerta ldquoFault diagnosis of rotatingmachinery based on auto-associative neural networks andwavelet transformsrdquo Journal of Sound and Vibration vol 302no 4-5 pp 981ndash999 2007

[3] B Zhou andYCheng ldquoFault diagnosis for rolling bearing undervariable conditions based on image recognitionrdquo Shock andVibration vol 2016 Article ID 1948029 2016

[4] C Li R-V Sanchez G Zurita M Cerrada and D CabreraldquoFault diagnosis for rotating machinery using vibration mea-surement deep statistical feature learningrdquo Sensors vol 16 no6 article no 895 2016

[5] O Taouali I Jaffel H LahdhiriM F Harkat andHMessaoudldquoNew fault detectionmethod based on reduced kernel principalcomponent analysis (RKPCA)rdquo The International Journal ofAdvanced Manufacturing Technology vol 85 no 5-8 pp 1547ndash1552 2016

[6] J Wodecki P Stefaniak J Obuchowski A Wylomanska andR Zimroz ldquoCombination of principal component analysis andtime-frequency representations of multichannel vibration datafor gearbox fault detectionrdquo Journal of Vibroengineering vol 18no 4 pp 2167ndash2175 2016

[7] J-H Cho J-M Lee S W Choi D Lee and I-B Lee ldquoFaultidentification for process monitoring using kernel principalcomponent analysisrdquo Chemical Engineering Science vol 60 no1 pp 279ndash288 2005

[8] V H Nguyen and J C Golinval ldquoFault detection based onKernel Principal Component Analysisrdquo Engineering Structuresvol 32 no 11 pp 3683ndash3691 2010

[9] N E I Karabadji H Seridi I Khelf N Azizi and R Boulk-roune ldquoImproved decision tree construction based on attributeselection and data sampling for fault diagnosis in rotatingmachinesrdquo Engineering Applications of Artificial Intelligence vol35 pp 71ndash83 2014

[10] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoTheCART decision tree for mining data streamsrdquo Information Sci-ences vol 266 pp 1ndash15 2014

[11] M Amarnath V Sugumaran andH Kumar ldquoExploiting soundsignals for fault diagnosis of bearings using decision treerdquoMeasurement vol 46 no 3 pp 1250ndash1256 2013

[12] A Krishnakumari A Elayaperumal M Saravanan and CArvindan ldquoFault diagnostics of spur gear using decision treeand fuzzy classifierrdquo The International Journal of AdvancedManufacturing Technology vol 89 no 9-12 pp 3487ndash3494 2017

[13] J Rafiee F Arvani A Harifi and M H Sadeghi ldquoIntelligentcondition monitoring of a gearbox using artificial neural net-workrdquoMechanical Systems amp Signal Processing vol 21 no 4 pp1746ndash1754 2007

[14] M F Umer and M S H Khiyal ldquoClassification of textual doc-uments using learning vector quantizationrdquo Information Tech-nology Journal vol 6 no 1 pp 154ndash159 2007

[15] P Melin J Amezcua F Valdez and O Castillo ldquoA new neuralnetwork model based on the LVQ algorithm for multi-classclassification of arrhythmiasrdquo Information Sciences vol 279 pp483ndash497 2014

[16] A Kushwah S Kumar and R M Hegde ldquoMulti-sensor datafusion methods for indoor activity recognition using temporalevidence theoryrdquo Pervasive and Mobile Computing vol 21 pp19ndash29 2015

[17] O Basir andXH Yuan ldquoEngine fault diagnosis based onmulti-sensor information fusion using Dempster-Shafer evidencetheoryrdquo Information Fusion vol 8 no 4 pp 379ndash386 2007

[18] D Bhalla R K Bansal and H O Gupta ldquoIntegrating AI basedDGA fault diagnosis using dempster-shafer theoryrdquo Interna-tional Journal of Electrical Power amp Energy Systems vol 48 no1 pp 31ndash38 2013

[19] A Feuerverger Y He and S Khatri ldquoStatistical significance ofthe Netflix challengerdquo Statistical Science A Review Journal of theInstitute of Mathematical Statistics vol 27 no 2 pp 202ndash2312012

[20] C Delimitrou and C Kozyrakis ldquoThe netflix challenge Data-center editionrdquo IEEE Computer Architecture Letters vol 12 no1 pp 29ndash32 2013

[21] X Hu ldquoThe fault diagnosis of hydraulic pump based on thedata fusion of D-S evidence theoryrdquo in Proceedings of the 20122nd International Conference on Consumer Electronics Com-munications and Networks CECNet 2012 pp 2982ndash2984 IEEEYichang China April 2012

[22] X Sun J Tan Y Wen and C Feng ldquoRolling bearing faultdiagnosis method based on data-driven random fuzzy evidenceacquisition and Dempster-Shafer evidence theoryrdquo Advances inMechanical Engineering vol 8 no 1 2016

[23] KHHuiMH LimM S Leong and SMAl-Obaidi ldquoDemp-ster-Shafer evidence theory for multi-bearing faults diagnosisrdquoEngineering Applications of Artificial Intelligence vol 57 pp160ndash170 2017

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 14: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

14 Journal of Sensors

[24] H Jiang R Wang J Gao Z Gao and X Gao ldquoEvidencefusion-based framework for condition evaluation of complexelectromechanical system in process industryrdquo Knowledge-Based Systems vol 124 pp 176ndash187 2017

[25] N R Sakthivel V Sugumaran and S Babudevasenapati ldquoVi-bration based fault diagnosis of monoblock centrifugal pumpusing decision treerdquo Expert Systems with Applications vol 37no 6 pp 4040ndash4049 2010

[26] H Talhaoui A Menacer A Kessal and R Kechida ldquoFast Fou-rier and discrete wavelet transforms applied to sensorless vectorcontrol induction motor for rotor bar faults diagnosisrdquo ISATransactions vol 53 no 5 pp 1639ndash1649 2014

[27] A Rai and S H Upadhyay ldquoA review on signal processing tech-niques utilized in the fault diagnosis of rolling element bear-ingsrdquo Tribology International vol 96 pp 289ndash306 2016

[28] Z K Peng and F L Chu ldquoApplication of the wavelet transformin machine condition monitoring and fault diagnostics areview with bibliographyrdquoMechanical Systems amp Signal Process-ing vol 18 no 2 pp 199ndash221 2004

[29] T L Chen G Y Tian A Sophian and P W Que ldquoFeatureextraction and selection for defect classification of pulsed eddycurrent NDTrdquoNdt amp E International vol 41 no 6 pp 467ndash4762008

[30] D Nova and P A Estevez ldquoA review of learning vector quan-tization classifiersrdquoNeural Computing and Applications vol 25no 3 pp 511ndash524 2014

[31] O Kreibich J Neuzil and R Smid ldquoQuality-based multiple-sensor fusion in an industrial wireless sensor network forMCMrdquo IEEE Transactions on Industrial Electronics vol 61 no9 pp 4903ndash4911 2014

[32] P Kumari and A Vaish ldquoFeature-level fusion of mental taskrsquosbrain signal for an efficient identification systemrdquo NeuralComputing and Applications vol 27 no 3 pp 659ndash669 2016

[33] K Gupta S N Merchant and U B Desai ldquoA novel multistagedecision fusion for cognitive sensor networks using AND andOR rulesrdquo Digital Signal Processing vol 42 pp 27ndash34 2015

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 15: Improving Rolling Bearing Fault Diagnosis by DS Evidence ...downloads.hindawi.com/journals/js/2017/6737295.pdf · Rolling bearing Signal processing unit USB Figure2:Schematicofthesetup

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of