deep domain adaptation model for bearing fault diagnosis

Research Article

Deep Domain Adaptation Model for Bearing Fault Diagnosis with Domain Alignment and Discriminative Feature Learning

Jing An,1,2 Ping Ai,2 and Dakun Liu1

1School of Mechanical Engineering, Yancheng Institute of Technology, Yancheng 224051, China
2College of Computer and Information Engineering, Hohai University, Nanjing 211100, China

Correspondence should be addressed to Jing An; anjing991982@163.com

Received 18 November 2019; Accepted 17 February 2020; Published 20 March 2020

Academic Editor: Felix Albu

Copyright © 2020 Jing An et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Deep learning techniques have been widely used to achieve promising results for fault diagnosis. In many real-world fault diagnosis applications, labeled training data (source domain) and unlabeled test data (target domain) have different distributions due to the frequent changes of working conditions, leading to performance degradation. This study proposes an end-to-end unsupervised domain adaptation bearing fault diagnosis model that combines domain alignment and discriminative feature learning on the basis of a 1D convolutional neural network. Joint training with classification loss, center-based discriminative loss, and correlation alignment loss between the two domains can adapt learned representations in the source domain for application to the target domain. Such joint training can also guarantee domain-invariant features with good intraclass compactness and interclass separability. Meanwhile, the extracted features can efficiently improve the cross-domain testing performance. Experimental results on the Case Western Reserve University bearing datasets confirm the superiority of the proposed method over many existing methods.

1. Introduction

Traditional machine learning techniques, especially deep learning, have recently made great achievements in the data-driven fault diagnosis field [1–6]. Most machine learning methods assume that the training data (source domain) and test data (target domain) are collected under the same working condition and have the same distribution and feature space. However, in many real-world working conditions, the distribution of source domain samples is different from that of target domain samples, resulting in performance degradation.

To address this challenge, research on domain adaptation techniques focuses on how a machine learning model built in a source domain can be adapted to a different but related target domain, which avoids the effort of rebuilding the model. In the field of knowledge engineering, many beneficial and promising examples of domain adaptation have been found, including image classification, object recognition, natural language processing, and feature learning [7–10].

In recent years, considerable research has been conducted on domain adaptation on the basis of deep architectures. Most published deep domain adaptation works can be roughly divided into three categories [11]: (1) discrepancy-based, (2) adversarial adaptation, and (3) reconstruction-based methods.

Typical discrepancy-based methods are shown in [12, 13]. They are usually implemented by adding a loss to minimize the distribution discrepancy between the source and target domains in the shared feature space. For example, Tzeng et al. [12] applied a single linear kernel to one layer for minimizing the maximum mean discrepancy (MMD), whereas Long et al. [13, 14] minimized MMD by applying multiple kernels to multiple layers across domains. Another impressive work is Deep Coral [15], which extends CORAL [16] to deep architectures and aligns the second-order statistics of the source and target distributions.

Hindawi, Shock and Vibration, Volume 2020, Article ID 4676701, 14 pages, https://doi.org/10.1155/2020/4676701

Another increasingly popular line of work is adversarial domain adaptation, which includes adversarial discriminative and generative methods. The former aims to encourage domain confusion through an adversarial objective with respect to a domain discriminator. Tzeng et al. [17] proposed a unified adversarial domain adaptation framework that combines discriminative modeling, untied weight sharing, and a Generative Adversarial Network (GAN) loss [18]. Among the discriminative models, Tzeng et al. proposed a model with confusion loss [19] and also considered an inverted label GAN loss [17], whereas Ganin et al. [20] proposed a model with minimax loss. The latter combines the discriminative model with a generative component on the basis of GANs. Liu and Tuzel [21] developed a Coupled Generative Adversarial Network (CoGAN) that adopts two GANs, each corresponding to one of the domains; the CoGAN learns a joint distribution of multidomain samples and enforces a weight-sharing constraint to limit the network capacity. The method presented in [22] also adopts GANs to generate source domain images that appear as if drawn from the target domain.

Typical reconstruction-based methods can be seen in [23–25]. Data reconstruction can be viewed as an auxiliary task to support the adaptation of the label prediction. Ghifary et al. [23] combined the standard convolutional network for source label prediction with a deconvolutional network [26] for target data reconstruction. Bousmalis et al. [24] introduced the notion of private and shared subspaces for each domain. Meanwhile, a reconstruction loss is integrated in the model by using a shared decoder, which learns to reconstruct the input sample through domain-specific and shared features. Tan et al. [25] presented a selective learning algorithm that uses the reconstruction error to select useful unlabeled data from intermediate domains.

Despite the success achieved by domain adaptation, limited research can be found with respect to its application to fault diagnosis. Zhang et al. [27] took raw vibration signals as inputs of a deep convolutional neural network with wide first-layer kernels (WDCNN). They also used adaptive batch normalization (AdaBN) as the domain adaptation algorithm to realize fault diagnosis under different load conditions and noisy environments. Lu et al. [28] introduced a deep CNN model with domain adaptation for fault diagnosis; this model integrates MMD as a regularization term into the loss function to reduce the cross-domain distribution difference. Zhang et al. [29] developed an adversarial domain adaptation model, which comprises a source feature extractor, a target feature extractor, a domain discriminator, and a label classifier, for fault diagnosis. Jian et al. [30] proposed a fusion CNN model that combines 1DCNN and Dempster–Shafer evidence theory to enhance the cross-domain adaptive capability for fault diagnosis. Tong et al. [31] proposed a bearing fault diagnosis domain adaptation method to find transferable features across domains, which were obtained by reducing the marginal and conditional distribution discrepancies simultaneously based on MMD. Li et al. [32] presented a deep domain adaptation method for bearing fault diagnosis on the basis of the multikernel maximum mean discrepancies between domains in multiple layers to learn representations from the source domain that can be applied to the target domain. Furthermore, Han et al. [33] proposed a new intelligent fault diagnosis framework, which extends the marginal distribution adaptation to joint distribution adaptation and guarantees an accurate distribution adaptation.

Improving CNN performance by learning additional discriminative features has become a recent trend. For example, contrastive loss [34] and center loss [35] were presented to learn discriminative deep features for face verification and recognition. Furthermore, Liu et al. [36] proposed a large-margin softmax loss, which extends the softmax loss and leads to a large angular separability between the learned features. Chen et al. [37] proposed two discriminative feature learning approaches, namely, instance-based and center-based discriminative losses, with joint domain alignment and discriminative feature learning.

Inspired by these methods, we propose a novel deep domain adaptation model for bearing fault diagnosis with Deep Coral and center-based discriminative feature learning. By combining domain alignment and discriminative feature learning, the domain-invariant features extracted by the model are well clustered and separable, which benefits both domain adaptation and classification.

The main contributions of this work include the following:

(1) An end-to-end method with domain adaptation for fault diagnosis is proposed, namely, CACD-1DCNN. This method directly works on raw temporal signals and does not require time-consuming denoising preprocessing or a separate feature extraction algorithm.

(2) By combining domain alignment and discriminative feature learning, CACD-1DCNN aims to extract domain-invariant features with improved intraclass compactness and interclass separability and guarantees high classification performance in the two domains.

(3) Extensive experiments on the Case Western Reserve University (CWRU) bearing datasets demonstrate that CACD-1DCNN achieves superior diagnosis performance over existing baseline methods.

(4) Furthermore, network visualization and loss analysis provide an intuitive presentation of the adaptation results and verify the effectiveness of our method.

The remaining parts are organized as follows. In Section 2, the domain adaptation problem of fault diagnosis is formulated, and the basic theories of Deep Correlation Alignment and Center-Based Discriminative Loss are introduced. In Section 3, the proposed intelligent fault diagnosis method, CACD-1DCNN, is presented. The comparison methods, the experiments, and the discussion are given in Section 4. Finally, the conclusions are drawn in Section 5.


2. Theoretical Background

2.1. Problem Formulation. Traditional machinery fault diagnosis aims to identify the fault location and severity of an unknown fault set on the basis of a prior known fault set. Taking fault diagnosis as an example, the collected labeled raw vibration temporal signals are taken as the source domain samples and the collected unlabeled raw vibration temporal signals as the target domain samples. The usual assumption is that the distributions of the source and target domains are the same, so that the fault patterns learned from the labeled training data can be directly applied to the unlabeled test data. However, a difference inevitably exists between the source and target domains in practical tasks, thereby deteriorating the model generalization capability across domains. Therefore, the domain adaptation problem for fault diagnosis has attracted increasing attention. This study focuses on the problem of unsupervised domain adaptation, that is, the target domain data have no labels.

Studies on the unsupervised domain adaptation problem of rolling bearing fault diagnosis are generally conducted under the following assumptions:

(1) The source and target domains are related to each other but have different distributions.

(2) For different domains, the fault diagnosis task is the same, that is, the domains share class labels.

(3) The labeled samples from the source domain are used for training, whereas the unlabeled samples from the target domain are available for training and testing.

Formally, domain $\mathcal{D}$ is composed of an $m$-dimensional feature space $\mathcal{X} \subset \mathbb{R}^m$ with marginal probability distribution $P(X)$, where $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$. Task $\mathcal{T}$ consists of two components: label space $\mathcal{Y}$ and predictive function $f(X)$ corresponding to the labels. $f(X) = Q(Y \mid X)$ is also the conditional probability distribution, and $Y \in \mathcal{Y}$, where $Y$ represents the possible machine health condition.

The following is the formal definition of the unsupervised domain adaptation problem for fault diagnosis. We are given a labeled source domain dataset $D_S = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$ and an unlabeled target dataset $D_T = \{x_i^t\}_{i=1}^{N_t}$, where $N_s$ and $N_t$ are the numbers of samples of the source and target domains, respectively. The feature spaces of $D_S$ and $D_T$ (that is, $\mathcal{X}_s = \mathcal{X}_t$), the label spaces (that is, $\mathcal{Y}_s = \mathcal{Y}_t$), and the conditional probability distributions $Q_s(y_s \mid x_s) = Q_t(y_t \mid x_t)$ are assumed to be the same. However, the marginal probability distributions of the two domains are different, that is, $P_s(x_s) \neq P_t(x_t)$. Unsupervised domain adaptation aims to use the labeled $D_S$ to learn a classifier $f: x_t \to y_t$ for predicting the labels $y_t$ of $D_T$, where $y_t \in \mathcal{Y}_t$.

2.2. Deep Correlation Alignment. To fill the gap between the domains, CORAL loss is adopted, which aligns the second-order statistics of the source and target features. In the activations computed at a given layer, $x_i^s$ and $x_i^t$ are $d$-dimensional representations. The domain discrepancy loss measured by CORAL loss ($L_{CA}$) [16], as shown below, minimizes the discrepancy between the second-order statistics (covariances) of the source and target features:

$$L_{CA} = \frac{1}{4d^2} \left\| C_S - C_T \right\|_F^2, \tag{1}$$

where $\|\cdot\|_F^2$ denotes the squared matrix Frobenius norm and $C_S$ and $C_T$ are the covariance matrices of the source and target features, respectively. According to reference [16], $C_S$ and $C_T$ are respectively computed as follows:

$$C_S = A_S J A_S^{\top}, \qquad C_T = A_T J A_T^{\top}, \tag{2}$$

where $A_S$ and $A_T$ hold the source and target features column-wise ($d$ rows, one column per sample) and $J$ is the centering matrix [38]. Taking the source domain as an example, $J$ is an $N_S \times N_S$ matrix with entries

$$J_{ii} = \frac{1}{N_S}, \qquad J_{ij} = -\frac{1}{N_S (N_S - 1)}, \quad i \neq j, \tag{3}$$

that is, $J = \frac{1}{N_S - 1}\left(I - \frac{1}{N_S}\mathbf{1}\mathbf{1}^{\top}\right)$, so that $C_S$ is the unbiased sample covariance of the source features.

The training process is realized by mini-batch Stochastic Gradient Descent (SGD), in which only a batch of training samples is aligned in each iteration.
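The batch-wise computation of the CORAL loss in (1)–(3) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' TensorFlow implementation: features are stored row-wise (one sample per row), the off-diagonal entries of J are taken as negative so that J is the scaled centering matrix, and the function names `centering_matrix`, `covariance`, and `coral_loss` are hypothetical.

```python
import numpy as np


def centering_matrix(n: int) -> np.ndarray:
    """Scaled centering matrix J = (I - 11^T / n) / (n - 1).

    Its entries are J_ii = 1/n and J_ij = -1/(n(n-1)) for i != j.
    """
    return (np.eye(n) - np.ones((n, n)) / n) / (n - 1)


def covariance(features: np.ndarray) -> np.ndarray:
    """Covariance A J A^T of a batch given as an (n, d) array.

    A = features.T is the (d, n) matrix with one sample per column.
    """
    a = features.T
    return a @ centering_matrix(features.shape[0]) @ a.T


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """CORAL loss ||C_S - C_T||_F^2 / (4 d^2) between two feature batches."""
    d = source.shape[1]
    diff = covariance(source) - covariance(target)
    return float(np.sum(diff ** 2) / (4.0 * d * d))
```

The source and target batches may have different sizes, because each covariance matrix is d × d regardless of the number of samples.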

2.3. Center-Based Discriminative Loss. To make the deep features learned by the deep CNN model more discriminative, the center-based discriminative loss $L_{CD}$ is adopted [37], which is different from center loss [35]. The latter penalizes the distance of each sample to the center of the corresponding class, whereas the former not only has the characteristic of center loss but also enforces large margins among the centers of different categories. The center-based discriminative loss is defined as follows:

$$L_{CD} = \beta \sum_{i=1}^{n_s} \max\left(0, \left\| f_i^s - c_{y_i} \right\|_2^2 - m_1\right) + \sum_{i,j=1,\; i \neq j}^{c} \max\left(0, m_2 - \left\| c_i - c_j \right\|_2^2\right). \tag{4}$$

The loss is composed of two terms. The first term measures the intraclass compactness, whereas the second term measures the interclass separability. $\beta$ is the trade-off parameter, and $m_1$ and $m_2$ are the two constraint margins. $f_i^s \in \mathbb{R}^d$ represents the deep features of the $i$-th training sample in the fully connected layer, and $d$ is the number of neurons of the fully connected layer. $c_{y_i} \in \mathbb{R}^d$ denotes the center of the class $y_i$ to which the $i$-th sample belongs, with $y_i \in \{1, 2, \ldots, c\}$, where $c$ is the number of classes.

Equation (4) shows that the center-based discriminative loss pushes the squared distance between each sample and its class center to be no more than $m_1$ and the squared distance between the centers of different classes to be no less than $m_2$. This penalty term therefore makes the deep features more discriminative.
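A minimal NumPy sketch of the loss in (4) is given below for illustration. It assumes zero-based integer labels, a feature batch of shape (n, d), and a center array of shape (c, d); the function name `center_discriminative_loss` and its argument names are hypothetical, and the second sum runs over ordered pairs of distinct centers, as the index range in (4) suggests.

```python
import numpy as np


def center_discriminative_loss(features, labels, centers, m1, m2, beta):
    """Center-based discriminative loss for one batch of source features.

    features: (n, d) deep features; labels: (n,) integers in [0, c);
    centers: (c, d) class centers; m1, m2: margins; beta: trade-off weight.
    """
    # Intraclass term: squared distance of each sample to its own class
    # center, penalized only when it exceeds the margin m1.
    dist_to_center = np.sum((features - centers[labels]) ** 2, axis=1)
    intra = np.sum(np.maximum(0.0, dist_to_center - m1))

    # Interclass term: squared distances between all ordered pairs of
    # distinct centers, penalized only when they fall below the margin m2.
    diff = centers[:, None, :] - centers[None, :, :]
    pair_dist = np.sum(diff ** 2, axis=2)
    off_diagonal = ~np.eye(centers.shape[0], dtype=bool)
    inter = np.sum(np.maximum(0.0, m2 - pair_dist[off_diagonal]))

    return float(beta * intra + inter)
```

Both terms vanish once every sample lies within the margin of its center and all centers are at least the second margin apart, so the loss only acts on samples and centers that violate the margins.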


Ideally, class center $c_{y_i}$ should be calculated by averaging the deep features of all samples of the class, which is evidently inefficient and unrealistic. In practical applications, we update the centers through the mini-batch training samples. In each iteration, we calculate each center by averaging the features of the corresponding class in the batch. The update formulas in each iteration are as follows:

$$\Delta c_j = \frac{\sum_{i=1}^{b} \delta(y_i = j)\,\left(c_j - f_i^s\right)}{1 + \sum_{i=1}^{b} \delta(y_i = j)}, \tag{5}$$

$$c_j^{t+1} = c_j^{t} - \lambda\, \Delta c_j^{t}, \tag{6}$$

where $\delta(\text{condition}) = 1$ when the condition is true and 0 otherwise, $b$ denotes the batch size, and $\lambda$ is the learning rate. Every class center is initialized as the "batch class center" in the first iteration and is updated according to (5) and (6) for the next batch of samples in each iteration.
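The center update in (5) and (6) can be sketched as follows. This is an illustrative NumPy version under the same assumptions as before (zero-based integer labels, floating-point arrays); the function name `update_centers` is hypothetical.

```python
import numpy as np


def update_centers(centers, features, labels, lam):
    """Apply one mini-batch update to every class center.

    centers: (c, d) float array; features: (b, d); labels: (b,) integers;
    lam: learning rate of the centers.
    """
    new_centers = centers.copy()
    for j in range(centers.shape[0]):
        mask = labels == j
        count = int(mask.sum())
        # Numerator: sum of (c_j - f_i) over the batch samples of class j.
        # The "1 +" in the denominator keeps the update defined (and zero)
        # when class j does not occur in the batch.
        delta = np.sum(centers[j] - features[mask], axis=0) / (1.0 + count)
        new_centers[j] = centers[j] - lam * delta
    return new_centers
```

Centers of classes that are absent from the current batch are left unchanged, and each present center moves toward the mean of its batch samples.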

3. Fault Diagnosis Framework Based on CACD-1DCNN

3.1. CACD-1DCNN Fault Diagnosis Model. CACD-1DCNN is proposed to solve the cross-domain learning problem in bearing fault diagnosis. As illustrated in Figure 1, with a CNN as the main architecture, a two-stream CNN architecture with shared weights is adopted, and the model employs a domain adaptation layer with correlation alignment and center-based losses before the classifier.

As the inputs of the two streams, the labeled source data and unlabeled target data are fed into the CACD-1DCNN model during the training process. Subsequently, discriminative, domain-invariant features are extracted from the raw vibration signals through the multiple convolutional and pooling layers. The distribution discrepancy is minimized at the last fully connected layer. Theoretically, correlation alignment can be performed at multiple layers in parallel. Empirical evidence [15, 39] indicates that a solid performance is obtained even if this alignment is conducted only once. As a common practice, the correlation alignment loss ($L_{CA}$) is applied after the last fully connected layer. Similarly, the center-based discriminative loss ($L_{CD}$) is generally placed after the last fully connected layer. Therefore, the two kinds of loss functions in the proposed model are computed on the features extracted by the last fully connected layer. Together with the conventional softmax loss function ($L_S$) based on the source domain, the loss function of the CACD-1DCNN model is defined as follows:

$$L = L_S + \alpha L_{CA} + \beta L_{CD}, \tag{7}$$

where $\alpha$ and $\beta$ are the trade-off parameters that balance the contributions of the domain discrepancy and discriminative losses.

Only the source domain data are used for discriminative learning here, because only they are labeled. During the training process, the source features are learned discriminatively and aligned with the target features. Joint training with classification, correlation alignment, and center-based discriminative losses between the two domains in the last fully connected layer can adapt the learned representations in the source domain for application to the target domain. This joint training can also guarantee domain-invariant features with improved intraclass compactness and interclass separability. Meanwhile, the extracted features can efficiently improve the cross-domain testing performance.
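As an illustration of how the three terms of the joint objective combine, the following NumPy sketch evaluates the softmax loss on the labeled source logits and adds the two weighted penalties. The correlation alignment and discriminative loss values are passed in as precomputed numbers; averaging the cross-entropy over the batch, the function names, and the values of the trade-off parameters used in any call are assumptions made here, not settings reported in the text.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy of (n, c) logits against integer labels."""
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def joint_loss(source_logits, source_labels, l_ca, l_cd, alpha, beta):
    """Joint objective L = L_S + alpha * L_CA + beta * L_CD.

    Only the source logits enter L_S, because the target data are unlabeled.
    """
    l_s = softmax_cross_entropy(source_logits, source_labels)
    return l_s + alpha * l_ca + beta * l_cd
```

Setting either trade-off parameter to zero recovers the ablated objectives that drop the corresponding penalty.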

3.2. Architecture Designs of 1DCNN. Considering that the bearing vibration signals collected by acceleration sensors are usually 1D, using a 1DCNN is reasonable for the processing of vibration signals. In this study, the 1DCNN is adopted to deal with bearing fault diagnosis. The network structure is composed of four convolution and pooling layers, two fully connected layers, and a softmax layer at the end. The first convolutional layer uses wide kernels for extracting features and suppressing high-frequency noise. Small convolutional kernels in the following layers are used to deepen the network for multilayer nonlinear mapping and to prevent overfitting [27]. The parameters of the 1DCNN are presented in Table 1. The pooling type is max pooling, and the activation function is ReLU. To minimize the loss function, the Adam stochastic optimization algorithm is applied to train our model, and the learning rate is set to 1e-4. The experiments are conducted using the TensorFlow toolbox of Google.
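The layer sizes listed in Table 1 can be checked with the standard output-length formula floor((n - k)/s) + 1 for a layer with kernel size k and stride s. The sketch below is an illustration rather than the authors' code: it assumes an input of 2048 points, convolution and pooling without zero padding, and a stride of 1 for Convolution2 to Convolution4, which are the settings under which the Output column of Table 1 is reproduced. The names `out_len`, `LAYERS`, and `trace_lengths` are hypothetical.

```python
def out_len(n: int, kernel: int, stride: int) -> int:
    """Output length of a 1D convolution or pooling layer without padding."""
    return (n - kernel) // stride + 1


# (layer name, kernel size, stride); the strides of Convolution2-4 are assumed.
LAYERS = [
    ("Convolution1", 32, 8),
    ("Pooling1", 2, 2),
    ("Convolution2", 3, 1),
    ("Pooling2", 2, 2),
    ("Convolution3", 3, 1),
    ("Pooling3", 2, 2),
    ("Convolution4", 3, 1),
    ("Pooling4", 2, 2),
]


def trace_lengths(input_len: int = 2048):
    """Return the temporal length after each layer for a given input length."""
    lengths = []
    n = input_len
    for name, kernel, stride in LAYERS:
        n = out_len(n, kernel, stride)
        lengths.append((name, n))
    return lengths
```

Under these assumptions, the last pooling layer leaves 14 time steps with 64 channels each, which are then passed to the fully connected layers.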

3.3. Data Augmentation. Without sufficient training samples, the model can easily overfit. Data augmentation techniques are commonly used in computer vision to improve the generalization of networks by increasing the number of training samples. In fault diagnosis, the vibration signals collected by the acceleration sensor are 1D, and overlap sampling can easily obtain a large number of samples by slicing the training signal with overlap. Figure 2 illustrates a vibration signal with 120000 points. We can take 2048 data points from this signal as a sample. We can then offset the window by a certain amount to obtain the second sample.
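Overlap sampling as described above can be sketched in a few lines of NumPy. The window length of 2048 points follows the text, whereas the shift of 64 points is only a placeholder chosen for illustration (the shift is not stated here), and the function name `overlap_samples` is hypothetical.

```python
import numpy as np


def overlap_samples(signal, length=2048, shift=64):
    """Slice a 1D signal into windows of `length` points, `shift` points apart.

    A shift smaller than the window length yields overlapping samples;
    a shift equal to the window length yields non-overlapping samples.
    """
    signal = np.asarray(signal)
    if len(signal) < length:
        raise ValueError("signal is shorter than one sample")
    count = (len(signal) - length) // shift + 1
    starts = np.arange(count) * shift
    return np.stack([signal[s:s + length] for s in starts])
```

For a signal of N points, this produces floor((N - length)/shift) + 1 samples, so a smaller shift gives more (and more strongly overlapping) training samples from the same recording.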

4. Experimental Analysis of the Proposed CACD-1DCNN Model

4.1. Data Description. The bearing fault data used for experimental validation were obtained from the Bearing Data Center of CWRU [40]. The data were collected from a motor-driven mechanical system under four different loads (0, 1, 2, and 3 hp) and at three different locations (fan end, drive end, and base). The sampling frequencies are 48 and 12 kHz. The bearing has three fault types: outer race fault (OF), inner race fault (IF), and roller fault (RF). Each fault type contains fault diameters of 0.007, 0.014, and 0.021 inches. Together with the normal condition (N), there are a total of 10 health states.

In this study, vibration signals of different fault locations and different health states, sampled at 12 kHz at the drive end of the rolling bearing, are selected for experimental research. The detailed description of the datasets is shown in Table 2. Three datasets are acquired


Table 1: Details of 1DCNN architecture used in experiments.

| No. | Layer type | Kernel size | Stride | Kernel number | Output | Padding |
|---|---|---|---|---|---|---|
| 1 | Convolution1 | 32×1 | 8×1 | 32 | 253×32 | Yes |
| 2 | Pooling1 | 2×1 | 2×1 | 32 | 126×32 | No |
| 3 | Convolution2 | 3×1 | 2×1 | 64 | 124×64 | Yes |
| 4 | Pooling2 | 2×1 | 2×1 | 64 | 62×64 | No |
| 5 | Convolution3 | 3×1 | 2×1 | 64 | 60×64 | Yes |
| 6 | Pooling3 | 2×1 | 2×1 | 64 | 30×64 | No |
| 7 | Convolution4 | 3×1 | 1×1 | 64 | 28×64 | Yes |
| 8 | Pooling4 | 2×1 | 2×1 | 64 | 14×64 | No |
| 9 | Fully connected1 | 100 | | 1 | 100×1 | |
| 10 | Fully connected2 | 64 | | 1 | 64×1 | |
| 11 | Softmax | 10 | | 1 | 10 | |

Figure 1: Architecture of CACD-1DCNN. Source samples and target samples pass through two streams with shared weights (Conv1, the first layer with wide convolutional kernels; Conv2 to Conv4 with small convolutional kernels; and the fully connected layer Fc5), followed by the losses L_S, L_CA, and L_CD.

Figure 2: Schematic of overlap sampling (training samples Sample 1, Sample 2, ..., Sample n are cut from the signal with a fixed shift and overlap).


under three loads of 1, 2, and 3 hp, respectively. Each large dataset contains training and testing samples, and each sample contains 2048 data points. To increase the number of training samples, the overlap sampling technique is used: the training samples are overlapped to augment the data, whereas the test samples do not overlap. Therefore, each dataset is composed of 6600 training samples and 250 test samples of 10 health states.

4.2. Accuracy across Different Domains. The source domain samples have labels, whereas the target domain samples have none. With three domains, the experiments are conducted in six domain transfer scenarios: A⟶B, A⟶C, B⟶C, C⟶B, C⟶A, and B⟶A. Taking A⟶B as an example, dataset A is the source domain, whereas dataset B is the target domain.

Comparison Methods. The proposed method is compared with several successful machine learning methods to verify the effectiveness of the CACD-1DCNN model:

(1) SVM

(2) Multilayer perceptron (MLP)

(3) Deep neural network (DNN) [1]

(4) WDCNN proposed in [27], with the domain adaptation capacity provided by AdaBN

(5) OFNN-DE proposed in [30]

(6) Adversarial adaptive model based on 1-D CNN (A2CNN) [29]

Methods (1)–(3) and (6) work with data transformed through fast Fourier transform, whereas methods (4) and (5) are CNN-based methods that work with normalized raw signals. Notably, the OFNN in [30] is not used here because the OFNN uses the diagnostic result of data fusion between the drive-end and fan-end datasets, which differ from the datasets used in the experiments in this research. By contrast, the datasets used by OFNN-DE are the same as those used in this study. Furthermore, the diagnosis accuracy of the OFNN is slightly lower than that of the method used in this research.

For a fair comparison, we adopt the accuracy reported by other authors with the same setting or conduct experiments by using the source code provided by the authors.

A total of 10 experiments are conducted for each domain transfer scenario to reduce the influence of random factors.

The experimental results of the six domain scenarios are displayed in Figure 3. In the domain shift scenarios of A⟶B, A⟶C, B⟶C, and C⟶B, the test accuracy of each scenario reaches 100%. In the domain shift scenarios of C⟶A and B⟶A, the test accuracies exceed 97.6% and 97.2%, respectively. These results show that the domain adaptation performance of the proposed method is high and stable across trials.

The comparison with other approaches is shown in Figure 4. The average performance of the CACD-1DCNN is better than that of the A2CNN and the six other baseline methods. The CACD-1DCNN thus achieves the highest average accuracy of domain adaptation over all domain transfer scenarios.

As illustrated in Figure 4, the performance of SVM, MLP, and DNN in domain adaptation is poor, with average accuracies of 66.63%, 80.40%, and 78.05% in the six scenarios, respectively. These results suggest that the sample distribution differs under varying conditions and that a model trained in one working condition is unsuitable for fault diagnosis in another condition.

Compared with recent approaches, such as OFNN-DE and A2CNN, our method achieves an average accuracy of 99.47%, which is higher than those of OFNN-DE and A2CNN, with average accuracies of 98.73% and 99.21%, respectively. This result shows that the features learned by the proposed method have better domain invariance and fault discrimination than those learned by other methods.
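The reported averages can be reproduced from the per-scenario accuracies listed with Figure 4. The short check below is only an arithmetic illustration; the dictionary `ACCURACY` copies the six values (A⟶B, A⟶C, B⟶C, C⟶B, C⟶A, and B⟶A) of the three methods discussed here, and the names `ACCURACY` and `average` are hypothetical.

```python
# Per-scenario accuracies (%) in the order A->B, A->C, B->C, C->B, C->A, B->A.
ACCURACY = {
    "OFNN-DE": [100.0, 99.62, 99.90, 99.99, 94.37, 98.50],
    "A2CNN": [99.99, 99.30, 99.90, 99.99, 97.93, 98.18],
    "CACD-1DCNN": [100.0, 100.0, 100.0, 100.0, 98.80, 98.00],
}


def average(values):
    """Arithmetic mean of a list of accuracies."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for method, values in ACCURACY.items():
        print(f"{method}: {average(values):.2f}")
```

The gap between the proposed method and the two strongest baselines comes almost entirely from the two transfers into dataset A, since all three methods are at or near 100% in the other four scenarios.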

In five out of six shifts, that is, A⟶B, A⟶C, B⟶C, C⟶B, and C⟶A, the proposed method achieves the best fault diagnosis accuracy among the compared methods and reaches 100% in the first four domain shifts. In the domain transfer scenario of B⟶A, the accuracy of the proposed method is 98%, which is 0.18% and 0.5% lower than those of the A2CNN and OFNN-DE methods, respectively, and far better than the accuracies of the SVM, MLP, DNN, and WDCNN methods. On this basis, the CACD-1DCNN can learn domain-invariant and fault-discriminative features well and effectively solve the domain adaptation problem caused by different loads of bearing data.

Taking the domain shift scenarios of C⟶B, C⟶A, and B⟶A as examples, for the CACD-1DCNN model, we compare the test accuracy of the target domain under the four loss functions of $L_S$, $L_S + L_{CA}$, $L_S + L_{CD}$, and $L_S + L_{CA} + L_{CD}$ (for simplicity, the coefficients of each loss function are omitted), as presented in Table 3. The results of other shifts

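Table 2: Description of 12 kHz drive-end bearing datasets.

| Fault location | None | RF | RF | RF | IF | IF | IF | OF | OF | OF |
|---|---|---|---|---|---|---|---|---|---|---|
| Category label | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Fault diameter (inch) | 0 | 0.007 | 0.014 | 0.021 | 0.007 | 0.014 | 0.021 | 0.007 | 0.014 | 0.021 |
| Dataset A (1 hp), train | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 |
| Dataset A (1 hp), test | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |
| Dataset B (2 hp), train | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 |
| Dataset B (2 hp), test | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |
| Dataset C (3 hp), train | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 | 660 |
| Dataset C (3 hp), test | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |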


are similar. We observe that the target test accuracy is worst with the loss function of $L_S$ because no adaptive strategy is adopted between the source and target domains. The target test accuracy is at an intermediate level with the loss functions of $L_S + L_{CA}$ and $L_S + L_{CD}$, and it is the highest with the loss function of $L_S + L_{CA} + L_{CD}$. Therefore,

Figure 4: Accuracy (%) on six domain shifts of the proposed and compared methods.

| Method | A⟶B | A⟶C | B⟶C | C⟶B | C⟶A | B⟶A | Avg |
|---|---|---|---|---|---|---|---|
| SVM | 68.60 | 60.00 | 67.60 | 62.00 | 68.40 | 73.20 | 66.63 |
| MLP | 82.10 | 85.60 | 82.40 | 79.00 | 81.80 | 71.50 | 80.40 |
| DNN | 82.20 | 82.60 | 77.00 | 77.30 | 76.90 | 72.30 | 78.05 |
| WDCNN | 99.20 | 91.00 | 91.50 | 85.10 | 78.10 | 95.10 | 90.00 |
| WDCNN (AdaBN) | 99.40 | 93.40 | 97.20 | 99.90 | 88.30 | 97.50 | 95.95 |
| OFNN-DE | 100 | 99.62 | 99.90 | 99.99 | 94.37 | 98.50 | 98.73 |
| A2CNN | 99.99 | 99.30 | 99.90 | 99.99 | 97.93 | 98.18 | 99.21 |
| CACD-1DCNN | 100 | 100 | 100 | 100 | 98.80 | 98.00 | 99.47 |

Figure 3: Ten diagnosis results of six domain-shift scenarios of the proposed method (test accuracy versus trial number): (a) A⟶B, (b) A⟶C, (c) B⟶C, (d) C⟶B, (e) C⟶A, and (f) B⟶A.

Table 3: Accuracy of the CACD-1DCNN model under four different loss functions.

| Losses | C⟶B | C⟶A | B⟶A | Average |
|---|---|---|---|---|
| L_S | 0.948 | 0.916 | 0.96 | 0.941 |
| L_S + L_CA | 1 | 0.972 | 0.976 | 0.983 |
| L_S + L_CD | 1 | 0.976 | 0.972 | 0.983 |
| L_S + L_CA + L_CD | 1 | 0.988 | 0.98 | 0.989 |

in the case of $L_S + L_{CA}$ and $L_S + L_{CD}$, the model classification performance is comparable, and only when jointly supervised by the three loss functions can the proposed model achieve the best performance.

The results confirm that (1) the CACD-1DCNN is effective in filling the domain gap and (2) when $L_{CD}$ is added on the basis of $L_{CA}$, that is, by combining domain alignment and discriminative feature learning, the proposed model guarantees that the domain-invariant features are extracted with improved intraclass compactness and interclass separability. The gap between the class clusters and the hyperplane is large, which is conducive to the correct classification of target samples near the edge or far from their corresponding class centers.

Furthermore, taking the domain shift scenario of C⟶A as an example, the accuracy of the training and testing stages in the case of the joint loss of $L_S + L_{CA} + L_{CD}$ is illustrated in Figure 5. This approach clearly helps us to achieve improved performance in the target domain while maintaining a strong classification accuracy in the source domain.

Taking the domain shift of C⟶A as an example, the domain loss and intraclass loss are analyzed. Notably, the proposed model computes these losses on batches. Although batch values cannot fully represent the distance between the entire source and target domains, they provide a practical and fast approximation for classifying samples.

The domain loss under the four loss functions is illustrated in Figure 6. In the case of $L_S$, where only the source domain is trained, the feature representations obtained from the source domain are likely to be different from the target features because the sample features of the target domain are not considered at all during the model training. Therefore, overfitting to the source domain occurs, and the domain loss is large. In the case of $L_S + L_{CD}$, without considering domain alignment and only considering discriminative learning, the domain loss is smaller than that in the case of $L_S$, within the range of 0–0.55. Only $L_S + L_{CA}$ and $L_S + L_{CA} + L_{CD}$ are compared in Figure 7 for clarity. The domain loss is minimal and the domain loss curve is smooth in the case of $L_S + L_{CA} + L_{CD}$, indicating that a slight change of the weights causes only a slight change of the distance between domains. A more stable and accurate domain adaptation model can thus be obtained by considering both domain alignment and discriminative feature learning.

The intraclass loss under the four loss functions is illustrated in Figure 8. Evidently, the intraclass loss is large in the case of $L_S$ because the sample features of the target domain are not considered. The case of $L_S + L_{CD}$ takes second place. The intraclass losses in the cases of $L_S + L_{CA}$ and $L_S + L_{CA} + L_{CD}$ are small, suggesting that discriminative learning can achieve a small intraclass loss only when the model is trained jointly with domain alignment. The comparison of the intraclass losses of $L_S + L_{CA}$ and $L_S + L_{CA} + L_{CD}$ shows that the curve is smooth in the case of $L_S + L_{CA} + L_{CD}$, indicating that the model is stable and reasonable in this case.

4.3. Sensitivity Analysis of Fault. For each type of fault detection, we introduce three evaluation indexes, namely, Precision, Recall, and F-Measure, to further analyze the sensitivity of the proposed CACD-1DCNN method. In the multiclassification problem of fault diagnosis, for each fault category $f$, Precision and Recall are defined as follows:

$$\text{Precision}(f) = \frac{TP}{TP + FP}, \qquad \text{Recall}(f) = \frac{TP}{TP + FN}, \tag{8}$$

where True Positive (TP) represents the number of faults correctly identified as fault category $f$, False Positive (FP) is the number of faults wrongly identified as fault category $f$, and False Negative (FN) is the number of faults of category $f$ incorrectly labeled as not belonging to $f$.

F-Measure is defined as a reference for diagnosis analysis. The calculation method of F-Measure is as follows:

Figure 6: The domain loss of the proposed model with four loss functions (domain loss versus training iterations; curves: L_S, L_S + L_CA, L_S + L_CD, and L_S + L_CA + L_CD).

Figure 5: Training and test accuracy of domain transfer scenario of C⟶A in the proposed model (accuracy versus training iterations).


$$F = \frac{\left(1 + \alpha^2\right) \times \text{Precision} \times \text{Recall}}{\alpha^2 \times (\text{Precision} + \text{Recall})}. \tag{9}$$

F-Measure denotes the weighted harmonic mean of Precision and Recall, with α as the weight. Setting α to 1 indicates that Precision is as important as Recall. When α > 1, Precision is more important; when α < 1, Recall is more important. In this study, α is set to 1; the closer the F-Measure is to 1, the better the fault diagnosis performance. This evaluation method considers both Precision and Recall. The highest F-Measure is 1. The Precision, Recall, and F-Measure of each health state in the comparison method A2CNN and the proposed method CACD-1DCNN are presented in Table 4, and the comparison results of other methods are similar.
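The three indexes can be read directly off a confusion matrix. The sketch below is a minimal pure-Python illustration under stated assumptions: the function name `per_class_metrics` and the three-class toy counts are hypothetical and are not the paper's data, and the F-Measure follows the form of Equation (9) with α defaulting to 1.

```python
def per_class_metrics(confusion, alpha=1.0):
    """Return (precision, recall, f_measure) for each class.

    confusion[t][p] is the number of samples of true class t predicted as class p.
    """
    n = len(confusion)
    metrics = []
    for f in range(n):
        tp = confusion[f][f]
        fp = sum(confusion[t][f] for t in range(n)) - tp  # other classes predicted as f
        fn = sum(confusion[f]) - tp                       # class f predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = alpha ** 2 * (precision + recall)
        f_measure = (1 + alpha ** 2) * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f_measure))
    return metrics

# Toy counts (hypothetical): 6 of the 50 samples of class 2 are predicted as class 0.
confusion = [
    [50, 0, 0],
    [0, 50, 0],
    [6, 0, 44],
]
for label, (p, r, f) in enumerate(per_class_metrics(confusion)):
    print(label, round(p, 4), round(r, 4), round(f, 4))
```

In this toy matrix the misclassified samples lower the Precision of class 0 (false alarms) and the Recall of class 2 (missed faults), while class 1 is unaffected.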

For the first (rolling body) and fourth (inner raceway) fault types, both of which have a fault size of 0.007 inch, the CACD-1DCNN method has low Precision values in the domain shift scenario B⟶A, which are 89% and 93%, respectively. Thus, approximately 10% of these kinds of fault alerts in this domain shift are unreliable. For the first fault type, the Precision of the proposed method in the domain shift C⟶A is 89%, indicating that 11% of the samples assigned to this fault category are incorrectly classified. In A2CNN, some fault alarms are unreliable in certain domain shifts for the first, second, third, and fifth fault types and the normal state.

For the third fault type, the rolling body fault size is 0.021 inch. The Recall values of the CACD-1DCNN method in domain shift scenarios C⟶A and B⟶A are low, at 88% and 80%, respectively. That is, 12% of these faults are undetected in the domain shift scenario C⟶A, whereas 20% are undetected in the domain shift B⟶A. In A2CNN, some faults are undetected in certain domain shifts for the second, third, and eighth fault types.

Figure 7: The domain loss of the proposed model with the two loss functions (Ls + LCA and Ls + LCA + LCD).

Figure 8: The intraclass loss of the proposed model with four loss functions (Ls, Ls + LCD, Ls + LCA, and Ls + LCA + LCD).

Similarly, the F-Measure values in the domain shift scenarios C⟶A and B⟶A for the first fault type are both 0.9434. For the third fault type, the F-Measure values in the domain shifts C⟶A and B⟶A are 0.9362 and 0.8889, respectively. The F-Measure in the domain shift B⟶A for the fourth fault type is 0.9615. All the other F-Measure values of the fault classes are 1. In A2CNN, the F-Measure values are less than 1 in some domain shifts for the first, second, third, and eighth fault types and the normal state.
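As a check on the arithmetic, with α = 1 the F-Measure reduces to F = 2PR/(P + R), where P and R denote Precision and Recall. Taking the Recall values quoted above for the third fault type (88% and 80%) and assuming the Precision of 100% listed for that category under CACD-1DCNN in Table 4, the reported F-Measure values follow:

$$F_{C \to A} = \frac{2 \times 1 \times 0.88}{1 + 0.88} = \frac{1.76}{1.88} \approx 0.9362, \qquad F_{B \to A} = \frac{2 \times 1 \times 0.80}{1 + 0.80} = \frac{1.60}{1.80} \approx 0.8889.$$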

In general, the Precision, Recall, and F-Measure of the CACD-1DCNN are higher than those of the A2CNN, which means that the CACD-1DCNN has fewer false alarms and missed alarms. Except for a few samples of the third fault type in domain shift B⟶A, which are incorrectly classified into the first and fourth fault types, and a few samples of the third fault type in domain shift C⟶A, which are incorrectly classified into the first fault type, the CACD-1DCNN method assigns all samples to the correct classes. The results show that, after combining domain alignment and discriminative feature learning, the classification performance of the proposed method improves remarkably.

Table 4: Precision, Recall, and F-Measure of the proposed CACD-1DCNN and A2CNN on six domain shifts.

Fault location          N       RF      RF      RF      IF      IF      IF      OF      OF      OF
Fault diameter (inch)   -       0.007   0.014   0.021   0.007   0.014   0.021   0.007   0.014   0.021
Category label          0       1       2       3       4       5       6       7       8       9

Precision (%), A2CNN
A⟶B                     100     100     100     99.88   100     100     100     100     100     100
A⟶C                     93.46   100     100     100     100     100     100     100     100     100
B⟶C                     100     90.07   92.59   100     100     100     100     100     100     100
C⟶B                     100     100     100     100     100     99.01   100     100     100     100
C⟶A                     100     90.91   90.40   100     100     100     100     100     100     100
B⟶A                     100     99.88   100     100     100     100     100     100     100     100

Precision (%), CACD-1DCNN
A⟶B                     100     100     100     100     100     100     100     100     100     100
A⟶C                     100     100     100     100     100     100     100     100     100     100
B⟶C                     100     100     100     100     100     100     100     100     100     100
C⟶B                     100     100     100     100     100     100     100     100     100     100
C⟶A                     100     89      100     100     100     100     100     100     100     100
B⟶A                     100     89      100     100     93      100     100     100     100     100

Recall (%), A2CNN
A⟶B                     100     100     99.88   100     100     100     100     100     100     100
A⟶C                     100     100     93.00   100     100     100     100     100     100     100
B⟶C                     100     100     100     81.75   100     100     100     100     100     100
C⟶B                     100     100     100     100     100     100     100     100     99.00   100
C⟶A                     100     100     100     79.38   100     100     100     100     100     100
B⟶A                     100     100     100     99.88   100     100     100     100     100     100

F-Measure, A2CNN
A⟶B                     1       1       0.9994  0.9994  1       1       1       1       1       1
A⟶C                     0.9662  1       0.9637  1       1       1       1       1       1       1
B⟶C                     1       0.9478  0.9615  0.8996  1       1       1       1       1       1
C⟶B                     1       1       1       1       1       1       1       1       0.9950  1
C⟶A                     1       0.9524  0.9496  0.8850  1       1       1       1       1       1
B⟶A                     1       0.9994  1       0.9994  1       1       1       1       1       1



4.4. Parameter Sensitivity. In this section, we study hyperparameter α, which is a critical coefficient for cross-validation. A high value of α may force networks to learn oversimplified low-rank feature representations. Although a high value may lead to perfectly aligned covariances, it may not be useful for classification. Meanwhile, a small α may be insufficient to bridge the domain shift.

Taking the domain shift C⟶A as an example, the results for different values of α are illustrated in Figure 9; β is fixed at 0.003. Similar trends are observed in other domain transfer scenarios. A large range of α (α ∈ [10^-1, 10^4]) can be selected to obtain better results than those of the best baseline methods. When the value of α is larger than 10^4, the accuracy rapidly decreases. The effectiveness and robustness of the proposed method are further verified.

With a fixed α of 100, we consider the influence of parameter β, which balances the discriminative loss to increase the intraclass compactness and interclass dispersion. A large β can produce highly discriminative deep features, whereas a small β is insufficient to improve the discrimination of features. Figure 10 shows the change in accuracy in the domain shift scenario C⟶A when β takes different values (β ∈ {0.0003, 0.003, 0.03, 0.3, 1, 10, 20}). When β is very small, the classification accuracy of the target domain is high. At this time, the correlation alignment loss plays the main role and the model classification performance is high. As β increases, a domain adaptation model with high accuracy is obtained as long as the domain alignment keeps up with the change of the source features under the influence of the discriminative loss. However, when β exceeds a certain value, which is 10 in this case, the classification performance of the target domain is poor. A possible reason is that the discriminative influence is so large that the source features change faster than the domain alignment can follow. In this case, we can conclude that the test accuracy remains high when β ∈ (0, 10].

Figure 9: Accuracy corresponding to different α in domain shift C⟶A (training and test accuracy versus α).

Figure 10: Accuracy corresponding to different β in domain shift C⟶A (training and test accuracy versus β).


The experimental results reveal that appropriate trade-off parameters between domain alignment and discriminative feature learning in the CACD-1DCNN model can improve the domain adaptation performance.
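The role of the trade-off parameters can be illustrated with a small sketch. It assumes that the joint objective has the weighted-sum form Ls + α·LCA + β·LCD, which is consistent with the description in this section but is an assumption about the exact formulation; the function names and the individual loss values below are hypothetical and chosen only to show how the alignment term comes to dominate as α grows over a logarithmic grid.

```python
def joint_objective(l_s, l_ca, l_cd, alpha, beta):
    """Assumed weighted sum of classification, alignment, and discriminative losses."""
    return l_s + alpha * l_ca + beta * l_cd

def alignment_share(l_s, l_ca, l_cd, alpha, beta):
    """Fraction of the joint objective contributed by the alignment term."""
    return alpha * l_ca / joint_objective(l_s, l_ca, l_cd, alpha, beta)

# Log-spaced grid for alpha (10^-2 ... 10^5), with beta held fixed.
alphas = [10.0 ** k for k in range(-2, 6)]
beta = 0.003

# Hypothetical per-batch loss values, for illustration only.
l_s, l_ca, l_cd = 0.1, 0.001, 1.0

shares = [alignment_share(l_s, l_ca, l_cd, a, beta) for a in alphas]
for a, share in zip(alphas, shares):
    print(f"alpha={a:g}  alignment share={share:.3f}")
```

Under these assumed values the alignment term is negligible for small α and accounts for nearly the whole objective for very large α, which matches the qualitative trade-off described above: too little weight leaves the domain shift unaddressed, and too much weight crowds out the classification signal.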

4.5. Network Visualizations. To further describe the effectiveness of the CACD-1DCNN, the t-SNE technique [41] is adopted to visualize the feature representations of the proposed approach in all convolutional and fully connected layers. Domain scenario B⟶C in Figure 11 is taken as an example. Three points are worth noting. (1) As the number of layers of the CACD-1DCNN model increases, the signals become increasingly separable at each layer, indicating the necessity of a deep structure. (2) In the third and fourth convolutional layers, the feature representations are linearly inseparable. In the fully connected layer, the feature representations of all faults are linearly separable. Therefore, the nonlinear expression ability of the model increases as the number of layers increases. (3) The feature representation of signals in the first convolutional layer is similar to that of the original signals and fails to show any separability. In the third and fourth convolutional layers, the signal samples gradually show separability. At the fully connected layer, the faults can be well distinguished. The core idea of the proposed model for implementing domain adaptation is as follows: the correlation of the data is first removed, and recorrelation operations are then performed on the basis of the information of the target domain.

5 Conclusion

In this work, we propose a CACD-1DCNN model for the domain adaptation of bearing fault diagnosis by combining domain alignment and discriminative feature learning. The CACD-1DCNN aims to extract domain-invariant features with improved intraclass compactness and interclass separability and guarantees high classification performance in the two domains. The experimental results on the CWRU bearing datasets confirm the superiority of the proposed method over many existing methods. Future research may focus on two aspects: (1) applying correlation alignment at multiple layers between the two domains in parallel and (2) further reducing the domain shifts in the aligned feature space through other constraints.

Figure 11: Feature visualization via t-SNE for domain shift B⟶C: feature representations for the original test signals, four convolutional layers, and two fully connected layers, respectively. (a) Original signal. (b) Conv1. (c) Conv2. (d) Conv3. (e) Conv4. (f) FC5. (g) FC6.


Data Availability

The data used in this paper were acquired from the Bearing Data Center of Case Western Reserve University (CWRU), web page http://csegroups.case.edu/bearingdatacenter/home (accessed October 2015).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors' Contributions

Jing An conceived the study, participated in the research design, conducted the experiments, and wrote the paper. Ping Ai designed the methodology and reviewed the manuscript critically for important intellectual content. Dakun Liu carried out supplementary experiments, including the calculation of Precision, Recall, and F-Measure of the comparison methods and the effectiveness verification of the proposed method. All authors read and approved the final manuscript.

Acknowledgments

This research was supported by the "Natural Science Foundation of the Jiangsu Higher Education Institutions of China" (Grant no. 18KJB520050), "Natural Science Foundation of China" (Grant no. 51805466), and "Natural Science Foundation of Jiangsu Province" (Grant no. BK20181055).

References

[1] F. Jia, Y. Lei, J. Lin, X. Zhou, and N. Lu, "Deep neural networks: a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data," Mechanical Systems and Signal Processing, vol. 72-73, pp. 303–315, 2016.

[2] Z. Chen, K. Gryllias, and W. Li, "Mechanical fault diagnosis using convolutional neural networks and extreme learning machine," Mechanical Systems and Signal Processing, vol. 133, p. 106272, 2019.

[3] T. Han, C. Liu, W. Yang, and D. Jiang, "A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults," Knowledge-Based Systems, vol. 165, pp. 474–487, 2019.

[4] M. M. M. Islam and J.-M. Kim, "Automated bearing fault diagnosis scheme using 2D representation of wavelet packet transform and deep convolutional neural network," Computers in Industry, vol. 106, pp. 142–153, 2019.

[5] H. Shao, H. Jiang, H. Zhang, W. Duan, T. Liang, and S. Wu, "Rolling bearing fault feature learning using improved convolutional deep belief network with compressed sensing," Mechanical Systems and Signal Processing, vol. 100, pp. 743–765, 2018.

[6] Z. Gao, C. Cecati, and S. X. Ding, "A survey of fault diagnosis and fault-tolerant techniques-part I: fault diagnosis with model-based and signal-based approaches," IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3757–3767, 2015.

[7] R. Liu, Y. Shi, C. Ji, and M. Jia, "A survey of sentiment analysis based on transfer learning," IEEE Access, vol. 7, pp. 85401–85412, 2019.

[8] T. Kim, M. Jeong, S. Kim, S. Choi, and C. Kim, "Diversify and match: a domain adaptive representation learning paradigm for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12456–12465, Long Beach, CA, USA, June 2019.

[9] L. Gautheron, I. Redko, and C. Lartizien, "Feature selection for unsupervised domain adaptation using optimal transport," in Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 759–776, Dublin, Ireland, September 2018.

[10] A.-A. Liu, S. Xiang, W.-Z. Nie, and D. Song, "End-to-end visual domain adaptation network for cross-domain 3D CPS data retrieval," IEEE Access, vol. 7, pp. 118630–118638, 2019.

[11] G. Csurka, "Domain adaptation for visual applications: a comprehensive survey," 2017, https://arxiv.org/abs/1702.05374.

[12] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, "Deep domain confusion: maximizing for domain invariance," 2014, https://arxiv.org/abs/1412.3474.

[13] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," 2015, https://arxiv.org/abs/1502.02791.

[14] M. Long, H. Zhu, J. Wang, and M. I. Jordan, "Deep transfer learning with joint adaptation networks," in Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 2208–2217, JMLR.org, Sydney, Australia, August 2017.

[15] B. Sun and K. Saenko, "Deep CORAL: correlation alignment for deep domain adaptation," in Computer Vision – ECCV 2016 Workshops, Lecture Notes in Computer Science, pp. 443–450, Springer, Berlin, Germany, 2016.

[16] B. Sun, J. Feng, and K. Saenko, "Return of frustratingly easy domain adaptation," in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, February 2016.

[17] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176, Honolulu, HI, USA, July 2017.

[18] I. Goodfellow, J. Pouget-Abadie, M. Mirza et al., "Generative adversarial nets," in Proceedings of the Advances in Neural Information Processing Systems, pp. 2672–2680, Montreal, Canada, December 2014.

[19] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, "Simultaneous deep transfer across domains and tasks," in Proceedings of the IEEE International Conference on Computer Vision, pp. 4068–4076, Santiago, Chile, December 2015.

[20] Y. Ganin, E. Ustinova, H. Ajakan et al., "Domain-adversarial training of neural networks," The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2096–2030, 2016.

[21] M.-Y. Liu and O. Tuzel, "Coupled generative adversarial networks," in Proceedings of the Advances in Neural Information Processing Systems, pp. 469–477, Barcelona, Spain, December 2016.

[22] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, "Unsupervised pixel-level domain adaptation with generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3722–3731, Honolulu, HI, USA, July 2017.

[23] M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li, "Deep reconstruction-classification networks for unsupervised domain adaptation," in Computer Vision – ECCV 2016, pp. 597–613, Springer, Berlin, Germany, 2016.

[24] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, "Domain separation networks," in Proceedings of the Advances in Neural Information Processing Systems, pp. 343–351, Barcelona, Spain, December 2016.

[25] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, February 2017.

[26] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, "Deconvolutional networks," in Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535, San Francisco, CA, USA, June 2010.

[27] W. Zhang, G. Peng, C. Li, Y. Chen, and Z. Zhang, "A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals," Sensors, vol. 17, no. 2, p. 425, 2017.

[28] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, and T. Zhang, "Deep model based domain adaptation for fault diagnosis," IEEE Transactions on Industrial Electronics, vol. 64, no. 3, pp. 2296–2305, 2017.

[29] B. Zhang, W. Li, J. Hao, X.-L. Li, and M. Zhang, "Adversarial adaptive 1-D convolutional neural networks for bearing fault diagnosis under varying working condition," 2018, https://arxiv.org/abs/1805.00778.

[30] X. Jian, W. Li, X. Guo, and R. Wang, "Fault diagnosis of motor bearings based on a one-dimensional fusion neural network," Sensors, vol. 19, no. 1, p. 122, 2019.

[31] Z. Tong, W. Li, B. Zhang, and M. Zhang, "Bearing fault diagnosis based on domain adaptation using transferable features under different working conditions," Shock and Vibration, vol. 2018, Article ID 6714520, 12 pages, 2018.

[32] X. Li, W. Zhang, Q. Ding, and J.-Q. Sun, "Multi-layer domain adaptation method for rolling bearing fault diagnosis," Signal Processing, vol. 157, pp. 180–197, 2019.

[33] T. Han, C. Liu, W. Yang, and D. Jiang, "Deep transfer network with joint distribution adaptation: a new intelligent fault diagnosis framework for industry application," ISA Transactions, vol. 97, pp. 269–281, 2019.

[34] Y. Sun, Y. Chen, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," in Proceedings of the Advances in Neural Information Processing Systems, pp. 1988–1996, Montreal, Canada, December 2014.

[35] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, "A discriminative feature learning approach for deep face recognition," in Computer Vision – ECCV 2016, pp. 499–515, Springer, Berlin, Germany, 2016.

[36] W. Liu, Y. Wen, Z. Yu, and M. Yang, "Large-margin softmax loss for convolutional neural networks," in Proceedings of the 2016 International Conference on Machine Learning (ICML), p. 7, New York City, NY, USA, June 2016.

[37] C. Chen, Z. Chen, B. Jiang, and X. Jin, "Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3296–3303, 2019.

[38] J. Cavazza, A. Zunino, M. San Biagio, and V. Murino, "Kernelized covariance for action recognition," in Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 408–413, Cancun, Mexico, December 2016.

[39] P. Morerio, J. Cavazza, and V. Murino, "Minimal-entropy correlation alignment for unsupervised deep domain adaptation," 2017, https://arxiv.org/abs/1711.10288.

[40] Case Western Reserve University bearing data center website, https://csegroups.case.edu/bearingdatacenter/pages/welcome-case-western-reserve-university-bearing-data-center-website.

[41] L. V. D. Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.


Another increasingly popular line of work is adversarial domain adaptation, which includes adversarial discriminative and generative methods. The former aims to encourage domain confusion through an adversarial objective with respect to a domain discriminator. Tzeng et al. [17] proposed a unified adversarial domain adaptation framework that combines discriminative modeling, untied weight sharing, and a Generative Adversarial Network (GAN) loss [18]. Among the discriminative models, Tzeng et al. proposed a model with confusion loss [19] and also considered an inverted label GAN loss [17], whereas Ganin et al. [20] proposed a model with minimax loss. The latter combines the discriminative model with a generative component on the basis of GANs. Liu and Tuzel [21] developed a Coupled Generative Adversarial Network (CoGAN) that adopts two GANs, each corresponding to one of the domains; the CoGAN learns a joint distribution of multidomain samples and enforces a weight-sharing constraint to limit the network capacity. The method presented in [22] also adopts GANs to generate source domain images that appear as if drawn from the target domain.

Typical reconstruction-based methods can be seen in [23–25]. Data reconstruction can be viewed as an auxiliary task to support the adaptation of the label prediction. Ghifary et al. [23] combined the standard convolutional network for source label prediction with a deconvolutional network [26] for target data reconstruction. Bousmalis et al. [24] introduced the notion of private and shared subspaces for each domain. Meanwhile, a reconstruction loss is integrated in the model by using a shared decoder, which learns to reconstruct the input sample through domain-specific and shared features. Tan et al. [25] presented a selective learning algorithm that uses the reconstruction error to select useful unlabeled data from intermediate domains.

Despite the success achieved by domain adaptation, limited research can be found with respect to its application to fault diagnosis. Zhang et al. [27] took raw vibration signals as inputs of a deep convolutional neural network with a wide first-layer kernel (WDCNN). They also used adaptive batch normalization (AdaBN) as the domain adaptation algorithm to realize fault diagnosis under different load conditions and noisy environments. Lu et al. [28] introduced a deep CNN model with domain adaptation for fault diagnosis; this model integrates MMD as a regularization term into the loss function to reduce the cross-domain distribution difference. Zhang et al. [29] developed an adversarial domain adaptation model for fault diagnosis, which comprises a source feature extractor, a target feature extractor, a domain discriminator, and a label classifier. Jian et al. [30] proposed a fusion CNN model that combines 1DCNN and Dempster–Shafer evidence theory to enhance the cross-domain adaptive capability for fault diagnosis. Tong et al. [31] proposed a bearing fault diagnosis domain adaptation method to find transferable features across domains, which were obtained by reducing the marginal and conditional distribution differences simultaneously on the basis of MMD. Li et al. [32] presented a deep domain adaptation method for bearing fault diagnosis on the basis of the multikernel maximum mean discrepancies between domains in multiple layers, so that representations learned from the source domain can be applied to the target domain. Furthermore, Han et al. [33] proposed a new intelligent fault diagnosis framework, which extends the marginal distribution adaptation to joint distribution adaptation and guarantees an accurate distribution adaptation.

Improving CNN performance by learning additional discriminative features has become a recent trend. For example, contrastive loss [34] and center loss [35] were presented to learn discriminative deep features for face verification and recognition. Furthermore, Liu et al. [36] proposed a large-margin softmax loss that extends the softmax loss and leads to a large angular separability between the learned features. Chen et al. [37] proposed two discriminative feature learning approaches, namely, instance-based and center-based discriminative losses, with joint domain alignment and discriminative feature learning.

Inspired by these methods, we propose a novel deep domain adaptation model for bearing fault diagnosis with Deep CORAL and center-based discriminative feature learning. By combining domain alignment and discriminative feature learning, the domain-invariant features extracted by the model are well clustered and separable, which clearly contributes to domain adaptation and classification.

The main contributions of this work include the following:

(1) An end-to-end method with domain adaptation for fault diagnosis, namely, CACD-1DCNN, is proposed. This method works directly on raw temporal signals and requires neither time-consuming denoising preprocessing nor a separate feature extraction algorithm.

(2) By combining domain alignment and discriminative feature learning, CACD-1DCNN aims to extract domain-invariant features with improved intraclass compactness and interclass separability and guarantees high classification performance in the two domains.

(3) Extensive experiments on the Case Western Reserve University (CWRU) bearing datasets demonstrate that CACD-1DCNN achieves superior diagnosis performance over existing baseline methods.

(4) Furthermore, network visualization and loss analysis provide an intuitive presentation of the adaptation results and verify the effectiveness of our method.

The remaining parts are organized as follows. In Section 2, the domain adaptation problem of fault diagnosis is formulated, and the basic theories of Deep Correlation Alignment and Center-Based Discriminative Loss are introduced. In Section 3, the proposed intelligent fault diagnosis method CACD-1DCNN is presented. The comparison methods, the experiments, and discussion are given in Section 4. Finally, the conclusions are drawn in Section 5.































































i ysi )1113864 1113865|

Ns


i)1113864 1113865|Nt



i and xti are d-di-




2F (1)



CS ASJATS

CT ATJATT

(2)


Jii 1

NS

Jij 1


ine j

(3)



LCD β1113944

ns

i1max 0 f

si minus cyi

2

2minus m11113874 1113875

+ 1113944c


2

21113874 1113875

(4)








Δcj


i1113872 1113873

1 + 1113936bi1 δ yi j( 1113857

(5)

ct+1j c

tj minus λΔct

j (6)

















LCA

LCD

LSSource samples

Target samples





Shar

e

Shar

e

Shar

e

Shar

e

Share Share


kernels

Conv2with small

convkernels

Conv3with small

convkernels

Conv4with small

convkernels


kernels

Conv2with small

convkernels

Conv3with small

convkernels

Conv4with small

convkernels


Overlap Shi

Training samples









(A2CNN) [29]





















WDCNN (AdaBN) 9940 9340 9720 9990 8830 9750 9595


5000

6000

7000

8000

9000

10000

Acc

urac

y (

)


095

096

097

098

099

100

Test

accu

racy

65 7 9321 84 100Trial number

ArarrBArarrCBrarrC

CrarrBCrarrABrarrA



LossesDomain shifts











Precision(f) TP

TP + FP

Recall(f) TP

TP + FN

(8)



00

01

02

03

04

05

Dom

ain

loss

0

1

2

3

4

5

Dom

ain

loss

(le3

)


LS + LCALS + LCD LS

LS + LCA + LCD


00

02

04

06

08

10

Acc

urac

y


TrainingTest









0

4

8

12

16

Dom

ain

loss

(le ndash

3)




0

1

2

3

Intr

a los

s (le

3)

0

5

10

15

25

20

Intr

a los

s (le

4)


LS + LCALS + LCD LS

LS + LCA + LCD


















075

080

085

090

095

100

Acc

urac

y

10ndash1 100 101 102 103 104 10510ndash2

α

TrainingTest


080

085

090

095

100

Acc

urac

y

001 01 1 101E ndash 3β

TrainingTest






5 Conclusion



0255075

100


(a)

ndash40ndash20

020406080

100

ndash25 0 25 50 75 100ndash50

(b)

ndash30

ndash20

ndash10

0

10

20

30


(c)

ndash40

ndash20

0

20


(d)


ndash30

ndash20

ndash10

0

10

20

30

(e)


010203040


(f )


010203040


(g)



Data Availability






Acknowledgments


References















































Δcj


i1113872 1113873

1 + 1113936bi1 δ yi j( 1113857

(5)

ct+1j c

tj minus λΔct

j (6)

















LCA

LCD

LSSource samples

Target samples





Shar

e

Shar

e

Shar

e

Shar

e

Share Share


kernels

Conv2with small

convkernels

Conv3with small

convkernels

Conv4with small

convkernels


kernels

Conv2with small

convkernels

Conv3with small

convkernels

Conv4with small

convkernels


Overlap Shi

Training samples









(A2CNN) [29]





















WDCNN (AdaBN) 9940 9340 9720 9990 8830 9750 9595


5000

6000

7000

8000

9000

10000

Acc

urac

y (

)


095

096

097

098

099

100

Test

accu

racy

65 7 9321 84 100Trial number

ArarrBArarrCBrarrC

CrarrBCrarrABrarrA



LossesDomain shifts











Precision(f) TP

TP + FP

Recall(f) TP

TP + FN

(8)



00

01

02

03

04

05

Dom

ain

loss

0

1

2

3

4

5

Dom

ain

loss

(le3

)


LS + LCALS + LCD LS

LS + LCA + LCD


00

02

04

06

08

10

Acc

urac

y


TrainingTest









0

4

8

12

16

Dom

ain

loss

(le ndash

3)




0

1

2

3

Intr

a los

s (le

3)

0

5

10

15

25

20

Intr

a los

s (le

4)


LS + LCALS + LCD LS

LS + LCA + LCD


















075

080

085

090

095

100

Acc

urac

y

10ndash1 100 101 102 103 104 10510ndash2

α

TrainingTest


080

085

090

095

100

Acc

urac

y

001 01 1 101E ndash 3β

TrainingTest






5 Conclusion



0255075

100


(a)

ndash40ndash20

020406080

100

ndash25 0 25 50 75 100ndash50

(b)

ndash30

ndash20

ndash10

0

10

20

30


(c)

ndash40

ndash20

0

20


(d)


ndash30

ndash20

ndash10

0

10

20

30

(e)


010203040


(f )


010203040


(g)



Data Availability






Acknowledgments


References















































LCA

LCD

LSSource samples

Target samples





Shar

e

Shar

e

Shar

e

Shar

e

Share Share


kernels

Conv2with small

convkernels

Conv3with small

convkernels

Conv4with small

convkernels


kernels

Conv2with small

convkernels

Conv3with small

convkernels

Conv4with small

convkernels


Overlap Shi

Training samples









(A2CNN) [29]





















WDCNN (AdaBN) 9940 9340 9720 9990 8830 9750 9595


5000

6000

7000

8000

9000

10000

Acc

urac

y (

)


095

096

097

098

099

100

Test

accu

racy

65 7 9321 84 100Trial number

ArarrBArarrCBrarrC

CrarrBCrarrABrarrA



LossesDomain shifts











Precision(f) TP

TP + FP

Recall(f) TP

TP + FN

(8)



00

01

02

03

04

05

Dom

ain

loss

0

1

2

3

4

5

Dom

ain

loss

(le3

)


LS + LCALS + LCD LS

LS + LCA + LCD







