
Data-Driven Robot Fault Detection and Diagnosis Using Generative Models: A Modified SFDD Algorithm

Alex Mitrevski and Paul G. Plöger
Hochschule Bonn-Rhein-Sieg, Sankt Augustin, Germany
e-mail: {aleksandar.mitrevski, paul.ploeger}@h-brs.de

Abstract

This paper presents a modification of the data-driven sensor-based fault detection and diagnosis (SFDD) algorithm for online robot monitoring. Our version of the algorithm uses a collection of generative models, in particular restricted Boltzmann machines, each of which represents the distribution of sliding window correlations between a pair of correlated measurements. We use such models in a residual generation scheme, where high residuals generate conflict sets that are then used in a subsequent diagnosis step. As a proof of concept, the framework is evaluated on a mobile logistics robot for the problem of recognising disconnected wheels. The evaluation demonstrates the feasibility of the framework (on the faulty data set, the models obtained 88.6% precision and 75.6% recall), but also shows that the monitoring results are influenced by the choice of distribution model and the model parameters as a whole.

1 Introduction

To increase the autonomy and adaptivity of robots, learning-based fault detection and diagnosis (FDD) methods represent a viable alternative to classical model-based algorithms since they minimise the need for accurate, manually specified behavioural models, which are often impractical to obtain. In principle, the reliable application of learning to FDD does require injecting some knowledge about the system into the learning process in order to increase the learning efficiency and improve the model’s contextual awareness.

In [1], Khalastchi and Kalech describe two versions of a data-driven FDD algorithm, called sensor-based fault detection and diagnosis (SFDD), that uses information about the structural model of a system as well as the correlations between sensor measurements in order to detect and subsequently diagnose robot faults. The general idea behind this method is to find the pairs of correlated sensors in a system and then, by monitoring manually specified data modes¹ (such as stuck-at or drifting values), look for violations of those correlations, which are taken to be indications of a fault. Faults detected in this manner can then be diagnosed in a subsequent step.

¹ In [1], these are referred to as patterns.

Figure 1: Overview of our learning-based FDD schema

The practical usefulness of the SFDD algorithm is significantly affected by the choice of data modes that are to be monitored during the operation of a robot, since an incomplete or suboptimal choice of modes leads to either undetected correlation violations or a large number of false positive detections.² To address this issue, we present a modification of the mode monitoring method in [1] in which we replace the manually specified modes with models of pairwise sliding window correlations; namely, we learn a probability distribution of sliding window correlations between the measurements of the correlated sensor pairs. Each such distribution is represented by a generative model, which is used in a residual generation scheme during online operation; these residuals are then used for conflict set generation and diagnosis. Our methodology is summarised in Fig. 1.

We use Restricted Boltzmann Machines (RBMs) for distribution modelling due to their flexibility and extensibility [2; 3]. For each model, a residual is generated by sampling from the distribution given a window of measurement correlations and comparing the sample with the observation. A residual higher than a model-specific threshold indicates a fault in one of the components and is used to create a conflict set. For finding diagnoses given the conflict sets, we use the HS-DAG algorithm [4] as implemented in [5]. The feasibility of our framework is analysed for a mobile robot that is used in a logistics application [6]. We additionally compare RBMs to Gaussian Mixture Models (GMMs) on the same data to illustrate the conceptual independence of the choice of generative model. As shown in section 5, our representation is not constrained to RBMs, as other generative models can be used as well, although the choice of model does have an influence on the quality of the generated residuals.

² Our implementation of the SFDD method, along with real-robot data from a KUKA youBot that demonstrate the problem, can be found at https://github.com/alex-mitrevski/SFDD

We organise this paper as follows. Section 2 discusses related work in learning-based FDD; the SFDD algorithm is described in more detail in section 3, along with some preliminaries on restricted Boltzmann machines; section 4 presents our modification to the SFDD method, while section 5 shows results of preliminary experiments performed with a mobile robot platform; finally, section 6 concludes the paper.

2 Related Work

Generative models, in particular restricted Boltzmann machines, have been used for anomaly detection in various contexts. Wulsin et al. [7] use a deep belief network (which is made up of restricted Boltzmann machines) for finding anomalies in electroencephalography (EEG) waveforms; deep belief networks are also shown to be more reliable than one-class support vector machines in this context. Chopra and Yadav [8] apply restricted Boltzmann machines for feature extraction from acoustic signals; these features are then used for detecting faults in a combustion engine. A similar application of restricted Boltzmann machines to the problem of bearing fault detection is discussed in [9] and [3].

More recent generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), have also been applied to anomaly detection. In [10], Di Mattia et al. compare various GAN models for anomaly detection, mostly on the task of detecting visual anomalies. Zhang and Chen [11], on the other hand, present a model that combines a VAE and an LSTM (long short-term memory) network, which is used for detecting anomalies in electrocardiogram (ECG) data. Since our aim is to learn pairwise correlation models, RBMs are advantageous because they are easier to train than VAEs and GANs and, as shown in section 5, can work reliably even when a smaller model is used.

Learning-based fault detection and diagnosis have also been successfully applied in robotics for addressing different aspects of the problem. Christensen et al. [12] use a time-delay neural network for fault detection, where fault injection is used for training data collection; similar to this work, we learn data models from sliding windows, but we learn correlations instead of direct observations. In [13], Golombek et al. learn the distribution of time intervals between occurrences of event pairs; these distributions are then used for assigning scores to sequences of observations, which allows detecting anomalies in the observations. In a similar manner, Li and Parker [14] learn a state transition diagram on features extracted by clustering sensor observations, such that unlikely transitions or unusually long periods of time in a single state are used as indications of anomalies. Fox et al. [15] use learned hidden Markov models for execution failure detection, where the states of the model are obtained by clustering observations in particular executions of a given task. We consider these models to be complementary to ours since they analyse executions rather than the health of components as such, which is what we focus on in this paper.

3 Preliminaries

3.1 Sensor-Based Fault Detection and Diagnosis (SFDD)

The SFDD method presented in [1] comes in two variants, the so-called basic and extended versions. In this paper, we focus on the extended approach, where we are given a structural model of a system, a set of sensors $S$, and a set $P$ of data modes that measurements might follow, such as drift or zero slope. The algorithm operates in two phases: an offline and an online phase. In the offline phase, we are given a matrix $X$ of size $m \times n$, where $m$ is the number of sensors and $n$ is the length of a sequence of consecutive sensor measurements collected during fault-free system operation. The objective is to find a set $C$ of correlated sensors

$$C = \{(S_i, S_j) \mid 1 \le i, j \le m,\ i \ne j,\ \mathrm{corr}(S_i, S_j) = 1\} \tag{1}$$

where

$$\mathrm{corr}(S_i, S_j) = \begin{cases} 1 & \text{if } S_i \text{ and } S_j \text{ are correlated} \\ 0 & \text{otherwise} \end{cases}$$

along with pairs of modes that are observed together in windows of $X$; the mode identification step thus takes into consideration scenarios in which the correlation between sensors is temporarily lost and the sensors follow different modes. In the online phase, the modes exhibited by each sensor $S_i$ are continuously monitored and compared against those of the correlated sensors; if a mode exhibited by $S_i$ is not observed for any of the correlated sensors that belong to independent components according to the structural graph, $S_i$ is considered faulty.

3.2 Restricted Boltzmann Machines

As mentioned before, we use Restricted Boltzmann Machines (RBMs) as generative data models for anomaly detection. While a detailed treatment of RBMs is not in the scope of this paper, we present a brief summary of RBMs and their working mechanism here, drawing much of the content and notation from [16; 2; 17].

An RBM is an undirected graphical model given in the form of a fully-connected bipartite graph. One of the graph’s layers, called the visible layer, has a number of units equal to the dimensionality of the model’s input data, which is denoted $|V|$, and the other layer is called the hidden layer and has dimensionality that is denoted $|H|$. Being an undirected model, an RBM encodes a Gibbs distribution of the form [16]

$$P(v, h) = \frac{1}{Z} e^{-E(v, h)}$$

where E(v, h) is an energy function that is given as

$$E(v, h) = -\sum_{i=1}^{|H|} \sum_{j=1}^{|V|} w_{ij} h_i v_j - \sum_{j=1}^{|V|} a_j v_j - \sum_{i=1}^{|H|} b_i h_i$$

and $Z$ is called a partition function and acts as a normalising constant, which is calculated as

$$Z = \sum_{v} \sum_{h} e^{-E(v, h)}$$

where the sums run over all joint configurations of the visible and hidden units.

In the above equations, $w_{ij}$ is a connection weight between the $i$-th hidden unit and the $j$-th visible unit, $a_j$ is a bias term of the $j$-th visible unit, and $b_i$ is a bias term of the $i$-th hidden unit.
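To make the role of the normalising constant concrete, the following minimal sketch evaluates the energy and the Gibbs probability of a toy binary RBM by enumerating every joint configuration of visible and hidden units. The layer sizes, the random weights, and the function names `energy` and `partition_function` are illustrative assumptions and are not taken from the paper; the enumeration is only tractable for very small models.

```python
import itertools

import numpy as np


def energy(v, h, W, a, b):
    """E(v, h) = -h^T W v - a^T v - b^T h, with W of shape (|H|, |V|)."""
    return -(h @ W @ v) - a @ v - b @ h


def partition_function(W, a, b):
    """Sum exp(-E(v, h)) over all binary configurations (tiny models only)."""
    n_hidden, n_visible = W.shape
    total = 0.0
    for v in itertools.product([0, 1], repeat=n_visible):
        for h in itertools.product([0, 1], repeat=n_hidden):
            total += np.exp(-energy(np.array(v), np.array(h), W, a, b))
    return total


# Toy model (hypothetical sizes): |H| = 2 hidden units, |V| = 3 visible units
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 3))
a = np.zeros(3)
b = np.zeros(2)

Z = partition_function(W, a, b)
v, h = np.array([1, 0, 1]), np.array([0, 1])
p = np.exp(-energy(v, h, W, a, b)) / Z  # P(v, h) for one configuration
```

Since the number of configurations grows as $2^{|V| + |H|}$, exact evaluation of $Z$ is impractical for realistic layer sizes, which is why sampling-based training is used in practice.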

Assuming binary units in the network, the conditional probability of a unit being equal to one can be written as

$$p(h_i = 1 \mid v) = \sigma\left(\sum_{j=1}^{|V|} w_{ij} v_j + b_i\right)$$

$$p(v_j = 1 \mid h) = \sigma\left(\sum_{i=1}^{|H|} w_{ij} h_i + a_j\right)$$

where σ is the logistic function

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Training an RBM consists of learning the distribution represented by the training samples, which entails maximising the likelihood, or the log-likelihood in practice, of the model parameters given the training data. It can be found that in order to maximise the (log-)likelihood, summations over the visible variables are required; however, these will generally be difficult or impossible to compute. As a result, these terms are approximated by sampling from the distribution encoded by the RBM and running a Markov chain for a few iterations. Such an approximation can be performed by an algorithm called Contrastive Divergence (CD-$k$), where $k$ is the number of iterations of the sampling algorithm. Once a sample has been obtained, the updates of the weights and biases can be performed using the following update terms:

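$$\Delta w_{ij} = p(h_i = 1 \mid v^0)\, v^0_j - p(h_i = 1 \mid v)\, v_j = \langle v_j h_i \rangle_{\text{data}} - \langle v_j h_i \rangle_{\text{model}} \tag{2}$$

$$\Delta a_j = v^0_j - v_j = \langle v_j \rangle_{\text{data}} - \langle v_j \rangle_{\text{model}} \tag{3}$$

$$\Delta b_i = p(h_i = 1 \mid v^0) - p(h_i = 1 \mid v) = \langle h_i \rangle_{\text{data}} - \langle h_i \rangle_{\text{model}} \tag{4}$$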

Here, $v$ is a sample obtained using CD (that is, after $k$ sampling iterations), $v^0$ is the current training sample, and $\langle \cdot \rangle$ denotes an expectation: $\langle \cdot \rangle_{\text{data}}$ is the expectation obtained by assigning training data to the visible layer, while $\langle \cdot \rangle_{\text{model}}$ is the expectation resulting from the model samples.

As discussed in [17], CD-1 can provide sufficiently good estimates of the likelihood in practice. For that reason, we use this version of the algorithm rather than a more expensive one that would approximate the likelihood more closely.
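The CD-1 update can be sketched as follows for a single training sample. This is a minimal illustration under stated assumptions, not the authors' implementation: the learning rate `lr`, the function name `cd1_update`, and the use of sampled binary hidden states in the reconstruction step are choices made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_update(v0, W, a, b, rng, lr=0.1):
    """One CD-1 step for a binary RBM; W has shape (|H|, |V|).

    lr is a hypothetical learning rate (not specified in the text).
    """
    ph0 = sigmoid(W @ v0 + b)                        # p(h_i = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hidden states
    pv = sigmoid(W.T @ h0 + a)                       # p(v_j = 1 | h0)
    v = (rng.random(pv.shape) < pv).astype(float)    # reconstructed sample
    ph = sigmoid(W @ v + b)                          # p(h_i = 1 | v)

    dW = np.outer(ph0, v0) - np.outer(ph, v)         # weight update term
    da = v0 - v                                      # visible bias update term
    db = ph0 - ph                                    # hidden bias update term
    return W + lr * dW, a + lr * da, b + lr * db


# Usage with 10 visible and 20 hidden units, inputs scaled to [0, 1]
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(20, 10))
a, b = np.zeros(10), np.zeros(20)
v0 = rng.random(10)
W, a, b = cd1_update(v0, W, a, b, rng)
```

In practice the update terms would typically be averaged over mini-batches and repeated for several epochs; those details are left out of the sketch.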

4 Learning Generative Dependency Models

In this section, we first introduce our general formalisation for anomaly detection using generative models. We then describe our modified SFDD algorithm and finally discuss the embedding of the framework on the robotic black box introduced in [6].

4.1 Anomaly Detection Using Generative Models: General Formulation

We assume that we are given a sequence $y$ of $n$ measurements from a given system variable, namely

$$y = \{y_1, y_2, \ldots, y_n\} \tag{5}$$

We can then define anomaly detection as the problem of finding a set of intervals $F$

$$F = \{[t_i, t_j] \mid 1 \le i, j \le n,\ i < j\} \tag{6}$$

during which the measurements deviate compared to a sequence of nominal observed measurements. Formally, we assume that $y$ follows an unknown density $f$ with some additive noise $\varepsilon$

$$y \sim f(\cdot) + \varepsilon \tag{7}$$

where no assumptions have been made on the nature of $f$. We then consider a sample $y_{t-k:t}$ of $k$ measurements, where $k$ is a predefined window size, to be nominal if

$$y_{t-k:t} \sim f(\cdot) + \varepsilon$$

Under this formalism, the objective is to learn a model $M$ that represents the unknown data distribution $f$ describing the nominal measurements; $M$ can then be used for verifying whether measurements are likely to follow the distribution. In particular, given $M$ and $y_{t-k:t}$, we use a residual generation and comparison paradigm. We define a residual $r$ to be the dissimilarity between $y_{t-k:t}$ and the measurements predicted by $M$, such that the larger the dissimilarity becomes, the more likely it is that $y_{t-k:t}$ is anomalous. Given a dissimilarity measure $d$ and a sample $m$ drawn from $M$, we calculate a residual as

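$$r = d(y_{t-k:t}, m) \tag{8}$$

Using a predefined threshold $\delta$, we can classify $y_{t-k:t}$ as

$$\begin{cases} \text{nominal} & \text{if } r \le \delta \\ \text{faulty} & \text{otherwise} \end{cases} \tag{9}$$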

We use a restricted Boltzmann machine (RBM) to represent $M$ since it fits our main requirements for the model, namely (i) it allows learning $f$ in an unsupervised fashion and, (ii) due to it being a generative model, can be used for sampling from the distribution. Similar to [12], temporal information is encoded into the models by using a measurement window as an input to the model. As a distance function, we use the Hellinger distance³ [18], which measures the discrepancy between two probability distributions and is defined as

$$d^2(a, b) = \frac{1}{2} \sum_{i=1}^{n} \left(\sqrt{a_i} - \sqrt{b_i}\right)^2 \tag{10}$$

for discrete distributions, where $a$ and $b$ are the measurements that are being compared. It should be noted that if raw sensor measurements are used to represent $a$ and $b$, then $a$ and $b$ are not valid probability distributions, so, strictly speaking, we abuse the Hellinger distance here.
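The residual generation and thresholding described above can be sketched as follows. This is an illustrative example rather than the authors' code: the function names are hypothetical, the inputs are assumed to be non-negative (for instance, values normalised to lie between 0 and 1), and taking the square root of $d^2$ to obtain $d$ is an assumption, since the text does not state whether $d$ or $d^2$ is used as the residual.

```python
import numpy as np


def hellinger(a, b):
    """Hellinger distance d(a, b) between two non-negative vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if np.any(a < 0) or np.any(b < 0):
        raise ValueError("inputs must be non-negative")
    return float(np.sqrt(0.5 * np.sum((np.sqrt(a) - np.sqrt(b)) ** 2)))


def classify_window(observed, sample, delta):
    """Label a window as nominal or faulty from its residual and a threshold."""
    r = hellinger(observed, sample)
    label = "nominal" if r <= delta else "faulty"
    return label, r


# Usage: an observed window compared against a model sample (toy values)
label, r = classify_window([0.9, 0.8, 0.9], [0.1, 0.2, 0.1], delta=0.3)
```

For valid probability distributions the distance is bounded between 0 and 1; with raw or merely normalised measurements that bound no longer holds, which is the abuse mentioned above.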

4.2 Dependency Anomaly Detection and Fault Diagnosis

Learning the measurement distribution of individual variables as described above would be useful for following the variables’ individual trends; however, in line with the SFDD method, we are interested in learning dependency models between variable pairs and monitoring anomalies in those dependencies. As illustrated in Fig. 1, we use the following procedure for modelling, monitoring, and subsequently diagnosing faults:

³ We use the Hellinger distance instead of e.g. the Kullback-Leibler (KL) divergence since the Hellinger distance is symmetric. Conventional distance metrics, such as the L2 norm, were experimentally found to be inappropriate for residual generation.

1. We identify the set of correlated pairs of sensors given a set of nominal measurements.

2. For each pair of correlated sensors, a dependency model that encodes the nominal distribution of sliding window correlations is learned.

3. During online operation, we monitor the dependency distribution by comparing samples from the model with the observed correlations.

4. For any anomalous dependencies, conflict sets are generated and used in a subsequent diagnosis step.

This section describes each of these steps in more detail.

Identification of Correlated Sensor Pairs

Assuming we are given a structural model of a system, a set $S$ of sensors, and a data set $X$ of nominal measurements, we first identify the set $C$ of correlated sensors as defined in equation 1. Let $\rho_{S_i,S_j}$ be the correlation between $S_i$ and $S_j$ based on the measurements in $X$. Just as in [1], we use the Pearson correlation coefficient for identifying correlated sensors; however, the coefficient itself is undefined for constant signals since their variance is zero, which is why we use the following modified definition:

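$$\rho(x, y) = \begin{cases} \dfrac{\mathrm{cov}(x, y)}{\sigma_i \sigma_j} & \sigma_i, \sigma_j > 0 \\ 1 & \sigma_i, \sigma_j = 0 \\ 0 & \sigma_i = 0 \text{ xor } \sigma_j = 0 \end{cases} \tag{11}$$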

where $x$ and $y$ are sequences of measurements, cov is the covariance between $x$ and $y$, and $\sigma$ is the standard deviation. We then have

$$\mathrm{corr}(S_i, S_j) = \begin{cases} 1 & \rho_{S_i,S_j} = \rho(x_i, x_j) > \kappa \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

for a predefined threshold value $\kappa$, where $x_i$ and $x_j$ are the measurement sequences of $S_i$ and $S_j$ in $X$ respectively.
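The modified correlation coefficient and the thresholding step can be sketched as below. The function names, the tolerance `eps` used to decide whether a signal is constant, and the choice to return each unordered pair once (with i < j) are assumptions made for this illustration.

```python
import itertools

import numpy as np


def modified_pearson(x, y, eps=1e-12):
    """Pearson correlation with explicit handling of constant signals.

    eps is a hypothetical numerical tolerance for "zero" standard deviation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx, sy = x.std(), y.std()
    x_const, y_const = sx <= eps, sy <= eps
    if x_const and y_const:
        return 1.0  # both signals constant: treated as fully correlated
    if x_const or y_const:
        return 0.0  # exactly one signal constant: treated as uncorrelated
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return float(cov / (sx * sy))


def correlated_pairs(X, kappa=0.5):
    """Return index pairs (i, j), i < j, whose correlation exceeds kappa.

    X holds one row of nominal measurements per sensor.
    """
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(X)), 2)
        if modified_pearson(X[i], X[j]) > kappa
    ]


# Usage with four toy sensors (rows) and four measurements each
X = np.array([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [5, 5, 5, 5]])
pairs = correlated_pairs(X, kappa=0.5)
```

Note that, as written in equation 12, only correlations above $\kappa$ are retained, so strongly negatively correlated sensors (such as rows 0 and 2 in the toy data) are not included.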

Learning Dependency Models

Given $C$, we learn a generative model $M_{i,j}$ for each pair of correlated sensors $S_i$ and $S_j$. The model learning process is performed in an offline fashion, such that each $M_{i,j}$ encodes the distribution of the nominal dependency state between $S_i$ and $S_j$.

We encode the dependency between two sensors by the correlation between sliding windows extracted from $x_i$ and $x_j$. Let $k$ be the sliding window size; we then split $x_i$ and $x_j$ into overlapping sliding windows⁴ of size $k$ and calculate the correlation between the windows using the modified correlation coefficient defined in equation 11. Calculating the correlation for all windows of size $k$ results in a sequence $c_{i,j}$ of windowed correlations between the measurements of $S_i$ and $S_j$.

For training $M_{i,j}$, we use a sliding window⁵ of size $s$ with values from $c_{i,j}$; this means that $M_{i,j}$ encodes the distribution of correlations between the measurements of $S_i$ and $S_j$.
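The two-level windowing can be illustrated with the following sketch, which first computes the windowed correlations and then stacks consecutive correlations into model inputs. The function names are hypothetical, a stride of one sample is assumed for both windows, and the helper `modified_pearson` re-implements equation 11 with an assumed numerical tolerance.

```python
import numpy as np


def modified_pearson(x, y, eps=1e-12):
    """Pearson correlation with the constant-signal cases of equation 11."""
    sx, sy = np.std(x), np.std(y)
    if sx <= eps and sy <= eps:
        return 1.0
    if sx <= eps or sy <= eps:
        return 0.0
    return float(np.mean((x - np.mean(x)) * (y - np.mean(y))) / (sx * sy))


def windowed_correlations(xi, xj, k):
    """Correlation of each pair of overlapping windows of size k."""
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    n = len(xi)
    return np.array(
        [modified_pearson(xi[t:t + k], xj[t:t + k]) for t in range(n - k + 1)]
    )


def training_windows(c, s):
    """Stack overlapping windows of s consecutive correlations as model inputs."""
    return np.array([c[t:t + s] for t in range(len(c) - s + 1)])


# Usage with toy sizes: n = 20 measurements, k = 5, s = 10
xi = np.arange(20.0)
xj = 2.0 * xi + 1.0
c = windowed_correlations(xi, xj, k=5)  # 20 - 5 + 1 = 16 correlations
inputs = training_windows(c, s=10)      # 7 input vectors of length 10
```

Each row of `inputs` would then serve as one training vector for the corresponding model, so the window size $s$ determines the input dimensionality.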

Anomaly Detection Using the Dependency Models

After learning $M_{i,j}$, we calculate a threshold $\delta_{i,j}$ as

$$\delta_{i,j} = \mu_{i,j} + w \sigma_{i,j} \tag{13}$$

where $\mu_{i,j}$ is the mean residual calculated on the training measurements, $\sigma_{i,j}$ is the standard deviation of the training residuals, and $w \in \mathbb{N}$ is a multiple of the standard deviation. During online operation, we generate a sample $m_{i,j}$ given the current input and calculate a residual $r$ as in equation 8. The decision about the nominality of the observation then proceeds as in equation 9.

⁴ There are $n - k + 1$ such windows in total.

⁵ In other words, we calculate the correlation between pairs of sensors using windows of size $k$, and then use a window of size $s$ of consecutive correlations as an input to each $M_{i,j}$.
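A short sketch of the threshold computation and the online decision is given below. The function names are hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not specify which estimator is used.

```python
import numpy as np


def residual_threshold(training_residuals, w=3):
    """Model-specific threshold: mean plus w standard deviations."""
    r = np.asarray(training_residuals, dtype=float)
    return float(r.mean() + w * r.std())


def is_faulty(residual, delta):
    """An observation is flagged when its residual exceeds the threshold."""
    return residual > delta


# Usage: residuals obtained on nominal training data (toy values)
delta = residual_threshold([0.0, 2.0], w=3)  # mean 1, std 1, so delta = 4
flag = is_faulty(5.0, delta)
```

A larger `w` makes the detector more conservative, trading fewer false alarms for a higher chance of missed or delayed detections.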

Fault Diagnosis

For fault diagnosis, just as in [1], we use the traditional formalisation of de Kleer and Williams [19] and Reiter [20]. We create a conflict set for each pair of components $S_i$ and $S_j$ for which $r_{i,j}$ exceeds $\delta_{i,j}$; this gives rise to a collection of conflict sets $CS$. Given $CS$, we apply the HS-DAG algorithm [4] for finding diagnoses using the implementation of the algorithm by Quaritsch and Pill [5].

4.3 Robotic Black Box Application

The learning-based framework described here is designed so that it can be used on a robotic black box as described in [6]. The black box continuously logs data during the operation of a robot, where data from different data sources may be logged at different frequencies; however, as described above, the models $M_{i,j}$ require aligned measurements in order for the data correlations to be of any meaningful value. To resolve this issue, the black box needs to log a measurement only if its value changes significantly⁶ compared to its previous value; the measurement is otherwise considered constant. If this condition on the logged data is satisfied, correlations can be calculated even when correlated measurements are observed at different frequencies.
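One possible way to align such change-based logs is to hold the last logged value of each variable until a new one arrives, and to read all variables at a common set of time stamps. The sketch below illustrates this reading of the text; it is an assumption about how the alignment could be done, not a description of the black box implementation, and the function name `hold_last_value` is hypothetical.

```python
import numpy as np


def hold_last_value(log_times, log_values, query_times):
    """Return, for each query time, the most recently logged value.

    log_times must be sorted in ascending order.
    """
    log_times = np.asarray(log_times, dtype=float)
    log_values = np.asarray(log_values, dtype=float)
    # Index of the last log entry at or before each query time
    idx = np.searchsorted(log_times, query_times, side="right") - 1
    if np.any(idx < 0):
        raise ValueError("a query time precedes the first logged value")
    return log_values[idx]


# Usage: a variable logged only when it changed, read on a regular grid
aligned = hold_last_value(
    log_times=[0, 2, 5], log_values=[1, 3, 7], query_times=[0, 1, 2, 3, 4, 5, 6]
)
```

Two variables aligned on the same grid in this way have equal length, so their sliding window correlations can be computed even if they were originally logged at different rates.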

Another important aspect that needs to be considered is the operating mode in which the correlation models are used. Each $M_{i,j}$ is trained with data collected during nominal operation, but depending on the context in which the data are collected, the model might only be usable in certain operating modes (for instance, a robot moving over a smooth floor and moving over an uneven surface). As in [15], we assume that dedicated models can be created for different operating modes and that the appropriate models will be used depending on the context in which the robot is operating [21].

5 Experimental Analysis

To show the feasibility of our proposed modification to the SFDD method, we analyse our framework on the ROPOD⁷ platform, which is shown in Fig. 2. This robot was developed for logistics applications, such that the base is equipped with four so-called smart wheels, which are omnidirectional wheels made up of two standard wheels with a caster offset. The wheels themselves provide different sensor measurements⁸, particularly current and voltage, wheel velocities, IMU measurements, as well as atmospheric pressure and temperature. The platform itself also uses a 3D lidar for distance measurements; the lidar is also used when the robot needs to attach itself to carts.

⁶ Where the significance level can be set differently for different variables.

⁷ ROPOD is a Horizon 2020 project: http://cordis.europa.eu/project/rcn/206247_en.html

⁸ The wheels receive commands and send feedback through an EtherCAT communication channel.

Figure 2: ROPOD platform with four smart wheels

To simplify the presentation, we constrain our analysis to the current measurements from the smart wheels.⁹ For training correlation models, we collected data by moving the robot around our university building (as in Fig. 2) with a joypad. The current measurements from this run are shown in Fig. 3a. For later testing, we collected another data set in which the communication lines of two wheels were cut off one after the other and the wheels were then reconnected, such that the robot was moved with a joypad both before disconnecting the wheels and while they were disconnected. The current measurements in the faulty data set are shown in Fig. 3b. Since the current measurements are generally noisy, we used a median filter for visualisation, but also as a preprocessing step before learning the models.
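The median filtering step can be sketched as follows. The window size and the edge-padding strategy are assumptions made for this illustration, since neither is stated in the text, and `median_filter` is a hypothetical helper rather than the function used by the authors.

```python
import numpy as np


def median_filter(x, size=5):
    """Sliding median with an odd window size; edges are padded by repetition."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    x = np.asarray(x, dtype=float)
    half = size // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[t:t + size]) for t in range(len(x))])


# Usage: a single spike in an otherwise constant signal is suppressed
smoothed = median_filter([1, 1, 9, 1, 1], size=3)
```

Unlike a moving average, the median does not smear isolated spikes into neighbouring samples, which makes it a common choice for impulsive noise.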

Given the data set from the nominal operation, we first identified the set of correlated measurements using $\kappa = 0.5$ as a correlation threshold. In this case, all four current measurements are correlated with each other; the pairwise correlations of the measurements using a sliding window size $k = 100$ are shown in Fig. 4a. Similarly, the correlations on the faulty data set are visualised in Fig. 4b. The models $M_{i,j}$, $1 \le i < 4$, $i < j \le 4$, were trained with the correlations on the fault-free data.¹⁰ For each $M_{i,j}$, we used $|V| = 10$ and $|H| = 20$ and CD-1 for updating the weights, and the models were trained for 30 learning epochs.¹¹ We also tried other values of $|V|$ and $|H|$, but since our analysis is not focused on finding optimal training parameters, all results below are shown for the previously mentioned values.¹²

Once the correlation models were learned, we calculated the thresholds $\delta_{i,j}$ for each model according to equation 13 using $w = 3$ as a safe choice. As a sanity check, we show the residuals on the nominal data set in Fig. 5a. As can be seen there, the models do occasionally raise false alarms since the residuals cross the detection thresholds, but these are transient detections that could be avoided by smoothing the detections, for instance using a hidden Markov model [15]. The more interesting residuals are the ones on the faulty data set, which are depicted in Fig. 5b. As can be seen there, the models were able to reliably detect the injected faults; the residuals then went down within the normal range when the faults were removed. One observation that can be made here is that the detections are slightly delayed, which is however to be expected since the correlations are calculated over a sliding window whose size affects the observed delay (as mentioned before, we used a conservative value of $k = 100$ in our experiments).

⁹ The code and data for reproducing the results presented in this section can be found at https://github.com/alex-mitrevski/generative-model-fdd

¹⁰ Since we have binary network units, we normalise the network inputs to lie between 0 and 1, using the range of current measurements observed in the nominal data as a normalisation factor.

¹¹ We note that $|V| = s$, the size of the sliding window of values from $c_{i,j}$.

¹² In addition, other values for the sizes of the visible and hidden layer did not significantly affect the results, at least not in this simple use case.

To evaluate the detection results quantitatively, we use the common precision and recall metrics:

$$\text{precision} = \frac{TP}{TP + FP}$$

$$\text{recall} = \frac{TP}{TP + FN}$$

where $TP$ is the number of true positive detections over all time steps, $FP$ is the number of false alarms, and $FN$ is the number of missed detections; these were calculated jointly over all pairwise models. The results shown in Fig. 5 evaluate to 88.6% precision and 75.6% recall. The false alarms affecting the precision rate are particularly visible for the correlation model between the currents of wheels 1 and 2; on the other hand, the delayed detections caused by the sliding window size result in a relatively low recall rate, but this is expected given the conservative value of $k$.
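As an illustration of how these metrics can be computed from per-time-step decisions, consider the sketch below. The function name and the toy label sequences are assumptions; how the ground-truth fault intervals were labelled in the experiments is not described here.

```python
def precision_recall(predicted, actual):
    """Precision and recall from per-time-step binary fault decisions."""
    pairs = [(bool(p), bool(a)) for p, a in zip(predicted, actual)]
    tp = sum(p and a for p, a in pairs)      # fault raised and present
    fp = sum(p and not a for p, a in pairs)  # false alarm
    fn = sum(a and not p for p, a in pairs)  # missed detection
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Usage: 1 marks a time step flagged (predicted) or truly faulty (actual)
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

To evaluate jointly over all pairwise models, the per-model decision sequences could simply be concatenated before calling the function.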

As mentioned in the previous section, the residuals calculated from the models produce conflict sets consisting of pairs of components. In the case of both wheel 3 and wheel 4 being disconnected, the collection of conflict sets is given as follows:

$$CS = \{\{c_1, c_3\}, \{c_1, c_4\}, \{c_2, c_3\}, \{c_2, c_4\}, \{c_3, c_4\}\}$$

If we constrain ourselves to diagnoses of maximum cardinality 2, applying HS-DAG on the above collection of conflict sets results in the only diagnosis $\{c_3, c_4\}$, which is clearly also the correct diagnosis in this case. If we allow cardinality 3 for the diagnoses, we also obtain $\{c_1, c_2, c_3\}$ and $\{c_1, c_2, c_4\}$ as potential diagnoses.
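The diagnoses above can be reproduced with a brute-force search for minimal hitting sets, as sketched below. This is deliberately not the HS-DAG algorithm used in the paper; it is a simple enumeration that yields the same minimal hitting sets for small collections, and the function name is hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflict_sets, max_cardinality):
    """Enumerate minimal hitting sets up to a given cardinality (brute force)."""
    components = sorted(set().union(*conflict_sets))
    found = []
    for size in range(1, max_cardinality + 1):
        for candidate in combinations(components, size):
            cand = set(candidate)
            if any(prev <= cand for prev in found):
                continue  # a smaller diagnosis is contained in it: not minimal
            if all(cand & conflict for conflict in conflict_sets):
                found.append(cand)  # intersects every conflict set
    return found


conflicts = [
    {"c1", "c3"}, {"c1", "c4"}, {"c2", "c3"}, {"c2", "c4"}, {"c3", "c4"},
]
diagnoses_2 = minimal_hitting_sets(conflicts, max_cardinality=2)
diagnoses_3 = minimal_hitting_sets(conflicts, max_cardinality=3)
```

With a cardinality bound of 2 the only result is the set containing `c3` and `c4`; raising the bound to 3 adds the two three-component diagnoses mentioned in the text. The enumeration is exponential in the number of components, so it is only suitable for small examples like this one.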

For the purposes of completeness, we also used an alternative representation of the correlation models $M_{i,j}$, namely we represented them by Gaussian Mixture Models (GMMs) instead of RBMs. The results of applying GMMs for residual generation on the fault-free and faulty data sets are depicted in Fig. 6. We used five mixture components in this case; increasing the number of components did not seem to affect the results significantly. As can be seen there, GMMs are also able to identify the trend in the data (we obtain 92.1% precision when using a GMM as a generative model, which is slightly better than the precision of the RBM), but the residuals are generally noisier than in the case of the RBMs and thus require post-processing in order to be of practical value. This is confirmed by the recall on the faulty data obtained when using GMMs, which is only 34.6%.

(a) Fault-free current measurements (b) Current measurements with faults in wheels 3 and 4

Figure 3: Current measurements from the smart wheels. In the plot of the faulty data, the vertical dotted lines represent events in which faults were first introduced and then removed: the communication of wheel 4 and then wheel 3 was cut off, which are the first two events, while the third event is when the wheels were reconnected.

(a) Pairwise correlations on the fault-free data (b) Pairwise correlations on the faulty data

Figure 4: Pairwise correlations on the fault-free and faulty data. In the plot of the faulty data, the vertical dashed lines represent events that introduce and then remove a fault.

(a) Residuals on the fault-free data (b) Residuals on the faulty data

Figure 5: Residuals on the fault-free and faulty data when using a Restricted Boltzmann Machine. The horizontal dot-dashed lines represent the residual thresholds for each model. In the plot of the faulty data, the vertical dashed lines represent events that introduce and then remove a fault.

(a) Residuals on the fault-free data (b) Residuals on the faulty data

Figure 6: Residuals on the fault-free and faulty data when using a Gaussian Mixture Model. The horizontal dot-dashed lines represent the residual thresholds for each model. In the plot of the faulty data, the vertical dashed lines represent events that introduce and then remove a fault.

6 Discussion and Conclusions

This paper discussed a modification of the sensor-based fault detection and diagnosis (SFDD) algorithm by Khalastchi and Kalech [1] in which the manually specified data modes required by the algorithm are replaced by models of pairwise sliding window correlations between sets of correlated measurements. The distribution of each correlation pair is represented by a generative model, which is used for online residual generation and subsequent conflict generation and diagnosis.

As is generally the case when a learning-based model is used, we assume that the data used for training the models are representative enough of the nominal operation of a robot. Since nominal data are generally more abundant than faulty data, we see this as an advantage of our method over methods based on discriminative models (e.g. [22]) since those also require faulty data for learning; however, if the detection of specific faults is desired, a discriminative model may be more appropriate than a fault-independent generative model.

There are various aspects that need to be addressed in future work. First of all, as discussed in section 4.3, different operating modes may require different sets of correlation models since the correlations may change between operating modes; this, however, necessitates a suitable context transition and potentially recognition model, such as [23]. Related to that, our diagnosis module as discussed here is quite rudimentary and is limited to diagnosing component faults, but could also be extended for diagnosing higher-level execution failures [24]. Furthermore, even though our method is not robot-specific, the portability to different robots, for instance to mobile manipulators, needs to be investigated in a follow-up study.¹³ Finally, on a more conceptual note, while the correlation between pairs of sensor measurements does seem to be a suitable feature for anomaly detection, it should be possible to obtain similar results with other features, such as moving averages or sliding window finite differences. This also applies to the distance metric used for residual generation; the Hellinger distance was experimentally found to be suitable for generating residuals, but it should be possible to replicate the results with other distance metrics. Both the features and the distance metric are thus design parameters, which may also be learnable in an end-to-end fashion.

¹³ In our lab, we are planning to use our method on a Toyota HSR mobile manipulator [25] and a KINOVA KORTEX Gen3 arm.

Acknowledgments

ROPOD is an Innovation Action funded by the European Commission under grant no. 731848 within the Horizon 2020 framework program. We are additionally grateful for the continuous support by the b-it International Center for Information Technology. We would also like to thank Ahmed Abdelrahman for his comments on our manuscript.

References

[1] E. Khalastchi and M. Kalech. A sensor-based approach for fault detection and diagnosis for robotic systems. Autonomous Robots, 42(6):1231–1248, Aug. 2018.

[2] A. Fischer and C. Igel. An Introduction to Restricted Boltzmann Machines. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, volume 7441 of Lecture Notes in Computer Science, pages 14–36. 2012.

[3] S. Zhang, S. Zhang, B. Wang, and T. G. Habetler. Machine Learning and Deep Learning Algorithms for Bearing Fault Diagnostics - A Comprehensive Review. CoRR, abs/1901.08247, 2019.

[4] R. Greiner, B. A. Smith, and R. W. Wilkerson. A Correction to the Algorithm in Reiter’s Theory of Diagnosis. Artificial Intelligence, 41:79–88, 1989.

[5] T. Quaritsch and I. Pill. PyMBD: A Library of MBD Algorithms and a Light-weight Evaluation Platform. In 25th Int. Workshop Principles of Diagnosis DX’14, 2014.

[6] A. Mitrevski, S. Thoduka, A. Ortega Sáinz, M. Schöbel, P. Nagel, P. G. Plöger, and E. Prassler. Deploying Robots in Everyday Environments: Towards Dependable and Practical Robotic Systems. In 29th Int. Workshop Principles of Diagnosis DX’18, 2018.

[7] D. Wulsin, J. Blanco, R. Mani, and B. Litt. Semi-Supervised Anomaly Detection for EEG Waveforms Using Deep Belief Nets. In 2010 9th Int. Conf. Machine Learning and Applications (ICMLA), pages 436–441, Dec. 2010.

[8] P. Chopra and S. K. Yadav. Restricted Boltzmann machine and softmax regression for fault detection and classification. Complex & Intelligent Systems, 4(1):67–77, Mar. 2018.

[9] Z. Chen, X. Zeng, W. Li, and G. Liao. Machine fault classification using deep belief network. In Proc. IEEE Int. Conf. Instrumentation and Measurement Technology, pages 1–6, May 2016.

[10] F. Di Mattia, P. Galeone, M. De Simoni, and E. Ghelfi. A Survey on GANs for Anomaly Detection. CoRR, abs/1906.11632, 2019.

[11] C. Zhang and Y. Chen. Time Series Anomaly Detection with Variational Autoencoders. CoRR, abs/1907.01702, 2019.

[12] A. L. Christensen, R. O’Grady, M. Birattari, and M. Dorigo. Fault detection in autonomous robots based on fault injection and learning. Autonomous Robots, 24(1):49–67, 2008.

[13] R. Golombek, S. Wrede, M. Hanheide, and M. Heckmann. A Method for learning a Fault Detection Model from Component Communication Data in Robotic Systems. In 7th IARP Workshop on Technical Challenges for Dependable Robots in Human Environments, 2010.

[14] X. Li and L. E. Parker. Distributed Sensor Analysis for Fault Detection in Tightly-Coupled Multi-Robot Team Tasks. In Proc. IEEE Int. Conf. Robotics and Automation (ICRA), pages 3103–3110, May 2009.

[15] M. Fox, J. Gough, and D. Long. Detecting Execution Failures Using Learned Action Models. In Proc. 22nd National Conf. Artificial Intelligence - Vol. 2, AAAI’07, pages 968–973, 2007.

[16] D. Koller and N. Friedman. Undirected graphical models. In Probabilistic Graphical Models: Principles and Techniques, chapter 4, pages 104–133. Upper Saddle River, NJ: Pearson Education, Inc., 2009.

[17] G. E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines. In Neural Networks: Tricks of the Trade, volume 7700 of Lecture Notes in Computer Science, pages 599–619. 2012.

[18] X. L. Nguyen et al. On surrogate loss functions and f-divergences. The Annals of Statistics, 37(2):876–904, Apr. 2009.

[19] J. de Kleer and B. C. Williams. Diagnosing Multiple Faults. Artificial Intelligence, 32:97–130, Apr. 1987.

[20] R. Reiter. A Theory of Diagnosis from First Principles. Artificial Intelligence, 32(1):57–95, Apr. 1987.

[21] M. Blanke et al. Diagnosis and Fault Tolerant Control. Germany: Springer-Verlag Berlin Heidelberg, 2nd edition, 2006.

[22] T. Matsuno, J. Huang, and T. Fukuda. Fault detection algorithm for external thread fastening by robotic manipulator using linear support vector machine classifier. In 2013 IEEE Int. Conf. Robotics and Automation, pages 3443–3450, May 2013.

[23] M. Karg and A. Kirsch. Acquisition and use of transferable, spatio-temporal plan representations for human-robot interaction. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ Int. Conf., pages 5220–5226, Oct 2012.

[24] S. Banerjee and S. Chernova. Fault Diagnosis in Robot Task Execution. In AAAI Spring Symposium Series, 2019.

[25] T. Yamamoto et al. Development of Human Support Robot as the research platform of a domestic mobile manipulator. ROBOMECH Journal, 6(1):4, 2019.
