an integrated diagnosis and repair architecture for ros ... · an integrated diagnosis and repair...

8
An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2 , Johannes Maurer 1 , Gerald Steinbauer 1 , Suzana Uran 2 , and Safdar Zaman 1 * 1 Institute for Software Technology, Graz University of Technology, Graz, Austria {johannes.maurer, steinbauer, szaman}@ist.tugraz.at 2 Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia {peter.lepej, suzana.uran}@uni-mb.si ABSTRACT The paper presents a novel diagnosis and repair system for ROS-based robot systems. It is an extension to the existing ROS diagnostics stack. It comprises five major modules: (1) a diagnosis model server, (2) a set of system observers, (3) a model-based diagnosis engine, (4) a planner- based repair engine and (5) a hardware diagnosis board. A diagnosis model is a logic-based de- scription of the system’s correct behavior. The diagnosis engine obtains this model from diagno- sis model server. Observers are software entities that observe particular properties of the system and provide information in standardized first or- der logic sentences. Using the observed informa- tion in combination with the model of the desired behavior of the robot system and model-based reasoning the diagnosis engine derives the root cause of a problem. The obtained diagnosis to- gether with the observations is used by the repair engine. It uses an action planner and a description of available repair actions to derive a sequence of repair actions that brings the robot system back to nominal state. The hardware diagnosis board on one hand provides observations about the power status and consumption of hardware components and on the other hand allows to power up or down selected hardware components. The paper pro- vides three contributions: a combination of diag- nosis and repair, the integration of hardware and software and the integration into ROS. 1 INTRODUCTION The motivation for the work presented in this paper origins from two fundamental observations. First, au- tonomous robots are artifacts that comprise a signif- icant number of hardware and software components that are quite heterogeneous in their structure and func- tionality. Moreover, these components closely inter- act with dynamic real-world environments. Hence, * Authors are listed in alphabetical order with respect to last names. because of various problems like wear, damage, de- sign and implementation flaws or shortage of testing the involved modules are always subject to faults and unexpected behavior. Moreover, the potentially non- deterministic nature of the interaction of the robot with its environment leads to additional faults and unex- pected situations. Obviously, such problems nega- tively affect the performance and the autonomy of a robot. Therefore, a truly autonomous and dependable robot has to have the capability to actively cope with such phenomena in an automated way. Second, an in- creasing number of research groups around the globe use the Robot Operating System (ROS) (Quigley et al., 1999) as a standard framework for their development of robot systems. ROS already provides a simple fault diagnosis system. But the system is mainly limited to monitoring hardware modules and code execution. In order to tackle the observations above we de- veloped an advanced diagnosis and repair architecture which is based on ROS and is able to inter-operate with the existing diagnostics stack. In the development we utilized our experience in diagnosis systems and re- sults of previous work of other colleagues and us such as robotics software diagnosis (Steinbauer et al., 2006; Kleiner et al., 2008; Golombek et al., 2010), hardware repair (Brandst¨ otter et al., 2007) and sensor validation (Kleiner et al., 2009). There are three major contributions of this paper. First the proposed and implemented system integrates diagnosis and repair for robot systems in one running system. In particular active repair are hardly seen in such systems. Second, the system integrates soft- ware and hardware components in the diagnosis and repair process. This allows to approach more complex systems comprising hardware and software. Finally, the system is based on the already widely used ROS framework and its standards. This fact is beneficial for the acceptance of the system by other robot developer or researcher. The system is publicly available on an open-source basis. 2 ROS AND ITS DIAGNOSTICS STACK ROS is an open-source, meta-operating system that primarily runs on Unix-based platforms. The main 1

Upload: others

Post on 10-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An integrated Diagnosis and Repair Architecture for ROS ... · An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2, Johannes Maurer 1, Gerald

An integrated Diagnosis and Repair Architecture forROS-Based Robot Systems

Peter Lepej 2, Johannes Maurer 1, Gerald Steinbauer 1, Suzana Uran 2, and Safdar Zaman 1 ∗

1 Institute for Software Technology, Graz University of Technology, Graz, Austria{johannes.maurer, steinbauer, szaman}@ist.tugraz.at

2 Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia{peter.lepej, suzana.uran}@uni-mb.si

ABSTRACT

The paper presents a novel diagnosis and repairsystem for ROS-based robot systems. It is anextension to the existing ROS diagnostics stack.It comprises five major modules: (1) a diagnosismodel server, (2) a set of system observers, (3)a model-based diagnosis engine, (4) a planner-based repair engine and (5) a hardware diagnosisboard. A diagnosis model is a logic-based de-scription of the system’s correct behavior. Thediagnosis engine obtains this model from diagno-sis model server. Observers are software entitiesthat observe particular properties of the systemand provide information in standardized first or-der logic sentences. Using the observed informa-tion in combination with the model of the desiredbehavior of the robot system and model-basedreasoning the diagnosis engine derives the rootcause of a problem. The obtained diagnosis to-gether with the observations is used by the repairengine. It uses an action planner and a descriptionof available repair actions to derive a sequence ofrepair actions that brings the robot system back tonominal state. The hardware diagnosis board onone hand provides observations about the powerstatus and consumption of hardware componentsand on the other hand allows to power up or downselected hardware components. The paper pro-vides three contributions: a combination of diag-nosis and repair, the integration of hardware andsoftware and the integration into ROS.

1 INTRODUCTIONThe motivation for the work presented in this paperorigins from two fundamental observations. First, au-tonomous robots are artifacts that comprise a signif-icant number of hardware and software componentsthat are quite heterogeneous in their structure and func-tionality. Moreover, these components closely inter-act with dynamic real-world environments. Hence,

∗Authors are listed in alphabetical order with respect tolast names.

because of various problems like wear, damage, de-sign and implementation flaws or shortage of testingthe involved modules are always subject to faults andunexpected behavior. Moreover, the potentially non-deterministic nature of the interaction of the robot withits environment leads to additional faults and unex-pected situations. Obviously, such problems nega-tively affect the performance and the autonomy of arobot. Therefore, a truly autonomous and dependablerobot has to have the capability to actively cope withsuch phenomena in an automated way. Second, an in-creasing number of research groups around the globeuse the Robot Operating System (ROS) (Quigley et al.,1999) as a standard framework for their developmentof robot systems. ROS already provides a simple faultdiagnosis system. But the system is mainly limited tomonitoring hardware modules and code execution.

In order to tackle the observations above we de-veloped an advanced diagnosis and repair architecturewhich is based on ROS and is able to inter-operate withthe existing diagnostics stack. In the development weutilized our experience in diagnosis systems and re-sults of previous work of other colleagues and us suchas robotics software diagnosis (Steinbauer et al., 2006;Kleiner et al., 2008; Golombek et al., 2010), hardwarerepair (Brandstotter et al., 2007) and sensor validation(Kleiner et al., 2009).

There are three major contributions of this paper.First the proposed and implemented system integratesdiagnosis and repair for robot systems in one runningsystem. In particular active repair are hardly seenin such systems. Second, the system integrates soft-ware and hardware components in the diagnosis andrepair process. This allows to approach more complexsystems comprising hardware and software. Finally,the system is based on the already widely used ROSframework and its standards. This fact is beneficial forthe acceptance of the system by other robot developeror researcher. The system is publicly available on anopen-source basis.

2 ROS AND ITS DIAGNOSTICS STACKROS is an open-source, meta-operating system thatprimarily runs on Unix-based platforms. The main

1

Page 2: An integrated Diagnosis and Repair Architecture for ROS ... · An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2, Johannes Maurer 1, Gerald

23rd International Workshop on Principles of Diagnosis

idea is a distributed system of programs called nodes,that are individually designed and loosely coupledat run-time. The framework provides functionalityfor hardware abstraction, low-level device control andmessage passing between nodes. ROS provides com-monly used operations for high-level applications, anumber of tools for the organization and executionof the development process, a number of means ofinter-software communication, basic data types and anumber of off-the-shelf third-party modules. Com-munication between nodes is established either by apublisher/subscriber mechanism on appropriate topicsor by calling services on responsible nodes. Topicsare strings that uniquely identify communication chan-nels. Services implement request/reply interactionsbetween nodes and are similar to remote method invo-cation (RMI). Messages are simple names data struc-tures that are passed between nodes.

ROS provides a simple fault diagnostic system. Itis mainly designed monitoring hardware modules andcode execution. The diagnostics system collects infor-mation from hardware drivers and robot hardware foranalysis, troubleshooting and logging1. The diagnos-tics stack is built around the /diagnostics topic. Thehardware drivers and devices publish standard diagno-sis messages on the /diagnostics topic with the devicenames, states and specific key-value pairs.

A complete diagnostic system should easily be ableto diagnose all aspects of a system such as hardwareproblems, problems in software modules or other dif-ficulties occurring at run-time. The existing ROS di-agnostic stack is not complete in that sense. It mainlydeals with hardware drivers or publishers related tohardware that publish data on the /diagnostics topic.

Moreover, the existing system requires that the mod-ule or node to be diagnosed have to publish data onthe /diagnostics topic. This means that if a hardwarenode does not provide such data it cannot be includedinto the diagnostics system until one alters the sourcecode of the node and adds a publisher for that informa-tion. This can be a problem if one is unable to alter thecode because it might not be available or complicatedto extend. Using a more general diagnosis concept canprevent a developer from altering nodes while still be-ing able to supervise the node to a certain extent. Thesame applies to pure software nodes or services.

The ROS diagnostics system follows a more lineardiagnosis approach where the relation of a symptomto a root cause is quite clear, which is fine for simplehardware devices. If the system and the interaction be-tween various modules become more complex it mightbe hard to connect an undesired behavior observed toits real root cause. For instance, wrong pose estimationcan have several root causes and might be not only tiedto a malfunctioning laser scanner. Using a more gen-eral observer and diagnosis concept the proposed di-agnostic system will be able to cover a much broaderscope of systems, faults and interactions.

Finally, the existing diagnostics system does nothave any capabilities for autonomous repair. It is es-sential for an autonomous robot that once a problem

1For further information on the ROS diagnostics stackplease refer to http://www.ros.org/wiki/diagnostics.

has been identified the system is able to derive and ex-ecute appropriate repair actions automatically.

3 RUNNING EXAMPLERescue and searching of victims after an earthquakeis a highly risky operation. Buildings are in dangerof collapse and further unpredictable earth shakes mayappear. Robots can be used to reduce the risk of firstresponders in such scenarios. The task of a search andrescue robot is to look for victims, to build a map andto make positions of victims in the map so that res-cue teams could efficiently plan and execute rescueactions. High risk environments demand a high levelof rescue robot autonomy therefore self diagnosis andself repair ability of a search and rescue robots is de-sirable. This makes a search and rescue robot a perfectexample for verification of the novel diagnosis and re-pair architecture proposed in this paper.

The running example is inspired by the RoboCupRescue competition. Goal of the competition is tobuild search and rescue robots that explore an un-known environment reproducing a collapsed building,to generate a map of the explored area, to search forsimulated victims, to mark the position of that vic-tims in the map and to examine vital signs of vic-tims. The joint RoboCup Rescue team of the GrazUniversity of Technology (TUG) and the University ofMaribor (FERI) built a search and rescue robot for theRoboCup Rescue competition. The TEDUSAR robotis shown in Figure 1.

Figure 1: TEDUSAR search and rescue robot.

The robot is based on the Dr.Robot Jaguar mo-bile robot platform with two tracks and two articu-lated, tracked, independently controlled arms. Theplatform can move over various terrains and climbup slopes and stairs. The robot is equipped with aHokuyo laser range finder (LRF) mounted on a lev-eling mechanism with 2 degrees of freedom to assurescanning in horizontal plane for building maps. Asensor head with two degree of freedom consisting ofa FLIR PathFindIR thermal camera and a MicrosoftKinect sensor is used to detect victims around the roboton basis of body temperature and computer vision.An XSens MTi inertial measurement unit (IMU) de-termines the alignment of the robot. In additional, awireless router is mounted on the robot to establish aconnection to the remote operator control station and ahardware diagnosis board was designed and mountedon the robot to observe power consumption behaviorsof hardware components (for a detailed description see

2

Page 3: An integrated Diagnosis and Repair Architecture for ROS ... · An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2, Johannes Maurer 1, Gerald

23rd International Workshop on Principles of Diagnosis

Section 4.1). The robot is controlled by an embed-ded PC running an Ubuntu Linux and control softwarebased on the ROS framework. The operator stationconsists of a joystick connected to a laptop PC run-ning an Ubuntu Linux, the joystick driver node and thevisualization tool for the operator.

/joy

/map

/cmd_vel

/scan

/imu_data

rviz

jtn

hm

lac

la_servo

1_movin

g

la_servo

2_movin

g

jysjn

hn

lanla

imu

hu

ja

in

jyn

Figure 2: Simple control architecture for the searchand rescue robot. Rectangles represent hardware mod-ules. Gray circles represent hardware nodes. Darkgray circles represent hardware driver nodes withswitchable hardware device. White circles representnormal software nodes. Solid arrows represent pub-lisher/subscriber communication. Dot-dashed arrowsrepresent service calls. Dotted lines represent hard-ware connections.

The scenario for the diagnosis example is a tele-operated exploration and mapping of an unknownenvironment. Figure 2 shows the components ofthe system and the communication between them.In the Figure abbreviations ja, hu, la, jn, hn, in,lan, jtn, hm, lac, jyn and jys represent jaguar,hokuyo, laser alignment, jaguar node, hokuyo node,imu node, laser alignment node, jaguar teleop node,hector mapping, laser alignment control, joy nodeand joystick respectively. The movement of the robotis remotely controlled by an operator using a joystick.The signals from the joystick are converted to velocitycommands for the robot platform. The communica-tion between the operator station and the robot is car-ried out via wireless network. Laser scans are used togenerate a map of the explored space by using a flexi-ble and scalable mapping approach (Kohlbrecher et al.,2011). For building a map the laser range finder has tobe aligned to the horizontal plane. Therefore, the pos-ture of the robot is determined with the IMU and theangles of the servos in the laser alignment system areset accordingly. The visualization tool RViz is used todisplay the map to the operator.

4 SYSTEM ARCHITECTUREThe development of the architecture of the proposeddiagnosis and repair system was guided by the follow-ing goals: (1) compatibility with ROS and its diagnos-tics stack, (2) minimizing the need to alter or annotateexisting system code, (3) integration of both hardwareand software observations and repair actions, (4) use of

advanced diagnosis and repair approaches, (5) generalabstract interfaces for observations and actions and (6)easy modeling of the diagnosis and repair domains.

Following these requirements we developed the ar-chitecture of the diagnosis and repair system shown inFigure 3. The architecture resembles the five modulesalready mentioned in the introduction: (1) a diagno-sis model server, (2) a set of system observers, (3) amodel-based diagnosis engine, (4) a planner-based re-pair engine and (5) a hardware diagnosis and repairboard. Each module provides a particular functionalitywhich is described below in more detail. The modulescommunicate using three types of communication:

Standard Topics and Messages All observerspublish their observations in form of an array ofstrings containing first-order logic sentences to theROS topic /observations. For instance the string"ok(temp cpu1) and ok(temp cpu2)" rep-resents the fact that the temperature of CPU1 andCPU2 are within the proper range. Diagnosis resultsare published to the topic /diagnosis using a diagnosismessage containing sets of working and faulty compo-nents (bad,good). The diagnosis message may containmultiple diagnoses. A dedicated diagnosis model mes-sage contains a set of propositions and a set of clauses.Please refer to Section 4.3 for more details on the di-agnosis model and the diagnosis process.

Repair Actions All repair actions in the system areimplemented using the ROS action server/client archi-tecture2. Actions use a generic action definition ac-cepting a list of strings as parameters and returning asingle standard integer value for the feedback and theresult. This allows for an easy integration of new re-pair actions. For instance shutdown("jaguar")represents the ground repair action for shutting downthe power supply for the robot base.

Individual Communication This category com-prises special means of communication used by specialnodes or services outside ROS. For instance the diag-nosis system communicates with the hardware diag-nosis and repair board via a proprietary protocol overTCP/IP. Moreover, some OS-related repair actions likeresetting the USB bus use special OS calls, e.g. ioctl.

4.1 Diagnostic BoardOur experiences and previous research work haveshown that in addition to software diagnosis and re-pair hardware-based diagnosis and repair is needed forsystems comprising devices with no direct support fordiagnosis and reconfiguration. In order to get addi-tional information for the diagnosis system we devel-oped a hardware diagnosis board and a hardware diag-nosis observer. Together they are able to gather infor-mation like current measurements and power supplydetection of individual components of the system. Theknowledge that a particular component draws a certain

2For further information on the ROS action library pleaserefer to http://ros.org/wiki/actionlib.

3

Page 4: An integrated Diagnosis and Repair Architecture for ROS ... · An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2, Johannes Maurer 1, Gerald

23rd International Workshop on Principles of Diagnosis

DiagnosisEngine

RepairEngine

DiagnosisModelServer

OperatingSystem

RepairModel

HardwareDiagnosis

Board

HDBNode

DiagnosisModel

PObs

ROSMaster

DObsHardware

Node

ROSDiagnostics

Tools

GObsNode GObsNode GObsNode

QObsNode

Node Node MObs

Observations

Diagnosis

Node Actions

Power Actions

OS-relatedActions

NObs

Figure 3: Overview System Architecture. Solid linesrepresent ROS topic communication. Dash-dottedlines represent ROS action calls. Dashed lines repre-sent other means of interaction. Dark gray modulesdepict existing ROS modules. The lighter gray mod-ule represents the underlying operating system. Whitemodules are the novel modules of the architecture.

amount of current or that a device is indeed poweredcan assist the diagnosis process.

Moreover, the board provides the possibility to turnon and off the power supply for different system com-ponents. This allows two different kinds of repair ac-tions related to hardware: (1) power off/on cycles and(2) hardware reconfiguration. The board is organizedin ten individual channels that can be switched on andoff individually. Moreover, individual current mea-surements can be taken. Figure 4 shows an overview ofthe hardware diagnosis board. For the running exam-ple seven channels are used for the following hardwarecomponents: (1) robot base, (2) PC, (3) sensor pan-tiltunit, (4) laser alignment system, (5) thermal camera,(6) laser range finder and (7) Kinect.

I1

U1

I2

U2

I3

U3

I4

U4

I5

U5

I6

U6

I7

U7

ProcessingUnit

7

TCP/IP

PC

Sensor Pan-Tilt

Laser Alignment

Laser Rang Finder

Thermal-Camera

Kinect

Robot Base22V

22V

15V

12V

12V

12V

12V

Figure 4: Overview hardware diagnosis board androbot connections (search and rescue robot example).

Similarly systems called power management system(Soltanzadeh et al., 2011) had been already used in

robot systems. Such systems allows user to manuallyswitch on or off individual on-board systems and per-forms current and voltage monitoring. In addition tothis, our system is also capable to do these steps au-tonomously. The system is designed to repair and re-configure the robot system without touching it.

The central part of the hardware diagnosis board isa micro-controller embedding a TCP/IP stack. It isresponsible for the current measurements, the powermonitoring and power configuration of the individualchannels. The board is connected with the outside overEthernet and embeds a server with a proprietary proto-col. The protocol allows the client to pull informationlike measurements and states of channels or to initiatebroadcasts on a regular basis. Moreover, it providescommands to change the power configuration of indi-vidual channels. Finally, the board can be configuredin a way that at power up a defined set of channels isautomatically on. For the running example we config-ured the board to start automatically the PC to allowbootstrapping during system start. This configurationallows us to automatically start the board, the PC andthe ROS-based software parts of the architecture.

4.2 ObserversObservers are general software entities implementedas ROS nodes that monitor particular aspects of thesystem. There are different kinds of observers su-pervising different aspects or properties. The re-sult of the observation is published on the /observa-tions topic as first order logic (FOL) sentence. Thesentence is a representation of the observed prop-erty and its status. Observations are published asset of strings. Each string represents a single FOLsentence. Currently only simple literals are sup-ported, e.g. ok(temp cpu1)("ok(temp cpu1)") or¬ok(temp cpu1)("not ok(temp cpu1)"). Butmore complex expressions using for instance quanti-fiers can be easily integrated. Please note that obvi-ously the representation and expressiveness of obser-vations have to match with the capabilities of the useddiagnosis approach. Currently we provide the follow-ing observer types:Diagnostic Observer (DObs) are the connection be-

tween the existing ROS diagnostics system andthe integrated diagnosis and repair system. DObsreceives diagnostic messages and generate theoutput based on similar rules as the existingROS diagnostic analyzers, e.g. temp cpu1 <40.0 ⇔ ok(temp cpu1). These observers caneasily reuse the existing information coming fromhardware devices.

General Observer (GObs) simply subscribes to aparticular topic and uses the received messagesas input. This allows to observe the commu-nication behavior of nodes without altering thenode’s code. For this general case this is doneby checking the frequency of messages. If itobserves the right frequency f(t) for a topic t(|f(t)−µf (t)| < σf (t)) it submits ok(t), ¬ok(t)otherwise. Where the frequency is measured us-ing a time window ω.

Node Observer (NObs) observes the running state ofsoftware nodes. It submits running(n) if node n

4

Page 5: An integrated Diagnosis and Repair Architecture for ROS ... · An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2, Johannes Maurer 1, Gerald

23rd International Workshop on Principles of Diagnosis

actually runs, ¬running(n) otherwise.Multiple Observer (MObs) is similar to a general

observer but it subscribes to several topics.Therefore, more complex communication behav-iors such as conditioned or triggered communica-tion are observed. For details about the processbehind please refer to (Kleiner et al., 2008).

Qualitative Observers (QObs) observes the abstracttrend of a particular value in the massage of atopic. It reports if value v actually increases(inc(v)), decreases (dec(v)) or stays constant(const(v)). The observer fits a linear regressionfor a given time window ω and compares it witha given slope b. For details about the abstractionprocess please refer to (Kleiner et al., 2009).

Hardware Observer (HObs) subscribes to the mea-surements and status updates coming from thehardware diagnosis board to supervise the sta-tus of the different power supply channels. Itprovides the observation on(m) if the mod-ule m is powered, otherwise ¬on(m). More-over, it provides all current measurements andpower states in a special message on the topic/board measurments. This allows further ob-servers to reuse the information.

Property Observer (PObs) supervises properties notdirectly related to a topic that may affect therobot’s performance like CPU or memory usageof a particular service. The observer publishes in-dividual observations and have to be individuallyimplemented according to the actual needs.

4.3 Diagnosis EngineThe diagnosis engine takes observations in the form ofFOL sentences and an abstract diagnosis model (sys-tem description) to generate a diagnosis, a set of com-ponents that are faulty and a set of components thatare still working properly. Predicate ¬AB(m) denotesthat module m working properly. While AB(m) de-notes that module m shows a faulty behavior.

The diagnosis engine follows the principles ofmodel-based diagnosis presented in (Reiter, 1987).The approach uses an abstract model that defines cor-rect behavior and some current observations of the sys-tem. A fault is detected if the outcome of the modeland the observation lead to a contradiction. Using ahitting set algorithm the approach calculates diagnosesthat resolves the contradiction, i.e. explain the misbe-havior. The engine locates that component or set ofcomponents that are the root cause for the contradic-tion. For details on the diagnosis step we refer the in-terested reader to (Reiter, 1987). The diagnosis engineuses an open-source Java-based implementation of theapproach (Peischl and Wotawa, 2003).

Each single observation ot that occurs at time t onthe topic /observations is integrated into an observa-tion set OBS. OBS = OBS ⊕ ot, where

S ⊕ o =

{(S\¬L) ∪ L o = L(S\L) ∪ ¬L o = ¬L L is an atom.

The diagnosis engine obtains the diagnosis modelat run-time from a model server. This allows makingchanges in the diagnosis model during run-time. For adetailed description of the model see next paragraph.

The results of the diagnosis engine are published onthe /diagnosis topic. As described earlier the diagnosismessage may comprise several diagnoses build up bysets of working (good) and faulty (bad) components.The union of both sets is equal to the set of all systemcomponents (good ∪ bad = C) for each individual di-agnosis. Apart from this it also generates diagnosticmessages compatible for the ROS diagnostics stack.

4.4 Diagnosis Model ServerThe diagnosis model server is an action server whichprovides the diagnosis model. The diagnosis model isstored as a YAML file (similar to XML) and keeps fivesections: (1) a string that denotes the proposition thatrepresents the AB predicate, (2) the same for ¬AB,(3) a string for a prefix denoting a negative literal, (4)a set of all propositions, (5) a set of clauses that definesthe diagnosis model. We have to describe the diagnosismodel this way as the currently used diagnosis engineonly supports propositions and Horn clauses.

For the modeling of a robot system we use the fol-lowing assumptions:• the set of all componentsC is divided into the dis-

junctive sets N (ordinary nodes), HN (hardwaredriver nodes), SN (hardware driver nodes withswitchable hardware), SH (switchable hardwaredevices) and H (other hardware devices). Theterm switchable means the related hardware canbe controlled by the hardware diagnosis board.

• a function out : C → 2T returning the outputtopics of a component, T is the set of all topics inthe system

• a function in : T × C → 2T returning the inputtopics of a component that influence an output

• a function hardware : SN → 2SH returning thehardware related to a hardware driver nodes withswitchable hardware

For the running example HN = {in, jyn}, SH ={ja, hu, la}, H = {js, imu}, SN = {jn, hn, lan}and N = {jtn, hm, rviz, lac}.

We add the following clauses to the system descrip-tion:

1. for each h ∈ SH add: ¬AB(h)→ on(h)

2. for each s ∈ SN and each t ∈ out(s) add:¬AB(s) ∧

∧h∈hardware(s) ¬AB(h)→ ok(t)

3. for each h ∈ HN and each t ∈ out(s) add:¬AB(h)→ ok(t)

4. for each n ∈ N and each t ∈ out(n) add:¬AB(n) ∧

∧i∈in(t,s) ok(i)→ ok(t)

5. for each n ∈ C\{H,SH} add:¬AB(n)→ running(n)

Please note that this is a first minimalistic diagno-sis model. Obviously based on the complexity of thesystem more observations and clauses can be added.Clause 1 states that a hardware device should be pow-ered to work correctly. Clause 2 states that if a hard-ware node and its related hardware components workcorrectly the output of the node have to be ok. Simplehardware nodes produce correct output if they work

5

Page 6: An integrated Diagnosis and Repair Architecture for ROS ... · An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2, Johannes Maurer 1, Gerald

23rd International Workshop on Principles of Diagnosis

correctly (clause 3). Ordinary nodes produce properoutput if the node itself works correctly and all rele-vant inputs are correct. Clause 5 states the fact thatcorrectly working software nodes have to run.

The diagnosis engine is automatically invoked everysecond to be reactive to chances in the system. Pleasenote that in the case that a complete system is workingcorrectly no diagnoses are published. But one mightthink of more suitable triggers like some rules basedon the observations.

4.5 Repair EngineIn order to describe and enable repair actions we modelthe repair as a planning domain. The repair enginetakes current observations and diagnoses as input. Ifthe repair engine receives a diagnosis message it con-verts the diagnosis into to a planning problem andsolves it. Please note that in the current implemen-tation in the case of multiple diagnosis always a repairplan for the first diagnosis is obtained.

The repair planning problem for a diagnosis ∆ anda set of observations O comprise the following parts:

• an initial stateI = O ∪

⋃c∈∆bad

AB(c) ∪⋃

c∈∆good¬AB(c)

• a set of repair actionsAwith action preconditionspre(a) and effects eff (a) for the individual ac-tions a• a goal state G =

⋃c∈C ¬AB(c)

The initial state is the union of the actual observa-tions and the state of all components. The goal of arepair plan is that all components work correctly again.

We use the widely recognized Planning DomainDefinition Language (PDDL) to represent the domainand problem definitions for the planning (Knoblocket al., 1998). The description of a planning prob-lem in PDDL comprises two parts: (1) domain de-scription and (2) problem description. The advan-tage of this approach is that a wide range of ex-isting high-performance planner and various exten-sions to classical planning like typing can be easilyused directly. To parse the PDDL domain descrip-tion that contains all definitions of actions and domainobjects we use the open-source Java-based packagepddl4j (Pellier, 2008). All the possible repair actionsare defined in the domain description. These actionsinclude start node(n), stop node(n), power up(h)and shutdown(h). The actions start node(n) andstop node(n) are for starting and stopping a softwarenode n while power up(h) and shutdown(h) are forpower up and shutdown a hardware component h. Ex-amples of action description are shown here:(:action power_up:parameters (?c):precondition (and (not (on ?c))(bad ?c)):effect (and (on ?c)(good ?c))

)

(:action stop_node:parameters (?c):precondition (and (bad ?c)(running ?c)):effect (and (bad ?c)(not (running ?c)))

)

The first action description states that the plannercan power up a component c only if c is off and faulty.

The effect of the action is that component c is poweredand works correctly.

The problem description is simply a PDDL descrip-tion of the initial state I and the goal G. Within therepair engine we use a Java-based GraphPlan imple-mentation to find a plan (sequence of repair actions)(Blum and Furst, 1997). Once the planner has found avalid repair plan it starts the execution of the sequenceof repair actions. The execution of action is triggeredby an invocation of the appropriate action server. Be-cause of the standard signature for repair action servers(a unique name and a list of parameter strings) an easymatching between the planner and the ROS-based sys-tem is possible. Please note that the planner simplywaits for the completion message of the called actionserver and currently does not check the actions’ effect.A better strategy for the future would be to wait untilthe effects have been established. For instance it mighttake longer that a node’s output topics become correctthan simply the time to restart the node.

5 ADAPTATION TO NEW ROBOTSThe proposed diagnosis and repair system is a generalsystem and it can easily be adapted to new robot sys-tems. As it is itself a ROS-based system the diagnosisand repair system can easily be applied to ROS-basedsystems. If one implements bridges for the observa-tion and the actions the systems not based on ROScan benefit too. For a new system the only thing re-quired is that the diagnosis model is adapted accord-ingly. The clauses given in the model section can beused as a starting point. If new repair actions are nec-essary they can be easily added to the domain descrip-tion of the planner. Also the set of observers can beeasily extended if necessary. The system provides al-ready templates for observer development. Even theproposed hardware diagnosis board can be easily inte-grated in a new system by simply routing component’spower lines via a channel of the board. Overall the pro-posed diagnosis system has the capability to be easilyadapted to new system.

6 EXPERIMENTAL RESULTSIn order to evaluate the function and the performanceof the proposed system we conducted preliminary ex-periments using the example robot. We tested the di-agnosis and repair system for both hardware devicesas well as software nodes. The components comprisesthe devices and nodes introduced in Section 3. The di-agnosis model and the repair domain description wascreated as described in Section 4. Moreover, we setup one hardware observer (monitoring the power sta-tus of the hardware), one node observer for each soft-ware node and one general observer each for the topics/odom, /map, /scan, /imu data and /cmd vel. Ex-periments were conducted for two scenarios:

6.1 System-Power-up scenarioIn order to evaluate the correctness of the diagno-sis model and the planning domain we conducted apower-up experiment. In the system-power-up sce-nario the whole robots system is initially switched offand diagnosis and repair system has to transfer it toa state ready for the mapping. In the beginning all the

6

Page 7: An integrated Diagnosis and Repair Architecture for ROS ... · An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2, Johannes Maurer 1, Gerald

23rd International Workshop on Principles of Diagnosis

hardware components are switched off and all softwarenodes are not running. The only powered hardwareis the hardware diagnosis board and the PC. The onlyrunning software is the diagnosis and repair system.The diagnosis engine obtains the following diagnosisusing the diagnosis model and observations from theobservers: ∆good = {},∆bad = {j, h, la, lan, lac, hm, jn, hn, in, jtn, jyn}

The diagnosis and observations are then used by theplanner to plan and invoke proper action. Figure 5shows for the system-power-up scenario the diagnosisand planner results for all the hardware and softwarecomponents. At the top of the figure the sequence ofthe repair actions is depicted. Initially all hardwareand software components are abnormal. The plannerfirst powers up the laser alignment hardware by in-voking a power up action. After laser alignment pow-ers up and becomes normal. The planner continues topower up hokuyo. After all the hardware is poweredup the planner continues with the software nodes andstarts them one by one by calling start node actions.First the planner starts laser alignment node thenhokuyo node, hector mapping, jaguar teleop node,jaguar node, laser alignment control and imu node.All the components become normal at the end of therepair plan. As shown in Figure 5 it takes 30.2 sec-onds to bring all components in normal state. The re-sults clearly shows that our system is able to obtain theright diagnosis and to execute a repair plan to set thesystem’s hardware and software in a correct state.

6.2 Device-Shut-down scenarioIn the second experiment we evaluated how the sys-tem reacts to a dynamic fault occurring at run-time. Inthe device-shut-down scenario the system is operatingcorrectly. Then suddenly one hardware device goesdown. In this scenario jaguar hardware is switchedoff. As a result the jaguar node gets disconnectedfrom jaguar and stops publishing odometry data. Sothe required sequence of actions should be power upthe jaguar, stop node and start node for jaguar node.When jaguar is switched off the diagnosis engine addsit to the faulty components in the diagnosis. The plan-ner takes this diagnosis and invokes power up actionfor the jaguar. When jaguar was powered up the plan-ner killed (the still running) jaguar node by invokingstop node action. After jaguar node was stopped suc-cessfully, the planner invokes start node action to startit again. Figure 6 shows this scenario with related di-agnoses and sequence of the repair actions. Please notethat in this scenario the planner is invoked twice. Firstit is invoked for the power up action for the jaguar.Second it is invoked for the two actions stop nodeand start node for jaguar node. Then jaguar node be-comes normal making whole the system running nor-mally again.

7 CONCLUSION AND FUTURE WORKThe work presented in this paper provides three con-tributions. First, the proposed system combines au-tomated diagnosis for robot systems with automatedrepair. Second, the system incorporates software andhardware into the diagnosis and repair process. Fi-nally, the proposed system is based on and extends thepopular Robot Operating System (ROS).

The proposed system comprises several types of ob-servers, a diagnosis engine, a repair engine and a hard-ware diagnosis board. Observers monitor the state ofhardware, nodes and topics. The diagnosis engine fol-lows the model-based diagnosis approach and takes adiagnosis model along with observations to derive di-agnoses. The repair engine uses a PDDL-based do-main and problem description to obtain a proper re-pair plan. A novel hardware diagnosis board allows forhardware monitoring and reconfiguration. Preliminaryexperimental results show that the proposed diagnosisand repair system is able to detect faults in a robot sys-tem comprising software and hardware. Moreover, itis able to bring back a system to a normal state.

In future work we will further investigate the vari-ous interactions within the diagnosis and repair systemthat have a huge impact on the stability and quality ofthe diagnosis and repair process. Moreover, we willincrease the capabilities for monitoring and modeling.Finally, we have to compare the system to other inte-grated systems like the Livingston and LAAS architec-tures (Muscettola et al., 1998; Alami et al., 1998).

8 ACKNOWLEDGMENTSThe work has been partly funded by the EuropeanFund for Regional Development (EFRE), the LandSteiermark and the Republic of Slovenia under theTedusar grant. Safdar Zaman is supported by theHigher Education Commission (HEC) of the govern-ment of Pakistan.

REFERENCES(Alami et al., 1998) R. Alami, R. Chatila, S. Fleury,

M. Ghallab, and F. Ingrand. An architecture forautonomy. International Journal of Robotics Re-search, 17:315–337, 1998.

(Blum and Furst, 1997) Avrim L. Blum and Mer-rick L. Furst. Fast planning through planning graphanalysis. Artificial Intelligence, 90:281–300, 1997.

(Brandstotter et al., 2007) Mathias Brandstotter,Michael Hofbaur, Gerald Steinbauer, and FranzWotawa. Model-based fault diagnosis and re-configuration of robot drives. In 2007 IEEE/RSJInternational Conference on Intelligent Robots andSystems (IROS), San Diego, CA, USA, 2007.

(Golombek et al., 2010) R. Golombek, S. Wrede,M. Hanheide, and M. Heckmann. Learning a prob-abilistic self-awareness model for robotic systems.In 2010 IEEE/RSJ International Conference on In-telligent Robots and Systems (IROS), 2010.

(Kleiner et al., 2008) Alexander Kleiner, GeraldSteinbauer, and Franz Wotawa. Towards Au-tomated Online Diagnosis of Robot NavigationSoftware. In First International Conference onSimulation, Modeling, and Programming forAutonomous Robots (SIMPAR 2008), volume 5325of Lecture Notes in Computer Science, pages159–170. Springer, 2008.

(Kleiner et al., 2009) Alexander Kleiner, GeraldSteinbauer, and Franz Wotawa. Using qualitativeand model-based reasoning for sensor validationof autonomous robots. In Twentieth International

7

Page 8: An integrated Diagnosis and Repair Architecture for ROS ... · An integrated Diagnosis and Repair Architecture for ROS-Based Robot Systems Peter Lepej 2, Johannes Maurer 1, Gerald

23rd International Workshop on Principles of Diagnosis

Figure 5: Behavior of the diagnosis and repair system for the System-Power-up scenario.

Figure 6: Behavior of the diagnosis and repair system for the Device-Shut-down scenario.

Workshop on Principles of Diagnosis (DX 2009),Stockholm, Sweden, 2009.

(Knoblock et al., 1998) Craig Knoblock, AnthonyBarrett, Dave Christianson, Marc Friedman, ChungKwok, Keith Golden, Scott Penberthy, David ESmith, Ying Sun, and Daniel Weld. Pddl the plan-ning domain definition language. AIPS-98 Compe-tition Committee, 78(4):1–27, 1998.

(Kohlbrecher et al., 2011) S. Kohlbrecher, J. Meyer,O. von Stryk, and U. Klingauf. A flexible and scal-able slam system with full 3d motion estimation. InProc. IEEE International Symposium on Safety, Se-curity and Rescue Robotics (SSRR), 2011.

(Muscettola et al., 1998) Nicola Muscettola, P. Pan-durang Nayak, Barney Pell, and Brian C. Williams.Remote Agent: to boldly go where no AI systemhas gone before. Artificial Intelligence, 103(1-2):5– 47, 1998.

(Peischl and Wotawa, 2003) Bernhard Peischl andFranz Wotawa. Model-based Diagnosis Or Rea-soning From First Principles. IEEE IntelligentSystems, 18(3):32–37, 2003.

(Pellier, 2008) Damien Pellier. Pddl4j, 2008.http://sourceforge.net/projects/pddl4j.

(Quigley et al., 1999) M. Quigley, K. Conley, B. P.Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler,and A. Y. Ng. Ros: an open-source robot operatingsystem. In Proceedings of IEEE Aerospace Confer-ence, 1999.

(Reiter, 1987) Raymond Reiter. A theory of diag-nosis from first principles. Artificial Intelligence,32(1):57 – 95, 1987.

(Soltanzadeh et al., 2011) Amir H. Soltanzadeh,Amir H. Rajabi, Arash Alizadeh, Golnaz Eftekhari,and Mehdi Soltanzadeh. RoboCupRescue 2011 -Robot League Team AriAnA (Iran). In RoboCup2011 - RoboCup Rescue Robot - Team DescriptionPapers, 2011.

(Steinbauer et al., 2006) Gerald Steinbauer, MartinMorth, and Franz Wotawa. Real-time diagnosis andrepair of faults of robot control software. In In-ternational RoboCup Symposium, volume 4020 ofLecture Notes in Computer Science, Osaka, Japan,2006. Springer.

8