
Diagnosis with Behavioral Modes

Johan de Kleer and Brian C. Williams
Xerox Palo Alto Research Center
3333 Coyote Hill Road, Palo Alto, CA 94304
Email: dekleer, bwilliams@parc.xerox.com

This paper originally appeared in Proceedings IJCAI-89, Detroit, MI (1989), 104-109.

Abstract

Diagnostic tasks involve identifying faulty components from observations of symptomatic device behavior. This paper presents a general diagnostic theory that uses the perspective of diagnosis as identifying consistent modes of behavior, correct or faulty. Our theory draws on the intuitions behind recent diagnostic theories to identify faulty components without necessarily knowing how they fail. To derive additional diagnostic discrimination we use the models for behavioral modes together with probabilistic information about the likelihood of each mode of behavior.

1 Introduction

When you have eliminated the impossible, whatever remains, however improbable, must be the truth. (Sherlock Holmes, The Sign of the Four)

The objective of our research is to develop a general theory of diagnosis that captures a human diagnostician's predominant modes of reasoning. This theory is intended to serve as the conceptual foundation for computational systems that diagnose devices.

Early approaches [1, 4] to diagnosis used fault models to identify failure modes of faulty components that explain the observations made. The ability to predict failing components' behaviors provided powerful diagnostic discrimination. However, these techniques depend on the assumption that all failure modes are known a priori, an assumption that is sometimes warranted but never guaranteed. The unacceptable result of not satisfying this assumption, faulty diagnoses, has led researchers to abandon this powerful approach.

The model-based diagnostic approach adopted by most recent researchers [3, 6, 9] provides a framework for diagnosing a device from correct behavior only. This approach is based on the observation that it is not necessary to determine how a component is failing to know that it is faulty: a component is faulty if its correct behavior (i.e., as specified by its manufacturer) is inconsistent with the observations. Since only correct behavior needs to be modeled, any knowledge about the behavior of component fault modes is ignored. This provides a fundamental advantage over earlier techniques requiring a priori knowledge of all fault modes. Unforeseen failure modes pose no difficulty. However, what is lost is the additional diagnostic discrimination derived from knowing the likely ways a component fails, and the ability to determine whether these failure modes are consistent with the observations. Thus, unlikely possibilities are entertained as seriously as likely ones. For example, as far as most model-based diagnostic approaches are concerned, a light bulb is equally likely to burn out as to become permanently lit (even if electrically disconnected).

Human diagnosticians, however, take great advantage of behavioral models of known failure modes, together with the likelihood that these modes will occur. Knowledge of fault modes is used to pinpoint faulty components faster, and to help determine specific repairs that must be made to the faulty components.

We view the central task of diagnosis as identifying the behavioral modes (correct or faulty) of all the components. Whether a mode is faulty or not is irrelevant. Our synthesis hypothesizes that it is not the notion of fault, but behavioral mode, that is fundamental to diagnosis. Each component has a set of possible behavioral modes, including an unknown mode which makes no predictions and therefore can never conflict with the evidence. The unknown mode is included to allow for the possibility, albeit small, of unforeseen behavioral modes. This unknown mode is crucial because early diagnostic algorithms, when confronted with an unforeseen fault mode, either start making useless probes or simply give up. Our approach pinpoints the failing component as behaving in an unknown mode.

The introduction of fault models potentially introduces significant computational overhead for the diagnostic algorithms. Diagnosing multiple faults is inherently a combinatoric process. Introducing fault models exacerbates the process by introducing multiple modes and possible behaviors to consider. To control the combinatorics we introduce computational techniques which focus reasoning on more probable possibilities first. These techniques, in effect, focus diagnostic reasoning only on those component behavioral modes that are more probable given the evidence. This set grows and shrinks as evidence is collected.

By using the new perspective of diagnosis as identifying probable behavioral modes, we are able to extend our earlier work on model-based diagnosis (the General Diagnostic Engine (GDE) [6]) to reason about modes of behavior. The resulting system we call Sherlock. GDE provides a general domain-independent architecture for diagnosing any number of simultaneous faults in a device given solely a description of its structure (e.g., electrical circuit schematic) and specifications of correct component behaviors (e.g., that resistors obey Ohm's law). Given a set of observations, GDE constructs hypotheses (called diagnoses) identifying the faulty components and suggests points where additional measurements (called probes) should be made to localize the diagnosis with as few measurements as possible.

We have implemented our approach by extending GDE to Sherlock, and have tested it on a variety of digital circuits, from simple three-inverter circuits to ALUs consisting of 400 gates with 4 behavioral modes each. Sherlock exploits knowledge of failure modes to pinpoint faults more quickly and to identify in what mode components are functioning.

2 Related work

Exploiting the use of fault models has recently become an active research area [10, 11, 12, 15]. In particular, Holtzblatt's [11] generalization of GDE incorporates the notion of behavioral modes in a similar spirit to Sherlock. But Holtzblatt's GMODS system is missing many key features of Sherlock, such as accommodating unexpected failures, incorporating probabilistic information to rank diagnoses and guide probing, incorporating most-probable-first heuristics to limit the computational complexity which arises for larger devices, and combining evidence gathered from multiple observations of a device. As GMODS does not use probabilistic information, it relies on an expensive hyperresolution rule to rule out fault modes and cannot focus reasoning on more probable diagnoses. Struss [15] argues against the use of probabilistic information and the use of an unknown mode. Instead he employs a resolution rule and controls reasoning to introduce appropriate fault modes only when necessary. Through the use of an alternative architecture which redefines the notion of fault, Raiman [12] achieves some of the advantages of knowledge of fault modes without having to incorporate them. Hamscher [10] incorporates fault models with his generalization of GDE called XDE.

3 Diagnosis with modes

The perspective of diagnosis as identifying probable behavioral modes is best appreciated through an example. Consider the simple three-inverter circuit shown below.

[Figure: three inverters A, B and C in series; the input I feeds A, A's output is X, B's output is Y, and C's output is the circuit output O.]

Suppose that the input (I) is set to zero, and that, although the output (O) should be one if the circuit is functioning correctly, it is measured to be zero. Without knowledge of fault modes, all three inverters are equally likely to be faulted. If we knew that inverters (almost) always failed with output stuck-at-1, then we could infer that inverter B was likely to be faulted. Thus knowledge of failure modes can provide significant diagnostic information.

Knowledge of failure modes is also important to decide what measurement to make next. If all faults were equally likely, measuring X or Y provides equal information. However, suppose we know that inverters A and B almost always fail by having their output stuck-at-1, and that inverter C almost always fails by having its output stuck-at-0 (because it is designed differently to drive an external load). Given that knowledge, it is unlikely that inverter A is failing, as its most common fault does not explain the symptom. If operating correctly, A's output should be one, so A being stuck-at-1 would not explain the incorrect value being observed at the device's output. However, the likely failures of inverters B and C are consistent with the symptom since either explains the deviation from expected behavior. Hence the diagnostician should measure at Y next to determine which of the two inverters is failing.

The objective of the diagnosis dictates the granularity of Sherlock's analysis. Sometimes the objective is to identify which components are failing and how. Sometimes the task is simply to identify the failing components so that they can be replaced. Sometimes the task is to identify all the behavioral modes (good and bad). Sometimes diagnosis considers multiple test vectors, and sometimes there is only one. To accommodate these possibilities Sherlock must be told which modes it must discriminate among.

It is important to note that even if it is diagnostically unimportant to distinguish between some behavioral modes, knowledge of behavioral modes still helps Sherlock. Suppose a component has two faulty modes, M1 of high probability and M2 of low probability, which we are not interested in discriminating between. If a measurement eliminates M1 from consideration, then the (posterior) probability that the component is faulty becomes low.

4 Framework

This section presents the overall framework, including definitions for basic terminology and equations for computing the relevant probabilistic information. Section 5 presents heuristics for avoiding the combinatorial explosion resulting from moving from GDE to Sherlock.

The key conceptual extension to GDE is the introduction of behavioral modes. The extension is very easy, as GDE can be viewed as having two behavioral modes per component (the good one and the faulty one with unspecified behavior). In Sherlock there are simply more behavioral modes per component.

The structure of the device to be diagnosed specifies the components and their interconnections. Components are described as being in one of a set of distinct modes, where each mode captures a physical manifestation of the component (e.g., a valve being open, closed, clogged or leaky). The behavior of each component is characterized by describing its behavior in each of its distinct modes. We require that a component can be in only one mode at a time. We also require that a faulty component must remain in the same mode for all test vectors (in the exceptional case where a fault cannot be modeled this way, its behavior is captured by the unknown mode). Other than these there are very few restrictions on behavior models: a model can make incomplete predictions, the set of modes can be incomplete, and the predictions of different modes can overlap.


A component's modes consist of a set describing the component's proper behavior (e.g., the valve being on or off) and a set describing faulty behavior (e.g., the valve being clogged or leaky). When there is only one mode for proper behavior we abbreviate it as G (for "good"). Formally, a behavioral mode is a predicate on components, which is true of a device exactly when the device is in that behavioral mode. Every component has an unknown mode, U, with no behavioral model, representing all (failure) modes whose behaviors are unknown.

In our example, we consider four behavioral modes of a digital inverter: good (abbreviated G), output stuck-at-1 (abbreviated S1), output stuck-at-0 (abbreviated S0), and an unknown failure mode (abbreviated U). The axioms for the behavior of the inverter are:

INVERTER(x) ⇒ [ [G(x) ⇒ (IN(x) = 0 ≡ OUT(x) = 1)] ∧ [S1(x) ⇒ OUT(x) = 1] ∧ [S0(x) ⇒ OUT(x) = 0] ].

The unknown behavioral mode U(x) has no model. Given the model library and the device structure, Sherlock directly constructs a set of axioms SD, called the system description [13].
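To make the mode library concrete, the following is a minimal sketch (hypothetical Python, not the paper's implementation) of how an inverter's behavioral modes might be encoded: each mode pairs a prior probability with a behavior function, and the unknown mode U supplies no behavior, so it never makes a prediction and therefore can never conflict with evidence.

```python
# Hypothetical encoding of an inverter's behavioral modes.  Each behavior
# function takes the input value (or None if it is unknown) and returns the
# predicted output, or None when the mode makes no prediction.

def good(inp):
    # G: the output is the negation of the input; no prediction if the input is unknown.
    return None if inp is None else 1 - inp

def stuck_at_1(inp):
    # S1: the output is 1 regardless of the input.
    return 1

def stuck_at_0(inp):
    # S0: the output is 0 regardless of the input.
    return 0

def unknown(inp):
    # U: unknown fault mode; makes no prediction whatsoever.
    return None

# Mode name -> (prior probability, behavior function).  The priors shown are the
# ones used for inverters A and B in the example of Section 6.
INVERTER_MODES = {
    "G":  (0.990, good),
    "S1": (0.008, stuck_at_1),
    "S0": (0.001, stuck_at_0),
    "U":  (0.001, unknown),
}
```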

An observation is a set of literals describing the outcomes of measurements (e.g., {I = 0, X = 1, O = 0}) for a test vector which has been applied to the device. The evidence consists of a set of observations (e.g., {{I = 0, X = 1, O = 0}, {I = 1, O = 0}}).² This definition of evidence allows us to incorporate accumulated evidence from different test vectors.

A candidate assigns a behavioral mode to every component of the device. Intuitively, a diagnosis is a candidate that is consistent with the evidence; however, we distinguish between a diagnosis for a particular observation and a diagnosis for all the evidence. A diagnosis for an observation is a candidate that is consistent with the observation; formally, the union of the system description, the candidate, and the observation is logically consistent.³ Formally a candidate is a set of literals, e.g., {G(A), G(B), U(C)}. To distinguish sets representing candidates we write [G(A), G(B), U(C)]. Note that in GDE a candidate is represented by the set of failing components, while in Sherlock a candidate is represented by a set that assigns a behavioral mode to every component. Thus, the Sherlock candidate [G(A), G(B), U(C)] corresponds to the GDE candidate [C].

In combining information from different observations we need to treat good and bad modes differently. By definition, a component manifests the same failure mode throughout all observations. However, if a component is in a good mode (e.g., the valve is on) in one observation there is no reason to believe it should be in the same good mode for another test vector. If components have only a single good mode, combining information from multiple test vectors is straightforward. Namely, a diagnosis for the evidence is a set of literals such that for every observation, the union of the system description, the candidate, and the observation is logically consistent. For brevity we operate within one observation in the remainder of this paper, unless otherwise indicated. However, it is important to bear in mind that many of the design decisions underlying Sherlock only make sense when multiple observations are taken into consideration.

² The process of generating good test vectors is outside the present scope of our theory.

³ Note that by this definition some candidates may be eliminated as diagnoses on the basis of no observations whatsoever. For example, consider hypothetical models for two inverters in series where the first inverter had a mode output-stuck-at-1 and the second had a mode input-stuck-at-0. Note also that the candidate in which every component is operating in its unknown mode is always a candidate unless the combination of the system description and any observation by itself is inconsistent.

Like GDE, we make the basic assumption that components fail independently (which is sometimes unfounded) and that the prior probabilities of finding a component in a particular mode are provided. Recall that, although the behaviors of the different modes may sometimes overlap, we require that each mode captures a distinct physical state or condition of the component. Thus, the probabilities of all the modes of a component always sum to one. Under these assumptions, the prior probability that a particular candidate C_l is the actual one is:

p(C_l) = ∏_{m ∈ C_l} p(m),

where p(m) denotes the prior probability of behavior mode m being manifested (i.e., a particular component being in a particular mode).
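As a minimal illustration of this product (hypothetical helper code, not Sherlock's), the prior of a candidate is computed directly from the mode priors:

```python
from math import prod

def candidate_prior(candidate, mode_priors):
    # candidate   : dict mapping component name -> mode name, e.g. {"A": "G", "B": "G", "C": "U"}
    # mode_priors : dict mapping component name -> {mode name: prior probability}
    return prod(mode_priors[comp][mode] for comp, mode in candidate.items())

# With the three-inverter priors of Section 6:
priors = {
    "A": {"G": 0.99, "S1": 0.008, "S0": 0.001, "U": 0.001},
    "B": {"G": 0.99, "S1": 0.008, "S0": 0.001, "U": 0.001},
    "C": {"G": 0.99, "S1": 0.001, "S0": 0.008, "U": 0.001},
}
print(candidate_prior({"A": "G", "B": "G", "C": "G"}, priors))   # 0.970299
print(candidate_prior({"A": "G", "B": "G", "C": "U"}, priors))   # ~0.00098
```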

As candidates are eliminated, the probabilities of the remaining diagnoses must increase. (On occasion a candidate is eliminated purely as a result of the device's topology, in which case the probability is adjusted by a renormalization.) Usually candidates are eliminated as a result of measurements. Bayes' rule allows us to calculate the conditional probability of the candidates given that point x_i is measured to be v_ik (unless otherwise indicated, all probabilities are conditional on evidence previously accumulated; see [6] for more details):

p(C_l | x_i = v_ik) = p(x_i = v_ik | C_l) p(C_l) / p(x_i = v_ik).

The denominator, p(x_i = v_ik), is just a normalization. p(C_l) was computed as a result of the previous measurement (or is the prior). Finally, p(x_i = v_ik | C_l) is determined as follows:

1. If x_i = v_ik is predicted by C_l given the evidence so far, then p(x_i = v_ik | C_l) = 1.

2. If x_i = v_ik is inconsistent with C_l and the evidence, then p(x_i = v_ik | C_l) = 0.

3. If x_i = v_ik is neither predicted by nor inconsistent with C_l and the evidence, then we make the presupposition (sometimes invalid) that every possible value for x_i is equally likely. Hence, p(x_i = v_ik | C_l) = 1/m, where m is the number of possible values x_i might have (in a conventional digital circuit m = 2). Intuitively, this provides a bias for candidates which predict a measurement over those that don't.
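A sketch of this update (hypothetical interface; Sherlock obtains the predictions from its ATMS labels rather than from a callback):

```python
def bayes_update(candidates, point, observed, predicts, m=2):
    # candidates : dict mapping candidate -> current probability p(C_l)
    # predicts   : predicts(candidate, point) -> predicted value, or None if the
    #              candidate makes no prediction at that point
    # m          : number of possible values at a point (2 in a digital circuit)
    posterior = {}
    for cand, p in candidates.items():
        predicted = predicts(cand, point)
        if predicted is None:
            likelihood = 1.0 / m     # case 3: no prediction; assume all values equally likely
        elif predicted == observed:
            likelihood = 1.0         # case 1: the outcome was predicted
        else:
            likelihood = 0.0         # case 2: the outcome is inconsistent with the candidate
        posterior[cand] = p * likelihood
    norm = sum(posterior.values())   # p(x_i = v_ik): just a normalization
    return {c: p / norm for c, p in posterior.items()} if norm else posterior
```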

Throughout the diagnostic session, the probability of any particular observation x_i = v_ik is bounded below by the sum of the current probabilities of the candidates that entail it, and bounded above by one minus the sum of the current probabilities of the candidates that are inconsistent with it. See [6] for the estimate used. Similarly, the probability that a component is in a particular mode is given by the sum of the current probabilities of the candidates in which it appears.
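For instance (hypothetical representation), with candidates encoded as frozensets of (component, mode) pairs, the mode probabilities reported later for the three-inverter example are simple sums:

```python
def mode_probability(candidates, component, mode):
    # candidates : dict mapping a candidate (a frozenset of (component, mode) pairs)
    #              to its current probability
    # Returns the probability that `component` is in `mode`.
    return sum(p for cand, p in candidates.items() if (component, mode) in cand)

# e.g. summing over the leading diagnoses of Section 6 gives p(S1(B)) of roughly 0.432.
```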

4.1 Varieties of diagnostic tasks

In order to determine what next measurement is likely to provide the most information, Sherlock must determine the likelihood of hypothetical measurement outcomes and their consequences on the candidate space. The different diagnostic objectives dictate differing scoring functions. Sherlock is asked to discriminate among some modes and not others by supplying it with a set of discrimination specifications (each a set of modes that are not to be discriminated). The discrimination specification partitions the diagnoses into a set of d-partitions. The goal of diagnosis is to identify the probable d-partitions and to suggest measurements which best pinpoint the actual one. For example, it may only be important to discriminate between good and faulty behavior. In this case, the most probable d-partition identifies which components have to be replaced. In the simple case where the objective is to discriminate among all behavioral modes, every d-partition is just a singleton set consisting of a single diagnosis. Note that, in general, the different diagnoses within a single d-partition make different predictions. Although it may be unimportant to discriminate among them as far as the overall diagnostic objective is concerned, it is important to keep them separate to correctly compute the probabilities of measurement outcomes.

The specific approach used to select measurements is a minimum entropy technique: pick that measurement to make next that will yield, on average, the minimum entropy H (or, conversely, that measurement which extracts maximum information):

H = − Σ_l p(D_l) log p(D_l),

where p(D_l) is the probability of a d-partition given the evidence. This, in turn, requires computing the candidate probabilities given a hypothetical outcome. Fortunately this is computable fairly directly using Bayes' rule (see [6] for details). The expected entropy resulting from measuring x_i is:

H_e(x_i) = Σ_{k=1}^{m} p(x_i = v_ik) H(x_i = v_ik),

where the v_ik are the possible measurement outcomes and H(x_i = v_ik) is the entropy of the resulting set of d-partitions. Information theory tells us that, given certain assumptions, the measurement chosen by this scoring function will on average enable Sherlock to make the fewest number of measurements to identify the actual d-partition to a certain level of confidence. This approach (see examples in [6]) almost always suggests the optimum measurement common sense would suggest. The subsequent examples restate entropy as a cost function: ideal measurements have 0 cost, and useless measurements have cost 1.
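A sketch of this scoring (hypothetical interface; the real system reads the hypothetical outcomes off the ATMS labels instead of re-running the update for every candidate):

```python
from math import log

def entropy(probs):
    # Shannon entropy H = -sum p log p over the d-partition probabilities.
    return -sum(p * log(p) for p in probs if p > 0.0)

def expected_entropy(candidates, point, values, predicts, partition_of, m=2):
    # candidates   : dict candidate -> probability (normalized over the diagnoses)
    # values       : the possible outcomes v_ik at `point`
    # partition_of : maps a candidate to (an identifier of) its d-partition
    score = 0.0
    for v in values:
        weighted = {}
        for cand, p in candidates.items():
            predicted = predicts(cand, point)
            if predicted is None:
                weighted[cand] = p / m           # neither predicted nor refuted
            elif predicted == v:
                weighted[cand] = p               # outcome predicted by the candidate
            else:
                weighted[cand] = 0.0             # outcome inconsistent with the candidate
        p_outcome = sum(weighted.values())       # probability of this hypothetical outcome
        if p_outcome == 0.0:
            continue
        partitions = {}
        for cand, w in weighted.items():
            key = partition_of(cand)
            partitions[key] = partitions.get(key, 0.0) + w / p_outcome
        score += p_outcome * entropy(partitions.values())
    return score
```

Picking the point (and, when several test vectors are available, the point/test-vector pair) with the smallest expected entropy is the selection rule applied in the example of Section 6.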

If there are multiple test vectors, far greater care must be taken. Suppose the objective is to identify the faulty components and how they are faulted. In this case Sherlock need only discriminate among faulty modes. The d-partitions for the overall objective are the intersection of those obtained from each of the multiple test vectors. In computing H_e(x_i) we must take care to use these global d-partitions, but only use the relevant candidates for determining p(x_i = v_ik) for a test vector. Thus, Sherlock identifies not only the best place to measure but also the best test vector (given the test vector set with which it has been supplied) under which to make the measurement.

4.2 Algorithms common to GDE and Sherlock

Sherlock, like GDE, exploits an assumption-based truth maintenance system (ATMS) [5]. Every literal stating that some component is in some behavioral mode is represented by an ATMS assumption. A literal indicating a measurement outcome (e.g., IN(A) = 0) is represented by an ATMS premise.⁴ The underlying Sherlock algorithms are similar to those of GDE except components can have multiple modes.

Sherlock computes the diagnoses by first constructing a set of conflicts. A conflict is a set of component behavioral modes which is inconsistent with the system description and some observation (i.e., a conflict is represented by an ATMS nogood). A conflict contains at most one behavioral mode per component. As in GDE, we represent the set of conflicts compactly in terms of the minimal conflicts, since conflicts are ordered by set-inclusion: every superset of a conflict is necessarily a conflict as well.

Intuitively, a minimal conflict identifies a small kernel set of component behavioral modes which violates some observation. It is easily shown that a candidate is a diagnosis iff it does not contain any minimal conflict. Thus, the complete set of diagnoses is computable from the minimal conflicts alone, and Sherlock attempts to determine the minimal conflicts (in ATMS terminology, the minimal nogoods), as these provide the maximum diagnostic information.
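A sketch of that test (hypothetical representation: candidates and conflicts as frozensets of (component, mode) pairs):

```python
def is_diagnosis(candidate, minimal_conflicts):
    # A candidate is a diagnosis iff no minimal conflict is a subset of it.
    return not any(conflict <= candidate for conflict in minimal_conflicts)

# The four minimal conflicts found in Section 6 once O is measured to be 0:
conflicts = [
    frozenset({("A", "G"),  ("B", "G"), ("C", "G")}),
    frozenset({("A", "S1"), ("B", "G"), ("C", "G")}),
    frozenset({("B", "S0"), ("C", "G")}),
    frozenset({("C", "S1")}),
]
all_good = frozenset({("A", "G"), ("B", "G"), ("C", "G")})
print(is_diagnosis(all_good, conflicts))                                          # False
print(is_diagnosis(frozenset({("A", "G"), ("B", "S1"), ("C", "G")}), conflicts))  # True
```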

Sherlock is typically used with a sound but incomplete prediction facility. Although soundness guarantees the conflicts Sherlock discovers are indeed conflicts, incompleteness sometimes makes it impossible to identify the minimal conflicts and consequently to rule out candidates as diagnoses. In the rest of this paper, by minimal conflicts we simply mean the set of unsubsumed conflicts found by Sherlock, and by diagnosis we mean a candidate not ruled out by one of these conflicts. The consequences of incompleteness are not catastrophic and usually result in only a minimal degradation in diagnostic performance. This issue is discussed in more detail in [6].

In order to select the next measurement (and under which test vector) to make, Sherlock must evaluate the effects of a hypothetical measurement. To do so, Sherlock must be able to determine what possible outcomes hold in which candidates. Sherlock computes the sets of behavioral modes which support each possible outcome. If an outcome follows from a set of behavior modes, then it necessarily follows from any superset. Therefore, Sherlock need only record with each possible outcome the minimal sets of behavior modes upon which it depends. Thus a possible measurement outcome holds in a candidate if a set of behavioral modes supporting the outcome is a subset of the candidate. Each set of behavioral modes supporting an outcome is represented by an ATMS environment, and the set of all environments for an outcome is represented by an ATMS label. The details of this algorithm can be found in [5, 6]. In a later section we work through a simple example illustrating Sherlock's functioning.

⁴ To implement the search strategy discussed in the next section these literals have to be assumptions as well, but this is outside the scope of this paper.

5 Controlling the combinatorics

The presence of behavioral modes has two immediate consequences affecting the algorithms: (1) there are far more behavioral modes to reason about, and (2) the concept of minimal diagnoses, which was so useful to GDE, is now virtually meaningless. For example, if there are n components, each with k behavioral modes, there are k^n candidates which might have to be considered (as opposed to GDE's 2^n). Together these consequences make Sherlock significantly slower than GDE. This potential combinatorial explosion manifests itself in two ways in Sherlock. First, the set of conflicts, as well as the sets of behavioral modes underlying possible outcomes (i.e., the ATMS labels), explodes. This causes the prediction phase of Sherlock to explode. Second, the number of possible diagnoses is exponential, causing candidate generation to explode. Thus, the Sherlock architecture adds two tactics beyond those used in GDE to keep the combinatorial explosion under control. These tactics apply to GDE as well as Sherlock.

The focussing tactics do not affect the set of diagnoses produced (or the probability ratios among them). We first present our strategy for the diagnostic objective of identifying all fault modes, and then later show how it can be modified to find the best d-partitions. The basic idea is to focus reasoning on the subset of the diagnoses (called leading diagnoses) that satisfy the following conditions:

• All leading diagnoses have higher probability than all non-leading diagnoses.

• There are no more than k1 (usually k1 = 5) leading diagnoses. The exception is that all diagnoses having probability approximately equal to the k1-th diagnosis are included (to accommodate roundoff difficulties).

• Candidates with probability less than 1/k2 (usually k2 = 100) of that of the best diagnosis are not considered.

• The diagnoses need not include more than k3 (usually k3 = .75) of the total probability mass of the candidates.

This approach focusses candidate generation to a small tractable set of leading diagnoses.
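A sketch of these criteria as a filter over already-scored diagnoses (hypothetical helper; Sherlock interleaves this test with candidate generation rather than filtering a complete list):

```python
def leading_diagnoses(scored, k1=5, k2=100, k3=0.75, eps=1e-3):
    # scored : iterable of (diagnosis, probability) pairs
    ranked = sorted(scored, key=lambda dp: dp[1], reverse=True)
    if not ranked:
        return []
    best = ranked[0][1]
    total = sum(p for _, p in ranked)
    leading, mass = [], 0.0
    for i, (d, p) in enumerate(ranked):
        beyond_k1 = i >= k1 and abs(p - ranked[k1 - 1][1]) > eps   # keep near-ties at the k1 cutoff
        if p < best / k2 or beyond_k1 or (i > 0 and mass >= k3 * total):
            break
        leading.append((d, p))
        mass += p
    return leading
```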

The primary remaining source of combinatorial explosion is the size of the ATMS labels for Sherlock's predictions. This is dealt with using a generalization of the focussing strategies outlined in [8], which are similar to some suggested in [7]. To handle this, both the ATMS and the underlying constraint propagator used by Sherlock are restricted to focus their reasoning only on the leading diagnoses or tentative leading diagnoses. No prediction is made unless its results hold (i.e., one of its environments is a subset of some focus environment) in the current focus. Furthermore, no environment is added to any ATMS label unless it holds in some current focus. If the ATMS discovers an environment not part of any current diagnosis, it does not add it to the prediction's label and instead stores it on its "blocked" label.

Unfortunately, there is a bootstrapping problem. The leading diagnoses cannot be accurately identified without sufficient minimal conflicts. The reasoning cannot produce enough minimal conflicts unless there are leading diagnoses to focus on. Another complication is that Sherlock cannot correctly evaluate the probability of a candidate via Bayes' rule unless it is in the focus.

The following is an outline of the procedure Sherlock uses to identify the leading diagnoses; it consists of a backtracking best-first search coupled with the focussing tactics just discussed. The normalization factor of Bayes' rule is left out in the search since it does not change the probability ordering of diagnoses and is the same for all candidates. A tentative diagnosis is a candidate which is consistent with the predictions (more precisely, contains no known conflict as a subset) but which has not yet been focussed upon. The search estimates the probability of a tentative diagnosis to be simply its prior probability (corrected by the normalization). This is an upper bound of its correct probability. Focussing the attention of the predictor on the tentative diagnosis might produce a conflict which eliminates it (i.e., drives its probability to zero), or it might be discovered that the diagnosis does not predict every measurement outcome (in which case its probability needs to be adjusted downwards by Bayes' rule). Using these techniques the following search is guaranteed to find the same leading diagnoses an unfocussed Sherlock would find.

1. If, according to the criteria, there are sufficient leading candidates, stop. Let b be the upper bound of the probabilities of the diagnoses which are reachable from the next place to push the best-first search forward. The key test is whether b is less than the probabilities of the leading candidates.

2. Continue a best-first search for the next highest-probability (estimated by its upper bound) candidate which accounts for all the minimal conflicts.

3. Focus the predictor on the candidate (i.e., by unblocking the ATMS labels and permitting consumer execution). This finds any conflicts. It also finds any new predictions which follow from this candidate but which haven't been discovered earlier.

4. If the candidate contains a conflict, go to step 1.

5. Compute the probability of the candidate according to Bayes' rule by multiplying its probability by (1/2)^n, where n is the number of times the candidate fails to predict some measurement outcome.

6. Go to step 1.

This search may find more than the required number of diagnoses because the corrected probability of a best next candidate may be much lower than estimated. Although such candidates are diagnoses, they are not necessarily the leading ones.
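The following is a highly simplified sketch of that control loop (hypothetical interface: the ATMS blocking and unblocking machinery is hidden inside a `focus_predictor` callback that reports whether a conflict was found and how many measurement outcomes the candidate fails to predict, and the stopping criteria of this section are hidden inside `enough`):

```python
import heapq

def search_leading_diagnoses(components, mode_priors, focus_predictor, enough):
    # Best-first search over mode assignments, ordered by prior probability,
    # which is an upper bound on a candidate's corrected probability.
    queue = [(-1.0, ())]                          # (negated probability bound, partial assignment)
    found = []
    while queue and not enough(found):
        neg_p, partial = heapq.heappop(queue)
        if len(partial) < len(components):        # extend to the next component (step 2)
            comp = components[len(partial)]
            for mode, prior in mode_priors[comp].items():
                heapq.heappush(queue, (neg_p * prior, partial + ((comp, mode),)))
            continue
        candidate = frozenset(partial)
        conflict, n_unpredicted = focus_predictor(candidate)   # steps 3 and 4
        if conflict:
            continue                              # eliminated; resume the search (step 1)
        corrected = -neg_p * (0.5 ** n_unpredicted)            # step 5 (two-valued measurements assumed)
        found.append((candidate, corrected))
    return found
```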

Thus far we presumed that it is important to discriminate between all modes and that the d-partitions are the simple diagnoses. If it is not important to discriminate among certain modes, the preceding algorithms must be modified to identify d-partitions.

To identify d-partitions efficiently requires some subtle changes to the best-first search. Whenever a diagnosis is found, all the other candidates potentially in the same partition must be identified to fill out the d-partition and accurately calculate its probability. However, this alone is insufficient to ensure that our previous algorithm finds the best d-partitions, because the probability of the best diagnosis within a d-partition is only a lower bound on the probability of the d-partition it could be part of. Therefore, we must modify the search such that each diagnosis is scored (only for the purposes of the search) by an upper bound of the probability of the d-partition it is part of. For each component, the 'probability' score assigned to each mode is the sum of the prior probabilities of that mode and all modes later in the mode ordering among which Sherlock is not required to discriminate. As a result of this ordering, the 'probability' of a diagnosis is an upper bound of that of the d-partition of which it is a member. Once one diagnosis of a d-partition is found, the remaining members of the d-partition are filled out and its probability correctly computed. (Sherlock incorporates heuristics that avoid filling out the d-partition with extremely low probability diagnoses.) As a result, Sherlock finds the leading d-partitions meeting the criteria for simple diagnoses just laid out.

6 A simple example

To demonstrate the basic ideas of Sherlock's operation with fault modes, consider the three-inverter example. We set the focussing thresholds as indicated earlier and the diagnostic objective to identify the mode of every component. However, as this example is tractable without using any focussing heuristics, we also show the correct values (i.e., having computed all the diagnoses) in parentheses. Suppose every inverter is modeled as described earlier, with A and B tending to fail stuck-at-1 and C tending to fail stuck-at-0:

             A                  B                  C
  p(G(A))  = .99     p(G(B))  = .99     p(G(C))  = .99
  p(S1(A)) = .008    p(S1(B)) = .008    p(S1(C)) = .001
  p(S0(A)) = .001    p(S0(B)) = .001    p(S0(C)) = .008
  p(U(A))  = .001    p(U(B))  = .001    p(U(C))  = .001

With no observations Sherlock finds the single leading diagnosis (all diagnosis probabilities are normalized):

p([G(A), G(B), G(C)]) = 1 (.970).

The unfocussed Sherlock finds 4³ = 64 diagnoses (as there are no symptoms, anything could be happening). Given a zero input (I = 0), Sherlock computes the following outcomes and their supporting environments (the first column of environments is what the focussed Sherlock finds; the final column indicates the additional environments an unfocussed Sherlock discovers):

  X = 0:                          {S0(A)}
  X = 1:  {G(A)}                  {S1(A)}
  Y = 0:  {G(A), G(B)}            {S1(A), G(B)}  {S0(B)}
  Y = 1:                          {S0(A), G(B)}  {S1(B)}
  O = 0:                          {S0(A), G(B), G(C)}  {S1(B), G(C)}  {S0(C)}
  O = 1:  {G(A), G(B), G(C)}      {S1(A), G(B), G(C)}  {S0(B), G(C)}  {S1(C)}

The first line states that X = 0 under assumption S0(A), or equivalently that X = 0 in every candidate which includes S0(A). However, the focussed Sherlock finds no label for X = 0 as it does not hold in the single leading diagnosis. Intuitively, the last line states that the output is one if either (1) all the components are good (which is the leading diagnosis), (2) the first inverter is stuck-at-1 and the other two are good, (3) the second inverter is stuck-at-0 and the final inverter is good, or (4) the last inverter is stuck-at-1.

If we apply the minimum entropy technique we find costs ($ denotes the cost function, and the focussed Sherlock cost is shown first, followed by the correct cost in parentheses):

$(X) = 1 (.99),  $(Y) = 1 (.95),  $(O) = 1 (.91).

All these costs are high because there is no evidence that a fault necessarily exists. The costs are all equal for the focussed Sherlock because there is only one leading diagnosis and therefore nothing can be learned by further measurement.

Suppose O is measured to be zero. There are four minimal conflicts (because each of the minimal environments supporting O = 1 is now a conflict):

{G(A), G(B), G(C)}   {S1(A), G(B), G(C)}
{S0(B), G(C)}        {S1(C)}

Sherlock notices the leading candidate is eliminated and continues the best-first search to find the following leading diagnoses:

p([G(A), G(B), S0(C)]) = 0.432 (.426)
p([G(A), S1(B), G(C)]) = 0.432 (.426)
p([S0(A), G(B), G(C)]) = 0.054 (.053)
p([G(A), G(B), U(C)])  = 0.027 (.027)
p([U(A), G(B), G(C)])  = 0.027 (.027)
p([G(A), U(B), G(C)])  = 0.027 (.027)
Total = .999 (.986)

The next highest probability diagnosis is:

p([S1(A), G(B), S0(C)]) = .003 (.003).

The figures in parentheses indicate the correct probabilities (initially there were 64 diagnoses, now 42 diagnoses remain), and Sherlock has identified the leading ones.
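Because the example has only 4³ = 64 candidates, the correct-column figures above can be reproduced (up to rounding) by brute force. The following hypothetical sketch simulates every candidate under I = 0 and conditions on the observation O = 0 exactly as prescribed in Section 4:

```python
from itertools import product

PRIORS = {
    "A": {"G": 0.99, "S1": 0.008, "S0": 0.001, "U": 0.001},
    "B": {"G": 0.99, "S1": 0.008, "S0": 0.001, "U": 0.001},
    "C": {"G": 0.99, "S1": 0.001, "S0": 0.008, "U": 0.001},
}

def behave(mode, inp):
    # Predicted inverter output under a mode, or None if the mode makes no prediction.
    if mode == "G":
        return None if inp is None else 1 - inp
    if mode == "S1":
        return 1
    if mode == "S0":
        return 0
    return None                                        # U: unknown mode

def output(modes, i=0):
    # Propagate the input through the series A -> B -> C.
    x = behave(modes[0], i)
    y = behave(modes[1], x)
    return behave(modes[2], y)

posterior = {}
for modes in product("G S1 S0 U".split(), repeat=3):   # all 64 candidates
    prior = PRIORS["A"][modes[0]] * PRIORS["B"][modes[1]] * PRIORS["C"][modes[2]]
    o = output(modes)
    likelihood = 0.5 if o is None else (1.0 if o == 0 else 0.0)   # O measured to be 0
    if prior * likelihood > 0.0:
        posterior[modes] = prior * likelihood          # 42 consistent candidates remain

norm = sum(posterior.values())
for modes, p in sorted(posterior.items(), key=lambda mp: -mp[1])[:7]:
    print(modes, round(p / norm, 3))    # .426, .426, .053, .027, .027, .027, .003
```

Dividing the six largest of these values by their joint mass (.986) recovers the focussed figures (.432, .432, .054, ...), as noted in the "Third" point below.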

There are four important things to notice about this list of leading diagnoses. First, even though the Sherlock algorithm is running with k1 = 5, it finds 6 candidates because it expands its search slightly to make sure candidates are not eliminated simply by round-off errors. Second, the top 6 of the 42 candidates contain 98.6% of the probability mass. Third, the heuristic estimates are quite accurate; they are equal to the correct values normalized by .986. Fourth, although diagnosis [S0(A), G(B), G(C)] has the same prior probability as the three candidates [G(A), G(B), U(C)], [U(A), G(B), G(C)] and [G(A), U(B), G(C)], after the two measurements its probability is twice that of the others. This is because the three candidates predict no value of the output and hence their posterior probability is reduced by one-half by Bayes' rule.



Given the leading diagnoses, the resultant probabilities of behavior modes are:

            A               B               C
  p(G)   .918 (.911)     .541 (.538)     .541 (.538)
  p(S1)  0    (.007)     .432 (.434)     0
  p(S0)  .054 (.054)     0    (.0005)    .432 (.435)
  p(U)   .027 (.028)     .027 (.027)     .027 (.027)

The table indicates that the major failure modes to consider are C stuck-at-0 and B stuck-at-1, and that all other faults are unlikely.

The resulting ATMS labels are (for this simple example, focussing no longer has any effect on labels):

  X = 0:  {G(B), G(C)}  {S0(A)}
  X = 1:  {G(A)}  {S1(A)}
  Y = 0:  {G(A), G(B)}  {G(B), S1(A)}  {S0(B)}
  Y = 1:  {G(C)}  {G(B), S0(A)}  {S1(B)}
  O = 1:  {S1(A), G(B), G(C)}  {S0(B), G(C)}  {S1(C)}

Suppose that we applied a second test vector with I = 1 (the first test vector was I = 0), and evaluated the hypothetical measurements (the subscript indicates the test vector under which the measurement would be made):

$(X_1) = .72 (.72),  $(X_2) = .94 (.92),  $(Y_1) = .31 (.31),  $(Y_2) = .91 (.90),  $(O_2) = .89 (.89).

Thus we see that measuring Y under the first test vector (I = 0) is the best measurement. This is because measuring Y will differentiate between the two high probability candidates. However, measuring O under the second test vector (I = 1) is useful as well. Suppose O = 0 under the second test vector. The resulting probabilities are:

p([G(A), G(B), S0(C)]) = 0.450 (.444)
p([G(A), S1(B), G(C)]) = 0.450 (.444)
p([S0(A), G(B), G(C)]) = 0.056 (.056)
p([G(A), G(B), U(C)])  = 0.014 (.014)
p([U(A), G(B), G(C)])  = 0.014 (.014)
p([G(A), U(B), G(C)])  = 0.014 (.014)

Although measuring O = 0 again does not eliminate any diagnosis, it provides further evidence that a component is not behaving in some unknown mode, thus slightly raising the probabilities of the first three diagnoses.

7 Acknowledgments

We appreciate discussion and comments from Daniel G. Bobrow, Brian Falkenhainer, Adam Farquhar, Alan Mackworth, Sanjay Mittal, Olivier Raiman, Mark Shirley and Peter Struss on the topics of this paper.

References

[1] Brown, J.S., Burton, R.R. and de Kleer, J., Pedagogical, natural language and knowledge engineering techniques in SOPHIE I, II and III, in: D. Sleeman and J.S. Brown (Eds.), Intelligent Tutoring Systems (Academic Press, New York, 1982) 227–282. An expansion of the relevant sections of this paper appears in these readings under the title, Model-based diagnosis in SOPHIE III.

[2] Davis, R., Diagnostic reasoning based on structure and behavior, Artificial Intelligence 24 (1984) 347–410. Appears in these readings.

[3] Davis, R. and Hamscher, W., Model-based reasoning: Troubleshooting, in: H.E. Shrobe and the American Association for Artificial Intelligence (Eds.), Exploring Artificial Intelligence (Morgan Kaufmann, 1988) 297–346. Appears in these readings.

[4] de Kleer, J., Local methods of localizing faults in electronic circuits, Artificial Intelligence Laboratory, AIM-394, Cambridge: M.I.T., 1976. An expansion of the relevant sections of this paper appears in these readings under the title, Model-based diagnosis in SOPHIE III.

[5] de Kleer, J., An assumption-based truth maintenance system, Artificial Intelligence 28 (1986) 127–162. Also in Readings in NonMonotonic Reasoning, edited by Matthew L. Ginsberg (Morgan Kaufmann, 1987), 280–297.

[6] de Kleer, J. and Williams, B.C., Diagnosing multiple faults, Artificial Intelligence 32 (1987) 97–130. Also in Readings in NonMonotonic Reasoning, edited by Matthew L. Ginsberg (Morgan Kaufmann, 1987), 372–388. Appears in these readings.

[7] Dressler, O. and Farquhar, A., Problem solver control over the ATMS, Siemens Technical Report INF 2 ARM-13-89, Munich, 1989.

[8] Forbus, K.D. and de Kleer, J., Focusing the ATMS, Proceedings of the National Conference on Artificial Intelligence, Saint Paul, MN (August 1988), 193–198.

[9] Genesereth, M.R., The use of design descriptions in automated diagnosis, Artificial Intelligence 24 (1984) 411–436. Appears in these readings.

[10] Hamscher, W.C., Model-based troubleshooting of digital systems, Artificial Intelligence Laboratory, TR-1074, Cambridge: M.I.T., 1988.

[11] Holtzblatt, L.J., Diagnosing multiple failures using knowledge of component states, IEEE Proceedings on AI Applications, 1988, 139–143. Appears in these readings.

[12] Raiman, O., Diagnosis as a trial: The alibi principle, IBM Scientific Center, 1989. Appears in these readings.

[13] Reiter, R., A theory of diagnosis from first principles, Artificial Intelligence 32 (1987) 57–95. Also in Readings in NonMonotonic Reasoning, edited by Matthew L. Ginsberg (Morgan Kaufmann, 1987), 352–371. Appears in these readings.

[14] Struss, P., A framework for model-based diagnosis, Siemens TR INF2 ARM-10-88. Appears in these readings.

[15] Struss, P. and Dressler, O., "Physical negation": Integrating fault models into the general diagnostic engine, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI (August 1989). Appears in these readings.