
Machine Learning 1: 47-80, 1986. © 1986 Kluwer Academic Publishers, Boston - Manufactured in The Netherlands

Explanation-Based Generalization: A Unifying View

TOM M. MITCHELL (MITCHELL @ RED.RUTGERS.EDU)

RICHARD M. KELLER (KELLER @ RED.RUTGERS.EDU)

SMADAR T. KEDAR-CABELLI (KEDAR-CABELLI @ RED.RUTGERS.EDU)

Computer Science Department, Rutgers University, New Brunswick, NJ 08903, U.S.A.

(Received August 1, 1985)

Key words: explanation-based learning, explanation-based generalization, goal regression, constraint back-propagation, operationalization, similarity-based generalization.

Abstract. The problem of formulating general concepts from specific training examples has long been a major focus of machine learning research. While most previous research has focused on empirical methods for generalizing from a large number of training examples using no domain-specific knowledge, in the past few years new methods have been developed for applying domain-specific knowledge to formulate valid generalizations from single training examples. The characteristic common to these methods is that their ability to generalize from a single example follows from their ability to explain why the training example is a member of the concept being learned. This paper proposes a general, domain-independent mechanism, called EBG, that unifies previous approaches to explanation-based generalization. The EBG method is illustrated in the context of several example problems, and used to contrast several existing systems for explanation-based generalization. The perspective on explanation-based generalization afforded by this general method is also used to identify open research problems in this area.

1. Introduction and motivation

The ability to generalize from examples is widely recognized as an essential capability of any learning system. Generalization involves observing a set of training examples of some general concept, identifying the features common to these examples, then formulating a concept definition based on these common features. The generalization process can thus be viewed as a search through a vast space of possible concept definitions, in search of a correct definition of the concept to be learned. Because this space of possible concept definitions is vast, the heart of the generalization problem lies in utilizing whatever training data, assumptions and knowledge are available to constrain this search.

Most research on the generalization problem has focused on empirical, data-intensive methods that rely on large numbers of training examples to constrain the search for the correct generalization (see Mitchell, 1982; Michalski, 1983; Dietterich, 1982 for overviews of these methods).


These methods all employ some kind of inductive bias to guide the inductive leap that they must make in order to define a concept from only a subset of its examples (Mitchell, 1980). This bias is typically built into the generalizer by providing it with knowledge only of those example features that are presumed relevant to describing the concept to be learned. Through various algorithms it is then possible to search through the restricted space of concepts definable in terms of these allowed features, to determine concept definitions consistent with the training examples. Because these methods are based on searching for features that are common to the training examples, we shall refer to them as similarity-based generalization methods.¹

In recent years, a number of researchers have proposed generalization methods that contrast sharply with these data-intensive, similarity-based methods (e.g., Borgida et al., 1985; DeJong, 1983; Kedar-Cabelli, 1985; Keller, 1983; Lebowitz, 1985; Mahadevan, 1985; Minton, 1984; Mitchell, 1983; Mitchell et al., 1985; O'Rorke, 1984; Salzberg & Atkinson, 1984; Schank, 1982; Silver, 1983; Utgoff, 1983; Winston et al., 1983). Rather than relying on many training examples and an inductive bias to constrain the search for a correct generalization, these more recent methods constrain the search by relying on knowledge of the task domain and of the concept under study. After analyzing a single training example in terms of this knowledge, these methods are able to produce a valid generalization of the example along with a deductive justification of the generalization in terms of the system's knowledge. More precisely, these explanation-based methods² analyze the training example by first constructing an explanation of how the example satisfies the definition of the concept under study. The features of the example identified by this explanation are then used as the basis for formulating the general concept definition. The justification for this concept definition follows from the explanation constructed for the training example.

Thus, by relying on knowledge of the domain and of the concept under study, explanation-based methods overcome the fundamental difficulty associated with inductive, similarity-based methods: their inability to justify the generalizations that they produce. The basic difference between the two classes of methods is that similarity-based methods must rely on some form of inductive bias to guide generalization, whereas explanation-based methods rely instead on their domain knowledge. While explanation-based methods provide a more reliable means of

¹ The term similarity-based generalization was suggested by Lebowitz (1985). We use this term to cover both methods that search for similarities among positive examples, and for differences between positive and negative examples.

² The term explanation-based generalization was first introduced by DeJong (1981) to describe his particular generalization method. The authors have previously used the term goal-directed generalization (Mitchell, 1983) to refer to their own explanation-based generalization method. In this paper, we use the term explanation-based generalization to refer to the entire class of methods that formulate generalizations by constructing explanations.


generalization, and are able to extract more information from individual training examples, they also require that the learner possess knowledge of the domain and of the concept under study. It seems clear that for a large number of generalization problems encountered by intelligent agents, this required knowledge is available to the learner. In this paper we present and analyze a number of such generalization problems.

The purpose of this paper is to consider in detail the capabilities and requirements of explanation-based approaches to generalization, and to introduce a single mechanism that unifies previously described approaches. In particular, we present a domain-independent method (called EBG) for utilizing domain-specific knowledge to guide generalization, and illustrate its use in a number of generalization tasks that have previously been approached using differing explanation-based methods. EBG constitutes a more general mechanism for explanation-based generalization than these previous approaches. Because it requires a larger number of explicit inputs (i.e., the training example, a domain theory, a definition of the concept under study, and a description of the form in which the learned concept must be expressed), EBG can be instantiated for a wider variety of learning tasks. Finally, EBG provides a perspective for identifying the present limitations of explanation-based generalization, and for identifying open research problems in this area.

The remainder of this paper is organized as follows. Section 2 introduces the general EBG method for explanation-based generalization, and illustrates the method with an example. Section 3 then illustrates the EBG method in the context of two additional examples: (1) learning a structural definition of a cup from a training example plus knowledge about the function of a cup (based on Winston et al.'s (1983) work), and (2) learning a search heuristic from an example search tree plus knowledge about search and the search space (based on Mitchell et al.'s (1983) work on the LEX system). Section 4 concludes with a general perspective on explanation-based generalization and a discussion of significant open research issues in this area. The appendix relates DeJong's (1981, 1983) research on explanation-based generalization and explanatory schema acquisition to the other work discussed here.

2. Explanation-based generalization: discussion and an example

The key insight behind explanation-based generalization is that it is possible to form a justified generalization of a single positive training example provided the learning system is endowed with some explanatory capabilities. In particular, the system must be able to explain to itself why the training example is an example of the concept under study. Thus, the generalizer is presumed to possess a definition of the concept under study as well as domain knowledge for constructing the required explanation. In this section, we define more precisely the class of generalization problems covered by explanation-based methods, define the general EBG method, and illustrate it in terms of a specific example problem.


2.1 The explanation-based generalization problem

In order to define the generalization problem considered here, we first introduce some terminology. A concept is defined as a predicate over some universe of instances, and thus characterizes some subset of the instances. Each instance in this universe is described by a collection of ground literals that represent its features and their values. A concept definition describes the necessary and sufficient conditions for being an example of the concept, while a sufficient concept definition describes sufficient conditions for being an example of the concept. An instance that satisfies the concept definition is called an example, or positive example, of that concept, whereas an instance that does not satisfy the concept definition is called a negative example of that concept. A generalization of an example is a concept definition which describes a set containing that example.³ An explanation of how an instance is an example of a concept is a proof that the example satisfies the concept definition. An explanation structure is the proof tree, modified by replacing each instantiated rule by the associated general rule.
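To make this terminology concrete, the following small sketch (ours, not part of the original paper) represents an instance as a collection of ground literals and a concept as a predicate over instances; the RED-BOX concept and the Python encoding are invented purely for illustration.

```python
# Illustrative only: an instance is a set of ground literals (predicate, args...),
# and a concept is a predicate over such instances.
instance = {('ISA', 'OBJ1', 'BOX'), ('COLOR', 'OBJ1', 'RED'), ('VOLUME', 'OBJ1', 1)}

def red_box(inst):
    """A (sufficient) concept definition: some object in the instance is a RED BOX."""
    objects = {literal[1] for literal in inst}
    return any(('ISA', o, 'BOX') in inst and ('COLOR', o, 'RED') in inst
               for o in objects)

print(red_box(instance))   # True: this instance is a positive example of RED-BOX
```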

The generic problem definition shown in Table 1 summarizes the class of generalization problems considered in this paper. Table 2 illustrates a particular instance of an explanation-based generalization problem from this class. As indicated by these tables, defining an explanation-based learning problem involves specifying four kinds of information:

The goal concept defines the concept to be acquired. For instance, in the problem defined in Table 2 the task is to learn to recognize pairs of objects <x, y> such that it is safe to stack x on top of y. Notice that the goal concept

Table 1. The explanation-based generalization problem

Given:
   Goal Concept: A concept definition describing the concept to be learned. (It is assumed that this concept definition fails to satisfy the operationality criterion.)
   Training Example: An example of the goal concept.
   Domain Theory: A set of rules and facts to be used in explaining how the training example is an example of the goal concept.
   Operationality Criterion: A predicate over concept definitions, specifying the form in which the learned concept definition must be expressed.

Determine:
   A generalization of the training example that is a sufficient concept definition for the goal concept and that satisfies the operationality criterion.

³ In fact, we use the term generalization in this paper both as a noun (to refer to a general concept definition), and as a verb (to refer to the process of deriving this generalization).


Table 2. The SAFE-TO-STACK generalization problem after Borgida et al. (1985)

Given:
   Goal Concept: Pairs of objects <x, y> such that SAFE-TO-STACK(x, y), where
      SAFE-TO-STACK(x, y) ⇔ NOT(FRAGILE(y)) ∨ LIGHTER(x, y).
   Training Example:
      ON(OBJ1, OBJ2)
      ISA(OBJ1, BOX)
      ISA(OBJ2, ENDTABLE)
      COLOR(OBJ1, RED)
      COLOR(OBJ2, BLUE)
      VOLUME(OBJ1, 1)
      DENSITY(OBJ1, 0.1)
   Domain Theory:
      VOLUME(p1, v1) ∧ DENSITY(p1, d1) → WEIGHT(p1, v1*d1)
      WEIGHT(p1, w1) ∧ WEIGHT(p2, w2) ∧ LESS(w1, w2) → LIGHTER(p1, p2)
      ISA(p1, ENDTABLE) → WEIGHT(p1, 5)   (default)
      LESS(0.1, 5)
   Operationality Criterion: The concept definition must be expressed in terms of the predicates used to describe examples (e.g., VOLUME, COLOR, DENSITY) or other selected, easily evaluated predicates from the domain theory (e.g., LESS).

Determine:
   A generalization of the training example that is a sufficient concept definition for the goal concept and that satisfies the operationality criterion.

here, SAFE-TO-STACK, is defined in terms of the predicates FRAGILE and LIGHTER, whereas the training example is defined in terms of other predicates (i.e., COLOR, DENSITY, VOLUME, etc.).

The training example is a positive example of the goal concept. For instance, the training example of Table 2 describes a pair of objects, a box and an endtable, where one is safely stacked on the other.

The domain theory includes a set of rules and facts that allow explaining how training examples are members of the goal concept. For instance, the domain theory for this problem includes definitions of FRAGILE and LIGHTER, rules for inferring features like the WEIGHT of an object from its DENSITY and VOLUME, rules that suggest default values such as the WEIGHT of an ENDTABLE, and facts such as '0.1 is LESS than 5'.

The operationality criterion defines the terms in which the output concept definition must be expressed. Our use of this term is based on Mostow's (1981) definition that a procedure is operational relative to a given agent and task, provided that the procedure can be applied by the agent to solve the task. Similarly, we assume that the learned concept definition will be used by some agent to perform some task, and must be defined in terms operational for


that agent and task. For this problem, the operationality criterion requires that the final concept definition be described in terms of the predicates used to describe the training example (e.g., COLOR, VOLUME, DENSITY) or in terms of a selected set of easily evaluated predicates from the domain theory (e.g., LESS). Reexpressing the goal concept in these terms will make it operational with respect to the task of efficiently recognizing examples of the concept.

Given these four inputs, the task is to determine a generalization of the training example that is a sufficient concept definition for the goal concept and that satisfies the operationality criterion. Note that the notion of operationality is crucial for explanation-based generalization: if the operationality criterion were not specified, the input goal concept definition could always be a correct output concept definition and there would be nothing to learn! The operationality criterion imposes a requirement that learned concept definitions must be not only correct, but also in a usable form before learning is complete. This additional requirement is based on the viewpoint that concept definitions are not learned as theoretical entities, but rather as practical entities to be used by a particular agent for a particular task.
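For concreteness, the four inputs of the SAFE-TO-STACK problem in Table 2 might be encoded as follows. This is a minimal sketch of ours (literals as tuples, variables prefixed with '?'), not the representation used by any of the systems discussed in this paper.

```python
# Illustrative encoding of the Table 2 inputs (ours, not the original systems').
goal_concept = ('SAFE-TO-STACK', '?x', '?y')        # defined via FRAGILE / LIGHTER

training_example = {                                 # ground literals describing the instance
    ('ON', 'OBJ1', 'OBJ2'), ('ISA', 'OBJ1', 'BOX'), ('ISA', 'OBJ2', 'ENDTABLE'),
    ('COLOR', 'OBJ1', 'RED'), ('COLOR', 'OBJ2', 'BLUE'),
    ('VOLUME', 'OBJ1', 1), ('DENSITY', 'OBJ1', 0.1),
}

domain_theory = [                                    # rules as (antecedents, consequent)
    ([('VOLUME', '?p1', '?v1'), ('DENSITY', '?p1', '?d1')],
     ('WEIGHT', '?p1', ('*', '?v1', '?d1'))),
    ([('WEIGHT', '?p1', '?w1'), ('WEIGHT', '?p2', '?w2'), ('LESS', '?w1', '?w2')],
     ('LIGHTER', '?p1', '?p2')),
    ([('ISA', '?p1', 'ENDTABLE')], ('WEIGHT', '?p1', 5)),   # default weight of an ENDTABLE
]

# Operationality criterion: only easily evaluated predicates may appear in the
# learned definition.
OPERATIONAL = {'ON', 'ISA', 'COLOR', 'VOLUME', 'DENSITY', 'LESS'}

def operational(literal):
    return literal[0] in OPERATIONAL

print(operational(('VOLUME', '?x', '?v1')), operational(goal_concept))   # True False
```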

2.2 The EBG method

The EBG method, which is designed to address the above class of problems, is defined as follows:

The EBG method

1. Explain: Construct an explanation in terms of the domain theory that proves how the training example satisfies the goal concept definition.

This explanation must be constructed so that each branch of the explanation structure terminates in an expression that satisfies the operationality criterion.

2. Generalize: Determine a set of sufficient conditions under which the explanation structure holds, stated in terms that satisfy the operationality criterion.

This is accomplished by regressing the goal concept through the explanation structure. The conjunction of the resulting regressed expressions constitutes the desired concept definition.

To see more concretely how the EBG method works, consider again the problem of learning the concept SAFE-TO-STACK(x, y). The bottom of Figure 1 shows a training example for this problem, described in terms of a semantic network of objects and relations. In particular, the example consists of two physical objects, OBJ1


[Figure 1. Explanation of SAFE-TO-STACK(OBJ1, OBJ2): the explanation structure (top), in which SAFE-TO-STACK(OBJ1, OBJ2) follows from LIGHTER(OBJ1, OBJ2), together with the training example (bottom).]

and OBJ2, in which OBJ1 is ON OBJ2, and for which several features of the objects are described (e.g., their OWNERs, COLORs).

Given this training example, the task is to determine which of its features are relevant to characterizing the goal concept, and which are irrelevant. To this end, the first step of the EBG method is to construct an explanation of how the training example satisfies the goal concept. Notice that the explanation constitutes a proof, and constructing such an explanation therefore may involve in general the complexities of theorem proving. The explanation for the training example depicted in the lower


portion of Figure 1 is given in the top portion of the figure. As shown there, the pair of objects <OBJ1, OBJ2> satisfies the goal concept SAFE-TO-STACK because OBJ1 is LIGHTER than OBJ2. Furthermore, this is known because the WEIGHTs of OBJ1 and OBJ2 can be inferred. For OBJ1, the WEIGHT is inferred from its DENSITY and VOLUME, whereas for OBJ2 the WEIGHT is inferred based on a rule that specifies the default weight of ENDTABLEs in general.

Through this chain of inferences, the explanation structure demonstrates how OBJ1 and OBJ2 satisfy the goal concept definition. Note that the explanation structure has been constructed so that each of its branches terminates in an expression that satisfies the operationality criterion (e.g., VOLUME(OBJ1, 1), LESS(0.1, 5)). In this way, the explanation structure singles out those features of the training example that are relevant to satisfying the goal concept, and that provide the basis for constructing a justified generalization of the training example. For the current example, these relevant training example features are shown shaded in the figure, and correspond to the conjunction VOLUME(OBJ1, 1) ∧ DENSITY(OBJ1, 0.1) ∧ ISA(OBJ2, ENDTABLE).
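Viewed as data, the explanation of Figure 1 is simply a proof tree whose leaves satisfy the operationality criterion. The sketch below (ours; the rule names are informal labels for the Table 2 rules) records that tree as nested tuples and collects its leaves, which are exactly the expressions just listed.

```python
# Each node is (conclusion, rule-or-'fact', children); leaves are operational expressions.
explanation = (
    ('SAFE-TO-STACK', 'OBJ1', 'OBJ2'), 'goal concept definition (LIGHTER disjunct)', [
        (('LIGHTER', 'OBJ1', 'OBJ2'), 'LIGHTER rule', [
            (('WEIGHT', 'OBJ1', 0.1), 'WEIGHT from VOLUME and DENSITY', [
                (('VOLUME', 'OBJ1', 1),    'fact', []),
                (('DENSITY', 'OBJ1', 0.1), 'fact', []),
            ]),
            (('WEIGHT', 'OBJ2', 5), 'default WEIGHT of an ENDTABLE', [
                (('ISA', 'OBJ2', 'ENDTABLE'), 'fact', []),
            ]),
            (('LESS', 0.1, 5), 'fact', []),
        ]),
    ])

def leaves(node):
    conclusion, _rule, children = node
    return [conclusion] if not children else [l for c in children for l in leaves(c)]

print(leaves(explanation))
# [('VOLUME', 'OBJ1', 1), ('DENSITY', 'OBJ1', 0.1),
#  ('ISA', 'OBJ2', 'ENDTABLE'), ('LESS', 0.1, 5)]
```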

Whereas the first step of the EBG method isolates the relevant features of the training example, it does not determine the desired generalized constraints on feature values. For instance, while the feature VOLUME(OBJ1, 1) is relevant to explaining how the present training example satisfies the goal concept, the general constraint on acceptable values for VOLUME is yet to be determined. The second step of the EBG method therefore generalizes on those feature values selected by the first step, by determining sufficient conditions on these feature values that allow each step in the explanation structure to carry through.

In order to determine general sufficient conditions under which the explanation holds, the second step of the EBG method involves regressing (back propagating) the goal concept step by step back through the explanation structure. In general, regressing a given formula F through a rule R is a mechanism for determining the necessary and sufficient (weakest) conditions under which that rule R can be used to infer F. We employ a slightly modified version of the goal-regression algorithm described by Waldinger (1977) and Nilsson (1980).⁴ Our modified goal regression algorithm computes an expression that represents only a sufficient condition (rather than necessary and sufficient conditions) under which rule R can be used to infer formula F, but that corresponds closely to the training example under consideration. In particular, whereas the general goal regression algorithm considers all possible variable bindings (unifications) under which R can infer F, our modified algorithm considers only the specific variable bindings used in the explanation of the training example. Furthermore, if the rule R contains a disjunctive antecedent (left-hand side), then our algorithm considers only the particular disjuncts satisfied by the training example.

⁴ Dijkstra (1976) introduces the related notion of weakest preconditions in the context of proving program correctness. The weakest preconditions of a program characterize the set of all initial states of that program such that activation guarantees a final state satisfying some postcondition.


[Figure 2. Generalizing from the explanation of SAFE-TO-STACK(OBJ1, OBJ2). (Underlined expressions are the results of regressing the goal concept.) The goal concept SAFE-TO-STACK(x, y) is regressed to WEIGHT(x, w1) ∧ LESS(w1, w2) ∧ WEIGHT(y, w2), and then through the rules R3: WEIGHT(p1, v1*d1) and D4: WEIGHT(p2, 5), with substitutions {x/p1, v1*d1/w1} and {y/p2, 5/w2}, to VOLUME(x, v1) ∧ DENSITY(x, d1) ∧ LESS(v1*d1, 5) ∧ ISA(y, ENDTABLE).]

Figure 2 illustrates the second step of the EBG method in the context of the SAFE-TO-STACK example.

In the first (topmost) step of this figure, the goal concept expression SAFE-TO-STACK(x, y) is regressed through the rule LIGHTER(p1, p2) → SAFE-TO-STACK(p1, p2),⁵ to determine that LIGHTER(x, y) is a sufficient condition for inferring SAFE-TO-STACK(x, y). Similarly, regressing LIGHTER(x, y) through the next step in the explanation structure yields the expression WEIGHT(x, w1) ∧ WEIGHT(y, w2) ∧ LESS(w1, w2). This expression is in turn regressed through the final steps of the explanation structure to yield the operational definition for SAFE-TO-STACK(x, y).

To illustrate the goal regression process in greater detail, consider the final step of Figure 2 in which the expression WEIGHT(x, w1) ∧ WEIGHT(y, w2) ∧ LESS(w1, w2) is regressed through the final steps of the explanation structure. Each conjunct of the expression is regressed separately through the appropriate rule, in the following way. The conjunct is unified (matched) with the consequent (right-hand side) of

⁵ Notice that the definition of SAFE-TO-STACK given in Table 2 is a disjunctive definition. As noted above, the procedure considers only the disjunct that is satisfied by the current training example (e.g., the disjunct involving the LIGHTER predicate).


the rule to yield some set of substitutions (particular variable bindings). The substitution consistent with the example is then applied to the antecedent (left-hand side) of the rule to yield the resulting regressed expression.⁶ Any conjuncts of the original expression which cannot be unified with the consequent of any rule are simply added to the resulting regressed expression (with the substitutions applied to them). As illustrated in the figure, regressing the conjunct WEIGHT(x, w1) through the rule VOLUME(p1, v1) ∧ DENSITY(p1, d1) → WEIGHT(p1, v1*d1) therefore yields VOLUME(x, v1) ∧ DENSITY(x, d1). Regressing the conjunct WEIGHT(y, w2) through the rule ISA(p2, ENDTABLE) → WEIGHT(p2, 5) yields ISA(y, ENDTABLE). Finally, since no rule consequent can be unified with the conjunct LESS(w1, w2), this conjunct is simply added to the resulting regressed expression after applying the substitutions produced by regressing the other conjuncts. In this case these substitutions are {x/p1, v1*d1/w1, y/p2, 5/w2}, which yield the third conjunct LESS(v1*d1, 5). The final, operational definition for SAFE-TO-STACK(x, y) is therefore:

VOLUME(x, v1) ∧ DENSITY(x, d1) ∧ LESS(v1*d1, 5) ∧ ISA(y, ENDTABLE) → SAFE-TO-STACK(x, y)

This expression characterizes in operational terms the features of the training example that are sufficient for the explanation structure to carry through in general. As such, it represents a justified generalization of the training example, for which the explanation structure serves as a justification.
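The final regression step just described can be reproduced by the following small sketch (ours, not the authors' implementation). Each WEIGHT conjunct is replaced by the antecedents of the rule whose consequent it matched, under the particular substitutions that arose in the explanation; the LESS conjunct, which matches no rule consequent, is carried along with those substitutions applied.

```python
def substitute(term, s):
    """Apply substitution s to a (possibly nested) tuple term."""
    if isinstance(term, tuple):
        return tuple(substitute(t, s) for t in term)
    return s.get(term, term)

# Rules used at this level of the explanation, written (antecedents, consequent).
R3      = ([('VOLUME', '?p1', '?v1'), ('DENSITY', '?p1', '?d1')],
           ('WEIGHT', '?p1', ('*', '?v1', '?d1')))
DEFAULT = ([('ISA', '?p2', 'ENDTABLE')], ('WEIGHT', '?p2', '5'))

# Expression obtained one level up in the explanation structure.
expression = [('WEIGHT', '?x', '?w1'), ('WEIGHT', '?y', '?w2'), ('LESS', '?w1', '?w2')]

# Substitutions under which each rule consequent matches its conjunct, restricted
# to the bindings actually used in the explanation of the training example.
bindings = {'?p1': '?x', '?w1': ('*', '?v1', '?d1'),   # from matching R3
            '?p2': '?y', '?w2': '5'}                   # from matching the default rule

regressed = []
for conjunct in expression:
    if conjunct[0] == 'WEIGHT':                          # matched a rule consequent:
        rule = R3 if conjunct[1] == '?x' else DEFAULT    # replace it by the rule's
        regressed += [substitute(a, bindings) for a in rule[0]]   # antecedents
    else:                                                # no rule applies: keep it,
        regressed.append(substitute(conjunct, bindings)) # with substitutions applied

print(regressed)
# [('VOLUME', '?x', '?v1'), ('DENSITY', '?x', '?d1'),
#  ('ISA', '?y', 'ENDTABLE'), ('LESS', ('*', '?v1', '?d1'), '5')]
```

The resulting conjunction is exactly the operational definition of SAFE-TO-STACK(x, y) given above.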

2.3 Discussion

Several general points regarding the EBG method are illustrated in the above example. The main point of the above example is that the EBG method produces a justified generalization from a single training example in a two-step process. The first step creates an explanation that separates the relevant feature values in the example from the irrelevant ones. The second step analyzes this explanation to determine the particular constraints on these feature values that are sufficient for the explanation structure to apply in general. Thus, explanation-based methods such as EBG overcome the main limitation of similarity-based methods: their inability to produce justified generalizations. This is accomplished by assuming that the learner has available knowledge of the domain, the goal concept, and the operationality

⁶ It is correctly observed in DeJong (1986) that the substitution list used to regress expressions through previous steps in the explanation must be applied to the current expression before the next regression step.


criterion, whereas similarity-based generalization does not rely on any of these inputs.

A second point illustrated by the above example is that the language in which the final concept definition is stated can be quite rich. Notice in the above example that the final generalization includes a constraint that the product of the DENSITY and VOLUME of x must be less than 5. There are very many such relations among the parts of the training example that might be considered during generalization (e.g., why not consider the fact that the OWNERs of the two objects are of different SEX?). The interesting point here is that the appropriate constraint was derived directly by analyzing the explanation, without considering the universe of possible relations among parts of the training example. This is in marked contrast with similarity-based generalization methods (e.g., Michalski, 1983; Quinlan, 1985). Such methods are typically based on a heuristic focusing criterion, such as the heuristic that 'less complex features are preferred over more complex features for characterizing concepts'. Therefore, before such methods will consider the feature LESS(v1*d1, 5) as a plausible basis for generalization, they must first consider vast numbers of syntactically simpler, irrelevant features.

A final point illustrated by this example is that the final concept definition produced by EBG is typically a specialization of the goal concept rather than a direct reexpression of the concept. This is largely due to the fact that the explanation structure is created for the given training example, and does not explain every possible example of the goal concept. Thus, the generalization produced from analyzing this explanation will only cover examples for which the explanation holds. Furthermore, because the modified goal regression algorithm computes only sufficient (not necessary and sufficient) conditions under which the explanation holds, it leads to a further specialization of the concept. This limitation of explanation-based generalization suggests an interesting problem for further research: developing explanation-based methods that can utilize multiple training examples (see the discussion in Section 4).

3. Other examples and variations

This section discusses two additional examples of explanation-based generalization that have previously been reported in the literature. The first is Winston, Binford, Katz, and Lowry's (1983) research on learning structural definitions of concepts such as 'cup' from their functional definitions. The second is Mitchell, Keller, and Utgoff's (1983) research on learning search heuristics from examples (see also Utgoff, 1984; Keller, 1983). A common perspective on these two systems is provided by instantiating the EBG method for both problems. Differences among the two original approaches and the EBG method are also considered, in order to underscore some subtleties of the EBG method, and to suggest some possible variations.


3.1 An example: learning the concept CUP

In this subsection we first summarize the application of the EBG method to a second example of an explanation-based generalization problem: the CUP generalization problem, patterned after the work of Winston et al. (1983). We then discuss the relationship between the EBG method and the ANALOGY program of Winston et al., which addresses this same problem.

The CUP generalization problem, defined in Table 3, involves learning a structural definition of a cup from its functional definition. In particular, the goal concept here is the concept CUP, defined as the class of objects that are OPEN-VESSELs, LIFTABLE, and STABLE. The domain theory includes rules that relate these properties to the more primitive structural properties of physical objects, such as FLAT, HANDLE, etc. The operationality criterion requires that the output concept definition be useful for the task of visually recognizing examples of CUPs. It thus requires that the output concept definition be expressed in terms of its structural features.

Figure 3 illustrates a training example describing a particular cup, OBJ1, along with the explanation that shows how OBJ1 satisfies the goal concept CUP. In particular, the explanation indicates how OBJ1 is LIFTABLE, STABLE, and an OPEN-VESSEL. As in the SAFE-TO-STACK example, this explanation distinguishes the relevant features of the training example (e.g., its LIGHTness) from irrelevant features (e.g., its COLOR). The second step of the EBG method, regressing the goal concept through the explanation structure, results in the following general definition of the CUP concept:

Table 3. The CUP generalization problem after Winston et al. (1983)

Given:
   Goal Concept: Class of objects, x, such that CUP(x), where
      CUP(x) ⇔ LIFTABLE(x) ∧ STABLE(x) ∧ OPEN-VESSEL(x)
   Training Example:
      OWNER(OBJ1, EDGAR)
      PART-OF(OBJ1, CONCAVITY-1)
      IS(OBJ1, LIGHT)
   Domain Theory:
      IS(x, LIGHT) ∧ PART-OF(x, y) ∧ ISA(y, HANDLE) → LIFTABLE(x)
      PART-OF(x, y) ∧ ISA(y, BOTTOM) ∧ IS(y, FLAT) → STABLE(x)
      PART-OF(x, y) ∧ ISA(y, CONCAVITY) ∧ IS(y, UPWARD-POINTING) → OPEN-VESSEL(x)
   Operationality Criterion: Concept definition must be expressed in terms of structural features used in describing examples (e.g., LIGHT, HANDLE, FLAT, etc.).

Determine:
   A generalization of the training example that is a sufficient concept definition for the goal concept and that satisfies the operationality criterion.

(PART-OF(x, xc) ∧ ISA(xc, CONCAVITY) ∧ IS(xc, UPWARD-POINTING)
∧ PART-OF(x, xb) ∧ ISA(xb, BOTTOM) ∧ IS(xb, FLAT)
∧ PART-OF(x, xh) ∧ ISA(xh, HANDLE) ∧ IS(x, LIGHT)) → CUP(x)
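As a small illustration of what the operationality criterion buys here, the following sketch (ours) evaluates this structural definition directly against a set of facts describing the training cup. The facts about the cup's bottom (named BOTTOM-1 here) are assumed for illustration, since the excerpt of Table 3 reproduced above lists only part of the training example.

```python
facts = {
    ('PART-OF', 'OBJ1', 'CONCAVITY-1'), ('ISA', 'CONCAVITY-1', 'CONCAVITY'),
    ('IS', 'CONCAVITY-1', 'UPWARD-POINTING'),
    ('PART-OF', 'OBJ1', 'BOTTOM-1'), ('ISA', 'BOTTOM-1', 'BOTTOM'),   # assumed facts,
    ('IS', 'BOTTOM-1', 'FLAT'),                                       # not in the excerpt
    ('PART-OF', 'OBJ1', 'HANDLE-1'), ('ISA', 'HANDLE-1', 'HANDLE'),
    ('IS', 'OBJ1', 'LIGHT'), ('OWNER', 'OBJ1', 'EDGAR'),
}

def cup(x, facts):
    """The learned structural (operational) definition of CUP, applied to object x."""
    parts = [p for (rel, o, p) in facts if rel == 'PART-OF' and o == x]
    def part_with(isa, prop):
        return any(('ISA', p, isa) in facts and ('IS', p, prop) in facts for p in parts)
    return (part_with('CONCAVITY', 'UPWARD-POINTING')
            and part_with('BOTTOM', 'FLAT')
            and any(('ISA', p, 'HANDLE') in facts for p in parts)
            and ('IS', x, 'LIGHT') in facts)

print(cup('OBJ1', facts))   # True: the training cup is recognized structurally
```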


[Figure 3. Explanation of CUP(OBJ1): the explanation structure shows CUP(OBJ1) following from OPEN-VESSEL(OBJ1) (via PART-OF(OBJ1, CONCAVITY-1), ISA(CONCAVITY-1, CONCAVITY), IS(CONCAVITY-1, UPWARD-POINTING)), LIFTABLE(OBJ1) (via IS(OBJ1, LIGHT), PART-OF(OBJ1, HANDLE-1), ISA(HANDLE-1, HANDLE)), and STABLE(OBJ1), with the training example shown below.]


3.1.1 Discussion

As in the first example, the EBG method is able to produce a valid generalization from a single training example, by explaining and analyzing how the training example satisfies the definition of the goal concept. Notice that the regression step in this example is quite straightforward, and leads to generalizing the training example features effectively by replacing constants with variables (i.e., replacing OBJ1 by x). It is interesting that several earlier attempts at explanation-based generalization (e.g., Mitchell, 1983) involved the assumption that the explanation could always be generalized simply by replacing constants by variables in the explanation, without the need to regress the goal concept through the explanation structure.⁷ As the earlier SAFE-TO-STACK example illustrates, this is not the case. In general, one must regress the goal concept through the explanation structure to ensure a valid generalization of the training example (which may involve composing terms from different parts of the explanation, requiring constants where no generalization is possible, and so on).

Several interesting features of this example come to light when the EBG method is compared to the method used in Winston et al.'s (1983) ANALOGY program, upon which this example problem is based. The most striking difference between the two methods is that although ANALOGY does construct an explanation, and also uses this explanation to generalize from a single example, the system has no domain theory of the kind used in the above example. Instead, Winston's program constructs its explanations by drawing analogies between the training example and a library of precedent cases (e.g., annotated descriptions of example suitcases, bricks, bowls, etc.). For example, ANALOGY explains that the FLAT BOTTOM of OBJ1 allows OBJ1 to be STABLE, by drawing an analogy to a stored description of a brick which has been annotated with the assertion that its FLAT BOTTOM 'causes' it to be STABLE. Similarly, it relies on an annotated example of a suitcase to explain by analogy why a handle allows OBJ1 to be LIFTABLE.

In general, the precedents used by ANALOGY are assumed to be annotated by links that indicate which features of the precedent account for which of its properties. Thus, for the ANALOGY program, the causal links distributed over the library of precedents constitute its domain theory. However, this theory is qualitatively different than the domain theory used in the CUP example above: it is described by extension rather than by intension (i.e., by examples rather than by general rules), and is therefore a weaker domain theory. Because ANALOGY's knowledge about causality is summarized by a collection of instances of causal relations rather than by general rules of causality, its theory cannot lead deductively to assertions about new causal links.

⁷ This observation is due in part to Sridhar Mahadevan.


Because its domain theory is weak, the ANALOGY system raises some interesting questions about explanation-based generalization. Whereas the SAFE-TO-STACK and CUP examples above show how a sufficient set of domain theory rules can provide powerful guidance for generalization, ANALOGY suggests how a weaker, extensional theory might be used to focus the generalization process in a weaker fashion. In particular, the causal links are used by ANALOGY to construct a plausible explanation, but not a proof, that the training example satisfies the goal concept definition. As discussed above, such plausible explanations can guide generalization by focusing on plausibly relevant features of the training example. But since ANALOGY lacks general inference rules to characterize the links in this explanation, it cannot perform the second (goal regression) step in the EBG method, and therefore has no valid basis for generalizing the explanation. In fact, the ANALOGY program generalizes anyway, implicitly assuming that each causal link of the form (feature(OBJ1) → property(OBJ1)) is supported by a general rule of the form ((∀x) feature(x) → property(x)).⁸ Thus, ANALOGY represents an important step in considering the use of a weak domain theory to guide generalization, and helps to illuminate a number of open research issues (see the discussion in Section 4).

3.2 An example: learning a search heuristic

This section presents a third example of an explanation-based generalization problem - this one involving the learning of search heuristics. This example is based on the problem addressed by the LEX program (Mitchell et al., 1983), which learns search control heuristics for solving problems in the integral calculus. In particular, LEX begins with a set of legal operators (transformations) for solving integrals (e.g., integration by parts, moving constants outside the integrand). For each such operator, the system learns a heuristic that summarizes the class of integrals (problem states) for which it is useful to apply that operator. For example, one typical heuristic learned by the LEX system is:

IF the integral is of the form ∫ <polynomial-fn> · <trigonometric-fn> dx,
THEN apply Integration-by-Parts

Thus, for each of its given operators, LEX faces a generalization problem: learning the class of integrals for which that operator is useful in reaching a solution.

Table 4 defines the generalization problem that corresponds to learning when it is useful to apply OP3 (moving constants outside the integrand). Here the goal concept USEFUL-OP3(x) describes the class of problem states (integrals) for which OP3 will

⁸ Winston's (1985) own work on building rule censors can be viewed as an attempt to address difficulties that arise from this implicit assumption.


Table 4. The search heuristic generalization problem after Mitchell et al. (1983)

Given:
   Goal Concept: The class of integral expressions that can be solved by first applying operator OP3 (removing a constant from the integrand); that is, the class of integrals, x, such that USEFUL-OP3(x), where
      USEFUL-OP3(x) ⇔ NOT(SOLVED(x)) ∧ SOLVABLE(OP3(x))
   and
      OP3: ∫ r·<any-fn> dx → r·∫ <any-fn> dx
   Training Example: ∫ 7x² dx
   Domain Theory:
      SOLVABLE(x) ⇔ (∃op) (SOLVED(op(x)) ∨ SOLVABLE(op(x)))
      MATCHES(x, '∫ <any-fn> dx') → NOT(SOLVED(x))
      MATCHES(x, '<any-fn>') → SOLVED(x)
      MATCHES(op(x), y) ⇔ MATCHES(x, REGRESS(y, op))
   Operationality Criterion: Concept definition must be expressed in a form that uses easily-computable features of the problem state, x (e.g., features such as <polynomial-fn>, <transcendental-fn>, <any-fn>, r, k).

Determine:
   A generalization of the training example that is a sufficient concept definition of the goal concept and that satisfies the operationality criterion.

be useful. This is defined to be the class of problem states that are NOT already SOLVED (i.e., algebraic expressions that contain an integral sign), and for which applying OP3 leads to a SOLVABLE problem state. The domain theory in this case contains rules that relate SOLVED and SOLVABLE to observable features of problem states (e.g., one rule states that if problem state x MATCHES the expression ∫ <any-fn> dx, then x is NOT a SOLVED state). Notice that some of the domain theory rules constitute knowledge about search and problem solving in general (e.g., the definition of SOLVABLE), while other rules represent knowledge specific to integral calculus (e.g., that the absence of an integral sign denotes a SOLVED state). The operationality condition in this problem requires that the final concept definition be stated in terms of easily-computable features of the given problem state. This requirement assures that the final concept definition will be in a form that permits its effective use as a search control heuristic.⁹ For the LEX program, the set of easily-computable features is described by a well-defined generalization language over problem states that includes features (e.g., <trigonometric-fn>, <real-constant>) which LEX can efficiently recognize using its MATCHES predicate.

⁹ If the concept definition were permitted to include features that are difficult to compute (e.g., SOLVABLE), then the resulting heuristic would be so expensive to evaluate that its use would degrade, rather than improve, the problem solver's performance.


[Figure 4. Explanation of USEFUL-OP3(∫ 7x² dx): the explanation structure (top) and the training example, with the other problem states on its solution path (bottom).]

A training example for the goal concept USEFUL-OP3 is shown in the bottom portion of Figure 4. In particular, the problem state ∫ 7x² dx constitutes a training example of a problem state for which application of OP3 is useful. The training example described in the figure is shown along with the other problem states involved in its solution.

The explanation of USEFUL-OP3(∫ 7x² dx) is shown in the top portion of Figure 4. The left-hand branch of this explanation structure leads to a node which asserts that USEFUL-OP3 is satisfied in part because the training example state is not already a SOLVED state. The right-hand branch of the explanation structure explains that applying OP3 to the example state leads to a SOLVABLE problem state.


This is in turn explained by indicating that applying OP9¹⁰ to the resulting state produces a SOLVED problem, as evidenced by the fact that the resulting state MATCHES the expression '<any-fn>' (i.e., that it contains no integral sign). Thus, up to this point (marked as (a) in the figure), each step in the right-hand branch of the explanation structure corresponds to some step along the solution path of the training example.
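To make this solution path concrete, the following sketch (ours; it uses the sympy library purely for illustration and is not how LEX was implemented) applies OP3 and then OP9 to the training example and checks the SOLVED criterion of Table 4 at each state.

```python
import sympy as sp

x = sp.Symbol('x')

def op3(state):
    """OP3: ∫ r*<any-fn> dx  ->  r * ∫ <any-fn> dx (move a constant outside)."""
    r, rest = state.function.as_coeff_Mul()     # split the integrand's constant factor
    return r * sp.Integral(rest, x)

def op9(expr):
    """OP9: ∫ x**r dx -> x**(r+1)/(r+1), applied to the integral inside expr (r != -1)."""
    integral = next(iter(expr.atoms(sp.Integral)))
    r = integral.function.as_base_exp()[1]
    return expr.subs(integral, x**(r + 1) / (r + 1))

def solved(expr):
    """A state is SOLVED when it matches '<any-fn>', i.e. contains no integral sign."""
    return not expr.atoms(sp.Integral)

state0 = sp.Integral(7 * x**2, x)     # training example: ∫ 7x² dx
state1 = op3(state0)                  # 7 * ∫ x² dx
state2 = op9(state1)                  # 7 * x³ / 3
print(solved(state0), solved(state1), solved(state2))   # False False True
```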

By point (a), the explanation has indicated that one relevant feature of the training example state is that the result of applying OP3 followed by OP9 is a state that MATCHES '<any-fn>'. The operationality criterion requires, however, that the explanation be in terms of features of the single given training example state, rather than features of its resulting solution state. Thus, the remainder of the explanation consists of reexpressing this constraint in terms of the training example state. This is accomplished by applying the last rule in the domain theory of Table 4. This rule¹¹ allows back propagating the expression '<any-fn>' through the general definitions of OP9 and OP3, to determine the equivalent constraint on the training example state. The resulting constraint is that the training example state must MATCH the expression '<any-fn> ∫ r1·x^(r2≠-1) dx' (here r1 and r2 stand for two distinct real numbers, where r2 must not be equal to -1). This, together with the left-hand branch of the explanation structure, explains which features of ∫ 7x² dx guarantee that it satisfies the goal concept USEFUL-OP3.

Given this explanation structure, the second step of the EBG method is straightforward. As in the CUP example, regressing the goal concept expression USEFUL-OP3(x) through the explanation structure effectively results in replacing the training example state by a variable, so that the resulting generalization (taken from the leaves of the explanation tree) is:

MATCHES(x, '∫ <any-fn> dx') ∧ MATCHES(x, '<any-fn> ∫ r1·x^(r2≠-1) dx') → USEFUL-OP3(x)

which simplifies to:

MATCHES(x, '<any-fn> ∫ r1·x^(r2≠-1) dx') → USEFUL-OP3(x)

¹⁰ OP9: ∫ x^r dx → x^(r+1)/(r+1).

¹¹ The domain theory rule MATCHES(op(x), y) ⇔ MATCHES(x, REGRESS(y, op)) indicates that if the result of applying operator op to state x MATCHES some expression y, then the state x MATCHES some expression which can be computed by REGRESSing the expression y through operator op. Notice that the regression here involves propagating constraints on problem states through problem solving operators. This is a different regression step from the second step of the EBG process, in which the goal concept is regressed through the domain theory rules used in the explanation structure.


3.2.1 Discussion

To summarize, this example demonstrates again the general EBG method of constructing an explanation in terms that satisfy the operationality condition, then regressing the goal concept through the explanation structure to determine a justified generalization of the training example. As in the previous examples, this process results in a generalization of the training example which is a sufficient condition for satisfying the goal concept, and which is justified in terms of the goal concept, domain theory, and operationality criterion.

In the above example, the goal concept corresponds to the precondition for a search heuristic that is to be learned. The domain theory therefore involves both domain-independent knowledge about search (e.g., the definition of SOLVABLE) and domain-dependent knowledge (e.g., how to recognize a SOLVED integral). To use this method to learn heuristics in a new domain, one would have to replace only the domain-dependent portion of the theory. To learn a different type of heuristic in the same domain, one could leave the domain theory intact, changing only the definition of the USEFUL-OP3 goal concept accordingly. For example, as suggested in Mitchell (1984), the system could be modified to learn heuristics that suggest only steps along the minimum cost solution path, by changing the goal concept to

USEFUL-OP3(s) ⇔ NOT(SOLVED(s)) ∧ MIN-COST-SOLN(SOLUTION-PATH(OP3, s))

Note that the explanation structure in this example, like the domain theory, separates into a domain-independent and a domain-dependent portion. Domain-independent knowledge about search is applied above point (a) in Figure 4, and domain-dependent knowledge about calculus problem solving operators is applied below point (a). In the implementation of the LEX2 program (Mitchell, 1983), these two phases of the explanation were considered to be two unrelated subprocesses and were implemented as separate procedures. From the perspective afforded by the EBG method, however, these two subprocesses are better seen as different portions of the same explanation-generation step.

One final point regarding the current example has to do with the ability of explanation-based methods to augment their description language of concepts. In LEX2, as in the SAFE-TO-STACK problem, this method is able to isolate fairly complex features of the training example that are directly related to the explanation of how it satisfies the goal concept. In the context of the LEX project, Utgoff (1985) studied this issue and developed the STABB subsystem. STABB is able to extend the initial vocabulary of terms used by LEX, by naming and assimilating terms that correspond to the constraints derived during the regression step. For example, in one instance STABB derived the definition of odd integers through this regression step, defining it as 'the set of real numbers, x, such that subtracting 1 then dividing by 2 produces an integer'.
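A toy rendering (ours, not STABB's actual representation) of that derived term:

```python
# Odd integers as 'the reals x such that subtracting 1 then dividing by 2 yields an integer'.
def derived_odd(x):
    return float((x - 1) / 2).is_integer()

print([x for x in range(-3, 8) if derived_odd(x)])   # [-3, -1, 1, 3, 5, 7]
```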


3.2.2 Related methods for strategy learning

There are a number of additional systems that learn problem solving strategies by analyzing single examples of successful solutions.¹² These systems (e.g., Fikes et al., 1972; Utgoff, 1984; Minton, 1984; Mahadevan, 1985), which we might call STRIPS-like systems, can all be viewed as systems that learn a goal concept of the following form: 'the set of problem states such that applying a given operator sequence, OS, yields a final state matching a given solution property, P.' Since these systems are tuned to this single goal concept, and are not intended to learn other forms of concepts, they typically do not represent the goal concept and explanation declaratively. However, they do represent the solution property, P, explicitly, and regress this property through the operator sequence OS to determine an operational definition of the (implicit) goal concept. From the perspective of the above LEX example, the steps that they perform correspond to constructing the portion of the explanation below point (a) in Figure 4. It is in constructing these steps of the explanation that LEX regresses its solution property 'MATCHES(x, <any-fn>)' through the operator sequence <OP3, OP9> to determine the equivalent constraint on the initial problem state. These STRIPS-like systems do not construct the portion of the explanation corresponding to the section above point (a) in Figure 4. Because they are tuned to a fixed goal concept, they do not need to generate this portion of the explanation explicitly for each training example.

To illustrate this point, consider Minton's (1984) program for learning search heuristics in two-person games such as Go-Moku, Tic-Tac-Toe, and Chess. This program analyzes one positive instance of a sequence of moves that leads to a winning board position, in order to determine an operational definition of the goal concept 'the class of board positions for which the given sequence of moves leads to a forced win'. But Minton's system has no explicit definition of this goal concept. It has only a definition of the solution property P that characterizes a winning position. For example, in the game of Go-Moku (a variant of Tic-Tac-Toe), this solution property characterizes a winning board position as 'a board position with five X's in a row'. This solution property is regressed by Minton's program through the operator sequence for the given training example. In this way, the program determines that an effective definition of its implicit goal concept is 'board positions that contain three X's, with a blank space on one side, and two blank spaces on the other'.

¹² See Kibler and Porter (1985) for a thoughtful critique of analytic goal regression methods for learning search control heuristics. They discuss certain requirements for regression to succeed: that the operators be invertible, and that the representation language be able to express goal regression products.


4. Perspective and research issues

The previous sections presented a general method for explanation-based generalization, and illustrated its application to several generalization tasks. This section summarizes some general points regarding explanation-based generalization, and considers a number of outstanding research issues.

To summarize, explanation-based generalization utilizes a domain theory and knowledge of the goal concept to guide the generalization process. By doing so, the method is able to produce a valid generalization of the training example, along with an explanation that serves as a justification for the generalization. The EBG method introduced here unifies mechanisms for explanation-based generalization that have been previously reported for a variety of task domains. The generality of the EBG method stems from the fact that the goal concept, domain theory, and operationality criterion are made explicit inputs to the method, rather than instantiated implicitly within the method.

4.1 Perspectives on explanation-based generalization

Several perspectives on explanation-based generalization, and on the EBG method in particular, are useful in understanding their strengths and weaknesses:

EBG as theory-guided generalization of training examples. EBG can be seen as the process of interpreting or perceiving a given training example as a member of the goal concept, based on a theory of the domain. Soloway's (1978) early work on learning action sequences in the game of baseball shares this viewpoint on generalization. This is the perspective stressed in the above sections, and it highlights the centrality of the goal concept and domain theory. It also highlights an important feature of EBG: that what it can generalize is bounded by what the learner already knows. One consequence of this is that the degree of generalization produced for a particular training example will depend strongly on the generality with which the rules in the domain theory are expressed. A second consequence is that the learning system can improve its learning performance to the degree that it can learn new rules for its domain theory.

EBG as example-guided operationalization of the goal concept. One can also view EBG as the process of reformulating the goal concept in terms that satisfy the operationality criterion, with the domain theory providing the means for reexpressing the goal concept. Given this perspective, one wonders why training examples are required at all. In principle, they are not. Mostow's (1983) FOO system operationalizes general advice about how to play the card game of Hearts, without considering specific examples that apply that advice. Similarly, Keller (1983) describes a process of concept operationalization, by which a sequence of transformations is applied to the goal concept in search of a reformulation that satisfies the operationality criterion, without the guidance of specific training examples.

However, training examples can be critical in guiding the learner to consider relevant transformations of the goal concept. For instance, consider the CUP learning task as described in Section 3.1, where a functional definition of CUP is reexpressed in structural terms for use by a vision system recognizing cups. A system that reformulates the functional definition of CUP in structural terms, without the guidance of training examples, amounts to a system for producing all possible structural definitions for classes of cups (i.e., for designing all possible classes of cups). Since there are so many possible designs for cups, and since so few of these are actually encountered in the world, the learning system could easily waste its effort learning structural definitions corresponding to cups that will never be seen by the vision system!13 Training examples thus provide a means of focusing the learner on formulating only concept descriptions that are relevant to the environment in which it operates.

EBG as Reformulating/Operationalizing/Deducing from what is already known. The above paragraph suggests that explanation-based generalization does not lead to acquiring truly 'new' knowledge, but only enables the learner to reformulate/operationalize/deduce what the learner already knows implicitly. While this statement is true, it is somewhat misleading. Consider, for example, the task of learning to play chess. Once one is told the rules of the game (e.g., the legal moves, and how to recognize a checkmate), one knows in principle everything there is to know about chess - even the optimal strategy for playing chess follows deductively from the rules of the game. Thus, although the EBG method is restricted to compiling the deductive consequences of its existing domain theory, this kind of learning is often nontrivial (as is the case for learning chess strategies). Nevertheless, it is a significant limitation that EBG is highly dependent upon its domain theory. As discussed below, further research is needed to extend the method to generalization tasks in which the domain theory is not sufficient to deductively infer the desired concept.

4.2 Research issues

4.2.1 Imperfect theory problems

As the above discussion points out, one important assumption of EBG is that the domain theory is sufficient to prove that the training example is a member of the goal concept; that is, that the inferred generalizations follow deductively (even if remotely) from what the learner already knows. Although this assumption is satisfied in each of the example problems presented above, and although there are interesting domains in which this assumption is satisfied (e.g., chess, circuit design (Mitchell et al., 1985)), for the majority of real-world learning tasks it is unrealistic to assume that the learner begins with such a strong theory. For both the SAFE-TO-STACK domain and the CUP domain, it is easy to imagine more realistic examples for which the required domain theory is extremely complex, difficult to describe, or simply unknown. For generalization problems such as inferring general rules for predicting the stock market or the weather, it is clear that available theories of economics and meteorology are insufficient to produce absolutely predictive rules. Thus, a major research issue for explanation-based generalization is to develop methods that utilize imperfect domain theories to guide generalization, as well as methods for improving imperfect theories as learning proceeds. The problem of dealing with imperfect theories can be broken down into several classes of problems:

13 Of course, information about what types of cups are to be encountered by the vision system also could be presented in the operationality criterion, since this information relates to the use of the concept definition for the recognition task. This information, however, may not be easily described in the declarative form required by the operationality criterion.

The Incomplete Theory Problem. The stock market and weather prediction examples above both illustrate the incomplete theory problem. The issue here is that such theories are not complete enough to prove that the training example is a member of the goal concept (e.g., to prove why a particular training example stock has doubled over a twelve month period). However, even an incomplete theory might allow constructing plausible explanations summarizing likely links between features of the training example and the goal concept. For example, even a weak theory of economics allows one to suggest that the 'cash on hand' of the company may be relevant to the goal concept 'stocks that double over a twelve month period', whereas the 'middle initial of the company president' is probably an irrelevant feature. Thus, incomplete theories that contain only information about plausible cause-effect relations, with only qualitative rather than quantitative associations, can still provide important guidance in generalizing. Methods for utilizing and refining such incomplete theories would constitute a major step forward in understanding explanation-based generalization.
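As a purely illustrative sketch, such a qualitative theory might amount to no more than a table of plausible cause-effect links, used only to filter which features of a training example are worth explaining at all; the 'economic theory' and feature names below are invented, not drawn from any system described in this paper.

```python
# An invented, incomplete qualitative theory: for each goal concept, the set of
# features plausibly (causally) linked to it.  No quantitative model is assumed.
plausible_influences = {
    'stock_doubles_in_year': {'cash_on_hand', 'earnings_growth', 'sector_demand'},
}

def plausibly_relevant_features(example, goal):
    """Keep only the features of the example the weak theory deems relevant."""
    relevant = plausible_influences.get(goal, set())
    return {f: v for f, v in example.items() if f in relevant}

training_stock = {'cash_on_hand': 'high',
                  'earnings_growth': 'positive',
                  'president_middle_initial': 'Q'}

print(plausibly_relevant_features(training_stock, 'stock_doubles_in_year'))
# keeps cash_on_hand and earnings_growth, drops the president's middle initial
```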

The Intractable Theory Problem. A second class of imperfect theories includes those which are complete, but for which it is computationally prohibitive to construct explanations in terms of the theory. For instance, quantum physics constitutes a fairly complete theory that would be inappropriate for generating explanations in the SAFE-TO-STACK problem - generating the necessary explanations in terms of quantum physics is clearly intractable. Similarly, although the rules of chess constitute a domain theory sufficient to explain why any given move is good or bad, one would never use this theory to explain why the opening move 'pawn to king four' is a member of the goal concept 'moves that lead to a win or draw for white.' In fact, this theory of chess is intractable for explaining why nearly any move is good or bad. Humans tend to respond to this problem by constructing more abstract, tractable theories that are approximations to the underlying intractable theory. In chess, for example, the learner might formulate a more abstract theory that includes approximate assertions such as 'there is no threat to the king if it is surrounded by many friendly pieces' (Tadepalli, 1985). Such approximate, abstracted theories can be tractable enough and accurate enough to serve as a useful basis for creating and learning from explanations. Developing computer methods that can construct such abstracted theories, and that can judge when they can safely be applied, is a problem for further research.

The Inconsistent Theory Problem. A third difficulty arises in theories from which inconsistent statements can be derived. The domain theory in the SAFE-TO-STACK problem provides one example of such a theory. While this theory has a default rule for inferring the weight of an end table, it also has a rule for computing weights from the known density and volume. Thus, the theory will conclude two different weights for a given end table provided that its density and volume are known, and provided that these are inconsistent with the default assumption about its weight. In such cases, it is possible to construct inconsistent explanations for a single training example. Furthermore, if two different training examples of the same concept are explained in inconsistent terms (e.g., by utilizing one default assumption for one example, and some other assumptions for the second example), difficulties will certainly arise in merging the resulting generalizations. Because of this, and because default assumptions are commonplace in theories of many domains, the problem of dealing with inconsistent theories and inconsistent explanations is also an important one for future research.
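A minimal sketch of how such an inconsistency arises, assuming invented rule forms and numbers rather than the actual SAFE-TO-STACK theory:

```python
# Two rules from an invented fragment of a domain theory that can both derive
# the weight of the same object, and therefore support inconsistent explanations.

def weight_by_default(obj):
    # default rule: end tables are assumed to weigh 5 (units are arbitrary here)
    if obj.get("isa") == "end_table":
        return 5.0
    return None

def weight_from_density_and_volume(obj):
    # derivational rule: weight = density * volume, when both are known
    if "density" in obj and "volume" in obj:
        return obj["density"] * obj["volume"]
    return None

end_table = {"isa": "end_table", "density": 0.6, "volume": 15.0}

conclusions = {w for w in (weight_by_default(end_table),
                           weight_from_density_and_volume(end_table))
               if w is not None}
print(conclusions)   # {5.0, 9.0} -- two inconsistent weights for one object,
                     # hence two inconsistent explanations can be built from them
```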

4.2.2 Combining explanation-based and similarity-based methods

While EBG infers concept definitions deductively from a single example, similarity-based methods infer concept definitions inductively from a number of training examples. It seems clearly desirable to develop combined methods that would utilize both a domain theory and multiple training examples to infer concept definitions. This kind of combined approach to generalization will probably be essential in domains where only imperfect theories are available.

Although few results have been achieved in combining explanation-based and similarity-based methods, a number of researchers have begun to consider this issue. Lebowitz (Lebowitz, 1985) is exploring methods for combining similarity-based methods and explanation-based methods in his UNIMEM system. UNIMEM examines a database of the voting records of congresspersons, searching for empirical, similarity-based generalizations (e.g., midwestern congresspersons vote in favor of farm subsidies). The system then attempts to verify these empirical generalizations by explaining them in terms of a domain theory (e.g., explaining how midwestern congresspersons satisfy the goal concept 'people who favor farm subsidies'). This approach has the advantage that the similarity-based techniques can be used to generate a candidate set of possible generalizations from a large number of potentially noisy training examples. Once such empirical generalizations are formulated, explanation-based methods can help prune and refine them by using other knowledge in the system.

Whereas Lebowitz's approach involves applying similarity-based generalization followed by explanation-based methods, an alternative approach is to first apply explanation-based methods to each training example, then to combine the resulting generalized examples using a similarity-based generalization technique. Consider, for example, using the version space method14 to combine the results of explanation-based generalizations of a number of training examples (Mitchell, 1984). Since the explanation-based generalization of a positive training example constitutes a sufficient condition for the goal concept, this can be used as a generalized positive example to refine (generalize) the specific boundary set of the version space. Similarly, one could imagine generalizing negative training examples using explanation-based generalization, by explaining why they are not members of the goal concept. The resulting generalized negative example could then be used to refine (specialize) the general boundary set of the version space. Thus, while this combined method still suffers the main disadvantage of similarity-based methods (i.e., it makes inductive leaps based on its generalization language, which it cannot justify), it converges more rapidly on a final concept definition because it employs EBG to generalize each training example.

14 The version space method (Mitchell, 1978) is a similarity-based generalization method based on summarizing the alternative plausible concept definitions by maintaining two sets: the 'specific' set contains the set of most specific concept definitions consistent with the observed data, and the 'general' set contains the most general concept definitions consistent with the data. All other plausible concept definitions lie between these two sets in the general-to-specific ordering over concept definitions.
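A sketch of one step of this combination, under the simplifying assumption that concepts are attribute vectors ordered general-to-specific in the style of candidate elimination (this attribute-vector language and the feature names are illustrative assumptions, not the paper's representation): the sufficient condition produced by EBG for a positive example is treated as a generalized positive instance and used to generalize the S (specific) boundary.

```python
# A sketch (not the paper's representation) of using an EBG result to update
# the S boundary of a version space.  Concepts are 4-tuples of constraints over
# (weight, graspability, concavity, bottom); '?' matches anything.

def minimal_generalization(hypothesis, example):
    """Least generalization of `hypothesis` that also covers `example`."""
    return tuple(h if h == e else '?' for h, e in zip(hypothesis, example))

# S boundary so far: the (already EBG-generalized) first positive example.
S = [('light', 'has_handle', 'concave_upward', 'flat_bottom')]

# EBG explains a second cup and returns this sufficient condition, already
# stripped of irrelevant features (e.g. colour) but differing in how
# graspability was achieved:
ebg_generalized_positive = ('light', 'conical_shape', 'concave_upward', 'flat_bottom')

S = [minimal_generalization(s, ebg_generalized_positive) for s in S]
print(S)   # [('light', '?', 'concave_upward', 'flat_bottom')]
```

Note that this attribute-vector language can only weaken the differing graspability feature to '?', an unjustified inductive leap of exactly the kind noted above; the explanation-merging method discussed next instead introduces a disjunction at that point.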

Kedar-Cabelli (1984, 1985) proposes an alternative method for combining the results of explanation-based generalizations from multiple training examples. This method, Purpose-Directed Analogy, involves constructing an explanation of one example by analogy with an explanation of a familiar example, then combining the two explanations to produce a general concept definition based on both. Given explanations for two different examples, the proposed system combines the explanations as follows: Common portions of the two explanations remain unaltered in the combined explanation. Differing portions either become disjunctive subexpressions in the combined explanation, or are generalized to the next most specific common subexpression in the explanation. For example, given an explanation that a blue, ceramic mug is a CUP, and a second example of a white styrofoam cup, the explanation of the first example is used to construct by analogy an explanation for the second example. The two resulting explanations may differ in how they explain that the two example cups are GRASPABLE (assume the first example cup is GRASPABLE because it has a handle, whereas the second is GRASPABLE because it is conical). In this case, a generalization of the two explanations would include a disjunction, that either the conical shape or a handle makes it graspable. That, along with the common features of the two objects in the combined explanation structure, leads to the generalization that cups include objects which are concave upward, have a flat bottom, are light, and have either a conical shape or a handle. Alternatively, the combined explanation would retain only the next most-specific common subexpression, GRASPABLE, which would lead to a slightly more general, yet less operational, definition of a cup. Thus, this method of combining explanations of multiple examples provides a principled method for introducing disjunctions where needed into the common generalization of the two examples.
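The explanation-merging idea can be sketched as follows, assuming an invented representation in which each explanation simply maps a subgoal of the proof of CUP to the example-specific condition that established it (the actual explanations in Purpose-Directed Analogy are proof structures, not flat tables).

```python
# A sketch (invented representation) of merging two explanations of CUP:
# common conditions are kept; differing conditions become disjunctions.

def merge_explanations(expl_a, expl_b):
    merged = {}
    for subgoal in expl_a:
        a, b = expl_a[subgoal], expl_b[subgoal]
        merged[subgoal] = a if a == b else ('OR', a, b)  # disjunction where they differ
    return merged

ceramic_mug   = {'LIFTABLE': 'light', 'GRASPABLE': 'has_handle',
                 'OPEN-VESSEL': 'concave_upward', 'STABLE': 'flat_bottom'}
styrofoam_cup = {'LIFTABLE': 'light', 'GRASPABLE': 'conical_shape',
                 'OPEN-VESSEL': 'concave_upward', 'STABLE': 'flat_bottom'}

print(merge_explanations(ceramic_mug, styrofoam_cup))
# GRASPABLE becomes ('OR', 'has_handle', 'conical_shape'); the shared conditions
# are kept as-is.  Retaining the subgoal GRASPABLE itself instead of the
# disjunction corresponds to the more general but less operational alternative.
```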

The three methods discussed above for combining similarity-based and explanation-based generalization offer differing advantages. The first method uses similarity-based generalization to determine empirical generalizations which may then be validated and refined by explanation-based methods. The second method involves employing a similarity-based method to combine the results of explanation-based generalizations from multiple examples. It suffers the disadvantage that this combination of methods still produces unjustified generalizations. The third method merges the explanations of multiple examples in order to produce a combined generalization that is justified in terms of the merged explanations. More research is required on these and other possible methods for employing explanation-based methods when multiple training examples are available.

4.2.3 Formulating generalization tasks

The above discussion focuses on research issues within the framework of explanation-based generalization. An equally important set of research issues has to do with how such methods for generalization will be used as subcomponents of larger systems that improve their performance at some given task. As our understanding of generalization methods advances, questions about how to construct performance systems that incorporate generalization mechanisms will become increasingly important.

One key issue to consider in this regard is how generalization tasks are initially formulated. In other words, where do the inputs to the EBG method (the goal concept, the domain theory, the operationality criterion) come from? Is it possible to build a system that automatically formulates its own generalization tasks and these inputs? Is it possible to build learning systems that automatically shift their focus of attention from one learning problem to the next as required? What kind of knowledge must be transmitted between the performance system and the learning system to enable the automatic formulation of generalization tasks?



Again, little work has been devoted to these issues. The SOAR system (Laird et al., 1984; Laird et al., 1986) is one example of a learning system that formulates its own generalization tasks. Each time that SOAR encounters and solves a subgoal, it formulates the generalization problem of inferring the general conditions under which it can reuse the solution to this subgoal. SOAR then utilizes a technique closely related to explanation-based generalization, called implicit generalization (Laird et al., 1986), to infer these subgoal preconditions.

A second research effort which confronts the problem of formulating learning tasks is Keller's research on contextual learning (Keller, 1983, 1985, 1986). In this work, Keller suggests how a problem solving system could itself formulate generalization problems such as those addressed by the LEX2 system. In particular, he shows how the task of learning the goal concept USEFUL-OP3 arises as a subgoal in the process of planning to improve performance at solving calculus problems. By reasoning from a top-level goal of improving the efficiency of the problem solver, as well as a declarative description of the problem solving search schema, the method derives the subgoal of introducing a filter to prune the search moves that it considers. The definition of this filter includes the specification that it is to allow only problem solving steps that are 'useful' (i.e., that lead toward solutions). The subgoal of introducing this filter leads, in turn, to the problem of operationalizing the definition of 'useful' (i.e., to the subgoal corresponding to the LEX2 generalization task).

Recent work by Kedar-Cabelli (1985) also addresses the problem of formulating learning tasks. In this work, Kedar-Cabelli proposes a system to automatically formulate definitions of goal concepts in the domain of artifacts. In particular, the proposed system derives functional definitions of artifacts (e.g., CUP) from information about the purpose for which agents use them (e.g., to satisfy their thirst). Given two different purposes for which an agent might use a cup (e.g., as an ornament, versus to satisfy thirst), two different functional definitions can be derived.15

To derive the functional definition of the artifact, the proposed system first computes a plan of actions that leads to satisfying the agent's goal. For example, if the agent's goal is to satisfy thirst, then this plan might be to POUR the liquid into the cup, GRASP the cup with the liquid in order to LIFT it, and finally DRINK the liquid. In order to be used as part of this plan, the artifact must satisfy the preconditions of those plan actions in which it is involved. These preconditions form the functional definition of a cup: an open-vessel, which is stable, graspable, and liftable. Thus, formulating functional definitions of artifacts is accomplished by analyzing the role that the artifact plays in facilitating the goal of some agent.
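The precondition-collection step can be sketched as follows; the plan actions, their preconditions, and the role name 'container' are invented for illustration, the point being only that the artifact's functional definition is the union of the preconditions the plan places on it.

```python
# Invented plan for the agent's goal 'satisfy thirst': each action lists the
# preconditions it imposes on the objects filling its roles.
plan = [
    ('POUR',  {'container': ['open_vessel'],         'liquid': ['pourable']}),
    ('GRASP', {'container': ['graspable'],           'agent':  ['has_free_hand']}),
    ('LIFT',  {'container': ['liftable', 'stable'],  'agent':  ['strong_enough']}),
    ('DRINK', {'container': ['open_vessel'],         'agent':  ['thirsty']}),
]

def functional_definition(plan, role):
    """Collect every precondition the plan places on the object filling `role`."""
    needed = []
    for _action, preconds in plan:
        for p in preconds.get(role, []):
            if p not in needed:
                needed.append(p)
    return needed

print(functional_definition(plan, 'container'))
# ['open_vessel', 'graspable', 'liftable', 'stable'] -- the functional
# definition of a cup relative to this particular purpose
```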

15 This extends Winston's work (see Section 3.1), in that it can derive its own goal concept from a given purpose.



4.2.4 Using contextual knowledge to solve the generalization task

Above we have discussed some approaches to automatically formulating learning tasks, given knowledge of the performance task for which the learning takes place. In cases where the learner formulates its own learning task, information about how and why the task was formulated can provide important guidance in solving the learning task. Keller's (1986) METALEX system provides an example of how such information can be used in guiding learning. Like LEX2, METALEX addresses the learning task of operationalizing the goal concept USEFUL-OP3. It takes as input a procedural representation of the performance task to be improved (the calculus problem solver), a specification of the performance objectives to be achieved ('minimize problem solving time'), and knowledge of the performance improvement plan (search space pruning via filtering), which is a record of how the operationalization task was originally formulated. METALEX uses this additional knowledge about the context in which its learning task was formulated to guide its search for an operational transformation of the goal concept. Specifically, it executes the calculus problem solver using the initial (and subsequent intermediary) definitions of the goal concept to collect diagnostic information which aids in operationalizing the goal concept if performance objectives are not satisfied.

In effect, the performance task and performance objective inputs required by METALEX elaborate on the operationality criterion required by the EBG method. Instead of evaluating operationality in terms of a binary-valued predicate over concept definitions (as in EBG), METALEX evaluates the degree of operationality of the concept definition in relation to the performance task and objective. This ability to make a more sophisticated analysis of operationality enables METALEX to make important distinctions among alternative concept definitions. For example, because METALEX uses approximating (non truth-preserving) transforms to modify the goal concept, it can generate concept definitions that only approximate the goal concept. In such cases, METALEX is able to determine whether such an approximate concept definition is desirable based on the degree to which it helps improve the performance objectives.

As the first sections of this paper demonstrate, explanation-based generalization methods offer significant promise in attempts to build computer models of learning systems. Significant progress has been made in understanding explanation-based generalization, especially for problems in which the learner possesses a complete and correct theory. As the final section illustrates, much more remains to be discovered about how a learner can use what it already knows to guide the acquisition of new knowledge.



Acknowledgments

The perspective on Explanation-Based Generalization reported here has arisen from discussions over a period of time with a number of people in the Machine Learning Group at Rutgers and elsewhere. Discussions with the following people have been particularly useful in formulating the ideas presented here: Alex Borgida, Gerry DeJong, Thomas Dietterich, Thorne McCarty, Sridhar Mahadevan, Jack Mostow, Michael Sims, Prasad Tadepalli, Paul Utgoff, and Keith Williamson. We also thank the following people for providing useful comments on an earlier draft of this paper: Pat Langley, Sridhar Mahadevan, Jack Mostow, Louis Steinberg, Prasad Tadepalli, and Keith Williamson.

This material is based on work supported by the Defense Advanced Research Projects Agency under Research Contract N00014-81-K-0394, by the National Science Foundation under grants DCS83-51523 and MCS80-08889, by the National Institutes of Health under grant RR-64309, by GTE under grant GTE840917, and by a Rutgers University Graduate Fellowship. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official views or policies of these sponsors.

Appendix

The appendix describes DeJong's research on explanation-based generalization. In particular, it casts the work on learning schemata for story understanding in terms of the EBG method. In addition to this project, there has been a great deal of recent research on explanation-based generalization, including (DeJong, 1985; Ellman, 1985; Mooney, 1985; O'Rorke, 1985; Rajamoney, 1985; Schooley, 1985; Segre, 1985; Shavlik, 1985; Sims, 1985; Watanabe, 1985; Williamson, 1985).

DeJong (1981, 1983) developed one of the earliest successful explanation-based generalization systems as part of his research on explanatory schema acquisition. DeJong is interested in the problem of learning schemata for use in natural language story understanding. DeJong's system takes as input an example story and produces as output a generalized schema representing the stereotypical action sequence that is instantiated in the story. For example, the system can process the following story (adapted from G. DeJong, personal communication, November 16, 1984):

Fred is Mary's father. Fred is rich. Mary wears blue jeans. John approached Mary. He pointed a gun at her. He told her to get into his car. John drove Mary to the hotel. He locked her in his room. John called Fred. He told Fred he had Mary. He promised not to harm her if Fred gave him $250,000 at Treno's Restaurant. Fred delivered the money. Mary arrived home in a taxi.



It then produces as output a generalized schema for KIDNAPPING. The KIDNAPPING schema contains only the relevant details of the kidnapping (e.g., that three people are involved: Person A who wants money, Person B who has money, and Person C who is valued by Person B), but none of the irrelevant details (e.g., that Person C wears blue jeans).

DeJong's system uses a generalization method that closely parallels the EBG method. Although there is no direct counterpart to the goal concept in DeJong's system, the goal concept can be thought of as 'the class of action sequences that achieve personal goal X for actor Y.' For the kidnapping story, the actor's personal goal is 'attainment of wealth.' The system constructs an explanation for how the actions in the story lead to the kidnapper's 'attainment of wealth' as a by-product of the story-understanding process. During the story parse, data dependency links are created to connect actions in the story with the inference rules that are used by the parser in interpreting the actions. The set of inference rules constitutes a domain theory for DeJong's system, and includes knowledge about the goals and plans of human actors, as well as causal knowledge used to set up and verify expectations for future actions. The network of all the data dependency links created during the story parse is called an inference justification network, and corresponds to an explanation for the action sequence.

Generalization of the inference justification network is carried out by replacing general entities for the specific objects and events referenced in the network. As with the EBG method, the entities in the inference justification network are generalized as far as possible while maintaining the correctness of the data dependency links.16
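The variabilization step can be sketched as follows, using a tiny invented fragment of ground facts rather than DeJong's actual network; the essential point is that each constant is mapped to the same variable wherever it appears, so the dependency links between facts remain consistent after generalization.

```python
from itertools import count

def variabilize(ground_facts):
    """Replace each constant by a variable, reusing the same variable for the
    same constant everywhere so that links between facts stay consistent."""
    mapping, fresh = {}, count(1)
    def var_for(c):
        if c not in mapping:
            mapping[c] = f"?x{next(fresh)}"
        return mapping[c]
    return [(pred, tuple(var_for(a) for a in args)) for pred, args in ground_facts]

# Invented fragment loosely based on the kidnapping story above:
network = [
    ('FATHER-OF', ('FRED', 'MARY')),
    ('WEALTHY',   ('FRED',)),
    ('WANTS',     ('JOHN', 'MONEY')),
    ('VALUES',    ('FRED', 'MARY')),
]
for fact in variabilize(network):
    print(fact)
# FRED, MARY and JOHN each map to a single variable reused across all facts,
# so the resulting schema mentions three generalized persons rather than the
# story's particular individuals.
```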

Then a new schema is constructed from the network. The issue of operationality enters into the process of determining an appropriate level of generalization for the schema constructed from the network. Should, for example, a generalized schema be created to describe the KIDNAPPING action sequence or the more general action sequences representing BARGAINING-FOR-MONEY or BARGAINING-FOR-WEALTH? All of these schemata explain the actions in the example story. DeJong cites several criteria to use in determining the level of generalization at which to represent the new schema (DeJong, 1983). The criteria include such considerations as: 1) Will the generalized schema be useful in processing stories in the future, or does the schema summarize an event that is unlikely to recur? 2) Are the preconditions for schema activation commonly achievable? 3) Will the new schema represent a more efficient method of achieving personal goals than existing schemata? Note that these schema generalization criteria roughly correspond to the operationalization criteria used in the EBG method. Because the generalized schemata are subsequently used for a story-understanding task, the operationality criteria pertain to that task.

16 It is unclear whether the system regresses its equivalent of the goal concept through the inference justification network.



Table 5. The wealth acquisition schema generalization problem: Learning about ways to achieve wealth (DeJong, 1983)

Given:
  Goal Concept: The class of action sequences (i.e., a general schema) by which actor x can achieve wealth:

    WEALTH-ACQUISITION-ACTION-SEQUENCE (<a1, a2, ..., an>, x) ⇔
      NOT (WEALTHY (x, s0)) ∧ WEALTHY (x, EXECUTE (x, <a1, a2, ..., an>, s0))

  where <a1, a2, ..., an> is an action sequence; x is the actor; s0 is the actor's current state; and EXECUTE (a, b, c) returns the state resulting from the execution of action sequence b by actor a in state c.

  Training Example: The kidnapping story:

    FATHER-OF (FRED, MARY) ∧ WEALTHY (FRED)
    ∧ DESPERATE (JOHN) ∧ WEARS (MARY, BLUE-JEANS)
    ∧ APPROACHES (JOHN, MARY, s0) ∧ POINTS-GUN-AT (JOHN, MARY, s1)
    ∧ ...
    ∧ EXCHANGES (FRED, JOHN, $250000, MARY, s12)

  Domain Theory: Rules about human interaction, and knowledge about human goals, intentions, desires, etc.:

    FATHER-OF (person1, person2) → LOVES (person1, person2) ∧ VALUES (person1, person2)
    WEALTHY (person) → HAS (person, $250000)
    EXCHANGES (person1, person2, object1, object2) ⇔
      NOT (HAS (person1, object1)) ∧ VALUES (person1, object1)
      ∧ NOT (HAS (person2, object2)) ∧ VALUES (person2, object2)
      ∧ HAS (person1, object2) ∧ HAS (person2, object1)

  Operationality Criterion: Acquired generalization (i.e., the generalized schema) must satisfy the requirements for future usefulness in story understanding (see text).

Determine:
  A generalization of the training example (i.e., a generalized schema) that is a sufficient concept definition for the goal concept (i.e., that is a specialization of the wealth acquisition schema) and that satisfies the operationality criterion.

Table 5 summarizes the explanation-based generalization problem addressed by the explanatory schema acquisition research.

References

Borgida, A., Mitchell, T., & Williamson, K.E. (1985). Learning improved integrity constraints and schemas from exceptions in data and knowledge bases. In M.L. Brodie & J. Mylopoulos (Eds.), On knowledge base management systems. New York, NY: Springer Verlag.


DeJong, G. (1981). Generalizations based on explanations. Proceedings of the Seventh International Joint Conference on Artificial Intelligence (pp. 67-69). Vancouver, B.C., Canada: Morgan Kaufmann.
DeJong, G. (1983). Acquiring schemata through understanding and generalizing plans. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (pp. 462-464). Karlsruhe, West Germany: Morgan Kaufmann.
DeJong, G. (1985). A brief overview of explanatory schema acquisition. Proceedings of the Third International Machine Learning Workshop. Skytop, PA, June.
DeJong, G., & Mooney, R. (in press). Explanation-based learning: An alternative view. Machine learning.
Dietterich, T.G., London, B., Clarkson, K., & Dromey, G. (1982). Learning and inductive inference. In P.R. Cohen & E.A. Feigenbaum (Eds.), The handbook of artificial intelligence. Los Altos, CA: William Kaufmann, Inc.
Dijkstra, E.W. (1976). A discipline of programming. Englewood Cliffs, NJ: Prentice Hall.
Ellman, T. (1985). Explanation-based learning in logic circuit design. Proceedings of the Third International Machine Learning Workshop. Skytop, PA, June.
Fikes, R., Hart, P., & Nilsson, N.J. (1972). Learning and executing generalized robot plans. Artificial intelligence, 3, 251-288. Also in B.L. Webber & N.J. Nilsson (Eds.), Readings in artificial intelligence.
Kedar-Cabelli, S.T. (1984). Analogy with purpose in legal reasoning from precedents (Technical Report LRP-TR-17). Laboratory for Computer Science Research, Rutgers University, New Brunswick, NJ.
Kedar-Cabelli, S.T. (1985). Purpose-directed analogy. Proceedings of the Cognitive Science Society Conference. Irvine, CA: Morgan Kaufmann.
Keller, R.M. (1983). Learning by re-expressing concepts for efficient recognition. Proceedings of the National Conference on Artificial Intelligence (pp. 182-186). Washington, D.C.: Morgan Kaufmann.
Keller, R.M. (1985). Development of a framework for contextual concept learning. Proceedings of the Third International Machine Learning Workshop. Skytop, PA, June.
Keller, R.M. (1986). Contextual learning: A performance-based model of concept acquisition. Unpublished doctoral dissertation, Rutgers University.
Kibler, D., & Porter, B. (1985). A comparison of analytic and experimental goal regression for machine learning. Proceedings of the Ninth International Joint Conference on Artificial Intelligence. Los Angeles, CA: Morgan Kaufmann.
Laird, J.E., Rosenbloom, P.S., & Newell, A. (1984). Toward chunking as a general learning mechanism. Proceedings of the National Conference on Artificial Intelligence (pp. 188-192). Austin, TX: Morgan Kaufmann.
Laird, J.E., Rosenbloom, P.S., & Newell, A. (1986). SOAR: The architecture of a general learning mechanism. Machine learning, 1, 11-46.
Lebowitz, M. (1985). Concept learning in a rich input domain: Generalization-based memory. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Vol. 2. Los Altos, CA: Morgan Kaufmann.
Mahadevan, S. (1985). Verification-based learning: A generalization strategy for inferring problem-decomposition methods. Proceedings of the Ninth International Joint Conference on Artificial Intelligence. Los Angeles, CA: Morgan Kaufmann.
Michalski, R.S. (1983). A theory and methodology of inductive learning. Artificial intelligence, 20, 111-161. Also in R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach.
Minton, S. (1984). Constraint-based generalization: Learning game-playing plans from single examples. Proceedings of the National Conference on Artificial Intelligence (pp. 251-254). Austin, TX: Morgan Kaufmann.
Mitchell, T.M. (1978). Version spaces: An approach to concept learning. PhD thesis, Department of Electrical Engineering, Stanford University. Also Stanford CS reports STAN-CS-78-711, HPP-79-2.
Mitchell, T.M. (1980). The need for biases in learning generalizations (Technical Report CBM-TR-117). Rutgers University, New Brunswick, NJ.


Mitchell, T.M. (1982). Generalization as search. Artificial intelligence, 18(2), 203-226.
Mitchell, T.M. (1983). Learning and problem solving. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (pp. 1139-1151). Karlsruhe, West Germany: Morgan Kaufmann.
Mitchell, T.M. (1984). Toward combining empirical and analytic methods for learning heuristics. In A. Elithorn & R. Banerji (Eds.), Human and artificial intelligence. Amsterdam: North-Holland Publishing Co. Also Rutgers Computer Science Department Technical Report LCSR-TR-27, 1981.
Mitchell, T.M., Utgoff, P.E., & Banerji, R.B. (1983). Learning by experimentation: Acquiring and refining problem-solving heuristics. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning. Palo Alto, CA: Tioga.
Mitchell, T.M., Mahadevan, S., & Steinberg, L. (1985). LEAP: A learning apprentice for VLSI design. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 573-580). Los Angeles, CA: Morgan Kaufmann.
Mooney, R. (1985). Generalizing explanations of narratives into schemata. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Mostow, D.J. (1981). Mechanical transformation of task heuristics into operational procedures. PhD thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA.
Mostow, D.J. (1983). Machine transformation of advice into a heuristic search procedure. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning. Palo Alto, CA: Tioga.
Nilsson, N.J. (1980). Principles of artificial intelligence. Palo Alto, CA: Tioga.
O'Rorke, P. (1984). Generalization for explanation-based schema acquisition. Proceedings of the National Conference on Artificial Intelligence (pp. 260-263). Austin, TX: Morgan Kaufmann.
O'Rorke, P. (1985). Recent progress on the 'mathematician's apprentice' project. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Quinlan, J.R. (1985). The effect of noise on concept learning. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Vol. 2. Los Altos, CA: Morgan Kaufmann. Modified, also in Proceedings of the Second International Machine Learning Workshop.
Rajamoney, S. (1985). Conceptual knowledge acquisition through directed experimentation. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Salzberg, S., & Atkinson, D.J. (1984). Learning by building causal explanations. Proceedings of the Sixth European Conference on Artificial Intelligence (pp. 497-500). Pisa, Italy.
Schank, R.C. (1982). Looking at learning. Proceedings of the Fifth European Conference on Artificial Intelligence (pp. 11-18). Paris, France.
Schooley, P. (1985). Learning state evaluation functions. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Segre, A.M. (1985). Explanation-based manipulator learning. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Shavlik, J.W. (1985). Learning classical physics. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Sims, M. (1985). An investigation of the nature of mathematical discovery. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Silver, B. (1983). Learning equation solving methods from worked examples. Proceedings of the Second International Machine Learning Workshop (pp. 99-104). Urbana, IL.
Soloway, E.M. (1978). Learning = interpretation + generalization: A case study in knowledge-directed learning. PhD thesis, Department of Computer and Information Science, University of Massachusetts, Amherst. Computer and Information Science Report COINS TR-78-13.
Tadepalli, P.V. (1985). Learning in intractable domains. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Utgoff, P.E. (1983). Adjusting bias in concept learning. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (pp. 447-449). Karlsruhe, West Germany: Morgan Kaufmann.
Utgoff, P.E. (1984). Shift of bias for inductive concept learning. PhD thesis, Department of Computer Science, Rutgers University, New Brunswick, NJ.


Utgoff, P.E. (1985). Shift of bias for inductive concept learning. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Vol. 2. Los Altos, CA: Morgan Kaufmann. Modified, also in Proceedings of the Second International Machine Learning Workshop.
Waldinger, R. (1977). Achieving several goals simultaneously. In E. Elcock & D. Michie (Eds.), Machine intelligence 8. London: Ellis Horwood, Limited. Also in B.L. Webber & N.J. Nilsson (Eds.), Readings in artificial intelligence.
Watanabe, M. (1985). Learning implementation rules in operating-conditions depending on states in VLSI design. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Williamson, K. (1985). Learning from exceptions to constraints in databases. Proceedings of the Third International Machine Learning Workshop. Skytop, PA.
Winston, P.H. (1985). Learning by augmenting rules and accumulating censors. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Vol. 2. Los Altos, CA: Morgan Kaufmann.
Winston, P.H., Binford, T.O., Katz, B., & Lowry, M. (1983). Learning physical descriptions from functional definitions, examples, and precedents. Proceedings of the National Conference on Artificial Intelligence (pp. 433-439). Washington, D.C.: Morgan Kaufmann.