Journal of Artificial Intelligence Research 13 (2000) 95-153 Submitted 1/00; published 9/00
Asimovian Adaptive Agents
Diana F. Gordon [email protected]
Navy Center for Applied Research in Artificial Intelligence
Naval Research Laboratory, Code 5515
Washington, D.C. 20375-5337 USA
Abstract
The goal of this research is to develop agents that are adaptive and predictable and
timely. At first blush, these three requirements seem contradictory. For example, adaptation
risks introducing undesirable side effects, thereby making agents' behavior less predictable.
Furthermore, although formal verification can assist in ensuring behavioral predictability,
it is known to be time-consuming.
Our solution to the challenge of satisfying all three requirements is the following. Agents
have finite-state automaton plans, which are adapted online via evolutionary learning (perturbation)
operators. To ensure that critical behavioral constraints are always satisfied,
agents' plans are first formally verified. They are then reverified after every adaptation.
If reverification concludes that constraints are violated, the plans are repaired. The main
objective of this paper is to improve the efficiency of reverification after learning, so that
agents have a sufficiently rapid response time. We present two solutions: positive results
that certain learning operators are a priori guaranteed to preserve useful classes of
behavioral assurance constraints (which implies that no reverification is needed for these
operators), and efficient incremental reverification algorithms for those learning operators
that have negative a priori results.
1. Introduction
Agents are becoming increasingly prevalent and effective. Robots and softbots, working
individually or in concert, can relieve people of a great deal of labor-intensive tedium in their
jobs as well as in their day-to-day lives. Designers can furnish agents with plans to perform
desired tasks. Nevertheless, a designer cannot possibly foresee all circumstances that will
be encountered by the agent. Therefore, in addition to supplying an agent with plans, it
is essential to also enable the agent to learn and modify its plans to adapt to unforeseen
circumstances. The introduction of learning, however, often makes the agent's behavior
significantly harder to predict.¹

The goal of this research is to verify the behavior of adaptive
agents. In particular, our objective is to develop efficient methods for determining whether
the behavior of learning agents remains within the bounds of prespecified constraints (called
"properties") after learning. This includes verifying that properties are preserved for single
adaptive agents as well as verifying that global properties are preserved for multiagent
systems in which one or more agents may adapt.
An example of a property is Asimov's First Law (Asimov, 1950). This law, which
has also been studied by Weld and Etzioni (1994), states that an agent may not harm a
1. Even adding a simple, elegant learning mechanism such as chunking in Soar can substantially reduce
system predictability (Soar project members, personal communication).
© 2000 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
human or allow a human to come to harm. The main contribution of Weld and Etzioni is
a "'call to arms': before we release autonomous agents into real-world environments, we
need some credible and computationally tractable means of making them obey Asimov's
First Law...how do we stop our artifacts from causing us harm in the process of obeying
our orders?" Of course, this law is too general for direct implementation and needs to be
operationalized into specific properties testable on a system, such as "Never delete a user's
file." This paper addresses Weld and Etzioni's call to arms in the context of adaptive agents.
To respond to the call to arms, we are working toward "Asimovian" adaptive agents, which
we define to be adaptive agents that can verify, in a reasonably efficient manner, whether
user-defined properties are preserved after adaptation.² Such agents will either constrain
their adaptation methods, or repair themselves in such a way as to preserve these properties.
The verification method assumed here, model checking, consists of building a finite
model of a system and checking whether the desired property holds in that model. In the
context of this paper, model checking determines whether S ⊨ P for plan S and property
P, i.e., whether plan S "models" (satisfies) property P. The output is either "yes" or "no"
and, if "no," one or more counterexamples are provided. Model checking has proven to be
very effective for safety-critical applications; e.g., a model checker uncovered a potentially
disastrous error in a system designed to make buildings more earthquake resistant. This
error would have unleashed a structural force that worsened earthquake vibrations, rather
than dampening them (Elseaidy et al., 1994).
Essentially, model checking is brute-force search through the set of all reachable states of
the plan to check whether the property holds. If the plan has a finite number of states, this
process terminates. Model checking global properties of a multiagent plan has time complexity
that is exponential in the number of agents.³ With a large number of agents, this could be
a serious problem. In fact, even model checking a single-agent plan with a huge number
of states can be computationally prohibitive. A great deal of research in the verification
community is currently focused on reduction techniques for handling very large state spaces
(Clarke & Wing, 1997). One of the largest systems model checked to date using these
reduction techniques had 10¹²⁰ states (Burch et al., 1994). Nevertheless, the applicability
of many of these reduction techniques is restricted, and few are completely automated.
Furthermore, none of them are tailored for efficient reverification after learning has altered
the system. Some methods in the literature are designed for software that changes. One
that emphasizes efficiency, as ours does, is Sokolsky and Smolka's (1994). However, none
of them, including Sokolsky and Smolka's method, are applicable to multiagent systems in
which a single agent could adapt, thereby altering the global behavior of the overall system.
In contrast, our approach addresses the timeliness of adaptive multiagent systems.
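The brute-force search just described can be sketched as a breadth-first traversal of the reachable joint states. The encoding below (a successor map over states, a predicate for the property) is an illustrative assumption, not the paper's implementation:

```python
from collections import deque

def check_invariant(initial_states, transitions, holds):
    """Explicit-state model check of an invariance property.

    transitions: dict mapping a state to an iterable of successor states.
    holds: predicate that is True iff the property is satisfied in a state.
    Returns (True, None) if every reachable state satisfies the property,
    otherwise (False, path) where path is a counterexample trace.
    """
    frontier = deque((s, [s]) for s in initial_states)
    visited = set(initial_states)
    while frontier:
        state, path = frontier.popleft()
        if not holds(state):
            return False, path          # counterexample for the user
        for nxt in transitions.get(state, ()):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return True, None                   # terminates: finitely many states

# Toy two-agent product plan: joint states are (F_state, I_state) pairs.
trans = {
    ("COLLECTING", "RECEIVING"): [("DELIVERING", "DELIVERING")],
    ("DELIVERING", "DELIVERING"): [("COLLECTING", "RECEIVING")],
}
ok, cex = check_invariant(
    [("COLLECTING", "RECEIVING")], trans,
    holds=lambda s: s != ("DELIVERING", "RECEIVING"))  # forbidden joint state
print(ok)  # True: the forbidden joint state is unreachable
```

The exponential blowup mentioned above appears here in the size of the joint state space, which is the Cartesian product of the individual agents' state sets.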
Consider how reverification fits into our overall adaptive agents framework. In this
framework (see Figure 1), there are one or more agents with "anytime" plans (Grefenstette
& Ramsey, 1992), i.e., plans that are continually executed in response to internal and
external environmental conditions. Each agent's plan is assumed to be in the form of a
finite-state automaton (FSA). FSAs have been shown to be effective representations of
2. They are also called APT agents because they are adaptive, predictable and timely.
3. The states in a multiagent plan are formed by taking the Cartesian product of states in the individual
agent plans (see Section 3).
OFFLINE:
(1) Develop initial agent plan(s)
(2) If SIT_1plan or SIT_multplans, form multiagent plan
(3) Verify (multi)agent plan
(4) Repair plan(s) if properties not satisfied

ONLINE:
(5) Learning modifies (multi)agent plan
(6) If SIT_multplans, re-form multiagent plan
(7) Rapidly reverify (multi)agent plan
(8) Choose another learning operator or repair plan(s) if properties not satisfied

Figure 1: Verifiable adaptive agents.
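The eight steps of Figure 1 can be paraphrased as a control loop. In this sketch the helper callables (form_product, verify, repair, learn) are hypothetical stand-ins supplied by the caller, and the toy run at the end exercises only the control flow, not real plans:

```python
def apt_loop(plans, form_product, verify, repair, learn, online_steps=3):
    """Sketch of the Figure 1 workflow for the multi-plan case.
    All four helpers are caller-supplied stand-ins, not the paper's code."""
    # OFFLINE: steps 2-4 (repair until the product plan verifies)
    product = form_product(plans)
    while not verify(product):
        plans = repair(plans)
        product = form_product(plans)
    # ONLINE: steps 5-8, repeated while the fielded agents execute
    history = []
    for _ in range(online_steps):
        plans = learn(plans)                 # step 5: adapt a plan
        product = form_product(plans)        # step 6: re-form the product
        if not verify(product):              # step 7: rapid reverification
            plans = repair(plans)            # step 8: repair on failure
            product = form_product(plans)
        history.append(product)
    return history

# Toy run: "plans" are numbers, the "product" is their sum,
# and the property is "the product is even".
runs = apt_loop([2, 4],
                form_product=lambda ps: sum(ps),
                verify=lambda prod: prod % 2 == 0,
                repair=lambda ps: [p + 1 for p in ps],
                learn=lambda ps: [p + 1 for p in ps])
print(runs)  # [8, 10, 12]: each learning step preserves the invariant
```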
reactive agent plans/strategies (Burkhard, 1993; Kabanza, 1995; Carmel & Markovitch,
1996; Fogel, 1996).
Let us begin with step 1 in Figure 1. There are at least a couple of ways that the FSA
plans could be formed initially. For one, a human plan designer could engineer the initial
plans. This may require considerable effort and knowledge. An appealing alternative is to
evolve (i.e., learn using evolutionary algorithms) the initial plans in a simulated environment.
Fogel (1996) outlines a procedure for evolving FSAs that is effective for a number of
problems, including an iterated version of the Prisoner's Dilemma.
Human plan engineers or evolutionary algorithms can develop plans that satisfy an
agent's goals to a high degree. However, to provide strict behavioral guarantees, formal
verification is also required. Therefore we assume that prior to fielding the agents, the
(multi)agent plan has been verified offline to determine whether it satisfies critical properties
(steps 2 and 3). If not, the plan is repaired (step 4). Plan repair is not addressed in this
paper, although it is an important topic for future research. Steps 2 through 4 require some
clarification. If there is a single agent, then it has one FSA plan and that is all that is
verified and repaired, if needed. We call this SIT_1agent. (This notation, as well as other
notation used in the paper, is included in the glossary of Appendix A.) If there are multiple
agents that cooperate, we consider two possibilities. In SIT_1plan, every agent uses the same
multiagent plan, which is a "product" of the individual agent plans. This multiagent plan
is formed and verified to see if it satisfies global multiagent coordination properties. The
multiagent plan is repaired if verification produces any errors, i.e., failure of the plan to
satisfy a property. In SIT_multplans, each agent independently uses its own individual plan.
To verify global properties, one of the agents takes the product of these individual plans to
form a multiagent plan. This multiagent plan is what is verified. For SIT_multplans, one or
more individual plans are repaired if the property is not satisfied.
After the initial plan(s) have been verified and repaired, the agents are fielded. While
fielded (online), the agents apply learning (e.g., evolutionary operators) to their plan(s)
as needed (step 5). Learning may be required to adapt the plan to handle unexpected
situations or to fine-tune the plan. If SIT_1agent or SIT_1plan, the single (multi)agent plan
is adapted. If SIT_multplans, an agent adapts its own FSA, after which the multiagent
(product) plan is re-formed. For all situations, one agent then rapidly reverifies the new
(multi)agent plan to ensure it still satisfies the required properties (steps 6 and 7). Re-formation
of the multiagent plan and reverification are required to be as time-efficient
as possible because they are performed online, perhaps in a highly time-critical situation.
Whenever (re)verification fails, it produces a counterexample that is used to guide the choice
of an alternative learning operator or other plan repair as needed (step 8). This process of
executing, adapting, and reverifying plans cycles indefinitely as needed. The main focus of
this paper is steps 6 and 7.
Rapid reverification after learning is a key to achieving timely agent responses. Our long-term
goal is to examine all learning methods and important property classes to determine
the quickest reverification method for each combination of learning method and property
class. In this paper we present new results that certain useful learning operators are a
priori guaranteed to be "safe" with respect to important classes of properties. In other
words, if the property holds for the plan prior to learning, then it is guaranteed to still
hold after learning.⁴ If an agent uses these learning operators, it will be guaranteed to
preserve the properties with no reverification required, i.e., steps 6 through 8 in Figure 1
need not be executed. This is the best one could hope for in an online situation where rapid
response time is critical. For other learning operators and property classes our a priori
results are negative. However, for the cases in which we have negative results, we present
novel incremental reverification algorithms. These methods localize the reverification in
order to save time over total reverification from scratch.⁵ We also present a novel algorithm
for efficiently re-forming a multiagent plan, for the situation (SIT_multplans) in which there
are multiple agents, each learning independently.
The novelty of our approach is not in machine learning or veri�cation per se, but rather
the synthesis of the two. There are numerous important potential applications of our
approach. For example, if antiviruses evolve more e�ective behaviors to combat viruses, we
need to ensure that they do not evolve undesirable virus-like behavior. Another example is
data mining agents that can flexibly adapt their plans to dynamic computing environments
but whose behavior is adequately constrained for operation within secure or proprietary
domains. A third example is planetary rovers that adapt to unforeseen conditions while
remaining within critical mission parameters. Yet another example is automated factories
that adapt to equipment failures but continue operation within essential tolerances and
other speci�cations. Also, there are ongoing discussions at the Universities Space Research
Association about launching orbiting unmanned vehicles to run laboratory experiments.
The experiments would be semiautomated, and would thus require both adaptation and
behavioral assurances.
The last important application that we will mention is in the domain of power grid
and telecommunications networks. The following is an event that occurred (The New York
Times, September 21, 1991, Business Section). In 1991 in New York, local electric utilities
had a demand overload. In attempting to assist in solving the regional shortfall, AT&T
put its own generators on the local power grid. This was a manual adaptation, but such
4. This idea of property-preserving learning transformations was first introduced by Gordon (1998).
5. Incremental methods are often used in computer science for improving the time-efficiency of software.
adaptations are expected to become increasingly automated in the future. As a result of
AT&T's actions, there was a local power overload and AT&T lost its own power, which
resulted in a breakdown of the AT&T regional communications network. The regional net-
work breakdown propagated to create a national breakdown in communications systems.
This breakdown also triggered failures of many other control networks across the country,
such as the air traffic control network. Air travel nationwide was shut down. In the future,
it is reasonable to expect that some network controllers will be implemented using multiple,
distributed cooperating software agents. This example dramatically illustrates the potential
vulnerability of our national resources unless these agents satisfy all of the following
criteria: continuous execution/monitoring, flexible adaptation to failures, safety/reliability,
and timely responses. Our approach ensures that agents satisfy all of these.
This paper is organized as follows. Section 2 provides an illustrative example that is used
throughout the paper. Section 3 has the necessary background definitions of FSAs, property
types, formal verification, and machine learning operators. A priori results for specific
machine learning operators are in Section 4. These learning operators alter automaton edges
and the transition conditions associated with edges. A transition condition specifies the
condition under which a state-to-state transition may be made. We present positive a priori
results for some of these operators, where a "positive a priori result" means that the learning
operator preserves a specified class of properties. On the other hand, counterexamples are
presented to show that some of the learning operators do not necessarily preserve these
properties. Section 5 extends the a priori results for the multiagent situation SIT_multplans.
For all cases where we obtain negative a priori results, Section 6 provides incremental
algorithms for re-forming the multiagent plan and reverifying it, along with a worst-case
complexity analysis and empirical time complexity results. The empirical results show
as much as a half-billion-fold speedup for one of the incremental algorithms over standard
verification. The paper concludes with a discussion of related work and ideas for future
research.
2. Illustrative Example
We begin with a multiagent example for SIT_1plan or SIT_multplans that is used throughout the
paper to illustrate the definitions and ideas. The section starts by addressing SIT_multplans,
where multiple agents have their own independent plans. Later in the section we address
SIT_1plan, where each agent uses a joint multiagent plan.

Imagine a scenario where a vehicle has landed on a planet for the purpose of exploration
and sample collection, for example as in the Pathfinder mission to Mars. Like the Pathfinder,
there is a lander (called agent "L") from which a mobile rover emerges. However, in this
case there are two rovers: the far ("F") rover for distant exploration, and the intermediary
("I") rover for transferring data and samples from F to L.

We assume an agent designer has developed the initial plans for F, I, and L, shown
in Figures 2 and 3. These are simplified, rather than realistic, plans, for the purpose of
illustration. Basically, rover F is either collecting samples/data (in state COLLECTING) or
it is delivering them to rover I (when F is in its state DELIVERING). Rover I can either be
receiving samples/data from rover F (when I is in its RECEIVING state) or it can deliver
them to lander L (when it is in its DELIVERING state). If L is in its RECEIVING state,
[Figure: two FSAs. Rover F's plan has states COLLECTING (allowed actions {F-collect, F-deliver}) and DELIVERING ({F-deliver}); rover I's plan has states RECEIVING ({I-receive}) and DELIVERING ({I-deliver}). Edges are labeled with transition conditions such as F-collect ∧ I-deliver, F-deliver ∧ I-receive, F-deliver ∧ I-receive ∧ L-transmit, I-deliver ∧ ¬L-transmit, and "else".]

Figure 2: Plans for rovers F (left) and I (right).
[Figure: lander L's FSA with states TRANSMITTING (allowed actions {L-transmit, L-pause}), PAUSING ({L-pause}), and RECEIVING ({L-receive}). Edges are labeled with transition conditions such as I-receive ∧ L-transmit, I-receive ∧ L-pause, I-deliver ∧ L-receive, and "else".]

Figure 3: Plan for the lander L.
then it can receive the samples/data from I. Otherwise, L could be busy transmitting data
to Earth (in state TRANSMITTING) or pausing between actions (in state PAUSING).
As mentioned above, plans are represented using FSAs. An FSA has a finite set of states
(i.e., the vertices) and allowable state-to-state transitions (i.e., the directed edges between
vertices). The purpose of having states is to divide the agent's overall task into subtasks.
A state with an incoming arrow not from any other state is an initial state. Plan execution
begins in an initial state.

Plan execution occurs as the agent takes actions, such as agent F taking action F-collect
or F-deliver. Each agent has a repertoire of possible actions, a subset of which may be
taken from each of its states. A plan designer can specify this subset for each state. The
choice of a particular action from this subset is modeled in the FSA as nondeterministic.
It is assumed that further criteria, not specified here, are used to make the final run-time
choice of a single action from a state.
Let us specify the set of actions for each of the agents (F, I, L) in our example. F has
two possible actions: F-collect and F-deliver. The first action means that F collects samples
and/or data, and the second action means that it delivers these items to I. Rover I also
has two actions: I-receive and I-deliver. The first action means I receives samples/data
from F, and the second means that it delivers these items to L. L has three actions:
L-transmit, L-pause, and L-receive. The first action means L transmits data to Earth, the
second that it pauses between operations, and the third that it receives samples/data from
I. For each FSA, the set of allowable actions from each state is specified in Figures 2 and 3
in small font next to the state. For example, rover F can only take action F-deliver from
its DELIVERING state.
The transition conditions (i.e., the logical expressions labeling the edges) in an FSA plan
describe the set of actions that enable a state-to-state transition to occur. The operator ∧
means "AND," ∨ means "OR," and ¬ means "NOT." The condition "else" will be defined
shortly. The transition conditions of one agent can refer to the actions of one or more other
agents. This is because each agent is assumed to be reactive to what it has observed other
agents doing. If their actions are not visible, agents communicate their action choices.
Once an agent's action repertoire and its allowable actions from each state have been
defined, "else" can be defined. The transition condition "else" labeling an outgoing edge
from a state is an abbreviation denoting the set of all remaining actions that may be taken
from the state that are not already covered by other transition conditions. For example,
in Figure 3, L's three transition conditions from state TRANSMITTING are (I-receive ∧
L-transmit), (I-receive ∧ L-pause), and "else." L can only take L-transmit or L-pause from
this state. However, rover I could take I-deliver instead of I-receive. Therefore, in this case
"else" is equivalent to ((I-deliver ∧ L-transmit) ∨ (I-deliver ∧ L-pause)).
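The expansion of "else" amounts to a set difference over joint actions. A minimal sketch, assuming joint actions are encoded as (own action, other agent's action) pairs:

```python
from itertools import product

def else_condition(own_actions, other_actions, explicit):
    """Expand "else" for one state: every joint action (own, other) that is
    possible from the state but not covered by an explicit transition
    condition. `explicit` is the set of (own, other) pairs that already
    label the state's other outgoing edges."""
    all_joint = set(product(own_actions, other_actions))
    return all_joint - explicit

# L in state TRANSMITTING (the example from Figure 3):
explicit = {("L-transmit", "I-receive"), ("L-pause", "I-receive")}
els = else_condition({"L-transmit", "L-pause"},   # L's actions allowed here
                     {"I-receive", "I-deliver"},  # I's full repertoire
                     explicit)
print(sorted(els))
# [('L-pause', 'I-deliver'), ('L-transmit', 'I-deliver')]
```

This reproduces the text's expansion: ((I-deliver ∧ L-transmit) ∨ (I-deliver ∧ L-pause)).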
An FSA plan represents a set of allowable action sequences. In particular, a plan is the
set of all action sequences that begin in an initial state and obey the transition conditions.
An example action sequence allowed by F's plan is ((F-collect ∧ I-deliver), (F-collect ∧
I-receive), (F-deliver ∧ I-receive), ...), where F takes its actions and observes I's actions at
each step in the sequence.
At run time, these FSA plans are interpreted in the following manner. At every discrete
time step, every agent (F, I, L) is at one of the states in its plan, and it selects the next
action to take. Agents choose their actions independently. They do not need to synchronize
on action choice. The choice of action might be based, for example, on sensory inputs from
the environment. Although a complete plan would include the basis for action choice, as
mentioned above, here we leave it unspecified in the FSA plans. Our rationale for doing
this is that the focus of this paper is on the verification of properties about correct
action sequences. The basis for action choice is irrelevant to these properties.

Once each agent has chosen an action, all agents are assumed to observe the actions
of the other agents that are mentioned in their FSA transition conditions. For example, F's
transition conditions mention I's actions, so F needs to observe what I did. Based on its
own action and those of the other relevant agent(s), an agent knows the next state to which
it will transition. There is only one possible next state because the FSAs are assumed to
be deterministic. For example, if F is in its COLLECTING state, and it chooses action
F-collect, and it observes I taking action I-deliver, then it will stay in its COLLECTING
state. The process of being in a state, choosing an action, observing the actions of other
agents, then moving to a next state, is repeated indefinitely.
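One such deterministic step can be sketched as a table lookup. Only the first table entry below is stated explicitly in the text; the other two are hypothetical entries added for illustration:

```python
def step(fsa, state, own_action, observed):
    """Deterministic transition: given the agent's current state, its chosen
    action, and the observed action of the other relevant agent, there is
    exactly one next state. `fsa` maps (state, own, observed) -> next state."""
    return fsa[(state, own_action, observed)]

# Fragment of rover F's plan. The first entry matches the example in the
# text; the remaining entries are hypothetical.
F = {
    ("COLLECTING", "F-collect", "I-deliver"): "COLLECTING",
    ("COLLECTING", "F-deliver", "I-receive"): "DELIVERING",
    ("DELIVERING", "F-deliver", "I-receive"): "DELIVERING",
}
print(step(F, "COLLECTING", "F-collect", "I-deliver"))  # COLLECTING
```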
So far, we have been assuming SIT_multplans, where each agent has its own individual
plan. If we assume SIT_1plan, then each agent uses the same multiagent plan to decide its
actions. A multiagent plan is formed by taking a "product" (defined in Subsection 3.1)
of the plans for F, I, and L. This product models the synchronous behavior of the agents,
where "synchronous" means that at each time step every agent takes an action, observes
actions of other agents, and then transitions to a next state. The product plan is formed,
essentially, by taking the Cartesian product of the individual automaton states and the
intersection of the transition conditions. Multiagent actions enable state-to-state transitions
in the product plan. For example, if the agents jointly take the actions F-deliver and
I-receive and L-transmit, then all agents will transition from the joint state (COLLECTING,
RECEIVING, TRANSMITTING) to the joint state (DELIVERING, DELIVERING, RECEIVING),
represented by triples of states in the FSAs for F, I, and L. A multiagent plan
consists of the set of all action sequences that begin in a joint initial state of the product
plan and obey the transition conditions.
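The product construction (Cartesian product of states, intersection of transition conditions) can be sketched as follows. This assumes transition conditions are encoded extensionally as sets of joint actions; the two one-letter FSAs at the end are toy data, not the rover plans:

```python
from itertools import product as cartesian

def product_plan(fsas):
    """Form the multiagent (product) plan. Joint states are tuples of
    individual states; the condition on a joint edge is the intersection
    of the individual transition conditions. Each FSA is a dict
    {state: [(cond, next_state), ...]}, where cond is a frozenset of
    joint actions (atoms of the product algebra) enabling the edge."""
    joint = {}
    for js in cartesian(*fsas):  # Cartesian product of state sets
        edges = []
        for combo in cartesian(*(f[s] for f, s in zip(fsas, js))):
            cond = frozenset.intersection(*(c for c, _ in combo))
            if cond:  # a joint edge exists only if some joint action enables it
                edges.append((cond, tuple(n for _, n in combo)))
        joint[js] = edges
    return joint

# Toy two-agent example; joint actions are (a_action, b_action) pairs.
A = {"s0": [(frozenset({("a1", "b1")}), "s1"),
            (frozenset({("a1", "b2"), ("a2", "b1"), ("a2", "b2")}), "s0")],
     "s1": [(frozenset({("a1", "b1"), ("a1", "b2"),
                        ("a2", "b1"), ("a2", "b2")}), "s0")]}
B = {"t0": [(frozenset({("a1", "b1"), ("a2", "b1")}), "t1"),
            (frozenset({("a1", "b2"), ("a2", "b2")}), "t0")],
     "t1": [(frozenset({("a1", "b1"), ("a1", "b2"),
                        ("a2", "b1"), ("a2", "b2")}), "t0")]}
joint = product_plan([A, B])
print(len(joint[("s0", "t0")]))  # 3 enabled joint edges from (s0, t0)
```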
Whether the situation is SIT_multplans or SIT_1plan, a multiagent plan needs to be formed
to verify global multiagent coordination properties (see step 2 of Figure 1). Verification of
global properties consists of asking whether all of the action sequences allowed by the
product plan satisfy the property.
One class of (global) properties of particular importance, which is addressed here, is that
of forbidden multiagent actions that we want our agents to always avoid, called Invariance
properties. An example is property P1: ¬(I-deliver ∧ L-transmit), which states that it
should always be the case that I does not deliver at the same time that L is transmitting.
This property prevents problems that may arise from the lander simultaneously receiving
new data from I while transmitting older data to Earth. The second important class
addressed here is Response properties. These properties state that if a particular multiagent
action (the "trigger") has occurred, then eventually another multiagent action (the necessary
"response") will occur. An example is property P2: if F-deliver has occurred, then
eventually L will execute L-receive.
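Over a finite prefix of an action sequence, the two property classes can be checked as follows. The sample sequence is illustrative only and is not claimed to be allowed by the rover plans:

```python
def invariant_ok(seq, forbidden):
    """Invariance: the forbidden multiagent action never occurs."""
    return all(not forbidden(a) for a in seq)

def response_ok(seq, trigger, response):
    """Response: every trigger is eventually followed by the response.
    Checked over a finite prefix here; the paper treats infinite strings.
    A response at the same step as a trigger also discharges it."""
    pending = False
    for a in seq:
        if trigger(a):
            pending = True
        if response(a):
            pending = False
    return not pending

# Joint actions as (F-action, I-action, L-action) triples.
seq = [("F-collect", "I-deliver", "L-pause"),
       ("F-deliver", "I-receive", "L-pause"),
       ("F-collect", "I-deliver", "L-receive")]
p1 = invariant_ok(seq, lambda a: a[1] == "I-deliver" and a[2] == "L-transmit")
p2 = response_ok(seq, lambda a: a[0] == "F-deliver",
                 lambda a: a[2] == "L-receive")
print(p1, p2)  # True True: this prefix satisfies both P1 and P2
```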
If the plans in Figures 2 and 3 are combined into a multiagent plan, will this multiagent
plan satisfy properties P1 and P2? Answering this question is probably difficult or impossible
for most readers if the determination is based on visual inspection of the FSAs. Yet
there are only a couple of very small, simple FSAs in this example! This illustrates how
even a few simple agents, when interacting, can exhibit complex global behaviors, thereby
making global agent behavior difficult to predict. Clearly there is a need for rigorous behavioral
guarantees, especially as the number and complexity of agents increases. Model
checking fully automates this process. According to our model checker, the product plan
for F, I, and L satisfies properties P1 and P2.
Rigorous guarantees are also needed after learning. Suppose lander L's transmitter
gets damaged. Then one learning operator that could be applied is to delete L's action
L-transmit, which thereafter prevents this action from being taken from state TRANSMITTING.
After applying a learning operator, reverification may be required. For this
particular operator (deleting an action), no reverification is needed (see Section 4).
In a multiagent situation, what gets modified by learning? Who forms and verifies the
product FSA? And who performs repairs if verification fails, and what is repaired? The
answers to these questions depend on whether it is SIT_1plan or SIT_multplans. If SIT_1plan,
the agent with the greatest computational power, e.g., lander L in our example, maintains
the product plan by applying learning to it, verifying it, repairing it as needed, and then
sending a copy of it to all of the agents to use. If SIT_multplans, an agent applies learning to
its own individual plan. The individual plans are then sent to the computationally powerful
agent, who forms the product and verifies that properties are satisfied. If repairs are needed,
one or more agents repair their own individual plans.
It is assumed here that machine learning operators are applied one at a time per agent
rather than in batch and, if SIT_multplans, the agents co-evolve plans by taking turns learning
(Potter, 1997). Beyond these assumptions, this paper does not focus on the learning operators
per se (other than to define them). It focuses instead on the outcome resulting from the
application of a learning operator. In particular, we address the reverification issue. The
next section gives useful background definitions needed for understanding reverification.
3. Preliminary Definitions

This section provides definitions of FSAs, properties, verification, and machine learning
operators. For a clear, unambiguous understanding of the results in this paper, many of
these definitions are formal.
3.1 Automata for Agents' Plans

FSAs have at least four advantages over classical plans (Nilsson, 1980; Dean & Wellman,
1991). For one, unlike classical plans, the type of finite-state automaton plans used here
allows potentially infinite (indeterminate) length action sequences.⁶ This provides a good
model of embedded agents that are continually responsive to their environment without
any artificial termination to their behavior. Execution and learning may be interleaved in
a natural manner. Another advantage is that FSA plans have states, and the plan designer
can use these states to represent subtasks of the overall task. This subdivides the plan into
smaller units, thereby potentially increasing the comprehensibility of the plans. States also
enable different action choices at different times, even if the sensory inputs are the same.
A third advantage of FSA plans is that they are particularly well-suited to modeling the
concurrent behavior of multiple agents. An arbitrary number of single-agent plans can be
developed independently and then composed into a synchronous multiagent plan (for which
global properties may be tested) in a straightforward manner. Finally, FSA plans can be
verified using the very popular and effective automata-theoretic model checking methods,
e.g., see Kurshan (1994).
A disadvantage of FSA plans as opposed to classical plans is that there is a great deal
of research that has been done on automatically forming classical plans, e.g., see Dean and
Wellman (1991). It is unclear how much of this might be applicable to FSAs. On the
other hand, evolutionary algorithms can be used to evolve FSA plans (Fogel, 1996). A
disadvantage of FSA plans as opposed to plans composed of rule sets is that the latter may
express a plan more succinctly. Nevertheless, for plans that require formal verification, FSAs
are preferable because the complex interactions that can occur between rules make them
very hard to verify. Formal verification for FSAs is quite sophisticated and widely used in
safety-critical industrial applications.
This subsection, which is based on Kurshan (1994), briefly summarizes the basics of the
FSAs used to model agent plans. Figures 2 and 3 illustrate the definitions. This paper
focuses on FSAs that model agents with a potentially infinite lifetime, represented as an
infinite-length "string" (i.e., a sequence of actions).

6. Results for agents with finite lifetimes may be found in Gordon (1998, 1999).
Before beginning our discussion of automata, we briefly digress to define Boolean algebra.
Examples throughout this paper have automaton transition conditions expressed
in Boolean algebra, because Boolean algebra succinctly summarizes these transition conditions.
Boolean algebra is also useful for succinctly expressing the properties. Furthermore,
it is easier for us to describe two of the incremental reverification algorithms if we use
Boolean algebra notation. Therefore, we briefly summarize the basics of Boolean algebra.

A Boolean algebra K is a set of elements with distinguished elements 0 and 1, closed
under the Boolean ∧, ∨, and ¬ operations, and satisfying the standard properties (Sikorski,
1969). For elements x and y of K, x ∧ y is called the meet of x and y, x ∨ y is called the join
of x and y, and ¬x is called the complement of x. For those readers who are unfamiliar with
Boolean algebras and who want some intuition for these operations, it may help to imagine
that each element of K is itself a set, e.g., a set of actions. Meet, join, and complement
would then be set intersection, union, and complement, respectively. Elements 0 and 1,
in this case, would be the empty set (∅) and the set of all elements in the universe (U),
respectively.
The Boolean algebras are assumed to be �nite. There is a partial order among the
elements, �, which is de�ned as x � y if and only if x^ y = x. It may help to think of � as
analogous to � for sets. The elements 0 and 1 are de�ned as 8x 2 K, 0 � x, and 8x 2 K,
x � 1. The atoms (analogous to single-element sets) of K, �(K), are the nonzero elements
of K minimal with respect to �. In the rovers example, agents F, I, and L each have their
own Boolean algebra with its atoms. The atoms of F's Boolean algebra are its actions
F-collect and F-deliver; the atoms of I's algebra are I-receive and I-deliver; the atoms of L's
algebra are L-transmit, L-pause, and L-receive. The element (F-collect _ F-deliver) of F's
Boolean algebra describes the set of actions fF-collect, F-deliverg.
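To make the set-based intuition concrete, here is a minimal sketch (our own illustration, not from the paper) that models F's finite Boolean algebra as the powerset of its atoms, with meet, join, complement, and ≤ realized as set operations:

```python
# F's Boolean algebra modeled as the powerset of its atoms: elements are
# frozensets of atoms, 0 is the empty set, 1 is the full atom set, and
# meet/join/complement are intersection/union/set-complement.
ATOMS_F = frozenset({"F-collect", "F-deliver"})  # atoms of F's algebra

def meet(x, y):                       # x ∧ y
    return x & y

def join(x, y):                       # x ∨ y
    return x | y

def complement(x, universe=ATOMS_F):  # ¬x
    return universe - x

def leq(x, y):                        # x ≤ y  iff  x ∧ y = x
    return meet(x, y) == x

zero = frozenset()                    # element 0
one = ATOMS_F                         # element 1

# The element (F-collect ∨ F-deliver) is the set of both actions, i.e., 1:
elem = join(frozenset({"F-collect"}), frozenset({"F-deliver"}))
print(elem == one)  # True
```

Every finite Boolean algebra is isomorphic to such a powerset algebra, so this encoding loses nothing for the finite algebras assumed here.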
A Boolean algebra K_i is a subalgebra of K if K_i is a nonempty subset of K that is closed
under the operations ∧, ∨, and ¬, and also has the distinguished elements 0 and 1. ∏ K_i
is the product algebra of subalgebras K_i. An atom of the product algebra is the meet of
the atoms of the subalgebras. For example, if a_1, ..., a_n are atoms of subalgebras K_1, ..., K_n,
respectively, then a_1 ∧ ... ∧ a_n is an atom of ∏_{i=1}^{n} K_i.

The Boolean algebra K_F for agent F's actions is the smallest one containing the atoms
of F's algebra. It contains all Boolean elements formed from F's atoms using the Boolean
operators ∧, ∨, and ¬, including 0 and 1. These same definitions hold for I and L's
algebras K_I and K_L. K_F ⊗ K_I ⊗ K_L is the product algebra used for all transition conditions
in the multiagent plan (i.e., the product of the F, I, and L FSAs). One atom of the
product algebra K_F ⊗ K_I ⊗ K_L is (F-collect ∧ I-receive ∧ L-pause). This is the form of actions
taken simultaneously by the three agents. Algebras K_F, K_I, and K_L are subalgebras of the
product algebra K_F ⊗ K_I ⊗ K_L.
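The product-algebra atoms can be enumerated mechanically. In the sketch below (our own encoding, not the paper's), a meet such as (F-collect ∧ I-receive ∧ L-pause) is represented as a tuple containing one atom from each subalgebra:

```python
from itertools import product

# Atoms of the three subalgebras from the rovers example.
ATOMS_F = ["F-collect", "F-deliver"]
ATOMS_I = ["I-receive", "I-deliver"]
ATOMS_L = ["L-transmit", "L-pause", "L-receive"]

# Each atom of the product algebra K_F ⊗ K_I ⊗ K_L is a meet of one atom
# per subalgebra; here such a meet is encoded as a tuple.
product_atoms = list(product(ATOMS_F, ATOMS_I, ATOMS_L))

print(len(product_atoms))  # 2 * 2 * 3 = 12 possible joint actions
print(("F-collect", "I-receive", "L-pause") in product_atoms)  # True
```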
Let us return now to automata. Formally, an FSA of the type considered here is a
three-tuple S = (V(S), M_K(S), I(S)), where V(S) is the set of vertices (states) of S, K
is the Boolean algebra corresponding to S, M_K(S) : V(S) × V(S) → K is the matrix of
transition conditions, which are elements of K, and I(S) ⊆ V(S) are the initial states.^7
Also, E(S) = {e ∈ V(S) × V(S) | M_K(e) ≠ 0} is the set of directed edges connecting pairs of
vertices of S. M_K(e), which is an abbreviation for M_K(S)(e), is the transition condition of
M_K(S) corresponding to edge e. Note that we omit edges labeled "0." By our definition,
an edge whose transition condition is 0 does not exist. We can alternatively denote M_K(e)
as M_K(v_i, v_j) for the transition condition corresponding to the edge going from vertex v_i to
vertex v_j. For example, in Figure 3, M_K((TRANSMITTING, PAUSING)) is (I-receive ∧
L-pause).

7. There should also be an output subalgebra, as in Kurshan (1994). This would help distinguish an agent's
own actions from those of other agents. However, it is omitted here for notational simplicity.

Figure 4: Part of the product plan for agents F, I, and L. [Figure: an edge labeled
(F-deliver ∧ I-receive ∧ L-transmit) from state (COLLECTING, RECEIVING, TRANSMITTING)
to state (DELIVERING, DELIVERING, RECEIVING).]
Figures 2 and 3 illustrate these FSA definitions. There are FSA plans for three agents,
F, I, and L with vertices, edges, and transition conditions. An incoming arrow to a state,
not from any other state, signifies that this is an initial state.
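The three-tuple definition can be sketched directly in code. This is a minimal encoding of our own, not the paper's implementation: transition conditions are represented as sets of atoms, an absent matrix entry plays the role of condition 0, and E(S) is derived from M_K. The plan fragment below is abbreviated for illustration, not copied from Figure 2:

```python
# Minimal FSA encoding S = (V, M_K, I): vertices, a transition-condition
# matrix M (edge -> set of atoms; a missing or empty entry means condition 0),
# and initial states I. E(S) is derived as the edges with nonzero condition.
class FSA:
    def __init__(self, vertices, M, initial):
        self.V = set(vertices)
        self.M = {e: frozenset(c) for e, c in M.items() if c}  # drop 0-edges
        self.I = set(initial)

    def E(self):
        return set(self.M)  # edges e with M_K(e) != 0

# An abbreviated two-state plan fragment for rover I:
I_plan = FSA(
    vertices={"RECEIVING", "DELIVERING"},
    M={("RECEIVING", "DELIVERING"): {"I-receive"},
       ("DELIVERING", "RECEIVING"): {"I-deliver"},
       ("RECEIVING", "RECEIVING"): set()},       # condition 0: no such edge
    initial={"RECEIVING"},
)
print(("RECEIVING", "RECEIVING") in I_plan.E())  # False: 0-edges do not exist
```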
A multiagent plan is formed from single agent plans by taking the tensor product (also
called the "synchronous product" or simply "product") of the FSAs corresponding to the
individual plans. Formally, the tensor product is defined as:

⊗_{i=1}^{n} S_i = (× V(S_i), ⊗_i M(S_i), × I(S_i))

where × is the Cartesian product, and the tensor product M(S_1) ⊗ ... ⊗ M(S_n) of n tran-
sition matrices is defined as M(S_1) ⊗ ... ⊗ M(S_n)((v_1, v_1′), ..., (v_n, v_n′)) = M(S_1)(v_1, v_1′)
∧ ... ∧ M(S_n)(v_n, v_n′) for (v_1, v_1′) ∈ E(S_1), ..., (v_n, v_n′) ∈ E(S_n). In words, the product
FSA is formed by taking the Cartesian product of the vertices and the intersection of the
transition conditions. Initial states of the product FSA are tuples formed from the initial
states of the individual FSAs.

The product FSA models a set of synchronous FSAs. The Boolean algebra correspond-
ing to the product FSA is the product algebra. For Figures 2 and 3, to formulate the FSA
S modeling the entire multiagent plan, we take the tensor product S = F ⊗ I ⊗ L of the
three FSAs. For this tensor product, I(S) = {(COLLECTING, RECEIVING, TRANS-
MITTING), (COLLECTING, RECEIVING, PAUSING), (COLLECTING, RECEIVING,
RECEIVING)}. Part of the tensor product FSA is shown in Figure 4.
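The construction above can be sketched for two FSAs under the same set-of-atoms encoding. This is our own illustrative version; the small F and I fragments are simplified stand-ins, not the FSAs of Figure 2:

```python
from itertools import product

def tensor_product(S1, S2):
    """Tensor product of two FSAs encoded as dicts
    {'V': vertices, 'M': {edge: set of atoms}, 'I': initial states}.
    A product condition is the meet of the component conditions, encoded
    as the set of joint-atom tuples; condition-0 edges are dropped."""
    M = {}
    for (u1, w1), c1 in S1["M"].items():
        for (u2, w2), c2 in S2["M"].items():
            M[((u1, u2), (w1, w2))] = {(a1, a2) for a1, a2 in product(c1, c2)}
    return {
        "V": {(v1, v2) for v1 in S1["V"] for v2 in S2["V"]},
        "M": {e: c for e, c in M.items() if c},
        "I": {(i1, i2) for i1 in S1["I"] for i2 in S2["I"]},
    }

# Simplified two-agent fragments (illustrative, not the paper's figures):
F = {"V": {"COLLECTING", "DELIVERING"},
     "M": {("COLLECTING", "COLLECTING"): {"F-collect"},
           ("COLLECTING", "DELIVERING"): {"F-deliver"}},
     "I": {"COLLECTING"}}
I2 = {"V": {"RECEIVING", "DELIVERING"},
      "M": {("RECEIVING", "RECEIVING"): {"I-receive"},
            ("RECEIVING", "DELIVERING"): {"I-deliver"}},
      "I": {"RECEIVING"}}

S = tensor_product(F, I2)
print(("COLLECTING", "RECEIVING") in S["I"])  # True
```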
Next we define the language of an FSA, which is the set of all action sequences permitted
by the FSA plan. To do this, we first define a string, which is a sequence of actions (atoms).
Formally, a string x is an infinite-dimensional vector, (x_0, ...) ∈ Λ(K)^ω, i.e., a string is an
infinite (ω) length sequence of actions (where K is the Boolean algebra used by S). A
run v of string x is a sequence (v_0, ...) of vertices such that ∀i, x_i ∧ M_K(v_i, v_{i+1}) ≠ 0,
i.e., x_i ≤ M_K(v_i, v_{i+1}) because the x_i are atoms. In other words, a run of a string is the
sequence of vertices visited in an FSA when the string satisfies the transition conditions
along the edges.
The language of FSA S is defined as:

L(S) = {x ∈ Λ(K)^ω | x has a run v = (v_0, ...) in S with v_0 ∈ I(S)}

Such a run is called an accepting run, and S is said to accept string x. Any requirement
on the accepting runs of an FSA is called the FSA acceptance criterion. In this case,
the acceptance criterion consists of one condition: accepting runs must begin in an initial
state. The verification literature calls these FSAs, which accept infinite-length strings,
ω-automata (Kurshan, 1994).
A few more definitions are needed. An FSA is complete if, for each state v ∈ V(S),
∨_{w ∈ V(S)} M_K(v, w) = 1. In other words, an FSA is complete if it specifies what state-to-state
transition the agent should make for all possible actions taken by the other agents. This is
a very reasonable assumption to make because otherwise the agent would not know what
to do in some circumstances. An FSA is deterministic at state v if w ≠ w′ ⇒ M_K(v, w)
∧ M_K(v, w′) = 0. In other words, the choice of action uniquely determines which edge will be
taken from a state. An FSA is deterministic if it is deterministic at each of its states. Unless
otherwise stated, it is assumed here that all FSAs are complete and deterministic. The
restriction to deterministic FSAs is not a major problem because for every nondeterministic
FSA there is a deterministic one accepting the same language (Kurshan, 1994).
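Both conditions can be checked mechanically under the set-of-atoms encoding: completeness asks that the join (union) of a state's outgoing conditions equal 1 (the full atom set), and determinism asks that distinct outgoing conditions have meet (intersection) 0. A sketch of our own, with abstract atoms rather than the rover actions:

```python
def is_complete(V, M, atoms):
    """Complete: at each state, the join of outgoing conditions is 1,
    i.e. every atom is covered by some outgoing edge."""
    for v in V:
        covered = set()
        for (u, w), c in M.items():
            if u == v:
                covered |= set(c)
        if covered != set(atoms):
            return False
    return True

def is_deterministic(V, M):
    """Deterministic at v: conditions on distinct outgoing edges have meet 0."""
    for v in V:
        out = [c for (u, w), c in M.items() if u == v]
        for i in range(len(out)):
            for j in range(i + 1, len(out)):
                if set(out[i]) & set(out[j]):
                    return False
    return True

V = {1, 2}
M = {(1, 1): {"a"}, (1, 2): {"b"}, (2, 1): {"a", "b"}}
print(is_complete(V, M, {"a", "b"}), is_deterministic(V, M))  # True True
```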
We also need the definition of a cycle in a graph. Model checking typically consists of
looking for cycles, as described in Section 3.3. A path in FSA S is a sequence of vertices
v = (v_0, ..., v_n) ∈ V(S)^{n+1}, for n ≥ 1, such that (v_i, v_{i+1}) ∈ E(S) for i = 0, ..., n − 1, i.e.,
M_K(v_i, v_{i+1}) ≠ 0. If v_n = v_0, then v is a cycle. Each cycle in an FSA plan allows the
possibility that the agent can infinitely often, or as long as desired, revisit the vertices of
the cycle. It also implies that a substring can be repeated indefinitely.
We next illustrate some of these definitions. An example string in the language of FSA
S, the multiagent FSA that is the product of F, I, and L, is
((F-collect ∧ I-receive ∧ L-transmit),
(F-deliver ∧ I-receive ∧ L-receive),
(F-deliver ∧ I-receive ∧ L-transmit),
(F-deliver ∧ I-deliver ∧ L-receive), ...).
This is a sequence of atoms of S. A run of this string is
((COLLECTING, RECEIVING, TRANSMITTING),
(DELIVERING, RECEIVING, RECEIVING),
(DELIVERING, RECEIVING, TRANSMITTING),
(DELIVERING, DELIVERING, RECEIVING),
(COLLECTING, RECEIVING, RECEIVING), ...).
All FSAs in Figures 2 and 3 are complete and deterministic. For example, in Figure 2,
rover I can only take action I-deliver from its DELIVERING state. However, every possible
action choice of L determines a unique next state for I from DELIVERING. For example,
if L takes L-transmit then I must stay in state DELIVERING, and if L takes L-receive or
L-pause then I must go to state RECEIVING.
3.2 Properties

Now that we have presented the FSA formalism used for agent plans, we can address the
question of how to formalize properties. For verification, properties are typically expressed
either as FSAs (for automata-theoretic verification) or in temporal logic. Here, we assume
linear temporal logic. In other words, we assume that time proceeds linearly and we do not
consider simultaneous possible futures. Using the algorithm of Vardi and Wolper (1986),
one can convert any linear temporal logic formula into an automaton (because automata
are more expressive than linear temporal logic). Both representations are used here. To
simplify our proofs in Section 4, properties are expressed in temporal logic. For some of the
incremental reverification methods in Section 6, we use automata-theoretic methods with
an FSA representation for the property.

Let us begin by defining temporal logic properties. Many of the definitions are based on
Manna and Pnueli (1991). To bridge the gap between automata (for plans) and temporal
logic (for properties), we need to define a computational state (c-state). A computation is
an infinite sequence of temporally-ordered atoms, i.e., a string. A c-state is an atom in a
computation. In other words, it is a (single or multiagent) action that occurs at a single
time step in a computation. We continue to refer to an automaton state as simply a "state."
P is a property that is true (false) for an FSA S, S ⊨ P (S ⊭ P), if and only if P
is true for every string in the language L(S) (false for some string in L(S)). The notation
x ⊨ P (x ⊭ P) means string x satisfies (does not satisfy) property P, i.e., the property
holds (does not hold) for x. Before defining what it means for properties to be true (i.e.,
hold) for a string, we first define what it means for a formula that is a Boolean expression
to be true at a c-state. A c-state formula p is true (false) at c-state x_i, i.e., x_i ⊨ p (x_i ⊭
p), if and only if x_i ≤ p (x_i ≰ p), i.e., x_i ∧ p ≠ 0 (= 0), because p is a Boolean expression
with no variables on the same Boolean algebra used by FSA S, and x_i is an atom of that
algebra. For example, F-collect ⊨ (F-collect ∨ F-deliver) for c-state F-collect and c-state
formula (F-collect ∨ F-deliver). One can also talk about a c-state formula being true or
false for an atom, since a c-state is an atom.

A c-state formula p is true or false in particular c-states of a string. Property P is
defined in terms of p, and is true or false of an entire string. In particular, x ⊨ P or x ⊭ P
for the string x.
We focus on two property classes that are among those most frequently encountered in
the verification literature: Invariance and Response properties. Invariance and Response
properties are likely to be useful for agents. For the case of a single agent (SIT_1agent), In-
variance properties can express the requirement that a particular action never be executed.^8
Response properties are also useful for a single agent. They can be used to verify that a
pair of the agent's actions will occur in the correct order (i.e., a "response" always fol-
lows a "trigger") in the plan. In the context of multiple agents (SIT_1plan or SIT_multplans),
Invariance properties express the need for parallel multiagent coordination. In particular,
they express that multiple agents should not simultaneously perform some conflicting set of
actions. Response properties express the need for sequential multiagent coordination. For
example, they can express the requirement that one agent's action must follow in response
to a particular "triggering" action of another agent.

8. This could alternatively be implemented as a run-time check, but then there would be no assurance that
the plan without the action is a good one, for example, in terms of how well the revised plan satisfies
the agent's goals (perhaps captured in a "fitness function"). Alternatively, the action (atom) could be
omitted from the set of actions Λ(K). But in general one may not wish to rule out actions, in case the
situation and/or properties might change.
Here, we only present informal definitions of these properties; the formal definitions are
in Appendix B. An Invariance property P = □¬p ("Invariant not p") is true of a string
if p is "never" true, i.e., if p is not true in any c-state of the string. P = □(p → ◇q)
is a Response property, where ◇ means "eventually." We call p the "trigger" and q the
"response." A Response formula states that every trigger is eventually (in finite time)
followed by a response.

To illustrate these property types, we continue the rovers and lander example. The
property P1 from Section 2, which states that it should always be the case that I does
not deliver at the same time that L is transmitting, is formally expressed as an Invariance
property P1 defined as: P1 = □(¬(I-deliver ∧ L-transmit)). Property P2 from Section 2,
which states that if F-deliver has occurred then eventually L will execute L-receive, is an
example of a Response property. This is expressed in temporal logic as P2 = □(F-deliver
→ ◇ L-receive).
Next consider the FSA representation for properties. As will be explained in Section 3.3
on verification, what we really need to express for automata-theoretic verification is the
negation of the property, i.e., ¬P. Strings in the language of FSA ¬P violate property P.
In this paper, we assume that ¬P is expressed using the popular Büchi ω-automaton (Büchi,
1962). We decided to use the Büchi FSA because one of the simplest and most elegant model
checking algorithms in the literature assumes this type of FSA for the property, and we use
that algorithm (see Subsections 3.3 and 6.1). A Büchi automaton is defined to be a four-
tuple S = (V(S), M_K(S), I(S), B(S)), where B(S) ⊆ V(S) is a set of "bad" states. To
define the language of a Büchi automaton, we require the following preliminary definition.
For a run v of FSA S, inf(v) = {v ∈ V(S) | v_i = v for infinitely many v_i's in run v}. In
other words, inf(v) equals the set of all vertices of S that occur infinitely often in the run v.
Then for a Büchi automaton S, L(S) = {x ∈ Λ(K)^ω | x has a run v = (v_0, ...) in S with
v_0 ∈ I(S) and inf(v) ∩ B(S) ≠ ∅}. In other words, the Büchi automaton has an acceptance
criterion that requires visiting some bad state infinitely often, as well as beginning in an
initial state.

An example deterministic Büchi FSA for ¬P1, where Invariance property P1 = □¬(I-
deliver ∧ L-transmit), is in Figure 5 (on the left) with B(¬P1) = {2}. Note that visiting a
state in B(¬P1) infinitely often implies Büchi acceptance, and because the FSA expresses
the negation of the property, visiting a "bad" state in B(¬P1) infinitely often is undesirable.
From Figure 5 we can see that any string that includes (I-deliver ∧ L-transmit) will visit
state 2 infinitely often, and B(¬P1) = {2}. Thus any string that starts in state 1 and
includes (I-deliver ∧ L-transmit) is in L(¬P1) and therefore violates property P1.

Next consider Response properties of the form □(p → ◇q). For this paper, the only
type of FSA that we need for verifying Response properties is the very simple deterministic
Büchi FSA for the negation of a "First-Response" property.^9 (Determinism is needed for
our efficient internal representation. See Subsection 6.1.) A First-Response property checks
whether the first trigger p in every string is followed by a response q. Figure 5 (on the right)
shows a Büchi FSA for the First-Response property corresponding to ¬P2, where property
P2 = □(F-deliver → ◇ L-receive). For this FSA, B(¬P2) = {2}. Any string whose
accepting run visits state 2 infinitely often will include the first trigger and not the response
that should follow it. As discussed in Subsection 6.5, verifying First-Response properties
can in some circumstances (including all of our experiments) be equivalent to verifying the
full Response property □(p → ◇q). Henceforth, when we use the term "Response" this is
assumed to include both the full Response and the First-Response versions.

9. A straightforward inductive argument shows that it is not possible to construct a deterministic Büchi
automaton with a finite number of states for the negation of the full Response property □(p → ◇q)
(Mahesh Viswanathan, personal communication).

Figure 5: Invariance property ¬P1 (left) and the First-Response version of property ¬P2
(right) as Büchi FSAs, where B(S) = {2} for both automata. [Left FSA: initial state 1
loops on "else" and moves to state 2 on (I-deliver ∧ L-transmit); state 2 loops on 1. Right
FSA: initial state 1 loops on "else" and moves to state 2 on F-deliver; state 2 loops on
"else" and moves to state 3 on L-receive; state 3 loops on 1.]
3.3 Model Checking for Verification

Now that we have our representations for plans and properties, it is possible to describe
model checking, i.e., for plan S and property P determining whether S ⊨ P. First, however,
we need to begin with two essential definitions of accessibility: accessibility of one vertex
from another, and accessibility of an atom from a vertex.

Definition 1 Vertex v_n is accessible from vertex v_0 if and only if there exists a path from
v_0 to v_n.

Definition 2 Atom a_{n−1} ∈ Λ(K) is accessible from vertex v_0 if and only if there exists a
path from v_0 to v_n and a_{n−1} ≤ M_K(v_{n−1}, v_n).

Accessibility from initial states is central to model checking. The reason is the following.
Recall from Section 3.2 that property P is true (false) for an FSA S (i.e., S ⊨ P (S ⊭ P))
if and only if P is true for every string in the language L(S) (false for some string in
L(S)). By definition, every string in the language has an accepting run. Therefore, it is
only necessary to verify the property for strings that have an accepting run. By definition,
every accepting run begins with an initial state. Therefore, every state in an accepting run
is accessible from an initial state, and every atom (c-state) in a string of the language is
accessible from an initial state. Clearly, the only states and atoms that need to be involved
in verification are those accessible from initial states.

Invariance properties can be re-expressed in terms of accessibility. Invariance property
□¬p could be restated as saying that there does not exist any atom a, where a ≤ p, that
is accessible from an initial state. It is much more difficult to express Response properties
succinctly using accessibility. Nevertheless, accessibility plays a key role in verifying all
properties, as will be seen shortly.
There are a number of ways to perform model checking, but here we focus on two.
The first method is specifically tailored for one class of properties; the second is sufficiently
general for use in verifying many classes of properties. The rationale for choosing a specific
and a general algorithm is that this allows for a comparison to determine the computational
efficiency gained by property-specific tailoring (see Subsection 6.5). In this section, we give
high-level sketches of these two model checking algorithms. The full algorithms are in
Section 6.

The first algorithm is a very simple and efficient method tailored for Invariance properties
P = □¬p. For every initial state v_i, this method begins at v_i and visits every atom a_j
accessible from v_i. If this atom has not already been checked, it checks to see whether
a_j ≤ p. If a_j ≤ p, then this is considered a verification failure. If there are no failures,
verification succeeds.
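This Invariance check can be written as a breadth-first reachability search over atoms. The version below is our own illustration, not the paper's Section 6 algorithm: conditions are encoded as sets of atoms, so an accessible atom a satisfies a ≤ p exactly when a ∈ p:

```python
from collections import deque

def verify_invariance(M, I, p):
    """Check Invariant(not p): fail iff some atom a with a <= p is accessible
    from an initial state. M maps edges to sets of atoms, I is the set of
    initial states, and p (an element of the algebra) is a set of atoms."""
    seen_states, seen_atoms = set(I), set()
    queue = deque(I)
    while queue:
        v = queue.popleft()
        for (u, w), cond in M.items():
            if u != v:
                continue
            for atom in cond - seen_atoms:   # each newly accessible atom
                seen_atoms.add(atom)
                if atom in p:                # a <= p: verification failure
                    return False
            if w not in seen_states:
                seen_states.add(w)
                queue.append(w)
    return True                              # no failures: S |= Invariant(not p)

M = {("s0", "s0"): {"ok"}, ("s0", "s1"): {"risky"}}
print(verify_invariance(M, {"s0"}, {"risky"}))  # False: "risky" is accessible
```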
The second method, automata-theoretic (AT) model checking, is very popular in the
verification literature (e.g., see Vardi and Wolper, 1986) and can be used to verify any prop-
erty expressible as a finite-state automaton. It is used here for First-Response properties.
In AT model checking, asking whether S ⊨ P is equivalent to asking whether L(S) ⊆ L(P)
for property P. This is equivalent to L(S) ∩ L̄(P) = ∅ (where L̄(P) denotes the comple-
ment of L(P)), which is algorithmically tested by first taking the tensor product of the
plan FSA S and the FSA corresponding to ¬P (i.e., S ⊗ ¬P). The FSA corresponding to
¬P accepts L̄(P). The tensor product implements language intersection. The algorithm
then determines whether L(S ⊗ ¬P) ≠ ∅, which implies L(S) ∩ L̄(P) ≠ ∅ (S ⊭ P). This
determination is implemented as a check for cycles in the product FSA S ⊗ ¬P that are ac-
cessible from some initial state and that satisfy any other conditions in the FSA acceptance
criterion. Recall that a cycle is a sequence of vertices (v_0, ..., v_n) such that v_n = v_0. A cycle
is accessible from an initial state if one of its vertices is accessible from the initial state. A
cycle that is accessible from an initial state and that satisfies the FSA acceptance criterion
implies a nonempty language. This is because a string is in the language of an FSA if it is
an infinite-length sequence of actions satisfying the FSA acceptance criterion, which always
includes the requirement that its accepting run must begin in an initial state. All infinite
behavior eventually ends up in a cycle because the FSA has a finite number of states.

Therefore, to be certain that the language is nonempty, it is necessary to determine
whether any accessible cycle satisfies the FSA acceptance criterion. The criterion of inter-
est is the Büchi criterion, for the following reason. It is assumed here that the negation
of the property (¬P) is expressed as a Büchi automaton. This implies that the FSA be-
ing searched, i.e., S ⊗ ¬P, is also a Büchi automaton, because taking the tensor product
preserves this criterion. The final check of this algorithm is whether an accessible cycle in
S ⊗ ¬P satisfies the Büchi acceptance criterion, because in that case the language is not
empty. A product state s is in B(S ⊗ ¬P) whenever it has a component state in B(¬P);
e.g., (COLLECTING, RECEIVING, RECEIVING, 2) is in B(S ⊗ ¬P2) for property P2
because its fourth component is state 2 of B(¬P2). According to the Büchi acceptance
criterion, visiting a state v ∈ B(S ⊗ ¬P) infinitely often (assuming v is accessible from
an initial state) implies L(S ⊗ ¬P) ≠ ∅. This will happen if v is part of an accessible
cycle. In that case, S ⊭ P and verification fails. Otherwise, if no accessible product state
v ∈ B(S ⊗ ¬P) is visited infinitely often (i.e., it is not in a cycle), then L(S ⊗ ¬P) = ∅
and therefore L(S) ⊆ L(P), i.e., S ⊨ P and verification succeeds. A relatively efficient
algorithm for AT verification from the literature is presented in Section 6.
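The cycle check at the heart of AT verification can be sketched as plain graph reachability: a bad state lies on an accessible cycle iff it is reachable from an initial state and also reachable from one of its own successors. This quadratic sketch is our own for illustration (it ignores transition conditions and works on the edge set of S ⊗ ¬P); the efficient algorithm from the literature appears in Section 6:

```python
def reachable(edges, starts):
    """All vertices reachable from `starts` via directed `edges`."""
    seen, stack = set(starts), list(starts)
    while stack:
        v = stack.pop()
        for (u, w) in edges:
            if u == v and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def at_verify(edges, I, bad):
    """S |= P iff no bad state of the product S (x) not-P lies on a cycle
    accessible from an initial state."""
    for b in reachable(edges, I) & set(bad):
        succs = {w for (u, w) in edges if u == b}
        if b in reachable(edges, succs):   # b lies on a cycle
            return False                   # accessible bad cycle: S |/= P
    return True                            # language empty: S |= P

print(at_verify({(1, 2), (2, 2)}, {1}, {2}))  # False: bad state 2 on a cycle
print(at_verify({(1, 2), (2, 3)}, {1}, {2}))  # True: no cycle through 2
```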
3.4 Machine Learning to Adapt Plans

Given plan S and property P, model checking determines whether S ⊨ P. Next we consider
the case of learning, which is a change to S. This subsection addresses the issue of how a
learning operator can affect a plan S to generate a new plan S′.

We begin by presenting a taxonomy of FSA learning operators. It is likely that any
learning method for complete deterministic FSAs will be composed of one or more of these
operators. Nothing about our approach requires evolutionary learning per se; however, to
make the discussion concrete, this is the form of learning that is assumed here. In the
context of evolutionary algorithms, the FSA learning operators are perturbations, such as
mutations, applied to the FSAs.
Procedure EA
  t = 0;  /* initial generation */
  initialize population(t);
  evaluate fitness(t);
  until termination-criterion do
    t = t + 1;  /* next generation */
    select parents(t);
    perturb(t);
    evaluate fitness(t);
  enduntil
end procedure

Figure 6: The outline of an evolutionary algorithm.
We assume that learning occurs in two phases: the offline and online phases (see Fig-
ure 1). During the offline phase, each agent starts with a randomly initialized population of
candidate FSA plans. This population is evolved using the evolutionary algorithm outlined
in Figure 6. The main loop of this algorithm consists of selecting parent plans from the
population, applying perturbation operators to the parents to produce offspring, evaluat-
ing the fitness of the offspring, and then returning the offspring to the population if they
are sufficiently fit. After this evolution, verification and repair are done to these initially
generated plans.

At the start of the online phase, each agent selects one "best" (according to its "fitness
function") plan from its population for execution. The agents are then fielded and plan
execution is interleaved with learning (adaptation), reverification, and plan repair as needed.
The purpose of learning during the online phase is to fine-tune the plan and adapt it to
keep pace with a gradually shifting environment, since normally real-world environments
are not static. The evolutionary algorithm of Figure 6 is also used during this phase, but the
assumption is a population size of one and incremental learning (i.e., one learning operator
applied per FSA per generation). This is practical for situations in which the environment
changes gradually, rather than radically.
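The online phase can be sketched as follows. This is a minimal version of our own: the operator set, the fitness function, and the keep-if-no-worse acceptance rule are illustrative assumptions, not the paper's exact procedure:

```python
import random

def online_ea(plan, perturb_ops, fitness, generations=10, seed=0):
    """Online-phase sketch: population of one, a single randomly chosen
    perturbation operator applied per generation, keeping the offspring
    only if it is at least as fit as the current plan."""
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = rng.choice(perturb_ops)(plan)
        if fitness(offspring) >= fitness(plan):
            plan = offspring
    return plan

# Toy stand-in for a plan: a vector the operator nudges toward zero.
def toward_zero(p):
    return [x - 1 if x > 0 else (x + 1 if x < 0 else 0) for x in p]

best = online_ea([3, -2, 5], [toward_zero],
                 fitness=lambda p: -sum(abs(x) for x in p), generations=10)
print(best)  # [0, 0, 0]
```

In the paper's setting, `perturb_ops` would be the FSA learning operators defined next, with reverification and repair interposed after each accepted adaptation.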
Formally, a machine learning operator o : S → S′ changes a (product or individual)
FSA S to post-learning FSA S′. A mapping between two automata S and S′ is defined as a
mapping between their elements (Bavel, 1983). At the highest level, we can subdivide the
learning operators according to the elements of the FSA that they alter:

- One class of operators adds, deletes, or moves edge transition conditions. In other
words, o : M_K(S) → M_K(S′).
- Another class of operators adds, deletes, or moves edges, i.e., o : E(S) → E(S′).
- The third class of operators adds or deletes vertices, along with their edges, i.e.,
o : V(S) → V(S′) and o : E(S) → E(S′).
- The fourth class of operators changes the Boolean algebra used in the transition
conditions, i.e., o : K → K′.

Here, we do not define operators that add or delete states. In other words, we do not
address the third class of operators. The reason is that with the type of FSAs used here,
adding or deleting a state does not, in itself, affect properties. It is what we do with the
edges to/from a state and their transition conditions that can alter whether a property
is true or false for a plan. This is because properties are true or false for c-states
(atoms) rather than for FSA states. Furthermore, this paper does not address changes to
the Boolean algebra, which is the fourth class of operators. This class of operators, which
includes abstractions, is addressed in Gordon (1998).

Therefore we are focusing on the first and second classes of operators. We define operator
schemas, rather than operators. A machine learning operator schema applies to unspecified
(variable) vertices, edges, and transition conditions. When instantiated with particular
vertices, edges, and transition conditions, it becomes a machine learning operator. In order
to avoid tedium, the operator schema definitions consider only the relevant parts of the
FSA, e.g., those parts that get altered. There is an implicit assumption that all unspecified
parts of the FSA remain the same after operator application. There is also an assumption
that the learner ensures that all operators keep the automaton complete and deterministic.

The operators can be seen in the taxonomy (partition) of Figure 7. We define each of
the corresponding operator schemas as follows, beginning with the most general one, called
o_change, which changes edge transition conditions:
Operator Schema 1 (o_change) Let S be an FSA with Boolean algebra K, and let o_change :
S → S′. Then we define o_change : M_K(S) → M_K(S′). In particular, suppose z ≤ M_K(v_1, v_2),
z ≠ 0, for (v_1, v_2) ∈ E(S) and z ≰ M_K(v_1, v_3) for (v_1, v_3) ∈ E(S). Then o_change(M_K(v_1, v_2))
= M_K(v_1, v_2) ∧ ¬z (step 1) and/or o_change(M_K(v_1, v_3)) = M_K(v_1, v_3) ∨ z (step 2). In other
words, o_change may consist of two steps: the first to remove condition z from edge (v_1, v_2),
and the second to add (the same) condition z to edge (v_1, v_3). Alternatively, o_change may
consist of only one of these two steps.
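Under the set-of-atoms encoding, the two steps of o_change are a set difference (meet with ¬z) and a set union (join with z). A sketch of our own, with either step optional:

```python
def o_change(M, e_remove=None, e_add=None, z=frozenset()):
    """Sketch of the o_change schema on a condition matrix M (edge -> frozenset
    of atoms). Step 1 removes condition z from e_remove (meet with not-z);
    step 2 adds z to e_add (join with z). Either step may be omitted."""
    M = dict(M)                              # leave the input matrix intact
    if e_remove is not None:
        assert z and z <= M[e_remove]        # z != 0 and z <= M_K(v1, v2)
        M[e_remove] = M[e_remove] - z        # M_K(v1, v2) ∧ ¬z
    if e_add is not None:
        M[e_add] = M.get(e_add, frozenset()) | z   # M_K(v1, v3) ∨ z
    return M

M = {("A", "B"): frozenset({"x", "y"}), ("A", "C"): frozenset({"w"})}
M2 = o_change(M, e_remove=("A", "B"), e_add=("A", "C"), z=frozenset({"y"}))
print(sorted(M2[("A", "B")]), sorted(M2[("A", "C")]))  # ['x'] ['w', 'y']
```

Note that the sketch does not itself enforce completeness or determinism; per the schema assumptions, that is the learner's responsibility.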
Figure 7: Taxonomy (partition) of learning operators. [Figure: the root o_change has a
two-step subtree (step (1) and step (2)), which branches on access-cond vs. ¬access-cond
and on v_1 = v_3 vs. v_1 ≠ v_3 into o_move, o_delete+gen, o_spec+gen, o_stay, o_delete+add,
and o_spec+add, and a one-step subtree (step (1) or step (2) alone), which branches into
o_delete∨spec (with primitives o_delete and o_spec) and o_add∨gen (with primitives o_add
and o_gen).]
All of the remaining operators are easier to describe in terms of a set of four primitive
operators. Therefore, we next define these four primitives, which are one-step operators
that are special cases of o_change and appear at the bottom right as leaves in the hierarchy of
Figure 7. The first two primitive operators delete (o_delete) and add (o_add) edges. We define
o_delete to delete edge (v_1, v_2) with the operator schema:

Operator Schema 2 (o_delete) Let S be an FSA with Boolean algebra K, and let o_delete :
S → S′ be defined with o_delete : E(S) → E(S) \ {(v_1, v_2)} for deleted edge (v_1, v_2) of S.
Recall that a nonexistent edge has transition condition 0. Operator o_delete could therefore be
considered a special case of o_change that consists only of step (1) and an additional condition
that must be met, namely, that o_delete(M_K(v_1, v_2)) = (M_K(v_1, v_2) ∧ ¬z) = 0.
We define o_add to add edge (v_1, v_3) with the operator schema:

Operator Schema 3 (o_add) Let S be an FSA with Boolean algebra K, and let o_add : S →
S′ be defined with o_add : E(S) → E(S) ∪ {(v_1, v_3)} for added edge (v_1, v_3) of S. Operator o_add
could be considered a special case of o_change that consists only of step (2) and the additional
condition that M_K(v_1, v_3) = 0 prior to applying o_add.
The other two primitive operators are specialization (o_spec) and generalization (o_gen).
Specialization and generalization are operators commonly found in the machine learning lit-
erature, e.g., see Michalski (1983). In the context of an FSA, specialization lowers the level
of a particular state-to-state transition condition in the partial order ≤, whereas general-
ization raises it, as in Mitchell's Version Spaces (Mitchell, 1978). In particular, a transition
condition can be specialized with a meet and can be generalized with a join, which is
analogous to adding a conjunct to specialize and a disjunct to generalize as in Michalski
(1983).
Formally, we define specialization and generalization, respectively, as follows:

Operator Schema 4 (o_spec) Let S be an FSA with Boolean algebra K, and let o_spec : S →
S′. Then we can define o_spec : M_K(S) → M_K(S′), where o_spec(M_K(v_1, v_2)) = M_K(v_1, v_2) ∧
¬z, for some z ∈ K, z ≠ 0. Operator o_spec could be considered a special case of o_change that
consists only of step (1) and the additional two conditions o_spec(M_K(v_1, v_2)) = (M_K(v_1, v_2)
∧ ¬z) ≠ 0 (i.e., o_spec ≠ o_delete), and M_K(v_1, v_2) ≠ ¬z (since otherwise o_spec has no effect).

Operator Schema 5 (o_gen) Let S be an FSA with Boolean algebra K, and let o_gen : S →
S′. Then we can define o_gen : M_K(S) → M_K(S′), where o_gen(M_K(v_1, v_3)) = M_K(v_1, v_3) ∨ z,
for some z ∈ K, z ≠ 0. Operator o_gen could be considered a special case of o_change that
consists only of step (2) and the two additional conditions that M_K(v_1, v_3) ≠ 0 (i.e., o_gen
≠ o_add) and (M_K(v_1, v_3) ∧ z) = 0 (because otherwise z adds redundancy) prior to o_gen.
Next, 10 learning operators are defined from these four primitives. Below o_change in the operator hierarchy of Figure 7 are two subtrees. The right subtree consists of one-step operators, and the left subtree consists of two-step operators. We define the two one-step operators just below o_change first (since we just defined the primitive operators below them):
Operator Schema 6 (o_delete∨spec) This operator consists of applying either of the primitive operators o_delete or o_spec.
Operator Schema 7 (o_add∨gen) This operator consists of applying either of the primitive operators o_add or o_gen.
It is relevant at this point to introduce two more operators that are not in the hierarchy of Figure 7. They are not in the hierarchy because they are merely minor variants of o_delete∨spec and o_add∨gen and they do not belong strictly below our most general operator o_change. These operators are introduced here because they are very useful and also because they are guaranteed to preserve completeness of FSAs. In other words, if the FSA is complete prior to applying these operators then it will be complete after applying them. Recall from Section 2 that each FSA state is associated with a set of allowable actions that may be taken from that state. These operators delete or add an action from the set of allowable actions from a state:
Operator Schema 8 (o_delete-action) Delete an allowable action from a state v1 by one or more applications of operator o_delete∨spec. Each application may be to a different outgoing edge from v1.
Operator Schema 9 (o_add-action) Add an allowable action from a state v1 by one or more applications of operator o_add∨gen. Each application may be to a different outgoing edge from v1.
To understand why o_delete-action consists of one or more applications of o_delete∨spec, consider the following example. In Figure 2, deleting F-collect as an allowable action from F's COLLECTING state results in F-deliver being the only allowable action from that state. Furthermore, this results in the edge (COLLECTING, COLLECTING) being deleted and the edge (COLLECTING, DELIVERING) being specialized. The reasoning is similar for why o_add-action is one or more applications of o_add∨gen.
The remaining operators, which are all of the operators on the left subtree of o_change in Figure 7, consist of two steps: the first to remove condition z from edge (v1, v2), and the second to add (the same) condition z to edge (v1, v3). The first step consists of applying one primitive operator, and the second step consists of applying another primitive operator. Every one of the following operators preserves determinism and completeness of the FSAs. In other words, if the FSA is deterministic and complete prior to operator application then it will be deterministic and complete afterwards.

[Figure 8 (diagram omitted in this extraction): an FSA in which edge (STATE1, STATE2) has transition condition a ∨ b, edge (STATE1, STATE3) has condition c, and edge (STATE3, STATE4) has condition a.]

Figure 8: Moving transition conditions between edges.
Operator Schema 10 (o_move) This operator schema is identical to that of o_change, with one exception. Replace "and/or" with "and" in the definition. In other words, we have o_move(M_K(v1, v2)) = M_K(v1, v2) ∧ ¬z and o_move(M_K(v1, v3)) = M_K(v1, v3) ∨ z for some (v1, v2), (v1, v3) ∈ E(S). Therefore o_move moves z from one edge to another.
All of the remaining operators are special cases of o_move. We begin with the right subtree of o_move:
Operator Schema 11 (o_delete+add) Apply o_delete to edge (v1, v2) and then apply o_add to edge (v1, v3).
An example of o_delete+add, using Figure 8, is to delete edge (STATE1, STATE3) (i.e., make M_K(STATE1, STATE3) = 0) and add a new edge (STATE1, STATE1) with transition condition M_K(STATE1, STATE1) = c.
Operator Schema 12 (o_spec+add) Apply o_spec to edge (v1, v2) and then apply o_add to edge (v1, v3).
For example, using Figure 8, we can move "b" from edge (STATE1, STATE2) to a newly created edge (STATE1, STATE1) to make M_K(STATE1, STATE2) = a ∧ ¬b and M_K(STATE1, STATE1) = b. This is specialization of the condition on edge (STATE1, STATE2) followed by addition of edge (STATE1, STATE1).
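This example can be replayed concretely if transition conditions are modeled as sets of atoms (a sketch, not the paper's implementation: specialization with ¬b is set difference, and the added edge simply receives the removed atom).

```python
# Figure 8's relevant edges, with conditions encoded as atom sets.
edges = {("STATE1", "STATE2"): {"a", "b"},   # a or b
         ("STATE1", "STATE3"): {"c"}}

# Step 1 (o_spec): M_K(STATE1, STATE2) := (a or b) and-not b = a
edges[("STATE1", "STATE2")] -= {"b"}
# Step 2 (o_add): create edge (STATE1, STATE1) with condition b
edges[("STATE1", "STATE1")] = {"b"}

assert edges[("STATE1", "STATE2")] == {"a"}
assert edges[("STATE1", "STATE1")] == {"b"}
```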
Next consider the left subtree of o_move. At this point, it is relevant to examine the reason for the split into the two subtrees of o_move. All of the operators in the left subtree satisfy a condition that is called the "accessibility condition." This condition states that prior to learning (and also after learning), if vertex v1 is accessible from some initial state then vertex v3 is guaranteed to also be accessible from that initial state. The reason for this partition will become clear in Subsection 4.2, where we show that a theorem holds for the two-step operators if and only if the accessibility condition is true. The reason that the two operators in the right subtree of o_move fail to satisfy the accessibility condition is due to their having o_add as their second step. The definition of o_add states that M_K(v1, v3) = 0 prior to operator application, and therefore we have no guarantee of v3's accessibility, given that v1 is accessible from an initial state. The following are the definitions of the operators for which the accessibility condition is true:
Operator Schema 13 (o_delete+gen) Apply o_delete to edge (v1, v2) and then apply o_gen to edge (v1, v3).
As an example, in Figure 8, we can move the condition "a ∨ b" from edge (STATE1, STATE2) to edge (STATE1, STATE3) to make M_K(STATE1, STATE2) = 0 and to make M_K(STATE1, STATE3) = c ∨ a ∨ b. This is deletion of edge (STATE1, STATE2) followed by generalization of the transition condition on edge (STATE1, STATE3).
Operator Schema 14 (o_spec+gen) Apply o_spec to edge (v1, v2) and then apply o_gen to edge (v1, v3).
As an example, in Figure 8, we can move the disjunct "b" from edge (STATE1, STATE2) to edge (STATE1, STATE3) to make M_K(STATE1, STATE2) = a ∧ ¬b and M_K(STATE1, STATE3) = c ∨ b. This is a specialization of the transition condition on edge (STATE1, STATE2) followed by a generalization of the transition condition on edge (STATE1, STATE3).
Operator Schema 15 (o_stay) The definition is the same as that of o_move, with one exception. Replace vertex v3 with vertex v1 everywhere. In other words, the operator consists of moving a condition from edge (v1, v2) to edge (v1, v1).
Note that each operator instantiation of the schema for o_stay will be a special case of one of the following: o_delete+add, o_spec+add, o_delete+gen, or o_spec+gen. It is considered o_stay if and only if on the second step of the operator the transition condition is moved to edge (v1, v1). For example, using Figure 8, when we applied operator o_spec+add (in the example above) to move the disjunct "b" from edge (STATE1, STATE2) to edge (STATE1, STATE1) to make M_K(STATE1, STATE2) = a ∧ ¬b and M_K(STATE1, STATE1) = b, this could be considered an instantiation of o_stay, as well as o_spec+add. Likewise when we applied o_delete+add to delete edge (STATE1, STATE3) and add edge (STATE1, STATE1) with "c" as the transition condition, this could also be considered an instantiation of o_stay.
Operator o_stay is an especially useful operator. It makes the reasonable assumption that when an agent no longer wants to transition to another state (e.g., an edge is deleted), the agent just stays in its current state. In other words, the condition for transitioning to another state is transferred to the edge leading back to the current state. For example, suppose rover I becomes stuck at the lander and cannot rendezvous with F for an indeterminate period of time. It could generate a temporary plan (see Figure 2) that keeps I in its DELIVERING state by deleting edge (DELIVERING, RECEIVING) and making M_K(DELIVERING, DELIVERING) = 1 (and DELIVERING would have to become an initial state).
Recall that accessibility is a key issue for verification. Now that we have a set of operator schemas, let us consider how these operators affect accessibility from initial states. Clarifying this will be relevant for understanding both the a priori proofs about property preservation, and the motivation for the incremental reverification algorithms. There are two fundamental ways that our learning operators may affect accessibility: locally (abbreviated "L"), i.e., by directly altering the accessibility of atoms or states, or globally (abbreviated "G"), i.e., by altering accessibility of states or atoms that could be visited after the part of the FSA modified by the learning operator. In particular, any change to the accessibility of v1, v2, v3 or atoms in M_K(v1, v2) or M_K(v1, v3), referenced in the operator definition, is considered local. Changes to accessibility of any other states or atoms are considered global.
As an example of an L (local) change to accessibility, using Figure 8, suppose the agent discovers a new action "d" that it can take. It adds "d" to its action repertoire, as well as to the set of allowable actions from one of the states in its FSA. In particular, the agent decides to allow "d" from STATE1 and decides to apply o_gen to the transition condition for (STATE1, STATE3) to get condition "c ∨ d." Then atom "d" was not previously accessible from any initial state, but if we assume STATE1 is accessible from an initial state then the application of o_gen made the atom "d" accessible. Using Figure 8 to illustrate a G (global) change to accessibility, suppose we delete edge (STATE1, STATE3) in that figure. Then STATE4, which was previously accessible (because we assume STATE1 is accessible) is no longer accessible. On the other hand, the fact that STATE3 is no longer accessible is a local change.
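The global change just described can be checked mechanically with an ordinary reachability computation. Below is a sketch (assuming the set-of-atoms encoding of transition conditions, with the Figure 8 edges as given in the text):

```python
from collections import deque

def accessible(initial, edges):
    """Return the set of states reachable from the initial states
    along edges whose transition conditions are nonzero (nonempty)."""
    seen, frontier = set(initial), deque(initial)
    while frontier:
        v = frontier.popleft()
        for (src, dst), cond in edges.items():
            if src == v and cond and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

edges = {("STATE1", "STATE2"): {"a", "b"},
         ("STATE1", "STATE3"): {"c"},
         ("STATE3", "STATE4"): {"a"}}
assert "STATE4" in accessible({"STATE1"}, edges)
del edges[("STATE1", "STATE3")]   # o_delete: a local change for STATE3,
assert "STATE4" not in accessible({"STATE1"}, edges)  # a global one for STATE4
```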
Now we are ready to summarize what the learning operators can do to accessibility. First, we introduce one more notational convenience. The symbols ↑ and ↓ denote "can increase" and "can decrease," respectively, and ¬↑ and ¬↓ denote "cannot increase" and "cannot decrease," respectively. We use these symbols with G and L, e.g., ↑G means that a learning operator can (but does not necessarily) increase global accessibility, and ¬↓L means that an operator cannot decrease local accessibility.
The results for the primitive operators are intuitively obvious:

- o_delete: ↓G ↓L ¬↑G ¬↑L
- o_spec: ¬↓G ↓L ¬↑G ¬↑L
- o_add: ¬↓G ¬↓L ↑G ↑L
- o_gen: ¬↓G ¬↓L ¬↑G ↑L
The primitive operators provide answers about changes in accessibility for all of the one-step operators. For the two-step operators (i.e., o_move and all operators below it in the hierarchy of Figure 7), we need to consider the net effect. For the results in this paper, we only need to focus on one distinction: the difference in the net effect for those operators that satisfy the accessibility condition (i.e., the left subtree of o_move) versus the net effect for those operators that do not satisfy this condition (i.e., the right subtree). The net effect of those operators that satisfy the accessibility condition is that accessibility (global and local) will never be increased, i.e., ¬↑G and ¬↑L. The reason is as follows. By looking at the results for the primitive operators, it is apparent that the first step in these two-step operators can never increase accessibility, because the first step is always o_delete or o_spec. Therefore, to understand the intuition behind this result we need to examine the second step. Consider o_delete+gen and o_spec+gen. Note that o_gen does not increase global accessibility (¬↑G), but it can increase local accessibility (↑L). Is ↑L a net effect due to the generalization step? Because atoms are being transferred from one outgoing edge of some vertex v1 to another outgoing edge of v1 with these two operators, by definition the local accessibility of those atoms from an initial state will not be increased as a net effect. In other words, the atoms are accessible from an initial state if and only if v1 is, and these two learning operators do not increase the accessibility of v1. Furthermore, by definition M_K(v1, v3) ≠ 0 prior to learning, so the accessibility of v3 is not increased. We conclude that ¬↑L is a net effect.
A similar line of reasoning explains why operator o_stay will not increase local accessibility. Operator o_stay cannot increase global accessibility, even if it adds an edge, because the only edge that this operator could add is (v1, v1). In conclusion, all three operators that satisfy the accessibility condition have a net effect of not increasing accessibility (¬↑G and ¬↑L). On the other hand, because operators o_delete+add and o_spec+add have o_add as their second step, they can increase accessibility.
Results from lower in the hierarchy of Figure 7 are inherited up the tree. For example, because o_delete+add can increase global accessibility, o_move can as well. The following is a summary of the relevant results we have so far about how the two-step learning operators can change accessibility. To avoid overwhelming the reader, we present only those results necessary for understanding this paper.
- o_stay, o_delete+gen, o_spec+gen: ¬↑G ¬↑L
- o_delete+add, o_spec+add, o_move, o_change: ↑G
Before concluding this section, we briefly consider a different partition of the learning operators than that reflected in the taxonomy of Figure 7. This different partition is necessary for understanding the a priori proofs about the preservation of Response properties (in Section 4). For this partition, we wish to distinguish those operators that can introduce at least one new string with an infinitely repeating substring (e.g., (a,b,c,d,e,d,e,d,e,...) where the ellipsis represents infinite repetition of d followed by e) into the FSA language versus those that cannot. Any operator that can add atoms to the transition condition for an edge in a cycle, add an edge to an existing cycle, or add an edge to create a new cycle belongs to the first class (the class that can add such substrings). Thus this first class includes our operators that can create new cycles (e.g., o_stay because it can add a new edge (v1, v1)), as well as our operators that can generalize the transition condition along some edge of a cycle (e.g., o_delete+gen because it can generalize M_K(v1, v1)). The operators are divided between these two classes as follows:
these two classes as follows:
1. o
add
, o
gen
, o
add_gen
, o
add�action
, o
stay
, o
delete+gen
, o
spec+gen
, o
delete+add
, o
spec+add
, o
move
,
o
change
2. o
delete
, o
spec
, o
delete_spec
, o
delete�action
It is important to note that all of the two-step operators are in the �rst class.
At this point we have defined a set of useful operators (via their operator schemas) that one could apply to an FSA plan for adaptation. With these operators, it is possible to improve the effectiveness of a plan, and to adapt it to handle previously unforeseen external and internal conditions. To ensure the usefulness of these learning operators, the learner needs to check that it has not generated a useless plan (i.e., L(S) ≠ ∅). Although not addressed in this paper, we are currently developing efficient methods for making this check using the knowledge of the learning that was done.
The particular choice of learning operators presented here was motivated by four factors. First, these operators translate into easy-to-implement perturbations of entries in a table, which is the representation of FSAs used in our implementation (see Section 6). Second, these operators were inspired by the literature. For example, generalization and specialization operators are considered fundamental for inductive inference (Michalski, 1983), and deleting/adding FSA edges are effective for evolving FSAs (Fogel, 1996). Third, these operators made practical sense in the context of applications that were considered. Fourth, the particular taxonomies presented here facilitate powerful theoretical and empirical results for reducing the time complexity of reverification, as shown in the remainder of this paper.
4. A Priori Results about the Safety of Machine Learning Operators
Subsection 3.4 defined several useful learning operator schemas to modify automaton edges (o : E(S) → E(S')) and the transition conditions along edges (o : M_K(S) → M_K(S')). The results in this section establish which of these operator schemas o are a priori guaranteed to preserve two property classes of interest (Invariance and Response). This section assumes that all learning operators are applied to a single FSA plan, i.e., SIT_1agent or SIT_1plan. Section 5 addresses the translation of the operators applied to a single plan into their effect on a product plan (for SIT_multplans), and how this affects the results. We begin by formally defining what we mean by "safe machine learning operator."
4.1 "Safe" Online Machine Learning
Our objective is to lower the time complexity of reverification. The ideal solution is to identify safe machine learning methods (SMLs), which are machine learning operators that are a priori guaranteed to preserve properties (also called "correctness preserving mappings") and require no run-time reverification. For a plan S and property P, suppose verification has succeeded prior to learning, i.e., ∀x, x ∈ L(S) implies x ⊨ P (i.e., S ⊨ P). Then a machine learning operator o(S) is an SML if and only if verification is guaranteed to succeed after learning. In other words, if S' = o(S), then S ⊨ P implies S' ⊨ P.
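For the common special case of an Invariance property □¬p over a single atom, the pre-learning check S ⊨ P amounts to verifying that p labels no edge reachable from an initial state. A minimal sketch (using set-valued transition conditions; this is not the paper's verification algorithm, which handles properties expressible as FSAs in general):

```python
def satisfies_invariance(initial, edges, p):
    """Sketch of S |= always-not-p: atom p must not label any edge
    whose source is reachable from an initial state."""
    seen, stack = set(initial), list(initial)
    while stack:
        v = stack.pop()
        for (src, dst), cond in edges.items():
            if src == v and cond:
                if p in cond:
                    return False   # an accessible c-state can satisfy p
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
    return True

edges = {("s0", "s1"): {"q"}, ("s1", "s1"): {"r"}}
assert satisfies_invariance({"s0"}, edges, "p")       # S |= always-not-p
assert not satisfies_invariance({"s0"}, edges, "r")   # r is accessible
```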
Subsection 4.2 provides results about the a priori safety of machine learning operators. Some of the results in Subsection 4.2 are negative. Nevertheless, although we do not have an a priori guarantee for these learning operators, Section 6 shows that we can perform reverification more efficiently than total reverification from scratch.
4.2 Theoretical Results
Let us begin by considering the primitive operators. The results for all primitive operators are corollaries of two fundamental theorems, Theorems 1 and 2, which may not be immediately intuitive. For example, it seems reasonable to suspect that if an edge is deleted somewhere along the path from a trigger to a response, then this could cause failure of a Response property to hold because the response is no longer accessible. In fact, this is not true. What actually happens is that deletions reduce the number of strings in the language. If the original language satisfies the property, then so does the smaller language. Theorem 1 formalizes this.
Theorem 1 Let S' be an FSA with Boolean algebra K. Let S be identical to S', but with additional edges, i.e., o : S' → S is defined as o : E(S') → E(S), where E(S') ⊆ E(S). Then L(S') ⊆ L(S).

Proof. The language may be enlarged by the addition of new edges that have newly learned transition conditions. On the other hand, because every accepting run remains an accepting run regardless of new edges, x ∈ L(S') implies x ∈ L(S), and we are never reducing the size of the language. Therefore, L(S') ⊆ L(S). ∎
The results about the machine learning operator schemas o_delete and o_add follow as corollaries:
Corollary 1 o_delete is an SML with respect to any property P.

Proof. Assume S ⊨ P. Then ∀x, x ∈ L(S) implies x ⊨ P. Define o_delete(S) = S'. By Theorem 1, L(S') ⊆ L(S). Therefore, ∀x, x ∈ L(S') implies x ⊨ P. We conclude that S' ⊨ P, i.e., o_delete(S) ⊨ P. ∎
To be consistent with Theorem 1, in Corollary 2 only (but not in the rest of the paper), we use S' for the pre-o_add FSA and S for the post-o_add FSA, i.e., o_add(S') = S.
Corollary 2 o_add is not necessarily an SML for any property, including Invariance and Response properties.

Proof. Assume S' ⊨ P. Then ∀x, x ∈ L(S') implies x ⊨ P. By Theorem 1, L(S') ⊆ L(S). Then we cannot be certain that S ⊨ P, i.e., that o_add(S') ⊨ P. For instance, a counterexample for Invariance property □¬p occurs if we add an accessible edge with transition condition p. ∎
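Corollary 2's counterexample is easy to exhibit concretely. The sketch below uses set-valued conditions and a hypothetical two-state plan (the state names and atoms are ours, for illustration only):

```python
def holds_always_not_p(initial, edges, p):
    """True iff no edge reachable from an initial state mentions atom p."""
    seen, stack = set(initial), list(initial)
    while stack:
        v = stack.pop()
        for (src, dst), cond in edges.items():
            if src == v and cond:
                if p in cond:
                    return False
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
    return True

edges = {("s0", "s1"): {"q"}}
assert holds_always_not_p({"s0"}, edges, "p")      # S' |= always-not-p before learning
edges[("s1", "s1")] = {"p"}                        # o_add: accessible edge labeled p
assert not holds_always_not_p({"s0"}, edges, "p")  # the Invariance property now fails
```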
Now we consider a priori results for o_spec and o_gen. Again, we begin with a relevant theorem for operator schema o.
Theorem 2 Let S' be an FSA with Boolean algebra K, and let o : S' → S be defined as o : M_K(S') → M_K(S) where ∃z ∈ K, z ≠ 0, (v1, v3) ∈ E(S'), such that o(M_K(v1, v3)) = M_K(v1, v3) ∨ z. Then L(S') ⊆ L(S).

Proof. Similar to the proof of Theorem 1. ∎
Corollary 3 o_spec is an SML for any property.

Proof. Similar to the proof of Corollary 1 of Theorem 1. ∎
Corollary 4 o_gen is not necessarily an SML for any property, including Invariance and Response properties.

Proof. Similar to the proof of Corollary 2 of Theorem 1. ∎
We can draw the following conclusions from the theorems and corollaries just presented:

- Of the one-step learning operators, those that are guaranteed to be SMLs for any property are o_delete, o_spec, and o_delete∨spec (which implies that o_delete-action is also an SML for any property).

- We need never be concerned with the first step in a two-step operator. It is guaranteed to be an SML (because o_delete or o_spec is always the first step).
Next consider theorems that are needed to address the two-step operators. Although we found results for the one-step operators that were general enough to address any property, we were unable to do likewise for the two-step operators. Our results for the two-step operators determine whether these operators are necessarily SMLs for Invariance or Response properties in particular. Future work will consider other property classes. The theorems are quite intuitive. The first theorem distinguishes those learning operators that will satisfy Invariance properties from those that will not:

Theorem 3 A machine learning operator is guaranteed to be an SML with respect to any Invariance property P if and only if ¬↑G and ¬↑L are both true (which, for our two-step operators, implies that the operator satisfies the accessibility condition).
Proof. Suppose ¬↑G and ¬↑L are both true. Let Invariance property P = □¬p. Assume P is true of FSA S prior to learning. Then for every string y ∈ L(S), it must be the case that ¬p is true in every c-state of y. If accessibility of atoms is not increased (i.e., ¬↑G and ¬↑L), then it must be the case that every c-state of every string x ∈ L(S'), where S' = o(S), is also a c-state of some string in L(S). Therefore, for every string x ∈ L(S'), it must be the case that ¬p is true in every c-state of x. In other words, moving transition conditions around in an FSA without increasing accessibility will not alter the truth of an Invariance property, which holds in every c-state of every string in the language of the FSA.

Suppose ↑G or ↑L. Increasing accessibility of atoms implies the possibility of introducing a c-state in some string x ∈ L(S'), where S' = o(S), that was not in any string of L(S). This can cause violation of an Invariance property, as in the counterexample in the proof of Corollary 2. Knowing that ¬p is true in every c-state of every string of L(S) provides no guarantee that ¬p is true in every c-state of every string of L(S'). ∎
Since we already have results to cover the one-step operators, we need only consider the
two-step operators.
Corollary 5 The machine learning operator schemas o_delete+gen, o_spec+gen, and o_stay are guaranteed to be SMLs with respect to any Invariance property P because for all of these operators ¬↑G and ¬↑L.
Corollary 6 The machine learning operator schemas o_delete+add, o_spec+add, o_move, and o_change are not necessarily SMLs with respect to any Invariance property P because for all of these operators ↑G.
[Figure 9 (diagram omitted in this extraction): the automata S1 (left) and S1' (right). In S1, edge (STATE1, STATE2) has condition b, edge (STATE2, STATE1) has condition c, edge (STATE2, STATE3) has condition a, edge (STATE3, STATE2) has condition d, and there is a self-loop on STATE2 with condition e. In S1', edge (STATE2, STATE3) has been deleted and the self-loop condition generalized to e ∨ a.]

Figure 9: The automata S1 (left) and S1' (right).
The next theorem characterizes those learning operators that cannot be guaranteed to be SMLs with respect to Response properties.

Theorem 4 Any machine learning operator schema that can introduce a new string with an infinitely repeating substring into the FSA language cannot be guaranteed to be an SML for Response properties.

Proof. Assume FSA S satisfies a Response property prior to learning. Therefore every string accepted by S satisfies the property. For each accepted string, every instance (or the first instance if it is a First-Response property) of the trigger is eventually followed by a response. Suppose the machine learning operator introduces a new string with an infinitely repeating substring into the language. Then it is possible that the prefix of this string before the infinitely repeating substring includes a trigger and no response, and the infinitely repeating substring does not include a response. ∎
Since we already have results to cover the one-step operators, we need only consider the
two-step operators.
Corollary 7 All of the two-step learning operators cannot be guaranteed to be SMLs with respect to Response properties because they are in the first class in the partition related to this theorem, i.e., they may introduce strings with infinitely repeating substrings.
Consider a couple of illustrative examples of Theorem 4 and its corollary, using Figure 9. Prior to learning (the FSA on the left of Figure 9), ∀x, where x ∈ L(S1), x ⊨ P3, for Response property P3 = □(a → ◊d). Assume operator o_stay : S1 → S1' deletes edge (STATE2, STATE3) and generalizes the transition condition on edge (STATE2, STATE2) to "e ∨ a" (see Figure 9 on the right). Then the string consisting of b followed by infinitely many a's (b,a,a,a,...) ∈ L(S1') but ⊭ P3. This helps us to see why o_stay is not necessarily an SML for Response properties. The same example illustrates why o_delete+gen cannot be guaranteed to be an SML for Response properties. For o_spec+gen, suppose the condition for (STATE2, STATE3) is "f ∨ a" in S1, and "f ∧ ¬a" in S1' but everything else is the same as in Figure 9. Again, we can see the problem for Response properties.
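The Figure 9 counterexample can be replayed concretely. The check below is only a sketch sufficient for this small example: it looks for a reachable self-loop that can fire the trigger a without ever offering the response d (the paper's Response reverification is more general than a self-loop check; state names ST1..ST3 stand in for STATE1..STATE3).

```python
def accessible(initial, edges):
    """States reachable from the initial states along nonempty-condition edges."""
    seen, stack = set(initial), list(initial)
    while stack:
        v = stack.pop()
        for (src, dst), cond in edges.items():
            if src == v and cond and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def bad_self_loop(initial, edges, trigger, response):
    """A reachable self-loop containing the trigger but not the response
    yields a string violating always(trigger -> eventually response)."""
    reach = accessible(initial, edges)
    return any(src == dst and src in reach and
               trigger in cond and response not in cond
               for (src, dst), cond in edges.items())

s1 = {("ST1", "ST2"): {"b"}, ("ST2", "ST1"): {"c"},
      ("ST2", "ST3"): {"a"}, ("ST3", "ST2"): {"d"}, ("ST2", "ST2"): {"e"}}
assert not bad_self_loop({"ST1"}, s1, "a", "d")
# o_stay: delete (ST2, ST3) and generalize the self-loop to e or a
s1_prime = dict(s1)
del s1_prime[("ST2", "ST3")]
s1_prime[("ST2", "ST2")] = {"e", "a"}
assert bad_self_loop({"ST1"}, s1_prime, "a", "d")   # (b,a,a,a,...) never sees d
```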
We conclude by summarizing the positive a priori results:

- o_delete, o_spec, o_delete∨spec and o_delete-action are SMLs for any property (expressible in temporal logic).

- o_delete+gen, o_spec+gen and o_stay are SMLs for Invariance properties.
and the negative a priori results:

- o_add, o_gen, o_add∨gen, o_add-action, o_spec+add, o_delete+add, o_move and o_change are not necessarily SMLs for Invariance or Response properties.

- o_delete+gen, o_spec+gen and o_stay are not necessarily SMLs for Response properties.
The fact that all three learning operators that satisfy the accessibility condition are guaranteed to be SMLs for Invariance properties is significant, because Invariance properties are extremely useful and common for verifying systems and many important applications need only test properties of this class (Heitmeyer et al., 1998).
Finally, from Theorems 1 and 2 we learned that the heart of the problem for all of the negative results is either an o_gen step or an o_add step. Later in this paper we address these troublesome steps by finding more efficient methods for dealing with them than total reverification from scratch. However, first, in the next section, we consider how our a priori results are translated from a single to a product FSA for SIT_multplans.
5. Translating Learning Operators to a Product Automaton
In this section we address SIT_multplans, where each agent maintains and uses its own individual FSA, but for verification the product FSA needs to be formed and verified. For SIT_multplans, a learning operator is applied to an individual agent FSA and then the product is formed. Therefore, it is necessary to consider the translation of each learning operator from individual to product FSA, and how that affects the a priori SML results presented above.
For operators o_spec+gen, o_delete+gen, o_spec+add, and o_delete+add, we consider only the translations of the primitive operators. This is because the translations of these operators are simply translations of their primitive components. The remaining translations are:
- o_spec translates to o_spec and/or o_delete.
- o_delete translates to o_spec and/or o_delete.
- o_gen translates to o_gen and/or o_add.
- o_add translates to o_gen and/or o_add.
- o_stay translates to o_stay and/or o_move.
- o_move translates to o_move.
- o_change translates to o_change.
It may not be intuitive to the reader how o_gen can translate to o_add. To illustrate, we use Figure 10, where the transition conditions, such as (a ∨ c), denote sets of multiagent actions. Suppose o_gen is applied to edge (1, 2) in the leftmost FSA so that the transition condition is now (d ∨ b). Then a new edge (11', 21') is added to the product FSA (rightmost in Figure 10) with the transition condition b. Recall that to form the product FSA we take the Cartesian product of the vertices and the intersection of the transition conditions.

[Figure 10 (diagram omitted in this extraction): a two-state FSA with states 1 and 2 (self-loop a ∨ c on state 1, edge (1, 2) with condition d, edge (2, 1) with condition e, self-loop c on state 2); a one-state FSA with state 1' and self-loop b ∨ c; and their product, with self-loops c on states 11' and 21'.]

Figure 10: Generalization can become addition in product.
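The product construction and the o_gen-to-o_add translation can be sketched directly (set-valued conditions; an edge whose intersected condition is empty is simply absent from the product, and the state name "1p" stands in for 1'):

```python
def product_fsa(edges1, edges2):
    """Product FSA: Cartesian product of the vertices, intersection of the
    transition conditions; an empty intersection means no product edge."""
    out = {}
    for (a, b), c1 in edges1.items():
        for (x, y), c2 in edges2.items():
            cond = frozenset(c1) & frozenset(c2)
            if cond:
                out[((a, x), (b, y))] = cond
    return out

# Figure 10, partially encoded: edge (1, 2) of the left FSA and the
# self-loop of the right FSA are all we need for the example.
left = {(1, 1): {"a", "c"}, (1, 2): {"d"}}
right = {("1p", "1p"): {"b", "c"}}
assert ((1, "1p"), (2, "1p")) not in product_fsa(left, right)  # M_K(11', 21') = 0
left[(1, 2)] = {"d", "b"}                                      # o_gen: d becomes d or b
assert product_fsa(left, right)[((1, "1p"), (2, "1p"))] == {"b"}  # o_add in product
```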
Likewise, o_spec translates to either o_spec or o_delete in the product FSA.
To illustrate why o_stay can become o_move in the product, we use Figure 3. Suppose we delete the edge (TRANSMITTING, RECEIVING) and move the transition condition to edge (TRANSMITTING, TRANSMITTING). Then the global state (DELIVERING, DELIVERING, TRANSMITTING) becomes accessible from initial state (COLLECTING, RECEIVING, TRANSMITTING) by taking multiagent action (F-deliver ∧ I-receive ∧ L-transmit). Previously, that multiagent action forced the product FSA to go to (DELIVERING, DELIVERING, RECEIVING).
What implications do these translations have for the safety of the learning operators for the product FSA? The positive a priori results for o_delete+gen, o_spec+gen, and o_stay for preserving Invariance properties become negative for the product. This is because o_gen may become o_add and o_stay may become o_move. On the other hand, the positive a priori results for o_delete, o_spec, o_delete∨spec and o_delete-action preserving all properties remain positive for the product. For o_delete, o_spec, o_delete∨spec, and o_delete-action, this implies that the product FSA never needs to be formed, reverification does not have to be done, and thus there is no run-time cost, even for multiple agents learning autonomously. As mentioned above, the troublesome parts of all operators are due to their o_gen or o_add component. In the next section we develop methods for reducing the complexity of reverification over total reverification from scratch when these operators have been applied.
6. Incremental Reverification
Recall that operators o_spec and o_delete cannot cause problems with the safety of learning, whereas o_gen and o_add are risky (i.e., are not a priori guaranteed to be SMLs). Furthermore, o_gen and o_add can cause problems when they are the second step in a two-step operator. Fortunately, we have developed incremental reverification algorithms for these operators that can significantly decrease the time complexity over total reverification from scratch.
Recall that there are two ways that operators can alter accessibility: globally (G) or locally (L). Furthermore, recall that o_add can increase accessibility either way (↑G ↑L), whereas o_gen can only increase accessibility locally (¬↑G ↑L). We say that o_gen has only a "localized" effect on accessibility, whereas the effects of o_add may ripple through many parts of the FSA. The implication is that we can have very efficient incremental methods for reverification tailored for o_gen, whereas we cannot do likewise for o_add. In other words, a more localized effect on accessibility implies that it is easier to localize reverification to gain speed. This is also true for both two-step operators that have o_gen as their second step, i.e., o_delete+gen and o_spec+gen are amenable to incremental (localized) reverification. Because no advantage is gained by considering o_add per se, we develop incremental reverification algorithms for the most general operator o_change. These algorithms apply to o_add and all other special cases of o_change.
We have developed two types of incremental reverification algorithms: those that follow the application of o_gen, and those that follow the application of o_change. For each of our learning operators, one or more of these algorithms is applicable. Before presenting the incremental algorithms, Subsection 6.1 presents two algorithms for total reverification from scratch, namely, one for Invariance properties and the other for all properties expressible as FSAs, as well as an algorithm for taking the tensor product of the FSAs. These algorithms apply to SIT_1agent, SIT_1plan, or SIT_multplans. Subsection 6.2 gives incremental versions of all the algorithms in Subsection 6.1. These algorithms are applicable when the learning operator is o_change or any of its special cases. Furthermore, they apply to any of SIT_1agent, SIT_1plan, or SIT_multplans. Subsection 6.3 has incremental algorithms for SIT_1agent and SIT_1plan, learning operator o_gen, and Invariance and full Response properties in particular. The section concludes with theoretical and empirical results comparing the time complexity of the incremental algorithms with the time complexity of the corresponding total version (as well as with each other).
The goal in developing all of the incremental reverification algorithms is maximal efficiency. These algorithms make the assumption that S ⊨ P prior to learning, which means that any errors found on previous verification(s) have already been fixed. Then learning occurs (o(S) = S'), followed by the incremental reverification algorithm (see Figure 1). Next let us consider the soundness and completeness of the algorithms, where we assume normal termination. All of the incremental reverification algorithms presented here are sound (i.e., whenever they conclude that reverification succeeds, it is in fact true that S' ⊨ P) for "downstream" properties and "directionless" properties for which the negation is expressible as a Büchi FSA. Downstream properties (which include Response) check sequences of events in temporal order, e.g., whether every p is followed by a q. In contrast, "upstream" properties check for events in reverse temporal order, e.g., whether every q is preceded by a p.¹⁰ Directionless properties, such as Invariance, impose no order for checking. Some of the incremental algorithms are also complete, i.e., whenever they conclude that reverification fails, it is in fact true that S' ⊭ P. (The reader should avoid confusing "complete algorithm" with "complete FSA.")
When reverification fails, it does so because of one or more errors, where an "error" implies there is a property violation (S' ⊭ P). There are two ways to resolve such errors. Either return to the prelearning FSA(s), choose another learning operator, and reverify again, or keep the results of learning but repair the FSA(s) in some other way to fix the error. With one exception, the complete algorithms in this section find all true errors introduced by learning. The algorithms that are not complete may also find false errors. Any algorithm that finds all and only true errors can resolve these errors in either of the two ways. An algorithm that does not find all errors or finds false ones requires more restricted error resolution. In particular, it can only be used with the first method for resolving errors, which consists of choosing another learning operator. The algorithms that are sound but not complete (can find false errors) are overly cautious. In other words, they may recommend avoiding a learning operator when in fact the operator may be safe to apply.

Before presenting the incremental algorithms, we first present algorithms for total reverification from scratch. These algorithms do not assume that learning has occurred, and they apply to all situations. They are more general (not tailored for learning), but less efficient, than our incremental algorithms.

10. William Spears (personal communication) identified the upstream/downstream distinction as being relevant to the applicability of the incremental algorithms described here.

       crt crr crp cdt cdr cdp drt drr drp ddt ddr ddp
   T    R   0   P   T   0   T   R   0   P   T   0   T
   R    0   T   0   0   R   0   0   T   0   0   R   0
   P    0   0   R   0   0   T   0   0   R   0   0   T

Table 1: The transition function for agent L's FSA plan. The rows correspond to states and the columns correspond to multiagent actions.
6.1 Product and Total (Re)verification Algorithms for All Situations
For implementation efficiency, all of our algorithms assume that FSAs are represented using a table of the transition function δ(v_1, a) = v_2, which means that for state v_1, taking action a leads to next state v_2, as shown in Table 1. Rows correspond to states and columns correspond to multiagent actions. This representation is equivalent to the more visually intuitive representation of Figures 2 and 3. In particular, Table 1 is equivalent to the FSA in Figure 3 for the lander agent L. In Table 1, states are abbreviated by their first letter, and the multiagent actions are abbreviated by their first letters. For example, "crt" means agent F takes action (F-collect), I takes (I-receive), and L takes (L-transmit). The table consists of entries for the next state, i.e., it corresponds to the transition function. A "0" in the table means that there is no possible transition for this state-action pair. One situation in which this occurs is when an action is not allowed from a state. Consider an example use of the table format for finite-state automata. According to the first (upper leftmost) entry in Table 1, if L is in state TRANSMITTING ("T") and F takes action F-collect, I takes I-receive, and L takes L-transmit (which together is multiagent action "crt"), then L will transition to its RECEIVING ("R") state, i.e., δ(T, crt) = R. With this tabular representation, o_change is implemented as a perturbation (mutation) operator that changes a table entry to another randomly chosen value for the next state. Operator o_gen is a perturbation operator that changes a 0 entry to a next state already appearing in that row. For example, generalizing the transition condition along edge (T,R) can be accomplished by changing one of the 0s to an R in the first row of Table 1. This is because the transition condition associated with edge (T,R) is the set of all multiagent actions that transition from
Suppose there are n agents, and 1 ≤ j_k ≤ the number of states in the FSA for agent k. Then the algorithm forms all product states v = (v_{j_1}, ..., v_{j_n}) and specifies their transitions:

Procedure product
  for each product state v = (v_{j_1}, ..., v_{j_n}) do
    if all v_{j_k}, 1 ≤ k ≤ n, are initial states, then v is a product initial state
    endif
    for each multiagent action a_i do
      if (δ(v_{j_k}, a_i) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_i) = 0
      else δ(v, a_i) = (δ(v_{j_1}, a_i), ..., δ(v_{j_n}, a_i)); endif
    endfor
  endfor
end procedure

Figure 11: Total_prod product algorithm.
T to R, i.e., {crt, drt} in Table 1. This set is expressed in Boolean algebra as (I-receive ∧ L-transmit) (see Figure 3).
For SIT_1plan or SIT_multplans, prior to verification the multiagent product FSA S needs to be formed from the individual agent FSAs (see Figure 1). We can implement the algorithm Total_prod for generating the product FSA using the data structure of Table 1, as shown in Figure 11. In the product FSA, an example product state and transition is δ(CRT, drt) = DDR because δ(C, drt) = D, δ(R, drt) = D, and δ(T, drt) = R for agents F, I, and L, respectively. The initial states of the product FSA are formed by testing whether every individual state of the product is an initial state. For example, if D, D, and R are initial states for F, I, and L, respectively, then DDR will be an initial state for F ⊗ I ⊗ L. After forming the product states and specifying which are initial, the algorithm of Figure 11 specifies the δ transition for every product state and multiagent action.

Note that the algorithm in Figure 11 forms the product FSA S for testing Invariance properties. To test First-Response properties using AT verification, we need to form the product FSA S ⊗ ¬P. To do this simply requires considering ¬P to be the (n+1)st agent. The algorithm in Figure 11 is modified by changing n to n+1 everywhere. It is also important to note that in all situations (including SIT_1agent), Total_prod must be executed to form the product S ⊗ ¬P if AT verification is to be done. In SIT_1agent, S is just the single agent FSA and n is 1. For SIT_1plan, n = 1 also. In other words, for SIT_1plan the multiagent plan, once formed, is never subdivided and therefore it could be considered like a single agent plan. In both of these cases, if AT verification is done the product is taken of the single plan FSA and the property FSA.
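The Total_prod construction can be rendered in Python roughly as follows (our own encoding, not the paper's implementation: each agent is a tuple (states, initial states, transition table), with `None` playing the role of 0):

```python
from itertools import product as cartesian

def total_prod(agents, actions):
    """Form the multiagent product FSA from individual agent FSAs.
    Each agent is (states, initial_states, delta) with delta[v][a] = w or None."""
    prod_delta, prod_init = {}, set()
    state_sets = [agent[0] for agent in agents]
    for v in cartesian(*state_sets):
        # v is a product initial state iff every component is initial.
        if all(vk in agents[k][1] for k, vk in enumerate(v)):
            prod_init.add(v)
        prod_delta[v] = {}
        for a in actions:
            nexts = [agents[k][2][vk].get(a) for k, vk in enumerate(v)]
            # If any component has no transition ("0"), neither does the product.
            prod_delta[v][a] = None if None in nexts else tuple(nexts)
    return prod_delta, prod_init
```

For AT verification the (negated) property FSA would simply be appended to `agents` as the (n+1)st component, mirroring the n-to-n+1 change described above.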
Given that the product FSA has been formed if needed, then the final (multi)agent FSA can be verified. We first consider a very simple model checking algorithm, called Total_I, tailored specifically for verifying Invariance properties of the form □¬p. The algorithm, shown in Figure 12, consists of a depth-first search of S beginning in each initial state. Any accessible atom a_i that is part of a transition condition, where a_i ≤ p, violates the property. (We store the set of all atoms a_i ≤ p for rapid access.)
Procedure verify
  for each state v ∈ V(S) do
    visited(v) = 0
  endfor
  for each initial state v ∈ I(S) do
    if (visited(v) == 0) then dfs(v); endif
  endfor
end procedure

Procedure dfs(v)
  visited(v) = 1;
  for each atom a_i ∈ Σ(K), a_i ≤ p, do
    if δ(v, a_i) ≠ 0 then print "Verification error"; endif
  endfor
  for each atom a_i ∈ Σ(K) set w = δ(v, a_i) and do
    if (w ≠ 0) and (visited(w) == 0) then dfs(w); endif
  endfor
end procedure

Figure 12: Total_I verification algorithm.
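A minimal Python rendering of the Total_I search (names and data layout are ours; `bad_atoms` is the stored set of atoms a_i ≤ p mentioned above, and transition tables are dictionaries with `None` for 0):

```python
def total_I(delta, init, bad_atoms):
    """Depth-first search flagging every accessible transition whose atom
    (multiagent action) a satisfies a <= p, for the Invariance property []~p."""
    errors, visited = [], set()

    def dfs(v):
        visited.add(v)
        for a, w in delta[v].items():
            if w is not None and a in bad_atoms:
                errors.append((v, a))          # verification error
        for w in delta[v].values():
            if w is not None and w not in visited:
                dfs(w)

    for v in init:
        if v not in visited:
            dfs(v)
    return errors
```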
Next we consider an algorithm for verifying any property whose negation is expressible as a Büchi FSA, including First-Response properties. The reader may wish to review the high-level description of this AT model checking algorithm presented in Subsection 3.3 before continuing. Figure 13 gives a basic version of this algorithm from Courcoubetis et al. (1992) and Holzmann et al. (1996).¹¹ We call this algorithm Total_AT because it is total automata-theoretic verification. Recall that in AT model checking, the property is represented as an FSA, and asking whether S ⊨ P is equivalent to asking whether L(S) ⊆ L(P) for property P. This is equivalent to L(S) ∩ L(¬P) = ∅, which is algorithmically tested by taking the tensor product of the plan FSA and the FSA corresponding to ¬P. If L(S ⊗ ¬P) = ∅ then L(S) ⊆ L(P), i.e., S ⊨ P and verification succeeds; otherwise, S ⊭ P and verification fails. The algorithm of Figure 13 assumes that the negation of the property (¬P) is expressed as a Büchi automaton and the FSA being searched is S ⊗ ¬P.

Algorithm Total_AT, in Figure 13, actually checks whether S ⊭ P for any property P. To check if S ⊭ P, we can determine whether L(S ⊗ ¬P) ≠ ∅. This is true if there is some "bad" state in B(S ⊗ ¬P) reachable from an initial state and reachable from itself, i.e., part of an accessible cycle and therefore visited infinitely often. The algorithm of Figure 13 performs this check using a nested depth-first search on the product FSA S ⊗ ¬P. The first depth-first search begins at initial states and visits all accessible states. Whenever a state s ∈ B(S ⊗ ¬P) is discovered, it is called a "seed," and a nested search begins to look for a cycle that returns to the seed. If there is a cycle, this implies the B(S ⊗ ¬P) (seed) state can be visited infinitely often, and therefore the language is nonempty (i.e., there is some action sequence in the plan that does not satisfy the property) and verification fails.

11. This algorithm is used in the well-known Spin system (Holzmann, 1991). A modification was made to the published algorithm for readability, as well as for efficiency, for the case where it is desirable to halt after the first verification error. This modification makes the nested call first in procedure dfs.
Procedure verify
  for each state v ∈ V(S ⊗ ¬P) do
    visited(v) = 0
  endfor
  for each initial state v ∈ I(S ⊗ ¬P) do
    if (visited(v) == 0) then dfs(v); endif
  endfor
end procedure

Procedure dfs(v)
  visited(v) = 1;
  if v ∈ B(S ⊗ ¬P) then
    seed = v;
    for each state v ∈ V(S ⊗ ¬P) do
      visited2(v) = 0
    endfor
    ndfs(v)
  endif
  for each successor (i.e., next state) w of v do
    if (visited(w) == 0) then dfs(w); endif
  endfor
end procedure

Procedure ndfs(v) /* the nested search */
  visited2(v) = 1;
  for each successor (i.e., next state) w of v do
    if (w == seed) then print "Bad cycle. Verification error"; break
    else if (visited2(w) == 0) then ndfs(w); endif
    endif
  endfor
end procedure

Figure 13: Total_AT verification algorithm.
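The nested depth-first search of Figure 13 might look like this in Python (an illustrative sketch; `bad_states` stands for B(S ⊗ ¬P), and states and transition tables are encoded as in the earlier sketches):

```python
def total_AT(delta, init, bad_states):
    """Nested depth-first search over S (x) ~P: report a violation iff some
    state in B(S (x) ~P) is reachable from an initial state and lies on a
    cycle back to itself (hence can be visited infinitely often)."""
    visited, found = set(), []

    def ndfs(v, seed, visited2):           # the nested search
        visited2.add(v)
        for w in (w for w in delta[v].values() if w is not None):
            if w == seed:
                found.append(seed)         # bad cycle: verification error
            elif w not in visited2:
                ndfs(w, seed, visited2)

    def dfs(v):
        visited.add(v)
        if v in bad_states:
            ndfs(v, v, set())              # nested call first, as in Figure 13
        for w in (w for w in delta[v].values() if w is not None):
            if w not in visited:
                dfs(w)

    for v in init:
        if v not in visited:
            dfs(v)
    return found
```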
Suppose there are n agents, and agent i was modified, 1 ≤ i ≤ n. Operator o_change modified δ(v_i, a_adapt) to be w_i' for state v_i and multiagent action a_adapt. 1 ≤ j_k ≤ the number of states in the FSA for agent k. Then the algorithm is:

Procedure product
  for each product state v = (v_{j_1}, ..., v_i, ..., v_{j_n}) formed from state v_i do
    if (δ(v_{j_k}, a_adapt) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_adapt) = 0
    else δ(v, a_adapt) = (δ(v_{j_1}, a_adapt), ..., w_i', ..., δ(v_{j_n}, a_adapt)); endif
  endfor
end procedure

Figure 14: Inc_prod product algorithm.
Total_I and Total_AT are sound and complete (for any property whose negation is expressible as a Büchi FSA), and they find all verification errors. Before elaborating on this, first note that the term "verification error" has a different connotation for Total_I and Total_AT. For Total_I an error is an accessible bad atom (i.e., an atom a ≤ p where the property is □¬p). For Total_AT it is an accessible bad state that is part of a cycle. The reason Total_I is sound is that it flags as errors only those atoms a ≤ p. It is complete and finds all errors because it does exhaustive search and testing of all accessible atoms. Total_AT is also sound and complete, for analogous reasons. Because Total_I and Total_AT find all errors, they can be used with either method of error resolution (i.e., choose another operator or fix the FSA).
6.2 Incremental Algorithms for o_change and All Situations
All of the algorithms in the previous subsection can be streamlined given that it is known that a learning operator (in this case, o_change) has been applied. For simplicity, all of our algorithms assume o_change is applied to a single atom (multiagent action). For example, we assume that if δ(v_i, a_adapt) = w_i, then o_change(δ(v_i, a_adapt)) = w_i', where w_i and w_i' are states (or 0, implying no next state), and a_adapt is a multiagent action. Since we use the tabular representation, this translates to changing one table entry.
Figure 14 shows an incremental version of Total_prod, called Inc_prod, which is tailored for re-forming the product FSA after o_change has been applied. The algorithm of Figure 14 is for Invariance properties; for AT verification change n to n+1 in the algorithm and assume ¬P is the (n+1)st agent. Although Inc_prod is applicable in all situations when taking the product with the property FSA, the primary motivation for developing this algorithm was the multiagent SIT_multplans. Recall that in this situation, every time learning is applied to an individual agent FSA, the product must be re-formed to verify global properties. The wasted cost of doing this motivated the development of this algorithm.
Algorithm Inc_prod assumes that the product was formed originally (before learning) using Total_prod. Inc_prod capitalizes on the knowledge of what (individual or multiagent) state (v_i) and multiagent action (a_adapt) transition to a new next state as specified by operator o_change. This algorithm assumes that the prelearning product FSA is stored. Then the only product FSA states whose next state needs to be modified are those states that
Procedure product
  I(S) = ∅;
  for each product state v = (v_{j_1}, ..., v_i, ..., v_{j_n}) formed from state v_i do
    if visited(v) then I(S) = I(S) ∪ {v}; endif
    if (δ(v_{j_k}, a_adapt) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_adapt) = 0
    else δ(v, a_adapt) = (δ(v_{j_1}, a_adapt), ..., w_i', ..., δ(v_{j_n}, a_adapt)); endif
  endfor
end procedure

Figure 15: Inc_prod-NI product algorithm: a variation of Inc_prod that gets new initial states.
include v_i and transition on action a_adapt. The method for reverification that is assumed to follow Inc_prod is total reverification, i.e., Total_I or Total_AT.
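Combining the entry update of Inc_prod with the new-initial-state collection of Inc_prod-NI, a Python sketch (our encoding, not the paper's implementation; `visited` is the set of product states marked visited by the previous verification):

```python
def inc_prod_NI(prod_delta, visited, agent_deltas, i, v_i, a_adapt, w_i_new):
    """Re-form only the product entries affected by o_change, which replaced
    delta_i(v_i, a_adapt) with w_i_new (None meaning "0"), and collect the
    new initial states: previously visited product states built from v_i."""
    new_init = set()
    for v in [s for s in prod_delta if s[i] == v_i]:  # states formed from v_i
        if v in visited:
            new_init.add(v)
        nexts = [w_i_new if k == i else agent_deltas[k][vk].get(a_adapt)
                 for k, vk in enumerate(v)]
        prod_delta[v][a_adapt] = None if None in nexts else tuple(nexts)
    return new_init
```

Only the rows built from v_i are touched; every other product entry is reused from the stored prelearning product, which is exactly where the savings over Total_prod come from.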
Next, consider another pair of product and reverification algorithms that is expected to be, overall, potentially even more efficient. The goal is to streamline reverification after o_change. This requires a few simple changes to the algorithms. The motivation for these changes is that when model checking downstream properties, o_change has only "downstream effects," i.e., it only affects the accessibility of vertices and atoms altered by o_change or those that would be visited by verification after those altered by o_change.
Consider the changes. We start by building a set of the Cartesian product states v = (v_{j_1}, ..., v_i, ..., v_{j_n}) that are formed from the state v_i that was affected by learning. The first way that we can shorten reverification is by using these states as the new initial states for reverification. In fact, we need only select those that were visited during the original verification (i.e., are accessible from the original initial states). In other words, suppose for agent i, o_change modified δ(v_i, a_adapt). Then we reinitialize the set of initial states to be ∅ and add all product states formed from v_i that were marked "visited" during previous verification. This can be done by modifying the product algorithm of Figure 14 as shown in Figure 15. The algorithm of Figure 15 is to form the product FSA for verifying Invariance properties. To form the product for AT verification, substitute I(S ⊗ ¬P) for I(S) and (n+1) for n in Figure 15. We call this incremental product algorithm Inc_prod-NI, where "NI" denotes the fact that we are getting new initial states.

The second way to streamline reverification is by only considering a transition on action a_adapt, the action whose δ value was modified by learning, from these new initial states. Thereafter, incremental reverification proceeds exactly like total (re)verification. With these changes, Total_I becomes Inc_I-NI, shown in Figure 16. Likewise, with these changes Total_AT becomes Inc_AT-NI, as shown in Figure 17. Figure 17 shows only changes to procedures dfs and verify; ndfs is the same as in Figure 13. One final streamlining added to Inc_I-NI, but not Inc_AT-NI, is that only the new initial states have "visited" reinitialized to 0. This can be done for Invariance properties because they are not concerned with the order of atoms in strings.¹²
12. Suppose o_change adds a new edge (v_1, v_3). If v_3 was visited on previous verification of an Invariance property (from a state other than v_1), then all atoms that can be visited after v_3 would already have been tested for the property. On the other hand, when testing First-Response properties the order of atoms is relevant. Even if v_3 was previously visited, since it might not have been visited from v_1, the addition of (v_1, v_3) could add a new string with a new atom order that might violate the First-Response property. Therefore, v_3 needs to be revisited for First-Response properties, but not for Invariance properties.

Procedure verify
  for each new initial state v ∈ I(S) do
    visited(v) = 0
  endfor
  for each new initial state v ∈ I(S) do
    if (visited(v) == 0) then dfs(v); endif
  endfor
end procedure

Procedure dfs(v)
  visited(v) = 1;
  if v ∈ I(S) and w ≠ 0, where w = δ(v, a_adapt), then
    if (a_adapt ≤ p) then print "Verification error"; endif
    if (visited(w) == 0) then dfs(w); endif
  else
    for each atom a_i ∈ Σ(K), a_i ≤ p, do
      if δ(v, a_i) ≠ 0 then print "Verification error"; endif
    endfor
    for each atom a_i ∈ Σ(K) set w = δ(v, a_i) and do
      if (w ≠ 0) and (visited(w) == 0) then dfs(w); endif
    endfor
  endif
end procedure

Figure 16: Inc_I-NI reverification algorithm.

Inc_I-NI is sound for Invariance properties, and Inc_AT-NI is sound for any downstream or directionless property whose negation is expressible as a Büchi FSA, including First-Response and Invariance. Assuming S ⊨ P prior to applying o_change to form S', if these incremental reverification algorithms conclude that S' ⊨ P, then total reverification would also conclude that S' ⊨ P. Recall that total reverification is sound. Therefore, the same is true for these incremental algorithms. Furthermore, these incremental reverification algorithms will find all of the new violations of the property introduced by o_change. The reason the algorithms are sound and find all new errors (for downstream or directionless properties) is that there are only two ways that accessibility can be modified by any of our learning operators, including o_change: locally or globally. Recall that local change alters the accessibility of atom a_adapt or the state δ(v_i, a_adapt), and a global change alters the accessibility of states or atoms that would be visited after δ(v_i, a_adapt). In neither case (local or global) will the learning operator modify accessibility of atoms or states visited before, but not after, a_adapt. Our algorithms reverify exhaustively (i.e., they reverify as much as total reverification would) for all atoms and states visited at or after a_adapt. Since these incremental algorithms perform reverification exactly the same way as their total versions do after the part of the FSA that was modified by learning, they will find all new errors introduced by learning.

Procedure verify
  for each state v ∈ V(S ⊗ ¬P) do
    visited(v) = 0
  endfor
  for each new initial state v ∈ I(S ⊗ ¬P) do
    if (visited(v) == 0) then dfs(v); endif
  endfor
end procedure

Procedure dfs(v)
  visited(v) = 1;
  if v ∈ B(S ⊗ ¬P) then
    seed = v;
    for each state v ∈ V(S ⊗ ¬P) do
      visited2(v) = 0
    endfor
    ndfs(v)
  endif
  if v ∈ I(S ⊗ ¬P) and w ≠ 0 and (visited(w) == 0), where w = δ(v, a_adapt), then
    dfs(w)
  else
    for each successor (i.e., next state) w of v do
      if (visited(w) == 0) then dfs(w); endif
    endfor
  endif
end procedure

Figure 17: Procedures verify and dfs of the Inc_AT-NI reverification algorithm.
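A Python sketch of the resulting Inc_I-NI search (an illustrative rendering of Figure 16, using the same dictionary encoding as the earlier sketches; from each new initial state only the adapted action a_adapt is followed, after which the search proceeds as in Total_I):

```python
def inc_I_NI(delta, new_init, a_adapt, bad_atoms):
    """Reverify []~p after o_change: restart the search at the previously
    visited product states affected by learning (the new initial states),
    follow only a_adapt from those states, then search as in total
    verification. `bad_atoms` is the stored set of atoms a <= p."""
    errors, visited = [], set()
    roots = set(new_init)

    def dfs(v):
        visited.add(v)
        if v in roots:
            w = delta[v].get(a_adapt)          # only the adapted action
            if w is not None:
                if a_adapt in bad_atoms:
                    errors.append((v, a_adapt))
                if w not in visited:
                    dfs(w)
        else:
            for a, w in delta[v].items():
                if w is not None and a in bad_atoms:
                    errors.append((v, a))
            for w in delta[v].values():
                if w is not None and w not in visited:
                    dfs(w)

    for v in roots:
        if v not in visited:
            dfs(v)
    return errors
```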
Inc_I-NI is complete for Invariance properties because it flags errors using the same method as Total_I, and because Invariance properties are directionless and are therefore impervious to the location of atoms in a string. On the other hand, Inc_AT-NI is not complete for all downstream properties. For example, it is not complete for properties that check for the first occurrence of a pattern in a string, e.g., First-Response properties. Because Inc_AT-NI does not identify whether the new initial states are before or after the first occurrence, there is no way to know if the first occurrence is being checked after learning. Nevertheless, this lack of completeness for First-Response properties actually turns out to be a very useful trait, as we will discover in Subsection 6.5.
6.3 Incremental Algorithms for o_gen and SIT_1agent/1plan
We next present our final two incremental reverification algorithms, which are applicable only in SIT_1agent and SIT_1plan, when there is one FSA to reverify. These are powerful algorithms in terms of their capability to reduce the complexity of reverification. However, their soundness relies on the assumption that the learning operator's effect on accessibility is localized, i.e., that it is o_gen with SIT_1agent or SIT_1plan but not SIT_multplans (where o_gen might become o_add). An important advantage of these algorithms is that they never require forming a product FSA, not even S ⊗ ¬P, regardless of whether the property is type Response. The algorithms gain efficiency by being tailored both to a specific property type and to a specific learning operator. The objective in developing these algorithms was maximal efficiency, and therefore they sacrifice completeness and/or the ability to find all errors.

Procedure check-invariance-property
  if v_1 was not previously visited, then output "Verification succeeds."
  else
    if (z ⊨ ¬p) then output "Verification succeeds."
    else output "Avoid this instance of o_gen."; endif
  endif
end procedure

Figure 18: Inc_gen-I reverification algorithm.
These two incremental algorithms are tailored for reverification after operator o_gen. Assume that property P holds for S prior to learning, i.e., S ⊨ P. Now we generalize the transition condition M_K(v_1, v_3) = y to form S' via o_gen(M_K(v_1, v_3)) = y ∨ z, where y ∧ z = 0. We want to verify that S' ⊨ P.
One additional definition is needed before presenting our algorithms. We previously defined what it means for a c-state formula p to be true at a c-state, but to simplify the algorithms we also define what it means for a c-state formula to be true of a transition condition. A c-state formula p is defined to be true of a transition condition y, i.e., "y ⊨ p," if and only if y ≤ p (which can be implemented by testing whether for every atom a ≤ y, a ≤ p).
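Modeling a transition condition and a c-state formula by their sets of satisfying atoms, this definition reduces to set containment (a minimal sketch under that modeling assumption; the function name is ours):

```python
def holds_of(y_atoms, p_atoms):
    """ "y |= p" for a transition condition y: true iff y <= p, i.e., every
    atom of y is also an atom of p (both modeled as sets of atoms)."""
    return y_atoms <= p_atoms   # Python subset test mirrors the <= ordering

# Edge (T,R) of Table 1 has transition condition {crt, drt}.
assert holds_of({"crt", "drt"}, {"crt", "drt", "crr"})
assert not holds_of({"crt", "ddp"}, {"crt"})
```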
Let us begin with the algorithm Inc_gen-I (which consists of two very simple tests), tailored for o_gen and Invariance properties, shown in Figure 18. Recall that M_K(v_1, v_3) = y and o_gen(M_K(v_1, v_3)) = y ∨ z. Inc_gen-I, which tests "z ⊨ ¬p," localizes reverification to a restricted portion of the FSA. (For efficiency, z ⊨ ¬p is implemented as a test against p rather than against ¬p, i.e., checking that every atom a ≤ z satisfies a ≰ p, because p is typically expected to be more succinct than ¬p.) Assume the Invariance property is P = □¬p and S ⊨ P. Then every string x in L(S) satisfies Invariance property P, so for each x, ¬p is true of every atom in x. This implies y ⊨ ¬p. This statement is based on our assumption that v_1 is accessible from an initial state. If not, reverification is not needed; the generalization will not violate P. Therefore, the algorithm begins by testing whether v_1 was visited on previous verification. If not, the output is "success." (Note that o_gen does not alter the accessibility of v_1.)
Inc_gen-I is sound and complete for Invariance properties. Generalization of M_K(v_1, v_3) is application of o_gen(M_K(v_1, v_3)) = y ∨ z to form S'. This operator o_gen preserves Invariance property P if and only if S' ⊨ P, which is true if and only if z ⊨ ¬p. The reason for this is that we know S satisfies P from our original verification, and therefore ¬p is true for all atoms in all strings in L(S). The only possible new atoms in L(S') but not in L(S) are in z. If z ⊨ ¬p, then ¬p is true for all atoms in L(S'), which implies that every string in L(S') satisfies P. In other words, S' ⊨ P. Therefore, Inc_gen-I is sound. We also know that it is complete because if ∃a, a ≤ z, a ≰ p, then it must be the case that S' ⊭ P.

In conclusion, Inc_gen-I, which consists of the test "z ⊨ ¬p," is sound and complete. For maximal efficiency, our implementation of Inc_gen-I halts after the first error, although it is simple to modify it to find all errors (and this does not significantly affect the empirical time complexity results of Subsection 6.5, nor does it affect the worst-case time complexity). Inc_gen-I is incremental because it is localized to just checking whether the property holds of the newly added atoms in z, rather than all atoms in L(S'). Finally, this algorithm only needs to be executed for o_gen, but not for o_spec+gen or o_delete+gen, because o_gen is the only version that can add new atoms via generalization. Recall that o_spec+gen and o_delete+gen are SMLs for Invariance properties.

Procedure check-response-property
  if y ⊨ q then
    if (z ⊨ q and z ⊨ ¬p) then output "Verification succeeds."
    else output "Avoid this instance of o_gen"; endif
  else
    if (z ⊨ ¬p) then output "Verification succeeds."
    else output "Avoid this instance of o_gen"; endif
  endif
end procedure

Figure 19: Inc_gen-R reverification algorithm.
As an example of Inc_gen-I, suppose a, b, c, d, and e are atoms, and the transition condition y between STATE1 and STATE2 equals a. Let (a, b, b, d, d, ...), where the ellipsis indicates infinite repetition of d, be a string in L(S) that includes STATE1 and STATE2 as the first two vertices in its accepting run. The property is P = □¬e. Assume the fact that this string satisfies ¬e was proved in the original verification. Suppose o_gen generalizes M_K(STATE1, STATE2) from a to (a ∨ c) (i.e., it adds a new allowable action c from STATE1), which adds the string (c, b, b, d, d, ...) to L(S'). Then rather than test whether all of the elements of {a, b, c, d} are ≤ ¬e, we really only need to test whether c ≤ ¬e, because c is the only newly added atom.
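Inc_gen-I then amounts to two set tests (a Python sketch, not the paper's implementation; `p_atoms` is assumed to be the stored set of atoms entailing p, so z ⊨ ¬p becomes disjointness):

```python
def inc_gen_I(v1_visited, z_atoms, p_atoms):
    """Inc_gen-I as two simple tests: if v_1 was never reached on the
    original verification, generalization is harmless; otherwise every
    newly added atom in z must satisfy ~p (i.e., entail no atom of p)."""
    if not v1_visited:
        return "Verification succeeds."
    if z_atoms.isdisjoint(p_atoms):        # z |= ~p
        return "Verification succeeds."
    return "Avoid this instance of o_gen."

# Running example: property []~e, newly added condition z = {c}.
assert inc_gen_I(True, {"c"}, {"e"}) == "Verification succeeds."
```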
The next algorithm, Inc_gen-R, is for generalization and full Response properties (and is nothing more than some simple tests). Like Inc_gen-I, Inc_gen-R localizes reverification to a restricted portion of the FSA. Assume the Response property is P = □(p → ◇q), where p is the trigger and q is the response for c-state formulae p and q. Assume property P holds for S prior to learning (S ⊨ P). Now we generalize M_K(v_1, v_3) = y to form S' by applying o_gen(M_K(v_1, v_3)) = y ∨ z, where y ∧ z = 0. We need to verify that S' ⊨ P.
Inc_gen-R for o_gen and full Response properties is in Figure 19. (Inc_gen-R is also applicable for o_delete+gen and o_spec+gen.) The algorithm first checks whether a response could be required of the transition condition M_K(v_1, v_3). A response is required if, for at least one string in L(S) whose run includes (v_1, v_3), the prefix of this string before visiting vertex v_1 includes the trigger p not followed by response q, and the string suffix after v_3 does not include q either. Such a string satisfies the property if and only if y ⊨ q. Thus if y ⊨ q and the property is true prior to learning (i.e., for S), then it is possible that a response is required. In this situation (i.e., y ⊨ q), the only way to be sure we are safe (S' ⊨ P) is if the newly added condition z also has the response, i.e., z ⊨ q. If not, then there could be new strings in L(S') whose accepting runs include (v_1, v_3) but do not satisfy the property.
For example, suppose a, b, c, and d are atoms, and the transition condition y between STATE4 and STATE5 equals d. Let x = (a, b, b, d, ...) be a string in L(S) that includes STATE4 and STATE5 as the fourth and fifth vertices in its accepting run. The property is P = □(a → ◇d), and therefore y ⊨ q and x ⊨ P. Suppose o_gen generalizes M_K(STATE4, STATE5) from d to (d ∨ c), where z is c, which adds the string x' = (a, b, b, c, ...) to L(S'). Then z ⊭ q. If the string suffix after (a, b, b, c) does not include d, then there is now a string that includes the trigger but does not include the response. In other words, x' ⊭ P. Finally, if y ⊨ q and z ⊨ q, an extra check is made to be sure z ⊨ ¬p, because an atom could be both a response and a trigger. New triggers should be avoided.
The second part of the algorithm states that if y ⊭ q and no new triggers are introduced by generalization, then the operator is "safe" to do. It is guaranteed to be safe (S' ⊨ P) in this case because if y ⊭ q, then a response cannot be required here. In other words, because S ⊨ P, for every string in L(S) whose accepting run includes (v_1, v_3), either no trigger occurred prior to visiting v_1, or every trigger was followed by a response prior to visiting v_1, or a response occurred after visiting v_3.
Inc_gen-R is sound but not complete for full Response properties. Its soundness is based on the fact that o_gen does not increase accessibility of vertices or atoms visited after state v_3 (i.e., globally) and therefore reverification can be localized to M_K(v_1, v_3) alone. Inc_gen-R is not complete because it may output "Avoid this instance of o_gen" when in fact o_gen is safe to do. For example, if y ⊨ q but z ⊭ q, the algorithm will output "Avoid this instance of o_gen." Yet it may be the case that S' ⊨ P if no trigger p precedes response q in L(S'), or if a response occurs after v_3. When Inc_gen-R outputs verification failure, it does not supply sufficient information for FSA repair. Errors must be resolved by selecting another learning operator. Note that "error" has a different connotation for Inc_gen-R than for the AT verification algorithms. Any "Avoid..." output is considered an error.
Another disadvantage of Inc_gen-R is that it does not allow generalizations that add triggers. If it is desirable to add new triggers during generalization, then one needs to modify Inc_gen-R to call Inc_AT when reverification with Inc_gen-R fails, instead of outputting "Avoid this instance of o_gen." This modification also fixes the false error problem, and preserves the enormous time savings (see Section 6.5) when reverification succeeds.
6.4 Theoretical Worst-Case Time Complexity Analysis

Recall that one of our primary objectives is timely agent responses. This section compares the worst-case time complexity of the algorithms. Let us begin with the time complexity of Total_prod. This is O((∏_{i=1}^{n} |V(S_i)|) × |Σ(K)| × n) to form the product of the individual agent FSAs for Invariance property verification, and O((∏_{i=1}^{n} |V(S_i)|) × |P| × |Σ(K)| × n) to form the product for AT verification. Here n is the number of agents, |V(S_i)| is the number of states in single-agent FSA S_i, |P| is the number of states in the property FSA P, and |Σ(K)| is the total number of atoms (multiagent actions). The reason for this complexity result is that there are ∏_{i=1}^{n} |V(S_i)| product states for Invariance property verification, and (∏_{i=1}^{n} |V(S_i)|) × |P| product states for AT verification. The outer loop of Total_prod iterates through all product states; the inner loop iterates through all |Σ(K)| atoms. Note that (∏_{i=1}^{n} |V(S_i)|) × |Σ(K)| and (∏_{i=1}^{n} |V(S_i)|) × |P| × |Σ(K)| are the sizes of the product FSA transition function tables built for Total_I and Total_AT, respectively. Total_prod does at most n lookups for each table entry.
By comparison, our incremental algorithm Inc_prod for generating the product FSA has time complexity O((∏_{i=1}^{n-1} |V(S_i)|) × n) or O((∏_{i=1}^{n-1} |V(S_i)|) × |P| × n) to modify the product FSA for Invariance property reverification or AT reverification, respectively. This is because the total number of revised product states is ∏_{i=1}^{n-1} |V(S_i)| or (∏_{i=1}^{n-1} |V(S_i)|) × |P|, and only one atom is considered (because we assume o_change changes the next state for a single atom a_adapt). The time complexity of Inc_prod-NI is the same as that of Inc_prod.
Next consider the worst-case time complexity of total (re)verification after the product has been formed. It is O((∏_{i=1}^{n} |V(S_i)|) × |Σ(K)|) for Total_I. This is because, in the worst case, every product state is accessible and therefore every entry in the product FSA transition function table is visited. Assuming |B| is the number of "bad" (in the Büchi sense) states in the product FSA, the worst-case time complexity of Total_AT is O((|B|+1) × (∏_{i=1}^{n} |V(S_i)|) × |P| × |Σ(K)|). This is because, in the worst case, every entry in the product FSA transition function table is visited once on the depth-first search and, for each bad state, again on the nested depth-first search. Unfortunately, the worst-case time complexities of Inc_I-NI and Inc_AT-NI are the same as those of Total_I and Total_AT, respectively. This is because, in the worst case, every product state is still accessible. The restriction to transition only on a_adapt at first does not reduce the "big O" complexity.
Finally, we consider the worst-case complexity of Inc_gen-I and Inc_gen-R. First, we define, for any Boolean expression x, |x| as the number of elements in {a | a ∈ Σ(K) and a ⊨ x}. For Invariance properties P of the form □¬p, |P| equals |p| since we test for each atom a whether a ⊨ p rather than a ⊨ ¬p, because we expect |p| < |¬p| in general. Then Inc_gen-I requires time O(|z| × |p|) to determine whether z ⊨ ¬p. (Checking whether v_1 was visited requires constant time.) Assuming |p| < ∏_{i=1}^{n} |V(S_i)| (which should be true except under bizarre circumstances), and since |z| ≤ |Σ(K)|, Inc_gen-I saves time over Total_I. Inc_gen-R requires time O((|y| × |q|) + (|z| × (|p| + |q|))) to determine whether y ⊨ q, and then to determine whether z ⊨ q and z ⊨ ¬p.^13 Clearly (|y| + |z|) ≤ |Σ(K)| because, by the definition of o_gen, y ∧ z = 0. Therefore, assuming (|p| + |q|) < ((|B| + 1) × ∏_{i=1}^{n} |V(S_i)| × |P|) (which, again, should be true except in bizarre circumstances), the worst-case time complexity of Inc_gen-R is lower than that of Total_AT.
13. Determining whether z ⊨ ¬p can be done by checking that no atom of z satisfies p. Also, for Inc_gen-R, an additional time O(|Σ(K)| × (|y| + |z|)) is needed to identify y when using the representation of Table 1. This does not affect our complexity comparisons or conclusions.

6.5 Empirical Time Complexity Comparisons

Worst-case time complexity is not always a useful measure. Therefore we supplement the worst-case analyses with empirical results on cpu time. Our primary objective in these experiments is to compare the incremental algorithms with total reverification, as well as with each other, in the context of evolving behaviorally correct FSAs. The time required for reverification is significant to address if we want timely agent responses, because reverification occurs after every learning operator application.
Before describing the experimental results, let us consider the experimental methodology. All code was written in C and run on a Sun Ultra 10 workstation. In our experiments, FSAs were randomly initialized, subject to certain restrictions. The reason for randomness is that this is a typical way to initialize individuals in a population for an evolutionary algorithm. There are two restrictions on the FSAs. First, although determinism and completeness of FSAs are execution, rather than verification, issues and therefore need not be enforced for these experiments, our choice of tabular representation of the FSAs (see Table 1) restricts the FSAs to being deterministic. Second, because the incremental algorithms assume S ⊨ P prior to learning, we restrict the FSAs to comply with this. There are two alternative methods for enforcing this in the experiments: (1) use sparse FSAs (i.e., with many 0s) and keep generating new FSAs until total verification succeeds (which does not take long with sparse FSAs), or (2) use dense FSAs engineered to guarantee property satisfaction. In particular, dense FSAs are forced to satisfy Invariance properties □¬p by inserting 0s in every column of the transition function table (such as Table 1) labeled with an atom a ⊨ p. Dense FSAs are forced to satisfy First-Response properties with trigger p and response q by inserting 0s in every column labeled with an atom a ⊨ p. This eliminates triggers initially. Note that either of these methods is a viable way to initialize a population of FSAs for evolution because it ensures early success in satisfying the property. This paper presents only the results with dense FSAs. See Gordon (1999) for the results with sparse FSAs.^14
Another experimental design decision was to show scaleup in the size of the FSAs. Throughout the experiments there were assumed to be three agents, each with the same 12 multiagent actions. Each individual agent FSA had 25 or 45 states.^15 With 45 states the transition table contains 45^3 × 12 entries.
A suite of five Invariance and five Response properties was used, which appears in Appendix C. Invariance properties were expressed by storing the set of all atoms a ⊨ p for property □¬p. This suffices for all of our algorithms tailored for Invariance properties. For AT verification, Response properties were expressed with a First-Response Büchi FSA for the negation of the property. An explanation of why this is adequate for our experiments appears below. For Inc_gen-R, trigger p, and response q, all atoms a_i ⊨ p and a_j ⊨ q were stored. Six independent experiments were performed to verify each of the properties. In other words, every reverification algorithm was tested with 30 runs: six runs for each of five Invariance or five Response properties. For every one of these runs, a different random seed was used for generating the three FSAs. However, it is important to point out that all algorithms being compared with each other saw the same FSAs. For example, in Table 2 we compare Inc_prod (row 1), Inc_prod-NI (row 4), and Total_prod (row 7). They all input the same three FSAs. Furthermore, the learning operator (specific instantiation of the operator schema) was the same for all algorithms being compared.
14. Sparse FSAs have an additional advantage, assuming they remain relatively sparse after evolution. The advantage is their succinctness for efficient execution, as in multientity models (Tennenholtz & Moses, 1989).
15. The sparse FSAs had 25, 45, or 65 states. To get accurate timing results with the dense FSAs, though, 65 states required a cpu free of any interfering processes for an unreasonably long time.
Let us consider the results in Tables 2 and 3. In both of these tables, each row corresponds to an algorithm. Rows are numbered for later reference. The entries give performance results, to be described shortly. Table 2 compares the performance of total reverification with the algorithms of Subsection 6.2, which were designed for o_change and all situations. The situation assumed for these experiments was SIT_multplans. Three dense random (subject to the above-mentioned restrictions) FSAs were generated, and then the product was formed. The result was a product FSA satisfying the property. Operator o_change was then applied, which consisted of a random (but pointing to a state instead of 0) change to a randomly chosen table entry in the FSA transition table for a random choice of one of the three agents. Finally, the product FSA was re-formed and reverification done.
The methodology for generating Table 3 was similar to that for Table 2, except that o_gen was the learning operator and the situation was assumed to be SIT_1plan. In other words, the product FSA was formed, and then o_gen was applied to the product FSA of the three agents, the product was taken with the property FSA if needed for AT verification, and then reverification was performed. Operator o_gen consisted of choosing a random state s_i and a random action a_i for which δ(s_i, a_i) = s_k, and choosing a random action a_j for which δ(s_i, a_j) = 0, and then setting δ(s_i, a_j) = s_k.
Any column in Tables 2 or 3 labeled "sec" gives a mean, over 30 runs, of the cpu time of the algorithm. Columns labeled "spd" give the speedup over total, i.e., the cpu time of the incremental algorithm in that row divided by the cpu time of the corresponding total algorithm. For example, the "spd" entry for Inc_prod in row 1 gives its cpu time divided by the cpu time of Total_prod in row 7. Columns labeled "err" show the average number of verification errors over 30 runs. This is important to monitor because, for example, the cpu time is most strongly correlated with the number of states "visited" during dfs, and "visited2" during ndfs when AT verification is used. Every property error causes ndfs to be called with a nested search, which may be quite time-consuming. Also, it is important to note that we did not force any verification errors to occur. It was our objective to monitor cpu time under natural circumstances for evolving FSAs. When errors arose they were the natural result of applying a learning operator. The "err" columns are missing from Table 2 because the values are all 0, i.e., no errors occurred during the experiments due to applying o_change, although we have observed errors with this operator outside of these experiments. The lack of errors in the experiments resulted from the particular random FSAs that happened to be generated during the experiments. Errors are quite common with the specific o_gen version of o_change, as can be seen in Table 3. Note that "N/A" appears in the "err" column for anything other than a verification algorithm because "err" refers to verification errors.
The algorithms (rows) should be considered in triples "p," "v," and "b," or else as a single item "v+b." A "p" next to an algorithm name in Table 2 or 3 denotes that it is a product algorithm, a "v" that it is a verification algorithm, and a "b" that it is the sum of the "p" and "v" entries, i.e., the time for both re-forming the product and reverifying. For example, Inc_I (b) is considered to be an algorithm pair consisting of Inc_prod (p) followed by Total_I (v) (see rows 1-3 of Table 2). If no product needs to be formed, then the "b" version of the algorithm is identical to the "v" version, in which case there is only one row, labeled "v+b."
Tables 4, 5, and 6 re-present a subset (cpu time only) of the data from Tables 2 and 3 in a format that facilitates some comparisons. In other words, Tables 4, 5, and 6 contain no new data, only reformatted data from Tables 2 and 3. In Tables 4, 5, and 6, results are grouped by "p," "v," or "b."
Let us elaborate on one more interesting issue before listing our experimental hypotheses. Recall that we are using a First-Response property FSA and that this FSA checks only that the first trigger in every string is followed by a response. For our evolutionary paradigm (with dense FSA initialization) when using Inc_AT-NI, verifying a First-Response property is equivalent to verifying the full Response property. The false errors found by Inc_AT-NI due to its incompleteness are in fact violations of the full Response property.^16 Therefore for Inc_AT-NI, First-Response FSAs are entirely adequate for reverification of full Response properties. Because we used the evolutionary paradigm in these experiments, and because Inc_AT-NI found the same number of errors as Total_AT (i.e., Inc_AT-NI found no false errors), for the FSAs in these experiments testing First-Response properties was equivalent to testing full Response properties.
For our experiments, five hypotheses were tested:

H1: Algorithms tailored specifically for Invariance properties are faster than those for AT verification, because the latter are general-purpose (and the product algorithms include an additional FSA).

H2: The incremental algorithms are faster than the total algorithms for both product and reverification. This is expected to be true because they were tailored for learning.

H3: The "NI" versions of the incremental algorithms are faster than their counterparts, which do not find new initial states. This is expected because of the increase in streamlining.

H4: Inc_gen-I and Inc_gen-R are the fastest of all the algorithms, because they are tailored for a less generic learning operator (i.e., o_gen rather than o_change), they are also tailored for one specific property type, and they sacrifice finding all errors.

H5: Inc_gen-I and Inc_gen-R will have the best scaleup properties: they will not take more time as FSA size increases. This latter expectation comes from the worst-case time complexity analysis.

Subsidiary issues we examine are the percentage of wrong predictions (for Inc_AT-NI and Inc_gen-R, which are not complete algorithms), and the maximum observed speedup.
The results are the following (unless stated otherwise, look at the "sec" columns):

H1: To see the results, in Table 2 look at rows 1 through 9 and compare each row r in this set with row r+9. In other words, compare row 1 with row 10, row 2 with row 11, and so on. Rows 1 through 9 are algorithms for Invariance properties, and

16. The reason is the following. Dense FSA initialization creates FSAs with no triggers. A learning operator is then applied. After learning, Inc_AT-NI begins reverification at every state from which a new trigger could have been added by learning. Thus every trigger in the FSA will be checked to see if it is followed by a response. At every generation of our evolutionary learning paradigm, at most one learning operator is applied per FSA, and this is immediately followed by reverification and error resolution (if needed). Therefore every new trigger will be caught by Inc_AT-NI and, if not followed by a response, the problem will be immediately resolved.
                            25-state FSAs         45-state FSAs
                           sec       spd         sec       spd
 1  Inc_prod      p     .000157   .00497      .000492   .00255
 2  Total_I       v     .023798   .95663      .206406   .97430
 3  Inc_I         b     .023955   .07023      .206898   .51133
 4  Inc_prod-NI   p     .000206   .00652      .000617   .00320
 5  Inc_I-NI      v     .000169   .00680      .000528   .00320
 6  Inc_I-NI      b     .000375   .00110      .001762   .00435
 7  Total_prod    p     .031594   1.0         .192774   1.0
 8  Total_I       v     .024877   1.0         .211851   1.0
 9  Total_I       b     .340817   1.0         .404625   1.0
10  Inc_prod      p     .000493   .00507      .001521   .00259
11  Total_AT      v     .021103   .98903      .177665   .96869
12  Inc_AT        b     .024798   .20022      .180707   .23441
13  Inc_prod-NI   p     .000574   .00590      .001786   .00304
14  Inc_AT-NI     v     .009011   .37450      .090824   .49520
15  Inc_AT-NI     b     .009585   .07900      .092824   .12013
16  Total_prod    p     .097262   1.0         .587496   1.0
17  Total_AT      v     .024062   1.0         .183409   1.0
18  Total_AT      b     .121324   1.0         .770905   1.0

Table 2: Average performance over 30 runs (5 properties, 6 runs each) with operator o_change and dense FSAs. Rows 1 through 9 are for reverification of Invariance properties and rows 10 through 18 are for AT reverification of Response properties.
                             25-state FSAs                   45-state FSAs
                         sec        spd      err         sec          spd      err
 1  Inc_gen-I    v+b  .000001    4.25e-5    .20       .000002      9.75e-6    .07
 2  Inc_I-NI     v+b  .000002    8.51e-5    .20       .000003      1.46e-5    .07
 3  Total_I      v+b  .023500    1.0        .20       .205082      1.0        .07
 4  Inc_gen-R    v+b  .000007    7.23e-8    .73       .000006      2.09e-9    .73
 5  Inc_prod-NI  p    .000006    5.22e-5    N/A       .000006      8.51e-6    N/A
 6  Inc_AT-NI    v    94.660700  .98099     3569.33   2423.550000  .84442     12553.40
 7  Inc_AT-NI    b    94.660706  .97982     N/A       2423.550006  .84421     N/A
 8  Total_prod   p    .114825    1.0        N/A       .704934      1.0        N/A
 9  Total_AT     v    96.495400  1.0        3569.33   2870.080000  1.0        12553.40
10  Total_AT     b    96.610225  1.0        N/A       2870.784934  1.0        N/A

Table 3: Average performance over 30 runs (5 properties, 6 runs each) with operator o_gen and dense FSAs. Rows 1 through 3 are for reverification of Invariance properties and rows 4 through 10 are for reverification of Response properties.
                       25-state FSAs   45-state FSAs
1  Inc_prod      p    .000157         .000492
2  Inc_prod-NI   p    .000206         .000617
3  Total_prod    p    .031594         .192774
4  Total_I       v    .023798         .206406
5  Inc_I-NI      v    .000169         .000528
6  Total_I       v    .024877         .211851
7  Inc_I         b    .023955         .206898
8  Inc_I-NI      b    .000375         .001762
9  Total_I       b    .340817         .404625

Table 4: Average cpu time (in seconds) over 30 runs with operator o_change and five Invariance properties. This table is a duplication of some of the material in Table 2.
                       25-state FSAs   45-state FSAs
1  Inc_prod      p    .000493         .001521
2  Inc_prod-NI   p    .000574         .001786
3  Total_prod    p    .097262         .587496
4  Total_AT      v    .021103         .177665
5  Inc_AT-NI     v    .009011         .090824
6  Total_AT      v    .024062         .183409
7  Inc_AT        b    .024798         .180707
8  Inc_AT-NI     b    .009585         .092824
9  Total_AT      b    .121324         .770905

Table 5: Average cpu time (in seconds) over 30 runs with operator o_change and five Response properties. This table is a duplication of some of the material in Table 2.
                       25-state FSAs   45-state FSAs
1  Inc_gen-R     p    0               0
2  Inc_prod      p    .000006         .000006
3  Total_prod    p    .114825         .704934
4  Inc_gen-R     v    .000007         .000006
5  Inc_AT        v    94.660700       2423.550000
6  Total_AT      v    96.495400       2870.080000
7  Inc_gen-R     b    .000007         .000006
8  Inc_AT        b    94.660706       2423.550006
9  Total_AT      b    96.610225       2870.784934

Table 6: Average cpu time (in seconds) over 30 runs with operator o_gen and five Response properties. This table is a duplication of some of the material in Table 3.
rows 10 through 18 are algorithms for AT verification. In Table 3, rows 1 through 3 are algorithms for Invariance properties, and rows 5 through 10 are algorithms for AT verification. Compare row 2 with 7, and 3 with 10. (Rows 1 and 4 cannot be compared because row 4 has an algorithm tailored for Response properties.) Note that these comparisons are between a "v+b" and a "b." Since "v+b" means "v" or "b," this is a correct comparison. These results show that H1 is mostly, but not completely, confirmed. It is confirmed for all results in Table 3. On the other hand, the results are mixed for Table 2.
H2: The easiest way to compare is to examine Tables 4, 5, and 6. In these cases the comparison is between the first two rows labeled "p" (or "v" or "b") versus the third row of that same label. The reason for making these comparisons is that the first two rows of a given label correspond to an incremental algorithm (except for row 4 of Tables 4 and 5) and the third row of a given label corresponds to a total algorithm. Alternatively, one could examine Tables 2 and 3. In Table 2, rows 1 through 6 (other than 2) and 10 through 15 (other than 11) are incremental algorithms, and rows 2, 11, 7 through 9, and 16 through 18 are total reverification algorithms. The appropriate comparisons are between rows 1 and 7, 4 and 7, 5 and 8, 3 and 9, 6 and 9, 10 and 16, 13 and 16, 14 and 17, 12 and 18, and 15 and 18. In Table 3, rows 1, 2, and 4 through 7 are incremental algorithms, and rows 3 and 8 through 10 are total. The appropriate comparisons are between rows 1 and 3, 2 and 3, 4 and 10, 5 and 8, 6 and 9, and 7 and 10. All results confirm H2. The statistical significance of the comparisons in Tables 2 and 3 was tested. Using an exact Wilcoxon rank-sum test, all comparisons relevant to hypothesis H2 in Table 2 are statistically significant (p < 0.01 and, in most cases, p < 0.0001). In Table 3, however, the differences between Inc_AT-NI and Total_AT (both the (v) and (b) versions) are not statistically significant at the p < 0.01 level. All other comparisons in Table 3 are significant at the p < 0.01 level.
H3: This hypothesis does not apply to the algorithms for re-forming the product FSA because, obviously, it will require more time to get the new initial states for the "NI" versions. We wish to test the overall time savings of the "NI" versions, so we concentrate on the rows labeled "b." The relevant comparisons are row 7 versus 8 in Table 4 and row 7 versus 8 in Table 5. (Alternatively, one could compare row 3 versus 6, and row 12 versus 15, in Table 2.) Each of these comparisons is between an "NI" version and a counterpart version of the algorithm that is the same as the "NI" version except that it does not find new initial states. Tables 3 and 6 are not relevant because they only have the "NI" versions but not their counterparts. (We only saw the need to make one comparison between all "NI" versions and their counterparts, which is reflected in Table 2.) All results confirm hypothesis H3, and they are statistically significant (p < 0.01).
H4: Evaluating H4 requires considering Table 3 but not Table 2, because we only need to compare algorithms for which o_gen has been applied. Compare row 1 versus 2, 1 versus 3, 4 versus 7, and 4 versus 10 to see the results. All results show Inc_gen-I (row 1) and Inc_gen-R (row 4) to be at least as fast as the other algorithms. Therefore H4 is confirmed. In all cases other than Inc_gen-I (row 1) versus Inc_I-NI (row 2), there is a noticeable speedup. In most cases, the speedup is quite dramatic. All noticeable speedups are statistically significant (p < 0.0001).
H5: To test H5, compare the first "spd" column (for 25-state FSAs) with the second column with this label (for 45-state FSAs). A more desirable scaleup shows a lower value for "spd" as the size of the FSA increases. It implies that the ratio of the cpu time of the incremental algorithm to the cpu time of the total algorithm decreases more (or increases less) as the FSA size increases. One should make this two-column comparison for rows 1 through 6 (but not 2) and 10 through 15 (but not 11) of Table 2, and rows 1 and 2, and 4 through 7, of Table 3 because these are all the incremental algorithms. (We don't care about the total algorithms because "spd" is, by definition, always 1.0 for them.)^17 If one considers the results of algorithms appearing in both tables (e.g., Inc_I-NI shows different scaleup properties in the two tables, but we need to consider both sets of results), then clearly Inc_gen-I (row 1) and Inc_gen-R (row 4) in Table 3 show the best scaleup of all the incremental algorithms. H5 is confirmed. It is apparent from the "sec" columns that the time complexity of these two algorithms does not increase (other than minor fluctuations) as FSA size increases (see Table 3).
A couple of subsidiary issues are now addressed. For one, recall that Inc_AT-NI and Inc_gen-R are not complete. Therefore, it is relevant to consider the percentage of incorrect predictions (i.e., false errors) they made. Inc_AT-NI made none. For the results in Table 3, 33% of Inc_gen-R's predictions were wrong (i.e., false errors) for the size 25 FSAs, and 50% were wrong for the size 45 FSAs.
Finally, consider the maximum observed speedup. Inc_gen-R shows a half-billion-fold speedup over Total_AT on size 45 FSA problems (averaged over 30 runs)! This alleviates much of the concern about Inc_gen-R's false error rate. For example, given the rapid reverification time of Inc_gen-R, an agent could use it to reverify a long sequence of learning operators culminating in one that satisfies the property in considerably less time than it takes Total_AT to reverify one learning operator.
We conclude this section by summarizing, in Table 7, the fastest algorithm (based on our results) for every operator, situation, and property type. In Table 7, it is assumed that a First-Response FSA is used for AT verification of Response properties. Operator o_add-action is omitted from this table because it is not clear at this time whether it would be faster to apply total reverification or perform multiple applications of the incremental algorithm (one for each primitive operator application). Section 8 considers an alternative solution as future work. In Table 7, "None" means no reverification is required, i.e., the learning operator is a priori guaranteed to be an SML for this situation and property class.
17. If "spd" ≠ 1.0 for a total algorithm, this is due to statistical variation in run time.

7. Related Work

There has been a great deal of recent research on model checking, and even on model checking of distributed systems (Holzmann, 1991). Nevertheless, there is very little in the literature about model checking applied to systems that change. Two notable exceptions are the research of Sokolsky and Smolka (1994) on incremental reverification and that of
                  SIT_1agent∨1plan        SIT_1agent∨1plan   SIT_multplans     SIT_multplans
                  and Invariance          and Response       and Invariance    and Response
o_change          Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_delete          None                    None               None              None
o_spec            None                    None               None              None
o_add             Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_gen             Inc_gen-I or Inc_I-NI   Inc_gen-R          Inc_I-NI          Inc_AT-NI
o_delete∨spec     None                    None               None              None
o_delete-action   None                    None               None              None
o_add∨gen         Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_move            Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_delete+add      Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_spec+add        Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_delete+gen      None                    Inc_gen-R          Inc_I-NI          Inc_AT-NI
o_spec+gen        None                    Inc_gen-R          Inc_I-NI          Inc_AT-NI
o_stay            None                    Inc_AT-NI          Inc_I-NI          Inc_AT-NI

Table 7: Learning operators with the fastest reverification method.
Sekar et al. (1994). Both of these papers are about reverification of software after user edits rather than adaptive agents. Nevertheless the work is related. Sokolsky and Smolka use the modal μ-calculus to express Invariance and Liveness properties. They present an incremental version of a model checker that does block-by-block global computations of fixed points, rather than AT or property-specific model checking as we do. The learning operators assumed by their algorithm are edge deletions/additions on a representation similar to FSAs called LTS (but unlike our multiagent work, they assume a single LTS). The worst-case time complexity of their algorithm is the same as that of total reverification, although their empirical results are good. Note that we have a priori results for edge deletion. However, we do not have an incremental algorithm specifically tailored for edge addition (for multiple agents and AT or property-specific model checking); thus this may be a fruitful direction for future research. Sekar et al.'s approach consists of converting rule sets to FSAs, then generating and testing functions that map from the post- to the prelearning FSA and property. If the desired function can be found, they apply a theorem from Kurshan (1994), which guarantees that the learning is "safe." Although no complexity results are provided, the generate-and-test approach that they describe appears to be computationally expensive. In contrast to Sekar et al., we have proofs and empirical evidence that our methods are efficient and, in some cases, substantially more efficient than total reverification from scratch.
There is also related research in the field of classical planning. In particular, Weld and Etzioni (1994) have a method to incrementally test an agent's plan to decide whether to add new actions to the plan. Actions are added only when their effects do not violate a certain type of Invariance property. Their method has some similarities with our Inc_gen-I algorithm. One difference is that our method is for reactive rather than projective plans. Another is that our verification method is expressed using the formal foundations in the model checking literature.
As mentioned in the introduction of this paper, FSAs have been shown to be effective representations of reactive agent plans/strategies (Burkhard, 1993; Kabanza, 1995; Carmel & Markovitch, 1996; Fogel, 1996). FSA plans have been used both for multiagent competition and coordination. For example, Fogel's (1996) co-evolving FSA agents for competitive game playing were mentioned above. A similarity with our work is that Fogel assumes agents' plans are expressed as ω-automata. Nevertheless, Fogel never discusses verification of these plans. Goldman and Rosenschein (1994) present a method for multiagent coordination that assumes FSA plans. Multiple agents cooperate by taking actions to favorably alter their environment. The cooperation strategy is implemented by a plan developer who manually edits the FSAs. The relationship to the work here is that they present FSA transformations that ensure multiagent coordination. Likewise, in our research, a learning operator that is a priori guaranteed "safe" for some multiagent coordination property transforms the FSA while ensuring coordination. Although both their method and ours guarantee this coordination, their solution is manual whereas ours is entirely automated.
Some of the more recent research on agent coordination applies formal verification methods. For example, Lee and Durfee (1997) model their agents' semantics with a formalism similar to Petri nets (rather than FSAs). They verify synchronization (Invariance) properties, which prevent deadlock, using model checking. Furthermore, Lee and Durfee suggest recovery from failed verification using two methods: concept learning, and a method analogous to that used by Ramadge and Wonham (1989). Burkhard (1993) and Kabanza (1995) assume agent plans are represented as ω-automata, and they address issues of model checking temporal logic properties of the joint (multiagent) plans. Thus there is a growing precedent for addressing multiagent coordination by expressing plans as ω-automata and verifying them with model checking. Our work builds on this precedent, and also extends it, because none of this previous research addresses efficient reverification for agents that learn.
Finally, there are alternative methods for constraining the behavior of agents, which
are complementary to reverification and self-repair. For example, Shoham and Tennenholtz
(1995) design agents that obey social laws, e.g., safety conventions, by restricting the agents'
actions. Nevertheless, the plan designer may not be able to anticipate and engineer all laws
into the agents beforehand, especially if the agents have to adapt. One solution is to use
laws that allow maximum flexibility (Fitoussi & Tennenholtz, 1998). However, this solution
does not allow for certain changes in the plan, such as the addition or deletion of actions.
An appealing alternative would be to couple initial engineering of social laws with efficient
reverification after learning.
A method for ensuring physically bounded behavior of agents is "artificial physics"
(Spears & Gordon, 1999). With artificial physics, multiagent behavior is restricted by
artificial forces between the agents. Nevertheless, when encountering severe unanticipated
circumstances, artificial physics needs to be complemented with reverification and "steering"
for self-repair (Gordon et al., 1999).
8. Summary and Future Work
Agent technology is growing rapidly in popularity. To handle real-world domains and
interactions with people, agents must be adaptable, predictable, and rapidly responsive. An
approach to resolving these potentially conflicting requirements is presented here. In summary,
we have shown that certain machine learning operators are a priori (with no run-time
reverification) safe to perform. In other words, when certain desirable properties hold prior
to learning, they are guaranteed to hold post-learning. The property classes considered here
are Invariance and Response. Learning operators o_delete, o_spec, o_delete_spec, and o_delete-action
were found to preserve properties in either of these classes. For SIT_1agent and SIT_1plan,
where there is a single (multi)agent FSA plan, o_delete+gen, o_spec+gen, and o_stay were found to
preserve Invariance properties. All of the a priori results are independent of the size of the
FSA and are therefore applicable to any FSA that has been model checked originally.
We then discussed transformations of learning operators and their corresponding a priori
results to a product plan. This addresses SIT_multplans, where multiple agents each have their
own plan but the multiagent plan must be re-formed and reverified to determine whether
multiagent properties are preserved. It was discovered that only o_delete, o_spec, o_delete_spec,
and o_delete-action preserve their a priori results for this situation.
Finally, we presented novel incremental reverification algorithms for all cases in which the
a priori results are negative. It was shown in both theoretical and empirical comparisons that
these algorithms can substantially improve the time complexity of reverification over total
reverification from scratch. Empirical results showed as much as a half-billion-fold speedup.
These are initial results, but continued research along these lines will likely be applicable to
a wide range of important problems, including a variety of agent domains as well as more
general software applications.
When learning is required, we suggest that the a priori results be consulted
first. If no positive result exists (i.e., the learning operator is not known to be an SML),
then incremental reverification proceeds.
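This decision procedure can be sketched as a small dispatch table. The operator and situation names below mirror those used in this paper, but the table itself and the function name are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the "consult the a priori results first" policy.
# SAFE encodes (operator, property class) pairs that are a priori
# guaranteed safe (i.e., the operator is an SML for that class);
# the entries shown reflect the summary above, but the table is a sketch.

SAFE = {
    ("o_delete", "Invariance"), ("o_delete", "Response"),
    ("o_spec", "Invariance"), ("o_spec", "Response"),
    # Safe for Invariance only, and only in SIT_1agent / SIT_1plan:
    ("o_delete+gen", "Invariance"), ("o_spec+gen", "Invariance"),
    ("o_stay", "Invariance"),
}

def needs_reverification(operator: str, property_class: str) -> bool:
    """True if no a priori safety result applies, so incremental
    reverification must be run after applying the operator."""
    return (operator, property_class) not in SAFE

assert not needs_reverification("o_delete", "Invariance")
assert needs_reverification("o_add-action", "Response")
```

In this sketch, an adaptive agent would call `needs_reverification` once per learning step and skip the model-checking pass entirely whenever the table answers for it.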
To test our overall framework, we have implemented the rovers example of this paper
as co-evolving agents assuming SIT_multplans, i.e., multiple agents each with its own plan.
By using the a priori results and incremental algorithms, we achieved significant speedups.
We have also developed a more sophisticated application that uses reverification during
evolution: two agents compete in a board game, and one of the agents evolves its strategy to
improve it. Although the types of FSAs and learning operators in this application differ
slightly from those presented in this paper, and the property is quite different (a check
for a certain type of cyclic behavior on the board), initial experience suggests that the
methodology and basic results here could be extended to a variety of multiagent applications.
Future work will focus primarily on extending the a priori results to other learning
operators/methods and property classes, developing other incremental reverification algorithms,
and exploring plan repair to recover from reverification failures. One way in which
the a priori results might be extended is by discovering when learning operators will make
a property true, even if it was not true before learning.
A question that was not addressed here is whether the incremental methods are useful if
multiple machine learning operators are applied in batch (e.g., as one might wish to do with
operator o_add-action). In the future we would like to explore how best to handle this situation:
is it more efficient to treat the operators as having been applied one at a time and use
incremental reverification for each? Or is total reverification from scratch preferable? Or,
better yet, can we develop efficient incremental algorithms for sets of learning operators?
Plan repair was not discussed in this paper and is an important future direction. The
research of De Raedt and Bruynooghe (1994), which uses counterexamples to guide the
revision of theories subject to integrity constraints, may provide some ideas. There are
also plan repair methods in the classical planning literature that might be relevant to our
approach (Joslin & Pollack, 1994; Weld & Etzioni, 1994). It would be interesting to compare
the time to repair plans versus trying another learning operator and reverifying.
A limitation of our approach is that it does not handle stochastic plans or properties
with time limits, e.g., a Response property for which the response must occur within a
specified time after the trigger. We would like to extend this research to stochastic FSAs
(Tzeng, 1992) and timed FSAs/properties (Alur & Dill, 1994; Kabanza, 1995), as well
as other common agent representations besides FSAs. Another direction for future work
would be to extend our results to symbolic model checking, which uses binary decision
diagrams (BDDs) so that the full state space need not be explicitly explored during model
checking (Burch et al., 1994). In some cases, symbolic model checking can produce dramatic
speedup. However, none of the current research on symbolic model checking addresses
adaptive systems.
Additionally, the ideas here are applicable to some of the FSA-based control theory
work. For example, Ramadge and Wonham (1989) assume FSA representations for both the
plant (which is assumed to be a discrete-event system) and the supervisor (which controls
the actions of the plant). We are currently applying some of the principles of efficient
reverification to change the supervisor in response to changes in the plant in a manner that
preserves properties (Gordon & Kiriakidis, 2000).
Finally, future work should focus on studying how to operationalize Asimov's Laws for
intelligent agents. What sorts of properties best express these laws? Weld and Etzioni
(1994) provide some initial suggestions, but much more remains to be done.
Acknowledgments
This research is supported by the Office of Naval Research (N0001499WR20010) in conjunction
with the "Semantic Consistency" MURI. I am grateful to William Spears, Joseph
Gordon, Stan Sadin, Chitoor Srinivasan, Ramesh Bharadwaj, Dan Hoey, and the anonymous
reviewers for useful suggestions and advice. The presentation of the material in this
paper was enormously improved thanks to William Spears' suggestions.
Appendix A. Glossary of Notation
⊨    Models (satisfies)
model checking    A verification method entailing brute-force search
AT    Automata-theoretic model checking
SIT_1agent    Single-agent situation
SIT_1plan    Multiagent situation where each agent uses a multiagent plan
SIT_multplans    Multiagent situation where each agent uses an individual plan
FSA    Finite-state automaton
V(S)    The set of states (vertices) of FSA S
E(S)    The set of state-to-state transitions (edges) of FSA S
transition condition    Logical description of the set of actions enabling a transition
K    A Boolean algebra
≤    Boolean algebra partial order; x ≤ y iff x ∧ y = x
M_K(S)    The matrix of transition conditions of FSA S
M_K(v_i, v_j)    Transition condition associated with edge (v_i, v_j)
I(S)    The set of initial states of FSA S
atoms    Primitive elements of a Boolean algebra; atoms are actions
string    Sequence of actions (atoms)
L(S)    The language of (set of strings accepted by) FSA S
ω-automaton    An FSA that accepts infinite-length strings
run    The sequence of FSA vertices visited by a string
accepting run    The run of a string in the FSA language
acceptance criterion    A requirement of accepting runs of an FSA
⊗    The tensor (synchronous) product of FSAs
complete FSA    Specifies a transition for every possible action
deterministic FSA    The choice of action uniquely determines the next state
path    Sequence of vertices connected by edges
cycle    A path with start and end vertices identical
c-state    Computational state; an action occurring in a computation
accessible from    There exists a path from
□    Temporal logic "invariant"
◇    Temporal logic "eventually"
Invariance property    □¬p, i.e., "Invariant not p"
Response property    □(p → ◇q), i.e., "Every p is eventually followed by q"
First-Response property    The first p (trigger) is followed by a q (response)
B(S)    The set of "bad" (to be avoided) states of FSA S
↑    Can increase accessibility
↑̸    Cannot increase accessibility
↓    Can decrease accessibility
↓̸    Cannot decrease accessibility
SML    Safe machine learning operator, i.e., one that preserves properties
sound algorithm    One that is correct when it states that S ⊨ P
complete algorithm    One that is correct when it states that S ⊭ P
δ    The FSA transition function
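As an illustration of the tensor (synchronous) product entry in the glossary above, the product of two FSAs over a shared action alphabet can be sketched as follows. The dict-based transition-map representation is an assumption chosen for readability, not the matrix form M_K(S) used in the paper.

```python
from itertools import product

def tensor_product(fsa1, fsa2):
    """Synchronous product: both FSAs take the same action each step.
    Each FSA is a triple (states, initial_states, delta) where
    delta[(state, action)] -> next_state (deterministic and complete)."""
    states1, init1, d1 = fsa1
    states2, init2, d2 = fsa2
    states = set(product(states1, states2))          # pairs of states
    init = set(product(init1, init2))                # pairs of initial states
    actions = {a for (_, a) in d1}                   # shared action alphabet
    # Both components move on the same action simultaneously.
    delta = {((s1, s2), a): (d1[(s1, a)], d2[(s2, a)])
             for (s1, s2) in states for a in actions}
    return states, init, delta

# Two tiny deterministic, complete FSAs over actions {"a", "b"}.
A = ({0, 1}, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0})
B = ({"x", "y"}, {"x"}, {("x", "a"): "x", ("x", "b"): "y",
                         ("y", "a"): "x", ("y", "b"): "y"})
S, I, D = tensor_product(A, B)
assert D[((0, "x"), "a")] == (1, "x")
```

Because both inputs are deterministic and complete, so is the product; this is the construction implicit in re-forming a multiagent plan for SIT_multplans before reverification.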
Appendix B. Temporal logic properties
This appendix, which is based on Manna and Pnueli (1991), formally defines Invariance and
Response properties in temporal logic. We begin by defining the basic temporal operator U
(Until). We assume a string (x_0, ...) of c-states of FSA S, where 0 ≤ i, j, k. Then for c-state
formulae p and q, we define Until as x_j ⊨ p U q ⇔ for some k ≥ j, x_k ⊨ q, and for every i
such that j ≤ i < k, x_i ⊨ p.
Invariance properties are defined in terms of Eventually properties, so we define Eventually
first. For c-state formula p and FSA S, we define property P = ◇p ("Eventually p")
as a property that is true (false) for a string if it is true (false) at the initial c-state x_0 of
the string. Formally, if x = (x_0, ...) is a string of FSA S, then x ⊨ ◇p ⇔ x_0 ⊨ true U p,
i.e., "eventually p." A property P = □¬p ("Invariant not p") is defined as x ⊨ □¬p ⇔
x ⊨ ¬◇p, i.e., "never p." Finally, a Response formula is of the form □(p → ◇q), where p
is called the "trigger" and q the "response." A Response formula states that every trigger
is eventually followed by a response.
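The definitions above can be made concrete with a small evaluator. Note one caveat: the paper's semantics are over infinite strings accepted by ω-automata, so over a finite prefix these checks are only approximations (a pending "eventually" cannot be refuted by a prefix). The function names and trace are illustrative, not from the paper.

```python
# Finite-prefix evaluation of the temporal definitions above.

def until(p, q, xs, j=0):
    """x_j |= p U q: some k >= j with q(x_k), and p holds at x_j..x_{k-1}."""
    for k in range(j, len(xs)):
        if q(xs[k]):
            return True
        if not p(xs[k]):
            return False
    return False  # no witness for q within this finite prefix

def eventually(p, xs):
    """<>p  ==  true U p."""
    return until(lambda _: True, p, xs)

def invariant_not(p, xs):
    """[]!p  ==  !<>p."""
    return not eventually(p, xs)

def response(p, q, xs):
    """[](p -> <>q): every trigger position is followed by a response."""
    return all(not p(xs[j]) or until(lambda _: True, q, xs, j)
               for j in range(len(xs)))

trace = ["idle", "trigger", "idle", "respond"]
assert response(lambda x: x == "trigger", lambda x: x == "respond", trace)
assert invariant_not(lambda x: x == "crash", trace)
```

Checking a property over every accepting run of an FSA, rather than one trace, is exactly what the model-checking algorithms in the body of the paper do.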
Appendix C. Properties for Experiments
The following five Invariance properties were used in the test suite:
□ ¬(I-deliver ∧ L-transmit)
□ ¬(I-deliver ∧ L-pause)
□ ¬(F-collect ∧ I-deliver)
□ ¬(F-collect ∧ I-deliver ∧ L-receive)
□ ¬(F-deliver ∧ I-receive ∧ L-pause)
The following five Response properties were used in the test suite:
□ (F-deliver → ◇ L-receive)
□ (F-deliver → ◇ I-receive)
□ (F-collect → ◇ L-transmit)
□ ((F-collect ∧ I-deliver) → ◇ L-receive)
□ (F-deliver → ◇ (I-receive ∧ L-receive))
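As a concrete reading of the Invariance formulae, a c-state can be modeled as the set of actions the agents (F, I, L) take simultaneously. The sample trace below is a fabricated illustration for exposition, not data from the paper's experiments.

```python
# A c-state here is the set of actions the three agents (F, I, L)
# take at one step. The first Invariance property above forbids
# I-deliver and L-transmit from ever co-occurring.

def violates_invariance(bad_actions, trace):
    """True if some c-state contains all of bad_actions, i.e.
    [] !(a1 /\ a2 /\ ...) fails on this trace."""
    return any(bad_actions <= state for state in trace)

trace = [
    {"F-collect", "I-receive", "L-pause"},
    {"F-deliver", "I-deliver", "L-receive"},
    {"F-collect", "I-receive", "L-transmit"},
]
# [] !(I-deliver /\ L-transmit) holds on this trace:
assert not violates_invariance({"I-deliver", "L-transmit"}, trace)
# [] !(F-collect /\ I-deliver) also holds here:
assert not violates_invariance({"F-collect", "I-deliver"}, trace)
```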
References
Alur, R., & Dill, D. (1994). A theory of timed automata. Theoretical Computer Science,
126, 183-235.
Asimov, I. (1950). I, Robot. Greenwich, CT: Fawcett Publications, Inc.
Bavel, Z. (1983). Introduction to the Theory of Automata. Reston, VA: Prentice-Hall.
Büchi, J. (1962). On a decision method in restricted second-order arithmetic. In Methodology
and Philosophy of Science, Proceedings of the Stanford International Congress, pp. 1-11.
Stanford, CA: Stanford University Press.
Burch, J., Clarke, E., Long, D., McMillan, K., & Dill, D. (1994). Symbolic model checking
for sequential circuit verification. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, 13(4), 401-424.
Burkhard, H. (1993). Liveness and fairness properties in multi-agent systems. In Proceedings
of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI),
pp. 325-330. Chambery, France.
Carmel, D., & Markovitch, S. (1996). Learning models of intelligent agents. In Proceedings
of the Thirteenth National Conference on Artificial Intelligence (AAAI), pp. 62-67.
Portland, OR.
Clarke, E., & Wing, J. (1997). Formal methods: State of the art and future directions.
ACM Computing Surveys, 28(4), 626-643.
Courcoubetis, C., Vardi, M., Wolper, P., & Yannakakis, M. (1992). Memory-efficient
algorithms for the verification of temporal properties. Formal Methods in System Design,
1, 257-288.
De Raedt, L., & Bruynooghe, M. (1994). Interactive theory revision. In Michalski, R.,
& Tecuci, G. (Eds.), Machine Learning IV, pp. 239-264. San Mateo, CA: Morgan
Kaufmann.
Dean, T., & Wellman, M. (1991). Planning and Control. San Mateo, CA: Morgan
Kaufmann.
Elseaidy, W., Cleaveland, R., & Baugh, J. (1994). Verifying an intelligent structure control
system: A case study. In Proceedings of the Real-Time Systems Symposium, pp.
271-275. San Juan, Puerto Rico.
Fitoussi, D., & Tennenholtz, M. (1998). Minimal social laws. In Proceedings of the Fifteenth
National Conference on Artificial Intelligence, pp. 26-31. Madison, WI.
Fogel, D. (1996). On the relationship between duration of an encounter and the evolution
of cooperation in the iterated prisoner's dilemma. Evolutionary Computation, 3(3),
349-363.
Goldman, S., & Rosenschein, J. (1994). Emergent coordination through the use of
cooperative state-changing rules. In Proceedings of the Twelfth National Conference on
Artificial Intelligence, pp. 408-413. Seattle, WA.
Gordon, D. (1998). Well-behaved borgs, bolos, and berserkers. In Proceedings of the
Fifteenth International Conference on Machine Learning (ICML), pp. 224-232. Madison,
WI.
Gordon, D. (1999). Re-verification of adaptive agents' plans. Tech. rep., Navy Center for
Applied Research in Artificial Intelligence.
Gordon, D., & Kiriakidis, K. (2000). Adaptive supervisory control of interconnected discrete
event systems. In Proceedings of the International Conference on Control Applications
(ICCA), pp. 50-56. Anchorage, AK.
Gordon, D., Spears, W., Sokolsky, O., & Lee, I. (1999). Distributed spatial control, global
monitoring and steering of mobile physical agents. In Proceedings of the IEEE International
Conference on Information, Intelligence, and Systems (ICIIS), pp. 681-688.
Washington, D.C.
Grefenstette, J., & Ramsey, C. (1992). An approach to anytime learning. In Proceedings
of the Ninth International Workshop on Machine Learning, pp. 189-195. Aberdeen,
Scotland.
Heitmeyer, C., Kirby, J., Labaw, B., Archer, M., & Bharadwaj, R. (1998). Using abstraction
and model checking to detect safety violations in requirements specifications. IEEE
Transactions on Software Engineering, 24(11), 927-948.
Holzmann, G., Peled, D., & Yannakakis, M. (1996). On nested depth-first search. In
Proceedings of the Second Spin Workshop, pp. 81-89. Rutgers, NJ.
Holzmann, G. J. (1991). Design and Validation of Computer Protocols. NJ: Prentice-Hall.
Joslin, D., & Pollack, M. (1994). Least-cost flaw repair: A plan refinement strategy for
partial-order planning. In Proceedings of the Twelfth National Conference on
Artificial Intelligence, pp. 1004-1009. Seattle, WA.
Kabanza, F. (1995). Synchronizing multiagent plans using temporal logic specifications. In
Proceedings of the First International Conference on Multiagent Systems (ICMAS),
pp. 217-224. San Francisco, CA.
Kurshan, R. (1994). Computer Aided Verification of Coordinating Processes. Princeton,
NJ: Princeton University Press.
Lee, J., & Durfee, E. (1997). On explicit plan languages for coordinating multiagent plan
execution. In Proceedings of the Fourth International Workshop on Agent Theories,
Architectures, and Languages (ATAL), pp. 113-126. Providence, RI.
Manna, Z., & Pnueli, A. (1991). Completing the temporal picture. Theoretical Computer
Science, 83(1), 97-130.
Michalski, R. (1983). A theory and methodology of inductive learning. In Michalski, R.,
Carbonell, J., & Mitchell, T. (Eds.), Machine Learning I, pp. 83-134. Palo Alto, CA:
Tioga.
Mitchell, T. (1978). Version Spaces: An Approach to Concept Learning. Ph.D. thesis,
Stanford University.
Nilsson, N. (1980). Principles of Arti�cial Intelligence. Palo Alto, CA: Tioga.
Potter, M. (1997). The Design and Analysis of a Computational Model of Cooperative
Coevolution. Ph.D. thesis, George Mason University.
Ramadge, P., & Wonham, W. (1989). The control of discrete event systems. Proceedings of
the IEEE, 77(1), 81-98.
Sekar, R., Lin, Y.-J., & Ramakrishnan, C. (1994). Modeling techniques for evolving
distributed applications. In Proceedings of Formal Description Techniques (FORTE),
pp. 22-29. Berne, Switzerland.
Shoham, Y., & Tennenholtz, M. (1995). On social laws for artificial agent societies: Off-line
design. Artificial Intelligence, 73(1-2), 231-252.
Sikorski, R. (1969). Boolean Algebras. New York, NY: Springer-Verlag.
Sokolsky, O., & Smolka, S. (1994). Incremental model checking in the modal mu-calculus.
In Proceedings of Computer-Aided Verification (CAV), pp. 351-363. Stanford, CA.
Spears, W., & Gordon, D. (1999). Using artificial physics to control agents. In Proceedings
of the IEEE International Conference on Information, Intelligence, and Systems, pp.
281-288. Washington, D.C.
Tennenholtz, M., & Moses, Y. (1989). On cooperation in a multi-entity model. In
Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pp.
918-923.
Tzeng, W. (1992). Learning probabilistic automata and Markov chains via queries. Machine
Learning, 8, 151-166.
Vardi, M., & Wolper, P. (1986). An automata-theoretic approach to automatic program
verification. In Proceedings of the First Annual Symposium on Logic in Computer
Science (LICS), pp. 332-345. Cambridge, MA.
Weld, D., & Etzioni, O. (1994). The first law of robotics. In Proceedings of the Twelfth
National Conference on Artificial Intelligence, pp. 1042-1047. Seattle, WA.