Journal of Artificial Intelligence Research 13 (2000) 95-153 Submitted 1/00; published 9/00
Asimovian Adaptive Agents
Diana F. Gordon [email protected]
Navy Center for Applied Research in Artificial Intelligence
Naval Research Laboratory, Code 5515
Washington, D.C. 20375-5337 USA
Abstract
The goal of this research is to develop agents that are adaptive and predictable and
timely. At first blush, these three requirements seem contradictory. For example, adaptation
risks introducing undesirable side effects, thereby making agents' behavior less predictable.
Furthermore, although formal verification can assist in ensuring behavioral predictability,
it is known to be time-consuming.
Our solution to the challenge of satisfying all three requirements is the following. Agents
have finite-state automaton plans, which are adapted online via evolutionary learning (perturbation)
operators. To ensure that critical behavioral constraints are always satisfied,
agents' plans are first formally verified. They are then reverified after every adaptation.
If reverification concludes that constraints are violated, the plans are repaired. The main
objective of this paper is to improve the efficiency of reverification after learning, so that
agents have a sufficiently rapid response time. We present two solutions: positive results
that certain learning operators are a priori guaranteed to preserve useful classes of
behavioral assurance constraints (which implies that no reverification is needed for these
operators), and efficient incremental reverification algorithms for those learning operators
that have negative a priori results.
1. Introduction
Agents are becoming increasingly prevalent and effective. Robots and softbots, working
individually or in concert, can relieve people of a great deal of labor-intensive tedium in their
jobs as well as in their day-to-day lives. Designers can furnish agents with plans to perform
desired tasks. Nevertheless, a designer cannot possibly foresee all circumstances that will
be encountered by the agent. Therefore, in addition to supplying an agent with plans, it
is essential to also enable the agent to learn and modify its plans to adapt to unforeseen
circumstances. The introduction of learning, however, often makes the agent's behavior
significantly harder to predict.¹

The goal of this research is to verify the behavior of adaptive
agents. In particular, our objective is to develop efficient methods for determining whether
the behavior of learning agents remains within the bounds of prespecified constraints (called
"properties") after learning. This includes verifying that properties are preserved for single
adaptive agents as well as verifying that global properties are preserved for multiagent
systems in which one or more agents may adapt.
An example of a property is Asimov's First Law (Asimov, 1950). This law, which
has also been studied by Weld and Etzioni (1994), states that an agent may not harm a
1. Even adding a simple, elegant learning mechanism such as chunking in Soar can substantially reduce
system predictability (Soar project members, personal communication).
© 2000 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
human or allow a human to come to harm. The main contribution of Weld and Etzioni is
a "'call to arms': before we release autonomous agents into real-world environments, we
need some credible and computationally tractable means of making them obey Asimov's
First Law...how do we stop our artifacts from causing us harm in the process of obeying
our orders?" Of course, this law is too general for direct implementation and needs to be
operationalized into specific properties testable on a system, such as "Never delete a user's
file." This paper addresses Weld and Etzioni's call to arms in the context of adaptive agents.
To respond to the call to arms, we are working toward "Asimovian" adaptive agents, which
we define to be adaptive agents that can verify, in a reasonably efficient manner, whether
user-defined properties are preserved after adaptation.² Such agents will either constrain
their adaptation methods, or repair themselves in such a way as to preserve these properties.
The verification method assumed here, model checking, consists of building a finite
model of a system and checking whether the desired property holds in that model. In the
context of this paper, model checking determines whether S ⊨ P for plan S and property
P, i.e., whether plan S "models" (satisfies) property P. The output is either "yes" or "no"
and, if "no," one or more counterexamples are provided. Model checking has proven to be
very effective for safety-critical applications; e.g., a model checker uncovered a potentially
disastrous error in a system designed to make buildings more earthquake resistant. This
error would have unleashed a structural force that worsened earthquake vibrations, rather
than dampening them (Elseaidy et al., 1994).
Essentially, model checking is brute-force search through the set of all reachable states of
the plan to check whether the property holds. If the plan has a finite number of states, this
process terminates. Model checking global properties of a multiagent plan has time complexity
that is exponential in the number of agents.³ With a large number of agents, this could be
a serious problem. In fact, even model checking a single-agent plan with a huge number
of states can be computationally prohibitive. A great deal of research in the verification
community is currently focused on reduction techniques for handling very large state spaces
(Clarke & Wing, 1997). One of the largest systems model checked to date using these
reduction techniques had 10¹²⁰ states (Burch et al., 1994). Nevertheless, the applicability
of many of these reduction techniques is restricted, and few are completely automated.
Furthermore, none of them are tailored for efficient reverification after learning has altered
the system. Some methods in the literature are designed for software that changes. One
that emphasizes efficiency, as ours does, is Sokolsky and Smolka's (1994). However, none
of them, including Sokolsky and Smolka's method, are applicable to multiagent systems in
which a single agent could adapt, thereby altering the global behavior of the overall system.
In contrast, our approach addresses the timeliness of adaptive multiagent systems.
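The brute-force search just described can be sketched as a breadth-first traversal of the reachable joint states. The encoding below (a successor map over states, a predicate for the property) is an illustrative assumption, not the paper's implementation:

```python
from collections import deque

def check_invariant(initial_states, transitions, holds):
    """Explicit-state model check of an invariance property.

    transitions: dict mapping a state to an iterable of successor states.
    holds: predicate that is True iff the property is satisfied in a state.
    Returns (True, None) if every reachable state satisfies the property,
    otherwise (False, path) where path is a counterexample trace.
    """
    frontier = deque((s, [s]) for s in initial_states)
    visited = set(initial_states)
    while frontier:
        state, path = frontier.popleft()
        if not holds(state):
            return False, path          # counterexample for the user
        for nxt in transitions.get(state, ()):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return True, None                   # terminates: finitely many states

# Toy two-agent product plan: joint states are (F_state, I_state) pairs.
trans = {
    ("COLLECTING", "RECEIVING"): [("DELIVERING", "DELIVERING")],
    ("DELIVERING", "DELIVERING"): [("COLLECTING", "RECEIVING")],
}
ok, cex = check_invariant(
    [("COLLECTING", "RECEIVING")], trans,
    holds=lambda s: s != ("DELIVERING", "RECEIVING"))  # forbidden joint state
print(ok)  # True: the forbidden joint state is unreachable
```

The exponential blowup mentioned above appears here in the size of the joint state space, which is the Cartesian product of the individual agents' state sets.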
Consider how reverification fits into our overall adaptive agents framework. In this
framework (see Figure 1), there are one or more agents with "anytime" plans (Grefenstette
& Ramsey, 1992), i.e., plans that are continually executed in response to internal and
external environmental conditions. Each agent's plan is assumed to be in the form of a
finite-state automaton (FSA). FSAs have been shown to be effective representations of
2. They are also called APT agents because they are adaptive, predictable and timely.
3. The states in a multiagent plan are formed by taking the Cartesian product of states in the individual
agent plans (see Section 3).
OFFLINE:
(1) Develop initial agent plan(s)
(2) If SIT_1plan or SIT_multplans, form multiagent plan
(3) Verify (multi)agent plan
(4) Repair plan(s) if properties not satisfied

ONLINE:
(5) Learning modifies (multi)agent plan
(6) If SIT_multplans, re-form multiagent plan
(7) Rapidly reverify (multi)agent plan
(8) Choose another learning operator or repair plan(s) if properties not satisfied

Figure 1: Verifiable adaptive agents.
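The eight steps of Figure 1 can be paraphrased as a control loop. In this sketch the helper callables (form_product, verify, repair, learn) are hypothetical stand-ins supplied by the caller, and the toy run at the end exercises only the control flow, not real plans:

```python
def apt_loop(plans, form_product, verify, repair, learn, online_steps=3):
    """Sketch of the Figure 1 workflow for the multi-plan case.
    All four helpers are caller-supplied stand-ins, not the paper's code."""
    # OFFLINE: steps 2-4 (repair until the product plan verifies)
    product = form_product(plans)
    while not verify(product):
        plans = repair(plans)
        product = form_product(plans)
    # ONLINE: steps 5-8, repeated while the fielded agents execute
    history = []
    for _ in range(online_steps):
        plans = learn(plans)                 # step 5: adapt a plan
        product = form_product(plans)        # step 6: re-form the product
        if not verify(product):              # step 7: rapid reverification
            plans = repair(plans)            # step 8: repair on failure
            product = form_product(plans)
        history.append(product)
    return history

# Toy run: "plans" are numbers, the "product" is their sum,
# and the property is "the product is even".
runs = apt_loop([2, 4],
                form_product=lambda ps: sum(ps),
                verify=lambda prod: prod % 2 == 0,
                repair=lambda ps: [p + 1 for p in ps],
                learn=lambda ps: [p + 1 for p in ps])
print(runs)  # [8, 10, 12]: each learning step preserves the invariant
```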
reactive agent plans/strategies (Burkhard, 1993; Kabanza, 1995; Carmel & Markovitch,
1996; Fogel, 1996).
Let us begin with step 1 in Figure 1. There are at least a couple of ways that the FSA
plans could be formed initially. For one, a human plan designer could engineer the initial
plans. This may require considerable effort and knowledge. An appealing alternative is to
evolve (i.e., learn using evolutionary algorithms) the initial plans in a simulated environment.
Fogel (1996) outlines a procedure for evolving FSAs that is effective for a number of
problems, including an iterated version of the Prisoner's Dilemma.
Human plan engineers or evolutionary algorithms can develop plans that satisfy an
agent's goals to a high degree. However, to provide strict behavioral guarantees, formal
verification is also required. Therefore we assume that prior to fielding the agents, the
(multi)agent plan has been verified offline to determine whether it satisfies critical properties
(steps 2 and 3). If not, the plan is repaired (step 4). Plan repair is not addressed in this
paper, although it is an important topic for future research. Steps 2 through 4 require some
clarification. If there is a single agent, then it has one FSA plan and that is all that is
verified and repaired, if needed. We call this SIT_1agent. (This notation, as well as other
notation used in the paper, is included in the glossary of Appendix A.) If there are multiple
agents that cooperate, we consider two possibilities. In SIT_1plan, every agent uses the same
multiagent plan, which is a "product" of the individual agent plans. This multiagent plan
is formed and verified to see if it satisfies global multiagent coordination properties. The
multiagent plan is repaired if verification produces any errors, i.e., failure of the plan to
satisfy a property. In SIT_multplans, each agent independently uses its own individual plan.
To verify global properties, one of the agents takes the product of these individual plans to
form a multiagent plan. This multiagent plan is what is verified. For SIT_multplans, one or
more individual plans are repaired if the property is not satisfied.
After the initial plan(s) have been verified and repaired, the agents are fielded. While
fielded (online), the agents apply learning (e.g., evolutionary operators) to their plan(s)
as needed (step 5). Learning may be required to adapt the plan to handle unexpected
situations or to fine-tune the plan. If SIT_1agent or SIT_1plan, the single (multi)agent plan
is adapted. If SIT_multplans, an agent adapts its own FSA, after which the multiagent
(product) plan is re-formed. For all situations, one agent then rapidly reverifies the new
(multi)agent plan to ensure it still satisfies the required properties (steps 6 and 7). Re-formation
of the multiagent plan and reverification are required to be as time-efficient
as possible because they are performed online, perhaps in a highly time-critical situation.
Whenever (re)verification fails, it produces a counterexample that is used to guide the choice
of an alternative learning operator or other plan repair as needed (step 8). This process of
executing, adapting, and reverifying plans cycles indefinitely as needed. The main focus of
this paper is steps 6 and 7.
Rapid reverification after learning is a key to achieving timely agent responses. Our long-term
goal is to examine all learning methods and important property classes to determine
the quickest reverification method for each combination of learning method and property
class. In this paper we present new results that certain useful learning operators are a
priori guaranteed to be "safe" with respect to important classes of properties. In other
words, if the property holds for the plan prior to learning, then it is guaranteed to still
hold after learning.⁴ If an agent uses these learning operators, it will be guaranteed to
preserve the properties with no reverification required, i.e., steps 6 through 8 in Figure 1
need not be executed. This is the best one could hope for in an online situation where rapid
response time is critical. For other learning operators and property classes our a priori
results are negative. However, for the cases in which we have negative results, we present
novel incremental reverification algorithms. These methods localize the reverification in
order to save time over total reverification from scratch.⁵ We also present a novel algorithm
for efficiently re-forming a multiagent plan, for the situation (SIT_multplans) in which there
are multiple agents, each learning independently.
The novelty of our approach is not in machine learning or veri�cation per se, but rather
the synthesis of the two. There are numerous important potential applications of our
approach. For example, if antiviruses evolve more e�ective behaviors to combat viruses, we
need to ensure that they do not evolve undesirable virus-like behavior. Another example is
data mining agents that can flexibly adapt their plans to dynamic computing environments
but whose behavior is adequately constrained for operation within secure or proprietary
domains. A third example is planetary rovers that adapt to unforeseen conditions while
remaining within critical mission parameters. Yet another example is automated factories
that adapt to equipment failures but continue operation within essential tolerances and
other speci�cations. Also, there are ongoing discussions at the Universities Space Research
Association about launching orbiting unmanned vehicles to run laboratory experiments.
The experiments would be semiautomated, and would thus require both adaptation and
behavioral assurances.
The last important application that we will mention is in the domain of power grid
and telecommunications networks. The following is an event that occurred (The New York
Times, September 21, 1991, Business Section). In 1991 in New York, local electric utilities
had a demand overload. In attempting to assist in solving the regional shortfall, AT&T
put its own generators on the local power grid. This was a manual adaptation, but such
4. This idea of property-preserving learning transformations was first introduced by Gordon (1998).
5. Incremental methods are often used in computer science for improving the time-efficiency of software.
adaptations are expected to become increasingly automated in the future. As a result of
AT&T's actions, there was a local power overload and AT&T lost its own power, which
resulted in a breakdown of the AT&T regional communications network. The regional net-
work breakdown propagated to create a national breakdown in communications systems.
This breakdown also triggered failures of many other control networks across the country,
such as the air traffic control network. Air travel nationwide was shut down. In the future,
it is reasonable to expect that some network controllers will be implemented using multiple,
distributed cooperating software agents. This example dramatically illustrates the potential
vulnerability of our national resources unless these agents satisfy all of the following
criteria: continuous execution/monitoring, flexible adaptation to failures, safety/reliability,
and timely responses. Our approach ensures that agents satisfy all of these.
This paper is organized as follows. Section 2 provides an illustrative example that is used
throughout the paper. Section 3 has the necessary background definitions of FSAs, property
types, formal verification, and machine learning operators. A priori results for specific
machine learning operators are in Section 4. These learning operators alter automaton edges
and the transition conditions associated with edges. A transition condition specifies the
condition under which a state-to-state transition may be made. We present positive a priori
results for some of these operators, where a "positive a priori result" means that the learning
operator preserves a specified class of properties. On the other hand, counterexamples are
presented to show that some of the learning operators do not necessarily preserve these
properties. Section 5 extends the a priori results for the multiagent situation SIT_multplans.
For all cases where we obtain negative a priori results, Section 6 provides incremental
algorithms for re-forming the multiagent plan and reverifying it, along with a worst-case
complexity analysis and empirical time complexity results. The empirical results show
as much as a half-billion-fold speedup for one of the incremental algorithms over standard
verification. The paper concludes with a discussion of related work and ideas for future
research.
2. Illustrative Example
We begin with a multiagent example for SIT_1plan or SIT_multplans that is used throughout the
paper to illustrate the definitions and ideas. The section starts by addressing SIT_multplans,
where multiple agents have their own independent plans. Later in the section we address
SIT_1plan, where each agent uses a joint multiagent plan.

Imagine a scenario where a vehicle has landed on a planet for the purpose of exploration
and sample collection, for example as in the Pathfinder mission to Mars. Like the Pathfinder,
there is a lander (called agent "L") from which a mobile rover emerges. However, in this
case there are two rovers: the far ("F") rover for distant exploration, and the intermediary
("I") rover for transferring data and samples from F to L.

We assume an agent designer has developed the initial plans for F, I, and L, shown
in Figures 2 and 3. These are simplified, rather than realistic, plans, for the purpose of
illustration. Basically, rover F is either collecting samples/data (in state COLLECTING) or
it is delivering them to rover I (when F is in its state DELIVERING). Rover I can either be
receiving samples/data from rover F (when I is in its RECEIVING state) or it can deliver
them to lander L (when it is in its DELIVERING state). If L is in its RECEIVING state,
[Figure: two FSAs. Rover F's plan has states COLLECTING (allowed actions {F-collect, F-deliver}) and DELIVERING ({F-deliver}); rover I's plan has states RECEIVING ({I-receive}) and DELIVERING ({I-deliver}). Edges are labeled with transition conditions such as F-collect ∧ I-deliver, F-deliver ∧ I-receive, F-deliver ∧ I-receive ∧ L-transmit, I-deliver ∧ ¬L-transmit, and "else".]

Figure 2: Plans for rovers F (left) and I (right).
[Figure: lander L's FSA with states TRANSMITTING (allowed actions {L-transmit, L-pause}), PAUSING ({L-pause}), and RECEIVING ({L-receive}). Edges are labeled with transition conditions such as I-receive ∧ L-transmit, I-receive ∧ L-pause, I-deliver ∧ L-receive, and "else".]

Figure 3: Plan for the lander L.
then it can receive the samples/data from I. Otherwise, L could be busy transmitting data
to Earth (in state TRANSMITTING) or pausing between actions (in state PAUSING).
As mentioned above, plans are represented using FSAs. An FSA has a finite set of states
(i.e., the vertices) and allowable state-to-state transitions (i.e., the directed edges between
vertices). The purpose of having states is to divide the agent's overall task into subtasks.
A state with an incoming arrow not from any other state is an initial state. Plan execution
begins in an initial state.

Plan execution occurs as the agent takes actions, such as agent F taking action F-collect
or F-deliver. Each agent has a repertoire of possible actions, a subset of which may be
taken from each of its states. A plan designer can specify this subset for each state. The
choice of a particular action from this subset is modeled in the FSA as nondeterministic.
It is assumed that further criteria, not specified here, are used to make the final run-time
choice of a single action from a state.
Let us specify the set of actions for each of the agents (F, I, L) in our example. F has
two possible actions: F-collect and F-deliver. The first action means that F collects samples
and/or data, and the second action means that it delivers these items to I. Rover I also
has two actions: I-receive and I-deliver. The first action means I receives samples/data
from F, and the second means that it delivers these items to L. L has three actions:
L-transmit, L-pause, and L-receive. The first action means L transmits data to Earth, the
second that it pauses between operations, and the third that it receives samples/data from
I. For each FSA, the set of allowable actions from each state is specified in Figures 2 and 3
in small font next to the state. For example, rover F can only take action F-deliver from
its DELIVERING state.
The transition conditions (i.e., the logical expressions labeling the edges) in an FSA plan
describe the set of actions that enable a state-to-state transition to occur. The operator ∧
means "AND," ∨ means "OR," and ¬ means "NOT." The condition "else" will be defined
shortly. The transition conditions of one agent can refer to the actions of one or more other
agents. This is because each agent is assumed to be reactive to what it has observed other
agents doing. If their actions are not visible, agents communicate their action choices.
Once an agent's action repertoire and its allowable actions from each state have been
defined, "else" can be defined. The transition condition "else" labeling an outgoing edge
from a state is an abbreviation denoting the set of all remaining actions that may be taken
from the state that are not already covered by other transition conditions. For example,
in Figure 3, L's three transition conditions from state TRANSMITTING are (I-receive ∧
L-transmit), (I-receive ∧ L-pause), and "else." L can only take L-transmit or L-pause from
this state. However, rover I could take I-deliver instead of I-receive. Therefore, in this case
"else" is equivalent to ((I-deliver ∧ L-transmit) ∨ (I-deliver ∧ L-pause)).
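The expansion of "else" amounts to a set difference over joint actions. A minimal sketch, assuming joint actions are encoded as (own action, other agent's action) pairs:

```python
from itertools import product

def else_condition(own_actions, other_actions, explicit):
    """Expand "else" for one state: every joint action (own, other) that is
    possible from the state but not covered by an explicit transition
    condition. `explicit` is the set of (own, other) pairs that already
    label the state's other outgoing edges."""
    all_joint = set(product(own_actions, other_actions))
    return all_joint - explicit

# L in state TRANSMITTING (the example from Figure 3):
explicit = {("L-transmit", "I-receive"), ("L-pause", "I-receive")}
els = else_condition({"L-transmit", "L-pause"},   # L's actions allowed here
                     {"I-receive", "I-deliver"},  # I's full repertoire
                     explicit)
print(sorted(els))
# [('L-pause', 'I-deliver'), ('L-transmit', 'I-deliver')]
```

This reproduces the text's expansion: ((I-deliver ∧ L-transmit) ∨ (I-deliver ∧ L-pause)).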
An FSA plan represents a set of allowable action sequences. In particular, a plan is the
set of all action sequences that begin in an initial state and obey the transition conditions.
An example action sequence allowed by F's plan is ((F-collect ∧ I-deliver), (F-collect ∧
I-receive), (F-deliver ∧ I-receive), ...), where F takes its actions and observes I's actions at
each step in the sequence.
At run time, these FSA plans are interpreted in the following manner. At every discrete
time step, every agent (F, I, L) is at one of the states in its plan, and it selects the next
action to take. Agents choose their actions independently. They do not need to synchronize
on action choice. The choice of action might be based, for example, on sensory inputs from
the environment. Although a complete plan would include the basis for action choice, as
mentioned above, here we leave it unspecified in the FSA plans. Our rationale for doing
this is that the focus of this paper is on the verification of properties about correct
action sequences. The basis for action choice is irrelevant to these properties.

Once each agent has chosen an action, all agents are assumed to observe the actions
of the other agents that are mentioned in their FSA transition conditions. For example, F's
transition conditions mention I's actions, so F needs to observe what I did. Based on its
own action and those of the other relevant agent(s), an agent knows the next state to which
it will transition. There is only one possible next state because the FSAs are assumed to
be deterministic. For example, if F is in its COLLECTING state, and it chooses action
F-collect, and it observes I taking action I-deliver, then it will stay in its COLLECTING
state. The process of being in a state, choosing an action, observing the actions of other
agents, then moving to a next state, is repeated indefinitely.
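One such deterministic step can be sketched as a table lookup. Only the first table entry below is stated explicitly in the text; the other two are hypothetical entries added for illustration:

```python
def step(fsa, state, own_action, observed):
    """Deterministic transition: given the agent's current state, its chosen
    action, and the observed action of the other relevant agent, there is
    exactly one next state. `fsa` maps (state, own, observed) -> next state."""
    return fsa[(state, own_action, observed)]

# Fragment of rover F's plan. The first entry matches the example in the
# text; the remaining entries are hypothetical.
F = {
    ("COLLECTING", "F-collect", "I-deliver"): "COLLECTING",
    ("COLLECTING", "F-deliver", "I-receive"): "DELIVERING",
    ("DELIVERING", "F-deliver", "I-receive"): "DELIVERING",
}
print(step(F, "COLLECTING", "F-collect", "I-deliver"))  # COLLECTING
```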
So far, we have been assuming SIT_multplans, where each agent has its own individual
plan. If we assume SIT_1plan, then each agent uses the same multiagent plan to decide its
actions. A multiagent plan is formed by taking a "product" (defined in Subsection 3.1)
of the plans for F, I, and L. This product models the synchronous behavior of the agents,
where "synchronous" means that at each time step every agent takes an action, observes
actions of other agents, and then transitions to a next state. The product plan is formed,
essentially, by taking the Cartesian product of the individual automaton states and the
intersection of the transition conditions. Multiagent actions enable state-to-state transitions
in the product plan. For example, if the agents jointly take the actions F-deliver and
I-receive and L-transmit, then all agents will transition from the joint state (COLLECTING,
RECEIVING, TRANSMITTING) to the joint state (DELIVERING, DELIVERING, RECEIVING),
represented by triples of states in the FSAs for F, I, and L. A multiagent plan
consists of the set of all action sequences that begin in a joint initial state of the product
plan and obey the transition conditions.
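The product construction (Cartesian product of states, intersection of transition conditions) can be sketched as follows. This assumes transition conditions are encoded extensionally as sets of joint actions; the two one-letter FSAs at the end are toy data, not the rover plans:

```python
from itertools import product as cartesian

def product_plan(fsas):
    """Form the multiagent (product) plan. Joint states are tuples of
    individual states; the condition on a joint edge is the intersection
    of the individual transition conditions. Each FSA is a dict
    {state: [(cond, next_state), ...]}, where cond is a frozenset of
    joint actions (atoms of the product algebra) enabling the edge."""
    joint = {}
    for js in cartesian(*fsas):  # Cartesian product of state sets
        edges = []
        for combo in cartesian(*(f[s] for f, s in zip(fsas, js))):
            cond = frozenset.intersection(*(c for c, _ in combo))
            if cond:  # a joint edge exists only if some joint action enables it
                edges.append((cond, tuple(n for _, n in combo)))
        joint[js] = edges
    return joint

# Toy two-agent example; joint actions are (a_action, b_action) pairs.
A = {"s0": [(frozenset({("a1", "b1")}), "s1"),
            (frozenset({("a1", "b2"), ("a2", "b1"), ("a2", "b2")}), "s0")],
     "s1": [(frozenset({("a1", "b1"), ("a1", "b2"),
                        ("a2", "b1"), ("a2", "b2")}), "s0")]}
B = {"t0": [(frozenset({("a1", "b1"), ("a2", "b1")}), "t1"),
            (frozenset({("a1", "b2"), ("a2", "b2")}), "t0")],
     "t1": [(frozenset({("a1", "b1"), ("a1", "b2"),
                        ("a2", "b1"), ("a2", "b2")}), "t0")]}
joint = product_plan([A, B])
print(len(joint[("s0", "t0")]))  # 3 enabled joint edges from (s0, t0)
```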
Whether the situation is SIT_multplans or SIT_1plan, a multiagent plan needs to be formed
to verify global multiagent coordination properties (see step 2 of Figure 1). Verification of
global properties consists of asking whether all of the action sequences allowed by the
product plan satisfy the property.
One class of (global) properties of particular importance, which is addressed here, is that
of forbidden multiagent actions that we want our agents to always avoid, called Invariance
properties. An example is property P1: ¬(I-deliver ∧ L-transmit), which states that it
should always be the case that I does not deliver at the same time that L is transmitting.
This property prevents problems that may arise from the lander simultaneously receiving
new data from I while transmitting older data to Earth. The second important class
addressed here is Response properties. These properties state that if a particular multiagent
action (the "trigger") has occurred, then eventually another multiagent action (the necessary
"response") will occur. An example is property P2: if F-deliver has occurred, then
eventually L will execute L-receive.
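Over a finite prefix of an action sequence, the two property classes can be checked as follows. The sample sequence is illustrative only and is not claimed to be allowed by the rover plans:

```python
def invariant_ok(seq, forbidden):
    """Invariance: the forbidden multiagent action never occurs."""
    return all(not forbidden(a) for a in seq)

def response_ok(seq, trigger, response):
    """Response: every trigger is eventually followed by the response.
    Checked over a finite prefix here; the paper treats infinite strings.
    A response at the same step as a trigger also discharges it."""
    pending = False
    for a in seq:
        if trigger(a):
            pending = True
        if response(a):
            pending = False
    return not pending

# Joint actions as (F-action, I-action, L-action) triples.
seq = [("F-collect", "I-deliver", "L-pause"),
       ("F-deliver", "I-receive", "L-pause"),
       ("F-collect", "I-deliver", "L-receive")]
p1 = invariant_ok(seq, lambda a: a[1] == "I-deliver" and a[2] == "L-transmit")
p2 = response_ok(seq, lambda a: a[0] == "F-deliver",
                 lambda a: a[2] == "L-receive")
print(p1, p2)  # True True: this prefix satisfies both P1 and P2
```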
If the plans in Figures 2 and 3 are combined into a multiagent plan, will this multiagent
plan satisfy properties P1 and P2? Answering this question is probably difficult or impossible
for most readers if the determination is based on visual inspection of the FSAs. Yet
there are only a couple of very small, simple FSAs in this example! This illustrates how
even a few simple agents, when interacting, can exhibit complex global behaviors, thereby
making global agent behavior difficult to predict. Clearly there is a need for rigorous behavioral
guarantees, especially as the number and complexity of agents increases. Model
checking fully automates this process. According to our model checker, the product plan
for F, I, and L satisfies properties P1 and P2.
Rigorous guarantees are also needed after learning. Suppose lander L's transmitter
gets damaged. Then one learning operator that could be applied is to delete L's action
L-transmit, which thereafter prevents this action from being taken from state TRANSMITTING.
After applying a learning operator, reverification may be required. For this
particular operator (deleting an action), no reverification is needed (see Section 4).
In a multiagent situation, what gets modified by learning? Who forms and verifies the
product FSA? And who performs repairs if verification fails, and what is repaired? The
answers to these questions depend on whether it is SIT_1plan or SIT_multplans. If SIT_1plan,
the agent with the greatest computational power, e.g., lander L in our example, maintains
the product plan by applying learning to it, verifying it, repairing it as needed, and then
sending a copy of it to all of the agents to use. If SIT_multplans, an agent applies learning to
its own individual plan. The individual plans are then sent to the computationally powerful
agent, who forms the product and verifies that properties are satisfied. If repairs are needed,
one or more agents repair their own individual plans.
It is assumed here that machine learning operators are applied one at a time per agent
rather than in batch and, if SIT_multplans, the agents co-evolve plans by taking turns learning
(Potter, 1997). Beyond these assumptions, this paper does not focus on the learning operators
per se (other than to define them). It focuses instead on the outcome resulting from the
application of a learning operator. In particular, we address the reverification issue. The
next section gives useful background definitions needed for understanding reverification.
3. Preliminary Definitions

This section provides definitions of FSAs, properties, verification, and machine learning
operators. For a clear, unambiguous understanding of the results in this paper, many of
these definitions are formal.
3.1 Automata for Agents' Plans

FSAs have at least four advantages over classical plans (Nilsson, 1980; Dean & Wellman,
1991). For one, unlike classical plans, the type of finite-state automaton plans used here
allows potentially infinite (indeterminate) length action sequences.⁶ This provides a good
model of embedded agents that are continually responsive to their environment without
any artificial termination to their behavior. Execution and learning may be interleaved in
a natural manner. Another advantage is that FSA plans have states, and the plan designer
can use these states to represent subtasks of the overall task. This subdivides the plan into
smaller units, thereby potentially increasing the comprehensibility of the plans. States also
enable different action choices at different times, even if the sensory inputs are the same.
A third advantage of FSA plans is that they are particularly well-suited to modeling the
concurrent behavior of multiple agents. An arbitrary number of single-agent plans can be
developed independently and then composed into a synchronous multiagent plan (for which
global properties may be tested) in a straightforward manner. Finally, FSA plans can be
verified using the very popular and effective automata-theoretic model checking methods,
e.g., see Kurshan (1994).
A disadvantage of FSA plans as opposed to classical plans is that there is a great deal
of research that has been done on automatically forming classical plans, e.g., see Dean and
Wellman (1991). It is unclear how much of this might be applicable to FSAs. On the
other hand, evolutionary algorithms can be used to evolve FSA plans (Fogel, 1996). A
disadvantage of FSA plans as opposed to plans composed of rule sets is that the latter may
express a plan more succinctly. Nevertheless, for plans that require formal verification, FSAs
are preferable because the complex interactions that can occur between rules make them
very hard to verify. Formal verification for FSAs is quite sophisticated and widely used in
safety-critical industrial applications.
This subsection, which is based on Kurshan (1994), briefly summarizes the basics of the
FSAs used to model agent plans. Figures 2 and 3 illustrate the definitions. This paper
focuses on FSAs that model agents with a potentially infinite lifetime, represented as an
infinite-length "string" (i.e., a sequence of actions).

6. Results for agents with finite lifetimes may be found in Gordon (1998, 1999).
Before beginning our discussion of automata, we briefly digress to define Boolean algebra.
Examples throughout this paper have automaton transition conditions expressed
in Boolean algebra, because Boolean algebra succinctly summarizes these transition conditions.
Boolean algebra is also useful for succinctly expressing the properties. Furthermore,
it is easier for us to describe two of the incremental reverification algorithms if we use
Boolean algebra notation. Therefore, we briefly summarize the basics of Boolean algebra.

A Boolean algebra K is a set of elements with distinguished elements 0 and 1, closed
under the Boolean ∧, ∨, and ¬ operations, and satisfying the standard properties (Sikorski,
1969). For elements x and y of K, x ∧ y is called the meet of x and y, x ∨ y is called the join
of x and y, and ¬x is called the complement of x. For those readers who are unfamiliar with
Boolean algebras and who want some intuition for these operations, it may help to imagine
that each element of K is itself a set, e.g., a set of actions. Meet, join, and complement
would then be set intersection, union, and complement, respectively. Elements 0 and 1,
in this case, would be the empty set (∅) and the set of all elements in the universe (U),
respectively.
The Boolean algebras are assumed to be �nite. There is a partial order among the
elements, �, which is de�ned as x � y if and only if x^ y = x. It may help to think of � as
analogous to � for sets. The elements 0 and 1 are de�ned as 8x 2 K, 0 � x, and 8x 2 K,
x � 1. The atoms (analogous to single-element sets) of K, �(K), are the nonzero elements
of K minimal with respect to �. In the rovers example, agents F, I, and L each have their
own Boolean algebra with its atoms. The atoms of F's Boolean algebra are its actions
F-collect and F-deliver; the atoms of I's algebra are I-receive and I-deliver; the atoms of L's
algebra are L-transmit, L-pause, and L-receive. The element (F-collect _ F-deliver) of F's
Boolean algebra describes the set of actions fF-collect, F-deliverg.
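To make the set-based intuition concrete, here is a minimal sketch (our own illustration, not from the paper) that models F's finite Boolean algebra as the powerset of its atoms, with meet, join, complement, and ≤ realized as set operations:

```python
# F's Boolean algebra modeled as the powerset of its atoms: elements are
# frozensets of atoms, 0 is the empty set, 1 is the full atom set, and
# meet/join/complement are intersection/union/set-complement.
ATOMS_F = frozenset({"F-collect", "F-deliver"})  # atoms of F's algebra

def meet(x, y):                       # x ∧ y
    return x & y

def join(x, y):                       # x ∨ y
    return x | y

def complement(x, universe=ATOMS_F):  # ¬x
    return universe - x

def leq(x, y):                        # x ≤ y  iff  x ∧ y = x
    return meet(x, y) == x

zero = frozenset()                    # element 0
one = ATOMS_F                         # element 1

# The element (F-collect ∨ F-deliver) is the set of both actions, i.e., 1:
elem = join(frozenset({"F-collect"}), frozenset({"F-deliver"}))
print(elem == one)  # True
```

Every finite Boolean algebra is isomorphic to such a powerset algebra, so this encoding loses nothing for the finite algebras assumed here.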
A Boolean algebra K_i is a subalgebra of K if K_i is a nonempty subset of K that is closed
under the operations ∧, ∨, and ¬, and also has the distinguished elements 0 and 1. ∏ K_i
is the product algebra of subalgebras K_i. An atom of the product algebra is the meet of
the atoms of the subalgebras. For example, if a_1, ..., a_n are atoms of subalgebras K_1, ..., K_n,
respectively, then a_1 ∧ ... ∧ a_n is an atom of ∏_{i=1}^{n} K_i.

The Boolean algebra K_F for agent F's actions is the smallest one containing the atoms
of F's algebra. It contains all Boolean elements formed from F's atoms using the Boolean
operators ∧, ∨, and ¬, including 0 and 1. These same definitions hold for I and L's
algebras K_I and K_L. K_F ⊗ K_I ⊗ K_L is the product algebra used for all transition conditions
in the multiagent plan (i.e., the product of the F, I, and L FSAs). One atom of the
product algebra K_F ⊗ K_I ⊗ K_L is (F-collect ∧ I-receive ∧ L-pause). This is the form of actions
taken simultaneously by the three agents. Algebras K_F, K_I, and K_L are subalgebras of the
product algebra K_F ⊗ K_I ⊗ K_L.
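The product-algebra atoms can be enumerated mechanically. In the sketch below (our own encoding, not the paper's), a meet such as (F-collect ∧ I-receive ∧ L-pause) is represented as a tuple containing one atom from each subalgebra:

```python
from itertools import product

# Atoms of the three subalgebras from the rovers example.
ATOMS_F = ["F-collect", "F-deliver"]
ATOMS_I = ["I-receive", "I-deliver"]
ATOMS_L = ["L-transmit", "L-pause", "L-receive"]

# Each atom of the product algebra K_F ⊗ K_I ⊗ K_L is a meet of one atom
# per subalgebra; here such a meet is encoded as a tuple.
product_atoms = list(product(ATOMS_F, ATOMS_I, ATOMS_L))

print(len(product_atoms))  # 2 * 2 * 3 = 12 possible joint actions
print(("F-collect", "I-receive", "L-pause") in product_atoms)  # True
```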
Let us return now to automata. Formally, an FSA of the type considered here is a
three-tuple S = (V(S), M_K(S), I(S)), where V(S) is the set of vertices (states) of S, K
is the Boolean algebra corresponding to S, M_K(S) : V(S) × V(S) → K is the matrix of
transition conditions, which are elements of K, and I(S) ⊆ V(S) are the initial states.^7
Also, E(S) = {e ∈ V(S) × V(S) | M_K(e) ≠ 0} is the set of directed edges connecting pairs of
vertices of S. M_K(e), which is an abbreviation for M_K(S)(e), is the transition condition of
M_K(S) corresponding to edge e. Note that we omit edges labeled "0." By our definition,
an edge whose transition condition is 0 does not exist. We can alternatively denote M_K(e)
as M_K(v_i, v_j) for the transition condition corresponding to the edge going from vertex v_i to
vertex v_j. For example, in Figure 3, M_K((TRANSMITTING, PAUSING)) is (I-receive ∧
L-pause).

7. There should also be an output subalgebra, as in Kurshan (1994). This would help distinguish an agent's
own actions from those of other agents. However, it is omitted here for notational simplicity.

Figure 4: Part of the product plan for agents F, I, and L. [Figure: an edge labeled
(F-deliver ∧ I-receive ∧ L-transmit) from state (COLLECTING, RECEIVING, TRANSMITTING)
to state (DELIVERING, DELIVERING, RECEIVING).]
Figures 2 and 3 illustrate these FSA definitions. There are FSA plans for three agents,
F, I, and L with vertices, edges, and transition conditions. An incoming arrow to a state,
not from any other state, signifies that this is an initial state.
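The three-tuple definition can be sketched directly in code. This is a minimal encoding of our own, not the paper's implementation: transition conditions are represented as sets of atoms, an absent matrix entry plays the role of condition 0, and E(S) is derived from M_K. The plan fragment below is abbreviated for illustration, not copied from Figure 2:

```python
# Minimal FSA encoding S = (V, M_K, I): vertices, a transition-condition
# matrix M (edge -> set of atoms; a missing or empty entry means condition 0),
# and initial states I. E(S) is derived as the edges with nonzero condition.
class FSA:
    def __init__(self, vertices, M, initial):
        self.V = set(vertices)
        self.M = {e: frozenset(c) for e, c in M.items() if c}  # drop 0-edges
        self.I = set(initial)

    def E(self):
        return set(self.M)  # edges e with M_K(e) != 0

# An abbreviated two-state plan fragment for rover I:
I_plan = FSA(
    vertices={"RECEIVING", "DELIVERING"},
    M={("RECEIVING", "DELIVERING"): {"I-receive"},
       ("DELIVERING", "RECEIVING"): {"I-deliver"},
       ("RECEIVING", "RECEIVING"): set()},       # condition 0: no such edge
    initial={"RECEIVING"},
)
print(("RECEIVING", "RECEIVING") in I_plan.E())  # False: 0-edges do not exist
```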
A multiagent plan is formed from single agent plans by taking the tensor product (also
called the "synchronous product" or simply "product") of the FSAs corresponding to the
individual plans. Formally, the tensor product is defined as:

⊗_{i=1}^{n} S_i = (× V(S_i), ⊗_i M(S_i), × I(S_i))

where × is the Cartesian product, and the tensor product M(S_1) ⊗ ... ⊗ M(S_n) of n tran-
sition matrices is defined as M(S_1) ⊗ ... ⊗ M(S_n)((v_1, v_1′), ..., (v_n, v_n′)) = M(S_1)(v_1, v_1′)
∧ ... ∧ M(S_n)(v_n, v_n′) for (v_1, v_1′) ∈ E(S_1), ..., (v_n, v_n′) ∈ E(S_n). In words, the product
FSA is formed by taking the Cartesian product of the vertices and the intersection of the
transition conditions. Initial states of the product FSA are tuples formed from the initial
states of the individual FSAs.

The product FSA models a set of synchronous FSAs. The Boolean algebra correspond-
ing to the product FSA is the product algebra. For Figures 2 and 3, to formulate the FSA
S modeling the entire multiagent plan, we take the tensor product S = F ⊗ I ⊗ L of the
three FSAs. For this tensor product, I(S) = {(COLLECTING, RECEIVING, TRANS-
MITTING), (COLLECTING, RECEIVING, PAUSING), (COLLECTING, RECEIVING,
RECEIVING)}. Part of the tensor product FSA is shown in Figure 4.
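The construction above can be sketched for two FSAs under the same set-of-atoms encoding. This is our own illustrative version; the small F and I fragments are simplified stand-ins, not the FSAs of Figure 2:

```python
from itertools import product

def tensor_product(S1, S2):
    """Tensor product of two FSAs encoded as dicts
    {'V': vertices, 'M': {edge: set of atoms}, 'I': initial states}.
    A product condition is the meet of the component conditions, encoded
    as the set of joint-atom tuples; condition-0 edges are dropped."""
    M = {}
    for (u1, w1), c1 in S1["M"].items():
        for (u2, w2), c2 in S2["M"].items():
            M[((u1, u2), (w1, w2))] = {(a1, a2) for a1, a2 in product(c1, c2)}
    return {
        "V": {(v1, v2) for v1 in S1["V"] for v2 in S2["V"]},
        "M": {e: c for e, c in M.items() if c},
        "I": {(i1, i2) for i1 in S1["I"] for i2 in S2["I"]},
    }

# Simplified two-agent fragments (illustrative, not the paper's figures):
F = {"V": {"COLLECTING", "DELIVERING"},
     "M": {("COLLECTING", "COLLECTING"): {"F-collect"},
           ("COLLECTING", "DELIVERING"): {"F-deliver"}},
     "I": {"COLLECTING"}}
I2 = {"V": {"RECEIVING", "DELIVERING"},
      "M": {("RECEIVING", "RECEIVING"): {"I-receive"},
            ("RECEIVING", "DELIVERING"): {"I-deliver"}},
      "I": {"RECEIVING"}}

S = tensor_product(F, I2)
print(("COLLECTING", "RECEIVING") in S["I"])  # True
```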
Next we define the language of an FSA, which is the set of all action sequences permitted
by the FSA plan. To do this, we first define a string, which is a sequence of actions (atoms).
Formally, a string x is an infinite-dimensional vector, (x_0, ...) ∈ Λ(K)^ω, i.e., a string is an
infinite (ω) length sequence of actions (where K is the Boolean algebra used by S). A
run v of string x is a sequence (v_0, ...) of vertices such that ∀i, x_i ∧ M_K(v_i, v_{i+1}) ≠ 0,
i.e., x_i ≤ M_K(v_i, v_{i+1}) because the x_i are atoms. In other words, a run of a string is the
sequence of vertices visited in an FSA when the string satisfies the transition conditions
along the edges.
The language of FSA S is defined as:

L(S) = {x ∈ Λ(K)^ω | x has a run v = (v_0, ...) in S with v_0 ∈ I(S)}

Such a run is called an accepting run, and S is said to accept string x. Any requirement
on the accepting runs of an FSA is called the FSA acceptance criterion. In this case,
the acceptance criterion consists of one condition: accepting runs must begin in an initial
state. The verification literature calls these FSAs, which accept infinite-length strings,
ω-automata (Kurshan, 1994).
A few more definitions are needed. An FSA is complete if, for each state v ∈ V(S),
∨_{w ∈ V(S)} M_K(v, w) = 1. In other words, an FSA is complete if it specifies what state-to-state
transition the agent should make for all possible actions taken by the other agents. This is
a very reasonable assumption to make because otherwise the agent would not know what
to do in some circumstances. An FSA is deterministic at state v if w ≠ w′ ⇒ M_K(v, w)
∧ M_K(v, w′) = 0. In other words, the choice of action uniquely determines which edge will be
taken from a state. An FSA is deterministic if it is deterministic at each of its states. Unless
otherwise stated, it is assumed here that all FSAs are complete and deterministic. The
restriction to deterministic FSAs is not a major problem because for every nondeterministic
FSA there is a deterministic one accepting the same language (Kurshan, 1994).
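Both conditions can be checked mechanically under the set-of-atoms encoding: completeness asks that the join (union) of a state's outgoing conditions equal 1 (the full atom set), and determinism asks that distinct outgoing conditions have meet (intersection) 0. A sketch of our own, with abstract atoms rather than the rover actions:

```python
def is_complete(V, M, atoms):
    """Complete: at each state, the join of outgoing conditions is 1,
    i.e. every atom is covered by some outgoing edge."""
    for v in V:
        covered = set()
        for (u, w), c in M.items():
            if u == v:
                covered |= set(c)
        if covered != set(atoms):
            return False
    return True

def is_deterministic(V, M):
    """Deterministic at v: conditions on distinct outgoing edges have meet 0."""
    for v in V:
        out = [c for (u, w), c in M.items() if u == v]
        for i in range(len(out)):
            for j in range(i + 1, len(out)):
                if set(out[i]) & set(out[j]):
                    return False
    return True

V = {1, 2}
M = {(1, 1): {"a"}, (1, 2): {"b"}, (2, 1): {"a", "b"}}
print(is_complete(V, M, {"a", "b"}), is_deterministic(V, M))  # True True
```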
We also need the definition of a cycle in a graph. Model checking typically consists of
looking for cycles, as described in Section 3.3. A path in FSA S is a sequence of vertices
v = (v_0, ..., v_n) ∈ V(S)^{n+1}, for n ≥ 1, such that (v_i, v_{i+1}) ∈ E(S) for i = 0, ..., n − 1, i.e.,
M_K(v_i, v_{i+1}) ≠ 0. If v_n = v_0, then v is a cycle. Each cycle in an FSA plan allows the
possibility that the agent can infinitely often, or as long as desired, revisit the vertices of
the cycle. It also implies that a substring can be repeated indefinitely.
We next illustrate some of these definitions. An example string in the language of FSA
S, the multiagent FSA that is the product of F, I, and L, is
((F-collect ∧ I-receive ∧ L-transmit),
(F-deliver ∧ I-receive ∧ L-receive),
(F-deliver ∧ I-receive ∧ L-transmit),
(F-deliver ∧ I-deliver ∧ L-receive), ...).
This is a sequence of atoms of S. A run of this string is
((COLLECTING, RECEIVING, TRANSMITTING),
(DELIVERING, RECEIVING, RECEIVING),
(DELIVERING, RECEIVING, TRANSMITTING),
(DELIVERING, DELIVERING, RECEIVING),
(COLLECTING, RECEIVING, RECEIVING), ...).
All FSAs in Figures 2 and 3 are complete and deterministic. For example, in Figure 2,
rover I can only take action I-deliver from its DELIVERING state. However, every possible
action choice of L determines a unique next state for I from DELIVERING. For example,
if L takes L-transmit then I must stay in state DELIVERING, and if L takes L-receive or
L-pause then I must go to state RECEIVING.
3.2 Properties

Now that we have presented the FSA formalism used for agent plans, we can address the
question of how to formalize properties. For verification, properties are typically expressed
either as FSAs (for automata-theoretic verification) or in temporal logic. Here, we assume
linear temporal logic. In other words, we assume that time proceeds linearly and we do not
consider simultaneous possible futures. Using the algorithm of Vardi and Wolper (1986),
one can convert any linear temporal logic formula into an automaton (because automata
are more expressive than linear temporal logic). Both representations are used here. To
simplify our proofs in Section 4, properties are expressed in temporal logic. For some of the
incremental reverification methods in Section 6, we use automata-theoretic methods with
an FSA representation for the property.

Let us begin by defining temporal logic properties. Many of the definitions are based on
Manna and Pnueli (1991). To bridge the gap between automata (for plans) and temporal
logic (for properties), we need to define a computational state (c-state). A computation is
an infinite sequence of temporally-ordered atoms, i.e., a string. A c-state is an atom in a
computation. In other words, it is a (single or multiagent) action that occurs at a single
time step in a computation. We continue to refer to an automaton state as simply a "state."
P is a property that is true (false) for an FSA S, S ⊨ P (S ⊭ P), if and only if P
is true for every string in the language L(S) (false for some string in L(S)). The notation
x ⊨ P (x ⊭ P) means string x satisfies (does not satisfy) property P, i.e., the property
holds (does not hold) for x. Before defining what it means for properties to be true (i.e.,
hold) for a string, we first define what it means for a formula that is a Boolean expression
to be true at a c-state. A c-state formula p is true (false) at c-state x_i, i.e., x_i ⊨ p (x_i ⊭
p), if and only if x_i ≤ p (x_i ≰ p), i.e., x_i ∧ p ≠ 0 (= 0), because p is a Boolean expression
with no variables on the same Boolean algebra used by FSA S, and x_i is an atom of that
algebra. For example, F-collect ⊨ (F-collect ∨ F-deliver) for c-state F-collect and c-state
formula (F-collect ∨ F-deliver). One can also talk about a c-state formula being true or
false for an atom, since a c-state is an atom.

A c-state formula p is true or false in particular c-states of a string. Property P is
defined in terms of p, and is true or false of an entire string. In particular, x ⊨ P or x ⊭ P
for the string x.
We focus on two property classes that are among those most frequently encountered in
the verification literature: Invariance and Response properties. Invariance and Response
properties are likely to be useful for agents. For the case of a single agent (SIT_1agent), In-
variance properties can express the requirement that a particular action never be executed.^8
Response properties are also useful for a single agent. They can be used to verify that a
pair of the agent's actions will occur in the correct order (i.e., a "response" always fol-
lows a "trigger") in the plan. In the context of multiple agents (SIT_1plan or SIT_multplans),
Invariance properties express the need for parallel multiagent coordination. In particular,
they express that multiple agents should not simultaneously perform some conflicting set of
actions. Response properties express the need for sequential multiagent coordination. For
example, they can express the requirement that one agent's action must follow in response
to a particular "triggering" action of another agent.

8. This could alternatively be implemented as a run-time check, but then there would be no assurance that
the plan without the action is a good one, for example, in terms of how well the revised plan satisfies
the agent's goals (perhaps captured in a "fitness function"). Alternatively, the action (atom) could be
omitted from the set of actions Λ(K). But in general one may not wish to rule out actions, in case the
situation and/or properties might change.
Here, we only present informal definitions of these properties; the formal definitions are
in Appendix B. An Invariance property P = □¬p ("Invariant not p") is true of a string
if p is "never" true, i.e., if p is not true in any c-state of the string. P = □(p → ◇q)
is a Response property, where ◇ means "eventually." We call p the "trigger" and q the
"response." A Response formula states that every trigger is eventually (in finite time)
followed by a response.

To illustrate these property types, we continue the rovers and lander example. The
property P1 from Section 2, which states that it should always be the case that I does
not deliver at the same time that L is transmitting, is formally expressed as an Invariance
property P1 defined as: P1 = □(¬(I-deliver ∧ L-transmit)). Property P2 from Section 2,
which states that if F-deliver has occurred then eventually L will execute L-receive, is an
example of a Response property. This is expressed in temporal logic as P2 = □(F-deliver
→ ◇ L-receive).
Next consider the FSA representation for properties. As will be explained in Section 3.3
on verification, what we really need to express for automata-theoretic verification is the
negation of the property, i.e., ¬P. Strings in the language of FSA ¬P violate property P.
In this paper, we assume that ¬P is expressed using the popular Büchi ω-automaton (Büchi,
1962). We decided to use the Büchi FSA because one of the simplest and most elegant model
checking algorithms in the literature assumes this type of FSA for the property, and we use
that algorithm (see Subsections 3.3 and 6.1). A Büchi automaton is defined to be a four-
tuple S = (V(S), M_K(S), I(S), B(S)), where B(S) ⊆ V(S) is a set of "bad" states. To
define the language of a Büchi automaton, we require the following preliminary definition.
For a run v of FSA S, inf(v) = {v ∈ V(S) | v_i = v for infinitely many v_i's in run v}. In
other words, inf(v) equals the set of all vertices of S that occur infinitely often in the run v.
Then for a Büchi automaton S, L(S) = {x ∈ Λ(K)^ω | x has a run v = (v_0, ...) in S with
v_0 ∈ I(S) and inf(v) ∩ B(S) ≠ ∅}. In other words, the Büchi automaton has an acceptance
criterion that requires visiting some bad state infinitely often, as well as beginning in an
initial state.

An example deterministic Büchi FSA for ¬P1, where Invariance property P1 = □¬(I-
deliver ∧ L-transmit), is in Figure 5 (on the left) with B(¬P1) = {2}. Note that visiting a
state in B(¬P1) infinitely often implies Büchi acceptance, and because the FSA expresses
the negation of the property, visiting a "bad" state in B(¬P1) infinitely often is undesirable.
From Figure 5 we can see that any string that includes (I-deliver ∧ L-transmit) will visit
state 2 infinitely often, and B(¬P1) = {2}. Thus any string that starts in state 1 and
includes (I-deliver ∧ L-transmit) is in L(¬P1) and therefore violates property P1.

Next consider Response properties of the form □(p → ◇q). For this paper, the only
type of FSA that we need for verifying Response properties is the very simple deterministic
Büchi FSA for the negation of a "First-Response" property.^9 (Determinism is needed for
our efficient internal representation. See Subsection 6.1.) A First-Response property checks
whether the first trigger p in every string is followed by a response q. Figure 5 (on the right)
shows a Büchi FSA for the First-Response property corresponding to ¬P2, where property
P2 = □(F-deliver → ◇ L-receive). For this FSA, B(¬P2) = {2}. Any string whose
accepting run visits state 2 infinitely often will include the first trigger and not the response
that should follow it. As discussed in Subsection 6.5, verifying First-Response properties
can in some circumstances (including all of our experiments) be equivalent to verifying the
full Response property □(p → ◇q). Henceforth, when we use the term "Response" this is
assumed to include both the full Response and the First-Response versions.

9. A straightforward inductive argument shows that it is not possible to construct a deterministic Büchi
automaton with a finite number of states for the negation of the full Response property □(p → ◇q)
(Mahesh Viswanathan, personal communication).

Figure 5: Invariance property ¬P1 (left) and the First-Response version of property ¬P2
(right) as Büchi FSAs, where B(S) = {2} for both automata. [Left FSA: initial state 1
loops on "else" and moves to state 2 on (I-deliver ∧ L-transmit); state 2 loops on 1. Right
FSA: initial state 1 loops on "else" and moves to state 2 on F-deliver; state 2 loops on
"else" and moves to state 3 on L-receive; state 3 loops on 1.]
3.3 Model Checking for Verification

Now that we have our representations for plans and properties, it is possible to describe
model checking, i.e., for plan S and property P determining whether S ⊨ P. First, however,
we need to begin with two essential definitions of accessibility: accessibility of one vertex
from another, and accessibility of an atom from a vertex.

Definition 1 Vertex v_n is accessible from vertex v_0 if and only if there exists a path from
v_0 to v_n.

Definition 2 Atom a_{n−1} ∈ Λ(K) is accessible from vertex v_0 if and only if there exists a
path from v_0 to v_n and a_{n−1} ≤ M_K(v_{n−1}, v_n).

Accessibility from initial states is central to model checking. The reason is the following.
Recall from Section 3.2 that property P is true (false) for an FSA S (i.e., S ⊨ P (S ⊭ P))
if and only if P is true for every string in the language L(S) (false for some string in
L(S)). By definition, every string in the language has an accepting run. Therefore, it is
only necessary to verify the property for strings that have an accepting run. By definition,
every accepting run begins with an initial state. Therefore, every state in an accepting run
is accessible from an initial state, and every atom (c-state) in a string of the language is
accessible from an initial state. Clearly, the only states and atoms that need to be involved
in verification are those accessible from initial states.

Invariance properties can be re-expressed in terms of accessibility. Invariance property
□¬p could be restated as saying that there does not exist any atom a, where a ≤ p, that
is accessible from an initial state. It is much more difficult to express Response properties
succinctly using accessibility. Nevertheless, accessibility plays a key role in verifying all
properties, as will be seen shortly.
There are a number of ways to perform model checking, but here we focus on two.
The first method is specifically tailored for one class of properties; the second is sufficiently
general for use in verifying many classes of properties. The rationale for choosing a specific
and a general algorithm is that this allows for a comparison to determine the computational
efficiency gained by property-specific tailoring (see Subsection 6.5). In this section, we give
high-level sketches of these two model checking algorithms. The full algorithms are in
Section 6.

The first algorithm is a very simple and efficient method tailored for Invariance properties
P = □¬p. For every initial state v_i, this method begins at v_i and visits every atom a_j
accessible from v_i. If this atom has not already been checked, it checks to see whether
a_j ≤ p. If a_j ≤ p, then this is considered a verification failure. If there are no failures,
verification succeeds.
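This Invariance check can be written as a breadth-first reachability search over atoms. The version below is our own illustration, not the paper's Section 6 algorithm: conditions are encoded as sets of atoms, so an accessible atom a satisfies a ≤ p exactly when a ∈ p:

```python
from collections import deque

def verify_invariance(M, I, p):
    """Check Invariant(not p): fail iff some atom a with a <= p is accessible
    from an initial state. M maps edges to sets of atoms, I is the set of
    initial states, and p (an element of the algebra) is a set of atoms."""
    seen_states, seen_atoms = set(I), set()
    queue = deque(I)
    while queue:
        v = queue.popleft()
        for (u, w), cond in M.items():
            if u != v:
                continue
            for atom in cond - seen_atoms:   # each newly accessible atom
                seen_atoms.add(atom)
                if atom in p:                # a <= p: verification failure
                    return False
            if w not in seen_states:
                seen_states.add(w)
                queue.append(w)
    return True                              # no failures: S |= Invariant(not p)

M = {("s0", "s0"): {"ok"}, ("s0", "s1"): {"risky"}}
print(verify_invariance(M, {"s0"}, {"risky"}))  # False: "risky" is accessible
```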
The second method, automata-theoretic (AT) model checking, is very popular in the
verification literature (e.g., see Vardi and Wolper, 1986) and can be used to verify any prop-
erty expressible as a finite-state automaton. It is used here for First-Response properties.
In AT model checking, asking whether S ⊨ P is equivalent to asking whether L(S) ⊆ L(P)
for property P. This is equivalent to L(S) ∩ L̄(P) = ∅ (where L̄(P) denotes the comple-
ment of L(P)), which is algorithmically tested by first taking the tensor product of the
plan FSA S and the FSA corresponding to ¬P (i.e., S ⊗ ¬P). The FSA corresponding to
¬P accepts L̄(P). The tensor product implements language intersection. The algorithm
then determines whether L(S ⊗ ¬P) ≠ ∅, which implies L(S) ∩ L̄(P) ≠ ∅ (S ⊭ P). This
determination is implemented as a check for cycles in the product FSA S ⊗ ¬P that are ac-
cessible from some initial state and that satisfy any other conditions in the FSA acceptance
criterion. Recall that a cycle is a sequence of vertices (v_0, ..., v_n) such that v_n = v_0. A cycle
is accessible from an initial state if one of its vertices is accessible from the initial state. A
cycle that is accessible from an initial state and that satisfies the FSA acceptance criterion
implies a nonempty language. This is because a string is in the language of an FSA if it is
an infinite-length sequence of actions satisfying the FSA acceptance criterion, which always
includes the requirement that its accepting run must begin in an initial state. All infinite
behavior eventually ends up in a cycle because the FSA has a finite number of states.

Therefore, to be certain that the language is nonempty, it is necessary to determine
whether any accessible cycle satisfies the FSA acceptance criterion. The criterion of inter-
est is the Büchi criterion, for the following reason. It is assumed here that the negation
of the property (¬P) is expressed as a Büchi automaton. This implies that the FSA be-
ing searched, i.e., S ⊗ ¬P, is also a Büchi automaton, because taking the tensor product
preserves this criterion. The final check of this algorithm is whether an accessible cycle in
S ⊗ ¬P satisfies the Büchi acceptance criterion, because in that case the language is not
empty. A product state s is in B(S ⊗ ¬P) whenever it has a component state in B(¬P);
e.g., (COLLECTING, RECEIVING, RECEIVING, 2) is in B(S ⊗ ¬P2) for property P2
because its fourth component is state 2 of B(¬P2). According to the Büchi acceptance
criterion, visiting a state v ∈ B(S ⊗ ¬P) infinitely often (assuming v is accessible from
an initial state) implies L(S ⊗ ¬P) ≠ ∅. This will happen if v is part of an accessible
cycle. In that case, S ⊭ P and verification fails. Otherwise, if no accessible product state
v ∈ B(S ⊗ ¬P) is visited infinitely often (i.e., it is not in a cycle), then L(S ⊗ ¬P) = ∅
and therefore L(S) ⊆ L(P), i.e., S ⊨ P and verification succeeds. A relatively efficient
algorithm for AT verification from the literature is presented in Section 6.
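The cycle check at the heart of AT verification can be sketched as plain graph reachability: a bad state lies on an accessible cycle iff it is reachable from an initial state and also reachable from one of its own successors. This quadratic sketch is our own for illustration (it ignores transition conditions and works on the edge set of S ⊗ ¬P); the efficient algorithm from the literature appears in Section 6:

```python
def reachable(edges, starts):
    """All vertices reachable from `starts` via directed `edges`."""
    seen, stack = set(starts), list(starts)
    while stack:
        v = stack.pop()
        for (u, w) in edges:
            if u == v and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def at_verify(edges, I, bad):
    """S |= P iff no bad state of the product S (x) not-P lies on a cycle
    accessible from an initial state."""
    for b in reachable(edges, I) & set(bad):
        succs = {w for (u, w) in edges if u == b}
        if b in reachable(edges, succs):   # b lies on a cycle
            return False                   # accessible bad cycle: S |/= P
    return True                            # language empty: S |= P

print(at_verify({(1, 2), (2, 2)}, {1}, {2}))  # False: bad state 2 on a cycle
print(at_verify({(1, 2), (2, 3)}, {1}, {2}))  # True: no cycle through 2
```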
3.4 Machine Learning to Adapt Plans

Given plan S and property P, model checking determines whether S ⊨ P. Next we consider
the case of learning, which is a change to S. This subsection addresses the issue of how a
learning operator can affect a plan S to generate a new plan S′.

We begin by presenting a taxonomy of FSA learning operators. It is likely that any
learning method for complete deterministic FSAs will be composed of one or more of these
operators. Nothing about our approach requires evolutionary learning per se; however, to
make the discussion concrete, this is the form of learning that is assumed here. In the
context of evolutionary algorithms, the FSA learning operators are perturbations, such as
mutations, applied to the FSAs.
Procedure EA
  t = 0;  /* initial generation */
  initialize population(t);
  evaluate fitness(t);
  until termination-criterion do
    t = t + 1;  /* next generation */
    select parents(t);
    perturb(t);
    evaluate fitness(t);
  enduntil
end procedure

Figure 6: The outline of an evolutionary algorithm.
We assume that learning occurs in two phases: the offline and online phases (see Fig-
ure 1). During the offline phase, each agent starts with a randomly initialized population of
candidate FSA plans. This population is evolved using the evolutionary algorithm outlined
in Figure 6. The main loop of this algorithm consists of selecting parent plans from the
population, applying perturbation operators to the parents to produce offspring, evaluat-
ing the fitness of the offspring, and then returning the offspring to the population if they
are sufficiently fit. After this evolution, verification and repair are done to these initially
generated plans.

At the start of the online phase, each agent selects one "best" (according to its "fitness
function") plan from its population for execution. The agents are then fielded and plan
execution is interleaved with learning (adaptation), reverification, and plan repair as needed.
The purpose of learning during the online phase is to fine-tune the plan and adapt it to
keep pace with a gradually shifting environment, since normally real-world environments
are not static. The evolutionary algorithm of Figure 6 is also used during this phase, but the
assumption is a population size of one and incremental learning (i.e., one learning operator
applied per FSA per generation). This is practical for situations in which the environment
changes gradually, rather than radically.
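The online phase can be sketched as follows. This is a minimal version of our own: the operator set, the fitness function, and the keep-if-no-worse acceptance rule are illustrative assumptions, not the paper's exact procedure:

```python
import random

def online_ea(plan, perturb_ops, fitness, generations=10, seed=0):
    """Online-phase sketch: population of one, a single randomly chosen
    perturbation operator applied per generation, keeping the offspring
    only if it is at least as fit as the current plan."""
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = rng.choice(perturb_ops)(plan)
        if fitness(offspring) >= fitness(plan):
            plan = offspring
    return plan

# Toy stand-in for a plan: a vector the operator nudges toward zero.
def toward_zero(p):
    return [x - 1 if x > 0 else (x + 1 if x < 0 else 0) for x in p]

best = online_ea([3, -2, 5], [toward_zero],
                 fitness=lambda p: -sum(abs(x) for x in p), generations=10)
print(best)  # [0, 0, 0]
```

In the paper's setting, `perturb_ops` would be the FSA learning operators defined next, with reverification and repair interposed after each accepted adaptation.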
Formally, a machine learning operator o : S → S′ changes a (product or individual)
FSA S to post-learning FSA S′. A mapping between two automata S and S′ is defined as a
mapping between their elements (Bavel, 1983). At the highest level, we can subdivide the
learning operators according to the elements of the FSA that they alter:

- One class of operators adds, deletes, or moves edge transition conditions. In other
words, o : M_K(S) → M_K(S′).
- Another class of operators adds, deletes, or moves edges, i.e., o : E(S) → E(S′).
- The third class of operators adds or deletes vertices, along with their edges, i.e.,
o : V(S) → V(S′) and o : E(S) → E(S′).
- The fourth class of operators changes the Boolean algebra used in the transition
conditions, i.e., o : K → K′.

Here, we do not define operators that add or delete states. In other words, we do not
address the third class of operators. The reason is that with the type of FSAs used here,
adding or deleting a state does not, in itself, affect properties. It is what we do with the
edges to/from a state and their transition conditions that can alter whether a property
is true or false for a plan. This is because properties are true or false for c-states
(atoms) rather than for FSA states. Furthermore, this paper does not address changes to
the Boolean algebra, which is the fourth class of operators. This class of operators, which
includes abstractions, is addressed in Gordon (1998).

Therefore we are focusing on the first and second classes of operators. We define operator
schemas, rather than operators. A machine learning operator schema applies to unspecified
(variable) vertices, edges, and transition conditions. When instantiated with particular
vertices, edges, and transition conditions, it becomes a machine learning operator. In order
to avoid tedium, the operator schema definitions consider only the relevant parts of the
FSA, e.g., those parts that get altered. There is an implicit assumption that all unspecified
parts of the FSA remain the same after operator application. There is also an assumption
that the learner ensures that all operators keep the automaton complete and deterministic.

The operators can be seen in the taxonomy (partition) of Figure 7. We define each of
the corresponding operator schemas as follows, beginning with the most general one, called
o_change, which changes edge transition conditions:
Operator Schema 1 (o_change) Let S be an FSA with Boolean algebra K, and let o_change :
S → S′. Then we define o_change : M_K(S) → M_K(S′). In particular, suppose z ≤ M_K(v_1, v_2),
z ≠ 0, for (v_1, v_2) ∈ E(S) and z ≰ M_K(v_1, v_3) for (v_1, v_3) ∈ E(S). Then o_change(M_K(v_1, v_2))
= M_K(v_1, v_2) ∧ ¬z (step 1) and/or o_change(M_K(v_1, v_3)) = M_K(v_1, v_3) ∨ z (step 2). In other
words, o_change may consist of two steps: the first to remove condition z from edge (v_1, v_2),
and the second to add (the same) condition z to edge (v_1, v_3). Alternatively, o_change may
consist of only one of these two steps.
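Under the set-of-atoms encoding, the two steps of o_change are a set difference (meet with ¬z) and a set union (join with z). A sketch of our own, with either step optional:

```python
def o_change(M, e_remove=None, e_add=None, z=frozenset()):
    """Sketch of the o_change schema on a condition matrix M (edge -> frozenset
    of atoms). Step 1 removes condition z from e_remove (meet with not-z);
    step 2 adds z to e_add (join with z). Either step may be omitted."""
    M = dict(M)                              # leave the input matrix intact
    if e_remove is not None:
        assert z and z <= M[e_remove]        # z != 0 and z <= M_K(v1, v2)
        M[e_remove] = M[e_remove] - z        # M_K(v1, v2) ∧ ¬z
    if e_add is not None:
        M[e_add] = M.get(e_add, frozenset()) | z   # M_K(v1, v3) ∨ z
    return M

M = {("A", "B"): frozenset({"x", "y"}), ("A", "C"): frozenset({"w"})}
M2 = o_change(M, e_remove=("A", "B"), e_add=("A", "C"), z=frozenset({"y"}))
print(sorted(M2[("A", "B")]), sorted(M2[("A", "C")]))  # ['x'] ['w', 'y']
```

Note that the sketch does not itself enforce completeness or determinism; per the schema assumptions, that is the learner's responsibility.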
Figure 7: Taxonomy (partition) of learning operators. [Figure: the root o_change has a
two-step subtree (step (1) and step (2)), which branches on access-cond vs. ¬access-cond
and on v_1 = v_3 vs. v_1 ≠ v_3 into o_move, o_delete+gen, o_spec+gen, o_stay, o_delete+add,
and o_spec+add, and a one-step subtree (step (1) or step (2) alone), which branches into
o_delete∨spec (with primitives o_delete and o_spec) and o_add∨gen (with primitives o_add
and o_gen).]
All of the remaining operators are easier to describe in terms of a set of four primitive
operators. Therefore, we next define these four primitives, which are one-step operators
that are special cases of o_change and appear at the bottom right as leaves in the hierarchy of
Figure 7. The first two primitive operators delete (o_delete) and add (o_add) edges. We define
o_delete to delete edge (v_1, v_2) with the operator schema:

Operator Schema 2 (o_delete) Let S be an FSA with Boolean algebra K, and let o_delete :
S → S′ be defined with o_delete : E(S) → E(S) \ {(v_1, v_2)} for deleted edge (v_1, v_2) of S.
Recall that a nonexistent edge has transition condition 0. Operator o_delete could therefore be
considered a special case of o_change that consists only of step (1) and an additional condition
that must be met, namely, that o_delete(M_K(v_1, v_2)) = (M_K(v_1, v_2) ∧ ¬z) = 0.
We define o_add to add edge (v_1, v_3) with the operator schema:

Operator Schema 3 (o_add) Let S be an FSA with Boolean algebra K, and let o_add : S →
S′ be defined with o_add : E(S) → E(S) ∪ {(v_1, v_3)} for added edge (v_1, v_3) of S. Operator o_add
could be considered a special case of o_change that consists only of step (2) and the additional
condition that M_K(v_1, v_3) = 0 prior to applying o_add.
The other two primitive operators are specialization (o_spec) and generalization (o_gen).
Specialization and generalization are operators commonly found in the machine learning lit-
erature, e.g., see Michalski (1983). In the context of an FSA, specialization lowers the level
of a particular state-to-state transition condition in the partial order ≤, whereas general-
ization raises it, as in Mitchell's Version Spaces (Mitchell, 1978). In particular, a transition
condition can be specialized with a meet and can be generalized with a join, which is
analogous to adding a conjunct to specialize and a disjunct to generalize as in Michalski
(1983).
Formally, we define specialization and generalization, respectively, as follows:

Operator Schema 4 (o_spec) Let S be an FSA with Boolean algebra K, and let o_spec : S →
S′. Then we can define o_spec : M_K(S) → M_K(S′), where o_spec(M_K(v_1, v_2)) = M_K(v_1, v_2) ∧
¬z, for some z ∈ K, z ≠ 0. Operator o_spec could be considered a special case of o_change that
consists only of step (1) and the additional two conditions o_spec(M_K(v_1, v_2)) = (M_K(v_1, v_2)
∧ ¬z) ≠ 0 (i.e., o_spec ≠ o_delete), and M_K(v_1, v_2) ≠ ¬z (since otherwise o_spec has no effect).

Operator Schema 5 (o_gen) Let S be an FSA with Boolean algebra K, and let o_gen : S →
S′. Then we can define o_gen : M_K(S) → M_K(S′), where o_gen(M_K(v_1, v_3)) = M_K(v_1, v_3) ∨ z,
for some z ∈ K, z ≠ 0. Operator o_gen could be considered a special case of o_change that
consists only of step (2) and the two additional conditions that M_K(v_1, v_3) ≠ 0 (i.e., o_gen
≠ o_add) and (M_K(v_1, v_3) ∧ z) = 0 (because otherwise z adds redundancy) prior to o_gen.
Next, 10 learning operators are defined from these four primitives. Below o_change in the operator hierarchy of Figure 7 are two subtrees. The right subtree consists of one-step operators, and the left subtree consists of two-step operators. We define the two one-step operators just below o_change first (since we just defined the primitive operators below them):
Operator Schema 6 (o_delete∨spec) This operator consists of applying either of the primitive operators o_delete or o_spec.
Operator Schema 7 (o_add∨gen) This operator consists of applying either of the primitive operators o_add or o_gen.
It is relevant at this point to introduce two more operators that are not in the hierarchy of Figure 7. They are not in the hierarchy because they are merely minor variants of o_delete∨spec and o_add∨gen and they do not belong strictly below our most general operator o_change. These operators are introduced here because they are very useful and also because they are guaranteed to preserve completeness of FSAs. In other words, if the FSA is complete prior to applying these operators then it will be complete after applying them. Recall from Section 2 that each FSA state is associated with a set of allowable actions that may be taken from that state. These operators delete or add an action from the set of allowable actions from a state:
Operator Schema 8 (o_delete-action) Delete an allowable action from a state v1 by one or more applications of operator o_delete∨spec. Each application may be to a different outgoing edge from v1.
Operator Schema 9 (o_add-action) Add an allowable action from a state v1 by one or more applications of operator o_add∨gen. Each application may be to a different outgoing edge from v1.
To understand why o_delete-action consists of one or more applications of o_delete∨spec, consider the following example. In Figure 2, deleting F-collect as an allowable action from F's COLLECTING state results in F-deliver being the only allowable action from that state. Furthermore, this results in the edge (COLLECTING, COLLECTING) being deleted and the edge (COLLECTING, DELIVERING) being specialized. The reasoning is similar for why o_add-action is one or more applications of o_add∨gen.
The remaining operators, which are all of the operators on the left subtree of o_change in Figure 7, consist of two steps: the first to remove condition z from edge (v1, v2), and the second to add (the same) condition z to edge (v1, v3). The first step consists of applying one primitive operator, and the second step consists of applying another primitive operator. Every one of the following operators preserves determinism and completeness of the FSAs. In other words, if the FSA is deterministic and complete prior to operator application then it will be deterministic and complete afterwards.

[Figure 8 (diagram omitted in this extraction): an FSA in which edge (STATE1, STATE2) has transition condition a ∨ b, edge (STATE1, STATE3) has condition c, and edge (STATE3, STATE4) has condition a.]

Figure 8: Moving transition conditions between edges.
Operator Schema 10 (o_move) This operator schema is identical to that of o_change, with one exception. Replace "and/or" with "and" in the definition. In other words, we have o_move(M_K(v1, v2)) = M_K(v1, v2) ∧ ¬z and o_move(M_K(v1, v3)) = M_K(v1, v3) ∨ z for some (v1, v2), (v1, v3) ∈ E(S). Therefore o_move moves z from one edge to another.
All of the remaining operators are special cases of o_move. We begin with the right subtree of o_move:
Operator Schema 11 (o_delete+add) Apply o_delete to edge (v1, v2) and then apply o_add to edge (v1, v3).
An example of o_delete+add, using Figure 8, is to delete edge (STATE1, STATE3) (i.e., make M_K(STATE1, STATE3) = 0) and add a new edge (STATE1, STATE1) with transition condition M_K(STATE1, STATE1) = c.
Operator Schema 12 (o_spec+add) Apply o_spec to edge (v1, v2) and then apply o_add to edge (v1, v3).
For example, using Figure 8, we can move "b" from edge (STATE1, STATE2) to a newly created edge (STATE1, STATE1) to make M_K(STATE1, STATE2) = a ∧ ¬b and M_K(STATE1, STATE1) = b. This is specialization of the condition on edge (STATE1, STATE2) followed by addition of edge (STATE1, STATE1).
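This example can be replayed concretely if transition conditions are modeled as sets of atoms (a sketch, not the paper's implementation: specialization with ¬b is set difference, and the added edge simply receives the removed atom).

```python
# Figure 8's relevant edges, with conditions encoded as atom sets.
edges = {("STATE1", "STATE2"): {"a", "b"},   # a or b
         ("STATE1", "STATE3"): {"c"}}

# Step 1 (o_spec): M_K(STATE1, STATE2) := (a or b) and-not b = a
edges[("STATE1", "STATE2")] -= {"b"}
# Step 2 (o_add): create edge (STATE1, STATE1) with condition b
edges[("STATE1", "STATE1")] = {"b"}

assert edges[("STATE1", "STATE2")] == {"a"}
assert edges[("STATE1", "STATE1")] == {"b"}
```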
Next consider the left subtree of o_move. At this point, it is relevant to examine the reason for the split into the two subtrees of o_move. All of the operators in the left subtree satisfy a condition that is called the "accessibility condition." This condition states that prior to learning (and also after learning), if vertex v1 is accessible from some initial state then vertex v3 is guaranteed to also be accessible from that initial state. The reason for this partition will become clear in Subsection 4.2, where we show that a theorem holds for the two-step operators if and only if the accessibility condition is true. The reason that the two operators in the right subtree of o_move fail to satisfy the accessibility condition is due to their having o_add as their second step. The definition of o_add states that M_K(v1, v3) = 0 prior to operator application, and therefore we have no guarantee of v3's accessibility, given that v1 is accessible from an initial state. The following are the definitions of the operators for which the accessibility condition is true:
Operator Schema 13 (o_delete+gen) Apply o_delete to edge (v1, v2) and then apply o_gen to edge (v1, v3).
As an example, in Figure 8, we can move the condition "a ∨ b" from edge (STATE1, STATE2) to edge (STATE1, STATE3) to make M_K(STATE1, STATE2) = 0 and to make M_K(STATE1, STATE3) = c ∨ a ∨ b. This is deletion of edge (STATE1, STATE2) followed by generalization of the transition condition on edge (STATE1, STATE3).
Operator Schema 14 (o_spec+gen) Apply o_spec to edge (v1, v2) and then apply o_gen to edge (v1, v3).
As an example, in Figure 8, we can move the disjunct "b" from edge (STATE1, STATE2) to edge (STATE1, STATE3) to make M_K(STATE1, STATE2) = a ∧ ¬b and M_K(STATE1, STATE3) = c ∨ b. This is a specialization of the transition condition on edge (STATE1, STATE2) followed by a generalization of the transition condition on edge (STATE1, STATE3).
Operator Schema 15 (o_stay) The definition is the same as that of o_move, with one exception. Replace vertex v3 with vertex v1 everywhere. In other words, the operator consists of moving a condition from edge (v1, v2) to edge (v1, v1).
Note that each operator instantiation of the schema for o_stay will be a special case of one of the following: o_delete+add, o_spec+add, o_delete+gen, or o_spec+gen. It is considered o_stay if and only if on the second step of the operator the transition condition is moved to edge (v1, v1). For example, using Figure 8, when we applied operator o_spec+add (in the example above) to move the disjunct "b" from edge (STATE1, STATE2) to edge (STATE1, STATE1) to make M_K(STATE1, STATE2) = a ∧ ¬b and M_K(STATE1, STATE1) = b, this could be considered an instantiation of o_stay, as well as o_spec+add. Likewise when we applied o_delete+add to delete edge (STATE1, STATE3) and add edge (STATE1, STATE1) with "c" as the transition condition, this could also be considered an instantiation of o_stay.
Operator o_stay is an especially useful operator. It makes the reasonable assumption that when an agent no longer wants to transition to another state (e.g., an edge is deleted), the agent just stays in its current state. In other words, the condition for transitioning to another state is transferred to the edge leading back to the current state. For example, suppose rover I becomes stuck at the lander and cannot rendezvous with F for an indeterminate period of time. It could generate a temporary plan (see Figure 2) that keeps I in its DELIVERING state by deleting edge (DELIVERING, RECEIVING) and making M_K(DELIVERING, DELIVERING) = 1 (and DELIVERING would have to become an initial state).
Recall that accessibility is a key issue for verification. Now that we have a set of operator schemas, let us consider how these operators affect accessibility from initial states. Clarifying this will be relevant for understanding both the a priori proofs about property preservation, and the motivation for the incremental reverification algorithms. There are two fundamental ways that our learning operators may affect accessibility: locally (abbreviated "L"), i.e., by directly altering the accessibility of atoms or states, or globally (abbreviated "G"), i.e., by altering accessibility of states or atoms that could be visited after the part of the FSA modified by the learning operator. In particular, any change to the accessibility of v1, v2, v3 or atoms in M_K(v1, v2) or M_K(v1, v3), referenced in the operator definition, is considered local. Changes to accessibility of any other states or atoms are considered global.
As an example of an L (local) change to accessibility, using Figure 8, suppose the agent discovers a new action "d" that it can take. It adds "d" to its action repertoire, as well as to the set of allowable actions from one of the states in its FSA. In particular, the agent decides to allow "d" from STATE1 and decides to apply o_gen to the transition condition for (STATE1, STATE3) to get condition "c ∨ d." Then atom "d" was not previously accessible from any initial state, but if we assume STATE1 is accessible from an initial state then the application of o_gen made the atom "d" accessible. Using Figure 8 to illustrate a G (global) change to accessibility, suppose we delete edge (STATE1, STATE3) in that figure. Then STATE4, which was previously accessible (because we assume STATE1 is accessible) is no longer accessible. On the other hand, the fact that STATE3 is no longer accessible is a local change.
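The global change just described can be checked mechanically with an ordinary reachability computation. Below is a sketch (assuming the set-of-atoms encoding of transition conditions, with the Figure 8 edges as given in the text):

```python
from collections import deque

def accessible(initial, edges):
    """Return the set of states reachable from the initial states
    along edges whose transition conditions are nonzero (nonempty)."""
    seen, frontier = set(initial), deque(initial)
    while frontier:
        v = frontier.popleft()
        for (src, dst), cond in edges.items():
            if src == v and cond and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

edges = {("STATE1", "STATE2"): {"a", "b"},
         ("STATE1", "STATE3"): {"c"},
         ("STATE3", "STATE4"): {"a"}}
assert "STATE4" in accessible({"STATE1"}, edges)
del edges[("STATE1", "STATE3")]   # o_delete: a local change for STATE3,
assert "STATE4" not in accessible({"STATE1"}, edges)  # a global one for STATE4
```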
Now we are ready to summarize what the learning operators can do to accessibility. First, we introduce one more notational convenience. The symbols ↑ and ↓ denote "can increase" and "can decrease," respectively, and ¬↑ and ¬↓ denote "cannot increase" and "cannot decrease," respectively. We use these symbols with G and L, e.g., ↑G means that a learning operator can (but does not necessarily) increase global accessibility, and ¬↓L means that an operator cannot decrease local accessibility.
The results for the primitive operators are intuitively obvious:

- o_delete: ↓G ↓L ¬↑G ¬↑L
- o_spec: ¬↓G ↓L ¬↑G ¬↑L
- o_add: ¬↓G ¬↓L ↑G ↑L
- o_gen: ¬↓G ¬↓L ¬↑G ↑L
The primitive operators provide answers about changes in accessibility for all of the one-step operators. For the two-step operators (i.e., o_move and all operators below it in the hierarchy of Figure 7), we need to consider the net effect. For the results in this paper, we only need to focus on one distinction: the difference in the net effect for those operators that satisfy the accessibility condition (i.e., the left subtree of o_move) versus the net effect for those operators that do not satisfy this condition (i.e., the right subtree). The net effect of those operators that satisfy the accessibility condition is that accessibility (global and local) will never be increased, i.e., ¬↑G and ¬↑L. The reason is as follows. By looking at the results for the primitive operators, it is apparent that the first step in these two-step operators can never increase accessibility, because the first step is always o_delete or o_spec. Therefore, to understand the intuition behind this result we need to examine the second step. Consider o_delete+gen and o_spec+gen. Note that o_gen does not increase global accessibility (¬↑G), but it can increase local accessibility (↑L). Is ↑L a net effect due to the generalization step? Because atoms are being transferred from one outgoing edge of some vertex v1 to another outgoing edge of v1 with these two operators, by definition the local accessibility of those atoms from an initial state will not be increased as a net effect. In other words, the atoms are accessible from an initial state if and only if v1 is, and these two learning operators do not increase the accessibility of v1. Furthermore, by definition M_K(v1, v3) ≠ 0 prior to learning, so the accessibility of v3 is not increased. We conclude that ¬↑L is a net effect.
A similar line of reasoning explains why operator o_stay will not increase local accessibility. Operator o_stay cannot increase global accessibility, even if it adds an edge, because the only edge that this operator could add is (v1, v1). In conclusion, all three operators that satisfy the accessibility condition have a net effect of not increasing accessibility (¬↑G and ¬↑L). On the other hand, because operators o_delete+add and o_spec+add have o_add as their second step, they can increase accessibility.
Results from lower in the hierarchy of Figure 7 are inherited up the tree. For example, because o_delete+add can increase global accessibility, o_move can as well. The following is a summary of the relevant results we have so far about how the two-step learning operators can change accessibility. To avoid overwhelming the reader, we present only those results necessary for understanding this paper.
- o_stay, o_delete+gen, o_spec+gen: ¬↑G ¬↑L
- o_delete+add, o_spec+add, o_move, o_change: ↑G
Before concluding this section, we briefly consider a different partition of the learning operators than that reflected in the taxonomy of Figure 7. This different partition is necessary for understanding the a priori proofs about the preservation of Response properties (in Section 4). For this partition, we wish to distinguish those operators that can introduce at least one new string with an infinitely repeating substring (e.g., (a,b,c,d,e,d,e,d,e,...) where the ellipsis represents infinite repetition of d followed by e) into the FSA language versus those that cannot. Any operator that can add atoms to the transition condition for an edge in a cycle, add an edge to an existing cycle, or add an edge to create a new cycle belongs to the first class (the class that can add such substrings). Thus this first class includes our operators that can create new cycles (e.g., o_stay because it can add a new edge (v1, v1)), as well as our operators that can generalize the transition condition along some edge of a cycle (e.g., o_delete+gen because it can generalize M_K(v1, v1)). The operators are divided between these two classes as follows:
these two classes as follows:
1. o
add
, o
gen
, o
add_gen
, o
add�action
, o
stay
, o
delete+gen
, o
spec+gen
, o
delete+add
, o
spec+add
, o
move
,
o
change
2. o
delete
, o
spec
, o
delete_spec
, o
delete�action
It is important to note that all of the two-step operators are in the �rst class.
At this point we have defined a set of useful operators (via their operator schemas) that one could apply to an FSA plan for adaptation. With these operators, it is possible to improve the effectiveness of a plan, and to adapt it to handle previously unforeseen external and internal conditions. To ensure the usefulness of these learning operators, the learner needs to check that it has not generated a useless plan (i.e., L(S) ≠ ∅). Although not addressed in this paper, we are currently developing efficient methods for making this check using the knowledge of the learning that was done.
The particular choice of learning operators presented here was motivated by four factors. First, these operators translate into easy-to-implement perturbations of entries in a table, which is the representation of FSAs used in our implementation (see Section 6). Second, these operators were inspired by the literature. For example, generalization and specialization operators are considered fundamental for inductive inference (Michalski, 1983), and deleting/adding FSA edges are effective for evolving FSAs (Fogel, 1996). Third, these operators made practical sense in the context of applications that were considered. Fourth, the particular taxonomies presented here facilitate powerful theoretical and empirical results for reducing the time complexity of reverification, as shown in the remainder of this paper.
4. A Priori Results about the Safety of Machine Learning Operators
Subsection 3.4 defined several useful learning operator schemas to modify automaton edges (o : E(S) → E(S')) and the transition conditions along edges (o : M_K(S) → M_K(S')). The results in this section establish which of these operator schemas o are a priori guaranteed to preserve two property classes of interest (Invariance and Response). This section assumes that all learning operators are applied to a single FSA plan, i.e., SIT_1agent or SIT_1plan. Section 5 addresses the translation of the operators applied to a single plan into their effect on a product plan (for SIT_multplans), and how this affects the results. We begin by formally defining what we mean by "safe machine learning operator."
4.1 "Safe" Online Machine Learning
Our objective is to lower the time complexity of reverification. The ideal solution is to identify safe machine learning methods (SMLs), which are machine learning operators that are a priori guaranteed to preserve properties (also called "correctness preserving mappings") and require no run-time reverification. For a plan S and property P, suppose verification has succeeded prior to learning, i.e., ∀x, x ∈ L(S) implies x ⊨ P (i.e., S ⊨ P). Then a machine learning operator o(S) is an SML if and only if verification is guaranteed to succeed after learning. In other words, if S' = o(S), then S ⊨ P implies S' ⊨ P.
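For the common special case of an Invariance property □¬p over a single atom, the pre-learning check S ⊨ P amounts to verifying that p labels no edge reachable from an initial state. A minimal sketch (using set-valued transition conditions; this is not the paper's verification algorithm, which handles properties expressible as FSAs in general):

```python
def satisfies_invariance(initial, edges, p):
    """Sketch of S |= always-not-p: atom p must not label any edge
    whose source is reachable from an initial state."""
    seen, stack = set(initial), list(initial)
    while stack:
        v = stack.pop()
        for (src, dst), cond in edges.items():
            if src == v and cond:
                if p in cond:
                    return False   # an accessible c-state can satisfy p
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
    return True

edges = {("s0", "s1"): {"q"}, ("s1", "s1"): {"r"}}
assert satisfies_invariance({"s0"}, edges, "p")       # S |= always-not-p
assert not satisfies_invariance({"s0"}, edges, "r")   # r is accessible
```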
Subsection 4.2 provides results about the a priori safety of machine learning operators. Some of the results in Subsection 4.2 are negative. Nevertheless, although we do not have an a priori guarantee for these learning operators, Section 6 shows that we can perform reverification more efficiently than total reverification from scratch.
4.2 Theoretical Results
Let us begin by considering the primitive operators. The results for all primitive operators are corollaries of two fundamental theorems, Theorems 1 and 2, which may not be immediately intuitive. For example, it seems reasonable to suspect that if an edge is deleted somewhere along the path from a trigger to a response, then this could cause failure of a Response property to hold because the response is no longer accessible. In fact, this is not true. What actually happens is that deletions reduce the number of strings in the language. If the original language satisfies the property, then so does the smaller language. Theorem 1 formalizes this.
Theorem 1 Let S' be an FSA with Boolean algebra K. Let S be identical to S', but with additional edges, i.e., o : S' → S is defined as o : E(S') → E(S), where E(S') ⊆ E(S). Then L(S') ⊆ L(S).

Proof. The language may be enlarged by the addition of new edges that have newly learned transition conditions. On the other hand, because every accepting run remains an accepting run regardless of new edges, x ∈ L(S') implies x ∈ L(S), and we are never reducing the size of the language. Therefore, L(S') ⊆ L(S). ∎
The results about the machine learning operator schemas o_delete and o_add follow as corollaries:
Corollary 1 o_delete is an SML with respect to any property P.

Proof. Assume S ⊨ P. Then ∀x, x ∈ L(S) implies x ⊨ P. Define o_delete(S) = S'. By Theorem 1, L(S') ⊆ L(S). Therefore, ∀x, x ∈ L(S') implies x ⊨ P. We conclude that S' ⊨ P, i.e., o_delete(S) ⊨ P. ∎
To be consistent with Theorem 1, in Corollary 2 only (but not in the rest of the paper), we use S' for the pre-o_add FSA and S for the post-o_add FSA, i.e., o_add(S') = S.
Corollary 2 o_add is not necessarily an SML for any property, including Invariance and Response properties.

Proof. Assume S' ⊨ P. Then ∀x, x ∈ L(S') implies x ⊨ P. By Theorem 1, L(S') ⊆ L(S). Then we cannot be certain that S ⊨ P, i.e., that o_add(S') ⊨ P. For instance, a counterexample for Invariance property □¬p occurs if we add an accessible edge with transition condition p. ∎
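Corollary 2's counterexample is easy to exhibit concretely. The sketch below uses set-valued conditions and a hypothetical two-state plan (the state names and atoms are ours, for illustration only):

```python
def holds_always_not_p(initial, edges, p):
    """True iff no edge reachable from an initial state mentions atom p."""
    seen, stack = set(initial), list(initial)
    while stack:
        v = stack.pop()
        for (src, dst), cond in edges.items():
            if src == v and cond:
                if p in cond:
                    return False
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
    return True

edges = {("s0", "s1"): {"q"}}
assert holds_always_not_p({"s0"}, edges, "p")      # S' |= always-not-p before learning
edges[("s1", "s1")] = {"p"}                        # o_add: accessible edge labeled p
assert not holds_always_not_p({"s0"}, edges, "p")  # the Invariance property now fails
```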
Now we consider a priori results for o_spec and o_gen. Again, we begin with a relevant theorem for operator schema o.
Theorem 2 Let S' be an FSA with Boolean algebra K, and let o : S' → S be defined as o : M_K(S') → M_K(S) where ∃z ∈ K, z ≠ 0, (v1, v3) ∈ E(S'), such that o(M_K(v1, v3)) = M_K(v1, v3) ∨ z. Then L(S') ⊆ L(S).

Proof. Similar to the proof of Theorem 1. ∎
Corollary 3 o_spec is an SML for any property.

Proof. Similar to the proof of Corollary 1 of Theorem 1. ∎
Corollary 4 o_gen is not necessarily an SML for any property, including Invariance and Response properties.

Proof. Similar to the proof of Corollary 2 of Theorem 1. ∎
We can draw the following conclusions from the theorems and corollaries just presented:

- Of the one-step learning operators, those that are guaranteed to be SMLs for any property are o_delete, o_spec, and o_delete∨spec (which implies that o_delete-action is also an SML for any property).

- We need never be concerned with the first step in a two-step operator. It is guaranteed to be an SML (because o_delete or o_spec is always the first step).
Next consider theorems that are needed to address the two-step operators. Although we found results for the one-step operators that were general enough to address any property, we were unable to do likewise for the two-step operators. Our results for the two-step operators determine whether these operators are necessarily SMLs for Invariance or Response properties in particular. Future work will consider other property classes. The theorems are quite intuitive. The first theorem distinguishes those learning operators that will satisfy Invariance properties from those that will not:

Theorem 3 A machine learning operator is guaranteed to be an SML with respect to any Invariance property P if and only if ¬↑G and ¬↑L are both true (which, for our two-step operators, implies that the operator satisfies the accessibility condition).
Proof. Suppose ¬↑G and ¬↑L are both true. Let Invariance property P = □¬p. Assume P is true of FSA S prior to learning. Then for every string y ∈ L(S), it must be the case that ¬p is true in every c-state of y. If accessibility of atoms is not increased (i.e., ¬↑G and ¬↑L), then it must be the case that every c-state of every string x ∈ L(S'), where S' = o(S), is also a c-state of some string in L(S). Therefore, for every string x ∈ L(S'), it must be the case that ¬p is true in every c-state of x. In other words, moving transition conditions around in an FSA without increasing accessibility will not alter the truth of an Invariance property, which holds in every c-state of every string in the language of the FSA.

Suppose ↑G or ↑L. Increasing accessibility of atoms implies the possibility of introducing a c-state in some string x ∈ L(S'), where S' = o(S), that was not in any string of L(S). This can cause violation of an Invariance property, as in the counterexample in the proof of Corollary 2. Knowing that ¬p is true in every c-state of every string of L(S) provides no guarantee that ¬p is true in every c-state of every string of L(S'). ∎
Since we already have results to cover the one-step operators, we need only consider the
two-step operators.
Corollary 5 The machine learning operator schemas o_delete+gen, o_spec+gen, and o_stay are guaranteed to be SMLs with respect to any Invariance property P because for all of these operators ¬↑G and ¬↑L.
Corollary 6 The machine learning operator schemas o_delete+add, o_spec+add, o_move, and o_change are not necessarily SMLs with respect to any Invariance property P because for all of these operators ↑G.
[Figure 9 (diagram omitted in this extraction): the automata S1 (left) and S1' (right). In S1, edge (STATE1, STATE2) has condition b, edge (STATE2, STATE1) has condition c, edge (STATE2, STATE3) has condition a, edge (STATE3, STATE2) has condition d, and there is a self-loop on STATE2 with condition e. In S1', edge (STATE2, STATE3) has been deleted and the self-loop condition generalized to e ∨ a.]

Figure 9: The automata S1 (left) and S1' (right).
The next theorem characterizes those learning operators that cannot be guaranteed to be SMLs with respect to Response properties.

Theorem 4 Any machine learning operator schema that can introduce a new string with an infinitely repeating substring into the FSA language cannot be guaranteed to be an SML for Response properties.

Proof. Assume FSA S satisfies a Response property prior to learning. Therefore every string accepted by S satisfies the property. For each accepted string, every instance (or the first instance if it is a First-Response property) of the trigger is eventually followed by a response. Suppose the machine learning operator introduces a new string with an infinitely repeating substring into the language. Then it is possible that the prefix of this string before the infinitely repeating substring includes a trigger and no response, and the infinitely repeating substring does not include a response. ∎
Since we already have results to cover the one-step operators, we need only consider the
two-step operators.
Corollary 7 All of the two-step learning operators cannot be guaranteed to be SMLs with respect to Response properties because they are in the first class in the partition related to this theorem, i.e., they may introduce strings with infinitely repeating substrings.
Consider a couple of illustrative examples of Theorem 4 and its corollary, using Figure 9. Prior to learning (the FSA on the left of Figure 9), ∀x, where x ∈ L(S1), x ⊨ P3, for Response property P3 = □(a → ◊d). Assume operator o_stay : S1 → S1' deletes edge (STATE2, STATE3) and generalizes the transition condition on edge (STATE2, STATE2) to "e ∨ a" (see Figure 9 on the right). Then the string consisting of b followed by infinitely many a's (b,a,a,a,...) ∈ L(S1') but ⊭ P3. This helps us to see why o_stay is not necessarily an SML for Response properties. The same example illustrates why o_delete+gen cannot be guaranteed to be an SML for Response properties. For o_spec+gen, suppose the condition for (STATE2, STATE3) is "f ∨ a" in S1, and "f ∧ ¬a" in S1' but everything else is the same as in Figure 9. Again, we can see the problem for Response properties.
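The Figure 9 counterexample can be replayed concretely. The check below is only a sketch sufficient for this small example: it looks for a reachable self-loop that can fire the trigger a without ever offering the response d (the paper's Response reverification is more general than a self-loop check; state names ST1..ST3 stand in for STATE1..STATE3).

```python
def accessible(initial, edges):
    """States reachable from the initial states along nonempty-condition edges."""
    seen, stack = set(initial), list(initial)
    while stack:
        v = stack.pop()
        for (src, dst), cond in edges.items():
            if src == v and cond and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def bad_self_loop(initial, edges, trigger, response):
    """A reachable self-loop containing the trigger but not the response
    yields a string violating always(trigger -> eventually response)."""
    reach = accessible(initial, edges)
    return any(src == dst and src in reach and
               trigger in cond and response not in cond
               for (src, dst), cond in edges.items())

s1 = {("ST1", "ST2"): {"b"}, ("ST2", "ST1"): {"c"},
      ("ST2", "ST3"): {"a"}, ("ST3", "ST2"): {"d"}, ("ST2", "ST2"): {"e"}}
assert not bad_self_loop({"ST1"}, s1, "a", "d")
# o_stay: delete (ST2, ST3) and generalize the self-loop to e or a
s1_prime = dict(s1)
del s1_prime[("ST2", "ST3")]
s1_prime[("ST2", "ST2")] = {"e", "a"}
assert bad_self_loop({"ST1"}, s1_prime, "a", "d")   # (b,a,a,a,...) never sees d
```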
We conclude by summarizing the positive a priori results:

- o_delete, o_spec, o_delete∨spec and o_delete-action are SMLs for any property (expressible in temporal logic).

- o_delete+gen, o_spec+gen and o_stay are SMLs for Invariance properties.
and the negative a priori results:

- o_add, o_gen, o_add∨gen, o_add-action, o_spec+add, o_delete+add, o_move and o_change are not necessarily SMLs for Invariance or Response properties.

- o_delete+gen, o_spec+gen and o_stay are not necessarily SMLs for Response properties.
The fact that all three learning operators that satisfy the accessibility condition are guaranteed to be SMLs for Invariance properties is significant, because Invariance properties are extremely useful and common for verifying systems and many important applications need only test properties of this class (Heitmeyer et al., 1998).
Finally, from Theorems 1 and 2 we learned that the heart of the problem for all of the negative results is either an o_gen step or an o_add step. Later in this paper we address these troublesome steps by finding more efficient methods for dealing with them than total reverification from scratch. However, first, in the next section, we consider how our a priori results are translated from a single to a product FSA for SIT_multplans.
5. Translating Learning Operators to a Product Automaton
In this section we address SIT_multplans, where each agent maintains and uses its own individual FSA, but for verification the product FSA needs to be formed and verified. For SIT_multplans, a learning operator is applied to an individual agent FSA and then the product is formed. Therefore, it is necessary to consider the translation of each learning operator from individual to product FSA, and how that affects the a priori SML results presented above.
For operators o_spec+gen, o_delete+gen, o_spec+add, and o_delete+add, we consider only the translations of the primitive operators. This is because the translations of these operators are simply translations of their primitive components. The remaining translations are:
- o_spec translates to o_spec and/or o_delete.
- o_delete translates to o_spec and/or o_delete.
- o_gen translates to o_gen and/or o_add.
- o_add translates to o_gen and/or o_add.
- o_stay translates to o_stay and/or o_move.
- o_move translates to o_move.
- o_change translates to o_change.
It may not be intuitive to the reader how o_gen can translate to o_add. To illustrate, we use Figure 10, where the transition conditions, such as (a ∨ c), denote sets of multiagent actions. Suppose o_gen is applied to edge (1, 2) in the leftmost FSA so that the transition condition is now (d ∨ b). Then a new edge (11', 21') is added to the product FSA (rightmost in Figure 10) with the transition condition b. Recall that to form the product FSA we take the Cartesian product of the vertices and the intersection of the transition conditions.

[Figure 10 (diagram omitted in this extraction): a two-state FSA with states 1 and 2 (self-loop a ∨ c on state 1, edge (1, 2) with condition d, edge (2, 1) with condition e, self-loop c on state 2); a one-state FSA with state 1' and self-loop b ∨ c; and their product, with self-loops c on states 11' and 21'.]

Figure 10: Generalization can become addition in product.
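The product construction and the o_gen-to-o_add translation can be sketched directly (set-valued conditions; an edge whose intersected condition is empty is simply absent from the product, and the state name "1p" stands in for 1'):

```python
def product_fsa(edges1, edges2):
    """Product FSA: Cartesian product of the vertices, intersection of the
    transition conditions; an empty intersection means no product edge."""
    out = {}
    for (a, b), c1 in edges1.items():
        for (x, y), c2 in edges2.items():
            cond = frozenset(c1) & frozenset(c2)
            if cond:
                out[((a, x), (b, y))] = cond
    return out

# Figure 10, partially encoded: edge (1, 2) of the left FSA and the
# self-loop of the right FSA are all we need for the example.
left = {(1, 1): {"a", "c"}, (1, 2): {"d"}}
right = {("1p", "1p"): {"b", "c"}}
assert ((1, "1p"), (2, "1p")) not in product_fsa(left, right)  # M_K(11', 21') = 0
left[(1, 2)] = {"d", "b"}                                      # o_gen: d becomes d or b
assert product_fsa(left, right)[((1, "1p"), (2, "1p"))] == {"b"}  # o_add in product
```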
Likewise, o_spec translates to either o_spec or o_delete in the product FSA.
To illustrate why o_stay can become o_move in the product, we use Figure 3. Suppose we delete the edge (TRANSMITTING, RECEIVING) and move the transition condition to edge (TRANSMITTING, TRANSMITTING). Then the global state (DELIVERING, DELIVERING, TRANSMITTING) becomes accessible from initial state (COLLECTING, RECEIVING, TRANSMITTING) by taking multiagent action (F-deliver ∧ I-receive ∧ L-transmit). Previously, that multiagent action forced the product FSA to go to (DELIVERING, DELIVERING, RECEIVING).
What implications do these translations have for the safety of the learning operators for the product FSA? The positive a priori results for o_delete+gen, o_spec+gen, and o_stay for preserving Invariance properties become negative for the product. This is because o_gen may become o_add and o_stay may become o_move. On the other hand, the positive a priori results for o_delete, o_spec, o_delete∨spec and o_delete-action preserving all properties remain positive for the product. For o_delete, o_spec, o_delete∨spec, and o_delete-action, this implies that the product FSA never needs to be formed, reverification does not have to be done, and thus there is no run-time cost, even for multiple agents learning autonomously. As mentioned above, the troublesome parts of all operators are due to their o_gen or o_add component. In the next section we develop methods for reducing the complexity of reverification over total reverification from scratch when these operators have been applied.
6. Incremental Reverification
Recall that operators o_spec and o_delete cannot cause problems with the safety of learning, whereas o_gen and o_add are risky (i.e., are not a priori guaranteed to be SMLs). Furthermore, o_gen and o_add can cause problems when they are the second step in a two-step operator. Fortunately, we have developed incremental reverification algorithms for these operators that can significantly decrease the time complexity over total reverification from scratch.
Recall that there are two ways that operators can alter accessibility: globally (G) or locally (L). Furthermore, recall that o_add can increase accessibility either way (↑G ↑L), whereas o_gen can only increase accessibility locally (¬↑G ↑L). We say that o_gen has only a "localized" effect on accessibility, whereas the effects of o_add may ripple through many parts of the FSA. The implication is that we can have very efficient incremental methods for reverification tailored for o_gen, whereas we cannot do likewise for o_add. In other words, a more localized effect on accessibility implies that it is easier to localize reverification to gain speed. This is also true for both two-step operators that have o_gen as their second step, i.e., o_delete+gen and o_spec+gen are amenable to incremental (localized) reverification. Because no advantage is gained by considering o_add per se, we develop incremental reverification algorithms for the most general operator o_change. These algorithms apply to o_add and all other special cases of o_change.
We have developed two types of incremental reverification algorithms: those that follow the application of o_gen, and those that follow the application of o_change. For each of our learning operators, one or more of these algorithms is applicable. Before presenting the incremental algorithms, Subsection 6.1 presents two algorithms for total reverification from scratch, namely, one for Invariance properties and the other for all properties expressible as FSAs, as well as an algorithm for taking the tensor product of the FSAs. These algorithms apply to SIT_1agent, SIT_1plan, or SIT_multplans. Subsection 6.2 gives incremental versions of all the algorithms in Subsection 6.1. These algorithms are applicable when the learning operator is o_change or any of its special cases. Furthermore, they apply to any of SIT_1agent, SIT_1plan, or SIT_multplans. Subsection 6.3 has incremental algorithms for SIT_1agent and SIT_1plan, learning operator o_gen, and Invariance and full Response properties in particular. The section concludes with theoretical and empirical results comparing the time complexity of the incremental algorithms with the time complexity of the corresponding total version (as well as with each other).
The goal in developing all of the incremental reverification algorithms is maximal efficiency. These algorithms make the assumption that S ⊨ P prior to learning, which means that any errors found on previous verification(s) have already been fixed. Then learning occurs (o(S) = S'), followed by the incremental reverification algorithm (see Figure 1). Next let us consider the soundness and completeness of the algorithms, where we assume normal termination. All of the incremental reverification algorithms presented here are sound (i.e., whenever they conclude that reverification succeeds, it is in fact true that S' ⊨ P) for "downstream" properties and "directionless" properties for which the negation is expressible as a Büchi FSA. Downstream properties (which include Response) check sequences of events in temporal order, e.g., whether every p is followed by a q. In contrast, "upstream" properties check for events in reverse temporal order, e.g., whether every q is preceded by a p.¹⁰ Directionless properties, such as Invariance, impose no order for checking. Some of the incremental algorithms are also complete, i.e., whenever they conclude that reverification fails, it is in fact true that S' ⊭ P. (The reader should avoid confusing "complete algorithm" with "complete FSA.")
When reverification fails, it does so because of one or more errors, where an "error" implies there is a property violation (S' ⊭ P). There are two ways to resolve such errors. Either return to the prelearning FSA(s), choose another learning operator, and reverify again, or keep the results of learning but repair the FSA(s) in some other way to fix the error. With one exception, the complete algorithms in this section find all true errors introduced by learning. The algorithms that are not complete may also find false errors. Any algorithm that finds all and only true errors can resolve these errors in either of the two ways. An algorithm that does not find all errors or finds false ones requires more restricted error resolution. In particular, it can only be used with the first method for resolving errors, which consists of choosing another learning operator. The algorithms that are sound but not complete (can find false errors) are overly cautious. In other words, they may recommend avoiding a learning operator when in fact the operator may be safe to apply.

Before presenting the incremental algorithms, we first present algorithms for total reverification from scratch. These algorithms do not assume that learning has occurred, and they apply to all situations. They are more general (not tailored for learning), but less efficient, than our incremental algorithms.

10. William Spears (personal communication) identified the upstream/downstream distinction as being relevant to the applicability of the incremental algorithms described here.

       crt crr crp cdt cdr cdp drt drr drp ddt ddr ddp
   T    R   0   P   T   0   T   R   0   P   T   0   T
   R    0   T   0   0   R   0   0   T   0   0   R   0
   P    0   0   R   0   0   T   0   0   R   0   0   T

Table 1: The transition function for agent L's FSA plan. The rows correspond to states and the columns correspond to multiagent actions.
6.1 Product and Total (Re)verification Algorithms for All Situations
For implementation efficiency, all of our algorithms assume that FSAs are represented using a table of the transition function δ(v_1, a) = v_2, which means that for state v_1, taking action a leads to next state v_2, as shown in Table 1. Rows correspond to states and columns correspond to multiagent actions. This representation is equivalent to the more visually intuitive representation of Figures 2 and 3. In particular, Table 1 is equivalent to the FSA in Figure 3 for the lander agent L. In Table 1, states are abbreviated by their first letter, and the multiagent actions are abbreviated by their first letters. For example, "crt" means agent F takes action (F-collect), I takes (I-receive), and L takes (L-transmit). The table consists of entries for the next state, i.e., it corresponds to the transition function. A "0" in the table means that there is no possible transition for this state-action pair. One situation in which this occurs is when an action is not allowed from a state. Consider an example use of the table format for finite-state automata. According to the first (upper leftmost) entry in Table 1, if L is in state TRANSMITTING ("T") and F takes action F-collect, I takes I-receive, and L takes L-transmit (which together is multiagent action "crt"), then L will transition to its RECEIVING ("R") state, i.e., δ(T, crt) = R. With this tabular representation, o_change is implemented as a perturbation (mutation) operator that changes a table entry to another randomly chosen value for the next state. Operator o_gen is a perturbation operator that changes a 0 entry to a next state already appearing in that row. For example, generalizing the transition condition along edge (T,R) can be accomplished by changing one of the 0s to an R in the first row of Table 1. This is because the transition condition associated with edge (T,R) is the set of all multiagent actions that transition from
Suppose there are n agents, and 1 ≤ j_k ≤ the number of states in the FSA for agent k. Then the algorithm forms all product states v = (v_{j_1}, ..., v_{j_n}) and specifies their transitions:

Procedure product
  for each product state v = (v_{j_1}, ..., v_{j_n}) do
    if all v_{j_k}, 1 ≤ k ≤ n, are initial states, then v is a product initial state
    endif
    for each multiagent action a_i do
      if (δ(v_{j_k}, a_i) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_i) = 0
      else δ(v, a_i) = (δ(v_{j_1}, a_i), ..., δ(v_{j_n}, a_i)); endif
    endfor
  endfor
end procedure

Figure 11: Total_prod product algorithm.
T to R, i.e., {crt, drt} in Table 1. This set is expressed in Boolean algebra as (I-receive ∧ L-transmit) (see Figure 3).
For SIT_1plan or SIT_multplans, prior to verification the multiagent product FSA S needs to be formed from the individual agent FSAs (see Figure 1). We can implement the algorithm Total_prod for generating the product FSA using the data structure of Table 1, as shown in Figure 11. In the product FSA, an example product state and transition is δ(CRT, drt) = DDR because δ(C, drt) = D, δ(R, drt) = D, and δ(T, drt) = R for agents F, I, and L, respectively. The initial states of the product FSA are formed by testing whether every individual state of the product is an initial state. For example, if D, D, and R are initial states for F, I, and L, respectively, then DDR will be an initial state for F ⊗ I ⊗ L. After forming the product states and specifying which are initial, the algorithm of Figure 11 specifies the δ transition for every product state and multiagent action.

Note that the algorithm in Figure 11 forms the product FSA S for testing Invariance properties. To test First-Response properties using AT verification, we need to form the product FSA S ⊗ ¬P. To do this simply requires considering ¬P to be the (n+1)st agent. The algorithm in Figure 11 is modified by changing n to n+1 everywhere. It is also important to note that in all situations (including SIT_1agent), Total_prod must be executed to form the product S ⊗ ¬P if AT verification is to be done. In SIT_1agent, S is just the single agent FSA and n is 1. For SIT_1plan, n = 1 also. In other words, for SIT_1plan the multiagent plan, once formed, is never subdivided and therefore it could be considered like a single agent plan. In both of these cases, if AT verification is done the product is taken of the single plan FSA and the property FSA.
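The Total_prod construction can be rendered in Python roughly as follows (our own encoding, not the paper's implementation: each agent is a tuple (states, initial states, transition table), with `None` playing the role of 0):

```python
from itertools import product as cartesian

def total_prod(agents, actions):
    """Form the multiagent product FSA from individual agent FSAs.
    Each agent is (states, initial_states, delta) with delta[v][a] = w or None."""
    prod_delta, prod_init = {}, set()
    state_sets = [agent[0] for agent in agents]
    for v in cartesian(*state_sets):
        # v is a product initial state iff every component is initial.
        if all(vk in agents[k][1] for k, vk in enumerate(v)):
            prod_init.add(v)
        prod_delta[v] = {}
        for a in actions:
            nexts = [agents[k][2][vk].get(a) for k, vk in enumerate(v)]
            # If any component has no transition ("0"), neither does the product.
            prod_delta[v][a] = None if None in nexts else tuple(nexts)
    return prod_delta, prod_init
```

For AT verification the (negated) property FSA would simply be appended to `agents` as the (n+1)st component, mirroring the n-to-n+1 change described above.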
Given that the product FSA has been formed if needed, then the final (multi)agent FSA can be verified. We first consider a very simple model checking algorithm, called Total_I, tailored specifically for verifying Invariance properties of the form □¬p. The algorithm, shown in Figure 12, consists of a depth-first search of S beginning in each initial state. Any accessible atom a_i that is part of a transition condition, where a_i ≤ p, violates the property. (We store the set of all atoms a_i ≤ p for rapid access.)
Procedure verify
  for each state v ∈ V(S) do
    visited(v) = 0
  endfor
  for each initial state v ∈ I(S) do
    if (visited(v) == 0) then dfs(v); endif
  endfor
end procedure

Procedure dfs(v)
  visited(v) = 1;
  for each atom a_i ∈ Σ(K), a_i ≤ p, do
    if δ(v, a_i) ≠ 0 then print "Verification error"; endif
  endfor
  for each atom a_i ∈ Σ(K) set w = δ(v, a_i) and do
    if (w ≠ 0) and (visited(w) == 0) then dfs(w); endif
  endfor
end procedure

Figure 12: Total_I verification algorithm.
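A minimal Python rendering of the Total_I search (names and data layout are ours; `bad_atoms` is the stored set of atoms a_i ≤ p mentioned above, and transition tables are dictionaries with `None` for 0):

```python
def total_I(delta, init, bad_atoms):
    """Depth-first search flagging every accessible transition whose atom
    (multiagent action) a satisfies a <= p, for the Invariance property []~p."""
    errors, visited = [], set()

    def dfs(v):
        visited.add(v)
        for a, w in delta[v].items():
            if w is not None and a in bad_atoms:
                errors.append((v, a))          # verification error
        for w in delta[v].values():
            if w is not None and w not in visited:
                dfs(w)

    for v in init:
        if v not in visited:
            dfs(v)
    return errors
```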
Next we consider an algorithm for verifying any property whose negation is expressible as a Büchi FSA, including First-Response properties. The reader may wish to review the high-level description of this AT model checking algorithm presented in Subsection 3.3 before continuing. Figure 13 gives a basic version of this algorithm from Courcoubetis et al. (1992) and Holzmann et al. (1996).¹¹ We call this algorithm Total_AT because it is total automata-theoretic verification. Recall that in AT model checking, the property is represented as an FSA, and asking whether S ⊨ P is equivalent to asking whether L(S) ⊆ L(P) for property P. This is equivalent to L(S) ∩ L(¬P) = ∅, which is algorithmically tested by taking the tensor product of the plan FSA and the FSA corresponding to ¬P. If L(S ⊗ ¬P) = ∅ then L(S) ⊆ L(P), i.e., S ⊨ P and verification succeeds; otherwise, S ⊭ P and verification fails. The algorithm of Figure 13 assumes that the negation of the property (¬P) is expressed as a Büchi automaton and the FSA being searched is S ⊗ ¬P.

Algorithm Total_AT, in Figure 13, actually checks whether S ⊭ P for any property P. To check if S ⊭ P, we can determine whether L(S ⊗ ¬P) ≠ ∅. This is true if there is some "bad" state in B(S ⊗ ¬P) reachable from an initial state and reachable from itself, i.e., part of an accessible cycle and therefore visited infinitely often. The algorithm of Figure 13 performs this check using a nested depth-first search on the product FSA S ⊗ ¬P. The first depth-first search begins at initial states and visits all accessible states. Whenever a state s ∈ B(S ⊗ ¬P) is discovered, it is called a "seed," and a nested search begins to look for a cycle that returns to the seed. If there is a cycle, this implies the B(S ⊗ ¬P) (seed) state can be visited infinitely often, and therefore the language is nonempty (i.e., there is some action sequence in the plan that does not satisfy the property) and verification fails.

11. This algorithm is used in the well-known Spin system (Holzmann, 1991). A modification was made to the published algorithm for readability, as well as for efficiency, for the case where it is desirable to halt after the first verification error. This modification makes the nested call first in procedure dfs.
Procedure verify
  for each state v ∈ V(S ⊗ ¬P) do
    visited(v) = 0
  endfor
  for each initial state v ∈ I(S ⊗ ¬P) do
    if (visited(v) == 0) then dfs(v); endif
  endfor
end procedure

Procedure dfs(v)
  visited(v) = 1;
  if v ∈ B(S ⊗ ¬P) then
    seed = v;
    for each state v ∈ V(S ⊗ ¬P) do
      visited2(v) = 0
    endfor
    ndfs(v)
  endif
  for each successor (i.e., next state) w of v do
    if (visited(w) == 0) then dfs(w); endif
  endfor
end procedure

Procedure ndfs(v) /* the nested search */
  visited2(v) = 1;
  for each successor (i.e., next state) w of v do
    if (w == seed) then print "Bad cycle. Verification error"; break
    else if (visited2(w) == 0) then ndfs(w); endif
    endif
  endfor
end procedure

Figure 13: Total_AT verification algorithm.
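The nested depth-first search of Figure 13 might look like this in Python (an illustrative sketch; `bad_states` stands for B(S ⊗ ¬P), and states and transition tables are encoded as in the earlier sketches):

```python
def total_AT(delta, init, bad_states):
    """Nested depth-first search over S (x) ~P: report a violation iff some
    state in B(S (x) ~P) is reachable from an initial state and lies on a
    cycle back to itself (hence can be visited infinitely often)."""
    visited, found = set(), []

    def ndfs(v, seed, visited2):           # the nested search
        visited2.add(v)
        for w in (w for w in delta[v].values() if w is not None):
            if w == seed:
                found.append(seed)         # bad cycle: verification error
            elif w not in visited2:
                ndfs(w, seed, visited2)

    def dfs(v):
        visited.add(v)
        if v in bad_states:
            ndfs(v, v, set())              # nested call first, as in Figure 13
        for w in (w for w in delta[v].values() if w is not None):
            if w not in visited:
                dfs(w)

    for v in init:
        if v not in visited:
            dfs(v)
    return found
```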
Suppose there are n agents, and agent i was modified, 1 ≤ i ≤ n. Operator o_change modified δ(v_i, a_adapt) to be w_i' for state v_i and multiagent action a_adapt. 1 ≤ j_k ≤ the number of states in the FSA for agent k. Then the algorithm is:

Procedure product
  for each product state v = (v_{j_1}, ..., v_i, ..., v_{j_n}) formed from state v_i do
    if (δ(v_{j_k}, a_adapt) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_adapt) = 0
    else δ(v, a_adapt) = (δ(v_{j_1}, a_adapt), ..., w_i', ..., δ(v_{j_n}, a_adapt)); endif
  endfor
end procedure

Figure 14: Inc_prod product algorithm.
Total_I and Total_AT are sound and complete (for any property whose negation is expressible as a Büchi FSA), and they find all verification errors. Before elaborating on this, first note that the term "verification error" has a different connotation for Total_I and Total_AT. For Total_I an error is an accessible bad atom (i.e., an atom a ≤ p where the property is □¬p). For Total_AT it is an accessible bad state that is part of a cycle. The reason Total_I is sound is that it flags as errors only those atoms a ≤ p. It is complete and finds all errors because it does exhaustive search and testing of all accessible atoms. Total_AT is also sound and complete, for analogous reasons. Because Total_I and Total_AT find all errors, they can be used with either method of error resolution (i.e., choose another operator or fix the FSA).
6.2 Incremental Algorithms for o_change and All Situations
All of the algorithms in the previous subsection can be streamlined given that it is known that a learning operator (in this case, o_change) has been applied. For simplicity, all of our algorithms assume o_change is applied to a single atom (multiagent action). For example, we assume that if δ(v_i, a_adapt) = w_i, then o_change(δ(v_i, a_adapt)) = w_i', where w_i and w_i' are states (or 0, implying no next state), and a_adapt is a multiagent action. Since we use the tabular representation, this translates to changing one table entry.
Figure 14 shows an incremental version of Total_prod, called Inc_prod, which is tailored for re-forming the product FSA after o_change has been applied. The algorithm of Figure 14 is for Invariance properties; for AT verification change n to n+1 in the algorithm and assume ¬P is the (n+1)st agent. Although Inc_prod is applicable in all situations when taking the product with the property FSA, the primary motivation for developing this algorithm was the multiagent SIT_multplans. Recall that in this situation, every time learning is applied to an individual agent FSA, the product must be re-formed to verify global properties. The wasted cost of doing this motivated the development of this algorithm.
Algorithm Inc_prod assumes that the product was formed originally (before learning) using Total_prod. Inc_prod capitalizes on the knowledge of what (individual or multiagent) state (v_i) and multiagent action (a_adapt) transition to a new next state as specified by operator o_change. This algorithm assumes that the prelearning product FSA is stored. Then the only product FSA states whose next state needs to be modified are those states that
Procedure product
  I(S) = ∅;
  for each product state v = (v_{j_1}, ..., v_i, ..., v_{j_n}) formed from state v_i do
    if visited(v) then I(S) = I(S) ∪ {v}; endif
    if (δ(v_{j_k}, a_adapt) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_adapt) = 0
    else δ(v, a_adapt) = (δ(v_{j_1}, a_adapt), ..., w_i', ..., δ(v_{j_n}, a_adapt)); endif
  endfor
end procedure

Figure 15: Inc_prod-NI product algorithm: a variation of Inc_prod that gets new initial states.
include v_i and transition on action a_adapt. The method for reverification that is assumed to follow Inc_prod is total reverification, i.e., Total_I or Total_AT.
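Combining the entry update of Inc_prod with the new-initial-state collection of Inc_prod-NI, a Python sketch (our encoding, not the paper's implementation; `visited` is the set of product states marked visited by the previous verification):

```python
def inc_prod_NI(prod_delta, visited, agent_deltas, i, v_i, a_adapt, w_i_new):
    """Re-form only the product entries affected by o_change, which replaced
    delta_i(v_i, a_adapt) with w_i_new (None meaning "0"), and collect the
    new initial states: previously visited product states built from v_i."""
    new_init = set()
    for v in [s for s in prod_delta if s[i] == v_i]:  # states formed from v_i
        if v in visited:
            new_init.add(v)
        nexts = [w_i_new if k == i else agent_deltas[k][vk].get(a_adapt)
                 for k, vk in enumerate(v)]
        prod_delta[v][a_adapt] = None if None in nexts else tuple(nexts)
    return new_init
```

Only the rows built from v_i are touched; every other product entry is reused from the stored prelearning product, which is exactly where the savings over Total_prod come from.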
Next, consider another pair of product and reverification algorithms that is expected to be, overall, potentially even more efficient. The goal is to streamline reverification after o_change. This requires a few simple changes to the algorithms. The motivation for these changes is that when model checking downstream properties, o_change has only "downstream effects," i.e., it only affects the accessibility of vertices and atoms altered by o_change or those that would be visited by verification after those altered by o_change.
Consider the changes. We start by building a set of the Cartesian product states v = (v_{j_1}, ..., v_i, ..., v_{j_n}) that are formed from the state v_i that was affected by learning. The first way that we can shorten reverification is by using these states as the new initial states for reverification. In fact, we need only select those that were visited during the original verification (i.e., are accessible from the original initial states). In other words, suppose for agent i, o_change modified δ(v_i, a_adapt). Then we reinitialize the set of initial states to be ∅ and add all product states formed from v_i that were marked "visited" during previous verification. This can be done by modifying the product algorithm of Figure 14 as shown in Figure 15. The algorithm of Figure 15 is to form the product FSA for verifying Invariance properties. To form the product for AT verification, substitute I(S ⊗ ¬P) for I(S) and (n+1) for n in Figure 15. We call this incremental product algorithm Inc_prod-NI, where "NI" denotes the fact that we are getting new initial states.

The second way to streamline reverification is by only considering a transition on action a_adapt, the action whose δ value was modified by learning, from these new initial states. Thereafter, incremental reverification proceeds exactly like total (re)verification. With these changes, Total_I becomes Inc_I-NI, shown in Figure 16. Likewise, with these changes Total_AT becomes Inc_AT-NI, as shown in Figure 17. Figure 17 shows only changes to procedures dfs and verify; ndfs is the same as in Figure 13. One final streamlining added to Inc_I-NI, but not Inc_AT-NI, is that only the new initial states have "visited" reinitialized to 0. This can be done for Invariance properties because they are not concerned with the order of atoms in strings.¹²
12. Suppose o_change adds a new edge (v_1, v_3). If v_3 was visited on previous verification of an Invariance property (from a state other than v_1), then all atoms that can be visited after v_3 would already have been tested for the property. On the other hand, when testing First-Response properties the order of atoms is relevant. Even if v_3 was previously visited, since it might not have been visited from v_1, the addition of (v_1, v_3) could add a new string with a new atom order that might violate the First-Response property. Therefore, v_3 needs to be revisited for First-Response properties, but not for Invariance properties.

Procedure verify
  for each new initial state v ∈ I(S) do
    visited(v) = 0
  endfor
  for each new initial state v ∈ I(S) do
    if (visited(v) == 0) then dfs(v); endif
  endfor
end procedure

Procedure dfs(v)
  visited(v) = 1;
  if v ∈ I(S) and w ≠ 0, where w = δ(v, a_adapt), then
    if (a_adapt ≤ p) then print "Verification error"; endif
    if (visited(w) == 0) then dfs(w); endif
  else
    for each atom a_i ∈ Σ(K), a_i ≤ p, do
      if δ(v, a_i) ≠ 0 then print "Verification error"; endif
    endfor
    for each atom a_i ∈ Σ(K) set w = δ(v, a_i) and do
      if (w ≠ 0) and (visited(w) == 0) then dfs(w); endif
    endfor
  endif
end procedure

Figure 16: Inc_I-NI reverification algorithm.

Inc_I-NI is sound for Invariance properties, and Inc_AT-NI is sound for any downstream or directionless property whose negation is expressible as a Büchi FSA, including First-Response and Invariance. Assuming S ⊨ P prior to applying o_change to form S', if these incremental reverification algorithms conclude that S' ⊨ P, then total reverification would also conclude that S' ⊨ P. Recall that total reverification is sound. Therefore, the same is true for these incremental algorithms. Furthermore, these incremental reverification algorithms will find all of the new violations of the property introduced by o_change. The reason the algorithms are sound and find all new errors (for downstream or directionless properties) is that there are only two ways that accessibility can be modified by any of our learning operators, including o_change: locally or globally. Recall that local change alters the accessibility of atom a_adapt or the state δ(v_i, a_adapt), and a global change alters the accessibility of states or atoms that would be visited after δ(v_i, a_adapt). In neither case (local or global) will the learning operator modify accessibility of atoms or states visited before, but not after, a_adapt. Our algorithms reverify exhaustively (i.e., they reverify as much as total reverification would) for all atoms and states visited at or after a_adapt. Since these incremental algorithms perform reverification exactly the same way as their total versions do after the part of the FSA that was modified by learning, they will find all new errors introduced by learning.

Procedure verify
  for each state v ∈ V(S ⊗ ¬P) do
    visited(v) = 0
  endfor
  for each new initial state v ∈ I(S ⊗ ¬P) do
    if (visited(v) == 0) then dfs(v); endif
  endfor
end procedure

Procedure dfs(v)
  visited(v) = 1;
  if v ∈ B(S ⊗ ¬P) then
    seed = v;
    for each state v ∈ V(S ⊗ ¬P) do
      visited2(v) = 0
    endfor
    ndfs(v)
  endif
  if v ∈ I(S ⊗ ¬P) and w ≠ 0 and (visited(w) == 0), where w = δ(v, a_adapt), then
    dfs(w)
  else
    for each successor (i.e., next state) w of v do
      if (visited(w) == 0) then dfs(w); endif
    endfor
  endif
end procedure

Figure 17: Procedures verify and dfs of the Inc_AT-NI reverification algorithm.
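A Python sketch of the resulting Inc_I-NI search (an illustrative rendering of Figure 16, using the same dictionary encoding as the earlier sketches; from each new initial state only the adapted action a_adapt is followed, after which the search proceeds as in Total_I):

```python
def inc_I_NI(delta, new_init, a_adapt, bad_atoms):
    """Reverify []~p after o_change: restart the search at the previously
    visited product states affected by learning (the new initial states),
    follow only a_adapt from those states, then search as in total
    verification. `bad_atoms` is the stored set of atoms a <= p."""
    errors, visited = [], set()
    roots = set(new_init)

    def dfs(v):
        visited.add(v)
        if v in roots:
            w = delta[v].get(a_adapt)          # only the adapted action
            if w is not None:
                if a_adapt in bad_atoms:
                    errors.append((v, a_adapt))
                if w not in visited:
                    dfs(w)
        else:
            for a, w in delta[v].items():
                if w is not None and a in bad_atoms:
                    errors.append((v, a))
            for w in delta[v].values():
                if w is not None and w not in visited:
                    dfs(w)

    for v in roots:
        if v not in visited:
            dfs(v)
    return errors
```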
Inc_I-NI is complete for Invariance properties because it flags errors using the same method as Total_I, and because Invariance properties are directionless and are therefore impervious to the location of atoms in a string. On the other hand, Inc_AT-NI is not complete for all downstream properties. For example, it is not complete for properties that check for the first occurrence of a pattern in a string, e.g., First-Response properties. Because Inc_AT-NI does not identify whether the new initial states are before or after the first occurrence, there is no way to know if the first occurrence is being checked after learning. Nevertheless, this lack of completeness for First-Response properties actually turns out to be a very useful trait, as we will discover in Subsection 6.5.
6.3 Incremental Algorithms for o_gen and SIT_1agent/1plan
We next present our final two incremental reverification algorithms, which are applicable only in SIT_1agent and SIT_1plan, when there is one FSA to reverify. These are powerful algorithms in terms of their capability to reduce the complexity of reverification. However, their soundness relies on the assumption that the learning operator's effect on accessibility is localized, i.e., that it is o_gen with SIT_1agent or SIT_1plan but not SIT_multplans (where o_gen might become o_add). An important advantage of these algorithms is that they never require forming a product FSA, not even S ⊗ ¬P, regardless of whether the property is type Response. The algorithms gain efficiency by being tailored both to a specific property type and to a specific learning operator. The objective in developing these algorithms was maximal efficiency, and therefore they sacrifice completeness and/or the ability to find all errors.

Procedure check-invariance-property
  if v_1 was not previously visited, then output "Verification succeeds."
  else
    if (z ⊨ ¬p) then output "Verification succeeds."
    else output "Avoid this instance of o_gen."; endif
  endif
end procedure

Figure 18: Inc_gen-I reverification algorithm.
These two incremental algorithms are tailored for reverification after operator o_gen. Assume that property P holds for S prior to learning, i.e., S ⊨ P. Now we generalize the transition condition M_K(v_1, v_3) = y to form S' via o_gen(M_K(v_1, v_3)) = y ∨ z, where y ∧ z = 0. We want to verify that S' ⊨ P.
One additional definition is needed before presenting our algorithms. We previously defined what it means for a c-state formula p to be true at a c-state, but to simplify the algorithms we also define what it means for a c-state formula to be true of a transition condition. A c-state formula p is defined to be true of a transition condition y, i.e., "y ⊨ p," if and only if y ≤ p (which can be implemented by testing whether for every atom a ≤ y, a ≤ p).
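Modeling a transition condition and a c-state formula by their sets of satisfying atoms, this definition reduces to set containment (a minimal sketch under that modeling assumption; the function name is ours):

```python
def holds_of(y_atoms, p_atoms):
    """ "y |= p" for a transition condition y: true iff y <= p, i.e., every
    atom of y is also an atom of p (both modeled as sets of atoms)."""
    return y_atoms <= p_atoms   # Python subset test mirrors the <= ordering

# Edge (T,R) of Table 1 has transition condition {crt, drt}.
assert holds_of({"crt", "drt"}, {"crt", "drt", "crr"})
assert not holds_of({"crt", "ddp"}, {"crt"})
```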
Let us begin with the algorithm Inc_gen-I (which consists of two very simple tests), tailored for o_gen and Invariance properties, shown in Figure 18. Recall that M_K(v_1, v_3) = y and o_gen(M_K(v_1, v_3)) = y ∨ z. Inc_gen-I, which tests "z ⊨ ¬p," localizes reverification to a restricted portion of the FSA. (For efficiency, z ⊨ ¬p is implemented as a test against p rather than against ¬p, i.e., checking that every atom a ≤ z satisfies a ≰ p, because p is typically expected to be more succinct than ¬p.) Assume the Invariance property is P = □¬p and S ⊨ P. Then every string x in L(S) satisfies Invariance property P, so for each x, ¬p is true of every atom in x. This implies y ⊨ ¬p. This statement is based on our assumption that v_1 is accessible from an initial state. If not, reverification is not needed; the generalization will not violate P. Therefore, the algorithm begins by testing whether v_1 was visited on previous verification. If not, the output is "success." (Note that o_gen does not alter the accessibility of v_1.)
Inc_gen-I is sound and complete for Invariance properties. Generalization of M_K(v_1, v_3) is application of o_gen(M_K(v_1, v_3)) = y ∨ z to form S'. This operator o_gen preserves Invariance property P if and only if S' ⊨ P, which is true if and only if z ⊨ ¬p. The reason for this is that we know S satisfies P from our original verification, and therefore ¬p is true for all atoms in all strings in L(S). The only possible new atoms in L(S') but not in L(S) are in z. If z ⊨ ¬p, then ¬p is true for all atoms in L(S'), which implies that every string in L(S') satisfies P. In other words, S' ⊨ P. Therefore, Inc_gen-I is sound. We also know that it is complete because if ∃a, a ≤ z, a ≰ p, then it must be the case that S' ⊭ P.

In conclusion, Inc_gen-I, which consists of the test "z ⊨ ¬p," is sound and complete. For maximal efficiency, our implementation of Inc_gen-I halts after the first error, although it is simple to modify it to find all errors (and this does not significantly affect the empirical time complexity results of Subsection 6.5, nor does it affect the worst-case time complexity). Inc_gen-I is incremental because it is localized to just checking whether the property holds of the newly added atoms in z, rather than all atoms in L(S'). Finally, this algorithm only needs to be executed for o_gen, but not for o_spec+gen or o_delete+gen, because o_gen is the only version that can add new atoms via generalization. Recall that o_spec+gen and o_delete+gen are SMLs for Invariance properties.

Procedure check-response-property
  if y ⊨ q then
    if (z ⊨ q and z ⊨ ¬p) then output "Verification succeeds."
    else output "Avoid this instance of o_gen"; endif
  else
    if (z ⊨ ¬p) then output "Verification succeeds."
    else output "Avoid this instance of o_gen"; endif
  endif
end procedure

Figure 19: Inc_gen-R reverification algorithm.
As an example of Inc_gen-I, suppose a, b, c, d, and e are atoms, and the transition condition y between STATE1 and STATE2 equals a. Let (a, b, b, d, d, ...), where the ellipsis indicates infinite repetition of d, be a string in L(S) that includes STATE1 and STATE2 as the first two vertices in its accepting run. The property is P = □¬e. Assume the fact that this string satisfies ¬e was proved in the original verification. Suppose o_gen generalizes M_K(STATE1, STATE2) from a to (a ∨ c) (i.e., it adds a new allowable action c from STATE1), which adds the string (c, b, b, d, d, ...) to L(S'). Then rather than test whether all of the elements of {a, b, c, d} are ≤ ¬e, we really only need to test whether c ≤ ¬e, because c is the only newly added atom.
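Inc_gen-I then amounts to two set tests (a Python sketch, not the paper's implementation; `p_atoms` is assumed to be the stored set of atoms entailing p, so z ⊨ ¬p becomes disjointness):

```python
def inc_gen_I(v1_visited, z_atoms, p_atoms):
    """Inc_gen-I as two simple tests: if v_1 was never reached on the
    original verification, generalization is harmless; otherwise every
    newly added atom in z must satisfy ~p (i.e., entail no atom of p)."""
    if not v1_visited:
        return "Verification succeeds."
    if z_atoms.isdisjoint(p_atoms):        # z |= ~p
        return "Verification succeeds."
    return "Avoid this instance of o_gen."

# Running example: property []~e, newly added condition z = {c}.
assert inc_gen_I(True, {"c"}, {"e"}) == "Verification succeeds."
```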
The next algorithm, Inc_gen-R, is for generalization and full Response properties (and is nothing more than some simple tests). Like Inc_gen-I, Inc_gen-R localizes reverification to a restricted portion of the FSA. Assume the Response property is P = □(p → ◇q), where p is the trigger and q is the response for c-state formulae p and q. Assume property P holds for S prior to learning (S ⊨ P). Now we generalize M_K(v_1, v_3) = y to form S' by applying o_gen(M_K(v_1, v_3)) = y ∨ z, where y ∧ z = 0. We need to verify that S' ⊨ P.
Inc_gen-R for o_gen and full Response properties is in Figure 19. (Inc_gen-R is also applicable for o_delete+gen and o_spec+gen.) The algorithm first checks whether a response could be required of the transition condition M_K(v_1, v_3). A response is required if, for at least one string in L(S) whose run includes (v_1, v_3), the prefix of this string before visiting vertex v_1 includes the trigger p not followed by response q, and the string suffix after v_3 does not include q either. Such a string satisfies the property if and only if y ⊨ q. Thus if y ⊨ q and the property is true prior to learning (i.e., for S), then it is possible that a response is required. In this situation (i.e., y ⊨ q), the only way to be sure we are safe (S' ⊨ P) is if the newly added condition z also has the response, i.e., z ⊨ q. If not, then there could be new strings in L(S') whose accepting runs include (v_1, v_3) but do not satisfy the property.
For example, suppose a, b, c, and d are atoms, and the transition condition y between STATE4 and STATE5 equals d. Let x = (a, b, b, d, ...) be a string in L(S) that includes STATE4 and STATE5 as the fourth and fifth vertices in its accepting run. The property is P = □(a → ◇d), and therefore y ⊨ q and x ⊨ P. Suppose o_gen generalizes M_K(STATE4, STATE5) from d to (d ∨ c), where z is c, which adds the string x' = (a, b, b, c, ...) to L(S'). Then z ⊭ q. If the string suffix after (a, b, b, c) does not include d, then there is now a string that includes the trigger but does not include the response. In other words, x' ⊭ P. Finally, if y ⊨ q and z ⊨ q, an extra check is made to be sure z ⊨ ¬p, because an atom could be both a response and a trigger. New triggers should be avoided.
The second part of the algorithm states that if y ⊭ q and no new triggers are introduced by generalization, then the operator is "safe" to do. It is guaranteed to be safe (S' ⊨ P) in this case because if y ⊭ q, then a response cannot be required here. In other words, because S ⊨ P, for every string in L(S) whose accepting run includes (v_1, v_3), either no trigger occurred prior to visiting v_1, or every trigger was followed by a response prior to visiting v_1, or a response occurred after visiting v_3.
Inc_gen-R is sound but not complete for full Response properties. Its soundness is based on the fact that o_gen does not increase accessibility of vertices or atoms visited after state v_3 (i.e., globally) and therefore reverification can be localized to M_K(v_1, v_3) alone. Inc_gen-R is not complete because it may output "Avoid this instance of o_gen" when in fact o_gen is safe to do. For example, if y ⊨ q but z ⊭ q, the algorithm will output "Avoid this instance of o_gen." Yet it may be the case that S' ⊨ P if no trigger p precedes response q in L(S'), or if a response occurs after v_3. When Inc_gen-R outputs verification failure, it does not supply sufficient information for FSA repair. Errors must be resolved by selecting another learning operator. Note that "error" has a different connotation for Inc_gen-R than for the AT verification algorithms. Any "Avoid..." output is considered an error.
Another disadvantage of Inc_gen-R is that it does not allow generalizations that add triggers. If it is desirable to add new triggers during generalization, then one needs to modify Inc_gen-R to call Inc_AT when reverification with Inc_gen-R fails, instead of outputting "Avoid this instance of o_gen." This modification also fixes the false error problem, and preserves the enormous time savings (see Section 6.5) when reverification succeeds.
6.4 Theoretical Worst-Case Time Complexity Analysis

Recall that one of our primary objectives is timely agent responses. This section compares the worst-case time complexity of the algorithms. Let us begin with the time complexity of Total_prod. This is O((∏_{i=1}^{n} |V(S_i)|) × |Σ(K)| × n) to form the product of the individual agent FSAs for Invariance property verification, and O((∏_{i=1}^{n} |V(S_i)|) × |P| × |Σ(K)| × n) to form the product for AT verification. Here n is the number of agents, |V(S_i)| is the number of states in single-agent FSA S_i, |P| is the number of states in the property FSA P, and |Σ(K)| is the total number of atoms (multiagent actions). The reason for this complexity result is that there are ∏_{i=1}^{n} |V(S_i)| product states for Invariance property verification, and (∏_{i=1}^{n} |V(S_i)|) × |P| product states for AT verification. The outer loop of Total_prod iterates through all product states; the inner loop iterates through all |Σ(K)| atoms. Note that (∏_{i=1}^{n} |V(S_i)|) × |Σ(K)| and (∏_{i=1}^{n} |V(S_i)|) × |P| × |Σ(K)| are the sizes of the product FSA transition function tables built for Total_I and Total_AT, respectively. Total_prod does at most n lookups for each table entry.
By comparison, our incremental algorithm Inc_prod for generating the product FSA has time complexity O((∏_{i=1}^{n-1} |V(S_i)|) × n) or O((∏_{i=1}^{n-1} |V(S_i)|) × |P| × n) to modify the product FSA for Invariance property reverification or AT reverification, respectively. This is because the total number of revised product states is ∏_{i=1}^{n-1} |V(S_i)| or (∏_{i=1}^{n-1} |V(S_i)|) × |P|, and only one atom is considered (because we assume o_change changes the next state for a single atom a_adapt). The time complexity of Inc_prod-NI is the same as that of Inc_prod.
Next consider the worst-case time complexity of total (re)verification after the product has been formed. It is O((∏_{i=1}^{n} |V(S_i)|) × |Σ(K)|) for Total_I. This is because, in the worst case, every product state is accessible and therefore every entry in the product FSA transition function table is visited. Assuming |B| is the number of "bad" (in the Büchi sense) states in the product FSA, the worst-case time complexity of Total_AT is O((|B|+1) × (∏_{i=1}^{n} |V(S_i)|) × |P| × |Σ(K)|). This is because, in the worst case, every entry in the product FSA transition function table is visited once on the depth-first search and, for each bad state, again on the nested depth-first search. Unfortunately, the worst-case time complexities of Inc_I-NI and Inc_AT-NI are the same as those of Total_I and Total_AT, respectively. This is because, in the worst case, every product state is still accessible. The restriction to transition only on a_adapt at first does not reduce the "big O" complexity.
Finally, we consider the worst-case complexity of Inc_gen-I and Inc_gen-R. First, we define, for any Boolean expression x, |x| as the number of elements in {a | a ∈ Σ(K) and a ⊨ x}. For Invariance properties P of the form □¬p, |P| equals |p| since we test for each atom a whether a ⊨ p rather than a ⊨ ¬p, because we expect |p| < |¬p| in general. Then Inc_gen-I requires time O(|z| × |p|) to determine whether z ⊨ ¬p. (Checking whether v_1 was visited requires constant time.) Assuming |p| < ∏_{i=1}^{n} |V(S_i)| (which should be true except under bizarre circumstances), and since |z| ≤ |Σ(K)|, Inc_gen-I saves time over Total_I. Inc_gen-R requires time O((|y| × |q|) + (|z| × (|p| + |q|))) to determine whether y ⊨ q, and then to determine whether z ⊨ q and z ⊨ ¬p.^13 Clearly (|y| + |z|) ≤ |Σ(K)| because, by the definition of o_gen, y ∧ z = 0. Therefore, assuming (|p| + |q|) < ((|B| + 1) × ∏_{i=1}^{n} |V(S_i)| × |P|) (which, again, should be true except in bizarre circumstances), the worst-case time complexity of Inc_gen-R is lower than that of Total_AT.
13. Determining whether z ⊨ ¬p can be done by checking that no atom of z satisfies p. Also, for Inc_gen-R, an additional time O(|Σ(K)| × (|y| + |z|)) is needed to identify y when using the representation of Table 1. This does not affect our complexity comparisons or conclusions.

6.5 Empirical Time Complexity Comparisons

Worst-case time complexity is not always a useful measure. Therefore we supplement the worst-case analyses with empirical results on cpu time. Our primary objective in these experiments is to compare the incremental algorithms with total reverification, as well as with each other, in the context of evolving behaviorally correct FSAs. The time required for reverification is significant to address if we want timely agent responses, because reverification occurs after every learning operator application.
Before describing the experimental results, let us consider the experimental methodology. All code was written in C and run on a Sun Ultra 10 workstation. In our experiments, FSAs were randomly initialized, subject to certain restrictions. The reason for randomness is that this is a typical way to initialize individuals in a population for an evolutionary algorithm. There are two restrictions on the FSAs. First, although determinism and completeness of FSAs are execution, rather than verification, issues and therefore need not be enforced for these experiments, our choice of tabular representation of the FSAs (see Table 1) restricts the FSAs to being deterministic. Second, because the incremental algorithms assume S ⊨ P prior to learning, we restrict the FSAs to comply with this. There are two alternative methods for enforcing this in the experiments: (1) use sparse FSAs (i.e., with many 0s) and keep generating new FSAs until total verification succeeds (which does not take long with sparse FSAs), or (2) use dense FSAs engineered to guarantee property satisfaction. In particular, dense FSAs are forced to satisfy Invariance properties □¬p by inserting 0s in every column of the transition function table (such as Table 1) labeled with an atom a ⊨ p. Dense FSAs are forced to satisfy First-Response properties with trigger p and response q by inserting 0s in every column labeled with an atom a ⊨ p. This eliminates triggers initially. Note that either of these methods is a viable way to initialize a population of FSAs for evolution because it ensures early success in satisfying the property. This paper presents only the results with dense FSAs. See Gordon (1999) for the results with sparse FSAs.^14
Another experimental design decision was to show scaleup in the size of the FSAs. Throughout the experiments there were assumed to be three agents, each with the same 12 multiagent actions. Each individual agent FSA had 25 or 45 states.^15 With 45 states the transition table contains 45^3 × 12 entries.
A suite of five Invariance and five Response properties was used, which appears in Appendix C. Invariance properties were expressed by storing the set of all atoms a ⊨ p for property □¬p. This suffices for all of our algorithms tailored for Invariance properties. For AT verification, Response properties were expressed with a First-Response Büchi FSA for the negation of the property. An explanation of why this is adequate for our experiments appears below. For Inc_gen-R, trigger p, and response q, all atoms a_i ⊨ p and a_j ⊨ q were stored. Six independent experiments were performed to verify each of the properties. In other words, every reverification algorithm was tested with 30 runs: six runs for each of five Invariance or five Response properties. For every one of these runs, a different random seed was used for generating the three FSAs. However, it is important to point out that all algorithms being compared with each other saw the same FSAs. For example, in Table 2 we compare Inc_prod (row 1), Inc_prod-NI (row 4), and Total_prod (row 7). They all input the same three FSAs. Furthermore, the learning operator (specific instantiation of the operator schema) was the same for all algorithms being compared.
14. Sparse FSAs have an additional advantage, assuming they remain relatively sparse after evolution. The advantage is their succinctness for efficient execution, as in multientity models (Tennenholtz & Moses, 1989).
15. The sparse FSAs had 25, 45, or 65 states. To get accurate timing results with the dense FSAs, though, 65 states required a cpu free of any interfering processes for an unreasonably long time.
Let us consider the results in Tables 2 and 3. In both of these tables, each row corresponds to an algorithm. Rows are numbered for later reference. The entries give performance results, to be described shortly. Table 2 compares the performance of total reverification with the algorithms of Subsection 6.2, which were designed for o_change and all situations. The situation assumed for these experiments was SIT_multplans. Three dense random (subject to the above-mentioned restrictions) FSAs were generated, and then the product was formed. The result was a product FSA satisfying the property. Operator o_change was then applied, which consisted of a random (but pointing to a state instead of 0) change to a randomly chosen table entry in the FSA transition table for a random choice of one of the three agents. Finally, the product FSA was re-formed and reverification done.
The methodology for generating Table 3 was similar to that for Table 2, except that o_gen was the learning operator and the situation was assumed to be SIT_1plan. In other words, the product FSA was formed, and then o_gen was applied to the product FSA of the three agents, the product was taken with the property FSA if needed for AT verification, and then reverification was performed. Operator o_gen consisted of choosing a random state s_i and a random action a_i for which δ(s_i, a_i) = s_k, and choosing a random action a_j for which δ(s_i, a_j) = 0, and then setting δ(s_i, a_j) = s_k.
Any column in Tables 2 or 3 labeled "sec" gives a mean, over 30 runs, of the cpu time of the algorithm. Columns labeled "spd" give the speedup over total, i.e., the cpu time of the incremental algorithm in that row divided by the cpu time of the corresponding total algorithm. For example, the "spd" entry for Inc_prod in row 1 gives its cpu time divided by the cpu time of Total_prod in row 7. Columns labeled "err" show the average number of verification errors over 30 runs. This is important to monitor because, for example, the cpu time is most strongly correlated with the number of states "visited" during dfs, and "visited2" during ndfs when AT verification is used. Every property error causes ndfs to be called with a nested search, which may be quite time-consuming. Also, it is important to note that we did not force any verification errors to occur. It was our objective to monitor cpu time under natural circumstances for evolving FSAs. When errors arose they were the natural result of applying a learning operator. The "err" columns are missing from Table 2 because the values are all 0, i.e., no errors occurred during the experiments due to applying o_change, although we have observed errors with this operator outside of these experiments. The lack of errors in the experiments resulted from the particular random FSAs that happened to be generated during the experiments. Errors are quite common with the specific o_gen version of o_change, as can be seen in Table 3. Note that "N/A" appears in the "err" column for anything other than a verification algorithm because "err" refers to verification errors.
The algorithms (rows) should be considered in triples "p," "v," and "b," or else as a single item "v+b." A "p" next to an algorithm name in Table 2 or 3 denotes that it is a product algorithm, a "v" that it is a verification algorithm, and a "b" that it is the sum of the "p" and "v" entries, i.e., the time for both re-forming the product and reverifying. For example, Inc_I (b) is considered to be an algorithm pair consisting of Inc_prod (p) followed by Total_I (v) (see rows 1-3 of Table 2). If no product needs to be formed, then the "b" version of the algorithm is identical to the "v" version, in which case there is only one row, labeled "v+b."
Tables 4, 5, and 6 re-present a subset (cpu time only) of the data from Tables 2 and 3 in a format that facilitates some comparisons. In other words, Tables 4, 5, and 6 contain no new data, only reformatted data from Tables 2 and 3. In Tables 4, 5, and 6, results are grouped by "p," "v," or "b."
Let us elaborate on one more interesting issue before listing our experimental hypotheses. Recall that we are using a First-Response property FSA and that this FSA checks only that the first trigger in every string is followed by a response. For our evolutionary paradigm (with dense FSA initialization) when using Inc_AT-NI, verifying a First-Response property is equivalent to verifying the full Response property. The false errors found by Inc_AT-NI due to its incompleteness are in fact violations of the full Response property.^16 Therefore for Inc_AT-NI, First-Response FSAs are entirely adequate for reverification of full Response properties. Because we used the evolutionary paradigm in these experiments, and because Inc_AT-NI found the same number of errors as Total_AT (i.e., Inc_AT-NI found no false errors), for the FSAs in these experiments testing First-Response properties was equivalent to testing full Response properties.
For our experiments, five hypotheses were tested:

H1: Algorithms tailored specifically for Invariance properties are faster than those for AT verification, because the latter are general-purpose (and the product algorithms include an additional FSA).

H2: The incremental algorithms are faster than the total algorithms for both product and reverification. This is expected to be true because they were tailored for learning.

H3: The "NI" versions of the incremental algorithms are faster than their counterparts, which do not find new initial states. This is expected because of the increase in streamlining.

H4: Inc_gen-I and Inc_gen-R are the fastest of all the algorithms, because they are tailored for a less generic learning operator (i.e., o_gen rather than o_change), they are also tailored for one specific property type, and they sacrifice finding all errors.

H5: Inc_gen-I and Inc_gen-R will have the best scaleup properties: they will not take more time as FSA size increases. This latter expectation comes from the worst-case time complexity analysis.

Subsidiary issues we examine are the percentage of wrong predictions (for Inc_AT-NI and Inc_gen-R, which are not complete algorithms), and the maximum observed speedup.
The results are the following (unless stated otherwise, look at the "sec" columns):

H1: To see the results, in Table 2 look at rows 1 through 9 and compare each row r in this set with row r+9. In other words, compare row 1 with row 10, row 2 with row 11, and so on. Rows 1 through 9 are algorithms for Invariance properties, and

16. The reason is the following. Dense FSA initialization creates FSAs with no triggers. A learning operator is then applied. After learning, Inc_AT-NI begins reverification at every state from which a new trigger could have been added by learning. Thus every trigger in the FSA will be checked to see if it is followed by a response. At every generation of our evolutionary learning paradigm, at most one learning operator is applied per FSA, and this is immediately followed by reverification and error resolution (if needed). Therefore every new trigger will be caught by Inc_AT-NI and, if not followed by a response, the problem will be immediately resolved.
                            25-state FSAs         45-state FSAs
                           sec       spd         sec       spd
 1  Inc_prod      p     .000157   .00497      .000492   .00255
 2  Total_I       v     .023798   .95663      .206406   .97430
 3  Inc_I         b     .023955   .07023      .206898   .51133
 4  Inc_prod-NI   p     .000206   .00652      .000617   .00320
 5  Inc_I-NI      v     .000169   .00680      .000528   .00320
 6  Inc_I-NI      b     .000375   .00110      .001762   .00435
 7  Total_prod    p     .031594   1.0         .192774   1.0
 8  Total_I       v     .024877   1.0         .211851   1.0
 9  Total_I       b     .340817   1.0         .404625   1.0
10  Inc_prod      p     .000493   .00507      .001521   .00259
11  Total_AT      v     .021103   .98903      .177665   .96869
12  Inc_AT        b     .024798   .20022      .180707   .23441
13  Inc_prod-NI   p     .000574   .00590      .001786   .00304
14  Inc_AT-NI     v     .009011   .37450      .090824   .49520
15  Inc_AT-NI     b     .009585   .07900      .092824   .12013
16  Total_prod    p     .097262   1.0         .587496   1.0
17  Total_AT      v     .024062   1.0         .183409   1.0
18  Total_AT      b     .121324   1.0         .770905   1.0

Table 2: Average performance over 30 runs (5 properties, 6 runs each) with operator o_change and dense FSAs. Rows 1 through 9 are for reverification of Invariance properties and rows 10 through 18 are for AT reverification of Response properties.
                             25-state FSAs                   45-state FSAs
                         sec        spd      err         sec          spd      err
 1  Inc_gen-I    v+b  .000001    4.25e-5    .20       .000002      9.75e-6    .07
 2  Inc_I-NI     v+b  .000002    8.51e-5    .20       .000003      1.46e-5    .07
 3  Total_I      v+b  .023500    1.0        .20       .205082      1.0        .07
 4  Inc_gen-R    v+b  .000007    7.23e-8    .73       .000006      2.09e-9    .73
 5  Inc_prod-NI  p    .000006    5.22e-5    N/A       .000006      8.51e-6    N/A
 6  Inc_AT-NI    v    94.660700  .98099     3569.33   2423.550000  .84442     12553.40
 7  Inc_AT-NI    b    94.660706  .97982     N/A       2423.550006  .84421     N/A
 8  Total_prod   p    .114825    1.0        N/A       .704934      1.0        N/A
 9  Total_AT     v    96.495400  1.0        3569.33   2870.080000  1.0        12553.40
10  Total_AT     b    96.610225  1.0        N/A       2870.784934  1.0        N/A

Table 3: Average performance over 30 runs (5 properties, 6 runs each) with operator o_gen and dense FSAs. Rows 1 through 3 are for reverification of Invariance properties and rows 4 through 10 are for reverification of Response properties.
                       25-state FSAs   45-state FSAs
1  Inc_prod      p    .000157         .000492
2  Inc_prod-NI   p    .000206         .000617
3  Total_prod    p    .031594         .192774
4  Total_I       v    .023798         .206406
5  Inc_I-NI      v    .000169         .000528
6  Total_I       v    .024877         .211851
7  Inc_I         b    .023955         .206898
8  Inc_I-NI      b    .000375         .001762
9  Total_I       b    .340817         .404625

Table 4: Average cpu time (in seconds) over 30 runs with operator o_change and five Invariance properties. This table is a duplication of some of the material in Table 2.
                       25-state FSAs   45-state FSAs
1  Inc_prod      p    .000493         .001521
2  Inc_prod-NI   p    .000574         .001786
3  Total_prod    p    .097262         .587496
4  Total_AT      v    .021103         .177665
5  Inc_AT-NI     v    .009011         .090824
6  Total_AT      v    .024062         .183409
7  Inc_AT        b    .024798         .180707
8  Inc_AT-NI     b    .009585         .092824
9  Total_AT      b    .121324         .770905

Table 5: Average cpu time (in seconds) over 30 runs with operator o_change and five Response properties. This table is a duplication of some of the material in Table 2.
                       25-state FSAs   45-state FSAs
1  Inc_gen-R     p    0               0
2  Inc_prod      p    .000006         .000006
3  Total_prod    p    .114825         .704934
4  Inc_gen-R     v    .000007         .000006
5  Inc_AT        v    94.660700       2423.550000
6  Total_AT      v    96.495400       2870.080000
7  Inc_gen-R     b    .000007         .000006
8  Inc_AT        b    94.660706       2423.550006
9  Total_AT      b    96.610225       2870.784934

Table 6: Average cpu time (in seconds) over 30 runs with operator o_gen and five Response properties. This table is a duplication of some of the material in Table 3.
rows 10 through 18 are algorithms for AT verification. In Table 3, rows 1 through 3 are algorithms for Invariance properties, and rows 5 through 10 are algorithms for AT verification. Compare row 2 with 7, and 3 with 10. (Rows 1 and 4 cannot be compared because row 4 has an algorithm tailored for Response properties.) Note that these comparisons are between a "v+b" and a "b." Since "v+b" means "v" or "b," this is a correct comparison. These results show that H1 is mostly, but not completely, confirmed. It is confirmed for all results in Table 3. On the other hand, the results are mixed for Table 2.
H2: The easiest way to compare is to examine Tables 4, 5, and 6. In these cases the comparison is between the first two rows labeled "p" (or "v" or "b") versus the third row of that same label. The reason for making these comparisons is that the first two rows of a given label correspond to an incremental algorithm (except for row 4 of Tables 4 and 5) and the third row of a given label corresponds to a total algorithm. Alternatively, one could examine Tables 2 and 3. In Table 2, rows 1 through 6 (other than 2) and 10 through 15 (other than 11) are incremental algorithms, and rows 2, 11, 7 through 9, and 16 through 18 are total reverification algorithms. The appropriate comparisons are between rows 1 and 7, 4 and 7, 5 and 8, 3 and 9, 6 and 9, 10 and 16, 13 and 16, 14 and 17, 12 and 18, and 15 and 18. In Table 3, rows 1, 2, and 4 through 7 are incremental algorithms, and rows 3 and 8 through 10 are total. The appropriate comparisons are between rows 1 and 3, 2 and 3, 4 and 10, 5 and 8, 6 and 9, and 7 and 10. All results confirm H2. The statistical significance of the comparisons in Tables 2 and 3 was tested. Using an exact Wilcoxon rank-sum test, all comparisons relevant to hypothesis H2 in Table 2 are statistically significant (p < 0.01 and, in most cases, p < 0.0001). In Table 3, however, the differences between Inc_AT-NI and Total_AT (both the (v) and (b) versions) are not statistically significant at the p < 0.01 level. All other comparisons in Table 3 are significant at the p < 0.01 level.
H3: This hypothesis does not apply to the algorithms for re-forming the product FSA because, obviously, it will require more time to get the new initial states for the "NI" versions. We wish to test the overall time savings of the "NI" versions, so we concentrate on the rows labeled "b." The relevant comparisons are row 7 versus 8 in Table 4 and row 7 versus 8 in Table 5. (Alternatively, one could compare row 3 versus 6, and row 12 versus 15, in Table 2.) Each of these comparisons is between an "NI" version and a counterpart version of the algorithm that is the same as the "NI" version except that it does not find new initial states. Tables 3 and 6 are not relevant because they only have the "NI" versions but not their counterparts. (We only saw the need to make one comparison between all "NI" versions and their counterparts, which is reflected in Table 2.) All results confirm hypothesis H3, and they are statistically significant (p < 0.01).
H4: Evaluating H4 requires considering Table 3 but not Table 2, because we only need to compare algorithms for which o_gen has been applied. Compare row 1 versus 2, 1 versus 3, 4 versus 7, and 4 versus 10 to see the results. All results show Inc_gen-I (row 1) and Inc_gen-R (row 4) to be at least as fast as the other algorithms. Therefore H4 is confirmed. In all cases other than Inc_gen-I (row 1) versus Inc_I-NI (row 2), there is a noticeable speedup. In most cases, the speedup is quite dramatic. All noticeable speedups are statistically significant (p < 0.0001).
H5: To test H5, compare the first "spd" column (for 25-state FSAs) with the second column with this label (for 45-state FSAs). A more desirable scaleup shows a lower value for "spd" as the size of the FSA increases. It implies that the ratio of the cpu time of the incremental algorithm to the cpu time of the total algorithm decreases more (or increases less) as the FSA size increases. One should make this two-column comparison for rows 1 through 6 (but not 2) and 10 through 15 (but not 11) of Table 2, and rows 1 and 2, and 4 through 7, of Table 3 because these are all the incremental algorithms. (We don't care about the total algorithms because "spd" is, by definition, always 1.0 for them.)^17 If one considers the results of algorithms appearing in both tables (e.g., Inc_I-NI shows different scaleup properties in the two tables, but we need to consider both sets of results), then clearly Inc_gen-I (row 1) and Inc_gen-R (row 4) in Table 3 show the best scaleup of all the incremental algorithms. H5 is confirmed. It is apparent from the "sec" columns that the time complexity of these two algorithms does not increase (other than minor fluctuations) as FSA size increases (see Table 3).
A couple of subsidiary issues are now addressed. For one, recall that Inc_AT-NI and Inc_gen-R are not complete. Therefore, it is relevant to consider the percentage of incorrect predictions (i.e., false errors) they made. Inc_AT-NI made none. For the results in Table 3, 33% of Inc_gen-R's predictions were wrong (i.e., false errors) for the size 25 FSAs, and 50% were wrong for the size 45 FSAs.
Finally, consider the maximum observed speedup. Inc_gen-R shows a half-billion-fold speedup over Total_AT on size 45 FSA problems (averaged over 30 runs)! This alleviates much of the concern about Inc_gen-R's false error rate. For example, given the rapid reverification time of Inc_gen-R, an agent could use it to reverify a long sequence of learning operators culminating in one that satisfies the property in considerably less time than it takes Total_AT to reverify one learning operator.
We conclude this section by summarizing, in Table 7, the fastest algorithm (based on our results) for every operator, situation, and property type. In Table 7, it is assumed that a First-Response FSA is used for AT verification of Response properties. Operator o_add-action is omitted from this table because it is not clear at this time whether it would be faster to apply total reverification or perform multiple applications of the incremental algorithm (one for each primitive operator application). Section 8 considers an alternative solution as future work. In Table 7, "None" means no reverification is required, i.e., the learning operator is a priori guaranteed to be an SML for this situation and property class.
17. If "spd" ≠ 1.0 for a total algorithm, this is due to statistical variation in run time.

7. Related Work

There has been a great deal of recent research on model checking, and even on model checking of distributed systems (Holzmann, 1991). Nevertheless, there is very little in the literature about model checking applied to systems that change. Two notable exceptions are the research of Sokolsky and Smolka (1994) on incremental reverification and that of
                  SIT_1agent∨1plan        SIT_1agent∨1plan   SIT_multplans     SIT_multplans
                  and Invariance          and Response       and Invariance    and Response
o_change          Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_delete          None                    None               None              None
o_spec            None                    None               None              None
o_add             Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_gen             Inc_gen-I or Inc_I-NI   Inc_gen-R          Inc_I-NI          Inc_AT-NI
o_delete∨spec     None                    None               None              None
o_delete-action   None                    None               None              None
o_add∨gen         Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_move            Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_delete+add      Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_spec+add        Inc_I-NI                Inc_AT-NI          Inc_I-NI          Inc_AT-NI
o_delete+gen      None                    Inc_gen-R          Inc_I-NI          Inc_AT-NI
o_spec+gen        None                    Inc_gen-R          Inc_I-NI          Inc_AT-NI
o_stay            None                    Inc_AT-NI          Inc_I-NI          Inc_AT-NI

Table 7: Learning operators with the fastest reverification method.
Sekar et al. (1994). Both of these papers are about reverification of software after user edits rather than adaptive agents. Nevertheless the work is related. Sokolsky and Smolka use the modal μ-calculus to express Invariance and Liveness properties. They present an incremental version of a model checker that does block-by-block global computations of fixed points, rather than AT or property-specific model checking as we do. The learning operators assumed by their algorithm are edge deletions/additions on a representation similar to FSAs called LTS (but unlike our multiagent work, they assume a single LTS). The worst-case time complexity of their algorithm is the same as that of total reverification, although their empirical results are good. Note that we have a priori results for edge deletion. However, we do not have an incremental algorithm specifically tailored for edge addition (for multiple agents and AT or property-specific model checking); thus this may be a fruitful direction for future research. Sekar et al.'s approach consists of converting rule sets to FSAs, then generating and testing functions that map from the post- to the prelearning FSA and property. If the desired function can be found, they apply a theorem from Kurshan (1994), which guarantees that the learning is "safe." Although no complexity results are provided, the generate-and-test approach that they describe appears to be computationally expensive. In contrast to Sekar et al., we have proofs and empirical evidence that our methods are efficient and, in some cases, substantially more efficient than total reverification from scratch.
There is also related research in the field of classical planning. In particular, Weld and Etzioni (1994) have a method to incrementally test an agent's plan to decide whether to add new actions to the plan. Actions are added only when their effects do not violate a certain type of Invariance property. Their method has some similarities with our Inc_gen-I algorithm. One difference is that our method is for reactive rather than projective plans. Another is that our verification method is expressed using the formal foundations in the model checking literature.
As mentioned in the introduction of this paper, FSAs have been shown to be effective representations of reactive agent plans/strategies (Burkhard, 1993; Kabanza, 1995; Carmel & Markovitch, 1996; Fogel, 1996). FSA plans have been used both for multiagent competition and coordination. For example, Fogel's (1996) co-evolving FSA agents for competitive game playing were mentioned above. A similarity with our work is that Fogel assumes agents' plans are expressed as ω-automata. Nevertheless, Fogel never discusses verification of these plans. Goldman and Rosenschein (1994) present a method for multiagent coordination that assumes FSA plans. Multiple agents cooperate by taking actions to favorably alter their environment. The cooperation strategy is implemented by a plan developer who manually edits the FSAs. The relationship to the work here is that they present FSA transformations that ensure multiagent coordination. Likewise, in our research, a learning operator that is a priori guaranteed "safe" for some multiagent coordination property transforms the FSA while ensuring coordination. Although both their method and ours guarantee this coordination, their solution is manual whereas ours is entirely automated.
Some of the more recent research on agent coordination applies formal verification methods. For example, Lee and Durfee (1997) model their agents' semantics with a formalism similar to Petri nets (rather than FSAs). They verify synchronization (Invariance) properties, which prevent deadlock, using model checking. Furthermore, Lee and Durfee suggest recovery from failed verification using two methods: concept learning, and a method analogous to that used by Ramadge and Wonham (1989). Burkhard (1993) and Kabanza (1995) assume agent plans are represented as ω-automata, and they address issues of model checking temporal logic properties of the joint (multiagent) plans. Thus there is a growing precedent for addressing multiagent coordination by expressing plans as ω-automata and verifying them with model checking. Our work builds on this precedent, and also extends it, because none of this previous research addresses efficient reverification for agents that learn.
Finally, there are alternative methods for constraining the behavior of agents, which
are complementary to reverification and self-repair. For example, Shoham and Tennenholtz
(1995) design agents that obey social laws, e.g., safety conventions, by restricting the agents'
actions. Nevertheless, the plan designer may not be able to anticipate and engineer all laws
into the agents beforehand, especially if the agents have to adapt. One solution is to use
laws that allow maximum flexibility (Fitoussi & Tennenholtz, 1998). However, this solution
does not allow for certain changes in the plan, such as the addition or deletion of actions.
An appealing alternative would be to couple initial engineering of social laws with efficient
reverification after learning.
A method for ensuring physically bounded behavior of agents is "artificial physics"
(Spears & Gordon, 1999). With artificial physics, multiagent behavior is restricted by
artificial forces between the agents. Nevertheless, when encountering severe unanticipated
circumstances, artificial physics needs to be complemented with reverification and "steering"
for self-repair (Gordon et al., 1999).
8. Summary and Future Work
Agent technology is growing rapidly in popularity. To handle real-world domains and
interactions with people, agents must be adaptable, predictable, and rapidly responsive. An
approach to resolving these potentially conflicting requirements is presented here. In summary,
we have shown that certain machine learning operators are a priori (with no run-time
reverification) safe to perform. In other words, when certain desirable properties hold prior
to learning, they are guaranteed to hold post-learning. The property classes considered here
are Invariance and Response. Learning operators o_delete, o_spec, o_delete_spec, and o_delete-action
were found to preserve properties in either of these classes. For SIT_1agent and SIT_1plan,
where there is a single (multi)agent FSA plan, o_delete+gen, o_spec+gen, and o_stay were found to
preserve Invariance properties. All of the a priori results are independent of the size of the
FSA and are therefore applicable to any FSA that has been model checked originally.
We then discussed transformations of learning operators and their corresponding a priori
results to a product plan. This addresses SIT_multplans, where multiple agents each have their
own plan but the multiagent plan must be re-formed and reverified to determine whether
multiagent properties are preserved. It was discovered that only o_delete, o_spec, o_delete_spec,
and o_delete-action preserve their a priori results for this situation.
Finally, we presented novel incremental reverification algorithms for all cases in which the
a priori results are negative. It was shown in both theoretical and empirical comparisons that
these algorithms can substantially improve the time complexity of reverification over total
reverification from scratch. Empirical results showed as much as a half-billion-fold speedup.
These are initial results, but continued research along these lines will likely be applicable to
a wide range of important problems, including a variety of agent domains as well as more
general software applications.
When learning is required, we suggest that the a priori results be consulted
first. If no positive result exists (i.e., the learning operator is not known to be an SML),
then incremental reverification proceeds.
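This decision procedure can be sketched as a small dispatch table. The operator and situation names below mirror those used in this paper, but the table itself and the function name are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the "consult the a priori results first" policy.
# SAFE encodes (operator, property class) pairs that are a priori
# guaranteed safe (i.e., the operator is an SML for that class);
# the entries shown reflect the summary above, but the table is a sketch.

SAFE = {
    ("o_delete", "Invariance"), ("o_delete", "Response"),
    ("o_spec", "Invariance"), ("o_spec", "Response"),
    # Safe for Invariance only, and only in SIT_1agent / SIT_1plan:
    ("o_delete+gen", "Invariance"), ("o_spec+gen", "Invariance"),
    ("o_stay", "Invariance"),
}

def needs_reverification(operator: str, property_class: str) -> bool:
    """True if no a priori safety result applies, so incremental
    reverification must be run after applying the operator."""
    return (operator, property_class) not in SAFE

assert not needs_reverification("o_delete", "Invariance")
assert needs_reverification("o_add-action", "Response")
```

In this sketch, an adaptive agent would call `needs_reverification` once per learning step and skip the model-checking pass entirely whenever the table answers for it.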
To test our overall framework, we have implemented the rovers example of this paper
as co-evolving agents assuming SIT_multplans, i.e., multiple agents each with its own plan.
By using the a priori results and incremental algorithms, we achieved significant speedups.
We have also developed a more sophisticated application that uses reverification during
evolution: two agents compete in a board game, and one of the agents evolves its strategy to
improve it. Although the types of FSAs and learning operators in this application differ
slightly from those presented in this paper, and the property is quite different (a check
for a certain type of cyclic behavior on the board), initial experience suggests that the
methodology and basic results here could be extended to a variety of multiagent applications.
Future work will focus primarily on extending the a priori results to other learning
operators/methods and property classes, developing other incremental reverification algorithms,
and exploring plan repair to recover from reverification failures. One way in which
the a priori results might be extended is by discovering when learning operators will make
a property true, even if it was not true before learning.
A question that was not addressed here is whether the incremental methods are useful if
multiple machine learning operators are applied in batch (e.g., as one might wish to do with
operator o_add-action). In the future we would like to explore how best to handle this situation:
is it more efficient to treat the operators as having been applied one at a time and use
incremental reverification for each? Or is total reverification from scratch preferable? Or,
better yet, can we develop efficient incremental algorithms for sets of learning operators?
Plan repair was not discussed in this paper and is an important future direction. The
research of De Raedt and Bruynooghe (1994), which uses counterexamples to guide the
revision of theories subject to integrity constraints, may provide some ideas. There are
also plan repair methods in the classical planning literature that might be relevant to our
approach (Joslin & Pollack, 1994; Weld & Etzioni, 1994). It would be interesting to compare
the time to repair plans versus trying another learning operator and reverifying.
A limitation of our approach is that it does not handle stochastic plans or properties
with time limits, e.g., a Response property for which the response must occur within a
specified time after the trigger. We would like to extend this research to stochastic FSAs
(Tzeng, 1992) and timed FSAs/properties (Alur & Dill, 1994; Kabanza, 1995), as well
as other common agent representations besides FSAs. Another direction for future work
would be to extend our results to symbolic model checking, which uses binary decision
diagrams (BDDs) so that the full state space need not be explicitly explored during model
checking (Burch et al., 1994). In some cases, symbolic model checking can produce dramatic
speedup. However, none of the current research on symbolic model checking addresses
adaptive systems.
Additionally, the ideas here are applicable to some of the FSA-based control theory
work. For example, Ramadge and Wonham (1989) assume FSA representations for both the
plant (which is assumed to be a discrete-event system) and the supervisor (which controls
the actions of the plant). We are currently applying some of the principles of efficient
reverification to change the supervisor in response to changes in the plant in a manner that
preserves properties (Gordon & Kiriakidis, 2000).
Finally, future work should focus on studying how to operationalize Asimov's Laws for
intelligent agents. What sorts of properties best express these laws? Weld and Etzioni
(1994) provide some initial suggestions, but much more remains to be done.
Acknowledgments
This research is supported by the Office of Naval Research (N0001499WR20010) in conjunction
with the "Semantic Consistency" MURI. I am grateful to William Spears, Joseph
Gordon, Stan Sadin, Chitoor Srinivasan, Ramesh Bharadwaj, Dan Hoey, and the anonymous
reviewers for useful suggestions and advice. The presentation of the material in this
paper was enormously improved thanks to William Spears' suggestions.
Appendix A. Glossary of Notation
⊨    Models (satisfies)
model checking    A verification method entailing brute-force search
AT    Automata-theoretic model checking
SIT_1agent    Single-agent situation
SIT_1plan    Multiagent situation where each agent uses a multiagent plan
SIT_multplans    Multiagent situation where each agent uses an individual plan
FSA    Finite-state automaton
V(S)    The set of states (vertices) of FSA S
E(S)    The set of state-to-state transitions (edges) of FSA S
transition condition    Logical description of the set of actions enabling a transition
K    A Boolean algebra
≤    Boolean algebra partial order; x ≤ y iff x ∧ y = x
M_K(S)    The matrix of transition conditions of FSA S
M_K(v_i, v_j)    Transition condition associated with edge (v_i, v_j)
I(S)    The set of initial states of FSA S
atoms    Primitive elements of a Boolean algebra; atoms are actions
string    Sequence of actions (atoms)
L(S)    The language of (set of strings accepted by) FSA S
ω-automaton    An FSA that accepts infinite-length strings
run    The sequence of FSA vertices visited by a string
accepting run    The run of a string in the FSA language
acceptance criterion    A requirement of accepting runs of an FSA
⊗    The tensor (synchronous) product of FSAs
complete FSA    Specifies a transition for every possible action
deterministic FSA    The choice of action uniquely determines the next state
path    Sequence of vertices connected by edges
cycle    A path with start and end vertices identical
c-state    Computational state; an action occurring in a computation
accessible from    There exists a path from
□    Temporal logic "invariant"
◇    Temporal logic "eventually"
Invariance property    □¬p, i.e., "Invariant not p"
Response property    □(p → ◇q), i.e., "Every p is eventually followed by q"
First-Response property    The first p (trigger) is followed by a q (response)
B(S)    The set of "bad" (to be avoided) states of FSA S
↑    Can increase accessibility
↑̸    Cannot increase accessibility
↓    Can decrease accessibility
↓̸    Cannot decrease accessibility
SML    Safe machine learning operator, i.e., one that preserves properties
sound algorithm    One that is correct when it states that S ⊨ P
complete algorithm    One that is correct when it states that S ⊭ P
δ    The FSA transition function
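As an illustration of the tensor (synchronous) product entry in the glossary above, the product of two FSAs over a shared action alphabet can be sketched as follows. The dict-based transition-map representation is an assumption chosen for readability, not the matrix form M_K(S) used in the paper.

```python
from itertools import product

def tensor_product(fsa1, fsa2):
    """Synchronous product: both FSAs take the same action each step.
    Each FSA is a triple (states, initial_states, delta) where
    delta[(state, action)] -> next_state (deterministic and complete)."""
    states1, init1, d1 = fsa1
    states2, init2, d2 = fsa2
    states = set(product(states1, states2))          # pairs of states
    init = set(product(init1, init2))                # pairs of initial states
    actions = {a for (_, a) in d1}                   # shared action alphabet
    # Both components move on the same action simultaneously.
    delta = {((s1, s2), a): (d1[(s1, a)], d2[(s2, a)])
             for (s1, s2) in states for a in actions}
    return states, init, delta

# Two tiny deterministic, complete FSAs over actions {"a", "b"}.
A = ({0, 1}, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0})
B = ({"x", "y"}, {"x"}, {("x", "a"): "x", ("x", "b"): "y",
                         ("y", "a"): "x", ("y", "b"): "y"})
S, I, D = tensor_product(A, B)
assert D[((0, "x"), "a")] == (1, "x")
```

Because both inputs are deterministic and complete, so is the product; this is the construction implicit in re-forming a multiagent plan for SIT_multplans before reverification.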
Appendix B. Temporal logic properties
This appendix, which is based on Manna and Pnueli (1991), formally defines Invariance and
Response properties in temporal logic. We begin by defining the basic temporal operator U
(Until). We assume a string (x_0, ...) of c-states of FSA S, where 0 ≤ i, j, k. Then for c-state
formulae p and q, we define Until as x_j ⊨ p U q ⇔ for some k ≥ j, x_k ⊨ q, and for every i
such that j ≤ i < k, x_i ⊨ p.
Invariance properties are defined in terms of Eventually properties, so we define Eventually
first. For c-state formula p and FSA S, we define property P = ◇p ("Eventually p")
as a property that is true (false) for a string if it is true (false) at the initial c-state x_0 of
the string. Formally, if x = (x_0, ...) is a string of FSA S, then x ⊨ ◇p ⇔ x_0 ⊨ true U p,
i.e., "eventually p." A property P = □¬p ("Invariant not p") is defined as x ⊨ □¬p ⇔
x ⊨ ¬◇p, i.e., "never p." Finally, a Response formula is of the form □(p → ◇q), where p
is called the "trigger" and q the "response." A Response formula states that every trigger
is eventually followed by a response.
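The definitions above can be made concrete with a small evaluator. Note one caveat: the paper's semantics are over infinite strings accepted by ω-automata, so over a finite prefix these checks are only approximations (a pending "eventually" cannot be refuted by a prefix). The function names and trace are illustrative, not from the paper.

```python
# Finite-prefix evaluation of the temporal definitions above.

def until(p, q, xs, j=0):
    """x_j |= p U q: some k >= j with q(x_k), and p holds at x_j..x_{k-1}."""
    for k in range(j, len(xs)):
        if q(xs[k]):
            return True
        if not p(xs[k]):
            return False
    return False  # no witness for q within this finite prefix

def eventually(p, xs):
    """<>p  ==  true U p."""
    return until(lambda _: True, p, xs)

def invariant_not(p, xs):
    """[]!p  ==  !<>p."""
    return not eventually(p, xs)

def response(p, q, xs):
    """[](p -> <>q): every trigger position is followed by a response."""
    return all(not p(xs[j]) or until(lambda _: True, q, xs, j)
               for j in range(len(xs)))

trace = ["idle", "trigger", "idle", "respond"]
assert response(lambda x: x == "trigger", lambda x: x == "respond", trace)
assert invariant_not(lambda x: x == "crash", trace)
```

Checking a property over every accepting run of an FSA, rather than one trace, is exactly what the model-checking algorithms in the body of the paper do.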
Appendix C. Properties for Experiments
The following five Invariance properties were used in the test suite:
□ ¬(I-deliver ∧ L-transmit)
□ ¬(I-deliver ∧ L-pause)
□ ¬(F-collect ∧ I-deliver)
□ ¬(F-collect ∧ I-deliver ∧ L-receive)
□ ¬(F-deliver ∧ I-receive ∧ L-pause)
The following five Response properties were used in the test suite:
□ (F-deliver → ◇ L-receive)
□ (F-deliver → ◇ I-receive)
□ (F-collect → ◇ L-transmit)
□ ((F-collect ∧ I-deliver) → ◇ L-receive)
□ (F-deliver → ◇ (I-receive ∧ L-receive))
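As a concrete reading of the Invariance formulae, a c-state can be modeled as the set of actions the agents (F, I, L) take simultaneously. The sample trace below is a fabricated illustration for exposition, not data from the paper's experiments.

```python
# A c-state here is the set of actions the three agents (F, I, L)
# take at one step. The first Invariance property above forbids
# I-deliver and L-transmit from ever co-occurring.

def violates_invariance(bad_actions, trace):
    """True if some c-state contains all of bad_actions, i.e.
    [] !(a1 /\ a2 /\ ...) fails on this trace."""
    return any(bad_actions <= state for state in trace)

trace = [
    {"F-collect", "I-receive", "L-pause"},
    {"F-deliver", "I-deliver", "L-receive"},
    {"F-collect", "I-receive", "L-transmit"},
]
# [] !(I-deliver /\ L-transmit) holds on this trace:
assert not violates_invariance({"I-deliver", "L-transmit"}, trace)
# [] !(F-collect /\ I-deliver) also holds here:
assert not violates_invariance({"F-collect", "I-deliver"}, trace)
```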
References
Alur, R., & Dill, D. (1994). A theory of timed automata. Theoretical Computer Science,
126, 183-235.
Asimov, I. (1950). I, Robot. Greenwich, CT: Fawcett Publications, Inc.
Bavel, Z. (1983). Introduction to the Theory of Automata. Reston, VA: Prentice-Hall.
Büchi, J. (1962). On a decision method in restricted second-order arithmetic. In Methodology
and Philosophy of Science, Proceedings of the Stanford International Congress, pp. 1-11.
Stanford, CA: Stanford University Press.
Burch, J., Clarke, E., Long, D., McMillan, K., & Dill, D. (1994). Symbolic model checking
for sequential circuit verification. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, 13(4), 401-424.
Burkhard, H. (1993). Liveness and fairness properties in multi-agent systems. In Proceedings
of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI),
pp. 325-330. Chambery, France.
Carmel, D., & Markovitch, S. (1996). Learning models of intelligent agents. In Proceedings
of the Thirteenth National Conference on Artificial Intelligence (AAAI), pp. 62-67.
Portland, OR.
Clarke, E., & Wing, J. (1997). Formal methods: State of the art and future directions.
ACM Computing Surveys, 28(4), 626-643.
Courcoubetis, C., Vardi, M., Wolper, P., & Yannakakis, M. (1992). Memory-efficient
algorithms for the verification of temporal properties. Formal Methods in System Design,
1, 257-288.
De Raedt, L., & Bruynooghe, M. (1994). Interactive theory revision. In Michalski, R.,
& Tecuci, G. (Eds.), Machine Learning IV, pp. 239-264. San Mateo, CA: Morgan
Kaufmann.
Dean, T., & Wellman, M. (1991). Planning and Control. San Mateo, CA: Morgan
Kaufmann.
Elseaidy, W., Cleaveland, R., & Baugh, J. (1994). Verifying an intelligent structure control
system: A case study. In Proceedings of the Real-Time Systems Symposium, pp.
271-275. San Juan, Puerto Rico.
Fitoussi, D., & Tennenholtz, M. (1998). Minimal social laws. In Proceedings of the Fifteenth
National Conference on Artificial Intelligence, pp. 26-31. Madison, WI.
Fogel, D. (1996). On the relationship between duration of an encounter and the evolution
of cooperation in the iterated prisoner's dilemma. Evolutionary Computation, 3(3),
349-363.
Goldman, S., & Rosenschein, J. (1994). Emergent coordination through the use of
cooperative state-changing rules. In Proceedings of the Twelfth National Conference on
Artificial Intelligence, pp. 408-413. Seattle, WA.
Gordon, D. (1998). Well-behaved borgs, bolos, and berserkers. In Proceedings of the
Fifteenth International Conference on Machine Learning (ICML), pp. 224-232. Madison,
WI.
Gordon, D. (1999). Re-verification of adaptive agents' plans. Tech. rep., Navy Center for
Applied Research in Artificial Intelligence.
Gordon, D., & Kiriakidis, K. (2000). Adaptive supervisory control of interconnected discrete
event systems. In Proceedings of the International Conference on Control Applications
(ICCA), pp. 50-56. Anchorage, AK.
Gordon, D., Spears, W., Sokolsky, O., & Lee, I. (1999). Distributed spatial control, global
monitoring and steering of mobile physical agents. In Proceedings of the IEEE International
Conference on Information, Intelligence, and Systems (ICIIS), pp. 681-688.
Washington, D.C.
Grefenstette, J., & Ramsey, C. (1992). An approach to anytime learning. In Proceedings
of the Ninth International Workshop on Machine Learning, pp. 189-195. Aberdeen,
Scotland.
Heitmeyer, C., Kirby, J., Labaw, B., Archer, M., & Bharadwaj, R. (1998). Using abstraction
and model checking to detect safety violations in requirements specifications. IEEE
Transactions on Software Engineering, 24(11), 927-948.
Holzmann, G., Peled, D., & Yannakakis, M. (1996). On nested depth-first search. In
Proceedings of the Second Spin Workshop, pp. 81-89. Rutgers, NJ.
Holzmann, G. J. (1991). Design and Validation of Computer Protocols. NJ: Prentice-Hall.
Joslin, D., & Pollack, M. (1994). Least-cost flaw repair: A plan refinement strategy for
partial-order planning. In Proceedings of the Twelfth National Conference on
Artificial Intelligence, pp. 1004-1009. Seattle, WA.
Kabanza, F. (1995). Synchronizing multiagent plans using temporal logic specifications. In
Proceedings of the First International Conference on Multiagent Systems (ICMAS),
pp. 217-224. San Francisco, CA.
Kurshan, R. (1994). Computer Aided Verification of Coordinating Processes. Princeton,
NJ: Princeton University Press.
Lee, J., & Durfee, E. (1997). On explicit plan languages for coordinating multiagent plan
execution. In Proceedings of the Fourth International Workshop on Agent Theories,
Architectures, and Languages (ATAL), pp. 113-126. Providence, RI.
Manna, Z., & Pnueli, A. (1991). Completing the temporal picture. Theoretical Computer
Science, 83(1), 97-130.
Michalski, R. (1983). A theory and methodology of inductive learning. In Michalski, R.,
Carbonell, J., & Mitchell, T. (Eds.), Machine Learning I, pp. 83-134. Palo Alto, CA:
Tioga.
Mitchell, T. (1978). Version Spaces: An Approach to Concept Learning. Ph.D. thesis,
Stanford University.
Nilsson, N. (1980). Principles of Arti�cial Intelligence. Palo Alto, CA: Tioga.
Potter, M. (1997). The Design and Analysis of a Computational Model of Cooperative
Coevolution. Ph.D. thesis, George Mason University.
Ramadge, P., & Wonham, W. (1989). The control of discrete event systems. Proceedings of
the IEEE, 77(1), 81-98.
Sekar, R., Lin, Y.-J., & Ramakrishnan, C. (1994). Modeling techniques for evolving
distributed applications. In Proceedings of Formal Description Techniques (FORTE),
pp. 22-29. Berne, Switzerland.
Shoham, Y., & Tennenholtz, M. (1995). On social laws for artificial agent societies: Off-line
design. Artificial Intelligence, 73(1-2), 231-252.
Sikorski, R. (1969). Boolean Algebras. New York, NY: Springer-Verlag.
Sokolsky, O., & Smolka, S. (1994). Incremental model checking in the modal mu-calculus.
In Proceedings of Computer-Aided Verification (CAV), pp. 351-363. Stanford, CA.
Spears, W., & Gordon, D. (1999). Using artificial physics to control agents. In Proceedings
of the IEEE International Conference on Information, Intelligence, and Systems, pp.
281-288. Washington, D.C.
Tennenholtz, M., & Moses, Y. (1989). On cooperation in a multi-entity model. In
Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pp.
918-923.
Tzeng, W. (1992). Learning probabilistic automata and Markov chains via queries. Machine
Learning, 8, 151-166.
Vardi, M., & Wolper, P. (1986). An automata-theoretic approach to automatic program
verification. In Proceedings of the First Annual Symposium on Logic in Computer
Science (LICS), pp. 332-345. Cambridge, MA.
Weld, D., & Etzioni, O. (1994). The first law of robotics. In Proceedings of the Twelfth
National Conference on Artificial Intelligence, pp. 1042-1047. Seattle, WA.