
Toward Socially Intelligent Service Robots

D.M. Wilkes, A. Alford, R.T. Pack*, T. Rogers, R.A. Peters II, and K. Kawamura

Intelligent Robotics Laboratory

Vanderbilt University Nashville, Tennessee, 37235 USA

*R.T. Pack is now with Real World Interface, Inc.

Jaffrey, New Hampshire 03452 USA

Abstract

In the Intelligent Robotics Laboratory (IRL) at Vanderbilt University we seek to develop service robots with a high level of social intelligence and interactivity. In order to achieve this goal, we have identified two main issues for research. The first issue is how to achieve a high level of interaction between the human and the robot. This has led to the formulation of our philosophy of Human Directed Local Autonomy (HuDL), a guiding principle for research, design, and implementation of service robots. The motivation for integrating humans into a service robot system is to take advantage of human intelligence and skill. Human intelligence can be used to interpret robot sensor data, eliminating computationally expensive and possibly error-prone automated analyses. Human skill is a valuable resource for trajectory and path planning as well as for simplifying the search process. In this paper we present our plans for integrating humans into a service robot system. We present our paradigm for human/robot interaction, HuDL. The second issue is the general problem of system integration, with a specific focus on integrating humans into the service robotic system. This work has led to the development of the Intelligent Machine Architecture (IMA), a novel software architecture that has been specifically designed to simplify the integration of the many diverse algorithms, sensors, and actuators necessary for socially intelligent service robots. Our testbed system is described, and some example applications of HuDL for aids to the physically disabled are given. An evaluation of the effectiveness of the IMA is also presented.

HuDL, A Design Philosophy for Socially Intelligent Service Robots, D.M. Wilkes, R.T. Pack, W.A. Alford, and K. Kawamura, in Socially Intelligent Agents, AAAI Press Technical Report FS-97-02, ©1997, American Association for Artificial Intelligence.

1 Introduction

The current generation of service robots typically are not socially intelligent agents, or have only the most primitive of social interaction capabilities. In the Intelligent Robotics Laboratory (IRL) at Vanderbilt University we seek to develop service robots with a higher level of social intelligence and interactivity. In order to achieve this goal, we have identified two main issues for research. The first issue is how to achieve a high level of interaction between the human and the robot. This has led to the formulation of our philosophy of Human Directed Local Autonomy (HuDL), a guiding principle for research, design, and implementation of service robots. The service robots we seek to develop are socially intelligent agents by the very nature of what they are intended to do. They are to be useful robots that interact closely with humans through interfaces that are as natural as possible. The relationship can be thought of as symbiotic in the sense that both the human and the robot work together to achieve goals, for example as aids to the elderly or disabled. The second issue is the general problem of system integration, with a specific focus on integrating humans into the service robotic system. This work has led to the development of the Intelligent Machine Architecture (IMA), a novel software architecture that has been specifically designed to simplify the integration of the many diverse algorithms, sensors, and actuators necessary for socially intelligent service robots.

In much research a socially intelligent agent is a purely virtual agent. It is constructed entirely via software and lives and operates within a computer or computer network. These agents are capable of very significant and challenging tasks within their universe, but when it comes to interacting with the physical reality of our universe they are, of course, very limited. This is due to the fact that virtual agents are not robots. Robots, on the other hand, are a curious blend of the virtual and the physical. Since robots typically have one or more computers as “brains” they may indeed contain some types of virtual agents or models of the universe. However, a robot is specifically designed to accomplish physical tasks. Thus a robot is not a computer and typically has physical capabilities considerably beyond those of a computer. As a consequence of this combination of the virtual and the physical, research into socially intelligent robots has many challenging aspects that are often not present in purely virtual agents. Issues of control, computer vision, signal processing, selection of actuators and sensors, and many others are inescapable in robotics. Additionally, robotic agents must also incorporate artificial intelligence, planning, reasoning, human-computer interaction, modeling, etc., as in virtual agents.

Since a service robot is a truly concrete physical realization of a socially intelligent agent, its development represents an extremely challenging task. Many difficult problems in planning, learning, control, communication, signal processing, vision, etc. must be solved to produce a robustly functioning system. Attempting to achieve such a system while simultaneously requiring that it exhibit a high level of autonomy is very difficult, very expensive, and typically impractical at the present time. We propose a guiding philosophy for research and design that explicitly supports the evolution of a robot from a system with limited abilities and autonomy to one with a high degree of both. The philosophy we propose is Human Directed Local Autonomy, or HuDL (Kawamura et al. 1996a).

HuDL is based on exploiting the symbiotic relationship between the human user and the robot. In essence, the idea is to make maximum use of the things that humans do well and the things that robots can do well. A good example is a robot to aid a physically disabled person, perhaps one suffering from some degree of paralysis. The human is intelligent, but has physical limitations in mobility and dexterity. The robot is mobile and/or able to manipulate objects, but perhaps lacks the intelligence to solve many real-world problems. A symbiotic combination of human and robot can improve the ability of the human to control his environment and enhance the usefulness of the robot by significantly augmenting its intelligence with that of the user. One key feature is flexible human integration. The background for this is clear: human intelligence is superior whenever unexpected or complicated situations are met. Roles and methods for integrating humans into different types of activities must be defined and developed.

For example, the user may request the robot to go into the kitchen and bring him a drink (see Figure 1). The robot may have sufficient intelligence to navigate to the kitchen, but the planning and object recognition problems may prove too difficult. If appropriate visual feedback is supplied to the user (perhaps a monitor lets him “see” what the robot sees), he can narrow the search space for the robot by giving high-level instructions (perhaps through a speech recognition system that provides a natural and convenient interface) such as “Pick up the red glass on the counter to your right.” The robot may see several red objects that could be the glass in question; it moves forward, points to the most likely object, and sends a message to the user, “Do you mean this one?” The user responds, “No, that is a bottle of catsup; the glass is further to your right.” In this way the user guides the robot through the difficult object recognition task, greatly enhancing the likelihood of success. Similarly, the robot has enabled the human to have more control over his environment.


Figure 1: Service Robot System Testbed

In the Intelligent Robotics Laboratory, we are using HuDL to guide the development of a cooperative service robot team consisting of a dual armed stationary humanoid, called ISAC (see Figure 2), and a modified Helpmate mobile robot, simply called Helpmate (see Figure 3). The user interfaces currently under development include speech recognition (for verbal input), text to speech (for verbal output), vision (for tracking of faces and hands as well as many other tasks), gesture (a vision based interface), force, touch and sound localization. These interfaces are being used to make the overall interaction with ISAC and Helpmate into a natural “multimedia” experience that is comfortable for non-technical users. ISAC is even able to play an electronic musical instrument, a theremin, with perfect pitch. Indeed, this may be one of the most interesting social skills of a service robot to date.

Figure 2: Our humanoid robot, ISAC.


Figure 3: Our mobile robot, Helpmate.

The organization of the paper is as follows. Section 2 describes human-robot interaction, approaches to such interaction used by other researchers, our approach (HuDL), and reviews some of the many interaction modalities that are available. Section 3 relates the human-robot interaction problem to the more general problem of system integration and presents the Intelligent Machine Architecture, our solution to dealing with the complexities of system integration. Section 4 describes the application of the IMA to service robots and evaluates its performance. Section 5 describes our robot testbed, the service robot tasks we are addressing, and our progress. Finally, Section 6 gives concluding remarks.

2 Human Directed Local Autonomy

For the past ten years, our lab has focused on service robotics research. Specifically, we have been developing robotic aid systems for the disabled (Bagchi and Kawamura 1994), (Pack and Iskarous 1994), (Kawamura et al. 1996a). We have continually observed that a key research issue in service robotics is the integration of humans into the system. This has led to our development of guiding principles for service robot system design (Kawamura et al. 1996b). Our paradigm for human/robot interaction is HuDL, initially introduced in (Kawamura et al. 1996a).

2.1 Human/Robot Interaction

In traditional industrial robotics settings, a human presence is not only undesirable; it can be dangerous. In a service robotics setting, however, a human presence is not only possible, it can be highly desirable. Why should humans and robots interact? Obviously, if a service robot is to be of use to the human, it must be able to communicate with the human. The human needs some way of telling the robot what service to perform, or to stop what it is doing, or even to provide an evaluation of how well the robot is performing.

The role of the human in controlling the robot can be described by a spectrum (Pook and Ballard 1995). At the extremes of the spectrum are teleoperation and full autonomy. These extremes have both advantages and disadvantages. Teleoperation is tiring and tedious for the user, while full autonomy is very difficult to achieve and is often fragile. The middle of the spectrum tries to balance robot autonomy and human control in order to keep the advantages of the extremes, while avoiding their disadvantages.


Integrating humans into a robot system in this way has several advantages. First is the use of the human's intelligence and decision-making abilities. For example, the human can interpret the robot's sensor data (e.g., indicating a target object in a camera scene), thereby simultaneously reducing the computational burden on the robot and increasing the robustness of the overall system. The human can also detect an exceptional or error situation and assist the robot in recovering (Frohlich and Dillmann 1993). However, the human is not directly or explicitly driving the robot's actuators. This relieves the human of the often tedious and frustrating, or in the case of the physically disabled user, impossible task of manual teleoperation.

2.2 Deictic Control

Several systems using a deictic or pointing strategy have been developed for human/robot interaction. In these systems, the user can select and parameterize a robot behavior from a set of possible behaviors. The user parameterizes the behavior by pointing. The Deictic Teleassistance model (Pook and Ballard 1995) uses a sign language for controlling a robot. The user can select from five signs, which in turn select from a set of four behaviors. The signs contain both semantic (i.e., which behavior is selected) and spatial (i.e., what is the context of the behavior) content. To input the signs, the user wears a glove that measures his finger joint angles and the 3D location of his hand.

The testbed robot for this sign language was a PUMA manipulator with a Utah/MIT hand; the task was to open a door. Users controlled the robot using both teleoperation and teleassistance via the sign language. On average, users spent only 7 seconds controlling the robot using teleassistance, compared with 33 seconds using teleoperation. Remote experiments were also performed, in which the user wore a head-mounted video display that showed images from cameras mounted near the manipulator. In these experiments communication latency was simulated by introducing delay into the transmission of the remote video and the user's input. These experiments showed that teleoperation completion time increased linearly with latency, from an average of about 50 sec with 0.5 sec latency to about 130 sec with 2 sec latency, while teleassistance completion time, on average, was between 20 and 40 sec regardless of latency.

Other researchers have used pointing to control mobile robots. The HyperScooter (Shibata et al. 1995) is a scooter-type wheelchair robot with a camera. The user in this case rides the robot and controls it using a GUI mounted on the handlebars. This GUI presents the user with a shared view of the video from the front-mounted camera, in which the user can indicate areas, and with buttons for other user input. To move the robot, the user selects either a "Go Toward" or "Turn" behavior. The "Go Toward" behavior requires the user to select a portion of the shared view, which the robot's vision system uses for tracking. The robot then moves until the user presses a button labeled "Here." The robot then prompts the user to select another image region as a visual cue to the robot. This cue is used later when the robot executes a learned task. The robot learns a task by storing a sequence of object-action-cue triples called Visually Governed Actions (VGA). The object is the target for the tracking routine of the vision system, the action is the robot behavior (e.g., "Go Toward"), and the cue is the visual cue used to determine when to transfer to the next VGA.

Supervised Autonomy (Cheng and Zelinsky 1997) is another deictic control system for a mobile robot. As with the HyperScooter, the user has a GUI with a set of behaviors and a shared view of the robot's visual input. The user selects areas in the shared view which are used by the robot's vision system. The robot's vision system can also suggest areas of the image which appear "interesting", that is, areas which produce a high response from an interest operator applied to the image. In addition to user-selected behaviors, the robot also has a collision avoidance behavior which can override the user's selected behavior. During operation, the robot sends notifications to the user when certain events occur, such as the activation of the collision avoidance behavior or a failure in the tracking system. The robot uses a Purposive Map (PM) to record these events and display them for the user.
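The common pattern in the deictic systems surveyed above is that the user selects one of a small, fixed set of behaviors and supplies its spatial context by pointing at part of a shared camera view. The following C++ sketch illustrates only that pattern; it is not code from any of the cited systems, and the behavior names, the ImageRegion type, and the dispatch mechanism are invented for exposition.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>

// A pointed-at region of the shared camera view (pixel coordinates).
struct ImageRegion { int x, y, width, height; };

// A deictic command: which behavior to run, plus its spatial context.
struct DeicticCommand {
    std::string behavior;   // e.g. "GoToward" or "Turn"
    ImageRegion target;     // region the user selected in the shared view
};

// The robot exposes a small, fixed set of parameterizable behaviors.
using Behavior = std::function<void(const ImageRegion&)>;

int main() {
    std::map<std::string, Behavior> behaviors;
    behaviors["GoToward"] = [](const ImageRegion& r) {
        std::cout << "tracking region at (" << r.x << "," << r.y << ") and driving toward it\n";
    };
    behaviors["Turn"] = [](const ImageRegion& r) {
        std::cout << "turning until region at (" << r.x << "," << r.y << ") is centered\n";
    };

    // The user points at part of the shared view and picks a behavior from the GUI.
    DeicticCommand cmd{"GoToward", {120, 80, 40, 40}};
    behaviors.at(cmd.behavior)(cmd.target);   // semantic + spatial content together
    return 0;
}
```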

While all these systems use similar techniques for controlling the robot, i.e., selecting and parameterizing behaviors, it is interesting to compare the modes of feedback used by each. In the case of the Deictic Teleassistance system, no mention is made of feedback, other than video images in the case of remote operation. The HyperScooter provides the user with a shared view of the robot's video input, with textual output of the current VGA and with images of visual cues. The Supervised Autonomy system also provides a shared view of the robot's video input, as well as notifications to the user and a graphical display of the purposive map.

2.3 Collaborative Discourse

Collaborative systems feature interaction between humans and computational agents to solve a problem. The agents in these systems are intelligent, autonomous software entities that conduct a discourse with the human to communicate plans and set goals. Collagen (Rich and Sidner 1997) is a toolkit for building the "front end" of such agents; that is, the part of the agent that communicates with the user. As the authors put it, "Collagen provides a generic framework for recording and communicating the decisions made by the agent (and the user), but not for making them". The agent and human communicate using an artificial language that is application-specific. Although the framework is application-independent, the agent designer must provide a model of the task. This model includes the artificial discourse language and the recipe library.

One area of interest in collaborative systems research is mixed-initiative interaction. In such a system, either the human or the computational agent can take the initiative in the conversation. Novick and Sutton (1997) discuss the definition of initiative and identify three factors of initiative:

· Choice of task - what the conversation is about

· Choice of speaker - who speaks and when

· Choice of outcome - who does what

The authors discuss several models of initiative in the context of these three factors, identifying which factors each model incorporates.

2.4 HuDL

The previous sections described two paradigms of interaction for humans and artificial agents or robots: deictic control and mixed-initiative collaborative discourse. In the deictic systems, the human initiates most actions taken by the robot; the robot's autonomy is limited to a set of behaviors, such as collision avoidance. Interaction with the robot is limited: the human gives commands and the robot displays some sensory input or a representation of its internal state. In the collaborative systems, the artificial agent is autonomous, and though it cooperates with the human, and may need the human's help in solving a problem, the agent can "take control" of the conversation; it need not wait for the human's input. In addition, the discourse occurs in a high-level artificial language or a natural language.

We would like more interaction between human and robot and more robot autonomy than exists in the deictic systems. However, we are not attempting to achieve in our robots the level of autonomy and ability of the agents in collaborative systems. Our paradigm for human/robot interaction, HuDL, tries to achieve a balance between the deictic and collaborative paradigms. HuDL has three important aspects: interaction, rhythm, and adaptation.

By interaction we mean that the human and robot exchange information. This interaction allows the human to influence the robot's actions in some way and acquire feedback about the robot's internal state. This interaction can also be used by the robot to confirm the human's input and request the human's assistance. Although the interaction is not necessarily conducted in a high-level language, as in the collaborative systems, the goal is the same: for each participant to convey information about its own actions, goals, or plans, and to acquire information about the other participant's actions, goals, and plans.

We would like this interaction to occur periodically; that is, it should have rhythm. Rhythmic responses from computer systems have become important indicators of correct function: animated cursors for windowing systems, animated icons for web browsers, even the sound of disk access. All these things let us know that our computers are working. If the computer ceases to give some type of rhythmic response, i.e., the cursor stops moving, the disk access stops, etc., the human knows that it has "locked up" and should take corrective action. The same idea can apply to a robot. As long as it can provide a rhythmic response, the human is more confident that it is functioning. If the human-robot interaction is rhythmic, exceptional or error states can be detected by the simple technique of the timeout. The human expects the robot to respond within a certain time interval. If it does not, the human may suspect the robot has failed in some way. Likewise, the robot also expects some input from the human. If the human does not provide it, the robot may ask if the human is busy, confused, or perhaps even in need of medical attention.
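As a concrete illustration of the timeout technique, the following sketch shows a simple watchdog that either side of the interaction could run. The class name and the 30-second and 5-second intervals are illustrative assumptions, not values taken from the IMA.

```cpp
#include <chrono>
#include <iostream>

using Clock = std::chrono::steady_clock;

// Watchdog for the rhythmic exchange described above: each side records when it
// last heard from the other and flags a possible failure after a timeout.
class RhythmMonitor {
public:
    explicit RhythmMonitor(std::chrono::seconds timeout)
        : timeout_(timeout), last_(Clock::now()) {}

    void notifyResponse() { last_ = Clock::now(); }   // call on every message or heartbeat

    bool partnerSilent() const {                      // true if the rhythm has been broken
        return Clock::now() - last_ > timeout_;
    }

private:
    std::chrono::seconds timeout_;
    Clock::time_point last_;
};

int main() {
    RhythmMonitor robotSide(std::chrono::seconds(30));   // robot watching the human
    RhythmMonitor humanSide(std::chrono::seconds(5));    // human (or UI) watching the robot

    // ... inside the interaction loop ...
    if (humanSide.partnerSilent())
        std::cout << "robot has not responded: warn the user of a possible fault\n";
    if (robotSide.partnerSilent())
        std::cout << "user has not responded: ask if the user is busy or needs help\n";
    return 0;
}
```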

As the human becomes more familiar with the robot, and proficient in interaction, his interaction patterns will change. Consider the user of a word processor. At first, as the user discovers the capabilities of the software, he probably does things "the hard way," by searching through menus, etc. As he becomes more familiar with the program, he begins to learn shortcuts. Likewise, the user of a robot is likely to desire shortcuts in using the robot. Besides the obvious customizations of shortcuts, macros and preferences, the adaptive aspect of the human-robot interaction provides richer avenues for "personalizing" the robot. For example, through the use of an appropriate learning mechanism, the robot may learn to anticipate what the human will request next. Although it would be annoying and possibly dangerous to have the robot execute tasks before it is asked, the robot could prepare to execute the task by, for example, searching for target objects or positioning its manipulators.

Currently our concept of HuDL is still under development. We are using the concepts of HuDL to design a primitive agent within the IMA for user interaction (the IMA is described in detail in Section 3). This primitive agent, or "HuDL agent," will process user input and dispatch events to other IMA primitive agents. The HuDL agent will also be a "clearinghouse" for user notifications. Thus, the HuDL agent will act as a go-between for the robot and the human.
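A minimal sketch of the go-between role envisioned for the HuDL agent is given below: user input events are dispatched to subscribing primitive agents, and notifications from agents are funneled back to the user through a single point. The event types, class names, and publish/subscribe style are our own illustrative assumptions; the actual HuDL agent is to be built from IMA components.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// A user input event, e.g. {"speech", "pick up the red glass"}.
struct UserEvent { std::string type; std::string payload; };

class HuDLAgent {
public:
    using Handler = std::function<void(const UserEvent&)>;

    void subscribe(const std::string& eventType, Handler h) {
        handlers_[eventType].push_back(std::move(h));
    }
    void dispatch(const UserEvent& e) {                 // user input -> primitive agents
        for (auto& h : handlers_[e.type]) h(e);
    }
    void notifyUser(const std::string& agent, const std::string& msg) {  // agents -> user
        std::cout << "[" << agent << "] " << msg << "\n";
    }

private:
    std::map<std::string, std::vector<Handler>> handlers_;
};

int main() {
    HuDLAgent hudl;
    hudl.subscribe("speech", [&](const UserEvent& e) {
        hudl.notifyUser("TaskAgent", "parsing command: " + e.payload);
    });
    hudl.dispatch({"speech", "pick up the red glass"});
    return 0;
}
```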

2.5 Interaction Modalities for Service Robots

Since service robots are specifically intended to perform tasks for and around humans, special consideration must be given to the modalities of interaction between robots and people. To make the interaction more "normal" for the human user, communication should go far beyond human keyboard input and flashing lights on the robot. We want the human and robot to interact using as many different media as possible. Ideally, these media include those natural to human interaction, such as speech (Frohlich and Dillmann 1993), (Stopp et al. 1994), gestures (Pook and Ballard 1995), and touch (Muench and Dillmann 1997). The motivation for using these natural modes is that they are familiar, more comfortable, and require less training for the user. Computer-based GUIs with pointing devices can also be used effectively (Cleary and Crisman 1996) and are also "natural" interfaces for robots.

As an example, imagine again that a robot is requested by the user to retrieve a certain item, but the robot’s visual object classification system is not robust enough to identify correctly the object in the current environment. The robot can indicate its best estimate of what it “thinks” is the object to the user. The user responds either by saying “Yes, that is the item,” or “No, the item I want is the large item to the right.” Use of symbolic terms such as “to the right” is more natural and convenient for human-robot interaction.

2.5.1 Human Detection and Localization

Service robots are expected, by definition, to provide services to human users. It is, therefore, critically important that service robots be aware of humans. The sensing group report from the recent International Workshop on Biorobotics: Human-Robot Symbiosis clearly identified this capability as one of the foremost abilities required for service robots (Sandini et al. 1996). The report went on to say, "Traditionally, humans are treated as obstacles for the robot. That is, in a human-robot symbiotic system where humans and robots act in close association, each must be aware of the other in the sense that they must be capable of understanding each other's behaviors and, to some extent, predict each other's intentions."

If the robot is to interact with humans it is important for the robot to know when a human is present. This awareness of presence may indicate to the robot that it is being observed or that the user desires to give input. In addition to simply knowing that a human is present, a well-designed service robot will benefit from knowing where this human is in relation to itself. It is clear that human localization is essential for a service robot, both to avoid harming users and bystanders and to enhance communication with humans. The requirement of human detection and localization is quite possibly unique to socially intelligent service robotics. Most other fields of robotics are not so closely physically coupled with humans.

Human detection is a higher-level capability that employs several input modalities. We choose to list it here as a modality itself in order to emphasize it as a capability of central importance. Various technologies, such as the ones described below, must be integrated into the overall system and their results fused to robustly determine the presence of a human.

Face Detection and Tracking

One feature of humans that robots can be trained to recognize is the face. Face detection and tracking involve determining whether a face is represented in the visual system of the robot (Sinha 1994), (Sinha 1994b), (Podilchuk 1995), (Qiu 1997). Once a face is detected, the robot's attention may be directed toward that person. Tracking allows the robot to continue to focus attention on the user while moving. Our lab has developed a face detection algorithm that can be modified into a tracking algorithm.

Skin Detection and Tracking

The color of human skin is another feature that a robot can be trained to recognize. Detecting skin color usually indicates that a person is present. Skin detection and tracking can be used in ways similar to face detection and tracking. However, skin tracking is not limited to tracking faces; it can also be used for tracking hands, feet, or whatever may be useful for the specific application. Our skin tracking algorithm, which incorporates both skin color detection and direction of the camera head, has performed well in our experiments.

Sound Localization

Detection of humans is not limited to the visual system of service robots. The auditory system may also be used. One capability of a microphone array is sound localization, the ability to determine the direction from which a sound has come. Sound localization, which may be used to direct the robot's attention toward the location of the sound, is currently being studied in our lab.

Identification of Users

Combining the above technologies gives the robot an awareness that a human is in its presence, provided that the user is visible or has spoken to the robot. This information may then be further combined to produce some confidence measure of the detection. Additionally, as future work will hopefully demonstrate, the robot may combine face, skin tone, and sound characteristics to identify and distinguish frequent users from one another.
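One simple way such a confidence measure could be formed is a weighted combination of the individual detector outputs, as in the sketch below. The weights, threshold, and linear fusion rule are illustrative assumptions only; the paper does not commit to a particular fusion method.

```cpp
#include <iostream>

// Illustrative fusion of the detection modalities above into a single confidence
// that a person is present.
struct DetectionEvidence {
    double faceConfidence;    // from face detection/tracking, 0..1
    double skinConfidence;    // from skin-color detection, 0..1
    double soundConfidence;   // from sound localization, 0..1
};

double humanPresenceConfidence(const DetectionEvidence& e) {
    // Simple weighted combination; a real system might use Bayesian or fuzzy fusion.
    return 0.5 * e.faceConfidence + 0.3 * e.skinConfidence + 0.2 * e.soundConfidence;
}

int main() {
    DetectionEvidence e{0.8, 0.6, 0.4};
    double c = humanPresenceConfidence(e);
    std::cout << "presence confidence = " << c
              << (c > 0.5 ? " -> direct attention toward the user\n" : "\n");
    return 0;
}
```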

2.5.2 Speech

Speech is often regarded as one of the most natural forms of communication between humans. Robots that relate to humans should, then, be able to incorporate that type of communication. Speech recognition is a maturing technology that allows a machine to understand not just that something was spoken, but also which words and phrases were spoken. As a result of the advances in this area, there are several commercial products currently available that are cost-effective and have appropriately large vocabularies. We are planning to use IBM ViaVoice as our speech recognition engine. The related problem of interpreting the recognized speech is currently under investigation in our lab and will be integrated into the overall system.

Text-to-speech technology allows the robot to speak text rather than just display it on a screen. Advances in this aspect of speech technology have likewise generated a market of available products. The current tools offer a variety of properties for customizing the characteristics of the voice, including gender and speed. Currently we are developing text-to-speech synthesis using the Microsoft Speech Software SDK.


2.5.3 Gesture

Gesture is also very prevalent in human interaction, and is often used to add emphasis. Since gestures are used to communicate many things, an important design decision is to investigate what kinds of gestures will be useful in service robotics. Although many researchers have investigated the understanding of finger spelling or American Sign Language, those types of gestures are not the most natural for communicating with a service robot. Gestures that are more applicable are those that do not have a formal syntax or function as a formal language. Designing the robot to understand gestures that supplement language and convey more basic information will enhance other interaction modalities rather than merely duplicate effort. Some basic gestures convey commands to come or stop, or indicate selection or direction. All of these are related to action, place, or movement, which are physical and spatial rather than abstract verbal concepts.

An example of the type of gesture that conveys information is pointing. When the result of pointing is combined with a natural voice command, such as "hand me that block," the gesture disambiguates the command if several blocks are present but only one is pointed out. Our current work in gesture recognition involves finding a pointing human finger. This uses visual information in the form of color to determine skin-tone areas, and model matching to determine the location and orientation of the fingertip. This recognition will be expanded to tracking a pointing finger in order to convey trajectory information.
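As an illustration of the color stage of this pipeline, the sketch below classifies pixels as skin tone to produce a mask on which fingertip model matching could then operate. The RGB thresholds and data structures are rough, invented values, not those used in our system.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// First stage of the pointing-gesture pipeline: label skin-tone pixels.
struct RGB { std::uint8_t r, g, b; };

bool isSkinTone(const RGB& p) {
    // Very rough rule of thumb for skin color in RGB; real systems usually work in
    // a chromaticity or HSV space and are tuned on training data.
    return p.r > 95 && p.g > 40 && p.b > 20 &&
           p.r > p.g && p.r > p.b &&
           (p.r - std::min(p.g, p.b)) > 15;
}

std::vector<std::uint8_t> skinMask(const std::vector<RGB>& image) {
    std::vector<std::uint8_t> mask(image.size());
    for (std::size_t i = 0; i < image.size(); ++i)
        mask[i] = isSkinTone(image[i]) ? 255 : 0;
    return mask;   // model matching on this mask would then locate the fingertip
}
```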

2.5.4 Physical Interaction

A major distinction of service robots from many other socially intelligent agents is that service robots engage in physical interaction. These robots must be aware of their surroundings and of other actors in the environment. This knowledge is key to the safe and successful execution of physical manipulations, which may include touching a human, handing off an object, moving an object, or moving toward an object. Physical interaction is directly concerned with the sensors and actuators of the robot.

The robots in our lab have capabilities to detect and affect elements in their environments. Sensors are the components of a robot that are used to detect what is in the robot's surroundings. A CCD camera is a very common sensor in robotics for acquiring visual information, and the cameras we use are likewise common. Infrared (IR) sensors are also helpful in determining the presence of humans; currently we are researching the use of an array of IR sensors to detect humans and their motion. Force and tactile sensors are used to let the robot know that it is touching another object and how hard. These are being used in a variety of situations; one current use of the force sensors is to detect when the robot's hand has touched the table while searching for an object on it. Each of these sensors can also be used in control algorithms to achieve the desired motion performance of the actuators.

Actuators are the components of a robot that enable it to physically manipulate objects in its environment. These include arms, hands, legs, wheels, motors, etc. The robot controls the actuators to achieve its goals.

There are many interaction modalities available, and very likely more will come as technology progresses. A major challenge in robotics is the integration of such modalities into a coherent system. Our approach is described next.

3 The Intelligent Machine Architecture

Although the robot may not have high levels of capability or even a large repertoire of tasks, it is important that the robot system be a completely integrated system and not a demonstration of isolated capabilities. A fully integrated system, however simple and limited, permits the HuDL principles to be applied. A software architecture is proposed as a tool to support the integration process. Thus, the architecture helps developers of the service robot system to make changes and add capabilities more rapidly than with other architectural approaches. The IMA is a tool for the development of intelligent machines.

The key to building a system using the HuDL principles is to build a fully integrated system and gradually improve elements of the system design based on feedback from user interactions. Therefore, we developed the Intelligent Machine Architecture (IMA) (Pack et al. 1997, Kawamura and Pack 1997, Pack 1998), a two-level software architecture for rapidly integrating the many elements of an intelligent machine, such as a service robot. The high-level model is called the robot-environment model, and it describes the software system in terms of a group of primitive software agents connected by a set of agent relationships. The concepts used for this agent-based decomposition of the system are inspired by Minsky's The Society of Mind (Minsky 1986) and object-oriented software engineering (Rumbaugh et al. 1991). The architecture also defines an implementation-level model, called the agent-object model, that describes each of the agents and relationships in the high-level model as a network of software modules called component objects. The separation of the architecture into two levels allows designers of intelligent machine software to address software engineering issues such as reuse, extensibility, and management of complexity, as well as system engineering issues such as parallelism, scalability, reactivity, and robustness. The IMA draws on ideas from many modern robot software architectures, including the Subsumption Architecture (Brooks 1989), AuRA (Arkin 1990), GLAIR (Hexmoor, Lammens, and Shapiro 1995), ANA (Maes 1993), and others, and represents the synthesis of these ideas with those from software architecture research (Shaw et al. 1995, Sztipanovits, Karsai, and Franke 1995) into a pattern for the development of software subsystems of intelligent machines that emphasizes integration and software reuse.

Figure 4: Correspondence of IMA Agents to System-Level Entities

3.1 Robot-Environment Model

In the language used by Rumbaugh (Rumbaugh et al. 1991), IMA primitive agents are actors (they use other primitive agents as resources and have a thread of execution) and servers (they provide resources for other primitive agents as well). A primitive agent within the IMA has certain properties that separate it from typical uses of the term "agent" and focus on what is essential to being a useful abstraction for software system integration. Some authors stipulate that agents must be able to reason, hold explicitly represented beliefs, communicate in formal languages, and maximize their own utility (Wooldridge 1994, Rao 1996), but that is not essential to the usefulness of the agent concept as an abstraction for the development of software systems (Baeijs, Demazeau, and Alvares 1996, Overgaard, Petersen, and Perram 1994). The primary features of an agent that make it a useful abstraction for software development are autonomy, proactivity, reactivity, connectivity, and resource parsimony. The abstractions of primitive agents and relationships between agents described below help the IMA realize these properties for its agent network.

IMA primitive agents are asynchronous, decision-making software modules. The collection of asynchronously executing agents provides an abstraction that eliminates many concerns about synchronization, because the architecture provides each agent with a knowledge of time that allows each local decision process to evaluate information based on its age relative to that agent. The agents in IMA work less like traditional software systems and more like a set of coupled, closed-loop systems, continually sampling inputs, updating states and computing new outputs. Relationships provide the coupling between the computation within these agents.
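The closed-loop style described above can be pictured with the following sketch of an agent's internal loop: it samples a timestamped input, decides whether the data is fresh enough to use, updates its state, and publishes a new output. The class and method names are illustrative; in the IMA these roles are filled by component objects rather than a single C++ class.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// A value carried between agents, tagged with the time it was produced so the
// receiving agent can judge it by its age.
struct TimedValue {
    double value = 0.0;
    Clock::time_point stamp = Clock::now();
    bool freshWithin(std::chrono::milliseconds maxAge) const {
        return Clock::now() - stamp <= maxAge;
    }
};

class LoopAgent {
public:
    void run(std::atomic<bool>& stop) {
        while (!stop) {
            TimedValue in = sampleInput();                       // read from a relationship/link
            if (in.freshWithin(std::chrono::milliseconds(200)))  // decide based on data age
                state_ = 0.9 * state_ + 0.1 * in.value;          // update internal state
            publishOutput(state_);                               // make result available to peers
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
    }

private:
    TimedValue sampleInput() { return {}; }   // placeholder for a link read
    void publishOutput(double) {}             // placeholder for a link write
    double state_ = 0.0;
};
```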

The IMA defines several classes of primitive agents and describes their primary functions in terms of modeling the environment, the intelligent machine itself, or behaviors and tasks developed for the machine (Figure 4). The classification is based on experience with agent-based control software for intelligent service robots (Bagchi and Kawamura 1992) as well as ideas from Minsky (Minsky 1986). Lim (Lim 1994) and Suehiro and Kitagaki (Suehiro and Kitagaki 1996) also developed multi-agent software systems based on ideas from Minsky, but each used a fixed set of relationship types between all agents. Figure 5 shows how IMA agents are grouped into several classes, described below.

· Sensor agents provide an abstraction of sensor hardware and incorporate basic sensor processing and filtering. Lim (Lim 1994) and Suehiro and Kitagaki (Suehiro and Kitagaki 1996) used agents to encapsulate sensors in a similar manner.

· Actuator agents provide an abstraction of controlled actuator hardware and incorporate servo control loops. Lim (Lim 1994) and Suehiro and Kitagaki (Suehiro and Kitagaki 1996) also used agents to encapsulate actuators in a similar manner.

· Environment agents perform the anchoring process and contain mechanisms that process sensor data to update an abstraction of the environment. Minsky (Minsky 1986) suggests that groups of agents might be used to form models of the environment in this way.

· Skill agents encapsulate closed-loop processes that combine sensors and control actuators to achieve a certain sensory-motor goal. Firby's Reactive Action Packages (Firby 1995) are somewhat like skill agents.

· Behavior agents are a simplified subset of skill agents. The purpose of behavior agents is to define a class of highly reactive agents suitable for implementing safety reflexes for an intelligent machine.

· Task agents encapsulate sequencing mechanisms that decide which skill and environment agents to invoke and in what order. The planning agents in the ISAC 2 system (Bagchi and Kawamura 1992) were like these agents, and ISAC 2 exhibited some intelligent behavior with simple scripting within its task agents.
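Purely as an illustration of this classification (see the list above), the skeleton below arranges the agent classes as specializations of a common primitive-agent interface. The IMA itself realizes these classes as configurable networks of component objects, not as a fixed C++ class hierarchy.

```cpp
#include <string>

// Illustrative skeleton of the IMA agent classes as C++ specializations.
class PrimitiveAgent {
public:
    explicit PrimitiveAgent(std::string name) : name_(std::move(name)) {}
    virtual ~PrimitiveAgent() = default;
    virtual void update() = 0;   // one pass of the agent's asynchronous loop
protected:
    std::string name_;
};

class SensorAgent : public PrimitiveAgent {
public: using PrimitiveAgent::PrimitiveAgent;
    void update() override { /* read hardware, filter */ }
};
class ActuatorAgent : public PrimitiveAgent {
public: using PrimitiveAgent::PrimitiveAgent;
    void update() override { /* run servo control loop */ }
};
class EnvironmentAgent : public PrimitiveAgent {
public: using PrimitiveAgent::PrimitiveAgent;
    void update() override { /* anchor environment model to sensor data */ }
};
class SkillAgent : public PrimitiveAgent {
public: using PrimitiveAgent::PrimitiveAgent;
    void update() override { /* sensory-motor closed loop */ }
};
class BehaviorAgent : public SkillAgent {       // simplified, highly reactive skill
public: using SkillAgent::SkillAgent;
    void update() override { /* fast safety reflex */ }
};
class TaskAgent : public PrimitiveAgent {
public: using PrimitiveAgent::PrimitiveAgent;
    void update() override { /* sequence skill and environment agents */ }
};
```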

The primitive agent serves as the scaffolding for everything the intelligent machine knows or does relating to a specific element of the robot, task, or environment, much like the concept of object in object-oriented systems. For example, Figure 4 shows that IMA agents are built to represent the physical resources of the robot (e.g., Arms, Pan-Tilt Head, Image Capture, Sound Capture, Speech I/O, Sonar, LIDAR) as well as behaviors (e.g., Avoid Collisions, Coordinate Movements, Emergency Reflexes), skills (e.g., Visual Tracking, Grasping, Visual Servoing), and tasks (e.g., Feed Soup, Find Object, Assemble Parts). However, the model of the environment is also developed as a set of agents (e.g., Bowls, Blocks, Parts, Walls, Places, Forks, Collections) that engage in a process of anchoring (Saffiotti, Konolige, and Ruspini 1994) to keep their state coherent with the world as experienced by the robot's sensor agents. Another software agent represents the user of the system; the user agent combines information from other agents to estimate the state of the user and represents what the system knows about the user for use by other primitive agents.

Page 12: Toward Socially Intelligent Service Robotseecs.vanderbilt.edu/cis/papers/SocIntAgent.doc  · Web viewConsider the user of a word ... Speech recognition is a maturing technology that

Figure 5: IMA Agent and Relationship Classes

3.2 Relationships

One novel aspect of the IMA is the explicit use of a set of architectural connectors called agent relationships. These relationships are abstractions of interaction between agents that include typical software interactions (such as function signatures or sequences of method calls) as well as less structured interactions (such as spreading activation or motion schema). Agent relationships combine resource-limiting mechanisms and more explicit interaction data flow under the same general concept. The relationships between agents are implemented as sets of reusable component objects that are aggregated within a hosting agent, allowing that agent to participate in a given relationship. Each relationship is implemented once and configured for use throughout the network of agents. Most multi-agent systems focus on the use of a single type of interaction. ISAC 2 (Bagchi and Kawamura 1992) and JANUS (Beyer and Smieja 1994) both use blackboards through which agents communicate. Lim (Lim 1994), Steiner et al. (Steiner et al. 1993), and Rao (Rao 1996) use message passing of strings in artificial languages as the communication path between agents. Maes (Maes 1993), Blumberg and Galyean (Blumberg and Galyean 1995), Bagchi (Bagchi, Biswas, and Kawamura 1996), and Overgaard (Overgaard, Petersen, and Perram 1994) connect the system with dynamic numerical connections, much like a neural network or electrical circuit.

The IMA also defines a simpler connector called the agent link. Agent links are two-way data-flow connectors that have explicit roles (input links connect to output links). Agent links were adopted from the recent development of "plug and socket" architectures (Shaw et al. 1995) in the discipline of software engineering. The agent link captures a direct connection between two agents. Relationships between agents contain and manage a set of these agent links. The idea of agent links and the management of links by an agent relationship is illustrated in Figure 6. The relationship enables an agent to manage many links of the same class from other agents. Thus, there is a link class corresponding to each relationship class that defines the contribution of a participant in that relationship.
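The sketch below illustrates the link/relationship distinction: an agent link is one half of a two-way connection with an explicit role and a named counterpart, while a relationship owns and manages all links of one class held by the hosting agent. The field and class names are illustrative assumptions, not the IMA's actual types.

```cpp
#include <string>
#include <vector>

// One half of a two-way agent connection ("plug" or "socket").
struct AgentLink {
    enum class Role { Input, Output };
    Role role;
    std::string counterpartAgent;   // name of the agent at the other end
    double activation = 0.0;        // used by some relationship classes for weighting
    double value = 0.0;             // data carried across the connection
};

// A relationship hosted by an agent: it owns all links of one class and decides
// how their contributions are combined or scheduled.
class Relationship {
public:
    explicit Relationship(std::string cls) : class_(std::move(cls)) {}

    AgentLink& addLink(AgentLink::Role role, std::string counterpart) {
        links_.push_back({role, std::move(counterpart), 0.0, 0.0});
        return links_.back();
    }
    const std::vector<AgentLink>& links() const { return links_; }

private:
    std::string class_;             // e.g. "motion schema", "sensor flow"
    std::vector<AgentLink> links_;
};
```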

Figure 6: IMA Agent-Links and Relationships

All agent relationships result from the need to manage some type of resource conflict in the agent network. When multiple agents contribute commands for actuator motion, arbitration between the contributions is needed. Both actuator arbitration and sensor fusion share the same structure: many inputs, one output. When multiple agents share the output of sensor processing, bandwidth control is needed to effectively utilize the underlying computational fabric for the relevant agents. These situations are examples of the types of resource conflicts that can reduce the effective use of parallel paths between sensing and action. Figure 5 shows a division of IMA agent relationships into classes; geometric transform, sensor flow, spreading activation, message flow, Cartesian, simple, and motion schema are examples of some of the current classes.

Motion Schema (Arkin 1990, Overgaard, Petersen, and Perram 1994) relationships represent motion commands to actuator agents so that multiple command sources may be combined based on the value of an activation for each participant. Subsumption (Brooks 1986) is a simplified special case of this type of relationship in which the arbitration order is fixed at design time. Sensor flow relationships are a reusable component for managing bandwidth-limited sensor updates; each participant receives data at rates dictated by the activation levels of each participant. Message flow relationships are like the mailbox or message port concept used in agent-network systems whose primary communication path is language based. Spreading Activation (Maes 1993, Bagchi, Biswas, and Kawamura 1996) relationships use a spreading activation algorithm to update the activation levels of different agents. This relationship can be used both to control operation and to focus attention (Anderson 1983) in the agent network.
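The following sketch shows activation-weighted command arbitration in the spirit of the Motion Schema relationship: several agents contribute a motion command with an activation level, and the relationship blends them into a single actuator command. Replacing the weighted sum with a fixed priority ordering would give the subsumption special case. The command fields and the normalization step are illustrative choices.

```cpp
#include <vector>

// One contribution to an actuator agent: a proposed velocity command plus the
// activation level of the contributing agent.
struct MotionContribution {
    double vx, vy, omega;   // proposed velocity command
    double activation;      // how strongly the contributing agent wants it
};

// Activation-weighted arbitration: many inputs, one output.
MotionContribution arbitrate(const std::vector<MotionContribution>& inputs) {
    MotionContribution out{0, 0, 0, 0};
    double total = 0.0;
    for (const auto& c : inputs) {
        out.vx    += c.activation * c.vx;
        out.vy    += c.activation * c.vy;
        out.omega += c.activation * c.omega;
        total     += c.activation;
    }
    if (total > 0) { out.vx /= total; out.vy /= total; out.omega /= total; }
    return out;   // single command passed on to the actuator agent
}
```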

3.3 Agent-Object Model

This level of the model describes how an agent network defined by the robot-environment model is constructed from a collection of component objects. Strong encapsulation and separation of policy, mechanism, and representation are sought at this level of the model. The goal of such separation is to make the development process of agents more structured and to facilitate the reuse of software components between agents in the network. The following paragraphs describe the component object classes within an agent.

· Agent components are component objects that support a set of interfaces for management by the agent manager as well as interfaces for persistence to streams (blocks of bytes) and storages (hierarchical databases of named streams). An agent component generally also supports an external interface that exposes its specific functionality. Representations, agent relationships, and policy components are all subclasses of agent components.

· Composite components are agent components that support additional interfaces to manage and store a set of subordinate objects. These components act as containers for other agent components, thus providing for a hierarchical composition of components within an agent.

· Each agent has only one agent manager. This object supports a set of interfaces to establish local connectivity, provides a high-resolution local clock, and locates other agents in the system. The agent manager is responsible for handling the process of loading and initializing the connections between other agent components. Each agent component may define a set of outgoing ports called component links. These ports are references to other components within the agent, and the references are automatically managed by the agent manager.

· Policy components give the agent its internal activity and control its response to events and the results of computation. The policy component is an object that encapsulates an operating system thread (i.e., it is scheduled for execution by the underlying operating system) and provides an explicit representation of the control state within an agent.

· Representation components are the foundation of communication between agents. A representation component is a distributed component object that communicates its state to a set of proxy objects in other agents.

· Mechanism components are configurable objects that can be invoked to perform one of a set of computations. Mechanism components use a set of component links to representation components as part of their configuration. The mechanism component is like the "command" pattern described by Gamma (Gamma et al. 1995).

· Component managers permit interconnection with standard OLE and ActiveX software technologies, which are used to build multimedia documents and other document-centered software on the PC platform. Component managers provide a view of a single agent component when inserted into an OLE or ActiveX container. They allow the components of agents to be visualized using graphical editors and support rendering of graphical or textual representations into web pages containing an ActiveX component manager. This concept is an example of the bridge design pattern (Gamma et al. 1995) used to connect two separate systems, and the component managers are the bridge components: they support some custom IMA interfaces and some standard OLE and ActiveX interfaces.

· Agent links contain a bundle of representation components and define a set of roles for each relationship component. A pair of agent link components are software constructs that work like a plug and socket.

· Relationship components manage a set of agent links and provide mechanisms to selectively update and use contributions from each participant link. These components provide a local view of the relationship between agents to all the other components within the hosting agent. They can be used to bring together the representations and mechanisms needed to arbitrate between multiple input commands or to use arbitration to control the update of multiple output data streams.

The structure (Figure 7), functionality (Figure 8), and policy (Figure 9) of an agent are clearly separated into encapsulated components by the agent-object model. Any program has these aspects, but frequently they are mingled or woven together in lines of source code. By separating these aspects it is possible to develop more structured and reusable software at a relatively fine-grained level. These aspects address connectivity, data flow, and control flow as separate views of the software system.

The structural aspect of the agent addresses two separate system concepts. Figure 7 shows that the agent is organized as a tree of component objects originating with the agent manager object as the root. This hierarchical structure is efficient for grouping and collecting the various components within the agent as well as for storing the components to persistent storage. A non-hierarchical connection between agent components is supported as well. This type of connection is called the component link. Each agent component is designed to support a set of incoming interfaces and to explicitly connect to a set of interfaces on other components. These links are managed automatically by the framework and constitute a kind of interface-connection architecture as described by Shaw (Shaw et al. 1995).


Figure 7: Structural Aspect

Figure 8: Functional Aspect


Figure 9: Policy Aspect

3.4 Configuration of the System

The configuration of an IMA system is stored within the properties of the components of the primitive agents in the system. The structure of each agent is stored as a hierarchy of objects beneath an agent manager object. Each component within an agent is provided with a persistent model for linking to other components within the same agent or within another agent. The configuration of agent components is stored as a set of component properties and a set of component links. The properties of a component are a set of name-value pairs provided through an interface implemented on each component. Some of these properties store the names of other components needed by a component; these names are called component links. Connections between agents are stored in the persistent properties of components called agent links. These components store the name of a counterpart (a component of the same type in another agent) and a role that allows them to reconnect with their counterpart when the agent is loaded from persistent storage. The configuration of a Robot-Environment software system within the IMA has three levels, where configuration changes in one level are kept somewhat isolated from the other levels:

· Agent components have a set of properties that control their operation. Properties are named values of any typical programming type, including indexed arrays of such values. Thus, the gains of a control law component or the initial weights of an arbitration mechanism would be component properties for the respective component.

· Primitive agents are a set of agent components and the connections between those components. The agent manager is configured with a set of agent components, and each component supports both incoming and outgoing connections. An agent is configured by configuring all the outgoing connections, called component links, and setting the properties for each agent component.

· Robot-Environment models are sets of primitive agents and sets of relationships that connect them. A system of agents may be configured by specifying the counterpart for each agent link, or by adding and removing agent links from the relationships of each agent.
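To make the three levels concrete, the sketch below stores a fragment of configuration as name-value properties, component links, and an agent link with a counterpart and role. The property names, agent names, and map-based format are invented for illustration; the IMA persists this information through COM streams and storages rather than in this form.

```cpp
#include <iostream>
#include <map>
#include <string>

// Configuration for one agent component: its own properties plus the names of
// the other components (within the same agent) that it links to.
struct ComponentConfig {
    std::map<std::string, std::string> properties;       // e.g. control gains, weights
    std::map<std::string, std::string> componentLinks;   // outgoing component links
};

// Configuration for one primitive agent, owned by its agent manager.
struct AgentConfig {
    std::map<std::string, ComponentConfig> components;
};

int main() {
    AgentConfig armAgent;
    armAgent.components["JointController"].properties["gain.kp"] = "2.5";
    armAgent.components["JointController"].componentLinks["setpoint"] = "TrajectoryGenerator";

    // An agent link records its counterpart in another agent plus its role,
    // so the connection can be re-established when the agent is reloaded.
    armAgent.components["MotionSchemaLink"].properties["counterpart"] = "AvoidCollisions";
    armAgent.components["MotionSchemaLink"].properties["role"] = "input";

    for (const auto& [name, comp] : armAgent.components)
        std::cout << name << " has " << comp.properties.size() << " properties\n";
    return 0;
}
```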

The decoupling of interfaces between each level of configuration is what permits the IMA to cope with changes in the software system. The configuration of agent components that defines a primitive agent can be varied, as long as the agent supports a superset of its current relationships. This allows existing integration to be preserved as new functionality is added to each agent. This decoupling of design is critical to the rapid and efficient replacement and augmentation of the algorithms that guide an agent's operation: IMA agents can be improved without breaking integration with existing agents. The set of agents and links in the robot-environment model can also be varied, as long as the same relationships are used. For example, new object agents may be created to represent new elements of the robot environment. If these agents support the same set of relationships as the other environment agents, then the addition requires a reconfiguration at the robot-environment level only. These two examples illustrate the usefulness of maintaining separate but coupled abstractions: the robot-environment model and the agent-object model.

Finally, the interfaces that define the core of IMA operation are based on a binary standard for component objects, the Component Object Model (COM) (Microsoft 1996). Any language or tool that can generate component objects conforming to this standard can be used to add new components to the overall system or to implement tools for use with the system. At this time, C++, Visual Basic, and Java support development of components using this standard.
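Conceptually, a COM interface is a table of virtual functions with reference-counted lifetime management. The simplified C++ sketch below mimics that shape; it is only an analogy, since real COM interfaces derive from IUnknown and also implement QueryInterface, which we omit here for brevity, and the class names are hypothetical.

#include <iostream>
#include <string>

// Simplified stand-in for a COM-style interface: pure virtual methods plus
// reference counting. Real COM components additionally implement QueryInterface
// so that clients can discover other interfaces at run time.
struct IRepresentation {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual void Update(const std::string& data) = 0;
    virtual ~IRepresentation() = default;
};

class VectorSignalRep : public IRepresentation {
public:
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long r = --refs_;
        if (r == 0) delete this;
        return r;
    }
    void Update(const std::string& data) override {
        std::cout << "VectorSignalRep updated with: " << data << "\n";
    }
private:
    unsigned long refs_ = 1;   // created with one outstanding reference
};

int main() {
    IRepresentation* rep = new VectorSignalRep(); // the client sees only the interface
    rep->Update("joint angles");
    rep->Release();                               // the last release destroys the object
}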

4 Service Robot Systems: An Application of the IMA

4.1 Evaluation of IMA

The initial implementation of the IMA has been evaluated to discover how it affects the development process and the properties of the resulting system (Pack 1998). One purpose of a software architecture is to provide useful abstractions that work at a level higher than traditional programming methods (Shaw et al. 1995) and confer some set of properties on conforming systems. Another purpose is to describe the decomposition of the system into components, the integration between them, and their functionality in a form that can guide and improve the development process. Typical evaluations of an architecture consider capability, scalability, and productivity. Capability means that the architecture provides a means to express concepts or describe systems that cannot be adequately described by other architectures. Scalability means that the architecture adequately handles systems as they grow larger. Productivity means that the architecture provides abstractions (and tools) that increase the productivity of developers.

4.2 Capability

Capability is difficult to measure rigorously; in a strict sense, all architectures that are Turing complete are equally capable, because each can be used to compute any computable function. The practical question is therefore how easily different systems can be modeled within the architecture and how the resulting software performs. The IMA follows a relatively intuitive and extensible object-oriented model at both levels of the system. This model encourages reuse of software and encapsulation of software behind explicit incoming and outgoing interface definitions at both levels. In addition, two very different robot systems were modeled and controlled using IMA agent networks: ISAC, a humanoid robot, was modeled as a network of IMA agents, as was Helpmate, a mobile manipulator. From our experience implementing the IMA on these two service robots, we can qualitatively observe that the architecture is capable of expressing robot-environment models for both robot system types using combinations of the basic elements of the architecture.

4.3 Scalability

Scalability can be measured in many ways. Essentially, it means that the architecture can solve larger problems with proportionally larger resources, where the resources are often described in terms of processing power or communication bandwidth between elements of the system. Centralized architectures frequently have problems with scalability, because the performance of the central component limits the performance of the total system. Distributed architectures do not automatically escape this problem, because growth of the communication overhead between system elements may also limit system performance. The IMA addresses scalability by avoiding centralized bottlenecks wherever possible and by remaining asynchronous and distributed. Furthermore, the relationships in IMA permit developers to exert various kinds of bandwidth control on the links to other agents. This helps to avoid communication bottlenecks in the system and is a beneficial side effect of using a global attention-focusing mechanism.

The timing of updates between agents was measured to assess the communication overhead in the agent network. Figure 10 shows the update times when the agents are on other computer systems connected by a 10 Mbps 10Base-T Ethernet network, with the single source representation communicated to 1, 2, 4, 8, or 16 other agents. As expected, the update time varies linearly with the number of receiving agents, and timing is very consistent when updating the representations of agents on the same computer system. The network load during these tests was relatively light, but the network was not dedicated to this test in isolation. The important result is that the update time is roughly proportional to the number of connected remote representations. The update times are plotted for 25 trials, with each line representing a different number of agents receiving the update from the single source.
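The structure of the experiment can be pictured with a small timing harness like the one below. This is our own sketch using a dummy in-process update call rather than the networked IMA test code, but it mirrors the procedure: time how long one source takes to push an update to N proxy representations, repeated over 25 trials for each N.

#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

// Dummy stand-in for pushing one update to a remote proxy representation.
// In the real experiment this is a network call; here we simply sleep briefly.
void pushUpdateToProxy() {
    std::this_thread::sleep_for(std::chrono::microseconds(200));
}

int main() {
    const int trials = 25;
    for (int proxies : {1, 2, 4, 8, 16}) {
        double totalMs = 0.0;
        for (int t = 0; t < trials; ++t) {
            auto start = std::chrono::steady_clock::now();
            for (int p = 0; p < proxies; ++p) pushUpdateToProxy();   // one source, N receivers
            auto stop = std::chrono::steady_clock::now();
            totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
        }
        // If the cost is per connection, the mean should grow roughly linearly with N.
        std::cout << proxies << " proxies: mean update time "
                  << totalMs / trials << " ms\n";
    }
}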

In order to give the reader some context for understanding the different primitive agents and components being tested, the following definitions are made:

Vector Signal: Vector Signal is a representation component that encapsulates a history of time-stamped vectors. It is used to store, and to pass between components, such data as sensor readings and actuator state variables.

Motion Link: Motion Link is an agent link manager component used for controlling actuator agents. The component has three Vector Signal subcomponents, or aspects: Position, Command, and ArbData. The Position aspect is used by the actuator agent to report its current state. The Command and ArbData aspects are used by the actuator agent's arbitration mechanism to compute the control input for the actuator.

Simple Schema Rel.: Simple Schema Relationship is used by actuator agents for command arbitration. The relationship component can contain many Motion Link subcomponents. The command contributions of these subcomponents are combined using a weighted average and the result is the control input for the actuator.

Net Image Rep.: Net Image Representation encapsulates an array of image pixels. It can transmit its data to proxy representations across a network.

General Engine: General Engine is an engine, or policy, component. It performs a fixed sequence of mechanism activations, repeated at a configurable interval. This engine is useful for agents that need only repeat a series of operations, such as hardware sampling.
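A minimal sketch of the data structures just defined may help. The code below is our own illustration, with hypothetical names: a time-stamped vector sample in the spirit of the Vector Signal component, and the weighted-average combination of command contributions performed by the Simple Schema Relationship.

#include <iostream>
#include <vector>

// Time-stamped vector sample, in the spirit of the Vector Signal component.
struct VectorSample {
    double timestamp;
    std::vector<double> values;
};

// One contributor's command and its arbitration weight (cf. the Motion Link's
// Command and ArbData aspects).
struct Contribution {
    std::vector<double> command;
    double weight;
};

// Weighted-average arbitration, as described for the Simple Schema Relationship.
std::vector<double> arbitrate(const std::vector<Contribution>& contribs) {
    if (contribs.empty()) return {};
    std::vector<double> result(contribs.front().command.size(), 0.0);
    double totalWeight = 0.0;
    for (const auto& c : contribs) {
        for (size_t i = 0; i < result.size(); ++i) result[i] += c.weight * c.command[i];
        totalWeight += c.weight;
    }
    if (totalWeight > 0.0)
        for (double& v : result) v /= totalWeight;
    return result;   // effective control input for the actuator
}

int main() {
    std::vector<Contribution> contribs = {
        {{1.0, 0.0}, 0.75},   // e.g., a visual servoing contribution
        {{0.0, 1.0}, 0.25},   // e.g., a collision avoidance contribution
    };
    std::vector<double> cmd = arbitrate(contribs);
    std::cout << "effective command: (" << cmd[0] << ", " << cmd[1] << ")\n";
}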

Figure 11 shows the update times of a motion schema relationship, which involves combining the commands of several contributing agents. Each line in the figure represents a given number of contributors to the relationship, and for each contributor set the update timing was measured 25 times. Each update involves updating many representation components and computing an effective value from the contributions within the relationship. The experiment included agents running both locally and remotely. The intent of these measurements is to show that the communication time between agents linked by representations, links, and relationships grows roughly linearly with the number of connections to the agent hosting the relationship. If the update times exhibited faster-than-linear growth, the communication burden would severely limit the growth of system size; the observed linear growth leaves open the possibility of scaling the IMA to larger systems. It would, however, be ideal if the IMA could use broadcast or multicast communication to further reduce network overhead.

One way of putting these measurements in context is to consider the update time of the arm controller used in the ISAC system. The robot should be able to take action and respond appropriately as updates propagate through the agent network on the various distributed machines. Comparing the update time for these components with the sampling time of the SoftArm controller agent, which is about 20 milliseconds in the current ISAC system, shows that an event that causes an update in the motion schema contributions can propagate through the network and become part of an effective contribution to motion within a few samples of the servo control loop. Thus, these tests show that the basic IMA components are fast enough to provide a substrate for distributed, reactive control while also providing a flexible interface that admits high-level context influences.

Figure 10: Remote Vector Signal Representation Performance (Pack 1998)

Figure 11: Motion Schema Relationship Performance (Pack 1998)

4.4 Productivity

Architecture-based design can increase productivity by making the development of each component easier, by supporting high levels of component reuse, or by supplying tools that automatically handle redundant or repetitive coding. The IMA tries to improve productivity by supporting very strict encapsulation and consistently high levels of reuse at the agent component level, and by offering some reuse at the agent level. The IMA also benefits from tool support, because it was explicitly designed to use common, commercially available software tools wherever possible.

In the example systems presented above, many components were used multiple times in the development of the agent network for ISAC and were reused in the development of the agent network for the Helpmate robot. Tables 1 and 2 list some of the reusable components developed for demonstrations of the ISAC system and how many times each component was used in agents that are part of the robot-environment model for ISAC. Table 3 shows a similar breakdown for some of the Helpmate robot agents. The tables also show a reuse fraction for each agent, the number of reused components divided by the total number of components in the agent; for example, the Arm Agent in Table 1 has 11 reused components out of 16 total, for a reuse fraction of 0.69. A high reuse fraction implies that less development work was spent on the agent, while a low reuse fraction implies that many special components had to be developed for it. A reused component is one that appears in different classes of agents in the same system; a specialized component is one developed for a particular subclass of agent. Specialized components may also be reused, but they are not counted as reused in the tables, so the tables reflect a conservative measure of reuse of agent components.

The tables show that for this set of agents in the ISAC robot model, significantly more than half of the components are reused. It should be noted that because the ISAC system is continually evolving, these agents represent only a subset of the robot model for the ISAC system. They are also some of the lowest-level agents in the robot model: they encapsulate actual robot hardware interfaces and therefore tend to have higher numbers of specialized components, since each hardware interface is unique. The high measures of reuse also indicate that the abstractions selected in the design of the IMA fit the actual implementation process.

Another interesting measure is the reuse of components between systems: components developed for ISAC that were also reused in the agent network for the robot-environment model of the Helpmate system. There are similarities between the two robots, so a high level of reuse between the systems was expected. Table 4 shows some of the components developed for ISAC that were used unchanged (except for configuration of their properties) on the Helpmate robot. Viewed another way, the only new components developed for Helpmate involved hardware, algorithms, or interfaces specific to Helpmate or to mobile robots. Thus, in developing Helpmate, the developers needed to write only the components that relate to its specific hardware or behaviors; the rest of the system, a significant fraction, was reused from the development of the ISAC agents.

Table 1: Reuse of Components in ISAC Agents (Pack 1998)

Component             Arm Agent (Each)   Head Agent   Gripper Agent   Force Control
Vector Signal Rep.            6               2              2               4
Motion Link                   3               3              1               1
Simple Schema Rel.            1               1              0               0
General Engine                1               1              1               1
Specialized                   5               1              1               2
Total Components             16               8              5               8
Reuse Fraction             0.69            0.88           0.80            0.75

Table 2: Reuse of Components in Additional ISAC Agents (Pack 1998)

Component             Color Seg.   Tracking Agent   Visual Servo Agent   Drawing Agent
Vector Signal Rep.         0              2                  0                 2
Motion Link                0              0                  3                 1
Net Image Rep.             2              4                  0                 0
Vector Link                0              0                  5                 0
General Engine             1              1                  0                 1
Specialized                2              3                  4                 2
Total Components           5             10                 12                 6
Reuse Fraction          0.60           0.70               0.67              0.67

Table 3: Reuse of Components in Helpmate Agents (Pack 1998)


Component             Drive Agent   Sonar Agent   LIDAR Agent   Head Agent
Vector Signal Rep.          2             1             1             2
Motion Link                 3             0             0             0
Simple Schema Rel.          1             0             0             1
General Engine              1             1             1             1
Specialized                 1             1             1             1
Total Components            8             3             3             5
Reuse Fraction           0.88          0.67          0.67          0.80

Table 4: Reuse of Components Between ISAC and Helpmate (Pack 1998)

ISAC Component        Number of ISAC Uses   Number of Helpmate Uses
Vector Signal Rep.             18                        6
Motion Link                    12                        3
Simple Schema Rel.              2                        2
Net Image Rep.                  6                        0
General Engine                  7                        4

5 Testbed System

This section describes our testbed system for socially intelligent service robotics, which consists of a dual-arm humanoid robot, a mobile robot, and a human user (Figure 1).

5.1 Dual-arm Humanoid

Our dual-arm humanoid robot hardware consists of a pair of 6-degree-of-freedom SoftArm manipulators and a pan/tilt camera head (see Figure 2). The SoftArms are actuated by artificial pneumatic muscles, which gives the arms several advantages: they are low-power, lightweight, and naturally compliant, making them well suited to situations where contact with humans is necessary. On each of the SoftArms we have installed a 6-axis force/torque sensor at the wrist. The SoftArms are controlled by a PC expansion card designed in-house. The camera head is a Directed Perception pan/tilt unit on which we have mounted a stereo vergence platform designed in-house; this platform holds two color CCD cameras. The robot's software runs on a network of PCs.

5.2 Mobile Robot

Our platform for mobile robotics and mobile manipulation research is an augmented Helpmate robot (see Figure 3). The Helpmate is a differentially steered mobile base with a bank of forward- and side-looking SONAR sensors mounted in a vertical panel. To this robot we have added a laser ranger (LIDAR), a vision system based on a pan/tilt camera head, a manipulator, and two Pentium computers.

The manipulator is a 5-degree-of-freedom SoftArm mounted on the left side of the robot, directly behind the sonar panel; the entire workspace of the arm is on the left side of the robot. The Cost-effective Active Camera Head (CATCH) is a 4-degree-of-freedom stereo camera head with pan, tilt, and independent verge. CATCH has two color cameras and is mounted on a platform near the front of the robot. The camera head is offset to the left, so that the SoftArm's workspace can be viewed without occlusion by the body of the robot.

The robot has two onboard computers. A 150 MHz Pentium handles the drive, SONAR, LIDAR, and SoftArm, while a 166 MHz Pentium with MMX handles vision processing. The two computers are connected by a 10Base-T Ethernet network. In addition, the 150 MHz Pentium is connected to our lab's LAN by radio Ethernet. This allows the HelpMate to be part of a distributed multi-robot system.


5.3 Application of HuDL to Service Robotics

This section shows how the concept of HuDL can be used to integrate humans into a service robot system for aiding the physically disabled. We are currently developing demonstrations for our service robot system using the IMA. What follows is a description of a pair of related scenarios we have chosen to help us develop and integrate service robot technology, including human interfaces. Thus, in the words of the recent NSF Workshop on Human-Centered Systems, our research can be characterized at least in part as a “Testbed-Style Research Project within a Situated Context” (Woods and Winograd 1997). Our system uses ISAC and Helpmate as a service robot team. This pair of benchmarks was selected because it involves direct human interaction and a relatively unstructured environment, and because simplifying assumptions can be gradually relaxed to increase problem complexity as development proceeds.

5.3.1 Humanoid Service Robot Aid Task

ISAC is activated and the benchmark demonstration is started. ISAC begins scanning its environment for utensils, food, phones, and other items relevant to its task. Once a certain level of certainty about the environment is achieved, ISAC begins scanning in front of itself for a user; strange or badly classified items are noted for later use. If too much time passes, ISAC re-scans the environment and then returns to looking for a user. When a user is detected (via skin tone, face detection, voice command, or combinations of these), ISAC begins with a greeting using voice and gestures. ISAC then tries to determine the identity of the user, either through a detection algorithm or by asking for confirmation of identity. Identification is not necessary to proceed, but it can be used to customize responses or subsequent behavior. ISAC may then ask the user for help (through voice, gestures, and visual feedback) in identifying things that were not automatically classified in the initial pass, and introduce the user to some options.

Once the user is in place and the robot has begun interacting, the user might ask ISAC, via a voice command, to feed him soup. ISAC should confirm this with the user by voice and by gesturing toward the soup. If the soup is not on the table, ISAC should set the table from a side cart. If there is no soup at all, ISAC should say so and list the food items it actually has as alternatives for the user. Assuming that soup is requested and available, the demonstration proceeds.

ISAC should place the soup close to the user and pick up the spoon. The active vision system will then alternate between tracking the user and fixating on the bowl to guide both parts of the feeding motion. ISAC will enter a cycle of dipping up soup (dip confirmation might use color or force information, for example) and bringing it to the user's mouth. A force transient signals that the soup has been taken, and the cycle continues. At some point the phone starts ringing while the robot has soup in the spoon. ISAC should start returning the spoon to the bowl and begin locating the phone. As soon as the phone is located, the robot should pick it up with its other hand while shaking the soup off the spoon with its dipping hand. When the conversation is over, signaled by force on the phone, ISAC should hang up and ask whether it should resume feeding.

An IMA primitive agent decomposition of this system is shown in Figure 12. The robot's physical resources, the objects in its environment (e.g., table, spoon), its skills (e.g., visual servoing), and the tasks (setting the table, feeding, etc.) are all represented by primitive agents.


[Figure 12 depicts primitive agents at several levels: resource agents (ISAC Body, SoftArm L/R, Gripper Agent L/R, Pan-Tilt Head, Stereo B/W Image, Color Image, Voice I/O), environment object agents (Cup, Table, Spoon, Fork, Bowl, Napkin, Condiment), physical behavior/skill agents (Skin-tone Tracking, 3D Tracking, Visual Servoing L/R, Collision Avoid L/R), and task-level agents (Feeding, Set Table, Clear Table, Phone Answering, Pickup Utensil L/R).]

Figure 12: IMA Primitive Agent Decomposition for Dual-Armed Humanoid Service Robot

5.3.2 Mobile Manipulator Fetch Task

In this example, the user requests that HelpMate, the mobile manipulator, fetch an object, such as a soda can, from another room and bring it back to ISAC. In such a scenario, the robot must complete a series of subtasks in the proper order:

1. Go to the correct room
2. Pick up the soda can
3. Return to the original room
4. Hand the soda can to ISAC

We assume that in steps 1 and 3 the robot uses a map-based navigation program. The granularity of the map is fine enough to place the robot close to the target object, i.e., in the same room. In steps 2 and 4, the robot uses vision, both to move toward the target destination and to guide its manipulator for pickup and deposit. We will call steps 1 and 3 the map mode of operation, and steps 2 and 4 the visual servoing mode.

One problem that occurs in map mode is that the robot may become trapped or lost. In this case, the human assists the robot; the problem now becomes a question of how the human will know when to intervene. Our solution is twofold: (1) allow the human to monitor the robot’s progress using sensor data and (2) have the robot monitor its own progress and decide when to ask the human for help.

Sensor data, such as the robot's dead reckoning, range data from SONAR or LIDAR, or a video stream, can be used to give the human a graphical representation of the robot's surroundings. From this the human can identify the robot's position, decide whether the robot is lost, and, if so, determine how to correct the situation. Likewise, monitoring the robot's speed can indicate whether the robot has malfunctioned or become trapped.

For the robot to determine if it needs help, we will use supervisor primitive agents to monitor the robot’s progress. For example, the navigation supervisor will make an initial estimate of total spatial displacement and an initial estimate of the time required to finish the navigation phase. It will also monitor the robot’s speed. Should either the navigation take too long or the robot’s speed fall below a threshold for too long, the robot will request help from the human.
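The kind of monitoring rule described here can be sketched as follows. This is our own simplified illustration, with hypothetical thresholds: the supervisor estimates the expected completion time, then requests human help if the navigation phase runs long or the robot's speed stays below a threshold for too long.

#include <iostream>

// Simplified navigation supervisor: decides when to ask the human for help.
class NavigationSupervisor {
public:
    NavigationSupervisor(double distanceM, double expectedSpeedMps)
        : expectedTimeS_(distanceM / expectedSpeedMps) {}

    // Called periodically with elapsed time, current speed, and the time step.
    bool needsHelp(double elapsedS, double speedMps, double dtS) {
        if (speedMps < kMinSpeedMps) slowForS_ += dtS; else slowForS_ = 0.0;
        bool tooSlowTooLong = slowForS_ > kMaxSlowS;                    // possibly trapped
        bool takingTooLong  = elapsedS > kTimeMargin * expectedTimeS_;  // possibly lost
        return tooSlowTooLong || takingTooLong;
    }
private:
    static constexpr double kMinSpeedMps = 0.05;  // hypothetical thresholds
    static constexpr double kMaxSlowS    = 10.0;
    static constexpr double kTimeMargin  = 2.0;
    double expectedTimeS_;
    double slowForS_ = 0.0;
};

int main() {
    NavigationSupervisor sup(20.0 /* meters */, 0.4 /* m/s expected */);
    // Simulate a robot that stops moving partway through the run.
    double elapsed = 0.0, dt = 1.0;
    for (int i = 0; i < 120; ++i) {
        elapsed += dt;
        double speed = (elapsed < 30.0) ? 0.4 : 0.0;
        if (sup.needsHelp(elapsed, speed, dt)) {
            std::cout << "Requesting human help at t = " << elapsed << " s\n";
            break;
        }
    }
}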

In the visual servoing mode, the robot tracks the target object in a video image. To initialize the tracking, the robot interacts with the human to locate the object: the robot presents an image of the scene, highlighting its estimate of the object's location, and the human can either agree with the robot or direct it to look at another location. This process is repeated until the robot has found the object. Restricting the search space of the initialization in this way reduces the complexity of initializing a tracking algorithm. The interaction between the human and the robot can include many modalities, such as speech, mouse, touch-sensitive screen, and gesture.

A supervisor primitive agent for tracking can be used to ask the user to reinitialize the tracking algorithm. This agent can make assumptions about the velocity of the target object, based on information from the human; e.g., if the target is on a table, it can be assumed to be stationary. The supervisor can compute the object’s apparent velocity based on the robot’s velocity and the velocity of the active camera head, and compare this with the object’s assumed velocity. The tracking algorithm itself may report a confidence value to the supervisor primitive agent, providing additional information about the success of the tracking.
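In the same spirit, the tracking supervisor's consistency check can be sketched as a small predicate. This is again our own illustration with hypothetical tolerances: reinitialization is requested when the apparent motion of the target disagrees with its assumed velocity or when the tracker reports low confidence.

#include <cmath>
#include <iostream>

// Simplified tracking supervisor check. The apparent target velocity is what the
// tracker observes after compensating for robot and camera-head motion; the
// assumed velocity comes from the human (e.g., zero for an object on a table).
bool shouldReinitialize(double apparentVelMps, double assumedVelMps,
                        double trackerConfidence) {
    const double kVelToleranceMps = 0.1;   // hypothetical tolerance
    const double kMinConfidence   = 0.5;   // hypothetical confidence threshold
    bool inconsistentMotion = std::abs(apparentVelMps - assumedVelMps) > kVelToleranceMps;
    bool lowConfidence      = trackerConfidence < kMinConfidence;
    return inconsistentMotion || lowConfidence;
}

int main() {
    // Object assumed stationary on a table, but the tracker sees it drifting:
    // the supervisor would ask the human to reinitialize the tracker.
    std::cout << std::boolalpha
              << shouldReinitialize(0.3, 0.0, 0.8) << "\n"    // true: inconsistent motion
              << shouldReinitialize(0.02, 0.0, 0.9) << "\n";  // false: tracking looks fine
}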

An IMA primitive agent decomposition of this system is shown in Figure 13. The robot’s physical resources and skills (e.g., visual servoing, navigation), the human, and the fetch task itself are all represented by primitive agents.

[Figure 13 depicts primitive agents at several levels: resource agents (Drive Agent, SoftArm, SONAR Agent, LIDAR Agent, Pan-Tilt Head, Stereo Color Image, Color Image, Voice I/O), skill/behavior agents (Color Tracking, Model-match Tracking, Visual Servoing, Collision Avoidance, Map-Based Navigation, Vision-Guided Navigation, Failure Detection), task agents (Fetch Task, Pickup Object), and a Human agent.]

Figure 13: IMA Primitive Agent Decomposition of Fetch Task


5.3.3 Status of Current System

The task scenarios just described are not complete and will take some time to bring to completion. However, many of the individual components and primitive agents needed for these tasks have been completed. For example, in the case of the humanoid service robot aid task, we have completed development of the following primitive agents: Stereo Imaging, Pan-Tilt Head, Color Image, Skin-tone Tracking, Visual Servoing, Voice I/O, SoftArm, Gripper, Collision Avoidance, Table, Cup, Fork, Spoon, Bowl, and Pickup Utensil. The 3D Tracking and Feeding Task primitive agents are currently under development; the others are future work. For the mobile manipulator fetch task, we have completed the following primitive agents: Stereo Color Image, Pan-Tilt Head, Color Image, Color Tracking, Visual Servoing, SONAR, Voice I/O, Model-match Tracking, LIDAR, SoftArm, Collision Avoidance, Drive, and Map-Based Navigation. The Vision-Guided Navigation and Failure Detection agents are under development, and the rest are future work.

As an example, one task for which all the necessary agents have been completed is that of, upon request, having the humanoid present a beverage container to the human to drink. This task is activated when the speech I/O agent perceives the command "Give me a drink." The task agent then begins to look at the table, searching for the beverage container. The search first uses the object classifier to verify the object; the model-match tracking agent then provides a fixation point for servoing the arm to the container. The task agent commands the gripper to grasp the container and prepares to bring the beverage to the user. The user is located and fixated visually using the skin-tone tracking agent, and servoing is again used to bring the beverage near the user's mouth. The beverage has a straw in it, so the user can drink easily. The task agent perceives the user's satisfaction when the speech I/O agent perceives the response "I don't want any more to drink." This give-drink task agent is thus capable of interacting with the user and with the resources and skills of the robot in order to achieve the goals of the task.
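The sequential structure of such a task agent can be pictured as a simple state machine. The sketch below is our own schematic rendering of the give-drink sequence, not the actual agent code; each state stands in for an interaction with a skill or resource agent.

#include <iostream>

// Schematic state machine for the give-drink task agent.
enum class State { WaitForRequest, FindContainer, Grasp, ServoToUser, WaitForDone, Finished };

int main() {
    State s = State::WaitForRequest;
    while (s != State::Finished) {
        switch (s) {
            case State::WaitForRequest:   // speech I/O agent reports "Give me a drink"
                std::cout << "Heard request for a drink\n";
                s = State::FindContainer; break;
            case State::FindContainer:    // object classifier + model-match tracking fixate the container
                std::cout << "Located beverage container on the table\n";
                s = State::Grasp; break;
            case State::Grasp:            // visual servoing moves the arm; gripper agent grasps
                std::cout << "Grasped container\n";
                s = State::ServoToUser; break;
            case State::ServoToUser:      // skin-tone tracking fixates the user; servo toward the mouth
                std::cout << "Presented drink to user\n";
                s = State::WaitForDone; break;
            case State::WaitForDone:      // speech I/O agent reports "I don't want any more to drink"
                std::cout << "User is finished; returning container\n";
                s = State::Finished; break;
            case State::Finished: break;
        }
    }
}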

6 Conclusion

In this paper, we have presented our views on the development of socially intelligent service robots. In particular, we have focused on two main issues. The first is the interaction between the human user and the robot, the nature of this interaction (that it is rhythmic, dialog based, and adaptive), and the modalities of this interaction. Our research in this area is guided by our paradigm for human-robot interaction, Human Directed Local Autonomy (HuDL). The second is the difficult but critical issue of integrating a wide variety of complex technologies into one highly complex system. Our approach to this system integration problem has led to the development of the Intelligent Machine Architecture (IMA), also described in this paper, along with an evaluation of its effectiveness.

We have also described two example applications of how we are using HuDL and IMA to achieve a practical service robot system for aiding the physically disabled. We are currently implementing our ideas on the testbed systems described in this paper.

References

Anderson, J.R. 1983. The Architecture of Cognition. Harvard University Press.

Arkin, R.C. 1990. “Integrating Behavioral, Perceptual, and World Knowledge in Reactive Navigation,” Robotics and Autonomous Systems, 6:105-122.

Baeijs, C., Demazeau, Y., and Alvares, L. 1996. “SIGMA: Application of Multi-agent systems to cartographic generalization,” in Walter Van der Velde and John W. Perram, eds., Agents Breaking Away: 7th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, pp. 163-176, Springer-Verlag.

Bagchi, S. and Kawamura, K. 1992. “An architecture for a distributed object oriented robotic system,” Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 711-716.

Bagchi, S. 1993. “ISAC 2: The sequel.” Intelligent Robotics Lab Technical Report, Vanderbilt University.


Bagchi, S., and Kawamura, K. 1994. ISAC: A Robotic Aid System for Feeding the Disabled. Proceedings of the AAAI Spring Symposium on Physical Interaction and Manipulation. Stanford University.

Bagchi, S., Biswas, G., and Kawamura, K. 1996. “Interactive task planning under uncertainty and goal changes,” Robotics and Autonomous Systems, 18:157-167.

Beyer, U. and Smieja, F. 1994. “JANUS: A society of agents,” Technical Report GMD 840, German National Research Centre for Computer Science (GMD), Germany.

Blumberg, B.M. and Galyean, T.A. 1995. “Multi-level direction of autonomous creatures for real-time virtual environments,” Computer Graphics Proceedings, SIGGRAPH-95.

Booch, G. 1994. Object-Oriented Analysis and Design. Addison-Wesley.

Brooks, R.A. 1986. “A Robust Layered Control System for a Mobile Robot,” IEEE Trans. Robotics and Automation, RA-2(1).

Brooks, R.A. 1989. “The Whole Iguana,” in M. Brady, ed., Robotics Science, MIT Press.

Cameron, J., MacKenzie, D., Ward, K., Arkin, R.C., and Book, W. 1993. “Reactive Control for Mobile Manipulation,” Proc. IEEE Conf. Robotics and Automation.

Cheng, G. and Zelinsky, A. 1997. “Supervised Autonomy: A Paradigm for Teleoperating Mobile Robots,” Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems.

Cleary, M.E. and Crisman, J.D. 1996. “Canonical Targets for Mobile Robot Control by Deictic Visual Servoing,” Proc. IEEE International Conference on Robotics and Automation, pp. 3093-3098.

Crisman, J.D. and Bekey, G. 1996. Grand challenges for robotics and automation: The 1996 ICRA panel discussion. IEEE Robotics and Automation Magazine, 3(4):10-16.

Firby, R.J. 1995. “Task execution: Interfacing to reactive skill networks,” In Don't Know, pages 146-153.

Fischer, K., Muller, J.P., and Pischel, M. 1995. “A pragmatic BDI Architecture,” In Intelligent Agents II: IJCAI '95 Workshop, pages 203-218.

Friedrich, H., and Dillmann, R. 1995. Robot programming based on a single demonstration and user intentions. In Proceedings of the 3rd European Workshop on Learning Robots at ECML'95, M. Kaiser, editor, Heraklion, Crete, Greece.

Fröhlich, J., and Dillmann, R. 1993. Interactive robot control system for teleoperation. In Proceedings of the International Symposium on Industrial Robots (ISIR '93). Tokyo, Japan.

Gamma, E., Helm, R., Johnson, R., and Vlissides, J. 1995. Design Patterns: Elements of Reusable Object Oriented Software. Addison-Wesley.

Guinn, C. 1996. “Mechanisms for Mixed-Initiative Human-Computer Collaborative Discourse,” Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics.

Haller, S., McRoy, S., and Ali, S. 1997. “Towards a Model for Dialogic Discourse,” The AAAI Spring Symposium on Computational Models for Mixed Initiative Interaction, Stanford, CA.

Hexmoor, H.H., Lammens, J.M., and Shapiro, S.C. 1995. “An autonomous agent architecture for integrating unconscious and conscious, reasoned behaviors,” In AAAI Spring Symposium: Lessons Learned from Implemented Software Architectures for Physical Agents.

Iglesias, C.A., Gonzalez, J.C., and Velasco, J.R. 1995. “MIX: A general purpose multiagent architecture,” In M. Wooldridge, J.P. Muller, and M. Tambe, eds, Intelligent Agents II: IJCAI '95 Workshop, pages 251-267. Springer-Verlag.


Kawamura, K., Wilkes, D.M., Pack, T., Bishay, M., and Barile, J. 1996a. Humanoids: Future Robots for Home and Factory. In Proceedings of the International Symposium on Humanoid Robots, 53-62. Waseda University, Tokyo, Japan.

Kawamura, K., Pack, R.T., Bishay, M., and Iskarous, M. 1996b. Design Philosophy for Service Robots. Robotics and Autonomous Systems, 18:109-116.

Kawamura, K., and Pack, R. T. 1997. Object-Based Software Architecture for Service Robot Development. In Proceedings of the International Conference on Advanced Robotics.

Lim, W. 1994. “An agent-based approach for programming mobile robots,” Proc. IEEE Conference on Robotics and Automation, pages 3584-3589.

Lueth, T. C., Laengle, T., Herzog, G., Stopp, E., and Rembold, U. 1994. KANTRA - human-machine interaction for intelligent robots using natural language. In Proceedings of the ROMAN '94.

Luo, R.C. and Kay, M.G. 1989. “Multisensor integration and fusion in intelligent systems,” IEEE Transactions on Systems, Man and Cybernetics, 19(5).

Maaß, W. 1995. How Spatial Information Connects Visual Perception and Natural Language Generation in Dynamic Environments: Towards a Computational Model. In: A.U. Frank, W. Kuhn (eds.), Spatial Information Theory: A Theoretical Basis for GIS. Proc. of the Int. Conference COSIT'95, Semmering, Austria, 223-240. Berlin, Heidelberg: Springer.

Maes, P. 1993. “Behavior-based artificial intelligence,” Proceedings of the 2nd Conference on Adaptive Behavior. MIT Press.

Marcenac, P. 1997. “The multiagent approach,” IEEE Potentials, 16(1):19-22.

Mayfield, J., Labrou, Y., and Finin, T. 1996. “Evaluation of KQML as an agent communication language,” In M. Wooldridge, J.P. Muller, and M. Tambe, eds, Intelligent Agents II: Proceedings of the 1995 Workshop on Agent Theories, Architectures and Languages. Springer-Verlag.

Microsoft. 1996. The Microsoft Object Technology Strategy: Component Software, technical report, Microsoft, Redmond, Washington.

Minsky, M. 1986. The Society of Mind. Simon and Schuster.

Muench, S., and Dillmann, R. 1997. “Haptic Output in Multimodal User Interfaces,” In Proceedings of the 1997 International Conference On Intelligent User Interfaces (IUI97), 105-112. Orlando, FL.

Musliner, D.J. 1993. CIRCA: Cooperative Intelligent Real-Time Control Architecture. PhD thesis, University of Michigan.

Novick, G., and Sutton, S. 1997. “What is Mixed-Initiative Interaction?” In the working notes of the AAAI Spring Symposium on Computational Models for Mixed Initiative Interaction, Stanford University, March 24-26.

Overgaard, L., Petersen, H.G., and Perram, J.W. 1994. “Motion planning for an articulated robot: A multi-agent approach,” In John W. Perram and Jean-Pierre Muller, eds, Distributed Software Agents and Applications: 6th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, pages 206-219. Springer-Verlag.

Pack, R.T., and Iskarous, M. 1994. The Use of the Soft Arm for Rehabilitation and Prosthetics. In Proceedings of the RESNA 1994 Annual Conference, 472-475. Nashville, TN.

Pack, R.T., Wilkes, M., Biswas, G., and Kawamura, K. 1997. “Intelligent Machine Architecture for Object-Based System Integration,” Proc. AIM ’97.

Pack, R.T. 1998. IMA: The Intelligent Machine Architecture, Ph.D. dissertation, Vanderbilt University.


Penny, S. 1997. Embodied Cultural Agents: at the intersection of Art, Robotics, and Cognitive Science, in the working notes of the AAAI Fall Symposium on Socially Intelligent Agents.

Picard, R. 1997. Affective Computing. Cambridge, MA: MIT Press.

Podilchuk, C. and Zhang, X. 1995. Face Recognition Using DCT-Based Feature Vectors, Proc. Int. Conf. Acoust., Speech, and Sig. Proc., Atlanta.

Pook, P.K., and Ballard, D.H. 1995. Deictic Human/Robot Interaction. In Proceedings of the International Workshop on Biorobotics: Human-Robot Symbiosis, 259-269. Tsukuba, Japan.

Qiu, B. 1997. Face and Facial Feature Detection in a Complex Scene. M.S. Thesis, Vanderbilt University.

Rao, A.S. 1996. “AgentSpeak(L): BDI agents speak out in a logical computable language,” In Walter Van der Velde and John W. Perram, eds, Agents Breaking Away: 7th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, pages 42-71. Springer-Verlag.

Rich, C. and Sidner, C.L. 1997. “COLLAGEN: When Agents Collaborate with People,” Proceedings of the First International Conference on Autonomous Agents, 284-291.

Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W. 1991. Object Oriented Analysis and Design. Prentice-Hall.

Saffiotti, A., Konolige, K., and Ruspini, E.H. 1994. “A multivalued logic approach to integrating planning and control,” Technical report, Artificial Intelligence Center: SRI International.

Sandini, G., Yamada, Y., Wilkes, D.M., and Bishay, M. 1996. Sensing Group Report, Int. Workshop on Biorobotics: Human-Robot Symbiosis, Robotics and Autonomous Systems, vol. 18, pp. 207-211.

Shaw, M. and Garlan, D. 1996. Software Architecture: Perspectives on an Emerging Discipline. Prentice Hall.

Shaw, M., DeLine, R., Klein, D., Ross, T., Young, D., and Zelesnik, G. 1995. “Abstractions for software architecture and tools to support them,” IEEE Transactions on Software Engineering.

Shibata, T., Matsumoto, Y., Kuwahara, T., Inaba, M., and Inoue, H. 1995. "Hyper Scooter: a Mobile Robot Sharing Visual Information with a Human", Proc. IEEE 1995 International Conference on Robotics and Automation, pp. 1074-1079.

Sinha, P. 1994. Object Recognition via Image Invariants: A Case Study. In Investigative Ophthalmology and Visual Science, vol. 35, pp. 1735-1740, Sarasota, Florida.

Sinha, P. 1994b. Qualitative Image-Based Representations for Object Recognition, A.I. Memo No. 1505, Dept. of Brain and Cognitive Sciences, MIT.

Steiner, D., Burt, A., Kolb, M., and Lerin, C. 1993. “The conceptual framework of mai21,” In Cristiano Castelfranchi and Jean-Pierre Muller, eds, From Reaction to Cognition: 5th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, pages 217-230. Springer-Verlag.

Stopp, E., Gapp, K.P., Herzog, G., Laengle, T., and Lueth, T. 1994. Utilizing spatial relations for natural language access to an autonomous mobile robot. In Deutsche Jahrestagung für Künstliche Intelligenz (KI '94).

Suehiro, T. and Kitagaki, K. 1996. “Multi-agent based implementation of robot skills,” Proc. IEEE Conference on Robotics and Automation, pages 2976-2981.

Suehiro, T., Takahashi, H., and Yamakawa, Y. 1997. “Research on real world adaptable autonomous systems: Development of a hand-to-hand robot,” In RWC 1997.

Sztipanovits, J., Karsai, G., and Franke, H. 1995. “MULTIGRAPH: An architecture for model-integrated computing,” Proc. ICECCS'95, pages 361-368.

Woods, D., and Winograd, T. 1997. Breakout Group 3 Report: Human-Centered Design, NSF Workshop on Human-Centered Systems. Arlington, VA.

Wooldridge, M. 1994. Time, knowledge and choice. Technical report, Department of Computing, Manchester Metropolitan University, Chester Street, Manchester M1 5GD, United Kingdom.