Abstract
Human-robot interfaces (HRI) can be difficult to use. We examine urban search and rescue (USR) robots as an
example. We present here a theory of their use based on a model user written in the ACT-R cognitive
modeling language. The model, using a simulated eye and hand, interacts directly with an unmodified and
simple tele-operating task of maneuvering in an environment to avoid other moving objects. The model
user also does a secondary task, analogous to many HRI tasks. In addition to describing the knowledge the
human operator must have, as well as what aspects of the task will be difficult for the operator, the model
makes quantitative predictions about how the speed of the robot influences the quality of the navigation and
performance on the secondary task. These results are examples of the types of outputs available from a
model user. As the model now interacts with the USR simulator using only the bitmap, the model should
be widely applicable to testing other simulators and to actual robots. The model already suggests why
human-robot interfaces are difficult to use and where they can be improved.
Index Terms--
cognitive model, ACT-R, human-robot interfaces
Pass to Isaac, Rob, Rick Sherry, Robin Murphy, Salvucci, Woods, Clayton Lewis and Freed
Pass to “Nottingham guy” upon publication.
Page 1 document.doc 5/18/2023 - 3:14 PM
Using a simulated user to explore human robot interfaces
D. VAN ROOY, FRANK E. RITTER Member, IEEE, and R. ST.-AMANT, Member, IEEE
Simuser to explore HRI
Manuscript for Special Issue on Human-Robot Interaction - IEEE Transactions on Systems, Man and Cybernetics.
I. INTRODUCTION
In the future, robots might become completely autonomous and act largely independently.
However, such a level of independence has not yet been achieved and is in some cases simply undesirable.
Many of the tasks that robots face today like exploration, reconnaissance and surveillance, will continue to
require supervision [1]. Furthermore, people often do not have enough confidence in a completely
autonomous robot to let it operate independently. The extent to which robots are integrated into our
society will therefore depend largely on the robots' ability to communicate with humans in
understandable and friendly modalities [2].
There exists a large number of communication channels between human operator and robot that are
typically classified into verbal and non-verbal [3]. Lately, a great deal of interest has been directed towards
the development of multi-modal interfaces that combine several modalities such as natural-language based
interfaces, virtual reality based displays [4], emotive computing [5], and gesture recognition [6]. These
interfaces share many goals, including reducing the time needed to do a task, helping to avoid errors,
and building and supporting the human operators' trust in the system [7].
Despite its importance, a general theory of human-robot interface use seems to be lacking. Many human-
robot interfaces do not even respect the most fundamental HCI principles. In this paper, we will present the
beginnings of a theory that indicates the issues that make human-robot interfaces difficult to use.
Concurrently, we will present a quantitative tool in the form of a simulated user that can be used to identify
problems associated with human-robot interface use. Specifically, we introduce a methodology in which a
cognitive model autonomously exercises human-robot interfaces, indicating ways to improve the interface
and laying bare problems that can serve as starting points for a general theory of human-robot interface use.
One of the reasons that there does not seem to be a general theory of human-robot interface use is the
complexity of the task domain, which is reflected in the diversity in types of human-robot interactions. An
application that illustrates this well is that of robot assisted Urban Search and Rescue (USR). USR
involves the detection and rescue of victims from urban structures like collapsed buildings. Because of the
extreme physical and perceptual demands of USR, these applications are usually mixed-initiative human-
robot interactions, in which a human operator and a robot interact in some manner to produce adequate
performance [8]. This means that it might be optimal for the robot to exhibit a fair amount of autonomy in
some situations, for instance, in navigating in a confined space using its own sensors. However, other
situations might require human intervention: An operator may have to assist in freeing a robot because its
sensors do not provide enough information for autonomous recovery [8]. And yet further interventions,
some only imagined, such as providing medication to trapped survivors, will legally require human
intervention. This illustrates how, with enhanced robot autonomy, the role of the operator could
often shift from control to monitoring and diagnosis [1].
In many ways, USR can be viewed as a form of end-user programming: A user who is not the programmer
of the software being used is trying to get a task done. However, key aspects of end-user programming are
missing from many human-robot systems. In end-user programming, the generality of full-fledged
programming languages is typically traded for an applied extension language, which allows the user to
change the actions or interface of a system. Good examples are the formulas and macros written by
spreadsheet users, and the programming extensions for AutoCAD (in Lisp) and for Microsoft Word (in
Visual Basic). Complex user interfaces like Windows use a limited set of high level principles, such as
metaphor (a desktop with folders and recycle bins), consistency, and direct manipulation, from which
detailed rules are derived.
There are several reasons why these principles from end-user programming are missing from many
human-robot systems. First of all, the task domain of human-robot systems is more complex and diverse,
making it very hard to meet the needs of diverse users or come up with a general metaphor. Furthermore, these
systems are typically more expensive than regular commercial software packages. At the same time, they
are not built as often as regular software, and when they are built, it is usually not by people trained in HCI.
What is needed is a way to test and improve these interfaces.
II. USING A SIMULATED USER TO EXPLORE HUMAN ROBOT INTERFACES
In this section, we will introduce a cross-platform architecture in which a cognitive model simulates user
performance. Specifically, we will introduce a simulated user, consisting of a cognitive model and a pair of
simulated eyes and hands that can be applied to sample human-robot interfaces (or with additional
knowledge any other interface for that matter). Ultimately, the intention is to provide a quantitative tool to
guide the design process of human-robot interfaces. This tool will enable designers to apply psychological
theories in real time, providing a simulated user that acts like and interacts with the same interface as a real
user.
A cognitive model forms the cognition of our simulated user. A cognitive model is a theory of human
cognition realized as a running computer program. It produces human-like performance in that it takes
time, commits errors, deploys strategies, and learns. It presents a means of applying cognitive psychology
data and theory to HCI problems in real-time and in an interactive environment [9-11]. We have developed
a system consisting of the cognitive architecture ACT-R [12] and a simulated eyes and hands suite called
Segman [13] that can be applied to virtually any type of interface running on any operating system. We
will begin by describing the parts that make up the system and then give a demonstration. Subsequently, we will
discuss how this system can be applied as a simulated user to explore human-robot interaction, leading to
explanations of users' behavior and evaluation of interfaces.
A. The ACT-R architecture
The ACT-R architecture integrates theories of cognition [12], visual attention [14], and motor movement
[15]. It has been applied successfully to higher-level cognitive phenomena, such as modeling scientific
reasoning [16], differences in working memory [17], and skill acquisition [18] to name but a few. Recently
it has been applied successfully to a number of HCI issues [19] [20] [11].
ACT-R makes a distinction between two types of long-term knowledge, declarative and procedural.
Declarative knowledge is factual and holds information like “2 + 2 = 4”. The basic units of declarative
knowledge are chunks, which are schema-like structures, effectively forming a propositional network.
Procedural knowledge consists of production rules that encode skills and take the form of condition-action
pairs. Production rules correspond to specific goals or sub-goals, and mainly retrieve and change
declarative knowledge.
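The distinction between chunks and production rules can be illustrated with a minimal sketch. It is written in Python for readability and is not the ACT-R implementation or its API; the names Chunk, Production, fire_first_match, and the slot names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Chunk:
    """A declarative fact: a typed bundle of slot/value pairs (hypothetical structure)."""
    chunk_type: str
    slots: Dict[str, object] = field(default_factory=dict)


@dataclass
class Production:
    """A condition-action pair that tests, and may change, the current goal chunk."""
    name: str
    condition: Callable[[Chunk], bool]
    action: Callable[[Chunk], None]


def fire_first_match(goal: Chunk, productions: List[Production]) -> Optional[str]:
    """Fire the first production whose condition matches the goal; return its name."""
    for production in productions:
        if production.condition(goal):
            production.action(goal)
            return production.name
    return None


# Declarative knowledge: the fact "2 + 2 = 4" stored as a chunk.
ADDITION_FACT = Chunk("addition-fact", {"addend1": 2, "addend2": 2, "sum": 4})


def needs_answer(goal: Chunk) -> bool:
    """Condition side: the goal is an addition goal without an answer yet."""
    return goal.chunk_type == "add-goal" and goal.slots.get("answer") is None


def retrieve_sum(goal: Chunk) -> None:
    """Action side: copy the sum from the matching fact into the goal."""
    fact = ADDITION_FACT.slots
    if (fact["addend1"], fact["addend2"]) == (goal.slots["arg1"], goal.slots["arg2"]):
        goal.slots["answer"] = fact["sum"]


RULES = [Production("retrieve-sum", needs_answer, retrieve_sum)]
```

Given a goal chunk such as Chunk("add-goal", {"arg1": 2, "arg2": 2, "answer": None}), calling fire_first_match(goal, RULES) fires the single rule and fills the answer slot; a second call finds no matching rule.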
Besides the symbolic procedural and declarative components, ACT-R also has a sub-symbolic component
that determines the use of the symbolic knowledge. Each symbolic construct, be it a production or chunk,
has sub-symbolic parameters associated with it that reflect its past use. In this way, the system keeps track
of the usefulness of the symbolic information. Which information is currently available in the declarative
memory module is partially determined by the odds that a particular piece of information will be used in
that context.
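One common way to express how past use shapes availability is the ACT-R base-level learning equation, in which a chunk's activation is the logarithm of a sum of power-law-decayed traces of its past uses, B = ln(sum over j of t_j^(-d)). The sketch below computes this quantity; the function and parameter names are hypothetical, the decay value of 0.5 is the conventional default rather than a value reported for this model, and the contextual (associative) component of activation is left out.

```python
import math
from typing import Sequence


def base_level_activation(ages_s: Sequence[float], decay: float = 0.5) -> float:
    """Base-level activation from the ages of a chunk's past uses.

    ages_s: time in seconds since each past use (every value must be > 0).
    decay:  power-law decay exponent; 0.5 is assumed here as a default.
    """
    if not ages_s or any(age <= 0 for age in ages_s):
        raise ValueError("need at least one strictly positive age")
    # Each past use contributes age ** -decay; recent uses contribute more.
    return math.log(sum(age ** -decay for age in ages_s))
```

Under this equation, a chunk used recently or often has a higher activation than one used once long ago, which is the sense in which the system tracks the usefulness of symbolic information.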
An important aspect of the ACT-R architecture is that models created in it predict human behavior
qualitatively and quantitatively: Each covert step of cognition (production firing, retrieval from declarative
memory) and overt action (mouse-click, moving attention) has a latency associated with it that is based
on psychological theories and data. For instance, firing a production rule (a cognitive action) takes
50 ms (modulated by other factors such as practice), and the time needed to move a mouse is calculated
using Fitts' law (e.g., [21]). In this way, the system provides a way to apply psychological knowledge in
real-time.
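As a rough illustration of how such latencies combine into a time prediction, the sketch below adds the 50 ms production firing time to a movement time from the Shannon formulation of Fitts' law, MT = a + b * log2(D/W + 1). The coefficients a and b are placeholder values chosen for the example, not parameters taken from ACT-R or from this model, and the function names are hypothetical.

```python
import math

PRODUCTION_FIRING_S = 0.050  # 50 ms per production firing, as stated in the text


def fitts_movement_time(distance: float, width: float,
                        a: float = 0.0, b: float = 0.1) -> float:
    """Movement time in seconds: MT = a + b * log2(distance / width + 1).

    distance and width share the same unit (e.g., pixels).
    a and b are illustrative placeholder coefficients (assumptions).
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return a + b * math.log2(distance / width + 1)


def predicted_action_time(n_productions: int, distance: float, width: float) -> float:
    """Time for n serial production firings followed by one mouse movement."""
    return n_productions * PRODUCTION_FIRING_S + fitts_movement_time(distance, width)
```

For example, with these placeholder coefficients, two production firings followed by a movement to a target 10 units wide and 70 units away gives a prediction of roughly 0.4 s. Smaller or more distant targets increase the predicted time, which is how interface features such as button size enter the model's predictions.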
B. The perceptual-motor buffers
A schematic of the current implementation of the theory, ACT-R 5.0 (act.psy.cmu.edu/ACT-R_5.0), is
shown in Figure 1. At the heart of the architecture is a production system, which represents central
cognition and interacts with a number of buffers.
These buffers represent the information that the system is currently acting on: The Goal buffer contains the
present goal of the system, the Declarative buffer contains the declarative knowledge that is currently
available, and the perceptual and motor buffers indicate the state of the perceptual and motor module (busy
or free, and their contents). The communication between central cognition and the buffers is regulated by
Figure 1. ACT-R 5 system diagram. The production system and buffers run in parallel, but each component is itself serial. The graded areas indicate the novel functionality provided by SEGMAN that overrides the original perceptual-motor functionality of ACT-R 5, which is indicated by the dashed lines.
production rules. As mentioned, production rules are condition-action pairs: The first part of a production
rule, the condition-side, typically tests if certain declarative knowledge (in the form of a chunk) is present
in a certain buffer. The second part, the action side, then sends a request to a buffer to either change the
current goal, retrieve knowledge from a buffer such as declarative memory, or perform some action.
The perceptual and motor buffers allow the model to “look” at an interface and manipulate objects in that
interface. The perceptual buffer builds a representation of the display in which each object is represented
by a feature. Productions can send commands to the perceptual buffer to direct attention to an object on the
screen and create a chunk in declarative memory that represents that object and its location on the screen.
The production system can then send commands, initiated by a production rule, to the motor buffer to
manipulate these objects.
Central cognition and the various buffers all run in parallel with one another, but each of the perceptual and
motor buffers is serial (with a few rare exceptions) and can only contain one chunk of information. This
means that the production system might retrieve a chunk from declarative memory, while the perceptual
buffer shifts attention and the motor buffer moves the mouse. We will mainly concentrate on the motor and
perceptual buffer, which are most relevant for our purpose.
C. Segman and ACT-R 5
ACT-R 5 in its current release (act.psy.cmu.edu) interacts with interfaces using a Perceptual-Motor buffer
(ACT-R/PM). ACT-R/PM [20] includes tools for creating interfaces and annotating existing interfaces in
Macintosh Common Lisp so that models can see and interact with objects in the interface. This allows
most models to interact in some way with most interfaces that are written in that language, and lets all
models interact with all interfaces written with the special tools.
For our simulations, we developed a more general version of ACT-R/PM, which provides ACT-R 5 direct
access to an interface, thus removing the need for a specific interface creation tool. This is done by
extending ACT-R/PM with the Segman suite
(www.csc.ncsu.edu/faculty/stamant/cognitive-modeling.html).
As Figure 1 shows, Segman [13] takes pixel-level input from the screen (i.e., the screen bitmap), runs the
bitmap through image processing algorithms, and builds a structured representation of the screen. This
representation is then passed to ACT-R through the ACT-R/PM theory of visual perception (i.e., the perceptual
buffer). ACT-R/PM moderates what is visible and how long it takes to see and recognize objects.
Segman can also generate mouse and keyboard inputs to manipulate objects on the screen. This
functionality is called through the ACT-R/PM theory of motor output, but we have extended the output
results to work with any Windows interface. This is done by creating very primitive events (click icon,
select button, etc), which are implemented as functions at the operating system level. As such, they are
indistinguishable from user-generated events. Currently, we have a fully functional system that runs under
Windows 98 and 2000.
III. THE MODEL OF ROBOT DRIVING
We will now describe an implementation of our system called dum-AS (pronounced [doo ‘maa]) [22],
which stands for driver user model - ACT-R & Segman. Dum-AS drives a car in a Java-implemented
game, which can be downloaded from the internet at www.theebest.com/games/3ddriver/3ddriver.shtml.
For the simulations reported below, no changes were made to the game.
We chose the 3D driver game for several reasons. Most importantly, it shares both surface and deep
similarities with human-robot interface use. On the surface level, it uses a first-person view as task
perspective and the environment changes dynamically in response to the actions of the user and task
environment, which is also the case for many human-robot systems. On a deeper level, driving behavior is
a prototypical example of real-time, interactive decision making in an interactive environment [23] [19].
The simulation we are using is comparable to many robot applications in that it relies heavily on
perceptual-motor skills, and involves decision-making under time pressure and interacting with a
dynamically changing environment. Furthermore, the driving game represents a simplified driving
environment, which corresponds sufficiently to real-life driving but is nevertheless a controlled setting.
Because its source code is extensible, we can manipulate aspects of the environment (e.g., slow or fast
driving) and add an interface whose features can be varied (e.g., bigger or smaller buttons), essentially
allowing for controlled experimental manipulations. Because the code is Java, this can be done on multiple
platforms. And finally, because we did not write it, it saved us time and helps to show the generality of this
approach.
Models of driving have been targets of research for decades (the analysis of Gibson and Crooks in 1938
provides one of the earliest examples [24]; see Bellet and Tattegrain-Veste [25] for a concise historical
overview from a cognitive ergonomics perspective.) The hierarchical risk model of van der Molen and
Botticher is a representative example of recent models [26]. Driving can be seen as structured into
strategic, tactical and operational levels. Moving up the hierarchy, each level describes an increasingly
abstract set of behaviors that govern choices at the level below it. At the strategic level, planning activity
takes place, such as the choice of route and travel speed. At the tactical level decisions encompass more
concrete, situation-dependent actions, such as lane changing, passing, and so forth. The operational level
describes skilled but routine activities, such as steering and acceleration.
The different levels of abstraction represent different demands on the cognitive, perceptual, and motor
abilities of the driver. For example, feedback from assistive technology such as ABS or power steering is
provided at the operational level through haptic channels, often imperceptibly. Feedback for travel speed,
in contrast, requires some cognitive activity at the strategic level, to interpret speedometer readings. If the
feedback channels from these different activities were reversed (e.g., if the driver had to interpret a
numerical value to determine power steering assist), their usability would be seriously impaired. Many task
domains in HRI, in particular urban search and rescue, share this layered structure.
A. Current implementation of the model
We set out to let the model perform some standard tasks, like staying on the road, avoiding oncoming
traffic, and increasing or decreasing speed. At this point, the model can start the game by clicking the
mouse on the game window, accelerate by pushing the “A” key, brake by pushing the “Z” key, and steer by
using the left and right arrow-keys.
Perceptual processing in the model is based on observations from the literature on human driving, as is
common for other driving models. Land and Horwood's [27] study of driving behavior describes a "double
model" of steering, in which a region of the visual field relatively far away from the driver (about 4 degrees
below the horizon) provides information about road curvature, while a closer region (7 degrees below the
horizon) provides position-in-lane information. Attention to the visual field at an intermediate distance, 5.5
degrees below the horizon, provides a balance of this information, resulting in the best performance.
The visual interface of the 3D Driver Game, which is the same interface the model uses, is shown in Figure
2. The default procedure for perception in the model is as follows. The model computes position-in-lane
information by detecting the edges of the road and the stripes down the center of the road. The stripes are
combined into a smoothed curve to provide the left boundary of the lane, while the right edge of the road
directly gives the right lane boundary. The model computes the midpoint between these two curves at 5.5
degrees below the horizon. This point, offset by a small amount to account for the left-hand driving
position of the car, is treated as the goal. If the center of the visual field (the default focal point) is to the
right of this point, the model must steer to the right, otherwise to the left.
Perceptual processing in the model has limitations. For example, it is not entirely robust: Determining the
center of the lane can break down if the road is curving off too fast in one direction or another. Segman can
also return some of the information that it has extracted. For example, it can determine road curvature from
more distant points, as is done in models of human driving [28]. However, this has not led to improved
performance in this simulation environment.
In its current form, the model has problems staying on the road due to the fact that it does not handle visual
input very well. The amount of change in speed depends on how the visual environment is changing. At
the moment, the model takes snapshots of the whole visual scene to determine its actions. It detects
changes by recording the locations of specific points in the visual field and then measuring the distance
they move from one snapshot to the next. It turns out that this is not a good way to handle visual flow:
Suppose that at time t the model analyzes the road, records the data for estimating visual flow, and
determines that steering one direction or another is appropriate. At time t+1 some steering command is
issued, and the simulated car moves in that direction. At time t+1 or later the road is again analyzed so that
flow can be computed, but at this point the action of the model resulted in changes in the visual field,
independent of changes that would have occurred otherwise. This contribution needs to be accounted for, or
the car might end up braking every time it steers.
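The snapshot-based flow estimate, and one simple way of discounting the model's own contribution, can be sketched as follows. The text does not specify how the correction should be made; treating the effect of a steering command as a uniform image shift (self_shift) is a crude assumption made only for this illustration, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_flow(prev: List[Point], curr: List[Point],
              self_shift: Point = (0.0, 0.0)) -> float:
    """Mean displacement of tracked points between two snapshots.

    prev, curr: locations of the same points in consecutive snapshots.
    self_shift: image shift (dx, dy) assumed to be caused by the model's
                own steering; it is subtracted before measuring distance.
    """
    if not prev or len(prev) != len(curr):
        raise ValueError("snapshots must track the same non-empty set of points")
    total = 0.0
    for (x0, y0), (x1, y1) in zip(prev, curr):
        dx = x1 - x0 - self_shift[0]
        dy = y1 - y0 - self_shift[1]
        total += math.hypot(dx, dy)
    return total / len(prev)
```

With self_shift left at zero, a steering command inflates the measured flow and could trigger unnecessary braking; supplying an estimate of the steering-induced shift removes that part of the change before the speed decision is made.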
Figure 2: Two snapshots of the driving environment.
The model still represents a rather restricted model of driving behavior. Whereas typical driving models
use more than 40 rules [23], the complete behavior of dum-AS is currently determined by only 20
production rules. Foremost, this reflects that the production system of dum-AS does not yet use the full
range of perceptual-motor capabilities offered by the ACT-R architecture through the ACT-R/PM theory.
Nevertheless, the demonstrations below will illustrate that even with a relatively simple ACT-R model, the
current system already demonstrates some of its capabilities and produces behavior that is in line with more
established models of driving.
B. Two demonstrations
We provide two example analyses of the 3D driver game interface. The first one assesses the influence of
speed on the ability to drive; the second examines how multi-tasking influences driving. These
demonstrations are really proofs of existence. They are examples of the type of measures that would be
helpful in testing and designing more advanced human-robot interfaces. In order to increase the realism of
our simulation, we will need to expand the perceptual-motor capabilities of the ACT-R model. However,
even though the model at this point only simulates constraints in cognitive functioning, it is able to simulate
realistic driving behavior. Figure 3 shows a screenshot of the desktop during a simulation run. On the left,
it shows a GNU Emacs window, in which a trace of the cognitive model appears (which we discuss later).
The right top half of the figure shows the debug window of the Allegro Lisp package.
Figure 3: Screen capture showing a GNU Emacs window on the left, an Allegro Lisp window in the top right corner and the driving game in the bottom right corner.
The experimenter starts a run by typing (run) at the Lisp prompt in the Allegro Debug window. After that,
the model starts the game autonomously by clicking on the game window shown in the right bottom of the
figure. Note that the model “knows” where the gaming window resides on the desktop. By limiting the
attention of the model to the position and dimensions of the game, we create a virtual bounding box on the
screen. Next, the model accelerates, drives at a constant speed and slows down if necessary (e.g. in a
strong curve). Because at this point the model cannot pass, a run typically ends when the model hits traffic
in its own lane. In other cases, it commits an error and runs off the road. Figure 4 shows the
results of the model simulation on two dependent measures: Lateral deviation, which shows the position of
the car with respect to the center of the right lane and total driving time in minutes.
Speed
In the speed demonstration, dum-AS completed three sets of 10 runs, at low, medium, and high speed, in
the 3D Driving Game. Because the driving environment is a two-lane road, a good way to characterize
driver performance is by looking at the ability of the model to stay on its ideal line of driving, which is
about 4 degrees to the right of the center of the road. Lane deviation is commonly used in driving studies
to measure the influence of a variety of factors such as multi-tasking and drug use [29].
In this simulation, we looked at the influence of speed on the amount of lane deviation. The left panel of
Figure 4 shows the model predicts that average lane deviation will increase as speed increases, which is in
line with experimental data and previous models. The model needs a certain amount of time to update its
representation of the environment, mainly determined by constraints built into the ACT-R model. As a
result, the distance between steering adjustments increases as speed increases, thus leading to larger lane
deviations.
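The arithmetic behind this argument is simple. Assuming that one update takes three serial production firings of 50 ms each (perceive, decide, act, as in the protocol of Figure 6), the distance covered between steering adjustments is the speed multiplied by 150 ms. In the sketch below the speed unit (distance units per second) is hypothetical, because the unit of the game's speedometer is not known.

```python
PRODUCTION_FIRING_S = 0.050  # 50 ms per production firing, as stated in the text


def distance_between_updates(speed_units_per_s: float,
                             productions_per_update: int = 3) -> float:
    """Distance traveled during one perceive-decide-act update.

    speed_units_per_s:      speed in hypothetical distance units per second.
    productions_per_update: serial production firings per update (assumed 3).
    """
    update_interval_s = productions_per_update * PRODUCTION_FIRING_S
    return speed_units_per_s * update_interval_s
```

Because the update interval is fixed by the architecture, the uncorrected distance grows linearly with speed: doubling the speed doubles the distance over which a steering error can accumulate before the next adjustment.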
The right panel of Figure 4 shows how total average driving time, measured in minutes, drops significantly
as speed increases. The explanation for this is made clear by the type of errors in this condition: Because
more distance passes between two steering adjustments, the chance of accidents also increases. In the Slow
condition, dum-AS only had 3 accidents, compared to 7 and 10 in the Medium and Fast conditions
respectively.
[Plot: lane deviation (axis range 0 to 15) for the Slow, Medium, and Fast conditions]
Multi-tasking
In the multi-tasking demonstration, we illustrate how dividing the model’s attention produces the same
effect as increasing speed. In essence, the model’s performance is determined by the speed and accuracy
with which it reacts and adapts to the environment. As a consequence, anything that diverts attention from
driving will affect performance. More precisely, the time between updating moments will increase, leading
to behavior that is less adapted to the environment. To simulate this, we added useless knowledge to the
system designed to interfere with driving. Specifically, we added to the model’s procedural knowledge
some simple rules that can fire any time while the model is driving. This simulates the influence of
distracting thoughts, as well as the effects of reduced working memory: Due to the serial nature of
rule firing in ACT-R, whenever one of the useless rules fires, it slows down the execution of
the relevant driving productions. As a result, performance will be more error prone (for related work, see
ritter.ist.psu.edu/acs-lab/#ACT-R/AC).
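A small sketch shows how a distracting rule lengthens the time between driving actions. It assumes a 150 ms perceive-decide-act cycle starting after an initial 50 ms production, and it assumes that in a distracted cycle the useless rule fires in place of the driving action. These timing assumptions follow the pattern visible in the protocol of Figure 6; the function names and parameters are hypothetical.

```python
from typing import Iterable, List


def driving_action_times_ms(n_cycles: int,
                            distracted_cycles: Iterable[int] = (),
                            cycle_ms: int = 150,
                            start_ms: int = 50) -> List[int]:
    """Times (in ms) at which a driving production fires.

    Each cycle is assumed to end with exactly one production firing:
    a driving action, or a distractor rule in the cycles whose 0-based
    indices are listed in distracted_cycles.
    """
    distracted = set(distracted_cycles)
    return [start_ms + (i + 1) * cycle_ms
            for i in range(n_cycles) if i not in distracted]


def longest_gap_ms(times_ms: List[int]) -> int:
    """Longest interval between two consecutive driving actions."""
    if len(times_ms) < 2:
        raise ValueError("need at least two driving actions")
    return max(later - earlier for earlier, later in zip(times_ms, times_ms[1:]))
```

With six cycles and no distraction, driving actions occur every 150 ms; if the fifth cycle is taken by a distractor, the gap around it doubles to 300 ms, which has the same effect on the distance between steering adjustments as doubling the speed.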
Figure 5: Lane deviation (in degrees) and total driving time (in minutes) of dum-AS in the Standard and Worried conditions.
Figure 4. Speed Demonstration: Lane deviation (in degrees) and total driving time (in minutes) of dum-AS as a function of speed. Slow corresponds to a driving speed within the range of 15-20, medium 20-25, and fast 30-35 as measured on the speedometer in the simulation.
We compared the slow speed condition from the speed demonstration (Standard) to a condition in which
the model drove at the same speed but was bothered by “obtrusive” thoughts (Worried). Figure 5 shows
the result for the same set of dependent measures. The right panel shows the model predicts that average
lane deviation increases when the model is worried. This confirms data generated with more complex
driver models that show how a secondary task affects performance. The second measure further confirms
this. The left panel of Figure 5 shows how total average driving time, measured in minutes, drops
significantly in the worry condition due to an increase in the number of accidents.
A very useful aspect of the ACT-R model is that it also generates a protocol of behavioral output,
illustrating how separate parts of a complex behavior like driving unfold over time. Figure 6 illustrates this
for the multi-tasking demonstration: It depicts a test run of the model, starting with the “go” production
and ending with a crash. For illustrative purposes, we chose a particularly short run. As can be seen, the
protocol indicates what behavior (steering, cruising, “thinking about the World Cup”) is taking place at
what time.
Time 0.000: Go Selected
**********************GO!!**********************
Time 0.050: Go Fired
Time 0.050: Perceive-Environment Selected
Time 0.100: Perceive-Environment Fired
Time 0.100: Decide-Action Selected
Time 0.150: Decide-Action Fired
Time 0.150: Steer-Right Selected
GOING TO THE RIGHT >>>>>>>>>>>>>>>>>>>>>>>>
Time 0.200: Steer-Right Fired
Time 0.200: Perceive-Environment Selected
Time 0.250: Perceive-Environment Fired
Time 0.250: Decide-Action Selected
Time 0.300: Decide-Action Fired
Time 0.300: Steer-Right Selected
GOING TO THE RIGHT >>>>>>>>>>>>>>>>>>>>>>>>
Time 0.350: Steer-Right Fired
Time 0.350: Perceive-Environment Selected
Time 0.400: Perceive-Environment Fired
Time 0.400: Decide-Action Selected
Time 0.450: Decide-Action Fired
Time 0.450: Cruising Selected
CRUISING
Time 0.500: Cruising Fired
Time 0.500: Perceive-Environment Selected
Time 0.550: Perceive-Environment Fired
Time 0.550: Decide-Action Selected
Time 0.600: Decide-Action Fired
Time 0.600: Steer-Left Selected
<<<<<<<<<<<<<<<<<<<<GOING TO THE LEFT
Time 0.650: Steer-Left Fired
Time 0.650: Perceive-Environment Selected
Time 0.700: Perceive-Environment Fired
Time 0.700: Decide-Action Selected
Time 0.750: Decide-Action Fired
Time 0.750: Thinking about wc Selected
“Thinking about the World Cup”
Time 0.800: Thinking about wc Fired
Time 0.800: Perceive-Environment Selected
Time 0.850: Perceive-Environment Fired
Time 0.850: Decide-Action Selected
Time 0.900: Decide-Action Fired
Time 0.900: Cruising Selected
CRUISING
. . .
Time 2.400: Cruising Fired
Time 2.400: Perceive-Environment Selected
Time 2.450: Perceive-Environment Fired
Time 2.450: Crashing Selected
*********************CRASH***************
<<<<<<<Writing data to data.txt>>>>>>
Time 2.500: Crashing Fired
Figure 6. Protocol of behavioral output generated by dum-AS during a run in the multi-tasking
demonstration.
This protocol gives insight into the behavior, in that it shows the sequence and timing of behavior and can
also indicate critical points in a behavior. Furthermore, it can be compared to behavior of human subjects
as a further validation of the model, or to gain further insight into a complex behavior such as driving.
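To support such comparisons, a protocol of this kind can be turned into structured data with a few lines of code. The sketch below assumes the protocol is stored with one "Time ...: <production> Selected|Fired" entry per line; banner lines such as CRUISING are skipped. The function and type names are hypothetical and are not part of dum-AS.

```python
import re
from typing import List, NamedTuple


class Event(NamedTuple):
    time_s: float     # model time in seconds
    production: str   # production name, which may contain spaces
    kind: str         # "Selected" or "Fired"


# One protocol entry: "Time 0.750: Thinking about wc Selected"
LINE_RE = re.compile(r"^Time\s+(\d+\.\d+):\s+(.+?)\s+(Selected|Fired)$")


def parse_protocol(text: str) -> List[Event]:
    """Extract (time, production, kind) events; non-matching lines are ignored."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append(Event(float(match.group(1)), match.group(2), match.group(3)))
    return events


def firing_times(events: List[Event], production: str) -> List[float]:
    """Times at which the named production fired."""
    return [e.time_s for e in events
            if e.production == production and e.kind == "Fired"]
```

From the parsed events, intervals between successive firings of a production such as Perceive-Environment can be computed and compared with the timing of the corresponding behavior in human subjects.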
C. A set of relevant subtasks
Even though dum-AS is still in its beginning stages and more work needs to be done, it already illustrates
many of the issues that a theory of human-robot interface use will have to face. More specifically, it allows
identifying a set of subtasks that appear relevant to human-robot interface use. What are these?
1. Visual orientation: Visual input is undoubtedly the most important source of information in driving [30].
Nevertheless, the human visual system seems badly equipped for a task like driving: We only see sharply
in a small center of the visual field; acuity drops significantly towards the periphery. As a result, eye
movement, in the form of saccades, is needed to construct an integrated field of vision for larger scenes.
To accomplish this, a driver needs a theory of where to look, and what features are important in the visual
field.
The domain of visual orientation is probably the place where collaboration between human and robot will
be most intense. Robots continue to be poor at high-level perceptual functions, like object recognition and
situation assessment [31], which means the human operator will still play an important role [32]. However,
USR applications have illustrated that it is often not an easy task for an operator to infer complex features
from certain environments. For instance, when maneuvering through a small and dark shaft, it is difficult
to discern any features. As a result, it becomes hard to perform certain tasks like identifying victims [8].
Furthermore, the human operator can remove ambiguities that arise from the limited visual capabilities of
robots, by providing information that enables the visual system to adapt to the situation at hand [33].
2. Speed control and steering: Based on information coming from the visual system, the model has to
decide which speed would be appropriate. This means the user model has to have a theory that determines
optimal speed in a given situation. Once the optimal speed is determined, motor procedures have to be
performed to manipulate the appropriate controls. So the process of controlling speed will be determined to
a great extent by the constraints of other parts of the system.
A constraint that typically arises in tele-operated navigation is that of communications time lag, which
occurs often as a result of navigating quickly. Our speed and multi-tasking demonstrations already
indicated how important it is to keep time intervals between updating moments as short as possible. In
USR applications, the time that passes between operating the controls on an interface and the robot actually
reacting, creates an additional communications time lag. The impact of this additional constraint can be
routinely added to our system. This model can start to help quantify tradeoffs between speed and accuracy.
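One simple way to begin quantifying this tradeoff is to estimate how far the robot travels between the moment the operator perceives the scene and the moment a correction takes effect. The sketch below assumes a constant speed, a fixed update interval, and a fixed communications lag. The function names and the example values for speed, lag, and clearance are hypothetical; only the 0.15 s update interval is read off the protocol in Figure 6.

```python
def blind_distance(speed, update_interval, comm_lag=0.0):
    """Distance covered before a correction can take effect.

    speed           -- robot speed in distance units per second (assumed constant)
    update_interval -- seconds between perception updates of the operator or model
    comm_lag        -- extra seconds between a control action and the robot reacting
    """
    if min(speed, update_interval, comm_lag) < 0:
        raise ValueError("speed, update_interval and comm_lag must be non-negative")
    return speed * (update_interval + comm_lag)

def max_safe_speed(clearance, update_interval, comm_lag=0.0):
    """Highest speed at which the blind distance stays within the clearance."""
    reaction_time = update_interval + comm_lag
    if reaction_time <= 0:
        raise ValueError("update_interval + comm_lag must be positive")
    return clearance / reaction_time

# Illustrative values only: 2 units/s, 0.15 s update interval, 0.5 s lag.
without_lag = blind_distance(2.0, 0.15)
with_lag = blind_distance(2.0, 0.15, comm_lag=0.5)
```

Under these assumed values, the added lag increases the distance travelled before a correction takes effect from about 0.3 to about 1.3 units, which is the kind of speed and accuracy tradeoff the model can help explore.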
The occurrence of course corrections is another interesting and unexpected parallel between our simple
driving game demo and USR applications. In the driving game, situations would occur in which the system
would “overreact” to a change in the environment, usually a curve, and would brake to a complete halt.
Subsequently, the system would enter a recurring sequence of over-corrections, and either the run had to be
terminated or the model had to be helped by the human simulator. The same occurs in USR applications
when the operator overshoots a desired location, sending the robot into the same type of “repetitive cycle of
over-corrections” [8]. This is a known problem in human operator control [34], [35]. It is pleasing and
useful to see a simulated user exhibit the same maladaptive behavior.
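The way delayed feedback can produce such a cycle can be illustrated with a toy controller. The sketch below is not the dum-AS model; it is a simple delayed proportional controller, chosen here only as an assumption to show the effect. The controller tries to bring a position error to zero, but each correction is based on the error observed a given number of steps earlier. The names simulate and sign_changes are hypothetical.

```python
def simulate(gain, delay, steps, start=1.0):
    """Position error over time when corrections use a delayed observation.

    gain  -- fraction of the observed error removed at each step
    delay -- number of steps by which the observation lags the true state
    """
    history = [start] * (delay + 1)          # the error is constant before control starts
    for _ in range(steps):
        observed = history[-(delay + 1)]     # what the controller currently sees
        history.append(history[-1] - gain * observed)
    return history[delay:]

def sign_changes(trajectory):
    """Count how often the error switches side, ignoring exact zeros."""
    signs = [1 if x > 0 else -1 for x in trajectory if x != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

immediate = simulate(gain=1.0, delay=0, steps=6)
delayed = simulate(gain=1.0, delay=1, steps=6)
```

With no delay, the error is removed in one step and stays at zero. With a delay of one step and the same gain, the error swings repeatedly from one side to the other and never settles, a miniature version of the repetitive cycle of over-corrections described above.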
3. Multi-tasking: Most process models of driving use very efficient and continuous processes to perceive
the environment and control the car [36]. In contrast, our implementation uses a discrete updating
mechanism, reflecting the discrete nature of the ACT-R model in which each step takes a fixed amount of
time. This has important consequences. First, our model does not produce optimal behavior but rather
aims at simulating human behavior. Specifically, its performance is determined by the speed and accuracy
with which it reacts and adapts to the environment. Second, our model allows assessing the influence of a
secondary task, in other words, the effects of multi-tasking (see also [19]). If the model has to divide its
attention between two tasks, each of those tasks will suffer. Specifically, the time between updating
moments might increase, leading to poorer performance.
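The effect of a secondary task on the time between updating moments can be sketched as follows. The sketch assumes, as in the Figure 6 protocol, that each rule firing takes 0.05 s. The rule name Secondary-Task and the two rule sequences are hypothetical illustrations, not output of the model.

```python
STEP = 0.05  # seconds per rule firing, as in the Figure 6 protocol

def perception_times(rule_sequence, step=STEP, perceive="Perceive-Environment"):
    """Times at which the perception rule fires when every rule takes `step` seconds."""
    return [
        round((index + 1) * step, 3)
        for index, rule in enumerate(rule_sequence)
        if rule == perceive
    ]

def longest_gap(times):
    """Longest time in seconds between two successive perception updates."""
    return max(round(later - earlier, 3) for earlier, later in zip(times, times[1:]))

# Driving only: perceive, decide, act, repeated.
driving_only = ["Perceive-Environment", "Decide-Action", "Steer-Left"] * 3

# Driving with a hypothetical secondary task that takes two extra rule firings.
with_secondary = [
    "Perceive-Environment", "Decide-Action", "Steer-Left",
    "Secondary-Task", "Secondary-Task",
    "Perceive-Environment", "Decide-Action", "Cruising",
]

gap_driving = longest_gap(perception_times(driving_only))
gap_secondary = longest_gap(perception_times(with_secondary))
```

In this illustration the longest gap between perception updates grows from 0.15 s to 0.25 s when the secondary task is interleaved, so the model reacts later to changes in the environment.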
Our system allows exploring the relationship between the attentional constraints of the human operator,
which are captured by the cognitive model, and the operated object, be it a simulated car or robot. As such,
it is perfectly suited to explore the operation of multiple robots through one user interface. It becomes
possible to map the resources (attention, visual and motor capabilities) of the human operator to the
performance of the robots in real-time. Our system can indicate in a quantitative way which adjustments to
the interface would lead to better performance. As such, it could also indicate at what point robot
autonomy could effectively compensate for human operator constraints.
Some aspects of human-robot interaction have not yet been directly addressed in the driving game, but will
be in future simulations. These are:
4. Navigation: Our driving game has a direct interface, in that the operator directs the car using the
keyboard. This perspective is often referred to as “inside-out” driving or piloting, because the operator
feels as if she is inside the robot or vehicle and looking out, and is a common method for vehicle or robot
tele-operation [1]. A problem that arises with direct interfaces is that they provide poor contextual cues,
which leads to less situation awareness. Using our system, it is possible to investigate how map building
by the user is related to aspects of the user interface and characteristics of the user (expert, novice, high or
low working memory capacity, and so on).
5. The influence of the user interface on performance: In human-robot interactions, the interface plays a
secondary role. The user usually aims at completing one or more primary tasks and uses the interface to
achieve her goals. The present approach is ideally suited to explore how changes in the interface will affect
performance of the user on the primary tasks.
6. The level of expertise of the user: By varying the knowledge of the task in the cognitive model, one
could vary the degree of skill or expertise of the user and see how this affects performance. Expertise affects the
design of the interface as well, as one would like experts to have great flexibility and control, while novices
should be guarded from making large and costly mistakes. By using the current system, one could explore
the relationship between user interface design and level of performance in a direct and quantitative way.
IV. GENERAL DISCUSSION: WHAT THE MODEL TELLS US ABOUT HUMAN-ROBOT INTERFACES
Our model that interacts with a simulated USR-type task provides several suggestions about what makes
human-robot interfaces difficult to use. These implications arise from each aspect of the model. Several
problems in perception and eye-hand coordination arise from the simulated eye and hand. Additional
problems can be noted from the cognitive model of this task. The cognitive modeling language, as it
implements a unified theory of cognition, makes several further suggestions.
The cognitive model of the USR tasks predicts that USR tasks are difficult because they contain several
difficult subtasks, and because they interact. The model currently has to do several tasks at once, driving,
noticing objects, and so on. These tasks alone are not difficult, but they interact with each other, competing
to use the same resources, rule firings and buffer contents. If these tasks were in different modalities, they
would degrade each other’s performance less. For example, if the model could notice trees by saying
something rather than by clicking on them, its performance would improve.
The simulated eye of the model, based on Segman [13], explains where several difficulties for users come
from. Segman was able to work fairly well on its early tasks, as its authors noted explicitly, because the
computer screen is a relatively benign environment for vision. Edges are crisp, color and shapes are not
ambiguous, and there are a limited number of object types. Environments for USR have basically none of
these properties. The control screens of these systems are easy to see, but the addition of a forward-looking (indeed
any perspective) video display adds further complexity to the vision processing. Other reports agree that
humans have problems understanding the video displays on USR interfaces [8].
In addition to including a harder vision problem because of more ambiguous stimuli, the video displays are
also noisier and of poorer quality than either vision (the display) or the controls. These effects may cause
disproportionately more problems for the model than for the human, as humans have better vision systems, but
humans too have problems with recognizing objects when the display is noisy.
Models will have some difficulty with vision processing of the video signal for some time. These
difficulties are also explored in the general computer vision community. The problems that arise from
trying to understand the image as a human would suggest that there are many more problems to be
solved, and that humans have difficulties with these displays as well. This may be a place for computers to
help users. Work on augmented reality would have a useful role here. Some of the earliest work in
psychology showed that object naming was more difficult than word recognition. Screens that described or
labeled ambiguous objects would help the model and are likely to help humans in this area. Such a system
could help by holding visual state (going up stairs, on the third floor). These systems could also help by
numbering ambiguous objects. Videos of workers at the World Trade Center search and rescue effort
(available from www.crasar.org) suggest that it would be useful to have objects numbered for discussion by
the operators.
The model currently cannot recognize emergent features, for example, trees that make up a line. This skill
appears to be an interaction between vision and cognition. It is not straightforward to include it in a model.
Similar inferential skills are necessary in USR to recognize implications such as hot walls and unnatural
positions of objects. People also have difficulty doing this; recognizing the implications of objects in
perception appears to be an important component of expertise [37].
These vision problems are compounded when considering multiple robots controlled by a single operator.
It may be quite useful for computer vision algorithms to preprocess what they can and then provide a verbal
description of the robot's progress ("going through woods"). A military commander might like to see what
their subordinates are doing occasionally, but if they saw everything, they would be overwhelmed. Verbal
reports in that case help manage attention and reduce cognitive demands.
As the task increases in complexity, the model will have to have mental maps of the world, which are
difficult for people to create without external memory aids. Unless the operator has a clipboard, paper, and
pen with them, they have to hold the mental map of the world in their head. Providing support for world
views in the interface would help users navigate their world.
The model's complexity and enfolding representation offers a theoretically based prediction of situation
awareness. That is, the model's mental map of the world at a point in time can be compared to the world at
that point in time. It will be difficult and probably not useful to assign a number to this comparison, but
qualitative summaries and full descriptions of the match and mismatch give a detailed, meaningful measure
of the model's awareness of the situation. The model (its knowledge and strategies) and the interface it uses
to run robots can be modified to improve situation awareness. What is likely to happen is that some aspects
can be easily improved, but that memory decay and limits to attention are likely to hinder representing the
entire world in the model. The designer will be left with working on what aspects of the situation should be
highlighted for the model and thus for the user.
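The comparison between the model's map and the world can be sketched as a qualitative report rather than a single number. The sketch below assumes that both the model's mental map and the world state can be written as a mapping from object names to locations. The function name compare_maps, the category labels, and the example objects are all hypothetical.

```python
def compare_maps(model_map, world_map):
    """Describe how the model's map matches the world at one point in time.

    Both arguments map an object name to a location (any comparable value).
    """
    report = {"matched": [], "misplaced": [], "missing": [], "outdated": []}
    for name, location in world_map.items():
        if name not in model_map:
            report["missing"].append(name)      # in the world, unknown to the model
        elif model_map[name] == location:
            report["matched"].append(name)      # known and correctly located
        else:
            report["misplaced"].append(name)    # known, but believed to be elsewhere
    for name in model_map:
        if name not in world_map:
            report["outdated"].append(name)     # believed present, not in the world
    return {kind: sorted(names) for kind, names in report.items()}

# Hypothetical world state and model belief at the same point in time.
world = {"victim-1": (3, 4), "stairs": (0, 2), "rubble": (5, 5)}
belief = {"victim-1": (3, 4), "stairs": (1, 2), "door": (2, 2)}

report = compare_maps(belief, world)
```

A report of this kind lists which objects the model knows about, which it has misplaced, and which it has missed, and so gives the designer a starting point for deciding what the interface should highlight.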
There are difficulties specifying all the tasks that users would do with a USR robot. A previous review [8]
provides a short list of tasks, including navigation and noticing. The current set of tasks suggests that the
operators, in addition to their routine tasks, are also doing many novel tasks. Generally, performing tasks
that require problem solving requires more expertise and is more error prone than well-practiced behavior.
These effects may be due to the lack of practice with USR interfaces, but they may also indicate a
fundamental effect of the domain. This effect suggests that models and human operators should practice
doing the simple tasks to increase resources available for more complicated tasks, and that they should
practice more complicated tasks to support transfer of skills between the complex tasks [38].
The model we presented and the cognitive modeling architecture it is created in, ACT-R, make several
predictions about why this task is difficult for humans. We note a few here.
Learning how to use an HR interface is currently important for the acceptance of robots in urban search and
rescue [8]. USR robots are currently just another tool, with restrictions on training time. If their use
becomes more pervasive or specialists emerge to use USR robots like dogs are used in USR, then this may
change.
Learning in the ACT-R theory, as presented by Anderson, is sensitive to several factors that are usually
absent in human-robot interfaces or have bad values. Feedback, its quality and amount, may be the most
important factor. Human-robot interfaces often provide poor feedback as to the robot's location and the effects of
commands. While the current model does not learn, it would have difficulties learning when commands do
not always work in a uniform manner, when the interpretation of the actions is hindered by imperfect perception,
and when feedback is not provided. The feedback is also often delayed, which hinders learning.
The nature of a mixed initiative interface makes it harder to use. There are more aspects of the state to
keep track of, including who has which tasks, the state of progress of the other agent, and then context
switching and communication. Models created in the ACT-R architecture can do these tasks, but the
models predict that these activities will take time, attention, and working memory (Wood reference please).
Finally, models that use robots interact in a less direct way than models that use more typical interfaces.
The models have to keep a distal representation separate from the interface. They are representing a
separate world that is not the interface, but built from the interface. More direct interfaces can rely on the
interface to hold and be the state of their world [9].
CONCLUSIONS
Based on the results of this model we can see several general lessons for the area of urban search and
rescue robots and the design of their interfaces in particular.
Models can use HRI interfaces
The model presented here shows that models of users can already provide insights and summaries useful
for creating better human-robot interfaces. The various levels of the model provide suggestions for
improving the interface we studied and, by analogy, for improving many USR robot interfaces.
The model's parameters can be varied to represent individual differences in the operator, such as working
memory capacity and knowledge. The impact of these differences on performance can be examined, as
shown in figures 4 and 5. These results suggest that the speed of the robot can influence its drivability,
with increased speed not always leading to better performance. Differences in the interface can also be
examined.
Massive application and reuse is becoming possible
Our model is not complete and is not a perfect user in many ways. It is, however, uniquely positioned to
rapidly improve the scope of behavior it can represent as well as being applied to further interfaces.
Having the model interact with an off-the-shelf interface based on reading a bitmap and passing system
function calls to generate input events means that the model can now start to be applied to virtually any
human-robot interface. Interfaces that run under the Windows operating system can be examined directly.
Interfaces that run under the Macintosh system can be examined with the Windows to Macintosh display
tool VNC. Interfaces that run under the X-windows system can be examined using the Xceed Windows
utility for displaying X-Windows generated displays under Windows. As noted above, there are limitations
in fonts and object recognition, but these are now approachable problems.
Creating the model within a cognitive architecture provides an approach for including more aspects of
behavior quickly as well as providing several further advantages when creating such a large model. There
are many people creating models of human behavior using the ACT-R architecture (see
act.psy.cmu.edu/papers.html for a list). Some of these models are of behaviors not of interest or of use for
modeling a user of HR interfaces. There are enough models, however, that we were able to start to build
upon existing models rather than create the model entirely from scratch. This is theoretically pleasing
because it offers an additional audience for these models, as well as other users and scientists to test the
user model in formal and informal ways.
There are several models that we can already point to as being candidates for directly extending our model.
Candidates include St. Amant's model of autonomous exploration of interfaces [13], Ritter's model of
telephone dialing [39], and Salvucci's model of telephone dialing while driving [19]. St. Amant's model of
interface exploration may be extendible to include victim search in the environment, a common task for
urban search and rescue robots [8].
Working within a common cognitive modeling architecture provides the resources of the architecture for
people interested in understanding the model. In the case of ACT-R, this includes an online tutorial,
summer schools, programming interfaces, a mailing list to get help with technical problems, and a manual.
HRI are poor because it is difficult to pay attention to them
The model predicts that human-robot interfaces are sensitive to many factors, including graphics, the
processing speed of the human operator relative to the robot they are manipulating, the knowledge and
processing in the robot and user, and the user's eye-hand coordination. Improving an interface that relies on
this many factors is difficult without tools to help keep track of these factors and their relationships.
Keeping track of these factors in complex environments, such as operating multiple robots, understanding
how their control structures will be robust to perturbations, and predicting how many robots an operator can
easily manipulate, will be particularly difficult.
To a certain extent, this knowledge of users and of how to support users with interfaces can be passed to
designers by having them read books on Human-Computer Interaction. This approach is useful, and
model users like the one presented here can also help provide a system level summary of users that will be
interesting and informative to HRI designers.
HRI are poor because it is difficult for humans
The model explored here showed that it is not an artifact or coincidence that current human-robot interfaces
are difficult to use. Our theory of HRI use, dum-AS, suggests these interfaces are difficult for all kinds of
reasons. The difficulties range from the perceptual issues that must be addressed, to the relatively high-
level cognition and problem solving in new situations that must be supported, as well as the relatively large
amount of knowledge that is required.
The model of vision that interprets the bitmap directly suggests that one reason the task is difficult is
that visual recognition in real-world bitmaps is difficult. This result suggests that
better display hardware and augmented reality would help make HR interfaces easier to use.
As these user models become easier to create they will be able to more routinely provide feedback directly
to interface designers. In the meantime, example models like this can summarize behavior with human-
robot interfaces, noting what makes human-robot interfaces difficult to use so that designers can study
these problems and avoid them.
ACKNOWLEDGMENTS
This work was sponsored by the Space and Naval Warfare Systems Center San Diego, grant number N66001-1047-411F. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
DIRK VAN ROOY is a post-doctoral researcher at the new interdisciplinary School of Information Sciences and Technology at Penn State. He earned his PhD, MS and BS in Computational Psychology from the Free University of Brussels.
FRANK E. RITTER (M '82) has helped start the new interdisciplinary School of Information Sciences and Technology at Penn State. He earned his PhD in AI and Psychology and an MS in psychology from CMU, and a BSEE from the University of Illinois/Urbana. He is on the editorial board of Human Factors and the steering committee of the Society for the AI and the Simulation of Behavior.
ROBERT ST. AMANT is an associate professor in the Computer Science Department at North Carolina State University. He co-directs the IMG Lab, which performs research in the areas of intelligent user interfaces, multimedia, and graphics. He earned a Ph.D. in computer science in 1996 from the University of Massachusetts, Amherst, and a B.S. in electrical engineering and computer science from the Johns Hopkins University in 1985.
REFERENCES
1. T.W. Fong and C. Thorpe, Vehicle Teleoperation interfaces. Autonomous Robots, 2001. 11: p. 9-11.
2. L.S. Lopes, et al., Sentience in robots: applications and challenges. IEEE Intelligent Systems [see also IEEE Expert], 2001. 16(5): p. 66-69.
3. T. Yamada, J. Tatsuno, and H. Kobayashi. A practical way to apply the natural human like communication to human-robot interface. in International Workshop on Robot and Human Interactive Communication. 2001: IEEE.
4. F. Steele, G. Thomas, and T. Blackmon. An Operator Interface for a Robot-Mounted, 3D Camera System: Project Pioneer. in Proceedings of the 1999 IEEE Virtual Reality Conference. 1998. Houston, TX.
5. C. Breazeal, A Motivational System for Regulating Human-Robot Interaction., in Proceedings of AAAI'98. 1998, AAAI Press / The MIT Press: Madison, Wisconsin, USA. p. 126-131.
6. R.M. Voyles, J.D. Morrow, and P.K. Khosla, Gesture-Based Programming for Robotics: Human Augmented Software Adaptation. Special issue of IEEE Intelligent Systems on self-adaptive software, 1999. 14(6): p. 22-29.
7. T.W. Fong, C. Thorpe, and C. Baur, Collaboration, Dialogue, and Human-Robot Interaction, in 10th International Symposium of Robotics Research. 2001, Springer-Verlag, London: Lorne, Victoria, Australia.
8. R. Murphy, J. Casper, M. Micire, and J. Hyams, Mixed-initiative Control of Multiple Heterogeneous Robots for USAR, in IEEE Transactions on Robotics and Automation. 2002.
9. F.E. Ritter and J.H. Larkin, Using process models to summarize sequences of human actions. Human-Computer Interaction, 1994. 9(3&4): p. 345-383.
10. F.E. Ritter and R.M. Young, Embodied models as simulated users: Introduction to this special issue on using cognitive models to improve interface design. International Journal of Human-Computer Studies, 2001. 55: p. 1-14.
11. M.J. Schoelles and W.D. Gray. Argus Prime: Modeling emergent microstrategies in a complex simulated task environment. in Proceedings of the Third International Conference on Cognitive Modeling. 2000: Veenendal, NL: Universal Press.
12. J.R. Anderson and C. Lebiere, The Atomic Components of Thought. 1998, Mahwah, NJ: Lawrence Erlbaum Associates.
13. R. St. Amant and M.O. Riedl, A perception/action substrate for cognitive modeling in HCI. International Journal of Human-Computer Studies., 2001. 55(1): p. 15-39.
14. J.R. Anderson, M. Matessa, and C. Lebiere, ACT-R: a theory of higher level cognition and its relation to visual attention. Human-Computer Interaction, 1997. 12(4): p. 439-460.
15. D. Kieras and D.E. Meyer, An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction., 1997. 12: p. 391-438.
16. C.D. Schunn and J.R. Anderson, The generality/specificity of expertise in scientific reasoning. Cognitive Science, 1999. 23(3): p. 337-370.
17. M.C. Lovett, L.M. Reder, and C. Lebiere, Modeling individual differences in a digit working memory task., in Proceedings of the Conference of the Cognitive Science Society. 1997, Mahwah, NJ: Erlbaum. p. 460-465.
18. J.R. Anderson, J.M. Fincham, and S. Douglass, The role of examples and rules in the acquisition of a cognitive skill. Journal of Experimental Psychology: Learning, Memory and Cognition., 1997. 23: p. 932-945.
19. D.D. Salvucci, Predicting the effects of in-car interface use on driver performance: An integrated model approach. International Journal of Human-Computer Studies, 2001. 55: p. 85-107.
20. M.D. Byrne, ACT-R/PM and menu selection: Applying a cognitive architecture to HCI. International Journal of Human-Computer Studies., 2001. 55: p. 41-84.
21. P.M. Fitts, The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 1954. 47: p. 381-391.
22. M. Antoniotti, A. Deshpande, and A. Girault, Microsimulation Analysis of a Hybrid System Model of Multiple Merge Junction Highways and Semi-Automated Vehicles, in Proceedings of the IEEE Conference on Systems, Man and Cybernetics. October 1997: Orlando, FL, U.S.A.
23. J. Aasman, Implementations of car-driver behaviour and psychological models, in Road User Behavior: Theory and Practice., J.A. Rothengatter and R.A. Bruin, Editors. 1988, Van Gorcum: Assen.
24. J.J. Gibson and L.E. Crooks, A theoretical field-analysis of automobile-driving. American Journal of Psychology, 1938. 51: p. 453-471.
25. T. Bellet and H. Tattegrain-Veste, A framework for representing driving knowledge. International Journal of Cognitive Ergonomics, 1999. 3(1): p. 3-49.
26. H.H. van der Molen and M.T. Bötticher, A hierarchical risk model for traffic participants. Ergonomics, 1998. 31(4): p. 537-555.
27. M.F. Land and J. Horwood, Which parts of the road guide steering? Nature, 1995. 377: p. 339-340.
28. M.F. Land and D.N. Lee, Where we look when we steer. Nature, 1994. 369: p. 742-744.
29. H.W.J. Robbe, Marijuana use and driving. Journal of the International Hemp Association, 1994. 1: p. 44-48.
30. B.L. Hills, Vision, visibility and perception in driving. Perception, 1980. 9: p. 183-216.
31. P. Milgram, et al., Applications of Augmented Reality for Human-Robot Communication, in IROS'93: Int'l Conf. on Intelligent Robots and Systems. 1993: Japan. p. 1467-1472.
32. R. Clark, Asimov's laws of robotics: Implications for information technology. IEEE Computer, 1994. 26, 27(12, 1).
33. T. Takahashi, et al. Human-robot interface by verbal and nonverbal communication. in RSJ International Conference on Intelligent Robots and Systems. 1998.
34. C. Wickens, A. Mavor, R. Purusurum, and B. McGee, The Future of Traffic Control. 1998, Washington, D.C: National Academy of Sciences.
35. C.D. Wickens, S. Gordon, and Y. Liu, An introduction to human factors engineering. 1998, New York: Addison Wesley Longman, Inc.
36. B. Cheng and T. Fujioka. A Hierarchical Driver Model. in IEEE Conference on Intelligent Transportation Systems. 1997. ITSC: IEEE.
37. G.A. Klein, Recognition-primed decisions., in Advances in Man-Machine Systems Research, W.B. Rouse, Editor. 1989, JAI.: Greenwich, CT. p. 47-92.
38. M.K. Singley and J.R. Anderson, Transfer of Cognitive Skill. 1989, Cambridge, MA: Harvard University Press.
39. F.E. Ritter, A role for cognitive architectures: Guiding user interface design, in Proceedings of the Seventh Annual ACT-R Workshop, p. 85-91. 2000: Department of Psychology, Carnegie-Mellon University.