



Efficient and Trustworthy Social Navigation Via Explicit and Implicit Robot-Human Communication

Yuhang Che, Allison M. Okamura, Fellow, IEEE, and Dorsa Sadigh, Member, IEEE

Abstract—In this paper, we present a planning framework that uses a combination of implicit (robot motion) and explicit (visual/audio/haptic feedback) communication during mobile robot navigation. First, we developed a model that approximates both continuous movements and discrete behavior modes in human navigation, considering the effects of implicit and explicit communication on human decision making. The model approximates the human as an optimal agent, with a reward function obtained through inverse reinforcement learning. Second, a planner uses this model to generate communicative actions that maximize the robot's transparency and efficiency. We implemented the planner on a mobile robot, using a wearable haptic device for explicit communication. In a user study of an indoor human-robot pair orthogonal crossing situation, the robot was able to actively communicate its intent to users in order to avoid collisions and facilitate efficient trajectories. Results showed that the planner generated plans that were easier to understand, reduced users' effort, and increased users' trust of the robot, compared to simply performing collision avoidance. The key contribution of this work is the integration and analysis of explicit communication (together with implicit communication) for social navigation.

Index Terms—Social Human-Robot Interaction, Human-Centered Robotics, Motion and Path Planning, Haptics and Haptic Interfaces

I. INTRODUCTION

MOBILE robots are entering human environments, with applications ranging from delivery and support in warehouses, to home and social services. In this work we focus on social navigation, in which the movements and decisions of robots and humans affect each other. Researchers have studied various methods for generating robot motions that can avoid collisions and comply with social rules during navigation [1]–[7]. These methods have shown great potential for bringing intelligent robots into human environments. However, there is an inherent limitation: robot motion alone does not capture all aspects of social interaction. In scenarios like social navigation, there are many ways that humans interact with each other. For example, by establishing eye contact, two people often assume that they will both take actions to avoid each other (people often act more conservatively if they think others are not paying attention). In addition, there are scenarios where conflicts occur and require more involved communication. One common example is people choosing to move to the same side

This work was supported in part by the Human-centered AI seed grant program from the Stanford AI Lab, Stanford School of Medicine and Stanford Graduate School of Business.

Y. Che is with Waymo LLC, Mountain View, CA 94043. A. M. Okamura is with the Department of Mechanical Engineering, and D. Sadigh is with the Department of Computer Science, Stanford University, Stanford, CA 94305, USA. Email: [email protected], [email protected], [email protected].


Fig. 1. A social navigation scenario in which the robot communicates its intent (to yield to the human) to the human both implicitly and explicitly. Implicit communication is achieved via robot motion (slowing down and stopping), and explicit communication is achieved through a wearable haptic interface.

when trying to avoid each other, and then moving to the other side simultaneously again. In this case, a person usually has to stop and gesture for the other person to go first. Social robots need to be able to handle such conflicts in a similar manner by incorporating different forms of communication.

The aforementioned limitation motivates us to explore richer interaction and communication schemes for robot social navigation. Specifically, we propose to consider both implicit communication and explicit communication (see Fig. 1). In the human-robot interaction literature, there are various definitions of implicit and explicit communication [8]–[10]. Our definition of implicit communication closely follows that of [10]: the information from implicit communication needs to be extracted by leveraging context and common ground. Conversely, we define explicit communication as actions that convey information in a deliberate and non-ambiguous manner.

Examples of implicit communication include eye contact and body language. In social navigation, people often encode their intentions into their motion, which serves as implicit communication [11]. Similarly, a robot's motion carries information about its intent. The notion of motion as implicit communication is closely related to the idea of legibility [12], which measures how well a robot's motion (and behavior, more generally speaking) expresses its intent. In this work, we develop an algorithm that enables the emergence of legible motions in social navigation.

Given our definition, explicit communication can take many forms, including spoken language, visual signals (symbols and

arXiv:1810.11556v2 [cs.RO] 24 Feb 2020



colors) and audio signals (sirens), as long as the meanings of these signals are established beforehand. In this work, we focus on the use of haptic feedback because it is immediate and targeted. In addition, haptic feedback is advantageous when users' visual and auditory channels are already overloaded. However, the algorithms developed in this work are not limited to haptic communication and can be directly applied to other forms of explicit communication.

Both implicit and explicit communication during navigation will affect how people move. Thus, we propose that the ability to understand and predict the navigation behavior of humans is beneficial for planning communicative actions. We developed a data-driven, probabilistic model to predict human behaviors, including both continuous movements and discrete behavior modes, given robot actions. Leveraging the human model, we develop an algorithm that plans a robot's motion and its communication through haptic feedback. Our approach relies on the assumption that users are cooperative – they will not intentionally thwart the robot. In environments in which humans and robots work together, such as offices, homes, hospitals, and warehouses, this assumption is reasonable. We leave scenarios where humans act adversarially to future work. We also assume that users are equipped with wearable haptic interfaces, and that the robot can send feedback via these interfaces (which can be replaced with sound or visual cues if appropriate). In this work, we focus on the interaction between one robot and one human user.

The main contributions of this work are:

1) A predictive model of human navigation behavior in response to both implicit and explicit robot communication. The model is described in Section IV, and the framework in which the model is applied is described in Section III.

2) An interactive planning algorithm based on the human model that enables a robot to proactively communicate through either implicit or explicit communication, with the goal of efficient and transparent interaction (Section V).

3) Implementation of the proposed algorithm on a physical mobile robot platform, and analysis and verification of the algorithm with user studies. We first evaluated and tuned the system in simulation (Section VI) and then performed experiments (Section VII).

II. BACKGROUND

A. Social Navigation

In traditional robot motion planning frameworks, humans are usually treated as obstacles that the robot should avoid [13]. However, in social navigation scenarios, humans are different from most other obstacles – they have a purpose, follow social rules, and react to other agents in the environment, including the robot [14]. Researchers have explored various methods for socially-aware motion planning.

Understanding and modeling the reactive behavior of humans is necessary for planning socially acceptable actions. A popular framework for modeling human navigation in response to other agents is the Social Force Model (SFM) [15]–[17], which frames pedestrians as affected by interactive forces that drive them towards their goals and away from each other. SFM has been applied to human-robot interaction and social navigation scenarios [2], [18]–[20]. Some work in the field of multi-agent planning also considers the reactive nature of moving agents. For example, optimal reciprocal collision avoidance (ORCA) [21] assumes that each agent takes some responsibility for resolving pair-wise conflicts, which results in globally collision-free paths. Guy et al. extended ORCA to model human navigation by introducing additional parameters such as reaction time and personal space [22]. These reaction-based methods are popular for crowd simulation because of their high computational efficiency. However, they tend to be less accurate in predicting individual trajectories, and do not capture the stochasticity of human behavior.
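As a rough sketch of the SFM idea described above – with illustrative parameter values and function names, not those of any cited work – the force on a pedestrian combines an attraction toward the goal with exponential repulsion from other agents:

```python
import numpy as np

def social_force(pos, vel, goal, others, v_des=1.3, tau=0.5, A=2.0, B=0.3):
    """Social Force Model sketch: relax toward the desired velocity pointing
    at the goal, plus exponential repulsive forces from other agents.
    v_des, tau, A, B are illustrative parameters (assumptions)."""
    to_goal = goal - pos
    e = to_goal / (np.linalg.norm(to_goal) + 1e-9)   # unit vector to goal
    force = (v_des * e - vel) / tau                  # goal-attraction term
    for p in others:                                 # pairwise repulsion
        d = pos - p
        dist = np.linalg.norm(d) + 1e-9
        force += A * np.exp(-dist / B) * d / dist
    return force
```

Integrating this force over time yields simulated pedestrian trajectories; the reaction-based character noted above comes from the fact that the force depends only on the current configuration of agents.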

Researchers have applied probabilistic methods to modeling human behavior and planning for social navigation. The Interactive Gaussian Process model was proposed in [23]; it takes into account the variability of human behavior and mutual interactions. Trautman et al. used this model to navigate a robot through a crowded cafeteria [1]. Inverse Reinforcement Learning (IRL) [24]–[27] is another framework that allows a probabilistic representation of human behavior. IRL was applied in [4]–[6], [28] to learn pedestrian models and plan for socially compliant navigation. In general, these approaches aim to generate "human-like" actions, with the underlying assumption that the robot should behave similarly to a human in order to be socially acceptable. We also use IRL as part of our model to predict humans' responses. Our work differs from previous approaches in that we jointly model the effects of both explicit communication and robot motion on human navigation behavior.

Recently, deep neural networks have also gained attention in robot navigation applications. Chen et al. incorporated social norms and multi-agent reactions into a deep reinforcement learning framework to enable robot navigation in a pedestrian-rich environment [7]. Tai et al. applied a generative adversarial imitation learning strategy to generate socially compliant navigation from raw sensor data [29]. Deep learning frameworks have the advantage that they often impose fewer assumptions on human behavior. However, it is unclear whether they generate motions that are natural and transparent. In addition, the amount of data required to train these neural networks can be large and expensive to obtain. By leveraging modeling techniques, our approach is more data efficient and explainable.

B. Expressive Motion in Robotics

Besides social navigation, researchers have investigated the problem of planning interactive and communicative motions in the field of computational human-robot interaction [30]. Dragan et al. formalized the idea of legibility in [31] and proposed methods to generate legible motions for robot manipulators [12]. Legible motions are motions that express the robot's intent to observers, and can be different from predictable motions [31]. Sadigh et al. modeled a human-robot system jointly as a dynamical system, and proposed methods to plan motions that actively influence human behaviors [32] and further actively gather information about the internal state of




Fig. 2. An example scenario where the robot needs to plan for appropriate movement and communication.

humans [33], [34]. Similar ideas were explored in human-robot collaboration scenarios. Bestick et al. developed a method to enable a robot to purposefully influence the choices of a person in handover tasks, helping the person avoid suboptimal grasps [35]. Nikolaidis et al. studied human-robot mutual adaptation, and proposed a decision framework to allow the robot to guide participants towards a better way of completing the task [36]. In our work, we consider robot motions that can inform or affect people as a form of implicit communication, as people often have to infer the robot's intent from context.

C. Non-Verbal Explicit Communication Methods

In this work, we define actions that communicate information deliberately and unambiguously as explicit communication. The most common form of explicit communication is verbal communication. Non-verbal communication can also be made explicit, such as pointing gestures [37], [38], head nods [39] and visual displays [40].

Haptic feedback has been used as an explicit communication mechanism in human-robot interaction. Scheggi et al. designed a vibrotactile haptic bracelet to guide the user along trajectories that are feasible for human leader-robot follower formation tasks [41], [42]. The bracelet can display three distinct signals that inform the human to slow down or to turn toward the left or right. Sieber et al. used haptic feedback to assist a user teleoperating a team of robots manipulating an object [43] by communicating the status (stable vs. in transient) of the robot team. Here, we use haptic feedback to explicitly convey the robot's discrete intent to the user in collision avoidance scenarios during navigation.

Non-verbal communication, such as gaze, is considered implicit if it indirectly conveys information [44]–[46]. In this work, we use the motion of the mobile robot, which does not impose any constraint on the appearance of the robot and can naturally be incorporated in social navigation.

III. A PLANNING FRAMEWORK FOR SOCIALLY-AWARE ROBOT NAVIGATION

Humans have an impressive ability to navigate seamlessly around other people. They still walk smoothly and efficiently when faced with a large group of people walking toward or around them in busy streets, concerts, cafeterias, conferences, or office spaces. However, robots lack the same elegance while navigating around people. Our goal is to design mobile robots that navigate in a similar fashion to humans in spaces shared by humans and robots. Our insight for social navigation is that mobile robots – similar to humans – need to pro-actively plan for interactions with people. In this section, we provide a high-level outline of our interaction framework; a more detailed description follows in Sections IV and V.

In this work, we focus on a scenario in which one robot interacts with one human. To illustrate this, we use the running example illustrated in Fig. 2: a robot and a human need to pass each other either on the left side or on the right side. To achieve this pass smoothly, the robot and human must understand each other's intent and coordinate their movements. The objective of our planning framework is to generate appropriate explicit communication and robot motions (which also serve as implicit communication) to facilitate human-robot interactions in such scenarios. The explicit communication consists of a finite number of discrete signals, for example, expressing a plan to pass on the left or the right side. The robot motions consist of continuous actions such as the robot's linear and angular velocities.

We model the joint human-robot system as a fully-observable interactive dynamical system in which the actions of the human and the robot can influence each other. In addition, we consider the stochasticity of the human's actions and the effect of communication in our planning framework.

Let $s_h^t$ and $s_r^t$ denote the physical state of the human and the robot at a certain time step $t$, respectively. The states are continuous, and include positions and velocities of each agent. The robot can apply continuous control inputs $u_r^t$ that change its state through the following dynamics:

$$s_r^t = F_r(s_r^{t-1}, u_r^t) \quad (1)$$

Similarly, the state of the human is directly changed by her action $u_h^t$:

$$s_h^t = F_h(s_h^{t-1}, u_h^t) \quad (2)$$

We let $\mathbf{u}_r = [u_r^1, \ldots, u_r^N]^\top$ and $\mathbf{u}_h = [u_h^1, \ldots, u_h^N]^\top$ denote the sequences of robot and human actions over a finite horizon $N$ ($[\cdot]^\top$ denotes a vector). $\xi_r^{1:N} = [(s_r^1, u_r^1), \ldots, (s_r^N, u_r^N)]^\top$ and $\xi_h^{1:N} = [(s_h^1, u_h^1), \ldots, (s_h^N, u_h^N)]^\top$ are then the state-action trajectories of the robot and the human over the horizon $N$, respectively. We use $u_c$ to refer to the explicit communication actions. Each of the actions $u_r^i$ and $u_h^i$ is a vector with appropriate dimensions. In this work, we focus on a one-dimensional communication action $u_c$, i.e., $u_c \in U_c \cup \{\text{"none"}\}$.

We define an overall reward function for the robot, $R_r(\xi_r^{1:N}, \xi_h^{1:N}, u_c)$. We let this reward function depend on both the robot's trajectory and the trajectory of the human. This reward function encodes properties such as efficiency (e.g., robot's velocity, distance to goal), safety (distance to human), and other metrics. In this work, we use an optimization-based planning algorithm, where the robot optimizes the expected sum of its reward function $R_r$. Eq. (3) represents this expected reward, which is computed over predicted trajectories $\xi_h^{1:N}$ of the human. We emphasize that the distribution of $\xi_h^{1:N}$ is affected by the robot's actions. In the next section, we present a hybrid model that predicts this distribution.
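For concreteness, a reward of this shape might be sketched as below; the particular terms and weights (`w_goal`, `w_vel`, `w_safe`, `d_min`) are illustrative assumptions, not the paper's learned reward:

```python
import numpy as np

def robot_reward(xi_r, xi_h, goal, w_goal=1.0, w_vel=0.1, w_safe=5.0, d_min=0.5):
    """Illustrative R_r(xi_r, xi_h): reward progress toward the goal and speed,
    penalize proximity to the human. xi_r, xi_h are (N, 2) position arrays."""
    # Efficiency: negative final distance to goal, plus total distance traveled.
    dist_to_goal = np.linalg.norm(xi_r[-1] - goal)
    speeds = np.linalg.norm(np.diff(xi_r, axis=0), axis=1)
    # Safety: penalize time steps where the agents come closer than d_min.
    gaps = np.linalg.norm(xi_r - xi_h, axis=1)
    safety_penalty = np.sum(np.maximum(0.0, d_min - gaps))
    return -w_goal * dist_to_goal + w_vel * speeds.sum() - w_safe * safety_penalty
```

A trajectory that makes more progress toward the goal scores higher, while one that passes through the human's predicted path is heavily penalized, which is the trade-off the expectation in Eq. (3) balances.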

We will discuss our planning algorithm in Sec. V. We use a Model Predictive Control (MPC) [47] algorithm, in which at every time step the robot computes a finite-horizon sequence of actions to maximize its expected reward:

$$\mathbf{u}_r^*, u_c^* = \arg\max_{\mathbf{u}_r, u_c} \; \mathbb{E}_{\xi_h^{1:N}}\left[ R_r(\xi_r^{1:N}, \xi_h^{1:N}, u_c) \right] \quad (3)$$

In an MPC scheme, at each time step the robot only executes the first control in the optimal sequence $\mathbf{u}_r^*$, and then replans at the next time step. The robot also plans an explicit communication $u_c^* \in U_c \cup \{\text{"none"}\}$: it decides either to communicate ($u_c^* \in U_c$) or not to provide any communication ($u_c^* = \text{"none"}$). When computing the plan at each time step, we assume that explicit communication can only happen at the beginning of the planning horizon. This is because only the first set of actions in the planned sequence is executed, and a new plan will be generated at the next time step.
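The per-step planning loop can be sketched as follows; `predict_human`, `rollout_robot`, and the enumeration of candidate control sequences are stand-ins (assumptions) for the learned human model and trajectory optimization described in later sections:

```python
def plan_mpc(state_r, state_h, candidate_controls, comm_options,
             rollout_robot, predict_human, reward, n_samples=32):
    """One MPC step: search over robot control sequences u_r and an explicit
    communication u_c in U_c ∪ {"none"}, maximizing the expected reward over
    sampled (possibly stochastic) human trajectory predictions."""
    best_plan, best_value = None, float("-inf")
    for u_c in list(comm_options) + ["none"]:
        for u_seq in candidate_controls:
            xi_r = rollout_robot(state_r, u_seq)   # deterministic robot rollout
            # Monte Carlo estimate of E_{xi_h}[R_r]; the human prediction
            # depends on both the planned robot motion and the communication.
            samples = [predict_human(state_h, xi_r, u_c) for _ in range(n_samples)]
            value = sum(reward(xi_r, xi_h) for xi_h in samples) / n_samples
            if value > best_value:
                best_plan, best_value = (u_seq, u_c), value
    u_seq, u_c = best_plan
    return u_seq[0], u_c   # execute only the first control, then replan
```

Returning only the first control and the (possibly empty) communication mirrors the receding-horizon execution described above.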

The key insight of this framework is that the robot is aware of the stochasticity of human behavior and the influence of communication on humans' actions. As a result, the robot will pro-actively communicate its intent to avoid undesirable situations. Take the example scenario in Fig. 2: if the robot and the human choose to pass each other on different sides, a collision could happen. By expressing its intent explicitly, the robot minimizes the chance of a potential collision.

IV. HYBRID MODEL OF HUMAN NAVIGATION

In this section, we present our model of human navigation, focusing on how the robot's implicit and explicit communication affect the behavior of the human.

To compute the optimal robot actions using Eq. (3), the robot is required to predict human actions over a finite time horizon. Since human actions are stochastic, the robot needs to predict the distribution of $\mathbf{u}_h$. We assume the human dynamics $F_h$ are known. Therefore, the distribution of $\mathbf{u}_h$ induces an equivalent distribution over human trajectories $\xi_h^{1:N}$ for a given initial state. Modeling this distribution exactly is quite challenging and may be computationally expensive. Our objective is to develop an approximate model that captures the interactive nature of human behavior, i.e., how the human will react to the robot's movements and explicit communication.

Fig. 3 illustrates the three most important components of our model. At a high level, we model human navigation behavior as a joint mixture distribution over the trajectory, similar to [4]:

$$p(\xi_h^{1:N}) = \sum_\psi p_\psi(\xi_h^{1:N})\, p(\psi) \quad (4)$$

The first part of the mixture, $p_\psi(\xi_h^{1:N})$, models the distribution of the continuous state-actions given a specific behavior mode $\psi$. The second part, $p(\psi)$, models the distribution of the discrete behavior mode $\psi$. The behavior mode reflects the underlying intent of the human, and is affected by both the robot's implicit communication (motion) $\xi_r$ and explicit communication $u_c$.

The idea of discrete behavior modes or classes of actions is sometimes associated with homotopy classes of trajectories [4], [48]. Here we do not enforce the trajectories given a specific value of $\psi$ to belong to one homotopy class. Instead, we associate each value of $\psi$ with an underlying reward function $R_h^\psi$ that the human optimizes. With an inverse reinforcement learning framework, our model assigns high probabilities to trajectories that comply with the behavior mode (one homotopy class that corresponds to the underlying intent of the human), and low but non-zero probabilities to trajectories in other homotopy classes. This representation is more generalizable than homotopy, as different behavior modes are not necessarily different in homotopy in other scenarios. For example, tailgating versus keeping a safe distance are two behavior modes in driving, and they cannot be differentiated by homotopy. However, they can be characterized by different cost functions that drivers may optimize.
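A minimal sketch of this mode-conditioned mixture, assuming a maximum-entropy-IRL-style likelihood (trajectory probability proportional to the exponentiated mode reward, left unnormalized here; the mode rewards themselves are placeholders):

```python
import numpy as np

def mode_likelihood(xi_h, reward_psi, beta=1.0):
    """p_psi(xi_h) up to normalization: trajectories that score well under the
    mode's reward R_h^psi receive exponentially higher probability."""
    return np.exp(beta * reward_psi(xi_h))

def trajectory_probability(xi_h, mode_rewards, p_mode):
    """Eq. (4): p(xi_h) = sum_psi p_psi(xi_h) * p(psi)."""
    return sum(mode_likelihood(xi_h, mode_rewards[psi]) * p_mode[psi]
               for psi in p_mode)
```

Trajectories consistent with the intended side score highly under that mode's reward, yet retain small non-zero probability under the other mode, matching the soft assignment described above.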

For computational efficiency, we use relatively simple models for human and robot dynamics. The human state $s_h^t = [x_h, y_h, \dot{x}_h, \dot{y}_h]^\top$ consists of positions and velocities in the 2D plane, and the robot state $s_r^t = [x_r, y_r, \theta_r]^\top$ is simply its pose. We use a constant-acceleration model for human dynamics; thus, the human control input is $u_h^t = [\ddot{x}_h, \ddot{y}_h]^\top$. For the robot dynamics, we use a differential-drive kinematic model, with control input $u_r^t = [v, \omega]^\top$, consisting of linear and angular velocities.
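Under these modeling choices, the dynamics $F_r$ and $F_h$ of Eqs. (1)–(2) can be sketched as simple discrete-time updates (the time step `dt` is an illustrative choice):

```python
import numpy as np

def robot_step(s_r, u_r, dt=0.1):
    """Differential-drive kinematics: s_r = [x, y, theta], u_r = [v, omega]."""
    x, y, th = s_r
    v, w = u_r
    return np.array([x + v * np.cos(th) * dt,
                     y + v * np.sin(th) * dt,
                     th + w * dt])

def human_step(s_h, u_h, dt=0.1):
    """Constant-acceleration model: s_h = [x, y, vx, vy], u_h = [ax, ay]."""
    x, y, vx, vy = s_h
    ax, ay = u_h
    return np.array([x + vx * dt + 0.5 * ax * dt**2,
                     y + vy * dt + 0.5 * ay * dt**2,
                     vx + ax * dt,
                     vy + ay * dt])
```

Iterating these one-step maps over a control sequence produces the state-action trajectories $\xi_r^{1:N}$ and $\xi_h^{1:N}$ used by the planner.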

The formulation in Eq. (4) describes our modeling approach in its most general form. In our specific setup, the distribution $p(\xi_h^{1:N})$ should be conditioned on the history of human and robot trajectories, the history of explicit communication, and the future robot actions over the prediction horizon. To numerically compute the distribution and use it for planning, we make two assumptions:

Assumption 1. The trajectory of the human $\xi_h^{1:N}$ over the prediction horizon depends on the environment $E$ (including the goal of the human) and the future robot trajectory $\xi_r^{1:N}$:

$$p_\psi(\xi_h^{1:N} \mid \xi_r^{1:N}, E) \quad (5)$$

Here the actions of the human depend on the robot's actions. However, the robot also needs to predict the actions of the human and plan accordingly. This inter-dependency leads to the infinite regress problem, i.e., a sequence of reasoning about each other's actions that can never come to an end. To resolve this problem, we assume that the human has access to the future robot trajectory, and model the interaction between the human and the robot as a two-player Stackelberg game (leader-follower game). We argue that this assumption is reasonable: given a relatively short planning horizon, humans are usually able to predict the immediate actions of other agents. Since the focus of this work is the interaction between the robot and the human, the goals and environment are assumed to be known and fixed.

Assumption 2. We assume that the behavior mode is only affected by past information. With this, we can express the second term in Eq. (4) as:

$$p(\psi \mid \xi_r^{-N:0}, \xi_h^{-N:0}, u_c^{-N:0}) \quad (6)$$

where $\xi_r^{-N:0}$ and $\xi_h^{-N:0}$ are the past trajectories of the robot and the human, and $u_c^{-N:0}$ denotes the past explicit communications
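One simple way to realize a distribution of the form in Eq. (6) – a hedged sketch, not the paper's fitted model – is a Bayesian belief update over the behavior mode $\psi$, where each new observation (a robot motion cue or an explicit signal) supplies a per-mode likelihood:

```python
def update_mode_belief(belief, likelihoods):
    """Bayes rule over discrete behavior modes psi:
    posterior(psi) ∝ likelihood(observation | psi) * prior(psi)."""
    posterior = {psi: likelihoods[psi] * p for psi, p in belief.items()}
    z = sum(posterior.values())               # normalizing constant
    return {psi: p / z for psi, p in posterior.items()}
```

An unambiguous explicit signal would be modeled with a sharply peaked likelihood, while implicit motion cues would shift the belief more gradually, consistent with the effects illustrated in Fig. 3.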




Fig. 3. Overview of different components of the model. (a) We model the distribution of human trajectories given their behavior mode ψ. Each behavior mode is associated with an underlying intent. In this example, trajectories that pass the robot on the left have much higher probabilities (thicker lines), because the underlying intent is to pass on the left. (b) The robot's implicit communication (motion) affects the belief of the human over the robot's intent, and therefore affects the behavior mode of the human. (c) The robot's explicit communication also has strong effects on the belief and behavior mode of the human.

over a horizon of length N. We further assume that only the most recent explicit communication actually affects the behavior mode:

$$p(\psi \mid \xi_r^{-N:0}, \xi_h^{-N:0}, u_c, t_c) \tag{7}$$

Here $u_c$ represents the most recent explicit communication, and $t_c$ represents the time when $u_c$ is communicated. Note that at each time step, the robot can decide not to perform any explicit communication ($u_c = $ "none"). The most recent communication refers to the most recent $u_c \neq$ "none", e.g., the last time haptic feedback was provided.

In the next two subsections, we discuss the details of modeling humans' continuous navigation actions as in Eq. (5) and their discrete behavior modes as in Eq. (7).

A. Modeling Continuous Navigation Actions of a Human

Inverse Reinforcement Learning. We employ a data-driven approach to model humans' continuous navigation actions. Specifically, we use maximum entropy inverse reinforcement learning (MaxEnt IRL) [25], [27] to learn the distribution of human actions that matches a set of provided demonstrations in expectation. We assume the humans are approximately optimizing a reward function $R_h^\psi(\xi_h^{1:N}, \xi_r^{1:N}, E)$ for a given discrete class $\psi$. Under this model, the probability of the actions $u_h$ is proportional to the exponential of the total reward of the trajectory ($Z$ is a normalization factor):

$$p_\psi(\xi_h^{1:N} \mid \xi_r^{1:N}, E) = \frac{1}{Z} \exp\left(R_h^\psi(\xi_h^{1:N}, \xi_r^{1:N}, E)\right) = \frac{1}{Z} \exp\left(w_\psi \cdot f(\xi_h^{1:N}, \xi_r^{1:N}, E)\right) \tag{8}$$

We parameterize the human reward function as a linear combination of features $f$ that capture relevant properties of the navigation behavior. The features are weighted by $w_\psi$, which can be learned by maximizing the overall likelihood of the provided demonstrations $\mathcal{D}$:

$$\arg\max_w \prod_{u_h \in \mathcal{D}} p_w(\xi_h^{1:N} \mid \xi_r^{1:N}, E) \tag{9}$$

The weight vector $w_\psi$ is a function of the behavior mode $\psi$. Given a selected set of features such as distance to the other agents, heading, or velocity, we learn appropriate weights corresponding to the humans' reward functions for each discrete class $\psi$ from collected demonstrations. We will discuss the specific features used in this work in Sec. VI.
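To make the MaxEnt IRL weight update concrete, the sketch below implements one gradient-ascent step on the log-likelihood of Eq. (9): the gradient is the difference between empirical demonstration features and expected features under the current reward, with the partition function approximated by a set of sampled candidate trajectories. The feature dimensions, sample counts, and synthetic data are illustrative assumptions, not the paper's actual features or demonstrations.

```python
import numpy as np

def maxent_irl_step(w, demo_features, sample_features, lr=0.1):
    """One gradient-ascent step on the MaxEnt IRL log-likelihood.

    demo_features: (D, K) feature vectors f(xi_h, xi_r, E) of demonstrations.
    sample_features: (S, K) features of sampled candidate trajectories,
        used to approximate the partition function Z.
    """
    # Softmax weights of sampled trajectories under the current reward w . f
    logits = sample_features @ w
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Gradient of the log-likelihood: empirical minus expected feature counts
    grad = demo_features.mean(axis=0) - p @ sample_features
    return w + lr * grad

# Toy usage with hypothetical 3-dimensional features
rng = np.random.default_rng(0)
w = np.zeros(3)
demos = rng.normal(1.0, 0.1, size=(20, 3))    # synthetic demo features
samples = rng.normal(0.0, 1.0, size=(200, 3))  # synthetic candidate features
for _ in range(100):
    w = maxent_irl_step(w, demos, samples)
```

At convergence, the expected features under the learned distribution match the demonstration features, which is exactly the MaxEnt IRL moment-matching condition.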

B. Modeling Behavior Modes of a Human

When modeling the distribution of behavior modes of a human, we consider the effect of both implicit communication (via robot movements) and explicit communication (via haptic feedback). We assume that the human will infer the robot's intent, and act cooperatively during the interaction. Mathematically, this suggests that the distribution of behavior modes of the human is related to her belief over the robot's intent, θ. In our particular social navigation problem, we assume this belief is equal to the probability of choosing a behavior mode:

$$p(\psi \mid \xi_r^{-N:0}, \xi_h^{-N:0}, u_c, t_c) = p(\theta \mid \xi_r^{-N:0}, \xi_h^{-N:0}, u_c, t_c) \tag{10}$$

In other words, if the human believes that there is an 80% chance that the robot will yield priority, she will behave in "human priority" mode with the same probability of 0.8. Nonlinear relationships between p(ψ) and p(θ) could also be used to model risk-averse (more willing to yield priority) and risk-seeking behaviors.

Using Bayes' rule, we can transform Eq. (10) to:

$$p(\theta \mid \xi_r^{-N:0}, \xi_h^{-N:0}, u_c, t_c) \propto p(\xi_r^{-N:0}, u_c, t_c \mid \theta, \xi_h^{-N:0}) \cdot p(\theta \mid \xi_h^{-N:0}) = p(\xi_r^{-N:0} \mid \theta, \xi_h^{-N:0}) \cdot p(u_c, t_c \mid \theta) \cdot p(\theta) \tag{11}$$

The second step in Eq. (11) assumes conditional independence of the robot trajectory $\xi_r^{-N:0}$ and the explicit communication $u_c$. With this factorization, we can separately model the effect of robot motion (implicit communication) and explicit communication on behavior modes of the human. The last term $p(\theta)$ is the prior on the robot's intent, which should be decided based on the application. In our implementation, the robot is equally likely to communicate each intent, so we choose a uniform prior.

This formulation casts the backward inference problem (from action to intent) into forward prediction problems (from intent to action). Next, we derive methods to compute each part of Eq. (11).

Implicit Communication. We assume that the human in general expects the robot to be rational and efficient according to some reward function. Applying the principle of maximum



Fig. 4. Inference on the robot's intent based on explicit communication, assuming that uc = θ1 happened at tc = 0.

entropy, we model the human as expecting robot movements with probability:

$$p(\xi_r^{-N:0} \mid \theta, \xi_h^{-N:0}) = \frac{\exp\left(R_r^\theta(\xi_r^{-N:0}, \xi_h^{-N:0})\right)}{\int_{\xi'_r} \exp\left(R_r^\theta(\xi'_r, \xi_h^{-N:0})\right) d\xi'_r} \tag{12}$$

where $R_r^\theta(\xi_r^{-N:0}, \xi_h^{-N:0})$ is a reward function that the human expects the robot to optimize, given its intent θ. To compute the integral in the denominator of Eq. (12), we use a second-order Taylor series to approximate $R_r^\theta(\xi_r^{-N:0}, \xi_h^{-N:0})$.

Explicit Communication. The belief of the human over the robot's intent is strongly affected by explicit communication, because the intent is directly conveyed. However, the effect of explicit communication should decay over time, as the robot's intent may change, and only short-term intents are communicated. Inspired by a model of human short-term verbal retention [8], we propose:

$$p(u_c, t_c \mid \theta) = \frac{1}{Z} \begin{cases} A \exp\left(-\frac{t - t_c}{M}\right) + 1 & u_c = \theta \\ 1 & u_c \neq \theta \end{cases} \tag{13}$$

Here A and M are parameters that determine the characteristics of the distribution, and Z is a normalization factor. The explicit communication initially reflects the true intent with very high probability. However, the inference strength decays over time, and eventually the communication becomes irrelevant to the robot's latest intent.

To clarify this model, let us consider a simpler version of Eq. (10): $p(\theta \mid u_c, t_c)$, i.e., inferring the robot's intent with only explicit communication. Assume that $\theta \in \{\theta_1, \theta_2\}$ is binary, $u_c = \theta_1$, and $t_c = 0$, and apply Bayes' rule again:

$$p(\theta = \theta_1 \mid u_c, t_c) = \frac{p(u_c, t_c \mid \theta_1)\, p(\theta_1)}{p(u_c, t_c \mid \theta_1)\, p(\theta_1) + p(u_c, t_c \mid \theta_2)\, p(\theta_2)} = \frac{A \exp\left(-(t - t_c)/M\right) + 1}{A \exp\left(-(t - t_c)/M\right) + 2} \tag{14}$$

The change of $p(\theta = \theta_1 \mid u_c, t_c)$ over time is plotted in Fig. 4. Initially, the robot's intent can be inferred with high probability given the explicit communication. As time passes, the inference strength decreases and eventually the belief over the robot's intent becomes uniform (0.5).
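The decay in Eq. (14) is simple to evaluate directly. The sketch below plots-free reproduces its qualitative behavior: belief near certainty right after the communication, decaying toward the uniform 0.5. The values of A and M are illustrative assumptions, not the parameters used in the paper.

```python
import math

# A and M are model parameters (assumed values for illustration)
A, M = 20.0, 2.0

def belief_theta1(t, t_c=0.0):
    """p(theta = theta1 | uc = theta1, tc) from Eq. (14): the belief
    decays from near-certainty toward the uniform 0.5 as time passes."""
    s = A * math.exp(-(t - t_c) / M)
    return (s + 1.0) / (s + 2.0)

b0 = belief_theta1(0.0)      # right after communication: (A+1)/(A+2), close to 1
b_late = belief_theta1(20.0)  # long after: close to 0.5
```

Larger A makes the initial belief sharper, while M controls how quickly the communication "wears off."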

In the scenarios we consider, the human infers the robot's intent from both implicit and explicit communication. The combined effect can be computed with Eqs. (11)–(13) given in this section.
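The combined posterior of Eq. (11) is a pointwise product of the implicit-communication likelihood, the explicit-communication likelihood, and the prior, followed by normalization. The sketch below shows this combination for two intents; the likelihood numbers are illustrative assumptions, standing in for Eqs. (12) and (13).

```python
import numpy as np

def intent_posterior(lik_implicit, lik_explicit, prior=None):
    """Posterior over robot intents theta via Eq. (11): the product of
    the implicit-communication likelihood p(xi_r | theta, xi_h), the
    explicit-communication likelihood p(uc, tc | theta), and the prior
    p(theta), normalized to sum to one."""
    lik_implicit = np.asarray(lik_implicit, dtype=float)
    lik_explicit = np.asarray(lik_explicit, dtype=float)
    if prior is None:  # uniform prior, as in the paper's implementation
        prior = np.full_like(lik_implicit, 1.0 / len(lik_implicit))
    post = lik_implicit * lik_explicit * prior
    return post / post.sum()

# Two intents: theta_h (human priority) and theta_r (robot priority).
# The robot's motion mildly suggests yielding; a recent haptic message
# for "human priority" sharpens the belief (numbers are illustrative).
post = intent_posterior(lik_implicit=[0.6, 0.4], lik_explicit=[21.0, 1.0])
```

Because the two channels enter as independent factors, a strong explicit message dominates a weakly informative motion cue, matching the qualitative behavior described above.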

V. PLANNING FOR COMMUNICATION

The general planning framework was described in Section III. In this section, we discuss the details of the algorithm, including the design of the reward functions, the derivation of the solution to the optimization, and an outline of our implementation.

A. Robot Reward Function

The overall reward $R_r$ that the robot optimizes in Eq. (3) consists of four parts that quantify robot efficiency, human comfort, safety, and the reward of explicit communication:

Robot Efficiency. The robot should get as close as possible to its target with minimum effort:

$$R_{r,r}(\xi_r^{1:N}) = \left\|x_r^N - x_r^g\right\| + \gamma_r \sum_{t=1}^{N} \left\|u_r^t\right\|^2 \tag{15}$$

where $\gamma_r$ is a weight that balances effort and goal. This component determines how fast the robot approaches the goal. $x_r^N$ is the robot position at time step N, and $x_r^g$ is the goal position. $\|\cdot\|$ denotes the Euclidean norm.

Human Comfort. We want the human to spend less effort to achieve her goal:

$$R_{r,h}(\xi_h^{1:N}) = \left\|x_h^N - x_h^g\right\| + \gamma_h \sum_{t=1}^{N} \left\|u_h^t\right\|^2 \tag{16}$$

Similar to $\gamma_r$, $\gamma_h$ balances between the human reaching her goal and her effort.

Safety. The robot should avoid collisions with the human:

$$R_{r,\mathrm{safety}}(\xi_r^{1:N}, \xi_h^{1:N}) = \sum_{t=1}^{N} \exp\left(-\frac{\|x_h^t - x_r^t\|^2}{D_r^2}\right) \tag{17}$$

Explicit Communication. Too much explicit communication will distract or annoy the human. Therefore, we set a constant reward for performing each explicit communication:

$$R_{r,\mathrm{EC}}(u_c) = \gamma_{\mathrm{EC}} \cdot \mathbb{1}(u_c \neq \text{``none''}) \tag{18}$$

where $\mathbb{1}$ is the indicator function. The components defined above are then combined linearly with weights $c_i$:

$$R_r(\xi_r^{1:N}, \xi_h^{1:N}, u_c) = \sum_{i=1}^{4} c_i \cdot R_{r,i}(\xi_r^{1:N}, \xi_h^{1:N}, u_c) \tag{19}$$

B. Human-Aware Planning

To solve for the optimal robot actions using Eq. (3), we need to compute the expectation of the robot reward over the distribution of $u_h$. In the last section, we derived a hybrid model for human behavior. To speed up computation, here we only consider the most likely trajectory given each possible human intent. With this, the expectation in Eq. (3) can be expressed as:

$$\mathbb{E}_{\xi_h^{1:N}}[R_r] = \sum_{\psi} p(\psi) \cdot R_r(\xi_r^{1:N}, \xi_h^{\psi}, u_c) \tag{20}$$


where $\xi_h^\psi$ is the most likely human response given ψ, and can be computed as:

$$\xi_h^\psi = \arg\max_{\xi_h^{1:N}} \; w_\psi \cdot f(\xi_h^{1:N}, \xi_r^{1:N}) \tag{21}$$

The distribution p(ψ) is given by Eqs. (10) and (11).

We use gradient-based optimization to solve for the optimal robot controls $u_r^*$. This requires the gradient information of the expected reward in Eq. (20). Letting $\bar{R}_r = \mathbb{E}_{\xi_h^{1:N}}[R_r]$, we aim to find:

$$\frac{\partial \bar{R}_r}{\partial u_r} = \frac{\partial R_r}{\partial u_r} + \sum_\psi \frac{\partial R_r}{\partial u_h} \frac{\partial u_h^\psi}{\partial u_r} \tag{22}$$

where $u_h^\psi$ denotes the most likely human actions. As we have a symbolic representation of $R_r$, both $\frac{\partial R_r}{\partial u_r}$ and $\frac{\partial R_r}{\partial u_h}$ can be computed analytically. Following the implicit differentiation method discussed in [32], we compute the last unknown term using the following expression:

$$\frac{\partial u_h^\psi}{\partial u_r} = \left[\frac{\partial^2 R_h^\psi}{\partial u_h^2}\right]^{-1} \left[-\frac{\partial^2 R_h^\psi}{\partial u_h \partial u_r}\right] \tag{23}$$

We have assumed a discrete set of communication actions $U_c \cup \{\text{``none''}\}$. To obtain the optimal explicit communication $u_c^*$, we enumerate all possible communication actions $u_c \in U_c \cup \{\text{``none''}\}$ and solve the optimization for each $u_c$. This can be done in parallel because the optimizations do not depend on each other.
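The enumerate-and-optimize-in-parallel structure can be sketched with a thread pool: one trajectory optimization per candidate communication action, then an argmax over the resulting rewards. `compute_plan` here is a toy stand-in for the gradient-based optimization of Eqs. (20)-(23), and the reward numbers are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_with_communication(compute_plan, comm_actions):
    """Enumerate explicit communication actions (plus "none"), solve one
    trajectory optimization per action in parallel, and return the best
    pair. compute_plan(uc) -> (u_r, reward)."""
    candidates = list(comm_actions) + ["none"]
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        results = dict(zip(candidates, pool.map(compute_plan, candidates)))
    uc_star = max(results, key=lambda uc: results[uc][1])
    return uc_star, results[uc_star][0]

# Toy stand-in: communicating "human_priority" yields a gain that
# outweighs a fixed communication penalty (numbers are assumptions)
def toy_compute_plan(uc):
    reward = 1.0 + (0.5 if uc == "human_priority" else 0.0)
    reward -= 0.2 if uc != "none" else 0.0
    return ("controls_for_" + uc, reward)

uc_star, u_star = plan_with_communication(
    toy_compute_plan, ["human_priority", "robot_priority"])
```

In the paper's C++ implementation the per-action optimizations run on separate CPU cores; a thread pool plays the same role in this sketch.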

The planning algorithm is outlined in Alg. 1. At every time step, the algorithm first retrieves the states of the robot and the human, and performs a one-step prediction of the robot and human states in order to compensate for the time spent planning. Then it updates the belief over the behavior mode of the human ψ with new observations using Eqs. (10) and (11). Before optimizing the robot actions, the algorithm needs to generate an initial guess of the robot and human actions. If there is a plan from the previous time step, that plan is used as the initial guess. When there is no previous plan (first time step, or first detection of the human), we generate the initial guesses as follows: first, we compute the robot actions ignoring the human, using a feedback-control-based policy to steer the robot towards its goal [49]. We then compute the human actions using an attracting potential field at the goal position and a repelling potential field centered at the robot. The algorithm then performs a set of optimizations, one for each explicit communication action. Taking advantage of modern multi-core CPUs, we can run these optimizations in parallel. Finally, the best explicit communication $u_c^*$ and robot movements (implicit communication) $u_r^*$ are selected and executed.

We implemented the planning algorithm in C++, and used the software package NLopt [50] to perform the numerical optimization. In our implementation, we chose a planning horizon of N = 6 and a time step of 0.5 seconds.

VI. SIMULATION RESULTS

In order to validate the proposed modeling and planning framework, we first collected human demonstrations and conducted a simulation experiment. In this section, we describe

Algorithm 1: Outline of the planning algorithm.
Data: U_c - set of explicit communicative actions

 1  repeat
      /* get current + predicted states */
 2    s_r^t, u_r^t, s_h^t ← GetTrackingInfo();
 3    s_r^t, s_h^t ← PredictOneStep(s_r^t, s_h^t);
      /* update the hybrid model */
 4    p(ψ) ← UpdateModel(p(ψ), s_r^t, u_r^t, s_h^t, u'_c, t_c);
      /* find the optimal actions */
 5    u_r^0, u_h^0 ← GenerateInitGuess(s_r^t, s_h^t, x_{g,r}, x_{g,h});
 6    for u_c ∈ U_c ∪ {null} do in parallel
 7      u_r(u_c), R̄_r(u_c) ← ComputePlan(u_r^0, u_h^0, s_r^t, s_h^t, p(ψ), u_c);
 8    end
 9    u*_c ← argmax_{u_c} R̄_r(u_c);
10    u*_r ← u_r(u*_c);
      /* execute the actions */
11    if u*_c ≠ null then
12      Communicate(u*_c);
13      u'_c ← u*_c;
14      t_c ← t;
15    end
16    ExecuteControl(u*_r);
17    t ← t + 1;
18  until GoalReached(x_r^t, x_{g,r}, ε);

the specific scenario for our experiment, validation of the human model, and simulation results.

A. Social Navigation Scenario

We consider the scenario shown in Fig. 1 for our simulation and experimental evaluation: a human and a robot move in (approximately) orthogonal directions and encounter each other. In this scenario, they have to coordinate with each other and decide who should pass first to avoid collision. We chose this scenario over the face-to-face encounter scenario because in the face-to-face encounter, there is a relatively strong social norm that people pass each other on the right-hand side. We prefer to start with a scenario with no such bias to allow us to better focus on the effect of communication. The robot can explicitly communicate its intent: to yield priority to the human (human priority) or not (robot priority). To numerically evaluate the model and compute the optimal plan, we need to define features leading to a learned reward function, which can then be used to model the human's continuous actions (Eq. (8)) as well as her behavior modes (Eq. (12)).

We select features that are commonly used to characterize social navigation of pedestrians [51]:

Velocity. Pedestrians tend to maintain a constant desired velocity. We use a feature that sums up the squared velocity along the trajectory:

$$f_{\mathrm{velocity}} = \sum_{t=1}^{N} \left\|\dot{x}_h^t\right\|^2 \tag{24}$$



Fig. 5. Visualization of (a subset of) the features. Warm color indicates high reward and cool color indicates low reward. The human trajectory is in black, and the robot trajectory is in red. Arrows indicate the positions and moving directions at the specified time step.

Here, and in the following equations, $\|\cdot\|$ represents the L2 norm.

Acceleration. To navigate efficiently, pedestrians usually avoid unnecessary accelerations:

$$f_{\mathrm{acceleration}} = \sum_{t=1}^{N} \left\|\ddot{x}_h^t\right\|^2 = \sum_{t=1}^{N} \left\|u_h^t\right\|^2 \tag{25}$$

Distance to goal. Pedestrians typically try to get as close as possible to the target position within a given time horizon:

$$f_{\mathrm{goal}} = \left\|x_h^N - x_{\mathrm{goal}}\right\| \tag{26}$$

Avoiding static obstacles. Pedestrians avoid static obstacles in the environment:

$$f_{\mathrm{obstacle}} = \sum_{t=1}^{N} \exp\left(-\frac{\|x_h^t - o_{\mathrm{closest}}^t\|^2}{D_o^2}\right) \tag{27}$$

where $o_{\mathrm{closest}}^t$ is the position of the closest obstacle to the human at time t, and $D_o$ is a scaling factor.

where oclosest,t is the position of the closest obstacle to thehuman at time t, and Do is a scaling factor.Collision avoidance with the robot. Pedestrians tend toavoid each other in navigation. We assume that they avoidthe robot in a similar fashion:

favoidance =

N∑t=1

exp

(−‖x

th − xtr‖

2

D2r

)(28)

Avoiding the front side of the robot. In addition to avoiding the robot, we observe that humans tend not to cut in front of the robot, especially when they think that the robot has priority. This behavior is captured by the feature:

$$f_{\mathrm{front}} = \sum_{t=1}^{N} \exp\left(-(x_h^t - x_f^t)^\top D_f^{-1} (x_h^t - x_f^t)\right) \tag{29}$$

where $x_f^t$ is a position in front of the robot, and $D_f$ scales the Gaussian and aligns it with the robot's orientation.

The features (except for velocity and acceleration) are visualized in Fig. 5. The human and robot trajectories are from a demonstration we collected to train the IRL model. The figure shows that the human indeed avoided low-reward regions (cool colors) and navigated to the high-reward region (warm colors).
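The feature vector of Eqs. (24)-(28) can be computed directly from trajectory arrays. In the sketch below, velocity is approximated by finite differences of positions, which is an assumption of this sketch (the paper's states may carry velocities directly), and the scaling factors are illustrative.

```python
import numpy as np

def navigation_features(x_h, u_h, x_r, x_goal, obstacles,
                        D_o=0.5, D_r=1.0, dt=0.5):
    """Feature vector [f_velocity, f_acceleration, f_goal, f_obstacle,
    f_avoidance] from Eqs. (24)-(28).
    x_h, x_r: (N, 2) positions; u_h: (N, 2) accelerations;
    obstacles: (K, 2) static obstacle positions."""
    v_h = np.diff(x_h, axis=0, prepend=x_h[:1]) / dt  # finite-difference velocity
    f_vel = np.sum(v_h**2)                            # Eq. (24)
    f_acc = np.sum(u_h**2)                            # Eq. (25)
    f_goal = np.linalg.norm(x_h[-1] - x_goal)         # Eq. (26)
    # Eq. (27): distance to the closest obstacle at each step
    d_obs = np.min(np.linalg.norm(
        x_h[:, None, :] - obstacles[None, :, :], axis=2), axis=1)
    f_obs = np.sum(np.exp(-d_obs**2 / D_o**2))
    # Eq. (28): avoidance of the robot
    f_avoid = np.sum(np.exp(-np.sum((x_h - x_r)**2, axis=1) / D_r**2))
    return np.array([f_vel, f_acc, f_goal, f_obs, f_avoid])

N = 6
x_h = np.linspace([0, 0], [0, 3], N)          # human walking straight to goal
x_r = np.linspace([-2, 1.5], [2, 1.5], N)     # robot crossing the human's path
obstacles = np.array([[1.0, 1.0]])
f = navigation_features(x_h, np.zeros((N, 2)), x_r,
                        np.array([0.0, 3.0]), obstacles)
```

The dot product $w_\psi \cdot f$ of this vector with the learned weights gives the trajectory reward of Eq. (8).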

Rewards for the Behavior Mode Model: In general, it is challenging to define the expected reward function of the human $R_h^\psi(\xi_r^{-N:0}, \xi_h^{-N:0})$, as it is not measurable. Here we design two reward functions based on our observations. Data-driven methods like IRL can potentially produce more accurate results. Our goal, however, is to derive an approximate model that can inform the planning algorithm. Therefore, we prefer to start with simpler designs:

Human Priority (ψ = ψ_h). When the robot's intent is to yield priority to the human, it should avoid moving directly towards the human and getting too close. We use the reward function:

$$R_h^{\psi_h} = -\sum_{t=-N}^{0} \frac{n^t \cdot (x_r^t - x_h^t)}{\max\left(\|x_r^t - x_h^t\|^2, 1.0\right)} \tag{30}$$

where $n^t$ is a unit vector that points from the robot to the human: $n^t = (x_r^t - x_h^t)/\|x_r^t - x_h^t\|$. Note that here t ≤ 0 means that the summation is over the previous time steps.

Robot Priority (ψ = ψ_r). When the robot has priority, the human would expect it to move with a desired velocity, regardless of the state of the human:

$$R_h^{\psi_r} = -\sum_{t=-N}^{0} \frac{|v_r^t - v_d|}{\max\left(\|x_r^t - x_h^t\|^2, 1.0\right)} \tag{31}$$

Here $v_r^t$ is the robot's speed and $v_d$ is the desired speed. In Eqs. (30) and (31), the reward at each time step is divided by $\max(\|x_r^t - x_h^t\|^2, 1.0)$ because when the robot is relatively far from the user, its actions carry less information about its intent. We use the max(·) operation to achieve numerically stable results.
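The robot-priority expectation of Eq. (31) is straightforward to evaluate on trajectory arrays: deviation from the desired speed is penalized, discounted by distance so that far-away motion counts less. The trajectories and desired speed below are illustrative assumptions.

```python
import numpy as np

def robot_priority_reward(x_r, x_h, v_r, v_d=1.0):
    """Expected-robot reward under robot priority, Eq. (31): penalize
    deviation from the desired speed v_d, with each term discounted by
    max(squared distance, 1.0) because distant motion is less
    informative about intent."""
    d2 = np.sum((x_r - x_h)**2, axis=1)
    return -np.sum(np.abs(v_r - v_d) / np.maximum(d2, 1.0))

# A robot holding its desired speed scores higher than one slowing down
x_r = np.linspace([0, 0], [2, 0], 5)
x_h = np.linspace([1, 2], [1, 0.5], 5)
steady = robot_priority_reward(x_r, x_h, np.full(5, 1.0))
slowing = robot_priority_reward(x_r, x_h, np.linspace(1.0, 0.2, 5))
```

Under this model, a robot that slows down near the human looks inconsistent with the robot-priority intent, which is what drives the belief shifts shown in Fig. 7.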

B. Human Model Evaluation

Data Collection for Learning the Human Model. We collected navigation demonstrations from four users. Each user was asked to walk back and forth between two target locations (the setup is similar to that of Section VII). At the same time, a mobile robot (TurtleBot2, Open Source Robotics Foundation, Inc.) also moved between a set of target positions in the environment. The target positions were selected so that the robot would cross paths with the user. The user was allowed to walk freely as in a normal human environment, while avoiding collision with the robot.

We collected data for two scenarios: the human-priority scenario and the robot-priority scenario. When the user had priority, the robot would slow down and yield to the user upon encounter (we decreased the maximum allowable speed of the robot as it got close to the user). When the robot had priority, it would simply ignore the user. In each condition, the user was told the intent of the robot, so the user always knew the robot's true intent. To distract the user from focusing solely on the robot, we asked the user to remember two numbers at one target location, and answer arithmetic problems using the two numbers at the other target location.

For each user, we recorded 64 trials for each scenario (a trial is moving from one target location to the other). We used 80% of the data to train the model, and the other 20% as the testing set.



Fig. 6. Sample predictions using the learned reward functions, and cross validation. (a)-(b) show an example where the prediction matches the actual measurement. (c)-(d) show an example where initially the prediction doesn't match the measurement. We use the model to re-predict human actions at each time step, and the prediction starts to match the measurement after two seconds. (e) Cross validation of prediction accuracy. Type I corresponds to cases where the model initially predicts the same homotopy class as the demonstration ((a), (b)), and type II corresponds to cases where the model initially predicts a different homotopy class ((c), (d)). Blue indicates errors of the IRL model, and orange indicates errors of the social force (SF) model.

Prediction with the Learned Reward Function. Using MaxEnt IRL, we recover the human reward functions in the human-priority and robot-priority scenarios. The reward functions can be used to compute the most likely continuous navigation actions using Eq. (21). This computes the actions over a fixed time horizon. To predict the entire trajectory, we use an MPC approach: we compute the most likely actions starting from time step k, predict the human state at k + 1, and recompute the most likely actions starting from k + 1.
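The receding-horizon prediction loop can be sketched as follows: at each step, re-solve for the most likely action sequence from the current state, apply only the first action, and repeat. The single-integrator dynamics and the goal-seeking stand-in for the Eq. (21) solver are illustrative assumptions.

```python
import numpy as np

def mpc_predict(x0, most_likely_actions, n_steps, dt=0.5):
    """Receding-horizon prediction: at each step, query the most likely
    action sequence for the current state (a stand-in for solving
    Eq. (21)), apply only the first action, and repeat from the
    predicted next state."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        u = most_likely_actions(x)[0]  # first action of the re-planned sequence
        x = x + dt * u                 # simple integrator dynamics (assumption)
        traj.append(x.copy())
    return np.array(traj)

# Toy stand-in: the "most likely" plan always heads toward a goal
goal = np.array([3.0, 0.0])
def toward_goal(x, horizon=6):
    d = goal - x
    u = d / max(np.linalg.norm(d), 1e-6)
    return np.tile(u, (horizon, 1))

traj = mpc_predict([0.0, 0.0], toward_goal, n_steps=6)
```

Re-planning at every step is what lets the prediction recover in the type II cases of Fig. 6: once the observed state commits to one homotopy class, subsequent rollouts follow it.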

Fig. 6 presents two prediction examples. In the first example ((a)-(b)), the predicted trajectory matches the measured trajectory very well (same homotopy class, type I). In the second example ((c)-(d)), the model initially predicts a trajectory that passes the obstacle on a different side (different homotopy class, type II). Although the prediction does not match the measured trajectory, it is still reasonable. In this example, both the predicted and measured trajectories pass behind the robot. Passing on either side of the obstacle has little effect on the reward and is almost equally likely. When predicting at t = 2.0 s, because the human has already started to move toward one side of the obstacle, the model is able to generate a more accurate prediction.

The cross validation result is shown in Fig. 6(e). The prediction error is calculated as the average Euclidean distance between the predicted and measured trajectories:

$$e = \frac{1}{N'} \sum_{t=1}^{N'} \left\|x_{\mathrm{pred}}^t - x_{\mathrm{meas}}^t\right\| \tag{32}$$

where N' is the total number of time steps. We compute the prediction error separately for cases where the initial prediction is in the same homotopy class as the measurement (type I), and cases where the initial prediction and the measurement are in different homotopy classes (type II). It can be observed that the initial prediction error is much smaller for type I, but both types become more accurate as the human gets closer to the goal. Compared with a social force model we used previously [20], the model described here performed better for this scenario.

Evaluation of the Behavior Mode Model. In addition to continuous trajectories, our model can also predict behavior modes, or equivalently, the belief of the human over the robot's intent. As it is impossible to measure this belief in an experiment, we aim to show that the prediction is reasonable with a few test scenarios.

Fig. 7(a) presents a scenario where the robot slowed down to let the user pass first. The scenario is illustrated with the three plots in the top row, showing trajectories and the environment at different time steps. The data is from one demonstration we collected for training the IRL model. The plot in the bottom row shows the predicted belief p(θ = θ_h) over time, given no explicit communication, and two different explicit communication actions at t_c = 0.5 s: communicating human priority (u_c = θ_h) and communicating robot priority (u_c = θ_r). In the case of no communication, the belief decreases initially, but rises as the robot stops to yield to the user. This result matches the intuition that the user can infer the robot's intent (human priority) to some extent by observing its movement (slowing down). When the robot communicates human priority at the beginning, the belief jumps to a high value and maintains this value afterwards. An interesting test scenario is when the robot communicates robot priority, which is not its true intent. According to our model, initially the belief drops, indicating that the user believes the explicit communication. However, when the robot slows down to let the user pass first, the belief starts to increase, indicating that the user starts to believe otherwise as she observes the robot's movements.

We also tested scenarios where the robot's intent is to give itself priority. An example is shown in Fig. 7(b). Similarly, we show the change in belief given different explicit communication in the bottom plot. These examples demonstrate that our model can capture the effect of both explicit and implicit communication, and that the predictions match our intuition.

C. Case Study in Simulation

Before conducting a user study with a physical mobile robot (described in Section VII), we first tested our algorithm in simulation. The purpose of the simulation is two-fold: first, to validate that the proposed algorithm can work in ideal setups, and second, to select appropriate parameters that produce reasonable behaviors.

Fig. 8(a) illustrates a simulated scenario where the human and the robot move along orthogonal paths. Here the simulated



Fig. 7. Demonstrations of the behavior mode model in different scenarios. (a) A scenario where the robot slowed down to let the user pass first. Top plots show the trajectories of the robot and the user, and a map of the environment at three different time steps. The bottom plot shows the user's belief over the robot's intent (human priority) over time, given different explicit communication at t = 0.5 s, as predicted by our model. (b) A scenario where the robot didn't slow down and passed first.


Fig. 8. (a) Simulated scenario using the proposed interactive planner. In this scenario, the robot slows down and explicitly communicates its intent (human priority) to the human. Top: trajectory snippets at five individual time steps. Black and red arrows represent the positions and moving directions of the human and the robot at the given time step. Thin solid red lines represent the robot's planned trajectories; thin solid/dashed black lines represent predicted human trajectories. Bottom: velocity profiles of the robot and the human. (b) Various starting positions of the robot vs. whether explicit communication is used by the planner for the same scenario. The red region represents the starting positions where explicit communication is used.

human user follows a predefined trajectory, and the trajectory is deterministic regardless of the robot's actions. While this is not realistic, the purpose is to test whether the planner can generate reasonable robot behaviors. We further validate the effectiveness of the plan with real-world user studies. The subplots in the top row show the scenario at 5 different time steps, and the subplot in the bottom row shows the speed of the user and the robot over time. We can observe that the robot starts to slow down at t = 1 s, and explicitly communicates its intent to the user at t = 1.5 s. The robot continues moving slowly to allow the user to pass first, and then speeds up towards its goal.

To understand why the planner does this, we visualize the generated plan and the predictions of human movements, as shown in Fig. 8(a). As the robot speeds up and approaches the intersection, the user becomes unsure of how to avoid the robot. This is reflected in the second subplot in the first row, where the user's possible future trajectories diverge. In this simulation, we set the reward function coefficients c_h ≫ c_r, so that the planner cares more about the human user's efficiency and comfort. As a result, the planner explicitly communicates human priority to the user to minimize the chance that the user would slow down to yield to the robot.

We tested the planner with various robot starting positions and target positions. Fig. 8(b) presents the relationship between robot starting position and whether explicit communication is generated by the planner, for one specific target position. We can see that explicit communication is only used if the robot starts within the banded region in red. When the robot starts too close to the intersection (upper-right grey region), the planner predicts that the user will let the robot pass first as it is closer to the intersection. When the robot starts too far



Fig. 9. (a) Experimental field. (b-c) Different starting and goal positions for the robot. (d) The wearable haptic interface and the vibration motor.

(bottom-left grey region), the planner predicts that the user will pass first. Explicit communication is used only when it becomes ambiguous who should pass first.

We also studied the effect of the coefficients ch and cr in the reward function in Eq. (19). Setting ch ≫ cr resulted in a submissive robot that would yield to the user when there was a potential collision. Conversely, setting ch ≪ cr resulted in aggressive robot behaviors. When ch ≈ cr, the robot still tends to be submissive: in our implementation, the robot is usually slower than the human, so it is often more efficient to let the user pass first. Based on the simulation results, we decided to experimentally test two sets of parameters: ch : cr = 1 : 5 for the robot-prioritized scenario, and ch : cr = 2 : 1 for the human-prioritized scenario.
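The effect of these coefficient ratios can be sketched with a toy computation. The linear combination and the unit yielding costs below are illustrative assumptions (the actual reward is defined by Eq. (19)); the weight ratios are the ones tested in the experiment:

```python
def combined_reward(r_human, r_robot, c_h, c_r):
    # Weighted combination of the human's and the robot's individual rewards
    # (assumed linear form, for illustration only).
    return c_h * r_human + c_r * r_robot

# Two candidate outcomes at the intersection: whichever agent yields pays
# an illustrative efficiency cost of -1.
human_yields = {"r_human": -1.0, "r_robot": 0.0}
robot_yields = {"r_human": 0.0, "r_robot": -1.0}

# Robot-prioritized weights (c_h : c_r = 1 : 5): letting the human yield
# scores -1 versus -5 for yielding itself, so the robot acts aggressively.
assert combined_reward(**human_yields, c_h=1, c_r=5) > \
       combined_reward(**robot_yields, c_h=1, c_r=5)

# Human-prioritized weights (c_h : c_r = 2 : 1): the comparison flips and
# the robot yields (submissive behavior).
assert combined_reward(**robot_yields, c_h=2, c_r=1) > \
       combined_reward(**human_yields, c_h=2, c_r=1)
```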

VII. EXPERIMENTAL RESULTS

Simulation results from the previous section have demonstrated that the planning framework can generate appropriate implicit and explicit communication actions. This section presents a real-world user study to further test the effectiveness of the proposed planning framework.

A. Experimental Setup

We again used a TurtleBot2 as the mobile robot platform. The robot was equipped with a Hokuyo URG-04LX-UG01 laser range finder, and an ASUS Zenbook UX303UB as the onboard computer. The planning algorithm is implemented on a desktop computer with an Intel Core i7 processor and 16 GB RAM. The two computers communicate with each other through a wireless network.

The onboard computer processes the laser range finder readings to localize the robot and estimate the position and velocity of nearby pedestrians. We localize the robot in a static map using laser-based Monte Carlo Localization [52]. The robot detects and tracks pedestrians using a leg detection algorithm provided by ROS [53]. The algorithm first attempts to segment leg objects from laser range finder readings with a pre-trained classifier, then pairs nearby legs and associates the detection with a tracker for each person. The localization and tracking results are sent to the desktop computer that runs our planning algorithm. The planning algorithm computes the plan and sends it back to the robot.
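The leg-pairing step can be illustrated with a simplified sketch. The greedy nearest-neighbor pairing and the 0.5 m gap threshold below are assumptions for illustration; the actual ROS leg detector uses a trained classifier and maintains a tracker per person:

```python
import math

def pair_legs(leg_positions, max_leg_gap=0.5):
    """Greedily pair leg detections within max_leg_gap metres into person
    candidates (reported at the legs' midpoint); unpaired legs become
    single-leg candidates. A sketch, not the ROS leg_detector code."""
    unused = list(leg_positions)
    people = []
    while unused:
        a = unused.pop(0)
        # Find the closest remaining leg within the gap threshold.
        best, best_d = None, max_leg_gap
        for b in unused:
            d = math.dist(a, b)
            if d < best_d:
                best, best_d = b, d
        if best is not None:
            unused.remove(best)
            people.append(((a[0] + best[0]) / 2, (a[1] + best[1]) / 2))
        else:
            people.append(a)
    return people

# Two legs 0.3 m apart form one person; a distant leg stays its own candidate.
print(pair_legs([(0.0, 0.0), (0.3, 0.0), (3.0, 1.0)]))
# [(0.15, 0.0), (3.0, 1.0)]
```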

The explicit communication is displayed to the user via a wearable haptic interface that consists of a single vibration motor (Haptuator Mark II, shown in Fig. 9(d)). The interface is capable of rendering distinct signals by modulating the vibration pattern. In this experiment, we display haptic cues with different vibration amplitudes and durations to indicate the robot's intent. Robot priority is represented by a single long vibration (duration 1.5 s, max current 250 mA), and human priority is represented by 3 short pulses (0.2 s vibrations with 0.2 s pauses in between, max current 150 mA). To train users to interpret the haptic cues, we first display the cues in random order (3 trials for each vibration pattern) and explain their meaning. Then we test the user with randomized cues. We found that all users could recognize each cue correctly. This training process is completed by each user before the actual experiment.
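The two cues can be written down as amplitude/duration schedules. The segment-list representation and the helper functions below are illustrative assumptions; the amplitudes and timings are the ones used in the experiment:

```python
# Each cue is a list of (amplitude_mA, duration_s) segments; amplitude 0
# encodes a pause between pulses.
ROBOT_PRIORITY = [(250, 1.5)]                     # one long vibration
HUMAN_PRIORITY = [(150, 0.2), (0, 0.2),           # three short pulses with
                  (150, 0.2), (0, 0.2),           # 0.2 s pauses in between
                  (150, 0.2)]

def cue_duration(cue):
    # Total play time of a cue in seconds.
    return sum(duration for _, duration in cue)

def num_pulses(cue):
    # Count the non-zero-amplitude segments.
    return sum(1 for amp, _ in cue if amp > 0)

assert num_pulses(ROBOT_PRIORITY) == 1 and num_pulses(HUMAN_PRIORITY) == 3
assert abs(cue_duration(HUMAN_PRIORITY) - 1.0) < 1e-9
```

Keeping the two schedules clearly distinct in both amplitude and rhythm is what made the cues easy to discriminate after brief training.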

The experimental field, illustrated in Fig. 9(a), is a room of size 8 m × 10 m. Two tables are placed at the two ends of the field as the targets for the user. To distract the user from focusing solely on the robot, we place questionnaires on the tables, and ask the user to remember and answer arithmetic questions. The entire experiment is recorded with an overhead camera (GoPro Hero 4, recording at 60 Hz), and we post-process the video to extract the trajectories of the user and the robot. To facilitate tracking, we ask the user to wear a purple hat, and we attach an ArUco marker [54] to the top of the robot.

B. Experimental Design

We conducted a 3×1 within-subjects experiment, manipulating the communication mode (baseline, implicit only, explicit + implicit) and measuring users' path lengths, whether they pass the robot in the front, and their trust in the robot. Details of the experimental design and procedure are described below.

Manipulated Factors. We manipulate one factor: the communication mode. The user's performance under each communication mode reflects the effectiveness of the communication method. In addition, the experimental trials consist of different task priorities and robot starting positions. We randomize the order of task priority so that the user does not know the robot's true intent beforehand. We also vary the starting position of the robot to create richer interaction scenarios and prevent the user from developing a fixed pattern (e.g., always passing in front of the robot regardless of communication). Overall, the experiment is divided into three sessions based on the communication mode:



Fig. 10. Comparison of different metrics for the three experimental conditions. (a) Percentage of time that the user passed in front of the robot. (b) Average path length. A base length (6.2 m) is subtracted from all values for visualization purposes. (c) The user's trust in the robot. Brackets indicate statistical significance (*p < 0.05, **p < 0.01, ***p < 0.001).

• explicit + implicit: the robot communicates its intent to the user both explicitly (via haptic feedback) and implicitly (by changing speed and direction), using a model to predict the user's movement.

• implicit only: the robot does not plan for explicit communication, but still changes speed and direction according to a model that predicts the user's movement.

• baseline: the robot simply performs collision avoidance with the user, without predicting the user's movement.

Each session consists of 20 trials. In each trial, the human user and the robot both move from a starting position to a goal position, as shown in Fig. 9(b). The robot is assigned one of the two task priorities in each trial: robot priority (ch : cr = 1 : 5) or human priority (ch : cr = 2 : 1). We also vary the starting position of the robot. The starting positions can be classified as far (4 out of 20 trials), close (4/20), and normal (12/20), based on the distance to the user's starting position (Fig. 9(b)). There are equal numbers of high-priority and low-priority trials for each type of starting position. The order of the trials within each communication mode is pseudo-randomized, and we counterbalance the order of the three communication modes among all users using a Latin square design. The first 10 trials are treated as training trials and not included in the analysis. In the training trials, we reveal the robot's true intent to the user after each trial; we never tell the user the robot's intent in the experimental trials. The purpose of the training trials is to familiarize the user with the experimental procedure and the robot's behavior. After the experiment, we ask the user to rate her trust in the robot on a scale from 1 to 7, 1 being the least trust and 7 being the most. The exact question we use in the survey is "Please rate your overall trust in the robot in the first/second/third condition". There are 7 choices under each question, and we explain the meaning of the scale to the user.
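The trial structure of one session can be sketched as a short script. The 4/4/12 split and balanced priorities follow the design above; the function name and seeding are assumptions, since the authors' actual randomization code is not described:

```python
import random

def build_session(seed=0):
    # 20 trials: 4 far, 4 close, 12 normal starting positions, with equal
    # numbers of human- and robot-priority trials within each position type.
    rng = random.Random(seed)
    trials = []
    for start, count in [("far", 4), ("close", 4), ("normal", 12)]:
        for priority in ["human", "robot"] * (count // 2):
            trials.append({"start": start, "priority": priority})
    rng.shuffle(trials)  # pseudo-randomized trial order
    return trials

session = build_session()
assert len(session) == 20
assert sum(t["priority"] == "human" for t in session) == 10
assert sum(t["start"] == "far" for t in session) == 4
```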

In addition to the orthogonal encounter scenario, we also tested the planner in other experimental setups. Fig. 9(c) presents human-robot encounters at different angles (+45° and −45°). We found that the planner worked similarly in these scenarios.

Dependent Measures. We measure the user's path length for

each trial. Path length is commonly used to estimate the user's efficiency and effort in navigation tasks. We also measure whether the user passes the robot in the front, which is an indicator of whether the user understands the robot's intent correctly (the user is expected to pass in front of the robot in human priority trials and behind it in robot priority trials). This measurement reflects the transparency of the robot's behavior. Finally, we measure the user's trust in the robot. Trust is measured using a post-experiment survey, as described previously. In the session where explicit communication is allowed, we also record whether explicit communication happens and the time at which it happens.

Hypothesis. We hypothesize that:

I. Using explicit + implicit communication conveys the

robot’s intent more clearly than implicit only and baseline,such that users will elect to pass in front of or behind therobot as appropriate for a given priority.

II. The user’s average path length is shorter when the robotplans for communication with the human model (explicit+ implicit and implicit only modes).

III. The user’s overall trust rating in the robot is higher whenthe robot plans for explicit + implicit communication,compare to the other two conditions.

Subject Allocation. A total of 12 people (7 males and 5 females, ages 23 to 33) participated in the experiment after giving informed consent, under a protocol approved by the Stanford University Institutional Review Board. We used a within-subjects design and counterbalanced the order of the three sessions.

C. Analysis and Results

Fig. 10 and Table I summarize the major results. We describe the analysis and results in detail in the following paragraphs.

Understanding Robot Intent. To characterize how often the user correctly understood the robot's intent, we compute the percentage of trials in which the user passed in front of the robot, for each communication mode and task priority. A one-way repeated measures ANOVA revealed a significant effect


TABLE I
MEAN AND STANDARD DEVIATION OF MEASUREMENTS. FOR THE FIRST TWO MEASUREMENTS, THE FIRST ROW IS HUMAN PRIORITY AND THE SECOND ROW IS ROBOT PRIORITY.

Measure              Imp+Exp        Imp            Base
% Passing in Front   0.94 ± 0.12    0.72 ± 0.36    0.33 ± 0.27
                     0.11 ± 0.15    0.50 ± 0.37    0.55 ± 0.41
Path Length (m)      6.42 ± 0.20    6.62 ± 0.29    6.79 ± 0.22
                     6.64 ± 0.07    6.68 ± 0.10    6.91 ± 0.18
Trust                6.58 ± 0.49    4.58 ± 0.64    3.33 ± 0.85

of communication mode on the percentage passing in front (F(2, 33) = 14.61, p < .001 for human priority trials; F(2, 33) = 5.75, p = .007 for robot priority trials). We performed the statistical test separately for robot and human priority trials, with a post-hoc Tukey HSD analysis to determine pairwise differences. For human priority trials, results show that the baseline mode is significantly different from the implicit only mode (p = .005) and the implicit + explicit mode (p < .001). For robot priority trials, the implicit + explicit mode is significantly different from the implicit only mode (p = .027) and the baseline mode (p = .01). This supports Hypothesis I: the user can better understand the robot's intent with explicit communication.

Path Length. Path length is a commonly used metric of efficiency. We computed the average path length (over all trials of each condition and robot intent) for each user. As with the percentage passing in front, we performed a one-way repeated measures ANOVA, and results show a significant effect (F(2, 33) = 6.36, p = .005 for human priority trials; F(2, 33) = 14.18, p < .001 for robot priority trials). Post-hoc comparisons reveal that, for human priority trials, the average path length of the implicit + explicit mode is significantly shorter than that of the baseline mode (p = .003). For robot priority trials, the average path length of the baseline mode is significantly longer than that of the implicit only mode (p < .001) and the implicit + explicit mode (p < .001). The results suggest that the user can navigate more efficiently when the robot plans for communication (Hypothesis II).

Trust. Past research has shown that humans trust a robot more if it can express its intent via motion [55] or other forms of communication [40], and we expected similar results in our experiment. Here, users' trust in the robot was measured with a post-experiment survey, as explained in the previous subsection. A one-way repeated measures ANOVA revealed a significant effect of communication mode (F(2, 33) = 64.5, p < .001). Post-hoc Tukey HSD analysis showed that all three pairs are significantly different from each other (p < .001 for all), which supports Hypothesis III.

D. Effect of Communication

Fig. 10(a) shows that when there was both implicit and explicit communication, users were able to understand the robot's intent most of the time and acted accordingly: users passed in front of the robot 92% of the time in human priority trials, and only 9% of the time in robot priority trials. When there was only implicit communication, users were more confused, especially in robot priority trials (about 50% of the


Fig. 11. Velocity profiles from sample trials. Top: human priority trials. Bottom: robot priority trials.

time, users passed in front of the robot even though the robot's intent was not to yield). When the robot performed collision avoidance naively (baseline condition), users were the most confused, and acted more conservatively.

Fig. 11 depicts the robot's velocity over time from sample trials of different communication modes and task priorities. In human priority trials (top row), when the robot planned for communication (implicit only and implicit + explicit modes), it started to slow down relatively early, compared to the baseline mode. By doing so, the robot implicitly expressed its intent. Indeed, significantly more users chose to pass in front of the robot with these communication modes (Fig. 10(a)). The planner generated this behavior because, with the human model, it was able to predict the effect of robot motions on the user's navigation behavior. More specifically, our human model predicts that if the robot slows down early, the human is more likely to believe that the robot is yielding priority and will choose to pass first, which yields higher expected rewards. This encouraged the robot to slow down earlier. On the other hand, the baseline planner did not perform any prediction and simply solved for a collision-free trajectory. As a result, the robot did not slow down until it got very close to the user, which caused the user to believe that it did not want to yield. The result suggests that legible motions emerge when the human's reactions are taken into consideration during planning.

Our results also suggest that users can navigate more efficiently when the robot plans for communication. Fig. 10(b) shows that for human priority trials, users' path lengths are significantly shorter in the implicit + explicit mode than in the baseline mode. For robot priority trials, both the implicit + explicit mode and the implicit only mode result in significantly shorter path lengths than the baseline mode. Knowing the robot's intent, users can better coordinate their movements with the robot and navigate more efficiently. In the implicit + explicit mode, users generally received feedback early in the trial. In the implicit only mode, users had to infer the robot's intent, and tended to be less efficient. This difference does not reach


statistical significance; users' behaviors had high variance in the implicit only mode. In Fig. 10(a), we observe the same high variance in the implicit only mode.

An important feature of the study is that we never explained to the users how the robot might move differently in the three communication modes. One of the main purposes of the training trials was to let the users figure out the robot’s movement pattern. We observed that users were generally able to adapt to the robot’s communication mode within a few trials. Most users started conservatively, trying to keep relatively large distances from the robot. In the implicit + explicit mode, most users quickly adapted to trust the explicit communication. In the baseline mode, users made their decisions largely based on the initial distance from the robot, as they found it difficult to tell the robot’s intent from its motion. Interestingly, in the implicit only mode, we observed that users’ behaviors varied based on their strategy of interacting with the robot. Aggressive users tended to realize earlier that the robot would slow down when it wanted to yield priority, and they were more likely to cut in front of the robot. This helped them interpret the implicit communication. Timid users, on the other hand, tended to keep larger distances or let the robot pass first. Sometimes this behavior resulted in the robot not slowing down in human priority trials. Unlike the legibility planning in [31], our algorithm does not explicitly optimize legibility. In addition, the algorithm responds to human actions in real time. As a result, the robot would not wait if the user was too far away, even in human priority trials, which could confuse the user. This explains in part the high variance observed in the implicit only mode.

Note that in the implicit + explicit mode, the robot did not always use explicit communication. Again, this is the result of optimizing the overall reward function. We assigned a negative reward to explicit communication to prevent over-communicating. In trials where the robot started too close to or too far from the intersection point, it often chose not to use explicit communication because the chance of collision is extremely low.
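The trade-off described above can be sketched in a few lines: a fixed penalty on signaling is weighed against the expected cost of a collision that the signal would help avert. The constants and the function name below are hypothetical illustrations, not the values used in our reward function.

```python
# Hypothetical constants for illustration only.
COMM_COST = 0.3           # negative reward for explicit communication
COLLISION_PENALTY = 10.0  # cost assigned to a collision

def should_signal(p_collision):
    """Signal only when the expected collision cost that signaling
    averts exceeds the fixed cost of communicating."""
    return p_collision * COLLISION_PENALTY > COMM_COST

should_signal(0.5)   # paths likely to conflict: worth signaling
should_signal(0.01)  # started too close/far, collision unlikely: stay silent
```

Under this structure, staying silent in low-risk configurations is not a special case; it is simply the reward-maximizing choice.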

Finally, Fig. 10(c) shows that users gained more trust in the robot with the proposed planner. We found statistical significance between each pair of communication modes. When the robot communicated both implicitly and explicitly, users trusted the robot the most. This is also the condition in which users understood the robot’s intent best (based on the previous analysis shown in Fig. 10(a)). This correlation suggests that the more easily users can understand the robot’s intent, the more they trust it: transparency is beneficial in social navigation. When users do not trust the robot, they tend to act conservatively and less efficiently (fewer trials in which the user passed in front of the robot, and longer path lengths in the baseline mode). Note that the trust measurement in this work is exploratory, as we only asked users to rate their overall trust in the robot. A more specific and validated questionnaire should be used to obtain more accurate results.

VIII. CONCLUSION AND FUTURE WORK

In this paper, we presented a planning framework that leverages implicit and explicit communication to generate efficient and transparent social navigation behaviors. This approach to proactive communication for mobile robots is inspired by humans’ ability to use both explicit and implicit communication to avoid collisions during navigation. The planner relies on a new probabilistic model that predicts human movements. We evaluated the planner both in simulation and in a user study with a physical mobile robot. Results showed that the proposed planning algorithm can generate proactive communicative actions that better expressed the robot’s intent, reduced users’ effort, and increased users’ trust of the robot, compared to collision avoidance with and without a model that predicts users’ movements.

There are numerous ways to expand on this work. First, the model of human navigation behavior makes certain assumptions and approximations, as described in Section IV. As a result, the planner cannot deal with certain scenarios (e.g., when the human suddenly stops and remains stationary). Expanding the data collection for IRL, including additional behaviors, and relaxing assumptions would improve the model and thus the planner. Second, our planner is computationally expensive. One approach to reducing the computational complexity of planning is to use sampling-based methods instead of performing optimization. Third, we can generalize our approach to consider richer interaction scenarios and other communication modalities. We are also interested in identifying the advantages of combining implicit and explicit communication in other application scenarios. For example, in collaborative tasks such as assembly, prompt explicit communication can be used to align each agent’s goal, while implicit communication such as forces can improve physical interactions between partners. Finally, we would like to explore in future work how implicit and explicit communication affect users’ trust in the robot. The result in this work is an initial demonstration of the benefit of communication in terms of trust, but more studies, such as looking into specific aspects of trust [56], are needed to gain a deeper understanding.
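To make the second point concrete, a sampling-based planner trades solution quality for a bounded, predictable amount of computation: instead of optimizing over a continuous trajectory space, it draws candidate plans and keeps the best-scoring one. The sketch below is a minimal illustration under toy assumptions (one-dimensional "plans" and a hypothetical reward that peaks at 1.0 m/s), not our implementation.

```python
import random

def plan_by_sampling(score, sample_plan, n_samples=100):
    """Return the highest-scoring plan among n_samples random candidates.
    Cost grows linearly with n_samples, independent of the reward's shape."""
    return max((sample_plan() for _ in range(n_samples)), key=score)

# Toy usage: plans are scalar speeds in [0, 2] m/s; the hypothetical
# reward penalizes deviation from a preferred speed of 1.0 m/s.
random.seed(0)
best = plan_by_sampling(score=lambda v: -(v - 1.0) ** 2,
                        sample_plan=lambda: random.uniform(0.0, 2.0))
```

The same pattern extends to full trajectories: sampling avoids repeated gradient evaluations of the (expensive) human-model reward, at the cost of a coarser optimum.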

While this work only explored a relatively simple scenario, the results clearly demonstrate the benefit of combining implicit and explicit communication in robot social navigation. We believe that the computational framework for modeling and planning communicative actions can be adapted to other applications and improve interactions between robots and humans.

REFERENCES

[1] P. Trautman, J. Ma, R. M. Murray, and A. Krause, “Robot navigation in dense human crowds: the case for cooperation,” in IEEE International Conference on Robotics and Automation, 2013, pp. 2153–2160.

[2] G. Ferrer and A. Sanfeliu, “Proactive kinodynamic planning using the extended social force model and human motion prediction in urban environments,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 1730–1735.

[3] M. Kollmitz, K. Hsiao, J. Gaa, and W. Burgard, “Time dependent planning on a layered social cost map for human-aware robot navigation,” in IEEE European Conference on Mobile Robots, 2015, pp. 1–6.

[4] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially compliant mobile robot navigation via inverse reinforcement learning,” The International Journal of Robotics Research, vol. 35, no. 11, pp. 1289–1307, 2016.

[5] B. Kim and J. Pineau, “Socially adaptive path planning in human environments using inverse reinforcement learning,” International Journal of Social Robotics, vol. 8, no. 1, pp. 51–66, 2016.


[6] M. Pfeiffer, U. Schwesinger, H. Sommer, E. Galceran, and R. Siegwart, “Predicting actions to act predictably: Cooperative partial motion planning with maximum entropy models,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 2096–2101.

[7] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 1343–1350.

[8] L. Peterson and M. J. Peterson, “Short-term retention of individual verbal items,” Journal of Experimental Psychology, vol. 58, no. 3, p. 193, 1959.

[9] C. Breazeal, C. D. Kidd, A. L. Thomaz, G. Hoffman, and M. Berlin, “Effects of nonverbal communication on efficiency and robustness in human-robot teamwork,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005, pp. 708–713.

[10] R. A. Knepper, C. I. Mavrogiannis, J. Proft, and C. Liang, “Implicit communication in a joint action,” in ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2017, pp. 283–292.

[11] C. I. Mavrogiannis and R. A. Knepper, “Towards socially competent navigation of pedestrian environments,” in Robotics: Science and Systems 2016 Workshop on Social Trust in Autonomous Robots, 2016.

[12] A. D. Dragan and S. S. Srinivasa, “Generating legible motion,” in Robotics: Science and Systems, 2013.

[13] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to collision avoidance,” IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23–33, 1997.

[14] T. Kruse, A. K. Pandey, R. Alami, and A. Kirsch, “Human-aware robot navigation: A survey,” Robotics and Autonomous Systems, vol. 61, no. 12, pp. 1726–1743, 2013.

[15] D. Helbing and P. Molnar, “Social force model for pedestrian dynamics,” Physical Review E, vol. 51, no. 5, p. 4282, 1995.

[16] F. Zanlungo, T. Ikeda, and T. Kanda, “Social force model with explicit collision prediction,” Europhysics Letters, vol. 93, no. 6, p. 68005, 2011.

[17] F. Farina, D. Fontanelli, A. Garulli, A. Giannitrapani, and D. Prattichizzo, “Walking ahead: The headed social force model,” PLoS ONE, vol. 12, no. 1, p. e0169734, 2017.

[18] P. Ratsamee, Y. Mae, K. Ohara, M. Kojima, and T. Arai, “Social navigation model based on human intention analysis using face orientation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 1682–1687.

[19] D. Mehta, G. Ferrer, and E. Olson, “Autonomous navigation in dynamic social environments using multi-policy decision making,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016, pp. 1190–1197.

[20] Y. Che, C. T. Sun, and A. M. Okamura, “Avoiding human-robot collisions using haptic communication,” in IEEE International Conference on Robotics and Automation, 2018, pp. 5828–5834.

[21] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-body collision avoidance,” in Robotics Research. Springer, 2011, pp. 3–19.

[22] S. J. Guy, M. C. Lin, and D. Manocha, “Modeling collision avoidance behavior for virtual humans,” in International Conference on Autonomous Agents and Multiagent Systems, vol. 2. International Foundation for Autonomous Agents and Multiagent Systems, 2010, pp. 575–582.

[23] P. Trautman and A. Krause, “Unfreezing the robot: Navigation in dense, interacting crowds,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 797–803.

[24] A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in International Conference on Machine Learning, 2000, pp. 663–670.

[25] S. Levine and V. Koltun, “Continuous inverse optimal control with locally optimal examples,” in International Conference on Machine Learning, 2012, pp. 475–482.

[26] P. Abbeel and A. Y. Ng, “Exploration and apprenticeship learning in reinforcement learning,” in International Conference on Machine Learning. ACM, 2005, pp. 1–8.

[27] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning,” in AAAI Conference on Artificial Intelligence, 2008, pp. 1433–1438.

[28] M. Kuderer, H. Kretzschmar, C. Sprunk, and W. Burgard, “Feature-based prediction of trajectories for socially compliant navigation,” in Robotics: Science and Systems, 2012.

[29] L. Tai, J. Zhang, M. Liu, and W. Burgard, “Socially compliant navigation through raw depth inputs with generative adversarial imitation learning,” in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1111–1117.

[30] A. Thomaz, G. Hoffman, M. Cakmak et al., “Computational human-robot interaction,” Foundations and Trends in Robotics, vol. 4, no. 2-3, pp. 105–223, 2016.

[31] A. D. Dragan, K. C. Lee, and S. S. Srinivasa, “Legibility and predictability of robot motion,” in ACM/IEEE International Conference on Human-Robot Interaction, 2013, pp. 301–308.

[32] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning for autonomous cars that leverage effects on human actions,” in Robotics: Science and Systems, 2016.

[33] D. Sadigh, S. S. Sastry, S. A. Seshia, and A. Dragan, “Information gathering actions over human internal state,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016, pp. 66–73.

[34] D. Sadigh, N. Landolfi, S. S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning for cars that coordinate with people: Leveraging effects on human actions for planning and active information gathering over human internal state,” Autonomous Robots (AURO), vol. 42, no. 7, pp. 1405–1426, 2018.

[35] A. Bestick, R. Bajcsy, and A. D. Dragan, “Implicitly assisting humans to choose good grasps in robot to human handovers,” in International Symposium on Experimental Robotics. Springer, 2016, pp. 341–354.

[36] S. Nikolaidis, D. Hsu, and S. Srinivasa, “Human-robot mutual adaptation in collaborative tasks: Models and experiments,” The International Journal of Robotics Research, vol. 36, no. 5-7, pp. 618–634, 2017.

[37] C.-M. Huang and B. Mutlu, “Modeling and evaluating narrative gestures for humanlike robots,” in Robotics: Science and Systems, 2013, pp. 57–64.

[38] M. Lohse, R. Rothuis, J. Gallego-Perez, D. E. Karreman, and V. Evers, “Robot gestures make difficult tasks easier: the impact of gestures on perceived workload and task performance,” in SIGCHI Conference on Human Factors in Computing Systems. ACM, 2014, pp. 1459–1466.

[39] T. Hashimoto, S. Hiramatsu, T. Tsuji, and H. Kobayashi, “Realization and evaluation of realistic nod with receptionist robot SAYA,” in IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2007, pp. 326–331.

[40] K. Baraka and M. M. Veloso, “Mobile service robot state revealing through expressive lights: Formalism, design, and evaluation,” International Journal of Social Robotics, vol. 10, no. 1, pp. 65–92, 2018.

[41] S. Scheggi, F. Chinello, and D. Prattichizzo, “Vibrotactile haptic feedback for human-robot interaction in leader-follower tasks,” in International Conference on Pervasive Technologies Related to Assistive Environments, 2012, pp. 51:1–51:4.

[42] S. Scheggi, F. Morbidi, and D. Prattichizzo, “Human-robot formation control via visual and vibrotactile haptic feedback,” IEEE Transactions on Haptics, vol. 7, no. 4, pp. 499–511, 2014.

[43] D. Sieber, S. Music, and S. Hirche, “Multi-robot manipulation controlled by a human with haptic feedback,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015, pp. 2440–2446.

[44] B. Mutlu, F. Yamaoka, T. Kanda, H. Ishiguro, and N. Hagita, “Nonverbal leakage in robots: communication of intentions through seemingly unintentional behavior,” in ACM/IEEE International Conference on Human-Robot Interaction, 2009, pp. 69–76.

[45] H. Admoni, A. Dragan, S. S. Srinivasa, and B. Scassellati, “Deliberate delays during robot-to-human handovers improve compliance with gaze communication,” in ACM/IEEE International Conference on Human-Robot Interaction, 2014, pp. 49–56.

[46] A. Moon, D. M. Troniak, B. Gleeson, M. K. Pan, M. Zheng, B. A. Blumer, K. MacLean, and E. A. Croft, “Meet me where I’m gazing: how shared attention gaze affects human-robot handover timing,” in ACM/IEEE International Conference on Human-Robot Interaction, 2014, pp. 334–341.

[47] M. Morari, C. Garcia, J. Lee, and D. Prett, Model Predictive Control. Prentice Hall, Englewood Cliffs, NJ, 1993.

[48] C. I. Mavrogiannis, W. B. Thomason, and R. A. Knepper, “Social momentum: A framework for legible navigation in dynamic multi-agent environments,” in ACM/IEEE International Conference on Human-Robot Interaction, 2018, pp. 361–369.

[49] A. Astolfi, “Exponential stabilization of a wheeled mobile robot via discontinuous control,” ASME Journal of Dynamic Systems, Measurement, and Control, vol. 121, no. 1, pp. 121–126, 1999.

[50] S. Johnson, “The NLopt nonlinear-optimization package [software],” 2014.

[51] S. Hoogendoorn and P. H. L. Bovy, “Simulation of pedestrian flows by optimal control and differential games,” Optimal Control Applications and Methods, vol. 24, no. 3, pp. 153–172, 2003.

[52] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, “Robust Monte Carlo localization for mobile robots,” Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.


[53] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: an open-source Robot Operating System,” in ICRA Workshop on Open Source Software, vol. 3, no. 3.2, 2009, p. 5.

[54] S. Garrido-Jurado, R. Munoz-Salinas, F. Madrid-Cuevas, and M. Marín-Jiménez, “Automatic generation and detection of highly reliable fiducial markers under occlusion,” Pattern Recognition, vol. 47, no. 6, pp. 2280–2292, 2014.

[55] A. D. Dragan, S. Bauman, J. Forlizzi, and S. S. Srinivasa, “Effects of robot motion on human-robot collaboration,” in ACM/IEEE International Conference on Human-Robot Interaction. ACM, 2015, pp. 51–58.

[56] P. A. Hancock, D. R. Billings, K. E. Schaefer, J. Y. Chen, E. J. De Visser, and R. Parasuraman, “A meta-analysis of factors affecting trust in human-robot interaction,” Human Factors, vol. 53, no. 5, pp. 517–527, 2011.