


Motion Learning and Adaptive Impedance for Robot Control during Physical Interaction with Humans

Elena Gribovskaya, Abderrahmane Kheddar, and Aude Billard

Abstract— This article combines programming by demonstration and adaptive control for teaching a robot to physically interact with a human in a collaborative task requiring sharing of a load by the two partners. Learning a task model allows the robot to anticipate the partner's intentions and adapt its motion according to perceived forces. As the human represents a highly complex contact environment, direct reproduction of the learned model may lead to sub-optimal results. To compensate for unmodelled uncertainties, in addition to learning we propose an adaptive control algorithm that tunes the impedance parameters, so as to ensure accurate reproduction. To facilitate the illustration of the concepts introduced in this paper and provide a systematic evaluation, we present experimental results obtained with simulation of a dyad of two planar 2-DOF robots.

I. INTRODUCTION

We consider the problem of adapting on-the-fly the control law of a robot-follower when engaged in a collaborative task with another robot, the robot-leader. The task consists of lifting a rigid beam, Fig. 1. This work builds upon our

Fig. 1. Two planar robots are engaged in a collaborative task of lifting a beam. The robot-leader substitutes the human. The robot-follower anticipates the motion intentions of the robot-leader and adapts accordingly. Task completion requires the satisfaction of "soft constraints": the two robots must coordinate and adapt their motions so as to avoid tilting the beam. The desired kinematic plan $x_{d,L}, \dot{x}_{d,L}, \ddot{x}_{d,L}$ of the robot-leader is given. During demonstration, the robot-follower learns to generate a desired kinematic command $x_d, \dot{x}_d, \ddot{x}_d$ in response to the perceived force f. The dynamical behavior of the robots' end-effectors is modeled as mechanical impedance characterized by desired stiffness, damping, and inertia ($K_{d,L}, D_{d,L}, \Lambda_{d,L}$ for the leader; $\tilde{K}_d, \tilde{D}_d, \tilde{\Lambda}_d$ for the follower). During task execution, the robot-follower adapts its desired stiffness $\tilde{K}_d$ and inertia $\tilde{\Lambda}_d$, so as to ensure accurate reproduction of a learned task model.

previous research where we proposed a learning algorithm to control the motion of the HRP-2 robot engaged in a similar task with a human [1], [2]. These experiments have revealed how important it is to be able to anticipate the forces (applied at the robot's end-effector), so as to generate an appropriate

E. Gribovskaya and A. Billard are with the Learning Algorithms and Systems Laboratory (LASA), EPFL, Switzerland; {elena.gribovskaya,aude.billard}@epfl.ch. A. Kheddar is with CNRS-UM2 LIRMM, Montpellier, France, and the CNRS-AIST JRL, UMI3218/CRT, Japan; [email protected].

response. Furthermore, as prediction may not account for all perturbations, e.g. those induced by changes in the behavior of the leading partner (sudden deviations from the motion plan or varying arm impedance), additional mechanisms are required to mitigate non-modeled effects during task execution.

The problem addressed in the current paper goes beyond that of a robot reacting to random disturbances and focuses on continuous prediction and adaptation to the dynamics of the partner. We decompose the problem into two parts. First, we learn a task model from demonstrations; the task model is then used to generate an appropriate reference kinematic and feedforward control signal in response to the perceived force. Importantly, the task model also allows us to predict the force that will be perceived; the force prediction enables on-line adaptation of the robot's motion. Second, we propose an adaptive impedance controller that compensates for non-modeled effects. Here, we extend the method proposed in [3] to encapsulate the force feedback error into the adaptation laws. We evaluate the system's performance in controlled situations using a physical simulation of a pair of planar robots, where the robot-leader mimics the role played by the human in the real-world experiments. To simulate changes in the control law of the human (here the robot-leader), we introduce delays, signal-dependent noise, as well as changes in the impedance of the controller of the follower.

II. APPROACH

Fig. 2 shows the control flow of our model. We discuss how a task model is learned from a set of demonstrations, and present the adaptive impedance control law that allows for compensation of effects not captured by the learned model.

Fig. 2. Control flow of the proposed model (blocks: Demonstration; Learning Task Model; Learning Feedforward Control; Generating Reference Signal; Adapting Impedance; Impedance Control Law). After acquiring a set of demonstrations D, the robot learns the task model $\dot{\xi} = \hat{h}(\xi)$ and a forward control signal $u = u(\xi)$ that maps the desired state ξ of the task model to actual motor commands. The robot is controlled through an impedance control law so as to compensate for non-modeled aspects of the external dynamics. The dynamical system representation of the task model allows the robot to generate reference signals on-line, adapting to the force applied by a human. The desired stiffness $\tilde{K}_d$ and inertia $\tilde{\Lambda}_d$ are continuously adapted during task execution.


A. Learning a Task Model

A training set D consists of M demonstrated trajectories of length $N_k$, for $k = 1\cdots M$. Each trajectory is a sequence of states $\xi^k_t$, state derivatives $\dot{\xi}^k_t$, and control inputs $u^k_t$, $t = 1\cdots N_k$. The control input $u^k_t$ is the vector of joint torques. The task state is described by an augmented state $\xi = [\dot{x}_d, f_d]^T$ and is composed of the reference velocity $\dot{x}_d$ and the force feedback part that represents the expected force $f_d$, perceived by a force sensor mounted at the robot's wrist.

Fig. 3. Vector field of the learned internal model of the lifting task (axes: expected force $f_d$ [N] versus reference velocity $\dot{x}_d$ [m/s]). The task model is represented by a dynamical system $\dot{\xi} = \hat{h}(\xi)$, $\xi = [\dot{x}_d; f_d]$, and estimated from the training data. At each time step, the velocity $\dot{x}_d$ and force $f_d$ are inferred from those observed at the previous step. Their dynamical relationships follow the vector fields displayed in blue. Dark gray lines show the demonstrations. One can observe an accurate fit between the inferred and demonstrated dynamics. Statistical inference extends the prediction of the force-velocity pattern to ranges of these variables not observed during training. This offers greater robustness during adaptation to a new human partner.

Following our previous work [4], we assume that the task model is governed by an autonomous (time-independent) dynamical system, where the acceleration of the robot's end-effector is a function of its velocity and the perceived force¹:

$$\dot{\xi} = \hat{h}(\xi) + \eta(\xi), \qquad \hat{h}(\xi) = E[P(\dot{\xi}|\xi)] \qquad (1)$$

where $\hat{h}(\xi)$ is the dynamic function governing the temporal evolution of the motion, and $\eta(\xi) \sim N(0, \Sigma_\eta(\xi))$ is the signal-dependent noise.

Given a training set D (see Section III-A for a description of the data acquisition procedure), we learn the task model $\hat{h}(\xi)$ by estimating the joint density $P(\xi, \dot{\xi})$ through Gaussian Mixture Models; given the joint density, we compute the conditional expectation from Eq. 1. We apply the incremental EM procedure that we developed in [4] to ensure that the estimate of the velocity is asymptotically stable at the origin of the system. In this context, stability ensures that when the force perceived at the end-effector is null, the robot stops moving, as shown in Fig. 3.
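To make the estimation step concrete, the following sketch illustrates the kind of computation involved: fitting a GMM to the joint density and evaluating the conditional expectation (Gaussian Mixture Regression). It is a minimal illustration only, assuming scikit-learn's GaussianMixture and generic variable names (Xi, Xi_dot); it does not reproduce the incremental EM procedure of [4] that enforces asymptotic stability at the origin.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_task_model(Xi, Xi_dot, n_components=5):
        # Fit a GMM to the joint density P(xi, xi_dot); hyperparameters are illustrative.
        data = np.hstack([Xi, Xi_dot])          # rows: samples, columns: [xi, xi_dot]
        return GaussianMixture(n_components=n_components, covariance_type='full').fit(data)

    def h_hat(gmm, xi):
        # Conditional expectation E[xi_dot | xi] computed from the joint GMM.
        d = xi.shape[0]
        mu, Sigma, w = gmm.means_, gmm.covariances_, gmm.weights_
        resp, cond = [], []
        for k in range(gmm.n_components):
            mu_i, mu_o = mu[k, :d], mu[k, d:]
            S_ii, S_oi = Sigma[k, :d, :d], Sigma[k, d:, :d]
            diff = xi - mu_i
            # responsibility of component k for the query point xi
            rk = w[k] * np.exp(-0.5 * diff @ np.linalg.solve(S_ii, diff)) \
                 / np.sqrt(np.linalg.det(2 * np.pi * S_ii))
            resp.append(rk)
            cond.append(mu_o + S_oi @ np.linalg.solve(S_ii, diff))
        resp = np.array(resp) / (np.sum(resp) + 1e-12)
        return np.sum(resp[:, None] * np.array(cond), axis=0)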

The important novelty here consists in introducing the augmented state ξ. The augmented state encapsulates both the kinematic command x and the haptic input f, and allows one to learn a temporal evolution of the reference velocity correlated with the perceived external force. An advantage of this formulation is that the robot can switch across different reference velocity profiles in response to a change in the partner's intentions.

¹ Encoding a task model as an autonomous dynamical system allows us to represent behaviors that cannot be unambiguously expressed by a single non-autoregressive function. Specifically, the dependency between force and velocity in our case is not functional: a single value of the force corresponds to different velocities; see Fig. 3.

Another advantage of this encoding is that the task model can be used to mitigate sensory delays by predicting the perceived force. Specifically, to generate the reference kinematics, the robot does not need to get the actual value of the force at each time step; it can predict the perceived force from the task model. Later, once it gets the actual value of the force, the robot may offset its prediction so as to switch to a different velocity profile if necessary.
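The following fragment sketches this idea of on-line reference generation: the augmented state is integrated forward with the learned model, and an incoming force measurement, when available, overwrites the predicted force component. The function name, the assumption of a two-dimensional state $[\dot{x}_d, f_d]$ along the vertical axis, and the explicit-Euler integration are illustrative choices, not the paper's implementation.

    import numpy as np

    def step_reference(task_model, xi, dt, f_measured=None):
        # task_model: any callable returning xi_dot = h(xi), e.g. the GMR estimate sketched above
        if f_measured is not None:
            xi = np.array([xi[0], f_measured])   # offset the predicted force with the measurement
        xi_dot = task_model(xi)                  # predicted [acceleration of x_d, derivative of f_d]
        return xi + dt * xi_dot                  # next reference velocity and expected force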

B. Impedance Control

The rigid body dynamics in task coordinates [5]² is:

$$\Lambda(x,\dot{x})\ddot{x} + \mu(x,\dot{x})\dot{x} + J^{-T}g(x) = J^{-T}\tau + f, \qquad (2)$$
$$\Lambda(x,\dot{x}) = J^{-T} M J^{-1}, \qquad \mu(x,\dot{x}) = J^{-T}(C - M J^{-1}\dot{J})J^{-1},$$

with $q \in \mathbb{R}^{N_q}$ the vector of joint angles, M the inertia matrix, C the Coriolis/centrifugal matrix, g(x) the vector of gravity torques, f the vector of external forces, τ the applied joint torques, and J the Jacobian.

Impedance control in the task space consists of the following control objective [6]:

$$\Lambda_d \ddot{e}_x + D_d \dot{e}_x + K_d e_x = e_f \qquad (3)$$

where $e_x = x - x_d$ is the position error between the actual position x and the reference position $x_d$; $e_f = f - f_d$ measures how much the actual perceived force f deviates from the predicted one $f_d$. $\Lambda_d$, $D_d$, and $K_d$ are the symmetric and positive definite matrices of desired inertia, damping, and stiffness, respectively.

Substituting $\ddot{x}$ from Eq. 3 into Eq. 2, the Cartesian impedance controller can be implemented via the joint torques τ as follows:

$$\tau = u + J^T \tilde{K}_d e_x + J^T \tilde{D}_d \dot{e}_x + J^T \tilde{\Lambda}_d e_f, \qquad (4)$$
$$\text{where } u = g + J^T(\Lambda \ddot{x}_d + \mu \dot{x}_d) \qquad (5)$$
$$\tilde{K}_d = \Lambda\Lambda_d^{-1}K_d, \qquad \tilde{D}_d = \Lambda\Lambda_d^{-1}D_d + \mu, \qquad \tilde{\Lambda}_d = \Lambda\Lambda_d^{-1} - I.$$

Eq. 4 can be rewritten as a sum of feedforward and feedback components. The feedback signal can be decomposed into the kinematic feedback $v_x$ and the force feedback $v_f$:

$$\tau = u + v, \qquad v = v_x + v_f \qquad (6)$$
$$v_x = J^T(\tilde{K}_d e_x + \tilde{D}_d \dot{e}_x), \qquad v_f = J^T \tilde{\Lambda}_d e_f.$$
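As a rough illustration of Eqs. 4-6, the torque command combines the feedforward term with kinematic and force feedback mapped through the Jacobian. The function below is a sketch under the assumption that the shaped gains of Eq. 5 have already been computed; it is not the authors' code.

    import numpy as np

    def impedance_torque(u_ff, J, K_t, D_t, L_t, e_x, e_x_dot, e_f):
        # tau = u + J^T (K~ e_x + D~ e_x_dot) + J^T Lambda~ e_f   (Eqs. 4, 6)
        v_x = J.T @ (K_t @ e_x + D_t @ e_x_dot)   # kinematic feedback
        v_f = J.T @ (L_t @ e_f)                   # force feedback
        return u_ff + v_x + v_f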

Note that the feedforward control u in Eqs. 4-5 corresponds to the learned feedforward control in Eq. 7. Analytical computation of u through Eq. 5 requires an accurate dynamic model of the robot, which is rarely available. Many authors suggest different algorithms for approximating the feedforward control; we instead learn u from demonstrations. Therefore, whenever we further refer to the feedforward control u, we mean the learned control u from Eq. 7.

² The notation $^{-T}$ refers to the pseudo-inverse of a transposed matrix. The notation $^{-1}$ refers to the pseudo-inverse if a matrix is non-invertible.

During motion, most of the variation occurs along the vertical axis; therefore, only the impedance parameters along the vertical axis are estimated. Following common practice, we further set $D^j = \lambda\sqrt{K^j}$; λ is a scalar, and λ = 2 in our experiments.

C. Learning Feedforward Control

The feedforward control input u is generally a function of the robot's desired and actual states. Given a task model $\hat{h}(\xi)$, which defines the nonlinear dependency between these parameters, u becomes a function of the task state ξ. Following [3], we learn an estimate of the feedforward command u in the linear form:

$$u = [\Phi(\xi)^T \Theta]^T, \qquad (7)$$

where $\Phi \in \mathbb{R}^{K(N_x+N_f)}$ is a vector of G basis functions and $\Theta \in \mathbb{R}^{K(N_x+N_f)\times N_q}$ is the matrix of tunable parameters (each column $\theta^i$, $i = 1\cdots N_q$, of the matrix Θ corresponds to one degree of freedom in the joint space). $N_x$, $N_f$, and $N_q$ refer to the dimensionality of the Cartesian space of the robot's end-effector, the perceived force, and the joint space, respectively. This type of linear control parameterization is commonly used in the adaptive control literature [7], [8]. The basis Φ(ξ) consists of G Gaussian functions:

$$\Phi_j = \Phi(\xi)_j = \frac{\pi_j(\xi)\,\xi}{\sum_{g=1}^{G}\pi_g(\xi)}, \qquad j = 1\cdots G \qquad (8)$$

where $\pi_j(\xi) = \exp\!\left(-0.5\,(\xi - \mu_{\xi,j})^T \Sigma_j^{-1} (\xi - \mu_{\xi,j})\right)$, with $\mu_{\xi,j}$, $\Sigma_j$ the mean and the diagonal covariance of the jth Gaussian kernel. Given D, we learn the parameterization in Eqs. 7-8 through Locally Weighted Regression [9].
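A compact sketch of the parameterization in Eqs. 7-8 is given below; the kernel means and diagonal covariances (mus, sigmas) are assumed to be chosen or learned beforehand, and the names are illustrative rather than taken from the paper.

    import numpy as np

    def basis(xi, mus, sigmas):
        # Phi(xi): normalized Gaussian responsibilities, each scaled by the state xi (Eq. 8)
        pis = np.array([np.exp(-0.5 * np.sum((xi - np.asarray(m)) ** 2 / np.asarray(s)))
                        for m, s in zip(mus, sigmas)])
        pis = pis / (np.sum(pis) + 1e-12)
        return np.concatenate([p * xi for p in pis])

    def feedforward(xi, Theta, mus, sigmas):
        # u = [Phi(xi)^T Theta]^T, one torque per joint (Eq. 7)
        return basis(xi, mus, sigmas) @ Theta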

D. Adaptive Impedance

We use an iterative algorithm to update the impedance parameters and the parameters of the feedforward controller. We follow [3] and write an adaptation law for the feedforward parameters $\theta^i$ ($\forall i = 1\cdots N_q$) from Eq. 7 as follows:

$$\Delta\theta^i = \frac{\beta}{2}\left((1-\chi)\Phi\epsilon^i + (1+\chi)\Phi|\epsilon^i|\right) - \gamma\mathbf{1} \qquad (9)$$

where ϵ is the error function that will be defined later, and β, χ, and γ are empirical constants. During physical interaction, a robot may deviate from the learned task model because of varying human intentions. Hence, we introduce two supplementary feedback terms, $e_f$ and $e_m$, to the original formulation in [3]. These terms denote respectively an error between the predicted and the actual perceived force, and a kinematic error, e.g. that caused by changes in the mass of a manipulated object. That is:

$$\epsilon = J^T(e_m + \rho_3 e_f) \qquad (10)$$
$$e_m = \ell\,\min(f \cdot e_x, 0)\,(\rho_1 e_x + \rho_2 \dot{e}_x)$$

where · refers to the inner vector product and $\ell = \|f \cdot e_x\|^{-1}$ is the normalization factor. The parameters $\rho_1, \rho_2, \rho_3 \in \mathbb{R}$ weight the importance given to errors in the trajectory, the velocity, and the force. We can now rewrite Eq. 9 as:

$$\Delta\theta^i = \frac{\beta}{2}(1-\chi_1)\Phi (J^T e_m)^i + \frac{\beta}{2}(1+\chi_1)\Phi |(J^T e_m)^i| + \frac{\beta}{2}(1-\chi_2)\Phi (J^T e_f)^i + \frac{\beta}{2}(1+\chi_2)\Phi |(J^T e_f)^i| - \gamma\mathbf{1}. \qquad (11)$$

The bio-mimetic approach [3] suggests transforming Eq. 11 into adaptation laws for both feedforward and feedback components. Next, we provide an analytical discussion of the components in Eq. 11 and propose the algorithm for adaptation of the impedance parameters.

In Eq. 11, the terms $\frac{\beta}{2}(1-\chi_1)\Phi\epsilon^i_m$ and $\frac{\beta}{2}(1-\chi_2)\Phi\epsilon^i_f$ correspond to the conventional adaptation law from the control literature. $\frac{\beta}{2}(1-\chi_1)\Phi\epsilon^i_m$ generates a force in the direction opposite to the kinematic error $e_m$ and updates the feedforward signal u. $\frac{\beta}{2}(1-\chi_2)\Phi\epsilon^i_f$ compensates for the deviation of the external force from its reference value and contributes to the adaptation of the desired inertia.

The terms dependent on the absolute values of the errors aim at tuning the stiffness. $\frac{\beta}{2}(1+\chi_1)\Phi|\epsilon^i_m|$ increases stability in response to kinematic perturbations, while $\frac{\beta}{2}(1+\chi_2)\Phi|\epsilon^i_f|$ decreases the stiffness if the deviation of the external force is increasing. Indeed, a sudden increase in the force error $e_f$ means that the human is attempting to impose a different motion plan, and hence the robot should decrease its stiffness so as to maintain stable interaction.

Analogously to [3], the update mechanism emulates automatic relaxation through the term $\gamma\mathbf{1}$. This is similar to a behavior observed in humans who, in the absence of motion errors, tend to relax their muscles so as to minimize energy consumption [10]. The authors in [3] advocate that the components involving absolute values of the errors are responsible for muscle "co-activation" and should affect the adaptation of impedance in robots. Following this reasoning, we rearrange the terms in Eq. 11; the update procedure for the forward signal and the impedance parameters can then be written as follows:

$$\Delta\theta^i = \kappa_x \Phi (J^T e_m)^i - \gamma_\theta, \qquad \kappa_x > 0, \quad i = 1\cdots N_q \qquad (12)$$
$$\Delta K^j_d = \beta_x |e^j_m| - \beta_f |e^j_f| - \gamma_K, \qquad \beta_x, \beta_f, \gamma_K > 0 \qquad (13)$$
$$\Delta\Lambda^j_d = \kappa_f e^j_f - \gamma_\Lambda, \qquad \kappa_f > 0, \quad j = 1\cdots N_x \qquad (14)$$

Eq. 4 together with Eqs. 7, 12-14 represents the control algorithm that enables the simultaneous on-line adaptation of the feedforward signal, the desired stiffness, and the inertia. Note that if we update the desired stiffness and inertia according to Eqs. 13-14, their values may go below zero; to avoid this, we consider adaptation only within given boundaries, e.g., $2 < K^j_d < 100$.
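One possible way to implement a single adaptation step of Eqs. 12-14, including the clipping mentioned above, is sketched below; the gain values and the lower bound on the inertia are placeholders, not the constants used in the experiments.

    import numpy as np

    def adapt(Theta, K_d, L_d, Phi, J, e_m, e_f,
              kappa_x=1.0, beta_x=5.0, beta_f=1.0, kappa_f=0.1,
              gamma_theta=0.01, gamma_K=0.1, gamma_L=0.01):
        e_m_joint = J.T @ e_m
        Theta = Theta + kappa_x * np.outer(Phi, e_m_joint) - gamma_theta    # Eq. 12
        K_d = K_d + beta_x * np.abs(e_m) - beta_f * np.abs(e_f) - gamma_K   # Eq. 13
        L_d = L_d + kappa_f * e_f - gamma_L                                 # Eq. 14
        K_d = np.clip(K_d, 2.0, 100.0)   # keep stiffness within the stated bounds
        L_d = np.maximum(L_d, 0.0)       # prevent a negative desired inertia
        return Theta, K_d, L_d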

III. RESULTS

We assess our method in simulations where the robot-follower interacts with another planar robot, Fig. 1. To highlight the different types of adaptation handled by our algorithm, we simulate different conditions that may arise during execution of the collaborative tasks, and that would require on-the-fly adaptation of the robot's control law.


Fig. 4. TWO-STAGE TRAINING PROCEDURE. To simulate real-world training, where the robot is teleoperated by a human, we adopt a two-stage training procedure. Figs. (a), (e) present the robots' configurations during training. Desynchronization between the partners is greatly reduced during active observation, as expressed by the reduced tilting of the beam. PASSIVE OBSERVATION: The stiffness of the robot-follower is set to be low (∼5 N/m) and the stiffness of the robot-leader is high (∼50 N/m). This allows the robot-leader to impose its kinematic plan; see Fig. (b). The actual velocity of the robot-follower is higher than its reference signal and coincides with the actual and reference velocities of the robot-leader. Such forced adaptation is achieved at the cost of considerable energy injection; see Figs. (c)-(d). The robot-follower perceives high positive external forces that are due to the effort of the robot-leader. After observing the task "passively", the robot-follower stores the kinematic information and discards the force signals. ACTIVE OBSERVATION: The stiffness of both partners is medium (∼15 N/m). The robot-leader repeats the same reference kinematic profile as at the previous stage, while the robot-follower utilizes the kinematic profile acquired during passive observation. Improved synchronization decreases the magnitude of the forces perceived by both partners; see Figs. (c)-(d), solid line. The final training set is composed of the velocity signal recorded during passive observation and the external forces/applied torques recorded during active observation.

A. Two-stage Training Procedure

Demonstration data used for validation of our algorithm are acquired through a two-stage training procedure: in each demonstration, the robots alternate between "passive" and "active" stages. In total, 15 demonstrations at different speeds have been acquired. This procedure is inspired by the way humans incrementally learn to synchronize with each other. The leader keeps the same pace in the passive and active stages. The follower adapts its desired kinematics to synchronize with the leader.

We simulate two types of sensorimotor limitations of the human motor system: signal-dependent noise and sensory delay. A reaction delay (the time span between the moment when the follower starts perceiving the changing force and the moment when it actually starts moving) is set to 150 ms; the perception delay along the motion (the delay between perception of a force and the reaction to it) is 3 ms.
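For illustration, such limitations can be emulated with a delayed measurement buffer and noise whose magnitude scales with the signal; the class below is a hypothetical sketch (the buffer length and noise scale are placeholders), not the simulator used in this work.

    import numpy as np
    from collections import deque

    class DelayedNoisySensor:
        def __init__(self, delay_steps, noise_scale=0.05):
            self.buffer = deque(maxlen=delay_steps + 1)
            self.noise_scale = noise_scale

        def read(self, true_value):
            # noise whose standard deviation grows with the signal magnitude
            noisy = true_value + np.random.randn(*np.shape(true_value)) \
                    * self.noise_scale * np.abs(true_value)
            self.buffer.append(noisy)
            return self.buffer[0]    # oldest stored sample -> delayed measurement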

During training, the robots are controlled with the impedance control law according to Eq. 4, with predefined impedance parameters and a zero reference force $f_d$. The reference kinematic profiles share the same goal (i.e. bring the beam to a specified location), but have dissimilar timing (due to different reference velocity profiles and sensory delays). For both robots, the reference kinematics are generated by a dynamical system parameterized with a multiplicative parameter. Specifically, we use the VITE dynamical model of human reaching motions [11]: $\ddot{x}_d = \alpha(-\dot{x}_d + 4(x_{tar} - x_d))$, where α is the multiplicative parameter and $x_{tar}$ is the given target location. The dynamical system produces the same task-space trajectories but with different velocity profiles³.
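A minimal rollout of this VITE reference generator might look as follows; the step size, horizon, and names are illustrative.

    import numpy as np

    def vite_rollout(x0, x_tar, alpha, dt=0.001, T=1.0):
        # x_ddot_d = alpha * (-x_dot_d + 4 * (x_tar - x_d)); alpha scales the speed of convergence
        x = np.array(x0, dtype=float)
        x_dot = np.zeros_like(x)
        traj = []
        for _ in range(int(T / dt)):
            x_ddot = alpha * (-x_dot + 4.0 * (np.asarray(x_tar, dtype=float) - x))
            x_dot = x_dot + dt * x_ddot
            x = x + dt * x_dot
            traj.append((x.copy(), x_dot.copy(), x_ddot.copy()))
        return traj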

To provide examples of adaptation to different velocities, the parameter α of the leader is varied from one demonstration to another (but not between the two stages of one demonstration), so as to generate motions with a maximum desired velocity of 0.2-0.5 m/s and a duration of 0.4-0.8 s. The α of the follower is fixed so as to generate a reference motion of 0.8 s with a maximum velocity of 0.2 m/s.

³ A model parameterized with α = 10 produces the same trajectories in the state space {x₁; x₂} as a model parameterized with α = 20; however, the latter converges to the target twice as fast.

1) Passive Observation: The two robots are controlled to track their reference kinematic profiles generated with the VITE system as discussed above. The stiffness of the robot-follower is set to be low (5 N/m) and the stiffness of the robot-leader is high (50 N/m). The robot-leader imposes its motion plan on the partner; see Fig. 4-(b). This requires the leader to inject a considerable amount of energy; see Fig. 4-(c),(d), dashed line. Therefore, even though the follower tracks the motion of the leader (the actual velocity of the follower is close to that of the leader), the perceived forces are different from those that would be observed if it did so intentionally. To observe these forces, the follower reproduces the actual kinematics during the next training step.

2) Active Observation: The robot-leader tracks the same reference trajectory as during the passive stage. The robot-follower uses as its reference signal the actual kinematics $x, \dot{x}, \ddot{x}$ recorded during passive observation (which better matches the reference kinematics of the leader; see Fig. 4-(b)). The stiffness of both partners is medium (15 N/m).

By deliberately reproducing this imposed kinematic profile, the follower generates forces that are better aligned with those of the leader. The leader injects less energy and, therefore, the forces perceived by the follower are smaller than those at the first stage; see Fig. 4-(c),(d), solid line. Fig. 4-(a),(e) highlights the improvements in synchronization across the two stages⁴. The collected data (the actual velocity, forces, and feedforward commands of the follower) are further used to learn the task model $\hat{h}$ and the feedforward control u.

⁴ During active observation the decrease in the perceived forces is mainly caused by the improved synchronization, but the lowered stiffness of the robot-leader also contributes to this decrease: the robot-leader no longer forcefully guides the robot-follower.


B. Learning a Task Model

The acquired training data are depicted in Fig. 5-(a),(b); note that the data exhibit a force-velocity correlation similar to the one observed in the real-world data acquired with the HRP-2 robot (see [2]).

Fig. 5. TASK LEARNING AND REPRODUCTION. In this experiment, the robot-follower learns the lifting task by observing demonstrations performed with different velocity profiles imposed by the leader. During reproduction, the robot-leader varies its kinematic plan from one attempt to another; it does so by changing the motion duration and maximum velocity. The robot-follower, governed by the learned task and control model, adapts its motion and successfully accomplishes the task. (a) The state-space view of the data used for training the task model (gray line) and the reproduction attempts (green line). (b) The forward control signal generated by the follower during demonstration (gray line) and reproduction (green line). The two datasets correspond to the two joints of the planar robot-follower. (c)-(d) The time series of the Cartesian velocity and trajectory of the robot-follower during reproduction. The follower (green line) adjusts its kinematics and synchronizes with the leader (dashed blue line).

Importantly, the force-velocity dependency defines a task model: how to generate a motion that is consistent with the perceived force; i.e. for a given value of the perceived force, the robot estimates the relevant velocity. After data acquisition, the robot learns the task model $\dot{\xi} = \hat{h}(\xi)$ and the forward control model $u = u(\xi)$. During reproduction, the learned models are fed into the control law in Eq. 4. The results of the task execution are depicted in green in Fig. 5. Note that the robot-follower successfully adapts its velocity and synchronizes with the robot-leader.

C. Adaptation of Unknown Impedance

In the previous experiments we reused the impedance parameters that the two robots had during training. However, in general the impedance parameters of the robot-follower are unknown, e.g., if the demonstrations are provided through teleoperation. We now assume that the robot-follower has no information about the impedance it should apply at the end-effector. Therefore, the robot-follower should adapt the parameters on-line, i.e. during task reproduction, Fig. 6.

The experiment starts with the robot-follower's stiffness and inertia set to mid-range values of 35 N/m and 0, respectively (in contrast, during demonstration the two parameters were 10 N/m and 0.7). This is an important difference in the parameter values. One can see in Fig. 6-(a), dashed line, that reproduction without adaptation of the impedance parameters leads to an overestimated reference velocity and difficulties in stopping at the target (oscillations in the velocity profile). Adaptation allows for tuning of the parameters so as to ensure stable interaction; see Fig. 6-(a), green line. After growing slightly at the beginning of the motion, to enable a smooth acceleration, the stiffness gradually decreases due to the relaxation term and errors in tracking the desired force profile; see Fig. 6-(d). The inertia, in turn, decreases at the beginning, to ensure a stable onset of the motion. It then increases to endow the robot-follower with greater reactivity to the partner's intentions; see Fig. 6-(c).

Fig. 6. ADAPTATION OF UNKNOWN IMPEDANCE. In general, the impedance parameters that would be optimal for the task are unknown. Therefore, the follower should tune its stiffness and inertia during task execution. Arbitrary impedance parameters may cause undesirable effects, e.g. overestimated reference kinematics and contact instabilities (blue dashed line). Our algorithm provides the adaptation law to tune the parameters and to ensure accurate reproduction (green lines in (a)-(b) show the results with impedance adaptation).

D. Adaptation to Perturbations

We tested the ability of the learned model to adapt to changing intentions of the leader during task execution. We simulated uncertainty about the motion objectives by varying the target position $x_{tar}$ in the motion plan of the leader. Three cases are considered where the leader changes the motion plan during task execution. It decides to move the beam (1) higher than it had planned initially, (2) lower, but still higher than the actual position of the robots at the moment of taking the decision, and (3) lower than both the original target position and the actual position. Fig. 7 shows that the follower succeeds in bringing the beam to the new desired location. Such adaptation is possible because the learned model generalizes the force and velocity patterns to values not observed during training (as discussed in Fig. 3).


Fig. 7. ADAPTATION TO PERTURBATIONS. Three cases are analyzed: the robot-leader changes the motion plan on-line and moves the beam (1) higher than initially planned, (2) lower, but higher than the actual position of the beam at the moment the change decision is taken, and (3) stops at a position lower than the actual position at the decision-taking moment. In case (1), the robot-follower manages to re-accelerate (see the two peaks in the velocity profile). In case (2), the robot-follower pro-actively decelerates. In case (3), the robot also manages to smoothly drop the velocity below zero and lower the beam. In graph (b), the dashed lines denote the trajectories of the leader.

Fig. 8. COMPARISON WITH A DAMPING CONTROLLER. We compare our system against the damping controller. (a), (c) The planar robots performing the lifting; the follower is controlled by our controller in (a) and by the damping controller in (c). One may notice the improvement in coordination between the partners when the robot-follower adapts its kinematic profile (a): the beam is kept horizontal all along the motion. It is persistently tilted when the follower is controlled with the damping controller (c). (b) θ characterizes the deviation of the beam from the horizontal orientation. (d) The leader has to apply considerably higher forces to make the system move.

E. Comparison with a Damping Controller

To highlight the advantages of the proposed approach for controlling a robot during collaborative manipulation tasks, we also implemented a damping controller [12]. This easy-to-implement and computationally cheap method is often used to control robots during physical collaboration with people. The only free parameter in this controller is the damping coefficient; the reference kinematics and all other impedance parameters are set to zero.
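Under the impedance objective of Eq. 3 with zero reference and zero stiffness and inertia, this baseline reduces to commanding a velocity proportional to the sensed force; the one-liner below is a sketch of that reading, with an arbitrary damping coefficient, and is not necessarily the exact formulation of [12].

    import numpy as np

    def damping_reference_velocity(f_sensed, d_coeff=20.0):
        # x_dot = D^{-1} f : move along the applied force, resisted only by damping
        return np.asarray(f_sensed) / d_coeff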

Damping control has proven useful and efficient in many applications; however, it puts an additional workload on the human and is not adequate for controlling fast motions.

We compare our system against the damping controller in Fig. 8. The forces perceived by the robot are much higher than those observed with our learned system (Fig. 8-(d)). This is due to the higher forces that the leader has to apply to maintain the interaction. The beam undergoes a stronger rotation (Fig. 8-(b)) due to the imbalanced forces on its two sides. This would be highly undesirable if the beam were loaded with unfixed objects that could slip off. We should emphasize, however, that our controller possesses only local knowledge and, far from the demonstrated data, its predictions are no longer relevant. In this case, we suggest switching to a damping controller to ensure safety.

IV. DISCUSSION AND RELATED WORK

While interaction control has been investigated for a long time, the available tools remain less efficient than those developed for free-space motion. Later attempts to implement physical human-robot interaction introduced the notions of variable impedance and active following. For instance, in [13] a robot adjusts its damping depending on the perceived force; in [12] it tries to predict the target of a motion so as to generate a reference trajectory proactively. Although their validity is confirmed by successful experimental results, these are still ad-hoc solutions: no generic framework exists for tackling both task learning and variable impedance during physical interaction. In robot learning, the few existing works consider a single learned trajectory as a reference signal for a hard-coded impedance controller. In [14], a robot is taught to clap hands with a human. The robot utilizes Hidden Markov Models (HMM) to recognize the human behavior from motion data and generate a reference trajectory. The considered scenario does not require continuous interaction, and the haptic signals do not affect the reference kinematics. A hand-shaking robot is presented in [15]. The authors encode motion trajectories with an HMM, where the hidden variables represent the human impedance. Such an encoding requires the

Page 7: Motion Learning and Adaptive Impedance for Robot …lasa.epfl.ch/publications/uploadedFiles/ICRA11_pHRI.pdflaw of a robot-follower when engaged in a collaborative task with another

robot to measure the human impedance and to recognize which motion model to choose. Recognition happens at the onset of the motion and governs the robot through the rest of the task without adaptation. This is different from our approach, which advocates continuous adaptation of the motion.

V. CONCLUSIONS AND FUTURE WORKS

We present an approach to learning robot control during physical interaction with humans. The method addresses the problem of controlling a robot so that it can coordinate its motions with those of a human in collaborative tasks, while relying solely on haptic and proprioceptive feedback (no vision or verbal commands are involved).

In contrast to other works on variable impedance, our method allows for adaptation within an execution trial, and not only from trial to trial. Since there is a human in the loop, this characteristic is essential: we cannot ensure that at the next trial the person will repeat the task identically and provide the robot with time to tune its controller. The presence of a human in the loop also makes it complicated to conduct a formal stability analysis of the adaptive impedance. Indeed, a proof of stability should demonstrate that the errors $e_x$, $\dot{e}_x$, and $e_f$ converge to zero with time. The human brings uncertainty into these errors and, therefore, the stability of the system cannot be proved unless hypotheses are made on the human behavior. To ensure safety during interaction, we incorporate several security mechanisms that bound the force produced by the robot.

Here, we consider a non-redundant case; the method is also applicable to redundant set-ups, one plausible way being to follow a projection-based approach [16]. Currently, we are working towards implementing the algorithm on the physical WAM robot and considering higher-dimensional tasks.

We believe that our method contributes importantly to research on physical human-robot interaction. The proposed system endows the robot with two fundamental features of human motor control that emerge during physical interaction: learning haptic communication in a natural manner, and continuously adapting to incoming forces during task execution. Additionally, the simulator developed in this work provides an efficient means to study physical interactions between two agents, for which we still have very few models. The simulator offers a framework for the systematic assessment and performance comparison of different algorithms for the control of human-robot interaction.

APPENDIX

We briefly summarize the major concepts of the bio-mimetic adaptive algorithm presented in [3]; for details, please refer to the original publication. The cost function penalizes the feedback cost and the activation of the feedforward command:

$$\min_{\theta^i} R^i(\theta^i) = 0.5\,\beta\,(v^i)^2 + \gamma \sum_{k=1}^{K} \theta^i_k, \qquad \forall\, i = 1\cdots N_q. \qquad (15)$$

where β > 0, γ > 0 are empirical constants controlling the influence of the two components. In [3], it is suggested to use a special form of the feedback signal $v^i$ for the derivation of the adaptation policy:

$$v^i = 0.5\left[(1-\chi)\epsilon^i + (1+\chi)|\epsilon^i|\right], \qquad \epsilon^i = \rho_1 e^i + \rho_2 \dot{e}^i, \qquad (16)$$

where $e^i$ is the deviation of the controlled signal from its desired value, and χ, ρ₁, ρ₂ > 0 are empirical constants. To optimize the trained parameters Θ in Eq. 7, the cost function $R^i$ is minimized by gradient descent:

$$\Delta\theta^i_t = -\frac{dR^i}{d\theta^i} = -\beta\left(\frac{\partial v^i}{\partial \theta^i}\right)^T v^i - \gamma\mathbf{1}, \qquad (17)$$

The control τ, the feedforward u, and the feedback signal v are linked: τ = u + v; see Eq. 6. τ represents the environment being learned and is assumed to be independent of Θ; therefore, the adaptation law in Eq. 17 can be rewritten as:

$$\Delta\theta^i = \beta\left(\frac{\partial u^i}{\partial \theta^i}\right)^T v^i - \gamma\mathbf{1} = \beta\,\Phi\, v^i - \gamma\mathbf{1}. \qquad (18)$$
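A literal transcription of this update (Eqs. 16 and 18) for one joint is sketched below; the gains are placeholders and eps stands for the scalar error $\epsilon^i$ defined above.

    import numpy as np

    def delta_theta(Phi, eps, beta=1.0, chi=0.5, gamma=0.01):
        v = 0.5 * ((1 - chi) * eps + (1 + chi) * abs(eps))   # Eq. 16
        return beta * Phi * v - gamma                        # Eq. 18, with gamma as the relaxation term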

VI. ACKNOWLEDGMENTS

This work was supported in part by the European Commission under contract number FP7-248258 (First-MM).

REFERENCES

[1] S. Calinon, P. Evrard, E. Gribovskaya, A. Billard, and A. Kheddar, "Learning collaborative manipulation tasks by demonstration using a haptic interface," in Proceedings of the International Conference on Advanced Robotics, 2009.

[2] P. Evrard, E. Gribovskaya, S. Calinon, A. Billard, and A. Kheddar, "Teaching Physical Collaborative Tasks: Object-Lifting Case Study with a Humanoid," in Proceedings of the IEEE International Conference on Humanoid Robots, 2009.

[3] G. Ganesh, A. Albu-Schaffer, M. Haruno, M. Kawato, and E. Burdet, "Biomimetic motor behavior for simultaneous adaptation of force, impedance and trajectory in interaction tasks," in Proceedings of the IEEE International Conference on Robotics and Automation, 2010.

[4] E. Gribovskaya, S. M. Khansari-Zadeh, and A. Billard, "Learning Nonlinear Multivariate Dynamics of Motion in Robotic Manipulators," International Journal of Robotics Research, vol. 1, 2011.

[5] O. Khatib, "A unified approach for motion and force control of robot manipulators: The operational space formulation," IEEE Journal of Robotics and Automation, vol. 3(1), pp. 1114–1120, 1987.

[6] N. Hogan, "Impedance control: An approach to manipulation, Parts I–III," ASME Journal of Dynamic Systems, Measurement, and Control, vol. 107, pp. 1–24, 1985.

[7] E. Burdet, A. Codourey, and L. Rey, "Experimental evaluation of nonlinear adaptive controllers," IEEE Control Systems Magazine, vol. 18, pp. 39–47, 1998.

[8] K. Astrom and B. Wittenmark, Adaptive Control. Addison-Wesley, 1989.

[9] C. G. Atkeson, A. W. Moore, and S. Schaal, "Locally weighted learning for control," Artificial Intelligence Review, vol. 11, pp. 75–113, 1997.

[10] D. Franklin, E. Burdet, K. Tee, R. Osu, C.-M. Chew, T. Milner, and M. Kawato, "CNS learns stable, accurate, and efficient movements using a simple algorithm," The Journal of Neuroscience, vol. 28(44), pp. 11165–11173, 2008.

[11] E. Gribovskaya and A. Billard, "Combining dynamical systems control and programming by demonstration for teaching discrete bimanual coordination tasks to a humanoid robot," in Proceedings of the IEEE/ACM International Conference on Human-Robot Interaction, 2008.

[12] Y. Maeda, T. Hara, and T. Arai, "Human-robot cooperative manipulation with motion estimation," in Proceedings of the International Conference on Intelligent Robots and Systems, 2001.

[13] V. Duchaine and C. Gosselin, "General model of human-robot cooperation using a novel velocity based variable impedance control," in Proceedings of the Joint EuroHaptics Conference and Symposium, 2007.

[14] D. Lee, C. Ott, and Y. Nakamura, "Mimetic communication model with compliant physical contact in Human–Humanoid interaction," International Journal of Robotics Research, 2010.

[15] Z. Wang, A. Peer, and M. Buss, "An HMM approach to realistic haptic human-robot interaction," in Proceedings of the Joint EuroHaptics Conference and Symposium, 2009.

[16] C. Ott, Cartesian Impedance Control of Redundant and Flexible-Joint Robots, B. Siciliano, O. Khatib, and F. Groen, Eds. Springer, 2008.