
Automatic Gaze Control of Humanoid Head

Naresh Marturi1,2, Valerio Ortenzi2, Jingjing Xiao3, Maxime Adjigble4, Rustam Stolkin2 and Aleš Leonardis4

Abstract— This paper addresses the problem of gaze control for a humanoid robot head. We propose a control framework composed of two components: 1. an adaptive visual tracker that is capable of tracking an object with an unknown trajectory, and 2. an optimised visual control framework capable of controlling the overall head joint motions in order to maintain the tracked object at the image centre. The advantage of such a framework is that it does not require any prior knowledge of the object trajectory and can optimally exhibit the required joint motions, i.e. the maximum motion of neck yaw and eye tilt/pan when gazing in a horizontal plane. An adaptive gain has been used with the controller in order to provide dynamic convergence of the task-space error when gazing at an object with an unpredicted trajectory. The proposed framework has been validated in real-time using our bi-manual robotic platform Boris, and the obtained results demonstrate the efficiency of our approach.

I. INTRODUCTION

Over the last decade, humanoid robotics research has greatly benefited from the technological advancements made in the fields of mechanics, perception and control. To this end, the development of new anthropomorphic robots that can mimic human-like behaviour has gathered major attention. Generally, these robots possess various sensory abilities, of which visual sensing is a fundamental component for interacting with the physical world. It not only assists in performing complex tasks like handling objects, path following, etc., but also helps in maintaining human-robot interaction, e.g. by perceiving and executing facial expressions. Regardless of the application, constant and stable observation of a target is indispensable, which points to the necessity of controlling the head gaze.

A prolonged gaze towards any specific moving object involves controlling the movements of the head, the eyes, or both. This head-eye coordination for a humanoid robot integrates both computer vision and robot control: vision helps in tracking a target object, while the control module generates joint motions based on the tracked information, a scheme termed visual servoing [1]. This paper addresses both the vision and control modules.

There are different types of eye movements in humans, and the most important for gaze are saccades, pursuit and vergence movements [2]. Saccades are fast eye rotations, i.e. both eyes are reoriented (pan and tilt) in the same direction to project an image of the world onto the fovea, while pursuit keeps the image of a moving object still on the retina (pan or tilt). Both of these eye movements are vital; hence, to maintain a constant gaze, at least two Cartesian degrees of freedom (DOF), i.e. pan and tilt of the eyes or head, are required. Vergence movements, on the other hand, compensate for the disparities between the two eyes.

1 KUKA Robotics UK Ltd., Great Western Street, Wednesbury, WS10 7LL, UK. [email protected]

2 School of Mechanical Engineering, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK.

3 School of Electronic, Electrical and Computer Engineering, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK.

4 School of Computer Science, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK.

Fig. 1. Boris - the bi-manual humanoid robot with the KIT head at the University of Birmingham.


Based on this concept, many humanoid heads with varying degrees of freedom (DOFs) have been built in recent years [3], [4], [5], [6]. However, most of them are kinematically redundant and require sophisticated gaze control mechanisms. In [7], a learning-based technique combined with log-polar tracking has been used. As the gaze direction is affected by the other joint motions (neck and head), an independent controller for the neck has been developed. However, such learning-based methods require a large amount of data processing and are inadequate for reproducing human-like behaviour in real-time. Gu and Su [6] presented a biologically inspired gaze control method using a 4 DOFs head. Due to the specialised decision thresholds for eye movements, their method cannot be readily extended to heads with more DOFs. In [8], an optimised gaze control mechanism for a 2 DOFs robot was developed. Due to the lack of sufficient DOFs, the other parts of the robot body were controlled in order to minimise the motion of the high-inertia parts. In [5], a normal proportional controller with a constant gain, integrated with a biologically inspired tracker, has been used to maintain a constant gazing direction for the Twente humanoid head.

Fig. 2. (a) Illustration of the different joints associated with the head; (b) head kinematics diagram used to estimate the D-H parameters.

However, by using such a type of control, the eyes move faster while the head slowly follows the motion. In [9], a novel virtual mechanism-based approach has been developed for the 7 DOFs ARMAR-III robot head. In order to maintain a constant gaze, two external virtual prismatic joints (one for each eye) were added to the kinematic structure of the head. Milighetti et al. [10] developed a gaze control scheme based on a weighted pseudo-inverse of the Jacobian for the ARMAR-III head. A Kalman filter was used in the feedforward control to predict the position of the target. However, this method requires previously computed covariance matrices to initiate the Kalman filter. Moreover, a back-projected humanoid head showing animated gaze was presented in [3]. Recently, in [11], a tangent-projection approach has been considered to maintain head-eye coordination for a 2 DOF head.

Most of the above-mentioned methods propose different techniques to exploit the kinematic redundancy of the head by using two or more controllers. In this work, we present an adaptive gaze controller whose control strategy has been optimised such that optimal joint motions are generated to mimic human behaviour. The gain associated with the proposed controller has been chosen to be adaptive so as to provide optimal and dynamic joint motions. In the remainder of this paper, section II presents the experimental set-up used, along with the head kinematic model. The Jacobian matrix obtained from the kinematic model is used to control the head-eye movements. In order to track unknown target objects, an adaptive tracker has been developed; it is explained in section III along with the developed gaze controller. Finally, both tracking and gaze control experiments are presented in section IV.

II. EXPERIMENTAL SET-UP AND HEAD KINEMATICS

A. Experimental Set-up

The experimental set-up used for this work consists of the bi-manual humanoid platform “Boris”, integrated with a KIT head, as shown in Fig. 1. The robot contains two KUKA LWR manipulator arms, and the head comprises a neck and two eyes (each eye houses a pair of cameras). Both eyes share a common tilt axis (joint 5 of the head) and have independent pan axes (joints 6 and 7 of the head). This imaging system is mounted on a 4 DOFs neck, where 3 joints are responsible for producing the lower-neck pitch-yaw-roll movements and a fourth joint is used for the upper-neck pitch, as shown in Fig. 2(a). Even though each eye contains two cameras, one with a narrow-angle lens and the other with a wide-angle lens, we use only the wide-angle ones in this work.

The imaging system provides colour images of 640 × 480 pixels at a frequency of 60 Hz. The cameras are connected to a remote desktop using a FireWire interface. Both the head control and the vision processing algorithms are programmed in C++ and are executed in parallel threads on a desktop computer. Real-time matrix computations are performed using ViSP [12]. This provides the flexibility to maintain synchronous device operation almost without time lag. The joints of the head are commanded in position, with references computed within the control framework. These angles are transferred to the head control computer over Ethernet.
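To make this software organisation concrete, the following is a minimal sketch of how the vision and head-control loops could run in parallel threads sharing the latest tracked target. The function names (trackTarget, computeJointReferences, sendJointPositionsOverEthernet) are hypothetical placeholders, not the actual ViSP- or driver-level interfaces used on Boris.

```cpp
// Sketch of the two-thread structure: a vision thread updates the latest
// tracked target, a control thread consumes it and sends joint references.
#include <array>
#include <atomic>
#include <mutex>
#include <thread>

struct TargetObservation { double u = 0.0; double v = 0.0; bool valid = false; };

std::mutex g_obsMutex;
TargetObservation g_latestObs;        // shared between the two threads
std::atomic<bool> g_running{true};

void visionThread() {
  while (g_running) {
    // Grab a 640x480 colour image at ~60 Hz and track the target (Sec. III-A).
    TargetObservation obs;
    obs.u = 320.0; obs.v = 240.0; obs.valid = true;  // placeholder for trackTarget(grabFrame())
    std::lock_guard<std::mutex> lock(g_obsMutex);
    g_latestObs = obs;
  }
}

void controlThread() {
  while (g_running) {
    TargetObservation obs;
    { std::lock_guard<std::mutex> lock(g_obsMutex); obs = g_latestObs; }
    if (!obs.valid) continue;
    // Compute the six joint position references (Sec. III-B) and transfer
    // them to the head control computer over Ethernet.
    std::array<double, 6> q_ref{};   // placeholder for computeJointReferences(obs)
    (void)q_ref;                     // placeholder for sendJointPositionsOverEthernet(q_ref)
  }
}

int main() {
  std::thread tv(visionThread), tc(controlThread);
  tv.join();
  tc.join();
  return 0;
}
```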

B. Head Kinematics

Since the two eyes are located parallel to each other and are linked to the common neck, two different kinematic chains are feasible. Such systems are usually termed branched mechanisms, where different kinematic chains can be obtained from the same base.

Two different types of operation are possible with such a configuration. One chain can be dominant and responsible for performing the entire task while the other chains follow its motion. Alternatively, all the available chains are equally prioritised and the overall motion to accomplish a particular task is distributed over the chains based on their DOFs [9]. In this work, we follow the former; since the visual tracking of an object is performed by the dominant eye, the other eye simply adapts to the motion of the main eye. Based on this criterion, we compute the forward kinematics of a single chain (associated with the left eye) and its Jacobian matrix in order to control the head motion.

We define q as the robot configuration, taking values in a 6-dimensional configuration space C. The kinematic model of the head [10] is computed using the Denavit-Hartenberg parameters shown in Table I, yielding the homogeneous matrix T(q). For i = 1 · · · 6, the αi are expressed in radians and the ai and di in meters.

TABLE I: DENAVIT-HARTENBERG PARAMETERS OF THE KIT HEAD.

Joint   ai       αi      di        θi
1       0        −π/2    0         q1
2       0        π/2     0         q2
3       0        π/2     −0.1745   q3
4       0.1      π       0         q4
5       0        π/2     0         q5
6       0.0465   0       0         q6

Forward kinematics T(q) is computed classically by multiplying the contributions i−1Ti(qi) of all six revolute joints present in the dominant kinematic chain:

T(q) = 0T1(q1) 1T2(q2) · · · 5T6(q6)   (1)
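For concreteness, the sketch below shows how T(q) in (1) can be assembled from the D-H parameters of Table I, assuming the standard (distal) D-H convention; Eigen is used for the matrix algebra purely as an illustration (the implementation in the paper relies on ViSP).

```cpp
// Sketch: forward kinematics T(q) of the dominant (left-eye) chain from the
// D-H parameters of Table I, assuming the standard D-H convention.
#include <Eigen/Dense>
#include <array>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

struct DH { double a, alpha, d; };   // theta_i = q_i (all joints are revolute)

// D-H parameters of Table I (a, alpha, d); angles in radians, lengths in metres.
const std::array<DH, 6> kDH = {{{0.0,    -kPi / 2.0, 0.0},
                                {0.0,     kPi / 2.0, 0.0},
                                {0.0,     kPi / 2.0, -0.1745},
                                {0.1,     kPi,       0.0},
                                {0.0,     kPi / 2.0, 0.0},
                                {0.0465,  0.0,       0.0}}};

// Homogeneous transform {i-1}T_i(q_i) of one revolute joint (standard D-H convention).
Eigen::Matrix4d dhTransform(const DH& p, double q) {
  const double ct = std::cos(q), st = std::sin(q);
  const double ca = std::cos(p.alpha), sa = std::sin(p.alpha);
  Eigen::Matrix4d T;
  T << ct, -st * ca,  st * sa, p.a * ct,
       st,  ct * ca, -ct * sa, p.a * st,
       0.0,      sa,       ca,     p.d,
       0.0,     0.0,      0.0,     1.0;
  return T;
}

// Equation (1): T(q) = 0T1(q1) 1T2(q2) ... 5T6(q6).
Eigen::Matrix4d forwardKinematics(const std::array<double, 6>& q) {
  Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
  for (int i = 0; i < 6; ++i) T = T * dhTransform(kDH[i], q[i]);
  return T;
}
```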

Using (1), the geometric Jacobian of the kinematic chain is computed as

J = [ z0 × (pe − p0)   · · ·   z5 × (pe − p5) ]
    [ z0               · · ·   z5             ]  ∈ R^{6×6}   (2)

where zi−1 is the z axis of the rotation matrix in 0Ti−1(q), pe is the end-effector position and pi−1 is given by the first three elements of the fourth column of 0Ti−1(q). The computation of this Jacobian is of key importance in order to link the visual motion with the head motion. In this work, we consider that the camera frame coincides with the end-effector of the kinematic chain of the head.
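A corresponding sketch of the geometric Jacobian in (2), built column by column from the partial transforms 0Ti(q), is given below; again Eigen stands in for ViSP purely for illustration.

```cpp
// Sketch: geometric Jacobian of equation (2) for the 6-revolute-joint chain,
// built from the partial forward-kinematics transforms 0T_0 ... 0T_6.
#include <Eigen/Dense>
#include <array>

using Matrix6d = Eigen::Matrix<double, 6, 6>;

// Ts[i] holds 0T_i(q), with Ts[0] the identity (base frame) and Ts[6] = T(q).
Matrix6d geometricJacobian(const std::array<Eigen::Matrix4d, 7>& Ts) {
  const Eigen::Vector3d p_e = Ts[6].block<3, 1>(0, 3);   // end-effector position
  Matrix6d J;
  for (int i = 0; i < 6; ++i) {
    // z_{i-1}: third column of the rotation part of 0T_{i-1}(q).
    const Eigen::Vector3d z = Ts[i].block<3, 1>(0, 2);
    // p_{i-1}: first three elements of the fourth column of 0T_{i-1}(q).
    const Eigen::Vector3d p = Ts[i].block<3, 1>(0, 3);
    J.block<3, 1>(0, i) = z.cross(p_e - p);   // linear part
    J.block<3, 1>(3, i) = z;                  // angular part
  }
  return J;
}
```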

III. HEAD MOTION CONTROL

The major goal of this work is to automatically control all the available DOF of a humanoid head such that the object of interest (either moving or stationary) is always maintained on the gaze axis. This type of motion can mimic human behaviour and requires, at any point in time, at least two joint motions to maintain the pan and tilt of either the eyes, the head, or both. In order to accomplish this goal, the overall task has been decomposed into two subtasks: tracking a moving object with an undefined trajectory, and synchronous control of the head-eye motion. In fact, both these tasks are inter-connected, i.e. an object cannot be tracked continuously without a robust motion control of the available joints. Accordingly, we detail each of these subtasks below.

A. Adaptive Target Tracking

Generally, target tracking by vision involves obtaining useful features, which are then used to compute the 3D pose information of an object. This process plays a vital role in the gaze control methodology. As mentioned before, we use the dominant eye for tracking, so the images of this eye are used for real-time tracking. The target tracker developed in this work is inspired by our previously developed version [13]. Since we consider tracking a single rigid object, only the primary layer of the existing tracker is used. This simplification has been made for the sake of computational efficiency, bearing in mind the overall goal.

Since the target objects are unknown, no object detection strategy is used in this work. Instead, the developed tracker is manually initialized: a human operator selects the coordinates of a bounding box around the object with simple mouse clicks.

Once the target position is provided, the process starts by obtaining a reference model Hr(ς) of the object. Let {ui}, i = 1 · · · M, be the pixels within the provided reference region R. Hr(ς) is obtained by computing the individual channel histograms ui → h(ui) ∀i ∈ R, where h : R2 → {1, 2, 3, · · · , M} assigns each pixel to one of M bins according to its RGB values. Hr(ς) is then built using the probability of a particular histogram bin ς and is given by

Hr(ς) = { (1/M) Σ_{i∈R} δ[h(ui) − ς] }, ς = 1 · · · M   (3)

Fig. 3. (a) Reference model; (b) equally positioned samples around the reference model; (c) final tracked box.

where δ is the Kronecker delta function. Notice that Hr(ς) is normalized so that Σ_{ς=1}^{M} Hr(ς) = 1.
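As an illustration of (3), the sketch below builds a normalised colour histogram from the pixels of the selected region. The exact bin layout is not specified in the paper, so a joint RGB quantisation with B levels per channel is assumed here; the normalisation divides by the number of pixels so that the bins sum to one.

```cpp
// Sketch of the reference colour model Hr of equation (3); the bin layout
// (B levels per channel, M = B^3 bins) is an illustrative assumption.
#include <cstdint>
#include <vector>

struct PixelRGB { std::uint8_t r, g, b; };

constexpr int B = 8;              // quantisation levels per channel (assumption)
constexpr int M = B * B * B;      // total number of bins

// h(u): map a pixel to its histogram bin index.
int binIndex(const PixelRGB& u) {
  const int rq = u.r * B / 256, gq = u.g * B / 256, bq = u.b * B / 256;
  return (rq * B + gq) * B + bq;
}

// Equation (3): normalised histogram of the pixels inside the reference region R.
std::vector<double> referenceModel(const std::vector<PixelRGB>& regionPixels) {
  std::vector<double> Hr(M, 0.0);
  if (regionPixels.empty()) return Hr;
  for (const PixelRGB& u : regionPixels) Hr[binIndex(u)] += 1.0;
  const double n = static_cast<double>(regionPixels.size());
  for (double& bin : Hr) bin /= n;   // so that the bins sum to 1
  return Hr;
}
```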

After extracting the reference model, we equally allocate samples (bounding boxes) around the reference sample. This step provides a set of observed features from all the allocated samples; in turn, individual models {Ho(j)}, j = 1 · · · N, are computed for each observed sample j. Fig. 3 illustrates the reference sample and the equally allocated samples.

Next, in order to find an optimum sample, the Bhattacharyya coefficient ρ is used to measure the similarity score between the reference model Hr and every observed model Ho(j). It is given by

ρ(Hr, Ho(j)) = Σ_{ς=1}^{M} √( Hr(ς) Ho(j)(ς) )   (4)

Even though the optimum sample can be selected by finding the maximum of ρ, in this work we use Gaussian weights to improve the performance of the tracker. These weights ωj are given by

ωj = exp( ρ(Hr, Ho(j)) ) · κ   (5)

where κ > 0 is a constant. Note that the ωj must also fulfil the normalization condition Σ_{j=1}^{N} ωj = 1.

The final estimate of the tracker, Ψ, is obtained by applying the expectation operator over all samples,

Ψ = Σ_{j=1}^{N} ωj Uj   (6)

where Uj represents the image coordinates of sample j. In turn, the obtained Ψ acts as the reference bounding box for the next frame. In order to reduce the effect of noise, a low-pass filter is used to tune Ψ:

Ψk = α Ψk + (1 − α) Ψk−1   (7)

where k and k − 1 denote the current and previous values, respectively, and α ∈ R with α ≤ 1.
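The following sketch summarises the sample-evaluation step of (4)-(7): the Bhattacharyya similarity of each candidate histogram to the reference, exponential weights normalised to sum to one (the constant κ of (5) is folded into this normalisation), the weighted average of the candidate boxes, and the low-pass filter. The Box structure is an illustrative assumption.

```cpp
// Sketch of the tracker update, equations (4)-(7).
#include <cmath>
#include <vector>

struct Box { double u, v, w, h; };   // candidate bounding box (image coordinates)

// Equation (4): Bhattacharyya coefficient between two normalised histograms.
double bhattacharyya(const std::vector<double>& Hr, const std::vector<double>& Ho) {
  double rho = 0.0;
  for (std::size_t s = 0; s < Hr.size(); ++s) rho += std::sqrt(Hr[s] * Ho[s]);
  return rho;
}

// Equations (5)-(7): weight the N candidate samples, take their expectation
// and low-pass filter it against the previous estimate (0 < alpha <= 1).
Box updateEstimate(const std::vector<double>& Hr,
                   const std::vector<std::vector<double>>& Ho,  // N observed models
                   const std::vector<Box>& samples,             // N candidate boxes
                   const Box& previous, double alpha) {
  const std::size_t N = samples.size();
  std::vector<double> w(N);
  double wsum = 0.0;
  for (std::size_t j = 0; j < N; ++j) { w[j] = std::exp(bhattacharyya(Hr, Ho[j])); wsum += w[j]; }

  Box psi{0.0, 0.0, 0.0, 0.0};                    // equation (6): weighted expectation
  for (std::size_t j = 0; j < N; ++j) {
    const double wj = w[j] / wsum;                // normalised weight, equation (5)
    psi.u += wj * samples[j].u;  psi.v += wj * samples[j].v;
    psi.w += wj * samples[j].w;  psi.h += wj * samples[j].h;
  }

  // Equation (7): low-pass filter between the current and previous estimates.
  Box out;
  out.u = alpha * psi.u + (1.0 - alpha) * previous.u;
  out.v = alpha * psi.v + (1.0 - alpha) * previous.v;
  out.w = alpha * psi.w + (1.0 - alpha) * previous.w;
  out.h = alpha * psi.h + (1.0 - alpha) * previous.h;
  return out;
}
```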

B. Automatic Gaze Controller

The control problem is to keep the object always at the centre of the camera image (i.e. on the optical axis) by moving the head joints. Hence, the image centre point can be treated as the reference feature s*. The centroid of the bounding box obtained from the previous step is considered as the current feature s(t). The overall goal is to regulate the error e = s − s*. Considering the robot configuration, the visual features can be written as

s(t) = s(q(t))   (8)

Using (8), the relation between the robot joint velocities and the task velocities is given by

ṡ = (∂s/∂q) q̇ = Js q̇   (9)

where ṡ is the temporal variation of the visual features, q̇ denotes the head joint velocities and Js is the task-space Jacobian, given by [14]

Js = Ls cVe eVb   (10)

where Ls is a 2 × 6 Jacobian obtained from the current features, cVe is a 6 × 6 velocity twist matrix built using the homogeneous transformation from the camera frame to the end-effector frame, and eVb is the robot joint-space Jacobian expressed in the end-effector frame, computed using (2). An important consideration is that the end-effector frame coincides with the used camera frame, so that the head can gaze at an object at any point in time.
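A sketch of how the task Jacobian in (10) can be assembled is given below. Ls is assumed to be the classical point-feature interaction matrix of [1] evaluated at the normalised image point (x, y) with the constrained depth Z, and cVe is the standard 6 × 6 velocity twist matrix; since the camera and end-effector frames are taken to coincide, cVe reduces to the identity in practice. Eigen again replaces ViSP purely for illustration.

```cpp
// Sketch of the task-space Jacobian of equation (10), Js = Ls * cVe * eVb.
#include <Eigen/Dense>

using Matrix26d = Eigen::Matrix<double, 2, 6>;
using Matrix6d  = Eigen::Matrix<double, 6, 6>;

// Classical 2x6 interaction matrix of a normalised image point (x, y) at depth Z.
Matrix26d interactionMatrix(double x, double y, double Z) {
  Matrix26d L;
  L << -1.0 / Z, 0.0,       x / Z, x * y,        -(1.0 + x * x),  y,
        0.0,     -1.0 / Z,  y / Z, 1.0 + y * y,  -x * y,         -x;
  return L;
}

// 6x6 velocity twist matrix of the homogeneous transform cTe = [R | t].
Matrix6d velocityTwist(const Eigen::Matrix3d& R, const Eigen::Vector3d& t) {
  Eigen::Matrix3d skew;
  skew <<    0.0, -t.z(),  t.y(),
           t.z(),    0.0, -t.x(),
          -t.y(),  t.x(),    0.0;
  Matrix6d V = Matrix6d::Zero();
  V.block<3, 3>(0, 0) = R;
  V.block<3, 3>(0, 3) = skew * R;
  V.block<3, 3>(3, 3) = R;
  return V;
}

// Equation (10). With the camera frame coinciding with the end-effector frame,
// R = I and t = 0, so cVe is the identity.
Matrix26d taskJacobian(const Matrix26d& Ls, const Matrix6d& cVe, const Matrix6d& eJ) {
  return Ls * cVe * eJ;
}
```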

Using (9), a basic visual servoing control law of the form q̇h = −λ Js† e can be used, where λ is a positive constant gain, q̇h are the joint velocities and Js† is the pseudo-inverse of Js. However, to improve the convergence and to achieve a human-like behaviour, in this work we define a control law inspired by the Levenberg-Marquardt optimization criterion [15]. It is given by

q̇h = λa ( Js^T Js + µ · diag(Js^T Js) )^{−1} Js^T (s − s*)   (11)

where λa < 0 and µ > 0 are gains. Treating Js^T Js as the Hessian matrix H and using e, the simplified final control law is then given by

q̇h = λa ( H + µ · diag(H) )^{−1} Js^T e   (12)

In order to assist the convergence, in this work we consider λa to be an adaptive gain of the form

λa = β e^{−γ i} + ε   (13)

where β, γ, ε are positive values and i is the iteration count. An important point to note is that, since the depth Z is not computed from the image space, it has been constrained. Due to this, even though the head is controlled using all the available joints, it cannot move backwards to maintain a constant distance from the object.
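A sketch of the resulting control update (12)-(13) is given below. Note the sign convention: with e = s − s*, the overall gain must be negative for the error to decrease, so the adaptive magnitude of (13) is applied with a negative sign here; β, γ, ε and µ are placeholder values, and Eigen replaces ViSP purely for illustration.

```cpp
// Sketch of the Levenberg-Marquardt-style gaze control law, equations (11)-(13).
#include <Eigen/Dense>
#include <cmath>

using Matrix26d = Eigen::Matrix<double, 2, 6>;
using Matrix6d  = Eigen::Matrix<double, 6, 6>;
using Vector6d  = Eigen::Matrix<double, 6, 1>;

// Equation (13): adaptive gain magnitude decaying with the iteration count i.
double adaptiveGain(int i, double beta = 1.5, double gamma = 0.05, double eps = 0.1) {
  return beta * std::exp(-gamma * static_cast<double>(i)) + eps;
}

// Equations (11)-(12): damped (Levenberg-Marquardt) joint-velocity update.
Vector6d gazeControlLaw(const Matrix26d& Js, const Eigen::Vector2d& e,
                        int iteration, double mu = 0.01) {
  const Matrix6d H = Js.transpose() * Js;                 // pseudo-Hessian Js^T Js
  const Matrix6d Ddiag = H.diagonal().asDiagonal();       // diag(H)
  const Matrix6d damped = H + mu * Ddiag;                 // H + mu * diag(H)
  const double lambda_a = -adaptiveGain(iteration);       // negative gain: e = s - s*
  // Solve (H + mu*diag(H)) qdot = Js^T e, then scale by the adaptive gain.
  return lambda_a * damped.ldlt().solve(Js.transpose() * e);
}
```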

IV. REAL-TIME VALIDATIONS

This section presents the gaze control experiments conducted using the system. We follow a top-down approach by demonstrating the capabilities of the individual modules, followed by the final gaze control using the presented experimental set-up. The experiments start with vergence control to maintain synchronous motion of both eyes.

A. Vergence Control for Eye Alignment

As mentioned before, we rely on the dominant chain; the other chain (with the other eye as end-effector) has to follow its movements, which ensures that both eyes look at the same object. To this end, we use our previously developed Fourier-based direct visual servoing method [16], where the control is generated based on the observed disparity between the two eye images. Fig. 5(a) shows the image from the dominant eye (in our case, the left eye), and Fig. 5(b) and 5(c) show the observed and corrected views, respectively. From the results, it can be seen that both eyes are aligned with minimal error.

B. Visual Tracking

As an initial step towards head gaze control, we evaluate the developed tracker by conducting tests with various objects, irrespective of size, shape, colour, environment and trajectory. The process starts with manual initialization, i.e. a human operator provides the initial bounding box coordinates. The first row of Fig. 4 shows the used objects along with their initialized regions. Each object has then been moved manually to analyse the performance of the tracker. The second and third rows of Fig. 4 show the tracked object at times t = 10 s and t = 50 s, respectively. These results clearly demonstrate the efficiency of the developed tracker and its suitability for gaze control.

Fig. 5. (a) Image from the dominant eye (the red dot marks the image centre); (b) image from the right eye before correction; (c) after vergence control.

Fig. 4. Images from the head dominant eye view. (Top row) Provided initial bounding box for an object; (middle row) tracked object after 10 seconds; (bottom row) tracked object after 50 seconds. The test shown in the fourth column was performed under the lowest possible lighting conditions, where the used object can easily be confused with the background.

C. Head Gaze Control

Once the reference and initial visual features are extracted as above, the head starts gazing at the object. During this process, the joint angles are read and updated continuously in parallel for every iteration of the control. A single iteration spans approximately 37 ms, including both image processing and control generation.

Here, we first validate the developed control strategy by comparing it with a gaze controller based on the pseudo-inverse of the task-space Jacobian, similar to the one used in [10].

1) Initial validations: The overall objective is to perform the initial saccadic movement by controlling both eye and head motions. For the sake of comparison, we use a constant gain value of λ = 20 for both the proposed and the compared controllers, and constrain the Z-axis motion (in the camera frame) to 1. Both controllers have been provided with the same initial bounding box of the object, which acts as a vital cue, and have been analysed for the initial convergence. Fig. 6 shows the head joint variations with the compared and proposed controllers. From the obtained results, it can be seen that the proposed gaze controller provides fast and smooth movements during the initial gazing. It can also be noticed that, due to the use of optimization, the joints with higher motion capabilities move faster than those with lower ones. For example, from Fig. 6 (bottom), the maximum head movement has been compensated by neck and eye movements. Besides, as the rigid target object is fixed within the task space, only motion in the horizontal plane is required, i.e. using the neck yaw and eye tilt/pan in order to gaze at the object. This type of behaviour is commonly observed in humans, and the proposed optimised gaze controller performed in a similar fashion.

Fig. 6. Joint velocities [m/s] of the dominant-chain joints (Neck Pitch q1, Neck Roll q2, Neck Yaw q3, Head Tilt q4, Eyes Tilt q5, Left Eye Pan q6) versus iterations: (top) control law based on the pseudo-inverse of the Jacobian; (bottom) proposed controller.

2) Gazing at a manually moved target: This test is performed to gaze at a target object moved by a human operator. For this case and for the next experiment, the adaptive gain given by (13) is used. In addition, the scene contains multiple objects. Fig. 7 shows dominant eye-view images of the head at different times, and Fig. 8 illustrates the pan (along X) and tilt (along Y) angle errors of the head. Fig. 9 shows the evolution of the joint angles and velocities during the task. The operator starts to move the object at t = 12.1 s, from which point the head starts gazing at the moving object. First of all, the scene complexity does not affect the tracker; furthermore, all the joint angles contribute to minimising the pan and tilt errors. This clearly points to the use of the head redundancy in maintaining a constant gaze.

Fig. 7. Images from the head dominant eye view during the gazing task. The pink box marks the reference and the yellow box the observed bounding box. Images are at (a) t = 1.5 s, (b) t = 14.8 s, (c) t = 21.3 s and (d) t = 34 s, respectively. Note that the images are cropped for illustration.

Fig. 8. Tracked target image position (X and Y pixels) and head pan and tilt angle errors [°] versus iterations (1 iteration = 37 ms): (top) control law based on the pseudo-inverse of the Jacobian; (bottom) proposed controller.

3) Gazing at the humanoid arm: The final test has been performed to gaze at the end-effector of the robot arm. For this purpose, a KUKA LWR arm (acting as the humanoid hand) has been moved manually (in gravity-compensation mode) within the task space. Fig. 10 shows some of the eye-view images, and the corresponding errors are reported in Fig. 11. The obtained results clearly demonstrate the efficiency of the proposed gaze controller.

4) Supplementary video description: The video in attachment shows how the head follows the moving object. We report three views of the scene: 1. a general view of the experimental setup, in which the whole robot is visible; 2. a view of the head, to show its motion; and 3. the view from the eye camera, with the purpose of showing the images used by the tracking and visual servoing components of the framework.

Fig. 9. Variation of the six dominant-chain joint (a) angles [°] and (b) velocities during the gazing task (1 iteration = 37 ms).

V. CONCLUSION

This paper presented an automatic gaze control approach based on visual servoing. A tracker was developed to track unknown objects selected in an initialisation phase. A visual servoing framework based on the image and robot kinematic Jacobian matrices then controls the head in order to follow the selected object, i.e. to keep it at the centre of the camera image. Experimental results were reported to show how the head follows the tracked object effectively with the proposed method.

There are a number of directions in which to improve this work, and we intend to refine several aspects of this framework. Future work will include an improvement of the tracker, which is in fact a simplification of our previous tracker in [13]. A more robust tracker will most likely impact positively on the performance of the gaze control framework.

Fig. 10. Images from the head dominant eye view during the gazing task at the arm end-effector. The pink box marks the reference and the yellow box the observed bounding box. Images are at (a) t = 0.2 s, (b) t = 1.4 s, (c) t = 3.4 s and (d) t = 5.6 s, respectively.

Fig. 11. Head pan and tilt angle errors [°] versus iterations (1 iteration = 37 ms): (top) control law based on the pseudo-inverse of the Jacobian; (bottom) proposed controller.

From a different perspective, we are interested in using all the capabilities of our robotic platform Boris. In particular, we would like to extend our previous work [17] by integrating the gaze control. Prospective applications include, but are not limited to, contact detection and estimation through vision, in which case gaze control would be crucial to track the end-effector of the robot, as in one of the experiments shown in section IV.

ACKNOWLEDGEMENTS

This work was supported by the . . .

REFERENCES

[1] F. Chaumette and S. Hutchinson, “Visual servo control. I. Basic approaches,” IEEE Robotics & Automation Magazine, vol. 13, no. 4, pp. 82–90, 2006.

[2] D. Purves, G. J. Augustine, D. Fitzpatrick, L. C. Katz, A.-S. LaMantia, J. O. McNamara, S. M. Williams et al., Types of Eye Movements and Their Functions. Sunderland (MA): Sinauer Associates, 2001. Available from: http://www.ncbi.nlm.nih.gov/books/NBK10991/.

[3] S. A. Moubayed, G. Skantze, and J. Beskow, “The Furhat back-projected humanoid head – lip reading, gaze and multi-party interaction,” International Journal of Humanoid Robotics, vol. 10, no. 01, p. 1350005, 2013.

[4] T. Asfour, K. Welke, P. Azad, A. Ude, and R. Dillmann, “The Karlsruhe humanoid head,” in IEEE-RAS International Conference on Humanoid Robots. IEEE, 2008, pp. 447–453.

[5] L. Visser, R. Carloni, and S. Stramigioli, “Design and control of the Twente humanoid head,” in Proceedings of the 2nd Workshop on Human Friendly Robotics, Genova, Italy, December 2009, http://hfr2009.wordpress.com/. [Online]. Available: http://doc.utwente.nl/71681/

[6] L. Gu and J. Su, “Gaze control on humanoid robot head,” in The Sixth World Congress on Intelligent Control and Automation, vol. 2. IEEE, 2006, pp. 9144–9148.

[7] G. Metta, A. Gasteratos, and G. Sandini, “Learning to track colored objects with log-polar vision,” Mechatronics, vol. 14, no. 9, pp. 989–1006, 2004.

[8] F. Faber, M. Bennewitz, and S. Behnke, “Controlling the gaze direction of a humanoid robot with redundant joints,” in IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 2008, pp. 413–418.

[9] D. Omrcen and A. Ude, “Redundant control of a humanoid robot head with foveated vision for object tracking,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2010, pp. 4151–4156.

[10] G. Milighetti, L. Vallone, and A. De Luca, “Adaptive predictive gaze control of a redundant humanoid robot head,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2011, pp. 3192–3198.

[11] Z. Zhang, A. Beck, and N. Magnenat-Thalmann, “Human-like behavior generation based on head-arms model for robot tracking external targets and body parts,” 2014.

[12] E. Marchand, F. Spindler, and F. Chaumette, “ViSP for visual servoing: a generic software platform with a wide class of robot control skills,” IEEE Robotics & Automation Magazine, vol. 12, no. 4, pp. 40–52, 2005.

[13] J. Xiao, R. Stolkin, and A. Leonardis, “Single target tracking using adaptive clustered decision trees and dynamic multi-level appearance models,” June, in press.

[14] F. Chaumette, “Visual servoing,” in Robot Manipulators: Modeling, Performance Analysis and Control, E. Dombre and W. Khalil, Eds. ISTE, 2007, ch. 6, pp. 279–336.

[15] H. Gavin, “The Levenberg-Marquardt method for nonlinear least squares curve-fitting problems,” Department of Civil and Environmental Engineering, Duke University, pp. 1–15, 2011.

[16] N. Marturi, B. Tamadazte, S. Dembele, and N. Piat, “Visual servoing schemes for automatic nanopositioning under scanning electron microscope,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 981–986.

[17] V. Ortenzi, M. Adjigble, J. Kuo, R. Stolkin, M. Mistry et al., “An experimental study of robot control during environmental contacts based on projected operational space dynamics,” in IEEE-RAS International Conference on Humanoid Robots. IEEE, 2014, pp. 407–412.