
*Correspondence to: Allen Tannenbaum, Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30082, U.S.A.

§E-mail: allen.tannenbaum@ece.gatech.edu

†This paper is dedicated to the memory of my dear friend and mentor George Zames.

Contract/grant sponsor: National Science Foundation; contract/grant number: ECS-9700588. Contract/grant sponsor: NSF-LIS. Contract/grant sponsor: Air Force Office of Scientific Research; contract/grant number: AF/F49620-98-1-0168. Contract/grant sponsor: Army Research Office; contract/grant number: DAAG55-98-1-0169. Contract/grant sponsor: MURI.

Copyright © 2000 John Wiley & Sons, Ltd.

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL
Int. J. Robust Nonlinear Control 2000; 10:875–888

On the eye tracking problem: a challenge for robust control†

Allen Tannenbaum*,§

Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30082, U.S.A.

SUMMARY

Eye tracking is one of the key problems in controlled active vision. Because of modelling uncertainty and noise in the signals, it becomes a challenging problem for robust control. In this paper, we outline some of the key issues involved as well as some possible solutions. We will need to make contact with techniques from machine vision and multi-scale image processing in carrying out this task. In particular, we will sketch some of the necessary methods from computer vision and image processing, including optical flow, active contours ('snakes'), and geometry-driven flows. The paper will thus have a tutorial flavor as well. Copyright © 2000 John Wiley & Sons, Ltd.

KEY WORDS: visual tracking; robust control; eye movements; active contours; optical flow; multiscale image processing

1. INTRODUCTION

In this paper, we consider the controlled active vision problem of tracking eye movements. The solution of this problem requires the integration of techniques from control theory, signal processing and computer vision. We will show that it is possible to treat this problem by using adaptive and robust control in conjunction with multiscale methods from signal processing, and shape recognition theory from computer vision. Tracking is a basic control problem in which we want the output to follow or track a reference signal, or equivalently we want to make the tracking error as small as possible relative to some well-defined criterion (say energy, power, peak value, etc.). Even though tracking in the presence of a disturbance is a classical control issue, the problem at hand is very difficult and challenging because of the highly uncertain nature of the disturbance.

One can consider the eye movement tracking problem in the context of a man–computer interface. In particular, we would like to infer from the movement of the eyes of a computer user,


which actions the user intends to accomplish, e.g., opening a folder on a computer screen, expanding a pull-down menu item, etc. This type of computer tracking system should serve as a concrete illustration of what is involved in our eye tracking methodology.

It is important to emphasize that while considerable research has been performed in the area of measuring eye movements in a controlled laboratory environment, much less research has been done on using this information in applications. Hence the techniques which we will discuss below should have a wide range of applicability in a number of tracking problems, including those in robotics, remotely controlled vehicles, and the pilot tracking helmets currently being developed. The latter are systems combining helmet-mounted head and eye tracking capability to define a subject's true line of sight. Clearly, the proper exploitation of the dynamic characteristics of the human visual system by tracking the position of the viewer's eyes leads to a drastic reduction in the amount of information that needs to be transmitted. Discussions of visual tracking and eye movements (together with an extensive list of references) may be found in References [1–4].

We should note that the problem of visual tracking differs from standard tracking problems in that the feedback signal is measured using imaging sensors. In particular, it has to be extracted via computer vision algorithms and interpreted by a reasoning algorithm before being used in the control loop. Furthermore, the response speed is a critical aspect. For obvious reasons, an eye tracking system should be as non-invasive as possible. The eye movement is tracked by studying images acquired by grey-scale or infra-red cameras or by an array of sensors. The images are analysed to extract the relative motions of the eyes and the head. The low-level data acquisition and recognition is accomplished via a multiscale technique. This technique offers a number of advantages. First, as is becoming clear from research in signal processing and medical image acquisition procedures, it leads to fast real-time signal acquisition and understanding procedures. Recognition will be accomplished using a multiscale approach and a new computational theory of shape.

Consequently, from the control point of view, we have a tracking problem in the presence of a highly uncertain disturbance which we want to attenuate. Note that the uncertainty is due to the sensor noise (classical), the algorithmic component described above (uncertainty in extracted features, likelihood of various hypotheses, etc.), and modelling uncertainty.

We should note that this paper will have a tutorial flavour. Indeed, one of our key motivations is to introduce some key concepts from active vision to the robust control community. We strongly believe that both communities will derive much benefit from studying each other's methods and problems. In particular, for visual tracking and for the specific problem of eye movement tracking, the techniques of robust control which explicitly treat uncertainty can be relevant, as we will argue below.

2. VISUAL TRACKING

There have been a number of papers on the use of vision in tracking, especially in the robotics community. (See e.g., References [1, 2, 5–9], and the references therein.) Typically, the issue addressed in this work is the use of vision for servoing a manipulator for tracking (or an equivalent problem). The motivation for using vision in such a framework is clear: the combination of computer vision coupled with control can be employed to improve the measurements. Indeed, because of improvements in image-processing techniques and hardware, robotic


technology is reaching the point where vision information may become an integral part of the feedback signal. For problems with little uncertainty, simple PID controllers have been used, and for more noisy systems, adaptive schemes as well as stochastic-based LQG controllers have been utilized.

A number of control schemes have been proposed for the utilization of visual information in control loops. These have ranged from the use of sensory features to characterize hierarchical control structures, Fourier descriptors, and image segmentation; see References [9–11], and the references therein. A promising approach based on optical flow has been used as a key element in the calculation of the robot's driving signals (see in particular, References [7, 8], and our discussion below). Indeed, since an object in an image is made up of brightness patterns, as the object moves in space so do the brightness patterns. The optical flow is then the apparent motion of the brightness patterns; see Reference [12]. Under standard assumptions, given a moving target and a static camera (or equivalently, a static target and a moving camera), one can write down equations for the velocity of a projected point of the object onto the image plane. Several methods have been discussed for the computation of this velocity. This information can then be used to track the image.

Let us consider the concrete problem of tracking targets on a computer screen. Then we have only a two-dimensional tracking question. We assume for simplicity that the object moves in a plane which is perpendicular to the optical axis of the camera. If the camera then moves with translational velocity

$$q := \begin{bmatrix} q_x \\ q_y \end{bmatrix}$$

and rotational velocity $\omega_z$ (with respect to the camera frame), one may pose the two-dimensional tracking problem as follows for an object [13]. Let $P_t$ denote the area on the image plane which is the projection of the target. Then visually tracking this feature amounts to finding the camera translation $q$ and rotation $\omega_z$ (with respect to the camera frame) which keeps $P_t$ stationary. There are similar characterizations for the tracking of features. Now from this set-up, one can write down linearized equations of the optical flow generated by the motion of the camera, where the control variables are given by those components of the optical flow induced by the camera's tracking motion.

The exact form may be found in References [13, 7], and need not concern us now. The point is that the resulting system may be written in standard state-space form and after discretization (with $T$ the time between two consecutive frames) takes on the form

$$x(n+1) = x(n) + T u_r(n) + T d(n) + v(n)$$
$$z(n) = x(n) + w(n)$$

where $u_r$ is the reference, $d$ is the exogenous disturbance, $v$ is a 'noise' term for the model uncertainty, and $z$ is the measurement with noise component $w$. (All the vectors are in $\mathbb{R}^3$. The components of the state vector are made up of the $x$, $y$, and roll components of the tracking error.)
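To make this set-up concrete, the discretized error model can be simulated in closed loop with a simple proportional tracking law. The sketch below only illustrates the model's structure: the frame rate, gain, disturbance, and noise levels are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1.0 / 30.0   # frame period (assumed 30 frames/s)
k = 0.5          # proportional gain (hypothetical)
N = 300          # number of frames

x = np.zeros(3)  # tracking error: x, y, and roll components
errors = []
for n in range(N):
    # slowly varying exogenous disturbance d(n) (hypothetical)
    d = 0.2 * np.array([np.sin(0.5 * n * T), np.cos(0.3 * n * T), 0.0])
    v = rng.normal(0.0, 1e-3, 3)   # model-uncertainty 'noise' v(n)
    w = rng.normal(0.0, 1e-2, 3)   # measurement noise w(n)
    z = x + w                      # z(n) = x(n) + w(n)
    u_r = -(k / T) * z             # proportional command from the measurement
    x = x + T * u_r + T * d + v    # x(n+1) = x(n) + T u_r(n) + T d(n) + v(n)
    errors.append(np.linalg.norm(x))

print(f"mean |tracking error| over the last 100 frames: {np.mean(errors[-100:]):.4f}")
```

Even this naive law attenuates the disturbance somewhat; the point made below is that $v$ and $w$ are not white, which is where robust design enters.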

There are a number of important control issues related to such a set-up. Of course, one has the problem of measurement delays (we want to work in real time) and the choice of sampling time. But we feel there is a much deeper and more difficult problem which must be addressed before a reasonable choice of control strategy can be made. Namely, in general the uncertainty ($v$ and $w$)


is modelled as white noise. This model is conservative and does not take into account any of the possible structure of the noise environment. One of the key contributions in modern robust control has been the consideration of structure in uncertainty (see Reference [14] and the references therein).

In our case, we are proposing a much deeper analysis of the uncertainty connected to such problems. This brings in the key element of signal processing and, in particular, the new powerful methods of multiscale computation. Shape recognition theory in computer vision based on Hamilton–Jacobi theory will also play a key role in this program, as will be argued below. We will in particular discuss ways of more explicitly processing the eye movement information and employing it in a feedback loop.

3. EYE MOVEMENT INFORMATION

While many studies have been performed on the measurement of eye movements, little research has been done on employing this information in applications. Some of the invasive eye tracking techniques measure potential differences between the cornea and retina, while others use contact lenses that fit precisely over the cornea. Non-invasive techniques identify features located on an image of the eye, such as the boundary of the iris or pupil or the corneal reflection of a light shone at the eye, and infer the movement of the eye from that of these features. Since the movement of these features could also be due to a movement of the head, it is important to track several features and extract the differential motion of these features. For example, active near-infrared eye tracking systems use the relative motion of the light reflected off the corneal surface and that reflected off the retina to follow the user's eye-gaze. A good survey of eye-tracking techniques can be found in References [3, 15]. See also the manual [16].

Much of the work on using eye movement information has focused on communications with handicapped users, e.g., References [17, 18]. Other works that emphasized applications similar to the one that we are considering include [19–23]. These pioneering research efforts addressed the use of eye movement in man–computer communications and demonstrated the feasibility of the principle. They also identified the problems and limitations of this approach to user–computer communications that must be solved in order to make it a practical reality that can compete with and complement other man–machine interface tools such as the mouse, pen and tablet, joystick and keyboard.

The rationale behind eye tracking approaches to user–machine communication is that an object is seen by a human subject if it is imaged on a small area of the retina called the fovea. Hence, a user's eye position provides information about the area of the screen that he is focusing on. The information is of course accurate to within the angular width of the fovea (about one degree). This degree of accuracy has been found to be satisfactory for man–computer interaction by previous researchers. (Finer accuracy may be needed for the study of the eye muscles, but this is not our concern here.)

Biological studies indicate that the most common eye movements fall into two categories: saccade and fixation. (See References [3, 24, 4] and the references therein for discussions of saccade and fixation and their relation to biological vision. These works also contain extensive results on the general problem of eye tracking.) A saccade is a sudden movement of the eye from one area of interest in the scene to another. A saccade is usually followed by a fixation interval of


200 to 600 ms during which the eye still makes small jittery movements covering less than one degree. Other types of eye movements may be neglected in man–machine communication.

The basic problem in eye-tracking-based user–machine communication then is that of identifying fixation periods (i.e. periods during which the user feels that he is visually fixating an object), and locating the object of fixation in the presence of the jittery eye motions due to the muscles. Note that humans do ignore these jittery eye motions during fixation, i.e. when we fixate an object we are under the impression that we are looking at a single spot in the scene. Note also that a pure filtering approach to eliminate the eye jitter is not satisfactory, since it will also slow the response to a true saccade.
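One simple way to see the distinction is a velocity-threshold classifier: samples whose sample-to-sample angular velocity stays below a threshold are grouped into fixations, while fast excursions are flagged as saccades. The sketch below is a minimal illustration of this idea, not the method of the paper; the sampling rate, threshold, and synthetic gaze trace are all hypothetical.

```python
import numpy as np

def segment_fixations(gaze_deg, fs=60.0, vel_thresh=30.0):
    """Label each gaze sample True (fixation) or False (saccade) by
    thresholding the sample-to-sample angular velocity in deg/s."""
    gaze = np.asarray(gaze_deg, dtype=float)
    vel = np.linalg.norm(np.diff(gaze, axis=0), axis=1) * fs
    return np.concatenate([[True], vel < vel_thresh])

# synthetic trace: a fixation with small jitter, a fast saccade, a second fixation
rng = np.random.default_rng(1)
fix1 = np.array([5.0, 5.0]) + 0.05 * rng.standard_normal((30, 2))
sacc = np.linspace([5.0, 5.0], [15.0, 8.0], 5)
fix2 = np.array([15.0, 8.0]) + 0.05 * rng.standard_normal((30, 2))
trace = np.vstack([fix1, sacc, fix2])

labels = segment_fixations(trace)
print("samples flagged as saccade:", int((~labels).sum()))
```

Note how a low-pass filter would have to blur the saccade itself to remove the jitter, whereas thresholding keeps the jump crisp; this is the trade-off the paragraph above points out.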

4. DATA ACQUISITION

There are two basic approaches to data acquisition: an active approach similar to the one used by current eye tracking systems, and a passive imaging technique.

In the active approach, a harmless near-infrared light is used to illuminate the user's face. Two reflections from the eyes are then extracted. The first reflected signal is due to the corneal surface and is called the glint. The second reflection occurs off the retina and is called the bright eye component. To minimize the effect of background radiation, current systems typically require dim lighting. In this paper, we will concentrate on the passive approach.

4.1. Image feature tracking approach

The passive approach to data acquisition is again based on extracting the two reflections from an infra-red image of the scene. In this sense the approach is similar to the active system. The passive system that we consider simply consists of a grey-scale camera. The camera is covered with light filters to attenuate diffuse reflections due to background illumination and enhance the pupil.

4.1.1. Contour map. In both cases, our immediate objective is to either extract the location of the pupil and a feature of the face such as the nose, or extract the glint and bright eye reflections from the image. We accomplish this by first producing an edge image using a snakes-based approach. In this section, we will describe a new paradigm for snakes or active contours based on principles from curvature-driven flows which, because of its speed and accuracy, seems ideally suited for edge extraction in this context.

Active contours may be regarded as autonomous processes which employ image coherence in order to track various features of interest over time. Such deformable contours have the ability to conform to various object shapes and motions. Snakes have been utilized for segmentation, edge detection, shape modelling, and visual tracking. See References [1, 2, 25] and the references therein. Active contours have also been widely applied in medical imaging. For example, snakes have been employed for the segmentation of myocardial heart boundaries as a prerequisite from which vital information such as the ejection-fraction ratio, heart output, and ventricular volume ratio can be computed.

In the classical theory of snakes, one considers energy minimization methods where controlled continuity splines are allowed to move under the influence of external image-dependent forces,


internal forces, and certain constraints set by the user. As is well known, there may be a number of problems associated with this approach, such as initialization, the existence of multiple minima, and the selection of the elasticity parameters. Moreover, natural criteria for the splitting and merging of contours (or for the treatment of multiple contours) are not readily available in this framework.

In Reference [27], we propose a novel deformable contour model to successfully solve such problems, and which will become one of our key techniques for tracking. Our method is based on the Euclidean curve shortening evolution (see our discussion below), which defines the gradient direction in which a given curve is shrinking as fast as possible relative to Euclidean arc-length, and on the theory of conformal metrics. We multiply the Euclidean arc-length by a conformal factor defined by the features of interest which we want to extract, and then we compute the corresponding gradient evolution equations. The features which we want to capture therefore lie at the bottom of a potential well to which the initial contour will flow. Moreover, our model may be easily extended to extract 3D contours based on motion by mean curvature [26].

Let us briefly review some of the details from Reference [26]. First of all, in References [27, 28], a snake model based on the level set formulation of the Euclidean curve shortening equation is proposed. More precisely, the model is

$$\frac{\partial \Phi}{\partial t} = \phi(x, y)\, \|\nabla \Phi\| \left( \operatorname{div}\!\left( \frac{\nabla \Phi}{\|\nabla \Phi\|} \right) + \nu \right) \qquad (1)$$

Here the function $\phi(x, y)$ depends on the given image and is used as a 'stopping term'. For example, the term $\phi(x, y)$ may be chosen to be small near an edge, and so acts to stop the evolution when the contour gets close to an edge. One may take [27, 28]

$$\phi := \frac{1}{1 + \|\nabla G_\sigma * I\|^2} \qquad (2)$$

where $I$ is the (grey-scale) image and $G_\sigma$ is a Gaussian (smoothing) filter. The function $\Phi(x, y, t)$ evolves in (1) according to the associated level set flow for planar curve evolution in the normal direction with speed a function of curvature, which was introduced in Reference [29].
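The stopping term (2) is straightforward to compute. Here is a small sketch; the synthetic disk image (a stand-in for a dark pupil) and the choice of smoothing width are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def stopping_term(image, sigma=2.0):
    """phi = 1 / (1 + ||grad(G_sigma * I)||^2): close to 1 in flat
    regions of the image, close to 0 on strong edges."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    return 1.0 / (1.0 + gx**2 + gy**2)

# synthetic frame: dark 'pupil' disk on a bright background
yy, xx = np.mgrid[0:64, 0:64]
frame = np.where((yy - 32)**2 + (xx - 32)**2 < 15**2, 0.0, 200.0)

phi = stopping_term(frame)
print(f"phi on the pupil edge: {phi[32, 17]:.4f}, far from it: {phi[32, 60]:.4f}")
```

Since $\phi$ nearly vanishes on the edge, the speed in (1) collapses there, which is exactly the stopping behaviour described above.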

It is important to note that the Euclidean curve shortening part of this evolution, namely

$$\frac{\partial \Phi}{\partial t} = \|\nabla \Phi\| \operatorname{div}\!\left( \frac{\nabla \Phi}{\|\nabla \Phi\|} \right) \qquad (3)$$

is derived as a gradient flow for shrinking the perimeter as quickly as possible. As is explained in Reference [27], the constant inflation term $\nu$ is added in (1) in order to keep the evolution moving in the proper direction. Note that we are taking $\Phi$ to be negative in the interior and positive in the exterior of the zero level set.

We would like to modify the model (1) in a manner suggested by the curve shortening flow. We change the ordinary arc-length function along a curve $C = (x(p), y(p))^T$ with parameter $p$, given by

$$ds = (x_p^2 + y_p^2)^{1/2}\, dp$$

to

$$ds_\phi = (x_p^2 + y_p^2)^{1/2}\, \phi\, dp$$

where $\phi(x, y)$ is a positive differentiable function. Then we want to compute the corresponding gradient flow for shortening length relative to the new metric $ds_\phi$.


Accordingly, set

$$L_\phi(t) := \int_0^1 \left\| \frac{\partial C}{\partial p} \right\| \phi\, dp$$

Then taking the first variation of the modified length function $L_\phi$, and using integration by parts (see [26]), we get that

$$L_\phi'(t) = -\int_0^{L_\phi(t)} \left\langle \frac{\partial C}{\partial t},\; \phi \kappa \vec{N} - (\nabla \phi \cdot \vec{N})\vec{N} \right\rangle ds$$

which means that the direction in which the $L_\phi$ perimeter is shrinking as fast as possible is given by

$$\frac{\partial C}{\partial t} = \left(\phi \kappa - \nabla \phi \cdot \vec{N}\right)\vec{N} \qquad (4)$$

This is precisely the gradient flow corresponding to the minimization of the length functional $L_\phi$.

The level set version of this is

$$\frac{\partial \Phi}{\partial t} = \phi \|\nabla \Phi\| \operatorname{div}\!\left( \frac{\nabla \Phi}{\|\nabla \Phi\|} \right) + \nabla \phi \cdot \nabla \Phi. \qquad (5)$$

One expects that this evolution should attract the contour very quickly to the feature which lies at the bottom of the potential well described by the gradient flow (5). As in References [27, 28], we may also add a constant inflation term, and so derive a modified model of (1) given by

$$\frac{\partial \Phi}{\partial t} = \phi \|\nabla \Phi\| \left( \operatorname{div}\!\left( \frac{\nabla \Phi}{\|\nabla \Phi\|} \right) + \nu \right) + \nabla \phi \cdot \nabla \Phi. \qquad (6)$$

Notice that for $\phi$ as in (2), $\nabla \phi$ will look like a doublet near an edge. Of course, one may choose other candidates for $\phi$ in order to pick out other features.

We now have very fast implementations of these snake algorithms based on level set methods [29, 30]. Clearly, the ability of our snakes to change topology and to quickly capture the desired features will make them an indispensable tool for our eye tracking algorithms.
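A minimal explicit finite-difference scheme for (6) already exhibits the qualitative behaviour (the fast level-set machinery of References [29, 30] is far more efficient). Every parameter below, including the step sizes, inflation constant, smoothing width, and the synthetic image, is an illustrative assumption rather than a value from the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def evolve_snake(img, n_iter=300, dt=0.15, nu=0.5, sigma=2.0):
    """Explicit scheme for Eq. (6):
    Phi_t = phi |grad Phi| (curvature + nu) + grad phi . grad Phi."""
    smoothed = gaussian_filter(img.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    g = 1.0 / (1.0 + gx**2 + gy**2)        # stopping term of Eq. (2)
    g_y, g_x = np.gradient(g)

    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # initial contour: a large circle; Phi < 0 inside, > 0 outside
    Phi = np.sqrt((yy - h / 2)**2 + (xx - w / 2)**2) - (min(h, w) / 2 - 2)

    for _ in range(n_iter):
        Py, Px = np.gradient(Phi)
        norm = np.sqrt(Px**2 + Py**2) + 1e-8
        _, nxx = np.gradient(Px / norm)     # x-derivative of normalized x-gradient
        nyy, _ = np.gradient(Py / norm)     # y-derivative of normalized y-gradient
        curv = nxx + nyy                    # div(grad Phi / |grad Phi|)
        Phi = Phi + dt * (g * norm * (curv + nu) + g_x * Px + g_y * Py)
    return Phi

# shrink the contour onto a dark 'pupil' disk of radius 15
yy, xx = np.mgrid[0:64, 0:64]
frame = np.where((yy - 32)**2 + (xx - 32)**2 < 15**2, 0.0, 200.0)

Phi = evolve_snake(frame)
r_est = np.sqrt((Phi < 0).sum() / np.pi)   # radius of the captured region
print(f"estimated radius of captured contour: {r_est:.1f} px")
```

The contour shrinks under the inflation and curvature terms until the stopping term and the doublet-like $\nabla\phi$ pin it to the edge of the disk.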

4.1.2. Contour shape identification. The edge map that we produce via the multiscale approach is then analysed to identify the shape of each contour. This allows us to associate the contour with one of the features (e.g. glint or bright eye reflection, nose contour, etc.) that we use to infer the instantaneous eye movement. We plan to exploit the recent work in shape recognition and decomposition in our research [31–33]. In this work, a new approach is given for a computational theory of shape which combines morphological operations with Gaussian smoothing in one framework. Moreover, an innovative notion of scale for signals can be applied to the problem of shape extraction. Our work can also be regarded as a generalization of certain useful morphological filters, and so offers a nonlinear multiscale framework for feature and motion detection in computer vision. See also References [34–36].

The approach is based on a reaction–diffusion equation derived from the curve evolution

$$\frac{\partial C}{\partial t} = (\alpha + \beta \kappa)\vec{N}, \qquad (7)$$

$$C(\cdot, 0) = C_0(\cdot),$$


where $\vec{N} = \vec{N}(\cdot, t)$ denotes the unit normal and $\kappa = \kappa(\cdot, t)$ the curvature of the curve $C(\cdot, t)$ of the family. (See References [31–33] for all the details.) We can study the evolution of shapes under very general deformations, and show that they decompose into two types: a deformation that is constant along the normal (morphology) and corresponds to a nonlinear hyperbolic wave type of process, and a deformation which varies with the curvature and corresponds to a quasi-linear diffusive one (a quasi-linear analogue of Gaussian smoothing). Indeed, this last type of deformation may be derived from the geometric heat equation or Euclidean curve shortening process [37]. Its smoothing properties have been shown to be superior to those of conventional (linear) Gaussian smoothing in vision applications [32]. Our evolutions give rise to shocks, the singularities of shape, which provide a hierarchical decomposition of a shape into a given set of shape elements, e.g. parts and protrusions.

In fact, this approach gives a rigorous treatment of singularities in the context of shape theory, and playing off the parameters $\alpha$ and $\beta$ against one another leads to a new reaction–diffusion scale space. Thus it allows one to formulate a novel framework for multi-scale filtering. Further, there is a strong connection to Hamilton–Jacobi theory in this research. See also References [29, 38].

To illustrate some of the above remarks, let us consider the case in which $\alpha = 1$, $\beta = 0$. In this case, Equation (7) is a nonlinear hyperbolic equation, and so its solutions develop singularities, and thus a notion of weak entropy solution must be developed. This leads to the corresponding viscosity solution in the Hamilton–Jacobi formulation of the evolution [29, 39, 40]. Indeed, in this case, it is very easy to see how singularities may develop from such an evolution. One can easily show that, in general, for the evolution (7), if $v$ denotes the arc-length parameter, then the curvature evolves as

$$\frac{\partial \kappa}{\partial t} = \beta \kappa_{vv} + \beta \kappa^3 - \alpha \kappa^2.$$

For $\alpha = 1$, $\beta = 0$, we get that

$$\frac{\partial \kappa}{\partial t} = -\kappa^2$$

whose solution is

$$\kappa(v, t) = \frac{\kappa(v, 0)}{1 + t\, \kappa(v, 0)}.$$

Thus if the curve has a point of negative curvature, the solution must become singular in finite time.
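The blow-up is easy to check numerically. The snippet below compares the closed-form solution above with a forward-Euler integration of $\partial\kappa/\partial t = -\kappa^2$; the step counts and sample times are arbitrary illustrative choices:

```python
import numpy as np

def kappa_exact(k0, t):
    """Closed-form solution of dk/dt = -k^2 with k(0) = k0."""
    return k0 / (1.0 + t * k0)

def kappa_euler(k0, t_end, n=20000):
    """Forward-Euler integration of dk/dt = -k^2."""
    dt = t_end / n
    k = k0
    for _ in range(n):
        k -= dt * k * k
    return k

# positive initial curvature decays smoothly ...
print(f"k0 = 2:  exact {kappa_exact(2.0, 1.0):.4f}, Euler {kappa_euler(2.0, 1.0):.4f}")
# ... while negative initial curvature blows up as t -> -1/k0
for t in (0.5, 0.9, 0.99):
    print(f"k0 = -1, t = {t}: kappa = {kappa_exact(-1.0, t):.1f}")
```

For $\kappa(v, 0) = -1$ the denominator $1 + t\,\kappa(v,0)$ vanishes at $t = 1$, which is the finite-time singularity referred to above.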

We should also add that we already have reliable software written in C++ for implementing the solutions of these types of evolutions.

5. TRACKING AND OPTICAL FLOW

Once the contours corresponding to the various features that we wish to track have been identified, we use an optical flow procedure to estimate the motion of each feature from two


consecutive images that contain only that feature. The images are obtained from the current and previous contour images by deleting all contours except for the one of interest. The resulting motion estimation problem is much better conditioned than traditional optical flow estimation from pure intensity images.

The computation of optical flow has proved to be an important tool for problems arising in active vision. The optical flow field is the velocity vector field of apparent motion of brightness patterns in a sequence of images [41]. One assumes that the motion of the brightness patterns is the result of relative motion, large enough to register a change in the spatial distribution of intensities on the images. Thus, relative motion between an object and a camera can give rise to optical flow. Similarly, relative motion among objects in a scene being imaged by a static camera can give rise to optical flow.

In our computation of the optical flow we use work on generalized viscosity solutions to Hamilton–Jacobi type equations. Indeed, these techniques seem ideally suited for the variational Euler–Lagrange approaches to this problem (see also [41–43] and the references therein). Utilizing such generalized solutions, we have been able to handle the singularities and regularity problems for several distinct variational formulations of the optical flow that occur in this area. This type of methodology has already been applied to the shape-from-shading problem [44, 45], and to edge detection [46].

5.1. L1-based optical flow

In Reference [47], we consider a spatiotemporal differentiation method for optical flow. Even though in such an approach the optical flow typically estimates only the isobrightness contours, it has been observed that if the motion gives rise to sufficiently large intensity gradients in the images, then the optical flow field can be used as an approximation to the real velocity field, and the computed optical flow can be used reliably in the solutions of a large number of problems; see Reference [12] and the references therein. Thus, optical flow computations have been used quite successfully in problems of three-dimensional object reconstruction, and in three-dimensional scene analysis for computing information such as depth and surface orientation. In object tracking and robot navigation, optical flow has been used to track targets of interest. Discontinuities in optical flow have proved an important tool in approaching the problem of image segmentation.

The problem of computing optical flow is ill-posed in the sense of Hadamard. Well-posedness has to be imposed by assuming suitable a priori knowledge. In Reference [47], we employ a variational formulation for imposing such a priori knowledge.

One constraint which has often been used in the literature is the 'optical flow constraint' (OFC). The OFC is a result of the simplifying assumption of constancy of the intensity, $E = E(x, y, t)$, at any point in the image [41]. It can be expressed as the following linear equation in the unknown variables $u$ and $v$:

$$E_x u + E_y v + E_t = 0, \qquad (8)$$

where $E_x$, $E_y$ and $E_t$ are the intensity gradients in the $x$, $y$, and temporal directions, respectively, and $u$ and $v$ are the $x$ and $y$ velocity components of the apparent motion of brightness patterns in the images, respectively. It has been shown that the OFC holds provided the scene has Lambertian surfaces and is illuminated by either a uniform or an isotropic light


source, the 3-D motion is translational, the optical system is calibrated, and the patterns in the scene are locally rigid.

It is not difficult to see from Equation (8) that the computation of optical flow is unique only up to computation of the flow along the intensity gradient $\nabla E = (E_x, E_y)^T$ at a point in the image [41]. (The superscript $T$ denotes 'transpose'.) This is the celebrated aperture problem. One way of treating the aperture problem is through the use of regularization in the computation of optical flow, and consequently the choice of an appropriate constraint. A natural choice for such a constraint is the imposition of some measure of consistency on the flow vectors situated close to one another on the image.
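To make the aperture problem concrete, the component of the flow along $\nabla E$ can be computed directly from Equation (8). The sketch below is our own illustration (the function name and the finite-difference discretization are not taken from Reference [47]); it estimates this 'normal flow' from two consecutive frames:

```python
import numpy as np

def normal_flow(E_prev, E_next, dt=1.0, eps=1e-8):
    """Component of the optical flow along the intensity gradient.

    The OFC (Eq. 8) determines only this 'normal flow'; the
    component along the isobrightness contours is unobservable
    (the aperture problem).
    """
    Ey, Ex = np.gradient(E_prev)       # spatial intensity gradients (rows = y)
    Et = (E_next - E_prev) / dt        # temporal intensity gradient
    g2 = Ex**2 + Ey**2 + eps           # squared gradient magnitude (regularized)
    # (u, v) = -E_t * grad(E) / ||grad(E)||^2
    return -Et * Ex / g2, -Et * Ey / g2
```

Any flow along the contours of constant brightness is invisible to the OFC, which is precisely why a regularizing constraint must be added.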

In their pioneering work, Horn and Schunck [41] use a quadratic smoothness constraint. The immediate difficulty with this method is that at the object boundaries, where it is natural to expect discontinuities in the flow, such a smoothness constraint will have difficulty capturing the optical flow. For instance, in the case of a quadratic constraint in the form of the square of the norm of the gradient of the optical flow field [41], the Euler–Lagrange (partial) differential equations for the velocity components turn out to be linear elliptic. The corresponding parabolic equations therefore have a linear diffusive nature, and tend to blur the edges of a given image. In the past, work has been done to try to suppress such a constraint in directions orthogonal to the occluding boundaries in an effort to capture discontinuities in image intensities that arise on the edges; see Reference [48] and the references therein.

We have proposed in Reference [47] a novel method for computing optical flow based on the theory of the evolution of curves and surfaces. The approach employs an L1-type minimization of the norm of the gradient of the optical flow vector, rather than the quadratic minimization undertaken in most previous regularization approaches. The equations that arise are nonlinear degenerate parabolic equations. The equations diffuse in a direction orthogonal to the intensity gradients, i.e., in a direction along the edges. This results in the edges being preserved. The equations can be solved by following a methodology very similar to the evolution of curves based on the work in References [29, 30]. Proper numerical implementation of the equations leads to solutions which incorporate the nature of the discontinuities in image intensities into the optical flow.

We can summarize our procedure as follows:

1. Let $E = E(x, y, t)$ be the intensity of the given moving image. Assume constancy of intensity at any point in the image, i.e.

\[ E_x u + E_y v + E_t = 0, \]

where

\[ u = \frac{dx}{dt}, \qquad v = \frac{dy}{dt} \]

are the components of the apparent motion of brightness patterns in the image which we want to estimate.

2. Consider the regularization of the optical flow using the L1 cost functional

\[ \min_{(u,v)} \iint \sqrt{u_x^2 + u_y^2} + \sqrt{v_x^2 + v_y^2} + \alpha^2 \left( E_x u + E_y v + E_t \right)^2 \, dx \, dy, \]

where $\alpha$ is the smoothness parameter.


3. The corresponding Euler–Lagrange equations may be computed to be

\[ \kappa_u - \alpha^2 E_x \left( E_x u + E_y v + E_t \right) = 0, \]
\[ \kappa_v - \alpha^2 E_y \left( E_x u + E_y v + E_t \right) = 0, \]

where the curvature

\[ \kappa_u := \operatorname{div} \left( \frac{\nabla u}{\| \nabla u \|} \right), \]

and similarly for $\kappa_v$.

4. These equations are solved via 'gradient descent' by introducing the system of nonlinear parabolic equations

\[ \hat{u}_{t'} = \kappa_{\hat{u}} - \alpha^2 E_x \left( E_x \hat{u} + E_y \hat{v} + E_t \right), \]
\[ \hat{v}_{t'} = \kappa_{\hat{v}} - \alpha^2 E_y \left( E_x \hat{u} + E_y \hat{v} + E_t \right), \]

for $\hat{u} = \hat{u}(x, y, t')$, and similarly for $\hat{v}$.

The above equations have a significant advantage over the classical Horn–Schunck quadratic optimization method since they do not blur edges. Indeed, the diffusion equation

\[ \varphi_t = \Delta\varphi - \frac{1}{\| \nabla\varphi \|^2} \left\langle \nabla^2\varphi \, (\nabla\varphi), \nabla\varphi \right\rangle = \kappa_\varphi \| \nabla\varphi \| \]

does not diffuse in the direction of the gradient $\nabla\varphi$; see Reference [44]. Our optical flow equations are perturbations of the following type of equation:

\[ \varphi_t = \frac{\kappa_\varphi \| \nabla\varphi \|}{\| \nabla\varphi \|}. \]

Since $\| \nabla\varphi \|$ is maximal at an edge, our optical flow equations do indeed preserve the edges. Thus the L1-norm optimization procedure allows us to retain edges in the computation of the optical flow.
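Steps 1–4 above can be sketched with a simple explicit finite-difference scheme. The function names, the small regularizing `eps` in the curvature, and the step-size and parameter values below are our own illustrative choices, not the numerical scheme of Reference [47]:

```python
import numpy as np

def curvature(f, eps=1e-6):
    """kappa_f = div( grad f / ||grad f|| ), with eps regularizing
    the norm where the gradient vanishes."""
    fy, fx = np.gradient(f)
    n = np.sqrt(fx ** 2 + fy ** 2 + eps)
    div_y = np.gradient(fy / n)[0]   # d/dy of (f_y / ||grad f||)
    div_x = np.gradient(fx / n)[1]   # d/dx of (f_x / ||grad f||)
    return div_x + div_y

def l1_optical_flow(E0, E1, alpha=1.0, tau=0.2, n_iter=300):
    """Explicit gradient-descent iteration for the L1 flow
    equations of step 4 (frames E0, E1 one time unit apart)."""
    Ey, Ex = np.gradient(E0)
    Et = E1 - E0
    u = np.zeros_like(E0)
    v = np.zeros_like(E0)
    for _ in range(n_iter):
        r = Ex * u + Ey * v + Et                     # OFC residual
        u = u + tau * (curvature(u) - alpha**2 * Ex * r)
        v = v + tau * (curvature(v) - alpha**2 * Ey * r)
    return u, v
```

The curvature term diffuses only along level lines of $u$ and $v$, so discontinuities of the flow across intensity edges survive the iteration; the data term pulls the flow toward the OFC.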

This approach to the estimation of motion will be one of the tools which we will employ in our tracking algorithms. The algorithm has already proven to be very reliable for various types of imagery [47]. We are also interested in applying a similar technique to the problem of stereo disparity. We also want to apply some of the multigrid techniques of Vogel [49] in order to significantly enhance the performance of both our optical flow and stereo disparity algorithms.

6. SEGMENTATION AND DATA ANALYSIS

Given a noisy eye trajectory which is extracted from a sequence of data, the problem then becomes that of segmenting the data into saccade and fixation intervals and estimating the mean fixation point. A multiscale approach is used in this step as well.

The fixation estimation problem is complicated by several sources of disturbance. Specifically, eye blinks and other artifacts may occur during a fixation without terminating it. During these


artifacts, which can last up to 200 ms, data may be absent. To better localize the object of fixation and segment the eye movement track, one can use a joint continuous/discrete maximum likelihood approach to refine the estimates of the beginning and end of fixation intervals and of the location of the fixation point. The approach is a joint continuous/discrete estimation problem because the fixation point can take continuous values, whereas the segmentation into 'fixation' and 'non-fixation' intervals is a binary problem. It is based on statistical models of fixation eye activity and duration, and is optimal in a Bayesian sense. It uses a prediction step to fill in missing data during blinks. The prediction step produces two estimates based on the hypotheses that the eye is either in a fixation mode or a saccade mode. The hypotheses are either accepted or rejected by computing the likelihood of each after observing a certain number of readings in the future. The delay associated with the procedure is limited to about 150 ms, which should not be noticeable to the user. One can also naturally put Bayesian statistics into the gradient snake approach for contour extraction as described above; see Reference [50]. Thus the multiscale geometric snake-based edge detection algorithm fits very naturally into such a Bayesian framework.
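A heavily simplified, hypothetical stand-in for the two-hypothesis test might compare Gaussian log-likelihoods of the sample-to-sample gaze velocity under a 'fixation' and a 'saccade' model; the function name and the sigma values below are illustrative assumptions, not the estimator of the text:

```python
import numpy as np

def segment_fixations(gaze, fix_sigma=0.5, sac_sigma=5.0):
    """Label the step between consecutive gaze samples as
    fixation (0) or saccade (1) by comparing Gaussian
    log-likelihoods of the step velocity under each hypothesis."""
    vel = np.linalg.norm(np.diff(gaze, axis=0), axis=1)
    ll_fix = -0.5 * (vel / fix_sigma) ** 2 - np.log(fix_sigma)
    ll_sac = -0.5 * (vel / sac_sigma) ** 2 - np.log(sac_sigma)
    return (ll_sac > ll_fix).astype(int)
```

The mean fixation point is then simply the average of the gaze samples inside each maximal run of 0-labels; the full estimator of the text additionally models fixation duration and fills in blink gaps by prediction.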

Note that we do not need to send information about saccade movements. When the fixation point changes, we only provide information about the initial and final fixation points to the tracking routine. However, to speed up the man–machine communication process, we use the saccade movement to predict where the next fixation will be. Saccade movements can be described by a hidden Markov model; see Reference [51] and the references therein. In addition, a Markov model may be used to summarize possible user actions in the current context. Each state in the hidden Markov model describing the saccade is associated with a stochastic model of a particular stage and type of motion. By combining the probabilistic description of the user intent based on the present context and the hidden Markov model of the saccade, we may derive an estimation rule for predicting where the next fixation will be.
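A minimal sketch of such a decoder, assuming a two-state hidden Markov model (0 = fixation, 1 = saccade) with Gaussian velocity emissions and sticky transitions; the state means, variances, and transition probability below are illustrative assumptions, not the models of Reference [51]:

```python
import numpy as np

def gauss_ll(x, mu, sigma):
    """Gaussian log-likelihood up to a constant."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)

def viterbi_segment(vel, mu=(0.2, 8.0), sigma=(0.5, 4.0), p_stay=0.95):
    """Viterbi decoding of a velocity trace into the most likely
    state sequence; sticky transitions suppress spurious switches."""
    logA = np.log(np.array([[p_stay, 1 - p_stay],
                            [1 - p_stay, p_stay]]))
    T = len(vel)
    delta = np.zeros((T, 2))          # best log-probability per state
    psi = np.zeros((T, 2), dtype=int) # backpointers
    for s in (0, 1):
        delta[0, s] = np.log(0.5) + gauss_ll(vel[0], mu[s], sigma[s])
    for t in range(1, T):
        for s in (0, 1):
            cand = delta[t - 1] + logA[:, s]
            psi[t, s] = int(np.argmax(cand))
            delta[t, s] = cand[psi[t, s]] + gauss_ll(vel[t], mu[s], sigma[s])
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

In the full scheme described above, each saccade stage would be its own state with its own motion model, and the decoded state plus the context prior would drive the prediction of the next fixation point.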

7. CONCLUSIONS

In this paper, we considered a general methodology for treating eye-tracking problems. We noted that because of the uncertainty in the models and the signals, this class of problems presents a unique opportunity to researchers in robust control. We discussed an approach which called for a combination of robust control, computer vision, and multiscale signal processing.

Recently, there has been a cross-fertilization among researchers in vision and control. Much of vision research until now has been open loop. When the loop has been closed, very elementary control algorithms have been applied, which have worked with mixed results. The use of visual information in a feedback loop can provide a rich new source of questions which has the potential of stimulating whole new areas of control research. The eye-tracking problem outlined in this paper can be taken as a paradigm for a whole range of issues in controlled active vision. Indeed, the problem poses a powerful challenge to the robust control community.

ACKNOWLEDGMENTS

We would like to thank Professors Gary Balas and Ahmed Tewfik of the University of Minnesota for a number of very helpful conversations about visual tracking.


This work was supported in part by grants from the National Science Foundation (ECS-9700588, NSF-LIS), by the Air Force Office of Scientific Research (AF/F49620-98-1-0168), by the Army Research Office (DAAG55-98-1-0169), and by a MURI grant.

REFERENCES

1. Blake A, Isard M. Active Contours: The Application of Techniques from Graphics, Vision, Control Theory and Statistics to Visual Tracking of Shapes in Motion. Springer: New York, 1999.
2. Blake A, Yuille A. Active Vision. MIT Press: Cambridge, MA, 1992.
3. Carpenter R. Movements of the Eyes. Pion: London, 1988.
4. Murray D, Buxton B. Experiments in the Machine Interpretation of Visual Motion. MIT Press: Cambridge, MA, 1999.
5. Chaumette F, Rives P, Espiau B. Positioning of a robot with respect to an object, tracking it and estimating its velocity by visual servoing. Proceedings of the IEEE International Conference on Robotics and Automation 1991; 2248–2253.
6. Feddema JT, Lee CS, Mitchell OR. Automatic selection of image features for visual servoing of a robot manipulator. Proceedings of the IEEE International Conference on Robotics and Automation 1989; 832–837.
7. Luo RC, Mullen R, Wessell DE. An adaptive robotic tracking system using optical flow. Proceedings of the IEEE International Conference on Robotics and Automation 1988; 568–573.
8. Papanikolopoulos N, Khosla PK, Kanade T. Vision and control techniques for robotic visual tracking. Proceedings of the IEEE International Conference on Robotics and Automation, 1991.
9. Weiss LE, Sanderson AC, Neuman CP. Dynamic sensor-based control of robots with visual feedback. IEEE Journal of Robotics and Automation 1987; RA-3:404–417.
10. Roach J, Aggarwal J. Computer tracking of objects moving in space. IEEE PAMI 1979; 1:127–135.
11. Wallace TP, Mitchell OR. Analysis of three dimensional movement using Fourier descriptors. IEEE PAMI 1980; 2:583–588.
12. Horn BKP. Robot Vision. MIT Press: Cambridge, MA, 1986.
13. Papanikolopoulos N, Khosla PK, Kanade T. Adaptive robotic visual tracking. Proceedings of the American Control Conference, Boston, MA, 1991.
14. Doyle J, Francis B, Tannenbaum A. Feedback Control Theory. Macmillan: New York, 1991.
15. Young LR, Sheena D. Survey of eye movement recording methods. Behavior Research Methods and Instrumentation 1975; 7:397–429.
16. Eye Tracking Systems Handbook. Applied Science Laboratories: Waltham, MA.
17. Hutchinson TE, White KP, Martin WN, Reichert KC, Frey LA. Human–computer interaction using eye-gaze input. IEEE Transactions on Systems, Man, and Cybernetics 1989; 1527–1534.
18. Levine JL. An eye controlled computer. IBM Watson Center Research Report RC-8857, Yorktown Heights, NY, 1981.
19. Jacob RJK. The use of eye movements in human–computer interaction techniques: what you look at is what you get. ACM Transactions on Information Systems 1991; 9:152–169.
20. Bolt RA. Eyes at the interface. Proceedings of the ACM Human Factors in Computer Systems Conference, Gaithersburg, MD, 1982; 360–362.
21. Zalad AL. Eye-voice controlled interface. Proceedings of the 30th Annual Meeting of the Human Factors Society, 1986; 322–326.
22. Ware C, Mikaelian HT. An evaluation of an eye tracker as a device for computer input. Proceedings of the ACM CHI + GI'87 Human Factors in Computing Systems Conference, Toronto, Canada, 1987; 183–188.
23. Starker I, Bolt RA. A gaze responsive self-disclosing display. Proceedings of the ACM CHI'90 Human Factors in Computing Systems Conference, Seattle, Washington, 1990; 3–9.
24. Carpenter R, Williams M. Neural computation of log likelihood in the control of saccadic eye movements. Nature 1995; 377:59–62.
25. Kass M, Witkin A, Terzopoulos D. Snakes: active contour models. International Journal of Computer Vision 1987; 1:321–331.
26. Kichenassamy S, Kumar A, Olver P, Tannenbaum A, Yezzi A. Conformal curvature flows: from phase transitions to active vision. Archive for Rational Mechanics and Analysis 1996; 134:275–301. A short version of this paper has appeared in the Proceedings of ICCV, June 1995.
27. Caselles V, Catte F, Coll T, Dibos F. A geometric model for active contours in image processing. Numerische Mathematik 1993; 66:1–31.
28. Malladi R, Sethian J, Vemuri B. Shape modelling with front propagation: a level set approach. IEEE PAMI 1995; 17:158–175.
29. Osher SJ, Sethian JA. Fronts propagating with curvature-dependent speed: algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics 1988; 79:12–49.


30. Sethian J. A review of recent numerical algorithms for hypersurfaces moving with curvature dependent speed. Journal of Differential Geometry 1989; 31:131–161.
31. Kimia B, Tannenbaum A, Zucker S. Towards a computational theory of shape: an overview. Lecture Notes in Computer Science 1990; 427:402–407.
32. Kimia BB, Tannenbaum A, Zucker SW. Shapes, shocks, and deformations, I. International Journal of Computer Vision 1995; 15:189–224.
33. Kimia B, Tannenbaum A, Zucker S. On the evolution of curves via a function of curvature, I: the classical case. Journal of Mathematical Analysis and Applications 1992; 162:438–458.
34. Brockett RW, Maragos P. Evolution equations for continuous-scale morphology. Proceedings of the ICASSP, vol. III, 1992; 125–128.
35. Maragos P. Pattern spectrum and multiscale shape representation. IEEE PAMI 1989; 11:701–716.
36. Serra J. Image Analysis and Mathematical Morphology. Academic Press: New York, 1982.
37. Grayson M. The heat equation shrinks embedded plane curves to round points. Journal of Differential Geometry 1987; 26:285–314.
38. Sapiro G, Tannenbaum A. Invariant curve evolution and image analysis. Indiana University Mathematics Journal 1993; 42:985–1009.
39. Barles G. Remarks on a flame propagation model. INRIA Report 464, Sophia-Antipolis, France, 1985.
40. Crandall MG, Lions PL. Viscosity solutions of Hamilton–Jacobi equations. Transactions of the American Mathematical Society 1983; 277:1–42.
41. Horn BKP, Schunck BG. Determining optical flow. Artificial Intelligence 1981; 20:185–203.
42. Hildreth EC. Computations underlying the measurement of visual motion. Artificial Intelligence 1984; 23:309–354.
43. Snyder M. On the mathematical foundations of smoothness constraints for the determination of optical flow and for surface reconstruction. IEEE PAMI 1991; 13:1105–1114.
44. Lions PL, Rouy E, Tourin A. Shape from shading, viscosity and edges. Technical Report, CEREMADE, Université Paris IX-Dauphine, France, 1991.
45. Rouy E, Tourin A. A viscosity solutions approach to shape-from-shading. SIAM Journal of Numerical Analysis 1992; 29:867–884.
46. Alvarez L, Lions PL, Morel JM. Image selective smoothing and edge detection by nonlinear diffusion. II. SIAM Journal of Numerical Analysis 1992; 29:845–866.
47. Kumar A, Tannenbaum A, Balas G. Optical flow: a curve evolution approach. IEEE Transactions on Image Processing 1996; 5:598–611.
48. Nagel H-H, Enkelmann W. An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence 1986; PAMI-8:565–593.
49. Vogel C. Total variation regularization for ill-posed problems. Technical Report, Department of Mathematics, Montana State University, April 1993.
50. Haker S, Sapiro G, Tannenbaum A. Knowledge-based segmentation of SAR data with learned priors. IEEE Transactions on Image Processing 2000; 9:298–302.
51. Rimey R, Brown C. Controlling eye movements with hidden Markov models. International Journal of Computer Vision 1991; 7:47–66.
