real time eye tracking for human computer interfaces subramanya amarnag, raghunandan s. kumaran and...

REAL TIME EYE TRACKING FOR HUMAN COMPUTER INTERFACES

Subramanya Amarnag, Raghunandan S. Kumaran and John Gowdy

Dept. of Electrical and Computer Engineering, Clemson University.

Email: {asubram, ksampat, jgowdy}@clemson.edu

Eye tracking

Intrusive

• Advantages: Can be highly accurate.
• Disadvantages: Can be very cumbersome for the user. Not ideal for practical purposes.

Non-Intrusive

• Advantages: User friendly.
• Disadvantages: The accuracy of the systems developed thus far is lower than that of intrusive systems.

Our System - Highlights

• Non IR based
• Non-intrusive
• Uses an ordinary camera to track the eyes
• Utilizes a dynamic training strategy, thus making it user and lighting condition invariant
• Ideal for systems where high accuracy is not required

Pre - Processing

• In this stage the intensity of the pixels is considered for eliminating a number of pixels.
• A threshold of 0.27 has been experimentally determined to be ideal for most cases.
• If the intensity of a pixel is above the threshold, then that pixel is eliminated.
• The remaining pixels are passed to the next stage.
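The thresholding step above can be sketched in a few lines. This is a minimal illustration, not the poster's implementation: it assumes grayscale intensities normalized to [0, 1] (the poster gives the threshold 0.27 without stating the scale), and the function name `preprocess` is hypothetical.

```python
def preprocess(gray, threshold=0.27):
    """Return (row, col) coordinates of the pixels that survive thresholding.

    gray is a 2-D list of intensities, ASSUMED to be normalized to [0, 1].
    Pixels whose intensity is above the threshold are eliminated.
    """
    return [
        (r, c)
        for r, row in enumerate(gray)
        for c, value in enumerate(row)
        if value <= threshold
    ]


# Tiny hypothetical frame: only the dark pixels remain as candidates.
frame = [
    [0.90, 0.10, 0.80],
    [0.27, 0.50, 0.05],
]
candidates = preprocess(frame)
```

Only the surviving coordinates are handed to the classifier, so the later stages touch a small fraction of the frame.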

Bayesian Classifier

• In this stage the problem consists of classifying the pixels into eye and non-eye classes.
• A Bayesian classifier is used as the binary classifier.
• Gaussian PDFs are used to model both the eye and non-eye classes.
• Means and covariance of the classes are dynamically updated after processing each frame.
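A two-class Gaussian Bayes decision with per-frame parameter updates might look like the sketch below. Several things here are assumptions made only for illustration: the feature is a single scalar intensity (so the covariance reduces to a variance), the priors are equal, and the update blends old and new estimates with a hypothetical `rate` parameter. The poster does not specify the feature vector, the priors, or the update rule.

```python
import math


class Gaussian1D:
    """One class model: a univariate Gaussian (ASSUMED scalar feature)."""

    def __init__(self, mean, var):
        self.mean = mean
        self.var = var

    def log_pdf(self, x):
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (x - self.mean) ** 2 / self.var)

    def update(self, samples, rate=0.5, min_var=1e-4):
        """Blend current parameters with estimates from this frame's samples.

        The blending rule and `rate` are assumptions, not the poster's method.
        """
        if not samples:
            return
        m = sum(samples) / len(samples)
        v = sum((s - m) ** 2 for s in samples) / len(samples)
        self.mean = (1 - rate) * self.mean + rate * m
        self.var = max(min_var, (1 - rate) * self.var + rate * v)


def is_eye(x, eye, non_eye, prior_eye=0.5):
    """Bayes decision: pick the class with the larger log posterior."""
    log_eye = math.log(prior_eye) + eye.log_pdf(x)
    log_non = math.log(1 - prior_eye) + non_eye.log_pdf(x)
    return log_eye > log_non


# Hypothetical starting models and one frame of candidate intensities.
eye = Gaussian1D(mean=0.08, var=0.002)
non_eye = Gaussian1D(mean=0.22, var=0.002)
frame_pixels = [0.05, 0.07, 0.10, 0.20, 0.24, 0.26]

labels = [is_eye(x, eye, non_eye) for x in frame_pixels]

# Dynamic training: re-estimate both classes from this frame's decisions.
eye.update([x for x, lab in zip(frame_pixels, labels) if lab])
non_eye.update([x for x, lab in zip(frame_pixels, labels) if not lab])
```

Working in log space avoids underflow when the densities are very small; a multivariate version would replace the variance with a covariance matrix.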

Clustering

• Bayesian Classifier does not eliminate all the non-eye pixels, especially facial hair and other dark pixels.

• Clustering is performed to identify the ‘dark islands’ in the remaining image.

• Our algorithm can be considered an unsupervised c-means algorithm, the difference being that no assumptions are made regarding the number of clusters or the cluster centers.

Clustering flowchart (noe = number of exemplars):

    Initialize: Exemplar(1) = x(1); noe = 0
    For i = 1 to N
        For j = 1 to noe
            Is dist( x(i), exemplar(j) ) < threshold?
                Yes: Update exemplar
                No:  if j = noe, create a new cluster, noe = noe + 1
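One way to read the flowchart is as a single-pass, threshold-based clustering in which the number of clusters grows as needed. The sketch below follows that reading under stated assumptions: points are 2-D pixel coordinates, the distance is Euclidean, a point joins the first exemplar within the threshold, and "update exemplar" means moving it to the running mean of its members. None of these details is given on the poster.

```python
import math


def cluster(points, threshold):
    """Single-pass clustering with no preset number of clusters."""
    exemplars = []  # running centroid of each cluster
    members = []    # points assigned to each cluster
    for p in points:
        for j, e in enumerate(exemplars):
            if math.dist(p, e) < threshold:
                members[j].append(p)
                n = len(members[j])
                # ASSUMED update rule: incremental mean of the members.
                exemplars[j] = (e[0] + (p[0] - e[0]) / n,
                                e[1] + (p[1] - e[1]) / n)
                break
        else:
            # No exemplar was close enough: start a new cluster.
            exemplars.append(p)
            members.append([p])
    return exemplars, members


# Hypothetical candidate pixels forming two 'dark islands'.
points = [(0, 0), (1, 0), (10, 10), (11, 10), (0, 1)]
exemplars, members = cluster(points, threshold=3.0)
```

Because each point is visited once, the result depends on the order of the points; that is the price of not fixing the number of clusters in advance.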

Post Processing

• Clustering returns the total number of ‘dark islands’ in the image.
• Post processing is done to identify the ‘eyes’ among these ‘dark islands’.
• The first step is to merge clusters which are close to each other (less than 5 pixels apart).
• The next step uses the geometrical features of the clusters, such as the size, width and height, to eliminate those that cannot be eyes.
• Finally we should be left with 2 clusters, which represent the eyes.
• The locations of the eyes are used to limit the search region for the next frame.
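The merge-then-filter steps above could be sketched as follows. The 5-pixel merge distance comes from the poster; everything else is an assumption for illustration: closeness is measured as the gap between bounding boxes, and the size, width and height limits in `looks_like_eye` are hypothetical values, since the poster does not list its geometric criteria.

```python
def bounding_box(pixels):
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def box_gap(a, b):
    """Gap in pixels between two bounding boxes (0 if they overlap)."""
    r0a, c0a, r1a, c1a = bounding_box(a)
    r0b, c0b, r1b, c1b = bounding_box(b)
    dr = max(r0b - r1a, r0a - r1b, 0)
    dc = max(c0b - c1a, c0a - c1b, 0)
    return max(dr, dc)


def merge_close(clusters, max_gap=5):
    """Repeatedly merge any two clusters less than max_gap pixels apart."""
    merged = [list(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if box_gap(merged[i], merged[j]) < max_gap:
                    merged[i].extend(merged.pop(j))
                    changed = True
                    break
            if changed:
                break
    return merged


def looks_like_eye(pixels, min_size=4, max_size=400,
                   max_width=40, max_height=20):
    """Geometric filter; all limits are HYPOTHETICAL placeholders."""
    r0, c0, r1, c1 = bounding_box(pixels)
    width, height = c1 - c0 + 1, r1 - r0 + 1
    return (min_size <= len(pixels) <= max_size
            and width <= max_width and height <= max_height)


def find_eyes(clusters):
    """Return the two eye clusters, or None if the frame is ambiguous."""
    candidates = [c for c in merge_close(clusters) if looks_like_eye(c)]
    return candidates if len(candidates) == 2 else None


# Hypothetical 'dark islands': a split left eye, a right eye, and a speck.
islands = [
    [(10, 10), (10, 11), (11, 10)],
    [(10, 13), (11, 13)],
    [(10, 40), (10, 41), (11, 40), (11, 41)],
    [(50, 25)],
]
eyes = find_eyes(islands)
```

Returning `None` when anything other than two candidates remains gives the caller a clear signal that the eyes were not located in this frame.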

Results

• The system was implemented on an Intel Pentium III 997 MHz machine and achieved a frame rate of 26 fps.

• The system was tested on 2 databases: the Clemson University Audio Visual Experiments (CUAVE) database and the CMU audio-visual dataset.

• Accuracy achieved:

– CMU database : 88.3%

– CUAVE database, stationary speaker : 86.4%

– CUAVE database, moving speaker : 76.5%

System flowchart:

Input Frame → Frame Search Region → Pre-Processing → Bayesian Classifier → Clustering → Post-Processing → Eyes Located Successfully?

• Yes: Output the Location of the Eyes; Update Means and Covariance; Update Frame Search Region.
• No: Process Next Frame.
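The "Update Frame Search Region" step of the flowchart can be illustrated with a small helper. This is a sketch under assumptions the poster does not state: the region is a rectangle around the two eye centers padded by a hypothetical `margin`, and when the eyes were not located the whole frame is searched again.

```python
def search_region(eye_centers, frame_shape, margin=20):
    """Rectangle (top, left, bottom, right) to search in the next frame.

    eye_centers is a list of (row, col) pairs, or None if the eyes were
    not located. `margin` is a HYPOTHETICAL padding in pixels.
    """
    height, width = frame_shape
    if not eye_centers:
        # ASSUMED fallback: search the full frame again.
        return 0, 0, height - 1, width - 1
    rows = [r for r, _ in eye_centers]
    cols = [c for _, c in eye_centers]
    top = max(0, min(rows) - margin)
    left = max(0, min(cols) - margin)
    bottom = min(height - 1, max(rows) + margin)
    right = min(width - 1, max(cols) + margin)
    return top, left, bottom, right


# Hypothetical eye centers in a 240 x 320 frame.
region = search_region([(100, 120), (102, 180)], (240, 320))
```

Restricting the next frame to this rectangle reduces the number of pixels that pass through the earlier stages, which is consistent with the real-time goal.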

References

[1] S. Baluja and D. Pomerleau, “Non Intrusive gaze tracking using Artificial Neural Networks,” Technical Report CMU-CS-94-102, Carnegie Mellon University.

[2] Advanced Multimedia Processing Lab, CMU, http://amp.ece.emu.edu/projects/AudioVisualSpeechProcessing/.

[3] E.K. Patterson, S. Gurbuz, Z. Tufekci, and J.N. Gowdy, “CUAVE: A New Audio-Visual Database for Multimodal Human-Computer Interface Research,” ICASSP, Orlando, May 2002.

Figure: performance of the system against complex backgrounds.

Figure: results for a sequence of frames from the CMU dataset.

Figure: results for a sequence of frames from the CUAVE dataset.

Abstract

In recent years considerable interest has developed in real time eye tracking for various applications, including lip tracking. Although many lip tracking algorithms exist, their successful implementation is bound by a number of constraints, such as the color of the lips, the size and shape of the lips, and constant motion of the lips. Eye tracking algorithms, however, may be designed to overcome these constraints. Hence eye tracking appears to be a reasonable solution to the lip tracking problem, as a fix on the speaker's eyes gives a rough estimate of the position of the lips.
