

REAL TIME EYE TRACKING FOR HUMAN COMPUTER INTERFACES

Subramanya Amarnag, Raghunandan S. Kumaran and John Gowdy
Dept. of Electrical and Computer Engineering, Clemson University.

Email: {asubram, ksampat, jgowdy}@clemson.edu

Eye tracking

Intrusive
Advantages: Can be highly accurate.
Disadvantages: Can be very cumbersome for the user. Not ideal for practical purposes.

Non Intrusive
Advantages: User friendly.
Disadvantages: The accuracy of systems developed thus far is not good when compared to intrusive systems.

Our System - Highlights

• Non IR based
• Non Intrusive
• Uses an ordinary camera to track the eyes
• Utilizes a dynamic training strategy, thus making it user and lighting condition invariant.
• Ideal for systems where high accuracy is not required

Pre-Processing

• In this stage, pixels are eliminated based on their intensity.
• A threshold of 0.27 has been experimentally determined to be ideal for most cases.
• If the intensity of a pixel is above the threshold, then that pixel is eliminated.
• The remaining pixels are passed to the next stage.
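
The thresholding step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes intensities are normalized to the range [0, 1] (the poster does not state the scale of the 0.27 threshold), and the function name `preprocess` and the list-of-rows frame layout are hypothetical.

```python
def preprocess(frame, threshold=0.27):
    """Return (row, col) coordinates of pixels dark enough to be eye candidates.

    `frame` is a list of rows of intensities, assumed normalized to [0, 1].
    """
    candidates = []
    for r, row in enumerate(frame):
        for c, intensity in enumerate(row):
            # Pixels whose intensity is above the threshold are eliminated.
            if intensity <= threshold:
                candidates.append((r, c))
    return candidates


frame = [
    [0.90, 0.85, 0.10],
    [0.30, 0.20, 0.95],
]
print(preprocess(frame))  # [(0, 2), (1, 1)]
```

Only the two dark pixels survive; everything brighter than 0.27 is dropped before the classifier runs, which keeps the later stages cheap.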

Bayesian Classifier

• In this stage the problem consists of classifying the pixels into eye and non-eye classes.
• A Bayesian classifier is used as the binary classifier.
• Gaussian PDFs are used to model both the eye and non-eye classes.
• The means and covariances of the classes are dynamically updated after processing each frame.
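
A two-class Bayesian classifier with Gaussian class models might look like the sketch below. Several things here are assumptions rather than details from the poster: the feature is a single scalar (so the covariance reduces to a variance), the priors are equal, the starting statistics are invented, and the per-frame update blends old and new statistics with a hypothetical factor `alpha`.

```python
import math


class GaussianClass:
    """One class (eye or non-eye) modelled by a 1-D Gaussian PDF."""

    def __init__(self, mean, var, prior=0.5):
        self.mean, self.var, self.prior = mean, var, prior

    def log_posterior(self, x):
        # log p(x | class) + log P(class); the evidence p(x) is shared
        # by both classes and therefore omitted.
        return (-0.5 * math.log(2 * math.pi * self.var)
                - (x - self.mean) ** 2 / (2 * self.var)
                + math.log(self.prior))

    def update(self, samples, alpha=0.5):
        """Blend the old statistics with those of the current frame (assumed rule)."""
        if not samples:
            return
        m = sum(samples) / len(samples)
        v = sum((s - m) ** 2 for s in samples) / len(samples)
        self.mean = (1 - alpha) * self.mean + alpha * m
        # Keep the variance strictly positive so the PDF stays defined.
        self.var = max((1 - alpha) * self.var + alpha * v, 1e-6)


def classify(x, eye, non_eye):
    """Return True when the eye class has the larger posterior."""
    return eye.log_posterior(x) > non_eye.log_posterior(x)


eye = GaussianClass(mean=0.10, var=0.002)      # illustrative starting values
non_eye = GaussianClass(mean=0.22, var=0.002)  # illustrative starting values
pixels = [0.08, 0.12, 0.20, 0.25]
labels = [classify(p, eye, non_eye) for p in pixels]
print(labels)  # [True, True, False, False]

# Dynamic training: re-estimate both classes from this frame's labels.
eye.update([p for p, is_eye in zip(pixels, labels) if is_eye])
non_eye.update([p for p, is_eye in zip(pixels, labels) if not is_eye])
```

Re-estimating the statistics from each frame's own labels is what would let such a model drift with the user and the lighting instead of relying on a fixed training set.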

Clustering

• The Bayesian classifier does not eliminate all the non-eye pixels, especially facial hair and other dark pixels.

• Clustering is performed to identify the ‘dark islands’ in the remaining image.

• Our algorithm can be considered an unsupervised c-means algorithm. The difference is that no assumptions are made regarding the number of clusters or the cluster centers.

Clustering flowchart (noe = number of exemplars):

Exemplar(1) = x(1); noe = 0
For i = 1 to N
  For j = 1 to noe
    Is dist( x(i), exemplar(j) ) < threshold?
      Yes: Update exemplar
      No: Is j = noe?
        Yes: Create a new cluster, noe = noe + 1
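
One way to read the flowchart above is as a sequential exemplar-based clustering, sketched below. The update rule (recomputing the exemplar as the mean of its members), the Euclidean distance, and the function name `cluster` are assumptions; the poster only says "Update exemplar".

```python
import math


def cluster(points, threshold):
    """Group 2-D points into clusters without fixing their number in advance.

    Returns (exemplars, members), where members[k] lists the points
    assigned to exemplar k.
    """
    exemplars, members = [], []
    for p in points:
        for k, e in enumerate(exemplars):
            if math.dist(p, e) < threshold:
                members[k].append(p)
                # Update the exemplar to the mean of its members (assumed rule).
                n = len(members[k])
                exemplars[k] = (sum(q[0] for q in members[k]) / n,
                                sum(q[1] for q in members[k]) / n)
                break
        else:
            # No existing exemplar was close enough: create a new cluster.
            exemplars.append((float(p[0]), float(p[1])))
            members.append([p])
    return exemplars, members


points = [(10, 10), (11, 10), (40, 12), (41, 13), (10, 11)]
exemplars, members = cluster(points, threshold=5.0)
print(len(exemplars))  # 2
```

Each point joins the first exemplar within the threshold rather than the nearest one, which mirrors the single pass over `j` in the flowchart. The number of clusters falls out of the data and the threshold.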

Post Processing

• Clustering returns the total number of ‘dark islands’ in the image.
• Post processing is done to identify the ‘eyes’ among these ‘dark islands’.
• The first step is to merge clusters which are close to each other (less than 5 pixels apart).
• The next step uses geometrical features of the clusters, such as size, width and height, to eliminate clusters that cannot be eyes.
• Finally we should be left with 2 clusters which represent the eyes.
• The location of the eyes is used to limit the search region for the next frame.
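
The merge-then-filter steps above can be sketched as follows. The 5-pixel merge distance comes from the poster; measuring it between cluster centroids is an assumption, and the geometric limits (`min_size`, `max_width`, `max_height`) are invented placeholder values, since the poster does not give them.

```python
import math


def centroid(cluster):
    n = len(cluster)
    return (sum(r for r, _ in cluster) / n, sum(c for _, c in cluster) / n)


def merge_close(clusters, max_gap=5.0):
    """Repeatedly merge any two clusters whose centroids are closer than max_gap."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if math.dist(centroid(clusters[i]), centroid(clusters[j])) < max_gap:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


def looks_like_eye(cluster, min_size=4, max_width=12, max_height=8):
    """Geometric test on pixel count, width and height (limits are hypothetical)."""
    rows = [r for r, _ in cluster]
    cols = [c for _, c in cluster]
    width = max(cols) - min(cols) + 1
    height = max(rows) - min(rows) + 1
    return len(cluster) >= min_size and width <= max_width and height <= max_height


def find_eyes(clusters):
    candidates = [c for c in merge_close(clusters) if looks_like_eye(c)]
    # Succeed only when exactly two clusters survive.
    return [centroid(c) for c in candidates] if len(candidates) == 2 else None


left_a = [(20, 30), (20, 31)]
left_b = [(21, 32), (21, 33)]                  # fragment of the same eye
right = [(20, 60), (20, 61), (21, 60), (21, 61)]
hair = [(r, 45) for r in range(15)]            # tall dark streak, not an eye
print(find_eyes([left_a, left_b, right, hair]))  # [(20.5, 31.5), (20.5, 60.5)]
```

The two left fragments are merged into one island, the tall streak is rejected on height, and the two remaining centroids are reported as the eye locations.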

Results

• The system was implemented on an Intel Pentium III 997 MHz machine and achieved a frame rate of 26 fps.

• The system was tested on 2 databases: the Clemson University Audio Visual Experiments (CUAVE) database and the CMU audio-visual dataset.

• Accuracy achieved:

– CMU database: 88.3%

– CUAVE database, stationary speaker: 86.4%

– CUAVE database, moving speaker: 76.5%

System flowchart: Input Frame → Frame Search Region → Pre-Processing → Bayesian Classifier → Clustering → Post-Processing → Eyes Located Successfully?
Yes: Location of the Eyes; Update Means and Covariance; Update Frame Search Region.
No: Process Next Frame.
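
The per-frame control flow in the system flowchart can be sketched as a small driver loop. All four callables are hypothetical placeholders for the stages described above, and falling back to the full frame after a failure is an assumption; the flowchart only says to process the next frame.

```python
def track(frames, locate_eyes, update_model, region_around, full_region):
    """Yield the eye locations found in each frame, or None on failure."""
    region = full_region
    for frame in frames:
        # Pre-processing, classification, clustering and post-processing
        # are all hidden behind this one placeholder call.
        eyes = locate_eyes(frame, region)
        if eyes is None:
            # Failure: leave the class statistics untouched and move on.
            # Searching the whole frame next time is an assumption.
            region = full_region
        else:
            update_model(frame, eyes)      # update means and covariance
            region = region_around(eyes)   # narrow the next search region
        yield eyes


# Stub stages, just to exercise the control flow.
frames = ["f0", "f1", "f2"]
found = {"f0": (5, 9), "f2": (6, 10)}
updates = []
results = list(track(
    frames,
    locate_eyes=lambda frame, region: found.get(frame),
    update_model=lambda frame, eyes: updates.append(frame),
    region_around=lambda eyes: ("near", eyes),
    full_region="whole frame",
))
print(results)  # [(5, 9), None, (6, 10)]
```

The class statistics are only updated on frames where the eyes were located, so a failed frame cannot contaminate the model.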

References

[1] S. Baluja and D. Pomerleau, “Non Intrusive gaze tracking using Artificial Neural Networks,” Technical Report CMU-CS-94-102, Carnegie Mellon University.

[2] Advanced Multimedia Processing Lab, CMU, http://amp.ece.emu.edu/projects/AudioVisualSpeechProcessing/.

[3] E.K. Patterson, S. Gurbuz, Z. Tufekci, and J.N. Gowdy, “CUAVE: A New Audio-Visual Database for Multimodal Human-Computer Interface Research,” ICASSP, Orlando, May 2002.

This figure illustrates the performance of the system against complex backgrounds

Results for a sequence of frames from the CMU dataset

Results for a sequence of frames from the CUAVE dataset

Abstract

In recent years considerable interest has developed in real time eye tracking for various applications, including lip tracking. Although many lip tracking algorithms exist, their successful implementation is bound by a number of constraints, such as the color of the lips, the size and shape of the lips, and the constant motion of the lips. Eye tracking algorithms, however, may be designed to overcome these constraints. Hence eye tracking appears to be a reasonable solution to the lip tracking problem, as a fix on the speaker's eyes gives a rough estimate of the position of the lips.