REAL TIME EYE TRACKING FOR HUMAN COMPUTER INTERFACES
Subramanya Amarnag, Raghunandan S. Kumaran and John Gowdy
Dept. of Electrical and Computer Engineering, Clemson University.
Email: {asubram, ksampat, jgowdy}@clemson.edu
Eye tracking
Intrusive
• Advantages: Can be highly accurate.
• Disadvantages: Can be very cumbersome for the user; not ideal for practical purposes.
Non-intrusive
• Advantages: User friendly.
• Disadvantages: The accuracy of the systems developed thus far is poor compared with that of intrusive systems.
Our System - Highlights
• Not infrared (IR) based
• Non-intrusive
• Uses an ordinary camera to track the eyes
• Uses a dynamic training strategy, which makes it invariant to the user and to lighting conditions
• Suited to systems where high accuracy is not required
Pre-Processing
• In this stage, pixel intensity is used to eliminate a large portion of the pixels.
• A threshold of 0.27 was determined experimentally to work well in most cases.
• If the intensity of a pixel is above the threshold, that pixel is eliminated.
• The remaining pixels are passed to the next stage.
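The thresholding rule above can be sketched as follows. This is a minimal illustration, assuming a grayscale frame stored as nested lists with intensities normalized to [0, 1] (the poster does not state the intensity scale); the function name `preprocess` is hypothetical.

```python
THRESHOLD = 0.27  # value reported on the poster; assumes intensities in [0, 1]


def preprocess(frame, threshold=THRESHOLD):
    """Return (row, col, intensity) for every pixel that survives thresholding.

    `frame` is a hypothetical nested-list grayscale image. Pixels brighter
    than `threshold` are eliminated; darker pixels are kept as candidates
    for the Bayesian classifier.
    """
    candidates = []
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value <= threshold:  # above the threshold -> eliminated
                candidates.append((r, c, value))
    return candidates
```

Because eyes (pupils, irises, lashes) are among the darkest regions of a face, a single global cut like this discards most skin and background pixels cheaply before the costlier per-pixel classification.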
Bayesian Classifier
• In this stage, the problem is to classify the remaining pixels into eye and non-eye classes.
• A Bayesian classifier is used as the binary classifier.
• Gaussian PDFs model both the eye and the non-eye classes.
• The means and covariances of both classes are updated dynamically after each frame is processed.
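A two-class Gaussian Bayes decision with a per-frame model update might look like the sketch below. For brevity it assumes a single intensity feature per pixel, so each class covariance reduces to a variance; the poster does not state the feature vector. The blending update and its rate `alpha` are hypothetical stand-ins for the unspecified update rule, and `Gaussian1D` and `is_eye` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian1D:
    """Gaussian class model over pixel intensity (assumed single feature)."""
    mean: float
    var: float

    def pdf(self, x: float) -> float:
        return math.exp(-((x - self.mean) ** 2) / (2 * self.var)) / math.sqrt(2 * math.pi * self.var)

    def update(self, samples, alpha: float = 0.5) -> None:
        """Blend the current estimate with the statistics of this frame's samples.

        Hypothetical rule: the poster only says means and covariances are
        updated after each frame, not how.
        """
        if not samples:
            return
        m = sum(samples) / len(samples)
        v = sum((s - m) ** 2 for s in samples) / len(samples)
        self.mean = (1 - alpha) * self.mean + alpha * m
        # Floor the variance so the PDF stays well defined.
        self.var = max((1 - alpha) * self.var + alpha * v, 1e-6)


def is_eye(x: float, eye: Gaussian1D, non_eye: Gaussian1D, p_eye: float = 0.5) -> bool:
    """Bayes decision rule: pick the class with larger p(x | class) * P(class)."""
    return eye.pdf(x) * p_eye > non_eye.pdf(x) * (1 - p_eye)
```

In use, `is_eye` would be applied to each pixel that survives pre-processing, and `update` would then be called on each model with the intensities assigned to it in the current frame.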
Clustering
• The Bayesian classifier does not eliminate all non-eye pixels; pixels from facial hair and other dark regions in particular can survive.
• Clustering is then performed to identify the ‘dark islands’ in the remaining image.
• Our algorithm can be viewed as an unsupervised c-means algorithm, with the difference that no assumptions are made about the number of clusters or the cluster centers.
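Clustering algorithm (N = number of pixels, noe = number of exemplars):
Initialize: Exemplar(1) = x(1); noe = 1
For i = 1 to N
    For j = 1 to noe
        Is dist( x(i), Exemplar(j) ) < threshold?
            Yes: update Exemplar(j) and move on to the next pixel
            No: if j = noe, create a new cluster with x(i) as its exemplar; noe = noe + 1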
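The sequential exemplar clustering above can be sketched in Python. Several details are assumptions because the poster leaves them open: points are (row, col) pixel coordinates, `dist` is Euclidean distance, "update exemplar" is taken to mean a running mean of the member coordinates, and `threshold` is a free parameter. The function name `cluster_pixels` is hypothetical.

```python
import math


def cluster_pixels(points, threshold):
    """Sequential exemplar-based clustering of (row, col) points.

    Each point joins the first cluster whose exemplar lies within
    `threshold`; otherwise it starts a new cluster, so the number of
    clusters is not fixed in advance. The running-mean exemplar update
    is an assumption.
    """
    exemplars = []  # one exemplar (centroid) per cluster; len(exemplars) == noe
    members = []    # pixels assigned to each cluster
    for p in points:
        for j, e in enumerate(exemplars):
            if math.dist(p, e) < threshold:
                members[j].append(p)
                n = len(members[j])
                # Running-mean update of the exemplar.
                exemplars[j] = tuple(ei + (pi - ei) / n for ei, pi in zip(e, p))
                break
        else:
            # Loop reached j = noe without a match: create a new cluster.
            exemplars.append(tuple(float(v) for v in p))
            members.append([p])
    return exemplars, members
```

Because each pixel is visited once and compared only against the current exemplars, the cost grows with the number of surviving pixels times the number of dark islands, which stays small after pre-processing and classification.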
Post-Processing
• Clustering returns all the ‘dark islands’ in the image.
• Post-processing identifies which of these ‘dark islands’ are the eyes.
• First, clusters that are close to each other (less than 5 pixels apart) are merged.
• Next, geometric features of the clusters, such as size, width, and height, are used to eliminate non-eye clusters.
• Finally, two clusters should remain, representing the eyes.
• The locations of the eyes are used to limit the search region for the next frame.
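The post-processing steps can be sketched as below. The 5-pixel merge distance comes from the poster; measuring it between cluster centroids is an assumption. The geometric bounds in `geometry_ok` and the `margin` in `search_region` are hypothetical placeholder values, not the authors' settings, and all function names are illustrative.

```python
import math

MERGE_DIST = 5  # pixels; merge threshold stated on the poster


def centroid(cluster):
    """Mean (row, col) of a cluster given as a list of (row, col) pixels."""
    n = len(cluster)
    return (sum(r for r, _ in cluster) / n, sum(c for _, c in cluster) / n)


def merge_close(clusters, merge_dist=MERGE_DIST):
    """Repeatedly merge any two clusters whose centroids are < merge_dist apart."""
    clusters = [list(c) for c in clusters]  # copy so inputs are not mutated
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if math.dist(centroid(clusters[i]), centroid(clusters[j])) < merge_dist:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


def geometry_ok(cluster, min_size=2, max_size=400, max_width=40, max_height=20):
    """Hypothetical size/width/height bounds for an eye-like cluster."""
    rows = [r for r, _ in cluster]
    cols = [c for _, c in cluster]
    width = max(cols) - min(cols) + 1
    height = max(rows) - min(rows) + 1
    return min_size <= len(cluster) <= max_size and width <= max_width and height <= max_height


def find_eyes(clusters):
    """Return the two eye centroids ordered by column, or None if not exactly two remain."""
    candidates = [c for c in merge_close(clusters) if geometry_ok(c)]
    if len(candidates) != 2:
        return None
    return tuple(sorted((centroid(c) for c in candidates), key=lambda p: p[1]))


def search_region(eyes, margin=20):
    """Box (top, left, bottom, right) around both eyes, padded by a hypothetical margin."""
    rows = [r for r, _ in eyes]
    cols = [c for _, c in eyes]
    return (max(0, math.floor(min(rows)) - margin), max(0, math.floor(min(cols)) - margin),
            math.ceil(max(rows)) + margin, math.ceil(max(cols)) + margin)
```

A tall, thin cluster such as a strand of facial hair fails the height bound and is discarded, while two fragments of the same eye lying a few pixels apart are merged before the geometric test.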
Results
• The system was implemented on an Intel Pentium III 997 MHz machine and achieved a frame rate of 26 fps.
• The system was tested on two databases: the Clemson University Audio Visual Experiments (CUAVE) database and the CMU audio-visual dataset.
• Accuracy achieved:
– CMU database : 88.3%
– CUAVE database, stationary speaker : 86.4%
– CUAVE database, moving speaker : 76.5%
Figure: System flowchart. Input Frame → Frame Search Region → Pre-Processing → Bayesian Classifier → Clustering → Post-Processing → Eyes Located Successfully? If yes: output the Location of the Eyes, update the means and covariance, and update the frame search region. If no: process the next frame.
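The per-frame control flow in the flowchart can be sketched as a small driver loop. The individual stages are passed in as callables so the loop stays self-contained; `track`, `locate_eyes`, `update_models`, and `region_around` are hypothetical names, and resetting the search region to the full frame after a failure is an assumption, since the flowchart only says "process next frame".

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Region = Tuple[int, int, int, int]  # hypothetical (top, left, bottom, right)


def track(
    frames: Sequence[Any],
    full_region: Region,
    locate_eyes: Callable[[Any, Region], Optional[Any]],
    update_models: Callable[[Any, Any], None],
    region_around: Callable[[Any], Region],
) -> List[Optional[Any]]:
    """Run the flowchart's per-frame loop; returns eye locations (or None) per frame."""
    region = full_region
    results: List[Optional[Any]] = []
    for frame in frames:
        # Pre-processing, Bayesian classification, clustering and
        # post-processing, restricted to the current search region.
        eyes = locate_eyes(frame, region)
        if eyes is None:
            # "No, process next frame". Falling back to the full frame is an
            # assumption; the flowchart does not say how the region is reset.
            region = full_region
            results.append(None)
            continue
        update_models(frame, eyes)    # update means and covariance
        region = region_around(eyes)  # update frame search region
        results.append(eyes)
    return results
```

Injecting the stages keeps the control logic testable on its own and mirrors the flowchart: the models and search region change only when the eyes were located successfully.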
This figure illustrates the performance of the system against complex backgrounds.
Results for a sequence of frames from the CMU dataset.
Results for a sequence of frames from the CUAVE dataset.
Abstract
In recent years, considerable interest has developed in real-time eye tracking for various applications, including lip tracking. Although many lip tracking algorithms exist, their successful implementation is bound by a number of constraints, such as the color, size, and shape of the lips and their constant motion. Eye tracking algorithms, however, can be designed to overcome these constraints. Eye tracking therefore offers a reasonable approach to the lip tracking problem, since a fix on the speaker's eyes gives a rough estimate of the position of the lips.