(09CS026)
GazeLib: A low Cost Implementation of Real-time Gaze Tracking Framework
(Volume 1 of 1)
Student Name : Ng King Yui
Student No. :
Programme Code : BScCS
Supervisor : Prof IP, Ho Shing Horace
Date : 12 April, 2010
City University of Hong Kong Department of Computer Science BSCCS/BSCS Final Year Project 2009-2010
Final Report
For Official Use Only
-
A low Cost Implementation of Real-time Gaze Tracking Framework Final Report
- Page 2 of 81-
Declaration
I have read the project guidelines and I understand the meaning of academic
dishonesty, in particular plagiarism and collusion. I hereby declare that the work I
submitted for my final year project, entitled:
GazeLib: A low Cost Implementation of Real-time Gaze Tracking Framework
does not involve academic dishonesty. I give permission for my final year project work
to be electronically scanned and if found to involve academic dishonesty, I am aware of
the consequences as stated in the Project Guidelines.
Student Name: Ng King Yui Signature: _____________
Student ID : Date : _____________
Abstract
Eye gaze reflects a person’s attention over time, which makes it a powerful cue for
determining what a person finds interesting. Eyes are therefore not only indispensable
to human communication but also offer great potential for building more natural and
direct human computer interaction. However, most available gaze trackers depend on
either purpose-built software or high-end hardware, and high cost has always been
the barrier to the wide adoption of gaze tracking technology. Moreover, the majority of
gaze trackers adopt the corneal reflection technique, which actively illuminates the
eye region with infrared (IR) light and requires quasi-stable lighting conditions to
operate. In addition, potential eye hazards may arise from long or close-proximity IR
exposure. To address these problems, this project proposes a robust hybrid method
that integrates model-based and feature-based tracking approaches.
The core of the proposed method applies the Active Shape Model fitting technique to
locate facial features. Eye features are then extracted by image processing, including
filtering and ellipse fitting. To make the framework available to the public, it is
packaged as a programming library and released as an open-source package.
Evaluation experiments show that the system prototype is capable of real-time remote
gaze tracking under several lighting conditions with low-cost, off-the-shelf webcams
while maintaining acceptable accuracy. The proposed method significantly improves
usability and reduces the cost of gaze tracking technology, which is an important
step towards bringing it to the mass market.
Acknowledgments
I would like to thank my supervisor, Prof. Horace H.S. IP, for his advice and valuable
support throughout the development of this project. This project would not have
been completed without his guidance and patience.
I would also like to thank Dr. Ken C.K. LAW and Dr. Joe C.H. YUEN from the Department of
Computer Science and Dr. Lionel P.K. SUN from the Department of Mathematics for their
kind support and guidance.
Furthermore, many thanks go to the organizations, including the FG-NET consortium and
DTU IMM, that prepared and published the annotated face databases used in the
project, and to those who allowed their faces to be used.
A low Cost Implementation of Real-time Gaze Tracking Framework
Final Report
K.Y. Ng
Deliverable date: 12 April, 2010
Table of Contents
Introduction
Introduction
Literature Review
2.1 Gaze Tracking
2.1.1 Biological structure of human eye
2.1.2 Mathematical eye model
2.2 Eye Tracking Techniques
2.2.1 Electro-Oculography (EOG)
2.2.2 Scleral Contact lens/ Search Coils
2.2.3 Video-Oculography with Corneal Reflection
2.2.4 Video-Oculography with visual light
2.3 Video-based gaze tracking hardware settings
2.3.1 Head-mount
2.3.2 Table-mount
2.4 Potential hazards with IR
Methodology
3.1 Introduction
3.2 Active Shape model (ASM)
3.3 Active appearance model (AAM)
3.4 POSIT
3.5 RANSAC
Design & Implementation
4.1 Introduction
4.2 Development Environment
4.3 System design
4.3.1 Architecture
4.3.2 Conceptual Class diagram
4.4 System implementation
4.4.1 Overall system flow
4.4.2 Active Shape Model building
4.4.3 Face detection and tracking
4.4.4 Head poses estimation
4.4.5 Eye feature extraction
4.4.6 Gaze Estimation
Results & Comments
5.1 Introduction
5.2 Testing environment
5.3 Performance of face tracking
5.3.1 Presence of distractions
5.3.2 Various ambient lighting conditions
5.3.3 Different facial expressions
5.3.4 Blurred inputted image
5.4 Performance of eye features extraction
5.4.1 Presence of distractions
5.4.2 Various ambient lighting conditions
5.4.3 Various iris colors
5.5 Performance of head poses estimation
5.5.1 Pose estimation results using POSIT
5.5.2 Pose estimation results using LK Optical flow
5.5.3 Compare of using two approaches
5.6 Performance of whole system
5.6.1 Speed
5.6.2 Tracking with off-the-shelf equipment
5.6.3 Gaze tracking accuracy
5.6 Case study: GazePad
5.6.1 Motivations
5.6.2 Interface Design Concepts
5.6.3 Operation
5.6.3 Performances
Conclusions
6.1 Critical reviews
6.1.1 Achievements
6.1.2 Limitations
6.2 Future work
6.3 Application areas
6.4 Project feedback
References
References
Revision History:
Date        Author(s)  Comments
09-04-2010  Jack Ng    First draft version
11-04-2010  Jack Ng    First release
16-01-2011  Jack Ng    Changed the title; modified the Abstract; corrected typos and grammar errors
Chapter 1
Introduction
Introduction
Eyes are the most important human sensory organ; more than half of our sensory
impressions come from the eyes. Moreover, gaze is a powerful cue for determining what
might be interesting to the observer (Duchowski, 2003). Generally speaking, eye gaze
indicates a person’s attention over time. Eyes are therefore not only indispensable
to human communication but also offer great potential for building more natural and
direct human computer interaction (Jacob and Karn, 2003).
Since gaze information has valuable applications in human computer
interaction and user intention detection, various gaze tracking algorithms have been
proposed, and some of them have been commercialized (Daunys et al, 2006).
Gaze tracking, which originated from research on eye movement (Jacob, 1995), is
defined as a continuous process of measuring the "Point of Regard" (PoR) or the "Line
of Sight" (LoS) of the eye (ITU Gaze group, 2009). Eye tracking and gaze estimation are
the two main procedures involved in tracking eye gaze. Eye tracking is the process of
detecting and tracking relevant features (e.g. the pupil center) in the eye image.
Gaze estimation is the mathematical procedure that translates image features into
gaze coordinates.
With the advancements in computer vision technologies, gaze tracking has recently
come to be regarded as a solved problem. The corneal reflection based tracking
method is commonly adopted by popular gaze tracking algorithms (Daunys and
Ramanauskas, 2004; Goni, 2004; Li et al, 2005) and commercial gaze tracking
products (EyeTech Digital Systems, 2009; Lc technologies, 2009). Infrared (IR) light
is used to actively illuminate the eye region, producing a speck of light reflected by
the cornea that is known as the corneal reflection or glint. The corneal reflection
remains stationary during eye movement, so gaze can be estimated from the relative
position of the pupil and the glints in the image captured by the camera. Gaze tracking
using the active IR method can be divided into two types: remote tracking systems and
head-mounted systems. Remote tracking systems, which are common among commercial
products, usually employ a high-end camera for image capturing. They achieve high
accuracy and tolerate a few degrees of head movement. A head-mounted
system is placed in a helmet or special glasses together with an IR lighting device and
a camera, so the whole system follows the user’s head movement. Li et al (2005) show
that satisfactory tracking results can be obtained even when a low-resolution camera is
used. Thus most low-cost gaze tracking solutions are based on the head-mounted
approach (San Agustin et al, 2009; Li et al, 2005). However, gaze tracking systems
based on IR illumination have many limitations. First, most IR eye trackers cannot
operate properly in the presence of another light source, since the method relies heavily
on IR illumination and quasi-stable lighting conditions are therefore the minimal
prerequisite (Villanueva et al, 2008). As a result, this approach is only suitable for
indoor use and is not recommended for users wearing glasses. Second, the positions of
the IR light source and the camera need to be carefully calibrated before tracking, which
is inconvenient in a home environment. In addition, IR light is used to produce corneal
reflections because it is barely visible to the human eye. This improves the user
experience, but at the same time the eye’s natural aversion response to bright light
cannot function. Concerns regarding long periods of IR exposure have been
raised (Mulvey et al, 2008). Long-period or close-proximity IR exposure is not yet
addressed in current infrared safety standards, so a potential eye hazard may exist.
More recently, various gaze tracking algorithms that do not use IR light have also been
proposed (Villanueva et al, 2008). Kohlbecher et al (2008) proposed an algorithm that
infers gaze from the shape of the iris through ellipse fitting. Again, high-end hardware
components are required.
As mentioned before, most gaze tracking systems depend on purpose-built or high-end
hardware and operating software, which vary between manufacturers (Bates
et al, 2005). High hardware and software costs have always been the barrier
to the wide adoption of gaze tracking technology. The marketing study
conducted by Jordansen et al (2005) reported that, up to 2005, an eye tracking
system in Europe cost from EUR 4,100 to EUR 17,900, which is around HK$47,200 to
HK$207,640. Furthermore, the same study reported that the main target user groups of
commercial gaze tracking products are people with disabilities, such as ALS or locked-in
syndrome, and research organizations. Widespread integration of eye tracking into
consumer-grade human computer interfaces is rarely seen.
This project focuses on bringing gaze tracking technology to consumer-grade human
computer interfaces by reducing the price, emphasizing ease of use, increasing
extensibility, and enhancing flexibility and mobility. Instead of relying on active IR
illumination and corneal reflections, it employs the robust facial feature based gaze
tracking approach proposed by Chen et al (2008), in which 2D facial features are tracked
and then used to estimate gaze. In contrast with Chen et al (2008), the proposed
system requires only a single uncalibrated camera without hardware modifications (e.g.
building an IR LED grid) instead of a stereo camera, so off-the-shelf components can be
used. Because the method works without IR light, the gaze tracking system can operate
in both indoor and outdoor conditions, and because no active illumination is
required, wearing glasses is no longer a problem. Since low-cost, off-the-shelf
hardware components are employed, the price can be reduced by a factor of about one
hundred if a webcam is used, as it costs only about HK$100 to HK$500. Once the price
drops to this range, gaze tracking interfaces are expected to become widespread
(Jordansen et al, 2005) and could revolutionize the future development
of human computer interaction. The framework is packaged as a
programming library and made available to the public as an open-source package, which
hides the complicated implementation details. Developers can build applications
with a gaze tracking interface in only a few function calls.
The report begins by reviewing the historical and theoretical background of
gaze tracking techniques. Detailed descriptions and justifications of the proposed
method follow the literature review. Experiments and their results
then demonstrate the performance of the gaze tracking system. A
case study of an application prototype built on top of the library is discussed in detail.
Finally, conclusions on achievements and limitations are given, and future work is
suggested.
Chapter 2
Literature Review
Gaze Tracking
Eye Tracking Techniques
Video-based gaze tracking hardware setting
Potential hazards with IR
2.1 Gaze Tracking
Generally speaking, eye tracking is the process of measuring eye position and
movement. This project is concerned with gaze tracking rather than eye tracking, but the
rest of the report also reviews a range of techniques in eye tracking and face
tracking, which are related to eye-gaze tracking. The term “gaze tracking” is used instead
of “eye-tracking” when referring to measuring the eye-gaze direction or
"Point of Regard". Knowledge of the biology and psychology of the human visual
system is essential to understanding how eye position
information is turned into eye-gaze information.
2.1.1 Biological structure of human eye
Figure 2.1: The Anatomy of the Eye
(Quade, 2009)
The eye is regarded as one of the most complex organs in the human body. Its operation
can be compared to that of a camera. Light rays from an object enter the eye
through a small hole called the pupil, pass through a focusing lens and are finally
focused on the retina. The ciliary muscles change the thickness of the
lens (i.e. adjust its focal length) in order to focus objects at various distances on the
retina. The iris, which gives the eye its colored ring, controls the amount
of light entering the eye. The retina is a membrane containing numerous photoreceptors
(rods and cones) that lines the rear surface of the eyeball, similar to the film in a
camera. The photoreceptors transform light energy into electrical impulses, or neural
signals, which are then transmitted to the visual processing part of the brain
through the optic nerve (Hyrskykari, 2006).
Figure 2.2: Cross section of a human eye
(Hyrskykari, 2006)
2.1.1.1 Visual angle
The visual angle (angular size) is the angle that a viewed object subtends at the eye. Given
the object’s height s and the distance d from the lens to the object, the visual angle α can
be calculated using the formula $\alpha = 2\arctan\left(\frac{s}{2d}\right)$ (Hyrskykari, 2006).
Figure 2.3: The visual angle
(Hyrskykari, 2006)
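The visual angle can be evaluated numerically with a few lines of code. The sketch below is an illustration only; it assumes the standard relation α = 2·arctan(s / 2d) for an object of height s centred on the line of sight at distance d, and the function name and sample values are hypothetical.

```python
import math


def visual_angle_deg(object_height, distance):
    """Visual angle in degrees subtended by an object of height
    `object_height` viewed from `distance` (both in the same unit).

    Assumes the object is centred on the line of sight.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return math.degrees(2.0 * math.atan(object_height / (2.0 * distance)))


# Illustrative values (assumptions): a 1 cm target viewed from 60 cm.
angle = visual_angle_deg(1.0, 60.0)
print(f"visual angle: {angle:.3f} degrees")
```

For small objects at typical viewing distances the result is close to s/d expressed in degrees, which is why a 1 cm target at 60 cm subtends slightly less than one degree.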
2.1.1.2 Field of view
The field of view (or field of vision) is defined as the horizontal angular (linear or areal)
extent of a given scene that is seen by the eyes, and it is determined by the placement of the eyes.
Fields of view can be classified into two types: the field of view of an individual eye and the
field of view of the overlapping portion of both eyes (the binocular field). The human field of
view is between 160 and 208 degrees, and that of an individual eye is between 120 and 180 degrees (Savas, 2005).
2.1.1.3 Visual acuity
Visual acuity refers to the ability of a person to perceive spatial detail. The further
an image point falls from the fovea, the lower the visual acuity. A normal young person has
visual acuity in the order of minutes, or sometimes seconds, of visual angle,
but visual acuity decreases with age. Although visual acuity is measured
in minutes of arc, gaze cannot be estimated that accurately because gaze cannot
be considered a single sharp point in a scene. When a point in a scene falls onto the
center of the fovea, not only that point is perceived sharply but also the
surrounding areas that fall onto the rest of the fovea. In addition, a person can shift
visual attention without moving the eyes (Hyrskykari, 2006). For these reasons,
some potential error remains even when the exact point of the scene that falls onto the
fovea is tracked. This potential error is reported to be
approximately one degree by Jacob and Karn (2003) and two degrees by Duchowski (2003).
Figure 2.4: The visual acuity of the eye (Hyrskykari, 2006)
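To give a feel for what an error of one or two degrees means on a display, the angular error can be converted to a length on the screen. The sketch below is a hedged illustration: it assumes the target lies straight ahead of the viewer on a flat screen, and the 60 cm viewing distance and the pixel density are hypothetical values, not figures from this report.

```python
import math


def gaze_error_on_screen(error_deg, viewing_distance):
    """Length on the screen covered by an angular gaze error of
    `error_deg` degrees, for a target straight ahead of the viewer.
    The result is in the same unit as `viewing_distance`."""
    return 2.0 * viewing_distance * math.tan(math.radians(error_deg) / 2.0)


def gaze_error_in_pixels(error_deg, viewing_distance_cm, pixels_per_cm):
    """Same error expressed in pixels for a given pixel density."""
    return gaze_error_on_screen(error_deg, viewing_distance_cm) * pixels_per_cm


# Hypothetical setup: viewer 60 cm from the screen, 38 pixels per cm.
for error in (1.0, 2.0):
    cm = gaze_error_on_screen(error, 60.0)
    px = gaze_error_in_pixels(error, 60.0, 38.0)
    print(f"{error:.0f} degree error: {cm:.2f} cm, about {px:.0f} pixels")
```

Under these assumptions an error of one degree already covers roughly one centimetre of the screen, which suggests why gaze-driven interfaces tend to need large targets.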
2.1.1.4 Movement of eye
Eye movements can be divided into three types: saccadic movements, smooth
pursuit movements and convergent movements. Smooth pursuit movements occur
when the eyes follow a moving object in the field of view. Convergent movements
keep both eyes focused on the same object. The eyes do not only follow
objects smoothly but also jump suddenly from one point to another; these jumps are
called saccadic eye movements. Saccades are among the fastest
movements the human body can make. The eyes can rotate at
about 500 degrees per second and repeat this action more than a hundred thousand times a
day (Savas, 2005). Saccades are driven by three pairs of muscles
attached to the outside of the eyeball, which correspond to three rotations:
horizontal (left - right), vertical (up - down) and about the line of sight. The
intervals between saccades, during which visual perception occurs, are called fixations.
Fixation time varies between tasks but typically averages around 250 ms. The eyes are not
entirely steady throughout a fixation; they perform some movements on a
smaller scale. These small movements make fixations more
difficult to recognize (Hyrskykari, 2006).
Figure 2.5: Directions of saccadic eye movements (Oculomotor Research Group, 2006)
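Because saccades are so much faster than the small movements within a fixation, one common way to separate the two in recorded gaze data is to threshold the angular velocity. This velocity-threshold approach is not described in this report; the sketch below is only an illustration, and the 100 degrees per second threshold, the one-dimensional gaze angle and the sample data are all assumptions.

```python
def classify_intervals(angles_deg, timestamps_s, velocity_threshold=100.0):
    """Label each interval between consecutive gaze samples as
    'saccade' or 'fixation' by thresholding angular velocity.

    angles_deg:    gaze direction along one axis, in degrees
    timestamps_s:  sample times in seconds, strictly increasing
    """
    if len(angles_deg) != len(timestamps_s):
        raise ValueError("angles and timestamps must have the same length")
    labels = []
    for i in range(1, len(angles_deg)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        velocity = abs(angles_deg[i] - angles_deg[i - 1]) / dt  # deg/s
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


# Hypothetical samples at 100 Hz: small jitter, a 10 degree jump, jitter again.
times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
angles = [0.0, 0.1, 0.05, 5.0, 10.0, 10.1, 10.05]
print(classify_intervals(angles, times))
```

The jitter intervals stay far below the threshold, while the jump moves at close to the 500 degrees per second quoted above and is labelled as a saccade.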
2.1.2 Mathematical eye model
Formulating the eye as a mathematical model is necessary for
precise description and calculation in gaze tracking. The optical axis is defined as an
imaginary line passing through the eyeball center and the pupil center. The visual axis
is defined as the line joining the center of the fovea and the lens, which makes an angle
with the optical axis. Daunys et al (2005) reported that in a typical adult human eye, the
fovea falls about 4-5 degrees temporally and about 1.5 degrees below the point of
intersection of the optical axis and the retina.
Figure 2.5: Mathematical model framework of eye (Daunys et al, 2006)
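The two axes of the eye model can be represented as unit vectors through pairs of 3D points, and the angle between them follows from the dot product. The sketch below is a minimal illustration; all coordinates are hypothetical values in millimetres chosen to give an offset of about five degrees, not anatomical measurements.

```python
import math


def axis_direction(start, end):
    """Unit vector pointing from `start` to `end` (3D points)."""
    d = [e - s for s, e in zip(start, end)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0:
        raise ValueError("points must be distinct")
    return [c / norm for c in d]


def angle_between_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


# Hypothetical coordinates (mm), with z pointing out of the eye.
eyeball_center = (0.0, 0.0, 0.0)
pupil_center = (0.0, 0.0, 10.0)
lens_center = (0.0, 0.0, 6.0)
fovea_center = (-1.5, 0.0, -11.0)

optical_axis = axis_direction(eyeball_center, pupil_center)
visual_axis = axis_direction(fovea_center, lens_center)
print(f"angle between axes: {angle_between_deg(optical_axis, visual_axis):.2f} degrees")
```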
2.2 Eye Tracking Techniques
Human computer interfaces controlled by the user's eye gaze are a new
branch of eye tracking research. Eye tracking itself, however, has a long history: eye
detection and eye tracking techniques have been developed and used in medical and
psychological research for more than half a century, but not for
everyday computer interfaces (Duchowski, 2003). Tracking the pupil or iris center and
determining the degree of eye movement in the face image is the first step in designing a
gaze tracking method. Eye tracking approaches can be classified into at least
four categories, which are explained below.
2.2.1 Electro-Oculography (EOG)
The Electro-Oculography (EOG) technique has been widely used for eye movement tracking
over the past forty years and is still frequently used in clinical environments today.
There is a potential difference of approximately 1 mV between the cornea and the
fundus. EOG evaluates eye movement by measuring the electric potential differences
of the skin around the eyes. This technique measures eye movement relative to the user’s
head position and is therefore not well suited to measuring the point of regard. Although
EOG is cheap and non-invasive, it is not a reliable method for quantitative
measurement because the electrical signal may change even when there is no eye movement
and is also affected by metabolic changes in the eye (Savas, 2005).
Figure 2.6: An EOG implemented eye tracker
(EagleEyes Project, 2009)
2.2.2 Scleral Contact lens/ Search Coils
This method employs a contact lens with an embedded wire coil that is attached directly to
the eye. An electrical potential difference is induced when the coil moves in a magnetic
field, and the eye’s movement is calculated by measuring this induced
potential difference. The method gives very high temporal and spatial
resolution, which allows small eye movements to be measured. Although it is the
most precise method of eye tracking, it is invasive. Therefore, it
is rarely used in clinical environments and is mostly used in research
environments (Duchowski, 2003).
Figure 2.7: Eye Tracking System CS681 (Primelec, D. Florin., 2009). Panels: search coil;
turntable with Primelec coil system, Neurology Dept., University Hospital Zurich; coil frame (350 mm).
2.2.3 Video-Oculography with Corneal Reflection
When a fixed light source actively illuminates the eye region, light reflections
form on the cornea, known as “Purkinje images”. Infrared (IR) light is usually
used as the light source since it is barely visible to the human eye and hence does not
distract the user. The first Purkinje image (called the “glint”) is captured by the eye
tracker using a calibrated infrared-sensitive camera. The position of the glint remains
constant under minor head movement, so eye rotation is reflected by the relative position
of the pupil centre and the glint. The viewer’s Point of Regard (POR) can therefore be
calculated using this property. There are two general eye tracking techniques based on
active eye illumination: “Bright Pupil Tracking” and “Dark Pupil Tracking”.
The difference between these two techniques lies in the location of the light source
(Daunys, 2006; Duchowski, 2003; Glenstrup and Engell-Nielsen, 1995). Both
techniques produce a large contrast between iris and pupil in the captured image, which
allows robust eye tracking, but there are two related problems. First, the contrast between
the pupil and the rest of the eye area becomes unclear if other external light
sources are present at the same time, for example outdoors, which makes it hard for the
tracking algorithm to determine the boundaries of eye features. Second, when the
user wears glasses or contact lenses, multiple glints appear, which makes it hard for the
algorithm to find the true corneal reflection (Daunys, 2006).
Figure 2.8: The four Purkinje images are formed when light is directed at the eye
(Glenstrup and Engell-Nielsen, 1995)
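The role of the glint as a reference point can be illustrated with the pupil-glint difference vector. The sketch below is a simplified illustration, not the method of any particular tracker: the pixel coordinates are made up, and `map_to_screen` is a hypothetical per-axis linear mapping whose gain and offset would normally come from a calibration step.

```python
def pupil_glint_vector(pupil_center, glint_center):
    """Vector from the glint to the pupil centre, in image pixels."""
    return (pupil_center[0] - glint_center[0],
            pupil_center[1] - glint_center[1])


def map_to_screen(vector, gain, offset):
    """Hypothetical per-axis linear mapping from a pupil-glint vector
    to screen coordinates (gain and offset are calibration assumptions)."""
    return (gain[0] * vector[0] + offset[0],
            gain[1] * vector[1] + offset[1])


# Illustrative pixel coordinates (assumptions, not measured data).
looking_ahead = pupil_glint_vector((320, 240), (324, 246))
# Small head shift: pupil and glint move together, the vector is unchanged.
after_head_shift = pupil_glint_vector((330, 245), (334, 251))
# Eye rotation: the pupil moves while the glint stays roughly in place.
after_eye_rotation = pupil_glint_vector((332, 240), (324, 246))

print(looking_ahead, after_head_shift, after_eye_rotation)
print(map_to_screen(after_eye_rotation, gain=(40, 40), offset=(640, 400)))
```

In this toy example the head shift leaves the vector untouched while the eye rotation changes it, which is the property that lets the Point of Regard be estimated from the pupil and glint positions.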
2.2.3.1 Bright Pupil Tracking
If the illumination is coaxial with the optical path, the eye acts as a retroreflector.
The light reflected off the retina therefore travels back in the same direction as the
incoming light, similar to the red-eye effect. This phenomenon is known as the bright pupil
effect, which makes the pupil appear as a very bright spot and the iris as a dark disc in the
captured image. This approach works better for people with blue irises (Tobii Technology AB, 2009).
Figure 2.9: A gold corner cube retroreflector (Retroreflector, 2009)
Figure 2.10: Bright pupil formed on the captured image (Daunys, 2006)
2.2.3.2 Dark Pupil Tracking
If the illumination source is offset from the optical path, the light reflected from the
retina does not return along the direction of the incoming light. Therefore the pupil
appears dark in the captured image. This approach works better for people with dark eyes
(Tobii Technology AB, 2009).
Figure 2.11: Working principle of a corner reflector (EyeTracking, 2009)
Figure 2.12: Eye region image with corneal reflex (Daunys, 2006)
2.2.3.3 Problems
A large contrast between iris and pupil allows robust eye tracking with all iris colors, but
there are two problems with this technique. First, the contrast between the pupil and the
rest of the eye area in the image becomes unclear if another external light source is present
at the same time, for example outdoors, which makes it hard for the tracking algorithm to
determine the boundaries of eye features. Second, when the user wears glasses or contact
lenses, multiple glints appear, which makes it hard for the algorithm to find the true corneal
reflection (Daunys, 2006).
http://upload.wikimedia.org/wikipedia/commons/0/03/Corner-Cube.jpg
2.2.4 Video-Oculography with visible light
This approach relies on image analysis algorithms alone rather than on active
illumination. Images captured by a calibrated camera under normal lighting are fed
directly to the algorithm to perform gaze estimation. Various algorithms have been
proposed in this category, and they can be classified into three main types:
deformable template based, appearance based, and feature based methods.
Deformable template based and appearance based methods attempt to fit a predefined
model to the image, while feature based methods attempt to fit the image features to a
fixed model (Daunys, 2006).
2.2.4.1 Deformable templates based
The deformable template tracking method is based on a manually predefined generic
template which is matched to the image. A correlation value is computed between a
candidate image and the predefined template, and this value is used to determine
whether an eye is present. This approach is accurate and easy to implement, but it
cannot deal effectively with variation in scale, pose and shape. Moreover, template
matching is computationally demanding and requires a high contrast image (Savas, 2005).
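The correlation test described above can be illustrated with a minimal sketch. It assumes that normalized cross-correlation is used as the correlation value and that the template is slid exhaustively over the image; the source does not state which correlation measure is used, and the function names `ncc` and `best_match` are hypothetical.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equally sized grey-level patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0  # flat patches carry no information

def best_match(image, template):
    """Slide the template over the image; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # NCC lies in [-1, 1], so -2 is a safe start
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best

image = [[0, 0, 0, 0, 0],
         [0, 9, 1, 0, 0],
         [0, 1, 9, 0, 0],
         [0, 0, 0, 0, 0]]
template = [[9, 1],
            [1, 9]]
score, position = best_match(image, template)
```

The nested loops make the cost of the search explicit: every candidate position requires a full pass over the template, which is consistent with the remark that template matching is computationally demanding.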
2.2.4.2 Appearance based
The appearance based tracking method uses statistical analysis and machine learning
to find the relevant characteristics of eye and non-eye images. The learned
characteristics take the form of distribution models or discriminant functions, which
are used for eye detection. A distribution model is a probabilistic framework:
Bayesian classification or maximum likelihood is used to classify a candidate image as
eye or non-eye. However, the high dimensionality of images makes Bayesian
classification infeasible to implement. A discriminant function is derived by
projecting the high dimensional image onto a lower dimensional space, which is then
used for image classification. PCA and Hidden Markov Models are the most commonly used
appearance based techniques (Savas, 2005).
2.2.4.3 Feature based
Feature based tracking methods extract particular features, such as the color
distribution of the eye region or feature points of the eye in the image, to perform
identification. Feature based tracking consists of feature extraction and feature
mapping. Typical feature based tracking algorithms are particle filters, Gabor filters,
Kalman filtering and mean shift (Zhou et al, 2008).
2.3 Video-based gaze tracking hardware settings
Video-based gaze tracking is the main concern of this project; therefore, only the
hardware settings adopted by video-based gaze trackers are reviewed. Generally
speaking, video-based gaze trackers can be classified into two types, head-mount
trackers and remote trackers (or table-mount trackers), based on whether the cameras
are attached to the subject’s head or positioned remotely.
2.3.1 Head-mount
Head-mounted gaze trackers estimate gaze direction relative to the user’s head position.
Applications that involve fast head movements or require a low cost gaze tracking
solution (Winfield, 2005; The system I4Contro, 2009) tend to employ the head-mount
approach, since the camera is placed close to the user’s eye. On the other hand, the
higher level of intrusion makes this type of tracker unsuitable for computer control
(Daunys et al, 2006).
Figure 2.13: openEyes system using head-mounted device (Winfield, 2005)
2.3.2 Table-mount
Table-mount gaze trackers track the head position and orientation in 2D or 3D space.
This type of system does not require any attachment to the user and allows free head
movement within certain limits, so it is better suited to computer control. However,
the accuracy of gaze estimation is lower compared with a head-mount system. A
high-resolution camera is usually preferred for remote tracking (Daunys et al, 2006).
2.4 Potential hazards with IR
The spectral emission of the infrared LEDs employed in most IR based eye trackers is
usually limited to the near infrared band (IR-A, 780-1400 nm). IR-A band LEDs have been
tested, and the results show no hazard to the eye for viewing over a short period of
time (a few hours) based on current national and international ocular exposure limits
for infrared optical radiation. However, explicit guidelines regarding long-term or
close-proximity exposure of the eye to IR have not yet been addressed in any current
infrared safety standard. The potential hazards of IR therefore remain an open
question (Mulvey et al, 2008). Moreover, Mulvey et al (2008) reported that emissions
outside the IR-A range are possible if the source is a conventional incandescent lamp
or discharge lamp that has been filtered to block most of the visible light and
transmit IR-A.
Figure 2.15: The different photo-biological effects of optical radiation (Mulvey et al, 2008)
Chapter 3
Methodology
Introduction
Active shape model
Active appearance model
POSIT
RANSAC
3.1 Introduction
Instead of relying on active IR illumination and corneal reflections, a robust facial
feature based gaze estimation approach is proposed, built on the ASM face tracking
algorithm and a proposed eye feature extraction algorithm. To achieve our aims, the
proposed approach involves several algorithms and solutions from computer vision. This
chapter briefly reviews some of the techniques and concepts adopted in our system.
3.2 Active Shape model (ASM)
Automatic and accurate location of facial features is a difficult problem in computer
vision. The variety of human faces, expressions, facial hair, glasses, poses, and lighting
contributes to the complexity of the problem. The Active Shape Model (ASM) is one
solution to this problem. An Active Shape Model is a statistical model of shape which
iteratively deforms to fit the object in a new image. The shape is constrained by a
Statistical Shape Model, so it can only be deformed in ways seen in a training set of
annotated examples. An ASM must first be trained on a set of manually landmarked images.
After training, the statistical shape model can be used to extract feature points on a
face. The search involves the following steps:
1. Locating each landmark independently.
2. Correcting the locations of each landmark if necessary by looking at how the
landmarks are located with respect to each other.
Figure 3.2: ASM template fitting process (Cootes, 2009)
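The second search step, correcting landmark locations using the statistical shape model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ASMLibrary implementation: the shape is represented as a flat coordinate list, the modes of variation are assumed to be orthonormal vectors obtained from training, and each shape parameter is clamped to three standard deviations, which is a commonly used convention rather than something stated in this report. The name `constrain_shape` is hypothetical.

```python
import math

def constrain_shape(shape, mean, modes, eigenvalues, k=3.0):
    """Project a candidate shape onto the shape model and clamp it.

    shape, mean : flat coordinate lists [x0, y0, x1, y1, ...]
    modes       : orthonormal modes of variation (assumed, from training)
    eigenvalues : variance explained by each mode
    k           : allowed number of standard deviations per mode (assumed)
    """
    diff = [s - m for s, m in zip(shape, mean)]
    # Shape parameters: b = P^T (x - mean)
    b = [sum(p * d for p, d in zip(mode, diff)) for mode in modes]
    # Keep each parameter within +/- k standard deviations of the training set
    limits = [k * math.sqrt(ev) for ev in eigenvalues]
    b = [max(-lim, min(lim, bi)) for bi, lim in zip(b, limits)]
    # Rebuild a plausible shape: x = mean + P b
    return [m + sum(bi * mode[i] for bi, mode in zip(b, modes))
            for i, m in enumerate(mean)]

mean_shape = [0.0, 0.0, 0.0, 0.0]
modes = [[1.0, 0.0, 0.0, 0.0]]   # a single mode moving the first coordinate
eigenvalues = [4.0]              # standard deviation 2, so the limit is 6
corrected = constrain_shape([10.0, 5.0, 0.0, 0.0], mean_shape, modes, eigenvalues)
```

Any displacement that the training set cannot explain (here the second coordinate) is discarded, and the explained displacement is limited, which is how the model keeps landmarks in a face-like arrangement.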
3.3 Active appearance model (AAM)
The Active Appearance Model (AAM) merges the shape and texture models into a single
model of appearance. An AAM contains a statistical model of the shape and grey-level
appearance of the object of interest. Matching the model to an image involves finding
the model parameters that minimize the difference between the image and a synthesized
model (Cootes, 2009).
Figure 3.2: AAM template matching process (Cootes, 2009)
3.4 POSIT
“Pose from Orthography and Scaling with Iteration”, also known as POSIT, is an
algorithm used to estimate the pose of a known object in three dimensions. It was
originally proposed by DeMenthon (1993). To compute an object’s pose, at least four
non-coplanar points on the object and their corresponding 2D projections on the image
must be found. The first part of the algorithm, pose from orthography and scaling
(POS), finds the perspective scaling of the known object and from it computes an
approximate pose. However, the approximation from POS is not very accurate. Therefore,
the four observed points are reprojected at the pose calculated by POS, and the POS
algorithm is run again with these new point positions. Typically, the true object pose
can be recovered within four or five iterations (Bradski and Kaehler, 2008).
3.5 RANSAC
RANSAC, an abbreviation for "Random Sample Consensus", was first published by
Fischler and Bolles in 1981. RANSAC is an algorithm for robustly fitting a model to
data in the presence of many outliers. The inputs to the RANSAC algorithm are a set of
observed data values, a parameterized model which can be fitted to the data, and some
threshold parameters. The algorithm proceeds as follows:
1. Select a random subset S of the original data as hypothetical inliers.
2. Fit the model to the hypothetical inliers.
3. Test all other data against the fitted model; points that fit the estimated model
well are also considered hypothetical inliers.
4. Re-estimate the model from the new set of hypothetical inliers if it contains
sufficient points.
5. Evaluate the model by estimating the error of the inliers relative to the model.
Steps 1-5 are repeated N times, each time producing either a rejected model or a refined
model. The refined model is kept if its error is lower than that of the last saved
model. After N iterations, the saved model is the best fit found (Fisher, 2009).
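The five steps can be sketched for the simplest possible model, a straight line y = a*x + b. The line model, the function names `fit_line` and `ransac_line`, and all parameter values below are assumptions chosen for illustration; they are not taken from the GazeLib implementation.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b; returns None for a vertical set."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        return None
    a = (n * sxy - sx * sy) / denom
    return a, (sy - a * sx) / n

def ransac_line(points, n_iter=200, threshold=0.5, min_inliers=5, seed=0):
    rng = random.Random(seed)
    best_model, best_error = None, float("inf")
    for _ in range(n_iter):
        sample = rng.sample(points, 2)            # step 1: random subset
        model = fit_line(sample)                  # step 2: fit to the subset
        if model is None:
            continue
        a, b = model
        # Step 3: collect every point close to the model (the sampled points
        # lie on the line, so they are included automatically).
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) < min_inliers:
            continue                              # rejected model
        refined = fit_line(inliers)               # step 4: re-estimate
        if refined is None:
            continue
        ra, rb = refined
        # Step 5: mean squared error of the inliers against the refined model
        error = sum((y - (ra * x + rb)) ** 2 for x, y in inliers) / len(inliers)
        if error < best_error:
            best_model, best_error = refined, error
    return best_model, best_error

data = [(x, 2 * x + 1) for x in range(10)] + [(3, 30), (7, -20)]
model, error = ransac_line(data)
```

In this example the two gross outliers never gather enough supporting points, so only models fitted to points on the true line survive step 4.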
Chapter 4
Design & Implementation
Introduction
Development environment
System design
System implementation
4.1 Introduction
The design and implementation of the gaze tracking framework library, named
“GazeLib”, are presented in detail in this chapter. The main capabilities of the
library are:
1. Face detection
2. Annotated facial feature points tracking
3. Head pose estimation
4. Feature extraction of both eyes individually (pupil or iris center and radius)
5. 2D trajectory extraction of both eyes individually (measuring the pupil center)
6. Gaze tracking of both eyes individually
7. Blink detection of both eyes individually
In addition, output of the tracking results in a video sequence (e.g. facial model
fitting result, eye trajectory extraction, etc.) is also implemented, which may provide
useful information for further analysis or studies.
4.2 Development Environment
The system is developed using a notebook with the following configurations:
Hardware configurations:
1. CPU: Intel Core 2 Duo (P7500) 1.66 GHz, FSB 800 MHz
2. RAM: 2GB DDR2-667
3. MON: 13.3-inch LCD widescreen display, 1280 x 800 pixel resolution
4. CAM:
a. Built-in iSight, 640 x 480 pixel resolution
b. E-tiger, 640 x 480 pixel resolution
Software configurations:
1. Microsoft Visual C++ 2008
2. OpenCV 2.0
3. ASMLibrary 4.09
4.3 System design
4.3.1 Architecture
The system architecture is illustrated in Figure 4.1. The system is based on the OpenCV
library and ASM Library. OpenCV (Open Source Computer Vision) is a programming library
originating from Intel that provides many algorithms for computer vision processing.
ASM Library (Active Shape Model Library SDK) is a C++ implementation of the Active
Shape Model framework developed by YAO Wei. This library contains algorithms for
training and building the statistical model together with the fitting algorithms.
“GazeLib” is packaged and released as an open source programming library framework.
Developers can build their own applications that make use of gaze tracking technology
on top of “GazeLib” without any prior knowledge. “GazeLib” aims not only to achieve
high reusability, but also to make it possible to create revolutionary applications
that will set the bar for the next generation of gaze tracking applications.
Figure 4.1: System architecture
The system is divided into three components:
1. Face detection and tracking component
2. Eye tracking component
3. Gaze to screen coordinates mapping component
All components need to work together correctly in order to have a working gaze
tracking system.
4.3.2 Conceptual Class diagram
Figure 4.2: System class diagram
4.4 System implementation
4.4.1 Overall system flow
Figure 4.3: System flowchart
The flowchart summarizes the main system flow as a whole: face detection, facial
feature tracking, head pose estimation, eye feature extraction and gaze mapping.
Each algorithm is discussed in detail in later sections.
4.4.2 Active Shape Model building
At the very beginning, a one-time Active Shape Model (ASM) training process needs to
be completed before anything else can work correctly.
Number of landmarks
Figure 4.4: Mean error versus number of landmarks (Milborrow and Nicolls, 2008)
The number of landmark points in the model directly affects the fitting result.
Milborrow and Nicolls (2008) conducted experiments on point-to-point error against the
number of landmarks. Their results show that a way to improve the mean fit is to
increase the number of landmarks in the model, because fitting a landmark tends to
help fitting other landmarks. Fitting results are therefore improved by increasing the
number of landmarks. Meanwhile, the search time increases roughly linearly with the
number of landmarks.
As a result, images of various individuals in different head poses, each manually
annotated with 68 facial landmark points, were used by the active shape model training
algorithm to build the 2D statistical model. The face images used in ASM training were
taken from the FG-NET Aging Database and the DTU IMM face database.
Figure 4.5: Manually annotated 68 landmarks face model
4.4.3 Face detection and tracking
The face detection algorithm is based on the Viola-Jones classifier implemented in the
OpenCV library. The facial feature point tracking algorithm is a model-based face
tracking method.
Low cost tracking equipment such as a webcam can only offer low resolution capture,
which is an inherent limitation of the tracking system. Fortunately, the eye tracker
does not require a very accurate feature point fitting result. A satisfactory result is
enough to define the search windows for eye feature extraction. In addition, the whole
system operates in real time, so the tracking speed directly affects the performance
of the tracker. Thus, high stability and efficiency with acceptable accuracy are the
main concerns in designing the face tracking algorithm.
The active appearance model (AAM) was originally designed for facial feature
extraction. Compared with the active shape model (ASM), AAM is more stable and accurate.
AAM face tracking is stable provided that the face is nearly frontal, but it does not
work well for tracking a face at an angle: it was found that AAM cannot deform to the
right shape after head rotation. ASM tracking is not as stable as AAM, but it can still
track a face rotated by a moderate angle, although not one rotated by a large angle.
Figure 4.6: Fitting results comparison between AAM and ASM
In addition, the computational cost of AAM is much higher than that of ASM. The
experimental results show that AAM achieves only 5 fps on average, while ASM achieves
9.5 fps on average. These results indicate that AAM is not a good choice for
implementing a real-time gaze tracking system.
Figure 4.7: Performances comparison between AAM and ASM
Experiments and observations show that the stability and quality of model fitting can
be improved by reducing noise and detail in the input image. The following figure shows
the results of performing ASM fitting directly and of applying a Gaussian filter or a
median filter before ASM fitting. The results show that fitting after the median filter
performs best of the three. Therefore, the input image is first passed through the
median filter for noise reduction. A median filter reduces noise in an image while
preserving edges, which is the reason for choosing it over other filters. In order to
further increase the speed, the image is scaled down by half before active shape model
fitting.
Figure 4.8: Face fitting result with different filters
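The edge-preserving behaviour of the median filter and the half-size scaling can be illustrated with a minimal sketch on a grey-level image stored as nested lists. The 3 x 3 window, the border handling and the subsampling method are assumptions made for illustration; the report does not state the kernel size or the scaling method used, and the function names are hypothetical.

```python
def median_filter3(img):
    """3 x 3 median filter; border pixels use the part of the window inside the image."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            window = sorted(img[rr][cc]
                            for rr in range(max(0, r - 1), min(h, r + 2))
                            for cc in range(max(0, c - 1), min(w, c + 2)))
            out[r][c] = window[len(window) // 2]
    return out

def downscale_half(img):
    """Halve both dimensions by keeping every second row and column."""
    return [row[::2] for row in img[::2]]

# A single bright noise pixel on a flat background is removed...
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
denoised = median_filter3(noisy)

# ...while a vertical step edge is left exactly where it was.
edge = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
filtered_edge = median_filter3(edge)

small = downscale_half(edge)
```

An averaging filter applied to the same step edge would smear it over several pixels, which is the practical difference the comparison in Figure 4.8 is concerned with.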
Before active shape model fitting, the face is first detected using the Viola-Jones
(VJ) classifier. If a face is present, the mean model shape is initialized from the VJ
classifier detection result. Finally, the active shape model is fitted iteratively to
the image until it converges or the maximum allowed number of iterations is reached.
Locating facial landmarks is equivalent to locating facial features, since the
landmarks outline the facial features; this is the main advantage of adopting this
approach. As the 68 landmarks are distributed over the whole face, feature points other
than those around the eyes are tracked as well. Therefore, this tracking method is not
limited to eye region extraction, but can also be extended to the extraction and
tracking of other facial features, such as the mouth.
Figure 4.9: Head detection and tracking algorithm flowchart
4.4.4 Head pose estimation
There are two approaches to implementing the head pose estimation algorithm. The first
approach uses the POSIT algorithm provided in the OpenCV library. The second approach
uses LK optical flow and the RANSAC algorithm together with POSIT to perform head pose
estimation. The details of the two approaches are discussed in the following sections.
4.4.4.1 Approach 1: Using POSIT only
This approach is relatively simple compared to the second approach. The head tracker
keeps fitting the facial landmarks to the input image. Once the user initiates the 3D
head pose estimation algorithm, the landmark fitting result from head tracking is used
to build a 3D head model and the POSIT object. In the next round of execution, the
POSIT object together with the face fitting result is passed to the POSIT algorithm to
perform 3D head pose estimation, which yields a 3D rotation matrix and a translation
vector.
Figure 4.10: POSIT head pose estimation algorithm flowchart
4.4.4.2 Approach 2: Using LK Optical flow + RANSAC + POSIT
The second approach is more complicated than the first. It is triggered by the user;
otherwise the system just keeps doing face tracking and eye feature extraction.
Numerous feature points are marked on the input face image when the algorithm is first
triggered. The marked feature points are taken as the model point set, which is later
used to initialize the POSIT object. The marked feature points are tracked using the
LK optical flow algorithm. The RANSAC algorithm is applied to the successfully tracked
feature points. The following steps are performed:
1. A subset of the successfully tracked points is randomly selected to perform
3D pose estimation.
2. All other successfully tracked points are tested against the fitted model to see
whether they fit the estimated model well.
3. Points within a certain distance threshold are considered inliers to the model.
These steps are repeated several times, and the set with the largest number of inliers
is considered the best tracked point set.
Finally, the POSIT object is built on the fly using the model points and the best
tracked points, and the POSIT algorithm is used to estimate the 3D head pose. After the
estimation process, the outliers in the model point set are deleted.
Figure 4.11: Using LK Optical flow, RANSAC with POSIT to estimate head pose
4.4.5 Eye feature extraction
The pupil region can be extracted using edge detection and an ellipse fitting
algorithm. In order to obtain a good fitting result, some preprocessing steps need to
be done before performing edge detection.
The first step is to convert the color image to grey scale in order to facilitate
edge detection. Experiments show that a stronger edge image can be obtained by using a
single component of a color space; for example, using the B component of RGB is better
than converting the whole RGB image to grey scale. After conversion to grey scale, a
median filter is applied to reduce noise while preserving sharp edges. Histogram
equalization is then performed to spread out the brightness values of the image, which
increases the image contrast.
The color of the human iris ranges from brown to green, blue, dark brown, etc. The
pupil region is always black regardless of iris color. There is a strong contrast
between a dark iris/pupil and the surrounding area. Therefore, a binary threshold can
be applied to the grey scale eye image to eliminate edges formed by lower intensity
areas.
Figure 4.12: Image threshold results of different color iris
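The preprocessing chain described above (single color component, histogram equalization, binary threshold) can be sketched as follows. This is an illustrative sketch on nested lists and not the GazeLib code: the median filter step is omitted for brevity, the threshold value is arbitrary, the mask polarity (1 for dark pixels) is an assumption, and the function names are hypothetical.

```python
def blue_channel(rgb_image):
    """Keep only the B component of each (R, G, B) pixel."""
    return [[pixel[2] for pixel in row] for row in rgb_image]

def equalize_histogram(gray, levels=256):
    """Spread the grey levels of a non-empty image over the full range."""
    flat = [v for row in gray for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:              # flat image: nothing to spread out
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[v] for v in row] for row in gray]

def dark_mask(gray, threshold):
    """Binary threshold: 1 where the pixel is darker than the threshold."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

eye = [[(200, 180, 50), (190, 170, 50)],
       [(180, 160, 60), (250, 240, 200)]]
gray = blue_channel(eye)
equalized = equalize_histogram(gray)
mask = dark_mask(equalized, 100)
```

After equalization the darkest input level maps to 0 and the brightest to 255, so a fixed threshold behaves more consistently across differently exposed eye images.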
Edge detection is performed on the thresholded image. Since more than one edge curve
may exist, a knowledge based method is implemented in order to extract the best fit
iris/pupil ellipse. The best fit ellipse is selected using the following steps:
1. Select an edge from the detected edges.
2. Perform ellipse fitting on the selected edge curve.
3. Mark the fit as valid if the ratio of the major radius to the minor radius is
less than three; otherwise ignore the fitting result and continue to the next
iteration.
4. Repeat the above steps until all edges have been examined.
5. Select the ellipse with the largest area
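The selection rule in the steps above can be sketched in a few lines. The sketch assumes that ellipse fitting has already been performed on every edge curve (in OpenCV this would typically be done with an ellipse fitting routine) and only illustrates the validity test and the largest-area choice. The `Ellipse` type and the function name are hypothetical.

```python
import math
from typing import NamedTuple, Optional, Sequence

class Ellipse(NamedTuple):
    cx: float      # center x
    cy: float      # center y
    major: float   # major radius
    minor: float   # minor radius

def select_best_ellipse(candidates: Sequence[Ellipse],
                        max_ratio: float = 3.0) -> Optional[Ellipse]:
    """Discard elongated fits, then return the valid ellipse with the largest area."""
    valid = [e for e in candidates
             if e.minor > 0 and e.major / e.minor < max_ratio]
    return max(valid, key=lambda e: math.pi * e.major * e.minor, default=None)

fits = [Ellipse(10, 10, 12, 3),   # ratio 4: too elongated, rejected
        Ellipse(10, 10, 6, 5),    # valid, largest remaining area
        Ellipse(10, 10, 4, 4)]    # valid, but smaller
best = select_best_ellipse(fits)
```

The ratio test removes long thin fits, such as those produced by eyelid edges, before the area comparison, so a large but implausible ellipse cannot win.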
After the best fitted ellipse is extracted, the remaining steps check whether the
ellipse is a valid iris. The testing criteria are as follows:
1. Result ellipse width > inputted eye region image width / 2
2. Result ellipse width inputted eye region image height
4. Result ellipse height
4.4.6 Gaze Estimation
The screen is treated as a rectangle of n * m pixels. The eye movement range
corresponding to the corner points of the screen is assumed to form a perfect rectangle
of i * j pixels. A simple gaze estimation method using ratio mapping is adopted. The
details are discussed in the following sections.
4.4.6.1 System calibration
Since a ratio based screen coordinate mapping is used to estimate the eye gaze, a
calibration procedure is performed to obtain the pupil center positions in pixels
(with reference to the capturing screen) corresponding to the edges of the screen, from
which the eye movement rectangle is calculated. The user is required to focus on
different calibration points displayed on the screen during the system calibration
process.
Figure 4.14: Nine point calibration procedure (red dots show the calibration points, the blue numbers give the calibration
sequence, and the brown characters inside () define the optimal criteria for that particular calibration point)
The steps are as follows:
1. One calibration point at a time is displayed on the screen.
2. The pupil positions corresponding to that point are recorded over a certain time
period while the user focuses on it.
3. An optimal point is selected from the recorded point set to represent the pupil
position corresponding to that screen calibration point.
Note: the optimal point is the point closest to the limit; each calibration point
has its own definition of optimal. The details are shown in the following figure.
Experiments show that this optimal definition is more accurate than taking the
average of the recorded set of points.
4. The above steps are repeated until the pupil center positions for all calibration
points are obtained.
Figure 4.15: system calibration flowchart
4.4.6.2 Gaze estimation
For gaze estimation, a very simple mapping is performed. Firstly, the screen is treated
as a rectangle of n * m pixels, and the eye movement range corresponding to the corner
points of the screen is assumed to form a perfect rectangle (the gaze rectangle) of
i * j pixels. The gaze rectangle is calculated by fitting a rectangle to the pupil
centers obtained in the system calibration procedure.
Pupil center and gaze rectangle coordinates are measured with reference to the
capturing screen coordinates. By solving the above two equations, the mapping from gaze
to screen coordinates can be obtained.
Figure 4.16: The pupil center to screen coordinate mapping
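A ratio mapping of this kind can be sketched as follows. The sketch is an assumption about the form of the mapping: it takes the gaze rectangle to be the axis-aligned bounding box of the calibrated pupil centers and maps the pupil center linearly, so that a pupil offset of a fraction of i (or j) corresponds to the same fraction of n (or m). It does not handle camera mirroring, and the function names are hypothetical.

```python
def fit_gaze_rectangle(pupil_centers):
    """Axis-aligned bounding box (x, y, i, j) of the calibrated pupil centers."""
    xs = [x for x, _ in pupil_centers]
    ys = [y for _, y in pupil_centers]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)

def map_to_screen(pupil, gaze_rect, screen_size):
    """Map a pupil center to screen pixels by equal ratios.

    Assumed relations: (px - gx) / i = sx / n  and  (py - gy) / j = sy / m
    """
    gx, gy, i, j = gaze_rect
    n, m = screen_size
    if i <= 0 or j <= 0:
        raise ValueError("gaze rectangle is degenerate; recalibrate")
    u = min(max((pupil[0] - gx) / i, 0.0), 1.0)  # clamp to the calibrated range
    v = min(max((pupil[1] - gy) / j, 0.0), 1.0)
    return u * n, v * m

calibrated = [(300, 200), (320, 200), (300, 210), (320, 210)]
rect = fit_gaze_rectangle(calibrated)
gaze = map_to_screen((310, 205), rect, (1280, 800))
```

With these example values the gaze rectangle is only 20 x 10 pixels for a 1280 x 800 screen, so one pixel of pupil movement corresponds to 64 screen pixels horizontally, which illustrates why the accuracy of pupil center extraction dominates the accuracy of this mapping.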
Chapter 5
Results & Comments
Introduction
Testing environment
Performance evaluations
5.1 Introduction
In this chapter, the performance of the whole system and of its subsystems is
evaluated. First, the properties of the testing environment are defined. Then, the
performance of each component is explained. Since the result obtained from the previous
step is the input of the next step, the performance of the whole system is highly
dependent on the performance of each component. For example, in order to extract the
pupil center of each eye, the system must first localize the facial feature points
correctly; then the eye search region is calculated directly from the face tracking
result. Finally, the eye features can be determined. At the end of this chapter, the
performance of an application called “GazePad”, which is implemented on top of our gaze
tracking library, is studied and discussed.
5.2 Testing environment
The system is tested with a notebook and a PC workstation with the same software
configurations but different hardware environments:
Notebook configurations:
1. CPU: Intel Core 2 Duo (P7500) 1.66 GHz, FSB 800 MHz
2. RAM: 2GB DDR2-667
3. MON: 13.3-inch LCD widescreen display, 1280 x 800 pixel resolution
4. OS : Windows 7 Professional 32bit
PC workstation configurations:
1. CPU: Intel Core 2 Duo (E8500) 3.17 GHz, FSB 1333 MHz
2. RAM: 4GB DDR2-667
3. MON: 21-inch LCD widescreen display, 1280 x 800 pixel resolution
4. OS : Windows Server 2008 Enterprise 32bit
Camera used:
1. Macbook built-in iSight, 640 x 480 pixel resolution
2. E-tiger, 640 x 480 pixel resolution
3. Polar Net-Cam, 640 x 480 pixel resolution
Software configurations:
1. Microsoft Visual C++ 2008
2. OpenCV 2.0
3. ASMLibrary 4.09
Figure 5.1: Testing environment hardware settings
5.3 Performance of face tracking
5.3.1 Presence of distractions
ASM face fitting can handle a partly occluded face, because the points on the
un-occluded part help to fit the points on the occluded part. This also indicates that
the number of points directly affects the fitting performance in terms of accuracy: the
more points used in training the ASM, the more stable and accurate the fitting result.
However, increasing the number of points also reduces the fitting speed. Since the face
tracker is implemented using the active shape model, the face model can be fitted to
the image as long as most of the face is present, which makes the tracker robust
against distractions. As a result, the presence of other faces, hand movements across
the face or wearing glasses will not cause the algorithm to lose track of the face.
Experiments against different distractions were conducted to show the performance of
the face tracking algorithm.
5.3.1.1 Glasses
Experiments were conducted on ASM face fitting when the user wears glasses. Two types
of glasses were tested: glasses with a non-black frame and glasses with a black frame.
The results show that the head tracker performs well when the user wears glasses,
regardless of the type.
Figure 5.2: Tracking with a pair of non-black color frame glasses
Figure 5.3: Tracking with a pair of black color frame glasses
5.3.1.2 Passing hand
Unlike other tracking techniques such as Camshift or LK optical flow, ASM is not
easily distracted by an object of similar color moving across the tracked object. ASM
can also be used to track a face with large movements, while LK optical flow is not
able to do this.
Figure 5.4: Face tracking with a hand occluding part of the face
5.3.1.3 Multiple faces
Our face tracking algorithm is able to handle the presence of other faces. Only the
face closest to the central axis of the capturing screen is considered, while all
other faces are ignored. This also indicates that the face tracking algorithm can be
applied to different people.
Figure 5.5: Face tracking with multiple faces
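The face selection rule can be sketched as follows, assuming that the detector returns face rectangles as (x, y, width, height) tuples and that the central axis is the vertical center line of the captured frame. Both are assumptions made for illustration, and the function name is hypothetical.

```python
def pick_central_face(faces, frame_width):
    """Return the face rectangle whose center is nearest the vertical center line.

    faces       : list of (x, y, width, height) rectangles
    frame_width : width of the captured frame in pixels
    """
    if not faces:
        return None
    center_line = frame_width / 2
    return min(faces, key=lambda f: abs(f[0] + f[2] / 2 - center_line))

detections = [(20, 100, 120, 120),    # face near the left border
              (270, 90, 130, 130),    # face near the middle of a 640 px frame
              (500, 110, 100, 100)]   # face near the right border
tracked = pick_central_face(detections, 640)
```

Only the selected rectangle would then be used to initialize the shape model; the remaining detections are ignored.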
5.3.1.4 Face-like structures
The face tracking result is not distracted by face-like structures, since the ASM only
fits faces close to those in the training set.
Figure 5.6: Face tracking with face-like structure present
5.3.2 Various ambient lighting conditions
5.3.2.1 Indoor
Different indoor ambient lighting conditions are simulated by adjusting the brightness
to different levels relative to the normal lighting level. The results show that the
face tracking algorithm can be applied under different lighting conditions, which means
that it works well under natural lighting fluctuations. However, the tracking result is
not very stable or accurate in conditions that are too bright or too dark.
Figure 5.7: Face tracking in various lighting simulation
5.3.2.2 Outdoor with complex background
The face tracker is tested in an outdoor environment with a complex background and
moving objects present. It works well outdoors.
Figure 5.8: Face tracking at outdoor environment
5.3.3 Different facial expressions
The trained active shape model can also be deformed to fit facial expressions. Some
of the tracking results are shown in the following figure. The ASM fitting result is
good as long as the training samples cover a large range of different facial
expressions.
Figure 5.9: ASM model fitting with different facial expressions
5.3.4 Blurred input image
Slight blurring of the input image does not affect the fitting result, provided that the
edges of the face can still be recognized. However, the face shape model may
sometimes converge to a wrong shape.
Figure 5.10: ASM and iris fitting on blurred image
5.4 Performance of eye features extraction
5.4.1 Presence of distractions
The result of eye features extraction depends heavily on the face fitting result. The
eye tracker works well as long as the face fitting result is good, since the active shape
model defines the eye tracker's searching region. If the deformable model converges
to a wrong shape, the eye features extraction result will be wrong.
Figure 5.11: Searching regions of two eyes
In some cases, the eye tracker is not affected even when the active shape model
converges to a wrong shape, provided that the upper part of the shape model is in
almost the right position.
Figure 5.12: Eye features extraction with a wrongly converged shape model
5.4.1.1 Glasses
The face tracking result is not much affected by different types of glasses. However,
eye features extraction can be greatly affected by the color of the glasses frame. For
example, when the user wears glasses with a black frame, the dark frame may be
fitted as a candidate ellipse, and the iris fitting algorithm selects the fitted ellipse
with the largest area. As a result, the extracted features may sometimes be wrong.
Figure 5.13: Features extraction result affected by black color glasses frame
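The failure mode described above can be illustrated with a small sketch of the
largest-area selection rule. The Ellipse record, the function names and the candidate
sizes are hypothetical; only the rule itself (pick the fitted ellipse with the largest
area) comes from the text.

```python
import math
from typing import List, NamedTuple


class Ellipse(NamedTuple):
    """Hypothetical fitted-ellipse record (center and semi-axes in pixels)."""
    label: str
    cx: float
    cy: float
    semi_major: float
    semi_minor: float


def area(e: Ellipse) -> float:
    # Area of an ellipse: pi * a * b.
    return math.pi * e.semi_major * e.semi_minor


def select_iris(candidates: List[Ellipse]) -> Ellipse:
    """Pick the candidate with the largest area, as the text describes."""
    return max(candidates, key=area)


iris = Ellipse("iris", 40.0, 22.0, 6.0, 6.0)            # area is about 113
frame_part = Ellipse("frame", 38.0, 8.0, 20.0, 4.0)     # area is about 251

without_glasses = select_iris([iris])
with_black_frame = select_iris([iris, frame_part])
```

In this made-up example, a dark fragment of the glasses frame yields a longer ellipse
whose area exceeds that of the iris, so the rule returns the frame instead of the iris.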
Most of the time, the features extraction is correct. The following experiments show
the extraction results.
Figure 5.14: Eye features extraction with a pair of non-black color frame glasses
Figure 5.15: Eye features extraction with a pair of black color frame glasses
5.4.1.2 Passing hand
As discussed earlier, eye features extraction depends heavily on the face tracking
result. Since the face tracker can handle an object of similar color passing across
the user’s face, the eye tracker can still extract the eye features correctly.
Figure 5.16: Face tracking with a hand occluding part of the face
5.4.1.3 Face-like structures and multiple faces
Since the searching region passed to the eye tracker is defined by the fitted face
shape model, the presence of multiple faces or face-like structures has no effect on
eye features extraction.
Figure 5.17: Eye features extraction with face-like structure present
5.4.2 Various ambient lighting conditions
Lighting is the factor that affects the extraction result the most. Extremely high or
low brightness causes errors in the color thresholding process, which degrades the
extraction quality and performance. Our eye tracker performs well in the following
conditions.
5.4.2.1 Indoor
Different indoor ambient lighting conditions are simulated by adjusting the brightness
to different levels relative to the normal lighting level, in the same way as in the face
tracking lighting simulation. The results show that eye features extraction does not
work well in an environment that is too dark. Features can still be extracted, but the
stability and accuracy are degraded.
Figure 5.18: Eye features extraction in various lighting simulation
The tracking result in the dark environment is poor because the color contrast
between the iris region and the other regions of the eye is no longer sharp. As a
result, a poor threshold image of the eye region is produced, which causes the iris
fitting algorithm to perform badly.
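The effect of lost contrast on thresholding can be shown with a toy example. This is a
simplified sketch, not the project's color thresholding: it assumes a single row of
grayscale pixels, a fixed threshold of 80 and a multiplicative darkening factor, all of
which are illustrative choices.

```python
def threshold_dark(pixels, threshold):
    """Binary mask: 1 where a pixel is darker than `threshold`, else 0."""
    return [1 if p < threshold else 0 for p in pixels]


def darken(pixels, gain):
    """Scale intensities by `gain` (0 < gain <= 1) to mimic a darker scene."""
    return [int(round(p * gain)) for p in pixels]


# Hypothetical row across an eye: bright sclera, dark iris, bright sclera.
eye_row = [220, 220, 40, 40, 220, 220]
THRESHOLD = 80  # assumed fixed threshold

mask_normal = threshold_dark(eye_row, THRESHOLD)
mask_dark = threshold_dark(darken(eye_row, 0.3), THRESHOLD)
```

Under normal lighting only the two iris pixels fall below the threshold. After
darkening, every pixel does, so the mask no longer isolates the iris and any shape
fitted to it is unreliable.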
5.4.2.2 Outdoor with complex background
IR-based gaze trackers do not perform well in outdoor conditions, since IR is easily
affected by the presence of other light sources. Our tracker is based on ambient color,
so it can work well in an outdoor environment.
Figure 5.19: Eye features extraction in an outdoor environment
5.4.3 Various iris colors
Our eye features extraction algorithm was originally designed for eyes with dark
irises, but in reality human iris colors cover a wide range. The algorithm can be
applied to eyes with differently colored irises without code modification. The only
difference is that the algorithm extracts the pupil contour rather than the iris region.
This works because the pupil region is black regardless of the iris color, so the pupil
center and region can still be extracted. This change has no significant effect on gaze
estimation, since the gaze estimation algorithm is based on the pupil center.
Figure 5.20: Pupil fitting with different iris color
5.5 Performance of head pose estimation
Two approaches to head pose estimation were proposed earlier, and each has its
pros and cons. The results are illustrated and discussed in detail in this section.
5.5.1 Pose estimation results using POSIT
Head pose is recovered using the POSIT algorithm. In general, the resulting 3D rotation
matrix and translation vector are correct but not very precise. In addition, the ASM
tracking is not quite stable, which causes constant fluctuation in the head pose
estimation result. There is always an error of about +/- 10 degrees in the estimated
result.
Figure 5.21: Head pose estimation result
5.5.2 Pose estimation results using LK Optical flow
This approach estimates 3D head pose using POSIT together with LK Optical flow and
RANSAC. The estimation result is very stable and accurate, since it does not depend
on the ASM face fitting result. However, the feature points are easily disturbed when
an object moves across the face or when the head moves rapidly.
Figure 5.22: The rotational angles (roll, yaw and pitch) are calculated from the head pose estimation result
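Recovering roll, yaw and pitch from a 3x3 rotation matrix can be sketched as below.
The report does not state its angle convention, so this example assumes
R = Rz(roll) * Ry(yaw) * Rx(pitch), with roll about the z axis, yaw about the y axis and
pitch about the x axis, and it assumes |yaw| < 90 degrees. The function names are
hypothetical.

```python
import math


def rotation_from_angles(roll, yaw, pitch):
    """Build R = Rz(roll) * Ry(yaw) * Rx(pitch); angles in radians (assumed convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    return [
        [cr * cy, cr * sy * sp - sr * cp, cr * sy * cp + sr * sp],
        [sr * cy, sr * sy * sp + cr * cp, sr * sy * cp - cr * sp],
        [-sy, cy * sp, cy * cp],
    ]


def angles_from_rotation(R):
    """Invert the convention above; valid while |yaw| < 90 degrees."""
    yaw = math.atan2(-R[2][0], math.hypot(R[0][0], R[1][0]))
    roll = math.atan2(R[1][0], R[0][0])
    pitch = math.atan2(R[2][1], R[2][2])
    return roll, yaw, pitch


# Round trip with an example pose: roll 5, yaw -20, pitch 10 degrees.
pose = tuple(math.radians(a) for a in (5.0, -20.0, 10.0))
recovered = angles_from_rotation(rotation_from_angles(*pose))
recovered_degrees = [math.degrees(a) for a in recovered]
```

A different axis order or sign convention would change the extraction formulas, so the
convention used by the pose estimator must be checked before reusing this.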
5.5.3 Comparison of the two approaches
The first approach uses the POSIT algorithm only, while the second approach uses
POSIT together with LK Optical flow and RANSAC. The second approach is ahead of
the first in tracking accuracy and stability. However, our system adopts the first
approach as its head pose estimation algorithm because of the following factors:
1. The second approach gives a better estimation result, but it is computationally
more costly than the first approach. It runs at only half the speed of the first
approach (shown in the following figure). Since our gaze tracking system operates
in real time, the second approach is not affordable.
2. In the second approach, tracked points deleted by RANSAC are not recovered
automatically, while the first approach does not have this problem.
3. The second approach tolerates neither distractions (e.g. a moving hand) nor large
head movements during tracking, since LK Optical flow will lose track of the
feature points.
Figure 5.23: Comparison of two head pose estimation approaches
Figure 5.24: A crossing hand on the face
5.6 Performance of whole system
In this project, a real-time gaze tracking library with 3D head pose estimation was
developed. The performance of each step is critical to the final output. In this section,
the performance of the whole system is tested.
5.6.1 Speed
The system was tested in the two computer environments described in section 5.2. All
components were evaluated as a whole. Our gaze tracking system achieves an
overall average of nearly 8 fps in the desktop testing environment and an average of
5 fps in the laptop testing environment. The result suggests that the frame rate
increases with processor speed.
Figure 5.25: Measuring speed in frames per second during tracking (nearly 8 fps
achieved)
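An average frame rate of this kind can be measured by timing a fixed number of
processed frames. The sketch below is illustrative only: measure_fps, the
process_frame callback and the injectable clock are hypothetical names, and the report
does not describe how its own measurement was implemented.

```python
import time


def measure_fps(process_frame, num_frames, clock=time.perf_counter):
    """Average frames per second over `num_frames` calls to `process_frame`.

    `process_frame(i)` stands in for the whole per-frame pipeline (face
    tracking, eye features extraction, pose and gaze estimation).
    `clock` is injectable so the function can be checked deterministically.
    """
    start = clock()
    for i in range(num_frames):
        process_frame(i)
    elapsed = clock() - start
    return num_frames / elapsed if elapsed > 0 else float("inf")


class FakeClock:
    """Deterministic clock used only for the demonstration below."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


fake = FakeClock()


def simulated_pipeline(_frame_index):
    fake.now += 0.125  # pretend each frame costs 125 ms


fps = measure_fps(simulated_pipeline, 16, clock=fake)
```

With 125 ms per simulated frame, the function reports 8 frames per second, which is
the order of magnitude observed on the desktop setup.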
5.6.2 Tracking with off-the-shelf equipment
The gaze tracking library was tested with various off-the-shelf monitors and low-cost
webcams. The webcams used are listed in the following figure. None of them was
calibrated before use.
Figure 5.26: The webcams used in the system development and testing
5.6.3 Gaze tracking accuracy
The final step in our system is to estimate the eye gaze. The gaze estimation
algorithm depends heavily on the results of face tracking and eye features extraction.
In our case, since the ASM fitting is not very stable, the positions of all landmarks in
the shape model shift slightly from frame to frame, which slightly affects the eye
features extraction result. As a result, the gaze estimate appears to have a slight
pulsating movement. The overall position of the estimate is correct despite the
pulsating movement errors. The following figure shows the result of a user focusing
on the same nine positions as in the system calibration.
Figure 5.27: Gaze mapping when focusing on the positions of the nine calibration points
(green crosses indicate the left eye gaze; blue crosses indicate the right eye gaze)
5.6 Case study: GazePad
GazePad is designed as a test bed application for exploring the possibilities of using
eye gaze as an input control. It is a simple application built on top of our gaze
tracking library. GazePad acquires gaze information from the gaze tracking library,
and the gaze screen coordinates are then mapped onto a character pad in order to
perform letter input. GazePad is intuitive to use; no learning or training is required.
Figure 5.28: GazePad operating environment
5.6.1 Motivations
Consider a person whose entire body is paralyzed (including mouth, facial movements,
etc.) but whose thinking and language processes remain intact. The person's brain is
literally locked into a barely functioning body. How can such a person communicate
with the outside world? Traditionally, there are some alternative communication
methods that can be used. Some of them are listed below:
1. Using a simple blinking system, such as blinking once for "yes" and twice for "no".
2. Using a more complex Morse code blinking system.
3. Using the simple blinking system with a vocal communication partner, who keeps
asking "Is it an A? Is it a B?" etc. The person blinks once for "yes" and twice for
"no".
4. Using an alphabet card board with a vocal communication partner. Similar to the
above approach, the partner keeps asking "Is the letter in the 1st row? Is the letter
in the 2nd row?" etc.
5. Some paralyzed people who still have free head movement can use a head
mounted stick for typing (shown in the following figure).
6. The most convenient method is using gaze tracking control like Prof. Stephen
Hawking.
Figure 5.29: A disabled patient using a head-mounted
stick for typing (www.skymyworlds.com, 2009)
Figure 5.30: Prof. Hawking with his gaze control
(Wikipedia, 2009)
All of the above methods are very slow and inconvenient, except the gaze control
method. There are strong indications that gaze tracking has the potential to become
an important component of human machine interfaces. Ironically, although some
commercial gaze trackers have been rolled out successfully, those systems and their
equipment are very expensive and not affordable for most people. High cost and
specially designed components (with no alternative choice) are the main reasons that
keep the technology away from the people who actually need it. If the price of gaze
communication systems can be brought down, gaze control could become a preferred
means of control for a large group of people (Jordansen et al, 2005).
A gaze typing system called “GazePad” was developed based on low-cost off-the-shelf
components that can be bought in most consumer hardware stores. We wish to help
make those people's lives easier and more meaningful.
The main user groups are people with motor neuron disease (MND) (e.g. whole-body
paralysis) and amyotrophic lateral sclerosis (ALS).
5.6.2 Interface Design Concepts
Gaze control cannot be as accurate as a mouse when pointing at a particular object
(e.g. a small button). A conventional on-screen keyboard layout therefore cannot be
adopted, because its buttons are small and closely packed. Instead, some alternative
input systems used in touch screen mobile phones were studied, since they do not
demand high accuracy. For example, Q9 Chinese and Q9 English are laid out on a 3x3
matrix.
Figure 5.31: QCode Chinese input system (www.qcode.com, 2009)
Figure 5.32: Character board used by paralyzed people to communicate
(Univ. of Washington, 2009)
After testing the accuracy of our gaze tracking library and studying other alternative
input methods, we found that the larger the buttons, the better. Thus, we decided to
divide the whole screen into a 3x3 matrix, similar to the QCode input method. The
letters are placed across the whole screen like the character board. Our design also
incorporates the concept of the QWERTY keyboard, grouping letters by their usage
frequency. They are ranked by usage frequency in order to ensure prompt and speedy
input.
Figure 5.33: QWERTY keyboard (www.computerhope.com, 2009)
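Mapping a gaze point onto a 3x3 screen grid, as in the interface design described
above, can be sketched as follows. The function name gaze_to_cell, the zero-based
(row, column) result and the clamping of off-screen estimates to the nearest cell are
assumptions made for illustration.

```python
def gaze_to_cell(x, y, screen_w, screen_h, rows=3, cols=3):
    """Map a gaze point in screen pixels to a zero-based (row, col) grid cell.

    Off-screen estimates are clamped to the nearest border cell, which is one
    possible way to tolerate small gaze estimation errors near the edges.
    """
    col = min(cols - 1, max(0, int(x * cols / screen_w)))
    row = min(rows - 1, max(0, int(y * rows / screen_h)))
    return row, col


# Example on a hypothetical 1200x900 screen, where each cell is 400x300 pixels.
top_left = gaze_to_cell(100, 100, 1200, 900)
center = gaze_to_cell(600, 450, 1200, 900)
off_screen = gaze_to_cell(-5, 2000, 1200, 900)
```

Because each cell covers one ninth of the screen, an estimate can drift by a large
number of pixels and still land in the intended cell.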
5.6.3 Operation
When the user wants to input a particular character, he or she must focus on the cell
that contains that character for a period of time (two seconds in our case) to enter
the single character selecting sub-page. With this selection method, every letter can
be typed in two steps.
Figure 5.34: GazePad operating screen
For example, to type the letter “s”, I focus on the cell in the first row, second
column, and the single character selecting page is entered after 2 seconds. I then
focus on the letter “s” in the second row, third column for 2 seconds, after which the
letter “s” is typed.
Figure 5.35: Letter selection process
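The dwell-based selection described above can be sketched as a small state machine
that fires once the gaze has stayed in the same cell for the dwell time. The class name
DwellSelector, the sample format (timestamp in seconds plus a grid cell) and the reset
after each selection are assumptions, not details taken from GazePad.

```python
class DwellSelector:
    """Report a cell once the gaze has rested on it for `dwell_seconds`."""

    def __init__(self, dwell_seconds=2.0):
        self.dwell = dwell_seconds
        self.cell = None    # cell currently being looked at
        self.since = None   # time at which the gaze entered that cell

    def update(self, t, cell):
        """Feed one gaze sample; return the selected cell or None."""
        if cell != self.cell:
            # Gaze moved to another cell: restart the dwell timer.
            self.cell, self.since = cell, t
            return None
        if t - self.since >= self.dwell:
            # Dwell completed: report it and require a fresh dwell next time.
            self.cell, self.since = None, None
            return cell
        return None


selector = DwellSelector(2.0)
selections = []
# Step 1: look at cell (0, 1) from t = 0.0 s to 2.0 s (group page).
# Step 2: look at cell (1, 2) from t = 2.5 s to 4.5 s (single character page).
samples = [(i * 0.5, (0, 1)) for i in range(5)] + [(i * 0.5, (1, 2)) for i in range(5, 10)]
for t, cell in samples:
    picked = selector.update(t, cell)
    if picked is not None:
        selections.append((t, picked))
```

In this simulated trace the first step fires at 2.0 s and the second at 4.5 s, which
matches the two-step, two-second-per-step procedure in the example above. A sample
that lands in a neighboring cell restarts the timer in this sketch.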
Apart from letter input, symbol and number input are also supported in GazePad. As
on a mobile phone, the user changes the input mode in order to type symbols and
numbers.
Figure 5.36: Different input modes are supported
5.6.4 Performance
In the typing experiments, a string containing characters and symbols, “cityu computer
science. jack ^.^”, was input using GazePad. The recorded time was about 3.3 minutes,
which means that on average about 9 characters can be input per minute. In addition,
every letter can be selected in two steps.
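The reported rate can be checked with a short calculation. The string below is copied
from the text with single spaces between words, which is an assumption about the exact
input; the upper bound assumes that every character costs exactly two dwell steps of
two seconds each and ignores mode changes.

```python
text = "cityu computer science. jack ^.^"
minutes = 3.3

chars_with_spaces = len(text)                        # spaces counted as characters
chars_without_spaces = len(text.replace(" ", ""))    # spaces not counted

rate_with_spaces = chars_with_spaces / minutes
rate_without_spaces = chars_without_spaces / minutes

# Assumed ceiling: two dwell steps of 2 s each per character.
max_rate = 60.0 / (2 * 2.0)
```

Counted this way the string has 32 characters (28 without spaces), giving between
about 8.5 and 9.7 characters per minute, consistent with the figure of roughly 9. The
assumed ceiling of 15 characters per minute suggests that dwell time dominates the
typing speed.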
Chapter 6
Conclusions
Critical reviews
Future work
Application area
Project feedback
6.1 Critical reviews
In this project, a gaze tracking library with facial feature tracking, eye features
extraction and head pose estimation was developed. The facial feature tracking is
based on active shape model fitting, which is a model-based face tracking algorithm.
The active shape model was trained on numerous face images of different individuals,
manually annotated with 68 feature points. The shape resulting from the facial model
fitting is then used to extract the searching regions of the two eyes individually. The
extraction of detailed
infor