8/13/2019 Comparison of Thermal and Visual Facial Imagery for Use in Sparse Representation Based Facial Recognition Syst
Comparison of Thermal and Visual Facial
Imagery for use in Sparse Representation based
Facial Recognition System
Asif Raza Butt and Asim Baig
Department of Electrical and Computer Engineering, Muhammad Ali Jinnah University, Islamabad, Pakistan
Abstract-Facial recognition is probably one of the
biometric characteristics most commonly used by
humans for recognition. This is one of the reasons why
it has been the subject of intense research for the past
30 years or so. In this time a lot of work has been done
not only on the development of stable, real-time facial
recognition systems but also on acquiring different
modalities of facial imagery for use with these systems.
One of the most successful recent attempts at
developing a robust real-time facial recognition system
is based on representing the whole system as an
underdetermined sparse linear system and solving it
accordingly. Meanwhile, the two most widely used
modalities of facial imagery are thermal and visible
images. In this paper, we compare the performance of a
sparse representation based facial recognition system
on both thermal and visible imagery. We also discuss
the results in detail and explain the performances
obtained.
I. INTRODUCTION
Facial feature based recognition is such an
integral part of human nature that it has become a
dedicated process for the brain [1]. It is also one of the
most sought-after modalities in real-world security and
safety applications such as surveillance, access
control, information security and identity detection. It
has the advantage of being universally accepted and
can be acquired overtly or covertly. The ultimate goal
of a robust facial recognition system is to provide
accurate detection in the presence of noise such as
illumination variation, aging and facial expression.
One of the biggest challenges in face recognition
systems is the high-dimensional data space. A
lot of work has been done in recent years to improve
the speed, robustness and accuracy of such systems by
reducing the dimensionality of the data. The idea is to
map the high-dimensional facial data into fewer, more
discriminant dimensions. One of the earliest examples
of this approach is the use of EigenFaces [2] and PCA
[3] for facial recognition.
Another more commonly used approach to facial
recognition is to train the recognition algorithm on
only a small subset of more discriminating images to
define a decision boundary such as Support Vector
Machine based approaches in [4] and [5].
In recent years, approaches based on a sparse
representation of the facial recognition system have
become more common, as they allow for the
development of a robust, accurate and real-time facial
recognition system. Wright et al. [6] were the first
to propose this approach. They proposed to represent
the input image over an over-complete set of features
whose base elements are the enrollment and training
images. This allows for the representation of the whole
system as an underdetermined sparse linear system
which can be solved as an $\ell_1$-minimization problem.
The experimental results in their paper show that the
proposed approach is robust to the type of features
selected and provides an accurate result.
The other direction researchers are pursuing to
improve the performance of facial recognition systems
is to try different input modalities of facial images,
such as visual imagery, thermal imagery, sketches and
even fusion of multiple modalities, with existing
recognition techniques. The aim is to use these
different modalities of facial imagery to counteract the
effects of illumination, pose, expression and aging in
the input imagery. These various modalities allow for
the face to be recognized both holistically and
based on finer features.
A number of current studies have shown that
thermal IR imagery offers a promising alternative to
visible imagery for handling variations in facial
appearance due to illumination [7, 8], facial
expression [9, 10] and face pose [11]. Thermal IR
imagery is nearly invariant to changes in ambient
illumination and provides the capability for
identification under extremely low lighting conditions
such as total darkness [12]. On the flip side, it does not
provide the finer facial features that visible imagery
can provide for detection.
These properties make thermal imagery an ideal
candidate for use with approaches that take a more
holistic view of facial detection. In addition, any
approach that utilizes only the key, most prominent
features of the facial image should also perform
reasonably well with thermal imagery.
In this paper we compare the performance of the
sparse representation based facial recognition system
presented in [6] on both thermal and visual imagery.
To evaluate the performance of the sparse
representation based approach we used an enrolment
database and probe gallery with both thermal and
visual facial images. The matching performance is
evaluated for Thermal-to-Thermal matching,
Thermal-to-Visual matching, Visual-to-Thermal
matching and Visual-to-Visual matching. This detailed
evaluation provides an interesting insight into how the
sparse representation based approach views the data
and which format is ideal for use with such approaches.
The rest of the paper is organized as follows:
Section II briefly outlines the working of a sparse
representation based facial recognition system.
Section III outlines the experimental setup and the
database being used. Section IV discusses the results
and comments on the system's performance. Section V
provides conclusions and outlines future research
directions.
II. OVERVIEW OF SPARSE REPRESENTATION BASED APPROACH
The main idea behind sparse representation based
approaches is a generalization of the nearest
subspace (NS) [13] approaches. Nearest subspace
classifiers are based on the best linear
representation of the training samples in each class.
The major difference between the two is that the NS
approach takes only the training samples from each
class as the face subspace, whereas the sparse
representation based approach takes the complete
enrollment dataset as a linear span of training images
for classification. This allows sparse representation
based approaches to provide robustness against
illumination and pose variations. The issue with this
representation is that small variations between the
faces of different users can cause misclassification.
This is the reason the authors in [6] work with small
input images, i.e. 12x12 or 15x15.
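Broadly, sparse representation based approaches
work as follows. Given a sufficient set of $n_i$
training images for the $i$-th user, the sample set
$A_i$ can be written as

$$A_i = [\, v_{i,1},\ v_{i,2},\ \ldots,\ v_{i,n_i} \,] \tag{1}$$

where $v_{i,1}$, $v_{i,2}$, etc. are the training images
of the $i$-th user, each stacked as a column vector.
Any new image from the same class will then lie
approximately in the same linear subspace and can be
represented as

$$y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i} \tag{2}$$

where $y$ is the approximation of the new input
image based on the existing training images and the
$\alpha_{i,j}$ are scalar coefficients. It is
interesting to note that the more training images
exist, the better the representation of the new image.
In a real scenario the membership of the new image is
unknown. To handle that, a new matrix $A$ is defined
that encompasses the entire enrolment database of $k$
users and can be represented as

$$A = [\, A_1,\ A_2,\ \ldots,\ A_k \,] \tag{3}$$

In this case $y$ can be written as

$$y = A x_0 \tag{4}$$

where

$$x_0 = [\, 0, \ldots, 0,\ \alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i},\ 0, \ldots, 0 \,]^T \tag{5}$$

represents a coefficient vector with all zero entries
except for the ones associated with the $i$-th user.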
Equation 4 then represents an underdetermined sparse
linear system that can be solved for $x_0$ using any of
several approaches, such as $\ell_1$-minimization or
least-squares minimization. Although least-squares
minimization based approaches are not as accurate as
$\ell_1$-minimization based approaches, they tend to be
simpler to implement, and MATLAB provides a built-in
function with an optimized implementation. In this
paper we work with the least-squares minimization
approach for the sake of simplicity.
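As a rough illustration of the least-squares route described above, the following Python sketch stacks vectorised training images into a matrix A, solves y = A x in the least-squares sense, and picks an identity. It is not the authors' MATLAB/LSQR code. The function names `build_dictionary` and `classify` are hypothetical, `numpy.linalg.lstsq` stands in for LSQR, the unit-norm scaling of the columns is an assumption, and the class-wise residual rule used for the final decision is the one from [6] rather than the highest-scoring-image rule reported in this paper.

```python
import numpy as np


def build_dictionary(class_images):
    """Stack vectorised training images column-wise: A = [A_1, ..., A_k].

    class_images maps a user label to a list of images (arrays of equal size).
    Returns the matrix A and an array holding the user label of each column.
    """
    columns, labels = [], []
    for label, images in class_images.items():
        for img in images:
            v = np.asarray(img, dtype=float).ravel()
            columns.append(v / np.linalg.norm(v))  # assumption: unit-norm columns
            labels.append(label)
    return np.column_stack(columns), np.array(labels)


def classify(A, labels, y):
    """Solve y ~ A x by least squares and return (predicted_label, x)."""
    y = np.asarray(y, dtype=float).ravel()
    # lstsq returns the minimum-norm solution when the system is underdetermined.
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = {}
    for label in np.unique(labels):
        mask = labels == label
        # Reconstruct y using only the coefficients that belong to this user.
        residuals[label] = np.linalg.norm(y - A[:, mask] @ x[mask])
    best = min(residuals, key=residuals.get)
    return best, x
```

Calling `classify(A, labels, probe)` on a probe image of the same pixel size as the training images returns the user whose training images alone best reconstruct the probe, together with the full coefficient vector.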
III. THE EXPERIMENTAL SETUP
We required a standard, established database of
thermal and visual images to properly evaluate the
performance of the sparse representation based
approach. To this end, the enrolment database and probe
gallery were generated from the Dataset 02: IRIS
Thermal/Visible Face Database subset of the Object
Tracking and Classification Beyond the Visible
Spectrum (OTCBVS) database [14], freely available for
download at http://www.cse.ohio-state.edu/OTCBVS-BENCH/.
For this paper 30 users were selected, and the probe
gallery consisted of one thermal and one visual image
for every user. This means that the probe gallery
consists of 60 images, 30 thermal and 30 visual. The
enrollment gallery consists of 4 thermal and 4 visual
images of each user. Only forward-facing images with
slight variation in pose were selected, and no
restrictions were placed on the expressions. The
enrollment database so generated consists of 240 almost
forward-looking images with expression variations. It
should be noted that the faces were cropped from the
images so as not to bias the results due to accidental
matching of background or clothing in the image.
The code for the sparse representation based approach
was written in MATLAB using the built-in LSQR
function. The matching results were verified
visually, and the results shown are for Rank Zero (0)
matching only, i.e. only the highest scoring enrolment
image is compared visually with the probe image
and marked as match or non-match.
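The bookkeeping behind the four matching categories can be sketched as follows. This is only an illustrative Python tally, not the authors' procedure: the function name `tally_matches` is hypothetical, and it assumes that each probe's top-scoring enrolment image is labelled with its modality and that the probe modality is named first in labels such as "Thermal vs Visual".

```python
from collections import Counter


def tally_matches(results):
    """Count correct top matches per (probe modality, matched modality) pair.

    results is an iterable of (probe_modality, matched_modality, is_correct)
    tuples, one per probe image, where is_correct records whether the
    highest scoring enrolment image shows the same user as the probe.
    """
    counts = Counter()
    for probe_modality, matched_modality, is_correct in results:
        if is_correct:
            counts[f"{probe_modality} vs {matched_modality}"] += 1
    # Total over all four categories, computed before the key is added.
    counts["Total"] = sum(counts.values())
    return counts
```

With 60 probes, `tally_matches` yields one row of per-category counts for a given image size.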
To evaluate the effect of scaling on the matching
process, the code is run multiple times with enrollment
and probe images of a different size each time. The
approach is evaluated for 10 different sizes: 8x8,
12x12, 15x15, 20x20, 25x25, 30x30, 35x35, 40x40,
45x45 and 50x50. The results for each of these sizes
and their analysis are provided in the next section.
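To make the effect of image size on the linear system concrete, the short Python sketch below lists the shape of the matrix A for each size. It assumes, which the paper does not state explicitly, that raw pixel values are used as features (one row per pixel) and that all 240 enrolment images form the columns of A. The names `SIZES`, `ENROLMENT_IMAGES` and `system_shape` are hypothetical.

```python
# 30 users, each with 4 thermal and 4 visual enrolment images.
ENROLMENT_IMAGES = 30 * (4 + 4)

# Side lengths of the square images evaluated in the experiments.
SIZES = [8, 12, 15, 20, 25, 30, 35, 40, 45, 50]


def system_shape(side, n_columns=ENROLMENT_IMAGES):
    """Return (rows, columns, kind) of y = A x for side x side pixel images."""
    rows = side * side  # one equation per pixel (assumption: raw pixel features)
    if rows < n_columns:
        kind = "underdetermined"
    elif rows == n_columns:
        kind = "square"
    else:
        kind = "overdetermined"
    return rows, n_columns, kind


for side in SIZES:
    rows, cols, kind = system_shape(side)
    print(f"{side}x{side}: A is {rows} x {cols} ({kind})")
```

Under these assumptions, only the 8x8, 12x12 and 15x15 images give fewer equations than unknowns. From 20x20 upward the system has more pixels than enrolment images.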
IV. RESULTS AND ANALYSIS
Table 1 shows the matching score comparisons for
different sizes of thermal and visual images as well as
overall matching scores. Two major observations are
immediately obvious when these results are analyzed.
First and foremost, as commented in [6], the correct
match percentage increases with the size of the input
images. It is interesting to note that this increase is
not linear; in fact the number of correct matches
starts to decrease once the image size grows
beyond a certain limit. In our experiments that
limit was 30x30.
The reason for this reduction is that once the image
size goes beyond a certain threshold, more and
smaller local features become visible. Sparse
representation based approaches are global matching
approaches by nature and therefore work better when
only the larger features such as eyes, nose, mouth and
face shape are being utilized for matching. Once
smaller features come into play, these approaches tend
to become less accurate.
The second observation is that the thermal image
vs. thermal image matching accuracy is always higher
than in any other case. This is again due to the nature
of the sparse representation based matching approaches.
As mentioned above, these approaches work on a global
scale and work best when only larger facial features
are available for matching. In thermal images these
global features are almost always more prominent
than in visual imagery. Therefore, thermal vs.
thermal image matching provides better matching
results.
An interesting observation is that although the
overall number of correct matches was lower for
smaller image sizes such as 8x8 and 12x12, a majority
of the correct matches were due to thermal vs. thermal
matching. This phenomenon can easily be explained
by the two observations provided above. The graph in
Figure 3 shows this result more clearly. In
addition, it should be noted that although the
thermal vs. thermal correct matching percentage
reduces as the image size increases, it is still greater
than the visual vs. visual correct match percentage.
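The share of correct matches contributed by thermal vs. thermal matching can be recomputed directly from the counts in Table 1, as in the Python sketch below. It assumes that the percentage discussed here is the thermal vs. thermal count divided by the total number of correct matches for that size. The names `TABLE_1` and `thermal_share` are hypothetical. The counts are copied as printed, including the 45x45 row, where the four category counts sum to 30 while the listed total is 31.

```python
# Table 1 rows: size -> (total correct, thermal vs thermal, visual vs visual,
#                        thermal vs visual, visual vs thermal)
TABLE_1 = {
    "8x8": (19, 15, 4, 0, 0),
    "12x12": (25, 19, 5, 0, 1),
    "15x15": (29, 20, 7, 0, 2),
    "20x20": (30, 21, 7, 0, 2),
    "25x25": (32, 20, 11, 1, 0),
    "30x30": (32, 20, 9, 0, 3),
    "35x35": (31, 19, 10, 0, 2),
    "40x40": (31, 19, 9, 0, 3),
    "45x45": (31, 19, 7, 2, 2),  # as printed; the categories sum to 30
    "50x50": (30, 17, 9, 0, 4),
}


def thermal_share(size):
    """Percentage of all correct matches that are thermal vs thermal."""
    total, thermal_thermal, *_ = TABLE_1[size]
    return 100.0 * thermal_thermal / total


for size in TABLE_1:
    print(f"{size:>6}: {thermal_share(size):5.1f}% thermal vs thermal")
```

Under this reading the share is largest for the smallest images and smallest at 50x50, while the thermal vs. thermal count stays above the visual vs. visual count in every row.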
Based on the results and their analysis, it is
reasonable to conclude that sparse representation
based techniques are global feature matching
techniques by nature and that it is better to use
thermal imagery with them. In addition, the results
show that the optimum size of probe and enrolment
images is between 20x20 and 30x30 when using a
least-squares minimization based system.
It would be interesting to evaluate the
$\ell_1$-minimization based approaches in the same way,
and we are currently working towards this evaluation.
It would also be interesting to look more closely at
those images that were matched correctly in Thermal vs.
Visual and Visual vs. Thermal matches to evaluate the
reason behind these correct matches. We believe that
this will provide deeper insight into the working of
sparse representation based approaches in particular
and global feature matching based approaches in
general.
TABLE 1. MATCHING RESULTS FOR DIFFERENT SIZE IMAGES

Pixel    Total Correct   Thermal vs.   Visual vs.   Thermal vs.   Visual vs.
Size     Matches         Thermal       Visual       Visual        Thermal
                         Matches       Matches      Matches       Matches
8x8      19              15            4            0             0
12x12    25              19            5            0             1
15x15    29              20            7            0             2
20x20    30              21            7            0             2
25x25    32              20            11           1             0
30x30    32              20            9            0             3
35x35    31              19            10           0             2
40x40    31              19            9            0             3
45x45    31              19            7            2             2
50x50    30              17            9            0             4

V. CONCLUSION
A comparison is provided between visual and
thermal images as input and enrolment datasets for a
sparse representation based approach. The results
show not only that sparse representation based
approaches can be considered global feature matching
approaches but also that thermal imagery
provides better accuracy than visual
imagery for these techniques. In addition, the results
show that the accuracy of these techniques
drops once the image size increases beyond a certain
threshold. Further comparisons should be performed
using $\ell_1$-minimization based approaches. Another
interesting research direction would be to analyze and
evaluate the images that provide correct matches in
thermal vs. visual matching and visual vs. thermal
matching.
REFERENCES
[1] A. K. Jain and S. Z. Li, Handbook of Face Recognition,
Springer-Verlag New York, Inc., 2005. ISBN: 038740595X.
[2] M. Turk and A. Pentland, Eigenfaces for recognition,
Journal of Cognitive Neuroscience, 3(1):71-86, 1991.
[3] A. d'Aspremont, L. E. Ghaoui, M. Jordan, and G. Lanckriet, A
Direct Formulation of Sparse PCA Using Semidefinite
Programming, SIAM Rev., vol. 49, pp. 434-448, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory,
Springer, 2000.
[5] R. Singh, M. Vatsa and A. Noore, Integrated multilevel image
fusion and match score fusion of visible and infrared face
images for robust face recognition, Pattern Recognition, vol. 41,
pp. 880-893, 2008.
[6] J. Wright, A. Yang, A. Ganesh, S. Sastry and Y. Ma, Robust
face recognition via sparse representation, IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 31, no. 2,
pp. 210-227, 2009.
[7] G. Bebis, A. Gyaourova, S. Singh and I. Pavlidis, Face
recognition by fusing thermal infrared and visible imagery,
Image and Vision Computing 24 (2006) 727-742.
[8] D. Socolinsky, A. Selinger, J. Neuheisel, Face recognition with
visible and thermal infrared imagery, Computer Vision and
Image Understanding (2003) 72-114.
[9] G. Friedrich, Y. Yeshurun, Seeing people in the dark: face
recognition in infrared images, in: Second BMCV, 2003.
[10] A. Jain, R. Bolle, S. Pankanti, Biometrics: Personal
Identification in Networked Society, Kluwer Academic
Publishers, Dordrecht, 1999.
[11] I. Pavlidis, P. Symosek, The imaging issue in an automatic
face/disguise detection system, in: IEEE Workshop on Computer
Vision Beyond the Visible Spectrum, 2000, pp. 15-24.
[12] J. Park, T. Oh, S. Ahn, S. Lee, Glasses removal from facial
image using recursive error compensation, IEEE Transactions
on Pattern Analysis and Machine Intelligence 27 (5) (2005)
805-811.
[13] P. Belhumeur, J. Hespanha, and D. Kriegman, Eigenfaces
versus Fisherfaces: Recognition Using Class Specific Linear
Projection, IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[14] IEEE OTCBVS WS Series Bench; DOE University Research
Program in Robotics under grant DOE-DE-FG02-86NE37968;
DOD/TACOM/NAC/ARC Program under grant R01-1344-18;
FAA/NSSA grant R01-1344-48/49; Office of Naval Research
under grant #N000143010022.
[Plot: "Pixel Size Vs Thermal-Thermal Matches % age". Vertical axis: % of matches (0% to 90%); horizontal axis: pixel size (8x8 to 50x50); series: Percentage of Thermal-Thermal Matches.]
Figure 3. Graph showing comparison between Pixel Size and Thermal Match %age