comparison of thermal and visual facial imagery for use in sparse representation based facial...

8/13/2019 Comparison of Thermal and Visual Facial Imagery for Use in Sparse Representation Based Facial Recognition Syst


Comparison of Thermal and Visual Facial Imagery for Use in Sparse Representation Based Facial Recognition System

Asif Raza Butt and Asim Baig

Department of Electrical and Computer Engineering, Muhammad Ali Jinnah University, Islamabad, Pakistan

Abstract: Facial recognition is probably one of the most common biometric cues that humans themselves use to recognize one another. This is one of the reasons why it has been the subject of intense research for the past 30 years or so. Over this period, a great deal of work has been done not only on developing stable, real-time facial recognition systems but also on acquiring different modalities of facial imagery for use with these systems. One of the most successful recent attempts at a robust, real-time facial recognition system represents the whole system as an underdetermined sparse linear system and solves it accordingly. The two most widely used modalities of facial imagery, on the other hand, are thermal and visible images. In this paper, we compare the performance of a sparse representation based facial recognition system on both thermal and visible imagery. We also discuss the results in detail and explain the performance obtained.

I. INTRODUCTION

Facial-feature-based recognition is such an integral part of human nature that the brain has a dedicated process for it [1]. It is also one of the most sought-after modalities in real-world security and safety applications such as surveillance, access control, information security and identity detection. It has the advantage of being universally accepted and can be acquired overtly or covertly. The ultimate goal of a robust facial recognition system is to provide accurate recognition in the presence of noise such as illumination variation, aging and facial expression.

One of the biggest challenges in face recognition systems is the high-dimensional data space. Much recent work has sought to improve the speed, robustness and accuracy of these systems by reducing the dimensionality of the data. The idea is to map the high-dimensional facial data into fewer, more discriminant dimensions. One of the earliest examples of this approach is the use of Eigenfaces [2] and PCA [3] for facial recognition.

Another commonly used approach to facial recognition is to train the recognition algorithm on only a small subset of the more discriminating images, which define a decision boundary, as in the Support Vector Machine based approaches of [4] and [5].

In recent years, approaches based on sparse representation have become more common because they allow the development of robust, accurate and real-time facial recognition systems. Wright et al. [6] were the first to propose this approach. They proposed representing the input image in terms of an over-complete set of features whose base elements are the enrolment and training images. This allows the whole system to be represented as an underdetermined sparse linear system, which can be solved as an l1-minimization problem. The experimental results in their paper show that the approach is robust to the type of features selected and provides accurate results.
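For reference, the l1-minimization problem of [6] can be written as below, where A is a matrix whose columns are the vectorized enrolment and training images and y is the vectorized input image (this notation is chosen here to match Section II):

$$\hat{x}_1 = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad A x = y$$

Minimizing the l1 norm, rather than directly counting the non-zero entries of x, keeps the problem convex while still favouring solutions in which only a few coefficients are non-zero.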

The other direction researchers are pursuing to improve facial recognition performance is to use different input modalities of facial images, such as visual imagery, thermal imagery, sketches and even fusions of multiple modalities, with existing recognition techniques. The aim is to use these different modalities to counteract the effects of illumination, pose, expression and aging in the input imagery. These modalities allow the face to be recognized both holistically and on the basis of finer features.

A number of current studies have shown that thermal IR imagery offers a promising alternative to visible imagery for handling variations in facial appearance due to illumination [7, 8], facial expression [9, 10] and face pose [11]. Thermal IR imagery is nearly invariant to changes in ambient illumination and enables identification under extremely low lighting conditions such as total darkness [12]. On the flip side, it does not provide the finer facial features that visible imagery can provide for recognition.

These properties make thermal imagery an ideal candidate for approaches that take a more holistic view of facial recognition. In addition, any approach that uses only the key, most prominent features of the facial image should also perform reasonably well with thermal imagery.



In this paper, we compare the performance of the sparse representation based facial recognition system presented in [6] on both thermal and visual imagery. To evaluate the sparse representation based approach, we used an enrolment database and a probe gallery containing both thermal and visual facial images. Matching performance is evaluated for thermal-to-thermal, thermal-to-visual, visual-to-thermal and visual-to-visual matching. This detailed evaluation provides interesting insight into how the sparse representation based approach views the data and which format is best suited to this type of approach.

The rest of the paper is organized as follows: Section II briefly outlines how a sparse representation based facial recognition system works. Section III describes the experimental setup and the database used; Section IV discusses the results and comments on the system's performance. Section V provides conclusions and outlines future research directions.

II. OVERVIEW OF SPARSE REPRESENTATION BASED APPROACH

The main idea behind sparse representation based approaches is a generalization of nearest subspace (NS) approaches [13]. NS classifiers are based on the best linear representation of a test sample by the training samples of each class. The major difference between the two approaches is that NS takes only the training samples of each class as the face subspace, whereas the sparse representation approach takes the complete enrolment dataset as a linear span of training images for classification. This allows sparse representation based approaches to provide robustness against illumination and pose variations. The issue with this representation is that small variations between the faces of different users can cause misclassification. This is the reason the authors of [6] work with small input images, i.e., 12x12 or 15x15.
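As a point of comparison, a nearest subspace classifier can be sketched as follows, assuming A_i is the matrix whose columns are the training images of class i and y is the test image. Each class is scored only by how well its own training images explain y:

$$r_i(y) = \min_{\alpha} \|y - A_i \alpha\|_2, \qquad \text{identity}(y) = \arg\min_i r_i(y)$$

The sparse representation approach instead solves for a single coefficient vector over the training images of all classes at once, so that the enrolment images of every user compete to explain y.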

Broadly, sparse representation based approaches work as follows. Given a sufficient set of training images for the i-th user, the sample set A_i can be written as

$$A_i = [\, v_{i,1},\ v_{i,2},\ \ldots,\ v_{i,n_i} \,] \tag{1}$$

where v_{i,1}, v_{i,2}, etc. are the training images of the i-th user, each stacked into a column vector. Any new image from the same class will then lie approximately on the same linear subspace and can be represented as

$$y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i} \tag{2}$$

where y is the new input image, approximated from the existing training images. It is interesting to note that the more training images exist, the better the representation of the new image. In a real scenario the membership of the new image is unknown; to handle this, a new matrix A is defined that encompasses the entire enrolment database of k users:

$$A = [\, A_1,\ A_2,\ \ldots,\ A_k \,] \tag{3}$$

In this case y can be written as

$$y = A x_0 \tag{4}$$

where

$$x_0 = [\, 0, \ldots, 0,\ \alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i},\ 0, \ldots, 0 \,]^T \tag{5}$$

is a coefficient vector whose entries are all zero except for those associated with the i-th user.

Equation 4 then represents an underdetermined sparse linear system that can be solved for x_0 using approaches such as l1-minimization or least-squares minimization. Although least-squares approaches are not as accurate as l1-minimization approaches, they tend to be simpler to implement, and MATLAB provides a built-in function with an optimized implementation. In this paper, we use the least-squares minimization approach for the sake of simplicity.
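The least-squares classification step described above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB code: `numpy.linalg.lstsq` stands in for MATLAB's `lsqr` (both solve the least-squares problem for y = A x, and for an underdetermined A `lstsq` returns the minimum-norm solution), the unit-norm column scaling is an assumption, and all helper names are hypothetical. `rank_zero_match` follows the highest-coefficient rule used in this paper, while `residual_match` shows the class-wise residual rule of [6] for comparison.

```python
import numpy as np


def normalize_columns(A):
    """Scale each column (one vectorized enrolment image) to unit l2 norm.

    Unit-norm columns are an assumption here, not a step stated in the paper.
    """
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero columns untouched
    return A / norms


def solve_coefficients(A, y):
    """Least-squares solution of y = A x (minimum-norm when A is underdetermined)."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x


def rank_zero_match(A, labels, y):
    """Return the label of the enrolment column with the largest coefficient."""
    x = solve_coefficients(A, y)
    return labels[int(np.argmax(x))], x


def residual_match(A, labels, y):
    """Class-wise residual rule of [6]: keep one class's coefficients at a time."""
    x = solve_coefficients(A, y)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A @ np.where(labels == c, x, 0.0)) for c in classes]
    return classes[int(np.argmin(residuals))], x


# Toy usage (hypothetical data): 4 enrolment images of 6 pixels, two per user.
A = normalize_columns(np.eye(6)[:, :4] * 2.0)
labels = ["user_a", "user_a", "user_b", "user_b"]
y = A[:, 2]  # a probe identical to the first enrolment image of user_b
print(rank_zero_match(A, labels, y)[0])  # user_b
print(residual_match(A, labels, y)[0])   # user_b
```

The two decision rules can disagree on real data: the rank-zero rule looks at a single coefficient, whereas the residual rule pools all coefficients belonging to a user.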

III. THE EXPERIMENTAL SETUP

We required a standard, established database of thermal and visual images to properly evaluate the performance of the sparse representation based approach. To this end, the enrolment database and probe gallery were generated from the Dataset 02: IRIS Thermal/Visible Face Database subset of the Object Tracking and Classification Beyond the Visible Spectrum (OTCBVS) database [14], freely available for download at http://www.cse.ohio-state.edu/OTCBVS-BENCH/. For this paper, 30 users were selected, and the probe gallery contains one thermal and one visual image of every user. The probe gallery therefore consists of 60 images: 30 thermal and 30 visual. The enrolment gallery consists of 4 thermal and 4 visual images of each user. Only forward-facing images with slight variation in pose were selected, and no restrictions were placed on expression. The resulting enrolment database consists of 240 almost forward-looking images with expression variations. The faces were cropped from the images so that accidental matching of background or clothing would not bias the results.

The code for the sparse representation based approach was written in MATLAB using the built-in LSQR function. The matching results were verified visually, and the results shown are for rank-zero (0) matching only, i.e., only the highest-scoring enrolment image is compared visually with the probe image and marked as a match or non-match.
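How each correct rank-zero result can be assigned to one of the matching categories reported in Table 1 is sketched below. This is a minimal illustration, not the authors' procedure: it assumes each probe has a known user and modality, that the winning enrolment image carries a user and modality label, and that a category such as "Thermal Vs Visual" means a thermal probe whose top enrolment image was a visual image of the correct user. The record format and the name `tally_matches` are hypothetical.

```python
from collections import Counter


def tally_matches(results):
    """Count correct rank-zero matches by (probe modality, matched modality).

    Each result is a tuple (probe_user, probe_modality, matched_user, matched_modality),
    with modalities written as "thermal" or "visual" (assumed format).
    """
    counts = Counter()
    for probe_user, probe_mod, matched_user, matched_mod in results:
        if probe_user == matched_user:  # only correct identities are counted
            counts[(probe_mod, matched_mod)] += 1
    total = sum(counts.values())
    return total, counts


# Hypothetical example: three probes, two of them correctly identified.
results = [
    ("u01", "thermal", "u01", "thermal"),
    ("u02", "visual", "u07", "visual"),
    ("u03", "visual", "u03", "thermal"),
]
total, counts = tally_matches(results)
print(total)                           # 2
print(counts[("thermal", "thermal")])  # 1
print(counts[("visual", "thermal")])   # 1
```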


To evaluate the effect of scaling on the matching process, the code was run multiple times with enrolment and probe images of a different size each time. The approach was evaluated for 10 different sizes: 8x8, 12x12, 15x15, 20x20, 25x25, 30x30, 35x35, 40x40, 45x45 and 50x50. The results for each of these sizes and their analysis are provided in the next section.
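The resizing step and the resulting problem dimensions can be sketched as follows. The nearest-neighbour resampling in `resize_nearest` is an assumption (the paper does not state how images were resized), and the function names are hypothetical. The loop only does arithmetic on the setup described in Section III: with 240 enrolment images, the matrix A has 240 columns and one row per pixel, so it has more columns than rows only for the smallest sizes.

```python
import numpy as np


def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D image to size x size (assumed method)."""
    img = np.asarray(img)
    h, w = img.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[np.ix_(rows, cols)]


def vectorize(img, size):
    """Resize and flatten an image into one column of the dictionary A."""
    return resize_nearest(img, size).astype(float).ravel()


SIZES = [8, 12, 15, 20, 25, 30, 35, 40, 45, 50]
N_ENROLMENT = 30 * (4 + 4)  # 30 users x (4 thermal + 4 visual) = 240 columns

# 8x8, 12x12 and 15x15 give fewer rows than columns; 20x20 and larger give more.
for s in SIZES:
    m = s * s  # rows of A = pixels per image
    kind = "underdetermined" if m < N_ENROLMENT else "overdetermined"
    print(f"{s}x{s}: A is {m} x {N_ENROLMENT} ({kind})")
```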

IV. RESULTS AND ANALYSIS

Table 1 shows the matching score comparisons for different sizes of thermal and visual images, as well as the overall matching scores. Two major observations are immediately obvious from these results. First and foremost, as noted in [6], the correct match percentage increases as the size of the input images increases. Interestingly, this increase is not linear; in fact, matching accuracy starts to decrease once the image size grows beyond a certain limit. In our experiments, that limit was 30x30.

We attribute this reduction to the fact that once the image size goes beyond a certain threshold, more and smaller local features become visible. Sparse representation based approaches are global matching approaches by nature and therefore work better when only the larger features, such as the eyes, nose, mouth and face shape, are used for matching. Once smaller features come into play, these approaches tend to become less accurate.

The second observation is that thermal-vs-thermal matching accuracy is always higher than that of any other case. This is again due to the nature of sparse representation based matching. As mentioned above, these approaches work on a global scale and work best when only the larger facial features are available for matching. In thermal images, these global features are almost always more prominent than in visual imagery. Therefore, thermal-vs-thermal matching provides better results.

An interesting observation is that although the overall number of correct matches was lower for smaller image sizes such as 8x8 and 12x12, the majority of those correct matches came from thermal-vs-thermal matching. This follows from the two observations above. The graph in Figure 3 shows this result more clearly. It should also be noted that although the thermal-vs-thermal correct match percentage decreases as the image size increases, it remains greater than the visual-vs-visual correct match percentage.
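The percentages discussed above can be recomputed directly from Table 1, as sketched below. The counts are copied from Table 1; reading the Figure 3 percentage as the thermal-vs-thermal count divided by the total number of correct matches, and measuring overall accuracy against the 60 probe images of Section III, are assumptions, as are the variable and function names.

```python
# Rows copied from Table 1:
# (size, total correct, thermal-thermal, visual-visual, thermal-visual, visual-thermal)
TABLE_1 = [
    ("8x8", 19, 15, 4, 0, 0),
    ("12x12", 25, 19, 5, 0, 1),
    ("15x15", 29, 20, 7, 0, 2),
    ("20x20", 30, 21, 7, 0, 2),
    ("25x25", 32, 20, 11, 1, 0),
    ("30x30", 32, 20, 9, 0, 3),
    ("35x35", 31, 19, 10, 0, 2),
    ("40x40", 31, 19, 9, 0, 3),
    ("45x45", 31, 19, 7, 2, 2),
    ("50x50", 30, 17, 9, 0, 4),
]
N_PROBES = 60  # 30 thermal + 30 visual probe images


def thermal_share(rows):
    """Percentage of the correct matches that were thermal-vs-thermal, per size."""
    return {size: 100.0 * tt / total for size, total, tt, *_ in rows}


def overall_accuracy(rows):
    """Percentage of all probe images that were matched correctly, per size."""
    return {size: 100.0 * total / N_PROBES for size, total, *_ in rows}


def best_sizes(rows):
    """Image sizes with the highest total number of correct matches."""
    best = max(total for _, total, *_ in rows)
    return [size for size, total, *_ in rows if total == best]


for size, pct in thermal_share(TABLE_1).items():
    print(f"{size}: {pct:.1f}% of correct matches were thermal-vs-thermal")
print(best_sizes(TABLE_1))  # ['25x25', '30x30']
```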

Based on these results and their analysis, it is safe to say that sparse representation based techniques are global feature matching techniques by nature, and that it is better to use thermal imagery with them. The results also show that the optimum size of probe and enrolment images lies between 20x20 and 30x30 when using a least-squares minimization based system.

It would be interesting to evaluate l1-minimization based approaches in the same way, and we are currently working towards this evaluation. It would also be interesting to look more closely at the images that were matched correctly in the thermal-vs-visual and visual-vs-thermal cases to understand the reason behind these correct matches. We believe this will provide deeper insight into the workings of sparse representation based approaches in particular and global feature matching approaches in general.

TABLE 1. MATCHING RESULTS FOR DIFFERENT SIZE IMAGES

Pixel size  Total correct  Thermal vs  Visual vs  Thermal vs  Visual vs
            matches        thermal     visual     visual      thermal
8x8         19             15          4          0           0
12x12       25             19          5          0           1
15x15       29             20          7          0           2
20x20       30             21          7          0           2
25x25       32             20          11         1           0
30x30       32             20          9          0           3
35x35       31             19          10         0           2
40x40       31             19          9          0           3
45x45       31             19          7          2           2
50x50       30             17          9          0           4

V. CONCLUSION

A comparison between visual and thermal images, used as both the input and the enrolment dataset, is provided for a sparse representation based approach. The results show not only that sparse representation based approaches can be considered global feature matching approaches, but also that thermal imagery provides better accuracy than visual imagery for these techniques. The results also show that the accuracy of these techniques drops once the image size increases beyond a certain threshold. Further comparisons should be performed with l1-minimization based approaches. Another interesting research direction is to analyze the images that produce correct matches in thermal-vs-visual and visual-vs-thermal matching.

REFERENCES

[1] A. K. Jain and S. Z. Li, Handbook of Face Recognition, Springer-Verlag New York, Inc., 2005, ISBN: 038740595X.

[2] M. Turk and A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

[3] A. d'Aspremont, L. El Ghaoui, M. Jordan and G. Lanckriet, A direct formulation of sparse PCA using semidefinite programming, SIAM Review, vol. 49, pp. 434-448, 2007.

[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2000.

[5] R. Singh, M. Vatsa and A. Noore, Integrated multilevel image fusion and match score fusion of visible and infrared face images for robust face recognition, Pattern Recognition, vol. 41, pp. 880-893, 2008.

[6] J. Wright, A. Yang, A. Ganesh, S. Sastry and Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.

[7] G. Bebis, A. Gyaourova, S. Singh and I. Pavlidis, Face recognition by fusing thermal infrared and visible imagery, Image and Vision Computing, vol. 24, pp. 727-742, 2006.

[8] D. Socolinsky, A. Selinger and J. Neuheisel, Face recognition with visible and thermal infrared imagery, Computer Vision and Image Understanding, pp. 72-114, 2003.

[9] G. Friedrich and Y. Yeshurun, Seeing people in the dark: face recognition in infrared images, in: Second BMCV, 2003.

[10] A. Jain, R. Bolle and S. Pankanti, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, Dordrecht, 1999.

[11] I. Pavlidis and P. Symosek, The imaging issue in an automatic face/disguise detection system, in: IEEE Workshop on Computer Vision Beyond the Visible Spectrum, pp. 15-24, 2000.

[12] J. Park, T. Oh, S. Ahn and S. Lee, Glasses removal from facial image using recursive error compensation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 805-811, 2005.

[13] P. Belhumeur, J. Hespanha and D. Kriegman, Eigenfaces versus Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.

[14] IEEE OTCBVS WS Series Bench; DOE University Research Program in Robotics under grant DOE-DE-FG02-86NE37968; DOD/TACOM/NAC/ARC Program under grant R01-1344-18; FAA/NSSA grant R01-1344-48/49; Office of Naval Research under grant #N000143010022.

[Figure 3. Graph showing comparison between pixel size and thermal-thermal match percentage. x-axis: pixel size (8x8 to 50x50); y-axis: percentage of thermal-thermal matches (0% to 90%).]
