comparison of thermal and visual facial imagery for use in sparse representation based facial...


  • 8/13/2019 Comparison of Thermal and Visual Facial Imagery for Use in Sparse Representation Based Facial Recognition Syst


    Comparison of Thermal and Visual Facial Imagery for use in Sparse Representation based Facial Recognition System

    Asif Raza Butt and Asim Baig

    Department of Electrical and Computer Engineering, Muhammad Ali Jinnah University, Islamabad, Pakistan

    Abstract - Facial recognition is probably one of the biometric characteristics most commonly used by humans for recognition. This is one of the reasons why it has been the subject of intense research for the past 30 years or so. In this time a lot of work has been done not only on the development of stable, real-time facial recognition systems but also on acquiring different modalities of facial imagery for use with these systems. One of the most successful recent attempts at developing a robust real-time facial recognition system is based on representing the whole system as an underdetermined sparse linear system and solving it accordingly. Meanwhile, the two most widely used modalities of facial imagery are thermal and visible images. In this paper, we compare the performance of a sparse representation based facial recognition system on both thermal and visible imagery. We also elaborate on the results in detail and explain the performances obtained.

    I. INTRODUCTION

    Facial feature based recognition is such an integral part of human nature that it has become a dedicated process for the brain [1]. It is also one of the most sought-after modalities in real-world security and safety applications such as surveillance, access control, information security and identity detection. It has the advantage of being universally accepted and can be acquired overtly or covertly. The ultimate goal of a robust facial recognition system is to provide accurate detection in the presence of noise such as illumination variation, aging and facial expression.

    One of the biggest challenges in face recognition based systems is the high dimensional data space. A lot of work has been done in recent years to improve the speed, robustness and accuracy of these systems by reducing the dimensionality of the data. The idea is to map the high dimensional facial data into fewer, more discriminant dimensions. One of the earliest examples of this approach is the use of EigenFaces [2] and PCA [3] for facial recognition.

    Another commonly used approach to facial recognition is to train the recognition algorithm on only a small subset of more discriminating images in order to define a decision boundary, as in the Support Vector Machine based approaches of [4] and [5].

    In recent years approaches based on the sparse representation of the facial recognition system have become more common, as they allow for the development of a robust, accurate and real-time facial recognition system. Wright et al. [6] were the first to propose this approach. They proposed to represent the input image over an over-complete set of features whose base elements are the enrollment and training images. This allows the whole system to be represented as an underdetermined sparse linear system, which can be solved as an $\ell_1$-minimization problem. The experimental results in their paper show that the proposed approach is robust to the type of features selected and provides accurate results.
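For reference, the $\ell_1$-minimization problem mentioned above is commonly stated as below, using the notation that Section II introduces ($A$ is the matrix whose columns are the enrollment images, $y$ is the input image and $x$ is the coefficient vector). This is the standard noise-free form, given here as an illustration rather than as a formulation quoted from this paper. The second line is one common minimum-norm least squares counterpart, shown only for contrast.

```latex
\[
\hat{x}_1 = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad A x = y
\]
\[
\hat{x}_2 = \arg\min_{x} \|x\|_2 \quad \text{subject to} \quad A x = y
\]
```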

    The other direction that researchers are pursuing to improve the performance of facial recognition systems is to try different input modalities of facial images, such as visual imagery, thermal imagery, sketches and even the fusion of multiple modalities, with existing recognition techniques. The aim is to use these different modalities of facial imagery to counteract the effects of illumination, pose, expression and aging in the input imagery. These various modalities allow the face to be recognized both holistically and on the basis of finer features.

    A number of current studies have shown that thermal IR imagery offers a promising alternative to visible imagery for handling variations in facial appearance due to illumination [7, 8], facial expression [9, 10] and face pose [11]. Thermal IR imagery is nearly invariant to changes in ambient illumination and provides capabilities for identification under extremely low lighting conditions such as total darkness [12]. On the flip side, it does not provide the finer facial features that visible imagery can provide for detection.

    These properties make thermal imagery an ideal candidate for use with approaches that take a more holistic view of facial detection. In addition, any approach that utilizes only the key or most prominent features of the facial image should also perform reasonably well with thermal imagery.


    In this paper we compare the performance of the sparse representation based facial recognition system presented in [6] on both thermal and visual imagery. To evaluate the performance of the sparse representation based approach we used an enrolment database and a probe gallery containing both thermal and visual facial images. The matching performance is evaluated for Thermal-to-Thermal, Thermal-to-Visual, Visual-to-Thermal and Visual-to-Visual matching. This detailed evaluation provides an interesting insight into how the sparse representation based approach views the data and which image format is best suited to this type of approach.

    The rest of the paper is organized as follows: Section II briefly outlines the working of a sparse representation based facial recognition system. Section III outlines the experimental setup and the database being used. Section IV discusses the results and comments on the system's performance. Section V provides conclusions and outlines future research directions.

    II. OVERVIEW OF SPARSE REPRESENTATION BASED APPROACH

    The main idea behind the sparse representation based approaches is a generalization of the nearest subspace (NS) [13] approaches. Nearest subspace classifiers are based on the best linear representation of the training samples in each class. The major difference between the two approaches is that NS takes only the training samples from each class as the face subspace, whereas the sparse representation based approach takes the complete enrollment dataset as a linear span of training images for classification. This allows sparse representation based approaches to provide robustness against illumination and pose variations. The issue with this representation is that smaller variations between the faces of different users can cause misclassification. This is the reason the authors of [6] work with small input images, i.e. 12x12 or 15x15.

    Broadly, the sparse representation based approaches work as follows: given a sufficient set of training images for the $i$th user, the sample set $A_i$ can be written as

    \[ A_i = [\, v_{i,1}, v_{i,2}, \ldots, v_{i,n_i} \,] \qquad (1) \]

    where $v_{i,1}$, $v_{i,2}$, etc. are the training images of the $i$th user. Then any new image from the same class will lie approximately on the same linear subspace and can be represented as

    \[ y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i} \qquad (2) \]

    where $y$ is the approximation of the new input image based on the existing training images. It is interesting to note that the more training images exist, the better the representation of the new image. In a real scenario the membership of the new image is unknown, and to handle that a new matrix $A$ is defined that encompasses the entire enrolment database and can be represented as

    \[ A = [\, A_1, A_2, \ldots, A_k \,] \qquad (3) \]

    In this case $y$ can be written as

    \[ y = A x_0 \qquad (4) \]

    where

    \[ x_0 = [\, 0, \ldots, 0, \alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i}, 0, \ldots, 0 \,]^T \qquad (5) \]

    represents a coefficient vector with all zero entries except for the ones associated with the $i$th user.

    Equation 4 then represents an underdetermined sparse linear system that can be solved for $x_0$ using any of several possible approaches, such as $\ell_1$-minimization or least squares minimization. Although least squares minimization based approaches are not as accurate as $\ell_1$-minimization based approaches, they tend to be simpler to implement, and MATLAB provides a built-in function with an optimized implementation. In this paper we work with the least squares minimization based approach for the sake of simplicity.
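The following is a minimal Python sketch of the least squares matching step described above, not the MATLAB code used in this paper. It swaps the iterative LSQR solver for a direct minimum-norm solve, x = A^T (A A^T)^{-1} y, which assumes that there are fewer pixels than enrollment images and that A has full row rank. The function names, the toy matrix and the rule of picking the largest coefficient as the "highest scoring" enrollment image are illustrative assumptions.

```python
def solve_linear(M, b):
    """Solve the square system M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def min_norm_least_squares(A, y):
    """Minimum-norm solution of A x = y for an m x n matrix A (list of rows), m < n."""
    m, n = len(A), len(A[0])
    # Gram matrix A A^T (m x m).
    gram = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]
    z = solve_linear(gram, y)
    # x = A^T z
    return [sum(A[i][k] * z[i] for i in range(m)) for k in range(n)]


def classify(A, labels, y):
    """Return the label of the enrollment image with the largest coefficient, and x."""
    x = min_norm_least_squares(A, y)
    best = max(range(len(x)), key=lambda k: x[k])
    return labels[best], x


# Toy data (hypothetical): 3 "pixels", 4 enrollment images, 2 users.
# Each column of A is one enrollment image; labels[k] is the user of column k.
A = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 1.0, 0.0, 1.0]]
labels = [0, 0, 1, 1]
probe = [0.0, 1.0, 0.0]  # identical to the third enrollment image

user, coefficients = classify(A, labels, probe)
print(user, coefficients)
```

With real images each column of A would be a flattened, cropped face, and an iterative solver such as LSQR would normally replace the explicit Gram matrix solve.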

    III. THE EXPERIMENTAL SETUP

    We required a standard and established database of thermal and visual images to properly evaluate the performance of the sparse representation based approach. In this regard, the enrolment database and probe gallery were generated from the "Dataset 02: IRIS Thermal/Visible Face Database" subset of the Object Tracking and Classification Beyond the Visible Spectrum (OTCBVS) database [14], freely available for download at http://www.cse.ohio-state.edu/OTCBVS-BENCH/. For this paper 30 users were selected, and the probe gallery consisted of one thermal and one visual image for every user. This means that the probe gallery consists of 60 images, 30 thermal and 30 visual. The enrollment gallery consists of 4 thermal and 4 visual images of each user. Only forward facing images with slight variation in pose were selected, and no restrictions were placed on the expressions. The enrollment database so generated consists of 240 almost forward looking images with expression variations. It should be noted that the faces were cropped from the images so as not to bias the results through accidental matching of background or clothing.

    The code for the sparse representation based approach was written in MATLAB using the built-in LSQR function. The matching results were verified visually, and the results shown are for Rank Zero (0) matching only, i.e. only the highest scoring enrolment image is compared visually with the probe gallery image and marked as a match or non-match.
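The bookkeeping behind this setup can be sketched in Python as follows. The sketch assumes that each image carries a (user, modality) label and that a match is decided by comparing user labels, whereas the paper verified matches visually. The function names and the ordering of the modality pair (probe first, enrolled image second) are hypothetical.

```python
from collections import Counter


def build_enrollment_labels(num_users=30, per_modality=4):
    """Label every enrollment image with (user, modality): 4 thermal + 4 visual per user."""
    labels = []
    for user in range(num_users):
        for modality in ("thermal", "visual"):
            labels.extend([(user, modality)] * per_modality)
    return labels


def tally_matches(results):
    """Count correct top-ranked matches by (probe modality, enrolled modality).

    Each result is (probe_user, probe_modality, matched_user, matched_modality),
    where the matched image is the highest scoring enrollment image.
    """
    counts = Counter()
    for probe_user, probe_modality, matched_user, matched_modality in results:
        if probe_user == matched_user:
            counts[(probe_modality, matched_modality)] += 1
    return counts


enrollment = build_enrollment_labels()
print(len(enrollment))  # 30 users x 8 images each

# Made-up results for three probes, for illustration only.
example = [
    (0, "thermal", 0, "thermal"),  # correct, thermal vs thermal
    (1, "visual", 1, "thermal"),   # correct, visual vs thermal
    (2, "visual", 5, "visual"),    # wrong user, not counted
]
print(tally_matches(example))
```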



    To evaluate the effect of scaling on the matching process, the code is run multiple times with enrollment and probe images of a different size each time. The approach is evaluated for 10 different sizes: 8x8, 12x12, 15x15, 20x20, 25x25, 30x30, 35x35, 40x40, 45x45 and 50x50. The results for each of these sizes and their analysis are provided in the next section.
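A minimal Python sketch of this rescaling step is given below. The paper does not state which resizing method was used, so block averaging is an assumption here, as are the function names. The sketch only shows how a cropped grayscale face could be reduced to each evaluated size and flattened into a vector for matching.

```python
# Image sizes evaluated in this paper (width = height in pixels).
SIZES = [8, 12, 15, 20, 25, 30, 35, 40, 45, 50]


def downsample(image, size):
    """Resize a 2D grayscale image (list of rows) to size x size by block averaging."""
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(size):
        r0 = i * rows // size
        r1 = max((i + 1) * rows // size, r0 + 1)  # at least one source row
        out_row = []
        for j in range(size):
            c0 = j * cols // size
            c1 = max((j + 1) * cols // size, c0 + 1)  # at least one source column
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def to_vector(image):
    """Flatten a 2D image row by row into a single list of pixel values."""
    return [pixel for row in image for pixel in row]


# Hypothetical 100x100 face crop filled with a simple gradient.
face = [[float(r + c) for c in range(100)] for r in range(100)]
for size in SIZES:
    vector = to_vector(downsample(face, size))
    print(size, len(vector))
```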

    IV. RESULTS AND ANALYSIS

    Table 1 shows the matching score comparisons for different sizes of thermal and visual images as well as the overall matching scores. Two major observations are immediately obvious when these results are analyzed. First and foremost, as commented in [6], the correct match percentage increases with an increase in the size of the input images. It is interesting to note that this increase is not linear and that the matching in fact starts to decrease once the image size increases beyond a certain limit. In our experiments that limit was 30x30: the total number of correct matches rises from 19 at 8x8 to 32 at 25x25 and 30x30, and then falls to 30 at 50x50.

    The reason for this reduction is that once the image size goes beyond a certain threshold, more and more small local features become visible. The sparse representation based approaches are global matching approaches by nature and therefore work better when only the larger features, such as eyes, nose, mouth and face shape, are being utilized for matching. Once smaller features come into play, these approaches tend to become less accurate.

    The second observation is that the thermal vs. thermal matching accuracy is always higher than in any other case. This is again due to the nature of the sparse representation based matching approaches. As mentioned above, these approaches work on a global scale and work best when only larger facial features are available for matching. In thermal images these global features are almost always more prominent than in visual imagery. Therefore, thermal vs. thermal matching provides better matching results.

    An interesting observation is that although the overall number of accurate matches was lower for smaller image sizes such as 8x8 and 12x12, a majority of the correct matches were due to thermal vs. thermal matching. This phenomenon can easily be explained by the two observations provided above. The graph in Figure 3 shows this result more clearly. In addition, it should be noted that although the thermal vs. thermal correct matching percentage reduces as the image size increases, it is still greater than the visual vs. visual correct match percentage.
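The percentages discussed above can be recomputed from Table 1 with a short Python sketch. It assumes that the percentage plotted in Figure 3 is the share of correct matches contributed by thermal vs. thermal matching, and that overall accuracy is measured against the 60 probe images. Both readings, and the function names, are assumptions made for illustration.

```python
# Rows copied from Table 1:
# size -> (total correct, thermal-thermal, visual-visual, thermal-visual, visual-thermal)
TABLE_1 = {
    "8x8": (19, 15, 4, 0, 0),
    "12x12": (25, 19, 5, 0, 1),
    "15x15": (29, 20, 7, 0, 2),
    "20x20": (30, 21, 7, 0, 2),
    "25x25": (32, 20, 11, 1, 0),
    "30x30": (32, 20, 9, 0, 3),
    "35x35": (31, 19, 10, 0, 2),
    "40x40": (31, 19, 9, 0, 3),
    "45x45": (31, 19, 7, 2, 2),
    "50x50": (30, 17, 9, 0, 4),
}
PROBE_IMAGES = 60  # 30 thermal + 30 visual probes


def overall_accuracy(size):
    """Total correct matches as a percentage of all probe images."""
    return 100.0 * TABLE_1[size][0] / PROBE_IMAGES


def thermal_share(size):
    """Thermal vs. thermal matches as a percentage of the correct matches."""
    total, thermal_thermal = TABLE_1[size][:2]
    return 100.0 * thermal_thermal / total


for size in TABLE_1:
    print(f"{size:>6}  accuracy {overall_accuracy(size):5.1f}%  "
          f"thermal-thermal share {thermal_share(size):5.1f}%")
```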

    Based on the results and their analysis, it is safe to say that sparse representation based techniques are global feature matching techniques by nature and that it is better to use thermal imagery with them. In addition, the results show that the optimum size of the probe and enrolment images is between 20x20 and 30x30 when using a least squares minimization based system.

    It would be interesting to evaluate the $\ell_1$-minimization based approaches in the same way, and we are currently working towards this evaluation. It would also be interesting to look more closely at those images that were matched correctly in the Thermal vs. Visual and Visual vs. Thermal cases to evaluate the reason behind these correct matches. We believe that this will provide deeper insight into the working of sparse representation based approaches in particular and global feature matching based approaches in general.

    TABLE 1. MATCHING RESULTS FOR DIFFERENT SIZE IMAGES

    Pixel    Total Correct   Thermal vs        Visual vs        Thermal vs       Visual vs
    Size     Matches         Thermal Matches   Visual Matches   Visual Matches   Thermal Matches
    8x8      19              15                4                0                0
    12x12    25              19                5                0                1
    15x15    29              20                7                0                2
    20x20    30              21                7                0                2
    25x25    32              20                11               1                0
    30x30    32              20                9                0                3
    35x35    31              19                10               0                2
    40x40    31              19                9                0                3
    45x45    31              19                7                2                2
    50x50    30              17                9                0                4

    V. CONCLUSION

    A comparison is provided between visual and thermal images as input and enrolment datasets for sparse representation based approaches. The results show not only that sparse representation based approaches can be considered global feature matching based approaches but also that thermal imagery provides better accuracy than visual imagery for these techniques. In addition, the results show that the accuracy of these techniques drops once the image size increases beyond a certain threshold. Further comparisons should be performed using $\ell_1$-minimization based approaches. Another interesting research direction is to analyze and evaluate the images that provide correct matches in thermal vs. visual and visual vs. thermal matching.

    REFERENCES

    [1] A. K. Jain and S. Z. Li, Handbook of Face Recognition, Springer-Verlag New York, Inc., 2005. ISBN: 038740595X.

    [2] M. Turk and A. Pentland, Eigenfaces for recognition, International Journal on Cognitive Neuroscience, 3(1):71-86, 1991.

    [3] A. d'Aspremont, L. E. Ghaoui, M. Jordan, and G. Lanckriet, A Direct Formulation of Sparse PCA Using Semidefinite Programming, SIAM Rev., vol. 49, pp. 434-448, 2007.

    [4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2000.

    [5] R. Singh, M. Vatsa and A. Noore, Integrated multilevel image fusion and match score fusion of visible and infrared face images for robust face recognition, Pattern Recognition, vol. 41, pp. 880-893, 2008.

    [6] J. Wright, A. Yang, A. Ganesh, S. Sastry and Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.

    [7] George B., Aglika G., Saurabh S. and Ioannis P., Face recognition by fusing thermal infrared and visible imagery, Image and Vision Computing 24 (2006) 727-742.

    [8] D. Socolinsky, A. Selinger, J. Neuheisel, Face recognition with visible and thermal infrared imagery, Computer Vision and Image Understanding (2003) 72-114.

    [9] G. Friedrich, Y. Yeshurun, Seeing people in the dark: face recognition in infrared images, in: Second BMCV, 2003.

    [10] A. Jain, R. Bolle, S. Pankanti, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, Dordrecht, 1999.

    [11] I. Pavlidis, P. Symosek, The imaging issue in an automatic face/disguise detection system, in: IEEE Workshop on Computer Vision Beyond the Visible Spectrum, 2000, pp. 15-24.

    [12] J. Park, T. Oh, S. Ahn, S. Lee, Glasses removal from facial image using recursive error compensation, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (5) (2005) 805-811.

    [13] P. Belhumeur, J. Hespanha, and D. Kriegman, Eigenfaces versus Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.

    [14] IEEE OTCBVS WS Series Bench; DOE University Research Program in Robotics under grant DOE-DE-FG02-86NE37968; DOD/TACOM/NAC/ARC Program under grant R01-1344-18; FAA/NSSA grant R01-1344-48/49; Office of Naval Research under grant #N000143010022.

    Figure 3. Graph showing comparison between Pixel Size and Thermal Match %age (plot title: Pixel Size Vs Thermal-Thermal Matches %age; x-axis: pixel size from 8x8 to 50x50; y-axis: % of matches, 0% to 90%; series: Percentage of Thermal-Thermal Matches)