
ILLUMINATION INVARIANT FACE RECOGNITION

Raghuraman Gopalan

[email protected]

ENEE 631: Digital Image and Video Processing

Instructor: Dr. K. J. Ray Liu

Term Project - Spring 2006

1. INTRODUCTION

The performance of face recognition algorithms is severely affected by two factors: changes in the pose and in the illumination conditions of the subjects. Illumination changes can be so drastic that the variation due to lighting is of the same order as the variation across subjects [1], which can result in misclassification. For example, when the face of a person is acquired from real-time video, the ambient conditions cause different lighting variations on the tracked face. Some examples of images under different illumination conditions are shown in Fig. 1. In this project, we study several algorithms capable of illumination invariant face recognition. Their performance was compared on the CMU-Illumination dataset [13], using the entire face as the input to the algorithms. A model that divides the face into four regions is then proposed, and the performance of the algorithms on these new features is analyzed.


Fig. 1: An example of change in illumination and change in subjects. Image source: [13]. Row 1 shows the changes in the image of the same subject due to different illumination conditions. Row 2 shows a different subject.

2. RELATED WORK ON ILLUMINATION INVARIANT FACE RECOGNITION

Many algorithms in the literature [2] deal with the ill-posed problem of illumination invariant recognition of faces. They can be divided into four basic types: 1) heuristic methods, 2) image comparison methods, 3) class-based methods, and 4) model-based methods.

Heuristic approaches make minor modifications to existing algorithms. A well-known method is to remove the first three principal components in the Eigenfaces technique [3]. It stems from the belief that most of the variation in an image is due to illumination changes and is therefore captured by the eigenvectors corresponding to the three largest eigenvalues. Another approach is to perform histogram equalization on ill-lit images [4]. Heuristic methods give a marginal improvement in performance but cannot always be relied upon.

Image comparison methods use different image representations and distance measures for classification [1]. Representations studied include edge maps, derivatives of gray levels, and images filtered with 2-D Gabor-like functions, together with distance measures such as point-wise distance, regional distance, and affine-GL distance. It was concluded, however, that none of these methods alone can overcome the image variations caused by illumination changes.

Class-based approaches work on the assumption of Lambertian surfaces with no shadowing effects. In [5], a 3-D linear subspace was constructed for each person [using three aligned faces of that person under different lighting conditions]. Under these ideal conditions, the method was proved to be illumination invariant, but the image conditions it requires are very hard to obtain. Other class-based approaches include Fisherfaces [3] and Bayesian face recognition [6].

Model-based approaches are computation intensive and give better results than the other methods. They use a 3-D face model to synthesize, from a given image, a virtual image under the desired illumination conditions. One of the best performing methods of this kind is the morphable model for the synthesis of 3-D faces [11].

3. PROBLEM FORMULATION

The overall face recognition system contains three distinct stages, namely face detection, feature extraction [including cropping], and recognition. The system is shown in Fig. 2.


Fig. 2: Configuration of a generic face recognition system. Image source: [2].

In this project, we are concerned only with the face recognition part. The CMU-Illumination dataset [13] has images of 68 individuals, with 21 different lighting conditions per individual; all faces are of size 48x40 pixels. Some of these images are used to train the algorithms, which construct a feature space from them; the hope is that the training set is rich enough to capture both the illumination changes and the facial differences between persons, so that the algorithms can use these features to nullify the effect of lighting changes. The remaining images in the dataset are used for testing. We test the performance of four algorithms: the image gradient method [7], the image preprocessing approach [8], Bayesian face recognition [6], and Eigenfaces [9]. These algorithms fall under the holistic template-based approach, in which all input features are used for classification and recognition. This is in contrast to feature geometry approaches, which use statistics of particular features of the object to perform face recognition. Finally, we compare the recognition rates of these algorithms with different facial features used for training and testing. A model is proposed that divides the face into different regions: the eyes, nose, and chin [see Fig. 3]. It contains two high-frequency regions and two low-frequency regions. Interesting results were obtained when these features were tested with the different algorithms.


Fig. 3: The Proposed Model of Dividing the Face into Four regions
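The four-way split of Fig. 3 can be sketched by slicing the 48x40 face array. The crop boundaries below are illustrative guesses, since the report does not list the exact coordinates:

```python
import numpy as np

def split_face(face):
    """Split a 48x40 face into the eyes, nose, and two chin regions.
    The slice boundaries are illustrative guesses, not the report's."""
    assert face.shape == (48, 40)
    eyes = face[8:20, :]           # upper band containing both eyes (high frequency)
    nose = face[20:34, 10:30]      # central band around the nose (high frequency)
    left_chin = face[34:48, :20]   # lower-left, low-frequency region
    right_chin = face[34:48, 20:]  # lower-right, low-frequency region
    return eyes, nose, left_chin, right_chin

face = np.zeros((48, 40))
print([region.shape for region in split_face(face)])
```

Each region can then be fed to the recognition algorithms exactly as the entire face is.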

4. IMAGE GRADIENT BASED FACE RECOGNITION

4.1 Basic Idea of the Algorithm: In [7], the problem considered is that of determining functions of an image of an object that are insensitive to illumination changes. It was shown that for an object with Lambertian reflectance there are no discriminative functions that are invariant to illumination. This result led the authors to adopt a probabilistic approach in which they analytically determined a probability distribution for the image gradient as a function of the surface's geometry and reflectance. The distribution revealed that the direction of the image gradient is insensitive to changes in illumination direction. Using this distribution, an illumination-insensitive measure of image comparison was developed and tested on the problem of face recognition.

4.2 Implementation: The gradient was computed at every pixel location in both dimensions, and its direction was found using the inverse tangent of the ratio of the Y and X components. Since pixels in human faces have some degree of correlation, a neighborhood-weighted finite-difference operator was used to compute the gradients in the X and Y directions.

This distribution of image gradient directions then serves as the comparison feature for both the training and testing data.

4.3 Results: The performance of the image gradient based method on all five test features [the entire face and the eye, nose, and two chin regions] is given in Fig. 4. The method performs best when the entire face is provided as the input, followed by the eyes and nose. This is satisfying because the chin regions carry very little high-frequency information, so the gradient technique performs worse there than in the high-frequency-rich eye and nose regions.
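A minimal sketch of the direction feature and of an angular distance between two faces, assuming plain central differences in place of the report's neighbourhood-weighted gradient (the function names are ours):

```python
import numpy as np

def gradient_direction(img):
    """Per-pixel gradient direction. The report uses a neighbourhood-
    weighted gradient; plain central differences (np.gradient) are a
    simplification here."""
    gy, gx = np.gradient(img.astype(float))
    return np.arctan2(gy, gx)

def direction_distance(a, b):
    """Mean angular difference between two direction maps, with each
    difference wrapped into [0, pi]."""
    d = np.abs(gradient_direction(a) - gradient_direction(b))
    return float(np.minimum(d, 2.0 * np.pi - d).mean())

rng = np.random.default_rng(0)
face = rng.random((48, 40))
# a global illumination gain leaves the gradient direction unchanged,
# so this distance is essentially zero
print(direction_distance(face, 3.0 * face))
```

The key property exploited here is that scaling the image intensity scales both gradient components equally, so arctan2 of their ratio is unaffected.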


Fig. 4: Performance of Image Gradient Method on different Test Features

5. BAYESIAN FACE RECOGNITION

5.1 Basic Idea of the Algorithm: This paper [6] proposes a method for matching images using a probabilistic measure of similarity, based primarily on a maximum a posteriori (MAP) analysis of image differences. It replaces the expensive on-line computation of Bayesian similarity measures with less expensive linear subspace measures and simple Euclidean norms, which reduces the computational complexity compared with standard face recognition algorithms.


5.2 Implementation: It is assumed that image intensity differences are characteristic of the typical variations in the appearance of individuals. Two distinct classes are defined: intrapersonal variations, which capture the variations of the same individual under different lighting conditions, and interpersonal variations, which capture the variations between different individuals. The likelihoods of both classes are computed for all images in the database, and the input is assigned to the class with the higher similarity measure.
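As given in [6], the similarity measure is the posterior probability that a difference image Δ belongs to the intrapersonal class Ω_I, with Ω_E denoting the interpersonal class:

```latex
S(\Delta) = P(\Omega_I \mid \Delta)
          = \frac{P(\Delta \mid \Omega_I)\,P(\Omega_I)}
                 {P(\Delta \mid \Omega_I)\,P(\Omega_I) + P(\Delta \mid \Omega_E)\,P(\Omega_E)}
```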

When the prior probabilities are unknown, the maximum likelihood is used for comparison instead. The likelihoods of the two classes are computed from class-conditional Gaussian densities estimated from the training data.
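Each class-conditional density has the standard multivariate Gaussian form from [6], with Σ the covariance estimated for that class and D the dimensionality of Δ:

```latex
P(\Delta \mid \Omega) = \frac{\exp\!\left(-\tfrac{1}{2}\,\Delta^{T}\Sigma^{-1}\Delta\right)}
                             {(2\pi)^{D/2}\,\lvert\Sigma\rvert^{1/2}}
```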

Since this formula involves inverting the covariance matrix, the complexity of the method is very high. As suggested in [6], an offline transformation was used instead: the input images are whitened in advance, so that each likelihood reduces to a simple Euclidean norm of the whitened coefficients. These likelihoods then serve as the measure for comparison.

5.3 Results: The performance of the Bayesian face recognition method on all five test features [the entire face and the eye, nose, and two chin regions] is shown in Fig. 5. The method performs nearly the same under all five test conditions. Even the low-frequency regions therefore carry enough information to distinguish between persons under illumination changes, so all the test features allow this class-based method to classify the test data with reasonable accuracy.
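A small sketch of the offline whitening idea, assuming a plain eigendecomposition of the class covariance (the regularisation eps and the function name are ours):

```python
import numpy as np

def whitening_transform(deltas, eps=1e-8):
    """Fit W so that ||W d||^2 = d^T (Sigma + eps*I)^{-1} d, where Sigma
    is the covariance of the difference vectors (rows of `deltas`).
    Sketch of the offline whitening idea in [6]; eps is our own
    regularisation, not from the paper."""
    sigma = np.cov(np.asarray(deltas, dtype=float), rowvar=False)
    vals, vecs = np.linalg.eigh(sigma)
    return (vecs / np.sqrt(vals + eps)).T      # W = Lambda^{-1/2} V^T

rng = np.random.default_rng(1)
intra = rng.normal(scale=0.5, size=(300, 6))   # toy intrapersonal differences
W = whitening_transform(intra)

# the exponent Delta^T Sigma^{-1} Delta becomes a plain squared norm
d = intra[0]
sigma = np.cov(intra, rowvar=False)
print(np.sum((W @ d) ** 2) - d @ np.linalg.inv(sigma + 1e-8 * np.eye(6)) @ d)
```

Because W can be computed once offline per class, each on-line comparison costs only a matrix-vector product and a Euclidean norm, which is the speed-up the section describes.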

Fig. 5: Performance of Bayesian Approach on different Test Features


6. IMAGE PREPROCESSING ALGORITHM FOR ILLUMINATION INVARIANT FACE RECOGNITION

6.1 Basic Idea of the Algorithm: This method [8] builds on the widely accepted model [12] that an image is the product of luminance and reflectance. Since reflectance is the 'perceived sensation', the authors estimate the luminance from the given image and divide the image by it to obtain the reflectance component. The reflectance contains only the features pertaining to the individual, not those due to lighting variation, so using it can significantly improve the recognition rate.

6.2 Implementation: The perception gain model is adopted in this method. The luminance is estimated by minimizing a functional with a data term and a smoothness term,

where the first term drives the solution to follow the perception gain model, while the second imposes a smoothness constraint; Ω refers to the input image domain. The parameter λ controls the relative importance of the two terms, and the space-varying permeability weight ρ(x, y) controls the anisotropic nature of the smoothing constraint. The Euler-Lagrange equation of this calculus-of-variations problem yields a partial differential equation in the luminance.
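Up to notation, a functional consistent with this description, with the permeability weights ρ placed in the smoothness term, and its Euler-Lagrange condition would read (a reconstruction, not necessarily the paper's exact equation):

```latex
J(L) = \iint_{\Omega} \left(L - I\right)^2 dx\,dy
     + \lambda \iint_{\Omega} \rho(x,y)\left(L_x^2 + L_y^2\right) dx\,dy,
\qquad
(L - I) - \lambda\left[\left(\rho L_x\right)_x + \left(\rho L_y\right)_y\right] = 0
```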

A discretization lattice (Fig. 6) is used to obtain a discrete equivalent of this partial differential equation.


This equation was implemented by substituting, at each location, the pixel value with the term on the left side of the equation. The parameter h attaches appropriate weights to the luminance component depending on the neighboring pixel values. The partial differential equation is solved for all training and testing images, and the resulting images serve as the feature set. Eigenfaces (principal component analysis) was then applied, and a significant improvement in performance was obtained.
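The iteration can be sketched as a Jacobi-style fixed point, simplified here to isotropic weights (ρ = 1) and periodic borders via np.roll; the full method uses the space-varying weights described above:

```python
import numpy as np

def estimate_luminance(img, lam=1.0, iters=200):
    """Fixed-point (Jacobi) solver for a discretised luminance PDE,
    simplified to isotropic weights (rho = 1); borders are handled
    periodically via np.roll for brevity."""
    I = img.astype(float)
    L = I.copy()
    for _ in range(iters):
        nb = (np.roll(L, 1, 0) + np.roll(L, -1, 0) +
              np.roll(L, 1, 1) + np.roll(L, -1, 1))   # 4-neighbour sum
        # discrete Euler-Lagrange: (L - I) = lam * (nb - 4L)
        L = (I + lam * nb) / (1.0 + 4.0 * lam)
    return L

def reflectance(img, lam=1.0):
    """Divide out the estimated luminance to recover the reflectance."""
    L = estimate_luminance(img, lam)
    return img / np.maximum(L, 1e-6)
```

On a uniformly lit patch the estimated luminance equals the image itself, so the reflectance is constant; on an unevenly lit face, the slowly varying lighting is divided out while the facial detail remains.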

Fig. 6: Discretization lattice for the PDE.

6.3 Results: The performance of the image preprocessing algorithm on all five test features [the entire face and the eye, nose, and two chin regions] is shown in Fig. 7. The algorithm performs well even in the low-frequency regions: when the number of training samples per class is small, its performance on the left chin region is a close second to the case where the entire face is provided as the input. There is a dip in performance when the ninth sample of every person is included for training. This may be because the features of the ninth sample are so close to those of one of the first eight samples that they cause confusion.


Fig. 7: Performance of Image Preprocessing Method on different Test Features

7. EIGENFACES

7.1 Basic Idea of the Algorithm: The Eigenfaces approach [9] projects the image from the original image space onto the directions of maximum scatter. The larger the scatter, the less the information overlaps, so high-dimensional data can be reduced to a few scatter directions [see Fig. 8]. This is essentially a dimensionality reduction technique and a computationally efficient alternative to straightforward correlation: even with reduced dimensions, Eigenfaces can match the results that correlation achieves.

Fig. 8: Principal component decomposition (source: Internet). PCA 1 represents the direction of maximum scatter, which corresponds to the largest eigenvalue. PCA 2 does not give a good representation of the data and can therefore be discarded.

7.2 Implementation: A covariance matrix is constructed from the training set, and its eigendecomposition is computed. The eigenvectors corresponding to the largest eigenvalues define the directions of maximum scatter. We select some of these eigenvectors and project both the training and test data onto them. For example, in this project each 48x40 training image is converted into a 1920x1 column vector, so the original image space has dimension 1920. If the top 100 eigenvectors are chosen, the images are reduced to 100 dimensions. We then work in this reduced subspace, which still retains most of the information about the image features. Finally, test and training data are compared with a nearest-neighbor classifier.

7.3 Results: The performance of the Eigenfaces technique on all five test features [the entire face and the eye, nose, and two chin regions] is given in Fig. 9. This method performs equally well under all the test conditions; it is difficult to single out a best feature because the curves cross throughout the training-sample range.

Fig. 9: Performance of Eigenfaces Method on different Test Features
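The pipeline of Section 7.2 can be sketched as follows; the demo uses small synthetic "faces" rather than 48x40 images, and all function names are ours:

```python
import numpy as np

def fit_eigenfaces(X, k):
    """X: (n_samples, d) stack of vectorised faces (d = 48*40 = 1920 in
    the project). Returns the mean face and the top-k principal
    directions, computed via SVD of the centred data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]                      # (k, d) projection basis

def project(mean, basis, X):
    """Coefficients of the faces in the reduced eigenface subspace."""
    return (np.atleast_2d(X) - mean) @ basis.T

def nearest_neighbour(train_coeffs, train_labels, query_coeffs):
    """Nearest-neighbour label for one projected query."""
    dists = np.linalg.norm(train_coeffs - query_coeffs, axis=1)
    return train_labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
subj_a = 1.0 + 0.1 * rng.normal(size=(5, 20))    # toy 20-dim "faces"
subj_b = -1.0 + 0.1 * rng.normal(size=(5, 20))
X = np.vstack([subj_a, subj_b])
labels = ["A"] * 5 + ["B"] * 5
mean, basis = fit_eigenfaces(X, k=3)
coeffs = project(mean, basis, X)
query = project(mean, basis, 1.0 + 0.1 * rng.normal(size=20))[0]
print(nearest_neighbour(coeffs, labels, query))
```

Using the SVD of the centred data avoids forming the 1920x1920 covariance matrix explicitly, which is the standard trick when the number of samples is much smaller than the pixel count.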

8. Eigenfaces with Histogram Equalization:

8.1 Implementation: This is a heuristic technique. Histogram equalization was performed on all the training and testing images. Lighting changes bias the image histogram toward the higher gray values; histogram equalization outputs an image with an approximately uniform histogram. In other words, it enhances the contrast of the image by spreading the histogram over the entire range of gray levels. This method resulted in a slight improvement in performance compared with the baseline Eigenfaces approach.

8.2 Results: The performance of histogram-equalized Eigenfaces on all the test features is provided in Fig. 10.
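A minimal histogram-equalisation sketch via the classic CDF remapping (our own implementation, not necessarily the one used in the project):

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalisation of an 8-bit grayscale image: each gray
    level is remapped through the normalised cumulative histogram."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                 # first non-empty bin
    span = max(int(cdf[-1] - cdf_min), 1)     # avoid division by zero
    lut = np.clip(np.round((cdf - cdf_min) / span * 255.0), 0, 255)
    return lut.astype(np.uint8)[img]

dark = np.array([[50, 50], [60, 200]], dtype=np.uint8)
print(hist_equalize(dark))
```

After remapping, the darkest occurring gray level maps to 0 and the brightest to 255, which is exactly the contrast stretching the section describes.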

Fig. 10: Performance of Histogram Equalized Eigenfaces Method on different Test Features


9. PERFORMANCE COMPARISON

9.1 Entire Face: When the entire face was used, the image gradient method performs best, followed by the image preprocessing algorithm. Histogram equalization gives a very good improvement over the baseline Eigenfaces method, and the Bayesian approach performs similarly to histogram-equalized Eigenfaces. The results are shown in Fig. 11. This is a proof of principle that the direction of the image gradient is insensitive to changes in lighting conditions: even with only one training sample per person, the gradient method reaches around 70% accuracy, which is a very encouraging result.

Fig. 11: Comparison of performance of all algorithms on the entire-face feature

9.2 Parts of the Face:


The gradient-based method performs better when the eyes are used as the feature. For the nose and chin regions, the image preprocessing method clearly outperforms the other methods; for the chin regions, the Bayesian method also performs well. When the number of training samples per person exceeds eight, the performance of these algorithms approaches 80%. The comparison of the results is provided in Fig. 12.


Fig. 12: Comparison of performance of all algorithms on different parts of the face. Row 1: eyes and nose. Row 2: left and right chin regions.

10. CONCLUSIONS, DISCUSSIONS AND FUTURE WORK


Testing on the parts of the face has revealed some interesting details. From the graphs above, it is clear that these algorithms are capable of performing well even with low-frequency features [like the chin], provided we have sufficient training samples with illumination variations. The chin region is one of the hardest to classify because there is minimal variation in that region between individuals.

By testing on different parts of the face, we are effectively providing different features of the face to the algorithms, which suggests an obvious next step: combining the outputs of the classifiers. There are two ways to do so. The simplest is to provide the same feature to all the algorithms and take a majority vote over their outputs; such a majority-vote classifier is very likely to achieve better recognition rates than the individual classifiers. The second is to combine the outputs of the algorithms with different features taken as the input. For example, the gradient method performs best on the eyes, whereas the image preprocessing method is the top performer on the nose and chin, so the outputs of the gradient method [for the eyes] and the image preprocessing method [for the nose and chin] can be combined. [10] explores the different possibilities of combining classifier outputs with different feature inputs and concludes that combining classifiers in this way reduces the error in the output label. We can therefore hope for a boost in recognition rate by implementing various combination strategies.

Another possible improvement is to detect the high- and low-frequency regions of the subjects automatically. The model suggested in this project may apply only to human faces; if we were to recognize objects of a different shape, it could not be used. Automatic feature detection could measure the magnitude of the gradient across the image and extract high- and low-frequency features by thresholding.

We should also be aware that our training and test data are well controlled, in the sense that we consider properly cropped faces. In real-time face recognition, accurate face detection is vital. Moreover, we may not have samples of every subject under different illumination conditions; in some cases we may have only one training sample, possibly under a different pose or guise, and in others the test individual may not be part of the training set at all. The performance of the system in those cases will be poor.

In conclusion, we expect a substantial improvement in recognition rates [for illumination invariance] from combining classifiers. The state-of-the-art methods perform far better than the individual algorithms implemented in this project, but the proposed model of dividing the face into regions can be incorporated into those best-performing methods to further improve their recognition rates.
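The majority-vote combination discussed above can be sketched as follows (the tie-breaking rule is our own choice):

```python
from collections import Counter

def majority_vote(labels):
    """Combine classifier outputs by majority vote; ties go to the
    label produced by the earliest classifier among the tied ones."""
    counts = Counter(labels)
    best = max(counts.values())
    for lab in labels:              # first-listed classifier wins ties
        if counts[lab] == best:
            return lab

# three classifiers vote on the identity of one test face
print(majority_vote(["subj_07", "subj_07", "subj_12"]))  # subj_07
```

The feature-specific combination would work the same way, except that each vote comes from a different (classifier, facial-region) pair.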


11. REFERENCES

1. Y. Adini, Y. Moses, S. Ullman, "Face Recognition: The Problem of Compensating for Changes in Illumination Direction", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 1997.
2. W. Zhao, R. Chellappa, P. J. Phillips, A. Rosenfeld, "Face Recognition: A Literature Survey", ACM Computing Surveys, 2003.
3. P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection", IEEE Transactions on PAMI, 1997.
4. K.-K. Sung, T. Poggio, "Example-Based Learning for View-Based Human Face Detection", IEEE Transactions on PAMI, 1998.
5. A. Georghiades, D. Kriegman, P. Belhumeur, "Illumination Cones for Recognition Under Variable Lighting: Faces", Proceedings of the IEEE Conference on CVPR, 1998.
6. B. Moghaddam, T. Jebara, A. Pentland, "Bayesian Face Recognition", Pattern Recognition, Vol. 33, No. 11, pp. 1771-1782, November 2000.
7. H. F. Chen, P. N. Belhumeur, D. W. Jacobs, "In Search of Illumination Invariants", Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2000.
8. R. Gross, V. Brajovic, "An Image Preprocessing Algorithm for Illumination Invariant Face Recognition", 4th International Conference, AVBPA, 2003.
9. M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 1991.
10. J. Kittler, M. Hatef, R. P. W. Duin, J. Matas, "On Combining Classifiers", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.
11. V. Blanz, T. Vetter, "A Morphable Model for the Synthesis of 3D Faces", International Conference on Computer Graphics and Interactive Techniques, 1999.
12. B. Horn, "Robot Vision", MIT Press, 1986.
13. T. Sim, S. Baker, M. Bsat, "The CMU Pose, Illumination, and Expression (PIE) Database", Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002.