homepages.cae.wisc.eduhomepages.cae.wisc.edu/~ece539/project/s16/xue_rpt.docx · web viewthe orl...

ECE 539 Individual Project Report

Face Recognition based on 2D-PCA and Convolutional Neural Network

Name: HONGLIANG XUE

1. Topic/Problem

Face recognition is a common problem in machine learning, and the technology is already widely used in everyday life. For example, Facebook can automatically tag people's faces in images, and some mobile devices use face recognition to protect users' privacy.

The objective of this project is to examine and compare the performance of two face recognition methods: 2D-PCA and a Convolutional Neural Network.

In this project, the ORL face database [1] is used for training and test data, and MATLAB is used to implement both algorithms.

2. Database

The ORL database (distributed as 'The Database of Faces', formerly 'The ORL Database of Faces') is a set of face images taken between April 1992 and April 1994 at the ORL laboratory in Cambridge, later AT&T Laboratories Cambridge [1].

The database contains 10 different images of each of 40 distinct subjects. Images of the same subject may vary in lighting, angle, facial expression, and facial details. However, all images share the same black, homogeneous background.

The following is a snapshot of this database:[1]

Because the dataset is unlabeled, I manually labeled it into 40 classes corresponding to the 40 subjects, so each class has 10 data points. I then split the dataset into a training set and a testing set.
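The labeling and per-class split can be sketched in a few lines of Python. This is a minimal illustration, not the author's MATLAB code. It assumes the images are ordered subject by subject (10 per subject), and that the first `n_train` images of each subject go to training with the rest going to testing. The report does not say which images were chosen, and `split_per_class` is a hypothetical helper.

```python
def split_per_class(labels, n_train):
    """Return (train_idx, test_idx) with n_train samples of each class in training.

    Assumption: the first n_train samples of each class, in dataset order, are
    used for training and the remaining ones for testing.
    """
    seen = {}
    train_idx, test_idx = [], []
    for i, label in enumerate(labels):
        count = seen.get(label, 0)
        (train_idx if count < n_train else test_idx).append(i)
        seen[label] = count + 1
    return train_idx, test_idx

# ORL layout: 40 subjects x 10 images, labeled 1..40 by subject.
labels = [subject for subject in range(1, 41) for _ in range(10)]
train_idx, test_idx = split_per_class(labels, n_train=6)
print(len(train_idx), len(test_idx))  # 240 160
```

Varying `n_train` from 1 to 9 gives the different training/testing ratios used in the experiments.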

3. Approaches

(1) 2D-PCA (Two-Dimensional Principal Component Analysis)

The idea of 2D-PCA comes from 1D-PCA. As Yang et al. explain in their paper [2], 1D-PCA works only on 1D vectors, so each image matrix must first be transformed into a 1D vector before 1D-PCA can be applied. 2D-PCA, in contrast, works directly on image matrices. It uses the image matrices in the training set to form a covariance matrix, uses that matrix to extract the features of each image, and finally classifies the testing set.

In detail, 2D-PCA projects an image $A$, an $m \times n$ matrix, onto an $n$-dimensional projection vector $X$ by the linear transformation

$$Y = AX$$

Here $Y$ is the $m$-dimensional projected feature vector of image $A$. To reduce the probability of misclassification, we want the projected vectors $Y$ to be as scattered as possible. The scatter of $Y$ can be characterized by the trace of the covariance matrix of the feature vectors, denoted $\operatorname{tr}(\Sigma)$, where $\Sigma$ is the covariance matrix of $Y$:

$$\Sigma = E(Y - EY)(Y - EY)^T = E[AX - E(AX)][AX - E(AX)]^T = E[(A - EA)X][(A - EA)X]^T$$

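So the trace can be calculated as follows, using $\operatorname{tr}(uu^T) = u^T u$ for a column vector $u$:

$$\operatorname{tr}(\Sigma) = E\{[(A - EA)X]^T[(A - EA)X]\} = X^T\left[E(A - EA)^T(A - EA)\right]X$$

Let

$$G = E(A - EA)^T(A - EA)$$

This is the so-called image covariance matrix, which can be easily evaluated using the training images: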

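$$G = \frac{1}{M}\sum_{j=1}^{M}(A_j - \bar{A})^T(A_j - \bar{A})$$

where $A_j$ is the $j$-th training image, $M$ is the number of training images, and $\bar{A}$ is the mean of all training images. So we get

$$\operatorname{tr}(\Sigma) = X^T\left[E(A - EA)^T(A - EA)\right]X = X^T G X$$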
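The image covariance matrix and the identity $\operatorname{tr}(\Sigma) = X^T G X$ can be checked numerically. Below is a minimal pure-Python sketch, not the author's MATLAB code. Images are lists of rows, and the function names and the toy 2 x 3 images are hypothetical. It computes $G$ from the training images and confirms that $X^T G X$ equals the mean squared distance of the projections $Y_j = A_j X$ from their mean $\bar{A} X$.

```python
import math

def mean_image(images):
    """Element-wise mean of M equally sized m x n images (lists of rows)."""
    M, m, n = len(images), len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / M for c in range(n)] for r in range(m)]

def image_covariance(images):
    """G = (1/M) * sum_j (A_j - Abar)^T (A_j - Abar); the result is n x n."""
    abar = mean_image(images)
    M, m, n = len(images), len(abar), len(abar[0])
    G = [[0.0] * n for _ in range(n)]
    for img in images:
        D = [[img[r][c] - abar[r][c] for c in range(n)] for r in range(m)]
        for i in range(n):
            for j in range(n):
                # (D^T D)[i][j] = sum over rows r of D[r][i] * D[r][j]
                G[i][j] += sum(D[r][i] * D[r][j] for r in range(m)) / M
    return G

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def projected_scatter(images, X):
    """Mean squared distance of Y_j = A_j X from the mean projection Abar X."""
    ybar = matvec(mean_image(images), X)
    total = 0.0
    for img in images:
        y = matvec(img, X)
        total += sum((a - b) ** 2 for a, b in zip(y, ybar))
    return total / len(images)

# Hypothetical toy "images": three 2 x 3 matrices, and a unit projection vector X.
images = [
    [[1, 2, 0], [0, 1, 3]],
    [[2, 0, 1], [1, 1, 0]],
    [[0, 1, 2], [2, 0, 1]],
]
X = [0.6, 0.0, 0.8]
G = image_covariance(images)
xgx = sum(X[i] * G[i][j] * X[j] for i in range(3) for j in range(3))
print(math.isclose(xgx, projected_scatter(images, X)))  # True
```

Because $G$ is $n \times n$, ORL's 92 x 112-pixel images (112 rows, 92 columns) give a 92 x 92 matrix. By comparison, 1D-PCA on the flattened 10304-dimensional vectors would need a 10304 x 10304 covariance matrix.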

Maximizing $X^T G X$ over unit vectors $X$ gives the eigenvector of $G$ with the largest eigenvalue. We therefore choose $X_1, X_2, \ldots, X_d$, the orthonormal eigenvectors of $G$ corresponding to the $d$ largest eigenvalues, as the projection vectors, and then project:

$$Y_k = A X_k, \quad k = 1, 2, \ldots, d$$

$Y_1, Y_2, \ldots, Y_d$ are the principal components (feature vectors) of image $A$. The feature vectors of all training and testing images are obtained in the same way. To classify the testing data, we simply apply a nearest-neighbor classifier using Euclidean distance.
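The eigenvector selection, projection, and nearest-neighbor steps can be sketched end to end in pure Python. This is a minimal illustration under stated assumptions, not the author's MATLAB code:

- `jacobi_eigh` is a hypothetical stand-in for a library eigen-solver such as MATLAB's `eig`, implemented with cyclic Jacobi rotations.
- The distance between two images is assumed to be the sum of the Euclidean distances between corresponding feature vectors $Y_k$. This is one reasonable reading of "Euclidean distance" here.
- The toy 2 x 2 training images are made up.

```python
import math

def image_covariance(images):
    """G = (1/M) * sum_j (A_j - Abar)^T (A_j - Abar) for m x n images (lists of rows)."""
    M, m, n = len(images), len(images[0]), len(images[0][0])
    abar = [[sum(img[r][c] for img in images) / M for c in range(n)] for r in range(m)]
    G = [[0.0] * n for _ in range(n)]
    for img in images:
        D = [[img[r][c] - abar[r][c] for c in range(n)] for r in range(m)]
        for i in range(n):
            for j in range(n):
                G[i][j] += sum(D[r][i] * D[r][j] for r in range(m)) / M
    return G

def jacobi_eigh(S, max_sweeps=100, rel_tol=1e-22):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); column k of V is the unit eigenvector for eigenvalues[k].
    """
    n = len(S)
    a = [[float(x) for x in row] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], V

def top_projection_vectors(G, d):
    """Orthonormal eigenvectors X_1..X_d of G for the d largest eigenvalues."""
    vals, V = jacobi_eigh(G)
    order = sorted(range(len(vals)), key=lambda k: vals[k], reverse=True)[:d]
    return [[V[r][k] for r in range(len(V))] for k in order]

def features(A, Xs):
    """Feature vectors Y_k = A X_k, k = 1..d."""
    return [[sum(a * x for a, x in zip(row, X)) for row in A] for X in Xs]

def distance(F1, F2):
    # Assumed metric: sum of Euclidean distances between matching Y_k vectors.
    return sum(math.dist(y1, y2) for y1, y2 in zip(F1, F2))

def classify(test_img, train_feats, train_labels, Xs):
    """1-nearest-neighbor label for one test image."""
    f = features(test_img, Xs)
    best = min(range(len(train_feats)), key=lambda i: distance(f, train_feats[i]))
    return train_labels[best]

# Hypothetical toy data: two classes of 2 x 2 "images".
train = [[[5, 1], [1, 5]], [[4, 0], [0, 4]], [[1, 5], [5, 1]], [[0, 4], [4, 0]]]
labels = ["a", "a", "b", "b"]
Xs = top_projection_vectors(image_covariance(train), d=2)
train_feats = [features(A, Xs) for A in train]
print(classify([[4.5, 0.5], [0.5, 4.5]], train_feats, labels, Xs))  # a
```

Training features are computed once; each test image then needs only $d$ matrix-vector products and a linear scan. This is consistent with the short running times reported in the experiments.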

(2) CNN (Convolutional Neural Network)

Convolutional neural networks (CNNs) are applied to face recognition because, as Lawrence et al. state [3], the problem is ill-posed. Even if some models work very well in some cases, they may not generalize well to 'unseen' images. CNNs are able to incorporate some constraints and achieve some degree of shift and deformation invariance.

A CNN usually consists of many layers, each of which contains one or more planes.

Typical layers include an input layer, convolutional layers, subsampling (e.g., max-pooling) layers, and a fully connected output layer. The extracted features are often passed to an MLP or another type of classifier for training.

Convolutional layers are the most important part of a CNN. As the name suggests, these layers use convolution to detect features in the image. A convolutional layer usually contains multiple planes, each using a different convolution kernel, so that multiple features can be detected.
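A minimal pure-Python sketch of one convolutional layer with two planes follows. Each plane has its own kernel, so each plane responds to a different feature. The kernels, the toy 4 x 4 image, and the sigmoid activation are assumptions for illustration; the report does not give these details. `conv2d_valid` flips the kernel (true convolution) and keeps only positions where the kernel fits inside the image.

```python
import math

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (kernel flipped); output is (H-kh+1) x (W-kw+1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    flipped = [row[::-1] for row in kernel[::-1]]
    return [[sum(flipped[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def conv_layer(image, kernels, biases):
    """One output plane per kernel; the sigmoid activation is an assumption."""
    return [[[1.0 / (1.0 + math.exp(-(v + b))) for v in row]
             for row in conv2d_valid(image, k)]
            for k, b in zip(kernels, biases)]

# Hypothetical 4 x 4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
vertical = [[1, -1], [1, -1]]
horizontal = [[1, 1], [-1, -1]]
print(conv2d_valid(image, vertical))    # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
print(conv2d_valid(image, horizontal))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
planes = conv_layer(image, [vertical, horizontal], [0.0, 0.0])
```

The vertical-edge plane responds only at the edge column, while the horizontal-edge plane stays at zero. In a real CNN the kernel values are learned by backpropagation rather than chosen by hand.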

In a CNN, once a feature has been detected in a convolutional layer, its exact position in the image becomes less important. A subsampling layer therefore follows, which uses max-pooling, averaging, or another method to perform local averaging and subsampling.
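Position insensitivity after subsampling can be shown with a small pure-Python sketch. The helper `subsample`, its non-overlapping 2 x 2 windows, and the toy planes are hypothetical; the report does not state the pooling size. The same feature response at two nearby positions within one window produces an identical max-pooled output.

```python
def subsample(plane, size=2, mode="max"):
    """Non-overlapping size x size pooling ("max" or "mean").

    Rows/columns that do not fill a complete window are dropped.
    """
    pooled = []
    for r in range(0, len(plane) - size + 1, size):
        row = []
        for c in range(0, len(plane[0]) - size + 1, size):
            window = [plane[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled

# The same feature response (9) at two nearby positions inside one 2 x 2 window.
a = [[0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 0, 0, 0], [9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(subsample(a), subsample(b))  # [[9, 0], [0, 0]] [[9, 0], [0, 0]]
print(subsample(a, mode="mean"))    # [[2.25, 0.0], [0.0, 0.0]]
```

Each pooled plane is also a quarter of the original size, which reduces the computation in the following layers.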

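Each unit in a convolutional plane connects only to a small local region of the previous layer, and all units in a plane share the same kernel weights. This connection strategy means a CNN needs far fewer weights to be learned than a fully connected network of comparable size.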
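A rough parameter count makes the saving concrete. This sketch assumes hypothetical layer sizes: 4 planes with 5 x 5 kernels applied to one 112 x 92 ORL image. It compares them with a fully connected layer that produces the same number of output units. `dense_params` and `conv_params` are illustrative helpers, not toolbox functions.

```python
def dense_params(n_in, n_out):
    """Fully connected: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

def conv_params(k_h, k_w, in_planes, out_planes):
    """Convolutional: each output plane shares one k_h x k_w kernel per input plane, plus a bias."""
    return (k_h * k_w * in_planes + 1) * out_planes

# ORL images are 112 rows x 92 columns; 4 planes of 5 x 5 kernels is a hypothetical choice.
h, w, k, planes = 112, 92, 5, 4
out_units = (h - k + 1) * (w - k + 1) * planes  # 108 * 88 * 4 = 38016
print(dense_params(h * w, out_units))  # 391754880
print(conv_params(k, k, 1, planes))    # 104
```

Both layers produce the same 38016 outputs, but the convolutional layer needs 104 weights instead of about 392 million.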

The whole network is generally trained by backpropagation with gradient descent, in the same way as an MLP.

4. Experiment results

(1) 2D-PCA (Two-Dimensional Principal Component Analysis)

2D-PCA is not a complex algorithm, so I wrote my own MATLAB code to implement it.

I split the ORL dataset into training and testing sets with different ratios, and also tried different values of d (the number of feature vectors extracted from each image).

Note that I did not apply cross-validation in the training process. I did not realize this at the beginning, and by the time I did, I could no longer modify the code before the deadline.

Here are the plots of my experimental results:

Figure 1: Accuracy (%) versus number of training images per class (1 to 9), for d = 2, 4, 8, 16, 32

As these plots show, the 2D-PCA method is quite fast: each run takes only about 2 seconds. The accuracy is around 90% to 95% when there are at least 6 training samples per class, which is quite good. Accuracy increases with the number of training samples, as Figure 1 shows. For larger d, the running time is longer, as Figure 2 shows. For larger d, the accuracy should also be higher, because more features are extracted for comparison. However, this effect is not very obvious in Figure 1. I think this is because the dataset is not very big, so even a small d gives quite high accuracy.

Figure 2: Running time (s) versus number of training images per class (1 to 9), for d = 2, 4, 8, 16, 32

(2) CNN (Convolutional Neural Network)

To implement the CNN in MATLAB, I used deepLearnToolbox-master, the DeepLearnToolbox written by Rasmus Berg Palm [4]. (Many thanks to Rasmus for sharing the toolbox on GitHub.)

The biggest difficulty in the CNN experiments was choosing the network parameters. These include the number of convolutional layers, the convolution window size, the number of planes in each convolutional layer, the subsampling rate in each subsampling layer, and the learning rate.

I tried many combinations of parameters and finally chose the same configuration that Lawrence et al. used in their paper [3], because it gave the best results (as the figure from [3] shows).

For the learning rate, I simply used a constant value of 0.3. Lawrence et al., by contrast, made the learning rate a function of the epoch number, decreasing from 0.5 to 0.01 as training progressed.
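The two learning-rate strategies can be compared with a small sketch. The exact schedule used by Lawrence et al. is not given in this report. `decaying_lr` therefore assumes a hypothetical geometric decay from 0.5 at the first epoch to 0.01 at the last one. `sgd_step` shows where the rate enters a plain gradient-descent update; the names are illustrative, not DeepLearnToolbox functions.

```python
def constant_lr(epoch, lr=0.3):
    """Constant learning rate, as used in this project (0.3)."""
    return lr

def decaying_lr(epoch, n_epochs, lr_start=0.5, lr_end=0.01):
    """Hypothetical geometric decay from lr_start at epoch 0 to lr_end at the last epoch."""
    if n_epochs <= 1:
        return lr_start
    frac = epoch / (n_epochs - 1)
    return lr_start * (lr_end / lr_start) ** frac

def sgd_step(weights, grads, lr):
    """One gradient-descent update: w <- w - lr * dE/dw."""
    return [w - lr * g for w, g in zip(weights, grads)]

n_epochs = 5
schedule = [decaying_lr(e, n_epochs) for e in range(n_epochs)]
print([round(x, 4) for x in schedule])
```

A decaying schedule takes large steps early and small steps near convergence. A constant rate has to compromise between the two, which may explain part of the accuracy gap discussed below.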

This experiment achieves about 85% accuracy on the testing data, somewhat lower than the 94% reported in the paper. I am not sure whether the learning rate is responsible, but it is probably one of the reasons.

In my other trials, I found that changing the window size, the number of planes, or the subsampling rate had no significant effect unless the change was very large. I also tried adding one more convolutional layer and one more subsampling layer, but this took much more time and the accuracy even decreased. I have not figured out the reason.

Training time also makes the network hard to tune. Training the whole network takes about one hour, so I could not try many configurations.

5. Conclusions

1. 2D-PCA

(1) Advantages: simple algorithm, easy to use, quite high accuracy

(2) Disadvantages: requires small variation among sample images; its performance on other, larger datasets is hard to predict

2. CNN

(1) Advantages: more stable performance; able to handle much larger datasets

(2) Disadvantages: difficult to find the optimal network parameters; training takes a lot of time

According to the results of this project, both algorithms have pros and cons, and it is hard to say which one is better. However, the face recognition problem can be roughly divided into two types:

(1) We need to find a person in a huge dataset, and the system returns a list of the most likely people. This does not need to be done in real time. An example is a police database.

(2) We need to identify a person or a group of people using a relatively small dataset, but it must be done in real time. Examples are a security monitoring system or a mobile privacy protection system.

It is clear that CNN is better suited to the first case and 2D-PCA to the second. We therefore conclude that the method should be chosen according to the situation.

6. Future work

1. 2D-PCA

(1) Modify the code so that cross-validation is used in the algorithm

(2) Use other datasets, especially larger ones, to see its performance

2. CNN

(1) Continue tuning the network to see whether it can perform better

(2) Read through the MATLAB toolbox code and try to write my own

(3) Also use other datasets to see its performance

7. References

1. The ORL Database of Faces, AT&T Laboratories Cambridge, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

2. J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang, "Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, January 2004.

3. S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face Recognition: A Convolutional Neural-Network Approach," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98-113, January 1997.

4. R. B. Palm, DeepLearnToolbox, GitHub, https://github.com/rasmusbergpalm/DeepLearnToolbox