
Page 1:

B. Mohabbati, R. Ebrahimpour, S. Kasaei, E. Kabir

Department of Mathematics and Computer Science, Amirkabir University of Technology
School of Cognitive Sciences, Institute for Studies in Theoretical Physics and Mathematics (IPM)
Computer Engineering Department, Sharif University of Technology
Department of Electrical and Computer Engineering, Tarbiat Modarres University

Tehran, Iran

Neural Networks Ensembles For Face Recognition

Page 2:

Outline

• Face Recognition
  – What is face recognition?
  – Its applications
  – Different approaches
  – Neural Networks approach
  – Combining Classifiers
  – Experimental results
  – Conclusions

Page 3:

What is Face Recognition?

Biometrics: the digital analysis, using cameras or scanners, of biological characteristics such as facial structure, fingerprints, and iris patterns, in order to match profiles against databases of people.

Face recognition works with the most obvious individual identifier: the human face.

• A set of two tasks:
  – Face identification: given a face image that belongs to a person in a database, tell whose image it is.
  – Face verification: given a face image that might not belong to the database, verify whether it is from the person it is claimed to be.

Page 4:

Applications

Applications in security:
• Authentication
• Identification

Other applications: multimedia management, smart cards, and surveillance of airports and railway stations, stadiums, public transportation, government offices, and businesses of all kinds. It offers a quick way to discover criminals.

[Diagram] A face recognition system: given face information for each person in a fixed domain, it identifies the person whose face is in the input image.

Page 5:

Different Approaches

Feature-based approaches:
• Geometric features
• Euclidean distance
• Graph matching
• Template matching
• Transform Markov Models
• Hidden Markov Models
• …

Holistic approaches:
• Neural Networks
• Eigenfaces: Principal Component Analysis (PCA)
• Fisherfaces: Linear Discriminant Analysis (LDA)

Neural computing provides technical information-processing methods that are similar to the way information is processed in biological systems, such as the human brain.

Page 6:

Biological inspirations

• Some numbers:
  – The human brain contains about 10 billion nerve cells (neurons).
  – Each neuron is connected to the others through about 10,000 synapses.

• Properties of the brain:
  – It can learn and reorganize itself from experience.
  – It adapts to the environment.
  – It is robust and fault tolerant.

Page 7:

Biological neuron

• A neuron has:
  – a branching input (the dendrites)
  – a branching output (the axon)
• Information circulates from the dendrites to the axon via the cell body.
• Axons connect to dendrites via synapses:
  – Synapses vary in strength.
  – Synapses may be excitatory or inhibitory.

Page 8:

What is an artificial neuron?

• Definition: a nonlinear, parameterized function with a restricted output range (see the equation below).

[Diagram] A neuron computes output y from inputs x1, x2, …, xn using weights w0 (bias), w1, w2, …, wn.

Activation functions: linear, logistic.
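The diagram corresponds to the standard artificial-neuron equation (assuming, as is conventional, that w0 is a bias weight on a constant input of 1):

\[
y = \varphi\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right),
\]

where \(\varphi\) is the activation function (linear or logistic).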

Perceptron

[Figure] Two classes of points ('+' marks) separated by a line.

• Rosenblatt (1962)
• Linear separation
• Inputs: a vector of real values
• Outputs: 1 or -1

The perceptron algorithm converges if the examples are linearly separable; a minimal sketch of the learning rule follows.
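The sketch below is an illustrative Python reconstruction of the classic perceptron rule under the conventions above (real-valued inputs, outputs in {-1, +1}), not code from the presentation:

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, epochs=100):
    """X: (n_samples, n_features) real inputs; t: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])          # weights
    b = 0.0                           # bias (w0)
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(X, t):
            y = 1 if w @ x + b >= 0 else -1   # linear separation
            if y != target:                   # update only on mistakes
                w += lr * target * x
                b += lr * target
                mistakes += 1
        if mistakes == 0:             # converged: data linearly separated
            break
    return w, b
```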

Page 9:

Learning

• Learning is the procedure of estimating the parameters of the neurons so that the whole network can perform a specific task.

• Two types of learning:
  – Supervised learning: the desired response of the neural network for particular inputs is well known. A "professor" provides examples and teaches the neural network how to fulfill a certain task.
  – Unsupervised learning: the idea is to group typical input data according to resemblance criteria that are unknown a priori (data clustering). There is no need for a professor; the network finds the correlations between the data by itself. Example: Kohonen feature maps.

• The (supervised) learning process:
  – Present the network with a number of inputs and their corresponding outputs.
  – See how closely the actual outputs match the desired ones.
  – Modify the parameters to better approximate the desired outputs.

Page 10:

Multi-Layer Perceptron (MLP)

• One or more hidden layers
• Sigmoid activation functions

[Diagram] Input data → 1st hidden layer → 2nd hidden layer → output layer (Class 1, Class 2).

Page 11:

Error Back-Propagation (MLP learning method)

• The back-propagation algorithm is used to train the MLP.
• It uses gradient descent to minimise the squared error between the actual and desired outputs; the error is summed over all inputs.
• Credit assignment: the error of an output unit is known directly, while a hidden unit receives a share of the error of the units it feeds (if the jth node is an output unit, its error term comes straight from the target; see the update rule below).
• A learning rate scales each weight change; a momentum term smooths the weight changes over time.
• The error surface can have local minima, which trap gradient descent.
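A hedged reconstruction of the update rule the slide alludes to, with learning rate \(\eta\), momentum \(\alpha\), activation \(\varphi\), and net input \(v_j\):

\[
\Delta w_{ij}(n) = \eta\,\delta_j\,y_i + \alpha\,\Delta w_{ij}(n-1),
\qquad
\delta_j =
\begin{cases}
(t_j - y_j)\,\varphi'(v_j) & \text{if the } j\text{th node is an output unit},\\
\varphi'(v_j)\sum_k \delta_k\,w_{jk} & \text{if it is a hidden unit (credit assignment).}
\end{cases}
\]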

Page 12:

Back-Propagation Algorithm

1. Initialise the weights.
2. Present an input pattern and its target.
3. Calculate the actual output given this input.
4. From the last layer, work backwards, updating the weights.

Targets are "binary digit" (one-hot) codes: input pattern 1 → target output 1 = 1000, input pattern 2 → target output 2 = 0100, …, input pattern n. A numpy sketch of one training step follows.
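The sketch below walks through the four steps above for one pattern; the single hidden layer, sigmoid units, squared error, online updates, and the 40:80:40 sizes (echoing a topology from the results) are illustrative assumptions, not details taken from the presentation:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 40, 80, 40, 0.5

# Step 1: initialise weights with small random values.
W1 = rng.normal(0.0, 0.1, (n_hid, n_in))
W2 = rng.normal(0.0, 0.1, (n_out, n_hid))

def train_step(x, t):
    """Steps 2-4 for one pattern. x: input vector; t: one-hot target code."""
    global W1, W2
    # Steps 2-3: present the input and calculate the actual output.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Step 4: from the last layer, work backwards, updating the weights.
    delta_out = (t - y) * y * (1.0 - y)              # output-unit error terms
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)   # credit assigned to hidden units
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)
    return np.sum((t - y) ** 2)                      # squared error for this pattern
```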

Page 13:

Properties of Neural Networks

• Supervised networks are universal approximators.
• Theorem: a Multi-Layer Perceptron (MLP) with only one hidden layer has the capability to act as a universal approximator (Hornik et al., 1989).
• Types of approximators:
  – Linear approximators (e.g., polynomials): for a given precision, the number of parameters grows exponentially with the number of variables.
  – Non-linear approximators (NN): the number of parameters grows linearly with the number of variables.
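In symbols, the one-hidden-layer MLP of the theorem computes a function of the form below, and Hornik et al.'s result says such sums can approximate any continuous function on a compact set to arbitrary precision, given enough hidden units K:

\[
f(x) = c_0 + \sum_{k=1}^{K} c_k\,\varphi\!\left(w_k^{\top}x + b_k\right).
\]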

[Figure] A three-layer network can form arbitrary decision regions for classes A and B (complexity limited by the number of nodes).

Page 14:

Face Recognition Using Neural Networks

Problems
• Work has been done on very selective sets of face images, mostly:
  – in an upright position
  – with lighting and background controlled
  – either in frontal or profile view
  – with no occlusions or facial hair

Standard face recognition architecture:
[Diagram] Image data (train/test) → domain transformation (preprocessing) → neural network model (classifier) → classified face result.

Requirements
• Accurate
• Efficient
• Light invariant
• Rotation invariant
and …

Page 15:

Why Preprocessing?
• The curse of dimensionality:
  – The quantity of training data needed grows exponentially with the dimension of the input space.
  – In practice, we only have a limited quantity of input data, so increasing the dimensionality of the problem leads to a poor representation of the mapping.
• Preprocessing has a huge impact on the performance of neural networks.

Preprocessing methods
• Normalization:
  – Inputs of the neural net are often of different types with different orders of magnitude (e.g., pressure, temperature).
  – It is necessary to normalize the data so that they have the same impact on the model.
  – Translate input values so that they can be exploited by the neural network (a minimal sketch follows this list).
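A minimal sketch of the normalization step, assuming the standard zero-mean, unit-variance scaling rather than the slide's exact recipe:

```python
import numpy as np

def normalize(X, eps=1e-12):
    """Scale each input component to zero mean and unit variance,
    so features with large magnitudes (e.g. pressure vs. temperature)
    do not dominate the model."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)
```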

• Component reduction:
  – Sometimes the number of inputs is too large to be exploited.
  – Reducing the number of inputs simplifies the construction of the model.
  – Goal: a better representation of the data, giving a more synthetic view without losing relevant information.
  – Reduction methods: PCA, CCA (Curvilinear Components Analysis), etc.

Page 16:

Principal Components Analysis (PCA)

• Principle:
  – A linear projection method to reduce the number of parameters.
  – Transforms a set of correlated variables into a new set of uncorrelated variables.
  – Maps the data into a space of lower dimensionality.
  – A form of unsupervised learning.

• Properties:
  – It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables.
  – The new axes are orthogonal and represent the directions of maximum variability.

Faces are viewed as vectors: suppose a data set X with N data points, each of dimension P.

Page 17:

The PCA Eigenface

Here we summarize the method for finding the principal factors. For a training set \(X_1, X_2, \ldots, X_N\):

• Average face: \(\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i\)
• Difference vectors (zero mean): \(X_M = X_i - \bar{X}\)
• Covariance matrix: \(C = \frac{1}{N}\sum_{i=1}^{N} X_M X_M^{\top}\)
• Choose the i largest eigenvalues, where "i is the inherent dimensionality of the subspace governing the original images"; the corresponding eigenvectors are the first principal components (eigenfaces). A numpy sketch follows.
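A numpy sketch of these steps; the small N x N "snapshot" trick for computing the covariance eigenvectors is a standard eigenface device and an assumption here, not something stated on the slide:

```python
import numpy as np

def eigenfaces(X, m):
    """X: (N, P) matrix of N training images flattened to P pixels.
    Returns the average face and the m leading eigenfaces."""
    mean = X.mean(axis=0)                 # average face
    Phi = X - mean                        # difference vectors (zero mean)
    # Eigenvectors of the small N x N matrix Phi Phi^T stand in for those
    # of the huge P x P covariance matrix.
    vals, vecs = np.linalg.eigh(Phi @ Phi.T)
    order = np.argsort(vals)[::-1][:m]    # keep the m largest eigenvalues
    U = Phi.T @ vecs[:, order]            # map back to image space
    U /= np.linalg.norm(U, axis=0)        # unit-length eigenfaces
    return mean, U

# Projection of a face x onto the subspace: weights = U.T @ (x - mean)
```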

Page 18:

The PCA Eigenface

[Figure] A face is reconstructed as a weighted combination of eigenfaces, e.g. \(x \approx 0.9571\,u_1 - 0.1945\,u_2 + 0.0461\,u_3 + 0.586\,u_4\).

[Diagram] Training image set → adjusted (zero-mean) training image set → image mean and PCA → N eigenfaces → M-PCA feature vector.

Page 19:

[Diagram] Training image set → adjusted training image set → image mean and PCA → N eigenfaces → M-PCA features → combining classifiers: training the classifiers.

Page 20:

Neural Networks Ensembles

There are two main approaches to committee machines (combining classifiers, classifier fusion):
1. Static committee machines (ensemble)
2. Dynamic committee machines (modular)

Static committee machines are ones where the responses of the experts are combined without the combining mechanism seeing the input. Ensembles of classifiers are trained on different or similar data, using different or similar features; the classifiers are run simultaneously and their outputs are merged into one compound classification. The two main methods are ensemble averaging and boosting.

[Diagram] Input x(n) feeds Expert 1 … Expert L; their outputs y1(n) … yL(n) go to a combiner, which produces the output.

Page 21:

Dynamic committee machines: the input signal is directly involved in combining the outputs, e.g., mixtures of experts and hierarchical mixtures of experts. A gating network decides the weighting of each expert network.

[Diagram] Input x(n) feeds Expert 1 … Expert L and a gating network; the experts' outputs y1(n) … yL(n) are weighted by the gating outputs g1(n) … gL(n) to produce the output.
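The combination the diagram depicts can be written as follows; the softmax form of the gate is the usual mixture-of-experts choice and is an assumption here:

\[
y(n) = \sum_{i=1}^{L} g_i(n)\,y_i(n),
\qquad
g_i(n) = \frac{e^{u_i(n)}}{\sum_{j=1}^{L} e^{u_j(n)}},
\]

where \(u_i(n)\) are the gating network's raw outputs for input \(x(n)\).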

Page 22:

Some ensemble methods:

Majority voting: every voter has one vote that can be cast for any one candidate; the candidate that receives the majority (i.e., more than half) of the votes wins the election.
Average voting: each voter scores every candidate, and the candidate with the highest average score wins the election.
Product-rule voting: each voter gives a confidence value for each candidate; all confidence values are multiplied per candidate, and the candidate with the highest confidence product wins.

Classifiers | Support for W1 | Support for W2 | Decision
C1          | 0.8            | 0.2            | W1
C2          | 0.4            | 0.6            | W2
C3          | 0.3            | 0.7            | W2
C4          | 0.6            | 0.4            | W1
C5          | 0.3            | 0.7            | W2
MAJ         | -              | -              | W2
AVR         | 0.48           | 0.52           | W2
PRO         | 0.01728        | 0.02352        | W2

[Diagram] Classifiers C1 … C5 feed a combiner, which decides between W1 and W2.
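A short Python sketch reproducing the table above with the three fusion rules (illustrative, not the presentation's code):

```python
import numpy as np

support = np.array([   # rows: classifiers C1..C5; columns: classes W1, W2
    [0.8, 0.2],
    [0.4, 0.6],
    [0.3, 0.7],
    [0.6, 0.4],
    [0.3, 0.7],
])

votes = support.argmax(axis=1)                   # each classifier's own decision
maj = np.bincount(votes, minlength=2).argmax()   # majority voting              -> W2
avg = support.mean(axis=0)                       # average voting: [0.48, 0.52] -> W2
pro = support.prod(axis=0)                       # product rule: [0.01728, 0.02352] -> W2
```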

Page 23:

Proposed method:

Consider the ensemble network system shown in the diagram below: a set of M "level-0" networks, N1 to Nm, whose outputs are combined using a "level-1" network N*.

Training rule: the idea is to train the "level-0" networks first and then examine their behaviour during generalization. This provides a new training set, which is used to train the "level-1" network (a sketch follows the diagram).

[Diagram] The M-PCA feature vector yM-PCA feeds level-0 networks N1, N2, …, Nm; their outputs y1, y2, …, ym feed the level-1 network N*.
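A hedged sketch of this two-level training rule, using scikit-learn MLPs as stand-ins for the level-0 and level-1 networks; the held-out split, hidden sizes, and number of experts are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_stack(X_train, y_train, X_held, y_held, n_experts=5):
    # Train the level-0 networks first (here they differ only in initial weights).
    level0 = []
    for seed in range(n_experts):
        net = MLPClassifier(hidden_layer_sizes=(80,), random_state=seed, max_iter=500)
        level0.append(net.fit(X_train, y_train))
    # Their outputs on held-out data ("behaviour during generalization")
    # form the new training set for the level-1 network.
    Z = np.hstack([net.predict_proba(X_held) for net in level0])
    level1 = MLPClassifier(hidden_layer_sizes=(40,), max_iter=500).fit(Z, y_held)
    return level0, level1

def predict_stack(level0, level1, X):
    Z = np.hstack([net.predict_proba(X) for net in level0])
    return level1.predict(Z)
```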

Page 24:

Experimental Results and Discussion

Face Database

We used the ORL database, which contains a set of faces taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK. There are 10 different images of each of 40 distinct subjects, with variations in facial expression (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for some tilting and rotation of up to about 20 degrees and some variation in scale of up to about 10 percent. The images are grayscale with a resolution of 92x112 pixels.

[Figure] The set of 10 images for one subject; considerable variation can be seen.

Page 25:

Experimental Results and Discussion

[Diagram] Step 1: training image set → adjusted (zero-mean) training image set → image mean and PCA → N eigenfaces. Step 2: M-PCA features → combining classifiers. Step 3: testing image set → zero-mean testing images → M-PCA features → combining classifiers → classified face.

Page 26:

Percentage of correct classification on the test set (200 faces, 5 repetitions). Different net topologies for the same input data:

Principal Components | Net Topology | Best (%) | Average (%)
1-25                 | 25:40:40     | 89       | 87.7
1-30                 | 30:80:40     | 88.5     | 87.9
1-35                 | 35:80:40     | 90       | 89.7
1-40                 | 40:40:40     | 89.5     | 88.9
1-50                 | 50:40:40     | 89.5     | 86.6
1-100                | 100:60:40    | 86       | 84.1
1-200                | 200:80:40    | 84       | 82.2

Principal Components | Net Topology | Best (%) | Average (%)
1-40                 | 40:40:40     | 89.5     | 88.9
1-40                 | 40:20:40     | 89.5     | 83.8
1-40                 | 40:80:40     | 90.5     | 89.1
1-40                 | 40:100:40    | 89       | 87.6

Page 27:

Correct-rate percentage of some manually selected committees:

Committee          | Averaging | Majority voting | Proposed method
40:20:40 (5 nets)  | 90.5      | 88.15           | 93.35
40:40:40 (5 nets)  | 88.35     | 88.5            | 95
40:80:40 (5 nets)  | 92        | 90              | 97.15

[Diagram] The M-PCA feature vector yM-PCA feeds level-0 networks MLP1, MLP2, …, MLP5; their outputs y1, y2, …, y5 feed the level-1 network MLP*.

Page 28:

Conclusion

There are three main strategies for making independent individual classifiers:

1. Using different procedures, for example different kinds of classifiers, or different parameters in an identical procedure: different MLPs can be trained with different initial weights, learning parameters, or numbers of nodes.

2. Using different representations of the input (or feature sets). Although this approach is practically effective in improving generalization, managing the procedure and analyzing it is more complicated (K.M. Ali and M.J. Pazzani, 1995).

3. Using an identical representation but different training sets.

Page 29:

Acknowledgments

I would like to thank Prof. Shahshahani for helpful comments, and the Olivetti Research Laboratory for developing and maintaining the ORL database.

Page 30:

Thanks for your attention.