Neural Networks Ensembles for Face Recognition

B. Mohabbati, R. Ebrahimpour, S. Kasaei, E. Kabir
Department of Mathematics and Computer Science, Amirkabir University of Technology
School of Cognitive Sciences, Institute for Studies in Theoretical Physics and Mathematics (IPM)
Computer Engineering Department, Sharif University of Technology
Department of Electrical and Computer Engineering, Tarbiat Modarres University
Tehran, Iran
Outline
• Face Recognition
  – What is face recognition?
  – Its applications
  – Different approaches
  – Neural networks approach
  – Combining classifiers
  – Experimental results
  – Conclusions
What is Face Recognition?

Biometrics: digital analysis, using cameras or scanners, of biological characteristics such as facial structure, fingerprints, and iris patterns, in order to match profiles against databases of people. Face recognition works with the most obvious individual identifier: the human face.

• A set of two tasks:
  – Face identification: given a face image that belongs to a person in a database, tell whose image it is.
  – Face verification: given a face image that might not belong to the database, verify whether it is from the person it is claimed to be.
Applications

Applications in security:
• Authentication
• Identification

A quick way to discover criminals.

Further application areas:
• Multimedia management
• Security and smart cards
• Surveillance
• Airports and railway stations
• Stadiums
• Public transportation
• Government offices
• Businesses of all kinds

[Diagram: a face recognition system maps the stored face information for each person in a fixed domain to the identity of the person whose face is in the input image.]
Different Approaches

Feature-based approaches:
• Geometric features
• Euclidean distance
• Graph matching
• Template matching
• Transform / Markov models
• Hidden Markov models
• …

Holistic approaches:
• Eigenfaces: Principal Component Analysis (PCA)
• Fisherfaces: Linear Discriminant Analysis (LDA)
• Neural networks

Neural computing provides information-processing methods that are similar to the way information is processed in biological systems, such as the human brain.
Biological inspirations
• Some numbers…
  – The human brain contains about 10 billion nerve cells (neurons)
  – Each neuron is connected to other neurons through about 10,000 synapses

• Properties of the brain
  – It can learn and reorganize itself from experience
  – It adapts to its environment
  – It is robust and fault tolerant
Biological neuron
• A neuron has
  – A branching input structure (the dendrites)
  – A branching output structure (the axon)
• Information flows from the dendrites to the axon via the cell body
• The axon connects to the dendrites of other neurons via synapses
  – Synapses vary in strength
  – Synapses may be excitatory or inhibitory
What is an artificial neuron?

• Definition: a non-linear, parameterized function with a restricted output range

[Diagram: inputs x1, x2, …, xn are weighted by w1, w2, …, wn (plus a bias w0) and passed through an activation function to produce the output y.]

Activation functions
• Linear
• Logistic (sigmoid)

Perceptron
• Rosenblatt (1962)
• Linear separation
• Inputs: a vector of real values
• Outputs: 1 or -1

The perceptron algorithm converges if the examples are linearly separable.
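The perceptron described above can be sketched in a few lines. This is a minimal illustration on made-up, linearly separable toy data (not from the slides): the update rule nudges the weights toward any misclassified example.

```python
# Minimal Rosenblatt perceptron sketch: real-valued inputs, outputs in
# {+1, -1}; weights are updated only when an example is misclassified.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Train a perceptron; returns (weights, bias)."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in zip(samples, labels):
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if y != t:  # misclassified: move the boundary toward x
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

# Toy problem, separable by the sign of x0 - x1 (illustrative data):
X = [(2.0, 1.0), (1.0, 3.0), (3.0, 0.5), (0.5, 2.0)]
T = [1, -1, 1, -1]
w, b = train_perceptron(X, T)

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

Because this toy set is linearly separable, the loop settles on a separating boundary within a few epochs, as the convergence theorem promises.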
Learning
• Learning is the procedure of estimating the parameters of the neurons so that the whole network can perform a specific task

• Two types of learning:
  – Supervised learning
  – Unsupervised learning

• Supervised learning
  – The desired response of the network to particular inputs is well known; a "professor" provides examples and teaches the network how to fulfil a certain task
  – Present the network with a number of inputs and their corresponding outputs
  – See how closely the actual outputs match the desired ones
  – Modify the parameters to better approximate the desired outputs

• Unsupervised learning
  – Idea: group typical input data according to resemblance criteria unknown a priori (data clustering)
  – No professor is needed: the network finds the correlations between the data by itself
  – Example of such networks: Kohonen feature maps
Multi-Layer Perceptron (MLP)
• One or more hidden layers
• Sigmoid activation functions

[Diagram: input data feeds a 1st hidden layer, a 2nd hidden layer, and an output layer with one unit per class (Class 1, Class 2).]
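A forward pass through such an MLP can be sketched as follows. The layer sizes here are illustrative stand-ins, not taken from the slides:

```python
# Sketch of an MLP forward pass: each layer applies a weight matrix,
# a bias, and a sigmoid activation, as on the slide.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """layers: list of (W, b) pairs, applied in order with sigmoid."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]  # input, 1st hidden, 2nd hidden, output (2 classes)
layers = [(rng.standard_normal((m, n)) * 0.5, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(rng.standard_normal(4), layers)  # one output per class
```

Because every unit is sigmoidal, each output lands in (0, 1) and can be read as a per-class score.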
MLP Learning Method

Error back-propagation:
• Used to train the MLP
• Uses gradient descent to minimise the squared error between actual and desired outputs
• The error is summed over all inputs
• The error is propagated backwards through the layers, assigning credit to each weight; if the j-th node is an output unit, its error term is computed directly from the target
• A learning rate controls the step size; a momentum term smooths the weight changes over time
• The error surface can have local minima, which trap gradient descent
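Written out, and assuming the standard gradient-descent update with momentum (the slide names the ingredients but not the formula), the weight change at step $t$ is

$$\Delta w_{ij}(t) = -\eta\,\frac{\partial E}{\partial w_{ij}} + \alpha\,\Delta w_{ij}(t-1)$$

where $\eta$ is the learning rate, $\alpha$ the momentum coefficient, and $E$ the squared error summed over all inputs.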
Back Propagation Algorithm
1. Initialise the weights
2. Present an input pattern and its target output
3. Calculate the actual output for this input
4. From the last layer, work backwards, updating the weights

Training pairs use binary one-hot target codes:
• Input pattern 1 → target output 1 = 1000
• Input pattern 2 → target output 2 = 0100
• …
• Input pattern n → target output n
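The four steps above can be sketched as a compact training loop. This is an illustrative one-hidden-layer example on synthetic data (not the paper's network), using sigmoid units, squared error, and a momentum term to smooth the weight changes:

```python
# Back-propagation sketch: forward pass, output/hidden deltas (credit
# assignment), then momentum-smoothed gradient-descent weight updates.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
T = (X[:, 0] > 0).astype(float).reshape(-1, 1)   # toy binary target

W1 = rng.standard_normal((3, 8)) * 0.5           # input -> hidden
W2 = rng.standard_normal((8, 1)) * 0.5           # hidden -> output
vW1, vW2 = np.zeros_like(W1), np.zeros_like(W2)  # momentum buffers
lr, mom = 0.5, 0.9

def sse():
    return float(np.sum((sigmoid(sigmoid(X @ W1) @ W2) - T) ** 2))

err0 = sse()
for _ in range(200):
    H = sigmoid(X @ W1)                  # forward: hidden activations
    Y = sigmoid(H @ W2)                  # forward: outputs
    dY = (Y - T) * Y * (1 - Y)           # output delta
    dH = (dY @ W2.T) * H * (1 - H)       # hidden delta, passed backwards
    vW2 = mom * vW2 - lr * H.T @ dY / len(X)
    vW1 = mom * vW1 - lr * X.T @ dH / len(X)
    W2 += vW2
    W1 += vW1
err1 = sse()  # summed squared error after training
```

The loop drives the summed squared error down; with an unlucky start it could still settle in a local minimum, which is exactly the caveat the slide raises.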
Properties of Neural Networks
• Supervised networks are universal approximators
• Theorem: a multi-layer perceptron (MLP) with only one hidden layer has the capability to act as a universal approximator (Hornik et al., 1989)

• Types of approximators
  – Linear approximators (e.g. polynomials): for a given precision, the number of parameters grows exponentially with the number of variables
  – Non-linear approximators (e.g. neural networks): the number of parameters grows linearly with the number of variables

[Diagram: a three-layer network can form arbitrary decision regions separating classes A and B, with complexity limited only by the number of nodes.]
Face recognition using Neural Network
Problems
• Work has been done on very selective sets of face images, mostly:
  – In an upright position
  – With controlled lighting and background
  – In either frontal or profile view
  – With no occlusions or facial hair

Standard face recognition architecture:
Image data (train/test) → domain transformation (preprocessing) → neural network model (classifier) → classified face result

Requirements
• Accurate
• Efficient
• Light invariant
• Rotation invariant
• …
Why preprocessing?
• The curse of dimensionality
  – The quantity of training data needed grows exponentially with the dimension of the input space
  – In practice, we only have a limited quantity of input data, so increasing the dimensionality of the problem leads to a poor representation of the mapping
• Preprocessing has a huge impact on the performance of a neural network

Preprocessing methods
• Normalization
  – Inputs of the neural net are often of different types with different orders of magnitude (e.g. pressure, temperature)
  – It is necessary to normalize the data so that they have the same impact on the model
  – Translate input values so that they can be exploited by the neural network
• Component reduction
  – Sometimes the number of inputs is too large to be exploited directly
  – Reducing the number of inputs simplifies the construction of the model
  – Goal: a better, more synthetic representation of the data without losing relevant information
  – Reduction methods: PCA, CCA (Curvilinear Component Analysis), etc.
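The normalization step above is straightforward to sketch. The feature values below are illustrative (a pressure-like and a temperature-like column, echoing the slide's example):

```python
# Zero-mean / unit-variance normalization, so inputs with different
# orders of magnitude contribute comparably to the model.
import numpy as np

def normalize(X, eps=1e-12):
    """Standardize each column (feature) of X; returns (Xn, mean, std)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps), mean, std

# Two features on wildly different scales (made-up values):
X = np.array([[101325.0, 20.0],
              [ 98000.0, 25.0],
              [103000.0, 15.0]])
Xn, mean, std = normalize(X)
```

The stored `mean` and `std` are reused at test time so that test images (or sensor readings) pass through exactly the same transform as the training data.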
Principal Components Analysis (PCA)
• Principle
  – A linear projection method to reduce the number of parameters
  – Transforms a set of correlated variables into a new set of uncorrelated variables
  – Maps the data into a space of lower dimensionality
  – A form of unsupervised learning

• Properties
  – It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables
  – The new axes are orthogonal and represent the directions of maximum variability

Faces can be viewed as vectors: suppose a data set X with N data points, each of dimension P.
The PCA - Eigenface

Here we summarize the method for finding the principal factors. For a training set $X_1, X_2, \ldots, X_N$:

• Average face: $\Psi = \frac{1}{N}\sum_{i=1}^{N} X_i$
• Difference (zero-mean) vectors: $\Phi_i = X_i - \Psi$
• Covariance matrix: $C = \frac{1}{N}\sum_{i=1}^{N} \Phi_i \Phi_i^{T}$
• Choose the $i$ largest eigenvalues of $C$; $i$ is the inherent dimensionality of the subspace governing the original images, and the corresponding eigenvectors are the first principal components (eigenfaces).
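The steps above can be sketched directly with an eigen-decomposition. Random vectors stand in for the flattened face images here; real use would load the training faces as pixel vectors (and, when P is much larger than N, solve the smaller N×N eigenproblem instead):

```python
# Eigenface computation sketch: mean face, zero-mean differences,
# covariance matrix, then the i leading eigenvectors as the basis.
import numpy as np

rng = np.random.default_rng(42)
N, P = 20, 64                 # N training "images", P pixels each (toy sizes)
X = rng.standard_normal((N, P))

psi = X.mean(axis=0)          # average face
Phi = X - psi                 # difference (zero-mean) vectors
C = Phi.T @ Phi / N           # P x P covariance matrix

vals, vecs = np.linalg.eigh(C)          # eigh: ascending eigenvalues
order = np.argsort(vals)[::-1]          # sort descending instead
i = 10                                  # keep the i largest components
eigenfaces = vecs[:, order[:i]]         # P x i orthonormal basis
weights = Phi @ eigenfaces              # project each face onto the basis
```

Each face is then represented by its `weights` row, a vector of i coefficients instead of P pixels, which is the M-PCA input the later slides feed to the classifiers.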
The PCA - Eigenface

[Figure: a face image reconstructed as a weighted sum of eigenfaces, e.g. 0.9571 × eigenface 1 − 0.1945 × eigenface 2 + 0.0461 × eigenface 3 + 0.586 × eigenface 4. Pipeline: training image set → image mean → adjusted (zero-mean) training set → PCA → N eigenfaces → M-PCA projection.]
Training the classifiers

[Diagram: training image set → image mean → adjusted (zero-mean) training set → PCA → N eigenfaces → M-PCA projections → combining classifiers.]
Neural Networks Ensembles

There are two main approaches to committee machines (combining classifiers, classifier fusion):
1. Static committee machines (ensembles)
2. Dynamic committee machines (modular)

Static committee machines are ones where the responses of the experts are combined without the combining mechanism seeing the input. Ensembles of classifiers are trained on different or similar data, using different or similar features; the classifiers are run simultaneously and their outputs are merged into one compound classification. The two main methods are ensemble averaging and boosting.

[Diagram: input x(n) feeds experts 1…L, whose outputs y1(n)…yL(n) are merged by a combiner into a single output.]
Dynamic committee machines

• The input signal is directly involved in combining the outputs
• E.g. mixtures of experts and hierarchical mixtures of experts
• A gating network decides the weighting of each expert network

[Diagram: input x(n) feeds experts 1…L and a gating network; the gate outputs g1(n)…gL(n) weight the expert outputs y1(n)…yL(n) before they are summed into the final output.]
Some Ensemble Methods

Majority voting: every voter has one vote that can be cast for any one candidate; the candidate that receives the majority (i.e. more than half) of the votes wins the election.
Average voting: each candidate's support values are averaged, and the candidate with the highest average wins the election.
Product rule voting: each voter gives a confidence value for each candidate; all confidence values are multiplied per candidate, and the candidate with the highest confidence product wins.

| Classifiers | Support for W1 | Support for W2 | Decision |
|-------------|----------------|----------------|----------|
| C1          | 0.8            | 0.2            | W1       |
| C2          | 0.4            | 0.6            | W2       |
| C3          | 0.3            | 0.7            | W2       |
| C4          | 0.6            | 0.4            | W1       |
| C5          | 0.3            | 0.7            | W2       |
| MAJ         | -              | -              | W2       |
| AVR         | 0.48           | 0.52           | W2       |
| PRO         | 0.01728        | 0.02352        | W2       |

[Diagram: classifiers C1–C5 feed a combiner that decides between W1 and W2.]
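The three rules can be sketched in a few lines; this reproduces the five-classifier example in the table above (a minimal illustration, not the paper's implementation):

```python
# Majority, average, and product-rule combination of classifier
# supports, for the two-class example (W1, W2) in the table.
supports = [(0.8, 0.2), (0.4, 0.6), (0.3, 0.7), (0.6, 0.4), (0.3, 0.7)]

def majority(supports):
    """Each classifier votes for its preferred class; majority wins."""
    votes = ["W1" if s1 > s2 else "W2" for s1, s2 in supports]
    return max(set(votes), key=votes.count)

def average(supports):
    """Average the supports per class; highest average wins."""
    n = len(supports)
    a1 = sum(s1 for s1, _ in supports) / n
    a2 = sum(s2 for _, s2 in supports) / n
    return (a1, a2), ("W1" if a1 > a2 else "W2")

def product(supports):
    """Multiply the supports per class; highest product wins."""
    p1 = p2 = 1.0
    for s1, s2 in supports:
        p1, p2 = p1 * s1, p2 * s2
    return (p1, p2), ("W1" if p1 > p2 else "W2")
```

Running these on `supports` recovers the table's MAJ, AVR, and PRO rows, with all three rules deciding for W2.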
Proposed method

Consider the ensemble network system shown in the figure: a set of M "level-0" networks, N1 to NM, whose outputs are combined by a "level-1" network N*.

Training rule: the idea is to train the "level-0" networks first and then examine their behaviour when generalizing. This provides a new training set, which is used to train the "level-1" network.

[Diagram: the M-PCA feature vector yM-PCA feeds the level-0 networks N1…NM; their outputs y1…yM feed the level-1 network N*.]
Experimental results and discussion

Face Database

We used the ORL database, which contains a set of faces taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK. There are 10 different images of each of 40 distinct subjects, with variations in facial expression (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for tilting and rotation of up to about 20 degrees and variation in scale of up to about 10%. The images are grayscale with a resolution of 92×112 pixels.

The set of 10 images for one subject shows considerable variation.
Experimental results and discussion

• Step 1: training image set → image mean → adjusted (zero-mean) training set → PCA → N eigenfaces
• Step 2: M-PCA projections of the training images → training of the combining classifiers
• Step 3: testing image set → zero-mean testing images → M-PCA projection → combining classifiers → classified face
Percentage of correct classification on the test set (200 faces, 5 repetitions):

| Principal Components | Net Topology | Best (%) | Average (%) |
|----------------------|--------------|----------|-------------|
| 1-25                 | 25:40:40     | 89       | 87.7        |
| 1-30                 | 30:80:40     | 88.5     | 87.9        |
| 1-35                 | 35:80:40     | 90       | 89.7        |
| 1-40                 | 40:40:40     | 89.5     | 88.9        |
| 1-50                 | 50:40:40     | 89.5     | 86.6        |
| 1-100                | 100:60:40    | 86       | 84.1        |
| 1-200                | 200:80:40    | 84       | 82.2        |

Different net topologies for the same input data:

| Principal Components | Net Topology | Best (%) | Average (%) |
|----------------------|--------------|----------|-------------|
| 1-40                 | 40:40:40     | 89.5     | 88.9        |
| 1-40                 | 40:20:40     | 89.5     | 83.8        |
| 1-40                 | 40:80:40     | 90.5     | 89.1        |
| 1-40                 | 40:100:40    | 89       | 87.6        |
Percentage of correct classification for some manually selected committees:

| Committee (5 nets) | Averaging | Majority voting | Proposed method |
|--------------------|-----------|-----------------|-----------------|
| 40:20:40           | 90.5      | 88.15           | 93.35           |
| 40:40:40           | 88.35     | 88.5            | 95              |
| 40:80:40           | 92        | 90              | 97.15           |

[Diagram: the M-PCA feature vector yM-PCA feeds the level-0 networks MLP1…MLP5; their outputs y1…y5 feed the level-1 network MLP*.]
Conclusion
Three main strategies for making independent individual classifiers:

1. Using different procedures, for example different kinds of classifiers, or different parameters within an identical procedure. For example, different MLPs can be trained with different initial weights, learning parameters, or numbers of nodes.
2. Using different representations of the input (or feature sets). Although this approach is effective in practice at improving generalization, managing and analyzing the procedure is more complicated (K.M. Ali and M.J. Pazzani, 1995).
3. Using an identical representation but different training sets.
Acknowledgments

I would like to thank Prof. Shahshahani for helpful comments and the Olivetti Research Laboratory for developing and maintaining the ORL database.
Thanks for your attention