REG. NO: F17/1437/2011



Project report submitted in partial fulfillment of the requirement for the award of the degree

of Bachelor of Science in Electrical and Electronic Engineering at the University of Nairobi

Date of Submission: 13/05/2016

Department of Electrical and Information Engineering



1 Introduction

1.1 Introduction to Automatic Class Attendance

Maintaining attendance is very important in all learning institutions for checking the performance of
students. In most learning institutions, student attendance is taken manually using attendance
sheets issued by the department heads as part of regulation. The students sign these sheets, which
are then filed or manually logged into a computer for future analysis. This method is tedious, time
consuming and inaccurate, as some students often sign for their absent colleagues.

This method also makes it difficult to track the attendance of individual students in a large

classroom environment[1]. In this project, we propose the design and use of a face detection and

recognition system to automatically detect students attending a lecture in a classroom and mark

their attendance by recognizing their faces.

While other biometric methods of identification (such as iris scans or fingerprints) can be more

accurate, students usually have to queue for long at the time they enter the classroom[2]. Face

recognition is chosen owing to its non-intrusive nature and familiarity as people primarily

recognize other people based on their facial features[3]. This facial biometric system will consist
of an enrollment process, in which the unique features of a person's face are stored in a database,
followed by the processes of identification and verification. In these, the face detected in an image
obtained from the camera is compared with the faces stored at the time of enrollment.

1.2 Problem Definition.

The traditional manual methods of monitoring student attendance in lectures are tedious as the

signed attendance sheets have to be manually logged in to a computer system for analysis. This is

tedious, time consuming and prone to inaccuracies as some students in the department often sign

for their absent colleagues, rendering this method ineffective in tracking the students’ class

attendance. Using the face detection and recognition system in lieu of the traditional methods will
provide a fast and effective way of capturing student attendance accurately, while offering secure,
stable and robust storage of the system records. Upon authorization, the records can be accessed by
administrators, parents or even the students themselves[4].

1.3 Objectives

The overall objective is to develop an automated class attendance management system comprising
a desktop application working in conjunction with a mobile application to perform the following
tasks:

To detect faces in real time.

To recognize the detected faces by the use of a suitable algorithm.

To update the class attendance register after a successful match.

To design an architecture that constitutes the various components working harmoniously.

1.4 Scope of the project.

We set out to design a system comprising two modules. The first module (the face detector) is a
mobile component: a camera application that captures student faces and stores them in a file using
computer vision face detection algorithms and face extraction techniques. The second module is a
desktop application that performs face recognition on the captured face images in the file, marks
the student register and then stores the results in a database for future analysis.

1.5 Justification.

This project serves to automate the prevalent traditional methods of marking student attendance in
classrooms, which are tedious and time wasting. The use of automatic attendance through face
detection and recognition will increase the effectiveness of attendance monitoring and management.


This method could also be extended for use in examination halls to curb cases of impersonation as

the system will be able to single out the imposters who won’t have been captured during the

enrollment process. Applications of face recognition are widely spreading in areas such as criminal

identification, security systems, image and film processing[5]. The system could also find

applications in all authorized access facilities.


2 Literature Review

2.1 Digital Image Processing.

Digital Image Processing is the processing of images which are digital in nature by a digital

computer[6]. Digital image processing techniques are motivated by three major applications:


Improvement of pictorial information for human perception

Image processing for autonomous machine application

Efficient storage and transmission.

2.1.1 Human Perception

This application employs methods capable of enhancing pictorial information for human

interpretation and analysis. Typical applications include noise filtering, content enhancement
(mainly contrast enhancement or deblurring) and remote sensing.

2.1.2 Machine Vision Applications

In this, the interest is in procedures for extracting image information suitable for computer
processing. Typical applications include:

Industrial machine vision for product assembly and inspection.

Automated target detection and tracking.

Fingerprint recognition.

Machine processing of aerial and satellite imagery for weather prediction and crop


Facial detection and recognition fall within the machine vision applications of digital image
processing.



2.2 Image Representation in a Digital Computer.

An image is a two-dimensional light intensity function

\[ f(x, y) = r(x, y) \times i(x, y) \tag{2.0} \]

where r(x, y) is the reflectivity of the surface at the corresponding image point and i(x, y)
represents the intensity of the incident light.

A digital image f(x, y) is discretized both in spatial co-ordinates by grids and in brightness by

quantization[7]. Effectively, the image can be represented as a matrix whose row, column indices

specify a point in the image and the element value identifies gray level value at that point. These

elements are referred to as pixels or pels.

Typical image sizes used in image processing applications are 256 × 256, 640 × 480 or
1024 × 1024 pixels. Quantization of these pixels is done at 8 bits for grayscale (black and white)
images and 24 bits for color images (because of the three color planes, Red, Green and Blue, each
at 8 bits)[8].
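The matrix view of an image and the storage implied by the quantization levels above can be illustrated with a minimal Python sketch. The function name `image_storage_bytes` and the tiny sample image are illustrative assumptions and are not part of the project code.

```python
def image_storage_bytes(width, height, bits_per_pixel):
    """Return the uncompressed size in bytes of a width x height image."""
    return width * height * bits_per_pixel // 8


# A tiny 2 x 3 grayscale image as a matrix (illustrative values only):
# the row index selects y, the column index selects x,
# and each element is an 8 bit gray level (0 to 255).
image = [
    [0, 128, 255],
    [64, 192, 32],
]

gray_level = image[1][2]  # pixel at row 1, column 2

gray_bytes = image_storage_bytes(256, 256, 8)     # 8 bit grayscale
color_bytes = image_storage_bytes(640, 480, 24)   # 24 bit color (3 planes x 8 bits)
```

Under these assumptions a 256 × 256 grayscale image needs 65536 bytes, while a 640 × 480 color image needs 921600 bytes before any compression.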

2.3 Steps in Digital Image Processing.

Digital image processing involves the following basic tasks:

Image Acquisition - An imaging sensor and the capability to digitize the signal produced

by the sensor.

Preprocessing – Enhances the image quality, filtering, contrast enhancement etc.

Segmentation – Partitions an input image into constituent parts of objects.

Description/feature Selection – extracts the description of image objects suitable for further

computer processing.

Recognition and Interpretation – Assigning a label to the object based on the information

provided by its descriptor. Interpretation assigns meaning to a set of labelled objects.

Knowledge Base – This helps for efficient processing as well as inter module cooperation.


Figure 2-1. A diagram showing the steps in digital image processing

2.4 Definition of Terms and History

2.4.1 Face Detection

Face detection is the process of identifying and locating all the present faces in a single image or

video regardless of their position, scale, orientation, age and expression. Furthermore, the detection

should be irrespective of extraneous illumination conditions and the image and video content[9].

2.4.2 Face Recognition

Face Recognition is a visual pattern recognition problem, where the face, represented as a three

dimensional object that is subject to varying illumination, pose and other factors, needs to be

identified based on acquired images[10].

Face Recognition is therefore simply the task of identifying an already detected face as a known

or unknown face and in more advanced cases telling exactly whose face it is[11].

2.4.3 Difference between Face Detection and Face Recognition

Face detection answers the question “Where is the face?” It identifies an object as a “face” and
locates it in the input image. Face recognition, on the other hand, answers the question “Who is
this?” or “Whose face is it?” It decides whether the detected face is someone known or unknown
based on the database of faces it uses to validate the input image[12].

It can therefore be seen that the face detector's output (the detected face) is the input to the face
recognizer, and the face recognizer's output is the final decision, i.e. face known or face unknown.

2.5 Face Detection

A face detector has to tell whether an image of arbitrary size contains a human face and, if so,
where it is.

Face detection can be performed based on several cues: skin color (for faces in color images and
videos), motion (for faces in videos), facial/head shape, facial appearance or a combination of
these parameters. Most face detection algorithms are appearance based and do not use other cues.

An input image is scanned at all possible locations and scales by a sub window. Face detection is

posed as classifying the pattern in the sub window either as a face or a non-face. The face/non-

face classifier is learned from face and non-face training examples using statistical learning


Most modern algorithms are based on the Viola Jones object detection framework, which is based

on Haar Cascades.
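The sub window scan described above can be sketched as a generator that visits every location and scale. This is a simplified illustration only: the parameter names `base`, `scale_step` and `shift` are assumptions, and a real detector would pass each sub window to the face/non-face classifier.

```python
def sliding_windows(image_w, image_h, base=24, scale_step=1.25, shift=4):
    """Yield (x, y, size) for every square sub window position and scale."""
    size = base
    while size <= min(image_w, image_h):
        # Slide the window across the image at the current scale.
        for y in range(0, image_h - size + 1, shift):
            for x in range(0, image_w - size + 1, shift):
                yield (x, y, size)
        # Move to the next, larger scale (always grows by at least 1 pixel).
        size = max(size + 1, int(size * scale_step))


# Small example: a 48 x 48 image scanned with a coarse shift and scale step.
windows = list(sliding_windows(48, 48, base=24, scale_step=2.0, shift=24))
```

Even this coarse scan produces several candidate windows for a tiny image, which is why the classifier applied to each window has to be very cheap.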

2.6 Haar – Cascades.

Haar like features are rectangular patterns in data. A cascade is a series of “Haar-like features” that

are combined to form a classifier[14]. A Haar wavelet is a mathematical function that produces

square wave output.

Figure 2-2. Haar like Features [13]


Figure 2.2 shows Haar-like features. The background of a template like (b) is painted
gray to highlight the pattern’s support. Only those pixels marked in black or white are used when
the corresponding feature is calculated[15].

Since no objective distribution can describe the actual prior probability for a given image to have

a face, the algorithm must minimize both the false negative and false positive rates in order to

achieve an acceptable performance[16]. This then requires an accurate numerical description of

what sets human faces apart from other objects. Characteristics that define a face can be extracted

from the images with a remarkable committee learning algorithm called Adaboost[17]. Adaboost

(Adaptive boost) relies on a committee of weak classifiers that combine to form a strong one

through a voting mechanism[18]. A classifier is weak if, in general, it cannot meet a predefined

classification target in error terms[7]. The operational algorithm to be used must also work with a

reasonable computational budget. Such techniques as the integral image and attentional cascades

have made the Viola-Jones algorithm[15] highly efficient: fed with a real time image sequence

generated from a standard webcam or camera, it performs well on a standard PC.

Figure 2-3. Haar-like features with different sizes and orientation [13]

The size and position of a pattern’s support can vary provided its black and white rectangles have

the same dimension, border each other and keep their relative positions. Thanks to this constraint,

the number of features one can draw from an image is somewhat manageable: a 24 × 24 image,

for instance, has 43200, 27600, 43200, 27600 and 20736 features of category (a), (b), (c), (d) and

(e) respectively as shown in figure 2.3, hence 162336 features in all[13].
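The feature counts quoted above can be reproduced by enumerating every size and position of each pattern inside a 24 × 24 window. The sketch below assumes that categories (a) to (e) have smallest forms of 2 × 1, 3 × 1, 1 × 2, 1 × 3 and 2 × 2 pixels respectively; this mapping is an assumption, since the figure itself is not reproduced here, but it yields the same counts as the text.

```python
def count_features(window, base_w, base_h):
    """Count all placements of a pattern whose smallest form is base_w x base_h."""
    total = 0
    # Widths and heights grow in multiples of the base shape so that the
    # black and white rectangles keep the same dimensions.
    for w in range(base_w, window + 1, base_w):
        for h in range(base_h, window + 1, base_h):
            total += (window - w + 1) * (window - h + 1)
    return total


# Assumed base shapes (width, height) for categories (a) to (e).
BASE_SHAPES = {"a": (2, 1), "b": (3, 1), "c": (1, 2), "d": (1, 3), "e": (2, 2)}

counts = {name: count_features(24, bw, bh) for name, (bw, bh) in BASE_SHAPES.items()}
total_features = sum(counts.values())
```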

In practice, five patterns are considered. The derived features are assumed to hold all the

information needed to characterize a face. Since faces are large and regular by nature, the use of

Haar-like patterns seems justified.


2.7 How the Haar – like Features Work.

A scale is chosen for the features say 24 × 24 pixels. This is then slid across the image. The

average pixel values under the white area and the black area are then computed. If the difference

between the areas is above some threshold then the feature matches[7].
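A minimal sketch of this matching rule for a two-rectangle feature is given below. It assumes the black rectangle is the upper half of the region and the white rectangle the lower half; the function names and the sample pixel values are illustrative only.

```python
def region_mean(image, x, y, w, h):
    """Average pixel value of the w x h region whose top-left corner is (x, y)."""
    pixels = [image[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return sum(pixels) / len(pixels)


def two_rect_feature(image, x, y, w, h):
    """Mean of the white (lower) half minus mean of the black (upper) half."""
    half = h // 2
    black = region_mean(image, x, y, w, half)
    white = region_mean(image, x, y + half, w, half)
    return white - black


def feature_matches(image, x, y, w, h, threshold):
    """The feature matches when the difference exceeds the threshold."""
    return two_rect_feature(image, x, y, w, h) > threshold


# A dark band above a bright band (as with eyes above cheeks), and a flat patch.
banded = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
flat = [[100] * 4 for _ in range(4)]
```

With a threshold of 50, the banded patch matches the feature while the flat patch does not.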

In face detection, since the eyes are of different color tone from the nose, the Haar feature (b) from

Figure 2.3 can be scaled to fit that area as shown below,

Figure 2-4. How the Haar like feature of figure 2.3 can be used to scale the eyes

One Haar feature is however not enough, as several regions could match it (like the zip drive and
the white areas in the background of the image in figure 2.4). A classifier built on a single feature
cannot match all the features of a face, so it is called a “weak classifier.” Haar cascades, the basis
of the Viola Jones detection framework [16], therefore consist of a series of weak classifiers, each
of which is correct at least 50% of the time. If an area passes a single classifier, it moves on to the
next weak classifier and so on; otherwise, the area is rejected as not matching.
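The pass-or-reject behaviour described above can be sketched as follows. The stage functions and the feature names (`eye_band_contrast` and so on) are hypothetical placeholders for thresholded Haar feature responses, not part of any library.

```python
def passes_cascade(region, stages):
    """Return True only if the region passes every stage, in order."""
    for stage in stages:
        if not stage(region):
            return False  # rejected early; later stages are never evaluated
    return True


# Hypothetical stages, each checking one precomputed feature response.
stages = [
    lambda r: r["eye_band_contrast"] > 20,
    lambda r: r["nose_bridge_contrast"] > 10,
    lambda r: r["mouth_contrast"] > 5,
]

face_like = {"eye_band_contrast": 45, "nose_bridge_contrast": 18, "mouth_contrast": 9}
background = {"eye_band_contrast": 30, "nose_bridge_contrast": 2, "mouth_contrast": 0}

is_face = passes_cascade(face_like, stages)
is_background_face = passes_cascade(background, stages)
```

Because most regions of an image are rejected by the first stages, the later and more expensive stages run on only a small fraction of the sub windows.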

2.7.1 Cascaded Classifier

Figure 2-5. several classifiers combined to enhance face detection


From figure 2.5, a 1 feature classifier achieves 100% face detection rate and about 50% false

positive rate. A 5 feature classifier achieves 100% detection rate and 40% false positive rate (20%

cumulative). A 20 feature classifier achieves 100% detection rate with 10% false positive rate (2%

cumulative)[17]. Combining several weak classifiers improves the accuracy of detection.
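The cumulative figures quoted above follow from multiplying the per-stage rates, since a region must pass every stage to be reported as a face. A short sketch of the arithmetic (the function name is illustrative):

```python
def cumulative_rates(stage_rates):
    """Running product of per-stage rates along the cascade."""
    cumulative = []
    running = 1.0
    for rate in stage_rates:
        running *= rate
        cumulative.append(running)
    return cumulative


# Per-stage false positive rates of the 1, 5 and 20 feature classifiers.
false_positive = cumulative_rates([0.50, 0.40, 0.10])
# Per-stage detection rates: every stage keeps 100% of the faces.
detection = cumulative_rates([1.0, 1.0, 1.0])
```

The false positive rate falls from 50% to 20% to 2% while the detection rate stays at 100%.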

A training algorithm called Adaboost, short for adaptive boosting[14], which had no application
before Haar cascades[14], was utilized to combine a series of weak classifiers into a strong
classifier. Adaboost tries out multiple weak classifiers over several rounds, selecting the best weak
classifier in each round and combining the selected weak classifiers to create a strong classifier[7].

Adaboost can use classifiers that are consistently wrong by reversing their decision[7]. In the

design and development, it can take weeks of processing time to determine the final cascade
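The round-by-round selection and weighted vote used by Adaboost can be illustrated with a small sketch. It is a simplification under stated assumptions: the weak classifiers are threshold tests ("decision stumps") on a single numeric value rather than Haar features, and the function names and toy data are illustrative only.

```python
import math


def train_adaboost(xs, ys, rounds):
    """Train on values xs with labels ys in {-1, +1}.

    Returns a list of (alpha, threshold, polarity) weak classifiers.
    """
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Try every threshold and polarity; keep the lowest weighted error.
        for t in sorted(set(xs)):
            for polarity in (1, -1):
                preds = [polarity if x >= t else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity, preds)
        err, t, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        model.append((alpha, t, polarity))
        # Increase the weight of misclassified samples, decrease the others.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Weighted vote of all weak classifiers."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in model)
    return 1 if score >= 0 else -1


# Toy data that no single threshold can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
one_round = [predict(train_adaboost(xs, ys, 1), x) for x in xs]
three_rounds = [predict(train_adaboost(xs, ys, 3), x) for x in xs]
```

On this toy data a single weak classifier misclassifies two samples, whereas the vote of three weak classifiers labels every sample correctly.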


After the final cascade had been constructed, there was a need for a way to quickly compute the

Haar features i.e. compute the differences in the two areas. The integral image was instrumental

in this.

2.7.2 Integral Image

The integral image, also known as the “summed area table”, was developed in 1984 and came into
widespread use in 2001 with the Haar cascades[4]. A summed area table is created in a single pass
over the image. This makes the Haar cascades fast, since the sum of any rectangular region in the
image can be computed using a single formula[17].

The integral image computes a value at each pixel (x, y) as is shown in figure 2.6, that is the sum

of the pixel values above and to the left of (x, y), inclusive. This can quickly be computed in one

pass through the image.

Figure 2-6. Pixel Coordinates of an integral image


Let A, B, C and D be the values of the integral image at the corners of a rectangle as shown in
figure 2.7.

The sum of the original image values within the rectangle can then be computed as

\[ \mathit{Sum} = A - B - C + D \tag{2.1} \]

Only three additions are required for any size of rectangle[17]. This face detection approach

minimizes computation time while achieving high detection accuracy[15]. It is now used in many

areas of computer vision[4] [7].
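The single-pass construction of the integral image and the four-corner sum of equation (2.1) can be sketched as below. The sketch assumes that A is the integral value just outside the top-left corner, B just above the top-right corner, C just left of the bottom-left corner and D at the bottom-right corner; the labelling in the original figure may differ.

```python
def integral_image(image):
    """ii[y][x] = sum of image[r][c] for all r <= y and c <= x (inclusive)."""
    height, width = len(image), len(image[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom and columns left..right (inclusive)."""
    def at(y, x):
        # Positions outside the image contribute zero.
        return ii[y][x] if y >= 0 and x >= 0 else 0

    a = at(top - 1, left - 1)
    b = at(top - 1, right)
    c = at(bottom, left - 1)
    d = at(bottom, right)
    return a - b - c + d  # equation (2.1)


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
lower_right_block = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Whatever the size of the rectangle, only four look-ups into the table are needed.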


2.8 Improving Face Detection

Face detection can be improved by tuning the detector's parameters to yield satisfactory results.

The parameters to be adjusted are explained as follows.

2.8.1 Scale Increase Rate.

The scale increase rate specifies how quickly the face detector function should increase the scale
for face detection with each pass it makes over an image. Setting the scale increase rate high makes
the detector run faster by running fewer passes. If it is set too high, the detector may jump too
quickly between scales and miss faces. The default increase rate in OpenCV is 1.1, which means
that the scale increases by 10% with each pass.

The parameter typically takes a value of 1.1, 1.2, 1.3 or 1.4.
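The trade-off between speed and thoroughness can be made concrete by counting the passes needed to grow from the smallest window to the image size. This is a rough sketch under assumed values (a 24 pixel starting window and a 480 pixel image side); the function is illustrative and is not OpenCV's implementation.

```python
def count_scale_passes(image_side, min_size, scale_factor):
    """Number of scales visited before the window outgrows the image."""
    passes = 0
    size = float(min_size)
    while size <= image_side:
        passes += 1
        size *= scale_factor
    return passes


passes_fine = count_scale_passes(480, 24, 1.1)    # default increase rate
passes_coarse = count_scale_passes(480, 24, 1.4)  # faster, but skips scales
```

Under these assumptions a rate of 1.1 needs 32 passes while a rate of 1.4 needs only 9, at the cost of larger jumps between scales.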

Figure 2-7. Values of the integral Image on a rectangle


2.8.2 Minimum Neighbors Threshold

The minimum neighbors threshold sets the cutoff level for keeping or discarding groups of
rectangles as faces. It is based on the number of raw detections in the group, and its value ranges
from zero to four.

When the face detector is called, each positive face region generates many hits from the Haar
detector behind the scenes, as in Figure 2.8. The face region itself generates a large cluster of
rectangles that overlap to a large extent. Isolated detections are usually false detections and are
discarded. The multiple face region detections are then merged into a single detection. The face detection

function does all this before returning the list of the detected faces. The merge step groups

rectangles that contain a large number of overlaps and then finds the average rectangle for the

group. It then replaces all the rectangles in the group with the average rectangle.
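The grouping and averaging step can be sketched as follows. This is a simplified illustration of the idea, not OpenCV's actual grouping routine: rectangles are given as (x, y, width, height), a rectangle joins the first group it overlaps, and groups with fewer than `min_neighbors` raw detections are dropped. All names are assumptions.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def merge_detections(rects, min_neighbors):
    """Group overlapping rectangles and return one average rectangle per group."""
    groups = []
    for rect in rects:
        for group in groups:
            if any(overlaps(rect, member) for member in group):
                group.append(rect)
                break
        else:
            groups.append([rect])  # no overlap found: start a new group

    merged = []
    for group in groups:
        if len(group) >= min_neighbors:  # small groups are treated as false hits
            n = len(group)
            merged.append(tuple(sum(r[i] for r in group) // n for i in range(4)))
    return merged


raw = [(10, 10, 20, 20), (12, 11, 20, 20), (11, 12, 20, 20), (100, 100, 20, 20)]
faces = merge_detections(raw, min_neighbors=2)
```

Here the three overlapping hits collapse into one averaged rectangle and the isolated hit is discarded.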

Figure 2-8. Lena’s image showing the list of rectangles [27]

2.8.3 Canny Pruning Flag

The Canny Pruning flag is a detection parameter that, when set, enables the face detector to skip
regions in the image that are unlikely to contain a face. The regions to be skipped are usually
identified by running an edge detector (the Canny edge detector) over the image before running
the face detector. This greatly reduces computational overhead and eliminates false


positives. The choice of setting the flag or not is usually a tradeoff between speed and detecting

more faces.

2.8.4 Minimum Detection Scale

This detection parameter sets the size of the smallest face that can be searched for in the input image.

The most commonly used size is 24 × 24. Depending on the resolution of the input image, the

small size may be a small portion of the input image. This would then not be helpful as its detection

would take up Central Processing Unit (CPU) cycles that could have been utilized for other


2.8.5 Expected Output of Face Detector on Test Images.

Figure 2.9 shows the expected results after a successful face detection using the Viola Jones face


Figure 2-9. Expected result on images from the CMU – MIT faces database


2.9 Why Biometric Identification?

Human identification is a basic societal requirement for proper functioning of a nation. By

recognizing a face, you could easily detect a stranger or identify a potential breach of security.

In today's larger, more complex society it isn’t that simple, with all the growing electronic
interactions. It therefore becomes even more important to have electronic verification of a person’s
identity.


Until recently, electronic verification was done either based on something the person had in their

possession like an ID card, or on something they knew, like a password. The major problem is that

these forms of electronic identification are not very secure as they can be faked by hackers,

maliciously given away, stolen or even lost.

Therefore, the ultimate form of electronic verification of a person’s identity is biometrics, that is,
using a physical attribute of a person to make an affirmative identification[4]. This is because
attributes such as a person's fingerprint, iris or face cannot be lost, given away, stolen or forged by


2.10 Why Face Recognition in lieu of other Biometric


While traditional biometric methods of identification such as fingerprints, iris scans and voice
recognition are viable, they are not always the best suited, depending on where they will be used.
In applications such as surveillance and monitoring of public places, for instance, such methods
would end up failing because they are time consuming and inefficient, especially in situations

where there are many people involved[29]. The cost of implementation is also a hindrance as some

components often have to be imported. This would lead to the setup of the system being expensive.

In general, we cannot ask everyone to line up and put their finger on a slide or an eye in front of a
camera or do something similar. Hence the intuitive need for an affordable, mobile system, much
like the human eye, to identify a person.


2.11 History of Face Recognition

Table 2.1. A table showing the brief history of the existing face recognition techniques

Year    Authors               Method
1973    Kanade                First Automated System
1987    Sirovich & Kirby      Principal Component Analysis (PCA)
1991    Turk & Pentland       Eigenface
1996    Etemad & Chellappa    Fisherface
2001    Viola & Jones         Adaboost + Haar Cascade
2007    Naruniec & Skarbek    Gabor Jets

Takeo Kanade, a Japanese computer scientist and one of the world's foremost researchers in
computer vision, came up with a program that extracted face feature points (such as the nose, eyes,
ears and mouth) from photographs[30]. These were then compared to reference data.

A major milestone that reinvigorated research was the PCA method by Sirovich and Kirby in 1987.
They applied Principal Component Analysis, a standard linear algebra technique, to the face
recognition problem and showed that fewer than one hundred values were required to accurately
code a suitably aligned and normalized face image[7].

Turk and Pentland discovered that, when using the Eigenfaces technique, the residual error could
be used to detect faces in images, a discovery that enabled reliable, real time automatic face
recognition systems.

Although this approach was somewhat constrained by environmental factors, it nonetheless created
significant interest in furthering the development of automated face recognition techniques.

The Viola Jones Adaboost and Haar cascade method brought together new algorithms and insights
to construct a framework for robust and extremely rapid visual detection. This system was most
clearly distinguished from previous approaches by its ability to detect faces extremely rapidly.
Operating on 384 × 288 pixel images, faces were detected at 15 frames per second on a 700MHz
Intel Pentium 3 Processor[15].


All identification or authentication technologies operate using the following four stages:

Capture: A physical or behavioral sample is captured by the system during Enrollment and

also in identification or verification process.

Extraction: unique data is extracted from the sample and a template is created.

Comparison: the template is then compared with a new sample.

Match/non-match: the system decides if the features extracted from the new Samples are a

match or a non-match.

2.12 Face Recognition Concepts

Although different approaches have been tried by several groups of people across the world to

solve the problem of face recognition, no particular technique has been discovered that yields

satisfactory results in all circumstances[29].

The different approaches of face recognition for still images can be categorized in to three main

groups namely:

Holistic Approach – the whole face region is taken as an input to the face detection system to
perform face recognition.

Feature-based Approach – local features on the face, such as the nose and eyes, are segmented and
then fed to the face detection system to ease the task of face recognition.

Hybrid Approach – both the local features and the whole face are used as input to the detection
system. This approach is closest to the way human beings recognize faces[32].

There are two main types of face Recognition Algorithms [13]:

Geometric – this algorithm focuses on the distinguishing features of a face.

Photometric – a statistical approach that distills an image into values and compares the
values with templates to eliminate variances.


The Most Popular algorithms are [4];

1. Principal Component Analysis based Eigenfaces.

2. Linear Discriminant Analysis.

3. Elastic Bunch Graph Matching using the Fisherface algorithm.

4. The Hidden Markov Model

5. Neuronal Motivated Dynamic Link Matching.

It should however be noted that the existing face recognition techniques are not one hundred

percent (100%) efficient just yet. Typical efficiencies range between 40% and 60%[1].

The computer-based facial recognition industry has made many useful advancements in the past

decade; however, the need for higher accuracy remains. Through the determination and

commitment of industry, government evaluations, and organized standards bodies, growth and

progress will continue, raising the bar for face-recognition technology[10].

2.13 Why PCA Based Eigenfaces method?

The PCA based eigenfaces method has the best overall recognition efficiency of up to seventy

percent (70%)[5], as compared to the rest of the existing face recognition algorithms. PCA also

has a low memory requirement, low computational complexity and takes less time to execute.

These factors along with the limited timeframe of implementing the project informed the choice

of PCA based Eigenfaces as the face recognition method.

Most systems in practice employ this technique by use of one camera that is fixed at one point in

a room (say the front of a classroom). This method is limited in terms of distance of the detected

faces and the quality. Our chosen method uses a mobile camera which will be hand held by the

user. The captured image will thus be of high quality minimizing false positives when it comes to

the recognition stage. It also improves the results by extracting the detected images within the

rectangle and converting them to grayscale. This will tremendously reduce inaccuracies due to

posture, illumination, background noise and different recognition results for light and dark skinned

detected faces.


2.14 PCA and its relation to Face Recognition

Principal Component Analysis (PCA) is a mathematical procedure that uses an orthogonal
transformation to convert a set of values of M possibly correlated variables (faces) into a set of
values of K uncorrelated variables called principal components (eigenvectors)[29].

The number of principal components (Eigenfaces) is always less than or equal to the number of

original variables (face images) i.e. K ≤ M.

This transformation is defined in such a way that the first eigenface shows the most dominant

“direction/feature” of the training set of images and each succeeding component in turn shows the

next most possible dominant “direction/feature”; all under the constraint that it be uncorrelated to

the preceding Eigenface[5].

To reduce the calculations needed for finding these eigenfaces, the dimensionality of the original

training set is reduced before they are calculated.

Since eigenfaces show the “directions” of the data, and each succeeding eigenface shows less
“direction” and more “noise”, only the first few eigenfaces (say K) are selected, whereas the
remaining eigenfaces are discarded.

Figure 2-10. The average face, the first and last eigenface which is mainly noise

Figure 2.10 shows the average face and the first and last eigenfaces that were generated from a

collection of 30 images each of 4 people. The average face shows the smooth face structure of a

generic person, the first few eigenfaces will show some dominant features of faces, and the last


eigenfaces (e.g. Eigenface 119) are mainly image noise. Figure 2.11 shows the first 32 eigenfaces.
Eigenfaces 32 to 119 are discarded because they are mainly noise[11].

Figure 2-11. The dominant 32 eigenfaces representing all the images in the set [11]

Representing an image as a combination of K Eigenfaces reduces the number of values needed to

recognize it from M to K. This makes the recognition process faster and free of errors caused by


2.15 Working Principle of PCA Eigenface for Face


Given a training set of M images as shown in the figure 2.12 and an unknown face all of the same

size, PCA Eigenface method aims at representing the face image as a linear combination of a set

of Eigenfaces/Eigenvectors[4] as in figure 2.11.


Figure 2-12. A training set consisting of M images

These Eigenfaces (eigenvectors) are in fact the principal components of the training set of face

images generated after reducing the dimensionality of the training set.

Once Eigenfaces are selected, each training set image is represented in terms of these eigenfaces.

When an unknown face comes for recognition, it is also represented in terms of the selected
Eigenfaces.

The eigenface representation of the unknown face is compared with that of each training set face
image, and the distance between them is calculated. If the distance to the closest training image is
below some specified threshold, the unknown face is recognized as that person. The PCA
Eigenfaces method considers each pixel in an image as a separate dimension. For example, a
50 × 50 image has 2500 pixels and thus 2500 dimensions. The method does not work on images
directly; it first converts them to vector form.

2.16 PCA Face Recognition Algorithm.

Step 0: Create a training set and load it.

The training set consists of a total of M images as shown in Figure 2.12, where each image is of
size N × N.

Step 1: Convert the images in the training set to face vectors.

The images are converted into column vectors. Let a face image be a two dimensional N by N
array of 8 bit intensity values. An image may also be considered as a vector of dimension N
squared, so that a typical image of size 50 by 50 becomes a vector in a 2500 dimensional space.
An ensemble of images then maps to a collection of points in this huge space[12]. All images are
thus converted into column vectors of N squared rows by one column. These are represented by
Ti as in figure 2.13.

Figure 2-13. The free space of vectors obtained by converting the N x N images

Step 2: Normalize the face vectors i.e. remove all the common features. This is done by calculating

the average face vector and then subtracting it from all the face vectors. Figure 2.14 shows the

average face (U) which is then subtracted from all the face vectors.
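Steps 1 and 2 can be illustrated with a minimal sketch on tiny 2 × 2 images. The function names and pixel values are illustrative assumptions; real face images would produce much longer vectors.

```python
def to_face_vector(image):
    """Step 1: flatten an N x N image, row by row, into a vector of length N*N."""
    return [pixel for row in image for pixel in row]


def mean_face(vectors):
    """Average face: element-wise mean of all face vectors."""
    m = len(vectors)
    return [sum(v[i] for v in vectors) / m for i in range(len(vectors[0]))]


def normalize(vectors):
    """Step 2: subtract the average face from every face vector."""
    avg = mean_face(vectors)
    return avg, [[x - a for x, a in zip(v, avg)] for v in vectors]


# Two tiny 2 x 2 "images" standing in for the training set.
training_images = [
    [[1, 2], [3, 4]],
    [[3, 4], [5, 6]],
]
face_vectors = [to_face_vector(img) for img in training_images]
average, normalized = normalize(face_vectors)
```

After normalization only the deviations of each face from the average face remain, which is what the covariance matrix in the next step is built from.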

Step 3: Calculate the eigenvectors from the covariance matrix

Figure 2-14. The average face is represented by the blue column labelled U


We now calculate the eigenvectors from the covariance matrix, since in the PCA based Eigenfaces
method the principal components are obtained from the covariance matrix. The covariance matrix
C is given by

\[ C = A A^{T} \tag{2.2} \]

where

\[ A = [\Phi_1, \Phi_2, \Phi_3, \ldots, \Phi_M] \text{ and is of dimension } N^2 \times M \tag{2.3} \]

\[ C = A \cdot A^{T} : \; (N^2 \times M) \cdot (M \times N^2) = N^2 \times N^2 \tag{2.4} \]

This is a very large matrix, as depicted in Figure 2.15, where N = 50 and $N^2 = 2500$. This would
yield 2500 Eigenfaces, each of dimension 2500.

Figure 2-15. The covariance matrix

To find K eigenvectors from these $N^2 = 2500$ eigenvectors would take a lot of time due to the
many calculations that would need to be done, hence the need for Step 4, Dimensionality


Step 4: Reduce the dimensionality of the training set.

To reduce these calculations on the needed eigenfaces, we calculate them from a covariance matrix

of reduced dimensionality.



\[ A^{T} \cdot A : \; (M \times N^2) \cdot (N^2 \times M) = M \times M = 100 \times 100 \quad (M = 100) \tag{2.5} \]




This would give 100 eigenvectors as shown in Figure 2.16.


Step 5: Select the K best eigenvectors, such that K < M, that can represent the whole training set.
The yellow bars represent the selected K eigenvectors that are sufficient to represent the whole
training set, as in Figure 2.17.

Figure 2-16. The reduced dimensionality matrix in the low dimensional space


Figure 2-17. The selected K eigenvectors that can represent the whole set of images

Step 6: Convert the lower dimensional K eigenvectors into the original face dimensionality as in
Figure 2.18.

Figure 2-18. The selected K eigenvectors being mapped in the high order dimensional space

\[ U_i = A V_i \tag{2.6} \]

$U_i$ = ith eigenvector in the higher dimensional space

$V_i$ = ith eigenvector in the lower dimensional space
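The reason the mapping of equation (2.6) works can be shown in a few lines. The eigenvalue symbol $\lambda_i$ is notation introduced here for illustration and does not appear in the original text.

```latex
\begin{align*}
A^{T} A \, V_i &= \lambda_i V_i
  && \text{($V_i$ is an eigenvector of the small $M \times M$ matrix)} \\
A A^{T} (A V_i) &= \lambda_i (A V_i)
  && \text{(multiply both sides on the left by $A$)} \\
C \, U_i &= \lambda_i U_i, \quad U_i = A V_i
  && \text{(so $U_i$ is an eigenvector of $C = A A^{T}$)}
\end{align*}
```

Each eigenvector of the small matrix therefore yields an eigenvector of the large covariance matrix with the same eigenvalue, so the $N^2 \times N^2$ matrix never has to be decomposed directly. In practice each $U_i$ would usually also be scaled to unit length.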

Step 7: Represent each image as a linear combination of all K eigenvectors plus the mean/average
face.


Figure 2-19. Training set images as a weighted sum of the eigenvectors and the average face

Step 8: For each image in the training set calculate and store the associated weight vectors.[5]


A weighted face vector Ω which is the eigenface representation of the ith face weight vector for

each face is calculated as shown in Figure 2.20
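In the notation of equations (2.3) and (2.6), Steps 7 and 8 are commonly written as below. The scalar weights $w_{ik}$ and the assumption that each $U_k$ has been normalised to unit length are introduced here for illustration, following the usual Eigenfaces formulation; they are not read off the figures.

```latex
\[
\Phi_i \approx \sum_{k=1}^{K} w_{ik}\, U_k,
\qquad
w_{ik} = U_k^{T} \Phi_i,
\qquad
\Omega_i = [\, w_{i1},\ w_{i2},\ \dots,\ w_{iK} \,]^{T}
\]
```

Each training face is thus stored as K numbers instead of N² pixel values.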


The face recognition algorithm flow chart of Figure 2.21 is basically a method of checking which

training image is most similar to the input image, out of the whole training set[5].
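A minimal sketch of this matching step is given below, assuming each enrolled face has already been reduced to a weight vector. The labels, the two-element weight vectors and the threshold value are hypothetical. The threshold plays a role analogous to the value 5000 passed to EigenObjectRecognizer in the appendix listing, though the library's exact distance measure is not reproduced here.

```python
import math


def nearest_face(weight_db, probe, threshold):
    """Return (label, distance) of the closest stored weight vector.

    weight_db is a list of (label, weight_vector) pairs. If even the best
    match is farther away than threshold, the label is None (face unknown).
    """
    best_label, best_dist = None, math.inf
    for label, weights in weight_db:
        dist = math.dist(weights, probe)  # Euclidean distance in weight space
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > threshold:
        return None, best_dist
    return best_label, best_dist


# Hypothetical enrolled faces, each already projected onto K = 2 eigenfaces.
enrolled = [("student_a", [1.0, 2.0]), ("student_b", [4.0, 6.0])]

print(nearest_face(enrolled, [1.0, 2.5], threshold=2.0))
print(nearest_face(enrolled, [10.0, 10.0], threshold=2.0))
```

The first probe lies close to the first enrolled vector and is matched to it; the second is far from both, so it is reported as unknown rather than forced onto the nearest label.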

Figure 2-20. Weights of the eigenvectors/eigenfaces


2.16.1 Advantages of Using PCA[2]

Low memory requirement.

Low computational Complexity

Better recognition accuracy

Less execution time

Figure 2-21. The PCA based Eigenfaces face recognition algorithm flow chart


2.17 Training Set.

A faces database, also known as a training set in machine learning, is a collection of faces of the people or subjects that the system is built to recognize [4]. The training set stores face-label pairs, i.e. face images together with their names. The face images are obtained from multiple snaps of each person the system has to recognize, such that each person's set of snaps covers the facial expressions, postures and lighting conditions likely to occur at recognition time. The camera should be able to produce high quality pictures to be stored in the database. A faces database can be implemented as a folder in Windows or as a table in a Database Management System (DBMS) such as MS Access.

Limitations of a Faces Database implementation as a Folder

Not secure as faces are visible, one could delete them

Difficult to associate labels with the proper images.

Advantages of Folder Implementation

Simple code to access data

It is faster to manage

No database design Dependency

We will implement the database as a table in MS Access owing to the following benefits.

Benefits of using MS Access

Faster in performance

Powerful as a proper database, as various fields such as Name, Registration number, Year of Birth etc. can be linked in tables.

Data is secure

Limitations of using MS Access

Managing and adding faces requires a proper software interface, i.e. code must be written to access it.

System becomes design dependent, i.e. it reads columns in the proper order.

Logical Steps to store Faces to a Database.

Detect and extract a face

Label the extracted face

Insert the extracted face label pair to the faces database
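The three steps can be sketched as follows. This is an illustrative outline only: it uses Python's built-in sqlite3 module as a stand-in for the MS Access table, the table and column names are invented, and the face is assumed to have already been detected, cropped and encoded to bytes by the face detector.

```python
import sqlite3


def open_faces_db(path=":memory:"):
    """Open (or create) a faces database holding face-label pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faces ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "label TEXT NOT NULL, "
        "image BLOB NOT NULL)"
    )
    return conn


def add_face(conn, label, face_bytes):
    """Insert one face-label pair and return the id of the new row."""
    cur = conn.execute(
        "INSERT INTO faces (label, image) VALUES (?, ?)", (label, face_bytes)
    )
    conn.commit()
    return cur.lastrowid


def load_training_set(conn):
    """Return all stored (label, image bytes) pairs in insertion order."""
    rows = conn.execute("SELECT label, image FROM faces ORDER BY id").fetchall()
    return [(label, bytes(image)) for label, image in rows]


# Hypothetical usage: the bytes stand for an extracted grayscale face.
db = open_faces_db()
add_face(db, "student_a", b"\x10\x20\x30")
print(load_training_set(db))
```

Keeping the label and the image in the same row is what removes the difficulty, noted above for the folder approach, of associating labels with the proper images.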

2.18 Applications

Table 2.2. Application areas of face recognition [9].


3 Methodology and Design

3.1 System Design

In this design, several related components in terms of functionality have been grouped to form

sub-systems which then combine to make up the whole system. Breaking the system down to

components and sub-systems informs the logical design of the class attendance system.

3.2 General Overview

The flow diagram of Figure 3.1 depicts the system's operation.

Figure 3-1. Sequence of events in the class attendance system.

From Figure 3.1, it can be observed that most of the components are shared (the image acquisition component for browsing for input images, the face detector, and the faces database for storing the face-label pairs); they are simply employed at different stages of the face recognition process.

3.3 Training Set Manager Sub System

The logical design of the training set management sub-system will consist of an image acquisition component, a face detection component and a training set management component. Together, these components interact with the faces database in order to manage the training set. They will be implemented in a Windows Forms application.

3.4 Face Recognizer Sub System.

The logical design of the Face Recognizer will consist of the image acquisition component, the face recognizer and the face detection component, all working with the faces database. The image acquisition and face detection components are the same as those in the Training Set Manager sub-system, as their functionality is the same. The only difference is the face recognizer component and its user interface controls. This component will load the training set again so that the recognizer is trained on the faces added, and will show the calculated eigenfaces and average face. It should then show the recognized face in a picture box.

3.5 Full Mobile Module Logical Design.

This Android application module will consist of a camera component, android face detector

component and a SQLite Database component to store the detected images.







Figure 3-2. Logical design of the mobile module


The Android face detector and camera components will work to detect a face from the camera

input image. The image will then be captured and saved in the SQLite database. This will be

retrieved by the image acquisition component of the desktop module.

3.6 System Architecture.

The figure below shows the logical design and implementation of the three desktop subsystems.

Figure 3-3. The logical design of the Desktop Module Subsystems


3.7 Functions of the two Sub-Systems

The functionalities of the components are depicted in the block diagram of Figure 3.4. The face recognizer system will consist of two major components, i.e. the training set manager and the face recognizer. These two components will share the faces database, the image acquisition and the face detector components, as they are common in their functionality.

We will therefore partition the system into two subsystems and have their detailed logical designs implemented.

Components shown in Figure 3-4 and their functions:

Database of Faces: contains the training set.

Image Acquisition: gets the input image with the human face.

Face Detector: detects faces.

Face Recognizer: loads the training set; trains the recognizer on the training set; recognises the detected faces from the trained data; shows the calculated average face and the eigenfaces; connects to the faces database.

Training Set Manager: connects to the faces database; loads the training set to display present faces; deletes a face from the training set; updates a face in the training set.

Figure 3-4. A block diagram showing functions of the components.


3.8 Full Systems Logical Design

3.9 Tools

The following tools will be used in the implementation of the designed system. They have been divided into two categories: mobile and desktop tools. The mobile tools are the components that will aid in the implementation of the mobile module. This module is responsible for capturing the students' images in a classroom environment and then storing them for further processing by the desktop module. The desktop tools are the hardware or software components that will be utilized in the actual development of the desktop module. The desktop module also connects to the class attendance register, which is implemented in a database management system.







Figure 3-5. A logical design of the whole system


3.9.1 Mobile Tools

The Mobile Module will utilize the OpenCV library to implement face detection by use of the frontal Haar Cascade face detector in either Android Studio or Eclipse.

OpenCV for Android Library – the stable version at the time of writing is 3.1.0

OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision, originally developed by the Intel research center in Russia. The library is cross-platform and released under the open source BSD license, and hence it is free for both academic and commercial use.

Android Studio / Eclipse IDE

Android Studio is the official IDE for Android application development, based on IntelliJ IDEA. The Eclipse IDE, although no longer officially supported for Android development, also worked after installing the ADT (Android Development Tools) plugin and the Android Native Development Kit (NDK) so as to run the native OpenCV code written in C/C++. This yielded satisfactory results during the preliminary tests.

3.9.2 Desktop Tools

EmguCV Library – libemgucv – windows-x64-2.21.1150

EmguCV is a cross-platform .NET wrapper for the OpenCV image processing library, allowing OpenCV functions to be called from .NET compatible languages such as C#, VB, VC++, IronPython etc. The wrapper can be compiled by Visual Studio, Xamarin Studio and Unity, and it can run on Windows, Linux, Mac OS X, iOS, Android and Windows Phone.

OpenCV/EmguCV uses a type of face detector called a Haar Cascade. The Haar Cascade

is a classifier (detector) trained on thousands of human faces.

Visual Studio 2015

Visual Studio 2015 Community Edition on 64-bit Windows 10 Pro is able to build and run the solution examples after a proper configuration of EmguCV. The desktop module will utilize the EmguCV library in Visual Studio 2015 to implement the two sub-systems (training set manager and face recognizer) together with the face detector in a Windows Forms application. The MS Access database will be designed in MS Office Suite 2013.


4 Results and Analysis

4.1 User Interface of the system.

Faces Database Editor.

The faces database editor adds faces to the training set. The image is acquired through the highlighted box number 1 in Figure 4.1 and displayed as is in the picture box of step 2. The regions of interest (ROI), i.e. the face(s) in the image, are then automatically detected and marked with a light green rectangular box. In step 3 we give the grayscale face extracted from the image a face label and add the pair to the training set. In step 4 we can modify the face-label pairs in the event that they were wrongly captured, or delete faces that are not up to standard. Finally, step 5 prepares us for the recognition stage.

Figure 4-1. The Training set editor



The Face Recognizer.

The face recognizer compares the input face in the image captured with the faces captured during

enrollment. If it is a match it then retrieves the name associated with the input face.

Step 1 is to train the recognizer to be able to identify a face as either known or unknown. Step 2 selects the source of the image with the face to be recognized. This could be a live camera feed or a folder with captured images. The input image with the face is then displayed in the recognizer picture box 3 as shown in Figure 4.2. The name of the input face in the image is then displayed as shown in step 4. The returned name of the input face, together with the date and time, is then used to populate the records in the attendance register database. Clicking the button of step 6 displays the register as shown in Figure 4.3. The highlighted step 5 displays the computed average face and eigenfaces. The arrows are used to navigate through the eigenfaces. The “View Grid” button displays the eigenfaces/eigenvectors that were computed from the covariance matrix in grid form. Selecting the camera feed as the source of the input image pops up the window of Figure 4.4. The faces in the video feed are automatically detected, tracked and recognized. Faces can also be added to the database from the live camera feed.
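The way a recognized name, date and time populate the attendance register can be sketched as below. The sketch uses Python's built-in sqlite3 module as a stand-in for the register database; the table layout, the function names and the rule of at most one record per student per day are assumptions for illustration, not taken from the implemented system.

```python
import sqlite3
from datetime import datetime


def open_register(path=":memory:"):
    """Open (or create) an attendance register with one row per student per day."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attendance ("
        "name TEXT NOT NULL, "
        "day TEXT NOT NULL, "
        "time TEXT NOT NULL, "
        "UNIQUE (name, day))"
    )
    return conn


def mark_present(conn, name, when):
    """Record the student as present; return True only if a new row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO attendance (name, day, time) VALUES (?, ?, ?)",
        (name, when.strftime("%Y-%m-%d"), when.strftime("%H:%M:%S")),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical usage: the same face recognized twice during one lecture.
register = open_register()
print(mark_present(register, "student_a", datetime(2016, 5, 13, 9, 0, 0)))
print(mark_present(register, "student_a", datetime(2016, 5, 13, 9, 5, 0)))
```

The uniqueness constraint matters for a live camera feed, where the same student may be recognized in many consecutive frames but should appear in the register only once.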

Figure 4-2. The Face recognizer


Figure 4-3. The Attendance register.

From Figure 4.4 below, the highlighted box 1 shows the current camera view/scene. The faces and

eyes in the images are automatically detected as indicated by the rectangular boxes around them.

The detected face is extracted and compared with those in the database. Upon a successful match,

the name associated with the face is then displayed on the upper edge of the rectangular box. The

number of faces in the scene as well as their corresponding names are also shown on the

highlighted box number 2. The Face Adder box 3 can also be used to add faces to the database.


Figure 4-4. The live camera feed window

4.2 Face Detection

For group photos, a Minimum Neighbors detection tuning parameter of 3 yielded the best overall performance, as indicated in Figure 4.5 where the physical count is 53.

The face marked by a red hexagon is not detected at the Min Neighbors setting of 4 because the face is not fully displayed. Four is the highest setting, which strictly returns frontal faces. The second lady on the first row is not detected at either setting because her face is skewed to the right; the face detector only works with frontal faces. At a setting of 3, 52 out of the 53 faces were successfully detected.


Figure 4-5. Comparison between Minimum Neighbors setting of three and four

Figure 4.6 shows a group photo with minimum neighbors settings of 1.0 and 2.0. Tuning the minimum neighbors setting to 1 returns the number of faces in the image as 8, different from the physical count of 5. This is because the detector returned the slightest resemblance to a face as an actual face, hence the three false detections marked in red circles in Figure 4.6. Using the same image from class and incrementing the setting to 2.0 returned the number of detected faces as 5, which corresponded with the physical count. Increasing the setting further to 4.0 reduced the number of detected faces to three.
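This behaviour is consistent with how a cascade detector typically fires: several overlapping hits around a true face, but usually only isolated hits on face-like background patches. The sketch below illustrates the idea with a simplified grouping rule in which a candidate is kept only if its cluster contains at least min_neighbors raw hits. The rectangles, the overlap test and the overlap threshold of 0.5 are assumptions, and OpenCV's own grouping differs in detail.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def group_detections(raw_hits, min_neighbors, min_overlap=0.5):
    """Cluster overlapping hits and keep clusters with enough supporting hits."""
    clusters = []
    for rect in raw_hits:
        for cluster in clusters:
            if overlap_ratio(cluster[0], rect) >= min_overlap:
                cluster.append(rect)
                break
        else:
            clusters.append([rect])  # no existing cluster overlaps: start a new one
    # Report the first rectangle of every sufficiently supported cluster.
    return [cluster[0] for cluster in clusters if len(cluster) >= min_neighbors]


# Hypothetical raw hits: three around one real face, one isolated false hit.
hits = [(100, 100, 50, 50), (102, 101, 50, 50), (98, 99, 52, 52), (300, 40, 30, 30)]

for setting in (1, 2, 4):
    print(setting, len(group_detections(hits, setting)))
```

With a setting of 1 the isolated hit is accepted as a face, with 2 only the well-supported face remains, and with 4 even the real face is rejected, mirroring the trend seen in Figure 4.6.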


Figure 4.6 (a) Figure 4.6 (b)

Figure 4.6 (c)

Figure 4-6. Min Neighbors setting of 1.0, 2.0 and 3.0 respectively on an image from class.


Figure 4-7. A Minimum detection scale of 200

A minimum detection scale of 25 had the best overall performance for very large group photos in terms of speed. Increasing the scale to 200, as shown in Figure 4.7, tremendously reduces the time taken to return the number of faces in an image. The minimum detection scale also makes it possible to detect and recognize faces at longer or shorter distances by decreasing or increasing the scale respectively. Low detection scales waste Central Processing Unit (CPU) cycles if the faces in the image are large.
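The effect of the minimum detection scale on run time can be illustrated by counting how many windows a sliding-window detector would have to evaluate. The sketch below is a simplified model: the image size, the scale step of 1.1 and the stride of one tenth of the window size are assumed values, not the settings used by the system or by OpenCV.

```python
def count_scan_windows(img_w, img_h, min_size, scale_step=1.1, stride_frac=0.1):
    """Count square search windows from min_size up to the image size."""
    total = 0
    size = float(min_size)
    while size <= min(img_w, img_h):
        win = int(size)
        stride = max(1, int(win * stride_frac))  # move at least one pixel
        positions_x = (img_w - win) // stride + 1
        positions_y = (img_h - win) // stride + 1
        total += positions_x * positions_y
        size *= scale_step                       # next, larger window
    return total


# Hypothetical 640 x 480 frame scanned with the two scales discussed above.
small_scale = count_scan_windows(640, 480, 25)
large_scale = count_scan_windows(640, 480, 200)
print(small_scale, large_scale)
```

Starting from a 25-pixel window forces the detector through many more window positions than starting from 200 pixels, which is why a low minimum scale wastes CPU cycles when every face in the image is large.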

The system had 100% face detection rate for different frontal faces; local as well as faces from

standard faces databases like the Yale faces. The system was also able to detect bearded faces as

well as faces with glasses.

4.3 Face Recognition

In order to improve the recognition efficiency of the system, nine photos of each person from the standard Yale faces database were chosen for training, and the remaining two photos were chosen for the testing set. Out of the fifteen subjects in the Yale faces database, twelve were correctly recognized, corresponding to 80% accuracy. The faces of Figure 4.8 were not properly recognized.



Figure 4-8. The Yale database faces that were not properly recognized

Out of the fourteen faces of Figure 4.9, ten were successfully recognized, corresponding to a recognition accuracy of about 71.43%. The main causes of false recognition were the strength of the trained data and the illumination of the image. Face recognition is a form of machine learning, and thus the larger and more diverse the set of faces in the training set, the stronger the trained data used in recognizing faces.

Having several diverse images of the same person, covering the facial expressions possible at the time of recognition, creates strong training data and increases the accuracy of recognition. The lighting conditions present at the time of capturing the image to be recognized also affect the recognition results, as is the case in Figure 4.8 (a) and (c). Two closely identical people could also be recognized as one person unless the training data is strong.


Figure 4-9. Images from class

Out of 60 faces in the database, tests were done for several subsets exclusive of the Yale dataset. The results obtained are tabulated in Table 5.0. The percentage recognition rate was computed as the average of the percentages for the different subsets. The presence or absence of glasses had no effect on the recognition rates. The mean percentage recognition rate was found to be 80.22%. Center light faces had the best overall recognition rate at 90%. The primary issues facing most of the face detection and recognition systems in use today are rotation, pose, distance of recognition and illumination. These reduce the efficiency of the system unless recognition is performed under some necessary constraints. These constraints would involve positioning the subjects at specific positions, which in a real-world classroom environment would be very hard, not to mention time consuming, where the number of subjects involved is large.


Table 5.0. Recognition results for various datasets

Datasets    No of                             % Correct

            10          10          9         90
            15          15          11        73.3
            15          15          12        80
            10          10          7         70
            10          10          8         80
            20          20          17        85
            30          30          25        83.3
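As a check on the reported mean, the tabulated percentages can be averaged directly. Reading the third numeric column as the number of correct recognitions is an interpretation of the table (it reproduces every tabulated percentage), and the pooled figure on the second line is an additional illustration that is not quoted in the text.

```latex
\[
\frac{90 + 73.3 + 80 + 70 + 80 + 85 + 83.3}{7} = \frac{561.6}{7} \approx 80.2\%
\]
\[
\frac{9 + 11 + 12 + 7 + 8 + 17 + 25}{10 + 15 + 15 + 10 + 10 + 20 + 30} = \frac{89}{110} \approx 80.9\%
\]
```

The first line agrees with the mean recognition rate of about 80.2% stated above; the pooled rate is slightly higher because the larger subsets performed better than the smallest ones.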

With the help of a combination of techniques and algorithms, the system achieves the desired results with better accuracy. The provision of a variable minimum detection scale addresses the issue of distance in detection and recognition for both close-up and group images. This improved the face detection accuracy for upright frontal faces to 100% and consequently improved the face recognition accuracy over the typical efficiencies of 70%. Similarly, the minimum neighbors setting tremendously improved face detection accuracy.

Extracting and converting only the rectangular part containing the detected face, instead of the whole image, eliminates the effects of background noise on face detection, improving the accuracy of the system. The camera in the system is used such that it only captures frontal images, so pose is not an issue. Histogram equalization is applied to the input images; this ensures that the output images have a uniform distribution of intensities through the reassignment of pixel intensities. Input images of varying illumination are thus all enhanced in detail, which contributes to better face recognition results.
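Histogram equalization as described above can be sketched for a small grayscale image as follows. This is the textbook cumulative-histogram mapping written in plain Python for illustration, not the library routine used by the system, and the four-pixel image is made up.

```python
def equalize_histogram(pixels, levels=256):
    """Equalize a flat list of integer intensities in the range [0, levels - 1]."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: cdf[v] = number of pixels with intensity <= v.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)  # cdf value at the darkest used level
    if cdf_min == n:
        return list(pixels)                  # constant image: nothing to spread

    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# Hypothetical low-contrast image whose intensities span only 50..200.
dim_image = [50, 100, 150, 200]
print(equalize_histogram(dim_image))
```

After the mapping, the darkest pixel becomes 0 and the brightest becomes 255, so images captured under different lighting are stretched onto the same intensity range before recognition.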

Figure 4.10 shows the first 32 eigenfaces generated from a collection of 50 faces each of five people. The first few eigenfaces show the dominant features of the faces, while the last eigenfaces, from 196 to 247, are mainly image noise as shown in Figure 4.11 and are therefore discarded. The average face of Figure 4.12 shows the smooth face structure of a generic human being.

Figure 4-10. The first 32 Eigen faces

From Figures 4.10 and 4.11, it is seen that the first eigenface shows the most dominant facial features of the training set images. The succeeding eigenfaces (principal components) in turn show the next most probable facial features and more noise. Out of the 247 training images, 195 principal components together with the average face are enough to fully reconstruct the complete training set. We were therefore able to convert a set of M correlated face variables into a set of K uncorrelated variables called principal components (eigenvectors) [29]. The number of eigenfaces was noted to be less than the number of original face images, i.e. K < M, in accordance with PCA.


Figure 4-11. The last Eigenfaces in the training set

Figure 4-12. The average face

From the eigenfaces obtained in the face recognition stage, it was interesting to discover that principal components analysis can be used for image compression, as evidenced by the number of dominant eigenfaces that can comfortably represent all images in the training set. Out of 247 images in the training set, only 195 eigenfaces together with the average face are required to fully reconstruct the 247 faces in the set.


5 Conclusion and Recommendation.

It can be concluded that a reliable, secure, fast and efficient class attendance management system has been developed, replacing a manual and unreliable system. This face detection and recognition system will save time, reduce the amount of work done by the administration and replace the stationery currently in use with already existing electronic equipment.

There is no need for specialized hardware for installing the system as it only uses a computer and a camera. The camera plays a crucial role in the working of the system, hence the image quality and performance of the camera in a real-time scenario must be tested, especially if the system is operated from a live camera feed.

The system can also be used in permission based systems and secure access authentication

(restricted facilities) for access management, home video surveillance systems for personal

security or law enforcement.

The major threat to the system is spoofing. For future enhancements, anti-spoofing techniques such as eye blink detection could be utilized to differentiate live faces from static images in the case where face detection is made from images captured in the classroom. Given the overall efficiency of the system, i.e. 83.1%, human intervention could be called upon to make the system foolproof. A module could thus be included which lists all the unidentified faces so that the lecturer is able to manually correct them.

Future work could also include adding several well-structured attendance registers for each class

and the capability to generate monthly attendance reports and automatically email them to the

appropriate staff for review.



[1] V. Shehu and A. Dika, “Using Real Time Computer Algorithms in Automatic Attendance

Management Systems.” IEEE, pp. 397 – 402, Jun. 2010.

[2] K. Susheel Kumar, S. Prasad, V. Bhaskar Semwal, and R. C. Tripathi, “Real Time Face

Recognition Using AdaBoost Improved Fast PCA Algorithm,” Int. J. Artif. Intell. Appl., vol.

2, no. 3, pp. 45–58, Jul. 2011.

[3] P. K. Biswas, Digital Image Processing.

[4] S. Z. Li and A. K. Jain, Eds., Handbook of Face Recognition. New York: Springer, 2005.

[5] N. Mahvish, “Face Detection and Recognition,” Few Tutorials, 2014.

[6] A. K. Jain, L. Hong, S. Pankanti, and R. Bolle, Biometric Identification. IEEE,

[7] N. Tom, Face Detection, Near Infinity - Podcasts, 2007.

[8] T. Kanade, Computer recognition of human faces. Basel [etc.]: Birkhäuser, 1977.

[9] A. L. Rekha and H. K. Chethan, “Automated Attendance System using face Recognition

through Video Surveillance,” Int. J. Technol. Res. Eng., vol. 1, no. 11, pp. 1327–1330, 2014.

[10] I. Kim, J. H. Shim, and J. Yang, “Face detection,” Face Detect. Proj. EE368 Stanf. Univ.,

vol. 28, 2003.

[11] E. Shervin, “OpenCV Computer Vision,” 03-Oct-2010.

[12] M. Turk and A. Pentland, “Eigenfaces for Recognition,” J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.

[13] Y.-Q. Wang, “An Analysis of the Viola-Jones Face Detection Algorithm,” Image Process.

Line, vol. 4, pp. 128–148, Jun. 2014.

[14] Y. Freund, R. Schapire, and N. Abe, “A short introduction to boosting,” J.-Jpn. Soc. Artif. Intell., vol. 14, no. 5, pp. 771–780, 1999.

[15] P. Viola and M. J. Jones, “Robust real-time face detection,” Int. J. Comput. Vis., vol. 57,

no. 2, pp. 137–154, 2004.

[16] P. Viola and M. J. Jones, “Robust real-time face detection,” Int. J. Comput. Vis., vol. 57,

no. 2, pp. 137–154, 2004.

[17] M. Fuzail, H. M. F. Nouman, M. O. Mushtaq, B. Raza, A. Tayyab, and M. W. Talib, “Face

Detection System for Attendance of Class’ Students.”

[18] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in Computational learning theory, 1995, pp. 23–37.



The Recognizer Function

// Requires: System, System.Drawing, System.Windows.Forms,
//           Emgu.CV, Emgu.CV.Structure, Emgu.CV.CvEnum
private void DetectAndRecognizeFaces()
{
    Image<Gray, byte> grayframe = TestImage.Convert<Gray, byte>();

    // Assign user-defined values to the detector parameter variables
    MinNeighbours = int.Parse(comboBoxMinNeigh.Text);        // the 3rd parameter
    WindowSize = int.Parse(textBoxWinSize.Text);             // the 5th parameter
    ScaleIncreaseRate = Double.Parse(comboBoxSclnRte.Text);  // the 2nd parameter

    // Detect faces in the gray-scale image; the result is an array of MCvAvgComp
    var faces = grayframe.DetectHaarCascade(haar, ScaleIncreaseRate, MinNeighbours,
        HAAR_DETECTION_TYPE.DO_CANNY_PRUNING, new Size(WindowSize, WindowSize))[0];
    //MessageBox.Show("Total Faces Detected: " + faces.Length.ToString());

    Bitmap BmpInput = grayframe.ToBitmap();
    Bitmap ExtractedFace; // an empty "box"/"image" to hold the extracted face
    Graphics g;

    // Draw a sandy-brown rectangle on each detected face in the image
    foreach (var face in faces)
    {
        // Locate the detected face and mark it with a rectangle
        TestImage.Draw(face.rect, new Bgr(Color.SandyBrown), 3);
        CamImageBox.Image = TestImage;

        // Set the size of the empty box (ExtractedFace) which will hold the detected face
        ExtractedFace = new Bitmap(face.rect.Width, face.rect.Height);
        // Assign the empty box to graphics for painting
        g = Graphics.FromImage(ExtractedFace);
        // Fill the empty box with the exact pixels of the face from the input image
        g.DrawImage(BmpInput, 0, 0, face.rect, GraphicsUnit.Pixel);

        // Crop the face, convert it to gray scale and resize it to 100 x 100
        result = TestImage.Copy(face.rect).Convert<Gray, byte>()
            .Resize(100, 100, Emgu.CV.CvEnum.INTER.CV_INTER_CUBIC);

        if (trainingImages.ToArray().Length != 0)
        {
            MCvTermCriteria termCrit = new MCvTermCriteria(ContTrain, 0.001);
            recognizer = new EigenObjectRecognizer(
                trainingImages.ToArray(), // database face image list
                labels.ToArray(),         // database face name list
                5000,
                ref termCrit);

            camAverFace.Image = recognizer.AverageImage;
            camEigenFaceBox.Image = recognizer.EigenImages[fNo];
            btnPrev.Enabled = true;
            btnNext.Enabled = true;
            btnViewGrid.Enabled = true;

            try
            {
                StudentName = recognizer.Recognize(result);
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.ToString());
            }

            TestImage.Draw(StudentName, ref font,
                new Point(face.rect.X - 2, face.rect.Y - 2), new Bgr(Color.LightGreen));
            lblResult.Text = StudentName;     // "Match Found" label shows the returned face label
            imageFound.Image = TestImage;     // show the input image on the "Recognised As" picture box
            AddRecordToDB(lblResult.Text);    // add the recognised label to the attendance register
        }
        else
        {
            MessageBox.Show("face unknown - Please train the images first");
        }
    }

    DAdapter.Update(ARegTable);
}

Frame Processing

private void ProcessFrame(object sender, EventArgs arg)
{
    // Fetch the frame captured by the camera
    Image<Bgr, Byte> ImageFrame = capture.QueryFrame(); //line 1
    if (ImageFrame != null)
    {
        // Convert the image to gray scale
        Image<Gray, byte> grayframe = ImageFrame.Convert<Gray, byte>();
        var faces = grayframe.DetectHaarCascade(haar, ScaleIncreaseRate, MinNeighbours,
            HAAR_DETECTION_TYPE.DO_CANNY_PRUNING, new Size(WindowSize, WindowSize))[0];

        if (faces.Length > 0)
        {
            // MessageBox.Show("Total Faces Detected: " + faces.Length.ToString());
            Bitmap BmpInput = grayframe.ToBitmap();
            Bitmap ExtractedFace; // empty
            Graphics FaceCanvas;
            ExtFaces = new Bitmap[faces.Length];
            faceNo = 0; // start from the first slot so the index cannot run past the array

            foreach (var face in faces)
            {
                ImageFrame.Draw(face.rect, new Bgr(Color.RoyalBlue), 3);
                // Set the size of the empty box (ExtractedFace) which will hold the detected face
                ExtractedFace = new Bitmap(face.rect.Width, face.rect.Height);
                // Set the empty image as FaceCanvas, for painting
                FaceCanvas = Graphics.FromImage(ExtractedFace);
                FaceCanvas.DrawImage(BmpInput, 0, 0, face.rect, GraphicsUnit.Pixel);
                ExtFaces[faceNo] = ExtractedFace;
                faceNo++;
            }

            faceNo = 0;
            // Display the first extracted face in the ExtFaces picture box
            pbCollectedFaces.Image = ExtFaces[faceNo];
            btnAddtoTS.Enabled = true;
            txtBoxFaceName.Enabled = true;
            btnNext.Enabled = true;
            btnPrev.Enabled = true;
        }

        // Show the image in the EmguCV image box
        CamImageBox.Image = ImageFrame; //line 2
    }
}
