
Face Detection in Digital Imagery Using Computer Vision and Image Processing

Thomas Rondahl

December 1, 2011

Bachelor’s Thesis in Computing Science, 15 credits

Supervisor at CS-UmU: Johanna Bjorklund
Examiner: Pedher Johansson

Umea University
Department of Computing Science
SE-901 87 UMEA
SWEDEN

Abstract

The aim of this thesis was to increase the detection rate for profile and partial faces while improving the stability and run-time of the system.

By adding a fault limit to an existing face detection application, together with a tolerance limit for detection time, a desired throughput for detected objects could be established.

The results were obtained through an empirical analysis of test data, comparing the implementation done for this thesis with the older implementation. The results showed a 10% increase in detected faces in small images, while the number of false-positives also increased by 0.725 detections per image on average.

For large images, automatic scaling was added to decrease detection time and false-negatives. The results indicated a decrease in average detection time from ≈15 seconds (old implementation) to ≈2 seconds, while positive detection increased by 23 percentage points, from an average of 42% to 65%. False-positives also decreased from 5.8 to 0.2 detections per image on average in the test set.

Contents

1 Introduction
2 Problem Description
    2.1 Problem Statement
    2.2 Goals
    2.3 Purposes
    2.4 Methods
3 Introduction to Computer Vision and Image processing
    3.1 Basic concepts
    3.2 Face detection
        3.2.1 Techniques used for the FDS
        3.2.2 Viola and Jones detection
        3.2.3 A neural network for face detection
        3.2.4 Cascade classifier
            AdaBoost
            Nested Cascade
            Pruning
        3.2.5 Haar-like
            Haar wavelets
        3.2.6 Classifiers
4 System structure
    4.1 Work flow
        4.1.1 Detection conflict
        4.1.2 Loading nested cascades
5 Experiments
    5.1 Experiment setup
6 Results
    6.1 Unmodified Images
        6.1.1 High resolution images
        6.1.2 Low resolution images
    6.2 Empirical analysis
        6.2.1 Scaling
        6.2.2 Example study number 1
        6.2.3 Example study number 2
        6.2.4 Summary of study
            HD images
            LD images
7 Conclusions
    7.1 Limitations
    7.2 Future work
8 Acknowledgments
References

List of Figures

3.1 Neural network pre-process
    (a) Original
    (b) Grey and scale
    (c) Histogram equalization
3.2 Rejection cascade
3.3 Haar-like features
    (a) Center feature
    (b) Basic feature
    (c) Line feature
    (d) Basic feature
    (e) Basic feature
    (f) Edge feature
    (g) Line feature
3.4 Rotated Haar-like features
    (a) Line feature
    (b) Edge feature
    (c) Border feature
3.5 Haar-like feature placed on an ROI
    (a) Edge feature
    (b) Line feature
    (c) Basic feature
3.6 Haar wavelet
3.7 Feature generation
4.1 Overlapping detection areas
4.2 Overlapping detection areas
4.3 Example system flow
4.4 Nested cascade load decision
6.1 Detection using auto-scaled and non-scaled
    (a) Scaled
    (b) Non-scaled
6.2 Non-scaled
6.3 Detection using auto-scaled and non-scaled
    (a) Scaled
    (b) Non-scaled
6.4 Non-scaled
6.5 Image Set i
    (a) Image from Set i
    (b) Image from Set i
    (c) Image from Set i
    (d) Complete detection
    (e) Three missed detections
    (f) Complete detection
6.6 Image Set ii
    (a) Image from Set ii
    (b) Image from Set ii
    (c) Image from Set ii
    (d) Good detection
    (e) OK detection
    (f) Showing one false-positive

List of Tables

3-1 Cascade setup
6-1 Scaling Table for image Figure 6.1
6-2 Times and false-positive for Table 6-1
6-3 Scaling Table for image Figure 6.3
6-4 Times and false-positive for Table 6-3
6-5 Variation in size of the 33 HD images
6-6 Summary of HD test cases
6-7 Sizes of images used in LD test
6-8 Summary of detection for all studied LD test cases

List of Equations

A-3.1 Viola-Jones sub-window
A-3.2 Original Viola-Jones sub-window
A-3.3 False-positive detection rate for entire cascade
A-3.4 Positive detection rate for entire cascade
A-3.5 Integral Image
A-3.6 Tilted sum
A-3.7 AdaBoost
A-3.8 Weight normalization
A-3.9 Final strong classifier
A-3.10 Main wavelet function
A-3.11 Scaling function in Haar wavelets
A-3.12 Classification function
A-3.13 Haar-like calculation for intensity
A-3.14 Standard deviation for haar-like feature
A-4.1 Circle intersection detection
A-4.2 Circle overlapping detection confirmation
A-6.1 Scaling equation
A-6.2a Collecting column scale factor
A-6.2b Collecting image scale factor

Chapter 1

Introduction

This thesis describes the implementation of a face detection system (hereafter FDS) that will be used to augment an existing system. The new detector needed to fit into the current API and deliver improved detection results, mainly for faces that were tilted or in profile. The new detector also had to be more robust and reliable than previous implementations.

The implementation is part of a larger application that had not yet been fully implemented. The full system was developed later and is currently in the early stages of client usage and evaluation.

The FDS was implemented for the company CodeMill AB, for use in the service provided by FaceClip.se. CodeMill AB is an Umea-based ICT consulting company that offers development and consulting in various areas, specializing in media production and medical-resource development.


Chapter 2

Problem Description

The problem consists of expanding an FDS to detect faces regardless of angle and profile variations; increasing the throughput and speed of the detection algorithm for correct face detection; and adding stability and good programming practice for maintenance and future reuse of code.

The FDS had to use the OpenCV framework, with a modern usage of this framework in C++. It also had to be a more stable and robust augmentation of the current detector, adapted to fit into an already existing system.

2.1 Problem Statement

Create an FDS that has a low false-negative rate while also maintaining a low false-positive rate. The system should be robust to special cases, such as tilted faces and other non-upright frontal faces. It should also add detection of profile faces and of frontal faces in a non-upright position, such as angled faces.

The solution was to be integrated into an existing system; this was done using SWIG (Simplified Wrapper and Interface Generator). A SWIG project was created to act as an abstraction layer between the C++ FDS library and the overlying main system, which is implemented in Java EE 6.

2.2 Goals

A main goal of the project was to increase the throughput of detected faces in any given image, if the image contains faces. If the image does not contain any faces, then none should be reported as detected. Further goals were to improve detection of faces in non-vertical positions, to handle multiple angles in the same image, and to detect a high percentage of the existing faces. The system should also be able to detect faces in profile, without excluding results already given and without generating false-positive detections. Additionally, I strive to create a modern implementation that uses system resources in an economical and optimal way. Another goal is that the new implementation has to be integrated into a system currently in use, using the current API for integration and delivering the same standard reply format.

Goal 1. Increasing the throughput of detected faces in any given image, if the image contains faces. If it does not contain any faces, then none should be reported as detected.

Goal 2. Detecting faces in profile, without excluding results already given and without generating false-positive detections.

2.3 Purposes

The reason for pursuing the goals described above is to make the FDS more versatile, robust and economical with respect to system resources, thereby improving the marketability of the original system, while making integration into an existing system a lesser task.

2.4 Methods

To reach our goals, we proceed as follows. First, we perform a literature study of current methods and available algorithms in the field of Computer Vision and face detection in digital imagery, and compare them to the algorithms provided by the OpenCV library.

Using the book Learning OpenCV [1], a case study of available algorithms will be done, along with an in-depth study of the OpenCV framework. A dialog with the postgraduate student Jean-Paul Kouma concerning his work in face recognition technology will aid our understanding of the field and of the various algorithms that have been produced but not necessarily implemented in OpenCV yet.

We then find special cases for which the current version of the detector fails, and find limitations of the new implementation for back-reference. The data for this search consists of various images of corner cases, such as profile images and tilted faces.

Chapter 3

Introduction to Computer Vision and Image processing

To us humans, perceiving images and recognizing faces, focusing, grasping content and seeing a 3-dimensional structure in a picture is easy. It is an important ability in our everyday life and something we do not need to put much effort into. In Computer Vision we try to achieve something similar: observing a picture and inverting it back to its properties, namely collecting information on color intensity, edges and shapes and gathering a context from this. Processing a digital image back to its properties became a much more feasible task once more powerful computers became affordable.

In the late 90s image processing started to attract a great deal more interest, but the contemporary implementations of detectors and image processors were computationally expensive, using raw pixels and edges. Now more effective algorithms and computers make it possible to use image processing in a feasible way.

3.1 Basic concepts

Sub-window
    Part of the image currently being processed.

Classifier
    An evaluation of features, containing a threshold for positive evaluation. See Section 3.2.6 for more details on classifiers and feature evaluation.

Classifier Cascade
    An algorithm for rapid detection in a sub-window, where a positive result from a classifier triggers the next classifier in line. A negative result from one classifier at any point during the cascade causes immediate rejection of the current sub-window. See Section 3.2.4 for more details.

Haar-like
    Haar-like features have scalar values that represent differences in average intensities between two rectangular regions. See Section 3.2.5 for more details.

Boosting
    Constructs a “strong” classifier as a linear combination of many “weak” classifiers, for example a classifier cascade. See Section 3.2.4 AdaBoost for more details.

3.2 Face detection

According to the taxonomy of Ming-Hsuan Yang, David J. Kriegman and Narendra Ahuja [11], face detection can be classified as adhering to one of four approaches: knowledge-based, feature-based, template-based or appearance-based.

Knowledge-based face detection uses an ontology of how facial features relate to each other. This is based on rules derived from anatomical knowledge of the human face, and how different features are related. These rules can be seen as a cascade (like Figure 3.2) where each step should answer a particular question [18]: Any eyes? Any nose? And so on.

Feature-based face detection focuses on finding distinctive features, such as eyes, noses and mouths, at locations in images, and verifying that each detected feature is in a plausible region of the image. This can be done through geometrical testing.

Template-based face detection has a wide range of poses, expressions, and an ontology, stored in a template database. This approach is however inadequate for face detection since it cannot efficiently deal with variations in scale, shape and pose [11].

Appearance-based face detection iterates over smaller rectangles of potential faces, using Bayesian classification or maximum likelihood estimation (MLE) for more selective detection. It requires a heavily trained classifier for detection and for minimizing false-positives. Appearance-based detection has good tolerance towards different backgrounds and lighting [6].

3.2.1 Techniques used for the FDS

In this thesis a combination of feature-based and appearance-based detection was used for detection of faces in digital images. The implementation used several different Classifier Cascades (see Section 3.2.4), which were run simultaneously to increase computational efficiency (see Section 4.1).

We used a feature-checking algorithm to detect face candidates and a nested cascade (see Section 3.2.4 Nested Cascade) to detect features, such as eyes and mouth (using a classifier cascade).

OpenCV implements a Viola-Jones detector (see Section 3.2.2) for face detection, among various other possible detectors. It obtains a high detection rate at the cost of a low rejection rate, giving a high probability of false-positives (a face detected where there is none) and a low probability of false-negatives (a face missed).


When we introduce a Classifier Cascade into the system, we see a substantial increase in computational efficiency while also reducing false-positives. Each possible detection is noted as a “region of interest” (hereafter ROI), and can be represented as a specific part of the image that may contain a positive detection. This approach was introduced by Paul Viola and Michael Jones [14] and later expanded by Rainer Lienhart and Jochen Maydt [8] and by Takeshi Mita et al. [13] (see Section 3.2.5).

Before attempting to detect faces in an image, a preprocess was added to render images more amenable to detection. Parts of the preprocessing algorithm from A neural network for face detection [6] (see Section 3.2.3) were used to accomplish this task. After the preprocess, the image was scanned using the Viola-Jones detector, described below. For an example of the system flow, see Section 4.1.

3.2.2 Viola and Jones detection

The cascade used by the Viola-Jones detector [14] consists of many “weak” classifiers, meaning that they perform just slightly better than mere guessing. A large number of these weak classifiers are combined in a chain, and each additional classifier strengthens accuracy. If a single classifier fails, then the whole sub-window is considered to be free from face patterns. By having the most restrictive classifiers at the beginning of the chain, faster rejection increases overall speed.

The image is divided into sub-windows on which the detection algorithm is applied. Each sub-window has a classifier h_j(), a binary threshold function constructed from a threshold θ_j and a rectangle filter f_j() (see Section 3.2.5), which is a linear function of the image, j being the identifier for each unique feature. In Equations (A-3.1) and (A-3.2), x is the sub-window of an image, of size 24 × 24 pixels, and α_j and β_j are Boolean values signifying positive or negative votes in the cascade, set during the AdaBoost learning process (see Section 3.2.4 AdaBoost). This was suggested by Paul Viola and Michael Jones [16] to expand their original detector [14], Equation (A-3.2).

$$h_j(x) = \begin{cases} \alpha_j & \text{if } f_j(x) > \theta_j \\ \beta_j & \text{otherwise} \end{cases}$$

Equation A-3.1: Viola-Jones sub-window detection equation [16].

$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases}$$

Equation A-3.2: Original Viola-Jones sub-window detection equation [14].

In Equation (A-3.2), p_j indicates the direction of the inequality sign; it was removed in the later detector, Equation (A-3.1), in favor of positive and negative votes (α_j and β_j).
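The two forms of the weak classifier can be compared side by side in a few lines. The following is only a minimal sketch: the function names and the idea of passing the already computed filter response f_j(x) as a plain number are assumptions made for illustration, not part of the thesis implementation.

```python
def weak_classifier_original(f_x, theta, parity):
    """Equation (A-3.2): 1 if p_j * f_j(x) < p_j * theta_j, otherwise 0.

    `parity` is +1 or -1 and flips the direction of the inequality.
    """
    return 1 if parity * f_x < parity * theta else 0


def weak_classifier_votes(f_x, theta, alpha, beta):
    """Equation (A-3.1): vote alpha_j if f_j(x) > theta_j, otherwise beta_j."""
    return alpha if f_x > theta else beta
```

With parity +1 the original form accepts responses below the threshold, and with parity -1 it accepts responses above it; the later form always compares in one direction and lets the two votes carry the meaning.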


Figure 3.1: Neural network pre-process. (a) Original, (b) Grey and scale, (c) Histogram equalization.

3.2.3 A neural network for face detection

The Neural Network approach uses several networks to improve computational efficiency over a single neural network. This approach was introduced by Henry A. Rowley, Shumeet Baluja, and Takeo Kanade [6].

The main functionality of the neural network, as suggested by [6], is to detect upright frontal faces, using scaling, greyscale conversion and histogram equalization together with an appearance-based approach to face detection.

The neural network pre-process is applied to an image to render it more amenable to detection, while also increasing the computational efficiency.

The pre-process reduces the pixel intensities to a single channel (instead of the three present in RGB). This compensates for different lighting and camera input. The result of this procedure is a grey-scale representation (Figure 3.1b) of the original image (Figure 3.1a), effectively lowering the number of elements to be processed from 3N to N for an image of N pixels in three channels, i.e. RGB.

A histogram equalization is then applied to compensate for contrast differences in the image. It spreads the intensity values more evenly over the available range, rather than leaving them clustered around a single peak of contrast such as a white-out [1] (Figure 3.1b to 3.1c). Histogram equalization requires a single channel of color, i.e. greyscale.
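The two pre-processing steps can be sketched without any image library. This is an illustrative sketch only: the image is assumed to be a list of rows of (r, g, b) tuples with 8-bit values, the grey conversion is a plain channel average (libraries such as OpenCV use a weighted sum instead), and both function names are hypothetical.

```python
def to_grey(rgb_image):
    """Reduce each (r, g, b) pixel to one intensity by averaging the channels."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_image]


def equalize_histogram(grey, levels=256):
    """Spread the intensities over the full range via the cumulative histogram."""
    hist = [0] * levels
    for row in grey:
        for value in row:
            hist[value] += 1

    # Cumulative count of pixels at or below each intensity level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = cdf[-1]
    if total == cdf_min:
        # Flat image: there is no contrast to redistribute.
        return [row[:] for row in grey]

    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in grey]
```

A low-contrast patch such as `[[100, 101], [102, 103]]` is stretched to `[[0, 85], [170, 255]]`, which is the kind of spreading shown between Figure 3.1b and Figure 3.1c.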

In the Neural Network face detection approach, a distribution of face features is scaled down to 20 × 20 pixel patches and matched to create a detection. An image is passed through the network, consisting of a layer of hidden units: four looking at 10 × 10 pixel subregions, 16 looking at 5 × 5 pixel subregions (for detecting the eyes and nose), and six looking at overlapping 20 × 5 horizontal stripes of pixels (for detecting the mouth). The return value of this process is a single real number indicating the presence of a face.

The Neural Network uses appearance-based detection when fully implemented; only the pre-process was implemented for this thesis.

3.2.4 Cascade classifier

Figure 3.2: Rejection cascade used in the Viola-Jones classifier.

The detection/rejection cascade classifier algorithm works as follows: a Haar-like feature (see Section 3.2.5) is detected, and the sub-window is then allowed to pass to the next step of detection, as shown in Table 3-1. A candidate is a sub-window of the image, a rectangular section of a fixed size (usually 24 × 24 pixels; in the implementation for this thesis, 20 × 20¹). The entire image is scanned for these sub-windows. The algorithm uses an integral image (Equation (A-3.5)) to make a rapid summation of sub-regions. Equation (A-3.5) uses a technique similar to a summed area table [4], where a single table is created in which each pixel intensity is replaced by a value representing the sum of all the pixels contained in the rectangle spanned by that pixel and the lower left corner of the image.

$$F = \prod_{i=1}^{K} f_i$$

Equation A-3.3: False-positive detection rate for entire cascade.

¹ Suggested by: “Tree-based 20 × 20 gentle adaboost frontal face detector. Created by Rainer Lienhart.” and [6, 12].


Cascade:
    Stage 1:
        Classifier 1,1:
            Feature 1,1
        Classifier 1,2:
            Feature 1,2
        ...
    Stage 2:
        Classifier 2,1:
            Feature 2,1
        ...
    ...

Table 3-1: Cascade setup.
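The nesting in Table 3-1 maps naturally onto a loop with early rejection. The sketch below is an assumption-laden illustration rather than the detector used in the thesis: each stage is taken to be a pair of (list of classifier functions, stage threshold), a stage passes when the summed votes reach its threshold, and `run_cascade` is a hypothetical name.

```python
def run_cascade(stages, window):
    """Return True only if the window passes every stage of the cascade.

    `stages` is a list of (classifiers, stage_threshold) pairs, where each
    classifier is a function that maps a window to a numeric vote.
    """
    for classifiers, stage_threshold in stages:
        score = sum(classifier(window) for classifier in classifiers)
        if score < stage_threshold:
            # Rejected here: the remaining stages are never evaluated.
            return False
    return True
```

Because most sub-windows fail an early stage, the later and more expensive stages run only on a small fraction of the image, which is where the speed of the cascade comes from.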

The false-positive detection rate for the entire cascade is shown in Equation (A-3.3). To achieve a false-positive rate of 10^-6, each stage (f_i denotes the false-positive rate of each stage's classifier) need only have a false-positive rate of less than or equal to 65% (for a cascade of K = 32 stages, since 0.65^32 ≈ 10^-6). K denotes the number of classifier stages in the cascade, as shown in Table 3-1.

$$D = \prod_{i=1}^{K} d_i$$

Equation A-3.4: Positive detection rate for entire cascade.

In Equation (A-3.4) the total rate of positive detection is D. To achieve a detection rate of 90%, each stage (d_i denotes each stage's positive detection rate) has to have a positive detection rate of ≥ 99.7% in a 32-stage detector, as first outlined in [15].
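The figures quoted for Equations (A-3.3) and (A-3.4) can be checked with a short calculation. The sketch assumes that every stage has the same rate, so the product reduces to a power; the function names are illustrative only.

```python
def cascade_rate(per_stage_rate, stages):
    """Overall rate of a cascade whose stages all share the same rate."""
    return per_stage_rate ** stages


def required_stage_rate(target_rate, stages):
    """Per-stage rate needed to reach `target_rate` over the whole cascade."""
    return target_rate ** (1.0 / stages)


false_positive_total = cascade_rate(0.65, 32)   # roughly 1e-6
detection_total = cascade_rate(0.997, 32)       # roughly 0.91
```

Going the other way, `required_stage_rate(0.9, 32)` gives a per-stage detection rate just under 0.997, which matches the requirement stated above.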

$$\mathrm{sum}(X, Y) = \sum_{\substack{x \le X \\ y \le Y}} \mathrm{image}(x, y)$$

Equation A-3.5: Integral Image for rapid evaluation of features.
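A small sketch shows why the integral image makes rectangle sums cheap. It assumes the image is a list of rows of numbers and pads the table with an extra zero row and column, so that entry `[Y + 1][X + 1]` holds sum(X, Y) from Equation (A-3.5); the names `integral_image` and `rect_sum` are illustrative.

```python
def integral_image(image):
    """Build a summed-area table with one extra zero row and column."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            # Sum of the row so far plus everything in the rows before it.
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x, y, w, h):
    """Sum of the w-by-h rectangle whose first pixel is (x, y), in four lookups."""
    return (table[y + h][x + w] - table[y][x + w]
            - table[y + h][x] + table[y][x])
```

Once the table is built in a single pass, the sum of any rectangle costs four lookups regardless of its size, which is what allows Haar-like features to be evaluated quickly at every sub-window position.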

$$\mathrm{tilt\_sum}(X, Y) = \sum_{\substack{y \le Y \\ |x - X| \le y}} \mathrm{image}(x, y)$$

Equation A-3.6: Tilted sum, same as Equation (A-3.5) except that it is for the image rotated 45 degrees.

Christopher Messom and Andre Barczak [10] experimented with expanding the tilted sum equation to account for all angles rather than just 45 degrees, but it proved to have practical problems for low-resolution images and is therefore not implemented in this thesis.


The presence of a Haar-like feature is determined by subtracting the average dark-region pixel value from the average light-region pixel value. If the difference is above a threshold, that feature is considered present (see Section 3.2.6).

A classifier is created as follows:

Collecting samples (1)
⇓
Generate training sets (2)
⇓
Training classifier (3)
⇓
Detect classifier

(1) A large set of images containing positive detections of the features and a large set of negative detections, such as backgrounds not containing the desired features. The sets should contain at least 1000 positive images (Rainer Lienhart et al. [12] stated that they used 5000 positive images derived from 1000 original images); more is better. The same number of negative images should be used.

(2) Having the classifier learn from the collections.

(3) Training the classifier using a boosting algorithm, such as AdaBoost (see Section 3.2.4, subsection AdaBoost).

AdaBoost

AdaBoost (short for Adaptive Boosting) was suggested by Yoav Freund and Robert E. Schapire [5] and reworked for feature detection by Paul Viola and Michael Jones [14]. AdaBoost constructs a “strong” classifier as a linear combination of many “weak” classifiers.

$$\sum_{t=1}^{T} \alpha_t h_t(x)$$

Equation A-3.7: Adaptive Boost.

In Equation (A-3.7), h_t is a weak classifier or a feature, and α_t = log(1/β_t), where β_t = ε_t/(1 − ε_t) and ε_t is the error evaluated from each respective weight w_t. The weight reflects the accuracy of the classifier's “vote” during training; the weights are normalized during the boosting algorithm, see Equation (A-3.8).

$$w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$$

Equation A-3.8: Weight normalization.

The training generates a final strong classifier, given in Equation (A-3.9).


$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$

Equation A-3.9: Final strong classifier.
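The pieces of the boosting step can be put together in a short sketch. It follows the usual Viola-Jones convention that a window is accepted when the weighted vote reaches half of the total weight; the function names are hypothetical, and the weak classifier outputs are assumed to be given as a list of 0/1 values rather than computed from image data.

```python
import math


def alpha_from_error(error):
    """alpha_t = log(1 / beta_t), with beta_t = error / (1 - error)."""
    beta = error / (1.0 - error)
    return math.log(1.0 / beta)


def normalize(weights):
    """Equation (A-3.8): scale the weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def strong_classify(alphas, weak_outputs):
    """Accept (1) when the weighted vote reaches half of the total weight."""
    score = sum(a * h for a, h in zip(alphas, weak_outputs))
    return 1 if score >= 0.5 * sum(alphas) else 0
```

A weak classifier with an error of 0.5 (pure guessing) receives a weight of zero, while one with an error of 0.1 receives log(9) ≈ 2.2, so more accurate classifiers dominate the vote.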

Nested Cascade

For this thesis a nested cascade was used to limit false-positives, by adding a discriminative model that discards an ROI if no specific feature is detected. Such a detection classification was suggested by Chang Huang et al. [2] and by Peng Wang and Qiang Ji [17].

Pruning

OpenCV supports a functionality known as Canny pruning, where sections of the image are discarded as unlikely to hold a face. An edge detector (the Canny Edge Detector, created by John F. Canny [3]) is used to detect areas containing too many or too few edges to contain a face feature, thus decreasing computational load and possibly discarding false-positives at an early stage.

3.2.5 Haar-like

Haar-like features have scalar values that represent differences in average intensities betweentwo rectangular regions.

Haar wavelets

Haar wavelets, first suggested by Alfred Haar [7], use Equation (A-3.10) to generate an expression for intensity over a function, called ψ(t). Equation (A-3.10) describes the function shown in Figure 3.6. Additionally, the Haar wavelet supports a scaling function for rapid summary of values, Equation (A-3.11).

$$\psi(t) = \begin{cases} 1 & 0 \le t < 1/2 \\ -1 & 1/2 \le t < 1 \\ 0 & \text{otherwise} \end{cases}$$

Equation A-3.10: Main wavelet function is described using this equation.

$$\varphi(t) = \begin{cases} 1 & 0 \le t < 1 \\ 0 & \text{otherwise} \end{cases}$$

Equation A-3.11: Scaling function in Haar wavelets.
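For readers who prefer code to piecewise notation, the two functions can be written directly. The sketch uses the standard half-open intervals so that each value of t falls into exactly one case; the function names are chosen for illustration.

```python
def haar_wavelet(t):
    """Mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= t < 0.5:
        return 1
    if 0.5 <= t < 1:
        return -1
    return 0


def haar_scaling(t):
    """Scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1 if 0 <= t < 1 else 0
```

The positive and negative halves of the wavelet cancel over the unit interval, which is the same light-minus-dark structure that the two-rectangle Haar-like features apply to image regions.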


Figure 3.3: Haar-like features: (a) Center feature, (b) Basic feature, (c) Line feature, (d) Basic feature, (e) Basic feature, (f) Edge feature, (g) Line feature. The rectangular regions are easily calculated using the Integral Image: the sum of the light regions' pixels is subtracted from the sum of the dark regions' pixels.

Giving Figure 3.6 a light coloration where y = 1 and a dark coloration where y = −1 yields an appearance similar to Figure 3.3b. The same wavelet (Figure 3.3b) can be seen in Figure 3.3e, if the figure is rotated. Additionally, by combining the two features described above, Figure 3.3d can be generated.

Inspired by the original Haar wavelets, the basic Haar set contains only the figures mentioned above; it was later expanded to the Haar-like features (see Figures 3.3c, 3.4a and 3.4b) [14]. The Haar-like features were in turn expanded to the Expanded Haar features by Rainer Lienhart and Jochen Maydt [8]; see Figure 3.3a and all tilted features in Figure 3.4.

$$F = \mathrm{sign}(w_1 f_1 + w_2 f_2 + \cdots + w_n f_n)$$

Equation A-3.12: Classification function [14], sign returns according to Equation (A-3.10).

In Equation (A-3.12), each w_i is the weighted vote according to the boosting, see Section 3.2.4, subsection AdaBoost, and each f_i is h_j() in Equation (A-3.1).


Figure 3.4: Rotated Haar-like features: (a) Line feature, (b) Edge feature, (c) Border feature. They are easily calculated using the Tilted Integral Image (see Equation (A-3.6)).

3.2.6 Classifiers

Classifiers assigns each input value to a given set of classes, in the case of this thesis, theclasses are “Face” and “Not-Face”. The classifiers use a threshold of the sums and differenceof rectangular regions of data produced by any feature detector, which may include Haarwavelets of rectangular gray-scale image values, rather then the square features generated byfunction in Figure 3.6. Denoted “Haar-like” in deference of this distinction. Figure 3.3 andFigure 3.4 show Haar-like features used by the classifier for this thesis. They are calculatedfrom the integral image representing the original (gray-scale) image. This method was firstsuggested by Michael Oren (et al.) [9].

For example, Figure 3.3c is placed over a likely region for the eyes. By comparing the intensities in the eye regions to the intensity across the bridge of the nose, we obtain a plausible detection of a face. Similarly, Figure 3.3d could be placed according to Figure 3.5c and produce a valid result, indicating detection of a face. This continues until all features have been placed and every evaluation of Equation (A-3.13) has produced a result under the threshold φj (see Equation (A-3.1)).

Using Equation (A-3.13), the difference (F_Haar) between the fields of the rectangle (see Figure 3.5) can be calculated and translated to a potential detection. In the equation, E(R_black) is the intensity of the dark region and E(R_white) is the intensity of the light region of the rectangle (see Figure 3.3f placed on Figure 3.5a). The difference between them is then E(R_black) − E(R_white). This difference is divided by the standard deviation over the rectangle containing all features (Equation (A-3.14)), multiplied by the width and height of the feature rectangle for scaling, i.e. the total number of pixels in the sub-window. The division normalizes the variance of the pixel values, because the detector is trained on normalized example images (see Equation (A-3.8)).

$$F_{\mathrm{Haar}} = \left| \frac{E(R_{\mathrm{black}}) - E(R_{\mathrm{white}})}{w \times h \times \sqrt{\left| E(R_\mu)^2 - E(R_\mu^2) \right|}} \right|$$

Equation A-3.13: Haar-like calculation for intensity.
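As a rough illustration of Equation (A-3.13), the sketch below evaluates a two-rectangle edge feature on a small gray-scale image using an integral image. Several points are assumptions made for this sketch and are not taken from the thesis implementation: E(R_black) and E(R_white) are read as pixel sums, E(R_µ) and E(R_µ²) as the mean and the mean of squares over the whole window, the dark region is the top half of the window, and a window with zero deviation returns 0. All function names are hypothetical.

```python
import math


def integral_image(gray):
    """Summed-area table with an extra zero row and column at the top/left."""
    height, width = len(gray), len(gray[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += gray[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle at (x, y) with size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_edge_response(gray, x, y, w, h):
    """Normalized edge feature: dark top half against light bottom half."""
    ii = integral_image(gray)
    ii_sq = integral_image([[p * p for p in row] for row in gray])
    half = h // 2
    black = rect_sum(ii, x, y, w, half)
    white = rect_sum(ii, x, y + half, w, h - half)
    area = w * h
    mean = rect_sum(ii, x, y, w, h) / area
    mean_sq = rect_sum(ii_sq, x, y, w, h) / area
    std = math.sqrt(abs(mean * mean - mean_sq))
    if std == 0:
        return 0.0  # flat window: no contrast to normalize (assumption)
    return abs((black - white) / (area * std))


# A 4 x 4 window whose top half is dark (0) and bottom half is light (10).
window = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]
print(haar_edge_response(window, 0, 0, 4, 4))
```

Each rectangle sum costs four table look-ups regardless of the rectangle size, which is what makes the features cheap to evaluate at many positions and scales.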


(a) Edge feature (b) Line feature (c) Basic feature

Figure 3.5: Haar-like feature placed on an ROI.

$$\sqrt{\left| E(R_\mu)^2 - E(R_\mu^2) \right|}$$

Equation A-3.14: Standard deviation for Haar-like feature placement.

<feature>
  <rects>
    <_>3 7 14 9 -1.</_>
    <_>3 10 14 3 3.</_>
  </rects>
  <tilted>0</tilted>
</feature>
<threshold>3.3989109215326607e-004</threshold>

Equation A-3.15: A sample Haar-like feature vector from the OpenCV source distribution, generated by Rainer Lienhart [8, 12].

Equation (A-3.15) is interpreted as follows. The feature consists of two rectangles. The first rectangle starts at x = 3 and y = 7 and has width = 14 and height = 9. Its weight is −1, which is applied to the summed intensity of the region during feature evaluation. The second rectangle (starting at x = 3, y = 10 with width = 14 and height = 3) overlaps the first rectangle and applies a positive weight in the evaluation of the feature. This generates a feature corresponding to Figure 3.7, where each + and − corresponds to one pixel.

Figure 3.7 is similar to Figure 3.3g in shape and color. The value <tilted>0</tilted> indicates that the feature is not tilted. The top left corner of the feature would be located at x = 3, y = 7 in an ROI of size 20 × 20, as this feature is generated by the tree-based 20 × 20 Gentle AdaBoost frontal face detector [8].

Figure 3.6: Graph representation of the Haar wavelet, Equation (A-3.10).

--------------
--------------
--------------
++++++++++++++
++++++++++++++
++++++++++++++
--------------
--------------
--------------

Figure 3.7: Feature generated from Equation (A-3.15).
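The feature description in Equation (A-3.15) can be read programmatically. The sketch below parses the XML fragment with Python's standard `xml.etree.ElementTree` module and accumulates the rectangle weights in a 20 × 20 window. The helper names `parse_feature` and `weight_map` are hypothetical, and the sketch only handles upright (non-tilted) features.

```python
import xml.etree.ElementTree as ET

FEATURE_XML = """
<feature>
  <rects>
    <_>3 7 14 9 -1.</_>
    <_>3 10 14 3 3.</_>
  </rects>
  <tilted>0</tilted>
</feature>
"""


def parse_feature(xml_text):
    """Return ([(x, y, width, height, weight), ...], tilted) for one feature."""
    root = ET.fromstring(xml_text)
    rects = []
    for node in root.find("rects"):
        x, y, w, h, weight = node.text.split()
        rects.append((int(x), int(y), int(w), int(h), float(weight)))
    tilted = root.find("tilted").text.strip() == "1"
    return rects, tilted


def weight_map(rects, size=20):
    """Accumulate the per-pixel weight of every rectangle in a size x size ROI."""
    grid = [[0.0] * size for _ in range(size)]
    for x, y, w, h, weight in rects:
        for row in range(y, y + h):
            for col in range(x, x + w):
                grid[row][col] += weight
    return grid


rects, tilted = parse_feature(FEATURE_XML)
grid = weight_map(rects)
# Print the rows covered by the feature as signs, one character per pixel.
for row in grid[7:16]:
    print("".join("+" if v > 0 else "-" for v in row[3:17]))
```

Because the second rectangle lies inside the first, the net weight of the middle band is −1 + 3 = +2, and the weights over the whole feature sum to zero, so a uniformly lit region gives no response.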

Chapter 4

System structure

Let us now give a brief presentation of the implemented system’s structure.

4.1 Work flow

As shown in Figure 4.3, the feature detection is performed at the stage Do feature detection. The results of this detection are then processed and compared with the other positive detections, which are produced in parallel.

Each thread created at the stage Run thread cascade on image runs a separate cascade on the image, using nested confirmation if this has been decided (see Section 4.1.2). Each thread has access to a virtual copy of the original, preprocessed image (see Section 3.2.3), on which the cascade is run. All threads run in parallel and interact only when the detection results are combined into one complete result, which the application “above” may use.

Up to 5 threads may run in parallel during the detection phase of the FDS, depending on the compilation environment and compilation parameters. Alternatively, one main thread may sequentially run each of the 5 different detections and then compile the result.
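The two execution modes described above can be sketched as follows. This is a Python stand-in rather than the thesis code: each cascade is modelled as a hypothetical callable that takes an image and returns a list of detections, and the standard `concurrent.futures.ThreadPoolExecutor` takes the place of the native threads used by the FDS.

```python
from concurrent.futures import ThreadPoolExecutor


def run_cascades(image, cascades, parallel=True, max_workers=5):
    """Run every cascade on the image and merge the detections into one list."""
    if parallel:
        # One worker per cascade, up to max_workers; results keep cascade order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            per_cascade = list(pool.map(lambda cascade: cascade(image), cascades))
    else:
        # A single main thread runs the cascades one after another.
        per_cascade = [cascade(image) for cascade in cascades]
    combined = []
    for detections in per_cascade:
        combined.extend(detections)
    return combined


# Hypothetical stand-in cascades returning (x, y, radius) detections.
frontal = lambda image: [(40, 50, 20)]
profile = lambda image: [(120, 60, 18), (42, 51, 19)]
print(run_cascades("image placeholder", [frontal, profile]))
```

Since the threads only share read access to the image and interact when the results are merged, both modes produce the same combined list; only the elapsed time differs.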

4.1.1 Detection conflict

To detect overlapping detections the system uses geometric areas. A detection is marked by X,Y coordinates in the image with a radius corresponding to the size of the detection area (face), resulting in a circle mark for each detection. If two detection circles intersect, there may be a conflict. Not all intersections count as conflicts, which means that two detections may overlap by a percentage of their radius. An example is two detections close together in an image containing two faces cheek to cheek.

Equation (A-4.1) shows potential overlaps, but it is limited to detecting overlapping circles where neither circle lies entirely inside the other. Figure 4.1 shows a scenario in which the conflict would be detected and handled. A circle lying entirely inside another is an unusual scenario and may be counted as a limitation of the system; Figure 4.2 shows an example of such a scenario.

Figure 4.1: Overlapping detection areas, causing a conflict.

Figure 4.2: Overlapping detection areas, which would not raise an exception.

If Equation (A-4.1) indicates an overlap/intersection of two detections, then Equation (A-4.2) is used to confirm whether the overlap exceeds the limit.

$$(x_1 - x_2)^2 + (y_1 - y_2)^2 \le (r_1 + r_2)^2$$

Equation A-4.1: Circle intersection detection.

$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \le \frac{r_{1/2}}{2}$$

Equation A-4.2: Circle overlapping detection confirmation.
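The two-step conflict check can be sketched as below. The intersection test follows Equation (A-4.1) directly. The confirmation step is only an approximation of Equation (A-4.2): the sketch assumes that the overlap limit is a fraction (default one half) of the smaller of the two radii, and the names `circles_intersect`, `is_conflict` and `fraction` are hypothetical.

```python
import math


def circles_intersect(c1, c2):
    """Equation (A-4.1): squared centre distance against squared radius sum."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (r1 + r2) ** 2


def is_conflict(c1, c2, fraction=0.5):
    """Confirm a conflict when the centres are closer than the overlap limit.

    Using fraction * min(r1, r2) as the limit is an assumption of this sketch.
    """
    if not circles_intersect(c1, c2):
        return False
    distance = math.hypot(c1[0] - c2[0], c1[1] - c2[1])
    return distance <= fraction * min(c1[2], c2[2])


face = (0, 0, 10)
print(is_conflict(face, (15, 0, 10)))  # cheek to cheek: overlap is tolerated
print(is_conflict(face, (3, 0, 10)))   # nearly the same spot: conflict
```

The cheap squared-distance test rejects most pairs without a square root, so the more expensive confirmation only runs for circles that actually touch.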

4.1.2 Loading nested cascades

Depending on the gathered scale factor, a nested cascade may be loaded (see Section 3.2.4, subsection Nested Cascade). The decision to use or disregard the nested cascade follows the work flow pictured in Figure 4.4, with the scale factor from Section 6.2.1 as the basis for the decision.


Figure 4.3: Example system flow.


Figure 4.4: Nested cascade load decision.

Chapter 5

Experiments

Our augmentations increased the detection rate of the system by 10% on low-sized images (see Table 6-8), while the detection time increased by an average of 218 microseconds. Using nested detection greatly decreased false-positives at the expense of an increase in false-negatives.

High-sized images (of size ≥ 2848 × 2848, see Table 6-5) were subjected to automatic scaling, for which an average scale factor of 1:5 gave the best combination of false-negatives, false-positives and low detection time. The scaling factor was collected using Equation (A-6.1), with xs = 700 and ys = 800. This decreased the detection time (compared to the original and the augmented implementation of the FDS) to about 10% of the original detection time, while raising positive detection to 65% and greatly lowering false-positives; see Table 6-6. The results indicated a decrease in average detection time from ≈15 seconds (old implementation) to ≈2 seconds, and a decrease in false-positives from 5.8 to 0.2 per average image used in the test.

In low-sized images (of size ≤ 300 × 397, see Table 6-7) the results showed an increase in detected faces by 10%, while the number of false-positives also increased by 0.725 per average image.

The detection speed was not increased in the relative application. However, when using the tools supplied by the empirical analysis (see Section 6.2), the speed of detection improved without adding any false-negative detection misses; rather, it decreased them.

5.1 Experiment setup

Our test set consisted of 33 high-sized images depicting various poses and corner cases (see Figure 6.2), on which an automatic scaling analysis was executed.

All images were run through both detectors (the original implementation and the implementation of this thesis), the results were noted, and the tests were repeated at different scales.


The low-sized images were tested using a similar methodology and comparison of results. A database of 161 images was used in this test. In our tests, we evaluated the usefulness of nested detection and compared nested and un-nested confirmation. We also attempted to establish a threshold in image size below which nested confirmation is no longer useful. Intuitively, the threshold tells us how small the images can be before nested confirmation leads to a surge of false-negatives.

Chapter 6

Results

6.1 Unmodified Images

Using high resolution (abbreviated HD) images helps mitigate false-positives (see Section 6.1.1), but unfortunately increases the computation time needed for detection and analysis. Using images of too “high” a resolution increases the detection of false-positives (see Section 6.2) and does not necessarily increase correct detections.

6.1.1 High resolution images

The HD images were more prone to generate false-positive detections when used unmodified; see Attempt #1 in Section 6.2.2 (Table 6-1) and Attempt #1 in Section 6.2.3 (Table 6-3).

6.1.2 Low resolution images

The HD images were subject to the automatic scaling procedure during pre-processing, as demonstrated in Section 6.2. Low resolution (abbreviated LD) images were, for obvious reasons, not suitable for scaling.

Instead, the LD images already give a low false-positive rate and also a relatively low false-negative rate, an average of 0.8 detections per image. When nested detection confirmation was added, the number of false-positive detections was reduced close to zero, an average of 0.015 per image; see Table 6-8.


6.2 Empirical analysis

A study of the impact of automatic scaling was done to increase performance without reducing the detection rate or increasing false-positives or false-negatives. Using a set of 33 HD images, an optimal scaling factor was produced, for which an increase in correct detections was noted as well as a much faster run time.

Two of the HD case studies used for scale value collection are shown in detail below (see Section 6.2.2 and 6.2.3).

6.2.1 Scaling

In Equation (A-6.1), x is the number of image columns, y is the number of image rows, xs is the column scale factor and ys is the row scale factor. The equation generates the complete image scale factor s() for image φ. This scale factor is then applied to the image, resulting in a size-reduced virtual copy of the image with the aspect ratio intact. Equation (A-6.2a) collects the floating-point scaling factor for each axis of the two-dimensional image, and the final s() value is created by producing a mean value of the results of Equation (A-6.2a) for x and y using Equation (A-6.2b).

$$s(\varphi) = \frac{x}{x_s} + \frac{y}{y_s}$$

Equation A-6.1: Scaling equation.

$$l_x = \begin{cases} 1^{-1} & x < 450 \\ \dfrac{x}{x_s} & \text{otherwise} \end{cases}$$

Equation A-6.2a: Collecting column scale factor (lx), same for rows scale factor.

$$s(\varphi) = \begin{cases} 1 & l_x + l_y \le 1 \\ l_x + l_y & \text{otherwise} \end{cases}$$

Equation A-6.2b: Collecting image scale factor.

6.2.2 Example study number 1

After each Attempt the results of the execution were collected by hand and noted. This example study shows a combination of a profile face and a frontal face, thus being a good representation of Goals 1 and 2 (see Section 2.2).

In Table 6-1, Attempt #2 uses xs = 1000 and ys = 1000, giving a scaling factor of 3.568, the mean of the two terms: (4288/1000 + 2848/1000)/2 = 3.568.
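The scale factors listed in Tables 6-1 and 6-3 can be reproduced with a short sketch. It takes the mean of the column and row ratios, as described in Section 6.2.1, and never scales an image up. The special handling of small dimensions in Equation (A-6.2a) is left out, and the function names `scale_factor` and `scaled_size` as well as the rounding to whole pixels are assumptions of this sketch.

```python
def scale_factor(width, height, xs, ys):
    """Mean of the column ratio and the row ratio, clamped to at least 1."""
    factor = (width / xs + height / ys) / 2
    return max(1.0, factor)


def scaled_size(width, height, factor):
    """Size of the reduced virtual copy, keeping the aspect ratio."""
    return round(width / factor), round(height / factor)


# Attempt #2 in Table 6-1: xs = ys = 1000 on a 4288 x 2848 image.
s = scale_factor(4288, 2848, 1000, 1000)
print(round(s, 3), scaled_size(4288, 2848, s))

# Attempt #3 in Table 6-3: xs = 700, ys = 800 on a 2848 x 4288 image.
s = scale_factor(2848, 4288, 700, 800)
print(round(s, 3), scaled_size(2848, 4288, s))
```

Because both dimensions are divided by the same factor, the reduced copy keeps the aspect ratio of the original up to rounding.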


Attempt #   Size          Scaled size   Scale factor
1           4288 × 2848   4288 × 2848   1:1
2           4288 × 2848   1202 × 798    1:3.568
3           4288 × 2848   725 × 482     1:5.909
4           4288 × 2848   644 × 428     1:6.654

Table 6-1: Scaling Table for image Figure 6.1.

Attempt #   Time (New¹)   Detection †   # False-positive †
1           32632.8 ms    100%          2
2           2313.22 ms    100%          0
3           885.427 ms    100%          0
4           831.908 ms    50%           0

Table 6-2: Times and false-positive for Table 6-1.

The detection percentage for the old implementation (note ‡) was 100%, with 5 false-positive detections (see Figure 6.2). The detection time (at scale 1:1) for ‡ was 13.5 seconds. (‡ does not scale.)

6.2.3 Example study number 2

Second detailed study, showing an angled face.

Attempt #   Size          Scaled size   Scale factor
1           2848 × 4288   2848 × 4288   1:1
2           2848 × 4288   798 × 1202    1:3.568
3           2848 × 4288   604 × 910     1:4.714

Table 6-3: Scaling Table for image Figure 6.3.

The detection rate for ‡ was 0%, with 12 false-positive detections (see Figure 6.4). The detection rate was counted manually and accounts for false-negatives, which would lower the detection rate if any occurred. The detection time (at scale 1:1) for ‡ was 12.4 seconds.

6.2.4 Summary of study

Summary of all studies made for scaling, and of the throughput for LD images as well.

¹ The new implementation (note †), used in this thesis.


(a) Scaled

(b) Non-scaled

Figure 6.1: Detection using auto-scaled and non-scaled †.

Figure 6.2: Non-scaled ‡.


(a) Scaled (using setup Attempt #3)

(b) Non-scaled

Figure 6.3: Detection using auto-scaled and non-scaled †.

Figure 6.4: Non-scaled ‡.


Attempt #   Time †       # False-positive †   Detection †
1           34698.1 ms   2                    100%
2           2313.22 ms   0                    100%
3           1632.27 ms   0                    100%

Table 6-4: Times and false-positive for Table 6-3.

HD images

By using images of varied size (see Table 6-5) and measuring the results for each scaling factor, an optimal value was collected, as presented in Chapter 5. The automatic scaling was only applied to the implementation used for this thesis, not to the original system.

Width   Height
2848    4288
4288    2848
4928    3264
3264    4928

Table 6-5: Variation in size of the 33 HD images.

Scale factor   Detection time   False-positive(s)   Detection
1:1.0          66681.5 ms       3.1                 76.4%
1:1.784        18040.9 ms       0.8                 83.8%
1:2.378        9654.1 ms        0.5                 77.9%
1:4.042        2456.8 ms        0.3                 55.0%
1:4.885        1680.0 ms        0.2                 64.7%
1:6.747        866.9 ms         0.0                 50.0%
1:1.0 ‡        15848.9 ms       5.8                 42.1%

Table 6-6: Summary of all studied HD test cases. All numbers are averages computed over the different categories.

As shown in Table 6-6, the peak of detection performance combined with low false-positives is located around a scaling factor of 4.885. At the scaling factor of 6.747 no false-positives were detected, but the decrease in positive detections was substantial.

LD images

For the LD test, 161 images were used without any scaling. If the size of an image was less than 400 × 400, no nested cascade was used to confirm face detection (see Section 3.2.4, Nested Cascade).

The sizes of the set of images varied, see Table 6-7.


Set   Width   Height
i     200     139
i     200     145
i     200     153
i     200     161
i     200     201
ii    300     188
ii    300     396
ii    300     397

Table 6-7: Sizes of images used in LD test.

Table 6-7 contains two different sets of images (i and ii). Set i contains 61 images such as the ones shown in Figure 6.5, and Set ii, shown in Figure 6.6, consists of 100 images.

Nested   Set        Detection time   # False-positive   Detection %   # Missed face
No †     i          130.424 ms       0.3                83.7%         0.57
No †     ii         410.720 ms       1.07               91.7%         0.11
No †     i and ii   304.521 ms       0.8                83.8%         0.28
Yes †    i          132.239 ms       0.0                30.6%         2.40
Yes †    ii         429.309 ms       0.02               87.3%         0.15
Yes †    i and ii   316.884 ms       0.015              60.0%         1.01
No ‡     i          53.756 ms        0.03               64.9%         0.18
No ‡     ii         125.244 ms       0.1                85.4%         0.35
No ‡     i and ii   98.135 ms        0.075              77.6%         1.17

Table 6-8: Summary of detection for all studied LD test cases. The Set column refers to the image sets in Table 6-7.


(a) Image from Set i (b) Image from Set i (c) Image from Set i

(d) Complete detection (e) Three missed detections (f) Complete detection

Figure 6.5: Images from Set i used in Table 6-7 and their detection result (not using nested).


(a) Image from Set ii (b) Image from Set ii (c) Image from Set ii

(d) Good detection (e) OK detection (f) Showing one false-positive

Figure 6.6: Images from Set ii used in Table 6-7 and their detection result (not using nested).

Chapter 7

Conclusions

The goal of implementing a modern and robust FDS to be integrated into an already existing system was met, and the implementation also improved the FDS detection rate. This was set in Goal 1 (see Section 2.2).

Detection of faces in profile or seen from various angles was also improved, although this was a lesser success than the overall improvement. Few classifiers exist to support profile detection. However, an improvement was measurable, as shown in Section 6.2.2 and 6.2.3. This means that the requirements associated with Goal 2 were also met.

Improvement of the overall system will aid the expansion of the usability and reliability for both CodeMill AB and its customers.

7.1 Limitations

The implementation done for this thesis uses parts of a previously implemented library of cascades, taken from the OpenCV 2.1+ framework, and others gathered from conference articles, such as Rainer Lienhart and Jochen Maydt's article on Haar-like features [8].

Corner cases, such as faces tilted at a large angle, may still reduce the detection rate. Faces that are largely covered will to a high degree not be detected, owing to how the cascade classifiers were trained; stable and correct detection still requires a face showing the major features (eye(s)/nose/mouth). This is to some extent mitigated by the histogram equalization, which evenly distributes the saturation and thus eliminates some contrast shadows and white-out.

The result of the HD scaling study may be affected by the low number of images used. It is possible that a larger set of images would yield a different result. However, the study aided the understanding of why high-sized images may make the detection more prone to generating false-positives.


7.2 Future work

Let us conclude with a few pointers for future work.

– Integrating the application into the system and improving the overall usability for future applications.

– Implementing changes in the host server to improve the quality of the images being sent to the FDS, thus improving the detection rate.

– Improving threading in the system by using boost threads instead of pthreads. Expanding the user-defined parameters for limiting the number of possible detections. Also improving the tilted detection functionality and its performance.

– Expanding the compilation of detection results to implement a “fair” voting for each detection, where the detection with the largest detection area and the largest number of nested confirmations is prioritized.

Chapter 8

Acknowledgments

I would like to express my gratitude to my internal supervisor at the Department of Computing Science, Johanna Bjorklund, for supervising this thesis, and to my external supervisor, Rickard Lonneborg, for supplying this interesting work as a thesis. I would also like to thank Jon Hollstrom for access to his image database at vimlig.se. In addition, I would like to thank all my friends and family.


References

[1] G. Bradski and A. Kaehler. Computer Vision with the OpenCV Library. O’Reilly Media, Sebastopol, first edition, 2008.

[2] C. Huang, H. Ai, B. Wu and S. Lao. Boosting nested cascade detector for multi-view face detection. volume 2, pages 415–418. IEEE Conference on Computer Vision and Pattern Recognition, 2004.

[3] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8:679–714, 1986.

[4] F. Crow. Summed-area tables for texture mapping. volume 18(3), pages 207–212. SIGGRAPH, 1984.

[5] Y. Freund and R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting, 1995.

[6] H.A. Rowley, S. Baluja and T. Kanade. Neural network-based face detection. pages 203–208. IEEE Conference on Computer Vision and Pattern Recognition, 1998.

[7] A. Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69:331–371, 1910.

[8] R. Lienhart and J. Maydt. An extended set of Haar-like features for rapid object detection. volume 1, pages 900–903. IEEE Conference on Computer Vision and Pattern Recognition, 2002.

[9] M. Oren, C. Papageorgiou, P. Sinha, E. Osuna and T. Poggio. Pedestrian detection using wavelet templates. pages 193–199. IEEE Conference on Computer Vision and Pattern Recognition, 1997.

[10] C. Messom and A. Barczak. Fast and efficient rotated Haar-like features using rotated integral images. pages 1–6. Australian Conference on Robotics and Automation, 2006.

[11] M.H. Yang, D. Kriegman and N. Ahuja. Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 34–58, 2002.

[12] R. Lienhart, A. Kuranov and V. Pisarevsky. Empirical analysis of detection cascades of boosted classifiers for rapid object detection. pages 297–304. DAGM Pattern Recognition Symposium, 2003.


[13] T. Mita, T. Kaneko and O. Hori. Joint Haar-like features for face detection. IEEE International Conference on Computer Vision, 2005.

[14] P. Viola and M. Jones. Rapid object detection using boosted cascade of simple features. volume 1, pages I–511 – I–518. IEEE Conference on Computer Vision and Pattern Recognition, 2001.

[15] P. Viola and M. Jones. Robust real-time object detection. In International Journal of Computer Vision, 2001.

[16] P. Viola and M. Jones. Fast multi-view face detection. IEEE Conference on Computer Vision and Pattern Recognition, 2003.

[17] P. Wang and Q. Ji. Learning discriminant features for multi-view face and eye detection. volume 1, pages 373–379. IEEE Conference on Computer Vision and Pattern Recognition, 2005.

[18] G. Yang and T.S. Huang. Human face detection in a complex background. Pattern Recognition, 27:53–63, 1994.
