35 cv facedetection 3106.ppt

Face Detection

Lecture #35Monday, 31/05/2021

Antonis Argyrose-mail: [email protected]

Detection theory

Assume a population of entities. Each individual

may or may not belong to a certain class.

Assume a classifier that tests whether each

individual belongs to the class or not. That is, it

provides a binary answer to the question,

“does this individual belong to the class?”

Detection theory

Therefore, the situation is as follows:

What happens in reality

(actual class)

Belongs Does not

belong

What the method

estimates

(predicted class)

Belongs TP FP

Does not

belong

FN TN

• 1st letter: Whether estimation is right

• 2nd letter: What the estimation really is (yes/no)

Detection theory

Thus:

TP: The method correctly (T) decides that the

individual belongs to the class (P).

TN: The method correctly (T) decides that the

individual does not belong to the class (N).

FP: The method incorrectly (F) decides that the

individual belongs to the class (P).

FN: The method incorrectly (F) decides that the

individual does not belong to the class (N).

Detection theory

TP

FP

TN

FN

TRUTH

CLASSIFIER

THE POPULATION

Detection theory

Precision = TP / (TP + FP)

Recall = Sensitivity = True

Positive Rate (TPR)

= TP / (TP + FN)

False Positive Rate (FPR) =

Fall-out = FP/(FP+TN)

= 1 - Specificity

Specificity = True Negative

Rate (TNR) = TN / (FP+TN)

= 1 – FPR

Accuracy =

= (TP+TN)/(TP+TN+FP+FN)

F-measure =

= 2*TP/(2*TP+FP+FN)

TPFP

TN

FN

TRUTH

CLASSIFIER

Discriminative approach: build a classifier that decides whether a rectangular area (window) contains a face or not

Slide a window across image and evaluate a face model at every location.

Basic Idea

Detection

Challenges Slide a window across image and evaluate a face model at

every location.

Illumination/lighting conditions, pose, scale, …

Sliding window detector must evaluate tens of thousands of

location/scale combinations.

Faces are rare: 0–10 per image

– For computational efficiency, we should try to spend as little

time as possible on the non-face windows

– A megapixel image has ~106 pixels and a comparable number of

candidate face locations

– To avoid having a false positive in every image, our false

positive rate has to be less than 106

The Task ofFace Detection

The Viola/Jones Face Detector

A seminal approach to real-time object detection

Training is slow, but detection is very fast

Key ideas– Integral images for fast feature evaluation

– Boosting for feature selection

– Attentional cascade for fast rejection of non-face windows

P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features.

CVPR 2001.

P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.

Image Features

Rectangle filtersFeature Value (Pixel in white area)

(Pixel in black area)

Feature Value (Pixel in white area)


Image Features

Rectangle filters





Example





Source

Result

Integral images

The integral image computes a value at each pixel (x, y)that is the sum of the pixel values above and to the left of (x, y), inclusive.

(x, y)

,

( , ) ( , )x x y y

ii x y i x y

( , 1)ii x y

( 1, )s x y

Computing the Integral Image

The integral image computes a value at each pixel (x, y)that is the sum of the pixel values above and to the left of (x, y), inclusive.

This can quickly be computed in one pass through the image.

The picture can't be displayed.

(x, y)

,

( , ) ( , )x x y y

ii x y i x y

( , )i x y

( , ) ( 1, ) ( , )s x y s x y i x y

( , ) ( , 1) ( , )ii x y ii x y s x y

Computing Sum within a Rectangle


A B C Dsum ii ii ii ii D B

C A

Scaling

Integral image enables us to evaluate allrectangle sizes in constant time.

Therefore, no image scaling is necessary.

Scale the rectangular features instead!

1 2

3 4

5 6

Size of Feature Space





• How many are the possible rectangular features for a 24x24 detection region?

Rectangle filters


12 24

1 1

8 24

1 1

12 12

1 1

2 (24 2 1)(24 1)

2 (24 3 1)(24 1)

(24 2 1)(24 2 1)

w h

w h

w h

w h

w h

w h

A+B

C

D

160,000∼

Feature Selection

• How many number of possible rectangle features for a 24x24 detection region?


A+B

C

D

160,000∼

What features are good for face detection?

12 24

1 1

8 24

1 1

12 12

1 1

2 (24 2 1)(24 1)

2 (24 3 1)(24 1)

(24 2 1)(24 2 1)

w h

w h

w h

w h

w h

w h

Feature Selection

• How many number of possible rectangle features for a 24x24 detection region?


A+B

C

D

160,000∼

Can we create a good classifier

using just a small subset of all

possible features?

How to select such a subset?

12 24

1 1

8 24

1 1

12 12

1 1

2 (24 2 1)(24 1)

2 (24 3 1)(24 1)

(24 2 1)(24 2 1)

w h

w h

w h

w h

w h

w h

Boosting

Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier– A weak learner need only do better than chance

Training consists of multiple boosting rounds– During each boosting round, we select a weak learner

that does well on examples that were hard for the previous weak learners

– “Hardness” is captured by weights attached to training examples

Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese

Society for Artificial Intelligence, 14(5):771-780, September, 1999.

AdaBoost Concept

1 { 1 }) , 1(h x

...

weak classifiers

slightly better than random

1

( )( )T

t

tT th xH x sign

2 { 1 }) , 1(h x

{ 1( }) , 1Th x

strong classifier

Weaker Classifiers

1 { 1 }) , 1(h x

...

weak classifiers

slightly better than random

1

( )( )T

t

tT th xH x sign

2 { 1 }) , 1(h x

{ 1( }) , 1Th x

strong classifier

Each weak classifier learns by considering one simple feature

T most beneficial features for classification should be selected

How to– define features?

– select beneficial features?

– train weak classifiers?

– manage (weight) training samples?

– associate weight to each weak classifier?

Boosting

Training set contains face and nonface examples– Initially, with equal weight

For each round of boosting:– Evaluate each rectangle filter on each example

– Select best threshold for each filter

– Select best filter/threshold combination

– Reweight examples

Computational complexity of learning: O(MNK)

– M rounds, N examples, K features

Boosting: illustration

Weak

Classifier 1

Boosting illustration

Weights

Increased


Weak

Classifier 2


Weights

Increased


Weak

Classifier 3


Final classifier is

a combination of weak

classifiers

The AdaBoost Algorithm

Given:1 1 where ( , ), , ( , ) , { 1, 1}m m i ix y x y x X y …

Initialization: 11( ) , 1, ,

mD i i m …

For :1, ,t T …

• Find classifier which minimizes error wrt Dt ,i.e.,: { 1, 1}th X

1

where arg min ( )[ ( )]j

m

t j j t i j i

ih

h D i y h x

:probability distribution of 's at time ( )t iD i x t:probability distribution of 's at time ( )t iD i x t

• Weight classifier:11

ln2

tt

t

• Update distribution:1

( )exp[ ( )], is for normalizati( ) ont t i t i

t t

t

D i y h xD i Z

Z

minimize weighted errorminimize weighted error

Give error classified patterns more chance for learning.Give error classified patterns more chance for learning.

The AdaBoost Algorithm

Given:1 1 where ( , ), , ( , ) , { 1, 1}m m i ix y x y x X y …

Initialization: 11( ) , 1, ,

mD i i m …

For :1, ,t T …

• Find classifier which minimizes error wrt Dt ,i.e.,: { 1, 1}th X

1

where arg min ( )[ ( )]j

m

t j j t i j i

ih

h D i y h x

• Weight classifier:11

ln2

tt

t

• Update distribution:1

( )exp[ ( )], is for normalizati( ) ont t i t i

t t

t

D i y h xD i Z

Z

Output final classifier:1

( ) ( )T

t t

t

sign H x h x

Initialize classification error

Classify imagesCalculate error

A

B

C

D

E

F

.2

.2

.2

.2

.2

Adaboost Model

=

.2

.2

+ .4

Classify images

A

B

.2

.2

.2

.2

.2

Adaboost Model

.4

Calculate error

.2

.2 .4

.4

Classify images

A

B

C

.2

.2

.2

.2

.2

Adaboost Model

.4

Calculate error

.2

.2.4

.4

.4

Classify images

A

B

C

D

.2

.2

.2

.2

.2

Adaboost Model

.4

Calculate error

.2

.2.4

.4

.4

Classify images

A

B

C

D

E

.2

.2

.2

.2

.2

Adaboost Model

.4

Calculate error

.4

.2

.4

.2 .4

.4

.4

.4

.4

.4

Classify images

A

B

C

D

E

F

.2

.2

.2

.2

.2

Adaboost Model

.4

Calculate error

.2

.2

.4

.4

.4

.4

.4

Select classifier

A

B

C

D

E

F

.2

.2

.2

.2

.2

Adaboost Model

.4

Calculate Adaboost weight

.4

0.2027

.2

.2

.2

.2

.2

.2

.2

.2

.2

.2

-1

0.244

0.163

0.244

0.163

0.163

Recalculate classification error

A

B

C

D

E

F

Adaboost Model

0.2027F1

1

1

-1

Normalize weights

0.25

0.166

0.25

0.166

0.166

.4

.41

.33

.41

.41

.5

.4

.4

.4

Select classifier

A

B

C

D

E

F

Adaboost Model

.4

Calculate Adaboost weight

.4

F

0.25

0.166

0.25

0.166

0.166

0.25

0.166

0.25

0.166

0.166

Update classifiers

.33.33

0.3465

0.2027

Repeat as many times as is necessary

A

B

C

D

E

F

Adaboost Model

0.2027F0.12

0.25

0.18

0.25

0.18

0.3465

0.3942

0.3812

0.4722

0.16

0.16

0.25

0.16

0.25

0.20

0.18

0.30

0.18

0.13

0.14

0.13

0.22

0.28

0.21

0.26

0.23

0.15

0.20

0.15

D

Classification accuracy is now higher

D

E

F

0.26

0.23

0.15

0.20

0.15

A

B

C

Adaboost Model

0.2027F

0.3465

0.3942

0.3812

0.4722

D

A

B

C

Features Selected by Boosting

First two features selected by boosting:

This feature combination can yield 100% detection rate and 50% false positive rate

ROC Curve for 200-Feature Classifier

A 200-feature classifier can yield 95% detection rate and a false positive rate of 1 in 14084.

To be practical for real application, the false positive rate must be closer to 1 in 1,000,000.

Attentional Cascade

Classifier 3Classifier 2Classifier 1IMAGE

SUB-WINDOWT T FACET

NON-FACE

F F

NON-FACE

F

NON-FACE

We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows

Positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on

A negative outcome at any point leads to the immediate rejection of the sub-window

Attentional Cascade


SUB-WINDOWT T FACET

NON-FACE

F F

NON-FACE

F

NON-FACE

Chain classifiers that are progressively more complex and have lower false positive rates

0

1000 50

% False Pos

% D

etec

tion

ROC Curve

Detection Rate and False Positive Rate for Chained Classifiers


SUB-WINDOWT T FACET

NON-FACE

F F

NON-FACE

F

NON-FACE

The detection rate and the false positive rate of the cascade are found by multiplying the respective rates of the individual stages

A detection rate of 0.9 and a false positive rate on the order of 106 can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 (0.9910 ≈ 0.9) and a false positive rate of about 0.30 (0.310 ≈ 6106 )

11,f d 22 ,f d 33 ,f d

11

, K

i

K

i

i i

F f D d

11

, K

i

K

i

i i

F f D d

,F D,F D

Training the Cascade

Set target detection and false positive rates for each stage

Keep adding features to the current stage until its target rates have been met – Test on a validation set

If the overall false positive rate is not low enough, then add another stage

Use false positives from current stage as the negative training examples for the next stage

Training the Cascade

ROC Curves Cascaded Classifier to Monlithic Classifier

ROC Curves Cascaded Classifier to Monlithic Classifier

There is little difference between There is little difference between the two in terms of accuracy.

There is a big difference in terms of speed.

– The cascaded classifier is nearly 10 times faster since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.

The Implemented System

Training Data– 5000 faces

All frontal, rescaled to 24x24 pixels

– 300 million non-faces 9500 non-face images

– Faces are normalized Scale, translation

Many variations– Across individuals– Illumination– Pose

Structure of the Detector Cascade

Combining successively more complex classifiers in cascade– 38 stages

– included a total of 6060 features

Reject Sub-Window

1 2 3 4 5 6 7 8 3838 Face

F F F F F F F F F

T T T T T T T T T

All Sub-Windows

Structure of the Detector Cascade

Reject Sub-Window

1 2 3 4 5 6 7 8 3838 Face

F F F F F F F F F

T T T T T T T T T

All Sub-Windows

2 features, reject 50% non-faces, detect 100% faces

10 features, reject 80% non-faces, detect 100% faces

25 features

50 features

………

Speed of the Final Detector

On a 700 Mhz Pentium III processor, the face detector can process a 384 288 pixel image in about .067 seconds– 15 Hz

– 15 times faster than previous detector of comparable accuracy (Rowley et al., 1998)

Average of 8 features evaluated per window on test set

Image Processing

Training all example sub-windows were variance normalized to minimize the effect of different lighting conditions

Detection variance normalized as well

2 2 21N

m x

Can be computed using integral image

Can be computed using integral image of

squared image

Scaling the Detector

Scaling is achieved by scaling the detector itself, rather than scaling the image

Good detection results for scaling factor of 1.25

The detector is scanned across location– Subsequent locations are obtained by shifting the

window [s] pixels, where s is the current scale

– The result for = 1.0 and = 1.5 were reported

Merging Multiple Detections

Output of Face Detector on Test Images

Other Detection Tasks

Facial Feature Localization

Male vs.

female

Profile Detection

CS472 – the end

Thanks for your interest and presence in the course!

Good luck with your exams!

Have a nice summer!

Best wishes for whatever you do next!…– If further interested in the topic of

Computer Vision, let me know!

35 cv facedetection 3106.ppt

Documents