35 cv facedetection 3106.ppt
TRANSCRIPT
Detection theory
Assume a population of entities. Each individual
may or may not belong to a certain class.
Assume a classifier that tests whether each
individual belongs to the class or not. That is, it
provides a binary answer to the question,
“does this individual belong to the class?”
Detection theory
Therefore, the situation is as follows:
What happens in reality
(actual class)
Belongs Does not
belong
What the method
estimates
(predicted class)
Belongs TP FP
Does not
belong
FN TN
• 1st letter: Whether estimation is right
• 2nd letter: What the estimation really is (yes/no)
Detection theory
Thus:
TP: The method correctly (T) decides that the
individual belongs to the class (P).
TN: The method correctly (T) decides that the
individual does not belong to the class (N).
FP: The method incorrectly (F) decides that the
individual belongs to the class (P).
FN: The method incorrectly (F) decides that the
individual does not belong to the class (N).
Detection theory
TP
FP
TN
FN
TRUTH
CLASSIFIER
THE POPULATION
Detection theory
Precision = TP / (TP + FP)
Recall = Sensitivity = True
Positive Rate (TPR)
= TP / (TP + FN)
False Positive Rate (FPR) =
Fall-out = FP/(FP+TN)
= 1 - Specificity
Specificity = True Negative
Rate (TNR) = TN / (FP+TN)
= 1 – FPR
Accuracy =
= (TP+TN)/(TP+TN+FP+FN)
F-measure =
= 2*TP/(2*TP+FP+FN)
TPFP
TN
FN
TRUTH
CLASSIFIER
Discriminative approach: build a classifier that decides whether a rectangular area (window) contains a face or not
Slide a window across image and evaluate a face model at every location.
Basic Idea
Detection
Challenges Slide a window across image and evaluate a face model at
every location.
Illumination/lighting conditions, pose, scale, …
Sliding window detector must evaluate tens of thousands of
location/scale combinations.
Faces are rare: 0–10 per image
– For computational efficiency, we should try to spend as little
time as possible on the non-face windows
– A megapixel image has ~106 pixels and a comparable number of
candidate face locations
– To avoid having a false positive in every image, our false
positive rate has to be less than 106
The Task ofFace Detection
The Viola/Jones Face Detector
A seminal approach to real-time object detection
Training is slow, but detection is very fast
Key ideas– Integral images for fast feature evaluation
– Boosting for feature selection
– Attentional cascade for fast rejection of non-face windows
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features.
CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
Image Features
Rectangle filtersFeature Value (Pixel in white area)
(Pixel in black area)
Feature Value (Pixel in white area)
(Pixel in black area)
Image Features
Rectangle filters
Feature Value (Pixel in white area)
(Pixel in black area)
Feature Value (Pixel in white area)
(Pixel in black area)
Example
Feature Value (Pixel in white area)
(Pixel in black area)
Feature Value (Pixel in white area)
(Pixel in black area)
Source
Result
Integral images
The integral image computes a value at each pixel (x, y)that is the sum of the pixel values above and to the left of (x, y), inclusive.
(x, y)
,
( , ) ( , )x x y y
ii x y i x y
( , 1)ii x y
( 1, )s x y
Computing the Integral Image
The integral image computes a value at each pixel (x, y)that is the sum of the pixel values above and to the left of (x, y), inclusive.
This can quickly be computed in one pass through the image.
The picture can't be displayed.
(x, y)
,
( , ) ( , )x x y y
ii x y i x y
( , )i x y
( , ) ( 1, ) ( , )s x y s x y i x y
( , ) ( , 1) ( , )ii x y ii x y s x y
Computing Sum within a Rectangle
The picture can't be displayed.
A B C Dsum ii ii ii ii D B
C A
Scaling
Integral image enables us to evaluate allrectangle sizes in constant time.
Therefore, no image scaling is necessary.
Scale the rectangular features instead!
1 2
3 4
5 6
Size of Feature Space
Feature Value (Pixel in white area)
(Pixel in black area)
Feature Value (Pixel in white area)
(Pixel in black area)
• How many are the possible rectangular features for a 24x24 detection region?
Rectangle filters
The picture can't be displayed.
12 24
1 1
8 24
1 1
12 12
1 1
2 (24 2 1)(24 1)
2 (24 3 1)(24 1)
(24 2 1)(24 2 1)
w h
w h
w h
w h
w h
w h
A+B
C
D
160,000∼
Feature Selection
• How many number of possible rectangle features for a 24x24 detection region?
The picture can't be displayed.
A+B
C
D
160,000∼
What features are good for face detection?
12 24
1 1
8 24
1 1
12 12
1 1
2 (24 2 1)(24 1)
2 (24 3 1)(24 1)
(24 2 1)(24 2 1)
w h
w h
w h
w h
w h
w h
Feature Selection
• How many number of possible rectangle features for a 24x24 detection region?
The picture can't be displayed.
A+B
C
D
160,000∼
Can we create a good classifier
using just a small subset of all
possible features?
How to select such a subset?
12 24
1 1
8 24
1 1
12 12
1 1
2 (24 2 1)(24 1)
2 (24 3 1)(24 1)
(24 2 1)(24 2 1)
w h
w h
w h
w h
w h
w h
Boosting
Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier– A weak learner need only do better than chance
Training consists of multiple boosting rounds– During each boosting round, we select a weak learner
that does well on examples that were hard for the previous weak learners
– “Hardness” is captured by weights attached to training examples
Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese
Society for Artificial Intelligence, 14(5):771-780, September, 1999.
AdaBoost Concept
1 { 1 }) , 1(h x
...
weak classifiers
slightly better than random
1
( )( )T
t
tT th xH x sign
2 { 1 }) , 1(h x
{ 1( }) , 1Th x
strong classifier
Weaker Classifiers
1 { 1 }) , 1(h x
...
weak classifiers
slightly better than random
1
( )( )T
t
tT th xH x sign
2 { 1 }) , 1(h x
{ 1( }) , 1Th x
strong classifier
Each weak classifier learns by considering one simple feature
T most beneficial features for classification should be selected
How to– define features?
– select beneficial features?
– train weak classifiers?
– manage (weight) training samples?
– associate weight to each weak classifier?
Boosting
Training set contains face and nonface examples– Initially, with equal weight
For each round of boosting:– Evaluate each rectangle filter on each example
– Select best threshold for each filter
– Select best filter/threshold combination
– Reweight examples
Computational complexity of learning: O(MNK)
– M rounds, N examples, K features
Boosting: illustration
Weak
Classifier 1
Boosting illustration
Weights
Increased
Boosting illustration
Weak
Classifier 2
Boosting illustration
Weights
Increased
Boosting illustration
Weak
Classifier 3
Boosting illustration
Final classifier is
a combination of weak
classifiers
The AdaBoost Algorithm
Given:1 1 where ( , ), , ( , ) , { 1, 1}m m i ix y x y x X y …
Initialization: 11( ) , 1, ,
mD i i m …
For :1, ,t T …
• Find classifier which minimizes error wrt Dt ,i.e.,: { 1, 1}th X
1
where arg min ( )[ ( )]j
m
t j j t i j i
ih
h D i y h x
:probability distribution of 's at time ( )t iD i x t:probability distribution of 's at time ( )t iD i x t
• Weight classifier:11
ln2
tt
t
• Update distribution:1
( )exp[ ( )], is for normalizati( ) ont t i t i
t t
t
D i y h xD i Z
Z
minimize weighted errorminimize weighted error
Give error classified patterns more chance for learning.Give error classified patterns more chance for learning.
The AdaBoost Algorithm
Given:1 1 where ( , ), , ( , ) , { 1, 1}m m i ix y x y x X y …
Initialization: 11( ) , 1, ,
mD i i m …
For :1, ,t T …
• Find classifier which minimizes error wrt Dt ,i.e.,: { 1, 1}th X
1
where arg min ( )[ ( )]j
m
t j j t i j i
ih
h D i y h x
• Weight classifier:11
ln2
tt
t
• Update distribution:1
( )exp[ ( )], is for normalizati( ) ont t i t i
t t
t
D i y h xD i Z
Z
Output final classifier:1
( ) ( )T
t t
t
sign H x h x
Initialize classification error
Classify imagesCalculate error
A
B
C
D
E
F
.2
.2
.2
.2
.2
Adaboost Model
=
.2
.2
+ .4
Classify images
A
B
.2
.2
.2
.2
.2
Adaboost Model
.4
Calculate error
.2
.2 .4
.4
Classify images
A
B
C
.2
.2
.2
.2
.2
Adaboost Model
.4
Calculate error
.2
.2.4
.4
.4
Classify images
A
B
C
D
.2
.2
.2
.2
.2
Adaboost Model
.4
Calculate error
.2
.2.4
.4
.4
Classify images
A
B
C
D
E
.2
.2
.2
.2
.2
Adaboost Model
.4
Calculate error
.4
.2
.4
.2 .4
.4
.4
.4
.4
.4
Classify images
A
B
C
D
E
F
.2
.2
.2
.2
.2
Adaboost Model
.4
Calculate error
.2
.2
.4
.4
.4
.4
.4
Select classifier
A
B
C
D
E
F
.2
.2
.2
.2
.2
Adaboost Model
.4
Calculate Adaboost weight
.4
0.2027
.2
.2
.2
.2
.2
.2
.2
.2
.2
.2
-1
0.244
0.163
0.244
0.163
0.163
Recalculate classification error
A
B
C
D
E
F
Adaboost Model
0.2027F1
1
1
-1
Normalize weights
0.25
0.166
0.25
0.166
0.166
.4
.41
.33
.41
.41
.5
.4
.4
.4
Select classifier
A
B
C
D
E
F
Adaboost Model
.4
Calculate Adaboost weight
.4
F
0.25
0.166
0.25
0.166
0.166
0.25
0.166
0.25
0.166
0.166
Update classifiers
.33.33
0.3465
0.2027
Repeat as many times as is necessary
A
B
C
D
E
F
Adaboost Model
0.2027F0.12
0.25
0.18
0.25
0.18
0.3465
0.3942
0.3812
0.4722
0.16
0.16
0.25
0.16
0.25
0.20
0.18
0.30
0.18
0.13
0.14
0.13
0.22
0.28
0.21
0.26
0.23
0.15
0.20
0.15
D
Classification accuracy is now higher
D
E
F
0.26
0.23
0.15
0.20
0.15
A
B
C
Adaboost Model
0.2027F
0.3465
0.3942
0.3812
0.4722
D
A
B
C
Features Selected by Boosting
First two features selected by boosting:
This feature combination can yield 100% detection rate and 50% false positive rate
ROC Curve for 200-Feature Classifier
A 200-feature classifier can yield 95% detection rate and a false positive rate of 1 in 14084.
To be practical for real application, the false positive rate must be closer to 1 in 1,000,000.
Attentional Cascade
Classifier 3Classifier 2Classifier 1IMAGE
SUB-WINDOWT T FACET
NON-FACE
F F
NON-FACE
F
NON-FACE
We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows
Positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on
A negative outcome at any point leads to the immediate rejection of the sub-window
Attentional Cascade
Classifier 3Classifier 2Classifier 1IMAGE
SUB-WINDOWT T FACET
NON-FACE
F F
NON-FACE
F
NON-FACE
Chain classifiers that are progressively more complex and have lower false positive rates
0
1000 50
% False Pos
% D
etec
tion
ROC Curve
Detection Rate and False Positive Rate for Chained Classifiers
Classifier 3Classifier 2Classifier 1IMAGE
SUB-WINDOWT T FACET
NON-FACE
F F
NON-FACE
F
NON-FACE
The detection rate and the false positive rate of the cascade are found by multiplying the respective rates of the individual stages
A detection rate of 0.9 and a false positive rate on the order of 106 can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 (0.9910 ≈ 0.9) and a false positive rate of about 0.30 (0.310 ≈ 6106 )
11,f d 22 ,f d 33 ,f d
11
, K
i
K
i
i i
F f D d
11
, K
i
K
i
i i
F f D d
,F D,F D
Training the Cascade
Set target detection and false positive rates for each stage
Keep adding features to the current stage until its target rates have been met – Test on a validation set
If the overall false positive rate is not low enough, then add another stage
Use false positives from current stage as the negative training examples for the next stage
Training the Cascade
ROC Curves Cascaded Classifier to Monlithic Classifier
ROC Curves Cascaded Classifier to Monlithic Classifier
There is little difference between There is little difference between the two in terms of accuracy.
There is a big difference in terms of speed.
– The cascaded classifier is nearly 10 times faster since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.
The Implemented System
Training Data– 5000 faces
All frontal, rescaled to 24x24 pixels
– 300 million non-faces 9500 non-face images
– Faces are normalized Scale, translation
Many variations– Across individuals– Illumination– Pose
Structure of the Detector Cascade
Combining successively more complex classifiers in cascade– 38 stages
– included a total of 6060 features
Reject Sub-Window
1 2 3 4 5 6 7 8 3838 Face
F F F F F F F F F
T T T T T T T T T
All Sub-Windows
Structure of the Detector Cascade
Reject Sub-Window
1 2 3 4 5 6 7 8 3838 Face
F F F F F F F F F
T T T T T T T T T
All Sub-Windows
2 features, reject 50% non-faces, detect 100% faces
10 features, reject 80% non-faces, detect 100% faces
25 features
50 features
………
Speed of the Final Detector
On a 700 Mhz Pentium III processor, the face detector can process a 384 288 pixel image in about .067 seconds– 15 Hz
– 15 times faster than previous detector of comparable accuracy (Rowley et al., 1998)
Average of 8 features evaluated per window on test set
Image Processing
Training all example sub-windows were variance normalized to minimize the effect of different lighting conditions
Detection variance normalized as well
2 2 21N
m x
Can be computed using integral image
Can be computed using integral image of
squared image
Scaling the Detector
Scaling is achieved by scaling the detector itself, rather than scaling the image
Good detection results for scaling factor of 1.25
The detector is scanned across location– Subsequent locations are obtained by shifting the
window [s] pixels, where s is the current scale
– The result for = 1.0 and = 1.5 were reported
Merging Multiple Detections
Output of Face Detector on Test Images
Other Detection Tasks
Facial Feature Localization
Male vs.
female
Profile Detection
CS472 – the end
Thanks for your interest and presence in the course!
Good luck with your exams!
Have a nice summer!
Best wishes for whatever you do next!…– If further interested in the topic of
Computer Vision, let me know!