A Robust Real Time Face Detection
Outline
AdaBoost – Learning Algorithm
Face Detection in real life
Using AdaBoost for Face Detection
Improvements
Demonstration
AdaBoost
A Short Introduction to Boosting (Freund & Schapire, 1999)
Logistic Regression, AdaBoost and Bregman Distances (Collins, Schapire, Singer, 2002)
Boosting
The Horse-Racing Gambler Problem
– Rules of thumb for a set of races
– How should we choose the set of races in order to get the best rules of thumb?
– How should the rules be combined into a single highly accurate prediction rule?
Boosting!
Boosting
AdaBoost – the idea
Initialize sample weights
For each cycle:
– Find a classifier that performs well on the weighted sample
– Increase the weights of misclassified examples
Return a weighted list of classifiers

AdaBoost agglomerates many weak classifiers into one strong classifier.
[Figure: toy example data plotted on axes shoe size vs. IQ]
AdaBoost – algorithm
Given $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in X$, $y_i \in \{-1, +1\}$
Initialize the distribution: $D_1(i) = 1/m$
For $t = 1 \ldots T$:
– Select the best weak classifier using distribution $D_t$
– Get weak hypothesis $h_t : X \to \{-1, +1\}$ with error $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$
– Choose $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
– Update
$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} = \frac{D_t(i)\, \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$$
where $Z_t$ is a normalization factor
Output the final hypothesis: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
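As a concrete illustration, here is a minimal Python/NumPy sketch of the loop above. It is not from the original papers: the pool of weak classifiers is assumed to be given as plain callables, and all names are mine.

```python
import numpy as np

def adaboost(X, y, weak_learners, T):
    """Train AdaBoost for T rounds.

    X: (m, d) array of examples; y: (m,) labels in {-1, +1};
    weak_learners: candidate classifiers, each a callable mapping X to
    an (m,) array of {-1, +1} predictions.
    Returns the weights alpha_t and chosen hypotheses h_t defining H(x).
    """
    m = len(y)
    D = np.full(m, 1.0 / m)            # D_1(i) = 1/m
    alphas, hypotheses = [], []
    for t in range(T):
        # Select the weak classifier with the lowest weighted error on D_t
        errors = [np.sum(D * (h(X) != y)) for h in weak_learners]
        best = int(np.argmin(errors))
        h_t, eps_t = weak_learners[best], errors[best]
        if eps_t >= 0.5:               # no better-than-chance hypothesis left
            break
        alpha_t = 0.5 * np.log((1.0 - eps_t) / max(eps_t, 1e-12))
        # Increase the weights of misclassified examples, then normalize (Z_t)
        D *= np.exp(-alpha_t * y * h_t(X))
        D /= D.sum()
        alphas.append(alpha_t)
        hypotheses.append(h_t)
    return alphas, hypotheses

def predict(alphas, hypotheses, X):
    """Final hypothesis: H(x) = sign(sum_t alpha_t h_t(x))."""
    return np.sign(sum(a * h(X) for a, h in zip(alphas, hypotheses)))
```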
AdaBoost – training error
Freund and Schapire (1997) proved that the training error satisfies
$$\mathrm{err}'(H) \leq \exp\left(-2\sum_{t=1}^{T}\gamma_t^2\right), \quad \text{where } \gamma_t = \frac{1}{2} - \epsilon_t$$
AdaBoost adapts to the error rates of the individual weak hypotheses.
– Therefore it is called ADABoost.
AdaBoost – generalization error
Freund and Schapire (1997) showed that:
$$\mathrm{err}(H) \leq \Pr{}'[H(x) \neq y] + \tilde{O}\left(\sqrt{\frac{Td}{m}}\right)$$
where:
– $\Pr{}'$ – the empirical probability on the training sample
– $d$ – VC dimension
– $T$ – number of rounds
– $m$ – training set size
AdaBoost – generalization error
The analysis implies that boosting will overfit if the algorithm is run for too many rounds.
However, it was observed empirically that AdaBoost does not overfit, even when run for thousands of rounds.
Moreover, it was observed that the generalization error continues to decrease long after the training error has reached zero.
AdaBoost – generalization error
An alternative analysis was presented by Schapire et al. (1998) that matches the empirical findings:
$$\Pr[H(x) \neq y] \leq \Pr{}'[\mathrm{margin}(x, y) \leq \theta] + \tilde{O}\left(\sqrt{\frac{d}{m\theta^2}}\right)$$
where:
$$\mathrm{margin}(x, y) = \frac{y\sum_t \alpha_t h_t(x)}{\sum_t |\alpha_t|}$$
AdaBoost – different point of view
We try to solve the problem of approximating the $y_i$'s using a linear combination of weak hypotheses.
In other words, we are interested in the problem of finding a vector of parameters $\alpha$ such that
$$f(x_i) = \sum_{j=1}^{n} \alpha_j h_j(x_i)$$
is a 'good approximation' of $y_i$.
For classification problems we try to match the sign of $f(x_i)$ to $y_i$.
AdaBoost – different point of view
Sometimes it is advantageous to minimize some other (non-negative) loss function instead of the number of classification errors.
For AdaBoost the loss function is
$$\sum_{i=1}^{n} \exp(-y_i f(x_i))$$
This point of view was used by Collins, Schapire and Singer (2002) to demonstrate that AdaBoost converges to optimality.
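As a quick numerical sketch (NumPy; `y` and `f_values` are hypothetical arrays holding the labels and the values $f(x_i)$):

```python
import numpy as np

def exponential_loss(y, f_values):
    """AdaBoost's loss: sum_i exp(-y_i * f(x_i)).
    It upper-bounds the number of classification errors, since
    exp(-y * f) >= 1 whenever sign(f) != y."""
    return np.sum(np.exp(-y * f_values))
```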
Face Detection (not face recognition)
Face Detection in Monkeys
There are cells that 'detect faces'.
Face Detection in Humans
There are 'processes of face detection'.
Faces Are Special
We humans analyze faces in a 'different way'.
Face Recognition in Humans
We analyze faces 'in a specific location'.
Robust Real-Time Face Detection
Viola and Jones, 2003
Features
Picture analysis, Integral Image
Features
The system classifies images based on the value of simple features:
Two-rectangle
Three-rectangle
Four-rectangle
Value = ∑(pixels in white area) − ∑(pixels in black area)
Contrast Features
[Figure: a source image and the resulting feature responses]
Features
Notice that each feature is related to a specific location in the sub-window.
Why features and not pixels?
– Encode domain knowledge
– A feature-based system operates faster
– Inspiration from human vision
Features
Later we will see that there are other features that can be used to implement an efficient face detector.
The original system of Viola and Jones used only rectangle features.
Computing Features
Given a detection resolution of 24x24 and an image size of ~200x200, the set of rectangle features is ~160,000!
We need to find a way to rapidly compute the features.
Integral Image
Intermediate representation of the image
Computed in one pass over the original image
$$ii(x, y) = \sum_{x' \leq x,\; y' \leq y} i(x', y')$$
Computed via the recurrences
$$s(x, y) = s(x, y-1) + i(x, y)$$
$$ii(x, y) = ii(x-1, y) + s(x, y)$$
with $s(x, -1) = 0$ and $ii(-1, y) = 0$, where $s(x, y)$ is the cumulative row sum.
Integral Image
Using the integral image representation, one can compute the value of any rectangular sum in constant time.
For example, the sum inside rectangle D can be computed as:
ii(4) + ii(1) – ii(2) – ii(3)
[Figure: ii(x, y) is the sum of all pixels above and to the left of (x, y); the origin (0,0) is at the top-left corner]
Integral Image
[Figure: evaluating a rectangle feature directly from the integral image using corner weights +1, -1, +2, -1, -2, +1]
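A minimal NumPy sketch of the integral image and the constant-time rectangle sum, under the conventions above (the padded row and column of zeros implement the boundary cases; function names are mine):

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of img over all x' <= x, y' <= y.
    A leading row and column of zeros implement the boundary cases
    s(x, -1) = 0 and ii(-1, y) = 0, so the result is built in one pass."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels inside a rectangle using 4 lookups:
    ii(4) + ii(1) - ii(2) - ii(3) in the slide's corner numbering."""
    return (ii[top + height, left + width] + ii[top, left]
            - ii[top, left + width] - ii[top + height, left])

# A two-rectangle feature is then just a difference of two such sums:
# value = rect_sum(white area) - rect_sum(black area)
```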
Building a Detector
Cascading, training a cascade
Main Ideas
The Features will be used as weak classifiers.
We will concatenate several detectors serially into a cascade.
We will boost (using a version of AdaBoost) a number of features to get 'good enough' detectors.
Weak Classifiers
Weak Classifier: a feature which best separates the examples.
Given a sub-window (x), a feature (f), a threshold (Θ), and a polarity (p) indicating the direction of the inequality:
$$h(x, f, p, \theta) = \begin{cases} 1 & \text{if } p \cdot f(x) < p \cdot \theta \\ 0 & \text{otherwise} \end{cases}$$
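In code, the weak classifier is a single threshold test (a sketch; `f` is assumed to be any callable computing the feature value of a sub-window):

```python
def weak_classifier(x, f, p, theta):
    """h(x, f, p, theta): 1 if p * f(x) < p * theta, else 0.
    p in {+1, -1} selects the direction of the inequality."""
    return 1 if p * f(x) < p * theta else 0
```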
Weak Classifiers
A weak classifier is a combination of a feature and a threshold.
We have K features.
We have N thresholds, where N is the number of examples.
Thus there are KN weak classifiers.
Weak Classifier Selection
For each feature, sort the examples based on feature value.
For each element, evaluate the total sum of positive/negative example weights (T+/T−) and the sum of positive/negative weights below the current example (S+/S−).
The error for a threshold which splits the range between the current and previous example in the sorted list is:
$$e = \min\left(S^+ + (T^- - S^-),\; S^- + (T^+ - S^+)\right)$$
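The single-pass search over the sorted examples can be written directly from these definitions. A sketch with hypothetical names; the polarity convention is my own assumption. It returns the threshold, polarity, and error of the best split for one feature:

```python
import numpy as np

def best_threshold(values, labels, weights):
    """values: feature value per example; labels in {+1, -1};
    weights: example weights summing to 1.
    Scans thresholds between consecutive sorted examples and minimizes
    e = min(S+ + (T- - S-), S- + (T+ - S+))."""
    order = np.argsort(values)
    v, y, w = values[order], labels[order], weights[order]
    T_pos = w[y == 1].sum()    # total positive example weight (T+)
    T_neg = w[y == -1].sum()   # total negative example weight (T-)
    S_pos = S_neg = 0.0        # weight below the current example (S+/S-)
    best = (None, 1, np.inf)   # (threshold, polarity, error)
    for i in range(len(v)):
        e_a = S_pos + (T_neg - S_neg)   # misweight if "below" is labeled negative
        e_b = S_neg + (T_pos - S_pos)   # misweight if "below" is labeled positive
        e, p = (e_a, 1) if e_a < e_b else (e_b, -1)
        if e < best[2]:
            thresh = v[i] if i == 0 else (v[i] + v[i - 1]) / 2
            best = (thresh, p, e)
        if y[i] == 1:
            S_pos += w[i]
        else:
            S_neg += w[i]
    return best
```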
An example

x    y    f    W     T+    T−    S+    S−    A     B     e
X1   -1   2    1/5   3/5   2/5   0     0     2/5   3/5   2/5
X2   -1   3    1/5   3/5   2/5   0     1/5   1/5   4/5   1/5
X3    1   5    1/5   3/5   2/5   0     2/5   0     5/5   0
X4    1   7    1/5   3/5   2/5   1/5   2/5   1/5   4/5   1/5
X5    1   8    1/5   3/5   2/5   2/5   2/5   2/5   3/5   2/5

Error = min(A, B), with A = S+ + (T− − S−) and B = S− + (T+ − S+).
f is the feature value and W the example weight; T+/T− are the total positive/negative example weights, and S+/S− the sums of positive/negative weights below the current example. The best threshold here is the one at X3, where e = 0.
Main Ideas: Cascading
The Features will be used as weak classifiers.
We will concatenate several detectors serially into a cascade.
We will boost (using a version of AdaBoost) a number of features to get 'good enough' detectors.
Cascading
We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows.
Positive results from the first classifier trigger the evaluation of a second (more complex) classifier, and so on.
A negative outcome at any point leads to the immediate rejection of the sub-window.
Cascading
[Figure: sub-windows flow through a chain of classifiers; each stage either rejects them or passes them to the next]
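The run-time logic of the cascade is just a short-circuiting loop (a sketch; each `stage` is assumed to be a boosted classifier returning True when the sub-window may contain a face):

```python
def cascade_classify(window, stages):
    """Run a sub-window through the cascade.
    Each stage is cheap relative to the next, so most negative
    sub-windows are discarded after only a stage or two."""
    for stage in stages:
        if not stage(window):
            return False   # negative outcome: immediate rejection
    return True            # survived every stage: report a face
```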
Main Ideas: Boosting
The Features will be used as weak classifiers.
We will concatenate several detectors serially into a cascade.
We will boost (using a version of AdaBoost) a number of features to get 'good enough' detectors.
Training a cascade
User selects values for:
– Maximum acceptable false positive rate per layer
– Minimum acceptable detection rate per layer
– Target overall false positive rate
User gives a set of positive and negative examples.
Training a cascade (cont.)
While the overall false positive rate is not met:
– While the false positive rate of the current layer is greater than the maximum per layer:
  • Train a classifier with n features using AdaBoost on a set of positive and negative examples
  • Decrease the threshold until the current classifier's detection rate on the layer is at least the minimum
  • Evaluate the current cascade classifier on a validation set
– Evaluate the current cascade detector on a set of non-face images and put any false detections into the negative training set.
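Schematically, the training procedure looks like the following sketch. Every helper here (`train_adaboost_layer`, `evaluate_fpr`, `collect_false_positives`, the `lower_threshold_until` method) is hypothetical and only mirrors the steps listed above:

```python
def train_cascade(pos, neg, val_set, nonface_images,
                  max_fpr_per_layer, min_det_per_layer, target_fpr):
    """Schematic cascade training loop; helpers are assumed to exist."""
    cascade = []
    overall_fpr = 1.0
    while overall_fpr > target_fpr:
        n_features, layer_fpr, layer = 0, 1.0, None
        while layer_fpr > max_fpr_per_layer:
            n_features += 1
            # Train a boosted classifier with n_features features
            layer = train_adaboost_layer(pos, neg, n_features)
            # Decrease its threshold until the layer detects at least
            # min_det_per_layer of the positives (this raises its FPR)
            layer.lower_threshold_until(min_det_per_layer, val_set)
            # Evaluate the current cascade classifier on the validation set
            layer_fpr = evaluate_fpr(cascade + [layer], val_set)
        cascade.append(layer)
        overall_fpr = evaluate_fpr(cascade, val_set)
        # Refill the negative set with the cascade's false detections
        neg = collect_false_positives(cascade, nonface_images)
    return cascade
```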
Results
Training Data Set
4916 hand-labeled faces, aligned to the base resolution (24x24)
Non-faces for the first layer were collected from 9500 non-face images
Non-faces for subsequent layers were obtained by scanning the partial cascade across non-face images and collecting false positives (max 6000 per layer)
Structure of the Detector
38-layer cascade with 6060 features in total

Layer number        1     2     3 to 4   5 to 38
Number of features  2     10    50       –
Detection rate      100%  100%  –        –
Rejection rate      50%   80%   –        –
Speed of the final Detector
On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds.
Improvements
Learning Object Detection from a Small Number of Examples: the Importance of Good Features (Levy & Weiss, 2004)
Improvements
Performance depends crucially on the features that are used to represent the objects (Levy & Weiss, 2004).
Good Features imply:
– Good results from small training databases
– Better generalization abilities
– Shorter (faster) classifiers
Edge Orientation Histogram
Invariant to global illumination changes
Captures geometric properties of faces
Domain knowledge represented:
– The inner part of the face includes more horizontal edges than vertical
– The ratio between vertical and horizontal edges is bounded
– The area of the eyes includes mainly horizontal edges
– The chin has more or less the same number of oblique edges on both sides
Edge Orientation Histogram
Called EOH. The EOH can be calculated using a version of the Integral Image:
– We find the gradients at the point (x, y) using Sobel masks
– We calculate the orientation of the edge at (x, y)
– We divide the edges into K bins
– The result is stored in K matrices
– We use the same idea of the Integral Image for the matrices
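A NumPy/SciPy sketch of these five steps, assuming K orientation bins and edge magnitudes as bin contents (the function and variable names are mine; Levy & Weiss's exact implementation may differ):

```python
import numpy as np
from scipy.signal import convolve2d

def eoh_integral_images(img, K=8):
    """Build K integral images, one per edge-orientation bin."""
    # 1. Gradients at each pixel via Sobel masks
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = convolve2d(img, sobel_x, mode='same')
    gy = convolve2d(img, sobel_x.T, mode='same')
    # 2. Edge orientation at each pixel, folded into [0, pi)
    theta = np.arctan2(gy, gx) % np.pi
    magnitude = np.hypot(gx, gy)
    # 3. Divide the edges into K orientation bins
    bins = np.minimum((theta / np.pi * K).astype(int), K - 1)
    # 4. One magnitude matrix per bin; 5. the usual integral image trick
    iis = []
    for k in range(K):
        layer = np.where(bins == k, magnitude, 0.0)
        iis.append(np.cumsum(np.cumsum(layer, axis=0), axis=1))
    return iis

# A ratio feature is then the energy of bin k1 inside a rectangle divided
# by the energy of bin k2 in the same rectangle, each obtained from four
# integral-image lookups.
```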
EOH Features
– The ratio between two orientations
– The dominance of a given orientation
– Symmetry features
Results
Already with only 250 positive examples we see a detection rate above 90%
Faster classifier
Better performance on profile faces
Demo
Implementing the Viola & Jones system
Frank Fritze, 2004