Combining detectors for human layout analysis
M. Drozdzal, A. Hernández, S. Seguí, X. Baró, S. Escalera, A. Lapedriza, D. Masip, P. Radeva, J. Vitrià
Barcelona Perceptual Computing Lab
Dept. de Matemàtica Aplicada i Anàlisi, Facultat de Matemàtiques
Universitat de Barcelona, Gran Via 585, 08007 Barcelona, Catalonia, Spain
Human Layout Analysis
Predicting the bounding box, label and presence of each part of a person (head, hands, feet) in a dataset depicting extremely variable human poses and viewpoints in complex backgrounds.
(Figure: example image with labelled parts: Head, Hand, Foot.)
Approach
Our main objective was not to build a new approach to deal with the full complexity of the problem but to test some state-of-the-art detector algorithms in order to evaluate their limits.
We made the following assumptions:
• We are provided with a bounding box of the visible parts of the body.
• The head is visible and is the most reliable human part to detect. The head can be used as an anchor for other parts.
Head Detection
The head is detected by integrating several state-of-the-art part detectors:
• Face (frontal + lateral) detection
• Person detection using poselets
• Person detection using the Pictorial Model
• Person detection using Discriminatively Trained Part-Based Models
Head Detection: Face
Faces were detected with OpenCV 2.1.
Details of the implementation:
• We use the following cascades:
  • Frontal face (default, alt, alt2, alt_tree).
  • Lateral face (profile).
• Each cascade returns several (from 0 up to N) hypotheses about the head position.
• To integrate the results we use hierarchical clustering.
• The final head box is the one with the maximum score given by hierarchical clustering.
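The clustering step described above can be sketched as follows. The slides do not specify the similarity measure, the linkage, or how the score is computed, so this minimal Python sketch assumes box overlap (IoU) as the similarity, merges clusters by comparing their mean boxes, and uses the number of hypotheses in a cluster as its score. The names `iou`, `mean_box`, `cluster_boxes`, `best_head_box` and the `min_iou` threshold are illustrative and not taken from the original system.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_box(boxes: List[Box]) -> Box:
    """Coordinate-wise average of a non-empty list of boxes."""
    n = len(boxes)
    return tuple(sum(b[i] for b in boxes) / n for i in range(4))


def cluster_boxes(boxes: List[Box], min_iou: float = 0.5) -> List[List[Box]]:
    """Agglomerative clustering: repeatedly merge the two clusters whose
    mean boxes overlap most, until no pair reaches min_iou (assumed rule)."""
    clusters = [[b] for b in boxes]
    while len(clusters) > 1:
        best, pair = min_iou, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = iou(mean_box(clusters[i]), mean_box(clusters[j]))
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters.pop(j)  # j > i, so index i is unaffected
    return clusters


def best_head_box(boxes: List[Box], min_iou: float = 0.5) -> Optional[Tuple[Box, float]]:
    """Return (mean box, score) of the largest cluster, or None without hypotheses.
    The score is the cluster size, which is an assumption of this sketch."""
    if not boxes:
        return None
    winner = max(cluster_boxes(boxes, min_iou), key=len)
    return mean_box(winner), float(len(winner))
```

Here the input list would hold the pooled hypotheses of all cascades; a cluster supported by several cascades then receives a higher score than an isolated hypothesis.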
Figure: precision-recall curve, face (frontal + lateral) detection. subset: val, part: head, AP = 0.530
References: Viola, Jones: Robust Real-time Object Detection, IJCV 2001
Head Detection: Body (i)
We use a person detection system proposed by Felzenszwalb et al.
Details of the implementation:
• Software version: Discriminatively Trained Deformable Part Models Version 4.
• Based on model aspect analysis we choose the 4 models which best detect the head position.
• For each model we choose the component related to the head position in order to fix the box.
Figure: precision-recall curve, person detection. subset: val, part: head, AP = 0.459
References:
• P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 2009
Head Detection: Body (ii)
We use the body detection system proposed by Bourdev et al.
• Initially, we used the set of 1138 poselets trained from the H3D database.
• The poselets were trained to vote for the position and size of the head.
• In order to improve results, a hierarchical clustering per poselet was introduced.
• From the original poselet set, we selected the 239 poselets which give the most reliable votes for the head position. The selection criterion was the standard deviation (std) of the votes for the head.
• If the std was smaller than a defined threshold, the poselet was defined as reliable.
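The selection rule above can be illustrated with a short Python sketch. It assumes each poselet casts 2D votes for the head centre, expressed as offsets normalised by the poselet size, and that the spread is summarised as the root of the summed per-axis variances. The slides give neither the vote representation nor the threshold value, so `Vote`, `vote_spread`, `select_reliable` and `max_std` are hypothetical.

```python
from statistics import pstdev
from typing import Dict, List, Tuple

Vote = Tuple[float, float]  # assumed: head-centre offset (dx, dy), normalised by poselet size


def vote_spread(votes: List[Vote]) -> float:
    """Single spread value for 2D votes: sqrt(var_x + var_y)."""
    xs = [v[0] for v in votes]
    ys = [v[1] for v in votes]
    return (pstdev(xs) ** 2 + pstdev(ys) ** 2) ** 0.5


def select_reliable(votes_by_poselet: Dict[int, List[Vote]], max_std: float) -> List[int]:
    """Keep the ids of poselets whose head votes have a spread below max_std.
    Poselets with fewer than two votes are skipped, since no spread can be estimated."""
    return sorted(
        pid
        for pid, votes in votes_by_poselet.items()
        if len(votes) >= 2 and vote_spread(votes) < max_std
    )
```

A poselet that fires on the face tends to vote consistently for the head position, while a poselet covering the legs produces scattered votes and is dropped by this criterion.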
Figure: precision-recall curve, poselets detection. subset: val, part: head, AP = 0.425
Reference:
• Lubomir Bourdev, Jitendra Malik, Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, ICCV 2009.
Head Detection: Body (iii)
We used the implementation of CRF-based pose recovery.
The method is applied at different scales and the solution with the maximum response is selected.
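The multi-scale selection can be sketched generically in Python. The pose-recovery method itself is not reproduced here: `detect_at_scale` is a hypothetical callable standing in for it, assumed to return a head box in the coordinates of the rescaled image together with a response value.

```python
from typing import Callable, Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box in scaled-image coordinates, response)


def best_over_scales(
    detect_at_scale: Callable[[float], Optional[Detection]],
    scales: Iterable[float],
) -> Optional[Tuple[Box, float, float]]:
    """Run the detector at every scale and keep the strongest response.
    Returns (box in original-image coordinates, response, scale) or None."""
    best = None
    for s in scales:
        det = detect_at_scale(s)
        if det is None:
            continue
        box, response = det
        if best is None or response > best[1]:
            # Divide by the scale factor to map the box back to the original image.
            best = (tuple(c / s for c in box), response, s)
    return best
```

Comparing raw responses across scales assumes they are on a common scale, which may not hold for every detector.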
Figure: precision-recall curve, Pictorial Model. subset: val, part: head, AP = 0.125
Reference:
• Ramanan, D. Learning to Parse Images of Articulated Bodies, In NIPS, 2006.
Head Detection: Integration
Diagram (flattened in the transcript):
• OpenCV Frontal 1, OpenCV Frontal 2, OpenCV Frontal 3, OpenCV Frontal 4 (weights w1, w2, w3, w4) → Hierarchical Clustering → Head
• Head, OpenCV Profile, Body 1, Body 2, Body 3 (weights w5, w6, w7, w8, w9) → Hierarchical Clustering → Head boxes → Max Score Head Box
wi are the weights assigned to each detector. These weights are set using cross-validation.
Head Detection: Integration
For each head box the hierarchical clustering computes the score.
The higher the score, the larger the probability of a correct head detection.
The max score head box is the head detection with the highest probability.
The score is used as a confidence value for each detection.
If there are 2 highly probable heads inside the layout bounding box, the score is divided by 2.
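A minimal sketch of how such a confidence could be derived, in Python. It assumes that the score of a head cluster is the sum of the weights of the detectors that contributed to it, and that "highly probable" means a score at or above a threshold. Neither detail is given in the slides, so the weight values, the detector names and the `high` threshold below are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_score(members: List[str], weights: Dict[str, float]) -> float:
    """Assumed scoring rule: sum of the weights of the contributing detectors."""
    return sum(weights[m] for m in members)


def head_confidence(
    clusters: List[List[str]],
    weights: Dict[str, float],
    high: float,
) -> Tuple[int, float]:
    """Return (index of the max-score cluster, confidence).
    The confidence is halved when at least two clusters look like
    highly probable heads, following the rule on the slide."""
    scores = [cluster_score(c, weights) for c in clusters]
    best = max(range(len(scores)), key=scores.__getitem__)
    confidence = scores[best]
    if sum(s >= high for s in scores) >= 2:
        confidence /= 2.0
    return best, confidence
```

Halving the confidence pushes ambiguous layouts down the ranked list, which matters when detections are evaluated by a precision-recall curve.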
Head Detection: Results
Example detections shown at Confidence 0.5, Confidence 0.8, Confidence 1.6 and Confidence 2.25.
Figure: precision-recall curve. subset: val, part: head, AP = 0.753
Hand Detection
Pipeline (diagram, flattened in the transcript):
• Image
• Head Detection
• Face Skin Segmentation
• Image Segmentation using [1]
• Blob selection using the person skin model
• Filtering
• Hand Detection
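The blob selection and filtering steps can be sketched as follows in Python. The sketch assumes a very simple person-specific skin model (per-channel mean and standard deviation of the face pixels) and a single geometric constraint (blob area relative to the head box area). The slides do not state the colour space, the model, or the constraints, so all names and thresholds here are hypothetical, and the mean shift segmentation that produces the blobs is taken as given.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]
SkinModel = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (per-channel mean, per-channel std)


def fit_skin_model(face_pixels: Sequence[Color]) -> SkinModel:
    """Estimate per-channel mean and std from pixels inside the detected face."""
    channels = list(zip(*face_pixels))
    mu = tuple(mean(c) for c in channels)
    sigma = tuple(max(pstdev(c), 1e-6) for c in channels)  # avoid a zero std
    return mu, sigma


def is_skin(color: Color, model: SkinModel, k: float = 2.5) -> bool:
    """A colour counts as skin if every channel lies within k std of the face mean."""
    mu, sigma = model
    return all(abs(c - m) <= k * s for c, m, s in zip(color, mu, sigma))


def select_hand_blobs(
    blobs: List[Tuple[Color, float]],  # (mean colour, area) per segmented blob
    model: SkinModel,
    head_area: float,
    min_ratio: float = 0.1,
    max_ratio: float = 1.5,
) -> List[int]:
    """Indices of blobs that are skin-coloured and plausibly hand-sized."""
    return [
        i
        for i, (color, area) in enumerate(blobs)
        if is_skin(color, model) and min_ratio * head_area <= area <= max_ratio * head_area
    ]
```

Fitting the model on the detected face adapts it to the person and the lighting of each image, which is the motivation for anchoring hand detection on the head.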
References:
[1] Dorin Comaniciu, Peter Meer, Mean shift: A robust approach toward feature space analysis, PAMI 2002
Hand Detection: Results
Feet Detection
We use a person detection system proposed by Felzenszwalb et al.
Details of the implementation:
• Software: Discriminatively Trained Deformable Part Models Version 4.
• Based on model aspect analysis we choose 2 full body models which detect the feet.
• For each model we choose the component related to the lower leg position.
• Inside the lower leg box the foot box is fixed by cross-validation.
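One way to read the last step is that the foot box is a fixed sub-rectangle of the lower leg box, with its relative position chosen on annotated data. The Python sketch below follows that reading and uses a plain search over candidate placements scored by mean IoU, which is a simplified stand-in for the cross-validation mentioned on the slide. The functions `foot_from_leg` and `pick_rel` and the relative-coordinate representation are assumptions.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def foot_from_leg(leg: Box, rel: Box) -> Box:
    """Place the foot box inside the leg box; rel holds fractions of the
    leg width and height in the order (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = leg
    w, h = x2 - x1, y2 - y1
    return (x1 + rel[0] * w, y1 + rel[1] * h, x1 + rel[2] * w, y1 + rel[3] * h)


def pick_rel(candidates: Sequence[Box], samples: Sequence[Tuple[Box, Box]]) -> Box:
    """Choose the relative placement with the highest mean IoU over
    annotated (leg box, ground-truth foot box) pairs."""
    def mean_iou(rel: Box) -> float:
        return sum(iou(foot_from_leg(leg, rel), foot) for leg, foot in samples) / len(samples)
    return max(candidates, key=mean_iou)
```

For example, a placement of (0.0, 0.8, 1.0, 1.0) takes the bottom fifth of the lower leg box as the foot box.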
References: P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 2009
Feet Detection: Results
Figure: precision-recall curve. subset: val, part: foot, AP = 0.013
Conclusions
By integrating state-of-the-art part detectors it is possible to obtain good results in head detection.
Our detector system is able to detect heads in the following conditions:
• Frontal heads
• Pictures taken from behind
• Heads with partial occlusions
Our detector has problems with:
• Rotated images.
• Profile heads with no standard pose.
• Several heads in the layout box.
Head detection is mature enough for developing a fast, robust detector, provided that we find how to efficiently deal with these problems.
Conclusions
Feet detection could be improved by training more diverse models for legs. Presence is an open problem for low resolution images.
Hand detection should be approached as a search problem (e.g. using the CHAINS model [2]). These models could be easily extended to feet detection.
Detectors must be integrated in pictorial models that are able to deal with a large variety of detectors.
It is necessary to build better databases with clear labeling policies.
References:
[2] L. Karlinsky, M. Dinerstein, D. Harari, S. Ullman, The chains model for detecting parts by their context, CVPR 2010
Combining detectors for human layout analysis.
M. Drozdzal, A. Hernández, S. Seguí, X. Baró, S. Escalera, A. Lapedriza, D. Masip, P. Radeva, J. Vitrià
Barcelona Perceptual Computing Lab
Universitat de Barcelona, Computer Vision Center, Universitat Oberta de Catalunya
Assumption: The head is visible and is the most reliable human part to detect. Head position is used as an anchor point for detecting other parts.
HEAD Average Precision %: 74.4
The head is detected by integrating evidence from several state-of-the-art part detectors: frontal and profile faces (OpenCV), person parts (Felzenszwalb et al.), informative poselets (selected from the full set of Bourdev et al.), and pictorial models (Ramanan et al.).
Integration is based on weighted hierarchical clustering of head window hypotheses.
HANDS Average Precision %: 3.3
A specific skin model is extracted from the detected face. This model is used to segment the image using Mean Shift. Resulting blobs are filtered using geometric constraints.
FEET Average Precision %: 1.2
We use a person detection system proposed by Felzenszwalb et al. We have selected two full body models which include feet in their components. For each model we choose the component related to the leg position. Inside the leg box the foot box position is learned by cross-validation.