combining detectors for human layout analysis

18
Combining detectors for Combining detectors for human layout analysis. M. Drozdzal, A. Hernández, S. Seguí, X. Baró, S. Escalera, M. Drozdzal, A. Hernández, S. Seguí, X. Baró, S. Escalera, A. Lapedriza, D. Masip, P. Radeva, J. Vitrià Barcelona Perceptual Computing Lab D d M ài A li d iA àli i F l d M ài Dept. de Matemàtica Aplicada iAnàlisi, Facultat de Matemàtiques Universitat de Barcelona, Gran Via 585, 08007 Barcelona, Catalonia, Spain

Upload: others

Post on 27-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Combining detectors for human layout analysis

Combining detectors forCombining detectors for human layout analysis.

M. Drozdzal, A. Hernández, S. Seguí, X. Baró, S. Escalera,M. Drozdzal, A. Hernández, S. Seguí, X. Baró, S. Escalera, A. Lapedriza, D. Masip, P. Radeva, J. Vitrià  

Barcelona Perceptual Computing LabD d M à i A li d i A àli i F l d M à iDept. de Matemàtica Aplicada i Anàlisi, Facultat de Matemàtiques

Universitat de Barcelona, Gran Via 585, 08007 Barcelona, Catalonia, Spain

Page 2: Combining detectors for human layout analysis

Human Layout AnalysisHuman Layout AnalysisPredicting the bounding box, label and presence

f h t f (h d h d f t)of each part of a person (head, hands, feet)...

Head

Hand

Head

Hand

FootFoot

in a dataset depicting extremely variable human… in a dataset depicting extremely variable human poses and viewpoints in complex backgrounds. 

Page 3: Combining detectors for human layout analysis

ApproachApproachOur main objective was not to build a new approach to deal with the fullOur main objective was not to build a new approach to deal with the fullcomplexity of the problem but to test some state‐of‐art detectoralgorithms in order to evaluate their limits.

We made the following assumptions:

• We are provided with a bounding box of• We are provided with a bounding box ofthe visible parts of the body.

Th h d i i ibl i h li bl• The head is visible is the most reliablehuman part to detect. The head can beused as an anchor for other parts.

Person

Page 4: Combining detectors for human layout analysis

Head DetectionHead DetectionThe head is detected by integrating several state‐of‐the‐art part detectors:The head is detected by integrating several state of the art part detectors:

Face (frontal + lateral) detection

Person detection using poselets

Person detection using Pictorial

Person Detectionlateral) detection using poselets using Pictorial 

ModelDetectionusing

DiscriminativelyTrained Part‐Based Models

Page 5: Combining detectors for human layout analysis

Head Detection: FaceHead Detection: FaceFaces were detected with OpenCV 2.1.

Details of the implementation:• We use the following cascades:

• Frontal face (default alt alt2 alt tree)Frontal face (default, alt, alt2, alt_tree).• Lateral face (profile).

• Each cascade return several (from 0 up to N) hypothesisabout head position.• To integrate the results we use hierarchical clustering.• The final head box is the one with the maximum scoregiven by hierarchical clustering.

Face (frontal + lateral) detection

0.9

1

prec

isio

n

subset: val, part: head, AP = 0.530

lateral) detection

References: Viola, Jones: Robust Real‐time Object Detection, IJCV 2001

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70.8

recall

p

Page 6: Combining detectors for human layout analysis

Head Detection: Body (i)Head Detection: Body (i)We use a person detection system proposed byF l lb t lFelzenszwalb et al.

Details of the implementation:• Software version: Discriminatively Trained• Software version: Discriminatively TrainedDeformable Part Models Version 4.• Based on model aspect analysis we choose 4 modelswhich best detect the head position.• For each model we choose the component related withhead position in order to fix the box.

0.70.80.9

prec

isio

n

subset: val, part: head, AP = 0.459

Persondetection

References:• P Felzenszwalb R Girshick D McAllester D Ramanan Object Detection with

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7recall

detection

P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection withDiscriminatively Trained Part Based Models, PAMI 2009

Page 7: Combining detectors for human layout analysis

Head Detection: Body (ii)Head Detection: Body (ii)We use the body detection system proposed by Bourdevet alet al.

• Initially, we used the set of 1138 poselets trained from the H3Ddatabase.• The poselets were trained to vote for position and size of theThe poselets were trained to vote for position and size of thehead.• In order to improve results a hierarchical clustering per poseletwas introduced.• From original poselets set, we selected the 239 poselets whichFrom original poselets set, we selected the 239 poselets whichgives the best, in terms of reliability, votes for the head position.The used selection criteria was the standard deviation (std) ofvotes for head.• If std was smaller than a defined threshold then the poselet was

Poseletsdetection

If std was smaller than a defined threshold then the poselet wasdefined as reliable.

0 9on

subset: val, part: head, AP = 0.425

detection

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

0.70.80.9

recall

prec

isio

Reference:• Lubomir Bourdev, Jitendra Malik, Poselets: Body Part Detectors Trained Using 3D Human PoseAnnotations, ICCV 2009.

Page 8: Combining detectors for human layout analysis

Head Detection: Body (iii)Head Detection: Body (iii)We used the implementation of CRF‐based poseWe used the implementation of CRF based poserecovery.

The method is applied atppdifferent scales and thesolution with the maximumresponse is selected.

subset: val, part: head, AP = 0.125

Pictorial Model 0 0.05 0.1 0.15 0.2 0.25 0.3 0.350

0.5

1

prec

isio

n

subset: val, part: head, AP 0.125

Reference:• Ramanan, D. Learning to Parse Images of Articulated Bodies , In NIPS, 2006.

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35recall

Page 9: Combining detectors for human layout analysis

Head Detection: IntegrationHead Detection: IntegrationOpenCV OpenCV OpenCV OpenCVOpenCVFrontal1

OpenCVFrontal2

OpenCVFrontal3

OpenCVFrontal4

Hierarchical

w1 w2 w3 w4

OpenCVProfile Body 1 Body 2 Body 3OpenCV

Head

Clustering

Hierarchical

w5 w6 w7 w8 w9

HEAD

Clustering

Head boxes

Max Score Head Box

wi are the weights assigned to each detector. These weights are set using cross‐validation.

Page 10: Combining detectors for human layout analysis

Head Detection: IntegrationHead Detection: Integration

For each head box the hierarchical clustering computesthe scorethe score.

The higher score the larger probability of correct headdetection.detection.

The max score head box is the head detection with thebiggest probability.

HEAD

gg p y

The score is used as a confidence value for eachdetection.

If there are 2 highly probable heads inside layoutbounding box the score is divided by 2.

Page 11: Combining detectors for human layout analysis

Head Detection: ResultsHead Detection: Results

Confidence 0.5 Confidence 0.8 Confidence 1.6 Confidence 2.25

0.9ion

subset: val, part: head, AP = 0.753

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

0.70.8

prec

is

recallb l h d AP 0 000

Page 12: Combining detectors for human layout analysis

Hand DetectionHand Detection

Image

Head Detection Face Skin Segmentationg

Blob selection using the person skin model

Image Segmentation using [1]

Filtering

HandDetection

References:[1] Dorin Comaniciu , Peter Meer , Senior Member, Mean shift: A robust approach towardfeature space analysis , PAMI 2002

Page 13: Combining detectors for human layout analysis

Hand Detection: ResultsHand Detection: Results

Page 14: Combining detectors for human layout analysis

Feet DetectionFeet DetectionWe use a person detection system proposed byFelzenszwalb et al.

Details of the implementation:

• Software: Discriminatively Trained Deformable PartModels Version 4.• Based on model aspect analysis we choose 2 full body• Based on model aspect analysis we choose 2 full bodymodels which detect the feet.• For each model we choose the component related withlower leg position.• Inside lower leg box the foot box is fixed by cross‐validation.

FeetDetection

References: P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object DetectionReferences: P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detectionwith Discriminatively Trained Part Based Models, PAMI 2009

Page 15: Combining detectors for human layout analysis

Feet Detection: ResultsFeet Detection: Results

0.5

ion

subset: val, part: foot, AP = 0.013

0 0.01 0.02 0.03 0.04 0.05 0.06 0.070

recall

prec

isi

recall

Page 16: Combining detectors for human layout analysis

ConclusionsConclusionsBy integrating state‐of‐the‐art part detectors is possible to obtain goodBy integrating state‐of‐the‐art part detectors is possible to obtain goodresults in head detection.

Our detector system allows to detect heads in following conditions:• Frontal Heads• Picture taken from behind• Heads with partial occlusions

Our detector has problems with :• Rotated images.

f l h d h d d• Profile heads with no standard pose.• Several heads in the layout box.

Head detection is mature enough for developing a fast robust detector providedHead detection is mature enough for developing a fast, robust detector, providedthat we find how to efficiently deal with these problems.

Page 17: Combining detectors for human layout analysis

ConclusionsConclusionsFeet detection could be improved by training more diverse models forFeet detection could be improved by training more diverse models for

legs. Presence is an open problem for low resolution images.

H d d t ti h ld b h d h bl (f iHand detection should be approached as a search problem (f.e. usingthe CHAINS model [2]). These models could be easily extended tofeet detection.

Detectors must be integrated in pictorial models that are able to dealwith a large variety of detectors.

It is necessary to build better databases with clear labeling policies.

References:[2] L Karlinsky, M Dinerstein, D Harari, S Ullman, The chains model for detecting parts by their context , CVPR 2010

Page 18: Combining detectors for human layout analysis

Combining detectors for human layout analysis.M. Drozdzal, A. Hernández, S. Seguí, X. Baró, S. Escalera, A. Lapedriza, D. Masip, P. Radeva, J. Vitrià

Barcelona Perceptual Computing LabBarcelona Perceptual Computing LabUniversitat de Barcelona, Computer Vision Center, Universitat Oberta de Catalunya

Assumption: The head is visible and is the most reliable human part to detect. Head position is used as an anchor point for detecting other parts.

HEAD Average Precision %: 74.4

The head is detected by integrating evidences from several state‐of‐the‐art part d f l d fil f ( ) ( l lb l)detectors: frontal and profile faces (OpenCV), person parts (Felzenszwalb et al), informative poselets (selected from the full set of Bourdev at al), and pictorial models (Ramanan et al.) 

Integration is based on weighted hierarchical clustering of head windows hypothesisIntegration is based on weighted hierarchical clustering of head windows hypothesis.

HANDS Average Precision %: 3.3

A specific skin model is extracted from the detected face This model is used toA specific skin model is extracted from the detected face. This model is used to segment the image using Mean Shift.  Resulting blobs are filtered using geometric constraints.

FEET Average Precision %: 1 2FEET Average Precision %: 1.2

We use a person detection system proposed by Felzenszwalb et al. We have selectedtwo 2 full body models which include feet in their components. For each model wechoose the component related to the leg position Inside the leg box the foot boxchoose the component related to the leg position. Inside the leg box the foot boxposition is learned by cross‐validation.