1

New Features and Insights for Pedestrian Detection

Stefan Walk, Nikodem Majer, Konrad Schindler, Bernt Schiele

2

Outline

• Authors
• Abstract
• Main contributions
• Algorithms
• Experiments
• Conclusion

3

Authors (1/4)

• Stefan Walk
  – Experience
    • 2007-, PhD Candidate in Computer Science, Technische Universität Darmstadt
    • 2003-2007, Diploma in Physics, Technische Universität Darmstadt, Germany
  – Research interest
    • People Detection
    • Detecting from video data (utilizing motion information)
  – Papers
    • Multi-cue Onboard Pedestrian Detection (CVPR09)

4

Authors (2/4)

• Nikodem Majer
  – Experience
    • 2007-, PhD Candidate in Computer Science, Technische Universität Darmstadt
  – Research interest
    • …
  – Papers
    • …

5

Authors (3/4)

• Konrad Schindler
  – Experience
    • 2009-: assistant professor, TU Darmstadt, Germany
    • 2007-2008: post-doc, ETH Zurich
    • 2004-2006: post-doc, Monash University, Melbourne/Australia
    • 2001-2003: research assistant, Graz University of Technology, Austria
  – Research interest
    • computer vision (3D scene analysis, biologically inspired vision, tracking)
    • image processing, pattern recognition, machine learning, photogrammetry
  – Papers
    • PAMI10, CVPR10, ICCV10…

6

Authors (4/4)

• Bernt Schiele
  – Experience
    • 1999-2004, Assistant Professor, ETH Zurich, Switzerland
    • 1997-2000, Postdoctoral Associate and Visiting Assistant Professor, MIT, Cambridge, MA, USA
    • 1994, Visiting researcher at CMU
    • AE of PAMI, IJCV, AC of ECCV’08, CVPR’09, ICCV’09, PC of ICCV 2011
  – Research interest
    • Perceptual computing, human-computer interfaces
  – Papers
    • …

7

Outline


8

Abstract (1/2)

• Despite impressive progress in people detection the performance on challenging datasets like Caltech Pedestrians or TUD-Brussels is still unsatisfactory

• In this work we show that motion features derived from optic flow yield substantial improvements on image sequences, if implemented correctly—even in the case of low-quality video and consequently degraded flow fields

• Furthermore, we introduce a new feature, self-similarity on color channels, which consistently improves detection performance both for static images and for video sequences, across different datasets. In combination with HOG, these two features outperform the state-of-the-art by up to 20%.

9

Abstract (2/2)

• Finally, we report two insights concerning detector evaluations, which apply to classifier-based object detection in general

• First, we show that a commonly under-estimated detail of training, the number of bootstrapping rounds, has a drastic influence on the relative (and absolute) performance of different feature/classifier combinations

• Second, we discuss important intricacies of detector evaluation and show that current benchmarking protocols lack crucial details, which can distort evaluations

10

Outline


11

Main contribution

• First, we introduce a new feature based on self-similarity of low level features, in particular color histograms from different sub-regions within the detector window

• The second main contribution is to establish a standard for what pedestrian detection with a global descriptor can achieve at present, including a number of recent advances that we believe should be part of the “best practice” but have not yet been included in systematic evaluations

• Our third main contribution consists of two important insights that apply not only to pedestrian detection, but more generally to classifier-based object detection: (1) bootstrapping is very important; (2) the existing evaluation protocol is insufficient

12

Outline


13

Outline

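• The paper follows the usual style of this lab's publications
  – On two datasets introduced by the lab itself (Caltech Pedestrians, TUD-Brussels), it tests the different features and classifiers currently used in people detection and assesses their strengths and weaknesses (the better an algorithm performs, the more attention it receives)
  – It proposes a new feature and concludes from experiments that “adding our feature to the original method further improves the performance of the people detection system”

• Related Features
  – Haar-like: VJ, successfully applied to face detection in 2001
  – HOG (Histogram of Oriented Gradients): Dalal, successfully applied to people detection in 2005
  – HOF (Histogram of Flow): proposed by Dalal in 2006, applied to people detection in video
  – HOG-LBP: Xiaoyu Wang, applied to people detection in 2009, high performance
  – CSS (Color Self-Similarity): proposed in this paper

• Related Classifiers
  – SVM
  – MPLBoost (Multiple Pose Boosting): proposed by Dollar in 2008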

14

Haar-like feature (1/2)

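• Haar-like feature
  – The difference between the pixel sums of two rectangles arranged in a specific pattern within the image
  – With an integral image, Haar feature responses can be computed quickly

• Variants of Haar features
  – Rotated by 45, 22.5, 11.25 degrees…, but still restricted to “rectangles”
  – Haar features over arbitrary polygonal regions (CVPR10)

Figure: integral-image computation of Haar features

Figure: traditional Haar features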
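
The integral-image trick from this slide can be sketched in a few lines. This is a minimal illustration, not the code used in any of the cited detectors: the function names (`integral_image`, `rect_sum`, `haar_two_rect_horizontal`) and the padded table layout are assumptions made for the example.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Summed-area table of size (h+1) x (w+1); row 0 and column 0 are zero padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Pixel sum of the rectangle with top-left corner (x, y), using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar response: left (w x h) rectangle minus the adjacent right one."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)


image = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
ii = integral_image(image)
response = haar_two_rect_horizontal(ii, 0, 0, 2, 2)  # dark left half, bright right half
```

Once the table is built, every rectangle sum costs four lookups regardless of the rectangle size, which is what makes dense evaluation of Haar features cheap.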

15

Haar-like feature (2/2)

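• Haar features of arbitrary shape
  – The pixel sum over an arbitrary polygonal region can be expressed as the pixel sums over a series of trapezoidal regions
  – The pixel sum over a trapezoid equals the difference between the pixel sums of two right triangles
  – The key step of the algorithm is computing an integral image over right-triangle regions, parameterized by (x, y, slope)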

16

HOG feature (1/1)

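• HOG feature: histogram of oriented gradients
  – Gamma correction of the input image
  – Compute the gradient magnitude and orientation at every pixel of the input image
  – Apply Gaussian weighting to the gradient magnitudes and use trilinear interpolation to build the gradient orientation histogram of each cell
  – Normalize the histograms of adjacent cells to obtain the final feature vector

Figure: HOG feature computation pipeline

Figure: trilinear interpolation in HOG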
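
The per-cell step of the pipeline above can be illustrated with a simplified sketch. It is not Dalal's implementation: it assumes unsigned gradients in 9 bins, interpolates votes only between neighbouring orientation bins (the spatial part of the trilinear interpolation, the Gaussian weighting and the gamma correction are left out), and the names `cell_orientation_histogram` and `l2_normalize` are invented for the example.

```python
import math
from typing import List


def cell_orientation_histogram(cell: List[List[float]], n_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    Gradients use central differences on interior pixels only. Each vote is split
    linearly between the two nearest orientation bins.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            pos = angle / bin_width - 0.5  # bin centres sit at k + 0.5
            lo = math.floor(pos)
            frac = pos - lo
            hist[lo % n_bins] += mag * (1.0 - frac)  # wraps around at 0/180 degrees
            hist[(lo + 1) % n_bins] += mag * frac
    return hist


def l2_normalize(v: List[float], eps: float = 1e-6) -> List[float]:
    """L2 normalization, as applied to groups of adjacent cell histograms."""
    norm = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / norm for x in v]


# An 8x8 cell with a constant horizontal intensity ramp: every gradient points along x.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
hist = cell_orientation_histogram(ramp)
normalized = l2_normalize(hist)
```

For the ramp, all votes have orientation 0 degrees, which lies on the boundary between the last and the first bin, so the interpolation splits each vote evenly between them.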

17

HOF feature (1/1)

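• HOF feature: histogram of optical flow
  – Compute the optical flow of the input image in the x and y directions (e.g. with the Lucas-Kanade algorithm)
  – For specific pairs of regions, compute the magnitude and direction of the flow difference from the x and y flow differences at corresponding pixels
  – Build a histogram over the direction of the flow difference, weighted by its magnitude

Figure: original 3x3 IMHwd (Internal Motion Histogram, wavelet differences)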
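
A rough sketch of the region-pair step follows. It assumes the flow field is already available as per-pixel `(u, v)` tuples and uses hard binning over 8 signed directions; the function name `flow_difference_histogram` and these choices are illustrative assumptions, not the IMHwd scheme itself.

```python
import math
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # per-pixel (u, v) flow vectors of one region


def flow_difference_histogram(region_a: Flow, region_b: Flow, n_bins: int = 8) -> List[float]:
    """Histogram of flow differences between corresponding pixels of two equal-size regions.

    Each pixel pair votes into the bin of the difference direction (0-360 degrees),
    weighted by the difference magnitude.
    """
    hist = [0.0] * n_bins
    bin_width = 360.0 / n_bins
    for row_a, row_b in zip(region_a, region_b):
        for (ua, va), (ub, vb) in zip(row_a, row_b):
            du, dv = ua - ub, va - vb
            mag = math.hypot(du, dv)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(dv, du)) % 360.0
            hist[int(angle // bin_width) % n_bins] += mag
    return hist


moving = [[(1.0, 0.0)] * 2 for _ in range(2)]  # region moving to the right
static = [[(0.0, 0.0)] * 2 for _ in range(2)]  # region at rest
relative = flow_difference_histogram(moving, static)
common = flow_difference_histogram(moving, moving)
```

Because only differences are histogrammed, motion shared by both regions (for example camera motion) cancels out, while relative motion such as a swinging limb remains.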

18

HOG-LBP (1/1)

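• HOG-LBP feature: HOG and LBP concatenated
  – HOG: trilinear interpolation and Gaussian weighting are replaced by convolution
  – LBP (Local Binary Pattern): the binary pattern of a local region
  – This feature achieved the best result to date on the INRIA Person dataset

Figure: illustration of the LBP feature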
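
For readers unfamiliar with LBP, the basic 8-neighbour code can be sketched as below. The neighbour order (clockwise from the top-left) and the `>=` comparison are conventions chosen for this example; the HOG-LBP paper may use a different variant, and the names `lbp_code` and `lbp_histogram` are hypothetical.

```python
from typing import List


def lbp_code(patch: List[List[int]]) -> int:
    """8-bit LBP code of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels of an image region."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            patch = [row[x - 1:x + 2] for row in img[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist


edge_patch = [
    [9, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
flat = [[5] * 4 for _ in range(4)]
```

The code depends only on the sign of intensity differences, so it is invariant to monotonic brightness changes within the patch.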

19

CSS (1/1)

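• CSS feature: color self-similarity
  – For each 8x8 image block, compute a color histogram using trilinear interpolation
  – We experimented with different color spaces, including 3x3x3 histograms in RGB, HSV, HLS and CIE Luv space, and 4x4 histograms in normalized rg, HS and uv, discarding the intensity and only keeping the chrominance. Among these, HSV worked best, and is used in the following
  – The similarities between these histograms form the feature vector. The authors tried the L1-norm, L2-norm, chi-square distance and histogram intersection, and found that histogram intersection performed best
  – In the implementation, the 64x128 window is divided into 8x16 = 128 blocks of 8x8 pixels, which gives 128 histograms and 128x127/2 = 8,128 pairwise similarities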

• Furthermore, second order image statistics, especially co-occurrence histograms, are gaining popularity, pushing feature spaces to extremely high dimensions
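
The dimensionality of the CSS descriptor follows directly from comparing every pair of block histograms. The sketch below reproduces that count under the slide's settings (64x128 window, 8x8 blocks, 3x3x3 = 27-bin histograms); the histograms themselves are uniform placeholders, and the function names are assumptions for this example.

```python
from itertools import combinations
from typing import List


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity of two histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def css_descriptor(block_histograms: List[List[float]]) -> List[float]:
    """One similarity value per unordered pair of blocks."""
    return [histogram_intersection(a, b) for a, b in combinations(block_histograms, 2)]


# 64x128 detection window split into 8x8-pixel blocks -> 8 * 16 = 128 blocks.
n_blocks = (64 // 8) * (128 // 8)
# Placeholder data: one uniform 27-bin (3x3x3) histogram per block.
placeholder_histograms = [[1.0 / 27] * 27 for _ in range(n_blocks)]
features = css_descriptor(placeholder_histograms)
```

With real data each block histogram would come from the HSV pixels of that block; the pairwise structure and the resulting 8,128 dimensions stay the same.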

20

Classifiers

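• SVMs
  – Linear SVM
  – Histogram Intersection Kernel SVM (HIKSVM)

• MPLBoost: Multiple Pose Boosting (in ECCV08 workshop)
  – The initial training samples are split into K subsets and K strong classifiers are trained simultaneously; the classifier output is the maximum of the K classifier responses
  – During training, only samples misclassified by all strong classifiers keep their weight; otherwise the sample weight is reduced
  – During detection, a scanning window is positive if any one strong classifier labels it positive, and negative only if all strong classifiers label it negative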
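
The detection-time rule of MPLBoost reduces to taking the maximum over the K strong classifiers. The sketch below shows only that rule, with two made-up scoring functions standing in for trained strong classifiers; the names and the threshold of 0 are assumptions, and the training procedure is not shown.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], float]


def mpl_score(features: Sequence[float], strong_classifiers: Sequence[Classifier]) -> float:
    """Combined response: the maximum over the K strong classifier responses."""
    return max(clf(features) for clf in strong_classifiers)


def is_positive(features: Sequence[float],
                strong_classifiers: Sequence[Classifier],
                threshold: float = 0.0) -> bool:
    """Positive if at least one strong classifier scores above the threshold."""
    return mpl_score(features, strong_classifiers) > threshold


# Hypothetical stand-ins for two pose-specific strong classifiers.
def frontal(f: Sequence[float]) -> float:
    return f[0] - 0.5


def side(f: Sequence[float]) -> float:
    return f[1] - 0.5


classifiers = [frontal, side]
```

Taking the maximum is equivalent to the "any positive means positive" rule on the slide, since the maximum exceeds the threshold exactly when at least one response does.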

21

Evaluation protocol (1/4)

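• Shortcomings of the evaluation criteria for people detection systems
  – Whether “a detection window hits a person” is currently decided by the VOC criterion: intersection over union > 50%
  – There is no explicit rule for how to match people and detection boxes in crowds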
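
The VOC overlap criterion, and the ambiguity it leaves in crowds, can be made concrete with a small example. Boxes are assumed to be `(x1, y1, x2, y2)` tuples; the coordinates below are invented to show one detection that satisfies the criterion for two overlapping people at once.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_hit(detection: Box, annotation: Box, threshold: float = 0.5) -> bool:
    """VOC criterion: a detection hits an annotation if IoU exceeds 50%."""
    return iou(detection, annotation) > threshold


# Two people standing close together, and one detection between them.
person_a = (0.0, 0.0, 10.0, 20.0)
person_b = (3.0, 0.0, 13.0, 20.0)
detection = (1.5, 0.0, 11.5, 20.0)
```

Here the detection overlaps both annotations by the same amount, so the criterion alone does not say which person it should be assigned to.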

22

Evaluation protocol (2/4)

• We split the set of annotations and detections into considered and ignored sets

• Annotations can fall into the ignored set because of size, position, occlusion level, aspect ratio or non-pedestrian label in the Caltech setting

• Detections can fall into the ignored set because of size. E.g. if we wish to evaluate on 50-pixel-or-taller, unoccluded pedestrians, any annotation labeled as occluded and any annotation or detection <50 pixels falls in the ignored set

23

Evaluation protocol (3/4)

• For considered detections
  – If they match a considered annotation they count as true positive
  – If they match no annotation, or only one that has already been matched to another detection, they count as false positive
  – If they match an ignored annotation they are discarded

• For ignored detections
  – If an ignored detection matches an ignored annotation, it should be discarded
  – If an ignored detection matches no annotation, it seems reasonable to discard it, but this may introduce a bias
  – If an ignored detection matches a considered annotation, count it as a true positive
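
One possible reading of these rules as a greedy matching procedure is sketched below. Several details are assumptions not fixed by the slides: detections are processed in descending score order, a free considered annotation is preferred over an ignored one, ignored annotations may absorb any number of detections, and an ignored detection that only overlaps an already matched annotation is discarded. The `Box` class and `evaluate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    ignored: bool = False  # True if the box belongs to the ignored set
    score: float = 0.0     # detector confidence (unused for annotations)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def evaluate(detections: List[Box], annotations: List[Box],
             threshold: float = 0.5) -> Tuple[int, int, int]:
    """Return (true_positives, false_positives, discarded)."""
    tp = fp = discarded = 0
    matched: Set[int] = set()  # considered annotations already claimed by a detection
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        hits = [(iou(det, ann), i) for i, ann in enumerate(annotations)]
        hits = [(o, i) for o, i in hits if o > threshold]
        free = [(o, i) for o, i in hits if not annotations[i].ignored and i not in matched]
        hits_ignored = any(annotations[i].ignored for _, i in hits)
        if free:
            matched.add(max(free)[1])  # claim the best-overlapping free annotation
            tp += 1
        elif hits_ignored or det.ignored:
            discarded += 1             # neither rewarded nor penalized
        else:
            fp += 1                    # no match, or only an already matched annotation
    return tp, fp, discarded


annotations = [Box(0, 0, 10, 20), Box(100, 0, 110, 20, ignored=True)]
detections = [
    Box(0, 0, 10, 20, score=0.9),                 # matches the considered annotation
    Box(1, 0, 11, 20, score=0.8),                 # duplicate on the same person
    Box(100, 0, 110, 20, score=0.7),              # matches the ignored annotation
    Box(200, 0, 205, 10, ignored=True, score=0.6),  # too small, matches nothing
    Box(300, 0, 310, 20, score=0.5),              # matches nothing
]
result = evaluate(detections, annotations)
```

Changing any of the assumed tie-breaking choices can change the counts, which is the kind of protocol detail the slides argue must be published alongside results.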

24

Evaluation protocol (4/4)

• To summarize, there is no single correct way to evaluate on a subset of annotations, and all choices have undesirable side effects

• It is therefore imperative that published results are accompanied by detections, and that evaluation scripts are made public

• As there are boundary effects in almost any setting (all realistic datasets have a minimum annotation size), it must be possible for others to verify that differences are not artifacts of the evaluation

25

Outline


26

Database

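• INRIA Person dataset

• Caltech Pedestrians dataset
  – Proposed by Dollar in 2009
  – Video sequences
  – The training set contains 192k people, the test set 155k people
  – Covers many difficult cases: illumination, occlusion, small scales (down to people only 3 pixels tall), crowds…
  – Very thorough annotation, which makes it easy to test various properties of a detector

• TUD-Brussels dataset
  – Proposed by Wojek in 2009
  – Video sequences
  – Test set only, containing 1,326 people at various scales and viewpoints

• In all experiments the training samples have a uniform size of 64x128, with the person scaled to 48x96 and aligned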

27

Experiment1 – HOG-LBP (1/1)

• However, while we were able to reproduce their good results on INRIA Person, we could not gain anything with LBPs on other datasets. They seem to be affected when imaging conditions change (in our case, we suspect demosaicing artifacts to be the issue)

Figure panels: INRIA, TUD

28

Experiment2 – Color information (1/2)

• More than 1 false positive per image (fppi) is usually not acceptable in any practical application

• Self-similarity of colors is more appropriate than using the underlying color histograms directly as feature

• On the contrary, adding the color histogram values directly even hurts the performance of HOG

Figure panels: TUD, TUD

29

Experiment2 – Color information (2/2)

• Why is CSS effective?
  – Self-similarity encodes relevant parts like clothing and visible skin regions

• Why does using color information directly show no improvement?
  – The training data was recorded with a different camera and in different lighting conditions than the test data, so that the weights learned for color do not generalize from one to the other. (The same reasoning applies to Haar features.)

30

Experiment3 – Bootstrap (1/2)

• With fewer than two bootstrapping rounds, performance depends heavily on the initial training set

• At least two retraining rounds are required in the HOG + linear SVM framework

• Using more initial negative samples alleviates this problem but does not solve it
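
The bootstrapping loop itself (train, scan a negative pool for false positives, add them as hard negatives, retrain) can be sketched with a deliberately trivial classifier. Everything here is a toy assumption: features are one-dimensional, "training" places a threshold halfway between the class means instead of fitting an SVM, and the data are synthetic Gaussian samples.

```python
import random
from typing import List


def train_threshold(pos: List[float], neg: List[float]) -> float:
    """Toy stand-in for classifier training: threshold halfway between the class means."""
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def bootstrap(pos: List[float], negative_pool: List[float],
              n_initial: int, rounds: int, seed: int) -> List[float]:
    """Return the threshold after the initial training and after each bootstrapping round."""
    rng = random.Random(seed)
    # Random initial negatives: this is the seed-dependent part.
    chosen = set(rng.sample(range(len(negative_pool)), n_initial))
    thresholds = []
    for _ in range(rounds + 1):
        threshold = train_threshold(pos, [negative_pool[i] for i in chosen])
        thresholds.append(threshold)
        # Scan the whole pool; every false positive becomes a training negative.
        chosen |= {i for i, x in enumerate(negative_pool) if x > threshold}
    return thresholds


data_rng = random.Random(0)
positives = [data_rng.gauss(3.0, 1.0) for _ in range(200)]
negative_pool = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
# Same data, five different random selections of 20 initial negatives.
runs = [bootstrap(positives, negative_pool, n_initial=20, rounds=3, seed=s) for s in range(5)]
```

Comparing `runs[s][0]` across seeds with `runs[s][-1]` gives a feel for how retraining on hard negatives affects the dependence on the initial random selection in this toy setting; it says nothing quantitative about the real detectors.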

31

Experiment3 – Bootstrap (2/2)

• For boosting classifiers (Fig. 3(c)), the situation is worse: although mean performance seems stable over bootstrapping rounds, the overall variance only decreases slowly. The initial selection of negative samples has a high influence on the final performance even after 3 bootstrapping rounds

32

Experiment4 – Seed & self-similarity (1/1)

• Self-similarity on HOG blocks shows little improvement

• It is important to make sure the result does not depend on the initial selection of negative samples, e.g. by retraining enough rounds with SVMs

Figure panel: TUD

33

Experiment5 – CalTech pedestrian (1/2)

34

Experiment5 – CalTech pedestrian (2/2)

• Color self-similarity is indeed complementary to gradient information

• The motion information contributes greatly to pedestrian detection. The reason that HOF works so well on the “near” scale is probably that during multi-scale flow estimation compression artifacts are less visible at higher pyramid levels, so that the flow field is more accurate for larger people

• The performance of all evaluated algorithms is abysmal under heavy occlusion

35

Experiment6 – Haar feature (1/1)

• Judging from the available research our feeling is that Haar features can potentially harm more than they help

Figure panel: TUD

36

Outline


37

Conclusion

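• Main conclusions
  – Motion information greatly helps people detection in video (HOF)
  – Color self-similarity substantially improves the performance of people detectors (CSS)
  – Bootstrapping plays a key role in detector training
  – The current evaluation criteria for object detection are inadequate…

• Secondary conclusions
  – LBP is effective only on the INRIA dataset
  – HOG + linear SVM needs at least 2 bootstrapping rounds
  – Using Haar features to assist people detection may do more harm than good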

38

Thanks!!
