integral channel features - github pages · we refer to as integral channel features, ... vj-opencv...

1
Integral Channel Features Piotr Dollár 1 [email protected] Zhuowen Tu 2 [email protected] Pietro Perona 1 [email protected] Serge Belongie 3 [email protected] 1 Dept. of Electrical Engineering California Institute of Technology 2 Lab of Neuro Imaging University of CA, Los Angeles 3 Dept. of Computer Science and Eng. University of California, San Diego Figure 1: Multiple registered image channels are computed using various trans- formations of the input image; next, features such as local sums, histograms, and Haar wavelets are computed efficiently using integral images. Such features, which we refer to as integral channel features, naturally integrate heterogeneous sources of information, have few parameters, and result in fast, accurate detectors. We study the performance of ‘integral channel features’ for image clas- sification tasks, focusing in particular on pedestrian detection. The gen- eral idea behind integral channel features is that multiple registered image channels are computed using linear and non-linear transformations of the input image [6], and then features such as local sums, histograms, and Haar features and their various generalizations are efficiently computed using integral images [8]. Such features have been used in recent litera- ture for a variety of tasks – indeed, variations appear to have been invented independently multiple times. Although integral channel features have proven effective, little effort has been devoted to analyzing or optimizing the features themselves. In this work we present a unified view of the relevant work in this area and perform a detailed experimental evaluation. We demonstrate that when designed properly, integral channel features not only outperform other features including histogram of oriented gradi- ent (HOG), they also (1) naturally integrate heterogeneous sources of in- formation, (2) have few parameters and are insensitive to exact parameter settings, (3) allow for more accurate spatial localization during detection, and (4) result in fast detectors when coupled with cascade classifiers. Figure 2: Examples of integral channel features: (a) A first-order feature is the sum of pixels in a rectangular region. (b) A Haar-like feature is a second-order feature approximating a local derivative [8]. (c) Generalized Haar features include more complex combinations of weighted rectangles. (d) Histograms can be com- puted by evaluating local sums on quantized images [7]. We show significantly improved results over previous applications of similar features to pedestrian detection [3]. In fact, full-image evalua- tion on the INRIA pedestrian dataset shows that learning using standard boosting coupled with our optimized integral channel features matches or outperforms all but one other method [5], including state of the art ap- proaches obtained using HOG [2] features with more sophisticated learn- 10 -2 10 -1 10 0 10 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 false positives per image miss rate VJ (0.48) VJ-OpenCv (0.53) HOG (0.23) FtrMine (0.34) Shapelet (0.50) Shapelet-orig (0.90) PoseInv (0.51) PoseInvSvm (0.69) MultiFtr (0.16) HikSvm (0.24) LatSvm-V1 (0.17) LatSvm-V2 (0.09) ChnFtrs (0.14) 10 -2 10 -1 10 0 10 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 false positives per image miss rate VJ (0.64) VJ-OpenCv (0.67) HOG (0.29) FtrMine (0.67) Shapelet (0.67) Shapelet-orig (0.99) PoseInv (0.75) PoseInvSvm (0.86) MultiFtr (0.31) HikSvm (0.39) LatSvm-V1 (0.36) LatSvm-V2 (0.37) ChnFtrs d=2 (0.21) ChnFtrs d=4 (0.24) ChnFtrs d=8 (0.25) Figure 3: Left: INRIA Results. Right: Localization Accuracy. ing techniques. On the task of accurate localization in the INRIA dataset, the proposed method outperforms state of the art by a large margin. Figure 4: Top: Example image and computed channels. Bottom: Rough visu- alization of spatial support of trained classifier for all channels jointly (left) and separately for each channel type, obtained by averaging the rectangle masks of selected features. Peaks in different channels are highlighted. In addition to the large gains in performance, we describe a number of optimizations that allow us to compute effective channels that take about .05-.2s per 640 × 480 image depending on the options selected. For 320 × 240 images, the channels can be computed in real time at rates of 20-80 frames per second on a standard PC. Our overall detection system has a runtime of about 2s for multiscale pedestrian detection in a 640 × 480 image, the fastest of all methods surveyed in [4]. Finally, we show results on the recently introduced Caltech Pedes- trian Dataset [1, 4] which contains almost half a million labeled bounding boxes and annotated occlusion information. Results for 50-pixel or taller, unoccluded or partially occluded pedestrians are shown in Fig. 5. ChnFtrs significantly outperforms all other methods, achieving a detection rate of almost 60% at 1 fppi, compared to 50% for competing methods. 10 -2 10 -1 10 0 10 1 10 2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 false positives per image miss rate VJ (0.87) HOG (0.50) FtrMine (0.59) Shapelet (0.82) PoseInv (0.65) MultiFtr (0.57) HikSvm (0.79) LatSvm-V1 (0.67) LatSvm-V2 (0.51) ChnFtrs (0.42) Figure 5: Results on the Caltech Pedestrian Dataset[1, 4] [1] www.vision.caltech.edu/Image_Datasets/ CaltechPedestrians/. [2] N. Dalal and B. Triggs. Histogram of oriented gradient for human detection. In CVPR, 2005. [3] P. Dollár, B. Babenko, S. Belongie, P. Perona, and Z. Tu. Multiple component learning for object detection. In ECCV, 2008. [4] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: A benchmark. In CVPR, 2009. [5] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008. [6] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. Journal of the Optical Society of America A, 7: 923–932, May 1990. [7] F. Porikli. Integral histogram: A fast way to extract histograms in cartesian spaces. In CVPR, 2005. [8] P. Viola and M. Jones. Robust real-time object detection. IJCV, 57 (2):137–154, 2004.

Upload: vuthien

Post on 30-Sep-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

Integral Channel Features

Piotr Dollár1

[email protected]

Zhuowen Tu2

[email protected]

Pietro Perona1

[email protected]

Serge Belongie3

[email protected]

1 Dept. of Electrical EngineeringCalifornia Institute of Technology

2 Lab of Neuro ImagingUniversity of CA, Los Angeles

3 Dept. of Computer Science and Eng.University of California, San Diego

Figure 1: Multiple registered image channels are computed using various trans-formations of the input image; next, features such as local sums, histograms, andHaar wavelets are computed efficiently using integral images. Such features, whichwe refer to as integral channel features, naturally integrate heterogeneous sourcesof information, have few parameters, and result in fast, accurate detectors.

We study the performance of ‘integral channel features’ for image clas-sification tasks, focusing in particular on pedestrian detection. The gen-eral idea behind integral channel features is that multiple registered imagechannels are computed using linear and non-linear transformations of theinput image [6], and then features such as local sums, histograms, andHaar features and their various generalizations are efficiently computedusing integral images [8]. Such features have been used in recent litera-ture for a variety of tasks – indeed, variations appear to have been inventedindependently multiple times. Although integral channel features haveproven effective, little effort has been devoted to analyzing or optimizingthe features themselves. In this work we present a unified view of therelevant work in this area and perform a detailed experimental evaluation.We demonstrate that when designed properly, integral channel featuresnot only outperform other features including histogram of oriented gradi-ent (HOG), they also (1) naturally integrate heterogeneous sources of in-formation, (2) have few parameters and are insensitive to exact parametersettings, (3) allow for more accurate spatial localization during detection,and (4) result in fast detectors when coupled with cascade classifiers.

Figure 2: Examples of integral channel features: (a) A first-order feature is thesum of pixels in a rectangular region. (b) A Haar-like feature is a second-orderfeature approximating a local derivative [8]. (c) Generalized Haar features includemore complex combinations of weighted rectangles. (d) Histograms can be com-puted by evaluating local sums on quantized images [7].

We show significantly improved results over previous applications ofsimilar features to pedestrian detection [3]. In fact, full-image evalua-tion on the INRIA pedestrian dataset shows that learning using standardboosting coupled with our optimized integral channel features matches oroutperforms all but one other method [5], including state of the art ap-proaches obtained using HOG [2] features with more sophisticated learn-

10−2

10−1

100

101

0.1

0.2

0.3

0.4

0.5

0.6

0.70.80.9

1

false positives per image

mis

s r

ate

VJ (0.48)

VJ−OpenCv (0.53)

HOG (0.23)

FtrMine (0.34)

Shapelet (0.50)

Shapelet−orig (0.90)

PoseInv (0.51)

PoseInvSvm (0.69)

MultiFtr (0.16)

HikSvm (0.24)

LatSvm−V1 (0.17)

LatSvm−V2 (0.09)

ChnFtrs (0.14)

10−2

10−1

100

101

0.1

0.2

0.3

0.4

0.5

0.6

0.70.80.9

1

false positives per image

mis

s r

ate

VJ (0.64)

VJ−OpenCv (0.67)

HOG (0.29)

FtrMine (0.67)

Shapelet (0.67)

Shapelet−orig (0.99)

PoseInv (0.75)

PoseInvSvm (0.86)

MultiFtr (0.31)

HikSvm (0.39)

LatSvm−V1 (0.36)

LatSvm−V2 (0.37)

ChnFtrs d=2 (0.21)

ChnFtrs d=4 (0.24)

ChnFtrs d=8 (0.25)

Figure 3: Left: INRIA Results. Right: Localization Accuracy.

ing techniques. On the task of accurate localization in the INRIA dataset,the proposed method outperforms state of the art by a large margin.

Figure 4: Top: Example image and computed channels. Bottom: Rough visu-alization of spatial support of trained classifier for all channels jointly (left) andseparately for each channel type, obtained by averaging the rectangle masks ofselected features. Peaks in different channels are highlighted.

In addition to the large gains in performance, we describe a number ofoptimizations that allow us to compute effective channels that take about.05-.2s per 640×480 image depending on the options selected. For 320×240 images, the channels can be computed in real time at rates of 20-80frames per second on a standard PC. Our overall detection system hasa runtime of about 2s for multiscale pedestrian detection in a 640× 480image, the fastest of all methods surveyed in [4].

Finally, we show results on the recently introduced Caltech Pedes-trian Dataset [1, 4] which contains almost half a million labeled boundingboxes and annotated occlusion information. Results for 50-pixel or taller,unoccluded or partially occluded pedestrians are shown in Fig. 5. ChnFtrssignificantly outperforms all other methods, achieving a detection rate ofalmost 60% at 1 fppi, compared to 50% for competing methods.

10−2

10−1

100

101

102

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

false positives per image

mis

s r

ate

VJ (0.87)

HOG (0.50)

FtrMine (0.59)

Shapelet (0.82)

PoseInv (0.65)

MultiFtr (0.57)

HikSvm (0.79)

LatSvm−V1 (0.67)

LatSvm−V2 (0.51)

ChnFtrs (0.42)

Figure 5: Results on the Caltech Pedestrian Dataset[1, 4]

[1] www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/.

[2] N. Dalal and B. Triggs. Histogram of oriented gradient for humandetection. In CVPR, 2005.

[3] P. Dollár, B. Babenko, S. Belongie, P. Perona, and Z. Tu. Multiplecomponent learning for object detection. In ECCV, 2008.

[4] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection:A benchmark. In CVPR, 2009.

[5] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminativelytrained, multiscale, deformable part model. In CVPR, 2008.

[6] J. Malik and P. Perona. Preattentive texture discrimination with earlyvision mechanisms. Journal of the Optical Society of America A, 7:923–932, May 1990.

[7] F. Porikli. Integral histogram: A fast way to extract histograms incartesian spaces. In CVPR, 2005.

[8] P. Viola and M. Jones. Robust real-time object detection. IJCV, 57(2):137–154, 2004.