Pedestrian Detection Algorithm for On-board Cameras of Multi View Angles

S. Kamijo, K. Fujimura, and Y. Shibayama


Abstract— In this paper, we present a general algorithm for pedestrian detection by an on-board monocular camera which can be applied to cameras of various view angles in a unified manner. The Spatio-Temporal MRF model extracts and tracks foreground objects, both pedestrians and non-pedestrians, distinguishing them from background scenery such as buildings by referring to motion differences. During the tracking sequences, cascaded HOG classifiers classify the foreground objects into the two classes of pedestrians and non-pedestrians. Before the classification, geometrical constraints on the relationship between the heights and positions of the objects are examined to exclude non-pedestrian objects. This pre-processing reduces the processing time of the classification while maintaining the classification accuracy. Because the classifier can make its decision by considering all Regions of Interest (ROIs) that share the same tracking ID over consecutive images, the algorithm operates quite robustly against noise and classification errors in individual image frames.

I. INTRODUCTION

Recently, from the viewpoint of pedestrian safety, driving assistance systems that prevent traffic accidents involving pedestrians have been actively studied. At first, such systems came into practical use as infrastructure systems [1][2]. However, on roads where installing and maintaining infrastructure is difficult, on-board systems are valuable.

Methods for detecting pedestrians in on-board driving assistance systems include laser sensors, millimeter-wave radar, and cameras. The laser sensor can detect the presence of an object by scanning in the horizontal direction. The millimeter-wave radar also has a problem with spatial resolution. Therefore, it is difficult to distinguish a pedestrian from the other detected objects with these sensors.

On the other hand, on-board camera systems can detect pedestrians visually. These systems utilize monocular, stereo, or infrared cameras. With a stereo camera, it is difficult to capture the lateral movement of a pedestrian, because distortion arises in the image depending on the view angle of the camera. An infrared camera is difficult to use in the daytime. Pedestrian detection by an on-board camera is also one of the most difficult problems, because the camera moves arbitrarily and the background image is highly complicated.

The existing pedestrian detection techniques can be classified into two groups: texture based and motion based.

Manuscript received January 18, 2010. This work was funded in part by STARC.

S. Kamijo, K. Fujimura, and Y. Shibayama are with the Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan (phone: +81-3-5452-6273; e-mail: [email protected]).

Texture-based approaches utilize appearance features extracted using Haar wavelets [3], edge templates [4], histograms of oriented gradients [5], etc. Papageorgiou and Poggio [3] and Oren et al. [6] utilized extracted Haar-wavelet features and an SVM classifier to validate candidate regions from static image frames. Gavrila et al. [7] and Gavrila and Munder [4] detected Regions of Interest (ROIs) based on pedestrian edge-template matching, followed by a verification stage based on a neural network architecture. Munder et al. [8] described a multi-cue (i.e., shape, texture, depth) object model within a Bayesian framework for detection and tracking based on particle filtering. Shashua et al. [9] presented a system which breaks ROIs into sub-regions and feeds the processed sub-region vectors to an AdaBoost classifier for verification. A similar approach was presented by Dalal and Triggs [5], which uses the fact that the shape of an object can be represented by a distribution of local intensity gradients or edge directions. For classification, an SVM is trained using gradient-histogram features from the pedestrian and non-pedestrian classes.

On the other hand, motion-based techniques rely on short-term motion obtained by estimating optical flow. Cutler and Davis [10] focused on the periodic motion pattern of human walking as the main cue for pedestrian detection. Sidenbladh's [11] technique was based on collecting examples of human and non-human motion patterns and training an SVM with an RBF kernel to create a human classifier. Viola et al. [12] presented a detection algorithm which combines motion and appearance information to build a robust model of walking humans. Curio et al. [13] detected walking pedestrians at road intersections. Their initial detection process is based on a fusion of texture analysis, model-based matching of pedestrian contour features, and inverse-perspective mapping (binocular vision). Additionally, motion patterns of limb movements are analyzed to distinguish pedestrians from other objects. Elzein et al. [14] detected ROIs by computing optical flow only in regions selected by frame differencing, and the selected ROIs are searched to find pedestrians using manually selected Haar-wavelet features. Although the research mentioned above looks promising to some extent, more research is needed before driving assistance systems of this kind can serve as life-saving tools in moving vehicles.

In this paper, we present an inter-layer collaborative pedestrian tracking algorithm in which the initial foreground segmentation is done by a motion-based object detection algorithm. A cascade structure of rejection-style classifiers introduced by Viola et al. [12] is utilized to separate pedestrian and non-pedestrian objects.


Finally, tracking based on the Spatio-Temporal Markov Random Field model (S-T MRF) [15] is performed to track the pedestrians.

II. SYSTEM OVERVIEW

A. Target of Our System

Our system targets pedestrian protection on open roads, including intersections. The required detection conditions differ between driving straight and turning. If the vehicle goes straight at high speed, the system should detect pedestrians as far ahead as possible with a narrow view angle. In contrast, when the vehicle turns at an intersection, the system needs to detect nearby pedestrians around the vehicle with a wide view angle. For a given on-board camera specification, the detection view angle and distance are in a trade-off relationship. Our proposed system aims at constructing a framework in which this trade-off can be selected. In the following experiments, our system is verified with on-board cameras of various view angles.

B. Strategy Overview

Existing texture-based pedestrian detection techniques have a problem in the extraction of ROIs, and false detections occur easily if the entire image is searched. Therefore, we employed a motion-based method for object detection which focuses on the difference in motion between foreground objects and the on-board background image. Image regions of detected foreground compose ROIs, which indicate candidate pedestrians. Following the ROI detection process, texture patterns in the ROIs are analyzed to distinguish pedestrians from other objects such as poles, trees, bushes, and other foreground facilities. Walking pedestrians can be segmented because their motion differs from the motion of background infrastructure such as buildings. At the same time, foreground facilities such as poles, trees, and signboards are detected together with pedestrians. Moreover, a person standing close to a wall cannot be detected by the motion detection. However, this is not a problem from the viewpoint of driving assistance systems, because such a case is considered not to require driver support. Object classification by texture-based techniques follows the motion-based object detection. As a result, pedestrians can be detected with high accuracy compared with conventional techniques. The motion-based object detection method and the object classification method are described in the following sections.

C. Motion Based Object Detection and Tracking

In general, the motion of the on-board background image is formulated by considering the motion of the on-board camera itself. In this paper, we assume that the motion of the on-board background image can be approximated by a linear formula along the horizontal axis, because the on-board camera moves horizontally and the motion of the background image varies depending on the distance between the camera and the background infrastructure. The details of this method are explained in [16].

Figure.1: Results of motion-based object detection. (a) Objects detected by motion difference. (b) Object tracking.
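As a rough illustration of this motion-difference principle (not the implementation of [16]), the sketch below fits a linear horizontal background-motion model to dense optical flow and flags image blocks that deviate from it. The function name, block size, and threshold are our assumptions.

```python
import cv2
import numpy as np

def foreground_blocks(prev_gray, curr_gray, block=8, thresh=1.5):
    """Flag blocks whose horizontal motion deviates from a linear
    background model u(x) = a*x + b fitted over the whole frame."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    u = flow[..., 0]                          # horizontal flow component
    xs = np.tile(np.arange(w, dtype=np.float64), h)
    a, b = np.polyfit(xs, u.ravel(), 1)       # linear model along x
    residual = np.abs(u - (a * np.arange(w)[None, :] + b))
    bh, bw = h // block, w // block           # average residual per block
    r = residual[:bh * block, :bw * block].reshape(bh, block, bw, block)
    return r.mean(axis=(1, 3)) > thresh       # True = candidate foreground
```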

Segmentation of the object region in the spatio-temporal image is equivalent to tracking the object through occlusions (see Fig. 7). This is the principal idea of the S-T MRF model. We defined the S-T MRF model [15] so as to divide an image into blocks as groups of pixels, and to optimize the labeling of these blocks by referring to texture and labeling correlations among them, in combination with their motion vectors. Combined with a stochastic relaxation method, the S-T MRF optimizes object boundaries precisely, even when serious occlusions occur. A detailed explanation of the S-T MRF tracker can be found in [15].

III. GEOMETRICAL CONSTRAINT ON ROI

A. Geometry Estimation of Objects

Since the motion-based algorithm detects ROIs of both pedestrians and non-pedestrians, the ROIs must be classified into the two classes of pedestrians and non-pedestrians. In order to reduce the processing time of the pattern classification, we apply geometrical constraints to the ROIs beforehand to exclude ROIs which have little likelihood of being pedestrians according to the relationship between their positions and heights in the image. Camera calibration was performed as follows to determine the relationship between positions and heights, in pixel units, in the image.


Figure.2: Top view and profile plane of the camera geometry. (a) Top view of the camera geometry. (b) Profile view of the camera.

Figure.2 shows the conditions of our camera setup. In this paper, on-board cameras are mounted so that their optical axes are horizontal. Figure.2(a) shows the top view of the camera setup, and Figure.2(b) shows the profile of the camera setup obtained along the plane indicated in Figure.2(a). Although the length "fφ" appears to vary as "φ" in Figure.2(a) varies, this difference in "fφ" is canceled by the lens design, so that "fφ" can be regarded as the constant focal length "f". Thus, the relationship between the distance from the camera to an object and the height of the object in the image space can be represented as Eq.(1). In Eq.(1), the distance between the object and the camera is represented by "D [meters]", and the height in the image corresponding to 1 meter in real-world coordinates is represented by "hp [pixels/meter]". The relationship between "D" and "hp" is depicted in Figure.3(a), where the four curves correspond to camera view angles of 30, 60, 75, and 100 degrees.

In Figure.2, "[X,Y]" represents the coordinates of the object in the real world on the road plane, and "[x,y]" represents the coordinates of the object in the image space. "[x,y]" is measured at the bottom of the ROI, and the position "[x,y]" is transformed into the position "[X,Y]" in the real world by Eq.(2). Thus, the distance "D" in the real world can be derived from the position "[x,y]" in the image space as Eq.(3), and the relationship between "[x,y]" and "D" is visualized in Figure.3(b). The difference in curvature among the four graphs in Figure.3(b) comes from the four different values of "f" corresponding to the view angles.

Finally, combining Eq.(1) and Eq.(3) provides the relationship between "[x,y]" and "hp" as represented in Eq.(4), and the height "H" of the object in the real world is obtained by Eq.(5), where "h" represents the height of the ROI in the image space. Consequently, once the position "[x,y]" and the height "h" of an ROI are obtained, the height "H" in the real world can be estimated by the above geometrical consideration.

Figure.3: Geometrical estimation of objects. (a) Relationship between hp [pixels/meter] and D [meters] for view angles of 30, 60, 75, and 100 degrees. (b) Relationship between the distance D [meters] from the camera and the position (x, y) [pixels].

In the following, $I_h$ and $I_w$ denote the image height and width in pixels, $F$ and $F_w$ the physical height and width of the imager, $h_c$ the height of the camera above the road plane, and $\mathit{Hor}$ the image row of the horizon.

$$h_p = \frac{I_h\, f}{F\, D} \qquad (1)$$

$$Y = \frac{h_c\, f\, I_h}{F\, |\mathit{Hor} - y|}, \qquad X = \frac{F_w\,(x - I_w/2)}{f\, I_w}\, Y \qquad (2)$$

$$D = \sqrt{X^2 + Y^2} = \frac{h_c\, f\, I_h}{F\, |\mathit{Hor} - y|}\,\sqrt{1 + \left(\frac{F_w\,(x - I_w/2)}{f\, I_w}\right)^2} \qquad (3)$$

$$h_p = \frac{|\mathit{Hor} - y|}{h_c\, \sqrt{1 + \left(\frac{F_w\,(x - I_w/2)}{f\, I_w}\right)^2}} \qquad (4)$$

$$H = \frac{h}{h_p} \qquad (5)$$
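Since Eqs.(1)-(5) above are reconstructed from a garbled extraction, the following is a minimal sketch of the height estimation under that pinhole-geometry reading. The parameter names (f_pix, cam_height, horizon) and the assumption of square pixels are ours, not calibrated values from the paper.

```python
import math

def estimate_real_height(x, y, h, f_pix, cam_height, horizon, img_w):
    """Estimate the real-world height H [m] of an ROI (Eqs.(1)-(5)).

    Assumed parameters (not the paper's calibration):
      f_pix      -- focal length in pixels, f * I_h / F
      cam_height -- camera height h_c above the road plane [m]
      horizon    -- image row Hor of the horizon [pixels]
      (x, y)     -- bottom-center of the ROI [pixels]
      h          -- height of the ROI [pixels]
    """
    # Eq.(2): back-project the ROI foot point onto the road plane.
    Y = cam_height * f_pix / abs(horizon - y)   # forward distance [m]
    X = Y * (x - img_w / 2.0) / f_pix           # lateral offset [m]
    # Eq.(3): distance from the camera to the object.
    D = math.hypot(X, Y)
    # Eq.(1): image pixels per meter of object height at distance D.
    hp = f_pix / D
    # Eq.(5): real-world height of the object.
    return h / hp
```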

B. Calibration of the Estimated Height of Pedestrians

As derived in the above geometrical consideration, once the position "[x,y]" and the height "h" of an ROI are obtained, the height "H" in the real world can be estimated. Then, ROIs whose estimated height "H" is above or below certain thresholds should be excluded from the pedestrian candidates. In order to determine the upper and lower thresholds for excluding ROIs, the deviation of "H" among the ROIs of pedestrians should be examined.

Roll of the vehicle during right and left turns displaces the optical axis of the on-board camera from its horizontal position. This displacement of the optical axis causes deviations in the positions "[x,y]" of the ROIs in the image space. As a result, the estimated distance "D" and "hp" deviate from the true values that would have been obtained if the optical axis were horizontal. "h" also deviates owing to the variation in the heights of pedestrians. Consequently, the estimated pedestrian height "H" deviates because of the deviations of "h" and "hp".

Generally, it is difficult to measure the displacement of the optical axis caused by vehicle roll and to correct "[x,y]", "D", and "hp" in real time while driving. Therefore, we decided to calibrate "H" directly from experiments in order to determine the upper and lower thresholds for pedestrians. This experimental calibration reveals the probable range of "H" considering the deviations of both "h" and "hp" simultaneously.

In this paper, we do not aim at calibrating the internal parameters of the cameras, since deviations arising in camera production would be quite small compared with the deviations of "h" and "hp". In addition, since the purpose of excluding certain ROIs from the pedestrian candidates is to reduce the number of candidates to be classified by the HOG/Fisher algorithm, the thresholds on "H" need not be theoretically exact. Figure.4 shows plots of the relationship between the estimated heights and the estimated distances of pedestrians for the four view angles of 30, 60, 75, and 100 degrees. The plots were extracted from video sequences obtained during practical vehicle driving. From the four graphs in Figure.4, the upper and lower thresholds were determined as 2.2 and 0.9 meters, respectively, for excluding ROIs from the pedestrian candidates.
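As a usage note for the sketch above, these experimentally calibrated thresholds can be applied as a simple filter. The ROI record format here is hypothetical.

```python
def filter_pedestrian_candidates(rois, lower=0.9, upper=2.2):
    """Discard ROIs whose estimated real-world height H falls outside the
    calibrated pedestrian range determined from Figure.4 (0.9-2.2 m).
    Each ROI is a (roi_id, estimated_height_m) pair; format is hypothetical."""
    return [(rid, H) for rid, H in rois if lower <= H <= upper]
```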

Figure.4: Experimental plots of estimated height H [meters] vs. distance D [meters] from the camera, for view angles of 30, 60, 75, and 100 degrees.

IV. CASCADE CLASSIFIER BY HOG FEATURE

A. HOG Feature Analyses of ROIs

Among the variety of algorithms for pedestrian detection, algorithms employing the HOG feature are well known to be quite successful. Combined with a learning machine such as an SVM (Support Vector Machine) or a Fisher linear classifier, it achieves good performance on object classification problems. The following is an overview of the HOG feature extraction process implemented in our system.

The HOG (Histograms of Oriented Gradients) feature represents the spatial distribution of edge directions in a scene. Images of pedestrians, including the various backgrounds behind them, are used as training data for the Fisher classifier. In this paper, training images are scaled to 64 x 128 pixels, and a cell of 8 x 8 pixels is defined for extracting the local HOG feature. The orientation of the gradient at each pixel is estimated by applying an edge operator, and the orientation is quantized into nine bins. The 64 quantized orientations of an 8 x 8 cell are accumulated into a histogram over the nine bins. This histogram is translated into a nine-dimensional vector whose elements represent the nine magnitudes of the histogram.

Gradient strengths vary among images and among locations within an image owing to illumination and foreground-background contrast. In order to cancel such effects, the descriptor vectors are normalized. In this paper, a descriptor block of 16 x 16 pixels consisting of four 8 x 8 pixel cells is defined, and a descriptor vector of 36 dimensions is obtained by concatenating the four nine-dimensional vectors. A sequence of descriptor blocks is obtained by shifting the block by 8 pixels in raster-scan order, and the sequence consists of 7 x 15 descriptor blocks. Each descriptor vector is normalized by its norm to give a normalized descriptor vector. Thus, a vector of 3780 dimensions is obtained for each training image by concatenating the 105 normalized descriptor vectors.

An ROI rectangle is scaled to a height of 128 pixels, whereas the width of the rectangle does not have to be 64 pixels. The scaled ROI is then scanned to locate the correct region of a pedestrian by HOG/Fisher. In this paper, three different scales of training data with heights of 128, 120, and 112 pixels are prepared to train three sets of Fisher classifiers, respectively. The L1-sqrt norm of the original paper [5] was employed for normalization.
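As a sanity check of the dimensionality described above (105 blocks of 36 dimensions giving a 3780-dimensional vector), the same configuration can be reproduced with scikit-image's hog function, which also supports the L1-sqrt block normalization. This is an illustration, not the authors' implementation, and it omits the multi-scale (128/120/112 pixel) training.

```python
import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)   # stand-in for a 64 x 128 scaled ROI
descriptor = hog(window,
                 orientations=9,            # nine orientation bins
                 pixels_per_cell=(8, 8),    # 8 x 8 cells
                 cells_per_block=(2, 2),    # 16 x 16 descriptor blocks
                 block_norm='L1-sqrt')      # normalization from [5]
# (16 x 8 cells -> 15 x 7 block positions) x 36 dims per block = 3780
assert descriptor.shape == (3780,)
```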

B. Cascade of HOG/Fisher Classifiers

Since non-pedestrian data are distributed over various regions of the HOG feature space, it is difficult to separate pedestrian data from non-pedestrian data with a single hyperplane. On the other hand, a hypersurface classifier, such as an SVM with a kernel of higher than quadratic order, can suffer from overfitting. Therefore, we decided to construct a cascade structure connecting classifiers, each consisting of a single hyperplane trained with non-pedestrian learning data of a different category, as shown in Figure.5(a).

In this paper, we employed a Fisher classifier at each step of the cascade because of its simplicity and its performance, which is competitive with a linear SVM. The cascade is constructed by connecting four HOG/Fisher classifiers. At each step of the cascade, data determined to be non-pedestrian are excluded, and the residual data are fed into the classifier at the next step. Consequently, the residual data passing the final step of the cascade are determined to be pedestrians.
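A minimal sketch of this rejection-style cascade follows, using scikit-learn's LinearDiscriminantAnalysis as the Fisher linear classifier. The class and method names are ours; the real system operates on the 3780-dimensional HOG vectors described in Section IV-A.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

class FisherCascade:
    """Each stage is a Fisher linear classifier trained on pedestrians vs.
    one non-pedestrian category (all categories, vertical edges,
    horizontal edges, complicated textures)."""

    def __init__(self):
        self.stages = []

    def add_stage(self, ped_feats, nonped_feats):
        # Label pedestrians 1 and the current non-pedestrian category 0.
        X = np.vstack([ped_feats, nonped_feats])
        y = np.hstack([np.ones(len(ped_feats)), np.zeros(len(nonped_feats))])
        self.stages.append(LinearDiscriminantAnalysis().fit(X, y))

    def is_pedestrian(self, feat):
        # A sample is accepted only if no stage rejects it.
        for clf in self.stages:
            if clf.predict(feat.reshape(1, -1))[0] == 0:
                return False
        return True
```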

Figure.5(b) shows the pedestrian training images for the HOG/Fisher classifier. Of these, 1399 pedestrian images were obtained from the INRIA training database [17], and 1212 pedestrian images were extracted from our original video sequences. The training data for the non-pedestrian classes were extracted from our original video sequences. As a feasibility study, we performed classification experiments using 2663 non-pedestrian images. As a result, many of the images falsely determined to be pedestrians were dominated by the strong edges of vehicles and buildings. We therefore prepared image classes dominated by vertical edges and by horizontal edges to construct the cascade of classifiers.

In this paper, when the ROIs belonging to the same object are classified as a pedestrian in three or more of four consecutive frames, the object is determined to be a pedestrian. Therefore, failures of pedestrian classification in fewer than two frames do not degrade the classification result. Thus, tracking by the S-T MRF model is important for improving the stability of the classification algorithm.
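The per-track temporal vote can be sketched as follows. The decision history, indexed by S-T MRF track ID, is a hypothetical data structure.

```python
def vote_pedestrian(decisions, window=4, required=3):
    """decisions: per-frame HOG/Fisher results (True = pedestrian) for one
    tracked object. Declare a pedestrian when at least `required` of the
    last `window` consecutive frames are classified as pedestrian."""
    recent = decisions[-window:]
    return len(recent) == window and sum(recent) >= required
```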

V. EXPERIMENTAL RESULTS

A. Experimental Results

Cameras were set up at the left side, the right side, and the center of the vehicle. A CCD camera with a 30 degree view angle was employed as the center camera. CCD cameras with 60, 75, and 100 degree view angles were used as the left and right side cameras. In this paper, the center camera is supposed to be used for avoiding frontal collisions with pedestrians crossing in front of the vehicle while it travels at high speed along straight roads. Therefore, we employed a camera with a narrow view angle as the center camera. On the other hand, the left and right side cameras are supposed to be used for avoiding side collisions with pedestrians around the vehicle while it turns at relatively low speed at intersections. Therefore, we employed cameras with wide view angles as the left and right side cameras. Video sequences were acquired at intersections and streets in downtown Tokyo. For the evaluation of pedestrian detection in this paper, 17 scenes including 209 pedestrians were examined. The training data in Figure.5 were extracted from scenes other than these 17 evaluation scenes. The evaluations were performed by the two different procedures represented in Eq.(6) and Eq.(7).

In Eq.(6), N_ped_exist_true_frames represents the number of frames in which pedestrians exist. The count is accumulated for each object and each frame, and each object-frame pair is added to N_ped_exist_true_frames. For example, suppose there are two pedestrians, one existing for 10 frames and the other for 8 frames; then N_ped_exist_true_frames is 18.

Figure.5: Cascade classifier by HOG/Fisher.
(a) Cascade structure.
(b) Training images of pedestrians: INRIA [17] (http://lear.inrialpes.fr/data): 1399 images; KMJ: 1212 images.
(c) Training images of non-pedestrians: all categories: 2663 images; vertical edges: 2421 images; horizontal edges: 3817 images; complicated textures: 1892 images.

N_ped_detect_correct_frames represents the number of frames in which pedestrians are detected correctly. Therefore, "DetectRate_frame" represents the rate at which pedestrians were successfully detected among the existing pedestrians given as the ground truth.

N_ROI_detect_as_ped_frames represents the number of frames in which the algorithm determined an ROI to be a pedestrian. This count is also accumulated for each object and each frame, and each object-frame pair is added to N_ROI_detect_as_ped_frames.

N_ROI_detect_false_frames represents the number of frames in which the algorithm determined an ROI to be a pedestrian although it was actually a non-pedestrian. Therefore, "FalseRate_frame" represents the rate at which the detection results include false detections.

Analogously to Eq.(6), Eq.(7) represents the detection rate and the false alarm rate evaluated per object. In this evaluation, the number of frames is not considered. When a pedestrian is detected in one or more frames of an image sequence, the result is regarded as a successful detection in computing "DetectRate_object". When a non-pedestrian object is detected as a pedestrian in one or more frames of an image sequence, the result is regarded as a false detection in computing "FalseRate_object".

By these definitions, "DetectRate_frame" is a stricter measure of the detection rate than "DetectRate_object", and "FalseRate_object" is a stricter measure of the false alarm rate than "FalseRate_frame".

$$\text{DetectRate\_frame} = \frac{N_\text{ped\_detect\_correct\_frames}}{N_\text{ped\_exist\_true\_frames}}, \qquad \text{FalseRate\_frame} = \frac{N_\text{ROI\_detect\_false\_frames}}{N_\text{ROI\_detect\_as\_ped\_frames}} \qquad (6)$$

$$\text{DetectRate\_object} = \frac{N_\text{ped\_detect\_correct\_objects}}{N_\text{ped\_exist\_true\_objects}}, \qquad \text{FalseRate\_object} = \frac{N_\text{ROI\_detect\_false\_objects}}{N_\text{ROI\_detect\_as\_ped\_objects}} \qquad (7)$$
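A minimal sketch of the frame-based rates of Eq.(6) follows; the object-based rates of Eq.(7) are analogous, with one record per tracked object instead of per object-frame pair. The record format is an assumption.

```python
def frame_rates(records):
    """records: (is_pedestrian_gt, detected_as_pedestrian) per object-frame.
    Returns (DetectRate_frame, FalseRate_frame) as in Eq.(6)."""
    n_exist   = sum(1 for gt, _ in records if gt)                # N_ped_exist_true_frames
    n_correct = sum(1 for gt, det in records if gt and det)      # N_ped_detect_correct_frames
    n_as_ped  = sum(1 for _, det in records if det)              # N_ROI_detect_as_ped_frames
    n_false   = sum(1 for gt, det in records if det and not gt)  # N_ROI_detect_false_frames
    detect_rate = n_correct / n_exist if n_exist else 0.0
    false_rate  = n_false / n_as_ped if n_as_ped else 0.0
    return detect_rate, false_rate
```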

Figure.6 shows example results of pedestrian detection, where C030 denotes the center camera with a 30 degree view angle, L075 the left camera with a 75 degree view angle, R100 the right camera with a 100 degree view angle, and so on. Rectangles represent objects which were detected by motion difference and tracked by the S-T MRF. ROIs bounded by blue rectangles were determined to be pedestrians, and ROIs bounded by red rectangles were determined to be non-pedestrians by the HOG/Fisher classifier.

Figure.6(a) shows a sequence of object tracking by the S-T MRF model and classification by HOG/Fisher. The sequence shows that pedestrians were tracked successfully while the on-board camera was moving, and a pillar was successfully excluded from the pedestrian candidates in Frame 258. Persons traveling on bicycles were also classified as pedestrians, since the HOG/Fisher classifier examines each ROI by scanning the HOG window through the whole area of the ROI, as explained in Section IV-A.

In Figure.6(b) L060, a parked vehicle was determined to be a pedestrian, which represents a classification failure of the HOG/Fisher classifier. The cascade classifier includes two classifiers trained on images dominated by horizontal edges and by vertical edges, in order to exclude vehicles, pillars, buildings, and so on. However, some objects were not excluded. In Figure.6(b) R060, pedestrians standing beside a building were not detected by the motion detector, because they stood quite close to the building and the motion differences were quite small. This case was counted as a detection failure when estimating the detection rate in this paper. However, since people standing beside a building are not in danger in the practical scene, it would not be necessary to support the driver in such a case.

B. Discussion

In our experiments, objects with heights of more than 4 blocks, i.e. 32 pixels, were detected without lowering the performance of the pedestrian detection. Since this "32 pixel height" restriction comes from the resolution of the image, it applies commonly across the camera view angles. From Figure.3(a), pedestrians 1.7 meters tall at this limit are located at distances of 41, 31, 26, and 17 meters from the camera for the view angles of 30, 60, 75, and 100 degrees, respectively. We suppose that a wide-angle camera, such as the one with the 100 degree view angle, should be used to avoid side collisions with pedestrians at intersections. Since the vehicle travels at slow speed in such situations, a detection range of 17 meters would be practically acceptable.
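The relationship between view angle and detection range can be checked with the formula below, which inverts Eq.(1) for the distance at which a 1.7 m pedestrian still spans 32 pixels. The horizontal resolution img_w is an assumed parameter, so the result matches the paper's reported 41/31/26/17 m figures only for the paper's actual (unstated) camera resolutions.

```python
import math

def max_detection_range(view_angle_deg, img_w, ped_height_m=1.7, min_px=32):
    """Distance D at which a pedestrian of ped_height_m still covers
    min_px pixels, assuming a pinhole camera with the given horizontal
    view angle and resolution (both assumptions, not the paper's specs)."""
    f_pix = (img_w / 2.0) / math.tan(math.radians(view_angle_deg) / 2.0)
    return ped_height_m * f_pix / min_px   # from hp = f_pix / D, Eq.(1)
```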

In this paper, people who show no motion difference, as in Figure.6(b) R060-01, have been counted in the results. If such people are excluded, the detection rate rises from 91.39% to 93.30% in the object-based evaluation and from 86.61% to 88.61% in the frame-based evaluation.

TABLE.1: Performance Analyses of the Cascade Classifier

(a) Object-based evaluation (A = 209 objects in which pedestrians appeared)

Stage | Sum (D = B + C) | Pedestrian (B) | Non-ped (C) | Detect rate (B/A) [%] | False alarm rate (C/D) [%]
Motion detection | 646 | 205 | 441 | 98.09 | 68.26
Geometrical constraint on ROI | 378 | 205 | 173 | 98.09 | 45.76
All categories | 319 | 200 | 119 | 95.69 | 37.30
Vertical edges | 255 | 196 | 59 | 93.78 | 23.13
Horizontal edges | 229 | 192 | 37 | 91.87 | 16.16
Complex textures | 227 | 191 | 36 | 91.39 | 15.86

(b) Frame-based evaluation (A = 5883 frames in which pedestrians appeared)

Stage | Sum (D = B + C) | Pedestrian (B) | Non-ped (C) | Detect rate (B/A) [%] | False alarm rate (C/D) [%]
Motion detection | 7558 | 5751 | 1807 | 97.76 | 23.91
Geometrical constraint on ROI | 6960 | 5736 | 1224 | 97.50 | 17.59
All categories | 6748 | 5553 | 1195 | 94.39 | 17.71
Vertical edges | 6464 | 5466 | 998 | 92.91 | 15.44
Horizontal edges | 5853 | 5145 | 708 | 87.46 | 12.10
Complex textures | 5797 | 5095 | 702 | 86.61 | 12.11

VI. CONCLUSION

In this paper, we have developed a general method for pedestrian detection which is applicable to cameras of various view angles. The algorithm was evaluated with cameras of 30 to 100 degree view angles and achieved high accuracy in pedestrian detection. In addition, the algorithm requires only a simple and practical camera calibration based on experiments on pedestrian height. The detection ranges of the algorithm, 41 meters with the 30 degree view angle camera and 17 meters with the 100 degree view angle camera, would be practically acceptable. Thus, a practical system can be designed by selecting the camera specifications suitable to the system, while the algorithm for pedestrian detection can be applied uniformly.

REFERENCES

[1] J. A. Misener, "PATH Investigations in Vehicle-Roadside Cooperation and Safety: A Foundation for Safety and Vehicle-Infrastructure Integration Research," Proceedings of the 9th IEEE Intelligent Transportation Systems Conference (ITSC'06), Toronto, Canada, 2006.

[2] L. Alexander, P. M. Cheng, A. Gorjestani, A. Menon, B. Newstrom, C. Shankwitz, and M. Donath, "The Minnesota Mobile Intersection Surveillance System," Proceedings of the 9th IEEE Intelligent Transportation Systems Conference, Toronto, Canada, 2006.

[3] C. Papageorgiou and T. Poggio, "A trainable system for object detection," International Journal of Computer Vision, vol. 38, no. 1, pp. 15-33, 2000.

[4] D. M. Gavrila and S. Munder, "Multi-cue pedestrian detection and tracking from a moving vehicle," International Journal of Computer Vision, 2007.

[5] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 886-893, 2005.

[6] M. Oren, C. Papageorgiou, P. Sinha, E. Osuna, and T. Poggio, "Pedestrian detection using wavelet templates," Proceedings of IEEE CVPR '97, pp. 193-199, 1997.

[7] D. M. Gavrila, J. Giebel, and S. Munder, "Vision-based pedestrian detection: The PROTECTOR system," Proc. IEEE Intelligent Vehicles Symposium, June 2004.

[8] S. Munder, C. Schnorr, and D. M. Gavrila, "Pedestrian detection and tracking using a mixture of view-based shape-texture models," IEEE Transactions on Intelligent Transportation Systems, vol. 9, no. 2, pp. 333-343, June 2008.

[9] A. Shashua, Y. Gdalyahu, and G. Hayun, "Pedestrian detection for driving assistance systems: single-frame classification and system level performance," IEEE Intelligent Vehicles Symposium, 2004.

[10] R. Cutler and L. Davis, "Robust real-time periodic motion detection: analysis and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 781-796, 2000.

[11] H. Sidenbladh, "Detecting human motion with support vector machines," Proceedings of the 17th International Conference on Pattern Recognition, vol. 2, pp. 188-191, 2004.

[12] P. Viola, M. J. Jones, and D. Snow, "Detecting pedestrians using patterns of motion and appearance," IEEE International Conference on Computer Vision, vol. 2, pp. 734-741, 2003.

[13] C. Curio, J. Edelbrunner, T. Kalinke, C. Tzomakas, and W. von Seelen, "Walking pedestrian recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 1, no. 3, September 2000.

[14] H. Elzein, S. Lakshmanan, and P. Watta, "A motion and shape-based pedestrian detection algorithm," Proceedings of the IEEE Intelligent Vehicles Symposium, 2003.

[15] S. Kamijo and M. Sakauchi, "Simultaneous tracking of pedestrians and vehicles in cluttered images at intersections," 10th World Congress on ITS, Madrid, November 2003, CD-ROM.

[16] B. Sen, K. Fujimura, and S. Kamijo, "Pedestrian detection by on-board camera using collaboration of inter-layer algorithm," ITSC 2009, pp. 588-595, October 2009, St. Louis.

[17] http://lear.inrialpes.fr/data

(a) A sequence of pedestrian detection: Frames 204, 207, 214, 221, 233, and 258 (left camera with a 75 degree view angle: L075).


(b) Results for each camera specification: C030, L060, R060, L075, R075, L100, and R100.

Figure.6: Results of pedestrian detection.