Modified SURF-based Object Tracking System

Miso Jang and Dong-Chul Park


Abstract—A Speeded Up Robust Features (SURF) algorithm is modified and applied to an object tracking problem on fisheye lens images for an automobile safety system. The modified SURF algorithm locates objects in images from the distribution of matching points. The proposed object tracking system aims to achieve more accurate and faster tracking that is robust to variations in color and shape by correcting the problems of the MeanShift algorithm. To evaluate the proposed system, experiments on sets of image sequence data obtained with a fisheye lens are performed and compared with the conventional MeanShift algorithm. The preliminary results show that the modified SURF-based object tracking method can be a valuable alternative to existing MeanShift-based methods in terms of tracking accuracy and speed for fisheye lens images.

Keywords— Object detection, Speeded Up Robust Features, MeanShift, object, Moments.

I. INTRODUCTION

AUTOMATIC DETECTION and tracking of objects is one of the most important topics in the design of security systems. In particular, detecting objects is an essential task for autonomous vehicles. When computer vision is used for this purpose, the problem becomes very challenging because objects differ in appearance and shape [1]-[4]. A simple and powerful approach is to transform the problem into a binary classification problem, in which a sliding window strategy classifies each local region as either containing an object or not. The most important achievements in the field of object detection include gradient-based features with SIFT [5]-[7] and Histogram of Oriented Gradients (HOG) features [8].

Various types of sensors, including various types of camera lenses, are used in different object detection and tracking systems for autonomous vehicles. Among camera lenses, a fisheye lens produces a wide panoramic image with strong visual distortion. This distortion sometimes limits the applicability of fisheye images to image processing and understanding tasks despite the wide angle of view. If, however, the task does not require sophisticated information from image processing and understanding results, the fisheye lens becomes an excellent candidate for producing images with a very wide angle of view [3].

The object tracking method used in this paper combines the SURF algorithm with the MeanShift algorithm [9][10]. MeanShift is a procedure for locating the maxima of a density function given discrete data sampled from that function [5][6], and SURF is a local feature detector and descriptor that can be used for tasks such as object recognition, registration, classification, and 3D reconstruction. The modified SURF algorithm used in this paper utilizes both MeanShift and SURF for accurate object tracking with real-time operation on fisheye lens images, taking advantage of the strengths of each for the object detection and tracking task.

Miso Jang and Dong-Chul Park are with the Department of Electronics Engineering, Myong Ji University, Yong In, Gyeonggi-do, 449-728, South Korea (e-mail: [email protected]).

The rest of this paper is organized as follows. Section 2 gives a brief summary of MeanShift and SURF. Section 3 summarizes an object tracking method based on the SURF algorithm. Experiments and results are given in Section 4. Finally, Section 5 concludes this paper.

II. MEAN SHIFT AND SURF ALGORITHMS

MeanShift is a procedure for finding the maxima of a density function given discrete data samples [5][6]. MeanShift treats the data space as a probability density function: if the input is a set of points, MeanShift considers them as sampled from the underlying probability density function, and dense regions correspond to the local maxima of that function. When MeanShift is used for object tracking, a confidence map based on the color information of the current and previous image frames is introduced. The probability information from the color pixels of the target object in the previous frame allows us to find the most probable location near the target object's previous location. Beginning with an initial estimate of the target object, the MeanShift algorithm finds the location of the target object iteratively.
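The iterative search described above can be illustrated with a minimal sketch, assuming a Gaussian kernel and synthetic 2-D samples. The function name `mean_shift_mode`, the `bandwidth` parameter, and all numeric values are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Move from `start` toward a nearby density mode (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Gaussian weight of every sample relative to the current estimate.
        sq_dist = np.sum((points - x) ** 2, axis=1)
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # The kernel-weighted mean of the samples becomes the next estimate.
        new_x = weights @ points / weights.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Two synthetic clusters; the search starts closer to the second one.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(200, 2))
cluster_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(200, 2))
samples = np.vstack([cluster_a, cluster_b])
mode = mean_shift_mode(samples, start=(4.0, 4.5))
```

Starting points that converge to the same stationary point would be assigned to the same cluster; here the start near the second cluster is expected to settle close to its center.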

When a Gaussian kernel on the distance to the current estimate, $K(x_i - x) = e^{-c\|x_i - x\|^2}$, is used and $N(x)$ is the neighborhood of $x$, the weighted mean of the density in the window is estimated by the following equation:

$$m(x) = \frac{\sum_{x_i \in N(x)} K(x_i - x)\, x_i}{\sum_{x_i \in N(x)} K(x_i - x)} \tag{1}$$

For each data point, a gradient ascent on the locally estimated density is performed until convergence. The stationary points obtained via gradient ascent are considered the modes of the density function. Note that all points associated with the same stationary point belong to the same cluster. In order to find the maximum density, the MeanShift algorithm iteratively uses the ratio of the current distribution over the target. When $I(x, y)$ is the intensity of the discrete probability image at $(x, y)$ within the search window, the mean location of the discrete probability image and the size of the search window are found by using the following moments [9]:

$$M_{00} = \sum_x \sum_y I(x, y), \qquad M_{10} = \sum_x \sum_y x\, I(x, y), \qquad M_{01} = \sum_x \sum_y y\, I(x, y) \tag{2}$$

By using the moments in Eq. (2), the center of the moving object, $(x_c, y_c)$, and the estimated size of the search window, $s$, can be found. The center is given by

$$x_c = \frac{M_{10}}{M_{00}}, \qquad y_c = \frac{M_{01}}{M_{00}} \tag{3}$$

International Journal of Computer Science and Electronics Engineering (IJCSEE) Volume 3, Issue 4 (2015) ISSN 2320–4028 (Online)

Fig. 1 The second order differentiation box filter
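As a rough illustration of the moment computation, the sketch below evaluates the zeroth and first order moments of a probability image inside a search window and derives the window centroid from them. It assumes the usual definitions (the zeroth moment is the sum of intensities, the first order moments are the coordinate-weighted sums); the function names and the toy image are hypothetical.

```python
import numpy as np


def window_moments(prob, x0, y0, width, height):
    """Zeroth and first order moments of `prob` inside a search window."""
    window = prob[y0:y0 + height, x0:x0 + width].astype(float)
    # Pixel coordinates of every cell in the window (image coordinates).
    ys, xs = np.mgrid[y0:y0 + window.shape[0], x0:x0 + window.shape[1]]
    m00 = window.sum()
    m10 = (xs * window).sum()
    m01 = (ys * window).sum()
    return m00, m10, m01


def window_centroid(prob, x0, y0, width, height):
    """Centroid (x_c, y_c) of the window, or None if the window is empty."""
    m00, m10, m01 = window_moments(prob, x0, y0, width, height)
    if m00 == 0:
        return None
    return m10 / m00, m01 / m00


# Toy probability image with a 2x2 blob at columns 2-3, rows 6-7.
prob = np.zeros((10, 10))
prob[6:8, 2:4] = 1.0
center = window_centroid(prob, 0, 0, 10, 10)
```

In a tracker, the search window would be re-centered on the returned centroid and the computation repeated until the centroid stops moving.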

SURF (Speeded Up Robust Features) is an improved version of SIFT (Scale-Invariant Feature Transform) [5][6]. SIFT is an algorithm to detect and describe local features in images. The SIFT feature descriptor is invariant to uniform scaling and orientation, and partially invariant to affine distortion and illumination changes, so SIFT can identify target objects under clutter and partial occlusion. Given target objects, keypoints are first extracted from reference images. An object in a new image is then recognized by individually matching each feature from the new image to the reference images. Based on the matching information over the entire feature space, subsets of keypoints that agree on the location, scale, and orientation of the object in the new image determine the matches with the reference images. In SIFT, the Laplacian of Gaussian is approximated with the Difference of Gaussian in order to build the scale space.

Fig. 2 Searching for maxima in scale space[11]

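In SURF [10], the Laplacian of Gaussian (LoG) is approximated with box filters. With this approximation, the convolution with a box filter can be calculated from integral images with relatively light computational effort. The SURF algorithm utilizes the determinant of the Hessian matrix for selecting both scale and location. The Hessian matrix is defined as follows:

$$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix}$$

where $L_{xx}$, $L_{xy}$, and $L_{yy}$ are the convolutions of the Gaussian second order derivatives with the image at $\mathbf{x}$ [9]. For faster computation, the following Hessian determinant is utilized [8]:

$$\det(H_{\text{approx}}) = D_{xx} D_{yy} - (0.9\, D_{xy})^2$$

where $D_{xx}$, $D_{yy}$, and $D_{xy}$ are the box filter approximations of $L_{xx}$, $L_{yy}$, and $L_{xy}$.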
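The reason box filters are cheap can be seen in a small sketch of an integral image, where the sum over any rectangle takes four lookups regardless of the rectangle size. This is a generic illustration of the integral image technique, not the authors' implementation; the function names are hypothetical.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with an extra zero row and column at the top-left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0, dtype=np.float64), axis=1)
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


img = np.arange(25, dtype=np.float64).reshape(5, 5)
ii = integral_image(img)
total = box_sum(ii, 1, 1, 3, 2)  # rows 1-3, columns 1-2
```

A box filter response is then a weighted combination of a few such rectangle sums, so its cost does not grow with the filter size.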

In order to achieve scale invariance [11], an image pyramid of scale spaces is adopted as shown in Fig. 2. Pyramid layers are subtracted to obtain the DoG (Difference of Gaussian) images. As the difference between two differently low-pass filtered images, the DoG acts as a band-pass filter, which removes high frequency noise as well as some low frequency components in the image. The frequency components in the pass band can be considered as the edges in the image. Each octave in the scale space represents a group of filter responses obtained by convolving the input image with a set of filters. A point is selected as an interest point when its determinant value is larger than a predetermined threshold in the window.

The orientation of rotation invariant interest points is obtained by using the Haar wavelet responses in the horizontal and vertical directions for a neighborhood of size 6s. The orientation is determined by finding the sum of all responses within a sliding orientation window with an angle of 60 degrees.

The local feature descriptor in SURF is also extracted by using Haar wavelet responses in the horizontal and vertical directions. A neighborhood of size $20s \times 20s$ is considered around the keypoint, where $s$ is the scale at which the keypoint was detected, as shown in Fig. 3. It is then divided into $4 \times 4$ subregions. For each subregion, the horizontal and vertical wavelet responses, $d_x$ and $d_y$, are calculated and a vector is formed. That is, the Haar wavelet responses $\left(\sum d_x, \sum d_y, \sum |d_x|, \sum |d_y|\right)$ are obtained for each subregion, and the resulting 64-dimensional feature over the $4 \times 4$ subblocks is obtained for the region.

Fig. 3 Building the interest point descriptor [10]
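The assembly of the 64-dimensional vector from subregion sums can be sketched as follows. The sketch assumes that the Haar responses have already been sampled on a 20 x 20 grid over the oriented window (5 x 5 samples per subregion) and that the final vector is normalized to unit length; both are assumptions based on the common SURF formulation rather than details stated here, and `surf_like_descriptor` is a hypothetical name.

```python
import numpy as np


def surf_like_descriptor(dx, dy):
    """Build a 64-D vector from 20x20 grids of Haar responses (sketch)."""
    if dx.shape != (20, 20) or dy.shape != (20, 20):
        raise ValueError("expected 20x20 response grids")
    features = []
    for row in range(4):
        for col in range(4):
            # One of the 4x4 subregions, 5x5 samples each.
            block = (slice(5 * row, 5 * row + 5), slice(5 * col, 5 * col + 5))
            bx, by = dx[block], dy[block]
            features.extend(
                [bx.sum(), by.sum(), np.abs(bx).sum(), np.abs(by).sum()]
            )
    vec = np.array(features)
    norm = np.linalg.norm(vec)
    # Unit-length normalization (an assumption, see the lead-in).
    return vec / norm if norm > 0 else vec


rng = np.random.default_rng(1)
dx = rng.normal(size=(20, 20))
dy = rng.normal(size=(20, 20))
descriptor = surf_like_descriptor(dx, dy)
```

Each subregion contributes four values, so the 16 subregions give 4 x 16 = 64 entries.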

In order to reduce the computational burden, the sign of the Laplacian of the underlying interest point is utilized, because it is already computed during detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from dark blobs on bright backgrounds. In the matching stage, we only compare features that have the same type of contrast, as indicated by the sign of the Laplacian. This simplification results in faster matching and helps the real-time operation of systems adopting this algorithm.

Accurate and stable object tracking can be accomplished by combining the SURF algorithm and the MeanShift algorithm. However, in order to estimate the center of moving objects, the color distribution information used in the MeanShift algorithm is replaced by the matching point distribution information. That is, the matching point distribution is utilized instead of the intensity $I(x, y)$ in calculating the moments of Eq. (2). After the interest points are found, a matching process is performed. If the signs of the Laplacian match, the Euclidean distances between descriptors are then calculated. By using a nearest neighbor search method, the matching points are selected with a predetermined threshold value. Note that the threshold is set to 0.8 in our experiments.
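One common way to realize such a matching step is sketched below: candidates are first restricted to those with the same Laplacian sign, and a match is accepted only if the nearest descriptor is clearly closer than the second nearest. Interpreting the 0.8 threshold as a nearest to second-nearest distance ratio is an assumption made for this illustration, and `match_keypoints` with its arguments is a hypothetical interface.

```python
import numpy as np


def match_keypoints(desc_a, sign_a, desc_b, sign_b, ratio=0.8):
    """Match descriptors of frame A to frame B (illustrative sketch)."""
    desc_a, desc_b = np.asarray(desc_a, float), np.asarray(desc_b, float)
    sign_a, sign_b = np.asarray(sign_a), np.asarray(sign_b)
    matches = []
    for i, (d, s) in enumerate(zip(desc_a, sign_a)):
        # Only compare against features with the same Laplacian sign.
        candidates = np.flatnonzero(sign_b == s)
        if candidates.size < 2:
            continue
        dists = np.linalg.norm(desc_b[candidates] - d, axis=1)
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        # Assumed ratio test: nearest must beat second nearest by `ratio`.
        if best < ratio * second:
            matches.append((i, int(candidates[order[0]])))
    return matches


desc_a = [[1.0, 0.0], [0.0, 1.0]]
sign_a = [1, -1]
desc_b = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.1, 0.95], [5.0, 5.0]]
sign_b = [1, -1, 1, -1, 1]
matches = match_keypoints(desc_a, sign_a, desc_b, sign_b)
```

Skipping candidates with a different sign avoids computing their Euclidean distances at all, which is where the speed benefit of the sign check comes from.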

III. MODIFIED SURF-BASED OBJECT TRACKING SYSTEM

The accurate and stable object tracking system in this paper utilizes the keypoint information obtained by the SURF algorithm. In order to estimate the center of moving objects, the Modified SURF-based Object Tracking System utilizes the distribution of matching points, calculating the moments of Eq. (2) for the matching points instead of the intensity $I(x, y)$, whereas the MeanShift algorithm utilizes the color distribution. The local feature descriptor is constructed from a square region (of size $20s$) centered at each interest point and rotated along its orientation. The resulting 64-dimensional features are then found by calculating the Haar wavelet responses of the following four descriptors: $\sum d_x$, $\sum d_y$, $\sum |d_x|$, $\sum |d_y|$.

Fig. 4 Tracking results on video frame No. 1


Fig. 5 Tracking results on video frame No. 2

In order to achieve an adaptive estimation of the object size, the ratio between the distances among the matching points of the target object and the distances among the matching points of the object in the current image is calculated. From this ratio, the size of the moving object can then be found.
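A minimal sketch of this update is given below, assuming that each matching point contributes unit mass to the moments (so the center reduces to the mean of the matched locations) and that the distance among matching points is summarized by the mean pairwise distance. Both choices, as well as the names `mean_pairwise_distance` and `update_track`, are assumptions made for illustration.

```python
import numpy as np


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of points."""
    pts = np.asarray(points, dtype=float)
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    upper = np.triu_indices(len(pts), k=1)
    return dists[upper].mean()


def update_track(ref_points, cur_points, ref_size):
    """Estimate the object center and size from matched point locations."""
    cur = np.asarray(cur_points, dtype=float)
    # With unit mass per match, M10/M00 and M01/M00 are the coordinate means.
    center = cur.mean(axis=0)
    scale = mean_pairwise_distance(cur) / mean_pairwise_distance(ref_points)
    return center, scale * np.asarray(ref_size, dtype=float)


# Matches in the reference frame and the same points after a 2x zoom and shift.
ref_points = [(0, 0), (2, 0), (0, 2), (2, 2)]
cur_points = [(10, 20), (14, 20), (10, 24), (14, 24)]
center, size = update_track(ref_points, cur_points, ref_size=(20, 30))
```

At least two matched points are needed for the ratio to be defined; a practical tracker would fall back to the previous size when fewer matches survive.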

IV. EXPERIMENTS AND RESULTS

The Modified SURF-based Object Tracking System is intended to overcome the problems of MeanShift-based object tracking and to provide accurate and stable tracking performance. Since the features adopted in the proposed object tracking system are based on the SURF algorithm, the proposed system can produce accurate tracking results. The proposed system is evaluated on sets of image sequence data in terms of tracking accuracy and tracking speed. The image sequences obtained with a fisheye lens for our experiments and the tracking results are shown in Fig. 4 and Fig. 5. Since the images are obtained through a fisheye lens, they include severe distortion [3]. As can be seen from Fig. 4 and Fig. 5, the objects in the image sequence have very strong visual distortions, including color changes. The image sequence has 320 frames with 640 × 480 resolution. The computing environment used in our experiments is: Intel Core i7-2600 CPU 3.40 GHz, 8.00 GB RAM, Windows 7 Enterprise 64-bit OS.

V. CONCLUSIONS

An object tracking system based on a modified SURF algorithm and the MeanShift algorithm is proposed in this paper. Once an object is detected in the proposed object tracking system, feature points inside a window are calculated by using the SURF algorithm. In the following image frame, matching points with the target object are calculated and the center of the object is then estimated by using the moment method. When the matching points of an object between frames satisfy the conditions of the nearest neighbor search method, the size of the moving object is estimated by using the ratio between the distances of matching points in the two sequential image frames. The proposed system reduces the tracking inaccuracy of CAMShift when changes in color, illumination, rotation, and shape are present. In order to increase its operation speed for real-time operation, the proposed system minimizes arithmetic operations by using the moment method for the estimation of target objects in image data. Experiments on sets of image sequence data obtained with a fisheye lens show that the proposed system yields very accurate tracking results at real-time operation speed. These preliminary results imply that the proposed object tracking system is far more accurate than the conventional MeanShift algorithm. The results also show that the operation speed is suitable for real-time object tracking.

ACKNOWLEDGMENT

This work was supported by the IT R&D program of The

MKE/KEIT (10040191, The development of Automotive

Synchronous Ethernet combined IVN/OVN and Safety control

system for 1Gbps class).

REFERENCES

[1] S. S. Paisitkriangkrai, C. Shen, and J. Zhang, “Fast pedestrian detection using a cascade of boosted covariance features,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1140-1151, 2008.

[2] S. Wu, R. Laganiere, and P. Payeur, “Improving pedestrian detection with selective gradient self-similarity feature,” Pattern Recognition, vol. 48, no. 8, pp. 2364-2376, August 2015.

[3] Y. Kubo, T. Kitaguchi, and J. Yamaguchi, “Human tracking using fisheye images,” in Proc. IEEE Conf. SICE, 2007, pp. 17-20.

[4] N. Tuong, T. Muller, and A. Knoll, “Robust pedestrian detection and tracking from a moving vehicle,” in Proc. SPIE 7878, Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques, doi:10.1117/12.871994, 2011.

[5] Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790-799, 1995.

[6] D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, May 2002.

[7] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.

[8] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.

[9] R. Mukundan and K. R. Ramakrishnan, Moment Functions in Image Analysis: Theory and Applications, Singapore: World Scientific, 1998.

[10] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “SURF: Speeded up robust features,” Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346-359, June 2008.

[11] D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.


Miso Jang received the B.S. degree in Electronics Engineering from MyongJi University, Korea, in 2012. She is pursuing her M.S. degree in Electronics Engineering at the Intelligent Computing Research Lab at MyongJi University. Her research interests include intelligent computing, neural networks, and pattern recognition.

Dong-Chul Park (M’90-SM’99) received the B.S. degree in electronics engineering from Sogang University, Seoul, Korea, in 1980, the M.S. degree in

electrical and electronics engineering from the Korea Advanced Institute of

Science and Technology, Seoul, Korea, in 1982, and the Ph.D. degree in electrical engineering, with a dissertation on system identifications using

artificial neural networks, from the University of Washington (UW), Seattle, in

1990. From 1990 to 1994, he was with the Department of Electrical and Computer Engineering, Florida International University, The State University

of Florida, Miami. Since 1994, he has been with the Department of Electronics

Engineering, MyongJi University, Korea, where he is a Professor. From 2000 to 2001, he was a Visiting Professor at UW. He is a pioneer in the area of

electrical load forecasting using artificial neural networks. He has published

more than 130 papers, including 40 archival journals in the area of neural network algorithms and their applications to various engineering problems

including financial engineering, image compression, speech recognition,

time-series prediction, and pattern recognition. Dr. Park was a member of the Editorial Board for the IEEE TRANSACTIONS ON NEURAL NETWORKS

from 2000 to 2002.
