

Multimed Tools Appl
DOI 10.1007/s11042-012-1075-3

Fast moving object detection with non-stationary background

Jiman Kim · Xiaofei Wang · Hai Wang · Chunsheng Zhu · Daijin Kim

© Springer Science+Business Media, LLC 2012

Abstract The detection of moving objects under a free-moving camera is a difficult problem because the camera and object motions are mixed together and the objects are often detected as separate components. To tackle this problem, we propose a fast moving object detection method using optical flow clustering and Delaunay triangulation as follows. First, we extract corner feature points using the Harris corner detector and compute optical flow vectors at the extracted corner feature points. Second, we cluster the optical flow vectors using the K-means clustering method and reject outlier feature points using the Random Sample Consensus algorithm. Third, we classify each cluster into camera or object motion using the scatteredness of its optical flow vectors. Fourth, we compensate for the camera motion using the multi-resolution block-based motion propagation method and detect the objects using background subtraction between the previous frame and the motion-compensated current frame. Finally, we merge the separately detected objects using Delaunay triangulation. The experimental results on the Carnegie Mellon University database show that the proposed moving object detection method outperforms the existing methods in terms of detection accuracy and processing time.

J. Kim · H. Wang · D. Kim (B)
Computer Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea
e-mail: [email protected]

J. Kim
e-mail: [email protected]

H. Wang
e-mail: [email protected]

X. Wang
School of Computer Science and Engineering, Seoul National University, Seoul, South Korea
e-mail: [email protected]

C. Zhu
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada B2G 2W5
e-mail: [email protected]


Keywords Harris corner detection · K-means optical flow clustering · Scatteredness · Motion compensation · Delaunay triangulation

1 Introduction

Object detection is the most fundamental task in vision-based systems such as visual surveillance systems [5, 15], intelligent cars, and service robots. There are two types of objects: static objects and moving objects. Static objects are considered in a still image and moving objects are considered in an image sequence. Many researchers have proposed static and moving object detection methods under fixed camera environments. Oliver et al. [16] built a background eigenspace from a set of sample images and detected moving objects by computing and thresholding the Euclidean distance between the input image and its projection onto the eigenspace. Li et al. [13] used the Bayesian decision rule for classifying moving objects from stationary and moving backgrounds using color and color co-occurrence features. Zivkovic and Heijden [28] proposed an improved Gaussian mixture model (GMM) that updates the parameters and selects the number of components for each pixel simultaneously. Maddalena and Petrosino [14] proposed a background model that introduces a self-organizing learning method to deal with many background variations, including light changes, moving backgrounds, and shadows. Han et al. [9] proposed an improved background model based on sequential kernel density approximation, using mean-shift mode detection and a sequential mode propagation algorithm. Barnich and Van Droogenbroeck [2] proposed a sample-based background subtraction algorithm that uses a set of randomly sampled values as a pixel model and compares the input with the model to classify moving object pixels.

In this paper, we focus on the problem of moving object detection under a moving camera. Naturally, object detection methods designed for a fixed camera cannot be applied directly to detecting objects under a moving camera because there are multiple sources of motion from both the camera and the moving objects. To detect moving objects under a moving camera, we need to discriminate the multiple motions into the camera motion and the object motions. Generally, there are three approaches to detecting moving objects under a moving camera.

The first approach is to compensate for the camera motion by ego-motion estimation. Hayman and Eklundh [10] used an image relation between two frames, finding the corresponding corner points and computing the epipolar geometry, to deal with the pan and tilt motions of the camera. Ren et al. [18] estimated a projective transformation model from corresponding corner pairs between two consecutive frames. Uemura et al. [26] proposed a robust feature extraction algorithm based on the KLT tracker and SIFT, together with a motion compensation method using homography estimation. However, the above approaches could deal only with small pan and tilt motions and assumed that the background is well represented by one statistical model.


The second approach is to separate the multiple motions of the input sequence. Borshukov et al. [3] proposed a motion segmentation method using affine motion model classification: they divided the input image into many rectangular blocks, estimated the affine motion model of each block from motion vectors, merged similar motion models into dominant motion models, and obtained the motion segmentation image by labeling each pixel with the closest dominant motion model. Ke and Kanade [12] proposed a subspace method for layer extraction: they segmented the image using color information, estimated the affine motion models, computed a linear subspace from the motion models, and obtained the motion layers by projecting the data points into the subspace and clustering them using the mean-shift. Tao et al. [25] proposed a dynamic motion layer representation that maintains the coherency of the motion, appearance, and shape of each layer over time by modeling spatial and temporal constraints, which are continuously estimated over time by maximizing the posterior probability of the layers with the expectation-maximization (EM) algorithm. Jin et al. [11] proposed a background modeling method based on multi-layer homography, where the number of planes has to be determined manually according to the environment; they computed a set of homographies and constructed a mixture of Gaussians (MoG) background model using the rectified image. However, the above approaches only classify the various motions and need an additional algorithm to select the moving object motion.

The third approach is to segment the camera and object motions using the graph-cut algorithm. Xiao and Shah [27] proposed a motion layer extraction method in the presence of occlusion, introducing an occlusion order constraint into the graph-cut framework to obtain stable object segmentation results. Schoenemann and Cremers [22] proposed a high-resolution motion layer decomposition method using graph-cut optimization. Although the above approaches provide high segmentation accuracy, they require a lot of computation time.

Additionally, other researchers [17, 23] have proposed methods to detect moving objects under a moving camera that estimate the trajectory basis of corner feature points and classify all pixels into foreground or background using the basis space. However, these approaches need multiple frames to compute the trajectories of the feature points and are time-consuming.

We propose an improved moving object detection method under a free-moving camera with a non-stationary background, which provides both high detection performance and fast processing speed. The overall procedure of the proposed moving object detection method consists of five main steps: feature point extraction and tracking, clustering, classification, moving object detection, and refinement (see Fig. 1).

First, we extract the corner feature points using the Harris corner detector in the first image frame and compute the optical flow vector of each corner feature point between the first and second image frames using the pyramidal Lucas–Kanade method [4, 19]. Second, we cluster the optical flow vectors using K-means clustering based on the Euclidean distance and reject the outlier feature points of each cluster using the RANSAC algorithm. Third, we classify each cluster into camera or object motion using the scatteredness of its optical flow vectors, in terms of the average volume of the hyperellipsoids. Fourth, we compensate for the camera motion using block-based perspective motion estimation and the multi-resolution block-based motion propagation method, and detect the moving objects using background subtraction between the previous frame and the motion-compensated current frame. Finally, we refine the inner part of the moving objects by merging the separately detected objects using Delaunay triangulation.

Fig. 1 Overall procedure of the proposed moving object detection method

The proposed moving object detection method has the following features. First, it improves the moving object detection performance because the Delaunay triangulation merges erroneously separated components into one object. Second, it is fast enough for real-time applications due to its low computational complexity: it uses only two consecutive image frames and makes no planar assumptions about the background.


This paper is organized as follows. Section 2 describes the extraction of feature points with the Harris corner detector and feature point tracking by optical flow computation. Section 3 explains the feature point clustering using the K-means and RANSAC algorithms. Section 4 describes the feature point classification using the scatteredness based on the average volume of the hyperellipsoids. Section 5 explains the proposed moving object detection using camera motion compensation and Delaunay triangulation. Section 6 shows the experimental results and discussion. Finally, Section 7 presents our conclusions.

2 Motion estimation

We find the corner feature points using the Harris corner detector because they are invariant to rotation, scale, illumination variation, and image noise, which is the most important property for robust motion estimation of feature points [21, 24]. Then, we compute the motion vectors at the corner feature points using the pyramidal Lucas–Kanade method [4, 19], which is the most widely used algorithm for finding corresponding points between two image frames.

2.1 Feature point detection

The Harris corner detector is based on the local auto-correlation function that represents the local change of the intensity when image patches are shifted by a small amount in different directions:

$$c(x, y; u, v) = \sum_{x_i=-\lfloor W_w/2 \rfloor}^{\lfloor W_w/2 \rfloor} \; \sum_{y_i=-\lfloor W_h/2 \rfloor}^{\lfloor W_h/2 \rfloor} \big[ I(x + x_i, y + y_i) - I(x + x_i + u, y + y_i + v) \big]^2, \qquad (1)$$

where (x, y) is the image pixel point, (u, v) is a given amount of shift, I(x + x_i, y + y_i) is the intensity value at the pixel point (x + x_i, y + y_i), and W_h and W_w are the height and width of the window centered on (x, y), respectively. The shifted image I(x + x_i + u, y + y_i + v) is approximated by a Taylor expansion truncated to the first-order terms; substituting it into the auto-correlation function gives

$$c(x, y; u, v) = \sum_{(x_i, y_i) \in W} \Big( \big[ I_x(x + x_i, y + y_i) \;\; I_y(x + x_i, y + y_i) \big] \, [u \;\; v]^T \Big)^2 = [u \;\; v] \, C(x, y) \, [u \;\; v]^T, \qquad (2)$$

where I_x(x_i, y_i) and I_y(x_i, y_i) are the partial derivatives at (x_i, y_i) along the x and y directions, respectively, and the matrix C(x, y) represents the intensity structure of the local neighborhood. Each pixel point is classified as a flat region, an edge, or a corner according to the eigenvalues of C(x, y) as follows. If both eigenvalues are small, the pixel is a flat point because there is little change along the x and y directions. If one eigenvalue is large and the other is small, the pixel is an edge point because there is a significant change along the x (or y) direction but little change along the y (or x) direction. If both eigenvalues are large, the pixel is a corner point because there are significant changes along both directions.

In practice, each pixel point is classified as a flat, edge, or corner point based on the value of the corner response

$$R(x, y) = \det(C(x, y)) - k \cdot \mathrm{trace}^2(C(x, y)),$$

where det(C(x, y)) is the product of the eigenvalues, trace(C(x, y)) is the sum of the eigenvalues, and k is a constant between 0.04 and 0.06. (1) If the magnitude of R is small, the pixel is a flat point; (2) if R is a large negative value, it is an edge point; and (3) if R is a large positive value, it is a corner point.

2.2 Optical flow computation

We compute the optical flow vectors of the detected corner points by searching for the corresponding corner points between two consecutive image frames using the pyramidal Lucas–Kanade method [4, 19], which consists of two tasks: generation of the image pyramid and search for the corresponding feature points on the image pyramid.

First, we generate the image pyramid, where an image is represented by a pyramidal structure. The lowest-level image I^0(x, y) is the raw image with the highest resolution, and its width and height are n^0_x = n_x and n^0_y = n_y, respectively. The pyramidal structure is generated from the highest-resolution image to the lowest-resolution image, recursively: compute I^1(x, y) from I^0(x, y), then compute I^2(x, y) from I^1(x, y), and so on.

If we let I^l(x, y) be the image at the lth level and let n^l_x and n^l_y be the width and height of I^l(x, y), respectively, then the image I^l(x, y) is defined from the image I^{l-1}(x, y) as

$$I^l(x, y) = \frac{1}{4} \sum_{i=-1}^{1} \sum_{j=-1}^{1} \frac{1}{2^{|i|+|j|}} \, I^{l-1}(2x + i,\; 2y + j), \qquad (3)$$

where 0 ≤ 2x ≤ n^{l-1}_x − 1 and 0 ≤ 2y ≤ n^{l-1}_y − 1. From the above pyramidal representation, we generate the pyramid structure of two consecutive image frames, {I^l_t(x, y)}_{l=0,...,L} and {I^l_{t+1}(x, y)}_{l=0,...,L}. In the implementation in this work, we take the number of image levels from 2 to 4.

Second, we search for the corresponding feature points between two consecutive image frames on the generated image pyramid. Algorithm 1 summarizes the overall procedure of searching for the feature point (v_x, v_y) in the image I_{t+1}(x, y) corresponding to a given feature point (u_x, u_y) in the image I_t(x, y); this procedure is repeated for all feature points in I_t(x, y).

Algorithm 1 Pyramidal Lucas–Kanade method
Generate the image pyramid of two consecutive image frames {I^l_t(x, y)}_{l=0,...,L} and {I^l_{t+1}(x, y)}_{l=0,...,L}.
Set the feature point at the Lth level as (u_x, u_y) = u / 2^L.
Set the iteration index as l = L.
while (l is nonnegative) do
    Search for the best residual displacement vector as
        (dx_min, dy_min) = argmin_{(dx, dy) ∈ W} (I^l_t(u_x, u_y) − I^l_{t+1}(u_x + dx, u_y + dy))^2,
    where W is a searching window whose size is 5 × 5 in this work.
    Set the feature point at the (l − 1)th level as
        (u^{l−1}_x, u^{l−1}_y) = (2 · (u^l_x + dx_min), 2 · (u^l_y + dy_min)).
    Set the iteration index as l = l − 1.
end while
Set the corresponding feature point in the image I_{t+1}(x, y) as (v_x, v_y) = (u^0_x, u^0_y).
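As a concrete illustration, OpenCV's calcOpticalFlowPyrLK implements this pyramidal search directly; the sketch below pairs it with Harris-based feature selection. The frame file names and parameter values are assumptions for the example, not values prescribed by the paper.

```python
import cv2
import numpy as np

# Illustrative pyramidal Lucas-Kanade tracking; frame paths are assumed.
prev_gray = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
curr_gray = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Harris corner feature points of frame t, shaped (N, 1, 2) for OpenCV.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01,
                              minDistance=5, useHarrisDetector=True, k=0.04)

next_pts, status, err = cv2.calcOpticalFlowPyrLK(
    prev_gray, curr_gray, pts, None,
    winSize=(5, 5),   # searching window W of 5 x 5, as in Algorithm 1
    maxLevel=3)       # pyramid levels within the paper's range of 2 to 4

ok = status.ravel() == 1
good_prev = pts[ok].reshape(-1, 2)
good_next = next_pts[ok].reshape(-1, 2)
flow = good_next - good_prev   # one (dx, dy) optical flow vector per point
```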

3 Motion clustering

3.1 Block-based k-means clustering

We cluster the feature points using the length L and direction θ of their optical flow vectors. The feature points in the optical flow coordinate (L, θ) are shown in Fig. 2. The position of each feature point in the optical flow coordinate is computed as

$$L = \sqrt{(dx)^2 + (dy)^2}, \qquad (4)$$

$$\theta = \begin{cases} \arctan(dy/dx) & \text{if } dx > 0 \text{ and } dy \ge 0 \\ \arctan(dy/dx) + 2\pi & \text{if } dx > 0 \text{ and } dy < 0 \\ \arctan(dy/dx) + \pi & \text{if } dx < 0 \\ \pi/2 & \text{if } dx = 0 \text{ and } dy > 0 \\ 3\pi/2 & \text{if } dx = 0 \text{ and } dy < 0 \\ 0 & \text{if } dx = dy = 0. \end{cases}$$

In clustering, the number and position of the initial points are important, so we propose block-based k-means clustering, which consists of two tasks: division of the optical flow coordinate into blocks and k-means clustering.

Fig. 2 a Feature point detection in two consecutive frames; b optical flow vector computation by the pyramidal Lucas–Kanade method; c length and angle histograms of the optical flow vectors; d distribution of feature points in the optical flow coordinate

First, we divide the optical flow coordinate (L, θ) into blocks, where the height of a block is 30 degrees and the width of a block is 1 pixel in this work. Second, we select the initial points for clustering on the generated blocks and cluster all feature points. We compute the number of feature points within each block, sort the blocks by these values, and select the initial blocks starting from the block with the maximum value until the selected blocks contain more than 70% of the total feature points. The initial point of the kth initial block is computed as

$$(L, \theta)^k_{init} = \frac{1}{N_k} \sum_{i=1}^{N_k} (L_i, \theta_i)^k, \qquad (5)$$

where N_k is the number of feature points in the kth initial block and (L_i, θ_i)^k is the ith feature point within the kth initial block. Then, the final motion clusters are obtained by assigning each feature point to the closest initial point using the Euclidean distance as

$$C_i = \Big\{ (L, \theta) \;\Big|\; \|(L, \theta) - (L, \theta)^i_{init}\| \le \|(L, \theta) - (L, \theta)^k_{init}\|, \; k = 1, \dots, K \Big\}, \qquad (6)$$

where (L, θ) is a feature point in the optical flow coordinate and (L, θ)^k_init is the initial point of the kth initial block (Fig. 3).

3.2 Outlier rejection

Rejecting outliers from each motion cluster obtained by the block-based k-means clustering is important for the accurate motion classification of the next section.

We use the Random Sample Consensus (RANSAC) algorithm [7] to select the inlier feature points by eliminating the outlier feature points of each motion cluster. The RANSAC algorithm consists of three tasks: model generation, consensus data update, and best model selection. The three tasks are performed for each motion cluster.

Fig. 3 Motion clustering on the optical flow coordinate. a Corner feature points on the image coordinate; b feature points on the optical flow coordinate; c, d motion clustering results by block-based k-means clustering on the image coordinate


First, we generate a 2D Gaussian model of each motion cluster on the optical flow coordinate by setting a consensus data set through random sampling of feature points and estimating the model parameters from the consensus data. The 2D Gaussian model, or bivariate normal density, is as follows:

$$p(L, \theta) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\!\left( -\frac{1}{2} D^2(L, \theta, \mu, \Sigma) \right), \qquad (7)$$

where σ_1 and σ_2 are the standard deviations of the length L and the angle θ of the optical flow, respectively, ρ is the correlation coefficient between L and θ, and the squared Mahalanobis distance D^2(L, θ, μ, Σ) is computed as

$$D^2(L, \theta, \mu, \Sigma) = \frac{1}{1 - \rho^2} \left( c_1^2 - 2\rho \, c_1 c_2 + c_2^2 \right), \qquad (8)$$

where c_1 = (L − L̄)/σ_1, c_2 = (θ − θ̄)/σ_2, and L̄ and θ̄ are the mean values of L and θ.

Second, we update the consensus data using the estimated model. The feature points whose error from the model is less than a pre-defined threshold value are added to the consensus data.

Third, we select the best model between the previous best model and the current model when the number of updated consensus data points is more than another threshold value. The motion parameters of the current model are re-computed from the updated consensus data, and the sums of errors of all feature points under the two models are compared. If the sum of errors of the current model is less than that of the previous best model, the current model becomes the best model. By repeating this procedure until the iteration limit is reached, we obtain the best consensus set, which is the set of inlier feature points. Algorithm 2 summarizes the overall procedure of the RANSAC algorithm for outlier feature point rejection.

Algorithm 2 RANSAC algorithm for outlier feature points rejection
Consider the data set P = {(L_1, θ_1), ..., (L_N, θ_N)} of N feature points.
Set the iteration index as I = 0.
while (I < L) do
    Model Generation
    Set the consensus data C = {(L_C1, θ_C1), ..., (L_Cn, θ_Cn)} by random sampling.
    Estimate the 2D Gaussian model parameters p(L, θ) = (μ, Σ) with unit amplitude.
    Consensus Data Update
    Compute the error e(L_i, θ_i) = 1 − p(L_i, θ_i), 0 ≤ i ≤ N.
    Update the consensus data by adding the feature points where e(L_i, θ_i) < threshold1.
    Best Model Selection
    if |C| > threshold2 then
        Re-estimate the parameters (μ′, Σ′) from the updated C.
        Compute the sum of errors e(L, θ) = Σ_{i=1}^{N} (1 − p(L_i, θ_i)).
        Substitute C_best = C and e_best(L, θ) = e(L, θ) when e(L, θ) < e_best(L, θ).
    end if
    Set the iteration index as I = I + 1.
end while
InlierSet = C_best.


4 Motion classification

Among the separated motion clusters, we select the motion clusters of the camera and of the moving objects. Using the camera motion cluster, we can estimate the camera motion model and compensate for it to detect the moving object region. The moving object motion clusters are used to refine the moving object region detected by the camera motion compensation.

4.1 Scatteredness

The background corner feature points are scattered more widely than the feature points of a moving object, except when the background is an extremely smooth plane. Thus, it is reasonable to assume that the background takes up more space than the moving object in the original image coordinates (x, y). Motion clusters can therefore be classified as background or moving objects using the scatteredness, or dispersion, of each cluster.

There are various measurements of scatteredness, including the Range, the Mean Absolute Deviation, the Standard Deviation, and so on [1]. The Range (R) is the simplest measurement of scatteredness. It is defined as the largest distance between two data points in a distribution:

$$R = \max(|x_i - x_j|), \qquad (9)$$

where x_i and x_j are data points in one distribution. It is easy to compute and understand, but it cannot explain the nature of the distribution because it considers only the two extreme data points rather than all the data points.

The Mean Absolute Deviation (MAD) defines a center point of the distribution and computes the mean distance between each data point and the center point:

$$MAD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad (10)$$

where n is the number of data points in the distribution. It is based on all the data points and is not affected by the two extreme points. However, there is an exceptional case in which it cannot explain the nature of the distribution: if the points of one distribution are scattered uniformly over an image while the points of another distribution are concentrated on a circle whose radius is half the image size, the two Mean Absolute Deviation values can be the same and the classification result might be wrong.

The Standard Deviation is the square root of the sum of the squared deviations of all the data points from the mean point, normalized by the data size:

$$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}, \qquad (11)$$

where x̄ is the mean of all the data points, x_i is the ith data point, and n is the number of data points. It is the most widely used measure of the dispersion or scatteredness of a distribution.

We use the volume of the hyperellipsoids of the data points of a distribution [6], where the data points are the feature points and the distribution is a cluster. It is a more intuitive and understandable measurement than the Standard Deviation and shows the best classification performance among all the scatteredness measurements. A cluster can be modeled as a general multivariate normal density:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \mu)^t \Sigma^{-1} (\mathbf{x} - \mu) \right], \qquad (12)$$

where Σ is the d-by-d covariance matrix E[(x − μ)(x − μ)^t], and (x − μ)^t Σ^{−1} (x − μ) is the squared Mahalanobis distance from x to μ. Feature points of the same probability density are located on a hyperellipsoid of constant Mahalanobis distance. Using the covariance matrix and the Mahalanobis distance, we define the average volume of the hyperellipsoids of a cluster as follows:

$$V_{avg} = \frac{1}{n} \sum_{i=1}^{n} V_d \cdot |\Sigma(x_i)|^{1/2} \cdot r(x_i)^d, \qquad (13)$$

where n is the number of feature points of the cluster, Σ(x_i) is the covariance matrix of the feature points, r(x_i)^d is the dth power of the Mahalanobis distance from x_i to μ, and V_d is the volume of a d-dimensional unit hypersphere:

$$V_d = \begin{cases} \dfrac{\pi^{d/2}}{(d/2)!} & d \text{ even} \\[2ex] \dfrac{2^d \, \pi^{(d-1)/2} \left(\frac{d-1}{2}\right)!}{d!} & d \text{ odd}. \end{cases} \qquad (14)$$

In this paper, d is 2 and V_2 = π^{2/2}/(2/2)! = π. Thus, the average volume of the hyperellipsoids, which equals their average area, is V_avg = (1/n) Σ_{i=1}^{n} V_d · |Σ(x_i)|^{1/2} · r(x_i)^2. The eigenvectors and eigenvalues of the covariance matrix Σ define the shape of the hyperellipsoid, and the Mahalanobis distance r defines its radius. We compare the classification accuracy of the various scatteredness measurements in Section 6.2, where the average volume of hyperellipsoids shows the best classification performance.

4.2 Motion classification

For camera motion compensation to detect the moving object region, we should select the camera motion cluster among the multiple motion clusters using the scatteredness. We introduce the average volume of hyperellipsoids to measure scatteredness, where a larger average volume means that the cluster is scattered more widely. Among the clusters having more feature points than a threshold value, the cluster index with the maximum value of scatteredness is taken as the index of the camera motion cluster:

$$I_c = \arg\max_k \left( V^k_{avg} \right), \qquad k \in \{ j \mid N_j > \beta \cdot N_{tot} \}, \qquad (15)$$

where V^k_avg is the average volume of the hyperellipsoids of the kth motion cluster and β is a positive constant less than 1. The two values N_j and N_tot are the numbers of feature points of the jth cluster and of all the clusters, respectively. More than one motion cluster can correspond to moving objects when there are multiple moving objects in the scene. We select the moving object clusters, among the clusters having more feature points than the threshold value, using the relative ratio of the average volume of the hyperellipsoids as follows:

$$I_o = \left\{ k \;\middle|\; V^k_{avg} < \gamma \cdot V^c_{avg} \right\}, \qquad k = 1, \dots, K, \qquad (16)$$

where V^c_avg is the average volume of the hyperellipsoids of the camera motion cluster, K is the total number of motion clusters, and γ is a positive constant less than 1. Here, we again use the assumption that the corner feature points of the background are distributed over the whole input image, whereas those of an object are confined to the bounded region where the object exists (Fig. 4).

5 Moving object detection

5.1 Moving object border detection

A single motion model for camera motion compensation is not appropriate in a general environment because the distances between the camera and the background components, i.e., the perspective in the input images, are different. To overcome this problem, we propose multi-resolution motion propagation (MRMP) to compensate for the camera motion exactly. We divide an input image into M × N blocks and collect the background feature points x_back that belong to each block:

$$B(m, n) = \{ x_{back} : x_{back} \in \text{block } (m, n) \}, \qquad (17)$$

where B(m, n) is the background feature point set of the (m, n)th block. We use a perspective model, which can deal with pan, tilt, and zoom effects between two consecutive frames, to estimate the camera motion of each block. Let U and U′ be the background feature point sets of B(m, n) at frames t − 1 and t, respectively. The perspective model between U and U′ is

$$U' = f_z U + p, \qquad (18)$$


where f_z and p = [p^{(x)}, p^{(y)}]^T are the forward motion model parameters. When there are no background feature points in a block, parameter propagation should be performed. The blocks without U and U′ refer to the parameters of the neighboring blocks as

$$(f_z, p)^L_{empty} = \mathrm{avg}\big[ (f_z, p)^L_i \big], \qquad (f_z, p)^L_i \in B^{L+1}, \qquad (19)$$

where (f_z, p)^L_empty are the motion model parameters of the empty block at level L and (f_z, p)^L_i are the parameters of the ith block within the block B^{L+1}. The block sizes of higher levels are larger than those of lower levels, and B^{L+1} contains the empty block. Figure 4 shows the parameter propagation process, where the feature points are classified as background or objects using the motion clustering and classification.

Fig. 4 RANSAC is performed in advance to eliminate outliers among the feature points. White and green points are the feature points of the background and the object, respectively. The red box is an empty box at level 0 that has no parameters. The blue box is an empty box at level 1 that surrounds the red box. The purple box, at level 2, surrounds the blue box and has a set of parameters. The resolution extension stops when a box with parameters appears; the parameters of the purple box are propagated to the red box

Camera motion compensation between two consecutive frames, I_t and I_{t+1}, is performed for each block using the estimated motion model parameters. The transformation between the coordinates is I_{t+1} = f_z I_t + p. To align the coordinates of I_{t+1} with those of I_t, we use the backward motion model parameters:

$$I_{t+1}|_{compen} = f'_z \cdot I_{t+1} + p', \qquad (20)$$

where f′_z = 1/f_z and p′ = −p/f_z are the backward parameters. After performing the camera motion compensation, which aligns the background coordinates of the two consecutive frames, moving object detection between I_t and I_{t+1}|_compen is performed using the frame difference (FD). We convert the two color images to gray images and compute the absolute difference at each pixel. If the value is larger than a threshold value T_obj, the pixel is a moving object pixel:

$$I_{sub}(x, y) = \big| I_t(x, y) - I_{t+1}(x, y)|_{compen} \big|, \qquad (21)$$

$$R_{FD}(x, y) = \begin{cases} 255 & \text{if } I_{sub}(x, y) \ge T_{obj} \\ 0 & \text{if } I_{sub}(x, y) < T_{obj}. \end{cases} \qquad (22)$$
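For one block, the model fit of eq. (18), the backward warp of eq. (20), and the frame difference of eqs. (21)–(22) might look as follows; U and U1 (the block's matched background points at the two frames), the gray frames, and the value of T_OBJ are assumptions for illustration.

```python
import cv2
import numpy as np

def fit_perspective(U, U1):
    # Least-squares fit of U1 = f_z * U + p for one block's background points.
    mu, mu1 = U.mean(axis=0), U1.mean(axis=0)
    du, du1 = U - mu, U1 - mu1
    f_z = np.sum(du * du1) / np.sum(du * du)   # optimal common scale
    p = mu1 - f_z * mu                         # translation p = [p_x, p_y]
    return f_z, p

# 'U', 'U1', 'prev_gray', and 'curr_gray' are assumed to exist.
f_z, p = fit_perspective(U, U1)

# Backward parameters of eq. (20): f_z' = 1/f_z and p' = -p/f_z.
M = np.array([[1.0 / f_z, 0.0, -p[0] / f_z],
              [0.0, 1.0 / f_z, -p[1] / f_z]], dtype=np.float32)
h, w = curr_gray.shape
compensated = cv2.warpAffine(curr_gray, M, (w, h))

# Frame difference, eqs. (21)-(22); T_OBJ is an assumed threshold.
T_OBJ = 30
I_sub = cv2.absdiff(prev_gray, compensated)
_, R_FD = cv2.threshold(I_sub, T_OBJ, 255, cv2.THRESH_BINARY)
```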

Even though the background coordinates are aligned well, the FD cannot detect the whole moving object region or its exact boundary when the color of the object's interior pixels is uniform, because the frame difference only compares the two consecutive frames. In that case, the detection result of the FD contains only the front and back regions of the moving object, where the color information differs between the two frames. To overcome this problem, we combine the Delaunay Triangulation (DT) method with the FD technique. DT segments the whole region of the moving object using the pixels of the moving object cluster, and FD refines the boundary shape of the region.

5.2 Moving object interior detection

DT is a method to describe a certain terrain or objects [8]. Adjacent triangles are obtained from the given points, and they represent the rough shape of the object. There are many approaches to constructing the triangles; we consider the randomized incremental Delaunay Triangulation method, which is the representative approach. Given a point set P, DT is a set of triangles in a plane. The triangles satisfy the empty circum-circle property: no point in P is inside the circum-circle of any triangle. DT maximizes the minimum angle over all the angles of the triangles. The randomized incremental DT algorithm is described in Algorithm 3. An example of the Delaunay Triangulation process using triangle splits and side flips is shown in Fig. 5. Triangle 1 is divided into three triangles; triangles 4 and 5 are generated from triangle 2 and a sub-triangle of triangle 1, and triangles 6 and 7 are generated from triangle 3 and the sub-triangles of triangle 5.

Given the inner object feature point set P_obj from the motion clustering step and the inlier object region R_FD from the FD, we combine the two results to detect the moving object region (Algorithm 4). The proposed moving object detection approach consists of three steps: inner-outlier rejection, randomized incremental DT, and final object region detection. First, we reject the inner-outlier object feature points and regions. In the motion classification step, we selected the camera motion cluster, whose points are outlier object points, among the multiple motion clusters. We select the inner-inlier object points by rejecting the inner-outlier object points among the remaining points: we compare the scatteredness of each cluster with that of the camera motion cluster and reject any cluster whose scatteredness is larger than a threshold, where we use C of 0.4. If the scatteredness of some cluster is similar to that of the camera motion cluster, the cluster is not a moving object cluster, because we assume that the feature points of the camera motion cluster are spread more widely over the entire image than those of a moving object cluster. We also reject the inner-outlier object region, which is a falsely detected region: if a region contains inner-outlier object points, we reject it.

Algorithm 3 Randomized incremental DT
Initialize DT as the triangulation △p_0 p_{−1} p_{−2} that includes all points.
Compute a random permutation p_1, p_2, ..., p_n of P′_obj \ p_0.
for all r such that 1 ≤ r ≤ n do
    Insert p_r into DT and find the △p_i p_j p_k ∈ DT containing p_r.
    if p_r lies in the interior of △p_i p_j p_k then
        Add the three edges p_r p_i, p_r p_j, p_r p_k and split △p_i p_j p_k into three triangles.
        LegalizeEdge(p_r, p_i p_j (/ p_j p_k / p_k p_i), DT)
    else (p_r ∈ p_i p_j)
        Add the edges p_r p_k, p_r p_l and split the two triangles into four.
        LegalizeEdge(p_r, p_i p_l (/ p_l p_j / p_j p_k / p_k p_i), DT)
    end if
end for
Discard p_{−1} and p_{−2}, with all their incident edges, from DT.
Reject all long sides e longer than L_th.
return DT

LegalizeEdge(p_r, p_i p_j, DT)
if p_i p_j is illegal then
    Let △p_i p_j p_k be the triangle adjacent to △p_r p_i p_j along p_i p_j.
    Flip p_i p_j: replace p_i p_j with p_r p_k.
    LegalizeEdge(p_r, p_i p_k (/ p_k p_j), DT)
end if


Fig. 5 An example of Delaunay triangulation. The initial triangles a are split and flipped according to the empty circum-circle property. The numbered circles represent the process of the triangle update

Second, we construct the DT from the inner-inlier object points using the randomized incremental Delaunay Triangulation algorithm. In the initialization of the DT, we let p_0 be the point with the largest y value and let p_{−1} and p_{−2} be two points such that P ⊂ △p_0 p_{−1} p_{−2}. After constructing the DT, we reject the triangles that have edges longer than twice the block size. Finally, we generate the region R_DT from the DT by filling the obtained triangles and merge this region with the inlier object region R_FD. The experimental result of the moving object interior detection using randomized incremental Delaunay Triangulation is shown in Fig. 6.

Algorithm 4 The proposed moving object detection framework
Inlier object points set P_obj from the motion clustering.
Inlier object region R_FD from the FD.
Inner-outlier Rejection
1. Reject the inner-outlier object points.
for all k such that 0 ≤ k ≤ K do
    if scatteredness_k > C · max(scatteredness) then
        P_obj = P_obj \ P_kth
    end if
end for
Divide the image into sub-blocks and sample the inner-inlier points for each block.
The sampled inner-inlier object points set is P′_obj of n + 1 points.
2. Reject the inner-outlier region.
Select all the regions R_io containing the inner-outlier object points.
The inner-inlier object region is R_FD = R_FD − Σ R_io.
Randomized Incremental DT
1. Construct the triangles of the object region.
2. Reject the non-object triangles and generate the region R_DT.
Object Region Detection
1. Detect the moving object region R = R_DT ∪ R_FD.
2. Fill the holes within R.


Fig. 6 The result of the DT-based moving object detection. Each motion cluster is presented using a different color in a, where yellow and green points are the object points. The camera and moving object motion clusters are presented using white and green points, respectively, in b. The initial triangles obtained by DT from the object points and the final triangles after long-side rejection are shown in c and d

6 Experimental results and discussion

6.1 Database

We used two data sets to evaluate the proposed algorithm: DATA-I and DATA-II. One data set (DATA-I) consists of three video clips, VPerson, VCars, and VHand, which were obtained with a moving camera [20]. The other data set (DATA-II) consists of three video clips, Laboratory, Outside1, and Outside2, which were captured with a hand-held camera and a camera mounted on a mobile robot. We converted each video clip into an image sequence and used them to test the previous and proposed algorithms. VPerson contains pan/tilt/translation camera motion and one moving object and includes 201 images. VCars contains pan/tilt/translation camera motion and three moving objects and includes 196 images. VHand contains pan/tilt/translation camera motion and one moving object that occupies a large portion of the input image, and includes 401 images. Laboratory contains large pan/tilt/translation camera motion and one moving object and includes 500 images. Outside1 and Outside2 contain large pan/tilt/translation camera motion and two moving objects and include 301 and 375 images, respectively. The input images include a background with perspective and moving objects of different sizes, each less than 30% of the size of the input image.

We evaluated two aspects of performance: motion classification performance and moving object detection performance. To evaluate the motion classification performance, we used the Laboratory image sequence because it has the most varied environment, including large perspective and large pan/tilt/translation motion. To evaluate and compare the moving object detection performance, we used all the data sets, DATA-I and DATA-II. We resized the large images to 320 × 240. The proposed moving object detection approach under a free-moving camera was implemented on a Windows PC platform with a 2.4 GHz Intel Core 2 CPU and 4 GB RAM in the Visual C++ and OpenCV environment.

6.2 Performance evaluation

6.2.1 Method evaluation

We evaluated the moving object detection performance of three methods, Motion Compensation (MC), Delaunay Triangulation (DT), and the combination of the two (MCDT), by computing the precision and recall, two measurements that are closely related to the boundary detection accuracy of the moving object region. First, we detected the moving object region and computed the measurements using only the frame difference method, which uses two consecutive frames whose background coordinates are aligned. Second, we performed the same process using only the triangle region obtained by the Delaunay Triangulation method. Finally, we detected the moving object region and computed the measurements using the proposed MCDT method. We used the Laboratory database and compared the two measurements (Table 1). For each method, we used the average volume of the hyperellipsoids as the scatteredness for motion classification and a 10 × 10 block for camera motion propagation and compensation.

The results of MC are the worst among the three moving object detection methods. If the interior color of the object is uniform, MC detects only the front and back regions of the moving object because MC considers only the intensity change between two consecutive frames at each pixel. The results of DT are better than those of MC but worse than those of MCDT. DT can detect the interior region of the moving object, even when the interior color is uniform, by constructing triangles from the interior object feature points. However, DT cannot detect the exact boundary of the moving object because there are generally no feature points on the boundary: boundary pixels belong to edges, not corners. The proposed MCDT shows the best performance among the three detection methods. The boundary region and the interior region of the moving object, detected by MC and DT respectively, are combined to decide the final moving object region.

Table 1 Results of the moving object detection of three detection methods

Measurements  Motion compensation  Delaunay triangulation  Combination
Precision     0.75                 0.80                    0.83
Recall        0.65                 0.86                    0.94


6.2.2 Block size evaluation

We evaluated the effect of the block size on the moving object detection using Delaunay Triangulation. In the MCDT-based moving object detection, we rejected the inner-outlier object points, divided the image into M × N sub-blocks, and sampled the inner-inlier points for each block to reduce the processing time of the Delaunay Triangulation. We set the number of selected points for each sub-block to 5. The sampled inner-inlier feature points were used as the input points of the randomized incremental DT, which constructs the triangles describing the moving object region. Some misclassified background points generate long triangle sides; these long sides are eliminated using a threshold value that depends on the pre-defined block size. The moving object detection performance depends on the block size because the number and distribution of the sampled inner-inlier object points differ according to the degree of block division. First, we changed the block size from 4 × 3 to 320 × 240 while maintaining the aspect ratio of the input image, which gives rectangular sub-blocks. Second, we changed the block size from 5 × 5 to 80 × 80 while maintaining an aspect ratio of 1, which gives square sub-blocks. We assigned an index to each block size and evaluated the block size effect by comparing the precision, recall, and relative distance of the moving object detection results using MCDT. We used the same database as in the detection methods experiment in the previous section. The MC result is the same for all block indexes.

Fig. 7 An example of moving object detection. Precision, recall, and relative distance according to the block size index are represented in a. The final moving object detection result for each block size is shown in b. The first and second rows are the results of the rectangular blocks, and the third row shows those of the square blocks

The rectangular block indexes are (1, 3, 5, 6, 8, 9, 11, 12, 14, 15) and the square block indexes are (2, 4, 7, 10, 13). The precision and recall are best at the 10 × 10 square block, whose block index is 4. When the block size is too small, the three performance measurements are low because most sides of the triangles are eliminated by the long-side rejection (Fig. 7). The recall and relative distance are low when the block size is too large because there are not enough inner-inlier points to construct the triangles describing the moving object region. Specifically, when the block size is 120 × 160 or 240 × 320 (Fig. 8), the long sides caused by the misclassified background points are not eliminated because of the large threshold value, which makes the moving object region rough.

Fig. 8 The detection results of moving objects: indoor a and outdoor b, c, d. The detection is insensitive to the amount of camera motion and the size of the objects. The result in d is not good because the image resolution is very low and the motion of the hand-held camera is very large

We used the block size of 10 × 10 in the various moving object detection experiments in the next section.

6.2.3 Detection evaluation

In this work, we validate the proposed moving object detection approach by evaluating the moving object detection performance using three measurements: precision, recall, and relative distance. We compared two moving object detection algorithms. One is the Rank-Constraint-based background subtraction algorithm [23], which is the state of the art in moving object detection under a free-moving camera. This algorithm uses a sparse model of the background that is built by estimating a trajectory basis from the long trajectories of the feature points and subtracting the trajectories that belong to the space spanned by three basis trajectories. The other algorithm is the proposed algorithm, which combines motion compensation and Delaunay triangulation.

We used six different image sequences: VPerson, VCars, VHand, Laboratory, Outside1, and Outside2. The first three image sequences are the reference data used to compare the two algorithms in terms of moving object detection performance. The other three image sequences are additional data to evaluate the moving object detection performance in a more general situation with large camera motion and many trees with waving branches in the background. The precision and recall of the proposed algorithm are higher than those of the Rank-Constraint background subtraction algorithm, and the relative distance of the proposed algorithm is smaller. The proposed approach gives more accurate results than the Rank-Constraint approach for all measurements (Table 2); the relative distance, a measurement we introduced, was not compared. The processing speed of the proposed approach is much faster than that of the Rank-Constraint approach.

Table 2 Results of the moving object detection of the two algorithms (precision/recall)

Datasets    Rank-Constraint [23]  Proposed
VPerson     0.80/0.95             0.83/0.94
VCars       0.71/0.92             0.87/0.95
VHand       0.83/0.99             0.83/0.98
Laboratory  –                     0.80/0.90
Outside1    –                     0.84/0.95
Outside2    –                     0.82/0.93
Average     0.78/0.95             0.83/0.94
fps         0.03                  15


High processing speed is important in real-time systems. The proposed approach shows more accurate performance at a high processing speed of more than 15 frames/s.

7 Conclusion

Moving object detection is a fundamental and important issue for high-level image understanding tasks such as motion tracking, behavior understanding, and context recognition in dynamic backgrounds. Conventional moving object detection fails when the camera undergoes motion such as pan, tilt, or forward and backward movement. Although many studies have addressed the moving camera environment, they dealt with simple camera motion and focused on detection accuracy without considering the processing time required for a real-time system.

To overcome these limitations, we proposed a moving object detection method under a free-moving camera that uses optical flow clustering and Delaunay triangulation. The method consists of motion estimation, motion clustering, motion classification, and moving object detection. Motion estimation consists of corner feature point detection and tracking. Motion clustering is performed by block-based K-means clustering and cluster outlier rejection using the RANSAC algorithm. Motion classification consists of computing the scatteredness of the optical flow distribution and selecting the camera motion using the scatteredness. Moving object detection consists of moving object border detection using camera motion compensation based on multi-resolution motion propagation, and moving object interior detection by Delaunay triangulation to improve the detection accuracy. The proposed method uses only two consecutive frames and is robust to random camera motion.

The method rests on two assumptions: the background occupies a larger portion of the image than the foreground, and the background feature points are more widely distributed in the image coordinates. The primary limitation of the proposed method is related to these assumptions. Against a smooth planar background, feature points are not extracted and the camera and object motions cannot be classified accurately; we need a more robust feature tracking method that can deal with this situation. If a moving object occupies a large portion of the image, the second assumption does not hold, and we need an additional algorithm such as tracking.

Acknowledgements This work was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the Core Technology Development for Breakthrough of Robot Vision Research support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2010-C7000-1001-0006). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (No. 2011-0027953).

References

1. Bajpai N (2010) Business statistics. Pearson
2. Barnich O, Van Droogenbroeck M (2009) ViBe: a powerful random technique to estimate the background in video sequences. In: Proc. IEEE ICASSP 2009, pp 945–948
3. Borshukov GD, Bozdagi G, Altunbasak Y, Tekalp AM (1997) Motion segmentation by multistage affine classification. IEEE Trans Image Process 6(11):1591–1594
4. Bouguet JY (2000) Pyramidal implementation of the Lucas Kanade feature tracker: description of the algorithm. OpenCV Documentation
5. Chen M, Gonzalez S, Cao H, Zhang Y, Vuong S (2010) Enabling low bit-rate and reliable video surveillance over practical wireless sensor networks. J Supercomput. doi:10.1007/s11227-010-0475-2
6. Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley-Interscience
7. Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395
8. Guibas LJ, Knuth DE, Sharir M (1992) Randomized incremental construction of Delaunay and Voronoi diagrams. Algorithmica 7(4):381–413
9. Han B, Comaniciu D, Zhu Y, Davis LS (2008) Sequential kernel density approximation and its application to real-time visual tracking. IEEE Trans Pattern Anal Mach Intell 30(7):1186–1197
10. Hayman E, Eklundh J (2003) Statistical background subtraction for a mobile observer. In: Proc. IEEE ICCV 2003, pp 67–74
11. Jin Y, Tao L, Di H, Rao NI, Xu G (2008) Background modeling from a free-moving camera by multi-layer homography algorithm. In: Proc. IEEE ICIP 2008
12. Ke Q, Kanade T (2001) A subspace approach to layer extraction. In: Proc. IEEE CVPR 2001
13. Li L, Huang W, Gu IYH, Tian Q (2003) Foreground object detection from videos containing complex background. In: Proc. ACM MM 2003
14. Maddalena L, Petrosino A (2008) A self-organizing approach to background subtraction for visual surveillance applications. IEEE Trans Image Process 17(7):1168–1177
15. Mittal A, Huttenlocher D (2000) Scene modeling for wide area surveillance and image synthesis. In: Proc. IEEE CVPR 2000, pp 160–167
16. Oliver NM, Rosario B, Pentland AP (2000) A Bayesian computer vision system for modeling human interactions. IEEE Trans Pattern Anal Mach Intell 22(8):831–843
17. Ren X, Malik J (2007) Tracking as repeated figure/ground segmentation. In: Proc. IEEE CVPR 2007, pp 1–8
18. Ren Y, Chua CS, Ho YK (2003) Statistical background modeling for non-stationary camera. Pattern Recogn Lett 24(1–3):183–196
19. Ren X, Song J, Ying H, Zhu Y, Qiu X (2007) Robust nose detection and tracking using GentleBoost and improved Lucas–Kanade optical flow algorithms. In: Proc. ICIC 2007, pp 1240–1246
20. Sand P, Teller S (2006) Particle video: long-range motion estimation using point trajectories. In: Proc. IEEE CVPR 2006, pp 2195–2202
21. Schmid C, Mohr R, Bauckhage C (2000) Evaluation of interest point detectors. Int J Comput Vis 37(2):151–172
22. Schoenemann T, Cremers D (2008) High resolution motion layer decomposition using dual-space graph cuts. In: Proc. IEEE CVPR 2008, pp 1–7
23. Sheikh Y, Javed O, Kanade T (2009) Background subtraction for freely moving cameras. In: Proc. IEEE ICCV 2009, pp 1219–1225
24. Shi J, Tomasi C (1994) Good features to track. In: Proc. IEEE CVPR 1994, pp 593–600
25. Tao H, Sawhney HS, Kumar R (2002) Object tracking with Bayesian estimation of dynamic layer representations. IEEE Trans Pattern Anal Mach Intell 24(1):75–89
26. Uemura H, Ishikawa S, Mikolajczyk K (2008) Feature tracking and motion compensation for action recognition. In: Proc. BMVC 2008
27. Xiao J, Shah M (2005) Motion layer extraction in the presence of occlusion using graph cuts. IEEE Trans Pattern Anal Mach Intell 27(10):1644–1659
28. Zivkovic Z, van der Heijden F (2006) Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recogn Lett 27(7):773–780


Jiman Kim received the B.S. degree in electronic engineering from Kyungpook National University, Daegu, South Korea, in 2006, and the M.S. degree in computer science and engineering from the Pohang University of Science and Technology (POSTECH) in 2008. He is currently a Ph.D. candidate in the Department of Computer Science and Engineering at POSTECH, Pohang, South Korea. His research interests include computer vision, vision-based surveillance systems, and human computer interaction.

Xiaofei Wang is a Ph.D. candidate in the Multimedia and Mobile Communication Laboratory, School of Computer Science and Engineering (CSE), Seoul National University (SNU), Korea. He received the B.S. degree from the Department of Computer Science and Technology of Huazhong University of Science and Technology (HUST) in 2005, and the M.S. degree from the School of CSE at SNU in 2008. His current research interests are in the areas of mobile content delivery networks, mobile traffic analysis, and mobile service evaluation.


Hai Wang received the B.S. degree in computer science and engineering from Wuhan University, Hubei, China, in 2011. He is currently a Ph.D. student in the Department of Computer Science and Engineering at POSTECH, Pohang, South Korea. His research interests include computer vision, face recognition, and face analysis.

Chunsheng Zhu received the B.E. degree in network engineering from Dalian University of Technology, China, in June 2010. He is currently a master's student in the Department of Computer Science, St. Francis Xavier University, Canada. His current research interests include wireless networks, multimedia, and security.


Daijin Kim received the B.S. degree in electronic engineering from Yonsei University, Seoul, South Korea, in 1981, and the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, in 1984. In 1991, he received the Ph.D. degree in electrical and computer engineering from Syracuse University, Syracuse, NY. From 1992 to 1999, he was an Associate Professor in the Department of Computer Engineering at DongA University, Pusan, Korea. He is currently a Professor in the Department of Computer Science and Engineering at POSTECH, Pohang, Korea. His research interests include biometrics, human computer interaction, and intelligent systems.