Hindawi Publishing Corporation, Journal of Electrical and Computer Engineering, Volume 2013, Article ID 837275, 12 pages, http://dx.doi.org/10.1155/2013/837275
Research Article: Online Detection of Abnormal Events in Video Streams
Tian Wang,1 Jie Chen,2 and Hichem Snoussi1
1 Institut Charles Delaunay, LM2S-UMR STMR 6279 CNRS, University of Technology of Troyes, 10004 Troyes, France
2 Observatoire de la Côte d'Azur, UMR 7293 CNRS, University of Nice Sophia-Antipolis, 06108 Nice, France
Correspondence should be addressed to Tian Wang; [email protected]
Received 19 September 2013; Accepted 12 November 2013
Academic Editor: Yi Zhou
Copyright © 2013 Tian Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We propose an algorithm to handle the problem of detecting abnormal events, which is a challenging but important subject in video surveillance. The algorithm consists of an image descriptor and an online nonlinear classification method. We introduce the covariance matrix of the optical flow and image intensity as a descriptor encoding the moving information. The nonlinear online support vector machine (SVM) first learns a limited set of training frames to provide a basic reference model, then updates the model and detects abnormal events in the current frame. We finally apply the method to a benchmark video surveillance dataset to demonstrate the effectiveness of the proposed technique.
1. Introduction
Visual surveillance is one of the major research areas in computer vision. In crowd image analysis, the scientific challenges include abnormal event detection. For instance, Figure 1(a) illustrates a normal scene where the people are walking. In Figure 1(b), all the people are suddenly running in different directions. This dataset imitates panic-driven scenes.
Trajectory analysis of objects was described in [1–3]. The moving object was labeled by a blob in consecutive frames, and then a trajectory was produced. Deviation from the learnt trajectories was defined as an abnormal event. Tracking-based approaches are suitable for sparse scenes with a few objects, but the target might be lost due to occlusion.
In [4, 5], abnormal detection approaches using features encoding the motion, texture, and size of the objects were introduced. Local image regions in a video were analyzed by employing a background subtraction method; then a dynamic Bayesian network (DBN) was constructed to model normal and abnormal behavior, and finally a likelihood ratio test was applied to detect abnormal behaviors. In [6], a space-time Markov random field (MRF) model which detects abnormal activities in a video was proposed; a mixture of probabilistic principal component analyzers (MPPCA) was adopted to model the local optical flow. These predictions are based on probabilistic techniques that assume an accurate model exists, but there are various situations where a robust and tractable model cannot be obtained; model-free methods need to be studied.
Spatiotemporal motion features described in the context of a bag of video words were also adopted to detect abnormal events. In [7], the authors presented an algorithm which monitored the optical flow in a set of fixed spatial positions and constructed a histogram of optical flow. The likelihood of the behavior in a newly arriving frame with respect to the probability distribution of the statistically learnt behavior was computed; if the likelihood fell below a preset threshold, the behavior was considered abnormal. In [8], irregular behavior in images or videos was detected by an inference process in a probabilistic graphical model. In [9, 10], the video pixels were densely sampled to form the feature. These methods are based on partial information of the images, such as small blocks of a frame, without fully exploiting the global information of the feature. In [11–13], spatiotemporal features modeled the motion regions of the frame as background, and anomalies were detected by subtracting the new sample from the background template. These works are similar to change detection methods when the background is not stable.
In this paper, the proposed algorithm is composed of two parts. Firstly, a covariance feature descriptor is constructed over the whole video frame, and then a nonlinear one-class
support vector machine algorithm is applied in an online fashion in order to detect abnormal events. The features are extracted from the optical flow, which represents the movement information. Experiments on a real surveillance video dataset show that our online abnormal detection technique obtains satisfactory performance. The rest of the paper is organized as follows. In Section 2, the covariance matrix descriptor of the motion feature is introduced. In Section 3, the online one-class SVM classification method is presented. In Section 4, two abnormal detection strategies based on online nonlinear one-class SVM are proposed. In Section 5, we present results on real-world video scenes. Finally, Section 6 concludes the paper.
2. Covariance Descriptor of Frame Behavior
The optical flow is a feature which represents the direction and the amplitude of a movement. It can provide important information about the spatial arrangement of the objects and the change rate of this arrangement [14]. We adopt the Horn-Schunck (HS) optical flow computation method in our work. The optical flow of a gray scale image is formulated as the minimizer of the following global energy functional:
E = ∬ [ (I_x u + I_y v + I_t)² + α² (‖∇u‖² + ‖∇v‖²) ] dx dy,  (1)
where I is the intensity of the image; I_x, I_y, and I_t are the derivatives of the image intensity value along the x, y, and time t dimensions; u and v are the components of the optical flow in the horizontal and vertical directions; and α represents the weight of the regularization term.
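The HS minimization above can be sketched with a few Jacobi-style iterations of its Euler-Lagrange update equations. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the 4-neighbour averaging, iteration count, and default α are our own choices.

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Minimal Horn-Schunck flow between two consecutive gray frames,
    iterating the Euler-Lagrange update of the energy (1)."""
    I1 = I1.astype(float)
    I2 = I2.astype(float)
    # Spatial derivatives of the intensity and the temporal derivative.
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        # Local flow averages over the 4-neighbourhood.
        u_avg = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                 + np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        v_avg = (np.roll(v, 1, 0) + np.roll(v, -1, 0)
                 + np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        # Jacobi update derived from the Euler-Lagrange equations of (1).
        common = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * common
        v = v_avg - Iy * common
    return u, v
```

Identical frames produce zero flow, since the temporal derivative vanishes everywhere.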
We introduce the covariance matrix encoding the optical flow and intensity of each frame as the descriptor representing the movement. The covariance feature descriptor was originally proposed by Tuzel et al. [15] for pattern matching in a target tracking problem. The descriptor is defined as
F(x, y, i) = φ_i(I, x, y),  (2)

where I is the color information of an image (which can be gray, RGB, HSV, HLS, etc.), φ_i is a mapping relating the image to the ith feature extracted from it, F is the W × H × d dimensional feature extracted from image I, W and H are the image width and height, and d is the number of chosen features. For each frame, the feature can be represented as a d × d covariance matrix:
C = (1/(n − 1)) Σ_{k=1}^{n} (z_k − μ)(z_k − μ)^⊤,  (3)
where n is the number of pixels sampled in the frame, z_k is the feature vector of pixel k, μ is the mean of all the selected points, and C is the covariance matrix of the feature vector F. The covariance descriptor C of a frame does not retain any information regarding the sample ordering or the number of points [15]. Because the feature F can be designed in different ways to fuse features, the covariance matrix descriptor offers a way to merge multiple parameters. Different choices of feature vector extraction are shown in
Table 1: Different choices of feature F to construct the covariance descriptor.

Feature vector F
F1 (6 × 6):   [y, x, u, v, u_x, u_y]
F2 (6 × 6):   [y, x, u, v, v_x, v_y]
F3 (8 × 8):   [y, x, u, v, u_x, u_y, v_x, v_y]
F4 (12 × 12): [y, x, u, v, u_x, u_y, v_x, v_y, u_xx, u_yy, v_xx, v_yy]
F5 (17 × 17): [y, x, u, v, u_x, u_y, v_x, v_y, u_xx, u_yy, v_xx, v_yy, I, I_x, I_y, I_xx, I_yy]
Table 1, where I is the intensity of the gray image, and u and v are the horizontal and vertical components of the optical flow. I_x, u_x, and v_x are the first derivatives of the intensity, horizontal optical flow, and vertical optical flow in the x direction, respectively; I_y, u_y, and v_y are the first derivatives of the corresponding features in the y direction; I_xx, u_xx, and v_xx are the second derivatives in the x direction; and I_yy, u_yy, and v_yy are the second derivatives in the y direction.
The flowchart of the covariance matrix descriptor computation is shown in Figure 2. The optical flow and its partial derivatives characterize the interframe information, which can be regarded as the movement information. The intensity of the frame and the partial derivatives of the intensity describe the intraframe information; they encode the appearance information of the frame.
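To make the descriptor concrete, the following sketch computes the covariance matrix (3) of an F3-style feature vector from Table 1 for one frame, assuming the optical flow components u and v have already been computed. The function name and the use of `np.gradient` for the flow derivatives are our illustrative choices; the paper's richer variants F4/F5 would add second derivatives and intensity features.

```python
import numpy as np

def covariance_descriptor(u, v):
    """Frame descriptor: covariance (3) of the F3-style feature
    [y, x, u, v, u_x, u_y, v_x, v_y] from Table 1, sampled at every pixel."""
    H, W = u.shape
    # Pixel coordinates as the first two features.
    y, x = np.mgrid[0:H, 0:W].astype(float)
    # First derivatives of the flow: np.gradient returns (d/dy, d/dx).
    uy, ux = np.gradient(u)
    vy, vx = np.gradient(v)
    # Stack the d = 8 features of every pixel into an (n, d) matrix.
    Z = np.stack([y, x, u, v, ux, uy, vx, vy], axis=-1).reshape(-1, 8)
    # Sample covariance over all pixels of the frame: a d x d matrix.
    return np.cov(Z, rowvar=False)
```

The resulting 8 × 8 matrix is symmetric positive semidefinite and independent of the pixel ordering, as noted above.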
If proper parameters are given, the traditionally used kernels, such as the Gaussian, polynomial, and sigmoidal kernels, have similar performances [19]. The Gaussian kernel κ(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)) is chosen for our spatial features. However, the covariance matrix is an element of a Lie group, and the Gaussian kernel on Euclidean spaces is not suitable for covariance descriptors. The Gaussian kernel on a Lie group is defined as [20, 21]

κ(X_i, X_j) = exp(−‖log(X_i^{−1} X_j)‖² / (2σ²)),  (X_i, X_j) ∈ G × G,  (4)

where X_i and X_j are matrices in the Lie group G.
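A direct transcription of the Lie-group kernel (4) might look as follows. This is a sketch under two assumptions of ours: the descriptors are symmetric positive definite (so the matrix logarithm is real and computable by eigendecomposition), and the norm is the Frobenius norm taken on the symmetrized form log(X1^{-1/2} X2 X1^{-1/2}), which is the standard affine-invariant distance on SPD matrices and shares its eigenvalues with log(X1^{-1} X2).

```python
import numpy as np

def spd_log(M):
    """Matrix logarithm of a symmetric positive definite matrix
    via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.log(w)) @ V.T

def lie_gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian kernel (4) between two SPD covariance descriptors."""
    w, V = np.linalg.eigh(X1)
    X1_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    # Symmetrized matrix log; same spectrum as log(X1^{-1} X2).
    L = spd_log(X1_inv_sqrt @ X2 @ X1_inv_sqrt)
    return float(np.exp(-np.linalg.norm(L, 'fro') ** 2 / (2 * sigma ** 2)))
```

By construction the kernel of a descriptor with itself is 1, and it decays toward 0 as the descriptors move apart on the SPD manifold.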
3. Online One-Class SVM
The essence of an abnormal detection problem is that only normal scene samples are available, so the one-class SVM framework is well suited to it. The support vector machine (SVM) was initially proposed by Vapnik and Lerner [22, 23]. It is a method based on statistical learning theory and performs well in classifying data and recognizing patterns. There are two frameworks of one-class SVM: one is the support vector data description (SVDD) presented in [24, 25], and the other is the ν-support vector classifier (ν-SVC) introduced in [26]. The SVDD formulation is adopted in our work. It computes a sphere-shaped decision boundary with minimal volume around a set of objects. The
Figure 1: Examples of the normal and abnormal scenes. (a) Normal lawn scene: all the people are walking. (b) Abnormal lawn scene: all the people are running.
Figure 2: Covariance descriptor computation of a video frame.
center of the sphere c and the radius R are to be determined via the following optimization problem:

min_{R, ξ, c}  R² + C Σ_{i=1}^{n} ξ_i,  (5)

subject to  ‖Φ(x_i) − c‖² ≤ R² + ξ_i,  ξ_i ≥ 0, ∀i,  (6)
where n is the number of training samples and ξ_i is the slack variable penalizing the outliers. The hyperparameter C is the weight restraining the slack variables; it tunes the number of acceptable outliers. The nonlinear function Φ : X → H maps a datum x_i into the feature space H; it allows a nonlinear classification problem to be solved by designing a linear classifier in the feature space H. κ is the kernel function computing dot products in H, κ(x, x′) = ⟨Φ(x), Φ(x′)⟩. By introducing Lagrange multipliers, the dual problem associated with (6) is written as the following quadratic optimization problem:
max_α  Σ_{i=1}^{n} α_i κ(x_i, x_i) − Σ_{i,j=1}^{n} α_i α_j κ(x_i, x_j),

subject to  0 ≤ α_i ≤ C,  Σ_{i=1}^{n} α_i = 1,

c = Σ_{i=1}^{n} α_i Φ(x_i).  (7)
The decision function is
f(x) = sgn( R² − Σ_{i,j=1}^{n} α_i α_j κ(x_i, x_j) + 2 Σ_{i=1}^{n} α_i κ(x_i, x) − κ(x, x) ).  (8)
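Given dual weights α and a squared radius R² (which in practice would come from solving the dual problem (7); here they are simply supplied as inputs), the decision function (8) can be evaluated as in the following sketch with a Gaussian kernel. The helper names and parameter values are our illustrative choices.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian kernel between two vectors."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def svdd_decision(x, X_sv, alpha, R2, sigma=1.0):
    """Evaluate the SVDD decision function (8) for a test point x,
    given support vectors X_sv, dual weights alpha, and radius^2 R2."""
    K = np.array([[rbf(a, b, sigma) for b in X_sv] for a in X_sv])
    k_x = np.array([rbf(a, x, sigma) for a in X_sv])
    # Squared distance of Phi(x) to the center, expanded via kernels.
    dist2 = alpha @ K @ alpha - 2 * alpha @ k_x + rbf(x, x, sigma)
    # +1 inside the sphere (normal), -1 outside (abnormal).
    return 1 if R2 - dist2 >= 0 else -1
```

A point near the support vectors falls inside the sphere and is labeled +1; a distant point falls outside and is labeled −1.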
For large training data, the solution cannot be obtained easily, so an online strategy to train the data is
used in our work. Let c_D denote a sparse model of the center c_n = (1/n) Σ_{i=1}^{n} Φ(x_i) using a small subset of the available samples, called the dictionary:

c_D = Σ_{i∈D} α_i Φ(x_i),  (9)
where D ⊂ {1, 2, ..., n}, and let N_D denote the cardinality of this subset x_D. The distance of a mapped datum Φ(x) with respect to the center c_D can be calculated by

‖Φ(x) − c_D‖² = Σ_{i,j∈D} α_i α_j κ(x_i, x_j) − 2 Σ_{i∈D} α_i κ(x_i, x) + κ(x, x).  (10)
A modification of the original formulation of the one-class classification algorithm, which consists of minimizing the approximation error ‖c_n − c_D‖, is [27, 28]

α = arg min_{α_i, i∈D} ‖ (1/n) Σ_{i=1}^{n} Φ(x_i) − Σ_{i∈D} α_i Φ(x_i) ‖².  (11)
The final solution is given by
α = K^{−1} κ,  (12)

where K is the Gram matrix with (i, j)th entry κ(x_i, x_j), and κ is the column vector with entries (1/n) Σ_{i=1}^{n} κ(x_k, x_i), k ∈ D.
In the online scheme, at each time step a new sample arrives. Let α_n denote the coefficients, K_n the Gram matrix, and κ_n the vector at time step n. A criterion is used to determine whether the new sample should be included in the dictionary. A threshold μ_0 is preset; for the datum x_t at time step t, the coherence-based sparsification criterion [29, 30] is

ε_t = max_{i∈D} κ(x_i, x_t).  (13)
First Case (ε_t > μ_0). In this case, the new datum Φ(x_{n+1}) is not included in the dictionary. The Gram matrix is unchanged, K_{n+1} = K_n, and κ_n is updated online:

κ_{n+1} = (1/(n + 1)) (n κ_n + b),
α_{n+1} = K^{−1}_{n+1} κ_{n+1} = (n/(n + 1)) α_n + (1/(n + 1)) K_n^{−1} b,  (14)

where b is the column vector with entries κ(x_i, x_{n+1}), i ∈ D.
Second Case (ε_t ≤ μ_0). In this case, the new datum Φ(x_{n+1}) is included in the dictionary D. The Gram matrix K changes:

K_{n+1} = [ K_n    b
            b^⊤    κ(x_{n+1}, x_{n+1}) ].  (15)
By using the Woodbury matrix identity

(A + UCV)^{−1} = A^{−1} − A^{−1} U (C^{−1} + V A^{−1} U)^{−1} V A^{−1},  (16)
K^{−1}_{n+1} can be calculated iteratively:

K^{−1}_{n+1} = [ K_n^{−1}   0
                 0^⊤        0 ]
             + (1 / (κ(x_{n+1}, x_{n+1}) − b^⊤ K_n^{−1} b)) [ −K_n^{−1} b
                                                               1 ] [ −b^⊤ K_n^{−1}   1 ].  (17)
The vector κ_{n+1} is updated from κ_n,

κ_{n+1} = (1/(n + 1)) [ n κ_n + b
                        κ̄_{n+1} ],  (18)

with

κ̄_{n+1} = Σ_{i=1}^{n+1} κ(x_{n+1}, x_i).  (19)
Computing κ̄_{n+1} as in (19) requires keeping all the samples {x_i}_{i=1}^{n+1} in memory. To overcome this issue, it can be computed as κ̄_{n+1} = (n + 1) κ(x_{n+1}, x_{n+1}) by considering an instant estimation. The update of α_{n+1} from α_n is

α_{n+1} = (1/(n + 1)) [ n α_n + K_n^{−1} b
                        0 ]
        − (1 / ((n + 1)(κ(x_{n+1}, x_{n+1}) − b^⊤ K_n^{−1} b))) [ −K_n^{−1} b
                                                                   1 ] (n b^⊤ α_n + b^⊤ K_n^{−1} b − κ̄_{n+1}).  (20)
Based on (20), we have an online implementation of the one-class SVM learning phase.
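The online scheme of (13)–(20) can be sketched as follows. This illustrative class maintains the dictionary, the inverse Gram matrix (grown via the Woodbury-based update (17)), and the kernel-mean vector κ; for simplicity it recomputes α = K⁻¹κ directly via (12) at each step, which is equivalent to the recursive updates (14) and (20). The plain RBF kernel and the values of μ0 and σ are illustrative choices, not the paper's settings.

```python
import numpy as np

class OnlineOneClassSVM:
    """Online center estimation with a coherence-sparsified dictionary,
    following (13)-(20)."""

    def __init__(self, mu0=0.5, sigma=1.0):
        self.mu0, self.sigma = mu0, sigma
        self.D = []          # dictionary samples
        self.Kinv = None     # inverse Gram matrix of the dictionary
        self.kappa = None    # running kernel-mean vector (12)
        self.alpha = None
        self.n = 0

    def _k(self, a, b):
        return np.exp(-np.sum((a - b) ** 2) / (2 * self.sigma ** 2))

    def update(self, x):
        self.n += 1
        if not self.D:
            self.D = [x]
            self.Kinv = np.array([[1.0 / self._k(x, x)]])
            self.kappa = np.array([self._k(x, x)])
            self.alpha = self.Kinv @ self.kappa
            return
        b = np.array([self._k(xi, x) for xi in self.D])
        eps = b.max()                       # coherence criterion (13)
        if eps > self.mu0:
            # First case: dictionary unchanged, kappa updated online (14).
            self.kappa = ((self.n - 1) * self.kappa + b) / self.n
        else:
            # Second case: grow the dictionary; Woodbury update (17).
            kxx = self._k(x, x)
            s = kxx - b @ self.Kinv @ b
            v = np.append(-self.Kinv @ b, 1.0)
            self.Kinv = np.pad(self.Kinv, ((0, 1), (0, 1))) + np.outer(v, v) / s
            self.D.append(x)
            # Old entries follow (18); the new entry uses the instant
            # estimate of (19), (n+1)*k(x,x), divided by (n+1).
            self.kappa = np.append(
                ((self.n - 1) * self.kappa + b) / self.n, kxx)
        self.alpha = self.Kinv @ self.kappa
```

A sample highly coherent with the dictionary leaves it untouched, while a novel sample grows both the dictionary and the inverse Gram matrix.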
4. Abnormal Events Detection
In an abnormal event detection problem, it is assumed that a set of training frames {I_1, ..., I_n} (the positive class) describing the normal behavior is available. The general architectures of abnormal detection are introduced below.
The offline training strategy refers to the case where all the training samples are learnt in one batch, as shown in Figure 3(a). We propose two abnormal detection strategies; the difference between them is the time at which the dictionary is fixed, as shown in Figures 3(b) and 3(c). In Strategy 1 (Figure 3(b)), the training data are learnt one by one. When the training period is finished, the dictionary and the classifier are fixed, and each test datum is classified based on the dictionary. Figure 3(c) illustrates Strategy 2. The training procedure is the same as in Strategy 1, but in the testing period the dictionary is updated whenever a datum x_i satisfies the dictionary update condition. The details of these two strategies are explained in the following.
Figure 3: Offline and two online abnormal event detection strategies based on one-class SVM. (a) Offline strategy: the training data are learnt as one batch offline. (b) Strategy 1: the dictionary is fixed once all the training data are learnt. (c) Strategy 2: the dictionary continues being updated through the testing period.
Figure 4: Major processing states of the proposed abnormal frame event detection method. The covariance of the frame is computed.
4.1. Strategy 1. In Strategy 1, the dictionary is updated only during the training period.
Step 1. The first step is calculating the covariance matrix descriptors of the training frames based on the image intensity and the optical flow. This step can be generalized as

{(I_1, OP_1), (I_2, OP_2), ..., (I_n, OP_n)} → {C_1, C_2, ..., C_n},  (21)

where {(I_1, OP_1), ..., (I_n, OP_n)} are the image intensity and the corresponding optical flow of the 1st to nth frames, and {C_1, C_2, ..., C_n} are the covariance matrix descriptors.
Step 2. The second step consists of applying the one-class SVM on a small subset of the extracted descriptors of the training normal frames to obtain the support vectors. Consider a subset {C_i}_{i=1}^{m}, 1 ≤ m ≪ n, of data selected from the full training sample set {C_i}_{i=1}^{n}; without loss of generality, assume that the first m examples are chosen. This set of m examples is called the dictionary C_D:

{C_1, C_2, ..., C_m}, 1 ≤ m ≪ n  →(SVM)  support vectors {Sp_1, Sp_2, ..., Sp_o},  (22)

where the set {C_1, C_2, ..., C_m} of the first m covariance matrix descriptors of the training frames is the original dictionary C_D. In the one-class SVM, the majority of the training samples do not contribute to the definition of the decision function. The members of a minority subset of the training samples, {Sp_1, Sp_2, ..., Sp_o}, o ≤ m, are the support vectors contributing to the definition of the decision function.
Step 3. After learning the dictionary C_D, which includes the first m (1 ≤ m ≪ n) samples, the training samples {C_{m+1}, C_{m+2}, ..., C_n} are learnt online via the technique described in Section 3. This step can be generalized as

{C_D, C_k}, m < k ≤ n  →(SVM)  support vectors {Sp_1, Sp_2, ..., Sp_p},  o ≤ p ≤ n,
C_D := C_D ∪ C_k  if ε_t ≤ μ_0,  (23)

where C_D is the dictionary obtained in Step 2 and C_k is a new sample from the remaining training dataset. According to the criterion introduced in Section 3, if the new sample C_k satisfies the dictionary update condition (its coherence ε_t with the dictionary does not exceed the threshold μ_0), it is included in the dictionary C_D.
Step 4. Based on the dictionary and the classifier obtained from the training frames, the incoming frame sample C_{n+l} is
Table 2: AUC of the abnormal event detection method for different features F, via the original SVDD (training samples learned offline), Strategy 1 online one-class SVM, and Strategy 2 online one-class SVM. The best value of each method in each scene is marked with an asterisk.

Features           Lawn     Indoor   Plaza
Training samples learned offline:
F1 (6 × 6, du)     0.9426   0.8351   0.9323
F2 (6 × 6, dv)     0.9400   0.8358   0.9321
F3 (8 × 8)         0.9440   0.8375   0.9359
F4 (12 × 12)       0.9591*  0.8440   0.9580
F5 (17 × 17)       0.9567   0.8649*  0.9649*
Strategy 1:
F1 (6 × 6, du)     0.9399   0.8328   0.9343
F2 (6 × 6, dv)     0.9390   0.8355   0.9366
F3 (8 × 8)         0.9418   0.8377   0.9411
F4 (12 × 12)       0.9581*  0.8457   0.9573
F5 (17 × 17)       0.9551   0.8628*  0.9632*
Strategy 2:
F1 (6 × 6, du)     0.9427   0.8237   0.9288
F2 (6 × 6, dv)     0.9370   0.8241   0.9283
F3 (8 × 8)         0.9430   0.8274   0.9312
F4 (12 × 12)       0.9605*  0.8331   0.9505
F5 (17 × 17)       0.9601   0.8495*  0.9746*
classified. The workflow of Strategy 1 is shown in Figure 4 and described by the following equation:

f(C_{n+l}) = sgn( R² − Σ_{i,j=1}^{n} α_i α_j κ(C_i, C_j) + 2 Σ_i α_i κ(C_i, C_{n+l}) − κ(C_{n+l}, C_{n+l}) )
           = { 1,  f(C_{n+l}) ≥ 0;  −1,  f(C_{n+l}) < 0, }  (24)
where C_{n+l} is the covariance matrix descriptor of the (n + l)th frame to be classified, and C_i and C_j are the samples of the dictionary C_D. "1" corresponds to a normal frame; "−1" corresponds to an abnormal frame.
4.2. Strategy 2. In this strategy, the dictionary is updated through both the training and testing periods. The feature extraction step (Step 1) and the online training steps (Steps 2 and 3) are the same as those presented in Strategy 1. The testing step is different: a newly arriving datum which is detected as normal but satisfies the dictionary update condition is included in C_D. The dictionary needs to be updated through the testing period to include new samples.
Step 4 (Strategy 2). If the incoming frame sample C_{n+l} is classified as normal (f(C_{n+l}) = 1), the datum is checked by the criterion described in Section 3. When the datum satisfies
Figure 5: Detection results of the lawn scene. (a) The detection result of one normal frame. (b) The detection result of one abnormal panic frame. (c) ROC curves of different features F for the lawn scene via one-class SVM with all training samples learned together offline; the biggest AUC value is 0.9591. (d) ROC curves of different features F via Strategy 1 online one-class SVM; the biggest AUC value is 0.9581.
the dictionary update criterion, this testing sample is included in the dictionary. This step can be generalized by the following equation:
f(C_{n+l}) = sgn( R² − Σ_{i,j=1}^{n} α_i α_j κ(C_i, C_j) + 2 Σ_i α_i κ(C_i, C_{n+l}) − κ(C_{n+l}, C_{n+l}) )

= { 1,  f(C_{n+l}) ≥ 0:  ε_t ≤ μ_0 → C_D := C_D ∪ C_{n+l};  ε_t > μ_0 → C_D := C_D;
   −1,  f(C_{n+l}) < 0, }  (25)

where the inclusion condition follows the coherence-based sparsification criterion of Section 3: a normal sample is added to the dictionary only when its coherence with the dictionary does not exceed μ_0.
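One testing step of Strategy 2, i.e. the logic of (25), can be sketched as below; `classify` and `coherence` are hypothetical stand-ins for the one-class SVM decision (24) and the coherence criterion (13), supplied by the caller.

```python
import numpy as np

def strategy2_step(C_new, classify, coherence, dictionary, mu0=0.5):
    """One Strategy 2 testing step, following (25): a frame classified
    as normal is added to the dictionary only when its coherence with
    the dictionary falls below the threshold mu0."""
    label = classify(C_new)               # +1 normal, -1 abnormal
    if label == 1 and coherence(C_new, dictionary) < mu0:
        dictionary.append(C_new)          # dictionary update: CD := CD u C
    return label, dictionary
```

An abnormal frame never enters the dictionary, and a normal frame that is already well represented leaves it unchanged, which keeps the memory footprint nearly constant.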
5. Abnormal Detection Results
This section presents the results of experiments conducted to analyze the performance of the proposed method. Competitive performance is obtained with both Strategy 1 and Strategy 2 on the UMN dataset [31].
5.1. Abnormal Visual Events Detection: Strategy 1. The results of the proposed abnormal event detection method via Strategy 1 online one-class SVM on the UMN dataset [31] are shown below.
The UMN dataset includes eleven video sequences of three different scenes (the lawn, indoor, and plaza) of crowded escape events. The normal samples for training or
Figure 6: Detection results of the indoor scene. (a) The detection result of one normal frame. (b) The detection result of one abnormal panic frame. (c) ROC curves of different features F for the indoor scene via one-class SVM with all training samples learned together offline; the biggest AUC value is 0.8649. (d) ROC curves of different features F via Strategy 1 online one-class SVM; the biggest AUC value is 0.8628.
Table 3: Comparison of the proposed method with state-of-the-art methods for abnormal event detection on the UMN dataset (area under the ROC curve).

Method              Lawn     Indoor   Plaza
Social force [16]   0.96
Optical flow [16]   0.84
NN [17]             0.93
SRC [17]            0.995    0.975    0.964
STCOG [18]          0.9362   0.7759   0.9661
COV online (ours)   0.9605   0.8628   0.9746
for normal testing are the frames where the people are walking in different directions. The samples for abnormal testing are the frames where the people are running. The detection results of the lawn, indoor, and plaza scenes are shown in Figures 5, 6, and 7, respectively. The Gaussian kernel for the Lie group is used in all three scenes. Different values of σ and of the penalty factor C are chosen; the area under the ROC curve is shown as a function of these parameters [32]. The results show that taking the covariance matrix as the descriptor obtains satisfactory performance for abnormal detection. Moreover, training the samples online obtains detection performance similar to training all the samples offline.
Figure 7: Detection results of the plaza scene. (a) The detection result of one normal frame. (b) The detection result of one abnormal panic frame. (c) ROC curves of different features F for the plaza scene via one-class SVM with all training samples learned together offline; the biggest AUC value is 0.9649. (d) ROC curves of different features F via Strategy 1 online one-class SVM; the biggest AUC value is 0.9632.
Online one-class SVM is appropriate for detecting abnormal visual events. There are 1431 frames in the lawn scene, and 480 normal frames are used for training. In the offline strategy, the covariance matrices of all 480 frames must be kept in memory. In Strategy 1, the covariance matrices of 100 frames are first taken as the dictionary. When the F5 (17 × 17) feature is adopted to construct the covariance descriptor, with the variance of the Gaussian kernel σ = 1 and the preset criterion threshold μ_0 = 0.5, the dictionary size increases from 100 to 101, and the maximum accuracy of the detection results is 91.69%. In the indoor scene, there are 2975 normal frames and 1057 abnormal frames; in the plaza scene, there are 1831 normal frames and 286 abnormal frames. The experimental processes are similar to those of the lawn scene. When the feature vector is F5 (17 × 17), σ = 1, and μ_0 = 0.5, the dictionary size of these two scenes remains 100. The online strategy thus keeps the memory size almost unchanged as the training dataset grows.
5.2. Abnormal Visual Events Detection: Strategy 2. The results of the abnormal event detection method via Strategy 2 on the UMN dataset are shown as follows. In the experiment on the lawn scene, 100 normal samples from the training set are learnt first, and then the other 380 training data are learnt online one by one. After these two training steps, we obtain the basic dictionary and the classifier from the training samples. In the following testing step, the dictionary is updated whenever a sample satisfies the dictionary update criterion. When a new sample arrives, it is first classified by the previous classifier. If it is classified as an anomaly,
Figure 8: ROC curves on the UMN dataset. (a) ROC curves of different features F via Strategy 2 for the lawn scene; the biggest AUC value is 0.9605. (b) Strategy 2 results for the indoor scene; the biggest AUC value is 0.8495. (c) Strategy 2 results for the plaza scene; the biggest AUC value is 0.9746. (d) ROC curves of the best performance for the lawn, indoor, and plaza scenes when the training samples are learnt offline; the biggest AUC values are 0.9591, 0.8649, and 0.9649, respectively.
the dictionary and the classifier are not changed. Otherwise, if the sample is classified as normal, the sparsification criterion introduced in Section 3 is used to check the correlation between the current dictionary and this new datum; the datum is included in the dictionary when it satisfies the update condition. The dictionary is updated throughout the whole testing period. The other two scenes, indoor and plaza, are handled by the same method. When the F5 (17 × 17) feature is adopted, with the variance of the Gaussian kernel σ = 1 and the preset criterion threshold μ_0 = 0.5, the dictionary size of the lawn, indoor, and plaza scenes increases from 100 to 106, 102, and 102, respectively. The ROC curves of the detection results of these three scenes are shown in Figures 8(a), 8(b), and 8(c). Besides the memory-saving merit of Strategy 1, Strategy 2 also has the advantage of adapting to long-duration sequences.
The performances of the offline strategy, Strategy 1, and Strategy 2 are shown in Table 2. The performances of the two online strategies are similar to those obtained when all training samples are learnt together. When F4 (12 × 12) or F5 (17 × 17) is chosen as the feature to form the covariance matrix descriptor, the results show the best performance.
These two features are richer in the movement and intensity information they include.
The performances of the covariance matrix descriptor based online one-class SVM method and of the state-of-the-art methods are shown in Table 3. The covariance matrix based online abnormal frame detection method obtains competitive performance. In general, our method is better than the others, except the sparse reconstruction cost (SRC) [17] in the lawn and indoor scenes. In that paper, multiscale HOF is taken as the feature, and a testing sample is classified by its sparse reconstruction cost, through a weighted linear reconstruction of the overcomplete normal basis set. However, computation of the HOF might take more time than calculating the covariance. By adopting the integral image [15], the covariance matrix descriptor of a subimage can be computed conveniently, so the covariance descriptor can also be appropriately used to analyze partial movement.
6. Conclusions
A method for frame-level abnormal event detection is proposed. The method consists of a covariance matrix descriptor encoding the movement features and an online nonlinear one-class SVM classification method. We have developed two nonlinear one-class SVM based abnormal event detection techniques that update the normal models of the surveillance video data in an online framework. The proposed algorithm has been tested on a video dataset, yielding successful results in detecting abnormal events.
Acknowledgments
This work is partially supported by the China Scholarship Council of the Chinese Government, the SURECAP CPER Project funded by Région Champagne-Ardenne, and the CAPSEC CRCA FEDER platform.