Hindawi Publishing Corporation, Journal of Electrical and Computer Engineering, Volume 2013, Article ID 837275, 12 pages, http://dx.doi.org/10.1155/2013/837275
Research Article: Online Detection of Abnormal Events in Video Streams
Tian Wang,1 Jie Chen,2 and Hichem Snoussi1
1 Institut Charles Delaunay, LM2S-UMR STMR 6279 CNRS, University of Technology of Troyes, 10004 Troyes, France
2 Observatoire de la Côte d'Azur, UMR 7293 CNRS, University of Nice Sophia-Antipolis, 06108 Nice, France
Correspondence should be addressed to Tian Wang; [email protected]
Received 19 September 2013; Accepted 12 November 2013
Academic Editor: Yi Zhou
Copyright © 2013 Tian Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We propose an algorithm to handle the problem of detecting abnormal events, which is a challenging but important subject in video surveillance. The algorithm consists of an image descriptor and an online nonlinear classification method. We introduce the covariance matrix of the optical flow and image intensity as a descriptor encoding the moving information. The nonlinear online support vector machine (SVM) first learns a limited set of training frames to provide a basic reference model, then updates the model and detects abnormal events in the current frame. We finally apply the method to a benchmark video surveillance dataset to demonstrate the effectiveness of the proposed technique.
1. Introduction
Visual surveillance is one of the major research areas in computer vision. In crowd image analysis, the scientific challenges include abnormal event detection. For instance, Figure 1(a) illustrates a normal scene where the people are walking. In Figure 1(b), all the people are suddenly running in different directions. This dataset imitates panic-driven scenes.
Trajectory analysis of objects was described in [1–3]. The moving object was labeled by a blob in consecutive frames, and then a trajectory was produced. Deviation from the learnt trajectories was defined as an abnormal event. Tracking-based approaches are suitable for sparse scenes with a few objects, but the target might be lost due to occlusion.
In [4, 5], abnormal detection approaches using features encoding the motion, texture, and size of the objects were introduced. Local image regions in a video were analyzed by employing a background subtraction method; then a dynamic Bayesian network (DBN) was constructed to model normal and abnormal behavior, and finally a likelihood ratio test was applied to detect abnormal behaviors. In [6], a space-time Markov random field (MRF) model which detects abnormal activities in a video was proposed; a mixture of probabilistic principal component analyzers (MPPCA) was adopted to model the local optical flow. These predictions are based on probabilistic techniques that assume an accurate model exists, but there are various situations where a robust and tractable model cannot be obtained; model-free methods need to be studied.
Spatiotemporal motion features described in the context of a bag of video words were also adopted to detect abnormal events. In [7], the authors presented an algorithm which monitored the optical flow in a set of fixed spatial positions and constructed a histogram of optical flow. The likelihood of the behavior in a newly arriving frame with respect to the probability distribution of the statistically learnt behavior was computed; if the likelihood fell below a preset threshold, the behavior was considered abnormal. In [8], irregular behavior in images or videos was detected by an inference process in a probabilistic graphical model. In [9, 10], the video pixels were densely sampled to form the feature. These methods are based on partial information of the images, such as small blocks of a frame, without fully exploiting the global information of the feature. In [11–13], spatiotemporal features modeled the motion regions of the frame as background, and anomalies were detected by subtracting the new sample from the background template. These works are similar to change detection methods when the background is not stable.
In this paper, the proposed algorithm is composed of two parts. Firstly, a covariance feature descriptor is constructed over the whole video frame, and then a nonlinear one-class
support vector machine algorithm is applied in an online fashion in order to detect abnormal events. The features are extracted from the optical flow, which represents the movement information. Experiments on a real surveillance video dataset show that our online abnormal detection technique obtains satisfactory performance. The rest of the paper is organized as follows. In Section 2, the covariance matrix descriptor of the motion feature is introduced. In Section 3, the online one-class SVM classification method is presented. In Section 4, two abnormal detection strategies based on online nonlinear one-class SVM are proposed. In Section 5, we present results on real-world video scenes. Finally, Section 6 concludes the paper.
2. Covariance Descriptor of Frame Behavior
The optical flow is a feature which represents the direction and the amplitude of a movement. It can provide important information about the spatial arrangement of the objects and the change rate of this arrangement [14]. We adopt the Horn-Schunck (HS) optical flow computation method in our work. The optical flow of a gray scale image is formulated as the minimizer of the following global energy functional:
E = ∬ [ (I_x u + I_y v + I_t)² + α² (‖∇u‖² + ‖∇v‖²) ] dx dy,  (1)
where I is the intensity of the image; I_x, I_y, and I_t are the derivatives of the image intensity value along the x, y, and time t dimensions; u and v are the components of the optical flow in the horizontal and vertical directions; and α represents the weight of the regularization term.
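The HS minimization above can be sketched with a few Jacobi-style iterations of its Euler-Lagrange update equations. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the 4-neighbour averaging, iteration count, and default α are our own choices.

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Minimal Horn-Schunck flow between two consecutive gray frames,
    iterating the Euler-Lagrange update of the energy (1)."""
    I1 = I1.astype(float)
    I2 = I2.astype(float)
    # Spatial derivatives of the intensity and the temporal derivative.
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        # Local flow averages over the 4-neighbourhood.
        u_avg = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                 + np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        v_avg = (np.roll(v, 1, 0) + np.roll(v, -1, 0)
                 + np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        # Jacobi update derived from the Euler-Lagrange equations of (1).
        common = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * common
        v = v_avg - Iy * common
    return u, v
```

Identical frames produce zero flow, since the temporal derivative vanishes everywhere.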
We introduce the covariance matrix encoding the optical flow and intensity of each frame as the descriptor representing the movement. The covariance feature descriptor was originally proposed by Tuzel et al. [15] for pattern matching in a target tracking problem. The descriptor is defined as
F(x, y, i) = φ_i(I, x, y),  (2)

where I is the color information of an image (which can be gray, RGB, HSV, HLS, etc.), φ_i is a mapping relating the image to the ith feature extracted from it, F is the W × H × d dimensional feature extracted from image I, W and H are the image width and height, and d is the number of chosen features. For each frame, the feature can be represented as a d × d covariance matrix:
C = (1/(n − 1)) Σ_{k=1}^{n} (z_k − μ)(z_k − μ)^⊤,  (3)
where n is the number of pixels sampled in the frame, z_k is the feature vector of pixel k, μ is the mean of all the selected points, and C is the covariance matrix of the feature vector F. The covariance descriptor C of a frame does not retain any information regarding the sample ordering or the number of points [15]. Because the feature F can be designed in different ways to fuse features, the covariance matrix descriptor offers a way to merge multiple parameters. Different choices of feature vector extraction are shown in
Table 1: Different choices of feature F to construct the covariance descriptor.

Feature vector F
F1 (6 × 6):   [y, x, u, v, u_x, u_y]
F2 (6 × 6):   [y, x, u, v, v_x, v_y]
F3 (8 × 8):   [y, x, u, v, u_x, u_y, v_x, v_y]
F4 (12 × 12): [y, x, u, v, u_x, u_y, v_x, v_y, u_xx, u_yy, v_xx, v_yy]
F5 (17 × 17): [y, x, u, v, u_x, u_y, v_x, v_y, u_xx, u_yy, v_xx, v_yy, I, I_x, I_y, I_xx, I_yy]
Table 1, where I is the intensity of the gray image, and u and v are the horizontal and vertical components of the optical flow. I_x, u_x, and v_x are the first derivatives of the intensity, horizontal optical flow, and vertical optical flow in the x direction, respectively; I_y, u_y, and v_y are the first derivatives of the corresponding features in the y direction; I_xx, u_xx, and v_xx are the second derivatives in the x direction; and I_yy, u_yy, and v_yy are the second derivatives in the y direction.
The flowchart of the covariance matrix descriptor computation is shown in Figure 2. The optical flow and its partial derivatives characterize the interframe information, which can be regarded as the movement information. The intensity of the frame and the partial derivatives of the intensity describe the intraframe information; they encode the appearance information of the frame.
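To make the descriptor concrete, the following sketch computes the covariance matrix (3) of an F3-style feature vector from Table 1 for one frame, assuming the optical flow components u and v have already been computed. The function name and the use of `np.gradient` for the flow derivatives are our illustrative choices; the paper's richer variants F4/F5 would add second derivatives and intensity features.

```python
import numpy as np

def covariance_descriptor(u, v):
    """Frame descriptor: covariance (3) of the F3-style feature
    [y, x, u, v, u_x, u_y, v_x, v_y] from Table 1, sampled at every pixel."""
    H, W = u.shape
    # Pixel coordinates as the first two features.
    y, x = np.mgrid[0:H, 0:W].astype(float)
    # First derivatives of the flow: np.gradient returns (d/dy, d/dx).
    uy, ux = np.gradient(u)
    vy, vx = np.gradient(v)
    # Stack the d = 8 features of every pixel into an (n, d) matrix.
    Z = np.stack([y, x, u, v, ux, uy, vx, vy], axis=-1).reshape(-1, 8)
    # Sample covariance over all pixels of the frame: a d x d matrix.
    return np.cov(Z, rowvar=False)
```

The resulting 8 × 8 matrix is symmetric positive semidefinite and independent of the pixel ordering, as noted above.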
If proper parameters are given, the traditionally used kernels, such as the Gaussian, polynomial, and sigmoidal kernels, have similar performances [19]. The Gaussian kernel κ(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)) is chosen for our spatial features. However, the covariance matrix is an element of a Lie group, and the Gaussian kernel on Euclidean spaces is not suitable for covariance descriptors. The Gaussian kernel on a Lie group is defined as [20, 21]

κ(X_i, X_j) = exp(−‖log(X_i^{−1} X_j)‖² / (2σ²)),  (X_i, X_j) ∈ G × G,  (4)

where X_i and X_j are matrices in the Lie group G.
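A direct transcription of the Lie-group kernel (4) might look as follows. This is a sketch under two assumptions of ours: the descriptors are symmetric positive definite (so the matrix logarithm is real and computable by eigendecomposition), and the norm is the Frobenius norm taken on the symmetrized form log(X1^{-1/2} X2 X1^{-1/2}), which is the standard affine-invariant distance on SPD matrices and shares its eigenvalues with log(X1^{-1} X2).

```python
import numpy as np

def spd_log(M):
    """Matrix logarithm of a symmetric positive definite matrix
    via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.log(w)) @ V.T

def lie_gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian kernel (4) between two SPD covariance descriptors."""
    w, V = np.linalg.eigh(X1)
    X1_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    # Symmetrized matrix log; same spectrum as log(X1^{-1} X2).
    L = spd_log(X1_inv_sqrt @ X2 @ X1_inv_sqrt)
    return float(np.exp(-np.linalg.norm(L, 'fro') ** 2 / (2 * sigma ** 2)))
```

By construction the kernel of a descriptor with itself is 1, and it decays toward 0 as the descriptors move apart on the SPD manifold.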
3. Online One-Class SVM
The essence of an abnormal detection problem is that only normal scene samples are available, so the one-class SVM framework is well suited to it. The support vector machine (SVM) was initially proposed by Vapnik and Lerner [22, 23]. It is a method based on statistical learning theory and performs well in classifying data and recognizing patterns. There are two frameworks of one-class SVM: one is the support vector data description (SVDD) presented in [24, 25], and the other is the ν-support vector classifier (ν-SVC) introduced in [26]. The SVDD formulation is adopted in our work. It computes a sphere-shaped decision boundary with minimal volume around a set of objects. The
Figure 1: Examples of the normal and abnormal scenes. (a) Normal lawn scene: all the people are walking. (b) Abnormal lawn scene: all the people are running.
Figure 2: Covariance descriptor computation of a video frame.
center of the sphere c and the radius R are to be determined via the following optimization problem:

min_{R, ξ, c}  R² + C Σ_{i=1}^{n} ξ_i,  (5)

subject to  ‖Φ(x_i) − c‖² ≤ R² + ξ_i,  ξ_i ≥ 0, ∀i,  (6)
where n is the number of training samples and ξ_i is the slack variable penalizing the outliers. The hyperparameter C is the weight restraining the slack variables; it tunes the number of acceptable outliers. The nonlinear function Φ : X → H maps a datum x_i into the feature space H; it allows a nonlinear classification problem to be solved by designing a linear classifier in the feature space H. κ is the kernel function computing dot products in H, κ(x, x′) = ⟨Φ(x), Φ(x′)⟩. By introducing Lagrange multipliers, the dual problem associated with (6) is written as the following quadratic optimization problem:
max_α  Σ_{i=1}^{n} α_i κ(x_i, x_i) − Σ_{i,j=1}^{n} α_i α_j κ(x_i, x_j),

subject to  0 ≤ α_i ≤ C,  Σ_{i=1}^{n} α_i = 1,

c = Σ_{i=1}^{n} α_i Φ(x_i).  (7)
The decision function is
f(x) = sgn( R² − Σ_{i,j=1}^{n} α_i α_j κ(x_i, x_j) + 2 Σ_{i=1}^{n} α_i κ(x_i, x) − κ(x, x) ).  (8)
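Given dual weights α and a squared radius R² (which in practice would come from solving the dual problem (7); here they are simply supplied as inputs), the decision function (8) can be evaluated as in the following sketch with a Gaussian kernel. The helper names and parameter values are our illustrative choices.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian kernel between two vectors."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def svdd_decision(x, X_sv, alpha, R2, sigma=1.0):
    """Evaluate the SVDD decision function (8) for a test point x,
    given support vectors X_sv, dual weights alpha, and radius^2 R2."""
    K = np.array([[rbf(a, b, sigma) for b in X_sv] for a in X_sv])
    k_x = np.array([rbf(a, x, sigma) for a in X_sv])
    # Squared distance of Phi(x) to the center, expanded via kernels.
    dist2 = alpha @ K @ alpha - 2 * alpha @ k_x + rbf(x, x, sigma)
    # +1 inside the sphere (normal), -1 outside (abnormal).
    return 1 if R2 - dist2 >= 0 else -1
```

A point near the support vectors falls inside the sphere and is labeled +1; a distant point falls outside and is labeled −1.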
For large training data, the solution cannot be obtained easily, so an online strategy to train the data is
used in our work. Let c_D denote a sparse model of the center c_n = (1/n) Σ_{i=1}^{n} Φ(x_i) using a small subset of the available samples, called the dictionary:

c_D = Σ_{i∈D} α_i Φ(x_i),  (9)
where D ⊂ {1, 2, ..., n}, and let N_D denote the cardinality of this subset x_D. The distance of a mapped datum Φ(x) with respect to the center c_D can be calculated by

‖Φ(x) − c_D‖² = Σ_{i,j∈D} α_i α_j κ(x_i, x_j) − 2 Σ_{i∈D} α_i κ(x_i, x) + κ(x, x).  (10)
A modification of the original formulation of the one-class classification algorithm, which consists of minimizing the approximation error ‖c_n − c_D‖, is [27, 28]

α = arg min_{α_i, i∈D} ‖ (1/n) Σ_{i=1}^{n} Φ(x_i) − Σ_{i∈D} α_i Φ(x_i) ‖².  (11)
The final solution is given by
α = K^{−1} κ,  (12)

where K is the Gram matrix with (i, j)th entry κ(x_i, x_j), and κ is the column vector with entries (1/n) Σ_{i=1}^{n} κ(x_k, x_i), k ∈ D.
In the online scheme, at each time step a new sample arrives. Let α_n denote the coefficients, K_n the Gram matrix, and κ_n the vector at time step n. A criterion is used to determine whether the new sample should be included in the dictionary. A threshold μ_0 is preset; for the datum x_t at time step t, the coherence-based sparsification criterion [29, 30] is

ε_t = max_{i∈D} κ(x_i, x_t).  (13)
First Case (ε_t > μ_0). In this case, the new datum Φ(x_{n+1}) is not included in the dictionary. The Gram matrix is unchanged, K_{n+1} = K_n, and κ_n is updated online:

κ_{n+1} = (1/(n + 1)) (n κ_n + b),
α_{n+1} = K^{−1}_{n+1} κ_{n+1} = (n/(n + 1)) α_n + (1/(n + 1)) K_n^{−1} b,  (14)

where b is the column vector with entries κ(x_i, x_{n+1}), i ∈ D.
Second Case (ε_t ≤ μ_0). In this case, the new datum Φ(x_{n+1}) is included in the dictionary D. The Gram matrix K changes:

K_{n+1} = [ K_n    b
            b^⊤    κ(x_{n+1}, x_{n+1}) ].  (15)
By using the Woodbury matrix identity

(A + UCV)^{−1} = A^{−1} − A^{−1} U (C^{−1} + V A^{−1} U)^{−1} V A^{−1},  (16)
K^{−1}_{n+1} can be calculated iteratively:

K^{−1}_{n+1} = [ K_n^{−1}   0
                 0^⊤        0 ]
             + (1 / (κ(x_{n+1}, x_{n+1}) − b^⊤ K_n^{−1} b)) [ −K_n^{−1} b
                                                               1 ] [ −b^⊤ K_n^{−1}   1 ].  (17)
The vector κ_{n+1} is updated from κ_n,

κ_{n+1} = (1/(n + 1)) [ n κ_n + b
                        κ̄_{n+1} ],  (18)

with

κ̄_{n+1} = Σ_{i=1}^{n+1} κ(x_{n+1}, x_i).  (19)
Computing κ̄_{n+1} as in (19) requires keeping all the samples {x_i}_{i=1}^{n+1} in memory. To overcome this issue, it can be computed as κ̄_{n+1} = (n + 1) κ(x_{n+1}, x_{n+1}) by considering an instant estimation. The update of α_{n+1} from α_n is

α_{n+1} = (1/(n + 1)) [ n α_n + K_n^{−1} b
                        0 ]
        − (1 / ((n + 1)(κ(x_{n+1}, x_{n+1}) − b^⊤ K_n^{−1} b))) [ −K_n^{−1} b
                                                                   1 ] (n b^⊤ α_n + b^⊤ K_n^{−1} b − κ̄_{n+1}).  (20)
Based on (20), we have an online implementation of the one-class SVM learning phase.
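The online scheme of (13)–(20) can be sketched as follows. This illustrative class maintains the dictionary, the inverse Gram matrix (grown via the Woodbury-based update (17)), and the kernel-mean vector κ; for simplicity it recomputes α = K⁻¹κ directly via (12) at each step, which is equivalent to the recursive updates (14) and (20). The plain RBF kernel and the values of μ0 and σ are illustrative choices, not the paper's settings.

```python
import numpy as np

class OnlineOneClassSVM:
    """Online center estimation with a coherence-sparsified dictionary,
    following (13)-(20)."""

    def __init__(self, mu0=0.5, sigma=1.0):
        self.mu0, self.sigma = mu0, sigma
        self.D = []          # dictionary samples
        self.Kinv = None     # inverse Gram matrix of the dictionary
        self.kappa = None    # running kernel-mean vector (12)
        self.alpha = None
        self.n = 0

    def _k(self, a, b):
        return np.exp(-np.sum((a - b) ** 2) / (2 * self.sigma ** 2))

    def update(self, x):
        self.n += 1
        if not self.D:
            self.D = [x]
            self.Kinv = np.array([[1.0 / self._k(x, x)]])
            self.kappa = np.array([self._k(x, x)])
            self.alpha = self.Kinv @ self.kappa
            return
        b = np.array([self._k(xi, x) for xi in self.D])
        eps = b.max()                       # coherence criterion (13)
        if eps > self.mu0:
            # First case: dictionary unchanged, kappa updated online (14).
            self.kappa = ((self.n - 1) * self.kappa + b) / self.n
        else:
            # Second case: grow the dictionary; Woodbury update (17).
            kxx = self._k(x, x)
            s = kxx - b @ self.Kinv @ b
            v = np.append(-self.Kinv @ b, 1.0)
            self.Kinv = np.pad(self.Kinv, ((0, 1), (0, 1))) + np.outer(v, v) / s
            self.D.append(x)
            # Old entries follow (18); the new entry uses the instant
            # estimate of (19), (n+1)*k(x,x), divided by (n+1).
            self.kappa = np.append(
                ((self.n - 1) * self.kappa + b) / self.n, kxx)
        self.alpha = self.Kinv @ self.kappa
```

A sample highly coherent with the dictionary leaves it untouched, while a novel sample grows both the dictionary and the inverse Gram matrix.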
4. Abnormal Events Detection
In an abnormal event detection problem, it is assumed that a set of training frames {I_1, ..., I_n} (the positive class) describing the normal behavior is available. The general architectures of abnormal detection are introduced below.
The offline training strategy refers to the case where all the training samples are learnt in one batch, as shown in Figure 3(a). We propose two abnormal detection strategies; the difference between them is the time at which the dictionary is fixed, as shown in Figures 3(b) and 3(c). In Strategy 1 (Figure 3(b)), the training data are learnt one by one. When the training period is finished, the dictionary and the classifier are fixed, and each test datum is classified based on the dictionary. Figure 3(c) illustrates Strategy 2. The training procedure is the same as in Strategy 1, but in the testing period the dictionary is updated whenever a datum x_i satisfies the dictionary update condition. The details of these two strategies are explained in the following.
Figure 3: Offline and two online abnormal event detection strategies based on one-class SVM. (a) Offline strategy: the training data are learnt as one batch offline. (b) Strategy 1: the dictionary is fixed once all the training data are learnt. (c) Strategy 2: the dictionary continues being updated through the testing period.
Figure 4: Major processing states of the proposed abnormal frame event detection method. The covariance of the frame is computed.
4.1. Strategy 1. In Strategy 1, the dictionary is updated only during the training period.
Step 1. The first step is calculating the covariance matrix descriptors of the training frames based on the image intensity and the optical flow. This step can be generalized as

{(I_1, OP_1), (I_2, OP_2), ..., (I_n, OP_n)} → {C_1, C_2, ..., C_n},  (21)

where {(I_1, OP_1), ..., (I_n, OP_n)} are the image intensity and the corresponding optical flow of the 1st to nth frames, and {C_1, C_2, ..., C_n} are the covariance matrix descriptors.
Step 2. The second step consists of applying the one-class SVM on a small subset of the extracted descriptors of the training normal frames to obtain the support vectors. Consider a subset {C_i}_{i=1}^{m}, 1 ≤ m ≪ n, of data selected from the full training sample set {C_i}_{i=1}^{n}; without loss of generality, assume that the first m examples are chosen. This set of m examples is called the dictionary C_D:

{C_1, C_2, ..., C_m}, 1 ≤ m ≪ n  →(SVM)  support vectors {Sp_1, Sp_2, ..., Sp_o},  (22)

where the set {C_1, C_2, ..., C_m} of the first m covariance matrix descriptors of the training frames is the original dictionary C_D. In the one-class SVM, the majority of the training samples do not contribute to the definition of the decision function. The members of a minority subset of the training samples, {Sp_1, Sp_2, ..., Sp_o}, o ≤ m, are the support vectors contributing to the definition of the decision function.
Step 3. After learning the dictionary C_D, which includes the first m (1 ≤ m ≪ n) samples, the training samples {C_{m+1}, C_{m+2}, ..., C_n} are learnt online via the technique described in Section 3. This step can be generalized as

{C_D, C_k}, m < k ≤ n  →(SVM)  support vectors {Sp_1, Sp_2, ..., Sp_p},  o ≤ p ≤ n,
C_D := C_D ∪ C_k  if ε_t ≤ μ_0,  (23)

where C_D is the dictionary obtained in Step 2 and C_k is a new sample from the remaining training dataset. According to the criterion introduced in Section 3, if the new sample C_k satisfies the dictionary update condition (its coherence ε_t with the dictionary does not exceed the threshold μ_0), it is included in the dictionary C_D.
Step 4. Based on the dictionary and the classifier obtained from the training frames, the incoming frame sample C_{n+l} is
Table 2: AUC of the abnormal event detection method for different features F, via the original SVDD (training samples learned offline), Strategy 1 online one-class SVM, and Strategy 2 online one-class SVM. The best value of each method in each scene is marked with an asterisk.

Features           Lawn     Indoor   Plaza
Training samples learned offline:
F1 (6 × 6, du)     0.9426   0.8351   0.9323
F2 (6 × 6, dv)     0.9400   0.8358   0.9321
F3 (8 × 8)         0.9440   0.8375   0.9359
F4 (12 × 12)       0.9591*  0.8440   0.9580
F5 (17 × 17)       0.9567   0.8649*  0.9649*
Strategy 1:
F1 (6 × 6, du)     0.9399   0.8328   0.9343
F2 (6 × 6, dv)     0.9390   0.8355   0.9366
F3 (8 × 8)         0.9418   0.8377   0.9411
F4 (12 × 12)       0.9581*  0.8457   0.9573
F5 (17 × 17)       0.9551   0.8628*  0.9632*
Strategy 2:
F1 (6 × 6, du)     0.9427   0.8237   0.9288
F2 (6 × 6, dv)     0.9370   0.8241   0.9283
F3 (8 × 8)         0.9430   0.8274   0.9312
F4 (12 × 12)       0.9605*  0.8331   0.9505
F5 (17 × 17)       0.9601   0.8495*  0.9746*
classified. The workflow of Strategy 1 is shown in Figure 4 and described by the following equation:

f(C_{n+l}) = sgn( R² − Σ_{i,j=1}^{n} α_i α_j κ(C_i, C_j) + 2 Σ_i α_i κ(C_i, C_{n+l}) − κ(C_{n+l}, C_{n+l}) )
           = { 1,  f(C_{n+l}) ≥ 0;  −1,  f(C_{n+l}) < 0, }  (24)
where C_{n+l} is the covariance matrix descriptor of the (n + l)th frame to be classified, and C_i and C_j are the samples of the dictionary C_D. "1" corresponds to a normal frame; "−1" corresponds to an abnormal frame.
4.2. Strategy 2. In this strategy, the dictionary is updated through both the training and testing periods. The feature extraction step (Step 1) and the online training steps (Steps 2 and 3) are the same as those presented in Strategy 1. The testing step is different: a newly arriving datum which is detected as normal but satisfies the dictionary update condition is included in C_D. The dictionary needs to be updated through the testing period to include new samples.
Step 4 (Strategy 2). If the incoming frame sample C_{n+l} is classified as normal (f(C_{n+l}) = 1), the datum is checked by the criterion described in Section 3. When the datum satisfies
Figure 5: Detection results of the lawn scene. (a) The detection result of one normal frame. (b) The detection result of one abnormal panic frame. (c) ROC curves of different features F for the lawn scene via one-class SVM with all training samples learned together offline; the biggest AUC value is 0.9591. (d) ROC curves of different features F via Strategy 1 online one-class SVM; the biggest AUC value is 0.9581.
the dictionary update criterion, this testing sample is included in the dictionary. This step can be generalized by the following equation:
f(C_{n+l}) = sgn( R² − Σ_{i,j=1}^{n} α_i α_j κ(C_i, C_j) + 2 Σ_i α_i κ(C_i, C_{n+l}) − κ(C_{n+l}, C_{n+l}) )

= { 1,  f(C_{n+l}) ≥ 0:  ε_t ≤ μ_0 → C_D := C_D ∪ C_{n+l};  ε_t > μ_0 → C_D := C_D;
   −1,  f(C_{n+l}) < 0, }  (25)

where the inclusion condition follows the coherence-based sparsification criterion of Section 3: a normal sample is added to the dictionary only when its coherence with the dictionary does not exceed μ_0.
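One testing step of Strategy 2, i.e. the logic of (25), can be sketched as below; `classify` and `coherence` are hypothetical stand-ins for the one-class SVM decision (24) and the coherence criterion (13), supplied by the caller.

```python
import numpy as np

def strategy2_step(C_new, classify, coherence, dictionary, mu0=0.5):
    """One Strategy 2 testing step, following (25): a frame classified
    as normal is added to the dictionary only when its coherence with
    the dictionary falls below the threshold mu0."""
    label = classify(C_new)               # +1 normal, -1 abnormal
    if label == 1 and coherence(C_new, dictionary) < mu0:
        dictionary.append(C_new)          # dictionary update: CD := CD u C
    return label, dictionary
```

An abnormal frame never enters the dictionary, and a normal frame that is already well represented leaves it unchanged, which keeps the memory footprint nearly constant.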
5. Abnormal Detection Results
This section presents the results of experiments conducted to analyze the performance of the proposed method. Competitive performance is obtained with both Strategy 1 and Strategy 2 on the UMN dataset [31].
5.1. Abnormal Visual Events Detection: Strategy 1. The results of the proposed abnormal event detection method via Strategy 1 online one-class SVM on the UMN dataset [31] are shown below.
The UMN dataset includes eleven video sequences of three different scenes (the lawn, indoor, and plaza) of crowded escape events. The normal samples for training or
Figure 6: Detection results of the indoor scene. (a) The detection result of one normal frame. (b) The detection result of one abnormal panic frame. (c) ROC curves of different features F for the indoor scene via one-class SVM with all training samples learned together offline; the biggest AUC value is 0.8649. (d) ROC curves of different features F via Strategy 1 online one-class SVM; the biggest AUC value is 0.8628.
Table 3: Comparison of the proposed method with state-of-the-art methods for abnormal event detection on the UMN dataset (area under the ROC curve).

Method              Lawn     Indoor   Plaza
Social force [16]   0.96
Optical flow [16]   0.84
NN [17]             0.93
SRC [17]            0.995    0.975    0.964
STCOG [18]          0.9362   0.7759   0.9661
COV online (ours)   0.9605   0.8628   0.9746
for normal testing are the frames where the people are walking in different directions. The samples for abnormal testing are the frames where the people are running. The detection results of the lawn, indoor, and plaza scenes are shown in Figures 5, 6, and 7, respectively. The Gaussian kernel for the Lie group is used in all three scenes. Different values of σ and of the penalty factor C are chosen; the area under the ROC curve is shown as a function of these parameters [32]. The results show that taking the covariance matrix as the descriptor obtains satisfactory performance for abnormal detection. Moreover, training the samples online obtains detection performance similar to training all the samples offline.
Figure 7: Detection results of the plaza scene. (a) The detection result of one normal frame. (b) The detection result of one abnormal panic frame. (c) ROC curves of different features F for the plaza scene via one-class SVM with all training samples learned together offline; the biggest AUC value is 0.9649. (d) ROC curves of different features F via Strategy 1 online one-class SVM; the biggest AUC value is 0.9632.
Online one-class SVM is appropriate for detecting abnormal visual events. There are 1431 frames in the lawn scene, and 480 normal frames are used for training. In the offline strategy, the covariance matrices of all 480 frames must be kept in memory. In Strategy 1, the covariance matrices of 100 frames are first taken as the dictionary. When the F5 (17 × 17) feature is adopted to construct the covariance descriptor, with the variance of the Gaussian kernel σ = 1 and the preset criterion threshold μ_0 = 0.5, the dictionary size increases from 100 to 101, and the maximum accuracy of the detection results is 91.69%. In the indoor scene, there are 2975 normal frames and 1057 abnormal frames; in the plaza scene, there are 1831 normal frames and 286 abnormal frames. The experimental processes are similar to those of the lawn scene. When the feature vector is F5 (17 × 17), σ = 1, and μ_0 = 0.5, the dictionary size of these two scenes remains 100. The online strategy thus keeps the memory size almost unchanged as the training dataset grows.
5.2. Abnormal Visual Events Detection: Strategy 2. The results of the abnormal event detection method via Strategy 2 on the UMN dataset are shown as follows. In the experiment on the lawn scene, 100 normal samples from the training set are learnt first, and then the other 380 training data are learnt online one by one. After these two training steps, we obtain the basic dictionary and the classifier from the training samples. In the following testing step, the dictionary is updated whenever a sample satisfies the dictionary update criterion. When a new sample arrives, it is first classified by the previous classifier. If it is classified as an anomaly,
Figure 8: ROC curves on the UMN dataset. (a) ROC curves of different features F via Strategy 2 for the lawn scene; the biggest AUC value is 0.9605. (b) Strategy 2 results for the indoor scene; the biggest AUC value is 0.8495. (c) Strategy 2 results for the plaza scene; the biggest AUC value is 0.9746. (d) ROC curves of the best performance for the lawn, indoor, and plaza scenes when the training samples are learnt offline; the biggest AUC values are 0.9591, 0.8649, and 0.9649, respectively.
the dictionary and the classifier are not changed. Otherwise, if the sample is classified as normal, the sparsification criterion introduced in Section 3 is used to check the correlation between the current dictionary and this new datum; the datum is included in the dictionary when it satisfies the update condition. The dictionary is updated throughout the whole testing period. The other two scenes, indoor and plaza, are handled by the same method. When the F5 (17 × 17) feature is adopted, with the variance of the Gaussian kernel σ = 1 and the preset criterion threshold μ_0 = 0.5, the dictionary size of the lawn, indoor, and plaza scenes increases from 100 to 106, 102, and 102, respectively. The ROC curves of the detection results of these three scenes are shown in Figures 8(a), 8(b), and 8(c). Besides the memory-saving merit of Strategy 1, Strategy 2 also has the advantage of adapting to long-duration sequences.
The performances of the offline strategy, Strategy 1, and Strategy 2 are shown in Table 2. The performances of the two online strategies are similar to those obtained when all training samples are learnt together. When F4 (12 × 12) or F5 (17 × 17) is chosen as the feature to form the covariance matrix descriptor, the results show the best performance.
These two features are richer in the movement and intensity information they include.
The performances of the covariance matrix descriptor based online one-class SVM method and of the state-of-the-art methods are shown in Table 3. The covariance matrix based online abnormal frame detection method obtains competitive performance. In general, our method is better than the others, except the sparse reconstruction cost (SRC) [17] in the lawn and indoor scenes. In that paper, multiscale HOF is taken as the feature, and a testing sample is classified by its sparse reconstruction cost, through a weighted linear reconstruction of the overcomplete normal basis set. However, computation of the HOF might take more time than calculating the covariance. By adopting the integral image [15], the covariance matrix descriptor of a subimage can be computed conveniently, so the covariance descriptor can also be appropriately used to analyze partial movement.
6. Conclusions
A method for frame-level abnormal event detection is proposed. The method consists of a covariance matrix descriptor encoding the movement features and an online nonlinear one-class SVM classification method. We have developed two nonlinear one-class SVM based abnormal event detection techniques that update the normal models of the surveillance video data in an online framework. The proposed algorithm has been tested on a video dataset, yielding successful results in detecting abnormal events.
Acknowledgments
This work is partially supported by the China Scholarship Council of the Chinese Government, the SURECAP CPER Project funded by Région Champagne-Ardenne, and the CAPSEC CRCA FEDER platform.