toprep
TRANSCRIPT
-
7/31/2019 toprep
1/30
Abstract
Background subtraction is a technique for separating foreground objects from the
background in a sequence of video frames. Background/foreground separation is arguably one of
the most popular research topics in computer vision. It finds use in many
video applications, such as video surveillance, traffic monitoring, and gesture
recognition for human-machine interfaces, to name a few.
Many methods exist for background subtraction, each with different strengths and weaknesses
in terms of performance and computational requirements. My project focuses on two
commonly used techniques, Frame Differencing and Approximate Median, applied to different
video sequences, and draws a comparison between them.
These techniques were chosen because they are:
Computationally efficient for many low-power applications.
A good representation of background subtraction implementations in today's
video applications.
1. Introduction
BGS techniques are defined by the background model and the foreground detection process.
According to [1], the background model is described by three aspects: the initialization, the
representation and the update process of the scene background. A correct initialization allows
acquiring a background of the scene without errors. For instance, techniques that analyse video
sequences with presence of moving objects in the whole sequence should consider different
initialization schemes to avoid the acquisition of an incorrect background of the scene. The
representation describes the mathematical techniques used to model the value of each
background pixel. For instance, unimodal sequences (where background pixels variationfollows a unimodal scheme) need more simple models to describe the background of the scene
than the multimodal ones (where background pixels, due to scene dynamism, vary following
more complex schema). The update process allows incorporating specific global changes in the
background model, such as those owing to illumination and viewpoint variation. Additionally,
these techniques usually include pre-processing and post-processing stages to improve final
foreground detection results.
1.1 Challenges for any model
The background image is not fixed but must adapt to:
Illumination changes
Motion changes
Changes in the background geometry.
These changes can introduce detection errors and must therefore be handled
carefully.
1.2 Classification of techniques
The BGS techniques are classified here according to their model representation, in order to identify
their most relevant parameters and the implementation details that might diverge from the
referenced work.
1.2.1 Basic Models
Frame differencing (FD) [5]: also known as temporal difference, this method uses the
previous frame as the background model for the current frame. A threshold, T, on the
squared difference between model and frame decides between foreground and background. This
threshold is the analysed parameter.
Median filtering (MF) [3]: uses the median of the previous N frames as the background model for
the current frame. As in FD, a threshold on the model-frame difference decides the classification;
this threshold is the only analysed parameter. The method is claimed to be very robust, but requires memory
resources to store the last N frames.
1.2.2 Parametric Models
Simple Gaussian (SG) [7]: represents each background pixel's variation with a Gaussian
distribution. For every new frame, a pixel is determined to belong to the background if it falls
within a deviation, kσ, around the mean. The parameters of each Gaussian are updated with the
current frame pixel by using a running average scheme [6], controlled by a learning factor,
α. The initial deviation value, σ0, and α are the analysed parameters.
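As an illustration, the running-average update described above can be sketched in Python. This is only a sketch: the learning factor, deviation multiplier, and initial values below are invented for the example, not the report's settings.

```python
import numpy as np

alpha = 0.05   # learning factor (assumed value)
k = 2.5        # deviation multiplier (assumed value)

def sg_update(mu, sigma2, pixel):
    """Classify a pixel against a per-pixel Gaussian, then update mean and variance
    with a running average controlled by the learning factor alpha."""
    is_bg = abs(pixel - mu) <= k * np.sqrt(sigma2)      # within k sigma of the mean?
    mu = (1 - alpha) * mu + alpha * pixel               # running-average mean
    sigma2 = (1 - alpha) * sigma2 + alpha * (pixel - mu) ** 2  # running-average variance
    return mu, sigma2, is_bg

mu, sigma2 = 100.0, 16.0                            # initial mean and variance (sigma0 = 4)
mu, sigma2, is_bg = sg_update(mu, sigma2, 102.0)    # small change: classed as background
_, _, is_bg2 = sg_update(mu, sigma2, 180.0)         # large jump: classed as foreground
```

The slow update (small α) keeps the model stable under noise while still tracking gradual illumination change.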
Mixture of Gaussians (MoG) [9]: represents each background pixel's variation with a set of
weighted Gaussian distributions. Distributions are ordered according to their weight; the most
relevant ones (until the accumulated weight passes a threshold, T) are considered to model the
background; the remaining ones model the foreground. A pixel is decided to belong to the
background if it falls within a deviation, kσ, around the mean of any of the Gaussians that model it.
The update process is only performed on the Gaussian distribution that describes the pixel
value in the current frame, also following a running average scheme with parameter α. The
initial Gaussian deviation value, σ0, the threshold T, and α are the analysed parameters.
Gamma method (G) [8]: in practice represents each background pixel with a running average of its previous values. The decision on whether a pixel belongs to the background is made by
summing the squared differences between the pixel values of a square spatial window centred on
the considered pixel and the corresponding background model values, and setting a
threshold on this sum. A theoretical development, based on the assumption that the pixel variation
follows a Gaussian, concludes that the test statistic follows a Chi-square distribution,
on which the threshold selection (in fact a probability) is based. This threshold is the analysed
parameter.
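The windowed test above can be sketched numerically in Python. This is a sketch under stated assumptions: the noise standard deviation, the window size, and the chi-square quantile 27.877 (for 9 degrees of freedom at probability 0.999, from standard tables) are illustrative choices, not the report's settings.

```python
import numpy as np

sigma = 4.0          # assumed standard deviation of background pixel noise
win = 3              # side of the square spatial window
chi2_thresh = 27.877 # chi-square quantile for win*win = 9 d.o.f. at p = 0.999

def window_ssd_test(frame, bg, y, x):
    """Foreground test: normalized sum of squared differences over the window
    centred on (y, x), compared against a chi-square quantile."""
    h = win // 2
    d = frame[y-h:y+h+1, x-h:x+h+1] - bg[y-h:y+h+1, x-h:x+h+1]
    stat = np.sum((d / sigma) ** 2)   # ~ chi-square under the Gaussian assumption
    return stat > chi2_thresh

bg = np.full((5, 5), 100.0)
clean = bg.copy()                 # unchanged scene
moving = bg.copy()
moving[1:4, 1:4] += 40.0          # a bright object covering the window
still_bg = window_ssd_test(clean, bg, 2, 2)   # unchanged pixels: background
obj_fg = window_ssd_test(moving, bg, 2, 2)    # SSD far above the quantile: foreground
```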
1.2.3 Non-Parametric Models
Histogram-based approach (Hb) [15]: represents each background pixel variation with a
histogram of its last N values, which is re-computed every L frames (L
2. Equipment and Methodology
2.1 Equipment used
The equipment used for any setting includes a digital camera and a processing tool that
performs the extraction. Background subtraction algorithms typically process lower-resolution
grayscale video, so video editing and compression software may also be required.
2.1.1 MATLAB
For the project, I chose to use Matlab as the programming language. It is a high-level language
that specializes in data analysis and numerical computation. Matlab's official
website can be found at www.mathworks.com. The program environment has an interactive
command window that allows users to test and experiment with code line by line. Users can
also save their code in an M-file and run the program. The Matlab Help Navigator is also
very useful. It properly categorizes and provides detailed explanations and sample usages of all
functions. Just like C++ and Java, the language syntax provides loops and condition statements
for programming purposes.
The language was chosen over C++ and Java because it has many built-in functions
specific to image processing. In addition, it can evaluate large mathematical
expressions faster than many other languages. These advantages suit the project well, given the
large matrix computations required during the extraction process. Some minor
problems occurred during the project. The first was that Matlab was
a completely new language and environment for me; I familiarized myself with it
by practicing simple tutorials and exploring the programming environment. Another
problem was that Matlab takes a long time to run the segmentation code.
Compared to C++ and Java, Matlab can calculate matrices quickly, but large video
files take a long time for an interpreted language to process. Lastly, the Matlab
environment requires a lot of memory to run. During start-up and execution,
Windows often cannot provide enough memory for Matlab and the program will sometimes
shut down automatically.
2.2 Assumptions
Assumptions for the project may affect the choice of a practical extraction
algorithm. There are many different techniques that can extract the background/foreground from
video. A few assumptions are made about the background environment of the video.
The first assumption allows the user to choose the location of the video: it can be filmed
either indoors or outdoors.
Secondly, lighting in the video must remain constant, because of the difficulty that arises
when a light source changes its brightness or location.
Also, the background of the video must be static. No moving objects are allowed in the
background; even slight movement, such as a reflection off a window, can create unwanted noise.
Lastly, the software is not required to run in real time. This assumption greatly
reduces the complexity of the software.
The videos are taken in two settings: indoor and outdoor environments. (The earlier discussed
problems of moving backgrounds and illumination caused some minor deviations in the
results, but overall the methods run fine.)
2.3 Algorithm
The two models use a similar approach, in the sense that both compare a frame
difference against a threshold. For instance, in frame differencing the step that performs the
background/foreground decision is the test |fi − fi−1| > T, where T is the chosen threshold.
The algorithm will be taken up with each technique later on.
3. Designs and Logic
I will now discuss the two basic models used for Background/Foreground subtraction. The
algorithm and steps involved will be given with it.
3.1 Frame Difference
Frame difference is arguably the simplest form of background subtraction. The current frame is
simply subtracted from the previous frame, and if the difference in pixel values for a given
pixel is greater than a threshold Ts, the pixel is considered part of the foreground.
3.1.1 Algorithm
Figure 1: Algorithm for the Frame Differencing model
The algorithm of two-frame-based B/F detection is described below.
fi: a pixel in the current frame, where i is the frame index.
fi−1: the pixel at the same location in the previous frame.
di: absolute difference of fi and fi−1.
bi: B/F mask.
T: threshold value.
Steps
1. di = |fi − fi−1|
2. If di > T, fi belongs to the foreground; otherwise, it belongs to the background.
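The two steps above can be sketched vectorially in Python/NumPy (a sketch for illustration; the toy frame values are invented):

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, T):
    """Two-frame difference: pixels whose absolute change exceeds T are foreground."""
    # cast to a signed type so the subtraction cannot underflow
    d = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = d > T                         # B/F mask (True = foreground)
    fg = np.where(mask, curr_frame, 0)   # keep changed pixels, zero the rest
    return mask, fg

# toy 1-D "frames" of grayscale values
f_prev = np.array([10, 10, 200, 10], dtype=np.uint8)
f_curr = np.array([10, 60, 200, 10], dtype=np.uint8)
mask, fg = frame_difference(f_prev, f_curr, T=25)
# only the pixel that changed by 50 is flagged as foreground
```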
3.1.2 Drawbacks
A major (perhaps fatal) flaw of this method is that for objects with uniformly distributed
intensity values (such as the side of a car), the interior pixels are interpreted as part of the
background. Another problem is that objects must be continuously moving. If an object stays
still for more than a frame period (1/fps), it becomes part of the background.
3.1.3 Advantages
This method does have two major advantages. One obvious advantage is the modest
computational load. Another is that the background model is highly adaptive. Since the
background is based solely on the previous frame, it can adapt to changes in the background
faster than any other method (at 1/fps, to be precise). As we'll see later on, the frame difference
method subtracts out extraneous background noise (such as waving trees) much better than the
more complex approximate median and mixture of Gaussians methods.
3.1.4 Challenges
A challenge with this method is determining the threshold value. (This is also a problem for the
other methods.) The threshold is typically found empirically, which can be tricky: set too
low, it lets sensor noise and minor background variation through as foreground; set too
high, it misses genuine foreground pixels.
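To illustrate the trade-off, a small Python sketch sweeps the threshold over a synthetic difference image (all values here are invented for illustration):

```python
import numpy as np

def foreground_fraction(diff, T):
    """Fraction of pixels flagged as foreground at threshold T."""
    return float(np.mean(diff > T))

# synthetic absolute-difference image: mostly small sensor noise,
# plus one 10x10 moving object with a large difference
rng = np.random.default_rng(0)
diff = rng.uniform(0, 10, size=(100, 100))   # noise differences in [0, 10)
diff[:10, :10] = 80.0                        # the object region

frac_low = foreground_fraction(diff, 2)    # too low: most noise leaks through
frac_mid = foreground_fraction(diff, 25)   # picks out only the object (1% of pixels)
frac_high = foreground_fraction(diff, 90)  # too high: the object is missed entirely
```

In practice the threshold is chosen empirically between these two failure modes, as the report does for values 15, 25 and 45.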
Figure 2: Background from one of the videos using the Frame Differencing model
3.2 Approximate Median
In median filtering, the previous N frames of video are buffered, and the background is
calculated as the median of buffered frames. Then (as with frame difference), the background is
subtracted from the current frame and thresholded to determine the foreground pixels.
Median filtering has been shown to be very robust and to have performance comparable to
higher complexity methods. However, storing and processing many frames of video (as is often
required to track slower moving objects) requires an often prohibitively large amount of
memory. This can be alleviated somewhat by storing and processing frames at a rate lower than
the frame rate, thereby lowering storage and computation requirements at the expense of a
slower-adapting background.
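The buffered median filter described above can be sketched in Python (N and the pixel values below are illustrative):

```python
import numpy as np
from collections import deque

N = 5                          # number of buffered frames
buffer = deque(maxlen=N)       # holds the last N frames

def median_background(frames):
    """Background model: per-pixel median of the buffered frames."""
    return np.median(np.stack(frames), axis=0)

# one pixel observed over time: steady at 10, with a passing object at 200
for v in [10, 10, 200, 10, 10]:
    buffer.append(np.array([v], dtype=np.float64))
bg = median_background(buffer)   # the transient value 200 is rejected by the median
```

The memory cost is the N buffered frames, which is exactly what the approximate median method avoids.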
A more efficient compromise was devised in 1995 by UK researchers N.J.B. McFarlane
and C.P. Schofield. While doing government-funded research on piglet tracking on large
commercial farms, they came up with an efficient recursive approximation of the median filter.
Their approximate median method, presented in their paper "Segmentation and
tracking of piglets in images", has since seen wide implementation in the background
subtraction literature and has been applied to a wide range of background subtraction scenarios.
3.2.1 Logic
The approximate median method works as follows: if a pixel in the current frame has a value
larger than the corresponding background pixel, the background pixel is incremented by one.
Likewise, if the current pixel is less than the background pixel, the background pixel is decremented
by one. In this way, the background eventually converges to an estimate where half the input
pixels are greater than the background and half are less than it: approximately
the median. (Convergence time varies with the frame rate and the amount of movement in the
scene.)
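The increment/decrement rule can be sketched in a few lines of Python (the toy frames are invented):

```python
import numpy as np

def approx_median_update(bg, frame):
    """One approximate-median step: move each background pixel one step toward the frame."""
    return bg + (frame > bg).astype(np.int32) - (frame < bg).astype(np.int32)

bg = np.zeros(5, dtype=np.int32)        # background estimate starts at zero
frame = np.array([8, 2, 5, 5, 1])       # a static toy scene
for _ in range(20):                     # feed the same frame repeatedly
    bg = approx_median_update(bg, frame)
# the background converges to the per-pixel scene values
```

Only the single background frame is stored, instead of the N-frame buffer of true median filtering.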
3.2.2 Algorithm
The steps involved in the approximate median method are:
Read the video
Input the frame
Convert the frame to grayscale
Determine the threshold
Determine the frame difference value
Compare the threshold with the frame difference values; pixels whose difference exceeds the threshold are foreground
Where the current pixel value > the background value, increment the background pixel (making it lighter)
Where the current pixel value < the background value, decrement the background pixel (making it darker)
Subplot the frame, foreground and background
3.2.3 Efficiency
The approximate median method does a much better job of separating the entire object from the
background. This is because the more slowly adapting background incorporates a longer
history of the visual scene, achieving about the same result as if we had buffered and processed
N frames.
4. Coding
This section describes the coding used for the project. I will mention the codes for different
models in separate sections.
The video, along with its directory, must be specified in the source. A threshold must be chosen
for the background to be visible. The results show a slight deviation in the
output because of the problems discussed in section 1.1.
The code is shown next, in the same format as it appears in Matlab.
The statement movie2avi(M,'frame_difference_output', 'fps', 30); is used to
save the output as an AVI file named 'frame_difference_output' running at
30 frames per second.
4.1 Frame Difference
source = aviread('Filename');                 % give the path of the file
thresh = 25;                                  % set threshold
bg = source(1).cdata;                         % read in 1st frame as background frame
bg_bw = rgb2gray(bg);                         % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                     % read in frame
    fr_bw = rgb2gray(fr);                     % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw)); % cast operands as double to avoid negative overflow

    for j = 1:width                           % if fr_diff > thresh, pixel is foreground
        for k = 1:height
            if (fr_diff(k,j) > thresh)
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
        end
    end

    bg_bw = fr_bw;                            % previous frame becomes the background

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(fr_bw)
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray);       % put frames into movie
end

% movie2avi(M, 'frame_difference_output', 'fps', 30);  % save movie as avi
4.2 Approximate Median
source = aviread('File name');                % give the path of the video
thresh = 28;                                  % set threshold
bg = source(1).cdata;                         % read in 1st frame as background frame
bg_bw = double(rgb2gray(bg));                 % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                     % read in frame
    fr_bw = rgb2gray(fr);                     % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw)); % cast operands as double to avoid negative overflow

    for j = 1:width                           % if fr_diff > thresh, pixel is foreground
        for k = 1:height
            if (fr_diff(k,j) > thresh)
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
            % approximate median update of the background
            if (fr_bw(k,j) > bg_bw(k,j))
                bg_bw(k,j) = bg_bw(k,j) + 1;
            elseif (fr_bw(k,j) < bg_bw(k,j))
                bg_bw(k,j) = bg_bw(k,j) - 1;
            end
        end
    end

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(uint8(bg_bw))
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray);       % save output as movie
end

% movie2avi(M, 'approximate_median_background', 'fps', 15);  % save movie as avi
5. Results
In this section I will show the results obtained for the techniques at different thresholds and for
different video sequences.
5.1 Results for Frame Difference
5.1.1 Test Video 1
Threshold value 15
Threshold value 25
Threshold value 45
5.1.2. Test Video 2
Threshold value 15 Threshold value 25 Threshold value 45
5.2 Results for Approximate Median
5.2.1 Test Video 1
Threshold Value 15
Threshold Value 25
Threshold Value 45
5.2.2 Test Video 2
Threshold value 15 Threshold value 25 Threshold value 45
6. Observations and Analysis of Results
6.1 Frame Differencing
The frame difference method subtracts out extraneous background noise (such as waving
trees) much better than the more complex approximate median and mixture of
Gaussians methods.
As can be seen, a major (perhaps fatal) flaw of this method is that for objects with
uniformly distributed intensity values (such as the side of a car), the interior pixels are
interpreted as part of the background. Another problem is that objects must be
continuously moving. If an object stays still for more than a frame period (1/fps), it
becomes part of the background.
6.2 Approximate Median
The approximate median method works as follows: if a pixel in the current frame has a value
larger than the corresponding background pixel, the background pixel is incremented by one.
Likewise, if the current pixel is less than the background pixel, the background pixel is decremented
by one.
The background eventually converges to an estimate where half the input pixels are greater
than the background and half are less than it: approximately the median.
(Convergence time varies with the frame rate and the amount of movement in the scene.)
As you can see, the approximate median method does a much better job of separating the
entire object from the background. This is because the more slowly adapting background
incorporates a longer history of the visual scene, achieving about the same result as if we
had buffered and processed N frames.
We do see some trails behind the larger objects (the cars). This is due to updating the
background at a relatively high rate (30 fps). In a real application, the frame rate would
likely be lower (say, 15 fps).
The processing time increases with longer frame sequences.
7. Conclusion
Each technique has its merits.
In frame differencing, one obvious advantage is the modest computational load. Another is that
the background model is highly adaptive: since the background is based solely on the previous
frame, it can adapt to changes in the background faster than any other method (at 1/fps, to be
precise). A challenge with this method is determining the threshold value.
Approximate Median is a very good compromise. It offers performance near what can be
achieved with higher-complexity methods (according to my research and the academic
literature), and it costs little more in computation and storage than frame differencing.
Appendices
References
[1] Cristani, M., Bicego, M., Murino, V.: Multi-level background initialization using Hidden Markov Models. In: First ACM SIGMM Int. Workshop on Video Surveillance, pp. 11-20 (2003)
[2] Piccardi, M.: Background subtraction techniques: a review. In: SMC 2004, vol. 4, pp. 3099-3104 (2004)
[3] Cheung, S.-C., Kamath, C.: Robust techniques for background subtraction in urban traffic video. In: Panchanathan, S., Vasudev, B. (eds.) Proc. Electronic Imaging: Visual Comm. Image Process. (Part One), SPIE, vol. 5308, pp. 881-892 (2004)
[4] Cucchiara, R.: People Surveillance, VISMAC Palermo (2006)
[5] Ewerth, R., Freisleben, B.: Frame difference normalization: an approach to reduce error rates of cut detection algorithms for MPEG videos. In: ICIP, pp. 1009-1012 (2003)
[6] Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (1999)
[7] Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA (1974)
[7] Wren, C., Azarbayejani, A., Darrell, T., Pentland, A.: Pfinder: real-time tracking of the human body. PAMI (1997)
[8] Cavallaro, A., Steiger, O., Ebrahimi, T.: Semantic video analysis for adaptive content delivery and automatic description. IEEE Transactions on Circuits and Systems for Video Technology 15(10), 1200-1209 (2005)
[9] Stauffer, C.: Adaptive background mixture models for real-time tracking. In: CVPR (1999)
[10] Carminati, L., Benois-Pineau, J.: Gaussian mixture classification for moving object detection in video surveillance environment. In: ICIP, pp. 113-116 (2005)
[11] Comaniciu, D.: Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 603 (2002)
[12] Elgammal, A.M., Harwood, D., Davis, L.S.: Non-parametric model for background subtraction. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1843, pp. 751-767. Springer, Heidelberg (2000)
[13] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773-780 (2006)
[14] Stenger, B., Ramesh, V., Paragios, N., Coetzee, F., Buhmann, J.M.: Topology free hidden Markov models: application to background modeling. In: Eighth Int. Conf. on Computer Vision, ICCV 2001, vol. 1, pp. 294-301 (2001)
[15] Mittal, A., Paragios, N.: Motion-based background subtraction using adaptive kernel density estimation. In: Proc. Int. Conf. on Computer Vision and Pattern Recognition, CVPR, pp. 302-309 (2004)
[16] Tiburzi, F., Escudero, M., Bescós, J., Martínez, J.M.: A corpus for motion-based video-object segmentation. In: IEEE International Conference on Image Processing (Workshop on Multimedia Information Retrieval), ICIP 2008, San Diego, USA (2008)
[17] El Baf, F., Bouwmans, T., Vachon, B.: Comparison of background subtraction methods for a multimedia application. In: 14th International Conference on Systems, Signals and Image Processing, IWSSIP 2007, Maribor, Slovenia, pp. 385-388 (2007)
[18] Parks, D.H., Fels, S.S.: Evaluation of background subtraction algorithms with post-processing. In: IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance, AVSS 2008, pp. 192-199 (2008)
[19] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773-780 (2006)
[20] NVIDIA Corporation: NVIDIA CUDA Programming Guide (2007)
[21] AMD/ATI: ATI CTM (Close to Metal) Guide (2007)
[22] Fung, J.: Advances in GPU-based image processing and computer vision. In: SIGGRAPH (2009)
[23] Lee, S.-J., Jeong, C.-S.: Real-time object segmentation based on GPU. In: 2006 International Conference on Computational Intelligence and Security, pp. 739-742 (2006)
[24] Carr, P.: GPU accelerated multimodal background subtraction. In: Digital Image Computing: Techniques and Applications (2008)
[25] Apple Inc.: Core Image Programming Guide (2008)
[26] NVIDIA Corporation: NVIDIA CUDA Best Practices Guide (2010)
[27] VSSN 2006 Competition (2006). [Online]. Available: http://mmc36.informatik.uni-augsburg.de/VSSN06_OSAC/
[28] PETS 2009 Benchmark Data (2009). [Online]. Available: http://www.cvg.rdg.ac.uk/PETS2009/a.html
[30] EE Times, http://www.eetimes.com