toprep
TRANSCRIPT
-
7/31/2019 toprep
1/30
Abstract
Background subtraction is a technique for separating foreground objects from the
background in a sequence of video frames. Background/foreground separation is arguably one of
the most popular research topics in computer vision. It finds use in many
video applications, such as video surveillance, traffic monitoring, and gesture
recognition for human-machine interfaces, to name a few.
Many methods exist for background subtraction, each with different strengths and weaknesses
in terms of performance and computational requirements. My project focuses on two
commonly used techniques, Frame Differencing and Approximate Median, applied to different
video sequences, and draws a comparison between them.
These techniques were chosen because they are:
Computationally efficient for many low-power applications.
A good representation of background subtraction implementations in today's
video applications.
1. Introduction
BGS techniques are defined by the background model and the foreground detection process.
According to [1], the background model is described by three aspects: the initialization, the
representation and the update process of the scene background. A correct initialization allows
acquiring a background of the scene without errors. For instance, techniques that analyse video
sequences with presence of moving objects in the whole sequence should consider different
initialization schemes to avoid the acquisition of an incorrect background of the scene. The
representation describes the mathematical techniques used to model the value of each
background pixel. For instance, unimodal sequences (where background pixels variationfollows a unimodal scheme) need more simple models to describe the background of the scene
than the multimodal ones (where background pixels, due to scene dynamism, vary following
more complex schema). The update process allows incorporating specific global changes in the
background model, such as those owing to illumination and viewpoint variation. Additionally,
these techniques usually include pre-processing and post-processing stages to improve final
foreground detection results.
1.1 Challenges for any model
The background image is not fixed but must adapt to:
Illumination changes
Motion changes
Changes in the background geometry.
These changes can introduce detection errors and must therefore be handled
carefully.
1.2 Classification of techniques
The BGS techniques are classified here according to their model representation, in order to identify
their most relevant parameters and the implementation details that might diverge from the
referenced work.
1.2.1 Basic Models
Frame differencing (FD) [5]: also known as temporal difference, this method uses the
previous frame as the background model for the current frame. A threshold, T, on the
squared difference between model and frame decides between foreground and background. This
threshold is the analysed parameter.
Median filtering (MF) [3]: uses the median of the previous N frames as the background model for
the current frame. As in FD, a threshold on the model-frame difference decides the classification;
this threshold is the only analysed parameter. The method is claimed to be very robust, but requires memory
resources to store the last N frames.
1.2.2 Parametric Models
Simple Gaussian (SG) [7]: represents each background pixel's variation with a Gaussian
distribution. For every new frame, a pixel is determined to belong to the background if it falls
within a deviation, kσ, around the mean. The parameters of each Gaussian are updated with the
current frame pixel by using a running average scheme [6], controlled by a learning factor,
α. The initial deviation value, σ0, and α are the analysed parameters.
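As an illustration, the running-average update described above can be sketched in Python. This is only a sketch: the learning factor, deviation multiplier, and initial values below are invented for the example, not the report's settings.

```python
import numpy as np

alpha = 0.05   # learning factor (assumed value)
k = 2.5        # deviation multiplier (assumed value)

def sg_update(mu, sigma2, pixel):
    """Classify a pixel against a per-pixel Gaussian, then update mean and variance
    with a running average controlled by the learning factor alpha."""
    is_bg = abs(pixel - mu) <= k * np.sqrt(sigma2)      # within k sigma of the mean?
    mu = (1 - alpha) * mu + alpha * pixel               # running-average mean
    sigma2 = (1 - alpha) * sigma2 + alpha * (pixel - mu) ** 2  # running-average variance
    return mu, sigma2, is_bg

mu, sigma2 = 100.0, 16.0                            # initial mean and variance (sigma0 = 4)
mu, sigma2, is_bg = sg_update(mu, sigma2, 102.0)    # small change: classed as background
_, _, is_bg2 = sg_update(mu, sigma2, 180.0)         # large jump: classed as foreground
```

The slow update (small α) keeps the model stable under noise while still tracking gradual illumination change.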
Mixture of Gaussians (MoG) [9]: represents each background pixel's variation with a set of
weighted Gaussian distributions. Distributions are ordered according to their weight; the most
relevant ones (until the accumulated weight passes a threshold, T) are considered to model the
background; the remaining ones model the foreground. A pixel is decided to belong to the
background if it falls within a deviation, kσ, around the mean of any of the Gaussians that model it.
The update process is only performed on the Gaussian distribution that describes the pixel
value in the current frame, also following a running average scheme with parameter α. The
initial Gaussian deviation value, σ0, the threshold T, and α are the analysed parameters.
Gamma method (G) [8]: in practice represents each background pixel with a running average of its previous values. The decision on whether a pixel belongs to the background is made by
summing the squared differences between the pixel values of a square spatial window centred on
the considered pixel and the corresponding background model values, and setting a
threshold on this sum. A theoretical development, based on the assumption that the pixel variation
follows a Gaussian, concludes that the test statistic follows a Chi-square distribution,
on which the threshold selection (in fact a probability) is based. This threshold is the analysed
parameter.
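The windowed test above can be sketched numerically in Python. This is a sketch under stated assumptions: the noise standard deviation, the window size, and the chi-square quantile 27.877 (for 9 degrees of freedom at probability 0.999, from standard tables) are illustrative choices, not the report's settings.

```python
import numpy as np

sigma = 4.0          # assumed standard deviation of background pixel noise
win = 3              # side of the square spatial window
chi2_thresh = 27.877 # chi-square quantile for win*win = 9 d.o.f. at p = 0.999

def window_ssd_test(frame, bg, y, x):
    """Foreground test: normalized sum of squared differences over the window
    centred on (y, x), compared against a chi-square quantile."""
    h = win // 2
    d = frame[y-h:y+h+1, x-h:x+h+1] - bg[y-h:y+h+1, x-h:x+h+1]
    stat = np.sum((d / sigma) ** 2)   # ~ chi-square under the Gaussian assumption
    return stat > chi2_thresh

bg = np.full((5, 5), 100.0)
clean = bg.copy()                 # unchanged scene
moving = bg.copy()
moving[1:4, 1:4] += 40.0          # a bright object covering the window
still_bg = window_ssd_test(clean, bg, 2, 2)   # unchanged pixels: background
obj_fg = window_ssd_test(moving, bg, 2, 2)    # SSD far above the quantile: foreground
```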
1.2.3 Non-Parametric Models
Histogram-based approach (Hb) [15]: represents each background pixel variation with a
histogram of its last N values, which is re-computed every L frames (L
2. Equipment and Methodology
2.1 Equipment used
The equipment used for any setting includes a digital camera and a processing tool that
performs the extraction. Background subtraction algorithms typically process lower-resolution
grayscale video, so video editing and compression software may also be required.
2.1.1 MATLAB
For the project, I chose to use Matlab as the programming language. It is a high-level language
that specializes in data analysis and numerical computation. Matlab's official
website can be found at www.mathworks.com. The program environment has an interactive
command window that allows users to test and experiment with code line by line. Users can
also save their code in an M-file and run the program. The Matlab Help Navigator is also
very useful. It properly categorizes and provides detailed explanations and sample usages of all
functions. Just like C++ and Java, the language syntax provides loops and condition statements
for programming purposes.
The language was chosen over C++ and Java because it has many built-in functions
specific to image processing. In addition, it can evaluate large mathematical
expressions faster than many other languages. These advantages suit the project well, given the
large matrix computations required during the extraction process. Some minor
problems occurred during the project. The first was that Matlab was
a completely new language and environment for me; I familiarized myself with it
by practicing simple tutorials and exploring the programming environment. Another
problem was that Matlab takes a long time to run the segmentation code.
Compared to C++ and Java, Matlab can calculate matrices quickly, but large video
files take a long time for an interpreted language to process. Lastly, the Matlab
environment requires a lot of memory to run. During start-up and execution,
Windows often cannot provide enough memory for Matlab and the program will sometimes
shut down automatically.
2.2 Assumptions
Assumptions for the project may affect the choice of a practical extraction
algorithm. There are many different techniques that can extract the background/foreground from
video. A few assumptions are made about the background environment of the video.
The first assumption allows the user to choose the location of the video: it can be filmed
either indoors or outdoors.
Secondly, lighting in the video must remain constant, because of the difficulty that arises
when a light source changes its brightness or location.
Also, the background of the video must be static. No moving objects are allowed in the
background; even slight movement, such as a reflection off a window, can create unwanted noise.
Lastly, the software is not required to run in real time. This assumption greatly
reduces the complexity of the software.
The videos are taken in two settings: indoor and outdoor environments. (The earlier discussed
problems of moving backgrounds and illumination caused some minor deviations in the
results, but overall the methods run fine.)
2.3 Algorithm
The two models use a similar approach, in the sense that both compare a frame
difference against a threshold. For instance, in frame differencing the step that performs the
background/foreground decision is the test |fi − fi−1| > T, where T is the chosen threshold.
The algorithm will be taken up with each technique later on.
3. Designs and Logic
I will now discuss the two basic models used for Background/Foreground subtraction. The
algorithm and steps involved will be given with it.
3.1 Frame Difference
Frame difference is arguably the simplest form of background subtraction. The current frame is
simply subtracted from the previous frame, and if the difference in pixel values for a given
pixel is greater than a threshold Ts, the pixel is considered part of the foreground.
3.1.1 Algorithm
Figure 1: Algorithm for the Frame Differencing model
The algorithm of two-frame-based B/F detection is described below.
fi: a pixel in the current frame, where i is the frame index.
fi−1: the pixel at the same location in the previous frame.
di: absolute difference of fi and fi−1.
bi: B/F mask.
T: threshold value.
Steps
1. di = |fi − fi−1|
2. If di > T, fi belongs to the foreground; otherwise, it belongs to the background.
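The two steps above can be sketched vectorially in Python/NumPy (a sketch for illustration; the toy frame values are invented):

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, T):
    """Two-frame difference: pixels whose absolute change exceeds T are foreground."""
    # cast to a signed type so the subtraction cannot underflow
    d = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = d > T                         # B/F mask (True = foreground)
    fg = np.where(mask, curr_frame, 0)   # keep changed pixels, zero the rest
    return mask, fg

# toy 1-D "frames" of grayscale values
f_prev = np.array([10, 10, 200, 10], dtype=np.uint8)
f_curr = np.array([10, 60, 200, 10], dtype=np.uint8)
mask, fg = frame_difference(f_prev, f_curr, T=25)
# only the pixel that changed by 50 is flagged as foreground
```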
3.1.2 Drawbacks
A major (perhaps fatal) flaw of this method is that for objects with uniformly distributed
intensity values (such as the side of a car), the interior pixels are interpreted as part of the
background. Another problem is that objects must be continuously moving. If an object stays
still for more than a frame period (1/fps), it becomes part of the background.
3.1.3 Advantages
This method does have two major advantages. One obvious advantage is the modest
computational load. Another is that the background model is highly adaptive. Since the
background is based solely on the previous frame, it can adapt to changes in the background
faster than any other method (at 1/fps, to be precise). As we'll see later on, the frame difference
method subtracts out extraneous background noise (such as waving trees) much better than the
more complex approximate median and mixture of Gaussians methods.
3.1.4 Challenges
A challenge with this method is determining the threshold value. (This is also a problem for the
other methods.) The threshold is typically found empirically, which can be tricky: set too
low, it lets sensor noise and minor background variation through as foreground; set too
high, it misses genuine foreground pixels.
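To illustrate the trade-off, a small Python sketch sweeps the threshold over a synthetic difference image (all values here are invented for illustration):

```python
import numpy as np

def foreground_fraction(diff, T):
    """Fraction of pixels flagged as foreground at threshold T."""
    return float(np.mean(diff > T))

# synthetic absolute-difference image: mostly small sensor noise,
# plus one 10x10 moving object with a large difference
rng = np.random.default_rng(0)
diff = rng.uniform(0, 10, size=(100, 100))   # noise differences in [0, 10)
diff[:10, :10] = 80.0                        # the object region

frac_low = foreground_fraction(diff, 2)    # too low: most noise leaks through
frac_mid = foreground_fraction(diff, 25)   # picks out only the object (1% of pixels)
frac_high = foreground_fraction(diff, 90)  # too high: the object is missed entirely
```

In practice the threshold is chosen empirically between these two failure modes, as the report does for values 15, 25 and 45.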
Figure 2: Background from one of the videos using the Frame Differencing model
3.2 Approximate Median
In median filtering, the previous N frames of video are buffered, and the background is
calculated as the median of buffered frames. Then (as with frame difference), the background is
subtracted from the current frame and thresholded to determine the foreground pixels.
Median filtering has been shown to be very robust and to have performance comparable to
higher complexity methods. However, storing and processing many frames of video (as is often
required to track slower moving objects) requires an often prohibitively large amount of
memory. This can be alleviated somewhat by storing and processing frames at a rate lower than
the frame rate, thereby lowering storage and computation requirements at the expense of a
slower-adapting background.
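The buffered median filter described above can be sketched in Python (N and the pixel values below are illustrative):

```python
import numpy as np
from collections import deque

N = 5                          # number of buffered frames
buffer = deque(maxlen=N)       # holds the last N frames

def median_background(frames):
    """Background model: per-pixel median of the buffered frames."""
    return np.median(np.stack(frames), axis=0)

# one pixel observed over time: steady at 10, with a passing object at 200
for v in [10, 10, 200, 10, 10]:
    buffer.append(np.array([v], dtype=np.float64))
bg = median_background(buffer)   # the transient value 200 is rejected by the median
```

The memory cost is the N buffered frames, which is exactly what the approximate median method avoids.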
A more efficient compromise was devised in 1995 by UK researchers N.J.B. McFarlane
and C.P. Schofield. While doing government-funded research on piglet tracking on large
commercial farms, they came up with an efficient recursive approximation of the median filter.
Their approximate median method, presented in their paper "Segmentation and
tracking of piglets in images", has since seen wide implementation in the background
subtraction literature and has been applied to a wide range of background subtraction scenarios.
3.2.1 Logic
The approximate median method works as follows: if a pixel in the current frame has a value
larger than the corresponding background pixel, the background pixel is incremented by one.
Likewise, if the current pixel is less than the background pixel, the background pixel is decremented
by one. In this way, the background eventually converges to an estimate where half the input
pixels are greater than the background and half are less than it: approximately
the median. (Convergence time varies with the frame rate and the amount of movement in the
scene.)
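The increment/decrement rule can be sketched in a few lines of Python (the toy frames are invented):

```python
import numpy as np

def approx_median_update(bg, frame):
    """One approximate-median step: move each background pixel one step toward the frame."""
    return bg + (frame > bg).astype(np.int32) - (frame < bg).astype(np.int32)

bg = np.zeros(5, dtype=np.int32)        # background estimate starts at zero
frame = np.array([8, 2, 5, 5, 1])       # a static toy scene
for _ in range(20):                     # feed the same frame repeatedly
    bg = approx_median_update(bg, frame)
# the background converges to the per-pixel scene values
```

Only the single background frame is stored, instead of the N-frame buffer of true median filtering.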
3.2.2 Algorithm
The steps involved in the approximate median method are:
Read the video
Input the frame
Convert the frame to grayscale
Determine the threshold
Determine the frame difference value
Compare the threshold with the frame difference values; pixels whose difference exceeds the threshold are foreground
Where the current pixel value > the background value, increment the background pixel (making it lighter)
Where the current pixel value < the background value, decrement the background pixel (making it darker)
Subplot the frame, foreground and background
3.2.3 Efficiency
The approximate median method does a much better job of separating the entire object from the
background. This is because the more slowly adapting background incorporates a longer
history of the visual scene, achieving about the same result as if we had buffered and processed
N frames.
4. Coding
This section describes the coding used for the project. I will mention the codes for different
models in separate sections.
The video, along with its directory, must be specified in the source. A threshold must be chosen
for the background to be visible. The results show a slight deviation in the
output because of the problems discussed in section 1.1.
The code is shown next, in the same format as it appears in Matlab.
The statement movie2avi(M,'frame_difference_output', 'fps', 30); is used to
save the output as an AVI file named 'frame_difference_output' running at
30 frames per second.
4.1 Frame Difference
source = aviread('Filename');                 % give the path of the file
thresh = 25;                                  % set threshold
bg = source(1).cdata;                         % read in 1st frame as background frame
bg_bw = rgb2gray(bg);                         % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                     % read in frame
    fr_bw = rgb2gray(fr);                     % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw)); % cast operands as double to avoid negative overflow

    for j = 1:width                           % if fr_diff > thresh, pixel is foreground
        for k = 1:height
            if (fr_diff(k,j) > thresh)
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
        end
    end

    bg_bw = fr_bw;                            % previous frame becomes the background

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(fr_bw)
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray);       % put frames into movie
end

% movie2avi(M, 'frame_difference_output', 'fps', 30);  % save movie as avi
4.2 Approximate Median
source = aviread('File name');                % give the path of the video
thresh = 28;                                  % set threshold
bg = source(1).cdata;                         % read in 1st frame as background frame
bg_bw = double(rgb2gray(bg));                 % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                     % read in frame
    fr_bw = rgb2gray(fr);                     % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw)); % cast operands as double to avoid negative overflow

    for j = 1:width                           % if fr_diff > thresh, pixel is foreground
        for k = 1:height
            if (fr_diff(k,j) > thresh)
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
            % approximate median update of the background
            if (fr_bw(k,j) > bg_bw(k,j))
                bg_bw(k,j) = bg_bw(k,j) + 1;
            elseif (fr_bw(k,j) < bg_bw(k,j))
                bg_bw(k,j) = bg_bw(k,j) - 1;
            end
        end
    end

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(uint8(bg_bw))
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray);       % save output as movie
end

% movie2avi(M, 'approximate_median_background', 'fps', 15);  % save movie as avi
5. Results
In this section I will show the results obtained for the techniques at different thresholds and for
different video sequences.
5.1 Results for Frame Difference
5.1.1 Test Video 1
Threshold value 15
Threshold value 25
Threshold value 45
5.1.2. Test Video 2
Threshold value 15 Threshold value 25 Threshold value 45
5.2 Results for Approximate Median
5.2.1 Test Video 1
Threshold Value 15
Threshold Value 25
Threshold Value 45
5.2.2 Test Video 2
Threshold value 15 Threshold value 25 Threshold value 45
6. Observations and Analysis of Results
6.1 Frame Differencing
The frame difference method subtracts out extraneous background noise (such as waving
trees) much better than the more complex approximate median and mixture of
Gaussians methods.
As can be seen, a major (perhaps fatal) flaw of this method is that for objects with
uniformly distributed intensity values (such as the side of a car), the interior pixels are
interpreted as part of the background. Another problem is that objects must be
continuously moving. If an object stays still for more than a frame period (1/fps), it
becomes part of the background.
6.2 Approximate Median
The approximate median method works as follows: if a pixel in the current frame has a value
larger than the corresponding background pixel, the background pixel is incremented by one.
Likewise, if the current pixel is less than the background pixel, the background pixel is decremented
by one.
The background eventually converges to an estimate where half the input pixels are greater
than the background and half are less than it: approximately the median.
(Convergence time varies with the frame rate and the amount of movement in the scene.)
As you can see, the approximate median method does a much better job of separating the
entire object from the background. This is because the more slowly adapting background
incorporates a longer history of the visual scene, achieving about the same result as if we
had buffered and processed N frames.
We do see some trails behind the larger objects (the cars). This is due to updating the
background at a relatively high rate (30 fps). In a real application, the frame rate would
likely be lower (say, 15 fps).
The processing time increases with longer frame sequences.
7. Conclusion
Each technique has its merits.
In frame differencing, one obvious advantage is the modest computational load. Another is that
the background model is highly adaptive: since the background is based solely on the previous
frame, it can adapt to changes in the background faster than any other method (at 1/fps, to be
precise). A challenge with this method is determining the threshold value.
Approximate Median is a very good compromise. It offers performance near what can be
achieved with higher-complexity methods (according to my research and the academic
literature), and it costs little more in computation and storage than frame differencing.
Appendices
References
[1] Cristani, M., Bicego, M., Murino, V.: Multi-level background initialization using Hidden Markov Models. In: First ACM SIGMM Int. Workshop on Video Surveillance, pp. 11-20 (2003)
[2] Piccardi, M.: Background subtraction techniques: a review. In: SMC 2004, vol. 4, pp. 3099-3104 (2004)
[3] Cheung, S.-C., Kamath, C.: Robust techniques for background subtraction in urban traffic video. In: Panchanathan, S., Vasudev, B. (eds.) Proc. Electronic Imaging: Visual Comm. Image Process. (Part One), SPIE, vol. 5308, pp. 881-892 (2004)
[4] Cucchiara, R.: People Surveillance, VISMAC Palermo (2006)
[5] Ewerth, R., Freisleben, B.: Frame difference normalization: an approach to reduce error rates of cut detection algorithms for MPEG videos. In: ICIP, pp. 1009-1012 (2003)
[6] Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (1999)
[7] Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA (1974)
[7] Wren, C., Azarbayejani, A., Darrell, T., Pentland, A.: Pfinder: real-time tracking of the human body. PAMI (1997)
[8] Cavallaro, A., Steiger, O., Ebrahimi, T.: Semantic video analysis for adaptive content delivery and automatic description. IEEE Transactions on Circuits and Systems for Video Technology 15(10), 1200-1209 (2005)
[9] Stauffer, C.: Adaptive background mixture models for real-time tracking. In: CVPR (1999)
[10] Carminati, L., Benois-Pineau, J.: Gaussian mixture classification for moving object detection in video surveillance environment. In: ICIP, pp. 113-116 (2005)
[11] Comaniciu, D.: Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 603 (2002)
[12] Elgammal, A.M., Harwood, D., Davis, L.S.: Non-parametric model for background subtraction. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1843, pp. 751-767. Springer, Heidelberg (2000)
[13] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773-780 (2006)
[14] Stenger, B., Ramesh, V., Paragios, N., Coetzee, F., Buhmann, J.M.: Topology free hidden Markov models: application to background modeling. In: Eighth Int. Conf. on Computer Vision, ICCV 2001, vol. 1, pp. 294-301 (2001)
[15] Mittal, A., Paragios, N.: Motion-based background subtraction using adaptive kernel density estimation. In: Proc. Int. Conf. on Computer Vision and Pattern Recognition, CVPR, pp. 302-309 (2004)
[16] Tiburzi, F., Escudero, M., Bescós, J., Martínez, J.M.: A corpus for motion-based video-object segmentation. In: IEEE International Conference on Image Processing (Workshop on Multimedia Information Retrieval), ICIP 2008, San Diego, USA (2008)
[17] El Baf, F., Bouwmans, T., Vachon, B.: Comparison of background subtraction methods for a multimedia application. In: 14th International Conference on Systems, Signals and Image Processing, IWSSIP 2007, Maribor, Slovenia, pp. 385-388 (2007)
[18] Parks, D.H., Fels, S.S.: Evaluation of background subtraction algorithms with post-processing. In: IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance, AVSS 2008, pp. 192-199 (2008)
[19] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773-780 (2006)
[20] NVIDIA Corporation: NVIDIA CUDA Programming Guide (2007)
[21] AMD/ATI: ATI CTM (Close to Metal) Guide (2007)
[22] Fung, J.: Advances in GPU-based image processing and computer vision. In: SIGGRAPH (2009)
[23] Lee, S.-J., Jeong, C.-S.: Real-time object segmentation based on GPU. In: 2006 International Conference on Computational Intelligence and Security, pp. 739-742 (2006)
[24] Carr, P.: GPU accelerated multimodal background subtraction. In: Digital Image Computing: Techniques and Applications (2008)
[25] Apple Inc.: Core Image Programming Guide (2008)
[26] NVIDIA Corporation: NVIDIA CUDA Best Practices Guide (2010)
[27] VSSN 2006 Competition (2006). [Online]. Available: http://mmc36.informatik.uni-augsburg.de/VSSN06_OSAC/
[28] PETS 2009 Benchmark Data (2009). [Online]. Available: http://www.cvg.rdg.ac.uk/PETS2009/a.html
[30] EE Times, http://www.eetimes.com