

    Abstract

Background subtraction is a technique for separating foreground objects from the background in a sequence of video frames. Background/foreground separation is arguably one of the most popular research topics in the computer vision industry. It finds use in many video applications, such as video surveillance, traffic monitoring, and gesture recognition for human-machine interfaces, to name a few.

Many methods exist for background subtraction, each with different strengths and weaknesses in terms of performance and computational requirements. My project focuses on two of the commonly used techniques, namely Frame Differencing and Approximate Median. The results of the techniques have been taken for different video sequences, and I have tried to draw a comparison between the two. These techniques were chosen because they are:

computationally efficient for many low-power applications

a good representation of the background subtraction implementations in today's video applications


    1. Introduction

BGS techniques are defined by the background model and the foreground detection process. According to [1], the background model is described by three aspects: the initialization, the representation, and the update process of the scene background. A correct initialization allows acquiring a background of the scene without errors. For instance, techniques that analyse video sequences with moving objects present throughout the whole sequence should consider different initialization schemes to avoid acquiring an incorrect background of the scene. The representation describes the mathematical techniques used to model the value of each background pixel. For instance, unimodal sequences (where the variation of background pixels follows a unimodal scheme) need simpler models to describe the background of the scene than multimodal ones (where background pixels, due to scene dynamism, vary following more complex schemes). The update process allows incorporating specific global changes into the background model, such as those owing to illumination and viewpoint variation. Additionally, these techniques usually include pre-processing and post-processing stages to improve the final foreground detection results.

    1.1 Challenges for any model

The background image is not fixed but must adapt to:

illumination changes

motion changes

changes in the background geometry

These changes can introduce errors into the foreground detection and hence must be dealt with carefully.


1.2 Classification of techniques

The BGS techniques are classified below according to their model representation, in order to identify their most relevant parameters and the implementation details that might diverge from the referenced work.

    1.2.1 Basic Models

Frame differencing (FD) [5]: also known as temporal difference, this method uses the previous frame as the background model for the current frame. Setting a threshold, T, on the squared difference between model and frame decides between foreground and background. This threshold is the analysed parameter.

Median filtering (MF) [3]: uses the median of the previous N frames as the background model for the current frame. As in FD, a threshold on the model-frame difference decides the classification; this threshold is the only analysed parameter. This method is claimed to be very robust, but it requires memory resources to store the last N frames, as the sketch below illustrates.
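The following is a minimal Matlab sketch of MF, not code from the project; the buffer array frames, the current frame fr, and the values of N and T are assumptions chosen for illustration.

N = 10; T = 25;                                  % illustrative buffer length and threshold
% frames: H-by-W-by-N array holding the last N grayscale frames
bg = median(double(frames(:,:,1:N)), 3);         % per-pixel median background
fgmask = abs(double(fr) - bg) > T;               % foreground where the difference exceeds T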

    1.2.2 Parametric Models

Simple Gaussian (SG) [7]: represents each background pixel variation with a Gaussian distribution. For every new frame, a pixel is determined to belong to the background if it falls within a given deviation around the mean. The parameters of each Gaussian are updated with the current frame pixel by using a running average scheme [6], controlled by a learning factor α. The initial deviation value σ0 and α are the analysed parameters. A per-pixel sketch of this scheme follows.
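This minimal per-pixel sketch of SG is illustrative only; the initial state (mu, sigma), the learning factor alpha, and the 2.5-deviation match band are assumed values, not parameters from the report.

mu = 128; sigma = 10; alpha = 0.01;              % assumed per-pixel state and learning factor
px = 135;                                        % example current pixel value
isBg = abs(px - mu) < 2.5*sigma;                 % within the deviation band => background
mu = (1 - alpha)*mu + alpha*px;                  % running-average update of the mean
sigma = sqrt((1 - alpha)*sigma^2 + alpha*(px - mu)^2); % and of the deviation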

Mixture of Gaussians (MoG) [9]: represents each background pixel variation with a set of weighted Gaussian distributions. Distributions are ordered according to their weights; the most relevant ones (until the accumulated weight gets past a threshold) are considered to model the background, while the remaining ones model the foreground. A pixel is decided to belong to the background if it falls within a given deviation around the mean of any of the Gaussians that model it. The update process is only performed on the Gaussian distribution that describes the pixel value in the current frame, again following a running average scheme with learning factor α. The initial deviation value σ0, the accumulated-weight threshold, and α are the analysed parameters. A single-pixel sketch follows.
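A compact single-pixel MoG sketch; every value here (K, the weights, means, deviations, the weight threshold Tw, and the 2.5-deviation match test) is an assumption for illustration, not taken from the report.

K = 3; w = [0.7 0.2 0.1]; mu = [100 150 200]; sigma = [10 10 10];
alpha = 0.01; Tw = 0.8; px = 105;                % example current pixel value
[ws, idx] = sort(w, 'descend');                  % order distributions by weight
B = find(cumsum(ws) >= Tw, 1);                   % the first B distributions model the background
match = abs(px - mu) < 2.5*sigma;                % match test against every Gaussian
isBg = any(match(idx(1:B)));                     % background if a background Gaussian matches
m = find(match, 1);                              % update only the matched distribution
if ~isempty(m)
    mu(m) = (1 - alpha)*mu(m) + alpha*px;
    sigma(m) = sqrt((1 - alpha)*sigma(m)^2 + alpha*(px - mu(m))^2);
end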

Gamma method (G) [8]: in practice, represents each background pixel with a running average of its previous values. The decision on a pixel belonging to the background is made by summing up the squared differences between the pixel values of a square spatial window, centred on the considered pixel of the frame, and the corresponding background model values, and setting a threshold over this sum. A theoretical development, based on the assumption that the pixel variation follows a Gaussian, concludes that the thresholded quantity follows a Chi-square distribution, on which the threshold selection, in fact a probability, is based. This threshold is the analysed parameter; a sketch of the decision follows.
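A hedged sketch of the G decision for one pixel. It assumes the squared differences are normalised by a noise variance sigma2 so that their sum follows a chi-square distribution with n degrees of freedom; the example data, window radius r, probability p, and the use of chi2inv (Statistics Toolbox) are all assumptions for illustration.

fr = uint8(100*ones(5)); fr(3,3) = 160;          % example frame patch with one bright pixel
bg = 100*ones(5); sigma2 = 25;                   % assumed background model and noise variance
y = 3; x = 3; r = 1; p = 0.99; n = (2*r + 1)^2;  % 3x3 window => n = 9 degrees of freedom
win_f = double(fr(y-r:y+r, x-r:x+r));            % frame window around pixel (y,x)
win_b = bg(y-r:y+r, x-r:x+r);                    % corresponding background window
D = sum((win_f(:) - win_b(:)).^2) / sigma2;      % normalised sum of squared differences
isFg = D > chi2inv(p, n);                        % foreground if the sum exceeds the p-quantile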

    1.2.3 Non-Parametric Models

Histogram-based approach (Hb) [15]: represents each background pixel variation with a histogram of its last N values, which is re-computed every L frames (L < N).


    2. Equipment and Methodology

    2.1 Equipment used

The equipment used for any setting will include a digital camera and a processing tool that performs the extraction process. Background subtraction algorithms typically process lower-resolution grayscale video, so video editing and compression software may also be required; a typical preprocessing step is sketched below.
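A one-line sketch of that preprocessing, assuming fr is an RGB frame already read from the video and that the half-resolution factor is an arbitrary choice (imresize requires the Image Processing Toolbox):

fr_small = imresize(rgb2gray(fr), 0.5);          % grayscale at half resolution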

    2.1.1 MATLAB

For the project, I chose to use Matlab as the programming language. It is a high-level language that specializes in data analysis and in computing mathematical problems. Matlab's official website can be found at www.mathworks.com. The program environment has an interactive command window that allows users to test and experiment with code line by line. Users can also save their code into an M-file and run the program. The Matlab Help Navigator is also very useful: it properly categorizes all functions and provides detailed explanations and sample usages for them. Just like C++ and Java, the language syntax provides loops and conditional statements for programming purposes.

The language was chosen over C++ and Java because it has many built-in functions that are specific to image processing. As well, it can compute large mathematical equations faster than other languages. These advantages suit the project perfectly, owing to the large matrix computations required during the extraction process. Some minor problems occurred while working on the project. The first was that Matlab was a completely new language and environment for me; I had to familiarize myself with it by practicing simple tutorials and exploring the programming environment. Another problem was that Matlab takes a long time to run the segmentation code.


When compared to C++ and Java, Matlab can calculate matrices more quickly, but large video files take a long time for a scripting language to process. Lastly, the Matlab software environment requires a lot of memory to run. During start-up and execution, Windows often cannot provide enough memory for Matlab and will sometimes shut it down automatically.

    2.2 Assumptions

Assumptions for the project may alter the decision on which extraction algorithm is practical, since there are many different techniques that can extract backgrounds/foregrounds from videos. A few assumptions are made about the background environment of the video.

The first assumption allows the user to choose the location of the video: it can be filmed either indoors or outdoors.

Secondly, the lighting in the video must remain constant, because of the difficulty that arises when a light source changes its brightness or location.

Also, the background of the video must be static; no moving objects are allowed in the background. Even slight movement, such as a reflection off a window, can create unwanted noise.

Lastly, the software is not required to run in real time. This assumption greatly reduces the complexity of the software.

The videos were taken in two settings: indoor and outdoor environments. (The previously discussed problems of moving backgrounds and illumination caused some minor deviations in the results, but overall the approach runs fine.)

    2.3 Algorithm


The two models take a similar approach, in the sense that both compare a frame difference against a threshold. For instance, in frame differencing, the step that performs the background/foreground detection is the comparison of the absolute frame difference against the threshold (di > T, as defined in section 3.1.1). The algorithms will be taken up with the techniques later on.


    3. Designs and Logic

I will now discuss the two basic models used for background/foreground subtraction. The algorithm and the steps involved are given with each.

    3.1 Frame Difference

    Frame difference is arguably the simplest form of background subtraction. The current frame is

    simply subtracted from the previous frame, and if the difference in pixel values for a given

    pixel is greater than a threshold Ts, the pixel is considered part of the foreground.

    3.1.1 Algorithm

Figure 1: Algorithm for the Frame Differencing model


The algorithm of two-frame-based B/F detection is described below.

fi: a pixel in the current frame, where i is the frame index.

fi-1: the pixel at the same location in the previous frame.

di: the absolute difference of fi and fi-1.

bi: the B/F mask.

T: the threshold value.

Steps

1. di = |fi - fi-1|

2. If di > T, fi belongs to the foreground; otherwise, it belongs to the background.
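These two steps can also be written in vectorized Matlab; fr, prev and T are hypothetical names here for the current frame, the previous frame and the threshold.

di = abs(double(fr) - double(prev));             % step 1: absolute frame difference
bi = di > T;                                     % step 2: logical B/F mask (true = foreground)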

    3.1.2 Drawbacks

    A major (perhaps fatal) flaw of this method is that for objects with uniformly distributed

    intensity values (such as the side of a car), the interior pixels are interpreted as part of the

    background. Another problem is that objects must be continuously moving. If an object stays

    still for more than a frame period (1/fps), it becomes part of the background.

    3.1.3 Advantages

    This method does have two major advantages. One obvious advantage is the modest

    computational load. Another is that the background model is highly adaptive. Since the

    background is based solely on the previous frame, it can adapt to changes in the background

faster than any other method (at 1/fps, to be precise). As we'll see later on, the frame difference method subtracts out extraneous background noise (such as waving trees) much better than the more complex approximate median and mixture of Gaussians methods.

    3.1.4 Challenges

A challenge with this method is determining the threshold value. (This is also a problem for the other methods.) The threshold is typically found empirically, which can be tricky: set too low, it passes nearly everything as foreground; set too high, it blocks parts of the genuine foreground.


Figure 2: Background from one of the videos using the Frame Differencing model

    3.2 Approximate Median

    In median filtering, the previous N frames of video are buffered, and the background is

    calculated as the median of buffered frames. Then (as with frame difference), the background is

    subtracted from the current frame and thresholded to determine the foreground pixels.

    Median filtering has been shown to be very robust and to have performance comparable to

    higher complexity methods. However, storing and processing many frames of video (as is often

    required to track slower moving objects) requires an often prohibitively large amount of

    memory. This can be alleviated somewhat by storing and processing frames at a rate lower than

the frame rate, thereby lowering storage and computation requirements at the expense of a slower-adapting background.

A more efficient compromise was devised back in 1995 by UK researchers N.J.B. McFarlane and C.P. Schofield. While doing government-funded research on piglet tracking in large commercial farms, they came up with an efficient recursive approximation of the median filter. Their approximate median method, presented in their seminal paper, Segmentation and


tracking of piglets in images (http://www.springerlink.com/content/qgl74778617tq121/), has since seen wide implementation in the background subtraction literature and has been applied to a wide range of background subtraction scenarios.

    3.2.1 Logic

The approximate median method works as follows: if a pixel in the current frame has a value larger than the corresponding background pixel, the background pixel is incremented by one. Likewise, if the current pixel is less than the background pixel, the background pixel is decremented by one. In this way, the background eventually converges to an estimate where half the input pixels are greater than the background and half are less than it, which is approximately the median (convergence time will vary with the frame rate and the amount of movement in the scene). The update can be written in vectorized form, as shown below.
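A vectorized form of the update, assuming bg is a double matrix holding the background model and fr the current grayscale frame:

d = double(fr);
bg = bg + (d > bg) - (d < bg);                   % step each background pixel +/-1 toward the median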

    3.2.2 Algorithm

The steps involved in the approximate median method are:

Read the video

Input the frame

Convert the frame to grayscale

Determine the threshold

Determine the frame difference value

Compare the threshold with the frame difference values

Where the current frame values are greater than the background values, make the background lighter

Where the current frame values are less than the background values, make the background darker

Subplot the frame, foreground and background

    3.2.3 Efficiency

The approximate median method does a much better job of separating the entire object from the background. This is because the more slowly adapting background incorporates a longer history of the visual scene, achieving about the same result as if we had buffered and processed N frames.

    4. Coding

This section describes the code used for the project; the code for each model is given in its own subsection.

The path to the video file must be given in the source, and a threshold must be chosen for the background to be visible. The results show a slight deviation in the output, which is due to the problems discussed in section 1.1.

The code is given in the same format as it appears in Matlab. The statement movie2avi(M, 'frame_difference_output', 'fps', 30); is used to save the output as an AVI file named 'frame_difference_output', running at 30 frames per second.

    12

  • 7/31/2019 toprep

    13/30

    4.1 Frame Difference

source = aviread('Filename');                    % give the path of the file, with name
thresh = 25;                                     % set threshold
bg = source(1).cdata;                            % read in 1st frame as background frame
bg_bw = rgb2gray(bg);                            % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                        % read in frame
    fr_bw = rgb2gray(fr);                        % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw)); % cast operands as double to avoid negative overflow

    for j = 1:width                              % if fr_diff > thresh, pixel is in foreground
        for k = 1:height
            if fr_diff(k,j) > thresh
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
        end
    end

    bg_bw = fr_bw;                               % the current frame becomes the next background model

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(fr_bw)
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray(256));     % put frames into movie; 256-entry colormap covers all uint8 levels
end

% movie2avi(M, 'frame_difference_output', 'fps', 30); % save movie as avi


    4.2 Approximate Median

source = aviread('File name');                   % give the path of the video
thresh = 28;
bg = source(1).cdata;                            % read in 1st frame as background frame
bg_bw = double(rgb2gray(bg));                    % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;
    fr_bw = rgb2gray(fr);                        % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw)); % cast operands as double to avoid negative overflow

    for j = 1:width                              % if fr_diff > thresh, pixel is in foreground
        for k = 1:height
            if fr_diff(k,j) > thresh
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
            if fr_bw(k,j) > bg_bw(k,j)           % approximate median update:
                bg_bw(k,j) = bg_bw(k,j) + 1;     % step the background up ...
            elseif fr_bw(k,j) < bg_bw(k,j)
                bg_bw(k,j) = bg_bw(k,j) - 1;     % ... or down, toward the median
            end
        end
    end

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(uint8(bg_bw))
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray(256));     % save output as movie; 256-entry colormap covers all uint8 levels
end

% movie2avi(M, 'approximate_median_background', 'fps', 15); % save movie as avi

    5. Results


In this section I show the results obtained for the two techniques at different thresholds and for different video sequences. The masks at the compared thresholds can be generated as sketched below.
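A hedged sketch for producing the masks at the thresholds compared in this section, assuming the frame difference fr_diff from section 4.1 is in scope:

for T = [15 25 45]
    figure, imshow(fr_diff > T);                 % binary foreground mask at this threshold
    title(sprintf('Frame difference mask, T = %d', T));
end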

    5.1 Results for Frame Difference

    5.1.1 Test Video 1

    Threshold value 15

    Threshold value 25


    Threshold value 45


5.1.2 Test Video 2


Threshold values 15, 25 and 45

    5.2 Results for Approximate Median


    5.2.1 Test Video 1

    Threshold Value 15

    Threshold Value 25


    Threshold Value 45


    5.2.2 Test Video 2


Threshold values 15, 25 and 45

    6. Observations and Analysis of Results


    6.1 Frame Differencing

The frame difference method subtracts out extraneous background noise (such as waving trees) much better than the more complex approximate median and mixture of Gaussians methods.

    As can be seen, a major (perhaps fatal) flaw of this method is that for objects with

    uniformly distributed intensity values (such as the side of a car), the interior pixels are

    interpreted as part of the background. Another problem is that objects must be

    continuously moving. If an object stays still for more than a frame period (1/fps), it

    becomes part of the background.

    6.2 Approximate Median

The approximate median method works as follows: if a pixel in the current frame has a value larger than the corresponding background pixel, the background pixel is incremented by one; likewise, if the current pixel is less than the background pixel, the background pixel is decremented by one.

The background eventually converges to an estimate where half the input pixels are greater than the background and half are less than it, which is approximately the median (convergence time varies with the frame rate and the amount of movement in the scene).

As you can see, the approximate median method does a much better job of separating the entire object from the background. This is because the more slowly adapting background incorporates a longer history of the visual scene, achieving about the same result as if we had buffered and processed N frames.

    We do see some trails behind the larger objects (the cars). This is due to updating the

    background at a relatively high rate (30 fps). In a real application, the frame rate would

    likely be lower (say, 15 fps).

The processing time increases with longer frame sequences.

    7. Conclusion


Each technique has its merits; the conclusions are as follows.

In frame difference, one obvious advantage is the modest computational load. Another is that the background model is highly adaptive: since the background is based solely on the previous frame, it can adapt to changes in the background faster than any other method (at 1/fps, to be precise). A challenge with this method is determining the threshold value.

Approximate Median is a very good compromise. It offers performance near what can be achieved with higher-complexity methods (according to my research and the academic literature), and it costs little more in computation and storage than frame differencing.


    Appendices

    References


[1] Cristani, M., Bicego, M., Murino, V.: Multi-level background initialization using Hidden Markov Models. In: First ACM SIGMM Int. Workshop on Video Surveillance, pp. 11-20 (2003)

[2] Piccardi, M.: Background subtraction techniques: a review. In: SMC 2004, vol. 4, pp. 3099-3104 (2004)

[3] Cheung, S.-C., Kamath, C.: Robust techniques for background subtraction in urban traffic video. In: Panchanathan, S., Vasudev, B. (eds.) Proc. Electronic Imaging: Visual Communications and Image Processing (Part One), SPIE, vol. 5308, pp. 881-892 (2004)

[4] Cucchiara, R.: People Surveillance, VISMAC Palermo (2006)

[5] Ewerth, R., Freisleben, B.: Frame difference normalization: an approach to reduce error rates of cut detection algorithms for MPEG videos. In: ICIP, pp. 1009-1012 (2003)

[6] Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (1999)

[7] Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA (1974)

[7] Wren, A., Darrell, P.: Pfinder: Real-time tracking of the human body. PAMI (1997)

[8] Cavallaro, A., Steiger, O., Ebrahimi, T.: Semantic video analysis for adaptive content delivery and automatic description. IEEE Transactions on Circuits and Systems for Video Technology 15(10), 1200-1209 (2005)

[9] Stauffer, C.: Adaptive background mixture models for real-time tracking. In: CVPR (1999)

[10] Carminati, L., Benois-Pineau, J.: Gaussian mixture classification for moving object detection in video surveillance environment. In: ICIP, pp. 113-116 (2005)

[11] Comaniciu, D.: Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 603 (2002)

[12] Elgammal, A.M., Harwood, D., Davis, L.S.: Non-parametric model for background subtraction. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1843, pp. 751-767. Springer, Heidelberg (2000)

[13] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773-780 (2006)

[14] Stenger, B., Ramesh, V., Paragios, N., Coetzee, F., Buhmann, J.M.: Topology Free Hidden Markov Models: Application to Background Modeling. In: Eighth Int. Conf. on Computer Vision, ICCV 2001, vol. 1, pp. 294-301 (2001)

[15] Mittal, A., Paragios, N.: Motion-based background subtraction using adaptive kernel density estimation. In: Proceedings of the Int. Conf. Comp. Vision and Patt. Recog., CVPR, pp. 302-309 (2004)

[16] Tiburzi, F., Escudero, M., Bescós, J., Martínez, J.M.: A Corpus for Motion-based Video-object Segmentation. In: IEEE International Conference on Image Processing (Workshop on Multimedia Information Retrieval), ICIP 2008, San Diego, USA (2008)

[17] El Baf, F., Bouwmans, T., Vachon, B.: Comparison of Background Subtraction Methods for a Multimedia Application. In: 14th International Conference on Systems, Signals and Image Processing, IWSSIP 2007, Maribor, Slovenia, pp. 385-388 (2007)

[18] Parks, D.H., Fels, S.S.: Evaluation of Background Subtraction Algorithms with Post-Processing. In: IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance, AVSS 2008, pp. 192-199 (2008)

[19] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773-780 (May 2006)

[20] NVIDIA Corporation: NVIDIA CUDA Programming Guide (2007)

[21] AMD/ATI: ATI CTM (Close to Metal) Guide (2007)

[22] Fung, J.: Advances in GPU-based Image Processing and Computer Vision. In: SIGGRAPH (2009)

[23] Lee, S.-j., Jeong, C.-s.: Real-time Object Segmentation based on GPU. In: 2006 International Conference on Computational Intelligence and Security, pp. 739-742 (November 2006)

[24] Carr, P.: GPU Accelerated Multimodal Background Subtraction. In: Digital Image Computing: Techniques and Applications (December 2008)

[25] Apple Inc.: Core Image Programming Guide (2008)

[26] NVIDIA Corporation: NVIDIA CUDA Best Practices Guide (2010)

[27] VSSN 2006 Competition (2006). [Online]. Available: http://mmc36.informatik.uni-augsburg.de/VSSN06_OSAC/

[28] PETS 2009 Benchmark Data (2009). [Online]. Available: http://www.cvg.rdg.ac.uk/PETS2009/a.html

[30] http://www.eetimes.com