
  • 8/8/2019 Tfm Ingles


Smart Telecare Video Monitoring for Anomalous Event Detection

Ivan Gómez Conde

Advisors: David Olivieri Cecchi and Xosé Antón Vila Sobrino
Master Thesis (Intelligent and Adaptable Software Systems Master)

University of Vigo

Abstract. Behavior determination and multiple object tracking for video surveillance are two of the most active fields of computer vision. The reason for this activity is largely due to the fact that there are many application areas. This thesis describes work in developing software algorithms for tele-assistance for the elderly, which could be used as an early warning monitor for anomalous events. The thesis treats algorithms for both the multiple object tracking problem as well as simple behavior detectors based on human body positions. There are several original contributions proposed by this thesis. First, a method for comparing foreground-background segmentation is proposed. Second, a feature vector based tracking algorithm is developed for discriminating multiple objects. Finally, a simple real-time histogram based algorithm is described for discriminating movements and body positions.

    1 Introduction

Computer vision has entered an exciting phase of development and use in recent years and has benefitted from rapid developments in related fields such as machine learning, improved Internet networks, computational power, and camera technology. Present applications go far beyond the simple security camera of a decade ago and now include such fields as assembly line monitoring, sports medicine, robotics, and medical tele-assistance. Indeed, developing a system that accomplishes these complex tasks requires coordinated techniques of image analysis, statistical classification, segmentation, inference algorithms, and state space estimation algorithms.

Irrespective of the problem domain, the computer vision problem is approximately the same for each application: first we must somehow detect what we consider to be the foreground object (or objects), we must then track these objects in time (over several video frames), and we must discern something about what these objects are doing. This is summarized as follows:

1. Foreground object segmentation
2. Tracking over time
3. Determination of actions and behavior


Fig. 1. Age and sex distribution for the year 2010 and that predicted for the year 2050 [37].

    1.1 Problem Domain of this Thesis: Tele-assistance

The motivation for this thesis is the development of a tele-assistance application, which represents a useful and very relevant problem domain. To understand the importance of this problem area, it is interesting to appreciate some recent statistics. Life expectancy worldwide has risen sharply in recent years. By 2050 the number of people aged 65 and over will exceed the number of youth under 15 years, according to recent demographic studies [37,38]. Combined with sociological factors, there is thus a growing number of elderly people (Figure 1) that live alone or with their partners. These people need medical care, and therefore there are two big problems: there are not enough people to care for the elderly population, and governments cannot cope with this enormous social spending.

Thus, Computer Vision (CV) can provide strong economic savings by eliminating the need for 24-hour in-house assistance by medical staff. A potential tele-assistance application could combine vital signs monitoring while at the same time using video to detect anomalous behavior, such as falls or excessively long periods of inactivity. Therefore, present systems are being developed that collect a wide array of relevant medical information from patients in their homes and send this information to a central server over the Internet. Thus, these systems provide a clear economic cost reduction in primary care as well as early warning monitoring of patients before they enter a critical phase necessitating costly emergency care.

A general schematic of the application developed for this master's project is given in Figure 2. It has a number of parameters that can be hand-tuned for investigating optimal parameters for background subtraction. The system, based upon the OpenCV computer vision library, captures real-time video from a webcam, performs foreground object segmentation, tracks these objects, and subsequently determines a limited set of human body actions. A screen capture of


the main window of the application is shown in Figure 2. The figure shows detection, extraction and characterization of moving foreground objects.

    Fig. 2. A screenshot of the main window of the tele-assistance application.

    1.2 Objectives and Organization

This master thesis describes a computer vision application that can be used for Telecare. There are three contributions made by this thesis: (1) a comparison of different foreground/background algorithms, (2) a feature based model for tracking multiple objects, and (3) an online method for detecting basic human body positions that could eventually be used in a behavior detection algorithm.

This thesis is organized as follows. First, we describe the state of the art of video surveillance and especially how it is related to the domain of remote Telecare. In Section III, we describe the architecture of our software system, as well as details of motion detection, segmentation of objects, and the methods we have developed for detecting anomalous events. Finally, Sections IV and V show the performance results and conclusions of this work.

Part of this master thesis was presented as a long article at the 5th Iberian Conference on Information Systems and Technologies (CISTI 2010) [14].

    2 State of the art

For nearly 30 years, researchers have been seeking the best ways of carrying out the concerted tasks of computer vision. As such, the quantity of techniques published in the literature is daunting. Due to the maturity of the field, several seminal reviews are now available. For segmentation and tracking, for example, the review by Hu [17] provides a useful review and taxonomy of algorithms used in multi-object detection. For human body behavior determination from video sequences, the recent review by Poppe [31] provides an impressive whirlwind tour of algorithms and techniques.


Open Source software has also left its mark on the computer vision field, with a wide array of libraries freely available to developers to try their hand at the three fundamental problems of computer vision. A large listing of available computer vision software libraries (both open source and commercial) can be found at CMU [39].

    2.1 Surveillance System Evolution

The technological evolution of video monitoring and surveillance systems has been ongoing over recent years, and now includes the latest advances in computer vision. In their review, Hu [17] and co-workers have divided video surveillance systems into three generations. Starting from first generation systems that consisted of analog monitors requiring intensive human intervention, up until the present with real-time digital embedded processing, a rapid evolution is underway. Present systems can incorporate real-time compression, and networks are fully integrated with the possibility of interconnection with the Internet and access from mobile devices. The increased computational capacity of computers today allows real-time segmentation and tracking of multiple objects.

Next generation video surveillance systems are presently being intensely researched, and the fundamental problems being addressed are to monitor and infer behavior detected from the video sequences. Van der Heijden [15] and Olson [29] describe event graphs, which provide decision making by connecting a series of sequential events. Other fundamental issues in present systems are the need for multi-object tracking as well as multiple synchronized cameras. The most updated review relevant to this thesis, however, is that of Poppe [31], for human action recognition, containing 180 references. Poppe clearly demonstrates that the main thrust of work over the past ten years shows that successful systems must rely heavily upon statistical machine learning and modern kernel methods.

Remote Telecare Systems: There are several research groups investigating remote assistance technologies for the elderly in their homes, some incorporating smart video camera technology, such as that treated in this thesis. Many systems under development also incorporate vital signs monitoring and other sensor technologies. Examples of these projects are MobiAlarm, TeleADM, Soprano, PLAMSI, Dreaming, and so on [3,25,33,36]. These systems translate low level signal information, whether from video monitoring, vital signs monitoring, etc., into a higher level system that can be used to make decisions or ultimately provide more timely care.

2.2 Motion Detection

Detecting moving objects in a video scene and separating them from the background is a challenging problem due to complications from illumination, shadows, dynamic backgrounds, etc. [30]. Motion detection is concerned with being able to notice objects that move within a scene, thereby separating foreground objects from the background [1,11]. There are several techniques that have been


applied to motion detection; they can be grouped into three broad categories: environmental modeling, motion segmentation, and object classification.

Environmental modeling: requires constructing a set of images of the background. While trivial for stationary backgrounds, this technique is fundamentally limited for most real world problems, especially for outdoor scenes that contain moving objects and dramatic changes in lighting [6].

Motion Segmentation: can be further divided into three different types: background subtraction, temporal differencing, and optical flow.

Static background subtraction: uses pixel by pixel comparison of the current image to a static reference image in order to detect moving objects. While simple and valid for static backgrounds, the method places stringent constraints on the background image [24,21].

Temporal differencing: uses pixel-wise differencing of a sequence of frames. The method is highly dependent upon the choice of time between frames [32].

Optical flow: creates complex contours of the entire velocity field. It is the most robust method; however, it requires considerable computational resources and cannot at present be used in real-time applications [27,22].

Object Classification: these techniques are applied to situations where there is more than one foreground object moving in a scene. The solution to this class of problem can be further divided into two main categories: shape based classifiers, which discern objects by their shape, or motion based classifiers, which classify based upon the type of motion [23].
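Temporal differencing, the simplest of the motion segmentation techniques above, can be sketched in a few lines. This is an illustrative Python sketch with toy frame data (the thesis software is C++/OpenCV; the function name and threshold value are our own):

```python
def temporal_difference(prev_frame, curr_frame, threshold=25):
    """Pixel-wise temporal differencing: return a binary mask that is 1
    wherever the absolute intensity change between frames exceeds the
    threshold, marking candidate foreground pixels."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_frame, curr_frame)
    ]

# Two tiny grayscale "frames": two pixels change significantly.
prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 80, 10], [10, 10, 90]]
mask = temporal_difference(prev, curr)
```

As the text notes, the result depends strongly on the time between the two frames: slow objects may not change any pixel above the threshold.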

    2.3 Object Tracking

Once objects have been separated from the background, we are interested in tracking these objects individually. The goal is to track particular objects during subsequent frames in the video sequence based upon their unique characteristics and locations. An important research field is tracking multiple people in cluttered environments. This is a challenging problem due to several factors: people may have similar shape or color, complex events such as running and walking may occur within the same scene, and real world cameras introduce depth perspective. Another challenging problem that must be treated by the software is occlusion by objects or other people. We shall briefly consider four prominent strategies for implementing object tracking: region based, active contour based, feature based and model based.

Region based: region tracking consists of arbitrarily defining static regions within the image frame to monitor. The method tracks objects within each region, but fails when objects move between regions [34].


Active contour based: the idea is to create bounding contours which represent each object's outline. The problem is that the contours need to be updated from frame to frame, and the method suffers from poor precision [10,34].

Feature based: feature based tracking methods are very successful techniques for tracking objects from frame to frame. No assumptions need to be made about the objects other than an intelligent choice of feature vector. A typical feature vector contains the following quantitative information for each object: the object size, position, velocity, ratio of major axis orientation, coordinates of the bounding box, the centroid, and the histogram of color pixels. The advantage of using a feature based method is that statistical clustering can be used to classify the objects from frame to frame [5,20].

Model based: these methods are very accurate, but computationally very expensive. They involve modeling the objects to be tracked with detailed models. For example, human figures have been modeled with stick figures and with complex gait algorithms [2,28].
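A feature vector of the kind described above, and a distance used to match blobs between frames, might look like the following Python sketch. The class, field names and weights are illustrative assumptions, not the data structures of the thesis software:

```python
from dataclasses import dataclass
import math

@dataclass
class BlobFeatures:
    # Illustrative subset of the feature vector listed above.
    size: int            # total pixel count of the blob
    centroid: tuple      # (x, y) coordinates of the blob center
    velocity: tuple      # (vx, vy) motion vector
    color_hist: list     # normalized color histogram

def feature_distance(a, b, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted distance between two blob feature vectors; blobs in
    consecutive frames can be matched by minimizing this distance."""
    ds = abs(a.size - b.size) / max(a.size, b.size)       # relative size change
    dc = math.dist(a.centroid, b.centroid)                # centroid displacement
    dv = math.dist(a.velocity, b.velocity)                # velocity change
    dh = sum(abs(x - y) for x, y in zip(a.color_hist, b.color_hist))
    return weights[0] * ds + weights[1] * dc + weights[2] * dv + weights[3] * dh
```

Identical blobs give distance zero, so the same object observed in consecutive frames tends to produce the smallest pairwise distance, which is what makes statistical clustering on these vectors work.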

    2.4 Object Segmentation

Several problems arise when implementing background subtraction techniques. The most obvious is noise, which has its origin in the fact that the background of consecutive frames may not be exactly the same due to changes in luminosity, temporal occlusions, and the number and velocity of different objects in the scene. Although often associated with interior video scenes, all or a subset of these problems can also occur within exterior scenes.

Thus, exterior scenes share similar problems with interior scenes, yet other more complex effects may arise. An example is an oscillating background in the case of a forest, where wind generates tree leaf movements (Figure 3). Models may be complicated by other problems, such as how to determine whether an object is part of the background or not. A typical textbook example that illustrates the foreground-background problem is the following: imagine a video scene with parked cars. At the moment one car enters the parking lot, it is a foreground object. However, at what point should our algorithm consider this car to be a background object?

The principal methods used for determining the background in a frame are the following: differences between frames, which determines the background by taking the logical difference between two frames; mean background, which is based upon the mean background of n frames; and the codebook model, which analyzes blocks of frames in order to estimate the background model.

Differences between frames: determines the background by taking the logical difference between two frames separated by a time t. It is easy to implement and is not computationally intensive, but it is not sufficiently robust, and is plagued with artifacts and noise [24].


Fig. 3. Scene with many moving objects, such as tree leaves, dogs or birds.

Mean background: this method overcomes the deficiencies of the pairwise difference by performing a cumulative average over n frames. This has the advantage that objects do not need to be in continuous movement. The method [18,35], however, suffers from artifacts that arise when objects move at different velocities. Another problem is that it needs to be hand-tuned. For the initial studies in this thesis, this technique was utilized; thus, a more comprehensive discussion is provided later.

Codebook model: the codebook method is a powerful technique for some situations since it stores a model background based upon previously obtained backgrounds. This is an interesting choice for indoor environments, as in our Telecare system, where the background remains relatively static [19].

    2.5 Human action recognition

There is a large body of literature in the area of human action recognition. Recent comprehensive surveys of human action recognition in videos are given by Poppe [31] and Forsyth [12]. Related work on locating people in static images has been reviewed in [7,13].

Many techniques have been described in the literature for distinguishing body part positions for the purpose of action recognition. As an example of early approaches to this complex problem, a template matching technique was reported by Felzenszwalb [9], where body parts are represented by a deformable skeleton model which is mapped to points on the body from frame to frame. More recent models rely upon sophisticated statistical machine learning techniques, such as that by Meeds, which uses a fully Bayesian probabilistic model to construct tree models of foreground objects. Other researchers are using kernel classifiers [8] for constructing systems based on mixtures of multi-scale deformable part models. Hsieh et al. [16] have shown that using a triangulation method one can distinguish human parts over sequences of video scenes.


    3 The software system

The software application, TELECARE, developed for this thesis has been written in C++ and uses the OpenCV library, which is an open-source cross-platform standard library, originally developed by Intel, for developing a wide range of real-time computer vision applications. OpenCV implements low level image processing as well as high level machine learning algorithms, so that developers can rapidly leverage well-known techniques for building new algorithms and robust computer vision applications.

For the graphical interface, the QT library is used, since it provides excellent cross-platform performance. The QT development environment is an open industry standard and is now owned by Nokia. Existing applications not only run on other desktop platforms, but can be extended for inclusion in web-enabled applications as well as mobile and embedded operating systems without the need for rewriting source code.

Since the TELECARE software is an experimental application, the graphical interface is designed to provide maximum information about feature vector parameters as well as easy comparison of different segmentation algorithms. Thus, the system is not meant for end-users at the moment. Instead, the architecture of the system provides a plugin framework for including new ideas. A high level schema of our software system is shown in Figure 4. It consists of the basic components described in the previous section, namely foreground extraction, object segmentation, and finally tracking of individual objects. Tracking of multiple objects is performed by first clustering individual objects, or blobs, from feature vectors formed by their dimensions, color space, and velocity vectors. In order to experiment with different segmentation schemes for discriminating events, we have implemented histogram based algorithms.

    3.1 Foreground Segmentation

The first phase of extracting information from videos consists of performing basic image processing functions. These include loading the video, capturing individual frames, and applying various smoothing filters. Next, blobs are identified

    Fig. 4. Computer Vision system for video surveillance.


based upon movement between frames. For static cameras, this amounts to simply taking note of pixel locations that change value from frame to frame within a specified threshold. There are several sophisticated background subtraction methods for eliminating shadows and other artifacts that we shall describe in the next section; however, basic background subtraction is based upon two ideas: (a) finding an appropriate mask for subtracting moving objects in the foreground, and (b) updating the background. Two methods are typically employed: a running average, or a Gaussian mixture model, where each pixel is classified as belonging to a particular class.

Running average [4]: A running average technique is by far the easiest to comprehend. Each point of the background is calculated by taking the mean of accumulated points over some pre-specified time interval, t. In order to control the influence of previous frames, a weighting parameter α is used as a multiplying constant in the following way:

Acc_t(x, y) = (1 − α) Acc_{t−1}(x, y) + α I_t(x, y)    (1)

where Acc is the accumulated pixel matrix, I(x, y) is the image, and α is the weighting parameter.

For a constant value of α, the running average is not equivalent to summing all values for each pixel across a large set of images and then dividing by the total number of images to obtain the mean. To see this, simply consider adding three numbers (2, 3, and 4) with α set to 0.5. If we were to accumulate them with a normal mean, the sum would be 9 and the average 3. If we were to accumulate them with the running average, the first sum would give 0.5·2 + 0.5·3 = 2.5, and then adding the third term would give 0.5·2.5 + 0.5·4 = 3.25. The reason that the second number is larger is that the most recent contributions are given more weight than those from farther in the past. The parameter α essentially sets the amount of time necessary for the influence of a previous frame to fade. Figure 5 shows the results of different executions with α values of 0.05, 0.4 and 0.8.
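The worked example above can be reproduced directly from Eq. (1). This is a minimal Python sketch operating on a single scalar value rather than a pixel matrix (the thesis software uses C++/OpenCV):

```python
def running_average(acc, new_value, alpha):
    """One update of Eq. (1): Acc_t = (1 - alpha) * Acc_{t-1} + alpha * I_t."""
    return (1 - alpha) * acc + alpha * new_value

# Accumulate the values 2, 3, 4 with alpha = 0.5, as in the text.
acc = 2.0                      # the first value initializes the accumulator
for frame_value in (3, 4):
    acc = running_average(acc, frame_value, alpha=0.5)
# acc ends at 3.25, larger than the plain mean of 3.0, because recent
# values carry more weight than older ones
```

In the real system the same update is applied per pixel, and α controls how quickly stationary foreground objects fade into the background model.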

Thus, the background is subtracted by calculating the running average of previous background images and the current frame with the mask applied. An example using this algorithm is given in Figure 6. The left image is the generated mask. The second image is the resulting background image obtained from multiple calls to the running average routine. The third image is the result of applying the mask to the current image. Finally, the fourth image is the original image with rectangular contours defining moving foreground objects, after applying image segmentation.

Gaussian Mixture Model [18]: This model is able to eliminate many of the artifacts that the running average method is unable to treat. This method models each background pixel as a mixture of K Gaussian distributions (where


    Fig. 5. Different executions of running average.

    Fig. 6. Foreground subtraction results for a particular frame in a video sequence.

K is typically a small number from 3 to 5). Different Gaussians are assumed to represent different colors. The weight parameters of the mixture represent the time proportions that those colors stay in the scene. The background components are determined by assuming that the background contains the B most probable colors. The probable background colors are the ones which stay longer and are more static. Static single-color objects tend to form tight clusters in the color space, while those in movement form wider clusters due to different reflecting surfaces during the movement. The measure of this is called the fitness value. To allow the model to adapt to changes in illumination and run in real-time, an update scheme is applied, based upon selective updating. Every new pixel value is checked against the existing model components to evaluate its fitness. The first matched model component is updated. If no match is found, a new Gaussian component is added with its mean at that point, a large covariance matrix, and a small value of the weighting parameter.

Each pixel in the scene is modelled by a mixture of K Gaussian distributions. The probability that a certain pixel has a value of x_N at time N can be written as:


p(x_N) = Σ_{j=1..K} w_j η(x_N; θ_j)    (2)

where w_j is the weight parameter of the j-th Gaussian component, and η(x_N; θ_j) is the normal distribution of the j-th component, given by

η(x; θ_k) = N(x; μ_k, Σ_k) = (2π)^(−D/2) |Σ_k|^(−1/2) exp( −(1/2) (x − μ_k)^T Σ_k^(−1) (x − μ_k) )    (3)

In the above equation, μ_k is the mean and Σ_k = σ_k² I is the covariance of the k-th component. The K distributions are ordered based on the fitness value w_k/σ_k, and the first B distributions are used as a model of the background of the scene, where B is estimated as:

B = argmin_b ( Σ_{j=1..b} w_j > T )    (4)

The threshold T is the minimum fraction of the background model. In other words, it is the minimum prior probability that the background is in the scene. Figure 7 shows the results of three executions with different parameter values for the Gaussians: 1, 2.5 and 5.
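The selection of the B background components in Eq. (4) can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis implementation; it assumes the component weights have already been computed:

```python
def background_components(weights, T):
    """Eq. (4): the smallest number B of components (sorted by weight in
    decreasing order) whose cumulative weight exceeds the threshold T."""
    total = 0.0
    for b, w in enumerate(sorted(weights, reverse=True), start=1):
        total += w
        if total > T:
            return b
    return len(weights)

# With component weights 0.5, 0.3, 0.2 and T = 0.7, the two heaviest
# components are taken as the background model.
```

A larger T admits more components into the background model, making the model tolerant of multi-modal backgrounds such as flickering screens or swaying leaves.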

    Fig. 7. Different executions of Gaussian mixture model.


Fig. 8. Percentage of false negatives (FN) and false positives (FP) based on a manually segmented image.

Fig. 9. Percentage error of the best configuration with the running average model and the best configuration with the Gaussian mixture model.

    3.2 Finding and Matching blob features

Foreground objects are identified in each frame as rectangular blobs, which internally are separate images that can be manipulated and analyzed. Only blobs of a minimum size are considered real objects, thereby eliminating possible artifacts that may arise. For each blob, the background mask is subtracted and a simple image erosion operation is performed in order to be sure to eliminate any background color contributions. This erosion process helps in feature classification, since common background pixels don't contribute to the object color histograms.

In order to classify each blob uniquely, we define the following feature vector parameters (Figure 10): (a) the size of the blob, (b) the Gaussian fitted values of the RGB components (Figure 11), (c) the coordinates of the blob center, and (d) the motion vector. Because of random fluctuations in luminosity, smaller blobs


appear, but we discard these blobs. The size of a blob is simply its total number of pixels. Histograms are obtained by considering bin sizes of 10 pixels. We also normalize the feature vectors by the number of pixels.

Fig. 10. The feature vector parameters for classifying blobs.

Fig. 11. Discrimination of the color space histograms between blobs taken from different frames. The x-axis is the norm difference for red, while the y-axis is the norm difference of the histograms for green.

In order to match blobs from frame to frame, we perform a clustering. Since this can be expensive to calculate for each frame, we only recalculate the full clustering algorithm when blobs intersect. Figure 12 shows excellent discrimination by using the norm histogram differences between blobs for each color space. In the plot, the x axis is the difference of the red color channel between blobs, while the y axis is the norm of the difference of the histograms in the green channel.


For a particular video sequence, the results shown in Figure 12 demonstrate that two separate blobs are easily classified.

    Fig. 12. The Histograms of the BGR color space.

    3.3 Tracking individual blobs

The tracking algorithm used is similar to other systems described in the literature. Once segmented foreground objects have been separated and we have formed a rectangular blob region, we characterize the blob by its feature vector. Tracking is performed by matching features of the rectangular regions. Thus, given N rectangular blobs, we match all these rectangles with those of the previous frames. The frame to frame information allows us to extract a motion vector of the blob centroid, which we use to predict the updated position of each blob. For cases where there is ambiguity, such as track merging due to crossing, we recalculate the clusters. Thus, the situations where we explicitly recalculate the entire feature vector are the following: (a) objects can move in and out of the view-port, (b) objects can be occluded by objects or by each other, and (c) complex depth information must be obtained.

An example of the velocity vector taken over time is given in Figure 13, where the tracking algorithm must handle the crossing of two objects. The simple prediction algorithm easily handles this situation by first extracting the instantaneous velocity vector, and then performing a simple update based on the equations of motion.

    3.4 Detecting Events and Behavior for Telecare

For our initial work, we have considered a limited domain of events that we should detect, namely arm gestures, and body positions (upright or horizontal),


Fig. 13. Frames from a video sequence.

to detect falls. These two cases are used to address anomalous behavior or simple help signals for the elderly in their home environments.

Our analysis is based upon comparing histogram moment distributions, through the normalized difference of the histograms as well as the normalized difference of the moments of each histogram.

Hist(H_i, H_j) = |H_i − H_j|    (5)

MHist(H_i, H_j) = |M_i − M_j|    (6)
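Reading |·| as an L1 norm over bins and moments respectively, these two measures can be sketched in Python as follows (the function names are ours, and the choice of L1 norm is an assumption; the text only specifies a normed difference):

```python
def hist_diff(h1, h2):
    """Eq. (5): normed (L1) difference between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def moment_diff(m1, m2):
    """Eq. (6): normed (L1) difference between two vectors of moments."""
    return sum(abs(a - b) for a, b in zip(m1, m2))
```

Identical histograms give zero, and increasingly different body positions give increasingly large values, which is what makes clustering on these distances possible.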

In order to test our histogram discrimination technique, video events were recorded with very simple arm gestures, as shown in Figure 14. The foreground object was subtracted from the background by the methods previously described. The blob image is then eroded in order to obtain the principal color space histograms. Figure 14 shows the detected rectangular blobs after subtracting the background mask.

Fig. 14. Simple histogram results for obtaining moments for detecting arm gestures.


For each of the histograms obtained in Figure 14, and for the histograms of Figure 15, statistical moments are calculated and then normalized histograms (normalized both by bins and by the total number of points) are obtained. Clustering can then be performed (similar to that of Figure 11) by calculating MHist(H_i, H_j), the normed difference.

Fig. 15. Basic histogram technique used for discriminating body position. The inset image demonstrates the color space normalized to unity.

The histograms are obtained by summing all the pixels in the vertical direction. This is performed by dividing the image into equal-sized vertical stripes, and summing over all nonzero pixels within each stripe for each color channel.
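The vertical-stripe histogram just described can be sketched for a single channel as follows. This is an illustrative Python sketch over a binary foreground mask (the thesis software is C++/OpenCV, and the even divisibility of the width is an assumption):

```python
def vertical_histogram(mask, n_stripes):
    """Sum nonzero pixels of a binary mask within equal-width vertical
    stripes; the resulting profile characterizes the body position."""
    height, width = len(mask), len(mask[0])
    stripe_w = width // n_stripes      # assumes width divisible by n_stripes
    hist = []
    for s in range(n_stripes):
        lo, hi = s * stripe_w, (s + 1) * stripe_w
        hist.append(sum(mask[r][c] for r in range(height) for c in range(lo, hi)))
    return hist
```

Intuitively, an upright body concentrates its mass into a few central stripes, while a fallen (horizontal) body spreads it across many stripes, which is what the moment comparison in the next section exploits.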

    3.5 Event discrimination

Video sequences were recorded with typical actions/events that we would like to detect, such as a person falling to the floor. Figure 16 shows three video frames, taken from our application, where we have detected and tracked a foreground object, in this case a person as he falls to the floor.

Fig. 16. Video sequence of a person falling to the floor.


From each frame in the sequence, we calculated the histogram moments in the horizontal and vertical directions, H_x and H_y respectively, in order to characterize body position. The results of comparing H_y (the vertical histogram) for two body positions, representing the extremes of the video sequence, are shown in Figure 15. The inset of Figure 15 shows the histograms normalized along the x-axis to the total number of pixels.

In order to automatically discriminate the different body positions, we can compare the moments of the histograms obtained. Figure 17 and Table 2 show the results of different moments for the frames shown in Figures 14 and 15. Figure 14b (the central histogram) demonstrates a highly peaked first and second moment, thereby useful for discriminating between the other two cases, although cases 14a and 14c are not discriminated by any moment. For Figure 15, the difference in the distributions of the first and second moments is highly pronounced, and thus discrimination of the two cases is easily obtained from the simple moment analysis.
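The moments used in this comparison (mean, standard deviation, skewness) can be computed from a histogram as follows. This is an illustrative Python sketch; the normalization of bin positions to [0, 1] is our assumption, not a detail given in the text:

```python
def histogram_moments(hist):
    """Mean, standard deviation and skewness of a histogram, treating the
    bin positions (scaled to [0, 1]) as the random variable and the bin
    counts as (unnormalized) probabilities."""
    n, total = len(hist), sum(hist)
    xs = [i / (n - 1) for i in range(n)]          # bin positions in [0, 1]
    ps = [h / total for h in hist]                # normalized bin weights
    mean = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
    std = var ** 0.5
    skew = (sum((x - mean) ** 3 * p for x, p in zip(xs, ps)) / std ** 3
            if std > 0 else 0.0)
    return mean, std, skew
```

A symmetric histogram (an upright, centered body) yields near-zero skewness, while a one-sided mass distribution (a body lying toward one side of the frame) yields a large skewness, consistent with the spread of values reported in Table 2.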

    Fig. 17. Comparison of the different histogram moments obtained from the video frames studied in Figure 14 and Figure 15.

                         Fig 14(a)  Fig 14(b)  Fig 14(c)  Fig 15(a)  Fig 15(b)
    Mean                    0.52       0.33       0.45       0.57       0.49
    Standard Deviation      0.21       0.16       0.20       0.18       0.24
    Skewness                0.18       3.85       3.12       0.05       1.50

    Table 2. Results of the different moments for the frames shown in Figure 14 and Figure 15.
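The moments in Table 2 can be computed from a normalized histogram as sketched below (a minimal illustration treating the histogram as a discrete distribution over bin centers in [0, 1]; the exact normalization used in the thesis is assumed):

```python
def histogram_moments(hist):
    """Mean, standard deviation, and skewness of a normalized histogram.

    Bin i is taken to represent position (i + 0.5) / len(hist) in [0, 1];
    hist must sum to 1.
    """
    n = len(hist)
    centers = [(i + 0.5) / n for i in range(n)]
    mean = sum(c * p for c, p in zip(centers, hist))
    var = sum(p * (c - mean) ** 2 for c, p in zip(centers, hist))
    std = var ** 0.5
    skew = 0.0
    if std > 0:
        # Standardized third central moment.
        skew = sum(p * ((c - mean) / std) ** 3 for c, p in zip(centers, hist))
    return mean, std, skew
```

A symmetric histogram gives zero skewness, while a histogram peaked near one end gives a large skewness, which is the behavior exploited for discriminating the body positions above.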

    4 Experimental Results and Discussion

    All the experimental tests and development were performed on a standard PC with an Intel Pentium D 2.80 GHz CPU and 2 GB of RAM, running the Ubuntu 9.10 Linux operating system. Videos and images were obtained from a webcam with 1.3-megapixel resolution. For testing the system, many different video sequences were recorded under different lighting conditions and with different object characteristics, a subset of which has been shown in the examples of the previous section.

    The performance of the algorithm is highly dependent upon the amount of processing that can be performed from frame to frame. In all cases, we execute the foreground segmentation step on every frame with no reduction in the original frame rate of 30 fps. We do not perform blob clustering on every frame, since it would be costly; instead, we re-calculate clusters only when there are ambiguities (such as track crossings). Our histogram method for determining human body position is also costly, and we are therefore experimenting with the optimal frame interval (such as every 25 frames) so that we can still detect relevant body positions in real time.
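The per-frame schedule described above can be sketched as follows (the interval of 25 frames, the stage callbacks, and the ambiguity test are illustrative placeholders, not the thesis implementation):

```python
HIST_EVERY = 25  # illustrative: run the costly body-position histogram every 25 frames

def process_stream(frames, segment, cluster_blobs, body_histogram, is_ambiguous):
    """Run segmentation on every frame, blob clustering only on ambiguities,
    and the costlier histogram step once every HIST_EVERY frames."""
    clusters = None
    results = []
    for i, frame in enumerate(frames):
        mask = segment(frame)               # cheap: runs on every frame
        if clusters is None or is_ambiguous(mask, clusters):
            clusters = cluster_blobs(mask)  # costly: only when needed
        hist = body_histogram(mask) if i % HIST_EVERY == 0 else None
        results.append((clusters, hist))
    return results
```

The design choice is the usual real-time trade-off: the cheap stage preserves the full frame rate, while the expensive stages are amortized over many frames.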

    Figure 18 shows, on a logarithmic scale, the timing results of the algorithms on a 30 fps video of 12 seconds duration. The blue line represents normal video playback, the magenta line is playback through our system without processing, the red line represents foreground segmentation, and the green line adds the time for processing blob clustering on every frame. The quantitative times are shown in Table 3.

    Fig. 18. Times of the different video playback configurations.

    As shown in the previous section, the results of Figure 17 demonstrate that we can use statistical moment comparisons of histograms in order to discriminate between simple body positions. However, this method is not robust for

