2009.06.09 chris poppe - public phd defense

ELIS – Multimedia Lab

Detection and Representation of Moving Objects for Video Surveillance
(Dutch title: Detectie en representatie van bewegende objecten voor videobewaking)

Chris Poppe

Multimedia Lab
Department of Electronics and Information Systems
Faculty of Engineering
Ghent University

Supervisor: prof. dr. ir. Rik Van de Walle


Ghent, Belgium – June 9 2009

Outline

• Introduction: Context and Problem Description

• Detection of Moving Objects in the Pixel Domain

• Detection of Moving Objects in the Compressed Domain

• Metadata: Representing Moving Objects

• Conclusions





Introduction: Video Surveillance

• “Usage of a video camera to act upon crime”

• Number of cameras and surveillance systems has grown
  – 2004: 4 285 000 cameras in United Kingdom

• Operators have trouble interpreting the increasing amount of data
  – need for intelligent video surveillance systems





Introduction: Intelligent Video Surveillance System

[Diagram labels: encoding, video, analytics, storage, visualization, video + metadata]





Introduction: Video Surveillance

• Automated analysis of the video to make intelligent decisions



[Figure labels: person1, person2, “intruder alert!!!”]

Analytics
1. detection of moving objects
2. tracking
3. classification
4. identification
5. interpretation





Introduction: Moving Object Detection

• Detection of moving objects is the first step in video analytics
  – needs to be fast and accurate

• Classify each pixel in the image as foreground or background

• Current techniques
  – good for “simple” situations
  – problems with moving trees, changing lighting conditions, environmental conditions, …

• Goal
  – fast and robust detection of moving objects





Introduction: Moving Object Representation

• Analytics extracts information (e.g., moving objects) from video
  – represented using standardized formats (metadata standards)

• Large video surveillance systems contain several analytics modules
  – same information can be represented using different formats

• To retrieve relevant information (e.g., find all moving objects), a common understanding of this information is needed

• Goal
  – provide means to combine different metadata standards

[Diagram labels: analytics, information, metadata standard]





Moving Object Detection in the Pixel Domain

• Background subtraction
  – create a background model for each pixel
  – compare new images with the background model
  – large differences result in foreground objects

• Different background models have been proposed in the literature
  – previous value, average value, …

[Figure: new image - background model = result]
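As a minimal sketch of the scheme on this slide, the Python below uses the average value as the background model. The function names, the threshold of 30 gray levels, and the learning rate of 0.05 are illustrative assumptions, not values taken from the presentation.

```python
# Hypothetical sketch: background subtraction with an average-value model.
# Frames are small grayscale images stored as nested lists.

def subtract_background(background, frame, threshold=30):
    """Return a mask with 1 where the frame differs strongly from the model."""
    return [
        [1 if abs(pixel - bg) > threshold else 0
         for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def update_background(background, frame, learning_rate=0.05):
    """Blend the new frame into the running average (the background model)."""
    return [
        [(1 - learning_rate) * bg + learning_rate * pixel
         for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [[100.0, 100.0, 100.0],
              [100.0, 100.0, 100.0]]
frame = [[102, 180, 99],
         [101, 175, 100]]

mask = subtract_background(background, frame)
background = update_background(background, frame)
print(mask)  # the middle column differs strongly and is marked as foreground
```

A single model per pixel, as here, is the simple case that the next slides argue is insufficient.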





Moving Object Detection in the Pixel Domain

• Problems with background subtraction
  1. moving trees, opened or closed doors, construction works, …
    • single static model is insufficient
  2. noise, weather conditions, shadows, …
    • model needs to accommodate such situations
  3. parked car
    • need to gather information on background and foreground

• Solution: multimodal background subtraction
  1. multiple models per pixel
  2. each model contains several dynamic parameters
  3. model can represent both background and foreground

[Figure: a background model and a foreground model, each with noise statistics, previous value, and average value]



Multimodal Background Subtraction

For each new image
1. compare pixel value with the models
  • find a match with one of the models
2. adapt the parameters of the models
3. decision based on the matched model

[Figure labels: model 1, model 2, model 3; foreground model (noise statistics, previous value, average value); “pixel is background”]
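The three steps can be sketched for a single pixel as follows. This is a simplified illustration loosely in the spirit of mixture-model background subtraction, not the method of the thesis. The matching rule (within `k` standard deviations), the weight-based decision, and all parameter values are assumptions made for the example.

```python
import math


def new_models():
    """Three hypothetical models for one pixel (all numbers are made up)."""
    return [
        {"mean": 100.0, "var": 25.0, "weight": 0.6, "previous": 100},
        {"mean": 160.0, "var": 25.0, "weight": 0.3, "previous": 160},
        {"mean": 30.0, "var": 25.0, "weight": 0.1, "previous": 30},
    ]


def classify_pixel(models, value, alpha=0.1, k=2.5, bg_weight=0.3):
    # 1. compare the pixel value with the models and look for a match
    matched = None
    for model in models:
        if abs(value - model["mean"]) <= k * math.sqrt(model["var"]):
            matched = model
            break

    # 2. adapt the parameters of the models
    if matched is None:
        # nothing matches: recycle the least-supported model for the new value
        matched = min(models, key=lambda m: m["weight"])
        matched.update(mean=float(value), var=100.0, weight=0.0)
    else:
        diff = value - matched["mean"]
        matched["mean"] += alpha * diff
        matched["var"] = (1 - alpha) * matched["var"] + alpha * diff * diff
    for model in models:
        target = 1.0 if model is matched else 0.0
        model["weight"] += alpha * (target - model["weight"])
    matched["previous"] = value

    # 3. decision based on the matched model (assumed rule: enough support)
    return "background" if matched["weight"] >= bg_weight else "foreground"


models = new_models()
print(classify_pixel(models, 102))
print([classify_pixel(models, 220) for _ in range(4)])
```

In this sketch a value that keeps returning (the parked-car case) starts out as foreground and is absorbed into the background once its model has gathered enough weight.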





Multimodal Background Subtraction

• Each pixel in the image has been classified as foreground or background

• Problem of “camouflage”
  – moving objects can contain parts that resemble the environment

• Only using temporal information is not sufficient





Spatio-Temporal Multimodal Background Subtraction

• Use spatial information to improve the temporal background subtraction
  – spatial segmentation
    • edge detection
    • fill the segments





Spatio-Temporal Multimodal Background Subtraction

• Combine spatial segmentation with temporal detection
  – segments containing many foreground pixels are entirely regarded as foreground

[Figure panels: temporal, spatial, spatio-temporal]
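A small sketch of this combination rule, assuming the spatial segmentation is available as a label image and the temporal detection as a binary mask. The 50% ratio that stands in for “many foreground pixels”, and the choice to leave other segments as the temporal detector reported them, are assumptions of the example.

```python
from collections import defaultdict


def combine(segments, temporal_mask, min_ratio=0.5):
    """Mark a whole segment as foreground when enough of its pixels are."""
    total = defaultdict(int)
    foreground = defaultdict(int)
    for seg_row, mask_row in zip(segments, temporal_mask):
        for label, is_fg in zip(seg_row, mask_row):
            total[label] += 1
            foreground[label] += is_fg

    full = {label for label in total
            if foreground[label] / total[label] >= min_ratio}

    # Pixels in "full" segments become foreground; the rest keep the
    # temporal decision.
    return [
        [1 if label in full else is_fg
         for label, is_fg in zip(seg_row, mask_row)]
        for seg_row, mask_row in zip(segments, temporal_mask)
    ]


segments = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [0, 0, 2, 2]]
temporal_mask = [[0, 0, 1, 0],   # one pixel of segment 1 was missed
                 [0, 0, 1, 1],
                 [0, 1, 0, 0]]

combined = combine(segments, temporal_mask)
for row in combined:
    print(row)
```

The missed pixel in segment 1, a stand-in for a camouflaged object part, is recovered because three of the four pixels in that segment were detected.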





Evaluation: Objective Results

• Precision: How many of the detected foreground pixels are correct?

• Recall: How many of the real foreground pixels are detected?

• Apply the algorithm to a video sequence and count correct and wrong detections
  – calculate precision and recall value

• Good systems obtain high precision and recall

• Different parameter settings of an algorithm give different outputs
  – vary parameters
  – calculate precision and recall values
  – represent on a graph
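The evaluation procedure can be illustrated with a few lines of Python. Precision is computed as TP / (TP + FP) and recall as TP / (TP + FN) over pixels; the difference image, ground truth, and thresholds below are made-up toy data, and `sweep` is a hypothetical helper name.

```python
def precision_recall(detected, ground_truth):
    """Pixel-wise precision and recall of a binary detection mask."""
    tp = fp = fn = 0
    for det_row, gt_row in zip(detected, ground_truth):
        for det, gt in zip(det_row, gt_row):
            if det and gt:
                tp += 1
            elif det:
                fp += 1
            elif gt:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep(difference, ground_truth, thresholds):
    """Vary one parameter (a threshold) and collect one point per setting."""
    points = []
    for t in thresholds:
        detected = [[1 if v > t else 0 for v in row] for row in difference]
        points.append((t,) + precision_recall(detected, ground_truth))
    return points


difference = [[5, 40, 90],
              [10, 60, 80]]
ground_truth = [[0, 0, 1],
                [0, 1, 1]]

curve = sweep(difference, ground_truth, thresholds=[20, 50, 85])
for t, p, r in curve:
    print(t, round(p, 2), round(r, 2))
```

Each threshold yields one (precision, recall) point; plotting these points gives the kind of graph used to compare algorithms.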





Evaluation: Objective Results

• Compare proposed algorithm with similar techniques
  – Stauffer (2001), Shan (2006)



Evaluation: Subjective Results

• Visual examples of output of different algorithms

[Figure panels: input image, ground truth, Stauffer ’01, Shan ’06, proposed]



Evaluation: Execution Times

• Proposed system is faster than related work

• Spatial segmentation can occur in parallel with temporal detection
  – processing speed can be increased

Sequence                Stauffer ’01 (fps)   proposed (fps)   temporal (fps)   spatial (fps)
PetsD2TeC2 (384x288)    8.33                 10               29.4             18.2
Indoor (340x240)        9.5                  15.4             45.5             30
Ismail (320x240)        9.7                  14.9             71.4             29.4
ThirdView (720x576)     1.1                  2.3              3.6              7.7
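A rough back-of-the-envelope check of the parallelism remark, under an assumption the slide does not state: that the temporal and spatial stages are the two main costs per frame. If they run back to back, their per-frame times add. If they run fully in parallel, the slower stage bounds the rate. The function names are hypothetical.

```python
# (temporal fps, spatial fps) per sequence, copied from the table above
stage_rates = {
    "PetsD2TeC2": (29.4, 18.2),
    "Indoor": (45.5, 30.0),
    "Ismail": (71.4, 29.4),
    "ThirdView": (3.6, 7.7),
}


def sequential_fps(temporal, spatial):
    """Frame rate when the per-frame times of both stages add up."""
    return 1.0 / (1.0 / temporal + 1.0 / spatial)


def parallel_fps(temporal, spatial):
    """Idealized upper bound when both stages run at the same time."""
    return min(temporal, spatial)


for name, (temporal, spatial) in stage_rates.items():
    print(name,
          round(sequential_fps(temporal, spatial), 1),
          round(parallel_fps(temporal, spatial), 1))
```

For PetsD2TeC2 this gives roughly 11.2 fps back to back, against the measured 10 fps in the “proposed” column, and an idealized parallel bound of 18.2 fps.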





Moving Object Detection in the Compressed Domain

• Video is encoded to reduce network traffic and storage cost

• Video coding exploits redundancy in video
  – neighboring pixels often have similar values
  – successive images are closely related

• Before video analytics can be applied a decoding step is needed

• Apply analytics directly on the compressed bit stream



H.264/AVC

• Block-based video coding (standardized 2003)
  – frame divided into macroblocks (MBs) of 16x16 pixels
  – MBs are predicted based on previously encoded data
  – difference between prediction and MB is further encoded
    • motion vector is stored in the bit stream to point to the prediction

• Current object detection techniques are based on motion vectors
  – motion vectors are created to compress, not to represent the real motion
  – processing/filtering needed to deal with noisy motion vectors

• Search for new approach

[Figure: motion vectors]





Observations

• Size of a MB (number of bits used within the compressed bit stream) changes over several consecutive images
  – MBs corresponding to background use few bits (frame 0 to 90)
  – if a moving object passes, the size of the MB rises (frame 90 to 120)



MB-based Background Subtraction

• Background model for each MB
  – training period
  – determine maximum size

• Threshold T

• Compare MB sizes with maximum + T
  – MBs with large sizes are considered foreground

[Figure label: T]
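A minimal sketch of this rule, assuming the size in bits of every MB has already been extracted from the bit stream (the extraction itself is not shown). The sizes, the value of T, and the function names are illustrative only.

```python
def train_max_sizes(training_frames):
    """Background model: per-MB maximum size (in bits) over the training period."""
    mb_count = len(training_frames[0])
    return [max(frame[i] for frame in training_frames) for i in range(mb_count)]


def detect_foreground(mb_sizes, max_sizes, T):
    """An MB is foreground when its size exceeds the trained maximum plus T."""
    return [size > max_size + T for size, max_size in zip(mb_sizes, max_sizes)]


# Each inner list holds the sizes of four MBs in one frame (made-up numbers).
training_frames = [
    [12, 10, 15, 11],
    [14, 9, 13, 12],
    [11, 10, 16, 10],
]
max_sizes = train_max_sizes(training_frames)

new_frame = [13, 95, 140, 30]
print(max_sizes)
print(detect_foreground(new_frame, max_sizes, T=20))
```

Only comparisons on already-parsed sizes are needed, which fits the slide's point that analytics can run directly on the compressed bit stream.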



(sub)MB-based Background Subtraction

• MBs can be coarse (16x16 pixels)

• H.264/AVC divides MBs into subMBs (4x4 pixels)

• Refine the MB output to subMB level
  – only regard foreground MBs at the boundaries of a moving object
  – analyze the size (in bits) of the subMBs in these boundary MBs
  – small subMBs are regarded as background
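The refinement can be sketched as below. Several details are assumptions made for the example: a boundary MB is taken to be a foreground MB with at least one background MB among its four neighbours, “small” means fewer than `min_bits` bits, and the subMB sizes are given as a 4x4 grid per MB.

```python
def is_boundary(mb_mask, r, c):
    """Foreground MB with at least one background MB next to it."""
    rows, cols = len(mb_mask), len(mb_mask[0])
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and not mb_mask[nr][nc]:
            return True
    return False


def refine(mb_mask, submb_bits, min_bits):
    """Expand an MB-level mask to subMB level, trimming boundary MBs."""
    rows, cols = len(mb_mask), len(mb_mask[0])
    fine = [[0] * (cols * 4) for _ in range(rows * 4)]
    for r in range(rows):
        for c in range(cols):
            if not mb_mask[r][c]:
                continue
            boundary = is_boundary(mb_mask, r, c)
            for i in range(4):
                for j in range(4):
                    # interior MBs stay fully foreground; boundary MBs keep
                    # only the subMBs that use enough bits
                    keep = (not boundary) or submb_bits[r][c][i][j] >= min_bits
                    fine[r * 4 + i][c * 4 + j] = 1 if keep else 0
    return fine


mb_mask = [[0, 1]]  # one background MB next to one foreground MB
submb_bits = [[None, [[2, 3, 40, 55] for _ in range(4)]]]  # made-up sizes

for row in refine(mb_mask, submb_bits, min_bits=10):
    print(row)
```

In the toy input the left half of the foreground MB uses very few bits, so it is handed back to the background and the object outline becomes finer than the 16x16 grid.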





Evaluation: Objective comparison

• Precision: How many of the detected foreground pixels are correct?

• Recall: How many of the real foreground pixels are detected?

• Comparison with Zeng (2005) (based on motion vectors)





Evaluation: Execution Times

• Very high execution speeds
  – up to 20x faster than the related work

Sequence                Zeng ’05 (fps)   proposed (fps)
Etri od A (352x240)     28               662
PetsD2TeC2 (384x288)    22               448
Indoor (340x240)        31               751





Evaluation: Subjective Results

• Demonstration



Metadata: Representing Moving Objects

• Metadata is “data about data”
  – data about detected object: size, color, bounding box, …

• Metadata standard
  – common agreement on the format of the metadata

• Several metadata standards exist for video surveillance
  – modules can use different standards
  – same information can be represented in different formats

[Diagram labels: analytics1, metadata, metadata standard A; analytics2, metadata, metadata standard B]






• Metadata standards
  – XML (eXtensible Markup Language)
    • describes terms and structure of metadata
  – specification
    • textual description of the semantics of the XML elements

metadata example 1: CVML (Computer Vision Markup Language)

<object id="0"> <box xc="77" yc="73" w="21" h="16"/></object>

Box: “Coordinates of the centre and the dimensions of the bounding box of a detected object in pixels.”

metadata example 2: VS7 (Video Surveillance Schema)

<LLID ="LLID1"><Mask> <BB mp7:dim="4">67 65 88 91</BB></Mask> </LLID>

BB: “Coordinates of a rectangular segment.”
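To make the interoperability problem concrete, the sketch below reads both fragments with Python's standard `xml.etree.ElementTree` and converts them to one common (left, top, width, height) box. Several things are assumptions: the VS7 fragment is given an `id` attribute name and an `mp7` namespace declaration so that it parses, and the four BB numbers are read as x1 y1 x2 y2, which the quoted specification text does not spell out.

```python
import xml.etree.ElementTree as ET

CVML = '<object id="0"><box xc="77" yc="73" w="21" h="16"/></object>'

# The attribute name "id" and the namespace declaration are added here only
# to obtain well-formed XML; they are not taken from the slide.
VS7 = ('<LLID id="LLID1" xmlns:mp7="urn:mpeg:mpeg7:schema:2001">'
       '<Mask><BB mp7:dim="4">67 65 88 91</BB></Mask></LLID>')


def box_from_cvml(xml_text):
    """CVML stores the centre and the dimensions of the bounding box."""
    box = ET.fromstring(xml_text).find("box")
    w, h = float(box.get("w")), float(box.get("h"))
    left = float(box.get("xc")) - w / 2
    top = float(box.get("yc")) - h / 2
    return (left, top, w, h)


def box_from_vs7(xml_text):
    """Assumed reading of BB: two corner points x1 y1 x2 y2."""
    bb = ET.fromstring(xml_text).find("Mask/BB")
    x1, y1, x2, y2 = (float(v) for v in bb.text.split())
    return (x1, y1, x2 - x1, y2 - y1)


print(box_from_cvml(CVML))
print(box_from_vs7(VS7))
```

Each format needs its own hand-written reader and its own interpretation of the specification text, which is the kind of per-standard software the following slides try to avoid.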






• Proposal: use Semantic Web Technologies
  – make information on the internet accessible for machines
  – information in a domain is structured using an ontology
    • a data model that represents a set of concepts and relations amongst these concepts within a specific domain

• OWL (Web Ontology Language)
  – W3C Recommendation (2004)
  – standardized language for the description of an ontology
    • classes, properties and relations
    • individuals or instances
  – can be queried through standardized languages






• Example: ontology for domain of science

[Diagram: Class Person with DatatypeProperty “birth date”; Class Scientist is subClassOf Person; Individual “Joseph Plateau” is a Scientist with birth date “14/10/1801”]

OWL constructs
• Class
• DatatypeProperty
• subClassOf
• Individual
• …
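The example can be mimicked with a handful of plain Python tuples. This is a toy stand-in for an ontology, not OWL syntax and not a real reasoner. The identifiers `JosephPlateau`, `birthDate`, and `type` are names chosen for the sketch.

```python
# (subject, predicate, object) statements mirroring the diagram
triples = {
    ("Scientist", "subClassOf", "Person"),
    ("JosephPlateau", "type", "Scientist"),
    ("JosephPlateau", "birthDate", "14/10/1801"),
}


def superclasses(cls, triples):
    """All classes reachable from cls through subClassOf, including cls."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def instances_of(cls, triples):
    """Individuals whose declared class is cls or one of its subclasses."""
    return sorted(s for s, p, o in triples
                  if p == "type" and cls in superclasses(o, triples))


print(instances_of("Person", triples))
```

Asking for every Person returns the individual even though it was only declared a Scientist, which is the kind of inference that subClassOf makes possible.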



• Create OWL ontologies for the metadata standards used in video surveillance
  – CVML, VS7, MPEG-7, …

• Mappings link the different ontologies
  – use OWL constructs to link classes
  – denote that classes in the different ontologies can be the same

• Information in different formats is linked
  – however, metadata can be very technical or general

[Diagram: OWL ontology CVML, OWL ontology VS7, OWL ontology MPEG-7, …]



• One global ontology with general concepts for video surveillance

• Link with metadata ontologies through mappings

• Layered metadata model

• Only need to know the upper ontology to retrieve information (e.g., retrieve all images with moving objects)

[Diagram: upper layer: OWL ontology Video Surveillance; lower layer: OWL ontology CVML, OWL ontology VS7, OWL ontology MPEG-7, …]
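A toy illustration of the layered idea in plain Python: lower-layer classes are mapped to one upper-layer concept, and a query is phrased only in terms of that concept. All class names (`cvml:object`, `vs7:LLID`, `vs7:Camera`, `vs:MovingObject`), instance identifiers, and frame numbers are hypothetical, and a dictionary lookup stands in for OWL mappings and a standardized query language.

```python
# Mappings from lower-layer (format-specific) classes to the upper ontology
MAPPINGS = {
    "cvml:object": "vs:MovingObject",
    "vs7:LLID": "vs:MovingObject",
}

# Instances as they might come out of two analytics modules (made-up data)
INSTANCES = [
    {"id": "cvml-0", "type": "cvml:object", "frame": 12},
    {"id": "vs7-LLID1", "type": "vs7:LLID", "frame": 40},
    {"id": "vs7-cam", "type": "vs7:Camera", "frame": None},
]


def query(upper_class):
    """Return every instance whose class maps to the given upper concept."""
    return [inst for inst in INSTANCES
            if MAPPINGS.get(inst["type"]) == upper_class]


frames = sorted(inst["frame"] for inst in query("vs:MovingObject"))
print(frames)
```

The caller never mentions CVML or VS7; adding another metadata standard would mean adding mappings, not changing the query.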





Evaluation: Practical Use Case Scenario

• Scenario
  – “operator wants to retrieve images that contain moving objects”
  – analytics module 1 detects objects in CVML (XML)
  – analytics module 2 detects objects in VS7 (XML)

• Proposed
  – XML fragments are automatically converted to OWL instances
  – through the mappings these instances are linked to each other and to the Video Surveillance Ontology
  – operator can use standardized languages to query the Video Surveillance Ontology

• Related work
  – specific software written to interpret CVML and VS7
  – specific software written to “translate” the operator’s request to the corresponding XML elements





Conclusions

• Algorithm for the detection of moving objects in the pixel domain
  – multimodal background subtraction technique
  – combines spatial and temporal information
  – evaluated by comparison with related work
    • more robust detection
    • faster execution speeds

• Algorithm for the detection of moving objects in the compressed domain
  – novel approach that disregards motion vectors
  – macroblock-based background subtraction
  – evaluated by comparison with related work
    • better detection results (very high precision)
    • up to 20 times faster than the related work





Conclusions

• Metadata for the representation of moving objects
  – discussed problems of the usage of different XML-based metadata standards
  – introduction of Semantic Web Technologies
  – layered metadata model
    • upper Video Surveillance Ontology
    • lower layer with pool of metadata ontologies
    • links defined using mappings
  – evaluation based on practical use case scenario





Publications

• First author of 3 publications recorded in SCI (A1)
  – Robust Spatio-Temporal Multimodal Background Subtraction for Video Surveillance
    Optical Engineering
  – Moving Object Detection in the H.264/AVC Compressed Domain for Video Surveillance Applications
    Journal of Visual Communication & Image Representation
  – Personal Content Management System, a Semantic Approach
    Journal of Visual Communication & Image Representation

• Co-author of 1 publication recorded in SCI (A1)

• 17 articles at international conferences

• 5 standardization contributions
