2009.06.09 chris poppe - public phd defense
DESCRIPTION
Chris Poppe's public PhD defense entitled: "Detection and Representation of Moving Objects for Video Surveillance", 9th of June, 2009.
TRANSCRIPT
ELIS – Multimedia Lab
Detectie en representatie van bewegende objecten voor videobewaking
Detection and Representation of Moving Objects for Video Surveillance
Chris Poppe
Multimedia Lab, Department of Electronics and Information Systems
Faculty of Engineering, Ghent University
Supervisor: prof. dr. ir. Rik Van de Walle
Detection and Representation of Moving Objects for Video Surveillance Chris Poppe
Ghent, Belgium – June 9 2009
Outline
• Introduction: Context and Problem Description
• Detection of Moving Objects in the Pixel Domain
• Detection of Moving Objects in the Compressed Domain
• Metadata: Representing Moving Objects
• Conclusions
Introduction: Video Surveillance
• “Usage of a video camera to act upon crime”
• Number of cameras and surveillance systems has grown
  – 2004: 4 285 000 cameras in United Kingdom
• Operators have difficulty interpreting the increasing amount of data
  – need for intelligent video surveillance systems
Introduction: Intelligent Video Surveillance System
[Figure: system diagram with the labels encoding, video, analytics, storage, visualization, video + metadata]
Introduction: Video Surveillance
• Automated analysis of the video to make intelligent decisions
[Figure: analytics module labelling person1 and person2 and raising “intruder alert!!!”]
1. detection of moving objects
2. tracking
3. classification
4. identification
5. interpretation
Introduction: Moving Object Detection
• Detection of moving objects is the first step in video analytics
  – needs to be fast and accurate
• Classify each pixel in the image as foreground or background
• Current techniques
  – good for “simple” situations
  – problems with moving trees, changing lighting conditions, environmental conditions, …
• Goal
  – fast and robust detection of moving objects
Introduction: Moving Object Representation
• Analytics extracts information (e.g., moving objects) from video
  – represented using standardized formats (metadata standards)
• Large video surveillance systems contain several analytics modules
  – same information can be represented using different formats
• To retrieve relevant information (e.g., find all moving objects) a common understanding of this information is needed
• Goal
  – provide means to combine different metadata standards
[Figure: analytics produces information, represented in a metadata standard]
Moving Object Detection in the Pixel Domain
• Background subtraction
  – create a background model for each pixel
  – compare new images with the background model
  – large differences result in foreground objects
• Different background models have been proposed in the literature
  – previous value, average value, …
[Figure: background model, new image, and the result of the subtraction]
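The basic idea can be sketched in a few lines of Python. This is only an illustration of single-model background subtraction with an "average value" model, not the algorithm presented in the thesis; the function names, the blending factor `alpha`, and the `threshold` are assumptions chosen for the example.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average model: blend each new pixel value into the background."""
    return [
        [(1 - alpha) * b + alpha * p for b, p in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def subtract_background(background, frame, threshold=30):
    """Mark a pixel as foreground (1) when it differs strongly from the model."""
    return [
        [1 if abs(p - b) > threshold else 0 for b, p in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


# Tiny grayscale example: a bright object enters the top-right corner.
background = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
frame = [[11, 9, 200], [10, 12, 10]]

mask = subtract_background(background, frame)      # compare with the model
background = update_background(background, frame)  # then adapt the model
```

Only the pixel with value 200 exceeds the threshold, so it is the only one marked as foreground; the small fluctuations elsewhere are absorbed by the model.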
Moving Object Detection in the Pixel Domain
• Problems with background subtraction
  1. moving trees, opened or closed doors, construction works, …
    • single static model is insufficient
  2. noise, weather conditions, shadows, …
    • model needs to accommodate such situations
  3. parked car
    • need to gather information on background and foreground
• Solution: multimodal background subtraction
  1. multiple models per pixel
  2. each model contains several dynamic parameters
  3. model can represent both background and foreground
[Figure: per-pixel models (two background models and one foreground model), each holding noise statistics, previous value, and average value]
Multimodal Background Subtraction
For each new image
1. compare pixel value with the models
  • find a match with one of the models
2. adapt the parameters of the models
3. decision based on the matched model
[Figure: model 1, model 2, and model 3 for one pixel (two background models and one foreground model, each holding noise statistics, previous value, and average value); outcome in the example: pixel is background]
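The three steps can be sketched for a single pixel as follows. This is a simplified illustration and not the thesis implementation: the `weight` field, the matching rule (within 2.5 times the noise estimate), and the values of `alpha`, `bg_weight`, and `max_models` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PixelModel:
    average: float   # running average of the values that matched this model
    previous: float  # last value that matched this model
    noise: float     # tolerated deviation, a stand-in for the noise statistics
    weight: float    # how often the model matched recently (assumed parameter)


def classify_pixel(models, value, alpha=0.1, bg_weight=0.3, max_models=3):
    """Process one new pixel value; returns True when it is foreground."""
    # 1. compare the pixel value with the models and look for a match
    matched = next(
        (m for m in models if abs(value - m.average) <= 2.5 * m.noise), None
    )
    if matched is None:
        # No model explains the value: start a new one, dropping the weakest
        # model when the per-pixel limit is reached.
        if len(models) >= max_models:
            models.remove(min(models, key=lambda m: m.weight))
        matched = PixelModel(average=value, previous=value, noise=10.0, weight=0.0)
        models.append(matched)

    # 2. adapt the parameters of the models
    for m in models:
        m.weight = (1 - alpha) * m.weight + (alpha if m is matched else 0.0)
    deviation = abs(value - matched.average)
    matched.noise = max(1.0, (1 - alpha) * matched.noise + alpha * deviation)
    matched.average = (1 - alpha) * matched.average + alpha * value
    matched.previous = value

    # 3. decide based on the matched model: rarely matched models are foreground
    return matched.weight < bg_weight


models = [PixelModel(average=10.0, previous=10.0, noise=2.0, weight=1.0)]
labels = [classify_pixel(models, v) for v in (11, 200, 200, 200, 200)]
```

In this run the value 200 first creates a new, low-weight model and is reported as foreground; after it has been observed a few times its weight passes `bg_weight` and it is treated as background, which mimics the "parked car" case.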
Multimodal Background Subtraction
• Each pixel in the image has been classified as foreground or background
• Problem of “camouflage”
  – moving objects can contain parts that resemble the environment
• Only using temporal information is not sufficient
Spatio-Temporal Multimodal Background Subtraction
• Use spatial information to improve the temporal background subtraction
  – spatial segmentation
    • edge detection
    • fill the segments
Spatio-Temporal Multimodal Background Subtraction
• Combine spatial segmentation with temporal detection
  – segments containing many foreground pixels are entirely regarded as foreground
[Figure: spatial, temporal, and spatio-temporal results]
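A minimal sketch of this combination step is given below. It assumes the temporal detection is available as a binary mask and the spatial segmentation as a map of segment labels; the function name and the `min_fraction` cut-off for "many foreground pixels" are assumptions, since the slide does not state a value.

```python
def refine_with_segments(foreground, segments, min_fraction=0.5):
    """Regard a whole segment as foreground when enough of its pixels are."""
    totals, hits = {}, {}
    for fg_row, seg_row in zip(foreground, segments):
        for fg, seg in zip(fg_row, seg_row):
            totals[seg] = totals.get(seg, 0) + 1
            hits[seg] = hits.get(seg, 0) + fg
    filled = {seg for seg in totals if hits[seg] / totals[seg] >= min_fraction}
    # Pixels of filled segments become foreground; other pixels keep their label.
    return [
        [1 if seg in filled else fg for fg, seg in zip(fg_row, seg_row)]
        for fg_row, seg_row in zip(foreground, segments)
    ]


# Temporal mask with a "camouflage" hole at row 1, column 2.
foreground = [[0, 1, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]]
# Spatial segmentation: each number is a segment label.
segments = [[0, 1, 1, 2],
            [0, 1, 1, 2],
            [0, 0, 2, 2]]

refined = refine_with_segments(foreground, segments)
```

Segment 1 has three of its four pixels detected, so the missed pixel is filled in, while the sparsely covered segments are left unchanged.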
Evaluation: Objective Results
• Precision: How many of the detected foreground pixels are correct?
• Recall: How many of the real foreground pixels are detected?
• Apply algorithm on video sequence and count correct and wrong detections
  – calculate precision and recall value
• Good systems obtain high precision and recall
• Different parameter settings of an algorithm give different outputs
  – vary parameters
  – calculate precision and recall values
  – represent on a graph
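The evaluation procedure can be illustrated with a small sketch. The pixel counts follow the usual definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)); the toy difference image, the ground truth, and the swept `threshold` parameter are assumptions made for the example.

```python
def precision_recall(detected, truth):
    """Count correct and wrong detections and derive precision and recall."""
    tp = fp = fn = 0
    for det_row, gt_row in zip(detected, truth):
        for det, gt in zip(det_row, gt_row):
            if det and gt:
                tp += 1  # detected foreground that is really foreground
            elif det:
                fp += 1  # detected foreground that is really background
            elif gt:
                fn += 1  # real foreground that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: per-pixel difference with the background, and the ground truth.
difference = [[5, 40, 80], [0, 35, 10]]
truth = [[0, 1, 1], [0, 0, 0]]

# Vary one parameter and collect one (parameter, precision, recall) point each.
curve = []
for threshold in (20, 50):
    detected = [[1 if d > threshold else 0 for d in row] for row in difference]
    curve.append((threshold, *precision_recall(detected, truth)))
```

A low threshold finds every real foreground pixel but also a false one; a high threshold makes no false detections but misses a real pixel. Plotting such points gives the precision-recall graph used for the comparison.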
Evaluation: Objective Results
• Compare proposed algorithm with similar techniques
  – Stauffer (2001), Shan (2006)
Evaluation: Subjective Results
• Visual examples of output of different algorithms
[Figure: input image, ground truth, Stauffer ’01, Shan ’06, proposed]
Evaluation: Execution Times
• Proposed system is faster than related work
• Spatial segmentation can occur in parallel with temporal detection
  – processing speed can be increased

Sequence               Stauffer ’01 (fps)   proposed (fps)   temporal (fps)   spatial (fps)
PetsD2TeC2 (384x288)   8.33                 10               29.4             18.2
Indoor (340x240)       9.5                  15.4             45.5             30
Ismail (320x240)       9.7                  14.9             71.4             29.4
ThirdView (720x576)    1.1                  2.3              3.6              7.7
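As a back-of-the-envelope illustration of why running the two stages in parallel helps, the sketch below compares two idealized estimates built from the temporal and spatial frame rates in the table. These are rough bounds under stated assumptions (no cost for combining the two outputs, perfect overlap of the stages), not measurements from the thesis, and the function names are hypothetical.

```python
def sequential_fps(temporal_fps, spatial_fps):
    """Both stages run one after the other: the per-frame times add up."""
    return 1.0 / (1.0 / temporal_fps + 1.0 / spatial_fps)


def parallel_fps(temporal_fps, spatial_fps):
    """Both stages run side by side: the slower stage limits the frame rate."""
    return min(temporal_fps, spatial_fps)


# (temporal fps, spatial fps) taken from the execution-time table.
stage_rates = {
    "PetsD2TeC2": (29.4, 18.2),
    "Indoor": (45.5, 30.0),
    "Ismail": (71.4, 29.4),
    "ThirdView": (3.6, 7.7),
}

estimates = {
    name: (sequential_fps(t, s), parallel_fps(t, s))
    for name, (t, s) in stage_rates.items()
}
```

For every sequence the parallel estimate is higher than the sequential one, which is the sense in which the processing speed can be increased; the real gain depends on the cost of the combination step and on the hardware.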
Moving Object Detection in the Compressed Domain
• Video is encoded to reduce network traffic and storage cost
• Video coding exploits redundancy in video
  – neighboring pixels often have similar values
  – successive images are closely related
• Before video analytics can be applied a decoding step is needed
• Apply analytics directly on the compressed bit stream
H.264/AVC
• Block-based video coding (standardized 2003)
  – frame divided into macroblocks (MBs) of 16x16 pixels
  – MBs are predicted based on previously encoded data
  – difference between prediction and MB is further encoded
    • motion vector is stored in the bit stream to point to the prediction
• Current object detection techniques are based on motion vectors
  – motion vectors are created to compress, not to represent the real motion
  – processing/filtering needed to deal with noisy motion vectors
• Search for new approach
[Figure: motion vectors]
Observations
• Size of a MB (number of bits used within the compressed bit stream) changes over several consecutive images
  – MBs corresponding to background use few bits (frame 0 to 90)
  – if a moving object passes, the size of the MB rises (frame 90 to 120)
MB-based Background Subtraction
• Background model for each MB
  – training period
  – determine maximum size
• Threshold T
• Compare MB sizes with maximum + T
  – MBs with large sizes are considered foreground
[Figure: threshold T]
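The steps above can be sketched as follows, assuming the size in bits of every MB has already been extracted from the bit stream by a parser that is not shown here. The function names and the example numbers are assumptions for illustration only.

```python
def train_max_sizes(training_frames):
    """Background model: the maximum size (in bits) seen per MB during training."""
    max_sizes = [row[:] for row in training_frames[0]]
    for frame in training_frames[1:]:
        for r, row in enumerate(frame):
            for c, bits in enumerate(row):
                max_sizes[r][c] = max(max_sizes[r][c], bits)
    return max_sizes


def detect_foreground_mbs(mb_sizes, max_sizes, threshold):
    """An MB is foreground (1) when its size exceeds the trained maximum + T."""
    return [
        [1 if bits > limit + threshold else 0 for bits, limit in zip(size_row, max_row)]
        for size_row, max_row in zip(mb_sizes, max_sizes)
    ]


# MB sizes for a 2x2-MB frame during a two-frame training period (background only).
training = [
    [[12, 20], [8, 15]],
    [[10, 25], [9, 14]],
]
max_sizes = train_max_sizes(training)

# New frame: a moving object makes the MBs in the right column much larger.
mb_mask = detect_foreground_mbs([[14, 300], [9, 60]], max_sizes, threshold=40)
```

Because only one comparison per MB is needed and no pixels are decoded, this kind of test is very cheap compared with pixel-domain processing.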
(sub)MB-based Background Subtraction
• MBs can be coarse (16x16 pixels)
• H.264/AVC divides MBs into subMBs (4x4 pixels)
• Refine the MB output to subMB level
  – only regard foreground MBs at the boundaries of moving object
  – analyze the size (in bits) of the subMBs in these boundary MBs
  – small subMBs are regarded as background
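A sketch of this refinement is shown below. Several details are assumptions made for the example and are not stated on the slide: a boundary MB is taken to be a foreground MB with at least one background neighbour (4-neighbourhood, inside the frame), `min_bits` is a hypothetical cut-off for "small" subMBs, and the subMB sizes are assumed to be available as a 4x4 grid per MB.

```python
SUB = 4  # a 16x16 MB holds a 4x4 grid of 4x4-pixel subMBs


def is_boundary(mb_mask, r, c):
    """Assumed rule: a foreground MB with a background 4-neighbour."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < len(mb_mask) and 0 <= cc < len(mb_mask[0])
        if inside and not mb_mask[rr][cc]:
            return True
    return False


def refine_to_submbs(mb_mask, submb_sizes, min_bits):
    """Return a subMB-level mask; submb_sizes[r][c] is a 4x4 grid of bit counts."""
    rows, cols = len(mb_mask), len(mb_mask[0])
    out = [[0] * (cols * SUB) for _ in range(rows * SUB)]
    for r in range(rows):
        for c in range(cols):
            if not mb_mask[r][c]:
                continue  # background MBs stay background
            boundary = is_boundary(mb_mask, r, c)
            for i in range(SUB):
                for j in range(SUB):
                    # Only boundary MBs are refined; small subMBs become background.
                    small = boundary and submb_sizes[r][c][i][j] < min_bits
                    out[r * SUB + i][c * SUB + j] = 0 if small else 1
    return out


# One background MB next to one foreground MB whose left subMB column is small.
mb_mask = [[0, 1]]
submb_sizes = [[
    [[0] * SUB for _ in range(SUB)],
    [[1, 30, 30, 30] for _ in range(SUB)],
]]
refined = refine_to_submbs(mb_mask, submb_sizes, min_bits=10)
```

The left column of the foreground MB uses very few bits and is therefore dropped, which tightens the object outline from 16-pixel to 4-pixel granularity.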
Evaluation: Objective comparison
• Precision: How many of the detected foreground pixels are correct?
• Recall: How many of the real foreground pixels are detected?
• Comparison with Zeng (2005) (based on motion vectors)
Evaluation: Execution Times
• Very high execution speeds
  – up to 20x faster than the related work

Sequence               Zeng ’05 (fps)   proposed (fps)
Etri od A (352x240)    28               662
PetsD2TeC2 (384x288)   22               448
Indoor (340x240)       31               751
Evaluation: Subjective Results
• Demonstration
Metadata: Representing Moving Objects
• Metadata is “data about data”
  – data about detected object: size, color, bounding box, …
• Metadata standard
  – common agreement on the format of the metadata
• Several metadata standards exist for video surveillance
  – modules can use different standards
  – same information can be represented in different formats
[Figure: analytics1 produces metadata in metadata standard A; analytics2 produces metadata in metadata standard B]
Metadata: Representing Moving Objects
• Metadata standards
  – XML (eXtensible Markup Language)
    • describes terms and structure of metadata
  – specification
    • textual description of the semantics of the XML elements

Metadata example 1: CVML (Computer Vision Markup Language)
<object id="0"> <box xc="77" yc="73" w="21" h="16"/> </object>
Box: “Coordinates of the centre and the dimensions of the bounding box of a detected object in pixels.”

Metadata example 2: VS7 (Video Surveillance Schema)
<LLID ="LLID1"> <Mask> <BB mp7:dim="4">67 65 88 91</BB> </Mask> </LLID>
BB: “Coordinates of a rectangular segment.”
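To make the difference between the two formats concrete, the sketch below reads a bounding box from each fragment with format-specific code. Several points are assumptions: the VS7 fragment on the slide lacks an attribute name and a declaration for the `mp7` prefix, so a hypothetical `id` attribute and a namespace URI are added here only to obtain well-formed XML, and the four VS7 values are assumed to be left, top, right, bottom, which the quoted description does not specify.

```python
import xml.etree.ElementTree as ET

CVML = '<object id="0"><box xc="77" yc="73" w="21" h="16"/></object>'

# Assumption: "id" and the xmlns:mp7 declaration are not on the slide; they are
# added so that the fragment is well-formed and can be parsed.
VS7 = (
    '<LLID xmlns:mp7="urn:mpeg:mpeg7:schema:2001" id="LLID1">'
    '<Mask><BB mp7:dim="4">67 65 88 91</BB></Mask></LLID>'
)


def box_from_cvml(xml_text):
    """CVML stores the centre and the dimensions; convert to corner coordinates."""
    box = ET.fromstring(xml_text).find("box")
    xc, yc = float(box.get("xc")), float(box.get("yc"))
    w, h = float(box.get("w")), float(box.get("h"))
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


def box_from_vs7(xml_text):
    """VS7 stores four numbers; assumed here to be left, top, right, bottom."""
    values = ET.fromstring(xml_text).find("Mask/BB").text.split()
    return tuple(float(v) for v in values)


cvml_box = box_from_cvml(CVML)
vs7_box = box_from_vs7(VS7)
```

Each format needs its own parsing function and its own reading of the specification text, which is the interoperability problem the following slides address.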
Metadata: Representing Moving Objects
• Proposal: use Semantic Web Technologies
  – make information on the internet accessible for machines
  – information in a domain is structured using an ontology
    • a data model that represents a set of concepts and relations amongst these concepts within a specific domain
• OWL (Web Ontology Language)
  – W3C Recommendation (2004)
  – standardized language for the description of an ontology
    • classes, properties and relations
    • individuals or instances
  – can be queried through standardized languages
Metadata: Representing Moving Objects
• Example: ontology for domain of science
[Figure: Class Scientist is a subClassOf Class Person; DatatypeProperty “birth date”; Individual “Joseph Plateau” with birth date “14/10/1801”]
OWL constructs
• Class
• DatatypeProperty
• subClassOf
• Individual
• …
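The example ontology can be mimicked with plain subject-predicate-object triples. This is a toy stand-in written in ordinary Python, not OWL tooling; the `type` predicate and the reading that “Joseph Plateau” is an individual of Scientist are assumptions based on the figure.

```python
triples = {
    ("Scientist", "subClassOf", "Person"),
    ("Joseph Plateau", "type", "Scientist"),          # assumed from the figure
    ("Joseph Plateau", "birth date", "14/10/1801"),   # DatatypeProperty value
}


def superclasses(cls, triples):
    """All classes reachable from cls by following subClassOf links."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def instances_of(cls, triples):
    """Individuals typed with cls directly or through one of its subclasses."""
    return {
        s
        for s, p, o in triples
        if p == "type" and (o == cls or cls in superclasses(o, triples))
    }


people = instances_of("Person", triples)
```

Although the individual is only declared as a Scientist, the subClassOf link lets a query for Person find it as well; this kind of inference is what the mappings between ontologies rely on.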
Metadata: Representing Moving Objects
• Create OWL ontologies for the metadata standards used in video surveillance
  – CVML, VS7, MPEG-7, …
• Mappings link the different ontologies
  – use OWL constructs to link classes
  – denote that classes in the different ontologies can be the same
• Information in different formats is linked
  – however, metadata can be very technical or general
[Figure: OWL ontologies for CVML, VS7, MPEG-7, …]
Metadata: Representing Moving Objects
• One global ontology with general concepts for video surveillance
• Link with metadata ontologies through mappings
• Layered metadata model
• Only need to know the upper ontology to retrieve information (e.g., retrieve all images with moving objects)
[Figure: upper layer: OWL ontology Video Surveillance; lower layer: OWL ontologies CVML, VS7, MPEG-7, …]
Evaluation: Practical Use Case Scenario
• Scenario
  – “operator wants to retrieve images that contain moving objects”
  – analytics module 1 detects objects in CVML (XML)
  – analytics module 2 detects objects in VS7 (XML)
• Proposed
  – XML fragments are automatically converted to OWL instances
  – through the mappings these instances are linked to each other and to the Video Surveillance Ontology
  – operator can use standardized languages to query the Video Surveillance Ontology
• Related work
  – specific software written to interpret CVML and VS7
  – specific software written to “translate” the operator’s request to the corresponding XML elements
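The proposed workflow can be illustrated with a small sketch in which plain Python data stands in for OWL instances, mappings, and a standardized query language. All class names (`cvml:Object`, `vs7:LLID`, `vs7:Camera`, `vs:MovingObject`), instance identifiers, and image names are hypothetical and chosen only for the example.

```python
# Lower layer: instances as they might look after the XML-to-OWL conversion.
instances = [
    {"id": "cvml-object-0", "type": "cvml:Object", "image": "frame_0012"},
    {"id": "vs7-LLID1", "type": "vs7:LLID", "image": "frame_0047"},
    {"id": "vs7-camera-1", "type": "vs7:Camera", "image": "frame_0050"},
]

# Mappings: lower-layer classes linked to a class of the upper ontology.
mappings = {
    "cvml:Object": "vs:MovingObject",
    "vs7:LLID": "vs:MovingObject",
}


def images_with(upper_class, instances, mappings):
    """Answer a query phrased only in terms of the upper ontology."""
    return sorted(
        {inst["image"] for inst in instances if mappings.get(inst["type"]) == upper_class}
    )


result = images_with("vs:MovingObject", instances, mappings)
```

The operator's request names only the upper-layer concept; the mappings decide which CVML and VS7 instances qualify, so no format-specific translation of the request is needed.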
Conclusions
• Algorithm for the detection of moving objects in pixel domain
  – multimodal background subtraction technique
  – combines spatial and temporal information
  – evaluated by comparison with related work
    • more robust detection
    • faster execution speeds
• Algorithm for detection of moving objects in the compressed domain
  – novel approach that disregards motion vectors
  – macroblock-based background subtraction
  – evaluated by comparison with related work
    • better detection results (very high precision)
    • up to 20 times faster than the related work
Conclusions
• Metadata for the representation of moving objects
  – discussed problems of the usage of different XML-based metadata standards
  – introduction of Semantic Web Technologies
  – layered metadata model
    • upper Video Surveillance Ontology
    • lower layer with pool of metadata ontologies
    • links defined using mappings
  – evaluation based on practical use case scenario
Publications
• First author of 3 publications recorded in SCI (A1)
  – Robust Spatio-Temporal Multimodal Background Subtraction for Video Surveillance, Optical Engineering
  – Moving Object Detection in the H.264/AVC Compressed Domain for Video Surveillance Applications, Journal of Visual Communication & Image Representation
  – Personal Content Management System, a Semantic Approach, Journal of Visual Communication & Image Representation
• Co-author of 1 publication recorded in SCI (A1)
• 17 articles at international conferences
• 5 standardization contributions