2009.06.09 chris poppe - public phd defense

DESCRIPTION

Chris Poppe's public PhD defense entitled: "Detection and Representation of Moving Objects for Video Surveillance", 9th of June, 2009.

TRANSCRIPT

ELIS – Multimedia Lab

Detectie en representatie van bewegende objecten voor videobewaking

Detection and Representation of Moving Objects for Video Surveillance

Chris Poppe

Multimedia Lab
Department of Electronics and Information Systems
Faculty of Engineering
Ghent University

Supervisor: prof. dr. ir. Rik Van de Walle

Detection and Representation of Moving Objects for Video Surveillance Chris Poppe

Ghent, Belgium – June 9 2009

Outline

• Introduction: Context and Problem Description

• Detection of Moving Objects in the Pixel Domain

• Detection of Moving Objects in the Compressed Domain

• Metadata: Representing Moving Objects

• Conclusions

Introduction: Video Surveillance

• “Usage of a video camera to act upon crime”
• Number of cameras and surveillance systems has grown
  – 2004: 4 285 000 cameras in the United Kingdom
• Operators struggle to interpret the increasing amount of data
  – need for intelligent video surveillance systems

Introduction: Intelligent Video Surveillance System

[Diagram labels: video, encoding, analytics, storage, visualization, video + metadata]

Introduction: Video Surveillance

• Automated analysis of the video to make intelligent decisions

[Figure labels: person1, person2, intruder alert!!!]

Analytics:
1. detection of moving objects
2. tracking
3. classification
4. identification
5. interpretation

Introduction: Moving Object Detection

• Detection of moving objects is the first step in video analytics
  – needs to be fast and accurate
• Classify each pixel in the image as foreground or background
• Current techniques
  – good for “simple” situations
  – problems with moving trees, changing lighting conditions, environmental conditions, …
• Goal
  – fast and robust detection of moving objects

Introduction: Moving Object Representation

• Analytics extracts information (e.g., moving objects) from video
  – represented using standardized formats (metadata standards)
• Large video surveillance systems contain several analytics modules
  – the same information can be represented using different formats
• To retrieve relevant information (e.g., find all moving objects), a common understanding of this information is needed
• Goal
  – provide means to combine different metadata standards

[Diagram labels: analytics, information, metadata standard]

Moving Object Detection in the Pixel Domain

• Background subtraction
  – create a background model for each pixel
  – compare new images with the background model
  – large differences result in foreground objects
• Different background models have been proposed in the literature
  – previous value, average value, …

[Figure: background model, new image and result, shown as a subtraction]
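The background subtraction steps above can be sketched as follows. This is only an illustration, assuming grayscale frames stored as nested Python lists and an "average value" model updated as a running average; the names `update_background`, `foreground_mask`, `alpha` and `threshold` are hypothetical and do not come from the slides.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average model: blend each pixel of the new frame into the model."""
    return [[(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels whose difference from the model exceeds the threshold."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


# Toy 4x4 grayscale example: flat background, one bright 2x2 "object".
background = [[100.0] * 4 for _ in range(4)]
frame = [[100.0] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 200.0

mask = foreground_mask(background, frame)          # compare with the model
background = update_background(background, frame)  # then adapt the model
```

With a single static or slowly adapting model like this one, anything that differs from the model is reported as foreground, which is why repetitive background motion causes false detections.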

Moving Object Detection in the Pixel Domain

• Problems with background subtraction
  1. moving trees, opened or closed doors, construction works, …
     • a single static model is insufficient
  2. noise, weather conditions, shadows, …
     • model needs to accommodate such situations
  3. parked car
     • need to gather information on background and foreground
• Solution: multimodal background subtraction
  1. multiple models per pixel
  2. each model contains several dynamic parameters
  3. a model can represent both background and foreground

[Figure: two background models and one foreground model, each with noise statistics, previous value, average value]

Multimodal Background Subtraction

For each new image
1. compare pixel value with the models
   • find a match with one of the models
2. adapt the parameters of the models
3. decision based on the matched model

[Figure: model 1, model 2, model 3 (two background models and one foreground model, each with noise statistics, previous value, average value); result: pixel is background]
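The three per-image steps can be illustrated with a simplified sketch for a single pixel. It is loosely in the spirit of mixture-of-Gaussians background models and is not the algorithm from the thesis: the `Model` fields, the matching rule (within `k` standard deviations), the update rate `alpha` and the weight threshold `bg_weight` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Model:
    mean: float    # average value
    var: float     # noise statistics (variance)
    prev: float    # previous value
    weight: float  # how often this model has matched recently


def classify_pixel(models, value, alpha=0.05, k=2.5, bg_weight=0.3):
    """Match, adapt and decide for one pixel. Returns True for foreground."""
    # 1. Compare the pixel value with the models and look for a match.
    matched = next((m for m in models
                    if abs(value - m.mean) <= k * m.var ** 0.5), None)
    if matched is None:
        # No match: replace the least supported model with a fresh one.
        i = min(range(len(models)), key=lambda j: models[j].weight)
        matched = Model(mean=value, var=100.0, prev=value, weight=0.0)
        models[i] = matched
    # 2. Adapt the parameters; only the matched model gains weight.
    for m in models:
        m.weight = (1.0 - alpha) * m.weight + (alpha if m is matched else 0.0)
    diff = value - matched.mean
    matched.mean += alpha * diff
    matched.var = max(1.0, (1.0 - alpha) * matched.var + alpha * diff * diff)
    matched.prev = value
    # 3. Decide: a well supported model is treated as background.
    return matched.weight < bg_weight


models = [Model(100.0, 25.0, 100.0, 0.6),
          Model(150.0, 25.0, 150.0, 0.3),
          Model(30.0, 25.0, 30.0, 0.1)]
first = classify_pixel(models, 101.0)   # close to the dominant model
second = classify_pixel(models, 220.0)  # matches none of the models
```

In this toy run the first value matches the dominant model and is classified as background, while the second value starts a new, weakly supported model and is classified as foreground.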

Multimodal Background Subtraction

• Each pixel in the image has been classified as foreground or background
• Problem of “camouflage”
  – moving objects can contain parts that resemble the environment
• Only using temporal information is not sufficient

Spatio-Temporal Multimodal Background Subtraction

• Use spatial information to improve the temporal background subtraction
  – spatial segmentation
    • edge detection
    • fill the segments

Spatio-Temporal Multimodal Background Subtraction

• Combine spatial segmentation with temporal detection
  – segments containing many foreground pixels are entirely regarded as foreground

[Figure panels: temporal, spatial, spatio-temporal]
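A minimal sketch of this combination step, assuming the spatial segmentation is available as a label map and the temporal detection as a boolean mask of the same size. The function name `combine` and the `min_fraction` parameter (what counts as "many" foreground pixels) are assumptions; the slides do not state the actual criterion.

```python
def combine(segments, temporal_fg, min_fraction=0.5):
    """Mark a whole spatial segment as foreground when enough of its pixels
    were detected as foreground by the temporal step."""
    total, hits = {}, {}
    for seg_row, fg_row in zip(segments, temporal_fg):
        for label, fg in zip(seg_row, fg_row):
            total[label] = total.get(label, 0) + 1
            hits[label] = hits.get(label, 0) + (1 if fg else 0)
    fg_labels = {l for l in total if hits[l] / total[l] >= min_fraction}
    return [[label in fg_labels for label in seg_row] for seg_row in segments]


# Two segments: label 0 (left half) and label 1 (right half).
segments = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [0, 0, 1, 1]]
# Temporal detection with one false alarm (left) and two "camouflage" holes (right).
temporal_fg = [[False, False, True, False],
               [False, False, True, True],
               [True, False, False, True]]

result = combine(segments, temporal_fg)
```

In this toy case the isolated detection in segment 0 is discarded and the holes in segment 1 are filled, which is the effect the spatio-temporal combination aims for.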

Evaluation: Objective Results

• Precision: How many of the detected foreground pixels are correct?
• Recall: How many of the real foreground pixels are detected?
• Apply the algorithm to a video sequence and count correct and wrong detections
  – calculate precision and recall values
• Good systems obtain high precision and recall
• Different parameter settings of an algorithm give different outputs
  – vary parameters
  – calculate precision and recall values
  – represent on a graph
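The two measures can be computed per pixel as sketched below, assuming the detection result and the ground truth are boolean masks of equal size. The function name and the convention of returning 0.0 when a denominator is zero are choices made for this example.

```python
def precision_recall(detected, ground_truth):
    """Pixel-wise precision and recall of a foreground mask."""
    tp = fp = fn = 0
    for d_row, g_row in zip(detected, ground_truth):
        for d, g in zip(d_row, g_row):
            if d and g:
                tp += 1      # correctly detected foreground pixel
            elif d and not g:
                fp += 1      # detected, but actually background
            elif g and not d:
                fn += 1      # real foreground pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


detected = [[True, True, False],
            [False, True, False]]
ground_truth = [[True, False, False],
                [False, True, True]]

precision, recall = precision_recall(detected, ground_truth)
```

Repeating this for several parameter settings of a detector yields one (precision, recall) point per setting, which is what the graph on the following slide plots.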

Evaluation: Objective Results

• Compare proposed algorithm with similar techniques
  – Stauffer (2001), Shan (2006)

Evaluation: Subjective Results

• Visual examples of output of different algorithms

[Figure panels: input image, ground truth, Stauffer ’01, Shan ’06, proposed]

Evaluation: Execution Times

• Proposed system is faster than related work
• Spatial segmentation can occur in parallel with temporal detection
  – processing speed can be increased

Sequence               Stauffer ’01 (fps)   proposed (fps)   temporal (fps)   spatial (fps)
PetsD2TeC2 (384x288)   8.33                 10               29.4             18.2
Indoor (340x240)       9.5                  15.4             45.5             30
Ismail (320x240)       9.7                  14.9             71.4             29.4
ThirdView (720x576)    1.1                  2.3              3.6              7.7

Moving Object Detection in the Compressed Domain

• Video is encoded to reduce network traffic and storage cost
• Video coding exploits redundancy in video
  – neighboring pixels often have similar values
  – successive images are closely related
• Before video analytics can be applied, a decoding step is needed
• Apply analytics directly on the compressed bit stream

[Diagram labels: encoding, analytics]

H.264/AVC

• Block-based video coding (standardized 2003)
  – frame divided into macroblocks (MBs) of 16x16 pixels
  – MBs are predicted based on previously encoded data
  – difference between prediction and MB is further encoded
    • motion vector is stored in the bit stream to point to the prediction
• Current object detection techniques are based on motion vectors
  – motion vectors are created to compress, not to represent the real motion
  – processing/filtering needed to deal with noisy motion vectors
• Search for new approach

[Figure: motion vectors]

Observations

• Size of an MB (number of bits used within the compressed bit stream) changes over several consecutive images
  – MBs corresponding to background use few bits (frame 0 to 90)
  – if a moving object passes, the size of the MB rises (frame 90 to 120)

MB-based Background Subtraction

• Background model for each MB
  – training period
  – determine maximum size
• Threshold T
• Compare MB sizes with maximum + T
  – MBs with large sizes are considered foreground

[Figure label: T]
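The MB-based background subtraction above can be sketched as follows, assuming the size in bits of every MB has already been extracted from the bit stream for each frame (the extraction itself is not shown). The function names and the toy numbers are hypothetical.

```python
def train_max_sizes(training_frames):
    """Per-MB maximum size (in bits) observed during the training period."""
    return [max(sizes) for sizes in zip(*training_frames)]


def detect_foreground_mbs(mb_sizes, max_sizes, T):
    """An MB is foreground when its size exceeds the trained maximum plus T."""
    return [size > m + T for size, m in zip(mb_sizes, max_sizes)]


# Each inner list holds the sizes of four MBs in one training frame.
training_frames = [[12, 8, 15, 10],
                   [10, 9, 14, 11],
                   [11, 7, 16, 10]]
max_sizes = train_max_sizes(training_frames)

# A new frame in which an object covers the two middle MBs.
new_frame = [13, 140, 95, 10]
foreground = detect_foreground_mbs(new_frame, max_sizes, T=20)
```

Because only one comparison per MB is needed and no decoding to pixels takes place, this kind of test is very cheap, which is consistent with the execution speeds reported later in the talk.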

(sub)MB-based Background Subtraction

• MBs can be coarse (16x16 pixels)
• H.264/AVC divides MBs into subMBs (4x4 pixels)
• Refine the MB output to subMB level
  – only regard foreground MBs at the boundaries of a moving object
  – analyze the size (in bits) of the subMBs in these boundary MBs
  – small subMBs are regarded as background
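A sketch of this refinement under stated assumptions: a foreground MB counts as a boundary MB when one of its four neighbours inside the frame is background, and a subMB is "small" when it uses fewer than `min_bits` bits. Both rules, the function names and the toy data are assumptions; the slides do not give the exact criteria. For brevity the example uses four subMB sizes per MB instead of the sixteen 4x4 blocks of a real 16x16 MB.

```python
def is_boundary(fg, r, c):
    """A foreground MB is a boundary MB if a 4-neighbour is background
    (positions outside the frame are not counted)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(fg) and 0 <= cc < len(fg[0]) and not fg[rr][cc]:
            return True
    return False


def refine(fg, sub_sizes, min_bits):
    """Return, for every MB, a list of subMB foreground flags.

    fg[r][c]        : MB-level foreground decision
    sub_sizes[r][c] : subMB sizes in bits for that MB
    """
    out = []
    for r, row in enumerate(fg):
        out_row = []
        for c, mb_fg in enumerate(row):
            sizes = sub_sizes[r][c]
            if not mb_fg:
                out_row.append([False] * len(sizes))             # background MB
            elif is_boundary(fg, r, c):
                out_row.append([s >= min_bits for s in sizes])   # refine
            else:
                out_row.append([True] * len(sizes))              # interior MB
        out.append(out_row)
    return out


fg = [[False, True, True]]
sub_sizes = [[[1, 1, 1, 1], [2, 30, 3, 40], [1, 1, 50, 60]]]
refined = refine(fg, sub_sizes, min_bits=10)
```

Only the middle MB is refined here: it touches a background MB, so its two small subMBs are reclassified as background, while the interior MB on the right stays entirely foreground.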

Evaluation: Objective comparison

• Precision: How many of the detected foreground pixels are correct?
• Recall: How many of the real foreground pixels are detected?
• Comparison with Zeng (2005) (based on motion vectors)

Evaluation: Execution Times

• Very high execution speeds
  – up to 20x faster than the related work

Sequence               Zeng ’05 (fps)   proposed (fps)
Etri od A (352x240)    28               662
PetsD2TeC2 (384x288)   22               448
Indoor (340x240)       31               751

Evaluation: Subjective Results

• Demonstration

Metadata: Representing Moving Objects

• Metadata is “data about data”
  – data about detected object: size, color, bounding box, …
• Metadata standard
  – common agreement on the format of the metadata
• Several metadata standards exist for video surveillance
  – modules can use different standards
  – same information can be represented in different formats

[Diagram labels: analytics1 with metadata in metadata standard A; analytics2 with metadata in metadata standard B]

Metadata: Representing Moving Objects

• Metadata standards
  – XML (eXtensible Markup Language)
    • describes terms and structure of metadata
  – specification
    • textual description of the semantics of the XML elements

Metadata example 1: CVML (Computer Vision Markup Language)

<object id="0"> <box xc="77" yc="73" w="21" h="16"/></object>

Box: “Coordinates of the centre and the dimensions of the bounding box of a detected object in pixels.”

Metadata example 2: VS7 (Video Surveillance Schema)

<LLID ="LLID1"><Mask> <BB mp7:dim="4">67 65 88 91</BB></Mask> </LLID>

BB: “Coordinates of a rectangular segment.”
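To illustrate why the textual specification matters, the sketch below reads the CVML fragment shown on this slide and converts its centre-based box into corner coordinates. The target representation (left, top, right, bottom) and the function name are assumptions made for this example; the VS7 fragment is left out because the slide does not specify the order of its four values.

```python
import xml.etree.ElementTree as ET

# CVML fragment as shown on the slide (centre coordinates plus dimensions).
CVML_FRAGMENT = '<object id="0"><box xc="77" yc="73" w="21" h="16"/></object>'


def cvml_box_to_corners(fragment):
    """Convert a centre-based CVML box to (left, top, right, bottom)."""
    box = ET.fromstring(fragment).find("box")
    xc, yc = float(box.get("xc")), float(box.get("yc"))
    w, h = float(box.get("w")), float(box.get("h"))
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


corners = cvml_box_to_corners(CVML_FRAGMENT)
```

Software that consumes both formats needs one such hand-written interpretation per standard, which is the maintenance problem the ontology-based approach on the following slides tries to avoid.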

Metadata: Representing Moving Objects

• Proposal: use Semantic Web Technologies
  – make information on the internet accessible for machines
  – information in a domain is structured using an ontology
    • a data model that represents a set of concepts and relations amongst these concepts within a specific domain
• OWL (Web Ontology Language)
  – W3C Recommendation (2004)
  – standardized language for the description of an ontology
    • classes, properties and relations
    • individuals or instances
  – can be queried through standardized languages

Metadata: Representing Moving Objects

• Example: ontology for domain of science

[Diagram: Class: Person; Class: Scientist (subClassOf Person); DatatypeProperty: birth date; Individual: “Joseph Plateau” with birth date “14/10/1801”]

OWL constructs
• Class
• DatatypeProperty
• subClassOf
• Individual
• …
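The example ontology can be mimicked with plain subject, predicate, object triples, as sketched below. This is not OWL syntax and uses no OWL library; the predicate names (`type`, `subClassOf`, `birthDate`) are simplified stand-ins, and treating “Joseph Plateau” as an individual of class Scientist is an assumption read from the diagram.

```python
# Subject, predicate, object triples for the science example.
triples = {
    ("Person", "type", "Class"),
    ("Scientist", "type", "Class"),
    ("Scientist", "subClassOf", "Person"),
    ("birthDate", "type", "DatatypeProperty"),
    ("Joseph Plateau", "type", "Scientist"),
    ("Joseph Plateau", "birthDate", "14/10/1801"),
}


def superclasses(cls):
    """All classes reachable from cls via subClassOf, including cls itself."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def instances_of(cls):
    """Individuals whose asserted class is cls or one of its subclasses."""
    return sorted(s for s, p, o in triples
                  if p == "type" and cls in superclasses(o))


persons = instances_of("Person")
```

The query for `Person` returns the scientist even though no triple states that directly; the answer follows from the `subClassOf` relation, which is the kind of inference an ontology language makes available to machines.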

Metadata: Representing Moving Objects

• Create OWL ontologies for the metadata standards used in video surveillance
  – CVML, VS7, MPEG-7, …
• Mappings link the different ontologies
  – use OWL constructs to link classes
  – denote that classes in the different ontologies can be the same
• Information in different formats is linked
  – however, metadata can be very technical or general

[Diagram: OWL ontology CVML, OWL ontology VS7, OWL ontology MPEG7]

Metadata: Representing Moving Objects

• One global ontology with general concepts for video surveillance
• Link with metadata ontologies through mappings
• Layered metadata model
• Only need to know the upper ontology to retrieve information (e.g., retrieve all images with moving objects)

[Diagram: upper layer: OWL ontology Video Surveillance; lower layer: OWL ontology CVML, OWL ontology VS7, OWL ontology MPEG7]

Evaluation: Practical Use Case Scenario

• Scenario
  – “operator wants to retrieve images that contain moving objects”
  – analytics module 1 detects objects in CVML (XML)
  – analytics module 2 detects objects in VS7 (XML)
• Proposed
  – XML fragments are automatically converted to OWL instances
  – through the mappings these instances are linked to each other and to the Video Surveillance Ontology
  – operator can use standardized languages to query the Video Surveillance Ontology
• Related work
  – specific software written to interpret CVML and VS7
  – specific software written to “translate” the operator’s request to the corresponding XML elements
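The proposed flow can be illustrated with a toy sketch in plain Python. It stands in for OWL instances, mappings and a standardized query language with simple dictionaries; every identifier in it (`cvml:Object`, `vs7:LLID`, `vs7:Camera`, `vs:MovingObject`, the frame names) is hypothetical and chosen only for this example.

```python
# Lower layer: instances converted from XML, typed with standard-specific classes.
instances = [
    {"id": "cvml-object-0", "type": "cvml:Object", "image": "frame_0012"},
    {"id": "vs7-llid-1", "type": "vs7:LLID", "image": "frame_0047"},
    {"id": "vs7-camera-3", "type": "vs7:Camera", "image": None},
]

# Mappings: classes of the metadata ontologies linked to upper-ontology concepts.
mappings = {
    "cvml:Object": "vs:MovingObject",
    "vs7:LLID": "vs:MovingObject",
}


def query(upper_class):
    """Images of all instances whose class maps to the given upper concept."""
    return sorted(i["image"] for i in instances
                  if mappings.get(i["type"]) == upper_class)


images = query("vs:MovingObject")
```

The operator's request is phrased only in terms of the upper concept; adding a further metadata standard would mean adding instances and mappings, without changing the query.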

Conclusions

• Algorithm for the detection of moving objects in the pixel domain
  – multimodal background subtraction technique
  – combines spatial and temporal information
  – evaluated by comparison with related work
    • more robust detection
    • faster execution speeds
• Algorithm for the detection of moving objects in the compressed domain
  – novel approach that disregards motion vectors
  – macroblock-based background subtraction
  – evaluated by comparison with related work
    • better detection results (very high precision)
    • up to 20 times faster than the related work

Conclusions

• Metadata for the representation of moving objects
  – discussed problems of the usage of different XML-based metadata standards
  – introduction of Semantic Web Technologies
  – layered metadata model
    • upper Video Surveillance Ontology
    • lower layer with pool of metadata ontologies
    • links defined using mappings
  – evaluation based on practical use case scenario

Publications

• First author of 3 publications recorded in SCI (A1)
  – Robust Spatio-Temporal Multimodal Background Subtraction for Video Surveillance, Optical Engineering
  – Moving Object Detection in the H.264/AVC Compressed Domain for Video Surveillance Applications, Journal of Visual Communication & Image Representation
  – Personal Content Management System, a Semantic Approach, Journal of Visual Communication & Image Representation
• Co-author of 1 publication recorded in SCI (A1)
• 17 articles at international conferences
• 5 standardization contributions
