
Dr. Mohammed A.-Megeed Salem, Associate Professor

Faculty of Media Engineering & Technology, German University in Cairo

Wavelet Transform for Video Surveillance: from Moving Object Detection to Action Recognition


Salem, Wavelet Transform for Video Surveillance & Robot Vision, ACCI 2018

Outline

• Multiresolution Analysis

• Wavelet Transform

• Resolution Mosaic Representation

• Video Surveillance

• Object Detection

• Action Recognition


Introduction

• The term Multiresolution Analysis dates back to the late 1970s. One of the first workshops on the topic, titled “Multiresolution Visual Computing and Analysis”, was held in Leesburg, VA, USA, in July 1982 [Ros84b].



Introduction

• Multiresolution analysis represents a signal as a sequence of successively coarser approximations of the original signal.

• Finer resolutions show more details; coarser resolutions show only an approximation of the signal, in which only strong features can be detected.



Introduction

• A signal can be viewed as a composition of a smooth background and actions or details in the foreground.

• At a given resolution, the signal is approximated by ignoring some details.

• To increase the resolution, finer details are added to the coarser approximation of the signal.
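The approximation/detail idea can be illustrated with the simplest case, one level of the Haar transform. This is a minimal sketch, assuming Haar averaging and differencing (the slides do not fix a particular wavelet at this point); `haar_step` and `haar_inverse` are hypothetical helper names.

```python
def haar_step(signal):
    """One analysis level: pairwise averages (approximation) and half-differences (details)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Synthesis: add the details back to the coarse approximation."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out


signal = [4, 6, 10, 12, 8, 8, 2, 0]
approx, detail = haar_step(signal)
print(approx)                                  # [5.0, 11.0, 8.0, 1.0]
print(detail)                                  # [-1.0, -1.0, 0.0, 1.0]
print(haar_inverse(approx, detail) == signal)  # True
```

Applying `haar_step` again to `approx` gives the next, coarser level; the detail vectors are exactly what has to be added back to move up one resolution level.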



Introduction

• Advantages

– Reducing the computational cost.

– Enabling selective information extraction.

• Local information may be better processed at the high-resolution levels.

• Global information may be processed at the low-resolution levels.



Multiresolution Representation: Approximation Spaces

• A multiresolution analysis consists of a sequence of successive approximation spaces $V_j$, $j \in \mathbb{Z}$.

• Scaling property: $f(t) \in V_j \iff f(2^{-j}t) \in V_0, \quad \forall j \in \mathbb{Z}$
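For the Haar system (an assumed example, not stated on the slide), $V_j$ is the space of functions that are constant on every dyadic interval $[k2^{-j}, (k+1)2^{-j})$. The sketch below checks membership by sampling on a finite window, so it is only a numerical illustration of the scaling property; `in_haar_space`, `f`, and `g` are hypothetical names.

```python
import math


def in_haar_space(f, j, t_max=8.0, samples_per_cell=16):
    """Sampling check: is f constant on each cell [k*2^-j, (k+1)*2^-j) inside [0, t_max)?"""
    width = 2.0 ** (-j)
    for k in range(int(t_max / width)):
        start = k * width
        values = [f(start + width * s / samples_per_cell) for s in range(samples_per_cell)]
        if max(values) - min(values) > 1e-12:
            return False
    return True


def f(t):
    """Piecewise constant on quarter intervals, so it should lie in V_2 but not in V_1."""
    return math.floor(4 * t) % 3


def g(t):
    """g(t) = f(2^-2 t): the dilation that the scaling property maps into V_0."""
    return f(t / 4)


print(in_haar_space(f, 2), in_haar_space(f, 1))  # True False
print(in_haar_space(g, 0))                       # True
print(in_haar_space(f, 3))                       # True (V_2 is contained in V_3)
```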



Multiresolution Representation: Orthogonal Complement Spaces

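Figure: nested approximation spaces $V_j$, $V_{j+1}$, $V_{j+2}$ with the orthogonal complement spaces $W_{j+1}$, $W_{j+2}$.

• Orthonormal basis functions (scaling functions) are needed for projecting a signal onto $V_j$.

• Orthonormal basis functions (wavelets) are needed for projecting a signal onto $W_j$.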
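With the orthonormal Haar basis (again an assumed example), splitting a signal into its projection on the coarser approximation space and its projection on the orthogonal complement preserves the energy and yields two mutually orthogonal components that sum back to the signal. `split` and `synthesize` are hypothetical helper names.

```python
import math

SQRT2 = math.sqrt(2.0)


def split(v):
    """Orthonormal Haar analysis: coefficients in the coarser space and in its complement."""
    pairs = list(zip(v[0::2], v[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail


def synthesize(approx, detail):
    """Orthonormal Haar synthesis (inverse of split)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def energy(x):
    return sum(val * val for val in x)


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


v = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
approx, detail = split(v)
zeros = [0.0] * len(approx)
coarse_part = synthesize(approx, zeros)  # projection onto the approximation space
detail_part = synthesize(zeros, detail)  # projection onto the orthogonal complement

print(math.isclose(energy(v), energy(approx) + energy(detail)))         # True
print(math.isclose(dot(coarse_part, detail_part), 0.0, abs_tol=1e-12))  # True
print(all(math.isclose(c + d, x, abs_tol=1e-12)
          for c, d, x in zip(coarse_part, detail_part, v)))             # True
```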



Approximations and Details

• One-dimensional analysis is based on one scaling function φ and one wavelet ψ.

• The integral of ψ is zero, and ψ is used to define the details.

• The integral of φ is 1, and φ is used to define the approximations.
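A quick numerical check of these two integral conditions for the Haar pair (an assumed example; the slide does not name a specific wavelet), using a midpoint-rule integral over an interval that contains both supports:

```python
def haar_phi(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def haar_psi(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def integrate(f, a=-1.0, b=2.0, n=3000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


print(round(integrate(haar_phi), 6))  # 1.0
print(round(integrate(haar_psi), 6))  # 0.0
```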



Multiresolution Analysis



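• The usual two-dimensional wavelets are defined as tensor products of one-dimensional wavelets:

$\varphi(x,y) = \varphi(x)\varphi(y)$ is the scaling function, and $\psi^{1}(x,y) = \psi(x)\varphi(y)$, $\psi^{2}(x,y) = \varphi(x)\psi(y)$, $\psi^{3}(x,y) = \psi(x)\psi(y)$ are the three wavelet (detail) functions.

2-D Discrete Wavelet Transform: Approximation and Details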
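Because the 2-D functions are separable, the 2-D analysis filters can be formed as outer products of the 1-D ones. The sketch below assumes Haar averaging/differencing filters (the names `LOW`, `HIGH`, and `outer` are hypothetical) and labels the kernels with the H/V/D convention given on the next slide; the approximation kernel sums to 1 and each detail kernel sums to 0, mirroring the integral conditions on φ and ψ.

```python
LOW = [0.5, 0.5]    # 1-D averaging filter (Haar scaling function, normalized to sum 1)
HIGH = [0.5, -0.5]  # 1-D differencing filter (Haar wavelet, sums to 0)


def outer(col, row):
    """Tensor (outer) product: entry [y][x] = col[y] * row[x]."""
    return [[c * r for r in row] for c in col]


# Rows index y and columns index x; the first factor acts along y, the second along x.
kernels = {
    "A": outer(LOW, LOW),    # average in y, average in x
    "H": outer(HIGH, LOW),   # difference in y, average in x
    "V": outer(LOW, HIGH),   # average in y, difference in x
    "D": outer(HIGH, HIGH),  # difference in y, difference in x
}

for name, k in kernels.items():
    print(name, k, sum(sum(row) for row in k))
```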



• H-detail is obtained by averaging in the x-dimension and differencing in the y-dimension.

• V-detail is obtained by averaging in the y-dimension and differencing in the x-dimension.

• D-detail is obtained by differencing in both directions and then averaging.

• The three detail images can be combined to show the edges of the image.

Wavelets in Image Processing: Approximation and Details
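The averaging/differencing rules above can be written directly on 2×2 pixel blocks. This is a minimal one-level sketch with Haar filters, assuming even image dimensions; combining the details as √(H² + V² + D²) is one simple choice for the edge image, not necessarily the one used for the slide's figures. Function names are hypothetical.

```python
import math


def haar2d_level(img):
    """One-level 2-D Haar analysis. img[y][x] with even height and width.
    Returns the subbands A, H, V, D, each half the size of img in both dimensions."""
    A, H, V, D = [], [], [], []
    for y in range(0, len(img), 2):
        ra, rh, rv, rd = [], [], [], []
        for x in range(0, len(img[0]), 2):
            p, q = img[y][x], img[y][x + 1]          # top row of the 2x2 block
            r, s = img[y + 1][x], img[y + 1][x + 1]  # bottom row of the block
            ra.append((p + q + r + s) / 4)  # average in x and in y
            rh.append((p + q - r - s) / 4)  # average in x, difference in y
            rv.append((p - q + r - s) / 4)  # average in y, difference in x
            rd.append((p - q - r + s) / 4)  # difference in both directions
        A.append(ra)
        H.append(rh)
        V.append(rv)
        D.append(rd)
    return A, H, V, D


def edge_image(H, V, D):
    """Combine the three detail subbands into one edge-strength image."""
    return [[math.sqrt(h * h + v * v + d * d) for h, v, d in zip(rh, rv, rd)]
            for rh, rv, rd in zip(H, V, D)]


# A 4x4 test image with a vertical edge between columns 0 and 1.
img = [[0, 8, 8, 8] for _ in range(4)]
A, H, V, D = haar2d_level(img)
print(A)                    # [[4.0, 8.0], [4.0, 8.0]]
print(V)                    # [[-4.0, 0.0], [-4.0, 0.0]]
print(edge_image(H, V, D))  # [[4.0, 0.0], [4.0, 0.0]]
```

The vertical edge shows up only in the V subband, which is differenced along x.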



Wavelets in Image Processing



Multiresolution Analysis: Wavelets

• The 3D scaling and wavelet functions can each be expressed as a product of 3 one-dimensional functions.

• The analysis is carried out along the x-dimension, the y-dimension, and the z-dimension of the volumetric data.

• Eight subbands (one approximation and seven details) result from a one-level analysis.
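A minimal one-level 3-D Haar sketch (Haar is an assumption here) that produces the eight subbands from 2×2×2 blocks of a volume `vol[t][y][x]`. Subbands are keyed by three letters for the (x, y, t) axes, 'L' for averaged and 'H' for differenced; under the 2-D convention above, the spatial subbands A, H, V, D correspond to the x/y letters LL, LH, HL, HH, and a temporal approximation/detail to a final L/H. The function name is hypothetical.

```python
from itertools import product


def haar3d_level(vol):
    """One-level 3-D Haar analysis of vol[t][y][x] (all dimensions even).

    Returns a dict with 8 subbands keyed by three letters for the (x, y, t) axes:
    'L' = averaged along that axis, 'H' = differenced along that axis.
    """
    T, Y, X = len(vol), len(vol[0]), len(vol[0][0])
    keys = ["".join(k) for k in product("LH", repeat=3)]
    bands = {k: [[[0.0] * (X // 2) for _ in range(Y // 2)] for _ in range(T // 2)]
             for k in keys}
    for t in range(0, T, 2):
        for y in range(0, Y, 2):
            for x in range(0, X, 2):
                for key in keys:
                    total = 0.0
                    for dt, dy, dx in product((0, 1), repeat=3):
                        # The second sample along a differenced axis enters with a minus sign.
                        flips = (key[0] == "H") * dx + (key[1] == "H") * dy + (key[2] == "H") * dt
                        total += (-1) ** flips * vol[t + dt][y + dy][x + dx]
                    bands[key][t // 2][y // 2][x // 2] = total / 8
    return bands


# Two 2x2 frames; the second frame is uniformly brighter (a pure temporal change).
bands = haar3d_level([[[0, 0], [0, 0]], [[8, 8], [8, 8]]])
print(len(bands))     # 8
print(bands["LLL"])   # [[[4.0]]]
print(bands["LLH"])   # [[[-4.0]]]
```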


Relevance-based Resolution Mosaic

• Assumptions to design a new algorithm:

– Spatial features of the pixels enhance the processing.

– The image consists of relevant as well as non-relevant parts.
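One way to picture these assumptions is a mosaic in which each image tile keeps a resolution that depends on its relevance. The sketch below is a hypothetical simplification, not the algorithm from the talk: relevance levels 0, 1, 2 (very low, low, high) on 4×4 tiles select 4×4, 2×2, or 1×1 averaging blocks, i.e. Haar-style approximations at different levels. `BLOCK_SIZE`, `TILE`, and the function names are illustrative assumptions.

```python
# Hypothetical relevance levels -> side of the averaging block inside a 4x4 tile.
BLOCK_SIZE = {0: 4, 1: 2, 2: 1}  # 0 = very low, 1 = low, 2 = high relevance
TILE = 4


def block_average(img, y0, x0, size):
    """Mean of the size x size block whose top-left corner is (y0, x0)."""
    vals = [img[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    return sum(vals) / len(vals)


def resolution_mosaic(img, relevance):
    """relevance[ty][tx] gives the level of the TILE x TILE tile at row ty, column tx.
    Each tile is replaced by block averages at the resolution its level allows."""
    out = [row[:] for row in img]
    for ty, rel_row in enumerate(relevance):
        for tx, level in enumerate(rel_row):
            size = BLOCK_SIZE[level]
            for by in range(ty * TILE, (ty + 1) * TILE, size):
                for bx in range(tx * TILE, (tx + 1) * TILE, size):
                    avg = block_average(img, by, bx, size)
                    for y in range(by, by + size):
                        for x in range(bx, bx + size):
                            out[y][x] = avg
    return out


img = [[x + 8 * y for x in range(8)] for y in range(4)]
mosaic = resolution_mosaic(img, [[0, 2]])  # left tile: very low relevance, right: high
print(mosaic[0][:4])  # [13.5, 13.5, 13.5, 13.5]
print(mosaic[1][4:])  # [12.0, 13.0, 14.0, 15.0]
```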




• Gestaltists

– “A Gestalt or whole differs from the sum of its parts”

• We do not see a complex scene as a set of objects, but rather as objects and their relations.



Figure: resolution level vs. information relevance (very low, low, high).


Outline




• Video Surveillance

• Object Detection



Visual Surveillance

• Application Domains

– Environment Surveillance

– Traffic Surveillance

– People Surveillance



Video Surveillance

• Multiresolution and Resolution Mosaic image representation.

• Apply the 3D wavelet transform.

• Utilize the time dimension for motion/action detection and analysis.

• Address the problems of Moving Object Detection and Action Recognition.




Moving Object Detection



Figure (spatial transformation): image (original resolution) and map → 2D wavelet transform at different levels → subbands A, H, V, D → spatial resolution mosaic subbands.


Figure (temporal transformation): image (original resolution) and map → subbands A, H, V, D → temporal 1D arrays → 1D (temporal) wavelet transform → temporal-spatial subbands AA, AD, HA, HD, VA, VD, DA, DD.


Figure (temporal transformation, subband labels): the temporal-spatial subbands correspond to AA = A, AD = D4, HA = D2, HD = D6, VA = D1, VD = D5, DA = D3, DD = D7.
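The pairing of labels on this slide appears to follow a binary index k = 4·(temporal detail) + 2·(H) + 1·(V), with the spatial D subband contributing both the H and V bits; this is an inference from the listed pairs, not something the slide states. The sketch below encodes that reading and adds a minimal temporal Haar step on one spatial subband, in which large temporal-detail coefficients flag motion. The threshold and all names are hypothetical.

```python
SPATIAL_BITS = {"A": 0, "H": 2, "V": 1, "D": 3}  # inferred: D combines the H and V bits
TEMPORAL_BITS = {"A": 0, "D": 4}                 # inferred: temporal detail sets bit 4


def subband_label(spatial, temporal):
    """Map a (spatial, temporal) subband pair such as ('H', 'D') to 'A' or 'D<k>'."""
    k = SPATIAL_BITS[spatial] + TEMPORAL_BITS[temporal]
    return "A" if k == 0 else f"D{k}"


def temporal_haar(frames):
    """One temporal Haar level over consecutive pairs of equally sized 2-D coefficient arrays."""
    averages, details = [], []
    for f0, f1 in zip(frames[0::2], frames[1::2]):
        averages.append([[(a + b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(f0, f1)])
        details.append([[(a - b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(f0, f1)])
    return averages, details


def motion_mask(detail, threshold):
    """Flag coefficients whose temporal-detail magnitude exceeds the threshold."""
    return [[abs(v) > threshold for v in row] for row in detail]


print([subband_label(s, t) for s in "AHVD" for t in "AD"])
# ['A', 'D4', 'D2', 'D6', 'D1', 'D5', 'D3', 'D7']

frames = [[[0, 0], [0, 0]], [[0, 10], [0, 0]]]  # one coefficient changes between frames
_, details = temporal_haar(frames)
print(motion_mask(details[0], threshold=1.0))   # [[False, True], [False, False]]
```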


Results

Figure panels: extraction of ROI; extraction of active traffic area; smallest bounding boxes; manually segmented data.
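Turning a binary motion mask into smallest bounding boxes can be sketched with a 4-connected component search. This is a generic illustration (the talk does not say how its boxes were computed); `bounding_boxes` is a hypothetical helper that returns inclusive (top, left, bottom, right) coordinates per region.

```python
from collections import deque


def bounding_boxes(mask):
    """Smallest axis-aligned box (top, left, bottom, right), inclusive, per 4-connected region."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
print(bounding_boxes(mask))  # [(0, 0, 1, 1), (1, 3, 2, 4)]
```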





Action vs Activity

• “Actions” refer to simple motion patterns, usually executed by a single person and typically lasting for short durations of time, on the order of tens of seconds.

• “Activities” refer to complex sequences of actions performed by several humans who could be interacting with each other in a constrained manner.

– They are typically characterized by much longer durations, e.g., a football team scoring a goal, two persons shaking hands, or a coordinated bank attack by multiple robbers.


Figure: levels of complexity, from simple to complex: gesture, action, activity, event (group activity).

Action vs Activity

Figure: change, movement, movement pattern, action, verb, episode, activity, event, history.


Challenges of Action Recognition

• Intra- and inter-class variations

– Large variations in performance (walking speed and stride length).

– Anthropometric differences between individuals.

• Environment and recording settings

– Cluttered and dynamic environments.

– Lighting conditions.

– Viewpoints.

– Dynamic backgrounds.

• Temporal variations

– Variation in the rate of performance of an action.

• Obtaining and labeling training data

– Limited number of training and testing sequences.

– Labeling of datasets.









Benchmark Datasets

• CAVIAR test case scenarios

• Weizmann

• UCF Sports action dataset

• Hollywood human action dataset (HoHa1)

• TRECVid

• VIRAT


Structure of Action Recognition System

• Action segmentation

• Action modeling and representation

• Action description

• Action learning and classification




Action Representation Techniques

Spatial or Temporal

• Spatial

– Image models

– Body models

– Spatial statistics

• Temporal

– Action grammars

– Action templates

– Temporal statistics


Action Representation Techniques

Global or Local

• Global

– Grid-based

– Space-time volumes

• Local

– Space-time interest points

– Local descriptors

– Local grid-based


Action Representations

• Spatial: Body models

– MLD landmarks

– Cylindrical primitives

– Rectangular patches

– Stick figures

MLD: Moving Light Display

Figure panels: cylindrical primitives; rectangular patches.


Action Representations

• Temporal Action Templates

– Space-time volumes

– Motion history volumes

Figure panels: space-time volumes; motion history volumes.


Temporal-Spatial Action Representations

• Global: Image models

– Silhouettes

– Contours

– Motion Energy Images (MEI) and Motion History Images (MHI)

• Local: Spatial statistics

– Space-time interest points

– Spatio-temporal features

Figure panels: silhouettes; contours; MEI & MHI.
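A minimal MHI/MEI sketch following the usual Bobick-Davis update: pixels where motion is detected are set to a duration τ, all other pixels decay by one per frame (never below zero), and the MEI marks pixels with a nonzero MHI. Motion here is a simple thresholded frame difference; the threshold, τ, and the function names are hypothetical choices.

```python
def frame_difference(prev, curr, threshold):
    """Binary motion mask: True where the absolute intensity change exceeds the threshold."""
    return [[abs(c - p) > threshold for p, c in zip(rp, rc)] for rp, rc in zip(prev, curr)]


def update_mhi(mhi, motion, tau):
    """Moving pixels are set to tau; all others decay by 1, never below 0."""
    return [[tau if m else max(0, h - 1) for h, m in zip(rh, rm)]
            for rh, rm in zip(mhi, motion)]


def mei(mhi):
    """Motion energy image: where any motion occurred within the last tau frames."""
    return [[h > 0 for h in row] for row in mhi]


# A bright dot moving one pixel to the right per frame (a single image row).
frames = [[[9, 0, 0, 0]], [[0, 9, 0, 0]], [[0, 0, 9, 0]]]
mhi = [[0, 0, 0, 0]]
for prev, curr in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, frame_difference(prev, curr, threshold=4), tau=3)
print(mhi)       # [[2, 3, 3, 0]]
print(mei(mhi))  # [[True, True, True, False]]
```

Recent motion keeps the highest values, so the MHI encodes the direction of movement while the MEI only records where it happened.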


Temporal-Spatial Action Representations

Figure panels: space-time interest points; spatio-temporal features.



Wavelet-based Action Representation

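Directional Wavelet Energy Images:

• Horizontal: $\dfrac{\partial^{2}\,\mathrm{Video}}{\partial y\,\partial t}$

• Vertical: $\dfrac{\partial^{2}\,\mathrm{Video}}{\partial x\,\partial t}$

• Diagonal: $\dfrac{\partial^{3}\,\mathrm{Video}}{\partial x\,\partial y\,\partial t}$

Multiscale temporal change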
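One possible reading of these images, sketched under stated assumptions: at a single scale, a 2×2×2 Haar block differenced along (y, t), (x, t), or (x, y, t) acts as a discrete version of the corresponding mixed derivative, and an energy image accumulates the squared coefficients over time. The slides do not specify the energy definition, the wavelet, or the scale handling; here Haar is assumed, only one level is computed (a multiscale version would repeat this on the approximation), and `directional_energy` is a hypothetical name.

```python
from itertools import product

# Axes differenced for each direction, in (x, y, t) order; the other axes are averaged.
DIRECTIONS = {
    "horizontal": (0, 1, 1),  # ~ d^2 Video / (dy dt)
    "vertical": (1, 0, 1),    # ~ d^2 Video / (dx dt)
    "diagonal": (1, 1, 1),    # ~ d^3 Video / (dx dy dt)
}


def directional_energy(video):
    """video[t][y][x] with even height and width; a trailing odd frame is ignored.
    Returns one energy image per direction (half resolution in x and y)."""
    T, Y, X = len(video), len(video[0]), len(video[0][0])
    energy = {name: [[0.0] * (X // 2) for _ in range(Y // 2)] for name in DIRECTIONS}
    for t in range(0, T - 1, 2):
        for y in range(0, Y, 2):
            for x in range(0, X, 2):
                for name, (fx, fy, ft) in DIRECTIONS.items():
                    coef = 0.0
                    for dt, dy, dx in product((0, 1), repeat=3):
                        sign = (-1) ** (fx * dx + fy * dy + ft * dt)
                        coef += sign * video[t + dt][y + dy][x + dx]
                    coef /= 8
                    energy[name][y // 2][x // 2] += coef * coef  # accumulate over time
    return energy


# The top row of a 2x2 patch lights up between two frames: a horizontal edge appears.
video = [[[0, 0], [0, 0]], [[8, 8], [0, 0]]]
print(directional_energy(video))
# {'horizontal': [[4.0]], 'vertical': [[0.0]], 'diagonal': [[0.0]]}
```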


Results: Directional Wavelet Energy Images Using the Weizmann Dataset


Action Recognition

Figure (flowchart) with two stages, offline training and online detection and recognition. Blocks: 3D SWT action detector; learning action patterns; action database; 3D SWT motion detector; decision “Motion detected?” (yes / no); action label.
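Assuming SWT here denotes the stationary (undecimated) wavelet transform, its key difference from the decimated transform is that no downsampling takes place: at level j the Haar filters combine samples 2^(j-1) apart, so every level keeps one coefficient per frame. A minimal temporal sketch for one pixel's intensity over time follows; the periodic boundary handling, the threshold, and the function names are assumptions for illustration, not the detector from the talk.

```python
def swt_haar_1d(signal, levels):
    """Undecimated (a trous) Haar transform with periodic extension.
    Returns the final approximation and one full-length detail list per level."""
    n = len(signal)
    approx = [float(v) for v in signal]
    details = []
    for j in range(1, levels + 1):
        step = 2 ** (j - 1)  # filter taps are spread further apart at each level
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / 2 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2 for a, b in zip(approx, shifted)]
    return approx, details


def motion_detected(pixel_series, threshold, level=1):
    """Report motion if any detail coefficient at the given level exceeds the threshold.
    Note: periodic extension also compares the last frame with the first one."""
    _, details = swt_haar_1d(pixel_series, level)
    return any(abs(d) > threshold for d in details[level - 1])


series = [0, 0, 0, 0, 8, 8, 8, 8]  # intensity jumps when an object enters the pixel
_, details = swt_haar_1d(series, 2)
print(details[0])  # [0.0, 0.0, 0.0, -4.0, 0.0, 0.0, 0.0, 4.0]
print(motion_detected(series, threshold=1.0), motion_detected([5] * 8, threshold=1.0))  # True False
```

Because nothing is downsampled, shifting the input by one frame shifts every detail sequence by one frame, which keeps the detection time aligned with the frame index.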


Wavelet Transform for Video Surveillance: from Moving Object Detection to Action Recognition

The first International Conference on Advanced Computer Communication and Informatics (ACCI 2018), Ismailia, Egypt.

Dr. Mohammed Abdel-Megeed M. Salem

Faculty of Media Engineering & Technology, German University in Cairo

Tel.: +2 011 1727 1050

Email: [email protected]

Thank you

