Dr. Mohammed A.-Megeed Salem, Associate Professor
Faculty of Media Engineering Technology, German University in Cairo
Wavelet Transform for Video Surveillance: from Moving Object Detection to Action Recognition
Salem, Wavelet Transform for Video Surveillance & Robot Vision, ACCI 2018
Outline
• Multiresolution Analysis
• Wavelet Transform
• Resolution Mosaic Representation
• Video Surveillance
• Object Detection
• Action Recognition
Introduction
• The term multiresolution analysis goes back to the end of the 1970s. One of the first workshops, titled “Multiresolution Visual Computing and Analysis”, was held in Leesburg, VA, USA, in July 1982 [Ros84b].
• Multiresolution analysis of a signal produces successively coarser approximations of the original signal.
• A finer resolution shows more detail, while a coarser resolution shows only an approximation of the signal, in which only strong features can be detected.
• Signal: a composition of a smooth background and actions or details in the foreground.
• At a given resolution the signal is approximated by ignoring some details.
• To increase the resolution, finer details are added to the coarser approximation of the signal.
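The approximation/detail idea above can be illustrated with a minimal sketch. It assumes the Haar wavelet (the simplest choice; the talk does not say which wavelet is used) and NumPy; the function names and the sample signal are hypothetical.

```python
import numpy as np

def haar_analysis(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # coarser approximation
    detail = (even - odd) / np.sqrt(2.0)  # details ignored at the coarser level
    return approx, detail

def haar_synthesis(approx, detail):
    """Add the details back to the approximation to restore the finer level."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

# Hypothetical sample signal: smooth background with a small bump.
signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = haar_analysis(signal)
restored = haar_synthesis(approx, detail)
```

Dropping `detail` keeps only the coarse view of the signal; passing it back to `haar_synthesis` recovers the original samples.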
• Advantages
– Reducing the computational cost.
– Enabling selective information extraction.
• Local information may be better processed at the high-resolution levels.
• Global information may be processed at the low-resolution levels.
Multiresolution Representation: Approximation Spaces
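• A multiresolution analysis consists of a sequence of successive approximation spaces Vj, j ∈ Z.
[Equations 1–5: defining properties of the approximation spaces, including the scaling property below]
$f(t) \in V_j \iff f(2^{-j}t) \in V_0, \quad \forall j \in \mathbb{Z}$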
Multiresolution Representation: Orthogonal Complement Spaces
[Figure: decomposition diagram showing the spaces Vj, Vj+1, Wj+1, Vj+2 and Wj+2]
• Orthonormal basis functions are needed for projecting a signal onto Vj.
• Orthonormal basis functions are needed for projecting a signal onto Wj.
Approximations and Details
• One-dimensional analysis is based on one scaling function φ and one wavelet ψ.
• The integral of ψ is zero, and ψ is used to define the details.
• The integral of φ is 1, and φ is used to define the approximations.
Multiresolution Analysis
2-D Discrete Wavelet Transform: Approximation and Details
• The usual two-dimensional wavelets are defined as tensor products of one-dimensional wavelets.
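For reference, the standard separable construction can be written as below. The assignment of the labels H and V is an assumption that follows the averaging/differencing description used in this talk (H: averaging in x, differencing in y); some texts swap the two labels.

```latex
\begin{aligned}
\varphi(x, y)  &= \varphi(x)\,\varphi(y) && \text{(approximation, A)}\\
\psi^{H}(x, y) &= \varphi(x)\,\psi(y)    && \text{(H-detail)}\\
\psi^{V}(x, y) &= \psi(x)\,\varphi(y)    && \text{(V-detail)}\\
\psi^{D}(x, y) &= \psi(x)\,\psi(y)       && \text{(D-detail)}
\end{aligned}
```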
Wavelets in Image Processing: Approximation and Details
• H-detail is obtained by averaging in the x-dimension and differencing in the y-dimension.
• V-detail is obtained by averaging in the y-dimension and differencing in the x-dimension.
• D-detail is obtained by differencing in both the x- and y-dimensions.
• The three detail images can be combined to show the edges of the image.
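The averaging and differencing steps above can be sketched for one decomposition level. This is a minimal illustration assuming the Haar wavelet and NumPy; the function name, the test image, and the way the detail images are combined into an edge map (sum of absolute values) are assumptions, not taken from the talk.

```python
import numpy as np

def haar2d_level(image):
    """One-level 2-D Haar transform; returns the A, H, V and D subbands."""
    img = np.asarray(image, dtype=float)
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("image height and width must be even")
    s = np.sqrt(2.0)
    # Step 1: average and difference neighbouring columns (x-dimension).
    avg_x = (img[:, 0::2] + img[:, 1::2]) / s
    dif_x = (img[:, 0::2] - img[:, 1::2]) / s
    # Step 2: average and difference neighbouring rows (y-dimension).
    A = (avg_x[0::2, :] + avg_x[1::2, :]) / s  # averaging in x and y
    H = (avg_x[0::2, :] - avg_x[1::2, :]) / s  # averaging in x, differencing in y
    V = (dif_x[0::2, :] + dif_x[1::2, :]) / s  # differencing in x, averaging in y
    D = (dif_x[0::2, :] - dif_x[1::2, :]) / s  # differencing in x and y
    return A, H, V, D

# Hypothetical test image: a bright rectangle on a dark background.
image = np.zeros((8, 8))
image[3:7, 1:6] = 1.0
A, H, V, D = haar2d_level(image)

# One possible way to combine the three detail images into an edge map.
edges = np.abs(H) + np.abs(V) + np.abs(D)
```

Each subband has half the height and width of the input. Because the transform is decimated, an edge that falls exactly on a 2×2 block boundary produces no detail coefficient at this level.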
Multiresolution Analysis: Wavelets
• The 3D scaling and wavelet functions can each be expressed as a product of 3 one-dimensional functions.
• The analysis is carried out along the x-dimension, the y-dimension, and the z-dimension of the volumetric data.
• Eight subbands of coefficients result from a one-level analysis.
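A minimal sketch of the one-level 3D analysis, again assuming the Haar wavelet and NumPy. The helper names and the subband naming ("A" for average, "D" for difference, one letter per axis in axis order) are hypothetical.

```python
import numpy as np

def haar_split(data, axis):
    """Split data along one axis into Haar average and difference halves."""
    data = np.asarray(data, dtype=float)
    n = data.shape[axis]
    if n % 2:
        raise ValueError("length along the axis must be even")
    even = np.take(data, np.arange(0, n, 2), axis=axis)
    odd = np.take(data, np.arange(1, n, 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar3d_level(volume):
    """One-level 3-D Haar transform; returns a dict of eight subbands."""
    bands = {"": np.asarray(volume, dtype=float)}
    for axis in range(3):  # analyse along each of the three dimensions in turn
        split = {}
        for name, data in bands.items():
            avg, dif = haar_split(data, axis)
            split[name + "A"] = avg
            split[name + "D"] = dif
        bands = split
    return bands

# Hypothetical 4x4x4 volume.
volume = np.arange(64).reshape(4, 4, 4)
bands = haar3d_level(volume)
```

Each pass doubles the number of subbands, so three passes give 2 × 2 × 2 = 8 subbands, each half the size of the input along every dimension.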
Relevance-based Resolution Mosaic
• Assumptions to design a new algorithm:
– Spatial features of the pixels enhance the processing.
– The image consists of relevant as well as non-relevant parts.
• Gestaltists
– “A Gestalt or whole differs from the sum of its parts”
• We do not see a complex scene as a set of objects, but rather as objects and their relations.
[Figure: resolution mosaic in which the resolution level (very low, low, high) follows the information relevance]
Visual Surveillance
• Application Domains
– Environment Surveillance
– Traffic Surveillance
– People Surveillance
Video Surveillance
• Use multiresolution and resolution mosaic image representations.
• Apply 3D wavelet transforms.
• Utilize the time dimension for motion/action detection and analysis.
• Address the problems of Moving Object Detection and Action Recognition.
Moving Object Detection
[Figure: resolution level versus information relevance]
[Figure: processing pipeline]
• Input: image (original resolution) and map.
• Spatial transformation: 2D wavelet transform at different levels, giving the spatial resolution mosaic subbands A, H, V, D.
• Temporal transformation: 1D (temporal) wavelet transform applied to the temporal 1D arrays.
• Temporal-spatial subbands: AA, AD, HA, HD, VA, VD, DA, DD, labelled A, D4, D2, D6, D1, D5, D3, D7 respectively.
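The spatial-then-temporal pipeline can be sketched as follows. This is a simplified illustration, not the author's implementation: it assumes the Haar wavelet, a single decomposition level at uniform resolution (no resolution mosaic), and NumPy. The function names, the synthetic video, and the threshold on the AD subband are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def spatial_subbands(video):
    """2-D Haar transform of every frame; video has shape (frames, rows, cols)."""
    avg_x = (video[:, :, 0::2] + video[:, :, 1::2]) / SQRT2
    dif_x = (video[:, :, 0::2] - video[:, :, 1::2]) / SQRT2
    return {
        "A": (avg_x[:, 0::2] + avg_x[:, 1::2]) / SQRT2,
        "H": (avg_x[:, 0::2] - avg_x[:, 1::2]) / SQRT2,
        "V": (dif_x[:, 0::2] + dif_x[:, 1::2]) / SQRT2,
        "D": (dif_x[:, 0::2] - dif_x[:, 1::2]) / SQRT2,
    }

def temporal_spatial_subbands(video):
    """Apply a 1-D temporal Haar transform to each spatial subband."""
    video = np.asarray(video, dtype=float)
    out = {}
    for name, band in spatial_subbands(video).items():
        out[name + "A"] = (band[0::2] + band[1::2]) / SQRT2  # temporal average
        out[name + "D"] = (band[0::2] - band[1::2]) / SQRT2  # temporal detail
    return out

# Hypothetical video: one static object and one object moving right.
video = np.zeros((4, 8, 8))
video[:, 0:2, 0:2] = 1.0
for t in range(4):
    video[t, 4:6, t:t + 2] = 1.0

bands = temporal_spatial_subbands(video)
# Temporal detail of the approximation responds to change over time;
# the threshold value is an arbitrary assumption.
motion_mask = np.abs(bands["AD"]) > 0.1
```

In this sketch the static object produces no response in `AD`, while the moving object does, which is why the temporal detail subbands are a natural cue for moving object detection.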
Results
[Figures: extraction of ROI; extraction of active traffic area; smallest bounding boxes; manually segmented data]
Action vs Activity
• “Actions”:
– refer to simple motion patterns usually executed by a single person and typically lasting for short durations of time, on the order of tens of seconds.
• “Activity”:
– refers to the complex sequence of actions performed by several humans who could be interacting with each other in a constrained manner.
– Activities are typically characterized by much longer time durations, e.g., a football team scoring a goal, two persons shaking hands, or a coordinated bank attack by multiple robbers.
[Figure: scale from simple to complex: Gesture, Action, Activity, Event (Group Activity)]
[Figure: related terms: Change, Movement, Movement Pattern, Action, Verb, Episode, Activity, Event, History]
Challenges of Action Recognition
• Intra- and inter-class variations
– Large variations in performance (walking speed and stride length).
– Anthropometric differences between individuals.
• Environment and recording settings
– Cluttered and dynamic environments.
– Lighting conditions.
– Viewpoints.
– Dynamic backgrounds.
• Temporal variations
– Variation in the rate of performance of an action.
• Obtaining and labeling training data
– Limited number of training and testing sequences.
– Labeling of datasets.
Benchmark Datasets
• CAVIAR test case scenarios
• Weizmann
• UCF Sports action dataset
• Hollywood human action dataset (HoHa1)
• TRECVid
• VIRAT
Structure of Action Recognition System
• Action Segmentation
• Action Modeling and Representation
• Action Description
• Action Learning and Classification
Action Representation Techniques
Spatial or Temporal
• Spatial
– Image models
– Body models
– Spatial statistics
• Temporal
– Action grammars
– Action templates
– Temporal statistics
Global or Local
• Global
– Grid-based
– Space-time volumes
• Local
– Space-time interest points
– Local descriptors
– Local grid-based
Action Representations
• Spatial: Body models
– MLD (Moving Light Display) landmarks
– Cylindrical primitives
– Rectangular patches
– Stick figures
• Temporal Action Templates
– Space-time volumes
– Motion history volumes
Temporal-Spatial Action Representations
• Global: Image models
– Silhouettes
– Contours
– Motion Energy Images (MEI) and Motion History Images (MHI)
• Local: Spatial statistics
– Space-time interest points
– Spatio-temporal features
[Figures: silhouettes; contours; MEI and MHI]
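As a concrete instance of a global image model, the MEI and MHI listed above can be sketched with the usual update rule: a pixel is set to τ when motion is detected and otherwise decays by one per frame. The sketch assumes NumPy and uses simple thresholded frame differencing as the motion detector; the function name, parameters, and test sequence are hypothetical.

```python
import numpy as np

def motion_history(frames, tau, threshold):
    """Return (MEI, MHI) after processing all frames.

    frames: array of shape (num_frames, rows, cols)
    tau: history length in frames
    threshold: minimum absolute frame difference counted as motion
    """
    frames = np.asarray(frames, dtype=float)
    mhi = np.zeros(frames.shape[1:])
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > threshold  # binary motion image
        # Moving pixels are reset to tau; all others decay towards zero.
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    mei = mhi > 0  # where motion occurred within the last tau frames
    return mei, mhi

# Hypothetical sequence: a single bright pixel moving right along row 2.
frames = np.zeros((4, 5, 5))
for t in range(4):
    frames[t, 2, t] = 1.0

mei, mhi = motion_history(frames, tau=3, threshold=0.5)
# Row 2 of mhi is [1, 2, 3, 3, 0]: more recent motion has higher values.
```

The MEI records where motion happened, while the MHI additionally encodes how recently it happened.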
Wavelet-based Action Representation
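Directional wavelet energy images (multiscale temporal change):
• Horizontal: $\frac{\partial^{2}}{\partial y\,\partial t}\,\mathrm{Video}$
• Vertical: $\frac{\partial^{2}}{\partial x\,\partial t}\,\mathrm{Video}$
• Diagonal: $\frac{\partial^{3}}{\partial x\,\partial y\,\partial t}\,\mathrm{Video}$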
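One way to read the three directional terms is as detail subbands of a spatio-temporal wavelet transform whose squared coefficients are accumulated over time. The sketch below follows that reading under several assumptions that are not confirmed by the talk: a decimated Haar wavelet instead of the SWT, a single scale instead of a multiscale analysis, and "energy" defined as the sum of squared coefficients. NumPy is assumed and all names are hypothetical.

```python
import numpy as np

def directional_energy_images(video):
    """Accumulate squared spatio-temporal Haar details over time.

    video: array of shape (frames, rows, cols) with even sizes on every axis.
    """
    v = np.asarray(video, dtype=float)
    s = np.sqrt(2.0)
    # Temporal difference of consecutive frame pairs (discrete d/dt).
    dt = (v[0::2] - v[1::2]) / s
    # Spatial averages and differences along x (columns).
    avg_x = (dt[:, :, 0::2] + dt[:, :, 1::2]) / s
    dif_x = (dt[:, :, 0::2] - dt[:, :, 1::2]) / s
    # Combine with averages and differences along y (rows).
    details = {
        "horizontal": (avg_x[:, 0::2] - avg_x[:, 1::2]) / s,  # d/dy d/dt
        "vertical": (dif_x[:, 0::2] + dif_x[:, 1::2]) / s,    # d/dx d/dt
        "diagonal": (dif_x[:, 0::2] - dif_x[:, 1::2]) / s,    # d/dx d/dy d/dt
    }
    # Energy image: sum of squared coefficients over the time axis.
    return {name: np.sum(band ** 2, axis=0) for name, band in details.items()}

# Hypothetical video: a horizontal edge moving downwards one row per frame.
video = np.zeros((4, 8, 8))
for t in range(4):
    video[t, t + 1:, :] = 1.0

energy = directional_energy_images(video)
```

For this test sequence only the horizontal energy image is non-zero, since the moving structure varies along y and is constant along x.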
Results: Directional Wavelet Energy Images Using the Weizmann Dataset
Action Recognition
[Flowchart]
• Offline training: 3D SWT action detector, learning action patterns, action database.
• Detection and recognition (online): 3D SWT motion detector, decision "Motion detected?" (yes/no), action label.
Wavelet Transform for Video Surveillance: from Moving Object Detection to Action Recognition
The first International Conference on Advanced Computer Communication and Informatics (ACCI 2018), Ismailia, Egypt.
Dr. Mohammed Abdel-Megeed M. Salem
Faculty of Media Engineering & Technology, German University in Cairo
Tel.: +2 011 1727 1050
Email: [email protected]
Thank you