Video Synopsis
Yael Pritch Alex Rav-Acha Shmuel Peleg
The Hebrew University of Jerusalem
Detective Series: “Elementary”
Video Surveillance Problem
• It took weeks to find these events in video archives.
• The cost of lost information or a delay may be very high.
Terrorists, London tube, 7-7-05
Cologne Train Bombs, 31-7-06
Challenges in Video Surveillance
• Millions of surveillance cameras are installed, capturing data 24/365
• The number of cameras and their resolution increase rapidly
• Not enough people to watch captured data
• Human Attention is Lost after ~20 Minutes
• Result: Recorded Video is Lost Video
– Less than 1% of surveillance video is examined
Handling Surveillance Video
• Object Detection and Tracking
– Background Subtraction
• Object Recognition
– Individual people
• Activity Recognition
– Left luggage; Fight
• A lot of progress has been made. More work remains.
• Object Detection and Tracking
– Background Subtraction (Assume Done)
• Object Recognition (Do not use)
– Individual people
• Activity Recognition (Do not use)
– Left luggage; Fight
• A lot of progress has been made. More work remains.
• Let People do the Recognition
Handling Surveillance Video
Video Synopsis
Original video
• A fast way to browse & index video archives.
• Summarize a full day of video in a few minutes.
• Events from different times appear simultaneously.
• Human inspection of synopsis!!!
Synopsis of Surveillance Videos
Human Inspection of Search Results
• Serve queries regarding each camera:
– Generate a 3-minute video showing most activities in the last 24 hours
– Generate the shortest video showing all activities in the last 24 hours
• Each presented activity points back to its original time in the original video
• Orthogonal to Video Analytics
Non-Chronological Time
Dynamic Mosaicing Video Synopsis
Salvador Dali
The Hebrew University of Jerusalem
Dynamic Mosaics
Non Chronological Time
Handheld Stereo Mosaic
[Figure: mosaic image assembled from strips of the original frames (frame t_k to frame t_l, strip from u_a to u_b); axes u, t]
[Figure: space-time slices and visibility region, from first slice to last slice; axes u, t]
Creating Dynamic Panoramic Movies
First Mosaic - Appearance
Last Mosaic - Disappearance
Dynamic Panorama: Iguazu Falls
From Video In to Video Out
Constructing an aligned Space-Time Volume
Alignment: Parallax, Dynamic Scenes, etc.
[Figure: frames k and k+1 placed in the space-time volume for a stationary camera and for a panning camera; axes u, v, t]
Aligned ST Volume: View from Top
Generate Output Video
Sweeping a “Time Front” surface
Time is not chronological any more
Interpolation
[Figure: evolving time front in the space-time volume; axes u, x, t]
Mapping each TF to a new frame using spatio-temporal interpolation
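A minimal sketch of the time-front idea above, assuming the aligned space-time volume is reduced to one image row per frame (`volume[t][u]`) and that a time front is a list giving the time sampled at each column. The function names and the linear blend between the two nearest frames are illustrative assumptions, not the authors' implementation.

```python
def render_time_front(volume, time_front):
    """Build one output row from an aligned space-time volume.

    volume[t][u] is the pixel value at time t and column u.
    time_front[u] is the (possibly fractional) time sampled at column u.
    """
    last = len(volume) - 1
    frame = []
    for u, t in enumerate(time_front):
        t = min(max(t, 0.0), float(last))  # clamp to the recorded range
        t0 = int(t)                        # frame just before the time front
        t1 = min(t0 + 1, last)             # frame just after it
        w = t - t0                         # temporal interpolation weight
        frame.append((1.0 - w) * volume[t0][u] + w * volume[t1][u])
    return frame


def sweep(volume, first_front, last_front, n_frames):
    """Move the time front linearly from first_front to last_front."""
    frames = []
    for k in range(n_frames):
        a = k / (n_frames - 1) if n_frames > 1 else 0.0
        front = [(1.0 - a) * f + a * l for f, l in zip(first_front, last_front)]
        frames.append(render_time_front(volume, front))
    return frames
```

A flat front (the same time at every column) reproduces an original frame; a slanted front mixes different times in one output frame, which is why time in the output is no longer chronological.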
Example: Demolition
Example: Racing
Dynamic Panorama: Thessaloniki
Creating Panorama: 4D min-cut
Aligned space-time volume
Mosaic Stitching Examples
Video Synopsis and Indexing
Making a Long Video Short
Explosive growth in cameras…
• 11 million cameras in 2008
• Expected 30 million in 2013
• Recording 24 hours a day, every day
[Chart: camera counts by year, 2009 to 2014; labels 11m, 24m, 31]
Handling the Video Overflow
• Not enough people to watch captured data
• Guards are watching 1% of video
• Automatic Video Analytics covers less than 5%
– Only when events can be accurately defined & detected
• Most video is never watched or examined!!!
A Recent Example
Related Work (Video Summary)
• Key frames
C. Kim and J. Hwang. An integrated scheme for object-based video abstraction. In ACM Multimedia, pages 303–311, New York, 2000.
• Collection of short video sequences
A. M. Smith and T. Kanade. Video skimming and characterization through the combination of image and language understanding. In CAIVD, pages 61–70, 1998.
• Adaptive Fast Forward
N. Petrovic, N. Jojic, and T. Huang. Adaptive video fast forward. Multimedia Tools and Applications, 26(3):327–344, August 2005.
Entire frames are used as the fundamental building blocks
• Mosaic images together with some meta-data for video indexing
M. Irani, P. Anandan, J. Bergen, R. Kumar, and S. Hsu. Efficient representations of video sequences and their applications. Signal Processing: Image Communication, 8(4):327–351, 1996.
• Space Time Video montage
H. Kang, Y. Matsushita, X. Tang, and X. Chen. Space-time video montage. In CVPR’06, pages 1331–1338, New-York, June 2006.
Object Based Video Summary
• We proposed an object/event-based summary, as opposed to a frame-based summary
– Enables shortening a very long video into a short one
– No fast forward of objects (preserves dynamics)
– Causality is not necessarily kept
Original video: 24 hours Video Synopsis: 1 minute
Video Synopsis
• Browse Hours in Minutes
• Index back to Original Video
Video Synopsis: Shift Objects in Time
Input Video I(x,y,t)
Synopsis Video S(x,y,t)
Objects Extracted to Database
[Figure: objects time-stamped 09:03, 10:00, 11:08, 14:38, 18:45 and 21:50]
How does Video Synopsis work?
Original: 9 hours
Video Synopsis: 30 seconds
Steps in Video Synopsis
• Detect and track objects, store in database.
• Select relevant objects from database
• Display selected objects in a very short “Video Synopsis”
• In “Video Synopsis”, objects from different times can appear simultaneously
• Index from selected objects into original video
• Cluster similar objects
Object “Packing”
• Compute object trajectories
• Pack objects in shorter time (minimize overlap)
• Overlay objects on top of time-lapse background
[Figure: object trajectories in the input video and in the synopsis video; axes x, t]
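The packing step can be illustrated with a greedy first-fit sketch. It assumes each tube is summarized by a duration in frames and a fixed horizontal extent (`x`, `width`); real tubes move, so this is a deliberate simplification, and the dictionary keys are hypothetical names chosen for the example.

```python
def overlaps(a_start, a_len, b_start, b_len):
    """True if the half-open intervals [start, start + len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def pack_tubes(tubes):
    """Greedy first-fit packing of tubes into a short synopsis.

    tubes: list of dicts with 'duration' (frames), 'x' and 'width' (pixels).
    Returns the synopsis start frame chosen for each tube, in input order.
    """
    placed = []  # (start, duration, x, width) of tubes already packed
    starts = []
    for tube in tubes:
        start = 0
        while any(
            overlaps(start, tube["duration"], s, d)
            and overlaps(tube["x"], tube["width"], x, w)
            for s, d, x, w in placed
        ):
            start += 1  # slide later until the tube is collision-free
        placed.append((start, tube["duration"], tube["x"], tube["width"]))
        starts.append(start)
    return starts
```

Tubes that occupy different parts of the image can all start at frame 0, which is how objects from different times end up on screen together.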
Example: Monitoring a Coffee Station
[Figure: original movie and stroboscopic movie; axes x, t]
Panoramic Synopsis
Panoramic synopsis is possible when the camera is rotating.
Original
Panoramic Video Synopsis
Endless video – Challenges
• Endless video – finite storage (“forget” events)
• Background changes during long time periods
• Stitching object on a background from a different time
• Fast response to user queries
Online Monitoring
2 Phase approach
• Online Monitoring (real time)
– Compute background (background model)
– Find Activity Tubes and insert to database
– Handle a queue of objects
• Query Service
– Collect tubes with desired properties (time…)
– Generate Time Lapse Background
– Pack tubes into desired length of synopsis
– Stitching of objects to background
Extract Tubes
Object Detection and Tracking
• We used a simplification of Background-Cut*
– combining background subtraction with min-cut
• Connect space-time tube components
• Morphological operations
* J. Sun, W. Zhang, X. Tang, and H. Shum. Background cut. In ECCV, pages 628–641, 2006
Extract Tubes
The Object Queue
• Limited Storage Space with Endless Video
– May need to discard objects
• Estimate object usefulness for future queries
– “Importance” (application dependent)
– Collision Potential
– Age: discard older objects first
• Take mistakes into account….
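One way the queue could rank objects for discarding is sketched below. The linear score, its weights, and the choice to treat a high collision potential as lowering usefulness are all assumptions made for illustration; the slide only names the three criteria.

```python
def usefulness(obj, now, w_importance=1.0, w_collision=1.0, w_age=1.0):
    """Hypothetical usefulness score: higher means keep longer."""
    age = now - obj["time"]
    return (
        w_importance * obj["importance"]
        - w_collision * obj["collision"]
        - w_age * age
    )


def trim_queue(queue, capacity, now, **weights):
    """Keep the `capacity` most useful objects and discard the rest."""
    ranked = sorted(queue, key=lambda o: usefulness(o, now, **weights), reverse=True)
    return ranked[:capacity]
```

With equal importance and collision potential, the oldest objects are discarded first, matching the "Age" rule on the slide.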
Query Service
2 Phase approach
• Online Monitoring (real time)
– Pre-Processing: remove stationary frames
– Compute background (temporal median)
– Find Activity Tubes and insert to database
– Handle a queue of objects
• Query Service
– Collect tubes with desired properties (time…)
– Generate Time Lapse Background
– Pack tubes into desired length of synopsis
– Stitching of objects to background
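The temporal-median background mentioned in the online phase can be sketched as follows, assuming small grayscale frames stored as nested lists. The fixed threshold in `foreground_mask` is a stand-in chosen for the example; the slides describe a min-cut based segmentation rather than plain thresholding.

```python
from statistics import median


def temporal_median_background(frames):
    """Per-pixel median over time; frames is a list of equally sized 2D lists."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(f[y][x] for f in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold=25):
    """Mark pixels that differ from the background by more than `threshold`."""
    return [
        [abs(p - b) > threshold for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]
```

The median ignores objects that cover a pixel for less than half of the frames, so briefly passing objects do not leak into the background.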
Time-Lapse Background
• Time Lapse background goals
– Represent background changes over time
– Represent the background of activity tubes
Activity distribution over time (parking lot 24 hours)
20% night frames
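One possible way to balance the two goals is to pick background frames from a mixture of a uniform-in-time distribution and the activity distribution. The sketch below is an assumption about how such a selection could work, not the method from the talk; `mix` and the function name are hypothetical.

```python
def select_background_times(activity, n_select, mix=0.5):
    """Pick n_select time bins for the time-lapse background.

    activity[t] is the amount of activity in time bin t.
    mix = 0 samples uniformly in time; mix = 1 follows activity only.
    """
    n = len(activity)
    total = sum(activity)
    weights = [
        mix * (a / total if total else 1.0 / n) + (1.0 - mix) / n
        for a in activity
    ]
    picks, cumulative, k = [], 0.0, 0
    for t, w in enumerate(weights):
        cumulative += w
        # Emit a pick each time the cumulative mass passes a quantile center.
        while k < n_select and cumulative >= (k + 0.5) / n_select - 1e-12:
            picks.append(t)
            k += 1
    return picks
```

With a busy day and a quiet night, most selected background frames come from the day, while the uniform part of the mixture still keeps some night frames.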
Tubes Selection
Guidelines for the tube arrangement:
• Maximum “activity” in synopsis
• Minimum collision between objects
• Preserve causality (temporal consistency)
This defines an energy minimization process:
A time mapping between the input tubes and their appearance time in the output synopsis
Energy Minimization Problem
$$E(M) = \sum_{b \in B} E_a(\hat{b}) + \sum_{b, b' \in B} \left( E_t(\hat{b}, \hat{b}') + E_c(\hat{b}, \hat{b}') \right)$$
• E_a: Activity Cost (favors synopsis video with maximal activity)
• E_t: Temporal consistency Cost (favors synopsis video that preserves original order of events)
• E_c: Collision Cost (favors synopsis video with minimal collision between tubes)
b̂ - the mapping (time shift) of tube b into the synopsis
B - activity tubes
M - temporal mapping from the input to the synopsis
Tubes Selection as Energy Minimization
• Each state – temporal mapping of tubes into the synopsis
• Neighboring states - states in which a single activity tube changes its mapping into the synopsis.
• Initial state - all tubes are shifted to the beginning of the synopsis video.
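The state space described above can be explored with a simple local search. The sketch below uses plain greedy descent rather than any particular optimizer, and the three cost terms are simplified stand-ins (frames pushed outside the synopsis, frames of space-time overlap, a unit penalty for reversed order); tube fields such as `t_in`, `x` and `width` are hypothetical.

```python
import random


def energy(shifts, tubes, synopsis_len):
    """Simplified stand-in for E(M): activity + collision + temporal consistency."""
    cost = 0.0
    for i, a in enumerate(tubes):
        sa = shifts[i]
        # Activity cost: frames of the tube that fall outside the synopsis.
        cost += max(0, sa + a["duration"] - synopsis_len)
        for j in range(i + 1, len(tubes)):
            b, sb = tubes[j], shifts[j]
            # Collision cost: frames where both tubes are visible, counted
            # only if their horizontal extents intersect.
            t_overlap = min(sa + a["duration"], sb + b["duration"]) - max(sa, sb)
            x_overlap = min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"])
            if t_overlap > 0 and x_overlap > 0:
                cost += t_overlap
            # Temporal consistency cost: penalize reversing the original order.
            if (a["t_in"] - b["t_in"]) * (sa - sb) < 0:
                cost += 1.0
    return cost


def minimize(tubes, synopsis_len, iters=2000, seed=0):
    """Greedy descent over single-tube moves."""
    rng = random.Random(seed)
    shifts = [0] * len(tubes)  # initial state: all tubes at the start
    best = energy(shifts, tubes, synopsis_len)
    for _ in range(iters):
        i = rng.randrange(len(tubes))  # neighboring state: remap one tube
        old = shifts[i]
        shifts[i] = rng.randrange(synopsis_len)
        e = energy(shifts, tubes, synopsis_len)
        if e <= best:
            best = e  # keep the move
        else:
            shifts[i] = old  # reject the move
    return shifts, best
```

Greedy descent can stop in a local minimum; an annealing-style acceptance rule would be a natural replacement for the `if e <= best` test.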
Stitching the Synopsis
• Challenge : Different lighting for objects and background
• Assumption : Extracted tubes are surrounded with background pixels
• Our Stitching method: Modification of Poisson Editing – add weight for object to keep original color
Stitching the Synopsis
• Challenge : objects stitched on time lapse background with possibly different lighting condition (for example : day / night)
• Assumption : no accurate segmentation. Tubes are extracted surrounded with background pixels
• Our Stitching method: modification of Poisson editing – add weight for object to keep original color
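A one-dimensional sketch of what "Poisson editing with an added color weight" could look like. It minimizes the squared difference between the gradients of the result and of the extracted patch, plus `weight` times the squared difference between their colors, with the two end pixels fixed to the target background. The exact energy used in the talk is not given on the slide, so this formulation, the Gauss-Seidel solver and all names are assumptions.

```python
def stitch_1d(patch, bg_left, bg_right, weight=0.1, iters=500):
    """Blend a 1D patch into a background with a color-preserving weight.

    patch: pixel values of the extracted tube row, including its surrounding
    background pixels. bg_left and bg_right: target background values at the
    two ends. weight = 0 gives plain Poisson blending; a larger weight keeps
    the result closer to the original patch colors.
    """
    n = len(patch)
    f = [float(p) for p in patch]
    f[0], f[-1] = float(bg_left), float(bg_right)  # boundary condition
    for _ in range(iters):
        for i in range(1, n - 1):
            # Discrete Laplacian of the source patch (guidance field).
            lap = 2 * patch[i] - patch[i - 1] - patch[i + 1]
            f[i] = (f[i - 1] + f[i + 1] + lap + weight * patch[i]) / (2.0 + weight)
    return f
```

For a bright object extracted in daylight and pasted onto a night background, `weight=0` darkens the object by the full background difference, while a positive weight pulls it back toward its original color.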
Webcam in Parking Lot
Typical Webcam Stream (24 hours)
Webcam Synopsis: 20 Seconds
Video Indexing
Webcam Synopsis: 20 Seconds
Link from the synopsis back to the original video context
Synopsis can be used for video indexing
Hotspot on Tracked Objects
Who soiled my lawn?
Unexpected Applications
2 hours 20 seconds
Examples
Video Synopsis Should be More Organized
Clustered Synopsis
Faster and more accurate browsing
cars people
Example: Cluster into 2 clusters based on shape
Continue Examining the ‘Car’ cluster
Clustering by Motion of ‘Cars’ Class
Synopsis now useful in crowded scenes
Clusters: Exit, Enter, Up Hill, Right
Appearance (Shape) Distance Between Objects
Symmetric Average Nearest Neighbor distance between SIFT descriptors
$$Sd_{ij} = \frac{1}{2N} \left( \sum_k \left| S_k^i - \hat{S}_k^j \right| + \sum_k \left| S_k^j - \hat{S}_k^i \right| \right)$$
S_k^i - the k-th SIFT descriptor in tube i
Ŝ_k^j - the SIFT descriptor in tube j closest to S_k^i
O. Boiman, E. Shechtman and M. Irani. In Defense of Nearest-Neighbor Based Image Classification. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2008.
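The symmetric average nearest-neighbor distance can be sketched directly, assuming each tube is represented by a list of descriptor vectors (for example, 128-dimensional SIFT descriptors computed elsewhere). The brute-force search and the function names are choices made for this example.

```python
import math


def nearest(descriptor, candidates):
    """Euclidean distance from `descriptor` to its closest candidate."""
    return min(math.dist(descriptor, c) for c in candidates)


def appearance_distance(desc_i, desc_j):
    """Symmetric average nearest-neighbor distance between two descriptor sets."""
    i_to_j = sum(nearest(d, desc_j) for d in desc_i) / len(desc_i)
    j_to_i = sum(nearest(d, desc_i) for d in desc_j) / len(desc_j)
    return 0.5 * (i_to_j + j_to_i)
```

When both tubes hold the same number N of descriptors, this equals the 1/(2N) form of the two sums. For large descriptor sets the inner search would normally be replaced by an approximate nearest-neighbor index.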
Spectral Clustering by Appearance
Cluster 1 Cluster 2
Cluster 3 Cluster 4
• More Classes : Easy to Remove False Alarm Classes
Gate Trees
Spectral Clustering by Appearance
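Object Distance: Motion
Trajectory Similarity
– Computing minimum area between trajectories over all temporal shifts
– Efficient computation using NN and KD trees
$$Md_{ij}(k) = \frac{w(k)}{T_{ij}(k)} \, Sep_{ij}(k)$$
$$Sep_{ij}(k) = \sum_{t \in T_{ij}(k)} \left[ \left( x_t^i - x_{t+k}^j \right)^2 + \left( y_t^i - y_{t+k}^j \right)^2 \right]$$
w(k) - weight encouraging long temporal overlap
T_ij(k) - common time of tubes i and j for temporal shift k
Sep_ij(k) - space-time trajectory distance
[Figure: two trajectories in the x-t plane, separated by a temporal shift k]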
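A brute-force sketch of the trajectory distance: for every temporal shift k it sums the squared position differences over the common time, normalizes by the overlap length, applies a weight that favors long overlaps, and keeps the minimum. The specific weight `min(len_i, len_j) / overlap` is an assumption, since the slide only says the weight encourages long temporal overlap, and the KD-tree acceleration mentioned on the slide is not reproduced here.

```python
def motion_distance(traj_i, traj_j, min_overlap=1):
    """Minimum weighted separation between two trajectories over all shifts.

    traj_i, traj_j: lists of (x, y) positions, one per frame.
    """
    len_i, len_j = len(traj_i), len(traj_j)
    best = float("inf")
    for k in range(-(len_i - 1), len_j):  # every shift with some overlap
        t_lo, t_hi = max(0, -k), min(len_i, len_j - k)
        overlap = t_hi - t_lo  # length of the common time
        if overlap < min_overlap:
            continue
        sep = sum(
            (traj_i[t][0] - traj_j[t + k][0]) ** 2
            + (traj_i[t][1] - traj_j[t + k][1]) ** 2
            for t in range(t_lo, t_hi)
        )
        w = min(len_i, len_j) / overlap  # assumed overlap weight
        best = min(best, w * sep / overlap)
    return best
```

Two objects that follow the same path at different times get a distance of zero, which is what allows cars on the same road to be grouped regardless of when they passed.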
Spectral Clustering by Motion
‘Cars’ Class
Clusters: Exit, Enter, Up Hill, Right
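A minimal two-way spectral split from a pairwise distance matrix (appearance or motion), written without external libraries. It turns distances into Gaussian affinities and approximates the second eigenvector of the random-walk matrix by power iteration; the sign of each entry gives the cluster. The Gaussian kernel, `sigma`, and the restriction to two clusters are assumptions made for this sketch.

```python
import math


def spectral_bipartition(dist, sigma=1.0, iters=200):
    """Split items into two clusters given a symmetric distance matrix."""
    n = len(dist)
    # Gaussian affinity from pairwise distances.
    w = [
        [math.exp(-(dist[i][j] ** 2) / (2.0 * sigma ** 2)) for j in range(n)]
        for i in range(n)
    ]
    deg = [sum(row) for row in w]
    total = sum(deg)
    v = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        # Project out the trivial constant eigenvector (degree-weighted mean).
        mean = sum(d * x for d, x in zip(deg, v)) / total
        v = [x - mean for x in v]
        # One step of the lazy random walk: v <- (v + D^-1 W v) / 2.
        v = [
            0.5 * (v[i] + sum(w[i][j] * v[j] for j in range(n)) / deg[i])
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    mean = sum(d * x for d, x in zip(deg, v)) / total
    return [0 if x - mean < 0 else 1 for x in v]
```

Applying the split recursively, first on appearance and then on motion within a class, mirrors the "cars, then cars by direction" browsing shown in the examples.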
Creating Video Synopsis
• Goals
– Video Synopsis Having Shortest Duration
– Minimal Collision Between Objects
• Approach
– Displaying clustered objects together
– Objects packed in space-time like sardines
Packing Cost Example
• Packing cars on the top road
Affinity Matrix after Clustering
Arranged Cluster 1 Arranged Cluster 2
Combining Two Clusters
Low Collision Cost Between Classes
High Collision Cost Between Classes
An Important Application: Display Results of Video Analytics
• Display the hundreds of “Blue Cars”
• Display thousands of people going left
• Good for verification of algorithm as well as for deployment
Two Clusters
Cars
People
Camera in St. Petersburg
• Detect specific events
• Discover activity patterns
Cars
People
Two Clusters
Camera in China
Automatically Generated Clusters
Using Only Shape & Motion
Clusters: People Left, People Right, Cars Left, Cars Right, Cars Parking, People Misc.