Sequence-to-Sequence Alignment and Applications. Video > collection of image frames
Sequence-to-Sequence Alignment
[work with Yaron Caspi]
[Figure: Video 1 and Video 2 shown as Sequence 1 and Sequence 2, each a stack of image frames Frame 1, Frame 2, Frame 3, ..., Frame n]
(a) Find temporal correspondences
(b) Find spatial correspondences
[Figure: a space-time point (x,y,t) in one sequence corresponds to a point (x’,y’,t’) in the other; axes x, y, t]
Align and Integrate Space-Time Info [work with Yaron Caspi]
“Super Sensors” Exceed Optical Bounds of Visual Sensors:
• Spatial resolution
• Temporal resolution
• Spectral range
• Depth of focus
• Dynamic range
• Field-of-View (FOV)
• View point
Align and Integrate space-time info
Information in Video:
• Appearance info (within frames)
• Dynamic info (between frames)
Alignment uniquely defined
• Moving objects
• Non-rigid motion
• Varying illumination
Problem Formulation
$S_1(x,y,t) = S_2(x+u,\; y+v,\; t+w)$
Where:
$u = u(x,y,t;\,P_{\text{spatial}})$
$v = v(x,y,t;\,P_{\text{spatial}})$
$w = w(x,y,t;\,P_{\text{temporal}})$
$P_{\text{spatial}}$: homography (2D projective); $P_{\text{temporal}}$: 1D affine
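A minimal Python sketch of this parametric model, assuming the spatial part acts on homogeneous pixel coordinates through a 3×3 homography `H` and the temporal part is $t' = a\,t + b$ (frame-rate ratio `a`, time offset `b`). The helper `map_point` and its parameter names are hypothetical, for illustration only; the displacements in the formulation are then $u = x' - x$, $v = y' - y$, $w = t' - t$.

```python
import numpy as np


def map_point(x, y, t, H, a, b):
    """Hypothetical helper: map (x, y, t) in sequence 1 to (x', y', t') in sequence 2.

    Spatial part: 2D projective transformation (3x3 homography H).
    Temporal part: 1D affine transformation t' = a * t + b.
    """
    p = H @ np.array([x, y, 1.0])      # homogeneous image coordinates
    return p[0] / p[2], p[1] / p[2], a * t + b


# Example: image translation by (10, -3), 2x frame-rate ratio, half-frame offset.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])
x2, y2, t2 = map_point(4, 6, 10, H, a=2.0, b=0.5)
u, v, w = x2 - 4, y2 - 6, t2 - 10       # displacement (u, v, w) at this point
print(x2, y2, t2)  # 14.0 3.0 20.5
```

A general homography also changes the third homogeneous coordinate, which is why the sketch divides by `p[2]` rather than assuming it stays 1.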
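Spatio-Temporal Alignment
SSD Minimization:
$Err(P) = \sum_{x,y,t} \big(S_2(x+u,\, y+v,\, t+w) - S_1(x,y,t)\big)^2$
$Err(P) \approx \sum_{x,y,t} \big(S_2 - S_1 + \nabla S^{T} [u\;\; v\;\; w]^{T}\big)^2$, where $\nabla S = [S_x\;\; S_y\;\; S_t]^{T}$ is the space-time gradient of $S_2$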
Gauss-Newton (coarse-to-fine) iterations
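The Gauss-Newton step can be sketched for the simplest special case, a single global space-time translation $(u, v, w)$, instead of the homography plus 1D affine model. This is a minimal single-level sketch, assuming a smooth, periodic volume so that warping can be done exactly with an FFT phase shift; the helpers `fourier_shift` and `gauss_newton_shift` are hypothetical, and a practical version would use interpolation and run inside a coarse-to-fine pyramid.

```python
import numpy as np


def fourier_shift(vol, p):
    """Hypothetical helper: sample vol at (x + p[0], y + p[1], t + p[2]).

    Assumes a periodic, band-limited volume, for which an FFT phase shift is an exact warp.
    """
    phase = np.zeros(vol.shape)
    for axis, (n, shift) in enumerate(zip(vol.shape, p)):
        k = np.fft.fftfreq(n)                  # frequencies in cycles per sample
        shape = [1] * vol.ndim
        shape[axis] = n
        phase = phase + k.reshape(shape) * shift
    return np.fft.ifftn(np.fft.fftn(vol) * np.exp(2j * np.pi * phase)).real


def gauss_newton_shift(S1, S2, iters=20):
    """Hypothetical helper: estimate a global (u, v, w) with S1(x,y,t) = S2(x+u, y+v, t+w)."""
    p = np.zeros(3)
    for _ in range(iters):
        S2w = fourier_shift(S2, p)                     # S2 warped by the current estimate
        err = (S2w - S1).ravel()                       # residual S2 - S1
        # Columns: space-time gradient [S_x, S_y, S_t] of the warped sequence
        G = np.stack([g.ravel() for g in np.gradient(S2w)], axis=1)
        # Linearized SSD: minimize ||err + G @ delta||^2 over delta
        delta, *_ = np.linalg.lstsq(G, -err, rcond=None)
        p = p + delta
    return p


# Synthetic smooth, periodic volume indexed as [x, y, t]
N = 32
X, Y, T = np.meshgrid(np.arange(N), np.arange(N), np.arange(N), indexing="ij")
S2 = (np.sin(2 * np.pi * X / N) + np.cos(2 * np.pi * 2 * Y / N)
      + np.sin(2 * np.pi * (X + 3 * T) / N))
p_true = np.array([1.3, -0.7, 0.4])
S1 = fourier_shift(S2, p_true)
p_est = gauss_newton_shift(S1, S2)
print(np.round(p_est, 3))  # approximately [1.3, -0.7, 0.4]
```

At the true shift the residual vanishes, so it stays a fixed point of the iteration even though `np.gradient` only approximates the derivatives; the approximation affects convergence speed, not the answer.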
Coarse-to-Fine Minimization
[Figure: space-time pyramids of Sequence 1 and Sequence 2; each 256×256×100 (x × y × time) sequence is reduced to 128×128×50 and then 64×64×25 at coarser levels]
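The pyramids in this figure halve x, y and time at every level (256×256×100, then 128×128×50, then 64×64×25). A minimal sketch of such a space-time pyramid, assuming plain 2×2×2 block averaging as a simple stand-in for Gaussian blurring followed by subsampling; `reduce_volume` and `space_time_pyramid` are hypothetical names.

```python
import numpy as np


def reduce_volume(vol):
    """Hypothetical helper: halve x, y and t by averaging non-overlapping 2x2x2 blocks.

    A simple stand-in for Gaussian blur + subsampling; an odd trailing
    row, column, or frame is dropped.
    """
    nx, ny, nt = (n // 2 * 2 for n in vol.shape)
    v = vol[:nx, :ny, :nt]
    return v.reshape(nx // 2, 2, ny // 2, 2, nt // 2, 2).mean(axis=(1, 3, 5))


def space_time_pyramid(vol, levels):
    """Hypothetical helper: return [finest, ..., coarsest] levels of a volume indexed [x, y, t]."""
    pyramid = [vol]
    for _ in range(levels - 1):
        pyramid.append(reduce_volume(pyramid[-1]))
    return pyramid


seq = np.zeros((256, 256, 100), dtype=np.float32)      # x, y, time
print([level.shape for level in space_time_pyramid(seq, 3)])
# [(256, 256, 100), (128, 128, 50), (64, 64, 25)]
```

A coarse-to-fine solver visits the list in reverse: it estimates the parameters on the coarsest level and uses them (rescaled) to initialize the next finer level.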
Increasing Space-Time Resolution in Video [work with Eli Shechtman & Yaron Caspi]
Super-resolution in space and in time.
[Figure: low-resolution input sequences combined into a high-resolution output sequence; axes x, y, time]
Spatial Super-Resolution
Multiple low-resolution input images:
High-resolution output image:
Recover small details
What is Super-Resolution in Time?
Recover dynamic events that are “faster” than frame-rate (Generate a “high-speed” camera)
• Application areas: sports events, scientific imaging, etc.
• Effects of “fast” events imaged by “slow cameras”:
(1) Motion aliasing (2) Motion blur
(1) Motion Aliasing
The “wagon wheel” effect:
Slow motion:
[Figure: a continuous signal over time; the same signal sub-sampled in time; the sub-sampled signal played in “slow motion”]
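The wagon-wheel effect is a small piece of arithmetic. A sketch, assuming a wheel with `n_spokes` identical spokes (so it looks the same after a rotation of $360°/n$) and an observer who perceives the smallest per-frame rotation consistent with the frames; `apparent_rotation` is a hypothetical name.

```python
def apparent_rotation(rev_per_sec, fps, n_spokes):
    """Hypothetical helper: apparent rotation rate (rev/s) of a spoked wheel filmed at fps."""
    symmetry = 360.0 / n_spokes               # wheel looks identical after this angle
    step = 360.0 * rev_per_sec / fps          # true rotation between frames (degrees)
    # Assumed perception model: the smallest equivalent rotation, wrapped into
    # [-symmetry/2, symmetry/2), is what the viewer sees between frames.
    apparent_step = (step + symmetry / 2) % symmetry - symmetry / 2
    return apparent_step * fps / 360.0


# A wheel with 8 spokes turning forward at 3 rev/s:
print(round(apparent_rotation(3, 25, 8), 6))   # -0.125 (slowly backwards at 25 frames/sec)
print(round(apparent_rotation(3, 200, 8), 6))  # 3.0 (correct at 200 frames/sec)
```

Raising the frame rate, which is what super-resolution in time aims to emulate, removes the aliasing in this example.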
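Space-Time Super-Resolution
Low-resolution input sequences: $S^l_1, \ldots, S^l_n$
High-resolution space-time volume: $S^h(x_h, y_h, t_h)$
Blur kernel: spatial PSF × temporal exposure time
Each low-resolution value is a blurred sample of the high-resolution volume, which gives a linear system $A\,\vec{h} = \vec{l}$ in the unknown high-resolution values $\vec{h}$, given the measured low-resolution values $\vec{l}$.
[Figure: x, y, t axes of the low-resolution input sequences and of the high-resolution space-time volume]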
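A one-dimensional (time-only) sketch of the linear system $A\,\vec{h} = \vec{l}$, assuming each slow sequence samples every `rate` high-res frames at its own sub-frame offset, and each sample averages the signal over a box-shaped exposure window (the temporal blur). The names `build_system`, `rate`, `offsets` and `exposure` are hypothetical; the assumption that exposure windows are clipped at frame 0 is added only so that this toy system is exactly solvable.

```python
import numpy as np


def build_system(n_fine, rate, offsets, exposure):
    """Hypothetical helper: one row of A per low-resolution temporal sample.

    Each low-res sequence samples every `rate` high-res frames, starting at its
    sub-frame `offset`. A sample taken at high-res time t averages the unknown
    signal over the exposure window [t - exposure + 1, t].
    Assumption: windows are clipped at frame 0, i.e. recording starts there.
    """
    rows = []
    for offset in offsets:
        for t in range(offset, n_fine, rate):
            start = max(0, t - exposure + 1)
            row = np.zeros(n_fine)
            row[start:t + 1] = 1.0 / (t + 1 - start)   # box-shaped exposure kernel
            rows.append(row)
    return np.array(rows)


n_fine, rate, exposure = 60, 3, 2
h_true = np.sin(2 * np.pi * 0.2 * np.arange(n_fine))   # period of 5 high-res frames

# One slow sequence alone: 20 samples for 60 unknowns, and the event is aliased.
A_single = build_system(n_fine, rate, [0], exposure)

# Three slow sequences with sub-frame offsets 0, 1, 2 together constrain every frame.
A = build_system(n_fine, rate, [0, 1, 2], exposure)
l = A @ h_true                                          # simulated low-res measurements
h_est, *_ = np.linalg.lstsq(A, l, rcond=None)

print(np.linalg.matrix_rank(A_single), np.linalg.matrix_rank(A))  # 20 60
print(np.allclose(h_est, h_true))                                  # True
```

Without the boundary assumption, a box-shaped exposure leaves such a system slightly under-determined, which is why systems of this kind are usually solved with an added regularization (smoothness) term.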
Super-Resolution in Time
Input sequence in slow motion (75 frames/sec):
Output sequence (super-resolved) (75 frames/sec):
Overlay of frames
Simulated sequences of a “fast” event:
• Very long exposure time
• Very low frame rate
One low-res sequence: Another low-res sequence: And another one...
Motion Blur
Deblurring:
Input: 3 out of 18 low-resolution input sequences (frame overlays)
Output: output trajectory (overlay of frames)
Output sequence (×15 frame rate), without estimating the motion of the ball!
Motion Blur
4 input sequences (Video 1, Video 2, Video 3, Video 4):
Input (low-res) frames at collision:
Output (high-res) frame at collision:
Optical Limits of Visual Sensors:
• Spatial resolution
• Temporal resolution
• Spectral range
• Depth of focus
• Dynamic range
• Field-of-View (FOV)
• View point
Alignment of Non-Overlapping Sequences
Very little common visual information!
Sequence-to-Sequence Alignment: alignment in time and in space [work with Yaron Caspi]
• Coherent appearance (Image-to-Image Alignment)
• Coherent camera behavior
• Coherent scene dynamics (Seq-to-Seq Alignment)
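Problem formulation
Input: $\{T_i\}$ (frame-to-frame transformations of Sequence 1) and $\{S_i\}$ (frame-to-frame transformations of Sequence 2)
Output: $H$ and $\Delta t$ such that $\forall i:\; T_i = H\, S_{i+\Delta t}\, H^{-1}$
[Figure: Sequence 1 and Sequence 2 related by the fixed inter-camera transformation $H$; $H = ?$, $\Delta t = ?$]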
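Recovering Temporal Alignment
$T_i = H\, S_{i+\Delta t}\, H^{-1}$, $\Delta t = ?$
Conjugate matrices have the same eigenvalues:
$\mathrm{eig}(T_i) = \mathrm{eig}(S_{i+\Delta t})$
Because homographies are only defined up to scale, T and S have the same eigenvalues, up to scale:
$\mathrm{eig}(T_i) = [\lambda_1, \lambda_2, \lambda_3]^{T} \;\propto\; \mathrm{eig}(S_{i+\Delta t}) = [\lambda'_1, \lambda'_2, \lambda'_3]^{T}$
Search for the temporal shift which minimizes:
$\Delta t = \arg\min_{\Delta t} \sum_i \mathrm{dist}\big(\mathrm{eig}(T_i),\, \mathrm{eig}(S_{i+\Delta t})\big)$, where dist compares the eigenvalue vectors up to scale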
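A sketch of this temporal search, assuming the frame-to-frame homographies are given as 3×3 NumPy arrays. Instead of sorting and matching eigenvalues (whose order is ambiguous for complex-conjugate pairs), it compares characteristic-polynomial coefficients after normalizing each matrix to determinant 1. This is an equivalent, order-free test for equal eigenvalues up to scale, swapped in here for robustness; it is not necessarily the distance used in the original work. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def spectral_signature(M):
    """Hypothetical helper: scale-free, order-free eigenvalue signature of a 3x3 homography.

    Dividing by the real cube root of det(M) removes the arbitrary projective
    scale (including a negative one). With det = 1 fixed, the trace and the
    second invariant are the remaining characteristic-polynomial coefficients,
    so two normalized matrices share a signature iff they share eigenvalues.
    """
    M = M / np.cbrt(np.linalg.det(M))
    tr = np.trace(M)
    c2 = 0.5 * (tr ** 2 - np.trace(M @ M))   # sum of pairwise eigenvalue products
    return np.array([tr, c2])


def recover_time_shift(T, S, max_shift):
    """Hypothetical helper: dt in [0, max_shift] minimizing sum_i dist(sig(T_i), sig(S_{i+dt}))."""
    assert len(S) >= len(T) + max_shift, "every shift must compare len(T) pairs"
    sig_T = [spectral_signature(Ti) for Ti in T]
    sig_S = [spectral_signature(Si) for Si in S]
    costs = [sum(np.linalg.norm(a - b) for a, b in zip(sig_T, sig_S[dt:]))
             for dt in range(max_shift + 1)]
    return int(np.argmin(costs))


# Hypothetical synthetic data: T_i = scale_i * H @ S_{i+dt} @ inv(H)
rng = np.random.default_rng(0)
H = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
S = [np.eye(3) + 0.1 * rng.standard_normal((3, 3)) for _ in range(30)]
true_dt = 4
T = [rng.uniform(0.5, 2.0) * H @ S[i + true_dt] @ np.linalg.inv(H) for i in range(20)]
print(recover_time_shift(T, S, max_shift=10))  # 4
```

Note that $H$ itself is never needed for this step: conjugation by any $H$ leaves the signature unchanged.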
Recovering Spatial Transformation
Given $\Delta t$: $T_i = H\, S_{i+\Delta t}\, H^{-1} \;\Rightarrow\; T_i H - H S_{i+\Delta t} = 0$
Solve a homogeneous set of linear equations in H:
$T_1 H - H S_{1+\Delta t} = 0$
$\vdots$
$T_n H - H S_{n+\Delta t} = 0$
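The equations $T_i H - H S_{i+\Delta t} = 0$ are linear in the nine entries of $H$. A minimal sketch, assuming each homography is first normalized to determinant 1 (so that the arbitrary projective scales of $T_i$ and $S_{i+\Delta t}$ agree). It stacks the equations using the row-major identity $\mathrm{vec}(AXB) = (A \otimes B^{T})\,\mathrm{vec}(X)$ and takes the null vector from an SVD. The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def normalize(M):
    """Hypothetical helper: fix the projective scale by dividing by the real cube root of det(M)."""
    return M / np.cbrt(np.linalg.det(M))


def recover_homography(T, S, dt):
    """Hypothetical helper: solve T_i H - H S_{i+dt} = 0 (all i) for H, up to scale."""
    I = np.eye(3)
    blocks = []
    for i, Ti in enumerate(T):
        Tn, Sn = normalize(Ti), normalize(S[i + dt])
        # Row-major vec: vec(A X B) = kron(A, B.T) @ vec(X)
        blocks.append(np.kron(Tn, I) - np.kron(I, Sn.T))
    A = np.vstack(blocks)                 # shape (9 * len(T), 9)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)           # right singular vector of the smallest singular value


# Hypothetical synthetic data: T_i = scale_i * H @ S_{i+dt} @ inv(H)
rng = np.random.default_rng(1)
H = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
S = [np.eye(3) + 0.1 * rng.standard_normal((3, 3)) for _ in range(30)]
dt = 4
T = [rng.uniform(0.5, 2.0) * H @ S[i + dt] @ np.linalg.inv(H) for i in range(20)]
H_est = recover_homography(T, S, dt)
print(np.allclose(H_est / H_est[2, 2], H / H[2, 2]))  # True
```

A single pair $(T_i, S_{i+\Delta t})$ does not determine $H$ uniquely (any matrix commuting with $S_{i+\Delta t}$ can be absorbed), which is why several frames are stacked into one system.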
Exceed Limited Spectral Range – Day and Night Vision
Visible light (video): Infra-red: Fused sequence:
Summary
• Forget image frames
Video = space-time volume >> collection of images
• Use all available spatio-temporal info for analysis, representation, and exploitation.
Applies to many problem areas:
1. Quick search in video.
2. Alignment and integration of information to exceed optical bounds of visual sensors.
3. Action analysis and recognition.
4. Synthesis of video data.
...and many more.