DoCoMo USA Labs All Rights ReservedSandeep Kanumuri, NML
Fast super-resolution of video sequences using sparse directional transforms*
Sandeep Kanumuri, Onur G. Guleryuz
DoCoMo USA Labs
*Presented at 2008 SIAM Conference on Imaging Science on 07/09/2008
Outline
• System Model
• Motivation
• Prior Work
• Our Solution: SWAT (Sparse Warped Transform and Adaptive Thresholding)
– Algorithm Flowchart
– Over-complete Transform
– Warped (Directional) Transform
– Over-complete Inverse Transform
– Adaptive Thresholding
• Performance Comparison
• Conclusion
System Model
• Design goals
1. High-quality rendering
2. Fast algorithm (lower complexity): single frame, simple transform
Motivation
Broadcast Video – TV application
A low-resolution video signal for mobile phones reaches the TV along one of two paths (only one path, A or B, is used):
• Path A (A.1, A.2): The low-resolution video is sent to a docking station, which uses the SWAT algorithm to convert it to high-resolution video. The high-resolution video is then sent to a TV or a large display.
• Path B: The cell phone itself converts the low-resolution video to high-resolution video using the SWAT algorithm and transmits the high-resolution video to the TV using local wireless technologies.
BENEFIT: Broadcast programming aimed at mobile phones can also be used in stationary environments.
Broadcast Video – VGA phones
Low-resolution video signal for mobile phones
• VGA phone without SWAT capability
• VGA phone with SWAT capability
BENEFIT: SWAT capability allows a VGA phone to convert low-resolution video to high-resolution video
More Applications…
• Video Quality Enhancement Service
– The SWAT algorithm can be deployed as a service to enhance the resolution and quality of videos
• Video Conferencing
– A SWAT-equipped terminal can show video at a higher zoom level and with improved quality
• High-quality Image Zooming
– The SWAT algorithm enables the mobile phone to convert a low-quality, low-resolution image into a high-quality, high-resolution image
Prior Work
• Linear solutions
– Filter design
• Non-linear solutions
– Regularization (projection onto the model space)
  • Signal sparsity: iterated denoising / shrinkage, Lp-norm minimization
  • Optical flow
  • Adaptive filtering
  • Example-based approaches
– Data consistency (projection onto the input space)
SWAT Algorithm Flowchart
1. Input image/video (low-resolution, low quality)
2. Linear Interpolation Filter, giving a high-resolution, low-quality estimate
3. Regularization
– Directional Over-complete Transform
– Adaptive Thresholding
– Directional Over-complete Inverse Transform
4. Enforce Data Consistency
5. More iterations? If yes, return to step 3; if no, output the result
6. Output image/video (high-resolution, high quality)
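The flowchart can be illustrated with a 1-D toy loop. This is a minimal sketch under stated assumptions, not the SWAT implementation: it uses a non-directional length-4 Hadamard transform, a fixed hard threshold `T`, plain averaging of overlapping estimates, linear interpolation, and 2:1 decimation as the downsampler. All function names and parameter values are hypothetical.

```python
# Toy 1-D version of the flowchart loop (hypothetical helpers, illustrative only).
# 4-point Hadamard matrix; H4 / 2 is orthonormal and symmetric, so it is its own inverse.
H4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]


def upsample_linear(y):
    """2x linear interpolation: input kept at even positions, neighbours averaged at odd ones."""
    out = []
    for i, v in enumerate(y):
        nxt = y[i + 1] if i + 1 < len(y) else v  # replicate the last sample at the border
        out += [v, (v + nxt) / 2.0]
    return out


def downsample(x):
    """Assumed downsampler: plain 2:1 decimation (keep even samples)."""
    return x[0::2]


def regularize(x, T):
    """Threshold every length-4 window in the Hadamard domain, then average overlapping estimates."""
    n = len(x)
    acc, cnt = [0.0] * n, [0] * n
    for s in range(n - 3):  # every window position (over-complete)
        block = x[s:s + 4]
        c = [sum(H4[k][m] * block[m] for m in range(4)) / 2.0 for k in range(4)]
        c = [ck if abs(ck) > T else 0.0 for ck in c]  # hard thresholding
        est = [sum(H4[k][m] * c[k] for k in range(4)) / 2.0 for m in range(4)]
        for m in range(4):
            acc[s + m] += est[m]
            cnt[s + m] += 1
    return [a / k for a, k in zip(acc, cnt)]


def enforce_consistency(x, y):
    """x + Interp(y - Down(x)): afterwards Down(x) reproduces the low-resolution input y."""
    residual = [a - b for a, b in zip(y, downsample(x))]
    return [a + b for a, b in zip(x, upsample_linear(residual))]


def swat_like(y, T=2.0, iterations=2):
    """Expects at least 2 input samples so that the upsampled signal holds one full window."""
    x = upsample_linear(y)  # initial high-resolution, low-quality estimate
    for _ in range(iterations):
        x = regularize(x, T)
        x = enforce_consistency(x, y)
    return x
```

Because data consistency is the last step of each iteration, the even-indexed output samples always equal the input samples (up to floating-point rounding) in this toy setup.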
Linear Interpolation Filter
• A linear interpolation filter is used to form an initial estimate of the high-resolution image/video
– However, the quality of this interpolation is relatively low
• Popular filter choices
– Low-pass filter of the Daubechies 7/9 inverse wavelet
– H.264 interpolation filter
• A customized linear interpolation filter can be used if either of the following is known:
– The downsampling filter (if the input was obtained by downsampling a higher-resolution original)
– The filtering caused by the camera acquisition process
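As one concrete instance of the "H.264 interpolation filter" option, the sketch below upsamples a row of 8-bit samples by 2 using the H.264 luma half-sample 6-tap filter (1, -5, 20, 20, -5, 1)/32 with rounding and clipping. Using it as a plain 2x upsampler, and replicating border samples, are assumptions for illustration; the helper names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample filter taps, normalised by 32


def clip255(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def upsample2_h264(x):
    """2x upsampling of one row of integer samples: even outputs copy the input,
    odd outputs are half-sample positions computed with the 6-tap filter."""
    n = len(x)

    def at(i):
        return x[min(max(i, 0), n - 1)]  # replicate border samples (assumption)

    out = []
    for i in range(n):
        # Taps cover positions i-2 .. i+3, centred between samples i and i+1.
        acc = sum(t * at(i - 2 + j) for j, t in enumerate(TAPS))
        out += [x[i], clip255((acc + 16) >> 5)]  # round, divide by 32, clip
    return out
```

The filter is separable, so a 2-D frame can be handled by applying it to every row and then to every column.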
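Core idea – Exploit Signal Sparsity
[Figure: a signal $s(n)$, $0 \le n \le N-1$ (signal domain), has a sparse representation $S(k)$, $0 \le k \le N-1$ (sparse decomposition domain). The observed coefficients are $C(k) = S(k) + W(k)$, where $W(k)$ is "noise". Coefficients falling inside the band between $-T$ and $+T$ are set to zero, giving the denoised coefficients $\hat{C}(k)$.]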
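The sparsity idea can be checked numerically. In this sketch (an illustration, not the authors' code) the transform is an orthonormal 1-D DCT-II, the signal is a constant (so $S(k)$ has a single nonzero coefficient), the "noise" $W(k)$ comes from a small alternating perturbation, and hard thresholding at `T` removes it. The test signal and the value of `T` are assumptions.

```python
import math


def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of the orthonormal DCT-II (a DCT-III)."""
    n = len(c)
    return [sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)) * c[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
            for i in range(n)]


def hard_threshold(coeffs, T):
    """Keep C(k) when |C(k)| > T, otherwise set it to zero."""
    return [c if abs(c) > T else 0.0 for c in coeffs]


def denoise(s_noisy, T):
    return idct(hard_threshold(dct(s_noisy), T))


# Constant signal (sparse in the DCT domain) plus a small alternating perturbation.
noisy = [10.0 + (0.1 if i % 2 == 0 else -0.1) for i in range(8)]
clean = denoise(noisy, T=1.0)
```

Every noise coefficient here has magnitude at most the noise energy's square root (about 0.28), so all of them fall inside the $[-T, +T]$ band, while the single signal coefficient survives.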
Over-complete Transform
• Transform size: 4x4 (used for this description), 3x3
• Transform used: DCT, Hadamard
• For an Over-complete Transform
– All possible 4x4 blocks in the image/frame are selected using a non-directional mask
– Each 4x4 block is transformed to produce a set of transform coefficients
– Each pixel is involved in multiple transforms (16, on average)
– Total number of transform coefficients ≈ 16 × number of pixels
• Directional Over-complete Transform
– Here, each 4x4 block is formed by applying a directional mask followed by a warping process (see next slide)
[Figure: Blocks of an Over-complete Transform, from Block (1,1), Block (1,2), …, Block (1,W-3) in the first row down to Block (H-3,1), Block (H-3,2), …, Block (H-3,W-3) in the last row; H = height of image, W = width of image. Inset: non-directional mask used to select a 4x4 block.]
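A minimal sketch of the (non-directional) over-complete transform described above, assuming an orthonormal 4x4 Hadamard transform and 0-based block indices (the slide uses 1-based indices, Block (1,1) to Block (H-3,W-3)). The function names are hypothetical. It also counts how many blocks cover each pixel, which shows where the "16, on average" figure comes from.

```python
# Natural-order 4x4 Hadamard matrix; (H4 / 2) X (H4 / 2)^T is an orthonormal 2-D transform.
H4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def hadamard2d(block):
    """2-D 4x4 Hadamard transform: H4 X H4^T / 4."""
    t = matmul(matmul(H4, block), transpose(H4))
    return [[v / 4.0 for v in row] for row in t]


def overcomplete_transform(img, n=4):
    """Transform every n x n block (every top-left position); keys are (row, col) of the block."""
    h, w = len(img), len(img[0])
    coeffs = {}
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            block = [row[x:x + n] for row in img[y:y + n]]
            coeffs[(y, x)] = hadamard2d(block)
    return coeffs


def usage_counts(h, w, n=4):
    """Number of n x n blocks that contain each pixel."""
    counts = [[0] * w for _ in range(h)]
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            for r in range(n):
                for c in range(n):
                    counts[y + r][x + c] += 1
    return counts
```

An H x W frame yields (H-3)(W-3) blocks of 16 coefficients each. Interior pixels lie in exactly 16 blocks and border pixels in fewer, so "16 per pixel" holds approximately for large frames.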
Warped (Directional) Transform
• Transform domain: 4x4 DCT
• Signal sparsity in the DCT domain holds for horizontal and vertical edges, but is violated on directional edges
• Consider 4 blocks along a directional edge
– First, using the non-directional mask
– Then, using directional masks: the transform support is warped to follow the edge
– Directional masks lead to a sparse representation
• For the Directional Over-complete Transform, directional masks replace the non-directional mask
[Figure: non-directional mask vs. directional masks]
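The sparsity gain from warping can be seen on an ideal diagonal edge. This sketch assumes a hypothetical warped mask that shifts row `r` of the block right by `shear * r` pixels (a simple shear). The actual SWAT masks are shown only in the slide figure. After warping, the diagonal edge becomes vertical, so the 4x4 DCT has nonzeros only in its first row.

```python
import math


def dct_matrix(n=4):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def dct2(block):
    """2-D DCT: D X D^T."""
    n = len(block)
    d = dct_matrix(n)
    tmp = [[sum(d[k][i] * block[i][j] for i in range(n)) for j in range(n)] for k in range(n)]
    return [[sum(tmp[k][j] * d[l][j] for j in range(n)) for l in range(n)] for k in range(n)]


def gather(img, y0, x0, shear=0, n=4):
    """Read an n x n block; row r is shifted right by shear * r pixels (hypothetical warped mask)."""
    return [[img[y0 + r][x0 + c + shear * r] for c in range(n)] for r in range(n)]


def significant(coeffs, eps=1e-9):
    """Count coefficients whose magnitude exceeds a tiny tolerance."""
    return sum(1 for row in coeffs for c in row if abs(c) > eps)


# Diagonal step edge: a pixel is 1 when it lies right of the line x = y + 2.
img = [[1.0 if x >= y + 2 else 0.0 for x in range(16)] for y in range(8)]
plain = significant(dct2(gather(img, 0, 0)))            # non-directional mask
warped = significant(dct2(gather(img, 0, 0, shear=1)))  # sheared (warped) mask
```

Here the warped block needs 3 significant coefficients, while the non-directional block needs more, which is the effect the slide describes.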
How to choose a mask?
• The decision is made for each 4x4 block of pixels
– At each pixel, a vote is cast for the mask that minimizes the signal variance along the mask direction
– The mask with the most votes is chosen
• Block-level voting reduces inconsistency in the chosen directions
[Figure: Example masks]
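The voting rule above can be sketched as follows. The direction set, the 3-sample window along each direction (`reach=1`), and border clamping are all assumptions made for illustration; the real mask set is only shown in the slide figure.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical direction set as (row step, column step).
DIRECTIONS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal": (1, 1),
    "anti-diagonal": (1, -1),
}


def sample(img, y, x):
    """Pixel value with coordinates clamped to the image (assumed border handling)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def variance_along(img, y, x, step, reach=1):
    """Variance of the samples on a short line through (y, x) in the given direction."""
    dy, dx = step
    vals = [sample(img, y + t * dy, x + t * dx) for t in range(-reach, reach + 1)]
    return pvariance(vals)


def choose_mask(img, y0, x0, n=4):
    """Each pixel of the n x n block votes for its minimum-variance direction; majority wins."""
    votes = Counter()
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            best = min(DIRECTIONS, key=lambda d: variance_along(img, y, x, DIRECTIONS[d]))
            votes[best] += 1
    return votes.most_common(1)[0][0]
```

For example, an image whose intensity is constant down each column has zero variance in the vertical direction, so every pixel votes "vertical".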
Over-complete Inverse Transform
• For an Over-complete Inverse Transform
– Each set of transform coefficients is converted back to the pixel domain
– Each pixel has multiple estimates from different blocks, and a weighted combination of these estimates is used to arrive at its final value
[Figure: block estimates combined with weights W1, W2, W3, and so on for all the blocks]
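The weighted combination can be sketched as a normalised accumulation: every block adds `weight * value` and `weight` to the pixels it covers, and each pixel is divided by its total weight. The slide does not say how the weights W1, W2, ... are chosen. The `sparsity_weight` helper (inverse of the number of nonzero coefficients) is a hypothetical choice included only for illustration.

```python
def weighted_inverse(estimates, h, w):
    """estimates: list of (y, x, block, weight), where block is a 2-D list of pixel values
    placed with its top-left corner at (y, x). Returns the weighted per-pixel average."""
    acc = [[0.0] * w for _ in range(h)]
    wsum = [[0.0] * w for _ in range(h)]
    for y, x, block, wt in estimates:
        for r, row in enumerate(block):
            for c, v in enumerate(row):
                acc[y + r][x + c] += wt * v
                wsum[y + r][x + c] += wt
    return [[a / s if s > 0 else 0.0 for a, s in zip(ra, rs)] for ra, rs in zip(acc, wsum)]


def sparsity_weight(coeffs):
    """Hypothetical weight: blocks with fewer surviving coefficients count more."""
    nonzero = sum(1 for row in coeffs for c in row if c != 0)
    return 1.0 / max(nonzero, 1)
```

With two 2x2 blocks of value 1 (weight 1) and value 4 (weight 3) overlapping in one column, the overlap becomes (1·1 + 3·4) / 4 = 3.25.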
Adaptive Thresholding
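• Transform coefficients are thresholded for denoising
• A master threshold ($T$) is used for an initial pass
• A local threshold ($\hat{T}$) is then calculated and used for the final thresholding
– $E_{\text{lost}}$: energy lost due to thresholding when $T$ is used as the threshold
– $\hat{T} = f(E_{\text{lost}})\,T$
• Parameters $f_1, \dots, f_n$ and $E_1, \dots, E_n$ are tuned to achieve a local optimum
[Figure: $f(E_{\text{lost}})$ plotted against $E_{\text{lost}}$, starting at (0,0), with levels 1, $f_1$, $f_2$, …, $f_n$ on the vertical axis and breakpoints $E_1$, $E_2$, …, $E_n$ on the horizontal axis]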
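A sketch of the two-pass rule $\hat{T} = f(E_{\text{lost}})\,T$ applied to the coefficients of one block. Two parts are assumptions because the slide does not specify them. First, $f$ is modeled as a piecewise-constant function equal to 1 below $E_1$ and to $f_i$ from $E_i$ on. Second, the breakpoints and factor values are made-up placeholders, not the tuned parameters.

```python
from bisect import bisect_right

E_BREAKS = [1.0, 4.0, 16.0]        # hypothetical E1, E2, E3
F_VALUES = [1.0, 0.9, 0.75, 0.6]   # hypothetical: 1 below E1, then f1, f2, f3


def lost_energy(coeffs, T):
    """E_lost: energy of the coefficients that the master threshold T would zero out."""
    return sum(c * c for c in coeffs if abs(c) <= T)


def f(e_lost):
    """Assumed piecewise-constant scaling factor."""
    return F_VALUES[bisect_right(E_BREAKS, e_lost)]


def adaptive_threshold(coeffs, T):
    """Initial pass with T, then hard thresholding with the local threshold T_hat = f(E_lost) * T."""
    t_hat = f(lost_energy(coeffs, T)) * T
    return [c if abs(c) > t_hat else 0.0 for c in coeffs], t_hat
```

For coefficients [10.0, 1.8, 1.2, 0.5] and T = 2, the energy lost is about 4.93, so this placeholder $f$ gives 0.75 and $\hat{T} = 1.5$. Only 1.2 and 0.5 are then zeroed.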
Enforcing Data Consistency
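• Role of the data consistency module
– Ensure that the high-resolution estimate, when downsampled, reproduces the low-resolution input
[Figure: Data Consistency module. The high-resolution input is passed through the downsampling filter and subtracted from the low-resolution input. The difference is passed through the linear interpolation filter and added to the high-resolution input to form the high-resolution output.]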
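The module above computes: high-resolution output = high-resolution input + Interpolate(low-resolution input − Downsample(high-resolution input)). The 1-D sketch below assumes a 2-tap averaging downsampler and sample replication as the interpolation filter. Both choices are hypothetical. With this pair, Downsample(Interpolate(e)) = e, so one correction makes the output exactly consistent with the input.

```python
def downsample(x):
    """Assumed downsampling filter: average each pair of samples, then decimate by 2."""
    return [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]


def interpolate(e):
    """Assumed interpolation filter: 2x sample replication."""
    return [v for v in e for _ in range(2)]


def enforce_data_consistency(x_hr, y_lr):
    """Add the interpolated low-resolution residual back to the high-resolution estimate."""
    residual = [y - d for y, d in zip(y_lr, downsample(x_hr))]
    return [x + r for x, r in zip(x_hr, interpolate(residual))]
```

When Downsample(Interpolate(·)) is not the identity for the filters actually in use, one pass only reduces the mismatch. Repeating the loop in the flowchart then helps drive it down.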
Performance Comparison
• Super-resolution of QCIF to CIF sequences
– Low-pass filter from the Daubechies 7/9 wavelet filter bank
– Compression is done using the H.264/AVC codec (JM12.0)
• SWAT run with 2 iterations
• Compared with
– Bilinear interpolation
– H.264 interpolation
– Simple inverse
– Iterated Denoising / Shrinkage (ID)
  • 2 iterations (complexity similar to SWAT)
  • 10 iterations
PSNR comparison (uncompressed)
PSNR comparison (uncompressed)
PSNR comparison (uncompressed)
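The comparisons are reported as PSNR. For reference, a minimal per-frame PSNR for 8-bit samples (peak value 255) is sketched below. The slides do not state whether PSNR is computed on luma only or how it is averaged over frames, so those details are left open here.

```python
import math


def psnr(ref, test, peak=255.0):
    """PSNR in dB between two equally sized 2-D frames: 10 * log10(peak^2 / MSE)."""
    diffs = [(a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)
```

An MSE of 1 gives 10·log10(255²), about 48.13 dB.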
Visual Comparison (uncompressed)
[Figure panels: H.264 | ID (2 iterations) | SWAT]
Visual Comparison (uncompressed)
[Figure panels: H.264 | ID (2 iterations) | SWAT]
PSNR comparison (compression at QP=20)
PSNR comparison (compression at QP=25)
Visual Comparison (compression at QP=25)
[Figure panels: H.264 | SWAT]
Visual Comparison (compression at QP=25)
[Figure panels: H.264 | SWAT]
Conclusion
• The SWAT algorithm renders high-quality output and yet remains fast
– Quality comparable to ID (10 iterations)
– Complexity comparable to ID (2 iterations)
• Enabling features
– Over-complete transform representation
– Simple basic transform (Hadamard, integer DCT)
– Sparse warped transform
– Adaptive thresholding
– Weighted inverse transform
• Reference
– S. Kanumuri, O. G. Guleryuz and M. R. Civanlar, "Fast super-resolution reconstructions of mobile video using warped transforms and adaptive thresholding", SPIE Applications of Digital Image Processing XXX, August 2007
• Flicker Reduction Application
– To appear in SPIE 2008 (Applications of Digital Image Processing XXXI)
• Contact
– Sandeep Kanumuri
– Onur G. Guleryuz