
DoCoMo USA Labs, All Rights Reserved. Sandeep Kanumuri, NML

Fast super-resolution of video sequences using sparse directional transforms*

Sandeep Kanumuri, Onur G. Guleryuz

DoCoMo USA Labs

*Presented at 2008 SIAM Conference on Imaging Science on 07/09/2008



Outline

• System Model

• Motivation

• Prior Work

• Our Solution: SWAT (Sparse Warped Transform and Adaptive Thresholding)

– Algorithm Flowchart

– Over-complete Transform

– Warped (Directional) Transform

– Over-complete Inverse Transform

– Adaptive Thresholding

• Performance Comparison

• Conclusion


System Model

• Design goals

1. High-quality rendering

2. Fast algorithm (lower complexity): single frame, simple transform


Motivation


Broadcast Video – TV application

Low-resolution video signal for mobile phones

Path A (via a docking station):

– Low-resolution video is sent to the docking station

– The docking station uses the SWAT algorithm to convert the low-resolution video to high-resolution video

– The high-resolution video is sent to a TV or a large display

Path B (phone only):

– The cell phone itself converts the low-resolution video to high-resolution video using the SWAT algorithm and transmits the high-resolution video to the TV using local wireless technologies

Only one path (Path A or Path B) is used.

BENEFIT: Broadcast programming aimed at mobile phones can also be used in stationary environments


Broadcast Video – VGA phones

Low-resolution video signal for mobile phones

Figure: a VGA phone with SWAT capability next to a VGA phone without SWAT capability

BENEFIT: SWAT capability allows a VGA cell phone to convert low-resolution video to high-resolution video


More Applications…

• Video Quality Enhancement Service

– The SWAT algorithm can be deployed as a service to enhance the resolution and quality of videos

• Video Conferencing

– A SWAT-equipped terminal can show video at a higher zoom level and with improved quality

• High-quality Image Zooming

– The SWAT algorithm enables a mobile phone to convert low-quality, low-resolution images into high-quality, high-resolution images


Prior Work

• Linear solutions

– Filter design

• Non-linear solutions

– Regularization (projection onto the model space)

  • Signal Sparsity
    – Iterated Denoising / Shrinkage
    – Lp-Norm Minimization

  • Optical Flow

  • Adaptive filtering

  • Example-based approaches

– Data Consistency (projection onto the input space)


SWAT Algorithm Flowchart

1. Input Image/Video (low-resolution, low quality)

2. Linear Interpolation Filter → high-resolution, low-quality estimate

3. Regularization

– Directional Over-complete Transform

– Adaptive Thresholding

– Directional Over-complete Inverse Transform

4. Enforce Data Consistency

5. More iterations? Yes: return to step 3. No: Output Image/Video (high-resolution, high quality)
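The control flow of the flowchart can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `swat_pipeline` and the stage callables are hypothetical, and the toy stand-ins in the usage part only record the order in which the stages run.

```python
from typing import Callable, List

import numpy as np

Array = np.ndarray


def swat_pipeline(
    low_res: Array,
    interpolate: Callable[[Array], Array],
    regularize: Callable[[Array], Array],
    enforce_consistency: Callable[[Array, Array], Array],
    iterations: int = 2,
) -> Array:
    """Interpolate once, then alternate regularization and data consistency."""
    estimate = interpolate(low_res)          # high resolution, low quality
    for _ in range(iterations):              # the "More iterations?" decision
        estimate = regularize(estimate)      # transform -> threshold -> inverse
        estimate = enforce_consistency(estimate, low_res)
    return estimate                          # high resolution, high quality


# Toy stand-ins (hypothetical) that only log which stage ran.
calls: List[str] = []


def toy_interpolate(y: Array) -> Array:
    calls.append("interpolate")
    return np.kron(y, np.ones((2, 2)))       # pixel replication as a placeholder


def toy_regularize(x: Array) -> Array:
    calls.append("regularize")
    return x


def toy_consistency(x: Array, y: Array) -> Array:
    calls.append("consistency")
    return x


out = swat_pipeline(np.ones((3, 4)), toy_interpolate, toy_regularize, toy_consistency)
print(out.shape)  # (6, 8)
print(calls)
```

Passing the stages as callables keeps the loop independent of the particular filters and transforms, which matches the flowchart's separation into interpolation, regularization, and data consistency.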


Linear Interpolation Filter

• A linear interpolation filter is used to form an initial estimate of the high-resolution image/video

– However, the quality of this interpolation is relatively low

• Popular filter choices

– Low-pass filter of the Daubechies 7/9 inverse wavelet transform

– H.264 interpolation filter

• A customized linear interpolation filter can be used if any of the following is known:

– The downsampling filter (if the input was obtained by downsampling a higher-resolution original)

– The filtering caused by the camera acquisition process
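A minimal sketch of 2x linear interpolation by zero insertion followed by separable filtering, assuming NumPy. The default kernel `[0.5, 1, 0.5]` (bilinear) is a stand-in chosen for simplicity, not one of the filters named above. `H264_STYLE_TAPS` rewrites the H.264 6-tap half-sample filter (1, −5, 20, 20, −5, 1)/32 for zero-inserted input, ignoring H.264's integer rounding and clipping. The names `upsample2x`, `BILINEAR_TAPS` and `H264_STYLE_TAPS` are illustrative.

```python
import numpy as np

# Bilinear interpolation expressed as a kernel for zero-inserted input.
BILINEAR_TAPS = np.array([0.5, 1.0, 0.5])
# H.264 half-sample filter (1, -5, 20, 20, -5, 1)/32 for zero-inserted input:
# original samples pass through unchanged, new samples get the 6-tap filter.
H264_STYLE_TAPS = np.array([1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]) / 32.0


def upsample2x(x: np.ndarray, taps: np.ndarray = BILINEAR_TAPS) -> np.ndarray:
    """Zero-insert by 2 in each dimension, then filter separably with `taps`."""
    taps = np.asarray(taps, dtype=float)
    r = len(taps) // 2
    h, w = x.shape
    y = np.zeros((2 * h, 2 * w))
    y[::2, ::2] = x                                  # original samples on the even grid
    for axis in (0, 1):
        pad = [(r, r) if a == axis else (0, 0) for a in range(2)]
        padded = np.pad(y, pad, mode="reflect")      # mirroring keeps even/odd parity
        y = np.apply_along_axis(
            lambda v: np.convolve(v, taps, mode="valid"), axis, padded
        )
    return y


x = np.arange(12, dtype=float).reshape(3, 4)
u = upsample2x(x)
print(u.shape)    # (6, 8)
print(u[1, 0])    # 2.0, the average of x[0, 0] = 0 and x[1, 0] = 4
```

Because every kernel here has unit weight at the center and zeros at the other even offsets, the original low-resolution samples survive interpolation unchanged; only the new in-between samples are computed.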


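Core idea – Exploit Signal Sparsity

• Signal domain: $s(n)$, $n = 0, \dots, N-1$

• Sparse decomposition domain: $S(k) = C(k) + W(k)$, $k = 0, \dots, N-1$, where $C(k)$ is the sparse signal representation and $W(k)$ is the "noise"

• Coefficients inside the band between $-T$ and $+T$ are set to zero; the surviving coefficients form the denoised estimate $\hat{C}(k)$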
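The thresholding idea can be illustrated in one dimension, assuming an orthonormal DCT as the sparse decomposition (any orthonormal transform would serve for the illustration). In this sketch, coefficients inside $[-T, +T]$ are zeroed (hard thresholding); the signal, the noise level and $T = 2$ are made-up demonstration values, and the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def hard_threshold_denoise(s: np.ndarray, t: float) -> np.ndarray:
    d = dct_matrix(len(s))
    S = d @ s                                  # S(k) = C(k) + W(k)
    C_hat = np.where(np.abs(S) > t, S, 0.0)    # zero everything in [-T, +T]
    return d.T @ C_hat                         # back to the signal domain


rng = np.random.default_rng(0)
n = 64
d = dct_matrix(n)
C = np.zeros(n)
C[[0, 3, 7]] = [10.0, -8.0, 6.0]               # sparse signal in the DCT domain
clean = d.T @ C
noisy = clean + rng.normal(scale=0.5, size=n)  # W(k) has the same spread (orthonormal d)
denoised = hard_threshold_denoise(noisy, t=2.0)
print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))
```

Only the three large coefficients survive, so almost all of the noise energy spread over the other 61 coefficients is removed.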


Over-complete Transform

• Transform size: 4x4 (used in this description) or 3x3

• Transforms used: DCT, Hadamard

• For an Over-complete Transform

– All possible 4x4 blocks in the image/frame are selected using a non-directional mask

– Each 4x4 block undergoes a transform to produce a set of transform coefficients

– Each pixel is involved in multiple transforms (16, on average)

– Total number of transform coefficients ≈ 16 × number of pixels

• For a Directional Over-complete Transform

– Each 4x4 block is instead formed by applying a directional mask followed by a warping process (see next slide)

Figure: Blocks of an Over-complete Transform. Block (1,1), Block (1,2), …, Block (1,W-3); Block (2,1), …, Block (2,W-3); …; Block (H-3,1), …, Block (H-3,W-3), each selected with the non-directional 4x4 mask (H = height of image; W = width of image)
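A direct, unoptimized sketch of the non-directional over-complete transform, assuming NumPy and an orthonormal 4x4 DCT. It slides a 4x4 window one pixel at a time and keeps every block's coefficients. `overcomplete_dct` is a hypothetical name, and array index `[r, c]` corresponds to Block (r+1, c+1) in the figure.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def overcomplete_dct(img: np.ndarray, n: int = 4) -> np.ndarray:
    """Coefficients of every n x n block; shape (H-n+1, W-n+1, n, n)."""
    d = dct_matrix(n)
    h, w = img.shape
    coeffs = np.empty((h - n + 1, w - n + 1, n, n))
    for r in range(h - n + 1):
        for c in range(w - n + 1):
            block = img[r:r + n, c:c + n]
            coeffs[r, c] = d @ block @ d.T     # separable 2-D DCT of one block
    return coeffs


img = np.arange(80, dtype=float).reshape(8, 10)
coeffs = overcomplete_dct(img)
print(coeffs.shape)              # (5, 7, 4, 4)
print(coeffs.size / img.size)    # 7.0 here; approaches 16 as H and W grow
```

The ratio approaches 16 only for large frames, because pixels near the border belong to fewer than 16 blocks.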


Warped (Directional) Transform

• Signal sparsity in the DCT domain holds for horizontal and vertical edges but is violated on directional edges

• Transform domain: 4x4 DCT; the transform support is warped

• For the Directional Over-complete Transform, directional masks replace the non-directional mask

Figure (animated): four blocks along a directional edge, first selected with non-directional masks and then with directional masks; the directional masks lead to a sparse representation
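The slides do not spell out the warping, so the sketch below uses a hypothetical shear as a stand-in: row i of the block is read i columns further to the right, which turns a 45-degree edge into a vertical one. Counting significant 4x4 DCT coefficients for the plain and the warped block illustrates why a warped support can be sparser. `warped_block` and `significant` are illustrative names.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def warped_block(img: np.ndarray, r: int, c: int, n: int = 4, shift_per_row: int = 1) -> np.ndarray:
    """Hypothetical shear warp: row i is read shift_per_row * i columns to the right."""
    rows = [img[r + i, c + shift_per_row * i: c + shift_per_row * i + n] for i in range(n)]
    return np.array(rows)


def significant(coeffs: np.ndarray, tol: float = 1e-9) -> int:
    """Number of coefficients that are not (numerically) zero."""
    return int(np.sum(np.abs(coeffs) > tol))


rows, cols = np.indices((8, 12))
img = (cols - rows >= 2).astype(float)    # step edge along a 45-degree line
d = dct_matrix(4)
plain = img[0:4, 0:4]                     # non-directional 4x4 block
warped = warped_block(img, 0, 0)          # every row becomes [0, 0, 1, 1]
plain_count = significant(d @ plain @ d.T)
warped_count = significant(d @ warped @ d.T)
print(plain_count, warped_count)          # warped_count is 3
```

After the warp the edge is vertical, so all energy sits in the first row of the coefficient block, matching the slide's observation that sparsity holds for vertical edges.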


How to choose a mask?

• The decision is made for each block (4x4) of pixels

– At each pixel, a vote is cast for the mask that minimizes the signal variance along the mask direction

– The mask with the most votes is chosen

• Making one decision per block reduces inconsistency in the chosen directions

Example masks (figure not reproduced)
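A sketch of the voting rule, assuming four candidate directions and a three-sample variance along each direction. The slides do not specify the direction set, the number of samples, or tie-breaking, so all of these (and the `DIRECTIONS` table and `choose_mask` name) are hypothetical.

```python
from collections import Counter

import numpy as np

# Hypothetical candidate directions as (row step, column step).
DIRECTIONS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal": (1, 1),         # down and to the right
    "anti-diagonal": (1, -1),   # down and to the left
}


def choose_mask(img: np.ndarray, r: int, c: int, n: int = 4, reach: int = 1) -> str:
    """Each pixel of the n x n block at (r, c) votes for the direction whose
    2*reach+1 samples through it have the smallest variance; majority wins."""
    padded = np.pad(img.astype(float), reach, mode="edge")
    votes: Counter = Counter()
    for i in range(r, r + n):
        for j in range(c, c + n):
            variances = {}
            for name, (dr, dc) in DIRECTIONS.items():
                samples = [padded[i + reach + k * dr, j + reach + k * dc]
                           for k in range(-reach, reach + 1)]
                variances[name] = np.var(samples)
            votes[min(variances, key=variances.get)] += 1
    return votes.most_common(1)[0][0]


rows, cols = np.indices((10, 10))
print(choose_mask(rows, 3, 3))          # horizontal: intensity is constant along rows
print(choose_mask(rows + cols, 3, 3))   # anti-diagonal: r + c is constant along (1, -1)
```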


Over-complete Inverse Transform

• For an Over-complete Inverse Transform

– Each set of transform coefficients is converted back to the pixel domain

– Each pixel has multiple estimates from different blocks, and a weighted combination of them is used to arrive at its final estimate

Figure: block estimates combined with weights W1, W2, W3, and so on for all the blocks
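A sketch of forward transform, thresholding, and weighted inverse over all 4x4 blocks, assuming an orthonormal DCT and hard thresholding. The slides do not say how W1, W2, W3, … are chosen; the per-block weight used here, one over the number of surviving coefficients (so sparser blocks count more), is an assumption, and `overcomplete_denoise` is a hypothetical name.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def overcomplete_denoise(img: np.ndarray, t: float, n: int = 4) -> np.ndarray:
    """Threshold every n x n block's DCT and merge the overlapping estimates."""
    d = dct_matrix(n)
    h, w = img.shape
    acc = np.zeros((h, w))     # weighted sum of block estimates per pixel
    wsum = np.zeros((h, w))    # sum of weights per pixel
    for r in range(h - n + 1):
        for c in range(w - n + 1):
            coeffs = d @ img[r:r + n, c:c + n] @ d.T
            coeffs[np.abs(coeffs) < t] = 0.0                   # hard threshold
            weight = 1.0 / max(1, np.count_nonzero(coeffs))    # assumed weight
            acc[r:r + n, c:c + n] += weight * (d.T @ coeffs @ d)
            wsum[r:r + n, c:c + n] += weight
    return acc / wsum


rng = np.random.default_rng(0)
img = rng.random((8, 8))
print(np.allclose(overcomplete_denoise(img, t=0.0), img))                   # True
print(np.allclose(overcomplete_denoise(np.full((8, 8), 3.0), t=1.0), 3.0))  # True
```

With $T = 0$ every block reproduces its patch exactly, so any positive weights return the input; the weights only matter once thresholding makes the block estimates disagree.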


Adaptive Thresholding

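• Transform coefficients are thresholded for denoising

• A master threshold ($T$) is used for an initial pass

• A local threshold ($\hat{T}$) is then calculated and used for the final thresholding:

$\hat{T} = f(E_{\text{lost}})\, T$

– $E_{\text{lost}}$: energy lost due to thresholding when $T$ is used as the threshold

• Parameters $f_1, \dots, f_n$ and $E_1, \dots, E_n$ are tuned to achieve a local optimum

Figure: the function $f(\cdot)$ plotted against $E_{\text{lost}}$, with breakpoints $E_1, E_2, \dots, E_n$ on the horizontal axis and levels $1, f_1, f_2, \dots, f_n$ on the vertical axis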
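A sketch of the two-pass rule $\hat{T} = f(E_{\text{lost}})\,T$ for one block of coefficients. The exact shape of $f$ is not recoverable from the slide, so the sketch assumes it is piecewise linear through the points $(E_i, f_i)$; the values in `E_POINTS` and `F_POINTS` are made up, not the tuned parameters, and `adaptive_threshold` is a hypothetical name. For simplicity it thresholds every coefficient, including DC.

```python
import numpy as np

# Hypothetical breakpoints E_i and levels f_i (illustrative values only).
E_POINTS = np.array([0.0, 10.0, 100.0])
F_POINTS = np.array([1.0, 0.8, 0.5])


def adaptive_threshold(coeffs, master_t, e_points=E_POINTS, f_points=F_POINTS):
    """Pass 1: measure E_lost under the master threshold T.
    Pass 2: threshold again with T_hat = f(E_lost) * T (f assumed piecewise linear)."""
    coeffs = np.asarray(coeffs, dtype=float)
    dropped = np.abs(coeffs) < master_t
    e_lost = float(np.sum(coeffs[dropped] ** 2))        # energy removed by T
    t_hat = float(np.interp(e_lost, e_points, f_points)) * master_t
    out = np.where(np.abs(coeffs) < t_hat, 0.0, coeffs)
    return out, e_lost, t_hat


out, e_lost, t_hat = adaptive_threshold([10.0, 3.0, -2.0, 0.5], master_t=4.0)
print(e_lost)   # 13.25 = 3^2 + 2^2 + 0.5^2
print(t_hat)    # about 3.157, i.e. f(13.25) * 4
```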


Enforcing Data Consistency

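• Role of the data consistency module: ensure that the high-resolution estimate, when downsampled, reproduces the low-resolution input

Data consistency module (block diagram):

– The high-resolution input is passed through the downsampling filter and subtracted from the low-resolution input

– The difference is passed through the linear interpolation filter and added to the high-resolution input to produce the high-resolution output

In equation form, with $D$ the downsampling filter, $U$ the linear interpolation filter, $x$ the high-resolution input and $y$ the low-resolution input: $x_{\text{out}} = x + U\left(y - D(x)\right)$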
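A sketch of the data-consistency update, with $D$ the downsampling filter, $U$ the linear interpolation filter, $x$ the high-resolution estimate and $y$ the low-resolution input, computing $x + U(y - D(x))$. It assumes plain 2x decimation for $D$ (no pre-filter) and bilinear zero-insertion interpolation for $U$; both are hypothetical choices. With this pair, $U$ leaves the retained samples untouched, so the output reproduces $y$ exactly up to rounding; with other filter pairs that need not hold in one step.

```python
import numpy as np


def downsample2x(x: np.ndarray) -> np.ndarray:
    """Hypothetical D: keep every other sample, no anti-alias filter."""
    return x[::2, ::2]


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Hypothetical U: zero-insert, then filter separably with [0.5, 1, 0.5]."""
    taps = np.array([0.5, 1.0, 0.5])
    y = np.zeros((2 * x.shape[0], 2 * x.shape[1]))
    y[::2, ::2] = x
    for axis in (0, 1):
        pad = [(1, 1) if a == axis else (0, 0) for a in range(2)]
        padded = np.pad(y, pad, mode="reflect")
        y = np.apply_along_axis(lambda v: np.convolve(v, taps, mode="valid"), axis, padded)
    return y


def enforce_data_consistency(x_hr, y_lr, down=downsample2x, up=upsample2x):
    """x_out = x + U(y - D(x))."""
    residual = y_lr - down(x_hr)   # what the estimate gets wrong at low resolution
    return x_hr + up(residual)     # push the correction back to high resolution


rng = np.random.default_rng(1)
y_lr = rng.random((4, 5))
x_hr = rng.random((8, 10))         # some high-resolution estimate
x_out = enforce_data_consistency(x_hr, y_lr)
print(np.abs(downsample2x(x_out) - y_lr).max())   # ~0
```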


Performance Comparison

• Super-resolution of QCIF to CIF sequences

– Low-pass filter from the Daubechies 7/9 wavelet filter bank

– Compression is done using the H.264/AVC codec (JM12.0)

• SWAT is run with 2 iterations

• Compared with

– Bilinear interpolation

– H.264 interpolation

– Simple Inverse

– Iterated Denoising / Shrinkage (ID), run with

  • 2 iterations (complexity similar to SWAT)

  • 10 iterations


PSNR comparison (uncompressed)


PSNR comparison (uncompressed)


PSNR comparison (uncompressed)
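The PSNR plots are not reproduced here. For reference, a sketch of the usual PSNR definition, $10\log_{10}(\text{peak}^2/\text{MSE})$, assuming 8-bit samples (peak = 255); the slides do not state which color components or frames the PSNR was averaged over, and `psnr` is a hypothetical helper name.

```python
import numpy as np


def psnr(reference, estimate, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")        # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


ref = np.full((4, 4), 100.0)
print(psnr(ref, ref + 5.0))   # MSE = 25, so 10*log10(65025/25), about 34.15 dB
```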


Visual Comparison (uncompressed)

Panels: H.264 interpolation | ID (2 iterations) | SWAT


Visual Comparison (uncompressed)

Panels: H.264 interpolation | ID (2 iterations) | SWAT


PSNR comparison (compression at QP=20)


PSNR comparison (compression at QP=25)


Visual Comparison (compression at QP=25)

Panels: H.264 interpolation | SWAT


Visual Comparison (compression at QP=25)

Panels: H.264 interpolation | SWAT


Conclusion

• The SWAT algorithm renders high-quality output yet remains fast

– Quality comparable to ID (10 iterations)

– Complexity comparable to ID (2 iterations)

• Enabling Features

– Over-complete transform representation

– Simple basic transform (Hadamard, integer DCT)

– Sparse warped transform

– Adaptive thresholding

– Weighted inverse transform

• Reference

– S. Kanumuri, O. G. Guleryuz and M. R. Civanlar, "Fast super-resolution reconstructions of mobile video using warped transforms and adaptive thresholding", SPIE Applications of Digital Image Processing XXX, August 2007

• Flicker Reduction Application

– To appear in SPIE 2008 (Applications of Digital Image Processing XXXI)

• E-mail

– Sandeep Kanumuri (skanumuri@docomolabs-usa.com)

– Onur G. Guleryuz (guleryuz@docomolabs-usa.com)
