arXiv:1611.09572v1 [cs.CV] 29 Nov 2016

Occlusion-Aware Video Deblurring with a New Layered Blur Model

Byeongjoo Ahn 1,*, Tae Hyun Kim 2,*, Wonsik Kim 3,†, and Kyoung Mu Lee 3

1 Korea Institute of Science and Technology, 2 Max Planck Institute for Intelligent Systems, 3 Seoul National University

Figure 1: Deblurring results for a scene with an occluding object. (a) Blurry frames; (b) Kim & Lee [19]; (c) Wulff & Black [35]; (d) ours. Our method produces better results at occlusion boundaries than the generalized video deblurring method [19] and the previous layered deblurring method [35].

Abstract

We present a deblurring method for scenes with occluding objects using a carefully designed layered blur model. The layered blur model is frequently used in motion deblurring to handle locally varying blur, which is caused by object motions or depth variations in a scene. However, conventional models have a limitation in representing the layer interactions occurring at occlusion boundaries. In this paper, we address this limitation both theoretically and experimentally, and propose a new layered blur model that reflects the actual blur generation process. Based on this model, we develop an occlusion-aware deblurring method that can estimate not only the clear foreground and background, but also the object motion, more accurately. We also provide a novel analysis of the blur kernel at object boundaries, which shows distinctive characteristics of the blur kernel that cannot be captured by conventional blur models. Experimental results on synthetic and real blurred videos demonstrate that the proposed method yields superior results, especially at object boundaries.

* Part of this work was done while the authors were at Seoul National University.
† Currently at Samsung Electronics.

1. Introduction

Recently, deblurring techniques have made significant progress. Early deblurring methods focused only on the blur caused by camera shake in constant-depth images [8, 12, 34, 39]. More recently, however, methods have emerged that handle blurred images with depth variations [15, 23, 29, 37], and there are even methods that handle object motion blur [17, 19, 28]. The object motion deblurring problem is very challenging since it requires the estimation of independent spatially-varying blur kernels.

What makes the object motion deblurring problem even more difficult is the occlusions generated by intra-frame motions. While occlusions generated by inter-frame motions cause photo-inconsistency, occlusions generated by intra-frame motions cause a mixture of foreground and background pixels at occlusion boundaries in the blurred image. These ambiguous pixels can lead to severe ringing artifacts in the deblurring results.

To address this problem, several methods explicitly modeled the occlusions using a layered blur model [1, 4, 11]. In particular, Wulff and Black [35] proposed a layered blur model for the case where both layers are blurred, and obtained convincing results. They modeled a blurred image as a composition of individually blurred foreground and background layers, and this generative model could express the layer interaction caused by occlusions.


Figure 2: A counterexample that shows the difference between the actual blur generation process and the conventional layered blur model [35]. (a) Intermediate images seen between shutter open and shutter close during the exposure time, together with the generated blurry image; (b) Wulff & Black [35]; (c) ours. Even though the tiger's left eye (red) in the background is occluded by the foreground fence during the entire time window between the opening and closing of the camera shutter, it is exposed in the conventional generative model [35]. This error can cause severe ringing artifacts in deblurring results. Notice that our generative model successfully reflects the actual blur generation process.

However, their layered blur model does not reflect the actual blur generation process. A blurred image is the integration of the intermediate images that the camera sees while the shutter is open, which differs from their layered blur model. Figure 2 shows a counterexample. The foreground and background are moving at similar speeds in the image, and the blurred image is observed as an integration of the intermediate images. Since the tiger's left eye (red) in the background is occluded by the foreground fence during the entire time window between the opening and closing of the camera shutter, it should not be exposed in the blurry image, yet it is visible in their generative model. This error can cause severe ringing artifacts in deblurring results.

In this paper, we propose a new layered blur model that reflects the actual blur generation process, and an occlusion-aware video deblurring method built on it. We enhance the model by changing the order of layer composition and blurring so that it follows the actual blur generation process. By using the carefully designed likelihood derived from our layered blur model, the clear foreground and background can be successfully recovered from blurred images with occluding objects. Specifically, given a set of motion-blurred video frames, our method estimates the clear foreground, clear background, clear alpha blending mask, and the motion of each layer. Figure 1 shows that our result is better than that of [35], especially at occlusion boundaries. We also analyze the layered blur model theoretically and experimentally. We show that the model of [35] is a good approximation that is identical to our model in some specific physical situations, and we also show that the blur kernels at boundaries have distinct characteristics that cannot be captured by conventional blur models.

2. Related Work

Early works on deblurring focused on the blur caused by camera shake in constant-depth images. They are roughly categorized into spatially-invariant and spatially-varying configurations. Spatially-invariant deblurring achieved some success in single-image deblurring [8, 12, 31, 36] and video deblurring [3]. However, the spatially-invariant blur model cannot deal with rotational camera motion, which is a significant and common component in practical scenarios [26]. To overcome this limitation, some researchers parameterized the blur by possible camera motions in 3D space, and this approach was applied to single-image [13, 14, 34, 39] and video deblurring [7, 27]. Although these methods solve spatially-varying motion blur to some extent, they are limited to camera shake at a constant depth and cannot handle more general depth variation or object motion.

In the case of blurred images including depth variations, the blur cannot be represented by a simple homography. Some methods solved this problem by casting the blur kernel estimation problem as a scene depth estimation problem [15, 23, 29, 37]. These methods extended the applicability of deblurring methods. However, they are limited to static scenes, and do not take the mixture of pixels at occlusion boundaries into account.

Recently, several object motion deblurring methods have been developed. Some of these methods divided the image into segments to restore each of them independently. They divided the image using hybrid cameras [2, 32], based on similar motions [9, 17, 24], or under the guidance of matting [28]. There are also some methods without segmentation. Cho et al. [10] used patch-based synthesis for deblurring by detecting and interpolating proper sharp patches from nearby frames. Kim and Lee approximated the blur kernel as a pixel-wise 2D linear motion and performed deblurring of dynamic scenes in a single image [18] and a video [19]. These object motion deblurring methods perform well, but do not consider the interaction between the object and the background at occlusion boundaries.

At occlusion boundaries, blurred pixels consist of a mixture of foreground and background pixels, and this mixture plays an important role in object motion deblurring. To address this problem, some authors used layered models [1, 4, 11]. However, these methods assumed the background to be static and modeled the foreground motion only. Sellent et al. [30] used outlier rejection [6] to handle the occlusions. Takeda and Milanfar [33] proposed a method that can deal with occlusions using a spatiotemporal approach, but it requires the blur kernels to be given in advance and depends on time interpolators.

To deal with the general case where both the foreground and the background are moving independently, Wulff and Black [35] proposed a layered blur model that consists of a composition of individually blurred foreground and background layers. This model included the interaction between layers and improved the performance at occlusion boundaries. However, as shown in Figure 2, this generative model differs from the actual blur generation process and is not always valid.

3. Analysis of Layered Blur Model

In this section, we briefly review the previous layered blur model [35], propose our new layered blur model, and compare them. In later sections, the differences between these models will prove to be crucial for removing serious artifacts at occlusion boundaries.

First of all, we set a layered model for a clear image $I \in \mathbb{R}^n$ ($n$ is the number of pixels) to be:

$$I = (1 - A) \odot L_1 + A \odot L_0, \tag{1}$$

where $L_1 \in \mathbb{R}^n$ and $L_0 \in \mathbb{R}^n$ are the clear foreground and background layers, respectively, $A \in \mathbb{R}^n$ is an alpha blending mask, $1 \in \mathbb{R}^n$ is a vector with all components equal to 1, and $\odot$ denotes element-wise multiplication. Notice that $A$ multiplies the background image ($A = 1$ means a background pixel) for notational simplicity later on, inspired by [40].
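For concreteness, here is a minimal NumPy sketch of the composition in Eq. (1) (our illustration, not code from the paper), treating each layer as a flattened image vector:

```python
import numpy as np

def compose_layers(L1, L0, A):
    """Layered clear-image model of Eq. (1): I = (1 - A) * L1 + A * L0.

    L1, L0: flattened foreground/background layers, shape (n,)
    A:      alpha blending mask in [0, 1]; A = 1 marks a background pixel
    """
    return (1.0 - A) * L1 + A * L0

# Toy example with n = 5 pixels: the last two pixels belong to the background.
L1 = np.array([0.9, 0.9, 0.9, 0.0, 0.0])
L0 = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
A  = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
print(compose_layers(L1, L0, A))  # -> [0.9 0.9 0.9 0.2 0.2]
```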

Our goal is to express blurred images using each layer of the reference image (i.e., $\{L_1, L_0, A\}$). Before expressing blurred images, we express a warped image using $\{L_1, L_0, A\}$. We assume that the appearance and shape of each layer are constant, which is a common assumption in the deblurring literature [1, 11, 35]. If we let $\theta_l^i$ denote a motion parameter for layer $l \in \{0, 1\}$ from the reference frame to frame $i$, then the warped image $I^i$ at frame $i$ is as follows:

$$I^i = (1 - W(\theta_1^i)A) \odot W(\theta_1^i)L_1 + W(\theta_1^i)A \odot W(\theta_0^i)L_0, \tag{2}$$

where $W(\theta_l^i) \in \mathbb{R}^{n \times n}$ is a warping matrix according to the motion parameter $\theta_l^i$. The alpha blending mask $A$ is warped by the foreground motion $\theta_1^i$ since its appearance depends on the foreground object. Similarly to [40], for notational simplification, we redefine the clear foreground layer as $L_1 = (1 - A) \odot L_1$, and abbreviate the notation $W(\theta_1^i)$ to $W_1^i$. Then, the simplified equation is:

$$I^i = W_1^i L_1 + W_1^i A \odot W_0^i L_0. \tag{3}$$

Based on Eq. (3), we compare the previous layered blur model [35] and our layered blur model in the following subsections.

3.1. Previous Layered Blur Model

Wulff and Black [35] proposed a layered blur model to represent the mixture of the foreground and the background at occlusion boundaries. Their generative model assumes that blurred images consist of the composition of individually blurred foreground and background layers. To express this model, let $K_l^i \in \mathbb{R}^{n \times n}$ denote a blur matrix for each layer $l$ at frame $i$, which equals the average of warping matrices while the shutter is open:

$$K_l^i = \frac{1}{T} \int_0^T W(\theta_l^{i,t})\, dt, \tag{4}$$

where $T$ is the exposure time, and $\theta_l^{i,t}$ is the motion parameter for intra-frame capture time $t$ (the elapsed time after the shutter of frame $i$ is opened) from the reference frame (i.e., $\theta_l^{i,0} = \theta_l^i$). Then, the blurred frame $B_{\mathrm{prev}}^i \in \mathbb{R}^n$ based on [35] is as follows:

$$B_{\mathrm{prev}}^i = K_1^i L_1 + K_1^i A \odot K_0^i L_0. \tag{5}$$

In this model, the blurred frame is the composition of individually blurred layers ($K_1^i L_1$ and $K_0^i L_0$). This equation is equivalent to the layered representation of motion-blurred video in [35], although the notation is different.

3.2. Proposed Layered Blur Model

Here we propose a new layered blur model. As shown in Figure 2, the previous layered blur model does not reflect the actual blur generation process, in which a blurred image is generated by integrating the intermediate images that the camera sees during the exposure. By applying this concept to the layered blur model, a blurred frame $B^i \in \mathbb{R}^n$ is newly represented as follows:

$$B^i = \frac{1}{T} \int_0^T \left( W_1^{i,t} L_1 + W_1^{i,t} A \odot W_0^{i,t} L_0 \right) dt, \tag{6}$$

where $W_l^{i,t}$ is an abbreviated notation of $W(\theta_l^{i,t})$.


Figure 3: Comparison of the two blur generative models: (a) the previous layered blur model [35], which blurs each layer and then composites; (b) the proposed layered blur model, which composites the intermediate layers and then integrates. The order of layer composition and blur is swapped in the previous model; the proposed model coincides with the actual blur generation process.

Figure 4: Three special cases (static foreground, homogeneous background, and static background) in which the model of Wulff and Black [35] becomes equivalent to ours. In each case, the blurred images of the two models are identical even at occlusion boundaries.

In this model, the blurred frame is the integration of intermediate images, each of which is a composition of intermediate layers. This proposed model coincides with the actual blur generation process.

3.3. Comparison of Layered Blur Models

The main difference between the conventional layered blur model (Eq. (5)) and the proposed layered blur model (Eq. (6)) is the order of layer composition and blurring, as illustrated in Figure 3. While the conventional model composites the two layers after blurring each layer (i.e., after integrating each intermediate layer), our model composites the two layers first and then integrates the composited intermediate images.

Note that although Eq. (5) does not correctly model the actual physical two-layer blurring process, it becomes identical to Eq. (6) in some special cases. We analyze and compare the two models, and show the conditions under which the two models become equivalent, both analytically and empirically.

Due to the fact that

$$\left( \frac{1}{T} \int_0^T X(t)\, dt \right) \odot \left( \frac{1}{T} \int_0^T Y(t)\, dt \right) \neq \frac{1}{T} \int_0^T X(t) \odot Y(t)\, dt, \tag{7}$$

the blurred images generated by the two models in Eq. (5) and Eq. (6) differ from each other as follows:

$$B_{\mathrm{prev}}^i = K_1^i L_1 + K_1^i A \odot K_0^i L_0 \neq K_1^i L_1 + \frac{1}{T} \int_0^T \left( W_1^{i,t} A \odot W_0^{i,t} L_0 \right) dt = B^i. \tag{8}$$

Note, however, that the left and right sides of Eq. (7) become identical when $X(t)$ or $Y(t)$ is constant with respect to $t$. This leads to the following three cases that make the two blur models in Eq. (8) equivalent.

1. Background is static ($W_0^{i,t}$ is constant w.r.t. $t$).
2. Foreground is static ($W_1^{i,t}$ is constant w.r.t. $t$).
3. Background is homogeneous ($W_0^{i,t} L_0$ is constant w.r.t. $t$).

Although a homogeneous alpha map ($W_1^{i,t} A$ constant w.r.t. $t$) can also make the two models identical, this is impossible at occlusion boundaries.

Figure 4 shows the experimental comparison of the blurred images corresponding to these three situations; the two models give the same blurred images. Thus, the previous layered blur model is a good approximation of ours, but it may lead to artifacts at occlusion boundaries in general.
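To make the comparison concrete, the following toy 1D sketch (our own illustration, not the authors' code) implements Eq. (5) and Eq. (6) with integer circular shifts standing in for the warping matrices, and checks both the boundary disagreement and the static-background equivalence (case 1):

```python
import numpy as np

def shift(v, s):
    """Toy 1D 'warp': integer circular shift standing in for W(theta)."""
    return np.roll(v, s)

def blur_prev(L1, L0, A, fg_shifts, bg_shifts):
    """Eq. (5): composite AFTER blurring each layer (Wulff & Black [35])."""
    M = len(fg_shifts)
    K1L1 = sum(shift(L1, s) for s in fg_shifts) / M   # blurred foreground
    K1A  = sum(shift(A,  s) for s in fg_shifts) / M   # blurred mask
    K0L0 = sum(shift(L0, s) for s in bg_shifts) / M   # blurred background
    return K1L1 + K1A * K0L0

def blur_ours(L1, L0, A, fg_shifts, bg_shifts):
    """Eq. (6): composite each intermediate image, THEN integrate."""
    M = len(fg_shifts)
    return sum(shift(L1, f) + shift(A, f) * shift(L0, b)
               for f, b in zip(fg_shifts, bg_shifts)) / M

n = 16
L0 = np.random.rand(n)                 # background texture
A  = np.ones(n); A[:6] = 0.0           # first 6 pixels are foreground
L1 = (1.0 - A) * np.random.rand(n)     # pre-multiplied foreground (Eq. (3))

moving = list(range(4))                # 4 intra-frame samples, 1 px per step
static = [0, 0, 0, 0]

# Both layers moving: the two generative models disagree at the boundary.
print(np.abs(blur_prev(L1, L0, A, moving, moving)
           - blur_ours(L1, L0, A, moving, moving)).max())  # > 0

# Static background (case 1): the two models coincide.
print(np.abs(blur_prev(L1, L0, A, moving, static)
           - blur_ours(L1, L0, A, moving, static)).max())  # ~ 0
```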


4. Occlusion-Aware Video Deblurring

In this section, we propose an occlusion-aware deblurring method based on the proposed layered blur model.

4.1. Formulation

Given a set of blurred frames including an occluding layer, we restore the clear foreground, background, alpha blending mask, and object motions.

First, we discretize the proposed model for deblurring. We divide the exposure time into $M$ uniform samples; $\{\tau_k\}_{k=1}^M$ denote the sampled times such that $\tau_k = \frac{k-1}{M} T$. Then, our data term is defined as follows:

$$\sum_i \left\| \nabla B^i - \nabla \frac{1}{M} \sum_{k=1}^M \left( W_1^{i,\tau_k} L_1 + W_1^{i,\tau_k} A \odot W_0^{i,\tau_k} L_0 \right) \right\|_2^2, \tag{9}$$

where $\nabla$ denotes a gradient operator, which is widely used in the deblurring field to reduce ringing artifacts [8, 19, 39]. We use affine transformations for the motion parameters, and $W_l^{i,\tau_k}$ is obtained by linearly interpolating the motions of the adjacent frames $\theta_l^i$ and $\theta_l^{i+1}$ [35].

Since deblurring is a highly ill-posed problem, we add regularization terms to reduce the ambiguity. We enforce hyper-Laplacian priors [20, 21, 25] on the gradients of the foreground and background images as follows:

$$\sum_l \| \nabla L_l \|_{0.8}^{0.8}. \tag{10}$$

Also, since the alpha map is smoother than natural images, we use a Laplacian prior on the gradient of $A$:

$$\| \nabla A \|_1. \tag{11}$$

Additionally, we enforce $A$ to be close to binary values by using the following constraint:

$$A^T (1 - A). \tag{12}$$

This term prevents the restored layers from being blurry transparent images. Figure 5 shows the contribution of the binary term in Eq. (12).

The final objective function is as follows:

$$\min_{L_0, L_1, A, \Theta} \; \lambda_1 \sum_i \left\| \nabla B^i - \nabla \frac{1}{M} \sum_{k=1}^M \left( W_1^{i,\tau_k} L_1 + W_1^{i,\tau_k} A \odot W_0^{i,\tau_k} L_0 \right) \right\|_2^2 + \sum_l \| \nabla L_l \|_{0.8}^{0.8} + \lambda_2 \| \nabla A \|_1 + \lambda_3 A^T (1 - A), \tag{13}$$

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are parameters that adjust the weight of each term, and $\Theta$ denotes the set of motion parameters for each layer $l$ at each frame $i$ (i.e., $\{\theta_l^i\}$). We restrict the intensities of $L_0$, $L_1$, and $A$ to be in the range $[0, 1]$.

4.2. Initialization

Since the problem is non-convex and easily gets stuck in a local minimum, it is important to start with good initial values. We use optical flow [38] to initialize the motion parameters, as done in [35]. Although the optical flow does not handle blurred images correctly, it provides good initial values for the variables. Based on the initial optical flow, the two dominant affine transformations are estimated for each layer using RANSAC. We also initialize each layer as an average of the input frames aligned using the initial motion parameters of that layer [40]. For the alpha blending mask, from the RANSAC result, we first assign the pixels that correspond to the background of each frame to intermediate masks. Then, we initialize the alpha blending mask as an average of the intermediate masks aligned using the motion parameters of the background.
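A rough sketch of this initialization in Python with OpenCV follows. It is our approximation under stated assumptions: the paper uses the optical flow of [38], whereas this sketch substitutes Farneback flow, and `init_motions` is a hypothetical helper name:

```python
import cv2
import numpy as np

def init_motions(frame_ref, frame_i):
    """Sketch of Sec. 4.2 initialization (not the authors' code): dense optical
    flow followed by RANSAC fits of the two dominant affine motions."""
    gray0 = cv2.cvtColor(frame_ref, cv2.COLOR_BGR2GRAY)
    gray1 = cv2.cvtColor(frame_i, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray0, gray1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    h, w = gray0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    dst = src + flow.reshape(-1, 2)    # flow-displaced correspondences

    # First dominant affine motion and its inliers (subsampling the grid
    # would speed this up considerably in practice).
    theta_a, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                            ransacReprojThreshold=1.0)
    outliers = (inliers.ravel() == 0)

    # Second dominant motion, fitted to the outliers of the first.
    theta_b, _ = cv2.estimateAffine2D(src[outliers], dst[outliers],
                                      method=cv2.RANSAC,
                                      ransacReprojThreshold=1.0)
    return theta_a, theta_b, inliers.reshape(h, w)
```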

4.3. Optimization

To optimize the non-convex objective function in Eq. (13), we divide the original problem into three sub-problems and use alternating optimization techniques [8, 17, 35] that iteratively estimate each unknown while the other unknowns are fixed. In this section, we reorganize each sub-problem as a traditional deconvolution formula and describe how to solve it. Notice that we estimate the non-simplified $\{L_1, L_0, A\}$ (i.e., $I = (1 - A) \odot L_1 + A \odot L_0$), although we used the simplified formula (i.e., $I = L_1 + A \odot L_0$) in the previous section for notational simplicity.

Latent Image Estimation. In this step, we restore the clear foreground and background layers while the alpha blending mask and motion parameters are fixed. To put this sub-problem into a simpler form, we concatenate some variables. Let $L \in \mathbb{R}^{2n}$ denote the row concatenation of $L_0$ and $L_1$ (i.e., $L = [L_0; L_1]$, stacked vertically), and let $K_L^i \in \mathbb{R}^{n \times 2n}$ denote the column concatenation of $K_{L_0}^i \in \mathbb{R}^{n \times n}$ and $K_{L_1}^i \in \mathbb{R}^{n \times n}$ (i.e., $K_L^i = [K_{L_0}^i \;\; K_{L_1}^i]$) such that

$$K_{L_0}^i = \frac{1}{M} \sum_{k=1}^M \mathrm{diag}\left( W_1^{i,\tau_k} A \right) W_0^{i,\tau_k}, \qquad K_{L_1}^i = \frac{1}{M} \sum_{k=1}^M W_1^{i,\tau_k}\, \mathrm{diag}(1 - A), \tag{14}$$

where $\mathrm{diag}(\cdot)$ denotes a diagonal matrix formed from its vector argument. Since $X \odot Y$ can be represented as $\mathrm{diag}(X) Y$, $K_L^i L$ is equivalent to our layered blur model. Then, for multiple frames, let $B \in \mathbb{R}^{Nn}$ denote the row concatenation of $\{B^i\}$, and $K_L \in \mathbb{R}^{Nn \times 2n}$ the row concatenation of $\{K_L^i\}$ ($N$ is the number of observed frames).
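For intuition, the blur matrices of Eq. (14) can be assembled from per-sample warping matrices with sparse operations, e.g. in SciPy. This is a sketch under the paper's notation (the $W$ matrices given as sparse matrices; `build_K_L` is a hypothetical helper name), not the authors' implementation:

```python
import numpy as np
import scipy.sparse as sp

def build_K_L(W1_list, W0_list, A):
    """Assemble K_L^i = [K_{L0}^i  K_{L1}^i] of Eq. (14).

    W1_list, W0_list: sparse warping matrices W_1^{i,tau_k}, W_0^{i,tau_k}
    A: alpha blending mask as a flat array of length n
    """
    M = len(W1_list)
    n = A.shape[0]
    K_L0 = sp.csr_matrix((n, n))
    K_L1 = sp.csr_matrix((n, n))
    for W1, W0 in zip(W1_list, W0_list):
        K_L0 = K_L0 + sp.diags(W1 @ A) @ W0      # diag(W1 A) W0
        K_L1 = K_L1 + W1 @ sp.diags(1.0 - A)     # W1 diag(1 - A)
    K_L0 = K_L0 / M
    K_L1 = K_L1 / M
    # Applying [K_L0  K_L1] to the stacked vector L = [L0; L1] realizes the
    # discretized layered blur of Eq. (9).
    return sp.hstack([K_L0, K_L1], format="csr")
```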


Figure 5: The contribution of the binary prior term for the alpha blending mask. Panels: (a) initial alpha map, (b) Kim & Lee [19], (c) Wulff & Black [35], (d) without binary prior, (e) our result. (a) In blurred video, the alpha map obtained through optical flow (Section 4.2) does not accurately estimate the shape of the object. (b) The state-of-the-art video deblurring method [19] does not correctly estimate the motion map. (c) The previous layered deblurring method [35] restores the shape to some extent. (d) Even without the binary term (i.e., $\lambda_3 = 0$), our method estimates a more detailed shape, but it is noisy. (e) Our method with the binary term provides a clear result.

The sub-problem for latent image estimation can then be expressed as follows:

$$\min_L \; \lambda_1 \| \nabla B - \nabla K_L L \|_2^2 + \| \nabla L \|_{0.8}^{0.8}. \tag{15}$$

This optimization problem is the same as the traditional deconvolution problem with a hyper-Laplacian prior [20]. We optimize Eq. (15) using a conjugate gradient method and a lookup table in the same way as [20].

Alpha Blending Mask Estimation. In this step, we restore the clear alpha blending mask while the layer appearances and motion parameters are fixed. Similarly to the latent image estimation step, we define $B_A^i \in \mathbb{R}^n$ and $K_A^i \in \mathbb{R}^{n \times n}$ as follows:

$$B_A^i = B^i - \frac{1}{M} \sum_{k=1}^M W_1^{i,\tau_k} L_1,$$

$$K_A^i = \frac{1}{M} \sum_{k=1}^M \left( \mathrm{diag}\left( W_0^{i,\tau_k} L_0 \right) W_1^{i,\tau_k} - W_1^{i,\tau_k}\, \mathrm{diag}(L_1) \right). \tag{16}$$

Also, let $B_A \in \mathbb{R}^{Nn}$ denote the row concatenation of $\{B_A^i\}$, and $K_A \in \mathbb{R}^{Nn \times n}$ the row concatenation of $\{K_A^i\}$. Then, the sub-problem for alpha map estimation can be expressed as follows:

$$\min_A \; \lambda_1 \| \nabla B_A - \nabla K_A A \|_2^2 + \lambda_2 \| \nabla A \|_1 + \lambda_3 A^T (1 - A). \tag{17}$$

This sub-problem is also the same as a traditional $L_1$ deconvolution problem except for the last term. We optimize Eq. (17) using a primal-dual optimization method [5] as follows:

$$D^{m+1} = \frac{D^m + \sigma_D \nabla A^m}{\max\left( 1, |D^m + \sigma_D \nabla A^m| \right)},$$

$$A^{m+1} = \arg\min_A \; \lambda_2 \frac{\| A - (A^m - \sigma_A \nabla^T \bar{D}^{m+1}) \|_2^2}{2\tau} + \lambda_1 \| \nabla B_A - \nabla K_A A \|_2^2 + \lambda_3 A^T (1 - A),$$

$$\bar{D}^{m+1} = 2 D^{m+1} - D^m, \tag{18}$$

where $m$ denotes the iteration number, $D \in \mathbb{R}^n$ denotes the dual variable, and $\sigma_D = 10$ and $\sigma_A = 0.0125$ are the parameters for each update step. We apply the conjugate gradient method to optimize $A^{m+1}$.
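As an illustration of the dual step in Eq. (18), the sketch below implements a forward-difference gradient and the projected dual ascent in NumPy; the primal update $A^{m+1}$ would be obtained by conjugate gradient on its least-squares terms, as stated above. This is our sketch, not the authors' code, and `solve_primal_by_cg` is a hypothetical helper:

```python
import numpy as np

def grad(A2d):
    """Forward-difference image gradient; returns a (2, H, W) array."""
    gx = np.zeros_like(A2d); gx[:, :-1] = A2d[:, 1:] - A2d[:, :-1]
    gy = np.zeros_like(A2d); gy[:-1, :] = A2d[1:, :] - A2d[:-1, :]
    return np.stack([gx, gy])

def dual_step(D, A2d, sigma_D=10.0):
    """Dual update of Eq. (18): ascend along grad A, then project pointwise,
    i.e. divide by max(1, |D + sigma_D * grad A|)."""
    P = D + sigma_D * grad(A2d)
    mag = np.maximum(1.0, np.sqrt((P ** 2).sum(axis=0, keepdims=True)))
    return P / mag

# One primal-dual iteration (the primal update is left as a CG inner solve):
# D_new = dual_step(D, A)
# D_bar = 2.0 * D_new - D                 # extrapolation of Eq. (18)
# A     = solve_primal_by_cg(A, D_bar)    # hypothetical CG solver for A^{m+1}
```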

Motion Parameter Estimation. In this step, we estimate the motion parameters while the layer appearances and alpha blending mask are fixed. Focusing on the terms dependent on $\Theta$ in Eq. (13), we have:

$$\min_\Theta \; \sum_i \left\| \nabla B^i - \nabla \frac{1}{M} \sum_{k=1}^M \left( W_1^{i,\tau_k} L_1 + W_1^{i,\tau_k} A \odot W_0^{i,\tau_k} L_0 \right) \right\|_2^2. \tag{19}$$

We solve this equation using the Nelder-Mead simplex method [22], which can be implemented simply with the Matlab built-in function "fminsearch".
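In Python, the analogue of Matlab's fminsearch is SciPy's Nelder-Mead. The sketch below shows how Eq. (19) could be minimized this way; `synthesize` is a hypothetical placeholder that renders a blurred frame from the fixed layers and candidate motions via the discretized model:

```python
import numpy as np
from scipy.optimize import minimize

def motion_cost(theta_flat, blurred, synthesize):
    """Gradient-domain data term of Eq. (19) for candidate motion parameters."""
    cost = 0.0
    for i, B in enumerate(blurred):
        pred = synthesize(theta_flat, i)   # blurred frame i from Eq. (9)
        r = B - pred
        cost += (np.diff(r, axis=1) ** 2).sum() + (np.diff(r, axis=0) ** 2).sum()
    return cost

# Derivative-free simplex search, the counterpart of "fminsearch";
# theta0 stacks the affine parameters of both layers over all frames.
# res = minimize(motion_cost, theta0, args=(blurred, synthesize),
#                method="Nelder-Mead")
```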

4.4. Implementation Details

To accelerate the algorithm, we optimize the objective function in a coarse-to-fine manner, where the scale factor of the image pyramid is 0.8. The parameters used in the optimization are fixed for all experiments as $\lambda_1 = 2500/N$ and $\lambda_3 = \lambda_2/20000$, where $N$ denotes the number of frames; $\lambda_2$ was adjusted to a value between $0.05\lambda$ and $0.06\lambda$ depending on the shape of the occluding object. The camera duty cycle, which is the ratio of the exposure time $T$ to the frame interval, is given by the camera setting (0.5). An overview of our algorithm is summarized in Algorithm 1.


Algorithm 1: Overview of the proposed deblurring method

Input: A set of blurred frames $\{B^i\}$
Output: Latent layers $L$, mask $A$, and motions $\Theta$

1: Initialize $L$, $A$, $\Theta$ using optical flow.
2: Build the image pyramid.
3: for iteration = 1:3
4:   Solve for $L$ (Eq. (15)).
5:   Solve for $A$ (Eq. (18)).
6:   Solve for $\Theta$ (Eq. (19)).
7: end
8: Propagate the variables to the next pyramid level if it exists.
9: Repeat steps 3-8 from the coarsest to the finest pyramid level.

5. Experimental Results

We compare our deblurring results with those of the state-of-the-art video deblurring method [19] and the previous layered deblurring method [35]. Since the source code of the previous method [35] is not available, we focus on comparing our results with the published results of the previous methods. Please see our supplementary material for more results.

The image sequences consist of 5 to 10 images. We processed the sequences on a desktop computer with an Intel i7-6700K CPU and 64GB of memory. It took 10 minutes per image to process 640x480 images with a non-optimized Matlab implementation.

Figure 1 and Figure 5 show the deblurring results for images with severe occlusions. Since the detailed structure of the bicycle is mixed with the background by the occlusion, the generalized deblurring method [19], which does not consider the occlusion, attempts to restore mainly the background. Also, since both the foreground and background are blurred, the previous layered deblurring method [35] does not restore the details of the face and the bicycle properly. In contrast, the proposed method restores the detailed structure of the human face and the bicycle handle by using the carefully designed model.

Figure 6 shows the comparison with recent methods. The "sign" and "car" sequences correspond to situations where both the foreground and the background are moving. In the results of the previous methods [19, 35], we can see artifacts caused by segmentation errors at the boundaries; our result shows better performance at occlusion boundaries. The "hand" sequence belongs to a situation where the previous model and our model are the same because the background is static. However, even in this situation, our method produces better results because it uses effective regularization and optimization.

Additionally, our method can also achieve the effect of layer separation [40]. It can remove occluding objects even when the images are blurry. Figure 7 shows not only that the fence occluding the tiger is removed, but also that the image is clearly restored. This image sequence was created with a physics-based renderer [16].

6. Discussion

In this section, we analyze the blur kernel at object boundaries, which exhibits distinctive characteristics that cannot be captured by conventional blur models. In addition, we discuss the limitations and future work of the proposed method.

6.1. Blur Kernel at Occlusion Boundaries

Visualizing the proposed blur model as a traditional blur kernel yields an interesting result at layer boundaries. Figure 8 illustrates the blur kernel of each model at boundaries where the foreground is moving to the left and the background is moving to the right.

Early works that handle abruptly-varying blur find the kernel of either the foreground or the background [4, 9, 11, 17, 23, 29], as in Figure 8(a), or find an ambiguous kernel between the two [18, 19].

In the model of Wulff and Black [35], a pixel at the boundary is blurred by both the foreground and background kernels, while each kernel is truncated or diminished in intensity compared to the kernel without occlusion. The foreground kernel is shortened in length by a factor of $1 - \alpha$ and the background kernel is reduced in intensity in proportion to $\alpha$, as shown in Figure 8(b), where $\alpha$ is the blurred mask value of the corresponding pixel.

In the proposed model, the foreground kernel experiences a truncation equal to that of [35], but the background kernel is truncated instead of being weakened by a factor of $\alpha$, as shown in Figure 8(c). In representing the occlusion of the background by a foreground object, the proposed model correctly models the occlusion event with the length of the blur kernel.
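As a 1D illustration of this distinction (our own, under the simplifying assumption that the boundary pixel sees the background for a single contiguous fraction $\alpha$ of the exposure), the background component of the temporal kernel in the two models can be written as:

$$k^{\mathrm{bg}}_{\mathrm{prev}}(t) = \frac{\alpha}{T}, \;\; t \in [0, T] \qquad \text{vs.} \qquad k^{\mathrm{bg}}_{\mathrm{ours}}(t) = \frac{1}{T}, \;\; t \in [(1-\alpha)T, \, T].$$

Both integrate to $\alpha$ over the exposure, but the kernel of [35] keeps the full temporal support at reduced intensity, whereas ours keeps full intensity over a truncated support, so the trajectory it sweeps, and hence the spatial length of the blur kernel, is shorter.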

Thus, the blur kernels of the foreground and the background can be separate from or overlap each other according to the relative velocity of the layers. These distinctive kernel characteristics cannot be captured by conventional blur models.

6.2. Limitations and Future Work

In this study, several assumptions are made for our deblurring method. We assumed that the camera duty cycle is given for every frame, and that the object motion is smooth within a frame. Since we fix the camera setting, and the exposure time is short enough in videos, these assumptions can be justified to some extent. For videos without exposure information, the estimation of the duty cycle [27] should be combined with our method.


Figure 6: Comparison with recent methods on the "hand", "car", and "sign" sequences. Panels: (a) input frame (representative), (b) Kim & Lee [19], (c) Wulff & Black [35], (d) ours. The previous methods [19, 35] cause segmentation errors and consequent artifacts at occlusion boundaries, but our method yields clear results.

Also, we parameterized the object motion as an affine motion, which limits the method in dealing with general object motions. Although a projective motion can be applied to our model, further consideration is required for additional problems such as occlusions within a layer or brightness constancy. Combining our method with a non-parametric motion deblurring method [18, 19] is one of our future directions.


Figure 7: Occlusion removal result. Panels: representative input frame, background, foreground, and alpha map. Our deblurring method estimates the clean foreground and background separately, so it is also possible to achieve the effect of layer separation [40]. When a tiger is blurred while being occluded by a fence, we can separate the tiger in the background from the foreground fence, and restore clear images of both at the same time.

Figure 8: Illustration of the blur kernels (foreground kernel, background kernel, and combined kernel) at boundaries according to each model. (a) Early models. (b) Wulff and Black [35]. (c) Proposed model. The proposed model correctly models the occlusion event with the length of the blur kernel instead of its intensity [35], according to the blurred mask value $\alpha$.

In addition, our method currently assumes a scene with two layers. Extending it to multiple layers requires additional consideration of the occlusions involving multiple layers and of the depth order of the layers. Solving this problem is another of our future directions.

7. Conclusion

In this paper, we proposed an occlusion-aware video deblurring method based on a new layered blur model, allowing accurate restoration of object boundaries. We addressed the limitation of the conventional layered blur model theoretically and experimentally, and enhanced the model by changing the order of layer composition and blur so that it follows the actual blur generation process. Based on this model, the proposed occlusion-aware deblurring method obtains more accurate latent images, object motions, and segmentation masks. We also showed that our model exactly extracts the contribution of occlusion from the original kernel, capturing how the foreground and background kernels overlap or separate at boundaries. Experimental results on synthetic and real blurred videos demonstrate the outstanding performance of the proposed method.

References

[1] L. Bar, B. Berkels, M. Rumpf, and G. Sapiro. A variational framework for simultaneous motion estimation and restoration of motion-blurred video. In IEEE International Conference on Computer Vision (ICCV), 2007.
[2] M. Ben-Ezra and S. K. Nayar. Motion deblurring using hybrid imaging. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[3] J.-F. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring using multiple images. Journal of Computational Physics, 228(14):5057-5071, 2009.
[4] A. Chakrabarti, T. Zickler, and W. T. Freeman. Analyzing spatially-varying blur. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[5] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120-145, 2011.
[6] J. Chen, L. Yuan, C.-K. Tang, and L. Quan. Robust dual motion deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[7] S. Cho, H. Cho, Y.-W. Tai, and S. Lee. Registration based non-uniform motion deblurring. Computer Graphics Forum, 31(7):2183-2192, 2012.
[8] S. Cho and S. Lee. Fast motion deblurring. ACM Transactions on Graphics (TOG), 28(5):145, 2009.
[9] S. Cho, Y. Matsushita, and S. Lee. Removing non-uniform motion blur from images. In IEEE International Conference on Computer Vision (ICCV), 2007.
[10] S. Cho, J. Wang, and S. Lee. Video deblurring for hand-held cameras using patch-based synthesis. ACM Transactions on Graphics (TOG), 31(4):64, 2012.
[11] S. Dai and Y. Wu. Removing partial blur in a single image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[12] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. ACM Transactions on Graphics (TOG), 25(3):787-794, 2006.
[13] A. Gupta, N. Joshi, C. L. Zitnick, M. Cohen, and B. Curless. Single image deblurring using motion density functions. In European Conference on Computer Vision (ECCV), 2010.
[14] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Scholkopf. Fast removal of non-uniform camera shake. In IEEE International Conference on Computer Vision (ICCV), 2011.
[15] Z. Hu, L. Xu, and M.-H. Yang. Joint depth estimation and camera shake removal from single blurry image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[16] W. Jakob. Mitsuba renderer, 2010. http://www.mitsuba-renderer.org.
[17] T. H. Kim, B. Ahn, and K. M. Lee. Dynamic scene deblurring. In IEEE International Conference on Computer Vision (ICCV), 2013.
[18] T. H. Kim and K. M. Lee. Segmentation-free dynamic scene deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[19] T. H. Kim and K. M. Lee. Generalized video deblurring for dynamic scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[20] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In Advances in Neural Information Processing Systems (NIPS), 2009.
[21] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[22] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization, 9(1):112-147, 1998.
[23] H. S. Lee and K. M. Lee. Dense 3D reconstruction from severely blurred images using a single moving camera. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[24] A. Levin. Blind motion deblurring using image statistics. In Advances in Neural Information Processing Systems (NIPS), 2006.
[25] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 29(9):1647, 2007.
[26] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding and evaluating blind deconvolution algorithms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[27] Y. Li, S. B. Kang, N. Joshi, S. M. Seitz, and D. P. Huttenlocher. Generating sharp panoramas from motion-blurred videos. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[28] J. Pan, Z. Hu, Z. Su, H.-Y. Lee, and M.-H. Yang. Soft-segmentation guided object motion deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[29] C. Paramanand and A. N. Rajagopalan. Non-uniform motion deblurring for bilayer scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[30] A. Sellent, C. Rother, and S. Roth. Stereo video deblurring. In European Conference on Computer Vision (ECCV), 2016.
[31] Q. Shan, J. Jia, and A. Agarwala. High-quality motion deblurring from a single image. ACM Transactions on Graphics (TOG), 27(3):73, 2008.
[32] Y.-W. Tai, H. Du, M. S. Brown, and S. Lin. Correction of spatially varying image and video motion blur using a hybrid camera. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(6):1012-1028, 2010.
[33] H. Takeda and P. Milanfar. Removing motion blur with space-time processing. IEEE Transactions on Image Processing (TIP), 20(10):2990-3000, 2011.
[34] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniform deblurring for shaken images. International Journal of Computer Vision (IJCV), 98(2):168-186, 2012.
[35] J. Wulff and M. J. Black. Modeling blurred video with layers. In European Conference on Computer Vision (ECCV), 2014.
[36] L. Xu and J. Jia. Two-phase kernel estimation for robust motion deblurring. In European Conference on Computer Vision (ECCV), 2010.
[37] L. Xu and J. Jia. Depth-aware motion deblurring. In IEEE International Conference on Computational Photography (ICCP), 2012.
[38] L. Xu, J. Jia, and Y. Matsushita. Motion detail preserving optical flow estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(9):1744-1757, 2012.
[39] L. Xu, S. Zheng, and J. Jia. Unnatural L0 sparse representation for natural image deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[40] T. Xue, M. Rubinstein, C. Liu, and W. T. Freeman. A computational approach for obstruction-free photography. ACM Transactions on Graphics (TOG), 34(4):79, 2015.