


The Image Matting Method with Zero-One Regularization

Junbin Gao and Manoranjan Paul
School of Computing and Mathematics
Charles Sturt University
Bathurst, NSW 2795, Australia
{jbgao,rpaul}@csu.edu.au

Jun Liu
Department of Computer Science and Engineering
Arizona State University
Tempe, AZ 85287, USA
[email protected]

Abstract

Image matting refers to the problem of accurately extracting foreground objects in images and video. Recent work in natural image matting [11, 7] relies on local and manifold smoothness assumptions on foreground and background colors, from which a cost function is established; a closed-form solution can then be derived given a certain degree of user input. In this paper, we present a framework for formulating new regularization terms, the so-called zero-one regularizers, that yield robust solutions. We illustrate the new algorithm on standard benchmark images and obtain very comparable results.

Keywords: Image Matting; Local Tangent Space Alignment; Manifold Learning; Locally Linear Embedding; Laplacian Matrix

1. Introduction

Image matting refers to the problem of softly extracting the foreground object from a single image. Image matting is important in both computer vision and graphics, and is a key technique in many image/video editing and film production applications. Approaches to digital matting have been studied extensively in the literature; a 2007 survey article [20] provides a comprehensive review of existing image and video matting algorithms and systems, with an emphasis on recently proposed advanced techniques. Most existing matting techniques depend on the so-called alpha matte. Mathematically, the alpha matte is based on the following model assumption:

$$I_i = \alpha_i F_i + (1 - \alpha_i) B_i \qquad (1)$$

where $\alpha_i$, $I_i$, $F_i$ and $B_i$ are, respectively, the alpha matte value, the observed image color, the foreground color and the background color at a given pixel $i$. When the alpha matte value $\alpha_i$ is assumed to be exactly 0 or 1, it is called a hard matte, which is what the matting process ultimately desires. Most digital matting approaches instead use the relaxed assumption that $\alpha_i$ lies between 0 and 1, i.e., the so-called soft matte.
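As a concrete illustration of the compositing model (1), a minimal Python/NumPy sketch follows (the paper's own experiments use MATLAB; the function name and array shapes here are our illustrative assumptions):

```python
import numpy as np

# Composite a foreground over a background with a soft alpha matte,
# following the per-pixel model I_i = alpha_i * F_i + (1 - alpha_i) * B_i.
def composite(alpha, F, B):
    # alpha: (H, W) matte in [0, 1]; F, B: (H, W, 3) RGB images.
    a = alpha[..., None]          # broadcast alpha over the color channels
    return a * F + (1.0 - a) * B

# Tiny synthetic example: hard foreground, hard background, two soft pixels.
F = np.ones((2, 2, 3)) * np.array([1.0, 0.0, 0.0])   # red foreground
B = np.ones((2, 2, 3)) * np.array([0.0, 0.0, 1.0])   # blue background
alpha = np.array([[1.0, 0.0], [0.5, 0.25]])
I = composite(alpha, F, B)       # observed image under model (1)
```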

In a matting problem, given the image pixel information $I_i$ for all pixels, the goal is to estimate $\alpha_i$, $F_i$ and $B_i$ simultaneously. This is obviously a severely underconstrained problem, and many existing matting algorithms and systems require a certain degree of user interaction to extract a good matte. For example, a so-called trimap is usually supplied to the matting algorithm, indicating definite foreground, definite background and unknown regions. To infer the alpha matte values in the unknown regions, Bayesian matting [6] uses a progressively moving window marching inward from the known regions, while [2] proposed calculating the alpha matte through the geodesic distance from a pixel to the known regions. Among the propagation-based approaches, the Poisson matting algorithm [18] assumes the foreground and background colors are smooth in a narrow band of unknown pixels and then solves a homogeneous Laplacian system; a similar algorithm based on random walks is proposed in [8]. The closed-form matting approach [11] introduced the matting Laplacian matrix under the assumption that foreground and background colors can be fit with linear models in local windows, which leads to a quadratic cost function in alpha that can be minimized globally. Minimizing this quadratic cost function is equivalent to solving a linear system, which is often time-consuming when the image is large. In a recent work [9], He et al. proposed a fast matting scheme using large-kernel matting Laplacian matrices. Reviews and assessments of more matting techniques can be found at the alpha matting evaluation website http://www.alphamatting.com/.

The majority of matting algorithms rely on a certain amount of user scribble information to succeed; too little information may result in matting failure, as demonstrated by the examples in [9]. Many of the current matting approaches, such as Laplacian matting [11, 12], LTSA matting [7], and LLE matting [1], formulate the matting problem as an unconstrained quadratic optimization problem (except for user-specified scribble constraints, which can be removed by variable substitution), although one formulation proposed in [7] involves the constraint $0 \le \alpha_i \le 1$. In this paper, we introduce a different, robust quasi-hard matte regularization to reduce the dependency on user-specified constraints (such as scribbles or a bounding rectangle) as used in [11, 12, 16].

The paper is organized as follows. Section 2 briefly reviews three main image matting algorithms: Laplacian matrix matting [11], LTSA alignment matrix matting [21] and LLE alignment matrix matting [1]. Section 3 is dedicated to formulating the new regularization and proposes a fast optimization algorithm for the new formulation. In Section 4, we present several examples of applying the new approach to the benchmark images of [11]. We draw our conclusions in Section 5, with suggestions for further exploration in image matting.

2. Review of Matrix-Based Matting Approaches

Throughout the paper we will use the following notation: for a given image $I$, denote a pixel by $i$ and its corresponding RGB color vector by $I_i$.

2.1. Matting Laplacian Matrix Approach

The matting Laplacian matrix approach was proposed in [11]. The matrix defines a color-information affinity specially designed for image matting. The success of the approach relies on the so-called color line model: the foreground colors in a local window lie on a single line in RGB color space, which is described by a local piecewise linear transformation of the image $I$,

$$\alpha_{i_j} = a_i^T I_{i_j} + b_i, \quad \forall\, i_j \in w_i = \{i_1, \ldots, i_K\}$$

where $w_i = \{i_1, \ldots, i_K\}$, containing the $K$ neighboring pixels of $i$, denotes a local window of a certain size centered at pixel $i$, and $a_i$ and $b_i$ are assumed constant over the window. Under this model assumption, an objective function $E(\alpha, A, b)$ can be defined to encourage the alpha values to fit the model:

$$E(\alpha, A, b) = \sum_i \sum_{j=1}^{K} \left[ (\alpha_{i_j} - a_i^T I_{i_j} - b_i)^2 + \varepsilon\, a_i^T a_i \right] \qquad (2)$$

where $\varepsilon > 0$ is a given parameter controlling numerical stability, $\alpha = (\alpha_1, \ldots, \alpha_N)^T$ is the vector of all alpha values, $A = [a_1, \ldots, a_N]$ and $b = [b_1, \ldots, b_N]^T$, with $N$ the number of pixels in the image $I$.

By minimizing the cost function with respect to $(A, b)$, a quadratic function of $\alpha$ is obtained:

$$E(\alpha) = \alpha^T L \alpha$$

Please refer to [11] for the details of calculating the elements of the matting Laplacian matrix $L$.
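For orientation only, the window-by-window accumulation of $L$ described in [11] can be sketched in dense Python/NumPy form as below; this is not the authors' code, border handling is simplified, and a practical implementation would store $L$ sparsely:

```python
import numpy as np

def matting_laplacian(img, win_r=1, eps=1e-5):
    # img: (H, W, 3) float image in [0, 1]; eps is the stability
    # parameter of (2). Returns the dense N x N matting Laplacian.
    H, W, _ = img.shape
    N = H * W
    L = np.zeros((N, N))
    idx = np.arange(N).reshape(H, W)
    flat = img.reshape(N, 3)
    for y in range(win_r, H - win_r):
        for x in range(win_r, W - win_r):
            win = idx[y - win_r:y + win_r + 1, x - win_r:x + win_r + 1].ravel()
            K = win.size                      # window size (2*win_r + 1)^2
            colors = flat[win]                # K x 3 colors in this window
            mu = colors.mean(axis=0)
            d = colors - mu
            cov = d.T @ d / K                 # 3 x 3 window color covariance
            inv = np.linalg.inv(cov + (eps / K) * np.eye(3))
            G = (1.0 + d @ inv @ d.T) / K     # pairwise window affinities
            L[np.ix_(win, win)] += np.eye(K) - G
    return L
```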

Combining this objective function with user-specified constraints such as a trimap $\Omega$, the whole objective function to be optimized is

$$F(\alpha) = \alpha^T L \alpha + \lambda (\alpha - \alpha_\Omega)^T D_\Omega (\alpha - \alpha_\Omega) \qquad (3)$$

where $\alpha_\Omega$ holds the trimap or user scribble values, $D_\Omega$ is a diagonal matrix whose entries are one for the constrained pixels $\Omega$ and zero otherwise, and $\lambda$ is a large number enforcing a heavy penalty on the alpha components corresponding to the constrained pixels $\Omega$.

Eqn. (3) defines an unconstrained optimization problem which can be solved easily; the user-specified information is enforced by choosing a large regularizer $\lambda$.
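Since setting the gradient of (3) to zero gives the sparse linear system $(L + \lambda D_\Omega)\alpha = \lambda D_\Omega \alpha_\Omega$, a brief Python/SciPy sketch of the solve might read (function and variable names are ours):

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import spsolve

def solve_smoothed_matting(L, constrained, alpha_constraint, lam=100.0):
    # L: (N, N) matting Laplacian (dense or sparse);
    # constrained: boolean (N,) mask of trimap/scribble pixels (Omega);
    # alpha_constraint: (N,) target alpha on Omega (ignored elsewhere).
    D = diags(constrained.astype(float))
    A = csr_matrix(L) + lam * D
    b = lam * (D @ alpha_constraint)
    return spsolve(A, b)
```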

2.2. LTSA Alignment Matrix Approach

Local Tangent Space Alignment (LTSA) is a method for manifold learning which can efficiently learn a nonlinear embedding into low-dimensional coordinates from high-dimensional data, and can also reconstruct high-dimensional coordinates from the embedding coordinates. The LTSA algorithm consists of two main steps: (1) conduct a PCA in the neighborhood of each data point; (2) globally align the embedding under the local PCA coordinates.

In order to incorporate pixel structure, [7] defines data neighborhoods using the spatial affinity of pixel locations. That is, for each pixel $i$, the neighborhood of $I_i$ consists of all the RGB vectors $I_{i_j}$ with $i_j$ in the neighborhood of $i$ in terms of a local window $w_i = \{i_1, \ldots, i_K\}$. For each pixel $i$, define a subset

$$X_i = \{ I_{i_j} \mid i_j \in w_i,\; j = 1, \ldots, K \}. \qquad (4)$$

Applying classical PCA [4] to $X_i$, there exists a matrix $Q_i$ with $d \le 2$ orthonormal columns such that

$$I_{i_j} = \bar{I}_i + Q_i \theta_j^{(i)} + \xi_j^{(i)} \qquad (5)$$

where $\xi_j^{(i)} = (E - Q_i Q_i^T)(I_{i_j} - \bar{I}_i)$, with $E$ the identity matrix, is the reconstruction error, $\theta_j^{(i)}$ are the local coordinates on the local tangent space of the color space, and $\bar{I}_i$ is the mean color vector. As in the LTSA algorithm, $Q_i$ can be obtained by computing the SVD of $X_i$. For more details refer to [21, 7].
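As a small sketch of this local PCA step (assuming window colors are stacked as rows; names are ours):

```python
import numpy as np

def local_tangent_basis(X_i, d=2):
    # X_i: (K, 3) RGB colors of the window around pixel i.
    # Returns Q (3, d) with orthonormal columns, the local coordinates
    # theta (d, K), and the window mean, matching the terms of (5).
    mean = X_i.mean(axis=0)
    centered = X_i - mean                      # rows are I_{i_j} - mean
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    Q = Vt[:d].T                               # top-d principal directions
    theta = (centered @ Q).T                   # (d, K) tangent coordinates
    return Q, theta, mean
```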

For the purpose of image matting, the authors of [7] reconstruct the global matting feature $\alpha_i$ from the local coordinates $\theta_j^{(i)}$, based on the local color information on the manifold defined by the local windows. Specifically, the prospective matting values $\alpha_{i_j}$ satisfy the following set of equations, according to the local structures determined by the $\theta_j^{(i)}$:

$$\alpha_{i_j} = \bar{\alpha}_i + L_i \theta_j^{(i)} + \varepsilon_j^{(i)}, \quad j = 1, \ldots, K; \; i = 1, \ldots, N$$

where $\bar{\alpha}_i$ is the mean of the $\alpha_{i_j}$ ($j = 1, \ldots, K$). Denote $\alpha_i = [\alpha_{i_1}, \ldots, \alpha_{i_K}]$, $\Theta_i = [\theta_1^{(i)}, \ldots, \theta_K^{(i)}]$ and $E_i = [\varepsilon_1^{(i)}, \ldots, \varepsilon_K^{(i)}]$; then the above linear system can be written in matrix form as

$$\alpha_i = \frac{1}{K} \alpha_i e e^T + L_i \Theta_i + E_i$$

where $e = (1, \ldots, 1)^T$ is a $K$-dimensional vector. Similar to LTSA, the algorithm seeks the $\alpha_i$ and the local affine transformations $L_i$ that minimize the reconstruction errors $\varepsilon_j^{(i)}$, i.e.,

$$\min_{L_i, \alpha_i} \sum_i \|E_i\|^2 = \sum_i \left\| \alpha_i \left( E - \frac{1}{K} e e^T \right) - L_i \Theta_i \right\|^2 \qquad (6)$$

Denote $\alpha = (\alpha_1, \ldots, \alpha_N)^T$. As in the matting Laplacian formulation, the above problem can be handled by first minimizing with respect to each $L_i$; the minimizer $\alpha$ is then the solution of the following reduced problem after some notational manipulation:

$$\min_\alpha \sum_i \|E_i\|^2 = \alpha^T B \alpha$$

where $B$ is called the LTSA alignment matrix and can be calculated by a simple iterative procedure; see [21, 7]. Like the matting Laplacian matrix $L$, the matrix $B$ is positive semi-definite.
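Assuming the local coordinates from the previous sketch, the accumulation of $B$ can be illustrated in dense form as follows; this is a rough sketch of the standard LTSA alignment construction, not the authors' implementation:

```python
import numpy as np

def ltsa_alignment_matrix(N, windows, thetas):
    # windows: list of (K,) index arrays (the w_i); thetas: matching
    # (d, K) local coordinates. Accumulates the dense alignment matrix B.
    B = np.zeros((N, N))
    for win, theta in zip(windows, thetas):
        K = win.size
        Hc = np.eye(K) - np.ones((K, K)) / K   # centering matrix
        P = np.linalg.pinv(theta) @ theta      # projector onto the row
                                               # space of theta
        B[np.ix_(win, win)] += Hc @ (np.eye(K) - P) @ Hc
    return B
```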

In [7], three different formulations are proposed by combining the above objective function with different constraints. The following is one example: the matting values $\alpha$ are sought by

$$\min_{\substack{\Gamma_\Omega(\alpha) = \alpha_\Omega \\ 0 \preceq \alpha \preceq e}} F(\alpha) = \sum_i \|E_i\|^2 = \alpha^T B \alpha \qquad (7)$$

where $\Gamma_\Omega$ is the operator mapping $\alpha$ to the alpha values $\alpha_\Omega$ over the user-specified pixels $\Omega$.

2.3. LLE Alignment Matrix Approach

Locally linear embedding (LLE) is an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs; see [17]. LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.

The argument made in [1] for using the LLE algorithm in image matting is that, as with the LTSA approach, the local linear structure among the color information over a local window can be transferred to the matting value space. This intuition is what allows the LLE alignment matrix to succeed in image matting.

In the sequel, we still use $X_i = \{I_{i_j} \mid i_j \in w_i,\; j = 1, \ldots, K\}$ to denote the subset of color vectors over the local window of pixel $i$. Note that the pixel color $I_i$ is contained in $X_i$. Under the LLE assumption, the color vector $I_i$ at pixel $i$ can be approximated by a linear combination, with the so-called reconstruction weights $w_{ij}$, of its $K - 1$ nearest neighbors $X_i \setminus \{I_i\}$. Hence, LLE fits a hyperplane through $I_i$ and its nearest neighbors in the color manifold defined over the image pixels. The fitting is achieved by solving the following optimization problem

$$\min_W \sum_i \Big\| I_i - \sum_{j=1}^{K} w_{ij} I_{i_j} \Big\|^2$$

under the condition $\sum_{j=1}^{K} w_{ij} = 1$. For the sake of simplicity, we assume that $w_{ij} = 0$ when $I_{i_j} = I_i$. Assuming that the manifold is locally linear, this local linearity may be preserved in the space of matting values. Once the weights $W$ have been determined, the matting values $\alpha$ can be determined by minimizing the following objective function

$$\min_\alpha F(\alpha) = \sum_i \Big\| \alpha_i - \sum_{j=1}^{K} w_{ij} \alpha_{i_j} \Big\|^2 = \alpha^T R \alpha \qquad (8)$$

where $R = (E - W)^T (E - W)$ is called the LLE alignment matrix. In the standard LLE algorithm, (8) is usually solved with additional constraints such as $\|\alpha\|^2 = 1$, which results in an eigenvector problem. Instead of such standard constraints, in image matting we can formulate a matting solution by including a smoothing constraint as in (3) or a matting constraint as in (7).
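A minimal sketch of the two LLE steps in Python/NumPy (the regularization of the local Gram matrix is our own numerical safeguard, not from [1]):

```python
import numpy as np

def lle_weights(center, neighbors, reg=1e-6):
    # center: (3,) color of pixel i; neighbors: (K-1, 3) colors of the
    # other window pixels. Solves the constrained least squares for the
    # sum-to-one reconstruction weights in the standard closed form.
    Z = neighbors - center                  # shift so the center is origin
    G = Z @ Z.T                             # (K-1) x (K-1) local Gram matrix
    G = G + reg * (np.trace(G) + 1.0) * np.eye(len(G))  # regularize
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()

def lle_alignment_matrix(W):
    # W: (N, N) weight matrix whose row i holds the weights of pixel i
    # over its neighbors (zeros elsewhere); returns R = (E - W)^T (E - W).
    E = np.eye(W.shape[0])
    return (E - W).T @ (E - W)
```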

3. Zero-One Regularized Matting and Its Efficient Algorithm

3.1. Zero-One Regularization

Each of the three methods introduced in the last section involves a quadratic objective function with some additional constraints. As already pointed out, in practice we favor a constraint that results in hard matting values, meaning $\alpha_i = 0$ or $\alpha_i = 1$. Inspired by recent research on sparse modeling [13, 10], hard matting values may be enforced by applying a sparsity-inducing penalty. The LASSO [19] is one of the most widely used sparsity-inducing penalties, and many practical algorithms have been proposed for it. In this paper, we likewise propose using the $\ell_1$ penalty to enforce hard matting values. The $\ell_1$ penalty is combined with the common objective functions of the three approaches reviewed in the last section. Thus, the matting problem is formulated as finding a solution to the following optimization problem

$$\min_{\Gamma_\Omega(\alpha) = \alpha_\Omega} F(\alpha) = \alpha^T \Phi \alpha + \lambda_1 |\alpha|_1 + \lambda_2 |e - \alpha|_1$$

where $e$ is the vector of all ones, $\Phi$ is a positive semi-definite matrix such as the matting Laplacian matrix $L$, $\lambda_1 > 0$ and $\lambda_2 > 0$ are two user-specified regularizers, $|\alpha|_1 = \sum_i |\alpha_i|$ and $|e - \alpha|_1 = \sum_i |1 - \alpha_i|$.

Given that the constraint $\Gamma_\Omega(\alpha) = \alpha_\Omega$ is an equality on the variables, it can be removed by variable substitution. Without loss of generality, suppose $\alpha$ has been ordered as $\alpha = (\alpha_{\Omega^c}, \alpha_\Omega)$, where $\alpha_\Omega$ holds the user-given scribble or trimap values. Accordingly, we decompose $\Phi$ into

$$\Phi = \begin{pmatrix} \Phi_{\Omega^c} & \Phi_{12} \\ \Phi_{21} & \Phi_\Omega \end{pmatrix}$$

where $\Phi_{12}^T = \Phi_{21}$. Substituting this decomposition into $F(\alpha)$ results in the following unconstrained problem with respect to $\alpha_{\Omega^c}$:

$$\min_{\alpha_{\Omega^c}} \; \alpha_{\Omega^c}^T \Phi_{\Omega^c} \alpha_{\Omega^c} + 2\, \alpha_\Omega^T \Phi_{21} \alpha_{\Omega^c} + \lambda_1 |\alpha_{\Omega^c}|_1 + \lambda_2 |e - \alpha_{\Omega^c}|_1$$

In the sequel, without loss of generality, we consider a general unconstrained formulation with a linear term:

$$\min_\alpha F(\alpha) = \alpha^T \Phi \alpha + b^T \alpha + \lambda_1 |\alpha|_1 + \lambda_2 |e - \alpha|_1 \qquad (9)$$

where $b$ is a known vector of dimension $N$. We call this type of penalty the zero-one LASSO penalty.
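For reference, evaluating the composite objective (9) and the gradient of its smooth part is straightforward; a small Python sketch with our own names:

```python
import numpy as np

def zero_one_objective(alpha, Phi, b, lam1, lam2):
    # F(alpha) = alpha^T Phi alpha + b^T alpha
    #            + lam1*|alpha|_1 + lam2*|e - alpha|_1   as in (9)
    smooth = alpha @ (Phi @ alpha) + b @ alpha
    penalty = lam1 * np.abs(alpha).sum() + lam2 * np.abs(1.0 - alpha).sum()
    return smooth + penalty

def grad_smooth(alpha, Phi, b):
    # Gradient of the smooth part only (Phi assumed symmetric PSD).
    return 2.0 * (Phi @ alpha) + b
```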

3.2. Constrained Zero-One Regularization

The above zero-one regularization formulation can be further strengthened by imposing the constraints $0 \preceq \alpha \preceq e$. Thus we have the following constrained zero-one regularization:

$$\min_{0 \preceq \alpha \preceq e} F(\alpha) = \alpha^T \Phi \alpha + b^T \alpha + \lambda_1 |\alpha|_1 + \lambda_2 |e - \alpha|_1 \qquad (10)$$

3.3. Smoothed Zero-One Regularization

The two formulations presented above use the strict trimap/scribble constraint $\Gamma_\Omega(\alpha) = \alpha_\Omega$, which is removed by variable substitution. However, the smoothed trimap/scribble method of (3) has been shown to be robust in matting [11]. This inspires us to define a combined, smoothed version of zero-one regularization as

$$F(\alpha) = \alpha^T \Phi \alpha + \lambda (\alpha - \alpha_\Omega)^T D_\Omega (\alpha - \alpha_\Omega) + \lambda_1 |\alpha|_1 + \lambda_2 |e - \alpha|_1 \qquad (11)$$

where $\Phi$ is, as usual, the Laplacian matrix $L$, the LTSA alignment matrix $B$ or the LLE alignment matrix $R$.

3.4. Suggested Optimization Algorithms

For (9) and (10), denote $f(\alpha) = \alpha^T \Phi \alpha + b^T \alpha$, or $f(\alpha) = \alpha^T (\Phi + \lambda D_\Omega)\alpha - 2\lambda (D_\Omega^T \alpha_\Omega)^T \alpha + \lambda \alpha_\Omega^T D_\Omega \alpha_\Omega$ (the latter arising from the smoothed formulation (11)), and $\phi(\alpha) = \lambda_1 |\alpha|_1 + \lambda_2 |e - \alpha|_1$. Then $f(\alpha)$ is a convex smooth function while $\phi(\alpha)$ is a convex nonsmooth function. Both problems (9) and (10) are challenging to solve, as the zero-one LASSO is non-smooth and (10) carries a constraint. One way to solve (9) is to reformulate it as an equivalent constrained smooth optimization problem by introducing additional variables and constraints, and then apply a standard solver.

Note that the only difference between (9) and (10) is the extra constraint $0 \preceq \alpha \preceq e$. First we focus on (9). In this paper, we develop an Efficient Zero-One Lasso Algorithm (EZOLA) based on the fact that the objective function of (9) is a composite function with a smooth part $f(\alpha)$ and a non-smooth part $\phi(\alpha)$. One appealing feature of EZOLA is that it exploits the special structure of (9) to achieve a convergence rate of $O(1/k^2)$ after $k$ iterations, which is optimal among first-order black-box methods. In contrast, directly applying a generic black-box first-order method to the non-smooth problem (9) only achieves a convergence rate of $O(1/\sqrt{k})$, much slower than $O(1/k^2)$.

In the sequel, we apply the optimal first-order black-box method for composite non-smooth convex optimization, i.e., one type of Nesterov's methods [14, 15, 3], to achieve a convergence rate of $O(1/k^2)$. We first construct the following model approximating the objective function $F(\alpha)$ of (9) at a point $\bar{\alpha}$:

$$h_{C,\bar{\alpha}}(\alpha) = f(\bar{\alpha}) + f'(\bar{\alpha})^T (\alpha - \bar{\alpha}) + \frac{C}{2}\|\alpha - \bar{\alpha}\|^2 + \phi(\alpha), \qquad (12)$$

where $C > 0$ is a constant.

With model (12), Nesterov's method is based on two sequences $\{\alpha_k\}$ and $\{s_k\}$, in which $\{\alpha_k\}$ is the sequence of approximate solutions and $\{s_k\}$ is the sequence of search points. The search point $s_k$ is the linear combination of $\alpha_{k-1}$ and $\alpha_k$,

$$s_k = \alpha_k + \beta_k (\alpha_k - \alpha_{k-1})$$

where $\beta_k$ is a properly chosen coefficient. The approximate solution $\alpha_{k+1}$ is computed as the minimizer of $h_{C_k, s_k}(\alpha)$. It can be shown that

$$\alpha_{k+1} = \arg\min_\alpha \frac{C_k}{2} \left\| \alpha - \Big( s_k - \frac{1}{C_k} f'(s_k) \Big) \right\|^2 + \phi(\alpha) \qquad (13)$$

where $C_k$ is determined by a line search according to the Armijo-Goldstein rule so that $C_k$ is appropriate for $s_k$; see [3]. The optimization problem (13) can be decoupled into the following single-variable optimization problem

$$\min_\alpha h(\alpha) = \frac{1}{2}\|\alpha - v\|_2^2 + \lambda_1 |\alpha| + \lambda_2 |1 - \alpha| \qquad (14)$$

where $v$ is known, given componentwise by $s_k - \frac{1}{C_k} f'(s_k)$.

Problem (14) can be solved easily via the following lemma.

Lemma 1. The solution to problem (14) can be computed analytically as

$$\alpha^* = \begin{cases} v - \lambda_1 - \lambda_2, & v - \lambda_1 - \lambda_2 > 1 \\ 1, & |1 - v + \lambda_1| \le \lambda_2 \\ v - \lambda_1 + \lambda_2, & 0 < v - \lambda_1 + \lambda_2 < 1 \\ 0, & |v + \lambda_2| \le \lambda_1 \\ v + \lambda_1 + \lambda_2, & v + \lambda_1 + \lambda_2 < 0 \end{cases}$$

Proof: The necessary and sufficient condition for $\alpha^*$ to be the optimal solution of (14) is $0 \in \partial h(\alpha^*)$; note that the solution is unique. We consider the following five cases.

• Case 1: $\alpha^* > 1$. Since $0 \in \partial h(\alpha^*) = \alpha^* - v + \lambda_1 + \lambda_2$, we have $\alpha^* = v - \lambda_1 - \lambda_2$ and $v - \lambda_1 - \lambda_2 > 1$.

• Case 2: $\alpha^* = 1$. Since $0 \in \partial h(\alpha^*) = 1 - v + \lambda_1 + \lambda_2\,\mathrm{SGN}(0)$, we have $|1 - v + \lambda_1| \le \lambda_2$.

• Case 3: $0 < \alpha^* < 1$. Since $0 \in \partial h(\alpha^*) = \alpha^* - v + \lambda_1 - \lambda_2$, we have $\alpha^* = v - \lambda_1 + \lambda_2$ and $0 < v - \lambda_1 + \lambda_2 < 1$.

• Case 4: $\alpha^* = 0$. Since $0 \in \partial h(\alpha^*) = -v + \lambda_1\,\mathrm{SGN}(0) - \lambda_2$, we have $|v + \lambda_2| \le \lambda_1$.

• Case 5: $\alpha^* < 0$. Since $0 \in \partial h(\alpha^*) = \alpha^* - v - \lambda_1 - \lambda_2$, we have $\alpha^* = v + \lambda_1 + \lambda_2$ and $v + \lambda_1 + \lambda_2 < 0$.

This completes the proof.
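Lemma 1 translates directly into a vectorized proximal step; a Python sketch (ours):

```python
import numpy as np

def prox_zero_one(v, lam1, lam2):
    # Componentwise minimizer of 0.5*(a - v)^2 + lam1*|a| + lam2*|1 - a|,
    # following the five cases of Lemma 1 (vectorized over v).
    v = np.asarray(v, dtype=float)
    conds = [
        v - lam1 - lam2 > 1,                            # Case 1: a* > 1
        np.abs(1.0 - v + lam1) <= lam2,                 # Case 2: a* = 1
        (v - lam1 + lam2 > 0) & (v - lam1 + lam2 < 1),  # Case 3: interior
        np.abs(v + lam2) <= lam1,                       # Case 4: a* = 0
        v + lam1 + lam2 < 0,                            # Case 5: a* < 0
    ]
    vals = [
        v - lam1 - lam2,
        np.ones_like(v),
        v - lam1 + lam2,
        np.zeros_like(v),
        v + lam1 + lam2,
    ]
    return np.select(conds, vals)
```

Applied componentwise to $v = s_k - \frac{1}{C_k} f'(s_k)$, with $\lambda_1$ and $\lambda_2$ scaled by $1/C_k$ to account for the factor $C_k/2$ in (13), this yields the minimizer $\alpha_{k+1}$.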

terov’s method can be easily applied to the constrained ver-sion [3]. In the constrained case, the building block similarto (14) is formulated according to constrain conditions. Inits single component, this is given by

min0≤α≤1

h(α) =12‖α− v‖22 + λ1|α|+ λ2|1− α| (15)

Then we have the following.

Lemma 2. The solution to problem (15) can be computed analytically as

$$\alpha^* = \begin{cases} 1, & v \ge 1 + \lambda_1 - \lambda_2 \\ v - \lambda_1 + \lambda_2, & \lambda_1 - \lambda_2 < v < 1 + \lambda_1 - \lambda_2 \\ 0, & v \le \lambda_1 - \lambda_2 \end{cases}$$

Proof: Due to the constraint condition, the absolute value operations in the objective function can be removed, so we are considering the following optimization problem

$$\min_{0 \le \alpha \le 1} h(\alpha) = \frac{1}{2}(\alpha - v)^2 + \lambda_1 \alpha + \lambda_2 (1 - \alpha)$$

The corresponding Lagrangian is

$$L(\alpha, \mu_1, \mu_2) = \frac{1}{2}(\alpha - v)^2 + \lambda_1 \alpha + \lambda_2 (1 - \alpha) + \mu_1 \alpha + \mu_2 (1 - \alpha)$$

where $\mu_1 \le 0$ and $\mu_2 \le 0$. For the optimal values $\alpha^*$, $\mu_1^*$ and $\mu_2^*$, the KKT conditions [5] are

$$\begin{aligned} &\alpha^* - v + \lambda_1 - \lambda_2 + \mu_1^* - \mu_2^* = 0 \\ &0 \le \alpha^* \le 1 \\ &\mu_1^* \alpha^* = 0 \ \text{and}\ \mu_1^* \le 0 \\ &\mu_2^* (1 - \alpha^*) = 0 \ \text{and}\ \mu_2^* \le 0 \end{aligned}$$

Now we consider three cases:

• Case 1: $\alpha^* = 0$. When this happens, we must have $\mu_2^* = 0$. Hence

$$v = \lambda_1 - \lambda_2 + \mu_1^* \le \lambda_1 - \lambda_2.$$

• Case 2: $\alpha^* = 1$. When this happens, we must have $\mu_1^* = 0$. Hence

$$v = 1 + \lambda_1 - \lambda_2 - \mu_2^* \ge 1 + \lambda_1 - \lambda_2.$$

• Case 3: $0 < \alpha^* < 1$. In this case, we must have $\mu_1^* = \mu_2^* = 0$. Hence

$$\alpha^* = v - \lambda_1 + \lambda_2$$

and $\lambda_1 - \lambda_2 < v < 1 + \lambda_1 - \lambda_2$.

This completes the proof of Lemma 2.
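Likewise, Lemma 2 reduces to clipping the interior solution to the box $[0, 1]$; a one-line Python sketch (ours):

```python
import numpy as np

def prox_zero_one_box(v, lam1, lam2):
    # Constrained proximal step of Lemma 2: componentwise minimizer of
    # 0.5*(a - v)^2 + lam1*|a| + lam2*|1 - a| subject to 0 <= a <= 1.
    # Clipping the interior solution v - lam1 + lam2 to [0, 1] reproduces
    # exactly the three cases of the lemma.
    v = np.asarray(v, dtype=float)
    return np.clip(v - lam1 + lam2, 0.0, 1.0)
```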

In the sequel, as an example, we give the efficient algorithm for (9). Taking as $v$ in Lemma 1 each component of $s_k - \frac{1}{C_k} f'(s_k)$ in (13) in turn, we can work out each component of the minimizer $\alpha_{k+1}$ of (13). The whole algorithm for (9) or (10) is summarized in Algorithm 1 below.

Finally, the smoothed zero-one regularization problem (11) can also be solved easily using the same Nesterov's method.


Algorithm 1: The Efficient Nesterov's Algorithm
Input: $C_0 > 0$, $\alpha_0$, $K$
Output: $\alpha_{k+1}$
1: Initialize $\alpha_1 = \alpha_0$, $\gamma_{-1} = 0$, $\gamma_0 = 1$ and $C = C_0$.
2: for $k = 1$ to $K$ do
3:   Set $\beta_k = \frac{\gamma_{k-2} - 1}{\gamma_{k-1}}$ and $s_k = \alpha_k + \beta_k(\alpha_k - \alpha_{k-1})$.
4:   Find the smallest $C \in \{C_{k-1}, 2C_{k-1}, \ldots\}$ such that $F(\alpha_{k+1}) \le h_{C, s_k}(\alpha_{k+1})$, where $\alpha_{k+1}$ is defined by (13) and solved by Lemma 1, or by Lemma 2 in the constrained case.
5:   Set $C_k = C$ and $\gamma_{k+1} = \frac{1 + \sqrt{1 + 4\gamma_k^2}}{2}$.
6: end for
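A compact Python sketch of Algorithm 1 follows; the names `f`, `grad_f` and `prox` are our assumptions, and in the acceptance test of step 4 the $\phi$ terms on both sides of $F(\alpha_{k+1}) \le h_{C,s_k}(\alpha_{k+1})$ cancel:

```python
import numpy as np

def ezola(f, grad_f, prox, alpha0, C0=1.0, K=100):
    # f, grad_f: smooth part of the objective and its gradient;
    # prox(v, C): minimizer of (C/2)*||a - v||^2 + phi(a), i.e. (13).
    a_prev = alpha0.copy()
    a = alpha0.copy()
    g_prev, g = 0.0, 1.0              # gamma_{k-2}, gamma_{k-1}
    C = C0
    for _ in range(K):
        beta = (g_prev - 1.0) / g
        s = a + beta * (a - a_prev)   # search point s_k
        fs, gs = f(s), grad_f(s)
        while True:                   # Armijo-Goldstein style search on C
            a_new = prox(s - gs / C, C)
            diff = a_new - s
            # accept once F(a_new) <= h_{C, s_k}(a_new); the phi(a_new)
            # terms on both sides cancel, leaving the test below.
            if f(a_new) <= fs + gs @ diff + 0.5 * C * (diff @ diff):
                break
            C *= 2.0
        a_prev, a = a, a_new
        g_prev, g = g, (1.0 + np.sqrt(1.0 + 4.0 * g * g)) / 2.0
    return a
```

For instance, pairing the loop with Lemma 1 amounts to `prox = lambda v, C: prox_zero_one(v, lam1 / C, lam2 / C)`, and with Lemma 2 analogously for the constrained version (10).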

3.5. Reconstruction of Foreground and Background Images

After solving for the alpha values $\alpha$, we need to reconstruct the foreground $F$ and background $B$. For this purpose, we take the same strategy as [11]: $F$ and $B$ are reconstructed using the composition equation (1) together with certain smoothness priors on both $F$ and $B$, obtained by optimizing the following objective function

$$\min_{F,B} \sum_i \|\alpha_i F_i + (1 - \alpha_i) B_i - I_i\|^2 + \langle \partial \alpha_i, (\partial F_i)^2 + (\partial B_i)^2 \rangle$$

where $\partial$ is the gradient operator over the image grid. For a fixed $\alpha$, the problem is quadratic and its minimum can be found by solving a set of linear equations.

4. Experimental Results

To assess the performance of the three newly suggested formulations for image matting, we take the LTSA alignment matrix as an example; the LTSA alignment matrix has been shown to be very comparable to the original Laplacian matrix [7, 11] in image matting.

All the algorithms are implemented in MATLAB on a small workstation with 32 GB of memory. We use benchmark images taken from the matting website http://www.alphamatting.com. Four images (GT02, GT04, GT13 and GT26) are chosen for testing the proposed Nesterov's algorithms for Zero-One Regularization (ZOR), Constrained Zero-One Regularization (CZOR) and Smoothed Zero-One Regularization (SZOR). Figure 1 presents the original images (column one), their trimap images (column two) and the ground truth matting masks (column three). The trimap images are used by the algorithms as user guidance. In order to run the algorithms on our machine, we downsize both the images and the trimaps by 40%.

We note that when $\lambda_1 = \lambda_2$ the ZOR algorithm is equivalent to the CZOR algorithm. To distinguish them in the experiments, we set $\lambda_1 = 100$, $\lambda_2 = 100.01$ for the ZOR formulation and $\lambda_1 = 100$, $\lambda_2 = 100.01$ for the CZOR formulation; that is, we slightly prefer foreground objects. We set $\lambda = 100$, $\lambda_1 = \lambda_2 = 10$ for the SZOR formulation. In Nesterov's method we set $C_0 = 1$ for all the experiments. The LTSA alignment matrix and the Laplacian matrix are constructed with window size 2.

Figure 1: Images, Their Trimaps and Ground Truth Masks. Row one: GT02; Row two: GT04; Row three: GT13; Row four: GT26.

Figure 2: Foreground objects with holes: ZOR Results inColumn One; CZOR in Column Two and SZOR in ColumnThree.

In Figure 2, the first column shows the results from Nesterov's method for the ZOR formulation, the second column for the CZOR formulation, and the third column for the SZOR formulation. Visually, it is easy to see that the results in column three of Figure 2 are quite close to the ground truth masks shown in the third column of Figure 1. The results of SZOR are superior to those given by both the ZOR and CZOR formulations, and ZOR is quite comparable to CZOR. For images GT02 and GT04, most of the background in the holes is recovered by the SZOR formulation. However, for image GT26, it is very hard to distinguish the results given by the three formulations.

As a comparison, we also applied the standard closed-form solution based on the Laplacian matrix [11] to the same images and trimaps. The learned masks are displayed in Figure 3. In general, these results are slightly inferior to those given by the ZOR and CZOR formulations.

Figure 3: Foreground Masks Learned by the Standard Closed-form Solution Based on the Laplacian Matrix

To assess the results quantitatively, we also measure the error between the ground truth masks and the learnt masks. Table 1 shows the mean square error (MSE) for each case. The MSE ranking shows that the SZOR formulation is the winner in most cases, although on image GT26 none of the three new formulations beats the closed-form solution.

Images   Closed-Form   ZOR       CZOR      SZOR
GT02     0.00754       0.00668   0.00667   0.00233
GT04     0.01215       0.01183   0.01213   0.00863
GT13     0.01275       0.02372   0.01828   0.01267
GT26     0.01676       0.02197   0.02308   0.01974

Table 1: MSE Results

We also tested the impact of different window sizes in the smoothed zero-one formulation on image GT04. LTSA alignment matrices are constructed at four window sizes, $w = 2, 3, 4, 5$. The learnt masks at the four window sizes are shown in Figure 4.

Figure 4: Foreground Masks Learned by the SZOR Formulation with Four Different Window Sizes when Constructing the LTSA Alignment Matrix

Table 2 reports the MSEs given by the SZOR formulation, from which we can see that the best result is obtained with window size $w = 3$. We stress that we failed to run the MATLAB program of the closed-form solution provided in [11] for larger window sizes, because the matting Laplacian becomes less sparse as the window size grows; in that case, both the memory and the time needed to solve the linear system increase tremendously.

Images   w = 2     w = 3     w = 4     w = 5
GT04     0.00863   0.00673   0.00702   0.00679

Table 2: MSE Results for GT04 at Different Window Sizes

In the last experiment, we assess the quality of the three new formulations for image matting with user scribbles. Due to space limitations, we only report the test results for image GT02, which contains many small holes. The original image and the corresponding strokes used in this experiment are shown in Figure 5.

Figure 5: Benchmark Image GT02: the original image (left) and the image with strokes (right)

For each of the formulations ZOR, CZOR and SZOR, we tested LTSA matrices with four different window sizes. Figure 6 shows the learnt masks (first row), the extracted foreground images (second row) and the background images (third row) for the best window size of each formulation. It is clear that the mask given by the SZOR formulation reveals more of the small holes in the image.

Furthermore, Table 3 reports the MSEs for all the tests. The measurements show that SZOR is the best among the three formulations.


Figure 6: Best Results for Each Formulation: Column one for ZOR with $w = 2$, Column two for CZOR with $w = 5$, and Column three for SZOR with $w = 4$

Formulations   w = 2     w = 3     w = 4     w = 5
ZOR            0.00651   0.00699   0.00713   0.00706
CZOR           0.02839   0.00715   0.00724   0.00712
SZOR           0.00912   0.00659   0.00638   0.00662

Table 3: MSE Results for GT02 for ZOR, CZOR and SZOR at Different Window Sizes

5. Discussion

We proposed three optimization formulations for image matting based on the LTSA alignment matrix. The same approach can be applied to the Laplacian matrix introduced in the closed-form solution of image matting [11] and to the Locally Linear Embedding (LLE) alignment matrix [17]. The experimental results show that smoothed zero-one regularization gives the best visual and quantitative results in image matting, assisted with both trimaps and user scribbles.

Acknowledgements

This work is partially supported by [University Name Removed] Competitive Research Grant OPA 4818.

References

[1] T. Authors. LLE algorithm in natural image matting. Research report, XXXX University, submitted for publication, 2011.

[2] X. Bai and G. Sapiro. A geodesic framework for fast interactive image and video segmentation and matting. In Proceedings of the International Conference on Computer Vision, pages 1–8, 2007.

[3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[4] C. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, 2006.

[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, United Kingdom, 2004.

[6] Y. Chuang, B. Curless, D. Salesin, and R. Szeliski. A Bayesian approach to digital matting. In Proceedings of CVPR, volume 2, pages 264–271, 2001.

[7] J. Gao and T. Bossomaier. Image matting via local tangent space alignment. Research report, Charles Sturt University, submitted for publication, 2011.

[8] L. Grady, T. Schiwietz, and S. Aharon. Random walks for interactive alpha-matting. In VIIP, 2005.

[9] K. He, J. Sun, and X. Tang. Fast matting using large kernel matting Laplacian matrices. In Proceedings of CVPR 2010, pages 2165–2172, 2010.

[10] L. Jacob, G. Obozinski, and J.-P. Vert. Group LASSO with overlap and graph LASSO. In Proceedings of the 26th International Conference on Machine Learning, 2009.

[11] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solution to natural image matting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):228–242, 2008.

[12] Y. Li, J. Sun, C. K. Tang, and H. Y. Shum. Lazy snapping. ACM Transactions on Graphics, 23(3):303–308, 2004.

[13] J. Liu, S. Ji, and J. Ye. Multi-task feature learning via efficient $\ell_{2,1}$-norm minimization. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI 2009), page 43, Montreal, Quebec, 2009.

[14] Y. Nesterov. A method for solving a convex programming problem with convergence rate $O(1/k^2)$. Soviet Math. Dokl., 27:372–376, 1983.

[15] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper 76, Université catholique de Louvain, Belgium, 2007.

[16] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3):309–314, 2004.

[17] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.

[18] J. Sun, J. Jia, C. K. Tang, and H. Y. Shum. Poisson matting. ACM Trans. Graph., 23(3):315–321, 2004.

[19] R. Tibshirani. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Ser. B, 58:267–288, 1996.

[20] J. Wang and M. F. Cohen. Image and video matting: A survey. Foundations and Trends in Computer Graphics and Vision, 3(2):97–175, 2007.

[21] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimension reduction via local tangent space alignment. SIAM Journal on Scientific Computing, 26:313–338, 2004.