TRANSCRIPT
ELEG 867 - Compressive Sensing and Sparse Signal Representations
Introduction to Matrix Completion and Robust PCA
Gonzalo Garateguy
Department of Electrical and Computer Engineering
University of Delaware
Fall 2011
Matrix Completion Problems - Motivation

Recommender Systems

         Items
User 1:  x  x  ?  ?  x  x
User 2:  ?  ?  x  x  ?  ?
   .     ?  x  ?  x  x  ?
   .     x  ?  ?  x  ?  x
   .     x  ?  x  ?  ?  x
   .     ?  x  ?  ?  x  ?
   .     ?  ?  x  x  x  ?
User n:  x  x  ?  ?  ?  x

Collaborative filtering (Amazon, last.fm)
Content based (Pandora, www.nanocrowd.com)
The Netflix prize competition boosted interest in the area
http://www.ima.umn.edu/videos/index.php?id=1598
http://sahd.pratt.duke.edu/Videos/keynote.html
Matrix Completion Problems - Motivation

Sensor location estimation in Wireless Sensor Networks

[Figure: a network of 7 nodes with some pairwise distances observed
(d12, d13, d24, d34, d45, d56, d57, d67) and others unknown (d23, d43, d74, d64, ...)]

Distance matrix
      1        2        3        4        5        6        7
1     0      d_{1,2}  d_{1,3}    ?        ?        ?        ?
2   d_{2,1}    0        ?      d_{2,4}    ?        ?        ?
3   d_{3,1}    ?        0      d_{3,4}    ?        ?        ?
4     ?      d_{4,2}  d_{4,3}    0      d_{4,5}    ?        ?
5     ?        ?        ?      d_{5,4}    0      d_{5,6}  d_{5,7}
6     ?        ?        ?        ?      d_{6,5}    0      d_{6,7}
7     ?        ?        ?        ?      d_{7,5}  d_{7,6}    0

The problem is to find the positions of the sensors in $\mathbb{R}^2$ given partial
information about the relative distances.
A distance matrix like this has rank 2 in $\mathbb{R}^2$.
For certain types of graphs the problem can be solved if we know the whole distance matrix.
Matrix Completion Problems - Motivation

Image reconstruction from incomplete data

[Figure: reconstructed image vs. incomplete image with 50% of the pixels]
Robust PCA - Motivation

Foreground identification for surveillance applications

E.J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" http://arxiv.org/abs/0912.3599
Robust PCA - Motivation

Image alignment and texture recognition

Z. Zhang, X. Liang, A. Ganesh, and Y. Ma, "TILT: Transform Invariant Low-rank Textures," Computer Vision - ACCV 2010
Robust PCA - Motivation

Camera calibration with radial distortion

J. Wright, Z. Lin, and Y. Ma, "Low-Rank Matrix Recovery: From Theory to Imaging Applications," tutorial presented at the International Conference on Image and Graphics (ICIG), August 2011
Motivation

Many other applications:
System identification in control theory
Covariance matrix estimation
Machine learning
Computer vision

Videos to watch:
Matrix Completion via Convex Optimization: Theory and Algorithms, by Emmanuel Candes
http://videolectures.net/mlss09us_candes_mccota/
Low Dimensional Structures in Images or Data, by Yi Ma, Workshop on Signal Processing with Adaptive Sparse Structured Representations (June 2011)
http://ecos.maths.ed.ac.uk/SPARS11/YiMa.wmv
Problem Formulation

Matrix completion
minimize    $\mathrm{rank}(A)$                                (1)
subject to  $A_{ij} = D_{ij}$,  $(i,j) \in \Omega$

Robust PCA
minimize    $\mathrm{rank}(A) + \lambda\|E\|_0$               (2)
subject to  $A_{ij} + E_{ij} = D_{ij}$,  $(i,j) \in \Omega$

Very hard to solve in general without any assumptions, sometimes NP-hard.
Even if we can solve them, are the solutions always what we expect?
Under which conditions can we have exact recovery of the real matrices?
Outline

Convex optimization concepts
Matrix Completion
  Exact recovery from incomplete data by convex relaxation
  ALM method for nuclear norm minimization
Robust PCA
  Exact recovery from incomplete and corrupted data by convex relaxation
  ALM method for low-rank and sparse separation
Convex sets and Convex functions

Convex set
A set $C$ is convex if the line segment between any two points in $C$ lies in $C$:
for any $x_1, x_2 \in C$ and any $\theta$ with $0 \le \theta \le 1$ we have
$\theta x_1 + (1-\theta)x_2 \in C$.

[Figure: examples of a convex set and two non-convex sets]
Convex sets and Convex functions

Convex combination
A convex combination of $k$ points $x_1, \dots, x_k$ is defined as
$\theta_1 x_1 + \dots + \theta_k x_k$, where $\theta_i \ge 0$ and $\theta_1 + \dots + \theta_k = 1$.

Convex hull
The convex hull of $C$ is the set of all convex combinations of points in $C$:
$\mathrm{conv}\,C = \{\theta_1 x_1 + \dots + \theta_k x_k \mid x_i \in C,\ \theta_i \ge 0,\ i = 1,\dots,k,\ \theta_1 + \dots + \theta_k = 1\}$.
Convex sets and Convex functions

Operations that preserve convexity

Intersection
If $S_1$ and $S_2$ are convex, then $S_1 \cap S_2$ is convex.
In general, if $S_\alpha$ is convex for every $\alpha \in \mathcal{A}$, then $\bigcap_{\alpha \in \mathcal{A}} S_\alpha$ is convex.
Subspaces, affine sets and convex cones are therefore closed under arbitrary intersections.

Affine functions
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be affine, $f(x) = Ax + b$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.
If $S \subseteq \mathbb{R}^n$ is convex, then the image of $S$ under $f$,
$f(S) = \{f(x) \mid x \in S\}$,
is convex.
Convex sets and Convex functions

Convex functions
A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if $\mathrm{dom}\,f$ is a convex set and if for all
$x, y \in \mathrm{dom}\,f$ and $\theta$ with $0 \le \theta \le 1$ we have
$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$.
We say that $f$ is strictly convex if the strict inequality holds whenever $x \ne y$
and $0 < \theta < 1$.
Operations that preserve convexity

Composition with an affine mapping
Suppose $f : \mathbb{R}^n \to \mathbb{R}$, $A \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^n$. Define $g : \mathbb{R}^m \to \mathbb{R}$ by
$g(x) = f(Ax + b)$, with $\mathrm{dom}\,g = \{x \mid Ax + b \in \mathrm{dom}\,f\}$. Then if $f$ is convex, so is $g$.

Pointwise maximum
If $f_1$ and $f_2$ are convex functions then their pointwise maximum $f$, defined by
$f(x) = \max\{f_1(x), f_2(x)\}$ with $\mathrm{dom}\,f = \mathrm{dom}\,f_1 \cap \mathrm{dom}\,f_2$, is also convex.
This also extends to the case where $f_1, \dots, f_m$ are convex; then
$f(x) = \max\{f_1(x), \dots, f_m(x)\}$ is also convex.
Pointwise maximum of convex functions

[Figure: two convex functions $f_1(x)$, $f_2(x)$ and their pointwise maximum $f(x) = \max\{f_1(x), f_2(x)\}$]
Convex sets and Convex functions

Convex differentiable functions
If $f$ is differentiable (i.e. its gradient $\nabla f$ exists at each point in $\mathrm{dom}\,f$), then $f$
is convex if and only if $\mathrm{dom}\,f$ is convex and
$f(y) \ge f(x) + \nabla f(x)^T(y - x)$
holds for all $x, y \in \mathrm{dom}\,f$.
Second order conditions
If $f$ is twice differentiable, i.e. its Hessian $\nabla^2 f$ exists at each point in $\mathrm{dom}\,f$, then
$f$ is convex if and only if $\mathrm{dom}\,f$ is convex and its Hessian is positive semidefinite
for all $x \in \mathrm{dom}\,f$:
$\nabla^2 f(x) \succeq 0$
Convex non-differentiable functions
The concept of gradient can be extended to non-differentiable functions by introducing
the subgradient.

Subgradient of a function
A vector $g \in \mathbb{R}^n$ is a subgradient of $f : \mathbb{R}^n \to \mathbb{R}$ at $x \in \mathrm{dom}\,f$ if for all $z \in \mathrm{dom}\,f$
$f(z) \ge f(x) + g^T(z - x)$
Subgradients

Observations
If $f$ is convex and differentiable, then its gradient at $x$, $\nabla f(x)$, is its only subgradient.

Subdifferentiable functions
A function $f$ is called subdifferentiable at $x$ if there exists at least one subgradient at $x$.

Subdifferential at a point
The set of subgradients of $f$ at the point $x$ is called the subdifferential of $f$ at $x$,
and is denoted $\partial f(x)$.

Subdifferentiability of a function
A function $f$ is called subdifferentiable if it is subdifferentiable at all $x \in \mathrm{dom}\,f$.
Basic properties

Existence of the subgradient of a convex function
If $f$ is convex and $x \in \mathrm{int}\,\mathrm{dom}\,f$, then $\partial f(x)$ is nonempty and bounded.

The subdifferential $\partial f(x)$ is always a closed convex set, even if $f$ is not convex. This
follows from the fact that it is the intersection of an infinite set of halfspaces:
$\partial f(x) = \bigcap_{z \in \mathrm{dom}\,f} \{g \mid f(z) \ge f(x) + g^T(z - x)\}$.
Basic properties

Nonnegative scaling
For $\alpha \ge 0$, $\partial(\alpha f)(x) = \alpha\,\partial f(x)$.

Subgradient of the sum
Given $f = f_1 + \dots + f_m$, where $f_1, \dots, f_m$ are convex functions, the subgradient of $f$
at $x$ is given by $\partial f(x) = \partial f_1(x) + \dots + \partial f_m(x)$.

Affine transformations of domain
Suppose $f$ is convex, and let $h(x) = f(Ax + b)$. Then $\partial h(x) = A^T\partial f(Ax + b)$.

Pointwise maximum
Suppose $f$ is the pointwise maximum of convex functions $f_1, \dots, f_m$,
$f(x) = \max_{i=1,\dots,m} f_i(x)$; then $\partial f(x) = \mathrm{Co}\,\bigcup\{\partial f_i(x) \mid f_i(x) = f(x)\}$,
the convex hull of the subdifferentials of the active functions.
Subgradient of the pointwise maximum of two convex functions

[Figure, shown in three steps: two convex functions $f_1(x)$, $f_2(x)$, their pointwise maximum
$f(x) = \max\{f_1(x), f_2(x)\}$, and the subgradients at a point $x$ where both functions are active]
Examples

Consider the function $f(x) = |x|$. At $x_0 = 0$, the subdifferential is defined by the inequality
$f(z) \ge f(x_0) + g(z - x_0)$, $\forall z \in \mathrm{dom}\,f$
$|z| \ge gz$, $\forall z \in \mathbb{R}$
$\partial f(0) = \{g \mid g \in [-1, 1]\}$
Then for all $x$
$\partial f(x) = \begin{cases} \{-1\} & \text{for } x < 0 \\ \{1\} & \text{for } x > 0 \\ \{g \mid g \in [-1, 1]\} & \text{for } x = 0 \end{cases}$
Example: $\ell_1$ norm

Consider $f(x) = \|x\|_1 = |x_1| + \dots + |x_n|$, and note that $f$ can be expressed as the
maximum of $2^n$ linear functions
$\|x\|_1 = \max\{f_1(x), \dots, f_{2^n}(x)\}$
$\|x\|_1 = \max\{s_1^Tx, \dots, s_{2^n}^Tx \mid s_i \in \{-1, 1\}^n\}$
The active functions $f_i(x)$ at $x$ are the ones for which $s_i^Tx = \|x\|_1$. Then, denoting
$s_i = [s_{i,1}, \dots, s_{i,n}]^T$, $s_{i,j} \in \{-1, 1\}$,
the set of indices of the active functions at $x$ is
$A_x = \{i : s_{i,j} = -1 \text{ for } x_j < 0,\ s_{i,j} = 1 \text{ for } x_j > 0,\ s_{i,j} = -1 \text{ or } 1 \text{ for } x_j = 0,\ j = 1, \dots, n\}$
Subgradient of the $\ell_1$ norm

The subgradient of $\|x\|_1$ at a generic point $x$ is defined by
$\partial\|x\|_1 = \mathrm{co}\{\nabla f_i(x) \mid i \in A_x\}$
$\partial\|x\|_1 = \mathrm{co}\{s_i \mid i \in A_x\}$
$\partial\|x\|_1 = \{g \mid g = \sum_{i \in A_x} \theta_i s_i,\ \theta_i \ge 0,\ \sum_i \theta_i = 1\}$
or equivalently
$\partial\|x\|_1 = \{g : g_j = -1 \text{ for } x_j < 0,\ g_j = 1 \text{ for } x_j > 0,\ g_j \in [-1, 1] \text{ for } x_j = 0\}$
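A minimal Python sketch of the entrywise characterization above (not from the slides; the point $x$ and the random test directions are illustrative): one valid subgradient of $\|x\|_1$ is $g = \mathrm{sign}(x)$, and it satisfies the subgradient inequality.

```python
# One subgradient of ||x||_1 and a check of f(z) >= f(x) + g^T (z - x).
import numpy as np

def l1_subgradient(x):
    # sign(x) picks g_j = -1 or 1 where x_j != 0 and g_j = 0 (which lies in [-1, 1]) where x_j = 0
    return np.sign(x)

rng = np.random.default_rng(0)
x = np.array([1.5, 0.0, -2.0, 0.0])
g = l1_subgradient(x)
for _ in range(1000):
    z = 3.0 * rng.standard_normal(x.shape)
    assert np.sum(np.abs(z)) >= np.sum(np.abs(x)) + g @ (z - x) - 1e-12
```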
$\ell_1$ norm on $\mathbb{R}^2$

In $\mathbb{R}^2$, at the origin, the set of subgradients is the convex hull of
$s_1 = [1, 1]^T$, $s_2 = [-1, 1]^T$, $s_3 = [-1, -1]^T$, $s_4 = [1, -1]^T$

[Figure: the subdifferential of the $\ell_1$ norm on $\mathbb{R}^2$]
Convex optimization problems

An optimization problem is convex if its objective is a convex function, the inequality
constraint functions $f_i$ are convex and the equality constraint functions $h_j$ are affine
minimize$_x$   $f_0(x)$  (convex function)
s.t.           $f_i(x) \le 0$  (convex sets)
               $h_j(x) = 0$  (affine)
or equivalently
minimize$_x$   $f_0(x)$  (convex function)
s.t.           $x \in C$,  $C$ is a convex set
               $h_j(x) = 0$  (affine)
Theorem
If $x$ is a local minimizer of a convex optimization problem, it is a global minimizer.

Optimality conditions
A point $x$ is a minimizer of a convex function $f$ if and only if $f$ is subdifferentiable
at $x$ and
$0 \in \partial f(x)$
Convex optimization problems

Given the convex problem
minimize$_x$   $f_0(x)$
s.t.           $f_i(x) \le 0$, $i = 1, \dots, k$
               $h_j(x) = 0$, $j = 1, \dots, l$
its Lagrangian function is defined as
$L(x, \lambda, \nu) = f_0(x) + \sum_{j=1}^{l} \nu_j h_j(x) + \sum_{i=1}^{k} \lambda_i f_i(x)$
where $\lambda_i \ge 0$, $\nu_j \in \mathbb{R}$.
Augmented Lagrangian Method

Considering the problem
minimize$_x$   $f(x)$
s.t.           $x \in C$,  $h(x) = 0$                          (3)
the augmented Lagrangian is defined as
$L(x, \lambda, \mu) = f(x) + \lambda^Th(x) + \frac{\mu}{2}\|h(x)\|_2^2$
where $\mu$ is a penalty parameter and $\lambda$ is the multiplier vector.
Augmented Lagrangian Method

The augmented Lagrangian method consists of solving a sequence of problems of the form
minimize$_x$   $L(x, \lambda_k, \mu_k) = f(x) + \lambda_k^Th(x) + \frac{\mu_k}{2}\|h(x)\|_2^2$
s.t.           $x \in C$
where $\{\lambda_k\}$ is a bounded sequence in $\mathbb{R}^l$ and $\{\mu_k\}$ is a penalty parameter sequence
satisfying
$0 < \mu_k < \mu_{k+1}$ for all $k$, with $\mu_k \to \infty$.
Augmented Lagrangian Method

The exact solution to problem (3) can be found using the following iterative algorithm
set $\rho > 1$
while not converged do
    solve $x_{k+1} = \arg\min_{x \in C} L(x, \lambda_k, \mu_k)$
    $\lambda_{k+1} = \lambda_k + \mu_k h(x_{k+1})$
    $\mu_{k+1} = \rho\mu_k$
end while
Output $x_k$
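A minimal numerical sketch of this generic ALM loop (not from the slides; the toy objective, penalty schedule, and stopping rule are illustrative assumptions): minimize $f(x) = x^2$ subject to $h(x) = x - 1 = 0$, whose solution is $x = 1$, and whose inner minimization has a closed form.

```python
# Generic augmented Lagrangian method (ALM) on a toy problem:
#   minimize x^2   subject to x - 1 = 0   (true solution: x = 1)
def alm_toy(mu0=1.0, rho=2.0, tol=1e-8, max_iter=100):
    lam, mu, x = 0.0, mu0, 0.0
    for _ in range(max_iter):
        # x_{k+1} = argmin_x  x^2 + lam*(x - 1) + (mu/2)*(x - 1)^2  (closed form here)
        x = (mu - lam) / (2.0 + mu)
        h = x - 1.0              # constraint violation
        lam = lam + mu * h       # multiplier update
        mu = rho * mu            # penalty update
        if abs(h) < tol:
            break
    return x

print(alm_toy())   # -> approximately 1.0
```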
Matrix completion

Optimization problem
minimize    $\mathrm{rank}(A)$                                (4)
subject to  $A_{ij} = D_{ij}$,  $(i,j) \in \Omega$

We look for the simplest explanation for the observed data.
Given a large enough number of samples, the likelihood of the solution being unique
should be high.
Matrix completion

minimize    $\mathrm{rank}(A)$
subject to  $A_{ij} = D_{ij}$,  $(i,j) \in \Omega$

The minimization of the rank$(\cdot)$ function is a combinatorial problem, with exponential
complexity in the size of the matrix!
Need for a convex relaxation:
$\mathrm{rank}(A) = \|\mathrm{diag}(\Sigma)\|_0$, where $A = U\Sigma V^T$
$\|A\|_* = \|\mathrm{diag}(\Sigma)\|_1$

Convex relaxation
minimize    $\|A\|_*$                                         (5)
subject to  $A_{ij} = D_{ij}$,  $(i,j) \in \Omega$
Matrix Completion

Nuclear Norm
The nuclear norm of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as $\|A\|_* = \sum_{i=1}^{r} \sigma_i(A)$, where
$\{\sigma_i(A)\}_{i=1}^{r}$ are the elements of the diagonal matrix $\Sigma$ from the SVD decomposition
$A = U\Sigma V^T$.

Observations
$r = \mathrm{rank}(A)$ can be $r < m, n$. If this is the case we say that the matrix is low rank.
The singular values $\sigma_i(A) = \sqrt{\lambda_i(A^TA)}$ are obtained as the square roots of the
eigenvalues of $A^TA$ and always satisfy $\sigma_i \ge 0$.
The left singular vectors $U$ are the eigenvectors of $AA^T$.
The right singular vectors $V$ are the eigenvectors of $A^TA$.
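A short NumPy sketch of these quantities (not from the slides; the random rank-2 test matrix is illustrative): the SVD gives the singular values, the numerical rank, the nuclear norm, and the relation to the eigenvalues of $A^TA$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # rank-2 matrix in R^{6x5}

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V^T
rank = int(np.sum(s > 1e-10))                      # numerical rank (should be 2)
nuclear_norm = s.sum()                             # ||A||_* = sum of singular values

# sigma_i(A)^2 equals the eigenvalues of A^T A (up to numerical error)
eigs = np.linalg.eigvalsh(A.T @ A)
assert np.allclose(np.sort(s**2), np.sort(eigs), atol=1e-8)

print(rank, nuclear_norm)
```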
Matrix Completion

Spectral Norm
The spectral norm of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as $\|A\|_2 = \sigma_{\max}(A)$, where
$\sigma_{\max} = \max(\{\sigma_i(A)\}_{i=1}^{r})$.

Dual Norm
Given an arbitrary norm $\|\cdot\|$ in $\mathbb{R}^n$, its dual norm $\|\cdot\|^*$ is defined as
$\|z\|^* = \sup\{z^Tx \mid \|x\| \le 1\}$

Observations
The nuclear norm is the dual norm of the spectral norm:
$\|A\|_* = \sup\{\mathrm{tr}(A^TX) \mid \|X\|_2 \le 1\}$
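A small numerical check of this duality (not from the slides; the test matrix and the spot-check loop are illustrative): the supremum is attained at $X = UV^T$, which is feasible since $\|UV^T\|_2 = 1$, and it yields $\mathrm{tr}(A^TUV^T) = \sum_i \sigma_i = \|A\|_*$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

X = U @ Vt                                        # candidate maximizer with ||X||_2 = 1
assert np.isclose(np.linalg.norm(X, 2), 1.0)
assert np.isclose(np.trace(A.T @ X), s.sum())     # tr(A^T X) = ||A||_*

for _ in range(100):                              # other feasible X give no larger value
    Y = rng.standard_normal(A.shape)
    Y = Y / np.linalg.norm(Y, 2)                  # scale so that ||Y||_2 = 1
    assert np.trace(A.T @ Y) <= s.sum() + 1e-8
```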
Matrix Completion

Convex relaxation of the rank

Convex envelope of a function
Let $f : C \to \mathbb{R}$ where $C \subseteq \mathbb{R}^n$. The convex envelope of $f$ (on $C$) is defined as the largest
convex function $g$ such that $g(x) \le f(x)$ for all $x \in C$.

Theorem
The convex envelope of the function $\phi(X) = \mathrm{rank}(X)$ on $C = \{X \in \mathbb{R}^{m \times n} \mid \|X\|_2 \le 1\}$
is $\phi_{\mathrm{env}}(X) = \|X\|_*$.

Observations
The convex envelope of $\mathrm{rank}(X)$ on the set $\{X \mid \|X\|_2 \le M\}$ is given by $\frac{1}{M}\|X\|_*$.
By solving the heuristic problem we obtain a lower bound on the optimal value of the
original problem (provided we can identify a bound $M$ on the feasible set).
M. Fazel, H. Hindi and S. Boyd, "A Rank Minimization Heuristic with Application to Minimum Order System Approximation," American Control Conference, 2001.
Matrix completion

Convex relaxation
minimize    $\|A\|_*$                                         (6)
subject to  $A_{ij} = D_{ij}$,  $(i,j) \in \Omega$

The original problem is now a problem with a non-smooth but convex function as the objective.
The remaining question is: how many measurements, and in which positions, must be taken
in order to guarantee that the solution is equal to the matrix $D$?
Matrix completion

Which types of matrices can be completed exactly?

Consider the matrix
$M = e_1 e_n^T = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}$

Can it be recovered from 90% of its samples?
Is the sampling set important?
Which sampling sets work and which ones don't?
Matrix completion

Sampling set
The sampling set $\Omega$ is defined as $\Omega = \{(i,j) \mid D_{ij} \text{ is observed}\}$.

Consider
$D = xy^T$, $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, so that $D_{ij} = x_iy_j$.

If the sampling set avoids row $i$, then $x_i$ cannot be recovered by any method whatsoever.

Observation
No columns or rows of $D$ can be avoided in the sampling set.
There is a need for a characterization of the sampling operator with respect to the set
of matrices that we want to recover.
Matrix completion

Intuition
The singular vectors need to be sufficiently spread, i.e. uncorrelated with the standard
basis, in order to minimize the number of observations needed to recover a low rank matrix.

Coherence of a subspace
Let $U$ be a subspace of $\mathbb{R}^n$ of dimension $r$ and $P_U$ be the orthogonal projection onto $U$.
Then the coherence of $U$ is defined to be
$\mu(U) = \frac{n}{r} \max_{1 \le i \le n} \|P_Ue_i\|^2$

Observations
The minimum value that $\mu(U)$ can achieve is 1, for example if $U$ is spanned by vectors
whose entries all have magnitude $1/\sqrt{n}$.
The largest possible value for $\mu(U)$ is $n/r$, corresponding to a subspace that contains
a standard basis element.
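A short NumPy sketch of this definition (not from the slides; the function name `coherence` and the random rank-5 test matrix are illustrative): for a subspace given by an orthonormal basis $U$, $\|P_Ue_i\|^2$ is simply the squared norm of the $i$-th row of $U$.

```python
import numpy as np

def coherence(U):
    """Coherence mu(U) = (n/r) * max_i ||P_U e_i||^2 for an n x r orthonormal basis U."""
    n, r = U.shape
    row_norms_sq = np.sum(U**2, axis=1)   # ||P_U e_i||^2 when the columns of U are orthonormal
    return (n / r) * row_norms_sq.max()

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))   # rank-5 matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 5
mu_U = coherence(U[:, :r])       # coherence of the column space
mu_V = coherence(Vt[:r, :].T)    # coherence of the row space
print(mu_U, mu_V)                # both values lie in [1, n/r]
```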
Matrix completion

$\mu_0$ coherence
A matrix $D = \sum_{1 \le k \le r} \sigma_ku_kv_k^T$ is $\mu_0$-coherent if for some positive $\mu_0$
$\max(\mu(U), \mu(V)) \le \mu_0$.

$\mu_1$ coherence
A matrix $D = \sum_{1 \le k \le r} \sigma_ku_kv_k^T$ has $\mu_1$ coherence if
$\|UV^T\|_\infty \le \mu_1\sqrt{r/(mn)}$
for some $\mu_1 > 0$.

Observation
If $D$ is $\mu_0$-coherent then it is $\mu_1$-coherent for $\mu_1 = \mu_0\sqrt{r}$.
Coherence of a rank 300 approximation of kowalski

[Figure: $\|P_Ue_i\|$ and $\|P_Ve_i\|$ plotted against the index $i$]

$\mu(U) = 1.9588$, $\mu(V) = 2.2290$, $\mu_0 = 2.2290$
$\mu_1 = \sqrt{mn/r}\,\|UV^T\|_\infty = 13.412$
Matrix completion

Theorem
Let $D \in \mathbb{R}^{m \times n}$ of rank $r$ be $(\mu_0, \mu_1)$-coherent and let $N = \max(m, n)$. If we observe
$M$ entries of $D$ with locations sampled uniformly at random, then there exist constants
$C$ and $c$ such that if
$M \ge C \max(\mu_1^2,\ \mu_0^{1/2}\mu_1,\ \mu_0N^{1/4})\,Nr\,(\beta\log N)$
for some $\beta > 2$, then the minimizer of (6) is unique and equal to $D$ with probability
at least $1 - cN^{-\beta}$. If in addition $r \le \mu_0^{-1}N^{1/5}$, then the number of observations
can be improved to
$M \ge C\,\mu_0\,N^{6/5}r\,(\beta\log N)$.
Candes, E.J. and Recht, B., "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, 2009.
Matrix completion

For the kowalski example:
$\mu_1^2 = 179.99$, $\mu_0^{1/2}\mu_1 = 12.5139$, $\mu_0N^{1/4} = 4.7682$
$\max(\mu_1^2,\ \mu_0^{1/2}\mu_1,\ \mu_0N^{1/4})\,Nr\,(2.1\log N) = 6.6076 \times 10^8$

What is the value of $C$? It must be $C > 0$.
In the limit case $M = mn$, $C = mn / (6.6076 \times 10^8) = 9.194 \times 10^{-4}$.
For the bound to be useful, $0 < C < 9.194 \times 10^{-4}$.
Reconstruction examples

[Figures: image reconstructions from incomplete data at different sampling rates]
SNR = 23.74 dB, 10% of the samples
SNR = 22.52 dB, 25% of the samples
SNR = 25.89 dB, 35% of the samples
SNR = 30.55 dB, 50% of the samples
SNR = 39.51 dB, 70% of the samples
SNR = 42.75 dB, 75% of the samples
SNR = 47.10 dB, 80% of the samples
Completion Performance

[Figure: SNR (dB), from 10 to 70, versus percentage of samples, from 0.1 to 0.9]
Matrix completion

Recovery performance for random matrices

Figure: the x axis corresponds to $\mathrm{rank}(A)/\min\{m, n\}$ and the y axis to
$\rho_s = 1 - M/(mn)$ (the probability that an entry is omitted from the observations).
Emmanuel J. Candes, Xiaodong Li, Yi Ma, John Wright, "Robust Principal Component Analysis?" http://arxiv.org/abs/0912.3599
Matrix completion

Other bounds on the number of measurements and sampling operators:
Emmanuel J. Candes, Xiaodong Li, Yi Ma, John Wright, "Robust Principal Component Analysis?" http://arxiv.org/abs/0912.3599
Venkat Chandrasekaran, Sujay Sanghavi, Pablo A. Parrilo, Alan S. Willsky, "Rank-Sparsity Incoherence for Matrix Decomposition," http://arxiv.org/abs/0906.2220
Zihan Zhou, Xiaodong Li, John Wright, Emmanuel Candes, Yi Ma, "Stable Principal Component Pursuit," http://arxiv.org/abs/1001.2363
Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh, "Matrix Completion from a Few Entries," http://arxiv.org/abs/0901.3150
Sahand Negahban, Martin J. Wainwright, "Restricted strong convexity and weighted matrix completion: Optimal bounds with noise," http://arxiv.org/abs/1009.2118v2
Yonina C. Eldar, Deanna Needell, Yaniv Plan, "Unicity conditions for low-rank matrix recovery," http://arxiv.org/abs/1103.5479
Solving the problem

Rewriting the problem
minimize    $\|A\|_*$
subject to  $A_{ij} = D_{ij}$,  $(i,j) \in \Omega$

is equivalent to

minimize    $\|A\|_*$
subject to  $A + E = D$,  $\pi_\Omega(E) = 0$

where
$[\pi_\Omega(E)]_{ij} = E_{ij}$ if $(i,j) \in \Omega$, and $0$ if $(i,j) \notin \Omega$
$D_{ij}$ is kept if $(i,j) \in \Omega$, and set to $0$ if $(i,j) \notin \Omega$

The new problem can be solved by the Augmented Lagrangian Method in an efficient way.
Z. Lin, M. Chen, L. Wu and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," http://arxiv.org/abs/1009.5055
Solving the problem

The augmented Lagrangian for the problem
minimize    $\|A\|_*$
subject to  $A + E = D$,  $\pi_\Omega(E) = 0$
is
$L(A, E, Y, \mu) = \|A\|_* + \langle Y, D - A - E\rangle + \frac{\mu}{2}\|D - A - E\|_F^2$    (7)
The traditional iterative method to minimize the augmented Lagrangian can be used here,
but at each iteration the constraint $\pi_\Omega(E) = 0$ has to be fulfilled.
Solving the problem

Algorithm
input: observation samples $D_{ij}$, $(i,j) \in \Omega$
$Y_0 = 0$; $E_0 = 0$; $\mu_0 > 0$; $\rho > 1$; $k = 0$
while not converged
    $A_{k+1} = \arg\min_A L(A, E_k, Y_k, \mu_k)$
    $E_{k+1} = \arg\min_{E,\ \pi_\Omega(E)=0} L(A_{k+1}, E, Y_k, \mu_k)$
    $Y_{k+1} = Y_k + \mu_k(D - A_{k+1} - E_{k+1})$
    $\mu_{k+1} = \rho\mu_k$; $k = k + 1$
end while
Output: $(A_k, E_k)$
Solving the subproblems

Solving for $A_{k+1}$
$A_{k+1} = \arg\min_A L(A, E_k, Y_k, \mu_k)$
$A_{k+1} = \arg\min_A \|A\|_* + \langle Y_k, D - A - E_k\rangle + \frac{\mu_k}{2}\|D - A - E_k\|_F^2$
$A_{k+1} = \arg\min_A \frac{1}{\mu_k}\|A\|_* + \frac{1}{2}\|D - A - E_k + \frac{1}{\mu_k}Y_k\|_F^2$
which has the general form
$\arg\min_A \tau\|A\|_* + \frac{1}{2}\|X - A\|_F^2$
Solving the subproblems

Singular value shrinkage operator
Given a matrix $X = U\Sigma V^T$, the operator $\mathcal{D}_\tau(\cdot) : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ is defined as
$\mathcal{D}_\tau(X) = U\,\mathcal{S}_\tau(\Sigma)\,V^T$,  $\mathcal{S}_\tau(\Sigma) = \mathrm{sign}(\Sigma)\max\{|\Sigma| - \tau, 0\}$

Theorem
For each $\tau \ge 0$ and $Y \in \mathbb{R}^{m \times n}$, the singular value shrinkage operator obeys
$\mathcal{D}_\tau(Y) = \arg\min_X \tau\|X\|_* + \frac{1}{2}\|Y - X\|_F^2$
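A minimal NumPy sketch of the operator $\mathcal{D}_\tau$ defined above (not from the slides; the function name `svt` and the test matrix are illustrative):

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: D_tau(X) = U * max(Sigma - tau, 0) * V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)     # soft-threshold the singular values
    return (U * s_shrunk) @ Vt              # same as U @ diag(s_shrunk) @ Vt

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 6))
A = svt(Y, tau=1.0)   # minimizer of tau*||X||_* + 0.5*||Y - X||_F^2, per the theorem above
```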
Solving the subproblems

Proof:
Consider the function $h_0(X) = \tau\|X\|_* + \frac{1}{2}\|X - Y\|_F^2$.
A sufficient condition for optimality of $X$ is that
$0 \in X - Y + \tau\,\partial\|X\|_*$
where $\partial\|X\|_*$ is the set of subgradients of the nuclear norm at $X$.
We know that for an arbitrary $X = U\Sigma V^T \in \mathbb{R}^{m \times n}$
$\partial\|X\|_* = \{UV^T + W : W \in \mathbb{R}^{m \times n},\ U^TW = 0,\ WV = 0,\ \|W\|_2 \le 1\}$
If we set $\hat{X} = \mathcal{D}_\tau(Y)$ and prove that $Y - \hat{X} \in \tau\,\partial\|\hat{X}\|_*$, then the theorem is concluded.
Decompose $Y = U_0\Sigma_0V_0^T + U_1\Sigma_1V_1^T$, where $U_0, V_0$ are the singular vectors associated
with singular values $> \tau$ and $U_1, V_1$ are the ones associated with values $\le \tau$.
Since $\hat{X} = \mathcal{D}_\tau(Y)$ we can write
$\hat{X} = U_0(\Sigma_0 - \tau I)V_0^T$.
Then
$Y - \hat{X} = U_1\Sigma_1V_1^T + \tau U_0V_0^T = \tau(U_0V_0^T + W)$,  with $W = \tau^{-1}U_1\Sigma_1V_1^T$.
By definition $U_0^TW = 0$, $WV_0 = 0$, and since the diagonal elements of $\Sigma_1$ have magnitudes
bounded by $\tau$, we also have $\|W\|_2 \le 1$. Hence $Y - \hat{X} \in \tau\,\partial\|\hat{X}\|_*$, which concludes the proof.
Solving the subproblems

Solving for $E_{k+1}$
$E_{k+1} = \arg\min_{E,\ \pi_\Omega(E)=0} L(A_{k+1}, E, Y_k, \mu_k)$
$E_{k+1} = \arg\min_{E,\ \pi_\Omega(E)=0} \langle Y_k, D - A_{k+1} - E\rangle + \frac{\mu_k}{2}\|D - A_{k+1} - E\|_F^2$
$E_{k+1} = \arg\min_{E,\ \pi_\Omega(E)=0} \frac{1}{2}\|D - A_{k+1} - E + \frac{1}{\mu_k}Y_k\|_F^2$
$E_{k+1} = \pi_{\bar\Omega}(D - A_{k+1} + \frac{1}{\mu_k}Y_k)$
Here $\bar\Omega$ is the complementary set of $\Omega$, $\bar\Omega = \{(i,j) \mid (i,j) \notin \Omega\}$.
Solving the problem

The algorithm is reduced to
Input: observation samples $D_{ij}$, $(i,j) \in \Omega$
$Y_0 = 0$; $E_0 = 0$; $\mu_0 > 0$; $\rho > 1$; $k = 0$
while not converged
    $A_{k+1} = \mathcal{D}_{1/\mu_k}(D - E_k + \frac{1}{\mu_k}Y_k)$
    $E_{k+1} = \pi_{\bar\Omega}(D - A_{k+1} + \frac{1}{\mu_k}Y_k)$
    $Y_{k+1} = Y_k + \mu_k(D - A_{k+1} - E_{k+1})$
    $\mu_{k+1} = \rho\mu_k$; $k = k + 1$
end while
Output: $(A_k, E_k)$
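A NumPy sketch of this reduced matrix-completion loop (not from the slides; the `(mu0, rho)` schedule, stopping rule, and the usage example are illustrative assumptions; `D` holds the observed entries with zeros elsewhere and `Omega` is a boolean mask of observed positions):

```python
import numpy as np

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def matrix_completion_alm(D, Omega, mu0=1.0, rho=1.5, tol=1e-7, max_iter=200):
    Y = np.zeros_like(D)
    E = np.zeros_like(D)
    mu = mu0
    for _ in range(max_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)          # A-step: singular value shrinkage
        E = np.where(Omega, 0.0, D - A + Y / mu)   # E-step: free on unobserved entries only
        R = D - A - E                              # residual (nonzero only on Omega)
        Y = Y + mu * R                             # multiplier update
        mu = rho * mu
        if np.linalg.norm(R, 'fro') <= tol * max(np.linalg.norm(D, 'fro'), 1.0):
            break
    return A

# usage: recover a random rank-2 matrix from 60% of its entries
rng = np.random.default_rng(0)
L = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
Omega = rng.random(L.shape) < 0.6
D = np.where(Omega, L, 0.0)
A_hat = matrix_completion_alm(D, Omega)
print(np.linalg.norm(A_hat - L) / np.linalg.norm(L))   # relative recovery error
```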
Robust PCA

Optimization problem
minimize    $\mathrm{rank}(A) + \lambda\|E\|_0$               (8)
subject to  $A_{ij} + E_{ij} = D_{ij}$,  $(i,j) \in \Omega$

We look for the best rank-$k$ approximation of the matrix $D$, which is corrupted by sparse noise.
Similar problems and conditions as in the case of matrix completion.
Robust PCA

The original problem is very hard to solve, so we look again for a convex relaxation of the problem:
$\mathrm{rank}(A) = \|\mathrm{diag}(\Sigma)\|_0$ (with $A = U\Sigma V^T$) and $\|E\|_0$
are relaxed to
$\|A\|_* = \|\mathrm{diag}(\Sigma)\|_1$ and $\|E\|_1$

Convex relaxation
minimize    $\|A\|_* + \lambda\|E\|_1$                         (9)
subject to  $A_{ij} + E_{ij} = D_{ij}$,  $(i,j) \in \Omega$
Robust PCA

Conditions for exact recovery of the convex relaxation
In order to have exact recovery we need to impose that the low rank part is not sparse
and also that the sparse part is not low rank.

Incoherence condition of the low rank part
The incoherence condition of a matrix $A = USV^T \in \mathbb{R}^{m \times n}$ with parameter $\mu$ states that
$\max_i \|U^Te_i\|^2 \le \frac{\mu r}{m}$,  $\max_i \|V^Te_i\|^2 \le \frac{\mu r}{n}$
$\|UV^T\|_\infty \le \sqrt{\frac{\mu r}{mn}}$
Robust PCA

Theorem
If $A_0$ obeys the incoherence condition with parameter $\mu$, the sampling set $\Omega$ is uniformly
distributed among all sets of cardinality $M$ obeying $M = 0.1mn$, and each observed entry is
corrupted with probability $\tau$ independently of the others, then for $N = \max(m, n)$ there
exists a constant $c$ such that with probability at least $1 - cN^{-10}$, problem (9) with
$\lambda = 1/\sqrt{0.1N}$ recovers the exact solution $(A_0, E_0)$, provided that
$\mathrm{rank}(A_0) \le \rho_rN\mu^{-1}(\log N)^{-2}$  and  $\tau \le \tau_s$
where $\rho_r$ and $\tau_s$ are positive numerical constants.
E.J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" http://arxiv.org/abs/0912.3599
Solving the problem

Rewriting the problem
minimize    $\|A\|_* + \lambda\|E\|_1$
subject to  $A_{ij} + E_{ij} = D_{ij}$,  $(i,j) \in \Omega$

is equivalent to

minimize    $\|A\|_* + \lambda\|E\|_1$
subject to  $A + E + Z = D$,  $\pi_\Omega(Z) = 0$

where
$[\pi_\Omega(Z)]_{ij} = Z_{ij}$ if $(i,j) \in \Omega$, and $0$ if $(i,j) \notin \Omega$
$D_{ij}$ is kept if $(i,j) \in \Omega$, and set to $0$ if $(i,j) \notin \Omega$

Z. Lin, M. Chen, L. Wu and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," http://arxiv.org/abs/1009.5055
Solving the problem

The augmented Lagrangian for the problem
minimize    $\|A\|_* + \lambda\|E\|_1$
subject to  $A + E + Z = D$,  $\pi_\Omega(Z) = 0$
is
$L(A, E, Z, Y, \mu) = \|A\|_* + \lambda\|E\|_1 + \langle Y, D - A - E - Z\rangle + \frac{\mu}{2}\|D - A - E - Z\|_F^2$    (10)
with the additional constraint $\pi_\Omega(Z) = 0$.
Solving the problem

Algorithm
input: observation samples $D_{ij}$, $(i,j) \in \Omega$
$Y_0 = 0$; $E_0 = 0$; $A_0 = D$; $Z_0 = 0$; $\mu_0 > 0$; $\rho > 1$; $k = 0$
while not converged
    $A_{k+1} = \arg\min_A L(A, E_k, Y_k, Z_k, \mu_k)$
    $E_{k+1} = \arg\min_E L(A_{k+1}, E, Y_k, Z_k, \mu_k)$
    $Z_{k+1} = \arg\min_{Z,\ \pi_\Omega(Z)=0} L(A_{k+1}, E_{k+1}, Y_k, Z, \mu_k)$
    $Y_{k+1} = Y_k + \mu_k(D - A_{k+1} - E_{k+1} - Z_{k+1})$
    $\mu_{k+1} = \rho\mu_k$; $k = k + 1$
end while
Output: $(A_k, E_k, Z_k)$
Solving the subproblems

Solving for $A_{k+1}$
$A_{k+1} = \arg\min_A L(A, E_k, Y_k, Z_k, \mu_k)$
$A_{k+1} = \arg\min_A \|A\|_* + \langle Y_k, D - A - E_k - Z_k\rangle + \frac{\mu_k}{2}\|D - A - E_k - Z_k\|_F^2$
$A_{k+1} = \arg\min_A \frac{1}{\mu_k}\|A\|_* + \frac{1}{2}\|D - A - E_k - Z_k + \frac{1}{\mu_k}Y_k\|_F^2$
which has the closed form solution
$A_{k+1} = \mathcal{D}_{1/\mu_k}(D - E_k - Z_k + \frac{1}{\mu_k}Y_k)$
Solving the subproblems

Solving for $E_{k+1}$
$E_{k+1} = \arg\min_E L(A_{k+1}, E, Y_k, Z_k, \mu_k)$
$E_{k+1} = \arg\min_E \lambda\|E\|_1 + \langle Y_k, D - A_{k+1} - E - Z_k\rangle + \frac{\mu_k}{2}\|D - A_{k+1} - E - Z_k\|_F^2$
$E_{k+1} = \arg\min_E \frac{\lambda}{\mu_k}\|E\|_1 + \frac{1}{2}\|D - A_{k+1} - E - Z_k + \frac{1}{\mu_k}Y_k\|_F^2$
which has the form
$\arg\min_E \tau\|E\|_1 + \frac{1}{2}\|X - E\|_F^2$
Solving the subproblems

Shrinkage operator
Given a matrix $Y \in \mathbb{R}^{m \times n}$, the operator $\mathcal{S}_\tau(\cdot) : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ is defined as
$\mathcal{S}_\tau(Y) = \mathrm{sign}(Y)\max(|Y| - \tau, 0)$
where $\mathrm{sign}(Y)\max(|Y| - \tau, 0)$ is applied componentwise to $Y$.

Theorem
For each $\tau \ge 0$ and $Y \in \mathbb{R}^{m \times n}$, the shrinkage operator obeys
$\mathcal{S}_\tau(Y) = \arg\min_X \tau\|X\|_1 + \frac{1}{2}\|Y - X\|_F^2$
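A minimal NumPy sketch of the componentwise shrinkage (soft-thresholding) operator $\mathcal{S}_\tau$, with a brute-force scalar check of the theorem above (not from the slides; the function name `shrink`, the test value, and the grid are illustrative):

```python
import numpy as np

def shrink(Y, tau):
    """S_tau(Y) = sign(Y) * max(|Y| - tau, 0), applied entrywise."""
    return np.sign(Y) * np.maximum(np.abs(Y) - tau, 0.0)

# scalar check: S_tau(y) minimizes tau*|x| + 0.5*(y - x)^2 over x
y, tau = 1.7, 0.5
grid = np.linspace(-5, 5, 200001)
x_star = grid[np.argmin(tau * np.abs(grid) + 0.5 * (y - grid) ** 2)]
assert abs(x_star - shrink(np.array(y), tau)) < 1e-3
```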
Proof:
Consider the function $h(X) = \tau\|X\|_1 + \frac{1}{2}\|X - Y\|_F^2$.
A sufficient condition for optimality of $X$ is that
$0 \in X - Y + \tau\,\partial\|X\|_1$
where $\partial\|X\|_1$ is the set of subgradients of the $\ell_1$ norm $\|X\|_1$ at $X$.
All the subgradients of $\|X\|_1$ at $X$ are given by
$\partial\|X\|_1 = \{G \in \mathbb{R}^{m \times n} : G_{ij} = -1 \text{ for } X_{ij} < 0,\ G_{ij} = 1 \text{ for } X_{ij} > 0,\ G_{ij} \in [-1, 1] \text{ for } X_{ij} = 0\}$
If we prove that $Y - \hat{X} \in \tau\,\partial\|\hat{X}\|_1$, then $\hat{X}$ is the unique minimizer of the problem.
Consider the candidate $\hat{X} = \mathcal{S}_\tau(Y)$; then
$[Y - \mathcal{S}_\tau(Y)]_{ij} = Y_{ij} - \mathrm{sign}(Y_{ij})\max(|Y_{ij}| - \tau, 0)$
$[Y - \mathcal{S}_\tau(Y)]_{ij} = \tau\,\mathrm{sign}(Y_{ij})$ if $|Y_{ij}| > \tau$, and $Y_{ij}$ if $|Y_{ij}| \le \tau$
$\tau\,\partial\|\mathcal{S}_\tau(Y)\|_1 = \{G \in \mathbb{R}^{m \times n} : G_{ij} = -\tau \text{ for } Y_{ij} < -\tau,\ G_{ij} = \tau \text{ for } Y_{ij} > \tau,\ G_{ij} \in [-\tau, \tau] \text{ for } |Y_{ij}| \le \tau\}$
$\tau\,\partial\|\mathcal{S}_\tau(Y)\|_1 = \{G \in \mathbb{R}^{m \times n} : G_{ij} = \tau\,\mathrm{sign}(Y_{ij}) \text{ for } |Y_{ij}| > \tau,\ G_{ij} \in [-\tau, \tau] \text{ for } |Y_{ij}| \le \tau\}$
$\Rightarrow Y - \mathcal{S}_\tau(Y) \in \tau\,\partial\|\mathcal{S}_\tau(Y)\|_1 \Rightarrow \mathcal{S}_\tau(Y)$ is the optimal solution.
Solving the subproblems

Solving for $Z_{k+1}$
$Z_{k+1} = \arg\min_{Z,\ \pi_\Omega(Z)=0} L(A_{k+1}, E_{k+1}, Y_k, Z, \mu_k)$
$Z_{k+1} = \arg\min_{Z,\ \pi_\Omega(Z)=0} \langle Y_k, D - A_{k+1} - E_{k+1} - Z\rangle + \frac{\mu_k}{2}\|D - A_{k+1} - E_{k+1} - Z\|_F^2$
$Z_{k+1} = \arg\min_{Z,\ \pi_\Omega(Z)=0} \frac{1}{2}\|D - A_{k+1} - E_{k+1} - Z + \frac{1}{\mu_k}Y_k\|_F^2$
$Z_{k+1} = \pi_{\bar\Omega}(D - A_{k+1} - E_{k+1} + \frac{1}{\mu_k}Y_k)$
Here $\bar\Omega$ is the complementary set of $\Omega$, $\bar\Omega = \{(i,j) \mid (i,j) \notin \Omega\}$.
Solving the problem

Algorithm
input: observation samples $D_{ij}$, $(i,j) \in \Omega$
$Y_0 = 0$; $E_0 = 0$; $Z_0 = 0$; $\mu_0 > 0$; $\rho > 1$; $k = 0$
while not converged
    $A_{k+1} = \mathcal{D}_{1/\mu_k}(D - E_k - Z_k + \frac{1}{\mu_k}Y_k)$
    $E_{k+1} = \mathcal{S}_{\lambda/\mu_k}(D - A_{k+1} - Z_k + \frac{1}{\mu_k}Y_k)$
    $Z_{k+1} = \pi_{\bar\Omega}(D - A_{k+1} - E_{k+1} + \frac{1}{\mu_k}Y_k)$
    $Y_{k+1} = Y_k + \mu_k(D - A_{k+1} - E_{k+1} - Z_{k+1})$
    $\mu_{k+1} = \rho\mu_k$; $k = k + 1$
end while
Output: $(A_k, E_k, Z_k)$
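A NumPy sketch of this Robust PCA loop (not from the slides; the choices of `lam`, `mu0`, `rho`, the stopping rule, and the usage example are illustrative assumptions; `Omega` is a boolean mask of observed entries, all True when the matrix is fully observed):

```python
import numpy as np

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_alm(D, Omega, lam=None, mu0=1.0, rho=1.5, tol=1e-7, max_iter=300):
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))      # a common default weight for the l1 term
    Y = np.zeros_like(D); E = np.zeros_like(D); Z = np.zeros_like(D)
    mu = mu0
    for _ in range(max_iter):
        A = svt(D - E - Z + Y / mu, 1.0 / mu)              # low-rank step
        E = shrink(D - A - Z + Y / mu, lam / mu)           # sparse step
        Z = np.where(Omega, 0.0, D - A - E + Y / mu)       # free on unobserved entries
        R = D - A - E - Z                                  # constraint residual
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, 'fro') <= tol * max(np.linalg.norm(D, 'fro'), 1.0):
            break
    return A, E

# usage: low-rank plus sparse corruption, fully observed
rng = np.random.default_rng(0)
L = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
S = np.where(rng.random(L.shape) < 0.05, 5.0 * rng.standard_normal(L.shape), 0.0)
D = L + S
A_hat, E_hat = rpca_alm(D, np.ones_like(D, dtype=bool))
```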