A Fixed Point Algorithm for ℓ1 Large-Scale Underdetermined Systems
Reinaldo Sanchez, Miguel Argaez
SACNAS Research EXPO, April 16, 2009
The University of Texas at El Paso, Computational Science Program
Department of Mathematical Sciences
Introduction
Current data acquisition methods are often extremely wasteful. Many protocols acquire massive amounts of data which are then (in large part) discarded, with little loss of information, by a subsequent compression stage (necessary for storage and transmission).
Assertion: It is possible to recover signals from far fewer measurements than was previously thought necessary. In practice, this means, for example, that high-resolution imaging is possible with fewer sensors.
COMPRESSED SENSING
Initiated in 2004 by Emmanuel Candes, Justin Romberg and Terence Tao, and independently by David Donoho.
General Theme: How much information is necessary to accuratelyreconstruct a signal / image?
Key: Sparsity. Many real-world signals are compressible: they can be well approximated by signals that have only a few non-zero coefficients in a suitable basis, i.e., by sparse ones.
Formulation of the Problem
The ℓ1 norm of a vector z ∈ R^n is defined by

    ‖z‖_1 = Σ_{i=1}^n |z_i|
Given some corrupted measurements b = Ax* + ν, where b, ν ∈ R^m and A ∈ R^{m×n} with m < n, we want to recover x*.
Candes and Tao (2004) proved that the vector x* is the unique solution of the following problem

    min_x ‖x‖_1   s.t.  Ax = b    (1)

provided that ‖ν‖_0 ≤ ρm for some ρ > 0. Here m is the number of measurements needed to recover the signal and ‖ν‖_0 := |{i : ν_i ≠ 0}|.
Question: How to solve the convex optimization problem (1)?
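One classical route (not the one developed in this poster) is to note that (1) is equivalent to a linear program: writing x = u − v with u, v ≥ 0 turns min ‖x‖_1 s.t. Ax = b into min 1^T(u + v) s.t. A(u − v) = b, u, v ≥ 0. A minimal sketch using SciPy; the problem sizes and random seed are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 25, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 3, replace=False)] = rng.choice([-1.0, 1.0], 3)
b = A @ x_true

# Split x = u - v with u, v >= 0; then ||x||_1 = 1^T (u + v) at the optimum.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
x_rec = res.x[:n] - res.x[n:]
print(np.linalg.norm(A @ x_rec - b))  # feasibility of the recovered vector
```

This approach is exact but does not scale to the large problems targeted here, which motivates the smoothing strategy below.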
Strategies to solve the problem
Saunders et al. (1999), Stanford University; Candes (2004), Caltech, ℓ1-magic algorithm:

    min_x ‖x‖_1   s.t.  Ax = b    (2)

Boyd et al. (2007), Stanford University; Wright et al. (2007), University of Wisconsin:

    min_x { ‖b − Ax‖_2^2 + λ‖x‖_1 }    (3)

Lasso problem (1996), statistics community; Tibshirani (1996):

    min_x ‖b − Ax‖_2^2   s.t.  ‖x‖_1 ≤ q    (4)

Zhang et al. (2007), Rice University:

    min_x ‖x‖_1 + (µ/2)‖Ax − b‖_2^2    (5)
Our Approach (UTEP)
We focus on the formulation of the problem given by (1) and denote the solution by x*, that is

    x* = arg min_x { ‖x‖_1  s.t.  Ax = b }.

We use the following approximation for ‖x‖_1:

    ‖x‖_1 = Σ_{i=1}^n |x_i| ≈ Σ_{i=1}^n (x_i^2 + µ)^{1/2},  µ > 0.

Let us define, for µ > 0, the function f_µ(x) = Σ_{i=1}^n (x_i^2 + µ)^{1/2}. Then:

- f_µ is a strictly convex function for any µ > 0.
- lim_{µ→0} f_µ(x) = ‖x‖_1 for any x ∈ R^n.
- f_µ ∈ C^1(R^n), since ∂f_µ(x)/∂x_i = x_i / (x_i^2 + µ)^{1/2} is continuous.
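The limit and smoothness claims above are easy to check numerically. A small sketch; the test vector and µ values are illustrative:

```python
import numpy as np

def f_mu(x, mu):
    # smoothed l1 norm: f_mu(x) = sum_i (x_i^2 + mu)^(1/2)
    return np.sum(np.sqrt(x**2 + mu))

def grad_f_mu(x, mu):
    # gradient: x_i / (x_i^2 + mu)^(1/2); well defined at x_i = 0 when mu > 0
    return x / np.sqrt(x**2 + mu)

x = np.array([1.0, -2.0, 0.0, 0.5])   # ||x||_1 = 3.5
for mu in (1.0, 1e-2, 1e-4, 1e-8):
    print(mu, f_mu(x, mu))            # decreases toward 3.5 as mu -> 0
```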
Properties
We solve a sequence of subproblems of the following form: for µ > 0,

    min_x f_µ(x)   s.t.  Ax = b    (6)

and call

    x_µ = arg min_x { f_µ(x)  s.t.  Ax = b }.

Each of these subproblems is strictly convex, and therefore has a unique global solution.

Property 1: (6) is a strictly convex problem ⇒ unique solution.
Property 2: x_µ → x* as µ → 0.
Optimality Conditions for Problem (6)
The Lagrangian associated with (6) is

    ℓ(x, y) = f_µ(x) + (Ax − b)^T y,

where y ∈ R^m is the Lagrange multiplier associated with the equality constraint. The KKT (necessary and sufficient) conditions are:

    ∇_x ℓ(x, y) = ∇f_µ(x) + A^T y = 0
    ∇_y ℓ(x, y) = Ax − b = 0,    (7)

and we can write these equations as: for µ > 0,

    x_i / (x_i^2 + µ)^{1/2} + (A^T y)_i = 0,  i = 1, ..., n
    Ax − b = 0.

We then have a square system of nonlinear equations of order n + m. This system of equations has a unique solution.
Fixed Point Formulation
We rewrite the system of nonlinear equations as:

    [ D_µ^{-1/2}(x)   A^T ] [ x ]   [ 0 ]
    [ A               0   ] [ y ] = [ b ]    (8)

where D_µ(x) = diag(x.^2 + µ), x ∈ R^n and µ > 0.

Let X = [x; y], B = [0; b] and

    F_µ(X) = [ D_µ^{-1/2}(x)   A^T ]
             [ A               0   ];

then the augmented system (8) can be written as

    F_µ(X) X = B.

Since F_µ(X) is non-singular (A is full rank and D_µ^{-1/2}(x) > 0), we have that

    X = F_µ^{-1}(X) B =: S_µ(X).

The problem is then posed as a fixed point problem of the form:

    Find X such that X = S_µ(X).    (9)

There exists a unique X* such that S_µ(X*) = X* for µ > 0.
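A compact sketch of iterating X ← S_µ(X) on a small random problem. Dense linear algebra is used here for readability (the sizes, seed and tolerance are illustrative); the poster's algorithm solves the system iteratively instead:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu = 12, 30, 1e-2
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17]] = 1.0, -1.0
b = A @ x_true

x = A.T @ b
for _ in range(100):
    d_inv_half = 1.0 / np.sqrt(x**2 + mu)             # diag of D_mu(x)^{-1/2}
    F = np.block([[np.diag(d_inv_half), A.T],
                  [A, np.zeros((m, m))]])             # F_mu(X)
    X = np.linalg.solve(F, np.concatenate([np.zeros(n), b]))  # X <- S_mu(X)
    x_new, y = X[:n], X[n:]
    if np.linalg.norm(x_new - x) / (1 + np.linalg.norm(x)) < 1e-10:
        x = x_new
        break
    x = x_new
print(np.linalg.norm(A @ x - b))   # x stays feasible at every iteration
```

Note that every iterate satisfies Ax = b exactly, since the second block row of (8) is enforced by each solve.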
Globalization Strategy
For a fixed i = 1, ..., n we have that

    x_i / (x_i^2 + µ)^{1/2} + (A^T y)_i = 0,

and solving the above equation for µ (square both sides and rearrange), we have

    x̄_i z_i = µ,

where x̄_i = x_i^2 and z_i = (1 − w_i^2)/w_i^2 with w_i = (A^T y)_i. Notice that 0 < w_i^2 < 1. So the KKT conditions are just: for µ > 0,

    Ax = b
    X̄ Z e = µ e,

where X̄ = diag(x̄_i) and Z = diag(z_i) for i = 1, ..., n. We have x̄_i > 0 and z_i > 0. Here e = (1, ..., 1)^T ∈ R^n.
Fundamental Property
Perturbed KKT conditions to problem (1) ⇐⇒ KKT conditions to problem (6)

Perturbed KKT conditions to problem (1): for µ > 0,

    Ax − b = 0
    x_i^2 (1 − w_i^2) = µ,  i = 1, ..., n

Remark: if µ = 0 then we have just the KKT conditions. At the solution we have that

    x_i > 0 and w_i = −1, or x_i < 0 and w_i = 1,  i = 1, ..., n,

where w_i = (A^T y)_i.

KKT conditions to problem (6): for µ > 0,

    Ax − b = 0
    x_i^2 (1 − w_i^2)/w_i^2 = µ,  i = 1, ..., n

When µ → 0 we have that

    x_i > 0 and w_i → −1, or x_i < 0 and w_i → 1,  i = 1, ..., n,

where w_i = (A^T y)_i.
We can implement an interior-point method to solve (1) without using a dual condition or forcing positivity.
Solving the linear system associated with (6)
Let D_µ(x) = diag(x_i^2 + µ) for i = 1, ..., n. Then

    [ D_µ^{-1/2}(x)   A^T ] [ x ]   [ 0 ]
    [ A               0   ] [ y ] = [ b ]

    ⇐⇒   A D_µ^{1/2}(x) A^T y = b,
         x = D_µ^{1/2}(x) A^T y.
The normal equation is solved using a Conjugate Gradient algorithm.
The most expensive arithmetic operation is

    q = A (D_µ^{1/2} (A^T d)),

where d is a direction vector.
We do not store the whole matrix A; our algorithm uses only matrix-vector multiplications.
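This matrix-free normal-equation solve can be sketched with SciPy's LinearOperator. The dense Gaussian A, sizes and seed below are placeholders; in the real setting A is available only through products:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(2)
m, n, mu = 64, 256, 1e-3
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = rng.standard_normal(m)

x = A.T @ b                        # current iterate of the outer method
d_half = np.sqrt(x**2 + mu)        # diagonal of D_mu(x)^{1/2}

def matvec(d):
    # q = A (D_mu^{1/2} (A^T d)): two matrix-vector products and a scaling
    return A @ (d_half * (A.T @ d))

# A D^{1/2} A^T is symmetric positive definite, so CG applies
M = LinearOperator((m, m), matvec=matvec)
y, info = cg(M, b)
x_new = d_half * (A.T @ y)         # back-substitute for x
print(info, np.linalg.norm(A @ x_new - b))
```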
Algorithm
Fixed Point Signal Recovery (FPSR)
Task: find an approximate solution x to the problem min_x ‖x‖_1 s.t. Ax = b.

Parameters: we are given the matrix A, the vector b and the true signal x*.

Step 1. Initialization: set µ, σ, ε1, ε2, ε3.
Step 2. Initial approximate solution: x = A^T b.
Step 3. Outer loop: for k = 1, ..., maxiter
Step 4. Inner loop: set xprev = x.
Step 5. Update matrix: D_µ = diag(xprev.^2 + µ).
Step 6. Solve the augmented system associated with the fixed point problem:
        [ D_µ^{-1/2}  A^T ; A  0 ] [x; y] = [0; b].
Step 7. Stopping criterion for the fixed point problem:
        if ‖x − xprev‖ / (1 + ‖xprev‖) > ε1, go to Step 4.
Step 8. Set x̄ = x.^2, w = A^T y, z = (1 − w.^2)./w.^2.
Step 9. Compute error_primal = ‖Ax − b‖ / (1 + ‖b‖), gap = x̄^T z / n.
Step 10. Stopping criterion for the problem:
         if (error_primal + gap) > ε2, update µ = σ·gap, ε1 = √µ, and go to Step 3;
         else display 'x is an optimal solution'.
Step 11. Quality measurement:
         if ‖x − x*‖ ≤ ε3, display 'the signal was recovered';
         else display 'fail to recover the signal'.
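The steps above can be sketched in Python. This is a reading of the algorithm, not the authors' code: a dense solve of the normal equations stands in for the CG solve, and the default parameter values are illustrative assumptions:

```python
import numpy as np

def fpsr(A, b, mu=0.1, sigma=0.1, eps2=1e-6, maxiter=50):
    """Sketch of the Fixed Point Signal Recovery (FPSR) iteration.
    Uses a dense solve of A D^{1/2} A^T y = b for clarity; the poster's
    version relies on CG and matrix-vector products only."""
    m, n = A.shape
    x = A.T @ b                                   # Step 2
    eps1 = np.sqrt(mu)
    for _ in range(maxiter):                      # Step 3: outer loop
        for _ in range(100):                      # Step 4: inner loop (capped)
            xprev = x
            d_half = np.sqrt(xprev**2 + mu)       # Step 5: diag of D_mu^{1/2}
            y = np.linalg.solve(A @ (d_half[:, None] * A.T), b)   # Step 6
            x = d_half * (A.T @ y)
            if np.linalg.norm(x - xprev) / (1 + np.linalg.norm(xprev)) <= eps1:
                break                             # Step 7
        xbar = x**2                               # Step 8
        w = A.T @ y
        z = (1.0 - w**2) / w**2                   # positive, since w_i^2 < 1 here
        err = np.linalg.norm(A @ x - b) / (1 + np.linalg.norm(b))   # Step 9
        gap = xbar @ z / n
        if err + gap <= eps2:                     # Step 10
            break
        mu, eps1 = sigma * gap, np.sqrt(sigma * gap)
    return x

# illustrative small instance
rng = np.random.default_rng(3)
m, n = 32, 128
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = rng.choice([-1.0, 1.0], 4)
b = A @ x_true
x = fpsr(A, b)
print(np.linalg.norm(A @ x - b))   # feasibility of the returned solution
```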
Numerical Experimentation
Goal: investigate the capabilities of recovering sparse signals using FPSR and compare its behavior with the l1-ls (Stanford) and GPSR-BB (Wisconsin) algorithms.
Problem: recover a signal x ∈ R^4096 with 160 spikes of amplitude ±1 from m = 1024 noisy measurements. We do not have access to A in explicit form. In the problem, we have b = Ax + ε ∈ R^m, where the noise vector ε is set according to a Gaussian distribution with mean 0 and standard deviation 0.01.
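A test problem of this shape can be generated along the following lines. The dense Gaussian A is an assumption made here for illustration; the poster's experiment accesses A only through matrix-vector products:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 4096, 1024, 160

# spike signal: 160 entries of amplitude +/-1 at random positions
x_star = np.zeros(n)
x_star[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)

# measurement operator and noisy data b = A x* + eps, eps ~ N(0, 0.01^2)
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = A @ x_star + rng.normal(0.0, 0.01, m)
```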
The algorithm was run in one of the nodes of the Virgo2 Machine at UTEP, withtwo Intel quad-core CPUs and 8 GBytes of memory.
The original signal and the reconstructed signals obtained by l1-ls, GPSR Monotone with continuation, and FPSR are shown in the following figure.
Sparse Signal Recovery Example
[Figure: four stacked panels over the index range 0 to 4096, amplitudes in [−1, 1]: original signal x0, l1-ls Stanford solution, GPSR-BB Wisconsin solution, and FPSR UTEP solution.]
We ran each of the algorithms ten times; the average results were:

    Algorithm | Norm-2 Error | CPU Time (sec)
    l1-ls     | 0.92         | 1.91
    GPSR-BB   | 0.93         | 0.37
    FPSR      | 1.06         | 0.28
Scalability
We generate a sequence of synthetic problems to examine how the problem size affects the runtime of our algorithm, and compare with l1-ls and GPSR. We create a family of data sets with n going from 2^12 to 2^22. The sparsity of the signal was controlled so that the total number of nonzero entries was n/2^7. The matrix A was set as in the Sparse Signal Recovery Example, with m = n/4. The noise level ε is 1%, with Gaussian distribution N(0, 0.01^2 I).
[Figure: CPU time versus problem size n, on log-log axes, for l1-ls (Stanford), GPSR-BB monotone (Wisconsin) and FPSR (UTEP).]
References
E. Candes, J. Romberg, and T. Tao. Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information. 2005.
M. Figueiredo, R. Nowak, and S. Wright. Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems. 2007.
S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An Interior-Point Method for Large-Scale l1-Regularized Least Squares. 2007.
Acknowledgments
The authors thank the Computational Science Program and ARL Grant No. W911NF-07-2-0027 for financial support. The authors also acknowledge the office space provided by the Department of Mathematical Sciences. The authors thank the administrators of the Virgo2 Machine at UTEP, supported by the National Science Foundation under Grant No. CNS-0709438.