A Fixed Point Algorithm for ℓ1 Large-Scale Underdetermined Systems
Reinaldo Sanchez, Miguel Argaez
SACNAS Research EXPO, April 16, 2009
The University of Texas at El Paso, Computational Science Program
Department of Mathematical Sciences
Introduction
Current data acquisition methods are often extremely wasteful. Many protocols acquire massive amounts of data which are then (in large part) discarded, with little loss of information, by a subsequent compression stage (necessary for storage and transmission).
Assertion: It is possible to recover signals from far fewer measurements than was previously thought necessary. In practice, this means, for example, that high-resolution imaging is possible with fewer sensors.
COMPRESSED SENSING
Initiated in 2004 by Emmanuel Candes, Justin Romberg and Terence Tao, and independently by David Donoho.
General Theme: How much information is necessary to accuratelyreconstruct a signal / image?
Key: Sparsity. Many real-world signals are compressible: they can be well approximated by signals that have only a few non-zero coefficients in a suitable basis, i.e., by sparse ones.
Formulation of the Problem
The ℓ1 norm of a vector z ∈ R^n is defined by

    ‖z‖_1 = Σ_{i=1}^n |z_i|
Given some corrupted measurements b = Ax* + ν, where b, ν ∈ R^m and A ∈ R^{m×n} with m < n, we want to recover x*.
Candes and Tao (2004) proved that the vector x* is the unique solution of the following problem

    min_x ‖x‖_1   s.t.  Ax = b    (1)

provided that ‖ν‖_0 ≤ ρm for some ρ > 0. Here m is the number of measurements needed to recover the signal and ‖ν‖_0 := |{i : ν_i ≠ 0}|.
Question: How to solve the convex optimization problem (1)?
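One classical route (not the one developed in this poster) is to note that (1) is equivalent to a linear program: writing x = u − v with u, v ≥ 0 turns min ‖x‖_1 s.t. Ax = b into min 1^T(u + v) s.t. A(u − v) = b, u, v ≥ 0. A minimal sketch using SciPy; the problem sizes and random seed are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 25, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 3, replace=False)] = rng.choice([-1.0, 1.0], 3)
b = A @ x_true

# Split x = u - v with u, v >= 0; then ||x||_1 = 1^T (u + v) at the optimum.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
x_rec = res.x[:n] - res.x[n:]
print(np.linalg.norm(A @ x_rec - b))  # feasibility of the recovered vector
```

This approach is exact but does not scale to the large problems targeted here, which motivates the smoothing strategy below.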
Strategies to solve the problem
Saunders et al. (1999), Stanford University; Candes (2004), Caltech, ℓ1-magic algorithm:

    min_x ‖x‖_1   s.t.  Ax = b    (2)

Boyd et al. (2007), Stanford University; Wright et al. (2007), University of Wisconsin:

    min_x { ‖b − Ax‖_2^2 + λ‖x‖_1 }    (3)

Lasso problem (1996), statistics community; Tibshirani (1996):

    min_x ‖b − Ax‖_2^2   s.t.  ‖x‖_1 ≤ q    (4)

Zhang et al. (2007), Rice University:

    min_x ‖x‖_1 + (µ/2)‖Ax − b‖_2^2    (5)
Our Approach (UTEP)
We focus on the formulation of the problem given by (1) and denote the solution by x*, that is

    x* = arg min_x { ‖x‖_1  s.t.  Ax = b }.

We use the following approximation for ‖x‖_1:

    ‖x‖_1 = Σ_{i=1}^n |x_i| ≈ Σ_{i=1}^n (x_i^2 + µ)^{1/2},  µ > 0.

Let us define, for µ > 0, the function f_µ(x) = Σ_{i=1}^n (x_i^2 + µ)^{1/2}. Then:

- f_µ is a strictly convex function for any µ > 0.
- lim_{µ→0} f_µ(x) = ‖x‖_1 for any x ∈ R^n.
- f_µ ∈ C^1(R^n), since ∂f_µ(x)/∂x_i = x_i / (x_i^2 + µ)^{1/2} is continuous.
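The limit and smoothness claims above are easy to check numerically. A small sketch; the test vector and µ values are illustrative:

```python
import numpy as np

def f_mu(x, mu):
    # smoothed l1 norm: f_mu(x) = sum_i (x_i^2 + mu)^(1/2)
    return np.sum(np.sqrt(x**2 + mu))

def grad_f_mu(x, mu):
    # gradient: x_i / (x_i^2 + mu)^(1/2); well defined at x_i = 0 when mu > 0
    return x / np.sqrt(x**2 + mu)

x = np.array([1.0, -2.0, 0.0, 0.5])   # ||x||_1 = 3.5
for mu in (1.0, 1e-2, 1e-4, 1e-8):
    print(mu, f_mu(x, mu))            # decreases toward 3.5 as mu -> 0
```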
Properties
We solve a sequence of subproblems of the following form: for µ > 0,

    min_x f_µ(x)   s.t.  Ax = b    (6)

and call

    x_µ = arg min_x { f_µ(x)  s.t.  Ax = b }.

Each of these subproblems is strictly convex, and therefore has a unique global solution.

Property 1: (6) is a strictly convex problem ⇒ unique solution.
Property 2: x_µ → x* as µ → 0.
Optimality Conditions for Problem (6)
The Lagrangian associated with (6) is

    ℓ(x, y) = f_µ(x) + (Ax − b)^T y,

where y ∈ R^m is the Lagrange multiplier associated with the equality constraint. The KKT (necessary and sufficient) conditions are:

    ∇_x ℓ(x, y) = ∇f_µ(x) + A^T y = 0
    ∇_y ℓ(x, y) = Ax − b = 0,    (7)

and we can write these equations as: for µ > 0,

    x_i / (x_i^2 + µ)^{1/2} + (A^T y)_i = 0,  i = 1, ..., n
    Ax − b = 0.

We then have a square system of nonlinear equations of order n + m. This system of equations has a unique solution.
Fixed Point Formulation
We rewrite the system of nonlinear equations as:

    [ D_µ^{-1/2}(x)   A^T ] [ x ]   [ 0 ]
    [ A               0   ] [ y ] = [ b ]    (8)

where D_µ(x) = diag(x.^2 + µ), x ∈ R^n and µ > 0.

Let X = [x; y], B = [0; b] and

    F_µ(X) = [ D_µ^{-1/2}(x)   A^T ]
             [ A               0   ];

then the augmented system (8) can be written as

    F_µ(X) X = B.

Since F_µ(X) is non-singular (A is full rank and D_µ^{-1/2}(x) > 0), we have that

    X = F_µ^{-1}(X) B =: S_µ(X).

The problem is then posed as a fixed point problem of the form:

    Find X such that X = S_µ(X).    (9)

There exists a unique X* such that S_µ(X*) = X* for µ > 0.
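A compact sketch of iterating X ← S_µ(X) on a small random problem. Dense linear algebra is used here for readability (the sizes, seed and tolerance are illustrative); the poster's algorithm solves the system iteratively instead:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu = 12, 30, 1e-2
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17]] = 1.0, -1.0
b = A @ x_true

x = A.T @ b
for _ in range(100):
    d_inv_half = 1.0 / np.sqrt(x**2 + mu)             # diag of D_mu(x)^{-1/2}
    F = np.block([[np.diag(d_inv_half), A.T],
                  [A, np.zeros((m, m))]])             # F_mu(X)
    X = np.linalg.solve(F, np.concatenate([np.zeros(n), b]))  # X <- S_mu(X)
    x_new, y = X[:n], X[n:]
    if np.linalg.norm(x_new - x) / (1 + np.linalg.norm(x)) < 1e-10:
        x = x_new
        break
    x = x_new
print(np.linalg.norm(A @ x - b))   # x stays feasible at every iteration
```

Note that every iterate satisfies Ax = b exactly, since the second block row of (8) is enforced by each solve.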
Globalization Strategy
For a fixed i = 1, ..., n we have that

    x_i / (x_i^2 + µ)^{1/2} + (A^T y)_i = 0,

and solving the above equation for µ (square both sides and rearrange), we have

    x̄_i z_i = µ,

where x̄_i = x_i^2 and z_i = (1 − w_i^2)/w_i^2 with w_i = (A^T y)_i. Notice that 0 < w_i^2 < 1. So the KKT conditions are just: for µ > 0,

    Ax = b
    X̄ Z e = µ e,

where X̄ = diag(x̄_i) and Z = diag(z_i) for i = 1, ..., n. We have x̄_i > 0 and z_i > 0. Here e = (1, ..., 1)^T ∈ R^n.
Fundamental Property
Perturbed KKT conditions to problem (1) ⇐⇒ KKT conditions to problem (6)

Perturbed KKT conditions to problem (1): for µ > 0,

    Ax − b = 0
    x_i^2 (1 − w_i^2) = µ,  i = 1, ..., n

Remark: if µ = 0 then we have just the KKT conditions. At the solution we have that

    x_i > 0 and w_i = −1, or x_i < 0 and w_i = 1,  i = 1, ..., n,

where w_i = (A^T y)_i.

KKT conditions to problem (6): for µ > 0,

    Ax − b = 0
    x_i^2 (1 − w_i^2)/w_i^2 = µ,  i = 1, ..., n

When µ → 0 we have that

    x_i > 0 and w_i → −1, or x_i < 0 and w_i → 1,  i = 1, ..., n,

where w_i = (A^T y)_i.
We can implement an interior-point method to solve (1) without using a dual condition or forcing positivity.
Solving the linear system associated with (6)
Let D_µ(x) = diag(x_i^2 + µ) for i = 1, ..., n. Then

    [ D_µ^{-1/2}(x)   A^T ] [ x ]   [ 0 ]
    [ A               0   ] [ y ] = [ b ]

    ⇐⇒   A D_µ^{1/2}(x) A^T y = b,
         x = D_µ^{1/2}(x) A^T y.
The normal equation is solved using a Conjugate Gradient algorithm.
The most expensive arithmetic operation is

    q = A (D_µ^{1/2} (A^T d)),

where d is a direction vector.
We do not store the whole matrix A; our algorithm uses only matrix-vector multiplications.
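This matrix-free normal-equation solve can be sketched with SciPy's LinearOperator. The dense Gaussian A, sizes and seed below are placeholders; in the real setting A is available only through products:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(2)
m, n, mu = 64, 256, 1e-3
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = rng.standard_normal(m)

x = A.T @ b                        # current iterate of the outer method
d_half = np.sqrt(x**2 + mu)        # diagonal of D_mu(x)^{1/2}

def matvec(d):
    # q = A (D_mu^{1/2} (A^T d)): two matrix-vector products and a scaling
    return A @ (d_half * (A.T @ d))

# A D^{1/2} A^T is symmetric positive definite, so CG applies
M = LinearOperator((m, m), matvec=matvec)
y, info = cg(M, b)
x_new = d_half * (A.T @ y)         # back-substitute for x
print(info, np.linalg.norm(A @ x_new - b))
```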
Algorithm
Fixed Point Signal Recovery (FPSR)
Task: find an approximate solution x to the problem min_x ‖x‖_1 s.t. Ax = b.

Parameters: we are given the matrix A, the vector b and the true signal x*.

Step 1. Initialization: set µ, σ, ε1, ε2, ε3.
Step 2. Initial approximate solution: x = A^T b.
Step 3. Outer loop: for k = 1, ..., maxiter
Step 4. Inner loop: set xprev = x.
Step 5. Update matrix: D_µ = diag(xprev.^2 + µ).
Step 6. Solve the augmented system associated with the fixed point problem:
        [ D_µ^{-1/2}  A^T ; A  0 ] [x; y] = [0; b].
Step 7. Stopping criterion for the fixed point problem:
        if ‖x − xprev‖ / (1 + ‖xprev‖) > ε1, go to Step 4.
Step 8. Set x̄ = x.^2, w = A^T y, z = (1 − w.^2)./w.^2.
Step 9. Compute error_primal = ‖Ax − b‖ / (1 + ‖b‖), gap = x̄^T z / n.
Step 10. Stopping criterion for the problem:
         if (error_primal + gap) > ε2, update µ = σ·gap, ε1 = √µ, and go to Step 3;
         else display 'x is an optimal solution'.
Step 11. Quality measurement:
         if ‖x − x*‖ ≤ ε3, display 'the signal was recovered';
         else display 'fail to recover the signal'.
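The steps above can be sketched in Python. This is a reading of the algorithm, not the authors' code: a dense solve of the normal equations stands in for the CG solve, and the default parameter values are illustrative assumptions:

```python
import numpy as np

def fpsr(A, b, mu=0.1, sigma=0.1, eps2=1e-6, maxiter=50):
    """Sketch of the Fixed Point Signal Recovery (FPSR) iteration.
    Uses a dense solve of A D^{1/2} A^T y = b for clarity; the poster's
    version relies on CG and matrix-vector products only."""
    m, n = A.shape
    x = A.T @ b                                   # Step 2
    eps1 = np.sqrt(mu)
    for _ in range(maxiter):                      # Step 3: outer loop
        for _ in range(100):                      # Step 4: inner loop (capped)
            xprev = x
            d_half = np.sqrt(xprev**2 + mu)       # Step 5: diag of D_mu^{1/2}
            y = np.linalg.solve(A @ (d_half[:, None] * A.T), b)   # Step 6
            x = d_half * (A.T @ y)
            if np.linalg.norm(x - xprev) / (1 + np.linalg.norm(xprev)) <= eps1:
                break                             # Step 7
        xbar = x**2                               # Step 8
        w = A.T @ y
        z = (1.0 - w**2) / w**2                   # positive, since w_i^2 < 1 here
        err = np.linalg.norm(A @ x - b) / (1 + np.linalg.norm(b))   # Step 9
        gap = xbar @ z / n
        if err + gap <= eps2:                     # Step 10
            break
        mu, eps1 = sigma * gap, np.sqrt(sigma * gap)
    return x

# illustrative small instance
rng = np.random.default_rng(3)
m, n = 32, 128
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = rng.choice([-1.0, 1.0], 4)
b = A @ x_true
x = fpsr(A, b)
print(np.linalg.norm(A @ x - b))   # feasibility of the returned solution
```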
Numerical Experimentation
Goal: investigate the capabilities of recovering sparse signals using FPSR and compare its behavior with the l1-ls (Stanford) and GPSR-BB (Wisconsin) algorithms.
Problem: recover a signal x ∈ R^4096 with 160 spikes of amplitude ±1 from m = 1024 noisy measurements. We do not have access to A in explicit form. In the problem, we have b = Ax + ε ∈ R^m, where the noise vector ε is set according to a Gaussian distribution with mean 0 and standard deviation 0.01.
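A test problem of this shape can be generated along the following lines. The dense Gaussian A is an assumption made here for illustration; the poster's experiment accesses A only through matrix-vector products:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 4096, 1024, 160

# spike signal: 160 entries of amplitude +/-1 at random positions
x_star = np.zeros(n)
x_star[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)

# measurement operator and noisy data b = A x* + eps, eps ~ N(0, 0.01^2)
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = A @ x_star + rng.normal(0.0, 0.01, m)
```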
The algorithm was run in one of the nodes of the Virgo2 Machine at UTEP, withtwo Intel quad-core CPUs and 8 GBytes of memory.
The original signal and the reconstructed signals obtained by l1-ls, GPSR Monotone with continuation, and FPSR are shown in the following figure.
Sparse Signal Recovery Example
[Figure: four stacked panels over the index range 0 to 4096, amplitudes in [−1, 1]: original signal x0, l1-ls Stanford solution, GPSR-BB Wisconsin solution, and FPSR UTEP solution.]
We ran each of the algorithms ten times; the average results were:

    Algorithm | Norm-2 Error | CPU Time (sec)
    l1-ls     | 0.92         | 1.91
    GPSR-BB   | 0.93         | 0.37
    FPSR      | 1.06         | 0.28
Scalability
We generate a sequence of synthetic problems to examine how the problem size affects the runtime of our algorithm, and compare with l1-ls and GPSR. We create a family of data sets with n going from 2^12 to 2^22. The sparsity of the signal was controlled so that the total number of nonzero entries was n/2^7. The matrix A was set as in the Sparse Signal Recovery Example, with m = n/4. The noise level ε is 1%, with Gaussian distribution N(0, 0.01^2 I).
[Figure: CPU time versus problem size n, on log-log axes, for l1-ls (Stanford), GPSR-BB monotone (Wisconsin) and FPSR (UTEP).]
References
E. Candes, J. Romberg, and T. Tao. Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information. 2005.
M. Figueiredo, R. Nowak, and S. Wright. Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems. 2007.
S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An Interior-Point Method for Large-Scale l1-Regularized Least Squares. 2007.
Acknowledgments
The authors thank the Computational Science Program and ARL Grant No. W911NF-07-2-0027 for financial support. The authors also acknowledge the office space provided by the Department of Mathematical Sciences. The authors thank the administrators of the Virgo2 Machine at UTEP, supported by the National Science Foundation under Grant No. CNS-0709438.