Adaptive Sparse Reconstruction



A Stochastic Gradient Approach on Compressive Sensing Signal Reconstruction Based on Adaptive Filtering Framework

Jian Jin, Yuantao Gu, and Shunliang Mei

Abstract—Based on the methodological similarity between sparse signal reconstruction and system identification, a new approach for sparse signal reconstruction in compressive sensing (CS) is proposed in this paper. This approach employs a stochastic gradient-based adaptive filtering framework, which is commonly used in system identification, to solve the sparse signal reconstruction problem. Two typical algorithms for this problem, the $\ell_0$-least mean square ($\ell_0$-LMS) algorithm and the $\ell_0$-exponentially forgetting window LMS ($\ell_0$-EFWLMS) algorithm, are hence introduced here. Both algorithms utilize a zero attraction method, which is implemented by minimizing a continuous approximation of the $\ell_0$ norm of the studied signal. To improve the performance of these proposed algorithms, an $\ell_0$-zero attraction projection ($\ell_0$-ZAP) algorithm is also adopted, which effectively accelerates their convergence rates, making them much faster than the other existing algorithms for this problem. Advantages of the proposed approach, such as its robustness against noise, are demonstrated by numerical experiments.

Index Terms—Adaptive filter, compressive sensing (CS), least mean square (LMS), $\ell_0$ norm, sparse signal reconstruction, stochastic gradient.

    I. INTRODUCTION

    A. Overview of Compressive Sampling

COMPRESSIVE sensing or compressive sampling (CS) [1]–[4] is a novel technique that enables sampling below the Nyquist rate without (or with little) sacrifice of reconstruction quality. It is based on exploiting signal sparsity in some typical domains. A brief review of CS is given here.

For a finite-length, real-valued, 1-D discrete signal $\mathbf{x}$, its representation in the domain $\boldsymbol{\Psi}$ is

$$\mathbf{x} = \boldsymbol{\Psi}\mathbf{s}, \qquad (1)$$

where $\mathbf{x}$ and $\mathbf{s}$ are $N \times 1$ column vectors, and $\boldsymbol{\Psi}$ is an $N \times N$ basis matrix with the basis vectors as its columns.

Manuscript received February 27, 2009; revised August 31, 2009. Current version published March 17, 2010. This work was supported in part by the National Natural Science Foundation of China under Grants NSFC 60872087 and NSFC U0835003. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yonina Eldar.

The authors are with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTSP.2009.2039173

Obviously, $\mathbf{x}$ and $\mathbf{s}$ are equivalent representations of the signal when $\boldsymbol{\Psi}$ has full rank. The signal $\mathbf{x}$ is $K$-sparse if only $K$ out of the $N$ coefficients of $\mathbf{s}$ are nonzero in the domain $\boldsymbol{\Psi}$, and it is sparse if $K \ll N$.

Take $M$ linear, nonadaptive measurements $\mathbf{y}$ of $\mathbf{x}$ through a linear transform $\boldsymbol{\Phi}$, that is,

$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{x} = \boldsymbol{\Phi}\boldsymbol{\Psi}\mathbf{s}, \qquad (2)$$

where $\boldsymbol{\Phi}$ is an $M \times N$ ($M < N$) matrix, and each of its rows can be considered as a basis vector, usually orthogonal. $\mathbf{x}$ is thus transformed, or down-sampled, to an $M \times 1$ vector $\mathbf{y}$.

According to the discussion above, the main tasks of CS are as follows.

To design a stable measurement matrix. It is important to construct a sensing matrix that allows recovery of as many entries of $\mathbf{s}$ as possible from as few measurements as possible. The matrix should satisfy the conditions of incoherence and the restricted isometry property (RIP) [3]. Fortunately, the simple choice of a random matrix for $\boldsymbol{\Phi}$ satisfies these conditions with high probability. Common design methods include Gaussian measurements, binary measurements, Fourier measurements, and incoherent measurements [3]. Gaussian measurements are employed in this work, i.e., the entries of the sensing matrix $\boldsymbol{\Phi}$ are independently sampled from a normal distribution with mean zero and variance $1/M$. When the basis matrix $\boldsymbol{\Psi}$ (wavelet, Fourier, discrete cosine transform (DCT), etc.) is orthogonal, $\boldsymbol{\Phi\Psi}$ is also independent and identically distributed (i.i.d.) with the same distribution [4].

To design a signal reconstruction algorithm. The signal reconstruction algorithm aims to find the sparsest solution to (2), which is ill-conditioned. This will be discussed in detail in the following subsection.

Compressive sensing methods provide a robust framework that can reduce the number of measurements required to estimate a sparse signal. For this reason, CS methods are useful in many areas, such as MR imaging [5] and analog-to-digital conversion [6].
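As an illustration of the Gaussian measurement design described above, a minimal NumPy sketch (hypothetical helper and variable names; it assumes the zero-mean, variance-$1/M$ convention stated above) that draws a sensing matrix and takes the measurements $\mathbf{y} = \boldsymbol{\Phi}\mathbf{x}$ might look like this:

```python
import numpy as np

def gaussian_measurements(x, M, rng):
    """Draw an M x N sensing matrix with i.i.d. N(0, 1/M) entries and
    return it together with the measurement vector y = Phi @ x."""
    N = x.shape[0]
    Phi = rng.normal(loc=0.0, scale=1.0 / np.sqrt(M), size=(M, N))
    return Phi, Phi @ x

# Example: a length-256 signal with 10 nonzero coefficients and 100 measurements.
rng = np.random.default_rng(0)
x = np.zeros(256)
x[rng.choice(256, size=10, replace=False)] = rng.standard_normal(10)
Phi, y = gaussian_measurements(x, M=100, rng=rng)
```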

    B. Signal Reconstruction Algorithms

Although CS is a concept that emerged only recently, searching for the sparse solution to an under-determined system of linear equations (2) has long been of significant importance in signal processing and statistics.


The main idea is to obtain the sparse solution by adding a sparsity constraint. The sparsest solution can be acquired by taking the $\ell_0$ norm into account:

$$\min_{\mathbf{s}} \|\mathbf{s}\|_0 \quad \text{s.t.} \quad \mathbf{y} = \boldsymbol{\Phi\Psi}\mathbf{s}. \qquad (3)$$

Unfortunately, this criterion is not convex, and the computational complexity of optimizing it is NP-hard. To overcome this difficulty, the $\ell_0$ norm has to be replaced by norms that are simpler in terms of computational complexity. For example, the convex $\ell_1$ norm is used:

$$\min_{\mathbf{s}} \|\mathbf{s}\|_1 \quad \text{s.t.} \quad \mathbf{y} = \boldsymbol{\Phi\Psi}\mathbf{s}. \qquad (4)$$

This idea is known as basis pursuit, and it can be recast as a linear programming (LP) problem. A recent body of related research shows that there are conditions guaranteeing a formal equivalence between the $\ell_0$ norm solution and the $\ell_1$ norm solution [1].

In the presence of noise and/or imperfect data, however, it is undesirable to fit the linear system exactly. Instead, the constraint in (4) is relaxed to obtain the basis pursuit de-noise (BPDN) problem

$$\min_{\mathbf{s}} \|\mathbf{s}\|_1 \quad \text{s.t.} \quad \|\mathbf{y} - \boldsymbol{\Phi\Psi}\mathbf{s}\|_2 \le \epsilon, \qquad (5)$$

where the positive parameter $\epsilon$ is an estimate of the noise level in the data. The convex optimization problem (5) is one possible statement of the least-squares problem regularized by the $\ell_1$ norm. In fact, the BPDN label is typically applied to the penalized least-squares problem

$$\min_{\mathbf{s}} \frac{1}{2}\|\mathbf{y} - \boldsymbol{\Phi\Psi}\mathbf{s}\|_2^2 + \lambda \|\mathbf{s}\|_1, \qquad (6)$$

which is proposed by Chen et al. in [7] and [8]. The third formulation,

$$\min_{\mathbf{s}} \|\mathbf{y} - \boldsymbol{\Phi\Psi}\mathbf{s}\|_2 \quad \text{s.t.} \quad \|\mathbf{s}\|_1 \le \tau, \qquad (7)$$

which has an explicit $\ell_1$ norm constraint, is often called the Least Absolute Shrinkage and Selection Operator (LASSO) [9]. The problems (5), (6), and (7) are identical in some situations; the precise relationship among them is discussed in [10], [11].
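The present paper does not rely on these solvers, but as a concrete instance of the iterated-shrinkage family cited below, a minimal ISTA-style sketch for the penalized least-squares problem (6) could look as follows; the step size, the regularization weight, and the name `Theta` (standing for the product of the measurement and basis matrices) are illustrative assumptions:

```python
import numpy as np

def ista(Theta, y, lam, n_iter=500):
    """Iterative shrinkage-thresholding for min 0.5*||y - Theta s||^2 + lam*||s||_1."""
    s = np.zeros(Theta.shape[1])
    # A step size below 1 / ||Theta||_2^2 guarantees convergence of ISTA.
    step = 1.0 / np.linalg.norm(Theta, 2) ** 2
    for _ in range(n_iter):
        grad = Theta.T @ (Theta @ s - y)                           # gradient of the quadratic term
        z = s - step * grad                                        # gradient step
        s = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding
    return s
```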

Many approaches to these problems, along with their variants, have been described in the literature. They mainly fall into two basic categories.

Convex Relaxation: The first kind of convex optimization method to solve problems (5)–(7) includes interior-point (IP) methods [12], [13], which transfer these problems to a convex quadratic program. The standard IP methods cannot handle the large-scale situation. However, many improved IP methods, which exploit fast algorithms for the matrix-vector operations with the measurement matrix and its transpose, can deal with the large-scale situation, as demonstrated in [7] and [14]. High-quality implementations of such IP methods include l1-magic [15] and PDCO [16], which use iterative algorithms, such as the conjugate gradient (CG) or LSQR algorithm [17], to compute the search step. The fastest IP method has recently been proposed to solve (6) and differs from the methods used in the previous works. In this method, called l1_ls, the search operation in each step is done using the preconditioned conjugate gradient (PCG) algorithm, which requires less computation, i.e., only products with the measurement matrix and its transpose [18].

The second kind of convex optimization method to solve problems (5)–(7) includes the homotopy method and its variants. The homotopy method is employed to find the full path of solutions for all nonnegative values of the scalar parameters in the above three problems. When the solution is extremely sparse, the methods described in [19]–[21] can be very fast [22]. Otherwise, the path-following methods are slow, which is often the case for large-scale problems. Other recently developed computational methods include coordinate-wise descent methods [23], the fixed-point continuation method [24], sequential subspace optimization methods [26], bound optimization methods [27], iterated shrinkage methods [28], gradient methods [29], the gradient projection for sparse reconstruction algorithm (GPSR) [11], sparse reconstruction by separable approximation (SpaRSA) [25], and the Bregman iterative method [30], [31]. Some of these methods, such as GPSR, SpaRSA, and the Bregman iterative method, can efficiently handle large-scale problems.

Besides the $\ell_1$ norm, another typical function to represent sparsity is the $\ell_p$ norm with $0 < p < 1$. The resulting problem is non-convex, and thus it is often transferred to a solvable convex problem. Typical methods include the FOCal Under-determined System Solver (FOCUSS) [32] and Iteratively Reweighted Least Squares (IRLS) [33], [34]. Compared with the $\ell_1$ norm-based methods, these methods usually need more computational time.

Greedy Pursuits: Rather than minimizing an objective function globally, these methods make a locally optimal choice after building up an approximation at each step. Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP) [35], [36] are two of the earliest greedy pursuit methods; Stagewise OMP (StOMP) [37] and Regularized OMP [38] then followed as their improved versions. The reconstruction complexity of these algorithms is significantly lower than that of BP methods. However, they require more measurements for perfect reconstruction and may fail to find the sparsest solution in certain scenarios where $\ell_1$ minimization succeeds. More recently, Subspace Pursuit (SP) [39], Compressive Sampling Matching Pursuit (CoSaMP) [40], and the iterative hard thresholding method (IHT) [41] have been proposed by incorporating the idea of backtracking. Theoretically, they offer reconstruction quality comparable to that of LP methods at low reconstruction complexity. However, all of them assume that the sparsity parameter $K$ is known, whereas $K$ may not be available in many practical applications. In addition, all greedy algorithms are more demanding in memory requirements.

    C. Our Work

The convex optimization methods, such as l1_ls and SpaRSA, take all the data of the measurement matrix into account in each iteration, while the greedy pursuits consider one column of the measurement matrix per iteration. In this paper, the adaptive filtering framework, which uses one row of the measurement matrix in each iteration, is applied to signal reconstruction. Moreover, instead of the $\ell_1$ norm, we take one of the approximations of the $\ell_0$ norm, which is widely used in a recent contribution [42], as the sparsity constraint. The authors of [42] give several effective approximations of the $\ell_0$ norm for magnetic resonance image (MRI) reconstruction.


However, their solver for this problem adopts the traditional fixed-point method, which needs much more computational time. Thus, it is hard to apply to large-scale problems, with which our approach can deal effectively.

To the best of our knowledge, this is the first time that the adaptive filtering framework has been employed to solve the CS reconstruction problem. In our approach, two modified stochastic gradient-based adaptive filtering methods are introduced for signal reconstruction, and a novel, improved reconstruction algorithm is proposed in the end.

Because the adaptive filtering framework can be used to solve under-determined equations, the CS reconstruction problem can readily be viewed as a sparse system identification problem by making the appropriate correspondences. Thus, a variant of the least mean square (LMS) algorithm, $\ell_0$-LMS, which imposes a zero attractor on the standard LMS algorithm and performs well in sparse system identification, is introduced for CS signal reconstruction. In order to obtain better performance, the $\ell_0$-exponentially forgetting window LMS ($\ell_0$-EFWLMS) algorithm is also adopted. The convergence of the above two methods may be slow, since the squared-error term and the $\ell_0$ norm term need to be balanced in their cost functions. For faster convergence, a new method named $\ell_0$-zero attraction projection ($\ell_0$-ZAP), with little sacrifice in accuracy, is further proposed. Simulations show that $\ell_0$-LMS, $\ell_0$-EFWLMS, and $\ell_0$-ZAP perform better in solving the CS problem than the other typical algorithms.

The remainder of this paper is organized as follows. In Section II, the adaptive filtering framework is reviewed, and the methodological similarity between sparse system identification and the CS problem is demonstrated. Then $\ell_0$-LMS, $\ell_0$-EFWLMS, and $\ell_0$-ZAP are introduced. The convergence performance of $\ell_0$-LMS is analyzed in Section III. In Section IV, five experiments demonstrate the performance of the three methods in various aspects. Finally, our conclusion is given in Section V.

    II. OUR ALGORITHMS

    A. Adaptive Filtering Framework to Solve CS Problem

Adaptive filtering algorithms are widely used when the exact nature of a system is unknown or its characteristics are time-varying. The estimation error of the adaptive filter output with respect to the desired signal $d(n)$ is denoted by

$$e(n) = d(n) - \mathbf{w}^{\mathrm{T}}(n)\,\mathbf{x}(n), \qquad (8)$$

where $\mathbf{w}(n)$ and $\mathbf{x}(n)$ denote the filter coefficient vector and the input vector, respectively, $n$ is the time instant, and $L$ is the filter length. By minimizing the cost function, the parameters of the unknown system can be identified iteratively.

Recalling the CS problem, one of its requirements is to solve the under-determined equation $\mathbf{y} = \boldsymbol{\Phi}\mathbf{x}$. Suppose that the filter length equals the signal dimension, $L = N$ (9); the filter coefficient vector $\mathbf{w}(n)$ plays the role of the signal estimate $\hat{\mathbf{x}}(n)$ (10); the input vector $\mathbf{x}(n)$ is taken as a row of the measurement matrix $\boldsymbol{\Phi}$ (11); and the desired signal $d(n)$ is the corresponding entry of the measurement vector $\mathbf{y}$ (12).

TABLE I: Correspondences between the variables in the adaptive filter and those in the CS problem.

Fig. 1. Framework of the adaptive filter used to solve the CS reconstruction problem.

The CS reconstruction problem can be regarded as an adaptive system identification problem through the correspondences listed in Table I. Thus, (2) can be solved within the framework of an adaptive filter.

When the above adaptive filtering framework is applied to solve the CS problem, there may not be enough data to train the filter coefficients to convergence. Thus, the rows of $\boldsymbol{\Phi}$ and the corresponding elements of $\mathbf{y}$ are utilized recursively. The procedure is illustrated in Fig. 1. Suppose that $\hat{\mathbf{x}}(n)$ is the updating vector; the detailed update procedure is as follows (a code sketch of this loop is given after the list).

1) Initialize $\hat{\mathbf{x}}(0) = \mathbf{0}$ and $n = 0$.

2) Send the data $\mathbf{x}(n)$ and $d(n)$ to the adaptive filter, where the input vector and the desired signal are taken cyclically from the rows of $\boldsymbol{\Phi}$ and the corresponding entries of $\mathbf{y}$ (13).

3) Use the adaptive algorithm to update $\hat{\mathbf{x}}(n)$.

4) Judge whether the stop condition (14) is satisfied, i.e., whether the error has fallen below a given tolerance or the iteration number has reached a given maximum.

5) When the condition is satisfied, output $\hat{\mathbf{x}}(n)$ as the reconstruction and exit; otherwise increase $n$ by one and go back to Step 2).
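The loop above can be rendered schematically as follows. This is a hedged sketch rather than the authors' code: `adaptive_update` stands for whichever recursion (e.g., (23) or (34)) is plugged in, and the residual-norm test is one plausible reading of the stop condition (14).

```python
import numpy as np

def adaptive_cs_reconstruct(Phi, y, adaptive_update, tol=1e-5, max_iter=100000):
    """Generic adaptive-filtering loop for CS reconstruction: the rows of Phi and
    the entries of y are fed to an adaptive update rule recursively until a
    stop condition (error tolerance or maximum iteration count) is met."""
    M, N = Phi.shape
    x_hat = np.zeros(N)                                 # step 1: initialize the estimate
    for n in range(max_iter):                           # maximum-iteration part of the stop rule
        row = n % M                                     # step 2: rows of Phi are reused cyclically
        e_n = y[row] - Phi[row] @ x_hat                 # instantaneous estimation error, cf. (8)
        x_hat = adaptive_update(x_hat, Phi[row], e_n)   # step 3: plug in (23), (34), ...
        if np.linalg.norm(y - Phi @ x_hat) < tol:       # step 4: error-tolerance check
            break
    return x_hat                                        # step 5: the reconstructed signal
```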

Adaptive filtering methods are well known, and CS has been a popular topic in recent years.


It is therefore surprising that no existing work employs the adaptive filtering structure for the CS reconstruction problem. The reason might be that the aim of CS is to reconstruct a sparse signal, while the solutions of general adaptive filtering algorithms are not sparse. In fact, several LMS variants [43]–[45], with sparsity constraints added to their cost functions, exist for sparse system identification. Thus, these methods can be applied to solve the CS problem.

In the following subsections, the $\ell_0$-LMS algorithm and the idea of zero attraction are introduced first. Then $\ell_0$-EFWLMS, which imposes zero attraction on EFW-LMS, is introduced for better performance. Finally, to speed up the convergence of the two new methods, a novel algorithm, $\ell_0$-ZAP, which adopts zero attraction in the solution space, is further proposed.

B. Based on the $\ell_0$-LMS Algorithm

LMS is the most attractive of all adaptive filtering algorithms because of its simplicity, robustness, and low computational cost. In traditional LMS, the cost function is defined as the squared error

$$\xi(n) = |e(n)|^2. \qquad (15)$$

Consequently, the gradient descent recursion of the filter coefficient vector is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\, \mathbf{x}(n), \qquad (16)$$

where the positive parameter $\mu$ is called the step size.

In order to improve the convergence performance when the unknown parameters are sparse, a new algorithm, $\ell_0$-LMS [43], is proposed by introducing an $\ell_0$ norm penalty into the cost function. The new cost function is defined as

$$\xi(n) = |e(n)|^2 + \gamma\, \|\mathbf{w}(n)\|_0, \qquad (17)$$

where $\gamma$ is a factor to balance the new penalty and the estimation error. Considering that $\ell_0$ norm minimization is an NP-hard problem, the $\ell_0$ norm is generally approximated by a continuous function. A popular approximation [46] is

$$\|\mathbf{w}\|_0 \approx \sum_{i} \left(1 - e^{-\beta |w_i|}\right), \qquad (18)$$

where the two sides of (18) are strictly equal when the parameter $\beta$ approaches infinity. According to (18), the proposed cost function can be rewritten as

$$\xi(n) = |e(n)|^2 + \gamma \sum_{i} \left(1 - e^{-\beta |w_i(n)|}\right). \qquad (19)$$

By minimizing (19), the new gradient descent recursion of the filter coefficients is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\, \mathbf{x}(n) - \kappa\beta\, \mathrm{sgn}(\mathbf{w}(n))\, e^{-\beta |\mathbf{w}(n)|}, \qquad (20)$$

where $\kappa = \mu\gamma$, the absolute value and the exponential are applied component-wise, and $\mathrm{sgn}(\cdot)$ is a component-wise sign function defined as

$$\mathrm{sgn}(x) = \begin{cases} x/|x|, & x \neq 0, \\ 0, & \text{elsewhere.} \end{cases} \qquad (21)$$

To reduce the computational complexity of (20), especially that caused by the last term, the first-order Taylor series expansion of the exponential function is taken into consideration:

$$e^{-\beta|x|} \approx \begin{cases} 1 - \beta|x|, & |x| \le 1/\beta, \\ 0, & \text{elsewhere.} \end{cases} \qquad (22)$$

Note that the approximation in (22) is forced to remain non-negative because the value of the exponential function is always larger than zero. Thus, the final gradient descent recursion of the filter coefficient vector is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\, \mathbf{x}(n) + \kappa\, \mathbf{g}_\beta(\mathbf{w}(n)), \qquad (23)$$

where

$$\mathbf{g}_\beta(\mathbf{w}) = \left[\, g_\beta(w_0),\ g_\beta(w_1),\ \ldots,\ g_\beta(w_{L-1}) \,\right]^{\mathrm{T}} \qquad (24)$$

and

$$g_\beta(x) = \begin{cases} \beta^2 x - \beta\, \mathrm{sgn}(x), & |x| \le 1/\beta, \\ 0, & \text{elsewhere.} \end{cases} \qquad (25)$$

The last term of (23) is called the zero attraction term, which imposes an attraction toward zero on small coefficients; a code sketch of this attractor is given below. Since zero coefficients are the majority in sparse systems, accelerating the convergence of the zero coefficients improves the identification performance. In CS, the zero attraction term ensures the sparsity of the solution.
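As an illustration, the piecewise-linear attractor obtained from the Taylor expansion (22)–(25) can be coded as below; any constant scaling factor is assumed to be absorbed into $\kappa$, so treat the exact coefficients as an assumption rather than the paper's precise definition:

```python
import numpy as np

def zero_attractor(w, beta):
    """Component-wise zero attraction: coefficients with |w_i| <= 1/beta are pulled
    toward zero, while larger coefficients are left untouched."""
    g = np.zeros_like(w)
    small = np.abs(w) <= 1.0 / beta
    g[small] = beta * beta * w[small] - beta * np.sign(w[small])
    return g
```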

By utilizing the correspondences in Table I, the final solution to the CS problem can be obtained; the procedure is summarized as Method 1, and a code sketch follows the listing.

Method 1. $\ell_0$-LMS method for CS

1: Initialize $\hat{\mathbf{x}}(0) = \mathbf{0}$ and $n = 0$; choose $\mu$, $\kappa$, and $\beta$.

2: While stop condition (14) is not satisfied:

3: Determine the input vector $\mathbf{x}(n)$ and the desired signal $d(n)$ as in (13).

4: Calculate the error $e(n)$ as in (8).

5: Update $\hat{\mathbf{x}}(n)$ using the LMS gradient correction of (23).

6: Impose the zero attraction term of (23).


7: Increase the iteration number by one.

8: End while.
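Putting Method 1 together with the attractor sketched earlier, a compact rendering could read as follows; the default parameter values are placeholders, not the settings used in Section IV:

```python
import numpy as np

def l0_lms_cs(Phi, y, mu=0.5, kappa=1e-4, beta=10.0, tol=1e-5, max_iter=200000):
    """Sketch of Method 1: l0-LMS reconstruction of a sparse x from y = Phi x.
    Reuses zero_attractor() from the earlier sketch."""
    M, N = Phi.shape
    x_hat = np.zeros(N)                                       # step 1: initialization
    for n in range(max_iter):                                 # step 2: outer loop
        row = n % M                                           # step 3: cyclic reuse of the rows of Phi
        e_n = y[row] - Phi[row] @ x_hat                       # step 4: estimation error
        x_hat = x_hat + mu * e_n * Phi[row]                   # step 5: LMS gradient correction
        x_hat = x_hat + kappa * zero_attractor(x_hat, beta)   # step 6: zero attraction
        if np.linalg.norm(y - Phi @ x_hat) < tol:             # stop condition (14)
            break
    return x_hat
```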

C. Based on the $\ell_0$-EFWLMS Algorithm

Recursive least squares (RLS) is another popular adaptive filtering algorithm [47], [48], whose cost function is defined as the exponentially weighted sum of the squared error sequence

$$\xi(n) = \sum_{i=1}^{n} \lambda^{\,n-i}\, |e_i(n)|^2, \qquad (26)$$

where $\lambda$ is called the forgetting factor and

$$e_i(n) = d(i) - \mathbf{w}^{\mathrm{T}}(n)\,\mathbf{x}(i). \qquad (27)$$

The RLS algorithm is difficult to implement in CS because it costs a lot of computing resources. However, motivated by RLS, an approximation of its cost function with a shorter sliding window is considered, which suggests a new penalty

$$\xi(n) = \sum_{i=n-L_w+1}^{n} \lambda^{\,n-i}\, |e_i(n)|^2, \qquad (28)$$

where $L_w$ is the length of the sliding window. The algorithm,

which minimizes (28), is called the exponentially forgetting window LMS (EFW-LMS) [49]. The gradient descent recursion of the filter coefficient vector is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, \mathbf{X}(n)\, \boldsymbol{\Lambda}\, \mathbf{e}(n), \qquad (29)$$

where

$$\mathbf{X}(n) = [\,\mathbf{x}(n-L_w+1),\ \ldots,\ \mathbf{x}(n)\,] \qquad (30)$$

collects the last $L_w$ input vectors,

$$\boldsymbol{\Lambda} = \mathrm{diag}\!\left(\lambda^{L_w-1},\ \ldots,\ \lambda,\ 1\right) \qquad (31)$$

is the diagonal matrix of forgetting weights,

$$\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{X}^{\mathrm{T}}(n)\,\mathbf{w}(n) \qquad (32)$$

is the error vector over the window, and

$$\mathbf{d}(n) = [\,d(n-L_w+1),\ \ldots,\ d(n)\,]^{\mathrm{T}} \qquad (33)$$

is the corresponding vector of desired signals.

In order to obtain sparse solutions in the CS problem, zero attraction is employed again. Thereby the final gradient descent recursion of the filter coefficient vector is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, \mathbf{X}(n)\, \boldsymbol{\Lambda}\, \mathbf{e}(n) + \kappa\, \mathbf{g}_\beta(\mathbf{w}(n)). \qquad (34)$$

This algorithm is denoted $\ell_0$-EFWLMS.

The method for solving the CS problem based on $\ell_0$-EFWLMS, utilizing the correspondences in Table I, is summarized in Method 2; a code sketch follows the listing.

Method 2. $\ell_0$-EFWLMS method for CS

1: Initialize $\hat{\mathbf{x}}(0) = \mathbf{0}$ and $n = 0$; choose $\mu$, $\lambda$, $L_w$, $\kappa$, and $\beta$.

2: While stop condition (14) is not satisfied:

3: Determine the input vectors and desired signals of the current window, collecting them as in (30) and (33).

4: Calculate the error vector as in (32).

5: Update $\hat{\mathbf{x}}(n)$ using the EFW-LMS gradient correction of (34).

6: Impose the zero attraction term of (34).

7: Increase the iteration number by one.

8: End while.
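One plausible reading of Method 2 is sketched below; the window handling and the exponential weighting are assumptions made for illustration and may differ in detail from (29)–(33), so treat this only as a sketch:

```python
import numpy as np

def l0_efwlms_cs(Phi, y, mu=0.5, lam=0.9, Lw=5, kappa=1e-4, beta=10.0,
                 tol=1e-5, max_iter=200000):
    """Sketch of Method 2: zero-attracting exponentially forgetting window LMS.
    Each update uses the last Lw (row, measurement) pairs, weighted by powers of lam.
    Reuses zero_attractor() from the earlier sketch."""
    M, N = Phi.shape
    x_hat = np.zeros(N)
    weights = lam ** np.arange(Lw - 1, -1, -1)            # oldest row gets lam**(Lw-1)
    for n in range(max_iter):
        idx = [(n - k) % M for k in range(Lw - 1, -1, -1)]
        X_n = Phi[idx]                                    # Lw x N block of recent rows
        d_n = y[idx]
        e_n = d_n - X_n @ x_hat                           # error vector over the window
        x_hat = x_hat + mu * X_n.T @ (weights * e_n)      # EFW-LMS gradient correction
        x_hat = x_hat + kappa * zero_attractor(x_hat, beta)   # zero attraction, as in (34)
        if np.linalg.norm(y - Phi @ x_hat) < tol:
            break
    return x_hat
```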

D. Based on the $\ell_0$-ZAP Algorithm

The two methods described above, $\ell_0$-LMS and $\ell_0$-EFWLMS, can be considered as solutions to an $\ell_0$-penalized problem. Observing (23) and (34), it is obvious that both gradient descent recursions consist of two parts:

$$\text{new estimate} = \text{old estimate} + \text{gradient correction} + \text{zero attraction}. \qquad (35)$$

The gradient correction term ensures that the measurements are fitted, and the zero attraction term guarantees the sparsity of the solution. Taking both parts into account, the sparse solution can finally be extracted. The updating procedures of the two proposed methods are shown in Fig. 2(a) and (b). However, convergence of the recursions may be slow because the two parts are hard to balance.

According to the discussion above, the CS problem (2) is ill-conditioned, and its solution set is an affine subspace. This implies that the sparse solution can be searched for iteratively within the solution space in order to speed up convergence; that is, the gradient correction term can be omitted. The updating procedure is illustrated in Fig. 2(c), where the initial vector $\hat{\mathbf{x}}(0)$ is taken as the least squares (LS) solution, which belongs to the solution space. Then, in each iteration, only the zero attraction term is used to update the vector. The updated vector is replaced by its projection onto the solution space as soon as it departs from the solution space.


Fig. 2. Updating procedures of the three methods, where $\mathbf{s}$ denotes the original signal and $\mathbf{s}(0)$ denotes the initial value. (a) $\ell_0$-LMS; (b) $\ell_0$-EFWLMS; (c) $\ell_0$-ZAP.

Particularly, suppose $\hat{\mathbf{x}}_t$ is the result obtained after the $t$th zero attraction; its projection vector $\hat{\mathbf{x}}_p$ onto the solution space satisfies

$$\min_{\hat{\mathbf{x}}_p} \|\hat{\mathbf{x}}_p - \hat{\mathbf{x}}_t\|_2 \quad \text{s.t.} \quad \boldsymbol{\Phi}\hat{\mathbf{x}}_p = \mathbf{y}. \qquad (36)$$

The Lagrangian method can be used to solve (36), yielding

$$\hat{\mathbf{x}}_p = \hat{\mathbf{x}}_t + \boldsymbol{\Phi}^{\mathrm{T}}\left(\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}}\right)^{-1}\left(\mathbf{y} - \boldsymbol{\Phi}\hat{\mathbf{x}}_t\right), \qquad (37)$$

where $\boldsymbol{\Phi}^{\mathrm{T}}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}})^{-1}$ is the pseudo-inverse matrix of least squares. This method is called $\ell_0$-Zero Attraction Projection ($\ell_0$-ZAP) and is summarized in Method 3; a code sketch follows the listing.

Method 3. $\ell_0$-ZAP method for CS

1: Initialize $\hat{\mathbf{x}}(0)$ with the LS solution and $n = 0$; choose $\kappa$ and $\beta$.

2: While stop condition (14) is not satisfied:

3: Update $\hat{\mathbf{x}}(n)$ using the zero attraction term.

4: Project $\hat{\mathbf{x}}(n)$ onto the solution space using (37).

5: Increase the iteration number by one.

6: End while.
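A compact sketch of Method 3 follows; the convergence test on successive iterates is an assumption standing in for the stop condition (14):

```python
import numpy as np

def l0_zap_cs(Phi, y, kappa=1e-3, beta=10.0, tol=1e-8, max_iter=5000):
    """Sketch of Method 3: zero attraction followed by projection back onto
    the affine solution space {x : Phi x = y}. Reuses zero_attractor()."""
    pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)                # least-squares pseudo-inverse of Phi
    x_hat = pinv @ y                                         # initialize with the LS solution
    for _ in range(max_iter):
        x_prev = x_hat.copy()
        x_hat = x_hat + kappa * zero_attractor(x_hat, beta)  # zero attraction step
        x_hat = x_hat + pinv @ (y - Phi @ x_hat)             # project back, as in (37)
        if np.linalg.norm(x_hat - x_prev) < tol:
            break
    return x_hat
```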

    E. Discussion

The typical performance of the three proposed methods is briefly discussed here.

Memory requirement: $\ell_0$-LMS and $\ell_0$-EFWLMS need storage for the measurement matrix, the measurements, and the estimate, so both of their storage requirements are on the order of the size of $\boldsymbol{\Phi}$. $\ell_0$-ZAP needs additional storage for, at least, the pseudo-inverse matrix of least squares. For the large-scale situation, $\ell_0$-ZAP therefore requires about twice the memory of the other two algorithms.

Computational complexity: The total computational complexity depends on the number of iterations required and on the complexity of each iteration. First, the complexity of each iteration of these methods is analyzed. For simplicity, the complexity of each period, instead of that of each iteration, is discussed. Here, a period is defined as one pass in which all the data in the matrix $\boldsymbol{\Phi}$ have been used once. For example, in one period, (23) is iterated $M$ times in $\ell_0$-LMS, and the projection is used once in $\ell_0$-ZAP. For each period, the complexity of the three methods is listed in Table II; the per-period complexities are ordered as in (38).

Second, the number of periods required by these methods is discussed. It is impossible to accurately predict the number of periods the three proposed methods require to find an approximate solution. However, according to the above discussion, the numbers of periods always satisfy the ordering in (39).

Thus, taking both (38) and (39) into consideration, $\ell_0$-ZAP has significantly lower computational complexity than $\ell_0$-LMS and $\ell_0$-EFWLMS. Because $\ell_0$-LMS has lower complexity per period but requires a larger number of periods than $\ell_0$-EFWLMS, a comparison between $\ell_0$-LMS and $\ell_0$-EFWLMS is hard to make.

De-noise performance: $\ell_0$-LMS and $\ell_0$-EFWLMS inherit

the merit of the LMS algorithm of having good de-noise performance. For $\ell_0$-ZAP, the measurements in the noisy case are described by (40), i.e., the observation contains an additive noise term, so the iterative vector is not projected onto the true solution set but onto the solution space perturbed by the additive noise. However, we have the bound (41), whose proof is given in Appendix A. Equation (41) shows that the power of the projected noise is far smaller than that of the measurement noise, since $M < N$. Moreover, the dimension of the measurement vector (i.e., $M$) is far smaller than that of the signal (i.e., $N$). Therefore, $\ell_0$-ZAP also has good de-noise performance.

Implementation difficulty: $\ell_0$-ZAP needs only two parameters, $\kappa$ and $\beta$.


TABLE II: Computational complexity of the different methods in each period.

Note: the computations of the zero attraction are not included in the multiplication and addition counts above.

In $\ell_0$-LMS and $\ell_0$-EFWLMS, there is another parameter, the step size $\mu$, to be chosen. Thus, $\ell_0$-ZAP is easier to tune than the other two algorithms.

    F. Some Comments

Comment 1: Besides the proposed $\ell_0$-LMS and $\ell_0$-EFWLMS, the idea of zero attraction can readily be adopted to improve most LMS variants, e.g., normalized LMS (NLMS), which may be more attractive than LMS because of its robustness. The gradient descent recursion of the filter coefficient vector of $\ell_0$-NLMS is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, \frac{e(n)\,\mathbf{x}(n)}{\delta + \mathbf{x}^{\mathrm{T}}(n)\,\mathbf{x}(n)} + \kappa\, \mathbf{g}_\beta(\mathbf{w}(n)), \qquad (42)$$

where $\delta$ is the regularization parameter. These variants can also improve the performance in sparse signal reconstruction.
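For illustration, a single zero-attracting NLMS step of the form (42) could be written as follows; the parameter values are placeholders, and `zero_attractor()` is the sketch given earlier:

```python
import numpy as np

def l0_nlms_update(w, x_n, e_n, mu=0.8, delta=1e-6, kappa=1e-4, beta=10.0):
    """Single zero-attracting NLMS step: normalized gradient correction plus
    the same zero-attraction term used in l0-LMS, cf. (42)."""
    w = w + mu * e_n * x_n / (delta + x_n @ x_n)   # normalized LMS correction
    w = w + kappa * zero_attractor(w, beta)        # zero attraction
    return w
```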

Comment 2: Equation (18) is only one of many approximations of the $\ell_0$ norm. In fact, many other continuous functions can be used for zero attraction. For example, an approximation suggested by Weston et al. [46] is given in (43), where a small positive number controls the tightness of the approximation. By minimizing (43), the corresponding zero attraction term (44) is obtained, with its component-wise function defined in (45). This zero attraction term can also be used in the proposed $\ell_0$-LMS, $\ell_0$-EFWLMS, and $\ell_0$-ZAP.

    III. CONVERGENCE ANALYSIS

In this section, we analyze the convergence performance of $\ell_0$-LMS. The steady-state mean square deviation between the original signal and the reconstructed signal is analyzed, and the bound on the step size that guarantees convergence is deduced.

Theorem 1: Suppose that $\mathbf{x}$ is the original signal and $\hat{\mathbf{x}}$ is the signal reconstructed by $\ell_0$-LMS. The final mean square deviation in the steady state is given by (46), where the quantities involved are defined in (47)–(50) and include the power of the measurement noise. At the same time, in order to guarantee convergence, the step size $\mu$ should satisfy the bound in (51).

The proof of the theorem is postponed to Appendix B.

As shown in Theorem 1, the final deviation is proportional to $\kappa$ and to the power of the measurement noise. Thus, a large $\kappa$ results in a large deviation; however, a small $\kappa$ means a weak zero attraction, which induces slower convergence. Therefore, the parameter $\kappa$ is determined by a tradeoff between convergence rate and reconstruction quality in particular applications.

By (78) and (79) in Appendix B, we have the following corollary.

Corollary 1: The upper bound on the deviation is given by (52).

This upper bound is a constant for a given signal; thus, it can be regarded as a rough criterion for choosing the parameters.

IV. EXPERIMENTAL RESULTS

The performance of the three presented methods is experimentally verified and compared with typical CS reconstruction algorithms: BP [1], SpaRSA [25], GPSR-BB [11], l1_ls [18], the Bregman iterative algorithm based on FPC (FPC_AS) [31], IRLS [33], and OMP [36]. In the following experiments, these algorithms are tested with the parameters recommended by their respective authors. The entries of the sensing matrix $\boldsymbol{\Phi}$ are independently generated from a normal distribution with mean zero and variance $1/M$. The locations of the nonzero coefficients of the sparse signal $\mathbf{x}$ are randomly chosen with a uniform distribution. The corresponding nonzero coefficients are Gaussian with mean zero and unit variance. Finally, the sparse signal is normalized. The measurements are generated by the following noisy model:

$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{x} + \mathbf{v}, \qquad (53)$$

where $\mathbf{v}$ is an additive white Gaussian noise with covariance matrix $\sigma_n^2 \mathbf{I}$ ($\mathbf{I}$ is an identity matrix). A sketch of this test setup is given below.
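For concreteness, the test setup described above can be sketched as follows; the sizes N, M, K and the noise level used here are illustrative placeholders, not necessarily the values used in the experiments:

```python
import numpy as np

def make_test_problem(N=500, M=250, K=25, sigma_n=0.0, seed=1):
    """Generate a normalized K-sparse signal, a Gaussian sensing matrix, and noisy
    measurements following the model y = Phi x + v of (53)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)      # uniformly random support
    x[support] = rng.standard_normal(K)                 # Gaussian nonzero coefficients
    x /= np.linalg.norm(x)                              # normalize the sparse signal
    Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
    v = sigma_n * rng.standard_normal(M)                # additive white Gaussian noise
    return x, Phi, Phi @ x + v

def msd(x_true, x_hat):
    """Mean square deviation between the original and reconstructed signals."""
    return float(np.mean((x_true - x_hat) ** 2))
```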

The parameters in the stop condition (14) are fixed in advance: one value is common to all three methods, another applies to $\ell_0$-LMS and $\ell_0$-EFWLMS, and another to $\ell_0$-ZAP.

Experiment 1. Algorithm Performance: In this experiment, the performance of the three proposed methods in solving the CS problem is tested.


    Fig. 3. Reconstruction result of the three proposed methods.

    Fig. 4. Convergence performances of the three proposed methods.

The parameters used for the signal model (53) are fixed, and the parameters of the three methods ($\mu$, $\kappa$, and $\beta$ for $\ell_0$-LMS; additionally the forgetting factor and window length for $\ell_0$-EFWLMS; $\kappa$ and $\beta$ for $\ell_0$-ZAP) are chosen accordingly.

The original signal and the estimates obtained with $\ell_0$-LMS, $\ell_0$-EFWLMS, and $\ell_0$-ZAP are shown in Fig. 3. It can be seen that all three proposed methods reconstruct the original signal. The convergence curves of the three methods are shown in Fig. 4, where MSD denotes the mean square deviation. For $\ell_0$-LMS and $\ell_0$-EFWLMS, all data of the matrix $\boldsymbol{\Phi}$ are used once in each iteration (note that the stop condition is not used here). As can be seen in Fig. 4, $\ell_0$-EFWLMS has the smallest MSD after convergence, and $\ell_0$-ZAP achieves the fastest convergence at some sacrifice in reconstruction quality.

To compare with the other algorithms, CPU time is used as an index of complexity, although it gives only a rough estimate. Our simulations are performed in a MATLAB 7.4 environment using an Intel T8300 2.4-GHz processor with 2 GB of memory, under the Microsoft Windows XP operating system.

TABLE III: CPU time and MSD.

    Fig. 5. Probability of exact reconstruction versus sparsity K .

The final average CPU time (over ten runs, in seconds) and MSD are listed in Table III. Here, the parameter in IRLS is fixed to a constant value. It can be seen that the three proposed methods have the smallest MSD. In addition, $\ell_0$-ZAP is the fastest among the listed algorithms, though $\ell_0$-LMS and $\ell_0$-EFWLMS have no significant advantage in speed over the other algorithms.

Experiment 2. Effect of Sparsity on the Performance: This experiment explores the following question: with the proposed methods, how sparse should a source vector be for its estimation to be possible under a given number of measurements? The parameters are the same as in the first experiment except that the noise variance is zero. Different sparsities $K$ are chosen from 10 to 80. For each $K$, 200 simulations are conducted to calculate the probability of exact reconstruction for the different algorithms. The results for all seven algorithms are shown in Fig. 5. As can be seen, the performance of the three proposed methods far exceeds that of the other algorithms. While all the other algorithms fail when the sparsity is larger than 40, the three proposed methods succeed until the sparsity reaches 45. In addition, the three proposed methods have similarly good performance.

Experiment 3. Effect of Number of Measurements on the Performance: This experiment investigates the probability of exact recovery for different numbers of measurements and a fixed signal sparsity.


    Fig. 6. Probability of exact reconstruction versus measurement number M .

The same setup as in the first experiment is used except that the noise variance is zero. Different numbers of measurements $M$ are chosen from 140 to 320. All the algorithms are repeated 200 times for each value of $M$, and the probability curves are shown in Fig. 6. Again, it can be seen that the three proposed methods have the best performance. While all the other algorithms fail when the measurement number $M$ is lower than 230, the three proposed methods can still reconstruct the original signal exactly until $M$ drops to 220. Meanwhile, the proposed algorithms have comparably good performance.

Experiment 4. Robustness Against Noise: The fourth experiment tests the effect of the signal-to-noise ratio (SNR) on reconstruction performance, where the SNR is defined as the ratio of the signal power to the noise power. The parameters are the same as in the first experiment, and the SNR is chosen from 4 to 32 dB. For each SNR, all the algorithms are repeated 200 times to calculate the MSD. Fig. 7 shows that the three new methods perform better than the other traditional algorithms at all SNRs. At the same SNR, the proposed algorithms achieve smaller MSDs. In addition, $\ell_0$-EFWLMS has the smallest MSD and $\ell_0$-ZAP the largest MSD among the three new methods. Obviously, these results are consistent with the discussion in the previous sections.

Experiment 5. Effect of Parameter $\mu$ on the Performance of $\ell_0$-LMS: In this experiment, the condition (51) on the step size $\mu$ that guarantees the convergence of $\ell_0$-LMS is verified. The setup of this experiment is the same as that of the first experiment except for the number of measurements $M$. For each $M$, 100 simulations are conducted to calculate the probability of exact reconstruction using $\ell_0$-LMS with fixed $\kappa$ and $\beta$ and different step sizes $\mu$ (from 0.3 to 1.1). Fig. 8 demonstrates that exact reconstruction cannot be achieved beyond certain values of $\mu$ for the respective values of $M$, which are consistent with the values calculated from condition (51). This result verifies our derivation in Theorem 1.

    Fig. 7. Reconstruction MSD versus SNR.

Fig. 8. Probability of exact reconstruction of $\ell_0$-LMS versus $\mu$ with different $M$.

    V. CONCLUSION

The adaptive filtering framework is introduced for the first time to solve the CS problem. Two typical adaptive filtering algorithms, $\ell_0$-LMS and $\ell_0$-EFWLMS, both imposing the zero attraction method, are introduced to solve the CS problem as well as to verify our framework. In order to speed up the convergence of the two methods, a novel algorithm, $\ell_0$-ZAP, which adopts the zero attraction method in the solution space, is further proposed. In addition, the steady-state mean square deviation of $\ell_0$-LMS has been derived. The performance of these methods has been studied experimentally. Compared with existing typical algorithms, they can reconstruct signals with more nonzero coefficients under a given number of measurements, while under a given sparsity, fewer measurements are required by these algorithms. Moreover, they are more robust against noise.


Up to now, there is no theoretical result for determining how to choose the parameters of the proposed algorithms or how many measurements are needed in the context of the RIP. These remain open problems for our future work. In addition, our future work includes a detailed discussion of the convergence performance of $\ell_0$-EFWLMS and $\ell_0$-ZAP.

APPENDIX A
PROOF OF (41)

Proof: The power of the projected noise is

    (54)

where the last equality in (54) holds because the noise and the measurement matrix are independent. Suppose

    (55)

As mentioned in Section I, the entries of the measurement matrix are i.i.d. with distribution $\mathcal{N}(0, 1/M)$. Let

    (56)

    Thus, for the diagonal components

    (57)

Since $N$ is very large in CS, according to the central limit theorem [50], the following equation holds approximately:

    (58)

where $\mathrm{Var}(\cdot)$ denotes the variance. Similarly, for the non-diagonal components,

    (59)

Consequently, we have

    (60)

    Thus,

    (61)

    Therefore, (54) can be simplified as

    (62)

APPENDIX B
PROOF OF THEOREM 1

Proof: For simplicity, we write $\mathbf{w}$, $\mathbf{x}$, and $e$ instead of $\mathbf{w}(n)$, $\mathbf{x}(n)$, and $e(n)$, respectively. Suppose that $\mathbf{w}^{\ast}$ is the Wiener solution; thus

    (63)

where $v$ is the measurement noise with zero mean. Define the misalignment vector as

    (64)

    Thus, we have

    (65)

    Equation (23) is equivalent to

(66)

Postmultiplying both sides of (66) by their respective transposes gives

    (67)

    Let

    (68)

denote the second-moment matrix of the coefficient misalignment vector. Taking expectations on both sides of (67) and using the independence assumption [48], we obtain

    (69)

    where

    (70)

    is the input correlation matrix,

    (71)

is the minimum mean-squared estimation error, and $\mathrm{tr}(\cdot)$ denotes the trace.

As mentioned in Section I, the input is i.i.d. Gaussian with mean zero and variance $1/M$. Then

    (72)

  • 8/2/2019 Adaptive Sparse Reconstruction

    11/12

    JIN et al.: A STOCHASTIC GRADIENT APPROACH ON COMPRESSIVE SENSING SIGNAL RECONSTRUCTION 419

    Therefore, (69) can be simplified as

    (73)

    Let

    (74)

Taking the trace on both sides of (73) gives

    (75)

    where

    (76)

    (77)

Note that both of these quantities are bounded:

    (78)

    (79)

Therefore, the following condition should be satisfied to guarantee the convergence of (75):

    (80)

    We have

    (81)

The final mean square deviation in the steady state is

(82)

where

    (83)

    (84)

    (85)

    ACKNOWLEDGMENT

The authors would like to thank D. Mao of the University of British Columbia for his help in improving the English expression of this paper. The authors would also like to thank the anonymous reviewers for their valuable comments.

    REFERENCES

[1] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[2] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[3] E. Candès, "Compressive sampling," in Proc. Int. Congr. Math., Madrid, Spain, 2006, vol. 3, pp. 1433–1452.

[4] R. G. Baraniuk, "Compressive sensing," IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–122, Jul. 2007.

[5] M. Lustig, D. Donoho, and J. Pauly, "Sparse MRI: The application of compressed sensing for rapid MR imaging," Magn. Resonance Med., vol. 58, no. 6, pp. 1182–1195, Dec. 2007.

[6] S. Kirolos, J. Laska, M. Wakin et al., "Analog-to-digital information conversion via random demodulation," in Proc. IEEE Dallas Circuits Syst. Conf., 2006, pp. 71–74.

[7] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.

[8] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Rev., vol. 43, pp. 129–159, 2001.

[9] R. Tibshirani, "Regression shrinkage and selection via the Lasso," J. R. Statist. Soc. B, vol. 58, pp. 267–288, 1996.

[10] E. van den Berg and M. P. Friedlander, "In pursuit of a root," Dept. of Comput. Sci., Univ. of British Columbia, Tech. Rep. TR-2007-19, Jun. 2007.

[11] M. Figueiredo, R. Nowak, and S. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 586–598, Dec. 2007.

[12] Y. Nesterov and A. Nemirovsky, "Interior-point polynomial methods in convex programming," in Studies in Applied Mathematics. Philadelphia, PA: SIAM, 1994, vol. 13.

[13] D. Luenberger, Linear and Nonlinear Programming, 2nd ed. Reading, MA: Addison-Wesley, 1984.

[14] C. Johnson, J. Seidel, and A. Sofer, "Interior point methodology for 3-D PET reconstruction," IEEE Trans. Med. Imag., vol. 19, no. 4, pp. 271–285, Apr. 2000.

[15] E. Candès and J. Romberg, "l1-Magic: A collection of MATLAB routines for solving the convex optimization programs central to compressive sampling," 2006 [Online]. Available: www.acm.caltech.edu/l1magic/

[16] M. Saunders, "PDCO: Primal-dual interior method for convex objectives," 2002 [Online]. Available: http://www.stanford.edu/group/SOL/software/pdco.html

[17] C. Paige and M. Saunders, "LSQR: An algorithm for sparse linear equations and sparse least squares," ACM Trans. Math. Software, vol. 8, no. 1, pp. 43–71, 1982.


[18] S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "A method for large-scale l1-regularized least squares problems with applications in signal processing and statistics," Dept. of Elect. Eng., Stanford Univ., Stanford, CA, Tech. Rep., 2007 [Online]. Available: www.stanford.edu/~boyd/l1_ls.html

[19] M. Osborne, B. Presnell, and B. Turlach, "A new approach to variable selection in least squares problems," IMA J. Numer. Anal., vol. 20, pp. 389–403, 2000.

[20] B. Turlach, "On algorithms for solving least squares problems under an L1 penalty or an L1 constraint," in Proc. Amer. Statist. Assoc.; Statist. Comput. Sect., Alexandria, VA, 2005, pp. 2572–2577.

[21] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Ann. Statist., vol. 32, pp. 407–499, 2004.

[22] D. Donoho and Y. Tsaig, "Fast solution of l1-norm minimization problems when the solution may be sparse," 2006 [Online]. Available: http://www.stanford.edu/

[23] J. Friedman, T. Hastie, and R. Tibshirani, "Pathwise coordinate optimization," 2007 [Online]. Available: www-stat.stanford.edu/hastie/pub.htm

[24] E. Hale, W. Yin, and Y. Zhang, "A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing," 2007 [Online]. Available: http://www.dsp.ece.rice.edu/cs/

[25] S. Wright, R. Nowak, and M. Figueiredo, "Sparse reconstruction by separable approximation," in Proc. ICASSP'08, 2008, pp. 3373–3376.

[26] G. Narkiss and M. Zibulevsky, "Sequential subspace optimization method for large-scale unconstrained problems," The Technion, Haifa, Israel, Tech. Rep. CCIT No. 559, 2005.

[27] M. Figueiredo and R. Nowak, "A bound optimization approach to wavelet-based image deconvolution," in Proc. IEEE Int. Conf. Image Process. (ICIP), 2005, pp. 782–785.

[28] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Commun. Pure Appl. Math., vol. 57, pp. 1413–1541, 2004.

[29] Y. Nesterov, "Gradient methods for minimizing composite objective function," CORE Discussion Paper 2007/76 [Online]. Available: http://www.optimization-online.org/DB_HTML/2007/09/1784.html

[30] J. F. Cai, S. Osher, and Z. Shen, "Linearized Bregman iterations for compressive sensing," Math. Comput., vol. 78, pp. 1515–1536, Oct. 2008.

[31] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, "Bregman iterative algorithms for l1-minimization with applications to compressive sensing," SIAM J. Imaging Sci., vol. 1, no. 1, pp. 143–168, 2008.

[32] I. F. Gorodnitsky and B. D. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, Mar. 1997.

[33] R. Chartrand and W. Yin, "Iteratively reweighted algorithms for compressive sensing," in Proc. ICASSP, Apr. 2008, pp. 3869–3872.

[34] I. Daubechies, R. DeVore, M. Fornasier et al., "Iteratively re-weighted least squares minimization for sparse recovery," Commun. Pure Appl. Math., vol. 63, no. 1, pp. 1–38, 2010.

[35] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in Proc. 27th Annu. Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 1993, vol. 1, pp. 40–44.

[36] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[37] D. L. Donoho, Y. Tsaig, and J.-L. Starck, "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit," Tech. Rep., Mar. 2006.

[38] D. Needell and R. Vershynin, "Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit," 2007 [Online]. Available: http://www-stat.stanford.edu/~dneedell/papers/ROMP-stability.pdf

[39] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing: Closing the gap between performance and complexity," arXiv:0803.0811v3 [cs.NA], Jan. 2009.

[40] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmonic Anal., vol. 26, no. 3, pp. 301–321, 2009.

[41] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Appl. Comput. Harmonic Anal., vol. 27, no. 3, pp. 265–274, 2009.

[42] J. Trzasko and A. Manduca, "Highly undersampled magnetic resonance image reconstruction via homotopic l0-minimization," IEEE Trans. Med. Imag., vol. 28, no. 1, Jan. 2009.

[43] Y. Gu, J. Jin, and S. Mei, "l0 norm constraint LMS algorithm for sparse system identification," IEEE Signal Process. Lett., vol. 16, no. 9, pp. 774–777, Sep. 2009.

[44] J. Benesty and S. L. Gay, "An improved PNLMS algorithm," in Proc. IEEE ICASSP, 2002, pp. II-1881–II-1884.

[45] R. K. Martin, W. A. Sethares, et al., "Exploiting sparsity in adaptive filters," IEEE Trans. Signal Process., vol. 50, no. 8, pp. 1883–1894, Aug. 2002.

[46] J. Weston, A. Elisseeff, B. Schölkopf, et al., "Use of the zero-norm with linear models and kernel methods," J. Mach. Learn. Res., Special Issue on Variable and Feature Selection, pp. 1439–1461, 2002.

[47] C. F. N. Cowan and P. M. Grant, Adaptive Filters. Englewood Cliffs, NJ: Prentice-Hall, 1985.

[48] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, 1986.

[49] G. Glentis, K. Berberidis, and S. Theodoridis, "Efficient least squares adaptive algorithms for FIR transversal filtering," IEEE Signal Process. Mag., vol. 16, no. 4, pp. 13–41, Jul. 1999.

[50] O. Kallenberg, Foundations of Modern Probability, 2nd ed. New York: Springer, 2002.

Jian Jin was born in Zhejiang, China, in 1984. He received the B.Eng. degree in electronic engineering from Tsinghua University, Beijing, China, in 2007. He is currently pursuing the Ph.D. degree in electronic engineering at Tsinghua University.

His research interests include signal processing and compressive sensing.

Yuantao Gu received the B.S. degree from Xi'an Jiaotong University, Xi'an, China, in 1998, and the Ph.D. degree (with honors) from Tsinghua University, Beijing, China, in 2003, both in electronic engineering.

He joined the faculty of Tsinghua University in 2003 and is currently an Associate Professor in the Department of Electronic Engineering. He has published more than 30 papers on adaptive filtering, echo cancellation, and visual object tracking. Currently, his research interests include compressive sampling and related topics, coding theory and protocols for reliable multicast distribution, and multimedia services over wireless ad hoc networks.

Shunliang Mei received the B.Eng. degree in computer engineering (automatic control) from Tsinghua University, Beijing, China, in 1970.

He is currently a Professor and Ph.D. supervisor in electronic engineering at the same university. He has completed more than ten national key research projects in digital microwave and communication systems over an academic career of more than 30 years. He has published more than 100 papers on a variety of topics in international conferences and journals. He has supervised over 80 M.S. and Ph.D. students. Currently, his research interests include next-generation wireless communication systems and broadband wireless access.

    Mr. Mei received more than ten national awards in technology.