



Sequential Bayesian Sparse Signal Reconstruction Using Array Data

Christoph F. Mecklenbräuker, Senior Member, IEEE, Peter Gerstoft, Member, IEEE, Ashkan Panahi, Student Member, IEEE, and Mats Viberg, Fellow, IEEE

Abstract—In this paper, the sequential reconstruction of source waveforms under a sparsity constraint is considered from a Bayesian perspective. Let the wave field which is observed by a sensor array be caused by a spatially-sparse set of sources. A spatially weighted Laplace-like prior is assumed for the source field and the corresponding weighted Least Absolute Shrinkage and Selection Operator (LASSO) cost function is derived. After the weighted LASSO solution has been calculated as the maximum a posteriori estimate at time step k, the posterior distribution of the source amplitudes is analytically approximated. The weighting of the Laplace-like prior for time step k+1 is then fitted to the approximated posterior distribution. This results in a sequential update for the LASSO weights. Thus, a sequence of weighted LASSO problems is solved for estimating the temporal evolution of a sparse source field. The method is evaluated numerically using a uniform linear array in simulations and applied to data which were acquired from a towed horizontal array during the long range acoustic communications experiment.

Index Terms—sequential estimation, Bayesian estimation, sparsity, weighted LASSO.

I. INTRODUCTION

Compressive sensing and sparse reconstruction are widely used, but their sequential use has received little attention. In a Bayesian framework, the sparseness constraint is obtained using a Laplace-like prior, which gives rise to the $\ell_1$ constraint. In the present contribution the focus is on developing a Bayesian sequential sparse approach by enforcing the prior at each step to be Laplace-like.

Since the late nineties, numerous papers have appeared in the areas of compressive sensing and sparse reconstruction which employ penalization by an $\ell_1$-norm to promote sparsity of the solution, e.g. [1], [2], [3], [4], [5], [6]. Excellent books on the topic are available [7], [8]. The least absolute shrinkage and selection operator (LASSO) is a least squares regression constrained by an $\ell_1$-bound on the solution [1]. It is well known that the LASSO sets many regression coefficients to zero, favoring sparse solutions.

Manuscript received March 18, 2013; revised July 31, 2013; accepted September 12, 2013. Date of publication XXX YY 2013. The associate editor coordinating the review was Dr. Maria Sabrina Greco. The work was funded by Grant ICT08-44 of Wiener Wissenschafts-, Forschungs- und Technologiefonds, National Science Foundation EAR-0944109, and the Swedish Research Council (VR).

Christoph F. Mecklenbräuker is with the Institute of Telecommunications, Vienna University of Technology, 1040 Vienna, Austria, [email protected]

Peter Gerstoft is with the University of California San Diego, La Jolla, CA 92093-0238, USA, http://www.mpl.ucsd.edu/people/pgerstoft

Ashkan Panahi and Mats Viberg are with Signals and Systems, Chalmers University of Technology, Göteborg, Sweden, [email protected]

Digital Object Identifier: 10.1109/TSP.2013.ZZZZZ

Research on compressive sensing and sparse reconstruction has focused largely on batch processing. However, many practical settings require the online estimation of sparse signals from noisy data samples that become available sequentially [9], [10], [11]. Angelosante et al. proposed time-recursive algorithms for the estimation of time-varying sparse signals from observations which obey a linear model and become available sequentially in time [10]. Their time-weighted LASSO is an $\ell_1$-norm regularized Recursive Least Squares algorithm. Eiwen et al. [11] proposed a sequential compressive channel estimator which is based on modified compressed sensing, which assumes that a part of the signal support is known a priori. The resulting compressive channel tracker uses a modified version of the orthogonal matching pursuit algorithm. An iterative pursuit algorithm is developed in [12] that uses sequential predictions for dynamic compressive sensing. It incorporates prior information using linear MMSE reconstruction.

Here, we extend the Bayesian approach [13], [14], [15] to sequential Maximum A Posteriori (MAP) estimation for sparse signal reconstruction in a problem setting that is similar to [10]. First, the LASSO cost function is generalized by weighting the regularization parameters. This allows the prior to be updated adaptively based on the past history of observations. Based on ideas in [16], we approximate a sequential MAP filter which preserves sparsity in the source field estimates. Two types of approximations are derived in Secs. III-D and III-E and their behavior is compared in numerical examples. The filter is implemented by sequentially minimizing adaptively weighted LASSO cost functions using a single new measurement snapshot in each time step.

The theory is formulated so that it can be applied to sparse source field estimation in higher spatial dimensions. We have earlier demonstrated the sequential approach for a two-dimensional array, where an earthquake rupture process evolves sequentially with sparse locations in both time and two-dimensional space [17]. High resolution earthquake location has been obtained using compressive sensing [18].

The advantages of the sparse formulation for source localization and direction of arrival (DOA) estimation are:
• Numerical simulations indicate that it is a high-resolution method [3].
• There is no need to estimate the cross-spectral density matrix. Only one snapshot is needed. Eigenvalue based high-resolution methods require a full or nearly full Cross Spectral Density Matrix (CSDM).
• As new information becomes available, the confidence in the solution is built up and the trajectories are updated.
• The solution is presented in terms of a sparsifying Laplacian distribution which can then be used as a prior in the sequential processing. Relative to more general solutions like particle filters, the solution requires just one parameter per possible source location.

II. ARRAY DATA MODEL AND PROBLEM FORMULATION

Let $\{r_1, \ldots, r_M\}$ be a set of $M$ hypothetical source locations on a given fixed grid. Further, let $x_k = (x_{k1}, \ldots, x_{kM})^T$ be the associated complex-valued source vector at time step $k = 1, 2, \ldots$ and frequency $\omega$. We observe time-sampled waveforms on an array of $N$ sensors which are stacked in a vector. We obtain the following linear model which relates the short-time Fourier transformed sensor array output $y_k$ at time step $k$ and at frequency $\omega$ to the source vector $x_k$,
$$ y_k = A x_k + n_k . \qquad (1) $$

The $m$th column of the transfer matrix $A$ is the array steering vector $a_m$ for hypothetical source location $r_m$. The transfer matrix $A$ is constructed by sampling all possible source locations, but only very few of these can correspond to real sources. The number of hypothetical source locations $M$ is much larger than the number of sensors $N$, i.e. $N \ll M$, so the dimension of $A$ is $N \times M$, the linear model (1) is underdetermined, and $x_k$ is sparse.

We model the $nm$th element of $A$ by $e^{-j\omega\tau_{nm}}$. Here $\tau_{nm}$ is the traveltime from hypothetical location $r_m$ to the $n$th sensor element of the sensor array. The additive noise vector $n_k$ is assumed to be spatially uncorrelated and follows the zero-mean complex normal distribution with diagonal covariance matrix $\sigma^2 I$. Further, the noise is assumed to be uncorrelated across time and frequency.
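To make the data model (1) concrete, the following Python sketch (illustrative only, not the authors' code) builds a steering matrix for a half-wavelength uniform linear array, as used later in Sec. VI, and generates one noisy snapshot; the grid size, source indices, and powers are arbitrary example values.

```python
import numpy as np

# Example dimensions (illustrative): N sensors, M hypothetical DOAs on a fixed grid
N, M = 64, 180
theta = np.arange(M) * np.pi / M                # grid of DOAs in radians, cf. (50)

# Steering matrix A with elements exp(-j*omega*tau_nm); for a half-wavelength
# uniform linear array, omega*tau_nm = pi*(n-1)*cos(theta_m)
n = np.arange(N)[:, None]
A = np.exp(-1j * np.pi * n * np.cos(theta)[None, :])   # N x M transfer matrix

# Sparse source vector x_k: a few active grid points with complex amplitudes
rng = np.random.default_rng(0)
x_k = np.zeros(M, dtype=complex)
x_k[[45, 76, 120]] = 10 ** (np.array([10.0, 5.0, 12.0]) / 20.0)   # example powers (dB)

# One snapshot y_k = A x_k + n_k with zero-mean complex white noise, covariance sigma^2*I
sigma2 = 1.0
n_k = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y_k = A @ x_k + n_k
```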

The history of all previous array observations is summarized in $Y_{k-1} = (y_1, \ldots, y_{k-2}, y_{k-1})$. Given the history $Y_{k-1}$ and the new data $y_k$, we seek the maximum a posteriori (MAP) source estimate $\hat{x}_k$ under a sparsity constraint. To reconstruct a physically meaningful source vector $x_k$, we enforce its sparsity through the introduction of a prior probability density on $x_k$ which promotes sparsity. A widely used sparseness prior is the Laplace probability density [14], [19], which puts a higher probability mass than the normal distribution on both large and small absolute deviations.

The prior distribution for each $x_{km}$ is assumed to follow a Laplace distribution with hyperparameter $\lambda_{km}$, condensed in the vector $\lambda_k = (\lambda_{k1}, \ldots, \lambda_{kM})^T$. A time-recursive expression of the sparse MAP source estimate at step $k$ is obtained using a sequential Bayesian sparse update rule
$$ (\hat{x}_k, \lambda_{k+1}) = \varphi(y_k, \lambda_k) \qquad (2) $$
where the function $\varphi$ will be derived in the following. In addition to $\hat{x}_k$, the hyperparameter $\lambda_{k+1}$ for the prior distribution for the next time step $k+1$ is computed. Due to the sparsity constraint it is necessary to let the hyperparameters carry the sequential information from each step, and not the extracted parameters as is customary in sequential filtering [20].

Since the conditional distribution for $y_k$ given $x_k$ follows a Gaussian (3), it is not given that the posterior will be Laplace-like. As the posterior for time step $k$ is used as a prior in the next time step $k+1$, the posterior must be approximated by a Laplace-like distribution. The hyperparameter $\lambda_{k+1}$ is determined by matching either the probabilities for large source amplitudes (i.e., the tail of the distribution, see Sec. III-D) or the expected source amplitudes (Sec. III-E).

III. BAYESIAN FORMULATION

For the linear model (1), we arrive at the following conditional probability density for the single-frequency observations given the source vector $x_k$,
$$ p(y_k|x_k) = \frac{\exp\left(-\frac{1}{\sigma^2}\|y_k - A x_k\|_2^2\right)}{(\pi\sigma^2)^N} \qquad (3) $$

where $\|\cdot\|_p$ denotes the $\ell_p$-norm. We assume a multivariate complex Laplace-like probability density [21] for the source vector $x_k$ at step $k$, conditioned on the history $Y_{k-1}$,
$$ p(x_k|Y_{k-1}) = \prod_{m=1}^{M} \left(\frac{\lambda_{km}}{\sqrt{2\pi}}\right)^{2} e^{-\lambda_{km}|x_{km}|} . \qquad (4) $$
Here $x_{km} = |x_{km}|\, e^{j\phi_{km}}$ is the complex source signal at step $k$ and hypothetical source location $r_m$. Note that (4) defines a joint distribution for $|x_{km}|$ and $\phi_{km}$ at step $k$ for all $m = 1, \ldots, M$. The associated hyperparameters $\lambda_{km} > 0$ model the prior knowledge on the occurrence of a source at step $k$ at location $r_m$. Taking the logarithm gives
$$ -\ln p(x_k|Y_{k-1}) = \sum_{m=1}^{M} \lambda_{km}|x_{km}| - 2\sum_{m=1}^{M} \ln\lambda_{km} + M\ln 2\pi . \qquad (5) $$
For the posterior probability density function (pdf) $p(x_k|Y_k)$, we use Bayes' rule to obtain the weighted LASSO cost function to be minimized, cf. Eq. (6) in Ref. [16]:
$$ -\ln p(x_k|Y_k) = -\ln\left(\frac{p(y_k|x_k)\, p(x_k|Y_{k-1})}{p(y_k)}\right) = \frac{\|y_k - A x_k\|_2^2}{\sigma^2} + \sum_{m=1}^{M} \lambda_{km}|x_{km}| + c_k = \frac{\|y_k - A x_k\|_2^2}{\sigma^2} + \|\lambda_k \odot x_k\|_1 + c_k \qquad (6) $$
$$ = L_k(x_k) + c_k , \qquad (7) $$
in which the notation $L_k(x_k)$ is introduced and
$$ c_k = -2\sum_{m=1}^{M} \ln\lambda_{km} + N\ln\pi\sigma^2 + M\ln 2\pi + \ln p(y_k) . \qquad (8) $$

Minimizing the weighted LASSO cost function (7) with respect to $x_k$ for given $\lambda_k = (\lambda_{k1}, \ldots, \lambda_{kM})^T$ and $\sigma^2$ gives the MAP source estimate $\hat{x}_k$. The $\odot$ in (6) is the element-wise (Hadamard) product, making it clear that this is a cost function promoting sparse solutions in which the $\ell_1$ constraint is weighted by giving every source amplitude its own hyperparameter $\lambda_{km}$. The minimization of (7) constitutes a convex optimization problem.
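The minimization of (7) for fixed weights can be stated directly in a convex modeling language. The paper uses CVX in MATLAB (see Sec. VI-B); the sketch below uses CVXPY as a Python stand-in, assuming its complex-variable support, with the hypothetical helper name weighted_lasso.

```python
import numpy as np
import cvxpy as cp

def weighted_lasso(y_k, A, lam, sigma2):
    """Minimize ||y_k - A x||_2^2 / sigma^2 + sum_m lam[m]*|x_m|, cf. (6)-(7)."""
    x = cp.Variable(A.shape[1], complex=True)
    data_fit = cp.sum_squares(y_k - A @ x) / sigma2
    penalty = cp.sum(cp.multiply(lam, cp.abs(x)))   # weighted l1 penalty
    cp.Problem(cp.Minimize(data_fit + penalty)).solve()
    return x.value

# Example call with uniform weights lambda_km = mu:
# x_hat = weighted_lasso(y_k, A, mu * np.ones(A.shape[1]), sigma2)
```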


A. Posterior Probability Density

The purpose of this section is to derive an expression for the posterior pdf $p(x_k|Y_k)$. We distinguish between $x_k$, a free variable of $p(x_k|Y_k)$, and $\hat{x}_k$, the MAP source estimate, which is treated as a constant.

The set of active indices $\mathcal{M}_k$ at step $k$ is introduced as the set of $m$ such that $\hat{x}_{km}$ is non-zero, i.e.
$$ \mathcal{M}_k = \{m_1, m_2, \ldots, m_R\} := \{m \,|\, \hat{x}_{km} \neq 0\} . \qquad (9) $$
The true $x_k$ and the estimated $\hat{x}_k$ source amplitude vectors are related by the source vector error $e_k$:
$$ x_k = \hat{x}_k + e_k , \qquad (10) $$
using the active indices
$$ e_{km} = \begin{cases} x_{km}, & \text{for } m \notin \mathcal{M}_k ,\\ x_{km} - \hat{x}_{km}, & \text{for } m \in \mathcal{M}_k . \end{cases} \qquad (11) $$

The log-likelihood function (7) is expanded as
$$ \|y_k - A x_k\|_2^2 = \|A e_k\|_2^2 + \|y_k - A\hat{x}_k\|_2^2 - 2\,\mathrm{Re}[e_k^H A^H(y_k - A\hat{x}_k)] = \|A e_k\|_2^2 + \|y_k - A\hat{x}_k\|_2^2 - 2\sum_{m\in\mathcal{M}_k}\mathrm{Re}[e_{km}^* a_m^H(y_k - A\hat{x}_k)] - 2\sum_{m\notin\mathcal{M}_k}\mathrm{Re}[x_{km}^* a_m^H(y_k - A\hat{x}_k)] \qquad (12) $$
where we separated the sum and used (11). The sum over $m \in \mathcal{M}_k$ is simplified by the following

Theorem 1. For given weighting $\lambda_k$, the MAP source estimate (minimizing solution) $\hat{x}_k$ to (7) is characterized by (cf. [16])
$$ a_m^H n_k = \lambda_{km}\frac{\hat{x}_{km}}{|\hat{x}_{km}|} , \quad \text{for } m \in \mathcal{M}_k , \qquad (13a) $$
$$ |a_m^H n_k| < \lambda_{km} , \quad \text{for } m \notin \mathcal{M}_k \qquad (13b) $$
with the normalized residual vector
$$ n_k = \frac{2}{\sigma^2}(y_k - A\hat{x}_k) . \qquad (14) $$

The proof is in Appendix A. Continuing from (12) gives
$$ \|y_k - A x_k\|_2^2 = \|A e_k\|_2^2 + \|y_k - A\hat{x}_k\|_2^2 - \sigma^2\sum_{m\in\mathcal{M}_k}\lambda_{km}\,\mathrm{Re}\!\left(\frac{\hat{x}_{km} e_{km}^*}{|\hat{x}_{km}|}\right) - \sigma^2\sum_{m\notin\mathcal{M}_k}\mathrm{Re}(x_{km}^* a_m^H n_k) . \qquad (15) $$

We expand the LASSO cost function $L_k(\cdot)$ in (7) around the MAP source estimate $\hat{x}_k$ using (15) and then (10):
$$ L_k(x_k) = \frac{\|y_k - A\hat{x}_k\|_2^2}{\sigma^2} + \sum_{m=1}^{M}\lambda_{km}|x_{km}| - \sum_{m\notin\mathcal{M}_k}\mathrm{Re}(x_{km}^* a_m^H n_k) + \|A e_k\|_2^2 + \sum_{m\in\mathcal{M}_k}\lambda_{km}\left[|\hat{x}_{km}| - \mathrm{Re}\!\left(\frac{x_{km}\hat{x}_{km}^*}{|\hat{x}_{km}|}\right)\right] \qquad (16) $$
$$ = L_k(\hat{x}_k) + \|A e_k\|_2^2 + \sum_{m\notin\mathcal{M}_k}|x_{km}|\left(\lambda_{km} - \zeta_{km}\cos(\phi_{km} - \rho_{km})\right) + \sum_{m\in\mathcal{M}_k}|x_{km}|\left(\lambda_{km} - \lambda_{km}\cos(\phi_{km} - \hat{\phi}_{km})\right) , \qquad (17) $$
where we introduced the polar representations
$$ a_m^H n_k = \zeta_{km} e^{j\rho_{km}} , \qquad \hat{x}_{km} = |\hat{x}_{km}|\, e^{j\hat{\phi}_{km}} . \qquad (18) $$

$\zeta_{km} = |a_m^H n_k|$ is the magnitude of the projection of the normalized residual vector onto the $m$th array steering vector. By Theorem 1, $\lambda_{km} > \zeta_{km}$ for $m \notin \mathcal{M}_k$. Equation (17) shows that the posterior pdf depends on both the unknown source magnitudes $|x_{km}|$ and phases $\phi_{km}$. In Sec. III-B, an approximate marginal posterior pdf for the source magnitudes is derived. This allows us to approximate the next MAP estimate $\hat{x}_{k+1}$ as the minimizer of an updated weighted LASSO cost function.
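Theorem 1 also gives a practical check of any candidate solution: the residual correlations $\zeta_{km} = |a_m^H n_k|$ must match $\lambda_{km}$ on the active set and stay strictly below $\lambda_{km}$ elsewhere. A minimal numerical sketch (illustrative, function name hypothetical):

```python
import numpy as np

def check_optimality(y_k, A, x_hat, lam, sigma2, tol=1e-6):
    """Numerically verify the conditions (13a)-(13b) for a weighted-LASSO solution."""
    n_k = 2.0 / sigma2 * (y_k - A @ x_hat)        # normalized residual vector, (14)
    corr = A.conj().T @ n_k                        # a_m^H n_k for all m
    zeta = np.abs(corr)                            # zeta_km = |a_m^H n_k|, cf. (18)
    active = np.abs(x_hat) > tol                   # active set M_k, (9)
    ok_active = np.allclose(                       # (13a): equality on the active set
        corr[active], lam[active] * x_hat[active] / np.abs(x_hat[active]), atol=1e-3)
    ok_inactive = np.all(zeta[~active] < lam[~active] + 1e-3)   # (13b)
    return ok_active, ok_inactive, zeta
```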

B. Marginalization with respect to source phases

From (4), it follows that the prior for the source phases is i.i.d. uniformly distributed on $[-\pi, \pi]$. We regard the phases $\phi_{km}$ as nuisance parameters and marginalize $p(x_k|Y_k) = \exp[-L_k(x_k) - c_k]$ in (7) by inserting (17) and integrating out the phases $\phi_{km}$. The source vector errors are assumed small and $\|e_k\|_2^2$ is neglected in (17). Assuming the errors to be small is common in sequential methods. For example, the Extended Kalman Filter (EKF) also assumes small errors so that the parameter estimate remains Gaussian. As demonstrated in the sequel, we use this approximation so that the posterior remains Laplacian. This is motivated by the assumption that the posterior pdf is concentrated around its peak point $\hat{x}_k$. Using (17), the approximate marginal posterior pdf depends solely on the source magnitudes

$$ p(|x_k| \,|\, Y_k) = \int \exp[-L_k(x_k) - c_k]\; d\phi_{k1}\cdots d\phi_{kM} = \int e^{-L_k(x_k) - c_k + \|A e_k\|_2^2}\, e^{-\|A e_k\|_2^2}\; d\phi_{k1}\cdots d\phi_{kM} \approx \gamma_k \prod_{m\notin\mathcal{M}_k} e^{-\lambda_{km}|x_{km}|} I_0(\zeta_{km}|x_{km}|) \times \prod_{m\in\mathcal{M}_k} e^{-\lambda_{km}|x_{km}|} I_0(\lambda_{km}|x_{km}|) \qquad (19) $$


where $|x_k|$ is the vector of source magnitudes with elements $|x_{km}|$ and $\gamma_k$ is a normalization constant defined by
$$ \gamma_k = \exp(-L_k(\hat{x}_k) - c_k) = \prod_{m=1}^{M}\gamma_{km} \qquad (20) $$
$$ \gamma_{km} = \begin{cases} \left[\displaystyle\int_0^\infty e^{-\lambda_{km} x} I_0(\zeta_{km} x)\, dx\right]^{-1} = \sqrt{\lambda_{km}^2 - \zeta_{km}^2} & \text{for } m \notin \mathcal{M}_k ,\\ 1 & \text{for } m \in \mathcal{M}_k , \end{cases} $$
and $I_0(z) = \frac{1}{\pi}\int_0^\pi \exp(z\cos\phi)\, d\phi$ is the modified Bessel function of the first kind and zero order, cf. [23]:9.6.16. To assess the quality of the approximation in (19), we use the first order Taylor expansion in the vicinity of the MAP estimate $\hat{x}_k$,
$$ e^{-\|A e_k\|_2^2} \approx 1 - \|A e_k\|_2^2 . \qquad (21) $$

In (19), only the constant term was retained and the quality of the approximation depends on the magnitude of $\|A e_k\|_2^2$. The approximate marginal pdf (19) becomes improper as $\zeta_{km} \to \lambda_{km}$ because it cannot be normalized on the half line $|x_{km}| > 0$ in the limit $\zeta_{km} = \lambda_{km}$. This difficulty occurs when the index $m$ becomes active (see Sec. III-C). Its form is not Laplace-like (4), and we develop two types of approximations for fitting the posterior distribution at time step $k$ to a Laplace-like prior distribution for time step $k+1$ in the following Secs. III-D and III-E.
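The closed form of the normalization constant in (20) is the Laplace transform of $I_0$ and can be verified numerically, e.g. with SciPy (a quick sanity check, not part of the algorithm; the values are arbitrary examples):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import i0e   # exponentially scaled I0, avoids overflow

lam, zeta = 2.0, 1.2            # example values with zeta < lam (inactive index)
# integrand exp(-lam*x)*I0(zeta*x), written as exp(-(lam-zeta)*x)*i0e(zeta*x)
val, _ = quad(lambda x: np.exp(-(lam - zeta) * x) * i0e(zeta * x), 0, np.inf)
print(val, 1.0 / np.sqrt(lam**2 - zeta**2))   # both approximately 0.625
```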

C. Active indices

To estimate the posterior distribution for the active indices we use (19). For any given $\lambda_{km} > 0$, the Laplace-like pdf (4) favors a model with small source amplitude $|x_{km}|$. In the limiting case $\lambda_{km} \to 0$, however, the Laplace-like pdf approaches a uniform pdf for the source amplitude $|x_{km}| \in [0, \infty)$ which cannot be normalized. We express this informally as
$$ p(|x_{km}| \,|\, Y_k) \propto 1 \quad \text{for } \lambda_{km} \to 0 . \qquad (22) $$
Thus, when some index $m$ has become active in the current time step $k$ there will be no constraint on the corresponding amplitude, and we define the posterior hyperparameter
$$ \lambda_{km|k} = 0 \quad \text{for } m \in \mathcal{M}_k . \qquad (23) $$
Note that this model carries no information about how likely an active index is to become inactive.

D. Asymptotic fit of Laplace-like distribution

Here we are concerned with the $\lambda_{km}$ for the inactive indices $m \notin \mathcal{M}_k$. The $\lambda_{km}$ for any $m \notin \mathcal{M}_k$ conveys prior information about how likely the index $m$ is to transition to the set of active indices $\mathcal{M}_{k+1}$ at the next time step $k+1$. The approximate marginal pdf (19) is close to Laplace-like (4) in terms of asymptotic behavior for large values of $x_{km}$. The MAP estimate $\hat{x}_k$ is sparse and many $\hat{x}_{km} = 0$, but the variables $x_{km}$ can take any value. In this section, $x_{km}$ is assumed large. The first-order asymptotic expansion is used,
$$ I_0(\xi) \approx \frac{e^{\xi}}{\sqrt{2\pi\xi}} \quad \text{for large } \xi . \qquad (24) $$

For amplitude $|x_{km}|$ we get
$$ p(|x_{km}| \,|\, Y_k) \approx \frac{e^{-(\lambda_{km}-\zeta_{km})|x_{km}|}}{\sqrt{\zeta_{km}|x_{km}|}} \quad \text{for } m \notin \mathcal{M}_k . \qquad (25) $$
The asymptotic expansion (24)–(25) has a small approximation error for $|x_{km}| \gg \zeta_{km}^{-1}$. Taking the logarithm analogously to (5) gives
$$ -\ln p(|x_{km}| \,|\, Y_k) \approx (\lambda_{km} - \zeta_{km})|x_{km}| + \tfrac{1}{2}\ln\zeta_{km} + \tfrac{1}{2}\ln|x_{km}| . \qquad (26) $$

Since $\zeta_{km} < \lambda_{km}$ (Theorem 1) and $|x_{km}|$ is large, we neglect $\ln\zeta_{km}$ and $\ln|x_{km}|$ in (26). Finally, all these steps and approximations give the posterior weighted $\ell_1$-penalty
$$ -\ln p(|x_k| \,|\, Y_k) \approx \sum_{m\notin\mathcal{M}_k}(\lambda_{km} - \zeta_{km})\,|x_{km}| . \qquad (27) $$
Comparison with (4) and combination with (23) shows that this translates to the posterior weight vector $\lambda_{k|k}$ with elements
$$ \lambda_{km|k} = \begin{cases} \lambda_{km}\left(1 - \dfrac{\zeta_{km}}{\lambda_{km}}\right) & \text{for } m \notin \mathcal{M}_k ,\\ 0 & \text{for } m \in \mathcal{M}_k . \end{cases} \qquad (28) $$

The weight λkm|k reduces to zero when index m becomesactive (see Sec. III-C). The first line in (28) is always positiveas ζkm < λkm.

E. Mean fit of Laplace-like distribution

As an alternative to the approach in Sec. III-D, the expected source amplitudes of the assumed Laplace-like distribution (4) can be fitted to the expected posterior source amplitudes,

$$ \mu_{km|k} = E\{|x_{km}| \,|\, Y_k\} = \int_{\mathbb{R}_+^M} a_m\, p(a \,|\, Y_k)\, da_1\cdots da_M . \qquad (29) $$

Proceeding from (19) for $m \notin \mathcal{M}_k$, this gives
$$ \mu_{km|k} \approx \gamma_{km}\int_0^\infty a_m e^{-\lambda_{km} a_m} I_0(\zeta_{km} a_m)\, da_m \qquad (30) $$
$$ = \frac{\gamma_{km}\lambda_{km}}{(\lambda_{km}^2 - \zeta_{km}^2)^{3/2}} = \frac{\lambda_{km}}{\lambda_{km}^2 - \zeta_{km}^2} . \qquad (31) $$

For a Laplace-like distribution,
$$ \mu_{km|k} = \frac{1}{\lambda_{km|k}} . \qquad (32) $$

Combining (31) and (32) gives the elements of the posterior weight vector $\lambda_{k|k}$ for $m \notin \mathcal{M}_k$. The corresponding elements for $m \in \mathcal{M}_k$ are obtained in the limit $\zeta_{km} \to \lambda_{km}$, consistent with (23). The posterior weight vector $\lambda_{k|k}$ becomes
$$ \lambda_{km|k} = \begin{cases} \lambda_{km}\left(1 - \dfrac{\zeta_{km}^2}{\lambda_{km}^2}\right) & \text{for } m \notin \mathcal{M}_k ,\\ 0 & \text{for } m \in \mathcal{M}_k . \end{cases} \qquad (33) $$

For m ∈ Mk, the weight λkm|k reduces to zero when indexm becomes active.

For $m \notin \mathcal{M}_k$, the posterior weight according to (28) is always smaller than the posterior weight (33) because $\lambda_{km} > \zeta_{km}$ by Theorem 1. We choose (33) rather than (28) because the weighting in the cost function (6) will then play a stronger role.
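Both posterior weight fits are simple element-wise updates of the hyperparameters. The sketch below (illustrative, names hypothetical) computes either (28) or (33) from the prior weights $\lambda_{km}$ and the residual correlations $\zeta_{km}$, and sets the active entries to zero as in (23):

```python
import numpy as np

def posterior_weights(lam, zeta, active, mode="mean"):
    """Posterior hyperparameters lambda_{km|k}: (28) for 'asymptotic', (33) for 'mean'."""
    lam = np.asarray(lam, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    if mode == "asymptotic":
        lam_post = lam - zeta                         # (28): lam*(1 - zeta/lam)
    else:
        lam_post = lam * (1.0 - (zeta / lam) ** 2)    # (33): lam*(1 - zeta^2/lam^2)
    lam_post[active] = 0.0                            # (23): active indices unconstrained
    return lam_post
```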


IV. SIMPLE SOURCE MOTION MODEL

After having estimated the posterior $\lambda_{k|k}$ with elements defined by (33), the next step is to predict $\lambda_{k+1}$ for the Laplace-like prior (4) at time step $k+1$. Time-varying source signal estimates are obtained by varying the active set (creating new active indices as well as annihilating old ones). Here, a very simple random walk model is assumed: a source occurring at location $r_m$ at time step $k+1$ is either inherited from the previous time step $k$ with probability $(1-\alpha)$ or newly generated from an unweighted (uninformed) Laplace-like prior with parameter $\lambda_0'$ with probability $\alpha$. Thus, we define the hyperparameter for time step $k+1$,
$$ \lambda_{k+1} := (1-\alpha)\lambda_{k|k} + \alpha\lambda_0'\,\mathbf{1} , \qquad (34) $$
where $\mathbf{1}$ is the all-ones vector. $\lambda_0 = \alpha\lambda_0'$ plays a role which is similar to the process noise covariance in the Kalman filter, see Sec. VI. If $\lambda_0$ is large then $\min_m(\lambda_{k+1,m|k})/\max_m(\lambda_{k+1,m|k})$ will be close to one. Then, the history $Y_k$ has only little influence on the future estimate $\hat{x}_{k+1}$. Conversely, for a smaller $\lambda_0$, the adaptivity of the weighting through the posterior (28) will be more influential on the future sequential estimates.
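The prediction step (34) is then a convex combination of the posterior weights and a flat prior; a minimal sketch under the same naming assumptions:

```python
import numpy as np

def predict_weights(lam_post, alpha, lam0_prime):
    """Prior weights for step k+1, cf. (34)."""
    return (1.0 - alpha) * lam_post + alpha * lam0_prime * np.ones_like(lam_post)

# lam0 = alpha*lam0_prime acts like a process-noise level: a large lam0 flattens the
# weights (min/max ratio near one) and shortens the memory of the recursion.
```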

V. IMPLEMENTATION OF ϕ USING LASSO PATH

We define the positive parameter $\mu$ and the vector of non-negative weighting coefficients $w_k$ via $\lambda_k = \mu w_k$, $\|w_k\|_\infty = 1$, equivalently $w_{km} = \lambda_{km}/\mu$. Thus the weights $w_{km}$ are obtained from the previous data and the source model and normalized so that $\max_m w_{km} = 1$. For the data $y_k$ the source vector $\hat{x}_k$ is found by minimizing
$$ L_k(x_k) = \frac{\|y_k - A x_k\|_2^2}{\sigma^2} + \mu\sum_{m=1}^{M} w_{km}|x_{km}| . \qquad (35) $$

The optimization (35) is convex and is solved assuming a fixed number of sources $R$. However, $\mu$ is unknown and is adjusted to reach a desired number of active indices $R$ as described below. The weighted complex-valued LASSO is solved by homotopy [16], [22], as indicated in Fig. 1.

The solution of (35) evolves with $\mu$ due to the uniform continuity of the cost. Furthermore, for sufficiently large $\mu$, $\hat{x}_k = 0$. Thus, a solution with $R$ active indices is found by pursuing the continuum from a large $\mu$. The evolution of $\mu$ from large to smaller values is segmented into intervals $(\mu^{p+1}, \mu^p)$ for $p = 0, 1, \ldots$, where $p = 0$ corresponds to $\mu^0 = \infty$. Varying $\mu$ within an interval changes just the $\hat{x}_{km}$ for the active indices $m \in \mathcal{M}_k$. Each interval $(\mu^{p+1}, \mu^p)$ has an end-point at a candidate point $\mu^{p+1}$, where the active set $\mathcal{M}_k$ changes by including or deleting an element (potentially multiple elements).

Creation is defined as including an index $l \notin \mathcal{M}_k^p$ into $\mathcal{M}_k^{p+1}$ if $|\hat{x}_{kl}| > 0$ as $\mu$ decreases below $\mu^{p+1}$. Vice versa, annihilation is the process of deleting an index $l \in \mathcal{M}_k^p$ as $\hat{x}_{kl} \to 0$ with decreasing $\mu$. Due to the (rather unlikely) possibility of annihilation, the number of active indices $R$ can be less than $p$ for $\mu \in (\mu^{p+1}, \mu^p)$.

The sequence of $\mu$ with source amplitudes $\hat{x}_k(\mu)$ is found using the optimality conditions in Theorem 1. Starting from $\mu$ large, in the first interval $(\mu^1, \infty)$, $\hat{x}_k(\mu) = 0$. Then, $\mu^1$ is the smallest value for which Theorem 1 is satisfied with $\hat{x}_k(\mu) = 0$:
$$ \mu^1 = \min(\mu) \ \text{ s.t. } \ 2|a_m^H y_k| \le \mu w_{km}\sigma^2 \ \ \forall m \ = \max_{m=1,\ldots,M}\frac{2|a_m^H y_k|}{w_{km}\sigma^2} . \qquad (36) $$
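The starting point (36) of the homotopy only requires the correlations of the data with the steering vectors; a one-line computation in the same notation (illustrative):

```python
import numpy as np

def mu_1(y_k, A, w_k, sigma2):
    """Smallest mu for which x_hat(mu) = 0 satisfies Theorem 1, cf. (36)."""
    return np.max(2.0 * np.abs(A.conj().T @ y_k) / (w_k * sigma2))
```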

The LASSO is next solved for $\mu < \mu^1$. Recalling Theorem 1 with a given active set $\mathcal{M}_k$ from the previous candidate solution $\hat{x}_k(\mu)$, the next candidate point $\mu^{p+1}$ is the smallest $\mu$ $(< \mu^p)$ with the active indices unchanged, i.e.
$$ \mu^{p+1} = \min(\mu) \ \text{ s.t. } \ 2|a_m^H(y_k - A\hat{x}_k(\mu))| \le \mu w_{km}\sigma^2 \ \forall m \ \text{ and } \ \hat{x}_{km} \ne 0 \ \forall m \in \mathcal{M}_k^p \qquad (37) $$
where $\hat{x}_k(\mu)$ is the LASSO solution for given $\mu$ and $\mathcal{M}_k^p$ is the set of $R$ active indices in the interval $(\mu^{p+1}, \mu^p)$:
$$ \mathcal{M}_k^p = \{m_1, m_2, \ldots, m_R\} := \{m \,|\, \hat{x}_{km} \ne 0\} . \qquad (38) $$

The contracted dictionary $A_\mathcal{M}$, its Moore-Penrose pseudoinverse $A_\mathcal{M}^+$, the contracted source amplitudes $x_\mathcal{M}$, and the contracted phase-weighted weights $w_\mathcal{M}$ are defined by
$$ A_\mathcal{M} = [a_{m_1} \ldots a_{m_R}] \qquad (39) $$
$$ A_\mathcal{M}^+ = (A_\mathcal{M}^H A_\mathcal{M})^{-1} A_\mathcal{M}^H \qquad (40) $$
$$ x_\mathcal{M}(\mu) = [\hat{x}_{km_1}(\mu) \ldots \hat{x}_{km_R}(\mu)]^T \qquad (41) $$
$$ w_\mathcal{M} = \left[ w_{km_1}\frac{\hat{x}_{km_1}}{|\hat{x}_{km_1}|} \ \ldots \ w_{km_R}\frac{\hat{x}_{km_R}}{|\hat{x}_{km_R}|} \right]^T . \qquad (42) $$

The projection matrix $P^\perp_{A_\mathcal{M}}$ defines the projection onto the orthogonal complement of the range of $A_\mathcal{M}$,
$$ P^\perp_{A_\mathcal{M}} = I - Q_\mathcal{M} Q_\mathcal{M}^H , \quad \text{where } Q_\mathcal{M} = \mathrm{orth}(A_\mathcal{M}) \qquad (43) $$
is a unitary basis of the range of $A_\mathcal{M}$. The following Theorem 2 provides an approximate solution to the minimization problem (37) for a given set of active indices. The approximation is based on the assumption that the vector $w_\mathcal{M}$ in (42) is constant inside the interval $(\mu^{p+1}, \mu^p)$. Equivalently, the phase angles of all elements of $x_\mathcal{M}(\mu)$ are assumed to be constant, which is a first order approximation in $\mu$.

Theorem 2. The approximate solution to (37) is given by
$$ \mu_k^{p+1} := \max\{\mu^+, \mu^-\} , \qquad (44) $$
where the creation point $\mu^+$ and annihilation point $\mu^-$ are
$$ \mu^+ = \max_{m\le M} Q\!\left( \frac{2 a_m^H P^\perp_{A_\mathcal{M}} y_k}{\sigma^2 w_{km}} ,\; \frac{a_m^H A_\mathcal{M}^{+H} w_\mathcal{M}}{w_{km}} \right) \qquad (45) $$
$$ \mu^- = \frac{2}{\sigma^2}\max_{l=1,\ldots,R}\tilde{x}_{\mathcal{M}l}/\tilde{w}_{\mathcal{M}l} , \qquad (46) $$
where $Q(\alpha, \beta)$ is defined in Appendix C, and
$$ \tilde{x}_\mathcal{M} = A_\mathcal{M}^+ y_k , \qquad (47) $$
$$ \tilde{w}_\mathcal{M} = (A_\mathcal{M}^H A_\mathcal{M})^{-1} w_\mathcal{M} . \qquad (48) $$
$\tilde{x}_\mathcal{M}$ is the bias-corrected estimate.

The proof is in Appendix B. The implementation of the sequential Bayesian sparse signal reconstruction update $\varphi(y_k, \lambda_k)$ in (2) is given concisely in Table I. Line 8 "Define active set" is implemented with a numerical peak finder and subsequent thresholding. Note the bias correction step in line 13 "choice 2" using just the active indices [30]:
$$ \tilde{x}_\mathcal{M} = A_\mathcal{M}^+ y_k . \qquad (49) $$

Implementation of $\varphi(y_k, \lambda_k)$ defined via Eq. (2):
Input: $y_k \in \mathbb{C}^N$, $\lambda_k \in \mathbb{R}_+^M$
Given constants: $A \in \mathbb{C}^{N\times M}$, $R_{\max} \in \mathbb{N}$, $\varepsilon > 0$, $\alpha$, $\lambda_0 > 0$
1: $w_k := \lambda_k/\|\lambda_k\|_\infty$ (the weighting coefficients are normalized)
2: $p := 0$
3: $\hat{x}_k^0 := 0$, $\mathcal{M}_k^0 := \emptyset$, $R := 0$ (there are no active indices)
4: while $R \le R_{\max}$ do
5: Evaluate $\mu^{p+1}$ according to Eq. (36) for $p+1 = 1$ and Eq. (44) for $p+1 > 1$
6: $\mu := (1-\varepsilon)\mu^{p+1}$ (we choose $\mu$ slightly smaller than $\mu^{p+1}$)
7: $\hat{x}_k^{p+1} := \arg\min_{x}\left(-\ln p(x|Y_k)\right)$ using (7) for $\lambda_k = \mu w_k$
8: Define active set $\mathcal{M}_k^{p+1}$ according to (9)
9: $R := \mathrm{card}(\mathcal{M}_k^{p+1})$ ($R$ is the new number of active indices)
10: $p := p+1$
11: end
12: $\mathcal{M}_k := \mathcal{M}_k^p$ (this is the active set of the MAP estimate)
13: $\hat{x}_k = \hat{x}_k^p$ (choice 1: MAP estimate) or $\tilde{x}_\mathcal{M} = A_\mathcal{M}^+ y_k$ (choice 2: bias corrected)
14: Evaluate posterior $\lambda_{k|k}$ using (28) (choice 3) or (33) (choice 4)
15: Evaluate prior $\lambda_{k+1}$ for the next time step using (34) with $\alpha$, $\lambda_0$.
Output: $\hat{x}_k \in \mathbb{C}^M$, $\lambda_{k+1} \in \mathbb{R}_+^M$.

TABLE I: SEQUENTIAL BAYESIAN SPARSE SIGNAL RECONSTRUCTION
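A condensed Python sketch of the update $\varphi$ in Table I is given below. It is illustrative only: the homotopy search over $\mu$ is replaced by a fixed $\mu$, as discussed in Sec. VI-B, the active set is found by simple thresholding, and weighted_lasso, posterior_weights, and predict_weights are the hypothetical helpers sketched in earlier sections.

```python
import numpy as np

def phi(y_k, lam_k, A, sigma2, mu, alpha, lam0_prime, thresh=1e-2):
    """One step of the sequential update (2): returns (x_hat_k, lam_kplus1).

    Condensed sketch of Table I with a fixed regularization mu instead of the
    LASSO-path search; weighted_lasso, posterior_weights and predict_weights
    are the helpers sketched in earlier sections.
    """
    w_k = lam_k / np.max(lam_k)                              # line 1: normalize weights
    x_hat = weighted_lasso(y_k, A, mu * w_k, sigma2)         # line 7: weighted LASSO (7)
    active = np.abs(x_hat) > thresh * np.max(np.abs(x_hat))  # line 8: active set (9)
    n_k = 2.0 / sigma2 * (y_k - A @ x_hat)                   # normalized residual (14)
    zeta = np.abs(A.conj().T @ n_k)                          # residual correlations (18)
    lam_post = posterior_weights(mu * w_k, zeta, active, mode="mean")  # line 14, (33)
    lam_next = predict_weights(lam_post, alpha, lam0_prime)            # line 15, (34)
    return x_hat, lam_next

# Sequential processing (cf. Sec. VI): lam = np.ones(M), then for every new snapshot
#     x_hat, lam = phi(y_k, lam, A, sigma2, mu, alpha, lam0_prime)
```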

VI. NUMERICAL EXAMPLE

In this section we investigate the performance of the proposed sequential sparse estimation procedure using computer simulations. For the sequential Bayesian sparse signal reconstruction, we initialize the prior vector $\lambda_1$ with all ones: all hypothetical source locations have the same prior. We then apply (2) sequentially for all time steps $k = 1, 2, 3, \ldots$ with the implementation given in Table I with $\varepsilon = 0.05$.

We use synthetic data on a uniform linear array with $N = 64$ elements with half-wavelength spacing, using 50 snapshots sequentially ($k = 1, \ldots, 50$). The traveltime $\tau_{nm}$ for plane waves impinging on the $n$th sensor is modeled as
$$ \omega\tau_{nm} = \pi(n-1)\cos\theta_m , \qquad (50) $$
for a hypothetical source at direction of arrival $r_m = \theta_m = (m-1)\frac{\pi}{M}$ rad with $M = 180$ and $n = 1, \ldots, N$; $m = 1, \ldots, M$. The simulated scenario has $R = 9$ far-field sources modeled as emitting plane waves (1). The uncorrelated noise $n$ is $\mathcal{N}(0, I)$, i.e. 0 dB power. Eight sources are stationary at $\theta^T = [45, 60, 76, 99, 107, 120, 134, 162]$ degrees relative to endfire with constant power levels (PL) $[-5, 10, 5, 0, 11, 12, 9, 25]$ dB. The moving source starts at $65^\circ$, moves at $+0.5^\circ$/snapshot, and has a power level of 20 dB.

A. Beamforming

In beamforming the Cross Spectral Density Matrix (CSDM) is estimated based on all snapshots $y_k$ for $k = 1, \ldots, 50$. The CSDM estimate $R$ is singular as it is snapshot deficient, with at least $64 - 50 = 14$ eigenvalues equal to zero. For conventional beamforming, the resolution is not sufficient to separate all the sources (Fig. 2). To numerically stabilize the matrix inversion in the adaptive beamforming, diagonal loading is applied at a level of $\varepsilon = -10$ dB relative to the sum of the eigenvalues. The stabilized beamformer output $B$ for a given steering vector $s$ is
$$ B = w^H R w \quad \text{with} \quad w = \frac{[R + \varepsilon I]^{-1} s}{s^H [R + \varepsilon I]^{-1} s} . \qquad (51) $$

For the stationary case (Fig. 2a), the conventional beamformer, $\varepsilon = \infty$ in (51), resolves all 8 sources whereas the Minimum Variance Distortionless Response (MVDR) beamformer with $\varepsilon = -10$ dB resolves just 6 sources. For this scenario, we observe that the above beamforming finds the source locations well in the stationary case (Fig. 2a), but the moving source masks the stationary source at $\theta = 76^\circ$ (Fig. 2b). The observed bias in the power estimate for the MVDR is due to the snapshot-deficient CSDM [24]. More advanced adaptive beamforming might be able to better estimate the source locations in this scenario with higher time resolution, e.g. multi-rate adaptive beamforming [25], [26]. Multi-rate beamforming uses long time averages to produce the MVDR steering vectors and shorter averages for the data CSDM. It was introduced to track a stable signal in the presence of dynamic interferers [25].
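For reference, the diagonally loaded beamformer (51) is straightforward to evaluate over the DOA grid; a sketch (illustrative, with $\varepsilon$ set relative to the trace of the sample CSDM, here 0.1, i.e. -10 dB):

```python
import numpy as np

def beamformer_power(Y, A, eps_rel=0.1):
    """Diagonally loaded MVDR power (51) over all steering vectors in A (Y is N x K)."""
    R = Y @ Y.conj().T / Y.shape[1]                   # sample CSDM from K snapshots
    eps = eps_rel * np.real(np.trace(R))              # loading relative to sum of eigenvalues
    R_inv = np.linalg.inv(R + eps * np.eye(R.shape[0]))
    B = np.empty(A.shape[1])
    for m in range(A.shape[1]):
        s = A[:, m]
        w = R_inv @ s / (s.conj() @ R_inv @ s)        # loaded MVDR weight vector
        B[m] = np.real(w.conj() @ R @ w)              # beamformer output power B = w^H R w
    return B
```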

In contrast, the sparse localization works directly with a single-snapshot optimization (7), giving higher time resolution. Successive snapshot localizations then include the information carried forward from previous localizations via the prior and the hyperparameters. This enables finding lower-powered sources than if each snapshot was used independently.

B. Sequential LASSO

Going down the LASSO path, Sec. V, is computationally demanding, especially for the 9 sources involved in the studied scenario. Since the noise level is constant over time, the $\lambda$ should not change much.

It is not necessary to go down the full LASSO path. Therefore, each iteration is started at a $\mu$ slightly larger than the expected final $\mu$. Assuming that the active indices do not change, the second term in (35) is proportional to $\mu\lambda_0$. We use $\mu = 400$ for $\lambda_0 = 0.05$ and $\mu = 40$ for $\lambda_0 = 0.5$.

For a given value of $\mu$, the convex optimization (35) is solved using CVX [27], [28], [29]. We carry out the simulations with the mean fitting approach in Sec. III-E, which corresponds to "choice 4" in line 14 of Table I.

1) Stationary sources: The posterior weights $\lambda_{k|k}$ defined by (33) are updated after each step and used as prior information for the following snapshot $k+1$, forming the sequential estimation approach. The non-sequential LASSO results shown in Fig. 3a show more variation than those from the proposed sequential Bayesian sparse signal reconstruction shown in Fig. 3b.


The constraint in (7) that sparsifies the solution causes the power estimates (peak of solid in Fig. 3a) to be underestimated, especially for the low-power sources (e.g., $\theta = 45^\circ$ and $99^\circ$). Using the unbiased estimate (choice 2 in line 13 of Table I) gives powers within 3 dB of the correct ones.

The posterior weights show how the solution is guided towards the active indices in the next step.

2) Moving sources: For a moving source scenario (Figs. 5 and 6), it is difficult to observe the stationary sources using beamforming. Since the proposed approach works directly on a snapshot, there is sufficient time resolution to track both moving and stationary sources, cf. Fig. 5a.

The temporal evolution of the weights is shown in Fig. 6. The weights for the asymptotic fitting technique in Sec. III-D are shown in Figs. 6a and 6b. For the asymptotic fitting technique, the difference in the weights between active indices and inactive ones is rather small. On the other hand, the weights for the mean fitting technique in Sec. III-E are shown in Figs. 6c and 6d. It is clearly seen that the contrast between the weights for active and inactive indices is much larger.

The value of $\lambda_0$ determines how long the weights will remember a track; the non-sequential LASSO results are obtained with $\alpha = 1$ in (34). A large $\lambda_0 = 0.5$ causes the weights to forget a previously estimated source position faster (Figs. 6a and 6c) and gives more stable track estimates (Fig. 5a) than the smaller $\lambda_0 = 0.05$. The weights for the small $\lambda_0 = 0.05$ are shown in Figs. 6b and 6d and the corresponding sequential Bayesian sparse signal reconstruction in Fig. 5b. A small $\lambda_0$ results in a longer memory, as seen in Figs. 6b and 6d, which is beneficial for the sparse reconstruction of static source scenarios.

From Fig. 7 it is observed that the posterior weights become close to zero near the active indices. The weights have a broader valley near endfire where the resolution cells are larger. As the moving source moves from $65^\circ$ to $75^\circ$, the weights for the corresponding set of source locations shift.

VII. EXPERIMENT

The data are from a towed horizontal array during the long range acoustic communications (LRAC) experiment [32], [33] from 10:00–10:13 UTC on 16 September 2010 in the NE Pacific in 5-km water depth. Other data periods yield similar results to those shown here. The array was towed at 3.5 knots at a depth of 200 m. The data were sampled at 2000 Hz using a nested array with each configuration having 64 channels [34], with the ultra low frequency (ULF) array hydrophone spacing of 3 m ($f_d = 250$ Hz).

Each 4 s snapshot was Fourier transformed with $2^{13}$ samples without overlapping snapshots. The beamformed time series, Fig. 8a, is based on single snapshots and performed at one-quarter wavelength element spacing, i.e. 125 Hz. The broad arrival at 150–165° is from the towship R/V Melville (180° is forward looking). Apparently, the two arrivals at 45° and 60° come from distant transiting ships, although a log of ships in the area was not kept. Overall, the beam time series shows little change with time.

The sparse sampling was performed assuming $K = 10$ discrete signals and, for sequential processing, $\lambda_0 = 0.05$.

Fig. 8b shows the non-sequential estimates. While the discrete features are well captured by the processing, there are significant fluctuations in the estimates from each snapshot. These are stabilized when using the sequential processing, Fig. 8c.

VIII. CONCLUSION

A sequential weighted LASSO problem formulation is used to estimate the spatiotemporal evolution of a sparse source field from observed samples of a wave field. The prior is adapted based on the past history of observations. A sequential MAP filter is derived which preserves sparsity in the source field estimates by enforcing the priors at each estimation step to be Laplacian. The sequential information is carried in the hyperparameters.

With a uniform linear array it is demonstrated that the proposed sequential Bayesian sparse signal reconstruction obtains high resolution in both time and space relative to adaptive beamforming. Using towed array data, the estimated tracks become more stable with the sequential estimation.

ACKNOWLEDGMENT

The authors would like to thank Florian Xaver for valuable discussions and comments on the manuscript.

APPENDIX A

The proof of Theorem 1 is similar to Appendix A in [22], see also [31]. Suppose the active set in (7) is given by $\mathcal{M}_k$ at the global minimum of $L_k$. For an infinitesimal change $dx_{km}$ at the $m$th element of $x_k$, we have
$$ dL_k\Big|_{\hat{x}_k} = \begin{cases} \mathrm{Re}\!\left[\left(-\frac{2}{\sigma^2} a_m^H(y_k - A\hat{x}_k) + \lambda_{km}\frac{\hat{x}_{km}}{|\hat{x}_{km}|}\right) dx_{km}^*\right] & \text{for } m \in \mathcal{M}_k ,\\[2mm] \mathrm{Re}\!\left[-\frac{2}{\sigma^2} a_m^H(y_k - A\hat{x}_k)\, dx_{km}\right] + \lambda_{km}|dx_{km}| & \text{for } m \notin \mathcal{M}_k . \end{cases} \qquad (52) $$
The variations in $L_k$ at the global minimum $\hat{x}_k$ satisfy

$dL_k \ge 0$ for every $m \in \{1, \ldots, M\}$ and $dx_{km} \in \mathbb{C}$. The minimality condition is satisfied for $m \in \mathcal{M}_k$ if and only if
$$ -\frac{2}{\sigma^2} a_m^H(y_k - A\hat{x}_k) + \lambda_{km}\frac{\hat{x}_{km}}{|\hat{x}_{km}|} = 0 . \qquad (53a) $$
Alternatively, if $m \notin \mathcal{M}_k$ then
$$ \frac{2}{\sigma^2}|a_m^H(y_k - A\hat{x}_k)| < \lambda_{km} . \qquad (53b) $$

APPENDIX B

The proof of Theorem 2 proceeds similarly to Appendix B in [22]. The aim is to find the next singular point $\mu^{p+1}$ defined in (37). We rewrite the minimality condition (53a) as
$$ A_\mathcal{M}^H(y_k - A_\mathcal{M} x_\mathcal{M}(\mu)) = \mu\frac{\sigma^2}{2} w_\mathcal{M} . \qquad (54) $$
This is solved by
$$ x_\mathcal{M}(\mu) = A_\mathcal{M}^+ y_k - (A_\mathcal{M}^H A_\mathcal{M})^{-1}\mu\frac{\sigma^2}{2} w_\mathcal{M} . \qquad (55) $$


It is assumed that the phase of the solution is approximately constant in one $\mu$-interval [16]. Then also $w_\mathcal{M}$ is approximately constant for all $\mu \in (\mu^{p+1}, \mu^p)$ and $x_\mathcal{M}(\mu)$ depends approximately linearly on $\mu$. Then, we have
$$ y_k - A_\mathcal{M} x_\mathcal{M}(\mu) = y_k - A_\mathcal{M}^{+H}\left(A_\mathcal{M}^H y_k - \mu\frac{\sigma^2}{2} w_\mathcal{M}\right) = P^\perp_{A_\mathcal{M}} y_k + \mu\frac{\sigma^2}{2} A_\mathcal{M}^{+H} w_\mathcal{M} \qquad (56) $$

where $P^\perp_{A_\mathcal{M}}$ is the orthogonal projection matrix defined in (43). For the creation case, according to (37) and (56), $\mu^+$ is the smallest $\mu$ value for which
$$ 2\left|a_m^H\left(P^\perp_{A_\mathcal{M}} y_k + \mu\frac{\sigma^2}{2} A_\mathcal{M}^{+H} w_\mathcal{M}\right)\right| \le \mu w_{km}\sigma^2 \qquad (57) $$
for every index $m$. This implies that $\mu^+$ is the largest value in the set of positive solutions of
$$ \mu^+ = \left| \frac{2 a_m^H P^\perp_{A_\mathcal{M}} y_k}{\sigma^2 w_{km}} + \mu^+\frac{a_m^H A_\mathcal{M}^{+H} w_\mathcal{M}}{w_{km}} \right| , \qquad (58) $$

for every $m$, which is solved by the Q-function in Appendix C. For the annihilation case, (37) and (55) are combined. After defining $\tilde{x}_\mathcal{M}$ and $\tilde{w}_\mathcal{M}$ in (47)–(48), we express (55) as
$$ x_\mathcal{M}(\mu) = \tilde{x}_\mathcal{M} - \mu\frac{\sigma^2}{2}\tilde{w}_\mathcal{M} . \qquad (59) $$
Due to (37), annihilation occurs when $x_{\mathcal{M}l}(\mu) = 0$, which gives $\mu = 2\tilde{x}_{\mathcal{M}l}/(\sigma^2\tilde{w}_{\mathcal{M}l})$ for some $l \in \{1, \ldots, R\}$. Thus, $\mu^-$ is the maximum of such annihilation points. This completes the proof.

APPENDIX C

In (45), the function $Q(\cdot,\cdot)$ is introduced. The real-valued function $Q(a, b) = \varrho$ of complex numbers $a, b$ is defined as the positive root of $\varrho = |a + \varrho b|$. The resulting quadratic equation is
$$ (|b|^2 - 1)\varrho^2 + 2\,\mathrm{Re}(a^* b)\varrho + |a|^2 = 0 . \qquad (60) $$
Its roots are
$$ \varrho_\pm = -\frac{\mathrm{Re}(a^* b)}{|b|^2 - 1} \pm \sqrt{\left(\frac{\mathrm{Re}(a^* b)}{|b|^2 - 1}\right)^2 - \frac{|a|^2}{|b|^2 - 1}} . \qquad (61) $$
If $|b|^2 > 1$ then $Q(a, b) := \varrho_+$, else $Q(a, b) := \varrho_-$.
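The root selection amounts to a few lines of code; an illustrative implementation of $Q(a, b)$:

```python
import numpy as np

def Q(a, b):
    """Positive root rho of rho = |a + rho*b| for complex a, b, cf. (60)-(61)."""
    denom = abs(b) ** 2 - 1.0
    c = np.real(np.conj(a) * b) / denom
    disc = c ** 2 - abs(a) ** 2 / denom      # a negative discriminant means no positive root
    root = np.sqrt(disc)
    return -c + root if abs(b) ** 2 > 1.0 else -c - root
```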

REFERENCES

[1] R. Tibshirani: Regression Shrinkage and Selection via the Lasso, J. R. Statist. Soc. B, vol. 58, no. 1, pp. 267–288, 1996.
[2] I. F. Gorodnitsky and B. D. Rao: Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm, IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, 1997.
[3] D. M. Malioutov, C. Mujdat, A. S. Willsky: A sparse signal reconstruction perspective for source localization with sensor arrays, IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3010–3022, Aug. 2005.
[4] E. J. Candès, J. Romberg, T. Tao: Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[5] D. Donoho: Compressed sensing, IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[6] R. G. Baraniuk: Compressive sensing, IEEE Signal Processing Mag., vol. 24, no. 4, pp. 118–120, Jul. 2007.
[7] M. Elad: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, 2010.
[8] Y. C. Eldar, G. Kutyniok: Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.
[9] D. M. Malioutov, S. Sanghavi, A. S. Willsky: Sequential Compressed Sensing, IEEE J. Sel. Top. Sig. Proc., vol. 4, pp. 435–444, 2010.
[10] D. Angelosante, J. A. Bazerque, G. B. Giannakis: Online adaptive estimation of sparse signals: Where RLS meets the l1-norm, IEEE Trans. Signal Process., vol. 58, no. 7, pp. 3436–3447, 2010.
[11] D. Eiwen, G. Tauböck, F. Hlawatsch, H. G. Feichtinger: Compressive tracking of doubly selective channels in multicarrier systems based on sequential delay-Doppler sparsity, in Proc. ICASSP 2011, pp. 2928–2931, Prague, Czech Republic, May 2011.
[12] D. Zachariah, S. Chatterjee, M. Jansson: Dynamic Iterative Pursuit, IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4967–4972, Sep. 2012.
[13] D. P. Wipf, B. D. Rao: An Empirical Bayesian Strategy for Solving the Simultaneous Sparse Approximation Problem, IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, Jul. 2007.
[14] S. Ji, Y. Xue, L. Carin: Bayesian compressive sensing, IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[15] S. D. Babacan, R. Molina, A. K. Katsaggelos: Bayesian compressive sensing using Laplace priors, IEEE Trans. Image Proc., vol. 19, no. 1, pp. 53–63, Jan. 2010.
[16] A. Panahi, M. Viberg: Fast LASSO based DOA tracking, in Proc. Fourth International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (IEEE CAMSAP 2011), San Juan, Puerto Rico, Dec. 2011.
[17] P. Gerstoft, C. F. Mecklenbräuker, H. Yao: Bayesian sparse sensing of the Japanese 2011 earthquake, in Proc. 46th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove (CA), USA, Nov. 4–7, 2012.
[18] H. Yao, P. Gerstoft, P. M. Shearer, C. F. Mecklenbräuker: Compressive sensing of Tohoku-Oki M9.0 earthquake: Frequency-dependent rupture modes, Geophys. Res. Lett., 38, L20310, 2011.
[19] J. M. Bernardo and A. F. M. Smith: Bayesian Theory. New York: Wiley, 1994.
[20] B. Ristic, S. Arulampalam, N. Gordon: Beyond the Kalman Filter: Particle Filters for Tracking Applications. Boston, MA: Artech House, 2004.
[21] Z. He, S. Xie, S. Ding, A. Cichocki: Convolutive blind source separation in the frequency domain based on sparse representation, IEEE Trans. Audio, Speech, and Language Proc., vol. 15, no. 5, Jul. 2007.
[22] A. Panahi, M. Viberg: Fast candidate points selection in the LASSO path, IEEE Signal Process. Lett., vol. 19, no. 2, pp. 79–82, Feb. 2012.
[23] M. Abramowitz and I. A. Stegun: Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables. 10th Printing, Dec. 1972, with corrections.
[24] H. C. Song, W. A. Kuperman, W. S. Hodgkiss, P. Gerstoft, J. S. Kim: Null broadening with snapshot-deficient covariance matrices in passive sonar, IEEE J. Oceanic Engineering, vol. 28, no. 2, pp. 250–261, Apr. 2003.
[25] H. Cox: Multi-rate adaptive beamforming (MRABF), in Proc. Sensor Array and Multichannel Sig. Proc. Workshop, Boston (MA), USA, pp. 306–309, Mar. 16–17, 2000.
[26] J. Traer, P. Gerstoft: Coherent averaging of the passive fathometer response using short correlation time, J. Acoust. Soc. Am., vol. 130, no. 6, pp. 3633–3641, Dec. 2011.
[27] M. Grant, S. Boyd: CVX: Matlab software for disciplined convex programming, version 1.21, http://cvxr.com/cvx, Apr. 2011.
[28] M. Grant and S. Boyd: "Graph implementations for nonsmooth convex programs," in Recent Advances in Learning and Control, pp. 95–110, Springer-Verlag Limited, 2008.
[29] S. P. Boyd, L. Vandenberghe: Convex Optimization, Chapters 1–7, Cambridge University Press, 2004.
[30] M. A. T. Figueiredo, R. D. Nowak, S. J. Wright: Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems, IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 586–597, Dec. 2007.
[31] M. R. Osborne, B. Presnell, A. Turlach: On the LASSO and its Dual, J. Comp. Graphical Stat., vol. 9, no. 2, pp. 319–337, Feb. 2000.
[32] H. C. Song, S. Cho, T. Kang, W. S. Hodgkiss, and J. R. Preston: "Long-range acoustic communication in deep water using a towed array," J. Acoust. Soc. Am., vol. 129, no. 3, pp. EL71–EL75, Mar. 2011.
[33] P. Gerstoft, R. Menon, W. S. Hodgkiss, C. F. Mecklenbräuker: "Eigenvalues of the sample covariance matrix for a towed array," J. Acoust. Soc. Am., vol. 132, no. 4, pp. 2388–2396, Oct. 2012.


[34] K. M. Becker and J. R. Preston: "The ONR five octave research array (FORA) at Penn State," in Proc. IEEE OCEANS, vol. 5, pp. 2607–2610, Sep. 22–26, 2003.

Fig. 1. Number of active indices versus the regularization parameter $\mu$ for the example in Sec. VI.

Fig. 2. MVDR and conventional beamforming for a) the stationary sources (*), and b) stationary (*) and moving (o) sources for the 50-snapshot observation period.

Fig. 3. Track over 50 time steps for the 8 stationary sources (dots) for a) non-sequential LASSO and b) sequential LASSO with $\lambda_0 = 0.05$. Background (color, dB) shows the conventional beamformer power.

Fig. 4. a) Output of the sparse optimization (8, solid) with $\lambda_0 = 0.5$. The unbiased (o) and true (*) source power. b) The posterior weights for $t = 10$.


Fig. 5. Track for the 8 stationary sources and 1 moving source (dots) using sequential LASSO with a) $\lambda_0 = 0.5$ and b) $\lambda_0 = 0.05$. Background color shows the conventional beamformer power (dB).

Fig. 6. Evolution of posterior weights $\lambda_{k|k}$ versus $k$; left: asymptotic fitting using (25), right: mean fitting using (31). Large $\lambda_0 = 0.5$ in a) and c). Small $\lambda_0 = 0.05$ in b) and d).

Fig. 7. Evolution of posterior weights $\lambda_{k|k}$ for $\lambda_0 = 0.05$. Prior at $k = 1$ (red), posterior at $k = 1$ (blue), and posterior at $k = 10$ (green).

Fig. 8. Real data example. a) Beamformer output. Tracks for 10 sources (dots) using b) non-sequential LASSO and c) sequential LASSO with $\lambda_0 = 0.05$. Background color shows the conventional beamformer power (dB).


Christoph F. Mecklenbräuker (S'88–M'97–SM'08) was with Siemens, Vienna, from 1997 to 2000. From 2000 to 2006, he was a senior researcher with the Forschungszentrum Telekommunikation Wien (FTW), Austria. In 2006, he joined the Institute of Telecommunications as a full professor with the Technische Universität Wien, Austria. Since 2009 he leads the Christian Doppler Laboratory for Wireless Technologies for Sustainable Mobility. His research interests include wave propagation, sparsity, vehicular connectivity, ultrawideband radio, and MIMO techniques. Dr. Mecklenbräuker is a member of the IEEE Signal Processing, Antennas and Propagation, and Vehicular Technology Societies, VDE, and EURASIP.

Peter Gerstoft (M'13) received the Ph.D. from the Technical University of Denmark, Lyngby, Denmark, in 1986. From 1987–1992 he was with Ødegaard and Danneskiold-Samsøe, Copenhagen, Denmark. From 1992–1997 he was at the Nato Undersea Research Centre, La Spezia, Italy. Since 1997, he has been with the Marine Physical Laboratory, University of California, San Diego. His research interests include modeling and inversion of acoustic, elastic and electromagnetic signals.
Dr. Gerstoft is a Fellow of the Acoustical Society of America and an elected member of the International Union of Radio Science, Commission F.

Ashkan Panahi (S'11) received his BSc degree in electrical engineering from Iran University of Science and Technology (IUST) in 2008. He received his MSc degree in 2010 in the field of communication engineering from Chalmers University, Sweden, where he is currently pursuing his PhD studies. Being with the signal processing group, Signals and Systems department, his main research interests include estimation and inverse problems, sparse techniques, and information theory. His current research concerns the application of sparse techniques to sensor array problems. Ashkan Panahi has been a student member of IEEE since 2011.

Mats Viberg received the PhD degree in Automatic Control from Linköping University, Sweden, in 1989. He has held academic positions at Linköping University and visiting scholarships at Stanford University and Brigham Young University, USA. Since 1993, Dr. Viberg has been a professor of Signal Processing at Chalmers University of Technology, Sweden. During 1999–2004 he served as Department Chair. Since May 2011, he holds a position as First Vice President at Chalmers University of Technology. Dr. Viberg's research interests are in Statistical Signal Processing and its various applications, including Antenna Array Signal Processing, System Identification, Wireless Communications, Radar Systems and Automotive Signal Processing. Dr. Viberg has served in various capacities in the IEEE Signal Processing Society, including chair of the Technical Committee (TC) on Signal Processing Theory and Methods (2001–2003), vice-chair (2009–2010) and chair (2011–2012) of the TC on Sensor Array and Multichannel, Associate Editor of the Transactions on Signal Processing (2004–2005), member of the Awards Board (2005–2007), and member at large of the Board of Governors (2010–). Dr. Viberg has received 2 Paper Awards from the IEEE Signal Processing Society (1993 and 1999, respectively), and the Excellent Research Award from the Swedish Research Council (VR) in 2002. Dr. Viberg is a Fellow of the IEEE since 2003, and his research group received the 2007 EURASIP European Group Technical Achievement Award. In 2008, Dr. Viberg was elected into the Royal Swedish Academy of Sciences (KVA).