
arXiv:1207.6353v2 [stat.ME] 11 Jan 2013

PETRELS: Parallel Subspace Estimation and Tracking by Recursive Least Squares from Partial Observations

Yuejie Chi∗, Yonina C. Eldar and Robert Calderbank

Abstract—Many real-world datasets exhibit an embedding of low-dimensional structures in a high-dimensional manifold. Examples include images, videos and internet traffic data. It is of great significance to reduce the storage requirements and computational complexity when the data dimension is high. We therefore consider the problem of reconstructing a data stream from a small subset of its entries, where the data is assumed to lie in a low-dimensional linear subspace, possibly corrupted by noise. We further consider tracking changes of the underlying subspace, which is applicable to video denoising, network monitoring and anomaly detection. Our setting can be viewed as a sequential low-rank matrix completion problem in which the subspace is learned in an online fashion. The proposed algorithm, dubbed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS), first identifies the underlying low-dimensional subspace, and then reconstructs the missing entries via least-squares estimation if required. Subspace identification is performed via a recursive procedure for each row of the subspace matrix in parallel, with discounting for previous observations. Numerical examples are provided for direction-of-arrival estimation and matrix completion, comparing PETRELS with state-of-the-art batch algorithms.

Index Terms—subspace estimation and tracking, recursive least squares, matrix completion, partial observations, online algorithms

I. INTRODUCTION

Many real-world datasets exhibit an embedding of low-dimensional structures in a high-dimensional manifold. When the embedding is assumed linear, the underlying low-dimensional structure becomes a linear subspace. Subspace Identification and Tracking (SIT) plays an important role in various signal processing tasks, such as online identification of network anomalies [1], moving target localization [2], beamforming [3], and denoising [4]. Conventional SIT algorithms collect full measurements of the data stream at each time, and subsequently update the subspace estimate by utilizing

Y. Chi is with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA (email: [email protected]).

Y. C. Eldar is with the Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel (email: [email protected]).

R. Calderbank is with the Department of Computer Science, Duke University, Durham, NC 27708, USA (email: [email protected]).

The work of Y. Chi and R. Calderbank was supported by ONR under Grant N00014-08-1-1110, by AFOSR under Grant FA 9550-09-1-0643, and by NSF under Grants NSF CCF-0915299 and NSF CCF-1017431. The work of Y. Eldar was supported in part by the Ollendorf foundation, and by the Israel Science Foundation under Grant 170/10.

A preliminary version of this work was presented at the 2012 International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

the track record of the stream history in different ways [5], [6].

Recent advances in Compressive Sensing (CS) [7], [8] and Matrix Completion (MC) [9], [10] have made it possible to infer data structure from highly incomplete observations. Compared with CS, which allows reconstruction of a single vector from only a few attributes by assuming it is sparse in a pre-determined basis or dictionary, MC allows reconstruction of a matrix from a few entries by assuming it is low rank. A popular method to perform MC is to minimize the nuclear norm of the corresponding matrix [9], [10] such that the observed entries are satisfied. This method requires no prior knowledge of the rank, in a similar spirit to ℓ1 minimization [11] for sparse recovery in CS. Other approaches, including greedy algorithms such as OptSpace [12] and ADMiRA [13], require an estimate of the matrix rank for initialization. Identifying the underlying low-rank structure in MC is equivalent to subspace identification in a batch setting. When the number of observed entries is slightly larger than the subspace rank, it has been shown that, with high probability, it is possible to test whether a highly incomplete vector of interest lies in a known subspace [14]. Recent works on covariance matrix estimation and principal component analysis of a dataset with missing entries also validate that it is possible to infer the principal components with high probability [15], [16].

In high-dimensional problems, it might be expensive, or even impossible, to collect data from all dimensions. For example, in wireless sensor networks, collecting data from all sensors continuously will quickly drain the battery power. Ideally, we would prefer to obtain data from a fixed budget of sensors at each time to increase the overall battery life, and still be able to identify the underlying structure. Another example is online recommendation systems, where it is unrealistic to expect rating feedback from all users on every product. Therefore, it is of growing interest to identify and track a low-dimensional subspace from highly incomplete information of a data stream in an online fashion. In this setting, the estimate of the subspace is updated and tracked across time, with low computational cost, as new observations become available. The GROUSE algorithm [17] has recently been proposed for SIT from online partial observations, using rank-one updates of the estimated subspace on the Grassmannian manifold. However, its performance is limited by the existence of "barriers" in the search path [18], which can result in GROUSE being trapped at a local minimum. We demonstrate this behavior through numerical examples in Section VI in the


context of direction-of-arrival estimation.

In this paper we further study the problem of SIT given

partial observations from a data stream, as in GROUSE. Our proposed algorithm is dubbed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS). The underlying low-dimensional subspace is identified by minimizing the geometrically discounted sum of projection residuals on the observed entries per time index, via a recursive procedure with discounting for each row of the subspace matrix in parallel. The missing entries are then reconstructed via least-squares estimation if required. The discount factor balances the algorithm's ability to capture long-term behavior against its ability to adapt to changes in that behavior. We also benefit from the fact that our optimization of the estimated subspace is over all possible low-rank subspaces, not restricted to the Grassmannian manifold. In the partial observation scenario, PETRELS always converges locally to a stationary point, since it is a second-order stochastic gradient descent algorithm. In the full observation scenario, we prove that PETRELS actually converges to the global optimum by revealing its connection with the well-known Projection Approximation Subspace Tracking (PAST) algorithm [5]. Finally, we provide numerical examples to measure the impact of the discount factor, the estimated rank and the number of observed entries. In the context of direction-of-arrival estimation, we demonstrate the superior performance of PETRELS over GROUSE in terms of separating closely located modes and tracking changes in the scene. We also compare PETRELS with state-of-the-art batch MC algorithms, showing it to be a competitive alternative when the subspace is fixed.

The rest of the paper is organized as follows. Section II states the problem and provides background in the context of matrix completion and conventional subspace tracking. Section III describes the algorithm in detail. Two extensions of PETRELS that improve robustness and reduce complexity are presented in Section IV. We discuss convergence of PETRELS in the full observation scenario in Section V. Section VI shows the numerical results, and we conclude the paper in Section VII.

II. PROBLEM STATEMENT AND RELATED WORK

A. Problem Statement

We consider the following problem. At each time $t$, a vector $x_t \in \mathbb{R}^M$ is generated as

$$x_t = U_t a_t + n_t \in \mathbb{R}^M, \qquad (1)$$

where the columns of $U_t \in \mathbb{R}^{M \times r_t}$ span a low-dimensional subspace, the vector $a_t \in \mathbb{R}^{r_t}$ specifies the linear combination of columns and is Gaussian distributed as $a_t \sim \mathcal{N}(0, I_{r_t})$, and $n_t$ is additive white Gaussian noise distributed as $n_t \sim \mathcal{N}(0, \sigma^2 I_M)$. The rank $r_t$ of the underlying subspace is not assumed known exactly and can be slowly changing over time. The entries of the vector $x_t$ can be considered as measurements from different sensors in a sensor network, values of different pixels in a video frame, or movie ratings from each user.

We assume that only partial entries of the full vector $x_t$ are observed, given by

$$y_t = p_t \odot x_t = P_t x_t \in \mathbb{R}^M, \qquad (2)$$

where $\odot$ denotes point-wise multiplication, $P_t = \mathrm{diag}(p_t)$, and $p_t = [p_{1t}, p_{2t}, \cdots, p_{Mt}]^T \in \{0, 1\}^M$ with $p_{mt} = 1$ if the $m$th entry is observed at time $t$. We denote by $\Omega_t = \{m : p_{mt} = 1\}$ the set of observed entries at time $t$. In the random observation model, we assume the measurements are taken uniformly at random.
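To make the generative model (1)-(2) concrete, here is a minimal sketch (assuming NumPy is available; the dimensions, noise level, observation fraction, and the helper name `observe` are our own illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
M, r = 20, 3          # ambient dimension and (illustrative) subspace rank
sigma = 0.01          # noise standard deviation
U = np.linalg.qr(rng.standard_normal((M, r)))[0]  # fixed subspace U_t = U

def observe(frac=0.5):
    """Draw one sample x_t = U a_t + n_t per (1) and reveal a uniform
    random subset of its entries, returning (y_t, p_t) as in (2)."""
    a = rng.standard_normal(r)                  # a_t ~ N(0, I_r)
    x = U @ a + sigma * rng.standard_normal(M)  # additive white Gaussian noise
    p = (rng.random(M) < frac).astype(float)    # observation pattern p_t
    return p * x, p                             # y_t = p_t (.) x_t

y, p = observe()
```

Each call yields one partially observed snapshot of the stream; unobserved coordinates of $y_t$ are filled with zero, matching the masked-vector convention used throughout the paper.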

We are interested in an online estimate of a low-rank subspace $D_n \in \mathbb{R}^{M \times r}$ at each time index $n$, which identifies and tracks the changes in the underlying subspace, from the streaming partial observations $(y_t, p_t)_{t=1}^{n}$. The rank of the estimated subspace $D_n$ is assumed known and fixed throughout the algorithm as $r$. In practice, we assume an upper bound on the rank of the underlying subspace $U_t$ is known, and $\sup_t r_t \leq r$. The desired properties for the algorithm include:

• Low complexity: each step of the online algorithm at time index $n$ should be adaptive, with small complexity compared to running a batch algorithm on the history data;

• Small storage: the online algorithm should require a storage size that does not grow with the data size;

• Convergence: the subspace sequence generated by the online algorithm should converge to the true subspace $U_t = U$ if it is constant;

• Adaptivity: the online algorithm should be able to track changes of the underlying subspace in a timely fashion.

B. Conventional Subspace Identification and Tracking

When the $x_t$'s are fully observed, our problem is equivalent to the classical SIT problem, which is widely studied and has a rich literature in the signal processing community. Here we describe in detail the Projection Approximation Subspace Tracking (PAST) algorithm, which is the closest to our proposed algorithm in the conventional scenario.

First, consider optimizing the scalar function with respect to a subspace $W \in \mathbb{R}^{M \times r}$, given by

$$J(W) = \mathbb{E}\,\|x_t - W W^T x_t\|_2^2. \qquad (3)$$

When $U_t = U$ is fixed over time, let $C_x = \mathbb{E}[x_t x_t^T] = U U^T + \sigma^2 I_M$ be the data covariance matrix. It is shown in [5] that the global minimum of (3) is the only stable stationary point, and is given by $W = U_r Q$, where $U_r$ is composed of the $r$ dominant eigenvectors of $C_x$, and $Q \in \mathbb{C}^{r \times r}$ is a unitary matrix. Without loss of generality, we can choose $U_r = U$. This motivates PAST to optimize the following function at time $n$ without constraining $W$ to have orthogonal columns:

$$W_n = \arg\min_{W \in \mathbb{R}^{M \times r}} \sum_{t=1}^{n} \alpha^{n-t} \|x_t - W W^T x_t\|_2^2 \qquad (4)$$
$$\approx \arg\min_{W \in \mathbb{R}^{M \times r}} \sum_{t=1}^{n} \alpha^{n-t} \|x_t - W W_{n-1}^T x_t\|_2^2, \qquad (5)$$

where the expectation in (3) is replaced by geometrically reweighting the previous observations by $\alpha$ in (4), and is


further approximated by replacing the second $W$ by its previous estimate in (5). Based on (5), the subspace $W_n$ can be found by first estimating the coefficient vector $a_n$ using the previous subspace estimate as $a_n = W_{n-1}^T x_n$, and then updating the subspace as

$$W_n = \arg\min_{W \in \mathbb{R}^{M \times r}} \sum_{t=1}^{n} \alpha^{n-t} \|x_t - W a_t\|_2^2. \qquad (6)$$

Suppose that $\alpha = 1$ and denote $R_n = \sum_{i=1}^{n} a_i a_i^T$. In [19], the asymptotic dynamics of the PAST algorithm are described by their equilibrium as time goes to infinity using the Ordinary Differential Equations (ODEs) below:

$$\dot{R} = \mathbb{E}[a_n a_n^T] - R = W^T C_x W - R,$$
$$\dot{W} = \mathbb{E}[x_n (x_n - W a_n)^T] R^{\dagger} = (I - W W^T) C_x W R^{\dagger},$$

where $a_n = W^T x_n$, $R = R(t)$ and $W = W(t)$ are continuous-time versions of $R_n$ and $W_n$, and $\dagger$ denotes the pseudo-inverse. It is proved in [19] that as $t$ increases, $W(t)$ converges to the global optimum, i.e. to a matrix which spans the eigenvectors of $C_x$ corresponding to the $r$ largest eigenvalues. In Section V we show that our proposed PETRELS algorithm becomes essentially equivalent to PAST when all entries of the data stream are observed, and can be shown to converge globally.

The PAST algorithm belongs to the class of power-based techniques, which include Oja's method [20], the Novel Information Criterion (NIC) method [21], and others. These algorithms are treated under a unified framework in [22], with slight variations for each algorithm; the reader is referred to [22] for details. In general, the estimate of the low-rank subspace $W_n \in \mathbb{R}^{M \times r}$ is updated at time $n$ as

$$W_n = C_n W_{n-1} (W_{n-1}^T C_n^2 W_{n-1})^{-1/2}, \qquad (7)$$

where $C_n$ is the sample data covariance matrix updated from

$$C_n = \alpha_n C_{n-1} + x_n x_n^T, \qquad (8)$$

and $\alpha_n$ is a parameter between $0$ and $1$. The normalization in (7) ensures that the updated subspace $W_n$ is orthogonal, although this normalization is not performed strictly in all algorithms.

It is shown in [22] that these power-based methods guarantee global convergence to the principal subspace spanned by the eigenvectors corresponding to the $r$ largest eigenvalues of $C_x$. If the entries of the data vectors $x_t$ are fully observed, then $C_n$ converges to $C_x$ very fast, which is exactly why the power-based methods perform very well in practice. When the data is highly incomplete, the convergence of (8) is very slow, since only a small fraction $|\Omega_n|^2 / M^2$ of the entries of $C_{n-1}$ are updated, where $|\Omega_n|$ is the number of observed entries at time $n$, making direct adoption of the above method unrealistic in the partial observation scenario.
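For intuition, the generic power-based iteration (7)-(8) under full observations might be sketched as follows (a toy NumPy sketch, not the authors' code; we assume a constant forgetting factor $\alpha_n = \alpha$ and compute the inverse square root via an eigendecomposition):

```python
import numpy as np

rng = np.random.default_rng(1)
M, r, alpha, sigma = 10, 2, 0.95, 0.01
U = np.linalg.qr(rng.standard_normal((M, r)))[0]   # true subspace

C = 1e-3 * np.eye(M)              # sample covariance C_n (small PD init)
W = rng.standard_normal((M, r))   # subspace estimate
for _ in range(500):
    x = U @ rng.standard_normal(r) + sigma * rng.standard_normal(M)
    C = alpha * C + np.outer(x, x)                 # covariance update (8)
    S = W.T @ C @ C @ W                            # W^T C_n^2 W
    w, V = np.linalg.eigh(S)                       # S = V diag(w) V^T
    w = np.clip(w, 1e-12, None)                    # guard tiny eigenvalues
    W = C @ W @ (V @ np.diag(w ** -0.5) @ V.T)     # update (7); W^T W = I

err = np.linalg.norm(U @ U.T - W @ W.T)            # distance between projectors
```

The inverse square root in (7) re-orthonormalizes $W_n$ exactly, and with fully observed data the iterate aligns with the principal subspace of the (exponentially windowed) sample covariance, illustrating the convergence claim of [22].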

C. Matrix Completion

When only partial observations are available and $U_t = U$ is fixed, our problem is closely related to the Matrix Completion (MC) problem, which has been extensively studied recently. Assume $X \in \mathbb{R}^{M \times n}$ is a low-rank matrix, and $P$ is a binary $M \times n$ mask matrix with $0$ at missing entries and $1$ at observed entries. Let $Y = P \odot X = [y_1, \ldots, y_n]$ be the observed partial matrix, where the missing entries are filled in as zero and $\odot$ denotes point-wise multiplication. MC aims to solve the following problem:

$$\min_{Z} \ \mathrm{rank}(Z) \quad \text{s.t.} \quad Y - P \odot Z = 0, \qquad (9)$$

i.e. to find a matrix of minimal rank such that the observed entries are satisfied. This problem is combinatorially intractable due to the rank objective.

It has been shown in [9] that by replacing the rank objective with the nuclear norm, (9) can be relaxed to a convex optimization problem, resulting in the following spectral-regularized MC problem:

$$\min_{Z} \ \frac{1}{2} \|Y - P \odot Z\|_F^2 + \mu \|Z\|_*, \qquad (10)$$

where $\|Z\|_*$ is the nuclear norm of $Z$, i.e. the sum of the singular values of $Z$, and $\mu > 0$ is a regularization parameter. Under mild conditions, the solution of (10) is the same as that of (9) [9]. The nuclear norm [23] of $Z$ is given by

$$\|Z\|_* = \min_{U, V : Z = U V^T} \ \frac{1}{2} \left( \|U\|_F^2 + \|V\|_F^2 \right), \qquad (11)$$

where $U \in \mathbb{C}^{M \times r}$ and $V \in \mathbb{C}^{n \times r}$. Substituting (11) into (10), we can rewrite the MC problem as

$$\min_{U, V} \ \|P \odot (X - U V^T)\|_F^2 + \mu \left( \|U\|_F^2 + \|V\|_F^2 \right). \qquad (12)$$
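The factorized problem (12) is commonly attacked by alternating ridge-regularized least squares over $U$ and $V$. The sketch below is one such generic scheme (our own illustration, not a method from the paper; `mc_als` and all parameter values are hypothetical):

```python
import numpy as np

def mc_als(Y, Pmask, r, mu=0.01, iters=50, seed=0):
    """Alternating minimization of the factorized objective (12):
    fix V and solve a ridge least-squares problem for each row of U,
    then fix U and do the same for each row of V."""
    rng = np.random.default_rng(seed)
    M, n = Y.shape
    U = rng.standard_normal((M, r))
    V = rng.standard_normal((n, r))
    for _ in range(iters):
        for i in range(M):                      # update rows of U
            obs = Pmask[i] > 0
            A = V[obs].T @ V[obs] + mu * np.eye(r)
            U[i] = np.linalg.solve(A, V[obs].T @ Y[i, obs])
        for j in range(n):                      # update rows of V
            obs = Pmask[:, j] > 0
            A = U[obs].T @ U[obs] + mu * np.eye(r)
            V[j] = np.linalg.solve(A, U[obs].T @ Y[obs, j])
    return U, V

# toy run: rank-2 matrix, ~60% of entries observed
rng = np.random.default_rng(5)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
Pm = (rng.random((20, 30)) < 0.6).astype(float)
U, V = mc_als(Pm * X, Pm, r=2)
obs_err = np.linalg.norm(Pm * (U @ V.T - X)) / np.linalg.norm(Pm * X)
```

Each inner step is a small $r \times r$ linear solve, which is what makes the factorized view attractive compared with full-matrix nuclear-norm solvers.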

Our problem formulation can be viewed as an online way of solving the above batch MC problem. Consider a random process $\{n_t\}$ where each $n_t$ is drawn uniformly from $\{1, \ldots, n\}$, and construct a data stream whose entry at each time is $x_{n_t}$, i.e. the $n_t$th column of $X$. Compared with (1), the subspace is fixed as $U_t = U$, since we draw columns from a fixed low-rank matrix. At each time we only observe partial entries of $x_{n_t}$, given as $y_{n_t} = p_{n_t} \odot x_{n_t}$, where $p_{n_t}$ is the $n_t$th column of $P$. The MC problem then becomes equivalent to retrieving the underlying subspace $U$ from the data stream $(y_{n_t}, p_{n_t})_{t=1}^{\infty}$. After estimating $U$, the low-rank matrix $X$ can be recovered via least-squares estimation. This online treatment of the batch MC problem has the potential advantage of avoiding large matrix manipulations. We will compare the PETRELS algorithm against some of the popular MC methods in Section VI.

III. THE PETRELS ALGORITHM

We now describe our proposed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS) algorithm.

A. Objective Function

We first define the function $f_t(D)$ at each time $t = 1, \cdots, n$ for a fixed subspace $D \in \mathbb{R}^{M \times r}$ as the total projection residual on the observed entries,

$$f_t(D) = \min_{a_t} \|P_t (x_t - D a_t)\|_2^2, \quad t = 1, \cdots, n. \qquad (13)$$


Here $r$ is the rank of the estimated subspace, which is assumed known and fixed throughout the algorithm¹. We aim to minimize the following loss function at each time $n$ with respect to the underlying subspace:

$$D_n = \arg\min_{D \in \mathbb{R}^{M \times r}} F_n(D) = \arg\min_{D \in \mathbb{R}^{M \times r}} \sum_{t=1}^{n} \lambda^{n-t} f_t(D), \qquad (14)$$

where $D_n$ is the estimated subspace of rank $r$ at time $n$, and the parameter $0 \ll \lambda \leq 1$ discounts past observations.

Before developing PETRELS, we note that if there are further constraints on the coefficients $a_t$, a regularization term can be incorporated as

$$f_t(D) = \min_{a_t \in \mathbb{R}^r} \|P_t (D a_t - x_t)\|_2^2 + \beta \|a_t\|_p, \qquad (15)$$

where $p \geq 0$. For example, $p = 1$ enforces a sparsity constraint on $a_t$, and $p = 2$ enforces a norm constraint on $a_t$.

In (14) the discount factor $\lambda$ is fixed, and the influence of past estimates decreases geometrically; a more general online objective function is

$$F_n(D) = \lambda_n F_{n-1}(D) + f_n(D), \qquad (16)$$

where the sequence $\{\lambda_n\}$ is used to control the memory and adaptivity of the system in a more flexible way.

To motivate the loss function in (14), note that if $U_t = U$ is not changing over time, then the RHS of (14) is minimized to zero when $D_n$ spans the subspace defined by $U$. If $U_t$ is slowly changing, then $\lambda$ controls the memory of the system and maintains tracking ability at time $n$. For example, as $\lambda \to 1$ the algorithm gradually loses its ability to forget the past.

For fixed $D$, $f_t(D)$ can be written as

$$f_t(D) = x_t^T \left( P_t - P_t D (D^T P_t D)^{\dagger} D^T P_t \right) x_t. \qquad (17)$$

Plugging this back into (14), the exact optimization problem becomes

$$D_n = \arg\min_{D \in \mathbb{R}^{M \times r}} \sum_{t=1}^{n} \lambda^{n-t} x_t^T \left[ P_t - P_t D (D^T P_t D)^{\dagger} D^T P_t \right] x_t.$$

This problem is difficult to solve over $D$ and requires storing all previous observations. Instead, we propose PETRELS to approximately solve this optimization problem.
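The inner least-squares problem defining $f_t(D)$ in (13) can be evaluated directly on the observed coordinates, consistently with the closed form (17). A minimal sketch (assuming NumPy; the helper name `f_t` and the demo dimensions are our own):

```python
import numpy as np

def f_t(D, x, p):
    """Projection residual (13): min over a of ||P_t (x - D a)||_2^2,
    computed by least squares restricted to the observed rows of D."""
    obs = p > 0
    a, *_ = np.linalg.lstsq(D[obs], x[obs], rcond=None)
    return float(np.sum((x[obs] - D[obs] @ a) ** 2))

rng = np.random.default_rng(2)
M, r = 15, 2
D = rng.standard_normal((M, r))
x_in = D @ rng.standard_normal(r)            # lies in span(D): residual ~ 0
x_out = rng.standard_normal(M)               # generic vector: residual > 0
p = (np.arange(M) % 3 != 0).astype(float)    # deterministic demo mask
```

Because the least-squares solution is the least-norm minimizer, this matches the pseudo-inverse expression in (17) even when the restricted matrix $P_t D$ is rank deficient.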

B. PETRELS

The proposed PETRELS algorithm, summarized in Algorithm 1, alternates between coefficient estimation and subspace update at each time $n$. In particular, the coefficient vector $a_n$ is estimated by minimizing the projection residual on the previous subspace estimate $D_{n-1}$:

$$a_n = \arg\min_{a \in \mathbb{R}^r} \|P_n (x_n - D_{n-1} a)\|_2^2 = (D_{n-1}^T P_n D_{n-1})^{\dagger} D_{n-1}^T y_n, \qquad (18)$$

¹The rank may not equal the true subspace dimension.

Algorithm 1 PETRELS for SIT from Partial Observations
Input: a stream of vectors $y_t$ and observation patterns $P_t$.
Initialization: an $M \times r$ random matrix $D_0$, and $(R_m^0)^{\dagger} = \delta I_r$, $\delta > 0$, for all $m = 1, \cdots, M$.
1: for $n = 1, 2, \cdots$ do
2:   $a_n = (D_{n-1}^T P_n D_{n-1})^{\dagger} D_{n-1}^T y_n$.
3:   $\hat{x}_n = D_{n-1} a_n$.
4:   for $m = 1, \cdots, M$ do
5:     $\beta_m^n = 1 + \lambda^{-1} a_n^T (R_m^{n-1})^{\dagger} a_n$,
6:     $v_m^n = \lambda^{-1} (R_m^{n-1})^{\dagger} a_n$,
7:     $(R_m^n)^{\dagger} = \lambda^{-1} (R_m^{n-1})^{\dagger} - p_{mn} (\beta_m^n)^{-1} v_m^n (v_m^n)^T$,
8:     $d_m^n = d_m^{n-1} + p_{mn} (x_{mn} - a_n^T d_m^{n-1}) (R_m^n)^{\dagger} a_n$.
9:   end for
10: end for

where $D_0$ is a random subspace initialization. The full vector $x_n$ is then estimated as

$$\hat{x}_n = D_{n-1} a_n. \qquad (19)$$

The subspace $D_n$ is then updated by minimizing

$$D_n = \arg\min_{D} \sum_{t=1}^{n} \lambda^{n-t} \|P_t (x_t - D a_t)\|_2^2, \qquad (20)$$

where $a_t$, $t = 1, \cdots, n$, are the estimates from (18). Comparing (20) with (14), the previously estimated coefficients are substituted for the optimal coefficients, which results in a simpler problem for finding $D_n$. The discount factor mitigates the resulting error propagation and compensates for using the previously estimated coefficients rather than solving (14) directly, thereby improving the performance of the algorithm.

The objective function in (20) can be equivalently decomposed into a set of smaller problems, one for each row of $D_n = [d_1^n, d_2^n, \cdots, d_M^n]^T$, as

$$d_m^n = \arg\min_{d_m} \sum_{t=1}^{n} \lambda^{n-t} p_{mt} (x_{mt} - a_t^T d_m)^2, \qquad (21)$$

for $m = 1, \cdots, M$. To find the optimal $d_m^n$, we set the derivative of (21) to zero, resulting in

$$\left( \sum_{t=1}^{n} \lambda^{n-t} p_{mt} a_t a_t^T \right) d_m^n - \sum_{t=1}^{n} \lambda^{n-t} p_{mt} x_{mt} a_t = 0.$$

This equation can be rewritten as

$$R_m^n d_m^n = s_m^n, \qquad (22)$$

where $R_m^n = \sum_{t=1}^{n} \lambda^{n-t} p_{mt} a_t a_t^T$ and $s_m^n = \sum_{t=1}^{n} \lambda^{n-t} p_{mt} x_{mt} a_t$. Therefore, $d_m^n$ can be found as

$$d_m^n = (R_m^n)^{\dagger} s_m^n. \qquad (23)$$

When $R_m^n$ is not invertible, (23) gives the least-norm solution for $d_m^n$. We now show how (22) can be updated recursively. First we rewrite

$$R_m^n = \lambda R_m^{n-1} + p_{mn} a_n a_n^T, \qquad (24)$$
$$s_m^n = \lambda s_m^{n-1} + p_{mn} x_{mn} a_n, \qquad (25)$$


for all $m = 1, \cdots, M$. Plugging (24) and (25) into (22), we get

$$R_m^n d_m^n = \lambda s_m^{n-1} + p_{mn} x_{mn} a_n$$
$$= \lambda R_m^{n-1} d_m^{n-1} + p_{mn} x_{mn} a_n$$
$$= R_m^n d_m^{n-1} - p_{mn} a_n a_n^T d_m^{n-1} + p_{mn} x_{mn} a_n$$
$$= R_m^n d_m^{n-1} + p_{mn} (x_{mn} - a_n^T d_m^{n-1}) a_n, \qquad (26)$$

where $d_m^{n-1}$ is the row estimate at the previous time $n-1$. This results in a parallel procedure to update all rows of the subspace matrix $D_n$, given as

$$d_m^n = d_m^{n-1} + p_{mn} (x_{mn} - a_n^T d_m^{n-1}) (R_m^n)^{\dagger} a_n. \qquad (27)$$

Finally, by the Recursive Least-Squares (RLS) updating formula for the general pseudo-inverse matrix [24], [25], $(R_m^n)^{\dagger}$ can be updated without explicit matrix inversion using

$$(R_m^n)^{\dagger} = (\lambda R_m^{n-1} + p_{mn} a_n a_n^T)^{\dagger} = \lambda^{-1} (R_m^{n-1})^{\dagger} - p_{mn} G_m^n. \qquad (28)$$

Here $G_m^n = (\beta_m^n)^{-1} v_m^n (v_m^n)^T$, with $\beta_m^n$ and $v_m^n$ given as

$$\beta_m^n = 1 + \lambda^{-1} a_n^T (R_m^{n-1})^{\dagger} a_n,$$
$$v_m^n = \lambda^{-1} (R_m^{n-1})^{\dagger} a_n.$$

To enable the RLS procedure, the matrix $(R_m^0)^{\dagger}$ is initialized as a matrix with large entries on the diagonal, which we choose as a scaled identity $(R_m^0)^{\dagger} = \delta I_r$, $\delta > 0$, for all $m = 1, \cdots, M$. It is worth noting that implementations of fast RLS update rules are in general very efficient. However, caution is needed, since direct application of fast RLS algorithms can suffer from the numerical instability of finite-precision operations when running for a long time [26].

C. Second-Order Stochastic Gradient Descent

The PETRELS algorithm can be regarded as a second-order stochastic gradient descent method for solving (14), using $d_m^{n-1}$, $m = 1, \cdots, M$, as a warm start at time $n$. Specifically, we can write the gradient of $f_n(D)$ in (13) at $D_{n-1}$ as

$$\left. \frac{\partial f_n(D)}{\partial D} \right|_{D = D_{n-1}} = -2 P_n (x_n - D_{n-1} a_n) a_n^T, \qquad (29)$$

where $a_n$ is given in (18). The gradient of $F_n(D)$ at $D_{n-1}$ is then given as

$$\left. \frac{\partial F_n(D)}{\partial D} \right|_{D = D_{n-1}} = -2 \sum_{t=1}^{n} \lambda^{n-t} P_t (x_t - D_{n-1} a_t) a_t^T.$$

The Hessian for each row of $D$ at $d_m^{n-1}$ is therefore

$$H_n(d_m^{n-1}, \lambda) = \left. \frac{\partial^2 F_n(D)}{\partial d_m \partial d_m^T} \right|_{d_m = d_m^{n-1}} = 2 \sum_{t=1}^{n} \lambda^{n-t} p_{mt} a_t a_t^T. \qquad (30)$$

It follows that the update rule for each row $d_m$ given in (27) can be written as

$$d_m^n = d_m^{n-1} - H_n(d_m^{n-1}, \lambda)^{-1} \left. \frac{\partial f_n(D)}{\partial d_m} \right|_{d_m = d_m^{n-1}}, \qquad (31)$$

which is exactly a second-order stochastic gradient descent step. Therefore, PETRELS converges to a stationary point of $F_n(D)$ [27], [28]. Compared with first-order algorithms, PETRELS enjoys a faster convergence speed to the stationary point [27], [28].

D. Comparison with GROUSE

The GROUSE algorithm [17] proposed by Balzano et al. addresses the same problem of online identification of a low-rank subspace from highly incomplete information. The GROUSE method can be viewed as optimizing (14) for $\lambda = 1$ at each time $n$ using a first-order stochastic gradient descent on the orthogonal Grassmannian, defined as $\mathcal{G}_r = \{D \in \mathbb{R}^{M \times r} : D^T D = I_r\}$, instead of $\mathbb{R}^{M \times r}$. Thus, GROUSE aims to solve the following optimization problem:

$$D_n = \arg\min_{D \in \mathcal{G}_r} G_n(D) = \arg\min_{D \in \mathcal{G}_r} \sum_{t=1}^{n} f_t(D). \qquad (32)$$

GROUSE updates the subspace estimate along the direction of $\nabla f_n(D)|_{D = D_{n-1}}$ on $\mathcal{G}_r$, given by

$$D_n = D_{n-1} - \left[ (\cos(\sigma \eta_n) - 1) \frac{\hat{x}_n}{\|\hat{x}_n\|_2} + \sin(\sigma \eta_n) \frac{r_n}{\|r_n\|_2} \right] \frac{a_n^T}{\|a_n\|_2}, \qquad (33)$$

where $r_n$ denotes the residual vector, $\sigma = \|\hat{x}_n\|_2 \|r_n\|_2$, and $\eta_n$ is the step size at time $n$. At each step GROUSE also alternates between coefficient estimation (18) and subspace update (33); moreover, the resulting algorithm is a fast rank-one update of $D_{n-1}$ at each time $n$. Since it is a first-order gradient descent algorithm, convergence to a stationary point, but not to the global optimum, is guaranteed under mild conditions on the step size. Specifically, if the step size satisfies

$$\lim_{n \to \infty} \eta_n = 0 \quad \text{and} \quad \sum_{t=1}^{\infty} \eta_t = \infty, \qquad (34)$$

then GROUSE is guaranteed to converge to a stationary point of $G_n(D)$. However, due to the existence of "barriers" in the search path on the Grassmannian [18], GROUSE may be trapped at a local minimum, as shown in the direction-of-arrival estimation example of Section VI. Although both PETRELS and GROUSE have a tuning parameter, the discount factor in PETRELS is easier to tune than the step size in GROUSE. For example, without discounting (i.e. $\lambda = 1$) PETRELS still converges to the global optimum given full observations, as shown in Section V, while this is impossible for a first-order algorithm like GROUSE unless the step size is tuned to satisfy (34).

If we relax the objective function of GROUSE (32) from $\mathcal{G}_r$ to all of $\mathbb{R}^{M \times r}$, given as

$$D_n = \arg\min_{D \in \mathbb{R}^{M \times r}} \sum_{t=1}^{n} f_t(D), \qquad (35)$$

then the objective function becomes equivalent to that of PETRELS without discounting. It is possible to use a different formulation of second-order stochastic gradient descent with a step size to solve (35), yielding the update rule for each row of $D_n$ as

$$d_m^n = d_m^{n-1} - \gamma_n H_n(d_m^{n-1}, \lambda = 1)^{-1} \left. \frac{\partial f_n(D)}{\partial d_m} \right|_{d_m = d_m^{n-1}}, \qquad (36)$$

where $H_n(d_m^{n-1}, \lambda = 1)$ is given in (30), and $\gamma_n$ is the step size at time $n$. Compared with the update rule of PETRELS in (31), the discount parameter plays a role similar to that of the step size, but weights the contribution of previous data geometrically. We do not investigate the performance of this alternative update rule (36) in this paper.

E. Complexity Issues

We compare the storage and computational complexity of PETRELS, GROUSE and the PAST algorithm. The storage complexity of PAST and GROUSE is O(Mr), the size of the low-rank subspace. PETRELS, on the other hand, has a larger storage complexity of O(Mr^2), the total size of the matrices R_m^n for all rows. In terms of computational complexity, PAST has a complexity of O(Mr), while PETRELS and GROUSE have a similar complexity on the order of O(|Ω_t| r^2), where the main cost is the computation of the coefficient (18). This indicates another merit of dealing with partial observations: reduced computational complexity when the dimension is high.

IV. EXTENSIONS OF THE PETRELS ALGORITHM

A. Simplified PETRELS

In the subspace update step of PETRELS in (20), consider replacing the objective function in (14) by

D_n = \arg\min_{D} F_n(D) = \arg\min_{D} \sum_{t=1}^{n} \lambda^{n-t} \|\hat{x}_t - D a_t\|_2^2, \qquad (37)

where a_t and \hat{x}_t, t = 1, \cdots, n, are the estimates from the earlier steps (18) and (19). The only change is that the partial observation operator is removed from the objective function and replaced by the full vector estimate. It remains true that d_m^n = \arg\min_{d_m} F_n(d_m) = d_m^{n-1} if the corresponding mth entry of x_n is unobserved, i.e. m \notin \Omega_n, since

F_n(d_m) = \sum_{t=1}^{n-1} \lambda^{n-t} (\hat{x}_{mt} - d_m^T a_t)^2 + \left((d_m^{n-1} - d_m)^T a_n\right)^2
= \lambda F_{n-1}(d_m) + \left((d_m^{n-1} - d_m)^T a_n\right)^2
\geq \lambda F_{n-1}(d_m^{n-1}) = F_n(d_m^{n-1})

is minimized when d_m = d_m^{n-1} for m \notin \Omega_n.

This modification leads to a simplified update rule for R_m^n, since the updating formula is now the same for all rows d_m: R_m^n = R_n = \lambda R_{n-1} + a_n a_n^T for all m. The row update formula (27) is replaced by

D_n = D_{n-1} + P_n (x_n - D_{n-1} a_n) a_n^T R_n^{\dagger}, \qquad (38)

which further reduces the storage requirement of the PETRELS algorithm from O(Mr^2) to O(Mr), the size of the subspace. We compare the performance of simplified PETRELS against PETRELS in Section VI; it converges more slowly than PETRELS, but may have an advantage when the subspace rank is underestimated.
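A minimal sketch of one simplified-PETRELS iteration, combining coefficient estimation (18), the shared correlation update R_n = λR_{n-1} + a_n a_nᵀ, and the subspace update (38). The function name and the initialization of R are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def simplified_petrels_step(D, R, x, omega, lam=0.98):
    """One iteration of simplified PETRELS (sketch).

    D     : (M, r) subspace estimate
    R     : (r, r) correlation matrix shared by all rows
    x     : (M,) data vector; only entries indexed by omega are trusted
    omega : indices of observed entries
    lam   : discount factor
    """
    a, *_ = np.linalg.lstsq(D[omega], x[omega], rcond=None)  # coefficients (18)
    resid = np.zeros(D.shape[0])
    resid[omega] = x[omega] - D[omega] @ a    # P_n (x_n - D_{n-1} a_n)
    R = lam * R + np.outer(a, a)              # one R_n shared by all rows
    D = D + np.outer(resid, a) @ np.linalg.pinv(R)           # update (38)
    return D, R
```

Rows outside Ω_n receive a zero residual and therefore stay unchanged, matching the argument above.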

B. Incorporating Prior Information

It is possible to incorporate regularization terms into PETRELS to encode prior information about the data stream. Here we outline regularization of the subspace D in the subspace update step, such that at each time n, D_n is updated via

D_n = \arg\min_{D} \sum_{t=1}^{n} \lambda^{n-t} \|P_t(x_t - D a_t)\|_2^2 + \mu_n \|D\|_F^2, \qquad (39)

where \mu_n > 0 is the regularization parameter. As with PETRELS, (39) can be decomposed for each row of D = [d_1, d_2, \cdots, d_M]^T as

d_m^n = \arg\min_{d_m} \sum_{t=1}^{n} \lambda^{n-t} p_{mt} (x_{mt} - a_t^T d_m)^2 + \mu_n \|d_m\|_2^2
= \left( \sum_{t=1}^{n} \lambda^{n-t} p_{mt} a_t a_t^T + \mu_n I \right)^{-1} \left( \sum_{t=1}^{n} \lambda^{n-t} p_{mt} x_{mt} a_t \right)
= (T_m^n)^{-1} s_m^n.

The matrix T_m^n can be updated as

T_m^n = \lambda T_m^{n-1} + p_{mn} a_n a_n^T + (\mu_n - \lambda \mu_{n-1}) I_r,

and s_m^n can be updated as in (25). However, the fast RLS algorithm no longer applies here, so additional complexity is incurred for the matrix inversion. It is worth noting that (39) closely resembles the matrix completion formulation (12) when V is fixed and composed of the columns a_t.
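The recursion for T_m^n can be checked numerically against its batch definition; in the sketch below, the discount factor, the regularization schedule μ_n, and the observation indicators p_{mt} are arbitrary test values, not taken from the paper:

```python
import numpy as np

# Check that the recursion reproduces the batch definition
# T_m^n = sum_{t=1}^n lam^{n-t} p_mt a_t a_t^T + mu_n I  (all values synthetic).
rng = np.random.default_rng(2)
r, n_steps, lam = 4, 30, 0.95
mu = np.abs(rng.standard_normal(n_steps)) + 0.1   # mu_n > 0
a = rng.standard_normal((n_steps, r))             # coefficient vectors a_t
p = rng.integers(0, 2, n_steps).astype(float)     # indicators p_mt for one row

# recursion: T_n = lam T_{n-1} + p_mn a_n a_n^T + (mu_n - lam mu_{n-1}) I
T = p[0] * np.outer(a[0], a[0]) + mu[0] * np.eye(r)
for n in range(1, n_steps):
    T = lam * T + p[n] * np.outer(a[n], a[n]) + (mu[n] - lam * mu[n - 1]) * np.eye(r)

# batch definition at time n_steps
T_batch = mu[-1] * np.eye(r)
for t in range(n_steps):
    T_batch += lam ** (n_steps - 1 - t) * p[t] * np.outer(a[t], a[t])
```

The two computations agree to machine precision, confirming the (μ_n − λμ_{n−1})I correction term.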

V. GLOBAL CONVERGENCE WITH FULL OBSERVATION

In the partial observation regime, the PETRELS algorithm always converges to a stationary point of F_n(D), since it is a second-order stochastic gradient descent method, as discussed in Section III-C; whether it converges to the global optimum remains open. In the full observation regime, i.e. y_n = x_n for all n, we can show that the PETRELS algorithm converges globally, as follows.

In this case, PETRELS becomes essentially equivalent to the conventional PAST algorithm [5] for SIT, except that the coefficient is estimated differently. Specifically, in PAST it is estimated as a_n = D_{n-1}^T y_n = D_{n-1}^T x_n, while in PETRELS it is estimated as a_n = (D_{n-1}^T D_{n-1})^{-1} D_{n-1}^T x_n.

Now let \lambda = 1. Similar to PAST in [19], the asymptotic dynamics of the PETRELS algorithm can be described by the ODE

\dot{R} = E[a_n a_n^T] - R = (D^T D)^{-1} D^T C_x D (D^T D)^{-1} - R, \qquad (40)

\dot{D} = E[(x_n - D a_n) a_n^T] R^{\dagger} = (I - D (D^T D)^{-1} D^T) C_x D (D^T D)^{-1} R^{-1}. \qquad (41)

Here a_n = (D^T D)^{-1} D^T x_n, and R = R(t) and D = D(t) are continuous-time versions of R_n and D_n. Now let \bar{D} = D (D^T D)^{-1/2} and \bar{R} = (D^T D)^{1/2} R (D^T D)^{1/2}. From (41),

D^T \dot{D} = D^T (I - D (D^T D)^{-1} D^T) C_x D (D^T D)^{-1} R^{-1} = 0,

and

\frac{d}{dt}(D^T D) = \dot{D}^T D + D^T \dot{D} = 0;

furthermore,

\frac{d}{dt} f(D^T D) = 0

for any function f of D^T D. Hence,

\dot{\bar{D}} = \dot{D} (D^T D)^{-1/2} + D \frac{d}{dt}(D^T D)^{-1/2} = \dot{D} (D^T D)^{-1/2},

and

\dot{\bar{R}} = \left[\frac{d}{dt}(D^T D)^{1/2}\right] R (D^T D)^{1/2} + (D^T D)^{1/2} \dot{R} (D^T D)^{1/2} + (D^T D)^{1/2} R \frac{d}{dt}(D^T D)^{1/2} = (D^T D)^{1/2} \dot{R} (D^T D)^{1/2}.

Therefore (40) and (41) can be rewritten as

\dot{\bar{R}} = \bar{D}^T C_x \bar{D} - \bar{R},
\dot{\bar{D}} = (I - \bar{D} \bar{D}^T) C_x \bar{D} \bar{R}^{\dagger},

which is the ODE of the PAST algorithm. Hence we conclude that PETRELS converges to the global optimum with the same dynamics as the PAST algorithm.
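This change of variables can be verified numerically: for random D, R and a positive semidefinite C_x, the right-hand sides of (40) and (41) satisfy DᵀḊ = 0, and the transformed quantities obey the PAST ODE. A small NumPy sketch with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
M, r = 8, 3
D = rng.standard_normal((M, r))
A = rng.standard_normal((M, M)); Cx = A @ A.T            # PSD data covariance
B = rng.standard_normal((r, r)); R = B @ B.T + np.eye(r) # SPD correlation

G = D.T @ D
Ginv = np.linalg.inv(G)
# right-hand sides of (40) and (41)
Rdot = Ginv @ D.T @ Cx @ D @ Ginv - R
Ddot = (np.eye(M) - D @ Ginv @ D.T) @ Cx @ D @ Ginv @ np.linalg.inv(R)

# matrix square root of G = D^T D via eigendecomposition
w, V = np.linalg.eigh(G)
Ghalf = V @ np.diag(np.sqrt(w)) @ V.T
Gneghalf = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

Dbar = D @ Gneghalf                 # \bar{D}
Rbar = Ghalf @ R @ Ghalf            # \bar{R}
```

Asserting the identities derived above confirms that the transformed flow is exactly the PAST ODE.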

VI. N UMERICAL RESULTS

Our numerical results fall into four parts. First we exam-ine the influence of parameters specified in the PETRELSalgorithm, such as discount factor, rank estimation, and itsrobustness to noise level. Next we look at the problem ofdirection-of-arrival estimation and show PETRELS demon-strates performance superior to GROUSE by identifying andtracking all the targets almost perfectly even in low SNR.Thirdly, we compare our approach with matrix completion,and show that PETRELS is at least competitive with stateof the art batch algorithms. Finally, we provide numericalsimulations for the extensions of the PETRELS algorithm.

A. Choice of Parameters

At each time t, a vector x_t is generated as

x_t = D_{\mathrm{true}} a_t + n_t, \quad t = 1, 2, \cdots \qquad (42)

where D_{\mathrm{true}} is an r-dimensional subspace generated with i.i.d. N(0,1) entries, a_t is an r × 1 vector with i.i.d. N(0,1) entries, and n_t is an m × 1 Gaussian noise vector with i.i.d. N(0, ε^2) entries. We fix the signal dimension m = 500 and the subspace rank r_true = 10. We assume that a fixed number K of entries of x_t is revealed at each time. This restriction is not necessary for the algorithm to work, as the matrix completion simulations show, but we impose it here to obtain a meaningful estimate of a_t. Denoting the estimated subspace by D, we examine the performance of the algorithm via the normalized subspace reconstruction error, calculated as \|P_{D^\perp} D_{\mathrm{true}}\|_F^2 / \|D_{\mathrm{true}}\|_F^2, where P_{D^\perp} is the projection operator onto the orthogonal complement D^\perp.
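For reference, this error metric can be computed directly from the projector onto span(D); a minimal sketch (the function name is illustrative):

```python
import numpy as np

def normalized_subspace_error(D, Dtrue):
    """Compute ||P_{D_perp} D_true||_F^2 / ||D_true||_F^2, with the
    projector onto span(D) formed via the pseudoinverse."""
    P = D @ np.linalg.pinv(D)               # projector onto span(D)
    resid = Dtrue - P @ Dtrue               # P_{D_perp} D_true
    return np.linalg.norm(resid) ** 2 / np.linalg.norm(Dtrue) ** 2
```

The error is 0 when span(D) contains the true subspace and 1 when the two are orthogonal.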

The choice of the discount factor λ plays an important role in how fast the algorithm converges. With K = 50, a mere 10% of the full dimension, and the rank given accurately as r = 10 in a noise-free setting (ε = 0), we run the algorithm to time n = 2000 on the same data and find that the normalized subspace reconstruction error is minimized when λ is around 0.98, as shown in Fig. 1. Hence, we keep λ = 0.98 hereafter.

Fig. 1. The normalized subspace reconstruction error as a function of the discount factor λ after running the algorithm to time n = 2000, when 50 out of 500 entries of the signal are observed each time without noise.

In reality it is almost impossible to accurately estimate the intrinsic rank in advance. Fortunately, the convergence rate of our algorithm degrades gracefully as the rank estimation error increases. In Fig. 2, the evolution of the normalized subspace error is plotted against the data stream index for rank estimates r = 10, 12, 14, 16, 18. We only examine over-estimation of the rank here, since this is usually the case in applications. In the next section we show examples of rank under-estimation.

Taking more measurements per time leads to faster convergence, since it approaches the full information regime, as shown in Fig. 3. Theoretically, M ∼ O(r log r) ≈ 23 measurements are required to test whether an incomplete vector lies within a subspace of rank r [14]. The simulation shows that our algorithm works even when M is close to this lower bound.

Finally, the robustness of the algorithm to the noise variance ε^2 is tested in Fig. 4, where the normalized subspace error is plotted against the data stream index for different noise levels ε. The estimated subspace deviates from the ground truth as the noise level increases; the normalized subspace error degrades gracefully and converges to an error floor determined by the noise variance.

We now consider a scenario where a subspace of rank r = 10 changes abruptly at time indices n = 3000 and n = 5000, and examine the performance of GROUSE [17] and PETRELS in Fig. 5 when the rank is over-estimated by 4


Fig. 2. Normalized subspace reconstruction error as a function of the data stream index when the rank is over-estimated (rank errors 0, 2, 4, 6, 8), when 50 out of 500 entries of the signal are observed each time without noise.

Fig. 3. Normalized subspace reconstruction error as a function of the data stream index when the number of entries M observed each time out of 500 is varied (M = 30, 32, 35, 40, 50, 60), with accurate rank estimation and no noise.

and the noise level is ε = 10^{-3}. The normalized residual error of the data stream, calculated as \|P_n(x_n - \hat{x}_n)\|_2 / \|P_n x_n\|_2, is shown in Fig. 5 (a), and the normalized subspace error in Fig. 5 (b). Both PETRELS and GROUSE successfully track the changed subspace, but PETRELS tracks the change faster.

B. Direction-Of-Arrival Analysis

Given GROUSE [17] as a baseline, we evaluate the re-silience of our algorithm to different data models and applica-tions. We use the following example of Direction-Of-Arrivalanalysis in array processing to compare the performance ofthese two methods. Assume there aren = 256 sensors froma linear array, and the measurements from all sensors at timet are given as

xt = VΣat + nt, t = 1, 2, · · · . (43)

Fig. 4. Normalized subspace error against the data stream index for different noise levels ε (0, 10^{-3}, 10^{-2}, 10^{-1}), when 50 out of 500 entries of the signal are observed each time with accurate rank estimation.

Here V \in \mathbb{C}^{n \times p} is a Vandermonde matrix given by

V = [\alpha_1(\omega_1), \cdots, \alpha_p(\omega_p)], \qquad (44)

where \alpha_i(\omega_i) = [1, e^{j2\pi\omega_i}, \cdots, e^{j2\pi\omega_i(n-1)}]^T, 0 \leq \omega_i < 1, and \Sigma = \mathrm{diag}(d) = \mathrm{diag}(d_1, \cdots, d_p) is a diagonal matrix characterizing the amplitudes of the modes. The coefficients a_t are generated with N(0,1) entries, and the noise with N(0, ε^2) entries, where ε = 0.1.

Each time we collect measurements from K = 30 random sensors. We are interested in identifying all \{\omega_i\}_{i=1}^p and \{d_i\}_{i=1}^p. This can be done by applying the well-known ESPRIT algorithm [29] to the estimated subspace D of rank r, where r is specified a priori as the number of modes to be estimated. Specifically, if D_1 = D(1 : n-1) and D_2 = D(2 : n) denote the first and last n-1 rows of D, then from the eigenvalues \lambda_i, i = 1, \cdots, r, of the matrix T = D_1^{\dagger} D_2, the set \{\omega_i\}_{i=1}^p can be recovered as

\omega_i = \frac{1}{2\pi} \arg \lambda_i, \quad i = 1, \cdots, r. \qquad (45)

The ESPRIT algorithm also plays a role in the recovery of multipath delays from low-rate samples of the channel output [30].
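A minimal ESPRIT sketch following (44) and (45): build T = D_1^† D_2 from any basis of the signal subspace and read the frequencies off its eigenvalues. The function name and the example frequencies are illustrative:

```python
import numpy as np

def esprit_freqs(D):
    """Recover mode frequencies from a basis D of the signal subspace via
    ESPRIT: omega_i = arg(lambda_i) / (2*pi), where the lambda_i are the
    eigenvalues of T = D_1^dagger D_2 (sketch of (45))."""
    D1, D2 = D[:-1], D[1:]                   # first and last n-1 rows
    T = np.linalg.pinv(D1) @ D2
    lams = np.linalg.eigvals(T)
    return np.mod(np.angle(lams) / (2 * np.pi), 1.0)

# usage with an exact Vandermonde subspace and hypothetical frequencies
n = 64
omegas = np.array([0.1, 0.3, 0.7])
V = np.exp(2j * np.pi * np.outer(np.arange(n), omegas))  # columns alpha_i(omega_i)
D, _ = np.linalg.qr(V)                       # any basis of span(V) works
```

Since D = VQ for an invertible Q, D_1^† D_2 = Q^{-1} diag(e^{j2πω_i}) Q, so the eigenvalues recover the frequencies regardless of the particular basis.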

We show that in a dynamic setting, where the underlying subspace varies, PETRELS does a better job of discarding out-of-date modes and picking up new ones than GROUSE. We divide the running time into 4 parts; the frequencies and amplitudes are specified as follows:

1) Start with frequencies

ω = [0.1769, 0.1992, 0.2116, 0.6776, 0.7599];

and amplitudes

d = [0.3, 0.8, 0.5, 1, 0.1].

2) Change two modes (frequencies only) at stream index 1000:

ω = [0.1769, 0.1992, 0.4116, 0.6776, 0.8599];


Fig. 5. (a) Normalized residual error and (b) normalized subspace error for GROUSE and PETRELS when the underlying subspace changes, with fixed rank r = 10. The rank is over-estimated by 4 and the noise level is ε = 10^{-3}, when 50 out of 500 entries of the signal are observed each time.

and amplitudes

d = [0.3, 0.8, 0.5, 1, 0.1].

3) Add one new mode at stream index 2000:

ω = [0.1769, 0.1992, 0.4116, 0.6776, 0.8599, 0.9513];

and amplitudes

d = [0.3, 0.8, 0.5, 1, 0.1, 0.6].

4) Delete the weakest mode at stream index 3000:

ω = [0.1769, 0.1992, 0.4116, 0.6776, 0.9513];

and amplitudes

d = [0.3, 0.8, 0.5, 1, 0.6].

Fig. 6 shows the ground truth of mode locations and amplitudes for the scenario above. Note that there are three closely located modes and one weak mode at the beginning, which makes the task challenging. We compare the performance of PETRELS and GROUSE. The rank specified in both algorithms is r = 10, the number of modes estimated at each time index; in our case this is twice the number of true modes.2

At each time both algorithms estimate 10 modes, whose amplitudes are shown against the data stream index in Fig. 7 (a) and (b); the color encodes the amplitude according to the color bar. The direction-of-arrival estimates in Fig. 7 (a) and (b) are further thresholded at level 0.5, and the thresholded results are shown in Fig. 7 (c) and (d) for PETRELS and GROUSE respectively. PETRELS identifies all modes correctly. In particular, PETRELS distinguishes the three closely-spaced modes perfectly at the beginning and identifies the weak modes that enter the scene later. With GROUSE, the closely spaced modes are erroneously estimated as one mode, the weak mode is missed, and spurious modes are introduced. PETRELS also fully tracks the later changes, following the entrance and exit of each mode, while GROUSE fails to react to the changes in the data model.

Since the number of estimated modes at each time exceeds the number of true modes, the additional rank in the estimated subspace contributes "auxiliary modes" that do not belong to the data model. In PETRELS these modes appear as scattered points with small amplitudes, as in Fig. 7 (a), so they are not identified as actual targets in the scene. In GROUSE, these auxiliary modes are tracked and appear as spurious modes. All changes are identified and tracked successfully by PETRELS, but not by GROUSE.

2In practice the number of modes can be estimated via the Minimum Description Length (MDL) criterion [31].

Fig. 6. Ground truth of the actual mode locations and amplitudes in a dynamic scenario.

C. Matrix Completion

We next compare the performance of PETRELS for matrix completion against the batch algorithms LMaFit [32], FPCA [33],


Fig. 7. Tracking of mode changes in direction-of-arrival estimation using PETRELS and GROUSE: the estimated directions of the 10 modes at each time are shown against the data stream index in (a) for PETRELS and (b) for GROUSE. The estimates in (a) and (b) are further thresholded at level 0.5, and the thresholded results are shown in (c) and (d) respectively. All changes are identified and tracked successfully by PETRELS, but not by GROUSE.

Singular Value Thresholding (SVT) [34], OptSpace [12] and GROUSE [17]. The low-rank matrix is generated from a matrix factorization model as X = UV^T \in \mathbb{R}^{1000 \times 2000}, where U \in \mathbb{R}^{1000 \times 10} and V \in \mathbb{R}^{2000 \times 10}; all entries of U and V are drawn i.i.d. from the standard normal distribution N(0,1) (Gaussian data) or the uniform distribution U[0,1] (uniform data). The sampling rate is 0.05, so only 5% of all entries are revealed.

The running time is plotted against the normalized matrix reconstruction error, calculated as \|\hat{X} - X\|_F / \|X\|_F, where \hat{X} is the reconstructed low-rank matrix, for Gaussian data and uniform data in Fig. 8 (a) and (b) respectively. PETRELS matches the performance of the batch algorithms on Gaussian data and improves upon the accuracy of most algorithms on uniform data, where the Grassmannian-based optimization approach may encounter "barriers" to convergence. Note that different algorithms have different input parameter requirements. For example, OptSpace needs a tolerance to terminate its iterations, which directly determines the trade-off between accuracy and running time; PETRELS and GROUSE require an initial estimate of the rank. Our simulation shows only one particular realization, and we simply conclude that PETRELS is competitive.

D. Simplified PETRELS

Under the same simulation setup as for Fig. 2, except that the subspace of rank 10 is generated as D_true = \tilde{D}_true Σ, where \tilde{D}_true is generated as before and Σ is a diagonal matrix with 5 entries from N(0,1) and 5 entries from 0.01 · N(0,1), we examine the performance of the simplified PETRELS algorithm of Section IV-A (with optimized λ = 0.9) and the original PETRELS algorithm (with λ = 0.98) when the rank of the subspace is over-estimated as 12 or under-estimated as 8. When the rank of D is over-estimated, the modification in (9) introduces additional errors and converges more slowly than the original PETRELS algorithm; however, when the rank of D is under-estimated, simplified PETRELS performs better than PETRELS. This is an interesting feature of the proposed simplification, and a quantitative justification of this phenomenon is beyond


Fig. 8. Comparison of matrix completion algorithms in terms of speed and accuracy, for (a) matrix factors from N(0,1) and (b) matrix factors from U[0,1]: PETRELS is a competitive alternative for matrix completion tasks.

the scope of this paper. Intuitively, when the rank is under-estimated, simplified PETRELS also uses the interpolated entries to update the subspace estimate, which appears to help performance.

Fig. 9. Normalized subspace reconstruction error against the data stream index when the rank is over-estimated as 12 or under-estimated as 8, for the original PETRELS and the simplified algorithm.

VII. CONCLUSIONS

We considered the problem of reconstructing a data stream from a small subset of its entries, where the data stream is assumed to lie in a low-dimensional linear subspace, possibly corrupted by noise. This has significant implications for lessening the storage burden and reducing complexity, as well as for tracking changes in applications such as video denoising, network monitoring and anomaly detection when the problem size is large. The well-known low-rank matrix completion problem can be viewed as a batch version of our problem. The PETRELS algorithm first identifies the underlying low-dimensional subspace via a discounted recursive procedure for each row of the subspace matrix in parallel, and then reconstructs the missing entries via least-squares estimation if required. The discount factor allows the algorithm to capture long-term behavior as well as to track changes in the data stream. We showed that PETRELS converges to a stationary point, since it is a second-order stochastic gradient descent algorithm. In the full observation scenario, we further proved that PETRELS converges globally by revealing its connection with the PAST algorithm. We demonstrated the superior performance of PETRELS in direction-of-arrival estimation and showed that it is competitive with state-of-the-art batch matrix completion algorithms.

REFERENCES

[1] T. Ahmed, M. Coates, and A. Lakhina, "Multivariate online anomaly detection using kernel recursive least squares," Proc. 26th IEEE International Conference on Computer Communications, pp. 625–633, 2007.
[2] S. Shahbazpanahi, S. Valaee, and M. H. Bastani, "Distributed source localization using ESPRIT algorithm," IEEE Transactions on Signal Processing, vol. 49, no. 10, pp. 2169–2178, 2001.
[3] R. Kumaresan and D. Tufts, "Estimating the angles of arrival of multiple plane waves," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-19, no. 1, pp. 134–139, 1983.
[4] A. H. Sayed, Fundamentals of Adaptive Filtering. Wiley, NY, 2003.
[5] B. Yang, "Projection approximation subspace tracking," IEEE Transactions on Signal Processing, vol. 43, no. 1, pp. 95–107, 1995.
[6] K. Crammer, "Online tracking of linear subspaces," in Proc. COLT 2006, vol. 4005, pp. 438–452, 2006.
[7] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. Inform. Theory, vol. 51, pp. 4203–4215, Dec. 2005.
[8] D. L. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[9] E. J. Candes and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," IEEE Trans. Inform. Theory, vol. 56, no. 5, pp. 2053–2080, 2009.
[10] E. J. Candes and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2008.
[11] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
[12] R. H. Keshavan, A. Montanari, and S. Oh, "Matrix completion from noisy entries," Journal of Machine Learning Research, pp. 2057–2078, 2010.
[13] K. Lee and Y. Bresler, "ADMiRA: Atomic decomposition for minimum rank approximation," IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4402–4416, 2010.
[14] L. Balzano, B. Recht, and R. Nowak, "High-dimensional matched subspace detection when data are missing," in Proc. ISIT, June 2010.
[15] K. Lounici, "High-dimensional covariance matrix estimation with missing observations," arXiv preprint arXiv:1201.2577, 2012.
[16] Y. Chi, "Robust nearest subspace classification with missing data," submitted to 2013 International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012.
[17] L. Balzano, R. Nowak, and B. Recht, "Online identification and tracking of subspaces from highly incomplete information," Proc. Allerton 2010, 2010.
[18] W. Dai, O. Milenkovic, and E. Kerman, "Subspace Evolution and Transfer (SET) for low-rank matrix completion," IEEE Trans. Signal Processing, submitted, 2010.
[19] B. Yang, "Asymptotic convergence analysis of the projection approximation subspace tracking algorithm," Signal Processing, vol. 50, pp. 123–136, 1996.
[20] E. Oja, "A simplified neuron model as a principal component analyzer," Journal of Mathematical Biology, vol. 15, no. 3, pp. 267–273, 1982.
[21] Y. Miao, "Fast subspace tracking and neural network learning by a novel information criterion," IEEE Trans. on Signal Processing, vol. 46, no. 7, pp. 1967–1979, 1998.
[22] Y. Hua, "A new look at the power method for fast subspace tracking," Digital Signal Processing, vol. 9, no. 4, pp. 297–314, 1999.
[23] R. Mazumder, T. Hastie, and R. Tibshirani, "Spectral regularization algorithms for learning large incomplete matrices," Journal of Machine Learning Research, vol. 11, pp. 1–26, 2009.
[24] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins University Press, 1996.
[25] C. D. Meyer, "Generalized inversion of modified matrices," SIAM Journal of Applied Mathematics, pp. 315–323, 1973.
[26] J. Cioffi, "Limited-precision effects in adaptive filtering," IEEE Transactions on Circuits and Systems, vol. 34, no. 7, pp. 821–833, 1987.
[27] L. Bottou and O. Bousquet, "The tradeoffs of large scale learning," in Advances in Neural Information Processing Systems, J. Platt, D. Koller, Y. Singer, and S. Roweis, Eds., vol. 20, pp. 161–168, 2008.
[28] L. Bottou, "Large-scale machine learning with stochastic gradient descent," COMPSTAT 2010 Book of Abstracts, p. 270, 2010.
[29] R. Roy and T. Kailath, "ESPRIT–Estimation of signal parameters via rotational invariance techniques," IEEE Trans. on Acoustics, Speech, Signal Processing, vol. 37, no. 7, pp. 984–995, Jul. 1989.
[30] K. Gedalyahu and Y. C. Eldar, "Time-delay estimation from low-rate samples: A union of subspaces approach," IEEE Trans. on Signal Processing, vol. 58, no. 6, pp. 3017–3031, 2010.
[31] J. Rissanen, "A universal prior for integers and estimation by minimum description length," The Annals of Statistics, vol. 11, no. 2, pp. 416–431, 1983.
[32] Z. Wen, W. Yin, and Y. Zhang, "Solving a low-rank factorization model for matrix completion by a non-linear successive over-relaxation algorithm," Rice CAAM Tech Report TR10-07, 2010.
[33] S. Ma, D. Goldfarb, and L. Chen, "Fixed point and Bregman iterative methods for matrix rank minimization," Mathematical Programming, vol. 1, no. 1, pp. 1–27, 2009.
[34] J.-F. Cai, E. J. Candes, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM Journal on Optimization, vol. 20, no. 4, pp. 1–28, 2008.