
Robust Spectral Compressed Sensing via Structured Matrix Completion

Yuxin Chen†, Yuejie Chi‡

†Electrical Engineering, Stanford University    ‡Electrical and Computer Engineering, The Ohio State University

Abstract

The paper studies the problem of recovering a spectrally sparse object from a small number of time domain samples. Specifically, the object of interest with ambient dimension n is assumed to be a mixture of r complex sinusoids, while the underlying frequencies can assume any continuous values in the unit disk. Conventional compressed sensing paradigms suffer from the basis mismatch issue when imposing a discrete dictionary on the Fourier representation. To address this problem, we develop a novel nonparametric algorithm, called Enhanced Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging the data into a low-rank enhanced form with multi-fold Hankel structure whose rank is upper bounded by r, and then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of r log² n, and is robust against bounded noise. Even if a constant portion of samples are corrupted with arbitrary magnitude, EMaC can still allow accurate recovery if the number of samples exceeds the order of r log³ n. Along the way, our results demonstrate that accurate completion of a low-rank multi-fold Hankel matrix is possible when the number of observed entries is proportional to the information theoretic limit (except for a logarithmic gap) under mild conditions. The performance of our algorithm and its applicability to super resolution are further demonstrated by numerical experiments.

I. INTRODUCTION

A. Motivation and Contributions

A large class of practical applications features high-dimensional objects that can be modeled or approximated by a superposition of spikes in the spectral (resp. time) domain, and involves estimation of the object from its time (resp. frequency) domain samples. A partial list includes acceleration of medical imaging [1], target localization in radar and sonar systems [2], inverse scattering in seismic imaging [3], fluorescence microscopy [4], channel estimation in wireless communications [5], [6], etc. The data acquisition devices, however, are often limited by hardware and physical constraints (e.g. the diffraction limit in an optical system), precluding sampling with the desired resolution. It is thus of paramount interest to reduce the sensing complexity while retaining recovery resolution.

Fortunately, in many instances, it is possible to recover an object even when the number of samples is far below the ambient dimension, provided that the object has a parsimonious representation in the transform domain. In particular, recent advances in Compressed Sensing (CS) [7], [8] popularize nonparametric methods based on convex surrogates. Such tractable methods do not require prior information on the model order, and are often robust against noise and outliers [9], [10].

Nevertheless, the success of CS relies on sparse representation or approximation of the object of interest in a finite discrete dictionary, while the true parameters in many applications are actually specified in a continuous dictionary. For concreteness, consider an object x(t) that is a weighted sum of K-dimensional sinusoids at r distinct frequencies {f_i ∈ [0, 1]^K : 1 ≤ i ≤ r}. Conventional CS paradigms operate under the assumption that these frequencies lie on a pre-determined grid on the unit disk. However, caution needs to be taken when imposing a discrete dictionary on continuous frequencies, since nature never places the frequencies on the pre-determined grid, no matter how fine the grid is [11], [12]. This issue, known as basis mismatch between the true frequencies and the discretized grid, results in loss of sparsity due to spectral leakage along the Dirichlet kernel, and hence degeneration in the performance of CS algorithms. While one might impose finer gridding to mitigate this weakness, this approach

Y. Chen is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA (email: [email protected]). Y. Chi is with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA (email: [email protected]). Manuscript date: April 28, 2013.


often leads to numerical instability and high correlation between dictionary elements, which significantly weakens the advantage of these CS approaches [13].

In this paper, we consider the spectral compressed sensing problem, which aims to recover a spectrally sparse object from a small set of time domain samples. The underlying (possibly multi-dimensional) frequencies can assume any value on the unit disk, and need to be recovered with infinite precision. We develop a nonparametric algorithm, called Enhanced Matrix Completion (EMaC), based on structured matrix completion. Specifically, EMaC starts by converting the data samples into an enhanced matrix with K-fold Hankel structure whose rank is upper bounded by r, and then solves a nuclear norm minimization program to complete the enhanced matrix. We show that, under mild incoherence conditions, EMaC admits exact recovery from O(r log² n) random samples¹, where r and n denote respectively the spectral sparsity and the ambient dimension. Our results further demonstrate that under the same mild incoherence conditions, EMaC is robust against bounded noise and sparse corruptions. In particular, the robust version of EMaC admits exact recovery from O(r log³ n) random samples even when a constant proportion of the samples are corrupted with arbitrary magnitudes. Furthermore, numerical experiments validate our theoretical findings, and show that EMaC is also applicable to the problem of super resolution.

Additionally, we provide theoretical guarantees for low-rank Hankel matrix completion, which is of great importance in control, natural language processing, and computer vision. To the best of our knowledge, our results provide the first theoretical bounds that are close to the information theoretic limit.

B. Connection to Prior Work

Spectral compressed sensing is closely related to harmonic retrieval, which seeks to extract the underlying frequencies of an object from a collection of its time domain samples. Conventional methods for harmonic retrieval include Prony's method [14], ESPRIT [15], the matrix pencil method [16], the Tufts and Kumaresan approach [17], etc. These methods are typically based on the eigenvalue decomposition of covariance matrices constructed from equi-spaced samples, and can accommodate infinite frequency precision in the absence of noise. The K-fold Hankel structure, which plays a central role in the EMaC algorithm, is rooted in the traditional spectral estimation literature under the name "matrix enhancement and matrix pencil" [18] for estimating multi-dimensional frequencies. One weakness of these techniques lies in that they are parametric and require prior knowledge of the model order, that is, the number of underlying frequency spikes of the signal or, at least, an estimate of it. Besides, their performance largely depends on the knowledge of the noise spectra; these methods are often sensitive to noise and unstable against outliers [19]. The finite rate of innovation approach [20], [21], while not restricting itself to equi-spaced samples, also requires prior knowledge of the model order and may not easily extend to handle signals that are approximately sparse or vulnerable to outliers.

Nonparametric algorithms based on convex optimization differ from the above parametric techniques in that the model order does not need to be specified a priori. For instance, the CS paradigms [7], [8] assert that it is possible to recover a spectrally sparse signal from a small number of time domain samples. These algorithms admit faithful recovery even when the samples are contaminated by bounded noise [9], [22] or arbitrary sparse outliers [10]. By assuming that the frequency spikes lie on an a priori grid, rather than identifying the locations of the frequencies, CS selects them from a pre-determined set. However, there is an inevitable basis mismatch between the true frequencies and the discrete grid, no matter how fine the grid is [11], [23]. A degeneration in the performance of CS algorithms is reported due to loss of sparsity via spectral leakage along the Dirichlet kernel.

Recently, Candès and Fernandez-Granda [24] proposed a total-variation norm minimization algorithm to super-resolve a sparse object from frequency samples at the low end of the spectrum. This algorithm allows accurate super-resolution when the point sources are appropriately separated, and is stable against noise [25]. Inspired by this approach, Tang et al. [13] then developed an atomic norm minimization algorithm [26] for line spectral estimation from O(r log r log n) random time domain samples, which achieves exact recovery under mild separation conditions. However, these works are limited to one-dimensional (1-D) frequency models, require that the complex signs of the frequency spikes be drawn i.i.d. from the uniform distribution, and their robustness against sparse outliers has not been established.

¹The standard notation f(n) = O(g(n)) means that there exists a constant c such that f(n) ≤ c g(n); f(n) = Θ(g(n)) means that there are constants c_1 and c_2 such that c_1 g(n) ≤ f(n) ≤ c_2 g(n).


In contrast, our approach can accommodate multi-dimensional frequencies, and only assumes randomness in the observation basis. The algorithm is inspired by recent advances in Matrix Completion (MC), which aims at recovering a low-rank matrix from partial entries. It has been shown [27], [28] that exact recovery is possible via nuclear norm minimization, as soon as the number of observed entries is on the order of the information theoretic limit. This line of algorithms is also robust against noise and outliers [29], [30]. Nevertheless, the theoretical guarantees of these algorithms do not apply to the more structured observation models associated with the proposed multi-fold Hankel structure. Consequently, direct application of existing MC results yields pessimistic bounds on the required number of samples, which is far beyond the degrees of freedom underlying the sparse object.

C. Organization

The rest of the paper is organized as follows. The data and sample models are described in Section II. By restricting our attention to two-dimensional (2-D) frequency models, we present the enhanced matrix and the associated matrix completion algorithms. The extension to high-dimensional frequency models is discussed in Section II-F. The main theoretical guarantees are summarized in Section III, based on the incoherence conditions introduced in Section III-A. We then discuss the extension to low-rank Hankel matrix completion in Section IV. Section V presents the numerical validation of our algorithms. The proofs of Theorem 1 and Theorem 3 are based on duality analysis and a golfing scheme [28], and are supplied in Section VI and Section VII, respectively. Section VIII concludes the paper with a short summary of our findings and a discussion of potential extensions and improvements. Finally, proofs of auxiliary lemmas supporting our results are provided in the Appendix.

II. MODEL AND ALGORITHM

Assume that the object of interest x(t) can be modeled as a weighted sum of K-dimensional sinusoids at r distinct frequencies f_i ∈ [0, 1]^K (1 ≤ i ≤ r), i.e.

x(t) = \sum_{i=1}^{r} d_i e^{j 2\pi \langle t, f_i \rangle},

where the d_i's denote the complex amplitudes, and ⟨·, ·⟩ denotes the inner product. Here, it is assumed that the frequencies f_i are normalized with respect to the Nyquist frequency of x(t). For concreteness, our discussion is mainly devoted to a 2-D frequency model. This subsumes line spectral estimation as a special case, and indicates how to address multi-dimensional models. The extension to higher-dimensional scenarios is discussed in Section II-F.

A. 2-D Frequency Model

Consider a data matrix X = [X_{k,l}], 0 ≤ k < n_1, 0 ≤ l < n_2, of ambient dimension n_1 × n_2, which can be obtained by sampling x(t) on a uniform grid at the Nyquist rate. Each entry X_{k,l} can be expressed as

X_{k,l} = \sum_{i=1}^{r} d_i y_i^k z_i^l, \quad (1)

where for any i (1 ≤ i ≤ r), we have

y_i = \exp(j 2\pi f_{1i}) \quad \text{and} \quad z_i = \exp(j 2\pi f_{2i}),

for some frequency pairs {(f_{1i}, f_{2i}) : 1 ≤ i ≤ r}. We can then express X in matrix form as

X = Y D Z^T, \quad (2)

where the above matrices are defined as

Y := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ y_1 & y_2 & \cdots & y_r \\ \vdots & \vdots & \vdots & \vdots \\ y_1^{n_1-1} & y_2^{n_1-1} & \cdots & y_r^{n_1-1} \end{bmatrix}, \quad (3)

Z := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ z_1 & z_2 & \cdots & z_r \\ \vdots & \vdots & \vdots & \vdots \\ z_1^{n_2-1} & z_2^{n_2-1} & \cdots & z_r^{n_2-1} \end{bmatrix}, \quad (4)

and

D := \mathrm{diag}[d_1, d_2, \cdots, d_r]. \quad (5)

This is sometimes referred to as the Vandermonde decomposition of X. Suppose that there exists a location set Ω of size m such that X_{k,l} is observed if and only if (k, l) ∈ Ω. It is assumed that Ω is selected uniformly at random. We are interested in recovering X from the observation of its entries on the small location set Ω.
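To make the model concrete, the following Python snippet synthesizes a data matrix X obeying (1)-(5). This is a minimal sketch; the helper name data_matrix and the random frequencies/amplitudes are our own illustrative choices, not part of the paper.

```python
# Sketch: generate X = Y D Z^T following the 2-D frequency model (1)-(5).
import numpy as np

def data_matrix(n1, n2, freqs, amps):
    """freqs: (r, 2) array of frequency pairs (f_{1i}, f_{2i}) in [0, 1)^2;
    amps: (r,) array of complex amplitudes d_i."""
    y = np.exp(2j * np.pi * freqs[:, 0])        # y_i = exp(j 2 pi f_{1i})
    z = np.exp(2j * np.pi * freqs[:, 1])        # z_i = exp(j 2 pi f_{2i})
    Y = y[None, :] ** np.arange(n1)[:, None]    # n1 x r Vandermonde matrix, cf. (3)
    Z = z[None, :] ** np.arange(n2)[:, None]    # n2 x r Vandermonde matrix, cf. (4)
    return Y @ np.diag(amps) @ Z.T              # Vandermonde decomposition (2)

rng = np.random.default_rng(0)
r, n1, n2 = 4, 11, 11
X = data_matrix(n1, n2, rng.random((r, 2)),
                rng.standard_normal(r) + 1j * rng.standard_normal(r))
```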

B. Matrix Enhancement

One might naturally attempt recovery by applying the low-rank MC algorithms [27], arguing that when r is small, perfect recovery of X is possible from partial measurements since X is low rank whenever r ≪ min{n_1, n_2}. Specifically, this corresponds to the following algorithm:

\underset{M \in \mathbb{C}^{n_1 \times n_2}}{\text{minimize}} \quad \|M\|_* \quad \text{subject to} \quad M_{k,l} = X_{k,l}, \; \forall (k, l) \in \Omega, \quad (6)

where ‖M‖_* denotes the nuclear norm (or sum of all singular values) of M = [M_{k,l}]. This is a convex relaxation paradigm of the rank minimization problem. However, naive MC algorithms [28] require at least the order of r max(n_1, n_2) log(n_1 n_2) samples, which far exceeds the degrees of freedom (which is O(r)) in our problem. What is worse, when r > min(n_1, n_2) (which is possible since r can be as large as n_1 n_2), X is no longer low-rank. This motivates us to construct other low-rank forms that better capture the harmonic structure.

In this paper, we adopt one effective enhanced form of X based on the two-fold Hankel structure as follows. The enhanced matrix X_e with respect to X is defined as a k_1 × (n_1 − k_1 + 1) block Hankel matrix

X_e := \begin{bmatrix} X_0 & X_1 & \cdots & X_{n_1-k_1} \\ X_1 & X_2 & \cdots & X_{n_1-k_1+1} \\ \vdots & \vdots & \vdots & \vdots \\ X_{k_1-1} & X_{k_1} & \cdots & X_{n_1-1} \end{bmatrix}, \quad (7)

where 1 ≤ k_1 ≤ n_1, and each block is a k_2 × (n_2 − k_2 + 1) Hankel matrix defined such that for every ℓ (0 ≤ ℓ < n_1):

X_ℓ := \begin{bmatrix} X_{ℓ,0} & X_{ℓ,1} & \cdots & X_{ℓ,n_2-k_2} \\ X_{ℓ,1} & X_{ℓ,2} & \cdots & X_{ℓ,n_2-k_2+1} \\ \vdots & \vdots & \vdots & \vdots \\ X_{ℓ,k_2-1} & X_{ℓ,k_2} & \cdots & X_{ℓ,n_2-1} \end{bmatrix}, \quad (8)

where 1 ≤ k_2 ≤ n_2. This enhanced form allows us to express each block as²

X_ℓ = Z_L Y_d^ℓ D Z_R, \quad (9)

where Z_L, Z_R and Y_d are defined as

Z_L := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ z_1 & z_2 & \cdots & z_r \\ \vdots & \vdots & \vdots & \vdots \\ z_1^{k_2-1} & z_2^{k_2-1} & \cdots & z_r^{k_2-1} \end{bmatrix}, \quad Z_R := \begin{bmatrix} 1 & z_1 & \cdots & z_1^{n_2-k_2} \\ 1 & z_2 & \cdots & z_2^{n_2-k_2} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & z_r & \cdots & z_r^{n_2-k_2} \end{bmatrix},

and

Y_d := \mathrm{diag}[y_1, y_2, \cdots, y_r].

Substituting (9) into (7) yields the following:

X_e = \underbrace{\begin{bmatrix} Z_L \\ Z_L Y_d \\ \vdots \\ Z_L Y_d^{k_1-1} \end{bmatrix}}_{\sqrt{k_1 k_2}\, E_L} \; D \; \underbrace{\begin{bmatrix} Z_R, & Y_d Z_R, & \cdots, & Y_d^{n_1-k_1} Z_R \end{bmatrix}}_{\sqrt{(n_1-k_1+1)(n_2-k_2+1)}\, E_R}, \quad (10)

where E_L and E_R characterize the column and row space of X_e, respectively. This immediately implies that X_e is low-rank, i.e.,

\mathrm{rank}(X_e) ≤ r. \quad (11)

This form aligns with the traditional matrix pencil approach proposed in [16], [18] to estimate harmonic frequencies if all entries of X are available. Thus, one can extract all underlying frequencies of X using the methods proposed in [18], as long as X can be faithfully recovered.

²Note that the ℓ-th row X_{ℓ∗} of X can be expressed as X_{ℓ∗} = [y_1^ℓ, ⋯, y_r^ℓ] D Z^T = [y_1^ℓ d_1, ⋯, y_r^ℓ d_r] Z^T, and hence we only need to find the Vandermonde decomposition of X_0 and then replace d_i by y_i^ℓ d_i.
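As an illustration, the enhancement (7)-(8) can be formed explicitly; the sketch below (our own helper, reusing X from the previous snippet) builds X_e and numerically confirms the rank bound (11).

```python
# Sketch: two-fold Hankel enhancement of an n1 x n2 data matrix, cf. (7)-(8).
import numpy as np

def enhance(X, k1, k2):
    n1, n2 = X.shape
    def block(l):  # k2 x (n2 - k2 + 1) Hankel block built from row l, cf. (8)
        return np.array([[X[l, c + d] for d in range(n2 - k2 + 1)]
                         for c in range(k2)])
    # k1 x (n1 - k1 + 1) block Hankel arrangement, cf. (7)
    return np.block([[block(a + b) for b in range(n1 - k1 + 1)]
                     for a in range(k1)])

Xe = enhance(X, k1=6, k2=6)
print(np.linalg.matrix_rank(Xe))  # at most r = 4, by (10)-(11)
```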

C. The EMaC Algorithm without Noise

Define P_Ω(X) as the orthogonal projection of X onto the subspace of matrices that vanish outside Ω. We then attempt recovery through the following Enhanced Matrix Completion (EMaC) algorithm:

(EMaC) \quad \underset{M \in \mathbb{C}^{n_1 \times n_2}}{\text{minimize}} \quad \|M_e\|_* \quad \text{subject to} \quad P_Ω(M) = P_Ω(X), \quad (12)

where M_e denotes the enhanced form of M. In other words, EMaC minimizes the nuclear norm of the enhanced form over the constraint set. This convex program can be solved using off-the-shelf semidefinite program solvers in a tractable manner (see, e.g., [31]).
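A minimal sketch of (12) is given below using the cvxpy modeling package (our choice for illustration; the experiments in Section V use CVX with SDPT3 instead). Since the enhancement is an affine map, M_e can be expressed directly in terms of the decision variable M; this construction is practical only for small n_1, n_2.

```python
# Sketch of EMaC (12): nuclear norm minimization over the enhanced form.
import cvxpy as cp
import numpy as np

def enhanced_form(M, k1, k2, n1, n2):
    """Two-fold Hankel enhancement (7)-(8) as an affine cvxpy expression."""
    def block(l):  # k2 x (n2 - k2 + 1) Hankel block from row l
        return cp.bmat([[M[l, c + d] for d in range(n2 - k2 + 1)]
                        for c in range(k2)])
    return cp.bmat([[block(a + b) for b in range(n1 - k1 + 1)]
                    for a in range(k1)])

def emac(X_obs, mask, k1, k2):
    """X_obs: observed data (zero-filled off the sample set);
    mask: 0/1 array encoding the location set Omega."""
    n1, n2 = X_obs.shape
    M = cp.Variable((n1, n2), complex=True)
    Me = enhanced_form(M, k1, k2, n1, n2)
    constraints = [cp.multiply(mask, M - X_obs) == 0]  # P_Omega(M) = P_Omega(X)
    cp.Problem(cp.Minimize(cp.normNuc(Me)), constraints).solve()
    return M.value
```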

D. The Noisy-EMaC Algorithm with Bounded Noise

In practice, measurements are almost always contaminated by a certain amount of noise. To make our model and algorithm more practically applicable, we replace our measurements by X^o = [X^o_{k,l}]_{0≤k<n_1, 0≤l<n_2} through the following noisy model:

∀(k, l) ∈ Ω: \quad X^o_{k,l} = X_{k,l} + N_{k,l}, \quad (13)

where X^o_{k,l} is the observed (k, l)-th entry, and N = [N_{k,l}]_{0≤k<n_1, 0≤l<n_2} denotes some unknown noise. We assume that the noise magnitude is bounded by a known amount ‖P_Ω(N)‖_F ≤ δ, where ‖·‖_F denotes the Frobenius norm. In order to adapt our algorithm to such noisy measurements, one wishes that a small perturbation in the measurements should result in a small change in the estimate. Our algorithm is then modified as follows:

(Noisy-EMaC) \quad \underset{M \in \mathbb{C}^{n_1 \times n_2}}{\text{minimize}} \quad \|M_e\|_* \quad \text{subject to} \quad \|P_Ω(M − X^o)\|_F ≤ δ. \quad (14)

In words, the algorithm searches for the candidate with minimum nuclear norm among all signals close to the measurements.
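In the same sketch, Noisy-EMaC (14) only replaces the equality constraint with a Frobenius-norm ball (reusing M, Me, mask, and X_obs from the EMaC sketch above; delta is the assumed known noise level from (13)):

```python
# Sketch of Noisy-EMaC (14): relax the data constraint to a ball of radius delta.
delta = 1e-3  # assumed noise level, cf. (13)
prob = cp.Problem(
    cp.Minimize(cp.normNuc(Me)),
    [cp.norm(cp.multiply(mask, M - X_obs), "fro") <= delta],
)
prob.solve()
```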


E. The Robust-EMaC Algorithm with Sparse Outliers

An outlier is a data sample that can deviate arbitrarily from the true data point. Practical data samples one collects may contain a certain portion of outliers due to abnormal behavior of data acquisition devices such as amplifier saturation, sensor failures, and malicious attacks. A desired recovery algorithm should be able to automatically identify the set of outliers even when they arise in up to a constant portion of all data samples.

Specifically, suppose that our measurements X^o are given by

∀(k, l) ∈ Ω: \quad X^o_{k,l} = X_{k,l} + S_{k,l}, \quad (15)

where X^o_{k,l} is the observed (k, l)-th entry, and S = [S_{k,l}]_{0≤k<n_1, 0≤l<n_2} denotes the outliers, which is assumed to be a sparse matrix supported on some location set Ω_dirty ⊆ Ω. We assume that the sparsity pattern of S is selected uniformly at random conditioned on Ω. More specifically, we assume the following model:

1) Suppose that Ω is obtained by sampling m entries uniformly at random, and define ρ = m / (n_1 n_2).
2) Conditioned on (k, l) ∈ Ω, the events {(k, l) ∈ Ω_dirty} are independent with conditional probability

P{(k, l) ∈ Ω_dirty | (k, l) ∈ Ω} = s

for some small constant corruption fraction 0 < s < 1.
3) Define Ω_clean := Ω \ Ω_dirty as the location set of uncorrupted measurements.

EMaC is then modified as follows to accommodate sparse outliers:

(Robust-EMaC) \quad \underset{M, S \in \mathbb{C}^{n_1 \times n_2}}{\text{minimize}} \quad \|M_e\|_* + λ\|S_e\|_1 \quad \text{subject to} \quad P_Ω(M + S) = P_Ω(X + S), \quad (16)

where M_e and S_e denote the enhanced forms of M and S, respectively. Here, ‖S_e‖_1 := ‖vec(S_e)‖_1 denotes the ℓ_1-norm of the vectorized S_e. The objective function of Robust-EMaC is the convex envelope of rank(M_e) + λ‖S_e‖_0, which captures the low-rankness of the enhanced form as well as the sparsity of the outliers.
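Continuing the earlier sketch, Robust-EMaC (16) adds a second variable for the outliers, penalizing the ℓ_1 norm of its enhanced form (again our illustrative cvxpy formulation, reusing enhanced_form, M, Me, mask, and X_obs; the choice of λ follows the candidate value suggested later in Theorem 3):

```python
# Sketch of Robust-EMaC (16): joint low-rank plus sparse decomposition.
S = cp.Variable((n1, n2), complex=True)
Se = enhanced_form(S, k1, k2, n1, n2)
m = int(mask.sum())                        # number of observed entries
lam = 1.0 / np.sqrt(m * np.log(n1 * n2))   # candidate choice from Theorem 3
prob = cp.Problem(
    cp.Minimize(cp.normNuc(Me) + lam * cp.norm(cp.vec(Se), 1)),
    [cp.multiply(mask, M + S - X_obs) == 0],
)
prob.solve()
```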

F. Extension to Higher-Dimensional Frequency Models

The EMaC method extends to higher-dimensional frequency models without difficulty. In fact, for K-dimensional frequency models, one can arrange the original data into a K-fold Hankel matrix of rank at most r. For instance, consider a 3-D model such that

∀(l_1, l_2, l_3) ∈ [n_1] × [n_2] × [n_3]: \quad X_{l_1,l_2,l_3} = \sum_{i=1}^{r} d_i y_i^{l_1} z_i^{l_2} w_i^{l_3}.

An enhanced form can be defined as a 3-fold Hankel matrix such that

X_e := \begin{bmatrix} X_{0,e} & X_{1,e} & \cdots & X_{n_3-k_3,e} \\ X_{1,e} & X_{2,e} & \cdots & X_{n_3-k_3+1,e} \\ \vdots & \vdots & \vdots & \vdots \\ X_{k_3-1,e} & X_{k_3,e} & \cdots & X_{n_3-1,e} \end{bmatrix},

where X_{i,e} denotes the 2-D enhanced form of the matrix consisting of all entries X_{l_1,l_2,l_3} obeying l_3 = i. One can verify that X_e is of rank at most r, and can thereby apply EMaC on the 3-D enhanced form. To summarize, for K-dimensional frequency models, EMaC (resp. Noisy-EMaC, Robust-EMaC) searches over all K-fold Hankel matrices that are faithful to the measurements; a recursive construction of such K-fold enhancements is sketched below.
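A minimal recursive sketch of this K-fold construction (our own generalization of the enhance() helper above):

```python
# Sketch: K-fold Hankel enhancement of a K-dimensional array.
import numpy as np

def enhance_kfold(X, ks):
    """X: K-dimensional ndarray; ks: list of pencil parameters, one per dimension."""
    if np.ndim(X) == 0:
        return np.atleast_2d(X)  # base case: a scalar entry
    n, k = X.shape[0], ks[0]
    # The (a, b) block is the (K-1)-fold enhancement of the slice X[a + b].
    return np.block([[enhance_kfold(X[a + b], ks[1:]) for b in range(n - k + 1)]
                     for a in range(k)])
```

For K = 2 this reproduces the two-fold enhancement (7)-(8); for the 3-fold form above, order the axes of X as (l_3, l_1, l_2) and take ks = [k_3, k_1, k_2].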


G. Notations

Before continuing, we introduce a few notations that will be used throughout. Let the singular value decomposition (SVD) of X_e be X_e = UΛV^*. Denote by

T := {U M_1^* + M_2 V^* : M_1 ∈ \mathbb{C}^{(n_1-k_1+1)(n_2-k_2+1) × r}, M_2 ∈ \mathbb{C}^{k_1 k_2 × r}}

the tangent space with respect to X_e, and by T^⊥ the orthogonal complement of T. Denote by P_U (resp. P_V, P_T) the orthogonal projection onto the column (resp. row, tangent) space of X_e, i.e. for any M,

P_U M = U U^* M; \quad P_V M = M V V^*; \quad P_T = P_U + P_V − P_U P_V.

We let P_{T^⊥} = I − P_T be the orthogonal complement of P_T, where I denotes the identity operator.

Denote by ‖M‖, ‖M‖_F, ‖M‖_* the spectral norm (operator norm), Frobenius norm, and nuclear norm of M. Also, ‖M‖_1 and ‖M‖_∞ are defined to be the ℓ_1 and ℓ_∞ norms of the vectorized M. Additionally, we use sgn(M) to denote the elementwise complex sign of M.

On the other hand, we denote by Ω_e(k, l) the set of locations of the enhanced matrix X_e containing copies of X_{k,l}. Due to the Hankel and multi-fold Hankel structures, one can easily verify the following: for any Ω_e(k, l), there exists at most one index in any given row of the enhanced form, and at most one index in any given column. For each (k, l) ∈ [n_1] × [n_2], we use A_{(k,l)} to denote a basis matrix that extracts the average of all entries in Ω_e(k, l). Specifically,

(A_{(k,l)})_{α,β} := \begin{cases} \frac{1}{\sqrt{|Ω_e(k,l)|}}, & \text{if } (α, β) ∈ Ω_e(k, l), \\ 0, & \text{else.} \end{cases} \quad (17)

We will use ω_{k,l} := |Ω_e(k, l)| as a shorthand notation.
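The multiplicities ω_{k,l} are easy to tabulate numerically; the sketch below (our own bookkeeping, reusing enhance() from the Section II-B sketch) enhances an index map and counts how often each entry of X appears in X_e.

```python
# Sketch: compute omega_{k,l} = |Omega_e(k,l)| for every (k, l).
import numpy as np

def multiplicities(n1, n2, k1, k2):
    ids = np.arange(n1 * n2).reshape(n1, n2)   # label each entry of X
    Ide = enhance(ids, k1, k2)                 # the enhancement of the labels
    counts = np.bincount(Ide.ravel(), minlength=n1 * n2)
    return counts.reshape(n1, n2)              # omega_{k,l}
```

For instance, with n_1 = n_2 = 11 and k_1 = k_2 = 6, corner entries of X appear exactly once in X_e, while the central entry appears k_1 k_2 = 36 times.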

III. MAIN RESULTS

Encouragingly, under certain incoherence conditions, the simple EMaC algorithm enables faithful recovery of the true data matrix from a small number of time-domain samples, even when the samples are contaminated by bounded noise or by a constant portion of arbitrary outliers.

A. Incoherence Measures

In general, matrix completion from a few entries is hopeless unless the underlying structure is uncorrelated with the observation basis. This inspires us to introduce certain incoherence measures. Let G_L and G_R be two r × r correlation matrices such that

(G_L)_{i_1,i_2} := \begin{cases} \frac{1}{k_1 k_2} \cdot \frac{1-(y_{i_1}^* y_{i_2})^{k_1}}{1-y_{i_1}^* y_{i_2}} \cdot \frac{1-(z_{i_1}^* z_{i_2})^{k_2}}{1-z_{i_1}^* z_{i_2}}, & \text{if } i_1 \neq i_2, \\ 1, & \text{if } i_1 = i_2, \end{cases}

(G_R)_{i_1,i_2} := \begin{cases} \frac{1}{(n_1-k_1+1)(n_2-k_2+1)} \cdot \frac{1-(y_{i_1}^* y_{i_2})^{n_1-k_1+1}}{1-y_{i_1}^* y_{i_2}} \cdot \frac{1-(z_{i_1}^* z_{i_2})^{n_2-k_2+1}}{1-z_{i_1}^* z_{i_2}}, & \text{if } i_1 \neq i_2, \\ 1, & \text{if } i_1 = i_2. \end{cases}

Denote the smallest singular values of G_L and G_R by σ_min(G_L) and σ_min(G_R). Note that G_L and G_R can be obtained by sampling the 2-D Dirichlet kernel, which is frequently encountered in Fourier analysis. Our incoherence measure is then defined as follows.
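Numerically, G_L is conveniently formed through the identity G_L = E_L^* E_L (Lemma 2 in Section VI), which sidesteps the removable singularities of the geometric-sum formula when two frequency pairs share a coordinate. A sketch, with our own helper names:

```python
# Sketch: build G_L from the frequency pairs and report sigma_min(G_L).
import numpy as np

def gram_GL(freqs, k1, k2):
    y = np.exp(2j * np.pi * freqs[:, 0])
    z = np.exp(2j * np.pi * freqs[:, 1])
    A = y[None, :] ** np.arange(k1)[:, None]                  # k1 x r
    B = z[None, :] ** np.arange(k2)[:, None]                  # k2 x r
    E = (A[:, None, :] * B[None, :, :]).reshape(k1 * k2, -1)  # rows y_i^a z_i^b
    return E.conj().T @ E / (k1 * k2)                         # r x r, unit diagonal

GL = gram_GL(np.random.default_rng(1).random((4, 2)), 6, 6)
print(np.linalg.eigvalsh(GL).min())                           # sigma_min(G_L)
```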

Definition 1 (Incoherence). Let X_e denote the enhanced matrix associated with X, and suppose the SVD of X_e is given by X_e = UΛV^*. Then X is said to have incoherence (µ_1, µ_2, µ_3) if these are respectively the smallest values obeying

σ_min(G_L) ≥ \frac{1}{µ_1}, \quad σ_min(G_R) ≥ \frac{1}{µ_1}; \quad (18)

\max_{(k,l) ∈ [n_1] × [n_2]} \frac{1}{ω_{k,l}^2} \Big| \sum_{(α,β) ∈ Ω_e(k,l)} (U V^*)_{α,β} \Big|^2 ≤ \frac{µ_2 r}{n_1^2 n_2^2}; \quad (19)

∀(k, l) ∈ [n_1] × [n_2]: \quad \sum_{(α,β) ∈ [n_1] × [n_2]} ω_{α,β} \big| \big\langle U U^* A_{(k,l)} V V^*, A_{(α,β)} \big\rangle \big|^2 ≤ \frac{µ_3 r}{n_1 n_2} ω_{k,l}. \quad (20)

There is a large class of spectrally sparse 2-D signals that exhibit good incoherence properties. In this paper, we will always choose k_1 = Θ(n_1), k_2 = Θ(n_2), n_1 − k_1 = Θ(n_1), and n_2 − k_2 = Θ(n_2). To give the reader a flavor of the incoherence conditions, some interpretations are in order, as well as a few examples. In this subsection, we restrict our attention to square matrices, i.e. n := n_1 = n_2. Note that the class of incoherent signals extends far beyond the ones discussed below.

1) Condition (18): This specifies certain incoherence among the locations of the frequency pairs, which does not coincide with and is not subsumed by the separation condition required in [13], [24]. The frequencies satisfying Condition (18) can be either spread out or minimally separated. Two examples are listed as follows.

• Random frequency locations: suppose that the r frequencies are generated uniformly at random; then the minimum pairwise separation can be crudely bounded by Θ(1/(r² log n)). If n ≫ r^{2.5} log n, then a crude bound yields

∀i_1 \neq i_2: \quad \max\left\{ \frac{1}{k_1} \left| \frac{1-(y_{i_1}^* y_{i_2})^{k_1}}{1-y_{i_1}^* y_{i_2}} \right|, \; \frac{1}{k_2} \left| \frac{1-(z_{i_1}^* z_{i_2})^{k_2}}{1-z_{i_1}^* z_{i_2}} \right| \right\} \ll \frac{1}{\sqrt{r}},

indicating that the off-diagonal entries of G_L and G_R are much smaller than 1/r in magnitude. Simple manipulation (e.g. via the Gershgorin circle theorem, since the diagonal entries equal 1) then allows us to conclude that σ_min(G_L) and σ_min(G_R) are bounded below by positive constants.

• Small perturbation off the grid: suppose that all frequencies lie within a distance at most 1/(n r^{1/4}) from some grid points (l_1/n, l_2/n) (0 ≤ l_1, l_2 < n). One can verify that

∀i_1 \neq i_2: \quad \max\left\{ \frac{1}{k_1} \left| \frac{1-(y_{i_1}^* y_{i_2})^{k_1}}{1-y_{i_1}^* y_{i_2}} \right|, \; \frac{1}{k_2} \left| \frac{1-(z_{i_1}^* z_{i_2})^{k_2}}{1-z_{i_1}^* z_{i_2}} \right| \right\} < \frac{1}{2\sqrt{r}},

and hence the magnitudes of all off-diagonal entries of G_L and G_R are no larger than 1/(4r). This immediately reveals that σ_min(G_L) and σ_min(G_R) are lower bounded by 3/4.

2) Condition (19): This can be satisfied when the total energy of each skew diagonal of UV^* is proportional to the dimension of that skew diagonal. This is weaker than the condition introduced in [27] for matrix completion, which requires uniform energy distribution over all entries of UV^*. In general, the matrix UV^* depends jointly on the frequencies and their complex amplitudes. For instance, an ideal µ_2 may arise when the complex phases of all frequencies are generated in some random fashion.

3) Condition (20): This is an incoherence measure based on the (K-fold) Hankel structure, which depends on the energy distribution between A_{(k,l)} and the column and row spaces U, V. The following manipulation may allow the reader to get a flavor of when this incoherence condition may hold.

Let a, b ∈ [n_1] × [n_2]. One can easily verify that

|⟨A_b, U U^* A_a V V^*⟩| ≤ \sqrt{ω_b} \max_{(α,β) ∈ Ω_e(b)} \big| (U U^* A_a V V^*)_{α,β} \big|,

which, through simple manipulation, leads to

\sum_{a ∈ [n_1] × [n_2]} |⟨U U^* A_b V V^*, A_a⟩|^2 ≤ ω_b \sum_{a ∈ [n_1] × [n_2]} \max_{(α,β) ∈ Ω_e(b)} \big| (U U^* A_a V V^*)_{α,β} \big|^2.

Note that ω_a ‖U U^* A_a V V^*‖_F^2 ≤ ‖U U^*‖_F^2 = r. If the energy of U U^* A_a V V^* is spread out, i.e. most entries of ω_a U U^* A_a V V^* on Ω_e(b) are bounded by O(√r / n²), then one can bound

\sum_{a ∈ [n_1] × [n_2]} ω_a |⟨U U^* A_b V V^*, A_a⟩|^2 ≤ ω_b \sum_{a ∈ [n_1] × [n_2]} O\left(\frac{r}{n^4}\right) = ω_b \, O\left(\frac{r}{n^2}\right).

In this situation, we have a good incoherence measure µ_3.


4) Relations among Conditions (18), (19) and (20): Although it is not easy to examine Conditions (19) and (20) directly through the properties of the frequency pairs, we can bound µ_2 and µ_3 using knowledge of µ_1. Their mutual relations are summarized in the following lemma, which implies that µ_2 = O(µ_1² r) and µ_3 = O(µ_1² r).

Lemma 1. Define

c_s := \max\left( \frac{n_1 n_2}{k_1 k_2}, \; \frac{n_1 n_2}{(n_1 - k_1 + 1)(n_2 - k_2 + 1)} \right).

Then the incoherence measures (µ_1, µ_2, µ_3) of X_e satisfy

µ_2 ≤ µ_1^2 c_s^2 r, \quad \text{and} \quad µ_3 ≤ µ_1^2 c_s^2 r.

Proof: See Lemma 2 in Section VI.

B. Theoretical Guarantees

With the above incoherence measures, the main theoretical guarantees are supplied in the following three theorems, each accounting for a distinct data model: (1) noiseless measurements, (2) measurements contaminated by bounded noise, and (3) measurements corrupted by a constant portion of arbitrary outliers.

1) Exact Recovery from Noiseless Measurements: The theorem below shows that exact recovery is possible under mild incoherence conditions.

Theorem 1. Let X be a data matrix with matrix form (2), and Ω the random location set of size m. If all measurements are noiseless, then there exists a constant c_1 > 0 such that under either of the following conditions:

i) (strong incoherence condition) (18), (19) and (20) hold and

m > c_1 \max(µ_1 c_s, µ_2, µ_3 c_s) \, r \log^2(n_1 n_2); \quad (21)

ii) (weak incoherence condition) (18) holds and

m > c_1 µ_1^2 c_s^2 r^2 \log^2(n_1 n_2); \quad (22)

X is the unique solution of EMaC with probability exceeding 1 − (n_1 n_2)^{−2}.

Note that the result under condition ii) is an immediate consequence of that under condition i) by Lemma 1. Theorem 1 states the following: (1) under the strong incoherence condition (i.e. given that (µ_1, µ_2, µ_3) are all constants), perfect recovery is possible as soon as the number of measurements exceeds O(r log²(n_1 n_2)); (2) under the weak incoherence condition (i.e. given only that µ_1 is a constant), exact recovery is possible from O(r² log²(n_1 n_2)) samples. Since there are at least Θ(r) degrees of freedom in total, the lower bound should be no smaller than Θ(r)³. This establishes the orderwise optimality of EMaC under the strong incoherence condition except for a logarithmic gap.

We would like to emphasize that while we assume a random observation model, the conditions imposed on the data model are deterministic. This is different from [13], where randomness is assumed for both the observation model and the data model.

2) Stable Recovery in the Presence of Bounded Noise: Our method enables stable recovery even when the time domain samples are noisy copies of the true data. Here, we say the recovery is stable if the solution of Noisy-EMaC is "close" to the ground truth. To this end, we establish the following theorem, which is a counterpart of Theorem 1 in the noisy setting.

Theorem 2. Suppose X^o is a noisy copy of X that satisfies ‖P_Ω(X − X^o)‖_F ≤ δ. Under the conditions of Theorem 1, the solution \hat{X} to Noisy-EMaC in (14) satisfies

\|\hat{X}_e − X_e\|_F ≤ \left\{ 2\sqrt{n_1 n_2} + 8 n_1 n_2 + \frac{8\sqrt{2}\, n_1^2 n_2^2}{m} \right\} δ \quad (23)

with probability exceeding 1 − (n_1 n_2)^{−2}.

³In fact, the minimum number of measurements may better be thought of as Θ(r log(n_1 n_2)) rather than Θ(r) due to a coupon collector's effect.


Proof: See Appendix J.

Theorem 2 basically implies that the recovered enhanced matrix (which contains Θ(n_1² n_2²) entries) is close to the true enhanced matrix at high signal-to-noise ratio. In particular, the average per-entry inaccuracy is bounded above by O((n_1 n_2 / m) δ). We note that in practice, Noisy-EMaC usually yields a much better estimate, possibly by a polynomial factor. The practical applicability will be illustrated in Section V through numerical examples.

3) Robust Recovery in the Presence of Sparse Outliers: The theoretical performance of Robust-EMaC is summarized in the following theorem.

Theorem 3. Let X be a data matrix with matrix form (2), and Ω a random location set of size m. Set λ = 1/\sqrt{m \log(n_1 n_2)}, and assume that s is some small positive constant. Suppose that the complex signs of S are randomly generated such that E[sgn(S)] = 0. Then there exist constants c_1, c_2 > 0 such that under either of the following conditions:

i) (strong incoherence condition) (18), (19) and (20) hold and

m > c_1 \max(µ_1 c_s, µ_2, µ_3 c_s) \, r \log^3(n_1 n_2), \quad (24)

ii) (weak incoherence condition) (18) holds and

m > c_1 µ_1^2 c_s^2 r^2 \log^3(n_1 n_2), \quad (25)

Robust-EMaC is exact, i.e. the minimizer (\hat{M}, \hat{S}) satisfies \hat{M} = X, with probability exceeding 1 − (n_1 n_2)^{−2}.

Theorem 3 specifies a candidate choice of the regularization parameter λ that allows orderwise optimality; this choice depends only on the size of Ω but is otherwise data-independent. In practice, however, λ may better be selected via cross validation.

Furthermore, Theorem 3 demonstrates the possibility of robust recovery under a constant proportion of sparse corruptions. It states the following: (1) under the strong incoherence condition, robust recovery is possible as soon as the number of measurements exceeds O(r log³(n_1 n_2)); (2) under the weak incoherence condition, robust recovery is possible from O(r² log³(n_1 n_2)) samples. Compared with Theorem 1, an additional O(log(n_1 n_2)) factor arises. We note, however, that these conditions can be refined via finer tuning of the concentration of measure inequalities.

IV. STRUCTURED MATRIX COMPLETION

One problem closely related to our method is the completion of multi-fold Hankel matrices from a small number of entries. While each spectrally sparse signal can be mapped to a low-rank multi-fold Hankel matrix, it is not clear whether all multi-fold Hankel matrices of rank r can be written as the enhanced form of an object with spectral sparsity r. Therefore, one can think of recovery of multi-fold Hankel matrices as a more general problem than the spectral compressed sensing problem. Indeed, Hankel matrix completion has found numerous applications in system identification [32], [33], natural language processing [34], computer vision [35], magnetic resonance imaging [36], etc.

There have been several works concerning algorithms and numerical experiments for Hankel matrix completion [32], [33], [37]. However, to the best of our knowledge, there has been little theoretical guarantee that directly addresses Hankel matrix completion. Our analysis framework can be straightforwardly adapted to general K-fold Hankel matrix completion. Notice that µ_2 and µ_3 are defined using the SVD of X_e in (19) and (20), and we only need to modify the definition of µ_1, as stated in the following theorem.

Theorem 4. Consider a K-fold Hankel matrix X_e of rank r. The bounds in Theorems 1, 2 and 3 continue to hold if the incoherence µ_1 is defined as the smallest number that satisfies

\max_{(k,l) ∈ [n_1] × [n_2]} \max\left\{ \|U U^* A_{(k,l)}\|_F^2, \; \|A_{(k,l)} V V^*\|_F^2 \right\} ≤ \frac{µ_1 c_s r}{n_1 n_2}. \quad (26)

Proof: See Appendix K.

Condition (26) requires that the left and right singular vectors be sufficiently uncorrelated with the observation basis. In fact, condition (26) is a much weaker assumption than (18).

It is worth mentioning that low-rank Hankel matrices can often be converted to low-rank Toeplitz counterparts. Both Hankel and Toeplitz matrices are important forms that capture the underlying harmonic structures. Our results and analysis framework extend to the low-rank Toeplitz matrix completion problem without difficulty.


V. NUMERICAL EXPERIMENTS

In this section, we present numerical examples to evaluate the performance of the EMaC algorithm and its variants under different scenarios. We further examine the application of EMaC to image super resolution. Finally, we propose an extension of the singular value thresholding (SVT) algorithm developed by Cai et al. [38] that exploits the multi-fold Hankel structure to handle larger-scale data sets.

A. Dirichlet Kernel

To better understand the incoherence condition set in the theorems, we define the two-dimensional Dirichlet kernel as

K(f_1, f_2) = \frac{1}{k_1 k_2} \left( \frac{1 − e^{−j2\pi k_1 f_1}}{1 − e^{−j2\pi f_1}} \right) \left( \frac{1 − e^{−j2\pi k_2 f_2}}{1 − e^{−j2\pi f_2}} \right),

where f_1, f_2 ∈ [−1/2, 1/2]. Fig. 1(a) shows the amplitude of K(f_1, f_2) when k = k_1 = k_2 = 6. The (i_1, i_2)-th entry of the matrix G_L is then given as

(G_L)_{i_1,i_2} = K(f_{1i_1} − f_{1i_2}, f_{2i_1} − f_{2i_2}).

The values of the off-diagonal entries decay inversely proportionally to the separation between the frequencies, according to the Dirichlet kernel. We numerically evaluate the minimum eigenvalue of G_L in Fig. 1(b) for different k = 6, 36, 72 when the spikes are randomly generated and the number of spikes is given as the sparsity level. As we increase the dimension k, the minimum eigenvalue of G_L gets closer to one, corroborating our theoretical argument in Section III-A.

[Figure 1 appears here; panel (a) axes: separation on the x axis vs. separation on the y axis; panel (b) axes: sparsity level vs. minimum eigenvalue of G_L, for k = 6, 36, 72.]

Fig. 1. (a) The two-dimensional Dirichlet kernel when k = k_1 = k_2 = 6; (b) the empirical minimum eigenvalue σ_min(G_L) for different k with respect to the sparsity level.

B. Phase Transition in the Noiseless Setting

To evaluate the practical ability of the EMaC algorithm, we conducted a series of numerical experiments to examine the phase transition for exact recovery. A square enhanced form was adopted with n_1 = n_2, which corresponds to the smallest c_s. For each (r, m) pair, 100 Monte Carlo trials were conducted. We generated a spectrally sparse data matrix X by randomly generating r frequency spikes in [0, 1] × [0, 1], and sampled a subset Ω of m entries uniformly at random. The EMaC algorithm was run using the convex programming modeling software CVX with the interior-point solver SDPT3 [39]. Each trial is declared successful if the normalized mean squared error (NMSE) satisfies ‖\hat{X} − X‖_F / ‖X‖_F ≤ 10^{−3}, where \hat{X} denotes the estimate obtained through EMaC. The empirical success rate is calculated by averaging over the 100 Monte Carlo trials.

Fig. 2 illustrates the results of these Monte Carlo experiments when the dimensions of X are 11 × 11 and 15 × 15. The horizontal axis corresponds to the number m of samples revealed to the algorithm, while the vertical axis corresponds to the spectral sparsity level r. The empirical success rate is reflected by the color of each cell. It can be seen from the plot that the number of samples m grows approximately linearly with respect to the spectral sparsity r, and that the slopes of the phase transition lines for the two cases are approximately the same. These observations are in line with our theoretical guarantee in Theorem 1. This phase transition diagram validates the practical applicability of our algorithm in the noiseless setting.
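A sketch of one cell of this experiment, reusing data_matrix() and the emac() sketch from earlier sections (our own driver; 100 trials and the 10^{−3} NMSE test as in the text):

```python
# Sketch: empirical success rate for one (r, m) pair of the phase transition.
import numpy as np

def success_rate(r, m, n1=11, n2=11, k1=6, k2=6, trials=100):
    rng = np.random.default_rng(0)
    hits = 0
    for _ in range(trials):
        X = data_matrix(n1, n2, rng.random((r, 2)),
                        rng.standard_normal(r) + 1j * rng.standard_normal(r))
        mask = np.zeros(n1 * n2)
        mask[rng.choice(n1 * n2, size=m, replace=False)] = 1.0
        mask = mask.reshape(n1, n2)
        X_hat = emac(X * mask, mask, k1, k2)
        hits += np.linalg.norm(X_hat - X) / np.linalg.norm(X) <= 1e-3
    return hits / trials
```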

[Figure 2 appears here; axes: m (number of samples) vs. r (sparsity level), with the empirical success rate color-coded from 0 to 1.]

Fig. 2. Phase transition plots where frequency locations are randomly generated. Plot (a) concerns the case where n_1 = n_2 = 11, whereas plot (b) corresponds to the situation where n_1 = n_2 = 15. The empirical success rate is calculated by averaging over 100 Monte Carlo trials.

C. Stable Recovery from Noisy Data

Fig. 3 further examines the stability of the proposed algorithm by running Noisy-EMaC with different parameters δ on a noise-free dataset of r = 4 complex sinusoids with n_1 = n_2 = 11. The number of random samples is m = 50. The reconstruction NMSE grows approximately linearly with respect to δ, validating the stability of the proposed algorithm.

[Figure 3 appears here; axes: δ (log scale) vs. NMSE (log scale).]

Fig. 3. The reconstruction NMSE with respect to δ for a dataset with n_1 = n_2 = 11, r = 4 and m = 50.

D. Robust Line Spectrum Estimation

Consider the problem of line spectrum estimation, where the time domain measurements are contaminated by a constant portion of outliers. We conducted a series of Monte Carlo trials to illustrate the phase transition for perfect recovery of the ground truth. The true data X is assumed to be a 125-dimensional vector, where the locations of the underlying frequencies are randomly generated. The simulations were carried out again using CVX with SDPT3.

Fig. 4(a) shows the phase transition for robust line spectrum estimation when 10% of the entries are corrupted, which showcases the tradeoff between the number m of measurements and the recoverable spectral sparsity level r. One can see from the plot that m is approximately linear in r on the phase transition curve even when 10% of the measurements are corrupted, which validates our finding in Theorem 3. Fig. 4(b) illustrates the success rate of exact recovery when we obtain samples at all entry locations. This plot illustrates the tradeoff between the spectral sparsity level and the number of outliers when all entries of the corrupted X^o are observed. It can be seen that there is a large region where exact recovery can be guaranteed, demonstrating the power of our algorithms in the presence of sparse outliers.

[Figure 4 appears here; panel (a) axes: m (number of samples) vs. r (sparsity level); panel (b) axes: r (spectral sparsity level) vs. s (number of outliers).]

Fig. 4. Robust line spectrum estimation where mode locations are randomly generated: (a) phase transition plot when n = 125 and 10% of the entries are corrupted; the empirical success rate is calculated by averaging over 100 Monte Carlo trials. (b) Phase transition plot when n = 125 and all the entries are observed; the empirical success rate is calculated by averaging over 20 Monte Carlo trials.

E. Synthetic Super Resolution

The proposed EMaC algorithm works beyond the random observation model in Theorem 1. Fig. 5 considers a synthetic super resolution example motivated by [24], where the ground truth in Fig. 5(a) contains 6 point sources with constant amplitude. The low-resolution observation in Fig. 5(b) is obtained by measuring the low-frequency components [−f_lo, f_lo] of the ground truth. Due to the large width of the associated point-spread function, both the locations and the amplitudes of the point sources are distorted in the low-resolution image.

We apply EMaC to extrapolate the high-frequency components up to [−f_hi, f_hi], where f_hi / f_lo = 2. The reconstruction in Fig. 5(c) is obtained by directly applying the inverse Fourier transform to the extrapolated spectrum, so as to avoid parameter estimation such as the number of modes. The resolution is greatly enhanced compared with Fig. 5(b), suggesting that EMaC is a promising approach for super resolution tasks.

F. Singular Value Thresholding for EMaC

The above Monte Carlo experiments were conducted using the advanced semidefinite programming solver SDPT3. This and many other popular solvers (e.g. SeDuMi) are based on interior point methods, which are typically inapplicable to large-scale data. In fact, SDPT3 fails to handle an n × n data matrix when n exceeds 19, which corresponds to a 100 × 100 enhanced matrix.

One alternative for large-scale data is first-order algorithms tailored to matrix completion problems, e.g. the singular value thresholding (SVT) algorithm [38]. We propose a modified SVT algorithm in Algorithm 1 to exploit the Hankel structure.

In particular, two operators are defined as follows:


[Figure 5 appears here: (a) ground truth, (b) low-resolution observation, (c) high-resolution reconstruction.]

Fig. 5. A synthetic super resolution example, where the observation (b) is taken from the low-frequency components of the ground truth in (a), and the reconstruction (c) is done via the inverse Fourier transform of the extrapolated high-frequency components.

Algorithm 1 Singular Value Thresholding for EMaC.
Input: the observed data matrix X^o on the location set Ω.
Initialize: let X^o_e denote the enhanced form of P_Ω(X^o); set M_0 = X^o_e and t = 0.
repeat
  1) Q_t ← D_{τ_t}(M_t)
  2) M_t ← H_{X^o}(Q_t)
  3) t ← t + 1
until convergence
Output: \hat{X} as the data matrix with enhanced form M_t.

• D_{τ_t}(·) in Algorithm 1 denotes the singular value shrinkage operator. Specifically, if the SVD of X is given by X = UΣV^* with Σ = diag({σ_i}), then

D_{τ_t}(X) := U \, \mathrm{diag}\left( \{(σ_i − τ_t)_+\} \right) V^*,

where τ_t > 0 is the soft-thresholding level.
• In the K-dimensional frequency model, H_{X^o}(Q_t) denotes the projection of Q_t onto the subspace of enhanced matrices (i.e. K-fold Hankel matrices) that are consistent with the observed entries.

Fig. 6 illustrates the performance of Algorithm 1. We generated a true 101 × 101 data matrix X through asuperposition of 30 random complex sinusoids, and revealed 5.8% of the total entries (i.e. m = 600) uniformly atrandom. The noise was i.i.d. Gaussian giving a signal-to-noise amplitude ratio of 10. The reconstructed vectorizedsignal is superimposed on the ground truth in Fig. 6. The normalized reconstruction error was

∥∥∥X −X∥∥∥

F/ ‖X‖F =

0.1098, validating the stability of our algorithm in the presence of noise.

VI. PROOF OF THEOREM 1

EMaC is similar in spirit to the well-known matrix completion algorithms [27], [28], except that we impose Hankel and multi-fold Hankel structures on the matrices. While [28] has presented a general sufficient condition for exact recovery (see [28, Theorem 3]), the basis in our case does not exhibit the good coherence properties required in [28], and hence these results cannot yield useful estimates in our framework. Nevertheless, the beautiful golfing scheme introduced in [28] lays the foundation of our analysis in the sequel.

For concreteness, the analysis in this paper focuses on recovering harmonically sparse signals as stated in Theorem 1, since proving Theorem 1 is slightly more involved than proving Theorem 4. We note, however, that our analysis


[Figure 6 appears here; axes: time (vectorized) vs. amplitude, with the true and reconstructed signals superimposed.]

Fig. 6. The performance of SVT for Noisy-EMaC for a 101 × 101 data matrix that contains 30 random frequency spikes. 5.8% of all entries (m = 600) are observed with signal-to-noise amplitude ratio 10. Here, τ_t = 0.1 σ_max(M_t) / ⌈t/10⌉ empirically. For concreteness, the reconstructed data against the true data for the first 100 time instances (after vectorization) are plotted.

already entails all the reasoning required for Theorem 4. In this section, we restrict our attention to the real case (i.e. X is real-valued) for simplicity⁴.

Before proceeding to the proof, we would first like to stress that the incoherence measures (µ_1, µ_2, µ_3) are not independent. In addition to (µ_1, µ_2, µ_3), we define another measure µ_4 as the smallest number that satisfies

∀b ∈ [n_1] × [n_2]: \quad \sum_{a ∈ [n_1] × [n_2]} ω_a |⟨P_T A_b, A_a⟩|^2 ≤ \frac{µ_4 r}{n_1 n_2} ω_b. \quad (27)

Some of their mutual connections are listed as follows.

Lemma 2. Suppose that X_e has incoherence (µ_1, µ_2, µ_3, µ_4). We have the following.
1) G_L = E_L^* E_L, and G_R = (E_R E_R^*)^T;
2) For any a, b ∈ [n_1] × [n_2], one has

|⟨A_b, P_T A_a⟩| ≤ \sqrt{\frac{ω_b}{ω_a}} \cdot \frac{3 µ_1 c_s r}{n_1 n_2}; \quad (28)

3) The incoherence measures satisfy

µ_2 ≤ µ_1^2 c_s^2 r, \quad µ_3 ≤ µ_1^2 c_s^2 r, \quad (29)

and

µ_4 ≤ 9 µ_1^2 c_s^2 r; \quad (30)

4) The measure µ_4 can be bounded by µ_1 and µ_3 as follows:

µ_4 ≤ 6 µ_1 c_s + 3 µ_3 c_s.

Proof: See Appendix A.

Note that the above lemma indicates that our new incoherence measure µ_4 can be bounded by the sum of µ_1 and µ_3 up to some multiplicative constant. In fact, we will prove instead the following theorem based on (µ_1, µ_2, µ_4), which is slightly more general than Theorem 1.

Theorem 5. Suppose that X has incoherence measures (µ_1, µ_2, µ_3, µ_4). If

m > c_0 \max(µ_1 c_s, µ_2, µ_4) \, r \log^2(n_1 n_2),

then X is the unique solution of EMaC with probability exceeding 1 − (n_1 n_2)^{−2}.

Note that Theorem 1 can be delivered as an immediate consequence of Theorem 5 by exploiting the relations among (µ_1, µ_2, µ_3, µ_4) given in Lemma 2.

⁴All of the proof arguments in this paper work for Hermitian structured matrices (e.g. Hermitian Toeplitz matrices), and apply also to the non-Hermitian complex case with some minor adjustment in the constants (see, e.g., [28, Section III.D]).


A. Dual Certification

Denote by A_{(k,l)}(M) the projection of M onto the subspace spanned by A_{(k,l)}, and define the projection operator onto the space spanned by all the A_{(k,l)}'s, together with its orthogonal complement, as

A := \sum_{(k,l) ∈ [n_1] × [n_2]} A_{(k,l)}, \quad \text{and} \quad A^⊥ = I − A. \quad (31)

Here, {A^⊥(M)} spans a [k_1 k_2 (n_1 − k_1 + 1)(n_2 − k_2 + 1) − n_1 n_2]-dimensional subspace.

There are two common ways to describe the randomness of Ω: one corresponds to sampling without replacement, and the other concerns sampling with replacement (i.e. Ω contains m indices {a_i ∈ [n_1] × [n_2] : 1 ≤ i ≤ m} that are i.i.d. generated). As discussed in [28, Section II.A], while both situations result in the same orderwise bounds, the latter admits a simpler analysis due to independence. Therefore, throughout the proofs of this paper we will assume that Ω is a multiset (possibly with repeated elements) whose elements a_i are independently and uniformly distributed, and define the associated operator as

A_Ω := \sum_{i=1}^{m} A_{a_i}. \quad (32)

We also define another projection operator A'_Ω similar to (32), but with the sum extending only over distinct samples. Its complement operator is defined as A'_{Ω^⊥} := A − A'_Ω. Note that A_Ω(M) = 0 is equivalent to A'_Ω(M) = 0.

With these definitions, EMaC can be rewritten as the following general matrix completion problem:

\underset{M}{\text{minimize}} \quad \|M\|_* \quad (33)
\text{subject to} \quad A'_Ω(M) = A'_Ω(X_e),
\qquad\qquad\;\; A^⊥(M) = A^⊥(X_e) = 0.

To prove that the convex program admits exact recovery, it suffices to produce an appropriate dual certificate, as stated in the following lemma.

Lemma 3. Let Ω be a location set that contains m random indices, and suppose that the sampling operator A_Ω obeys

\left\| P_T A P_T − \frac{n_1 n_2}{m} P_T A_Ω P_T \right\| ≤ \frac{1}{2}. \quad (34)

If there exists a matrix W that obeys

A'_{Ω^⊥}(U V^* + W) = 0, \quad (35)

\|P_T(W)\|_F ≤ \frac{1}{2 n_1^2 n_2^2}, \quad (36)

and

\|P_{T^⊥}(W)\| ≤ \frac{1}{2}, \quad (37)

then X_e is the unique optimizer of (33) or, equivalently, X is the unique minimizer of EMaC.

Proof: See Appendix B.

Condition (34) will be analyzed in Section VI-B, while a valid certificate W will be constructed in Section VI-C. These are the objectives of the remaining part of this section.

B. Deviation of ‖P_T A P_T − (n_1 n_2 / m) P_T A_Ω P_T‖

Lemma 3 requires that A_Ω be sufficiently incoherent with respect to T. The following lemma quantifies the projection of each A_{(k,l)} onto the tangent space T.

Lemma 4. Suppose that (18) holds. Then

\|U U^* A_{(k,l)}\|_F^2 ≤ \frac{µ_1 c_s r}{n_1 n_2}, \quad \|A_{(k,l)} V V^*\|_F^2 ≤ \frac{µ_1 c_s r}{n_1 n_2}, \quad \|P_T(A_{(k,l)})\|_F^2 ≤ \frac{2 µ_1 c_s r}{n_1 n_2} \quad (38)

for all (k, l) ∈ [n_1] × [n_2].

Page 17: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

16

Proof: See Appendix C.As long as (38) holds, the deviation of PTAΩPT can be bounded reasonably well in the following lemma. This

establishes Condition (34) required by Lemma 3.

Lemma 5. Suppose that ∥∥PT (A(k,l)

)∥∥2

F≤ 2µ1csr

n1n2,

for (k, l) ∈ [n1]× [n2]. Then for any small constant δ ≤ 2, one has∥∥∥n1n2

mPTAΩPT − PTAPT

∥∥∥ ≤ δ (39)

with probability exceeding 1− 2n1n2 exp(− δ2m

16µ1csr

).

Proof: See Appendix D.The above two lemmas taken collectively lead to the following fact: for any given constant ε < e−1 <

12 ,∥∥n1n2

m PTAΩPT − PTAPT∥∥ ≤ ε holds with probability exceeding 1 − (n1n2)−4, provided that m >

c1µ1csr log (n1n2) for some constant c1 > 0.

C. Construction of Dual Certificate

Now we are in a position to construct the dual certificate, for which we will employ the golfing scheme introducedin [28]. Suppose that we generate j0 independent random location multisets Ωi (1 ≤ i ≤ j0), each containing m

j0i.i.d. samples. This way the distribution of Ω is the same as Ω1 ∪ Ω2 ∪ · · · ∪ Ωj0 . Note that Ωi’s correspond tosampling with replacement. Let ρ := m

n1n2and q := ρ

j0denote the undersampling factors of Ω and Ωi, respectively.

Consider a small constant ε < 1e , and choose j0 := 3 log 1

εn1n2. The construction of the dual then proceeds as

follows:

Construction of a dual certificate W via the golfing scheme.1. Set B0 = 0, and j0 := 3 log 1

ε(n1n2).

2. For all i (1 ≤ i ≤ j0), let Bi = Bi−1 +(

1qAΩi +A⊥

)PT (UV ∗ −Bi−1) .

3. Set W := − (UV ∗ −Bj0).

We will establish that W is a valid dual certificate if we can show that W satisfies the conditions stated inLemma 3, which we will verify step by step.

First, by construction, we have the identities(A′Ω +A⊥

)(Bi) = Bi,

for all 1 ≤ i ≤ j0. Since UV ∗ + W = Bj0 , this validates that A′Ω⊥ (UV ∗ + W ) = 0, as required in (35).Secondly, if one defines the deviation of PTBi from UV ∗ as

F i := UV ∗ −Bi,

and hence W = F j0 , then one can verify that

PT (F i) = PT (UV ∗)− PT(Bi−1 +

(1

qAΩi +A⊥

)PT (UV ∗ −Bi−1)

)=

(PT − PT

(1

qAΩi +A⊥

)PT)

(F i−1) .

Lemma 5 asserts the following: if qn1n2 ≥ c1µ1csr log (n1n2) or, equivalently, m ≥ c1µ1csr log2(n1n2), then withoverwhelming probability one has∥∥∥∥PT − PT (1

qAΩi +A⊥

)PT∥∥∥∥ =

∥∥∥∥PTAPT − 1

qPTAΩiPT

∥∥∥∥ ≤ ε < 1

2.

Page 18: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

17

This allows us to bound ‖PT (F i)‖F as follows

‖PT (F i)‖F ≤ εi ‖PT (F 0)‖F ≤ ε

i ‖UV ∗‖F = εi√r,

which immediately validates Condition (36):

‖PT (W )‖F = ‖PT (F j0)‖F ≤ εj0√r <

1

2n21n

22

.

Finally, it remains to show that ‖PT⊥ (W )‖ ≤ 12 . For any F ∈ T , define the following homogeneity measure

ν (F ) = max(k,l)∈[n1]×[n2]

1

ωk,l

∣∣⟨A(k,l),F⟩∣∣2 , (40)

which largely relies on the average per-entry energy in each skew diagonal. We would like to show thatν((I − PT

(1qAΩi +A⊥

))F)≤ 1

4ν (F ) with high probability. This is supplied in the following lemma.

Lemma 6. Consider any given F ∈ T , and suppose that (18) and (27) hold. If the following bound holds,

m > c7 max µ4, µ1cs r log2 (n1n2) ,

then one has

ν

((PT − PT

(1

qAΩi +A⊥

)PT)F

)≤ 1

4ν (F ) (41)

for all 1 ≤ i ≤ j0 with probability exceeding 1− (n1n2)−3.

Proof: See Appendix E.This lemma basically indicates that a homogeneous F with respect to the observation basis typically results in

a homogeneous(PT − PT

(1qAΩi +A⊥

)PT)

(F ), and hence we can hope that the homogeneity condition (19)of F 0 can carry over to every PT (F i) (1 ≤ i ≤ j0).

Observe that Condition (19) is equivalent to saying

ν (F 0) = max(k,l)∈[n1]×[n2]

1

ωk,l

∣∣⟨A(k,l),UV ∗⟩∣∣2 = max

(k,l)∈[n1]×[n2]

1

ω2k,l

∣∣∣∣∣∣∑

(α,β)∈Ωe(k,l)

(UV ∗)α,β

∣∣∣∣∣∣2

≤ µ2r

(n1n2)2 .

One can then verify that for every i (0 ≤ i ≤ j0),

ν (PT (F i)) ≤1

4ν (PT (F i−1)) ≤

(1

4

)iν (F 0) ≤

(1

4

)i µ2r

(n1n2)2

holds with high probability if m > c7 max µ4, µ1cs r log2 (n1n2) for some constant c7 > 0.The following lemma then relates the homogeneity measure with

∥∥∥PT⊥ (1qAΩi +A⊥

)(F i)

∥∥∥.

Lemma 7. For any given F ∈ T such that ν(F ). Then there exist positive constants c8 and c9 such that for anyt ≤

√ν(F )n1n2, ∥∥∥∥PT⊥ (1

qAΩi +A⊥

)(F )

∥∥∥∥ > t

holds with probability at most c8 exp(− c9qt2

ν(F )n1n2

).

Proof: See Appendix F.Since ν(F i) ≤

(14

)i µ2rn2

1n22

for all 1 ≤ i ≤ j0 with high probability, then one can bound√ν(F i)n1n2√

16µ2r≤(

1

2

)i+2

.

Page 19: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

18

Lemma 7 immediately yields that for all i (0 ≤ i ≤ j0)

P

∀i :

∥∥∥∥PT⊥ (1

qAΩ +A⊥

)(F i)

∥∥∥∥ ≤ (1

2

)i+2≥ P

∀i :

∥∥∥∥PT⊥ (1

qAΩ +A⊥

)(F i)

∥∥∥∥ ≤√ν(F i)n1n2√

16µ2r

≥ 1− c8n1n2 exp

(−c9qn1n2

16µ2r

)≥ 1− c8 (n1n2)−4 ,

holds if qn1n2 > c12 max (µ1cs, µ4, µ2) r log (n1n2) for some constant c12 > 0. This is also equivalent to

m > c13 max (µ1cs, µ4, µ2) r log2 (n1n2)

for some constant c13 > 0. Under this condition, we can conclude

‖PT⊥ (W )‖ ≤j0∑i=0

∥∥∥∥PT⊥ (1

qAΩ +A⊥

)(F i)

∥∥∥∥≤

j0∑i=0

(1

2

)i+2

<1

2.

So far, we have successfully established that with high probability, W is a valid dual certificate, and henceEMaC admits perfect reconstruction of X .

VII. PROOF OF THEOREM 3

The algorithm Robust-EMaC has similar spirit as the well-known robust principal component analysis [10], [29]that seeks a decomposition of low-rank plus sparse matrices, except that we impose multi-fold Hankel structures onboth the low-rank and sparse matrices. Similar to the proof for Theorem 1, the proof is based on duality analysis,and we rely on the golfing scheme introduced in [28] to construct a valid dual certificate.

In this section, we prove the results for a slightly different sampling model as follows.• The location multiset Ωclean of observed uncorrupted entries is generated by sampling (1− s) ρn1n2 i.i.d.

entries uniformly at random.• The location multiset Ω of observed entries is generated by sampling ρn1n2 i.i.d. entries uniformly at random,

with the first (1− s) ρn1n2 entries coming from Ωclean.• The location set Ωdirty of observed corrupted entries is given by Ω′\Ωclean′ , where Ω′ and Ωclean′ denote the

sets of distinct entry locations in Ω and Ωclean, respectively.As mentioned in the proof of Theorem 1, this slightly different sampling model, while resulting in the sameorder-wise bounds, significantly simplifies the analysis due to the independent assumptions.

Similar to the proof for the noiseless setting, we will prove our results for the real case under the incoherencemeasures (µ1, µ2, µ4) as introduced in Section VI. Moreover, we will prove the theorem under a stronger randomnesscondition, that is, the signs of all non-zero entries of S are independent zero-mean random variables. Specifically,we will prove the following theorem.

Theorem 6. Suppose that X has incoherence measure (µ1, µ2, µ3, µ4), and let λ = 1√m log(n1n2)

. Assume that s

is some small positive constant, and that the signs of nonzero entries of S are independently generated with mean0. If

m > c0 max (µ1cs, µ2, µ4) r log3 (n1n2) ,

then Robust-EMaC succeeds in recovering X with probability exceeding 1− (n1n2)−2.

A simple derandomization argument introduced in [29, Section 2.2] immediately suggests that the performanceof Robust-EMaC under the fixed-sign pattern is no worse than that under the random-sign pattern with sparsityparameter 2s. Therefore, we will focus on the random sign pattern, which are much easier to analyze.

The same argument as in Section VI indicates that Theorem 3 can be delivered as an immediate consequence ofTheorem 6.

Page 20: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

19

A. Dual Certification

We adopt similar notations as in Section VI-A. That said, if we generate ρn1n2 i.i.d. entry locations ai’suniformly at random, and let the multisets Ω and Ωclean contain respectively ai|1 ≤ i ≤ ρn1n2 and ai|1 ≤ i ≤ρ(1− s)n1n2), then

AΩ :=

ρn1n2∑i=1

Aai , and AΩclean :=

ρ(1−s)n1n2∑i=1

Aai ,

corresponding to sampling with replacement. Besides, A′Ω (resp. A′Ωclean) is defined similar to AΩ (resp. AΩclean),but with the sum extending only over distinct samples.

We will establish that exact recovery can be guaranteed, if we can produce a valid dual certificate as follows.

Lemma 8. Suppose that s is some small positive constant. Suppose that the associated sampling operator AΩclean

obeys ∥∥∥∥PTAPT − 1

ρ (1− s)PTAΩcleanPT

∥∥∥∥ ≤ 1

2, (42)

and‖AΩclean (M)‖F ≤ 10 log (n1n2)

∥∥A′Ωclean (M)∥∥

F, (43)

for any matrix M . If there exist a regularization parameter λ (0 < λ < 1) and a matrix W obeying‖PT (W + λsgn (Se)−UV ∗)‖F ≤

λn2

1n22,

‖PT⊥ (W + λsgn (Se))‖ ≤ 14 ,

A′(Ωclean)⊥

(W ) = 0,∥∥A′Ωclean (W )∥∥∞ ≤

λ4 ,

(44)

then Robust-EMaC is exact, i.e. the minimizer(M , S

)satisfies M = X .

Proof: See Appendix G.We note that a good bound on

∥∥∥PTAPT − 1ρ(1−s)PTAΩcleanPT

∥∥∥ can be found using Lemma 5. Specifically, thereexists some constant c1 > 0 such that if ρ (1− s)n1n2 > c1µ1csr log (n1n2), then one has∥∥∥∥PTAPT − 1

ρ (1− s)PTAΩcleanPT

∥∥∥∥ ≤ 1

2

with probability exceeding 1 − (n1n2)−4. Besides, a simple Chernoff bound [40] indicates that with probabilityexceeding 1− (n1n2)−3, none of the entries is sampled more than 10 log (n1n2) times. Equivalently,

P(∀M : ‖AΩclean (M)‖F ≤ 10 log (n1n2)

∥∥A′Ωclean (M)∥∥

F

)≥ 1− (n1n2)−3.

Our objective in the remaining part of the section is to produce a matrix W satisfying Condition (44).

B. Construction of Dual Certificate

Suppose that we generate j0 independent random location multisets Ωcleanj , where Ωclean

j contains qn1n2 i.i.d.samples uniformly at random. Here, we set q := (1−s)ρ

j0. This way the distribution of the multiset Ω is the same as

Ωclean1 ∪ Ωclean

2 ∪ · · · ∪ Ωcleanj0

.We now propose constructing a dual certificate W as follows:

Construction of a dual certificate W via the golfing scheme.1. Set F 0 = PT (UV ∗ − λsgn (Se)), and j0 := 5 log 1

εn1n2.

2. For every i (1 ≤ i ≤ j0), let F i :=(PTAPT − 1

qPTAΩcleaniPT)F i−1.

3. Set W :=∑j0

j=1

(1qAΩclean

j+A⊥

)F j−1.

Page 21: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

20

Take λ = 1√m log(n1n2)

. We will justify that W is a valid dual certificate, by examining the conditions in (44)

step by step.(1) The first condition requires ‖PT (W + λsgn (Se)−UV ∗)‖F to be reasonably small. Lemma 5 asserts that

there exist some constants c1, c1 > 0 such that if m = ρn1n2 > c1µ1csr log2 (n1n2) or, equivalently, qin1n2 >c1µ1csr log2 (n1n2), then

‖PTF j0‖F ≤1

ε‖PTF j0−1‖F ≤ · · · ≤

(1

ε

)j0‖PTF 0‖F

≤ 1

n51n

52

(‖UV ∗‖F + λ ‖sgn (Se)‖F) (45)

≤ 1

n51n

52

(√r + λn1n2

)<

1

n51n

52

(n1n2 + λn1n2) λ

n21n

22

(46)

with probability exceeding 1− (n1n2)−3.It remains to relate PT (W + λsgn (Se)−UV ∗) with PTF j0 . By construction,

−PT (W + λsgn (Se)−UV ∗) = PTF 0 −j0∑i=1

PT(

1

qAΩclean

i+A⊥

)PTF i−1

= PTF 0 − PT(

1

qAΩclean

1+A⊥

)PTF 0 −

j0∑i=2

PT(

1

qAΩclean

i+A⊥

)F i−1

= PTF 1 −j0∑i=2

PT(

1

qAΩclean

i+A⊥

)F i−1

= · · · = PTF j0 .

Plugging this into (46) establishes that

‖PT (W + λsgn (Se)−UV ∗)‖F = ‖PTF j0‖F ≤λ

n21n

22

. (47)

(2) The second component of the proof is to develop a bound on ‖PT⊥ (W + λsgn (Se))‖, for which we proceedby bounding ‖PT⊥ (W )‖ and ‖PT⊥ (λsgn (Se))‖ separately. Recall that an inhomogenuity measure ν (F ) applied toa matrix F ∈ T is defined in (40). The following lemma develops upper bounds on ν (F i) for every i (0 ≤ i < j0).

Lemma 9. If λ = 1√m log(n1n2)

, then there exists constants c7, c9 > 0 such that if m >

c7 max (µ1cs, µ4) r log2 (n1n2), for any 0 ≤ i ≤ j0,

ν (F i) ≤c9 max (µ1cs, µ2, µ4) r

4in21n

22

with probability exceeding 1− n−31 n−3

2 .

Proof: See Appendix H.Now we are ready to develop an upper bound on ‖PT⊥ (W )‖. Observe from the construction procedure of W

that

‖PT⊥ (W )‖ =

∥∥∥∥∥j0∑i=1

PT⊥(

1

qAΩclean

i+A⊥

)F i−1

∥∥∥∥∥≤

j0∑i=1

∥∥∥∥PT⊥ (1

qAΩclean

i+A⊥

)F i−1

∥∥∥∥ . (48)

Combining Lemma 7 and Lemma 9 with some simple manipulations yields the following: if m >c13 max (µ1cs, µ2, µ4) r log2 (n1n2), then one has

P

∀i :

∥∥∥∥PT⊥ (1

qAΩclean

i+A⊥

)F i−1

∥∥∥∥ ≤ (1

2

)i+4≥ 1− c8

n41n

42

.

Page 22: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

21

Consequently, (48) can be further bounded by

‖PT⊥ (W )‖ ≤j0∑i=0

(1

2

)i+4

<1

8(49)

with probability exceeding 1− c8n4

1n42.

We still need to bound ‖PT⊥ (λsgn (Se))‖, which is supplied in the following lemma.

Lemma 10. Consider λ = 1√m log(n1n2)

, and suppose that s is a small constant. If m >

c7 max (µ1cs, µ4) r log2 (n1n2) , we can bound

‖sgn (Se)‖ ≤√c22ρsn1n2 log

1

2 (n1n2) and ‖PT⊥ (λsgn (Se))‖ ≤1

8(50)

with probability at least 1− (n1n2)−5.

Proof: See Appendix I.Putting (49) and (50) together yields

‖PT⊥ (W + λsgn (Se))‖ ≤ ‖PT⊥ (W )‖+ ‖PT⊥ (λsgn (Se))‖ ≤1

4

with high probability.(3) By construction, A′

(Ωclean)⊥(W ) = 0.

(4) The last step is to bound∥∥A′Ωclean (W )

∥∥∞, which is apparently bounded above by ‖AΩclean (W )‖∞. By

construction, one can express

‖AΩclean (W )‖∞ =

∥∥∥∥∥j0∑i=1

AΩclean

(1

qAΩclean

i+A⊥

)F i−1

∥∥∥∥∥∞

=

∥∥∥∥∥j0∑i=1

1

qAΩclean

iF i−1

∥∥∥∥∥∞

≤j0∑i=1

1

qmax

(k,l)∈[n1]×[n2]

∣∣⟨A(k,l),F i−1

⟩∣∣√ωk,l

=

j0∑i=1

1

q

√ν (F i−1)

≤j0∑i=1

5 log (n1n2)

ρ

√max (µ1cs, µ2, µ4) r

4i−1n21n

22

≤j0−1∑i=0

1

2i

√25 max (µ1cs, µ2, µ4) r log2 (n1n2)

m2

<1

4

√1

m log (n1n2)=λ

4,

when m > c25 max (µ1cs, µ2, µ4) r log3 (n1n2) for some constant c25 > 0.We have verified that W satisfies the four conditions required in (44), and is hence a valid dual certificate. This

completes the proof.

VIII. CONCLUDING REMARKS

We present an efficient nonparametric algorithm to estimate a spectrally sparse object from its partial time-domainsamples, which poses spectral compressed sensing as a low-rank Hankel structured matrix completion problem.Under mild incoherence conditions, our algorithm enables recovery of the multi-dimensional unknown frequencieswith infinite precision, which remedies the basis mismatch issue that arises in conventional CS paradigms. We haveshown both theoretically and numerically that our algorithm is stable against bounded noise and a constant portionof arbitrary corruptions, and can be extended to tasks such as super resolution. To the best of our knowledge, ourresult on Hankel matrix completion is also the first theoretical guarantee that is close to the information-theoreticallimit (up to a logarithmic factor).

Page 23: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

22

Our results are based on uniform random observation models. In particular, this paper considers directly takinga random subset of the time domain samples, it is also possible to take a random set of linear mixtures of the timedomain samples, as in the renowned CS setting [7]. This again can be translated into taking linear measurementsof the low-rank K-fold Hankel matrix, given as y = B(Xe). Unfortunately, due to the Hankel structures, B doesnot satisfy the matrix restricted isometry property (RIP) [31], [41]. Nonetheless, the technique developed in thispaper can be extended without difficulty to analyze linear measurements, in a similar flavor of a golfing schemedeveloped for CS in [22].

It remains to be seen whether it is possible to obtain performance guarantees of the proposed EMaC algorithmsimilar to that in [24] for super resolution. It is also of great interest to develop efficient numerical methods tosolve the EMaC algorithm in order to handle massive datasets.

APPENDIX APROOF OF LEMMA 2

(1) We first show that E∗LEL and ERE∗R coincide with the matrices GL and GT

R. Since Y d is a diagonal matrix,one can verify the identities (

Y l∗d Z∗LZLY

ld

)i1,i2

=(y∗i1yi2

)l(Z∗LZL)i1,i2 ,

and

(Z∗LZL)i1,i2 =

k2−1∑k=0

(z∗i1zi2

)k=

1−(z∗i1zi2)

k2

1−z∗i1zi2, if i1 6= i2,

k2, if i1 = i2,

which immediately give

E∗LEL =1

k1k2

[Z∗L,Y

∗dZ∗L, · · · , (Y ∗d)k1−1 Z∗L

]ZL

ZLY d...

ZLYk1−1d

=1

k1k2

k1−1∑l=0

Y l∗d Z∗LZLY

ld

=1

k1k2

((k1−1∑l=0

(y∗i1yi2

)l)(Z∗LZL)i1,i2

)1≤i1,i2≤r

=1

k1k2

(1−

(y∗i1yi2

)k11− y∗i1yi2

1−(z∗i1zi2

)k21− z∗i1zi2

)1≤i1,i2≤r

with the convention that1−(y∗i1yi1)

k1

1−y∗i1yi1= k1 and

1−(z∗i1zi1)k2

1−z∗i1zi1= k2. That said, all diagonal entries satisfy

(E∗LEL)i1,i1 = 1, and the magnitude of off-diagonal entries can be calculated as∣∣∣(E∗LEL)i1,i2

∣∣∣ =

∣∣∣∣ sin [πk1 (f1i1 − f1i2)]

k1 sin [π (f1i1 − f1i2)]

sin [πk2 (f2i1 − f2i2)]

k2 sin [π (f2i1 − f2i2)]

∣∣∣∣ .Recall that this exactly coincides with the definition of GL. Similarly, GR = (ERE

∗R)T . These findings immediately

yield

σmin (E∗LEL) ≥ 1

µ1, and σmin (ERE

∗R) ≥ 1

µ1. (51)

(2) Consider the case in which we only know σmin (GL) ≥ 1µ1

and σmin (GR) ≥ 1µ1

. In fact, since |〈Ab,PTAa〉| =|〈PTAb,Aa〉|, we only need to examine the situation where ωb < ωa.

Observe that

|〈Ab,PTAa〉| ≤ |〈Ab,UU∗Aa〉|+ |〈Ab,AaV V ∗〉|+ |〈Ab,UU∗AaV V ∗〉| .

Page 24: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

23

Owing to the multi-fold Hankel structure of Aa, the matrix UU∗√ωaAa consists of ωa columns of UU∗. Since

there are only ωb nonzero entries in Ab each of magnitude 1√ωb

, we can derive

|〈Ab,UU∗Aa〉| ≤ ‖Ab‖1 ‖UU∗Aa‖∞ = ωb ·1√ωb·maxα,β

∣∣∣(UU∗Aa)α,β

∣∣∣≤√ωb

ωamaxα,β

∣∣∣(UU∗)α,β

∣∣∣ .Denote by M∗k and Mk∗ the kth column and kth row of M , respectively, then it can be observed that each

entry of UU∗ is bounded in magnitude by∣∣∣(UU∗)k,l

∣∣∣ =

∣∣∣∣(EL (E∗LEL)−1 E∗L

)k,l

∣∣∣∣ =∣∣∣(EL)k∗ (E∗LEL)−1 ((EL)l∗)

∗∣∣∣

≤ ‖(EL)k∗‖F ‖(EL)l∗‖F

∥∥∥(E∗LEL)−1∥∥∥

≤ r

k1k2

1

σmin (E∗LEL)≤ µ1csr

n1n2, (52)

which immediately implies that

|〈Ab,UU∗Aa〉| ≤√ωb

ωa

µ1csr

n1n2. (53)

Similarly, one can derive

|〈Ab,AaV V ∗〉| ≤√ωb

ωa

µ1csr

n1n2. (54)

We still need to bound the magnitude of 〈UU∗AaV V ∗,Ab〉. One can observe that for any 1 ≤ k ≤ k1k2:

‖(UU∗)k∗‖F ≤∥∥∥(EL)k∗ (E∗LEL)−1 E∗L

∥∥∥F

≤ ‖(EL)k∗‖F

∥∥∥(E∗LEL)−1 E∗L

∥∥∥ ≤√ r

k1k2· 1√

σmin (E∗LEL)

≤√

csr

n1n2σmin (E∗LEL).

Similarly, for any 1 ≤ l ≤ (n1 − k1 + 1) (n2 − k2 + 1), one has ‖(V V ∗)∗l‖F ≤√

csrn1n2σmin(E∗LEL) . The magnitude

of all entries of UU∗AaV V ∗ can now be bounded by

maxk,l

∣∣∣(UU∗AaV V ∗)k,l

∣∣∣ ≤ ‖Aa‖maxk‖(UU∗)k∗‖F max

l‖(V V ∗)∗l‖F

≤ 1√ωa

csr

n1n2σmin (E∗LEL)

≤ 1√ωa

µ1csr

n1n2.

Since Ab has only ωb nonzero entries each has magnitude 1√ωb

, one can verify that

|〈UU∗AaV V ∗,Ab〉| ≤(

maxk,l

∣∣∣(UU∗AaV V ∗)k,l

∣∣∣) · 1√ωbωb =

√ωb

ωa

µ1csr

n1n2. (55)

The above bounds (53), (54) and (55) taken together lead to

|〈Ab,PTAa〉| ≤ |〈UU∗Aa,Ab〉|+ |〈AaV V ∗,Ab〉|+ |〈UU∗AaV V ∗,Ab〉|

≤√ωb

ωa

3µ1csr

n1n2. (56)

Page 25: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

24

(3) On the other hand, the bound on |〈Ab,PTAa〉| immediately leads the following upper bounds on∑a |〈UU∗AaV V ∗,Ab〉|2 ωa and

∑a |〈PTAb,Aa〉|2 ωa:∑

a∈[n1]×[n2]

|〈UU∗AaV V ∗,Ab〉|2 ωa

≤∑

a∈[n1]×[n2]

(√ωb

ωa

µ1csr

n1n2

)2

ωa = ωb

∑a∈[n1]×[n2]

(µ1csr

n1n2

)2

=ωbµ2

1c2s r

2

n1n2

which simply come from the inequality (55), and∑a∈[n1]×[n2]

|〈PTAb,Aa〉|2 ωa

≤∑

a∈[n1]×[n2]

(√ωb

ωa

3µ1csr

n1n2

)2

ωa = ωb

∑a∈[n1]×[n2]

(3µ1csr

n1n2

)2

= ωb9µ2

1c2s r

2

n1n2,

which is an immediate consequence of (56). These bounds indicate that µ3 ≤ µ21c

2s r and µ4 ≤ 9µ2

1c2s r.

We can also obtain an upper bound on µ2 through µ1 as follows. Observe that there exists a unitary matrix Bsuch that

UV ∗ = EL (E∗LEL)−1

2 B (ERE∗R)−

1

2 ER.

For any (k, l) ∈ [n1]× [n2], we can then bound∣∣∣(UV ∗)k,l

∣∣∣ =

∣∣∣∣(EL (E∗LEL)−1

2 B (ERE∗R)−

1

2 ER

)k,l

∣∣∣∣≤ ‖(EL)k∗‖F

∥∥∥(E∗LEL)−1

2

∥∥∥ ‖B‖∥∥∥(E∗RER)−1

2

∥∥∥ ‖(ER)∗l‖F

≤√

r

k1k2µ1

√r

(n1 − k1 + 1) (n2 − k2 + 1)

≤ µ1csr

n1n2.

Since A(k,l) has only ωk,l nonzero entries each of magnitude 1√ωk,l

, this leads to

1

ω2k,l

∣∣∣∣∣∣∑

(α,β)∈Ωe(k,l)

(UV ∗)α,β

∣∣∣∣∣∣2

=1

ωk,l

∣∣⟨UV ∗,A(k,l)

⟩∣∣2≤ 1

ωk,l

(maxk,l

∣∣∣(UV ∗)k,l

∣∣∣) 1√ωk,l· ωk,l

2

≤(

maxk,l

∣∣∣(UV ∗)k,l

∣∣∣)2

≤ µ21c

2s r

r

n21n

22

,

which indicates that µ2 ≤ µ21c

2s r.

(4) Finally, we split∑

a∈[n1]×[n2]

∣∣⟨PTAb,√ωaAa

⟩∣∣2 as follows∑a∈[n1]×[n2]

|〈PTAb,√ωaAa〉|2 =

∑a∈[n1]×[n2]

|〈(PU + PV − PUPV )Ab,√ωaAa〉|2

≤ 3∑

a∈[n1]×[n2]

|〈PUAb,

√ωaAa〉|2 + |〈PVAb,

√ωaAa〉|2 + |〈PUPVAb,

√ωaAa〉|2

Page 26: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

25

Now look at∑

a

∣∣⟨PUAb,√ωaAa

⟩∣∣2 =∑

a

∣∣⟨UU∗Ab,√ωaAa

⟩∣∣2. We know that

‖UU∗Ab‖2F ≤µ1csr

n1n2, (57)

and that UU∗Ab has ωb non-zero columns, or,

UU∗Abcolumn permutation

=1√ωb

Ub︸︷︷︸ωb columns

,0

, (58)

and hence⟨UU∗Ab,

√ωaAa

⟩is simply the sum of all entries of UU∗Ab lying in the set Ωe(a). Since there are

at most ωb nonzero entries (due to the above structure of UU∗Ab) in each sum, we can bound

|〈UU∗Ab,√ωaAa〉|2 =

∣∣∣∣∣∣∑

(α,β)∈Ωe(a)

(UU∗Ab)α,β

∣∣∣∣∣∣2

≤ ωb

∑(α,β)∈Ωe(a)

∣∣∣(UU∗Ab)α,β

∣∣∣2using the inequality (

∑ωb

i=1 xi)2 ≤ ωb

∑ωb

i=1 x2i . This then gives∑

a∈[n1]×[n2]

|〈UU∗Ab,√ωaAa〉|2 ≤ ωb

∑a∈[n1]×[n2]

∑(α,β)∈Ωe(a)

∣∣∣(UU∗Ab)α,β

∣∣∣2≤ ωb ‖UU∗Ab‖2F ≤ ωb

µ1csr

n1n2,

where the last inequality follows from Lemma 4. Similarly, one has∑a∈[n1]×[n2]

|〈AbV V ∗,√ωaAa〉|2 ≤ ωb

µ1csr

n1n2.

To summarize, ∑a∈[n1]×[n2]

|〈PTAb,√ωaAa〉|2

≤ 3∑

a∈[n1]×[n2]

|〈PUAb,

√ωaAa〉|2 + |〈PVAb,

√ωaAa〉|2 + |〈PUPVAb,

√ωaAa〉|2

≤ 6µ1csωbr

n1n2+

3µ3csωbr

n1n2.

APPENDIX BPROOF OF LEMMA 3

Consider any valid perturbation H obeying PΩ (X + H) = PΩ (X), and denote by He the enhanced formof H . We note that the constraint requires A′Ω (He) = 0 (or AΩ (He) = 0) and A⊥ (He) = 0. In addition, setW 0 = PT⊥ (B) for any B that satisfies 〈B,PT⊥ (He)〉 = ‖PT⊥ (He)‖∗ and ‖B‖ ≤ 1. Therefore, W 0 ∈ T⊥ and‖W 0‖ ≤ 1, and hence UV ∗ + W 0 is a subgradient of the nuclear norm at Xe. We will establish this lemma byconsidering two scenarios separately.

(1) Consider first the case in which He satisfies

‖PT (He)‖F ≤n2

1n22

2‖PT⊥ (He)‖F . (59)

Since UV ∗ + W 0 is a subgradient of the nuclear norm at Xe, it follows that

‖Xe + He‖∗ ≥ ‖Xe‖∗ + 〈UV ∗ + W 0,He〉= ‖Xe‖∗ + 〈UV ∗ + W ,He〉+ 〈W 0,He〉 − 〈W ,He〉

= ‖Xe‖∗ +⟨(A′Ω +A⊥

)(UV ∗ + W ) ,He

⟩+ 〈W 0,He〉 − 〈W ,He〉 (60)

≥ ‖Xe‖∗ + ‖PT⊥ (He)‖∗ − 〈W ,He〉 (61)

Page 27: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

26

where (60) holds from (35), and (61) follows from the property of W 0 and the fact that(A′Ω +A⊥

)(He) = 0.

The last term of (61) can be bounded as

〈W ,He〉 = 〈PT (W ) ,He〉+ 〈PT⊥ (W ) ,He〉≤ ‖PT (W )‖F ‖PT (He)‖F + ‖PT⊥ (W )‖ ‖PT⊥ (He)‖∗≤ 1

2n21n

22

‖PT (He)‖F +1

2‖PT⊥ (He)‖∗ ,

where the last inequality follows from the assumptions (36) and (37). Plugging this into (61) yields

‖Xe + He‖∗ ≥ ‖Xe‖∗ −1

2n21n

22

‖PT (He)‖F +1

2‖PT⊥ (He)‖∗

≥ ‖Xe‖∗ −1

4‖PT⊥ (He)‖F +

1

2‖PT⊥ (He)‖F (62)

≥ ‖Xe‖∗ +1

4‖PT⊥ (He)‖F

where (62) follows from the inequality ‖M‖∗ ≥ ‖M‖F and (59). Therefore, Xe is the minimizer of EMaC.We still need to prove the uniqueness of the minimizer. The inequality (62) implies that ‖Xe + He‖∗ = ‖Xe‖∗

only when ‖PT⊥ (He)‖F = 0. If ‖PT⊥ (He)‖F = 0, then ‖PT (He)‖F ≤n2

1n22

2 ‖PT⊥ (He)‖F = 0, and hencePT⊥ (He) = PT (He) = 0, which only occurs when He = 0. Hence, Xe is the unique minimizer in this situation.

(2) On the other hand, consider the complement scenario where the following holds

‖PT (He)‖F ≥n2

1n22

2‖PT⊥ (He)‖F . (63)

We would first like to bound∥∥(n1n2

m AΩ +A⊥)PT (He)

∥∥F and

∥∥(n1n2

m AΩ +A⊥)PT⊥ (He)

∥∥F. The former term

can be lower bounded by∥∥∥(n1n2

mAΩ +A⊥

)PT (He)

∥∥∥2

F

=⟨(n1n2

mAΩ +A⊥

)PT (He) ,

(n1n2

mAΩ +A⊥

)PT (He)

⟩=⟨n1n2

mAΩPT (He) ,

n1n2

mAΩPT (He)

⟩+⟨A⊥PT (He) ,A⊥PT (He)

⟩≥⟨PT (He) ,

n1n2

mAΩPT (He)

⟩+⟨PT (He) ,A⊥PT (He)

⟩(64)

=⟨PT (He) ,PT

(n1n2

mAΩ +A⊥

)PT (He)

⟩= 〈PT (He) ,PT (He)〉+

⟨PT (He) ,

(n1n2

mPTAΩPT − PTAPT

)PT (He)

⟩≥‖PT (He)‖2F −

∥∥∥PTAPT − n1n2

mPTAΩPT

∥∥∥ ‖PT (He)‖2F (65)

≥(

1−∥∥∥PTAPT − n1n2

mPTAΩPT

∥∥∥) ‖PT (He)‖2F

≥ 1

2‖PT (He)‖2F . (66)

On the other hand, since the operator norm of any projection operator is bounded above by 1, one can verifythat ∥∥∥n1n2

mAΩ +A⊥

∥∥∥ ≤ n1n2

m

(∥∥∥Aa1+A⊥

∥∥∥+

m∑i=2

‖Aai‖

)≤ n1n2,

where ai (1 ≤ i ≤ m) are m uniform random indices that form Ω. This implies the following bound:∥∥∥(n1n2

mAΩ +A⊥

)PT⊥ (He)

∥∥∥F≤ n1n2 ‖PT⊥ (He)‖F ≤

2

n1n2‖PT (He)‖F , (67)

Page 28: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

27

where the last inequality arises from our assumption. Combining this with the above two bounds yields

0 =∥∥∥(n1n2

mAΩ +A⊥

)(He)

∥∥∥F≥∥∥∥(n1n2

mAΩ +A⊥

)PT (He)

∥∥∥F−∥∥∥(n1n2

mAΩ +A⊥

)PT⊥ (He)

∥∥∥F

≥√

1

2‖PT (He)‖F −

2

n1n2‖PT (He)‖F

≥ 1

2‖PT (He)‖F ≥

n21n

22

4‖PT⊥ (He)‖F ≥ 0,

which immediately indicates PT⊥ (He) = 0 and PT (He) = 0. Hence, (63) can only hold when He = 0.

APPENDIX CPROOF OF LEMMA 4

By definition, we have the identities∥∥PT (A(k,l)

)∥∥2

F=⟨PT(A(k,l)

),A(k,l)

⟩=⟨PU(A(k,l)

)+ PV

(A(k,l)

)− PUPV

(A(k,l)

),A(k,l)

⟩=∥∥PU (A(k,l)

)∥∥2

F+∥∥PV (A(k,l)

)∥∥2

F−∥∥PUPV (A(k,l)

)∥∥2

F.

Since U (resp. V ) and EL (resp. ER) determine the same column (resp. row) space, we can write

UU∗ = EL (E∗LEL)−1 E∗L and V V ∗ = E∗R (ERE∗R)−1 ER,

and thus ∥∥PT (A(k,l)

)∥∥2

F≤∥∥PU (A(k,l)

)∥∥2

F+∥∥PV (A(k,l)

)∥∥2

F

≤∥∥∥EL (E∗LEL)−1 E∗LA(k,l)

∥∥∥2

F+∥∥∥A(k,l)E

∗R (ERE

∗R)−1 ER

∥∥∥2

F

≤ 1

σmin (E∗LEL)

∥∥E∗LA(k,l)

∥∥2

F+

1

σmin (ERE∗R)

∥∥A(k,l)E∗R

∥∥2

F.

Note that √ωk,lE∗LA(k,l) consists of ωk,l columns of E∗L (and hence it contains rωk,l nonzero entries in total).Owing to the fact that each entry of E∗L has magnitude 1√

k2k2, one can derive∥∥E∗LA(k,l)

∥∥2

F=

1

ωk,l· rωk,l ·

1

k1k2=

r

k1k2≤ rcs

n1n2.

A similar argument yields ∥∥A(k,l)E∗R

∥∥2

F≤ csr

n1n2.

We know from Lemma 2 that E∗LEL = GL and ERE∗R = GT

L , and hence σmin (E∗LEL) ≥ 1µ1

and σmin (ERE∗R) ≥

1µ1

. One can, therefore, conclude that for every (k, l) ∈ [n1]× [n2],∥∥PU (A(k,l)

)∥∥2

F≤ µ1csr

n1n2,∥∥PV (A(k,l)

)∥∥2

F≤ µ1csr

n1n2, and

∥∥PT (A(k,l)

)∥∥2

F≤ 2µ1csr

n1n2. (68)

APPENDIX DPROOF OF LEMMA 5

Define a family of operators

∀(k, l) ∈ [n1]× [n2] : Z(k,l) :=n1n2

mPTA(k,l)PT −

1

mPTAPT .

We can also compute

PTA(k,l)PT (M) = PT⟨

A(k,l),PTM⟩A(k,l)

= PT

(A(k,l)

) ⟨PT(A(k,l)

),M

⟩, (69)

Page 29: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

28

and hence (PTA(k,l)PT

)2(M) =

[PTA(k,l)PT

PT(A(k,l)

)] ⟨PT(A(k,l)

),M

⟩= PT

⟨A(k,l),PT

(A(k,l)

)⟩A(k,l)

⟨PT(A(k,l)

),M

⟩=⟨A(k,l),PT

(A(k,l)

)⟩PT(A(k,l)

) ⟨PT(A(k,l)

),M

⟩. (70)

Comparing (69) and (70) gives(PTA(k,l)PT

)2=⟨A(k,l),PT

(A(k,l)

)⟩PTA(k,l)PT ≤

2µ1csr

n1n2PTA(k,l)PT , (71)

where the inequality follows from our assumption that⟨A(k,l),PT

(A(k,l)

)⟩=∥∥PT (A(k,l)

)∥∥2

F≤ 2µ1csr

n1n2.

Let ai (1 ≤ i ≤ m) be m independent random pairs uniformly chosen from [n1]×[n2], then we have E (Zai) = 0.This further gives

EZ2ai = E

(n1n2

mPTAaiPT

)2−(E(n1n2

mPTAaiPT

))2

=n2

1n22

m2E (PTAaiPT )2 − 1

m2(PTAPT )2 ,

We can then bound the operator norm as

∥∥E (Z2ai

)∥∥ ≤ ∥∥∥∥n21n

22

m2E (PTAaiPT )2

∥∥∥∥+1

m2

∥∥∥(PTAPT )2∥∥∥

≤ n21n

22

m2

∥∥∥E (PTAaiPT )2∥∥∥+

1

m2

≤ n21n

22

m2

2µ1csr

n1n2‖E (PTAaiPT )‖+

1

m2(72)

=2µ1csrn1n2

m2

1

n1n2‖PTAPT ‖+

1

m2

≤ 4µ1csr

m2:= V0, (73)

where (72) uses the fact that PTAaiPT 0. Besides, the first equality of (71) gives∥∥PTA(k,l)PT

∥∥2 ≤∥∥PTA(k,l)

∥∥2

F

∥∥PTA(k,l)PT∥∥ and hence

∥∥PTA(k,l)PT∥∥ ≤ ∥∥PTA(k,l)

∥∥2

F, which immediately yields

‖Zai‖ ≤n1n2

m‖PTAaiPT ‖+

1

m‖PTAPT ‖ ≤

n1n2

m‖PTAai‖

2F +

1

m<

4µ1csr

m.

This together with (73) gives2mV0

‖Zai‖≥ 2.

Applying the Operator Bernstein Inequality [28, Theorem 6] yields that for any t ≤ 2, we have

P

(∥∥∥∥∥m∑i=1

Zai

∥∥∥∥∥ > t

)≤ 2n1n2 exp

(− t2

16µ1csrm

).

Finally, one can observe that∑m

i=1Zai is equivalent to n1n2

m PTAΩPT−PTAPT in distribution, which completesthe proof.

Page 30: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

29

APPENDIX EPROOF OF LEMMA 6

Fix any b ∈ [n1]× [n2]. For any a ∈ [n1]× [n2], define

za =1

qn1n2〈Ab,PTAF 〉 −

⟨Ab,

1

qPTAa

⟩〈Aa,F 〉 .

Then for any i.i.d. αi’s chosen uniformly at random from [n1]× [n2], we can easily check that E (zαi) = 0. Definea multiset Ωl := αi | 1 ≤ i ≤ qn1n2, then the decomposition

AΩlF =

qn1n2∑i=1

Aαi 〈Aαi ,F 〉

allows us to derive

〈Ab,PTAΩlF 〉 =

⟨Ab,

qn1n2∑i=1

PTAαi 〈Aαi ,F 〉

⟩,

and thusqn1n2∑i=1

zαi = 〈Ab,PTAF 〉 −qn1n2∑i=1

⟨Ab,

1

qPTAαi

⟩〈Aαi ,F 〉

= 〈Ab,PTAF 〉 −1

q〈Ab,PTAΩlF 〉

=

⟨Ab,

(PTAPT −

1

qPTAΩlPT

)F

⟩.

Owing to the fact that Ezαi = 0, we can bound the variance of each term as follows

E |zαi |2 = Var

(⟨Ab,

1

qPTAαi

⟩〈Aαi ,F 〉

)≤ E

∣∣∣∣⟨Ab,1

qPTAαi

⟩〈Aαi ,F 〉

∣∣∣∣2=

1

n1n2

∑a∈[n1]×[n2]

∣∣∣∣⟨Ab,1

qPTAa

⟩〈Aa,F 〉

∣∣∣∣2≤ 1

q2

ν (F )

n1n2

∑a∈[n1]×[n2]

|〈PTAb,Aa〉|2 ωa

≤ µ4rν(F )

(qn1n2)2ωb,

where the last inequality arises from the definition of µ4, i.e. for every b ∈ [n1]× [n2],∑a∈[n1]×[n2]

|〈PTAb,Aa〉|2 ωa ≤µ4r

n1n2ωb. (74)

This immediately gives

1

ωbE

(qn1n2∑i=1

|zαi |2

)≤ µ4rν(F )

qn1n2≤ max µ4, 3µ1cs rν(F )

qn1n2:= V.

On the other hand, Lemma 2 shows the inequality

|〈Ab,PTAa〉| ≤√ωb

ωa

3µ1csr

n1n2, (75)

Page 31: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

30

which further leads to

1√ωb

∣∣∣∣⟨Ab,1

qPTAa

⟩〈Aa,F 〉

∣∣∣∣ ≤√ωaν (F )1√ωbq|〈Ab,PTAa〉|

≤√ν (F )

1

q

3µ1csr

n1n2.

Since 1qn1n2

〈Ab,PTAF 〉 = E⟨Ab,

1qPTAαi

⟩〈Aαi ,F 〉, one has as well

1√ωb

∣∣∣∣ 1

qn1n2〈Ab,PTAF 〉

∣∣∣∣ =1√ωb

∣∣∣∣E⟨Ab,1

qPTAa

⟩〈Aa,F 〉

∣∣∣∣ ≤√ν (F )1

q

3µ1csr

n1n2,

which immediately leads to

1√ωb|zαi | ≤

1√ωb

∣∣∣∣ 1

qn1n2〈Ab,PTAF 〉

∣∣∣∣+1√ωb

∣∣∣∣⟨Ab,1

qPTAαi

⟩〈Aαi ,F 〉

∣∣∣∣≤√ν (F )

1

q

6µ1csr

n1n2

The above bounds indicate that2V

1√ωb|za|≥√ν(F ).

Applying the operator Bernstein inequality [28, Theorem 6] yields for any t < ν (F ),

P

1

ωb

∣∣∣∣∣qn1n2∑i=1

zαi

∣∣∣∣∣2

> t

≤ c6 exp

(− tqn1n2

4 max µ4, 3µ1cs rν(F )

).

Thus, there are some constants c7, c7 > 0 such that whenever qn1n2 > c7 max µ4, 3µ1cs r log (n1n2) or,equivalently, m > c7 max µ4, 3µ1cs r log2 (n1n2), we have

P

(|∑qn1n2

i=1 zαi |2

ωb>

1

4ν(F )

)≤ c6 exp

(− qn1n2

16 max µ4, 3µ1cs rν(F )

)≤ 1

(n1n2)4 .

Finally, we observe that in distribution,

v

((PTAPT −

1

qPTAΩlPT

)F

)= max

b∈[n1]×[n2]

|∑qn1n2

i=1 zαi |2

ωb.

Applying a simple union bound over all b ∈ [n1]× [n2] allows us to derive (41).

APPENDIX FPROOF OF LEMMA 7

For any a ∈ [n1]× [n2], define

Ha =1

qPT⊥ (Aa) 〈Aa,F 〉+

1

qn1n2PT⊥A⊥ (F ) .

Let αi (1 ≤ i ≤ qn1n2) be independently and uniformly drawn from [n1]× [n2] which forms Ωl. Observing that

AF =∑

a∈[n1]×[n2]

Aa 〈Aa,F 〉 ,

we can writePT⊥AF =

∑a∈[n1]×[n2]

PT⊥ (Aa) 〈Aa,F 〉 .

Page 32: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

31

This immediately gives

EHαi =1

qn1n2PT⊥A⊥ (F ) +

1

qn1n2

∑a∈[n1]×[n2]

PT⊥ (Aa) 〈Aa,F 〉

=1

qn1n2PT⊥A⊥ (F ) +

1

qn1n2PT⊥A (F )

=1

qn1n2PT⊥ (F ) = 0.

Moreover, we have, in distribution, the following identity

PT⊥(

1

qAΩl +A⊥

)(F ) =

qn1n2∑i=1

Hαi .

On the other hand, since EHαi = 0, if we denote Yi = 1qPT⊥ (Aαi) 〈Aαi ,F 〉, then Hαi = Yi−EYi, and hence

EHαiH∗αi = E (Yi − EYi) (Yi − EYi)∗ ≤ EYiY∗i =

1

q2n1n2

∑a∈[n1]×[n2]

|〈Aa,F 〉|2 PT⊥ (Aa) (PT⊥ (Aa))∗ .

The definition of the spectral norm ‖M‖ := maxψ:‖ψ‖2=1 〈ψ,Mψ〉 allows us to bound

∥∥E (HαiH∗αi

)∥∥ ≤ 1

q2max

ψ:‖ψ‖2=1

1

n1n2

∑a∈[n1]×[n2]

|〈Aa,F 〉|2 〈ψ,PT⊥ (Aa) (PT⊥ (Aa))∗ ψ〉

≤ 1

q2n1n2ν(F ) max

ψ:‖ψ‖2=1

⟨ψ,

∑a∈[n1]×[n2]

ωaPT⊥ (Aa) (PT⊥ (Aa))∗

ψ

≤ 1

q2n1n2ν(F )

∑a∈[n1]×[n2]

ωa ‖Aa‖2 max

ψ:‖ψ‖2=1〈ψ,ψ〉

≤ ν(F )

q2,

where the last inequality uses the fact that ‖Aa‖2 = 1ωa

. Therefore,∥∥∥∥∥E(qn1n2∑i=1

HαiH∗αi

)∥∥∥∥∥ ≤ ν(F )n1n21

q:= V.

.Besides, the definition (40) of ν(F ) allows us to bound∥∥∥∥1

qPT⊥ (Aa) 〈Aa,F 〉

∥∥∥∥ ≤√ν (F )ωa1

q‖Aa‖ =

√ν (F )

1

q.

The fact that EHαi = 0 yields∥∥∥∥ 1

qn1n2PT⊥A⊥ (F )

∥∥∥∥ =

∥∥∥∥E1

qPT⊥ (Aαi) 〈Aαi ,F 〉

∥∥∥∥ ≤√ν (F )1

q,

and hence

‖Hαi‖ ≤∥∥∥∥ 1

qn1n2PT⊥A⊥ (F )

∥∥∥∥+

∥∥∥∥1

qPT⊥ (Aαi) 〈Aαi ,F 〉

∥∥∥∥ ≤ 2√ν(F )

q.

Applying the Operator Bernstein inequality [28, Theorem 6] yields that for any t ≤√ν(F )n1n2, we have∥∥∥PT⊥ (n1n2

mAΩ +A⊥

)(F )

∥∥∥ > t

with probability at most c8 exp(− c9qt2

ν(F )n1n2

)for some positive constants c8 and c9.

Page 33: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

32

APPENDIX GPROOF OF LEMMA 8

Suppose there is a non-zero perturbation (H,T ) such that (X + H,S + T ) is the optimizer of Robust-EMaC.One can easily verify that PΩ⊥ (S + T ) = 0, otherwise we can always set S + T as PΩ (S + T ) to yield a betterestimate. This together with the fact that PΩ⊥ (S) = 0 implies that PΩ (T ) = T . Observe that the constraints ofRobust-EMaC indicate

PΩ (X + S) = PΩ (X + H + S + T ) ⇒ PΩ (H + T ) = 0,

which is equivalent to requiring A′Ω (He) = −A′Ω (T e) = −T e and A⊥ (He) = 0.Recall that He and Se are the enhanced forms of H and S, respectively. Set W 0 ∈ T⊥ to be a matrix satisfying

〈W 0,PT⊥ (He)〉 = ‖PT⊥ (He)‖∗ and ‖W 0‖ ≤ 1, then UV ∗ + W 0 is a subgradient of the nuclear norm at Xe.This gives

‖Xe + He‖∗ ≥ ‖Xe‖∗ + 〈UV ∗ + W 0,He〉= ‖Xe‖∗ + 〈UV ∗,He〉+ ‖PT⊥ (He)‖∗ . (76)

Owing to the fact that support (S) ⊆ Ωdirty, one has Se = A′Ωdirty (Se). Combining this and the fact thatsupport (Se + T e) ⊆ Ω yields

‖Se + T e‖1 =∥∥A′Ωclean(T e)

∥∥1

+∥∥Se +A′Ωdirty (T e)

∥∥1,

which further gives

‖Se + T e‖1 − ‖Se‖1 =∥∥A′Ωclean (T e)

∥∥1

+∥∥Se +A′Ωdirty (T e)

∥∥1− ‖Se‖1

≥∥∥A′Ωclean (T e)

∥∥1

+⟨sgn (Se) ,A′Ωdirty (T e)

⟩(77)

=∥∥A′Ωclean (T e)

∥∥1−⟨sgn (Se) ,A′Ωdirty (He)

⟩(78)

=∥∥A′Ωclean (T e)

∥∥1−⟨A′Ωdirty (sgn (Se)) ,He

⟩=∥∥A′Ωclean (He)

∥∥1− 〈sgn (Se) ,He〉 . (79)

Here, (77) follows from the fact that sgn(Se) is the subgradient of ‖·‖1 at Se, and (78) arises from the identityPΩdirty (H + T ) = 0 and hence A′Ωdirty (He) = −A′Ωdirty (T e). The inequalities (76) and (79) taken collectively leadto

‖Xe + He‖∗ + λ ‖Se + T e‖1 − (‖Xe‖∗ + λ ‖Se‖1)

≥ 〈UV ∗,He〉+ ‖PT⊥ (He)‖∗ + λ∥∥A′Ωclean (He)

∥∥1− λ 〈sgn (Se) ,He〉

≥ − 〈λsgn (Se)−UV ∗,He〉+ ‖PT⊥ (He)‖∗ + λ∥∥A′Ωclean (He)

∥∥1. (80)

It remains to show that the right-hand side of (80) cannot be negative. For a dual matrix W satisfying Conditions(44), one can derive

〈λsgn (Se)−UV ∗,He〉= 〈W + λsgn (Se)−UV ∗,He〉 − 〈W ,He〉= 〈PT (W + λsgn (Se)−UV ∗) ,PT (He)〉+ 〈PT⊥ (W + λsgn (Se)−UV ∗) ,PT⊥ (He)〉

−⟨A′Ωclean (W ) ,A′Ωclean (He)

⟩−⟨A′

(Ωclean)⊥(W ) ,A′

(Ωclean)⊥(He)

⟩.

≤ λ

n21n

22

‖PT (He)‖F +1

4‖PT⊥ (He)‖∗ +

λ

4

∥∥A′Ωclean (He)∥∥

1 , (81)

where the last inequality follows from the four properties of W in (44). Since (X + H,S + T ) is assumed to bethe optimizer, substituting (81) into (80) then yields

0 ≥‖Xe + He‖∗ + λ ‖Se + T e‖1 − (‖Xe‖∗ + λ ‖Se‖1) (82)

≥ 3

4‖PT⊥ (He)‖∗ +

3

4λ∥∥A′Ωclean (He)

∥∥1− λ

n21n

22

‖PT (He)‖F

≥ 3

4‖PT⊥ (He)‖∗ +

3

4λ∥∥A′Ωclean (He)

∥∥F −

λ

n21n

22

‖PT (He)‖F , (83)

Page 34: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

33

where (83) arises due to the inequality ‖M‖F ≤ ‖M‖1.The invertibility condition (42) on PTAΩcleanPT is equivalent to∥∥∥∥PT − PT ( 1

ρ (1− s)AΩclean +A⊥

)PT∥∥∥∥ ≤ 1

2,

indicating that

1

2‖PT (He)‖F ≤

∥∥∥∥PT ( 1

ρ (1− s)AΩclean +A⊥

)PT (He)

∥∥∥∥F≤ 3

2‖PT (He)‖F .

One can, therefore, bound ‖PT (He)‖F as follows

‖PT (He)‖F ≤ 2

∥∥∥∥PT ( 1

ρ (1− s)AΩclean +A⊥

)PT (He)

∥∥∥∥F

≤ 2

ρ (1− s)‖PTAΩcleanPT (He)‖F + 2

∥∥∥PTA⊥PT (He)∥∥∥

F

≤ 2

ρ (1− s)(‖PTAΩclean (He)‖F + ‖PTAΩcleanPT⊥ (He)‖F)

+ 2∥∥∥PTA⊥ (He)

∥∥∥F

+ 2∥∥∥PTA⊥PT⊥ (He)

∥∥∥F

≤ 2

ρ (1− s)(‖AΩclean (He)‖F + ‖AΩcleanPT⊥ (He)‖F) + 2 ‖PT⊥ (He)‖F , (84)

where the last inequality exploit the facts that A⊥ (He) = 0 and ‖PT (M)‖F ≤ ‖M‖F.Recall that AΩclean corresponds to sampling with replacement. Condition (43) together with (84) leads to

‖PT (He)‖F ≤20 log (n1n2)

ρ (1− s)(∥∥A′Ωclean (He)

∥∥F +

∥∥A′ΩcleanPT⊥ (He)∥∥

F

)+ 2 ‖PT⊥ (He)‖F

≤ 20 log (n1n2)

ρ (1− s)∥∥A′Ωclean (He)

∥∥F +

(20 log (n1n2)

ρ (1− s)+ 2

)‖PT⊥ (He)‖F

≤ 20 log (n1n2)

ρ (1− s)∥∥A′Ωclean (He)

∥∥F +

(20 log (n1n2)

ρ (1− s)+ 2

)‖PT⊥ (He)‖∗ , (85)

where the last inequality follows from the fact that ‖M‖F ≤ ‖M‖∗. Substituting (85) into (83) yields(3

4− λ

n21n

22

(20 log (n1n2)

ρ (1− s)+ 2

))‖PT⊥ (He)‖∗ + λ

(3

4− 20 log (n1n2)

ρ (1− s)n21n

22

)∥∥A′Ωclean (He)∥∥

F ≤ 0. (86)

Since λ < 1 and ρn21n

22 log (n1n2), both terms on the left-hand side of (86) are positive. This can only occur

whenPT⊥ (He) = 0 and A′Ωclean (He) = 0. (87)

(1) Consider first the situation where

‖PT (He)‖F ≤n2

1n22

2‖PT⊥ (He)‖F . (88)

One can immediately see that

‖PT (He)‖F ≤n2

1n22

2‖PT⊥ (He)‖F = 0

⇒ PT (He) = PT⊥ (He) = 0; ⇒ He = 0.

That said, Robust-EMaC succeeds in finding Xe under Condition (88).(2) Consider instead the complement situation where

‖PT (He)‖F >n2

1n22

2‖PT⊥ (He)‖F .

Note that A′Ωclean(He) = A⊥(He) = 0 and∥∥∥PTAPT − 1

ρ(1−s)PTAΩcleanPT∥∥∥ ≤ 1

2 . Using the same argument as inthe proof of Lemma 3 (see the second part of Appendix B) with Ω replaced by Ωclean, we can conclude He = 0.

Page 35: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

34

APPENDIX HPROOF OF LEMMA 9

By definition of ν(·), we can bound

ν (F 0) = max(k,l)∈[n1]×[n2]

1

ωk,l

∣∣⟨A(k,l),UV ∗ − λPT sgn (Se)⟩∣∣2

≤ max(k,l)∈[n1]×[n2]

2

ωk,l

∣∣⟨A(k,l),UV ∗⟩∣∣2 +

2

ωk,lλ2∣∣⟨A(k,l),PT (sgn (Se))

⟩∣∣2≤ 2µ2r

n21n

22

+ max(k,l)∈[n1]×[n2]

2λ2

ωk,l

∣∣⟨A(k,l),PT (sgn (Se))⟩∣∣2 , (89)

where the last inequality follows from the definition of µ2 in (19). In order to develop a good bound on ν (F 0),we still need to bound the component

∣∣⟨A(k,l),PT (sgn (Se))⟩∣∣. This is achieved through the following lemma.

Lemma 11. Suppose that s is a positive constant. If ρ > c21max(µ4,µ1cs)r

n1n2log (n1n2), one has

max(k,l)∈[n1]×[n2]

1

ωk,l

∣∣⟨A(k,l),PT (sgn (Se))⟩∣∣2 ≤ c20

ρsmax (µ4, µ1cs) r log (n1n2)

n1n2

with probability at least 1− (n1n2)−4.

By setting λ := 1√n1n2ρ log(n1n2)

, we can derive from Lemma 11 that

max(k,l)∈[n1]×[n2]

2

ωk,lλ2∣∣⟨A(k,l),PT (sgn (Se))

⟩∣∣2 < c20max (µ4, µ1cs) r

n21n

22

,

where we also use the assumption s ≤ 1/2. This together with (89) leads to

ν (F 0) ≤ c9 max (µ1cs, µ2, µ4) r

n21n

22

for some c9 > 0. Applying Lemma 6 then shows that

ν (F i) ≤1

4iν (F 0) ≤ c9 max (µ1cs, µ4) r

4in21n

22

(90)

with high probability, as long as m > c7 max (µ1cs, µ4) r log2 (n1n2) for some c7 > 0.Finally, the proof of Lemma 11 proceeds as follows.

Proof: By definition, Ωdirty is the set of distinct locations that appear in Ω but not in Ωclean. To simplify theanalysis, we introduce an auxiliary multiset Ωdirty that contains ρsn1n2 i.i.d. entries. Specifically, suppose that Ω =ai | 1 ≤ i ≤ ρn1n2, Ωclean = ai | 1 ≤ i ≤ ρ (1− s)n1n2 and Ωdirty = ai | ρ (1− s)n1n2 < i ≤ ρn1n2,where ai’s are independently and uniformly selected from [n1]× [n2].

In addition, we consider an equivalent model for sgn (S) as follows• Define K = (Kα,β)1≤α≤n1,1≤β≤n2

to be a random n1 × n2 matrix such that all of its entries are independentand have amplitude 1 (i.e. in the real case, all entries are either 1 or -1, and in the complex case, all entrieshave amplitude 1 and arbitrary phase). We assume that EK = 0.

• Set sgn (S) such that sgn (Sα,β) = Kα,β1(α,β)∈Ωdirty, and hence

sgn (Se) =∑

(α,β)∈Ωdirty

Kα,β√ωα,βAα,β.

Recall that support (S) ⊆ Ωdirty. Rather than directly studying the property of sgn (S e), we will first examine theproperty of an auxiliary matrix

S e :=

ρn1n2∑i=ρ(1−s)n1n2+1

Kai√ωaiAai ,

and then bound the difference between S e and sgn (S e).

Page 36: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

35

For any given pair (k, l) ∈ [n1]× [n2], define a random variable

Zα,β : =

√ωα,βωk,l

⟨PTA(k,l),Kα,βAα,β

⟩.

Thus, Zai’s are conditionally independent given K. The conditional mean and second moment of Zai can becomputed as

E (Zai |K) =1

n1n2

1√ωk,l

⟨PTA(k,l),Ke

⟩,

and

Var (Zai |K) ≤ E(ZaiZ∗ai |K

)=

1

n1n2

1

ωk,l

∑b∈[n1]×[n2]

ωb

∣∣⟨PTA(k,l),Ab

⟩∣∣2≤ max (µ1cs, µ4) r

n21n

22

:= V,

where the last inequality follows from the definition of µ4. Besides, applying Lemma 2 allows us to bound themagnitude of Zα,β as follows

|Zα,β| ≤3µ1csr

n1n2:= B. (91)

Applying the Bernstein inequality [28, Theorem 6] yields that: for any t ≤ 23ρs,

P

∣∣∣∣∣∣ ρn1n2∑i=ρ(1−s)n1n2+1

Zai

− ρs√ωk,l

⟨PTA(k,l),Ke

⟩∣∣∣∣∣∣ > t

≤ 2n1n2 exp

(− t2

4ρsn1n2V

)

= 2n1n2 exp

(− t2

4ρsmax(µ1cs,µ4)rn1n2

). (92)

Recall that S e :=∑ρn1n2

i=ρ(1−s)n1n2+1Kai√ωaiAai . Conditional on K, 1√

ωk,l

⟨A(k,l),PT

(S e

)⟩is equivalent to∑ρn1n2

i=ρ(1−s)n1n2+1Zai in distribution. We also note that for any constant c20 > 0, there exists a constant c21 > 0

such that if ρsn1n2 > c21 max (µ1cs, µ4) r log (n1n2), then one has√c20ρsmax (µ1cs, µ4) r log (n1n2)

n1n2≤ 2

3ρs.

Therefore, by setting t :=√

c20ρsmax(µ1cs,µ4)r log(n1n2)n1n2

in (92), we can show that with probability exceeding 1 −(n1n2)−4, one has, when conditional on K, that

1

ωk,l

∣∣∣⟨A(k,l),PT(S e

)⟩− ρs

⟨PTA(k,l),Ke

⟩∣∣∣2 ≤ c20ρsmax (µ1cs, µ4) r log (n1n2)

n1n2(93)

for every (k, l) ∈ [n1]×[n2], provided that ρsn1n2 > c21 max (µ1cs, µ4) r log (n1n2) for some constants c20, c21 > 0.The next step is to bound ρs√

ωk,l

⟨PTA(k,l),Ke

⟩. For convenience of analysis, we represent as

Ke =∑

a∈[n1]×[n2]

za√ωaAa,

where za’s are independent (not necessarily i.i.d.) zero-mean random variables satisfying |za| = 1. Let Ya :=1√ωk,l

⟨PTA(k,l), za

√ωaAa

⟩, then the definition of µ4 and Lemma 2 allow us to compute

EYa = 0,

|Ya| =1√ωk,l

∣∣⟨PTA(k,l),√ωaAa

⟩∣∣ ≤ 3µ1csr

n1n2,

Page 37: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

36

and ∑a∈[n1]×[n2]

EYaY∗a =1

ωk,l

∑a∈[n1]×[n2]

∣∣⟨PTA(k,l),√ωaAa

⟩∣∣2 ≤ max µ1cs, µ4 rn1n2

:= V .

Applying the Bernstein inequality [28, Theorem 6] suggests that for any t < 23 ,

P∣∣∣∣ 1√ωk,l

⟨PTA(k,l),Ke

⟩∣∣∣∣ ≥ t ≤ 2n1n2 exp

(− t2

4maxµ1cs,µ4rn1n2

).

Consequently, there exists a constant c24 > 0 such that

ρ2s2

ωk,l

∣∣⟨PTA(k,l),Ke⟩∣∣2 ≤ c24ρ

2s2 max µ1cs, µ4 r log (n1n2)

n1n2

with high probability. This together with (93) suggests that

1

ωk,l

∣∣∣⟨A(k,l),PT(S e

)⟩∣∣∣2 =

∣∣∣∣∣∣ρn1n2∑

i=ρ(1−s)n1n2+1

Zai

∣∣∣∣∣∣2

≤ 2

∣∣∣∣∣∣ ρn1n2∑i=ρ(1−s)n1n2+1

Zai

− ρs√ωk,l

⟨PTA(k,l),Ke

⟩∣∣∣∣∣∣2

+ 2ρ2s2

ωk,l

∣∣⟨PTA(k,l),Ke⟩∣∣2

≤ 4c24ρsmax (µ1cs, µ4) r log (n1n2)

n1n2(94)

with high probability.We still need to bound the deviation of S e from sgn (S e). Observe that the difference between them arise

from sampling with replacement, i.e. there are a few entries in ai | ρ (1− s)n1n2 < i ≤ ρn1n2 that either fallwithin Ωclean or have appeared more than once. A simple Chernoff bound argument (e.g. [40]) indicates the numberof aforementioned conflicts is upper bounded by 10 log (n1n2) with high probability. That said, one can find acollection of entry locations b1, · · · , bN such that

S e − sgn (S e) =

N∑i=1

Kbi√ωbiAbi , (95)

where N ≤ 10 log (n1n2) with high probability. Therefore, we can bound

1√ωk,l

∣∣∣⟨A(k,l),PT(S e − sgn (S e)

)⟩∣∣∣ ≤ N∑i=1

1√ωk,l

∣∣⟨A(k,l),PT(√ωbiAbi

)⟩∣∣≤ N 3µ1csr

n1n2,

and hence, for some constant c22 > 0,

1

ωk,l

∣∣∣⟨A(k,l),PT(S e − sgn (S e)

)⟩∣∣∣2 ≤ c22µ2

1c2s r

2 log2 (n1n2)

n21n

22

≤ c20ρsmax (µ1cs, µ4) r log (n1n2)

n1n2

holds with high probability, provided that ρsn1n2 ≥ c23µ1csr log (n1n2) for some c23 > 0.Putting the above results together yields that for every (k, l) ∈ [n1]× [n2],

1

ωk,l

∣∣⟨A(k,l),PT (sgn (S e))⟩∣∣2 ≤ 2

ωk,l

∣∣∣⟨A(k,l),PT(S e − sgn (S e)

)⟩∣∣∣2 +2

ωk,l

∣∣∣⟨A(k,l),PT(S e

)⟩∣∣∣2≤ c25ρsmax (µ1cs, µ4) r log (n1n2)

n1n2

for some constant c25 > 0, which completes the proof.

Page 38: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

37

APPENDIX IPROOF OF LEMMA 10

Consider the model of sgn(S), K and Se as introduced in the proof of Lemma 11 in Appendix H. For any(α, β) ∈ [n1]× [n2], define

Zα,β := Aα,β (Ke) =√ωα,βKα,βAα,β.

With this notation, we can see that Zai’s are conditionally independent given K, and satisfy

E(Zai |K

)=

1

n1n2

∑(α,β)∈[n1]×[n2]

√ωα,βAα,βKα,β =

1

n1n2Ke,

∥∥∥Zα,β∥∥∥ =∥∥√ωα,βAα,β

∥∥ = 1,

and ∥∥∥E(ZaiZ∗ai |K)∥∥∥ ≤ 1

n1n2

∑(α,β)∈[n1]×[n2]

∥∥ωα,βAα,βA∗α,β

∥∥ = 1 := V.

Since Se =∑ρn1n2

i=(1−s)ρn1n2+1 Zai , applying the Bernstein inequality [28, Theorem 6] implies that for any t <2ρsn1n2, one has, conditioned on K, that

P(∥∥∥Se − ρsKe

∥∥∥ > t)≤ 2n1n2 exp

(− t2

4ρsn1n2V

).

Therefore, conditioned on K, there exists a constant c22 > 0 such that∥∥∥Se − ρsKe

∥∥∥ <√c22ρsn1n2 log (n1n2) (96)

with probability at least than 1− n−51 n−5

2 .The next step is to bound the operator norm of ρsKe. For convenience of analysis, we write

Ke =∑

a∈[n1]×[n2]

za√ωaAa,

for a collection of independent (not necessarily i.i.d.) random variables za’s satisfying |za| = 1 and Eza = 0. LetYa := za

√ωaAa, then we have EYa = 0, ‖Ya‖ = 1, and∥∥∥∥∥∥

∑a∈[n1]×[n2]

EYaY∗a

∥∥∥∥∥∥ =

∥∥∥∥∥∥∑

a∈[n1]×[n2]

ωaAaA∗a

∥∥∥∥∥∥ ≤ n1n2 := V .

Therefore, applying the Bernstein inequality yields that for any t ≤ n1n2,

P ‖Ke‖ > t = P

∥∥∥∥∥∥

∑a∈[n1]×[n2]

Ya

∥∥∥∥∥∥ > t

≤ 2n1n2 exp

(− t2

4n1n2

)or, more simply, there exists a constant c22 > 0 such that

‖Ke‖ ≤√c22n1n2 log (n1n2)

with high probability. This and (96), taken collectively, yield∥∥∥Se

∥∥∥ ≤ ∥∥∥Se − ρsKe

∥∥∥+ ρs ‖Ke‖ < 2√c22ρsn1n2 log (n1n2)

with high probability. On the other hand, (95) implies that for sufficiently large n1 and n2,∥∥∥S e − sgn (S e)∥∥∥ ≤ N∑

i=1

∥∥√ωbiAbi

∥∥ = N ≤ 10 log (n1n2)√c22ρsn1n2 log (n1n2)

Page 39: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

38

with high probability. Consequently, for a sufficiently small constant s,

‖PT⊥ (λsgn (Se))‖ ≤ λ ‖sgn (Se)‖ ≤ λ∥∥∥Se − sgn (Se)

∥∥∥+ λ∥∥∥Se

∥∥∥≤ 2λ

√c22ρsn1n2 log (n1n2)

= 2√c22s ≤

1

8

with probability exceeding 1− n−51 n−5

2 .

APPENDIX JPROOF OF THEOREM 2

We prove this theorem under the conditions of Lemma 3, i.e. (34)–(37). Note that these conditions are satisfiedwith high probability, as we have shown in the proof of Theorem 1.

Denote the solution of Noisy-EMaC as Xe = Xe + He. Since He is a two-fold Hankel matrix, i.e. He =AΩ (He) +AΩ⊥ (He), we can obtain

‖Xe‖∗ ≥ ‖Xe‖∗ = ‖Xe + He‖∗ ≥ ‖Xe +AΩ⊥(He)‖∗ − ‖AΩ(He)‖∗. (97)

The second term can be bounded using the triangle inequality as

‖AΩ (He)‖F ≤∥∥∥AΩ

(Xe −Xo

e

)∥∥∥F

+ ‖AΩ (Xe −Xoe)‖F . (98)

Since the constraint of Noisy-EMaC requires∥∥∥PΩ

(X −Xo

)∥∥∥F≤ δ and ‖PΩ (X −Xo)‖F ≤ δ, the Hankel

structure of the enhanced form allows us to bound∥∥∥AΩ

(Xe −Xo

e

)∥∥∥F≤ √n1n2δ and ‖AΩ (Xe −Xo

e)‖F ≤√n1n2δ, which immediately leads to

‖AΩ (He)‖F ≤ 2√n1n2δ.

Using the same analysis as for (62) allows us to bound the perturbation AΩ⊥(He) as follows

‖Xe +AΩ⊥(He)‖∗ ≥ ‖Xe‖∗ +1

4‖PT⊥AΩ⊥(He)‖F .

Combining this with (97), we have

‖PT⊥AΩ⊥(He)‖F ≤ 4‖AΩ(He)‖∗ ≤ 4√n1n2‖AΩ(He)‖F ≤ 8n1n2δ.

Further from Lemma 3, we know that

‖PTAΩ⊥ (He)‖F ≤n1n2

m

√2 ‖PT⊥AΩ⊥ (He)‖F . (99)

Therefore, combining all the above results give

‖He‖F ≤ ‖AΩ(He)‖F + ‖PTAΩ⊥ (He)‖F + ‖PT⊥AΩ⊥ (He)‖F

2√n1n2 + 8n1n2 +

8√

2n21n

22

m

δ.

APPENDIX KPROOF OF THEOREM 4

In order to extend the results to structured Hankel matrix completion, from the proof of Theorem 1 it is sufficientto have the first two conditions in (38) to hold for general Hankel matrices. The proof is done by recognizing thesetwo conditions are equivalent to (26).

Page 40: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

39

REFERENCES

[1] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” MagneticResonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007. I-A

[2] L. Potter, E. Ertin, J. Parker, and M. Cetin, “Sparsity and compressed sensing in radar imaging,” Proceedings of the IEEE, vol. 98,no. 6, pp. 1006–1020, 2010. I-A

[3] L. Borcea, G. Papanicolaou, C. Tsogka, and J. Berryman, “Imaging and time reversal in random media,” Inverse Problems, vol. 18,no. 5, p. 1247, 2002. I-A

[4] L. Schermelleh, R. Heintzmann, and H. Leonhardt, “A guide to super-resolution fluorescence microscopy,” The Journal of cell biology,vol. 190, no. 2, pp. 165–175, 2010. I-A

[5] A. M. Sayeed and B. Aazhang, “Joint multipath-Doppler diversity in mobile wireless communications,” IEEE Transactions onCommunications, vol. 47, no. 1, pp. 123 –132, Jan 1999. I-A

[6] Y. Chi, Y. Xie, and R. Calderbank, “Compressive demodulation of mutually interfering signals,” submitted to IEEE Transactions onInformation Theory, 2013. [Online]. Available: http://arxiv.org/abs/1303.3904 I-A

[7] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequencyinformation,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006. I-A, I-B, VIII

[8] D. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289 –1306, April 2006. I-A, I-B[9] E. J. CandÚs, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications

on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006. I-A, I-B[10] X. Li, “Compressed sensing and matrix completion with constant proportion of corruptions,” Constructive Approximation, vol. 37, pp.

73–99, 2013. I-A, I-B, VII[11] Y. Chi, L. Scharf, A. Pezeshki, and A. Calderbank, “Sensitivity to basis mismatch in compressed sensing,” IEEE Transactions on Signal

Processing, vol. 59, no. 5, pp. 2182–2195, May 2011. I-A, I-B[12] M. Duarte and R. Baraniuk, “Spectral compressive sensing,” Applied and Computational Harmonic Analysis, 2012. I-A[13] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht, “Compressed sensing off the grid,” July 2012. [Online]. Available:

http://arxiv.org/abs/1207.6053 I-A, I-B, III-A1, III-B1[14] R. Prony, “Essai experimental et analytique,” J. de l’Ecole Polytechnique (Paris), vol. 1, no. 2, pp. 24–76, 1795. I-B[15] R. Roy and T. Kailath, “ESPRIT-estimation of signal parameters via rotational invariance techniques,” IEEE Transactions on Acoustics,

Speech and Signal Processing, vol. 37, no. 7, pp. 984 –995, Jul 1989. I-B[16] Y. Hua and T. K. Sarkar, “Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise,”

IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 5, pp. 814 –824, may 1990. I-B, II-B[17] D. Tufts and R. Kumaresan, “Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum

likelihood,” Proceedings of the IEEE, vol. 70, no. 9, pp. 975 – 989, sept. 1982. I-B[18] Y. Hua, “Estimating two-dimensional frequencies by matrix enhancement and matrix pencil,” IEEE Transactions on Signal Processing,

vol. 40, no. 9, pp. 2267 –2280, Sep 1992. I-B, II-B[19] P. L. Dragotti, M. Vetterli, and T. Blu, “Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets

strang-fix,” IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 1741 –1757, May 2007. I-B[20] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate of innovation,” IEEE Transactions on Signal Processing,

vol. 50, no. 6, pp. 1417–1428, 2002. I-B[21] K. Gedalyahu, R. Tur, and Y. C. Eldar, “Multichannel sampling of pulse streams at the rate of innovation,” IEEE Transactions on

Signal Processing, vol. 59, no. 4, pp. 1491–1504, 2011. I-B[22] E. Candes and Y. Plan, “A probabilistic and RIPless theory of compressed sensing,” IEEE Transactions on Information Theory, vol. 57,

no. 11, pp. 7235–7254, 2011. I-B, VIII[23] Y. Chi, A. Pezeshki, L. L. Scharf, and R. Calderbank, “Sensitivity to basis mismatch in compressed sensing,” ICASSP, Mar. 2010. I-B[24] E. J. Candes and C. Fernandez-Granda, “Towards a mathematical theory of super-resolution,” to appear in Communications on Pure

and Applied Mathematics, 2013. I-B, III-A1, V-E, VIII[25] ——, “Super-resolution from noisy data,” November 2012. [Online]. Available: http://arxiv.org/abs/1211.0290 I-B[26] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky, “The convex algebraic geometry of linear inverse problems,” 48th Annual

Allerton Conference on Communication, Control, and Computing, pp. 699–703, 2010. I-B[27] E. J. Candes and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9,

no. 6, pp. 717–772, April 2009. I-B, II-B, III-A2, VI[28] D. Gross, “Recovering low-rank matrices from few coefficients in any basis,” IEEE Transactions on Information Theory, vol. 57, no. 3,

pp. 1548–1566, March 2011. I-B, I-C, II-B, VI, 4, VI-A, VI-C, VII, D, E, F, H, H, I[29] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of ACM, vol. 58, no. 3, pp. 11:1–11:37,

Jun 2011. I-B, VII, VII[30] S. Negahban and M. Wainwright, “Restricted strong convexity and weighted matrix completion: Optimal bounds with noise,” The

Journal of Machine Learning Research, vol. 98888, pp. 1665–1697, May 2012. I-B[31] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,”

SIAM Review, vol. 52, no. 3, pp. 471–501, 2010. II-C, VIII[32] M. Fazel, T. K. Pong, D. Sun, and P. Tseng, “Hankel matrix rank minimization with applications in system identification and realization,”

2011. IV[33] I. Markovsky, “Structured low-rank approximation and its applications,” Automatica, vol. 44, no. 4, pp. 891–909, 2008. IV[34] B. Balle and M. Mohri, “Spectral learning of general weighted automata via constrained matrix completion,” Advances in Neural

Information Processing Systems (NIPS), pp. 2168–2176, 2012. IV[35] A. Sankaranarayanan, P. Turaga, R. Baraniuk, and R. Chellappa, “Compressive acquisition of dynamic scenes,” Computer Vision–ECCV

2010, pp. 129–142, 2010. IV

Page 41: Robust Spectral Compressed Sensing via Structured Matrix ...yuejiec/papers/Sparse... · Matrix Completion (EMaC), based on structured matrix completion. The algorithm starts by arranging

40

[36] P. J. Shin, P. E. Larson, M. A. Ohliger, M. Elad, J. M. Pauly, D. B. Vigneron, and M. Lustig, “Calibrationless parallel imagingreconstruction based on structured low-rank matrix completion,” submitted to Magnetic Resonance in Medicine, 2012. IV

[37] M. Fazel, H. Hindi, and S. P. Boyd, “Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distancematrices,” American Control Conference, vol. 3, pp. 2156 – 2162 vol.3, June 2003. IV

[38] J. F. Cai, E. J. Candes, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization,vol. 20, no. 4, pp. 1956–1982, 2010. V, V-F

[39] M. Grant, S. Boyd, and Y. Ye, “Cvx: Matlab software for disciplined convex programming,” Online accessiable: http://stanford. edu/˜boyd/cvx, 2008. V-B

[40] N. Alon and J. H. Spencer, The Probabilistic Method (3rd Edition). Wiley, 2008. VII-A, H[41] E. Candes and Y. Plan, “Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,”

IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 2342–2359, 2011. VIII